Kqueue : Generic Event Notification




Mahendra M
Mahendra_M@infosys.com
http://www.infosys.com


This work is licensed under a Creative Commons License
http://creativecommons.org/licenses/by-sa/2.5/
Agenda
   Traditional ways of multiplexing I/O
   Methods and issues in handling asynchronous events.
   Enter Kqueue
   The Kqueue architecture.
   Kqueue possibilities.
Traditional File/Socket handling
   Traditionally a single file can be handled as below
    /* No error checking here */
    while ( ( i = read( fd, ... ) ) > 0 ) {
         do_something( with_this_data );
    }
   The above case works fine for one file descriptor
   What about two or more such descriptors ( sockets, for
    example ), where data can arrive on any one of them at any
    given point of time ?
    –   Basically, we need a mechanism for event driven applications.
    –   This is a case for multiplexing I/O ( or events ) !!
Traditional I/O multiplexing
   Use select() and/or poll()
   select() and poll() pass a list of file descriptors to the kernel
    and wait for activity. On return, the list indicates which
    descriptors are ready.
   select() takes the descriptors as a bitmap, with one bit per
    descriptor; poll() takes an array of struct pollfd.
   select() and poll() can watch for read/write/exception events
    on the list of file descriptors.
   On return, the application has to scan the entire bitmap ( or
    array ) to see which file descriptors have to be handled.
Traditional I/O multiplexing ( contd.. )

fd_set fds;
FD_ZERO( &fds );
FD_SET( 5, &fds );
/* First argument is the highest descriptor number plus one */
n = select( 6, &fds, NULL, NULL, NULL );
j = 0;
for ( i = 0; (i < MAX) && (j < n); i++ ) {
    if ( FD_ISSET( i, &fds ) ) {
       read_something_from_socket( i );
       j++;
    }
}
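For comparison, the same kind of wait can be written with poll(), which takes an array of struct pollfd instead of a bitmap. This is only a minimal sketch: the names first_readable(), demo_poll() and the MAX_FDS limit are made up for illustration.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64   /* arbitrary limit chosen for this sketch */

/* Wait until one of the nfds descriptors in fds[] is readable.
 * Returns the index of the first ready descriptor, or -1 on
 * timeout or error. A negative timeout_ms blocks indefinitely. */
int first_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];
    int i, n;

    if (nfds < 0 || nfds > MAX_FDS)
        return -1;
    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return -1;
    /* As with select(), the caller still scans the whole array. */
    for (i = 0; i < nfds; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;
    return -1;
}

/* Demo: two pipes, data is written only to the second one.
 * Returns the index reported by first_readable(), or -2 on setup failure. */
int demo_poll(void)
{
    int a[2], b[2], fds[2], idx;

    if (pipe(a) < 0)
        return -2;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -2;
    }
    fds[0] = a[0];
    fds[1] = b[0];
    if (write(b[1], "x", 1) != 1)
        idx = -2;
    else
        idx = first_readable(fds, 2, 1000);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return idx;
}
```

Note that the whole array is rebuilt and handed to the kernel on every call, and scanned again on return.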
Issues with select()/poll()
   Problems of scalability
    –   Entire descriptor set has to be passed to each invocation of
        the system call ( especially with poll() - which uses an array )
    –   Massive copies from user space to kernel space and vice
        versa
    –   Not all descriptors may have activity all the time
    –   On return, apps have to scan the entire list to check for
        updated descriptors ( duplicated effort in kernel and app ) -
        O(N) activity
    –   Results in inefficient memory usage within the kernel
    –   If the call has to sleep, the kernel walks the list three
        times: to check for activity, to register for wakeup, and
        again after wakeup.
   select()/poll() can handle only file descriptors
   Coding was clunky for select()
    –   Descriptor set is a bitmap of fixed size ( FD_SETSIZE,
        commonly 256 or 1024 )
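The fixed bitmap size has a practical consequence: a descriptor numbered FD_SETSIZE or higher cannot be stored in an fd_set at all. A small guard along these lines can catch that case; safe_fd_set() is a hypothetical helper name, not a standard call.

```c
#include <sys/select.h>

/* Add fd to the set only if it fits in the fixed-size bitmap.
 * Returns 0 on success, -1 if fd is out of range. */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;   /* FD_SET() here would touch memory outside the bitmap */
    FD_SET(fd, set);
    return 0;
}
```

A server that holds more open descriptors than FD_SETSIZE would hit the -1 branch, which is one reason such servers move away from select().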
Other forms of interesting events
   Asynchronous signal notifications
    –   Required in libraries that may want to be notified of signals
   Asynchronous timer expiry
   Asynchronous Read/Write ( aio_read(), aio_write() )
   VFS changes
   Process state Changes
   Thread state changes
   Device driver notifications
   Anything else – that will require some asynchronous event
    notification – and the design allowing it.
Available solutions
   Linux 2.4 : SIGIO
   Sun Solaris : /dev/poll
   Linux 2.4 : /dev/epoll
    –   Use ioctl() to manipulate the above.
   Even Microsoft Windows had something to offer.
   Kqueue – for BSD boxes.
    –   We shall be talking about that now !!
Kqueue - Goals
   A generic event notification framework
    –   File descriptors (read/write/exceptions), Signals,
        Asynchronous I/O ( not in OSFR ), Vnodes monitoring,
        process monitoring, Timer events.
   A single system call to handle all this.
   Capability to add new functionality.
   Efficient use of memory
    –   Memory should be allocated as per need.
    –   Applications should be able to register and receive as
        many events as they are interested in.
    –   Events should be combined ( eg: several packets arriving
        on a socket are reported as one event )
   Should be a good replacement for the standard calls
    ( select(), poll() ).
   Should be possible to extend this functionality easily
Kqueue APIs
   int kqueue( void );
    –   Creates a kernel event queue and returns a file descriptor
        for it. The queue is destroyed using the close() system call.
   int kevent( kq, changes, nc, events, ne,
    timeout );
    –   To register events in the kernel queue
    –   To receive events that occurred between consecutive calls.
    –   Can simulate select(), poll() - Using different values of timeout
    –   No need to store the event descriptors locally in the
        application.
   EV_SET( &event, ident, filter, flags,
    fflags, data, udata)
    –   Used to prepare an event for registering in the kernel queue.
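The timeout argument of kevent() is a pointer to a struct timespec: a NULL pointer blocks until an event arrives, while a zero-valued timespec returns immediately, like a poll. The helper below is a rough sketch of mapping a poll()-style millisecond value onto that convention; kevent_timeout() is a made-up name.

```c
#include <stddef.h>
#include <time.h>

/* Convert a poll()-style timeout in milliseconds into the pointer
 * that kevent() expects. A negative value means "block forever",
 * which kevent() expresses as a NULL timeout. */
const struct timespec *kevent_timeout(int ms, struct timespec *ts)
{
    if (ms < 0)
        return NULL;
    ts->tv_sec = ms / 1000;
    ts->tv_nsec = (long)(ms % 1000) * 1000000L;
    return ts;
}
```

With this, kevent( kq, NULL, 0, events, ne, kevent_timeout( 0, &ts ) ) would only poll for pending events, and kevent_timeout( -1, &ts ) would wait indefinitely.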
Kqueue sample code
kq = kqueue();
struct kevent kev[10];
// Prepare an event
EV_SET( &kev[0], fd, EVFILT_READ, EV_ADD, 0, 0, 0);
// Register an event
kevent( kq, kev, 1, NULL, 0, timeout );

// Receive up to 10 events
n = kevent( kq, NULL, 0, kev, 10, timeout );
for ( i = 0; i < n; i++ ) {
    // Do something
}
Kqueue filter types
   READ : Returns when data is available for read from
    sockets, vnodes, fifos, pipes
    –   ident = descriptor
    –   data = amount of data available to be read
    –   flags = can be EOF etc.
   WRITE : Returns when it is possible to write to a descriptor
    ( ident ).
    –   data = amount of data that can be written
   VNODE : Returns when the file behind a descriptor changes
    –   fflags = delete, write, extend, attrib, link, rename, revoke
Kqueue filter types ( contd... )
   PROC : Monitors a process
    –   ident = pid of the process to be monitored.
    –   fflags = exit, fork, exec, track, trackerr
   SIGNAL : Returns when a signal is delivered to a process.
    –   ident = signal number
    –   data = number of times the signal was delivered.
    –   Co-exists with signal() and sigaction() - and has a lower
        precedence.
    –   Is delivered even if SIG_IGN is set for the signal
   TIMER : Establishes a timer
    –   ident = timer id
    –   data = timeout in milliseconds when registering; number of
        times the timer expired when the event is returned
    –   Periodic by default unless ONESHOT is specified
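As an illustration of a non-descriptor filter, here is a rough sketch of a one-shot TIMER event. The function name wait_one_timer() and the timer id 1 are arbitrary choices for this example. Since kqueue exists only on BSD-derived systems, the sketch assumes a compile-time check and reports ENOSYS elsewhere.

```c
#include <errno.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>
#define HAVE_KQUEUE 1
#endif

/* Arm a one-shot timer of ms milliseconds and block until it fires.
 * Returns the expiry count reported in the data field, or -1 on
 * error ( errno is ENOSYS where kqueue is not available ). */
long wait_one_timer(int ms)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    long fired = -1;
    int kq = kqueue();

    if (kq < 0)
        return -1;
    /* ident 1 is an arbitrary timer id chosen for this sketch */
    EV_SET(&change, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, ms, 0);
    /* Register and wait in one call; a NULL timeout blocks. */
    if (kevent(kq, &change, 1, &event, 1, NULL) == 1 &&
        !(event.flags & EV_ERROR))
        fired = (long)event.data;
    close(kq);
    return fired;
#else
    (void)ms;
    errno = ENOSYS;   /* no native kqueue on this platform */
    return -1;
#endif
}
```

Dropping EV_ONESHOT would leave the timer periodic, and each later kevent() call would report how many times it expired since the previous one.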
Kqueue Flags
   ADD : To add an event to the queue
   ENABLE : To enable a disabled event
   DISABLE : To temporarily disable an event ( not deleted )
   DELETE : Remove an event from the kernel queue
   ONESHOT : Cause the event to happen only once.
   CLEAR : Reset the state of the filter after the event is retrieved
   EOF : Set by filters on return to indicate end-of-file
   ERROR : Set on return to indicate an error ( error code in data )
Kqueue – Good things
   As we have seen, it is extremely scalable in
    handling large numbers of file descriptors
     –   Eliminates most of the deficiencies of select()/poll()
     –   Currently, efforts are underway to migrate some popular
         daemons ( Apache ) to use Kqueue.
   It supports a wide range of events – not just file descriptors.
   Is easily extensible.
   New kqueue filters can be added very easily inside the BSD
    kernels.
   Opens up a lot of interesting possibilities.
Issues with Kqueue
   Kqueue calls are not part of POSIX specifications.
    –   Most Unix systems do not implement it.
    –   Breaks portability across Unices
   Third party code may still use select(), poll() etc. We may
    have to migrate this or allow these to co-exist
   Relatively new on the playing field – Not time-tested.
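One way to live with the portability issue is to hide the mechanism behind a small wrapper that uses kqueue where it exists and falls back to poll() elsewhere. The following is only a sketch under that assumption; wait_readable() and demo_wait_readable() are hypothetical names, and a real program would keep one kqueue open instead of creating one per call.

```c
#include <poll.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#endif

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error. A negative timeout_ms blocks indefinitely. */
int wait_readable(int fd, int timeout_ms)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    struct timespec ts;
    int kq, n;

    if ((kq = kqueue()) < 0)
        return -1;
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    n = kevent(kq, &change, 1, &event, 1, timeout_ms < 0 ? NULL : &ts);
    close(kq);
    if (n == 1 && (event.flags & EV_ERROR))
        return -1;
    return n < 0 ? -1 : n;
#else
    /* Fallback for systems without kqueue */
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    return n < 0 ? -1 : n;
#endif
}

/* Demo with a pipe: returns 1 if an empty pipe times out and a
 * pipe with pending data is reported readable, 0 or -1 otherwise. */
int demo_wait_readable(void)
{
    int p[2], before, after;

    if (pipe(p) < 0)
        return -1;
    before = wait_readable(p[0], 0);       /* empty pipe: expect 0 */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]); close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 1000);     /* data pending: expect 1 */
    close(p[0]); close(p[1]);
    return before == 0 && after == 1;
}
```

Callers see the same interface on every system, so third-party code that still relies on poll() can co-exist with the kqueue path.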
References
   Kqueue: A generic and scalable event notification facility -
    Jonathan Lemon
        http://people.freebsd.org/~jlemon/papers/kqueue.pdf
   Man pages for kqueue, knote, kfilter_register
   Read the source, Luke !!
Finally ...
   Questions ??
   Thanks to
    –   Organizers for giving me a chance to speak at GNUnify 2006
    –   NetBSD and Linux developers who helped me during my work
    –   Infosys for sponsoring my visit to GNUnify 2006
   Special thanks to YOU for listening...


                      You can contact me at :
                    Mahendra_M@infosys.com