    epoll: add EPOLLEXCLUSIVE flag · df0108c5
    Jason Baron authored
    Currently, epoll file descriptors or epfds (the fd returned from
    epoll_create[1]()) that are added to a shared wakeup source are always
    added in a non-exclusive manner.  This means that when we have multiple
    epfds attached to a shared fd source they are all woken up.  This creates
    thundering herd type behavior.
    
    Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
    'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation.  This new
    flag allows for exclusive wakeups when there are multiple epfds attached
    to a shared fd event source.
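
As a rough userspace illustration of the flag described above, the following sketch attaches several epfds to one shared eventfd with EPOLLIN | EPOLLEXCLUSIVE, triggers a single event, and counts how many epfds report it. The helper name count_exclusive_wakeups() and the MAX_EPFDS limit are made up for this example, and the fallback #define is only there for older libc headers. Because no thread is blocked in epoll_wait() when the event fires, the count may well equal the number of epfds; the only guarantee is that it is at least one.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older libc headers */
#endif

#define MAX_EPFDS 16                /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: attach n epfds to one shared eventfd using
 * EPOLLEXCLUSIVE, fire one event, and return how many epfds report it.
 * Returns -1 on any setup error.
 */
int count_exclusive_wakeups(int n)
{
    int epfds[MAX_EPFDS];
    int created = 0, ready = 0, rc = -1;
    uint64_t one = 1;
    int src;

    assert(n > 0 && n <= MAX_EPFDS);

    src = eventfd(0, EFD_NONBLOCK);         /* the shared fd source */
    if (src < 0)
        return -1;

    for (; created < n; created++) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };

        ev.data.fd = src;
        epfds[created] = epoll_create1(0);
        if (epfds[created] < 0)
            goto out;
        /* The flag is passed in 'event' during EPOLL_CTL_ADD. */
        if (epoll_ctl(epfds[created], EPOLL_CTL_ADD, src, &ev) < 0) {
            close(epfds[created]);
            goto out;
        }
    }

    /* Make the shared source readable exactly once. */
    if (write(src, &one, sizeof(one)) != (ssize_t)sizeof(one))
        goto out;

    /* Poll each epfd without blocking and count those holding an event. */
    for (int i = 0; i < n; i++) {
        struct epoll_event got;

        if (epoll_wait(epfds[i], &got, 1, 0) == 1)
            ready++;
    }
    rc = ready;
out:
    while (created > 0)
        close(epfds[--created]);
    close(src);
    return rc;
}
```

A caller would typically run one thread per epfd and block in epoll_wait(); the non-blocking poll here only keeps the sketch single-threaded.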
    
    The implementation walks the list of exclusive waiters, and queues an
    event to each epfd, until it finds the first waiter that has threads
    blocked on it via epoll_wait().  The idea is to search for threads which
    are idle and ready to process the wakeup events.  Thus, we queue an event
    to at least 1 epfd, but may still potentially queue an event to all epfds
    that are attached to the shared fd source.
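
The walk described above can be modeled in a few lines of userspace C. This is a simplified toy model, not the kernel code: struct model_epfd and model_exclusive_wake() are hypothetical names, and the real implementation operates on wait queue entries rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an epfd on the exclusive waiter list. */
struct model_epfd {
    bool has_blocked_waiter;   /* a thread is blocked in epoll_wait() */
    int  queued_events;        /* events queued to this epfd so far */
};

/*
 * Queue an event to each epfd in list order and stop after the first
 * one that has a blocked waiter. Returns how many epfds got an event.
 */
size_t model_exclusive_wake(struct model_epfd *list, size_t n)
{
    size_t queued = 0;

    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        list[i].queued_events++;
        queued++;
        if (list[i].has_blocked_waiter)
            break;             /* found an idle thread, stop walking */
    }
    return queued;
}
```

In this model, a list whose first blocked waiter sits at index 1 queues two events, and a list with no blocked waiters queues an event to every epfd, matching the "at least 1, potentially all" behavior noted above.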
    
    Performance testing was done by Madars Vitolins using a modified version
    of Enduro/X.  The use of the 'EPOLLEXCLUSIVE' flag reduced the run time
    of this particular workload from 860s down to 24s.
    
    Sample epoll_ctl text:
    
    EPOLLEXCLUSIVE
    
      Sets an exclusive wakeup mode for the epfd file descriptor that is
      being attached to the target file descriptor, fd.  Thus, when an event
      occurs and multiple epfd file descriptors are attached to the same
      target file using EPOLLEXCLUSIVE, one or more epfds will receive an
      event with epoll_wait(2).  The default in this scenario (when
      EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
      EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
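
The EPOLL_CTL_ADD-only restriction can be checked from userspace with a sketch like the one below. The helper name mod_with_exclusive_errno() is hypothetical, and the expectation of EINVAL assumes a kernel that includes this flag; a kernel without EPOLLEXCLUSIVE support may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older libc headers */
#endif

/*
 * Hypothetical helper: add an eventfd with EPOLLEXCLUSIVE, then try to
 * pass the same flag with EPOLL_CTL_MOD. Returns the errno from the MOD
 * attempt, 0 if MOD unexpectedly succeeded, or -1 on a setup error.
 */
int mod_with_exclusive_errno(void)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    int result = 0;
    int src = eventfd(0, EFD_NONBLOCK);
    int epfd = epoll_create1(0);

    if (src < 0 || epfd < 0) {
        result = -1;
        goto out;
    }
    ev.data.fd = src;

    /* Allowed: the flag is specified with EPOLL_CTL_ADD. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, src, &ev) < 0) {
        result = -1;
        goto out;
    }

    /* Not allowed: the flag is specified with EPOLL_CTL_MOD. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, src, &ev) < 0)
        result = errno;     /* save before close() can clobber it */
out:
    if (epfd >= 0)
        close(epfd);
    if (src >= 0)
        close(src);
    return result;
}
```

To change the event mask of an exclusive registration, the entry would have to be removed with EPOLL_CTL_DEL and added again.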

    Signed-off-by: Jason Baron <jbaron@akamai.com>
    Tested-by: Madars Vitolins <m@silodev.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@ftp.linux.org.uk>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Eric Wong <normalperson@yhbt.net>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eventpoll.c 59.5 KB