    epoll: use rwlock in order to reduce ep_poll_callback() contention · a218cc49
    Roman Penyaev authored
    The goal of this patch is to reduce contention in ep_poll_callback(),
    which can be called concurrently from different CPUs when event rates
    are high and there are many fds per epoll.  The problem is easily
    reproduced by generating events (writes to a pipe or eventfd) from many
    threads while a single consumer thread polls.  In other words, this
    patch increases the rate at which events can be delivered from sources
    to the poller by adding poll items to the list in a lockless way.
    
    The main change is the replacement of the spinlock with a rwlock, which
    is taken for read in ep_poll_callback(); poll items are then added to
    the tail of the list using the xchg atomic instruction.  The write lock
    is taken everywhere else in order to stop list modifications and
    guarantee that list updates are fully completed (I assume that the
    write side of a rwlock does not starve; the qrwlock implementation
    seems to provide this guarantee).
    
    The following are some microbenchmark results based on the test [1],
    which starts threads that each generate N events.  The test ends when
    all events have been successfully fetched by the poller thread:
    
     spinlock
     ========
    
     threads  events/ms  run-time ms
           8       6402        12495
          16       7045        22709
          32       7395        43268
    
     rwlock + xchg
     =============
    
     threads  events/ms  run-time ms
           8      10038         7969
          16      12178        13138
          32      13223        24199
    
    According to these results, the rate of delivered events increases by
    roughly 1.6x to 1.8x, and run time drops by roughly 36% to 44%.
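
    For readers who want a feel for the workload, the following is a
    minimal sketch of this kind of stress test.  It is not the test in [1]:
    the names stress_epoll(), producer_fn() and MAX_THREADS are
    hypothetical, each producer is assumed to own one eventfd, and an
    "event" is counted as one eventfd increment rather than one epoll
    wakeup.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_THREADS 64	/* arbitrary cap for this sketch */

struct producer {
	pthread_t tid;
	int efd;
	uint64_t nr_events;
};

static void die(const char *what)
{
	perror(what);
	exit(EXIT_FAILURE);
}

/* Each producer signals its own eventfd nr_events times. */
static void *producer_fn(void *arg)
{
	struct producer *p = arg;
	const uint64_t one = 1;

	for (uint64_t i = 0; i < p->nr_events; i++)
		if (write(p->efd, &one, sizeof(one)) != sizeof(one))
			die("write");
	return NULL;
}

/* Start the producers, then poll until every event has been fetched. */
static uint64_t stress_epoll(int nr_threads, uint64_t nr_events)
{
	struct producer prod[MAX_THREADS];
	struct epoll_event ev, evs[MAX_THREADS];
	uint64_t total = 0, val;
	int epfd, i, n;

	if (nr_threads < 1 || nr_threads > MAX_THREADS)
		return 0;
	epfd = epoll_create1(0);
	if (epfd < 0)
		die("epoll_create1");

	for (i = 0; i < nr_threads; i++) {
		prod[i].nr_events = nr_events;
		prod[i].efd = eventfd(0, 0);
		if (prod[i].efd < 0)
			die("eventfd");
		ev.events = EPOLLIN;	/* level-triggered */
		ev.data.fd = prod[i].efd;
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, prod[i].efd, &ev) < 0)
			die("epoll_ctl");
	}
	for (i = 0; i < nr_threads; i++)
		if (pthread_create(&prod[i].tid, NULL, producer_fn, &prod[i]))
			die("pthread_create");

	while (total < (uint64_t)nr_threads * nr_events) {
		n = epoll_wait(epfd, evs, MAX_THREADS, -1);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			die("epoll_wait");
		}
		/* A read returns the accumulated count and resets it to zero. */
		for (i = 0; i < n; i++)
			if (read(evs[i].data.fd, &val, sizeof(val)) == sizeof(val))
				total += val;
	}

	for (i = 0; i < nr_threads; i++) {
		pthread_join(prod[i].tid, NULL);
		close(prod[i].efd);
	}
	close(epfd);
	return total;
}
```

    Timing a call such as stress_epoll(8, N) and dividing the returned
    count by the elapsed milliseconds would give a figure comparable in
    kind, though not in value, to the events/ms column above.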
    
    This patch was tested with various microbenchmarks and with artificial
    delays (e.g.  "udelay(get_random_int() & 0xff)") introduced in the
    kernel on paths where items are added to lists.
    
    [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
    
    Link: http://lkml.kernel.org/r/20190103150104.17128-5-rpenyaev@suse.de
    Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
    Cc: Davidlohr Bueso <dbueso@suse.de>
    Cc: Jason Baron <jbaron@akamai.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eventpoll.c 64.8 KB