    fs/epoll: loosen irq safety in ep_scan_ready_list() · 002b3436
    Davidlohr Bueso authored
    Patch series "fs/epoll: loosen irq safety when possible".
    
Both patches replace saving+restoring interrupts when taking the ep->lock
(now the waitqueue lock) with simply disabling local irqs.  This shows
immediate performance benefits in patch 1 for an epoll workload running on
Xen.  The main concern with this sort of change in epoll is
ep_poll_callback(), which is passed to the wait queue wakeup and is very
often called from irq context; this series does not touch that call.
    
    Patches have been tested pretty heavily with the customer workload,
    microbenchmarks, ltp testcases and two high level workloads that use epoll
    under the hood: nginx and libevent benchmarks.
    
    This patch (of 2):
    
Saving and restoring interrupts in ep_scan_ready_list() is overkill,
as it is never called with interrupts disabled.  Loosen this to simply
disabling local irqs, which benefits archs where managing irqs is
expensive as well as virtual environments.  This patch yields some
throughput improvements on an epoll-intensive workload running on a
single Xen DomU.
    
    1 Job	 7500	-->    8800 enq/s  (+17%)
    2 Jobs	14000   -->   15200 enq/s  (+8%)
    3 Jobs	20500	-->   22300 enq/s  (+8%)
4 Jobs	25000   -->   28000 enq/s  (+8-12%)
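The locking change itself can be sketched as follows (a simplified
illustration of the before/after pattern, not the verbatim diff; the
rdllist splice stands in for the body of ep_scan_ready_list()):

```c
/*
 * ep_scan_ready_list() is never entered with interrupts already
 * disabled, so the saved flags always describe the same "irqs on"
 * state and the irqsave/irqrestore is wasted work -- notably on
 * paravirt guests, where touching the interrupt flag can be costly.
 */

/* Before: save and restore the caller's interrupt state. */
unsigned long flags;
spin_lock_irqsave(&ep->lock, flags);
/* ... splice ep->rdllist onto a local txlist ... */
spin_unlock_irqrestore(&ep->lock, flags);

/* After: unconditionally disable/re-enable local irqs. */
spin_lock_irq(&ep->lock);
/* ... splice ep->rdllist onto a local txlist ... */
spin_unlock_irq(&ep->lock);
```

This is safe precisely because the function has no caller running with
irqs off; ep_poll_callback(), which can run in irq context, keeps the
irqsave variant.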
    
    On bare metal:
    
For a 2-socket 40-core (ht) IvyBridge on a few workloads.  Unfortunately
I don't have a Xen environment, and for the Xen results I do have (the
numbers in patch 1) I don't have the actual workload, so the two cannot
be compared directly.
    
1) Different configurations were used for an epoll_wait (pipes io)
   microbench (http://linux-scalability.org/epoll/epoll-test.c), which
   shows around a 7-10% improvement in the total number of epoll_wait()
   loop iterations with both regular and nested epolls; very raw
   numbers, but measurable nonetheless.
    
    # threads	vanilla		dirty
         1		1677717		1805587
         2		1660510		1854064
         4		1610184		1805484
         8		1577696		1751222
         16		1568837		1725299
         32		1291532		1378463
         64		 752584		 787368
    
       Note that stddev is pretty small.
    
    2) Another pipe test, which shows no real measurable improvement.
       (http://www.xmailserver.org/linux-patches/pipetest.c)
    
Link: http://lkml.kernel.org/r/20180720172956.2883-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
    Cc: Jason Baron <jbaron@akamai.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>