• Jesper Dangaard Brouer's avatar
    pktgen: RCU-ify "if_list" to remove lock in next_to_run() · 8788370a
    Jesper Dangaard Brouer authored
    The if_lock()/if_unlock() in next_to_run() adds a significant
    overhead, because its called for every packet in busy loop of
    pktgen_thread_worker().  (Thomas Graf originally pointed me
    at this lock problem).
    
    Removing these two "LOCK" operations should in theory save us approx
    16ns (8ns x 2), as illustrated below we do save 16ns when removing
    the locks and introducing RCU protection.
    
    Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
     (single CPU performance, ixgbe 10Gbit/s, E5-2630)
     * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
     * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
     * Diff   : +588195 pps --> -16.50ns
    
    To understand this RCU patch, I describe the pktgen thread model
    below.
    
    In pktgen there is several kernel threads, but there is only one CPU
    running each kernel thread.  Communication with the kernel threads are
    done through some thread control flags.  This allow the thread to
    change data structures at a know synchronization point, see main
    thread func pktgen_thread_worker().
    
    Userspace changes are communicated through proc-file writes.  There
    are three types of changes, general control changes "pgctrl"
    (func:pgctrl_write), thread changes "kpktgend_X"
    (func:pktgen_thread_write), and interface config changes "etcX@N"
    (func:pktgen_if_write).
    
    Userspace "pgctrl" and "thread" changes are synchronized via the mutex
    pktgen_thread_lock, thus only a single userspace instance can run.
    The mutex is taken while the packet generator is running, by pgctrl
    "start".  Thus e.g. "add_device" cannot be invoked when pktgen is
    running/started.
    
    All "pgctrl" and all "thread" changes, except thread "add_device",
    communicate via the thread control flags.  The main problem is the
    exception "add_device", that modifies threads "if_list" directly.
    
    Fortunately "add_device" cannot be invoked while pktgen is running.
    But there exists a race between "rem_device_all" and "add_device"
    (which normally don't occur, because "rem_device_all" waits 125ms
    before returning). Background'ing "rem_device_all" and running
    "add_device" immediately allow the race to occur.
    
    The race affects the threads (list of devices) "if_list".  The if_lock
    is used for protecting this "if_list".  Other readers are given
    lock-free access to the list under RCU read sections.
    
    Note, interface config changes (via proc) can occur while pktgen is
    running, which worries me a bit.  I'm assuming proc_remove() takes
    appropriate locks, to assure no writers exists after proc_remove()
    finish.
    
    I've been running a script exercising the race condition (leading me
    to fix the proc_remove order), without any issues.  The script also
    exercises concurrent proc writes, while the interface config is
    getting removed.
    Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
    Reviewed-by: default avatarFlorian Westphal <fw@strlen.de>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    8788370a
pktgen.c 93.8 KB