• Christoph Lameter's avatar
    vmstat: on-demand vmstat workers V8 · 7cc36bbd
    Christoph Lameter authored
    vmstat workers are used for folding counter differentials into the zone,
    per node and global counters at certain time intervals.  They currently
    run at defined intervals on all processors which will cause some holdoff
    for processors that need minimal intrusion by the OS.
    
    The current vmstat_update mechanism depends on a deferrable timer firing
    every other second by default which registers a work queue item that runs
    on the local CPU, with the result that we have 1 interrupt and one
    additional schedulable task on each CPU every 2 seconds If a workload
    indeed causes VM activity or multiple tasks are running on a CPU, then
    there are probably bigger issues to deal with.
    
    However, some workloads dedicate a CPU for a single CPU bound task.  This
    is done in high performance computing, in high frequency financial
    applications, in networking (Intel DPDK, EZchip NPS) and with the advent
    of systems with more and more CPUs over time, this may become more and
    more common to do since when one has enough CPUs one cares less about
    efficiently sharing a CPU with other tasks and more about efficiently
    monopolizing a CPU per task.
    
    The difference of having this timer firing and workqueue kernel thread
    scheduled per second can be enormous.  An artificial test measuring the
    worst case time to do a simple "i++" in an endless loop on a bare metal
    system and under Linux on an isolated CPU with dynticks and with and
    without this patch, have Linux match the bare metal performance (~700
    cycles) with this patch and loose by couple of orders of magnitude (~200k
    cycles) without it[*].  The loss occurs for something that just calculates
    statistics.  For networking applications, for example, this could be the
    difference between dropping packets or sustaining line rate.
    
    Statistics are important and useful, but it would be great if there would
    be a way to not cause statistics gathering produce a huge performance
    difference.  This patche does just that.
    
    This patch creates a vmstat shepherd worker that monitors the per cpu
    differentials on all processors.  If there are differentials on a
    processor then a vmstat worker local to the processors with the
    differentials is created.  That worker will then start folding the diffs
    in regular intervals.  Should the worker find that there is no work to be
    done then it will make the shepherd worker monitor the differentials
    again.
    
    With this patch it is possible then to have periods longer than
    2 seconds without any OS event on a "cpu" (hardware thread).
    
    The patch shows a very minor increased in system performance.
    
    hackbench -s 512 -l 2000 -g 15 -f 25 -P
    
    Results before the patch:
    
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 4.992
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 4.971
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 5.063
    
    Hackbench after the patch:
    
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 4.973
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 4.990
    Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
    Each sender will pass 2000 messages of 512 bytes
    Time: 4.993
    
    [fengguang.wu@intel.com: cpu_stat_off can be static]
    Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
    Reviewed-by: default avatarGilad Ben-Yossef <gilad@benyossef.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Mike Frysinger <vapier@gentoo.org>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: Hakan Akkan <hakanakkan@gmail.com>
    Cc: Max Krasnyansky <maxk@qti.qualcomm.com>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Viresh Kumar <viresh.kumar@linaro.org>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    7cc36bbd
vmstat.c 38.1 KB