    sched/core: Fix wake_affine() performance regression · d153b153
    Peter Zijlstra authored
    Eric reported a sysbench regression against commit:
    
      3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
    
    Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed
    against his v3.10 enterprise kernel.
    
    PRE (current tip/master):
    
     ivb-ep sysbench:
    
       2: [30 secs]     transactions:                        64110  (2136.94 per sec.)
       5: [30 secs]     transactions:                        143644 (4787.99 per sec.)
      10: [30 secs]     transactions:                        274298 (9142.93 per sec.)
      20: [30 secs]     transactions:                        418683 (13955.45 per sec.)
      40: [30 secs]     transactions:                        320731 (10690.15 per sec.)
      80: [30 secs]     transactions:                        355096 (11834.28 per sec.)
    
     hsw-ex NAS:
    
     OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds =                    18.01
     OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds =                    17.89
     OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds =                    17.93
     lu.C.x_threads_144_run_1.log: Time in seconds =                   434.68
     lu.C.x_threads_144_run_2.log: Time in seconds =                   405.36
     lu.C.x_threads_144_run_3.log: Time in seconds =                   433.83
    
    POST (+patch):
    
     ivb-ep sysbench:
    
       2: [30 secs]     transactions:                        64494  (2149.75 per sec.)
       5: [30 secs]     transactions:                        145114 (4836.99 per sec.)
      10: [30 secs]     transactions:                        278311 (9276.69 per sec.)
      20: [30 secs]     transactions:                        437169 (14571.60 per sec.)
      40: [30 secs]     transactions:                        669837 (22326.73 per sec.)
      80: [30 secs]     transactions:                        631739 (21055.88 per sec.)
    
     hsw-ex NAS:
    
     lu.C.x_threads_144_run_1.log: Time in seconds =                    23.36
     lu.C.x_threads_144_run_2.log: Time in seconds =                    22.96
     lu.C.x_threads_144_run_3.log: Time in seconds =                    22.52
    
    This patch takes out all the shiny wake_affine() stuff and goes back to
    utter basics. Between the two CPUs involved with the wakeup (the CPU
    doing the wakeup and the CPU we ran on previously) pick the CPU we can
    run on _now_.
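
The selection rule described above can be sketched as a small, self-contained C model. This is only an illustration under stated assumptions, not the code from the patch: `nr_running[]`, `cpu_is_idle()` and `pick_wake_cpu()` are hypothetical names, "can run on now" is modelled as "has no runnable tasks", and both the tie-break (prefer the waking CPU when both are idle) and the fallback (stay on the previous CPU when neither is idle) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical size of the toy machine */

/* Hypothetical stand-in for per-CPU runqueue length; not the kernel's struct rq. */
static unsigned int nr_running[NR_CPUS];

/* A CPU counts as "available now" in this model when nothing is queued on it. */
static bool cpu_is_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return nr_running[cpu] == 0;
}

/*
 * Choose between the CPU doing the wakeup (this_cpu) and the CPU the
 * task last ran on (prev_cpu).
 *
 * Assumed policy: take this_cpu if it is idle, otherwise take prev_cpu.
 * When neither is idle, prev_cpu is kept so the task is not migrated.
 */
static int pick_wake_cpu(int this_cpu, int prev_cpu)
{
	if (cpu_is_idle(this_cpu))
		return this_cpu;

	return prev_cpu;
}
```

With `nr_running[0] = 0` and `nr_running[1] = 2`, `pick_wake_cpu(0, 1)` selects CPU 0; if CPU 0 is busy instead, the task stays on CPU 1. The model looks at only two CPUs and ignores load weights, which is the "utter basics" trade-off: cheap to evaluate, but with nothing to say about the overloaded case where both CPUs are busy.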
    
    This recovers much of the regression against the older kernels,
    but still leaves some ground to make up in the overloaded case. The default-enabled
    WA_WEIGHT (which will be introduced in the next patch) is an attempt
    to address the overloaded situation.
    
    Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: jinpuwang@gmail.com
    Cc: vcaputo@pengaru.com
    Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
    Signed-off-by: Ingo Molnar <mingo@kernel.org>