• Suresh Siddha's avatar
    sched: Use resched IPI to kick off the nohz idle balance · ca38062e
    Suresh Siddha authored
    Current use of smp call function to kick the nohz idle balance can deadlock
    in this scenario.
    
    1. cpu-A did a generic_exec_single() to cpu-B and after queuing its call single
    data (csd) to the call single queue, cpu-A took a timer interrupt.  Actual IPI
    to cpu-B to process the call single queue is not yet sent.
    
    2. As part of the timer interrupt handler, cpu-A decided to kick cpu-B
    for the idle load balancing (sets cpu-B's rq->nohz_balance_kick to 1)
    and __smp_call_function_single() with nowait will queue the csd to the
    cpu-B's queue. But the generic_exec_single() won't send an IPI to cpu-B
    as the call single queue was not empty.
    
    3. cpu-A is busy with lot of interrupts
    
    4. Meanwhile cpu-B is entering and exiting idle and noticed that it has
    it's rq->nohz_balance_kick set to '1'. So it will go ahead and do the
    idle load balancer and clear its rq->nohz_balance_kick.
    
    5. At this point, csd queued as part of the step-2 above is still locked
    and waiting to be serviced on cpu-B.
    
    6. cpu-A is still busy with interrupt load and now it got another timer
    interrupt and as part of it decided to kick cpu-B for another idle load
    balancing (as it finds cpu-B's rq->nohz_balance_kick cleared in step-4
    above) and does __smp_call_function_single() with the same csd that is
    still locked.
    
    7. And we get a deadlock waiting for the csd_lock() in the
    __smp_call_function_single().
    
    Main issue here is that cpu-B can service the idle load balancer kick
    request from cpu-A even with out receiving the IPI and this lead to
    doing multiple __smp_call_function_single() on the same csd leading to
    deadlock.
    
    To kick a cpu, scheduler already has the reschedule vector reserved. Use
    that mechanism (kick_process()) instead of using the generic smp call function
    mechanism to kick off the nohz idle load balancing and avoid the deadlock.
    
       [ This issue is present from 2.6.35+ kernels, but marking it -stable
         only from v3.0+ as the proposed fix depends on the scheduler_ipi()
         that is introduced recently. ]
    Reported-by: default avatarPrarit Bhargava <prarit@redhat.com>
    Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
    Cc: stable@kernel.org # v3.0+
    Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
    Link: http://lkml.kernel.org/r/20111003220934.834943260@sbsiddha-desk.sc.intel.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
    ca38062e
sched_fair.c 127 KB