• Ingo Molnar's avatar
    [PATCH] SMP races in the timer code · 158fb15f
    Ingo Molnar authored
    This fixes two del_timer_sync() races that are still in the timer code. 
    
    The first race was actually triggered in a 2.4 backport of the 2.6 timer
    code.  The second race was never triggered - it is mostly theoretical on
    a standalone kernel.  (It's more likely in any virtualized or otherwise
    preemptable environment.)
    
    Both races happen when self-rearming timers are used.  One mainstream
    example is kernel/itimer.c.  The effect of the races is that
    del_timer_sync() lets a timer running instead of synchronizing with it,
    causing logic bugs (and crashes) in the affected kernel code.  One
    typical incarnation of the race is a double add_timer(). 
    
    race #1:
    
    this code in __run_timers() is running on CPU0:
    
                            list_del(&timer->entry);
                            timer->base = NULL;
    			[*]
                            set_running_timer(base, timer);
                            spin_unlock_irq(&base->lock);
    			[**]
                            fn(data);
                            spin_lock_irq(&base->lock);
    
    CPU0 gets stuck at the [*] code-point briefly - after the timer->base has
    been set to NULL, but before the base->running_timer pointer has been set
    up. This is a fundamentally volatile scenario, as there's _zero_ knowledge
    in the data structures that this timer is about to be executed!
    
    Now CPU1 comes along and calls del_timer_sync(). It will find nothing -
    neither timer->base nor base->running_timer will cause it to synchronize.  
    It will return and report that the timer has been deleted - shortly
    afterwards CPU1 continues to execute the timer fn, which will cause
    crashes.
    
    This particular race is easy to fix by reordering the timer->base
    clearing with set_running_timer(), and putting a wmb() between them, but
    there's more races:
    
    race #2
    
    The checking of del_timer_sync() for 'pending or running timer' is
    fundamentally unrobust. Eg. if CPU0 gets stuck at the [***] point below:
    
                    base = &per_cpu(tvec_bases, i);
                    if (base->running_timer == timer) {
                            while (base->running_timer == timer) {
                                    cpu_relax();
                                    preempt_check_resched();
                            }
    			[***]
                            break;
                    }
            }
            smp_rmb();
            if (timer_pending(timer))
                    goto del_again;
    
    
    then del_timer_sync() has already decided that this timer is not running
    (we just finished loop-waiting for it), but we have not done the
    timer_pending() check yet.
    
    If the timer has re-armed itself, and if the timer expires on CPU1 (this
    needs a long delay on CPU0 but that's not hard to achieve eg.  in UML or
    with kernel preemption enabled), then CPU1 could start to expire the
    timer and gets to the [**] point in __run_timers (see above), then CPU1
    gets stalled and CPU0 is unstalled, then the timer_pending() check in
    del_timer_sync() will not notice the running timer, and del_timer_sync()
    returns - while CPU1 is just about to run the timer!
    
    Fixing this second race is hard - it involves a heavy race-check
    operation that has to lock all bases, and has to re-check the
    base->running_timer value, and timer_pending condition atomically.
    
    This fix also fixes the first race, due to forcing del_timer_sync() to
    always observe the timer state atomically, so the [*] code point will
    always synchronize with del_timer_sync(). 
    
    The patch is ugly but safe, and it has fixed the crashes in the 2.4
    backport.  I tested the patch on 2.6.0-test7 with some heavy itimer use
    and it works fine.  Removing self-arming timers safely is the sole
    purpose of del_timer_sync(), so there's no way around this overhead i
    think.  I believe we should ultimately fix all major del_timer_sync()
    users to not use self-arming timers - having del_timer_sync() in the
    thread-exit path is now a considerable source of SMP overhead.  But this
    is out of the scope of current 2.6 fixes of course, and we have to
    support self-arming timers as well.
    158fb15f
timer.c 34 KB