kernel/timer.c · 158fb15fdc91f0502e1f0320c3604ce94afff5de · nexedi / linux

[PATCH] SMP races in the timer code · 158fb15f
Ingo Molnar authored Oct 11, 2003
This fixes two del_timer_sync() races that are still in the timer code. 

The first race was actually triggered in a 2.4 backport of the 2.6 timer
code.  The second race was never triggered - it is mostly theoretical on
a standalone kernel.  (It's more likely in any virtualized or otherwise
preemptable environment.)

Both races happen when self-rearming timers are used.  One mainstream
example is kernel/itimer.c.  The effect of the races is that
del_timer_sync() lets a timer running instead of synchronizing with it,
causing logic bugs (and crashes) in the affected kernel code.  One
typical incarnation of the race is a double add_timer(). 

race #1:

this code in __run_timers() is running on CPU0:

                        list_del(&timer->entry);
                        timer->base = NULL;
			[*]
                        set_running_timer(base, timer);
                        spin_unlock_irq(&base->lock);
			[**]
                        fn(data);
                        spin_lock_irq(&base->lock);

CPU0 gets stuck at the [*] code-point briefly - after the timer->base has
been set to NULL, but before the base->running_timer pointer has been set
up. This is a fundamentally volatile scenario, as there's _zero_ knowledge
in the data structures that this timer is about to be executed!

Now CPU1 comes along and calls del_timer_sync(). It will find nothing -
neither timer->base nor base->running_timer will cause it to synchronize.  
It will return and report that the timer has been deleted - shortly
afterwards CPU1 continues to execute the timer fn, which will cause
crashes.

This particular race is easy to fix by reordering the timer->base
clearing with set_running_timer(), and putting a wmb() between them, but
there's more races:

race #2

The checking of del_timer_sync() for 'pending or running timer' is
fundamentally unrobust. Eg. if CPU0 gets stuck at the [***] point below:

                base = &per_cpu(tvec_bases, i);
                if (base->running_timer == timer) {
                        while (base->running_timer == timer) {
                                cpu_relax();
                                preempt_check_resched();
                        }
			[***]
                        break;
                }
        }
        smp_rmb();
        if (timer_pending(timer))
                goto del_again;


then del_timer_sync() has already decided that this timer is not running
(we just finished loop-waiting for it), but we have not done the
timer_pending() check yet.

If the timer has re-armed itself, and if the timer expires on CPU1 (this
needs a long delay on CPU0 but that's not hard to achieve eg.  in UML or
with kernel preemption enabled), then CPU1 could start to expire the
timer and gets to the [**] point in __run_timers (see above), then CPU1
gets stalled and CPU0 is unstalled, then the timer_pending() check in
del_timer_sync() will not notice the running timer, and del_timer_sync()
returns - while CPU1 is just about to run the timer!

Fixing this second race is hard - it involves a heavy race-check
operation that has to lock all bases, and has to re-check the
base->running_timer value, and timer_pending condition atomically.

This fix also fixes the first race, due to forcing del_timer_sync() to
always observe the timer state atomically, so the [*] code point will
always synchronize with del_timer_sync(). 

The patch is ugly but safe, and it has fixed the crashes in the 2.4
backport.  I tested the patch on 2.6.0-test7 with some heavy itimer use
and it works fine.  Removing self-arming timers safely is the sole
purpose of del_timer_sync(), so there's no way around this overhead i
think.  I believe we should ultimately fix all major del_timer_sync()
users to not use self-arming timers - having del_timer_sync() in the
thread-exit path is now a considerable source of SMP overhead.  But this
is out of the scope of current 2.6 fixes of course, and we have to
support self-arming timers as well.
158fb15f
timer.c 34 KB
Replace timer.c