[PATCH] del_timer() vs. mod_timer() SMP race
We just spent some days fighting a rare race in one of the distro's who backported some of timer.c from 2.6 to 2.4 (though they missed a bit). The actual race we found didn't happen in 2.6 _but_ code inspection showed that a similar race is still present in 2.6, explanation below: Code removing a timer from a list (run_timers or del_timer) takes that CPU list lock, does list_del, then timer->base = NULL. It is mandatory that this timer->base = NULL is visible to other CPUs only after the list_del() is complete. If not, then mod timer could see it NULL, thus take it's own CPU list lock and not the one for the CPU the timer was beeing removed from the list, and thus the list_add in mod_timer() could race with the list_del() from run_timers() or del_timer(). Our race happened with run_timers(), which _DOES_ contain a proper smp_wmb() in the right spot in 2.6, but didn't in the "backport" we were fighting with. However, del_timer() doesn't have such a barrier, and thus is subject to this race in 2.6 as well. This patch fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing
Please register or sign in to comment