[PATCH] PPC64 Move thread_info flags to its own cache line
This patch fixes a problem I have been seeing since all the preempt changes went in, which is that ppc64 SMP systems would livelock randomly if preempt was enabled. It turns out that what was happening was that one cpu was spinning in spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly doing preempt_enable() and preempt_disable() calls. The other cpu had the lock and was trying to set the TIF_NEED_RESCHED flag for the task running on the first cpu. That is an atomic operation which has to be retried if another cpu writes to the same cacheline between the load and the store, which the other cpu was doing every time it did preempt_enable() or preempt_disable(). I decided to move the thread_info flags field into the next cache line, since it is the only field that would regularly be modified by cpus other than the one running the task that owns the thread_info. (OK possibly the `cpu' field would be on a rebalance; I don't know the rebalancing code, but that should be pretty infrequent.) Thus, moving the flags field seems like a good idea generally as well as solving the immediate problem. For the record I am pretty unhappy with the code we use for spin_lock et al. with preemption turned on (the BUILD_LOCK_OPS stuff in spinlock.c). For a start we do the atomic op (_raw_spin_trylock) each time around the loop. That is going to be generating a lot of unnecessary bus (or fabric) traffic. Instead, after we fail to get the lock we should poll it with simple loads until we see that it is clear and then retry the atomic op. Assuming a reasonable cache design, the loads won't generate any bus traffic until another cpu writes to the cacheline containing the lock. Secondly we have lost the __spin_yield call that we had on ppc64, which is an important optimization when we are running under the hypervisor. I can't just put that in cpu_relax because I need to know which (virtual) cpu is holding the lock, so that I can tell the hypervisor which virtual cpu to give my time slice to. That information is stored in the lock variable, which is why __spin_yield needs the address of the lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing
Please register or sign in to comment