[PATCH] Fix occasional stop_machine() lockup with > 2 CPUs
Stephen Rothwell noted a case where one CPU was sitting in userspace, one in stop_machine() waiting for everyone to enter stopmachine(). This can happen if migration occurs at exactly the wrong time with more than 2 CPUS. Say we have 4 CPUS: 1) stop_machine() on CPU 0creates stopmachine() threads for CPUS 1, 2 and 3, and yields waiting for them to migrate to their CPUs and ack. 2) stopmachine(2) gets rebalanced (probably on exec) to CPU 1. 3) stopmachine(2) calls set_cpus_allowed on CPU 1, sleeps awaiting migration thread. 4) stopmachine(1) calls set_cpus_allowed on CPU 0, moves onto CPU1 and starts spinning. Now the migration thread never runs, and we deadlock. The simplest solution is for stopmachine() to yield until they are all in place. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing
Please register or sign in to comment