Commit a041464f authored by Rusty Russell, committed by Linus Torvalds

[PATCH] Fix occasional stop_machine() lockup with > 2 CPUs

Stephen Rothwell noted a case where one CPU was sitting in userspace and
another was in stop_machine(), waiting for everyone to enter stopmachine().
This can happen if migration occurs at exactly the wrong time with more than
2 CPUs.  Say we have 4 CPUs:

1) stop_machine() on CPU 0 creates stopmachine() threads for CPUs 1, 2
   and 3, and yields, waiting for them to migrate to their CPUs and
   ack.

2) stopmachine(2) gets rebalanced (probably on exec) to CPU 1.

3) stopmachine(2) calls set_cpus_allowed() on CPU 1 and sleeps, waiting
   for the migration thread.

4) stopmachine(1) calls set_cpus_allowed() on CPU 0, is moved onto CPU 1
   and starts spinning.
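
The four steps can be replayed in a toy model. This is a hypothetical
sketch, not kernel code: simulate(), struct sm_thread and the round-based
scheduling are assumptions made for illustration. A CPU's migration thread
is assumed to run only if no thread that has already reached that CPU is
spinning there, and visiting the CPUs in order within a round models the
unlucky timing of step 4, where stopmachine(1) lands on CPU 1 before CPU 1's
migration thread has serviced stopmachine(2). The controlling stop_machine()
on CPU 0 is assumed to yield, and stopmachine(3) is assumed to have reached
CPU 3 already.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS    4
#define NTHREADS 3 /* stopmachine threads for CPUs 1, 2 and 3 */

/* Toy model of one stopmachine() thread (hypothetical, not kernel code). */
struct sm_thread {
	int cpu;    /* CPU the thread is currently on */
	int target; /* CPU it was bound to with set_cpus_allowed() */
};

static bool arrived(const struct sm_thread *t)
{
	return t->cpu == t->target;
}

/*
 * Returns the round in which every thread reached its CPU, or -1 if that
 * has not happened after max_rounds (a deadlock in this model).
 * yield_first_stage == false: an arrived thread spins and never lets its
 * CPU's migration thread run.  true: it yields, so the migration thread can.
 */
static int simulate(bool yield_first_stage, int max_rounds)
{
	/* State after steps 1-3: stopmachine(1) still on CPU 0, stopmachine(2)
	 * rebalanced to CPU 1 and asleep awaiting migration, stopmachine(3)
	 * (assumed) already on CPU 3. */
	struct sm_thread t[NTHREADS] = {
		{ .cpu = 0, .target = 1 },
		{ .cpu = 1, .target = 2 },
		{ .cpu = 3, .target = 3 },
	};

	assert(max_rounds > 0);
	for (int round = 1; round <= max_rounds; round++) {
		/* CPUs are visited in order, so a move made on CPU 0 is already
		 * visible when CPU 1 is visited: the "wrong time" of step 4. */
		for (int cpu = 0; cpu < NCPUS; cpu++) {
			bool hogged = false;

			for (int i = 0; i < NTHREADS; i++)
				if (t[i].cpu == cpu && arrived(&t[i]) && !yield_first_stage)
					hogged = true;
			if (hogged)
				continue; /* this CPU's migration thread never gets to run */

			/* The migration thread runs and pushes waiting threads
			 * to their target CPUs. */
			for (int i = 0; i < NTHREADS; i++)
				if (t[i].cpu == cpu && !arrived(&t[i]))
					t[i].cpu = t[i].target;
		}

		int done = 0;
		for (int i = 0; i < NTHREADS; i++)
			done += arrived(&t[i]);
		if (done == NTHREADS)
			return round;
	}
	return -1;
}
```

With spinning threads, simulate(false, 100) returns -1 because CPU 1 is
never released and stopmachine(2) never reaches CPU 2; with yielding
threads, simulate(true, 100) returns 1.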

Now CPU 1's migration thread never runs, because stopmachine(1) is spinning
on that CPU, so stopmachine(2) never reaches CPU 2 and we deadlock.  The
simplest solution is for stopmachine() to yield until all the threads are in
place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 4e73e8ed
@@ -52,7 +52,12 @@ static int stopmachine(void *cpu)
 			mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
 		}
-		cpu_relax();
+		/* Yield in first stage: migration threads need to
+		 * help our sisters onto their CPUs. */
+		if (!prepared && !irqs_disabled)
+			yield();
+		else
+			cpu_relax();
 	}
 
 	/* Ack: we are exiting. */
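
The patched loop can be mimicked in userspace, as a rough sketch assuming
POSIX threads and C11 atomics. sched_yield() stands in for the kernel's
yield(), an empty busy-wait stands in for cpu_relax(), and the sm_* names,
the stage enum and the cumulative ack counter are hypothetical. Threads are
not pinned to CPUs, so this shows only the control flow of the diff (yield
until prepared, then spin), not the migration deadlock itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-ins for the stop_machine stages. */
enum sm_stage { SM_WAIT, SM_PREPARE, SM_DISABLE_IRQ, SM_EXIT };

static atomic_int sm_state; /* current stage, written only by the controller */
static atomic_int sm_acks;  /* cumulative acks from all workers */

static void *sm_worker(void *unused)
{
	bool prepared = false, irqs_disabled = false;

	(void)unused;
	atomic_fetch_add(&sm_acks, 1); /* ack: we are alive */
	while (atomic_load(&sm_state) != SM_EXIT) {
		int s = atomic_load(&sm_state);

		if (s == SM_DISABLE_IRQ && !irqs_disabled) {
			irqs_disabled = true; /* local_irq_disable() in the kernel */
			atomic_fetch_add(&sm_acks, 1);
		} else if (s == SM_PREPARE && !prepared) {
			prepared = true; /* preempt_disable() in the kernel */
			atomic_fetch_add(&sm_acks, 1);
		}
		/* Patched policy: yield in the first stage so other threads can
		 * run; once prepared, just busy-wait (the cpu_relax() analogue). */
		if (!prepared && !irqs_disabled)
			sched_yield();
	}
	atomic_fetch_add(&sm_acks, 1); /* ack: we are exiting */
	return NULL;
}

static void sm_wait_for_acks(int n)
{
	while (atomic_load(&sm_acks) < n)
		sched_yield();
}

/* Runs fn (may be NULL) while every worker is parked; returns total acks. */
static int sm_run(int nthreads, void (*fn)(void))
{
	pthread_t tid[MAX_THREADS];

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&sm_state, SM_WAIT);
	atomic_store(&sm_acks, 0);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, sm_worker, NULL);
		assert(rc == 0);
		(void)rc;
	}
	sm_wait_for_acks(nthreads);     /* every worker has started */
	atomic_store(&sm_state, SM_PREPARE);
	sm_wait_for_acks(2 * nthreads); /* every worker is prepared */
	atomic_store(&sm_state, SM_DISABLE_IRQ);
	sm_wait_for_acks(3 * nthreads); /* every worker has "disabled irqs" */
	if (fn)
		fn();
	atomic_store(&sm_state, SM_EXIT);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&sm_acks);
}
```

sm_run(4, NULL) should return 16: each of the four workers acks once when
it starts, once per stage (two stages) and once on exit. Using a cumulative
counter rather than resetting it per stage avoids a reset racing with a late
ack in this sketch.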