    workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn() · 960bd11b
    Lai Jiangshan authored
    busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
    (because the CPU went down again).  This used to be okay because the
    flag wasn't used for anything else.
    
    However, after 25511a47 "workqueue: reimplement CPU online rebinding
    to handle idle workers", WORKER_REBIND is also used to command idle
    workers to rebind.  If the flag is not cleared, the worker may confuse
    the next CPU_UP cycle by having REBIND spuriously set, or it may oops
    or get stuck by prematurely calling idle_worker_rebind().
    
      WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x500()
      Hardware name: Bochs
      Modules linked in: test_wq(O-)
      Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
      Call Trace:
       [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
       [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
       [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
       [<ffffffff810bc16e>] kthread+0xbe/0xd0
       [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
      ---[ end trace e977cf20f4661968 ]---
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: test_wq(O-)
      CPU 0
      Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
      RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
      RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
      RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
      RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
      R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
      FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
      Stack:
       ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
       ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
       ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
      Call Trace:
       [<ffffffff810bc16e>] kthread+0xbe/0xd0
       [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
      Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
      RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
       RSP <ffff88001e1c9de0>
      CR2: 0000000000000000
    
    There was no reason to keep WORKER_REBIND on failure in the first
    place - WORKER_UNBOUND is guaranteed to be set in such cases, which
    prevents concurrency management from being activated incorrectly.
    Always clear WORKER_REBIND.
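
The change described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the flag values, the reduced `struct worker`, and the helpers `rebind_old()`, `rebind_new()` and `rebind_model_selfcheck()` are hypothetical, and the `bind_ok` parameter stands in for whether rebinding to the CPU succeeded. It also assumes that a successful rebind clears WORKER_UNBOUND.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions, not the kernel's. */
enum {
	WORKER_REBIND  = 1 << 0,	/* worker was asked to rebind to its CPU */
	WORKER_UNBOUND = 1 << 1,	/* worker is not bound to its CPU */
};

/* Reduced stand-in for the kernel's struct worker (hypothetical). */
struct worker {
	unsigned int flags;
};

/* Model of the old behaviour: REBIND is cleared only on success. */
static void rebind_old(struct worker *w, bool bind_ok)
{
	if (bind_ok)
		w->flags &= ~(WORKER_REBIND | WORKER_UNBOUND);
}

/* Model of the fixed behaviour: REBIND is cleared unconditionally. */
static void rebind_new(struct worker *w, bool bind_ok)
{
	if (bind_ok)
		w->flags &= ~WORKER_UNBOUND;

	/* Success or failure, the rebind request has been consumed. */
	w->flags &= ~WORKER_REBIND;
}

/* Walks through the failure case (CPU went down again) in both models. */
int rebind_model_selfcheck(void)
{
	struct worker a = { .flags = WORKER_REBIND | WORKER_UNBOUND };
	struct worker b = { .flags = WORKER_REBIND | WORKER_UNBOUND };

	rebind_old(&a, false);
	assert(a.flags & WORKER_REBIND);	/* stale flag survives */

	rebind_new(&b, false);
	assert(!(b.flags & WORKER_REBIND));	/* flag is gone */
	assert(b.flags & WORKER_UNBOUND);	/* still marked unbound */
	return 0;
}
```

In this model a failed rebind leaves the stale REBIND bit behind only in the old variant. In the new variant the worker stays UNBOUND after a failure, which is the state the description relies on to keep concurrency management from being activated.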
    
    tj: Updated comment and description.
    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>