• Lai Jiangshan's avatar
    workqueue: Fix divide error in wq_update_node_max_active() · 91f09870
    Lai Jiangshan authored
    Yue Sun and xingwei lee reported a divide error bug in
    wq_update_node_max_active():
    
    divide error: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.9.0-rc5 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:wq_update_node_max_active+0x369/0x6b0 kernel/workqueue.c:1605
    Code: 24 bf 00 00 00 80 44 89 fe e8 83 27 33 00 41 83 fc ff 75 0d 41
    81 ff 00 00 00 80 0f 84 68 01 00 00 e8 fb 22 33 00 44 89 f8 99 <41> f7
    fc 89 c5 89 c7 44 89 ee e8 a8 24 33 00 89 ef 8b 5c 24 04 89
    RSP: 0018:ffffc9000018fbb0 EFLAGS: 00010293
    RAX: 00000000000000ff RBX: 0000000000000001 RCX: ffff888100ada500
    RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000080000000
    RBP: 0000000000000001 R08: ffffffff815b1fcd R09: 1ffff1100364ad72
    R10: dffffc0000000000 R11: ffffed100364ad73 R12: 0000000000000000
    R13: 0000000000000100 R14: 0000000000000000 R15: 00000000000000ff
    FS:  0000000000000000(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb8c06ca6f8 CR3: 000000010d6c6000 CR4: 0000000000750ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     workqueue_offline_cpu+0x56f/0x600 kernel/workqueue.c:6525
     cpuhp_invoke_callback+0x4e1/0x870 kernel/cpu.c:194
     cpuhp_thread_fun+0x411/0x7d0 kernel/cpu.c:1092
     smpboot_thread_fn+0x544/0xa10 kernel/smpboot.c:164
     kthread+0x2ed/0x390 kernel/kthread.c:388
     ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
     </TASK>
    Modules linked in:
    ---[ end trace 0000000000000000 ]---
    
    After analysis, it happens when all of the CPUs in a workqueue's affinity
    get offine.
    
    The problem can be easily reproduced by:
    
     # echo 8 > /sys/devices/virtual/workqueue/<any-wq-name>/cpumask
     # echo 0 > /sys/devices/system/cpu/cpu3/online
    
    Use the default max_actives for nodes when all of the CPUs in the
    workqueue's affinity get offline to fix the problem.
    Reported-by: default avatarYue Sun <samsun1006219@gmail.com>
    Reported-by: default avatarxingwei lee <xrivendell7@gmail.com>
    Link: https://lore.kernel.org/lkml/CAEkJfYPGS1_4JqvpSo0=FM0S1ytB8CEbyreLTtWpR900dUZymw@mail.gmail.com/
    Fixes: 5797b1c1 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarLai Jiangshan <jiangshan.ljs@antgroup.com>
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    91f09870
workqueue.c 221 KB