1. 24 Apr, 2024 2 commits
    • workqueue: Fix divide error in wq_update_node_max_active() · 91f09870
      Lai Jiangshan authored
      Yue Sun and xingwei lee reported a divide error bug in
      wq_update_node_max_active():
      
      divide error: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.9.0-rc5 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:wq_update_node_max_active+0x369/0x6b0 kernel/workqueue.c:1605
      Code: 24 bf 00 00 00 80 44 89 fe e8 83 27 33 00 41 83 fc ff 75 0d 41
      81 ff 00 00 00 80 0f 84 68 01 00 00 e8 fb 22 33 00 44 89 f8 99 <41> f7
      fc 89 c5 89 c7 44 89 ee e8 a8 24 33 00 89 ef 8b 5c 24 04 89
      RSP: 0018:ffffc9000018fbb0 EFLAGS: 00010293
      RAX: 00000000000000ff RBX: 0000000000000001 RCX: ffff888100ada500
      RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000080000000
      RBP: 0000000000000001 R08: ffffffff815b1fcd R09: 1ffff1100364ad72
      R10: dffffc0000000000 R11: ffffed100364ad73 R12: 0000000000000000
      R13: 0000000000000100 R14: 0000000000000000 R15: 00000000000000ff
      FS:  0000000000000000(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fb8c06ca6f8 CR3: 000000010d6c6000 CR4: 0000000000750ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       workqueue_offline_cpu+0x56f/0x600 kernel/workqueue.c:6525
       cpuhp_invoke_callback+0x4e1/0x870 kernel/cpu.c:194
       cpuhp_thread_fun+0x411/0x7d0 kernel/cpu.c:1092
       smpboot_thread_fn+0x544/0xa10 kernel/smpboot.c:164
       kthread+0x2ed/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      
      Analysis shows that it happens when all of the CPUs in a workqueue's
      affinity go offline.
      
      The problem can be easily reproduced by:
      
       # echo 8 > /sys/devices/virtual/workqueue/<any-wq-name>/cpumask
       # echo 0 > /sys/devices/system/cpu/cpu3/online
      
      Fix the problem by using the default max_active values for the nodes
      when all of the CPUs in the workqueue's affinity go offline.

      Reported-by: Yue Sun <samsun1006219@gmail.com>
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Link: https://lore.kernel.org/lkml/CAEkJfYPGS1_4JqvpSo0=FM0S1ytB8CEbyreLTtWpR900dUZymw@mail.gmail.com/
      Fixes: 5797b1c1 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
      Cc: stable@vger.kernel.org
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: The default node_nr_active should have its max set to max_active · d40f9202
      Tejun Heo authored
      The default nna (node_nr_active) is used when the pool isn't tied to a
      specific NUMA node. This can happen in the following cases:
      
       1. On NUMA, if per-node pwq init fails and the fallback pwq is used.
       2. On NUMA, if a pool is configured to span multiple nodes.
       3. On single node setups.
      
      5797b1c1 ("workqueue: Implement system-wide nr_active enforcement for
      unbound workqueues") set the default nna->max to min_active because only #1
      was being considered. For #2 and #3, using min_active means that the max
      concurrency in normal operation is pushed down to min_active, currently 8,
      which can obviously lead to performance issues.
      
      For #1, the exact value nna->max is set to doesn't really matter. #2 can
      only happen if the workqueue is intentionally configured to ignore NUMA
      boundaries, and there's no good way to distribute max_active in this case.
      #3 is the default behavior on single node machines.
      
      Let's set the default nna->max to max_active. This fixes the artificially
      lowered concurrency problem on single node machines and shouldn't hurt
      anything in the other cases.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Fixes: 5797b1c1 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
      Link: https://lore.kernel.org/dm-devel/20240410084531.2134621-1-shinichiro.kawasaki@wdc.com/
  2. 23 Apr, 2024 1 commit
    • workqueue: Fix selection of wake_cpu in kick_pool() · 57a01eaf
      Sven Schnelle authored
      With cpu_possible_mask=0-63 and cpu_online_mask=0-7 the following
      kernel oops was observed:
      
      smp: Bringing up secondary CPUs ...
      smp: Brought up 1 node, 8 CPUs
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 0000000000000000 TEID: 0000000000000803
      [..]
       Call Trace:
      arch_vcpu_is_preempted+0x12/0x80
      select_idle_sibling+0x42/0x560
      select_task_rq_fair+0x29a/0x3b0
      try_to_wake_up+0x38e/0x6e0
      kick_pool+0xa4/0x198
      __queue_work.part.0+0x2bc/0x3a8
      call_timer_fn+0x36/0x160
      __run_timers+0x1e2/0x328
      __run_timer_base+0x5a/0x88
      run_timer_softirq+0x40/0x78
      __do_softirq+0x118/0x388
      irq_exit_rcu+0xc0/0xd8
      do_ext_irq+0xae/0x168
      ext_int_handler+0xbe/0xf0
      psw_idle_exit+0x0/0xc
      default_idle_call+0x3c/0x110
      do_idle+0xd4/0x158
      cpu_startup_entry+0x40/0x48
      rest_init+0xc6/0xc8
      start_kernel+0x3c4/0x5e0
      startup_continue+0x3c/0x50
      
      The crash is caused by calling arch_vcpu_is_preempted() for an offline
      CPU. To avoid this, select the CPU with cpumask_any_and_distribute(),
      which picks from the intersection of __pod_cpumask and cpu_online_mask.
      If no online CPU is left in the pool, skip the assignment.
      
      tj: This doesn't fully fix the bug as CPUs can still go down between picking
      the target CPU and the wake call. Fixing that likely requires adding a
      cpu_online() test to either the sched or the s390 arch code. However, regardless
      of how that is fixed, workqueue shouldn't be picking a CPU which isn't
      online as that would result in unpredictable and worse behavior.

      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Fixes: 8639eceb ("workqueue: Implement non-strict affinity scope for unbound workqueues")
      Cc: stable@vger.kernel.org # v6.6+
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 03 Apr, 2024 2 commits
  4. 02 Apr, 2024 2 commits
    • Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linux · b1e6ec0a
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "Four small documentation fixes"
      
      * tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
        docs: zswap: fix shell command format
        tracing: Fix documentation on tp_printk cmdline option
        docs: Fix bitfield handling in kernel-doc
        Documentation: dev-tools: Add link to RV docs
    • Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs · 67199a47
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Lots of fixes for situations with extreme filesystem damage.
      
        One fix ("Fix journal pins in btree write buffer") applicable to
        normal usage; also a dio performance fix.
      
        New repair/construction code is in the final stages, should be ready
        in about a week. Anyone that lost btree interior nodes (or a variety
        of other damage) as a result of the splitbrain bug will be able to
        repair then"
      
      * tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
        bcachefs: On emergency shutdown, print out current journal sequence number
        bcachefs: Fix overlapping extent repair
        bcachefs: Fix remove_dirent()
        bcachefs: Logged op errors should be ignored
        bcachefs: Improve -o norecovery; opts.recovery_pass_limit
        bcachefs: bch2_run_explicit_recovery_pass_persistent()
        bcachefs: Ensure bch_sb_field_ext always exists
        bcachefs: Flush journal immediately after replay if we did early repair
        bcachefs: Resume logged ops after fsck
        bcachefs: Add error messages to logged ops fns
        bcachefs: Split out recovery_passes.c
        bcachefs: fix backpointer for missing alloc key msg
        bcachefs: Fix bch2_btree_increase_depth()
        bcachefs: Kill bch2_bkey_ptr_data_type()
        bcachefs: Fix use after free in check_root_trans()
        bcachefs: Fix repair path for missing indirect extents
        bcachefs: Fix use after free in bch2_check_fix_ptrs()
        bcachefs: Fix btree node keys accounting in topology repair path
        bcachefs: Check btree ptr min_key in .invalid
        bcachefs: add REQ_SYNC and REQ_IDLE in write dio
        ...
  5. 01 Apr, 2024 33 commits