1. 23 Aug, 2021 1 commit
    • Jens Axboe's avatar
      io-wq: remove GFP_ATOMIC allocation off schedule out path · d3e9f732
      Jens Axboe authored
      Daniel reports that the v5.14-rc4-rt4 kernel throws a BUG when running
      stress-ng:
      
      | [   90.202543] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
      | [   90.202549] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2047, name: iou-wrk-2041
      | [   90.202555] CPU: 5 PID: 2047 Comm: iou-wrk-2041 Tainted: G        W         5.14.0-rc4-rt4+ #89
      | [   90.202559] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      | [   90.202561] Call Trace:
      | [   90.202577]  dump_stack_lvl+0x34/0x44
      | [   90.202584]  ___might_sleep.cold+0x87/0x94
      | [   90.202588]  rt_spin_lock+0x19/0x70
      | [   90.202593]  ___slab_alloc+0xcb/0x7d0
      | [   90.202598]  ? newidle_balance.constprop.0+0xf5/0x3b0
      | [   90.202603]  ? dequeue_entity+0xc3/0x290
      | [   90.202605]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202610]  ? pick_next_task_fair+0xb9/0x330
      | [   90.202612]  ? __schedule+0x670/0x1410
      | [   90.202615]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202618]  kmem_cache_alloc_trace+0x79/0x1f0
      | [   90.202621]  io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202625]  io_wq_worker_sleeping+0x37/0x50
      | [   90.202628]  schedule+0x30/0xd0
      | [   90.202630]  schedule_timeout+0x8f/0x1a0
      | [   90.202634]  ? __bpf_trace_tick_stop+0x10/0x10
      | [   90.202637]  io_wqe_worker+0xfd/0x320
      | [   90.202641]  ? finish_task_switch.isra.0+0xd3/0x290
      | [   90.202644]  ? io_worker_handle_work+0x670/0x670
      | [   90.202646]  ? io_worker_handle_work+0x670/0x670
      | [   90.202649]  ret_from_fork+0x22/0x30
      
      which is due to the RT kernel not liking a GFP_ATOMIC allocation inside
      a raw spinlock. Besides that not working on RT, doing any kind of
      allocation from inside schedule() is kind of nasty and should be avoided
      if at all possible.
      
      This particular path happens when an io-wq worker goes to sleep, and we
      need a new worker to handle pending work. We currently allocate a small
      data item to hold the information we need to create a new worker, but we
      can instead include this data in the io_worker struct itself and just
      protect it with a single bit lock. We only really need one per worker
      anyway, as we will have run pending work between to sleep cycles.
      
      https://lore.kernel.org/lkml/20210804082418.fbibprcwtzyt5qax@beryllium.lan/Reported-by: default avatarDaniel Wagner <dwagner@suse.de>
      Tested-by: default avatarDaniel Wagner <dwagner@suse.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d3e9f732
  2. 22 Aug, 2021 2 commits
  3. 21 Aug, 2021 9 commits
  4. 20 Aug, 2021 26 commits
  5. 19 Aug, 2021 2 commits