23 Aug, 2021 (2 commits)
    • io_uring: be smarter about waking multiple CQ ring waiters · 5fd46178
      Jens Axboe authored
      Currently we only wake the first waiter, even if we have enough
      entries posted to satisfy multiple waiters. Improve that situation so
      that each waiter knows how far the CQ tail has to advance before it
      can safely be woken up.
      
      With this change, if we have N waiters each asking for 1 event and we
      get 4 completions, then we wake up 4 waiters. If we have N waiters
      each asking for 2 completions and we get 4 completions, then we wake
      up the first two. Previously, only the first waiter would have been
      woken up.
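      
      As a rough userspace sketch of the scheme (the names cq_waiter,
      wake_tail and the helpers below are illustrative, not the kernel's):
      a waiter wanting min_events records wake_tail = tail-at-sleep +
      min_events before sleeping, and the completion side wakes every
      waiter whose target the tail has reached.
      
          #include <stdbool.h>
          #include <stdint.h>
      
          /* Per-waiter state: instead of "how many more events do I need",
           * store the absolute CQ tail value at which this waiter may wake. */
          struct cq_waiter {
                  uint32_t wake_tail;  /* tail at sleep time + events wanted */
                  bool woken;
          };
      
          /* Signed difference keeps the test correct across tail wraparound. */
          static bool cq_should_wake(uint32_t cq_tail, const struct cq_waiter *w)
          {
                  return (int32_t)(cq_tail - w->wake_tail) >= 0;
          }
      
          /* After posting completions, wake every waiter whose target has
           * been met, not just the head of the queue. */
          static void cq_wake_waiters(uint32_t cq_tail, struct cq_waiter *w, int nr)
          {
                  for (int i = 0; i < nr; i++)
                          if (!w[i].woken && cq_should_wake(cq_tail, &w[i]))
                                  w[i].woken = true;  /* wake_up() in real code */
          }
      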
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: remove GFP_ATOMIC allocation off schedule out path · d3e9f732
      Jens Axboe authored
      Daniel reports that the v5.14-rc4-rt4 kernel throws a BUG when running
      stress-ng:
      
      | [   90.202543] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
      | [   90.202549] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2047, name: iou-wrk-2041
      | [   90.202555] CPU: 5 PID: 2047 Comm: iou-wrk-2041 Tainted: G        W         5.14.0-rc4-rt4+ #89
      | [   90.202559] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      | [   90.202561] Call Trace:
      | [   90.202577]  dump_stack_lvl+0x34/0x44
      | [   90.202584]  ___might_sleep.cold+0x87/0x94
      | [   90.202588]  rt_spin_lock+0x19/0x70
      | [   90.202593]  ___slab_alloc+0xcb/0x7d0
      | [   90.202598]  ? newidle_balance.constprop.0+0xf5/0x3b0
      | [   90.202603]  ? dequeue_entity+0xc3/0x290
      | [   90.202605]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202610]  ? pick_next_task_fair+0xb9/0x330
      | [   90.202612]  ? __schedule+0x670/0x1410
      | [   90.202615]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202618]  kmem_cache_alloc_trace+0x79/0x1f0
      | [   90.202621]  io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202625]  io_wq_worker_sleeping+0x37/0x50
      | [   90.202628]  schedule+0x30/0xd0
      | [   90.202630]  schedule_timeout+0x8f/0x1a0
      | [   90.202634]  ? __bpf_trace_tick_stop+0x10/0x10
      | [   90.202637]  io_wqe_worker+0xfd/0x320
      | [   90.202641]  ? finish_task_switch.isra.0+0xd3/0x290
      | [   90.202644]  ? io_worker_handle_work+0x670/0x670
      | [   90.202646]  ? io_worker_handle_work+0x670/0x670
      | [   90.202649]  ret_from_fork+0x22/0x30
      
      which is due to the RT kernel not liking a GFP_ATOMIC allocation
      inside a raw spinlock: on RT, the slab allocator's internal lock
      becomes a sleeping rt_spin_lock (visible in the trace above under
      ___slab_alloc), which must not be taken in that atomic context.
      Besides not working on RT, doing any kind of allocation from inside
      schedule() is nasty and should be avoided if at all possible.
      
      This particular path happens when an io-wq worker goes to sleep, and we
      need a new worker to handle pending work. We currently allocate a small
      data item to hold the information we need to create a new worker, but we
      can instead include this data in the io_worker struct itself and just
      protect it with a single bit lock. We only really need one per worker
      anyway, as we will have run pending work between two sleep cycles.
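      
      A compact sketch of that bit-lock scheme under stated assumptions
      (the struct and helper names are illustrative, and C11 atomic_flag
      stands in for the kernel's bit operations):
      
          #include <stdatomic.h>
          #include <stdbool.h>
      
          /* The "create a new worker" request is embedded in the worker, so
           * the schedule-out path needs no allocation at all. */
          struct worker {
                  atomic_flag create_pending;  /* the single-bit lock */
                  int create_index;            /* data the creator consumes */
          };
      
          /* Called when a worker goes to sleep with work still pending.
           * Returns false if a request is already in flight for this worker;
           * one slot per worker suffices, per the reasoning above. */
          static bool queue_worker_create(struct worker *w, int index)
          {
                  /* take the bit lock first, then fill in the payload */
                  if (atomic_flag_test_and_set(&w->create_pending))
                          return false;
                  w->create_index = index;
                  /* hand the embedded request to the task that forks the new
                   * worker; it releases the bit with atomic_flag_clear() */
                  return true;
          }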
      
      Link: https://lore.kernel.org/lkml/20210804082418.fbibprcwtzyt5qax@beryllium.lan/
      Reported-by: Daniel Wagner <dwagner@suse.de>
      Tested-by: Daniel Wagner <dwagner@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>