1. 22 Oct, 2021 1 commit
  2. 21 Oct, 2021 15 commits
  3. 20 Oct, 2021 16 commits
  4. 19 Oct, 2021 8 commits
    • block, bfq: fix UAF problem in bfqg_stats_init() · 2fc428f6
      Zheng Liang authored
      In bfq_pd_alloc(), bfqg_stats_init() initializes the bfqg stats. If
      blkg_rwstat_init() succeeds for bfqg_stats->bytes but fails for
      bfqg_stats->ios, bfqg_stats_init() returns an error and bfqg is
      freed. However, blkg_rwstat->cpu_cnt is not removed from the list of
      percpu_counters, so a later traversal of that list hits a
      use-after-free (UAF).
      
      Use blkg_rwstat_exit() to clean up bfqg_stats->bytes in this
      scenario.
      
      Fixes: commit fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
      Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
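The unwind pattern described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct counter`, `registry`, `counter_init()`, `counter_exit()`, `stats_init()` and the `fail_on_call` hook are hypothetical stand-ins for the per-CPU counter, the global percpu_counters list, and the blkg_rwstat init/exit pair. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a counter that links itself onto a global list. */
struct counter {
	struct counter *next;
};

static struct counter *registry;  /* stand-in for the global counter list */
static int init_calls;            /* number of counter_init() calls so far */
static int fail_on_call = -1;     /* test hook: index of the call that fails */

static int counter_init(struct counter *c)
{
	assert(c != NULL);
	if (init_calls++ == fail_on_call)
		return -1;            /* simulated allocation failure */
	c->next = registry;           /* register on the global list */
	registry = c;
	return 0;
}

static void counter_exit(struct counter *c)
{
	struct counter **pp;

	/* Unlink c so the list never points at memory about to be freed. */
	for (pp = &registry; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			return;
		}
	}
}

struct stats {
	struct counter bytes;
	struct counter ios;
};

static int stats_init(struct stats *s)
{
	if (counter_init(&s->bytes))
		return -1;
	if (counter_init(&s->ios)) {
		counter_exit(&s->bytes);  /* undo the first init before failing */
		return -1;
	}
	return 0;
}

/* Count the entries currently on the global list. */
static size_t registry_len(void)
{
	size_t n = 0;

	for (const struct counter *c = registry; c; c = c->next)
		n++;
	return n;
}
```

Without the `counter_exit()` call in the failure branch, `registry` would still reference `s->bytes` after the caller frees `s`, which is the dangling-list situation the commit message describes.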
    • block: inline fast path of driver tag allocation · a808a9d5
      Jens Axboe authored
      If we don't use an IO scheduler or have shared tags, then we don't need
      to call into this external function at all. This saves ~2% for such
      a setup.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: don't handle non-flush requests in blk_insert_flush · d92ca9d8
      Christoph Hellwig authored
      Return to the normal blk_mq_submit_bio flow if the bio did not end up
      actually being a flush because the device didn't support it.  Note that
      this is basically impossible to hit without special instrumentation given
      that submit_bio_checks already clears these flags usually, so we'd need a
      tight race to actually hit this code path.
      
      With this the call to blk_mq_run_hw_queue for the flush requests can be
      removed given that the actual flush requests are always issued via the
      requeue workqueue which runs the queue unconditionally.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: attempt direct issue of plug list · dc5fc361
      Jens Axboe authored
      If we have just one queue type in the plug list, then we can extend our
      direct issue to cover a full plug list as well. This allows sending a
      batch of requests for direct issue, which is more efficient than doing
      one-at-a-time kind of issue.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: change plugging to use a singly linked list · bc490f81
      Jens Axboe authored
      Use a singly linked list for the blk_plug. This saves 8 bytes in the
      blk_plug struct, and makes for faster list manipulations than doubly
      linked lists. As we don't use the doubly linked lists for anything,
      singly linked is just fine.
      
      This yields a bump in default (merging enabled) performance from 7.0
      to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
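As a rough illustration of why a singly linked list is cheaper here, the sketch below uses hypothetical `struct req` and `struct plug` types in place of struct request and struct blk_plug. The list head is a single pointer (8 bytes on a 64-bit build) rather than a two-pointer doubly linked head, and adding an entry is two pointer stores. The names and the LIFO pop order are assumptions of this sketch, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

struct req {                /* hypothetical stand-in for struct request */
	int id;
	struct req *next;   /* single forward link, no prev pointer */
};

struct plug {               /* hypothetical stand-in for struct blk_plug */
	struct req *head;   /* one pointer instead of a next/prev pair */
	unsigned int count;
};

/* Push a request onto the front of the plug list. */
static void plug_add(struct plug *p, struct req *r)
{
	assert(p != NULL && r != NULL);
	r->next = p->head;
	p->head = r;
	p->count++;
}

/* Pop the most recently added request, or NULL if the list is empty. */
static struct req *plug_pop(struct plug *p)
{
	struct req *r = p->head;

	if (!r)
		return NULL;
	p->head = r->next;
	r->next = NULL;
	p->count--;
	return r;
}
```

Since nothing in this sketch needs to walk backwards or unlink from the middle, the missing prev pointer costs nothing, which matches the reasoning in the commit message.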
    • blk-wbt: prevent NULL pointer dereference in wb_timer_fn · 480d42dc
      Andrea Righi authored
      The timer callback used to evaluate if the latency is exceeded can be
      executed after the corresponding disk has been released, causing the
      following NULL pointer dereference:
      
      [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
      [ 119.987617] #PF: supervisor read access in kernel mode
      [ 119.987971] #PF: error_code(0x0000) - not-present page
      [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
      [ 119.988697] Oops: 0000 [#1] SMP NOPTI
      [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
      [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
      [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
      [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
      [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
      [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
      [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
      [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
      [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
      [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
      [ 119.995906] Call Trace:
      [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
      [ 119.996505] blk_stat_timer_fn+0x138/0x140
      [ 119.996830] call_timer_fn+0x2b/0x100
      [ 119.997136] __run_timers.part.0+0x1d1/0x240
      [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
      [ 119.997826] ? ktime_get+0x3e/0xa0
      [ 119.998110] ? native_apic_msr_write+0x2c/0x30
      [ 119.998456] ? lapic_next_event+0x20/0x30
      [ 119.998779] ? clockevents_program_event+0x94/0xf0
      [ 119.999150] run_timer_softirq+0x2a/0x50
      [ 119.999465] __do_softirq+0xcb/0x26f
      [ 119.999764] irq_exit_rcu+0x8c/0xb0
      [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
      [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      In this case simply return from the timer callback (no action
      required) to prevent the NULL pointer dereference.
      
      BugLink: https://bugs.launchpad.net/bugs/1947557
      Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
      Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
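The early-return guard described above can be sketched as follows. `struct disk_info`, `struct wb_state` and `timer_cb()` are hypothetical names invented for this example; the sketch only shows the general shape of a timer callback that checks for a released resource before touching it, and does not reproduce the actual wb_timer_fn() change.

```c
#include <assert.h>
#include <stddef.h>

struct disk_info {                /* hypothetical stand-in for the disk */
	unsigned int latency_ns;
};

struct wb_state {                 /* hypothetical callback context */
	struct disk_info *disk;   /* NULL once the disk has been released */
	unsigned int evaluations; /* how many times latency was evaluated */
};

/* Returns 1 if the latency was evaluated, 0 if the callback bailed out. */
static int timer_cb(struct wb_state *st)
{
	assert(st != NULL);

	/* The timer may fire after release: nothing to do in that case. */
	if (!st->disk)
		return 0;

	/* Safe to dereference st->disk from here on. */
	st->evaluations++;
	return 1;
}
```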
    • block: align blkdev_dio inlined bio to a cacheline · 6155631a
      Jens Axboe authored
      We get all sorts of unreliable and funky results since the bio is
      designed to align on a cacheline, which it does not when inlined like
      this.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
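To show what aligning an embedded member to a cacheline means in practice, here is a small C11 sketch. The 64-byte `CACHELINE` value, `struct inner` (standing in for the bio) and both wrapper structs are assumptions made for illustration; the kernel uses its own alignment annotations rather than `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64  /* assumed cacheline size for this sketch */

struct inner {        /* hypothetical stand-in for the embedded bio */
	char payload[40];
};

/* Without an alignment request, the member lands wherever the layout puts it. */
struct dio_unaligned {
	void *owner;
	size_t size;
	int ref;
	struct inner bio;
};

/* With alignas, the member starts on a cacheline boundary within the struct. */
struct dio_aligned {
	void *owner;
	size_t size;
	int ref;
	alignas(CACHELINE) struct inner bio;
};

/* Compile-time check that the embedded member is cacheline aligned. */
static_assert(offsetof(struct dio_aligned, bio) % CACHELINE == 0,
	      "embedded member must start on a cacheline boundary");
```

The aligned variant trades some padding for a predictable layout: the struct itself also becomes cacheline aligned, so objects of that type start on a boundary too, provided the allocator honors the type's alignment.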
    • block: move blk_mq_tag_to_rq() inline · e028f167
      Jens Axboe authored
      This is in the fast path of driver issue or completion, and it's a single
      array index operation. Move it inline to avoid a function call for it.
      
      This does mean making struct blk_mq_tags block layer public, but there's
      not really much in there.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
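The trade-off in this commit (an inline lookup requires the struct definition to be visible to callers) can be illustrated with a minimal sketch. `struct tags` and `tag_to_rq()` are hypothetical stand-ins for struct blk_mq_tags and blk_mq_tag_to_rq(), and the field names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct request {                 /* minimal stand-in for a block request */
	int tag;
};

struct tags {                    /* hypothetical stand-in for struct blk_mq_tags */
	unsigned int nr_tags;    /* number of valid slots in rqs[] */
	struct request **rqs;    /* tag-indexed array of requests */
};

/*
 * A single bounds-checked array index. Being static inline in a header,
 * it needs the full definition of struct tags, which is why the struct
 * can no longer stay private to one source file.
 */
static inline struct request *tag_to_rq(const struct tags *t, unsigned int tag)
{
	assert(t != NULL);
	if (tag < t->nr_tags)
		return t->rqs[tag];
	return NULL;             /* out-of-range tag */
}
```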