1. 06 Dec, 2023 4 commits
  2. 02 Dec, 2023 1 commit
  3. 01 Dec, 2023 4 commits
    • David Jeffery's avatar
      md/raid6: use valid sector values to determine if an I/O should wait on the reshape · c467e97f
      David Jeffery authored
      During a reshape or a RAID6 array such as expanding by adding an additional
      disk, I/Os to the region of the array which have not yet been reshaped can
      stall indefinitely. This is from errors in the stripe_ahead_of_reshape
      function causing md to think the I/O is to a region in the actively
      undergoing the reshape.
      
      stripe_ahead_of_reshape fails to account for the q disk having a sector
      value of 0. By not excluding the q disk from the for loop, raid6 will always
      generate a min_sector value of 0, causing a return value which stalls.
      
      The function's max_sector calculation also uses min() when it should use
      max(), causing the max_sector value to always be 0. During a backwards
      rebuild this can cause the opposite problem where it allows I/O to advance
      when it should wait.
      
      Fixing these errors will allow safe I/O to advance in a timely manner and
      delay only I/O which is unsafe due to stripes in the middle of undergoing
      the reshape.
      
      Fixes: 486f6055 ("md/raid5: Check all disks in a stripe_head for reshape progress")
      Cc: stable@vger.kernel.org # v6.0+
      Signed-off-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Tested-by: default avatarLaurence Oberman <loberman@redhat.com>
      Signed-off-by: default avatarSong Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20231128181233.6187-1-djeffery@redhat.com
      c467e97f
    • Jens Axboe's avatar
      Merge tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme into block-6.7 · 8ad3ac92
      Jens Axboe authored
      Pull NVMe fixes from Keith:
      
      "nvme fixes for Linux 6.7
      
       - Invalid namespace identification error handling (Marizio Ewan, Keith)
       - Fabrics keep-alive tuning (Mark)"
      
      * tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme:
        nvme-core: check for too small lba shift
        nvme: check for valid nvme_identify_ns() before using it
        nvme-core: fix a memory leak in nvme_ns_info_from_identify()
        nvme: fine-tune sending of first keep-alive
      8ad3ac92
    • Keith Busch's avatar
      nvme-core: check for too small lba shift · 74fbc88e
      Keith Busch authored
      The block layer doesn't support logical block sizes smaller than 512
      bytes. The nvme spec doesn't support that small either, but the driver
      isn't checking to make sure the device responded with usable data.
      Failing to catch this will result in a kernel bug, either from a
      division by zero when stacking, or a zero length bio.
      Reviewed-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      74fbc88e
    • Ming Lei's avatar
      blk-mq: don't count completed flush data request as inflight in case of quiesce · 0e4237ae
      Ming Lei authored
      Request queue quiesce may interrupt flush sequence, and the original request
      may have been marked as COMPLETE, but can't get finished because of
      queue quiesce.
      
      This way is fine from driver viewpoint, because flush sequence is block
      layer concept, and it isn't related with driver.
      
      However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count &
      drain inflight requests, then the wait & drain never gets done because
      the completed & not-finished flush request is counted as inflight.
      
      Fix this issue by not counting completed flush data request as inflight in
      case of quiesce.
      
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: John Pittman <jpittman@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0e4237ae
  4. 29 Nov, 2023 1 commit
  5. 28 Nov, 2023 2 commits
  6. 27 Nov, 2023 3 commits
    • Ewan D. Milne's avatar
      nvme: check for valid nvme_identify_ns() before using it · d8b90d60
      Ewan D. Milne authored
      When scanning namespaces, it is possible to get valid data from the first
      call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second
      call in nvme_update_ns_info_block().  In particular, if the NSID becomes
      inactive between the two commands, a storage device may return a buffer
      filled with zero as per 4.1.5.1.  In this case, we can get a kernel crash
      due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will
      be set to zero.
      
      PID: 326      TASK: ffff95fec3cd8000  CPU: 29   COMMAND: "kworker/u98:10"
       #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7
       #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa
       #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788
       #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb
       #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce
       #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595
       #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6
       #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926
          [exception RIP: blk_stack_limits+434]
          RIP: ffffffff92191872  RSP: ffffad8f8702fc80  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ffff95efa0c91800  RCX: 0000000000000001
          RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000000000001
          RBP: 00000000ffffffff   R8: ffff95fec7df35a8   R9: 0000000000000000
          R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
          R13: 0000000000000000  R14: 0000000000000000  R15: ffff95fed33c09a8
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core]
       #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core]
      
      This happened when the check for valid data was moved out of nvme_identify_ns()
      into one of the callers.  Fix this by checking in both callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186
      Fixes: 0dd6fff2 ("nvme: bring back auto-removal of deleted namespaces during sequential scan")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      d8b90d60
    • Maurizio Lombardi's avatar
      nvme-core: fix a memory leak in nvme_ns_info_from_identify() · e3139cef
      Maurizio Lombardi authored
      In case of error, free the nvme_id_ns structure that was allocated
      by nvme_identify_ns().
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarKanchan Joshi <joshi.k@samsung.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      e3139cef
    • Mark O'Donovan's avatar
      nvme: fine-tune sending of first keep-alive · 136cfcb8
      Mark O'Donovan authored
      Keep-alive commands are sent half-way through the kato period.
      This normally works well but fails when the keep-alive system is
      started when we are more than half way through the kato.
      This can happen on larger setups or due to host delays.
      With this change we now time the initial keep-alive command from
      the controller initialisation time, rather than the keep-alive
      mechanism activation time.
      Signed-off-by: default avatarMark O'Donovan <shiftee@posteo.net>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      136cfcb8
  7. 24 Nov, 2023 1 commit
    • Markus Weippert's avatar
      bcache: revert replacing IS_ERR_OR_NULL with IS_ERR · bb6cc253
      Markus Weippert authored
      Commit 028ddcac ("bcache: Remove unnecessary NULL point check in
      node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a
      NULL pointer dereference.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000080
      Call Trace:
       ? __die_body.cold+0x1a/0x1f
       ? page_fault_oops+0xd2/0x2b0
       ? exc_page_fault+0x70/0x170
       ? asm_exc_page_fault+0x22/0x30
       ? btree_node_free+0xf/0x160 [bcache]
       ? up_write+0x32/0x60
       btree_gc_coalesce+0x2aa/0x890 [bcache]
       ? bch_extent_bad+0x70/0x170 [bcache]
       btree_gc_recurse+0x130/0x390 [bcache]
       ? btree_gc_mark_node+0x72/0x230 [bcache]
       bch_btree_gc+0x5da/0x600 [bcache]
       ? cpuusage_read+0x10/0x10
       ? bch_btree_gc+0x600/0x600 [bcache]
       bch_gc_thread+0x135/0x180 [bcache]
      
      The relevant code starts with:
      
          new_nodes[0] = NULL;
      
          for (i = 0; i < nodes; i++) {
              if (__bch_keylist_realloc(&keylist, bkey_u64s(&r[i].b->key)))
                  goto out_nocoalesce;
          // ...
      out_nocoalesce:
          // ...
          for (i = 0; i < nodes; i++)
              if (!IS_ERR(new_nodes[i])) {  // IS_ERR_OR_NULL before
      028ddcac
                  btree_node_free(new_nodes[i]);  // new_nodes[0] is NULL
                  rw_unlock(true, new_nodes[i]);
              }
      
      This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.
      
      Fixes: 028ddcac ("bcache: Remove unnecessary NULL point check in node allocations")
      Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/
      Cc: stable@vger.kernel.org
      Cc: Zheng Wang <zyytlz.wz@163.com>
      Cc: Coly Li <colyli@suse.de>
      Signed-off-by: default avatarMarkus Weippert <markus@gekmihesg.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bb6cc253
  8. 23 Nov, 2023 3 commits
    • Arnd Bergmann's avatar
      nvme: tcp: fix compile-time checks for TLS mode · 0e6c4fe7
      Arnd Bergmann authored
      When CONFIG_NVME_KEYRING is enabled as a loadable module, but the TCP
      host code is built-in, it fails to link:
      
      arm-linux-gnueabi-ld: drivers/nvme/host/tcp.o: in function `nvme_tcp_setup_ctrl':
      tcp.c:(.text+0x1940): undefined reference to `nvme_tls_psk_default'
      
      The problem is that the compile-time conditionals are inconsistent here,
      using a mix of #ifdef CONFIG_NVME_TCP_TLS, IS_ENABLED(CONFIG_NVME_TCP_TLS)
      and IS_ENABLED(CONFIG_NVME_KEYRING) checks, with CONFIG_NVME_KEYRING
      controlling whether the implementation is actually built.
      
      Change it to use IS_ENABLED(CONFIG_NVME_KEYRING) checks consistently,
      which should help readability and make it less error-prone. Combining
      it with the check for the ctrl->opts->tls flag lets the compiler drop
      all the TLS code in configurations without this feature, which also
      helps runtime behavior in addition to avoiding the link failure.
      
      To make it possible for the compiler to build the dead code, both
      the tls_handshake_timeout variable and the TLS specific members
      of nvme_tcp_queue need to be moved out of the #ifdef block as well,
      but at least the former of these gets optimized out again.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20231122224719.4042108-4-arnd@kernel.orgSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0e6c4fe7
    • Arnd Bergmann's avatar
      nvme: target: fix Kconfig select statements · 65e2a74c
      Arnd Bergmann authored
      When the NVME target code is built-in but its TCP frontend is a loadable
      module, enabling keyring support causes a link failure:
      
      x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make':
      configfs.c:(.text+0x100a2110): undefined reference to `nvme_keyring_id'
      
      The problem is that CONFIG_NVME_TARGET_TCP_TLS is a 'bool' symbol that
      depends on the tristate CONFIG_NVME_TARGET_TCP, so any 'select' from
      it inherits the state of the tristate symbol rather than the intended
      CONFIG_NVME_TARGET one that contains the actual call.
      
      The same thing is true for CONFIG_KEYS, which itself is required for
      NVME_KEYRING.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20231122224719.4042108-3-arnd@kernel.orgSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      65e2a74c
    • Arnd Bergmann's avatar
      nvme: target: fix nvme_keyring_id() references · d78abcba
      Arnd Bergmann authored
      In configurations without CONFIG_NVME_TARGET_TCP_TLS, the keyring
      code might not be available, or using it will result in a runtime
      failure:
      
      x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make':
      configfs.c:(.text+0x100a2110): undefined reference to `nvme_keyring_id'
      
      Add a check to ensure we only check the keyring if there is a chance
      of it being used, which avoids both the runtime and link-time
      problems.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20231122224719.4042108-2-arnd@kernel.orgSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d78abcba
  9. 22 Nov, 2023 2 commits
  10. 21 Nov, 2023 1 commit
    • Li Nan's avatar
      nbd: pass nbd_sock to nbd_read_reply() instead of index · 98c598af
      Li Nan authored
      If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be
      krealloc in nbd_add_socket(), and a garbage request is received now, a UAF
      may occurs.
      
        T1
        nbd_ioctl
         __nbd_ioctl
          nbd_add_socket
           blk_mq_freeze_queue
      				T2
        				recv_work
        				 nbd_read_reply
        				  sock_xmit
           krealloc config->socks
      				   def config->socks
      
      Pass nbd_sock to nbd_read_reply(). And introduce a new function
      sock_xmit_recv(), which differs from sock_xmit only in the way it get
      socket.
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_xmit+0x525/0x550
      Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779
      
      Workqueue: knbd4-recv recv_work
      Call Trace:
       __dump_stack
       dump_stack+0xbe/0xfd
       print_address_description.constprop.0+0x19/0x170
       __kasan_report.cold+0x6c/0x84
       kasan_report+0x3a/0x50
       sock_xmit+0x525/0x550
       nbd_read_reply+0xfe/0x2c0
       recv_work+0x1c2/0x750
       process_one_work+0x6b6/0xf10
       worker_thread+0xdd/0xd80
       kthread+0x30a/0x410
       ret_from_fork+0x22/0x30
      
      Allocated by task 18784:
       kasan_save_stack+0x1b/0x40
       kasan_set_track
       set_alloc_info
       __kasan_kmalloc
       __kasan_kmalloc.constprop.0+0xf0/0x130
       slab_post_alloc_hook
       slab_alloc_node
       slab_alloc
       __kmalloc_track_caller+0x157/0x550
       __do_krealloc
       krealloc+0x37/0xb0
       nbd_add_socket
       +0x2d3/0x880
       __nbd_ioctl
       nbd_ioctl+0x584/0x8e0
       __blkdev_driver_ioctl
       blkdev_ioctl+0x2a0/0x6e0
       block_ioctl+0xee/0x130
       vfs_ioctl
       __do_sys_ioctl
       __se_sys_ioctl+0x138/0x190
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      Freed by task 18784:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x40
       __kasan_slab_free.part.0+0x13f/0x1b0
       slab_free_hook
       slab_free_freelist_hook
       slab_free
       kfree+0xcb/0x6c0
       krealloc+0x56/0xb0
       nbd_add_socket+0x2d3/0x880
       __nbd_ioctl
       nbd_ioctl+0x584/0x8e0
       __blkdev_driver_ioctl
       blkdev_ioctl+0x2a0/0x6e0
       block_ioctl+0xee/0x130
       vfs_ioctl
       __do_sys_ioctl
       __se_sys_ioctl+0x138/0x190
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x61/0xc6
      Signed-off-by: default avatarLi Nan <linan122@huawei.com>
      Reviewed-by: default avatarYu Kuai <yukuai3@huawei.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      98c598af
  11. 20 Nov, 2023 18 commits