1. 29 Oct, 2020 5 commits
    • Merge tag 'nvme-5.10-2020-10-29' of git://git.infradead.org/nvme into block-5.10 · 24bb45fd
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme updates for 5.10:
      
       - improve zone revalidation (Keith Busch)
       - gracefully handle zero length messages in nvme-rdma (zhenwei pi)
       - nvme-fc error handling fixes (James Smart)
       - nvmet tracing NULL pointer dereference fix (Chaitanya Kulkarni)"
      
      * tag 'nvme-5.10-2020-10-29' of git://git.infradead.org/nvme:
        nvmet: fix a NULL pointer dereference when tracing the flush command
        nvme-fc: remove nvme_fc_terminate_io()
        nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery
        nvme-fc: remove err_work work item
        nvme-fc: track error_recovery while connecting
        nvme-rdma: handle unexpected nvme completion data length
        nvme: ignore zone validate errors on subsequent scans
    • xsysace: use platform_get_resource() and platform_get_irq_optional() · 7cb6e22b
      Andy Shevchenko authored
      Use platform_get_resource() to fetch the memory resource and
      platform_get_irq_optional() to get optional IRQ instead of
      open-coded variants.
      
      IRQ is not supposed to be changed at runtime, so there is
      no functional change in ace_fsm_yieldirq().
      
      On the other hand, we now take the first resources instead of the last ones
      to proceed. I can't imagine how broken firmware would have to be to have
      garbage in the first resource slots. But if that is the case, it needs
      to be documented.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
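The "first resource instead of last" behavior change described in the xsysace commit can be illustrated with a small userspace sketch. Everything here is hypothetical: `struct res_slot`, `find_last()` and `find_nth()` are stand-ins written for illustration and are not the kernel's `platform_get_resource()` implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel resource types; not the real API. */
enum res_type { RES_MEM, RES_IRQ };

struct res_slot {
    enum res_type type;
    unsigned long start;
};

/* Open-coded style: scan every slot, so the LAST match of the type wins. */
const struct res_slot *find_last(const struct res_slot *res, size_t n,
                                 enum res_type type)
{
    const struct res_slot *found = NULL;

    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (res[i].type == type)
            found = &res[i];
    return found;
}

/* Helper style: return the index-th match, so index 0 is the FIRST one. */
const struct res_slot *find_nth(const struct res_slot *res, size_t n,
                                enum res_type type, size_t index)
{
    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (res[i].type == type && index-- == 0)
            return &res[i];
    return NULL; /* no such resource, e.g. an optional IRQ that is absent */
}
```

With two memory slots in the table, `find_last()` and `find_nth(..., 0)` return different entries, which is the difference the commit message asks to have documented if firmware ever relies on it.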
    • null_blk: Fix locking in zoned mode · aa1c09cb
      Damien Le Moal authored
      When the zoned mode is enabled in null_blk, serializing read, write
      and zone management operations for each zone is necessary to protect
      device level information for managing zone resources (zone open and
      closed counters) as well as each zone condition and write pointer
      position. Commit 35bc10b2 ("null_blk: synchronization fix for
      zoned device") introduced a spinlock to implement this serialization.
      However, when memory backing is also enabled, GFP_NOIO memory
      allocations are executed under the spinlock, resulting in might_sleep()
      warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
      spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
      the nullb->lock spinlock. This nested use of irq locks wrecks the irq
      enabled/disabled state.
      
      Fix all this by introducing a bitmap for per-zone lock, with locking
      implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
      This locking mechanism allows keeping a zone locked while executing
      null_process_cmd(), serializing all operations to the zone while still
      allowing sleeping during memory backing allocations with GFP_NOIO.
      Device level zone resource management information is protected using
      a spinlock which is not held while executing null_process_cmd().
      
      Fixes: 35bc10b2 ("null_blk: synchronization fix for zoned device")
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
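The per-zone bitmap lock described in this commit can be sketched in userspace with C11 atomics. This is only an approximation under stated assumptions: the kernel code sleeps in `wait_on_bit_lock_io()` and wakes waiters with `clear_and_wake_up_bit()`, whereas the hypothetical `zone_trylock()` and `zone_unlock()` below only set and clear a bit without any waiting. `NR_ZONES` is an arbitrary illustrative value.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_ZONES      128u   /* illustrative zone count, not from the driver */
#define BITS_PER_WORD ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define NR_WORDS      ((NR_ZONES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One lock bit per zone; a set bit means the zone is currently locked. */
atomic_ulong zone_locks[NR_WORDS];

/* Try to take the lock bit of zone zno; returns true if we now own it. */
bool zone_trylock(unsigned int zno)
{
    unsigned long mask = 1UL << (zno % BITS_PER_WORD);
    unsigned long old;

    assert(zno < NR_ZONES);
    old = atomic_fetch_or_explicit(&zone_locks[zno / BITS_PER_WORD], mask,
                                   memory_order_acquire);
    return (old & mask) == 0;
}

/* Release the lock bit of zone zno (the kernel would also wake waiters). */
void zone_unlock(unsigned int zno)
{
    unsigned long mask = 1UL << (zno % BITS_PER_WORD);

    assert(zno < NR_ZONES);
    atomic_fetch_and_explicit(&zone_locks[zno / BITS_PER_WORD], ~mask,
                              memory_order_release);
}
```

Because each zone has its own bit, holding one zone locked does not block operations on other zones, and nothing here requires interrupts to be disabled while the bit is held.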
    • null_blk: Fix zone reset all tracing · f9c91042
      Damien Le Moal authored
      In the case of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
      ignored and the operation is applied to all sequential zones. For these
      commands, tracing the effect of the command using the command sector to
      determine the target zone is thus incorrect.
      
      Fix null_zone_mgmt() zone condition tracing in the case of
      REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
      not already empty.
      
      Fixes: 766c3297 ("null_blk: add trace in null_blk_zoned.c")
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
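The tracing rule stated in this commit (emit one event per sequential zone that is not already empty, instead of one event derived from the command sector) can be sketched as follows. The types and `zone_reset_all()` are hypothetical simplifications, and `printf()` stands in for the real tracepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ };
enum zone_cond { ZONE_EMPTY, ZONE_OPEN, ZONE_FULL };

/* Hypothetical, simplified zone descriptor. */
struct zone_sketch {
    enum zone_type type;
    enum zone_cond cond;
    unsigned long long start; /* first sector of the zone */
    unsigned long long wp;    /* write pointer */
};

/*
 * Reset every sequential zone that is not already empty and emit one
 * trace line per zone actually reset. Returns the number of traced zones.
 */
size_t zone_reset_all(struct zone_sketch *zones, size_t nr_zones)
{
    size_t traced = 0;

    assert(zones != NULL || nr_zones == 0);
    for (size_t i = 0; i < nr_zones; i++) {
        struct zone_sketch *z = &zones[i];

        if (z->type != ZONE_SEQ || z->cond == ZONE_EMPTY)
            continue; /* nothing changes, so nothing to trace */

        z->cond = ZONE_EMPTY;
        z->wp = z->start;
        printf("zone %zu: condition -> EMPTY\n", i); /* stand-in tracepoint */
        traced++;
    }
    return traced;
}
```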
    • nbd: don't update block size after device is started · b40813dd
      Ming Lei authored
      A mounted NBD device can be resized; one use case is rbd-nbd.
      
      Fix the issue by setting up a default block size and no longer touching
      it in nbd_size_update(). This usage is aligned with loop, which has the
      same use case.
      
      Cc: stable@vger.kernel.org
      Fixes: c8a83a6b ("nbd: Use set_blocksize() to set device blocksize")
      Reported-by: lining <lining2020x@163.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jan Kara <jack@suse.cz>
      Tested-by: lining <lining2020x@163.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 28 Oct, 2020 2 commits
  3. 27 Oct, 2020 7 commits
    • nvmet: fix a NULL pointer dereference when tracing the flush command · 3c3751f2
      Chaitanya Kulkarni authored
      When target side trace is turned on and a flush command is issued from the
      host, it results in the following Oops.
      
      [  856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068
      [  856.790686] #PF: supervisor read access in kernel mode
      [  856.791262] #PF: error_code(0x0000) - not-present page
      [  856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0
      [  856.792527] Oops: 0000 [#1] SMP NOPTI
      [  856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G           OE     5.9.0nvme-5.9+ #71
      [  856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
      [  856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet]
      [  856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0
      [  856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246
      [  856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000
      [  856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034
      [  856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe
      [  856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034
      [  856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440
      [  856.802667] FS:  00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
      [  856.803635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0
      [  856.805283] Call Trace:
      [  856.805613]  nvmet_req_init+0x27c/0x480 [nvmet]
      [  856.806200]  nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop]
      [  856.806862]  blk_mq_dispatch_rq_list+0x123/0x7b0
      [  856.807459]  ? kvm_sched_clock_read+0x14/0x30
      [  856.808025]  __blk_mq_sched_dispatch_requests+0xc7/0x170
      [  856.808708]  blk_mq_sched_dispatch_requests+0x30/0x60
      [  856.809372]  __blk_mq_run_hw_queue+0x70/0x100
      [  856.809935]  __blk_mq_delay_run_hw_queue+0x156/0x170
      [  856.810574]  blk_mq_run_hw_queue+0x86/0xe0
      [  856.811104]  blk_mq_sched_insert_request+0xef/0x160
      [  856.811733]  blk_execute_rq+0x69/0xc0
      [  856.812212]  ? blk_mq_rq_ctx_init+0xd0/0x230
      [  856.812784]  nvme_execute_passthru_rq+0x57/0x130 [nvme_core]
      [  856.813461]  nvme_submit_user_cmd+0xeb/0x300 [nvme_core]
      [  856.814099]  nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core]
      [  856.814752]  blkdev_ioctl+0x1dc/0x2c0
      [  856.815197]  block_ioctl+0x3f/0x50
      [  856.815606]  __x64_sys_ioctl+0x84/0xc0
      [  856.816074]  do_syscall_64+0x33/0x40
      [  856.816533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  856.817168] RIP: 0033:0x7f6a222ed107
      [  856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8
      [  856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [  856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107
      [  856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003
      [  856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005
      [  856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3
      [  856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900
      [  856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel
      
      Move the nvmet_req_init() tracepoint after we parse the command in
      nvmet_req_init() so that we can get rid of the duplicate
      nvmet_find_namespace() call.
      Rename __assign_disk_name() -> __assign_req_name(). Now that we call the
      tracepoint after parsing the command, simplify the newly added
      __assign_req_name(), which fixes this bug.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
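A name-assignment helper that tolerates a request without a namespace might look like the sketch below. This is an assumption-laden illustration, not the nvmet code: it assumes the Oops came from dereferencing a namespace pointer that was not set, and `struct ns_sketch`, `struct req_sketch`, `NAME_LEN` and `assign_req_name()` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 32 /* illustrative buffer size for the traced name */

/* Hypothetical, simplified namespace and request types. */
struct ns_sketch {
    const char *device_path;
};

struct req_sketch {
    const struct ns_sketch *ns; /* NULL when the command has no namespace */
};

/*
 * Fill the trace name from the request's namespace. When no namespace is
 * attached, leave the name empty instead of dereferencing a NULL pointer.
 */
void assign_req_name(char name[NAME_LEN], const struct req_sketch *req)
{
    assert(name != NULL && req != NULL);

    memset(name, 0, NAME_LEN);
    if (req->ns == NULL)
        return;

    /* Copy at most NAME_LEN - 1 bytes so the result stays terminated. */
    strncpy(name, req->ns->device_path, NAME_LEN - 1);
}
```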
    • nvme-fc: remove nvme_fc_terminate_io() · ac9b820e
      James Smart authored
      __nvme_fc_terminate_io() is now called from only one place, reset_work.
      Consolidate and move the functionality of terminate_io into reset_work.
      
      In reset_work, rather than calling create_association directly,
      schedule the connect work element to do its thing. After scheduling,
      flush the connect work element to preserve the semantic of not
      returning until connect has been attempted at least once.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery · 95ced8a2
      James Smart authored
      nvme_fc_error_recovery() special-cases handling when in CONNECTING state
      and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
      special-cases CONNECTING state and calls the routine to abort outstanding
      I/Os.
      
      Simplify the sequence by putting the call to abort outstanding I/Os
      directly in nvme_fc_error_recovery.
      
      Move the location of __nvme_fc_abort_outstanding_ios(), and
      nvme_fc_terminate_exchange() which is called by it, to avoid adding
      function prototypes for nvme_fc_error_recovery().
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: remove err_work work item · 9c2bb257
      James Smart authored
      err_work was created to handle errors (mainly I/O timeouts) while in
      CONNECTING state. The flag for err_work_active is also unneeded.
      
      Remove err_work_active and err_work. The actions to abort I/Os are moved
      inline to nvme_fc_error_recovery().
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: track error_recovery while connecting · caf1cbe3
      James Smart authored
      Whenever there are errors during CONNECTING, the driver recovers by
      aborting all outstanding I/Os and counts on the I/O completions to fail them
      and thus the connection/association they are on. However, the connection
      failure depends on a failure state from the core routines. Not all
      commands that are issued by the core routines are guaranteed to cause a
      failure of the core routine. They may be treated as a failure status and
      the status is then ignored.
      
      As such, whenever the transport enters error_recovery while CONNECTING,
      it will set a new flag indicating an association failed. The
      create_association routine which creates and initializes the controller,
      will monitor the state of the flag as well as the core routine error
      status and ensure the association fails if there was an error.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
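The flag-based tracking described in this commit can be sketched as a small state machine. The names below (`struct fc_ctrl_sketch`, `begin_association()`, `error_recovery()`, `finish_association()`) and the choice of `-EIO` are assumptions made for illustration; the driver's actual flag and return values may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING };

/* Hypothetical, simplified controller. */
struct fc_ctrl_sketch {
    enum ctrl_state state;
    atomic_bool assoc_failed; /* set by error recovery while CONNECTING */
};

/* Start a connect attempt with a clean failure flag. */
void begin_association(struct fc_ctrl_sketch *c)
{
    assert(c != NULL);
    c->state = CTRL_CONNECTING;
    atomic_store(&c->assoc_failed, false);
}

/* Transport error path: remember that this connect attempt was disturbed. */
void error_recovery(struct fc_ctrl_sketch *c)
{
    assert(c != NULL);
    if (c->state == CTRL_CONNECTING)
        atomic_store(&c->assoc_failed, true);
}

/*
 * Finish the connect attempt. core_status is the result reported by the
 * core routines (0 on success). The association fails if either the core
 * reported an error or error recovery ran while connecting.
 */
int finish_association(struct fc_ctrl_sketch *c, int core_status)
{
    assert(c != NULL);
    if (core_status != 0)
        return core_status;
    if (atomic_load(&c->assoc_failed))
        return -EIO;
    c->state = CTRL_LIVE;
    return 0;
}
```

The point of the second check is that a core routine may ignore a failed command and still report success, so the flag is the only remaining evidence that the association should not go live.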
    • nvme-rdma: handle unexpected nvme completion data length · 25c1ca6e
      zhenwei pi authored
      Receiving a zero length message leads to the following warnings because
      the CQE is processed twice:
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 0 PID: 0 at lib/refcount.c:28
      
      RIP: 0010:refcount_warn_saturate+0xd9/0xe0
      Call Trace:
       <IRQ>
       nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
       __ib_process_cq+0x76/0x150 [ib_core]
       ...
      
      Sanity check the received data length to avoid this.
      
      Thanks to Chao Leng & Sagi for suggestions.
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
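A length sanity check of the kind this commit describes can be sketched as below. `struct cqe_sketch` is a simplified stand-in for a 16-byte NVMe completion queue entry and `recv_len_ok()` is a hypothetical helper; how the driver reacts to a short message (logging, error recovery) is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a 16-byte NVMe completion queue entry. */
struct cqe_sketch {
    uint64_t result;     /* command-specific result */
    uint16_t sq_head;    /* submission queue head pointer */
    uint16_t sq_id;      /* submission queue identifier */
    uint16_t command_id; /* identifier of the completed command */
    uint16_t status;     /* status field and phase tag */
};

_Static_assert(sizeof(struct cqe_sketch) == 16, "CQE must be 16 bytes");

/*
 * Return true only if the received message is long enough to hold a full
 * completion entry. A zero-length or truncated message must not be parsed
 * as a completion.
 */
bool recv_len_ok(size_t byte_len)
{
    return byte_len >= sizeof(struct cqe_sketch);
}
```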
    • nvme: ignore zone validate errors on subsequent scans · 8685699c
      Keith Busch authored
      Revalidating nvme zoned namespaces requires IO commands, and there are
      controller states that prevent IO. For example, a sanitize in progress
      is required to fail all IO, but we don't want to remove a namespace
      we've previously added just because the controller is in such a state.
      Suppress the error in this case.
      Reported-by: Michael Nguyen <michael.nguyen@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
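The policy in this commit (fail a first scan on a zone validation error, but keep a namespace that was already added) can be sketched as follows. `struct ns_scan_sketch` and `scan_ns()` are hypothetical and do not mirror the driver's actual functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified namespace bookkeeping. */
struct ns_scan_sketch {
    bool added; /* true once the namespace has been successfully added */
};

/*
 * zone_status is the result of zone revalidation (0 on success).
 * On the first scan an error is propagated and the namespace is not added.
 * On later scans the error is suppressed so that a temporary controller
 * state, such as a sanitize in progress, does not remove the namespace.
 */
int scan_ns(struct ns_scan_sketch *ns, int zone_status)
{
    assert(ns != NULL);

    if (zone_status != 0)
        return ns->added ? 0 : zone_status;

    ns->added = true;
    return 0;
}
```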
  4. 26 Oct, 2020 2 commits
  5. 25 Oct, 2020 17 commits
  6. 24 Oct, 2020 7 commits
    • Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block · d7691390
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request from Christoph
           - rdma error handling fixes (Chao Leng)
           - fc error handling and reconnect fixes (James Smart)
           - fix the qid displayed when tracing ioctl command (Keith Busch)
           - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
           - fix MTDT for passthru (Logan Gunthorpe)
           - blacklist Write Same on more devices (Kai-Heng Feng)
           - fix an uninitialized work struct (zhenwei pi)
      
       - lightnvm out-of-bounds fix (Colin)
      
       - SG allocation leak fix (Doug)
      
       - rnbd fixes (Gioh, Guoqing, Jack)
      
       - zone error translation fixes (Keith)
      
       - kerneldoc markup fix (Mauro)
      
       - zram lockdep fix (Peter)
      
       - Kill unused io_context members (Yufen)
      
       - NUMA memory allocation cleanup (Xianting)
      
       - NBD config wakeup fix (Xiubo)
      
      * tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
        block: blk-mq: fix a kernel-doc markup
        nvme-fc: shorten reconnect delay if possible for FC
        nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
        nvme-fc: fix error loop in create_hw_io_queues
        nvme-fc: fix io timeout to abort I/O
        null_blk: use zone status for max active/open
        nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
        nvmet: cleanup nvmet_passthru_map_sg()
        nvmet: limit passthru MTDS by BIO_MAX_PAGES
        nvmet: fix uninitialized work for zero kato
        nvme-pci: disable Write Zeroes on Sandisk Skyhawk
        nvme: use queuedata for nvme_req_qid
        nvme-rdma: fix crash due to incorrect cqe
        nvme-rdma: fix crash when connect rejected
        block: remove unused members for io_context
        blk-mq: remove the calling of local_memory_node()
        zram: Fix __zram_bvec_{read,write}() locking order
        skd_main: remove unused including <linux/version.h>
        sgl_alloc_order: fix memory leak
        lightnvm: fix out-of-bounds write to array devices->info[]
        ...
    • Merge tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block · af004187
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - fsize was missed in previous unification of work flags
      
       - Few fixes cleaning up the flags unification creds cases (Pavel)
      
       - Fix NUMA affinities for completely unplugged/replugged node for io-wq
      
       - Two fallout fixes from the set_fs changes. One local to io_uring, one
         for the splice entry point that io_uring uses.
      
       - Linked timeout fixes (Pavel)
      
       - Removal of ->flush() ->files work-around that we don't need anymore
         with referenced files (Pavel)
      
       - Various cleanups (Pavel)
      
      * tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
        splice: change exported internal do_splice() helper to take kernel offset
        io_uring: make loop_rw_iter() use original user supplied pointers
        io_uring: remove req cancel in ->flush()
        io-wq: re-set NUMA node affinities if CPUs come online
        io_uring: don't reuse linked_timeout
        io_uring: unify fsize with def->work_flags
        io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
        io_uring: do poll's hash_node init in common code
        io_uring: inline io_poll_task_handler()
        io_uring: remove extra ->file check in poll prep
        io_uring: make cached_cq_overflow non atomic_t
        io_uring: inline io_fail_links()
        io_uring: kill ref get/drop in personality init
        io_uring: flags-based creds init in queue
    • Merge tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block · cb6b2897
      Linus Torvalds authored
      Pull libata fixes from Jens Axboe:
       "Two minor libata fixes:
      
         - Fix a DMA boundary mask regression for sata_rcar (Geert)
      
         - kerneldoc markup fix (Mauro)"
      
      * tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
        ata: fix some kernel-doc markups
        ata: sata_rcar: Fix DMA boundary mask
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 0eac1102
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff all over the place (the largest group here is
        Christoph's stat cleanups)"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: remove KSTAT_QUERY_FLAGS
        fs: remove vfs_stat_set_lookup_flags
        fs: move vfs_fstatat out of line
        fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
        fs: remove vfs_statx_fd
        fs: omfs: use kmemdup() rather than kmalloc+memcpy
        [PATCH] reduce boilerplate in fsid handling
        fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
        selftests: mount: add nosymfollow tests
        Add a "nosymfollow" mount option.
    • Merge tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping · 1b307ac8
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
      
       - document the new dma_{alloc,free}_pages() API
      
       - two fixups for the dma-mapping.h split
      
      * tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: document dma_{alloc,free}_pages
        dma-mapping: move more functions to dma-map-ops.h
        ARM/sa1111: add a missing include of dma-map-ops.h
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 9bf8d8bc
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Two fixes for this merge window, and an unrelated bugfix for a host
        hang"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: ioapic: break infinite recursion on lazy EOI
        KVM: vmx: rename pi_init to avoid conflict with paride
        KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
    • Merge tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c51ae124
      Linus Torvalds authored
      Pull x86 SEV-ES fixes from Borislav Petkov:
       "Three fixes to SEV-ES to correct setting up the new early pagetable on
        5-level paging machines, to always map boot_params and the kernel
        cmdline, and disable stack protector for ../compressed/head{32,64}.c.
        (Arvind Sankar)"
      
      * tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/64: Explicitly map boot_params and command line
        x86/head/64: Disable stack protection for head$(BITS).o
        x86/boot/64: Initialize 5-level paging variables earlier