1. 11 Nov, 2017 13 commits
    • James Smart's avatar
      nvme-fc: decouple ns references from lldd references · 158bfb88
      James Smart authored
      In the lldd api, a lldd may unregister a remoteport (loss of connectivity
      or driver unload) or localport (driver unload). The lldd must wait for the
      remoteport_delete or localport_delete before completing its actions post
      the unregister.  The xxx_deletes currently occur only when the xxxport
      structure is fully freed after all references are removed. Thus the lldd
      may be held hostage until an app or in-kernel entity that has a namespace
      open finally closes so the namespace can be removed, the controller
      removed, thus the transport objects, thus the lldd.
      
      This patch decouples the transport and os-facing objects from the lldd
      and the remoteport and localport. There is a point in all deletions
      where the transport will no longer interact with the lldd on behalf of
      a controller. That point centers around the association established
      with the target/subsystem. It will access the lldd whenever it attempts
      to create an association and while the association is active. New
      associations may only be created if the remoteport is live (thus the
      localport is live). It will not access the lldd after deleting the
      association.
      
      Therefore, the patch tracks the count of active controllers - those with
      associations being created or that are active - on a remoteport. It also
      tracks the number of remoteports that have active controllers, on a
      a localport. When a remoteport is unregistered, as soon as there are no
      active controllers, the lldd's remoteport_delete may be called and the
      lldd may continue. Similarly, when a localport is unregistered, as soon
      as there are no remoteports with active controllers, the localport_delete
      callback may be made. This significantly speeds up unregistration with
      the lldd.
      
      The transport objects continue in suspended status with reconnect timers
      running, and upon expiration, normal ref-counting will occur and the
      objects will be freed. The transport object may still be held hostage
      by the application/kernel module, but that is acceptable.
      
      With this change, the lldd may be fully unloaded and reloaded, and
      if registrations occur prior to the timeouts, the nvme controller and
      namespaces will resume normally as if a link bounce.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      158bfb88
    • James Smart's avatar
      nvme-fc: fix localport resume using stale values · c5760f30
      James Smart authored
      The localport resume was not updating the lldd ops structure. If the
      lldd is unloaded and reloaded, the ops pointers will differ.
      
      Additionally, as there are device references taken by the localport,
      ensure that resume only resumes if the device matches as well.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c5760f30
    • Keith Busch's avatar
      nvme: check admin passthru command effects · 84fef62d
      Keith Busch authored
      The NVMe standard provides a command effects log page so the host may
      be aware of special requirements it may need to do for a particular
      command. For example, the command may need to run with IO quiesced to
      prevent timeouts or undefined behavior, or it may change the logical block
      formats that determine how the host needs to construct future commands.
      
      This patch saves the nvme command effects log page if the controller
      supports it, and performs appropriate actions before and after an admin
      passthrough command is completed. If the controller does not support the
      command effects log page, the driver will define the effects for known
      opcodes. The nvme format and santize are the only commands in this patch
      with known effects.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      84fef62d
    • Keith Busch's avatar
      nvme: factor get log into a helper · c627c487
      Keith Busch authored
      And fix the warning on a successful firmware log.
      Reviewed-by: default avatarJavier González <javier@cnexlabs.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c627c487
    • Christoph Hellwig's avatar
      nvme: fix and clarify the check for missing metadata · 715ea9e0
      Christoph Hellwig authored
      Update the check in nvme_setup_rw for missing metadata so that it is
      together with the other metadata handling, does not contain impossible
      to reach conditions and warns if we get an impossible requests for
      a (non-PI) metadata-enabled namespace when CONFIG_BLK_DEV_INTEGRITY
      is not set.
      
      Also add a little helper that checks if a given metadata configuration
      contains protection information
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarJavier González <jg@lightnvm.io>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      715ea9e0
    • Christoph Hellwig's avatar
      nvme: split __nvme_revalidate_disk · 24b0b58c
      Christoph Hellwig authored
      Split out the code that applies the calculate value to a given disk/queue
      into new helper that can be reused by the multipath code.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      24b0b58c
    • Christoph Hellwig's avatar
      nvme: set the chunk size before freezing the queue · 6e78f21a
      Christoph Hellwig authored
      We don't need a frozen queue to update the chunk_size, which just is a
      hint, and moving it a little earlier will allow for some better code
      reuse with the multipath code.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6e78f21a
    • Christoph Hellwig's avatar
      nvme: don't pass struct nvme_ns to nvme_config_discard · 30e5e929
      Christoph Hellwig authored
      To allow reusing this function for the multipath node.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      30e5e929
    • Christoph Hellwig's avatar
      nvme: don't pass struct nvme_ns to nvme_init_integrity · 39b7baa4
      Christoph Hellwig authored
      To allow reusing this function for the multipath node.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      39b7baa4
    • Christoph Hellwig's avatar
      nvme: always unregister the integrity profile in __nvme_revalidate_disk · b5be3b39
      Christoph Hellwig authored
      This is safe because the queue is always frozen when we revalidate, and
      it simplifies both the existing code as well as the multipath
      implementation.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b5be3b39
    • Christoph Hellwig's avatar
      nvme: move the dying queue check from cancel to completion · e54b064c
      Christoph Hellwig authored
      With multipath we don't want a hard DNR bit on a request that is cancelled
      by a controller reset, but instead want to be able to retry it on another
      patch.  To archive this don't always set the DNR bit when the queue is
      dying in nvme_cancel_request, but defer that decision to
      nvme_req_needs_retry.  Note that it applies to any command there and not
      just cancelled commands, but one the queue is dying that is the right
      thing to do anyway.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e54b064c
    • Jens Axboe's avatar
      blktrace: fix unlocked registration of tracepoints · a6da0024
      Jens Axboe authored
      We need to ensure that tracepoints are registered and unregistered
      with the users of them. The existing atomic count isn't enough for
      that. Add a lock around the tracepoints, so we serialize access
      to them.
      
      This fixes cases where we have multiple users setting up and
      tearing down tracepoints, like this:
      
      CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted
      4.14.0-rc5-next-20171018+ #36
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:16 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:52
        panic+0x1e4/0x41c kernel/panic.c:183
        __warn+0x1c4/0x1e0 kernel/panic.c:546
        report_bug+0x211/0x2d0 lib/bug.c:183
        fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
        do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
        do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
        do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
        do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
        invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
      RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
      RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293
      RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07
      RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818
      RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004
      R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0
      R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0
        tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304
        register_trace_block_rq_insert include/trace/events/block.h:191 [inline]
        blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043
        do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542
        blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564
        sg_ioctl+0xc71/0x2d90 drivers/scsi/sg.c:1089
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x444339
      RSP: 002b:00007ffe05bb5b18 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000006d66c0 RCX: 0000000000444339
      RDX: 000000002084cf90 RSI: 00000000c0481273 RDI: 0000000000000009
      RBP: 0000000000000082 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
      R13: 00000000c0481273 R14: 0000000000000000 R15: 0000000000000000
      
      since we can now run these in parallel. Ensure that the exported helpers
      for doing this are grabbing the queue trace mutex.
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a6da0024
    • Jens Axboe's avatar
      blktrace: fix unlocked access to init/start-stop/teardown · 1f2cac10
      Jens Axboe authored
      sg.c calls into the blktrace functions without holding the proper queue
      mutex for doing setup, start/stop, or teardown.
      
      Add internal unlocked variants, and export the ones that do the proper
      locking.
      
      Fixes: 6da127ad ("blktrace: Add blktrace ioctls to SCSI generic devices")
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1f2cac10
  2. 07 Nov, 2017 2 commits
  3. 06 Nov, 2017 2 commits
  4. 04 Nov, 2017 11 commits
  5. 03 Nov, 2017 12 commits