1. 11 Nov, 2017 5 commits
    • nvme: don't pass struct nvme_ns to nvme_init_integrity · 39b7baa4
      Christoph Hellwig authored
      To allow reusing this function for the multipath node.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: always unregister the integrity profile in __nvme_revalidate_disk · b5be3b39
      Christoph Hellwig authored
      This is safe because the queue is always frozen when we revalidate, and
      it simplifies both the existing code as well as the multipath
      implementation.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: move the dying queue check from cancel to completion · e54b064c
      Christoph Hellwig authored
      With multipath we don't want a hard DNR bit on a request that is cancelled
      by a controller reset, but instead want to be able to retry it on another
      path.  To achieve this don't always set the DNR bit when the queue is
      dying in nvme_cancel_request, but defer that decision to
      nvme_req_needs_retry.  Note that it applies to any command there and not
      just cancelled commands, but once the queue is dying that is the right
      thing to do anyway.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
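The split described in this commit message can be pictured roughly as follows. This is a user-space sketch, not the driver code: the `sketch_` names, the struct layout, and the numeric status values are assumptions made for illustration. The cancel path only marks the request as aborted, and the completion-side helper is the one place that refuses a retry once the queue is dying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_SC_ABORT_REQ 0x0007u /* "command abort requested" */
#define SKETCH_SC_DNR       0x4000u /* "do not retry" bit */

struct sketch_req {
    uint16_t status;      /* completion status, possibly with DNR set */
    unsigned retries;     /* retries already attempted */
    bool     queue_dying; /* true once the request queue is being torn down */
};

/* Cancel path: mark the request aborted, but leave DNR clear so the
 * request remains eligible for a retry, for example on another path. */
void sketch_cancel_request(struct sketch_req *req)
{
    assert(req != NULL);
    req->status = SKETCH_SC_ABORT_REQ;
}

/* Completion path: the single place that decides whether to retry.
 * The dying-queue check lives here, so it covers every command and
 * not only cancelled ones. */
bool sketch_req_needs_retry(const struct sketch_req *req, unsigned max_retries)
{
    assert(req != NULL);
    if (req->queue_dying)
        return false;
    if (req->status & SKETCH_SC_DNR)
        return false;
    return req->retries < max_retries;
}
```

With this shape, a request cancelled by a controller reset stays retryable as long as its queue is alive, while any request completing on a dying queue is failed.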
    • blktrace: fix unlocked registration of tracepoints · a6da0024
      Jens Axboe authored
      We need to ensure that tracepoints are registered and unregistered
      with the users of them. The existing atomic count isn't enough for
      that. Add a lock around the tracepoints, so we serialize access
      to them.
      
      This fixes cases where we have multiple users setting up and
      tearing down tracepoints, like this:
      
      CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted
      4.14.0-rc5-next-20171018+ #36
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:16 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:52
        panic+0x1e4/0x41c kernel/panic.c:183
        __warn+0x1c4/0x1e0 kernel/panic.c:546
        report_bug+0x211/0x2d0 lib/bug.c:183
        fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
        do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
        do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
        do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
        do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
        invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
      RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
      RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293
      RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07
      RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818
      RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004
      R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0
      R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0
        tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304
        register_trace_block_rq_insert include/trace/events/block.h:191 [inline]
        blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043
        do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542
        blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564
        sg_ioctl+0xc71/0x2d90 drivers/scsi/sg.c:1089
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x444339
      RSP: 002b:00007ffe05bb5b18 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000006d66c0 RCX: 0000000000444339
      RDX: 000000002084cf90 RSI: 00000000c0481273 RDI: 0000000000000009
      RBP: 0000000000000082 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
      R13: 00000000c0481273 R14: 0000000000000000 R15: 0000000000000000
      
      since we can now run these in parallel. Ensure that the exported helpers
      for doing this are grabbing the queue trace mutex.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
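One way to picture the locking this commit message calls for is a user count that is only ever changed under a mutex, so that the first user registers and the last user unregisters with nothing running in between. The sketch below is a user-space analogy using pthreads. All names are hypothetical, and a flag stands in for the real tracepoint registration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sketch_probe_mutex = PTHREAD_MUTEX_INITIALIZER;
static int sketch_probe_users;       /* active tracers, guarded by the mutex */
static int sketch_probes_registered; /* 1 while the pretend tracepoints are hooked */

/* First user registers. The count and the registration change together
 * under the lock, so a second user cannot observe "count > 0" while the
 * registration is still in progress. */
void sketch_get_probe_ref(void)
{
    pthread_mutex_lock(&sketch_probe_mutex);
    if (++sketch_probe_users == 1)
        sketch_probes_registered = 1;
    pthread_mutex_unlock(&sketch_probe_mutex);
}

/* Last user unregisters, again atomically with the count update. */
void sketch_put_probe_ref(void)
{
    pthread_mutex_lock(&sketch_probe_mutex);
    assert(sketch_probe_users > 0);
    if (--sketch_probe_users == 0)
        sketch_probes_registered = 0;
    pthread_mutex_unlock(&sketch_probe_mutex);
}

/* Accessor used to inspect the state from outside. */
int sketch_probes_are_registered(void)
{
    pthread_mutex_lock(&sketch_probe_mutex);
    int v = sketch_probes_registered;
    pthread_mutex_unlock(&sketch_probe_mutex);
    return v;
}
```

An atomic counter alone makes the increment safe but not the register or unregister step that follows it, which is the gap the message describes.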
    • blktrace: fix unlocked access to init/start-stop/teardown · 1f2cac10
      Jens Axboe authored
      sg.c calls into the blktrace functions without holding the proper queue
      mutex for doing setup, start/stop, or teardown.
      
      Add internal unlocked variants, and export the ones that do the proper
      locking.
      
      Fixes: 6da127ad ("blktrace: Add blktrace ioctls to SCSI generic devices")
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
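The "internal unlocked variant plus exported locking wrapper" pattern named in this commit message might look like the following user-space sketch. The struct, the function names, and the return codes are invented for illustration, and a pthread mutex stands in for the queue mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue {
    pthread_mutex_t trace_mutex; /* stands in for the queue's trace mutex */
    bool trace_set_up;
    bool trace_running;
};

void sketch_queue_init(struct sketch_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->trace_mutex, NULL);
    q->trace_set_up = false;
    q->trace_running = false;
}

/* Internal variants: the caller must already hold q->trace_mutex. */
int sketch_trace_setup_unlocked(struct sketch_queue *q)
{
    if (q->trace_set_up)
        return -1; /* already set up */
    q->trace_set_up = true;
    return 0;
}

int sketch_trace_startstop_unlocked(struct sketch_queue *q, bool start)
{
    if (!q->trace_set_up)
        return -1; /* nothing to start or stop */
    q->trace_running = start;
    return 0;
}

/* Exported variants: take the lock, then delegate. External callers only
 * ever see these, so they cannot forget the locking. */
int sketch_trace_setup(struct sketch_queue *q)
{
    pthread_mutex_lock(&q->trace_mutex);
    int ret = sketch_trace_setup_unlocked(q);
    pthread_mutex_unlock(&q->trace_mutex);
    return ret;
}

int sketch_trace_startstop(struct sketch_queue *q, bool start)
{
    pthread_mutex_lock(&q->trace_mutex);
    int ret = sketch_trace_startstop_unlocked(q, start);
    pthread_mutex_unlock(&q->trace_mutex);
    return ret;
}
```

Code that already holds the mutex calls the unlocked variants directly, which avoids taking the same lock twice.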
  2. 07 Nov, 2017 2 commits
  3. 06 Nov, 2017 2 commits
  4. 04 Nov, 2017 11 commits
  5. 03 Nov, 2017 12 commits
  6. 02 Nov, 2017 2 commits
    • skd: use ktime_get_real_seconds() · 474f5da2
      Arnd Bergmann authored
      Like many storage drivers, skd uses an unsigned 32-bit number for
      interchanging the current time with the firmware. This will overflow in
      y2106 and is otherwise safe.
      
      However, the get_seconds() function is generally considered deprecated
      since the behavior is different between 32-bit and 64-bit architectures,
      and using it may indicate a bigger problem.
      
      To annotate that we've thought about this, let's add a comment here
      and migrate to the ktime_get_real_seconds() function that consistently
      returns a 64-bit number.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
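The y2106 figure in this commit message follows from simple arithmetic on an unsigned 32-bit seconds counter that starts at the 1970 epoch. The sketch below shows the truncation and the wrap year. The function names are hypothetical, and the average year length of 365.2425 days is an approximation chosen for the estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate a 64-bit seconds-since-epoch value to the unsigned 32-bit field
 * exchanged with the firmware. The conversion wraps modulo 2^32. */
uint32_t sketch_fw_timestamp(int64_t real_seconds)
{
    assert(real_seconds >= 0);
    return (uint32_t)real_seconds;
}

/* Approximate calendar year in which an unsigned 32-bit counter of
 * seconds since 1970 wraps around. */
int sketch_u32_wrap_year(void)
{
    const double secs_per_year = 365.2425 * 86400.0; /* 31556952 s */
    return 1970 + (int)((double)UINT32_MAX / secs_per_year);
}
```

Since 4294967295 / 31556952 is roughly 136.1, the counter wraps 136 years after 1970, which is 2106.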
    • block: fix CDROM dependency on BLK_DEV · c091fbe9
      Arnd Bergmann authored
      After the cdrom cleanup, I get randconfig warnings for some configurations:
      
      warning: (BLK_DEV_IDECD && BLK_DEV_SR) selects CDROM which has unmet direct dependencies (BLK_DEV)
      
      This adds an explicit BLK_DEV dependency for both drivers. The other
      drivers that select 'CDROM' already have this and don't need a change.
      
      Fixes: 2a750166 ("block: Rework drivers/cdrom/Makefile")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 01 Nov, 2017 6 commits
    • nvme: Remove unused headers · 3639efef
      Keith Busch authored
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet: fix fatal_err_work deadlock · a96d4bd8
      James Smart authored
      Below is a stack trace for an issue that was reported.
      
      What's happening is that the nvmet layer had its controller kato
      timeout fire, which causes it to schedule its fatal error handler
      via the fatal_err_work element. The error handler is invoked, which
      calls the transport delete_ctrl() entry point, and as the transport
      tears down the controller, nvmet_sq_destroy ends up doing the final
      put on the ctlr causing it to enter its free routine. The ctlr free
      routine does a cancel_work_sync() on fatal_err_work element, which
      then does a flush_work and wait_for_completion. But, as the wait is
      in the context of the work element being flushed, it's in a catch-22
      and the thread hangs.
      
      [  326.903131] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
      [  326.909832] nvmet: ctrl 1 fatal error occurred!
      [  327.643100] lpfc 0000:04:00.0: 0:6313 NVMET Defer ctx release xri
      x114 flg x2
      [  494.582064] INFO: task kworker/0:2:243 blocked for more than 120
      seconds.
      [  494.589638]       Not tainted 4.14.0-rc1.James+ #1
      [  494.594986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  494.603718] kworker/0:2     D    0   243      2 0x80000000
      [  494.609839] Workqueue: events nvmet_fatal_error_handler [nvmet]
      [  494.616447] Call Trace:
      [  494.619177]  __schedule+0x28d/0x890
      [  494.623070]  schedule+0x36/0x80
      [  494.626571]  schedule_timeout+0x1dd/0x300
      [  494.631044]  ? dequeue_task_fair+0x592/0x840
      [  494.635810]  ? pick_next_task_fair+0x23b/0x5c0
      [  494.640756]  wait_for_completion+0x121/0x180
      [  494.645521]  ? wake_up_q+0x80/0x80
      [  494.649315]  flush_work+0x11d/0x1a0
      [  494.653206]  ? wake_up_worker+0x30/0x30
      [  494.657484]  __cancel_work_timer+0x10b/0x190
      [  494.662249]  cancel_work_sync+0x10/0x20
      [  494.666525]  nvmet_ctrl_put+0xa3/0x100 [nvmet]
      [  494.671482]  nvmet_sq_destroy+0x64/0xd0 [nvmet]
      [  494.676540]  nvmet_fc_delete_target_queue+0x202/0x220 [nvmet_fc]
      [  494.683245]  nvmet_fc_delete_target_assoc+0x6d/0xc0 [nvmet_fc]
      [  494.689743]  nvmet_fc_delete_ctrl+0x137/0x1a0 [nvmet_fc]
      [  494.695673]  nvmet_fatal_error_handler+0x30/0x40 [nvmet]
      [  494.701589]  process_one_work+0x149/0x360
      [  494.706064]  worker_thread+0x4d/0x3c0
      [  494.710148]  kthread+0x109/0x140
      [  494.713751]  ? rescuer_thread+0x380/0x380
      [  494.718214]  ? kthread_park+0x60/0x60
      
      Correct by having the fc transport convert to a different workq context
      for the actual controller teardown which may call the cancel_work_sync.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
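The catch-22 in this commit message, and the fix of moving teardown to a different work context, can be modelled in a few lines of single-threaded C. This is a toy model under loud assumptions: every name is hypothetical, work items are run by hand, and `sketch_cancel_sync()` returns false in the situation where the real `cancel_work_sync()` would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_work {
    void (*fn)(void *ctx);
    void *ctx;
    bool pending; /* queued but not yet started */
    bool running; /* handler currently executing */
};

struct sketch_ctrl {
    struct sketch_work fatal_err_work;
    struct sketch_work delete_work;
    bool freed_cleanly;
};

/* Stand-in for cancel_work_sync(): false means "waiting on a work item
 * from inside that same work item", which would hang in the kernel. */
bool sketch_cancel_sync(struct sketch_work *w)
{
    assert(w != NULL);
    w->pending = false;
    return !w->running;
}

/* Run one work item the way a worker thread would. */
void sketch_run_work(struct sketch_work *w)
{
    w->pending = false;
    w->running = true;
    w->fn(w->ctx);
    w->running = false;
}

/* Free routine: must make sure the fatal error work is no longer running. */
void sketch_ctrl_free(struct sketch_ctrl *c)
{
    c->freed_cleanly = sketch_cancel_sync(&c->fatal_err_work);
}

/* Old shape: teardown, and thus the free, runs inside fatal_err_work. */
void sketch_fatal_direct(void *ctx) { sketch_ctrl_free(ctx); }

/* New shape: only queue the teardown; a different work item performs it. */
void sketch_fatal_deferred(void *ctx)
{
    struct sketch_ctrl *c = ctx;
    c->delete_work.pending = true;
}

void sketch_delete_fn(void *ctx) { sketch_ctrl_free(ctx); }

void sketch_ctrl_init(struct sketch_ctrl *c, void (*fatal_fn)(void *))
{
    c->fatal_err_work = (struct sketch_work){ .fn = fatal_fn, .ctx = c };
    c->delete_work = (struct sketch_work){ .fn = sketch_delete_fn, .ctx = c };
    c->freed_cleanly = false;
}
```

In the deferred shape the free routine runs from `delete_work`, after `fatal_err_work` has returned, so the synchronous cancel has nothing left to wait for.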
    • nvme-fc: add dev_loss_tmo timeout and remoteport resume support · 2b632970
      James Smart authored
      When a remoteport is unregistered (connectivity lost), the following
      actions are taken:
      
       - the remoteport is marked DELETED
       - the time when dev_loss_tmo would expire is set in the remoteport
       - all controllers on the remoteport are reset.
      
      After a controller resets, it will stall in a RECONNECTING state waiting
      for one of the following:
      
       - the controller will continue to attempt reconnect per max_retries and
         reconnect_delay.  As there is no remoteport connectivity, the reconnect
         attempt will immediately fail.  If max reconnects has not been reached,
         a new reconnect_delay timer will be scheduled.  If the current time plus
         another reconnect_delay exceeds when dev_loss_tmo expires on the remote
         port, then the reconnect_delay will be shortened to schedule no later
         than when dev_loss_tmo expires.  If max reconnect attempts are reached
         (e.g. ctrl_loss_tmo reached) or dev_loss_tmo is exceeded without
         connectivity, the controller is deleted.
       - the remoteport is re-registered prior to dev_loss_tmo expiring.
         The resume of the remoteport will immediately attempt to reconnect
         each of its suspended controllers.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      [hch: updated to use nvme_delete_ctrl]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
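The reconnect scheduling rule in this commit message reduces to a clamp: use `reconnect_delay`, but never schedule past the moment `dev_loss_tmo` expires, and give up once either limit is hit. A minimal sketch, assuming times in whole seconds and using hypothetical struct and function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_reconnect {
    uint64_t now;             /* current time, seconds */
    uint64_t dev_loss_end;    /* when dev_loss_tmo expires on the remoteport */
    uint32_t reconnect_delay; /* normal delay between attempts */
    uint32_t nr_reconnects;   /* attempts made so far */
    uint32_t max_reconnects;  /* attempt limit */
};

/* Returns true and stores the delay before the next attempt, or returns
 * false when the controller should be deleted instead. */
bool sketch_next_reconnect(const struct sketch_reconnect *s, uint32_t *delay)
{
    assert(s != NULL && delay != NULL);

    if (s->nr_reconnects >= s->max_reconnects)
        return false; /* attempt limit reached */
    if (s->now >= s->dev_loss_end)
        return false; /* dev_loss_tmo exceeded without connectivity */

    uint64_t remaining = s->dev_loss_end - s->now;

    /* Shorten the delay so the timer fires no later than dev_loss_tmo. */
    *delay = remaining < s->reconnect_delay ? (uint32_t)remaining
                                            : s->reconnect_delay;
    return true;
}
```

For example, with a 10 second `reconnect_delay` and 5 seconds left before `dev_loss_tmo` expires, the next attempt is scheduled in 5 seconds rather than 10.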
    • nvme: allow controller RESETTING to RECONNECTING transition · 3cec7f9d
      James Smart authored
      Transport will typically transition from LIVE to RESETTING when initially
      performing a reset or recovering from an error.  Adding this transition
      allows a transport to transition to RECONNECTING when it checks/waits for
      connectivity then creates new transport connections and reinits the
      controller.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
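A controller state check of the kind this commit message extends can be sketched as a switch on the requested state that lists the states it may be entered from. The sketch is deliberately partial: it models only the LIVE to RESETTING and RESETTING to RECONNECTING transitions named above, plus a RECONNECTING to LIVE edge that is assumed here so the cycle can close. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_ctrl_state {
    SKETCH_LIVE,
    SKETCH_RESETTING,
    SKETCH_RECONNECTING,
};

/* Attempt a state change. Returns true and updates *cur only when the
 * requested transition is in the allowed set. */
bool sketch_change_state(enum sketch_ctrl_state *cur, enum sketch_ctrl_state next)
{
    bool ok = false;

    assert(cur != NULL);
    switch (next) {
    case SKETCH_RESETTING:
        ok = (*cur == SKETCH_LIVE);
        break;
    case SKETCH_RECONNECTING:
        /* The transition this commit adds. */
        ok = (*cur == SKETCH_RESETTING);
        break;
    case SKETCH_LIVE:
        /* Assumed edge, not taken from the commit message. */
        ok = (*cur == SKETCH_RECONNECTING);
        break;
    }
    if (ok)
        *cur = next;
    return ok;
}
```

A transport following the flow in the message would move LIVE to RESETTING on error, then RESETTING to RECONNECTING while it waits for connectivity and rebuilds its connections.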
    • nvme-fc: check connectivity before initiating reconnects · 96e24801
      James Smart authored
      Check remoteport connectivity before initiating reconnects
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: add a dev_loss_tmo field to the remoteport · ac7fe82b
      James Smart authored
      Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
      connectivity loss.
      
      The transport initializes the value in the nvme_fc_register_remoteport()
      call. If the value is not set, a default of 60s is set.
      
      Add a new routine to the api, nvme_fc_set_remoteport_devloss(), which
      allows the lldd to dynamically update the value on an existing
      remoteport.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
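The defaulting and update behaviour described in this commit message could be sketched as below. The names, the struct, and the return codes are hypothetical, and the sketch assumes that a value of zero means "not set". The message itself only states that an unset value becomes 60 seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DEF_DEV_LOSS_TMO 60u /* default from the commit message, seconds */

struct sketch_remoteport {
    uint32_t dev_loss_tmo; /* seconds of tolerated connectivity loss */
    bool     registered;
};

/* Registration: take the caller's value, or fall back to the default.
 * Zero is treated as "not set" (an assumption of this sketch). */
void sketch_register_remoteport(struct sketch_remoteport *rp, uint32_t requested_tmo)
{
    assert(rp != NULL);
    rp->dev_loss_tmo = requested_tmo ? requested_tmo : SKETCH_DEF_DEV_LOSS_TMO;
    rp->registered = true;
}

/* Dynamic update on an existing remoteport. Returns 0 on success and
 * -1 if the port was never registered. */
int sketch_set_remoteport_devloss(struct sketch_remoteport *rp, uint32_t tmo)
{
    assert(rp != NULL);
    if (!rp->registered)
        return -1;
    rp->dev_loss_tmo = tmo;
    return 0;
}
```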