1. 11 Nov, 2017 8 commits
  2. 07 Nov, 2017 2 commits
  3. 06 Nov, 2017 2 commits
  4. 04 Nov, 2017 11 commits
  5. 03 Nov, 2017 12 commits
  6. 02 Nov, 2017 2 commits
      skd: use ktime_get_real_seconds() · 474f5da2
      Arnd Bergmann authored
      Like many storage drivers, skd uses an unsigned 32-bit number for
      interchanging the current time with the firmware. This will overflow in
      y2106 and is otherwise safe.
      
      However, the get_seconds() function is generally considered deprecated
      since the behavior is different between 32-bit and 64-bit architectures,
      and using it may indicate a bigger problem.
      
      To annotate that we've thought about this, let's add a comment here
      and migrate to the ktime_get_real_seconds() function that consistently
      returns a 64-bit number.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      474f5da2
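      The truncation that the new comment documents can be sketched in
      userspace C (the helper name skd_fw_seconds is hypothetical; the real
      driver calls ktime_get_real_seconds() and casts the result):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <time.h>

      /* Userspace sketch, not the skd driver itself: the firmware interface
       * takes an unsigned 32-bit seconds value, so the 64-bit wall-clock
       * time is deliberately truncated.  A u32 seconds counter overflows in
       * the year 2106. */
      static uint32_t skd_fw_seconds(int64_t real_seconds)
      {
          /* explicit cast documents the intentional 32-bit truncation */
          return (uint32_t)real_seconds;
      }

      int main(void)
      {
          /* in the kernel this would be ktime_get_real_seconds(); time(NULL)
           * stands in as a 64-bit wall-clock source here */
          int64_t now = (int64_t)time(NULL);

          assert(skd_fw_seconds(now) == (uint32_t)now);
          /* 2^32 seconds past the epoch (year 2106) wraps to zero */
          assert(skd_fw_seconds((int64_t)1 << 32) == 0);
          return 0;
      }
      ```

      The point of switching APIs is that ktime_get_real_seconds() returns a
      64-bit value on all architectures, so any truncation is an explicit,
      visible choice rather than an accident of the architecture.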
      block: fix CDROM dependency on BLK_DEV · c091fbe9
      Arnd Bergmann authored
      After the cdrom cleanup, I get randconfig warnings for some configurations:
      
      warning: (BLK_DEV_IDECD && BLK_DEV_SR) selects CDROM which has unmet direct dependencies (BLK_DEV)
      
      This adds an explicit BLK_DEV dependency for both drivers. The other
      drivers that select 'CDROM' already have this and don't need a change.
      
      Fixes: 2a750166 ("block: Rework drivers/cdrom/Makefile")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c091fbe9
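      The fix amounts to adding an explicit dependency to the two Kconfig
      entries; a sketch of the shape (the exact files and the surrounding
      option text are not shown in the commit message):

      ```
      config BLK_DEV_IDECD
              tristate "Include IDE/ATAPI CDROM support"
              depends on BLK_DEV
              select CDROM
      ```

      With the dependency stated, selecting CDROM from these drivers can no
      longer violate CDROM's own "depends on BLK_DEV" in a randconfig build.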
  7. 01 Nov, 2017 3 commits
      nvme: Remove unused headers · 3639efef
      Keith Busch authored
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      3639efef
      nvmet: fix fatal_err_work deadlock · a96d4bd8
      James Smart authored
      Below is a stack trace for an issue that was reported.
      
      What's happening is that the nvmet layer had its controller kato
      timeout fire, which causes it to schedule its fatal error handler
      via the fatal_err_work element. The error handler is invoked, which
      calls the transport delete_ctrl() entry point, and as the transport
      tears down the controller, nvmet_sq_destroy ends up doing the final
      put on the ctrl, causing it to enter its free routine. The ctrl free
      routine does a cancel_work_sync() on the fatal_err_work element, which
      then does a flush_work and wait_for_completion. But, as the wait is
      in the context of the work element being flushed, it's in a catch-22
      and the thread hangs.
      
      [  326.903131] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
      [  326.909832] nvmet: ctrl 1 fatal error occurred!
      [  327.643100] lpfc 0000:04:00.0: 0:6313 NVMET Defer ctx release xri
      x114 flg x2
      [  494.582064] INFO: task kworker/0:2:243 blocked for more than 120
      seconds.
      [  494.589638]       Not tainted 4.14.0-rc1.James+ #1
      [  494.594986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  494.603718] kworker/0:2     D    0   243      2 0x80000000
      [  494.609839] Workqueue: events nvmet_fatal_error_handler [nvmet]
      [  494.616447] Call Trace:
      [  494.619177]  __schedule+0x28d/0x890
      [  494.623070]  schedule+0x36/0x80
      [  494.626571]  schedule_timeout+0x1dd/0x300
      [  494.631044]  ? dequeue_task_fair+0x592/0x840
      [  494.635810]  ? pick_next_task_fair+0x23b/0x5c0
      [  494.640756]  wait_for_completion+0x121/0x180
      [  494.645521]  ? wake_up_q+0x80/0x80
      [  494.649315]  flush_work+0x11d/0x1a0
      [  494.653206]  ? wake_up_worker+0x30/0x30
      [  494.657484]  __cancel_work_timer+0x10b/0x190
      [  494.662249]  cancel_work_sync+0x10/0x20
      [  494.666525]  nvmet_ctrl_put+0xa3/0x100 [nvmet]
      [  494.676540]  nvmet_sq_destroy+0x64/0xd0 [nvmet]
      [  494.676540]  nvmet_fc_delete_target_queue+0x202/0x220 [nvmet_fc]
      [  494.683245]  nvmet_fc_delete_target_assoc+0x6d/0xc0 [nvmet_fc]
      [  494.689743]  nvmet_fc_delete_ctrl+0x137/0x1a0 [nvmet_fc]
      [  494.695673]  nvmet_fatal_error_handler+0x30/0x40 [nvmet]
      [  494.701589]  process_one_work+0x149/0x360
      [  494.706064]  worker_thread+0x4d/0x3c0
      [  494.710148]  kthread+0x109/0x140
      [  494.713751]  ? rescuer_thread+0x380/0x380
      [  494.718214]  ? kthread_park+0x60/0x60
      
      Fix this by having the fc transport switch to a different workqueue
      context for the actual controller teardown, which may call
      cancel_work_sync().
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      a96d4bd8
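      The shape of the fix can be sketched with a userspace analogue (POSIX
      threads standing in for workqueues; every name here is illustrative,
      not the nvmet code): the fatal-error work only kicks off the teardown
      in a second context and returns, so the teardown path that later waits
      on the fatal-error work is never waiting on itself.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdio.h>

      static int teardown_done;

      /* second work context: here it would be safe to cancel_work_sync()
       * the fatal-error work, since we are not running inside it */
      static void *teardown_work(void *arg)
      {
          (void)arg;
          teardown_done = 1;
          return NULL;
      }

      /* fatal-error work context: start the teardown elsewhere, never wait */
      static void *fatal_error_work(void *arg)
      {
          pthread_t *teardown = arg;

          pthread_create(teardown, NULL, teardown_work, NULL);
          return NULL;  /* returns promptly; no self-flush catch-22 */
      }

      int main(void)
      {
          pthread_t fatal, teardown;

          pthread_create(&fatal, NULL, fatal_error_work, &teardown);
          pthread_join(fatal, NULL);     /* fatal-error work completes */
          pthread_join(teardown, NULL);  /* teardown ran in its own context */
          assert(teardown_done == 1);
          printf("no deadlock\n");
          return 0;
      }
      ```

      The deadlocked version would be the fatal-error work calling into the
      teardown directly, ending in a cancel_work_sync() on itself.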
      nvme-fc: add dev_loss_tmo timeout and remoteport resume support · 2b632970
      James Smart authored
      When a remoteport is unregistered (connectivity lost), the following
      actions are taken:
      
       - the remoteport is marked DELETED
       - the time when dev_loss_tmo would expire is set in the remoteport
       - all controllers on the remoteport are reset.
      
      After a controller resets, it will stall in a RECONNECTING state waiting
      for one of the following:
      
       - the controller will continue to attempt reconnect per max_retries and
         reconnect_delay.  As there is no remoteport connectivity, the
         reconnect attempt will immediately fail.  If max reconnects has not
         been reached, a new reconnect_delay timer will be scheduled.  If the
         current time plus another reconnect_delay exceeds when dev_loss_tmo
         expires on the remote port, then the reconnect_delay will be
         shortened to schedule no later than when dev_loss_tmo expires.  If
         max reconnect attempts are reached (e.g. ctrl_loss_tmo reached) or
         dev_loss_tmo is exceeded without connectivity, the controller is
         deleted.
       - the remoteport is re-registered prior to dev_loss_tmo expiring.
         The resume of the remoteport will immediately attempt to reconnect
         each of its suspended controllers.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      [hch: updated to use nvme_delete_ctrl]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      2b632970
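      The delay shortening described in the first bullet is simple clamping
      arithmetic; a hypothetical sketch (illustrative names and plain integer
      seconds, not the nvme-fc code):

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Next reconnect attempt must not be scheduled past the remoteport's
       * dev_loss_tmo expiry: if a full reconnect_delay would overshoot it,
       * shorten the delay so the attempt lands no later than the expiry. */
      static int64_t next_reconnect_delay(int64_t now, int64_t reconnect_delay,
                                          int64_t dev_loss_expiry)
      {
          if (now + reconnect_delay > dev_loss_expiry)
              return dev_loss_expiry - now;
          return reconnect_delay;
      }

      int main(void)
      {
          /* plenty of time left: the full delay is used */
          assert(next_reconnect_delay(100, 10, 200) == 10);
          /* a full delay would overshoot dev_loss_tmo: shortened to 5 */
          assert(next_reconnect_delay(195, 10, 200) == 5);
          return 0;
      }
      ```

      The same comparison also decides deletion: once the shortened window
      reaches zero without connectivity, dev_loss_tmo has expired and the
      controller is deleted rather than rescheduled.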