1. 17 Jan, 2018 1 commit
    • James Smart's avatar
      nvme-fc: fix rogue admin cmds stalling teardown · d625d05e
      James Smart authored
      When connectivity is lost to a device, the association is terminated
      and the blk-mq queues are quiesced/stopped. When connectivity is
      re-established, they are resumed.
      
      If an admin command is received while connectivity is list, the ioctl
      queues the command on the admin_q and the command stalls (the thread
      issuing the ioctl hangs/waits). if the connectivity is lost long
      enough such that the controller is then deleted, the delete code
      makes its calls to initiate the delete, which then expects the core
      layer to call the transport when all references are removed and the
      controller can be freed.  Unfortunately, nothing in this path dequeued
      the admin command, so a reference sits outstanding and things stop,
      hanging the delete indefinitely.
      
      Correct by unquiescing the admin queue in the delete association. This
      means any admin command (which should only be from an ioctl) issued
      after connectivity is lost will detect the controller is in a
      reconnecting state and will (fast) fail the command. Thus, a pending
      reference can no longer be created.  Once connectivity is re-established,
      a new ioctl/admin command would see proper device state and function again.
      Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      d625d05e
  2. 15 Jan, 2018 6 commits
    • Sagi Grimberg's avatar
      nvmet: release a ns reference in nvmet_req_uninit if needed · 423b4487
      Sagi Grimberg authored
      nvmet_req_init looked up a namespace and took a reference on it (unless it
      failed prior to that). If the request is uninitialized (in error cases) we
      need to remove that reference in case it was taken, otherwise we leak
      namespace reference when calling nvme_req_uninit.
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      423b4487
    • Roland Dreier's avatar
      nvme-fabrics: fix memory leak when parsing host ID option · df351ef7
      Roland Dreier authored
      We use match_strdup() to get a copy of the option string for host ID string, but
      we just pass it to uuid_parse() and don't store the string pointer, so we need to
      kfree() the string after parsing it.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      df351ef7
    • Minwoo Im's avatar
      nvme: fix comment typos in nvme_create_io_queues · 8adb8c14
      Minwoo Im authored
      fix comment typos in nvme_create_io_queues() like below.
        _aount_ to _amount_
        _an_    to _can_
      Signed-off-by: default avatarMinwoo Im <minwoo.im.dev@gmail.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      8adb8c14
    • Roy Shterman's avatar
      nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Roy Shterman authored
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes.
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  (
      [  178.496120] "nvme-wq"
      [  178.496471] ){++++.+}
      [  178.496599] , at:
      [  178.496921] [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  (
      [  178.498724] "nvme-wq"
      [  178.499074] ){++++.+}
      [  178.499202] , at:
      [  178.499520] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:
      
      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock(
      [  178.503115] "nvme-wq"
      [  178.503467] );
      [  178.503716]   lock(
      [  178.504001] "nvme-wq"
      [  178.504353] );
      [  178.504601]
                      *** DEADLOCK ***
      
      [  178.505441]  May be due to missing lock nesting notation
      
      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0:
      [  178.507330]  (
      [  178.507598] "nvme-wq"
      [  178.507726] ){++++.+}
      [  178.508079] , at:
      [  178.508173] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1:
      [  178.509265]  (
      [  178.509532] (&ctrl->delete_work)
      [  178.509795] ){+.+.+.}
      [  178.510145] , at:
      [  178.510239] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      :
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
      Signed-off-by: default avatarRoy Shterman <roys@lightbitslabs.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      b227c59b
    • Sagi Grimberg's avatar
      nvme-pci: allocate device queues storage space at probe · 147b27e4
      Sagi Grimberg authored
      It may cause race by setting 'nvmeq' in nvme_init_request()
      because .init_request is called inside switching io scheduler, which
      may happen when the NVMe device is being resetted and its nvme queues
      are being freed and created. We don't have any sync between the two
      pathes.
      
      This patch changes the nvmeq allocation to occur at probe time so
      there is no way we can dereference it at init_request.
      
      [   93.268391] kernel BUG at drivers/nvme/host/pci.c:408!
      [   93.274146] invalid opcode: 0000 [#1] SMP
      [   93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss
      nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat
      intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
      kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt
      intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me
      ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei
      shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod
      mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
      fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3
      megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
      [   93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4
      [   93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
      [   93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1
      [   93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme]
      [   93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246
      [   93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
      [   93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008
      [   93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00
      [   93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00
      [   93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071
      [   93.422969] FS:  00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000
      [   93.431996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0
      [   93.446368] Call Trace:
      [   93.449103]  blk_mq_alloc_rqs+0x231/0x2a0
      [   93.453579]  blk_mq_sched_alloc_tags.isra.8+0x42/0x80
      [   93.459214]  blk_mq_init_sched+0x7e/0x140
      [   93.463687]  elevator_switch+0x5a/0x1f0
      [   93.467966]  ? elevator_get.isra.17+0x52/0xc0
      [   93.472826]  elv_iosched_store+0xde/0x150
      [   93.477299]  queue_attr_store+0x4e/0x90
      [   93.481580]  kernfs_fop_write+0xfa/0x180
      [   93.485958]  __vfs_write+0x33/0x170
      [   93.489851]  ? __inode_security_revalidate+0x4c/0x60
      [   93.495390]  ? selinux_file_permission+0xda/0x130
      [   93.500641]  ? _cond_resched+0x15/0x30
      [   93.504815]  vfs_write+0xad/0x1a0
      [   93.508512]  SyS_write+0x52/0xc0
      [   93.512113]  do_syscall_64+0x61/0x1a0
      [   93.516199]  entry_SYSCALL64_slow_path+0x25/0x25
      [   93.521351] RIP: 0033:0x7f33ce96aab0
      [   93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0
      [   93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001
      [   93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740
      [   93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400
      [   93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000
      [   93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0
      74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f>
      0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89
      [   93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8
      [   93.602273] ---[ end trace 810dde3993e5f14e ]---
      Reported-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      147b27e4
    • Sagi Grimberg's avatar
      nvme-pci: serialize pci resets · 79c48ccf
      Sagi Grimberg authored
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      79c48ccf
  3. 14 Jan, 2018 1 commit
  4. 12 Jan, 2018 3 commits
  5. 11 Jan, 2018 2 commits
  6. 10 Jan, 2018 17 commits
  7. 09 Jan, 2018 10 commits
    • Jens Axboe's avatar
      null_blk: wire up timeouts · 5448aca4
      Jens Axboe authored
      This is needed to ensure that we actually handle timeouts.
      Without it, the queue_mode=1 path will never call blk_add_timer(),
      and the queue_mode=2 path will continually just return
      EH_RESET_TIMER and we never actually complete the offending request.
      
      This was used to test the new timeout code, and the changes around
      killing off REQ_ATOM_COMPLETE.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5448aca4
    • Jens Axboe's avatar
      bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED · 8abef10b
      Jens Axboe authored
      It's not available if we don't have group io scheduling set, and
      there's no need to call it.
      
      Fixes: 0d52af59 ("block, bfq: release oom-queue ref to root group on exit")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8abef10b
    • Michael Lyle's avatar
      bcache: closures: move control bits one bit right · 3609c471
      Michael Lyle authored
      Otherwise, architectures that do negated adds of atomics (e.g. s390)
      to do atomic_sub fail in closure_set_stopped.
      Signed-off-by: default avatarMichael Lyle <mlyle@lyle.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3609c471
    • Bart Van Assche's avatar
      block: Fix kernel-doc warnings reported when building with W=1 · aa98192d
      Bart Van Assche authored
      Commit 3a025e1d ("Add optional check for bad kernel-doc comments")
      causes W=1 the kernel-doc script to be run and thereby causes several
      new warnings to appear when building the kernel with W=1. Fix the
      block layer kernel-doc headers such that the block layer again builds
      cleanly with W=1.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      aa98192d
    • Bart Van Assche's avatar
      blk-mq: Fix spelling in a source code comment · ee3e4de5
      Bart Van Assche authored
      Change "nedeing" into "needing" and "caes" into "cases".
      
      Fixes: commit f906a6a0 ("blk-mq: improve tag waiting setup for non-shared tags")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ee3e4de5
    • Jens Axboe's avatar
      blk-mq: silence false positive warnings in hctx_unlock() · 08b5a6e2
      Jens Axboe authored
      In some stupider versions of gcc, it complains:
      
      block/blk-mq.c: In function ‘blk_mq_complete_request’:
      ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        __srcu_read_unlock(sp, idx);
        ^
      block/blk-mq.c:620:6: note: ‘srcu_idx’ was declared here
        int srcu_idx;
            ^
      
      which is completely bogus, since we only use srcu_idx when
      hctx->flags & BLK_MQ_F_BLOCKING is set, and that's the case where
      hctx_lock() has initialized it.
      
      Just set it to '0' in the normal path in hctx_lock() to silence
      this annoying warning.
      
      Fixes: 04ced159 ("blk-mq: move hctx lock/unlock into a helper")
      Fixes: 5197c05e ("blk-mq: protect completion path with RCU")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      08b5a6e2
    • Tejun Heo's avatar
      blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu · 05707b64
      Tejun Heo authored
      The RCU protection has been expanded to cover both queueing and
      completion paths making ->queue_rq_srcu a misnomer.  Rename it to
      ->srcu as suggested by Bart.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      05707b64
    • Tejun Heo's avatar
      blk-mq: remove REQ_ATOM_STARTED · 5a61c363
      Tejun Heo authored
      After the recent updates to use generation number and state based
      synchronization, we can easily replace REQ_ATOM_STARTED usages by
      adding an extra state to distinguish completed but not yet freed
      state.
      
      Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
      blk_mq_rq_state() tests.  REQ_ATOM_STARTED no longer has any users
      left and is removed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5a61c363
    • Tejun Heo's avatar
      blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq · 634f9e46
      Tejun Heo authored
      After the recent updates to use generation number and state based
      synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
      to avoid firing the same timeout multiple times.
      
      Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
      RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
      times.  This removes atomic bitops from hot paths too.
      
      v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
      
      v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      634f9e46
    • Tejun Heo's avatar
      blk-mq: make blk_abort_request() trigger timeout path · 358f70da
      Tejun Heo authored
      With issue/complete and timeout paths now using the generation number
      and state based synchronization, blk_abort_request() is the only one
      which depends on REQ_ATOM_COMPLETE for arbitrating completion.
      
      There's no reason for blk_abort_request() to be a completely separate
      path.  This patch makes blk_abort_request() piggyback on the timeout
      path instead of trying to terminate the request directly.
      
      This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.
      
      Note that this makes blk_abort_request() asynchronous - it initiates
      abortion but the actual termination will happen after a short while,
      even when the caller owns the request.  AFAICS, SCSI and ATA should be
      fine with that and I think mtip32xx and dasd should be safe but not
      completely sure.  It'd be great if people who know the drivers take a
      look.
      
      v2: - Add comment explaining the lack of synchronization around
            ->deadline update as requested by Bart.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Asai Thambi SP <asamymuthupa@micron.com>
      Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      358f70da