Commits · fd8563ced8f50c5029fdf8e57735138e2e630423 · Kirill Smelkov / linux

04 Apr, 2017 23 commits

nvme-rdma: Support ctrl_loss_tmo · fd8563ce

Sagi Grimberg authored Mar 18, 2017

Before scheduling a reconnect attempt, check
nr_reconnects against max_reconnects, if not
exhausted (or max_reconnects is not -1), schedule
a reconnect attempts, otherwise schedule ctrl
removal.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

fd8563ce

nvme-fabrics: Allow ctrl loss timeout configuration · 42a45274

Sagi Grimberg authored Mar 18, 2017

When a host sense that its controller session is damaged,
it tries to re-establish it periodically (reconnect every
reconnect_delay). It may very well be that the controller
is gone and never coming back, in this case the host will
try to reconnect forever.

Add a ctrl_loss_tmo to bound the number of reconnect attempts
to a specific controller (default to a reasonable 10 minutes).
The timeout configuration is actually translated into number of
reconnect attempts and not a schedule on its own but rather
divided with reconnect_delay. This is useful to prevent
racing flows of remove and reconnect, and it doesn't really
matter if we remove slightly sooner than what the user requested.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

42a45274

nvme-rdma: get rid of local reconnect_delay · 7777bded

Sagi Grimberg authored Mar 18, 2017

we already have it in opts.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

7777bded

nvme-loop: retrieve iod from the cqe command_id · 3b068376

Sagi Grimberg authored Feb 27, 2017

useful to validate that the we didn't mess up
the command_id.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

3b068376

nvme-loop: remove unneeded includes · d89a39be

Sagi Grimberg authored Mar 19, 2017

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

d89a39be

nvme-fc: fix module_init (theoretical) error path · c0e4a6f5

Sagi Grimberg authored Mar 19, 2017

If nvmf_register_transport happened to fail
(it can't, but theoretically) we leak memory.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

c0e4a6f5

nvme-loop: fix module_init (theoretical) error path · d19eef02

Sagi Grimberg authored Mar 19, 2017

if nvmf_register_transport happend to fail, we
need to nvmet_unregister_transport as well.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

d19eef02

nvme-rdma: fix module_init (theoretical) error path · a56c79cf

Sagi Grimberg authored Mar 19, 2017

If nvmf_register_transport happened to fail
(it can't, but theoretically) we leak memory.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

a56c79cf

nvmet: use symbolic constants for log identifiers · 2ca0786d

Max Gurtovoy authored Mar 06, 2017

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

2ca0786d

nvmet: Introduced helper routine for controller status check. · 64a0ca88

Parav Pandit authored Feb 27, 2017

This patch introduces helper function for checking controller
status during admin and io command processing which returns u16
status. As to bring consistency on returning status, other
friend functions also now return u16 status instead of int
to match the spec.

As part of the theseerror log prints in also prints qid on
which command error occured.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

64a0ca88

nvmet: Fixed avoided printing nvmet: twice in error logs. · 4151dd9a

Parav Pandit authored Feb 27, 2017

This patch avoids printing "nvmet:" twice in error logs as its already
coming through pr_fmt macro.
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

4151dd9a

iscsi-target: use generic inet_pton_with_scope · 4459e042

Sagi Grimberg authored Feb 06, 2017

Instead of parsing address strings, use a generic
helper.
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

4459e042

nvme-rdma: use inet_pton_with_scope helper · 0928f9b4

Sagi Grimberg authored Feb 05, 2017

Both the destination and the host addresses are now
parsed using inet_pton_with_scope helper. We also
get ipv6 (with address scopes support).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

0928f9b4

nvmet-rdma: use generic inet_pton_with_scope · 670c2a3a

Sagi Grimberg authored Feb 05, 2017

Instead of parsing address strings, use a generic
helper. This also adds ipv6 (with address scopes)
support.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

670c2a3a

net/utils: generic inet_pton_with_scope helper · b1a951fe

Sagi Grimberg authored Feb 05, 2017

Several locations in the stack need to handle ipv4/ipv6
(with scope) and port strings conversion to sockaddr.
Add a helper that takes either AF_INET, AF_INET6 or
AF_UNSPEC (for wildcard) to centralize this handling.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

b1a951fe

nvme-loop: remove some code duplication · 297186d6

Sagi Grimberg authored Mar 13, 2017

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

297186d6

nvme-rdma: Give some more grace for rdma connection establishment · 782d820c

Sagi Grimberg authored Mar 21, 2017

The target might be occupied with multiple hosts so lets
give it some more grace before failing the connection
establishment.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

782d820c

nvmet-rdma: occasionally flush ongoing controller teardown · 777dc823

Sagi Grimberg authored Mar 21, 2017

If we are attacked with establishments/teradowns we need to
make sure we do not consume too much system memory. Thus
let ongoing controller teardowns complete before accepting
new controller establishments.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

777dc823

nvme-rdma: handle cpu unplug when re-establishing the controller · dc2ad16a

Sagi Grimberg authored Mar 09, 2017

If a cpu unplug event has occured, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them as blk-mq mapping
won't dispatch to those queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

dc2ad16a

nvmet-rdma: Fix a possible uninitialized variable dereference · 8d61413d

Sagi Grimberg authored Mar 09, 2017

When handling a new recv command, we grab a new rsp resource and
check for the queue state being live. In case the queue is not in
live state, we simply restore the rsp back to the free list. However
in this flow we didn't set rsp->queue yet, so we cannot dereference it.

Instead, make sure to initialize rsp->queue (and other rsp members)
as soon as possible so we won't reference uninitialized variables.
Reported-by: Yi Zhang <yizhan@redhat.com>
Reported-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

8d61413d

nvmet: confirm sq percpu has scheduled and switched to atomic · 427242ce

Sagi Grimberg authored Mar 06, 2017

percpu_ref_kill is not enough to prevent subsequent
percpu_ref_tryget_live from failing. Hence call
perfcpu_ref_kill_confirm to make it safe.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

427242ce

nvme-loop: handle cpu unplug when re-establishing the controller · 6ecda70e

Sagi Grimberg authored Mar 13, 2017

If a cpu unplug event has occured, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them as blk-mq mapping
won't dispatch to those queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

6ecda70e

nvme-loop: fix a possible use-after-free when destroying the admin queue · d476983e

Sagi Grimberg authored Feb 27, 2017

we need to destroy the nvmet sq and let it finish gracefully
before continue to cleanup the queue.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

d476983e

31 Mar, 2017 1 commit

blk-mq: constify struct blk_mq_ops · f363b089

Eric Biggers authored Mar 30, 2017

Constify all instances of blk_mq_ops, as they are never modified.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f363b089

30 Mar, 2017 4 commits

null_blk: add blocking mode · db5bcf87

Jens Axboe authored Mar 30, 2017

This adds a new module parameter to null_blk, blocking. If set, null_blk
will set the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always
needs to block in its ->queue_rq() function. The intent is to help find
regressions in blocking drivers, since not many of them exist.

If null_blk is loaded with submit_queues > 1 and blocking=1, this
shows the regression recently fixed by bf4907c0.
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

db5bcf87

blk-mq: fix schedule-under-preempt for blocking drivers · bf4907c0

Jens Axboe authored Mar 30, 2017

Commit a4d907b6 unified the single and multi queue request handlers,
but in the process, it also screwed up the locking balance and calls
blk_mq_try_issue_directly() with the ctx preempt lock held. This is a
problem for drivers that have set BLK_MQ_F_BLOCKING, since now they
can't reliably sleep.

While in there, protect against similar issues in the future, by adding
a might_sleep() trigger in the BLOCKING path for direct issue or queue
run.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Fixes: a4d907b6 ("blk-mq: streamline blk_mq_make_request")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

bf4907c0

block/sed-opal: fix spelling mistake: "Lifcycle" -> "Lifecycle" · 47d75207

Colin Ian King authored Mar 30, 2017

trivial fix to spelling mistake in pr_err error message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

47d75207

block: do not put mq context in blk_mq_alloc_request_hctx · 3e06eb3d

Minchan Kim authored Mar 30, 2017

In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't
get sw context so we don't need to put the context with
blk_mq_put_ctx. Unless, we will see preempt counter underflow.

Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

3e06eb3d

29 Mar, 2017 12 commits

blk-mq: include errors in did_work calculation · 3e8a7069

Jens Axboe authored Mar 24, 2017

Currently we return true in blk_mq_dispatch_rq_list() if we queued IO
successfully, but we really want to return whether or not the we made
progress. Progress includes if we got an error return.  If we don't,
this can lead to a hang in blk_mq_sched_dispatch_requests() when a
driver is draining IO by returning BLK_MQ_QUEUE_ERROR instead of
manually ending the IO in error and return BLK_MQ_QUEUE_OK.
Tested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

3e8a7069

block-mq: don't re-queue if we get a queue error · b58e1769

Josef Bacik authored Mar 28, 2017

When try to issue a request directly and we fail we will requeue the
request, but call blk_mq_end_request() as well.  This leads to the
completed request being on a queuelist and getting ended twice, which
causes list corruption in schedulers and other shenanigans.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

b58e1769

blkcg: allocate struct blkcg_gq outside request queue spinlock · 457e490f

Tahsin Erdogan authored Mar 29, 2017

blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:

  pcpu_alloc+0x68f/0x710
  __alloc_percpu_gfp+0xd/0x10
  __percpu_counter_init+0x55/0xc0
  cfq_pd_alloc+0x3b2/0x4e0
  blkg_alloc+0x187/0x230
  blkg_create+0x489/0x670
  blkg_lookup_create+0x9a/0x230
  blkg_conf_prep+0x1fb/0x240
  __cfqg_set_weight_device.isra.105+0x5c/0x180
  cfq_set_weight_on_dfl+0x69/0xc0
  cgroup_file_write+0x39/0x1c0
  kernfs_fop_write+0x13f/0x1d0
  __vfs_write+0x23/0x120
  vfs_write+0xc2/0x1f0
  SyS_write+0x44/0xb0
  entry_SYSCALL_64_fastpath+0x18/0xad

In the code path above, percpu allocator cannot call vmalloc() due to
queue spinlock.

A failure in this call path gives grief to tools which are trying to
configure io weights. We see occasional failures happen shortly after
reboots even when system is not under any memory pressure. Machines
with a lot of cpus are more vulnerable to this condition.

Do struct blkcg_gq allocations outside the queue spinlock to allow
blocking during memory allocations.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

457e490f

Revert "blkcg: allocate struct blkcg_gq outside request queue spinlock" · d708f0d5

Jens Axboe authored Mar 29, 2017

I inadvertently applied the v5 version of this patch, whereas
the agreed upon version was v5. Revert this one so we can apply
the right one.

This reverts commit 7fc6b87a.

d708f0d5

blk-mq: fix a typo and a spelling mistake · 48b99c9d
Jens Axboe authored Mar 29, 2017
```
Signed-off-by: Jens Axboe <axboe@fb.com>
```
48b99c9d

blk-mq-pci: Fix two spelling mistakes · 018c259b

Sagi Grimberg authored Mar 29, 2017

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

018c259b

block: fix leak of q->rq_wb · 02ba8893

Omar Sandoval authored Mar 28, 2017

CONFIG_DEBUG_TEST_DRIVER_REMOVE found a possible leak of q->rq_wb when a
request queue is reregistered. This has been a problem since wbt was
introduced, but the WARN_ON(!list_empty(&stats->callbacks)) in the
blk-stat rework exposed it. Fix it by cleaning up wbt when we unregister
the queue.

Fixes: 87760e5e ("block: hook up writeback throttling")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

02ba8893

blk-mq: fix leak of q->stats · 0c9539a4

Omar Sandoval authored Mar 28, 2017

blk_alloc_queue_node() already allocates q->stats, so
blk_mq_init_allocated_queue() is overwriting it with a new allocation.

Fixes: a83b576c ("block: fix stacked driver stats init and free")
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

0c9539a4

block: warn if sharing request queue across gendisks · 334335d2

Omar Sandoval authored Mar 28, 2017

Now that the remaining drivers have been converted to one request queue
per gendisk, let's warn if a request queue gets registered more than
once. This will catch future drivers which might do it inadvertently or
any old drivers that I may have missed.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

334335d2

block: block new I/O just after queue is set as dying · d3cfb2a0

Ming Lei authored Mar 27, 2017

Before commit 780db207(blk-mq: decouble blk-mq freezing
from generic bypassing), the dying flag is checked before
entering queue, and Tejun converts the checking into .mq_freeze_depth,
and assumes the counter is increased just after dying flag
is set. Unfortunately we doesn't do that in blk_set_queue_dying().

This patch calls blk_freeze_queue_start() in blk_set_queue_dying(),
so that we can block new I/O coming once the queue is set as dying.

Given blk_set_queue_dying() is always called in remove path
of block device, and queue will be cleaned up later, we don't
need to worry about undoing the counter.

Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

d3cfb2a0

block: rename blk_mq_freeze_queue_start() · 1671d522

Ming Lei authored Mar 27, 2017

As the .q_usage_counter is used by both legacy and
mq path, we need to block new I/O if queue becomes
dead in blk_queue_enter().

So rename it and we can use this function in both
paths.
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

1671d522

block: add a read barrier in blk_queue_enter() · 5ed61d3f

Ming Lei authored Mar 27, 2017

Without the barrier, reading DEAD flag of .q_usage_counter
and reading .mq_freeze_depth may be reordered, then the
following wait_event_interruptible() may never return.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5ed61d3f