Commits · c888a8f95ae5b1067855235b3b71c1ebccf504f5 · Kirill Smelkov / linux

13 Apr, 2016 2 commits

block: kill off q->flush_flags · c888a8f9

Jens Axboe authored Apr 13, 2016

Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.
Signed-off-by: Jens Axboe <axboe@fb.com>

c888a8f9

nvme: Avoid reset work on watchdog timer function during error recovery · c875a709

Guilherme G. Piccoli authored Apr 13, 2016

This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.

If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).

In this patch we also have split the huge condition expression on
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

c875a709

12 Apr, 2016 34 commits

NVMe: silence warning about unused 'dev' · 7e197930

Jens Axboe authored Apr 12, 2016

Depending on options, we might not be using dev in nvme_cancel_io():

drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’:
drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable]
  struct nvme_dev *dev = data;
                   ^

So get rid of it, and just cast for the dev_dbg_ratelimited() call.

Fixes: 82b4552b ("nvme: Use blk-mq helper for IO termination")
Signed-off-by: Jens Axboe <axboe@fb.com>

7e197930

block: kill blk_queue_flush() · 2245f6de

Jens Axboe authored Mar 30, 2016

We don't have any drivers left using it, so kill it off. Update
documentation to use the newer blk_queue_write_cache().
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

2245f6de

um: switch to using blk_queue_write_cache() · f935a8ce
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
f935a8ce
mtd: switch to using blk_queue_write_cache() · fec3ff5d
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
fec3ff5d
mmc/block: switch to using blk_queue_write_cache() · e9d5c746
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
e9d5c746
md: update to using blk_queue_write_cache() · 56883a7e
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
56883a7e
ide-disk: update to using blk_queue_write_cache() · a98239de
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
a98239de
xen-blkfront: switch to using blk_queue_write_cache() · bfd230ac
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
bfd230ac
dm: switch to using blk_queue_write_cache() · 519a7e16
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
519a7e16
bcache: switch to using blk_queue_write_cache() · 84b4ff9e
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
84b4ff9e
virtio_blk: switch to using blk_queue_write_cache() · ad9126ac
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
ad9126ac
ps3disk: switch to using blk_queue_write_cache() · 12c95f13
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
12c95f13
skd_main: switch to using blk_queue_write_cache() · 6975f732
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
6975f732
osdblk: switch to using blk_queue_write_cache() · 177febc8
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
177febc8
nbd: switch to using blk_queue_write_cache() · aafb1eec
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
aafb1eec

mtip32xx: remove call to blk_queue_flush() · 17fe95f4

Jens Axboe authored Apr 12, 2016

The driver calls it with 0 for flags, since it doesn't have a writeback
cache. Just remove the call, as it's a no-op right now.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

17fe95f4

loop: switch to using blk_queue_write_cache() · 21d0727f
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
21d0727f
drbd: switch to using blk_queue_write_cache() · fe8fb75e
Jens Axboe authored Mar 30, 2016
```
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
```
fe8fb75e
NVMe: switch to using blk_queue_write_cache() · 7c88cb00
Jens Axboe authored Apr 12, 2016
```
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
```
7c88cb00

sd: switch to using blk_queue_write_cache() · eb310e23

Jens Axboe authored Mar 30, 2016

Switch to the newer interface, instead of using blk_queue_flush()
directly.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

eb310e23

Merge branch 'for-4.7/core' into for-4.7/drivers · 2f9a0b33
Jens Axboe authored Apr 12, 2016

2f9a0b33

block: add ability to flag write back caching on a device · 93e9d8e8

Jens Axboe authored Apr 12, 2016

Add an internal helper and flag for setting whether a queue has
write back caching, or write through (or none). Add a sysfs file
to show this as well, and make it changeable from user space.

This will replace the (awkward) blk_queue_flush() interface that
drivers currently use to inform the block layer of write cache state
and capabilities.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

93e9d8e8

blk-mq: Make blk_mq_all_tag_busy_iter static · e8f1e163

Sagi Grimberg authored Mar 10, 2016

No caller outside the blk-mq code so we can settle
with it static.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e8f1e163

mtip32xx: Convert to use blk_mq_tagset_busy_iter · 6d125de4

Keith Busch authored Mar 10, 2016

Only a single tags array anyway.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

6d125de4

nvme: Use blk-mq helper for IO termination · 82b4552b

Sagi Grimberg authored Apr 12, 2016

blk-mq offers a tagset iterator so let's use that
instead of using nvme_clear_queues.

Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io
as there is no concept of a queue now in this function (we
also lost the print).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

82b4552b

NVMe: Skip async events for degraded controllers · 21f033f7

Keith Busch authored Apr 12, 2016

If the controller is degraded, the driver should stay out of the way so
the user can recover the drive. This patch skips driver initiated async
event requests when the drive is in this state.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

21f033f7

nvme: add helper nvme_setup_cmd() · 8093f7ca

Ming Lin authored Apr 12, 2016

This moves nvme_setup_{flush,discard,rw} calls into a common
nvme_setup_cmd() helper. So we can eventually hide all the command
setup in the core module and don't even need to update the fabrics
drivers for any specific command type.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

8093f7ca

nvme: rewrite discard support · 03b5929e

Ming Lin authored Mar 22, 2016

This rewrites nvme_setup_discard() with blk_add_request_payload().
It allocates only the necessary amount(16 bytes) for the payload.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

03b5929e

nvme: add helper nvme_map_len() · 58b45602

Ming Lin authored Mar 22, 2016

The helper returns the number of bytes that need to be mapped
using PRPs/SGL entries.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

58b45602

nvme: add missing lock nesting notation · 2e39e0f6

Ming Lin authored Apr 05, 2016

When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue()
that sends nvme_admin_delete_cq command to admin sq. So when the command
completed, the lock acquired by nvme_irq() actually belongs to admin queue.

While the lock that nvme_del_cq_end() trying to acquire belongs to io queue.
So it will not deadlock.

This patch adds lock nesting notation to fix following report.

[  109.840952] =============================================
[  109.846379] [ INFO: possible recursive locking detected ]
[  109.851806] 4.5.0+ #180 Tainted: G            E
[  109.856533] ---------------------------------------------
[  109.861958] swapper/0/0 is trying to acquire lock:
[  109.866771]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  109.876535]
[  109.876535] but task is already holding lock:
[  109.882398]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.891547]
[  109.891547] other info that might help us debug this:
[  109.898107]  Possible unsafe locking scenario:
[  109.898107]
[  109.904056]        CPU0
[  109.906515]        ----
[  109.908974]   lock(&(&nvmeq->q_lock)->rlock);
[  109.913381]   lock(&(&nvmeq->q_lock)->rlock);
[  109.917787]
[  109.917787]  *** DEADLOCK ***
[  109.917787]
[  109.923738]  May be due to missing lock nesting notation
[  109.923738]
[  109.930558] 1 lock held by swapper/0/0:
[  109.934413]  #0:  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[  109.944010]
[  109.944010] stack backtrace:
[  109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E   4.5.0+ #180
[  109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[  109.962989]  0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
[  109.970478]  ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
[  109.977964]  0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
[  109.985453] Call Trace:
[  109.987911]  <IRQ>  [<ffffffff81383d9c>] dump_stack+0x85/0xc9
[  109.993711]  [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
[  109.999575]  [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
[  110.005524]  [<ffffffff810b386d>] ? complete+0x3d/0x50
[  110.010688]  [<ffffffff810bb760>] lock_acquire+0x90/0xf0
[  110.016029]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.022418]  [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
[  110.028632]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[  110.035019]  [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[  110.041232]  [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
[  110.047095]  [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
[  110.053481]  [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
[  110.060043]  [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
[  110.066343]  [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
[  110.072818]  [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
[  110.078419]  [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
[  110.084804]  [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
[  110.090491]  [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
[  110.096180]  [<ffffffff81012306>] handle_irq+0xa6/0x130
[  110.101431]  [<ffffffff81011abe>] do_IRQ+0x5e/0x120
[  110.106333]  [<ffffffff8177384c>] common_interrupt+0x8c/0x8c
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

2e39e0f6

NVMe: Always use MSI/MSI-x interrupts · 788e15ab

Keith Busch authored Apr 08, 2016

Multiple users have reported device initialization failure due the driver
not receiving legacy PCI interrupts. This is not unique to any particular
controller, but has been observed on multiple platforms.

There have been no issues reported or observed when with message signaled
interrupts, so this patch attempts to use MSI-x during initialization,
falling back to MSI. If that fails, legacy would become the default.

The setup_io_queues error handling had to change as a result: the admin
queue's msix_entry used to be initialized to the legacy IRQ. The case
where nr_io_queues is 0 would fail request_irq when setting up the admin
queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
the admin queue's msix_entry invalid. Instead, return success immediately.
Reported-by: Tim Muhlemmer <muhlemmer@gmail.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

788e15ab

blk-mq: Export tagset iter function · e0489487

Sagi Grimberg authored Mar 10, 2016

Its useful to iterate on all the active tags in cases
where we will need to fail all the queues IO.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
[hch: carefully check for valid tagsets]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e0489487

block: add offset in blk_add_request_payload() · 37e58237

Ming Lin authored Mar 22, 2016

We could kmalloc() the payload, so need the offset in page.
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

37e58237

writeback: Fix performance regression in wb_over_bg_thresh() · 81b76485

Howard Cochran authored Mar 10, 2016

Commit 947e9762 ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.

This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.

For example, in a system having 1GB of RAM, a single spinning disk, and
a "pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.

Because FUSE uses strictlimit and has a default max_ratio of only 1%,
in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a
background writeback worker. But the worker tests the wrong threshold
(200 instead of 100), so it does not initiate writeback and just
returns.

Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero
throughput through the FUSE BDI.

The fix is to call the parameterized variant of wb_calc_thresh, so that
the worker will do writeback if the bg_thresh is exceeded which was the
bahavior before the referenced commit.

Fixes: 947e9762 ("writeback: update wb_over_bg_thresh() to use
                     wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

81b76485

11 Apr, 2016 4 commits

Linux 4.6-rc3 · bf162006
Linus Torvalds authored Apr 10, 2016

bf162006

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 08b15d13

Linus Torvalds authored Apr 10, 2016

Pull ARM fixes from Russell King:
 "A couple of small fixes, and wiring up the new syscalls which appeared
  during the merge window"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8550/1: protect idiv patching against undefined gcc behavior
  ARM: wire up preadv2 and pwritev2 syscalls
  ARM: SMP enable of cache maintanence broadcast

08b15d13

Merge tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc · 2f422f94

Linus Torvalds authored Apr 10, 2016

Pull MMC fixes from Ulf Hansson:
 "Here are a couple of mmc fixes intended for v4.6 rc3:

  MMC host:
   - sdhci: Fix regression setting power on Trats2 board
   - sdhci-pci: Add support and PCI IDs for more Broxton host controllers"

* tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
  mmc: sdhci: Fix regression setting power on Trats2 board

2f422f94

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 6a7c9243

Linus Torvalds authored Apr 10, 2016

Pull i2c fixes from Wolfram Sang:
 "Some bugfixes from I2C:

   - fix a uevent triggered boot problem by removing a useless debug
     print

   - fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow
     standard kernel behaviour

   - fix a potential division-by-zero error (needed two takes)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: jz4780: really prevent potential division by zero
  Revert "i2c: jz4780: prevent potential division by zero"
  i2c: jz4780: prevent potential division by zero
  i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
  i2c: mux: demux-pinctrl: Clean up sysfs attributes
  i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE

6a7c9243