1. 28 May, 2018 2 commits
    • bcache: Move couple of string arrays to sysfs.c · 04cbc211
      Andy Shevchenko authored
      There are a couple of string arrays that are used exclusively in
      sysfs.c. Move them there and make them static.
      
      Besides the above, this allows further cleanup.
      
      No functional change intended.
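
      For illustration, the change amounts to relocating definitions like
      the one below into sysfs.c with static linkage (the array contents
      here are a sketch based on the bcache mode names, not the exact diff):

        /* in drivers/md/bcache/sysfs.c, no longer visible to other files */
        static const char * const bch_cache_modes[] = {
                "writethrough", "writeback", "writearound", "none",
                NULL
        };
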
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: stop bcache device when backing device is offline · 0f0709e6
      Coly Li authored
      Currently bcache does not handle backing device failure: if the
      backing device is offline and disconnected from the system, its bcache
      device can still be accessible. If the bcache device is in writeback
      mode, I/O requests can even succeed if they hit the cache device. That
      is to say, when and how bcache handles an offline backing device is
      undefined.
      
      This patch tries to handle a backing device going offline in a rather
      simple way (see the sketch below):
      - Add a cached_dev->status_update_thread kernel thread to update the
        backing device status every second.
      - Add cached_dev->offline_seconds to record how many seconds the
        backing device has been observed to be offline. If the backing
        device is offline for BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set
        dc->io_disable to 1 and call bcache_device_stop() to stop the
        bcache device linked to the offline backing device.
      
      Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT
      seconds, its bcache device will be removed; user space applications
      writing to it will then get an error immediately and can handle the
      device failure in time.
      
      This patch is quite simple and does not handle more complicated
      situations. Once the bcache device is stopped, users need to recover
      the backing device, then register and attach it manually.
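
      A minimal sketch of that polling loop, assuming blk_queue_dying() as
      the offline test (the thread and field names follow the description
      above; everything else is illustrative rather than the exact patch):

        static int cached_dev_status_update(void *arg)
        {
                struct cached_dev *dc = arg;
                struct request_queue *q;

                while (!kthread_should_stop() && !dc->io_disable) {
                        q = bdev_get_queue(dc->bdev);
                        if (blk_queue_dying(q))    /* assumed offline test */
                                dc->offline_seconds++;
                        else
                                dc->offline_seconds = 0;

                        if (dc->offline_seconds >= BACKING_DEV_OFFLINE_TIMEOUT) {
                                pr_err("%s: offline for %d seconds, stopping bcache device",
                                       dc->backing_dev_name,
                                       BACKING_DEV_OFFLINE_TIMEOUT);
                                dc->io_disable = true;
                                bcache_device_stop(&dc->disk);
                                break;
                        }
                        schedule_timeout_interruptible(HZ);    /* 1s poll */
                }

                /* changelog v3: park until kthread_stop() is called on us */
                while (!kthread_should_stop())
                        schedule_timeout_interruptible(HZ);
                return 0;
        }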
      
      Changelog:
      v3: call wait_for_kthread_stop() before the kernel thread exits.
      v2: remove "bcache: " prefix when calling pr_warn().
      v1: initial version.
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Cc: Michael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 25 May, 2018 1 commit
  3. 24 May, 2018 2 commits
    • block drivers/block: Use octal not symbolic permissions · 5657a819
      Joe Perches authored
      Convert the S_<FOO> symbolic permissions to their octal equivalents,
      since octal (not symbolic) permissions are preferred by many as more
      readable.
      
      see: https://lkml.org/lkml/2016/8/2/1945
      
      Done with automated conversion via:
      $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>
      
      Miscellanea:
      
      o Wrapped modified multi-line calls to a single line where appropriate
      o Realign modified multi-line calls to open parenthesis
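
      A representative before/after for one attribute (the attribute name
      and handlers here are illustrative, not from a specific driver):

        /* before: symbolic flags */
        static DEVICE_ATTR(cache_mode, S_IRUGO | S_IWUSR,
                           cache_mode_show, cache_mode_store);

        /* after: 0644 == S_IRUGO | S_IWUSR (owner rw, group/other r) */
        static DEVICE_ATTR(cache_mode, 0644,
                           cache_mode_show, cache_mode_store);
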
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: avoid starving tag allocation after allocating process migrates · e6fc4649
      Ming Lei authored
      When the allocating process is scheduled back and its mapped hw queue
      has changed, fake one extra wake-up on the previous queue to
      compensate for the missed wake-up, so other allocations on the
      previous queue won't be starved.

      This patch fixes a request allocation hang, which can be triggered
      easily in case of a very low nr_requests.
      
      The race is as follows:
      
      1) 2 hw queues, nr_requests is 2, and wake_batch is 1

      2) there are 3 waiters on hw queue 0

      3) two in-flight requests in hw queue 0 are completed, and only two
         of the 3 waiters are woken up because of wake_batch; but both of
         those waiters can be scheduled to another CPU, causing a switch
         to hw queue 1

      4) the 3rd waiter will then wait forever, since no in-flight request
         is left in hw queue 0

      5) this patch fixes it with a fake wake-up when a waiter is scheduled
         to another hw queue (see the sketch below)
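
      The core of the fix, paraphrased (the tag-retry loop and reserved-tag
      handling around it are omitted; treat names as approximate):

        /* in blk_mq_get_tag(): we slept and may have been migrated */
        bt_prev = bt;
        io_schedule();

        data->ctx = blk_mq_get_ctx(data->q);
        data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
        bt = &data->hctx->tags->bitmap_tags;

        /*
         * If we are now mapped to a different hw queue, fake one wake-up
         * on the previous queue to compensate for the wake-up this waiter
         * consumed there, so the waiters left behind are not starved.
         */
        if (bt != bt_prev)
                sbitmap_queue_wake_up(bt_prev);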
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Modified the commit message to make it clearer, and made it apply on
      top of the 4.18 branch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 23 May, 2018 2 commits
    • bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue · f1834646
      Tejun Heo authored
      cgwb_release() punts the actual release to cgwb_release_workfn() on
      system_wq.  Depending on the number of cgroups or block devices, there
      can be a lot of cgwb_release_workfn() in flight at the same time.
      
      We're periodically seeing close to 256 kworkers getting stuck with the
      following stack trace, and over time the entire system gets stuck.
      
        [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
        [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30
        [<ffffffff811ccf23>] bdi_unregister+0x53/0x290
        [<ffffffff811cd1e9>] release_bdi+0x89/0xc0
        [<ffffffff811cd645>] wb_exit+0x85/0xa0
        [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0
        [<ffffffff810a68d0>] process_one_work+0x150/0x410
        [<ffffffff810a71fd>] worker_thread+0x6d/0x520
        [<ffffffff810ad3dc>] kthread+0x12c/0x160
        [<ffffffff81969019>] ret_from_fork+0x29/0x40
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      The events leading to the lockup are...
      
      1. A lot of cgwb_release_workfn() work items are queued at the same
         time and all system_wq kworkers are assigned to execute them.
      
      2. They all end up calling synchronize_rcu_expedited().  One of them
         wins and tries to perform the expedited synchronization.
      
      3) However, that involves queueing rcu_exp_work to system_wq and
         waiting for it.  Because #1 is holding all available kworkers on
         system_wq, rcu_exp_work can't be executed.  cgwb_release_workfn()
         is waiting for synchronize_rcu_expedited() which in turn is waiting
         for cgwb_release_workfn() to free up some of the kworkers.
      
      We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
      same time.  There's nothing to be gained from that.  This patch
      updates the cgwb release path to use a dedicated percpu workqueue with
      @max_active of 1.
      
      While this resolves the problem at hand, it might be a good idea to
      isolate rcu_exp_work to its own workqueue too, as it can be used from
      various paths and is prone to this sort of indirect A-A deadlock.
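
      The mechanism, abridged to a sketch (names follow the description
      above; error paths and surrounding code trimmed):

        static struct workqueue_struct *cgwb_release_wq;

        static void cgwb_release(struct percpu_ref *refcnt)
        {
                struct bdi_writeback *wb = container_of(refcnt,
                                        struct bdi_writeback, refcnt);

                /* punt to the dedicated queue instead of system_wq */
                queue_work(cgwb_release_wq, &wb->release_work);
        }

        static int __init cgwb_init(void)
        {
                /* max_active = 1: at most one release in flight per CPU */
                cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
                if (!cgwb_release_wq)
                        return -ENOMEM;
                return 0;
        }
        subsys_initcall(cgwb_init);
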
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nbd: set discard granularity properly · 6df133a1
      Josef Bacik authored
      For some reason we had the discard granularity always set to 512, even
      when discards were disabled.  Fix this by having the default be 0, and
      if discards are turned on, set the discard granularity to the
      blocksize.
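
      The shape of the fix, abridged (config field names are assumptions
      based on the nbd driver, not the exact diff):

        /* default: discards off, so granularity stays 0 */
        disk->queue->limits.discard_granularity = 0;
        blk_queue_max_discard_sectors(disk->queue, 0);

        /* only when the server negotiates TRIM do we enable discards */
        if (config->flags & NBD_FLAG_SEND_TRIM) {
                nbd->disk->queue->limits.discard_granularity = config->blksize;
                blk_queue_max_discard_sectors(nbd->disk->queue, UINT_MAX);
        }
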
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 22 May, 2018 3 commits
  6. 21 May, 2018 2 commits
    • nvme-pci: fix race between poll and IRQ completions · 68fa9dbe
      Jens Axboe authored
      If polling completions are racing with the IRQ triggered by a
      completion, the IRQ handler will find no work and return IRQ_NONE.
      This can trigger complaints about spurious interrupts:
      
      [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
      [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
      [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
      [  560.175991] Call Trace:
      [  560.175994]  <IRQ>
      [  560.176005]  dump_stack+0x5c/0x7b
      [  560.176010]  __report_bad_irq+0x30/0xc0
      [  560.176013]  note_interrupt+0x235/0x280
      [  560.176020]  handle_irq_event_percpu+0x51/0x70
      [  560.176023]  handle_irq_event+0x27/0x50
      [  560.176026]  handle_edge_irq+0x6d/0x180
      [  560.176031]  handle_irq+0xa5/0x110
      [  560.176036]  do_IRQ+0x41/0xc0
      [  560.176042]  common_interrupt+0xf/0xf
      [  560.176043]  </IRQ>
      [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
      [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
      [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
      [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
      [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
      [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
      [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
      [  560.176071]  do_idle+0x1f4/0x260
      [  560.176075]  cpu_startup_entry+0x6f/0x80
      [  560.176080]  start_secondary+0x184/0x1d0
      [  560.176085]  secondary_startup_64+0xa5/0xb0
      [  560.176088] handlers:
      [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
      [  560.183019] Disabling IRQ #630
      
      A previous commit removed ->cqe_seen, which handled this case, but
      we need to handle this a bit differently now that completions run
      outside the queue lock. Return IRQ_HANDLED from the IRQ handler if
      the completion ring head has moved since we last saw it.
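
      A sketch of the resulting IRQ handler (close in shape to the 4.18
      code, but helper signatures are abridged and should be treated as
      approximate):

        static irqreturn_t nvme_irq(int irq, void *data)
        {
                struct nvme_queue *nvmeq = data;
                irqreturn_t ret = IRQ_NONE;
                u16 start, end;

                spin_lock(&nvmeq->cq_lock);
                /*
                 * Count the IRQ as handled if the CQ head moved since we
                 * last looked, even if a poller already reaped the CQEs.
                 */
                if (nvmeq->cq_head != nvmeq->last_cq_head)
                        ret = IRQ_HANDLED;
                nvme_process_cq(nvmeq, &start, &end, -1);
                nvmeq->last_cq_head = nvmeq->cq_head;
                spin_unlock(&nvmeq->cq_lock);

                if (start != end) {
                        nvme_complete_cqes(nvmeq, start, end);
                        return IRQ_HANDLED;
                }
                return ret;
        }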
      
      Fixes: 5cb525c8 ("nvme-pci: handle completions outside of the queue lock")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/block · 81b1dab4
      Jens Axboe authored
      Pull NVMe changes from Keith:
      
      "This is just the first nvme pull request for 4.18. There are several
      fabrics and target patches that I missed, so there will be more to
      come."
      
      * 'nvme-4.18' of git://git.infradead.org/nvme:
        nvme-pci: drop IRQ disabling on submission queue lock
        nvme-pci: split the nvme queue lock into submission and completion locks
        nvme-pci: handle completions outside of the queue lock
        nvme-pci: move ->cq_vector == -1 check outside of ->q_lock
        nvme-pci: remove cq check after submission
        nvme-pci: simplify nvme_cqe_valid
        nvme: mark the result argument to nvme_complete_async_event volatile
        nvme/pci: Sync controller reset for AER slot_reset
        nvme/pci: Hold controller reference during async probe
        nvme: only reconfigure discard if necessary
        nvme/pci: Use async_schedule for initial reset work
        nvme: lightnvm: add granby support
        NVMe: Add Quirk Delay before CHK RDY for Seagate Nytro Flash Storage
        nvme: change order of qid and cmdid in completion trace
        nvme: fc: provide a descriptive error
  7. 18 May, 2018 8 commits
  8. 16 May, 2018 8 commits
  9. 15 May, 2018 1 commit
  10. 14 May, 2018 11 commits