1. 10 Dec, 2018 9 commits
    • dm: remove the pending IO accounting · 6f757231
      Mikulas Patocka authored
      Remove the "pending" atomic counters, which duplicate block-core's
      in_flight counters, and update md_in_flight() to look at the percpu
      in_flight counters.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6f757231
    • block: return just one value from part_in_flight · e016b782
      Mikulas Patocka authored
      The previous patches deleted all the code that needed the second value
      returned from part_in_flight - now the kernel only uses the first value.
      
      Consequently, part_in_flight (and blk_mq_in_flight) may be changed so that
      it only returns one value.
      
      This patch just refactors the code; there is no functional change.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e016b782
    • block: switch to per-cpu in-flight counters · 1226b8dd
      Mikulas Patocka authored
      Now that part_round_stats is gone, we can switch to per-cpu in-flight
      counters.
      
      We use the local-atomic type local_t, so that if part_inc_in_flight or
      part_dec_in_flight is reentrantly called from an interrupt, the value will
      be correct.
      
      The other counters could be corrupted by a reentrant interrupt, but that
      corruption only results in slight counter skew. The in_flight counter
      must be exact, so it needs local_t.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1226b8dd
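The per-cpu scheme described in this commit can be modelled in user space roughly as follows. This is only an illustrative sketch, not the kernel code: NR_CPUS_DEMO, struct part_demo and the demo_* functions are hypothetical names, and a plain long stands in for the kernel's local_t.

```c
#include <assert.h>

#define NR_CPUS_DEMO 4  /* hypothetical CPU count for this model */

/* One slot per CPU; a plain long stands in for the kernel's local_t. */
struct part_demo {
	long in_flight[NR_CPUS_DEMO];
};

/* Fast path: touch only the slot of the CPU that submits the I/O. */
static void demo_inc_in_flight(struct part_demo *p, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	p->in_flight[cpu]++;
}

/* Completion may run on another CPU, so a single slot can go negative. */
static void demo_dec_in_flight(struct part_demo *p, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	p->in_flight[cpu]--;
}

/* Slow path: only the sum over all slots is meaningful. */
static long demo_in_flight(const struct part_demo *p)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		sum += p->in_flight[cpu];
	return sum;
}
```

In this model, updates are cheap because each CPU writes only its own slot, while reading the count requires a walk over all slots. That trade-off is why a reader that needs the value every jiffy would be too expensive.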
    • block: delete part_round_stats and switch to less precise counting · 5b18b5a7
      Mikulas Patocka authored
      We want to convert to per-cpu in_flight counters.
      
      The function part_round_stats needs the in_flight counter every jiffy.
      Summing all the percpu variables every jiffy would be too costly, so
      part_round_stats must be deleted. It is used to calculate two counters:
      time_in_queue and io_ticks.
      
      time_in_queue can be calculated without part_round_stats, by adding the
      duration of the I/O when the I/O ends (the value is almost as exact as the
      previously calculated value, except that time for in-progress I/Os is not
      counted).
      
      io_ticks can be approximated by increasing the value when I/O is started
      or ended and the jiffies value has changed. If the I/Os take less than a
      jiffy, the value is as exact as the previously calculated value. If the
      I/Os take more than a jiffy, io_ticks can drift behind the previously
      calculated value.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5b18b5a7
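The two approximations in this commit message can be sketched in user space as below. This is a simplified model under stated assumptions, not the kernel implementation: struct stats_demo and the demo_* functions are hypothetical, jiffies are passed in as plain integers, and concurrency is ignored.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-partition statistics. */
struct stats_demo {
	unsigned long stamp;          /* jiffy of the last io_ticks update */
	unsigned long io_ticks;       /* approximate jiffies with I/O active */
	unsigned long time_in_queue;  /* summed duration of completed I/Os */
};

/* Bump io_ticks at most once per jiffy, only when an I/O starts or ends. */
static void demo_update_io_ticks(struct stats_demo *s, unsigned long now)
{
	if (s->stamp != now) {
		s->stamp = now;
		s->io_ticks++;
	}
}

static void demo_start_io(struct stats_demo *s, unsigned long now)
{
	demo_update_io_ticks(s, now);
}

static void demo_end_io(struct stats_demo *s, unsigned long start,
			unsigned long now)
{
	assert(now >= start);
	demo_update_io_ticks(s, now);
	s->time_in_queue += now - start;  /* counted only once the I/O ends */
}
```

With a single I/O that starts at jiffy 10 and ends at jiffy 15, this model records a time_in_queue of 5 but only two io_ticks, which illustrates how io_ticks can drift behind for I/Os longer than a jiffy.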
    • block: stop passing 'cpu' to all percpu stats methods · 112f158f
      Mike Snitzer authored
      All of part_stat_* and related methods are used with preempt disabled,
      so there is no need to pass cpu around to all of them.  Just call
      smp_processor_id() as needed.
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      112f158f
    • dm rq: leverage blk_mq_queue_busy() to check for outstanding IO · dbd3bbd2
      Mike Snitzer authored
      Now that request-based dm-multipath only supports blk-mq, make use of
      the newly introduced blk_mq_queue_busy() to check for outstanding IO --
      rather than (ab)using the block core's in_flight counters.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dbd3bbd2
    • dm: dont rewrite dm_disk(md)->part0.in_flight · 80a787ba
      Mikulas Patocka authored
      generic_start_io_acct and generic_end_io_acct already update the variable
      in_flight using atomic operations, so we don't have to overwrite it
      again.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      80a787ba
    • Merge tag 'v4.20-rc6' into for-4.21/block · 96f77410
      Jens Axboe authored
      Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
      two corruption fixes. We're going to be overhauling the direct dispatch
      path, and we need to do that on top of the changes we made for that
      in mainline.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      96f77410
    • sbitmap: silence bogus lockdep IRQ warning · 58ab5e32
      Jens Axboe authored
      Ming reports that lockdep spews the following trace. What this
      essentially says is that the sbitmap swap_lock was used inconsistently
      in IRQ enabled and disabled context, and that is usually indicative of a
      bug that will cause a deadlock.
      
      For this case, it's a false positive. The swap_lock is used from process
      context only, when we swap the bits in the word and cleared mask. We
      also end up doing that when we are getting a driver tag, from the
      blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with
      IRQs disabled. However, this isn't from an actual IRQ, it's still
      process context.
      
      In lieu of a better way to fix this, simply always disable interrupts
      when grabbing the swap_lock if lockdep is enabled.
      
      [  100.967642] ================start test sanity/001================
      [  101.238280] null: module loaded
      [  106.093735]
      [  106.094012] =====================================================
      [  106.094854] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [  106.095759] 4.20.0-rc3_5d2ee712_for-next+ #1 Not tainted
      [  106.096551] -----------------------------------------------------
      [  106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [  106.098231] 000000004c43fa71
      (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c
      [  106.099431]
      [  106.099431] and this task is already holding:
      [  106.100229] 000000007eec8b2f
      (&(&hctx->dispatch_wait_lock)->rlock){....}, at:
      blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.101630] which would create a new lock dependency:
      [  106.102326]  (&(&hctx->dispatch_wait_lock)->rlock){....} ->
      (&(&sb->map[i].swap_lock)->rlock){+.+.}
      [  106.103553]
      [  106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock:
      [  106.104580]  (&sbq->ws[i].wait){..-.}
      [  106.104582]
      [  106.104582] ... which became SOFTIRQ-irq-safe at:
      [  106.105751]   _raw_spin_lock_irqsave+0x4b/0x82
      [  106.106284]   __wake_up_common_lock+0x119/0x1b9
      [  106.106825]   sbitmap_queue_wake_up+0x33f/0x383
      [  106.107456]   sbitmap_queue_clear+0x4c/0x9a
      [  106.108046]   __blk_mq_free_request+0x188/0x1d3
      [  106.108581]   blk_mq_free_request+0x23b/0x26b
      [  106.109102]   scsi_end_request+0x345/0x5d7
      [  106.109587]   scsi_io_completion+0x4b5/0x8f0
      [  106.110099]   scsi_finish_command+0x412/0x456
      [  106.110615]   scsi_softirq_done+0x23f/0x29b
      [  106.111115]   blk_done_softirq+0x2a7/0x2e6
      [  106.111608]   __do_softirq+0x360/0x6ad
      [  106.112062]   run_ksoftirqd+0x2f/0x5b
      [  106.112499]   smpboot_thread_fn+0x3a5/0x3db
      [  106.113000]   kthread+0x1d4/0x1e4
      [  106.113457]   ret_from_fork+0x3a/0x50
      [  106.113969]
      [  106.113969] to a SOFTIRQ-irq-unsafe lock:
      [  106.114672]  (&(&sb->map[i].swap_lock)->rlock){+.+.}
      [  106.114674]
      [  106.114674] ... which became SOFTIRQ-irq-unsafe at:
      [  106.116000] ...
      [  106.116003]   _raw_spin_lock+0x33/0x64
      [  106.116676]   sbitmap_get+0xd5/0x22c
      [  106.117134]   __sbitmap_queue_get+0xe8/0x177
      [  106.117731]   __blk_mq_get_tag+0x1e6/0x22d
      [  106.118286]   blk_mq_get_tag+0x1db/0x6e4
      [  106.118756]   blk_mq_get_driver_tag+0x161/0x258
      [  106.119383]   blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.120043]   blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.120607]   blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.121234]   __blk_mq_run_hw_queue+0x137/0x17e
      [  106.121781]   __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.122366]   blk_mq_run_hw_queue+0x151/0x187
      [  106.122887]   blk_mq_sched_insert_requests+0x13f/0x175
      [  106.123492]   blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.124042]   blk_flush_plug_list+0x392/0x3d7
      [  106.124557]   blk_finish_plug+0x37/0x4f
      [  106.125019]   read_pages+0x3ef/0x430
      [  106.125446]   __do_page_cache_readahead+0x18e/0x2fc
      [  106.126027]   force_page_cache_readahead+0x121/0x133
      [  106.126621]   page_cache_sync_readahead+0x35f/0x3bb
      [  106.127229]   generic_file_buffered_read+0x410/0x1860
      [  106.127932]   __vfs_read+0x319/0x38f
      [  106.128415]   vfs_read+0xd2/0x19a
      [  106.128817]   ksys_read+0xb9/0x135
      [  106.129225]   do_syscall_64+0x140/0x385
      [  106.129684]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.130292]
      [  106.130292] other info that might help us debug this:
      [  106.130292]
      [  106.131226] Chain exists of:
      [  106.131226]   &sbq->ws[i].wait -->
      &(&hctx->dispatch_wait_lock)->rlock -->
      &(&sb->map[i].swap_lock)->rlock
      [  106.131226]
      [  106.132865]  Possible interrupt unsafe locking scenario:
      [  106.132865]
      [  106.133659]        CPU0                    CPU1
      [  106.134194]        ----                    ----
      [  106.134733]   lock(&(&sb->map[i].swap_lock)->rlock);
      [  106.135318]                                local_irq_disable();
      [  106.136014]                                lock(&sbq->ws[i].wait);
      [  106.136747]
      lock(&(&hctx->dispatch_wait_lock)->rlock);
      [  106.137742]   <Interrupt>
      [  106.138110]     lock(&sbq->ws[i].wait);
      [  106.138625]
      [  106.138625]  *** DEADLOCK ***
      [  106.138625]
      [  106.139430] 3 locks held by fio/1043:
      [  106.139947]  #0: 0000000076ff0fd9 (rcu_read_lock){....}, at:
      hctx_lock+0x29/0xe8
      [  106.140813]  #1: 000000002feb1016 (&sbq->ws[i].wait){..-.}, at:
      blk_mq_dispatch_rq_list+0x4ad/0xd7c
      [  106.141877]  #2: 000000007eec8b2f
      (&(&hctx->dispatch_wait_lock)->rlock){....}, at:
      blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.143267]
      [  106.143267] the dependencies between SOFTIRQ-irq-safe lock and the
      holding lock:
      [  106.144351]  -> (&sbq->ws[i].wait){..-.} ops: 82 {
      [  106.144926]     IN-SOFTIRQ-W at:
      [  106.145314]                       _raw_spin_lock_irqsave+0x4b/0x82
      [  106.146042]                       __wake_up_common_lock+0x119/0x1b9
      [  106.146785]                       sbitmap_queue_wake_up+0x33f/0x383
      [  106.147567]                       sbitmap_queue_clear+0x4c/0x9a
      [  106.148379]                       __blk_mq_free_request+0x188/0x1d3
      [  106.149148]                       blk_mq_free_request+0x23b/0x26b
      [  106.149864]                       scsi_end_request+0x345/0x5d7
      [  106.150546]                       scsi_io_completion+0x4b5/0x8f0
      [  106.151367]                       scsi_finish_command+0x412/0x456
      [  106.152157]                       scsi_softirq_done+0x23f/0x29b
      [  106.152855]                       blk_done_softirq+0x2a7/0x2e6
      [  106.153537]                       __do_softirq+0x360/0x6ad
      [  106.154280]                       run_ksoftirqd+0x2f/0x5b
      [  106.155020]                       smpboot_thread_fn+0x3a5/0x3db
      [  106.155828]                       kthread+0x1d4/0x1e4
      [  106.156526]                       ret_from_fork+0x3a/0x50
      [  106.157267]     INITIAL USE at:
      [  106.157713]                      _raw_spin_lock_irqsave+0x4b/0x82
      [  106.158542]                      prepare_to_wait_exclusive+0xa8/0x215
      [  106.159421]                      blk_mq_get_tag+0x34f/0x6e4
      [  106.160186]                      blk_mq_get_request+0x48e/0xaef
      [  106.160997]                      blk_mq_make_request+0x27e/0xbd2
      [  106.161828]                      generic_make_request+0x4d1/0x873
      [  106.162661]                      submit_bio+0x20c/0x253
      [  106.163379]                      mpage_bio_submit+0x44/0x4b
      [  106.164142]                      mpage_readpages+0x3c2/0x407
      [  106.164919]                      read_pages+0x13a/0x430
      [  106.165633]                      __do_page_cache_readahead+0x18e/0x2fc
      [  106.166530]                      force_page_cache_readahead+0x121/0x133
      [  106.167439]                      page_cache_sync_readahead+0x35f/0x3bb
      [  106.168337]                      generic_file_buffered_read+0x410/0x1860
      [  106.169255]                      __vfs_read+0x319/0x38f
      [  106.169977]                      vfs_read+0xd2/0x19a
      [  106.170662]                      ksys_read+0xb9/0x135
      [  106.171356]                      do_syscall_64+0x140/0x385
      [  106.172120]                      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.173051]   }
      [  106.173308]   ... key      at: [<ffffffff85094600>] __key.26481+0x0/0x40
      [  106.174219]   ... acquired at:
      [  106.174646]    _raw_spin_lock+0x33/0x64
      [  106.175183]    blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.175843]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.176518]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.177262]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.177900]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.178591]    blk_mq_run_hw_queue+0x151/0x187
      [  106.179207]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.179926]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.180571]    blk_flush_plug_list+0x392/0x3d7
      [  106.181187]    blk_finish_plug+0x37/0x4f
      [  106.181737]    __se_sys_io_submit+0x171/0x304
      [  106.182346]    do_syscall_64+0x140/0x385
      [  106.182895]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.183607]
      [  106.183830] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 {
      [  106.184691]    INITIAL USE at:
      [  106.185119]                    _raw_spin_lock+0x33/0x64
      [  106.185838]                    blk_mq_dispatch_rq_list+0x4c1/0xd7c
      [  106.186697]                    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.187551]                    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.188481]                    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.189307]                    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.190189]                    blk_mq_run_hw_queue+0x151/0x187
      [  106.190989]                    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.191902]                    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.192739]                    blk_flush_plug_list+0x392/0x3d7
      [  106.193535]                    blk_finish_plug+0x37/0x4f
      [  106.194269]                    __se_sys_io_submit+0x171/0x304
      [  106.195059]                    do_syscall_64+0x140/0x385
      [  106.195794]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.196705]  }
      [  106.196950]  ... key      at: [<ffffffff84880620>] __key.51231+0x0/0x40
      [  106.197853]  ... acquired at:
      [  106.198270]    lock_acquire+0x280/0x2f3
      [  106.198806]    _raw_spin_lock+0x33/0x64
      [  106.199337]    sbitmap_get+0xd5/0x22c
      [  106.199850]    __sbitmap_queue_get+0xe8/0x177
      [  106.200450]    __blk_mq_get_tag+0x1e6/0x22d
      [  106.201035]    blk_mq_get_tag+0x1db/0x6e4
      [  106.201589]    blk_mq_get_driver_tag+0x161/0x258
      [  106.202237]    blk_mq_dispatch_rq_list+0x5b9/0xd7c
      [  106.202902]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.203572]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.204316]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.204956]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.205649]    blk_mq_run_hw_queue+0x151/0x187
      [  106.206269]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.206997]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.207644]    blk_flush_plug_list+0x392/0x3d7
      [  106.208264]    blk_finish_plug+0x37/0x4f
      [  106.208814]    __se_sys_io_submit+0x171/0x304
      [  106.209415]    do_syscall_64+0x140/0x385
      [  106.209965]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.210684]
      [  106.210904]
      [  106.210904] the dependencies between the lock to be acquired
      [  106.210905]  and SOFTIRQ-irq-unsafe lock:
      [  106.212541] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 1969 {
      [  106.213393]    HARDIRQ-ON-W at:
      [  106.213840]                     _raw_spin_lock+0x33/0x64
      [  106.214570]                     sbitmap_get+0xd5/0x22c
      [  106.215282]                     __sbitmap_queue_get+0xe8/0x177
      [  106.216086]                     __blk_mq_get_tag+0x1e6/0x22d
      [  106.216876]                     blk_mq_get_tag+0x1db/0x6e4
      [  106.217627]                     blk_mq_get_driver_tag+0x161/0x258
      [  106.218465]                     blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.219326]                     blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.220198]                     blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.221138]                     __blk_mq_run_hw_queue+0x137/0x17e
      [  106.221975]                     __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.222874]                     blk_mq_run_hw_queue+0x151/0x187
      [  106.223686]                     blk_mq_sched_insert_requests+0x13f/0x175
      [  106.224597]                     blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.225444]                     blk_flush_plug_list+0x392/0x3d7
      [  106.226255]                     blk_finish_plug+0x37/0x4f
      [  106.227006]                     read_pages+0x3ef/0x430
      [  106.227717]                     __do_page_cache_readahead+0x18e/0x2fc
      [  106.228595]                     force_page_cache_readahead+0x121/0x133
      [  106.229491]                     page_cache_sync_readahead+0x35f/0x3bb
      [  106.230373]                     generic_file_buffered_read+0x410/0x1860
      [  106.231277]                     __vfs_read+0x319/0x38f
      [  106.231986]                     vfs_read+0xd2/0x19a
      [  106.232666]                     ksys_read+0xb9/0x135
      [  106.233350]                     do_syscall_64+0x140/0x385
      [  106.234097]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.235012]    SOFTIRQ-ON-W at:
      [  106.235460]                     _raw_spin_lock+0x33/0x64
      [  106.236195]                     sbitmap_get+0xd5/0x22c
      [  106.236913]                     __sbitmap_queue_get+0xe8/0x177
      [  106.237715]                     __blk_mq_get_tag+0x1e6/0x22d
      [  106.238488]                     blk_mq_get_tag+0x1db/0x6e4
      [  106.239244]                     blk_mq_get_driver_tag+0x161/0x258
      [  106.240079]                     blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.240937]                     blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.241806]                     blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.242751]                     __blk_mq_run_hw_queue+0x137/0x17e
      [  106.243579]                     __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.244469]                     blk_mq_run_hw_queue+0x151/0x187
      [  106.245277]                     blk_mq_sched_insert_requests+0x13f/0x175
      [  106.246191]                     blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.247044]                     blk_flush_plug_list+0x392/0x3d7
      [  106.247859]                     blk_finish_plug+0x37/0x4f
      [  106.248749]                     read_pages+0x3ef/0x430
      [  106.249463]                     __do_page_cache_readahead+0x18e/0x2fc
      [  106.250357]                     force_page_cache_readahead+0x121/0x133
      [  106.251263]                     page_cache_sync_readahead+0x35f/0x3bb
      [  106.252157]                     generic_file_buffered_read+0x410/0x1860
      [  106.253084]                     __vfs_read+0x319/0x38f
      [  106.253808]                     vfs_read+0xd2/0x19a
      [  106.254488]                     ksys_read+0xb9/0x135
      [  106.255186]                     do_syscall_64+0x140/0x385
      [  106.255943]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.256867]    INITIAL USE at:
      [  106.257300]                    _raw_spin_lock+0x33/0x64
      [  106.258033]                    sbitmap_get+0xd5/0x22c
      [  106.258747]                    __sbitmap_queue_get+0xe8/0x177
      [  106.259542]                    __blk_mq_get_tag+0x1e6/0x22d
      [  106.260320]                    blk_mq_get_tag+0x1db/0x6e4
      [  106.261072]                    blk_mq_get_driver_tag+0x161/0x258
      [  106.261902]                    blk_mq_dispatch_rq_list+0x28e/0xd7c
      [  106.262762]                    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.263626]                    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.264571]                    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.265409]                    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.266302]                    blk_mq_run_hw_queue+0x151/0x187
      [  106.267111]                    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.268028]                    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.268878]                    blk_flush_plug_list+0x392/0x3d7
      [  106.269694]                    blk_finish_plug+0x37/0x4f
      [  106.270432]                    read_pages+0x3ef/0x430
      [  106.271139]                    __do_page_cache_readahead+0x18e/0x2fc
      [  106.272040]                    force_page_cache_readahead+0x121/0x133
      [  106.272932]                    page_cache_sync_readahead+0x35f/0x3bb
      [  106.273811]                    generic_file_buffered_read+0x410/0x1860
      [  106.274709]                    __vfs_read+0x319/0x38f
      [  106.275407]                    vfs_read+0xd2/0x19a
      [  106.276074]                    ksys_read+0xb9/0x135
      [  106.276764]                    do_syscall_64+0x140/0x385
      [  106.277500]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  106.278417]  }
      [  106.278676]  ... key      at: [<ffffffff85094640>] __key.26212+0x0/0x40
      [  106.279586]  ... acquired at:
      [  106.280026]    lock_acquire+0x280/0x2f3
      [  106.280559]    _raw_spin_lock+0x33/0x64
      [  106.281101]    sbitmap_get+0xd5/0x22c
      [  106.281610]    __sbitmap_queue_get+0xe8/0x177
      [  106.282221]    __blk_mq_get_tag+0x1e6/0x22d
      [  106.282809]    blk_mq_get_tag+0x1db/0x6e4
      [  106.283368]    blk_mq_get_driver_tag+0x161/0x258
      [  106.284018]    blk_mq_dispatch_rq_list+0x5b9/0xd7c
      [  106.284685]    blk_mq_do_dispatch_sched+0x23a/0x287
      [  106.285371]    blk_mq_sched_dispatch_requests+0x379/0x3fc
      [  106.286135]    __blk_mq_run_hw_queue+0x137/0x17e
      [  106.286806]    __blk_mq_delay_run_hw_queue+0x80/0x25f
      [  106.287515]    blk_mq_run_hw_queue+0x151/0x187
      [  106.288149]    blk_mq_sched_insert_requests+0x13f/0x175
      [  106.289041]    blk_mq_flush_plug_list+0x7d6/0x81b
      [  106.289912]    blk_flush_plug_list+0x392/0x3d7
      [  106.290590]    blk_finish_plug+0x37/0x4f
      [  106.291238]    __se_sys_io_submit+0x171/0x304
      [  106.291864]    do_syscall_64+0x140/0x385
      [  106.292534]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reported-by: Ming Lei <ming.lei@redhat.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      58ab5e32
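The workaround described in this commit (disable interrupts around swap_lock only when lockdep is enabled) follows a pattern that can be mocked in user space as below. Everything here is a hypothetical stand-in: DEMO_LOCKDEP models a lockdep-enabled build, and the demo_* helpers only imitate the behaviour of the kernel's local_irq_save(), local_irq_restore() and spin_lock() by tracking a boolean IRQ state. It is a sketch of the shape of the fix, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Set to 1 to model a lockdep-enabled build (hypothetical macro). */
#define DEMO_LOCKDEP 1

static bool demo_irqs_enabled = true;  /* models the CPU's IRQ state */
static bool demo_irqs_on_at_lock;      /* IRQ state seen when the lock was taken */

static void demo_local_irq_save(bool *flags)
{
	*flags = demo_irqs_enabled;
	demo_irqs_enabled = false;
}

static void demo_local_irq_restore(bool flags)
{
	demo_irqs_enabled = flags;
}

/* Mock lock: just records whether IRQs were enabled at acquisition. */
static void demo_spin_lock(void)
{
	demo_irqs_on_at_lock = demo_irqs_enabled;
}

static void demo_spin_unlock(void)
{
}

static void demo_swap_bits(void)
{
#if DEMO_LOCKDEP
	bool flags;

	/* Only needed so the lock is always taken with IRQs off. */
	demo_local_irq_save(&flags);
#endif
	demo_spin_lock();
	/* ... swap the word and the cleared mask here ... */
	demo_spin_unlock();
#if DEMO_LOCKDEP
	demo_local_irq_restore(flags);
#endif
}
```

In this mock, builds with DEMO_LOCKDEP set to 0 take the lock without touching the IRQ state, which mirrors the intent of paying the extra cost only when the checker is active.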
  2. 09 Dec, 2018 20 commits
    • Linux 4.20-rc6 · 40e020c1
      Linus Torvalds authored
      40e020c1
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · d48f782e
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A decent batch of fixes here. I'd say about half are for problems that
        have existed for a while, and half are for new regressions added in
        the 4.20 merge window.
      
         1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach.
      
         2) Revert bogus emac driver change, from Benjamin Herrenschmidt.
      
         3) Handle BPF exported data structure with pointers when building
            32-bit userland, from Daniel Borkmann.
      
         4) Memory leak fix in act_police, from Davide Caratti.
      
         5) Check RX checksum offload in RX descriptors properly in aquantia
            driver, from Dmitry Bogdanov.
      
         6) SKB unlink fix in various spots, from Edward Cree.
      
         7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from
            Eric Dumazet.
      
         8) Fix FID leak in mlxsw driver, from Ido Schimmel.
      
         9) IOTLB locking fix in vhost, from Jean-Philippe Brucker.
      
        10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory
            limits otherwise namespace exit can hang. From Jiri Wiesner.
      
        11) Address block parsing length fixes in x25 from Martin Schiller.
      
        12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan.
      
        13) For tun interfaces, only iface delete works with rtnl ops, enforce
            this by disallowing add. From Nicolas Dichtel.
      
        14) Use after free in liquidio, from Pan Bian.
      
        15) Fix SKB use after passing to netif_receive_skb(), from Prashant
            Bhole.
      
        16) Static key accounting and other fixes in XPS from Sabrina Dubroca.
      
        17) Partially initialized flow key passed to ip6_route_output(), from
            Shmulik Ladkani.
      
        18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas
            Falcon.
      
        19) Several small TCP fixes (off-by-one on window probe abort, NULL
            deref in tail loss probe, SNMP mis-estimations) from Yuchung
            Cheng"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
        net/sched: cls_flower: Reject duplicated rules also under skip_sw
        bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
        bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
        bnxt_en: Keep track of reserved IRQs.
        bnxt_en: Fix CNP CoS queue regression.
        net/mlx4_core: Correctly set PFC param if global pause is turned off.
        Revert "net/ibm/emac: wrong bit is used for STA control"
        neighbour: Avoid writing before skb->head in neigh_hh_output()
        ipv6: Check available headroom in ip6_xmit() even without options
        tcp: lack of available data can also cause TSO defer
        ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
        mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl
        mlxsw: spectrum_router: Relax GRE decap matching check
        mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
        mlxsw: spectrum_nve: Remove easily triggerable warnings
        ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
        sctp: frag_point sanity check
        tcp: fix NULL ref in tail loss probe
        tcp: Do not underestimate rwnd_limited
        net: use skb_list_del_init() to remove from RX sublists
        ...
      d48f782e
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8586ca8a
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact
        fix and an LLVM workaround"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vdso: Drop implicit common-page-size linker flag
        x86/build: Fix compiler support check for CONFIG_RETPOLINE
        x86/boot: Clear RSDP address in boot_params for broken loaders
      8586ca8a
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ebbd3000
      Linus Torvalds authored
      Pull kprobes fixes from Ingo Molnar:
       "Two kprobes fixes: a blacklist fix and an instruction patching related
        corruption fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kprobes/x86: Blacklist non-attachable interrupt functions
        kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
      ebbd3000
    • Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b04e73a
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Two fixes: a large-system fix and an earlyprintk fix with certain
        resolutions"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/earlyprintk/efi: Fix infinite loop on some screen widths
        x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
      4b04e73a
    • net/sched: cls_flower: Reject duplicated rules also under skip_sw · 35cc3cef
      Or Gerlitz authored
      Currently, duplicated rules are rejected only for skip_hw or "none",
      hence allowing users to push duplicates into HW for no reason.
      
      Use the flower tables to protect against that.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reported-by: Chris Mi <chrism@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35cc3cef
    • Merge branch 'bnxt_en-Bug-fixes' · d4b60e94
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      The first patch fixes a regression on CoS queue setup, introduced
      recently by the 57500 new chip support patches.  The rest are
      fixes related to ring and resource accounting on the new 57500 chips.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4b60e94
    • bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. · e30fbc33
      Michael Chan authored
      The CP rings are accounted differently on the new 57500 chips.  There
      must be enough CP rings for the sum of RX and TX rings on the new
      chips.  The current logic may be over-estimating the RX and TX rings.
      
      The output parameter max_cp should be the maximum NQs capped by
      MSIX vectors available for networking in the context of 57500 chips.
      The existing code which uses CMPL rings capped by the MSIX vectors
      works most of the time but is not always correct.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e30fbc33
    • Michael Chan's avatar
      bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. · c0b8cda0
      Michael Chan authored
      The new 57500 chips have introduced the NQ structure in addition to
      the existing CP rings in all chips.  We need to introduce a new
      bnxt_nq_rings_in_use().  On legacy chips, the 2 functions are the
      same and one will just call the other.  On the new chips, they
      refer to the 2 separate ring structures.  The new function is now
      called to determine the resource (NQ or CP rings) associated with
      MSIX that are in use.
      
      On 57500 chips, the RDMA driver does not use the CP rings so
      we don't need to do the subtraction adjustment.
      
      Fixes: 41e8d798 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0b8cda0
    • Michael Chan's avatar
      bnxt_en: Keep track of reserved IRQs. · 75720e63
      Michael Chan authored
      The new 57500 chips use 1 NQ per MSIX vector, whereas legacy chips use
      1 CP ring per MSIX vector.  To better unify this, add a resv_irqs
      field to struct bnxt_hw_resc.  On legacy chips, we initialize resv_irqs
      with resv_cp_rings.  On new chips, we initialize it with the allocated
      MSIX resources.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75720e63
    • Michael Chan's avatar
      bnxt_en: Fix CNP CoS queue regression. · 804fba4e
      Michael Chan authored
      Recent changes to support the 57500 devices have created this
      regression.  The bnxt_hwrm_queue_qportcfg() call was moved to be
      called earlier before the RDMA support was determined, causing
      the CoS queues configuration to be set before knowing whether RDMA
      was supported or not.  Fix it by moving it to the right place right
      after RDMA support is determined.
      
      Fixes: 98f04cf0 ("bnxt_en: Check context memory requirements from firmware.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      804fba4e
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 0844895a
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small driver fixes for 4.20-rc6.
      
        There is a hyperv fix that for some reason took forever to get into a
        shape that could be applied to the tree properly, but resolves a much
        reported issue. The others are some gnss patches, one a bugfix and the
        two others updates to the MAINTAINERS file to properly match the gnss
        files in the tree.
      
        All have been in linux-next for a while with no reported issues"
      
      * tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
        MAINTAINERS: add gnss scm tree
        gnss: sirf: fix activation retry handling
        Drivers: hv: vmbus: Offload the handling of channels to two workqueues
      0844895a
    • Linus Torvalds's avatar
      Merge tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 47dcb080
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are two staging driver bugfixes for 4.20-rc6.
      
        One is a revert of a previously incorrect patch that was merged a
        while ago, and the other resolves a possible buffer overrun that was
        found by code inspection.
      
        Both of these have been in the linux-next tree with no reported
        issues"
      
      * tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        Revert commit ef9209b6 "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c"
        staging: rtl8712: Fix possible buffer overrun
      47dcb080
    • Linus Torvalds's avatar
      Merge tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 822b7683
      Linus Torvalds authored
      Pull tty driver fixes from Greg KH:
       "Here are three small tty driver fixes for 4.20-rc6
      
        Nothing major, just some bug fixes for reported issues. Full details
        are in the shortlog.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var()
        tty: serial: 8250_mtk: always resume the device in probe.
        tty: do not set TTY_IO_ERROR flag if console port
      822b7683
    • Linus Torvalds's avatar
      Merge tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 50a5528a
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 4.20-rc6
      
        The "largest" here are some xhci fixes for reported issues. Also here
        is a USB core fix, some quirk additions, and a usb-serial fix which
        required the export of one of the tty layer's functions to prevent
        code duplication. The tty maintainer agreed with this change.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: Prevent U1/U2 link pm states if exit latency is too long
        xhci: workaround CSS timeout on AMD SNPS 3.0 xHC
        USB: check usb_get_extra_descriptor for proper size
        USB: serial: console: fix reported terminal settings
        usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device
        USB: Fix invalid-free bug in port_over_current_notify()
        usb: appledisplay: Add 27" Apple Cinema Display
      50a5528a
    • Linus Torvalds's avatar
      Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · bc4caf18
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for
        stable and a minor cifs Kconfig fix"
      
      * tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Avoid returning EBUSY to upper layer VFS
        cifs: Fix separator when building path from dentry
        cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)
      bc4caf18
    • Linus Torvalds's avatar
      Merge tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · fa82dcbf
      Linus Torvalds authored
      Pull dax fixes from Dan Williams:
       "The last of the known regression fixes and fallout from the Xarray
        conversion of the filesystem-dax implementation.
      
        On the path to debugging why the dax memory-failure injection test
        started failing after the Xarray conversion a couple more fixes for
        the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced.
        Those plus the bug that started the hunt are now addressed. These
        patches have appeared in a -next release with no issues reported.
      
        Note the touches to mm/memory-failure.c are just the conversion to the
        new function signature for dax_lock_page().
      
        Summary:
      
         - Fix the Xarray conversion of fsdax to properly handle
           dax_lock_mapping_entry() in the presence of pmd entries
      
         - Fix inode destruction racing a new lock request"
      
      * tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: Fix unlock mismatch with updated API
        dax: Don't access a freed inode
        dax: Check page->mapping isn't NULL
      fa82dcbf
    • Linus Torvalds's avatar
      Merge tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · bd799eb6
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A regression fix for the Address Range Scrub implementation, yes
        another one, and support for platforms that misalign persistent memory
        relative to the Linux memory hotplug section constraint. Longer term,
        support for sub-section memory hotplug would alleviate alignment
        waste, but until then this hack allows a 'struct page' memmap to be
        established for these misaligned memory regions.
      
        These have all appeared in a -next release, and thanks to Patrick for
        reporting and testing the alignment padding fix.
      
        Summary:
      
         - Unless and until the core mm handles memory hotplug units smaller
           than a section (128M), persistent memory namespaces must be padded
           to section alignment.
      
           The libnvdimm core already handled section collision with "System
           RAM", but some configurations overlap independent "Persistent
           Memory" ranges within a section, so additional padding injection is
           added for that case.
      
         - The recent reworks of the ARS (address range scrub) state machine
           to reduce the number of state flags inadvertently missed a
           conversion of acpi_nfit_ars_rescan() call sites. Fix the regression
           whereby user-requested ARS results in a "short" scrub rather than a
           "long" scrub.
      
         - Fixup the unit tests to handle / test the 128M section alignment of
           mocked test resources.
      
      * tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
        libnvdimm, pfn: Pad pfn namespaces relative to other regions
        tools/testing/nvdimm: Align test resources to 128M
      bd799eb6
    • Tarick Bedeir's avatar
      net/mlx4_core: Correctly set PFC param if global pause is turned off. · bd5122cd
      Tarick Bedeir authored
      rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1.
      
      Fixes: 6e8814ce ("net/mlx4_en: Fix mixed PFC and Global pause user control requests")
      Signed-off-by: Tarick Bedeir <tarick@google.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd5122cd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 6ec067e3
      Linus Torvalds authored
      Pull thermal SoC fixes from Eduardo Valentin:
       "Fixes for armada and broadcom thermal drivers"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
        thermal: broadcom: constify thermal_zone_of_device_ops structure
        thermal: armada: constify thermal_zone_of_device_ops structure
        thermal: bcm2835: Switch to SPDX identifier
        thermal: armada: fix legacy resource fixup
        thermal: armada: fix legacy validity test sense
      6ec067e3
  3. 08 Dec, 2018 11 commits