1. 05 May, 2018 3 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-block · 2f50037a
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should to into this release. This contains:
      
         - Set of bcache fixes from Coly, fixing regression in patches that
           went into this series.
      
         - Set of NVMe fixes by way of Keith.
      
         - Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
           fixing various issues around device addition/removal.
      
         - Two block inflight fixes from Omar, fixing issues around the
           transition to using tags for blk-mq inflight accounting that we
           did a few releases ago"
      
      * tag 'for-linus-20180504' of git://git.kernel.dk/linux-block:
        bdi: Fix oops in wb_workfn()
        nvmet: switch loopback target state to connecting when resetting
        nvme/multipath: Fix multipath disabled naming collisions
        nvme/multipath: Disable runtime writable enabling parameter
        nvme: Set integrity flag for user passthrough commands
        nvme: fix potential memory leak in option parsing
        bdi: Fix use after free bug in debugfs_remove()
        bdi: wake up concurrent wb_shutdown() callers.
        bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
        bcache: set dc->io_disable to true in conditional_stop_bcache_device()
        bcache: add wait_for_kthread_stop() in bch_allocator_thread()
        bcache: count backing device I/O error for writeback I/O
        bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
        bcache: store disk name in struct cache and struct cached_dev
        blk-mq: fix sysfs inflight counter
        blk-mq: count allocated but not started requests in iostats inflight
      2f50037a
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 2e171ffc
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "I've got one more bug fix for xfs for 4.17-rc4, which caps the amount
        of data we try to handle in one dedupe request so that userspace can't
        livelock the kernel.
      
        This series has been run through a full xfstests run during the week
        and through a quick xfstests run against this morning's master, with
        no ajor failures reported.
      
        Summary:
      
        - Cap the maximum length of a deduplication request at MAX_RW_COUNT/2
          to avoid kernel livelock due to excessively large IO requests"
      
      * tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: cap the length of deduplication requests
      2e171ffc
    • Linus Torvalds's avatar
      Merge tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 4148d388
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Two regression fixes and one fix for stable"
      
      * tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: send, fix missing truncate for inode with prealloc extent past eof
        btrfs: Take trans lock before access running trans in check_delayed_ref
        btrfs: Fix wrong first_key parameter in replace_path
      4148d388
  2. 04 May, 2018 11 commits
  3. 03 May, 2018 26 commits
    • Jan Kara's avatar
      bdi: Fix oops in wb_workfn() · b8b78495
      Jan Kara authored
      Syzbot has reported that it can hit a NULL pointer dereference in
      wb_workfn() due to wb->bdi->dev being NULL. This indicates that
      wb_workfn() was called for an already unregistered bdi which should not
      happen as wb_shutdown() called from bdi_unregister() should make sure
      all pending writeback works are completed before bdi is unregistered.
      Except that wb_workfn() itself can requeue the work with:
      
      	mod_delayed_work(bdi_wq, &wb->dwork, 0);
      
      and if this happens while wb_shutdown() is waiting in:
      
      	flush_delayed_work(&wb->dwork);
      
      the dwork can get executed after wb_shutdown() has finished and
      bdi_unregister() has cleared wb->bdi->dev.
      
      Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
      the necessary precautions against racing with bdi unregistration.
      
      CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      CC: Tejun Heo <tj@kernel.org>
      Fixes: 839a8e86Reported-by: default avatarsyzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b8b78495
    • Eric Dumazet's avatar
      dccp: fix tasklet usage · a8d7aa17
      Eric Dumazet authored
      syzbot reported a crash in tasklet_action_common() caused by dccp.
      
      dccp needs to make sure socket wont disappear before tasklet handler
      has completed.
      
      This patch takes a reference on the socket when arming the tasklet,
      and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet()
      
      kernel BUG at kernel/softirq.c:514!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ #30
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515
      RSP: 0018:ffff8801d9b3faf8 EFLAGS: 00010246
      dccp_close: ABORT with 65423 bytes unread
      RAX: 1ffff1003b367f6b RBX: ffff8801daf1f3f0 RCX: 0000000000000000
      RDX: ffff8801cf895498 RSI: 0000000000000004 RDI: 0000000000000000
      RBP: ffff8801d9b3fc40 R08: ffffed0039f12a95 R09: ffffed0039f12a94
      dccp_close: ABORT with 65423 bytes unread
      R10: ffffed0039f12a94 R11: ffff8801cf8954a3 R12: 0000000000000000
      R13: ffff8801d9b3fc18 R14: dffffc0000000000 R15: ffff8801cf895490
      FS:  0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2bc28000 CR3: 00000001a08a9000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tasklet_action+0x1d/0x20 kernel/softirq.c:533
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
      dccp_close: ABORT with 65423 bytes unread
       run_ksoftirqd+0x86/0x100 kernel/softirq.c:646
       smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
       kthread+0x345/0x410 kernel/kthread.c:238
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8
      RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: ffff8801d9b3faf8
      
      Fixes: dc841e30 ("dccp: Extend CCID packet dequeueing interface")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8d7aa17
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · 31140b47
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018/05/03
      
      here are smc fixes for 2 problems:
       * receive buffers in SMC must be registered. If registration fails
         these buffers must not be kept within the link group for reuse.
         Patch 1 is a preparational patch; patch 2 contains the fix.
       * sendpage: do not hold the sock lock when calling kernel_sendpage()
                   or sock_no_sendpage()
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31140b47
    • Stefan Raspl's avatar
      smc: fix sendpage() call · bda27ff5
      Stefan Raspl authored
      The sendpage() call grabs the sock lock before calling the default
      implementation - which tries to grab it once again.
      Signed-off-by: default avatarStefan Raspl <raspl@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bda27ff5
    • Karsten Graul's avatar
      net/smc: handle unregistered buffers · a6920d1d
      Karsten Graul authored
      When smc_wr_reg_send() fails then tag (regerr) the affected buffer and
      free it in smc_buf_unuse().
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6920d1d
    • Karsten Graul's avatar
      net/smc: call consolidation · e63a5f8c
      Karsten Graul authored
      Consolidate the call to smc_wr_reg_send() in a new function.
      No functional changes.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e63a5f8c
    • Colin Ian King's avatar
      qed: fix spelling mistake: "offloded" -> "offloaded" · df80b8fb
      Colin Ian King authored
      Trivial fix to spelling mistake in DP_NOTICE message
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df80b8fb
    • Colin Ian King's avatar
      net/mlx5e: fix spelling mistake: "loobpack" -> "loopback" · 4e11581c
      Colin Ian King authored
      Trivial fix to spelling mistake in netdev_err error message
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e11581c
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping · c15f6d8d
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix an incorrect warning selection introduced in the last merge
        window"
      
      * tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: fix inversed DMA_ATTR_NO_WARN test
      c15f6d8d
    • Johannes Thumshirn's avatar
      nvmet: switch loopback target state to connecting when resetting · 8bfc3b4c
      Johannes Thumshirn authored
      After commit bb06ec31 ("nvme: expand nvmf_check_if_ready checks")
      resetting of the loopback nvme target failed as we forgot to switch
      it's state to NVME_CTRL_CONNECTING before we reconnect the admin
      queues. Therefore the checks in nvmf_check_if_ready() choose to go to
      the reject_io case and thus we couldn't sent out an identify
      controller command to reconnect.
      
      Change the controller state to NVME_CTRL_CONNECTING after tearing down
      the old connection and before re-establishing the connection.
      
      Fixes: bb06ec31 ("nvme: expand nvmf_check_if_ready checks")
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8bfc3b4c
    • Keith Busch's avatar
      nvme/multipath: Fix multipath disabled naming collisions · a785dbcc
      Keith Busch authored
      When CONFIG_NVME_MULTIPATH is set, but we're not using nvme to multipath,
      namespaces with multiple paths were not creating unique names due to
      reusing the same instance number from the namespace's head.
      
      This patch fixes this by falling back to the non-multipath naming method
      when the parameter disabled using multipath.
      Reported-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a785dbcc
    • Keith Busch's avatar
      nvme/multipath: Disable runtime writable enabling parameter · 5cadde80
      Keith Busch authored
      We can't allow the user to change multipath settings at runtime, as this
      will create naming conflicts due to the different naming schemes used
      for each mode.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5cadde80
    • Keith Busch's avatar
      nvme: Set integrity flag for user passthrough commands · f31a2110
      Keith Busch authored
      If the command a separate metadata buffer attached, the request needs
      to have the integrity flag set so the driver knows to map it.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f31a2110
    • Chengguang Xu's avatar
      nvme: fix potential memory leak in option parsing · 59a2f3f0
      Chengguang Xu authored
      When specifying same string type option several times,
      current option parsing may cause memory leak. Hence,
      call kfree for previous one in this case.
      Signed-off-by: default avatarChengguang Xu <cgxu519@gmx.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      59a2f3f0
    • Tetsuo Handa's avatar
      bdi: Fix use after free bug in debugfs_remove() · f53823c1
      Tetsuo Handa authored
      syzbot is reporting use after free bug in debugfs_remove() [1].
      
      This is because fault injection made memory allocation for
      debugfs_create_file() from bdi_debug_register() from bdi_register_va()
      fail and continued with setting WB_registered. But when debugfs_remove()
      is called from debugfs_remove(bdi->debug_dir) from bdi_debug_unregister()
       from bdi_unregister() from release_bdi() because WB_registered was set
      by bdi_register_va(), IS_ERR_OR_NULL(bdi->debug_dir) == false despite
      debugfs_remove(bdi->debug_dir) was already called from bdi_register_va().
      
      Fix this by making IS_ERR_OR_NULL(bdi->debug_dir) == true.
      
      [1] https://syzkaller.appspot.com/bug?id=5ab4efd91a96dcea9b68104f159adf4af2a6dfc1Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+049cb4ae097049dac137@syzkaller.appspotmail.com>
      Fixes: 97f07697 ("bdi: convert bdi_debug_register to int")
      Cc: weiping zhang <zhangweiping@didichuxing.com>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f53823c1
    • Eric Dumazet's avatar
      tcp: restore autocorking · 114f39fe
      Eric Dumazet authored
      When adding rb-tree for TCP retransmit queue, we inadvertently broke
      TCP autocorking.
      
      tcp_should_autocork() should really check if the rtx queue is not empty.
      
      Tested:
      
      Before the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      2682.85   2.47     1.59     3.618   2.329
      TcpExtTCPAutoCorking            33                 0.0
      
      // Same test, but forcing TCP_NODELAY
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      1408.75   2.44     2.96     6.802   8.259
      TcpExtTCPAutoCorking            1                  0.0
      
      After the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5472.46   2.45     1.43     1.761   1.027
      TcpExtTCPAutoCorking            361293             0.0
      
      // With TCP_NODELAY option
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5454.96   2.46     1.63     1.775   1.174
      TcpExtTCPAutoCorking            315448             0.0
      
      Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichael Wenig <mwenig@vmware.com>
      Tested-by: default avatarMichael Wenig <mwenig@vmware.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichael Wenig <mwenig@vmware.com>
      Tested-by: default avatarMichael Wenig <mwenig@vmware.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      114f39fe
    • Eric Dumazet's avatar
      rds: do not leak kernel memory to user land · eb80ca47
      Eric Dumazet authored
      syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
      from rds_cmsg_recv().
      
      Simply clear the structure, since we have holes there, or since
      rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
      
      BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
      BUG: KMSAN: uninit-value in put_cmsg+0x600/0x870 net/core/scm.c:242
      CPU: 0 PID: 4459 Comm: syz-executor582 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
       kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
       copy_to_user include/linux/uaccess.h:184 [inline]
       put_cmsg+0x600/0x870 net/core/scm.c:242
       rds_cmsg_recv net/rds/recv.c:570 [inline]
       rds_recvmsg+0x2db5/0x3170 net/rds/recv.c:657
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0x1d0/0x230 net/socket.c:810
       ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205
       __sys_recvmsg net/socket.c:2250 [inline]
       SYSC_recvmsg+0x298/0x3c0 net/socket.c:2262
       SyS_recvmsg+0x54/0x80 net/socket.c:2257
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 3289025a ("RDS: add receive message trace used by application")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: linux-rdma <linux-rdma@vger.kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb80ca47
    • Tetsuo Handa's avatar
      bdi: wake up concurrent wb_shutdown() callers. · 8236b0ae
      Tetsuo Handa authored
      syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
      wb_shutdown() [1]. This seems to be because commit 5318ce7d ("bdi:
      Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
      wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
      
      Introduce a helper function clear_and_wake_up_bit() and use it, in order
      to avoid similar errors in future.
      
      [1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5eSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
      Fixes: 5318ce7d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8236b0ae
    • Bjørn Mork's avatar
      qmi_wwan: do not steal interfaces from class drivers · 5697db4a
      Bjørn Mork authored
      The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
      the { vendorid, productid, interfacenumber } set uniquely
      identifies one specific function.  This has proven to fail
      for some configurable devices. One example is the Quectel
      EM06/EP06 where the same interface number can be either
      QMI or MBIM, without the device ID changing either.
      
      Fix by requiring the vendor-specific class for interface number
      based matching.  Functions of other classes can and should use
      class based matching instead.
      
      Fixes: 03304bcb ("net: qmi_wwan: use fixed interface number matching")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5697db4a
    • Coly Li's avatar
      bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set · 09a44ca2
      Coly Li authored
      It is possible that multiple I/O requests hits on failed cache device or
      backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
      set already when a task tries to set the bit from bch_cache_set_error().
      Currently the message "CACHE_SET_IO_DISABLE already set" is printed by
      pr_warn(), which might mislead users to think a serious fault happens in
      source code.
      
      This patch uses pr_info() to print the information in such situation,
      avoid extra worries. This information is helpful to understand bcache
      behavior in cache device failures, so I still keep them in source code.
      
      Fixes: 771f393e ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      09a44ca2
    • Coly Li's avatar
      bcache: set dc->io_disable to true in conditional_stop_bcache_device() · 4fd8e138
      Coly Li authored
      Commit 7e027ca4 ("bcache: add stop_when_cache_set_failed option to
      backing device") adds stop_when_cache_set_failed option and stops bcache
      device if stop_when_cache_set_failed is auto and there is dirty data on
      broken cache device. There might exists a small time gap that the cache
      set is released and set to NULL but bcache device is not released yet
      (because they are released in parallel). During this time gap, dc->c is
      NULL so CACHE_SET_IO_DISABLE won't be checked, and dc->io_disable is still
      false, so new coming I/O requests will be accepted and directly go into
      backing device as no cache set attached to. If there is dirty data on
      cache device, this behavior may introduce potential inconsistent data.
      
      This patch sets dc->io_disable to true before calling bcache_device_stop()
      to make sure the backing device will reject new coming I/O request as
      well, so even in the small time gap no I/O will directly go into backing
      device to corrupt data consistency.
      
      Fixes: 7e027ca4 ("bcache: add stop_when_cache_set_failed option to backing device")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4fd8e138
    • Coly Li's avatar
      bcache: add wait_for_kthread_stop() in bch_allocator_thread() · ecb2ba8c
      Coly Li authored
      When CACHE_SET_IO_DISABLE is set on cache set flags, bcache allocator
      thread routine bch_allocator_thread() may stop the while-loops and
      exit. Then it is possible to observe the following kernel oops message,
      
      [  631.068366] bcache: bch_btree_insert() error -5
      [  631.069115] bcache: cached_dev_detach_finish() Caching disabled for sdf
      [  631.070220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [  631.070250] PGD 0 P4D 0
      [  631.070261] Oops: 0002 [#1] SMP PTI
      [snipped]
      [  631.070578] Workqueue: events cache_set_flush [bcache]
      [  631.070597] RIP: 0010:exit_creds+0x1b/0x50
      [  631.070610] RSP: 0018:ffffc9000705fe08 EFLAGS: 00010246
      [  631.070626] RAX: 0000000000000001 RBX: ffff880a622ad300 RCX: 000000000000000b
      [  631.070645] RDX: 0000000000000601 RSI: 000000000000000c RDI: 0000000000000000
      [  631.070663] RBP: ffff880a622ad300 R08: ffffea00190c66e0 R09: 0000000000000200
      [  631.070682] R10: ffff880a48123000 R11: ffff880000000000 R12: 0000000000000000
      [  631.070700] R13: ffff880a4b160e40 R14: ffff880a4b160000 R15: 0ffff880667e2530
      [  631.070719] FS:  0000000000000000(0000) GS:ffff880667e00000(0000) knlGS:0000000000000000
      [  631.070740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  631.070755] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000003606e0
      [  631.070774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  631.070793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  631.070811] Call Trace:
      [  631.070828]  __put_task_struct+0x55/0x160
      [  631.070845]  kthread_stop+0xee/0x100
      [  631.070863]  cache_set_flush+0x11d/0x1a0 [bcache]
      [  631.070879]  process_one_work+0x146/0x340
      [  631.070892]  worker_thread+0x47/0x3e0
      [  631.070906]  kthread+0xf5/0x130
      [  631.070917]  ? max_active_store+0x60/0x60
      [  631.070930]  ? kthread_bind+0x10/0x10
      [  631.070945]  ret_from_fork+0x35/0x40
      [snipped]
      [  631.071017] RIP: exit_creds+0x1b/0x50 RSP: ffffc9000705fe08
      [  631.071033] CR2: 0000000000000000
      [  631.071045] ---[ end trace 011c63a24b22c927 ]---
      [  631.071085] bcache: bcache_device_free() bcache0 stopped
      
      The reason is when cache_set_flush() tries to call kthread_stop() to stop
      allocator thread, but it exits already due to cache device I/O errors.
      
      This patch adds wait_for_kthread_stop() at tail of bch_allocator_thread(),
      to prevent the thread routine exiting directly. Then the allocator thread
      can be blocked at wait_for_kthread_stop() and wait for cache_set_flush()
      to stop it by calling kthread_stop().
      
      changelog:
      v3: add Reviewed-by from Hannnes.
      v2: not directly return from allocator_wait(), move 'return 0' to tail of
          bch_allocator_thread().
      v1: initial version.
      
      Fixes: 771f393e ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ecb2ba8c
    • Coly Li's avatar
      bcache: count backing device I/O error for writeback I/O · bf78980f
      Coly Li authored
      Commit c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      counts backing device I/O requets and set dc->io_disable to true if error
      counters exceeds dc->io_error_limit. But it only counts I/O errors for
      regular I/O request, neglects errors of write back I/Os when backing device
      is offline.
      
      This patch counts the errors of writeback I/Os, in dirty_endio() if
      bio->bi_status is  not 0, it means error happens when writing dirty keys
      to backing device, then bch_count_backing_io_errors() is called.
      
      By this fix, even there is no reqular I/O request coming, if writeback I/O
      errors exceed dc->io_error_limit, the bcache device may still be stopped
      for the broken backing device.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bf78980f
    • Coly Li's avatar
      bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() · 6147305c
      Coly Li authored
      Commit c7b7bd07 ("bcache: add io_disable to struct cached_dev") tries
      to stop bcache device by calling bcache_device_stop() when too many I/O
      errors happened on backing device. But if there is internal I/O happening
      on cache device (writeback scan, garbage collection, etc), a regular I/O
      request triggers the internal I/Os may still holds a refcount of dc->count,
      and the refcount may only be dropped after the internal I/O stopped.
      
      By this patch, bch_cached_dev_error() will check if the backing device is
      attached to a cache set, if yes that CACHE_SET_IO_DISABLE will be set to
      flags of this cache set. Then internal I/Os on cache device will be
      rejected and stopped immediately, and the bcache device can be stopped.
      
      For people who are not familiar with the interesting refcount dependance,
      let me explain a bit more how the fix works. Example the writeback thread
      will scan cache device for dirty data writeback purpose. Before it stopps,
      it holds a refcount of dc->count. When CACHE_SET_IO_DISABLE bit is set,
      the internal I/O will stopped and the while-loop in bch_writeback_thread()
      quits and calls cached_dev_put() to drop dc->count. If this is the last
      refcount to drop, then cached_dev_detach_finish() will be called. In this
      call back function, in turn closure_put(dc->disk.cl) is called to drop a
      refcount of closure dc->disk.cl. If this is the last refcount of this
      closure to drop, then cached_dev_flush() will be called. Then the cached
      device is freed. So if CACHE_SET_IO_DISABLE is not set, the bache device
      can not be stopped until all inernal cache device I/O stopped. For large
      size cache device, and writeback thread competes locks with gc thread,
      there might be a quite long time to wait.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6147305c
    • Coly Li's avatar
      bcache: store disk name in struct cache and struct cached_dev · 6e916a7e
      Coly Li authored
      Current code uses bdevname() or bio_devname() to reference gendisk
      disk name when bcache needs to display the disk names in kernel message.
      It was safe before bcache device failure handling patch set merged in,
      because when devices are failed, there was deadlock to prevent bcache
      printing error messages with gendisk disk name. But after the failure
      handling patch set merged, the deadlock is fixed, so it is possible
      that the gendisk structure bdev->hd_disk is released when bdevname() is
      called to reference bdev->bd_disk->disk_name[]. This is why I receive
      bug report of NULL pointers deference panic.
      
      This patch stores gendisk disk name in a buffer inside struct cache and
      struct cached_dev, then print out the offline device name won't reference
      bdev->hd_disk anymore. And this patch also avoids extra function calls
      of bdevname() and bio_devnmae().
      
      Changelog:
      v3, add Reviewed-by from Hannes.
      v2, call bdevname() earlier in register_bdev()
      v1, first version with segguestion from Junhui Tang.
      
      Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
      Fixes: 5138ac67 ("bcache: fix misleading error message in bch_count_io_errors()")
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6e916a7e
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f4ef6a43
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Various fixes in tracing:
      
         - Tracepoints should not give warning on OOM failures
      
         - Use special field for function pointer in trace event
      
         - Fix igrab issues in uprobes
      
         - Fixes to the new histogram triggers"
      
      * tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracepoint: Do not warn on ENOMEM
        tracing: Add field modifier parsing hist error for hist triggers
        tracing: Add field parsing hist error for hist triggers
        tracing: Restore proper field flag printing when displaying triggers
        tracing: initcall: Ordered comparison of function pointers
        tracing: Remove igrab() iput() call from uprobes.c
        tracing: Fix bad use of igrab in trace_uprobe.c
      f4ef6a43