1. 07 Dec, 2019 1 commit
    • Merge branch 'nvme/for-5.5' of git://git.infradead.org/nvme into for-linus · dc3ecfc9
      Jens Axboe authored
      Pull NVMe fixes from Keith
      
      * 'nvme/for-5.5' of git://git.infradead.org/nvme:
        nvme/pci: Fix read queue count
        nvme/pci Limit write queue sizes to possible cpus
        nvme/pci: Fix write and poll queue types
        nvme/pci: Remove last_cq_head
        nvme: Namepace identification descriptor list is optional
        nvme-fc: fix double-free scenarios on hw queues
        nvme: else following return is not needed
        nvme: add error message on mismatching controller ids
        nvme_fc: add module to ops template to allow module references
        nvmet-loop: Avoid preallocating big SGL for data
        nvme-fc: Avoid preallocating big SGL for data
        nvme-rdma: Avoid preallocating big SGL for data
  2. 06 Dec, 2019 4 commits
  3. 05 Dec, 2019 7 commits
    • block: fix memleak of bio integrity data · ece841ab
      Justin Tee authored
      Commit 7c20f116 ("bio-integrity: stop abusing bi_end_io") moved
      bio_integrity_free() from bio_uninit() to bio_integrity_verify_fn()
      and bio_endio(). This is wrong because a bio may be freed without
      bio_endio() ever being called; for example, blk_rq_unprep_clone() is
      called from dm_mq_queue_rq() when the underlying queue of dm-mpath
      is busy.
      
      As a result, commit 7c20f116 leaks the bio integrity data.
      
      Fix this by re-adding bio_integrity_free() to bio_uninit().
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      
      Add commit log, and simplify/fix the original patch written by Justin.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix a typo in a comment · 0b4295b5
      LimingWu authored
      thatn -> than.
      Signed-off-by: Liming Wu <19092205@suning.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bfq-iosched: Ensure bio->bi_blkg is valid before using it · 08802ed6
      Hou Tao authored
      bio->bi_blkg will be NULL when a request is issued in a way that
      bypasses the block layer, as shown in the following oops:
      
       Internal error: Oops: 96000005 [#1] SMP
       CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
       Call trace:
        percpu_counter_add_batch+0x38/0x4c8
        bfqg_stats_update_legacy_io+0x9c/0x280
        bfq_insert_requests+0xbac/0x2190
        blk_mq_sched_insert_request+0x288/0x670
        blk_execute_rq_nowait+0x140/0x178
        blk_execute_rq+0x8c/0x140
        sg_io+0x604/0x9c0
        scsi_cmd_ioctl+0xe38/0x10a8
        scsi_cmd_blk_ioctl+0xac/0xe8
        sd_ioctl+0xe4/0x238
        blkdev_ioctl+0x590/0x20e0
        block_ioctl+0x60/0x98
        do_vfs_ioctl+0xe0/0x1b58
        ksys_ioctl+0x80/0xd8
        __arm64_sys_ioctl+0x40/0x78
        el0_svc_handler+0xc4/0x270
      
      so ensure its validity before using it.
      
      Fixes: fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: hook all linked requests via link_list · 4493233e
      Pavel Begunkov authored
      Links are created by chaining requests through req->list, with the
      exception that the head uses req->link_list (e.g. link_list->list->list).
      Because of that, io_req_link_next() needs complex splicing to advance.
      
      Link them all through link_list. This also seems simpler and more
      consistent, IMHO.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix error handling in io_queue_link_head · 2e6e1fde
      Pavel Begunkov authored
      In case of an error, io_submit_sqe() drops a request and continues
      without it, even if the request was part of a link. Not only does it
      fail to cancel the links, it may also execute the wrong sequence of
      actions.
      
      Stop consuming sqes, and let the user handle errors.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: use hash table for poll command lookups · 78076bb6
      Jens Axboe authored
      We recently changed this from a single list to an rbtree, but for some
      real-life workloads, the rbtree slows down the submission/insertion
      case enough that it's the top cycle consumer on the io_uring side.
      In testing, a hash table is a better-rounded compromise. It is fast
      for insertion, and as long as it's sized appropriately, it works well
      for the cancellation case too. Running TAO with a lot of network
      sockets, this stops io_poll_req_insert() from consuming 2% of the
      CPU cycles.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
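The trade-off described in this commit (cheap insertion, with lookups for cancellation bounded by bucket length) can be illustrated with a small userspace sketch. This is not the io_uring code: `poll_req`, `poll_table`, `poll_hash`, the 6-bit table size, and the multiplicative hash constant are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLL_HASH_BITS 6
#define POLL_HASH_SIZE (1u << POLL_HASH_BITS)

/* Hypothetical stand-in for a poll request, keyed by the caller's user_data. */
struct poll_req {
	uint64_t user_data;
	struct poll_req *next;   /* next entry in the same bucket */
	struct poll_req **pprev; /* address of the pointer that points at us */
};

struct poll_table {
	struct poll_req *buckets[POLL_HASH_SIZE];
};

/* Multiplicative hash: keep the top POLL_HASH_BITS bits of the product. */
static unsigned int poll_hash(uint64_t key)
{
	return (unsigned int)((key * UINT64_C(0x9E3779B97F4A7C15)) >>
			      (64 - POLL_HASH_BITS));
}

/* O(1) insertion at the head of the bucket the key hashes to. */
static void poll_insert(struct poll_table *t, struct poll_req *req)
{
	struct poll_req **head = &t->buckets[poll_hash(req->user_data)];

	req->next = *head;
	if (*head)
		(*head)->pprev = &req->next;
	req->pprev = head;
	*head = req;
}

/* O(1) unlink; the links are reset so the request can be inserted again. */
static void poll_remove(struct poll_req *req)
{
	assert(req->pprev != NULL);
	*req->pprev = req->next;
	if (req->next)
		req->next->pprev = req->pprev;
	req->next = NULL;
	req->pprev = NULL;
}

/* Cancellation-style lookup: walk only the single bucket for this key. */
static struct poll_req *poll_find(struct poll_table *t, uint64_t user_data)
{
	struct poll_req *req;

	for (req = t->buckets[poll_hash(user_data)]; req; req = req->next)
		if (req->user_data == user_data)
			return req;
	return NULL;
}
```

Insertion never rebalances anything, which is the property the commit message contrasts with the rbtree. Lookup cost depends on how many requests share a bucket, hence the remark about sizing the table appropriately.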
    • io-wq: clear node->next on list deletion · 08bdcc35
      Jens Axboe authored
      If someone removes a node from a list, and then later adds it back to
      a list, we can have invalid data in ->next. This can cause all sorts
      of issues. One such use case is the IORING_OP_POLL_ADD command, which
      will do just that if we race and get woken twice without any pending
      events. This is a pretty rare case, but can happen under extreme loads.
      Dan reports that he saw the following crash:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0
      Oops: 0002 [#1] SMP
      CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
      RIP: 0010:io_wqe_enqueue+0x3e/0xd0
      Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00
      RSP: 0000:ffffc90006858a08 EFLAGS: 00010082
      RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000
      RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0
      RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8
      R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100
      FS:  00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       io_poll_wake+0x12f/0x2a0
       __wake_up_common+0x86/0x120
       __wake_up_common_lock+0x7a/0xc0
       sock_def_readable+0x3c/0x70
       tcp_rcv_established+0x557/0x630
       tcp_v6_do_rcv+0x118/0x3c0
       tcp_v6_rcv+0x97e/0x9d0
       ip6_protocol_deliver_rcu+0xe3/0x440
       ip6_input+0x3d/0xc0
       ? ip6_protocol_deliver_rcu+0x440/0x440
       ipv6_rcv+0x56/0xd0
       ? ip6_rcv_finish_core.isra.18+0x80/0x80
       __netif_receive_skb_one_core+0x50/0x70
       netif_receive_skb_internal+0x2f/0xa0
       napi_gro_receive+0x125/0x150
       mlx5e_handle_rx_cqe+0x1d9/0x5a0
       ? mlx5e_poll_tx_cq+0x305/0x560
       mlx5e_poll_rx_cq+0x49f/0x9c5
       mlx5e_napi_poll+0xee/0x640
       ? smp_reschedule_interrupt+0x16/0xd0
       ? reschedule_interrupt+0xf/0x20
       net_rx_action+0x286/0x3d0
       __do_softirq+0xca/0x297
       irq_exit+0x96/0xa0
       do_IRQ+0x54/0xe0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0033:0x7fdc627a2e3a
      Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10
      
      when running a networked workload with about 5000 sockets being polled
      for. Fix this by clearing node->next when the node is being removed from
      the list.
      
      Fixes: 6206f0e1 ("io-wq: shrink io_wq_work a bit")
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
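The failure mode described in this commit, a removed node carrying a stale ->next into its next insertion, can be sketched with a minimal singly linked list that tracks a tail pointer. The names `work_node`, `work_list`, `work_list_add_tail`, and `work_list_del` are hypothetical and only loosely modeled on the commit description, not copied from io-wq.

```c
#include <assert.h>
#include <stddef.h>

struct work_node {
	struct work_node *next;
};

struct work_list {
	struct work_node *first;
	struct work_node *last;
};

/* Append to the tail. Note that node->next is not reset here, so the
 * caller must hand in a node whose ->next is already NULL. */
static void work_list_add_tail(struct work_list *list, struct work_node *node)
{
	if (!list->first) {
		list->first = node;
		list->last = node;
	} else {
		list->last->next = node;
		list->last = node;
	}
}

/* Unlink node; prev is its predecessor, or NULL when node is the first entry. */
static void work_list_del(struct work_list *list, struct work_node *node,
			  struct work_node *prev)
{
	assert(prev ? prev->next == node : list->first == node);
	if (node == list->first)
		list->first = node->next;
	if (node == list->last)
		list->last = prev;
	if (prev)
		prev->next = node->next;
	node->next = NULL; /* without this, a re-added node keeps a stale link */
}
```

Because the add path trusts ->next, clearing it on deletion keeps a node safe to re-add without every caller having to remember to reset it.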
  4. 04 Dec, 2019 5 commits
  5. 03 Dec, 2019 21 commits
  6. 02 Dec, 2019 2 commits
    • nvme: Namepace identification descriptor list is optional · 22802bf7
      Keith Busch authored
      Although NVM Express specification 1.3 requires a controller claiming
      to be 1.3 or higher to implement Identify CNS 03h (Namespace
      Identification Descriptor list), the driver doesn't really need this
      identification in order to use a namespace. Comments in the code had
      already documented that an error from this command is not to be
      treated as a failure.
      
      Return success if the controller provided any response to a
      namespace identification descriptors command.
      
      Fixes: 538af88e ("nvme: make nvme_report_ns_ids propagate error back")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=205679
      Reported-by: Ingo Brunberg <ingo_brunberg@web.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: stable@vger.kernel.org # 5.4+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
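The rule "return success if the controller provided any response" can be sketched as a small status filter. The sketch assumes the convention that a negative value is a host-side error (a negative errno, meaning no response arrived) and a positive value is a status code returned by the controller. The helper name `desc_list_status` is hypothetical and is not the driver's function.

```c
/* Hypothetical helper illustrating the policy from the commit message.
 *
 * Assumed convention (not taken from the source):
 *   status == 0  command succeeded
 *   status  > 0  the controller answered with an error status code
 *   status  < 0  host-side failure, no response from the controller
 */
static int desc_list_status(int status)
{
	/* Any response from the controller counts as success, because the
	 * descriptor list is treated as optional. */
	if (status > 0)
		return 0;

	/* Success stays success; host-side failures are still reported. */
	return status;
}
```

Under this assumption, a controller that rejects the command no longer prevents the namespace from being used, while a failure to reach the controller is still propagated.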
    • io_uring: use current task creds instead of allocating a new one · 0b8c0ec7
      Jens Axboe authored
      syzbot reports:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
        kthread+0x361/0x430 kernel/kthread.c:255
        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace f2e1a4307fbe2245 ]---
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is caused by slab fault injection triggering a failure in
      prepare_creds(). We don't actually need to create a copy of the creds,
      as we're not modifying them; we just need a reference to the current
      task creds. This avoids the failure case as well, and propagates the
      const throughout the stack.
      
      Fixes: 181e448d ("io_uring: async workers should inherit the user creds")
      Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
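The difference between copying credentials and taking a reference can be shown with a toy reference count. This is a userspace sketch under stated assumptions: `cred_model`, `cred_model_get`, and `cred_model_put` are hypothetical names, and the counter is a plain int rather than an atomic, so it is not safe for concurrent use.

```c
#include <assert.h>
#include <stddef.h>

/* Toy credentials object; a plain int stands in for an atomic refcount. */
struct cred_model {
	int usage;
	unsigned int uid;
};

/* Take a read-only reference on existing credentials. Nothing is
 * allocated, so there is no failure path to handle. */
static const struct cred_model *cred_model_get(struct cred_model *c)
{
	assert(c != NULL);
	c->usage++;
	return c;
}

/* Drop a reference taken with cred_model_get(). The const is cast away
 * only to update the counter; the credentials themselves are not changed. */
static void cred_model_put(const struct cred_model *c)
{
	struct cred_model *m = (struct cred_model *)c;

	assert(m != NULL && m->usage > 0);
	m->usage--;
}
```

Returning a const pointer mirrors the point made in the commit message: holders of the reference can read the credentials but are not expected to modify them.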