1. 10 Dec, 2019 1 commit
  2. 07 Dec, 2019 1 commit
    • Jens Axboe's avatar
      Merge branch 'nvme/for-5.5' of git://git.infradead.org/nvme into for-linus · dc3ecfc9
      Jens Axboe authored
      Pull NVMe fixes from Keith
      
      * 'nvme/for-5.5' of git://git.infradead.org/nvme:
        nvme/pci: Fix read queue count
        nvme/pci Limit write queue sizes to possible cpus
        nvme/pci: Fix write and poll queue types
        nvme/pci: Remove last_cq_head
        nvme: Namepace identification descriptor list is optional
        nvme-fc: fix double-free scenarios on hw queues
        nvme: else following return is not needed
        nvme: add error message on mismatching controller ids
        nvme_fc: add module to ops template to allow module references
        nvmet-loop: Avoid preallocating big SGL for data
        nvme-fc: Avoid preallocating big SGL for data
        nvme-rdma: Avoid preallocating big SGL for data
      dc3ecfc9
  3. 06 Dec, 2019 4 commits
  4. 05 Dec, 2019 7 commits
    • Justin Tee's avatar
      block: fix memleak of bio integrity data · ece841ab
      Justin Tee authored
      7c20f116 ("bio-integrity: stop abusing bi_end_io") moves
      bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
      and bio_endio(). This way looks wrong because bio may be freed
      without calling bio_endio(), for example, blk_rq_unprep_clone() is
      called from dm_mq_queue_rq() when the underlying queue of dm-mpath
      is busy.
      
      So memory leak of bio integrity data is caused by commit 7c20f116.
      
      Fixes this issue by re-adding bio_integrity_free() to bio_uninit().
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by Justin Tee <justin.tee@broadcom.com>
      
      Add commit log, and simplify/fix the original patch wroten by Justin.
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ece841ab
    • LimingWu's avatar
      io_uring: fix a typo in a comment · 0b4295b5
      LimingWu authored
      thatn -> than.
      Signed-off-by: default avatarLiming Wu <19092205@suning.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0b4295b5
    • Hou Tao's avatar
      bfq-iosched: Ensure bio->bi_blkg is valid before using it · 08802ed6
      Hou Tao authored
      bio->bi_blkg will be NULL when the issue of the request
      has bypassed the block layer as shown in the following oops:
      
       Internal error: Oops: 96000005 [#1] SMP
       CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
       Call trace:
        percpu_counter_add_batch+0x38/0x4c8
        bfqg_stats_update_legacy_io+0x9c/0x280
        bfq_insert_requests+0xbac/0x2190
        blk_mq_sched_insert_request+0x288/0x670
        blk_execute_rq_nowait+0x140/0x178
        blk_execute_rq+0x8c/0x140
        sg_io+0x604/0x9c0
        scsi_cmd_ioctl+0xe38/0x10a8
        scsi_cmd_blk_ioctl+0xac/0xe8
        sd_ioctl+0xe4/0x238
        blkdev_ioctl+0x590/0x20e0
        block_ioctl+0x60/0x98
        do_vfs_ioctl+0xe0/0x1b58
        ksys_ioctl+0x80/0xd8
        __arm64_sys_ioctl+0x40/0x78
        el0_svc_handler+0xc4/0x270
      
      so ensure its validity before using it.
      
      Fixes: fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
      Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      08802ed6
    • Pavel Begunkov's avatar
      io_uring: hook all linked requests via link_list · 4493233e
      Pavel Begunkov authored
      Links are created by chaining requests through req->list with an
      exception that head uses req->link_list. (e.g. link_list->list->list)
      Because of that, io_req_link_next() needs complex splicing to advance.
      
      Link them all through list_list. Also, it seems to be simpler and more
      consistent IMHO.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4493233e
    • Pavel Begunkov's avatar
      io_uring: fix error handling in io_queue_link_head · 2e6e1fde
      Pavel Begunkov authored
      In case of an error io_submit_sqe() drops a request and continues
      without it, even if the request was a part of a link. Not only it
      doesn't cancel links, but also may execute wrong sequence of actions.
      
      Stop consuming sqes, and let the user handle errors.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2e6e1fde
    • Jens Axboe's avatar
      io_uring: use hash table for poll command lookups · 78076bb6
      Jens Axboe authored
      We recently changed this from a single list to an rbtree, but for some
      real life workloads, the rbtree slows down the submission/insertion
      case enough so that it's the top cycle consumer on the io_uring side.
      In testing, using a hash table is a more well rounded compromise. It
      is fast for insertion, and as long as it's sized appropriately, it
      works well for the cancellation case as well. Running TAO with a lot
      of network sockets, this removes io_poll_req_insert() from spending
      2% of the CPU cycles.
      Reported-by: default avatarDan Melnic <dmm@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      78076bb6
    • Jens Axboe's avatar
      io-wq: clear node->next on list deletion · 08bdcc35
      Jens Axboe authored
      If someone removes a node from a list, and then later adds it back to
      a list, we can have invalid data in ->next. This can cause all sorts
      of issues. One such use case is the IORING_OP_POLL_ADD command, which
      will do just that if we race and get woken twice without any pending
      events. This is a pretty rare case, but can happen under extreme loads.
      Dan reports that he saw the following crash:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0
      Oops: 0002 [#1] SMP
      CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
      RIP: 0010:io_wqe_enqueue+0x3e/0xd0
      Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00
      RSP: 0000:ffffc90006858a08 EFLAGS: 00010082
      RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000
      RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0
      RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8
      R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100
      FS:  00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       io_poll_wake+0x12f/0x2a0
       __wake_up_common+0x86/0x120
       __wake_up_common_lock+0x7a/0xc0
       sock_def_readable+0x3c/0x70
       tcp_rcv_established+0x557/0x630
       tcp_v6_do_rcv+0x118/0x3c0
       tcp_v6_rcv+0x97e/0x9d0
       ip6_protocol_deliver_rcu+0xe3/0x440
       ip6_input+0x3d/0xc0
       ? ip6_protocol_deliver_rcu+0x440/0x440
       ipv6_rcv+0x56/0xd0
       ? ip6_rcv_finish_core.isra.18+0x80/0x80
       __netif_receive_skb_one_core+0x50/0x70
       netif_receive_skb_internal+0x2f/0xa0
       napi_gro_receive+0x125/0x150
       mlx5e_handle_rx_cqe+0x1d9/0x5a0
       ? mlx5e_poll_tx_cq+0x305/0x560
       mlx5e_poll_rx_cq+0x49f/0x9c5
       mlx5e_napi_poll+0xee/0x640
       ? smp_reschedule_interrupt+0x16/0xd0
       ? reschedule_interrupt+0xf/0x20
       net_rx_action+0x286/0x3d0
       __do_softirq+0xca/0x297
       irq_exit+0x96/0xa0
       do_IRQ+0x54/0xe0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0033:0x7fdc627a2e3a
      Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10
      
      when running a networked workload with about 5000 sockets being polled
      for. Fix this by clearing node->next when the node is being removed from
      the list.
      
      Fixes: 6206f0e1 ("io-wq: shrink io_wq_work a bit")
      Reported-by: default avatarDan Melnic <dmm@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      08bdcc35
  5. 04 Dec, 2019 5 commits
  6. 03 Dec, 2019 21 commits
  7. 02 Dec, 2019 1 commit