1. 15 Apr, 2020 3 commits
    • Pavel Begunkov's avatar
      io_uring: don't count rqs failed after current one · 31af27c7
      Pavel Begunkov authored
      When checking for draining with __req_need_defer(), it tries to match
      how many requests were sent before a current one with number of already
      completed. Dropped SQEs are included in req->sequence, and they won't
      ever appear in CQ. To compensate for that, __req_need_defer() substracts
      ctx->cached_sq_dropped.
      However, what it should really use is number of SQEs dropped __before__
      the current one. In other words, any submitted request shouldn't
      shouldn't affect dequeueing from the drain queue of previously submitted
      ones.
      
      Instead of saving proper ctx->cached_sq_dropped in each request,
      substract from req->sequence it at initialisation, so it includes number
      of properly submitted requests.
      
      note: it also changes behaviour of timeouts, but
      1. it's already diverge from the description because of using SQ
      2. the description is ambiguous regarding dropped SQEs
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      31af27c7
    • Pavel Begunkov's avatar
      io_uring: kill already cached timeout.seq_offset · b55ce732
      Pavel Begunkov authored
      req->timeout.count and req->io->timeout.seq_offset store the same value,
      which is sqe->off. Kill the second one
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b55ce732
    • Pavel Begunkov's avatar
      io_uring: fix cached_sq_head in io_timeout() · 22cad158
      Pavel Begunkov authored
      io_timeout() can be executed asynchronously by a worker and without
      holding ctx->uring_lock
      
      1. using ctx->cached_sq_head there is racy there
      2. it should count events from a moment of timeout's submission, but
      not execution
      
      Use req->sequence.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      22cad158
  2. 13 Apr, 2020 4 commits
    • Jens Axboe's avatar
      io_uring: only post events in io_poll_remove_all() if we completed some · 8e2e1faf
      Jens Axboe authored
      syzbot reports this crash:
      
      BUG: unable to handle page fault for address: ffffffffffffffe8
      PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B        L    5.7.0-rc1-next-20200413 #4
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
      RIP: 0010:__wake_up_common+0x98/0x290
      el/sched/wait.c:87
      Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
      RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
      RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
      RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
      R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
      R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
      FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
      Call Trace:
       __wake_up_common_lock+0xea/0x150
      ommon_lock at kernel/sched/wait.c:124
       ? __wake_up_common+0x290/0x290
       ? lockdep_hardirqs_on+0x16/0x2c0
       __wake_up+0x13/0x20
       io_cqring_ev_posted+0x75/0xe0
      v_posted at fs/io_uring.c:1160
       io_ring_ctx_wait_and_kill+0x1c0/0x2f0
      l at fs/io_uring.c:7305
       io_uring_create+0xa8d/0x13b0
       ? io_req_defer_prep+0x990/0x990
       ? __kasan_check_write+0x14/0x20
       io_uring_setup+0xb8/0x130
       ? io_uring_create+0x13b0/0x13b0
       ? check_flags.part.28+0x220/0x220
       ? lockdep_hardirqs_on+0x16/0x2c0
       __x64_sys_io_uring_setup+0x31/0x40
       do_syscall_64+0xcc/0xaf0
       ? syscall_return_slowpath+0x580/0x580
       ? lockdep_hardirqs_off+0x1f/0x140
       ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
       ? trace_hardirqs_off_caller+0x3a/0x150
       ? trace_hardirqs_off_thunk+0x1a/0x1c
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x7fdcb9dd76ed
      Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
      RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
      RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
      RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
      R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
      R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
      Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
      CR2: ffffffffffffffe8
      ---[ end trace f9502383d57e0e22 ]---
      RIP: 0010:__wake_up_common+0x98/0x290
      Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
      RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
      RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
      RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
      R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
      R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
      FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
      Kernel panic - not syncing: Fatal exception
      Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      ---[ end Kernel panic - not syncing: Fatal exception ]—
      
      which is due to error injection (or allocation failure) preventing the
      rings from being setup. On shutdown, we attempt to remove any pending
      requests, and for poll request, we call io_cqring_ev_posted() when we've
      killed poll requests. However, since the rings aren't setup, we won't
      find any poll requests. Make the calling of io_cqring_ev_posted()
      dependent on actually having completed requests. This fixes this setup
      corner case, and removes spurious calls if we remove poll requests and
      don't find any.
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8e2e1faf
    • Jens Axboe's avatar
      io_uring: io_async_task_func() should check and honor cancelation · 2bae047e
      Jens Axboe authored
      If the request has been marked as canceled, don't try and issue it.
      Instead just fill a canceled event and finish the request.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2bae047e
    • Jens Axboe's avatar
      io_uring: check for need to re-wait in polled async handling · 74ce6ce4
      Jens Axboe authored
      We added this for just the regular poll requests in commit a6ba632d
      ("io_uring: retry poll if we got woken with non-matching mask"), we
      should do the same for the poll handler used pollable async requests.
      Move the re-wait check and arm into a helper, and call it from
      io_async_task_func() as well.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      74ce6ce4
    • Jens Axboe's avatar
      io_uring: correct O_NONBLOCK check for splice punt · 88357580
      Jens Axboe authored
      The splice file punt check uses file->f_mode to check for O_NONBLOCK,
      but it should be checking file->f_flags. This leads to punting even
      for files that have O_NONBLOCK set, which isn't necessary. This equates
      to checking for FMODE_PATH, which will never be set on the fd in
      question.
      
      Fixes: 7d67af2c ("io_uring: add splice(2) support")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      88357580
  3. 12 Apr, 2020 6 commits
    • Xiaoguang Wang's avatar
      io_uring: restore req->work when canceling poll request · b1f573bd
      Xiaoguang Wang authored
      When running liburing test case 'accept', I got below warning:
      RED: Invalid credentials
      RED: At include/linux/cred.h:285
      RED: Specified credentials: 00000000d02474a0
      RED: ->magic=4b, put_addr=000000005b4f46e9
      RED: ->usage=-1699227648, subscr=-25693
      RED: ->*uid = { 256,-25693,-25693,65534 }
      RED: ->*gid = { 0,-1925859360,-1789740800,-1827028688 }
      RED: ->security is 00000000258c136e
      eneral protection fault, probably for non-canonical address 0xdead4ead00000000: 0000 [#1] SMP PTI
      PU: 21 PID: 2037 Comm: accept Not tainted 5.6.0+ #318
      ardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
      IP: 0010:dump_invalid_creds+0x16f/0x184
      ode: 48 8b 83 88 00 00 00 48 3d ff 0f 00 00 76 29 48 89 c2 81 e2 00 ff ff ff 48
      81 fa 00 6b 6b 6b 74 17 5b 48 c7 c7 4b b1 10 8e 5d <8b> 50 04 41 5c 8b 30 41 5d
      e9 67 e3 04 00 5b 5d 41 5c 41 5d c3 0f
      SP: 0018:ffffacc1039dfb38 EFLAGS: 00010087
      AX: dead4ead00000000 RBX: ffff9ba39319c100 RCX: 0000000000000007
      DX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e10b14b
      BP: ffffffff8e108476 R08: 0000000000000000 R09: 0000000000000001
      10: 0000000000000000 R11: ffffacc1039df9e5 R12: 000000009552b900
      13: 000000009319c130 R14: ffff9ba39319c100 R15: 0000000000000246
      S:  00007f96b2bfc4c0(0000) GS:ffff9ba39f340000(0000) knlGS:0000000000000000
      S:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      R2: 0000000000401870 CR3: 00000007db7a4000 CR4: 00000000000006e0
      all Trace:
      __invalid_creds+0x48/0x4a
      __io_req_aux_free+0x2e8/0x3b0
      ? io_poll_remove_one+0x2a/0x1d0
      __io_free_req+0x18/0x200
      io_free_req+0x31/0x350
      io_poll_remove_one+0x17f/0x1d0
      io_poll_cancel.isra.80+0x6c/0x80
      io_async_find_and_cancel+0x111/0x120
      io_issue_sqe+0x181/0x10e0
      ? __lock_acquire+0x552/0xae0
      ? lock_acquire+0x8e/0x310
      ? fs_reclaim_acquire.part.97+0x5/0x30
      __io_queue_sqe.part.100+0xc4/0x580
      ? io_submit_sqes+0x751/0xbd0
      ? rcu_read_lock_sched_held+0x32/0x40
      io_submit_sqes+0x9ba/0xbd0
      ? __x64_sys_io_uring_enter+0x2b2/0x460
      ? __x64_sys_io_uring_enter+0xaf/0x460
      ? find_held_lock+0x2d/0x90
      ? __x64_sys_io_uring_enter+0x111/0x460
      __x64_sys_io_uring_enter+0x2d7/0x460
      do_syscall_64+0x5a/0x230
      entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      After looking into codes, it turns out that this issue is because we didn't
      restore the req->work, which is changed in io_arm_poll_handler(), req->work
      is a union with below struct:
      	struct {
      		struct callback_head	task_work;
      		struct hlist_node	hash_node;
      		struct async_poll	*apoll;
      	};
      If we forget to restore, members in struct io_wq_work would be invalid,
      restore the req->work to fix this issue.
      Signed-off-by: default avatarXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      
      Get rid of not needed 'need_restore' variable.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b1f573bd
    • Pavel Begunkov's avatar
      io_uring: move all request init code in one place · ef4ff581
      Pavel Begunkov authored
      Requests initialisation is scattered across several functions, namely
      io_init_req(), io_submit_sqes(), io_submit_sqe(). Put it
      in io_init_req() for better data locality and code clarity.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ef4ff581
    • Pavel Begunkov's avatar
      io_uring: keep all sqe->flags in req->flags · dea3b49c
      Pavel Begunkov authored
      It's a good idea to not read sqe->flags twice, as it's prone to security
      bugs. Instead of passing it around, embeed them in req->flags. It's
      already so except for IOSQE_IO_LINK.
      1. rename former REQ_F_LINK -> REQ_F_LINK_HEAD
      2. introduce and copy REQ_F_LINK, which mimics IO_IOSQE_LINK
      
      And leave req_set_fail_links() using new REQ_F_LINK, because it's more
      sensible.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      dea3b49c
    • Pavel Begunkov's avatar
      io_uring: early submission req fail code · 1d4240cc
      Pavel Begunkov authored
      Having only one place for cleaning up a request after a link assembly/
      submission failure will play handy in the future. At least it allows
      to remove duplicated cleanup sequence.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1d4240cc
    • Pavel Begunkov's avatar
      io_uring: track mm through current->mm · bf9c2f1c
      Pavel Begunkov authored
      As a preparation for extracting request init bits, remove self-coded mm
      tracking from io_submit_sqes(), but rely on current->mm. It's more
      convenient, than passing this piece of state in other functions.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bf9c2f1c
    • Pavel Begunkov's avatar
      io_uring: remove obsolete @mm_fault · dccc587f
      Pavel Begunkov authored
      If io_submit_sqes() can't grab an mm, it fails and exits right away.
      There is no need to track the fact of the failure. Remove @mm_fault.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      dccc587f
  4. 11 Apr, 2020 5 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 5b8b9d0c
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
      
       - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
         gup, hugetlb, pagemap, memremap)
      
       - Various other things (hfs, ocfs2, kmod, misc, seqfile)
      
      * akpm: (34 commits)
        ipc/util.c: sysvipc_find_ipc() should increase position index
        kernel/gcov/fs.c: gcov_seq_next() should increase position index
        fs/seq_file.c: seq_read(): add info message about buggy .next functions
        drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
        change email address for Pali Rohár
        selftests: kmod: test disabling module autoloading
        selftests: kmod: fix handling test numbers above 9
        docs: admin-guide: document the kernel.modprobe sysctl
        fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
        kmod: make request_module() return an error when autoloading is disabled
        mm/memremap: set caching mode for PCI P2PDMA memory to WC
        mm/memory_hotplug: add pgprot_t to mhp_params
        powerpc/mm: thread pgprot_t through create_section_mapping()
        x86/mm: introduce __set_memory_prot()
        x86/mm: thread pgprot_t through init_memory_mapping()
        mm/memory_hotplug: rename mhp_restrictions to mhp_params
        mm/memory_hotplug: drop the flags field from struct mhp_restrictions
        mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
        mm/vma: introduce VM_ACCESS_FLAGS
        mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
        ...
      5b8b9d0c
    • Linus Torvalds's avatar
      Merge tag 'docs-5.7-2' of git://git.lwn.net/linux · ca6151a9
      Linus Torvalds authored
      Pull Documentation fixes from Jonathan Corbet:
       "A handful of late-arriving fixes for the documentation tree"
      
      * tag 'docs-5.7-2' of git://git.lwn.net/linux:
        Documentation: android: binderfs: add 'stats' mount option
        Documentation: driver-api/usb/writing_usb_driver.rst Updates documentation links
        docs: driver-api: address duplicate label warning
        Documentation: sysrq: fix RST formatting
        docs: kernel-parameters.txt: Fix broken references
        docs: kernel-parameters.txt: Remove nompx
        docs: filesystems: fix typo in qnx6.rst
      ca6151a9
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 4e4bdcfa
      Linus Torvalds authored
      Pull orangefs updates from Mike Marshall:
       "A fix and two cleanups.
      
        Fix:
      
         - Christoph Hellwig noticed that some logic I added to
           orangefs_file_read_iter introduced a race condition, so he sent a
           reversion patch. I had to modify his patch since reverting at this
           point broke Orangefs.
      
        Cleanups:
      
         - Christoph Hellwig noticed that we were doing some unnecessary work
           in orangefs_flush, so he sent in a patch that removed the un-needed
           code.
      
         - Al Viro told me he had trouble building Orangefs. Orangefs should
           be easy to build, even for Al :-).
      
           I looked back at the test server build notes in orangefs.txt, just
           in case that's where the trouble really is, and found a couple of
           typos and made a couple of clarifications"
      
      * tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: clarify build steps for test server in orangefs.txt
        orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
        orangefs: get rid of knob code...
      4e4bdcfa
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa · 9539303a
      Linus Torvalds authored
      Pull xtensa updates from Max Filippov:
      
       - replace setup_irq() by request_irq()
      
       - cosmetic fixes in xtensa Kconfig and boot/Makefile
      
      * tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa:
        arch/xtensa: fix grammar in Kconfig help text
        xtensa: remove meaningless export ccflags-y
        xtensa: replace setup_irq() by request_irq()
      9539303a
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · e6383b18
      Linus Torvalds authored
      Pull more xen updates from Juergen Gross:
      
       - two cleanups
      
       - fix a boot regression introduced in this merge window
      
       - fix wrong use of memory allocation flags
      
      * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        x86/xen: fix booting 32-bit pv guest
        x86/xen: make xen_pvmmu_arch_setup() static
        xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
        xen: Use evtchn_type_t as a type for event channels
      e6383b18
  5. 10 Apr, 2020 22 commits