- 26 Nov, 2019 40 commits
-
-
Jens Axboe authored
If we don't inherit the original task creds, then we can confuse users like fuse that pass creds in the request header. See link below on identical aio issue. Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#uSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently pass in 4 arguments outside of the bounded size. In preparation for adding one more argument, let's bundle them up in a struct to make it more readable. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Read/write requests to devices without implemented read/write_iter using fixed buffers can cause general protection fault, which totally hangs a machine. io_import_fixed() initialises iov_iter with bvec, but loop_rw_iter() accesses it as iovec, dereferencing random address. kmap() page by page in this case Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This allows an application to call connect() in an async fashion. Like other opcodes, we first try a non-blocking connect, then punt to async context if we have to. Note that we can still return -EINPROGRESS, and in that case the caller should use IORING_OP_POLL_ADD to do an async wait for completion of the connect request (just like for regular connect(2), except we can do it async here too). Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is identical to __sys_connect(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We return -EBUSY on submit when we have a CQ ring overflow backlog, but that can be a bit problematic if the application is using pure userspace poll of the CQ ring. For that case, if the ring briefly overflowed and we have pending entries in the backlog, the submit flushes the backlog successfully but still returns -EBUSY. If we're able to fully flush the CQ ring backlog, let the submission proceed. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's side. And propagate it. - kiocb_done() is only called from io_read() and io_write(), which are only called from io_issue_sqe(), so it's @nxt != NULL - io_put_req_find_next() is called either with explicitly non-null local nxt, or from one of the functions in io_issue_sqe() switch (or their callees). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
"if (nxt)" is always true, as it was checked in the while's condition. io_wq_current_is_worker() is unnecessary, as non-async callers don't pass nxt, so io_queue_async_work() will be called for them anyway. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Make io_req_find_next() and io_req_link_next() to accept only non-null nxt, and handle it in callers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There is only one one-liner user of io_free_req_find_next(). Inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The number of SQEs to submit is specified by a user, so io_get_sqring() in most of the cases succeeds. Hint compilers about that. Checking ASM genereted by gcc 9.2.0 for x64, there is one branch misprediction. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
__io_submit_sqe() is issuing requests, so call it as such. Moreover, it ends by calling io_iopoll_req_issued(). Rename it and make terminology clearer. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We don't have shadow requests anymore, so get rid of the shadow argument. Add the user_data argument, as that's often useful to easily match up requests, instead of having to look at request pointers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There's an issue with the shadow drain logic in that we drop the completion lock after deciding to defer a request, then re-grab it later and assume that the state is still the same. In the mean time, someone else completing a request could have found and issued it. This can cause a stall in the queue, by having a shadow request inserted that nobody is going to drain. Additionally, if we fail allocating the shadow request, we simply ignore the drain. Instead of using a shadow request, defer the next request/link instead. This also has the following advantages: - removes semi-duplicated code - doesn't allocate memory for shadows - works better if only the head marked for drain - doesn't need complex synchronisation On the flip side, it removes the shadow->seq == last_drain_in_in_link->seq optimization. That shouldn't be a common case, and can always be added back, if needed. Fixes: 4fe2c963 ("io_uring: add support for link with drain") Cc: Jackie Liu <liuyun01@kylinos.cn> Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
When we find new work to process within the work handler, we queue the linked timeout before we have issued the new work. This can be problematic for very short timeouts, as we have a window where the new work isn't visible. Allow the work handler to store a callback function for this in the work item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that is set, then io-wq will call the callback when it has setup the new work item. Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently try and start the next link when we put the request, and only if we were going to free it. This means that the optimization to continue executing requests from the same context often fails, as we're not putting the final reference. Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the next request more efficiently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently rely on the ring destroy on cleaning things up in case of failure, but io_allocate_scq_urings() can leave things half initialized if only parts of it fails. Be nice and return with either everything setup in success, or return an error with things nicely cleaned up. Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Always mark requests with allocated sqe and deallocate it in __io_free_req(). It's easier to follow and doesn't add edge cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently clear the linked timeout field if we cancel such a timeout, but we should only attempt to cancel if it's the first one we see. Others should simply be freed like other requests, as they haven't been started yet. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
let have a dependant link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT 1. submission stage: submission references for REQ and LINK_TIMEOUT are dropped. So, references respectively (1,1,2) 2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all linked timeouts will call cancel_timeout() and drop 1 reference. So, references after: (0,0,1). That's a leak. Make it treat only the first linked timeout as such, and pass others through __io_double_put_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Pass any IORING_OP_LINK_TIMEOUT request further, where it will eventually fail in io_issue_sqe(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If io_req_defer() failed, it needs to cancel a dependant link. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dan Carpenter authored
These lines are indented an extra space character. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently have a race where if setup is really slow, we can be calling io_wq_destroy() before we're done setting up. This will cause the caller to get stuck waiting for the manager to set things up, but the manager already exited. Fix this by doing a sync setup of the manager. This also fixes the case where if we failed creating workers, we'd also get stuck. In practice this race window was really small, as we already wait for the manager to start. Hence someone would have to call io_wq_destroy() after the task has started, but before it started the first loop. The reported test case forked tons of these, which is why it became an issue. Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com Fixes: 771b53d0 ("io-wq: small threadpool implementation for io_uring") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We currently don't explicitly break links if a request is cancelled, but we should. Add explicitly link breakage for all types of request cancellations that we support. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently a poll request fills a completion entry of 0, even if it got cancelled. This is odd, and it makes it harder to support with chains. Ensure that it returns -ECANCELED in the completions events if it got cancelled, and furthermore ensure that the linked timeout that triggered it completes with -ETIME if we did indeed trigger the completions through a timeout. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
With the conversion to io-wq, we no longer use that flag. Kill it. Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We have an issue with timeout links that are deeper in the submit chain, because we only handle it upfront, not from later submissions. Move the prep + issue of the timeout link to the async work prep handler, and do it normally for non-async queue. If we validate and prepare the timeout links upfront when we first see them, there's nothing stopping us from supporting any sort of nesting. Fixes: 2665abfd ("io_uring: add support for linked SQE timeouts") Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
There are a few reasons for this: - As a prep to improving the linked timeout logic - io_timeout is the biggest member in the io_kiocb opcode union This also enables a few cleanups, like unifying the timer setup between IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple arguments to the link/prep helpers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we don't use the normal completion path, we may skip killing links that should be errored and freed. Add __io_double_put_req() for use within the completion path itself, other calls should just use io_double_put_req(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err, but the caller doesn't care since the errors are handled inline. Clean these up and just make them void. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we have a linked request, this enables us to pass it back directly without having to go through async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
git://git.kernel.org/pub/scm/linux/kernel/git/ras/rasLinus Torvalds authored
Pull EDAC updates from Borislav Petkov: "A lot of changes this time around, details below. From the next cycle onwards, we'll switch the EDAC tree to topic branches (instead of a single edac-for-next branch) which should make the changes handling more flexible, hopefully. We'll see. Summary: - Rework error logging functions to accept a count of errors parameter (Hanna Hawa) - Part one of substantial EDAC core + ghes_edac driver cleanup (Robert Richter) - Print additional useful logging information in skx_* (Tony Luck) - Improve amd64_edac hw detection + cleanups (Yazen Ghannam) - Misc cleanups, fixes and code improvements" * tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits) EDAC/altera: Use the Altera System Manager driver EDAC/altera: Cleanup the ECC Manager EDAC/altera: Use fast register IO for S10 IRQs EDAC/ghes: Do not warn when incrementing refcount on 0 EDAC/Documentation: Describe CPER module definition and DIMM ranks EDAC: Unify the mc_event tracepoint call EDAC/ghes: Remove intermediate buffer pvt->detail_location EDAC/ghes: Fix grain calculation EDAC/ghes: Use standard kernel macros for page calculations EDAC: Remove misleading comment in struct edac_raw_error_desc EDAC/mc: Reduce indentation level in edac_mc_handle_error() EDAC/mc: Remove needless zero string termination EDAC/mc: Do not BUG_ON() in edac_mc_alloc() EDAC: Introduce an mci_for_each_dimm() iterator EDAC: Remove EDAC_DIMM_OFF() macro EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function EDAC/amd64: Get rid of the ECC disabled long message EDAC/ghes: Fix locking and memory barrier issues EDAC/amd64: Check for memory before fully initializing an instance EDAC/amd64: Use cached data when checking for ECC ...
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM updates from Paolo Bonzini: "ARM: - data abort report and injection - steal time support - GICv4 performance improvements - vgic ITS emulation fixes - simplify FWB handling - enable halt polling counters - make the emulated timer PREEMPT_RT compliant s390: - small fixes and cleanups - selftest improvements - yield improvements PPC: - add capability to tell userspace whether we can single-step the guest - improve the allocation of XIVE virtual processor IDs - rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. - minor cleanups and improvements. x86: - XSAVES support for AMD - more accurate report of nested guest TSC to the nested hypervisor - retpoline optimizations - support for nested 5-level page tables - PMU virtualization optimizations, and improved support for nested PMU virtualization - correct latching of INITs for nested virtualization - IOAPIC optimization - TSX_CTRL virtualization for more TAA happiness - improved allocation and flushing of SEV ASIDs - many bugfixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints KVM: x86: Grab KVM's srcu lock when setting nested state KVM: x86: Open code shared_msr_update() in its only caller KVM: Fix jump label out_free_* in kvm_init() KVM: x86: Remove a spurious export of a static function KVM: x86: create mmu/ subdirectory KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page KVM: x86: remove set but not used variable 'called' KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID KVM: x86: do not modify masked bits of shared MSRs KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT KVM: x86: Unexport kvm_vcpu_reload_apic_access_page() KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen updates from Juergen Gross: - a small series to remove the build constraint of Xen x86 MCE handling to 64-bit only - a bunch of minor cleanups * tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Fix Kconfig indentation xen/mcelog: also allow building for 32-bit kernels xen/mcelog: add PPIN to record when available xen/mcelog: drop __MC_MSR_MCGCAP xen/gntdev: Use select for DMA_SHARED_BUFFER xen: mm: make xen_mm_init static xen: mm: include <xen/xen-ops.h> for missing declarations
-
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds authored
Pull MIPS updates from Paul Burton: "The main MIPS changes for 5.5: - Atomics-related code sees some rework & cleanup, most notably allowing Loongson LL/SC errata workarounds to be more bulletproof & their correctness to be checked at build time. - Command line setup code is simplified somewhat, resolving various corner cases. - MIPS kernels can now be built with kcov code coverage support. - We can now build with CONFIG_FORTIFY_SOURCE=y. - Miscellaneous cleanups. And some platform specific changes: - We now disable some broken TLB functionality on certain Ingenic systems, and JZ4780 systems gain some devicetree nodes to support more devices. - Loongson support sees a number of cleanups, and we gain initial support for Loongson 3A R4 systems. - We gain support for MediaTek MT7688-based GARDENA Smart Gateway systems. - SGI IP27 (Origin 2*) see a number of fixes, cleanups & simplifications. - SGI IP30 (Octane) systems are now supported" * tag 'mips_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (107 commits) MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 module MIPS: PCI: Fix fake subdevice ID for IOC3 MIPS: Ingenic: Disable abandoned HPTLB function. MIPS: PCI: remember nasid changed by set interrupt affinity MIPS: SGI-IP27: Fix crash, when CPUs are disabled via nr_cpus parameter mips: add support for folded p4d page tables mips: drop __pXd_offset() macros that duplicate pXd_index() ones mips: fix build when "48 bits virtual memory" is enabled MIPS: math-emu: Reuse name array in debugfs_fpuemu() MIPS: allow building with kcov coverage MIPS: Loongson64: Drop setup_pcimap MIPS: Loongson2ef: Convert to early_printk_8250 MIPS: Drop CPU_SUPPORTS_UNCACHED_ACCELERATED MIPS: Loongson{2ef, 32, 64} convert to generic fw cmdline MIPS: Drop pmon.h MIPS: Loongson: Unify LOONGSON3/LOONGSON64 Kconfig usage MIPS: Loongson: Rename LOONGSON1 to LOONGSON32 MIPS: Loongson: Fix return value of loongson_hwmon_init MIPS: add support for SGI Octane (IP30) MIPS: PCI: make phys_to_dma/dma_to_phys for pci-xtalk-bridge common ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68kLinus Torvalds authored
Pull m68k updates from Geert Uytterhoeven: - Atari Falcon IDE platform driver conversion for module autoload - defconfig updates (including enablement of Amiga ICY I2C) - small fixes and cleanups * tag 'm68k-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/atari: Convert Falcon IDE drivers to platform drivers m68k: defconfig: Enable ICY I2C and LTC2990 on Amiga m68k: defconfig: Update defconfigs for v5.4-rc1 m68k: q40: Fix info-leak in rtc_ioctl nubus: Remove cast to void pointer
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull RAS updates from Borislav Petkov: - Fully reworked thermal throttling notifications, there should be no more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg) - More enablement for the Intel-compatible CPUs Zhaoxin (Tony W Wang-oc) - PPIN support for Icelake (Tony Luck) * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/therm_throt: Optimize notifications of thermal throttle x86/mce: Add Xeon Icelake to list of CPUs that support PPIN x86/mce: Lower throttling MCE messages' priority to warning x86/mce: Add Zhaoxin LMCE support x86/mce: Add Zhaoxin CMCI support x86/mce: Add Zhaoxin MCE support x86/mce/amd: Make disable_err_thresholding() static
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 microcode updates from Borislav Petkov: "This converts the late loading method to load the microcode in parallel (vs sequentially currently). The patch remained in linux-next for the maximum amount of time so that any potential and hard to debug fallout be minimized. Now cloud folks have their milliseconds back but all the normal people should use early loading anyway :-)" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Issue the revision updated message only on the BSP x86/microcode: Update late microcode in parallel x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 updates from Vasily Gorbik: - Adjust PMU device drivers registration to avoid WARN_ON and few other perf improvements. - Enhance tracing in vfio-ccw. - Few stack unwinder fixes and improvements, convert get_wchan custom stack unwinding to generic api usage. - Fixes for mm helpers issues uncovered with tests validating architecture page table helpers. - Fix noexec bit handling when hardware doesn't support it. - Fix memleak and unsigned value compared with zero bugs in crypto code. Minor code simplification. - Fix crash during kdump with kasan enabled kernel. - Switch bug and alternatives from asm to asm_inline to improve inlining decisions. - Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add z13s and z14 ZR1 to TUNE descriptions. - Minor head64.S simplification. - Fix physical to logical CPU map for SMT. - Several cleanups in qdio code. - Other minor cleanups and fixes all over the code. * tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/cpumf: Adjust registration of s390 PMU device drivers s390/smp: fix physical to logical CPU map for SMT s390/early: move access registers setup in C code s390/head64: remove unnecessary vdso_per_cpu_data setup s390/early: move control registers setup in C code s390/kasan: support memcpy_real with TRACE_IRQFLAGS s390/crypto: Fix unsigned variable compared with zero s390/pkey: use memdup_user() to simplify code s390/pkey: fix memory leak within _copy_apqns_from_user() s390/disassembler: don't hide instruction addresses s390/cpum_sf: Assign error value to err variable s390/cpum_sf: Replace function name in debug statements s390/cpum_sf: Use consistant debug print format for sampling s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() s390: add error handling to perf_callchain_kernel s390: always inline current_stack_pointer() s390/mm: add mm_pxd_folded() checks to pxd_free() s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported s390/mm: simplify page table helpers for large entries s390/mm: make pmd/pud_bad() report large entries as bad ...
-