1. 02 May, 2019 4 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-block · 5ce3307b
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "This is mostly io_uring fixes/tweaks. Most of these were actually done
        in time for the last -rc, but I wanted to ensure that everything
        tested out great before including them. The code delta looks larger
        than it really is, as it's mostly just comment additions/changes.
      
        Outside of the comment additions/changes, this is mostly removal of
        unnecessary barriers. In all, this pull request contains:
      
         - Tweak to how we handle errors at submission time. We now post a
           completion event if the error occurs on behalf of an sqe, instead
           of returning it through the system call. If the error happens
           outside of a specific sqe, we return the error through the system
           call. This makes it nicer to use and makes the "normal" use case
           behave the same as the offload cases. (me)
      
         - Fix for a missing req reference drop from async context (me)
      
         - If an sqe is submitted with RWF_NOWAIT, don't punt it to async
           context. Return -EAGAIN directly, instead of using it as a hint to
           do async punt. (Stefan)
      
         - Fix notes on barriers (Stefan)
      
         - Remove unnecessary barriers (Stefan)
      
         - Fix potential double free of memory in setup error (Mark)
      
         - Further improve sq poll CPU validation (Mark)
      
         - Fix page allocation warning and leak on buffer registration error
           (Mark)
      
         - Fix iov_iter_type() for new no-ref flag (Ming)
      
         - Fix a case where dio doesn't honor bio no-page-ref (Ming)"
      
      * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
        io_uring: avoid page allocation warnings
        iov_iter: fix iov_iter_type
        block: fix handling for BIO_NO_PAGE_REF
        io_uring: drop req submit reference always in async punt
        io_uring: free allocated io_memory once
        io_uring: fix SQPOLL cpu validation
        io_uring: have submission side sqe errors post a cqe
        io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
        io_uring: remove unnecessary barrier after incrementing dropped counter
        io_uring: remove unnecessary barrier before reading SQ tail
        io_uring: remove unnecessary barrier after updating SQ head
        io_uring: remove unnecessary barrier before reading cq head
        io_uring: remove unnecessary barrier before wq_has_sleeper
        io_uring: fix notes on barriers
        io_uring: fix handling SQEs requesting NOWAIT
      5ce3307b
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b7a5b22b
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "I apologize for sending these so late in the cycle. We went back and
        forth about how to deal with the unexpected logging of intentional
        link state changes and finally decided to just config them off by
        default.
      
        PCI fixes:
      
         - Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe)
      
         - Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex
           Williamson)
      
         - Add Kconfig option for Link Bandwidth notification messages (Keith
           Busch)"
      
      * tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/LINK: Add Kconfig option (default off)
        PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management
        PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
      b7a5b22b
    • Linus Torvalds's avatar
      Merge tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · e2a4b102
      Linus Torvalds authored
      Pull MTD fix from Richard Weinberger:
       "A single regression fix for the marvell nand driver"
      
      * tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: marvell: Clean the controller state before each operation
      e2a4b102
    • Keith Busch's avatar
      PCI/LINK: Add Kconfig option (default off) · 2078e1e7
      Keith Busch authored
      e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth
      notification") added dmesg logging whenever a link changes speed or width
      to a state that is considered degraded.  Unfortunately, it cannot
      differentiate signal integrity-related link changes from those
      intentionally initiated by an endpoint driver, including drivers that may
      live in userspace or VMs when making use of vfio-pci.  Some GPU drivers
      actively manage the link state to save power, which generates a stream of
      messages like this:
      
        vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
      
      Since we can't distinguish the intentional changes from the signal
      integrity issues, leave the reporting turned off by default.  Add a Kconfig
      option to turn it on if desired.
      
      Fixes: e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth notification")
      Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.comSigned-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      2078e1e7
  2. 01 May, 2019 15 commits
    • Linus Torvalds's avatar
      Merge tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 600d7258
      Linus Torvalds authored
      Pull power supply fixes from Sebastian Reichel:
       "Two more fixes for the 5.1 cycle.
      
        One division by zero fix in a specific driver and one core workaround
        for bad userspace behaviour from systemd regarding uevents. IMHO this
        can be considered to be a userspace bug, but the debug messages are
        useless anyways
      
         - cpcap-battery: fix a division by zero
      
         - core: fix systemd issue due to log messages produced by uevent"
      
      * tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
        power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
        power: supply: cpcap-battery: Fix division by zero
      600d7258
    • Linus Torvalds's avatar
      Merge tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 65beea4c
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
       "A few minor fixes for ARC.
      
         - regression in memset if line size !64
      
         - avoid panic if PAE and IOC"
      
      * tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: memset: fix build with L1_CACHE_SHIFT != 6
        ARC: [hsdk] Make it easier to add PAE40 region to DTB
        ARC: PAE40: don't panic and instead turn off hw ioc
      65beea4c
    • Alex Williamson's avatar
      PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management · 15d2aba7
      Alex Williamson authored
      The Interrupt Message Number in the PCIe Capabilities register (PCIe r4.0,
      sec 7.5.3.2) indicates which MSI/MSI-X vector is shared by interrupts
      related to the PCIe Capability, including Link Bandwidth Management and
      Link Autonomous Bandwidth Interrupts (Link Control, 7.5.3.7), Command
      Completed and Hot-Plug Interrupts (Slot Control, 7.5.3.10), and the PME
      Interrupt (Root Control, 7.5.3.12).
      
      pcie_message_numbers() checked whether we want to enable PME or Hot-Plug
      interrupts but neglected to check for Link Bandwidth Management, so if we
      only wanted the Bandwidth Management interrupts, it decided we didn't need
      any vectors at all.  Then pcie_port_enable_irq_vec() tried to reallocate
      zero vectors, which failed, resulting in fallback to INTx.
      
      On some systems, e.g., an X79-based workstation, that INTx seems broken or
      not handled correctly, so we got spurious IRQ16 interrupts for Bandwidth
      Management events.
      
      Change pcie_message_numbers() so that if we want Link Bandwidth Management
      interrupts, we use the shared MSI/MSI-X vector from the PCIe Capabilities
      register.
      
      Fixes: e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth notification")
      Link: https://lore.kernel.org/lkml/155597243666.19387.1205950870601742062.stgit@gimli.homeSigned-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      [bhelgaas: changelog]
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      15d2aba7
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fb0af61d
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Revert a recent ACPICA change that caused initialization to fail on
        systems with Thunderbolt docking stations connected at the init time"
      
      * tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPICA: Clear status of GPEs before enabling them"
      fb0af61d
    • Linus Torvalds's avatar
      gcc-9: don't warn about uninitialized btrfs extent_type variable · 7e74e235
      Linus Torvalds authored
      The 'extent_type' variable does seem to be reliably initialized, but
      it's _very_ non-obvious, since there's a "goto next" case that jumps
      over the normal initialization.  That will then always trigger the
      "start >= extent_end" test, which will end up never falling through to
      the use of that variable.
      
      But the code is certainly not obvious, and the compiler warning looks
      reasonable.  Make 'extent_type' an int, and initialize it to an invalid
      negative value, which seems to be the common pattern in other places.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e74e235
    • Linus Torvalds's avatar
      gcc-9: properly declare the {pv,hv}clock_page storage · 459e3a21
      Linus Torvalds authored
      The pvlock_page and hvclock_page variables are (as the name implies)
      addresses to pages, created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
      Signed-off-by: default avatarLinus Torvalds <torvalds@-linux-foundation.org>
      459e3a21
    • Linus Torvalds's avatar
      gcc-9: don't warn about uninitialized variable · cf676908
      Linus Torvalds authored
      I'm not sure what made gcc warn about this code now.  The 'ret' variable
      does end up initialized in all cases, but it's definitely not obvious,
      so the compiler is quite reasonable to warn about this.
      
      So just add initialization to make it all much more obvious both to
      compilers and to humans.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cf676908
    • Linus Torvalds's avatar
      gcc-9: silence 'address-of-packed-member' warning · 6f303d60
      Linus Torvalds authored
      We already did this for clang, but now gcc has that warning too.  Yes,
      yes, the address may be unaligned.  And that's kind of the point.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f303d60
    • Mark Rutland's avatar
      io_uring: avoid page allocation warnings · d4ef6475
      Mark Rutland authored
      In io_sqe_buffer_register() we allocate a number of arrays based on the
      iov_len from the user-provided iov. While we limit iov_len to SZ_1G,
      we can still attempt to allocate arrays exceeding MAX_ORDER.
      
      On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and
      iov_len = SZ_1G, we'll calculate that nr_pages = 262145. When we try to
      allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320
      bytes, which is greater than 4MiB. This results in SLUB warning that
      we're trying to allocate greater than MAX_ORDER, and failing the
      allocation.
      
      Avoid this by using kvmalloc() for allocations dependent on the
      user-provided iov_len. At the same time, fix a leak of imu->bvec when
      registration fails.
      
      Full splat from before this patch:
      
      WARNING: CPU: 1 PID: 2314 at mm/page_alloc.c:4595 __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 2314 Comm: syz-executor326 Not tainted 5.1.0-rc7-dirty #4
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
       show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x190 lib/dump_stack.c:113
       panic+0x384/0x68c kernel/panic.c:214
       __warn+0x2bc/0x2c0 kernel/panic.c:571
       report_bug+0x228/0x2d8 lib/bug.c:186
       bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
       call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
       brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
       do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
       el1_dbg+0x18/0x8c
       __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595
       alloc_pages_current+0x164/0x278 mm/mempolicy.c:2132
       alloc_pages include/linux/gfp.h:509 [inline]
       kmalloc_order+0x20/0x50 mm/slab_common.c:1231
       kmalloc_order_trace+0x30/0x2b0 mm/slab_common.c:1243
       kmalloc_large include/linux/slab.h:480 [inline]
       __kmalloc+0x3dc/0x4f0 mm/slub.c:3791
       kmalloc_array include/linux/slab.h:670 [inline]
       io_sqe_buffer_register fs/io_uring.c:2472 [inline]
       __io_uring_register fs/io_uring.c:2962 [inline]
       __do_sys_io_uring_register fs/io_uring.c:3008 [inline]
       __se_sys_io_uring_register fs/io_uring.c:2990 [inline]
       __arm64_sys_io_uring_register+0x9e0/0x1bc8 fs/io_uring.c:2990
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
      SMP: stopping secondary CPUs
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      CPU features: 0x002,23000438
      Memory Limit: none
      Rebooting in 1 seconds..
      
      Fixes: edafccee ("io_uring: add support for pre-mapped user IO buffers")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d4ef6475
    • Ming Lei's avatar
      iov_iter: fix iov_iter_type · f5eb4d3b
      Ming Lei authored
      Commit 875f1d07 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
      introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag
      is stored into iter->type.
      
      However, iov_iter_type() doesn't consider the new added flag, fix
      it by masking this flag in iov_iter_type().
      
      Fixes: 875f1d07 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f5eb4d3b
    • Ming Lei's avatar
      block: fix handling for BIO_NO_PAGE_REF · 60a27b90
      Ming Lei authored
      Commit 399254aa ("block: add BIO_NO_PAGE_REF flag") introduces
      BIO_NO_PAGE_REF, and once this flag is set for one bio, all pages
      in the bio won't be get/put during IO.
      
      However, if one bio is submitted via __blkdev_direct_IO_simple(),
      even though BIO_NO_PAGE_REF is set, pages still may be put.
      
      Fixes this issue by avoiding to put pages if BIO_NO_PAGE_REF is
      set.
      
      Fixes: 399254aa ("block: add BIO_NO_PAGE_REF flag")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      60a27b90
    • Jens Axboe's avatar
      io_uring: drop req submit reference always in async punt · 817869d2
      Jens Axboe authored
      If we don't end up actually calling submit in io_sq_wq_submit_work(),
      we still need to drop the submit reference to the request. If we
      don't, then we can leak the request. This can happen if we race
      with ring shutdown while flushing the workqueue for requests that
      require use of the mm_struct.
      
      Fixes: e65ef56d ("io_uring: use regular request ref counts")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      817869d2
    • Mark Rutland's avatar
      io_uring: free allocated io_memory once · 52e04ef4
      Mark Rutland authored
      If io_allocate_scq_urings() fails to allocate an sq_* region, it will
      call io_mem_free() for any previously allocated regions, but leave
      dangling pointers to these regions in the ctx. Any regions which have
      not yet been allocated are left NULL. Note that when returning
      -EOVERFLOW, the previously allocated sq_ring is not freed, which appears
      to be an unintentional leak.
      
      When io_allocate_scq_urings() fails, io_uring_create() will call
      io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_*
      regions, assuming the pointers are valid and not NULL.
      
      This can result in pages being freed multiple times, which has been
      observed to corrupt the page state, leading to subsequent fun. This can
      also result in virt_to_page() on NULL, resulting in the use of bogus
      page addresses, and yet more subsequent fun. The latter can be detected
      with CONFIG_DEBUG_VIRTUAL on arm64.
      
      Adding a cleanup path to io_allocate_scq_urings() complicates the logic,
      so let's leave it to io_ring_ctx_free() to consistently free these
      pointers, and simplify the io_allocate_scq_urings() error paths.
      
      Full splats from before this patch below. Note that the pointer logged
      by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and
      is actually NULL.
      
      [   26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0
      [   26.102976] flags: 0x63fffc000000()
      [   26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000
      [   26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
      [   26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      [   26.143960] ------------[ cut here ]------------
      [   26.146020] kernel BUG at include/linux/mm.h:547!
      [   26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      [   26.149163] Modules linked in:
      [   26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb)
      [   26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18
      [   26.156566] Hardware name: linux,dummy-virt (DT)
      [   26.158089] pstate: 40400005 (nZcv daif +PAN -UAO)
      [   26.159869] pc : io_mem_free+0x9c/0xa8
      [   26.161436] lr : io_mem_free+0x9c/0xa8
      [   26.162720] sp : ffff000013003d60
      [   26.164048] x29: ffff000013003d60 x28: ffff800025048040
      [   26.165804] x27: 0000000000000000 x26: ffff800025048040
      [   26.167352] x25: 00000000000000c0 x24: ffff0000112c2820
      [   26.169682] x23: 0000000000000000 x22: 0000000020000080
      [   26.171899] x21: ffff80002143b418 x20: ffff80002143b400
      [   26.174236] x19: ffff80002143b280 x18: 0000000000000000
      [   26.176607] x17: 0000000000000000 x16: 0000000000000000
      [   26.178997] x15: 0000000000000000 x14: 0000000000000000
      [   26.181508] x13: 00009178a5e077b2 x12: 0000000000000001
      [   26.183863] x11: 0000000000000000 x10: 0000000000000980
      [   26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20
      [   26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118
      [   26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000
      [   26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00
      [   26.196642] x1 : 0000000000000000 x0 : 000000000000003e
      [   26.198892] Call trace:
      [   26.199893]  io_mem_free+0x9c/0xa8
      [   26.201155]  io_ring_ctx_wait_and_kill+0xec/0x180
      [   26.202688]  io_uring_setup+0x6c4/0x6f0
      [   26.204091]  __arm64_sys_io_uring_setup+0x18/0x20
      [   26.205576]  el0_svc_common.constprop.0+0x7c/0xe8
      [   26.207186]  el0_svc_handler+0x28/0x78
      [   26.208389]  el0_svc+0x8/0xc
      [   26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000)
      [   26.211995] ---[ end trace bdb81cd43a21e50d ]---
      
      [   81.770626] ------------[ cut here ]------------
      [   81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 (          (null))
      [   81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68
      [   81.831202] Modules linked in:
      [   81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19
      [   81.835616] Hardware name: linux,dummy-virt (DT)
      [   81.836863] pstate: 60400005 (nZCv daif +PAN -UAO)
      [   81.838727] pc : __virt_to_phys+0x48/0x68
      [   81.840572] lr : __virt_to_phys+0x48/0x68
      [   81.842264] sp : ffff80002cf67c70
      [   81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18
      [   81.846463] x27: 0000000000000000 x26: 0000000020000080
      [   81.849148] x25: 0000000000000000 x24: ffff80001bb01f40
      [   81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60
      [   81.854351] x21: ffff800014358cc0 x20: ffff800014358d98
      [   81.856711] x19: 0000000000000000 x18: 0000000000000000
      [   81.859132] x17: 0000000000000000 x16: 0000000000000000
      [   81.861586] x15: 0000000000000000 x14: 0000000000000000
      [   81.863905] x13: 0000000000000000 x12: ffff1000037603e9
      [   81.866226] x11: 1ffff000037603e8 x10: 0000000000000980
      [   81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920
      [   81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47
      [   81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000
      [   81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58
      [   81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000
      [   81.880453] Call trace:
      [   81.881164]  __virt_to_phys+0x48/0x68
      [   81.882919]  io_mem_free+0x18/0x110
      [   81.886585]  io_ring_ctx_wait_and_kill+0x13c/0x1f0
      [   81.891212]  io_uring_setup+0xa60/0xad0
      [   81.892881]  __arm64_sys_io_uring_setup+0x2c/0x38
      [   81.894398]  el0_svc_common.constprop.0+0xac/0x150
      [   81.896306]  el0_svc_handler+0x34/0x88
      [   81.897744]  el0_svc+0x8/0xc
      [   81.898715] ---[ end trace b4a703802243cbba ]---
      
      Fixes: 2b188cc1 ("Add io_uring IO interface")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      52e04ef4
    • Mark Rutland's avatar
      io_uring: fix SQPOLL cpu validation · 975554b0
      Mark Rutland authored
      In io_sq_offload_start(), we call cpu_possible() on an unbounded cpu
      value from userspace. On v5.1-rc7 on arm64 with
      CONFIG_DEBUG_PER_CPU_MAPS, this results in a splat:
      
        WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
      
      There was an attempt to fix this in commit:
      
        917257da ("io_uring: only test SQPOLL cpu after we've verified it")
      
      ... by adding a check after the cpu value had been limited to NR_CPU_IDS
      using array_index_nospec(). However, this left an unbound check at the
      start of the function, for which the warning still fires.
      
      Let's fix this correctly by checking that the cpu value is bound by
      nr_cpu_ids before passing it to cpu_possible(). Note that only
      nr_cpu_ids of a cpumask are guaranteed to exist at runtime, and
      nr_cpu_ids can be significantly smaller than NR_CPUs. For example, an
      arm64 defconfig has NR_CPUS=256, while my test VM has 4 vCPUs.
      
      Following the intent from the commit message for 917257da, the
      check is moved under the SQ_AFF branch, which is the only branch where
      the cpu values is consumed. The check is performed before bounding the
      value with array_index_nospec() so that we don't silently accept bogus
      cpu values from userspace, where array_index_nospec() would force these
      values to 0.
      
      I suspect we can remove the array_index_nospec() call entirely, but I've
      conservatively left that in place, updated to use nr_cpu_ids to match
      the prior check.
      
      Tested on arm64 with the Syzkaller reproducer:
      
        https://syzkaller.appspot.com/bug?extid=cd714a07c6de2bc34293
        https://syzkaller.appspot.com/x/repro.syz?x=15d8b397200000
      
      Full splat from before this patch:
      
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_check include/linux/cpumask.h:128 [inline]
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_test_cpu include/linux/cpumask.h:344 [inline]
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_sq_offload_start fs/io_uring.c:2244 [inline]
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_create fs/io_uring.c:2864 [inline]
      WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 27601 Comm: syz-executor.0 Not tainted 5.1.0-rc7 #3
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193
       show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x190 lib/dump_stack.c:113
       panic+0x384/0x68c kernel/panic.c:214
       __warn+0x2bc/0x2c0 kernel/panic.c:571
       report_bug+0x228/0x2d8 lib/bug.c:186
       bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956
       call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline]
       brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316
       do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831
       el1_dbg+0x18/0x8c
       cpu_max_bits_warn include/linux/cpumask.h:121 [inline]
       cpumask_check include/linux/cpumask.h:128 [inline]
       cpumask_test_cpu include/linux/cpumask.h:344 [inline]
       io_sq_offload_start fs/io_uring.c:2244 [inline]
       io_uring_create fs/io_uring.c:2864 [inline]
       io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916
       __do_sys_io_uring_setup fs/io_uring.c:2929 [inline]
       __se_sys_io_uring_setup fs/io_uring.c:2926 [inline]
       __arm64_sys_io_uring_setup+0x50/0x70 fs/io_uring.c:2926
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948
      SMP: stopping secondary CPUs
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      CPU features: 0x002,23000438
      Memory Limit: none
      Rebooting in 1 seconds..
      
      Fixes: 917257da ("io_uring: only test SQPOLL cpu after we've verified it")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      
      Simplied the logic
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      975554b0
    • Jens Axboe's avatar
      io_uring: have submission side sqe errors post a cqe · 5c8b0b54
      Jens Axboe authored
      Currently we only post a cqe if we get an error OUTSIDE of submission.
      For submission, we return the error directly through io_uring_enter().
      This is a bit awkward for applications, and it makes more sense to
      always post a cqe with an error, if the error happens on behalf of an
      sqe.
      
      This changes submission behavior a bit. io_uring_enter() returns -ERROR
      for an error, and > 0 for number of sqes submitted. Before this change,
      if you wanted to submit 8 entries and had an error on the 5th entry,
      io_uring_enter() would return 4 (for number of entries successfully
      submitted) and rewind the sqring. The application would then have to
      peek at the sqring and figure out what was wrong with the head sqe, and
      then skip it itself. With this change, we'll return 5 since we did
      consume 5 sqes, and the last sqe (with the error) will result in a cqe
      being posted with the error.
      
      This makes the logic easier to handle in the application, and it cleans
      up the submission part.
      Suggested-by: default avatarStefan Bühler <source@stbuehler.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5c8b0b54
  3. 30 Apr, 2019 12 commits
  4. 29 Apr, 2019 5 commits
    • Linus Torvalds's avatar
      Merge tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 83a50840
      Linus Torvalds authored
      Pull seccomp fixes from Kees Cook:
       "Syzbot found a use-after-free bug in seccomp due to flags that should
        not be allowed to be used together.
      
        Tycho fixed this, I updated the self-tests, and the syzkaller PoC has
        been running for several days without triggering KASan (before this
        fix, it would reproduce). These patches have also been in -next for
        almost a week, just to be sure.
      
         - Add logic for making some seccomp flags exclusive (Tycho)
      
         - Update selftests for exclusivity testing (Kees)"
      
      * tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        seccomp: Make NEW_LISTENER and TSYNC flags exclusive
        selftests/seccomp: Prepare for exclusive seccomp flags
      83a50840
    • Linus Torvalds's avatar
      x86: make ZERO_PAGE() at least parse its argument · 80871482
      Linus Torvalds authored
      This doesn't really do anything, but at least we now parse teh
      ZERO_PAGE() address argument so that we'll catch the most obvious errors
      in usage next time they'll happen.
      
      See commit 6a5c5d26 ("rdma: fix build errors on s390 and MIPS due to
      bad ZERO_PAGE use") what happens when we don't have any use of the macro
      argument at all.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80871482
    • Linus Torvalds's avatar
      rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE use · 6a5c5d26
      Linus Torvalds authored
      The parameter to ZERO_PAGE() was wrong, but since all architectures
      except for MIPS and s390 ignore it, it wasn't noticed until 0-day
      reported the build error.
      
      Fixes: 67f269b3 ("RDMA/ucontext: Fix regression with disassociate")
      Cc: stable@vger.kernel.org
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6a5c5d26
    • Paulo Alcantara's avatar
      selinux: use kernel linux/socket.h for genheaders and mdp · dfbd199a
      Paulo Alcantara authored
      When compiling genheaders and mdp from a newer host kernel, the
      following error happens:
      
          In file included from scripts/selinux/genheaders/genheaders.c:18:
          ./security/selinux/include/classmap.h:238:2: error: #error New
          address family defined, please update secclass_map.  #error New
          address family defined, please update secclass_map.  ^~~~~
          make[3]: *** [scripts/Makefile.host:107:
          scripts/selinux/genheaders/genheaders] Error 1 make[2]: ***
          [scripts/Makefile.build:599: scripts/selinux/genheaders] Error 2
          make[1]: *** [scripts/Makefile.build:599: scripts/selinux] Error 2
          make[1]: *** Waiting for unfinished jobs....
      
      Instead of relying on the host definition, include linux/socket.h in
      classmap.h to have PF_MAX.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaulo Alcantara <paulo@paulo.ac>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      [PM: manually merge in mdp.c, subject line tweaks]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      dfbd199a
    • Linus Torvalds's avatar
      Linux 5.1-rc7 · 37624b58
      Linus Torvalds authored
      37624b58
  5. 28 Apr, 2019 4 commits
    • Jan Kara's avatar
      fsnotify: Fix NULL ptr deref in fanotify_get_fsid() · b1da6a51
      Jan Kara authored
      fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
      happen that it sees mark not fully initialized or mark that is already
      detached from the object list. In these cases mark->connector
      can be NULL leading to NULL ptr dereference. Fix the problem by
      being careful when reading mark->connector and check it for being NULL.
      Also use WRITE_ONCE when writing the mark just to prevent compiler from
      doing something stupid.
      
      Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
      Fixes: 77115225 ("fanotify: cache fsid in fsnotify_mark_connector")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      b1da6a51
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 9520b532
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "A small number of ARM fixes
      
         - Fix function tracer and unwinder dependencies so that we don't end
           up building kernels that will crash
      
         - Fix ARMv7M nommu initialisation (missing register initialisation)
      
         - Fix EFI decompressor entry (ensuring barrier instructions are
           enabled prior to use)"
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
        ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled
        ARM: fix function graph tracer and unwinder dependencies
      9520b532
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 0d82044e
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "A one-liner to make our Radix MMU support depend on HUGETLB_PAGE. We
        use some of the hugetlb inlines (eg. pud_huge()) when operating on the
        linear mapping and if they're compiled into empty wrappers we can
        corrupt memory.
      
        Then two fixes to our VFIO IOMMU code. The first is not a regression
        but fixes the locking to avoid a user-triggerable deadlock.
      
        The second does fix a regression since rc1, and depends on the first
        fix. It makes it possible to run guests with large amounts of memory
        again (~256GB).
      
        Thanks to Alexey Kardashevskiy"
      
      * tag 'powerpc-5.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm_iommu: Allow pinning large regions
        powerpc/mm_iommu: Fix potential deadlock
        powerpc/mm/radix: Make Radix require HUGETLB_PAGE
      0d82044e
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190428' of git://git.kernel.dk/linux-block · 975a0f40
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A set of io_uring fixes that should go into this release. In
        particular, this contains:
      
         - The mutex lock vs ctx ref count fix (me)
      
         - Removal of a dead variable (me)
      
         - Two race fixes (Stefan)
      
         - Ring head/tail condition fix for poll full SQ detection (Stefan)"
      
      * tag 'for-linus-20190428' of git://git.kernel.dk/linux-block:
        io_uring: remove 'state' argument from io_{read,write} path
        io_uring: fix poll full SQ detection
        io_uring: fix race condition when sq threads goes sleeping
        io_uring: fix race condition reading SQ entries
        io_uring: fail io_uring_register(2) on a dying io_uring instance
      975a0f40