1. 03 Sep, 2021 4 commits
    • binfmt: remove in-tree usage of MAP_DENYWRITE · 4589ff7c
      David Hildenbrand authored
      At exec time when we mmap the new executable via MAP_DENYWRITE we have it
      opened via do_open_execat() and already deny_write_access()'ed the file
      successfully. Once exec completes, we allow_write_access(); however,
      we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and
      also deny_write_access() as long as mm->exe_file remains set. We'll
      effectively deny write access to our executable via mm->exe_file
      until mm->exe_file is changed -- when the process is removed, on new
      exec, or via sys_prctl(PR_SET_MM_MAP/EXE_FILE).
      
      Let's remove all usage of MAP_DENYWRITE; it's no longer necessary for
      mm->exe_file.
      
      In case of an ELF interpreter, we'll now only deny write access to the file
      during exec. This is somewhat okay, because the interpreter behaves like
      (and sometimes is) a shared library; all shared libraries, especially the
      ones loaded directly in user space, e.g. via dlopen(), won't ever be mapped
      via MAP_DENYWRITE, because we ignore that from user space completely;
      these shared libraries can always be modified while mapped and executed.
      Let's only special-case the main executable, denying write access while
      it is being executed by a process. This can be considered a minor user
      space visible change.
      
      While this is a cleanup, it also fixes part of a problem reported with
      VM_DENYWRITE on overlayfs, as VM_DENYWRITE is effectively unused with
      this patch and will be removed next:
        "Overlayfs did not honor positive i_writecount on realfile for
         VM_DENYWRITE mappings." [1]
      
      [1] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/
      Reported-by: Chengguang Xu <cgxu519@mykernel.net>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • kernel/fork: always deny write access to current MM exe_file · fe69d560
      David Hildenbrand authored
      We want to remove VM_DENYWRITE, which is currently only used when mapping
      the executable during exec. During exec, we already deny_write_access() the
      executable; however, after exec completes, the VMAs mapped
      with VM_DENYWRITE effectively keep write access denied via
      deny_write_access().
      
      Let's deny write access when setting or replacing the MM exe_file. With
      this change, we can remove VM_DENYWRITE for mapping executables.
      
      Make set_mm_exe_file() return an error in case deny_write_access()
      fails; note that this should never happen, because exec code does a
      deny_write_access() early and keeps write access denied when calling
      set_mm_exe_file(). Still, returning the error makes the code easier to
      read and makes set_mm_exe_file() and replace_mm_exe_file() look more
      similar.
      
      This represents a minor user space visible change:
      sys_prctl(PR_SET_MM_MAP/EXE_FILE) can now fail if the file is already
      opened writable. Also, after sys_prctl(PR_SET_MM_MAP/EXE_FILE) the file
      cannot be opened writable. Note that we can already fail with -EACCES if
      the file doesn't have execute permissions.
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
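
      The write-denial bookkeeping described in this commit can be illustrated
      with a small user-space model. This is only a sketch, not the kernel code:
      all names ending in _model are hypothetical, the counter stands in for the
      inode write count (positive while writers exist, negative while writes are
      denied), and the error value the real prctl() path returns is not taken
      from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a file's write counter:
 * > 0: open for writing, < 0: write access denied, 0: neither. */
struct file_model {
	int writecount;
};

/* Hypothetical stand-in for an mm with its exe_file reference. */
struct mm_model {
	struct file_model *exe_file;
};

/* Deny writes; fails if somebody already has the file open for writing. */
static int deny_write_access_model(struct file_model *f)
{
	if (f->writecount > 0)
		return -ETXTBSY;
	f->writecount--;
	return 0;
}

/* Undo one earlier denial. */
static void allow_write_access_model(struct file_model *f)
{
	f->writecount++;
}

/* Open for writing; fails while write access is denied. */
static int open_for_write_model(struct file_model *f)
{
	if (f->writecount < 0)
		return -ETXTBSY;
	f->writecount++;
	return 0;
}

/* Close a descriptor that was opened for writing. */
static void close_write_model(struct file_model *f)
{
	f->writecount--;
}

/* Replace mm->exe_file: deny writes on the new file first, and only
 * on success re-allow writes on the old one. */
static int replace_mm_exe_file_model(struct mm_model *mm,
				     struct file_model *new_exe)
{
	int ret = deny_write_access_model(new_exe);

	if (ret)
		return ret;
	if (mm->exe_file)
		allow_write_access_model(mm->exe_file);
	mm->exe_file = new_exe;
	return 0;
}

/* Usage: walks through the user space visible effects listed above. */
static void exe_file_demo(void)
{
	struct file_model a = { 0 }, b = { 0 };
	struct mm_model mm = { NULL };

	assert(replace_mm_exe_file_model(&mm, &a) == 0);
	assert(open_for_write_model(&a) == -ETXTBSY); /* exe_file is write-denied */

	assert(open_for_write_model(&b) == 0);        /* b now has a writer */
	assert(replace_mm_exe_file_model(&mm, &b) < 0); /* so replacing fails */
	assert(mm.exe_file == &a);

	close_write_model(&b);
	assert(replace_mm_exe_file_model(&mm, &b) == 0);
	assert(open_for_write_model(&a) == 0);        /* old exe_file is writable again */
}
```

      In this model, the denial lives exactly as long as the file is the
      current exe_file, which is the property that makes a separate
      VM_DENYWRITE flag on the mappings unnecessary.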
    • kernel/fork: factor out replacing the current MM exe_file · 35d7bdc8
      David Hildenbrand authored
      Let's factor the main logic out into replace_mm_exe_file(), such that
      all mm->exe_file logic is contained in kernel/fork.c.
      
      While at it, perform some simple cleanups that are possible now that
      we're simplifying the individual functions.
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() · 42be8b42
      David Hildenbrand authored
      uselib() is the legacy system call for loading shared libraries.
      Nowadays, applications use dlopen() to load shared libraries, completely
      implemented in user space via mmap().
      
      For example, glibc uses MAP_COPY to mmap shared libraries. While this
      maps to MAP_PRIVATE | MAP_DENYWRITE on Linux, Linux ignores any
      MAP_DENYWRITE specification from user space in mmap.
      
      With this change, all remaining in-tree users of MAP_DENYWRITE use it
      to map an executable. We will be able to open shared libraries loaded
      via uselib() writable, just as we already can via dlopen() from user
      space.
      
      This is one step in the direction of removing MAP_DENYWRITE from the
      kernel. This can be considered a minor user space visible change.
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
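
      The claim that Linux ignores MAP_DENYWRITE passed to mmap() from user
      space can be checked with a short program. The following is a minimal
      sketch, assuming a Linux system with a writable /tmp; the function names
      and the temporary file path are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a scratch file with MAP_PRIVATE | MAP_DENYWRITE, then try to open it
 * for writing while the mapping exists.
 * Returns 0 if the open succeeds, -1 on any failure. */
static int denywrite_is_ignored(void)
{
	char path[] = "/tmp/denywrite-XXXXXX"; /* assumed scratch location */
	void *map = MAP_FAILED;
	int rfd = -1, wfd = -1, ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (write(fd, "x", 1) != 1) {
		close(fd);
		goto out;
	}
	close(fd); /* drop the writable descriptor before mapping */

	rfd = open(path, O_RDONLY);
	if (rfd < 0)
		goto out;
	map = mmap(NULL, 1, PROT_READ, MAP_PRIVATE | MAP_DENYWRITE, rfd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* If the flag were honored, this open would be refused. */
	wfd = open(path, O_WRONLY);
	if (wfd >= 0)
		ret = 0;
out:
	if (wfd >= 0)
		close(wfd);
	if (map != MAP_FAILED)
		munmap(map, 1);
	if (rfd >= 0)
		close(rfd);
	unlink(path);
	return ret;
}

/* Usage: expected to pass on Linux, where the flag has no effect. */
static void denywrite_demo(void)
{
	assert(denywrite_is_ignored() == 0);
}
```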
  2. 29 Aug, 2021 8 commits
  3. 28 Aug, 2021 3 commits
  4. 27 Aug, 2021 18 commits
  5. 26 Aug, 2021 7 commits
    • PCI/MSI: Skip masking MSI-X on Xen PV · 1a519dc7
      Marek Marczykowski-Górecki authored
      When running as a Xen PV guest, masking MSI-X is the responsibility of the
      hypervisor. The guest has no write access to the relevant BAR at all; when
      it tries to write, it crashes like this:
      
          BUG: unable to handle page fault for address: ffffc9004069100c
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0003) - permissions violation
          RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
           e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
           e1000_probe+0x41f/0xdb0 [e1000e]
           local_pci_probe+0x42/0x80
          (...)
      
      The recently introduced function msix_mask_all() does not check the global
      variable pci_msi_ignore_mask, which is set by Xen PV to bypass the masking
      of MSI[-X] interrupts.
      
      Add the check to make this function Xen PV compatible.
      
      Fixes: 7d5ec3d3 ("PCI/MSI: Mask all unused MSI-X entries")
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
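
      The shape of the fix can be sketched in user space. This is a model under
      stated assumptions, not the kernel function: the MSI-X table is a plain
      byte buffer instead of an MMIO BAR, ignore_mask_model stands in for
      pci_msi_ignore_mask, and every name ending in _model is hypothetical.
      The entry layout (16-byte entries, Vector Control at offset 12, mask in
      bit 0) follows the MSI-X table format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE          16   /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL   12   /* offset of Vector Control dword */
#define MSIX_ENTRY_CTRL_MASKBIT  0x1u /* bit 0: entry is masked */

/* Stand-in for the global flag that Xen PV sets. */
static bool ignore_mask_model;

/* Set the mask bit in every entry, unless masking is owned by the
 * hypervisor, in which case the table must not be written at all. */
static void msix_mask_all_model(uint8_t *table, size_t nr_entries)
{
	uint32_t ctrl = MSIX_ENTRY_CTRL_MASKBIT;

	if (ignore_mask_model)
		return;

	for (size_t i = 0; i < nr_entries; i++)
		memcpy(table + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
		       &ctrl, sizeof(ctrl));
}

/* Read back the Vector Control dword of one entry. */
static uint32_t msix_entry_ctrl_model(const uint8_t *table, size_t i)
{
	uint32_t ctrl;

	memcpy(&ctrl, table + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL,
	       sizeof(ctrl));
	return ctrl;
}

/* Usage: with the flag set, the table stays untouched. */
static void msix_demo(void)
{
	uint8_t table[4 * MSIX_ENTRY_SIZE] = { 0 };

	ignore_mask_model = true;
	msix_mask_all_model(table, 4);
	assert(msix_entry_ctrl_model(table, 0) == 0);

	ignore_mask_model = false;
	msix_mask_all_model(table, 4);
	assert(msix_entry_ctrl_model(table, 3) == MSIX_ENTRY_CTRL_MASKBIT);
}
```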
    • Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux · 73367f05
      Linus Torvalds authored
      Pull nfsd fix from Bruce Fields:
       "This is a one-liner fix for a serious bug that can cause the server to
        become unresponsive to a client, so I think it's worth the last-minute
        inclusion for 5.14"
      
      * tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
        SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
    • Merge tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8a2cb8bd
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from can and bpf.
      
        Closing three hw-dependent regressions. Any fixes of note are in the
        'old code' category. Nothing blocking release from our perspective.
      
        Current release - regressions:
      
         - stmmac: revert "stmmac: align RX buffers"
      
         - usb: asix: ax88772: move embedded PHY detection as early as
           possible
      
         - usb: asix: do not call phy_disconnect() for ax88178
      
         - Revert "net: really fix the build...", from Kalle to fix QCA6390
      
        Current release - new code bugs:
      
         - phy: mediatek: add the missing suspend/resume callbacks
      
        Previous releases - regressions:
      
         - qrtr: fix another OOB Read in qrtr_endpoint_post
      
         - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
      
        Previous releases - always broken:
      
         - inet: use siphash in exception handling
      
         - ip_gre: add validation for csum_start
      
         - bpf: fix ringbuf helper function compatibility
      
         - rtnetlink: return correct error on changing device netns
      
         - e1000e: do not try to recover the NVM checksum on Tiger Lake"
      
      * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
        Revert "net: really fix the build..."
        net: hns3: fix get wrong pfc_en when query PFC configuration
        net: hns3: fix GRO configuration error after reset
        net: hns3: change the method of getting cmd index in debugfs
        net: hns3: fix duplicate node in VLAN list
        net: hns3: fix speed unknown issue in bond 4
        net: hns3: add waiting time before cmdq memory is released
        net: hns3: clear hardware resource when loading driver
        net: fix NULL pointer reference in cipso_v4_doi_free
        rtnetlink: Return correct error on changing device netns
        net: dsa: hellcreek: Adjust schedule look ahead window
        net: dsa: hellcreek: Fix incorrect setting of GCL
        cxgb4: dont touch blocked freelist bitmap after free
        ipv4: use siphash instead of Jenkins in fnhe_hashfun()
        ipv6: use siphash in rt6_exception_hash()
        can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
        net: usb: asix: ax88772: fix boolconv.cocci warnings
        net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
        qede: Fix memset corruption
        net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
        ...
    • Revert "block/mq-deadline: Prioritize high-priority requests" · 7b05bf77
      Jens Axboe authored
      This reverts commit fb926032.
      
      Zhen reports that this commit slows down mq-deadline on a 128 thread
      box, going from 258K IOPS to 170-180K. My testing shows that Optane
      gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box.
      
      Looking in detail at the code, the main culprit here is needing to sum
      percpu counters in the dispatch hot path, leading to very high CPU
      utilization there. To make matters worse, the code currently needs to
      sum 2 percpu counters, and it does so in the most naive way of iterating
      possible CPUs _twice_.
      
      Since we're close to release, revert this commit and we can re-do it
      with regular per-priority counters instead for the 5.15 kernel.
      
      Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/
      Reported-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
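
      Why summing percpu counters hurts in a hot path can be seen in a small
      model. This sketch is not the mq-deadline code: the per-CPU slots are a
      plain array, the CPU count of 128 is picked to match the box mentioned
      above, and the counter names "inserted" and "completed" are assumptions
      made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 128 /* assumed number of possible CPUs */

/* One slot per CPU: cheap to increment, expensive to read as a total. */
struct percpu_counter_model {
	long slot[NR_CPUS_MODEL];
};

/* Instrumentation: how many slots the sums have touched so far. */
static size_t slots_read;

static void pcpu_inc_model(struct percpu_counter_model *c, unsigned int cpu)
{
	c->slot[cpu % NR_CPUS_MODEL]++;
}

/* Reading the total walks every possible CPU. */
static long pcpu_sum_model(const struct percpu_counter_model *c)
{
	long total = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		total += c->slot[cpu];
		slots_read++;
	}
	return total;
}

/* Two counters means two full passes over the CPUs per query. */
static long queued_percpu_model(const struct percpu_counter_model *inserted,
				const struct percpu_counter_model *completed)
{
	return pcpu_sum_model(inserted) - pcpu_sum_model(completed);
}

/* Regular counters: a query is a single subtraction. */
struct plain_counters_model {
	long inserted;
	long completed;
};

static long queued_plain_model(const struct plain_counters_model *c)
{
	return c->inserted - c->completed;
}

/* Usage: same answer, very different read cost. */
static void counters_demo(void)
{
	struct percpu_counter_model inserted = { { 0 } }, completed = { { 0 } };
	struct plain_counters_model plain = { 5, 2 };

	for (unsigned int cpu = 0; cpu < 5; cpu++)
		pcpu_inc_model(&inserted, cpu);
	pcpu_inc_model(&completed, 0);
	pcpu_inc_model(&completed, 1);

	slots_read = 0;
	assert(queued_percpu_model(&inserted, &completed) == 3);
	assert(slots_read == 2 * NR_CPUS_MODEL);
	assert(queued_plain_model(&plain) == 3);
}
```

      Regular counters need some form of serialization on update, which this
      model leaves out; it only shows the read-side cost.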
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 1a6d80ff
      Linus Torvalds authored
      Pull arm64 fix from Will Deacon:
       "We received a report this week that the generic version of
        pfn_valid(), which we switched to this merge window in 16c9afc7
        ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), interacts badly with
        dma_map_resource() due to the following check:
      
              /* Don't allow RAM to be mapped */
              if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
                      return DMA_MAPPING_ERROR;
      
        Since the ongoing saga to determine the semantics of pfn_valid() is
        unlikely to be resolved this week (does it indicate valid memory, or
        just the presence of a struct page, or whether that struct page has
        been initialised?), just revert back to our old version of pfn_valid()
        for 5.14.
      
        Summary:
      
         - Fix dma_map_resource() by reverting back to old pfn_valid() code"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
    • Merge tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client · 97d8cc20
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two memory management fixes for the filesystem"
      
      * tag 'ceph-for-5.14-rc8' of git://github.com/ceph/ceph-client:
        ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()
        ceph: correctly handle releasing an embedded cap flush
    • Revert "net: really fix the build..." · 9ebc2758
      Kalle Valo authored
      This reverts commit ce78ffa3.
      
      Wren and Nicolas reported that ath11k was failing to initialise the QCA6390
      Wi-Fi 6 device with the error:
      
      qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
      
      Commit ce78ffa3 ("net: really fix the build..."), introduced in
      v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
      devices are broken, but I only tested QCA6390. Let's revert the broken
      commit so that ath11k works again.
      Reported-by: Wren Turkal <wt@penguintechs.org>
      Reported-by: Nicolas Schichan <nschichan@freebox.fr>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>