1. 27 May, 2022 8 commits
    • netfilter: conntrack: re-fetch conntrack after insertion · 56b14ece
      Florian Westphal authored
      If the conntrack entry clashes with an existing one, insertion can
      free skb->_nfct and set skb->_nfct to the already-confirmed entry, so
      the conntrack must be re-fetched from the skb after insertion.
      
      This wasn't found before because the conntrack entry and the extension
      space used to be freed only after an RCU grace period, and the race
      also needs events enabled to trigger.
      
      Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com>
      Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
      Fixes: 2ad9d774 ("netfilter: conntrack: free extension area immediately")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
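The pattern this fix applies can be illustrated with a small user-space sketch: after a step that may replace the object a packet points to, the caller re-reads the pointer instead of reusing a copy taken earlier. `struct entry`, `struct pkt`, `confirm()` and `confirm_and_refetch()` are hypothetical stand-ins for `struct nf_conn`, the skb and the confirmation path, not the actual netfilter code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry (struct nf_conn). */
struct entry {
	int confirmed;
	int refs;
};

/* Hypothetical stand-in for an skb; ct plays the role of skb->_nfct. */
struct pkt {
	struct entry *ct;
};

/* Hypothetical confirm step: on a clash, drop the packet's own entry
 * (in the kernel this may free it) and attach the winner instead. */
static void confirm(struct pkt *p, struct entry *existing)
{
	assert(p->ct != NULL);
	if (existing) {
		p->ct->refs--;
		p->ct = existing;
		existing->refs++;
	} else {
		p->ct->confirmed = 1;
	}
}

/* Confirm, then re-read the packet's entry rather than trusting a
 * pointer that was cached before confirm() ran. */
static struct entry *confirm_and_refetch(struct pkt *p, struct entry *existing)
{
	confirm(p, existing);
	return p->ct;
}
```

A pointer cached before `confirm()` would still refer to the losing entry after a clash; only the re-fetched one is safe to use.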
    • netfilter: nfnetlink: fix warn in nfnetlink_unbind · ffd219ef
      Florian Westphal authored
      syzbot reports the following warning:
      WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694
      
      The syzbot-generated program does this:
      
      socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3
      setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0
      
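      ... which triggers the 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)'
      check: the socket drops a group membership it never added, so the
      unbind has no matching bind and the listener count is already zero.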
      
      Instead of counting, just enable reporting for every bind request
      and check if we still have listeners on unbind.
      
      While at it, also add the needed bounds check on nfnl_group2type[]
      access.
      
      Reported-by: <syzbot+4903218f7fba0a2d6226@syzkaller.appspotmail.com>
      Reported-by: <syzbot+afd2d80e495f96049571@syzkaller.appspotmail.com>
      Fixes: 2794cdb0 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
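A minimal sketch of the approach described above, assuming a single conntrack group type and one membership flag per group (the kernel instead asks netlink whether listeners remain): binding always enables reporting, unbinding recomputes it, and the user-supplied group number is range-checked before it indexes the table. `NGROUPS`, `group2type`, `struct nl_state` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NGROUPS 8 /* hypothetical number of multicast groups */

/* Hypothetical group -> subsystem map; type 1 stands for "conntrack". */
static const int group2type[NGROUPS] = { 0, 1, 1, 1, 2, 2, 2, 0 };

struct nl_state {
	bool member[NGROUPS]; /* groups that currently have a subscriber */
	bool reporting;       /* "conntrack events have listeners" */
};

/* Group numbers come from user space: validate before indexing. */
static int group_type(int group)
{
	if (group < 0 || group >= NGROUPS)
		return -1;
	return group2type[group];
}

static void bind_group(struct nl_state *s, int group)
{
	assert(s != NULL);
	if (group_type(group) != 1)
		return;
	s->member[group] = true;
	s->reporting = true; /* enable on every bind; nothing is counted */
}

static void unbind_group(struct nl_state *s, int group)
{
	assert(s != NULL);
	if (group_type(group) != 1)
		return;
	s->member[group] = false;
	/* Recompute from the remaining subscribers instead of decrementing
	 * a counter, so an unmatched unbind cannot underflow anything. */
	s->reporting = false;
	for (int g = 0; g < NGROUPS; g++)
		if (group2type[g] == 1 && s->member[g])
			s->reporting = true;
}
```

Dropping a group that was never joined, as the syzbot program does, simply leaves reporting disabled instead of tripping a counter check.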
    • net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register · 02ded5a1
      Miaoqian Lin authored
      of_get_child_by_name() returns a node pointer with its refcount
      incremented, so we should call of_node_put() on it when done.
      
      mv88e6xxx_mdio_register() passes the device node to
      of_mdiobus_register(); we don't need the device node after that.
      
      Add missing of_node_put() to avoid refcount leak.
      
      Fixes: a3c53be5 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
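A sketch of the get/put pairing, assuming the consumer only needs the node while registration runs. `struct node`, `get_child()`, `node_put()`, `register_bus()` and `mdios_register()` are hypothetical stand-ins for `of_get_child_by_name()`, `of_node_put()`, `of_mdiobus_register()` and the driver function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct node {
	int refcount;
};

/* Like of_get_child_by_name(): the returned node carries an extra ref. */
static struct node *get_child(struct node *child)
{
	if (child)
		child->refcount++;
	return child;
}

/* Like of_node_put(): drop one reference; NULL is allowed. */
static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Hypothetical registration that only uses the node while it runs. */
static int register_bus(struct node *np)
{
	return np ? 0 : -1;
}

static int mdios_register(struct node *child)
{
	struct node *np = get_child(child);
	int err = register_bus(np);

	node_put(np); /* release our reference once registration is done */
	return err;
}
```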
    • net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks · 5dd89d2f
      Miaoqian Lin authored
      of_get_child_by_name() returns a node pointer with its refcount
      incremented, so we should call of_node_put() on it when it is no
      longer needed. am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't
      release the reference in their error paths.
      Add the missing of_node_put() calls to avoid refcount leaks.
      
      Fixes: b1f66a5b ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
      Fixes: 93a76530 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
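The error-path variant of this leak is commonly fixed with a goto-based unwind. The sketch below assumes, as the commit message implies, that the success path keeps its reference (here by storing it in a hypothetical `struct dev_ctx`) and that only the error paths were missing the put; `node_get()`, `node_put()`, `init_cpts()` and `probe()` are hypothetical as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct node {
	int refcount;
};

struct dev_ctx {
	struct node *cpts_node; /* holds a reference on success */
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Hypothetical init step that can fail. */
static int init_cpts(struct node *np, int fail)
{
	(void)np;
	return fail ? -ENOMEM : 0;
}

static int probe(struct dev_ctx *ctx, struct node *child, int fail)
{
	struct node *np = node_get(child);
	int ret;

	if (!np)
		return -ENOENT;

	ret = init_cpts(np, fail);
	if (ret)
		goto put_node;

	ctx->cpts_node = np; /* success: ownership of the ref moves to ctx */
	return 0;

put_node:
	node_put(np); /* error: nobody keeps np, so drop the ref */
	return ret;
}
```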
    • net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry() · e7e7104e
      Dan Carpenter authored
      The "fsp->location" variable comes from user space via
      ethtool_get_rxnfc(). Check that it is valid to prevent an
      out-of-bounds read.
      
      Fixes: 7aab747e ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
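A sketch of the kind of check described, assuming a fixed-size table indexed by an unsigned, user-supplied location. `struct fdir_entry`, `fdir_table` and `get_fdir_entry()` are hypothetical; the real driver's table and field names may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct fdir_entry {
	unsigned int dst_ip;
	unsigned int dst_port;
};

/* Hypothetical per-ring flow table. */
static struct fdir_entry fdir_table[4];

static int get_fdir_entry(unsigned int location, struct fdir_entry *out)
{
	assert(out != NULL);
	/* location is user controlled: validate it before indexing */
	if (location >= ARRAY_SIZE(fdir_table))
		return -EINVAL;
	*out = fdir_table[location];
	return 0;
}
```

Using an unsigned index means a single upper-bound comparison also rejects values that would be negative if interpreted as signed.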
    • net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog · a54ce370
      Vincent Ray authored
      In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
      does not provide any ordering guarantee as test_bit() is not an atomic
      operation. This, combined with the fact that the spin_trylock() call at
      the beginning of qdisc_run_begin() does not guarantee acquire
      semantics if it does not grab the lock, makes it possible for the
      following statement:
      
      if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
      
      to be executed before an enqueue operation called before
      qdisc_run_begin().
      
      As a result, the following race can happen:
      
                 CPU 1                             CPU 2
      
            qdisc_run_begin()               qdisc_run_begin() /* true */
              set(MISSED)                            .
            /* returns false */                      .
                .                            /* sees MISSED = 1 */
                .                            /* so qdisc not empty */
                .                            __qdisc_run()
                .                                    .
                .                              pfifo_fast_dequeue()
       ----> /* may be done here */                  .
      |         .                                clear(MISSED)
      |         .                                    .
      |         .                                smp_mb__after_atomic();
      |         .                                    .
      |         .                                /* recheck the queue */
      |         .                                /* nothing => exit   */
      |   enqueue(skb1)
      |         .
      |   qdisc_run_begin()
      |         .
      |     spin_trylock() /* fail */
      |         .
      |     smp_mb__before_atomic() /* not enough */
      |         .
       ---- if (test_bit(MISSED))
              return false;   /* exit */
      
      In the above scenario, CPU 1 and CPU 2 both try to grab the
      qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
      bypass code path, where it emits its skb then calls __qdisc_run().
      
      CPU 1 fails, sets MISSED and goes down the traditional enqueue() +
      dequeue() code path. But when executing qdisc_run_begin() for the
      second time, after enqueuing its skbuff, it sees the MISSED bit still
      set (by itself) and consequently chooses to exit early without setting
      it again or trying to grab the spinlock again.
      
      Meanwhile, CPU 2 has seen MISSED = 1, cleared it, checked the queue
      and found it empty, so it returned.
      
      At the end of the sequence, we end up with skb1 enqueued in the
      backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
      and no __netif_schedule() call made. skb1 will now linger in the
      qdisc until somebody later performs a full __qdisc_run(). Combined
      with the bypass capability of the qdisc and the ability of the TCP
      layer to avoid resending packets which it knows are still in the
      qdisc, this can lead to serious traffic "holes" in a TCP connection.
      
      We fix this by replacing the smp_mb__before_atomic() / test_bit() /
      set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
      with a single test_and_set_bit() call, which is more concise and
      enforces the needed memory barriers.
      
      Fixes: 89837eb4 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
      Signed-off-by: Vincent Ray <vray@kalrayinc.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
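The ordering argument above can be sketched in portable C11 atomics. This is a toy model under stated assumptions, not the kernel code: `atomic_exchange()` stands in for `spin_trylock()`, and `atomic_fetch_or()` with its default sequentially consistent ordering stands in for `test_and_set_bit()`, which the fix relies on because it is a fully ordered atomic read-modify-write. The dequeue side and `qdisc_run_end()` are not modeled; `struct toy_qdisc` and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STATE_MISSED 1UL /* stand-in for the __QDISC_STATE_MISSED bit */

struct toy_qdisc {
	atomic_bool seqlock; /* true while some CPU owns the qdisc */
	atomic_ulong state;
};

static void toy_init(struct toy_qdisc *q)
{
	assert(q != NULL);
	atomic_init(&q->seqlock, false);
	atomic_init(&q->state, 0UL);
}

/* exchange returns the old value: false means we took the lock */
static bool toy_trylock(struct toy_qdisc *q)
{
	return !atomic_exchange(&q->seqlock, true);
}

static bool toy_run_begin(struct toy_qdisc *q)
{
	if (toy_trylock(q))
		return true;
	/* One atomic read-modify-write (seq_cst by default) instead of a
	 * plain test followed by a separate set: it is ordered against the
	 * caller's earlier enqueue and against the retry below. */
	if (atomic_fetch_or(&q->state, STATE_MISSED) & STATE_MISSED)
		return false; /* MISSED was already set: leave it to the owner */
	/* Retry: either we get the lock now, or (in the kernel) the owner
	 * sees MISSED when it re-checks it in qdisc_run_end(). */
	return toy_trylock(q);
}
```

Run single-threaded, three calls in a row return true (lock taken), false (lock held, MISSED newly set, retry fails), then false (MISSED already set).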
    • net: lan966x: check devm_of_phy_get() for -EDEFER_PROBE · b58cdd43
      Michael Walle authored
      At the moment, if devm_of_phy_get() returns an error, the serdes
      simply isn't set. Ignoring an error is bad in general, but here it
      also causes a concrete bug: networking doesn't work if the serdes
      driver is compiled as a module. In that case, devm_of_phy_get()
      returns -EPROBE_DEFER and the error is silently ignored.
      
      The serdes is optional: it is absent if the port uses RGMII, in
      which case devm_of_phy_get() returns -ENODEV. Rearrange the error
      handling so that -ENODEV is tolerated but other error codes abort
      the probe.
      
      Fixes: d28d6d2e ("net: lan966x: add port module support")
      Signed-off-by: Michael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20220525231239.1307298-1-michael@walle.cc
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
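A sketch of the error-handling shape described above: `-ENODEV` means "no serdes" and is tolerated, while everything else, including `-EPROBE_DEFER`, is propagated so the probe fails and can be retried once the serdes driver loads. `port_get_serdes()` and `struct phy` are hypothetical, and the lookup result is passed as a (pointer, errno) pair rather than an `ERR_PTR()`-encoded pointer. `EPROBE_DEFER` is kernel-internal, so the sketch defines it with the kernel's value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal value from include/linux/errno.h */
#endif

struct phy {
	int id;
};

/* Hypothetical helper: 'found' and 'lookup_err' model the result of a PHY
 * lookup such as devm_of_phy_get(). Returns 0 and sets *serdes (NULL when
 * the serdes is absent), or returns a negative errno to abort the probe. */
static int port_get_serdes(struct phy *found, int lookup_err,
			   struct phy **serdes)
{
	assert(serdes != NULL);
	*serdes = NULL;

	if (lookup_err == 0) {
		*serdes = found;
		return 0;
	}
	if (lookup_err == -ENODEV)
		return 0;      /* optional serdes absent, e.g. RGMII port */
	return lookup_err;     /* -EPROBE_DEFER and real errors abort probe */
}
```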
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 4548ad72
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Fix UAF when creating non-stateful expression in set.
      
      2) Copy the limit cost when cloning the expression, from Phil Sutter.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nft_limit: Clone packet limits' cost value
        netfilter: nf_tables: disallow non-stateful expression in sets earlier
      ====================
      
      Link: https://lore.kernel.org/r/20220526205411.315136-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 26 May, 2022 12 commits
  3. 25 May, 2022 20 commits
    • Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 7e062cda
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Core
        ----
      
         - Support TCPv6 segmentation offload with super-segments larger than
           64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
      
         - Generalize skb freeing deferral to per-cpu lists, instead of
           per-socket lists.
      
         - Add a netdev statistic for packets dropped due to L2 address
           mismatch (rx_otherhost_dropped).
      
         - Continue work annotating skb drop reasons.
      
         - Accept alternative netdev names (ALT_IFNAME) in more netlink
           requests.
      
         - Add VLAN support for AF_PACKET SOCK_RAW GSO.
      
         - Allow receiving skb mark from the socket as a cmsg.
      
         - Enable memcg accounting for veth queues, sysctl tables and IPv6.
      
        BPF
        ---
      
         - Add libbpf support for User Statically-Defined Tracing (USDTs).
      
         - Speed up symbol resolution for kprobes multi-link attachments.
      
         - Support storing typed pointers to referenced and unreferenced
           objects in BPF maps.
      
         - Add support for BPF link iterator.
      
         - Introduce access to remote CPU map elements in BPF per-cpu map.
      
         - Allow middle-of-the-road settings for the
           kernel.unprivileged_bpf_disabled sysctl.
      
         - Implement basic types of dynamic pointers e.g. to allow for
           dynamically sized ringbuf reservations without extra memory copies.
      
        Protocols
        ---------
      
         - Retire the port-only listening_hash table and add a second bind
           table hashed by port and address, avoiding a linear list walk
           when binding to very popular ports (e.g. 443).
      
         - Add bridge FDB bulk flush filtering support allowing user space to
           remove all FDB entries matching a condition.
      
         - Introduce accept_unsolicited_na sysctl for IPv6 to implement
           router-side changes for RFC9131.
      
         - Support for MPTCP path manager in user space.
      
         - Add MPTCP support for fallback to regular TCP for connections that
           have never connected additional subflows or transmitted
           out-of-sequence data (partial support for RFC8684 fallback).
      
         - Avoid races in MPTCP-level window tracking, stabilize and improve
           throughput.
      
         - Support lockless operation of GRE tunnels with seq numbers enabled.
      
         - WiFi support for host based BSS color collision detection.
      
         - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
      
         - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
      
         - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
      
         - Allow matching on the number of VLAN tags via tc-flower.
      
         - Add tracepoint for tcp_set_ca_state().
      
        Driver API
        ----------
      
         - Improve error reporting from classifier and action offload.
      
         - Add support for listing line cards in switches (devlink).
      
         - Add helpers for reporting page pool statistics with ethtool -S.
      
         - Add support for reading clock cycles when using PTP virtual clocks,
           instead of having the driver convert to time before reporting. This
           makes it possible to report time from different vclocks.
      
         - Support configuring low-latency Tx descriptor push via ethtool.
      
         - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
      
        New hardware / drivers
        ----------------------
      
         - Ethernet:
            - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
            - Sunplus SP7021 SoC (sp7021_emac)
            - Add support for Renesas RZ/V2M (in ravb)
            - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
      
         - Ethernet PHYs:
            - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
            - TI DP83TD510 PHY
            - Microchip LAN8742/LAN88xx PHYs
      
         - WiFi:
            - Driver for pureLiFi X, XL, XC devices (plfxlc)
            - Driver for Silicon Labs devices (wfx)
            - Support for WCN6750 (in ath11k)
            - Support Realtek 8852ce devices (in rtw89)
      
         - Mobile:
            - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
      
         - CAN:
            - ctucanfd: add support for CTU CAN FD open-source IP core from
              Czech Technical University in Prague
      
        Drivers
        -------
      
         - Delete a number of old drivers still using virt_to_bus().
      
         - Ethernet NICs:
            - intel: support TSO on tunnels MPLS
            - broadcom: support multi-buffer XDP
            - nfp: support VF rate limiting
            - sfc: use hardware tx timestamps for more than PTP
            - mlx5: multi-port eswitch support
            - hyper-v: add support for XDP_REDIRECT
            - atlantic: XDP support (including multi-buffer)
            - macb: improve real-time perf by deferring Tx processing to NAPI
      
         - High-speed Ethernet switches:
            - mlxsw: implement basic line card information querying
            - prestera: add support for traffic policing on ingress and egress
      
         - Embedded Ethernet switches:
            - lan966x: add support for packet DMA (FDMA)
            - lan966x: add support for PTP programmable pins
            - ti: cpsw_new: enable bc/mc storm prevention
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - Wake-on-WLAN support for QCA6390 and WCN6855
            - device recovery (firmware restart) support
            - support setting Specific Absorption Rate (SAR) for WCN6855
            - read country code from SMBIOS for WCN6855/QCA6390
            - enable keep-alive during WoWLAN suspend
            - implement remain-on-channel support
      
         - MediaTek WiFi (mt76):
            - support Wireless Ethernet Dispatch offloading packet movement
              between the Ethernet switch and WiFi interfaces
            - non-standard VHT MCS10-11 support
            - mt7921 AP mode support
            - mt7921 IPv6 NS offload support
      
         - Ethernet PHYs:
            - micrel: ksz9031/ksz9131: cabletest support
            - lan87xx: SQI support for T1 PHYs
            - lan937x: add interrupt support for link detection"
      
      * tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
        ptp: ocp: Add firmware header checks
        ptp: ocp: fix PPS source selector debugfs reporting
        ptp: ocp: add .init function for sma_op vector
        ptp: ocp: vectorize the sma accessor functions
        ptp: ocp: constify selectors
        ptp: ocp: parameterize input/output sma selectors
        ptp: ocp: revise firmware display
        ptp: ocp: add Celestica timecard PCI ids
        ptp: ocp: Remove #ifdefs around PCI IDs
        ptp: ocp: 32-bit fixups for pci start address
        Revert "net/smc: fix listen processing for SMC-Rv2"
        ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
        selftests/bpf: Dynptr tests
        bpf: Add dynptr data slices
        bpf: Add bpf_dynptr_read and bpf_dynptr_write
        bpf: Dynptr support for ring buffers
        bpf: Add bpf_dynptr_from_mem for local dynptrs
        bpf: Add verifier support for dynptrs
        bpf: Suppress 'passing zero to PTR_ERR' warning
        bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
        ...
    • Merge branch 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 5d1772b1
      Linus Torvalds authored
      Pull workqueue update from Tejun Heo:
       "A lone commit fixing CPU offline handling for per-cpu wq workers so
        that they don't bother isolated CPUs"
      
      * 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs
    • Merge branch 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 8b49c4b1
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
       "Nothing too interesting. This adds cpu controller selftests and there
        are a couple of code cleanup patches"
      
      * 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: remove the superfluous judgment
        cgroup: Make cgroup_debug static
        kseltest/cgroup: Make test_stress.sh work if run interactively
        kselftest/cgroup: fix test_stress.sh to use OUTPUT dir
        cgroup: Add config file to cgroup selftest suite
        cgroup: Add test_cpucg_max_nested() testcase
        cgroup: Add test_cpucg_max() testcase
        cgroup: Add test_cpucg_nested_weight_underprovisioned() testcase
        cgroup: Adding test_cpucg_nested_weight_overprovisioned() testcase
        cgroup: Add test_cpucg_weight_underprovisioned() testcase
        cgroup: Add test_cpucg_weight_overprovisioned() testcase
        cgroup: Add test_cpucg_stats() testcase to cgroup cpu selftests
        cgroup: Add new test_cpu.c test suite in cgroup selftests
    • Merge tag 'linux-kselftest-kunit-5.19-rc1' of... · 64e34b50
      Linus Torvalds authored
      Merge tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit updates from Shuah Khan:
       "Several fixes, cleanups, and enhancements to tests and framework:
      
         - introduce _NULL and _NOT_NULL macros to pointer error checks
      
         - rework kunit_resource allocation policy to fix memory leaks when
           the caller doesn't specify a free() function to be used when
           allocating memory with kunit_add_resource() and
           kunit_alloc_resource().
      
         - add ability to specify suite-level init and exit functions"
      
      * tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits)
        kunit: tool: Use qemu-system-i386 for i386 runs
        kunit: fix executor OOM error handling logic on non-UML
        kunit: tool: update riscv QEMU config with new serial dependency
        kcsan: test: use new suite_{init,exit} support
        kunit: tool: Add list of all valid test configs on UML
        kunit: take `kunit_assert` as `const`
        kunit: tool: misc cleanups
        kunit: tool: minor cosmetic cleanups in kunit_parser.py
        kunit: tool: make parser stop overwriting status of suites w/ no_tests
        kunit: tool: remove dead parse_crash_in_log() logic
        kunit: tool: print clearer error message when there's no TAP output
        kunit: tool: stop using a shell to run kernel under QEMU
        kunit: tool: update test counts summary line format
        kunit: bail out of test filtering logic quicker if OOM
        lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
        kunit: Rework kunit_resource allocation policy
        kunit: fix debugfs code to use enum kunit_status, not bool
        kfence: test: use new suite_{init/exit} support, add .kunitconfig
        kunit: add ability to specify suite-level init and exit functions
        kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite)
        ...
    • Merge tag 'linux-kselftest-next-5.19-rc1' of... · 1c6d2ead
      Linus Torvalds authored
      Merge tag 'linux-kselftest-next-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest updates from Shuah Khan:
       "Several fixes, cleanups, and enhancements to tests:
      
         - add mips support for kprobe args string and syntax tests
      
         - updates to resctrl test to use kselftest framework
      
         - fixes, cleanups, and enhancements to tests"
      
      * tag 'linux-kselftest-next-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kselftests/ir : Improve readability of modprobe error message
        selftests/resctrl: Fix null pointer dereference on open failed
        selftests/resctrl: Add missing SPDX license to Makefile
        selftests/resctrl: Update README about using kselftest framework to build/run resctrl_tests
        selftests/resctrl: Make resctrl_tests run using kselftest framework
        selftests/resctrl: Fix resctrl_tests' return code to work with selftest framework
        selftests/resctrl: Change the default limited time to 120 seconds
        selftests/resctrl: Kill child process before parent process terminates if SIGTERM is received
        selftests/resctrl: Print a message if the result of MBM&CMT tests is failed on Intel CPU
        selftests/resctrl: Extend CPU vendor detection
        selftests/x86/corrupt_xstate_header: Use provided __cpuid_count() macro
        selftests/x86/amx: Use provided __cpuid_count() macro
        selftests/vm/pkeys: Use provided __cpuid_count() macro
        selftests: Provide local define of __cpuid_count()
        selftests/damon: add damon to selftests root Makefile
        selftests/binderfs: Improve message to provide more info
        selftests: mqueue: drop duplicate min definition
        selftests/ftrace: add mips support for kprobe args syntax tests
        selftests/ftrace: add mips support for kprobe args string tests
    • Merge tag 'docs-5.19' of git://git.lwn.net/linux · 88a61892
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "It was a moderately busy cycle for documentation; highlights include:
      
         - After a long period of inactivity, the Japanese translations are
           seeing some much-needed maintenance and updating.
      
         - Reworked IOMMU documentation
      
         - Some new documentation for static-analysis tools
      
         - A new overall structure for the memory-management documentation.
           This is an LSFMM outcome that, it is hoped, will help encourage
           developers to fill in the many gaps. Optimism is eternal...but
           hopefully it will work.
      
         - More Chinese translations.
      
        Plus the usual typo fixes, updates, etc"
      
      * tag 'docs-5.19' of git://git.lwn.net/linux: (70 commits)
        docs: pdfdocs: Add space for chapter counts >= 100 in TOC
        docs/zh_CN: Add dev-tools/gdb-kernel-debugging.rst Chinese translation
        input: Docs: correct ntrig.rst typo
        input: Docs: correct atarikbd.rst typos
        MAINTAINERS: Become the docs/zh_CN maintainer
        docs/zh_CN: fix devicetree usage-model translation
        mm,doc: Add new documentation structure
        Documentation: drop more IDE boot options and ide-cd.rst
        Documentation/process: use scripts/get_maintainer.pl on patches
        MAINTAINERS: Add entry for DOCUMENTATION/JAPANESE
        docs/trans/ja_JP/howto: Don't mention specific kernel versions
        docs/ja_JP/SubmittingPatches: Request summaries for commit references
        docs/ja_JP/SubmittingPatches: Add Suggested-by as a standard signature
        docs/ja_JP/SubmittingPatches: Randy has moved
        docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl
        docs/ja_JP/SubmittingPatches: Update GregKH links
        Documentation/sysctl: document max_rcu_stall_to_panic
        Documentation: add missing angle bracket in cgroup-v2 doc
        Documentation: dev-tools: use literal block instead of code-block
        docs/zh_CN: add vm numa translation
        ...
    • Merge tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 537e62c8
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Offload writing printk() messages on consoles to per-console
         kthreads.
      
         It prevents soft-lockups when a large number of messages is
         printed. These were observed, for example, during boot of large
         systems with many peripherals such as disks or network interfaces.
      
         It prevents live-lockups that were observed, for example, when
         messages about allocation failures were reported and a CPU handled
         consoles instead of reclaiming memory. This was hard to solve even
         with rate limiting, because the limit would need to take into
         account both the number of messages and the speed of all consoles.
      
         It is a must-have for real time. Otherwise, any printk() might
         break latency guarantees.
      
         The per-console kthreads allow each console to be handled at its
         own speed. Slow consoles no longer slow down faster ones, and
         printk() no longer unpredictably slows down various code paths.
      
         There are situations when the kthreads are either not available or
         not reliable, for example, early boot, suspend, or panic. In these
         situations, printk() uses the legacy mode and tries to handle
         consoles immediately.
      
       - Add documentation for the printk index.
      
      * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk, tracing: fix console tracepoint
        printk: remove @console_locked
        printk: extend console_lock for per-console locking
        printk: add kthread console printers
        printk: add functions to prefer direct printing
        printk: add pr_flush()
        printk: move buffer definitions into console_emit_next_record() caller
        printk: refactor and rework printing logic
        printk: add con_printk() macro for console details
        printk: call boot_delay_msec() in printk_delay()
        printk: get caller_id/timestamp after migration disable
        printk: wake waiters for safe and NMI contexts
        printk: wake up all waiters
        printk: add missing memory barrier to wake_up_klogd()
        printk: cpu sync always disable interrupts
        printk: rename cpulock functions
        printk/index: Printk index feature documentation
        MAINTAINERS: Add printk indexing maintainers on mention of printk_index
    • Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 2e17ce11
      Linus Torvalds authored
      Pull slab updates from Vlastimil Babka:
      
       - Conversion of slub_debug stack traces to stackdepot, allowing more
         useful debugfs-based inspection for e.g. memory leak debugging.
         Allocation and free debugfs info now includes full traces and is
         sorted by the unique trace frequency.
      
         The stackdepot conversion was already attempted last year but
         reverted by ae14c63a. The memory overhead (even when not enabled
         at boot) has meanwhile been solved by making the large stackdepot
         allocation dynamic. The xfstest issues have not been reproduced on
         the current kernel, either locally or in -next, so the slab cache
         layout changes that originally made that bug manifest were
         probably not the root cause.
      
       - Refactoring of dma-kmalloc caches creation.
      
       - Trivial cleanups such as removal of unused parameters, fixes and
         clarifications of comments.
      
       - Hyeonggon Yoo joins as a reviewer.
      
      * tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        MAINTAINERS: add myself as reviewer for slab
        mm/slub: remove unused kmem_cache_order_objects max
        mm: slab: fix comment for __assume_kmalloc_alignment
        mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
        mm/slub: remove unneeded return value of slab_pad_check
        mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
        mm/slub: remove meaningless node check in ___slab_alloc()
        mm/slub: remove duplicate flag in allocate_slab()
        mm/slub: remove unused parameter in setup_object*()
        mm/slab.c: fix comments
        slab, documentation: add description of debugfs files for SLUB caches
        mm/slub: sort debugfs output by frequency of stack traces
        mm/slub: distinguish and print stack traces in debugfs files
        mm/slub: use stackdepot to save stack trace in objects
        mm/slub: move struct track init out of set_track()
        lib/stackdepot: allow requesting early initialization dynamically
        mm/slub, kunit: Make slub_kunit unaffected by user specified flags
        mm/slab: remove some unused functions
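The deduplicate-then-rank idea behind the new debugfs output can be sketched as follows: each unique stack trace is stored once, objects refer to it by a small handle, and reporting sorts the unique traces by how often they were seen. This is a simplified illustration under assumed fixed limits, using a linear search where stackdepot uses a hash table; all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 8    /* hypothetical limits for the sketch */
#define MAX_TRACES 16

struct trace_rec {
	unsigned long frames[MAX_DEPTH];
	unsigned int depth;
	unsigned int count; /* objects that recorded exactly this trace */
};

static struct trace_rec table[MAX_TRACES];
static unsigned int ntraces;

/* Store each unique trace once and return its index as a handle. */
static int save_trace(const unsigned long *frames, unsigned int depth)
{
	assert(depth <= MAX_DEPTH);
	for (unsigned int i = 0; i < ntraces; i++) {
		if (table[i].depth == depth &&
		    memcmp(table[i].frames, frames, depth * sizeof(*frames)) == 0) {
			table[i].count++;
			return (int)i;
		}
	}
	if (ntraces == MAX_TRACES)
		return -1; /* table full */
	memcpy(table[ntraces].frames, frames, depth * sizeof(*frames));
	table[ntraces].depth = depth;
	table[ntraces].count = 1;
	return (int)ntraces++;
}

static int by_count_desc(const void *a, const void *b)
{
	const struct trace_rec *x = a, *y = b;

	return (x->count < y->count) - (x->count > y->count);
}

/* Copy the records out and sort the copy, most frequent first, so the
 * handles returned by save_trace() stay valid. */
static unsigned int report_by_frequency(struct trace_rec *out)
{
	memcpy(out, table, ntraces * sizeof(*out));
	qsort(out, ntraces, sizeof(*out), by_count_desc);
	return ntraces;
}
```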
    • linux/types.h: reinstate "__bitwise__" macro for user space use · caa28984
      Linus Torvalds authored
      Commit c724c866 ("linux/types.h: remove unnecessary __bitwise__")
      was right that there are no users of __bitwise__ in the kernel, but it
      turns out there are user space users of it that do expect it.
      
      It is, after all, in the uapi directory, so user space usage is to be
      expected.
      
      Instead of reverting the commit completely, let's just clarify the
      situation so that it doesn't happen again, and have some in-code
      explanations for why that "__bitwise__" still exists.
      
      Reported-by: Jiri Slaby <jirislaby@kernel.org>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Link: https://lore.kernel.org/all/b5c0a68d-8387-4909-beea-f70ab9e6e3d5@kernel.org/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      caa28984
    • media: lirc: revert removal of unused feature flags · e5499dd7
      Sean Young authored
      Commit b2a90f4f ("media: lirc: remove unused lirc features") removed
      feature flags which were never implemented, but they are still used by
      the lirc daemon when built from source.
      
      Reinstate these symbols in order not to break the lirc build.
      
      Fixes: b2a90f4f ("media: lirc: remove unused lirc features")
      Link: https://lore.kernel.org/all/a0470450-ecfd-2918-e04a-7b57c1fd7694@kernel.org/
      Reported-by: Jiri Slaby <jirislaby@kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Signed-off-by: Sean Young <sean@mess.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5499dd7
    • Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache · fdaf9a58
      Linus Torvalds authored
      Pull page cache updates from Matthew Wilcox:
      
       - Appoint myself page cache maintainer
      
       - Fix how scsicam uses the page cache
      
       - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
      
       - Remove the AOP flags entirely
      
       - Remove pagecache_write_begin() and pagecache_write_end()
      
       - Documentation updates
      
       - Convert several address_space operations to use folios:
           - is_dirty_writeback
           - readpage becomes read_folio
           - releasepage becomes release_folio
           - freepage becomes free_folio
      
       - Change filler_t to require a struct file pointer be the first
         argument like ->read_folio
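      The memalloc_nofs_save() item replaces a per-call flag (AOP_FLAG_NOFS) with a scope. The caller saves the current state, marks the task, and later restores it, so every allocation inside the scope avoids filesystem reclaim without each call site passing a flag. This user-space sketch models that save/restore pattern with a thread-local flags word. All sketch_* names and the bit value are hypothetical stand-ins for current->flags and PF_MEMALLOC_NOFS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task flags word (current->flags). */
static _Thread_local unsigned int task_flags;

#define SKETCH_PF_MEMALLOC_NOFS 0x1u /* hypothetical bit value */

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static unsigned int sketch_nofs_save(void)
{
	unsigned int old = task_flags & SKETCH_PF_MEMALLOC_NOFS;

	task_flags |= SKETCH_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope: restore only the bit that save may have changed. */
static void sketch_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~SKETCH_PF_MEMALLOC_NOFS) | old;
}

/* An allocator consults the task state instead of a per-call flag. */
static bool sketch_alloc_may_enter_fs(void)
{
	return !(task_flags & SKETCH_PF_MEMALLOC_NOFS);
}
```

      Returning the old bit instead of clearing it unconditionally is what makes nested scopes safe. An inner restore leaves the outer scope in effect.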
      
      * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
        nilfs2: Fix some kernel-doc comments
        Appoint myself page cache maintainer
        fs: Remove aops->freepage
        secretmem: Convert to free_folio
        nfs: Convert to free_folio
        orangefs: Convert to free_folio
        fs: Add free_folio address space operation
        fs: Convert drop_buffers() to use a folio
        fs: Change try_to_free_buffers() to take a folio
        jbd2: Convert release_buffer_page() to use a folio
        jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
        reiserfs: Convert release_buffer_page() to use a folio
        fs: Remove last vestiges of releasepage
        ubifs: Convert to release_folio
        reiserfs: Convert to release_folio
        orangefs: Convert to release_folio
        ocfs2: Convert to release_folio
        nilfs2: Remove comment about releasepage
        nfs: Convert to release_folio
        jfs: Convert to release_folio
        ...
      fdaf9a58
    • Merge tag 'iomap-5.19-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 8642174b
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "There are a couple of corrections sent in by Andreas for some accounting
        errors.
      
        The biggest change this time around is that writeback errors no
        longer clear pageuptodate, nor does XFS invalidate the page cache
        anymore.
        This brings XFS (and gfs2/zonefs) behavior in line with every other
        Linux filesystem driver, and fixes some UAF bugs that only cropped up
        after willy turned on multipage folios for XFS in 5.18-rc1.
      
        Regrettably, it took all the way to the end of the 5.18 cycle to find
        the source of these bugs and reach a consensus that XFS' writeback
        failure behavior from 20 years ago is no longer necessary.
      
        Summary:
      
         - Fix a couple of accounting errors in the buffered io code.
      
         - Discontinue the practice of marking folios !uptodate and
           invalidating them when writeback fails.
      
           This fixes some UAF bugs when multipage folios are enabled, and
           brings the behavior of XFS/gfs2/zonefs into alignment with the
           behavior of all the other Linux filesystems"
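      To make the behavior change concrete, here is a heavily simplified model of the two error paths. It is not iomap code. demo_folio, demo_mapping and both handlers are hypothetical, and the sketch assumes that the mapping-level error is what a later fsync() reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified folio and mapping state. */
struct demo_folio {
	bool uptodate;
	bool mapped;	/* still present in the page cache */
};

struct demo_mapping {
	int wb_err;	/* error remembered for a later fsync() */
};

/* Old XFS-style handling (simplified): drop the cached data on failure,
 * so the next read goes back to disk. */
static void demo_wb_error_old(struct demo_mapping *m, struct demo_folio *f,
			      int err)
{
	m->wb_err = err;
	f->uptodate = false;
	f->mapped = false;
}

/* New handling (simplified): keep the folio uptodate and mapped, and only
 * record the error on the mapping. */
static void demo_wb_error_new(struct demo_mapping *m, struct demo_folio *f,
			      int err)
{
	(void)f;
	m->wb_err = err;
}
```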
      
      * tag 'iomap-5.19-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: don't invalidate folios after writeback errors
        iomap: iomap_write_end cleanup
        iomap: iomap_write_failed fix
      8642174b
    • Merge tag 'dlm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · f2898112
      Linus Torvalds authored
      Pull dlm updates from David Teigland:
       "This includes several large patches to improve endian handling and
        remove sparse warnings. The code previously used in/out, in-place
        endianness conversion functions; messages now use __le types
        instead.
      
        Other code cleanup includes the list iterator changes.
      
        Finally, a long-standing bug was found and fixed, caused by a missed
        decrement on a lock struct ref count"
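      Here is one way to picture the move away from in-place conversion. With in-place conversion, the same struct holds host-order values at one moment and wire-order values at the next, and nothing in the type records which. The sketch keeps wire and host headers as distinct types. It uses a wrapper struct as a rough user-space stand-in for what sparse enforces on __le32. The demo_* names and fields are hypothetical, loosely modeled on a message header.

```c
#include <assert.h>
#include <stdint.h>

/* A struct wrapper makes the compiler reject mixing wire values with host
 * integers: a rough analogue of sparse checking __le32 (illustrative). */
typedef struct { uint8_t b[4]; } demo_le32;

static demo_le32 demo_cpu_to_le32(uint32_t v)
{
	demo_le32 le = { { (uint8_t)v, (uint8_t)(v >> 8),
			   (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
	return le;
}

static uint32_t demo_le32_to_cpu(demo_le32 le)
{
	return (uint32_t)le.b[0] | (uint32_t)le.b[1] << 8 |
	       (uint32_t)le.b[2] << 16 | (uint32_t)le.b[3] << 24;
}

/* Hypothetical on-wire header: byte order is part of each field's type. */
struct demo_wire_header {
	demo_le32 h_nodeid;
	demo_le32 h_length;
};

/* Host-order view used by the rest of the code. */
struct demo_header {
	uint32_t nodeid;
	uint32_t length;
};

/* Convert once at the boundary instead of flipping fields in place. */
static void demo_header_out(const struct demo_header *in,
			    struct demo_wire_header *out)
{
	out->h_nodeid = demo_cpu_to_le32(in->nodeid);
	out->h_length = demo_cpu_to_le32(in->length);
}

static void demo_header_in(const struct demo_wire_header *in,
			   struct demo_header *out)
{
	out->nodeid = demo_le32_to_cpu(in->h_nodeid);
	out->length = demo_le32_to_cpu(in->h_length);
}
```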
      
      * tag 'dlm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (28 commits)
        dlm: use kref_put_lock in __put_lkb
        dlm: use kref_put_lock in put_rsb
        dlm: remove unnecessary error assign
        dlm: fix missing lkb refcount handling
        fs: dlm: cast resource pointer to uintptr_t
        dlm: replace usage of found with dedicated list iterator variable
        dlm: remove usage of list iterator for list_add() after the loop body
        dlm: fix pending remove if msg allocation fails
        dlm: fix wake_up() calls for pending remove
        dlm: check required context while close
        dlm: cleanup lock handling in dlm_master_lookup
        dlm: remove found label in dlm_master_lookup
        dlm: remove __user conversion warnings
        dlm: move conversion to compile time
        dlm: use __le types for dlm messages
        dlm: use __le types for rcom messages
        dlm: use __le types for dlm header
        dlm: use __le types for options header
        dlm: add __CHECKER__ for false positives
        dlm: move global to static inits
        ...
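      The kref_put_lock() conversions above pair the final reference drop with taking the lock that protects lookups. This way, no lookup can find an object whose count has already reached zero. The following is a minimal user-space sketch of that decrement-and-lock idea, using C11 atomics and a toy spinlock. It follows the concept behind the kernel's refcount_dec_and_lock(), not its code, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock, so the sketch needs no threading library. */
typedef struct { atomic_flag held; } demo_lock;

static void demo_lock_acquire(demo_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin */
}

static void demo_lock_release(demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Drop one reference. Returns true, with the lock held, only when this
 * call dropped the last reference; the caller then unlinks and frees the
 * object before releasing the lock. */
static bool demo_dec_and_lock(atomic_int *refs, demo_lock *l)
{
	int old = atomic_load(refs);

	/* Fast path: not the last reference, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(refs, &old, old - 1))
			return false;
	}

	/* Possibly the last reference: decide under the lock, so a lookup
	 * that holds the lock cannot grab an object that is being freed. */
	demo_lock_acquire(l);
	if (atomic_fetch_sub(refs, 1) == 1)
		return true;
	demo_lock_release(l);
	return false;
}
```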
      f2898112
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · fea30433
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "Various bug fixes and cleanups for ext4.
      
        In particular, move the crypto related functions from fs/ext4/super.c
        into a new fs/ext4/crypto.c, and fix a number of bugs found by fuzzers
        and error injection tools"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
        ext4: only allow test_dummy_encryption when supported
        ext4: fix bug_on in __es_tree_search
        ext4: avoid cycles in directory h-tree
        ext4: verify dir block before splitting it
        ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
        ext4: fix bug_on in ext4_writepages
        ext4: refactor and move ext4_ioctl_get_encryption_pwsalt()
        ext4: cleanup function defs from ext4.h into crypto.c
        ext4: move ext4 crypto code to its own file crypto.c
        ext4: fix memory leak in parse_apply_sb_mount_options()
        ext4: reject the 'commit' option on ext2 filesystems
        ext4: remove duplicated #include of dax.h in inode.c
        ext4: fix race condition between ext4_write and ext4_convert_inline_data
        ext4: convert symlink external data block mapping to bdev
        ext4: add nowait mode for ext4_getblk()
        ext4: fix journal_ioprio mount option handling
        ext4: mark group as trimmed only if it was fully scanned
        ext4: fix use-after-free in ext4_rename_dir_prepare
        ext4: add unmount filesystem message
        ext4: remove unnecessary conditionals
        ...
      fea30433
    • Merge tag 'gfs2-v5.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 7208c984
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Clean up the allocation of glocks that have an address space attached
      
       - Quota locking fix and quota iomap conversion
      
       - Fix the FITRIM error reporting
      
       - Some list iterator cleanups
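      The list iterator cleanups (here and in the dlm pull) follow one pattern. If nothing matched, the loop cursor points at a bogus entry derived from the list head after list_for_each_entry() finishes. So instead of inspecting the cursor after the loop, the match is copied into a dedicated variable that otherwise stays NULL. The sketch re-implements a tiny subset of <linux/list.h> in user space (using the GNU __typeof__ extension) so it runs standalone; demo_item and demo_find are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny re-implementation of the <linux/list.h> idiom (not the real header). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct demo_item {
	int key;
	struct list_head node;
};

static struct demo_item *demo_find(struct list_head *head, int key)
{
	struct demo_item *iter, *found = NULL;

	list_for_each_entry(iter, head, node) {
		if (iter->key == key) {
			found = iter;	/* dedicated result variable */
			break;
		}
	}
	/* `iter` is not a valid entry here if the loop ran to the end. */
	return found;
}
```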
      
      * tag 'gfs2-v5.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Convert function bh_get to use iomap
        gfs2: use i_lock spin_lock for inode qadata
        gfs2: Return more useful errors from gfs2_rgrp_send_discards()
        gfs2: Use container_of() for gfs2_glock(aspace)
        gfs2: Explain some direct I/O oddities
        gfs2: replace 'found' with dedicated list iterator variable
      7208c984
    • Merge tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · bd1b7c13
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "Features:
      
         - subpage:
            - support for PAGE_SIZE > 4K (previously only 64K)
            - make it work with raid56
      
         - repair super block num_devices automatically if it does not match
           the number of device items
      
         - defrag can convert inline extents to regular extents; up to now,
           inline files were skipped, but the setting of mount option
           max_inline could affect the decision logic
      
         - zoned:
            - minimal accepted zone size is explicitly set to 4MiB
            - make zone reclaim less aggressive and don't reclaim if there are
              enough free zones
            - add per-profile sysfs tunable of the reclaim threshold
      
         - allow automatic block group reclaim for non-zoned filesystems, with
           sysfs tunables
      
         - tree-checker: new check, compare extent buffer owner against owner
           rootid
      
        Performance:
      
         - avoid blocking on space reservation when doing nowait direct io
           writes (+7% throughput for reads and writes)
      
         - NOCOW write throughput improvement due to refined locking (+3%)
      
         - send: reduce pressure to page cache by dropping extent pages right
           after they're processed
      
        Core:
      
         - convert all radix trees to xarray
      
         - add iterators for b-tree node items
      
         - support printk message index
      
         - use bulk page allocation for extent buffers
      
         - switch to bio_alloc API, use on-stack bios where convenient, other
           bio cleanups
      
         - use rw lock for block groups to favor concurrent reads
      
         - simplify workqueues, don't allocate high priority threads for all
           normal queues as we need only one
      
         - refactor scrub, process chunks based on their constraints and
           similarity
      
         - allocate direct io structures on stack and pass around only
           pointers, avoids allocation and reduces potential error handling
      
        Fixes:
      
         - fix count of reserved transaction items for various inode
           operations
      
         - fix deadlock between concurrent dio writes when low on free data
           space
      
         - fix a few cases when zones need to be finished
      
        VFS, iomap:
      
         - add helper to check if sb write has started (usable for assertions)
      
         - new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io"
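      The nowait direct I/O improvement rests on a simple rule. A caller that must not block (for example, a write submitted with IOCB_NOWAIT) gets -EAGAIN when a reservation would need to flush, instead of sleeping. The following is a simplified sketch of that rule. demo_space_info, demo_flush_space and demo_reserve are hypothetical and do not mirror btrfs's space_info code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified space accounting. */
struct demo_space_info {
	uint64_t total;
	uint64_t used;
};

/* Stand-in for flushing to free space, which may block in real code
 * (hypothetical: here it frees a fixed amount so the loop terminates). */
static void demo_flush_space(struct demo_space_info *s)
{
	s->used = s->used > 4096 ? s->used - 4096 : 0;
}

/* Reserve `bytes`. A NOWAIT caller gets -EAGAIN instead of waiting for a
 * flush, so it can retry later from a context that is allowed to block. */
static int demo_reserve(struct demo_space_info *s, uint64_t bytes, bool nowait)
{
	while (s->total - s->used < bytes) {
		if (nowait)
			return -EAGAIN;
		if (s->used == 0)
			return -ENOSPC;	/* nothing left to flush */
		demo_flush_space(s);
	}
	s->used += bytes;
	return 0;
}
```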
      
      * tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits)
        btrfs: zoned: introduce a minimal zone size 4M and reject mount
        btrfs: allow defrag to convert inline extents to regular extents
        btrfs: add "0x" prefix for unsupported optional features
        btrfs: do not account twice for inode ref when reserving metadata units
        btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer
        btrfs: send: avoid trashing the page cache
        btrfs: send: keep the current inode open while processing it
        btrfs: allocate the btrfs_dio_private as part of the iomap dio bio
        btrfs: move struct btrfs_dio_private to inode.c
        btrfs: remove the disk_bytenr in struct btrfs_dio_private
        btrfs: allocate dio_data on stack
        iomap: add per-iomap_iter private data
        iomap: allow the file system to provide a bio_set for direct I/O
        btrfs: add a btrfs_dio_rw wrapper
        btrfs: zoned: zone finish unused block group
        btrfs: zoned: properly finish block group on metadata write
        btrfs: zoned: finish block group when there are no more allocatable bytes left
        btrfs: zoned: consolidate zone finish functions
        btrfs: zoned: introduce btrfs_zoned_bg_is_full
        btrfs: improve error reporting in lookup_inline_extent_backref
        ...
      bd1b7c13
    • Merge tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 3842007b
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
       "A single patch to fix zonefs_init_file_inode() return value"
      
      * tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Fix zonefs_init_file_inode() return value
      3842007b
    • Merge tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 65965d95
      Linus Torvalds authored
      Pull erofs (and fscache) updates from Gao Xiang:
       "After working on it on the mailing list for more than half a year, we
        have finally got the 'erofs over fscache' feature into shape.
        Hopefully it will bring more possibilities to the communities.
      
        The story mainly started from a new project that we called "RAFS v6" [1]
        for Nydus image service almost a year ago, which enhances EROFS to be
        a new form of one bootstrap (which includes metadata representing the
        whole fs tree) + several data-deduplicated content addressable blobs
        (actually treated as multiple devices). Each blob can represent one
        container image layer, but not exactly, since all new data may
        already exist in previous blobs, in which case there is no need to
        introduce another new blob.
      
        It is actually not a new idea (at least on my side it's much like a
        simplified casync [2] for now) and has many benefits over per-file
        blobs or other existing approaches, since typically each RAFS v6 image
        only has dozens of device blobs instead of thousands of per-file blobs.
        It's easy to sign with user keys as a golden image, transfer
        untouched with minimal overhead over the network, keep in some type
        of storage conveniently, and run with (optional) runtime verification,
        without involving too many irrelevant features crossing the system
        beyond EROFS itself. At least that's our final goal, and we're still
        working on it. There was also a good summary of this approach from the
        casync author [3].
      
        Regardless of further optimizations, this work was mostly done in
        previous Linux release cycles. In this round, we'd like to introduce
        on-demand load for EROFS with the fscache/cachefiles infrastructure,
        considering the following advantages:
      
         - Introduce a new file-based backend to EROFS. Although each image
           only contains dozens of blobs, on a densely deployed runC host, for
           example, there could still be a massive number of blobs on a
           machine, which is messy if each blob is treated as a device. In
           contrast, fscache and cachefiles are really great interfaces for us
           to make them work.
      
         - Introduce on-demand load to fscache and EROFS. Previously, fscache
           was mainly used for caching network-like filesystems; now it can
           support on-demand downloading for local filesystems too, with the
           exact local fs on-disk format. It has many advantages, which we've
           described in the latest patchset cover letter [4]. In addition to
           that, and most importantly, the cached data is still stored in the
           original local fs on-disk format, so it is still the one signed
           with private keys, even if it may only be partially available.
           Users can fully trust it while running. Later, users can also back
           up cachefiles easily to another machine.
      
         - More reliable on-demand approach in principle. After all data is
           available locally, the user daemon no longer needs to be online in
           some use cases, which helps daemon crash recovery (filesystems can
           still be in service) and hot-upgrade (the user daemon can be
           upgraded more frequently as new features or protocols are
           introduced).
      
         - Other formats can also be converted to the EROFS filesystem format
           over the internet on the fly with the new on-demand load feature
           and mounted. That is entirely possible with the on-demand load
           feature as long as such archive format metadata can be fetched in
           advance, as with stargz.
      
        In addition, although our current target user is Nydus image service
        [5], later it can be used for other use cases like on-demand system
        booting, etc. As for the fscache on-demand load feature itself,
        strictly speaking it can be used for other local filesystems too.
        Later we could promote most of the code to the iomap infrastructure
        and also enhance it to support read-write if other local filesystems
        are interested.
      
        Thanks to David Howells for spending so much time and patience on this
        over these months; many thanks with great respect here again! Thanks
        to Jeffle for working on this feature, to Xin Yin from Bytedance for
        the asynchronous I/O implementation, and to Zichen Tian, Jia Zhu, and
        Yan Song for testing; much appreciated. We're also exploring more
        possibilities, such as fscache cache management over FSDAX for secure
        containers, and working on more improvements and useful features for
        fscache, cachefiles, and on-demand load.
      
        In addition to "erofs over fscache", NFS export and idmapped mount are
        also completed in this cycle for container use cases as well.
      
        Summary:
      
         - Add erofs on-demand load support over fscache
      
         - Support NFS export for erofs
      
         - Support idmapped mounts for erofs
      
         - Don't prompt for risk any more when using big pcluster
      
         - Fix buffer copy overflow of ztailpacking feature
      
         - Several minor cleanups"
      
      [1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
      [2] https://github.com/systemd/casync
      [3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
      [4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
      [5] https://github.com/dragonflyoss/image-service
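      Here is a toy model of the on-demand read path described in the pull message. Each blob's local cache tracks which blocks are present, and a miss asks a fetcher (standing in for the user daemon) exactly once. Later reads are served locally, which is why the daemon can go offline once everything has been fetched. Everything here is hypothetical. The real cachefiles on-demand mode uses a request/response protocol with the daemon rather than a function callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCKS 8
#define DEMO_BLKSZ 16

/* Hypothetical local cache of one blob: data plus a per-block bitmap. */
struct demo_cache {
	unsigned char data[DEMO_BLOCKS][DEMO_BLKSZ];
	unsigned char present;	/* bit i set => block i is cached */
	int fetches;		/* how often the "daemon" was asked */
};

/* Stand-in for the user daemon that downloads a block on request. */
typedef void (*demo_fetch_fn)(size_t blk, unsigned char *out);

/* Example fetcher: fills the block with its own index. */
static void demo_fetch_fill(size_t blk, unsigned char *out)
{
	memset(out, (int)blk, DEMO_BLKSZ);
}

static int demo_read_block(struct demo_cache *c, size_t blk,
			   demo_fetch_fn fetch, unsigned char *out)
{
	if (blk >= DEMO_BLOCKS)
		return -1;
	if (!(c->present & (1u << blk))) {
		/* Miss: ask the daemon once, then keep the data locally. */
		fetch(blk, c->data[blk]);
		c->present |= (unsigned char)(1u << blk);
		c->fetches++;
	}
	memcpy(out, c->data[blk], DEMO_BLKSZ);
	return 0;
}
```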
      
      * tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
        erofs: scan devices from device table
        erofs: change to use asynchronous io for fscache readpage/readahead
        erofs: add 'fsid' mount option
        erofs: implement fscache-based data readahead
        erofs: implement fscache-based data read for inline layout
        erofs: implement fscache-based data read for non-inline layout
        erofs: implement fscache-based metadata read
        erofs: register fscache context for extra data blobs
        erofs: register fscache context for primary data blob
        erofs: add erofs_fscache_read_folios() helper
        erofs: add anonymous inode caching metadata for data blobs
        erofs: add fscache context helper functions
        erofs: register fscache volume
        erofs: add fscache mode check helper
        erofs: make erofs_map_blocks() generally available
        cachefiles: document on-demand read mode
        cachefiles: add tracepoints for on-demand read mode
        cachefiles: enable on-demand read mode
        cachefiles: implement on-demand read
        cachefiles: notify the user daemon when withdrawing cookie
        ...
      65965d95
    • Merge tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat · 850f6033
      Linus Torvalds authored
      Pull exfat updates from Namjae Jeon:
      
       - fix referencing wrong parent directory information during rename
      
       - introduce a sys_tz mount option to use system timezone
      
       - improve performance while zeroing a cluster with dirsync mount option
      
       - fix slab-out-of-bounds in exfat_clear_bitmap() reported by syzbot
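      The "check if cluster num is valid" fix guards a common fuzzing target: a cluster number read from a corrupted image is used as a bitmap index. The following is a minimal sketch of validating it first. The demo_sb layout and helpers are hypothetical, and the valid range [2, num_clusters) is this sketch's own convention, based on exFAT numbering clusters from 2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FIRST_CLUSTER 2u	/* exFAT cluster numbering starts at 2 */

/* Hypothetical, simplified superblock: bit (clu - 2) tracks cluster clu. */
struct demo_sb {
	uint32_t num_clusters;	/* valid clusters are [2, num_clusters) */
	uint8_t bitmap[16];
};

static bool demo_is_valid_cluster(const struct demo_sb *sb, uint32_t clu)
{
	return clu >= DEMO_FIRST_CLUSTER && clu < sb->num_clusters;
}

/* Clear the allocation bit for `clu`, rejecting corrupt cluster numbers
 * before they are used as an index. */
static int demo_clear_bitmap(struct demo_sb *sb, uint32_t clu)
{
	uint32_t ent;

	if (!demo_is_valid_cluster(sb, clu))
		return -EIO;
	/* The sketch assumes the bitmap is large enough for num_clusters. */
	assert(sb->num_clusters - DEMO_FIRST_CLUSTER <= 8 * sizeof(sb->bitmap));
	ent = clu - DEMO_FIRST_CLUSTER;
	sb->bitmap[ent / 8] &= (uint8_t)~(1u << (ent % 8));
	return 0;
}
```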
      
      * tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
        exfat: check if cluster num is valid
        exfat: reduce block requests when zeroing a cluster
        block: add sync_blockdev_range()
        exfat: introduce mount option 'sys_tz'
        exfat: fix referencing wrong parent directory information after renaming
      850f6033
    • Merge tag 'fs.idmapped.v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · f30fabe7
      Linus Torvalds authored
      Pull fs idmapping updates from Christian Brauner:
       "This contains two minor updates:
      
         - An update to the idmapping documentation by Rodrigo making it
           easier to understand that we first introduce several use-cases that
           fail without idmapped mounts simply to explain how they can be
           handled with idmapped mounts.
      
         - When changing a mount's idmapping we now hold writers to make it
           more robust.
      
           This is similar to turning a mount ro with the difference that in
           contrast to turning a mount ro changing the idmapping can only ever
           be done once while a mount can transition between ro and rw as much
           as it wants.
      
           The vfs layer itself takes care to retrieve the idmapping of a
           mount once ensuring that the idmapping used for vfs permission
           checking is identical to the idmapping passed down to the
           filesystem. All filesystems with FS_ALLOW_IDMAP raised take the
           same precautions as the vfs in code-paths that are outside of
           direct control of the vfs such as ioctl()s.
      
           However, holding writers makes this more robust and predictable for
           both the kernel and userspace.
      
           This is a minor user-visible change. But it is extremely unlikely
           to matter. The caller must've created a detached mount via
           OPEN_TREE_CLONE and then handed that O_PATH fd to another process
           or thread which then must've gotten a writable fd for that mount
           and started creating files in there while the caller is still
           changing mount properties. While not impossible it will be an
           extremely rare corner-case and should in general be considered a
           bug in the application. Consider making a mount MOUNT_ATTR_NOEXEC
           or MOUNT_ATTR_NODEV while allowing someone else to perform lookups
           or exec'ing in parallel by handing them a copy of the
           OPEN_TREE_CLONE fd or another fd beneath that mount.
      
           I've pinged all major users of idmapped mounts pointing out this
           change and none of them have active writers on a mount while still
           changing mount properties. It would've been strange if they did.
      
        The rest and majority of the work will be coming through the overlayfs
        tree this cycle. In addition to overlayfs this cycle should also see
        support for idmapped mounts on erofs as I've acked a patch to this
        effect a little while ago"
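      The "hold writers" rule can be sketched as a writer count plus a hold flag. The attribute change raises the hold and fails with -EBUSY if any writer is active. Otherwise, it applies the idmapping and then drops the hold. This is a simplified user-space model with hypothetical demo_* names, not the VFS implementation. The -EPERM for a second idmapping is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified mount: a writer count plus a hold flag. */
struct demo_mount {
	atomic_int writers;
	atomic_bool write_hold;
	int idmap;		/* 0 = no idmapping attached yet */
};

/* Register as a writer, waiting out any attribute change in progress.
 * Sequentially consistent atomics ensure that either this writer sees the
 * hold, or the attribute change sees this writer. */
static void demo_want_write(struct demo_mount *m)
{
	atomic_fetch_add(&m->writers, 1);
	while (atomic_load(&m->write_hold))
		; /* spin */
}

static void demo_drop_write(struct demo_mount *m)
{
	atomic_fetch_sub(&m->writers, 1);
}

/* Attach an idmapping only while no writers are active; unlike ro/rw,
 * this may happen at most once per mount. */
static int demo_set_idmap(struct demo_mount *m, int idmap)
{
	int ret = 0;

	if (m->idmap)
		return -EPERM;
	atomic_store(&m->write_hold, true);
	if (atomic_load(&m->writers) > 0)
		ret = -EBUSY;
	else
		m->idmap = idmap;
	atomic_store(&m->write_hold, false);
	return ret;
}
```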
      
      * tag 'fs.idmapped.v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fs: hold writers when changing mount's idmapping
        docs: Add small intro to idmap examples
      f30fabe7