1. 10 Aug, 2022 7 commits
  2. 08 Aug, 2022 4 commits
  3. 05 Aug, 2022 1 commit
  4. 04 Aug, 2022 11 commits
    • BPF: Fix potential bad pointer dereference in bpf_sys_bpf() · e2dcac2f
      Jinghao Jia authored
      The bpf_sys_bpf() helper function allows an eBPF program to load another
      eBPF program from within the kernel. In this case, the union bpf_attr
      argument pointer (as well as the insns and license pointers inside it)
      is a kernel address rather than a userspace address (as is the case for
      a usual bpf() syscall). To make the memory copying process in the syscall
      work in both cases, bpfptr_t was introduced to wrap around the pointer
      and distinguish its origin. Specifically, when copying memory contents
      from a bpfptr_t, a copy_from_user() is performed in case of a userspace
      address and a memcpy() is performed for a kernel address.
      
      This can lead to problems because the in-kernel pointer is never checked
      for validity. The problem happens when an eBPF syscall program tries to
      call bpf_sys_bpf() to load a program but provides a bad insns pointer --
      say 0xdeadbeef -- in the bpf_attr union. The helper calls __sys_bpf()
      which would then call bpf_prog_load() to load the program.
      bpf_prog_load() is responsible for copying the eBPF instructions to the
      newly allocated memory for the program; it creates a kernel bpfptr_t for
      insns and invokes copy_from_bpfptr(). Internally, all bpfptr_t
      operations are backed by the corresponding sockptr_t operations, which
      perform a direct memcpy() on kernel pointers for copy_from/strncpy_from
      operations. Therefore, the code will always dereference the bad pointer,
      triggering an unhandleable page fault and, in turn, an oops.
      However, this is not supposed to happen because at that point the eBPF
      program is already verified and should not cause a memory error.
      
      Sample KASAN trace:
      
      [   25.685056][  T228] ==================================================================
      [   25.685680][  T228] BUG: KASAN: user-memory-access in copy_from_bpfptr+0x21/0x30
      [   25.686210][  T228] Read of size 80 at addr 00000000deadbeef by task poc/228
      [   25.686732][  T228]
      [   25.686893][  T228] CPU: 3 PID: 228 Comm: poc Not tainted 5.19.0-rc7 #7
      [   25.687375][  T228] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
      [   25.687991][  T228] Call Trace:
      [   25.688223][  T228]  <TASK>
      [   25.688429][  T228]  dump_stack_lvl+0x73/0x9e
      [   25.688747][  T228]  print_report+0xea/0x200
      [   25.689061][  T228]  ? copy_from_bpfptr+0x21/0x30
      [   25.689401][  T228]  ? _printk+0x54/0x6e
      [   25.689693][  T228]  ? _raw_spin_lock_irqsave+0x70/0xd0
      [   25.690071][  T228]  ? copy_from_bpfptr+0x21/0x30
      [   25.690412][  T228]  kasan_report+0xb5/0xe0
      [   25.690716][  T228]  ? copy_from_bpfptr+0x21/0x30
      [   25.691059][  T228]  kasan_check_range+0x2bd/0x2e0
      [   25.691405][  T228]  ? copy_from_bpfptr+0x21/0x30
      [   25.691734][  T228]  memcpy+0x25/0x60
      [   25.692000][  T228]  copy_from_bpfptr+0x21/0x30
      [   25.692328][  T228]  bpf_prog_load+0x604/0x9e0
      [   25.692653][  T228]  ? cap_capable+0xb4/0xe0
      [   25.692956][  T228]  ? security_capable+0x4f/0x70
      [   25.693324][  T228]  __sys_bpf+0x3af/0x580
      [   25.693635][  T228]  bpf_sys_bpf+0x45/0x240
      [   25.693937][  T228]  bpf_prog_f0ec79a5a3caca46_bpf_func1+0xa2/0xbd
      [   25.694394][  T228]  bpf_prog_run_pin_on_cpu+0x2f/0xb0
      [   25.694756][  T228]  bpf_prog_test_run_syscall+0x146/0x1c0
      [   25.695144][  T228]  bpf_prog_test_run+0x172/0x190
      [   25.695487][  T228]  __sys_bpf+0x2c5/0x580
      [   25.695776][  T228]  __x64_sys_bpf+0x3a/0x50
      [   25.696084][  T228]  do_syscall_64+0x60/0x90
      [   25.696393][  T228]  ? fpregs_assert_state_consistent+0x50/0x60
      [   25.696815][  T228]  ? exit_to_user_mode_prepare+0x36/0xa0
      [   25.697202][  T228]  ? syscall_exit_to_user_mode+0x20/0x40
      [   25.697586][  T228]  ? do_syscall_64+0x6e/0x90
      [   25.697899][  T228]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [   25.698312][  T228] RIP: 0033:0x7f6d543fb759
      [   25.698624][  T228] Code: 08 5b 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 a6 0e 00 f7 d8 64 89 01 48
      [   25.699946][  T228] RSP: 002b:00007ffc3df78468 EFLAGS: 00000287 ORIG_RAX: 0000000000000141
      [   25.700526][  T228] RAX: ffffffffffffffda RBX: 00007ffc3df78628 RCX: 00007f6d543fb759
      [   25.701071][  T228] RDX: 0000000000000090 RSI: 00007ffc3df78478 RDI: 000000000000000a
      [   25.701636][  T228] RBP: 00007ffc3df78510 R08: 0000000000000000 R09: 0000000000300000
      [   25.702191][  T228] R10: 0000000000000005 R11: 0000000000000287 R12: 0000000000000000
      [   25.702736][  T228] R13: 00007ffc3df78638 R14: 000055a1584aca68 R15: 00007f6d5456a000
      [   25.703282][  T228]  </TASK>
      [   25.703490][  T228] ==================================================================
      [   25.704050][  T228] Disabling lock debugging due to kernel taint
      
      Update copy_from_bpfptr() and strncpy_from_bpfptr() so that:
       - for a kernel pointer, they use the safe copy_from_kernel_nofault() and
         strncpy_from_kernel_nofault() functions;
       - for a userspace pointer, they perform copy_from_user() and
         strncpy_from_user().
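To make the dispatch described above concrete, here is a userspace C model of the fixed copy_from_bpfptr() logic. It is a sketch under stated assumptions, not the kernel patch: model_bpfptr_t, the model_* copy helpers and the PATH_* bookkeeping are hypothetical stand-ins for bpfptr_t, copy_from_user() and copy_from_kernel_nofault(), and a NULL pointer plays the role of a bad address such as 0xdeadbeef, since userspace cannot safely probe arbitrary memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <errno.h>

/* Hypothetical model of bpfptr_t: a pointer tagged with its origin. */
typedef struct {
	const void *ptr;
	bool is_kernel;
} model_bpfptr_t;

/* Records which backend the last copy used, only so the model is testable. */
enum copy_path { PATH_NONE, PATH_USER, PATH_KERNEL_NOFAULT };
static enum copy_path last_path = PATH_NONE;

/* Hypothetical stand-in for copy_from_user(): returns bytes NOT copied. */
static unsigned long model_copy_from_user(void *dst, const void *src, size_t n)
{
	last_path = PATH_USER;
	if (!src)
		return n;		/* pretend the access faulted */
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical stand-in for copy_from_kernel_nofault(): 0 or -EFAULT. */
static long model_copy_from_kernel_nofault(void *dst, const void *src, size_t n)
{
	last_path = PATH_KERNEL_NOFAULT;
	if (!src)
		return -EFAULT;		/* a bad pointer fails instead of oopsing */
	memcpy(dst, src, n);
	return 0;
}

/* Fixed dispatch: a kernel pointer never goes through a plain memcpy(). */
static int model_copy_from_bpfptr(void *dst, model_bpfptr_t src, size_t size)
{
	if (!src.is_kernel)
		return model_copy_from_user(dst, src.ptr, size) ? -EFAULT : 0;
	return (int)model_copy_from_kernel_nofault(dst, src.ptr, size);
}
```

The model folds copy_from_user()'s "bytes not copied" result into -EFAULT so that both paths report errors the same way; the kernel helper's exact return convention may differ.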
      
      Fixes: af2ac3e1 ("bpf: Prepare bpf syscall to be used from kernel and user space.")
      Link: https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/
      Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220729201713.88688-1-jinghao@linux.ibm.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Update bpf_design_QA.rst to clarify that BTF_ID does not ABIify a function · 8fcf1969
      Paul E. McKenney authored
      This patch updates bpf_design_QA.rst to clarify that passing a function
      to the BTF_ID macro does not make that function part of the Linux
      kernel's ABI.
      Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/r/20220802173913.4170192-3-paulmck@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Update bpf_design_QA.rst to clarify that attaching to functions is not ABI · 62fc770d
      Paul E. McKenney authored
      This patch updates bpf_design_QA.rst to clarify that the ability to
      attach a BPF program to an arbitrary function in the kernel does not
      make that function become part of the Linux kernel's ABI.
      
      [ paulmck: Apply Daniel Borkmann feedback. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/r/20220802173913.4170192-2-paulmck@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Update bpf_design_QA.rst to clarify that kprobes is not ABI · b9b738ee
      Paul E. McKenney authored
      This patch updates bpf_design_QA.rst to clarify that the ability to
      attach a BPF program to a given point in the kernel code via kprobes
      does not make that attachment point part of the Linux kernel's ABI.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lore.kernel.org/r/20220802173913.4170192-1-paulmck@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • nfp: ethtool: fix the display error of `ethtool -m DEVNAME` · 4ae97cae
      Yu Xiao authored
      Previously, the port flag was not set to `NFP_PORT_CHANGED` when using
      `ethtool -m DEVNAME`, so the port state (e.g. interface) could not be
      updated. As a result, `ethtool -m DEVNAME` sometimes could not read the
      correct information.
      
      E.g. `ethtool -m DEVNAME` does not work when the driver is loaded
      before the optical module is plugged in, as the port interface remains
      NONE without a port update.
      
      Now update the port state before sending info to the NIC to ensure that
      the port interface is correct (latest state).
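The nfp fix amounts to refreshing a cached port state right before it is used. The following C model sketches that pattern under assumptions: struct model_port, its changed flag (standing in for NFP_PORT_CHANGED) and the model_* functions are hypothetical and are not the nfp driver's API.

```c
#include <stdbool.h>

/* Hypothetical model of the nfp fix; names are illustrative only. */
enum model_iface { IFACE_NONE, IFACE_SFP };

struct model_port {
	bool changed;			/* plays the role of NFP_PORT_CHANGED */
	enum model_iface iface;		/* cached port state */
};

/* Stand-in for querying the NIC for the current port information. */
static enum model_iface model_query_nic(bool module_present)
{
	return module_present ? IFACE_SFP : IFACE_NONE;
}

/* Refresh the cached state only when it is flagged as stale. */
static void model_port_refresh(struct model_port *port, bool module_present)
{
	if (port->changed) {
		port->iface = model_query_nic(module_present);
		port->changed = false;
	}
}

/* The fix: mark the port stale and refresh it before reading the module
 * EEPROM, instead of trusting the state cached at driver load time.
 * Returns false when the module cannot be read. */
static bool model_get_module_info(struct model_port *port, bool module_present)
{
	port->changed = true;
	model_port_refresh(port, module_present);
	return port->iface != IFACE_NONE;
}
```

The scenario in the message maps onto this model directly: the port is set up while no module is present, and a later `ethtool -m` query still sees the fresh state because the refresh happens on every call.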
      
      Fixes: 61f7c6f4 ("nfp: implement ethtool get module EEPROM")
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220802093355.69065-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: Warn about incorrect mdio_bus_phy_resume() state · 744d23c7
      Florian Fainelli authored
      Calling mdio_bus_phy_resume() with neither the PHY state machine set to
      PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
      that we can end up with a race condition that looks like this:
      
      CPU0						CPU1
      bcmgenet_resume
       -> phy_resume
         -> phy_init_hw
       -> phy_start
         -> phy_resume
                                                      phy_start_aneg()
      mdio_bus_phy_resume
       -> phy_resume
          -> phy_write(..., BMCR_RESET)
           -> usleep()                                  -> phy_read()
      
      with the phy_resume() function triggering a PHY behavior that might
      need a workaround (see bf8bfc43 ("net: phy: broadcom: Fix
      brcm_fet_config_init()") for instance) and that ultimately leads to an
      error reading from the PHY.
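The state that the new warning targets can be expressed as a simple predicate. This hypothetical C sketch takes the condition from the commit message only (the actual patch may test additional PHY states), and struct model_phydev and the MODEL_PHY_* values are illustrative stand-ins for struct phy_device and enum phy_state.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified, hypothetical model; the real enum phy_state has more states. */
enum model_phy_state { MODEL_PHY_HALTED, MODEL_PHY_RUNNING };

struct model_phydev {
	enum model_phy_state state;
	bool mac_managed_pm;
};

/* Returns true (and prints a warning) when resuming in the racy
 * configuration: state machine not halted and PM not managed by the MAC. */
static bool model_resume_state_suspicious(const struct model_phydev *phydev)
{
	bool bad = phydev->state != MODEL_PHY_HALTED && !phydev->mac_managed_pm;

	if (bad)
		fprintf(stderr, "WARNING: mdio_bus_phy_resume() in unexpected state\n");
	return bad;
}
```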
      
      Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220801233403.258871-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'make-dsa-work-with-bonding-s-arp-monitor' · 7de196a6
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Make DSA work with bonding's ARP monitor
      
      Since commit 2b86cb82 ("net: dsa: declare lockless TX feature for
      slave ports") in v5.7, DSA breaks the ARP monitoring logic of the
      bonding driver, a fact pointed out by Brian Hutchinson, who uses a
      linux-5.10.y stable kernel.
      
      Initially, I was lured by other similar hacks introduced for other
      NETIF_F_LLTX drivers, which, inspired by the bonding documentation,
      update the trans_start of their TX queues by hand.
      
      However, Jakub pointed out that this simply isn't a proper solution,
      and after thinking more about it, I agree: it doesn't work properly
      with DSA, nor is it maintainable given the future changes I plan for
      it (multiple DSA masters in a LAG).
      
      I've tested these changes using a DSA-based setup and a veth-based
      setup, using the active-backup mode and ARP monitoring, with and without
      arp_validate.
      
      Link to v1:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/
      
      Link to v2:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220727152000.3616086-1-vladimir.oltean@nxp.com/
      ====================
      
      Link: https://lore.kernel.org/r/20220731124108.2810233-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • docs: net: bonding: remove mentions of trans_start · cba8d8f5
      Vladimir Oltean authored
      ARP monitoring no longer depends on dev->last_rx or dev_trans_start(),
      so delete this information.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "veth: Add updating of trans_start" · 08b403d5
      Vladimir Oltean authored
      This reverts commit e66e257a. The veth
      driver no longer needs these hacks, which are slightly detrimental to
      fast path performance, because the bonding driver now keeps track of
      the TX times of ARP and NS probes by itself, as it should.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: remove hacks added to dev_trans_start() for bonding to work · 4873a1b2
      Vladimir Oltean authored
      Now that the bonding driver keeps track of the last TX time of ARP and
      NS probes, we effectively revert the following commits:
      
      32d3e51a ("net_sched: use macvlan real dev trans_start in dev_trans_start()")
      07ce76aa ("net_sched: make dev_trans_start return vlan's real dev trans_start")
      
      Note that the approach of continuing to hack at this function would not
      get us very far, hence the desire to take a different approach. DSA is
      also a virtual device that uses NETIF_F_LLTX, but there, many uppers
      share the same lower (DSA master, i.e. the physical host port of a
      switch). By making dev_trans_start() on a DSA interface return the
      dev_trans_start() of the master, we effectively assume that all other
      DSA interfaces are silent; otherwise, the validity of the probe
      timestamp data is corrupted from the bonding driver's perspective.
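The shared-master problem can be shown with a small hypothetical C model (struct model_netdev and the model_* functions are illustrative, not kernel API). Two NETIF_F_LLTX DSA ports share one master, and the rejected hack reports the master's timestamp for either port, so traffic on one port makes the other look recently active.

```c
/* Hypothetical model of stacked devices and the rejected dev_trans_start() hack. */
struct model_netdev {
	unsigned long trans_start;	/* last TX time, in ticks */
	struct model_netdev *lower;	/* DSA master for a user port, else NULL */
};

/* An LLTX upper never updates its own timestamp: only the TX queue of
 * the lower device (the DSA master) advances. */
static void model_xmit(struct model_netdev *dev, unsigned long now)
{
	if (dev->lower)
		dev->lower->trans_start = now;
	else
		dev->trans_start = now;
}

/* The hack: report the lower device's timestamp for stacked devices. */
static unsigned long model_hacked_trans_start(const struct model_netdev *dev)
{
	return dev->lower ? dev->lower->trans_start : dev->trans_start;
}
```

In the test below, port A last transmitted at tick 10, yet after port B transmits at tick 100 the hack reports 100 for port A as well, which is exactly the corrupted probe timestamp described above.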
      
      Furthermore, the hacks didn't take into consideration the fact that the
      lower interface of @dev may not have been physical either. For example,
      VLAN over VLAN, or DSA with 2 masters in a LAG.
      
      Furthermore, there are NETIF_F_LLTX devices which are not stacked,
      like veth. The hack here would not work with those, because it would
      have nothing to provide the bonding driver with at all.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS · 06799a90
      Vladimir Oltean authored
      The bonding driver piggybacks on time stamps kept by the network stack
      for the purpose of the netdev TX watchdog, and this is problematic
      because it does not work with NETIF_F_LLTX devices.
      
      It is hard to say why the driver looks at dev_trans_start() of the
      slave->dev, considering that this is updated even by non-ARP/NS probes
      sent by us, and even by traffic not sent by us at all (for example PTP
      on physical slave devices). ARP monitoring in active-backup mode appears
      to still work even if we track only the last TX time of actual ARP
      probes.
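The replacement approach can be sketched as follows: the bond records the jiffies value when it sends a probe and later checks whether the current time falls within the expected window. This is a hypothetical userspace model, loosely patterned on the bonding driver's interval check; struct model_slave and the model_* names are assumptions, and the kernel's wraparound-safe time_after_eq()/time_before_eq() helpers are re-created as macros so the code compiles outside the kernel.

```c
#include <stdbool.h>

/* Userspace re-creations of the kernel's wraparound-safe jiffies helpers. */
#define model_time_after_eq(a, b)  ((long)((a) - (b)) >= 0)
#define model_time_before_eq(a, b) model_time_after_eq(b, a)

/* Hypothetical per-slave record: the jiffies of the last ARP/NS probe the
 * bond sent itself, instead of the slave's dev_trans_start(). */
struct model_slave {
	unsigned long last_tx;
};

static void model_send_probe(struct model_slave *slave, unsigned long now)
{
	slave->last_tx = now;
}

/* Is 'now' within [last_act - interval, last_act + mod * interval +
 * interval / 2]? 'interval' is the ARP interval in ticks; mod >= 0. */
static bool model_time_in_interval(unsigned long now, unsigned long last_act,
				   unsigned long interval, int mod)
{
	return model_time_after_eq(now, last_act - interval) &&
	       model_time_before_eq(now, last_act + mod * interval + interval / 2);
}
```

The signed-difference comparison keeps the check correct when the jiffies counter wraps, where plain `<=`/`>=` comparisons on the raw values would fail.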
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 03 Aug, 2022 17 commits
    • Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · f86d1fbb
      Linus Torvalds authored
      Pull networking changes from Paolo Abeni:
       "Core:
      
         - Refactor the forward memory allocation to better cope with memory
           pressure with many open sockets, moving from a per socket cache to
           a per-CPU one
      
         - Replace rwlocks with RCU for better fairness in ping, raw sockets
           and IP multicast router.
      
         - Network-side support for IO uring zero-copy send.
      
         - A few skb drop reason improvements, including generating the source
           file with the string mapping instead of using macro magic.
      
         - Rename reference tracking helpers to a more consistent netdev_*
           schema.
      
         - Adapt u64_stats_t type to address load/store tearing issues.
      
         - Refine debug helper usage to reduce the log noise caused by bots.
      
        BPF:
      
         - Improve socket map performance, avoiding skb cloning on read
           operation.
      
         - Add support for 64-bit enums, to match types exposed by the kernel.
      
         - Introduce support for sleepable uprobe programs.
      
         - Introduce support for enum textual representation in libbpf.
      
         - New helpers to implement synproxy with eBPF/XDP.
      
         - Improve loop performance, inlining indirect calls when possible.
      
         - Removed all the deprecated libbpf APIs.
      
         - Implement new eBPF-based LSM flavor.
      
         - Add type match support, which allows accurate queries of the types
           used by eBPF programs.
      
         - A few TCP congestion control framework usability improvements.
      
         - Add new infrastructure to manipulate CT entries via eBPF programs.
      
         - Allow for livepatch (KLP) and BPF trampolines to attach to the same
           kernel function.
      
        Protocols:
      
         - Introduce per network namespace lookup tables for unix sockets,
           increasing scalability and reducing contention.
      
         - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
      
         - Add support to forcibly close TIME_WAIT TCP sockets via user-space
           tools.
      
         - Significant performance improvement for the TLS 1.3 receive path,
           both for zero-copy and not-zero-copy.
      
         - Support for changing the initial MPTCP subflow priority/backup
           status.
      
         - Introduce virtually contiguous buffers for sockets over RDMA, to
           cope better with memory pressure.
      
         - Extend CAN ethtool support with timestamping capabilities
      
         - Refactor CAN build infrastructure to allow building only the needed
           features.
      
        Driver API:
      
         - Remove devlink mutex to allow parallel commands on multiple links.
      
         - Add support for pause stats in distributed switch.
      
         - Implement devlink helpers to query and flash line cards.
      
         - New helper for phy mode to register conversion.
      
        New hardware / drivers:
      
         - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
      
         - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
      
         - Ethernet DSA driver for the Microchip LAN937x switch.
      
         - Ethernet PHY driver for the Aquantia AQR113C EPHY.
      
         - CAN driver for the OBD-II ELM327 interface.
      
         - CAN driver for RZ/N1 SJA1000 CAN controller.
      
         - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
      
        Drivers:
      
         - Intel Ethernet NICs:
            - i40e: add support for vlan pruning
            - i40e: add support for XDP fragmented packets
            - ice: improved vlan offload support
            - ice: add support for PPPoE offload
      
         - Mellanox Ethernet (mlx5)
            - refactor packet steering offload for performance and scalability
            - extend support for TC offload
            - refactor devlink code to clean-up the locking schema
            - support stacked vlans for bridge offloads
            - use TLS objects pool to improve connection rate
      
         - Netronome Ethernet NICs (nfp):
            - extend support for IPv6 fields mangling offload
            - add support for vepa mode in HW bridge
            - better support for virtio data path acceleration (VDPA)
            - enable TSO by default
      
         - Microsoft vNIC driver (mana)
            - add support for XDP redirect
      
         - Others Ethernet drivers:
            - bonding: add per-port priority support
            - microchip lan743x: extend phy support
            - Fungible funeth: support UDP segmentation offload and XDP xmit
            - Solarflare EF100: add support for virtual function representors
            - MediaTek SoC: add XDP support
      
         - Mellanox Ethernet/IB switch (mlxsw):
            - dropped support for unreleased H/W (XM router).
            - improved stats accuracy
            - unified bridge model conversion improving scalability (parts 1-6)
            - support for PTP in Spectrum-2 asics
      
         - Broadcom PHYs
            - add PTP support for BCM54210E
            - add support for the BCM53128 internal PHY
      
         - Marvell Ethernet switches (prestera):
            - implement support for multicast forwarding offload
      
         - Embedded Ethernet switches:
            - refactor OcteonTx MAC filter for better scalability
            - improve TC H/W offload for the Felix driver
            - refactor the Microchip ksz8 and ksz9477 drivers to share the
              probe code (parts 1, 2), add support for phylink mac
              configuration
      
         - Other WiFi:
            - Microchip wilc1000: disable WEP support and enable WPA3
            - Atheros ath10k: encapsulation offload support
      
        Old code removal:
      
         - Neterion vxge ethernet driver: this has been untouched for more than 10 years"
      
      * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
        doc: sfp-phylink: Fix a broken reference
        wireguard: selftests: support UML
        wireguard: allowedips: don't corrupt stack when detecting overflow
        wireguard: selftests: update config fragments
        wireguard: ratelimiter: use hrtimer in selftest
        net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
        net: usb: ax88179_178a: Bind only to vendor-specific interface
        selftests: net: fix IOAM test skip return code
        net: usb: make USB_RTL8153_ECM non user configurable
        net: marvell: prestera: remove reduntant code
        octeontx2-pf: Reduce minimum mtu size to 60
        net: devlink: Fix missing mutex_unlock() call
        net/tls: Remove redundant workqueue flush before destroy
        net: txgbe: Fix an error handling path in txgbe_probe()
        net: dsa: Fix spelling mistakes and cleanup code
        Documentation: devlink: add add devlink-selftests to the table of contents
        dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
        net: ionic: fix error check for vlan flags in ionic_set_nic_features()
        net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
        nfp: flower: add support for tunnel offload without key ID
        ...
    • Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 526942b8
      Linus Torvalds authored
      Pull ATA updates from Damien Le Moal:
      
       - Some code refactoring for the pata_hpt37x and pata_hpt3x2n drivers,
         from Sergei.
      
       - Several cleanup patches for the libata-core, libata-scsi and
         libata-eh code: fix argument and variable types, change some function
         declarations to static and fix a typo in a comment. From Sergey and
         Xiang.
      
       - Fix a compilation warning in the pata_macio driver, from me.
      
       - A fix for the expected number of resources in the sata_mv driver,
         from Andrew.
      
      * tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: sata_mv: Fixes expected number of resources now IRQs are gone
        ata: libata-scsi: fix result type of ata_ioc32()
        ata: pata_macio: Fix compilation warning
        ata: libata-eh: fix sloppy result type of ata_internal_cmd_timeout()
        ata: libata-core: fix sloppy parameter type in ata_exec_internal[_sg]()
        ata: make ata_port::fastdrain_cnt *unsigned int*
        ata: libata-eh: fix sloppy result type of ata_eh_nr_in_flight()
        ata: libata-core: make ata_exec_internal_sg() *static*
        ata: make transfer mode masks *unsigned int*
        ata: libata-core: get rid of *else* branches in ata_id_n_sectors()
        ata: libata-core: fix sloppy typing in ata_id_n_sectors()
        ata: pata_hpt3x2n: pass base DPLL frequency to hpt3x2n_pci_clock()
        ata: pata_hpt37x: merge hpt374_read_freq() to hpt37x_pci_clock()
        ata: pata_hpt37x: factor out hpt37x_pci_clock()
        ata: pata_hpt37x: move claculating PCI clock from hpt37x_clock_slot()
        ata: libata: Fix syntax errors in comments
    • Merge tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · a39b5dbd
      Linus Torvalds authored
      Pull zonefs update from Damien Le Moal:
       "A single change for this cycle to simplify handling of the memory page
        used as super block buffer during mount (from Fabio)"
      
      * tag 'zonefs-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Call page_address() on page acquired with GFP_KERNEL flag
    • Merge tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f18d7309
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "The most notable change in this first batch is that we no longer
        schedule pages beyond i_size for writeback, preferring instead to let
        truncate deal with those pages.
      
        Next week, there may be a second pull request to remove
        iomap_writepage from the other two filesystems (gfs2/zonefs) that use
        iomap for buffered IO. This follows in the same vein as the recent
        removal of writepage from XFS, since it hasn't been triggered in a few
        years; it does nothing during direct reclaim; and as far as the people
        who examined the patchset can tell, it's moving the codebase in the
        right direction.
      
        However, as it was a late addition to for-next, I'm holding off on
        that section for another week of testing to see if anyone can come up
        with a solid reason for holding off in the meantime.
      
        Summary:
      
         - Skip writeback for pages that are completely beyond EOF
      
         - Minor code cleanups"
      
      * tag 'iomap-5.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        dax: set did_zero to true when zeroing successfully
        iomap: set did_zero to true when zeroing successfully
        iomap: skip pages past eof in iomap_do_writepage()
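The "skip writeback for pages that are completely beyond EOF" rule reduces to an index comparison. This hypothetical C helper assumes 4 KiB pages and only illustrates the test; it is not the iomap code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Hypothetical helper: true if the page at 'index' lies entirely at or
 * beyond EOF, i.e. its first byte is at or past i_size. Such pages are
 * skipped during writeback and left for truncate to deal with. */
static bool model_page_fully_beyond_eof(uint64_t index, uint64_t i_size)
{
	uint64_t end_index = i_size >> MODEL_PAGE_SHIFT;	/* page holding EOF */
	uint64_t poff = i_size & (MODEL_PAGE_SIZE - 1);		/* EOF offset in it */

	return index > end_index || (index == end_index && poff == 0);
}
```

A page that straddles EOF is still written back (its tail past EOF is the only part not backed by file data); only pages whose first byte is at or past i_size are skipped.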
    • Merge tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 2e4f8c72
      Linus Torvalds authored
      Pull affs fix from David Sterba:
       "One update to AFFS, switching away from the kmap/kmap_atomic API"
      
      * tag 'affs-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: use memcpy_to_page and remove replace kmap_atomic()
    • Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 353767e4
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "This brings some long awaited changes, the send protocol bump,
        otherwise lots of small improvements and fixes. The main core part is
        reworking bio handling, cleaning up the submission and endio and
        improving error handling.
      
        There are some changes outside of btrfs adding helpers or updating
        API, listed at the end of the changelog.
      
        Features:
      
         - sysfs:
            - export chunk size, in debug mode add tunable for setting its size
            - show zoned among features (was only in debug mode)
            - show commit stats (number, last/max/total duration)
      
         - send protocol updated to 2
            - new commands:
               - ability to write data chunks larger than 64K
               - send raw compressed extents (uses the encoded data ioctls),
                 i.e. no decompression on the send side, no compression needed
                 on the receive side if supported
               - send 'otime' (inode creation time) among other timestamps
               - send file attributes (a.k.a. file flags and xflags)
            - this is the first version bump; backward compatibility on the send
              and receive side is provided
            - there are still some known and wanted commands that will be
              implemented in the near future, another version bump will be
              needed, however we want to minimize that to avoid causing
              usability issues
      
         - print checksum type and implementation at mount time
      
         - don't print some messages at mount (mentioned because people asked
           about it); we want to print messages mainly for new features, so
           let's make some space for that
            - big metadata - this has been supported for a long time and is
              not a feature that's worth mentioning
            - skinny metadata - same reason, set by default by mkfs
      
        Performance improvements:
      
         - reduced amount of reserved metadata for delayed items
            - when inserted items can be batched into one leaf
            - when deleting batched directory index items
            - when deleting delayed items used for deletion
            - overall improved count of files/sec, decreased subvolume lock
              contention
      
         - metadata item access bounds checker micro-optimized, with a few
           percent of improved runtime for metadata-heavy operations
      
         - increase direct io limit for read to 256 sectors, improved
           throughput by 3x on sample workload
      
        Notable fixes:
      
         - raid56
            - reduce parity writes, skip sectors of stripe when there are no
              data updates
            - restore reading from on-disk data instead of using the stripe
              cache; this reduces the chance of damaging correct data during
              the RMW cycle
      
         - refuse to replay log with unknown incompat read-only feature bit
           set
      
         - zoned
            - fix page locking when COW fails in the middle of allocation
            - improved tracking of active zones; ZNS drives may limit their
              number, and there were ENOSPC errors due to that limit rather
              than an actual lack of space
            - adjust maximum extent size for zone append so it does not cause
              late ENOSPC due to underreservation
      
         - mirror reading error messages show the mirror number
      
         - don't fall back to buffered IO for NOWAIT direct IO writes; we don't
           have NOWAIT semantics for buffered IO yet
      
         - send, fix sending link commands for existing file paths when there
           are deleted and created hardlinks for same files
      
         - repair all mirrors for profiles with more than 1 copy (raid1c34)
      
         - fix repair of compressed extents, unify where error detection and
           repair happen
      
        Core changes:
      
         - bio completion cleanups
            - don't double defer compression bios
            - simplify endio workqueues
            - add more data to btrfs_bio to avoid allocation for read requests
            - rework bio error handling to match what the block layer does:
              submission does not return errors; they are consumed in endio
            - when asynchronous bio offload fails, fall back to synchronous
              checksum calculation to avoid errors under writeback or memory
              pressure
      
         - new trace points
            - raid56 events
            - ordered extent operations
      
         - super block log_root_transid deprecated (never used)
      
         - mixed_backref and big_metadata sysfs feature files removed; they
           have been the default for a sufficiently long time, there are no
           known users, and mixed_backref could be confused with mixed_groups
      
        Non-btrfs changes, API updates:
      
         - minor highmem API update to cover const arguments
      
         - switch all kmap/kmap_atomic to kmap_local
      
         - remove redundant flush_dcache_page()
      
         - address_space_operations::writepage callback removed
      
         - add bdev_max_segments() helper"
      
      * tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
        btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
        btrfs: fix repair of compressed extents
        btrfs: remove the start argument to check_data_csum and export
        btrfs: pass a btrfs_bio to btrfs_repair_one_sector
        btrfs: simplify the pending I/O counting in struct compressed_bio
        btrfs: repair all known bad mirrors
        btrfs: merge btrfs_dev_stat_print_on_error with its only caller
        btrfs: join running log transaction when logging new name
        btrfs: simplify error handling in btrfs_lookup_dentry
        btrfs: send: always use the rbtree based inode ref management infrastructure
        btrfs: send: fix sending link commands for existing file paths
        btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
        btrfs: zoned: wait until zone is finished when allocation didn't progress
        btrfs: zoned: write out partially allocated region
        btrfs: zoned: activate necessary block group
        btrfs: zoned: activate metadata block group on flush_space
        btrfs: zoned: disable metadata overcommit for zoned
        btrfs: zoned: introduce space_info->active_total_bytes
        btrfs: zoned: finish least available block group on data bg allocation
        btrfs: let can_allocate_chunk return error
        ...
    • Merge tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab17c0cd
      Linus Torvalds authored
      Pull efivars sysfs interface removal from Ard Biesheuvel:
       "Remove the obsolete 'efivars' sysfs based interface to the EFI
        variable store, now that all users have moved to the efivarfs pseudo
        file system, which was created ~10 years ago to address some
        fundamental shortcomings in the sysfs based driver.
      
        Move the 'business logic' related to which EFI variables are important
        and may affect the boot flow from the efivars support layer into the
        efivarfs pseudo file system, so it is no longer exposed to other parts
        of the kernel"
      
      * tag 'efi-efivars-removal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: vars: Move efivar caching layer into efivarfs
        efi: vars: Switch to new wrapper layer
        efi: vars: Remove deprecated 'efivars' sysfs interface
    • Merge tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 97a77ab1
      Linus Torvalds authored
      Pull EFI updates from Ard Biesheuvel:
      
       - Enable mirrored memory for arm64
      
       - Fix up several abuses of the efivar API
      
       - Refactor the efivar API in preparation for moving the 'business
         logic' part of it into efivarfs
      
       - Enable ACPI PRM on arm64
      
      * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
        ACPI: Move PRM config option under the main ACPI config
        ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64
        ACPI: PRM: Change handler_addr type to void pointer
        efi: Simplify arch_efi_call_virt() macro
        drivers: fix typo in firmware/efi/memmap.c
        efi: vars: Drop __efivar_entry_iter() helper which is no longer used
        efi: vars: Use locking version to iterate over efivars linked lists
        efi: pstore: Omit efivars caching EFI varstore access layer
        efi: vars: Add thin wrapper around EFI get/set variable interface
        efi: vars: Don't drop lock in the middle of efivar_init()
        pstore: Add priv field to pstore_record for backend specific use
        Input: applespi - avoid efivars API and invoke EFI services directly
        selftests/kexec: remove broken EFI_VARS secure boot fallback check
        brcmfmac: Switch to appropriate helper to load EFI variable contents
        iwlwifi: Switch to proper EFI variable store interface
        media: atomisp_gmin_platform: stop abusing efivar API
        efi: efibc: avoid efivar API for setting variables
        efi: avoid efivars layer when loading SSDTs from variables
        efi: Correct comment on efi_memmap_alloc
        memblock: Disable mirror feature if kernelcore is not specified
        ...
    • Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ff89dd08
      Linus Torvalds authored
      Pull 9p iov_iter fix from Al Viro:
       "net/9p abuses iov_iter primitives - it attempts to copy _from_ a
        destination-only iov_iter when it handles Rerror arriving in reply to
        a zero-copy request. Not hard to fix, fortunately.

        This is a prereq for the iov_iter_get_pages() work in the second part
        of the iov_iter series; it ended up in a separate branch"
      
      * tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        9p: handling Rerror without copy_from_iter_full()
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d9b58ab7
      Linus Torvalds authored
      Pull copy_to_iter_mc fix from Al Viro:
       "Backportable fix for copy_to_iter_mc() - the second part of iov_iter
        work will pretty much overwrite this, but would be much harder to
        backport"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix short copy handling in copy_mc_pipe_to_iter()
    • Merge tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5264406c
      Linus Torvalds authored
      Pull vfs iov_iter updates from Al Viro:
       "Part 1 - isolated cleanups and optimizations.
      
        One of the goals is to reduce the overhead of using ->read_iter() and
        ->write_iter() instead of ->read()/->write().
      
        new_sync_{read,write}() has a surprising amount of overhead, in
        particular inside iocb_flags(). That's why the beginning of the
        series is in this pile; it's not directly iov_iter-related, but it's
        part of the same work..."
      
      * tag 'pull-work.iov_iter-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        first_iovec_segment(): just return address
        iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
        iov_iter: first_{iovec,bvec}_segment() - simplify a bit
        iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
        iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT
        iov_iter_bvec_advance(): don't bother with bvec_iter
        copy_page_{to,from}_iter(): switch iovec variants to generic
        keep iocb_flags() result cached in struct file
        iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
        struct file: use anonymous union member for rcuhead and llist
        btrfs: use IOMAP_DIO_NOSYNC
        teach iomap_dio_rw() to suppress dsync
        No need of likely/unlikely on calls of check_copy_size()
    • Merge tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 200e340f
      Linus Torvalds authored
      Pull vfs dcache updates from Al Viro:
       "The main part here is making parallel lookups safe for RT - making
        sure preemption is disabled in start_dir_add()/end_dir_add() sections
        (on non-RT it's automatic, on RT it needs to be done explicitly)
        and moving wakeups from __d_lookup_done() calls inside such sections
        to the end of those sections.
      
        Wakeups can be safely delayed for as long as ->d_lock on the in-lookup
        dentry is held; proving that caught a bug in d_add_ci() that allows
        memory corruption when a sufficiently bogus ntfs (or case-insensitive
        xfs) image is mounted. Easily fixed, fortunately"
      
      * tag 'pull-work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/dcache: Move wakeup out of i_seq_dir write held region.
        fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
        fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT
        d_add_ci(): make sure we don't miss d_lookup_done()
    • Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a782e866
      Linus Torvalds authored
      Pull vfs lseek updates from Al Viro:
       "Jason's lseek series.
      
        Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
        the magical no_llseek thing and makes checks consistent.
      
        In particular, the ad-hoc "can we do splice via internal pipe" checks
        got saner (and somewhat more permissive, which is what Jason had been
        after, AFAICT)"
      
      * tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: remove no_llseek
        fs: check FMODE_LSEEK to control internal pipe splicing
        vfio: do not set FMODE_LSEEK flag
        dma-buf: remove useless FMODE_LSEEK flag
        fs: do not compare against ->llseek
        fs: clear or set FMODE_LSEEK based on llseek function
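      The user-visible rule behind the lseek series is that seeking on a
      non-seekable file such as a pipe fails with ESPIPE. As a minimal
      userspace sketch (plain POSIX C, not code from the series; the helper
      name lseek_errno_on_pipe() is hypothetical), the rule can be observed
      like this:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration: create a pipe, try to query the
 * current offset of its read end with lseek(), and return the resulting
 * errno. Returns 0 if lseek() unexpectedly succeeds, -1 if pipe() fails.
 */
int lseek_errno_on_pipe(void)
{
	int fds[2];
	off_t ret;
	int err;

	if (pipe(fds) != 0)
		return -1;

	errno = 0;
	ret = lseek(fds[0], 0, SEEK_CUR);	/* pipes are not seekable */
	err = (ret == (off_t)-1) ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

      On Linux the helper is expected to return ESPIPE both before and after
      the series; the series changes how the kernel decides this internally
      (FMODE_LSEEK instead of no_llseek), not the result seen by userspace.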
    • Merge tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d9395512
      Linus Torvalds authored
      Pull vfs namei updates from Al Viro:
       "RCU pathwalk cleanups.
      
        Storing sampled ->d_seq of the next dentry in nameidata simplifies
        life considerably, especially if we delay fetching ->d_inode until
        step_into()"
      
      * tag 'pull-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        step_into(): move fetching ->d_inode past handle_mounts()
        lookup_fast(): don't bother with inode
        follow_dotdot{,_rcu}(): don't bother with inode
        step_into(): lose inode argument
        namei: stash the sampled ->d_seq into nameidata
        namei: move clearing LOOKUP_RCU towards rcu_read_unlock()
        switch try_to_unlazy_next() to __legitimize_mnt()
        follow_dotdot{,_rcu}(): change calling conventions
        namei: get rid of pointless unlikely(read_seqcount_retry(...))
        __follow_mount_rcu(): verify that mount_lock remains unchanged
    • Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache · f0065400
      Linus Torvalds authored
      Pull folio updates from Matthew Wilcox:
      
       - Fix an accounting bug that made NR_FILE_DIRTY grow without limit
         when running xfstests
      
       - Convert more of mpage to use folios
      
       - Remove add_to_page_cache() and add_to_page_cache_locked()
      
       - Convert find_get_pages_range() to filemap_get_folios()
      
       - Improvements to the read_cache_page() family of functions
      
       - Remove a few unnecessary checks of PageError
      
       - Some straightforward filesystem conversions to use folios
      
       - Split PageMovable users out from address_space_operations into
         their own movable_operations
      
       - Convert aops->migratepage to aops->migrate_folio
      
       - Remove nobh support (Christoph Hellwig)
      
      * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
        fs: remove the NULL get_block case in mpage_writepages
        fs: don't call ->writepage from __mpage_writepage
        fs: remove the nobh helpers
        jfs: stop using the nobh helper
        ext2: remove nobh support
        ntfs3: refactor ntfs_writepages
        mm/folio-compat: Remove migration compatibility functions
        fs: Remove aops->migratepage()
        secretmem: Convert to migrate_folio
        hugetlb: Convert to migrate_folio
        aio: Convert to migrate_folio
        f2fs: Convert to filemap_migrate_folio()
        ubifs: Convert to filemap_migrate_folio()
        btrfs: Convert btrfs_migratepage to migrate_folio
        mm/migrate: Add filemap_migrate_folio()
        mm/migrate: Convert migrate_page() to migrate_folio()
        nfs: Convert to migrate_folio
        btrfs: Convert btree_migratepage to migrate_folio
        mm/migrate: Convert expected_page_refs() to folio_expected_refs()
        mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
        ...
    • Merge tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray · e087437a
      Linus Torvalds authored
      Pull XArray/IDR updates from Matthew Wilcox:
      
       - Add appropriate might_alloc() annotations to the XArray APIs
      
       - Document that the IDR is deprecated
      
      * tag 'xarray-6.0' of git://git.infradead.org/users/willy/xarray:
        IDR: Note that the IDR API is deprecated
        XArray: Add calls to might_alloc()
    • Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · b6bb70f9
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
       "Several core optimizations:
      
         - threadgroup_rwsem write locking is skipped when configuring
           controllers in empty subtrees.
      
           Combined with CLONE_INTO_CGROUP, this allows the common static
           usage pattern to not grab threadgroup_rwsem at all (glibc still
           doesn't seem ready for CLONE_INTO_CGROUP unfortunately).
      
         - threadgroup_rwsem used to be put into non-percpu mode by default
           due to latency concerns in specific use cases. There's no reason
           for everyone else to pay for it. Make the behavior optional.
      
         - psi no longer allocates memory when disabled.
      
        ... along with some code cleanups"
      
      * tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Skip subtree root in cgroup_update_dfl_csses()
        cgroup: remove "no" prefixed mount options
        cgroup: Make !percpu threadgroup_rwsem operations optional
        cgroup: Add "no" prefixed mount options
        cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
        cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
        cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
        psi: dont alloc memory for psi by default