1. 11 Jul, 2023 5 commits
  2. 10 Jul, 2023 4 commits
  3. 09 Jul, 2023 1 commit
  4. 07 Jul, 2023 1 commit
  5. 06 Jul, 2023 10 commits
  6. 05 Jul, 2023 3 commits
  7. 30 Jun, 2023 9 commits
  8. 29 Jun, 2023 3 commits
  9. 28 Jun, 2023 4 commits
    • Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 3a8a670e
      Linus Torvalds authored
      Pull networking changes from Jakub Kicinski:
       "WiFi 7 and sendpage changes are the biggest pieces of work for this
        release. The latter will definitely require fixes but I think that we
        got it to a reasonable point.
      
        Core:
      
         - Rework the sendpage & splice implementations
      
           Instead of feeding data into sockets page by page, extend sendmsg
           handlers to support taking a reference on the data, controlled by a
           new flag called MSG_SPLICE_PAGES
      
           Rework the handling of unexpected-end-of-file to invoke an
           additional callback instead of trying to predict what the right
           combination of MORE/NOTLAST flags is
      
           Remove the MSG_SENDPAGE_NOTLAST flag completely
      
         - Implement SCM_PIDFD, a new CMSG type analogous to
           SCM_CREDENTIALS, but carrying a pidfd instead of a plain pid
      
         - Enable socket busy polling with CONFIG_RT
      
         - Improve reliability and efficiency of reporting for ref_tracker
      
         - Auto-generate a user space C library for various Netlink families
      
        Protocols:
      
         - Allow TCP to shrink the advertised window when necessary, prevent
           sk_rcvbuf auto-tuning from growing the window all the way up to
           tcp_rmem[2]
      
         - Use per-VMA locking for "page-flipping" TCP receive zerocopy
      
         - Prepare TCP for device-to-device data transfers, by making sure
           that payloads are always attached to skbs as page frags
      
         - Make the backoff time for the first N TCP SYN retransmissions
           linear. Exponential backoff is unnecessarily conservative
      
         - Create a new MPTCP getsockopt to retrieve all info
           (MPTCP_FULL_INFO)
      
         - Avoid waking up applications using TLS sockets until we have a full
           record
      
         - Allow using kernel memory for protocol ioctl callbacks, paving the
           way to issuing ioctls over io_uring
      
         - Add nolocalbypass option to VxLAN, forcing packets to be fully
           encapsulated even if they are destined for a local IP address
      
         - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
           in-kernel ECMP implementations (e.g. Open vSwitch) select the same
           link for all packets. Support L4 symmetric hashing in Open vSwitch
      
         - PPPoE: make number of hash bits configurable
      
         - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
           (ipconfig)
      
         - Add layer 2 miss indication and filtering, allowing higher layers
           (e.g. ACL filters) to make forwarding decisions based on whether
           packet matched forwarding state in lower devices (bridge)
      
         - Support matching on Connectivity Fault Management (CFM) packets
      
         - Hide the "link becomes ready" IPv6 messages by demoting their
           printk level to debug
      
         - HSR: don't enable promiscuous mode if device offloads the proto
      
         - Support active scanning in IEEE 802.15.4
      
         - Continue work on Multi-Link Operation for WiFi 7
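
A minimal sketch of the linear SYN backoff item in the Protocols list above, assuming a 1 s initial timeout, a 120 s cap and N = 4. The function name, its parameters and the exact point at which doubling starts are illustrative assumptions, not taken from the kernel source.

```python
def syn_retransmit_timeouts(retries, initial_rto=1.0, linear_retries=4, max_rto=120.0):
    """Return the wait (in seconds) before each of `retries` SYN retransmissions.

    Hypothetical model: the initial timeout and the next `linear_retries`
    timeouts are not doubled; after that each timeout doubles, capped at
    `max_rto`.
    """
    timeouts = []
    for attempt in range(1, retries + 1):
        # Number of doublings applied so far; zero during the linear phase.
        doublings = max(0, attempt - 1 - linear_retries)
        timeouts.append(min(initial_rto * (2 ** doublings), max_rto))
    return timeouts


if __name__ == "__main__":
    print("linear first:", syn_retransmit_timeouts(7))
    print("exponential: ", syn_retransmit_timeouts(7, linear_retries=0))
```

Under these assumptions, seven retransmissions complete in 11 s in total, against 127 s when every timeout doubles, which is the sense in which pure exponential backoff is conservative for the first few attempts.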
      
        BPF:
      
         - Add precision propagation for subprogs and callbacks. This allows
           maintaining verification efficiency when subprograms are used, or
           in fact passing the verifier at all for complex programs,
           especially those using open-coded iterators
      
         - Improve BPF's {g,s}etsockopt() length handling. Previously BPF
           assumed the length is always equal to the amount of written data.
           But some protos allow passing a NULL buffer to discover what the
           output buffer length *should* be, without writing anything
      
         - Accept dynptr memory as memory arguments passed to helpers
      
         - Add routing table ID to bpf_fib_lookup BPF helper
      
         - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
      
         - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
           maps as read-only)
      
         - Show target_{obj,btf}_id in tracing link fdinfo
      
         - Addition of several new kfuncs (most of the names are
           self-explanatory):
            - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
              bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
              and bpf_dynptr_clone().
            - bpf_task_under_cgroup()
            - bpf_sock_destroy() - force closing sockets
            - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
      
        Netfilter:
      
         - Relax set/map validation checks in nf_tables. Allow checking
           presence of an entry in a map without using the value
      
         - Increase ip_vs_conn_tab_bits range for 64BIT builds
      
         - Allow updating size of a set
      
         - Improve NAT tuple selection when connection is closing
      
        Driver API:
      
         - Integrate netdev with LED subsystem, to allow configuring HW
           "offloaded" blinking of LEDs based on link state and activity
           (i.e. packets coming in and out)
      
         - Support configuring rate selection pins of SFP modules
      
         - Factor Clause 73 auto-negotiation code out of the drivers, provide
           common helper routines
      
         - Add more fool-proof helpers for managing lifetime of MDIO devices
           associated with the PCS layer
      
         - Allow drivers to report advanced statistics related to Time Aware
           scheduler offload (taprio)
      
         - Allow opting out of VF statistics in link dump, to allow more VFs
           to fit into the message
      
         - Split devlink instance and devlink port operations
      
        New hardware / drivers:
      
         - Ethernet:
            - Synopsys EMAC4 IP support (stmmac)
            - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
            - Marvell 88E6250 7 port switches
            - Microchip LAN8650/1 Rev.B0 PHYs
            - MediaTek MT7981/MT7988 built-in 1GE PHY driver
      
         - WiFi:
            - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
            - Realtek RTL8723DS (SDIO variant)
            - Realtek RTL8851BE
      
         - CAN:
            - Fintek F81604
      
        Drivers:
      
         - Ethernet NICs:
            - Intel (100G, ice):
               - support dynamic interrupt allocation
               - use meta data match instead of VF MAC addr on slow-path
            - nVidia/Mellanox:
               - extend link aggregation to handle 4, rather than just 2 ports
               - spawn sub-functions without any features by default
            - OcteonTX2:
               - support HTB (Tx scheduling/QoS) offload
               - make RSS hash generation configurable
               - support selecting Rx queue using TC filters
            - Wangxun (ngbe/txgbe):
               - add basic Tx/Rx packet offloads
               - add phylink support (SFP/PCS control)
            - Freescale/NXP (enetc):
               - report TAPRIO packet statistics
            - Solarflare/AMD:
               - support matching on IP ToS and UDP source port of outer
                 header
               - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
               - add devlink dev info support for EF10
      
         - Virtual NICs:
            - Microsoft vNIC:
               - size the Rx indirection table based on requested
                 configuration
               - support VLAN tagging
            - Amazon vNIC:
               - try to reuse Rx buffers if not fully consumed, useful for ARM
                 servers running with 16kB pages
            - Google vNIC:
               - support TCP segmentation of >64kB frames
      
         - Ethernet embedded switches:
            - Marvell (mv88e6xxx):
               - enable USXGMII (88E6191X)
            - Microchip:
               - lan966x: add support for Egress Stage 0 ACL engine
               - lan966x: support mapping packet priority to internal switch
                 priority (based on PCP or DSCP)
      
         - Ethernet PHYs:
            - Broadcom PHYs:
               - support for Wake-on-LAN for BCM54210E/B50212E
               - report LPI counter
            - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
            - Micrel PHYs: receive timestamp in the frame (LAN8841)
            - Realtek PHYs: support optional external PHY clock
            - Altera TSE PCS: merge the driver into Lynx PCS which it is a
              variant of
      
         - CAN: Kvaser PCIEcan:
            - support packet timestamping
      
         - WiFi:
            - Intel (iwlwifi):
               - major update for new firmware and Multi-Link Operation (MLO)
               - configuration rework to drop test devices and split the
                 different families
               - support for segmented PNVM images and power tables
               - new vendor entries for PPAG (platform antenna gain) feature
            - Qualcomm 802.11ax (ath11k):
               - Multiple Basic Service Set Identifier (MBSSID) and Enhanced
                 MBSSID Advertisement (EMA) support in AP mode
               - support factory test mode
            - RealTek (rtw89):
               - add RSSI based antenna diversity
               - support U-NII-4 channels on 5 GHz band
            - RealTek (rtl8xxxu):
               - AP mode support for 8188f
               - support USB RX aggregation for the newer chips"
      
      * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
        net: scm: introduce and use scm_recv_unix helper
        af_unix: Skip SCM_PIDFD if scm->pid is NULL.
        net: lan743x: Simplify comparison
        netlink: Add __sock_i_ino() for __netlink_diag_dump().
        net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
        Revert "af_unix: Call scm_recv() only after scm_set_cred()."
        phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
        libceph: Partially revert changes to support MSG_SPLICE_PAGES
        net: phy: mscc: fix packet loss due to RGMII delays
        net: mana: use vmalloc_array and vcalloc
        net: enetc: use vmalloc_array and vcalloc
        ionic: use vmalloc_array and vcalloc
        pds_core: use vmalloc_array and vcalloc
        gve: use vmalloc_array and vcalloc
        octeon_ep: use vmalloc_array and vcalloc
        net: usb: qmi_wwan: add u-blox 0x1312 composition
        perf trace: fix MSG_SPLICE_PAGES build error
        ipvlan: Fix return value of ipvlan_queue_xmit()
        netfilter: nf_tables: fix underflow in chain reference counter
        netfilter: nf_tables: unbind non-anonymous set if rule construction fails
        ...
    • Merge tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 6a8cbd92
      Linus Torvalds authored
      Pull sysctl updates from Luis Chamberlain:
       "The changes for sysctl are in line with prior efforts to stop usage of
        deprecated routines which incur recursion and also make it hard to
        remove the empty array element in each sysctl array declaration.
      
        The most difficult user to modify was parport, which required a bit
        of re-thinking of how to declare shared sysctls there. Joel Granados
        stepped up to the plate to do most of this work and the eventual
        removal of register_sysctl_table(). That work ended up saving us
        about 1465 bytes according to bloat-o-meter. Since we gained a few
        bloat-o-meter karma points I moved two rather small sysctl arrays
        from kernel/sysctl.c, leaving only two more sysctl arrays to move.
      
        Most changes have been tested on linux-next for about a month. The
        last straggler patches are a minor parport fix and changes to the
        sysctl kernel selftest to verify correctness and prevent regressions
        for the change he made to provide an alternative solution for the
        special sysctl mount point target, which was using the now
        deprecated sysctl child element.
      
        This is all prep work to finally be able to remove the empty array
        element in all sysctl declarations / registrations, which is expected
        to save us a few bytes all over the kernel. That work will be
        tested early after v6.5-rc1 is out"
      
      * tag 'v6.5-rc1-sysctl-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        sysctl: replace child with an enumeration
        sysctl: Remove debugging dump_stack
        test_sysclt: Test for registering a mount point
        test_sysctl: Add an option to prevent test skip
        test_sysctl: Add an unregister sysctl test
        test_sysctl: Group node sysctl test under one func
        test_sysctl: Fix test metadata getters
        parport: plug a sysctl register leak
        sysctl: move security keys sysctl registration to its own file
        sysctl: move umh sysctl registration to its own file
        signal: move show_unhandled_signals sysctl to its own file
        sysctl: remove empty dev table
        sysctl: Remove register_sysctl_table
        sysctl: Refactor base paths registrations
        sysctl: stop exporting register_sysctl_table
        parport: Removed sysctl related defines
        parport: Remove register_sysctl_table from parport_default_proc_register
        parport: Remove register_sysctl_table from parport_device_proc_register
        parport: Remove register_sysctl_table from parport_proc_register
        parport: Move magic number "15" to a define
    • Merge tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 4e3c09e9
      Linus Torvalds authored
      Pull module updates from Luis Chamberlain:
       "The changes queued up for modules are pretty tame, mostly code removal
        or moving of code.
      
        Only two minor functional changes are made; the only one which stands
        out is Sebastian Andrzej Siewior's simplification of module reference
        counting by removing preempt_disable(), and that has been tested on
        linux-next for well over a month without any regressions.
      
        I'm now, I guess, also a kitchen sink for some kallsyms changes"
      
      [ There was a mis-communication about the concurrent module load changes
        that I had expected to come through Luis despite me authoring the
        patch. So some of the module updates were left hanging in the email
        ether, and I just committed them separately.
      
        It's my bad - I should have made it more clear that I expected my
        own patches to come through the module tree too. Now they missed
        linux-next, but hopefully that won't cause any issues - Linus ]
      
      * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        kallsyms: make kallsyms_show_value() as generic function
        kallsyms: move kallsyms_show_value() out of kallsyms.c
        kallsyms: remove unsed API lookup_symbol_attrs
        kallsyms: remove unused arch_get_kallsym() helper
        module: Remove preempt_disable() from module reference counting.
    • modules: catch concurrent module loads, treat them as idempotent · 9b9879fc
      Linus Torvalds authored
      This is the new-and-improved attempt at avoiding huge memory load spikes
      when the user space boot sequence tries to load hundreds (or even
      thousands) of redundant duplicate modules in parallel.
      
      See commit 9828ed3f ("module: error out early on concurrent load of
      the same module file") for background and an earlier failed attempt that
      was reverted.
      
      That earlier attempt just said "concurrently loading the same module is
      silly, just open the module file exclusively and return -ETXTBSY if
      somebody else is already loading it".
      
      While it is true that concurrent module loads of the same module are
      silly, the reason that earlier attempt then failed was that the
      concurrently loaded module would often be a prerequisite for another
      module.
      
      Thus failing to load the prerequisite would then cause cascading
      failures of the other modules, rather than just short-circuiting that
      one unnecessary module load.
      
      At the same time, we still really don't want to load the contents of the
      same module file hundreds of times, only to then wait for an eventually
      successful load, and have everybody else return -EEXIST.
      
      As a result, this takes another approach, and treats concurrent module
      loads from the same file as "idempotent" in the inode.  So if one module
      load is ongoing, we don't start a new one, but instead just wait for the
      first one to complete and return the same return value as it did.
      
      So unlike the first attempt, this does not return early: the intent is
      not to speed up the boot, but to avoid a thundering herd problem in
      allocating memory (both physical and virtual) for a module more than
      once.
      
      Also note that this does change behavior: it used to be that when you
      had concurrent loads, you'd have one "winner" that would return success,
      and everybody else would return -EEXIST.
      
      In contrast, this idempotent logic goes all Oprah on the problem, and
      says "You are a winner! And you are a winner! We are ALL winners".  But
      since there's no possible actual real semantic difference between "you
      loaded the module" and "somebody else already loaded the module", this
      is more of a feel-good change than an actual honest-to-goodness semantic
      change.
      
      Of course, any true Johnny-come-latelies that don't get caught in the
      concurrency filter will still return -EEXIST.  It's no different from
      not even getting a seat at an Oprah taping.  That's life.
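
The idempotent behaviour described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: the `IdempotentLoader` class, the `_Pending` record, the `waiters` counter and the caller-supplied `do_load` function are all hypothetical names, and a plain dictionary key stands in for the module file's inode.

```python
import threading

EEXIST = 17  # Linux errno value; a real load returns -EEXIST for a duplicate


class _Pending:
    """Hypothetical record shared by every caller loading the same key."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.waiters = 0  # callers that piggybacked on the in-flight load


class IdempotentLoader:
    """Concurrent loads of one key run `do_load` once and share its result."""

    def __init__(self, do_load):
        self._do_load = do_load      # performs the real, expensive load
        self._lock = threading.Lock()
        self._in_flight = {}         # key -> _Pending

    def load(self, key):
        with self._lock:
            pending = self._in_flight.get(key)
            leader = pending is None
            if leader:
                pending = _Pending()
                self._in_flight[key] = pending
            else:
                pending.waiters += 1
        if not leader:
            # Do not start a second load: wait and return the same value.
            pending.done.wait()
            return pending.result
        try:
            pending.result = self._do_load(key)
        finally:
            with self._lock:
                del self._in_flight[key]
            pending.done.set()
        return pending.result
```

A caller that arrives after the in-flight entry has been removed is no longer caught by the filter and simply runs `do_load` again. In this model it is up to `do_load` to answer `-EEXIST` for an already loaded module, which mirrors the late-comer case in the message.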
      
      See the long thread on the kernel mailing list about this all, which
      includes some numbers for memory use before and after the patch.
      
      Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/
      Reviewed-by: Johan Hovold <johan@kernel.org>
      Tested-by: Johan Hovold <johan@kernel.org>
      Tested-by: Luis Chamberlain <mcgrof@kernel.org>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
      Tested-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>