1. 08 Dec, 2023 3 commits
    • nfp: ethtool: add extended ack report messages · b0318e28
      Ryno Swart authored
      Add descriptive error messages to common ethtool failures to be more
      user friendly.
      
      Update `nfp_net_coalesce_para_check` to only check one argument, which
      facilitates unique error messages.
      
      Additionally, three error codes are updated to `EOPNOTSUPP` to reflect
      that these operations are not supported.
      Signed-off-by: Ryno Swart <ryno.swart@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Link: https://lore.kernel.org/r/20231206151209.20296-2-louis.peens@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b0318e28
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 2483e7f0
      Jakub Kicinski authored
      Cross-merge networking fixes after downstream PR.
      
      Conflicts:
      
      drivers/net/ethernet/stmicro/stmmac/dwmac5.c
      drivers/net/ethernet/stmicro/stmmac/dwmac5.h
      drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
      drivers/net/ethernet/stmicro/stmmac/hwif.h
        37e4b8df ("net: stmmac: fix FPE events losing")
        c3f3b972 ("net: stmmac: Refactor EST implementation")
      https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/
      
      Adjacent changes:
      
      net/ipv4/tcp_ao.c
        9396c4ee ("net/tcp: Don't store TCP-AO maclen on reqsk")
        7b0f570f ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2483e7f0
    • Merge tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5e3f5b81
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
      
        Current release - new code bugs:
      
         - tcp: assorted fixes to the new Auth Option support
      
        Older releases - regressions:
      
         - tcp: fix mid stream window clamp
      
         - tls: fix incorrect splice handling
      
         - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit()
      
         - dsa: mv88e6xxx: restore USXGMII support for 6393X
      
         - arcnet: restore support for multiple Sohard Arcnet cards
      
        Older releases - always broken:
      
         - tcp: do not accept ACK of bytes we never sent
      
         - require admin privileges to receive packet traces via netlink
      
         - packet: move reference count in packet_sock to atomic_long_t
      
         - bpf:
            - fix incorrect branch offset comparison with cpu=v4
            - fix prog_array_map_poke_run map poke update
      
         - netfilter:
            - three fixes for crashes on bad admin commands
            - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
            - nf_tables: fix 'exist' matching on bigendian arches
      
         - leds: netdev: fix RTNL handling to prevent potential deadlock
      
         - eth: tg3: prevent races in error/reset handling
      
         - eth: r8169: fix rtl8125b PAUSE storm when suspended
      
         - eth: r8152: improve reset and surprise removal handling
      
         - eth: hns: fix race between changing features and sending
      
         - eth: nfp: fix sleep in atomic for bonding offload"
      
      * tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
        net/smc: fix missing byte order conversion in CLC handshake
        net: dsa: microchip: provide a list of valid protocols for xmit handler
        drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
        psample: Require 'CAP_NET_ADMIN' when joining "packets" group
        bpf: sockmap, updating the sg structure should also update curr
        net: tls, update curr on splice as well
        nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
        net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
        tcp: do not accept ACK of bytes we never sent
        selftests/bpf: Add test for early update in prog_array_map_poke_run
        bpf: Fix prog_array_map_poke_run map poke update
        netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
        netfilter: nf_tables: validate family when identifying table via handle
        netfilter: nf_tables: bail out on mismatching dynset and set expressions
        netfilter: nf_tables: fix 'exist' matching on bigendian arches
        netfilter: nft_set_pipapo: skip inactive elements during set walk
        netfilter: bpf: fix bad registration on nf_defrag
        leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
        octeontx2-af: Update Tx link register range
        ...
      5e3f5b81
  2. 07 Dec, 2023 34 commits
    • Merge tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 9ace34a8
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "Just one fix.
      
        Commit f5d39b02 ("freezer,sched: Rewrite core freezer logic")
        changed how freezing state is recorded which made cgroup_freezing()
        disagree with the actual state of the task while thawing triggering a
        warning. Fix it by updating cgroup_freezing()"
      
      * tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup_freezer: cgroup_freezing: Check if not frozen
      9ace34a8
    • Merge tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · e0348c1f
      Linus Torvalds authored
      Pull workqueue fix from Tejun Heo:
       "Just one patch to fix a bug which can crash the kernel if the
        housekeeping and wq_unbound_cpu cpumask configuration combination
        leaves the latter empty"
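One way to keep a CPU mask from ever becoming empty is to apply a restriction only when it still leaves at least one CPU. The sketch below models CPU masks as plain 64-bit bitmaps; `demo_cpumask` and `demo_restrict_unbound()` are hypothetical names for illustration, not the kernel's cpumask API, and the actual patch may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a hypothetical stand-in for the kernel's struct cpumask. */
typedef uint64_t demo_cpumask;

/*
 * Narrow "unbound" by "restriction", but ignore the restriction when
 * applying it would leave no CPU at all.
 */
static demo_cpumask demo_restrict_unbound(demo_cpumask unbound,
					  demo_cpumask restriction)
{
	/* The caller must start from a non-empty mask. */
	assert(unbound != 0);

	if ((unbound & restriction) == 0)
		return unbound;	/* would become empty: keep the old mask */

	return unbound & restriction;
}
```

With `unbound = 0xF` and `restriction = 0x3` the result is `0x3`; with `unbound = 0xC` the two masks do not intersect, so the restriction is skipped and `0xC` is returned unchanged.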
      
      * tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Make sure that wq_unbound_cpumask is never empty
      e0348c1f
    • Merge tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 4388ae22
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "An incremental fix for the fix introduced during the merge window for
        caching of the selector for windowed register ranges. We were
        incorrectly leaking an error code in the case where the last selector
        accessed was for some reason not cached"
      
      * tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: fix bogus error on regcache_sync success
      4388ae22
    • Merge tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · d5c0b601
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix dt-extract-compatibles for builds with in tree build directory
      
       - Drop Xinlei Lee <xinlei.lee@mediatek.com> bouncing email
      
       - Fix the of_reconfig_get_state_change() return value documentation
      
       - Add missing #power-domain-cells property to QCom MPM
      
       - Fix warnings in i.MX LCDIF and adi,adv7533
      
      * tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: display: adi,adv75xx: Document #sound-dai-cells
        dt-bindings: lcdif: Properly describe the i.MX23 interrupts
        dt-bindings: interrupt-controller: Allow #power-domain-cells
        of: dynamic: Fix of_reconfig_get_state_change() return value documentation
        dt-bindings: display: mediatek: dsi: remove Xinlei's mail
        dt: dt-extract-compatibles: Don't follow symlinks when walking tree
      d5c0b601
    • Merge tag 'platform-drivers-x86-v6.7-3' of... · 33d42bde
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Ilpo Järvinen:
      
       - Fix i8042 filter resource handling, input, and suspend issues in
         asus-wmi
      
       - Skip zero instance WMI blocks to avoid issues with some laptops
      
       - Differentiate dev/production keys in mlxbf-bootctl
      
       - Correct surface serdev related return value to avoid leaking errno
         into userspace
      
       - Error checking fixes
      
      * tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/mellanox: Check devm_hwmon_device_register_with_groups() return value
        platform/mellanox: Add null pointer checks for devm_kasprintf()
        mlxbf-bootctl: correctly identify secure boot with development keys
        platform/x86: wmi: Skip blocks with zero instances
        platform/surface: aggregator: fix recv_buf() return value
        platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend
        platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd
        platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter
        platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
      33d42bde
    • Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f35e4663
      Linus Torvalds authored
      Pull x86 int80 fixes from Dave Hansen:
       "Avoid VMM misuse of 'int 0x80' handling in TDX and SEV guests.
      
        It also has the very nice side effect of getting rid of a bunch of
        assembly entry code"
      
      * tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tdx: Allow 32-bit emulation by default
        x86/entry: Do not allow external 0x80 interrupts
        x86/entry: Convert INT 0x80 emulation to IDTENTRY
        x86/coco: Disable 32-bit emulation by default on TDX and SEV
      f35e4663
    • vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning · b0a930e8
      Stefano Garzarella authored
      After backporting commit 581512a6 ("vsock/virtio: MSG_ZEROCOPY
      flag support") in CentOS Stream 9, CI reported the following error:
      
          In file included from ./include/linux/kernel.h:17,
                           from ./include/linux/list.h:9,
                           from ./include/linux/preempt.h:11,
                           from ./include/linux/spinlock.h:56,
                           from net/vmw_vsock/virtio_transport_common.c:9:
          net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy’:
          ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
             20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                |                                   ^~
          ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
             26 |                 (__typecheck(x, y) && __no_side_effects(x, y))
                |                  ^~~~~~~~~~~
          ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
             36 |         __builtin_choose_expr(__safe_cmp(x, y), \
                |                               ^~~~~~~~~~
          ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
             45 | #define min(x, y)       __careful_cmp(x, y, <)
                |                         ^~~~~~~~~~~~~
          net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min’
             63 |                 int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
      
      We could solve it by using min_t(), but this operation seems entirely
      unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(),
      which performs almost the same check, returning at most MAX_SKB_FRAGS
      elements. So, let's eliminate this unnecessary comparison.
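The reasoning above can be illustrated with a small userspace sketch. It assumes a simplified page counter, `demo_npages()`, standing in for `iov_iter_npages()` (the real function walks iovec segments), a made-up limit `DEMO_MAX_FRAGS`, and a `demo_min_t()` macro that mimics the idea of `min_t()` by casting both operands to one type. None of these names come from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; the kernel's MAX_SKB_FRAGS is configurable. */
#define DEMO_MAX_FRAGS 17

/*
 * min_t()-style helper: both operands are cast to one explicit type, so
 * mismatched operand types cannot trip a type check. Unlike the kernel
 * macro, this simple version evaluates its arguments more than once.
 */
#define demo_min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Simplified page count for "bytes" of data, already capped at maxpages. */
static int demo_npages(size_t bytes, size_t page_size, int maxpages)
{
	size_t pages;

	assert(page_size > 0 && maxpages >= 0);

	/* Round up without risking overflow in bytes + page_size - 1. */
	pages = bytes / page_size + (bytes % page_size != 0);

	return pages > (size_t)maxpages ? maxpages : (int)pages;
}
```

Because `demo_npages()` never returns more than its `maxpages` argument, clamping its result against the same limit again, for example `demo_min_t(int, n, DEMO_MAX_FRAGS)`, always yields `n`. That is the redundancy the commit removes instead of switching to `min_t()`.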
      
      Fixes: 581512a6 ("vsock/virtio: MSG_ZEROCOPY flag support")
      Cc: avkrasnov@salutedevices.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b0a930e8
    • net/smc: fix missing byte order conversion in CLC handshake · c5a10397
      Wen Gu authored
      The byte order conversions of the ISM GID and DMB token are missing in
      the CLC accept and confirm handling. Fix it.
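As a general illustration of why such conversions matter, the sketch below serializes two 64-bit values into big-endian (network order) byte arrays so that the wire format is the same on little- and big-endian hosts. The `demo_wire_msg` layout and the helper names are hypothetical; this is not the real SMC CLC message format or the kernel's `cpu_to_be64()` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout for illustration; NOT the real CLC message. */
struct demo_wire_msg {
	uint8_t gid[8];		/* 64-bit identifier, big-endian on the wire */
	uint8_t token[8];	/* 64-bit token, big-endian on the wire */
};

/* Store a host-order 64-bit value as big-endian bytes. */
static void demo_put_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a big-endian byte sequence back into a host-order value. */
static uint64_t demo_get_be64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}

/* Fill the message, converting both fields explicitly. */
static void demo_fill_msg(struct demo_wire_msg *m, uint64_t gid, uint64_t token)
{
	assert(m != NULL);
	demo_put_be64(m->gid, gid);
	demo_put_be64(m->token, token);
}
```

Copying the host-order values straight into the message would happen to work between hosts of the same endianness but break between mixed ones, which is the class of bug a missing conversion introduces.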
      
      Fixes: 3d9725a6 ("net/smc: common routine for CLC accept and confirm")
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c5a10397
    • net: wangxun: fix changing mac failed when running · 87e839c8
      duanqiangwen authored
      In some bonding modes, the service needs to change the MAC address
      while the netif is running. Add the IFF_LIVE_ADDR_CHANGE priv_flag
      to Wangxun netdevs to support this.
      Signed-off-by: duanqiangwen <duanqiangwen@net-swift.com>
      Link: https://lore.kernel.org/r/20231206095044.17844-1-duanqiangwen@net-swift.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      87e839c8
    • net: dsa: microchip: provide a list of valid protocols for xmit handler · 1499b892
      Sean Nyekjaer authored
      Provide a list of valid protocols for which the driver will provide
      its deferred xmit handler.
      
      When using DSA_TAG_PROTO_KSZ8795 protocol, it does not provide a
      "connect" method, therefore ksz_connect() is not allocating ksz_tagger_data.
      
      This avoids the following null pointer dereference:
       ksz_connect_tag_protocol from dsa_register_switch+0x9ac/0xee0
       dsa_register_switch from ksz_switch_register+0x65c/0x828
       ksz_switch_register from ksz_spi_probe+0x11c/0x168
       ksz_spi_probe from spi_probe+0x84/0xa8
       spi_probe from really_probe+0xc8/0x2d8
      
      Fixes: ab32f56a ("net: dsa: microchip: ptp: add packet transmission timestamping")
      Signed-off-by: Sean Nyekjaer <sean@geanix.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20231206071655.1626479-1-sean@geanix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1499b892
    • Merge branch 'generic-netlink-multicast-fixes' · a041adee
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      Generic netlink multicast fixes
      
      Restrict two generic netlink multicast groups - in the "psample" and
      "NET_DM" families - to be root-only with the appropriate capabilities.
      See individual patches for more details.
      ====================
      
      Link: https://lore.kernel.org/r/20231206213102.1824398-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a041adee
    • drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group · e0378187
      Ido Schimmel authored
      The "NET_DM" generic netlink family notifies drop locations over the
      "events" multicast group. This is problematic since by default generic
      netlink allows non-root users to listen to these notifications.
      
      Fix by adding a new field to the generic netlink multicast group
      structure that when set prevents non-root users or root without the
      'CAP_SYS_ADMIN' capability (in the user namespace owning the network
      namespace) from joining the group. Set this field for the "events"
      group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
      nature of the information that is shared over this group.
      
      Note that the capability check in this case will always be performed
      against the initial user namespace since the family is not netns aware
      and only operates in the initial network namespace.
      
      A new field is added to the structure rather than using the "flags"
      field because the existing field uses uAPI flags and it is inappropriate
      to add a new uAPI flag for an internal kernel check. In net-next we can
      rework the "flags" field to use internal flags and fold the new field
      into it. But for now, in order to reduce the amount of changes, add a
      new field.
      
      Since the information can only be consumed by root, mark the control
      plane operations that start and stop the tracing as root-only using the
      'GENL_ADMIN_PERM' flag.
      
      Tested using [1].
      
      Before:
      
       # capsh -- -c ./dm_repo
       # capsh --drop=cap_sys_admin -- -c ./dm_repo
      
      After:
      
       # capsh -- -c ./dm_repo
       # capsh --drop=cap_sys_admin -- -c ./dm_repo
       Failed to join "events" multicast group
      
      [1]
       $ cat dm.c
       #include <stdio.h>
       #include <netlink/genl/ctrl.h>
       #include <netlink/genl/genl.h>
       #include <netlink/socket.h>
      
       int main(int argc, char **argv)
       {
       	struct nl_sock *sk;
       	int grp, err;
      
       	sk = nl_socket_alloc();
       	if (!sk) {
       		fprintf(stderr, "Failed to allocate socket\n");
       		return -1;
       	}
      
       	err = genl_connect(sk);
       	if (err) {
       		fprintf(stderr, "Failed to connect socket\n");
       		return err;
       	}
      
       	grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events");
       	if (grp < 0) {
       		fprintf(stderr,
       			"Failed to resolve \"events\" multicast group\n");
       		return grp;
       	}
      
       	err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
       	if (err) {
       		fprintf(stderr, "Failed to join \"events\" multicast group\n");
       		return err;
       	}
      
       	return 0;
       }
       $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c
      
      Fixes: 9a8afc8d ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
      Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e0378187
    • psample: Require 'CAP_NET_ADMIN' when joining "packets" group · 44ec98ea
      Ido Schimmel authored
      The "psample" generic netlink family notifies sampled packets over the
      "packets" multicast group. This is problematic since by default generic
      netlink allows non-root users to listen to these notifications.
      
      Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will
      prevent non-root users or root without the 'CAP_NET_ADMIN' capability
      (in the user namespace owning the network namespace) from joining the
      group.
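The gating described above can be modelled in a few lines. This is a userspace sketch only: `demo_mcgrp`, the `DEMO_CAP_*` bits and `demo_join()` are hypothetical stand-ins for the generic netlink multicast group structure and the kernel's capability checks, and returning `-EPERM` on refusal is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical capability bits; not the kernel's CAP_* numbering. */
enum demo_cap {
	DEMO_CAP_NET_ADMIN = 1u << 0,
	DEMO_CAP_SYS_ADMIN = 1u << 1,
};

/* Hypothetical multicast group descriptor. */
struct demo_mcgrp {
	const char *name;
	unsigned int required_caps;	/* 0 means anyone may join */
};

/* Allow the join only if the caller holds every required capability. */
static int demo_join(const struct demo_mcgrp *grp, unsigned int caller_caps)
{
	assert(grp != NULL);

	if ((caller_caps & grp->required_caps) != grp->required_caps)
		return -EPERM;

	return 0;
}
```

A group declared with `required_caps = DEMO_CAP_NET_ADMIN` then rejects a caller that dropped that capability, matching the "After" behaviour in the test output below, while a group with no requirement stays open to everyone.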
      
      Tested using [1].
      
      Before:
      
       # capsh -- -c ./psample_repo
       # capsh --drop=cap_net_admin -- -c ./psample_repo
      
      After:
      
       # capsh -- -c ./psample_repo
       # capsh --drop=cap_net_admin -- -c ./psample_repo
       Failed to join "packets" multicast group
      
      [1]
       $ cat psample.c
       #include <stdio.h>
       #include <netlink/genl/ctrl.h>
       #include <netlink/genl/genl.h>
       #include <netlink/socket.h>
      
       int join_grp(struct nl_sock *sk, const char *grp_name)
       {
       	int grp, err;
      
       	grp = genl_ctrl_resolve_grp(sk, "psample", grp_name);
       	if (grp < 0) {
       		fprintf(stderr, "Failed to resolve \"%s\" multicast group\n",
       			grp_name);
       		return grp;
       	}
      
       	err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
       	if (err) {
       		fprintf(stderr, "Failed to join \"%s\" multicast group\n",
       			grp_name);
       		return err;
       	}
      
       	return 0;
       }
      
       int main(int argc, char **argv)
       {
       	struct nl_sock *sk;
       	int err;
      
       	sk = nl_socket_alloc();
       	if (!sk) {
       		fprintf(stderr, "Failed to allocate socket\n");
       		return -1;
       	}
      
       	err = genl_connect(sk);
       	if (err) {
       		fprintf(stderr, "Failed to connect socket\n");
       		return err;
       	}
      
       	err = join_grp(sk, "config");
       	if (err)
       		return err;
      
       	err = join_grp(sk, "packets");
       	if (err)
       		return err;
      
       	return 0;
       }
       $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o psample_repo psample.c
      
      Fixes: 6ae0a628 ("net: Introduce psample, a new genetlink channel for packet sampling")
      Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      44ec98ea
    • Merge branch 'fixes-for-ktls' · 4a02609d
      Jakub Kicinski authored
      John Fastabend says:
      
      ====================
      Couple fixes for TLS and BPF interactions.
      ====================
      
      Link: https://lore.kernel.org/r/20231206232706.374377-1-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4a02609d
    • bpf: sockmap, updating the sg structure should also update curr · bb9aefde
      John Fastabend authored
      Curr pointer should be updated when the sg structure is shifted.
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20231206232706.374377-3-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bb9aefde
    • net: tls, update curr on splice as well · c5a59500
      John Fastabend authored
      The curr pointer must also be updated on the splice similar to how
      we do this for other copy types.
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reported-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c5a59500
    • x86/tdx: Allow 32-bit emulation by default · f4116bfc
      Kirill A. Shutemov authored
      32-bit emulation was disabled on TDX to prevent a possible attack by
      a VMM injecting an interrupt on vector 0x80.
      
      Now that int80_emulation() has a check for external interrupts the
      limitation can be lifted.
      
      To distinguish software interrupts from external ones, int80_emulation()
      checks the APIC ISR bit relevant to the 0x80 vector. For
      software interrupts, this bit will be 0.
      
      On TDX, the VAPIC state (including ISR) is protected and cannot be
      manipulated by the VMM. The ISR bit is set by the microcode flow during
      the handling of posted interrupts.
      
      [ dhansen: more changelog tweaks ]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
      f4116bfc
    • x86/entry: Do not allow external 0x80 interrupts · 55617fb9
      Thomas Gleixner authored
      The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
      kernel expects to receive a software interrupt as a result of the INT
      0x80 instruction. However, an external interrupt on the same vector
      also triggers the same codepath.
      
      An external interrupt on vector 0x80 will currently be interpreted as a
      32-bit system call and assumed to have come from user context.
      
      Panic on external interrupts on the vector.
      
      To distinguish software interrupts from external ones, the kernel checks
      the APIC ISR bit relevant to the 0x80 vector. For software interrupts,
      this bit will be 0.
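The bit lookup described above can be sketched in userspace. The local APIC exposes the in-service state for 256 vectors as eight 32-bit registers, so vector 0x80 maps to register index 4, bit 0. The sketch takes a plain array snapshot of those registers instead of reading real APIC state, and all `demo_*` names are hypothetical rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SYSCALL_VECTOR 0x80u

/* Which of the eight 32-bit ISR registers holds this vector's bit. */
static unsigned int demo_isr_reg_index(unsigned int vector)
{
	assert(vector < 256);
	return vector / 32;
}

/* Mask selecting this vector's bit inside its ISR register. */
static uint32_t demo_isr_bit_mask(unsigned int vector)
{
	assert(vector < 256);
	return UINT32_C(1) << (vector % 32);
}

/*
 * True if the vector is marked in-service in the given ISR snapshot,
 * i.e. the event was delivered as an interrupt through the APIC rather
 * than raised by a software INT instruction.
 */
static bool demo_vector_in_service(const uint32_t isr[8], unsigned int vector)
{
	return (isr[demo_isr_reg_index(vector)] & demo_isr_bit_mask(vector)) != 0;
}
```

For a software `INT 0x80` the snapshot has the bit clear and `demo_vector_in_service(isr, DEMO_SYSCALL_VECTOR)` is false; if the bit in register 4 is set, the entry was caused by an external interrupt and must not be treated as a system call.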
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
      55617fb9
    • x86/entry: Convert INT 0x80 emulation to IDTENTRY · be5341eb
      Thomas Gleixner authored
      There is no real reason to have a separate ASM entry point implementation
      for the legacy INT 0x80 syscall emulation on 64-bit.
      
      IDTENTRY provides all the functionality needed with the only difference
      that it does not:
      
        - save the syscall number (AX) into pt_regs::orig_ax
        - set pt_regs::ax to -ENOSYS
      
      Both can be done safely in the C code of an IDTENTRY before invoking any of
      the syscall related functions which depend on this convention.
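The two steps listed above are small enough to show in a userspace sketch. `demo_regs` is a hypothetical two-field cut-down of `pt_regs`, and `demo_int80_prepare()` is an illustrative name, not the function added by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct demo_regs {
	long ax;	/* syscall number on entry, return value on exit */
	long orig_ax;	/* preserved copy of the syscall number */
};

/*
 * Establish the convention the syscall code relies on: remember the
 * requested syscall number in orig_ax, and preload ax with -ENOSYS so
 * that value is what the caller sees if no handler overwrites it.
 */
static void demo_int80_prepare(struct demo_regs *regs)
{
	assert(regs != NULL);

	regs->orig_ax = regs->ax;
	regs->ax = -ENOSYS;
}
```

After the call, `orig_ax` holds the number the caller placed in AX and `ax` holds `-ENOSYS`, so later dispatch code can read the former and overwrite the latter with the real result.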
      
      Aside from reducing ASM code, this prepares for detecting and handling a
      local APIC injected vector 0x80.
      
      [ kirill.shutemov: More verbose comments ]
      Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
      be5341eb
    • x86/coco: Disable 32-bit emulation by default on TDX and SEV · b82a8dbd
      Kirill A. Shutemov authored
      The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
      kernel expects to receive a software interrupt as a result of the INT
      0x80 instruction. However, an external interrupt on the same vector
      triggers the same handler.
      
      The kernel interprets an external interrupt on vector 0x80 as a 32-bit
      system call that came from userspace.
      
      A VMM can inject external interrupts on any arbitrary vector at any
      time.  This remains true even for TDX and SEV guests where the VMM is
      untrusted.
      
      Put together, this allows an untrusted VMM to trigger int80 syscall
      handling at any given point. The content of the guest register file at
      that moment defines what syscall is triggered and its arguments. It
      opens the guest OS to manipulation from the VMM side.
      
      Disable 32-bit emulation by default for TDX and SEV. Users can override
      it with the ia32_emulation=y command line option.
      
      [ dhansen: reword the changelog ]
      Reported-by: Supraja Sridhara <supraja.sridhara@inf.ethz.ch>
      Reported-by: Benedict Schlüter <benedict.schlueter@inf.ethz.ch>
      Reported-by: Mark Kuhne <mark.kuhne@inf.ethz.ch>
      Reported-by: Andrin Bertschi <andrin.bertschi@inf.ethz.ch>
      Reported-by: Shweta Shinde <shweta.shinde@inf.ethz.ch>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+: 1da5c9bc x86: Introduce ia32_enabled()
      Cc: <stable@vger.kernel.org> # v6.0+
      b82a8dbd
    • Merge tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 4de75d3e
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Incorrect nf_defrag registration for bpf link infra, from D. Wythe.
      
      2) Skip inactive elements in pipapo set backend walk to avoid double
         deactivation, from Florian Westphal.
      
      3) Fix NFT_*_F_PRESENT check with big endian arch, also from Florian.
      
      4) Bail out if number of expressions in NFTA_DYNSET_EXPRESSIONS mismatch
         stateful expressions in set declaration.
      
      5) Honor family in table lookup by handle. Broken since 4.16.
      
      6) Use sk_callback_lock to protect access to sk->sk_socket in xt_owner.
         sock_orphan() might zap this pointer, from Phil Sutter.
      
      All of these fixes address broken stuff for several releases.
      
      * tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
        netfilter: nf_tables: validate family when identifying table via handle
        netfilter: nf_tables: bail out on mismatching dynset and set expressions
        netfilter: nf_tables: fix 'exist' matching on bigendian arches
        netfilter: nft_set_pipapo: skip inactive elements during set walk
        netfilter: bpf: fix bad registration on nf_defrag
      ====================
      
      Link: https://lore.kernel.org/r/20231206180357.959930-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4de75d3e
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c85e5594
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-12-06
      
      We've added 4 non-merge commits during the last 6 day(s) which contain
      a total of 7 files changed, 185 insertions(+), 55 deletions(-).
      
      The main changes are:
      
      1) Fix race found by syzkaller on prog_array_map_poke_run when
         a BPF program's kallsym symbols were still missing, from Jiri Olsa.
      
      2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA,
         from Yonghong Song.
      
      3) Fix xsk's poll handling to only set mask on bound xsk sockets,
         from Yewon Choi.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add test for early update in prog_array_map_poke_run
        bpf: Fix prog_array_map_poke_run map poke update
        xsk: Skip polling event check for unbound socket
        bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
      ====================
      
      Link: https://lore.kernel.org/r/20231206220528.12093-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c85e5594
    • Daniel Danzberger's avatar
      net: dsa: microchip: move ksz_chip_id enum to platform include · d16f1096
      Daniel Danzberger authored
      With the ksz_chip_id enums moved to the platform include file for ksz
      switches, platform code that instantiates a device can now use these to
      set ksz_platform_data::chip_id.
      Signed-off-by: Daniel Danzberger <dd@embedd.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d16f1096
    • Vladimir Oltean's avatar
      net: dsa: microchip: properly support platform_data probing · 3bc05faf
      Vladimir Oltean authored
      The ksz driver has bits and pieces of platform_data probing support, but
      it doesn't work.
      
      The conventional thing to do is to have an encapsulating structure for
      struct dsa_chip_data that gets put into dev->platform_data. This driver
      expects a struct ksz_platform_data, but that doesn't contain a struct
      dsa_chip_data as first element, which will obviously not work with
      dsa_switch_probe() -> dsa_switch_parse().
      
      Pointing dev->platform_data to a struct dsa_chip_data directly is in
      principle possible, but that doesn't work either. The driver has
      ksz_switch_detect() to read the device ID from hardware, followed by
      ksz_check_device_id() to compare it against a predetermined expected
      value. This protects against early errors in the SPI/I2C communication.
      With platform_data, the mechanism in ksz_check_device_id() doesn't work
      and even leads to NULL pointer dereferences, since of_device_get_match_data()
      doesn't work in that probe path.
      
      So obviously, the platform_data support is actually missing, and the
      existing handling of struct ksz_platform_data is bogus. Complete the
      support by adding a struct dsa_chip_data as first element, and fixing up
      ksz_check_device_id() to pick up the platform_data instead of the
      unavailable of_device_get_match_data().
      
      The early dev->chip_id assignment from ksz_switch_register() is also
      bogus, because ksz_switch_detect() sets it to an initial value. So
      remove it.
      
      Also, ksz_platform_data :: enabled_ports isn't used anywhere, delete it.
      
      Link: https://lore.kernel.org/netdev/20231204154315.3906267-1-dd@embedd.com/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Daniel Danzberger <dd@embedd.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3bc05faf
    • David S. Miller's avatar
      Merge branch 'dsa-microchip-rmii-reference' · d5449d59
      David S. Miller authored
      Ante Knezic says:
      
      ====================
      net: dsa: microchip: enable setting rmii reference
      
      KSZ88X3 devices can select between internal and external RMII reference clock.
      This patch series introduces new device tree property for setting reference
      clock to internal.
      
      ---
      V5:
        - move rmii-clk-internal to be a port device tree property.
      V4:
        - remove rmii_clk_internal from ksz_device, as it's not needed any more
        - move rmii clk config as well as ksz8795_cpu_interface_select to
          ksz8_config_cpu_port
      V3:
        - move ksz_cfg from global switch config to port config as suggested by Vladimir
          Oltean
        - reverse patch order as suggested by Vladimir Oltean
        - adapt dt schema as suggested by Conor Dooley
      V2:
        - don't rely on default register settings - enforce set/clear property as
          suggested by Andrew Lunn
        - enforce dt schema as suggested by Conor Dooley
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5449d59
    • Ante Knezic's avatar
      net: dsa: microchip: add property to select internal RMII reference clock · 9f19a4eb
      Ante Knezic authored
      Microchip KSZ8863/KSZ8873 have the ability to select between internal
      and external RMII reference clock. By default, reference clock
      needs to be provided via REFCLKI_3 pin. If required, the device can be
      set up to provide the RMII clock internally so that the REFCLKI_3 pin
      can be left unconnected.
      Add a new "microchip,rmii-clk-internal" property which will set
      RMII clock reference to internal. If property is not set, reference
      clock needs to be provided externally.
      
      While at it, move the ksz8795_cpu_interface_select() to
      ksz8_config_cpu_port() to get a cleaner call path for cpu port.
      Signed-off-by: Ante Knezic <ante.knezic@helmholz.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f19a4eb
    • Ante Knezic's avatar
      dt-bindings: net: microchip,ksz: document microchip,rmii-clk-internal · 8e3bfaab
      Ante Knezic authored
      Add documentation for selecting the RMII reference clock on KSZ88X3 devices.
      Signed-off-by: Ante Knezic <ante.knezic@helmholz.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e3bfaab
    • Hui Zhou's avatar
      nfp: flower: fix for take a mutex lock in soft irq context and rcu lock · 0ad722bd
      Hui Zhou authored
      The neighbour event callback calls nfp_tun_write_neigh(), which takes
      a mutex lock while running in soft irq context. Use a work queue to
      process the neighbour event instead.

      Also move the nfp_tun_write_neigh() call out of the
      rcu_read_lock()/rcu_read_unlock() section in
      nfp_tunnel_request_route_v4() and nfp_tunnel_request_route_v6().
      
      Fixes: abc21095 ("nfp: flower: tunnel neigh support bond offload")
      CC: stable@vger.kernel.org # 6.2+
      Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ad722bd
    • Linus Torvalds's avatar
      Merge tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 55b224d9
      Linus Torvalds authored
      Pull parisc fix from Helge Deller:
       "A single line patch for parisc which fixes the build in tinyconfig
        configurations:
      
         - Fix asm operand number out of range build error in bug table"
      
      * tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix asm operand number out of range build error in bug table
      55b224d9
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 803a809d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-12-05 (ice, i40e, iavf)
      
      This series contains updates to ice, i40e and iavf drivers.
      
      Michal fixes incorrect usage of VF MSIX value and index calculation for
      ice.
      
      Marcin restores disabling of Rx VLAN filtering which was inadvertently
      removed for ice.
      
      Ivan Vecera corrects improper messaging of MFS port for i40e.
      
      Jake fixes incorrect checking of coalesce values on iavf.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero
        i40e: Fix unexpected MFS warning message
        ice: Restore fix disabling RX VLAN filtering
        ice: change vfs.num_msix_per to vf->num_msix
      ====================
      
      Link: https://lore.kernel.org/r/20231205211918.2123019-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      803a809d
    • Johannes Berg's avatar
      net: rtnetlink: remove local list in __linkwatch_run_queue() · b8dbbbc5
      Johannes Berg authored
      Due to linkwatch_forget_dev() (and perhaps others?) checking for
      list_empty(&dev->link_watch_list), we must have all manipulations
      of even the local on-stack list 'wrk' here under spinlock, since
      even that list can be reached otherwise via dev->link_watch_list.
      
      This is already the case, but it is a bit counter-intuitive, since
      local lists are often used precisely to _not_ need locking for their
      local use.
      
      Remove the local list as it doesn't seem to serve any purpose.
      While at it, move a variable declaration into the loop using it.
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20231205170011.56576dcc1727.I698b72219d9f6ce789bd209b8f6dffd0ca32a8f2@changeid
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b8dbbbc5
    • Tobias Waldekranz's avatar
      net: dsa: mv88e6xxx: Restore USXGMII support for 6393X · 0c7ed1f9
      Tobias Waldekranz authored
      In 4a562127, USXGMII support was added for 6393X, but this was
      lost in the PCS conversion (the blamed commit), most likely because
      these efforts were more or less done in parallel.
      
      Restore this feature by porting Michal's patch to fit the new
      implementation.
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Michal Smulski <michal.smulski@ooma.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: e5b732a2 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0c7ed1f9
    • Eric Dumazet's avatar
      tcp: do not accept ACK of bytes we never sent · 3d501dd3
      Eric Dumazet authored
      This patch is based on a detailed report and ideas from Yepeng Pan
      and Christian Rossow.
      
      ACK seq validation is currently following RFC 5961 5.2 guidelines:
      
         The ACK value is considered acceptable only if
         it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
         SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
         above condition MUST be discarded and an ACK sent back.  It needs to
         be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
         duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
         acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
         ACK, drop the segment, and return".  The "ignored" above implies that
         the processing of the incoming data segment continues, which means
         the ACK value is treated as acceptable.  This mitigation makes the
         ACK check more stringent since any ACK < SND.UNA wouldn't be
         accepted, instead only ACKs that are in the range ((SND.UNA -
         MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
      
      This can be refined for new (and possibly spoofed) flows,
      by not accepting ACK for bytes that were never sent.
      
      This greatly improves TCP security at a little cost.
      
      I added a Fixes: tag to make sure this patch will reach stable trees,
      even if the 'blamed' patch was adhering to the RFC.
      
      tp->bytes_acked was added in linux-4.2
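
      The refinement described above can be modelled outside the kernel. The
      following is only a minimal sketch in Python, not the kernel code: it
      assumes the fix works by clamping the RFC 5961 window to the number of
      bytes the peer has acknowledged so far (the message mentions
      tp->bytes_acked but does not spell out the exact expression). The names
      before(), ack_acceptable() and the bytes_acked parameter are
      illustrative.

```python
SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers live in a 32-bit space


def before(a, b):
    """Return True if sequence number a precedes b, modulo 2**32."""
    # Same idea as a signed 32-bit difference being negative.
    return ((a - b) & SEQ_MASK) >= 0x80000000


def ack_acceptable(seg_ack, snd_una, snd_nxt, max_snd_wnd, bytes_acked=None):
    """Model of the RFC 5961 5.2 check, optionally with the refinement.

    With bytes_acked=None the plain RFC range is used:
        (SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT
    With bytes_acked given, the lower bound cannot reach further back
    than the bytes that were really sent and acknowledged (assumption).
    """
    window = max_snd_wnd
    if bytes_acked is not None:
        window = min(max_snd_wnd, bytes_acked)
    lower = (snd_una - window) & SEQ_MASK
    if before(seg_ack, lower):
        return False  # too old: would be answered with a challenge ACK
    if before(snd_nxt, seg_ack):
        return False  # acknowledges data not yet sent
    return True


# Values taken from the packetdrill scenario in this commit message:
# the server has sent no data, so SND.UNA == SND.NXT == 1.
old_rule = ack_acceptable(3221241997, 1, 1, 1073725440)
new_rule = ack_acceptable(3221241997, 1, 1, 1073725440, bytes_acked=0)
print(old_rule, new_rule)
```

      In this model the spoofed ACK passes the plain RFC range check but is
      rejected once the window is limited by the bytes actually acknowledged,
      while an ordinary ACK of 1 is still accepted.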
      
      Following packetdrill test (courtesy of Yepeng Pan) shows
      the issue at hand:
      
      0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1024) = 0
      
      // ---------------- Handshake ------------------- //
      
      // when window scale is set to 14 the window size can be extended to
      // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
      // with ack number in (Server_ISN+1-1073725440, Server_ISN+1),
      // even though this ack number acknowledges data never
      // sent by the server.
      
      +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
      +0 > S. 0:0(0) ack 1 <...>
      +0 < . 1:1(0) ack 1 win 65535
      +0 accept(3, ..., ...) = 4
      
      // For the established connection, we send an ACK packet,
      // the ack packet uses ack number 1 - 1073725300 + 2^32,
      // where 2^32 is used to wrap around.
      // Note: we used 1073725300 instead of 1073725440 to avoid possible
      // edge cases.
      // 1 - 1073725300 + 2^32 = 3221241997
      
      // Oops, old kernels happily accept this packet.
      +0 < . 1:1001(1000) ack 3221241997 win 65535
      
      // After the kernel fix the following will be replaced by a challenge ACK,
      // and prior malicious frame would be dropped.
      +0 > . 1:1(0) ack 1001
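
      The constants used in the test comments can be re-derived with a few
      lines of Python. This is just a sanity check of the arithmetic quoted
      above; the helper names max_scaled_window() and wrap_seq() are made up
      for illustration.

```python
SEQ_SPACE = 1 << 32  # size of the TCP sequence number space


def max_scaled_window(raw_window, wscale):
    """Largest window a peer can advertise with the given window scale."""
    return raw_window << wscale


def wrap_seq(value):
    """Map an arbitrary integer onto the 32-bit sequence number space."""
    return value % SEQ_SPACE


largest_window = max_scaled_window(65535, 14)
spoofed_ack = wrap_seq(1 - 1073725300)
print(largest_window)  # 1073725440
print(spoofed_ack)     # 3221241997
```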
      
      Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
      Reported-by: Christian Rossow <rossow@cispa.de>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3d501dd3
    • Eric Dumazet's avatar
      ipv6: add debug checks in fib6_info_release() · 5a08d006
      Eric Dumazet authored
      Some elusive syzbot reports are hinting to fib6_info_release(),
      with a potential dangling f6i->gc_link anchor.
      
      Add debug checks so that syzbot can catch the issue earlier eventually.
      
      BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
      BUG: KASAN: slab-use-after-free in hlist_del_init include/linux/list.h:1016 [inline]
      BUG: KASAN: slab-use-after-free in fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline]
      BUG: KASAN: slab-use-after-free in fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064
      Write of size 8 at addr ffff88802805a840 by task syz-executor.1/10057
      
      CPU: 1 PID: 10057 Comm: syz-executor.1 Not tainted 6.7.0-rc2-syzkaller-00029-g9b6de136 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:364 [inline]
      print_report+0xc4/0x620 mm/kasan/report.c:475
      kasan_report+0xda/0x110 mm/kasan/report.c:588
      __hlist_del include/linux/list.h:990 [inline]
      hlist_del_init include/linux/list.h:1016 [inline]
      fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline]
      fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064
      fib6_del_route net/ipv6/ip6_fib.c:1993 [inline]
      fib6_del+0xa7a/0x1750 net/ipv6/ip6_fib.c:2038
      __ip6_del_rt net/ipv6/route.c:3866 [inline]
      ip6_del_rt+0xf7/0x200 net/ipv6/route.c:3881
      ndisc_router_discovery+0x295b/0x3560 net/ipv6/ndisc.c:1372
      ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
      icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
      ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
      ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
      ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
      dst_input include/net/dst.h:461 [inline]
      ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
      __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
      __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
      netif_receive_skb_internal net/core/dev.c:5729 [inline]
      netif_receive_skb+0x133/0x700 net/core/dev.c:5788
      tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
      tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
      tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2020 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x64f/0xdf0 fs/read_write.c:584
      ksys_write+0x12f/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      RIP: 0033:0x7f38e387b82f
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 80 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c 81 02 00 48
      RSP: 002b:00007f38e45c9090 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f38e399bf80 RCX: 00007f38e387b82f
      RDX: 00000000000003b6 RSI: 0000000020000680 RDI: 00000000000000c8
      RBP: 00007f38e38c847a R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000003b6 R11: 0000000000000293 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f38e399bf80 R15: 00007f38e3abfa48
      </TASK>
      
      Allocated by task 10044:
      kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
      kasan_set_track+0x25/0x30 mm/kasan/common.c:52
      ____kasan_kmalloc mm/kasan/common.c:374 [inline]
      __kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
      kasan_kmalloc include/linux/kasan.h:198 [inline]
      __do_kmalloc_node mm/slab_common.c:1007 [inline]
      __kmalloc+0x59/0x90 mm/slab_common.c:1020
      kmalloc include/linux/slab.h:604 [inline]
      kzalloc include/linux/slab.h:721 [inline]
      fib6_info_alloc+0x40/0x160 net/ipv6/ip6_fib.c:155
      ip6_route_info_create+0x337/0x1e70 net/ipv6/route.c:3749
      ip6_route_add+0x26/0x150 net/ipv6/route.c:3843
      rt6_add_route_info+0x2e7/0x4b0 net/ipv6/route.c:4316
      rt6_route_rcv+0x76c/0xbf0 net/ipv6/route.c:985
      ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529
      ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
      icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
      ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
      ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
      ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
      dst_input include/net/dst.h:461 [inline]
      ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
      __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
      __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
      netif_receive_skb_internal net/core/dev.c:5729 [inline]
      netif_receive_skb+0x133/0x700 net/core/dev.c:5788
      tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
      tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
      tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2020 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x64f/0xdf0 fs/read_write.c:584
      ksys_write+0x12f/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Freed by task 5123:
      kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
      kasan_set_track+0x25/0x30 mm/kasan/common.c:52
      kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
      ____kasan_slab_free mm/kasan/common.c:236 [inline]
      ____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200
      kasan_slab_free include/linux/kasan.h:164 [inline]
      slab_free_hook mm/slub.c:1800 [inline]
      slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826
      slab_free mm/slub.c:3809 [inline]
      __kmem_cache_free+0xc0/0x180 mm/slub.c:3822
      rcu_do_batch kernel/rcu/tree.c:2158 [inline]
      rcu_core+0x819/0x1680 kernel/rcu/tree.c:2431
      __do_softirq+0x21a/0x8de kernel/softirq.c:553
      
      Last potentially related work creation:
      kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
      __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492
      __call_rcu_common.constprop.0+0x9a/0x7a0 kernel/rcu/tree.c:2681
      fib6_info_release include/net/ip6_fib.h:332 [inline]
      fib6_info_release include/net/ip6_fib.h:329 [inline]
      rt6_route_rcv+0xa4e/0xbf0 net/ipv6/route.c:997
      ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529
      ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856
      icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979
      ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
      ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
      ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586
      dst_input include/net/dst.h:461 [inline]
      ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      NF_HOOK include/linux/netfilter.h:308 [inline]
      ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
      __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
      __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
      netif_receive_skb_internal net/core/dev.c:5729 [inline]
      netif_receive_skb+0x133/0x700 net/core/dev.c:5788
      tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
      tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002
      tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2020 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x64f/0xdf0 fs/read_write.c:584
      ksys_write+0x12f/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Second to last potentially related work creation:
      kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
      __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492
      insert_work+0x38/0x230 kernel/workqueue.c:1647
      __queue_work+0xcdc/0x11f0 kernel/workqueue.c:1803
      call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
      expire_timers kernel/time/timer.c:1746 [inline]
      __run_timers+0x585/0xb20 kernel/time/timer.c:2022
      run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
      __do_softirq+0x21a/0x8de kernel/softirq.c:553
      
      The buggy address belongs to the object at ffff88802805a800
      which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 64 bytes inside of
      freed 512-byte region [ffff88802805a800, ffff88802805aa00)
      
      The buggy address belongs to the physical page:
      page:ffffea0000a01600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x28058
      head:ffffea0000a01600 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000840 ffff888013041c80 ffffea0001e02600 dead000000000002
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 18706, tgid 18699 (syz-executor.2), ts 999991973280, free_ts 996884464281
      set_page_owner include/linux/page_owner.h:31 [inline]
      post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1537
      prep_new_page mm/page_alloc.c:1544 [inline]
      get_page_from_freelist+0xa25/0x36d0 mm/page_alloc.c:3312
      __alloc_pages+0x22e/0x2420 mm/page_alloc.c:4568
      alloc_pages_mpol+0x258/0x5f0 mm/mempolicy.c:2133
      alloc_slab_page mm/slub.c:1870 [inline]
      allocate_slab mm/slub.c:2017 [inline]
      new_slab+0x283/0x3c0 mm/slub.c:2070
      ___slab_alloc+0x979/0x1500 mm/slub.c:3223
      __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3322
      __slab_alloc_node mm/slub.c:3375 [inline]
      slab_alloc_node mm/slub.c:3468 [inline]
      __kmem_cache_alloc_node+0x131/0x310 mm/slub.c:3517
      __do_kmalloc_node mm/slab_common.c:1006 [inline]
      __kmalloc+0x49/0x90 mm/slab_common.c:1020
      kmalloc include/linux/slab.h:604 [inline]
      kzalloc include/linux/slab.h:721 [inline]
      copy_splice_read+0x1ac/0x8f0 fs/splice.c:338
      vfs_splice_read fs/splice.c:992 [inline]
      vfs_splice_read+0x2ea/0x3b0 fs/splice.c:962
      splice_direct_to_actor+0x2a5/0xa30 fs/splice.c:1069
      do_splice_direct+0x1af/0x280 fs/splice.c:1194
      do_sendfile+0xb3e/0x1310 fs/read_write.c:1254
      __do_sys_sendfile64 fs/read_write.c:1322 [inline]
      __se_sys_sendfile64 fs/read_write.c:1308 [inline]
      __x64_sys_sendfile64+0x1d6/0x220 fs/read_write.c:1308
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
      page last free stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1137 [inline]
      free_unref_page_prepare+0x4fa/0xaa0 mm/page_alloc.c:2347
      free_unref_page_list+0xe6/0xb40 mm/page_alloc.c:2533
      release_pages+0x32a/0x14f0 mm/swap.c:1042
      tlb_batch_pages_flush+0x9a/0x190 mm/mmu_gather.c:98
      tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
      tlb_flush_mmu mm/mmu_gather.c:300 [inline]
      tlb_finish_mmu+0x14b/0x6f0 mm/mmu_gather.c:392
      exit_mmap+0x38b/0xa70 mm/mmap.c:3321
      __mmput+0x12a/0x4d0 kernel/fork.c:1349
      mmput+0x62/0x70 kernel/fork.c:1371
      exit_mm kernel/exit.c:567 [inline]
      do_exit+0x9ad/0x2ae0 kernel/exit.c:858
      do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
      get_signal+0x23be/0x2790 kernel/signal.c:2904
      arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
      exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
      exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
      irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309
      asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20231205173250.2982846-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5a08d006
  3. 06 Dec, 2023 3 commits