1. 11 Aug, 2022 6 commits
    • netfilter: nfnetlink: re-enable conntrack expectation events · 0b2f3212
      Florian Westphal authored
      To avoid allocation of the conntrack extension area when possible,
      the default behaviour was changed to only allocate the event extension
      if a userspace program is subscribed to a notification group.
      
      The problem is that while 'conntrack -E' enables the event allocation
      behind the scenes, 'conntrack -E expect' does not: no expectation events
      are delivered unless the user sets
      "net.netfilter.nf_conntrack_events" back to 1 (always on).

      Fix the autodetection to also consider the EXP type groups.
      
      We need to track the 6 event groups (3+3, new/update/destroy for events and
      for expectations each) independently, else we'd disable events again
      if an expectation group becomes empty while there is still an active
      event group.
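
As a rough userspace illustration of why the six groups need independent tracking, the sketch below keeps one listener count per group and treats events as enabled while any count is non-zero. The class, method, and group names are hypothetical and do not mirror the kernel's data structures.

```python
# Userspace model only; names are illustrative, not kernel identifiers.
CT_GROUPS = ("ct_new", "ct_update", "ct_destroy")
EXP_GROUPS = ("exp_new", "exp_update", "exp_destroy")


class ListenerTracker:
    def __init__(self):
        # One independent listener count per notification group.
        self.listeners = {group: 0 for group in CT_GROUPS + EXP_GROUPS}

    def bind(self, group):
        self.listeners[group] += 1

    def unbind(self, group):
        if self.listeners[group] > 0:
            self.listeners[group] -= 1

    def events_enabled(self):
        # Events stay on as long as at least one group still has a listener.
        return any(count > 0 for count in self.listeners.values())
```

With this layout, an expectation group becoming empty does not switch events off while a conntrack event group still has a subscriber.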
      
      Fixes: 2794cdb0 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
      Reported-by: Yi Chen <yiche@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_tables: fix scheduling-while-atomic splat · 2024439b
      Florian Westphal authored
      nf_tables_check_loops() can be called from an rhashtable list
      walk, so cond_resched() cannot be used here.
      
      Fixes: 81ea0106 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_irc: cap packet search space to 4k · 976bf59c
      Florian Westphal authored
      This helper uses a pseudo-linearization scheme with a 64k global buffer,
      but with the arrival of BIG TCP the IPv6 TCP stack can generate skbs
      that exceed this size.
      
      In practice, IRC commands are not expected to exceed 512 bytes, and
      this is an interactive protocol, so we should not see large
      packets.
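
A minimal userspace sketch of the capping idea: only the first 4k of the payload is searched, so anything beyond that is never inspected. The constant name, the function name, and the `\x01DCC ` marker (taken from the CTCP DCC convention) are assumptions for illustration, not the helper's actual code.

```python
MAX_SEARCH_SIZE = 4096  # assumed cap, following the "4k" in the commit title
DCC_MARKER = b"\x01DCC "  # assumed marker for a DCC request


def find_dcc_offsets(payload, limit=MAX_SEARCH_SIZE):
    """Return offsets of DCC markers found within the first `limit` bytes."""
    window = payload[:limit]  # never look past the cap
    offsets = []
    pos = window.find(DCC_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = window.find(DCC_MARKER, pos + 1)
    return offsets
```

A marker that sits beyond the cap is simply not found, which is acceptable if commands stay far below that size.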
      
      Given that most IRC connections nowadays use TLS, this helper could also
      be removed in the near future.
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_ftp: prefer skb_linearize · c783a29c
      Florian Westphal authored
      This helper uses a pseudo-linearization scheme with a 64k global buffer,
      but with the arrival of BIG TCP the IPv6 TCP stack can generate skbs
      that exceed this size.

      Use skb_linearize.  It should be possible to rewrite this to properly
      deal with segmented skbs (i.e., only do small chunk-wise accesses),
      but that would be a lot more intrusive than this change, because every
      helper function would need to take the sk_buff instead of a pointer to
      a raw data buffer.
      
      In practice, provided we're really looking at FTP control channel packets,
      there should never be a case where we deal with huge packets.
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_h323: cap packet size at 64k · f3e124c3
      Florian Westphal authored
      With BIG TCP, packets generated by the TCP stack may exceed 64kb.
      Cap datalen at 64kb.  The internal message format uses 16-bit fields,
      so no embedded message can exceed 64k in size.
      
      Multiple h323 messages in a single superpacket may now result
      in a message being treated as incomplete/truncated, but that's
      better than scribbling past h323_buffer.
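
The trade-off can be sketched in userspace, assuming TPKT-style framing (a 4-byte header: version 3, a reserved byte, and a 16-bit big-endian length that includes the header). The function name and the cap constant are hypothetical; this is a model of the behaviour described above, not the helper's code.

```python
import struct

MAX_DATALEN = 65535  # assumed cap, following the "64k" in the commit title


def split_tpkt(data, cap=MAX_DATALEN):
    """Return (payloads, incomplete) for the first `cap` bytes of `data`.

    `incomplete` is True when bytes remain that do not form a whole message.
    """
    data = data[:cap]  # never read past the cap
    payloads = []
    offset = 0
    while offset + 4 <= len(data):
        version, _reserved, length = struct.unpack_from("!BBH", data, offset)
        if version != 3 or length < 4:
            break  # not a usable header
        if offset + length > len(data):
            break  # message straddles the cap: treat as incomplete
        payloads.append(data[offset + 4:offset + length])
        offset += length
    return payloads, offset < len(data)
```

Two 40 kB messages in one superpacket show the effect: the first is parsed, the second is reported as incomplete because it crosses the cap.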
      
      An alternative suitable for the net tree would be a switch to
      skb_linearize().
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_ct_sane: remove pseudo skb linearization · a664375d
      Florian Westphal authored
      For historical reasons this code performs pseudo-linearization of skbs
      via skb_header_pointer and a global 64k buffer.

      With the arrival of BIG TCP, packets generated by the TCP stack can
      exceed 64kb.

      Rewrite this to only extract the needed header data.  This also allows
      getting rid of the locking.
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 10 Aug, 2022 21 commits
    • netfilter: nf_tables: possible module reference underflow in error path · c485c35f
      Pablo Neira Ayuso authored
      dst->ops is set when nft_expr_clone() fails, but the module refcount
      has not been bumped yet, therefore nft_expr_destroy() leads to a module
      reference underflow.
      
      Fixes: 8cfd9b0f ("netfilter: nftables: generalize set expressions support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag · 4963674c
      Pablo Neira Ayuso authored
      These are mutually exclusive; NFTA_SET_ELEM_KEY_END actually replaces
      the flag notation.
      
      Fixes: 7b225d0b ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access · 34002783
      Pablo Neira Ayuso authored
      The generation ID is bumped from the commit path while holding the
      mutex; however, netlink dump operations rely on RCU.
      
      This patch also adds missing cb->base_eq initialization in
      nf_tables_dump_set().
      
      Fixes: 38e029f1 ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • genetlink: correct uAPI defines · f329a0eb
      Jakub Kicinski authored
      Commit 50a896cf ("genetlink: properly support per-op policy dumping")
      seems to have copy'n'pasted things a little incorrectly.
      
      The #define CTRL_ATTR_MCAST_GRP_MAX should have stayed right
      after the previous enum. The new CTRL_ATTR_POLICY_* needs
      its own define for MAX and that max should not contain the
      superfluous _DUMP in the name.
      
      We probably can't do anything about CTRL_ATTR_POLICY_DUMP_MAX
      any more, since there's likely code which uses it. For consistency
      (*cough* codegen *cough*) let's add the correctly named define
      nonetheless.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Fix use-after-free after a failed reload · 6b4db2e5
      Ido Schimmel authored
      After a failed devlink reload, devlink parameters are still registered,
      which means user space can set and get their values. In the case of the
      mlxsw "acl_region_rehash_interval" parameter, these operations will
      trigger a use-after-free [1].
      
      Fix this by rejecting set and get operations while in the failed state.
      Return the "-EOPNOTSUPP" error code which does not abort the parameters
      dump, but instead causes it to skip over the problematic parameter.
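
A small userspace model of that behaviour, with hypothetical class and function names: get and set are rejected with -EOPNOTSUPP after a failed reload, and the dump loop skips a parameter that returns this code instead of aborting.

```python
import errno


class DevlinkModel:
    """Toy stand-in for a devlink instance; not the kernel structure."""

    def __init__(self, params):
        self.params = dict(params)
        self.reload_failed = False

    def param_get(self, name):
        if self.reload_failed:
            return -errno.EOPNOTSUPP, None  # reject instead of touching freed state
        return 0, self.params[name]

    def param_set(self, name, value):
        if self.reload_failed:
            return -errno.EOPNOTSUPP
        self.params[name] = value
        return 0


def dump_params(dl):
    """Return (err, dumped); -EOPNOTSUPP skips one parameter, other errors abort."""
    dumped = {}
    for name in dl.params:
        err, value = dl.param_get(name)
        if err == -errno.EOPNOTSUPP:
            continue
        if err:
            return err, dumped
        dumped[name] = value
    return 0, dumped
```

In this model a dump after a failed reload still succeeds, it just comes back without the affected parameters.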
      
      Another possible fix is to perform these checks in the mlxsw parameter
      callbacks, but other drivers might be affected by the same problem and I
      am not aware of scenarios where these stricter checks will cause a
      regression.
      
      [1]
      mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
      mlxsw_spectrum3 0000:00:10.0: Failed to create ports
      
      ==================================================================
      BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
      Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777
      
      CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
      Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: netns cleanup_net
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:313 [inline]
       print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
       kasan_report+0xb9/0xf0 mm/kasan/report.c:491
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
       mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
       mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
       mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
       devlink_param_get net/core/devlink.c:4981 [inline]
       devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
       devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
       devlink_ns_change_notify net/core/devlink.c:4417 [inline]
       devlink_ns_change_notify net/core/devlink.c:4396 [inline]
       devlink_reload+0x15f/0x700 net/core/devlink.c:4507
       devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
       ops_pre_exit_list net/core/net_namespace.c:152 [inline]
       cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
       process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
       worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
       kthread+0x30c/0x3d0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
      flags: 0x100000000000000(node=0|zone=1)
      raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                          ^
       ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      Fixes: 98bbf70c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:bonding:support balance-alb interface with vlan to bridge · d5410ac7
      Sun Shouxin authored
      In my test, a balance-alb bond (bond0) has two slaves, eth0 and eth1,
      and bond0.150 is created as a VLAN interface on top of bond0.
      After adding bond0.150 to a Linux bridge, I noticed that bond0,
      bond0.150 and the bridge were all assigned the same MAC address as eth0.
      Once bond0.150 receives a packet whose destination IP is the bridge's
      and whose destination MAC is eth1's, the Linux bridge does not find
      eth1's MAC in the FDB and does not handle the packet as expected.
      This patch fixes the issue. The topology is shown below:
      
      eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                            |
                         bond0.150(mac:eth0_mac)
                            |
                         bridge(ip:br_ip, mac:eth0_mac)--other port
      Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
      Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: Fix traffic counters/statistics · 91ec9bd5
      Clayton Yager authored
      OutOctetsProtected, OutOctetsEncrypted, InOctetsValidated, and
      InOctetsDecrypted were incrementing by the total number of octets in frames
      instead of by the number of octets of User Data in frames.
      
      The Controlled Port statistics ifOutOctets and ifInOctets were incrementing
      by the total number of octets instead of the number of octets of the MSDUs
      plus octets of the destination and source MAC addresses.
      
      The Controlled Port statistics ifInDiscards and ifInErrors were not
      incrementing each time the counters they aggregate were.
      
      The Controlled Port statistic ifInErrors was not included in the output of
      macsec_get_stats64 so the value was not present in ip commands output.
      
      The ReceiveSA counters InPktsNotValid, InPktsNotUsingSA, and InPktsUnusedSA
      were not incrementing.
      Signed-off-by: Clayton Yager <Clayton_Yager@selinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() · a3e7b29e
      Peilin Ye authored
      Imagine two non-blocking vsock_connect() requests on the same socket.
      The first request schedules @connect_work, and after it times out,
      vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps
      *socket* state as SS_CONNECTING.
      
      Later, the second request returns -EALREADY, meaning the socket "already
      has a pending connection in progress", even though the first request has
      already timed out.
      
      As suggested by Stefano, fix it by setting *socket* state back to
      SS_UNCONNECTED, so that the second request will return -ETIMEDOUT.
      Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Fix memory leak in vsock_connect() · 7e97cfed
      Peilin Ye authored
      An O_NONBLOCK vsock_connect() request may try to reschedule
      @connect_work.  Imagine the following sequence of vsock_connect()
      requests:
      
        1. The 1st, non-blocking request schedules @connect_work, which will
           expire after 200 jiffies.  Socket state is now SS_CONNECTING;
      
        2. Later, the 2nd, blocking request gets interrupted by a signal after
           a few jiffies while waiting for the connection to be established.
           Socket state is back to SS_UNCONNECTED, but @connect_work is still
           pending, and will expire after 100 jiffies.
      
        3. Now, the 3rd, non-blocking request tries to schedule @connect_work
           again.  Since @connect_work is already scheduled,
           schedule_delayed_work() silently returns.  sock_hold() is called
           twice, but sock_put() will only be called once in
           vsock_connect_timeout(), causing a memory leak reported by syzbot:
      
        BUG: memory leak
        unreferenced object 0xffff88810ea56a40 (size 1232):
          comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
          backtrace:
            [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930
            [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989
            [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734
            [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203
            [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468
            [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline]
            [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561
            [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline]
            [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline]
            [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568
            [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
            [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
            [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
        <...>
      
      Use mod_delayed_work() instead: if @connect_work is already scheduled,
      reschedule it, and undo sock_hold() to keep the reference count
      balanced.
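
The reference imbalance can be reproduced in a toy userspace model. The classes and functions below are hypothetical stand-ins: `schedule()` mimics a call that does nothing when the work is already pending, and `mod()` mimics one that always re-arms it and reports whether it was pending.

```python
class DelayedWork:
    def __init__(self):
        self.pending = False

    def schedule(self, delay):
        """Queue the work unless already pending; return True if it was queued."""
        if self.pending:
            return False
        self.pending = True
        return True

    def mod(self, delay):
        """(Re)arm the work; return True if it was already pending."""
        was_pending = self.pending
        self.pending = True
        return was_pending


class Sock:
    def __init__(self):
        self.refcnt = 1
        self.connect_work = DelayedWork()


def connect_buggy(sk, timeout):
    sk.connect_work.schedule(timeout)  # silently does nothing if pending
    sk.refcnt += 1                     # reference taken regardless


def connect_fixed(sk, timeout):
    sk.refcnt += 1                     # take a reference for the work
    if sk.connect_work.mod(timeout):
        sk.refcnt -= 1                 # already pending: undo the extra reference


def timeout_fires(sk):
    sk.connect_work.pending = False
    sk.refcnt -= 1                     # the single put in the timeout handler
```

Two connect attempts followed by one timeout leave the buggy variant with a count of 2 (one reference leaked), while the fixed variant returns to 1.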
      
      Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" · 6fd2c17f
      Jose Alonso authored
      This reverts commit 36a15e1c.
      
      Using FLAG_SEND_ZLP causes problems for other firmware/hardware
      versions that previously had no issues.

      FLAG_SEND_ZLP is not safe to use in this context.
      See:
      https://patchwork.ozlabs.org/project/netdev/patch/1270599787.8900.8.camel@Linuxdev4-laptop/#118378
      The original problem needs a different solution.
      
      Fixes: 36a15e1c ("net: usb: ax88179_178a needs FLAG_SEND_ZLP")
      Cc: stable@vger.kernel.org
      Reported-by: Ronald Wahl <ronald.wahl@raritan.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216327
      Link: https://bugs.archlinux.org/task/75491
      Signed-off-by: Jose Alonso <joalonsof@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlabel: fix typo in comment · 2cd0e8db
      Topi Miettinen authored
      'IPv4 and IPv4' should be 'IPv4 and IPv6'.
      Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-6.0-20220810' of... · e7f16495
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-6.0-20220810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      this is a pull request of 4 patches for net/master, with the
      whitespace issue fixed.
      
      Fedor Pchelkin contributes 2 fixes for the j1939 CAN protocol.
      
      A patch by me for the ems_usb driver fixes an unaligned access
      warning.
      
      Sebastian Würl's patch for the mcp251x driver fixes a race condition
      in the receive interrupt.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'do-not-use-rt_tos-for-ipv6-flowlabel' · 996237d9
      Jakub Kicinski authored
      Matthias May says:
      
      ====================
      Do not use RT_TOS for IPv6 flowlabel
      
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      ====================
      
      Link: https://lore.kernel.org/r/20220805191906.9323-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: do not use RT_TOS for IPv6 flowlabel · ab7e2e0d
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
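
To make the effect concrete, the following sketch models the masking in userspace. `IPTOS_TOS_MASK` is 0x1E in the kernel's uapi headers; the helper names are hypothetical, and flow info is built in host byte order here for readability, whereas the kernel works in network byte order.

```python
IPTOS_TOS_MASK = 0x1E  # RFC 1349 TOS bits


def rt_tos(tos):
    """Model of RT_TOS(): keep only the RFC 1349 TOS bits."""
    return tos & IPTOS_TOS_MASK


def make_flowinfo(tclass, flowlabel):
    """Model of combining traffic class and flow label (host byte order)."""
    return (tclass << 20) | flowlabel


tclass = 0xB8  # DSCP EF (46) with ECN bits clear
masked = make_flowinfo(rt_tos(tclass), 0x12345)   # 0x01812345: DSCP mangled
intact = make_flowinfo(tclass, 0x12345)           # 0x0B812345: traffic class kept
```

Masking an IPv6 traffic class this way keeps only bits 1 to 4, so the DSCP value that ends up in the header is not the one that was requested.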
      
      Fixes: 571912c6 ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlx5: do not use RT_TOS for IPv6 flowlabel · bcb0da7f
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: ce99f6b9 ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vxlan: do not use RT_TOS for IPv6 flowlabel · e488d4f5
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: do not use RT_TOS for IPv6 flowlabel · ca2bb695
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 3a56f86f ("geneve: handle ipv6 priority like ipv4 tos")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: fix TOS inheriting for ipv4 · b4ab94d6
      Matthias May authored
      The current code retrieves the TOS field after the lookup
      on the ipv4 routing table. The routing process currently
      only allows routing based on the original 3 TOS bits, and
      not on the full 6 DSCP bits.
      As a result the retrieved TOS is cut to the 3 bits.
      However for inheriting purposes the full 6 bits should be used.
      
      Extract the full 6 bits before the route lookup and use
      that instead of the cut off 3 TOS bits.
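
The ordering issue can be sketched as follows. This is a userspace model with hypothetical function names; the 0x1C routing mask (three TOS bits) and the 0xFC DSCP mask are assumptions chosen to match the description above rather than values taken from the patch.

```python
ROUTING_TOS_MASK = 0x1C  # assumed: the 3 TOS bits the route lookup keeps
DSCP_MASK = 0xFC         # upper six bits of the TOS byte


def route_lookup(flow):
    """Model of a lookup that narrows the flow's TOS to the routing bits."""
    flow["tos"] &= ROUTING_TOS_MASK
    return flow


def inherit_after_lookup(inner_tos):
    """Old order: read the TOS back from the flow after the lookup."""
    flow = route_lookup({"tos": inner_tos})
    return flow["tos"]


def inherit_before_lookup(inner_tos):
    """New order: save the full DSCP bits first, then do the lookup."""
    full_tos = inner_tos & DSCP_MASK
    route_lookup({"tos": inner_tos})
    return full_tos
```

For an inner TOS of 0xB8, the old order yields 0x18 while the new order keeps 0xB8.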
      
      Fixes: e305ac6c ("geneve: Add support to collect tunnel metadata.")
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/20220805190006.8078-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: atlantic: fix aq_vec index out of range error · 2ba5e47f
      Chia-Lin Kao (AceLan) authored
      The final update statement of the for loop indexes past the end of the
      array: the access to self->aq_vec[i] is not bounds-checked, which leads
      to the index out of range error.
      Also fix the same coding pattern in the other for loops.
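
The pattern can be modelled in userspace. The first function below spells out a C-style loop of the form `for (i = 0, v = vecs[0]; i < n; ++i, v = vecs[i])` step by step; the second indexes only inside the loop body. Both function names are hypothetical. Python raises IndexError where C would read past the end of the array.

```python
def walk_with_update_fetch(vecs):
    """Model of a loop whose update statement fetches vecs[i] after ++i."""
    visited = []
    i = 0
    v = vecs[0]
    while i < len(vecs):
        visited.append(v)
        i += 1
        v = vecs[i]  # runs before the bounds check, so it reads vecs[n] at the end
    return visited


def walk_checked(vecs):
    """Index only inside the body, after the bounds check has passed."""
    visited = []
    for i in range(len(vecs)):
        visited.append(vecs[i])
    return visited
```

The first variant fails on its last iteration for any non-empty list, which corresponds to the "index 8 is out of range" report for an 8-entry array.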
      
      [   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
      [   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
      [   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
      [   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
      [   97.937611] Workqueue: events_unbound async_run_entry_fn
      [   97.937616] Call Trace:
      [   97.937617]  <TASK>
      [   97.937619]  dump_stack_lvl+0x49/0x63
      [   97.937624]  dump_stack+0x10/0x16
      [   97.937626]  ubsan_epilogue+0x9/0x3f
      [   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
      [   97.937629]  ? __scm_send+0x348/0x440
      [   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
      [   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
      [   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
      [   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
      [   97.937653]  pci_pm_suspend+0x7e/0x1a0
      [   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
      [   97.937657]  dpm_run_callback+0x54/0x190
      [   97.937660]  __device_suspend+0x14c/0x4d0
      [   97.937661]  async_suspend+0x23/0x70
      [   97.937663]  async_run_entry_fn+0x33/0x120
      [   97.937664]  process_one_work+0x21f/0x3f0
      [   97.937666]  worker_thread+0x4a/0x3c0
      [   97.937668]  ? process_one_work+0x3f0/0x3f0
      [   97.937669]  kthread+0xf0/0x120
      [   97.937671]  ? kthread_complete_and_exit+0x20/0x20
      [   97.937672]  ret_from_fork+0x22/0x30
      [   97.937676]  </TASK>
      
      v2. fixed "warning: variable 'aq_vec' set but not used"
      
      v3. simplified a for loop
      
      Fixes: 97bde5c4 ("net: ethernet: aquantia: Support for NIC-specific code")
      Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Link: https://lore.kernel.org/r/20220808081845.42005-1-acelan.kao@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 690bf643
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Harden set element field checks to avoid out-of-bound memory access,
         this patch also fixes the type of issue described in 7e6bc1f6
         ("netfilter: nf_tables: stricter validation of element data") in a
         broader way.
      
      2) Patches to restrict the chain, set, and rule id lookup in the
         transaction to the corresponding top-level table, patches from
         Thadeu Lima de Souza Cascardo.
      
      3) Fix incorrect comment in ip6t_LOG.h
      
      4) nft_data_init() performs upfront validation of the expected data.
         struct nft_data_desc is used to describe the expected data to be
         received from userspace. The .size field represents the maximum size
         that can be stored, for bound checks. Then, .len is an input/output field
         which stores the expected length as input (this is optional, to restrict
         the checks), as output it stores the real length received from userspace
         (if it was not specified as input). This patch comes in response to
         7e6bc1f6 ("netfilter: nf_tables: stricter validation of element data")
         to address this type of issue in a more generic way by avoiding
         open-coded data validation. The next patch requires this as a dependency.
      
      5) Disallow jump to implicit chain from set element, this configuration
         is invalid. Only jumps to a chain via an immediate expression are
         supported at this stage.
      
      6) Fix possible null-pointer dereference in the error path of table updates,
         if memory allocation of the transaction fails. From Florian Westphal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: fix null deref due to zeroed list head
        netfilter: nf_tables: disallow jump to implicit chain from set element
        netfilter: nf_tables: upfront validation of data via nft_data_init()
        netfilter: ip6t_LOG: Fix a typo in a comment
        netfilter: nf_tables: do not allow RULE_ID to refer to another chain
        netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
        netfilter: nf_tables: do not allow SET_ID to refer to another table
        netfilter: nf_tables: validate variable length element extension
      ====================
      
      Link: https://lore.kernel.org/r/20220809220532.130240-1-pablo@netfilter.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      690bf643
  3. 09 Aug, 2022 13 commits
    • Sebastian Würl's avatar
      can: mcp251x: Fix race condition on receive interrupt · d80d60b0
      Sebastian Würl authored
      The mcp251x driver uses both receiving mailboxes of the CAN controller
      chips. For retrieving the CAN frames from the controller via SPI, it checks
      once per interrupt which mailboxes have been filled and will retrieve the
      messages accordingly.
      
      This introduces a race condition, as another CAN frame can enter mailbox 1
      while mailbox 0 is emptied. If another CAN frame then enters mailbox 0 before
      the interrupt handler is next called, mailbox 0 is emptied before
      mailbox 1, leading to out-of-order CAN frames in the network device.
      
      This is fixed by checking the interrupt flags once again after freeing
      mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
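
The re-check described above can be sketched in plain C. This is a minimal userspace model, not the mcp251x driver code: `struct fake_ctrl`, `read_mbox()`, `handle_rx()`, the flag values and the `late_frame` field are all hypothetical names invented for illustration. The `recheck` parameter switches between the old behaviour (flags sampled once per interrupt) and the fixed one (flags sampled again after mailbox 0 is freed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX0IF 0x1u /* hypothetical "mailbox 0 full" flag */
#define RX1IF 0x2u /* hypothetical "mailbox 1 full" flag */

/* Hypothetical stand-in for the CAN controller, not the real register map. */
struct fake_ctrl {
	unsigned int intf; /* pending receive flags */
	int mbox[2];       /* CAN id held by each mailbox */
	int late_frame;    /* id arriving in mailbox 1 while mailbox 0 is read, 0 = none */
};

/* Read one mailbox and clear its flag; models a frame racing into mailbox 1. */
static int read_mbox(struct fake_ctrl *c, int idx)
{
	int id = c->mbox[idx];

	c->intf &= ~(idx == 0 ? RX0IF : RX1IF);
	if (idx == 0 && c->late_frame) {
		c->mbox[1] = c->late_frame;
		c->intf |= RX1IF;
		c->late_frame = 0;
	}
	return id;
}

/* One interrupt: store received ids in out[] and return how many were read. */
size_t handle_rx(struct fake_ctrl *c, bool recheck, int *out)
{
	size_t n = 0;
	unsigned int intf;

	assert(c && out);
	intf = c->intf; /* flags sampled once on entry */

	if (intf & RX0IF) {
		out[n++] = read_mbox(c, 0);
		if (recheck)
			intf = c->intf; /* sample again after freeing mailbox 0 */
	}
	if (intf & RX1IF)
		out[n++] = read_mbox(c, 1);
	return n;
}
```

With `recheck` false, a frame that lands in mailbox 1 during the read of mailbox 0 stays there until the next interrupt, by which time a newer frame may already sit in mailbox 0 and be delivered first.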
      
      For reproducing the bug I created the following setup:
       - Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
       - Set up CAN to 1 MHz
       - Spam bursts of 5 CAN-messages with increasing CAN-ids
       - Continue sending the bursts while sleeping a second between the bursts
       - Check on the RPi whether the received messages have increasing CAN-ids
       - Without this patch, every burst of messages will contain a flipped pair
      
      v3: https://lore.kernel.org/all/20220804075914.67569-1-sebastian.wuerl@ororatech.com
      v2: https://lore.kernel.org/all/20220804064803.63157-1-sebastian.wuerl@ororatech.com
      v1: https://lore.kernel.org/all/20220803153300.58732-1-sebastian.wuerl@ororatech.com
      
      Fixes: bf66f373 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
      Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
      Link: https://lore.kernel.org/all/20220804081411.68567-1-sebastian.wuerl@ororatech.com
      [mkl: reduce scope of intf1, eflag1]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      d80d60b0
    • Florian Westphal's avatar
      plip: avoid rcu debug splat · bc3c8fe3
      Florian Westphal authored
      WARNING: suspicious RCU usage
      5.2.0-rc2-00605-g2638eb8b #1 Not tainted
      drivers/net/plip/plip.c:1110 suspicious rcu_dereference_check() usage!
      
      plip_open is called with RTNL held, switch to the correct helper.
      
      Fixes: 2638eb8b ("net: ipv4: provide __rcu annotation for ifa_list")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20220807115304.13257-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bc3c8fe3
    • Sandor Bodo-Merle's avatar
      net: bgmac: Fix a BUG triggered by wrong bytes_compl · 1b7680c6
      Sandor Bodo-Merle authored
      On one of our machines we got:
      
      kernel BUG at lib/dynamic_queue_limits.c:27!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
      CPU: 0 PID: 1166 Comm: irq/41-bgmac Tainted: G        W  O    4.14.275-rt132 #1
      Hardware name: BRCM XGS iProc
      task: ee3415c0 task.stack: ee32a000
      PC is at dql_completed+0x168/0x178
      LR is at bgmac_poll+0x18c/0x6d8
      pc : [<c03b9430>]    lr : [<c04b5a18>]    psr: 800a0313
      sp : ee32be14  ip : 000005ea  fp : 00000bd4
      r10: ee558500  r9 : c0116298  r8 : 00000002
      r7 : 00000000  r6 : ef128810  r5 : 01993267  r4 : 01993851
      r3 : ee558000  r2 : 000070e1  r1 : 00000bd4  r0 : ee52c180
      Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 12c5387d  Table: 8e88c04a  DAC: 00000051
      Process irq/41-bgmac (pid: 1166, stack limit = 0xee32a210)
      Stack: (0xee32be14 to 0xee32c000)
      be00:                                              ee558520 ee52c100 ef128810
      be20: 00000000 00000002 c0116298 c04b5a18 00000000 c0a0c8c4 c0951780 00000040
      be40: c0701780 ee558500 ee55d520 ef05b340 ef6f9780 ee558520 00000001 00000040
      be60: ffffe000 c0a56878 ef6fa040 c0952040 0000012c c0528744 ef6f97b0 fffcfb6a
      be80: c0a04104 2eda8000 c0a0c4ec c0a0d368 ee32bf44 c0153534 ee32be98 ee32be98
      bea0: ee32bea0 ee32bea0 ee32bea8 ee32bea8 00000000 c01462e4 ffffe000 ef6f22a8
      bec0: ffffe000 00000008 ee32bee4 c0147430 ffffe000 c094a2a8 00000003 ffffe000
      bee0: c0a54528 00208040 0000000c c0a0c8c4 c0a65980 c0124d3c 00000008 ee558520
      bf00: c094a23c c0a02080 00000000 c07a9910 ef136970 ef136970 ee30a440 ef136900
      bf20: ee30a440 00000001 ef136900 ee30a440 c016d990 00000000 c0108db0 c012500c
      bf40: ef136900 c016da14 ee30a464 ffffe000 00000001 c016dd14 00000000 c016db28
      bf60: ffffe000 ee21a080 ee30a400 00000000 ee32a000 ee30a440 c016dbfc ee25fd70
      bf80: ee21a09c c013edcc ee32a000 ee30a400 c013ec7c 00000000 00000000 00000000
      bfa0: 00000000 00000000 00000000 c0108470 00000000 00000000 00000000 00000000
      bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
      [<c03b9430>] (dql_completed) from [<c04b5a18>] (bgmac_poll+0x18c/0x6d8)
      [<c04b5a18>] (bgmac_poll) from [<c0528744>] (net_rx_action+0x1c4/0x494)
      [<c0528744>] (net_rx_action) from [<c0124d3c>] (do_current_softirqs+0x1ec/0x43c)
      [<c0124d3c>] (do_current_softirqs) from [<c012500c>] (__local_bh_enable+0x80/0x98)
      [<c012500c>] (__local_bh_enable) from [<c016da14>] (irq_forced_thread_fn+0x84/0x98)
      [<c016da14>] (irq_forced_thread_fn) from [<c016dd14>] (irq_thread+0x118/0x1c0)
      [<c016dd14>] (irq_thread) from [<c013edcc>] (kthread+0x150/0x158)
      [<c013edcc>] (kthread) from [<c0108470>] (ret_from_fork+0x14/0x24)
      Code: a83f15e0 0200001a 0630a0e1 c3ffffea (f201f0e7)
      
      The issue seems similar to commit 90b3b339 ("net: hisilicon: Fix a BUG
      trigered by wrong bytes_compl") and potentially introduced by commit
      b38c83dd ("bgmac: simplify tx ring index handling").
      
      If there is an RX interrupt between setting ring->end
      and netdev_sent_queue() we can hit the BUG_ON as bgmac_dma_tx_free()
      can miscalculate the queue size while called from bgmac_poll().
      
      The machine which triggered the BUG runs a v4.14 RT kernel - but the issue
      seems present in mainline too.
      
      Fixes: b38c83dd ("bgmac: simplify tx ring index handling")
      Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220808173939.193804-1-sbodomerle@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1b7680c6
    • Vladimir Oltean's avatar
      net: dsa: felix: suppress non-changes to the tagging protocol · 4c46bb49
      Vladimir Oltean authored
      The way in which dsa_tree_change_tag_proto() works is that when
      dsa_tree_notify() fails, it doesn't know whether the operation failed
      midway in a multi-switch tree, or it failed for a single-switch tree.
      So even though drivers need to fail cleanly in
      ds->ops->change_tag_protocol(), DSA will still call dsa_tree_notify()
      again, to restore the old tag protocol for potential switches in the
      tree where the change did succeed (before failing for others).
      
      This means for the felix driver that if we report an error in
      felix_change_tag_protocol(), we'll get another call where proto_ops ==
      old_proto_ops. If we proceed to act upon that, we may do unexpected
      things. For example, we will call dsa_tag_8021q_register() twice in a
      row, without any dsa_tag_8021q_unregister() in between. Then we will
      actually call dsa_tag_8021q_unregister() via old_proto_ops->teardown,
      which (if it manages to run at all, after walking through corrupted data
      structures) will leave the ports inoperable anyway.
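
The idea of suppressing a non-change can be sketched as follows. This is a simplified userspace model under stated assumptions, not the felix driver code: `struct proto_ops`, `struct sw`, `change_proto()` and the `demo_*` helpers are hypothetical, and the `registered` counter merely stands in for the register/unregister pairing mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-protocol callbacks, loosely modelled on the description. */
struct proto_ops {
	int (*setup)(void);
	void (*teardown)(void);
};

/* Hypothetical switch state: only the currently active protocol ops. */
struct sw {
	const struct proto_ops *cur;
};

/* Demo callbacks: count how many registrations are outstanding. */
static int registered;

static int demo_setup(void)
{
	registered++;
	return 0;
}

static void demo_teardown(void)
{
	registered--;
}

static const struct proto_ops demo_ops = { demo_setup, demo_teardown };

/* Switch protocols; a request for the already active ops is a no-op. */
int change_proto(struct sw *s, const struct proto_ops *new_ops)
{
	const struct proto_ops *old;
	int err;

	assert(s && new_ops);
	old = s->cur;

	if (new_ops == old)
		return 0; /* non-change: skip setup and teardown entirely */

	err = new_ops->setup();
	if (err)
		return err;

	if (old)
		old->teardown();

	s->cur = new_ops;
	return 0;
}
```

Without the early return, a second call with the same ops would run the setup callback twice and then tear down the state it had just set up.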
      
      The bug can be readily reproduced if we force an error while in
      tag_8021q mode; this crashes the kernel.
      
      echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      echo edsa > /sys/class/net/eno2/dsa/tagging # -EPROTONOSUPPORT
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
      Call trace:
       vcap_entry_get+0x24/0x124
       ocelot_vcap_filter_del+0x198/0x270
       felix_tag_8021q_vlan_del+0xd4/0x21c
       dsa_switch_tag_8021q_vlan_del+0x168/0x2cc
       dsa_switch_event+0x68/0x1170
       dsa_tree_notify+0x14/0x34
       dsa_port_tag_8021q_vlan_del+0x84/0x110
       dsa_tag_8021q_unregister+0x15c/0x1c0
       felix_tag_8021q_teardown+0x16c/0x180
       felix_change_tag_protocol+0x1bc/0x230
       dsa_switch_event+0x14c/0x1170
       dsa_tree_change_tag_proto+0x118/0x1c0
      
      Fixes: 7a29d220 ("net: dsa: felix: reimplement tagging protocol change with function pointers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220808125127.3344094-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4c46bb49
    • Jakub Kicinski's avatar
      Merge tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 7ba0fa7f
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.0
      
      First set of fixes for v6.0. Small one this time, fix a cfg80211
      warning seen with brcmfmac and remove an unnecessary inline keyword
      from wilc1000.
      
      * tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: wilc1000: fix spurious inline in wilc_handle_disconnect()
        wifi: cfg80211: Fix validating BSS pointers in __cfg80211_connect_result
      ====================
      
      Link: https://lore.kernel.org/r/20220809164756.B1DAEC433D6@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7ba0fa7f
    • Florian Westphal's avatar
      netfilter: nf_tables: fix null deref due to zeroed list head · 58007785
      Florian Westphal authored
      In nf_tables_updtable, if nf_tables_table_enable returns an error,
      nft_trans_destroy is called to free the transaction object.
      
      nft_trans_destroy() calls list_del(), but the transaction was never
      placed on a list -- the list head is all zeroes, this results in
      a null dereference:
      
      BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
      Call Trace:
       nft_trans_destroy+0x26/0x59
       nf_tables_newtable+0x4bc/0x9bc
       [..]
      
      It's sane to assume that nft_trans_destroy() can be called
      on the transaction object returned by nft_trans_alloc(), so
      make sure the list head is initialised.
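
Why a zeroed list head faults, and why initialising it at allocation time is enough, can be shown with a small userspace model. This is only a sketch: `struct trans`, `trans_alloc()` and `trans_destroy()` are hypothetical stand-ins for the nf_tables transaction helpers, and the two list functions are simplified re-implementations of the kernel's circular doubly linked list.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised, empty head points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unlink an entry; dereferences e->next and e->prev, so both must be valid. */
static void list_del_entry(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/* Hypothetical transaction object. */
struct trans {
	struct list_head list;
	int msg_type;
};

struct trans *trans_alloc(int msg_type)
{
	struct trans *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;

	/*
	 * calloc() leaves list.next and list.prev as NULL. Without this
	 * call, list_del_entry() would dereference NULL if the object is
	 * destroyed before it was ever added to a list.
	 */
	init_list_head(&t->list);
	t->msg_type = msg_type;
	return t;
}

void trans_destroy(struct trans *t)
{
	assert(t);
	list_del_entry(&t->list); /* harmless on a self-pointing head */
	free(t);
}
```

With the head initialised, deleting an entry that was never queued only rewrites its own two pointers, so the error path can call the destroy helper unconditionally.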
      
      Fixes: 55dd6f93 ("netfilter: nf_tables: use new transaction infrastructure to handle table")
      Reported-by: mingi cho <mgcho.minic@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      58007785
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: disallow jump to implicit chain from set element · f323ef3a
      Pablo Neira Ayuso authored
      Extend struct nft_data_desc to add a flag field that specifies
      nft_data_init() is being called for set element data.
      
      Use it to disallow jump to implicit chain from set element, only jump
      to chain via immediate expression is allowed.
      
      Fixes: d0e2c7de ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f323ef3a
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: upfront validation of data via nft_data_init() · 341b6941
      Pablo Neira Ayuso authored
      Instead of parsing the data and then validating that type and length are
      correct, pass a description of the expected data so it can be validated
      upfront, before parsing, to bail out earlier.
      
      This patch adds a new .size field to specify the maximum size of the
      data area. The .len field is optional and it is used as an input/output
      field, it provides the specific length of the expected data in the input
      path. If the .len field is not specified, then the length obtained from the
      netlink attribute is stored. This is required by cmp, bitwise, range and
      immediate, which provide no netlink attribute that describes the data
      length. The immediate expression uses the destination register type to
      infer the expected data type.
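
The .size/.len contract described above might look roughly like this in isolation. It is a sketch under assumptions, not the nf_tables implementation: `struct data_desc` and `data_init()` are hypothetical names, the attribute is modelled as a plain byte buffer, and the error codes are chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical description of the data the caller expects. */
struct data_desc {
	unsigned int size; /* capacity of the destination, for bound checks */
	unsigned int len;  /* in: expected length (0 = any); out: actual length */
};

/* Validate the attribute against the description, then copy it. */
int data_init(void *dst, struct data_desc *desc,
	      const void *attr, unsigned int attr_len)
{
	assert(dst && desc && attr);

	if (attr_len == 0 || attr_len > desc->size)
		return -ERANGE; /* empty, or would not fit in the destination */

	if (desc->len) {
		if (attr_len != desc->len)
			return -EINVAL; /* caller asked for an exact length */
	} else {
		desc->len = attr_len; /* report the received length back */
	}

	memcpy(dst, attr, attr_len);
	return 0;
}
```

A caller that knows the exact length sets `len` before the call; a caller that does not leaves it at zero and reads the received length afterwards.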
      
      Relying on opencoded validation of the expected data might lead to
      subtle bugs as described in 7e6bc1f6 ("netfilter: nf_tables:
      stricter validation of element data").
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      341b6941
    • Christophe JAILLET's avatar
      netfilter: ip6t_LOG: Fix a typo in a comment · 13494168
      Christophe JAILLET authored
      s/_IPT_LOG_H/_IP6T_LOG_H/
      
      While at it add some surrounding space to ease reading.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      13494168
    • Thadeu Lima de Souza Cascardo's avatar
      netfilter: nf_tables: do not allow RULE_ID to refer to another chain · 36d5b291
      Thadeu Lima de Souza Cascardo authored
      When doing lookups for rules on the same batch by using its ID, a rule from
      a different chain can be used. If a rule is added to a chain but tries to
      be positioned next to a rule from a different chain, it will be linked to
      chain2, but the use counter on chain1 would be the one to be incremented.
      
      When looking for rules by ID, use the chain that was used for the lookup by
      name. The chain used in the context copied to the transaction needs to
      match that same chain. That way, struct nft_rule does not need to get
      enlarged with another member.
      
      Fixes: 1a94e38d ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
      Fixes: 75dd48e2 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      36d5b291
    • Thadeu Lima de Souza Cascardo's avatar
      netfilter: nf_tables: do not allow CHAIN_ID to refer to another table · 95f466d2
      Thadeu Lima de Souza Cascardo authored
      When doing lookups for chains on the same batch by using its ID, a chain
      from a different table can be used. If a rule is added to a table but
      refers to a chain in a different table, it will be linked to the chain in
      table2, but would have expressions referring to objects in table1.
      
      Then, when table1 is removed, the rule will not be removed as it's linked to
      a chain in table2. When expressions in the rule are processed or removed,
      that will lead to a use-after-free.
      
      When looking for chains by ID, use the table that was used for the lookup
      by name, and only return chains belonging to that same table.
      
      Fixes: 837830a4 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      95f466d2
    • Thadeu Lima de Souza Cascardo's avatar
      netfilter: nf_tables: do not allow SET_ID to refer to another table · 470ee20e
      Thadeu Lima de Souza Cascardo authored
      When doing lookups for sets on the same batch by using its ID, a set from a
      different table can be used.
      
      Then, when the table is removed, a reference to the set may be kept after
      the set is freed, leading to a potential use-after-free.
      
      When looking for sets by ID, use the table that was used for the lookup by
      name, and only return sets belonging to that same table.
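
A table-scoped lookup by ID can be sketched as below. The types and the function are hypothetical simplifications, not the nf_tables data structures: pending sets are modelled as a flat array rather than a transaction list, and `set_lookup_byid()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table and pending (not yet committed) set. */
struct table {
	const char *name;
};

struct pending_set {
	const struct table *table; /* table the set was created in */
	unsigned int id;           /* batch-local ID chosen by userspace */
};

/* Return the pending set with this ID, but only if it belongs to `table`. */
const struct pending_set *set_lookup_byid(const struct pending_set *sets,
					  size_t n,
					  const struct table *table,
					  unsigned int id)
{
	assert(sets && table);

	for (size_t i = 0; i < n; i++) {
		/* Matching on the ID alone would allow cross-table references. */
		if (sets[i].id == id && sets[i].table == table)
			return &sets[i];
	}
	return NULL;
}
```

An ID that exists only in another table is reported as not found, so a rule can no longer hold a reference to a set whose lifetime is tied to a different table.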
      
      This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.
      
      Reported-by: Team Orca of Sea Security (@seasecresponse)
      Fixes: 958bee14 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      470ee20e
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: validate variable length element extension · 34aae2c2
      Pablo Neira Ayuso authored
      Update template to validate variable length extensions. This patch adds
      a new .ext_len[id] field to the template to store the expected extension
      length. This is used to sanity check the initialization of the variable
      length extension.
      
      Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
      update, there are two reasons why this might fail, either because of
      ENOMEM or insufficient room in the extension field (EINVAL).
      
      Kernels up until 7e6bc1f6 ("netfilter: nf_tables: stricter
      validation of element data") allowed copying more data to the extension
      than was allocated. This ext_len field allows validating that the
      destination has the correct size as an additional check.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      34aae2c2