1. 18 May, 2022 3 commits
    • netfilter: nf_tables: disable expression reduction infra · 9e539c5b
      Pablo Neira Ayuso authored
      Either userspace or kernelspace needs to pre-fetch keys unconditionally
      before comparisons for this to work. Otherwise, register tracking data
      is misleading and it might result in reducing expressions whose
      registers have not been written yet.
      
      The first expression is also guaranteed to always be evaluated. However,
      certain expressions break before writing data to registers and before
      comparing the data, leaving the register in an undetermined state.
      
      This patch disables this infrastructure for now.
      
      Fixes: b2d30654 ("netfilter: nf_tables: do not reduce read-only expressions")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      9e539c5b
    • netfilter: flowtable: move dst_check to packet path · 2738d9d9
      Ritaro Takenaka authored
      Fixes sporadic IPv6 packet loss when flow offloading is enabled.
      
      IPv6 route GC and flowtable GC are not synchronized.
      When dst_cache becomes stale and a packet passes through the flow before
      the flowtable GC tears it down, the packet can be dropped.
      So it is necessary to check the dst on every packet in the packet path.
      
      Fixes: 227e1e4d ("netfilter: nf_flowtable: skip device lookup from interface index")
      Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2738d9d9
    • netfilter: flowtable: fix TCP flow teardown · e5eaac2b
      Pablo Neira Ayuso authored
      This patch addresses three possible problems:
      
      1. ct gc may race to undo the timeout adjustment of the packet path, leaving
         the conntrack entry in place with the internal offload timeout (one day).
      
      2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
         timeout is reached before the flow offload del.
      
      3. tcp ct is always set to ESTABLISHED with a very long timeout
         in flow offload teardown/delete even though the state might be already
         CLOSED. Also as a remark we cannot assume that the FIN or RST packet
         is hitting flow table teardown as the packet might get bumped to the
         slow path in nftables.
      
      This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
      conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
      state transition.
      
      Moreover, return the connection's ownership to conntrack upon teardown
      by clearing the offload flag and fixing the established timeout value.
      The flow table GC thread will asynchronously free the flow table and
      hardware offload entries.
      
      Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows,
      which is also misleading since the flow is back on the classic conntrack
      path.
      
      If nf_ct_delete() removes the entry from the conntrack table, then it
      calls nf_ct_put() which decrements the refcnt. This is not a problem
      because the flowtable holds a reference to the conntrack object from
      flow_offload_alloc() path which is released via flow_offload_free().
      
      This patch also updates nft_flow_offload to skip packets in SYN_RECV
      state. Since we might miss or bump packets to the slow path, we do not
      know what will happen there while we are still in SYN_RECV. This patch
      postpones offload up to the next packet, which also aligns with the
      existing behaviour in tc-ct.
      
      flow_offload_teardown() no longer resets the existing tcp state from
      flow_offload_fixup_tcp() to ESTABLISHED, since packets bumped to the
      slow path might have already updated the state to CLOSE/FIN.
      
      Joint work with Oz and Sven.
      
      Fixes: 1e5b2471 ("netfilter: nf_flow_table: teardown flow timeout race")
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      e5eaac2b
  2. 16 May, 2022 4 commits
  3. 12 May, 2022 13 commits
    • Merge tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f3f19f93
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, and bluetooth.
      
        No outstanding fires.
      
        Current release - regressions:
      
         - eth: atlantic: always deep reset on pm op, fix null-deref
      
        Current release - new code bugs:
      
         - rds: use maybe_get_net() when acquiring refcount on TCP sockets
           [refinement of a previous fix]
      
         - eth: ocelot: mark traps with a bool instead of guessing type based
           on list membership
      
        Previous releases - regressions:
      
         - net: fix skipping features in for_each_netdev_feature()
      
         - phy: micrel: fix null-derefs on suspend/resume and probe
      
         - bcmgenet: check for Wake-on-LAN interrupt probe deferral
      
        Previous releases - always broken:
      
         - ipv4: drop dst in multicast routing path, prevent leaks
      
         - ping: fix address binding wrt vrf
      
         - net: fix wrong network header length when BPF protocol translation
           is used on skbs with a fraglist
      
         - bluetooth: fix the creation of hdev->name
      
         - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
      
         - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing
      
         - wifi: ath11k: reduce the wait time of 11d scan and hw scan while
           adding an interface
      
         - mac80211: fix rx reordering with non explicit / psmp ack policy
      
         - mac80211: reset MBSSID parameters upon connection
      
         - nl80211: fix races in nl80211_set_tx_bitrate_mask()
      
         - tls: fix context leak on tls_device_down
      
         - sched: act_pedit: really ensure the skb is writable
      
         - batman-adv: don't skb_split skbuffs with frag_list
      
         - eth: ocelot: fix various issues with TC actions (null-deref; bad
           stats; ineffective drops; ineffective filter removal)"
      
      * tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
        tls: Fix context leak on tls_device_down
        net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
        net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
        net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
        mlxsw: Avoid warning during ip6gre device removal
        net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
        net: ethernet: mediatek: ppe: fix wrong size passed to memset()
        Bluetooth: Fix the creation of hdev->name
        i40e: i40e_main: fix a missing check on list iterator
        net/sched: act_pedit: really ensure the skb is writable
        s390/lcs: fix variable dereferenced before check
        s390/ctcm: fix potential memory leak
        s390/ctcm: fix variable dereferenced before check
        net: atlantic: verify hw_head_ lies within TX buffer ring
        net: atlantic: add check for MAX_SKB_FRAGS
        net: atlantic: reduce scope of is_rsc_complete
        net: atlantic: fix "frag[0] not initialized"
        net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
        net: phy: micrel: Fix incorrect variable type in micrel
        decnet: Use container_of() for struct dn_neigh casts
        ...
      f3f19f93
    • Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 0ac824f3
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
        were hot-added"
      
      * 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
      0ac824f3
    • Merge tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · c37dba6a
      Linus Torvalds authored
      Pull fs fixes from Jan Kara:
       "Three fixes that I'd still like to get to 5.18:
      
         - add a missing sanity check in the fanotify FAN_RENAME feature
           (added in 5.17, let's fix it before it gets wider usage in
           userspace)
      
         - udf fix for recently introduced filesystem corruption issue
      
         - writeback fix for a race in inode list handling that can lead to
           delayed writeback and possible dirty throttling stalls"
      
      * tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Avoid using stale lengthOfImpUse
        writeback: Avoid skipping inode writeback
        fanotify: do not allow setting dirent events in mask of non-dir
      c37dba6a
    • tls: Fix context leak on tls_device_down · 3740651b
      Maxim Mikityanskiy authored
      The commit cited below claims to fix a use-after-free condition after
      tls_device_down. Apparently, the description wasn't fully accurate. The
      context stayed alive, but ctx->netdev became NULL, and the offload was
      torn down without a proper fallback, so a bug was present, but a
      different kind of bug.
      
      Due to misunderstanding of the issue, the original patch dropped the
      refcount_dec_and_test line for the context to avoid the alleged
      premature deallocation. That line has to be restored, because it matches
      the refcount_inc_not_zero from the same function, otherwise the contexts
      that survived tls_device_down are leaked.
      
      This patch fixes the described issue by restoring refcount_dec_and_test.
      After this change, there is no leak anymore, and the fallback to
      software kTLS still works.
      
      Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3740651b
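The pairing rule described in this commit message can be sketched in user space. The block below is a minimal model, not the kernel code: `struct ref`, `ref_inc_not_zero()`, `ref_dec_and_test()` and `device_down()` are hypothetical stand-ins built on C11 atomics, assumed only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's refcount_t. */
struct ref {
    atomic_uint count;
};

/* Take a reference only if the object is still alive (count != 0). */
bool ref_inc_not_zero(struct ref *r)
{
    unsigned int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
bool ref_dec_and_test(struct ref *r)
{
    return atomic_fetch_sub(&r->count, 1) == 1;
}

/*
 * Model of a "device down" handler: it pins the context while working on
 * it and must unpin it before returning. Returns true if this call dropped
 * the last reference.
 */
bool device_down(struct ref *ctx)
{
    assert(ctx != NULL);

    if (!ref_inc_not_zero(ctx))
        return false;               /* context is already going away */

    /* ... switch the connection to a software fallback here ... */

    return ref_dec_and_test(ctx);   /* the matching put; omitting it leaks */
}
```

With the matching put in place, a context that starts with one reference still has exactly one after `device_down()` returns, so its owner can free it later.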
    • net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() · 1fa89ffb
      Taehee Yoo authored
      In the NIC ->probe() callback, the ->mtd_probe() callback is called.
      If the NIC has 2 ports, ->probe() is called twice, and so is
      ->mtd_probe().
      ->mtd_probe(), which is efx_ef10_mtd_probe(), allocates and initializes
      the mtd partition data.
      But the mtd partition data for sfc is shared, so the data allocated by
      the last call to efx_ef10_mtd_probe() is not used and must be freed.
      However, efx_ef10_mtd_probe() does not free this unused mtd partition
      data.
      
      kmemleak reports:
      unreferenced object 0xffff88811ddb0000 (size 63168):
        comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120
          [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250
          [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc]
          [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc]
          [<ffffffffa414192c>] local_pci_probe+0xdc/0x170
          [<ffffffffa4145df5>] pci_device_probe+0x235/0x680
          [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0
          [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460
          [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120
          [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320
          [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190
          [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560
          [<ffffffffa4440b1e>] driver_register+0x18e/0x310
          [<ffffffffc02e2055>] 0xffffffffc02e2055
          [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450
          [<ffffffffa33ca574>] do_init_module+0x1b4/0x700
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Fixes: 8127d661 ("sfc: Add support for Solarflare SFC9100 family")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1fa89ffb
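A rough user-space model of the leak pattern described above, assuming a per-port probe that allocates partition data which only the first caller ends up registering. `struct nic_shared`, `mtd_probe()` and the `live_allocs` counter are hypothetical names for this sketch; the actual driver change may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical state shared by both ports of one NIC. */
struct nic_shared {
    void *mtd_parts;    /* registered partition data, NULL until first probe */
};

int live_allocs;        /* crude leak counter for this example */

/* Model of a per-port mtd probe; it is called once per port. */
int mtd_probe(struct nic_shared *shared, size_t n_parts)
{
    void *parts;

    assert(shared != NULL && n_parts > 0);

    parts = calloc(n_parts, 64);
    if (!parts)
        return -ENOMEM;
    live_allocs++;

    if (shared->mtd_parts) {
        /* Another port already registered the shared data: free ours. */
        free(parts);
        live_allocs--;
        return 0;
    }

    shared->mtd_parts = parts;
    return 0;
}
```

Probing twice against the same `struct nic_shared` leaves one live allocation; without the `free()` in the second branch there would be two, and the second would be unreachable.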
    • net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending · f3c46e41
      Guangguan Wang authored
      Non-blocking sendmsg returns -EAGAIN when a signal is pending and no
      send space is left, while non-blocking recvmsg returns -EINTR when a
      signal is pending and no data has been received. This can be confusing,
      as TCP returns -EAGAIN in the conditions described above. Align the
      behavior of smc with TCP.
      
      Fixes: 846e344e ("net/smc: add receive timeout check")
      Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3c46e41
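The aligned behaviour can be pictured as an ordering of two checks. This is a simplified model under stated assumptions: `rx_no_data_result()` is a hypothetical helper, and the blocking case is reduced to -EINTR even though the kernel may choose a different restart code there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The two outcomes must be distinguishable for the distinction to matter. */
static_assert(EAGAIN != EINTR, "EAGAIN and EINTR must differ");

/*
 * Hypothetical model of the "no data available" branch of a receive call.
 * Returns 0 when the caller should keep waiting, or a negative errno.
 */
int rx_no_data_result(bool nonblocking, bool signal_pending)
{
    /*
     * Testing the non-blocking case first gives the TCP-like result:
     * a pending signal no longer turns -EAGAIN into -EINTR.
     */
    if (nonblocking)
        return -EAGAIN;
    if (signal_pending)
        return -EINTR;
    return 0;
}
```

A non-blocking caller can then treat -EAGAIN uniformly for both send and receive, regardless of pending signals.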
    • net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() · b7be130c
      Florian Fainelli authored
      After commit 2d1f90f9 ("net: dsa/bcm_sf2: fix incorrect usage of
      state->link") the interface suspend path would call our mac_link_down()
      callback, which would forcibly set the link down, thus preventing
      Wake-on-LAN packets from reaching our management port.
      
      Fix this by looking at whether the port is enabled for Wake-on-LAN and
      not clearing the link status in that case to let packets go through.
      
      Fixes: 2d1f90f9 ("net: dsa/bcm_sf2: fix incorrect usage of state->link")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b7be130c
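The guard described in this commit message amounts to a per-port mask test before the link is forced down. The sketch below is a user-space model only: `struct sw_state`, its field names and `port_link_down()` are assumptions made for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 8  /* hypothetical port count */

struct sw_state {
    uint32_t wol_ports_mask;    /* bit n set: port n is armed for Wake-on-LAN */
    bool link_up[MAX_PORTS];
};

/* Model of a link-down callback that leaves WoL-enabled ports alone. */
void port_link_down(struct sw_state *sw, unsigned int port)
{
    assert(port < MAX_PORTS);

    if (sw->wol_ports_mask & (UINT32_C(1) << port))
        return;                 /* keep the link so wake-up packets get through */

    sw->link_up[port] = false;
}
```

Ports that are not armed for Wake-on-LAN behave as before; only armed ports keep their link state across the callback.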
    • mlxsw: Avoid warning during ip6gre device removal · 810c2f0a
      Amit Cohen authored
      IPv6 addresses which are used for tunnels are stored in a hash table
      with reference counting. When a new GRE tunnel is configured, the driver
      is notified and configures it in hardware.
      
      Currently, any change in the tunnel is not applied in the driver. It
      means that if the remote address is changed, the driver is not aware of
      this change and the first address will be used.
      
      This behavior results in a warning [1] in scenarios such as the
      following:
      
       # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit
       # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit
       # ip link delete gre1
      
      The change of the address is not applied in the driver. Currently, the
      driver uses the remote address which is stored in the 'parms' of the
      overlay device. When the tunnel is removed, the new IPv6 address is
      used and the driver tries to release it. But as the driver is not aware
      of the change, this address was never configured, and it warns about
      releasing a non-existent IPv6 address.
      
      Fix it by using the IPv6 address which is cached in the IPIP entry. This
      address is the last one that the driver used, so even in cases such as
      the above, the first address will be released without any warning.
      
      [1]:
      
      WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
      ...
      CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84
      Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
      RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
      ...
      Call Trace:
       <TASK>
       mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum]
       mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum]
       mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum]
       raw_notifier_call_chain+0x3c/0x50
       call_netdevice_notifiers_info+0x2f/0x80
       unregister_netdevice_many+0x311/0x6d0
       rtnl_dellink+0x136/0x360
       rtnetlink_rcv_msg+0x12f/0x380
       netlink_rcv_skb+0x49/0xf0
       netlink_unicast+0x233/0x340
       netlink_sendmsg+0x202/0x440
       ____sys_sendmsg+0x1f3/0x220
       ___sys_sendmsg+0x70/0xb0
       __sys_sendmsg+0x54/0xa0
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: e846efe2 ("mlxsw: spectrum: Add hash table for IPv6 address mapping")
      Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      810c2f0a
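The difference between releasing the current parameter and releasing the cached address can be shown with a toy refcounted table. Everything here is hypothetical: a 32-bit key stands in for an IPv6 address, and `addr_get()`, `addr_put()` and `struct tunnel` are illustrative names, not the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_SLOTS 8

/* Toy refcounted address table; a 32-bit key stands in for an IPv6 address. */
struct addr_ref {
    uint32_t addr;
    unsigned int refs;
};
struct addr_ref addr_table[ADDR_SLOTS];

int addr_get(uint32_t addr)
{
    struct addr_ref *free_slot = NULL;

    for (size_t i = 0; i < ADDR_SLOTS; i++) {
        if (addr_table[i].refs && addr_table[i].addr == addr) {
            addr_table[i].refs++;
            return 0;
        }
        if (!addr_table[i].refs && !free_slot)
            free_slot = &addr_table[i];
    }
    if (!free_slot)
        return -1;
    free_slot->addr = addr;
    free_slot->refs = 1;
    return 0;
}

/* Returns -1 for an address that was never taken (the warning case). */
int addr_put(uint32_t addr)
{
    for (size_t i = 0; i < ADDR_SLOTS; i++) {
        if (addr_table[i].refs && addr_table[i].addr == addr) {
            addr_table[i].refs--;
            return 0;
        }
    }
    return -1;
}

struct tunnel {
    uint32_t remote;        /* current parameter, may change after creation */
    uint32_t cached_remote; /* address actually taken at configuration time */
};

int tunnel_create(struct tunnel *t, uint32_t remote)
{
    assert(t != NULL);
    t->remote = remote;
    t->cached_remote = remote;
    return addr_get(remote);
}

/* Buggy removal: releases whatever the parameters say now. */
int tunnel_remove_buggy(const struct tunnel *t)
{
    return addr_put(t->remote);
}

/* Fixed removal: releases the address that was cached when it was taken. */
int tunnel_remove_fixed(const struct tunnel *t)
{
    return addr_put(t->cached_remote);
}
```

If `remote` is changed after `tunnel_create()`, the buggy removal asks the table to release an address it never held, while the fixed removal releases the one that was taken.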
    • net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral · 6b77c066
      Florian Fainelli authored
      The interrupt controller supplying the Wake-on-LAN interrupt line may be
      modular on some platforms (irq-bcm7038-l1.c) and might be probed at a
      later time than the GENET driver. We need to specifically check for
      -EPROBE_DEFER and propagate that error to ensure that we eventually
      fetch the interrupt descriptor.
      
      Fixes: 9deb48b5 ("bcmgenet: add WOL IRQ check")
      Fixes: 5b1f0e62 ("net: bcmgenet: Avoid touching non-existent interrupt")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com>
      Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6b77c066
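The deferral check can be modelled outside the kernel as a small decision function. This is a sketch, not the driver code: `setup_wol_irq()` is a hypothetical helper, and `EPROBE_DEFER` is defined locally because it is a kernel-internal error code that user-space headers do not provide.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal errno value; defined here only for this example. */
#define EPROBE_DEFER 517

/*
 * Hypothetical model of the probe step. 'lookup' is the result of asking
 * for the optional Wake-on-LAN interrupt: a positive IRQ number, or a
 * negative errno. Returns 0 to continue probing, or -EPROBE_DEFER so the
 * whole probe is retried later.
 */
int setup_wol_irq(int lookup, int *wol_irq)
{
    assert(wol_irq != NULL);

    if (lookup == -EPROBE_DEFER)
        return -EPROBE_DEFER;   /* provider not ready: propagate the error */

    /* Any other failure means "no WoL interrupt"; the IRQ is optional. */
    *wol_irq = lookup > 0 ? lookup : -1;
    return 0;
}
```

The key point is that deferral is treated differently from "interrupt absent": the former aborts the probe so it can be retried, the latter lets it continue without Wake-on-LAN.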
    • net: ethernet: mediatek: ppe: fix wrong size passed to memset() · 00832b1d
      Yang Yingliang authored
      'foe_table' is a pointer, so the real size of struct mtk_foe_entry
      should be passed to memset().
      
      Fixes: ba37b7ca ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Felix Fietkau <nbd@nbd.name>
      Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      00832b1d
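The class of bug named here, taking `sizeof` of a pointer instead of the pointed-to type, is easy to reproduce in isolation. In the sketch below `struct foe_entry` is a hypothetical stand-in whose layout is assumed for illustration; it is not the real `struct mtk_foe_entry`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mtk_foe_entry; the real layout differs. */
struct foe_entry {
    unsigned int ib1;
    unsigned int data[19];
};

/* Buggy length: sizeof(table) is the size of the pointer itself. */
size_t table_bytes_buggy(const struct foe_entry *table, size_t entries)
{
    return entries * sizeof(table);
}

/* Correct length: sizeof(*table) is the size of one entry. */
size_t table_bytes_fixed(const struct foe_entry *table, size_t entries)
{
    return entries * sizeof(*table);
}

/* Zero the whole table using the element size. */
void clear_table(struct foe_entry *table, size_t entries)
{
    assert(table != NULL);
    memset(table, 0, table_bytes_fixed(table, entries));
}
```

With the buggy length, only the first few bytes per entry count toward the total, so the tail of the table is left uninitialized.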
    • Merge tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · a48ab883
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix the creation of hdev->name when index is greater than 9999
      
      * tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: Fix the creation of hdev->name
      ====================
      
      Link: https://lore.kernel.org/r/20220512002901.823647-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a48ab883
    • Merge tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 8bf6008c
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v5.18
      
      Second set of fixes for v5.18 and hopefully the last one. We have a
      new iwlwifi maintainer, a fix to rfkill ioctl interface and important
      fixes to both stack and two drivers.
      
      * tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
        nl80211: fix locking in nl80211_set_tx_bitrate_mask()
        mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
        mac80211_hwsim: fix RCU protected chanctx access
        mailmap: update Kalle Valo's email
        mac80211: Reset MBSSID parameters upon connection
        cfg80211: retrieve S1G operating channel number
        nl80211: validate S1G channel width
        mac80211: fix rx reordering with non explicit / psmp ack policy
        ath11k: reduce the wait time of 11d scan and hw scan while add interface
        MAINTAINERS: update iwlwifi driver maintainer
        iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
      ====================
      
      Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8bf6008c
    • Bluetooth: Fix the creation of hdev->name · 103a2f32
      Itay Iellin authored
      Set a size limit of 8 bytes of the written buffer to "hdev->name"
      including the terminating null byte, as the size of "hdev->name" is 8
      bytes. If an id value which is greater than 9999 is allocated,
      then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)"
      function call would lead to a truncation of the id value in decimal
      notation.
      
      Set an explicit maximum id parameter in the id allocation function call.
      The id allocation function defines the maximum allocated id value as the
      maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined
      as 10000.
      Signed-off-by: Itay Iellin <ieitayie@gmail.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      103a2f32
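The arithmetic behind the 8-byte limit and the 10000 bound can be checked with plain `snprintf()`. This is a user-space sketch: `format_hci_name()` and `hci_id_is_valid()` are hypothetical helpers, and `HCI_NAME_LEN` is an assumed name for the buffer size stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HCI_NAME_LEN 8      /* size of hdev->name as stated above */
#define HCI_MAX_ID   10000  /* exclusive upper bound for allocated ids */

/*
 * Format "hci<id>" into an 8-byte buffer.
 * Returns true if the whole name fit, false if snprintf() truncated it.
 */
bool format_hci_name(char name[HCI_NAME_LEN], int id)
{
    int needed;

    assert(id >= 0);
    /* snprintf() returns the length it wanted to write, excluding the NUL. */
    needed = snprintf(name, HCI_NAME_LEN, "hci%d", id);
    return needed < HCI_NAME_LEN;
}

/* Ids below HCI_MAX_ID need at most 4 digits, so "hciNNNN" plus NUL fits. */
bool hci_id_is_valid(int id)
{
    return id >= 0 && id < HCI_MAX_ID;
}
```

Id 9999 yields the 7-character name "hci9999" plus the terminating null byte, which exactly fills the buffer; id 10000 would need 9 bytes and is cut to "hci1000", which is why the allocator is capped below it.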
  4. 11 May, 2022 12 commits
  5. 10 May, 2022 8 commits