1. 02 Oct, 2021 1 commit
  2. 28 Sep, 2021 3 commits
    • netfilter: nf_tables: reverse order in rule replacement expansion · 2c964c55
      Pablo Neira Ayuso authored
      Deactivate the old rule first, then append the new rule, so that the rule
      replacement notification via netlink first reports the deletion of the
      old rule with handle X, then the addition of the new rule (reusing
      handle X of the replaced old rule).
      
      Note that the abort path releases the transaction that has been created
      by nft_delrule() on error.
      
      Fixes: ca089878 ("netfilter: nf_tables: deactivate expressions in rule replecement routine")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2c964c55
    • netfilter: nf_tables: add position handle in event notification · e189ae16
      Pablo Neira Ayuso authored
      Add the position handle so that the rule location can be identified from
      netlink events. Otherwise, userspace cannot incrementally update a
      userspace cache through monitoring events.

      Skip the handle dump if the rule has been either inserted (at the beginning
      of the ruleset) or appended (at the end of the ruleset); the
      NLM_F_APPEND netlink flag is sufficient in these two cases.
      
      Handle NLM_F_REPLACE as NLM_F_APPEND since the rule replacement
      expansion appends it after the specified rule handle.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      e189ae16
    • netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 · 339031ba
      Florian Westphal authored
      This is a revert of
      7b1957b0 ("netfilter: nf_defrag_ipv4: use net_generic infra")
      and a partial revert of
      8b0adbe3 ("netfilter: nf_defrag_ipv6: use net_generic infra").
      
      If conntrack is built in and the kernel is booted with:
      nf_conntrack.enable_hooks=1

      ... the kernel will fail to boot due to a NULL deref in
      nf_defrag_ipv4_enable(): it is called before the ipv4 defrag initcall has
      run, so net_generic() returns NULL.
      
      To resolve this, move the user refcount back to struct net so calls
      to those functions are possible even before their initcalls have run.
      
      Fixes: 7b1957b0 ("netfilter: nf_defrag_ipv4: use net_generic infra")
      Fixes: 8b0adbe3 ("netfilter: nf_defrag_ipv6: use net_generic infra")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      339031ba
  3. 27 Sep, 2021 13 commits
    • net: phy: enhance GPY115 loopback disable function · 3b1b6e82
      Xu Liang authored
      The GPY115 needs a PHY reset when it comes out of loopback mode if the
      firmware version number (lower 8 bits) is equal to or below 0x76.
      
      Fixes: 7d901a1e ("net: phy: add Maxlinear GPY115/21x/24x driver")
      Signed-off-by: Xu Liang <lxu@maxlinear.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b1b6e82
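
The firmware version condition in the GPY115 commit above can be sketched as a small predicate. This is only an illustration under stated assumptions: the helper name and the idea of passing the raw 16-bit firmware version word are hypothetical, and only the 0x76 threshold and the "lower 8 bits" rule come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the commit message: a version whose lower 8 bits are
 * equal to or below 0x76 needs a PHY reset after leaving loopback. */
#define GPY115_FW_RESET_MAX 0x76u

/* Hypothetical helper (not the driver's code): fw_ver is assumed to be
 * the raw firmware version word read from the PHY. */
static bool gpy115_needs_reset_after_loopback(uint16_t fw_ver)
{
	return (fw_ver & 0xffu) <= GPY115_FW_RESET_MAX;
}
```

Masking before comparing matters here: the upper bits of the word are ignored, so 0x8776 and 0x0076 are treated alike.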
    • Merge tag 'mac80211-for-net-2021-09-27' of... · ca48aa4a
      David S. Miller authored
      Merge tag 'mac80211-for-net-2021-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Some fixes:
       * potential use-after-free in CCMP/GCMP RX processing
       * potential use-after-free in TX A-MSDU processing
       * revert to low data rates for no-ack as the commit
         broke other things
       * limit VHT MCS/NSS in radiotap injection
       * drop frames with invalid addresses in IBSS mode
       * check rhashtable_init() return value in mesh
       * fix potentially unaligned access in mesh
       * fix late beacon hrtimer handling in hwsim (syzbot)
       * fix documentation for PTK0 rekeying
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca48aa4a
    • Merge branch 'mv88e6xxx-mtu-fixes' · 3ebaaad4
      David S. Miller authored
      Andrew Lunn says:
      
      ====================
      mv88e6xxx: MTU fixes
      
      These three patches fix MTU issues reported by 曹煜.
      
      There are two different ways of configuring the MTU in the hardware.
      The 6161 family is using the wrong method. Some of the Marvell switches
      enforce the MTU when the port is used for CPU/DSA; others don't.
      Because of the extra header, the MTU needs to be increased by this
      overhead.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ebaaad4
    • dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports · b9c587fe
      Andrew Lunn authored
      Some members of the Marvell Ethernet switch family impose MTU restrictions
      on ports used for connecting to the CPU or to another switch for DSA. If
      the MTU is set too low, tagged frames will be discarded. Ensure the
      worst-case tagger overhead is included when setting the MTU for DSA and
      CPU ports.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported-by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9c587fe
    • dsa: mv88e6xxx: Fix MTU definition · b92ce2f5
      Andrew Lunn authored
      The MTU passed to the DSA driver is the payload size, typically 1500.
      However, the switch uses the frame size when applying restrictions.
      Adjust the MTU with the size of the Ethernet header and the frame
      checksum. The VLAN header also needs to be included when the frame
      size is per port, but not when it is global.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported-by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b92ce2f5
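
The adjustment described in the "Fix MTU definition" commit above can be written out as arithmetic. This is a minimal sketch, not the driver's code: the function name and the `per_port_limit` flag are hypothetical, and the constants are redefined locally with the usual Ethernet sizes (14-byte header, 4-byte frame check sequence, 4-byte 802.1Q tag).

```c
#include <stdbool.h>

#define ETH_HLEN     14 /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN   4 /* frame check sequence */
#define VLAN_HLEN     4 /* 802.1Q tag */

/* Hypothetical helper: convert a payload MTU into the frame size the
 * switch compares against. Per the commit message, the VLAN header is
 * counted only when the limit is applied per port. */
static int mtu_to_frame_size(int mtu, bool per_port_limit)
{
	int size = mtu + ETH_HLEN + ETH_FCS_LEN;

	if (per_port_limit)
		size += VLAN_HLEN;

	return size;
}
```

With the typical MTU of 1500, this gives a frame size of 1518 bytes for a chip-wide limit and 1522 bytes for a per-port limit.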
    • dsa: mv88e6xxx: 6161: Use chip wide MAX MTU · fe230361
      Andrew Lunn authored
      The datasheet suggests the 6161 uses a per-port setting for jumbo
      frames. Testing has, however, shown this is not correct: it uses the
      old-style chip-wide MTU control. Change the ops in the 6161 structure to
      reflect this.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported-by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe230361
    • net: mdiobus: Fix memory leak in __mdiobus_register · ab609f25
      Yanfei Xu authored
      If device_register() fails, we should call put_device() to
      drop the reference count for cleanup. Otherwise, memory is
      leaked.
      
      BUG: memory leak
      unreferenced object 0xffff888114032e00 (size 256):
        comm "kworker/1:3", pid 2960, jiffies 4294943572 (age 15.920s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 08 2e 03 14 81 88 ff ff  ................
          08 2e 03 14 81 88 ff ff 90 76 65 82 ff ff ff ff  .........ve.....
        backtrace:
          [<ffffffff8265cfab>] kmalloc include/linux/slab.h:591 [inline]
          [<ffffffff8265cfab>] kzalloc include/linux/slab.h:721 [inline]
          [<ffffffff8265cfab>] device_private_init drivers/base/core.c:3203 [inline]
          [<ffffffff8265cfab>] device_add+0x89b/0xdf0 drivers/base/core.c:3253
          [<ffffffff828dd643>] __mdiobus_register+0xc3/0x450 drivers/net/phy/mdio_bus.c:537
          [<ffffffff828cb835>] __devm_mdiobus_register+0x75/0xf0 drivers/net/phy/mdio_devres.c:87
          [<ffffffff82b92a00>] ax88772_init_mdio drivers/net/usb/asix_devices.c:676 [inline]
          [<ffffffff82b92a00>] ax88772_bind+0x330/0x480 drivers/net/usb/asix_devices.c:786
          [<ffffffff82baa33f>] usbnet_probe+0x3ff/0xdf0 drivers/net/usb/usbnet.c:1745
          [<ffffffff82c36e17>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
          [<ffffffff82661d17>] call_driver_probe drivers/base/dd.c:517 [inline]
          [<ffffffff82661d17>] really_probe.part.0+0xe7/0x380 drivers/base/dd.c:596
          [<ffffffff826620bc>] really_probe drivers/base/dd.c:558 [inline]
          [<ffffffff826620bc>] __driver_probe_device+0x10c/0x1e0 drivers/base/dd.c:751
          [<ffffffff826621ba>] driver_probe_device+0x2a/0x120 drivers/base/dd.c:781
          [<ffffffff82662a26>] __device_attach_driver+0xf6/0x140 drivers/base/dd.c:898
          [<ffffffff8265eca7>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:427
          [<ffffffff826625a2>] __device_attach+0x122/0x260 drivers/base/dd.c:969
          [<ffffffff82660916>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:487
          [<ffffffff8265cd0b>] device_add+0x5fb/0xdf0 drivers/base/core.c:3359
          [<ffffffff82c343b9>] usb_set_configuration+0x9d9/0xb90 drivers/usb/core/message.c:2170
          [<ffffffff82c4473c>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238
      
      BUG: memory leak
      unreferenced object 0xffff888116f06900 (size 32):
        comm "kworker/0:2", pid 2670, jiffies 4294944448 (age 7.160s)
        hex dump (first 32 bytes):
          75 73 62 2d 30 30 31 3a 30 30 33 00 00 00 00 00  usb-001:003.....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81484516>] kstrdup+0x36/0x70 mm/util.c:60
          [<ffffffff814845a3>] kstrdup_const+0x53/0x80 mm/util.c:83
          [<ffffffff82296ba2>] kvasprintf_const+0xc2/0x110 lib/kasprintf.c:48
          [<ffffffff82358d4b>] kobject_set_name_vargs+0x3b/0xe0 lib/kobject.c:289
          [<ffffffff826575f3>] dev_set_name+0x63/0x90 drivers/base/core.c:3147
          [<ffffffff828dd63b>] __mdiobus_register+0xbb/0x450 drivers/net/phy/mdio_bus.c:535
          [<ffffffff828cb835>] __devm_mdiobus_register+0x75/0xf0 drivers/net/phy/mdio_devres.c:87
          [<ffffffff82b92a00>] ax88772_init_mdio drivers/net/usb/asix_devices.c:676 [inline]
          [<ffffffff82b92a00>] ax88772_bind+0x330/0x480 drivers/net/usb/asix_devices.c:786
          [<ffffffff82baa33f>] usbnet_probe+0x3ff/0xdf0 drivers/net/usb/usbnet.c:1745
          [<ffffffff82c36e17>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
          [<ffffffff82661d17>] call_driver_probe drivers/base/dd.c:517 [inline]
          [<ffffffff82661d17>] really_probe.part.0+0xe7/0x380 drivers/base/dd.c:596
          [<ffffffff826620bc>] really_probe drivers/base/dd.c:558 [inline]
          [<ffffffff826620bc>] __driver_probe_device+0x10c/0x1e0 drivers/base/dd.c:751
          [<ffffffff826621ba>] driver_probe_device+0x2a/0x120 drivers/base/dd.c:781
          [<ffffffff82662a26>] __device_attach_driver+0xf6/0x140 drivers/base/dd.c:898
          [<ffffffff8265eca7>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:427
          [<ffffffff826625a2>] __device_attach+0x122/0x260 drivers/base/dd.c:969
      
      Reported-by: syzbot+398e7dc692ddbbb4cfec@syzkaller.appspotmail.com
      Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab609f25
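
The cleanup rule in the mdiobus commit above (drop the reference when registration fails, so the release path frees what was already allocated) can be modelled in userspace. This is a toy sketch only: every `toy_` name is hypothetical, it does not use the kernel driver-core API, and the `fail` parameter exists purely to simulate a failing registration.

```c
#include <stdlib.h>
#include <string.h>

struct toy_device {
	int refcount; /* set to 1 when registration starts */
	char *name;   /* heap buffer owned by the device */
};

static int toy_live_allocations; /* outstanding name buffers */

/* Drop one reference; the last put releases the device's resources. */
static void toy_put_device(struct toy_device *dev)
{
	if (--dev->refcount == 0 && dev->name) {
		free(dev->name);
		dev->name = NULL;
		toy_live_allocations--;
	}
}

/* Model of a register call that may fail after it has taken resources. */
static int toy_device_register(struct toy_device *dev, const char *name,
			       int fail)
{
	size_t len = strlen(name) + 1;

	dev->refcount = 1;
	dev->name = malloc(len);
	if (!dev->name)
		return -1;
	memcpy(dev->name, name, len);
	toy_live_allocations++;

	return fail ? -1 : 0;
}

/* The pattern from the commit: on failure, put the device instead of
 * just returning, so the name buffer is not leaked. */
static int toy_bus_register(struct toy_device *dev, const char *name,
			    int fail)
{
	int err = toy_device_register(dev, name, fail);

	if (err) {
		toy_put_device(dev);
		return err;
	}
	return 0;
}
```

Without the `toy_put_device()` call in the error branch, the name buffer would stay allocated after a failed registration, which mirrors the second leak report in the commit message.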
    • Revert "ibmvnic: check failover_pending in login response" · 2974b8a6
      Desnes A. Nunes do Rosario authored
      This reverts commit d437f5aa.
      
      The code was duplicated by commit 273c29e9 ("ibmvnic: check
      failover_pending in login response").

      Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2974b8a6
    • net: bgmac-platform: handle mac-address deferral · 763716a5
      Matthew Hagan authored
      This patch is a replication of Christian Lamparter's "net: bgmac-bcma:
      handle deferred probe error due to mac-address" patch for the
      bgmac-platform driver [1].
      
      As is the case with the bgmac-bcma driver, this change is to cover the
      scenario where the MAC address cannot yet be discovered due to reliance
      on an nvmem provider which is yet to be instantiated, resulting in a
      random address being assigned that has to be manually overridden.
      
      [1] https://lore.kernel.org/netdev/20210919115725.29064-1-chunkeey@gmail.com

      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      763716a5
    • net: hns: Fix spelling mistake "maped" -> "mapped" · 44b6aa2e
      Colin Ian King authored
      There is a spelling mistake in a dev_err error message. Fix it.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44b6aa2e
    • mac80211: Fix Ptk0 rekey documentation · 33092aca
      Alexander Wetzel authored
      The @IEEE80211_KEY_FLAG_GENERATE_IV setting is irrelevant for RX.
      Move the requirement to the correct section in the PTK0 rekey
      documentation.

      Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
      Link: https://lore.kernel.org/r/20210924200514.7936-1-alexander@wetzel-home.de
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      33092aca
    • mac80211: check return value of rhashtable_init · 111461d5
      MichelleJin authored
      When rhashtable_init() fails, it returns -EINVAL.
      However, since the error return value of rhashtable_init() is not
      checked, uninitialized pointers can end up being used.
      So, handle the unchecked errors of rhashtable_init().

      Signed-off-by: MichelleJin <shjy180909@gmail.com>
      Link: https://lore.kernel.org/r/20210927033457.1020967-4-shjy180909@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      111461d5
    • mac80211: fix use-after-free in CCMP/GCMP RX · 94513069
      Johannes Berg authored
      When PN checking is done in mac80211, for fragmentation we need
      to copy the PN to the RX struct so we can later use it to do a
      comparison, since commit bf30ca92 ("mac80211: check defrag
      PN against current frame").
      
      Unfortunately, in that commit I used the 'hdr' variable without
      it being necessarily valid, so use-after-free could occur if it
      was necessary to reallocate (parts of) the frame.
      
      Fix this by reloading the variable after the code that results
      in the reallocations, if any.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401.
      
      Cc: stable@vger.kernel.org
      Fixes: bf30ca92 ("mac80211: check defrag PN against current frame")
      Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      94513069
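
The fix in the CCMP/GCMP commit above follows a general rule: a pointer into a buffer must be reloaded after any step that may reallocate that buffer. The sketch below shows the rule in plain userspace C. It is an illustration under assumptions, not mac80211 code: the `toy_` types and functions are hypothetical, and `realloc()` stands in for whatever reallocates (parts of) the frame.

```c
#include <stdlib.h>
#include <string.h>

struct toy_frame {
	unsigned char *data;
	size_t len;
};

/* Stand-in for an operation that may move the frame buffer. */
static int toy_frame_grow(struct toy_frame *f, size_t new_len)
{
	unsigned char *p;

	if (new_len < f->len)
		return -1;
	p = realloc(f->data, new_len);
	if (!p)
		return -1;
	memset(p + f->len, 0, new_len - f->len);
	f->data = p;
	f->len = new_len;
	return 0;
}

/* Reads byte 0 before the grow and byte 1 after it. */
static int toy_read_header(struct toy_frame *f, size_t new_len,
			   unsigned char out[2])
{
	const unsigned char *hdr = f->data;

	out[0] = hdr[0];        /* fine: the buffer has not moved yet */

	if (toy_frame_grow(f, new_len))
		return -1;

	hdr = f->data;          /* reload: the old pointer may dangle */
	out[1] = hdr[1];
	return 0;
}
```

Dropping the reload line would leave `hdr` pointing at memory that `realloc()` may already have freed, which is the same class of bug the commit describes.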
  4. 26 Sep, 2021 1 commit
    • net: prevent user from passing illegal stab size · b193e15a
      王贇 authored
      We observed the report below when playing with a netlink socket:
      
        UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
        shift exponent 249 is too large for 32-bit type
        CPU: 0 PID: 685 Comm: a.out Not tainted
        Call Trace:
         dump_stack_lvl+0x8d/0xcf
         ubsan_epilogue+0xa/0x4e
         __ubsan_handle_shift_out_of_bounds+0x161/0x182
         __qdisc_calculate_pkt_len+0xf0/0x190
         __dev_queue_xmit+0x2ed/0x15b0
      
      It seems the kernel does not check the stab log value passed in by the
      user, and later uses the insane value to calculate pkt_len.

      This patch adds a check on the size/cell_log to avoid the insane
      calculation.

      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b193e15a
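
The kind of check described in the stab commit above can be sketched as input validation done before a user-supplied value is used as a shift exponent. This is a hedged illustration, not the patch itself: the `toy_` names are hypothetical, and the bound of 30 is an assumption chosen so that shifts of a 32-bit `int` stay well defined.

```c
#include <stdint.h>

/* Assumed bound for illustration: any exponent above this is rejected. */
#define TOY_LOG_MAX 30

struct toy_sizespec {
	uint8_t cell_log; /* user-supplied shift for the packet length */
	uint8_t size_log; /* user-supplied shift for the table value */
};

/* Reject exponents that would make a later shift undefined. */
static int toy_validate_sizespec(const struct toy_sizespec *s)
{
	if (s->cell_log > TOY_LOG_MAX || s->size_log > TOY_LOG_MAX)
		return -1;
	return 0;
}

/* Only call this after toy_validate_sizespec() has returned 0. */
static int toy_slot(const struct toy_sizespec *s, int pkt_len)
{
	return pkt_len >> s->cell_log;
}
```

A value such as the exponent 249 from the UBSAN report is rejected up front, so the shift in `toy_slot()` never sees it.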
  5. 25 Sep, 2021 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 7fe7f318
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      1) ipset limits the max allocatable memory via kvmalloc() to MAX_INT,
         from Jozsef Kadlecsik.
      
      2) Check ip_vs_conn_tab_bits value to be in the range specified
         in Kconfig, from Andrea Claudi.
      
      3) Initialize fragment offset in ip6tables, from Jeremy Sowden.
      
      4) Make conntrack hash chain length random, from Florian Westphal.
      
      5) Add zone ID to conntrack and NAT hashtuple again, also from Florian.
      
      6) Add selftests for bidirectional zone support and colliding tuples,
         from Florian Westphal.
      
      7) Unlink table before synchronize_rcu when cleaning tables with
         owner, from Florian.
      
      8) nf_tables limits the max allocatable memory via kvmalloc() to MAX_INT.
      
      9) Release conntrack entries via workqueue in masquerade, from Florian.
      
      10) Fix bogus net_init in iptables raw table definition, also from Florian.
      
      11) Work around missing softdep in log extensions, from Florian Westphal.
      
      12) Serialize hash resizes and cleanups with mutex, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: conntrack: serialize hash resizes and cleanups
        netfilter: log: work around missing softdep backend module
        netfilter: iptable_raw: drop bogus net_init annotation
        netfilter: nf_nat_masquerade: defer conntrack walk to work queue
        netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic
        netfilter: nf_tables: Fix oversized kvmalloc() calls
        netfilter: nf_tables: unlink table before deleting it
        selftests: netfilter: add zone stress test with colliding tuples
        selftests: netfilter: add selftest for directional zone support
        netfilter: nat: include zone id in nat table hash again
        netfilter: conntrack: include zone id in tuple hash again
        netfilter: conntrack: make max chain length random
        netfilter: ip6_tables: zero-initialize fragment offset
        ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
        netfilter: ipset: Fix oversized kvmalloc() calls
      ====================
      
      Link: https://lore.kernel.org/r/20210924221113.348767-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7fe7f318
  6. 24 Sep, 2021 8 commits
  7. 23 Sep, 2021 13 commits
    • Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9bc62afe
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
      
        Previous releases - regressions:
      
         - introduce a shutdown method to mdio device drivers, and make DSA
           switch drivers compatible with masters disappearing on shutdown;
           preventing infinite reference wait
      
         - fix issues in mdiobus users related to ->shutdown vs ->remove
      
         - virtio-net: fix pages leaking when building skb in big mode
      
         - xen-netback: correct success/error reporting for the
           SKB-with-fraglist
      
         - dsa: tear down devlink port regions when tearing down the devlink
           port on error
      
         - nexthop: fix division by zero while replacing a resilient group
      
         - hns3: check queue, vf, vlan ids range before using
      
        Previous releases - always broken:
      
         - napi: fix race against netpoll causing NAPI getting stuck
      
         - mlx4_en: ensure link operstate is updated even if link comes up
           before netdev registration
      
         - bnxt_en: fix TX timeout when TX ring size is set to the smallest
      
         - enetc: fix illegal access when reading affinity_hint; prevent oops
           on sysfs access
      
         - mtk_eth_soc: avoid creating duplicate offload entries
      
        Misc:
      
         - core: correct the sock::sk_lock.owned lockdep annotations"
      
      * tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
        atlantic: Fix issue in the pm resume flow.
        net/mlx4_en: Don't allow aRFS for encapsulated packets
        net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
        net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
        nfc: st-nci: Add SPI ID matching DT compatible
        MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
        nexthop: Fix memory leaks in nexthop notification chain listeners
        mptcp: ensure tx skbs always have the MPTCP ext
        qed: rdma - don't wait for resources under hw error recovery flow
        s390/qeth: fix deadlock during failing recovery
        s390/qeth: Fix deadlock in remove_discipline
        s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
        net: dsa: realtek: register the MDIO bus under devres
        net: dsa: don't allocate the slave_mii_bus using devres
        Doc: networking: Fox a typo in ice.rst
        net: dsa: fix dsa_tree_setup error path
        net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
        net/smc: add missing error check in smc_clc_prfx_set()
        net: hns3: fix a return value error in hclge_get_reset_status()
        net: hns3: check vlan id before using it
        ...
      9bc62afe
    • memcg: flush lruvec stats in the refault · 1f828223
      Shakeel Butt authored
      Prior to the commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
      and the commit aa48e47e ("memcg: infrastructure to flush memcg
      stats"), each lruvec memcg stat could be off by (nr_cgroups * nr_cpus *
      32) at worst, for an unbounded amount of time.  The commit aa48e47e
      moved the lruvec stats to the rstat infrastructure and the commit
      7e1c0d6f bounded the error for all the lruvec stats to (nr_cpus *
      32) at worst for at most 2 seconds.  More specifically, it decoupled the
      number of stats and the number of cgroups from the error rate.

      However, this reduction in error comes at the cost of triggering the
      slowpath of the stats update more frequently.  Previously, in the slowpath
      the kernel added the stats up the memcg tree.  After aa48e47e, the
      kernel triggers the async lruvec stats flush through queue_work().  This
      caused regression reports from the 0day kernel bot [1] as well as from
      the Phoronix Test Suite [2].
      
      We tried two options to fix the regression:
      
       1) Increase the threshold to trigger the slowpath in lruvec stats
          update codepath from 32 to 512.
      
       2) Remove the slowpath from lruvec stats update codepath and instead
          flush the stats in the page refault codepath. The assumption is that
          the kernel flushes the stats in a timely manner, so the update tree
          in the refault codepath would be small enough not to cause a
          performance impact.
      
      Following are the results of will-it-scale/page_fault[1|2|3] benchmark
      on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
      aa48e47e and 7e1c0d6f reverted (3) 5.15-rc1 with option-1
      (4) 5.15-rc1 with option-2.
      
        test       (1)      (2)               (3)               (4)
        pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
        pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
        pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)
      
      From the above results, it seems that option-2 not only resolves the
      regression but also improves the performance for at least these
      benchmarks.

      Feng Tang (Intel) ran the aim7 benchmark with these two options and
      confirmed that option-1 reduces the regression while option-2 removes
      it.
      
      Michael Larabel (Phoronix) ran multiple benchmarks with these options
      and reported the results at [3]; they show that for most benchmarks
      option-2 removes the regression introduced by the commit aa48e47e
      ("memcg: infrastructure to flush memcg stats").

      Based on the experiment results, this patch proposes option-2 as the
      solution to resolve the regression.
      
      Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
      Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
      Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
      Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Tested-by: Michael Larabel <Michael@phoronix.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f828223
    • atlantic: Fix issue in the pm resume flow. · 4d88c339
      Sudarsana Reddy Kalluru authored
      After fixing the hibernation resume flow, another use case was found which
      should be explicitly handled: resume when the device is in the "down" state.
      Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
      up during suspend/hibernate. We still need to perform nic_deinit() if
      the caller requests it, to handle the freeze/resume scenarios.
      
      Fixes: 57f780f1 ("atlantic: Fix driver resume flow.")
      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d88c339
    • net/mlx4_en: Don't allow aRFS for encapsulated packets · fdbccea4
      Aya Levin authored
      The driver doesn't support aRFS for encapsulated packets; return an error
      early in such a case.
      
      Fixes: 1eb8c695 ("net/mlx4_en: Add accelerated RFS support")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdbccea4
    • net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled · acc64f52
      Vladimir Oltean authored
      The blamed commit made the fatally incorrect assumption that ports which
      aren't in the FORWARDING STP state should not have packets forwarded
      towards them, and that is all that needs to be done.
      
      However, that logic alone permits BLOCKING ports to forward to
      FORWARDING ports, which of course allows packet storms to occur when
      there is an L2 loop.
      
      The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
      do for you", but "what can you do for the bridge". This way, only
      FORWARDING ports forward to the other FORWARDING ports from the same
      bridging domain, and we are still compatible with the idea of multiple
      bridges.
      
      Fixes: df291e54 ("net: ocelot: support multiple bridges")
      Suggested-by: Colin Foster <colin.foster@in-advantage.com>
      Reported-by: Colin Foster <colin.foster@in-advantage.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acc64f52
    • net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries · e68daf61
      Felix Fietkau authored
      Sometimes multiple CLS_REPLACE calls are issued for the same connection.
      rhashtable_insert_fast does not check for these duplicates, so multiple
      hardware flow entries can be created.
      Fix this by checking for an existing entry early.
      
      Fixes: 502e84e2 ("net: ethernet: mtk_eth_soc: add flow offloading support")
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e68daf61
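
The "check for an existing entry early" approach from the mtk_eth_soc commit above amounts to a lookup before the insert. The sketch below models that with a small array instead of an rhashtable. All `toy_` names, the cookie key, and the specific return codes are hypothetical choices for illustration, not the driver's actual interface.

```c
#include <stddef.h>

#define TOY_MAX_FLOWS 8

struct toy_flow_table {
	unsigned long cookies[TOY_MAX_FLOWS]; /* one key per offloaded flow */
	size_t count;
};

/* Returns 1 if a flow with this cookie is already in the table. */
static int toy_flow_lookup(const struct toy_flow_table *t,
			   unsigned long cookie)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->cookies[i] == cookie)
			return 1;
	return 0;
}

/* Look up first, then insert, so a repeated request for the same
 * connection cannot create a second entry. */
static int toy_flow_replace(struct toy_flow_table *t, unsigned long cookie)
{
	if (toy_flow_lookup(t, cookie))
		return -1; /* duplicate: already offloaded */
	if (t->count == TOY_MAX_FLOWS)
		return -2; /* table full */

	t->cookies[t->count++] = cookie;
	return 0;
}
```

Calling `toy_flow_replace()` twice with the same cookie leaves exactly one entry in the table; the second call returns the duplicate error instead of inserting again.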
    • nfc: st-nci: Add SPI ID matching DT compatible · 31339440
      Mark Brown authored
      Currently, autoloading for SPI devices does not use the DT ID table; it uses
      SPI modaliases. Supporting OF modaliases is going to be difficult if not
      impractical (an attempt was made but has been reverted), so ensure that
      module autoloading works for this driver by adding the part name used in
      the compatible to the list of SPI IDs.
      
      Fixes: 96c8395e ("spi: Revert modalias changes")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      31339440
    • MAINTAINERS: remove Guvenc Gulce as net/smc maintainer · 5b099870
      Guvenc Gulce authored
      Remove myself as net/smc maintainer, as I am
      leaving IBM soon and can not maintain net/smc anymore.
      
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b099870
    • nexthop: Fix memory leaks in nexthop notification chain listeners · 3106a084
      Ido Schimmel authored
      syzkaller discovered memory leaks [1] that can be reduced to the
      following commands:
      
       # ip nexthop add id 1 blackhole
       # devlink dev reload pci/0000:06:00.0
      
      As part of the reload flow, mlxsw will unregister its netdevs and then
      unregister from the nexthop notification chain. Before unregistering
      from the notification chain, mlxsw will receive delete notifications for
      nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
      will not receive notifications for nexthops using netdevs that are not
      dismantled as part of the reload flow. For example, the blackhole
      nexthop above that internally uses the loopback netdev as its nexthop
      device.
      
      One way to fix this problem is to have listeners flush their nexthop
      tables after unregistering from the notification chain. This is
      error-prone, as evidenced by this patch, and also not symmetric with the
      registration path where a listener receives a dump of all the existing
      nexthops.
      
      Therefore, fix this problem by replaying delete notifications for the
      listener being unregistered. This is symmetric to the registration path
      and also consistent with the netdev notification chain.
      
      The above means that unregister_nexthop_notifier(), like
      register_nexthop_notifier(), will have to take RTNL in order to iterate
      over the existing nexthops and that any callers of the function cannot
      hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
      driver. To avoid a deadlock, change the latter to unregister its nexthop
      listener without holding RTNL, making it symmetric to the registration
      path.
      
      [1]
      unreferenced object 0xffff88806173d600 (size 512):
        comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
        hex dump (first 32 bytes):
          41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff  A..`......sa....
          08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00  ..sa............
        backtrace:
          [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
          [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline]
          [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline]
          [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
          [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline]
          [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
          [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
          [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
          [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
          [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
          [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
          [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline]
          [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
          [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
          [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
          [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
          [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
          [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
          [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
          [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline]
          [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline]
          [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
          [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
          [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
          [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline]
          [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline]
          [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499
      
      Fixes: 2a014b20 ("mlxsw: spectrum_router: Add support for nexthop objects")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3106a084
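The symmetry argument in this commit (dump on register, replay deletes on unregister) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: all names (`nh_table`, `nh_listener`, `nh_register`, `nh_unregister`) are hypothetical, the `tracked` counter stands in for the per-nexthop allocations a driver would make, and locking (RTNL in the kernel) is left out entirely.

```c
#include <stddef.h>

enum nh_event { NH_EVENT_REPLACE, NH_EVENT_DEL };

#define MAX_NH 8

/* Hypothetical table of existing nexthop IDs. */
struct nh_table {
    int ids[MAX_NH];
    size_t count;
};

/* Hypothetical listener; `tracked` stands in for per-nexthop allocations. */
struct nh_listener {
    int tracked;
};

/* Listener callback: allocate on add/replace, free on delete. */
void nh_listener_event(struct nh_listener *l, enum nh_event ev)
{
    if (ev == NH_EVENT_REPLACE)
        l->tracked++;
    else if (ev == NH_EVENT_DEL && l->tracked > 0)
        l->tracked--;
}

/* Registration: the new listener receives a dump of all existing nexthops. */
void nh_register(const struct nh_table *t, struct nh_listener *l)
{
    for (size_t i = 0; i < t->count; i++)
        nh_listener_event(l, NH_EVENT_REPLACE);
}

/* Unregistration: replay a delete for every existing nexthop, so the
 * listener frees its state even for objects that are not going away. */
void nh_unregister(const struct nh_table *t, struct nh_listener *l)
{
    for (size_t i = 0; i < t->count; i++)
        nh_listener_event(l, NH_EVENT_DEL);
}
```

In this model, a listener that registers against a table holding a blackhole nexthop ends with `tracked == 0` after `nh_unregister()`, without needing a separate flush step of its own.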
    • Johannes Berg's avatar
      mac80211-hwsim: fix late beacon hrtimer handling · 313bbd19
      Johannes Berg authored
      Thomas explained in https://lore.kernel.org/r/87mtoeb4hb.ffs@tglx
      that our handling of the hrtimer here is wrong: If the timer fires
      late (e.g. due to vCPU scheduling, as reported by Dmitry/syzbot)
      then it tries to actually rearm the timer at the next deadline,
      which might be in the past already:
      
       1          2          3          N          N+1
       |          |          |   ...    |          |
      
       ^ intended to fire here (1)
                  ^ next deadline here (2)
                                            ^ actually fired here
      
      The next time it fires, it's later, but will still try to schedule
      for the next deadline (now 3), etc. until it catches up with N,
      but that might take a long time, causing stalls etc.
      
      Now, all of this is simulation, so we just have to fix it, but
      note that the behaviour is wrong even per spec, since there's no
      value then in sending all those beacons unaligned - they should be
      aligned to the TBTT (1, 2, 3, ... in the picture), and if we're a
      bit (or a lot) late, then just resume at that point.
      
      Therefore, change the code to use hrtimer_forward_now() which will
      ensure that the next firing of the timer would be at N+1 (in the
      picture), i.e. the next interval point after the current time.

      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com
      Fixes: 01e59e46 ("mac80211_hwsim: hrtimer beacon")
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210915112936.544f383472eb.I3f9712009027aa09244b65399bf18bf482a8c4f1@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      313bbd19
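The difference between rearming at the next deadline and forwarding past the current time can be shown with plain integer arithmetic. The following is a simplified userspace model, not the kernel's hrtimer code: the function names are hypothetical and times are unitless 64-bit counters.

```c
#include <stdint.h>

/* Behaviour described as wrong: rearm one interval after the previous
 * deadline, even if that point is already in the past. */
uint64_t rearm_next_deadline(uint64_t expires, uint64_t interval)
{
    return expires + interval;
}

/* Simplified model of forwarding: move *expires to the first interval
 * point strictly after `now` and return how many intervals were skipped.
 * Returns 0 and leaves *expires untouched if it has not expired yet. */
uint64_t forward_now(uint64_t *expires, uint64_t now, uint64_t interval)
{
    uint64_t overruns;

    if (interval == 0 || now < *expires)
        return 0;

    overruns = (now - *expires) / interval + 1;
    *expires += overruns * interval;
    return overruns;
}
```

With deadlines every 100 units, a timer due at 100 that actually fires at 950 would be rearmed for 200 by the first function (still in the past), while the second moves it to 1000, the N+1 point in the picture.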
    • Johannes Berg's avatar
      mac80211: mesh: fix potentially unaligned access · b9731062
      Johannes Berg authored
      The pointer here points directly into the frame, so the
      access is potentially unaligned. Use get_unaligned_le16
      to avoid that.
      
      Fixes: 3f52b7e3 ("mac80211: mesh power save basics")
      Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      b9731062
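For readers unfamiliar with the problem, here is a minimal userspace sketch of what an unaligned-safe little-endian read looks like. `read_le16` is a hypothetical stand-in written for illustration; in the kernel, the helper named in the commit, `get_unaligned_le16`, does this job.

```c
#include <stdint.h>

/* Assemble a little-endian 16-bit value one byte at a time. This works at
 * any address and on any host endianness, unlike casting the pointer to
 * uint16_t * and dereferencing it, which may fault or misbehave when the
 * address is odd. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

A field that starts at an odd offset inside a frame buffer can then be read as `read_le16(frame + 1)` without any alignment requirement.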
    • Lorenzo Bianconi's avatar
      mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap · 13cb6d82
      Lorenzo Bianconi authored
      Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap
      routine in order to fix the following warning reported by syzbot:
      
      WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
      WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
      Modules linked in:
      CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
      RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
      RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216
      RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000
      RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003
      RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100
      R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8
      R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004
      FS:  00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740
       netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089
       __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165
       __bpf_tx_skb net/core/filter.c:2114 [inline]
       __bpf_redirect_no_mac net/core/filter.c:2139 [inline]
       __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162
       ____bpf_clone_redirect net/core/filter.c:2429 [inline]
       bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401
       bpf_prog_eeb6f53a69e5c6a2+0x59/0x234
       bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline]
       __bpf_prog_run include/linux/filter.h:624 [inline]
       bpf_prog_run include/linux/filter.h:631 [inline]
       bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119
       bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663
       bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline]
       __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605
       __do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
       __se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
       __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      
      Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com
      Fixes: 646e76bb ("mac80211: parse VHT info in injected frames")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      13cb6d82
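The idea of limiting injected values before they reach a rate-setting helper can be sketched as follows. This is not the patch itself: the struct, function name, and fallback values are hypothetical, and the bounds are assumptions based on the VHT ranges (MCS 0 to 9, 1 to 8 spatial streams) rather than on the constants the patch uses.

```c
#include <stdint.h>

#define VHT_MAX_MCS 9 /* assumed upper bound for this sketch */
#define VHT_MAX_NSS 8 /* assumed upper bound for this sketch */

struct vht_rate {
    uint8_t mcs;
    uint8_t nss;
};

/* Replace out-of-range values supplied by an injected frame with safe
 * defaults (MCS 0, one spatial stream) instead of passing them on. */
struct vht_rate limit_vht_rate(uint8_t mcs, uint8_t nss)
{
    struct vht_rate r;

    r.mcs = (mcs > VHT_MAX_MCS) ? 0 : mcs;
    r.nss = (nss == 0 || nss > VHT_MAX_NSS) ? 1 : nss;
    return r;
}
```

An injected frame claiming MCS 15 with zero spatial streams would come out as MCS 0 with one stream, while in-range values pass through unchanged.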
    • YueHaibing's avatar
      mac80211: Drop frames from invalid MAC address in ad-hoc mode · a6555f84
      YueHaibing authored
      WARNING: CPU: 1 PID: 9 at net/mac80211/sta_info.c:554
      sta_info_insert_rcu+0x121/0x12a0
      Modules linked in:
      CPU: 1 PID: 9 Comm: kworker/u8:1 Not tainted 5.14.0-rc7+ #253
      Workqueue: phy3 ieee80211_iface_work
      RIP: 0010:sta_info_insert_rcu+0x121/0x12a0
      ...
      Call Trace:
       ieee80211_ibss_finish_sta+0xbc/0x170
       ieee80211_ibss_work+0x13f/0x7d0
       ieee80211_iface_work+0x37a/0x500
       process_one_work+0x357/0x850
       worker_thread+0x41/0x4d0
      
      If an Ad-Hoc node receives packets with an invalid source MAC address,
      it hits a WARN_ON in sta_info_insert_check(), which can spam the log.

      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Link: https://lore.kernel.org/r/20210827144230.39944-1-yuehaibing@huawei.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      a6555f84
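The commit message does not spell out what counts as an invalid source address. As a hedged illustration, the sketch below assumes the same notion the kernel's `is_valid_ether_addr()` uses: an address is valid if it is neither multicast (which includes broadcast) nor all zeros. The function name `mac_is_valid_source` is hypothetical.

```c
#include <stdint.h>

/* Return 1 if `addr` is usable as a source MAC address: the multicast bit
 * (lowest bit of the first octet) is clear and the address is not all zeros. */
int mac_is_valid_source(const uint8_t addr[6])
{
    int is_multicast = addr[0] & 0x01;
    int is_zero = !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);

    return !is_multicast && !is_zero;
}
```

A receive path built on this check would drop a frame early when `mac_is_valid_source()` returns 0, instead of creating a station entry for it and tripping the warning.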