1. 24 Sep, 2021 33 commits
  2. 23 Sep, 2021 7 commits
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 2fcd14d0
      Jakub Kicinski authored
      net/mptcp/protocol.c
        977d293e ("mptcp: ensure tx skbs always have the MPTCP ext")
        efe686ff ("mptcp: ensure tx skbs always have the MPTCP ext")
      
      same patch merged in both trees, keep net-next.
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2fcd14d0
    • Linus Torvalds's avatar
      Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9bc62afe
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
      
        Previous releases - regressions:
      
         - introduce a shutdown method to mdio device drivers, and make DSA
           switch drivers compatible with masters disappearing on shutdown;
           preventing infinite reference wait
      
         - fix issues in mdiobus users related to ->shutdown vs ->remove
      
         - virtio-net: fix pages leaking when building skb in big mode
      
         - xen-netback: correct success/error reporting for the
           SKB-with-fraglist
      
         - dsa: tear down devlink port regions when tearing down the devlink
           port on error
      
         - nexthop: fix division by zero while replacing a resilient group
      
         - hns3: check queue, vf, vlan ids range before using
      
        Previous releases - always broken:
      
         - napi: fix race against netpoll causing NAPI getting stuck
      
         - mlx4_en: ensure link operstate is updated even if link comes up
           before netdev registration
      
         - bnxt_en: fix TX timeout when TX ring size is set to the smallest
      
         - enetc: fix illegal access when reading affinity_hint; prevent oops
           on sysfs access
      
         - mtk_eth_soc: avoid creating duplicate offload entries
      
        Misc:
      
         - core: correct the sock::sk_lock.owned lockdep annotations"
      
      * tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
        atlantic: Fix issue in the pm resume flow.
        net/mlx4_en: Don't allow aRFS for encapsulated packets
        net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
        net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
        nfc: st-nci: Add SPI ID matching DT compatible
        MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
        nexthop: Fix memory leaks in nexthop notification chain listeners
        mptcp: ensure tx skbs always have the MPTCP ext
        qed: rdma - don't wait for resources under hw error recovery flow
        s390/qeth: fix deadlock during failing recovery
        s390/qeth: Fix deadlock in remove_discipline
        s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
        net: dsa: realtek: register the MDIO bus under devres
        net: dsa: don't allocate the slave_mii_bus using devres
        Doc: networking: Fox a typo in ice.rst
        net: dsa: fix dsa_tree_setup error path
        net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
        net/smc: add missing error check in smc_clc_prfx_set()
        net: hns3: fix a return value error in hclge_get_reset_status()
        net: hns3: check vlan id before using it
        ...
      9bc62afe
    • Shakeel Butt's avatar
      memcg: flush lruvec stats in the refault · 1f828223
      Shakeel Butt authored
      Prior to the commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
      and the commit aa48e47e ("memcg: infrastructure to flush memcg
      stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
      32) at worst and for unbounded amount of time.  The commit aa48e47e
      moved the lruvec stats to rstat infrastructure and the commit
      7e1c0d6f bounded the error for all the lruvec stats to (nr_cpus *
      32) at worst for at most 2 seconds.  More specifically it decoupled the
      number of stats and the number of cgroups from the error rate.
      
      However this reduction in error comes with the cost of triggering the
      slowpath of stats update more frequently.  Previously in the slowpath
      the kernel adds the stats up the memcg tree.  After aa48e47e, the
      kernel triggers the asyn lruvec stats flush through queue_work().  This
      causes regression reports from 0day kernel bot [1] as well as from
      phoronix test suite [2].
      
      We tried two options to fix the regression:
      
       1) Increase the threshold to trigger the slowpath in lruvec stats
          update codepath from 32 to 512.
      
       2) Remove the slowpath from lruvec stats update codepath and instead
          flush the stats in the page refault codepath. The assumption is that
          the kernel timely flush the stats, so, the update tree would be
          small in the refault codepath to not cause the preformance impact.
      
      Following are the results of will-it-scale/page_fault[1|2|3] benchmark
      on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
      aa48e47e and 7e1c0d6f reverted (3) 5.15-rc1 with option-1
      (4) 5.15-rc1 with option-2.
      
        test       (1)      (2)               (3)               (4)
        pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
        pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
        pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)
      
      From the above result, it seems like the option-2 not only solves the
      regression but also improves the performance for at least these
      benchmarks.
      
      Feng Tang (intel) ran the aim7 benchmark with these two options and
      confirms that option-1 reduces the regression but option-2 removes the
      regression.
      
      Michael Larabel (phoronix) ran multiple benchmarks with these options
      and reported the results at [3] and it shows for most benchmarks
      option-2 removes the regression introduced by the commit aa48e47e
      ("memcg: infrastructure to flush memcg stats").
      
      Based on the experiment results, this patch proposed the option-2 as the
      solution to resolve the regression.
      
      Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
      Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
      Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
      Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Tested-by: default avatarMichael Larabel <Michael@phoronix.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>,
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>,
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1f828223
    • Sudarsana Reddy Kalluru's avatar
      atlantic: Fix issue in the pm resume flow. · 4d88c339
      Sudarsana Reddy Kalluru authored
      After fixing hibernation resume flow, another usecase was found which
      should be explicitly handled - resume when device is in "down" state.
      Invoke aq_nic_init jointly with aq_nic_start only if ndev was already
      up during suspend/hibernate. We still need to perform nic_deinit() if
      caller requests for it, to handle the freeze/resume scenarios.
      
      Fixes: 57f780f1 ("atlantic: Fix driver resume flow.")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d88c339
    • Aya Levin's avatar
      net/mlx4_en: Don't allow aRFS for encapsulated packets · fdbccea4
      Aya Levin authored
      Driver doesn't support aRFS for encapsulated packets, return early error
      in such a case.
      
      Fixes: 1eb8c695 ("net/mlx4_en: Add accelerated RFS support")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fdbccea4
    • Vladimir Oltean's avatar
      net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled · acc64f52
      Vladimir Oltean authored
      The blamed commit made the fatally incorrect assumption that ports which
      aren't in the FORWARDING STP state should not have packets forwarded
      towards them, and that is all that needs to be done.
      
      However, that logic alone permits BLOCKING ports to forward to
      FORWARDING ports, which of course allows packet storms to occur when
      there is an L2 loop.
      
      The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge
      do for you", but "what can you do for the bridge". This way, only
      FORWARDING ports forward to the other FORWARDING ports from the same
      bridging domain, and we are still compatible with the idea of multiple
      bridges.
      
      Fixes: df291e54 ("net: ocelot: support multiple bridges")
      Suggested-by: default avatarColin Foster <colin.foster@in-advantage.com>
      Reported-by: default avatarColin Foster <colin.foster@in-advantage.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarColin Foster <colin.foster@in-advantage.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      acc64f52
    • Felix Fietkau's avatar
      net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries · e68daf61
      Felix Fietkau authored
      Sometimes multiple CLS_REPLACE calls are issued for the same connection.
      rhashtable_insert_fast does not check for these duplicates, so multiple
      hardware flow entries can be created.
      Fix this by checking for an existing entry early
      
      Fixes: 502e84e2 ("net: ethernet: mtk_eth_soc: add flow offloading support")
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarIlya Lipnitskiy <ilya.lipnitskiy@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e68daf61