1. 10 Jun, 2024 13 commits
    • rtnetlink: move rtnl_lock handling out of af_netlink · 5380d64f
      Jakub Kicinski authored
      Now that we have an intermediate layer of code for handling
      rtnl-level netlink dump quirks, we can move the rtnl_lock
      taking there.
      
      For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
      avoid taking rtnl_lock just to generate NLM_DONE, once again.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: hellcreek: Replace kernel.h with what is used · c917b26e
      Andy Shevchenko authored
      kernel.h is included solely for some other existing headers.
      Include them directly and get rid of kernel.h.
      
      While at it, sort headers alphabetically for easier maintenance.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-up-pin-tw-timer' · a9522664
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: tcp: un-pin tw timer
      
      Changes since previous iteration:
       - Patch 1: update a comment; I copied Eric's v7 RvB tag.
       - Patch 2: move bh off/on into hashdance_schedule and get rid of the
         comment mentioning the pinned tw timer.
         I did not copy Eric's RvB tag over from v7 because of the change.
       - Patch 3 is unchanged, so I kept Eric's RvB tag.
      
      This is v8 of the series where the tw_timer is un-pinned to get rid of
      interference in isolated-CPU setups.
      
      The first patch makes the necessary preparations; existing code relies
      on TIMER_PINNED to avoid races.
      
      The second patch un-pins the TW timer. It could be folded into the first
      one, but keeping it separate might help with bisection.
      
      The third patch is a minor cleanup that moves a helper from the .h file
      to the only remaining compilation unit that uses it.
      
      Tested with iperf3 and stress-ng socket mode.
      ====================
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: move inet_twsk_schedule helper out of header · f81d0dd2
      Florian Westphal authored
      It's no longer used outside inet_timewait_sock.c, so move it there.
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp: un-pin the tw_timer · c75ad7c7
      Florian Westphal authored
      After the previous patch, even if the timer fires immediately on another
      CPU, the context that schedules the timer now holds the ehash spinlock,
      so the timer cannot reap the tw socket until the ehash lock is released.
      
      BH disable is moved into hashdance_schedule.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp/dccp: prepare for tw_timer un-pinning · b334b924
      Valentin Schneider authored
      The TCP timewait timer is proving to be problematic for setups where
      scheduler CPU isolation is achieved at runtime via cpusets (as opposed to
      statically via isolcpus=domains).
      
      What happens there is that a CPU goes through tcp_time_wait(), arming the
      time_wait timer, and then gets isolated. TCP_TIMEWAIT_LEN later, the
      timer fires, causing interference for the now-isolated CPU. This is
      conceptually similar to the issue described in commit e02b9312
      ("workqueue: Unbind kworkers before sending them to exit()").
      
      Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash
      lock held. Expand the lock's critical section from inet_twsk_kill() to
      inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of
      the timer. IOW, this prevents the following race:
      
      			     tcp_time_wait()
      			       inet_twsk_hashdance()
        inet_twsk_deschedule_put()
          del_timer_sync()
      			       inet_twsk_schedule()
      
      Thanks to Paolo Abeni for suggesting the use of the ehash lock.
      
      This also restores a comment from commit ec94c269 ("tcp/dccp: avoid
      one atomic operation for timewait hashdance") as inet_twsk_hashdance() had
      a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing.
      
      inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize
      with inet_twsk_hashdance_schedule().
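      A userspace analogue may help picture the serialization. The sketch
      below is not kernel code: a pthread mutex stands in for the ehash
      spinlock, and struct tw_model, model_hashdance_schedule(),
      model_deschedule_put() and model_run_race() are hypothetical names.
      Because hashing the socket and arming its timer happen in one critical
      section, a deschedule running under the same lock either does not see
      the socket yet or sees it with the timer already armed, so the race
      drawn above cannot occur. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical model of a timewait socket; not the kernel structure. */
struct tw_model {
	pthread_mutex_t ehash_lock; /* stands in for the ehash bucket lock */
	bool hashed;                /* socket is visible to lookups */
	bool timer_armed;           /* tw timer is pending */
};

/* Analogue of hashdance + schedule: both state changes happen inside one
 * critical section, so no lock holder can observe a hashed socket whose
 * timer is not armed yet. */
static void model_hashdance_schedule(struct tw_model *tw)
{
	pthread_mutex_lock(&tw->ehash_lock);
	tw->hashed = true;
	tw->timer_armed = true;
	pthread_mutex_unlock(&tw->ehash_lock);
}

/* Analogue of deschedule_put: returns true if it found the socket hashed
 * and cancelled its pending timer. */
static bool model_deschedule_put(struct tw_model *tw)
{
	bool cancelled = false;

	pthread_mutex_lock(&tw->ehash_lock);
	if (tw->hashed) {
		assert(tw->timer_armed); /* invariant provided by the lock */
		tw->timer_armed = false;
		tw->hashed = false;
		cancelled = true;
	}
	pthread_mutex_unlock(&tw->ehash_lock);
	return cancelled;
}

static void *scheduler_thread(void *arg)
{
	model_hashdance_schedule(arg);
	return NULL;
}

/* Schedules on another thread and retries deschedule until it observes
 * the socket; returns true if everything ended up cancelled. */
static bool model_run_race(void)
{
	struct tw_model tw = { .hashed = false, .timer_armed = false };
	pthread_t t;

	pthread_mutex_init(&tw.ehash_lock, NULL);
	pthread_create(&t, NULL, scheduler_thread, &tw);
	while (!model_deschedule_put(&tw))
		sched_yield(); /* not hashed yet: a real lookup would just miss */
	pthread_join(t, NULL);
	pthread_mutex_destroy(&tw.ehash_lock);
	return !tw.timer_armed && !tw.hashed;
}
```

      model_run_race() returns true once the deschedule side has found the
      socket and cancelled an armed timer; the assert inside
      model_deschedule_put() never fires because both flags change under the
      same lock.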
      
      To ease a possible regression search, the actual un-pinning is done in
      the next patch.
      
      Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-acl-fixes' · 8d466c8f
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: ACL fixes
      
      Ido Schimmel writes:
      
      Patches #1-#3 fix various spelling mistakes I noticed while working on
      the code base.
      
      Patch #4 fixes a general protection fault by bailing out and warning
      when the error occurs.
      
      Patch #5 fixes the root cause of that warning.
      
      Patch #6 fixes ACL scale regression and firmware errors.
      
      See the commit messages for more info.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors · 75d8d7a6
      Ido Schimmel authored
      ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
      newer ASICs can share the same mask if their masks only differ in up to
      8 consecutive bits. For example, consider the following filters:
      
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop
      
      The second filter can use the same mask as the first (dst_ip/24) with a
      delta of 1 bit.
      
      However, the above only works because the two filters have different
      values in the common unmasked part (dst_ip/24). When entries have the
      same value in the common unmasked part they create undesired collisions
      in the device since many entries now have the same key. This leads to
      firmware errors such as [1] and to a reduced scale.
      
      Fix by adjusting the hash table key to only include the value in the
      common unmasked part, that is, without the delta bits. That way the
      driver will detect the collision during filter insertion and spill the
      filter into the circuit TCAM (C-TCAM).
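      The effect of the new key can be sketched as follows, assuming (as the
      message implies) that the old key also covered the delta bits. The
      types and helpers here are hypothetical. With a shared /24 mask,
      192.0.2.0/24 and 198.51.100.128/25 get different keys, while a
      hypothetical third filter 192.0.2.128/25 collides with the first and
      would be spilled to the C-TCAM; a key that includes the delta bits
      misses that collision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
			  ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical model of an A-TCAM entry that shares an aggregated mask. */
struct entry_model {
	uint32_t value;    /* filter value, e.g. dst_ip */
	uint32_t own_mask; /* the filter's own mask, delta bits included */
};

/* Key after the fix: only the value under the shared mask. */
static uint32_t entry_key(const struct entry_model *e, uint32_t shared_mask)
{
	return e->value & shared_mask;
}

/* Pre-fix style key, for comparison: also covers the delta bits. */
static uint32_t entry_key_with_delta(const struct entry_model *e)
{
	return e->value & e->own_mask;
}

/* Index of the first existing entry whose key collides with new_entry,
 * or -1 if the new entry can stay in the A-TCAM. */
static int find_collision(const struct entry_model *entries, size_t n,
			  const struct entry_model *new_entry,
			  uint32_t shared_mask)
{
	uint32_t key = entry_key(new_entry, shared_mask);

	assert(entries != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (entry_key(&entries[i], shared_mask) == key)
			return (int)i; /* collision: spill to the C-TCAM */
	return -1;
}
```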
      
      Add a test case that fails without the fix and adjust existing cases
      that check C-TCAM spillage according to the above limitation.
      
      [1]
      mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl_erp: Fix object nesting warning · 97d833ce
      Ido Schimmel authored
      ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
      (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
      contain more ACLs (i.e., tc filters), but the number of masks in each
      region (i.e., tc chain) is limited.
      
      In order to mitigate the effects of the above limitation, the device
      allows filters to share a single mask if their masks only differ in up
      to 8 consecutive bits. For example, dst_ip/25 can be represented using
      dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
      number of masks being used (and therefore does not support mask
      aggregation), but can contain a limited number of filters.
      
      The driver uses the "objagg" library to perform the mask aggregation by
      passing it objects that consist of the filter's mask and whether the
      filter is to be inserted into the A-TCAM or the C-TCAM since filters in
      different TCAMs cannot share a mask.
      
      The set of created objects is dependent on the insertion order of the
      filters and is not necessarily optimal. Therefore, the driver will
      periodically ask the library to compute a more optimal set ("hints") by
      looking at all the existing objects.
      
      When the library asks the driver whether two objects can be aggregated
      the driver only compares the provided masks and ignores the A-TCAM /
      C-TCAM indication. This is the right thing to do since the goal is to
      move as many filters as possible to the A-TCAM. The driver also forbids
      two identical masks from being aggregated since this can only happen if
      one was intentionally put in the C-TCAM to avoid a conflict in the
      A-TCAM.
      
      The above can result in the following set of hints:
      
      H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
      H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
      
      After getting the hints from the library the driver will start migrating
      filters from one region to another while consulting the computed hints
      and instructing the device to perform a lookup in both regions during
      the transition.
      
      Assuming a filter with mask X is being migrated into the A-TCAM in the
      new region, the hints lookup will return H1. Since H2 is the parent of
      H1, the library will try to find the object associated with it and
      create it if necessary, in which case another (recursive) hints lookup
      will be performed. This hints lookup for {mask Y, A-TCAM} will either
      return H2 or H3 since the driver passes the library an object comparison
      function that ignores the A-TCAM / C-TCAM indication.
      
      This can eventually lead to nested objects which are not supported by
      the library [1].
      
      Fix by removing the object comparison function from both the driver and
      the library as the driver was the only user. That way the lookup will
      only return exact matches.
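      The difference between the removed mask-only comparison and exact
      matching can be shown with the hints from the example above. In this
      sketch, struct obj_model, struct hint_model and hint_lookup() are
      hypothetical, and the hints sit in a plain array rather than in the
      library's hash table. With exact comparison, a lookup for
      {mask Y, A-TCAM} can only return H2, never H3, whatever order the
      hints are stored in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object: a mask plus the TCAM it is destined for. */
struct obj_model {
	uint32_t mask;
	bool ctcam; /* false: A-TCAM, true: C-TCAM */
};

struct hint_model {
	const char *name;
	struct obj_model obj;
};

/* Exact comparison: both the mask and the TCAM indication must match. */
static bool obj_equal(const struct obj_model *a, const struct obj_model *b)
{
	return a->mask == b->mask && a->ctcam == b->ctcam;
}

static const struct hint_model *hint_lookup(const struct hint_model *hints,
					    size_t n,
					    const struct obj_model *key)
{
	assert(hints != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (obj_equal(&hints[i].obj, key))
			return &hints[i];
	return NULL; /* no exact match */
}
```

      A comparison that ignored ctcam would return whichever of H2 and H3 it
      met first; keeping both fields in the equality check removes that
      ambiguity.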
      
      I do not have a reliable reproducer that can reproduce the issue in a
      timely manner, but before the fix the issue would reproduce in several
      minutes and with the fix it does not reproduce in over an hour.
      
      Note that the current usefulness of the hints is limited because they
      include the C-TCAM indication and represent aggregation that cannot
      actually happen. This will be addressed in net-next.
      
      [1]
      WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
      Modules linked in:
      CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
      Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
      [...]
      Call Trace:
       <TASK>
       __objagg_obj_get+0x2bb/0x580
       objagg_obj_get+0xe/0x80
       mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
       mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: objagg: Fix general protection fault · b4a3a89f
      Ido Schimmel authored
      The library supports aggregation of objects into other objects only if
      the parent object does not have a parent itself. That is, nesting is not
      supported.
      
      Aggregation happens in two cases: Without and with hints, where hints
      are a pre-computed recommendation on how to aggregate the provided
      objects.
      
      Nesting is not possible in the first case due to a check that prevents
      it, but in the second case there is no check because the assumption is
      that nesting cannot happen when creating objects based on hints. The
      violation of this assumption leads to various warnings and eventually to
      a general protection fault [1].
      
      Before fixing the root cause, error out when nesting happens and warn.
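      The "error out and warn" guard amounts to refusing to attach an object
      to a parent that itself has a parent. The sketch below models it in
      userspace with a hypothetical struct agg_obj and helper
      agg_obj_parent_assign(); it is not the lib/objagg code, and fprintf()
      stands in for a kernel warning such as WARN_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified object; the real objagg structures differ. */
struct agg_obj {
	struct agg_obj *parent; /* NULL for a root object */
};

/* Attach obj to parent only if parent is a root: nesting is unsupported,
 * so a parent that has its own parent is rejected with a warning. */
static int agg_obj_parent_assign(struct agg_obj *obj, struct agg_obj *parent)
{
	assert(obj != NULL && parent != NULL);
	if (parent->parent != NULL) {
		fprintf(stderr, "warning: nested aggregation rejected\n");
		return -EINVAL;
	}
	obj->parent = parent;
	return 0;
}
```

      Returning an error leaves the caller's object unattached instead of
      building the nested chain that later led to the fault.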
      
      [1]
      general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
      CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G        W          6.9.0-rc6-custom-gd9b4f1cca7fb #7
      Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
      [...]
      Call Trace:
       <TASK>
       mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
       worker_thread+0x2cb/0x3e0
       kthread+0xd0/0x100
       ret_from_fork+0x34/0x50
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl_atcam: Fix wrong comment · 06fcdf24
      Ido Schimmel authored
      The key is encoded, not encrypted.
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: test_objagg: Fix spelling · 2aad28ec
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: objagg: Fix spelling · c1e156ae
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Jun, 2024 3 commits
  3. 07 Jun, 2024 2 commits
  4. 06 Jun, 2024 22 commits