1. 24 Jun, 2023 10 commits
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · a685d0df
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-06-23
      
      We've added 49 non-merge commits during the last 24 day(s) which contain
      a total of 70 files changed, 1935 insertions(+), 442 deletions(-).
      
      The main changes are:
      
      1) Extend bpf_fib_lookup helper to allow passing the route table ID,
         from Louis DeLosSantos.
      
      2) Fix regsafe() in verifier to call check_ids() for scalar registers,
         from Eduard Zingerman.
      
      3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and()
         and a rework of bpf_cpumask_any*() kfuncs. Additionally,
         add selftests, from David Vernet.
      
      4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings,
         from Gilad Sever.
      
      5) Change bpf_link_put() to use workqueue unconditionally to fix it
         under PREEMPT_RT, from Sebastian Andrzej Siewior.
      
      6) Follow-ups to address issues in the bpf_refcount shared ownership
         implementation, from Dave Marchevsky.
      
      7) A few general refactorings to BPF map and program creation permissions
         checks which were part of the BPF token series, from Andrii Nakryiko.
      
      8) Various fixes for benchmark framework and add a new benchmark
         for BPF memory allocator to BPF selftests, from Hou Tao.
      
      9) Documentation improvements around iterators and trusted pointers,
         from Anton Protopopov.
      
      10) Small cleanup in verifier to improve allocated object check,
          from Daniel T. Lee.
      
      11) Improve performance of bpf_xdp_pointer() by avoiding access
          to shared_info when XDP packet does not have frags,
          from Jesper Dangaard Brouer.
      
      12) Silence a harmless syzbot-reported warning in btf_type_id_size(),
          from Yonghong Song.
      
      13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper,
          from Jarkko Sakkinen.
      
      14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS,
          from Viktor Malik.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
        bpf, docs: Document existing macros instead of deprecated
        bpf, docs: BPF Iterator Document
        selftests/bpf: Fix compilation failure for prog vrf_socket_lookup
        selftests/bpf: Add vrf_socket_lookup tests
        bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings
        bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint
        bpf: Factor out socket lookup functions for the TC hookpoint.
        selftests/bpf: Set the default value of consumer_cnt as 0
        selftests/bpf: Ensure that next_cpu() returns a valid CPU number
        selftests/bpf: Output the correct error code for pthread APIs
        selftests/bpf: Use producer_cnt to allocate local counter array
        xsk: Remove unused inline function xsk_buff_discard()
        bpf: Keep BPF_PROG_LOAD permission checks clear of validations
        bpf: Centralize permissions checks for all BPF map types
        bpf: Inline map creation logic in map_create() function
        bpf: Move unprivileged checks into map_create() and bpf_prog_load()
        bpf: Remove in_atomic() from bpf_link_put().
        selftests/bpf: Verify that check_ids() is used for scalars in regsafe()
        bpf: Verify scalar ids mapping in regsafe() using check_ids()
        selftests/bpf: Check if mark_chain_precision() follows scalar ids
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a685d0df
    • Merge branch 'mlxsw-maintain-candidate-rifs' · d1d29a42
      Jakub Kicinski authored
      Petr Machata says:
      
      ====================
      mlxsw: Maintain candidate RIFs
      
      The mlxsw driver currently makes the assumption that the user applies
      configuration in a bottom-up manner. Thus netdevices need to be added to
      the bridge before IP addresses are configured on that bridge or SVI added
      on top of it. Enslaving a netdevice to another netdevice that already has
      uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
      it is rather easy to get into situations where the offloaded configuration
      is just plain wrong.
      
      As an example, take a front panel port, configure an IP address: it gets a
      RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
      port from the bridge again, but the RIF never comes back. There are a number
      of similar situations, where changing the configuration there and back
      utterly breaks the offload.
      
      The situation is going to be made better by implementing a range of replays
      and post-hoc offloads.
      
      This patch set lays the ground for replay of next hops. The particular
      issue that it deals with is that currently, driver-specific bookkeeping for
      next hops is hooked off RIF objects, which come and go across the lifetime
      of a netdevice. We would rather keep these objects at an entity that
      mirrors the lifetime of the netdevice itself. That way they are at hand and
      can be offloaded when a RIF is eventually created.
      
      To that end, with this patchset, mlxsw keeps a hash table of CRIFs:
      candidate RIFs, persistent handles for netdevices that mlxsw deems
      potentially interesting. The lifetime of a CRIF matches that of the
      underlying netdevice, and thus a RIF can always assume a CRIF exists. A
      CRIF is where next hops are kept, and when RIF is created, these next hops
      can be easily offloaded. (Previously only the next hops created after the
      RIF was created were offloaded.)
      
      - Patches #1 and #2 are minor adjustments.
      - In patches #3 and #4, add CRIF bookkeeping.
      - In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF,
        the corresponding CRIF is easy to look up.
      - Patch #6 is a clean-up allowed by the previous patches.
      - Patches #7 and #8 move next hop tracking to CRIFs.
      
      No observable effects are intended as of yet. This will be useful once
      there is support for RIF creation for netdevices that become mlxsw uppers,
      which will come in following patch sets.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d1d29a42
    • mlxsw: spectrum_router: Track next hops at CRIFs · 9464a3d6
      Petr Machata authored
      Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
      reason is that eventually, next hops for mlxsw uppers should be offloaded
      and unoffloaded on demand as a netdevice becomes an upper, or stops being
      one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
      netdevice is not an mlxsw upper. CRIFs are kept track of throughout the
      netdevice lifetime.
      
      Correspondingly, track at each next hop not its RIF, but its CRIF (from
      which a RIF can always be deduced).
      
      Note that now that next hops are tracked at a CRIF, it is not necessary to
      move each over to a new RIF when it is necessary to edit a RIF. Therefore
      drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
      call mlxsw_sp_nexthop_rif_update() directly.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9464a3d6
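
The idea of tracking next hops at a CRIF instead of a RIF can be sketched as a small userspace model. This is not the driver code: every name below (`struct crif`, `crif_nexthop_add()`, `crif_rif_set()`, `MAX_NH`) is hypothetical, and the fixed-size array merely stands in for the driver's list. The sketch assumes the behaviour the cover letter describes as the goal, namely that next hops added before a RIF exists become offloaded once the RIF appears.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NH 8 /* arbitrary capacity for this sketch */

struct rif { int index; };
struct crif;

/* A next hop points at its CRIF; the RIF is deduced through it. */
struct nexthop {
    struct crif *crif;
    bool offloaded;
};

/* The CRIF outlives any RIF, so it owns the next hop bookkeeping. */
struct crif {
    struct rif *rif; /* NULL while no RIF is instantiated */
    struct nexthop *nhs[MAX_NH];
    size_t nh_count;
};

/* Deduce the RIF of a next hop from its CRIF (may be NULL). */
struct rif *nexthop_rif(const struct nexthop *nh)
{
    return nh->crif ? nh->crif->rif : NULL;
}

/* Register a next hop; it is offloaded only if a RIF already exists. */
bool crif_nexthop_add(struct crif *crif, struct nexthop *nh)
{
    if (crif->nh_count == MAX_NH)
        return false;
    nh->crif = crif;
    nh->offloaded = crif->rif != NULL;
    crif->nhs[crif->nh_count++] = nh;
    return true;
}

/* Attach (rif != NULL) or detach (rif == NULL) the RIF and update every
 * tracked next hop accordingly. The next hops stay on the CRIF either way.
 */
void crif_rif_set(struct crif *crif, struct rif *rif)
{
    size_t i;

    crif->rif = rif;
    for (i = 0; i < crif->nh_count; i++)
        crif->nhs[i]->offloaded = rif != NULL;
}
```

In this model nothing has to be migrated when a RIF is replaced: the next hops never leave the CRIF, and only the `rif` pointer changes.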
    • mlxsw: spectrum_router: Split nexthop finalization to two stages · a285d664
      Petr Machata authored
      Nexthop finalization consists of two steps: the part where the offload is
      removed, because the backing RIF is now gone; and the part where the
      association to the RIF is severed.
      
      Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
      unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
      be called independently.
      
      Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
      vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
      historical happenstance than a conscious decision. The two cleanups do not
      depend on each other, and this change should have no observable effects.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a285d664
    • mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index · bdc0b78e
      Petr Machata authored
      A previous patch added a pointer to loopback CRIF to the router data
      structure. That makes the loopback RIF index redundant, as everything
      necessary can be derived from the CRIF. Drop the field and adjust the code
      accordingly.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bdc0b78e
    • mlxsw: spectrum_router: Link CRIFs to RIFs · aa21242b
      Petr Machata authored
      When a RIF is about to be created, the registration of the netdevice that
      it should be associated with must have been seen in the past, and a CRIF
      created. Therefore make this a hard requirement by looking up the CRIF
      during RIF creation, and complaining loudly when there isn't one.
      
      This then allows keeping a link between a RIF and its corresponding
      CRIF (and back, as the relationship is one-to-at-most-one), which this
      patch does.
      
      The CRIF will later be useful as the objects tracked there will be
      offloaded lazily as a result of RIF creation.
      
      CRIFs are created when an "interesting" netdevice is registered, and
      destroyed after such device is unregistered. CRIFs are supposed to already
      exist when a RIF creation request arises, and exist at least as long as
      that RIF exists. This makes for a simple invariant: it is always safe to
      dereference CRIF pointer from "its" RIF.
      
      To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
      event is delivered. The reason is that if a RIF's netdevice has an IPv6
      address, removal of this address is notified in an atomic block. To remove
      the RIF, the IPv6 removal handler schedules a work item. It must be safe
      for this work item to access the associated CRIF as well.
      
      Thus when a netdevice that backs the CRIF is removed, if it still has a
      RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
      this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aa21242b
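
The deferred-free rule described in this commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `can_destroy` is the flag named in the message, but `crif_alloc()`, `rif_create()`, `crif_netdev_unregistered()` and `rif_destroy()` are hypothetical stand-ins, not the driver's functions, and locking and work items are left out entirely.

```c
#include <stdbool.h>
#include <stdlib.h>

struct rif;

struct crif {
    struct rif *rif;  /* at most one RIF per CRIF */
    bool can_destroy; /* set when the netdevice is gone but a RIF remains */
};

struct rif {
    struct crif *crif; /* invariant: always valid while the RIF exists */
};

struct crif *crif_alloc(void)
{
    return calloc(1, sizeof(struct crif));
}

/* RIF creation requires an existing CRIF and links the two both ways. */
struct rif *rif_create(struct crif *crif)
{
    struct rif *rif;

    if (!crif || crif->rif)
        return NULL;
    rif = calloc(1, sizeof(*rif));
    if (!rif)
        return NULL;
    rif->crif = crif;
    crif->rif = rif;
    return rif;
}

/* Netdevice unregistered: free the CRIF now only if no RIF uses it.
 * Returns true if the CRIF was freed, false if the free was deferred.
 */
bool crif_netdev_unregistered(struct crif *crif)
{
    if (crif->rif) {
        crif->can_destroy = true;
        return false;
    }
    free(crif);
    return true;
}

/* Destroying the RIF collects a CRIF whose free was deferred.
 * Returns true if the CRIF was freed as well.
 */
bool rif_destroy(struct rif *rif)
{
    struct crif *crif = rif->crif;
    bool collected = false;

    crif->rif = NULL;
    free(rif);
    if (crif->can_destroy) {
        free(crif);
        collected = true;
    }
    return collected;
}
```

Whichever of the two events arrives last frees the CRIF, so code that still holds the RIF can keep dereferencing `rif->crif` until `rif_destroy()` runs.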
    • mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF · 78126cfd
      Petr Machata authored
      CRIFs are generally not maintained for loopback RIFs. However, the RIF for
      the default VRF is used for offloading of blackhole nexthops. Nexthops
      expect to have a valid CRIF. Therefore in this patch, add code to maintain
      CRIF for the loopback RIF as well.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      78126cfd
    • mlxsw: spectrum_router: Maintain a hash table of CRIFs · 4796c287
      Petr Machata authored
      CRIFs are objects that mlxsw maintains for netdevices that may not have an
      associated RIF (i.e. they may not have been instantiated in the ASIC), but
      if indeed they do not, it is quite possible they will in the future. These
      netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
      created include e.g. bridges, LAGs, or front panel ports. The idea is that
      next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
      offload and unoffload the entities that have been added before the RIF was
      created.
      
      In this patch, add the code for low-level CRIF maintenance: create and
      destroy, and keep in a table keyed by the netdevice pointer for easy
      recall.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4796c287
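
A table "keyed by the netdevice pointer" might look roughly like the following userspace sketch. It is an illustration only: the chained hash table, the bucket count, and all names (`struct crif_table`, `crif_create()`, `crif_lookup()`, `crif_destroy()`, the stand-in `struct netdev`) are assumptions made for this example and say nothing about how the driver implements its table.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CRIF_BUCKETS 64 /* arbitrary size for this sketch */

struct netdev { const char *name; }; /* stand-in for struct net_device */

struct crif {
    const struct netdev *dev; /* the key: the netdevice pointer itself */
    struct crif *next;        /* bucket chain */
};

struct crif_table {
    struct crif *buckets[CRIF_BUCKETS];
};

/* Hash the pointer value; the low bits are dropped as they tend to be
 * identical for aligned objects.
 */
static size_t crif_hash(const struct netdev *dev)
{
    return (size_t)(((uintptr_t)dev >> 4) % CRIF_BUCKETS);
}

struct crif *crif_lookup(struct crif_table *t, const struct netdev *dev)
{
    struct crif *c;

    for (c = t->buckets[crif_hash(dev)]; c; c = c->next)
        if (c->dev == dev)
            return c;
    return NULL;
}

/* Create the CRIF for a netdevice, or return the existing one. */
struct crif *crif_create(struct crif_table *t, const struct netdev *dev)
{
    struct crif *c = crif_lookup(t, dev);
    size_t b = crif_hash(dev);

    if (c)
        return c;
    c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->dev = dev;
    c->next = t->buckets[b];
    t->buckets[b] = c;
    return c;
}

/* Unlink and free the CRIF of a netdevice, if there is one. */
void crif_destroy(struct crif_table *t, const struct netdev *dev)
{
    struct crif **pp = &t->buckets[crif_hash(dev)];

    while (*pp) {
        if ((*pp)->dev == dev) {
            struct crif *c = *pp;

            *pp = c->next;
            free(c);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Keying by pointer makes the lookup from a netdevice event handler a single call, for example `crif_lookup(&table, dev)`, without needing any identifier beyond the device pointer the handler already has.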
    • mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF · f3c85eed
      Petr Machata authored
      The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
      function mentioned in the subject. As such it forms an external interface
      of the router code.
      
      In future patches we will want to maintain connection between RIFs and the
      CRIFs (introduced in the next patch) that back them. That will not hold
      for the VRF-based loopback netdevices, so the whole CRIF business can be
      kept hidden from the rest of mlxsw.
      
      But for the main VRF loopback RIF we do want to keep the RIF-CRIF
      connection, because that RIF is used for blackhole next hops, and the next
      hop code can be kept simpler by assuming rif->crif is valid.
      
      Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
      RIF. Being an internal function, it will take the CRIF argument anyway.
      Furthermore, the function does not lock, which is not necessary at this
      point in code yet.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3c85eed
  2. 23 Jun, 2023 30 commits