1. 27 Nov, 2018 25 commits
    • Merge branch 'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN' · 50853808
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Prepare for VLAN-aware bridge w/VxLAN
      
      The driver is using 802.1Q filtering identifiers (FIDs) to represent the
      different VLANs in the VLAN-aware bridge (only one is supported).
      
      However, the device cannot assign a VNI to such FIDs, which prevents the
      driver from supporting the enslavement of VxLAN devices to the
      VLAN-aware bridge.
      
      This patchset works around this limitation by emulating 802.1Q FIDs
      using 802.1D FIDs, which can be assigned a VNI and so far have only been
      used in conjunction with VLAN-unaware bridges.
      
      The downside of this approach is that multiple {Port,VID}->FID entries
      are required, whereas a single VID->FID entry is required with "true"
      802.1Q FIDs.
      
      First four patches introduce the new FID family of emulated 802.1Q FIDs
      and the associated type of router interfaces (RIFs). Last patch flips
      the driver to use this new FID family.
      
      The diff is relatively small because the internal implementation of each
      FID family is contained and hidden in spectrum_fid.c. Different internal
      users (e.g., bridge, router) are aware of the different FID types, but
      do not care about their internal implementation. This makes it trivial
      to swap the current implementation of 802.1Q FIDs with the new one,
      using 802.1D FIDs.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs · c2e7490c
      Ido Schimmel authored
      Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts.
      
      The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same
      flood tables, of per-FID type. Therefore, add 4K-1 entries to the
      per-FID flood tables for the new FIDs and get rid of the FID-offset
      flood tables that were used by the old 802.1Q FIDs.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Introduce emulated VLAN RIFs · ba6da02a
      Ido Schimmel authored
      Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
      "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges are
      of "FID" type.
      
      In other words, the RIF type is derived from the underlying FID type.
      VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on
      top of 802.1D FIDs.
      
      Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this
      patch emulates VLAN RIFs using FID RIFs.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs · d62dd8a0
      Ido Schimmel authored
      The driver uses 802.1Q FIDs when offloading a VLAN-aware bridge.
      Unfortunately, it is not possible to assign a VNI to such FIDs, which
      prompts the driver to forbid the enslavement of VxLAN devices to a
      VLAN-aware bridge.
      
      Work around this hardware limitation by creating a new family of FIDs,
      emulated 802.1Q FIDs. These FIDs are emulated using 802.1D FIDs, which
      can be assigned a VNI.
      
      The downside of this approach is that multiple {Port, VID}->FID entries
      are required, whereas only a single VID->FID is required with "true"
      802.1Q FIDs.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
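To make the stated downside concrete, here is a minimal sketch that counts mapping entries under both schemes. It assumes that the emulated scheme needs one {Port, VID}->FID entry per port that is a member of a VLAN; the commit message only says "multiple" entries are required. The function names are hypothetical and are not taken from the driver.

```c
#include <stddef.h>

/* "True" 802.1Q FIDs: a single global VID->FID entry per VLAN. */
size_t q_fid_entries(size_t n_vlans)
{
    return n_vlans;
}

/*
 * Emulated 802.1Q FIDs (backed by 802.1D FIDs): assumed here to need one
 * {Port, VID}->FID entry for every port that is a member of each VLAN.
 * ports_per_vlan[i] is the number of member ports of the i-th VLAN.
 */
size_t emulated_fid_entries(const size_t *ports_per_vlan, size_t n_vlans)
{
    size_t total = 0;

    for (size_t i = 0; i < n_vlans; i++)
        total += ports_per_vlan[i];
    return total;
}
```

Under this assumption, three VLANs with 4, 2 and 8 member ports need 3 entries with true 802.1Q FIDs and 14 entries with the emulated ones.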
    • mlxsw: spectrum_fid: Make flood index calculation more robust · 7c4a7292
      Ido Schimmel authored
      802.1D FIDs use a per-FID flood table, where the flood index into the
      table is calculated by subtracting 4K from the FID's index.
      
      Currently, 802.1D FIDs start at 4K, so the calculation is correct, but
      if that were ever to change, the calculation would no longer be correct.
      
      In addition, this change will allow us to reuse the flood index
      calculation function in the next patch, where we are going to emulate
      802.1Q FIDs using 802.1D FIDs.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
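The commit message does not spell out the new calculation. One plausible way to make it robust, sketched below, is to subtract the first FID index of the FID's own family instead of the constant 4K. The struct and function names are illustrative assumptions, not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical FID family: a contiguous range of FID indexes. */
struct fid_family {
    uint16_t start_index;
    uint16_t end_index;
};

struct fid {
    const struct fid_family *family;
    uint16_t fid_index;
};

/* Fragile variant: only correct while the family starts at 4K. */
uint16_t flood_index_fixed(const struct fid *fid)
{
    return (uint16_t)(fid->fid_index - 4096);
}

/* Robust variant: index relative to wherever the family starts. */
uint16_t flood_index_robust(const struct fid *fid)
{
    return (uint16_t)(fid->fid_index - fid->family->start_index);
}
```

Both variants agree while the family starts at 4K. Only the second keeps returning a zero-based table index if the range is moved, and it can be shared by any family that uses a per-FID flood table.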
    • mlxsw: spectrum_switchdev: Do not set field when it is reserved · 6502be9f
      Ido Schimmel authored
      When configuring an FDB entry pointing to a LAG netdev (or its upper),
      the driver should only set the 'lag_vid' field when the FID (filtering
      identifier) is of 802.1D type.
      
      Extend the 802.1D FID family with an attribute indicating whether this
      field should be set and based on its value set the field or leave it
      blank.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
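A minimal sketch of the attribute-driven behaviour described above. The attribute name and the helper are hypothetical, and "leave it blank" is assumed to mean writing zero to the field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-family attribute: may 'lag_vid' be set for this family? */
struct fid_family {
    bool lag_vid_valid;
};

/*
 * Value to program into the FDB entry's 'lag_vid' field: the VID when the
 * family allows it, otherwise 0 (assumed here to mean "left blank").
 */
uint16_t fdb_lag_vid(const struct fid_family *family, uint16_t vid)
{
    return family->lag_vid_valid ? vid : 0;
}
```

Keeping the decision in a family attribute means the FDB code does not need to know which FID type it is dealing with.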
    • net: aquantia: return 'err' if set MPI_DEINIT state fails · 4e3c7c00
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:260:7:
       warning: variable 'err' set but not used [-Wunused-but-set-variable]
      
      'err' should be returned when setting the MPI_DEINIT state fails
      in hw_atl_utils_soft_reset().
      
      Fixes: cce96d18 ("net: aquantia: Regression on reset with 1.x firmware")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-bools' · ff223789
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: add an option to disable linklocal learning
      
      This set adds a new bridge option which can control learning from
      link-local packets. By default, learning stays on, to be consistent and
      avoid breaking users' expectations. If the new no_linklocal_learn option
      is enabled, then the bridge will stop learning from link-local packets.
      
      In order to save space for future boolean options, patch 01 adds a new
      bool option API that uses a bitmask to control boolean options. The
      bridge is by far the largest netlink attr user and we keep adding simple
      boolean options which waste nl attr ids and space. We're not directly
      mapping these to the in-kernel bridge flags because some might require
      more complex configuration changes (e.g. if we were to add the per port
      vlan stats now, it'd require multiple checks before changing value).
      Any new bool option needs to be handled by both br_boolopt_toggle and
      br_boolopt_get in order to be able to retrieve its state later. All such
      options are automatically exported via netlink. The behaviour of setting
      such options is consistent with netlink option handling when a missing
      option is being set (silently ignored), e.g. when a newer iproute2 is
      used on an older kernel. All supported options are exported via bm's
      optmask when dumping the new attribute.
      
      v2: address Andrew Lunn's comments, squash a minor change into patch 01,
          export all supported options via optmask when dumping, add patch 03,
          pass down extack so options can return meaningful errors, add
          WARN_ON on unsupported options (should not happen)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: export supported boolopts · 1ed1ccb9
      Nikolay Aleksandrov authored
      Now that we have at least one bool option, we can export all of the
      supported bool options via optmask when dumping them.
      
      v2: new patch
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add no_linklocal_learn bool option · 70e4272b
      Nikolay Aleksandrov authored
      Use the new boolopt API to add an option which disables learning from
      link-local packets. The default is kept as before and learning is
      enabled. This is a simple map from a boolopt bit to a bridge private
      flag that is tested before learning.
      
      v2: pass NULL for extack via sysfs
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
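The "flag that is tested before learning" can be pictured with the standalone sketch below. It assumes that "link-local packets" means frames sent to the IEEE 802.1 reserved group addresses 01:80:C2:00:00:00 through 01:80:C2:00:00:0F; the helper names are hypothetical and this is not the bridge code itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True for destinations in 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F. */
bool is_link_local(const uint8_t mac[6])
{
    static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

    return memcmp(mac, base, sizeof(base)) == 0 && (mac[5] & 0xf0) == 0;
}

/*
 * Decide whether the source address of a frame may be learned.
 * no_linklocal_learn mirrors the new option; it defaults to false, so the
 * default behaviour (learn from everything) is unchanged.
 */
bool should_learn(const uint8_t dest[6], bool no_linklocal_learn)
{
    if (no_linklocal_learn && is_link_local(dest))
        return false;
    return true;
}
```

With the option off, an STP frame to 01:80:C2:00:00:00 still triggers learning. With the option on, only frames to other destinations do.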
    • net: bridge: add support for user-controlled bool options · a428afe8
      Nikolay Aleksandrov authored
      We have been adding many new bridge options, a big number of which are
      boolean but still take up netlink attribute ids and waste space in the skb.
      Recently we discussed learning from link-local packets[1] and decided
      yet another new boolean option will be needed, thus introducing this API
      to save some bridge nl space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct which has an optmask (which options to
      set, bit per opt) and optval (options' new values). Future boolean
      options will only be added to the br_boolopt_id enum and then will have
      to be handled in br_boolopt_toggle/get. The API will automatically
      add the ability to change and export them via netlink, sysfs can use the
      single boolopt function versions to do the same. The behaviour with
      failing/succeeding is the same as with normal netlink option changing.
      
      If an option requires mapping to internal kernel flag or needs special
      configuration to be enabled then it should be handled in
      br_boolopt_toggle. It should also be able to retrieve an option's current
      state via br_boolopt_get.
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.html
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
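The optmask/optval scheme can be illustrated with a small userspace model. The names below only echo those in the commit message (br_boolopt_multi, br_boolopt_toggle, br_boolopt_get); the types, the single option and the bridge state are simplified assumptions rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option ids: one bit position per boolean option. */
enum boolopt_id { BOOLOPT_NO_LL_LEARN, BOOLOPT_MAX };

/* optmask: which options to change; optval: their new values. */
struct boolopt_multi {
    uint32_t optval;
    uint32_t optmask;
};

/* Simplified stand-in for the bridge's private flags. */
struct bridge_state {
    bool no_ll_learn;
};

/* Set a single option; every new option needs a case here... */
int boolopt_toggle(struct bridge_state *br, enum boolopt_id opt, bool on)
{
    switch (opt) {
    case BOOLOPT_NO_LL_LEARN:
        br->no_ll_learn = on;
        return 0;
    default:
        return -1; /* unsupported option */
    }
}

/* ...and a matching case here so its state can be read back. */
int boolopt_get(const struct bridge_state *br, enum boolopt_id opt)
{
    switch (opt) {
    case BOOLOPT_NO_LL_LEARN:
        return br->no_ll_learn;
    default:
        return 0;
    }
}

/* Apply several options at once; mask bits we do not know are ignored. */
int boolopt_multi_toggle(struct bridge_state *br,
                         const struct boolopt_multi *bm)
{
    for (int id = 0; id < BOOLOPT_MAX; id++) {
        uint32_t bit = UINT32_C(1) << id;
        int err;

        if (!(bm->optmask & bit))
            continue;
        err = boolopt_toggle(br, (enum boolopt_id)id, (bm->optval & bit) != 0);
        if (err)
            return err;
    }
    return 0;
}

/* Dump: optmask reports every supported option, optval its current state. */
void boolopt_multi_get(const struct bridge_state *br, struct boolopt_multi *bm)
{
    bm->optval = 0;
    bm->optmask = 0;
    for (int id = 0; id < BOOLOPT_MAX; id++) {
        uint32_t bit = UINT32_C(1) << id;

        bm->optmask |= bit;
        if (boolopt_get(br, (enum boolopt_id)id))
            bm->optval |= bit;
    }
}
```

In this model a caller that sets a mask bit for an option the receiver does not know about is silently ignored, and a dump reports the supported set through optmask, which is what lets userspace detect which options exist.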
    • Merge branch 'virtio-support-packed-ring' · 02c72d5e
      David S. Miller authored
      Tiwei Bie says:
      
      ====================
      virtio: support packed ring
      
      This patch set implements packed ring support in virtio driver.
      
      A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
      and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done; I saw a
      ~30% performance gain with packed ring in this case.
      
      To make this patch set work with the vhost patch set below,
      some hacks are needed to set the _F_NEXT flag in indirect
      descriptors (this should be fixed in vhost):
      
      https://lkml.org/lkml/2018/7/3/33
      
      v2 -> v3:
      - Use leXX instead of virtioXX (MST);
      - Refactor split ring first (MST);
      - Add debug helpers (MST);
      - Put split/packed ring specific fields in sub structures (MST);
      - Handle normal descriptors and indirect descriptors differently (MST);
      - Track the DMA addr/len related info in a separate structure (MST);
      - Calculate AVAIL/USED flags only when wrap counter wraps (MST);
      - Define a struct/union to read event structure (MST);
      - Define a macro for wrap counter bit in uapi (MST);
      - Define the AVAIL/USED bits as shifts instead of values (MST);
      - s/_F_/_FLAG_/ in VRING_PACKED_EVENT_* as they are values (MST);
      - Drop the notify workaround for QEMU's tx-timer in packed ring (MST);
      
      v1 -> v2:
      - Use READ_ONCE() to read event off_wrap and flags together (Jason);
      - Add comments related to ccw (Jason);
      
      RFC v6 -> v1:
      - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
        when event idx is off (Jason);
      - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
      - Test the state of the desc at used_idx instead of last_used_idx
        in virtqueue_enable_cb_delayed_packed() (Jason);
      - Save wrap counter (as part of queue state) in the return value
        of virtqueue_enable_cb_prepare_packed();
      - Refine the packed ring definitions in uapi;
      - Rebase on the net-next tree;
      
      RFC v5 -> RFC v6:
      - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
      - Define wrap counter as bool (Jason);
      - Use ALIGN() in vring_init_packed() (Jason);
      - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
      - Add comments for barriers (Jason);
      - Don't enable RING_PACKED on ccw for now (noticed by Jason);
      - Refine the memory barrier in virtqueue_poll();
      - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
      - Remove the hacks in virtqueue_enable_cb_prepare_packed();
      
      RFC v4 -> RFC v5:
      - Save DMA addr, etc in desc state (Jason);
      - Track used wrap counter;
      
      RFC v3 -> RFC v4:
      - Make ID allocation support out-of-order (Jason);
      - Various fixes for EVENT_IDX support;
      
      RFC v2 -> RFC v3:
      - Split into small patches (Jason);
      - Add helper virtqueue_use_indirect() (Jason);
      - Just set id for the last descriptor of a list (Jason);
      - Calculate the prev in virtqueue_add_packed() (Jason);
      - Fix/improve desc suppression code (Jason/MST);
      - Refine the code layout for XXX_split/packed and wrappers (MST);
      - Fix the comments and API in uapi (MST);
      - Remove the BUG_ON() for indirect (Jason);
      - Some other refinements and bug fixes;
      
      RFC v1 -> RFC v2:
      - Add indirect descriptor support - compile test only;
      - Add event suppression support - compile test only;
      - Move vring_packed_init() out of uapi (Jason, MST);
      - Merge two loops into one in virtqueue_add_packed() (Jason);
      - Split vring_unmap_one() for packed ring and split ring (Jason);
      - Avoid using '%' operator (Jason);
      - Rename free_head -> next_avail_idx (Jason);
      - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
      - Some other refinements and bug fixes;
      ====================
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
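The changelog refers to AVAIL/USED bits defined as shifts and to wrap counters. The sketch below shows how those pieces fit together, following the packed virtqueue layout in the virtio 1.1 specification, where the AVAIL flag is bit 7 and the USED flag is bit 15 of the descriptor flags. The macro and function names are illustrative, not the ones from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions (shifts, not mask values) in the descriptor flags. */
#define DESC_F_AVAIL 7
#define DESC_F_USED  15

/*
 * Flags the driver writes to make a descriptor available: AVAIL equals the
 * driver's wrap counter and USED is its inverse.
 */
uint16_t avail_desc_flags(bool avail_wrap_counter)
{
    uint16_t avail = avail_wrap_counter ? 1 : 0;
    uint16_t used = avail_wrap_counter ? 0 : 1;

    return (uint16_t)((avail << DESC_F_AVAIL) | (used << DESC_F_USED));
}

/*
 * A descriptor has been used by the device when both bits are equal and
 * match the wrap counter the driver tracks for used descriptors.
 */
bool desc_is_used(uint16_t flags, bool used_wrap_counter)
{
    bool avail = (flags & (1u << DESC_F_AVAIL)) != 0;
    bool used = (flags & (1u << DESC_F_USED)) != 0;

    return avail == used && used == used_wrap_counter;
}
```

Because the meaning of the two bits flips each time the ring wraps, the flag values only need to be recomputed when a wrap counter changes, which is the optimisation the v2 -> v3 notes mention.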
    • virtio_ring: advertize packed ring layout · f959a128
      Tiwei Bie authored
      Advertize the packed ring layout support.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: disable packed ring on unsupported transports · 3a814fdf
      Tiwei Bie authored
      Currently, ccw, vop and remoteproc need some legacy virtio
      APIs to create or access virtio rings, which are not supported
      by packed ring. So disable packed ring on these transports
      for now.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: leverage event idx in packed ring · f51f9826
      Tiwei Bie authored
      Leverage the EVENT_IDX feature in packed ring to suppress
      events when it's available.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: introduce packed ring support · 1ce9e605
      Tiwei Bie authored
      Introduce the packed ring support. Packed ring can only be
      created by vring_create_virtqueue() and each chunk of packed
      ring will be allocated individually. Packed ring cannot
      currently be created on preallocated memory by
      vring_new_virtqueue() or the like.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: cache whether we will use DMA API · fb3fba6b
      Tiwei Bie authored
      Cache whether we will use DMA API, instead of doing the
      check every time. We are going to check whether DMA API
      is used more often in packed ring.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: extract split ring handling from ring creation · d79dca75
      Tiwei Bie authored
      Introduce a specific function to create the split ring.
      And also move the DMA allocation and size information to
      the .split sub-structure.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: allocate desc state for split ring separately · cbeedb72
      Tiwei Bie authored
      Put the split ring's desc state into the .split sub-structure,
      and allocate desc state for split ring separately. This makes
      the code more readable and more consistent with what we will
      do for packed ring.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: introduce helper for indirect feature · 2f18c2d1
      Tiwei Bie authored
      Introduce a helper to check whether we will use indirect
      feature. It will be used by packed ring too.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: introduce debug helpers · 4d6a105e
      Tiwei Bie authored
      Introduce debug helpers for last_add_time update, check and
      invalid. They will be used by packed ring too.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: put split ring fields in a sub struct · e593bf97
      Tiwei Bie authored
      Put the split ring specific fields in a sub-struct named
      as "split" to avoid misuse after introducing packed ring.
      There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: put split ring functions together · e6f633e5
      Tiwei Bie authored
      Put the xxx_split() functions together to make the
      code more readable and avoid misuse after introducing
      the packed ring. There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_ring: add _split suffix for split ring functions · 138fd251
      Tiwei Bie authored
      Add _split suffix for split ring specific functions. This
      is a preparation for introducing the packed ring support.
      There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio: add packed ring types and macros · 89a9157e
      Tiwei Bie authored
      Add types and macros for packed ring.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Nov, 2018 4 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 4afe60a9
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-11-26
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Extend BTF to support function call types and improve the BPF
         symbol handling with this info for kallsyms and bpftool program
         dump to make debugging easier, from Martin and Yonghong.
      
      2) Optimize LPM lookups by making longest_prefix_match() handle
         multiple bytes at a time, from Eric.
      
      3) Add support for loading and attaching flow dissector BPF progs
         from bpftool, from Stanislav.
      
      4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.
      
      5) Enable verifier to support narrow context loads with offset > 0
         to adapt to LLVM code generation (currently only offset of 0 was
         supported). Add test cases as well, from Andrey.
      
      6) Simplify passing device functions for offloaded BPF progs by
         adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
         Also convert nfp and netdevsim to make use of them, from Quentin.
      
      7) Add support for sock_ops based BPF programs to send events to
         the perf ring-buffer through perf_event_output helper, from
         Sowmini and Daniel.
      
      8) Add read / write support for skb->tstamp from tc BPF and cg BPF
         programs to allow for supporting rate-limiting in EDT qdiscs
         like fq from BPF side, from Vlad.
      
      9) Extend libbpf API to support map in map types and add test cases
         for it as well to BPF kselftests, from Nikita.
      
      10) Account the maximum packet offset accessed by a BPF program in
          the verifier and use it for optimizing nfp JIT, from Jiong.
      
      11) Fix error handling regarding kprobe_events in BPF sample loader,
          from Daniel T.
      
      12) Add support for queue and stack map type in bpftool, from David.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
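Item 2 of the pull request mentions handling multiple bytes at a time in longest_prefix_match(). The general idea can be sketched as follows: compare 32-bit big-endian words, and when two words differ, the number of leading zero bits in their XOR is the number of further matching bits. This is a standalone illustration under those assumptions, with hypothetical names, and not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian word so bit order matches prefix order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Count leading zero bits of a non-zero 32-bit value. */
static unsigned int clz32(uint32_t x)
{
    unsigned int n = 0;

    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Number of leading bits shared by a and b (each len_bytes long),
 * capped at limit_bits.
 */
size_t prefix_match_bits(const uint8_t *a, const uint8_t *b,
                         size_t len_bytes, size_t limit_bits)
{
    size_t matched = 0;
    size_t i = 0;

    /* Fast path: four bytes per iteration. */
    for (; i + 4 <= len_bytes; i += 4) {
        uint32_t diff = load_be32(a + i) ^ load_be32(b + i);

        if (diff) {
            matched += clz32(diff);
            return matched < limit_bits ? matched : limit_bits;
        }
        matched += 32;
        if (matched >= limit_bits)
            return limit_bits;
    }

    /* Tail: remaining bytes one at a time. */
    for (; i < len_bytes; i++) {
        uint8_t diff = a[i] ^ b[i];

        if (diff) {
            matched += clz32((uint32_t)diff << 24);
            return matched < limit_bits ? matched : limit_bits;
        }
        matched += 8;
        if (matched >= limit_bits)
            return limit_bits;
    }
    return matched < limit_bits ? matched : limit_bits;
}
```

For example, 192.168.1.0 and 192.168.1.128 share their first 24 bits, which the word-sized comparison finds in a single step instead of four byte comparisons.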
    • bpf: align map type names formatting. · ffac28f9
      David Calavera authored
      Make the formatting for map_type_name array consistent.
      Signed-off-by: David Calavera <david.calavera@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: btf: fix spelling mistake "Memmber" -> "Member" · 311fe1a8
      Colin Ian King authored
      There is a spelling mistake in a btf_verifier_log_member message,
      fix it.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, tags: Fix DEFINE_PER_CPU expansion · cf0dd411
      Rustam Kovhaev authored
      Building tags produces warning:
      
        ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1"
      
      Let's use the same fix as in commit 25528213 ("tags: Fix DEFINE_PER_CPU
      expansions"), even though it violates the usual code style.
      Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  3. 25 Nov, 2018 10 commits
  4. 24 Nov, 2018 1 commit