1. 03 Dec, 2017 9 commits
    • Merge branch 'dsa-cross-chip-FDB-support' · 75d0de8c
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: cross-chip FDB support
      
      DSA can have interconnected switches. For instance, the ZII Dev Rev B
      board described in arch/arm/boot/dts/vf610-zii-dev-rev-b.dts has a
      switch fabric composed of 3 switch devices like this:
      
                                lan4                 lan6
              CPU (eth1)            |  lan5         |  lan7
                        |           | |             | |
             [0 1 2 3 4 6 5]---[6 0 1 2 3 4 5]---[9 0 1 2 3 4 5 6 7 8]
              | | |               |                     | | |
          lan0  |  lan2       lan3                  lan8  |  optical4
                 lan1                                      optical3
      
      One current issue with DSA is cross-chip FDB. If we add a static MAC
      address on lan3, only its parent switch 1 (the one in the middle) will
      be programmed. That is not correct in a cross-chip environment, because
      the DSA ports of the adjacent switches 0 (on the left) and 2 (on the
      right) that lead towards switch 1 must be programmed too.
      
      Without this patchset, a dump of the hardware FDB of switches 0, 1 and 2
      after programming a MAC address on lan3 looks like this (*):
      
          # bridge fdb add 11:22:33:44:55:66 dev lan3
          # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
             0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  0 - - - - - -
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6 7 8 9
      
      With this patchset applied, adjacent DSA ports get programmed too:
      
          # bridge fdb add 11:22:33:44:55:66 dev lan3
          # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID
             0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  - - - - - 5 -
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
             0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  0 - - - - - -
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
             0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  - - - - - - - - - 9
             0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6 7 8 9
      
      In order to do that, the first commit introduces a dsa_towards_port()
      helper, which returns the local port of a switch that must be used to
      reach an arbitrary port of the fabric (either local or on another switch).

      The second patch uses this helper to program, on every switch of the
      fabric, the port that reaches the target port.
      
      (*) a squashed patch for the debugfs interface used above, which
      applies on top of this patchset, is available here:
      
          https://github.com/vivien/linux/commit/f8e6ba34c68a72d3bf42f4dea79abacb2e61a3cc.patch
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75d0de8c
    • net: dsa: support cross-chip FDB operations · 3169241f
      Vivien Didelot authored
      When a MAC address is added to or removed from a switch port in the
      fabric, the target switch must program its port and adjacent switches
      must program their local DSA port used to reach the target switch.
      
      For this purpose, use the dsa_towards_port() helper to identify the
      local switch port which must be programmed.
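
      A compact sketch of the resulting behaviour (illustrative types and
      callback names only, not the upstream net/dsa/switch.c code; the real
      implementation relies on the dsa_towards_port() helper introduced in
      the next entry):

          #include <stdint.h>

          /* Illustrative stand-ins for the upstream DSA structures. */
          struct sw {
              int index;             /* this switch's index in the fabric */
              uint8_t rtable[4];     /* local DSA port towards each other switch */
          };

          struct fdb_info {
              int sw_index;          /* switch that owns the target user port */
              int port;              /* target port on that switch */
              const uint8_t *addr;   /* MAC address to program */
              uint16_t vid;
          };

          /* Every switch in the fabric receives the notification: the owner
           * programs the user port itself, every other switch programs its
           * DSA port leading towards the owner. */
          static int sw_fdb_add(const struct sw *ds, const struct fdb_info *info,
                                int (*port_fdb_add)(const struct sw *ds, int port,
                                                    const uint8_t *addr, uint16_t vid))
          {
              int port = (info->sw_index == ds->index) ? info->port
                                                       : ds->rtable[info->sw_index];

              return port_fdb_add(ds, port, info->addr, info->vid);
          }
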
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3169241f
    • net: dsa: introduce dsa_towards_port helper · 3b8fac5d
      Vivien Didelot authored
      Add a new helper returning the local port used to reach an arbitrary
      switch port in the fabric.
      
      Its only user at the moment is the dsa_upstream_port helper, which
      returns the local port reaching the dedicated CPU port, but it will be
      used in cross-chip FDB operations.
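
      A minimal sketch of such a helper, assuming each switch knows its own
      index in the fabric and keeps a routing table giving, for every other
      switch, the local port that leads towards it (field names below are
      illustrative, not the authoritative include/net/dsa.h definitions):

          #include <stdint.h>

          struct sw {
              int index;             /* this switch's index in the fabric */
              uint8_t rtable[4];     /* local port towards each other switch */
              int cpu_sw;            /* index of the switch wired to the CPU */
              int cpu_port;          /* CPU-facing port on that switch */
          };

          /* Local port of @ds to use in order to reach port @port of switch
           * @device. */
          static inline int sw_towards_port(const struct sw *ds, int device, int port)
          {
              if (device == ds->index)
                  return port;           /* the target port is on this switch */
              return ds->rtable[device]; /* DSA port towards the target switch */
          }

          /* The upstream-port helper then becomes a special case of the same
           * logic: reach the CPU-facing port of the switch wired to the CPU. */
          static inline int sw_upstream_port(const struct sw *ds)
          {
              return sw_towards_port(ds, ds->cpu_sw, ds->cpu_port);
          }
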
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b8fac5d
    • Merge branch 'dsa-simplify-switchdev-prepare-phase' · 5420683a
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: simplify switchdev prepare phase
      
      This patch series brings no functional changes.
      
      It removes the unused switchdev_trans arguments from the dsa_switch_ops
      for both MDB and VLAN operations, and provides functions to prepare and
      add these objects for a given bitmap of ports.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5420683a
    • net: dsa: add switch mdb bitmap functions · e6db98db
      Vivien Didelot authored
      This patch brings no functional changes.
      It moves out the MDB code iterating on a multicast group into new
      dsa_switch_mdb_{prepare,add}_bitmap() functions.
      
      This gives us a better isolation of the two switchdev phases.
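
      Roughly, the factored-out helpers take the shape below (a sketch with
      simplified types standing in for struct dsa_switch and struct
      switchdev_obj_port_mdb; the bitmap marks every port that is a member
      of the multicast group, and at most 32 ports are assumed):

          #include <errno.h>
          #include <stdint.h>

          struct mdb_obj { const uint8_t *addr; uint16_t vid; };

          struct sw_ops {
              int  (*port_mdb_prepare)(int port, const struct mdb_obj *mdb);
              void (*port_mdb_add)(int port, const struct mdb_obj *mdb);
          };

          /* Phase 1: check that every port in the bitmap can accept the entry. */
          static int sw_mdb_prepare_bitmap(const struct sw_ops *ops, int num_ports,
                                           const struct mdb_obj *mdb, uint32_t bitmap)
          {
              int port, err;

              if (!ops->port_mdb_prepare || !ops->port_mdb_add)
                  return -EOPNOTSUPP;

              for (port = 0; port < num_ports; port++) {
                  if (!(bitmap & (1u << port)))
                      continue;
                  err = ops->port_mdb_prepare(port, mdb);
                  if (err)
                      return err;
              }
              return 0;
          }

          /* Phase 2: commit the entry on the same set of ports; cannot fail. */
          static void sw_mdb_add_bitmap(const struct sw_ops *ops, int num_ports,
                                        const struct mdb_obj *mdb, uint32_t bitmap)
          {
              int port;

              for (port = 0; port < num_ports; port++)
                  if (bitmap & (1u << port))
                      ops->port_mdb_add(port, mdb);
          }
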
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6db98db
    • net: dsa: add switch vlan bitmap functions · 9c428c59
      Vivien Didelot authored
      This patch brings no functional changes.
      It moves out the VLAN code iterating on a list of VLAN members into new
      dsa_switch_vlan_{prepare,add}_bitmap() functions.
      
      This gives us a better isolation of the two switchdev phases.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c428c59
    • net: dsa: remove trans argument from mdb ops · 3709aadc
      Vivien Didelot authored
      The DSA switch MDB ops pass the switchdev_trans structure down to the
      drivers, but no driver uses it, and none is supposed to anyway.
      
      Remove the trans argument from MDB prepare and add operations.
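
      The shape of the resulting ops is roughly the following (reconstructed
      from the description above, not quoted from include/net/dsa.h):

          /* Forward declarations so the sketch stands alone; the real types
           * live in the networking headers. */
          struct dsa_switch;
          struct switchdev_obj_port_mdb;

          /* After the change, the MDB prepare/add ops no longer carry a
           * switchdev_trans pointer; the two-phase transaction stays in the
           * DSA core and drivers only see the object to program. */
          struct dsa_switch_mdb_ops_sketch {
              int  (*port_mdb_prepare)(struct dsa_switch *ds, int port,
                                       const struct switchdev_obj_port_mdb *mdb);
              void (*port_mdb_add)(struct dsa_switch *ds, int port,
                                   const struct switchdev_obj_port_mdb *mdb);
          };
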
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3709aadc
    • net: dsa: remove trans argument from vlan ops · 80e02360
      Vivien Didelot authored
      The DSA switch VLAN ops pass the switchdev_trans structure down to the
      drivers, but no driver uses it, and none is supposed to anyway.
      
      Remove the trans argument from VLAN prepare and add operations.
      
      At the same time, fix the following checkpatch warning:
      
          WARNING: line over 80 characters
          #74: FILE: drivers/net/dsa/dsa_loop.c:177:
          +				      const struct switchdev_obj_port_vlan *vlan)
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80e02360
    • openvswitch: do not propagate headroom updates to internal port · 183dea58
      Paolo Abeni authored
      After commit 3a927bc7 ("ovs: propagate per dp max headroom to
      all vports") the needed_headroom of the internal vport is updated
      according to the maximum needed headroom in its datapath.
      
      That avoids the pskb_expand_head() costs when sending/forwarding
      packets towards tunnel devices, at least for some scenarios.
      
      We still require such a copy when using the ovs-preferred configuration
      for vxlan tunnels:
      
          br_int
        /       \
      tap      vxlan
                 (remote_ip:X)
      
      br_phy
           \
          NIC
      
      where the route towards the IP 'X' is via 'br_phy'.
      
      When forwarding traffic from the tap towards the vxlan device, we
      will call pskb_expand_head() in vxlan_build_skb() because
      br_phy->needed_headroom is equal to tun->needed_headroom.
      
      With this change we avoid updating the internal vport needed_headroom,
      so that in the above scenario no head copy is needed, giving a 5%
      performance improvement in a UDP throughput test.
      
      As a trade-off, packets sent from the internal port towards a tunnel
      device will now experience the head copy overhead. The rationale is
      that the latter use-case is less relevant performance-wise.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      183dea58
  2. 01 Dec, 2017 26 commits
  3. 30 Nov, 2017 5 commits
    • Merge branch 'macb-rx-packet-filtering' · 201c78e0
      David S. Miller authored
      Rafal Ozieblo says:
      
      ====================
      Receive packets filtering for macb driver
      
      This patch series adds support for receive packet
      filtering in the Cadence GEM driver. Packets can be redirected
      to different hardware queues based on source IP, destination IP,
      source port or destination port. To enable filtering,
      support for RX queueing was added as well.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      201c78e0
    • net: macb: Added support for RX filtering · ae8223de
      Rafal Ozieblo authored
      This patch allows filtering received packets into different
      hardware queues (aka ntuple filtering).
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae8223de
    • net: macb: Added some queue statistics · 512286bb
      Rafal Ozieblo authored
      Added statistics per queue:
      - qX_rx_packets
      - qX_rx_bytes
      - qX_rx_dropped
      - qX_tx_packets
      - qX_tx_bytes
      - qX_tx_dropped
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      512286bb
    • net: macb: Added support for many RX queues · ae1f2a56
      Rafal Ozieblo authored
      To be able to receive packets on different RX queues, some
      configuration has to be performed. This patch checks how many
      hardware queues the GEM supports and initializes them.
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae1f2a56
    • vmxnet3: increase default rx ring sizes · 7475908f
      Shrikrishna Khare authored
      There are several reasons for increasing the receive ring sizes:
      
      1. The original ring size of 256 was chosen about 10 years ago when
      vmxnet3 was first created. At that time, 10Gbps Ethernet was not prevalent
      and servers were dominated by 1Gbps Ethernet. Now 10Gbps is commonplace,
      and higher bandwidth links -- 25Gbps, 40Gbps, 50Gbps -- are starting
      to appear. 256 Rx ring entries are simply not enough to keep up with
      higher link speeds when there is a burst of network frames coming from
      these high speed links. Even with full MTU size frames, they are gone
      in a short time. It is also more common to have a mix of frame sizes,
      and more likely a bi-modal distribution of frame sizes, so the average
      frame size is not close to the full MTU. If we consider an average frame
      size of 800B, a burst of 1024 frames takes ~0.65 ms to arrive at 10Gbps,
      while 256 ring entries are filled in only ~0.16 ms (the arithmetic is
      sketched after this list). At 25Gbps or 40Gbps, this time is reduced
      accordingly.
      
      2. On a hypervisor where there are many VMs and the CPU is overcommitted,
      i.e. the number of VCPUs is larger than the number of PCPUs, each PCPU is
      in effect time-shared between multiple VMs/VCPUs. The time granularity at
      which this multiplexing occurs is typically coarser than between processes
      on a guest OS. Trying to time-slice more finely is not efficient; for
      example, the memory cache may barely be warm when a switch from one VM
      to another occurs. This CPU overcommit adds delay to when the driver
      in a VM can service incoming packets. Whether the CPU is overcommitted
      really depends on customer workloads; for certain situations, such as
      desktop VM workloads and product testing setups, it is very common.
      Consolidation and sharing are what drive the efficiency of a customer
      setup for such workloads. In these situations, the raw network bandwidth
      may not be very high, but the gaps between the periods when a VM is
      actually running can be relatively long.
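
      To make the quoted timings easy to check, here is the arithmetic as a
      small sketch (assuming frames arrive back to back at line rate and
      ignoring framing overhead):

          #include <stdio.h>

          /* Time in ms for a burst of n frames of frame_bytes each to arrive
           * on a link running at rate_gbps. */
          static double burst_ms(unsigned int n, unsigned int frame_bytes,
                                 double rate_gbps)
          {
              return (double)n * frame_bytes * 8 / (rate_gbps * 1e9) * 1e3;
          }

          int main(void)
          {
              printf("1024 x 800B at 10Gbps: %.2f ms\n", burst_ms(1024, 800, 10));
              printf(" 256 x 800B at 10Gbps: %.2f ms\n", burst_ms(256, 800, 10));
              return 0;   /* prints ~0.66 ms and ~0.16 ms respectively */
          }
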
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Acked-by: Jin Heo <heoj@vmware.com>
      Acked-by: Guolin Yang <gyang@vmware.com>
      Acked-by: Boon Ang <bang@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7475908f