1. 01 Dec, 2017 25 commits
  2. 30 Nov, 2017 15 commits
    • Merge branch 'macb-rx-packet-filtering' · 201c78e0
      David S. Miller authored
      Rafal Ozieblo says:
      
      ====================
      Receive packet filtering for macb driver
      
      This patch series adds support for receive packet
      filtering to the Cadence GEM driver. Packets can be redirected
      to different hardware queues based on source IP, destination IP,
      source port or destination port. To enable filtering,
      support for RX queueing was added as well.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
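The cover letter describes steering packets to a hardware queue by source IP, destination IP, source port or destination port. The sketch below is a software model of that kind of ntuple match. It does not model the GEM's register-level rule format. The structs, the field names and the first-match-wins policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: a field is compared only when its match_* flag is set. */
struct rx_ntuple_rule {
	bool match_src_ip, match_dst_ip, match_src_port, match_dst_port;
	uint32_t src_ip, dst_ip;	/* IPv4 addresses, host byte order */
	uint16_t src_port, dst_port;
	unsigned int queue;		/* hardware RX queue to steer to */
};

/* The header fields of one received packet. */
struct rx_flow {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

static bool rule_matches(const struct rx_ntuple_rule *r, const struct rx_flow *f)
{
	return (!r->match_src_ip || r->src_ip == f->src_ip) &&
	       (!r->match_dst_ip || r->dst_ip == f->dst_ip) &&
	       (!r->match_src_port || r->src_port == f->src_port) &&
	       (!r->match_dst_port || r->dst_port == f->dst_port);
}

/* The first matching rule wins; unmatched packets go to default_queue. */
unsigned int select_rx_queue(const struct rx_ntuple_rule *rules, size_t n,
			     const struct rx_flow *f, unsigned int default_queue)
{
	assert(rules != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rule_matches(&rules[i], f))
			return rules[i].queue;
	return default_queue;
}
```

For example, a rule matching only destination port 80 sends every web flow to its queue, whatever the source address is.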
    • net: macb: Added support for RX filtering · ae8223de
      Rafal Ozieblo authored
      This patch allows received packets to be filtered into different
      hardware queues (aka ntuple).
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Added some queue statistics · 512286bb
      Rafal Ozieblo authored
      Added statistics per queue:
      - qX_rx_packets
      - qX_rx_bytes
      - qX_rx_dropped
      - qX_tx_packets
      - qX_tx_bytes
      - qX_tx_dropped
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
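The X in these names stands for the queue index. Below is a hypothetical helper that expands the six per-queue counters into ethtool-style strings for every queue. The name ordering (all stats of queue 0, then queue 1, and so on) is an assumption. The 32-byte slot size is chosen to match the kernel's ETH_GSTRING_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STAT_NAME_LEN 32	/* same slot size as the kernel's ETH_GSTRING_LEN */

static const char *const queue_stat_suffixes[] = {
	"rx_packets", "rx_bytes", "rx_dropped",
	"tx_packets", "tx_bytes", "tx_dropped",
};
#define N_QUEUE_STATS (sizeof(queue_stat_suffixes) / sizeof(queue_stat_suffixes[0]))

/* Write "q<queue>_<stat>" for every queue and stat into names[], stopping
 * at max_names entries. Returns the number of names written. */
size_t fill_queue_stat_names(char names[][STAT_NAME_LEN], size_t max_names,
			     unsigned int num_queues)
{
	size_t n = 0;

	for (unsigned int q = 0; q < num_queues; q++) {
		for (size_t s = 0; s < N_QUEUE_STATS; s++) {
			if (n == max_names)
				return n;
			snprintf(names[n++], STAT_NAME_LEN, "q%u_%s",
				 q, queue_stat_suffixes[s]);
		}
	}
	return n;
}
```

With two queues, the twelve names run from q0_rx_packets to q1_tx_dropped.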
    • net: macb: Added support for many RX queues · ae1f2a56
      Rafal Ozieblo authored
      To receive packets on different RX queues, some configuration
      has to be performed. This patch checks how many hardware queues
      the GEM supports and initializes them.
      Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
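The message does not say how the driver discovers the queue count. One plausible scheme is assumed here purely for this sketch: a capability register whose low bits flag the extra queues, with queue 0 always present. Counting the set bits of that mask then gives the number of queues to initialize. The function name and the 8-bit field width are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: cfg_queue_bits comes from a capability register whose low
 * 8 bits flag the available extra queues. Queue 0 always exists, so bit 0
 * is forced on before counting. */
unsigned int count_hw_queues(uint32_t cfg_queue_bits, unsigned int max_queues)
{
	uint32_t mask = 0x1u | (cfg_queue_bits & 0xffu);
	unsigned int n = 0;

	assert(max_queues > 0);
	for (unsigned int q = 0; q < max_queues && q < 32; q++)
		if (mask & (1u << q))
			n++;
	return n;
}
```

A driver following this scheme would then allocate and program one RX ring per set bit.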
    • vmxnet3: increase default rx ring sizes · 7475908f
      Shrikrishna Khare authored
      There are several reasons for increasing the receive ring sizes:
      
      1. The original ring size of 256 was chosen about 10 years ago when
      vmxnet3 was first created. At that time, 10Gbps Ethernet was not prevalent
      and servers were dominated by 1Gbps Ethernet. Now 10Gbps is commonplace,
      and higher bandwidth links (25Gbps, 40Gbps, 50Gbps) are starting
      to appear. 256 Rx ring entries are simply not enough to keep up with
      higher link speeds when a burst of network frames arrives on
      these high-speed links. Even with full-MTU frames, the entries are used up
      in a short time. It is also more common to have a mix of frame sizes,
      often with a bimodal distribution, so the average frame
      size is not close to full MTU. If we consider an average frame size of 800B,
      a burst of 1024 frames takes ~0.65 ms to arrive at 10Gbps. A burst of
      256 frames, enough to fill the old ring, takes ~0.16 ms at 10Gbps. At 25Gbps
      or 40Gbps, these times shrink proportionally.
      
      2. On a hypervisor where there are many VMs and the CPU is overcommitted,
      i.e. the number of VCPUs is greater than the number of PCPUs, each PCPU is
      in effect time-shared between multiple VMs/VCPUs. The time granularity at
      which this multiplexing occurs is typically coarser than between processes
      on a guest OS. Time-slicing more finely is not efficient; for
      example, the memory cache may barely be warmed up when the switch from one VM
      to another occurs. This CPU overcommit delays the point at which the driver
      in a VM can service incoming packets. Whether the CPU is overcommitted
      depends on customer workloads, and for some workloads it is very
      common, for example desktop VMs and product testing setups.
      Consolidation and sharing are what drive the efficiency of a customer setup
      for such workloads. In these situations, the raw network bandwidth may
      not be very high, but the periods during which a VM is not
      running can be relatively long.
      Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
      Acked-by: Jin Heo <heoj@vmware.com>
      Acked-by: Guolin Yang <gyang@vmware.com>
      Acked-by: Boon Ang <bang@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
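The arrival times in point 1 come from frames * frame size * 8 bits, divided by the link rate. This small helper reproduces that arithmetic. Like the estimate in the message, it ignores preamble, inter-frame gap and other on-wire overhead. The function name is illustrative only.

```c
#include <assert.h>

/* Milliseconds for `frames` back-to-back frames of `frame_bytes` bytes each
 * to arrive on a link of `gbps` gigabits per second. On-wire overhead
 * (preamble, inter-frame gap) is ignored. */
double burst_ms(unsigned int frames, unsigned int frame_bytes, double gbps)
{
	double bits = (double)frames * frame_bytes * 8.0;

	assert(gbps > 0.0);
	return bits / (gbps * 1e9) * 1e3;
}
```

burst_ms(1024, 800, 10.0) is about 0.655 and burst_ms(256, 800, 10.0) about 0.164, matching the ~0.65 ms and ~0.16 ms figures. At 40Gbps, both drop by a factor of four.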
    • net: dsa: bcm_sf2: Utilize b53_get_tag_protocol() · 9f66816a
      Florian Fainelli authored
      Utilize the much more capable b53_get_tag_protocol(), which takes care of
      all Broadcom switch specifics when resolving which ports can have Broadcom
      tags enabled.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/reuseport: drop legacy code · e94a62f5
      Paolo Abeni authored
      Since commit e32ea7e7 ("soreuseport: fast reuseport UDP socket
      selection") and commit c125e80b ("soreuseport: fast reuseport
      TCP socket selection") the relevant reuseport socket matching the current
      packet is selected by the reuseport_select_sock() call. The only
      exceptions are invalid BPF filters and filters returning out-of-range
      indices.
      In the latter case the code implicitly falls back to using the hash
      demultiplexing, but instead of selecting the socket inside the
      reuseport_select_sock() function, it relies on the hash selection
      logic introduced with the early soreuseport implementation.
      
      With this patch, in case of a BPF filter returning a bad socket
      index value, we fall back to hash-based selection inside the
      reuseport_select_sock() body, so that we can drop some duplicate
      code in the ipv4 and ipv6 stacks.
      
      This also allows faster lookup in the above scenario and, in a later
      patch, will allow us to avoid computing the hash value for successful
      BPF-based demultiplexing.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
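The fallback described above can be modelled in two steps. First, trust the BPF program's index when it is in range. Otherwise, pick a socket from the flow hash. This is a simplified userspace sketch, not the kernel function. The function name, and the use of a negative index to mean "no usable BPF result", are assumptions. The hash is scaled into range with a multiply and shift, in the style of the kernel's reciprocal_scale() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash onto [0, n) with a multiply and shift instead of a
 * modulo, in the style of the kernel's reciprocal_scale(). */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Hypothetical model of the fallback: use the BPF result when it is a valid
 * index into the reuseport group, otherwise select a socket by hash.
 * A negative bpf_index stands for "no usable BPF result". */
uint32_t select_reuseport_index(uint32_t num_socks, int64_t bpf_index,
				uint32_t hash)
{
	assert(num_socks > 0);
	if (bpf_index >= 0 && (uint64_t)bpf_index < num_socks)
		return (uint32_t)bpf_index;
	return scale_hash(hash, num_socks);
}
```

Because both paths now live in one function, callers no longer need their own hash-selection code. That duplicate caller-side code is what the patch removes.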
    • Documentation: net: dsa: Cut set_addr() documentation · 0fc66ddf
      Linus Walleij authored
      This is not supported anymore: devices needing a MAC address
      just assign one at random, which is merely a driver peculiarity.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
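Assigning a random MAC address usually means taking six random bytes and then fixing up the first octet. The fix-up makes the result a unicast, locally administered address, and the kernel's eth_random_addr() helper applies the same adjustment. The sketch below uses rand() as a stand-in random source. That choice is an assumption and is not suitable where unpredictability matters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Fill addr with random bytes, then make it a valid unicast, locally
 * administered MAC address. rand() is only a placeholder RNG here. */
void make_random_mac(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast (I/G) bit */
	addr[0] |= 0x02;	/* set the locally administered (U/L) bit */
}
```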
    • Merge branch 'net-dst_entry-shrink' · 3d8068c5
      David S. Miller authored
      David Miller says:
      
      ====================
      net: Significantly shrink the size of routes.
      
      Through a combination of several things, our route structures are
      larger than they need to be.
      
      Mostly this stems from having members in dst_entry which are only used
      by one class of routes.  So the majority of the work in this series is
      about "un-commoning" these members and pushing them into the type
      specific structures.
      
      Unfortunately, IPSEC needed the most surgery.  The majority of the
      changes here had to do with bundle creation and management.
      
      The other issue is the refcount alignment in dst_entry.  Once we get
      rid of the not-so-common members, it really opens the door to removing
      that alignment entirely.
      
      I think the new layout looks really nice, so I'll reproduce it here:
      
      	struct net_device       *dev;
      	struct  dst_ops	        *ops;
      	unsigned long		_metrics;
      	unsigned long           expires;
      	struct xfrm_state	*xfrm;
      	int			(*input)(struct sk_buff *);
      	int			(*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
      	unsigned short		flags;
      	short			obsolete;
      	unsigned short		header_len;
      	unsigned short		trailer_len;
      	atomic_t		__refcnt;
      	int			__use;
      	unsigned long		lastuse;
      	struct lwtunnel_state   *lwtstate;
      	struct rcu_head		rcu_head;
      	short			error;
      	short			__pad;
      	__u32			tclassid;
      
      (This is for 64-bit, on 32-bit the __refcnt comes at the very end)
      
      So, the good news:
      
      1) struct dst_entry shrinks from 160 to 112 bytes.
      
      2) struct rtable shrinks from 216 to 168 bytes.
      
      3) struct rt6_info shrinks from 384 to 320 bytes.
      
      Enjoy.
      
      v2:
      	Collapse some patches logically based upon feedback.
      	Fix the strange patch #7.
      
      v3:
      	Add the inline keyword to xfrm_dst_path().
      	Properly align __refcnt on 32-bit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
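The 64-bit figure of 112 bytes can be checked by mirroring the layout above in userspace C. The stand-in types below are assumptions that only reproduce the sizes of the kernel types: rcu_head as two pointers and atomic_t as a wrapped int. The checks are guarded by __LP64__ because the cover letter notes that __refcnt moves to the end on 32-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types; only their sizes matter here. */
struct sk_buff;
struct sock;
struct net;
struct net_device;
struct dst_ops;
struct xfrm_state;
struct lwtunnel_state;
typedef struct { int counter; } atomic_t;
struct rcu_head {			/* same shape as the kernel's callback_head */
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* The new 64-bit layout, copied field for field from the cover letter. */
struct dst_entry_layout {
	struct net_device	*dev;
	struct dst_ops		*ops;
	unsigned long		_metrics;
	unsigned long		expires;
	struct xfrm_state	*xfrm;
	int			(*input)(struct sk_buff *);
	int			(*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
	unsigned short		flags;
	short			obsolete;
	unsigned short		header_len;
	unsigned short		trailer_len;
	atomic_t		__refcnt;
	int			__use;
	unsigned long		lastuse;
	struct lwtunnel_state	*lwtstate;
	struct rcu_head		rcu_head;
	short			error;
	short			__pad;
	uint32_t		tclassid;
};

#if defined(__LP64__)
static_assert(sizeof(struct dst_entry_layout) == 112, "64-bit size from the cover letter");
static_assert(offsetof(struct dst_entry_layout, __refcnt) == 64, "__refcnt starts the second 64-byte block");
#endif
```

The second check shows that, on 64-bit, __refcnt starts exactly at byte 64. Assuming the object itself begins on a 64-byte boundary, the write-heavy __refcnt and __use therefore share a cache line with lastuse, rcu_head and error, not with dev, ops and the function pointers.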
    • net: Remove dst->next · 7149f813
      David Miller authored
      There are no more users.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • xfrm: Stop using dst->next in bundle construction. · 5492093d
      David Miller authored
      While building ipsec bundles, blocks of xfrm dsts are linked together
      using dst->next from the bottom to the top.
      
      The only uses of this are initializing the pmtu values of the
      xfrm stack and updating the mtu values at xfrm_bundle_ok() time.
      
      The bundle pmtu entries must be processed in this order so that pmtu
      values lower in the stack of routes can propagate up to the higher
      ones.
      
      Avoid using dst->next by simply maintaining an array of dst pointers
      as we already do for the xfrm_state objects when building the bundle.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
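To see why the order matters, consider a simplified model in which every transform subtracts a fixed number of bytes from the MTU available beneath it. With the dsts held in an array, the pmtu pass is just a loop from the last index (closest to the real route) to the first. The fixed-overhead model and all names here are assumptions. Real xfrm MTU calculations depend on the transform and are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-layer view of a bundle: index 0 is the top xfrm layer,
 * index n-1 the lowest one, sitting directly on the real route. */
struct layer {
	unsigned int overhead;	/* bytes this transform adds (assumed fixed) */
	unsigned int mtu;	/* computed: largest payload this layer accepts */
};

/* Walk the array bottom-up so each layer starts from the already reduced
 * MTU of the layer beneath it, letting lower pmtu values propagate up. */
void propagate_pmtu(struct layer *layers, size_t n, unsigned int route_mtu)
{
	unsigned int below = route_mtu;

	for (size_t i = n; i-- > 0; ) {
		assert(layers[i].overhead < below);
		layers[i].mtu = below - layers[i].overhead;
		below = layers[i].mtu;
	}
}
```

Walking top-down instead would compute the upper layer from a stale value beneath it. That is why the array, like the old dst->next chain, is processed bottom to top.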
    • net: Rearrange dst_entry layout to avoid useless padding. · 8b207e73
      David Miller authored
      We have padding to try and align the refcount on a separate cache
      line.  But after several simplifications the padding has increased
      substantially.
      
      So now it's easy to change the layout to get rid of the padding
      entirely.
      
      We group the write-heavy __refcnt and __use with less often used
      items such as the rcu_head and the error code.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
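The padding problem can be illustrated with a hypothetical toy struct, much smaller than dst_entry. Forcing a counter onto its own 64-byte cache line adds padding that grows as the fields before it shrink. Letting the counter simply follow the other members keeps the struct compact. The sizes checked below assume an LP64 target.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Toy structs, not dst_entry: the same 28 bytes of data laid out with and
 * without forcing the counter onto its own 64-byte cache line. */
struct hot_counter_aligned {
	void *a, *b, *c;		/* read-mostly fields */
	alignas(64) int refcnt;		/* forced to the next 64-byte boundary */
};

struct hot_counter_plain {
	void *a, *b, *c;
	int refcnt;			/* placed right after the pointers */
};

#if defined(__LP64__)
static_assert(offsetof(struct hot_counter_aligned, refcnt) == 64, "counter pushed to byte 64");
static_assert(sizeof(struct hot_counter_aligned) == 128, "mostly padding");
static_assert(sizeof(struct hot_counter_plain) == 32, "only tail padding");
#endif
```

On LP64, the aligned variant occupies 128 bytes for 28 bytes of data, while the plain one needs 32.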
    • xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      David Miller authored
      The first member of an IPSEC route bundle chain sets its dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When we don't have the top of an IPSEC bundle, 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
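After the move, reading the path has to key off dst->xfrm, as the last line of the message says. The cover letter names the resulting helper xfrm_dst_path(). The sketch below shows the idea with simplified userspace mocks, and its structures and function name are assumptions, not the kernel definitions. It relies on the dst_entry being the first member of xfrm_dst, so that a dst pointer can be cast back.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mocks; the real kernel structures have many more fields. */
struct xfrm_state { int id; };

struct dst_entry {
	struct xfrm_state *xfrm;	/* non-NULL only for IPSEC bundle members */
};

struct xfrm_dst {
	struct dst_entry dst;		/* must stay first for the cast below */
	struct dst_entry *path;		/* the underlying ipv4/ipv6 route */
};

/* Only xfrm_dst carries a path; for any other route the path is the
 * route itself. */
struct dst_entry *dst_path(struct dst_entry *dst)
{
	if (dst->xfrm)
		return ((struct xfrm_dst *)dst)->path;
	return dst;
}
```

Plain routes no longer spend a pointer on a path field that would only point back at themselves.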
    • ipv6: Move dst->from into struct rt6_info. · 3a2232e9
      David Miller authored
      The dst->from value is only used by ipv6 routes to track where
      a route "came from".
      
      Any time we clone or copy a core ipv6 route in the ipv6 routing
      tables, we have the copy/clone's ->from point to the base route.
      
      This is used to handle route expiration properly.
      
      Only ipv6 uses this mechanism, and only ipv6 code references
      it.  So it is safe to move it into rt6_info.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
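Below is a simplified model of how ->from can help with expiration. A route with its own expiry is checked directly, and a clone without one defers to the base route it came from. This mirrors the idea only. The struct, the has_expiry flag and the function name are hypothetical, and details of the real ipv6 code, such as route flags and obsolete handling, are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route model: a route may carry its own expiry, and a clone
 * or copy also remembers the base route it was derived from. */
struct rt6_route {
	bool has_expiry;
	unsigned long expires;		/* absolute time, e.g. in jiffies */
	struct rt6_route *from;		/* base route, NULL for a base route */
};

bool rt6_expired(const struct rt6_route *rt, unsigned long now)
{
	if (rt->has_expiry)
		return now > rt->expires;
	/* No expiry of its own: a clone lives only as long as its base. */
	return rt->from != NULL && rt6_expired(rt->from, now);
}
```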
    • xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5
      David Miller authored
      XFRM bundle child chains look like this:
      
      	xdst1 --> xdst2 --> xdst3 --> path_dst
      
      All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
      The final child pointer in the chain, here called 'path_dst', is some
      other kind of route such as an ipv4 or ipv6 one.
      
      The xfrm output path pops routes, one at a time, via the child
      pointer, until we hit one which has a dst->xfrm pointer which
      is NULL.
      
      We can easily preserve the above mechanisms with child sitting
      only in the xfrm_dst structure.  All children in the chain
      before we break out of the xfrm_output() loop have dst->xfrm
      non-NULL and are therefore xfrm_dst objects.
      
      Since we break out of the loop when we find dst->xfrm NULL, we
      will not try to dereference 'dst' as if it were an xfrm_dst.
      Signed-off-by: David S. Miller <davem@davemloft.net>
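The xfrm_output() loop described above can be sketched with minimal mocks: follow ->child while dst->xfrm is non-NULL, and stop at the plain route. Only bundle members are ever cast to xfrm_dst, which is why child can safely live there. The types and the transform counter are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct xfrm_state { const char *name; };

struct dst_entry {
	struct xfrm_state *xfrm;	/* NULL marks the final, non-xfrm route */
};

struct xfrm_dst {
	struct dst_entry dst;		/* first member, so the cast is valid */
	struct dst_entry *child;	/* next route down the bundle */
};

/* Walk xdst1 -> xdst2 -> ... -> path_dst, counting the transforms applied.
 * ->child is read only after dst->xfrm has been confirmed non-NULL. */
struct dst_entry *walk_bundle(struct dst_entry *dst, unsigned int *transforms)
{
	*transforms = 0;
	while (dst->xfrm) {
		/* ... the transform for dst->xfrm would be applied here ... */
		(*transforms)++;
		dst = ((struct xfrm_dst *)dst)->child;
	}
	return dst;	/* the plain ipv4/ipv6 route */
}
```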