1. 23 Sep, 2014 20 commits
  2. 22 Sep, 2014 10 commits
  3. 20 Sep, 2014 1 commit
    • udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected · 6d967f87
      Andy Zhou authored
      Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is
      selected. When IPV6 is not selected, those functions are stubbed out
      in udp_tunnel.h.
      
      ==================================================================
       net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6'
           int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
       In file included from net/ipv6/ip6_udp_tunnel.c:9:0:
            include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here
             static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
      ==================================================================
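
The redefinition above arises because the header already supplies an inline stub when IPv6 is off, and compiling the .c file then defines the same symbol a second time. A minimal sketch of that header pattern follows. The `DEMO_IPV6` switch, the `demo_sock_create6` name, and the stub's return value are assumptions made for illustration, not the kernel's actual macro, function, or behaviour.

```c
/* Hypothetical stand-in for the kernel's IPv6 configuration option. */
#ifndef DEMO_IPV6
#define DEMO_IPV6 0
#endif

struct demo_port_cfg {
    unsigned short port;
};

#if DEMO_IPV6
/* IPv6 selected: only declare here; the definition lives in its own .c file. */
int demo_sock_create6(const struct demo_port_cfg *cfg);
#else
/* IPv6 not selected: the header supplies a do-nothing stub, so the .c file
 * must not be compiled, or the symbol would be defined twice. */
static inline int demo_sock_create6(const struct demo_port_cfg *cfg)
{
    (void)cfg;
    return 0; /* assumed return value for the stub */
}
#endif
```

With this layout, the build system has to compile the separate .c file only when the same option that removes the stub is enabled.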
      
      Fixes:  fd384412 udp_tunnel: Seperate ipv6 functions into its own file
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 19 Sep, 2014 9 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 6c62f606
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-09-18
      
      This series contains updates to ixgbe and ixgbevf.
      
      Ethan Zhao cleans up ixgbe and ixgbevf by removing bd_number from the
      adapter struct because it is no longer useful.
      
      Mark fixes ixgbe where if a hardware transmit timestamp is requested,
      an uninitialized workqueue entry may be scheduled.  Added a check for
      a PTP clock to avoid that.
      
      Jacob provides a number of cleanups for ixgbe.  Since we may call
      ixgbe_acquire_msix_vectors() prior to registering our netdevice, we
      should not use the netdevice specific printk and use e_dev_warn()
      instead.  Similar to how ixgbevf handles acquiring MSI-X vectors, we
      can return an error code instead of relying on the flag being set.
      This makes it more clear that we have failed to set up MSI-X mode and
      will make it easier to consolidate MSI-X related code into a single
      function.  In the case of disabling DCB, it is not an error since we
      still can function, we just have to let the user know.  So use
      e_dev_warn() instead of e_err().  Added warnings for other features
      that are disabled when we are without MSI-X support.  Cleanup flags
      that are no longer used or needed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-next' · 58310b3f
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      mlx4: CQE/EQE stride support
      
      This series from Ido Shamay is intended for archs having
      cache line larger than 64 bytes.
      
      Since our CQE/EQEs are generally 64B in those systems, HW will write
      twice to the same cache line consecutively, causing pipe locks due to
      the hazard prevention mechanism. For elements in a cyclic buffer, writes
      are consecutive, so entries smaller than a cache line should be
      avoided, especially if they are written at a high rate.
      
      Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
      driver to increase the distance between entries so that each will reside
      in a different cache line.
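
The arithmetic behind this can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes the ring buffer starts on a cache-line boundary.

```c
#include <stddef.h>

/* Number of fixed-size ring entries that land in one cache line.
 * A result above 1 means consecutive writes hit the same line. */
static size_t entries_per_cache_line(size_t cache_line, size_t stride)
{
    return stride >= cache_line ? 1 : cache_line / stride;
}

/* Index of the cache line that holds entry idx. */
static size_t cache_line_of_entry(size_t idx, size_t stride, size_t cache_line)
{
    return (idx * stride) / cache_line;
}
```

With 64B entries on a 128B cache line, entries 0 and 1 share a line; raising the stride to 128B puts each entry in its own line.
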
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da
      Ido Shamay authored
      This function derives the base address of the CQE from the CQE size,
      and calculates the real CQE context segment in it from the factor
      (this is like before). Before this change the code used the factor to
      calculate the base address of the CQE as well.
      
      The factor indicates in which segment of the cqe stride the cqe information
      is located. For 32-byte strides, the segment is 0, and for 64 byte strides,
      the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
      only 32 and 64 byte strides. However, with larger strides, the factor is zero,
      and so cannot be used to calculate the base of the CQE.
      
      The helper uses the same method of CQE buffer pulling made by all other
      components that read the CQE buffer (mlx4_ib driver and libmlx4).
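
The difference between the two calculations can be sketched as follows. This is an illustration under assumptions, not the driver's code: the function names are hypothetical, and `cqe_context_old` is one plausible form of the earlier factor-based indexing.

```c
#include <stddef.h>
#include <stdint.h>

#define CQE_SEGMENT_SIZE 32u /* size of one CQE context segment, per the text */

/* Base address of entry idx: derived from the CQE size (stride) alone. */
static inline uint8_t *cqe_base(uint8_t *buf, size_t idx, size_t cqe_size)
{
    return buf + idx * cqe_size;
}

/* Context segment inside the entry: the factor selects the 32-byte segment. */
static inline uint8_t *cqe_context(uint8_t *buf, size_t idx, size_t cqe_size,
                                   unsigned factor)
{
    return cqe_base(buf, idx, cqe_size) + (size_t)factor * CQE_SEGMENT_SIZE;
}

/* Assumed older scheme: the factor also scales the index, which only works
 * for 32- and 64-byte strides. */
static inline uint8_t *cqe_context_old(uint8_t *buf, size_t idx, unsigned factor)
{
    return buf + ((idx << factor) + factor) * CQE_SEGMENT_SIZE;
}
```

Both schemes agree for 32B and 64B strides. For a 128B stride the factor is zero, so the older form would step through the buffer 32 bytes at a time, while the size-based form steps by 128 bytes.
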
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Cache line EQE size support · 43c816c6
      Ido Shamay authored
      Enable the mlx4 interrupt handler to work with the EQE stride feature.
      The feature may be enabled when the cache line is bigger than 64B.
      The EQE size will then be the cache line size, and the context
      segment resides in [0-31] offset.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Enable CQE/EQE stride support · 77507aa2
      Ido Shamay authored
      This feature is intended for archs having cache line larger than 64B.
      
      Since our CQE/EQEs are generally 64B in those systems, HW will write
      twice to the same cache line consecutively, causing pipe locks due to
      the hazard prevention mechanism. For elements in a cyclic buffer, writes
      are consecutive, so entries smaller than a cache line should be
      avoided, especially if they are written at a high rate.
      
      Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
      driver to increase the distance between entries so that each will reside
      in a different cache line. Until the introduction of this feature, there
      were two types of CQE/EQE:
      
      1. 32B stride and context in the [0-31] segment
      2. 64B stride and context in the [32-63] segment
      
      This feature introduces two additional types:
      
      3. 128B stride and context in the [0-31] segment (128B cache line)
      4. 256B stride and context in the [0-31] segment (256B cache line)
      
      Modify the mlx4_core driver to query the device for the CQE/EQE cache
      line stride capability and to enable that capability when the host
      cache line size is larger than 64 bytes (supported cache lines are
      128B and 256B).
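
The four entry types and the enable condition can be summarised in a small sketch. The struct and function names are hypothetical, and the real driver's capability query is reduced here to a boolean argument.

```c
#include <stdbool.h>
#include <stddef.h>

struct entry_layout {
    size_t stride;     /* distance between consecutive entries, in bytes */
    size_t ctx_offset; /* start of the 32-byte context segment */
};

/* Map a stride to the context offset listed in the four types above.
 * Returns false for strides the text does not mention. */
static bool layout_for_stride(size_t stride, struct entry_layout *out)
{
    switch (stride) {
    case 32:
    case 128:
    case 256:
        out->ctx_offset = 0;
        break;
    case 64:
        out->ctx_offset = 32;
        break;
    default:
        return false;
    }
    out->stride = stride;
    return true;
}

/* Use the cache-line stride only when the device reports the capability and
 * the host cache line is 128B or 256B; otherwise keep the legacy stride. */
static size_t choose_stride(size_t cache_line, bool dev_has_stride_cap,
                            size_t legacy_stride)
{
    if (dev_has_stride_cap && (cache_line == 128 || cache_line == 256))
        return cache_line;
    return legacy_stride;
}
```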
      
      The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
      context behaviour is changed to require this change in VF drivers
      running on such archs.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH) · 54003f11
      Sabrina Dubroca authored
      ptr used to be a non __percpu pointer (result of a this_cpu_ptr
      assignment, 7d720c3e ("percpu: add __percpu sparse annotations to
      net")). Since d25398df ("net: avoid reloads in SNMP_UPD_PO_STATS"),
      that's no longer the case: SNMP_UPD_PO_STATS uses this_cpu_add and ptr
      is now __percpu.
      
      Silence sparse warnings by preserving the original type and
      annotation, and remove the out-of-date comment.
      
      warning: incorrect type in initializer (different address spaces)
         expected unsigned long long *ptr
         got unsigned long long [noderef] <asn:3>*<noident>
      warning: incorrect type in initializer (different address spaces)
         expected void const [noderef] <asn:3>*__vpp_verify
         got unsigned long long *<noident>
      warning: incorrect type in initializer (different address spaces)
         expected void const [noderef] <asn:3>*__vpp_verify
         got unsigned long long *<noident>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fou-next' · fb5690d2
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      net: foo-over-udp (fou)
      
      This patch series implements foo-over-udp. The idea is that we can
      encapsulate different IP protocols in UDP packets. The rationale for
      this is that networking devices such as NICs and switches are usually
      implemented with UDP (and TCP) specific mechanisms for processing. For
      instance, many switches and routers will implement a 5-tuple hash
      for UDP packets to perform Equal Cost Multipath Routing (ECMP) or
      RSS (on NICs). Many NICs also only provide rudimentary checksum
      offload (basic TCP and UDP packets); with foo-over-udp we may be
      able to leverage these NICs to offload checksums of tunneled packets
      (using checksum unnecessary conversion and eventually remote checksum
      offload).
      
      An example encapsulation of IPIP over FOU is diagrammed below. As
      illustrated, the packet overhead for FOU is the 8 byte UDP header.
      
      +------------------+
      |    IPv4 hdr      |
      +------------------+
      |     UDP hdr      |
      +------------------+
      |    IPv4 hdr      |
      +------------------+
      |     TCP hdr      |
      +------------------+
      |   TCP payload    |
      +------------------+
      
      Conceptually, FOU should be able to encapsulate any IP protocol.
      The FOU header (UDP hdr.) is essentially an inserted header between the
      IP header and transport, so in the case of TCP or UDP encapsulation
      the pseudo header would be based on the outer IP header and its length
      field must not include the UDP header.
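
To make the 8-byte overhead concrete, the sketch below prepends a UDP header to an already-built inner packet. It is a userspace illustration with a hypothetical `fou_encap_demo` function, not the kernel's transmit path. It leaves the UDP checksum at zero, which UDP over IPv4 permits, and does not build the outer IP header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOU_UDP_HDR_LEN 8 /* the only per-packet overhead FOU adds */

/* Write a UDP header followed by the inner packet into out.
 * Returns the number of bytes written, or 0 if the result does not fit. */
static size_t fou_encap_demo(uint8_t *out, size_t out_cap,
                             uint16_t sport, uint16_t dport,
                             const uint8_t *inner, size_t inner_len)
{
    size_t total = FOU_UDP_HDR_LEN + inner_len;

    if (total > out_cap || total > 0xFFFF)
        return 0;

    out[0] = (uint8_t)(sport >> 8);  /* source port, network byte order */
    out[1] = (uint8_t)(sport & 0xFF);
    out[2] = (uint8_t)(dport >> 8);  /* destination port */
    out[3] = (uint8_t)(dport & 0xFF);
    out[4] = (uint8_t)(total >> 8);  /* UDP length covers header + payload */
    out[5] = (uint8_t)(total & 0xFF);
    out[6] = 0;                      /* checksum left at zero (not computed) */
    out[7] = 0;
    memcpy(out + FOU_UDP_HDR_LEN, inner, inner_len);
    return total;
}
```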
      
      * Receive
      
      In this patch set the RX path for FOU is implemented in a new fou
      module. To enable FOU for a particular protocol, a UDP-FOU socket is
      opened to the port to receive FOU packets. The socket is mapped to the
      IP protocol for the packets. The XFRM mechanism (udp_encap_rcv) is
      used to receive encapsulated packets for the port. Upon reception, the
      UDP header is removed and the packet is reinjected in the stack for the
      corresponding protocol associated with the socket (return -protocol
      from udp_encap_rcv function).
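
The receive step can be sketched in userspace as follows. The structs and `demo_fou_recv` are hypothetical simplifications of the socket and packet state; the meaning given to a positive return value is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8

/* Hypothetical, simplified view of a received packet. */
struct demo_pkt {
    uint16_t ip_tot_len;       /* length field of the outer IP header */
    size_t   transport_offset; /* where the transport header is taken to start */
};

/* Hypothetical per-socket state: the IP protocol mapped to the FOU port. */
struct demo_fou {
    uint8_t protocol;
};

/* Skip the UDP header and ask the caller to resubmit the packet to the
 * mapped protocol by returning its negated number. */
static int demo_fou_recv(const struct demo_fou *fou, struct demo_pkt *pkt)
{
    if (pkt->ip_tot_len < UDP_HDR_LEN)
        return 1; /* assumed: positive means "not handled as encapsulation" */

    pkt->ip_tot_len -= UDP_HDR_LEN;
    pkt->transport_offset += UDP_HDR_LEN;
    return -(int)fou->protocol;
}
```

For a socket mapped to IP protocol 4 (IPIP), the function returns -4 and the packet is treated from then on as if the inner IPv4 header directly followed the outer one.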
      
      GRO is provided with the appropriate fou_gro_receive and
      fou_gro_complete. These routines need to know the encapsulation
      protocol so we save that in udp_offloads structure with the port
      and pass it in the napi_gro_cb structure.
      
      * TX
      
      This patch series implements FOU transmit encapsulation for IPIP, GRE, and
      SIT. This is done by some common infrastructure in ip_tunnel including an
      ip_tunnel_encap to perform FOU encapsulation and common configuration
      to enable FOU on IP tunnels. FOU is configured on existing tunnels and
      does not create any new interfaces. The transmit and receive paths are
      independent, so use of FOU may be asymmetric between tunnel endpoints.
      
      * Configuration
      
      The fou module uses netlink to configure FOU receive ports. The ip
      command can be augmented with a fou subcommand to support this. e.g. to
      configure FOU for IPIP on port 5555:
      
        ip fou add port 5555 ipproto 4
      
      GRE, IPIP, and SIT have been modified with netlink commands to
      configure use of FOU on transmit. The "ip link" command will be
      augmented with an encap subcommand (for supporting various forms of
      secondary encapsulation). For instance, to configure an ipip tunnel
      with FOU on port 5555:
      
        ip link add name tun1 type ipip \
          remote 192.168.1.1 local 192.168.1.2 ttl 225 \
          encap fou encap-sport auto encap-dport 5555
      
      * Notes
        - This patch set does not implement GSO for FOU. The UDP encapsulation
          code assumes TEB, so that will need to be reimplemented.
        - When a packet is received through FOU, the UDP header is not
          actually removed from the skbuf; the pointers to the transport
          header and the length in the IP header are updated instead (like
          in ESP/UDP RX). A side effect is that the IP header will appear to
          an external observer (e.g. tcpdump) to have an incorrect checksum;
          it will be off by sizeof UDP header. If necessary we could adjust
          the checksum to compensate.
        - Performance results are below. My expectation is that FOU should
          entail little overhead (clearly there is some work to do :-) ).
          Optimizing UDP socket lookup for encapsulation ports should help
          significantly.
        - I really don't expect/want devices to have special support for any
          of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
          and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
          steering is provided by commonly implemented UDP hashing. GRO/GSO
          seem fairly comparable with LRO/TSO already.
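
The checksum side effect described in the second note can be reproduced in a small sketch: shrinking the IPv4 total length by the UDP header size without touching the checksum field makes the header fail verification. The helper names are hypothetical, and the sketch recomputes the checksum from scratch rather than adjusting it incrementally.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a header, folded to 16 bits. */
static uint16_t ones_sum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Fill in the checksum field (bytes 10-11) of a 20-byte IPv4 header. */
static void ipv4_set_checksum(uint8_t hdr[20])
{
    uint16_t c;

    hdr[10] = 0;
    hdr[11] = 0;
    c = (uint16_t)~ones_sum(hdr, 20);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
}

/* A valid header sums to 0xFFFF when the checksum field is included. */
static int ipv4_checksum_ok(const uint8_t hdr[20])
{
    return ones_sum(hdr, 20) == 0xFFFF;
}

/* Reduce the total length field (bytes 2-3) without fixing the checksum. */
static void ipv4_shrink_tot_len(uint8_t hdr[20], uint16_t by)
{
    uint16_t len = (uint16_t)((((uint16_t)hdr[2] << 8) | hdr[3]) - by);

    hdr[2] = (uint8_t)(len >> 8);
    hdr[3] = (uint8_t)(len & 0xFF);
}
```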
      
      * Performance
      
      Ran netperf TCP_RR and TCP_STREAM tests across various configurations.
      This was performed on bnx2x and I disabled TSO/GSO on sender to get
      fair comparison for FOU versus non-FOU. CPU utilization is reported
      for receive in TCP_STREAM.
      
        GRE
          IPv4, FOU, UDP checksum enabled
            TCP_STREAM
              24.85% CPU utilization
              9310.6 Mbps
            TCP_RR
              94.2% CPU utilization
              155/249/460 90/95/99% latencies
              1.17018e+06 tps
          IPv4, FOU, UDP checksum disabled
            TCP_STREAM
              31.04% CPU utilization
              9302.22 Mbps
            TCP_RR
              94.13% CPU utilization
              154/239/419 90/95/99% latencies
              1.17555e+06 tps
          IPv4, no FOU
            TCP_STREAM
              23.13% CPU utilization
              9354.58 Mbps
            TCP_RR
              90.24% CPU utilization
              156/228/360 90/95/99% latencies
              1.18169e+06 tps
      
        IPIP
          FOU, UDP checksum enabled
            TCP_STREAM
              24.13% CPU utilization
              9328 Mbps
            TCP_RR
              94.23% CPU utilization
              149/237/429 90/95/99% latencies
              1.19553e+06 tps
          FOU, UDP checksum disabled
            TCP_STREAM
              29.13% CPU utilization
              9370.25 Mbps
            TCP_RR
              94.13% CPU utilization
              149/232/398 90/95/99% latencies
              1.19225e+06 tps
          No FOU
            TCP_STREAM
              10.43% CPU utilization
              5302.03 Mbps
            TCP_RR
              51.53% CPU utilization
              215/324/475 90/95/99% latencies
              864998 tps
      
        SIT
          FOU, UDP checksum enabled
            TCP_STREAM
              30.38% CPU utilization
              9176.76 Mbps
            TCP_RR
              96.9% CPU utilization
              170/281/581 90/95/99% latencies
              1.03372e+06 tps
          FOU, UDP checksum disabled
            TCP_STREAM
              39.6% CPU utilization
              9176.57 Mbps
            TCP_RR
              97.14% CPU utilization
              167/272/548 90/95/99% latencies
              1.03203e+06 tps
          No FOU
            TCP_STREAM
              11.2% CPU utilization
              4636.05 Mbps
            TCP_RR
              59.51% CPU utilization
              232/346/489 90/95/99% latencies
              813199 tps
      
      v2:
        - Removed encap IP tunnel ioctls, configuration is done by netlink
          only.
        - Don't export fou_create and fou_destroy, they are currently
          intended to be called within fou module only.
        - Filled in tunnel netlink structures and functions for new values.
      
      v3:
        - Fixed change logs for some of the patches.
        - Remove inline from fou_gro_receive and fou_gro_complete, let
          compiler decide on these.
      
      v4:
        - Don't need to cast void in fou_from_sock
        - Removed incorrect htons for port in fou_destroy
        - Some minor cleanup for readability
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: Setup and TX path for gre/UDP foo-over-udp encapsulation · 4565e991
      Tom Herbert authored
      Added netlink attrs to configure FOU encapsulation for GRE, netlink
      handling of these flags, and properly adjust MTU for encapsulation.
      ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
      encapsulation.
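
The MTU adjustment amounts to subtracting one more header from the space available to the payload. A rough sketch, assuming an IPv4 outer header without options and a GRE header without optional fields; `tunnel_mtu` is a hypothetical helper, not a kernel function.

```c
#define IPV4_HDR_LEN     20 /* outer IPv4 header without options (assumed) */
#define GRE_BASE_HDR_LEN 4  /* GRE header without optional fields (assumed) */
#define FOU_ENCAP_LEN    8  /* UDP header added by FOU */

/* MTU left for the inner packet after all outer headers are accounted for. */
static int tunnel_mtu(int link_mtu, int tunnel_hlen, int encap_hlen)
{
    return link_mtu - IPV4_HDR_LEN - tunnel_hlen - encap_hlen;
}
```

On a 1500-byte link, plain GRE leaves 1476 bytes for the inner packet under these assumptions; adding FOU reduces that to 1468.
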
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation · 473ab820
      Tom Herbert authored
      Add netlink handling for IP tunnel encapsulation parameters and
      adjustment of MTU for encapsulation.  ip_tunnel_encap is called
      from ip_tunnel_xmit to actually perform FOU encapsulation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>