1. 06 Nov, 2014 19 commits
  2. 05 Nov, 2014 14 commits
    • ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h · 25de4668
      WANG Cong authored
      It is only used in net/ipv6/inet6_hashtables.c.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller authored
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be fewer places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gue-next' · 1d76c1d0
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      gue: Remote checksum offload
      
      This patch set implements remote checksum offload for
      GUE, which is a mechanism that provides checksum offload of
      encapsulated packets using rudimentary offload capabilities found in
      most Network Interface Card (NIC) devices. The outer header checksum
      for UDP is enabled in packets and, with some additional meta
      information in the GUE header, a receiver is able to deduce the
      checksum to be set for an inner encapsulated packet. Effectively this
      offloads the computation of the inner checksum. Enabling the outer
      checksum in encapsulation has the additional advantage that it covers
      more of the packet than the inner checksum including the encapsulation
      headers.
      
      Remote checksum offload is described in:
      http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01
      
      The GUE transmit and receive paths are modified to support the
      remote checksum offload option. The option contains a checksum
      offset and checksum start which are directly derived from values
      set in stack when doing CHECKSUM_PARTIAL. On receipt of the option, the
      operation is to calculate the packet checksum from "start" to end of
      the packet (normally derived for checksum complete), and then set
      the resultant value at checksum "offset" (the checksum field has
      already been primed with the pseudo header). This emulates a NIC
      that implements NETIF_F_HW_CSUM.
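
The receive-side operation described above can be sketched in a few lines of Python. This is a minimal illustration, not the kernel implementation: the function names are hypothetical, and `start` and `offset` are assumed here to be byte offsets into one flat buffer, with the checksum field 16-bit aligned relative to `start`.

```python
import struct

def ones_complement_sum(data) -> int:
    """Folded 16-bit one's-complement sum of data (not inverted)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def resolve_remote_csum(packet: bytearray, start: int, offset: int) -> int:
    """Sum packet[start:] (checksum field included, already primed with
    the pseudo-header sum) and store the complement at `offset`."""
    csum = ~ones_complement_sum(packet[start:]) & 0xFFFF
    packet[offset:offset + 2] = struct.pack("!H", csum)
    return csum

# Illustrative packet: 4 bytes of "outer" data, then a 12-byte inner segment.
pseudo = 0x1234  # made-up pseudo-header sum used to prime the field
packet = bytearray(b"\xaa\xbb\xcc\xdd" + bytes(range(1, 13)))
start, offset = 4, 10
packet[offset:offset + 2] = struct.pack("!H", pseudo)
resolve_remote_csum(packet, start, offset)
```

A receiver verifying the inner packet afterwards sums the pseudo header together with the segment and expects all ones, which is the usual Internet checksum validity condition.
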
      
      The primary purpose of this feature is to eliminate cost of performing
      checksum calculation over a packet when encapsulating.
      
      In this patch set:
        - Move fou_build_header into fou.c and split it into a couple of
          functions
        - Enable offloading of outer UDP checksum in encapsulation
        - Change udp_offload to support remote checksum offload, includes
          new GSO type and ensuring encapsulated layers (TCP) doesn't try to
          set a checksum covered by RCO
        - TX support for RCO with GUE. This is configured through ip_tunnel
          and set the option on transmit when packet being encapsulated is
          CHECKSUM_PARTIAL
        - RX support for RCO with GUE for normal and GRO paths. Includes
          resolving the offloaded checksum
      
      v2:
        Address comments from davem: Move accounting for private option
        field in gue_encap_hlen to patch in which we add the remote checksum
        offload option.
      
      Testing:
      
      I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
      streams, comparing GUE with and without remote checksum offload (doing
      checksum-unnecessary to complete conversion in both cases). These
      were run on mlnx4 and bnx2x. Some mlnx4 results are below.
      
      GRE/GUE
          TCP_STREAM
            IPv4, with remote checksum offload
              9.71% TX CPU utilization
              7.42% RX CPU utilization
              36380 Mbps
            IPv4, without remote checksum offload
              12.40% TX CPU utilization
              7.36% RX CPU utilization
              36591 Mbps
          TCP_RR
            IPv4, with remote checksum offload
              77.79% CPU utilization
              91/144/216 90/95/99% latencies
              1.95127e+06 tps
            IPv4, without remote checksum offload
              78.70% CPU utilization
              89/152/297 90/95/99% latencies
              1.95458e+06 tps
      
      IPIP/GUE
          TCP_STREAM
            With remote checksum offload
              10.30% TX CPU utilization
              7.43% RX CPU utilization
              36486 Mbps
            Without remote checksum offload
              12.47% TX CPU utilization
              7.49% RX CPU utilization
              36694 Mbps
          TCP_RR
            With remote checksum offload
              77.80% CPU utilization
              87/153/270 90/95/99% latencies
              1.98735e+06 tps
            Without remote checksum offload
              77.98% CPU utilization
              87/150/287 90/95/99% latencies
              1.98737e+06 tps
      
      SIT/GUE
          TCP_STREAM
            With remote checksum offload
              9.68% TX CPU utilization
              7.36% RX CPU utilization
              35971 Mbps
            Without remote checksum offload
              12.95% TX CPU utilization
              8.04% RX CPU utilization
              36177 Mbps
          TCP_RR
            With remote checksum offload
              79.32% CPU utilization
              94/158/295 90/95/99% latencies
              1.88842e+06 tps
            Without remote checksum offload
              80.23% CPU utilization
              94/149/226 90/95/99% latencies
              1.90338e+06 tps
      
      VXLAN
          TCP_STREAM
              35.03% TX CPU utilization
              20.85% RX CPU utilization
              36230 Mbps
          TCP_RR
              77.36% CPU utilization
              84/146/270 90/95/99% latencies
              2.08063e+06 tps
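
As a rough reading of the TCP_STREAM figures above, the relative drop in TX CPU utilization with remote checksum offload can be computed directly from the reported percentages. The helper name below is hypothetical; the numbers are copied from the mlnx4 results listed above.

```python
def relative_reduction(without_rco: float, with_rco: float) -> float:
    """Relative reduction, in percent, going from without_rco to with_rco."""
    return 100.0 * (without_rco - with_rco) / without_rco

# (TX CPU % without RCO, TX CPU % with RCO), TCP_STREAM, from the results above
tx_cpu = {
    "GRE/GUE": (12.40, 9.71),
    "IPIP/GUE": (12.47, 10.30),
    "SIT/GUE": (12.95, 9.68),
}

reductions = {name: relative_reduction(*pair) for name, pair in tx_cpu.items()}
for name, pct in reductions.items():
    print(f"{name}: TX CPU utilization lower by {pct:.1f}% (relative)")
```

In the same runs the reported throughput with and without the option differs by well under 1%.
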
      
      We can also look at CPU time in csum_partial using perf (with bnx2x
      setup). For GRE with TCP_STREAM I see:
      
          With remote checksum offload
              0.33% TX
              1.81% RX
          Without remote checksum offload
              6.00% TX
              0.51% RX
      
      I suspect the fact that time in csum_partial noticeably increases
      with remote checksum offload for RX is due to taking the cache miss on
      the encapsulated header in that function. By similar reasoning, if on
      the TX side the packet were not in cache (say we did a splice from a
      file whose data was never touched by the CPU) the CPU savings for TX
      would probably be more pronounced.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gue: Receive side of remote checksum offload · a8d31c12
      Tom Herbert authored
      Add processing of the remote checksum offload option in both the normal
      path as well as the GRO path. This implements patching the affected
      checksum to derive the offloaded checksum.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gue: TX support for using remote checksum offload option · b17f709a
      Tom Herbert authored
      Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
      remote checksum offload on an IP tunnel. Add logic in gue_build_header
      to insert remote checksum offload option.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gue: Protocol constants for remote checksum offload · c1aa8347
      Tom Herbert authored
      Define a private flag for remote checksum offload as well as a length
      for the option.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Changes to udp_offload to support remote checksum offload · e585f236
      Tom Herbert authored
      Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
      checksum offload being done (in this case inner checksum must not
      be offloaded to the NIC).
      
      Added logic in __skb_udp_tunnel_segment to handle remote checksum
      offload case.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gue: Add infrastructure for flags and options · 5024c33a
      Tom Herbert authored
      Add functions and basic definitions for processing standard flags,
      private flags, and control messages. This includes definitions
      to compute length of optional fields corresponding to a set of flags.
      Flag validation is in validate_gue_flags function. This checks for
      unknown flags, and that length of optional fields is <= length
      in guehdr hlen.
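
The validation rule described above (reject unknown flags, and require that the optional fields implied by the flags fit in the header length) can be sketched as follows. The constant names and values are illustrative assumptions, not copied from the patch, and `optlen` stands for the number of option bytes announced by the header.

```python
# Illustrative flag layout (assumed for this sketch).
FLAG_PRIV = 0x0001           # standard flag: a private flags field is present
LEN_PRIV = 4                 # size in bytes of the private flags field
PFLAG_REMCSUM = 0x80000000   # private flag: remote checksum offload option
PLEN_REMCSUM = 4             # size in bytes of that option

KNOWN_FLAGS = FLAG_PRIV
KNOWN_PFLAGS = PFLAG_REMCSUM

def options_len(flags: int, pflags: int = 0) -> int:
    """Bytes of optional fields implied by a set of flags."""
    length = 0
    if flags & FLAG_PRIV:
        length += LEN_PRIV
        if pflags & PFLAG_REMCSUM:
            length += PLEN_REMCSUM
    return length

def validate_flags(flags: int, pflags: int, optlen: int) -> bool:
    """True if no unknown flag is set and the implied options fit in optlen."""
    if flags & ~KNOWN_FLAGS:
        return False
    if (flags & FLAG_PRIV) and (pflags & ~KNOWN_PFLAGS):
        return False
    return options_len(flags, pflags) <= optlen
```

Computing the implied length from the flags, rather than trusting the header length alone, keeps a receiver from reading option fields that were never actually sent.
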
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Offload outer UDP tunnel csum if available · 4bcb877d
      Tom Herbert authored
      In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
      ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
      if device features allow it.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Move fou_build_header into fou.c and refactor · 63487bab
      Tom Herbert authored
      Move fou_build_header out of ip_tunnel.c and into fou.c splitting
      it up into fou_build_header, gue_build_header, and fou_build_udp.
      This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
      to call fou_build_header or gue_build_header based on the tunnel
      encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
      functions which are called by ip_encap_hlen. New net/fou.h has
      prototypes and defines for this.
      
      Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
      can use FOU/GUE and fou module is also selected.
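
The dispatch on tunnel encapsulation type described above can be pictured with a small sketch. It only models the header-length side; the Python names mirror the functions mentioned in the commit message but are otherwise hypothetical, and the sizes assume an 8-byte UDP header and a 4-byte base GUE header with no options.

```python
# Encapsulation types (numeric values assumed for illustration).
ENCAP_NONE, ENCAP_FOU, ENCAP_GUE = 0, 1, 2

UDP_HLEN = 8       # UDP header size in bytes
GUE_BASE_HLEN = 4  # base GUE header size, options not counted

def fou_encap_hlen() -> int:
    """FOU adds only a UDP header in front of the tunneled packet."""
    return UDP_HLEN

def gue_encap_hlen() -> int:
    """GUE adds a UDP header followed by the GUE header."""
    return UDP_HLEN + GUE_BASE_HLEN

def ip_encap_hlen(encap_type: int) -> int:
    """Pick the per-encapsulation header length, as the dispatcher would."""
    if encap_type == ENCAP_NONE:
        return 0
    if encap_type == ENCAP_FOU:
        return fou_encap_hlen()
    if encap_type == ENCAP_GUE:
        return gue_encap_hlen()
    raise ValueError(f"unknown encapsulation type {encap_type}")
```

Keeping the per-protocol helpers separate from the dispatcher is what lets other transmit paths reuse them without going through the IP tunnel code.
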
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-next' · 890b7916
      David S. Miller authored
      Giuseppe Cavallaro says:
      
      ====================
      stmmac: review driver Koptions
      
      Recently many Koption options have been added to have new glue logic on several
      platforms.
      
      The main goal behind this work is to guarantee that the driver builds
      fine on all the branches where it is present, independently of which
      glue logic is selected.
      
      IMHO, it is better to remove all the unnecessary Koption(s) that can hide
      build problems when something changes in the driver, especially when
      the DT compatibility allows us to manage all the platform data.
      
      I compiled the driver w/o any issue on net-next Git for:
      
        x86, arm and sh4.
      
      In case there are build problems on some repos, it will now be
      easy to catch them and cherry-pick patches from mainstream.
      
      For sure, do not hesitate to contact me in case of issue.
      
      Also this set removes STMMAC_DEBUG_FS and BUS_MODE_DA. The latter is useless
      and the former can be replaced by DEBUG_FS (always to make safe the build).
      
      V2: patch-set re-based on top of the latest updates for net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: remove BUS_MODE_DA · 98fbebcb
      Giuseppe CAVALLARO authored
      This is a very old and often unused option to configure
      a bit in a register inside the DMA. This support should
      not stay under Koption and should be extended for new chips too.
      This will be done later, maybe via device-tree parameters.
      Also, there is no performance impact when removing this setting on STi platforms.
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: remove STMMAC_DEBUG_FS · 50fb4f74
      Giuseppe CAVALLARO authored
      The STMMAC_DEBUG_FS Koption is now removed from the
      driver configuration and this support will be built
      by default when DEBUG_FS is present. This can also be
      useful for verifying that the driver builds.
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: remove specific SoC Koption from platform. · c0d54066
      Giuseppe CAVALLARO authored
      This patch removes all the Koptions added to build the glue-logic files
      for all different architectures: DWMAC_MESON, DWMAC_SUNXI, DWMAC_STI ...
      Nowadays the stmmac needs to be compiled on several platforms; in some
      cases it is very convenient to guarantee that its build always completes
      successfully on all the branches where the driver is present.
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Nov, 2014 7 commits
    • net: phy: spi_ks8995: remove sysfs bin file by registered attribute · 30349bdb
      Vladimir Zapolskiy authored
      When a sysfs binary file is asked to be removed, it is found by
      attribute name, so strictly speaking this change is not a fix.
      However, in case the attribute name is changed in the driver or sysfs
      internals are changed, it might be better to remove the previously
      created file using exactly the same binary attribute.
      Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Fabian Frederick
      6cf1093e
    • ipv6: trivial, add bracket for the if block · 869ba988
      Florent Fourcot authored
      The "else" block is on several lines and uses brackets.
      Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Fabian Frederick
      05006e8c
    • Merge branch 'ecn_via_routing_table' · 90284c2b
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: allow setting ecn via routing table
      
      Here is v4 of the patchset, it's exactly the same as v3 except in patch3/3
      where I added the missing 'const' qualifier to a function argument that
      Eric spotted during review.
      
      I preserved Eric's Acks so that he doesn't have to resend them.
      
      v3 cover letter:
      
      When using syn cookies, do not simply trust that the echoed timestamp
      was not modified; this makes sure that ecn is not turned on magically
      when it is disabled on the host.
      
      The first two patches, which were not part of earlier series, prepare
      the cookie code for the ecn route metrics change by allowing us to
      more easily use the existing dst object for ecn validation.
      
      The 3rd patch adds the ecn route metric feature support.
      It is almost the same as in v2, except that we'll now also test the
      dst_features when decoding a syn cookie timestamp that indicates ecn support.
      
      These three patches then allow turning on explicit congestion notification
      based on the destination network.
      
      For example, assuming the default tcp_ecn sysctl '2', the following will
      enable ecn (tcp_ecn=1 behaviour, i.e. request ecn to be enabled for a
      tcp connection) for all connections to hosts inside the 192.168.2/24 network:
      
      ip route change 192.168.2.0/24 dev eth0 features ecn
      
      Having a more fine-grained per-route setting can be beneficial for
      various reasons, for example 1) within data centers, or 2) local ISPs
      may deploy ECN support for their own video/streaming services [1], etc.
      
      Joint work with Daniel Borkmann, feature suggested by Hannes Frederic Sowa.
      
      The patch to enable this in iproute2 will be posted shortly, it is currently
      also available here:
      http://git.breakpoint.cc/cgit/fw/iproute2.git/commit/?h=iproute_features&id=8843d2d8973fb81c78a7efe6d42e3a17d739003e
      
      [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow setting ecn via routing table · f7b3bec6
      Florian Westphal authored
      This patch allows setting ECN on a per-route basis in case the sysctl
      tcp_ecn is not set to 1. In other words, when ECN is set for specific
      routes, it provides a tcp_ecn=1 behaviour for that route while the rest
      of the stack acts according to the global settings.
      
      One can use 'ip route change dev $dev $net features ecn' to toggle this.
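
The resulting decision for an outgoing connection can be summarized in a short sketch. This is a simplified model under stated assumptions, not the kernel code: the function name is hypothetical, `tcp_ecn` is the global sysctl value, and `route_has_ecn` stands for the per-route "features ecn" setting.

```python
def should_request_ecn(tcp_ecn: int, route_has_ecn: bool) -> bool:
    """Decide whether an outgoing connection requests ECN.

    tcp_ecn == 1 requests ECN everywhere; for any other value, the
    per-route feature gives tcp_ecn=1 behaviour on that route only.
    """
    if tcp_ecn == 1:
        return True
    return route_has_ecn

# With the default sysctl value 2, only routes flagged with the ecn
# feature request ECN; all other destinations follow the global setting.
default_sysctl = 2
flagged_route = should_request_ecn(default_sysctl, route_has_ecn=True)
plain_route = should_request_ecn(default_sysctl, route_has_ecn=False)
```
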
      
      Having a more fine-grained per-route setting can be beneficial for various
      reasons, for example, 1) within data centers, or 2) local ISPs may deploy
      ECN support for their own video/streaming services [1], etc.
      
      There was a recent measurement study/paper [2] which scanned the Alexa's
      publicly available top million websites list from a vantage point in US,
      Europe and Asia:
      
      Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
      attributable to commit 255cac91 ("tcp: extend ECN sysctl to allow
      server-side only ECN") ;)); a break in on-path connectivity was found in
      about 1 in 10,000 cases. Timeouts rather than receiving back RSTs were
      much more common in the negotiation phase (and mostly seen in the Alexa
      middle band, ranks around 50k-150k): of 12 thousand hosts on which
      there _may_ be ECN-linked connection failures, only 79 failed with RST
      while _not_ failing with RST when ECN is not requested.
      
      It's unclear though, how much equipment in the wild actually marks CE
      when buffers start to fill up.
      
      We thought about a fallback to non-ECN for retransmitted SYNs as another
      global option (which could perhaps one day be made default), but as Eric
      points out, there's much more work needed to detect broken middleboxes.
      
      Two examples Eric mentioned are buggy firewalls that accept only a single
      SYN per flow, and middleboxes that successfully let an ECN flow establish,
      but later mark CE for all packets (so cwnd converges to 1).
      
       [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
       [2] http://ecn.ethz.ch/
      
      Joint work with Daniel Borkmann.
      
      Reference: http://thread.gmane.org/gmane.linux.network/335797
      Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • syncookies: split cookie_check_timestamp() into two functions · f1673381
      Florian Westphal authored
      The function cookie_check_timestamp(), both called from IPv4/6 context,
      is being used to decode the echoed timestamp from the SYN/ACK into TCP
      options used for follow-up communication with the peer.
      
      We can remove ECN handling from that function, split it into a separate
      one, and simply rename the original function into cookie_decode_options().
      cookie_decode_options() just fills in tcp_option struct based on the
      echoed timestamp received from the peer. Anything that fails in this
      function will actually discard the request socket.
      
      While this is the natural place for decoding options such as ECN which
      commit 172d69e6 ("syncookies: add support for ECN") added, we argue
      that in particular for ECN handling, it can be checked at a later point
      in time as the request sock would actually not need to be dropped from
      this, but just ECN support turned off.
      
      Therefore, we split this functionality into cookie_ecn_ok(), which tells
      us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
      
      This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
      won't be enough anymore at that point; if the timestamp indicates ECN
      and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
      
      This would mean adding a route lookup to cookie_check_timestamp(), which
      we definitely want to avoid. As we already do a route lookup at a later
      point in cookie_{v4,v6}_check(), we can simply make use of that as well
      for the new cookie_ecn_ok() function w/o any additional cost.
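
The split check can be modelled roughly as below. This is a sketch under stated assumptions rather than the kernel function: the bit position of the ECN marker in the echoed timestamp is chosen for illustration, and the `route_has_ecn` parameter models the later per-route extension that this patch only prepares for.

```python
TS_ECN_BIT = 1 << 5  # assumed position of the ECN marker in the timestamp

def cookie_ecn_ok(tsecr: int, tcp_ecn: int, route_has_ecn: bool = False) -> bool:
    """Return True if ECN may be enabled for a syncookie connection.

    The echoed timestamp must indicate ECN, and ECN must be allowed
    either by the tcp_ecn sysctl or (in the per-route extension) by
    the route that was looked up anyway during cookie validation.
    """
    if not tsecr & TS_ECN_BIT:
        return False          # peer never negotiated ECN
    if tcp_ecn != 0:
        return True           # sysctl enables ECN
    return route_has_ecn      # sysctl is 0: fall back to the dst metric
```

Because a False result only turns ECN off, a forged or altered timestamp cannot enable ECN on a host where it is disabled, and the request socket does not need to be dropped.
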
      
      Joint work with Daniel Borkmann.
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>