1. 19 May, 2015 12 commits
    • tcp: add rfc3168, section 6.1.1.1. fallback · 49213555
      Daniel Borkmann authored
      This work is a follow-up of commit f7b3bec6 ("net: allow setting ecn
      via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
      ECN connections. In other words, this work adds a retry with a non-ECN
      setup SYN packet, as suggested by the RFC on the first timeout:
      
        [...] A host that receives no reply to an ECN-setup SYN within the
        normal SYN retransmission timeout interval MAY resend the SYN and
        any subsequent SYN retransmissions with CWR and ECE cleared. [...]
      
      Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
      that is, Linux default since 2009 via commit 255cac91 ("tcp: extend
      ECN sysctl to allow server-side only ECN"):
      
       1) Normal ECN-capable path:
      
          SYN ECE CWR ----->
                      <----- SYN ACK ECE
                  ACK ----->
      
       2) Path with broken middlebox, when client has fallback:
      
          SYN ECE CWR ----X crappy middlebox drops packet
                            (timeout, rtx)
                  SYN ----->
                      <----- SYN ACK
                  ACK ----->
      
      In case we would not have the fallback implemented, the middlebox drop
      point would basically end up as:
      
          SYN ECE CWR ----X crappy middlebox drops packet
                            (timeout, rtx)
          SYN ECE CWR ----X crappy middlebox drops packet
                            (timeout, rtx)
          SYN ECE CWR ----X crappy middlebox drops packet
                            (timeout, rtx)
      
      In any case, only a small percentage of sites would see such additional
      setup latency: at the end of 2014, ~56% of IPv4 and 65% of IPv6 servers
      on the Alexa 1 million list were found to negotiate ECN (aka tcp_ecn=2
      default), while 0.42% of these webservers fail to connect when the client
      tries to negotiate ECN (tcp_ecn=1) due to timeouts, which the fallback
      would mitigate with a slight latency trade-off. Recent related paper on
      this topic:
      
        Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
        Gorry Fairhurst, and Richard Scheffenegger:
          "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
          Proc. PAM 2015, New York.
        http://ecn.ethz.ch/ecn-pam15.pdf
      
      Thus, when net.ipv4.tcp_ecn=1 is set, the patch will perform the RFC3168,
      section 6.1.1.1. fallback on timeout. For users who explicitly do not want
      this, which can be the case in data-center (DC) deployments, we add a
      net.ipv4.tcp_ecn_fallback knob that allows for disabling the fallback.
      
      tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
      rather we let tcp_ecn_rcv_synack() take that over on input path in case a
      SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
      ECN being negotiated eventually in that case.
      
      Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
      Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
      Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
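The client-side behaviour described in the tcp fallback commit above can be sketched as a toy model. This is not kernel code: the `Client` class and its method names are hypothetical, and the only details taken from the commit message are the two sysctl names, the rule that a retransmitted SYN drops ECE and CWR when the fallback is enabled, and the fact that a late SYN ACK carrying ECE can still turn ECN on.

```python
from dataclasses import dataclass

# TCP header flag bits.
SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80


@dataclass
class Client:
    """Toy model of an active opener (hypothetical, for illustration only)."""
    tcp_ecn: int = 1           # 1 = request ECN on outgoing connections
    tcp_ecn_fallback: int = 1  # 1 = retry with a non-ECN SYN after a timeout
    ecn_ok: bool = False       # becomes True once ECN has been negotiated

    def syn_flags(self, attempt: int) -> int:
        """Flags of the attempt-th SYN; attempt 0 is the initial transmission."""
        flags = SYN
        is_retransmit = attempt > 0
        if self.tcp_ecn == 1 and not (self.tcp_ecn_fallback and is_retransmit):
            flags |= ECE | CWR
        return flags

    def rcv_synack(self, flags: int) -> None:
        """ECN is on only if we asked for it and the SYN ACK has ECE without CWR.

        The request state is deliberately not reset by a non-ECN retransmit,
        so a delayed SYN ACK ECE still negotiates ECN.
        """
        self.ecn_ok = self.tcp_ecn == 1 and bool(flags & ECE) and not (flags & CWR)


client = Client()
first, retry = client.syn_flags(0), client.syn_flags(1)
client.rcv_synack(SYN | ACK | ECE)  # delayed reply to the first, ECN-setup SYN
```

With `tcp_ecn_fallback=0` the model keeps sending ECN-setup SYNs on every retransmission, which corresponds to the repeated-drop diagram in the commit message.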
    • Merge branch 'cxgb4-next' · 134e0dbe
      David S. Miller authored
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Remove dead code and replace byte-order functions
      
      This series removes the dead functions t4_read_edc and t4_read_mc, and also
      replaces ntoh{s,l} and hton{s,l} calls with the generic byteorder.
      
      PATCH 2/2 was sent as a single PATCH, but had some byte-ordering issues
      in the t4_read_edc and t4_read_mc functions. Found that t4_read_edc and
      t4_read_mc are unused, so PATCH 1/2 is added to remove them.
      
      This patch series is created against net-next tree and includes
      patches on cxgb4 driver.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: replace ntoh{s, l} and hton{s, l} calls with the generic byteorder · f404f80c
      Hariprasad Shenai authored
      replace ntoh{s,l} and hton{s,l} calls with the generic byteorder in
      cxgb4/t4_hw.c file
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
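As a rough illustration of what the byteorder replacement in the cxgb4 commit above means, the sketch below models the conversion in Python. It assumes that "the generic byteorder" refers to the kernel's `be32_to_cpu()`/`cpu_to_be32()` style helpers; the Python functions borrow those names only for readability and are not the kernel API. Both spellings compute the same value; the generic helpers name the wire endianness explicitly.

```python
import socket
import sys


def be32_to_cpu(raw: bytes) -> int:
    """Interpret 4 bytes as a big-endian 32-bit value (model, not the kernel helper)."""
    return int.from_bytes(raw, "big")


def cpu_to_be32(value: int) -> bytes:
    """Encode a 32-bit value as 4 big-endian bytes (model, not the kernel helper)."""
    return value.to_bytes(4, "big")


def ntohl_on_raw(raw: bytes) -> int:
    """Old style: load the bytes in host order, then convert with ntohl()."""
    return socket.ntohl(int.from_bytes(raw, sys.byteorder))


wire = b"\x12\x34\x56\x78"  # a register value as it appears on the wire
assert be32_to_cpu(wire) == ntohl_on_raw(wire)
assert cpu_to_be32(be32_to_cpu(wire)) == wire
```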
    • Hariprasad Shenai's avatar
    • Merge tag 'mac80211-next-for-davem-2015-05-19' of... · b7a3a8e3
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2015-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      This just has a few fixes:
       * LED throughput trigger was crashing
       * fast-xmit wasn't treating QoS changes in IBSS correctly
       * TDLS could use the wrong channel definition
       * using a reserved channel context could use the wrong channel width
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: make hwmon interface optional · 9a03259c
      Arnd Bergmann authored
      The hwmon interface in the be2net driver causes a link error when
      be2net is built-in while the hwmon subsystem is a loadable module:
      
      drivers/built-in.o: In function `be_probe':
      drivers/net/ethernet/emulex/benet/be_main.c:5761: undefined reference to `devm_hwmon_device_register_with_groups'
      
      This adds a new Kconfig symbol, following the example of multiple
      other drivers that have the same problem. The new CONFIG_BE2NET_HWMON
      will not be available when (BE2NET=y && HWMON=m) to avoid this
      problem.
      
      We have to also mark be_hwmon_show_temp as 'static' to ensure the
      compiler can optimize out all the unused code.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 29e9122b ("be2net: Export board temperature using hwmon-sysfs interface.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
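The availability rule for the new symbol in the be2net commit above can be written out as a small truth table. The sketch below is a model of the dependency, not the Kconfig text itself. The only rule taken from the commit message is that CONFIG_BE2NET_HWMON is unavailable when BE2NET=y and HWMON=m; treating either symbol being disabled ("n") as "unavailable" is an assumption.

```python
from itertools import product


def be2net_hwmon_available(be2net: str, hwmon: str) -> bool:
    """Return True if the optional hwmon support could be enabled.

    be2net and hwmon are Kconfig tristate values: "y", "m" or "n".
    """
    if be2net == "n" or hwmon == "n":
        return False  # assumption: nothing to hook up if either side is off
    # A built-in driver cannot call into a loadable hwmon module.
    return not (be2net == "y" and hwmon == "m")


# Print the full table for the nine tristate combinations.
for be2net, hwmon in product("ymn", repeat=2):
    print(f"BE2NET={be2net} HWMON={hwmon} -> {be2net_hwmon_available(be2net, hwmon)}")
```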
    • tcp: Return error instead of partial read for saved syn headers · aea0929e
      Eric B Munson authored
      Currently the getsockopt() requesting the cached contents of the syn
      packet headers will fail silently if the caller uses a buffer that is
      too small to contain the requested data.  Rather than fail silently and
      discard the headers, getsockopt() should return an error and report the
      required size to hold the data.
      Signed-off-by: Eric B Munson <emunson@akamai.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
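The calling pattern that the saved-SYN change above enables can be sketched with a small model. This is not the socket API: `get_saved_syn()` and `BufferTooSmall` are hypothetical stand-ins for the getsockopt() call and its error return. The commit message does not name the error code, so the model only captures that the call fails and reports the required size.

```python
class BufferTooSmall(Exception):
    """Stand-in for the error getsockopt() returns; carries the required size."""

    def __init__(self, required: int) -> None:
        super().__init__(f"buffer too small, need {required} bytes")
        self.required = required


def get_saved_syn(saved: bytes, buflen: int) -> bytes:
    """Model of the new behaviour: fail loudly instead of discarding the headers."""
    if buflen < len(saved):
        raise BufferTooSmall(len(saved))
    return saved


def read_saved_syn(saved: bytes, first_guess: int = 16) -> bytes:
    """Caller side: try a small buffer, then retry with the reported size."""
    try:
        return get_saved_syn(saved, first_guess)
    except BufferTooSmall as err:
        return get_saved_syn(saved, err.required)


headers = bytes(range(60))  # pretend these are 60 bytes of saved SYN headers
assert read_saved_syn(headers) == headers
```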
    • Parav Pandit's avatar
    • Merge branch 'icmp_frag' · 76d7c457
      David S. Miller authored
      Andy Zhou says:
      
      ====================
      fragmentation ICMP
      
      Currently, we send ICMP packets when errors occur during fragmentation or
      de-fragmentation.  However, it is a bug to send those ICMP packets
      when netfilter is used for bridging.
      
      Those ICMP packets are only expected in the context of routing, not in
      bridging mode.
      
      The local stack is not involved in bridging forward decisions, and thus
      should not be used for deciding the reverse path for those ICMP messages.
      
      This bug only affects IPv4, not IPv6.
      
      v1->v2:  restructure the patches into two patches that fix defragmentation and
               fragmentation respectively.
      
               A bit is added in IPCB to control whether an ICMP packet should be
               generated for defragmentation.
      
               Fragmentation ICMP is now removed by restructuring the
               ip_fragment() API.
      
      v2->v3:  Add dropping ICMP for bridge conntrack users,
               drop exporting the ip_fragment() API.
      
      v3->v4:  Remove unnecessary parentheses in 'return' statements
      
      v4->v5:  Drop the patch that sets and checks a bit in IPCB
               that prevents ip_defrag to send ICMP.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge_netfilter: No ICMP packet on IPv4 fragmentation error · 49d16b23
      Andy Zhou authored
      When bridge netfilter re-fragments an IP packet for output, all
      packets that can not be re-fragmented to their original input size
      should be silently discarded.
      
      However, the current bridge netfilter output path generates an ICMP packet
      with a 'size exceeded MTU' message for such packets; this is a bug.
      
      This patch refactors the ip_fragment() API to allow two separate
      use cases. The bridge netfilter use case will not
      send ICMP; the routing output will, as before.
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IPv4: skip ICMP for bridge contrack users when defrag expires · 8bc04864
      Andy Zhou authored
      Users in [IP_DEFRAG_CONNTRACK_BRIDGE_IN, __IP_DEFRAG_CONNTRACK_BR_IN]
      should not generate an ICMP message either.
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
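The range check described in the defrag-expiry commit above amounts to a simple interval test on the defrag "user" value. The sketch below is a model only: the numeric values of the two bounds are hypothetical placeholders (the real ones live in the kernel headers), and `skip_icmp_on_expire()` is an illustrative name rather than the kernel function.

```python
# Hypothetical placeholder values; only the ordering FIRST <= LAST matters here.
BRIDGE_CONNTRACK_IN_FIRST = 100  # stands in for IP_DEFRAG_CONNTRACK_BRIDGE_IN
BRIDGE_CONNTRACK_IN_LAST = 199   # stands in for __IP_DEFRAG_CONNTRACK_BR_IN


def skip_icmp_on_expire(user: int) -> bool:
    """Return True if an expired fragment queue of this user must not send ICMP.

    Only the bridge conntrack range from the commit message is modelled.
    """
    return BRIDGE_CONNTRACK_IN_FIRST <= user <= BRIDGE_CONNTRACK_IN_LAST


assert skip_icmp_on_expire(BRIDGE_CONNTRACK_IN_FIRST)
assert skip_icmp_on_expire(BRIDGE_CONNTRACK_IN_LAST)
assert not skip_icmp_on_expire(BRIDGE_CONNTRACK_IN_LAST + 1)
```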
    • ipv4: introduce frag_expire_skip_icmp() · 5cf42280
      Andy Zhou authored
      Improve readability of skip ICMP for de-fragmentation expiration logic.
      This change will also make the logic easier to maintain when the
      following patches in this series are applied.
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 May, 2015 26 commits
  3. 17 May, 2015 2 commits
    • net: fix two sparse errors · c91d4606
      Eric Dumazet authored
      First one in __skb_checksum_validate_complete() fixes the following
      (and other callers)
      
      make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/tcp_ipv4.o
        CHECK   net/ipv4/tcp_ipv4.c
      include/linux/skbuff.h:3052:24: warning: incorrect type in return expression (different base types)
      include/linux/skbuff.h:3052:24:    expected restricted __sum16
      include/linux/skbuff.h:3052:24:    got int
      
      Second is fixing gso_make_checksum() :
      
        CHECK   net/ipv4/gre_offload.c
      include/linux/skbuff.h:3360:14: warning: incorrect type in assignment (different base types)
      include/linux/skbuff.h:3360:14:    expected unsigned short [unsigned] [usertype] csum
      include/linux/skbuff.h:3360:14:    got restricted __sum16
      include/linux/skbuff.h:3365:16: warning: incorrect type in return expression (different base types)
      include/linux/skbuff.h:3365:16:    expected restricted __sum16
      include/linux/skbuff.h:3365:16:    got unsigned short [unsigned] [usertype] csum
      
      Fixes: 5a212329 ("net: Support for csum_bad in skbuff")
      Fixes: 7e2b10c1 ("net: Support for multiple checksums with gso")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      CC: Tom Herbert <tom@herbertland.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: synproxy: fix sparse errors · ba6d0564
      Eric Dumazet authored
      Fix verbose sparse errors :
      
      make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/netfilter/ipt_SYNPROXY.o
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>