1. 22 Jan, 2015 30 commits
  2. 21 Jan, 2015 2 commits
    • David Vrabel's avatar
      xen-netfront: use correct linear area after linearizing an skb · 02db7d0c
      David Vrabel authored
      commit 11d3d2a1 upstream.
      
      Commit 97a6d1bb (xen-netfront: Fix
      handling packets on compound pages with skb_linearize) attempted to
      fix a problem where an skb that would have required too many slots
      would be dropped causing TCP connections to stall.
      
      However, it filled in the first slot using the original buffer and not
      the new one and would use the wrong offset and grant access to the
      wrong page.
      
      Netback would notice the malformed request and stop all traffic on the
      VIF, reporting:
      
          vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
          vif vif-3-0 vif3.0: fatal error; disabling device
      Reported-by: default avatarAnthony Wright <anthony@overnetdata.com>
      Tested-by: default avatarAnthony Wright <anthony@overnetdata.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      02db7d0c
    • Zoltan Kiss's avatar
      xen-netfront: Fix handling packets on compound pages with skb_linearize · de2d6357
      Zoltan Kiss authored
      commit 97a6d1bb upstream.
      
      There is a long known problem with the netfront/netback interface: if the guest
      tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
      it gets dropped. The reason is that netback maps these slots to a frag in the
      frags array, which is limited by size. Having so many slots can occur since
      compound pages were introduced, as the ring protocol slice them up into
      individual (non-compound) page aligned slots. The theoretical worst case
      scenario looks like this (note, skbs are limited to 64 Kb here):
      linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
      using 2 slots
      first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
      end and the beginning of a page, therefore they use 3 * 15 = 45 slots
      last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
      Although I don't think this 51 slots skb can really happen, we need a solution
      which can deal with every scenario. In real life there is only a few slots
      overdue, but usually it causes the TCP stream to be blocked, as the retry will
      most likely have the same buffer layout.
      This patch solves this problem by linearizing the packet. This is not the
      fastest way, and it can fail much easier as it tries to allocate a big linear
      area for the whole packet, but probably easier by an order of magnitude than
      anything else. Probably this code path is not touched very frequently anyway.
      Signed-off-by: default avatarZoltan Kiss <zoltan.kiss@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Paul Durrant <paul.durrant@citrix.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      de2d6357
  3. 20 Jan, 2015 1 commit
  4. 19 Jan, 2015 7 commits
    • Govindarajulu Varadarajan's avatar
      enic: fix rx skb checksum · 969d621e
      Govindarajulu Varadarajan authored
      commit 17e96834 upstream.
      
      Hardware always provides compliment of IP pseudo checksum. Stack expects
      whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set.
      
      This causes checksum error in nf & ovs.
      
      kernel: qg-19546f09-f2: hw csum failure
      kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF          O--------------   3.10.0-123.8.1.el7.x86_64 #1
      kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
      kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
      kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
      kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
      kernel: Call Trace:
      kernel: <IRQ>  [<ffffffff815e237b>] dump_stack+0x19/0x1b
      kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
      kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
      kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
      kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
      kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
      kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
      kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
      kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
      kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380
      
      Hardware verifies IP & tcp/udp header checksum but does not provide payload
      checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet.
      
      Cc: Jiri Benc <jbenc@redhat.com>
      Cc: Stefan Assmann <sassmann@redhat.com>
      Reported-by: default avatarSunil Choudhary <schoudha@redhat.com>
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Reviewed-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      969d621e
    • Jiri Pirko's avatar
      team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin · 0db432dd
      Jiri Pirko authored
      commit b0d11b42 upstream.
      
      This patch is fixing a race condition that may cause setting
      count_pending to -1, which results in unwanted big bulk of arp messages
      (in case of "notify peers").
      
      Consider following scenario:
      
      count_pending == 2
         CPU0                                           CPU1
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
       team_notify_peers
         atomic_add (adding 1 to count_pending)
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 0)
         schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to -1)
      
      Fix this race by using atomic_dec_if_positive - that will prevent
      count_pending running under 0.
      
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e  ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      0db432dd
    • Eric Dumazet's avatar
      alx: fix alx_poll() · 2cdf9e93
      Eric Dumazet authored
      commit 7a05dc64 upstream.
      
      Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered
      wrong alx_poll() behavior.
      
      A NAPI poll() handler is supposed to return exactly the budget when/if
      napi_complete() has not been called.
      
      It is also supposed to return number of frames that were received, so
      that netdev_budget can have a meaning.
      
      Also, in case of TX pressure, we still have to dequeue received
      packets : alx_clean_rx_irq() has to be called even if
      alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: d75b1ade ("net: less interrupt masking in NAPI")
      Reported-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Bisected-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Tested-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      2cdf9e93
    • Palik, Imre's avatar
      xen-netback: fixing the propagation of the transmit shaper timeout · 58db20fc
      Palik, Imre authored
      commit 07ff890d upstream.
      
      Since e9ce7cb6 ("xen-netback: Factor queue-specific data into queue struct"),
      the transimt shaper timeout is always set to 0.  The value the user sets via
      xenbus is never propagated to the transmit shaper.
      
      This patch fixes the issue.
      
      Cc: Anthony Liguori <aliguori@amazon.com>
      Signed-off-by: default avatarImre Palik <imrep@amazon.de>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      58db20fc
    • Herbert Xu's avatar
      tcp: Do not apply TSO segment limit to non-TSO packets · c17a8003
      Herbert Xu authored
      commit 843925f3 upstream.
      
      Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
      
      In fact the problem was completely unrelated to IPsec.  The bug is
      also reproducible if you just disable TSO/GSO.
      
      The problem is that when the MSS goes down, existing queued packet
      on the TX queue that have not been transmitted yet all look like
      TSO packets and get treated as such.
      
      This then triggers a bug where tcp_mss_split_point tells us to
      generate a zero-sized packet on the TX queue.  Once that happens
      we're screwed because the zero-sized packet can never be removed
      by ACKs.
      
      Fixes: 1485348d ("tcp: Apply device TSO segment limit earlier")
      Reported-by: default avatarThomas Jarosch <thomas.jarosch@intra2net.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      
      Cheers,
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ luis: backported to 3.16: used davem's backport to 3.18 ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      c17a8003
    • Jay Vosburgh's avatar
      net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · b2108a88
      Jay Vosburgh authored
      commit 2c26d34b upstream.
      
      When using VXLAN tunnels and a sky2 device, I have experienced
      checksum failures of the following type:
      
      [ 4297.761899] eth0: hw csum failure
      [...]
      [ 4297.765223] Call Trace:
      [ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
      [ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
      [ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
      [ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
      [ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
      [ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
      
      	These are reliably reproduced in a network topology of:
      
      container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
      
      	When VXLAN encapsulated traffic is received from a similarly
      configured peer, the above warning is generated in the receive
      processing of the encapsulated packet.  Note that the warning is
      associated with the container eth0.
      
              The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
      because the packet is an encapsulated Ethernet frame, the checksum
      generated by the hardware includes the inner protocol and Ethernet
      headers.
      
      	The receive code is careful to update the skb->csum, except in
      __dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
      calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
      to skip over the Ethernet header, but does not update skb->csum when
      doing so.
      
      	This patch resolves the problem by adding a call to
      skb_postpull_rcsum to update the skb->csum after the call to
      eth_type_trans.
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b2108a88
    • Thomas Graf's avatar
      net: Reset secmark when scrubbing packet · a6dd2222
      Thomas Graf authored
      commit b8fb4e06 upstream.
      
      skb_scrub_packet() is called when a packet switches between a context
      such as between underlay and overlay, between namespaces, or between
      L3 subnets.
      
      While we already scrub the packet mark, connection tracking entry,
      and cached destination, the security mark/context is left intact.
      
      It seems wrong to inherit the security context of a packet when going
      from overlay to underlay or across forwarding paths.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarFlavio Leitner <fbl@sysclose.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a6dd2222