1. 16 Dec, 2014 5 commits
    • net: mvneta: fix Tx interrupt delay · 9823d713
      Willy Tarreau authored
      [ Upstream commit aebea2ba ]
      
      The mvneta driver sets the number of Tx coalesce packets to 16 by
      default. Normally that does not cause any trouble since the driver
      uses a much larger Tx ring size (532 packets). But some sockets
      might run with very small buffers, much smaller than the equivalent
      of 16 packets. ping does this, for example, by setting SNDBUF to
      324 bytes, which the kernel rounds up to 2 kB.
      
      The problem is that there is no documented method to force a specific
      packet to emit an interrupt (e.g. the last of the ring), nor is it
      possible to make the NIC emit an interrupt after a given delay.
      
      In this case, it causes trouble: when ping sends packets over its
      raw socket, the first few packets leave the system, but the first
      15 packets are emitted without an IRQ being generated, and so
      without the skbs being freed. Since the socket's buffer is small,
      there is no way to reach that number of packets, and ping ends up
      with "send: no buffer available" after sending 6 packets. Running
      3 instances of ping in parallel is enough to hide the problem,
      because with 6 packets per instance, that's 18 packets total, which
      is enough to trigger a Tx interrupt before all are sent.
      
      The original driver in the LSP kernel worked around this design flaw
      by using a software timer to clean up the Tx descriptors. This timer
      was slow and caused terrible network performance on some Tx-bound
      workloads (such as routing) but was enough to make tools like ping
      work correctly.
      
      Here, instead, we simply set the packet count before interrupt to 1.
      This ensures that each packet sent will produce an interrupt. NAPI
      takes care of coalescing interrupts since the interrupt is disabled
      once generated.
      
      No measurable impact on performance or CPU usage was observed with
      small or large packets, including when saturating the link on Tx,
      and this fixes tools like ping which rely on too small a send
      buffer. If one wants to increase this value for certain workloads
      where it is safe to do so, "ethtool -C $dev tx-frames" will override
      this default setting.
      
      This fix needs to be applied to stable kernels starting with 3.10.
      Tested-by: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
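The stall described in this commit message can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the driver's logic: the function name, the round-robin scheduling, and the per-packet charge of 340 bytes are hypothetical, with the charge chosen so that exactly 6 packets fit in a 2 kB send buffer as the message reports. The model frees all pending packets only when the coalescing threshold is reached.

```python
def packets_before_stall(sndbuf, charge, coalesce, sockets=1, limit=1000):
    """Toy model of Tx-completion coalescing with small send buffers.

    sndbuf   -- per-socket send buffer in bytes
    charge   -- bytes charged to the socket per in-flight packet (assumed)
    coalesce -- packets that must complete before a Tx interrupt fires
    Returns the total number of packets sent when every socket is blocked,
    or None if `limit` packets were sent without a stall.
    """
    charged = [0] * sockets   # bytes currently held against each socket
    pending = 0               # packets sent but not yet freed by an IRQ
    sent = 0
    while sent < limit:
        progressed = False
        for i in range(sockets):
            if charged[i] + charge > sndbuf:
                continue      # this socket has no buffer space left
            charged[i] += charge
            pending += 1
            sent += 1
            progressed = True
            if pending >= coalesce:
                # Tx interrupt: all completed skbs are freed at once
                charged = [0] * sockets
                pending = 0
            if sent >= limit:
                break
        if not progressed:
            return sent       # nobody can send and no IRQ will ever come
    return None


one_ping = packets_before_stall(2048, 340, coalesce=16)
three_pings = packets_before_stall(2048, 340, coalesce=16, sockets=3)
after_fix = packets_before_stall(2048, 340, coalesce=1)
```

In this model a single sender with a threshold of 16 blocks after 6 packets, while three senders or a threshold of 1 keep going, which matches the behaviour the message describes.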
    • mips: bpf: Fix broken BPF_MOD · 6c2f1fef
      Denis Kirjanov authored
      [ Upstream commit 2e46477a ]
      
      Remove optimize_div() from the BPF_MOD | BPF_K case, since we
      don't know the dividend, and fix emit_mod() by reading the
      result of the mod operation from the HI register.
      Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
      Reviewed-by: Markos Chandras <markos.chandras@imgtec.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
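The HI register matters here because the MIPS unsigned divide instruction leaves the quotient in LO and the remainder in HI, so a modulo has to read HI. The following is a minimal model of that behaviour, not the JIT code itself; the helper names `divu`, `mod_via_hi` and `div_via_lo` are hypothetical and used for illustration only.

```python
MASK32 = 0xFFFFFFFF


def divu(rs, rt):
    """Model MIPS `divu`: return (hi, lo) = (remainder, quotient)."""
    rs &= MASK32
    rt &= MASK32
    if rt == 0:
        # The hardware result is unpredictable; the model refuses instead.
        raise ZeroDivisionError("divu by zero")
    return rs % rt, rs // rt


def mod_via_hi(a, k):
    """A % k, taken from HI (what an `mfhi` after `divu` would read)."""
    hi, _lo = divu(a, k)
    return hi


def div_via_lo(a, k):
    """A / k, taken from LO (what an `mflo` after `divu` would read)."""
    _hi, lo = divu(a, k)
    return lo
```

Reading LO where HI was intended would return the quotient instead of the remainder, for example 3 instead of 2 for 17 mod 5.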
    • openvswitch: Fix flow mask validation. · 3e496d49
      Pravin B Shelar authored
      [ Upstream commit f2a01517 ]
      
      The following patch fixes a typo in the flow validation. The typo
      prevented installation of ARP and IPv6 flows.
      
      Fixes: 19e7a3df ("openvswitch: Fix NDP flow mask validation")
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Reviewed-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gre: Set inner mac header in gro complete · 435dcf66
      Tom Herbert authored
      [ Upstream commit 6fb2a756 ]
      
      Set the inner mac header to point to the GRE payload when
      doing GRO. This is needed if we proceed to send the packet
      through GRE GSO which now uses the inner mac header instead
      of inner network header to determine the length of encapsulation
      headers.
      
      Fixes: 14051f04 ("gre: Use inner mac length when computing tunnel length")
      Reported-by: Wolfgang Walter <linux@stwm.de>
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Fix race condition between vxlan_sock_add and vxlan_sock_release · 8407165b
      Marcelo Leitner authored
      [ Upstream commit 00c83b01 ]
      
      Currently, when trying to reuse a socket, vxlan_sock_add will grab
      vn->sock_lock, locate a reusable socket, increment its refcount and
      release vn->sock_lock.

      But vxlan_sock_release() will first decrement the refcount, and then
      grab that lock. refcnt operations are atomic, but since we currently
      have deferred works that each hold vs->refcnt, the following can
      happen, leading to a use after free (especially after
      vxlan_igmp_leave):
      
        CPU 1                            CPU 2
      
      deferred work                    vxlan_sock_add
        ...                              ...
                                         spin_lock(&vn->sock_lock)
                                         vs = vxlan_find_sock();
        vxlan_sock_release
          dec vs->refcnt, reaches 0
          spin_lock(&vn->sock_lock)
                                         vxlan_sock_hold(vs), refcnt=1
                                         spin_unlock(&vn->sock_lock)
          hlist_del_rcu(&vs->hlist);
          vxlan_notify_del_rx_port(vs)
          spin_unlock(&vn->sock_lock)
      
      So when we look for a reusable socket, we check that it has not
      already been freed before reusing it.
      Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Fixes: 7c47cedf ("vxlan: move IGMP join/leave to work queue")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
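The interleaving in the diagram above can be replayed deterministically in a few lines. This is a single-threaded sketch of the ordering only, not the kernel code: the `Sock` class, the helper names, and the "take a reference only if the count is non-zero" check are assumptions used to illustrate one way of expressing "check that it was not freed already".

```python
class Sock:
    """Stand-in for a vxlan socket: a refcount plus list/free state."""

    def __init__(self):
        self.refcnt = 1
        self.in_list = True
        self.freed = False


def hold_unless_zero(vs):
    """Take a reference only if the socket is still alive (assumed fix)."""
    if vs.refcnt == 0:
        return False
    vs.refcnt += 1
    return True


def replay(checked):
    """Replay the CPU 1 / CPU 2 ordering from the commit message.

    Returns True if CPU 2 ends up reusing a socket that CPU 1 frees.
    """
    vs = Sock()

    # CPU 2: takes sock_lock and finds the existing socket.
    found = vs

    # CPU 1: drops the last reference, then waits for sock_lock.
    vs.refcnt -= 1
    cpu1_will_free = vs.refcnt == 0

    # CPU 2: still under the lock, takes its reference.
    if checked:
        reused = hold_unless_zero(found)
    else:
        found.refcnt += 1     # unconditional hold: refcnt goes 0 -> 1
        reused = True

    # CPU 2 unlocks; CPU 1 gets the lock, unlinks and frees the socket.
    if cpu1_will_free:
        vs.in_list = False
        vs.freed = True

    return reused and vs.freed
```

With the unconditional hold the replay ends in a use after free; with the non-zero check, CPU 2 declines to reuse the dying socket and would go on to create a new one.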
  2. 07 Dec, 2014 2 commits
  3. 06 Dec, 2014 2 commits
  4. 05 Dec, 2014 7 commits
  5. 04 Dec, 2014 5 commits
  6. 03 Dec, 2014 19 commits