1. 27 Oct, 2016 1 commit
  2. 26 Oct, 2016 17 commits
  3. 23 Oct, 2016 11 commits
  4. 22 Oct, 2016 7 commits
    • Merge branch 'bpf-numa-id' · 67dc1596
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Add BPF numa id helper
      
      This patch set adds a helper for retrieving current numa node
      id and a test case for SO_REUSEPORT.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • reuseport, bpf: add test case for bpf_get_numa_node_id · 3c2c3c16
      Daniel Borkmann authored
      The test case is very similar to reuseport_bpf_cpu, except that here
      we select socket members based on the current numa node id.
      
        # numactl -H
        available: 2 nodes (0-1)
        node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
        node 0 size: 128867 MB
        node 0 free: 120080 MB
        node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
        node 1 size: 96765 MB
        node 1 free: 87504 MB
        node distances:
        node   0   1
          0:  10  20
          1:  20  10
      
        # ./reuseport_bpf_numa
        ---- IPv4 UDP ----
        send node 0, receive socket 0
        send node 1, receive socket 1
        send node 1, receive socket 1
        send node 0, receive socket 0
        ---- IPv6 UDP ----
        send node 0, receive socket 0
        send node 1, receive socket 1
        send node 1, receive socket 1
        send node 0, receive socket 0
        ---- IPv4 TCP ----
        send node 0, receive socket 0
        send node 1, receive socket 1
        send node 1, receive socket 1
        send node 0, receive socket 0
        ---- IPv6 TCP ----
        send node 0, receive socket 0
        send node 1, receive socket 1
        send node 1, receive socket 1
        send node 0, receive socket 0
        SUCCESS
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
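The socket selection exercised by this test can be sketched in plain user-space C. This is a minimal model, not the selftest itself: `MODEL_NR_CPUS`, `MODEL_NR_NODES`, `cpu_to_node()` and `select_socket()` are hypothetical names, and the CPU-to-node mapping is hard-coded from the `numactl -H` listing above rather than queried from the kernel.

```c
#include <assert.h>

#define MODEL_NR_CPUS  24   /* CPUs 0-23 in the numactl -H listing */
#define MODEL_NR_NODES 2    /* nodes 0-1 in the numactl -H listing */

/*
 * Hard-coded topology of the test host (an assumption of this sketch):
 * node 0 owns CPUs 0-5 and 12-17, node 1 owns CPUs 6-11 and 18-23.
 */
int cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return (cpu / 6) % MODEL_NR_NODES;
}

/*
 * Stand-in for the value a reuseport program would return:
 * the index of the receiving socket equals the numa node id.
 */
int select_socket(int cpu)
{
    return cpu_to_node(cpu);
}
```

With this mapping, a sender pinned to CPU 13 (node 0) lands on socket 0 and one pinned to CPU 7 (node 1) lands on socket 1, which matches the "send node N, receive socket N" lines in the output.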
    • bpf: add helper for retrieving current numa node id · 2d0e30c3
      Daniel Borkmann authored
      The main use case is soreuseport selecting sockets for the local
      numa node, but since the helper is generic, let's also add it for
      other networking and tracing program types.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udpmem' · a10b91b8
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      udp: refactor memory accounting
      
      This patch series refactors the udp memory accounting, replacing the
      generic implementation with a custom one, in order to remove the need for
      locking the socket on the enqueue and dequeue operations. The socket backlog
      usage is dropped as well.
      
      The first patch factors out pieces of some queue and memory management
      socket helpers, so that they can later be used by the udp memory accounting
      functions.
      The second patch adds the memory accounting helpers, without using them.
      The third patch replaces the old rx memory accounting path for udp over ipv4 and
      udp over ipv6. In-kernel UDP users are updated as well.
      
      The memory accounting schema is described in detail in the individual patch
      commit messages.
      
      The performance gain depends on the specific scenario; with few flows (and
      little contention in the original code) the differences are in the noise range,
      while with several flows contending for the same socket, the measured speed-up
      is significant (e.g. even over 100% in case of extreme contention).
      
      Many thanks to Eric Dumazet for the reiterated reviews and suggestions.
      
      v5 -> v6:
       - do not orphan the skb on enqueue, skb_steal_sock() already did
         the work for us
      
      v4 -> v5:
       - use the receive queue spin lock to protect the memory accounting
       - several minor clean-ups
      
      v3 -> v4:
       - simplified the locking schema, always use a plain spinlock
      
      v2 -> v3:
       - do not set the now unused backlog_rcv callback
      
      v1 -> v2:
       - slightly changed the memory accounting schema, we now perform lazy reclaim
       - fixed forward_alloc updating issue
       - fixed memory counter integer overflows
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: use its own memory accounting schema · 850cbadd
      Paolo Abeni authored
      Completely avoid default sock memory accounting and replace it
      with udp-specific accounting.
      
      Since the new memory accounting model encapsulates completely
      the required locking, remove the socket lock on both enqueue and
      dequeue, and avoid using the backlog on enqueue.
      
      Be sure to clean up rx queue memory on socket destruction, using
      a udp-specific sk_destruct.
      
      Tested using pktgen with random src port, 64-byte packets,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr readers      Kpps (vanilla)  Kpps (patched)
      1               170             440
      3               1250            2150
      6               3000            3650
      9               4200            4450
      12              5700            6250
      
      v4 -> v5:
        - avoid unneeded test in first_packet_length
      
      v3 -> v4:
        - remove useless sk_rcvqueues_full() call
      
      v2 -> v3:
        - do not set the now unused backlog_rcv callback
      
      v1 -> v2:
        - add memory pressure support
        - fixed dropwatch accounting for ipv6
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
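The benchmark table in this commit message can be restated as relative gains. The sketch below only redoes the arithmetic on the published Kpps figures; `struct sample`, `samples`, `gain_percent()` and `print_gains()` are names introduced here for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the benchmark table: reader count and throughput in Kpps. */
struct sample {
    int readers;
    int vanilla_kpps;
    int patched_kpps;
};

/* Figures copied from the commit message table. */
const struct sample samples[] = {
    { 1,  170,  440 },
    { 3,  1250, 2150 },
    { 6,  3000, 3650 },
    { 9,  4200, 4450 },
    { 12, 5700, 6250 },
};

/* Relative throughput gain of the patched kernel, in percent. */
double gain_percent(int vanilla_kpps, int patched_kpps)
{
    assert(vanilla_kpps > 0);
    return 100.0 * (patched_kpps - vanilla_kpps) / vanilla_kpps;
}

/* Print one line per table row with the computed gain. */
void print_gains(void)
{
    size_t n = sizeof(samples) / sizeof(samples[0]);

    for (size_t i = 0; i < n; i++)
        printf("%2d readers: %+.1f%%\n", samples[i].readers,
               gain_percent(samples[i].vanilla_kpps, samples[i].patched_kpps));
}
```

The single-reader row works out to roughly +159%, in line with the "over 100%" figure quoted for extreme contention, while the 9-reader row gains about 6%.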
    • udp: implement memory accounting helpers · f970bd9e
      Paolo Abeni authored
      Avoid using the generic helpers.
      Use the receive queue spin lock to protect the memory
      accounting operation, both on enqueue and on dequeue.
      
      On dequeue perform partial memory reclaiming, trying to
      leave a quantum of forward allocated memory.
      
      On enqueue use a custom helper, to allow some optimizations:
      - use a plain spin_lock() variant instead of the slightly
        costly spin_lock_irqsave(),
      - avoid dst_force check, since the calling code has already
        dropped the skb dst
      - avoid orphaning the skb, since skb_steal_sock() already did
        the work for us
      
      The above needs custom memory reclaiming on shutdown, provided
      by udp_destruct_sock().
      
      v5 -> v6:
        - don't orphan the skb on enqueue
      
      v4 -> v5:
        - replace the mem_lock with the receive queue spin lock
        - ensure that the bh is always allowed to enqueue at least
          one skb, even if sk_rcvbuf is exceeded
      
      v3 -> v4:
        - reworked memory accounting, simplifying the schema
        - provide a helper for both memory scheduling and enqueuing
      
      v1 -> v2:
        - use a udp-specific destructor to perform memory reclaiming
        - remove a couple of helpers, unneeded after the above cleanup
        - do not reclaim memory on dequeue if not under memory
          pressure
        - reworked the fwd accounting schema to avoid potential
          integer overflow
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
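The accounting described in this commit (forward allocation in quanta on enqueue, partial reclaim on dequeue) can be illustrated with a single-threaded user-space model. This is a simplified sketch and not the kernel code: `struct udp_mem_model`, `model_enqueue()`, `model_dequeue()` and `MODEL_QUANTUM` are hypothetical names, the quantum is assumed to be 4096 bytes, and the receive queue spin lock that protects these steps in the real helpers is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QUANTUM 4096  /* assumed accounting unit, one page */

struct udp_mem_model {
    int rmem_alloc;     /* bytes held by queued packets */
    int forward_alloc;  /* bytes reserved ahead of use */
    int rcvbuf;         /* receive buffer limit */
    int reserved_total; /* bytes currently charged to the global pool */
};

/* Charge one packet; returns false when the packet would be dropped. */
bool model_enqueue(struct udp_mem_model *m, int size)
{
    assert(size > 0);

    /* Only refuse once the queue is already over the limit, so at
     * least one packet is always admitted. */
    if (m->rmem_alloc > m->rcvbuf)
        return false;

    /* Reserve whole quanta when the forward allocation is too small. */
    if (size > m->forward_alloc) {
        int amt = (size + MODEL_QUANTUM - 1) / MODEL_QUANTUM * MODEL_QUANTUM;

        m->forward_alloc += amt;
        m->reserved_total += amt;
    }

    m->forward_alloc -= size;
    m->rmem_alloc += size;
    return true;
}

/* Release one packet and give back whole quanta, keeping between one
 * byte and one full quantum of forward allocated memory. */
void model_dequeue(struct udp_mem_model *m, int size)
{
    int amt;

    assert(size > 0 && size <= m->rmem_alloc);

    m->rmem_alloc -= size;
    m->forward_alloc += size;

    amt = (m->forward_alloc - 1) & ~(MODEL_QUANTUM - 1);
    m->forward_alloc -= amt;
    m->reserved_total -= amt;
}
```

In this model, queueing and then draining a single 5000-byte packet reserves two quanta and returns one of them on dequeue, so later packets can reuse the retained quantum without touching the global pool.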
    • net/socket: factor out helpers for memory and queue manipulation · f8c3bf00
      Paolo Abeni authored
      Basic sock operations that udp code can use with its own
      memory accounting schema. No functional change is introduced
      in the existing APIs.
      
      v4 -> v5:
        - avoid whitespace changes
      
      v2 -> v4:
        - avoid exporting __sock_enqueue_skb
      
      v1 -> v2:
        - avoid exporting sock_rmem_free
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 Oct, 2016 2 commits
    • net: remove MTU limits on a few ether_setup callers · 8b1efc0f
      Jarod Wilson authored
      These few drivers call ether_setup(), but have no ndo_change_mtu, and thus
      were overlooked for changes to MTU range checking behavior. They
      previously had no range checks, so for feature-parity, set their min_mtu
      to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500
      inherited from the ether_setup() changes. Fine-tuning can come after we get
      back to full feature-parity here.
      
      CC: netdev@vger.kernel.org
      Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
      CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
      CC: R Parameswaran <parameswaran.r7@gmail.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
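The effect of the limits discussed in this commit can be sketched with a small range check. This is an illustrative model of a core MTU range check, not the kernel's implementation: `struct mtu_limits` and `check_mtu()` are hypothetical names, and treating a `max_mtu` of 0 as "no upper limit" is an assumption of the sketch. The values 68, 1500 and 65535 are the ones quoted in the commit message.

```c
#include <errno.h>

#define MODEL_ETH_MIN_MTU 68      /* default minimum quoted above */
#define MODEL_ETH_DATA_LEN 1500   /* default maximum quoted above */
#define MODEL_ETH_MAX_MTU 65535   /* ETH_MAX_MTU as quoted above */

struct mtu_limits {
    unsigned int min_mtu;
    unsigned int max_mtu;  /* 0 is treated as "no upper limit" here */
};

/* Returns 0 when new_mtu is acceptable, -EINVAL otherwise. */
int check_mtu(const struct mtu_limits *lim, int new_mtu)
{
    if (new_mtu < 0 || (unsigned int)new_mtu < lim->min_mtu)
        return -EINVAL;
    if (lim->max_mtu > 0 && (unsigned int)new_mtu > lim->max_mtu)
        return -EINVAL;
    return 0;
}
```

Under the inherited 68/1500 limits a request for MTU 9000 is rejected, while with min_mtu 0 and max_mtu 65535 it is accepted, which is the pre-existing behavior this commit restores for the affected drivers.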
    • hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf() · e8f0a89c
      Vitaly Kuznetsov authored
      The fix in commit 88098834 ("hv_netvsc: set nvdev link after populating
      chn_table") turns out to be incomplete. A crash in
      netvsc_get_next_send_section() is observed on mtu change when the device
      is under load. The race I identified is: if we get to netvsc_send() after
      we set the net_device_ctx->nvdev link in netvsc_device_add() but before we
      finish netvsc_connect_vsp()->netvsc_init_buf(), send_section_map is not
      allocated and we crash. Unfortunately we can't set the net_device_ctx->nvdev
      link after the netvsc_init_buf() call, as during the negotiation we need
      to receive packets and on the receive path we check for it. It would
      probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
      links and check them accordingly in get_outbound_net_device()/
      get_inbound_net_device(), but this looks like overkill.
      
      Check that send_section_map is allocated in netvsc_send().
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
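The shape of the guard described in this commit can be shown with a reduced user-space model. It is a sketch only: `struct nvdev_model` and `model_send()` are hypothetical stand-ins for the driver structures, and the specific error codes returned are assumptions rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Reduced stand-in for the device state: the link may be published
 * before the send section map has been allocated. */
struct nvdev_model {
    unsigned long *send_section_map;
};

/* Refuse to send until the send buffer setup has completed. */
int model_send(const struct nvdev_model *nvdev)
{
    if (nvdev == NULL)
        return -ENODEV;            /* link not set yet */
    if (nvdev->send_section_map == NULL)
        return -EAGAIN;            /* map not allocated yet, retry later */
    return 0;                      /* safe to pick a send section */
}
```

The point of the check is that the window between publishing the device link and allocating the map no longer leads to a dereference of an unallocated map; the sender simply backs off.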
  6. 20 Oct, 2016 2 commits
    • Merge branch 'MTU-core-range-checking-more' · 9e58c5dc
      David S. Miller authored
      Jarod Wilson says:
      
      ====================
      net: use core MTU range checking everywhere
      
      This stack of patches should get absolutely everything in the kernel
      converted from doing their own MTU range checking to the core MTU range
      checking. This second spin includes alterations to hopefully fix all
      concerns raised with the first, as well as including some additional
      changes to drivers and infrastructure where I completely missed necessary
      updates.
      
      These have all been built through the 0-day build infrastructure via the
      (rebasing) master branch at https://github.com/jarodwilson/linux-muck, which,
      at the time of the most recent compile across 147 configs, was based on
      net-next at commit 7b1536ef.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4/6: use core net MTU range checking · b96f9afe
      Jarod Wilson authored
      ipv4/ip_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_vti:
      - min_mtu = 1280, max_mtu = 65535
      - remove redundant vti6_change_mtu
      
      ipv6/sit:
      - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
      - remove redundant ipip6_tunnel_change_mtu
      
      CC: netdev@vger.kernel.org
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
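The tunnel limits listed in this commit are simple arithmetic on 0xFFF8 (65528). The sketch below works through the ip_tunnel formula; `tunnel_max_mtu()` is a hypothetical helper, and the header lengths used in the usage note are illustrative inputs rather than values taken from the commit.

```c
#include <assert.h>

#define MODEL_TUNNEL_MTU_BASE 0xFFF8  /* 65528, the base value quoted above */

/*
 * max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen, as listed for
 * ipv4/ip_tunnel. Pass t_hlen = 0 to get the ip6_tunnel variant, or
 * hard_header_len = 0 to get the sit variant.
 */
int tunnel_max_mtu(int hard_header_len, int t_hlen)
{
    assert(hard_header_len >= 0 && t_hlen >= 0);
    return MODEL_TUNNEL_MTU_BASE - hard_header_len - t_hlen;
}
```

For example, assuming no link-layer header and a 20-byte tunnel header, `tunnel_max_mtu(0, 20)` yields 65508; with a 14-byte link-layer header and a 24-byte tunnel header it yields 65490.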