1. 28 Apr, 2016 37 commits
    • tcp: give prequeue mode some care · 0cef6a4c
      Eric Dumazet authored
      TCP prequeue goal is to defer processing of incoming packets
      to user space thread currently blocked in a recvmsg() system call.
      
      Intent is to spend less time processing these packets on behalf
      of softirq handler, as softirq handler is unfair to normal process
      scheduler decisions, as it might interrupt threads that do not
      even use networking.
      
      Current prequeue implementation has following issues :
      
      1) It only checks size of the prequeue against sk_rcvbuf
      
         It was fine 15 years ago when sk_rcvbuf was in the 64KB vicinity.
         But we now have ~8MB values to cope with modern networking needs.
         We have to add sk_rmem_alloc in the equation, since out of order
         packets can definitely use up to sk_rcvbuf memory themselves.
      
      2) Even with a fixed memory truesize check, prequeue can be filled
         by thousands of packets. When prequeue needs to be flushed, either
         from softirq context (in tcp_prequeue() or timer code), or process
         context (in tcp_prequeue_process()), this adds a latency spike
         which is often not desirable.
         I added a fixed limit of 32 packets, as this translated to a max
         flush time of 60 us on my test hosts.
      
         Also note that packets in the prequeue are not accounted for in tcp_mem,
         since they are not charged against sk_forward_alloc at this point.
         This is probably not a big deal.
      
      Note that this might increase LINUX_MIB_TCPPREQUEUEDROPPED counts,
      which is misnamed, as packets are not dropped at all, but rather pushed
      to the stack (where they can be either consumed or dropped)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
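The two limits described in "tcp: give prequeue mode some care" can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel code: `struct prequeue_model`, `prequeue_must_flush()` and `PREQUEUE_MAX_PACKETS` are hypothetical names, and the exact comparison operators are assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; the fields only echo the commit text. */
struct prequeue_model {
    size_t queued_packets;  /* packets currently sitting in the prequeue */
    size_t prequeue_bytes;  /* sum of the truesize of those packets */
    size_t rmem_alloc;      /* receive memory already in use (sk_rmem_alloc) */
    size_t rcvbuf;          /* receive buffer limit (sk_rcvbuf) */
};

#define PREQUEUE_MAX_PACKETS 32  /* fixed limit quoted in the commit message */

/* Returns true when the prequeue should be flushed to the regular stack. */
static bool prequeue_must_flush(const struct prequeue_model *q)
{
    /* Limit 2: bound the number of packets to bound the flush latency. */
    if (q->queued_packets >= PREQUEUE_MAX_PACKETS)
        return true;
    /* Limit 1: count memory already used by the socket, not only the prequeue. */
    return q->prequeue_bytes + q->rmem_alloc > q->rcvbuf;
}
```

With an 8MB `rcvbuf`, a socket whose out-of-order queue already holds almost 8MB triggers a flush after a single queued packet, which the old size-only check would not have done.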
    • fq: split out backlog update logic · b43e7199
      Michal Kazior authored
      mac80211 (which will be the first user of the
      fq.h) recently started to support software A-MSDU
      aggregation. It glues skbuffs together into a
      single one so the backlog accounting needs to be
      more fine-grained.
      
      To avoid backlog sorting logic duplication split
      it up for re-use.
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove an unnecessary NULL check · b4358657
      Dan Carpenter authored
      This is never called with a NULL "buf" and anyway, we dereference 's' on
      the lines before so it would Oops before we reach the check.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: avoid stack overflow in mlx5e_open_channels · 6b87663f
      Arnd Bergmann authored
      struct mlx5e_channel_param is a large structure that is allocated
      on the stack of mlx5e_open_channels, and with a recent change
      it has grown beyond the warning size for the maximum stack
      that a single function should use:
      
      mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels':
      mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      The function is already using dynamic allocation and is not in
      a fast path, so the easiest workaround is to use another kzalloc
      for allocating the channel parameters.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: d3c9bc27 ("net/mlx5e: Added ICO SQs")
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
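The workaround in "net/mlx5e: avoid stack overflow in mlx5e_open_channels" moves a large parameter structure from the stack to the heap. A minimal user-space sketch of that pattern is shown here; `struct channel_param`, `open_one_channel()` and `open_channels()` are hypothetical stand-ins, and `calloc()` plays the role of `kzalloc()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parameter block too large for the stack. */
struct channel_param {
    unsigned char rq[384];
    unsigned char sq[384];
    unsigned char icosq[384];
};

/* Hypothetical consumer of the parameters; always succeeds in this sketch. */
static int open_one_channel(const struct channel_param *p, int ix)
{
    (void)p;
    (void)ix;
    return 0;
}

static int open_channels(int nch)
{
    struct channel_param *cparam;
    int err = 0;

    /* Zeroed heap allocation instead of a >1KB local variable. */
    cparam = calloc(1, sizeof(*cparam));
    if (!cparam)
        return -ENOMEM;

    for (int i = 0; i < nch && !err; i++)
        err = open_one_channel(cparam, i);

    free(cparam);
    return err;
}
```

Because the function is not in a fast path, the extra allocation costs little, while the stack frame shrinks to a pointer and a few integers.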
    • tuntap: calculate rps hash only when needed · 3df97ba8
      Jason Wang authored
      There's no need to calculate the rps hash if rps is not enabled. So this
      patch exports rps_needed and checks it before trying to get the rps
      hash. Tests (using pktgen to inject packets to the guest) show this can
      improve pps by about 13% (when rps is disabled).
      
      Before:
      ~1150000 pps
      After:
      ~1300000 pps
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      ----
      Changes from V1:
      - Fix build when CONFIG_RPS is not set
      Signed-off-by: David S. Miller <davem@davemloft.net>
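The idea behind "tuntap: calculate rps hash only when needed" is to test a cheap flag before doing per-packet hashing work. The sketch below illustrates that guard in user space; `rps_enabled`, `flow_hash()` and `rx_packet_hash()` are hypothetical names, and the FNV-1a hash is only a placeholder for the real flow hash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool rps_enabled;          /* stand-in for the exported rps_needed check */
static unsigned long hash_calls;  /* counts how often the hash was computed */

/* Toy FNV-1a hash standing in for the real flow hash. */
static uint32_t flow_hash(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;

    hash_calls++;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the hash when RPS is enabled; otherwise skips the work and returns 0. */
static uint32_t rx_packet_hash(const unsigned char *data, size_t len)
{
    if (!rps_enabled)
        return 0;
    return flow_hash(data, len);
}
```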
    • Merge branch 'tcp-eor' · f345c9a5
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      tcp: Make use of MSG_EOR in tcp_sendmsg
      
      v4:
      ~ Do not set eor bit in do_tcp_sendpages() since there is
        no way to pass MSG_EOR from the userland now.
      ~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg().
      ~ Move TCP_SKB_CB(skb)->eor test to a new helper
        tcp_skb_can_collapse_to() (suggested by Soheil).
      ~ Add some packetdrill tests.
      
      v3:
      ~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
      ~ Move the eor bit test back to the loop in tcp_sendmsg and
        tcp_sendpage because there could be >1 threads doing
        sendmsg.
      ~ Thanks to Eric Dumazet's suggestions on v2.
      ~ The TCP timestamp bug fixes are separated into other threads.
      
      v2:
      ~ Rework based on the recent work
        "add TX timestamping via cmsg" by
        Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
      ~ This version takes the MSG_EOR bit as a signal of
        end-of-response-message and leaves the selective
        timestamping job to the cmsg
      ~ Changes based on the v1 feedback (like avoid
        unlikely check in a loop and adding tcp_sendpage
        support)
      ~ The first 3 patches are bug fixes.  The fixes in this
        series depend on the newly introduced txstamp_ack in
        net-next.  I will make relevant patches against net after
        getting some feedback.
      ~ The test results are based on the recently posted net fix:
        "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"
      
      One potential use case is to use MSG_EOR with
      SOF_TIMESTAMPING_TX_ACK to get a more accurate
      TCP ack timestamping on application protocol with
      multiple outgoing response messages (e.g. HTTP2).
      
      One of our use cases is the web server.  The web server tracks
      HTTP2 response latency by measuring the time from when it sends
      the first byte to the socket until the TCP ACK of the last byte
      is received.  Where we have no client-side measurement,
      measuring from the server side is the only option.  Where we do
      have client-side measurement, the server-side data can also be
      used to cross-check the client-side data.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Handle eor bit when fragmenting a skb · a166140e
      Martin KaFai Lau authored
      When fragmenting a skb, the next_skb should carry
      the eor from prev_skb.  The eor of prev_skb should
      also be reset.
      
      Packetdrill script for testing:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 sendto(4, ..., 15330, MSG_EOR, ..., ...) = 15330
      0.200 sendto(4, ..., 730, 0, ..., ...) = 730
      
      0.200 > .  1:7301(7300) ack 1
      0.200 > . 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      0.300 > P. 14601:15331(730) ack 1
      0.300 > P. 15331:16061(730) ack 1
      
      0.400 < . 1:1(0) ack 16061 win 257
      0.400 close(4) = 0
      0.400 > F. 16061:16061(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 16062:16062(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
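The rule in "tcp: Handle eor bit when fragmenting a skb" can be modelled in a few lines: the tail produced by a split inherits the eor mark and the head loses it. This is a hedged sketch, assuming a hypothetical `struct seg` and `seg_fragment()` rather than the kernel's skb helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued segment; not the kernel's struct sk_buff. */
struct seg {
    size_t len;
    bool eor;  /* set when the segment ends a MSG_EOR message */
};

/* Split prev after `at` bytes (at <= prev->len); the tail goes into next. */
static void seg_fragment(struct seg *prev, struct seg *next, size_t at)
{
    next->len = prev->len - at;
    prev->len = at;
    next->eor = prev->eor;  /* the tail now holds the message's last byte */
    prev->eor = false;      /* the head no longer ends the message */
}
```

In the packetdrill script above, the 15330-byte MSG_EOR write is sent as several segments, and only the last one must still refuse the following 730-byte write.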
    • tcp: Handle eor bit when coalescing skb · a643b5d4
      Martin KaFai Lau authored
      This patch:
      1. Prevents next_skb from coalescing to the prev_skb if
         TCP_SKB_CB(prev_skb)->eor is set
      2. Updates the TCP_SKB_CB(prev_skb)->eor if coalescing is
         allowed
      
      Packetdrill script for testing:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
      0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
      0.200 write(4, ..., 11680) = 11680
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:13141,nop,nop>
      0.300 > P. 1:731(730) ack 1
      0.300 > P. 731:1461(730) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      0.400 close(4) = 0
      0.400 > F. 13141:13141(0) ack 1
      0.500 < F. 1:1(0) ack 13142 win 257
      0.500 > . 13142:13142(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
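The two points of "tcp: Handle eor bit when coalescing skb" can be sketched as a merge function that refuses to append to a segment marked eor and otherwise carries the mark over. `struct seg`, `seg_can_collapse_to()` and `seg_coalesce()` are hypothetical names for this user-space model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued segment; not the kernel's struct sk_buff. */
struct seg {
    size_t len;
    bool eor;  /* set when the segment ends a MSG_EOR message */
};

/* A segment that ends a message must not receive more data. */
static bool seg_can_collapse_to(const struct seg *prev)
{
    return !prev->eor;
}

/* Try to merge next into prev; returns true when the merge happened. */
static bool seg_coalesce(struct seg *prev, struct seg *next)
{
    if (!seg_can_collapse_to(prev))
        return false;
    prev->len += next->len;
    prev->eor = next->eor;  /* the merged segment ends wherever next ended */
    next->len = 0;
    next->eor = false;
    return true;
}
```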
    • tcp: Make use of MSG_EOR in tcp_sendmsg · c134ecb8
      Martin KaFai Lau authored
      This patch adds an eor bit to the TCP_SKB_CB.  When MSG_EOR
      is passed to tcp_sendmsg, the eor bit will be set on the skb
      containing the last byte of the userland's msg.  The eor bit
      will prevent data from being appended to that skb in the future.
      
      The change in do_tcp_sendpages is to honor the eor set
      during the previous tcp_sendmsg(MSG_EOR) call.
      
      This patch handles the tcp_sendmsg case.  The followup patches
      will handle other skb coalescing and fragment cases.
      
      One potential use case is to use MSG_EOR with
      SOF_TIMESTAMPING_TX_ACK to get a more accurate
      TCP ack timestamping on application protocol with
      multiple outgoing response messages (e.g. HTTP2).
      
      Packetdrill script for testing:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 14600) = 14600
      0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
      0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
      
      0.200 > .  1:7301(7300) ack 1
      0.200 > P. 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      0.300 > P. 14601:15331(730) ack 1
      0.300 > P. 15331:16061(730) ack 1
      
      0.400 < . 1:1(0) ack 16061 win 257
      0.400 close(4) = 0
      0.400 > F. 16061:16061(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 16062:16062(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
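The behaviour described in "tcp: Make use of MSG_EOR in tcp_sendmsg" can be approximated with a toy write queue: data is appended to the tail segment unless that segment is full or marked eor, and a send carrying the flag marks the segment holding its last byte. Everything here is an assumption made for illustration: `model_sendmsg()`, `MODEL_MSG_EOR`, `SEG_CAPACITY` and `MAX_SEGS` are hypothetical, and real segment sizing depends on MSS and TSO.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MSG_EOR 0x1   /* hypothetical flag value for this model */
#define MAX_SEGS 16
#define SEG_CAPACITY 65536  /* assumed per-segment capacity */

struct seg {
    size_t len;
    bool eor;
};

struct write_queue {
    struct seg segs[MAX_SEGS];
    size_t count;
};

/* Queue len bytes; returns the number of bytes actually queued. */
static size_t model_sendmsg(struct write_queue *q, size_t len, int flags)
{
    size_t queued = 0;

    while (queued < len) {
        struct seg *tail = q->count ? &q->segs[q->count - 1] : NULL;

        /* Start a new segment when the tail is missing, full, or ends a message. */
        if (!tail || tail->eor || tail->len == SEG_CAPACITY) {
            if (q->count == MAX_SEGS)
                break;
            tail = &q->segs[q->count++];
            tail->len = 0;
            tail->eor = false;
        }

        size_t room = SEG_CAPACITY - tail->len;
        size_t chunk = len - queued < room ? len - queued : room;

        tail->len += chunk;
        queued += chunk;
    }

    /* Mark the segment that contains the last byte of the message. */
    if (len > 0 && queued == len && (flags & MODEL_MSG_EOR))
        q->segs[q->count - 1].eor = true;
    return queued;
}
```

Replaying the script's sequence (a 14600-byte write followed by two 730-byte MSG_EOR sends) leaves the first MSG_EOR message appended to the earlier data and the second one in a segment of its own.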
    • Merge branch 'tcp-redundant-checks' · 2a9e8438
      David S. Miller authored
      Soheil Hassas Yeganeh says:
      
      ====================
      tcp: simplify ack tx timestamps
      
      v2:
      - Fully remove SKBTX_ACK_TSTAMP, as suggested by Willem de Bruijn.
      
      This patch series aims at removing redundant checks and fields
      for ack timestamps for TCP.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove SKBTX_ACK_TSTAMP since it is redundant · 0a2cf20c
      Soheil Hassas Yeganeh authored
      The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
      the timestamp of the TCP acknowledgement should be reported on
      error queue. Since accessing skb_shinfo is likely to incur a
      cache-line miss at the time of receiving the ack, the
      txstamp_ack bit was added in tcp_skb_cb, which is set iff
      the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
      SKBTX_ACK_TSTAMP flag redundant.
      
      Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
      everywhere.
      
      Note that this frees one bit in shinfo->tx_flags.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Suggested-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove an unnecessary check in tcp_tx_timestamp · 863c1fd9
      Soheil Hassas Yeganeh authored
      Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp.
      
      tcp_tx_timestamp() receives the tsflags as a parameter. As a
      result the "sk->sk_tsflags || tsflags" is redundant, since
      tsflags already includes sk->sk_tsflags plus overrides from
      control messages.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: snmp: fix 64bit stats on 32bit arches · ba7863f4
      Eric Dumazet authored
      I accidentally replaced BH disabling by preemption disabling
      in SNMP_ADD_STATS64() and SNMP_UPD_PO_STATS64() on 32bit builds.
      
      For 64bit stats on 32bit arch, we really need to disable BH,
      since the "struct u64_stats_sync syncp" might be manipulated
      both from process and BH contexts.
      
      Fixes: 6aef70a8 ("net: snmp: kill various STATS_USER() helpers")
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'socket-space-optimizations' · 8be2748a
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: avoid some atomic ops when FASYNC is not used
      
      We can avoid some atomic operations on sockets not using FASYNC
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: SOCKWQ_ASYNC_WAITDATA optimizations · 4be73522
      Eric Dumazet authored
      SOCKWQ_ASYNC_WAITDATA is set/cleared in sk_wait_data()
      and equivalent functions, so that sock_wake_async() can send
      a SIGIO only when necessary.
      
      Since these atomic operations are really not needed unless
      socket expressed interest in FASYNC, we can omit them in most
      cases.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: SOCKWQ_ASYNC_NOSPACE optimizations · 9317bb69
      Eric Dumazet authored
      SOCKWQ_ASYNC_NOSPACE is tested in sock_wake_async()
      so that a SIGIO signal is sent when needed.
      
      tcp_sendmsg() clears the bit.
      tcp_poll() sets the bit when stream is not writeable.
      
      We can avoid two atomic operations by first checking if socket
      is actually interested in the FASYNC business (most sockets in
      real applications do not use AIO, but select()/poll()/epoll())
      
      This also removes one cache line miss to access sk->sk_wq->flags
      in tcp_sendmsg()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
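Both SOCKWQ_ASYNC commits rely on the same guard: skip the atomic bit operation when the socket never asked for FASYNC. A user-space sketch with C11 atomics is given below; `struct sock_model`, `model_set_bit()`, `model_clear_bit()` and the bit numbers are hypothetical and only mirror the idea, not the kernel helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model. */
enum { WQ_ASYNC_NOSPACE = 0, WQ_ASYNC_WAITDATA = 1 };

struct sock_model {
    bool fasync;            /* true when the socket asked for SIGIO delivery */
    atomic_ulong wq_flags;  /* shared flags word, possibly on a cold cache line */
};

/* Set a flag bit only when somebody can observe it. */
static void model_set_bit(struct sock_model *sk, int nr)
{
    if (!sk->fasync)
        return;  /* no FASYNC user: avoid the atomic and the cache miss */
    atomic_fetch_or(&sk->wq_flags, 1UL << nr);
}

/* Clear a flag bit under the same condition. */
static void model_clear_bit(struct sock_model *sk, int nr)
{
    if (!sk->fasync)
        return;
    atomic_fetch_and(&sk->wq_flags, ~(1UL << nr));
}
```

Most sockets use select(), poll() or epoll() rather than SIGIO, so in the common case both helpers return after a single plain read.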
    • Merge branch 'snmp-stats-update' · 210732d1
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: snmp: update SNMP methods
      
      In the old days (before linux-3.0), SNMP counters were duplicated,
      one set for user context, and another one for BH context.
      
      After commit 8f0ea0fe ("snmp: reduce percpu needs by 50%")
      we have a single copy, and what really matters is preemption being
      enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
      respectively.
      
      This patch series kills the obsolete STATS_USER() helpers,
      and renames all XXX_BH() helpers to __XXX() ones, to more
      closely match conventions used to update per cpu variables.
      
      This is probably going to hurt maintainers' job for a while,
      since cherry-picks will not be clean, but this had to be
      cleaned at one point. I am so sorry guys.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: snmp: kill STATS_BH macros · 13415e46
      Eric Dumazet authored
      There is nothing related to BH in SNMP counters anymore,
      since linux-3.0.
      
      Rename helpers to use __ prefix instead of _BH suffix,
      for contexts where preemption is disabled.
      
      This more closely matches convention used to update
      percpu variables.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: kill ICMP6MSGIN_INC_STATS_BH() · f3832ed2
      Eric Dumazet authored
      IPv6 ICMP stats are atomics anyway.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: rename IP6_UPD_PO_STATS_BH() · c2005eb0
      Eric Dumazet authored
      Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: rename IP6_INC_STATS_BH() · 1d015503
      Eric Dumazet authored
      Rename IP6_INC_STATS_BH() to __IP6_INC_STATS()
      and IP6_ADD_STATS_BH() to __IP6_ADD_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename NET_{ADD|INC}_STATS_BH() · 02a1d6e7
      Eric Dumazet authored
      Rename NET_INC_STATS_BH() to __NET_INC_STATS()
      and NET_ADD_STATS_BH() to __NET_ADD_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename IP_UPD_PO_STATS_BH() · b15084ec
      Eric Dumazet authored
      Rename IP_UPD_PO_STATS_BH() to __IP_UPD_PO_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename IP_ADD_STATS_BH() · 98f61995
      Eric Dumazet authored
      Rename IP_ADD_STATS_BH() to __IP_ADD_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename ICMP6_INC_STATS_BH() · a16292a0
      Eric Dumazet authored
      Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename IP_INC_STATS_BH() · b45386ef
      Eric Dumazet authored
      Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to
      better express this is used in non preemptible context.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: rename SCTP_INC_STATS_BH() · 08e3baef
      Eric Dumazet authored
      Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: icmp: rename ICMPMSGIN_INC_STATS_BH() · 214d3f1f
      Eric Dumazet authored
      Remove misleading _BH suffix.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp: rename TCP_INC_STATS_BH · 90bbcc60
      Eric Dumazet authored
      Rename TCP_INC_STATS_BH() to __TCP_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: xfrm: kill XFRM_INC_STATS_BH() · b540f9d7
      Eric Dumazet authored
      Not used anymore.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: udp: rename UDP_INC_STATS_BH() · 02c22347
      Eric Dumazet authored
      Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
      and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename ICMP_INC_STATS_BH() · 5d3848bc
      Eric Dumazet authored
      Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: rename DCCP_INC_STATS_BH() · aa62d76b
      Eric Dumazet authored
      Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: snmp: kill various STATS_USER() helpers · 6aef70a8
      Eric Dumazet authored
      In the old days (before linux-3.0), SNMP counters were duplicated,
      one for user context, and one for BH context.
      
      After commit 8f0ea0fe ("snmp: reduce percpu needs by 50%")
      we have a single copy, and what really matters is preemption being
      enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
      respectively.
      
      We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
      NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
      SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
      UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()
      
      The following patches will rename the _BH helpers to make clear their
      usage is not tied to BH being disabled.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2995aea5
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-04-27
      
      This series contains updates to i40e and i40evf.
      
      Alex Duyck cleans up the feature flags since they are becoming pretty
      "massive", the primary change being that we now build our features list
      around hw_encap_features.  Added support for IPIP and SIT offloads,
      which should improve throughput for IPIP and SIT tunnels with
      the offload enabled.
      
      Mitch adds support for configuring RSS on behalf of the VFs, which removes
      the burden of dealing with different hardware interfaces from the VF
      drivers and improves future compatibility.  Fix to ensure that we do not
      panic by checking that the vsi_res pointer is valid before dereferencing
      it, after which we can drink beer and eat peanuts.
      
      Shannon does some housekeeping in i40e_add_fdir_ethtool() in preparation
      for more cloud filter work.  Added flexibility to the nvmupdate
      facility by adding the ability to specify an AQ event opcode to wait on
      after Exec_AQ request.
      
      Michal adds device capability which defines if an update is available and
      if a security check is needed during the update process.
      
      Kamil just adds a device id to support X722 QSFP+ device.
      
      Greg fixes an issue where a mirror rule ID may be zero, so do not return
      invalid parameter when the user passes in a zero for a rule ID.  Adds
      support to steer packets to VSIs by VLAN tag alone while being in
      promiscuous mode for multicast and unicast MAC addresses.
      
      Jesse fixes the driver, which was offloading the VLAN tag into the skb
      any time there was a VLAN tag and hardware stripping was enabled, by
      making sure it is enabled before put_tag.
      
      v2: Dropped patch 8 ("i40e: Allow user to change input set mask for flow
          director") while Kiran reworks a more generalized solution based
          on feedback from David Miller.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-rfs: fix false sharing accessing sd->input_queue_head · 501e7ef5
      Eric Dumazet authored
      sd->input_queue_head is incremented for each processed packet
      in process_backlog(), and read from other cpus performing
      Out Of Order avoidance in get_rps_cpu()
      
      Moving this field in a separate cache line keeps it mostly
      hot for the cpu in process_backlog(), as other cpus will
      only read it.
      
      In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
      and the patch reduced the cost to 0.65 %
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
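The fix in "net-rfs: fix false sharing accessing sd->input_queue_head" amounts to giving a frequently written field its own cache line. A minimal C11 sketch of that layout follows; `struct softnet_model`, its neighbouring fields and the 64-byte `CACHE_LINE` are assumptions for illustration and do not reproduce the kernel's `struct softnet_data`.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size */

struct softnet_model {
    /* Hypothetical fields that other CPUs write when they enqueue packets. */
    unsigned int input_queue_tail;
    unsigned long enqueue_count;

    /*
     * Incremented by the owning CPU for each processed packet and only read
     * by other CPUs, so it is placed on a cache line of its own.
     */
    alignas(CACHE_LINE) unsigned int input_queue_head;
};
```

With this layout, remote writes to the first line no longer invalidate the line holding the counter, so it stays mostly hot for the CPU that increments it.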
    • net: w5100: support W5500 · 35ef7d68
      Akinobu Mita authored
      This adds support for W5500 chip.
      
      W5500 has similar register and memory organization to W5100 and W5200.
      There are a few important differences listed below but it is still
      possible to share common code with W5100 and W5200.
      
      * W5500 registers and memory are organized in multiple blocks.  Each one
      is selected by a 16-bit offset address and 5 block select bits.
      
      But the existing register access operations take a u16 address.  This change
      extends the address to a u32 and puts the offset address in the lower 16 bits
      and the block select bits in the upper 16 bits.
      
      This change also adds the offset addresses for socket register and TX/RX
      memory blocks to the driver private data structure in order to reduce
      conditional switches for each chip.
      
      * W5500 has a different register offset for the socket interrupt mask
      register.  Newly added internal functions w5100_enable_intr() and
      w5100_disable_intr() take care of the difference.
      
      * W5500 has a different register offset for the retry time-value register.
      But this register is only used to verify that the reset value is correctly
      read at initialization.  So move the verification to w5100_hw_reset()
      which already does different things for different chips.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Mike Sinkovsky <msink@permonline.ru>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
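The address encoding described in "net: w5100: support W5500" packs the block select bits above a 16-bit offset in one 32-bit value. A hedged sketch of such packing helpers is shown here; `w5500_addr()`, `w5500_block()` and `w5500_offset()` are hypothetical names and may differ from the macros the driver actually defines.

```c
#include <stdint.h>

/* Combine a 5-bit block select value and a 16-bit offset into a u32 address. */
static inline uint32_t w5500_addr(uint8_t block, uint16_t offset)
{
    return ((uint32_t)(block & 0x1f) << 16) | offset;
}

/* Extract the block select bits stored in the upper 16 bits. */
static inline uint8_t w5500_block(uint32_t addr)
{
    return (addr >> 16) & 0x1f;
}

/* Extract the offset address stored in the lower 16 bits. */
static inline uint16_t w5500_offset(uint32_t addr)
{
    return addr & 0xffff;
}
```

Keeping the offset in the low half means code written for the 16-bit W5100 and W5200 addresses can pass them through unchanged, with block 0 implied.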
  2. 27 Apr, 2016 3 commits