1. 05 Dec, 2018 5 commits
    • tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0
      Eric Dumazet authored
      TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
      as a step to enable bigger tcp sndbuf limits.
      
      It works reasonably well, but the following happens:
      
      Once the limit is reached, TCP stack generates
      an [E]POLLOUT event for every incoming ACK packet.
      
      This causes a high number of context switches.
      
      This patch implements the strategy David Miller added
      in sock_def_write_space() :
      
       - If TCP socket has a notsent_lowat constraint of X bytes,
         allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
         only if number of notsent bytes is below X/2
      
      This considerably reduces TCP_NOTSENT_LOWAT overhead,
      while still keeping the pipe full.
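
The wake-up rule described above can be modelled in a few lines. This is an illustrative Python model, not kernel code: the function names are hypothetical, and the assumption that each ACK moves a fixed number of queued bytes out of the not-sent region is a simplification. It only encodes what the message states: sendmsg() may fill up to X unsent bytes, while [E]POLLOUT is raised only once fewer than X/2 bytes remain unsent.

```python
from typing import Sequence


def can_queue_more(notsent_bytes: int, lowat: int) -> bool:
    """sendmsg() may keep queueing while unsent bytes are below the limit X."""
    return notsent_bytes < lowat


def should_signal_pollout(notsent_bytes: int, lowat: int) -> bool:
    """[E]POLLOUT is raised only once unsent bytes drop below X/2."""
    return 2 * notsent_bytes < lowat


def count_wakeups(lowat: int, ack_sizes: Sequence[int], hysteresis: bool) -> int:
    """Count POLLOUT events for a writer that refills to X on every wake-up."""
    notsent = lowat  # the writer starts with the queue filled to the limit
    wakeups = 0
    for acked in ack_sizes:
        # Simplification: each ACK lets `acked` queued bytes leave the
        # not-sent region.
        notsent = max(0, notsent - acked)
        if hysteresis:
            wake = should_signal_pollout(notsent, lowat)
        else:
            wake = can_queue_more(notsent, lowat)
        if wake:
            wakeups += 1
            notsent = lowat  # the woken writer refills the queue to the limit
    return wakeups
```

In this model, with a 2097152-byte limit and 64 ACKs that each release 65536 bytes, the writer is woken 64 times without the X/2 rule and 3 times with it.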
      
      Tested:
       100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM
      
      A:/# cat /proc/sys/net/ipv4/tcp_wmem
      4096	262144	64000000
      A:/# super_netperf 100 -H B -l 1000 -- -K bbr &
      
      A:/# grep TCP /proc/net/sockstat
      TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/
      
      A:/# vmstat 5 5
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
       2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
       0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
       1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
       2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0
      
      A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat
      
      A:/# grep TCP /proc/net/sockstat
      TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow
      
      A:/# vmstat 5 5  # check that context switches have not inflated too much.
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
       0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
       1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
       1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
       1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
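
The per-flow memory figures quoted in the comments above can be re-derived from the sockstat output. The sketch below assumes the `mem` field counts 4096-byte pages and that all of it is spread over the 100 netperf flows; the helper names are made up for illustration. It also averages the `cs` column over the four loaded vmstat samples in each run.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on host A


def per_flow_mib(mem_pages: int, flows: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a sockstat 'mem' page count into MiB per flow."""
    return mem_pages * page_size / flows / 2**20


def mean(values):
    return sum(values) / len(values)


# Values copied from the transcript above (first, idle vmstat sample skipped).
before_mib = per_flow_mib(1364904, 100)
after_mib = per_flow_mib(86411, 100)
cs_before = mean([5927, 5849, 6635, 6009])
cs_after = mean([12424, 12925, 13624, 14317])

print(f"memory per flow: {before_mib:.1f} MiB -> {after_mib:.1f} MiB")
print(f"context switches/s: {cs_before:.0f} -> {cs_after:.0f}")
```

Under these assumptions memory per flow drops by roughly a factor of 16, while context switches roughly double.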
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'act_tunnel_key-support-key-less-tunnels' · 4dc88ce6
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      net/sched: act_tunnel_key: support key-less tunnels
      
      This short series from Adi Nissim lets the tc tunnel key actions
      support key-less tunnels, which is needed for some GRE use-cases.
      
      changes from V0:
       - address a build warning spotted by kbuild: always initialize the
         tunnel key to zero
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: Don't dump dst port if it wasn't set · 1c25324c
      Adi Nissim authored
      It's possible to set a tunnel without a destination port. However,
      on dump(), a zero dst port is returned to user space even if it was
      not set. Fix that.
      
      Note that this wasn't required so far, because key-less tunnels were
      not supported and UDP tunnels do require a destination port.
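
The intended dump behaviour can be sketched as follows. This is a toy Python model, not the kernel's netlink code: the dictionary keys and the function name are hypothetical, and it only shows the rule that an unset (zero) destination port is left out of the dump instead of being reported as 0.

```python
def dump_tunnel_key(params: dict) -> dict:
    """Return the attributes a dump would report for one tunnel_key action."""
    attrs = {"action": params["action"]}  # hypothetical attribute names
    dst_port = params.get("dst_port", 0)
    if dst_port:
        # Only report the destination port when one was actually configured.
        attrs["dst_port"] = dst_port
    return attrs
```

For a key-less GRE-style tunnel with no port configured, the result simply has no `dst_port` entry.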
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: Allow key-less tunnels · 80ef0f22
      Adi Nissim authored
      Allow setting a tunnel without a tunnel key. This is required for
      tunneling protocols, such as GRE, that define the key as an optional
      field.
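
To illustrate why the key is optional for GRE, here is a minimal Python sketch that builds a GRE header with and without the key field. It follows the header layout of RFC 2784/2890 (16 bits of flags and version, 16 bits of protocol type, then an optional 32-bit key announced by the K bit); the function name is made up, and checksum and sequence-number fields are deliberately left out.

```python
import struct
from typing import Optional

GRE_KEY_PRESENT = 0x2000  # K bit in the GRE flags/version field (RFC 2890)
ETH_P_IP = 0x0800         # protocol type for an IPv4 payload


def build_gre_header(protocol: int = ETH_P_IP, key: Optional[int] = None) -> bytes:
    """Build a GRE header; the 32-bit key is present only when key is given."""
    flags = GRE_KEY_PRESENT if key is not None else 0
    header = struct.pack("!HH", flags, protocol)
    if key is not None:
        header += struct.pack("!I", key)
    return header
```

A key-less header is 4 bytes long; adding a key sets the K bit and grows the header to 8 bytes.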
      Signed-off-by: Adi Nissim <adin@mellanox.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: fix spelling mistake "Dispalying" -> "Displaying" · d1ecf8a6
      Colin Ian King authored
      There is a spelling mistake in a DP_NOTICE message, fix it.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Dec, 2018 28 commits
  3. 03 Dec, 2018 7 commits
    • Merge branch 'udp-msg_zerocopy' · 6e360f73
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      udp msg_zerocopy
      
      Enable MSG_ZEROCOPY for udp sockets
      
      Patch 1/3 is the main patch, a rework of RFC patch
        http://patchwork.ozlabs.org/patch/899630/
        more details in the patch commit message
      
      Patch 2/3 is an optimization to remove a branch from the UDP hot path
        and refcount_inc/refcount_dec_and_test pair when zerocopy is used.
        This used to be included in the first patch in v2.
      
      Patch 3/3 runs the already existing udp zerocopy tests
        as part of kselftest
      
      See also recent Linux Plumbers presentation
        https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf
      
      Changes:
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: extend zerocopy tests to udp · db63e489
      Willem de Bruijn authored
      Both msg_zerocopy and udpgso_bench have udp zerocopy variants.
      Exercise these as part of the standard kselftest run.
      
      With udp, msg_zerocopy has no control channel. Ensure that the
      receiver exits after the sender by accounting for the initial
      delay in starting them (in msg_zerocopy.sh).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: elide zerocopy operation in hot path · 52900d22
      Willem de Bruijn authored
      With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
      Release of its last reference triggers a completion notification.
      
      The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
      the skbs, because it can build, send and free skbs within its loop,
      possibly reaching refcount zero and freeing the ubuf_info too soon.
      
      The UDP stack currently also takes this extra ref, but does not need
      it as all skbs are sent after return from __ip(6)_append_data.
      
      Avoid the extra refcount_inc and refcount_dec_and_test, and generally
      the sock_zerocopy_put in the common path, by passing the initial
      reference to the first skb.
      
      This approach is taken instead of initializing the refcount to 0, as
      that would generate error "refcount_t: increment on 0" on the
      next skb_zcopy_set.
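
The two reference-counting schemes compared above can be contrasted with a small model. This is a Python sketch, not the kernel implementation: the `UbufInfo` class and both send functions are hypothetical stand-ins, and the model assumes at least one skb is built per send call. It counts refcount operations and checks that the completion notification fires exactly when the last reference is dropped.

```python
class UbufInfo:
    """Toy stand-in for struct ubuf_info with an operation counter."""

    def __init__(self) -> None:
        self.refcnt = 1        # initial reference held by the allocating caller
        self.ops = 0           # number of refcount inc/dec operations performed
        self.completed = False

    def get(self) -> None:
        if self.refcnt == 0:
            raise RuntimeError("refcount_t: increment on 0")
        self.refcnt += 1
        self.ops += 1

    def put(self) -> None:
        self.refcnt -= 1
        self.ops += 1
        if self.refcnt == 0:
            self.completed = True  # last reference gone: notify completion


def send_with_extra_ref(n_skbs: int) -> UbufInfo:
    """Caller keeps its own reference while skbs are built, then drops it."""
    uarg = UbufInfo()
    for _ in range(n_skbs):
        uarg.get()             # every skb takes a reference
    uarg.put()                 # caller drops the extra reference
    for _ in range(n_skbs):
        uarg.put()             # skbs are freed after transmission
    return uarg


def send_passing_initial_ref(n_skbs: int) -> UbufInfo:
    """The first skb inherits the initial reference; only later skbs take one."""
    uarg = UbufInfo()
    for i in range(n_skbs):
        if i > 0:
            uarg.get()
    for _ in range(n_skbs):
        uarg.put()
    return uarg
```

For the common single-skb case the model performs three refcount operations with the extra reference and one without it, and both variants end with the completion delivered.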
      
      Changes
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: msg_zerocopy · b5947e5d
      Willem de Bruijn authored
      Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
      interpret flag MSG_ZEROCOPY.
      
      This patch was previously part of the zerocopy RFC patchsets. Zerocopy
      is not effective at small MTU. With segmentation offload building
      larger datagrams, the benefit of page flipping outweighs the cost of
      generating a completion notification.
      
      tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
      test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
      
          ipv4 udp -t 1
          tx=191312 (11938 MB) txc=0 zc=n
          rx=191312 (11938 MB)
          ipv4 udp -z -t 1
          tx=304507 (19002 MB) txc=304507 zc=y
          rx=304507 (19002 MB)
          ok
          ipv6 udp -t 1
          tx=174485 (10888 MB) txc=0 zc=n
          rx=174485 (10888 MB)
          ipv6 udp -z -t 1
          tx=294801 (18396 MB) txc=294801 zc=y
          rx=294801 (18396 MB)
          ok
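
The relative gain in the results above is easy to compute. The snippet below is a small worked calculation using only the MB figures printed by the test; the `speedup` helper is a name chosen for illustration.

```python
def speedup(copy_mb: int, zerocopy_mb: int) -> float:
    """Ratio of data sent with zerocopy to data sent with copying."""
    return zerocopy_mb / copy_mb


ipv4 = speedup(11938, 19002)
ipv6 = speedup(10888, 18396)
print(f"ipv4 udp: {ipv4:.2f}x, ipv6 udp: {ipv6:.2f}x")
```

This comes to roughly 1.59x for IPv4 and 1.69x for IPv6 in this particular test run.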
      
      Changes
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
            - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
              This is needed since commit 5cf4a853 ("tcp: really ignore
              MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of... · ce01a56b
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 4.21
      
      First set of patches for 4.21. Most notable here is support for
      Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide
      nvram files for brcmfmac.
      
      Major changes:
      
      brcmfmac
      
      * add support for first trying to get a board specific nvram file
      
      * add support for getting nvram contents from EFI variables
      
      qtnfmac
      
      * use single PCIe driver for all platforms and rename
        Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE
      
      * add support for QSR1000/QSR2000 (Topaz) family of chipsets
      
      ath10k
      
      * add support for WCN3990 firmware crash recovery
      
      * add firmware memory dump support for QCA4019
      
      wil6210
      
      * add firmware error recovery while in AP mode
      
      ath9k
      
      * remove experimental notice from dynack feature
      
      iwlwifi
      
      * PCI IDs for some new 9000-series cards
      
      * improve antenna usage on connection problems
      
      * new firmware debugging infrastructure
      
      * some more work on 802.11ax
      
      * improve support for multiple RF modules with 22000 devices
      
      cordic
      
      * move cordic macros and defines to a public header file
      
      * convert brcmsmac and b43 to fully use cordic library
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'davinci_emac-read-the-MAC-address-from-nvmem' · 37a0bc39
      David S. Miller authored
      Bartosz Golaszewski says:
      
      ====================
      davinci_emac: read the MAC address from nvmem
      
      This series is part of a bigger series that aims at removing the platform
      data structure from the at24 EEPROM driver[1].
      
      We provide a generalized version of of_get_nvmem_mac_address(), switch the
      only user of the of_ variant to using it, remove the previous
      implementation and use the new routine in the davinci_emac driver.
      
      [1] https://lkml.org/lkml/2018/11/13/884
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: davinci_emac: use nvmem_get_mac_address() · 18dbfc81
      Bartosz Golaszewski authored
      All DaVinci boards still supported in board files now define nvmem
      cells containing the MAC address. We want to stop using the setup
      callback from at24 so the MAC address for those users will no longer
      be provided over platform data. If we didn't get a valid MAC in pdata,
      try nvmem before resorting to a random MAC.
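
The fallback order described above (platform data, then nvmem, then a random address) can be sketched as follows. This is an illustrative Python model, not the driver code: all function names are hypothetical. The validity check mirrors the usual rule that a usable station address is neither all-zero nor multicast, and the random address is made unicast and locally administered.

```python
import os
from typing import Optional


def is_valid_ether_addr(mac: Optional[bytes]) -> bool:
    """A usable MAC is 6 bytes, not all-zero, and not multicast."""
    return (
        mac is not None
        and len(mac) == 6
        and mac != bytes(6)
        and not (mac[0] & 0x01)
    )


def random_ether_addr() -> bytes:
    """Generate a random unicast, locally administered MAC address."""
    raw = bytearray(os.urandom(6))
    raw[0] &= 0xFE  # clear the multicast bit
    raw[0] |= 0x02  # set the locally administered bit
    return bytes(raw)


def pick_mac(pdata_mac: Optional[bytes], nvmem_mac: Optional[bytes]) -> bytes:
    """Prefer platform data, then nvmem, and only then a random address."""
    if is_valid_ether_addr(pdata_mac):
        return pdata_mac
    if is_valid_ether_addr(nvmem_mac):
        return nvmem_mac
    return random_ether_addr()
```

With no valid address in platform data, a valid nvmem cell wins; a random address is used only when both sources fail.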
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>