1. 04 Feb, 2021 27 commits
  2. 03 Feb, 2021 13 commits
    • Merge branch 'net-use-indirect_call-in-some-dst_ops' · 2d912da0
      Jakub Kicinski authored
      Brian Vazquez says:
      
      ====================
      net: use INDIRECT_CALL in some dst_ops
      
      This patch series uses the INDIRECT_CALL wrappers in some dst_ops
      functions to mitigate retpoline costs. Benefits depend on the
      platform as described below.
      
      Background: The kernel rewrites the retpoline code at
      __x86_indirect_thunk_r11 depending on the CPU's requirements.
      The INDIRECT_CALL wrappers provide hints on possible targets and
      save the retpoline overhead using a direct call in case the
      target matches one of the hints.
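As an illustration of the hint mechanism described above, here is a minimal userspace sketch modelled on the shape of the kernel's INDIRECT_CALL_1/INDIRECT_CALL_2 wrappers. The `demo_*` names and the MTU arithmetic are hypothetical stand-ins and are not taken from the series; in the kernel the direct-call path is only compiled in when retpolines are enabled.

```c
/* Userspace stand-in for the kernel's likely(); __builtin_expect is a
 * GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Compare the function pointer against the hinted targets first and
 * call a match directly; fall back to the indirect call otherwise. */
#define INDIRECT_CALL_1(f, f1, ...) \
	(likely((f) == (f1)) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define INDIRECT_CALL_2(f, f2, f1, ...) \
	(likely((f) == (f2)) ? f2(__VA_ARGS__) : \
			       INDIRECT_CALL_1(f, f1, __VA_ARGS__))

/* Hypothetical targets standing in for the IPv4/IPv6 handlers. */
static int demo_v4_mtu(int base) { return base - 20; }
static int demo_v6_mtu(int base) { return base - 40; }
static int demo_other_mtu(int base) { return base; }

/* Hypothetical ops table with a single function pointer. */
struct demo_ops {
	int (*mtu)(int base);
};

static int demo_dst_mtu(const struct demo_ops *ops, int base)
{
	/* The two hints become direct calls; any other target still
	 * works through the plain indirect call. */
	return INDIRECT_CALL_2(ops->mtu, demo_v6_mtu, demo_v4_mtu, base);
}
```

A target that matches neither hint pays for the extra comparisons before the indirect call, which is why the hints are chosen to be the common case.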
      
      The retpoline overhead for the following three cases was measured
      by Luigi Rizzo in microbenchmarks using CPU performance counters.
      Together they cover reasonably well the range of possible retpoline
      overheads compared to a plain indirect call (under equal conditions,
      specifically with a predicted branch and a hot cache):
      
      - just "jmp *(%r11)" on modern platforms like Intel Cascadelake.
        In this case the overhead is just 2 clock cycles.
      
      - "lfence; jmp *(%r11)" on e.g. some recent AMD CPUs.
        In this case the lfence is blocked until pending reads complete,
        so the actual overhead depends on previous instructions.
        In the best case we measured 15 clock cycles of overhead.
      
      - worst case, e.g. Skylake, where the full retpoline is used:
      
          __x86_indirect_thunk_r11:     call set_up_target
          capture_speculation:          pause
                                        lfence
                                        jmp capture_speculation
          .align 16
          set_up_target:                mov %r11, (%rsp)
                                        ret
      
         In this case the overhead has been measured at 35-40 clock cycles.
      
      The actual time saved hence depends on the platform and current
      clock speed (which varies heavily, especially when C-states are active).
      Also note that actual benefit might be lower than expected if the
      longer retpoline overlaps with some pending memory read.
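To make the dependence on clock speed concrete, the conversion from cycles to time can be sketched as below. The helper name and any clock frequency passed to it are assumptions for illustration; the series does not state the frequency of the test machines.

```c
/* Convert a cycle count to nanoseconds at a clock frequency given in
 * GHz. One cycle at 1 GHz lasts exactly 1 ns. */
static double demo_cycles_to_ns(double cycles, double ghz)
{
	return cycles / ghz;
}
```

At a hypothetical 3.0 GHz, the 35-40 cycles of the full retpoline correspond to roughly 11.7-13.3 ns; at half that frequency the same cycle count takes twice as long.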
      
      MEASUREMENTS:
      The INDIRECT_CALL wrappers in this patchset involve the processing
      of incoming SYN and generation of syncookies. Hence, the test has been
      run by configuring a receiving host with a single NIC rx queue, disabling
      RPS and RFS so that all processing occurs on the same core.
      An external source generates SYN fast enough to saturate the receiving CPU.
      We ran two sets of experiments, with and without the dst_output patch,
      comparing the number of syncookies generated over a 20s period
      in multiple runs.
      
      Assuming the CPU is saturated, the time per packet is
         t = total_time/number_of_packets
      and if the two datasets have a statistically meaningful difference,
      the difference in times between the two cases gives an estimate
      of the benefits from one INDIRECT_CALL.
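The per-packet time used in this estimate is the run time divided by the number of packets processed. A small sketch, with a hypothetical helper name:

```c
/* Average time per packet in nanoseconds for a run lasting `seconds`
 * seconds during which `packets` packets were processed. */
static double demo_ns_per_packet(double seconds, unsigned long packets)
{
	return seconds * 1e9 / (double)packets;
}
```

For a 20s run that generated 9166325 syncookies, this gives about 2182 ns per packet.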
      
      Here are the experimental results:
      
      Skylake     Syncookies over 20s (5 tests)
      ---------------------------------------------------
      indirect    9166325 9182023 9170093 9134014 9171082
      retpoline   9099308 9126350 9154841 9056377 9122376
      
      Computing the stats on the ns_pkt = 20e6/total_packets gives the following:
      
      $ ministat -c 95 -w 70 /tmp/sk-indirect /tmp/sk-retp
      x /tmp/sk-indirect
      + /tmp/sk-retp
      +----------------------------------------------------------------------+
      |x     xx x     +          x    + +           +                       +|
      ||______M__A_______|_|____________M_____A___________________|          |
      +----------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x   5   2.17817e-06   2.18962e-06     2.181e-06  2.182292e-06 4.3252133e-09
      +   5   2.18464e-06   2.20839e-06   2.19241e-06  2.194974e-06 8.8695958e-09
      Difference at 95.0% confidence
              1.2682e-08 +/- 1.01766e-08
              0.581132% +/- 0.466326%
              (Student's t, pooled s = 6.97772e-09)
      
      This suggests a difference of 13ns +/- 10ns per packet.
      Our expectation from microbenchmarks was 35-40 cycles per call,
      but part of the gains may be eaten by stalls from pending memory reads.
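The Skylake interval reported by ministat can be approximately reproduced from the raw counts with a pooled two-sample Student's t computation. The sketch below is not ministat's code; the `demo_*` names are hypothetical, the critical value 2.306 is the two-sided 95% t quantile for 8 degrees of freedom, and a small Newton iteration replaces `sqrt()` only so the example builds without libm.

```c
#include <assert.h>
#include <stddef.h>

/* Newton's method square root, used only to avoid a libm dependency. */
static double demo_sqrt(double v)
{
	double x = v > 1.0 ? v : 1.0;
	int i;

	if (v <= 0.0)
		return 0.0;
	for (i = 0; i < 200; i++)
		x = 0.5 * (x + v / x);
	return x;
}

static double demo_mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
static double demo_var(const double *v, size_t n)
{
	double m = demo_mean(v, n), acc = 0.0;
	size_t i;

	assert(n > 1);
	for (i = 0; i < n; i++)
		acc += (v[i] - m) * (v[i] - m);
	return acc / (double)(n - 1);
}

struct demo_ci {
	double diff;		/* mean(b) - mean(a) */
	double half_width;	/* the "+/-" part of the interval */
};

/* Pooled-variance Student's t interval for the difference of means;
 * the caller supplies the critical value for na + nb - 2 degrees of
 * freedom. */
static struct demo_ci demo_pooled_t(const double *a, size_t na,
				    const double *b, size_t nb,
				    double t_crit)
{
	struct demo_ci ci;
	double sp2 = ((double)(na - 1) * demo_var(a, na) +
		      (double)(nb - 1) * demo_var(b, nb)) /
		     (double)(na + nb - 2);

	ci.diff = demo_mean(b, nb) - demo_mean(a, na);
	ci.half_width = t_crit *
		demo_sqrt(sp2 * (1.0 / (double)na + 1.0 / (double)nb));
	return ci;
}

/* Skylake counts from the table above, converted to seconds per
 * packet over the 20s test period. */
static struct demo_ci demo_skylake_ci(void)
{
	static const double indirect[5] = {
		9166325, 9182023, 9170093, 9134014, 9171082 };
	static const double retpoline[5] = {
		9099308, 9126350, 9154841, 9056377, 9122376 };
	double x[5], y[5];
	size_t i;

	for (i = 0; i < 5; i++) {
		x[i] = 20.0 / indirect[i];
		y[i] = 20.0 / retpoline[i];
	}
	return demo_pooled_t(x, 5, y, 5, 2.306);
}
```

`demo_skylake_ci()` yields a difference of roughly 1.27e-08 s with a half-width of roughly 1.02e-08 s, in line with the ministat output above.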
      
      For Cascadelake:
      Cascadelake     Syncookies over 20s (5 tests)
      ---------------------------------------------------------
      indirect     10339797 10297547 10366826 10378891 10384854
      retpoline    10332674 10366805 10320374 10334272 10374087
      
      Computing the stats on the ns_pkt = 20e6/total_packets gives no
      meaningful difference even at just 80% (this was expected):
      
      $ ministat -c 80 -w 70 /tmp/cl-indirect /tmp/cl-retp
      x /tmp/cl-indirect
      + /tmp/cl-retp
      +----------------------------------------------------------------------+
      |   x    x  +     *                   x   + +        +                x|
      ||______________|_M_________A_____A_______M________|___|               |
      +----------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x   5   1.92588e-06   1.94221e-06   1.92923e-06  1.931716e-06 6.6936746e-09
      +   5   1.92788e-06   1.93791e-06   1.93531e-06  1.933188e-06 4.3734106e-09
      No difference proven at 80.0% confidence
      ====================
      
      Link: https://lore.kernel.org/r/20210201174132.3534118-1-brianvv@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: indirect call helpers for ipv4/ipv6 dst_check functions · bbd807df
      Brian Vazquez authored
      This patch avoids the indirect call for the common case:
      ip6_dst_check and ipv4_dst_check
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: use indirect call helpers for dst_mtu · f67fbeae
      Brian Vazquez authored
      This patch avoids the indirect call for the common case:
      ip6_mtu and ipv4_mtu
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: use indirect call helpers for dst_output · 6585d7dc
      Brian Vazquez authored
      This patch avoids the indirect call for the common case:
      ip6_output and ip_output
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: use indirect call helpers for dst_input · e43b2190
      Brian Vazquez authored
      This patch avoids the indirect call for the common case:
      ip_local_deliver and ip6_input
      Signed-off-by: Brian Vazquez <brianvv@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: usb: cdc_ncm: use new API for bh tasklet · 4f4e5436
      Emil Renner Berthing authored
      This converts the driver to use the new tasklet API introduced in
      commit 12cc923f ("tasklet: Introduce new initialization API")
      
      It is unfortunate that we need to add a pointer to the driver context to
      get back to the usbnet device, but the space will be reclaimed once
      there are no more users of the old API left and we can remove the data
      value and flag from the tasklet struct.
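The pattern described here can be sketched in userspace as follows. In the kernel the corresponding roles are played by `struct tasklet_struct`, `tasklet_setup()` and `from_tasklet()`; every `demo_*` name below is a hypothetical stand-in and the struct layout is not the driver's.

```c
#include <stddef.h>

/* Stand-in for the tasklet: the callback receives the tasklet itself
 * instead of an opaque data value. */
struct demo_tasklet {
	void (*callback)(struct demo_tasklet *t);
};

/* Recover the enclosing structure from a pointer to one of its
 * members, in the spirit of container_of()/from_tasklet(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the usbnet device. */
struct demo_dev {
	int tx_kicks;
};

/* Stand-in for the driver context that embeds the tasklet. */
struct demo_ctx {
	struct demo_tasklet bh;
	struct demo_dev *dev;	/* extra back-pointer to the device */
};

static void demo_bh(struct demo_tasklet *t)
{
	/* The callback only gets the tasklet, so it first recovers the
	 * context and then follows the back-pointer to the device. */
	struct demo_ctx *ctx = demo_container_of(t, struct demo_ctx, bh);

	ctx->dev->tx_kicks++;
}

static void demo_tasklet_setup(struct demo_tasklet *t,
			       void (*cb)(struct demo_tasklet *))
{
	t->callback = cb;
}

/* Wire everything up and run the bottom half once. */
static int demo_run_once(void)
{
	struct demo_dev dev = { 0 };
	struct demo_ctx ctx = { .dev = &dev };

	demo_tasklet_setup(&ctx.bh, demo_bh);
	ctx.bh.callback(&ctx.bh);
	return dev.tx_kicks;
}
```

With the old API the device pointer could travel in the tasklet's data value; here it has to live in the context, which is the extra pointer the message refers to.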
      Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
      Link: https://lore.kernel.org/r/20210130234637.26505-1-kernel@esmil.dk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fec: Silence M5272 build warnings · 32d1bbb1
      Geert Uytterhoeven authored
      If CONFIG_M5272=y:
      
          drivers/net/ethernet/freescale/fec_main.c: In function ‘fec_restart’:
          drivers/net/ethernet/freescale/fec_main.c:948:6: warning: unused variable ‘val’ [-Wunused-variable]
            948 |  u32 val;
                |      ^~~
          drivers/net/ethernet/freescale/fec_main.c: In function ‘fec_get_mac’:
          drivers/net/ethernet/freescale/fec_main.c:1667:28: warning: unused variable ‘pdata’ [-Wunused-variable]
           1667 |  struct fec_platform_data *pdata = dev_get_platdata(&fep->pdev->dev);
                |                            ^~~~~
      
      Fix this by moving the variable declarations inside the existing #ifdef
      blocks.
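A minimal sketch of this kind of fix, assuming a hypothetical `DEMO_M5272` macro in place of CONFIG_M5272 and a made-up function body rather than the driver's code:

```c
/* DEMO_M5272 is a hypothetical stand-in for CONFIG_M5272. */
static unsigned int demo_restart(unsigned int reg)
{
#ifndef DEMO_M5272
	/* Declared inside the conditional block that uses it, so a
	 * build with DEMO_M5272 defined never sees an unused variable. */
	unsigned int val = reg | 0x1u;

	reg = val;
#endif
	return reg;
}
```

Had `val` been declared at the top of the function, outside the `#ifndef`, a build with `DEMO_M5272` defined would trigger -Wunused-variable just like the warnings quoted above.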
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20210202130650.865023-1-geert@linux-m68k.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • inet: do not export inet_gro_{receive|complete} · fca23f37
      Eric Dumazet authored
      inet_gro_receive() and inet_gro_complete() are part
      of GRO engine which can not be modular.
      
      Similarly, inet_gso_segment() does not need to be exported,
      being part of GSO stack.
      
      In other words, net/ipv6/ip6_offload.o is part of vmlinux,
      regardless of CONFIG_IPV6.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20210202154145.1568451-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mac80211-next-for-net-next-2021-02-02' of... · 0256317a
      Jakub Kicinski authored
      Merge tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      This time, only RTNL locking reduction fallout.
       - cfg80211_dev_rename() requires RTNL
       - cfg80211_change_iface() and cfg80211_set_encryption()
         require wiphy mutex (was missing in wireless extensions)
       - cfg80211_destroy_ifaces() requires wiphy mutex
       - netdev registration can fail due to notifiers, and then
         notifiers are "unrolled", need to handle this properly
      
      * tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
        cfg80211: fix netdev registration deadlock
        cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held
        wext: call cfg80211_set_encryption() with wiphy lock held
        wext: call cfg80211_change_iface() with wiphy lock held
        nl80211: call cfg80211_dev_rename() under RTNL
      ====================
      
      Link: https://lore.kernel.org/r/20210202144106.38207-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 390d9b56
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-02-01
      
      mlx5 netdev updates:
      
      1) Trivial refactoring ahead of the upcoming uplink representor series.
      2) Increased RSS table size to 256, for better results
      3) Misc. Cleanup and very trivial improvements
      
      * tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: DR, Avoid unnecessary csum recalculation on supporting devices
        net/mlx5e: CT: remove useless conversion to PTR_ERR then ERR_PTR
        net/mlx5e: accel, remove redundant space
        net/mlx5e: kTLS, Improve TLS RX workqueue scope
        net/mlx5e: remove h from printk format specifier
        net/mlx5e: Increase indirection RQ table size to 256
        net/mlx5e: Enable napi in channel's activation stage
        net/mlx5e: Move representor neigh init into profile enable
        net/mlx5e: Avoid false lock depenency warning on tc_ht
        net/mlx5e: Move set vxlan nic info to profile init
        net/mlx5e: Move netif_carrier_off() out of mlx5e_priv_init()
        net/mlx5e: Refactor mlx5e_netdev_init/cleanup to mlx5e_priv_init/cleanup
        net/mxl5e: Add change profile method
        net/mlx5e: Separate between netdev objects and mlx5e profiles initialization
      ====================
      
      Link: https://lore.kernel.org/r/20210202065457.613312-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mptcp-add_addr-enhancements' · a1a809c4
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: ADD_ADDR enhancements
      
      This patch series from the MPTCP tree contains enhancements and
      associated tests for the ADD_ADDR ("add address") MPTCP option. This
      option allows already-connected MPTCP peers to share additional IP
      addresses with each other, which can then be used to create additional
      subflows within those MPTCP connections.
      
      Patches 1 & 2 remove duplicated data in the per-connection path manager
      structure.
      
      Patches 3-6 initiate additional subflows when an address is added using
      the netlink path manager interface and improve ADD_ADDR signaling
      reliability, subject to configured limits. Self tests are also updated.
      
      Patches 7-15 add new support for optional port numbers in ADD_ADDR. This
      includes creating an additional in-kernel TCP listening socket for the
      requested port number, validating the port number when processing
      incoming subflow connections, including the port number in netlink
      interfaces, and adding some new MIBs. New self test cases are added for
      subflows connecting with alternate port numbers.
      ====================
      
      Link: https://lore.kernel.org/r/20210201230920.66027-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: add testcases for ADD_ADDR with port · 8a127bf6
      Geliang Tang authored
      This patch adds testcases for ADD_ADDR with port and the related MIB
      counters check in chk_add_nr. The output looks like this:
      
       24 signal address with port           syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             add[ ok ] - echo  [ ok ] - pt [ ok ]
                                             syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             syn[ ok ] - ack   [ ok ]
       25 subflow and signal with port       syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             add[ ok ] - echo  [ ok ] - pt [ ok ]
                                             syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             syn[ ok ] - ack   [ ok ]
       26 remove single address with port    syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             add[ ok ] - echo  [ ok ] - pt [ ok ]
                                             syn[ ok ] - synack[ ok ] - ack[ ok ]
                                             syn[ ok ] - ack   [ ok ]
                                             rm [ ok ] - sf    [ ok ]
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add the mibs for ADD_ADDR with port · 2fbdd9ea
      Geliang Tang authored
      This patch adds the mibs for ADD_ADDR with port:
      
      MPTCP_MIB_PORTADD for received ADD_ADDR suboption with a port number.
      
      MPTCP_MIB_PORTSYNRX, MPTCP_MIB_PORTSYNACKRX, MPTCP_MIB_PORTACKRX, for
      received MP_JOIN's SYN or SYN/ACK or ACK with a port number which is
      different from the msk's port number.
      
      MPTCP_MIB_MISMATCHPORTSYNRX and MPTCP_MIB_MISMATCHPORTACKRX, for
      received SYN or ACK MP_JOIN with a mismatched port-number.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>