1. 17 Jun, 2019 18 commits
    • Merge branch 'UDP-GSO-audit-tests' · f97252a8
      David S. Miller authored
      Fred Klassen says:
      
      ====================
      UDP GSO audit tests
      
      Updates to UDP GSO selftests to optionally stress test the CMSG
      subsystem, and report the reliability and performance of both
      TX Timestamping and ZEROCOPY messages.
      ====================
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udpgso_bench.sh test fails on error · 4ffc37f5
      Fred Klassen authored
      Ensure that failure on any individual test results in an overall
      failure of the test script.
      Signed-off-by: Fred Klassen <fklassen@appneta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udpgso_bench.sh add UDP GSO audit tests · ade90d69
      Fred Klassen authored
      Audit tests count the total number of messages sent and compare it
      with the total number of CMSGs received on the error queue. Example:
      
          udp gso zerocopy timestamp audit
          udp rx:   1599 MB/s  1166414 calls/s
          udp tx:   1615 MB/s    27395 calls/s  27395 msg/s
          udp rx:   1634 MB/s  1192261 calls/s
          udp tx:   1633 MB/s    27699 calls/s  27699 msg/s
          udp rx:   1633 MB/s  1191358 calls/s
          udp tx:   1631 MB/s    27678 calls/s  27678 msg/s
          Summary over 4.000 seconds...
          sum udp tx:   1665 MB/s      82772 calls (27590/s)      82772 msgs (27590/s)
          Tx Timestamps:               82772 received                 0 errors
          Zerocopy acks:               82772 received
      
      An error is raised if the CMSG count does not equal the send count,
      for example:
      
          Summary over 4.000 seconds...
          sum tcp tx:   7451 MB/s     493706 calls (123426/s)     493706 msgs (123426/s)
          ./udpgso_bench_tx: Unexpected number of Zerocopy completions:    493706 expected    493704 received
      
      Also reduce individual test time from 4 to 3 seconds so that
      overall test time does not increase significantly.
      
      v3: Enhancements as per Willem de Bruijn <willemb@google.com>
          - document -P option for TCP audit
      Signed-off-by: Fred Klassen <fklassen@appneta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udpgso_bench_tx: options to exercise TX CMSG · 79ebc3c2
      Fred Klassen authored
      This enhancement adds options that facilitate load testing with
      additional TX CMSG options and that optionally print the results of
      various send CMSG operations.
      
      These options are especially useful in isolating situations
      where error-queue messages are lost when combined with other
      CMSG operations (e.g. SO_ZEROCOPY).
      
      New options:
          -a - count all CMSG messages and match to sent messages
          -T - add TX CMSG that requests TX software timestamps
          -H - similar to -T except request TX hardware timestamps
          -P - call poll() before reading error queue
          -v - print detailed results
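As a rough illustration of what "add TX CMSG that requests TX software timestamps" (-T) can look like in userspace, here is a minimal sketch that fills a control buffer with one SO_TIMESTAMPING control message. The helper name build_tx_ts_cmsg is hypothetical and not taken from the patch; the reporting flags that must still be enabled on the socket, and the reading of the error queue, are left out. For the hardware variant (-H), SOF_TIMESTAMPING_TX_HARDWARE would presumably be passed instead.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper (not from the patch): attach one SO_TIMESTAMPING
 * control message to msg, asking for a TX software timestamp for the
 * sendmsg() call that uses this msghdr.
 * Returns the number of control bytes used, or 0 if buf is too small.
 */
static size_t build_tx_ts_cmsg(struct msghdr *msg, void *buf, size_t buflen)
{
	uint32_t flags = SOF_TIMESTAMPING_TX_SOFTWARE;
	struct cmsghdr *cm;

	if (buflen < CMSG_SPACE(sizeof(flags)))
		return 0;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(flags));

	/* The first (and only) control header lives at the start of buf. */
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SO_TIMESTAMPING;
	cm->cmsg_len = CMSG_LEN(sizeof(flags));
	memcpy(CMSG_DATA(cm), &flags, sizeof(flags));

	return msg->msg_controllen;
}
```

The caller is expected to pass a buffer that is suitably aligned for struct cmsghdr and at least CMSG_SPACE(sizeof(uint32_t)) bytes long.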
      
      v2: Enhancements as per Willem de Bruijn <willemb@google.com>
          - Updated control and buffer parameters for recvmsg
          - poll() parameter cleanup
          - fail on bad audit results
          - remove TOS options
          - improved reporting
      
      v3: Enhancements as per Willem de Bruijn <willemb@google.com>
          - add SOF_TIMESTAMPING_OPT_TSONLY to eliminate MSG_TRUNC
          - general code cleanup
      Signed-off-by: Fred Klassen <fklassen@appneta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-ipv4-remove-erroneous-advancement-of-list-pointer' · 4bd366ce
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: ipv4: remove erroneous advancement of list pointer
      
      Tariq reported a soft lockup on net-next that Mellanox was able to
      bisect to 2638eb8b ("net: ipv4: provide __rcu annotation for ifa_list").
      
      While reviewing the above patch I found a regression when addresses
      have a lifetime specified.
      
      The second patch extends rtnetlink.sh to trigger the crash
      (without the first patch applied).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: rtnetlink: add addresses with fixed life time · 3cfa1488
      Florian Westphal authored
      This exercises the kernel code paths that deal with addresses that
      have a limited lifetime.
      
      Without the previous fix, this triggers the following crash on net-next:
       BUG: KASAN: null-ptr-deref in check_lifetime+0x403/0x670
       Read of size 8 at addr 0000000000000010 by task kworker [..]
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: remove erroneous advancement of list pointer · 40008e92
      Florian Westphal authored
      Causes a crash when the lifetime expires on an address, as garbage
      is dereferenced soon after.
      
      This used to look like this:
      
       for (ifap = &ifa->ifa_dev->ifa_list;
            *ifap != NULL; ifap = &(*ifap)->ifa_next) {
                if (*ifap == ifa) ...
      
      but this was changed to:
      
      struct in_ifaddr *tmp;
      
      ifap = &ifa->ifa_dev->ifa_list;
      tmp = rtnl_dereference(*ifap);
      while (tmp) {
         tmp = rtnl_dereference(tmp->ifa_next); // Bogus
         if (rtnl_dereference(*ifap) == ifa) {
           ...
         ifap = &tmp->ifa_next;		// Can be NULL
         tmp = rtnl_dereference(*ifap);	// Dereference
         }
      }
      
      Remove the bogus assignment/list entry skip.
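
For readers unfamiliar with the pointer-to-pointer idiom used in the original loop, the following is a minimal userspace sketch of a correct walk. It uses a plain singly linked list with hypothetical names (struct node, list_unlink) and no RCU annotations, so it only illustrates the traversal rule, not the kernel code itself: the link pointer advances exactly once per iteration and is only dereferenced after the loop condition has confirmed that it does not point at NULL.

```c
#include <stddef.h>

/* Hypothetical list entry, standing in for struct in_ifaddr. */
struct node {
	int id;
	struct node *next;
};

/*
 * Unlink target from the list that starts at *head.
 * pp always points at the link leading to the current entry and is
 * advanced exactly once per iteration, so no entry is skipped and the
 * NULL terminator is never dereferenced.
 * Returns 1 if target was found and unlinked, 0 otherwise.
 */
static int list_unlink(struct node **head, struct node *target)
{
	struct node **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return 1;
		}
	}
	return 0;
}
```

Removing the last entry is the interesting case here: the loop reaches it through the previous entry's next pointer and stops, rather than stepping one entry further as the buggy variant did.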
      
      Fixes: 2638eb8b ("net: ipv4: provide __rcu annotation for ifa_list")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix ptp link error · 78fe8a28
      Arnd Bergmann authored
      Due to a reversed dependency, it is possible to build
      the lower ptp driver as a loadable module and the actual
      driver using it as built-in, causing a link error:
      
      drivers/net/dsa/sja1105/sja1105_spi.o: In function `sja1105_static_config_upload':
      sja1105_spi.c:(.text+0x6f0): undefined reference to `sja1105_ptp_reset'
      drivers/net/dsa/sja1105/sja1105_spi.o:(.data+0x2d4): undefined reference to `sja1105et_ptp_cmd'
      drivers/net/dsa/sja1105/sja1105_spi.o:(.data+0x604): undefined reference to `sja1105pqrs_ptp_cmd'
      drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_remove':
      sja1105_main.c:(.text+0x8d4): undefined reference to `sja1105_ptp_clock_unregister'
      drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_rxtstamp_work':
      sja1105_main.c:(.text+0x964): undefined reference to `sja1105_tstamp_reconstruct'
      drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_setup':
      sja1105_main.c:(.text+0xb7c): undefined reference to `sja1105_ptp_clock_register'
      drivers/net/dsa/sja1105/sja1105_main.o: In function `sja1105_port_deferred_xmit':
      sja1105_main.c:(.text+0x1fa0): undefined reference to `sja1105_ptpegr_ts_poll'
      sja1105_main.c:(.text+0x1fc4): undefined reference to `sja1105_tstamp_reconstruct'
      drivers/net/dsa/sja1105/sja1105_main.o:(.rodata+0x5b0): undefined reference to `sja1105_get_ts_info'
      
      Change the Makefile logic to always build the ptp module
      the same way as the rest. Another option would be to
      just add it to the same module and remove the exports,
      but I don't know if there was a good reason to keep them
      separate.
      
      Fixes: bb77f36a ("net: dsa: sja1105: Add support for the PTP clock")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: fix unused-variable warning · c63d1e5c
      Arnd Bergmann authored
      When building without CONFIG_OF, we get a harmless build warning:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_phy_setup':
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:973:22: error: unused variable 'node' [-Werror=unused-variable]
        struct device_node *node = priv->plat->phy_node;
      
      Reword it so we always use the local variable, by making it the
      fwnode pointer instead of the device_node.
      
      Fixes: 74371272 ("net: stmmac: Convert to phylink and remove phylib logic")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_matchall: allow to delete filter · f517f271
      Jiri Pirko authored
      Currently the user is unable to delete the filter. See the following example:
      $ tc filter add dev ens16np1 ingress pref 1 handle 1 matchall action drop
      $ tc filter show dev ens16np1 ingress
      filter protocol all pref 1 matchall chain 0
      filter protocol all pref 1 matchall chain 0 handle 0x1
        in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1
      
      $ tc filter del dev ens16np1 ingress pref 1 handle 1 matchall action drop
      RTNETLINK answers: Operation not supported
      
      Implement tcf_proto_ops->delete() op and allow user to delete the filter.
      Reported-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix dereference of ae_dev before it is null checked · ad9bf545
      Colin Ian King authored
      Pointer ae_dev is null checked; however, it is dereferenced before
      that check, when it is used to assign pointer ops. Fix this by
      assigning ops only after ae_dev has been null checked.
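
The ordering described here can be sketched in a few lines. The struct and function names below (ae_ops, ae_dev, ae_dev_get_ops) are hypothetical stand-ins, not the hns3 definitions; the only point is that the read of ae_dev->ops moves below the null check.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct ae_ops {
	int (*reset)(void);
};

struct ae_dev {
	const struct ae_ops *ops;
};

/*
 * Return the ops table of ae_dev, or NULL if ae_dev itself is NULL.
 * The dereference happens only after the null check, which is the
 * ordering the fix establishes.
 */
static const struct ae_ops *ae_dev_get_ops(const struct ae_dev *ae_dev)
{
	const struct ae_ops *ops;

	if (!ae_dev)
		return NULL;

	ops = ae_dev->ops;
	return ops;
}
```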
      
      Addresses-Coverity: ("Dereference before null check")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-sched-act_ctinfo-fixes' · 43321251
      David S. Miller authored
      Kevin Darbyshire-Bryant says:
      
      ====================
      net: sched: act_ctinfo: fixes
      
      This is a first attempt at sending a small series.  Order is important
      because one bug (policy validation) prevents us from encountering the
      more important 'OOPS' generating bug in action creation.  Fix the OOPS
      first.
      
      Confession time: Until very recently, development of this module has
      been done on 'net-next' tree to 'clean compile' level with run-time
      testing on backports to 4.14 & 4.19 kernels under openwrt.  It turns out
      that sched: action: based code has been under more active change than I
      realised.
      
      During the back & forward porting during development & testing, the
      critical ACT_P_CREATED return code got missed despite being in the 4.14
      & 4.19 backports.  I have now gone through the init functions, using
      act_csum as reference with a fine toothed comb and am happy they do the
      same things.
      
      This issue hadn't been caught till now due to another issue caused by
      new strict nla_parse_nested function failing parsing validation before
      action creation.
      
      Thanks to Marcelo Leitner <marcelo.leitner@gmail.com> for flagging
      extack deficiency (fixed in 733f0766 sched: act_ctinfo: use extack
      error reporting) which led to b424e432 ("netlink: add validation of
      NLA_F_NESTED flag") and 8cb08174 ("netlink: make validation more
      configurable for future strictness") which led to the policy validation
      fix, which then led to the action creation fix both contained in this
      series.
      
      If I ever get to a developer conference please feel free to
      tar/feather/apply cone of shame.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: act_ctinfo: fix policy validation · c197d636
      Kevin Darbyshire-Bryant authored
      Fix the nla_policy definition by specifying an exact length type attribute
      for the CTINFO action parameter block structure.  Without this change,
      netlink parsing will fail validation and the action will not be
      instantiated.
      
      8cb08174 ("netlink: make validation more configurable for future")
      introduced much stricter checking to attributes being passed via
      netlink.  Existing actions were updated to use less restrictive
      deprecated versions of nla_parse_nested.
      
      As a new module, act_ctinfo should be designed to use the strict
      checking model otherwise, well, what was the point of implementing it.
      
      Confession time: Until very recently, development of this module has
      been done on 'net-next' tree to 'clean compile' level with run-time
      testing on backports to 4.14 & 4.19 kernels under openwrt.  This is how
      I managed to miss the run-time impacts of the new strict
      nla_parse_nested function.  I hopefully have learned something from this
      (glances toward laptop running a net-next kernel)
      
      There is however a still outstanding implication on iproute2 user space
      in that it needs to be told to pass nested netlink messages with the
      nested attribute actually set.  So even with this kernel fix to do
      things correctly you still cannot instantiate a new 'strict'
      nla_parse_nested based action such as act_ctinfo with iproute2's tc.
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: act_ctinfo: fix action creation · a658c2e4
      Kevin Darbyshire-Bryant authored
      Use correct return value on action creation: ACT_P_CREATED.
      
      The use of incorrect return value could result in a situation where the
      system thought a ctinfo module was listening but actually wasn't
      instantiated correctly leading to an OOPS in tcf_generic_walker().
      
      Confession time: Until very recently, development of this module has
      been done on 'net-next' tree to 'clean compile' level with run-time
      testing on backports to 4.14 & 4.19 kernels under openwrt.  During the
      back & forward porting during development & testing, the critical
      ACT_P_CREATED return code got missed despite being in the 4.14 & 4.19
      backports.  I have now gone through the init functions, using act_csum
      as reference with a fine toothed comb.  Bonus, no more OOPSes.  I
      managed to also miss this issue till now due to the new strict
      nla_parse_nested function failing validation before action creation.
      
      As an inexperienced developer I've learned that
      copy/pasting/backporting/forward porting code correctly is hard.  If I
      ever get to a developer conference I shall don the cone of shame.
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost_net: disable zerocopy by default · 098eadce
      Jason Wang authored
      Vhost_net is known to suffer from HOL[1] issues which are not easy to
      fix. Several downstreams disable the feature by default. What's more,
      the datapath was split, and the datacopy path recently gained batching
      and XDP support, which makes it faster than the zerocopy path for
      small packet transmission.
      
      It looks to me that disabling zerocopy by default is more
      appropriate. It could be enabled by default again in the future if we
      fix the above issues.
      
      [1] https://patchwork.kernel.org/patch/3787671/
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: move tcp_fastopen server side code to SipHash library · c681edae
      Ard Biesheuvel authored
      Using a bare block cipher in non-crypto code is almost always a bad idea,
      not only for security reasons (and we've seen some examples of this in
      the kernel in the past), but also for performance reasons.
      
      In the TCP fastopen case, we call into the bare AES block cipher one or
      two times (depending on whether the connection is IPv4 or IPv6). On most
      systems, this results in a call chain such as
      
        crypto_cipher_encrypt_one(ctx, dst, src)
          crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
            aesni_encrypt
              kernel_fpu_begin();
              aesni_enc(ctx, dst, src); // asm routine
              kernel_fpu_end();
      
      It is highly unlikely that the use of special AES instructions has a
      benefit in this case, especially since we are doing the above twice
      for IPv6 connections, instead of using a transform which can process
      the entire input in one go.
      
      We could switch to the cbcmac(aes) shash, which would at least get
      rid of the duplicated overhead in *some* cases (i.e., today, only
      arm64 has an accelerated implementation of cbcmac(aes), while x86 will
      end up using the generic cbcmac template wrapping the AES-NI cipher,
      which basically ends up doing exactly the above). However, in the given
      context, it makes more sense to use a light-weight MAC algorithm that
      is more suitable for the purpose at hand, such as SipHash.
      
      Since the output size of SipHash already matches our chosen value for
      TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
      sizes, this greatly simplifies the code as well.
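
To make the "arbitrary input sizes, 64-bit output" point concrete, here is a self-contained userspace sketch of SipHash-2-4, written from the published algorithm description rather than copied from the kernel's siphash library. The make_cookie helper and its input layout (source address followed by destination address) are assumptions for illustration only and are not claimed to match what the kernel hashes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t rotl64(uint64_t x, int b)
{
	return (x << b) | (x >> (64 - b));
}

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* One SipRound over the four state words. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
	v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* SipHash-2-4: 128-bit key, input of any length, 64-bit result. */
static uint64_t siphash24(const uint8_t *in, size_t len, const uint8_t key[16])
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
	};
	uint64_t b = (uint64_t)len << 56;	/* last word carries len mod 256 */
	size_t full = len & ~(size_t)7;
	size_t i;

	for (i = 0; i < full; i += 8) {
		uint64_t m = load_le64(in + i);

		v[3] ^= m;
		sipround(v);
		sipround(v);
		v[0] ^= m;
	}
	for (i = 0; i < (len & 7); i++)
		b |= (uint64_t)in[full + i] << (8 * i);

	v[3] ^= b;
	sipround(v);
	sipround(v);
	v[0] ^= b;

	v[2] ^= 0xff;
	for (i = 0; i < 4; i++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}

/*
 * Hypothetical cookie helper: hash source and destination address in a
 * single call. addrlen is 4 for IPv4 or 16 for IPv6; returns 0 for an
 * unsupported length (a sketch-level error convention only).
 */
static uint64_t make_cookie(const uint8_t *saddr, const uint8_t *daddr,
			    size_t addrlen, const uint8_t key[16])
{
	uint8_t buf[32];

	if (addrlen > 16)
		return 0;
	memcpy(buf, saddr, addrlen);
	memcpy(buf + addrlen, daddr, addrlen);
	return siphash24(buf, 2 * addrlen, key);
}
```

In this sketch an IPv6 address pair (32 bytes) is processed by one call, where the AES-based approach described above needed two block-cipher invocations.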
      
      NOTE: Server farms backing a single server IP for load balancing purposes
            and sharing a single fastopen key will be adversely affected by
            this change unless all systems in the pool receive their kernel
            upgrades at the same time.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: include retrans failure detection for unicast · 6a6b5c8b
      Tuong Lien authored
      In the patch series containing commit 9195948f ("tipc: improve TIPC
      throughput by Gap ACK blocks"), the detection of repeated retransmit
      failures in the function "tipc_link_retrans()" was, for simplicity,
      kept for broadcast retransmissions only.
      
      This commit reapplies that detection to link unicast retransmissions,
      which are done via the function "tipc_link_advance_transmq()".
      
      Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()"
      as it is used only for broadcast.
      Acked-by: Jon Maloy <jon.maloy@ericsson.se>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: add ethtool get_link_ksettings · 9ed68ca0
      Hangbin Liu authored
      Like bond, add ethtool get_link_ksettings to show the total speed.
      
      v2: no update, just repost.
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Jun, 2019 12 commits
  3. 15 Jun, 2019 10 commits