1. 25 Sep, 2018 3 commits
  2. 24 Sep, 2018 13 commits
  3. 23 Sep, 2018 5 commits
  4. 22 Sep, 2018 19 commits
    • Merge branch 'net-dsa-b53-SGMII-modes-fixes' · bd4d08da
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: b53: SGMII modes fixes
      
      Here are two additional fixes that are required in order for SGMII to
      work correctly. This was discovered when using a copper SFP, which would
      make us use SGMII mode, yet we would actually leave the HW configured in
      its default mode: Fiber.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Also include SGMII for mac_config and mac_link_state · 55a4d2ea
      Florian Fainelli authored
      In both 802.3z and SGMII modes we need to configure the MAC accordingly
      to flip between Fiber and SGMII modes, and we need to read the MAC
      status from the SGMII in-band control word.
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Fix B53_SERDES_DIGITAL_CONTROL offset · 2cae8c07
      Florian Fainelli authored
      The maths went wrong: to get 0x20, we need to do 0x1e + (x) * 2, not
      0x18 + (x) * 2. Fix that offset so we access the correct registers. With
      the wrong offset we would not access the correct SerDes Digital control
      words, although status would be fine, so we would not correctly flip
      between Fiber and SGMII modes, resulting in incorrect status words being
      pulled into the SerDes digital status register.
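
The arithmetic behind this fix can be checked in isolation. The sketch below is a minimal illustration, assuming only what the message states: the stride is 2, the old base was 0x18, the corrected base is 0x1e, and x = 1 is meant to land on 0x20. The `_OLD`/`_NEW` macro names and the helper function are hypothetical and are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. The commit message supplies
 * the two base offsets (0x18 before the fix, 0x1e after) and the stride. */
#define SERDES_DIGITAL_CONTROL_OLD(x)  (0x18 + (x) * 2)
#define SERDES_DIGITAL_CONTROL_NEW(x)  (0x1e + (x) * 2)

/* Return the register offset of control word x using the corrected base. */
static inline unsigned int serdes_digital_control(unsigned int x)
{
	unsigned int reg = SERDES_DIGITAL_CONTROL_NEW(x);

	/* The message's own example: x = 1 must land on 0x20. */
	assert(x != 1 || reg == 0x20);
	return reg;
}
```

With the old base, x = 1 evaluates to 0x1a, six bytes short of the intended register, which matches the description of the wrong control words being accessed.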
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Don't assign autonegotiation enabled · e24cf6b3
      Florian Fainelli authored
      PHYLINK takes care of filling the right information into
      state->an_enabled, so get rid of the read from the SerDes's BMCR register.
      
      Fixes: 0e01491d ("net: dsa: b53: Add SerDes support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • decnet: Remove unnecessary check for dev->name · 5b9b0a80
      Nathan Chancellor authored
      Clang warns that the address of an array will always evaluate to true
      in a boolean context.
      
      net/decnet/dn_dev.c:1366:10: warning: address of array 'dev->name' will
      always evaluate to 'true' [-Wpointer-bool-conversion]
                                      dev->name ? dev->name : "???",
                                      ~~~~~^~~~ ~
      1 warning generated.
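
The warning can be reproduced outside the kernel. The following is a minimal sketch, not the decnet code: `struct fake_dev` is a hypothetical stand-in for a structure whose `name` member is an array rather than a pointer. The commit itself simply removes the check; the content test shown here is an assumption about what one might write if an empty name really did need a placeholder.

```c
#include <assert.h>
#include <string.h>

#define IFNAMSIZ 16  /* same value as the kernel's IFNAMSIZ */

/* Hypothetical stand-in: name is an array embedded in the struct,
 * so the expression dev->name can never be a null pointer. */
struct fake_dev {
	char name[IFNAMSIZ];
};

/* Writing "dev->name ? dev->name : ..." here would trigger
 * -Wpointer-bool-conversion, because the condition is always true.
 * Testing the first character is the check that can actually fail. */
static const char *dev_name_or_placeholder(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return dev->name[0] != '\0' ? dev->name : "???";
}
```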
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/116
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: add ipv6 tests to ip_defrag selftest · bccc1711
      Peter Oskolkov authored
      This patch adds ipv6 defragmentation tests to ip_defrag selftest,
      to complement existing ipv4 tests.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net · 83619623
      Peter Oskolkov authored
      Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
      hard-limited to those of the root/init ns.
      
      There are at least two use cases when it would be desirable to
      set the high_thresh values higher in a child namespace vs the global hard
      limit:
      
      - a security/ddos protection policy may lower the thresholds in the
        root/init ns but allow for a special exception in a child namespace
      - testing: a test running in a namespace may want to set these
        thresholds higher in its namespace than what is in the root/init ns
      
      The new behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 9000000
      
      The old behavior:
      
       # ip netns add testns
       # ip netns exec testns bash
      
       # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
       net.ipv4.ipfrag_high_thresh = 9000000
      
       # sysctl net.ipv4.ipfrag_high_thresh
       net.ipv4.ipfrag_high_thresh = 4194304
      
       # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
       net.ipv6.ip6frag_high_thresh = 9000000
      
       # sysctl net.ipv6.ip6frag_high_thresh
       net.ipv6.ip6frag_high_thresh = 4194304
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: discard IP frag queue on more errors · 2475f59c
      Peter Oskolkov authored
      This is similar to how ipv4 now behaves:
      commit 0ff89efb ("ip: fail fast on IP defrag errors").
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: avoid compile error in fib_info_nh_uses_dev · 075e264f
      Eric Dumazet authored
      net/ipv4/fib_frontend.c: In function 'fib_info_nh_uses_dev':
      net/ipv4/fib_frontend.c:322:6: error: unused variable 'ret' [-Werror=unused-variable]
      cc1: all warnings being treated as errors
      
      Fixes: 78f2756c ("net/ipv4: Move device validation to helper")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-switch-to-Early-Departure-Time-model' · a88e24f2
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: switch to Early Departure Time model
      
      In the early days, pacing was implemented in sch_fq (FQ)
      in a generic way:

      - SO_MAX_PACING_RATE could be used by any socket.
      
      - TCP would vary effective pacing rate based on CWND*MSS/SRTT
      
      - FQ would ensure delays between packets based on current
        sk->sk_pacing_rate, but with some quantum based artifacts.
        (inflating RPC tail latencies)
      
      - BBR then tweaked the pacing rate in its various phases
        (PROBE, DRAIN, ...)
      
      This worked reasonably well, but had the side effect that TCP RTT
      samples would be inflated by the sojourn time of the packets in FQ.
      
      Also note that when FQ is not used and TCP wants pacing, the
      internal pacing fallback has very different behavior, since TCP
      emits packets at the time they should be sent (with unreasonable
      assumptions about scheduling costs)
      
      Van Jacobson gave a talk at Netdev 0x12 in Montreal about letting
      TCP (or applications, for UDP messages) decide the Earliest
      Departure Time, instead of letting packet schedulers derive it
      from the pacing rate.
      
      https://www.netdevconf.org/0x12/session.html?evolving-from-afap-teaching-nics-about-time
      https://www.files.netdevconf.org/d/46def75c2ef345809bbe/files/?p=/Evolving%20from%20AFAP%20%E2%80%93%20Teaching%20NICs%20about%20time.pdf
      
      Recent additions in linux provided SO_TXTIME and a new ETF qdisc
      supporting the new skb->tstamp role
      
      This patch series converts TCP and FQ to the same model.
      
      This might in the future allow us to relax tight TSQ limits
      (if FQ is present in the output path), and thus lower
      number of callbacks to tcp_write_xmit(), thanks to batching.
      
      This will be followed by an FQ change allowing SO_TXTIME support,
      so that QUIC servers can let the pacing be done in FQ (or
      offloaded if the network device permits).
      
      For example, a TCP flow rated at 24Mbps now shows a more meaningful RTT
      
      Before :
      
      ESTAB  0  211408 10.246.7.151:41558   10.246.7.152:33723
      	 cubic wscale:8,8 rto:203 rtt:2.195/0.084 mss:1448 rcvmss:536
        advmss:1448 cwnd:20 ssthresh:20 bytes_acked:36897937
        segs_out:25488 segs_in:12454 data_segs_out:25486
        send 105.5Mbps lastsnd:1 lastrcv:12851 lastack:1
        pacing_rate 24.0Mbps/24.0Mbps delivery_rate 22.9Mbps
        busy:12851ms unacked:4 rcv_space:29200 notsent:205616 minrtt:0.026
      
      After :
      
      ESTAB  0  192584 10.246.7.151:61612   10.246.7.152:34375
      	 cubic wscale:8,8 rto:201 rtt:0.165/0.129 mss:1448 rcvmss:536
        advmss:1448 cwnd:20 ssthresh:20 bytes_acked:170755401
        segs_out:117931 segs_in:57651 data_segs_out:117929
        send 1404.1Mbps lastsnd:1 lastrcv:56915 lastack:1
        pacing_rate 24.0Mbps/24.0Mbps delivery_rate 24.2Mbps
        busy:56915ms unacked:4 rcv_space:29200 notsent:186792 minrtt:0.054
      
      A nice side effect of this patch series is a reduction of max/p99
      latencies of RPC workloads, since the FQ quantum no longer adds
      artifacts.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_fq: remove dead code dealing with retransmits · 90caf67b
      Eric Dumazet authored
      With the earliest departure time model, we no longer plan
      to special-case TCP retransmits. We therefore remove dead
      code (since most compilers understood that skb_is_retransmit()
      was false).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch tcp_internal_pacing() to tcp_wstamp_ns · c092dd5f
      Eric Dumazet authored
      Now that TCP keeps track of tcp_wstamp_ns, recording the earliest
      departure time of the next packet, we can remove duplicate code
      from tcp_internal_pacing().
      
      This removes one ktime_get_tai_ns() call, and a divide.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch tcp and sch_fq to new earliest departure time model · ab408b6d
      Eric Dumazet authored
      TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq
      no longer has to do it.
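
The idea of the sender tracking its own departure time can be sketched as follows. This is an illustration of the general earliest-departure-time technique, not the kernel's implementation: `struct edt_flow`, `edt_stamp_and_advance()` and the byte-per-second rate parameter are all hypothetical names chosen for the example, and only the notion of a per-flow `wstamp_ns` value is borrowed from the message.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-flow state: earliest departure time of the next packet. */
struct edt_flow {
	uint64_t wstamp_ns;
};

/* Stamp a packet of len bytes and advance the flow's departure time.
 * Returns the timestamp the packet would carry (its earliest departure). */
static uint64_t edt_stamp_and_advance(struct edt_flow *f, uint64_t now_ns,
				      uint32_t len, uint64_t rate_bytes_per_sec)
{
	uint64_t departure;

	assert(rate_bytes_per_sec > 0);
	/* Never schedule in the past: an idle flow restarts from "now". */
	if (f->wstamp_ns < now_ns)
		f->wstamp_ns = now_ns;
	departure = f->wstamp_ns;
	/* Advance by the time this packet occupies at the pacing rate. */
	f->wstamp_ns += (uint64_t)len * NSEC_PER_SEC / rate_bytes_per_sec;
	return departure;
}
```

At 24Mbps (3,000,000 bytes per second), a 1500-byte packet advances the departure time by 500,000 ns, so a scheduler that honours the stamp only has to hold each packet until its time arrives and needs no per-flow rate bookkeeping of its own.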
      
      Thanks to this model, TCP can get more accurate RTT samples,
      since pacing no longer inflates them.
      
      This has the nice effect of removing some delays caused by the FQ
      quantum mechanism, which caused inflated max/P99 latencies.

      Also, we might relax the tight TCP Small Queue limits in the future,
      since this new model allows TCP to build bigger batches: sch_fq
      (or a device with earliest departure time offload) ensures
      these packets will be delivered on time.
      
      Note that other protocols are not converted (they will probably
      never be), so sch_fq still supports SO_MAX_PACING_RATE.
      
      Tested:
      
      Test showing FQ pacing quantum artifact for low-rate flows,
      adding unexpected throttles for RPC flows, inflating max and P99 latencies.
      
      The parameters chosen here show what typically happens when
      a TCP flow has a reduced pacing rate (this can be caused by a reduced
      cwnd after a few losses, and/or an RTT above a few ms).
      
      MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
      Before :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
       Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      19,82.78,5279,3825,482.02
      
      After :
      $ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
      Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
      20,49.94,128,63,3.18
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch internal pacing timer to CLOCK_TAI · fd2bca2a
      Eric Dumazet authored
      The next patch will use tcp_wstamp_ns to feed the internal
      TCP pacing timer, so switch to CLOCK_TAI to share the same base.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: provide earliest departure time in skb->tstamp · d3edd06e
      Eric Dumazet authored
      Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
      from usec units to nsec units.
      
      Do not clear skb->tstamp before entering IP stacks in TX,
      so that qdisc or devices can implement pacing based on the
      earliest departure time instead of socket sk->sk_pacing_rate
      
      Packets are fed with tcp_wstamp_ns, and a following patch
      will update tcp_wstamp_ns when both TCP and sch_fq switch to
      the earliest departure time mechanism.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add tcp_wstamp_ns socket field · 9799ccb0
      Eric Dumazet authored
      TCP will soon provide earliest departure time on TX skbs.
      It needs to track this in a new variable.
      
      tcp_mstamp_refresh() needs to update this variable, and
      has become too big to stay inline.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_fq: switch to CLOCK_TAI · 142537e4
      Eric Dumazet authored
      TCP will soon provide a per-skb skb->tstamp carrying the earliest
      departure time, so that sch_fq does not have to determine the departure
      time by looking at the socket sk_pacing_rate.
      
      In linux-4.19 we chose CLOCK_TAI as the clock base for transports,
      qdiscs, and NIC offloads.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: introduce tcp_skb_timestamp_us() helper · 2fd66ffb
      Eric Dumazet authored
      There are a few places where TCP reads skb->skb_mstamp expecting
      a value in usec units.
      
      skb->tstamp (aka skb->skb_mstamp) will soon store CLOCK_TAI nsec value.
      
      Add tcp_skb_timestamp_us() to provide proper conversion when needed.
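
The conversion such a helper performs is a plain unit change from nanoseconds to microseconds. The sketch below is a user-space illustration under that assumption, not the kernel helper itself: `struct fake_skb` and `skb_timestamp_us()` are hypothetical names, and integer division truncates any sub-microsecond remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Hypothetical stand-in for a buffer whose timestamp is kept in nsec. */
struct fake_skb {
	uint64_t mstamp_ns;
};

/* Return the stored timestamp in usec for callers that still expect
 * microsecond units; the fractional part is dropped, not rounded. */
static uint64_t skb_timestamp_us(const struct fake_skb *skb)
{
	assert(skb != NULL);
	return skb->mstamp_ns / NSEC_PER_USEC;
}
```

Keeping the conversion in one helper means the few microsecond-based readers do not each need to know which unit the field currently stores.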
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch tcp_clock_ns() to CLOCK_TAI base · 72b0094f
      Eric Dumazet authored
      TCP pacing is either implemented in sch_fq or internally.
      We have the goal of being able to offload pacing on the NICs.

      TCP will soon provide a per-skb skb->tstamp as the earliest departure time.

      Like ETF in commit 25db26a9 ("net/sched: Introduce the ETF Qdisc")
      we chose CLOCK_TAI as the clock base, so that TCP and pacers can share
      a common clock, to get better RTT samples (without pacing artificially
      inflating these samples).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>