1. 25 Jan, 2019 19 commits
    • Merge branch 'tcp_bbr-Improving-TCP-BBR-performance-for-WiFi-and-cellular-networks' · 58e0b4ab
      David S. Miller authored
      Priyaranjan Jha says:
      
      ====================
      tcp_bbr: Improving TCP BBR performance for WiFi and cellular networks
      
      Ack aggregation is quite prevalent with wifi, cellular and cable modem
      link technologies, ACK decimation in middleboxes, and common offloading
      techniques such as TSO and GRO, at end hosts. Previously, BBR was often
      cwnd-limited in the presence of severe ACK aggregation, which resulted in
      low throughput due to insufficient data in flight.
      
      To achieve good throughput for wifi and other paths with aggregation, this
      patch series implements an ACK aggregation estimator for BBR, which
      estimates the maximum recent degree of ACK aggregation and adapts cwnd
      based on it. The algorithm is further described by the following
      presentation:
      https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00
      
      (1) A preparatory patch, which refactors bbr_target_cwnd for generic
          inflight provisioning.
      
      (2) Implements BBR ack aggregation estimator and adapts cwnd based
          on measured degree of ACK aggregation.
      ====================
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_bbr: adapt cwnd based on ack aggregation estimation · 78dc70eb
      Priyaranjan Jha authored
      Aggregation effects are extremely common with wifi, cellular, and cable
      modem link technologies, ACK decimation in middleboxes, and LRO and GRO
      in receiving hosts. The aggregation can happen in either direction,
      data or ACKs, but in either case the aggregation effect is visible
      to the sender in the ACK stream.
      
      Previously BBR's sending was often limited by cwnd under severe ACK
      aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
      were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
      5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
      the bottleneck idle for potentially long periods. Note that loss-based
      congestion control does not have this issue because when facing
      aggregation it continues increasing cwnd after bursts of ACKs, growing
      cwnd until the buffer is full.
      
      To achieve good throughput in the presence of aggregation effects, this
      algorithm allows the BBR sender to put extra data in flight to keep the
      bottleneck utilized during silences in the ACK stream that it has evidence
      to suggest were caused by aggregation.
      
      A summary of the algorithm: when a burst of packets is acked by a
      stretched ACK or a burst of ACKs or both, BBR first estimates the expected
      amount of data that should have been acked, based on its estimated
      bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
      filter to estimate the recent level of observed ACK aggregation. Then cwnd
      is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
      being cwnd-limited in the face of ACK silences that recent history suggests
      were caused by aggregation. As a sanity check, the ACK aggregation degree
      is upper-bounded by the cwnd (at the time of measurement) and a global max
      of BW * 100ms. The algorithm is further described by the following
      presentation:
      https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00
      
      In our internal testing, we observed a significant increase in BBR
      throughput (measured using netperf), in a basic wifi setup.
      - Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
      - 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
      - 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps
      
      Also, this code is running globally on YouTube TCP connections and produced
      significant bandwidth increases for YouTube traffic.
      
      This is based on Ian Swett's max_ack_height_ algorithm from the
      QUIC BBR implementation.
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
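
The estimator summarized in the commit above can be sketched in userspace roughly as follows. This is a minimal model, assuming packet-count units and a two-slot windowed max; the names agg_est, agg_on_ack, agg_rotate and agg_cwnd are hypothetical and are not the kernel's identifiers, and the policy for when to rotate the window is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; names are illustrative, not the kernel's. */
struct agg_est {
    uint64_t epoch_start_us;  /* start of the current ACK sampling epoch */
    uint64_t epoch_acked;     /* packets ACKed since the epoch started */
    uint32_t extra_acked[2];  /* two-slot windowed max of the surplus */
    int slot;                 /* slot currently being filled */
};

/* Recent maximum of the observed surplus ("extra_acked"). */
static uint32_t agg_max(const struct agg_est *e)
{
    return e->extra_acked[0] > e->extra_acked[1] ? e->extra_acked[0]
                                                 : e->extra_acked[1];
}

/* Age the window: start filling the other slot from zero. */
static void agg_rotate(struct agg_est *e)
{
    e->slot ^= 1;
    e->extra_acked[e->slot] = 0;
}

/* Process one ACK event; bw_pps is the estimated bandwidth in packets/s. */
static void agg_on_ack(struct agg_est *e, uint64_t now_us, uint32_t acked,
                       uint64_t bw_pps, uint32_t cwnd)
{
    assert(now_us >= e->epoch_start_us);
    uint64_t expected = bw_pps * (now_us - e->epoch_start_us) / 1000000u;

    /* ACKs are arriving no faster than the estimated bw: new epoch. */
    if (e->epoch_acked <= expected) {
        e->epoch_acked = 0;
        e->epoch_start_us = now_us;
        expected = 0;
    }
    e->epoch_acked += acked;

    /* Surplus over the expected amount, bounded by cwnd at measurement. */
    uint64_t extra = e->epoch_acked - expected;
    if (extra > cwnd)
        extra = cwnd;
    if (extra > e->extra_acked[e->slot])
        e->extra_acked[e->slot] = (uint32_t)extra;
}

/* cwnd plus the aggregation estimate, capped globally at bw * 100 ms. */
static uint32_t agg_cwnd(const struct agg_est *e, uint32_t base_cwnd,
                         uint64_t bw_pps)
{
    uint64_t cap = bw_pps / 10;
    uint64_t extra = agg_max(e);
    if (extra > cap)
        extra = cap;
    return base_cwnd + (uint32_t)extra;
}
```

With an estimated bandwidth of 1000 packets/s, a 50-packet burst arriving 5 ms into an epoch that has already seen 10 packets yields a surplus of 55 packets, so a base cwnd of 20 would grow to 75 in this model.
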
    • tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning · 232aa8ec
      Priyaranjan Jha authored
      Because bbr_target_cwnd() is really a general-purpose BBR helper for
      computing some volume of inflight data as a function of the estimated
      BDP, refactor it into the following helper functions:
      - bbr_bdp()
      - bbr_quantization_budget()
      - bbr_inflight()
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
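
One way to picture the three-helper split named in the commit above is the following userspace sketch. The sketch_ function names, the fixed-point gain scale, and the padding policy (three offload bursts, then rounding up to an even packet count) are assumptions made for illustration; the commit message itself only names the helpers.

```c
#include <assert.h>
#include <stdint.h>

#define GAIN_UNIT 256u  /* assumed fixed-point scale: a gain of 1.0 is 256 */

/* Estimated BDP in packets, scaled by gain and rounded up. */
static uint32_t sketch_bdp(uint64_t bw_pps, uint32_t min_rtt_us, uint32_t gain)
{
    assert(gain > 0);
    uint64_t denom = (uint64_t)GAIN_UNIT * 1000000u;
    uint64_t num = bw_pps * min_rtt_us * gain;
    return (uint32_t)((num + denom - 1) / denom);
}

/* Pad a window for offload batching, then round up to an even count.
 * Both steps are an assumed policy, not taken from the commit message. */
static uint32_t sketch_quantization_budget(uint32_t cwnd, uint32_t tso_segs)
{
    cwnd += 3 * tso_segs;
    return (cwnd + 1) & ~1u;
}

/* General-purpose inflight target: scaled BDP plus quantization padding. */
static uint32_t sketch_inflight(uint64_t bw_pps, uint32_t min_rtt_us,
                                uint32_t gain, uint32_t tso_segs)
{
    return sketch_quantization_budget(sketch_bdp(bw_pps, min_rtt_us, gain),
                                      tso_segs);
}
```

Splitting the computation this way lets a caller ask for a plain BDP, a padded window, or both, which is the kind of reuse the refactoring aims at.
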
    • r8169: factor out PHY init sequence adjusting 10M and ALDPS · a1ead2ec
      Heiner Kallweit authored
      A few chip versions use the same sequence to adjust 10M and ALDPS, so
      let's factor it out. This patch also fixes a (most likely) typo in
      rtl8168g_1_hw_phy_config. There bit 8 in reg 0x14 on page 0x0bcc
      was set and not cleared. According to the vendor driver this bit
      needs to be cleared in all cases.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: factor out disabling ALDPS · c46863ab
      Heiner Kallweit authored
      Chip versions from RTL8168g onward use the same sequence to disable
      ALDPS (Advanced Link-Down Power Saving). So let's factor this out.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: count master 3ad stats separately · 949e7cea
      Nikolay Aleksandrov authored
      I made a dumb mistake when I summed up the slave stats: slaves can
      obviously come and go, which would make the master stats unreliable.
      Count and export the master stats separately.
      
      Fixes: a258aeac ("bonding: add support for xstats and export 3ad stats")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
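
The problem described in the bonding commit above can be illustrated with a small userspace model. The demo_ types and functions are hypothetical and use a single counter; they only show why a master total derived by summing slaves shrinks when a slave leaves, while a separately maintained master counter does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_SLAVES 4

struct demo_stats { uint64_t lacpdu_rx; };

/* Hypothetical bond: per-slave counters plus a separate master counter. */
struct demo_bond {
    struct demo_stats master;
    struct demo_stats slave[DEMO_MAX_SLAVES];
};

/* Count a received LACPDU on both the slave and the master. */
static void demo_rx_lacpdu(struct demo_bond *b, int i)
{
    assert(i >= 0 && i < DEMO_MAX_SLAVES);
    b->slave[i].lacpdu_rx++;
    b->master.lacpdu_rx++;
}

/* Releasing a slave discards its counters. */
static void demo_release_slave(struct demo_bond *b, int i)
{
    assert(i >= 0 && i < DEMO_MAX_SLAVES);
    memset(&b->slave[i], 0, sizeof(b->slave[i]));
}

/* The unreliable approach: derive the master total from current slaves. */
static uint64_t demo_sum_slaves(const struct demo_bond *b)
{
    uint64_t sum = 0;
    for (int i = 0; i < DEMO_MAX_SLAVES; i++)
        sum += b->slave[i].lacpdu_rx;
    return sum;
}
```

After three frames on slave 0 and two on slave 1, releasing slave 0 drops the summed value to 2, while the separately counted master value stays at 5.
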
    • Merge branch 'net-phy-improve-starting-PHY' · 2ab64da6
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: improve starting PHY
      
      This patch series improves a few aspects of starting the PHY.
      
      v2:
      - improve a warning in patch 4
      v3:
      - extend commit message for patch 2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: change phy_start_interrupts to phy_request_interrupt · 434a4315
      Heiner Kallweit authored
      Now that we enable the interrupts in phy_start() we don't have to do it
      before. Therefore remove enabling interrupts from phy_start_interrupts()
      and rename this function to reflect the changed functionality.
      
      v2:
      - improve warning to clearly state that we fall back to polling
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: start interrupts in phy_start · 9e573cfc
      Heiner Kallweit authored
      Interrupts don't have to be enabled before calling phy_start().
      Therefore let's enable them in phy_start(). In a subsequent step
      we'll remove enabling interrupts from phy_connect_direct().
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: warn if phy_start is called from invalid state · 21796261
      Heiner Kallweit authored
      phy_start() should be called from states PHY_READY or PHY_HALTED only.
      Check for this to detect misbehaving drivers. Also the state machine
      should be started only when being called from one of the valid states.
      
      Some more background:
      For all invalid states phy_start() basically was a no-op. All it did
      was trigger a state machine run, but for all "running" states the
      poll loop was active anyway. And if called from PHY_DOWN, the state
      machine does nothing.
      
      v3:
      - extended commit message
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
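
The state check described in the commit above might look like the following in a userspace model. The demo_ enum, struct and function are hypothetical stand-ins, the transition to an "up" state on success is an assumption, and the warning is printed to stderr rather than using any kernel facility.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of PHY states, for illustration only. */
enum demo_phy_state {
    DEMO_PHY_DOWN,
    DEMO_PHY_READY,
    DEMO_PHY_HALTED,
    DEMO_PHY_UP,
    DEMO_PHY_RUNNING
};

struct demo_phy {
    enum demo_phy_state state;
    int machine_started;  /* set once the state machine has been started */
};

/* Start the PHY only from READY or HALTED; otherwise warn and do nothing. */
static int demo_phy_start(struct demo_phy *phy)
{
    assert(phy != NULL);
    if (phy->state != DEMO_PHY_READY && phy->state != DEMO_PHY_HALTED) {
        fprintf(stderr, "demo_phy_start called from invalid state %d\n",
                (int)phy->state);
        return -1;
    }
    phy->state = DEMO_PHY_UP;   /* assumed next state */
    phy->machine_started = 1;   /* state machine starts only on this path */
    return 0;
}
```

A second call on an already started PHY, or a call while the PHY is still down, is reported and leaves the state untouched, which is how a misbehaving driver would be detected in this model.
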
    • net: phy: start state machine in phy_start only · a016becd
      Heiner Kallweit authored
      The state machine is a no-op before phy_start() has been called.
      Therefore let's enable it in phy_start() only. In phy_start()
      let's call phy_start_machine() instead of phy_trigger_machine().
      phy_start_machine() is an alias for phy_trigger_machine(), but it makes
      it clearer that we start the state machine here instead of just
      triggering a run.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix return value check in qcom_ethqos_probe() · 8f4ebaaa
      Wei Yongjun authored
      In case of error, the function devm_clk_get() returns ERR_PTR() and
      never returns NULL. The NULL test in the return value check should be
      replaced with IS_ERR().
      
      Fixes: a7c30e62 ("net: stmmac: Add driver for Qualcomm ethqos")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Acked-by: Vinod Koul <vkoul@kernel.org>
      Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
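
Why a NULL test misses this kind of failure can be shown with a userspace imitation of the error-pointer convention. The lowercase err_ptr(), ptr_err() and is_err() helpers and the demo_clk_get() getter are hypothetical stand-ins written for this sketch; they mimic the idea of encoding a negative errno in a pointer value and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (userspace imitation). */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when the pointer value falls in the top MAX_ERRNO addresses. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct clk { int id; };
static struct clk demo_clk = { 1 };

/* Hypothetical getter: reports failure via an error pointer, never NULL. */
static struct clk *demo_clk_get(int fail)
{
    return fail ? err_ptr(-ENOENT) : &demo_clk;
}

/* Buggy check: an error pointer is non-NULL, so the failure is missed. */
static int probe_buggy(int fail)
{
    struct clk *c = demo_clk_get(fail);
    if (!c)
        return -ENODEV;
    return 0;
}

/* Fixed check: test with is_err() and propagate the encoded errno. */
static int probe_fixed(int fail)
{
    struct clk *c = demo_clk_get(fail);
    if (is_err(c))
        return (int)ptr_err(c);
    return 0;
}
```

In this model the buggy probe reports success even when the getter fails, whereas the fixed probe returns the encoded error code.
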
    • net: amd8111e: clean up two minor indentation issues · 843ef94e
      Colin Ian King authored
      Two statements are incorrectly indented; fix these by removing a space.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ENETC' · 556b2710
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      Introduce ENETC ethernet drivers
      
      ENETC is a multi-port virtualized Ethernet controller supporting GbE
      designs and Time-Sensitive Networking (TSN) functionality.
      ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
      Endpoint (RCIE).  As such, it contains multiple physical (PF) and virtual
      (VF) PCIe functions, discoverable by standard PCI Express.
      
      The patch series adds basic enablement for these otherwise standard
      buffer descriptor (BD) ring based ethernet devices (PCIe PFs and VFs),
      currently included in the 64-bit dual ARMv8 processors LS1028A SoC.
      The driver is portable to 32-bit designs, and it's independent of CPU
      endianness.
      
      Contributors:
      Alex Marginean <alexandru.marginean@nxp.com>
      Catalin Horghidan <catalin.horghidan@nxp.com>
      
      TODO list:
      * IEEE 1588 PTP support;
      * TSN support;
      * MDIO support and VF link management;
      * power management support;
      * flow control support;
      * TC offloading with h/w MQPRIO;
      * interrupt coalescing, configurable BD ring sizes, and other usual
      config options if missing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Add RFS and RSS support · d382563f
      Claudiu Manoil authored
      A ternary match table is used for RFS. If multiple entries in the table
      match, the entry with the lowest numerical index is chosen as the
      matching entry.  Entries in the table are identified using an index
      which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the
      PSI (PF).
      Portions of the RFS table can be assigned to each SI by the PSI (PF)
      driver in PSIaRFSCFGR.  Assignments are cumulative: the entries assigned
      to SIn start after those assigned to SIn-1.  The total assignments to
      all SIs must be equal to or less than the number available to the port
      as found in PRFSCAPR.
      
      For RSS, the Toeplitz hash function used requires two inputs, a 40B
      random secret key that is supplied through the PRSSKR0-9 registers as well
      as the relevant pieces of the packet header (n-tuple).  The 6 LSB bits of
      the hash function result will then be used as a pointer to obtain the tag
      referenced in the 64 entry indirection table.  The result will provide a
      winning group which will be used to help route the received packet.
      Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
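
The two lookups described in the commit above can be sketched generically in software. This is a model under stated assumptions: rfs_entry, rfs_lookup and rss_group are hypothetical names, the ternary key is reduced to 32 bits, and toeplitz_hash is a generic Toeplitz implementation; how the hardware orders header bytes and key registers is not covered by the commit message and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ternary entry: only the bits set in mask take part in the comparison. */
struct rfs_entry { uint32_t value; uint32_t mask; };

/* Return the lowest matching index, or -1 if nothing matches. */
static int rfs_lookup(const struct rfs_entry *tbl, size_t n, uint32_t key)
{
    for (size_t i = 0; i < n; i++)
        if ((key & tbl[i].mask) == (tbl[i].value & tbl[i].mask))
            return (int)i;
    return -1;
}

#define RSS_KEY_LEN   40  /* 40-byte secret key */
#define RSS_TABLE_LEN 64  /* 64-entry indirection table */

/* Generic Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. */
static uint32_t toeplitz_hash(const uint8_t key[RSS_KEY_LEN],
                              const uint8_t *data, size_t len)
{
    assert(len + 4 <= RSS_KEY_LEN);
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1u;
        }
    }
    return hash;
}

/* The 6 least significant hash bits index the indirection table. */
static uint8_t rss_group(uint32_t hash, const uint8_t table[RSS_TABLE_LEN])
{
    return table[hash & (RSS_TABLE_LEN - 1)];
}
```

Because the scan in rfs_lookup stops at the first hit, a broad entry placed at a low index shadows a more specific entry placed after it, which is the practical consequence of the lowest-index rule.
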
    • enetc: Add vf to pf messaging support · beb74ac8
      Claudiu Manoil authored
      VSIs (VFs) may send a message to the PSI (PF) for general notification
      or to gain access to hardware resources which requires host inspection.
      These messages may vary in size and are handled as a partition copy
      between two memory regions owned by the respective participants.
      The PSI will respond with fail or success and a 16-bit message code.
      The patch implements the vf to pf messaging mechanism above and, as the
      first application making use of this support, it enables the VF to
      configure its own primary MAC address.
      Signed-off-by: Catalin Horghidan <catalin.horghidan@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Add ethtool statistics · 16eb4c85
      Claudiu Manoil authored
      This adds most h/w statistics counters: non-privileged SI counters, as
      well as privileged Port and MAC counters available only to the PF.
      Per ring software stats are also included.
      Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Introduce basic PF and VF ENETC ethernet drivers · d4fd0404
      Claudiu Manoil authored
      ENETC is a multi-port virtualized Ethernet controller supporting GbE
      designs and Time-Sensitive Networking (TSN) functionality.
      ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
      Endpoint (RCIE).  As such, it contains multiple physical (PF) and
      virtual (VF) PCIe functions, discoverable by standard PCI Express.
      
      Introduce basic PF and VF ENETC ethernet drivers.  The PF has access to
      the ENETC Port registers and resources and makes the required privileged
      configurations for the underlying VF devices.  Common functionality is
      controlled through so-called System Interface (SI) register blocks; PFs
      and VFs own an SI each.  Though SI register blocks are almost identical,
      there are a few privileged SI level controls that are accessible only to
      PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI).
      As such, the bulk of the code, including datapath processing, basic h/w
      offload support and generic pci related configuration, is shared between
      the 2 drivers and is factored out in common source files (i.e. enetc.c).
      
      Major functionalities included (for both drivers):
      MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs
      to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and
      jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G
      support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and
      initial control ring support, VLAN extraction/insertion, PF Rx VLAN
      CTAG filtering, VF mac address config support, VF VLAN isolation support,
      etc.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: A write memory barrier is sufficient in EQ ci update · 5e5b9f62
      Tariq Toukan authored
      Soften the memory barrier call of mb() by a sufficient wmb() in the
      consumer index update of the event queues.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Jan, 2019 21 commits