1. 17 Jan, 2017 3 commits
    • David S. Miller's avatar
      Merge branch 'mvneta-xmit_more-bql' · b8128c42
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      mvneta xmit_more and bql support
      
      This is a delayed v2 of a short patchset that introduces xmit_more and
      BQL to the mvneta driver. The only change is in the xmit_more support:
      a condition check that prevents excessive descriptor concatenation
      before flushing to HW.
      
      Any comments or feedback would be welcome.
      
      Changelog:
      v1 -> v2:
      
      * Add a check that ensures too many descriptors are not
        concatenated before flushing to HW.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8128c42
    • Marcin Wojtas's avatar
      net: mvneta: add BQL support · a29b6235
      Marcin Wojtas authored
      Tests showed that when the whole bandwidth is consumed, the latency
      for various kinds of traffic can reach high values. With a saturated
      link (e.g. with iperf from target to host), a simple ping could take
      a significant amount of time. BQL proved to improve this situation
      when implemented in the mvneta driver. Measurements of ping latency
      for 3 link speeds:
      Speed (Mbps) | Latency w/o BQL | Latency with BQL
      10           | 7-14 ms         | 3.5 ms
      100          | 2-12 ms         | 0.6 ms
      1000         | often timeout   | up to 2 ms
      
      Decreasing latency as above results in a slight performance cost: 4 kpps
      (-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
      For 1500B packets in the same setup, the mpstat tool showed +8% of
      CPU occupation (default affinity, second CPU idle). Even so, this
      cost seems reasonable to take, considering the other improvements.
      
      This commit adds the byte queue limit (BQL) mechanism to the mvneta driver.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a29b6235
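      The idea behind byte queue limits in the commit above can be
      illustrated with a small userspace model. This is only a sketch, not
      the mvneta code: the struct and function names are hypothetical, and
      the limit is fixed here, whereas the kernel adjusts it dynamically.
      In the kernel, drivers report the two events through
      netdev_tx_sent_queue() and netdev_tx_completed_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one TX queue under byte queue limits. */
struct bql_sketch {
    unsigned int inflight; /* bytes handed to HW and not yet completed */
    unsigned int limit;    /* fixed here; the kernel adapts it at runtime */
    bool stopped;          /* true while the stack must not queue more */
};

/* Called when a packet of `bytes` is handed to the hardware. */
static void bql_sent(struct bql_sketch *q, unsigned int bytes)
{
    q->inflight += bytes;
    if (q->inflight >= q->limit)
        q->stopped = true;
}

/* Called from TX completion once `bytes` have left the hardware. */
static void bql_completed(struct bql_sketch *q, unsigned int bytes)
{
    assert(bytes <= q->inflight);
    q->inflight -= bytes;
    if (q->inflight < q->limit)
        q->stopped = false;
}
```

      Keeping the number of in-flight bytes small is what lets a ping
      packet reach the wire quickly on a saturated link, instead of
      waiting behind a full descriptor ring.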
    • Simon Guinot's avatar
      net: mvneta: add xmit_more support · 2a90f7e1
      Simon Guinot authored
      Based on the xmit_more flag of the skb, TX descriptors can be concatenated
      before flushing. This commit delays the TX descriptor flush if the queue
      is running and there are more skbs to send.

      Due to a limitation of the MVNETA_TXQ_UPDATE_REG(q) registers, at most
      255 descriptors can be flushed at once. A new macro
      (MVNETA_TXQ_DEC_SENT_MASK) was therefore added to ensure that the
      number of concatenated descriptors does not exceed that value.
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a90f7e1
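      The deferral rule from the xmit_more commit can be sketched as a
      userspace model. The names below (txq_sketch, txq_xmit,
      SKETCH_MAX_PENDING) are hypothetical and the driver's actual control
      flow may differ; only the limit of 255 descriptors per flush comes
      from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; 255 is the per-flush limit from the commit message. */
#define SKETCH_MAX_PENDING 255u

struct txq_sketch {
    unsigned int pending; /* descriptors queued but not yet flushed */
    unsigned int flushes; /* number of simulated register writes */
    unsigned int flushed; /* total descriptors handed to HW */
};

/* Simulates one write of the pending count to the TX update register. */
static void txq_flush(struct txq_sketch *q)
{
    if (q->pending == 0)
        return;
    q->flushed += q->pending;
    q->flushes++;
    q->pending = 0;
}

/* Queue the `ndesc` descriptors of one skb and decide whether to flush. */
static void txq_xmit(struct txq_sketch *q, unsigned int ndesc,
                     bool xmit_more, bool queue_stopped)
{
    assert(ndesc > 0 && ndesc <= SKETCH_MAX_PENDING);

    /* Never let the batch grow past what one register write can carry. */
    if (q->pending + ndesc > SKETCH_MAX_PENDING)
        txq_flush(q);
    q->pending += ndesc;

    /* Defer only while the queue is running and more skbs are coming. */
    if (!xmit_more || queue_stopped)
        txq_flush(q);
}
```

      Batching several skbs into one register write saves MMIO accesses,
      while the pre-flush check keeps every write within the 255-descriptor
      limit.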
  2. 16 Jan, 2017 15 commits
  3. 14 Jan, 2017 22 commits
    • David S. Miller's avatar
      Merge tag 'mac80211-next-for-davem-2017-01-13' of... · bb60b8b3
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      For 4.11, we seem to have more than in the past few releases:
       * socket owner support for connections, so when the wifi
         manager (e.g. wpa_supplicant) is killed, connections are
         torn down - wpa_supplicant is critical to managing certain
         operations, and can opt in to this where applicable
       * minstrel & minstrel_ht updates to be more efficient (time and space)
       * set wifi_acked/wifi_acked_valid for skb->destructor use in the
         kernel, which was already available to userspace
       * don't indicate new mesh peers that might be used if there's no
         room to add them
       * multicast-to-unicast support in mac80211, for better medium usage
         (since unicast frames can use *much* higher rates, by ~3 orders of
         magnitude)
       * add API to read channel (frequency) limitations from DT
       * add infrastructure to allow randomizing public action frames for
         MAC address privacy (still requires driver support)
       * many cleanups and small improvements/fixes across the board
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb60b8b3
    • Shyam Saini's avatar
      cxgb4: Remove redundant memset before memcpy · ca4b5eb8
      Shyam Saini authored
      The region set by the call to memset is immediately overwritten by
      the subsequent call to memcpy, which makes the memset redundant.

      Also remove the memset(&info, 0, sizeof(info)) on line 398 because
      info is memcpy()'ed to before being used in the loop and it isn't
      used outside of the loop.
      Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
      Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca4b5eb8
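      The redundancy removed by the cxgb4 commit above can be shown with a
      generic example. The struct and function names are hypothetical and
      unrelated to the cxgb4 code; the point is only that a memset is dead
      work when a memcpy of the same size follows immediately.

```c
#include <string.h>

/* Hypothetical structure standing in for the driver's data. */
struct info_sketch {
    int port;
    int speed;
    char name[8];
};

/* Pattern before the cleanup: the memset result is never observed. */
static void copy_with_memset(struct info_sketch *dst,
                             const struct info_sketch *src)
{
    memset(dst, 0, sizeof(*dst));   /* every byte written here ... */
    memcpy(dst, src, sizeof(*dst)); /* ... is overwritten here */
}

/* Pattern after the cleanup: same result with one pass over memory. */
static void copy_plain(struct info_sketch *dst,
                       const struct info_sketch *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

      The equivalence holds only when the memcpy covers the whole region
      that the memset cleared; a partial copy would still need the zeroing.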
    • Ganesh Goudar's avatar
      cxgb4: Fix misleading packet/frame count stats. · f750e82e
      Ganesh Goudar authored
      Do not count pause frames as part of general TX/RX frame
      counters.
      
      Based on the original work of Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f750e82e
    • David S. Miller's avatar
      Merge branch 'bnxt_en-next' · 4b89aa3c
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Misc. updates for net-next.
      
      Miscellaneous updates including firmware spec update, ethtool -p blinking
      LED support, RDMA SRIOV config callback, and minor fixes.
      
      v2: Dropped the DCBX RoCE app TLV patch until the ETH_P_IBOE RDMA patch
      is merged.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b89aa3c
    • Michael Chan's avatar
      bnxt_en: Add the ulp_sriov_cfg hooks for bnxt_re RDMA driver. · 2f593846
      Michael Chan authored
      Add the ulp_sriov_cfg callbacks when the number of VFs is changing.  This
      allows the RDMA driver to provision RDMA resources for the VFs.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f593846
    • Michael Chan's avatar
      bnxt_en: Add support for ethtool -p. · 5ad2cbee
      Michael Chan authored
      Add LED blinking code to support ethtool -p on the PF.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ad2cbee
    • Michael Chan's avatar
      bnxt_en: Clear TPA flags when BNXT_FLAG_NO_AGG_RINGS is set. · 341138c3
      Michael Chan authored
      Commit bdbd1eb5 ("bnxt_en: Handle no aggregation ring gracefully.")
      introduced the BNXT_FLAG_NO_AGG_RINGS flag.  For consistency,
      bnxt_set_tpa_flags() should also clear TPA flags when there are no
      aggregation rings.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      341138c3
    • Michael Chan's avatar
      bnxt_en: Fix compiler warnings when CONFIG_RFS_ACCEL is not defined. · b7429954
      Michael Chan authored
      CC [M]  drivers/net/ethernet/broadcom/bnxt/bnxt.o
      drivers/net/ethernet/broadcom/bnxt/bnxt.c:4947:21: warning: ‘bnxt_get_max_func_rss_ctxs’ defined but not used [-Wunused-function]
       static unsigned int bnxt_get_max_func_rss_ctxs(struct bnxt *bp)
                           ^
        CC [M]  drivers/net/ethernet/broadcom/bnxt/bnxt.o
      drivers/net/ethernet/broadcom/bnxt/bnxt.c:4956:21: warning: ‘bnxt_get_max_func_vnics’ defined but not used [-Wunused-function]
       static unsigned int bnxt_get_max_func_vnics(struct bnxt *bp)
                           ^
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7429954
    • David S. Miller's avatar
      Merge branch 'tcp-RACK-fast-recovery' · 718e14bb
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      tcp: RACK fast recovery
      
      The patch set enables RACK loss detection (draft-ietf-tcpm-rack-01)
      to trigger fast recovery with a reordering timer.
      
      Previously RACK ran in an auxiliary mode in which it was only
      used to detect packet losses once recovery had been triggered by
      other algorithms (e.g., FACK). By inspecting packet timestamps,
      RACK can start ACK-driven repairs in a timely manner. A few similar
      heuristics are no longer needed and are either removed or disabled to
      reduce the complexity of the Linux TCP loss recovery engine:
      
        1. FACK (Forward Acknowledgement)
        2. Early Retransmit (RFC5827)
        3. thin_dupack (fast recovery on single DUPACK for thin-streams)
        4. NCR (Non-Congestion Robustness, RFC4653)
        5. Forward Retransmit
      
      After this change, Linux's loss recovery algorithms consist of
        1. Conventional DUPACK threshold approach (RFC6675)
        2. RACK and Tail Loss Probe (draft-ietf-tcpm-rack-01)
        3. RTO plus F-RTO extension (RFC5682)
      
      The patch set has been tested on Google servers extensively and
      presented in several IETF meetings. The data suggests that RACK
      successfully improves recovery performance:
      https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-draft-ietf-tcpm-rack-01.pdf
      https://www.ietf.org/proceedings/96/slides/slides-96-tcpm-3.pdf
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      718e14bb
    • Yuchung Cheng's avatar
      tcp: disable fack by default · 94bdc978
      Yuchung Cheng authored
      This patch disables FACK by default as RACK is the successor of FACK
      (inspired by the insights behind FACK).
      
      FACK[1] in Linux works as follows: a packet P is deemed lost,
      if packet Q of higher sequence is s/acked and P and Q are distant
      by at least dupthresh number of packets in sequence space.
      
      FACK is more aggressive than the IETF recommended recovery for SACK
      (RFC3517 A Conservative Selective Acknowledgment (SACK)-based Loss
       Recovery Algorithm for TCP), because a single SACK may trigger
      fast recovery. This obviously won't work well with reordering so
      FACK is dynamically disabled upon detecting reordering.
      
      RACK supersedes FACK by using time distance instead of sequence
      distance. On reordering, RACK waits for a quarter of an RTT after
      receiving a single SACK before starting recovery. (The timer can be
      made more adaptive in the future by measuring reordering distance in
      time, but currently RTT/4 seems to work well.) Once the recovery
      starts, RACK behaves almost like FACK because it reduces the
      reordering window to 1ms, so it fast retransmits quickly. In addition,
      RACK can detect lost retransmissions because it does not care about
      the packet sequences (being repeated or not), which is extremely
      useful when the connection is going through a traffic policer.
      
      Google server experiments indicate that disabling FACK after enabling
      RACK has negligible impact on the overall loss recovery performance,
      with more reordering events detected. We still keep the FACK
      implementation as a backup in case RACK has bugs and needs to be
      disabled.
      
      [1] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining
      TCP Congestion Control," In Proceedings of SIGCOMM '96, August 1996.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94bdc978
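      The difference between the two loss rules described in the commit
      above can be put side by side. This is a simplified sketch with
      hypothetical function names: packets are identified by a plain index
      instead of wrapping sequence numbers, and the RACK rule is reduced to
      a single comparison against a reordering window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * FACK-style rule: packet P is deemed lost if a packet with a higher
 * sequence was (s)acked and lies at least dupthresh packets ahead of P.
 */
static bool fack_lost_sketch(unsigned int p_index,
                             unsigned int highest_sacked_index,
                             unsigned int dupthresh)
{
    assert(dupthresh > 0);
    return highest_sacked_index > p_index &&
           highest_sacked_index - p_index >= dupthresh;
}

/*
 * RACK-style rule: packet P is deemed lost if a packet sent more than
 * reo_wnd_us later than P has already been delivered.
 */
static bool rack_lost_sketch(uint64_t p_xmit_us,
                             uint64_t latest_delivered_xmit_us,
                             uint64_t reo_wnd_us)
{
    return latest_delivered_xmit_us > p_xmit_us &&
           latest_delivered_xmit_us - p_xmit_us > reo_wnd_us;
}
```

      Because the RACK rule compares transmit times, it applies equally to
      original transmissions and retransmissions, which a sequence-distance
      rule cannot distinguish.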
    • Yuchung Cheng's avatar
      tcp: remove thin_dupack feature · 4a7f6009
      Yuchung Cheng authored
      Thin stream DUPACK starts fast recovery on only one DUPACK
      provided the connection is a thin stream (i.e., low inflight).  But
      this older feature is now subsumed by RACK. If a connection
      receives only a single DUPACK, RACK arms a reordering timer
      and soon starts fast recovery, instead of timing out, if no further
      ACKs are received.
      
      The socket option (THIN_DUPACK) is kept as a nop for compatibility.
      Note that this patch does not change another thin-stream feature
      which enables linear RTO, although it might be good to generalize
      that in the future (i.e., linear RTO for the first, say, 3 retries).
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a7f6009
    • Yuchung Cheng's avatar
      tcp: remove RFC4653 NCR · ac229dca
      Yuchung Cheng authored
      This patch removes the (partial) implementation of the aggressive
      limited transmit in RFC4653 TCP Non-Congestion Robustness (NCR).
      
      NCR is a mitigation for the problem created by the dynamic
      DUPACK threshold.  The current adaptive DUPACK threshold
      (tp->reordering) could cause timeouts by preventing fast recovery.
      For example, if the last packet of a cwnd burst was reordered, the
      threshold will be set to the size of cwnd. But if next application
      burst is smaller than threshold and has drops instead of reorderings,
      the sender would not trigger fast recovery but instead resorts to a
      timeout recovery.
      
      NCR mitigates this issue by additionally checking the number of
      DUPACKs against the current flight size. The technique is similar to
      the early retransmit RFC.
      
      With RACK loss detection, this mitigation is not needed, because RACK
      does not use DUPACK threshold to detect losses. RACK arms a reordering
      timer to fire at most a quarter RTT later to start fast recovery.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac229dca
    • Yuchung Cheng's avatar
      tcp: remove early retransmit · bec41a11
      Yuchung Cheng authored
      This patch removes the support of RFC5827 early retransmit (i.e.,
      fast recovery on small inflight with <3 dupacks) because it is
      subsumed by the new RACK loss detection. More specifically when
      RACK receives DUPACKs, it'll arm a reordering timer to start fast
      recovery after a quarter of (min)RTT, hence it covers the early
      retransmit except RACK does not limit itself to specific inflight
      or dupack numbers.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bec41a11
    • Yuchung Cheng's avatar
      tcp: remove forward retransmit feature · 840a3cbe
      Yuchung Cheng authored
      Forward retransmit is an esoteric feature in RFC3517 (condition(3)
      in the NextSeg()). Basically if a packet is not considered lost by
      the current criteria (# of dupacks etc), but the congestion window
      has room for more packets, then retransmit this packet.
      
      However, it actually conflicts with the rest of the recovery design.
      For example, when reordering is detected we want to be conservative
      in retransmitting packets, but the forward-retransmit feature would
      break that by forcing more retransmissions. Also, the implementation
      is fairly complicated inside the retransmission logic, inducing extra
      iterations in the write queue. With RACK, losses are detected in a
      timely manner and this heuristic is no longer necessary. Therefore
      this patch removes the feature.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      840a3cbe
    • Yuchung Cheng's avatar
      tcp: extend F-RTO to catch more spurious timeouts · 89fe18e4
      Yuchung Cheng authored
      Current F-RTO reverts the cwnd reset whenever a never-retransmitted
      packet was (s)acked. The timeout can be declared spurious because
      the packets acknowledged by this ACK were transmitted before the
      timeout, so clearly not all the packets were lost and the cwnd reset
      is unwarranted.

      This nice detection does not really depend on F-RTO internals. This
      patch applies the detection universally. On Google servers this
      change detected 20% more spurious timeouts.
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89fe18e4
    • Yuchung Cheng's avatar
      tcp: enable RACK loss detection to trigger recovery · a0370b3f
      Yuchung Cheng authored
      This patch changes two things:
      
      1. Start fast recovery with RACK in addition to other heuristics
         (e.g., DUPACK threshold, FACK). Prior to this change RACK
         was enabled to detect losses only after the recovery had been
         started by other algorithms.

      2. Disable TCP early retransmit. RACK subsumes the early retransmit
         with the new reordering timer feature. A later patch in this
         series removes the early retransmit code.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0370b3f
    • Yuchung Cheng's avatar
      tcp: check undo conditions before detecting losses · 98e36d44
      Yuchung Cheng authored
      Currently RACK would mark losses before the undo operations in TCP
      loss recovery. This could incorrectly identify real losses as
      spurious. For example, a sender first experiences a delay spike and
      then eventually loses some packets due to buffer overrun.
      In this case, the sender should perform fast recovery because not all
      the packets were lost.
      
      But the sender may first trigger a (spurious) RTO and reset
      cwnd to 1. The following ACKs may be used to mark real losses by
      tcp_rack_mark_lost. Then in tcp_process_loss this ACK could trigger
      the F-RTO undo condition, unmark the real losses and revert the cwnd
      reduction. If no more ACKs come back, the sender would eventually
      time out again instead of performing fast recovery.
      
      The patch fixes this incorrect process by always performing
      the undo checks before detecting losses.
      
      Fixes: 4f41b1c5 ("tcp: use RACK to detect losses")
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98e36d44
    • Yuchung Cheng's avatar
      tcp: use sequence to break TS ties for RACK loss detection · 1d0833df
      Yuchung Cheng authored
      The packets inside a jumbo skb (e.g., TSO) share the same skb
      timestamp, even though they are sent sequentially on the wire. Since
      RACK is based on time, it cannot detect that some packets inside the
      same skb are lost.  However, we can leverage the packet sequence
      numbers as extended timestamps to detect losses. Therefore, when
      RACK timestamp is identical to skb's timestamp (i.e., one of the
      packets of the skb is acked or sacked), we use the sequence numbers
      of the acked and unacked packets to break ties.
      
      We can use the same sequence logic to advance RACK xmit time as
      well to detect more losses and avoid timeout.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d0833df
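      The tie-breaking rule from the commit above amounts to comparing
      (timestamp, sequence) pairs. The sketch below uses hypothetical
      function names and plain microsecond timestamps; it is meant to
      illustrate the ordering, not to reproduce the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe test that 32-bit sequence number a is later than b. */
static bool seq_after_sketch(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * True if packet 1 was sent after packet 2. Packets carved from the
 * same skb share one timestamp, so the sequence number decides ties.
 */
static bool sent_after_sketch(uint64_t t1_us, uint64_t t2_us,
                              uint32_t seq1, uint32_t seq2)
{
    return t1_us > t2_us ||
           (t1_us == t2_us && seq_after_sketch(seq1, seq2));
}
```

      With this ordering, an acked packet late in a TSO burst can expose
      the loss of an earlier packet from the same skb even though their
      timestamps are equal.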
    • Yuchung Cheng's avatar
      tcp: add reordering timer in RACK loss detection · 57dde7f7
      Yuchung Cheng authored
      This patch makes RACK install a reordering timer when it suspects
      some packets might be lost, but wants to delay the decision
      a little bit to accommodate reordering.
      
      It does not create a new timer but instead repurposes the existing
      RTO timer, because both are meant to retransmit packets.
      Specifically it arms a timer ICSK_TIME_REO_TIMEOUT when
      the RACK timing check fails. The wait time is set to
      
        RACK.RTT + RACK.reo_wnd - (NOW - Packet.xmit_time) + fudge
      
      This translates to expecting that a packet (Packet) takes
      (RACK.RTT + RACK.reo_wnd + fudge) to be delivered after it was sent.
      
      When there are multiple packets that need a timer, we use one timer
      with the maximum timeout. Therefore the timer conservatively uses
      the maximum window to expire N packets by one timeout, instead of
      N timeouts to expire N packets sent at different times.
      
      The fudge factor is 2 jiffies to ensure that when the timer fires, all
      the suspected packets have exceeded the deadline and are marked lost
      by tcp_rack_detect_loss(). It has to be at least 1 jiffy because the
      clock may tick between calling icsk_reset_xmit_timer(timeout) and
      actually hanging the timer. The second jiffy is to lower-bound the
      timeout to 2 jiffies when reo_wnd is < 1ms.
      
      When the reordering timer fires (tcp_rack_reo_timeout): If we aren't
      in Recovery we'll enter fast recovery and force fast retransmit.
      This is very similar to the early retransmit (RFC5827) except RACK
      is not constrained to only enter recovery for small outstanding
      flights.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57dde7f7
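      The wait time formula and the single-timer policy from the
      reordering timer commit can be sketched as follows. Names are
      hypothetical, all times are in microseconds, and the fudge is passed
      in as a parameter rather than expressed in jiffies as in the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet state for the sketch. */
struct pkt_sketch {
    uint64_t xmit_us; /* transmit time of the packet */
    bool lost;        /* set once the packet is past its deadline */
};

/*
 * Marks packets whose deadline has passed as lost and returns the timer
 * value that covers all remaining suspects:
 *   RTT + reo_wnd - (now - xmit_time) + fudge
 * maximized over the packets still within their deadline. Returns 0
 * when no packet needs a timer (assuming fudge_us > 0).
 */
static uint64_t rack_detect_sketch(struct pkt_sketch *pkts, size_t n,
                                   uint64_t now_us, uint64_t rtt_us,
                                   uint64_t reo_wnd_us, uint64_t fudge_us)
{
    uint64_t timeout_us = 0;

    for (size_t i = 0; i < n; i++) {
        int64_t remaining;

        if (pkts[i].lost)
            continue;
        remaining = (int64_t)(rtt_us + reo_wnd_us) -
                    (int64_t)(now_us - pkts[i].xmit_us);
        if (remaining < 0) {
            pkts[i].lost = true;
        } else if ((uint64_t)remaining + fudge_us > timeout_us) {
            timeout_us = (uint64_t)remaining + fudge_us;
        }
    }
    return timeout_us;
}
```

      Using the maximum means that one timer expiry is enough to mark all
      suspected packets lost, at the cost of waiting slightly longer than
      strictly needed for the older ones.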
    • Yuchung Cheng's avatar
      tcp: record most recent RTT in RACK loss detection · deed7be7
      Yuchung Cheng authored
      Record the most recent RTT in RACK. It is often identical to the
      "ca_rtt_us" values in tcp_clean_rtx_queue. But when the packet has
      been retransmitted, RACK chooses to believe the ACK is for the
      (latest) retransmitted packet if the RTT is over minimum RTT.
      
      This requires passing the arrival time of the most recent ACK to
      RACK routines. The timestamp is now recorded in the "ack_time"
      in tcp_sacktag_state during the ACK processing.
      
      This patch does not change the RACK algorithm itself. It only adds
      the RTT variable to prepare for the next main patch.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      deed7be7
    • Yuchung Cheng's avatar
      tcp: new helper for RACK to detect loss · e636f8b0
      Yuchung Cheng authored
      Create a new helper tcp_rack_detect_loss to prepare for the upcoming
      RACK reordering timer patch.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e636f8b0