1. 25 Sep, 2016 18 commits
  2. 24 Sep, 2016 22 commits
    • rxrpc: Implement slow-start · 57494343
      David Howells authored
      Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
      tracepoint is added to log the state of the congestion management algorithm
      and the decisions it makes.
      
      Notes:
      
       (1) Since we send fixed-size DATA packets (apart from the final packet in
           each phase), counters and calculations are in terms of packets rather
           than bytes.
      
       (2) The ACK packet carries the equivalent of TCP SACK.
      
       (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
           suited to SACK of a small number of packets.  It seems that, almost
           inevitably, by the time three 'duplicate' ACKs have been seen, we have
           narrowed the loss down to one or two missing packets, and the
           FLIGHT_SIZE calculation ends up as 2.
      
       (4) In rxrpc_resend(), if there was no data that apparently needed
           retransmission, we transmit a PING ACK to ask the peer to tell us what
           its Rx window state is.
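
The packet-based accounting in note (1) can be illustrated with a minimal sketch of RFC 5681-style window growth. This is not the rxrpc code: the class name, field names and initial values are assumptions made for illustration, and the window is grown by one packet per newly ACKed packet as a simplification.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    """Hypothetical congestion state, counted in packets rather than bytes."""
    cwnd: int = 1             # congestion window (assumed initial value)
    ssthresh: int = 64        # slow-start threshold (assumed initial value)
    acked_in_window: int = 0  # ACKs counted towards the next avoidance step

    def on_packets_acked(self, newly_acked: int) -> None:
        for _ in range(newly_acked):
            if self.cwnd < self.ssthresh:
                # Slow start: grow by one packet per packet ACKed.
                self.cwnd += 1
            else:
                # Congestion avoidance: grow by one packet per full window ACKed.
                self.acked_in_window += 1
                if self.acked_in_window >= self.cwnd:
                    self.acked_in_window = 0
                    self.cwnd += 1

    def on_retransmit_timeout(self, flight_size: int) -> None:
        # RFC 5681 sets ssthresh to max(FlightSize / 2, 2 segments) and
        # restarts from a loss window of one segment.
        self.ssthresh = max(flight_size // 2, 2)
        self.cwnd = 1
        self.acked_in_window = 0
```

With `flight_size` equal to 2, as described in note (3), `ssthresh` lands on its floor of 2 packets.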
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Schedule an ACK if the reply to a client call appears overdue · 0d967960
      David Howells authored
      If we've sent all the request data in a client call but haven't seen any
      sign of the reply data yet, schedule an ACK to be sent to the server to
      find out if the reply data got lost.
      
      If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to
      demand a response to find out whether we need to retransmit.
      
      If the server says it has received all of the data, we send an IDLE ACK to
      tell the server that we haven't received anything in the receive phase as
      yet.
      
      To make this work, a non-immediate PING ACK must carry a delay.  I've chosen
      the same as the IDLE ACK for the moment.
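
The choice between the two probe types described above can be sketched as follows. The function and parameter names are hypothetical, and sequence-number wrap-around is ignored for brevity.

```python
def choose_probe_ack(tx_top: int, tx_hard_ack: int) -> str:
    """Pick the ACK type to schedule when the reply looks overdue.

    tx_top      -- sequence number of the last request DATA packet sent
    tx_hard_ack -- highest request sequence number the server has hard-ACK'd
    (both names are assumptions for this sketch)
    """
    if tx_hard_ack < tx_top:
        # Some request data is not yet hard-ACK'd: demand a response so we
        # can find out whether a retransmission is needed.
        return "PING"
    # The server has everything: tell it we have seen no reply data yet.
    return "IDLE"
```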
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Generate a summary of the ACK state for later use · 31a1b989
      David Howells authored
      Generate a summary of the Tx buffer packet state when an ACK is received
      for use in a later patch that does congestion management.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Delay the resend timer to allow for nsec->jiffies conv error · df0562a7
      David Howells authored
      When determining the resend timer value, we have a value in nsec but the
      timer is in jiffies which may be a million or more times more coarse.
      nsecs_to_jiffies() rounds down - which means that the resend timeout
      expressed as jiffies is very likely earlier than the one expressed as
      nanoseconds from which it was derived.
      
      The problem is that rxrpc_resend() gets triggered by the timer, but can't
      then find anything to resend yet.  It sets the timer again - but gets
      kicked off immediately again and again until the nanosecond-based expiry
      time is reached and we actually retransmit.
      
      Fix this by adding 1 to the jiffies-based resend_at value to counteract the
      rounding and make sure that the timer happens after the nanosecond-based
      expiry is passed.
      
      Alternatives would be to adjust the timestamp on the packets to align
      with the jiffie scale or to switch back to using jiffie-timestamps.
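
The rounding problem can be reproduced with a small model. This is a sketch only: it assumes HZ=250 and that both clocks start from zero, and the Python `nsecs_to_jiffies()` below merely mimics the round-down behaviour described above.

```python
NSEC_PER_SEC = 1_000_000_000
HZ = 250                             # assumed tick rate for this sketch
NSEC_PER_JIFFY = NSEC_PER_SEC // HZ  # 4,000,000 ns per jiffy


def nsecs_to_jiffies(nsec: int) -> int:
    # Integer division rounds down, like the conversion described above.
    return nsec // NSEC_PER_JIFFY


def resend_at_jiffies(resend_at_nsec: int) -> int:
    # Adding 1 guarantees the timer fires after the nanosecond expiry time.
    return nsecs_to_jiffies(resend_at_nsec) + 1


deadline_ns = 10_500_000                            # hypothetical expiry time
early = nsecs_to_jiffies(deadline_ns) * NSEC_PER_JIFFY
fixed = resend_at_jiffies(deadline_ns) * NSEC_PER_JIFFY
```

Here `early` is before the deadline, which is the case that makes the timer fire repeatedly with nothing to resend, while `fixed` is after it.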
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Reinitialise the call ACK and timer state for client reply phase · dd7c1ee5
      David Howells authored
      Clear the ACK reason, ACK timer and resend timer when entering the client
      reply phase when the first DATA packet is received.  New ACKs will be
      proposed once the data is queued.
      
      The resend timer is no longer relevant and we need to cancel ACKs scheduled
      to probe for a lost reply.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Include the last reply DATA serial number in the final ACK · b69d94d7
      David Howells authored
      In a client call, include the serial number of the last DATA packet of the
      reply in the final ACK.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Send an immediate ACK if we fill in a hole · a7056c5b
      David Howells authored
      Send an immediate ACK if we fill in a hole in the buffer left by an
      out-of-sequence packet.  This may allow the congestion management in the peer
      to avoid a retransmission if packets got reordered on the wire.
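
One way to picture the hole check is the sketch below. It is an assumption about how a hole might be detected, not the rxrpc implementation: a packet is treated as filling a hole when it is new and arrives below the highest sequence number already buffered.

```python
def fills_hole(received: set, seq: int) -> bool:
    """Return True if seq plugs a gap left by an out-of-sequence packet.

    received -- sequence numbers already held in the Rx buffer (hypothetical)
    seq      -- sequence number of the DATA packet that just arrived
    """
    if not received or seq in received:
        return False
    # A new packet below the highest buffered sequence number lands in a gap.
    return seq < max(received)
```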
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Send an ACK after every few DATA packets we receive · 805b21b9
      David Howells authored
      Send an ACK if we haven't sent one for the last two packets we've received.
      This keeps the other end apprised of where we've got to - which is
      important if they're doing slow-start.
      
      We do this in recvmsg so that we can dispatch a packet directly without the
      need to wake up the background thread.
      
      This should possibly be made configurable in future.
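
The pacing rule can be sketched as a simple counter. The class and method names are hypothetical, and the threshold of two packets is taken from the description above; the real code may count differently.

```python
class AckPacer:
    """Counts DATA packets consumed since the last ACK was sent."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold  # packets allowed to go by without an ACK
        self.unacked = 0

    def on_data_consumed(self) -> bool:
        """Record one consumed packet; return True if an ACK is now due."""
        self.unacked += 1
        if self.unacked >= self.threshold:
            self.unacked = 0
            return True
        return False

    def on_ack_sent(self) -> None:
        # Any ACK sent for another reason restarts the count.
        self.unacked = 0
```

Making `threshold` a constructor argument mirrors the remark that the interval could be made configurable.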
      Signed-off-by: David Howells <dhowells@redhat.com>
    • hv_netvsc: fix comments · c6a77ff8
      Stephen Hemminger authored
      Fix typos and spelling errors. Also remove an old comment from the staging era.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'thunderx-bql' · 15a09901
      David S. Miller authored
      Sunil Goutham says:
      
      ====================
      BQL support and fix for a regression issue
      
      These patches add byte queue limit support and also fix a regression
      introduced by commit
      'net: thunderx: Use netdev's name for naming VF's interrupts'
      
      Changes from v1:
      - As suggested, added a 'Fixes' tag with the commit id of the previous
        commit which caused the issue.
      - Also fixed the missing netdev_tx_reset_queue() function call in
        byte queue limits support patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: Support for byte queue limits · 2c204c2b
      Sunil Goutham authored
      This patch adds support for byte queue limits
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: Fix issue with IRQ namimg · b4e28c1f
      Sunil Goutham authored
      This patch fixes a regression caused by a previous commit, where
      the IRQ name exceeds its 20-byte array if the interface's name
      is long.
      
      Fixes: e4126213 ("net: thunderx: Use netdev's name for naming VF's interrupts")
      Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: remove redundant check if err is zero · faac0ff0
      Colin Ian King authored
      There is an earlier check and return if err is non-zero, so
      the check to see if it is zero is redundant in every iteration
      of the loop and hence the check can be removed.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devicetree: fix typo in MediaTek ethernet device-tree binding · 7f8c2865
      Sean Wang authored
      fix typo in
      Documentation/devicetree/bindings/net/mediatek-net.txt
      
      Cc: devicetree@vger.kernel.org
      Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devicetree: revise ethernet device-tree binding about TRGMII · 4ce4862a
      Sean Wang authored
      add phy-mode "trgmii" to
      Documentation/devicetree/bindings/net/ethernet.txt
      
      Cc: devicetree@vger.kernel.org
      Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-rewrite-20160923' of... · 2a9aa41f
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20160923' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Bug fixes and tracepoints
      
      Here are a bunch of bug fixes:
      
       (1) Need to set the timestamp on a Tx packet before queueing it to avoid
           trouble with the retransmission function.
      
       (2) Don't send an ACK at the end of the service reply transmission; it's
           the responsibility of the client to send an ACK to close the call.
           The service can resend the last DATA packet or send a PING ACK.
      
       (3) Wake sendmsg() on abnormal call termination.
      
       (4) Use ktime_add_ms() not ktime_add_ns() to add millisecond offsets.
      
       (5) Use before_eq() & co. to compare serial numbers (which may wrap).
      
       (6) Start the resend timer on DATA packet transmission.
      
       (7) Don't accidentally cancel a retransmission upon receiving a NACK.
      
       (8) Fix the call timer setting function to deal with timeouts that are now
           or past.
      
       (9) Don't use a flag to communicate the presence of the last packet in the
           Tx buffer from sendmsg to the input routines where ACK and DATA
           reception is handled.  The problem is that there's a window between
           queueing the last packet for transmission and setting the flag in
           which ACKs or reply DATA packets can arrive, causing apparent state
           machine violation issues.
      
           Instead use the annotation buffer to mark the last packet and pick up
           and set the flag in the input routines.
      
      (10) Don't call the tx_ack tracepoint and don't allocate a serial number if
           someone else nicked the ACK we were about to transmit.
      
      There are also new tracepoints and one altered tracepoint used to track
      down the above bugs:
      
      (11) Call timer tracepoint.
      
      (12) Data Tx tracepoint (and adjustments to ACK tracepoint).
      
      (13) Injected Rx packet loss tracepoint.
      
      (14) Ack proposal tracepoint.
      
      (15) Retransmission selection tracepoint.
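
Item (5) matters because serial numbers are 32-bit counters that wrap. The sketch below emulates the signed-difference comparison commonly used for such counters. The helper names follow the kernel's before()/after() naming, but this Python version is only an illustration.

```python
U32_MASK = 0xFFFFFFFF


def _s32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= U32_MASK
    return value - (1 << 32) if value & 0x80000000 else value


def before(a: int, b: int) -> bool:
    # a precedes b if the wrapped difference is negative.
    return _s32(a - b) < 0


def before_eq(a: int, b: int) -> bool:
    return _s32(a - b) <= 0


def after(a: int, b: int) -> bool:
    return _s32(a - b) > 0
```

A plain `<` would claim that serial 0xFFFFFFFF comes after serial 0, whereas `before(0xFFFFFFFF, 0)` treats 0 as the next serial after the wrap.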
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 3eb193e0
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      10GbE Intel Wired LAN Driver Updates 2016-09-23
      
      This series contains updates to ixgbe and ixgbevf.
      
      Emil provides several changes, first simplifies the logic for setting VLAN
      filtering by checking the VMDQ flag and the old 82598 MAC, instead of
      having to maintain a list of MAC types.  Then made two functions static
      that are used only within the file, a by-product is sparse is now happy.
      Added spinlocks to make sure that the MTU configuration is handled
      properly.  Fixed an issue where when SR-IOV is enabled while the
      ixgbevf driver is loaded would result in all mailbox requests being
      rejected by ixgbe, so call ixgbe_sriov_reinit() before pci_enable_sriov()
      to ensure mailbox requests are properly handled.
      
      Mark resolves a NULL pointer issue by simply setting the read and write
      *_ref_mdi pointers for x550em_a devices.  Then clearly indicates within
      ethtool that all MACs support pause frames and made sure that the
      advertising is set to the requested mode.  Fixed an issue where
      MDIO_PRTAD_NONE was not being used consistently to indicate no PHY
      address.
      
      Alex fixes an issue, where the support for multiple queues when SR-IOV
      is enabled was added but the support was not reported.  With that, fix
      an issue where the hardware redirection table could support more queues
      than the PF currently has when SR-IOV is enabled, so use the RSS mask to
      trim off the bits that are not used.  Lastly, instead of limiting the
      VFs if we do not use 4 queues for RSS in the PF, we can instead just limit
      the RSS queues used to a power of 2.  We can now support use cases where
      VFs are using more queues than the PF is currently using and can support
      RSS if so desired.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 1678c113
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2016-09-23
      
      Only two patches this time:
      
      1) Fix a comment reference to struct xfrm_replay_state_esn.
         From Richard Guy Briggs.
      
      2) Convert xfrm_state_lookup to rcu, we don't need the
         xfrm_state_lock anymore in the input path.
         From Florian Westphal.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 834d9649
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-09-22
      
      This series contains updates to i40e and i40evf only.
      
      Sridhar fixes link state event handling by updating the carrier and
      starts/stops the Tx queues based on the link state notification from PF.
      
      Brady fixes an issue where a user defined RSS hash key was not being
      set because a user defined indirection table is not supplied when changing
      the hash key, so if an indirection table is not supplied now, then a
      default one is created and the hash key is correctly set.  Also fixed
      an issue where when NPAR was enabled, we were still using pf->mac_seid
      to perform the dump port query. Instead, go through the VSI to determine
      the correct ID to use in either case.
      
      Mitch provides one fix where a conditional return code was reversed, so
      he does a "switheroo" to fix the issue.
      
      Carolyn has two fixes, first fixes an issue in the virt channel code,
      where a return code was not checked for NULL when applicable.  Second,
      fixes an issue where we were byte swapping the port parameter, then
      byte swapping it again in function execution.
      
      Colin Ian King fixes a potential NULL pointer dereference.
      
      Bimmy changes up i40evf_up_complete() to be void since it always returns
      success anyways, which allows cleaning up of code which checked the
      return code from this function.
      
      Alex fixed an issue where the driver was incorrectly assuming that we
      would always be pulling no more than 1 descriptor from each fragment.
      So to correct this, we just need to make certain to test all the way to
      the end of the fragments as it is possible for us to span 2 descriptors
      in the block before us so we need to guarantee that even the last 6
      descriptors have enough data to fill a full frame.
      
      v2: dropped patches 1-3, 10 and 12 from the original series since Or
          Gerlitz pointed out several areas of improvement in the implementation
          of the VF Port representor netdev.  Sridhar is re-working the series
          for later submission.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-vf-vlan-802.1ad' · 1ad0751d
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx4 VF vlan protocol 802.1ad support
      
      This patchset adds VF VLAN protocol 802.1ad support to the
      mlx4 driver.
      We extended the VF VLAN API with an additional parameter
      for VLAN protocol, and kept 802.1Q as drivers' default.
      
      We prepared userspace support (ip link tool).
      The patch will be submitted to the iproute2 mailing list.
      
      The ip link tool VF VLAN protocol parameter is optional (default: 802.1Q).
      A configuration command of VF VLAN that is used prior to this patchset
      will result in same functionality as today's (VST with VLAN protocol 802.1Q).
      
      The series generated against net-next commit:
      688dc536 "Merge branch 'mlx4-next'"
      
      All maintainers of the modified modules are in cc.
      
      v3:
        Expand the UAPI to a nested list to support future use-cases.
        Use a more formal feature name.
      
      v2:
        Drop patch 4 ("net/mlx4_core: Add an option to configure SVLAN TPID").
        Patch 1/5: Update commit log.
        2-3/5: Split patch 2 into two patches, to separate between changes
               done in mlx4_core and the ones done in mlx4_en.
        4-5/5: Split patch 3 into two patches, to separate between the
               addition of a protocol parameter and the actual implementation
      	 in mlx4_en.
      	 In addition, we implement a handshake mechanism so PF and VF
      	 exchange their VST QinQ support capability.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Add VF vlan protocol 802.1ad support · b42959dc
      Moshe Shemesh authored
      Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan
      protocol to 802.1ad.
      VST 802.1ad mode in mlx4 is used for STAG strip/insertion by PF, while
      the CTAG is set by the VF.
      Read current vlan protocol as part of the vf configuration state.
      
      Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake
      to verify that both the vf and the pf driver version support it.
      The handshake uses the command QUERY_FUNC_CAP:
      - The vf sets a pre-defined support bit in input modifier.
      - A pf that supports the feature sends the request to the vf through a
        pre-defined field in the output mailbox.
      - In case vf does not support the feature, the pf will fail the control
        command (in this case, IP link tool command to set the vf vlan
        protocol to 802.1ad).
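
The outcome of the handshake can be sketched as a capability check. Everything named here is hypothetical: the bit value, the function name and the boolean PF flag are placeholders, not the mlx4 definitions.

```python
VF_QINQ_SUPPORT_BIT = 1 << 0  # hypothetical position of the VF support bit


def pf_accepts_vlan_proto(proto: str, vf_input_modifier: int,
                          pf_supports_qinq: bool) -> bool:
    """Decide whether the PF lets the control command succeed.

    proto             -- "802.1Q" or "802.1ad", as passed to the ip link tool
    vf_input_modifier -- bits the VF set in its QUERY_FUNC_CAP input modifier
    pf_supports_qinq  -- whether this PF driver version supports VST QinQ
    """
    if proto == "802.1Q":
        # VST 802.1Q mode is unchanged and needs no handshake.
        return True
    vf_supports_qinq = bool(vf_input_modifier & VF_QINQ_SUPPORT_BIT)
    # Both sides must support the feature, otherwise the command fails.
    return pf_supports_qinq and vf_supports_qinq
```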
      
      No change in VST 802.1Q mode.
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Update API for VF vlan protocol 802.1ad support · 79aab093
      Moshe Shemesh authored
      Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
      the ability for user-space application to specify it for the VF, as an
      option to support 802.1ad.
      We adjusted IP Link tool to support this option.
      
      For future use cases, the new UAPI supports multiple vlans. For now we
      limit the list size to a single vlan in kernel.
      Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
      compatibility with older versions of IP Link tool.
      
      Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
      We kept 802.1Q as the drivers' default vlan protocol.
      Suitable ip link tool command examples:
        Set vf vlan protocol 802.1ad:
          ip link set eth0 vf 1 vlan 100 proto 802.1ad
        Set vf to VST (802.1Q) mode:
          ip link set eth0 vf 1 vlan 100 proto 802.1Q
        Or by omitting the new parameter
          ip link set eth0 vf 1 vlan 100
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>