1. 22 May, 2017 23 commits
  2. 21 May, 2017 17 commits
    • tcp: fix tcp_probe_timer() for TCP_USER_TIMEOUT · 4ab68879
      Eric Dumazet authored
      TCP_USER_TIMEOUT is still converted to a jiffies value in
      icsk_user_timeout, so a conversion is needed for the cases
      where HZ != 1000.
      
      Fixes: 9a568de4 ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
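A minimal user-space sketch of the unit mismatch this commit addresses, assuming the elapsed time is measured on the 1 ms clock introduced by the commit named in Fixes: while the timeout is stored in jiffies. The helper names and the hz parameter are hypothetical; they stand in for the kernel's jiffies_to_msecs() and HZ and are not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's jiffies_to_msecs(); hz is a
 * parameter here so the sketch can be exercised for several tick rates. */
static uint32_t sketch_jiffies_to_msecs(uint32_t jiffies, uint32_t hz)
{
    return (uint32_t)(((uint64_t)jiffies * 1000u) / hz);
}

/* elapsed_ms is measured on a 1 ms clock; user_timeout_jiffies is stored
 * in jiffies, so it is converted before the two values are compared. */
static bool sketch_user_timeout_expired(uint32_t elapsed_ms,
                                        uint32_t user_timeout_jiffies,
                                        uint32_t hz)
{
    return elapsed_ms > sketch_jiffies_to_msecs(user_timeout_jiffies, hz);
}
```

With hz = 250, a 30 s timeout is stored as 7500 jiffies. Comparing 10000 ms of elapsed time against the raw value 7500 would expire the connection early, whereas the converted comparison (10000 > 30000) does not.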
    • ipv6: drop unused variables in seg6_genl_dumphac · 0a9fc39e
      stephen hemminger authored
      The seg6_pernet_data variable was set but never used.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fou: make local function static · 9dc621af
      stephen hemminger authored
      The build header functions are not used by any other code.
      
      net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ [-Wmissing-prototypes]
      net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ [-Wmissing-prototypes]
      
      Need to do some code rearranging to satisfy different Kconfig possibilities.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcpnv: do not export local function · c718c6d6
      stephen hemminger authored
      The TCP New Vegas congestion control was exporting an internal
      function tcpnv_get_info which is not used by any other in-tree
      kernel code. Make it static.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: fix warning about missing prototype · 9691724e
      stephen hemminger authored
      The prototype for inet_rcv_saddr_equal was not being included.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ila: propagate error code in ila_output · 9e7b19c5
      stephen hemminger authored
      This fixes the warning:
      net/ipv6/ila/ila_lwt.c: In function ‘ila_output’:
      net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable]
      
      It looks like the code attempts to propagate different error
      values, but always returned -EINVAL.

      Compile tested only. Needs review by original author.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
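The pattern described above can be sketched in user-space C as follows. This is an illustration only, assuming a function that assigns distinct error codes on its failure paths; the enum, the function name and the choice of -EHOSTUNREACH are hypothetical and not taken from ila_lwt.c.

```c
#include <errno.h>

/* Hypothetical failure modes used to drive the sketch. */
enum sketch_fault { SKETCH_OK, SKETCH_BAD_PROTO, SKETCH_NO_ROUTE };

static int sketch_output(enum sketch_fault fault)
{
    int err = -EINVAL;          /* default error for malformed input */

    if (fault == SKETCH_BAD_PROTO)
        goto drop;

    if (fault == SKETCH_NO_ROUTE) {
        err = -EHOSTUNREACH;    /* a more specific error for this path */
        goto drop;
    }

    return 0;

drop:
    /* Returning err (rather than a hard-coded -EINVAL) is what lets the
     * more specific value reach the caller. */
    return err;
}
```

If the final statement returned -EINVAL unconditionally, the assignment to err would have no effect, which is the situation the compiler warning points at.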
    • dcb: enforce minimum length on IEEE_APPS attribute · 332b4fc8
      stephen hemminger authored
      Found by reviewing the warning about an unused policy table.
      The code implies that it meant to check the size, but since
      it unrolled the loop for attribute validation, the table is never used.
      Instead, do an explicit check of the attribute.
      
      Compile tested only. Needs review by original author.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-extend-socket-timestamping-API' · dae37055
      David S. Miller authored
      Miroslav Lichvar says:
      
      ====================
      Extend socket timestamping API
      
      Changes v5->v6:
      - fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved
        its description
      - improved OPT_PKTINFO documentation
      - improved scm_timestamping documentation
      
      Changes v4->v5:
      - fixed initialization of reserved fields in struct scm_ts_pktinfo
      
      Changes v3->v4:
      - added reserved fields to struct scm_ts_pktinfo
      - replaced patch fixing false SW timestamps with a documentation fix
      - updated OPT_TX_SWHW patch to handle false SW timestamps
      
      Changes v2->v3:
      - modified struct scm_ts_pktinfo to use fixed-width integer types
      - added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id()
      - modified dev_get_by_napi_id() to not return dev in unexpected branch
      - modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface
        index is unknown
      
      Changes v1->v2:
      - added separate patch for new NAPI functions
      - split code from __sock_recv_timestamp() for better readability
      - fixed RCU locking
      - fixed compiler warning (missing case in switch in first patch)
      - inline sw_tx_timestamp() in its only user
      
      Changes RFC->v1:
      - reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to
        skb shared info (net device is now looked up by napi_id), not require
        any changes in drivers, and restrict the cmsg to incoming packets
      - renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW
        and fixed its description
      - moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it
        can't be received from the error queue anymore
      - improved commit descriptions and removed incorrect comment
      
      This patchset adds new options to the timestamping API that will be
      useful for NTP implementations and possibly other applications.
      
      The first patch specifies a timestamp filter for NTP packets. The second
      patch updates drivers that can timestamp all packets, or need to list
      the filter as unsupported. There is no attempt to add the support to the
      phyter driver.
      
      The third patch adds two helper functions working with NAPI ID, which is
      needed by the next patch. The fourth patch adds a new option to get a
      new control message with the L2 length and interface index for incoming
      packets with hardware timestamps.
      
      The fifth patch fixes documentation on number of non-zero fields in
      scm_timestamping and warns about false software timestamps when
      SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING.
      
      The sixth patch adds a new option to request both software and hardware
      timestamps for outgoing packets. The seventh patch updates drivers that
      assumed software timestamping cannot be used together with hardware
      timestamping.
      
      The patches have been tested on x86_64 machines with igb and e1000e
      drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: update drivers to make both SW and HW TX timestamps · 74abc9b1
      Miroslav Lichvar authored
      Some drivers were calling the skb_tx_timestamp() function only when
      a hardware timestamp was not requested. Now that applications can use
      the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and
      hardware timestamps, the drivers need to be modified to unconditionally
      call skb_tx_timestamp().
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow simultaneous SW and HW transmit timestamping · b50a5c70
      Miroslav Lichvar authored
      Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
      be looped to the socket's error queue with a software timestamp even
      when a hardware transmit timestamp is expected to be provided by the
      driver.
      
      Applications using this option will receive two separate messages from
      the error queue, one with a software timestamp and the other with a
      hardware timestamp. As the hardware timestamp is saved to the shared skb
      info, which may happen before the first message with software timestamp
      is received by the application, the hardware timestamp is copied to the
      SCM_TIMESTAMPING control message only when the skb has no software
      timestamp or it is an incoming packet.
      
      While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
      there are no other users.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
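A hedged sketch of how an application might request both software and hardware transmit timestamps with this option. It assumes kernel headers recent enough to define SOF_TIMESTAMPING_OPT_TX_SWHW in <linux/net_tstamp.h>; the function names are hypothetical, and the other flags are the pre-existing SO_TIMESTAMPING flags for generating and reporting timestamps.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Flags asking for SW and HW TX timestamps to be generated and reported,
 * plus the new option so the SW one is not suppressed when a HW
 * timestamp is expected. */
static unsigned int sketch_tx_swhw_flags(void)
{
    return (unsigned int)(SOF_TIMESTAMPING_TX_SOFTWARE |
                          SOF_TIMESTAMPING_TX_HARDWARE |
                          SOF_TIMESTAMPING_SOFTWARE |
                          SOF_TIMESTAMPING_RAW_HARDWARE |
                          SOF_TIMESTAMPING_OPT_TX_SWHW);
}

/* Apply the flags to an already-created socket; returns the setsockopt()
 * result (0 on success, -1 with errno set on failure). */
static int sketch_enable_tx_swhw(int fd)
{
    unsigned int flags = sketch_tx_swhw_flags();

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

An application using this would then expect two separate messages on the socket's error queue per transmitted packet, as the commit message describes, and would need hardware timestamping to be enabled on the interface separately.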
    • net: fix documentation of struct scm_timestamping · 67953d47
      Miroslav Lichvar authored
      The scm_timestamping struct may return multiple non-zero fields, e.g.
      when both software and hardware RX timestamping is enabled, or when the
      SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
      software timestamp is generated in the recvmsg() call in order to always
      return a SCM_TIMESTAMP(NS) message.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add new control message for incoming HW-timestamped packets · aad9c8c4
      Miroslav Lichvar authored
      Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
      for incoming packets with hardware timestamps. It contains the index of
      the real interface which received the packet and the length of the
      packet at layer 2.
      
      The index is useful with bonding, bridges and other interfaces, where
      IP_PKTINFO doesn't allow applications to determine which PHC made the
      timestamp. With the L2 length (and link speed) it is possible to
      transpose preamble timestamps to trailer timestamps, which are used in
      the NTP protocol.
      
      While this information could be provided by two new socket options
      independently from timestamping, it doesn't look like they would be very
      useful. With this option any performance impact is limited to hardware
      timestamping.
      
      Use dev_get_by_napi_id() to get the device and its index. On kernels
      with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
      index will be returned in the control message.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
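The transposition mentioned above is simple arithmetic once the L2 length and link speed are known. The sketch below assumes the hardware timestamp marks the start of the frame and that the caller knows how many extra on-wire bytes (for example the FCS) are not counted in the reported length; the function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Shift a start-of-frame timestamp to the end of the frame.
 * l2_len:          layer-2 length reported in the control message (bytes)
 * fcs_len:         on-wire bytes not counted in l2_len (an assumption the
 *                  caller has to supply)
 * link_speed_mbps: link speed in Mb/s, obtained separately
 */
static int64_t sketch_preamble_to_trailer_ns(int64_t preamble_ts_ns,
                                             uint32_t l2_len,
                                             uint32_t fcs_len,
                                             uint32_t link_speed_mbps)
{
    uint64_t bits;

    if (link_speed_mbps == 0)
        return preamble_ts_ns;  /* unknown speed: leave the timestamp as is */

    bits = ((uint64_t)l2_len + fcs_len) * 8;
    /* bits / (Mb/s) is microseconds, so scale by 1000 to get nanoseconds */
    return preamble_ts_ns + (int64_t)(bits * 1000 / link_speed_mbps);
}
```

For example, a 96-byte frame plus a 4-byte FCS on a 1000 Mb/s link occupies 800 bits, so the trailer timestamp is 800 ns later than the preamble timestamp.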
    • net: add function to retrieve original skb device using NAPI ID · 90b602f8
      Miroslav Lichvar authored
      Since commit b6858177 ("net: Make skb->skb_iif always track
      skb->dev") skbs don't have the original index of the interface which
      received the packet. This information is now needed for a new control
      message related to hardware timestamping.
      
      Instead of adding a new field to skb, we can find the device by the NAPI
      ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the
      driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to
      hide the CONFIG_NET_RX_BUSY_POLL ifdef.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      Suggested-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL · e3412575
      Miroslav Lichvar authored
      Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
      filter and update drivers which can timestamp all packets, or which
      explicitly list unsupported filters instead of using a default case, to
      handle the filter.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: define receive timestamp filter for NTP · b8210a9e
      Miroslav Lichvar authored
      Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for
      timestamping of NTP packets. There is currently only one driver
      (phyter) that could support it directly.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4 : retrieve port information from firmware · 2061ec3f
      Ganesh Goudar authored
      Issue the get port information command to firmware to retrieve port
      information and update it if it differs from what was last recorded.
      Also add indication of the supported link modes for firmware port types
      FW_PORT_TYPE_SFP28, FW_PORT_TYPE_KR_SFP28 and FW_PORT_TYPE_CR4_QSFP.
      
      Based on the original work by Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmveth: Support to enable LSO/CSO for Trunk VEA. · 66aa0678
      Sivakumar Krishnasamy authored
      Current largesend and checksum offload feature in ibmveth driver,
       - Source VM sends the TCP packets with ip_summed field set as
         CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
         checksum field
       - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
         "no checksum" and "checksum good" bits in transmit buffer descriptor
         before the packet is delivered to pseries PowerVM Hypervisor
       - If ibmveth has largesend capability enabled, transmit buffer descriptors
         are marked accordingly before packet is delivered to Hypervisor
         (along with mss value for packets with length > MSS)
       - Destination VM's ibmveth driver receives the packet with "checksum good"
         bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
       - If "largesend" bit was on, mss value is copied from receive descriptor
         into SKB's gso_size and other flags are appropriately set for
         packets > MSS size
       - The packet is now successfully delivered up the stack in destination VM
      
      The offloads described above work fine for TCP communication among VMs in
      the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B )
      
      We are now enabling support for OVS in pseries PowerVM environment. One of
      our requirements is to have the ibmveth driver configured in "Trunk" mode
      when it is used with OVS. This is because the PowerVM Hypervisor will no
      longer bridge the packets between VMs; instead the packets are delivered to
      the IO Server, which hosts OVS to bridge them between VMs or to external
      networks (flow shown below):
        VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
                                                                         <=> VM B
      In "IO server" the packet is received by inbound Trunk ibmveth and then
      delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown
      below),
              Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth
      
      In this model, we hit the following issues which impacted the VM
      communication performance,
      
       - Issue 1: ibmveth doesn't support largesend and checksum offload features
         when configured as "Trunk". Driver has explicit checks to prevent
         enabling these offloads.
      
       - Issue 2: SYN packet drops seen at destination VM. When the packet
         originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
         IO server's inbound Trunk ibmveth, on validating "checksum good" bits
         in ibmveth receive routine, SKB's ip_summed field is set with
         CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
         Bridge) and delivered to outbound Trunk ibmveth. At this point the
         outbound ibmveth transmit routine will not set "no checksum" and
         "checksum good" bits in transmit buffer descriptor, as it does so only
         when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
         delivered to destination VM, TCP layer receives the packet with checksum
         value of 0 and with no checksum related flags in ip_summed field. This
         leads to packet drops, so TCP connections never go through.
      
       - Issue 3: First packet of a TCP connection will be dropped, if there is
         no OVS flow cached in datapath. OVS while trying to identify the flow,
         computes the checksum. The computed checksum will be invalid at the
         receiving end, as ibmveth transmit routine zeroes out the pseudo
         checksum value in the packet. This leads to packet drop.
      
       - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
         When Physical NIC has GRO enabled and when OVS bridges these packets,
         OVS vport send code will end up calling dev_queue_xmit, which in turn
         calls validate_xmit_skb.
         In validate_xmit_skb routine, the larger packets will get segmented into
         MSS sized segments, if SKB has a frag_list and if the driver to which
         they are delivered doesn't support NETIF_F_FRAGLIST feature.
      
      This patch addresses the above four issues, thereby enabling end to end
      largesend and checksum offload support for better performance.
      
       - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
         checksum offloads.
       - Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
         bit set and if it is configured in Trunk mode, set appropriate SKB fields
         using skb_partial_csum_set (ip_summed field is set with
         CHECKSUM_PARTIAL)
       - Fix for Issue 3: Recompute the pseudo header checksum before sending the
         SKB up the stack.
       - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
         allocating buffers and copying data, this fix gives up to 4X throughput
         increase.
      
      Note: All these fixes need to be applied together, as fixing just one of
      them will lead to other issues immediately (especially for Issues 1, 2 & 3).
      Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
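The fix for Issue 3 relies on recomputing the TCP pseudo-header checksum. As a rough illustration of what that value is for IPv4, the sketch below sums the pseudo-header fields as 16-bit words and folds the carries. The function names are hypothetical, addresses are taken in host byte order for simplicity, and this is not the ibmveth driver code.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t sketch_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum over the IPv4 TCP pseudo header: source address,
 * destination address, protocol number and TCP segment length.
 * saddr and daddr are in host byte order in this sketch. */
static uint16_t sketch_tcp_pseudo_csum(uint32_t saddr, uint32_t daddr,
                                       uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += 6;           /* IPPROTO_TCP */
    sum += tcp_len;

    return sketch_fold(sum);
}
```

For 192.168.0.1 to 192.168.0.2 with a 20-byte TCP segment, the folded sum is 0x816E. A packet marked CHECKSUM_PARTIAL carries such a pseudo-header sum in its TCP checksum field, leaving the sum over the segment itself to be completed later.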