1. 06 Jan, 2016 20 commits
  2. 05 Jan, 2016 20 commits
    • include/uapi/linux/sockios.h: mark SIOCRTMSG unused · 2fbf5758
      xypron.glpk@gmx.de authored
      The SIOCRTMSG ioctl does nothing but return EINVAL, so mark it as
      unused in a comment.
      
      SIOCRTMSG is only used in:
      * net/ipv4/af_inet.c
      * include/uapi/linux/sockios.h
      
      inet_ioctl calls ip_rt_ioctl.
      ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
      otherwise.
      Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5e-tstamp' · 1633bf11
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Introduce mlx5 ethernet timestamping
      
      This patch series introduces support for ConnectX-4 timestamping
      and the PTP kernel interface.
      
      Changes from V2:
      net/mlx5_core: Introduce access function to read internal_timer
      	- Remove one line function
      	- Change function name
      
      net/mlx5e: Add HW timestamping (TS) support:
      	- Data path performance optimization (caching tstamp struct in rq, sq)
      	- Change read/write_lock_irqsave to read/write_lock
      	- Move ioctl functions to en_clock file
      	- Change overflow start algorithm according to comments from Richard
      	- Move timestamp init/cleanup to open/close ndos
      
      In detail:
      
      The 1st patch prevents the driver from modifying skb->data and the SKB
      CB in the device xmit function.
      
      The 2nd patch adds the needed low-level helpers for:
      	- Fetching the hardware clock (hardware internal timer)
      	- Parsing CQEs timestamps
      	- Device frequency capability
      
      The 3rd patch adds a new en_clock.c file that handles all needed
      timestamping operations:
      	- Internal clock structure initialization and other helper functions
      	- The ioctl for setting/getting the current timestamping
      	  configuration
      	- Use of this configuration in the RX/TX data path to fill the SKB
      	  with the timestamp
      
      The 4th patch introduces PTP (PHC) support.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add PTP Hardware Clock (PHC) support · 3d8c38af
      Eran Ben Elisha authored
      Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
      protect the timecounter, since every received packet needs to call
      timecounter_cycle2time() when timestamping is enabled. With normal
      spinlocks this can become a performance bottleneck with RSS and
      multiple receive queues.
      
      The driver has been tested with both Documentation/ptp/testptp and the
      linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
      ConnectX-4 card.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add HW timestamping (TS) support · ef9814de
      Eran Ben Elisha authored
      Add support for enabling/disabling HW timestamping for incoming and/or
      outgoing packets. HW timestamping is enabled/disabled through the
      appropriate ioctl. Currently only HWTSTAMP_FILTER_ALL/NONE and
      HWTSTAMP_TX_ON/OFF are supported. Make all relevant changes in the
      RX/TX flows to honor TS requests and place HW timestamps into the
      relevant structures.
      
      Add an internal clock for converting hardware timestamps to
      nanoseconds. In addition, add a service task that catches internal
      clock overflow, to keep timestamping accurate.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5_core: Introduce access function to read internal timer · b0844444
      Eran Ben Elisha authored
      A preparation step that adds support for reading the hardware
      internal timer and the hardware timestamp from the CQE.
      In addition, advertise the device_frequency_khz HCA capability.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Do not modify the TX SKB · 34802a42
      Achiad Shochat authored
      If the SKB is cloned, or has an elevated users count, someone else
      can be looking at it at the same time.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-transport-rhashtable' · 33c15297
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: use transport hashtable to replace association's with rhashtable
      
      In a telecom center, a server is usually connected to thousands of
      clients. If the server has only one endpoint (UDP style) and uses the
      same sport and dport to communicate with every client, every assoc in
      the server is hashed into the same chain of the global assoc hashtable,
      because dport and sport are currently chosen as the hash key.
      
      When a packet is received, sctp_rcv tries to find the assoc by sport and
      dport. Since that chain is too long to search quickly, performance
      becomes very low. Some test data follows:
      
      in server:
      $./ss [start a udp style server there]
      in client:
      $./cc [start 2500 sockets to connect server with same port and different ip,
             and use one of them to send data to server]
      
      ===== test on net-next
      -- perf top
      server:
        55.73%  [kernel]             [k] sctp_assoc_is_match
         6.80%  [kernel]             [k] sctp_assoc_lookup_paddr
         4.81%  [kernel]             [k] sctp_v4_cmp_addr
         3.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
         1.94%  [kernel]             [k] sctp_cmp_addr_exact
      
      client:
        46.01%  [kernel]                    [k] sctp_endpoint_lookup_assoc
         5.55%  libc-2.17.so                [.] __libc_calloc
         5.39%  libc-2.17.so                [.] _int_free
         3.92%  libc-2.17.so                [.] _int_malloc
         3.23%  [kernel]                    [k] __memset
      
      -- spent time
      time is 487s, send pkt is 10000000
      
      We need to change how the hash key is calculated: using lport + rport +
      paddr as the hash key avoids this issue.
      
      Besides, this patchset uses a transport hashtable with the rhashtable
      API in place of the association hashtable for lookups: get the
      transport first, then get the association by t->asoc. This also makes
      TCP style work better.
      
      ===== test with this patchset:
      -- perf top
      server:
        15.98%  [kernel]                 [k] _raw_spin_unlock_irqrestore
         9.92%  [kernel]                 [k] __pv_queued_spin_lock_slowpath
         7.22%  [kernel]                 [k] copy_user_generic_string
         2.38%  libpthread-2.17.so       [.] __recvmsg_nocancel
         1.88%  [kernel]                 [k] sctp_recvmsg
      
      client:
        11.90%  [kernel]                   [k] sctp_hash_cmp
         8.52%  [kernel]                   [k] rht_deferred_worker
         4.94%  [kernel]                   [k] __pv_queued_spin_lock_slowpath
         3.95%  [kernel]                   [k] sctp_bind_addr_match
         2.49%  [kernel]                   [k] __memset
      
      -- spent time
      time is 22s, send pkt is 10000000
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc · c79c0666
      Xin Long authored
      sctp_endpoint_lookup_assoc is called under the protection of the sock
      lock, so there is no need to call local_bh_disable in this function.
      Remove these calls.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: drop the old assoc hashtable of sctp · b5eff712
      Xin Long authored
      The transport hashtable replaces the association hashtable, so the
      association hashtable is no longer used in sctp. Drop the code for
      it.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: apply rhashtable api to sctp procfs · 39f66a7d
      Xin Long authored
      Traverse the transport rhashtable and get each association only once,
      by skipping transports for which assoc->peer.primary_path != transport.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: apply rhashtable api to send/recv path · 4f008781
      Xin Long authored
      Apply the lookup APIs to two functions. __sctp_endpoint_lookup_assoc
      and __sctp_lookup_association are invoked under the protection of the
      sock lock, so they are safe, but sctp_lookup_association needs to call
      rcu_read_lock() and check t->dead to protect the lookup.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add the rhashtable apis for sctp global transport hashtable · d6c0256a
      Xin Long authored
      The transport hashtable replaces the association hashtable for looking
      up transports; the association is then obtained via t->asoc. The
      rhashtable APIs are used because they are resizable, scalable, and use
      RCU.
      
      lport + rport + paddr form the base hash key used to locate the chain,
      with net included to keep one netns apart from another; laddr is then
      compared to find the target.
      
      This patch provides the lookup functions:
      - sctp_epaddr_lookup_transport
      - sctp_addrs_lookup_transport
      
      hash/unhash functions:
      - sctp_hash_transport
      - sctp_unhash_transport
      
      init/destroy functions:
      - sctp_transport_hashtable_init
      - sctp_transport_hashtable_destroy
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'faster-soreuseport' · 6a5ef90c
      David S. Miller authored
      Craig Gallek says:
      
      ====================
      Faster SO_REUSEPORT
      
      This series contains two optimizations for the SO_REUSEPORT feature:
      Faster lookup when selecting a socket for an incoming packet and
      the ability to select the socket from the group using a BPF program.
      
      This series only includes the UDP path.  I plan to submit a follow-up
      including the TCP path if the implementation in this series is
      acceptable.
      
      Changes in v4:
      - pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)
      
      Changes in v3:
      - skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
      - reuseport_attach* -> sk_reuseport_attach* and simple return statement
        syntax change (per Daniel Borkmann)
      
      Changes in v2:
      - Fix ARM build; remove unnecessary include.
      - Handle case where protocol header is not in linear section (per
        Alexei Starovoitov).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: BPF selection functional test · 3ca8e402
      Craig Gallek authored
      This program will build classic and extended BPF programs and
      validate the socket selection logic when used with
      SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.
      
      It also validates the reprogramming flow and several edge cases.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1
      Craig Gallek authored
      Expose socket options for setting a classic or extended BPF program
      for use when selecting sockets in an SO_REUSEPORT group.  These options
      can be used on the first socket to belong to a group before bind or
      on any socket in the group after bind.
      
      This change includes refactoring of the existing sk_filter code to
      allow reuse of the existing BPF filter validation checks.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fast reuseport UDP socket selection · e32ea7e7
      Craig Gallek authored
      Include a struct sock_reuseport instance when a UDP socket binds to
      a specific address for the first time with the reuseport flag set.
      When selecting a socket for an incoming UDP packet, use the information
      available in sock_reuseport if present.
      
      This required adding an additional field to the UDP source address
      equality function to differentiate between exact and wildcard matches.
      The original use case allowed wildcard matches when checking for
      existing port uses during bind.  The new use case of adding a socket
      to a reuseport group requires exact address matching.
      
      Performance test (using a machine with 2 CPU sockets and a total of
      48 cores):  Create reuseport groups of varying size.  Use one socket
      from this group per user thread (pinning each thread to a different
      core) calling recvmmsg in a tight loop.  Record number of messages
      received per second while saturating a 10G link.
        10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
        20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
        40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
      
      This work is based on a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: define reuseport groups · ef456144
      Craig Gallek authored
      struct sock_reuseport is an optional shared structure referenced by each
      socket belonging to a reuseport group.  When a socket is bound to an
      address/port not yet in use and the reuseport flag has been set, the
      structure will be allocated and attached to the newly bound socket.
      When subsequent calls to bind are made for the same address/port, the
      shared structure will be updated to include the new socket and the
      newly bound socket will reference the group structure.
      
      Previously, when an incoming packet was destined for a reuseport group,
      all sockets in the group had to be considered before a dispatching
      decision was made.  With this structure, an appropriate socket can be
      found after looking up just one socket in the group.
      
      This shared structure will also allow more complicated decisions to be
      made when selecting a socket (e.g. a BPF filter).
      
      This work is based on a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fixes' · ebb3cf41
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: couple of fixes
      
      Couple of fixes from Ido.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Change bridge port attributes only when bridged · 6c72a3d0
      Ido Schimmel authored
      Bridge port attributes are offloaded to hardware when invoked with the
      SELF flag set, but it makes no sense to reflect them when the port is
      not bridged.
      
      Allow a user to change these attributes only when the port is bridged,
      and initialize them correctly when joining or leaving a bridge.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Set bridge status in appropriate functions · 5a8f4525
      Ido Schimmel authored
      Set the bridge status of physical ports in the appropriate functions, to
      be consistent with LAG join/leave and vPorts joining/leaving a bridge.
      
      Also, remove the error messages in these two functions, as the single
      function each of them calls already emits its own error.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>