1. 20 Jul, 2015 3 commits
  2. 16 Jul, 2015 16 commits
    • Merge branch 'protodown' · 0d057881
      David S. Miller authored
      Anuradha Karuppiah says:
      
      ====================
      net: Introduce protodown flag.
      
      User space daemons can detect errors in the network that need to be
      notified to the switch device drivers.
      
      Drivers can react to this error state by doing a phy-down on the
      switch-port which would result in a carrier-off locally and on the directly
      connected switch. Doing that would prevent loops and black-holes in the
      network.
      
      One such use case is the multi-chassis LAG application -
      
      1. The MLAG application runs on peer switches (say Switch0 and Switch1)
         synchronizing states, forwarding entries etc. between the two
         switches over the peer-link (this is a link directly connecting the
         two switches).
      2. An MLAG election process designates one of the switches as a primary
         (e.g. Switch0 is primary and Switch1 is secondary).
      3. The peer link plays a critical role in allowing Switch0-Switch1 to
         function as a single LAG partner to the downstream dual-connected
         servers. When the peer-link between the switches goes down we have a
         split-brain situation. Switch0 and Switch1 are no longer in sync and
         are acting independently. This can result in traffic loops and
         traffic black-holing in the network.
      4. To prevent these problems the MLAG application on the secondary
         switch phy-downs the MLAG ports on detecting the peer-link down.
         This will be seen as a carrier down on servers that are
         dual-connected to Switch0 and Switch1.
      5. Specifically a dual-connected server will see a carrier-down on the
         port connected to the MLAG secondary, Switch1, and will stop using
         that port for traffic TX. So traffic black holing is prevented.
      
      v6 to v7:
         Removed some unnecessary code in response to review comments.
      
      v5 to v6:
         Replaced proto_flags with a simple proto_down boolean attribute in
         response to Dave's comments.
      
      v4 to v5:
         Changed the ip link display format for protodown to match the set as
         recommended by Stephen.
      
      v3 to v4:
         I have moved protodown out of IFF_XXX and introduced a separate
         proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
         switch port errors. This is in response to Stephen's comments that
         adding a new IFF_XXX may break user space.
      
         I have used rocker as the sample switch driver. And to test this
         functionality I used the qemu-rocker patch that Scott sent out in
         response to the v3 posting (needed to set link up/down when phy is
         enabled/disabled).
      
      v1 to v2:
         Based on Dave's suggestion I have moved out aggregating of error bits
         across applications to a user space framework. This patch now simply
         notifies an aggregated error bit to drivers enabling them to handle
         the error gracefully.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
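
The MLAG steps in the cover letter above can be modelled in a few lines. This is a userspace sketch only, not the kernel or rocker code: the `Switch` class, its method names and the port names are hypothetical, and clearing proto_down again when the peer link returns is an assumption the cover letter does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Switch:
    """Toy model of one MLAG peer; not a real driver or daemon API."""
    name: str
    role: str                      # "primary" or "secondary"
    mlag_ports: List[str]
    proto_down: Dict[str, bool] = field(default_factory=dict)

    def on_peer_link_change(self, peer_link_up: bool) -> None:
        # Only the secondary takes its MLAG ports down on split-brain;
        # the primary keeps forwarding.
        want_down = (self.role == "secondary") and not peer_link_up
        for port in self.mlag_ports:
            # Assumption: proto_down is cleared once the peer link is back.
            self.proto_down[port] = want_down

    def carrier_up(self, port: str) -> bool:
        # The driver reacts to proto_down with a phy-down, so the
        # attached server sees carrier-off on that port.
        return not self.proto_down.get(port, False)


sw0 = Switch("Switch0", "primary", ["swp1", "swp2"])
sw1 = Switch("Switch1", "secondary", ["swp1", "swp2"])
for sw in (sw0, sw1):
    sw.on_peer_link_change(peer_link_up=False)
```

After the peer link drops, a dual-connected server in this model still sees carrier on its Switch0 port and loses it on its Switch1 port, which is the behaviour steps 4 and 5 describe.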
    • rocker: Handle protodown notifications. · c3055246
      Anuradha Karuppiah authored
      protodown can be set by user space applications like MLAG on detecting
      errors on a switch port. This patch provides sample switch driver changes
      for handling protodown. Rocker PHYS disables the port in response to
      protodown.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net core: Add protodown support. · d746d707
      Anuradha Karuppiah authored
      This patch introduces the proto_down flag that can be used by user space
      applications to notify switch drivers that errors have been detected on the
      device.
      
      The switch driver can react to protodown notification by doing a phys down
      on the associated switch port.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmveth: add support for TSO6 · 07e6a97d
      Thomas Falcon authored
      This patch adds support for a new method of signalling the firmware
      that TSO packets are being sent. The new method removes the need to
      alter the ip and tcp checksums and allows TSO6 support.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Add close of RNDIS filter into change mtu call · 2de8530b
      Haiyang Zhang authored
      The current change mtu call only stops tx before removing the RNDIS
      filter. If the ring buffer is not empty, rndis_filter_device_remove()
      may hang while removing the buffers.
      
      This patch closes the RNDIS filter before removing it, and adds a
      gradual waiting loop that runs until the ring is empty. This fixes the
      change_mtu hang seen under heavy traffic.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
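
As a rough illustration of a "gradual waiting loop until the ring is empty", the sketch below polls a caller-supplied predicate with a bounded number of retries. It is not the hv_netvsc code: the function name, the retry limit and the doubling delay are all assumptions chosen for the example.

```python
import time
from typing import Callable


def wait_for_ring_drain(
    ring_is_empty: Callable[[], bool],
    max_retries: int = 10,
    first_delay_s: float = 0.001,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the ring is empty, doubling the delay between polls.

    Returns True if the ring drained, False if the retries ran out.
    """
    delay = first_delay_s
    for _ in range(max_retries):
        if ring_is_empty():
            return True
        sleep(delay)
        delay *= 2  # back off gradually instead of spinning
    return ring_is_empty()
```

Bounding the retries means the caller can decide what to do if the ring never drains, rather than blocking forever.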
    • ipv6: Fix finding best source address in ipv6_dev_get_saddr(). · c0b8da1e
      YOSHIFUJI Hideaki/吉藤英明 authored
      Commit 9131f3de ("ipv6: Do not iterate over all interfaces when
      finding source address on specific interface.") did not properly
      update the best available source address.  It also introduced a
      possible NULL pointer dereference.
      
      Bug was reported by Erik Kline <ek@google.com>.
      Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.
      
      Fixes: 9131f3de ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.")
      Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Acked-by: Hajime Tazaki <thehajime@gmail.com>
      Acked-by: Erik Kline <ek@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 9243b25b
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-07-14
      
      This series contains updates to i40e and i40evf only.
      
      Joe Stringer and Jesse Gross add a ndo_features_check function to ensure
      that the i40e driver does not try to offload packets that exceed 80 bytes
      in length.
      
      Anjali adds additional stats to track flow director ATR and SB current
      state and flow director flush count, which will help reduce the need for
      verbose debug logs with respect to flow director.  Also refines an error
      message to avoid confusion, so that it indicates what may have really
      happened when the init_shared_code() call fails.
      
      Pawel adds new fields to the capabilities structures to handle Flex-10
      device/function capabilities which is needed to support Flex-10 configs.
      
      Jesse improves the transmit performance by adding a prefetch for the
      next transmit descriptor to be used when we know there are more coming.
      
      Mitch modifies the i40evf driver to handle an abundance of vectors.
      Currently the driver only maps transmit and receive queues to a single
      MSI-X vector per queue if there are exactly enough vectors for this; if
      there are too many vectors, the check fails and queues are allocated to
      vectors in a suboptimal manner.  So change the condition check to allow
      for an excess number of vectors, and leave the extras unused.  Also
      update the driver to just return success if the user attempts to set a
      port VLAN on a VF that already has the same port VLAN configured,
      instead of going through unnecessary filter removals and adds.  Fix the
      MAC filters for VFs, which were being programmed with 0 for the VLAN
      value when there was no VLAN assigned; -1 must be used instead to
      indicate that no VLAN is in use.  Fix the VF disable code, which was not
      properly cleaning up the VF and would leave it in an indeterminate
      state, by notifying the VF and then calling the normal VF reset
      routine.  Fix the logic in the driver so that MAC filters are added and
      removed correctly, and add a check for the driver's hardware MAC
      address so that this filter does not get removed incorrectly.
      
      Carolyn removes incorrect #ifdefs which should not have been added in
      the first place and, with the #ifdefs removed, makes the necessary
      changes in the driver to resolve compile errors.
      
      Greg updates the admin queue command header defines.
      
      v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
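
The i40evf vector change described in the series summary above amounts to relaxing an equality test. The sketch below models that idea in userspace; `map_queues_to_vectors` is a hypothetical helper, and the round-robin fallback used when vectors are scarce is an assumption, not the driver's actual policy.

```python
from typing import Dict, List


def map_queues_to_vectors(num_queues: int, num_vectors: int) -> Dict[int, List[int]]:
    """Return {vector index: [queue indices]} for a toy VF.

    With at least one vector per queue, each queue gets its own vector
    and surplus vectors stay unused. Otherwise queues are shared
    round-robin across the available vectors.
    """
    if num_vectors <= 0:
        raise ValueError("need at least one vector")
    if num_vectors >= num_queues:   # the stricter check would be '=='
        return {q: [q] for q in range(num_queues)}
    mapping: Dict[int, List[int]] = {v: [] for v in range(num_vectors)}
    for q in range(num_queues):
        mapping[q % num_vectors].append(q)
    return mapping
```

With four queues and six vectors, this model returns one vector per queue and simply never references vectors 4 and 5.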
    • pkt_sched: sch_qfq: remove unused member of struct qfq_sched · 40bdc536
      Andrea Parri authored
      The member (u32) "num_active_agg" of struct qfq_sched has been unused
      since its introduction in 462dbc91
      "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
      there is no active plan to use it; this removes the member.
      Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
      Acked-by: Paolo Valente <paolo.valente@unimore.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qlcnic: Deletion of unnecessary memset · e29dd443
      Christophe Jaillet authored
      There is no need to memset memory allocated with vzalloc.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gianfar_rx_sg' · 9061cb02
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      gianfar: Add Rx S/G
      
      This patch-set introduces scatter/gather support
      on the Rx side, addressing Rx path performance
      issues in the driver.
      Thanks.
      
      As an example, two boards connected back-to-back
      were used to measure the throughput, running the
      same kernel 4.1, before and after applying these
      patches.
      The netperf UDP_STREAM results below show that the
      bottleneck lies on the Rx side BEFORE applying the
      patches, and that the Rx throughput is even lower
      with a larger MTU.  AFTER applying the patches the
      Rx bottleneck is gone (Rx throughput matches the
      Tx one) and the RX throughput is not influenced by
      MTU size any longer (as expected).
      
      BEFORE:
      
      1) MTU 1500 (default)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20119124      0      549.4     100.00   14.911
      163840           150.00    14057349             383.9     100.00   14.911
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23654013      0       80.7     100.00   101.463
      163840           150.00    15875288              54.2     100.00   101.463
      
      2) MTU 8000
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20067232      0      548.0     100.00   14.950
      163840           150.00    6113498             166.9     99.95    14.942
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23621279      0       80.6     100.00   101.604
      163840           150.00    5868602              20.0     99.96    101.563
      
      AFTER:
      (both MTU 1500 and MTU 8000)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    19914969      0      543.8     100.00   15.064
      163840           150.00    19914969             543.8     99.35    14.966
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23433989      0       80.0     100.00   102.416
      163840           150.00    23433989              80.0     99.62    102.023
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Add paged allocation and Rx S/G · 75354148
      Claudiu Manoil authored
      The eTSEC h/w is capable of scatter/gather on the receive side
      too if MAXFRM > MRBLR, that is, when the allowed maximum Rx frame size
      is set to be greater than the maximum Rx buffer size (MRBLR).
      It's about time the driver makes use of this h/w capability,
      by supporting fixed buffer sizes and Rx S/G.
      
      The buffer size given to eTSEC for reception is fixed to
      1536B (must be multiple of 64), which is the same default
      buffer size as before, used to accommodate standard MTU
      (1500B) size frames.  As before, eTSEC can receive frames of
      up to 9600B.  Individual Rx buffers are mapped to page halves
      (page size for eTSEC systems is 4KB).  The skb is built around
      the first buffer of a frame (using build_skb()).  In case the
      frame spans multiple buffers, the trailing buffers are added
      as Rx fragments to the skb.  The last buffer in frame is marked
      by the L status flag.  A mechanism is in place to reuse the pages
      owned by the driver (for Rx) for subsequent receptions.
      
      Supporting fixed size buffers allows the implementation of Rx S/G,
      which in turn removes the memory pressure issues the driver had
      before when MTU was set for jumbo frame reception.
      Also, in most cases, the Rx path becomes faster due to Rx page
      reuse, since the overhead of allocating new rx buffers is removed
      from the fast path.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
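
To make the buffer arithmetic in the commit message above concrete, here is a small model of how a received frame would spread over fixed 1536B buffers. The constants come from the commit text; `split_frame` is a hypothetical helper, and the model ignores any padding or FCS handling the hardware may do.

```python
RX_BUF_SIZE = 1536      # fixed buffer size given to eTSEC (multiple of 64)
PAGE_SIZE = 4096        # page size on eTSEC systems, per the commit text
MAX_FRAME = 9600        # largest frame eTSEC can receive


def split_frame(frame_len: int) -> list:
    """Return the byte count carried by each Rx buffer of one frame.

    The first entry corresponds to the buffer the skb is built around;
    any further entries would be attached as Rx fragments.
    """
    if not 0 < frame_len <= MAX_FRAME:
        raise ValueError("frame length out of range")
    full, rest = divmod(frame_len, RX_BUF_SIZE)
    return [RX_BUF_SIZE] * full + ([rest] if rest else [])


# Each buffer fits in half a page, so one page backs two buffers.
assert RX_BUF_SIZE <= PAGE_SIZE // 2
```

A standard 1500B frame stays in a single buffer, while a 9600B jumbo frame spans seven buffers in this model: six full ones and a 384B tail.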
    • gianfar: Use ndev, more Rx path cleanup · f23223f1
      Claudiu Manoil authored
      Use "ndev" instead of "dev", as the rx queue back pointer
      to a net_device struct, to avoid a name clash with a
      "struct device" reference.  This prepares for the addition of a
      "struct device" back pointer to the rx queue structure.
      
      Remove duplicated rxq registration in the process.
      Move napi_gro_receive() outside gfar_process_frame().
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix and cleanup rxbd status handling · f966082e
      Claudiu Manoil authored
      There are several (long standing) problems with how the status
      field of the rx buffer descriptor (rxbd) is currently handled on
      the error path:
      - too many unnecessary 16bit reads of the two halves of the rxbd
      status field (32bit), also resulting in overuse of endianness
      conversion macros;
      - "bdp->status = RXBD_LARGE" makes no sense, since the "large"
      flag is read only (only eTSEC can write it), and trying to clear
      the other status bits is also error prone in this context
      (most of the rx status bits are read only anyway).
      
      This is fixed with a single 32bit read of the "status" field,
      and then the appropriate 16bit shifting is applied to access
      the various status bits or the rx frame length. Also corrected
      the use of the RXBD_LARGE flag.
      
      Additional fix:
      "rx_over_errors" stat is incremented instead of "rx_crc_errors"
      in case of RXBD_OVERRUN occurrence.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
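
The "single 32bit read, then 16bit shifting" approach from the commit message above can be sketched as follows. This is an illustration rather than the driver code: it assumes the status bits sit in the upper half of a big-endian word and the frame length in the lower half, and the flag values are placeholders that should be checked against the driver header.

```python
import struct

# Illustrative flag values; treat them as assumptions, not the driver's header.
RXBD_LAST = 0x0800
RXBD_LARGE = 0x0080
RXBD_OVERRUN = 0x0002


def parse_rxbd(raw: bytes):
    """Split one big-endian 32-bit descriptor word into (status, length)."""
    (lstatus,) = struct.unpack(">I", raw)   # single 32-bit read
    status = lstatus >> 16                  # upper half: status bits (assumed)
    length = lstatus & 0xFFFF               # lower half: frame length (assumed)
    return status, length


def count_error(status: int, stats: dict) -> None:
    """Bump the counter the commit message names for an overrun."""
    if status & RXBD_OVERRUN:
        stats["rx_over_errors"] = stats.get("rx_over_errors", 0) + 1
```

Reading the word once and deriving both fields from it avoids the repeated 16-bit reads and endianness conversions the commit message complains about.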
    • gianfar: Bundle Rx allocation, cleanup · 76f31e8b
      Claudiu Manoil authored
      Use a more common consumer/producer index design to improve
      rx buffer allocation.  Instead of allocating a single new buffer
      (skb) on each iteration, bundle the allocation of several rx
      buffers at a time.  This also opens the path for further memory
      optimizations.
      
      Remove useless check of rxq->rfbptr, since this patch touches
      rx pause frame handling code as well.  rxq->rfbptr is always
      initialized as part of Rx BD ring init.
      Remove redundant (and misleading) 'amount_pull' parameter.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
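
A consumer/producer index ring with bundled refill, as described in the commit message above, might look like the toy model below. The class and the field names `next_to_use` and `next_to_clean` are assumptions borrowed from common driver naming, and keeping one slot empty to tell a full ring from an empty one is one conventional choice, not necessarily what gianfar does.

```python
class RxRing:
    """Toy Rx ring with producer (next_to_use) and consumer (next_to_clean) indices."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.next_to_use = 0      # producer: next slot to give a fresh buffer
        self.next_to_clean = 0    # consumer: next slot to process
        self.buffers = [None] * size

    def unused_slots(self) -> int:
        # One slot is kept empty so that full and empty states differ.
        return (self.next_to_clean - self.next_to_use - 1) % self.size

    def refill(self, alloc) -> int:
        """Allocate buffers for every unused slot in one pass."""
        count = self.unused_slots()
        for _ in range(count):
            self.buffers[self.next_to_use] = alloc()
            self.next_to_use = (self.next_to_use + 1) % self.size
        return count

    def consume(self):
        """Take one filled buffer, or None if the ring is empty."""
        if self.next_to_clean == self.next_to_use:
            return None
        buf = self.buffers[self.next_to_clean]
        self.buffers[self.next_to_clean] = None
        self.next_to_clean = (self.next_to_clean + 1) % self.size
        return buf
```

Calling `refill()` once after processing a batch replaces several buffers at a time, instead of allocating one buffer per received frame.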
  3. 15 Jul, 2015 16 commits
  4. 14 Jul, 2015 2 commits
  5. 13 Jul, 2015 3 commits
    • bridge: mdb: add vlan support for user entries · 74fe61f1
      Nikolay Aleksandrov authored
      Until now all user mdb entries were added in vlan 0; this patch adds
      support to allow the user to specify the vlan for the entry.
      For the uapi change, a hole in struct br_mdb_entry is used so the size
      and offsets are kept the same (verified with pahole and tested with older
      iproute2).
      
      Example:
      $ bridge mdb
      dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
      dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
      dev br0 port eth1 grp 239.0.0.1 permanent
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ebpf: remove self-assignment in interpreter's tail call · c4675f93
      Daniel Borkmann authored
      ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1]
      and thus has no effect. Add a comment instead, explaining what happens and
      why it's okay to just remove it. Since from user space side, a tail call is
      invoked as a pseudo helper function via bpf_tail_call_proto, the verifier
      checks the arguments just like with any other helper function and makes
      sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Build IPv6 into kernel by default · de551f2e
      Tom Herbert authored
      This patch makes the default to build IPv6 into the kernel. IPv6
      now has significant traction and any remaining vestiges of IPv6
      not being provided parity with IPv4 should be swept away. IPv6 is now
      core to the Internet and kernel.
      
      Points on IPv6 adoption:
      
      - Per Google statistics, IPv6 usage has reached 7% on the Internet
        and continues to exhibit an exponential growth rate
        https://www.google.com/intl/en/ipv6/statistics.html
      - Just a few days ago ARIN officially depleted its IPv4 pool
      - IPv6 only data centers are being successfully built
        (e.g. at Facebook)
      
      This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6
      is set to "y" and the text has been updated to reflect the maturity of
      IPv6.
      
      Impact:
      
      Under some circumstances building modules into the kernel might have a
      performance advantage. In my testing, I did notice a very slight
      improvement.
      
      This will obviously increase the size of the kernel image. In my
      configuration I see:
      
      IPv6 as module:
      
         text    data     bss      dec     hex filename
      9436490 1879600  913408 12229498  ba9b7a vmlinux
      
      IPv6 built into kernel:
      
         text    data     bss      dec     hex filename
      9703666 1899288  933888 12536842  bf4c0a vmlinux
      
      Which increases text size by ~270K (2.8% increase in size for me). If
      image size is an issue, presumably for a device which does not do IP
      networking (IMO we should be discouraging IPv4-only devices), IPV6 can
      be disabled or still built as a module.
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
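
The "~270K (2.8%)" figure in the commit message above can be checked directly from the two `text` sizes it quotes. The helper name below is made up for the example; the two inputs are the numbers from the size output.

```python
def text_growth(a: int, b: int):
    """Return (absolute difference, percent growth over the smaller value)."""
    small, large = sorted((a, b))
    delta = large - small
    return delta, 100.0 * delta / small


delta, pct = text_growth(9703666, 9436490)
print(f"{delta} bytes, {pct:.1f}%")   # prints: 267176 bytes, 2.8%
```

The difference of 267176 bytes matches the rounded "~270K" in the message.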