1. 21 Jul, 2015 13 commits
    • Jon Paul Maloy's avatar
      tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Jon Paul Maloy authored
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modify the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22d85c79
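The ownership rule described in this commit can be modelled in a few lines. This is a minimal sketch, not the kernel code: the function names, the return codes and the string-valued link state are all hypothetical stand-ins chosen for illustration.

```python
from collections import deque

# Hypothetical return codes for this sketch; these are not the kernel's values.
OK, ELINKCONG, EHOSTUNREACH = 0, -1, -2

def link_xmit(buffers, link_state):
    """Consume `buffers` only on success; otherwise leave them with the caller."""
    if link_state == "down":
        return EHOSTUNREACH      # list untouched
    if link_state == "congested":
        return ELINKCONG         # list untouched
    buffers.clear()              # success: the link has taken over the buffers
    return OK

def socket_send(buffers, link_state):
    """Caller side: purge the list on any non-congestion error."""
    rc = link_xmit(buffers, link_state)
    if rc not in (OK, ELINKCONG):
        buffers.clear()          # the sender still owns the list, so it cleans up
    return rc

queue = deque([b"msg1", b"msg2"])
rc = socket_send(queue, "congested")
print(rc, len(queue))            # the list survives congestion for a later retry
```

The point of the design is that the transmit function has one rule (consume on success only), and every cleanup decision moves to the caller.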
    • Jon Paul Maloy's avatar
      tipc: use bearer index when looking up active links · 36e78a46
      Jon Paul Maloy authored
      struct tipc_node currently holds two arrays of link pointers; one,
      indexed by bearer identity, which contains all links irrespective of
      current state, and one two-slot array for the currently active link
      or links. The latter array contains direct pointers into the elements
      of the former. This has the effect that we cannot know the bearer id of
      a link when accessing it via the "active_links[]" array without actually
      dereferencing the pointer, something we want to avoid in some cases.
      
      In this commit, we do instead store the bearer identity in the
      "active_links" array, and use this as an index to find the right element
      in the overall link entry array. This change should be seen as a
      preparation for the later commits in this series.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36e78a46
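A small model of the lookup change described above, assuming a hypothetical `Node` class, an arbitrary bearer count and a made-up sentinel value. It only illustrates the idea of storing bearer identities rather than pointers in the two-slot array.

```python
MAX_BEARERS = 3          # assumed size, for the sketch only
INVALID_BEARER_ID = -1   # hypothetical sentinel for an empty slot

class Node:
    def __init__(self):
        # All links, indexed by bearer identity, irrespective of state.
        self.links = [None] * MAX_BEARERS
        # Two slots holding bearer identities instead of direct pointers.
        self.active_links = [INVALID_BEARER_ID, INVALID_BEARER_ID]

    def active_link(self, slot):
        """Resolve an active slot to a link via the bearer-indexed array."""
        bearer_id = self.active_links[slot]
        if bearer_id == INVALID_BEARER_ID:
            return None
        return self.links[bearer_id]

node = Node()
node.links[2] = "link-on-bearer-2"
node.active_links[0] = 2
# The bearer id is now known without touching the link object itself.
print(node.active_links[0], node.active_link(0))
```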
    • Jon Paul Maloy's avatar
      tipc: move link input queue to tipc_node · d39bbd44
      Jon Paul Maloy authored
      At present, the link input queue and the name distributor receive
      queues are fields aggregated in struct tipc_link. This is a hazard,
      because a link might be deleted while a receiving socket still keeps
      reference to one of the queues.
      
      This commit fixes this bug. However, rather than adding yet another
      reference counter to the critical data path, we move the two queues
      to safe ground inside struct tipc_node, which is already protected, and
      let the link code only handle references to the queues. This is also
      in line with planned later changes in this area.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d39bbd44
    • Jon Paul Maloy's avatar
      tipc: move link creation from neighbor discoverer to node · d3a43b90
      Jon Paul Maloy authored
      As a step towards turning links into node internal entities, we move the
      creation of links from the neighbor discovery logic to the node's link
      control logic.
      
      We also create an additional entry for the link's media address in the
      newly introduced struct tipc_link_entry, since this is where it is
      needed in the upcoming commits. The current copy in struct tipc_link
      is kept for now, but will be removed later.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3a43b90
    • Jon Paul Maloy's avatar
      tipc: introduce link entry structure to struct tipc_node · 9d13ec65
      Jon Paul Maloy authored
      struct 'tipc_node' currently contains two arrays for link attributes,
      one for the link pointers, and one for the usable link MTUs.
      
      We now group those into a new struct 'tipc_link_entry', and introduce
      one single array consisting of such entries. Apart from being a cosmetic
      improvement, this is a starting point for the strict master-slave
      relation between node and link that we will introduce in the following
      commits.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d13ec65
    • Jiri Benc's avatar
      net: remove skb_frag_add_head · 6acc2326
      Jiri Benc authored
      It's not used anywhere.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6acc2326
    • David S. Miller's avatar
      Merge branch 'offload_fwd_mark' · bd265242
      David S. Miller authored
      Scott Feldman says:
      
      ====================
      switchdev: avoid duplicate packet forwarding
      
      v3:
      
       - Per Nicolas Dichtel review: remove errant empty union.
      
      v2:
      
       - Per davem review: in sk_buff, union fwd_mark with secmark to save space
         since features appear to be mutually exclusive.
       - Per Simon Horman review:
         - fix grammar in switchdev.txt wrt fwd_mark
         - remove some unrelated changes that snuck in
      
      v1:
      
      This patchset was previously submitted as RFC.  No changes from the last
      version (v2) sent under RFC.  Including RFC version history here for reference.
      
      RFC v2:
      
       - s/fwd_mark/offload_fwd_mark
       - use consume_skb rather than kfree_skb when dropping pkt on egress.
       - Use Jiri's suggestion to use ifindex of one of the ports in a group
         as the mark for all the ports in the group.  This can be done with
         no additional storage (no hashtable from v1).  To pull it off, we
         need some simple recursive routines to walk the netdev tree ensuring
         all leaves in the tree (ports) in the same group (e.g. bridge)
         belonging to the same switch device will have the same offload fwd mark.
         Maybe someone sees a better design for the recursive routines?  They're
         not too bad, and should cover the stacked driver cases.
      
      RFC v1:
      
      With switchdev support for offloading L2/L3 forwarding data path to a
      switch device, we have a general problem where both the device and the
      kernel may forward the packet, resulting in duplicate packets on the wire.
      Anytime a packet is forwarded by the device and a copy is sent to the CPU,
      there is potential for duplicate forwarding, as the kernel may also do a
      forwarding lookup and send the packet on the wire.
      
      The specific problem this patch series is interested in solving is avoiding
      duplicate packets on bridged ports.  There was a previous RFC from Roopa
      (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
      problem, but didn't solve the problem of mixed ports in the bridge from
      different devices; there was no way to exclude some ports from forwarding
      and include others.  This RFC solves that problem by tagging the ingressing
      packet with a unique mark, and then comparing the packet mark with the
      egress port mark, and skip forwarding when there is a match.  For the mixed
      ports bridge case, only those ports with matching marks are skipped.
      
      The switchdev port driver must do two things:
      
      1) Generate a fwd_mark for each switch port, using some unique key of the
         switch device (and optionally port).  This is done when the port netdev
         is registered or if the port's group membership changes (joins/leaves
         a bridge, for example).
      
      2) On packet ingress from port, mark the skb with the ingress port's
         fwd_mark.  If the device supports it, it's useful to only mark skbs
         which were already forwarded by the device.  If the device does not
         support such indication, all skbs can be marked, even if they're
         local dst.
      
      Two new 32-bit fields are added to struct sk_buff and struct netdevice to
      hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
      tried using skb->mark for this purpose, but ebtables can overwrite the
      skb->mark before the bridge gets it, so that will not work.
      
      In general, this fwd_mark can be used for any case where a packet is
      forwarded by the device and a copy is sent to the CPU, to avoid the kernel
      re-forwarding the packet.  sFlow is another use-case that comes to mind,
      but I haven't explored the details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd265242
    • Scott Feldman's avatar
      rocker: add offload_fwd_mark support · 3f98a8e6
      Scott Feldman authored
      If device flags ingress packet as "fwd offload", mark the
      skb->offload_fwd_mark using the ingress port's dev->offload_fwd_mark.  This
      will be the hint to the kernel that this packet has already been forwarded
      by device to egress ports matching skb->offload_fwd_mark.
      
      For rocker, derive port dev->offload_fwd_mark based on device switch ID and
      port ifindex.  If port is bridged, use the bridge ifindex rather than the
      port ifindex.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f98a8e6
    • Scott Feldman's avatar
      switchdev: add offload_fwd_mark generator helper · 1a3b2ec9
      Scott Feldman authored
      skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
      unique for device and may even be unique for a sub-set of ports within
      device, so add switchdev helper function to generate unique marks based on
      port's switch ID and group_ifindex.  group_ifindex would typically be the
      container dev's ifindex, such as the bridge's ifindex.
      
      The generator uses a global hash table to store offload_fwd_marks hashed by
      {switch ID, group_ifindex} key.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a3b2ec9
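The generator described above keeps one mark per {switch ID, group_ifindex} key in a global table. The sketch below shows that idea only: the function name is hypothetical, and handing out sequential integers is an assumption of this example, not a claim about how the kernel helper computes its marks.

```python
import itertools

_marks = {}                       # global table keyed by (switch_id, group_ifindex)
_next_mark = itertools.count(1)   # start at 1 so that 0 can mean "not marked"

def get_offload_fwd_mark(switch_id, group_ifindex):
    """Return the mark for this key, creating it on first use."""
    key = (switch_id, group_ifindex)
    if key not in _marks:
        _marks[key] = next(_next_mark) & 0xFFFFFFFF   # marks are 32-bit
    return _marks[key]

# Ports of the same switch in the same bridge (ifindex 10) share a mark;
# a different group on the same switch gets a different one.
same_bridge_a = get_offload_fwd_mark(b"switch-0", 10)
same_bridge_b = get_offload_fwd_mark(b"switch-0", 10)
other_bridge = get_offload_fwd_mark(b"switch-0", 11)
print(same_bridge_a == same_bridge_b, same_bridge_a != other_bridge)
```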
    • Scott Feldman's avatar
      net: don't reforward packets already forwarded by offload device · 0c4f691f
      Scott Feldman authored
      Just before queuing skb for xmit on port, check if skb has been marked by
      switchdev port driver as already forwarded by device.  If so, drop skb.  A
      non-zero skb->offload_fwd_mark field is set by the switchdev port
      driver/device on ingress to indicate the skb has already been forwarded by
      the device to egress ports with matching dev->offload_fwd_mark.  The switchdev port
      driver would assign a non-zero dev->offload_fwd_mark for each device port
      netdev during registration, for example.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c4f691f
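The egress check in this commit comes down to one comparison: skip the port when the packet carries a non-zero mark equal to the port's mark. A minimal sketch follows, with made-up port names and mark values; the helper names are hypothetical.

```python
def should_forward(skb_mark, egress_port_mark):
    """False when the device already forwarded this packet to that port."""
    already_forwarded = skb_mark != 0 and skb_mark == egress_port_mark
    return not already_forwarded

# Mixed bridge: two ports of one switch device share mark 7, while eth0 is a
# plain NIC with no mark (0). Names and values are invented for illustration.
bridge_ports = {"swp1": 7, "swp2": 7, "eth0": 0}

def egress_ports(ingress_port, skb_mark):
    """Ports the software bridge would still send a flooded packet to."""
    return [port for port, mark in bridge_ports.items()
            if port != ingress_port and should_forward(skb_mark, mark)]

# A packet the device already flooded to swp2 is only sent on to eth0.
print(egress_ports("swp1", 7))
```

Because an unmarked packet has mark 0, it never matches, so ports from other devices and packets the hardware did not forward keep their normal software path.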
    • Simon Horman's avatar
      rocker: forward packets to CPU when port is joined to openvswitch · 8254973f
      Simon Horman authored
      Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
      There is scope to later refine what is passed up as per Open vSwitch flows
      on a port.
      
      This does not change the behaviour of rocker ports that are
      not joined to Open vSwitch.
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8254973f
  2. 20 Jul, 2015 11 commits
  3. 16 Jul, 2015 16 commits
    • David S. Miller's avatar
      Merge branch 'protodown' · 0d057881
      David S. Miller authored
      Anuradha Karuppiah says:
      
      ====================
      net: Introduce protodown flag.
      
      User space daemons can detect errors in the network that need to be
      notified to the switch device drivers.
      
      Drivers can react to this error state by doing a phy-down on the
      switch-port which would result in a carrier-off locally and on the directly
      connected switch. Doing that would prevent loops and black-holes in the
      network.
      
      One such use case is the multi-chassis LAG application -
      
      1. The MLAG application runs on peer switches (say Switch0 and Switch1)
         synchronizing states, forwarding entries etc. between the two
         switches over the peer-link (this is a link directly connecting the
         two switches).
      2. An MLAG election process designates one of the switches as a primary
         (e.g. Switch0 is primary and Switch1 is secondary).
      3. The peer link plays a critical role in allowing Switch0-Switch1 to
         function as a single LAG partner to the downstream dual-connected
         servers. When the peer-link between the switches goes down we have a
         split-brain situation. Switch0 and Switch1 are no longer in sync and
         are acting independently. This can result in traffic loops and
         traffic black-holing in the network.
      4. To prevent these problems the MLAG application on the secondary
         switch phy-downs the MLAG ports on detecting the peer-link down.
         This will be seen as a carrier down on servers that are
         dual-connected to Switch0 and Switch1.
      5. Specifically a dual-connected server will see a carrier-down on the
         port connected to the MLAG secondary, Switch1, and will stop using
         that port for traffic TX. So traffic black holing is prevented.
      
      v6 to v7:
         Removed some unnecessary code in response to review comments.
      
      v5 to v6:
         Replaced proto_flags with a simple proto_down boolean attribute in
         response to Dave's comments.
      
      v4 to v5:
         Changed the ip link display format for protodown to match the set as
         recommended by Stephen.
      
      v3 to v4:
         I have moved protodown out of IFF_XXX and introduced a separate
         proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
         switch port errors. This is in response to Stephen's comments that
         adding a new IFF_XXX may break user space.
      
         I have used rocker as the sample switch driver. And to test this
         functionality I used the qemu-rocker patch that Scott sent out in
         response to the v3 posting (needed to set link up/down when phy is
         enabled/disabled).
      
      v1 to v2:
         Based on Dave's suggestion I have moved out aggregating of error bits
         across applications to a user space framework. This patch now simply
         notifies an aggregated error bit to drivers enabling them to handle
         the error gracefully.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d057881
    • Anuradha Karuppiah's avatar
      rocker: Handle protodown notifications. · c3055246
      Anuradha Karuppiah authored
      protodown can be set by user space applications like MLAG on detecting
      errors on a switch port. This patch provides sample switch driver changes
      for handling protodown. Rocker PHYS disables the port in response to
      protodown.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3055246
    • Anuradha Karuppiah's avatar
      net core: Add protodown support. · d746d707
      Anuradha Karuppiah authored
      This patch introduces the proto_down flag that can be used by user space
      applications to notify switch drivers that errors have been detected on the
      device.
      
      The switch driver can react to protodown notification by doing a phys down
      on the associated switch port.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d746d707
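The protodown flow, as the series describes it, can be sketched as a toy model: user space sets a boolean on the port, and the driver reacts by taking the PHY down, which the link partner sees as carrier loss. The class and function names below are hypothetical, and the MLAG role handling is a simplified reading of the cover letter's use case.

```python
class SwitchPort:
    """Toy model of a switch port with a proto_down attribute."""

    def __init__(self, name):
        self.name = name
        self.proto_down = False
        self.phy_enabled = True

    def set_proto_down(self, down):
        # Driver reaction: protodown set means the PHY is disabled.
        self.proto_down = bool(down)
        self.phy_enabled = not self.proto_down

    @property
    def carrier(self):
        # What a directly connected server observes on its end of the cable.
        return self.phy_enabled

def on_peer_link_down(role, mlag_ports):
    """Only the MLAG secondary takes its dual-connected ports down."""
    if role == "secondary":
        for port in mlag_ports:
            port.set_proto_down(True)

ports = [SwitchPort("swp1"), SwitchPort("swp2")]
on_peer_link_down("secondary", ports)
print([(p.name, p.carrier) for p in ports])
```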
    • Thomas Falcon's avatar
      ibmveth: add support for TSO6 · 07e6a97d
      Thomas Falcon authored
      This patch adds support for a new method of signalling the firmware
      that TSO packets are being sent. The new method removes the need to
      alter the ip and tcp checksums and allows TSO6 support.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07e6a97d
    • Haiyang Zhang's avatar
      hv_netvsc: Add close of RNDIS filter into change mtu call · 2de8530b
      Haiyang Zhang authored
      The current change mtu call only stops tx before removing RNDIS filter.
      In case the ring buffer is not empty, rndis_filter_device_remove() may
      hang on removing the buffers.
      
      This patch adds close of RNDIS filter before removing it, also a
      gradual waiting loop until the ring is empty. The change_mtu hang
      issue under heavy traffic is solved by this patch.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2de8530b
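A "gradual waiting loop until the ring is empty" is typically a bounded poll with a growing delay. The sketch below assumes a doubling delay, a starting value of 20 ms and a limit of 10 attempts; none of these numbers come from the patch, and the function name is hypothetical. The sleep function is passed in so the loop can be exercised without real waiting.

```python
def wait_until_empty(ring_is_empty, sleep_ms, first_delay_ms=20, max_tries=10):
    """Poll `ring_is_empty()` with a doubling delay; True if the ring drained."""
    delay = first_delay_ms
    for _ in range(max_tries):
        if ring_is_empty():
            return True
        sleep_ms(delay)      # back off a little longer on every retry
        delay *= 2
    return ring_is_empty()   # one final check after the last sleep

# Simulated ring that reports "empty" on the third poll.
polls = iter([False, False, True])
delays = []
drained = wait_until_empty(lambda: next(polls), delays.append)
print(drained, delays)
```

Bounding the number of retries keeps the MTU change from hanging forever if the ring never drains.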
    • YOSHIFUJI Hideaki/吉藤英明's avatar
      ipv6: Fix finding best source address in ipv6_dev_get_saddr(). · c0b8da1e
      YOSHIFUJI Hideaki/吉藤英明 authored
      Commit 9131f3de ("ipv6: Do not iterate over all interfaces when
      finding source address on specific interface.") did not properly
      update best source address available.  Plus, it introduced
      possible NULL pointer dereference.
      
      Bug was reported by Erik Kline <ek@google.com>.
      Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.
      
      Fixes: 9131f3de ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.")
      Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Acked-by: Hajime Tazaki <thehajime@gmail.com>
      Acked-by: Erik Kline <ek@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0b8da1e
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 9243b25b
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-07-14
      
      This series contains updates to i40e and i40evf only.
      
      Joe Stringer and Jesse Gross add a ndo_features_check function to ensure
      that the i40e driver does not try to offload packets that exceed 80 bytes
      in length.
      
      Anjali adds additional stats to track flow director ATR and SB current
      state and flow director flush count, which will help reduce the need for verbose
      debug logs with respect to flow director.  Also refines an error message
      to avoid confusion, so that it indicates what may have really happened
      when the init_shared_code() call possibly fails.
      
      Pawel adds new fields to the capabilities structures to handle Flex-10
      device/function capabilities which is needed to support Flex-10 configs.
      
      Jesse improves the transmit performance by adding a prefetch for the
      next transmit descriptor to be used when we know there are more coming.
      
      Mitch modifies i40evf driver to handle/allow an abundance of vectors.
      Currently the driver only maps transmit and receive queues to a single
      MSI-X vector per queue if there are exactly enough vectors for this, but
      if we have too many vectors, it will fail and allocate queues to vectors
      in a suboptimal manner.  So change the condition check to allow for an
      excess number of vectors and simply leave the extras unused.  Also update the
      driver to just return success if the user attempts to set a port VLAN on
      a VF that already has the same port VLAN configured, instead of going
      through unnecessary filter removals & adds.  Fix the MAC filters for VFs,
      which were being programmed with 0 for the VLAN value when there was no
      VLAN assigned.  Instead, we must use -1 to indicate that no VLAN is in
      use.  Fix the VF disable code, which was not properly cleaning up the VF
      and would leave the VF in an indeterminate state, so fix this by
      notifying the VF and then call the normal VF reset routine.  Fix the
      logic in the driver so that MAC filters are added and removed correctly
      and added a check for the driver's hardware MAC address so that this
      filter does not get removed incorrectly.
      
      Carolyn removes incorrect #ifdef's which should not have been added in
      the first place and with the #ifdef's removed, make the necessary
      changes in the driver to resolve compile errors.
      
      Greg updates the admin queue command header defines.
      
      v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9243b25b
    • Andrea Parri's avatar
      pkt_sched: sch_qfq: remove unused member of struct qfq_sched · 40bdc536
      Andrea Parri authored
      The member (u32) "num_active_agg" of struct qfq_sched has been unused
      since its introduction in 462dbc91
      "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
      there is no active plan to use it; this removes the member.
      Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
      Acked-by: Paolo Valente <paolo.valente@unimore.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40bdc536
    • Christophe Jaillet's avatar
      net: qlcnic: Deletion of unnecessary memset · e29dd443
      Christophe Jaillet authored
      There is no need to memset memory allocated with vzalloc.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e29dd443
    • David S. Miller's avatar
      Merge branch 'gianfar_rx_sg' · 9061cb02
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      gianfar: Add Rx S/G
      
      This patch-set introduces scatter/gather support
      on the Rx side, addressing Rx path performance
      issues in the driver.
      Thanks.
      
      As an example, two boards connected back-to-back
      were used to measure the throughput, running the
      same kernel 4.1, before and after applying these
      patches.
      The netperf UDP_STREAM results below show that the
      bottleneck lies on the Rx side BEFORE applying the
      patches, and that the Rx throughput is even lower
      with a larger MTU.  AFTER applying the patches the
      Rx bottleneck is gone (Rx throughput matches the
      Tx one) and the RX throughput is not influenced by
      MTU size any longer (as expected).
      
      BEFORE:
      
      1) MTU 1500 (default)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20119124      0      549.4     100.00   14.911
      163840           150.00    14057349             383.9     100.00   14.911
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23654013      0       80.7     100.00   101.463
      163840           150.00    15875288              54.2     100.00   101.463
      
      2) MTU 8000
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20067232      0      548.0     100.00   14.950
      163840           150.00     6113498             166.9     99.95    14.942
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23621279      0       80.6     100.00   101.604
      163840           150.00     5868602              20.0     99.96    101.563
      
      AFTER:
      (both MTU 1500 and MTU 8000)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    19914969      0      543.8     100.00   15.064
      163840           150.00    19914969             543.8     99.35    14.966
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23433989      0       80.0     100.00   102.416
      163840           150.00    23433989              80.0     99.62    102.023
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9061cb02
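To make the netperf tables above easier to compare, the snippet below computes how much of the sent traffic was actually received in each run. The throughput figures are copied from the tables (first row of each pair is the sender, second the receiver); the labels and the helper name are this sketch's own.

```python
# (label, sender Mbit/s, receiver Mbit/s), copied from the netperf output above
RUNS = [
    ("before, MTU 1500, 512B", 549.4, 383.9),
    ("before, MTU 1500, 64B", 80.7, 54.2),
    ("before, MTU 8000, 512B", 548.0, 166.9),
    ("before, MTU 8000, 64B", 80.6, 20.0),
    ("after, 512B", 543.8, 543.8),
    ("after, 64B", 80.0, 80.0),
]

def rx_share(tx_mbps, rx_mbps):
    """Received throughput as a percentage of sent throughput."""
    return 100.0 * rx_mbps / tx_mbps

for label, tx, rx in RUNS:
    print(f"{label:24s} {rx_share(tx, rx):5.1f}% of sent traffic received")
```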
    • Claudiu Manoil's avatar
      gianfar: Add paged allocation and Rx S/G · 75354148
      Claudiu Manoil authored
      The eTSEC h/w is capable of scatter/gather on the receive side
      too if MAXFRM > MRBLR, when the allowed maximum Rx frame size
      is set to be greater than the maximum Rx buffer size (MRBLR).
      It's about time the driver makes use of this h/w capability,
      by supporting fixed buffer sizes and Rx S/G.
      
      The buffer size given to eTSEC for reception is fixed to
      1536B (must be multiple of 64), which is the same default
      buffer size as before, used to accommodate standard MTU
      (1500B) size frames.  As before, eTSEC can receive frames of
      up to 9600B.  Individual Rx buffers are mapped to page halves
      (page size for eTSEC systems is 4KB).  The skb is built around
      the first buffer of a frame (using build_skb()).  In case the
      frame spans multiple buffers, the trailing buffers are added
      as Rx fragments to the skb.  The last buffer in frame is marked
      by the L status flag.  A mechanism is in place to reuse the pages
      owned by the driver (for Rx) for subsequent receptions.
      
      Supporting fixed size buffers allows the implementation of Rx S/G,
      which in turn removes the memory pressure issues the driver had
      before when MTU was set for jumbo frame reception.
      Also, in most cases, the Rx path becomes faster due to Rx page
      reuse, since the overhead of allocating new rx buffers is removed
      from the fast path.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75354148
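The sizes quoted in this commit message can be checked with a little arithmetic. The constants come from the text (1536B buffers, 4KB pages, frames of up to 9600B); the helper name is hypothetical, and the calculation ignores any headroom or alignment the driver may reserve.

```python
RX_BUF_SIZE = 1536   # fixed Rx buffer size from the commit message
PAGE_SIZE = 4096     # 4KB pages on eTSEC systems
MAX_FRAME = 9600     # largest frame eTSEC can receive

def buffers_per_frame(frame_len, buf_size=RX_BUF_SIZE):
    """Number of fixed-size buffers a frame of `frame_len` bytes spans."""
    return -(-frame_len // buf_size)   # ceiling division

print(RX_BUF_SIZE % 64 == 0)            # buffer size is a multiple of 64
print(RX_BUF_SIZE <= PAGE_SIZE // 2)    # one buffer fits in half a page
print(buffers_per_frame(1500), buffers_per_frame(MAX_FRAME))
```

A standard-MTU frame fits in a single buffer, so only jumbo frames take the fragment path.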
    • Claudiu Manoil's avatar
      gianfar: Use ndev, more Rx path cleanup · f23223f1
      Claudiu Manoil authored
      Use "ndev" instead of "dev", as the rx queue back pointer
      to a net_device struct, to avoid name clashing with a
      "struct device" reference.  This prepares the addition of a
      "struct device" back pointer to the rx queue structure.
      
      Remove duplicated rxq registration in the process.
      Move napi_gro_receive() outside gfar_process_frame().
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f23223f1
    • Claudiu Manoil's avatar
      gianfar: Fix and cleanup rxbd status handling · f966082e
      Claudiu Manoil authored
      There are several (long standing) problems about how the status
      field of the rx buffer descriptor (rxbd) is currently handled on
      the error path:
      - too many unnecessary 16bit reads of the two halves of the rxbd
      status field (32bit), also resulting in overuse of endianness
      conversion macros;
      - "bdp->status = RXBD_LARGE" makes no sense, since the "large"
      flag is read only (only eTSEC can write it), and trying to clear
      the other status bits is also error prone in this context
      (most of the rx status bits are read only anyway).
      
      This is fixed with a single 32bit read of the "status" field,
      and then the appropriate 16bit shifting is applied to access
      the various status bits or the rx frame length. Also corrected
      the use of the RXBD_LARGE flag.
      
      Additional fix:
      "rx_over_errors" stat is incremented instead of "rx_crc_errors"
      in case of RXBD_OVERRUN occurrence.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f966082e
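The "single 32bit read, then 16bit shifting" approach can be illustrated as follows. The layout used here is an assumption of this sketch: a big-endian 32-bit word with the status bits in the upper half and the frame length in the lower half. The flag value is likewise hypothetical and only serves the example.

```python
import struct

LAST_FLAG = 0x0800   # hypothetical "last buffer in frame" bit for this sketch

def parse_rxbd_status(raw4):
    """Split one 32-bit descriptor word into (status bits, frame length)."""
    (lstatus,) = struct.unpack(">I", raw4)   # one big-endian 32-bit read
    status = lstatus >> 16                   # upper half: status flags
    length = lstatus & 0xFFFF                # lower half: rx frame length
    return status, length

status, length = parse_rxbd_status(bytes.fromhex("080005ea"))
print(hex(status), length, bool(status & LAST_FLAG))
```

Reading the word once and deriving both fields from the local copy avoids repeated descriptor accesses and repeated byte-order conversions.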
    • Claudiu Manoil's avatar
      gianfar: Bundle Rx allocation, cleanup · 76f31e8b
      Claudiu Manoil authored
      Use a more common consumer/producer index design to improve
      rx buffer allocation.  Instead of allocating a single new buffer
      (skb) on each iteration, bundle the allocation of several rx
      buffers at a time.  This also opens the path for further memory
      optimizations.
      
      Remove useless check of rxq->rfbptr, since this patch touches
      rx pause frame handling code as well.  rxq->rfbptr is always
      initialized as part of Rx BD ring init.
      Remove redundant (and misleading) 'amount_pull' parameter.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76f31e8b
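A consumer/producer index pair with bundled refills might look like the sketch below. The class, its field names and the refill threshold are hypothetical; the example only shows the pattern of deferring allocation until several slots can be refilled in one pass, and it assumes the threshold is smaller than the ring size.

```python
class RxRing:
    """Toy Rx ring: buffers are refilled in bundles rather than one by one."""

    def __init__(self, size, bundle):
        self.size = size
        self.bundle = bundle           # refill once this many slots are free
        self.next_to_alloc = 0         # producer index: next slot to refill
        self.next_to_clean = 0         # consumer index: next slot to process
        self.slots = ["buf"] * size    # the ring starts fully populated
        self.refill_calls = 0

    def pending(self):
        """Slots already processed but not yet refilled."""
        return (self.next_to_clean - self.next_to_alloc) % self.size

    def clean_one(self):
        """Process one received buffer and refill only when a bundle is due."""
        self.slots[self.next_to_clean] = None
        self.next_to_clean = (self.next_to_clean + 1) % self.size
        if self.pending() >= self.bundle:
            self.refill(self.pending())

    def refill(self, count):
        """Allocate `count` buffers in one pass, advancing the producer index."""
        self.refill_calls += 1
        for _ in range(count):
            self.slots[self.next_to_alloc] = "buf"
            self.next_to_alloc = (self.next_to_alloc + 1) % self.size

ring = RxRing(size=8, bundle=4)
for _ in range(10):
    ring.clean_one()
print(ring.refill_calls, ring.pending())
```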