1. 17 Dec, 2013 24 commits
    • Merge branch 'bonding_netlink' · 7271174f
      David S. Miller authored
      Scott Feldman says:
      
      ====================
      bonding: add some more netlink attributes
      
      The following series implements five more bonding netlink attributes:
      
      	primary
      	primary_reselect
      	fail_over_mac
      	xmit_hash_policy
      	resend_igmp
      
      Tested with modified iproute2 to verify attributes can be set at bond creation
      time or set later.  Verified sysfs interface to attributes continues to work.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
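      The bonding parameters in this series travel as netlink attributes, which
      are type-length-value records padded to 4 bytes. A minimal userspace sketch
      of that encoding follows. The demo_* names and the numeric attribute types
      used with it are hypothetical; they are not the real IFLA_BOND_* values or
      the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Netlink attributes are aligned to 4 bytes; the header itself is 4 bytes. */
#define DEMO_NLA_ALIGNTO 4u
#define DEMO_NLA_ALIGN(len) \
	(((len) + DEMO_NLA_ALIGNTO - 1u) & ~(DEMO_NLA_ALIGNTO - 1u))
#define DEMO_NLA_HDRLEN 4u

/* Same shape as the kernel's struct nlattr, redefined here for illustration. */
struct demo_nlattr {
	uint16_t nla_len;   /* header + payload, excluding trailing padding */
	uint16_t nla_type;  /* attribute type chosen by the caller */
};

/*
 * Append one attribute to buf. Returns the number of bytes consumed
 * (including padding), or 0 if the attribute does not fit.
 */
static size_t demo_nla_put(uint8_t *buf, size_t cap, uint16_t type,
			   const void *payload, uint16_t payload_len)
{
	struct demo_nlattr hdr;
	size_t total;

	assert(buf != NULL);
	assert(payload != NULL || payload_len == 0);

	/* nla_len is 16 bits wide, so header + payload must fit in it. */
	if (payload_len > UINT16_MAX - DEMO_NLA_HDRLEN)
		return 0;

	total = DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + (size_t)payload_len);
	if (total > cap)
		return 0;

	hdr.nla_len = (uint16_t)(DEMO_NLA_HDRLEN + payload_len);
	hdr.nla_type = type;

	memset(buf, 0, total);               /* zero the padding bytes */
	memcpy(buf, &hdr, sizeof(hdr));
	if (payload_len)
		memcpy(buf + DEMO_NLA_HDRLEN, payload, payload_len);
	return total;
}
```

      A one-byte payload therefore records a length of 5 but occupies 8 bytes in
      the message, while a 32-bit payload records and occupies 8.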
    • bonding: add resend_igmp attribute netlink support · d8838de7
      sfeldma@cumulusnetworks.com authored
      Add IFLA_BOND_RESEND_IGMP to allow get/set of bonding parameter
      resend_igmp via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add xmit_hash_policy attribute netlink support · f70161c6
      sfeldma@cumulusnetworks.com authored
      Add IFLA_BOND_XMIT_HASH_POLICY to allow get/set of bonding parameter
      xmit_hash_policy via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add fail_over_mac attribute netlink support · 89901972
      sfeldma@cumulusnetworks.com authored
      Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter
      fail_over_mac via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add primary_select attribute netlink support · 8a41ae44
      sfeldma@cumulusnetworks.com authored
      Add IFLA_BOND_PRIMARY_RESELECT to allow get/set of bonding parameter
      primary_reselect via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add primary attribute netlink support · 0a98a0d1
      sfeldma@cumulusnetworks.com authored
      Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter
      primary via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: fq: more robust memory allocation · c3bd8549
      Eric Dumazet authored
      This patch brings NUMA support and automatic fallback to vmalloc()
      in case kmalloc() fails to allocate the FQ hash table.

      NUMA support depends on XPS being set up for the device before
      qdisc allocation. After an XPS change, it might be worth creating
      the qdisc hierarchy again.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
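      The fallback described above (try one allocator, use a second one only if
      the first fails) can be sketched in userspace as follows. This is an analogy
      under stated assumptions, not the patch itself: alloc_fn, alloc_with_fallback
      and failing_alloc are hypothetical names, and malloc() stands in for both
      kmalloc() and vmalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature used only for this illustration. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Try the primary allocator first and use the fallback only when the
 * primary returns NULL. *used_fallback tells the caller which allocator
 * produced the memory, so the matching free routine can be chosen later.
 */
static void *alloc_with_fallback(size_t size, alloc_fn primary,
				 alloc_fn fallback, int *used_fallback)
{
	void *p;

	assert(primary != NULL && fallback != NULL && used_fallback != NULL);

	*used_fallback = 0;
	p = primary(size);
	if (!p) {
		p = fallback(size);
		*used_fallback = (p != NULL);
	}
	return p;
}

/* Stand-in for an allocator that cannot satisfy the request. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

      Recording which allocator succeeded matters because memory obtained from
      different allocators generally has to be released through the matching
      routine.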
    • Jiri Pirko's avatar
      e7ef941d
    • bnad: make local variable static · 482da0fa
      stephen hemminger authored
      Compile tested only.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Rasesh Mody <rmody@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refine TSO splits · d4589926
      Eric Dumazet authored
      While investigating performance problems on small RPC workloads,
      I noticed linux TCP stack was always splitting the last TSO skb
      into two parts (skbs). One being a multiple of MSS, and a small one
      with the Push flag. This split is done even if TCP_NODELAY is set,
      or if no small packet is in flight.
      
      Example with request/response of 4K/4K
      
      IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      
      We can avoid this split by including the Nagle tests at the right place.
      
      Note : If some NIC had trouble sending TSO packets with a partial
      last segment, we would have hit the problem in GRO/forwarding workload already.
      
      tcp_minshall_update() is moved to tcp_output.c and is updated as we might
      feed a TSO packet with a partial last segment.
      
      This patch tremendously improves performance, as the traffic now looks
      like :
      
      IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      
      Before :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280774
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           205719.049006 task-clock                #    9.278 CPUs utilized
               8,449,968 context-switches          #    0.041 M/sec
               1,935,997 CPU-migrations            #    0.009 M/sec
                 160,541 page-faults               #    0.780 K/sec
         548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
         455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
         272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
         166,091,460,030 instructions              #    0.30  insns per cycle
                                                   #    2.74  stalled cycles per insn [83.39%]
          29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
           1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]
      
            22.173517844 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   16851063           0.0
      IpExtOutOctets                  23878580777        0.0
      
      After patch :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280877
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           107496.071918 task-clock                #    4.847 CPUs utilized
               5,635,458 context-switches          #    0.052 M/sec
               1,374,707 CPU-migrations            #    0.013 M/sec
                 160,920 page-faults               #    0.001 M/sec
         281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
         228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
         142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
          95,227,712,566 instructions              #    0.34  insns per cycle
                                                   #    2.40  stalled cycles per insn [83.43%]
          16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
             874,252,952 branch-misses             #    5.39% of all branches         [83.37%]
      
            22.175821286 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   11239428           0.0
      IpExtOutOctets                  23595191035        0.0
      
      Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
      2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
      scheduler.
      
      Many thanks to Neal for review and ideas.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
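      The arithmetic behind the traces and counters in this message can be
      checked with a small sketch. It assumes an MSS of 1448 bytes, which is
      inferred from the 2896-byte segments in the first trace and is not stated
      in the message. The names tso_split, split_at_mss and avg_bytes_per_packet
      are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* One send cut into a multiple-of-MSS part and a trailing remainder. */
struct tso_split {
	uint32_t full_bytes;  /* largest multiple of mss that fits in len */
	uint32_t tail_bytes;  /* leftover bytes, sent as a separate small skb
			       * before the patch */
};

/* Reproduce the pre-patch behaviour: split len at the last MSS boundary. */
static struct tso_split split_at_mss(uint32_t len, uint32_t mss)
{
	struct tso_split s;

	assert(mss > 0);
	s.full_bytes = (len / mss) * mss;
	s.tail_bytes = len - s.full_bytes;
	return s;
}

/* Average payload per transmitted IP packet, truncated to whole bytes. */
static uint64_t avg_bytes_per_packet(uint64_t out_octets,
				     uint64_t out_requests)
{
	assert(out_requests > 0);
	return out_octets / out_requests;
}
```

      With a 4096-byte response and the assumed MSS, split_at_mss() yields
      2896 + 1200 bytes, matching the two data packets per response in the first
      trace. Feeding the nstat counters into avg_bytes_per_packet() gives 1417
      before the patch and 2099 after it, the figures quoted in the message.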
    • net: remove dead code for add/del multiple · 477bb933
      stephen hemminger authored
      These functions to manipulate multiple addresses are not used anywhere
      in the current net-next tree. Some out-of-tree code may be using these, but
      too bad; they should submit their code upstream.
      
      Also, make __hw_addr_flush local since only used by dev_addr_lists.c
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · 6ea09d8a
      David S. Miller authored
      John W. Linville says:
      
      ====================
      Please pull this batch of updates for the 3.14 stream...
      
      For the Bluetooth bits, Gustavo says:
      
      "This is the first batch of patches intended for 3.14. There is
      nothing big here.  Most of the code are refactors, clean up, small
      fixes, plus some new device id support."
      
      And...
      
      "More patches to 3.14. Here we have the support for Low Energy
      Connection Oriented Channels (LE CoC). Basically, as the name says,
      this adds supports for connection oriented channels in the same way
      we already have them for BR/EDR connections so profiles/protocols
      that work on top of BR/EDR can now work on LE plus a plenty of new
      possibilities for LE."
      
      For the ath10k bits, Kalle says:
      
      "Janusz and Marek implemented DFS support to ath10k, but the code is
      not enabled yet due to missing cfg80211/mac80211 patches (it will be
      enabled in the next pull request). Michal did some device reset fixes
      and made it possible for ath10k to share an interrupt with another
      device. And lots of smaller fixes from different people."
      
      For the iwlwifi bits, Emmanuel says:
      
      "I have here a big rework of the rate control by Eyal. This is obviously
      the biggest part of this batch.
      I also have enhancement of protection flags by Avri and a few bits for
      WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few
      fixes. I provided a few things for Bluetooth coexistence.
      Besides this we have an implementation for low priority scan."
      
      Along with all that, there are big batches of updates to mwifiex and
      ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other
      bits here and there.
      
      Please let me know if there are problems!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phy_power' · b80b376c
      David S. Miller authored
      Sebastian Hesselbarth says:
      
      ====================
      net: phy: Ethernet PHY powerdown optimization
      
      This is v2 of the Ethernet PHY power optimization patches to reduce
      the power consumption of network PHYs whose link is unused or whose
      corresponding netdev is down.
      
      Compared to the last version, this patch set drops a patch to disable
      unused PHYs after late initcall, as it is not compatible with a modular
      mdio bus [1]. I'll investigate different ways to have a modular mdio bus
      driver get notified when driver loading is done.
      
      Again, a branch with v2 applied to v3.13-rc2 can also be found at
      https://github.com/shesselba/linux-dove.git topic/ethphy-power-v2
      
      [1] http://www.spinics.net/lists/arm-kernel/msg293028.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: suspend phydev when going to HALTED · be9dad1f
      Sebastian Hesselbarth authored
      When phydev is going to HALTED state, we can try to suspend it to
      save more power. The phy_suspend helper will check if the PHY can be
      suspended, so just call it when entering HALTED state.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: resume/suspend PHYs on attach/detach · 1211ce53
      Sebastian Hesselbarth authored
      This ensures PHYs are resumed on attach and suspended on detach.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: provide phy_resume/phy_suspend helpers · 481b5d93
      Sebastian Hesselbarth authored
      This adds helper functions to resume and suspend a given phy_device
      by calling the corresponding driver callbacks if available.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
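      The pattern this commit describes, a helper that invokes a driver callback
      only if the driver provides one, might look like the following standalone
      sketch. All demo_* types and functions are hypothetical stand-ins, not the
      kernel's phy_device or phy_driver definitions, and returning 0 when no
      callback exists is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct demo_phy;

/* Optional power-management callbacks; either pointer may be NULL. */
struct demo_phy_driver {
	int (*suspend)(struct demo_phy *phy);
	int (*resume)(struct demo_phy *phy);
};

struct demo_phy {
	const struct demo_phy_driver *drv;  /* NULL when no driver is bound */
	int powered_down;
};

/* Call the driver's suspend callback if there is one; otherwise do nothing. */
static int demo_phy_suspend(struct demo_phy *phy)
{
	assert(phy != NULL);
	if (phy->drv && phy->drv->suspend)
		return phy->drv->suspend(phy);
	return 0;
}

/* Call the driver's resume callback if there is one; otherwise do nothing. */
static int demo_phy_resume(struct demo_phy *phy)
{
	assert(phy != NULL);
	if (phy->drv && phy->drv->resume)
		return phy->drv->resume(phy);
	return 0;
}

/* Example callbacks that only track the power state in software. */
static int demo_generic_suspend(struct demo_phy *phy)
{
	phy->powered_down = 1;
	return 0;
}

static int demo_generic_resume(struct demo_phy *phy)
{
	phy->powered_down = 0;
	return 0;
}
```

      Callers can then invoke demo_phy_suspend() unconditionally, which mirrors
      how the later patches in this series call the helper on detach and when
      entering the HALTED state without checking the driver themselves.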
    • net: phy: marvell: provide genphy suspend/resume · 0898b448
      Sebastian Hesselbarth authored
      Marvell PHYs support generic PHY suspend/resume, so provide those
      callbacks to all marvell specific drivers.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mv643xx_eth: properly start/stop phy device · 58911151
      Sebastian Hesselbarth authored
      When using phydev, it should be phy_start/phy_stop'ed properly. This
      driver doesn't do that, so add the corresponding calls to port_start/
      stop respectively.
      Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Reorder 'struc association' members to reduce its size · be78cfcb
      wangweidong authored
      Members of 'struct association' are not in appropriate order to
      reuse compiler added padding on 64bit architectures. In this patch
      we reorder those struct members and help reduce the size of the
      structure from 2776 bytes to 2720 bytes on 64 bit architectures.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
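      The effect described above (56 bytes saved, from 2776 to 2720) comes from
      compiler-inserted alignment padding. A minimal sketch of the same idea on
      a toy structure follows; the two structs and their members are
      hypothetical and unrelated to the real SCTP association layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with 8-byte members force padding after each. */
struct assoc_interleaved {
	uint8_t  flag_a;
	uint64_t counter_a;
	uint8_t  flag_b;
	uint64_t counter_b;
	uint16_t port;
};

/* Same members, largest first, so the small ones share one padded slot. */
struct assoc_grouped {
	uint64_t counter_a;
	uint64_t counter_b;
	uint16_t port;
	uint8_t  flag_a;
	uint8_t  flag_b;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t padding_saved(void)
{
	assert(sizeof(struct assoc_grouped) <= sizeof(struct assoc_interleaved));
	return sizeof(struct assoc_interleaved) - sizeof(struct assoc_grouped);
}
```

      The exact sizes depend on the target ABI; on common 64-bit ABIs the grouped
      layout is typically smaller because only the tail of the structure needs
      padding.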
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · e4379310
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to i40e only (again).
      
      Jesse provides a fix for when tx_rings structure is NULL and we do not want
      to panic. Then refactors the flow control set up and disables L2 flow control
      by default.  Provides some trivial fixes as well as prevent compiler warnings.
      Then to align to similar behaviour in ixgbe, use the total number of CPUs in
      the system to suggest the number of transmit and receive queue pairs.
      
      Shannon provides a i40e ethtool fix to get some more reasonable information
      reports back out to the ethtool.  In addition, fixes PF reset after offline
      test, where it reorders the test to put the register test last as it is the
      only one that needs a reset, and we wait to trigger the reset until after we
      clear the testing bit.  Lastly provides basic support for handling suspend
      and resume for now, later on Wake-On-LAN support will be added.
      
      Anjali provides changes to tell the stack about our actual number of queues
      in order for RFS/RPS/XPS to work correctly.  Then provides several patches to
      implement dynamically changing the queue count for the main VSI.  Adds
      basic support for get/set channels for RSS so that the number of receive and
      transmit queue pairs can be changed via ethtool.  Cleans up the use of
      rtnl_lock in the reset patch since it runs from a work time.
      
      Neerav Parikh cleans up the VF interface to remove FCoE code as this
      feature will not be supported on VF interfaces.
      
      v2:
        - submitted patch 1 to net (since it was a fix needed for net), so dropped
          from this series (this patch will get added to net-next when Dave syncs
          his trees)
        - Dropped patches 4 & 11 from previous submission because of feedback
          received from Ben Hutchings and Sergei Shtylyov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ovs_hash' · bc4d0f61
      David S. Miller authored
      Francesco Fusco says:
      
      ====================
      ovs: introduce arch-specific fast hashing improvements
      
      From: Daniel Borkmann <dborkman@redhat.com>
      
      We are introducing a fast hash function (see patch1) that can be
      used in the context of OpenVSwitch to reduce the hashing footprint
      (patch2). For details, please see individual patches!
      
      v1->v2:
       - Make hash generic and place it under lib
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ovs: use CRC32 accelerated flow hash if available · 500f8087
      Francesco Fusco authored
      Currently OVS uses jhash2() for calculating flow hashes in its
      internal flow_hash() function. The performance of the flow_hash()
      function is critical, as the input data can be hundreds of bytes
      long.
      
      OVS is largely deployed in x86_64 based datacenters.  Therefore,
      we argue that the performance critical fast path of OVS should
      exploit underlying CPU features in order to reduce the per packet
      processing costs. We replace jhash2 with the hash implementation
      provided by the kernel hash lib, which exploits the crc32l
      instruction to achieve high performance.
      
      Our patch greatly reduces the hash footprint from ~200 cycles of
      jhash2() to around ~90 cycles in case of ovs_flow_hash_crc()
      (measured with rdtsc over maximum length flow keys on an i7 Intel
      CPU).
      
      Additionally, we wrote a microbenchmark to stress the flow table
      performance. The benchmark inserts random flows into the flow
      hash and then performs lookups. Our hash deployed on a CRC32
      capable CPU reduces the lookup for 1000 flows, 100 masks from
      ~10,100us to ~6,700us, for example.
      
      Thus, simply use the newly introduced arch_fast_hash2() as a
      drop-in replacement.
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Acked-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: introduce arch optimized hash library · 71ae8aac
      Francesco Fusco authored
      We introduce a new hashing library that is meant to be used in
      contexts where speed is more important than uniformity of the
      hashed values. The hash library leverages architecture-specific
      implementations to achieve high performance and falls back to
      jhash() for the generic case.
      
      On Intel-based x86 architectures, the library can exploit the crc32l
      instruction, part of the Intel SSE4.2 instruction set, if the
      instruction is supported by the processor. This implementation
      is twice as fast as the jhash() implementation on an i7 processor.
      
      Additional architectures, such as Arm64 provide instructions for
      accelerating the computation of CRC, so they could be added as well
      in follow-up work.
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
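      The design described above, an accelerated hash where the CPU supports it
      and a generic one otherwise, can be sketched in portable C. This is an
      illustration under stated assumptions, not the library's code: crc32c_sw()
      is a bitwise software CRC-32C (the Castagnoli polynomial that the SSE4.2
      crc32 instruction implements), fnv1a_hash() is a stand-in for jhash(), and
      demo_select_hash() with its cpu_has_crc32 flag is a hypothetical
      replacement for real CPU feature detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const void *data, size_t len);

/* Bitwise CRC-32C, reflected polynomial 0x82F63B78, standard init/final XOR. */
static uint32_t crc32c_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	assert(data != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++) {
			/* mask is all ones when the low bit is set, else zero */
			uint32_t mask = (uint32_t)0 - (crc & 1u);
			crc = (crc >> 1) ^ (0x82F63B78u & mask);
		}
	}
	return ~crc;
}

/* 32-bit FNV-1a, used here only as a generic fallback in place of jhash(). */
static uint32_t fnv1a_hash(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u;

	assert(data != NULL || len == 0);
	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Pick the hash once, based on a capability flag supplied by the caller. */
static hash_fn demo_select_hash(int cpu_has_crc32)
{
	return cpu_has_crc32 ? crc32c_sw : fnv1a_hash;
}
```

      Selecting the function pointer once keeps the per-packet path free of
      feature checks. A hardware-backed variant would replace the inner loop of
      crc32c_sw() with the CPU instruction while keeping the same dispatch.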
    • fddi: cleanup unsigned to unsigned int/short · 89e47d3b
      tanxiaojun authored
      Use "unsigned int/short" instead of "unsigned", and change the type of
      iteration variable "i" to "unsigned int".
      Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Dec, 2013 16 commits