1. 30 Aug, 2013 18 commits
    • net: ks8842: use dev_get_platdata() · 0dd14b67
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pxa168_eth: use dev_get_platdata() · e19eac0e
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mv643xx_eth: use dev_get_platdata() · bbfa6d0a
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: use dev_get_platdata() · 94660ba0
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethoc: use dev_get_platdata() · 420fcd82
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dm9000: use dev_get_platdata() · cd4e2e4b
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ep93xx_eth: use dev_get_platdata() · b96f64dd
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcm63xx_enet: use dev_get_platdata() · cf0e7794
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: au1000_eth: use dev_get_platdata() · 1fc2c469
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bfin_mac: use dev_get_platdata() · a63b82c4
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ax88796: use dev_get_platdata() · a3ea2800
      Jingoo Han authored
      Use the wrapper function for retrieving the platform data instead of
      accessing dev->platform_data directly. This is a cosmetic change
      to make the code simpler and enhance the readability.
      Signed-off-by: Jingoo Han <jg1.han@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
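The dev_get_platdata() conversions above all follow one pattern, which can be sketched in userspace C. This is only an illustration: `struct device` here is a one-field stand-in for the kernel structure, and `struct demo_eth_platdata` and `demo_probe_phy_addr()` are hypothetical names that do not come from any of the drivers listed.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct device (only the field used here). */
struct device {
    void *platform_data;
};

/* Same shape as the kernel helper: a thin accessor around dev->platform_data. */
void *dev_get_platdata(const struct device *dev)
{
    return dev->platform_data;
}

/* Hypothetical board data for an imaginary Ethernet driver. */
struct demo_eth_platdata {
    int phy_addr;
};

/*
 * Before the conversion a probe routine would read dev->platform_data
 * directly; afterwards the same pointer comes from the accessor.
 */
int demo_probe_phy_addr(const struct device *dev)
{
    const struct demo_eth_platdata *pdata = dev_get_platdata(dev);

    return pdata ? pdata->phy_addr : -1; /* -1: no platform data supplied */
}
```

Since the accessor returns the same pointer as the direct field access, the change is cosmetic, as the commit messages state.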
    • macvlan: fix typo in assignment · 71740129
      Lutz Jaenicke authored
      commit 3b04ddde
      "[NET]: Move hardware header operations out of netdevice."
      moved the handling into macvlan setup adding
        dev->header_ops         = &macvlan_hard_header_ops,
      At the end of the line the ',' should have been a ';'
      Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
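A typo like the one fixed above can go unnoticed because C's comma operator turns two assignments into a single valid expression statement. The sketch below shows both spellings behaving identically; the struct and field names are hypothetical stand-ins, and the second assignment is invented for illustration (the commit message does not say which statement followed).

```c
/* Hypothetical stand-ins for the header ops table and the net device. */
struct demo_ops {
    int id;
};

struct demo_netdev {
    const struct demo_ops *header_ops;
    int tx_queue_len;
};

/* With the typo: ',' chains both assignments into one expression statement. */
void demo_setup_with_comma(struct demo_netdev *dev, const struct demo_ops *ops)
{
    dev->header_ops = ops,
    dev->tx_queue_len = 100;
}

/* With the fix: two separate statements, same effect. */
void demo_setup_with_semicolon(struct demo_netdev *dev, const struct demo_ops *ops)
{
    dev->header_ops = ops;
    dev->tx_queue_len = 100;
}
```

The result is the same here, but the comma form would change meaning if the following line were a declaration or a control statement, which is why it is worth fixing.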
    • driver:net:stmmac: Disable DMA store and forward mode if platform data... · e2a240c7
      Sonic Zhang authored
      driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set.

      Some Synopsys IP implementations, such as BF60x, don't support DMA store
      and forward mode. So, set force_thresh_dma_mode to use DMA thresholds only.
      Update the documentation and devicetree as well.
      Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
      Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Remove redundant sk variable · 816c5b5b
      Thomas Graf authored
      A sk variable initialized to ndisc_sk is already available outside
      of the branch.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: do not use cached RTT for RTT estimation · 1b7fdd2a
      Yuchung Cheng authored
      The RTT cached in the TCP metrics is valuable for the initial timeout
      because the SYN RTT usually does not account for serialization delays
      on low-bandwidth paths.

      However, using it to seed the RTT estimator may be disruptive because
      other components (e.g., pacing) require the smoothed RTT to be obtained
      from the actual connection.
      
      The solution is to use the higher cached RTT to set the first RTO
      conservatively like tcp_rtt_estimator(), but avoid seeding the other
      RTT estimator variables such as srtt.  It is also a good idea to
      keep RTO conservative to obtain the first RTT sample, and the
      performance is ensured by TCP loss probe if SYN RTT is available.
      
      To keep the seeding formula consistent across SYN RTT and cached RTT,
      the rttvar is twice the cached RTT instead of the cached RTTVAR value. The
      reason is that the cached variation may be too small (near min RTO),
      which defeats the purpose of being conservative on the first RTO. However,
      the metrics still keep the RTT variations as they might be useful for
      user applications (through ip).
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
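The seeding rule described in the commit above can be sketched as follows. This is a simplified model, not the kernel code: it works in plain milliseconds, takes "rttvar is twice the RTT" to mean RTO = rtt + max(2 * rtt, floor), and the 200 ms floor, the struct and the function names are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical floor for the variance term, in milliseconds. */
#define DEMO_RTO_MIN_MS 200u

/* First RTO from a single RTT value: rtt + max(2 * rtt, floor). */
uint32_t demo_first_rto_ms(uint32_t rtt_ms)
{
    uint32_t var_term = 2u * rtt_ms;

    if (var_term < DEMO_RTO_MIN_MS)
        var_term = DEMO_RTO_MIN_MS;
    return rtt_ms + var_term;
}

/* Minimal per-connection state for the sketch. */
struct demo_tcp {
    uint32_t srtt_ms; /* smoothed RTT, seeded from the live connection only */
    uint32_t rto_ms;  /* first retransmission timeout */
};

/*
 * The SYN RTT seeds the estimator. A higher cached RTT only makes the
 * first RTO more conservative; it never touches srtt.
 */
void demo_init_from_metrics(struct demo_tcp *tp, uint32_t syn_rtt_ms,
                            uint32_t cached_rtt_ms)
{
    tp->srtt_ms = syn_rtt_ms;
    tp->rto_ms = demo_first_rto_ms(syn_rtt_ms);
    if (cached_rtt_ms > syn_rtt_ms)
        tp->rto_ms = demo_first_rto_ms(cached_rtt_ms);
}
```

With a 10 ms SYN RTT and a 100 ms cached RTT, this sketch keeps srtt at 10 ms while the first RTO becomes 300 ms, so pacing would see the measured RTT and only the timeout would be conservative.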
    • pkt_sched: fq: prefetch() fix · 08f89b98
      Eric Dumazet authored
      The kbuild bot reported the following m68k build error:
      
        net/sched/sch_fq.c: In function 'fq_dequeue':
      >> net/sched/sch_fq.c:491:2: error: implicit declaration of function
      'prefetch' [-Werror=implicit-function-declaration]
         cc1: some warnings being treated as errors
      
      While we are fixing this, move this prefetch() call a bit earlier.
      Reported-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers:net: Convert dma_alloc_coherent(...__GFP_ZERO) to dma_zalloc_coherent · ede23fa8
      Joe Perches authored
      __GFP_ZERO is an uncommon flag and perhaps is better
      not used.  static inline dma_zalloc_coherent exists
      so convert the uses of dma_alloc_coherent with __GFP_ZERO
      to the more common kernel style with zalloc.
      
      Remove memset from the static inline dma_zalloc_coherent
      and add just one use of __GFP_ZERO instead.
      
      Trivially reduces the size of the existing uses of
      dma_zalloc_coherent.
      
      Realign arguments as appropriate.
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
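The zalloc pattern described in the commit above (a wrapper that adds a zeroing flag instead of calling memset afterwards) can be mimicked in userspace. Everything here is a stand-in: `DEMO_FLAG_ZERO` plays the role of __GFP_ZERO, and `demo_alloc()` / `demo_zalloc()` are hypothetical names, not the DMA API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_ZERO. */
#define DEMO_FLAG_ZERO 0x1u

/* Base allocator: zeroes the buffer only when the flag asks for it. */
void *demo_alloc(size_t size, unsigned int flags)
{
    return (flags & DEMO_FLAG_ZERO) ? calloc(1, size) : malloc(size);
}

/* zalloc-style wrapper: no memset, it just ORs in the zeroing flag. */
void *demo_zalloc(size_t size, unsigned int flags)
{
    return demo_alloc(size, flags | DEMO_FLAG_ZERO);
}
```

Callers then spell the intent in the function name rather than in an uncommon flag, and the flag is used in exactly one place.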
    • pkt_sched: fq: Fair Queue packet scheduler · afe4fd06
      Eric Dumazet authored
      - Uses perfect flow match (not stochastic hash like SFQ/FQ_codel)
      - Uses the new_flow/old_flow separation from FQ_codel
      - New flows get an initial credit allowing IW10 without added delay.
      - Special FIFO queue for high prio packets (no need for PRIO + FQ)
      - Uses a hash table of RB trees to locate the flows at enqueue() time
      - Smart on demand gc (at enqueue() time, RB tree lookup evicts old
        unused flows)
      - Dynamic memory allocations.
      - Designed to allow millions of concurrent flows per Qdisc.
      - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow.
      - Single high resolution timer for throttled flows (if any).
      - One RB tree to link throttled flows.
      - Ability to have a max rate per flow. We might add a socket option
        to add per socket limitation.
      
      Attempts have been made to add TCP pacing in TCP stack, but this
      seems to add complex code to an already complex stack.
      
      TCP pacing is welcomed for flows having idle times, as the cwnd
      permits TCP stack to queue a possibly large number of packets.
      
      This removes the 'slow start after idle' choice, hitting badly
      large BDP flows, and applications delivering chunks of data
      as video streams.
      
      Nicely spaced packets :
      Here interface is 10Gbit, but flow bottleneck is ~20Mbit
      
      cwin is big, yet FQ avoids the typical bursts generated by TCP
      (as in netperf TCP_RR -- -r 100000,100000)
      
      15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115>
      15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115>
      15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115>
      15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115>
      15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115>
      15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115>
      15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115>
      15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      
      TCP timestamps show that most packets from B were queued in the same ms
      timeframe (TSval 1159799{3,4}), but FQ managed to send them right
      in time to avoid a big burst.
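
The spacing in the trace above can be related to a rate with simple arithmetic: a packet of `len` bytes paced at `rate` bits per second occupies `len * 8 / rate` seconds. The helpers below are an illustrative sketch with hypothetical names, not the scheduler's own code.

```c
#include <stdint.h>

/* Gap in nanoseconds between packets of len_bytes at rate_bps (0 = unpaced). */
uint64_t demo_pacing_gap_ns(uint32_t len_bytes, uint64_t rate_bps)
{
    if (rate_bps == 0)
        return 0;
    return (uint64_t)len_bytes * 8u * 1000000000ull / rate_bps;
}

/* Inverse: the rate in bits per second implied by an observed gap. */
uint64_t demo_implied_rate_bps(uint32_t len_bytes, uint64_t gap_ns)
{
    if (gap_ns == 0)
        return 0;
    return (uint64_t)len_bytes * 8u * 1000000000ull / gap_ns;
}
```

At 20 Mbit/s a 2896-byte segment would be spaced about 1.16 ms apart; the roughly 1.385 ms spacing of the B > A segments in the trace corresponds to about 16.7 Mbit/s.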
      
      In slow start or steady state, very few packets are throttled [1]
      
      FQ has the following tunables:
      
        limit : max number of packets on whole Qdisc (default 10000)
      
        flow_limit : max number of packets per flow (default 100)
      
        quantum : the credit per RR round (default is 2 MTU)
      
        initial_quantum : initial credit for new flows (default is 10 MTU)
      
        maxrate : max per flow rate (default : unlimited)
      
        buckets : number of RB trees (default : 1024) in hash table.
                     (consumes 8 bytes per bucket)
      
        [no]pacing : disable/enable pacing (default is enabled)
      
      All of them can be changed on a live qdisc.
      
      $ tc qd add dev eth0 root fq help
      Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
                    [ quantum BYTES ] [ initial_quantum BYTES ]
                    [ maxrate RATE  ] [ buckets NUMBER ]
                    [ [no]pacing ]
      
      $ tc -s -d qd
      qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140
       Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
       backlog 0b 0p requeues 14
        511 flows, 511 inactive, 0 throttled
        110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit
      
      [1] Except if the initial srtt is overestimated, as when using the
      cached srtt from the tcp metrics. We'll provide a fix for this issue.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
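The `quantum 3028` and `initial_quantum 15140` values in the `tc -s -d qd` output above are consistent with the documented defaults of 2 MTU and 10 MTU if one "MTU" is counted as 1514 bytes. The sketch below assumes a 1500-byte MTU plus a 14-byte Ethernet header; that reading and the function names are assumptions, not taken from the commit.

```c
/* Assumed frame overhead: 14-byte Ethernet header on top of the MTU. */
#define DEMO_ETH_HLEN 14u

/* Default credit per round-robin round: 2 full-sized frames. */
unsigned int demo_fq_quantum(unsigned int mtu)
{
    return 2u * (mtu + DEMO_ETH_HLEN);
}

/* Default initial credit for a new flow: 10 full-sized frames. */
unsigned int demo_fq_initial_quantum(unsigned int mtu)
{
    return 10u * (mtu + DEMO_ETH_HLEN);
}
```

For an MTU of 1500 these give 3028 and 15140 bytes, matching the qdisc dump.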
  2. 29 Aug, 2013 22 commits
    • net: packet: document available fanout policies · 7ec06da8
      Daniel Borkmann authored
      Update documentation to add fanout policies that are available.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: packet: use reciprocal_divide in fanout_demux_hash · f55d112e
      Daniel Borkmann authored
      Instead of hard-coding the reciprocal_divide computation, use the inline
      function from reciprocal_div.h.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
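The computation referred to in the commit above scales a 32-bit hash into a socket index without a modulo. The sketch assumes the helper has the multiply-and-shift form `(u32)(((u64)A * R) >> 32)`; the `demo_` names are hypothetical and the functions are userspace stand-ins.

```c
#include <stdint.h>

/* Multiply-and-shift: maps a in [0, 2^32) onto [0, r) without dividing. */
uint32_t demo_reciprocal_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}

/* Pick one of num sockets in a fanout group from a packet hash. */
uint32_t demo_fanout_pick(uint32_t hash, uint32_t num)
{
    return demo_reciprocal_divide(hash, num);
}
```

With four sockets, hashes in the lowest quarter of the 32-bit range map to index 0 and hashes in the highest quarter map to index 3, so a uniformly distributed hash spreads evenly over the group.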
    • net: packet: add randomized fanout scheduler · 5df0ddfb
      Daniel Borkmann authored
      We currently allow for different fanout scheduling policies in pf_packet
      such as scheduling by skb's rxhash, round-robin, by cpu, and rollover.
      Also allow for a random, equidistributed selection of the socket from the
      fanout process group.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: no need to call ether_setup() · 48859488
      Sergei Shtylyov authored
      There's no need to call ether_setup() in the driver since the prior
      alloc_etherdev() call already arranges for it.
      Suggested-by: Denis Kirjanov <kda@linux-powerpc.org>
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bond_vlan' · d7ef9b04
      David S. Miller authored
      Veaceslav Falico says:
      
      ====================
      bonding: remove vlan special handling
      
      v1: Per Jiri's advice, remove the exported netdev_upper struct to keep it
          inside dev.c only, and instead implement a macro to iterate over the
          list and return only net_device *.
      v2: Jiri noted that we need to see every upper device, not only the
          first level. Modify the netdev_upper logic to include a list of lower
          devices, so that in both the upper and lower lists every device sees its
          first-level devices and every other device that is lower/upper of it.
          Also, convert some annoying spamming warnings to pr_debug in
          bond_arp_send_all.
      v3: move renaming part completely to patch 1 (did I forget to git add
          before committing?) and address Jiri's input about comments/style of
          patch 2.
      v4: as Vlad spotted, bond_arp_send_all() won't work in a config where
          we have a device with ip on top of our upper vlan. It fails to send
          packets because we don't tag the packet, while the device on top of
          vlan will emit tagged packets through this vlan. Fix this by first
          searching for all upper vlans, and for each vlan - for the devs on top
          of it. If we find the dev - then tag the packet with the underling's
          vlan_id, otherwise just search the old way - for all devices on top of
          bonding. Also, move the version changes under "---" so they won't get
          into the commit message, if/when applied.
      
      The aim of this patchset is to remove bonding's own vlan handling as much
      as possible and replace it with the netdev upper device functionality.
      
      The upper device functionality is extended to also include lower devices
      and to give each device a full view of every lower and upper device,
      not only the first-level ones. In the future this might let any
      grouping/teaming/upper/lower device avoid maintaining its own
      lists of slaves/vlans/etc.
      
      This is achieved by adding a helper function to upper dev list handling -
      netdev_upper_get_next_dev(dev, iter), which returns the next device after
      the list_head **iter, and sets *iter to the next list_head *. This patchset
      also adds netdev_for_each_upper_dev(dev, upper, iter), which iterates
      through the whole dev->upper_dev_list, setting upper to the net_device.
      The only special treatment of vlans remains in rlb code.
      
      This patchset solves several issues with bonding, simplifies it overall,
      RCUifies it further and exports upper list functions for any other users which
      might also want to get rid of their own vlan_lists or slaves.
      
      I'm testing it continuously currently, no issues found, will update on
      anything.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: pr_debug instead of pr_warn in bond_arp_send_all · 3e32582f
      Veaceslav Falico authored
      They're simply annoying and will spam dmesg constantly if we hit them, so
      convert to pr_debug so that we still can access them in case of debugging.

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: remove vlan_list/current_alb_vlan · e868b0c9
      Veaceslav Falico authored
      Currently there are no real users of vlan_list/current_alb_vlan, only the
      helpers which maintain them, so remove them.

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make alb_send_learning_packets() use upper dev list · 5bf94b83
      Veaceslav Falico authored
      Currently, if there are vlans on top of bond, alb_send_learning_packets()
      will never send LPs from the bond itself (i.e. untagged), which might leave
      untagged clients unupdated.
      
      Also, the 'circular vlan' logic (i.e. update only MAX_LP_BURST vlans at a
      time, and save the last vlan for the next update) is really suboptimal - in
      case of lots of vlans it will take a lot of time to update every vlan. It
      is also never called in any hot path and sends only a few small packets -
      thus the optimization by itself is useless.
      
      So remove the whole current_alb_vlan/MAX_LP_BURST logic from
      alb_send_learning_packets(). Instead, we'll first send a packet untagged
      and then traverse the upper dev list, sending a tagged packet for each vlan
      found. Also, remove the MAX_LP_BURST define - we no longer need it.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: split alb_send_learning_packets() · 7aa64981
      Veaceslav Falico authored
      Create alb_send_lp_vid(), which will handle the skb/lp creation, vlan
      tagging and sending, and use it in alb_send_learning_packets().

      This way all the logic remains in alb_send_learning_packets(), which
      becomes a lot cleaner and easier to understand.

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use vlan_uses_dev() in __bond_release_one() · a59d3d21
      Veaceslav Falico authored
      We always hold the rtnl_lock() in __bond_release_one(), so use
      vlan_uses_dev() instead of bond_vlan_used().

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: convert bond_has_this_ip() to use upper devices · 50223ce4
      Veaceslav Falico authored
      Currently, bond_has_this_ip() is aware only of vlan upper devices, and thus
      will return false if the address is associated with the upper bridge or any
      other device, and thus will break the arp logic.

      Fix this by using the upper device list. For every upper device we verify
      if the address associated with it is our address, and if yes - return true.

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make bond_arp_send_all use upper device list · 27bc11e6
      Veaceslav Falico authored
      Currently, bond_arp_send_all() is aware only of vlans, which breaks
      configurations like bond <- bridge (or any other 'upper' device) with IP
      (which is quite a common scenario for virt setups).
      
      To fix this we convert the bond_arp_send_all() to first verify if the rt
      device is the bond itself, and if not - to go through its list of upper
      vlans and their respective upper devices (if the vlan's upper device matches
      - tag the packet), if still not found - go through all of our upper list
      devices to see if any of them match the route device for the target. If the
      match is a vlan device - we also save its vlan_id and tag it in
      bond_arp_send().
      
      Also, clean the function a bit to be more readable.
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use netdev_upper list in bond_vlan_used · c752af2c
      Veaceslav Falico authored
      Convert bond_vlan_used() to traverse the upper device list to see if we
      have any vlans above us. It's protected by rcu, and in case we are holding
      rtnl_lock we should call vlan_uses_dev() instead - it's faster.

      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add netdev_for_each_upper_dev_rcu() · 8b5be856
      Veaceslav Falico authored
      The new macro netdev_for_each_upper_dev_rcu(dev, upper, iter) iterates
      through the dev->upper_dev_list starting from the first element, using
      the netdev_upper_get_next_dev_rcu(dev, &iter).

      Must be called under RCU read lock.

      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add netdev_upper_get_next_dev_rcu(dev, iter) · 48311f46
      Veaceslav Falico authored
      This function returns the next dev in the dev->upper_dev_list after the
      struct list_head **iter position, and updates *iter accordingly. Returns
      NULL if there are no devices left.

      Caller must hold RCU read lock.

      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
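The iterator and the for-each macro described in the two commits above can be sketched in userspace C. This version has no RCU and uses a hand-rolled circular list; `struct demo_dev`, `struct demo_adjacent` and every `demo_` function are hypothetical stand-ins that only mirror the described behaviour (return the next device after `*iter`, advance `*iter`, return NULL at the end).

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

struct demo_dev {
    const char *name;
    struct list_head upper_list; /* head of this device's upper-device list */
};

/* One list entry pointing at an upper device (hypothetical layout). */
struct demo_adjacent {
    struct demo_dev *dev;
    struct list_head list;
};

#define demo_entry(ptr) \
    ((struct demo_adjacent *)((char *)(ptr) - offsetof(struct demo_adjacent, list)))

void demo_list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

void demo_list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Return the device after *iter and advance *iter; NULL once back at the head. */
struct demo_dev *demo_upper_get_next(struct demo_dev *dev, struct list_head **iter)
{
    struct list_head *next = (*iter)->next;

    if (next == &dev->upper_list)
        return NULL;
    *iter = next;
    return demo_entry(next)->dev;
}

/* Walk the whole list, starting with iter at the head. */
#define demo_for_each_upper(dev, upper, iter)                          \
    for ((iter) = &(dev)->upper_list,                                  \
         (upper) = demo_upper_get_next((dev), &(iter));                \
         (upper);                                                      \
         (upper) = demo_upper_get_next((dev), &(iter)))

/* Example user of the macro: count the upper devices of dev. */
int demo_count_uppers(struct demo_dev *dev)
{
    struct list_head *iter;
    struct demo_dev *upper;
    int n = 0;

    demo_for_each_upper(dev, upper, iter)
        n++;
    return n;
}
```

Callers only ever see device pointers, while the list entry structure stays private to the list code, which is the design the v1 note in the 'bond_vlan' merge message describes.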
    • net: remove search_list from netdev_adjacent · 620f3186
      Veaceslav Falico authored
      We no longer need it because we already see every upper/lower device in
      the list.

      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add lower_dev_list to net_device and make a full mesh · 5d261913
      Veaceslav Falico authored
      This patch adds lower_dev_list list_head to net_device, which is the same
      as upper_dev_list, only for lower devices, and begins to use it in the same
      way as the upper list.
      
      It also changes the way the whole adjacent device lists work - now they
      contain *all* of upper/lower devices, not only the first level. The first
      level devices are distinguished by the bool neighbour field in
      netdev_adjacent, also added by this patch.
      
      There are cases when a device can be added several times to the adjacent
      list, the simplest would be:
      
           /---- eth0.10 ---\
      eth0-                  --- bond0
           \---- eth0.20 ---/
      
      where both bond0 and eth0 'see' each other in the adjacent lists two times.
      To avoid duplication of netdev_adjacent structures ref_nr is being kept as
      the number of times the device was added to the list.
      
      The 'full view' is achieved by adding, on link creation, all of the
      upper_dev's upper_dev_list devices as upper devices to all of the
      lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
      versa. On unlink they are removed using the same logic.
      
      I've tested it with thousands of vlans/bonds/bridges; everything works
      and there are no observable lags, even with a huge number of interfaces.
      
      Memory footprint for 128 devices interconnected with each other via both
      upper and lower lists (which is impossible, but useful for comparison)
      would be:
      
      128*128*2*sizeof(netdev_adjacent) = 1.5MB
      
      but in the real world we usually have at most a few devices with slaves
      and a lot of vlans, so the footprint will be much lower.
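
The worst-case figure above can be checked with a few lines of C. This is a sketch for the arithmetic only: `adjacency_footprint` is a hypothetical helper, and `struct netdev_adjacent_model` is an assumed layout (only `neighbour` and `ref_nr` are named in the commit message; the other fields are guesses). The 1.5MB result implies a 48-byte entry, which is what this model measures on a typical 64-bit build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the kernel's list_head and rcu_head. */
struct list_head_model { void *next, *prev; };
struct rcu_head_model  { void *next; void (*func)(void *); };

/* Assumed layout; 48 bytes on a typical 64-bit (LP64) build. */
struct netdev_adjacent_model {
    void *dev;                     /* the adjacent device */
    bool neighbour;                /* first-level upper/lower? */
    uint16_t ref_nr;               /* times this device was added */
    struct list_head_model list;
    struct rcu_head_model rcu;
};

/* Every device lists every device, once as upper and once as lower. */
size_t adjacency_footprint(size_t ndevs, size_t entry_size)
{
    assert(entry_size > 0);
    return ndevs * ndevs * 2 * entry_size;
}
```

For 128 devices and 48-byte entries this gives 128 * 128 * 2 * 48 = 1572864 bytes, i.e. 1.5 MiB.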
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rename netdev_upper to netdev_adjacent · aa9d8560
      Veaceslav Falico authored
      Rename the structure to reflect the upcoming addition of lower_dev_list.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 6d508cce
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      This series contains updates to ixgbe.
      
      Jacob provides a fix for 82599 devices where it can potentially keep link
      lights up when the adapter has gone down.
      
      Mark provides a fix to resolve the possible use of uninitialized memory
      by checking the return value on EEPROM reads.
      
      Don provides 2 patches, one to fix an issue where we were traversing the
      Tx ring with the value of IXGBE_NUM_RX_QUEUES, which currently happens
      to have the correct value but is misleading.  A later change could
      easily make this incorrect, so when traversing the Tx ring, use
      netdev->num_tx_queues.  His second patch does some minor clean ups of log
      messages.
      
      Emil provides the remaining ixgbe patches.  First he fixes the link test
      where forcing the laser before the link check can lead to inconsistent
      results because it does not guarantee that the link will be negotiated
      correctly.  Then he initializes the message buffer array to 0 in order
      to avoid using random numbers from the memory as a MAC address for the
      VF.  Emil also fixes the read loop for the I2C data to account for the
      offset for SFP+ modules.  Lastly, Emil provides several patches to add
      support for QSFP modules where 1Gbps support is added as well as support
      for older QSFP active direct attach cables which pre-date SFF-8436 v3.6.
      
      v2: Fixed patch 4 description and added blank line based on feedback from
          Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fec: Use NAPI_POLL_WEIGHT · 322555f5
      Fabio Estevam authored
      Instead of using a custom 'FEC_NAPI_WEIGHT', just use the generic
      'NAPI_POLL_WEIGHT' definition instead.
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: sctp_verify_init: clean up mandatory checks and add comment · 7613f5fe
      Daniel Borkmann authored
      Add a comment related to RFC4960 explaining why we do not check for initial
      TSN, and while at it, remove yoda notation checks and clean up code from
      checks of mandatory conditions. That's probably just really minor, but makes
      reviewing easier.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet authored
      After hearing many people over past years complaining about TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses a heuristic
      relying on upcoming ACKs instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
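
A rough sketch of the sizing idea, not the kernel implementation: compute the pacing rate from the formula above, then pick the number of segments that corresponds to about one millisecond of data, never going below the tcp_min_tso_segs floor. The function names, the use of microseconds for srtt, and the `max_segs` cap are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pacing rate in bytes per second: 2 * cwnd * mss / srtt.
 * srtt is taken in microseconds here (an assumption of this sketch). */
uint64_t pacing_rate_bytes_per_sec(uint32_t cwnd, uint32_t mss, uint32_t srtt_us)
{
    if (srtt_us == 0)
        return UINT64_MAX;         /* no RTT sample yet: treat as unlimited */
    return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/* Segments per TSO packet so that roughly one packet is sent every ms,
 * clamped to [min_segs, max_segs]. min_segs plays the role of
 * tcp_min_tso_segs; max_segs is a hypothetical device/GSO limit. */
uint32_t tso_segs_goal(uint64_t rate, uint32_t mss,
                       uint32_t min_segs, uint32_t max_segs)
{
    uint64_t segs;

    assert(mss > 0);
    segs = (rate / 1000) / mss;    /* bytes per millisecond, in segments */
    if (segs < min_segs)
        segs = min_segs;
    if (segs > max_segs)
        segs = max_segs;
    return (uint32_t)segs;
}
```

With mss = 1448 and a floor of 2, a slow flow (cwnd 10, srtt 100 ms) stays at 2 segments, a mid-rate flow (cwnd 100, srtt 10 ms) gets 20 segments, and a fast flow (cwnd 1000, srtt 1 ms) is limited only by the cap, which is consistent with the claim that high speed flows are unaffected.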
      
      v2: Neal Cardwell reported a suspicious deferral of the last two segments
      on an initial write of 10 MSS, so I had to change tcp_tso_should_defer()
      to take tp->xmit_size_goal_segs into account.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>