1. 11 May, 2015 1 commit
    • Eric W. Biederman's avatar
      tun: Utilize the normal socket network namespace refcounting. · 140e807d
      Eric W. Biederman authored
      There is no need for tun to do the weird network namespace refcounting.
      The existing network namespace refcounting in tfile has almost exactly
      the same lifetime.  So rewrite the code to use the struct sock network
      namespace refcounting and remove the unnecessary hand rolled network
      namespace refcounting and the unncesary tfile->net.
      
      This change allows the tun code to directly call sock_put bypassing
      sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary.
      
      Remove the now unncessary tun_release so that if anything tries to use
      the sock_release code path the kernel will oops, and let us know about
      the bug.
      
      The macvtap code already uses it's internal socket this way.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      140e807d
  2. 10 May, 2015 14 commits
    • Eric Dumazet's avatar
      codel: add ce_threshold attribute · 80ba92fa
      Eric Dumazet authored
      For DCTCP or similar ECN based deployments on fabrics with shallow
      buffers, hosts are responsible for a good part of the buffering.
      
      This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
      so that DCTCP can have feedback from queuing in the host.
      
      A DCTCP enabled egress port simply have a queue occupancy threshold
      above which ECT packets get CE mark.
      
      In codel language this translates to a sojourn time, so that one doesn't
      have to worry about bytes or bandwidth but delays.
      
      This makes the host an active participant in the health of the whole
      network.
      
      This also helps experimenting DCTCP in a setup without DCTCP compliant
      fabric.
      
      On following example, ce_threshold is set to 1ms, and we can see from
      'ldelay xxx us' that TCP is not trying to go around the 5ms codel
      target.
      
      Queue has more capacity to absorb inelastic bursts (say from UDP
      traffic), as queues are maintained to an optimal level.
      
      lpaa23:~# ./tc -s -d qd sh dev eth1
      qdisc mq 1: dev eth1 root
       Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
       backlog 3108242b 364p requeues 42961
      qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
       rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
        count 0 lastcount 0 ldelay 1.0ms drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
      qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
       rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
        count 0 lastcount 0 ldelay 694us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
      qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
       Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
       rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
        count 0 lastcount 0 ldelay 889us drop_next 0us
        maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
      ...
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      80ba92fa
    • Varka Bhadram's avatar
      ethernet: qualcomm: use spi instead of spi_device · cf9d0dcc
      Varka Bhadram authored
      All spi based drivers have an instance of struct spi_device
      as spi. This patch renames spi_device to spi to synchronize
      with all the drivers.
      Signed-off-by: default avatarVarka Bhadram <varkab@cdac.in>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf9d0dcc
    • David S. Miller's avatar
      Merge branch 'pktgen-next' · 3e3b3468
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      The following series introduce some pktgen changes
      
      Patch01:
       Cleanup my own work when I introduced NO_TIMESTAMP.
      
      Patch02:
       Took over patch from Alexei, and addressed my own concerns, as Alexie
       is too busy with other work, and this will provide an easy tool for
       measuring ingress path performance, which is a hot topic ATM.
      
       Changes were primarily user interface related.  Introduced a separate
       "xmit_mode" setting, instead of stealing one of the dev flags like
       Alexei did.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e3b3468
    • Alexei Starovoitov's avatar
      pktgen: introduce xmit_mode '<start_xmit|netif_receive>' · 62f64aed
      Alexei Starovoitov authored
      Introduce xmit_mode 'netif_receive' for pktgen which generates the
      packets using familiar pktgen commands, but feeds them into
      netif_receive_skb() instead of ndo_start_xmit().
      
      Default mode is called 'start_xmit'.
      
      It is designed to test netif_receive_skb and ingress qdisc
      performace only. Make sure to understand how it works before
      using it for other rx benchmarking.
      
      Sample script 'pktgen.sh':
      \#!/bin/bash
      function pgset() {
        local result
      
        echo $1 > $PGDEV
      
        result=`cat $PGDEV | fgrep "Result: OK:"`
        if [ "$result" = "" ]; then
          cat $PGDEV | fgrep Result:
        fi
      }
      
      [ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
      ETH=$1
      
      PGDEV=/proc/net/pktgen/kpktgend_0
      pgset "rem_device_all"
      pgset "add_device $ETH"
      
      PGDEV=/proc/net/pktgen/$ETH
      pgset "xmit_mode netif_receive"
      pgset "pkt_size 60"
      pgset "dst 198.18.0.1"
      pgset "dst_mac 90:e2:ba:ff:ff:ff"
      pgset "count 10000000"
      pgset "burst 32"
      
      PGDEV=/proc/net/pktgen/pgctrl
      echo "Running... ctrl^C to stop"
      pgset "start"
      echo "Done"
      cat /proc/net/pktgen/$ETH
      
      Usage:
      $ sudo ./pktgen.sh eth2
      ...
      Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
        43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
      
      Raw netif_receive_skb speed should be ~43 million packet
      per second on 3.7Ghz x86 and 'perf report' should look like:
        37.69%  kpktgend_0   [kernel.vmlinux]  [k] __netif_receive_skb_core
        25.81%  kpktgend_0   [kernel.vmlinux]  [k] kfree_skb
         7.22%  kpktgend_0   [kernel.vmlinux]  [k] ip_rcv
         5.68%  kpktgend_0   [pktgen]          [k] pktgen_thread_worker
      
      If fib_table_lookup is seen on top, it means skb was processed
      by the stack. To benchmark netif_receive_skb only make sure
      that 'dst_mac' of your pktgen script is different from
      receiving device mac and it will be dropped by ip_rcv
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62f64aed
    • Jesper Dangaard Brouer's avatar
      pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant · f1f00d8f
      Jesper Dangaard Brouer authored
      Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
      with a negation of the flag like !NO_TIMESTAMP.
      
      Also document the option flag NO_TIMESTAMP.
      
      Fixes: afb84b62 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1f00d8f
    • David S. Miller's avatar
      Merge branch 'netns-scalability' · 4d95b72f
      David S. Miller authored
      Nicolas Dichtel says:
      
      ====================
      netns: ease netlink use with a lot of netns
      
      This idea was informally discussed in Ottawa / netdev0.1. The goal is to
      ease the use/scalability of netns, from a userland point of view.
      Today, users need to open one netlink socket per family and per netns.
      Thus, when the number of netns inscreases (for example 5K or more), the
      number of sockets needed to manage them grows a lot.
      
      The goal of this series is to be able to monitor netlink events, for a
      specified family, for a set of netns, with only one netlink socket. For
      this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID.
      When this option is set on a netlink socket, this socket will receive
      netlink notifications from all netns that have a nsid assigned into the
      netns where the socket has been opened.
      The nsid is sent to userland via an anscillary data.
      
      Here is an example with a patched iproute2. vxlan10 is created in the
      current netns (netns0, nsid 0) and then moved to another netns (netns1,
      nsid 1):
      
      $ ip netns exec netns0 ip monitor all-nsid label
      [nsid 0][NSID]nsid 1 (iproute2 netns name: netns1)
      [nsid 0][NEIGH]??? lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT
      [nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff
      [nsid 1][NSID]nsid 0 (iproute2 netns name: netns0)
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ADDR]5: vxlan10    inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10
             valid_lft forever preferred_lft forever
      [nsid 1][ROUTE]local 192.168.0.249 dev vxlan10  table local  proto kernel  scope host  src 192.168.0.249
      [nsid 1][ROUTE]ff00::/8 dev vxlan10  table local  metric 256  pref medium
      [nsid 1][ROUTE]2001:123::/64 dev vxlan10  proto kernel  metric 256  pref medium
      [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
          link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
      [nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]192.168.0.0/24 dev vxlan10  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10  table local  proto kernel  scope link  src 192.168.0.249
      [nsid 1][ROUTE]fe80::/64 dev vxlan10  proto kernel  metric 256  pref medium
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d95b72f
    • Nicolas Dichtel's avatar
      netlink: allow to listen "all" netns · 59324cf3
      Nicolas Dichtel authored
      More accurately, listen all netns that have a nsid assigned into the netns
      where the netlink socket is opened.
      For this purpose, a netlink socket option is added:
      NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
      socket will receive netlink notifications from all netns that have a nsid
      assigned into the netns where the socket has been opened. The nsid is sent
      to userland via an anscillary data.
      
      With this patch, a daemon needs only one socket to listen many netns. This
      is useful when the number of netns is high.
      
      Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
      the field nsid is valid or not. skb->cb is initialized to 0 on skb
      allocation, thus we are sure that we will never send a nsid 0 by error to
      the userland.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59324cf3
    • Nicolas Dichtel's avatar
      netlink: rename private flags and states · cc3a572f
      Nicolas Dichtel authored
      These flags and states have the same prefix (NETLINK_) that netlink socket
      options. To avoid confusion and to be able to name a flag like a socket
      option, let's use an other prefix: NETLINK_[S|F]_.
      
      Note: a comment has been fixed, it was talking about
      NETLINK_RECV_NO_ENOBUFS socket option instead of NETLINK_NO_ENOBUFS.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc3a572f
    • Nicolas Dichtel's avatar
      netns: use a spin_lock to protect nsid management · 95f38411
      Nicolas Dichtel authored
      Before this patch, nsid were protected by the rtnl lock. The goal of this
      patch is to be able to find a nsid without needing to hold the rtnl lock.
      
      The next patch will introduce a netlink socket option to listen to all
      netns that have a nsid assigned into the netns where the socket is opened.
      Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to
      avoid a recursive lock (nsid are notified via rtnl). This was the main
      reason of the previous patch.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95f38411
    • Nicolas Dichtel's avatar
      netns: notify new nsid outside __peernet2id() · 3138dbf8
      Nicolas Dichtel authored
      There is no functional change with this patch. It will ease the refactoring
      of the locking system that protects nsids and the support of the netlink
      socket option NETLINK_LISTEN_ALL_NSID.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3138dbf8
    • Nicolas Dichtel's avatar
      netns: rename peernet2id() to peernet2id_alloc() · 7a0877d4
      Nicolas Dichtel authored
      In a following commit, a new function will be introduced to only lookup for
      a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the
      existing function is renamed.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a0877d4
    • Nicolas Dichtel's avatar
      netns: always provide the id to rtnl_net_fill() · cab3c8ec
      Nicolas Dichtel authored
      The goal of this commit is to prepare the rework of the locking of nsnid
      protection.
      After this patch, rtnl_net_notifyid() will not call anymore __peernet2id(),
      ie no idr_* operation into this function.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cab3c8ec
    • Nicolas Dichtel's avatar
      netns: returns always an id in __peernet2id() · 109582af
      Nicolas Dichtel authored
      All callers of this function expect a nsid, not an error.
      Thus, returns NETNSA_NSID_NOT_ASSIGNED in case of error so that callers
      don't have to convert the error to NETNSA_NSID_NOT_ASSIGNED.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      109582af
    • David S. Miller's avatar
      Merge tag 'linux-can-next-for-4.2-20150506' of... · 43996fdd
      David S. Miller authored
      Merge tag 'linux-can-next-for-4.2-20150506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2015-05-06
      
      this is a pull request of a seven patches for net-next/master.
      
      Andreas Gröger contributes two patches for the janz-ican3 driver. In
      the first patch, the documentation for already existing sysfs entries
      is added, the second patch adds support for another module/firmware
      variant. A patch by Shawn Landden makes the padding in the struct
      can_frame explicit. The next 4 patches target the flexcan driver, the
      first one is by David Jander adding some documentation, the reaming
      three by me add more documentation and two small code cleanups.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      43996fdd
  3. 09 May, 2015 25 commits