1. 09 May, 2016 31 commits
  2. 07 May, 2016 1 commit
  3. 06 May, 2016 8 commits
    • Jiri Pirko's avatar
      mlxsw: spectrum: Fix ordering in mlxsw_sp_fini · 5113bfdb
      Jiri Pirko authored
      Fixes: 0f433fa0 ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5113bfdb
    • Marc Angel's avatar
      macvtap: add namespace support to the sysfs device class · 17af2bce
      Marc Angel authored
      When creating macvtaps that are expected to have the same ifindex
      in different network namespaces, only the first one will succeed.
      The others will fail with a sysfs_warn_dup warning due to them trying
      to create the following sysfs link (with 'NN' the ifindex of macvtapX):
      
      /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN
      
      This is reproducible by running the following commands:
      
      ip netns add ns1
      ip netns add ns2
      ip link add veth0 type veth peer name veth1
      ip link set veth0 netns ns1
      ip link set veth1 netns ns2
      ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
      ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap
      
      The last command will fail with "RTNETLINK answers: File exists" (along
      with the kernel warning) but retrying it will work because the ifindex
      was incremented.
      
      The 'net' device class is isolated between network namespaces so each
      one has its own hierarchy of net devices.
      This isn't the case for the 'macvtap' device class.
      The problem occurs half-way through the netdev registration, when
      `macvtap_device_event` is called-back to create the 'tapNN' macvtap
      class device under the 'macvtapX' net class device.
      
      This patch adds namespace support to the 'macvtap' device class so
      that /sys/class/macvtap is no longer shared between net namespaces.
      
      However, making the macvtap sysfs class namespace-aware has the side
      effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
      /sys/devices/virtual/net/macvtapX/macvtap/tapNN.
      
      This is due to Commit 24b1442d ("Driver-core: Always create class
      directories for classses that support namespaces") and the fact that
      class devices supporting namespaces are really not supposed to be placed
      directly under other class devices.
      
      To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
      created inside the macvtapX directory.
      Signed-off-by: default avatarMarc Angel <marc@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17af2bce
    • Eric Dumazet's avatar
      ipv4: tcp: ip_send_unicast_reply() is not BH safe · 47dcc20a
      Eric Dumazet authored
      I forgot that ip_send_unicast_reply() is not BH safe (yet).
      
      Disabling preemption before calling it was not a good move.
      
      Fixes: c10d9310 ("tcp: do not assume TCP code is non preemptible")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndres Lagar-Cavilla  <andreslc@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47dcc20a
    • David S. Miller's avatar
      Merge branch 'bpf-direct-pkt-access' · 4b307a8e
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: introduce direct packet access
      
      This set of patches introduce 'direct packet access' from
      cls_bpf and act_bpf programs (which are root only).
      
      Current bpf programs use LD_ABS, LD_INS instructions which have
      to do 'if (off < skb_headlen)' for every packet access.
      It's ok for socket filters, but too slow for XDP, since single
      LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost
      of length check over multiple packet accesses via direct access
      to skb->data, data_end pointers.
      
      The existing packet parser typically look like:
        if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
           return 0;
        if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
            load_byte(skb, ETH_HLEN) != 0x45)
           return 0;
        ...
      with 'direct packet access' the bpf program becomes:
         void *data = (void *)(long)skb->data;
         void *data_end = (void *)(long)skb->data_end;
         struct eth_hdr *eth = data;
         struct iphdr *iph = data + sizeof(*eth);
      
         if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
            return 0;
         if (eth->h_proto != htons(ETH_P_IP))
            return 0;
         if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
            return 0;
         ...
      which is more natural to write and significantly faster.
      See patch 6 for performance tests:
      21Mpps(old) vs 24Mpps(new) with just 5 loads.
      For more complex parsers the performance gain is higher.
      
      The other approach implemented in [1] was adding two new instructions
      to interpreter and JITs and was too hard to use from llvm side.
      The approach presented here doesn't need any instruction changes,
      but the verifier has to work harder to check safety of the packet access.
      
      Patch 1 prepares the code and Patch 2 adds new checks for direct
      packet access and all of them are gated with 'env->allow_ptr_leaks'
      which is true for root only.
      Patch 3 improves search pruning for large programs.
      Patch 4 wires in verifier's changes with net/core/filter side.
      Patch 5 updates docs
      Patches 6 and 7 add tests.
      
      [1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b307a8e
    • Alexei Starovoitov's avatar
      samples/bpf: add verifier tests · 883e44e4
      Alexei Starovoitov authored
      add few tests for "pointer to packet" logic of the verifier
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      883e44e4
    • Alexei Starovoitov's avatar
      samples/bpf: add 'pointer to packet' tests · 65d472fb
      Alexei Starovoitov authored
      parse_simple.c - packet parser exapmle with single length check that
      filters out udp packets for port 9
      
      parse_varlen.c - variable length parser that understand multiple vlan headers,
      ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
      The packet is parsed layer by layer with multitple length checks.
      
      parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
      Same functionality as parse_simple.
      
      simple = 24.1Mpps per core
      varlen = 22.7Mpps
      ldabs  = 21.4Mpps
      
      Parser with LD_ABS instructions is slower than full direct access parser
      which does more packet accesses and checks.
      
      These examples demonstrate the choice bpf program authors can make between
      flexibility of the parser vs speed.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      65d472fb
    • Alexei Starovoitov's avatar
      bpf: add documentation for 'direct packet access' · f9c8d19d
      Alexei Starovoitov authored
      explain how verifier checks safety of packet access
      and update email addresses.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9c8d19d
    • Alexei Starovoitov's avatar
      bpf: wire in data and data_end for cls_act_bpf · db58ba45
      Alexei Starovoitov authored
      allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
      The bpf helpers that change skb->data need to update data_end pointer as well.
      The verifier checks that programs always reload data, data_end pointers
      after calls to such bpf helpers.
      We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
      since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db58ba45