1. 09 Dec, 2014 31 commits
  2. 06 Dec, 2014 9 commits
    • Merge branch 'ebpf-next' · 8d0c4697
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      allow eBPF programs to be attached to sockets
      
      V1->V2:
      
      fixed comments in sample code to state clearly that packet data is accessed
      with LD_ABS instructions and not internal skb fields.
      Also replaced constants in:
      BPF_LD_ABS(BPF_B, 14 + 9 /* R0 = ip->proto */),
      with:
      BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
      
      V1 cover:
      
      Introduce BPF_PROG_TYPE_SOCKET_FILTER type of eBPF programs that can be
      attached to sockets with setsockopt().
      Allow such programs to access maps via lookup/update/delete helpers.
      
      This feature was previewed by bpf manpage in commit b4fc1a46("Merge branch 'bpf-next'")
      Now it can actually run.
      
      1st patch adds LD_ABS/LD_IND instruction verification and
      2nd patch adds new setsockopt() flag.
      Patches 3-6 are examples in assembler and in C.
      
      Though native eBPF programs are way more powerful than classic filters
      (attachable through similar setsockopt() call), they don't have skb field
      accessors yet. Fields such as skb->pkt_type and skb->dev->ifindex are not
      accessible. There are several ways to achieve that; they will come in the
      next set of patches.
      So in this set native eBPF programs can only read data from packet and
      access maps.
      
      The most powerful example is sockex2_kern.c from patch 6 where ~200 lines of C
      are compiled into ~300 eBPF instructions.
      It shows how quite complex packet parsing can be done.
      
      LLVM used to build examples is at https://github.com/iovisor/llvm
      which is a fork of llvm trunk that I'm cleaning up for upstreaming.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples: bpf: large eBPF program in C · fbe33108
      Alexei Starovoitov authored
      sockex2_kern.c is a purposefully large eBPF program in C.
      llvm compiles ~200 lines of C code into ~300 eBPF instructions.
      
      It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing
      can be done by eBPF.
      Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep
      stats of number of packets per IP.
      User space loads eBPF program, attaches it to loopback interface and prints
      dest_ip->#packets stats every second.
      
      Usage:
      $sudo samples/bpf/sockex2
      ip 127.0.0.1 count 19
      ip 127.0.0.1 count 178115
      ip 127.0.0.1 count 369437
      ip 127.0.0.1 count 559841
      ip 127.0.0.1 count 750539
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples: bpf: trivial eBPF program in C · a8085782
      Alexei Starovoitov authored
      this example does the same task as previous socket example
      in assembler, but this one does it in C.
      
      eBPF program in kernel does:
          /* assume that packet is IPv4, load one byte of IP->proto */
          int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
          long *value;
      
          value = bpf_map_lookup_elem(&my_map, &index);
          if (value)
              __sync_fetch_and_add(value, 1);
      
      Corresponding user space reads map[tcp], map[udp], map[icmp]
      and prints protocol stats every second
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
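The counting logic in the commit above can be tried outside the kernel. The following is a minimal user-space sketch, not the sample's code: `proto_count` and `count_packet()` are hypothetical names, a plain array stands in for the BPF array map `my_map`, and direct indexing into a buffer stands in for `load_byte()`. It assumes Linux libc headers that provide `ETH_HLEN` and `struct iphdr`.

```c
#include <stddef.h>
#include <stdint.h>
#include <net/ethernet.h>   /* ETH_HLEN */
#include <netinet/ip.h>     /* struct iphdr */

#define NUM_PROTOS 256

/* Hypothetical stand-in for the array map: one counter per IP protocol. */
static uint64_t proto_count[NUM_PROTOS];

/*
 * Mirrors the in-kernel steps on a raw Ethernet frame held in memory.
 * Like the sample, it assumes the frame carries IPv4.
 * Returns 0 if the frame was counted, -1 if it is too short.
 */
int count_packet(const uint8_t *frame, size_t len)
{
    size_t off = ETH_HLEN + offsetof(struct iphdr, protocol);
    uint8_t proto;

    if (len <= off)
        return -1;

    proto = frame[off];                            /* IP->proto */
    __sync_fetch_and_add(&proto_count[proto], 1);  /* atomic increment */
    return 0;
}
```

Because the protocol field is a single byte, it can index a 256-entry array directly, which is why an array map fits this example.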
    • samples: bpf: elf_bpf file loader · 249b812d
      Alexei Starovoitov authored
      simple .o parser and loader using BPF syscall.
      .o is a standard ELF generated by LLVM backend
      
      It parses elf file compiled by llvm .c->.o
      - parses 'maps' section and creates maps via BPF syscall
      - parses 'license' section and passes it to syscall
      - parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns
        by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
      - loads eBPF programs via BPF syscall
      
      One ELF file can contain multiple BPF programs.
      
      int load_bpf_file(char *path);
      populates prog_fd[] and map_fd[] with FDs received from bpf syscall
      
      bpf_helpers.h - helper functions available to eBPF programs written in C
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
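The relocation step described above (storing map_fd into insn->imm and marking the insn as BPF_PSEUDO_MAP_FD) can be sketched roughly as follows. This is an illustration under stated assumptions, not the loader's source: `struct insn` is a local mirror of the kernel's 8-byte `struct bpf_insn` layout, `patch_map_fd()` is a hypothetical helper name, and the two constants are local copies of the kernel values for `BPF_LD | BPF_IMM | BPF_DW` and `BPF_PSEUDO_MAP_FD`.

```c
#include <stdint.h>

/* Local mirror of struct bpf_insn (assumption: same field layout). */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg:4;   /* destination register */
    uint8_t src_reg:4;   /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate */
};

#define OP_LD_IMM64   0x18  /* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD 1     /* BPF_PSEUDO_MAP_FD */

/*
 * Patch one relocated instruction so that it refers to a map by fd.
 * Returns 0 on success, -1 if the instruction is not a 64-bit
 * immediate load and therefore cannot carry a map reference.
 */
int patch_map_fd(struct insn *insn, int map_fd)
{
    if (insn->code != OP_LD_IMM64)
        return -1;

    insn->src_reg = PSEUDO_MAP_FD;  /* mark imm as a map fd */
    insn->imm = map_fd;             /* fd returned by the BPF syscall */
    return 0;
}
```

Rejecting any other opcode keeps a bad relocation entry from silently corrupting an unrelated instruction.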
    • samples: bpf: example of stateful socket filtering · 03f4723e
      Alexei Starovoitov authored
      this socket filter example does:
      - creates arraymap in kernel with key 4 bytes and value 8 bytes
      
      - loads eBPF program which assumes that packet is IPv4 and loads one byte of
        IP->proto from the packet and uses it as a key in a map
      
        r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)];
        *(u32*)(fp - 4) = r0;
        value = bpf_map_lookup_elem(map_fd, fp - 4);
        if (value)
             (*(u64*)value) += 1;
      
      - attaches this program to raw socket
      
      - every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP]
        to see how many packets of given protocol were seen on loopback interface
      
      Usage:
      $sudo samples/bpf/sock_example
      TCP 0 UDP 0 ICMP 0 packets
      TCP 187600 UDP 0 ICMP 4 packets
      TCP 376504 UDP 0 ICMP 8 packets
      TCP 563116 UDP 0 ICMP 12 packets
      TCP 753144 UDP 0 ICMP 16 packets
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Alexei Starovoitov authored
      introduce new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
      and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
      
      setsockopt() calls bpf_prog_get() which increments refcnt of the program,
      so it doesn't get unloaded while socket is using the program.
      
      The same eBPF program can be attached to multiple sockets.
      
      User task exit automatically closes socket which calls sk_filter_uncharge()
      which decrements refcnt of eBPF program
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
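From user space, the attach step described above might be wrapped as shown here. This is a minimal sketch: `attach_bpf_prog()` is a hypothetical helper name, and the fallback value 50 for `SO_ATTACH_BPF` is the generic Linux number, which is assumed to apply only when the system headers do not define it (some architectures use a different value). Obtaining `prog_fd` from `bpf(BPF_PROG_LOAD, ...)` is outside this sketch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fall back to the generic Linux value on older headers. */
#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50
#endif

/*
 * Attach an already loaded eBPF program to a socket.
 * Returns 0 on success, or -1 with errno left as set by setsockopt().
 */
int attach_bpf_prog(int sock, int prog_fd)
{
    if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF,
                   &prog_fd, sizeof(prog_fd)) < 0) {
        int saved = errno;  /* fprintf() may overwrite errno */

        fprintf(stderr, "SO_ATTACH_BPF failed: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Since the kernel takes its own reference on the program, the caller can presumably close `prog_fd` after a successful attach without detaching the filter.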
    • bpf: verifier: add checks for BPF_ABS | BPF_IND instructions · ddd872bc
      Alexei Starovoitov authored
      introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used
      for attaching programs to sockets where ctx == skb.
      
      add verifier checks for ABS/IND instructions which can only be seen
      in socket filters, therefore the check:
        if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER)
          verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
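To make the rule above concrete, here is a rough user-space sketch of how LD_ABS/LD_IND opcodes can be recognized and rejected for non-socket program types. It is not the verifier's code: `find_disallowed_ld()` and the `prog_type` enum are hypothetical, and the masks are local copies of the BPF opcode encoding (class in the low 3 bits, mode in the top 3 bits).

```c
#include <stddef.h>
#include <stdint.h>

#define OP_CLASS(code) ((code) & 0x07)
#define OP_MODE(code)  ((code) & 0xe0)

#define CLASS_LD 0x00  /* BPF_LD */
#define MODE_ABS 0x20  /* BPF_ABS */
#define MODE_IND 0x40  /* BPF_IND */

/* Hypothetical mirror of the first two bpf_prog_type values. */
enum prog_type { PROG_UNSPEC = 0, PROG_SOCKET_FILTER = 1 };

/*
 * Scan a list of opcodes and return the index of the first LD_ABS or
 * LD_IND instruction that the program type does not allow, or -1 if
 * every instruction passes this particular check.
 */
int find_disallowed_ld(const uint8_t *opcodes, size_t n, enum prog_type type)
{
    size_t i;

    if (type == PROG_SOCKET_FILTER)
        return -1;  /* packet loads are allowed in socket filters */

    for (i = 0; i < n; i++) {
        uint8_t mode = OP_MODE(opcodes[i]);

        if (OP_CLASS(opcodes[i]) == CLASS_LD &&
            (mode == MODE_ABS || mode == MODE_IND))
            return (int)i;
    }
    return -1;
}
```

The real verifier performs further checks on these instructions (register state, reserved fields); this sketch covers only the program-type restriction quoted in the commit message.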
    • tun/macvtap: use consume_skb() instead of kfree_skb() when needed · f51a5e82
      Jason Wang authored
      To be more friendly with drop monitor, we should only call kfree_skb() when
      the packets were dropped and use consume_skb() in other cases.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-PA Semi: Deletion of unnecessary checks before the function call "pci_dev_put" · 6db16718
      Markus Elfring authored
      The pci_dev_put() function tests whether its argument is NULL
      and then returns immediately. Thus the test around the call
      is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Acked-by: Olof Johansson <olof@lixom.net>
      Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>