Commit b4fc1a46 authored by David S. Miller's avatar David S. Miller

Merge branch 'bpf-next'

Alexei Starovoitov says:

====================
eBPF syscall, verifier, testsuite

v14 -> v15:
- got rid of macros with hidden control flow (suggested by David)
  replaced macro with explicit goto or return and simplified
  where possible (affected patches #9 and #10)
- rebased, retested

v13 -> v14:
- small change to 1st patch to ease 'new userspace with old kernel'
  problem (done similar to perf_copy_attr()) (suggested by Daniel)
- the rest unchanged

v12 -> v13:
- replaced 'foo __user *' pointers with __aligned_u64 (suggested by David)
- added __attribute__((aligned(8)) to 'union bpf_attr' to keep
  constant alignment between patches
- updated manpage and syscall wrappers due to __aligned_u64
- rebased, retested on x64 with 32-bit and 64-bit userspace and on i386,
  build tested on arm32,sparc64

v11 -> v12:
- dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel)
- replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy)
- implemented and tested compat support (not part of this set) (.. Daniel)
- changed 'void *log_buf' to 'char *' (.. Daniel)
- combined struct bpf_work_struct and bpf_prog_info (.. Daniel)
- added better return value explanation to manpage (.. Andy)
- added log_buf/log_size explanation to manpage (.. Andy & Daniel)
- added a lot more info about prog_type and map_type to manpage (.. Andy)
- rebased, tweaked test_stubs

Patches 1-4 establish BPF syscall shell for maps and programs.
Patches 5-10 add verifier step by step
Patch 11 adds test stubs for 'unspec' program type and verifier testsuite
  from user space

Note that patches 1,3,4,7 add commands and attributes to the syscall
while being backwards compatible from each other, which should demonstrate
how other commands can be added in the future.

After this set the programs can be loaded for testing only. They cannot
be attached to any events. Though manpage talks about tracing and sockets,
it will be a subject of future patches.

Please take a look at manpage:

BPF(2)                     Linux Programmer's Manual                    BPF(2)

NAME
       bpf - perform a command on eBPF map or program

SYNOPSIS
       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION
       bpf()  syscall  is a multiplexor for a range of different operations on
       eBPF  which  can  be  characterized  as  "universal  in-kernel  virtual
       machine".  eBPF  is  similar  to  original  Berkeley  Packet Filter (or
       "classic BPF") used to filter network packets. Both statically  analyze
       the  programs  before  loading  them  into  the  kernel  to ensure that
       programs cannot harm the running system.

       eBPF extends classic BPF in multiple ways including ability to call in-
       kernel  helper  functions  and  access shared data structures like eBPF
       maps.  The programs can be written in a restricted C that  is  compiled
       into  eBPF  bytecode  and executed on the eBPF virtual machine or JITed
       into native instruction set.

   eBPF Design/Architecture
       eBPF maps is a generic storage of different types.   User  process  can
       create  multiple  maps  (with key/value being opaque bytes of data) and
       access them via file descriptor. In parallel eBPF programs  can  access
       maps  from inside the kernel.  It's up to user process and eBPF program
       to decide what they store inside maps.

       eBPF programs are similar to kernel modules. They  are  loaded  by  the
       user  process  and automatically unloaded when process exits. Each eBPF
       program is a safe run-to-completion set of instructions. eBPF  verifier
       statically  determines  that  the  program  terminates  and  is safe to
       execute. During verification the program takes a hold of maps  that  it
       intends to use, so selected maps cannot be removed until the program is
       unloaded. The program can be attached to different events. These events
       can  be packets, tracepoint events and other types in the future. A new
       event triggers execution of the program  which  may  store  information
       about the event in the maps.  Beyond storing data the programs may call
       into in-kernel helper functions which may, for example, dump stack,  do
       trace_printk  or other forms of live kernel debugging. The same program
       can be attached to multiple events. Different programs can  access  the
       same map:
         tracepoint  tracepoint  tracepoint    sk_buff    sk_buff
          event A     event B     event C      on eth0    on eth1
           |             |          |            |          |
           |             |          |            |          |
           --> tracing <--      tracing       socket      socket
                prog_1           prog_2       prog_3      prog_4
                |  |               |            |
             |---  -----|  |-------|           map_3
           map_1       map_2

   Syscall Arguments
       bpf()  syscall  operation  is determined by cmd which can be one of the
       following:

       BPF_MAP_CREATE
              Create a map with given type and attributes and return map FD

       BPF_MAP_LOOKUP_ELEM
              Lookup element by key in a given map and return its value

       BPF_MAP_UPDATE_ELEM
              Create or update element (key/value pair) in a given map

       BPF_MAP_DELETE_ELEM
              Lookup and delete element by key in a given map

       BPF_MAP_GET_NEXT_KEY
              Lookup element by key in a given map  and  return  key  of  next
              element

       BPF_PROG_LOAD
              Verify and load eBPF program

       attr   is a pointer to a union of type bpf_attr as defined below.

       size   is the size of the union.

       union bpf_attr {
           struct { /* anonymous struct used by BPF_MAP_CREATE command */
               __u32             map_type;
               __u32             key_size;    /* size of key in bytes */
               __u32             value_size;  /* size of value in bytes */
               __u32             max_entries; /* max number of entries in a map */
           };

           struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
               __u32             map_fd;
               __aligned_u64     key;
               union {
                   __aligned_u64 value;
                   __aligned_u64 next_key;
               };
           };

           struct { /* anonymous struct used by BPF_PROG_LOAD command */
               __u32         prog_type;
               __u32         insn_cnt;
               __aligned_u64 insns;     /* 'const struct bpf_insn *' */
               __aligned_u64 license;   /* 'const char *' */
               __u32         log_level; /* verbosity level of eBPF verifier */
               __u32         log_size;  /* size of user buffer */
               __aligned_u64 log_buf;   /* user supplied 'char *' buffer */
           };
       } __attribute__((aligned(8)));

   eBPF maps
       maps  is  a generic storage of different types for sharing data between
       kernel and userspace.

       Any map type has the following attributes:
         . type
         . max number of elements
         . key size in bytes
         . value size in bytes

       The following wrapper functions demonstrate how  this  syscall  can  be
       used  to  access the maps. The functions use the cmd argument to invoke
       different operations.

       BPF_MAP_CREATE
              int bpf_create_map(enum bpf_map_type map_type, int key_size,
                                 int value_size, int max_entries)
              {
                  union bpf_attr attr = {
                      .map_type = map_type,
                      .key_size = key_size,
                      .value_size = value_size,
                      .max_entries = max_entries
                  };

                  return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
              }
              bpf()  syscall  creates  a  map  of  map_type  type  and   given
              attributes  key_size,  value_size,  max_entries.   On success it
              returns process-local file descriptor. On error, -1 is  returned
              and errno is set to EINVAL or EPERM or ENOMEM.

              The  attributes key_size and value_size will be used by verifier
              during  program  loading  to  check  that  program  is   calling
              bpf_map_*_elem() helper functions with correctly initialized key
              and  that  program  doesn't  access  map  element  value  beyond
              specified  value_size.   For  example,  when map is created with
              key_size = 8 and program does:
              bpf_map_lookup_elem(map_fd, fp - 4)
              such program will be rejected, since in-kernel  helper  function
              bpf_map_lookup_elem(map_fd,  void  *key) expects to read 8 bytes
              from 'key' pointer, but 'fp - 4' starting address will cause out
              of bounds stack access.

              Similarly,  when  map is created with value_size = 1 and program
              does:
              value = bpf_map_lookup_elem(...);
              *(u32 *)value = 1;
              such program will be rejected, since it accesses  value  pointer
              beyond specified 1 byte value_size limit.

              Currently only hash table map_type is supported:
              enum bpf_map_type {
                 BPF_MAP_TYPE_UNSPEC,
                 BPF_MAP_TYPE_HASH,
              };
              map_type  selects  one  of  the available map implementations in
              kernel. For all map_types eBPF programs  access  maps  with  the
              same      bpf_map_lookup_elem()/bpf_map_update_elem()     helper
              functions.

       BPF_MAP_LOOKUP_ELEM
              int bpf_lookup_elem(int fd, void *key, void *value)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .value = ptr_to_u64(value),
                  };

                  return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
              }
              bpf() syscall looks up an element with given key in  a  map  fd.
              If  element  is found it returns zero and stores element's value
              into value.  If element is not found  it  returns  -1  and  sets
              errno to ENOENT.

       BPF_MAP_UPDATE_ELEM
              int bpf_update_elem(int fd, void *key, void *value)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .value = ptr_to_u64(value),
                  };

                  return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
              }
              The  call  creates  or updates element with given key/value in a
              map fd.  On success it returns zero.  On error, -1  is  returned
              and  errno  is set to EINVAL or EPERM or ENOMEM or E2BIG.  E2BIG
              indicates that number of elements in the map reached max_entries
              limit specified at map creation time.

       BPF_MAP_DELETE_ELEM
              int bpf_delete_elem(int fd, void *key)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                  };

                  return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
              }
              The call deletes an element in a map fd with given key.  Returns
              zero on success. If element is not found it returns -1 and  sets
              errno to ENOENT.

       BPF_MAP_GET_NEXT_KEY
              int bpf_get_next_key(int fd, void *key, void *next_key)
              {
                  union bpf_attr attr = {
                      .map_fd = fd,
                      .key = ptr_to_u64(key),
                      .next_key = ptr_to_u64(next_key),
                  };

                  return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
              }
              The  call  looks  up  an  element  by  key in a given map fd and
              returns key of the next element into next_key pointer. If key is
              not  found,  it return zero and returns key of the first element
              into next_key. If key is the last element,  it  returns  -1  and
              sets  errno  to  ENOENT. Other possible errno values are ENOMEM,
              EFAULT, EPERM, EINVAL.  This method can be used to iterate  over
              all elements of the map.

       close(map_fd)
              will  delete  the  map  map_fd.  Exiting process will delete all
              maps automatically.

   eBPF programs
       BPF_PROG_LOAD
              This cmd is used to load eBPF program into the kernel.

              char bpf_log_buf[LOG_BUF_SIZE];

              int bpf_prog_load(enum bpf_prog_type prog_type,
                                const struct bpf_insn *insns, int insn_cnt,
                                const char *license)
              {
                  union bpf_attr attr = {
                      .prog_type = prog_type,
                      .insns = ptr_to_u64(insns),
                      .insn_cnt = insn_cnt,
                      .license = ptr_to_u64(license),
                      .log_buf = ptr_to_u64(bpf_log_buf),
                      .log_size = LOG_BUF_SIZE,
                      .log_level = 1,
                  };

                  return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
              }
              prog_type is one of the available program types:
              enum bpf_prog_type {
                      BPF_PROG_TYPE_UNSPEC,
                      BPF_PROG_TYPE_SOCKET,
                      BPF_PROG_TYPE_TRACING,
              };
              By picking prog_type program author  selects  a  set  of  helper
              functions callable from eBPF program and corresponding format of
              struct bpf_context (which is  the  data  blob  passed  into  the
              program  as  the  first  argument).   For  example, the programs
              loaded with  prog_type  =  TYPE_TRACING  may  call  bpf_printk()
              helper,  whereas  TYPE_SOCKET  programs  may  not.   The  set of
              functions  available  to  the  programs  under  given  type  may
              increase in the future.

              Currently the set of functions for TYPE_TRACING is:
              bpf_map_lookup_elem(map_fd, void *key)              // lookup key in a map_fd
              bpf_map_update_elem(map_fd, void *key, void *value) // update key/value
              bpf_map_delete_elem(map_fd, void *key)              // delete key in a map_fd
              bpf_ktime_get_ns(void)                              // returns current ktime
              bpf_printk(char *fmt, int fmt_size, ...)            // prints into trace buffer
              bpf_memcmp(void *ptr1, void *ptr2, int size)        // non-faulting memcmp
              bpf_fetch_ptr(void *ptr)    // non-faulting load pointer from any address
              bpf_fetch_u8(void *ptr)     // non-faulting 1 byte load
              bpf_fetch_u16(void *ptr)    // other non-faulting loads
              bpf_fetch_u32(void *ptr)
              bpf_fetch_u64(void *ptr)

              and bpf_context is defined as:
              struct bpf_context {
                  /* argN fields match one to one to arguments passed to trace events */
                  u64 arg1, arg2, arg3, arg4, arg5, arg6;
                  /* return value from kretprobe event or from syscall_exit event */
                  u64 ret;
              };

              The set of helper functions for TYPE_SOCKET is TBD.

              More   program   types   may   be  added  in  the  future.  Like
              BPF_PROG_TYPE_USER_TRACING for unprivileged programs.

              BPF_PROG_TYPE_UNSPEC is used for  testing  only.  Such  programs
              cannot be attached to events.

              insns array of "struct bpf_insn" instructions

              insn_cnt number of instructions in the program

              license  license  string,  which  must be GPL compatible to call
              helper functions marked gpl_only

              log_buf user supplied buffer that in-kernel verifier is using to
              store  verification  log. Log is a multi-line string that should
              be used by program author to understand  how  verifier  came  to
              conclusion  that program is unsafe. The format of the output can
              change at any time as verifier evolves.

              log_size size of user buffer. If size of the buffer is not large
              enough  to store all verifier messages, -1 is returned and errno
              is set to ENOSPC.

              log_level verbosity level of eBPF verifier, where zero means  no
              logs provided

       close(prog_fd)
              will unload eBPF program

       The  maps  are  accesible  from  programs  and  generally  tie  the two
       together.  Programs process various events  (like  tracepoint,  kprobe,
       packets)  and  store  the  data into maps. User space fetches data from
       maps.  Either the same or a different map may be used by user space  as
       configuration space to alter program behavior on the fly.

   Events
       Once an eBPF program is loaded, it can be attached to an event. Various
       kernel subsystems have different ways to do so. For example:

       setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd));
       will attach the program prog_fd to socket sock which  was  received  by
       prior call to socket().

       ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
       will  attach  the  program  prog_fd  to  perf  event event_fd which was
       received by prior call to perf_event_open().

       Another way to attach the program to a tracing event is:
       event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter");
       write(event_fd, "bpf-123"); /* where 123 is eBPF program FD */
       /* here program is attached and will be triggered by events */
       close(event_fd); /* to detach from event */

EXAMPLES
       /* eBPF+sockets example:
        * 1. create map with maximum of 2 elements
        * 2. set map[6] = 0 and map[17] = 0
        * 3. load eBPF program that counts number of TCP and UDP packets received
        *    via map[skb->ip->proto]++
        * 4. attach prog_fd to raw socket via setsockopt()
        * 5. print number of received TCP/UDP packets every second
        */
       int main(int ac, char **av)
       {
           int sock, map_fd, prog_fd, key;
           long long value = 0, tcp_cnt, udp_cnt;

           map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2);
           if (map_fd < 0) {
               printf("failed to create map '%s'\n", strerror(errno));
               /* likely not run as root */
               return 1;
           }

           key = 6; /* ip->proto == tcp */
           assert(bpf_update_elem(map_fd, &key, &value) == 0);

           key = 17; /* ip->proto == udp */
           assert(bpf_update_elem(map_fd, &key, &value) == 0);

           struct bpf_insn prog[] = {
               BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),          /* r6 = r1 */
               BPF_LD_ABS(BPF_B, 14 + 9),                    /* r0 = ip->proto */
               BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/* *(u32 *)(fp - 4) = r0 */
               BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),         /* r2 = fp */
               BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),        /* r2 = r2 - 4 */
               BPF_LD_MAP_FD(BPF_REG_1, map_fd),             /* r1 = map_fd */
               BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),      /* r0 = map_lookup(r1, r2) */
               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),        /* if (r0 == 0) goto pc+2 */
               BPF_MOV64_IMM(BPF_REG_1, 1),                  /* r1 = 1 */
               BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */
               BPF_MOV64_IMM(BPF_REG_0, 0),                  /* r0 = 0 */
               BPF_EXIT_INSN(),                              /* return r0 */
           };
           prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog), "GPL");
           assert(prog_fd >= 0);

           sock = open_raw_sock("lo");

           assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                             sizeof(prog_fd)) == 0);

           for (;;) {
               key = 6;
               assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
               key = 17;
               assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
               printf("TCP %lld UDP %lld packets0, tcp_cnt, udp_cnt);
               sleep(1);
           }

           return 0;
       }

RETURN VALUE
       For a successful call, the return value depends on the operation:

       BPF_MAP_CREATE
              The new file descriptor associated with eBPF map.

       BPF_PROG_LOAD
              The new file descriptor associated with eBPF program.

       All other commands
              Zero.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
       EPERM  bpf() syscall was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).

       ENOMEM Cannot allocate sufficient memory.

       EBADF  fd is not an open file descriptor

       EFAULT One  of  the  pointers  (  key or value or log_buf or insns ) is
              outside accessible address space.

       EINVAL The value specified in cmd is not recognized by this kernel.

       EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.

       EINVAL For BPF_MAP_*_ELEM  commands,  some  of  the  fields  of  "union
              bpf_attr" unused by this command are not set to zero.

       EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized
              instruction or uses reserved fields or jumps  out  of  range  or
              loop detected or calls unknown function).

       EACCES For BPF_PROG_LOAD, though program has valid instructions, it was
              rejected, since it was  deemed  unsafe  (may  access  disallowed
              memory   region  or  uninitialized  stack/register  or  function
              constraints don't match actual types or misaligned  access).  In
              such case it is recommended to call bpf() again with log_level =
              1 and examine log_buf for specific reason provided by verifier.

       ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM,  indicates  that
              element with given key was not found.

       E2BIG  program  is  too  large  or a map reached max_entries limit (max
              number of elements).

NOTES
       These commands may be used only by a privileged process (one having the
       CAP_SYS_ADMIN capability).

SEE ALSO
       eBPF    architecture    and    instruction    set   is   explained   in
       Documentation/networking/filter.txt

Linux                             2014-09-16                            BPF(2)
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 4a8e320c 3c731eba
......@@ -1001,6 +1001,269 @@ instruction that loads 64-bit immediate value into a dst_reg.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register.
eBPF verifier
-------------
The safety of the eBPF program is determined in two steps.
First step does DAG check to disallow loops and other CFG validation.
In particular it will detect programs that have unreachable instructions.
(though classic BPF checker allows them)
Second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.
At the start of the program the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If verifier sees an insn that does R2=R1, then R2 has now type
PTR_TO_CTX as well and can be used on the right hand side of expression.
If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users)
If register was never written to, it's not readable:
bpf_mov R0 = R2
bpf_exit
will be rejected, since R2 is unreadable at the start of the program.
After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function.
Since R6-R9 are callee saved, their state is preserved across the call.
bpf_mov R6 = 1
bpf_call foo
bpf_mov R0 = R6
bpf_exit
is a correct program. If there was R1 instead of R6, it would have
been rejected.
load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked.
For example:
bpf_mov R1 = 1
bpf_mov R2 = 2
bpf_xadd *(u32 *)(R1 + 3) += R2
bpf_exit
will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.
At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context')
A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment.
For example, the following insn:
bpf_ld R0 = *(u32 *)(R6 + 8)
intends to load a word from address R6 + 8 and store it into R0
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
the verifier will reject the program.
If R6=FRAME_PTR, then access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after
it wrote into it.
Classic BPF verifier does similar check with M[0-15] memory slots.
For example:
bpf_ld R0 = *(u32 *)(R10 - 4)
bpf_exit
is invalid program.
Though R10 is correct read-only register and has type FRAME_PTR
and R10 - 4 is within stack bounds, there were no stores into that location.
Pointer register spill/fill is tracked as well, since four (R6-R9)
callee saved registers may not be enough for some programs.
Allowed function calls are customized with bpf_verifier_ops->get_func_proto()
The eBPF verifier will check that registers match argument constraints.
After the call register R0 will be set to return type of the function.
Function calls is a main mechanism to extend functionality of eBPF programs.
Socket filters may let programs to call one set of functions, whereas tracing
filters may allow completely different set.
If a function made accessible to eBPF program, it needs to be thought through
from safety point of view. The verifier will guarantee that the function is
called with valid arguments.
seccomp vs socket filters have different security restrictions for classic BPF.
Seccomp solves this by two stage verifier: classic BPF verifier is followed
by seccomp verifier. In case of eBPF one configurable verifier is shared for
all use cases.
See details of eBPF verifier in kernel/bpf/verifier.c
eBPF maps
---------
'maps' is a generic storage of different types for sharing data between kernel
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
returns process-local file descriptor or negative error
- lookup key in a given map
err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map
err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key, attr->value
returns zero or negative error
- find and delete element by key in a given map
err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key
- to delete map: close(fd)
Exiting process will delete maps automatically
userspace programs use this syscall to create/access maps that eBPF programs
are concurrently updating.
maps can have different types: hash, array, bloom filter, radix-tree, etc.
The map is defined by:
. type
. max number of elements
. key size in bytes
. value size in bytes
Understanding eBPF verifier messages
------------------------------------
The following are few examples of invalid eBPF programs and verifier error
messages as seen in the log:
Program with unreachable instructions:
static struct bpf_insn prog[] = {
BPF_EXIT_INSN(),
BPF_EXIT_INSN(),
};
Error:
unreachable insn 1
Program that reads uninitialized register:
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_EXIT_INSN(),
Error:
0: (bf) r0 = r2
R2 !read_ok
Program that doesn't initialize R0 before exiting:
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_EXIT_INSN(),
Error:
0: (bf) r2 = r1
1: (95) exit
R0 !read_ok
Program that accesses stack out of bounds:
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8
Program that doesn't initialize stack before passing its address into function:
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error:
0: (bf) r2 = r10
1: (07) r2 += -8
2: (b7) r1 = 0x0
3: (85) call 1
invalid indirect read from stack off -8+0 size 8
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 0x0
4: (85) call 1
fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing
map element:
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 0x0
4: (85) call 1
5: (7a) *(u64 *)(r0 +0) = 0
R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment:
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 1
4: (85) call 1
5: (15) if r0 == 0x0 goto pc+1
R0=map_ptr R10=fp
6: (7a) *(u64 *)(r0 +4) = 0
misaligned access off 4 size 8
Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails
to do so in the other side of 'if' branch:
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
Error:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 1
4: (85) call 1
5: (15) if r0 == 0x0 goto pc+2
R0=map_ptr R10=fp
6: (7a) *(u64 *)(r0 +0) = 0
7: (95) exit
from 5 to 8: R0=imm0 R10=fp
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'
Testing
-------
......
......@@ -363,3 +363,4 @@
354 i386 seccomp sys_seccomp
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
......@@ -327,6 +327,7 @@
318 common getrandom sys_getrandom
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
#
# x32-specific system call numbers start at 512 to avoid cache impact
......
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#ifndef _LINUX_BPF_H
#define _LINUX_BPF_H 1
#include <uapi/linux/bpf.h>
#include <linux/workqueue.h>
#include <linux/file.h>
struct bpf_map;
/* map is generic key/value storage optionally accesible by eBPF programs */
struct bpf_map_ops {
/* funcs callable from userspace (via syscall) */
struct bpf_map *(*map_alloc)(union bpf_attr *attr);
void (*map_free)(struct bpf_map *);
int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
/* funcs callable from userspace and from eBPF programs */
void *(*map_lookup_elem)(struct bpf_map *map, void *key);
int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
int (*map_delete_elem)(struct bpf_map *map, void *key);
};
struct bpf_map {
atomic_t refcnt;
enum bpf_map_type map_type;
u32 key_size;
u32 value_size;
u32 max_entries;
struct bpf_map_ops *ops;
struct work_struct work;
};
struct bpf_map_type_list {
struct list_head list_node;
struct bpf_map_ops *ops;
enum bpf_map_type type;
};
void bpf_register_map_type(struct bpf_map_type_list *tl);
void bpf_map_put(struct bpf_map *map);
struct bpf_map *bpf_map_get(struct fd f);
/* function argument constraints */
enum bpf_arg_type {
ARG_ANYTHING = 0, /* any argument is ok */
/* the following constraints used to prototype
* bpf_map_lookup/update/delete_elem() functions
*/
ARG_CONST_MAP_PTR, /* const argument used as pointer to bpf_map */
ARG_PTR_TO_MAP_KEY, /* pointer to stack used as map key */
ARG_PTR_TO_MAP_VALUE, /* pointer to stack used as map value */
/* the following constraints used to prototype bpf_memcmp() and other
* functions that access data on eBPF program stack
*/
ARG_PTR_TO_STACK, /* any pointer to eBPF program stack */
ARG_CONST_STACK_SIZE, /* number of bytes accessed from stack */
};
/* type of values returned from helper functions */
enum bpf_return_type {
RET_INTEGER, /* function returns integer */
RET_VOID, /* function doesn't return anything */
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
* to in-kernel helper functions and for adjusting imm32 field in BPF_CALL
* instructions after verifying
*/
struct bpf_func_proto {
u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
bool gpl_only;
enum bpf_return_type ret_type;
enum bpf_arg_type arg1_type;
enum bpf_arg_type arg2_type;
enum bpf_arg_type arg3_type;
enum bpf_arg_type arg4_type;
enum bpf_arg_type arg5_type;
};
/* bpf_context is intentionally undefined structure. Pointer to bpf_context is
* the first argument to eBPF programs.
* For socket filters: 'struct bpf_context *' == 'struct sk_buff *'
*/
struct bpf_context;
enum bpf_access_type {
BPF_READ = 1,
BPF_WRITE = 2
};
struct bpf_verifier_ops {
/* return eBPF function prototype for verification */
const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
/* return true if 'size' wide access at offset 'off' within bpf_context
* with 'type' (read or write) is allowed
*/
bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
};
struct bpf_prog_type_list {
struct list_head list_node;
struct bpf_verifier_ops *ops;
enum bpf_prog_type type;
};
void bpf_register_prog_type(struct bpf_prog_type_list *tl);
struct bpf_prog;
struct bpf_prog_aux {
atomic_t refcnt;
bool is_gpl_compatible;
enum bpf_prog_type prog_type;
struct bpf_verifier_ops *ops;
struct bpf_map **used_maps;
u32 used_map_cnt;
struct bpf_prog *prog;
struct work_struct work;
};
void bpf_prog_put(struct bpf_prog *prog);
struct bpf_prog *bpf_prog_get(u32 ufd);
/* verify correctness of eBPF program */
int bpf_check(struct bpf_prog *fp, union bpf_attr *attr);
#endif /* _LINUX_BPF_H */
......@@ -21,6 +21,7 @@
struct sk_buff;
struct sock;
struct seccomp_data;
struct bpf_prog_aux;
/* ArgX, context and stack frame pointer register positions. Note,
* Arg1, Arg2, Arg3, etc are used as argument mappings of function
......@@ -144,6 +145,12 @@ struct seccomp_data;
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
#define BPF_PSEUDO_MAP_FD 1
/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
......@@ -300,17 +307,12 @@ struct bpf_binary_header {
u8 image[];
};
struct bpf_work_struct {
struct bpf_prog *prog;
struct work_struct work;
};
struct bpf_prog {
u16 pages; /* Number of allocated pages */
bool jited; /* Is our filter JIT'ed? */
u32 len; /* Number of filter blocks */
struct sock_fprog_kern *orig_prog; /* Original BPF program */
struct bpf_work_struct *work; /* Deferred free work struct */
struct bpf_prog_aux *aux; /* Auxiliary fields */
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
/* Instructions for interpreter */
......
......@@ -65,6 +65,7 @@ struct old_linux_dirent;
struct perf_event_attr;
struct file_handle;
struct sigaltstack;
union bpf_attr;
#include <linux/types.h>
#include <linux/aio_abi.h>
......@@ -875,5 +876,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
const char __user *uargs);
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
#endif
......@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
__SYSCALL(__NR_getrandom, sys_getrandom)
#define __NR_memfd_create 279
__SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#undef __NR_syscalls
#define __NR_syscalls 280
#define __NR_syscalls 281
/*
* All syscalls below here should go away really,
......
......@@ -62,4 +62,94 @@ struct bpf_insn {
__s32 imm; /* signed immediate constant */
};
/* BPF syscall commands */
enum bpf_cmd {
/* create a map with given type and attributes
* fd = bpf(BPF_MAP_CREATE, union bpf_attr *, u32 size)
* returns fd or negative error
* map is deleted when fd is closed
*/
BPF_MAP_CREATE,
/* lookup key in a given map
* err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
* Using attr->map_fd, attr->key, attr->value
* returns zero and stores found elem into value
* or negative error
*/
BPF_MAP_LOOKUP_ELEM,
/* create or update key/value pair in a given map
* err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
* Using attr->map_fd, attr->key, attr->value
* returns zero or negative error
*/
BPF_MAP_UPDATE_ELEM,
/* find and delete elem by key in a given map
* err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
* Using attr->map_fd, attr->key
* returns zero or negative error
*/
BPF_MAP_DELETE_ELEM,
/* lookup key in a given map and return next key
* err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
* Using attr->map_fd, attr->key, attr->next_key
* returns zero and stores next key or negative error
*/
BPF_MAP_GET_NEXT_KEY,
/* verify and load eBPF program
* prog_fd = bpf(BPF_PROG_LOAD, union bpf_attr *attr, u32 size)
* Using attr->prog_type, attr->insns, attr->license
* returns fd or negative error
*/
BPF_PROG_LOAD,
};
enum bpf_map_type {
BPF_MAP_TYPE_UNSPEC,
};
enum bpf_prog_type {
BPF_PROG_TYPE_UNSPEC,
};
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
__u32 key_size; /* size of key in bytes */
__u32 value_size; /* size of value in bytes */
__u32 max_entries; /* max number of entries in a map */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
__u32 map_fd;
__aligned_u64 key;
union {
__aligned_u64 value;
__aligned_u64 next_key;
};
};
struct { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type; /* one of enum bpf_prog_type */
__u32 insn_cnt;
__aligned_u64 insns;
__aligned_u64 license;
__u32 log_level; /* verbosity level of verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied buffer */
};
} __attribute__((aligned(8)));
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
*/
enum bpf_func_id {
BPF_FUNC_unspec,
__BPF_FUNC_MAX_ID,
};
#endif /* _UAPI__LINUX_BPF_H__ */
obj-y := core.o
obj-y := core.o syscall.o verifier.o
ifdef CONFIG_TEST_BPF
obj-y += test_stub.o
endif
......@@ -27,6 +27,7 @@
#include <linux/random.h>
#include <linux/moduleloader.h>
#include <asm/unaligned.h>
#include <linux/bpf.h>
/* Registers */
#define BPF_R0 regs[BPF_REG_0]
......@@ -71,7 +72,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
{
gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO |
gfp_extra_flags;
struct bpf_work_struct *ws;
struct bpf_prog_aux *aux;
struct bpf_prog *fp;
size = round_up(size, PAGE_SIZE);
......@@ -79,14 +80,14 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
if (fp == NULL)
return NULL;
ws = kmalloc(sizeof(*ws), GFP_KERNEL | gfp_extra_flags);
if (ws == NULL) {
aux = kzalloc(sizeof(*aux), GFP_KERNEL | gfp_extra_flags);
if (aux == NULL) {
vfree(fp);
return NULL;
}
fp->pages = size / PAGE_SIZE;
fp->work = ws;
fp->aux = aux;
return fp;
}
......@@ -110,10 +111,10 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
memcpy(fp, fp_old, fp_old->pages * PAGE_SIZE);
fp->pages = size / PAGE_SIZE;
/* We keep fp->work from fp_old around in the new
/* We keep fp->aux from fp_old around in the new
* reallocated structure.
*/
fp_old->work = NULL;
fp_old->aux = NULL;
__bpf_prog_free(fp_old);
}
......@@ -123,7 +124,7 @@ EXPORT_SYMBOL_GPL(bpf_prog_realloc);
void __bpf_prog_free(struct bpf_prog *fp)
{
kfree(fp->work);
kfree(fp->aux);
vfree(fp);
}
EXPORT_SYMBOL_GPL(__bpf_prog_free);
......@@ -638,19 +639,19 @@ EXPORT_SYMBOL_GPL(bpf_prog_select_runtime);
static void bpf_prog_free_deferred(struct work_struct *work)
{
struct bpf_work_struct *ws;
struct bpf_prog_aux *aux;
ws = container_of(work, struct bpf_work_struct, work);
bpf_jit_free(ws->prog);
aux = container_of(work, struct bpf_prog_aux, work);
bpf_jit_free(aux->prog);
}
/* Free internal BPF program */
void bpf_prog_free(struct bpf_prog *fp)
{
struct bpf_work_struct *ws = fp->work;
struct bpf_prog_aux *aux = fp->aux;
INIT_WORK(&ws->work, bpf_prog_free_deferred);
ws->prog = fp;
schedule_work(&ws->work);
INIT_WORK(&aux->work, bpf_prog_free_deferred);
aux->prog = fp;
schedule_work(&aux->work);
}
EXPORT_SYMBOL_GPL(bpf_prog_free);
This diff is collapsed.
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/bpf.h>
/* test stubs for BPF_MAP_TYPE_UNSPEC and for BPF_PROG_TYPE_UNSPEC
* to be used by user space verifier testsuite
*/
struct bpf_context {
u64 arg1;
u64 arg2;
};
static u64 test_func(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
{
return 0;
}
static struct bpf_func_proto test_funcs[] = {
[BPF_FUNC_unspec] = {
.func = test_func,
.gpl_only = true,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_MAP_KEY,
},
};
static const struct bpf_func_proto *test_func_proto(enum bpf_func_id func_id)
{
if (func_id < 0 || func_id >= ARRAY_SIZE(test_funcs))
return NULL;
return &test_funcs[func_id];
}
static const struct bpf_context_access {
int size;
enum bpf_access_type type;
} test_ctx_access[] = {
[offsetof(struct bpf_context, arg1)] = {
FIELD_SIZEOF(struct bpf_context, arg1),
BPF_READ
},
[offsetof(struct bpf_context, arg2)] = {
FIELD_SIZEOF(struct bpf_context, arg2),
BPF_READ
},
};
static bool test_is_valid_access(int off, int size, enum bpf_access_type type)
{
const struct bpf_context_access *access;
if (off < 0 || off >= ARRAY_SIZE(test_ctx_access))
return false;
access = &test_ctx_access[off];
if (access->size == size && (access->type & type))
return true;
return false;
}
static struct bpf_verifier_ops test_ops = {
.get_func_proto = test_func_proto,
.is_valid_access = test_is_valid_access,
};
static struct bpf_prog_type_list tl_prog = {
.ops = &test_ops,
.type = BPF_PROG_TYPE_UNSPEC,
};
static struct bpf_map *test_map_alloc(union bpf_attr *attr)
{
struct bpf_map *map;
map = kzalloc(sizeof(*map), GFP_USER);
if (!map)
return ERR_PTR(-ENOMEM);
map->key_size = attr->key_size;
map->value_size = attr->value_size;
map->max_entries = attr->max_entries;
return map;
}
static void test_map_free(struct bpf_map *map)
{
kfree(map);
}
static struct bpf_map_ops test_map_ops = {
.map_alloc = test_map_alloc,
.map_free = test_map_free,
};
static struct bpf_map_type_list tl_map = {
.ops = &test_map_ops,
.type = BPF_MAP_TYPE_UNSPEC,
};
static int __init register_test_ops(void)
{
bpf_register_map_type(&tl_map);
bpf_register_prog_type(&tl_prog);
return 0;
}
late_initcall(register_test_ops);
This diff is collapsed.
......@@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
/* operate on Secure Computing state */
cond_syscall(sys_seccomp);
/* access BPF programs and maps */
cond_syscall(sys_bpf);
......@@ -1672,7 +1672,8 @@ config TEST_BPF
against the BPF interpreter or BPF JIT compiler depending on the
current setting. This is in particular useful for BPF JIT compiler
development, but also to run regression tests against changes in
the interpreter code.
the interpreter code. It also enables test stubs for eBPF maps and
verifier used by user space verifier testsuite.
If unsure, say N.
......
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o
# List of programs to build
hostprogs-y := test_verifier
test_verifier-objs := test_verifier.o libbpf.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
HOSTCFLAGS += -I$(objtree)/usr/include
/* eBPF mini library */
#include <stdlib.h>
#include <stdio.h>
#include <linux/unistd.h>
#include <unistd.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/bpf.h>
#include <errno.h>
#include "libbpf.h"
static __u64 ptr_to_u64(void *ptr)
{
return (__u64) (unsigned long) ptr;
}
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries)
{
union bpf_attr attr = {
.map_type = map_type,
.key_size = key_size,
.value_size = value_size,
.max_entries = max_entries
};
return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
int bpf_update_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
int bpf_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
int bpf_delete_elem(int fd, void *key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
};
return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
int bpf_get_next_key(int fd, void *key, void *next_key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.next_key = ptr_to_u64(next_key),
};
return syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
char bpf_log_buf[LOG_BUF_SIZE];
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int prog_len,
const char *license)
{
union bpf_attr attr = {
.prog_type = prog_type,
.insns = ptr_to_u64((void *) insns),
.insn_cnt = prog_len / sizeof(struct bpf_insn),
.license = ptr_to_u64((void *) license),
.log_buf = ptr_to_u64(bpf_log_buf),
.log_size = LOG_BUF_SIZE,
.log_level = 1,
};
bpf_log_buf[0] = 0;
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
/* eBPF mini library */
#ifndef __LIBBPF_H
#define __LIBBPF_H
struct bpf_insn;
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries);
int bpf_update_elem(int fd, void *key, void *value);
int bpf_lookup_elem(int fd, void *key, void *value);
int bpf_delete_elem(int fd, void *key);
int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int insn_len,
const char *license);
#define LOG_BUF_SIZE 8192
extern char bpf_log_buf[LOG_BUF_SIZE];
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
#define BPF_ALU64_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_ALU32_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
#define BPF_ALU64_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_ALU32_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Short form of mov, dst_reg = src_reg */
#define BPF_MOV64_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
#define BPF_LD_IMM64(DST, IMM) \
BPF_LD_IMM64_RAW(DST, 0, IMM)
#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_DW | BPF_IMM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = (__u32) (IMM) }), \
((struct bpf_insn) { \
.code = 0, /* zero is reserved opcode */ \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = ((__u64) (IMM)) >> 32 })
#define BPF_PSEUDO_MAP_FD 1
/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD) \
BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
((struct bpf_insn) { \
.code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
#define BPF_JMP_REG(OP, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Raw code statement block */
#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
((struct bpf_insn) { \
.code = CODE, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = IMM })
/* Program exit */
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = 0 })
#endif
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment