Commit 4b307a8e authored by David S. Miller

Merge branch 'bpf-direct-pkt-access'

Alexei Starovoitov says:

====================
bpf: introduce direct packet access

This set of patches introduces 'direct packet access' from
cls_bpf and act_bpf programs (which are root only).

Current bpf programs use LD_ABS, LD_IND instructions which have
to do 'if (off < skb_headlen)' for every packet access.
It's ok for socket filters, but too slow for XDP, since a single
LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost
of the length check over multiple packet accesses via direct access
to the skb->data and data_end pointers.

The existing packet parser typically looks like:
  if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
     return 0;
  if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
      load_byte(skb, ETH_HLEN) != 0x45)
     return 0;
  ...
with 'direct packet access' the bpf program becomes:
   void *data = (void *)(long)skb->data;
   void *data_end = (void *)(long)skb->data_end;
   struct eth_hdr *eth = data;
   struct iphdr *iph = data + sizeof(*eth);
   struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);

   if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
      return 0;
   if (eth->h_proto != htons(ETH_P_IP))
      return 0;
   if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
      return 0;
   ...
which is more natural to write and significantly faster.
See patch 6 for performance tests:
21 Mpps (old) vs 24 Mpps (new) with just 5 loads.
For more complex parsers the performance gain is higher.

The other approach, implemented in [1], added two new instructions
to the interpreter and JITs and was too hard to use from the llvm side.
The approach presented here doesn't need any instruction changes,
but the verifier has to work harder to check the safety of packet accesses.

Patch 1 prepares the code and patch 2 adds the new checks for direct
packet access, all of which are gated by 'env->allow_ptr_leaks',
which is true for root only.
Patch 3 improves search pruning for large programs.
Patch 4 wires the verifier changes into the net/core/filter side.
Patch 5 updates the docs.
Patches 6 and 7 add tests.

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 95aef7ce 883e44e4
@@ -1095,6 +1095,87 @@ all use cases.
See details of eBPF verifier in kernel/bpf/verifier.c
Direct packet access
--------------------
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers.
Ex:
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
3: r5 = r3
4: r5 += 14
5: if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u16 *)(r3 +12) /* access bytes 12 and 13 of the packet */
This 2-byte load from the packet is safe to do, since the program author
did check 'if (skb->data + 14 > skb->data_end) goto err' at insn #5, which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
id=0 means that no additional variables were added to the register.
off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14)
which is zero bytes.
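For reference, a minimal C sketch of what the instruction sequence above
corresponds to (the 14-byte bound matches an Ethernet header; the variable
name h_proto is only for illustration):

  void *data = (void *)(long)skb->data;          /* insn #2 */
  void *data_end = (void *)(long)skb->data_end;  /* insn #1 */
  __u16 h_proto;

  if (data + 14 > data_end)                      /* insn #5: range check, r=14 */
          return 0;                              /* the 'goto err' path */
  h_proto = *(__u16 *)(data + 12);               /* insn #6: bytes 12 and 13 are in range */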
More complex packet access may look like:
R0=imm1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
7: r4 = *(u8 *)(r3 +12)
8: r4 *= 14
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
10: r3 += r4
11: r2 = r1
12: r2 <<= 48
13: r2 >>= 48
14: r3 += r2
15: r2 = r3
16: r2 += 8
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
18: if r2 > r1 goto pc+2
R0=inv56 R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv52 R5=pkt(id=0,off=14,r=14) R10=fp
19: r1 = *(u8 *)(r3 +4)
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some
offset within a packet and since the program author did
'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows the 'add' operation on packet registers. Any other
operation will set the register state to 'unknown_value' and it won't be
available for direct packet access.
The operation 'r3 += rX' may overflow and become less than the original
skb->data, therefore the verifier has to prevent that. So it tracks the number
of upper zero bits in all 'unknown_value' registers, and when it sees an
'r3 += rX' instruction where rX may hold a value wider than 16 bits, it will error as:
"cannot add integer value with N upper zero bits to ptr_to_packet"
Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
R4=inv56, which means that the upper 56 bits of the register are guaranteed
to be zero. After insn 'r4 *= 14' the state becomes R4=inv52, since
multiplying an 8-bit value by the constant 14 keeps the upper 52 bits zero.
Similarly 'r2 >>= 48' will make R2=inv48, since the shift is not sign
extending. This logic is implemented in the evaluate_reg_alu() function.
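For illustration only, a C sketch (not taken from the patches) of the kind of
variable-offset access this tracking permits; the index byte at offset 12 and
the *14 stride simply mirror the instruction sequence above:

  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  __u8 idx;
  void *rec;

  if (data + 14 > data_end)            /* fixed-offset check first */
          return 0;
  idx = *(__u8 *)(data + 12);          /* unknown_value, upper 56 bits are zero */
  rec = data + idx * 14;               /* 'pkt += unknown_value' gets a new id */
  if (rec + 8 > data_end)              /* establishes range r=8 for this id */
          return 0;
  /* bytes [rec, rec + 8) may now be read directly */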
The end result is that a bpf program author can access the packet directly
using normal C code such as:
  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  struct eth_hdr *eth = data;
  struct iphdr *iph = data + sizeof(*eth);
  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);

  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
          return 0;
  if (eth->h_proto != htons(ETH_P_IP))
          return 0;
  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
          return 0;
  if (udp->dest == 53 || udp->source == 9)
          ...;
which makes such programs easier to write compared to the LD_ABS insns
and significantly faster.
eBPF maps
---------
'maps' is a generic storage of different types for sharing data between kernel
@@ -1293,5 +1374,5 @@ to give potential BPF hackers or security auditors a better overview of
the underlying architecture.
Jay Schulist <jschlst@samba.org>
Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov <ast@kernel.org>
@@ -352,6 +352,22 @@ struct sk_filter {
#define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
struct bpf_skb_data_end {
struct qdisc_skb_cb qdisc_cb;
void *data_end;
};
/* compute the linear packet data range [data, data_end) which
* will be accessed by cls_bpf and act_bpf programs
*/
static inline void bpf_compute_data_end(struct sk_buff *skb)
{
struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
BUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));
cb->data_end = skb->data + skb_headlen(skb);
}
static inline u8 *bpf_skb_cb(struct sk_buff *skb)
{
/* eBPF programs may read/write skb->cb[] area to transfer meta
......
@@ -370,6 +370,8 @@ struct __sk_buff {
__u32 cb[5];
__u32 hash;
__u32 tc_classid;
__u32 data;
__u32 data_end;
};
struct bpf_tunnel_key {
......
@@ -794,6 +794,11 @@ void __weak bpf_int_jit_compile(struct bpf_prog *prog)
{
}
bool __weak bpf_helper_changes_skb_data(void *func)
{
return false;
}
/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
 * skb_copy_bits(), so provide a weak definition of it for NET-less config.
 */
......
This diff is collapsed.
@@ -1344,6 +1344,21 @@ struct bpf_scratchpad {
static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp);
static inline int bpf_try_make_writable(struct sk_buff *skb,
unsigned int write_len)
{
int err;
if (!skb_cloned(skb))
return 0;
if (skb_clone_writable(skb, write_len))
return 0;
err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
if (!err)
bpf_compute_data_end(skb);
return err;
}
static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
{
struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp);
@@ -1366,7 +1381,7 @@ static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
*/
if (unlikely((u32) offset > 0xffff || len > sizeof(sp->buff)))
return -EFAULT;
if (unlikely(bpf_try_make_writable(skb, offset + len)))
return -EFAULT;
ptr = skb_header_pointer(skb, offset, len, sp->buff);
@@ -1444,7 +1459,7 @@ static u64 bpf_l3_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
return -EINVAL;
if (unlikely((u32) offset > 0xffff))
return -EFAULT;
if (unlikely(bpf_try_make_writable(skb, offset + sizeof(sum))))
return -EFAULT;
ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
@@ -1499,7 +1514,7 @@ static u64 bpf_l4_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
return -EINVAL;
if (unlikely((u32) offset > 0xffff))
return -EFAULT;
if (unlikely(bpf_try_make_writable(skb, offset + sizeof(sum))))
return -EFAULT;
ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
@@ -1699,12 +1714,15 @@ static u64 bpf_skb_vlan_push(u64 r1, u64 r2, u64 vlan_tci, u64 r4, u64 r5)
{
struct sk_buff *skb = (struct sk_buff *) (long) r1;
__be16 vlan_proto = (__force __be16) r2;
int ret;
if (unlikely(vlan_proto != htons(ETH_P_8021Q) &&
vlan_proto != htons(ETH_P_8021AD)))
vlan_proto = htons(ETH_P_8021Q);
ret = skb_vlan_push(skb, vlan_proto, vlan_tci);
bpf_compute_data_end(skb);
return ret;
}
const struct bpf_func_proto bpf_skb_vlan_push_proto = {
@@ -1720,8 +1738,11 @@ EXPORT_SYMBOL_GPL(bpf_skb_vlan_push_proto);
static u64 bpf_skb_vlan_pop(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
{
struct sk_buff *skb = (struct sk_buff *) (long) r1;
int ret;
ret = skb_vlan_pop(skb);
bpf_compute_data_end(skb);
return ret;
}
const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
@@ -2066,8 +2087,12 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type)
static bool sk_filter_is_valid_access(int off, int size,
enum bpf_access_type type)
{
switch (off) {
case offsetof(struct __sk_buff, tc_classid):
case offsetof(struct __sk_buff, data):
case offsetof(struct __sk_buff, data_end):
return false;
}
if (type == BPF_WRITE) {
switch (off) {
@@ -2215,6 +2240,20 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
*insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg, ctx_off);
break;
case offsetof(struct __sk_buff, data):
*insn++ = BPF_LDX_MEM(bytes_to_bpf_size(FIELD_SIZEOF(struct sk_buff, data)),
dst_reg, src_reg,
offsetof(struct sk_buff, data));
break;
case offsetof(struct __sk_buff, data_end):
ctx_off -= offsetof(struct __sk_buff, data_end);
ctx_off += offsetof(struct sk_buff, cb);
ctx_off += offsetof(struct bpf_skb_data_end, data_end);
*insn++ = BPF_LDX_MEM(bytes_to_bpf_size(sizeof(void *)),
dst_reg, src_reg, ctx_off);
break;
case offsetof(struct __sk_buff, tc_index):
#ifdef CONFIG_NET_SCHED
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tc_index) != 2);
......
@@ -53,9 +53,11 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
filter = rcu_dereference(prog->filter);
if (at_ingress) {
__skb_push(skb, skb->mac_len);
bpf_compute_data_end(skb);
filter_res = BPF_PROG_RUN(filter, skb);
__skb_pull(skb, skb->mac_len);
} else {
bpf_compute_data_end(skb);
filter_res = BPF_PROG_RUN(filter, skb);
}
rcu_read_unlock();
......
@@ -96,9 +96,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
if (at_ingress) {
/* It is safe to push/pull even if skb_shared() */
__skb_push(skb, skb->mac_len);
bpf_compute_data_end(skb);
filter_res = BPF_PROG_RUN(prog->filter, skb);
__skb_pull(skb, skb->mac_len);
} else {
bpf_compute_data_end(skb);
filter_res = BPF_PROG_RUN(prog->filter, skb);
}
......
@@ -60,6 +60,7 @@ always += spintest_kern.o
always += map_perf_test_kern.o
always += test_overhead_tp_kern.o
always += test_overhead_kprobe_kern.o
always += parse_varlen.o parse_simple.o parse_ldabs.o
HOSTCFLAGS += -I$(objtree)/usr/include
@@ -120,4 +121,5 @@ $(src)/*.c: verify_target_bpf
$(obj)/%.o: $(src)/%.c
$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"
#define DEFAULT_PKTGEN_UDP_PORT 9
#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
static inline int ip_is_fragment(struct __sk_buff *ctx, __u64 nhoff)
{
return load_half(ctx, nhoff + offsetof(struct iphdr, frag_off))
& (IP_MF | IP_OFFSET);
}
SEC("ldabs")
int handle_ingress(struct __sk_buff *skb)
{
__u64 troff = ETH_HLEN + sizeof(struct iphdr);
if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
return 0;
if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
load_byte(skb, ETH_HLEN) != 0x45)
return 0;
if (ip_is_fragment(skb, ETH_HLEN))
return 0;
if (load_half(skb, troff + offsetof(struct udphdr, dest)) == DEFAULT_PKTGEN_UDP_PORT)
return TC_ACT_SHOT;
return 0;
}
char _license[] SEC("license") = "GPL";
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <uapi/linux/bpf.h>
#include <net/ip.h>
#include "bpf_helpers.h"
#define DEFAULT_PKTGEN_UDP_PORT 9
/* copy of 'struct ethhdr' without __packed */
struct eth_hdr {
unsigned char h_dest[ETH_ALEN];
unsigned char h_source[ETH_ALEN];
unsigned short h_proto;
};
SEC("simple")
int handle_ingress(struct __sk_buff *skb)
{
void *data = (void *)(long)skb->data;
struct eth_hdr *eth = data;
struct iphdr *iph = data + sizeof(*eth);
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
void *data_end = (void *)(long)skb->data_end;
/* single length check */
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
return 0;
if (eth->h_proto != htons(ETH_P_IP))
return 0;
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
return 0;
if (ip_is_fragment(iph))
return 0;
if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT))
return TC_ACT_SHOT;
return 0;
}
char _license[] SEC("license") = "GPL";
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <uapi/linux/bpf.h>
#include <net/ip.h>
#include "bpf_helpers.h"
#define DEFAULT_PKTGEN_UDP_PORT 9
#define DEBUG 0
static int tcp(void *data, uint64_t tp_off, void *data_end)
{
struct tcphdr *tcp = data + tp_off;
if (tcp + 1 > data_end)
return 0;
if (tcp->dest == htons(80) || tcp->source == htons(80))
return TC_ACT_SHOT;
return 0;
}
static int udp(void *data, uint64_t tp_off, void *data_end)
{
struct udphdr *udp = data + tp_off;
if (udp + 1 > data_end)
return 0;
if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
udp->source == htons(DEFAULT_PKTGEN_UDP_PORT)) {
if (DEBUG) {
char fmt[] = "udp port 9 indeed\n";
bpf_trace_printk(fmt, sizeof(fmt));
}
return TC_ACT_SHOT;
}
return 0;
}
static int parse_ipv4(void *data, uint64_t nh_off, void *data_end)
{
struct iphdr *iph;
uint64_t ihl_len;
iph = data + nh_off;
if (iph + 1 > data_end)
return 0;
if (ip_is_fragment(iph))
return 0;
ihl_len = iph->ihl * 4;
if (iph->protocol == IPPROTO_IPIP) {
iph = data + nh_off + ihl_len;
if (iph + 1 > data_end)
return 0;
ihl_len += iph->ihl * 4;
}
if (iph->protocol == IPPROTO_TCP)
return tcp(data, nh_off + ihl_len, data_end);
else if (iph->protocol == IPPROTO_UDP)
return udp(data, nh_off + ihl_len, data_end);
return 0;
}
static int parse_ipv6(void *data, uint64_t nh_off, void *data_end)
{
struct ipv6hdr *ip6h;
struct iphdr *iph;
uint64_t ihl_len = sizeof(struct ipv6hdr);
uint64_t nexthdr;
ip6h = data + nh_off;
if (ip6h + 1 > data_end)
return 0;
nexthdr = ip6h->nexthdr;
if (nexthdr == IPPROTO_IPIP) {
iph = data + nh_off + ihl_len;
if (iph + 1 > data_end)
return 0;
ihl_len += iph->ihl * 4;
nexthdr = iph->protocol;
} else if (nexthdr == IPPROTO_IPV6) {
ip6h = data + nh_off + ihl_len;
if (ip6h + 1 > data_end)
return 0;
ihl_len += sizeof(struct ipv6hdr);
nexthdr = ip6h->nexthdr;
}
if (nexthdr == IPPROTO_TCP)
return tcp(data, nh_off + ihl_len, data_end);
else if (nexthdr == IPPROTO_UDP)
return udp(data, nh_off + ihl_len, data_end);
return 0;
}
struct vlan_hdr {
uint16_t h_vlan_TCI;
uint16_t h_vlan_encapsulated_proto;
};
SEC("varlen")
int handle_ingress(struct __sk_buff *skb)
{
void *data = (void *)(long)skb->data;
struct ethhdr *eth = data;
void *data_end = (void *)(long)skb->data_end;
uint64_t h_proto, nh_off;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
return 0;
h_proto = eth->h_proto;
if (h_proto == ETH_P_8021Q || h_proto == ETH_P_8021AD) {
struct vlan_hdr *vhdr;
vhdr = data + nh_off;
nh_off += sizeof(struct vlan_hdr);
if (data + nh_off > data_end)
return 0;
h_proto = vhdr->h_vlan_encapsulated_proto;
}
if (h_proto == ETH_P_8021Q || h_proto == ETH_P_8021AD) {
struct vlan_hdr *vhdr;
vhdr = data + nh_off;
nh_off += sizeof(struct vlan_hdr);
if (data + nh_off > data_end)
return 0;
h_proto = vhdr->h_vlan_encapsulated_proto;
}
if (h_proto == htons(ETH_P_IP))
return parse_ipv4(data, nh_off, data_end);
else if (h_proto == htons(ETH_P_IPV6))
return parse_ipv6(data, nh_off, data_end);
return 0;
}
char _license[] SEC("license") = "GPL";
#!/bin/bash
function pktgen {
../pktgen/pktgen_bench_xmit_mode_netif_receive.sh -i $IFC -s 64 \
-m 90:e2:ba:ff:ff:ff -d 192.168.0.1 -t 4
local dropped=`tc -s qdisc show dev $IFC | tail -3 | awk '/drop/{print $7}'`
if [ "$dropped" == "0," ]; then
echo "FAIL"
else
echo "Successfully filtered " $dropped " packets"
fi
}
function test {
echo -n "Loading bpf program '$2'... "
tc qdisc add dev $IFC clsact
tc filter add dev $IFC ingress bpf da obj $1 sec $2
local status=$?
if [ $status -ne 0 ]; then
echo "FAIL"
else
echo "ok"
pktgen
fi
tc qdisc del dev $IFC clsact
}
IFC=test_veth
ip link add name $IFC type veth peer name pair_$IFC
ip link set $IFC up
ip link set pair_$IFC up
test ./parse_simple.o simple
test ./parse_varlen.o varlen
test ./parse_ldabs.o ldabs
ip link del dev $IFC
@@ -1448,6 +1448,86 @@ static struct bpf_test tests[] = {
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
"pkt: test1",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
offsetof(struct __sk_buff, data_end)),
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
"pkt: test2",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
offsetof(struct __sk_buff, data_end)),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_MOV64_REG(BPF_REG_5, BPF_REG_3),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, 14),
BPF_JMP_REG(BPF_JGT, BPF_REG_5, BPF_REG_4, 15),
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_3, 7),
BPF_LDX_MEM(BPF_B, BPF_REG_4, BPF_REG_3, 12),
BPF_ALU64_IMM(BPF_MUL, BPF_REG_4, 14),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_4),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 48),
BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 48),
BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_3),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 8),
BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1,
offsetof(struct __sk_buff, data_end)),
BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
"pkt: test3",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.errstr = "invalid bpf_context access off=76",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
},
{
"pkt: test4",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
offsetof(struct __sk_buff, data_end)),
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
BPF_STX_MEM(BPF_B, BPF_REG_2, BPF_REG_2, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.errstr = "cannot write",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
};
static int probe_filter_length(struct bpf_insn *fp)
......