bpf: direct packet access (969bf05e) · Commits · Kirill Smelkov / linux

Commit 969bf05e authored May 05, 2016 by Alexei Starovoitov, committed by David S. Miller May 06, 2016

bpf: direct packet access

Extended BPF carried over two instructions from classic BPF to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do a length check on every access.
When BPF is processing 20M packets per second, a single LD_ABS after JIT
consumes 3% CPU. Hence the need to optimize it further by amortizing
the cost of the 'off < skb_headlen' check over multiple packet accesses.
One option is to introduce two new eBPF instructions, LD_ABS_DW and LD_IND_DW,
with usage similar to skb_header_pointer().
The kernel part for the interpreter and x64 JIT was implemented in [1], but such
new insns behave like the old ld_abs and abort the program with 'return 0' if
the access is beyond linear data. Such hidden control flow is hard to work around,
and changing JITs and rolling out a new llvm is inconvenient.
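
The amortization argument can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct pkt`, `load_u8_checked()`, `sum_per_access()`, `sum_amortized()` and the `n_checks` counter are hypothetical names invented for this illustration, not kernel APIs, and the model only considers linear packet data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linear part of a packet. */
struct pkt {
	const uint8_t *data;
	size_t len;
};

/* Counts executed bounds checks (for illustration only). */
static unsigned long n_checks;

/* LD_ABS-like access: every single load pays for its own length check. */
static int load_u8_checked(const struct pkt *p, size_t off, uint8_t *out)
{
	n_checks++;
	if (off >= p->len)
		return -1;
	*out = p->data[off];
	return 0;
}

/* Sum the first n bytes, checking bounds on every access. */
static int sum_per_access(const struct pkt *p, size_t n, unsigned int *sum)
{
	uint8_t b;
	size_t i;

	*sum = 0;
	for (i = 0; i < n; i++) {
		if (load_u8_checked(p, i, &b) < 0)
			return -1;
		*sum += b;
	}
	return 0;
}

/* Sum the first n bytes after one up-front check that covers all of them. */
static int sum_amortized(const struct pkt *p, size_t n, unsigned int *sum)
{
	size_t i;

	n_checks++;
	if (n > p->len)
		return -1;
	*sum = 0;
	for (i = 0; i < n; i++)
		*sum += p->data[i];
	return 0;
}
```

Reading the 14 bytes of an Ethernet header costs 14 checks in the first variant and a single check in the second; the work per byte is otherwise the same.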

Therefore allow cls_bpf/act_bpf programs to access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}
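
The example above is abbreviated, so here is a user-space model of the same pattern that can be compiled and run outside the kernel. It is a sketch, not the BPF program itself: `struct fake_skb`, `struct ipv4_hdr_sketch`, `ETH_HLEN_SKETCH`, `classify()` and the return value 2 for "everything else" are hypothetical. In a real program `data` and `data_end` are `__u32` context fields, commonly converted with casts such as `(void *)(long)skb->data`; the model uses plain pointers instead. The header is declared with byte fields only, so the sketch does not depend on host alignment or byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH 14 /* assumed Ethernet header length */

/* Hypothetical model of the two new context fields, as plain pointers. */
struct fake_skb {
	const uint8_t *data;
	const uint8_t *data_end;
};

/* Minimal 20-byte IPv4 header, byte fields only (illustrative layout). */
struct ipv4_hdr_sketch {
	uint8_t ver_ihl; /* version in the high nibble */
	uint8_t tos;
	uint8_t tot_len[2];
	uint8_t id[2];
	uint8_t frag_off[2];
	uint8_t ttl;
	uint8_t protocol;
	uint8_t check[2];
	uint8_t saddr[4];
	uint8_t daddr[4];
};

static int classify(const struct fake_skb *skb)
{
	const struct ipv4_hdr_sketch *ip;
	size_t avail = (size_t)(skb->data_end - skb->data);

	/*
	 * One bounds check covers every field read below. It expresses the
	 * same condition as the pointer comparison in the commit message,
	 * written as a length comparison to stay within valid C pointer
	 * arithmetic in user space.
	 */
	if (avail < ETH_HLEN_SKETCH + sizeof(*ip))
		return 0; /* packet too small */

	ip = (const struct ipv4_hdr_sketch *)(skb->data + ETH_HLEN_SKETCH);

	/* Direct loads from the header, no further checks needed. */
	if ((ip->ver_ihl >> 4) != 4)
		return 1;
	if (ip->saddr[0] == 127 && ip->saddr[1] == 0 &&
	    ip->saddr[2] == 0 && ip->saddr[3] == 1)
		return 1;

	return 2; /* hypothetical result for all other packets */
}
```

Comparing the source address byte by byte sidesteps the question of byte order: `saddr` is carried in network byte order, so a comparison against a host-order constant would need a conversion on little-endian machines.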

This solution avoids the introduction of new instructions. llvm stays
the same and all JITs stay the same, but the verifier has to work extra hard
to prove the safety of the above program.

For XDP the direct store instructions can be allowed as well.

skb->data is NET_IP_ALIGN-ed, so for common cases the verifier can check
the alignment. Complex packet parsers, where the packet pointer is adjusted
incrementally, cannot be tracked for alignment, so allow only byte access in such
cases, and allow misaligned access on architectures that define
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
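
One reading of this alignment rule can be written down as a small predicate. This is a sketch under assumptions, not the verifier code (that diff is collapsed on this page): `struct pkt_ptr_state`, `pkt_access_alignment_ok()` and `SKETCH_NET_IP_ALIGN` are hypothetical names, the value 2 is the common default of NET_IP_ALIGN (some architectures set it to 0), and the `efficient_unaligned` flag stands in for the architecture's configuration.

```c
#include <stdbool.h>

/* Assumption: the common NET_IP_ALIGN default; some architectures use 0. */
#define SKETCH_NET_IP_ALIGN 2

/* Hypothetical summary of what is known about a packet pointer. */
struct pkt_ptr_state {
	int off;           /* known constant offset from the packet start */
	bool variable_off; /* true once a non-constant value was added */
};

/*
 * Decide whether a load of `size` bytes (1, 2, 4 or 8) at
 * reg + insn_off is acceptable from an alignment point of view.
 */
static bool pkt_access_alignment_ok(const struct pkt_ptr_state *reg,
				    int insn_off, int size,
				    bool efficient_unaligned)
{
	/* Alignment unknown: only byte-sized accesses are allowed. */
	if (reg->variable_off)
		return size == 1;

	/* The architecture handles misaligned accesses efficiently. */
	if (efficient_unaligned)
		return true;

	/* The packet start sits SKETCH_NET_IP_ALIGN bytes past an aligned address. */
	return (SKETCH_NET_IP_ALIGN + reg->off + insn_off) % size == 0;
}
```

With these assumptions, a 4-byte load of the IPv4 source address (Ethernet header of 14 bytes, field offset 12) passes because 2 + 14 + 12 = 28 is a multiple of 4, while a 4-byte load at the very start of the packet does not, because 2 is not.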

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

parent 1a0dc1ac

View file @ 969bf05e

@@ -370,6 +370,8 @@ struct __sk_buff {
 	__u32 cb[5];
 	__u32 hash;
 	__u32 tc_classid;
+	__u32 data;
+	__u32 data_end;
 };
 
 struct bpf_tunnel_key {

View file @ 969bf05e

@@ -794,6 +794,11 @@ void __weak bpf_int_jit_compile(struct bpf_prog *prog)
 {
 }
 
+bool __weak bpf_helper_changes_skb_data(void *func)
+{
+	return false;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
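
The hunk above adds a weak default that always returns false. How a weak definition behaves can be sketched in user space. Everything here is an assumption made for illustration: `helper_changes_pkt_data()` and `must_invalidate_pkt_pointers()` are hypothetical names, `__attribute__((weak))` is the GCC/Clang spelling of what the kernel's `__weak` macro expresses, and how the verifier actually uses the real function is in the collapsed diff and not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Weak default: the linker uses this definition only if no other object
 * file provides a regular (strong) definition of the same symbol.
 */
__attribute__((weak)) bool helper_changes_pkt_data(void *func)
{
	(void)func; /* the default ignores which helper is being asked about */
	return false;
}

/*
 * Hypothetical caller: it does not need to know whether the weak default
 * or a strong override ends up in the final binary.
 */
static bool must_invalidate_pkt_pointers(void *helper)
{
	return helper_changes_pkt_data(helper);
}
```

When this is built on its own, the caller always sees false; linking in another file that defines `helper_changes_pkt_data()` without the weak attribute would replace the default without any change to the caller.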

View file @ 969bf05e

This diff is collapsed.

Kirill Smelkov @kirr mentioned in commit 19de99f70b87fcc3338da52a89c439b088cbff71 · Feb 26, 2017

Kirill Smelkov @kirr mentioned in commit 1f415a74b0ca64b5bfacbb12d71ed2ec050a8cfb · Feb 26, 2017

Kirill Smelkov @kirr mentioned in commit 2d2be8cab26ed918e94d2deae89580003242a123 · Feb 26, 2017

Kirill Smelkov @kirr mentioned in commit b399cf64e318ac8c5f10d36bb911e61c746b8788 · Feb 26, 2017

Kirill Smelkov @kirr mentioned in commit b1977682a3858b5584ffea7cfb7bd863f68db18d · Sep 27, 2017
