Commit 071a234a authored by David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer, which allow
BPF programs to perform socket lookups in a network namespace. This lets a
program determine early on in processing whether the stack is expecting to
receive the packet, and take some action (e.g. drop, forward somewhere)
based on this information; a minimal usage sketch follows this summary.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage,
except that all the data is per-cpu. The main goal of the per-cpu variant is
to implement super fast counters (e.g. packet counters) that require
neither lookups nor atomic operations in the fast path.
An example of these hybrid counters is in selftests/bpf/netcnt_prog.c; a
simplified sketch also follows this summary.

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings a consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================
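
A minimal usage sketch for 1) above (not taken from the series itself),
assuming the selftests-style helper declarations (SEC(), the TC_ACT_* return
codes, and the bpf_sk_lookup_tcp()/bpf_sk_release() prototypes from
tools/testing/selftests/bpf/bpf_helpers.h); parsing the packet headers into
the tuple is omitted:

	SEC("classifier")
	int drop_unknown_tcp(struct __sk_buff *skb)
	{
		struct bpf_sock_tuple tuple = {};
		struct bpf_sock *sk;

		/* fill tuple.ipv4 from the parsed IPv4/TCP headers (omitted) */
		sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
		if (!sk)
			return TC_ACT_SHOT;	/* no local socket expects this packet */

		bpf_sk_release(sk);		/* release the acquired reference */
		return TC_ACT_OK;
	}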
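
A simplified sketch for 2) above, in the spirit of (but not copied from)
selftests/bpf/netcnt_prog.c, assuming the same selftests map and section
conventions:

	struct bpf_map_def SEC("maps") percpu_cnt = {
		.type		= BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
		.key_size	= sizeof(struct bpf_cgroup_storage_key),
		.value_size	= sizeof(__u64),
	};

	SEC("cgroup/skb")
	int count_packets(struct __sk_buff *skb)
	{
		__u64 *cnt = bpf_get_local_storage(&percpu_cnt, 0);

		(*cnt)++;	/* per-cpu slot: no lookup, no atomic op */
		return 1;	/* allow the packet */
	}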
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 9000a457 df3f94a0
......@@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
and 3000 refers to the same chunk.
UMEM Completetion Ring
~~~~~~~~~~~~~~~~~~~~~~
UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~
The Completion Ring is used to transfer ownership of UMEM frames from
kernel-space to user-space. Just like the Fill ring, UMEM indices are
......
......@@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for:
Instruction Addressing mode Description
ld 1, 2, 3, 4, 10 Load word into A
ld 1, 2, 3, 4, 12 Load word into A
ldi 4 Load word into A
ldh 1, 2 Load half-word into A
ldb 1, 2 Load byte into A
ldx 3, 4, 5, 10 Load word into X
ldx 3, 4, 5, 12 Load word into X
ldxi 4 Load word into X
ldxb 5 Load byte into X
......@@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for:
jmp 6 Jump to label
ja 6 Jump to label
jeq 7, 8 Jump on A == k
jneq 8 Jump on A != k
jne 8 Jump on A != k
jlt 8 Jump on A < k
jle 8 Jump on A <= k
jgt 7, 8 Jump on A > k
jge 7, 8 Jump on A >= k
jset 7, 8 Jump on A & k
jeq 7, 8, 9, 10 Jump on A == <x>
jneq 9, 10 Jump on A != <x>
jne 9, 10 Jump on A != <x>
jlt 9, 10 Jump on A < <x>
jle 9, 10 Jump on A <= <x>
jgt 7, 8, 9, 10 Jump on A > <x>
jge 7, 8, 9, 10 Jump on A >= <x>
jset 7, 8, 9, 10 Jump on A & <x>
add 0, 4 A + <x>
sub 0, 4 A - <x>
......@@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for:
tax Copy A into X
txa Copy X into A
ret 4, 9 Return
ret 4, 11 Return
The next table shows addressing formats from the 2nd column:
......@@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column:
5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet
6 L Jump label L
7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf
8 #k,Lt Jump to Lt if predicate is true
9 a/%a Accumulator A
10 extension BPF extension
8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf
9 #k,Lt Jump to Lt if predicate is true
10 x/%x,Lt Jump to Lt if predicate is true
11 a/%a Accumulator A
12 extension BPF extension
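For example, using the register forms above, a minimal bpf_asm filter that
accepts only IPv4 frames by comparing A against X (a sketch that follows the
label syntax used in the examples elsewhere in this document) could be:
	ldh [12]
	ldxi #0x800
	jeq %x, pass, drop
	pass: ret #-1
	drop: ret #0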
The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
......@@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows:
PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET Pointer to struct bpf_sock, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
......@@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function.
Direct packet access
--------------------
......@@ -1444,6 +1461,55 @@ Error:
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without
checking it:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (b7) r0 = 0
9: (95) exit
Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (95) exit
Unreleased reference id=1, alloc_insn=7
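For contrast, a sketch (not taken from this patch set) of the same lookup with
the reference handled correctly, i.e. NULL-checked and, in the non-NULL case,
passed to the socket release function before the program exits:
	BPF_MOV64_IMM(BPF_REG_2, 0),
	BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
	BPF_MOV64_IMM(BPF_REG_3, 4),
	BPF_MOV64_IMM(BPF_REG_4, 0),
	BPF_MOV64_IMM(BPF_REG_5, 0),
	BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),	/* skip release if NULL */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
	BPF_EMIT_CALL(BPF_FUNC_sk_release),
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
Here the reference acquired by the lookup is released on every path that
acquired it, so the verifier accepts the program.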
Testing
-------
......
......@@ -89,15 +89,32 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size)
return skb;
}
static unsigned int
nfp_bpf_cmsg_map_req_size(struct nfp_app_bpf *bpf, unsigned int n)
{
unsigned int size;
size = sizeof(struct cmsg_req_map_op);
size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;
return size;
}
static struct sk_buff *
nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
{
return nfp_bpf_cmsg_alloc(bpf, nfp_bpf_cmsg_map_req_size(bpf, n));
}
static unsigned int
nfp_bpf_cmsg_map_reply_size(struct nfp_app_bpf *bpf, unsigned int n)
{
unsigned int size;
size = sizeof(struct cmsg_req_map_op);
size += sizeof(struct cmsg_key_value_pair) * n;
size = sizeof(struct cmsg_reply_map_op);
size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;
return nfp_bpf_cmsg_alloc(bpf, size);
return size;
}
static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
......@@ -338,6 +355,34 @@ void nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map)
dev_consume_skb_any(skb);
}
static void *
nfp_bpf_ctrl_req_key(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_req_val(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_reply_key(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}
static void *
nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}
static int
nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
enum nfp_bpf_cmsg_type op,
......@@ -366,12 +411,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
/* Copy inputs */
if (key)
memcpy(&req->elem[0].key, key, map->key_size);
memcpy(nfp_bpf_ctrl_req_key(bpf, req, 0), key, map->key_size);
if (value)
memcpy(&req->elem[0].value, value, map->value_size);
memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
map->value_size);
skb = nfp_bpf_cmsg_communicate(bpf, skb, op,
sizeof(*reply) + sizeof(*reply->elem));
nfp_bpf_cmsg_map_reply_size(bpf, 1));
if (IS_ERR(skb))
return PTR_ERR(skb);
......@@ -382,9 +428,11 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
/* Copy outputs */
if (out_key)
memcpy(out_key, &reply->elem[0].key, map->key_size);
memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0),
map->key_size);
if (out_value)
memcpy(out_value, &reply->elem[0].value, map->value_size);
memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0),
map->value_size);
dev_consume_skb_any(skb);
......@@ -428,6 +476,13 @@ int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
key, NULL, 0, next_key, NULL);
}
unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf)
{
return max3((unsigned int)NFP_NET_DEFAULT_MTU,
nfp_bpf_cmsg_map_req_size(bpf, 1),
nfp_bpf_cmsg_map_reply_size(bpf, 1));
}
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
{
struct nfp_app_bpf *bpf = app->priv;
......
......@@ -52,6 +52,7 @@ enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_RANDOM = 4,
NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5,
NFP_BPF_CAP_TYPE_ADJUST_TAIL = 6,
NFP_BPF_CAP_TYPE_ABI_VERSION = 7,
};
struct nfp_bpf_cap_tlv_func {
......@@ -98,6 +99,7 @@ enum nfp_bpf_cmsg_type {
#define CMSG_TYPE_MAP_REPLY_BIT 7
#define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))
/* BPF ABIv2 fixed-length control message fields */
#define CMSG_MAP_KEY_LW 16
#define CMSG_MAP_VALUE_LW 16
......@@ -147,24 +149,19 @@ struct cmsg_reply_map_free_tbl {
__be32 count;
};
struct cmsg_key_value_pair {
__be32 key[CMSG_MAP_KEY_LW];
__be32 value[CMSG_MAP_VALUE_LW];
};
struct cmsg_req_map_op {
struct cmsg_hdr hdr;
__be32 tid;
__be32 count;
__be32 flags;
struct cmsg_key_value_pair elem[0];
u8 data[0];
};
struct cmsg_reply_map_op {
struct cmsg_reply_map_simple reply_hdr;
__be32 count;
__be32 resv;
struct cmsg_key_value_pair elem[0];
u8 data[0];
};
struct cmsg_bpf_event {
......
......@@ -54,11 +54,14 @@ const struct rhashtable_params nfp_bpf_maps_neutral_params = {
static bool nfp_net_ebpf_capable(struct nfp_net *nn)
{
#ifdef __LITTLE_ENDIAN
if (nn->cap & NFP_NET_CFG_CTRL_BPF &&
nn_readb(nn, NFP_NET_CFG_BPF_ABI) == NFP_NET_BPF_ABI)
return true;
#endif
struct nfp_app_bpf *bpf = nn->app->priv;
return nn->cap & NFP_NET_CFG_CTRL_BPF &&
bpf->abi_version &&
nn_readb(nn, NFP_NET_CFG_BPF_ABI) == bpf->abi_version;
#else
return false;
#endif
}
static int
......@@ -342,6 +345,26 @@ nfp_bpf_parse_cap_adjust_tail(struct nfp_app_bpf *bpf, void __iomem *value,
return 0;
}
static int
nfp_bpf_parse_cap_abi_version(struct nfp_app_bpf *bpf, void __iomem *value,
u32 length)
{
if (length < 4) {
nfp_err(bpf->app->cpp, "truncated ABI version TLV: %d\n",
length);
return -EINVAL;
}
bpf->abi_version = readl(value);
if (bpf->abi_version < 2 || bpf->abi_version > 3) {
nfp_warn(bpf->app->cpp, "unsupported BPF ABI version: %d\n",
bpf->abi_version);
bpf->abi_version = 0;
}
return 0;
}
static int nfp_bpf_parse_capabilities(struct nfp_app *app)
{
struct nfp_cpp *cpp = app->pf->cpp;
......@@ -393,6 +416,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
length))
goto err_release_free;
break;
case NFP_BPF_CAP_TYPE_ABI_VERSION:
if (nfp_bpf_parse_cap_abi_version(app->priv, value,
length))
goto err_release_free;
break;
default:
nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
break;
......@@ -414,6 +442,11 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
return -EINVAL;
}
static void nfp_bpf_init_capabilities(struct nfp_app_bpf *bpf)
{
bpf->abi_version = 2; /* Original BPF ABI version */
}
static int nfp_bpf_ndo_init(struct nfp_app *app, struct net_device *netdev)
{
struct nfp_app_bpf *bpf = app->priv;
......@@ -447,10 +480,21 @@ static int nfp_bpf_init(struct nfp_app *app)
if (err)
goto err_free_bpf;
nfp_bpf_init_capabilities(bpf);
err = nfp_bpf_parse_capabilities(app);
if (err)
goto err_free_neutral_maps;
if (bpf->abi_version < 3) {
bpf->cmsg_key_sz = CMSG_MAP_KEY_LW * 4;
bpf->cmsg_val_sz = CMSG_MAP_VALUE_LW * 4;
} else {
bpf->cmsg_key_sz = bpf->maps.max_key_sz;
bpf->cmsg_val_sz = bpf->maps.max_val_sz;
app->ctrl_mtu = nfp_bpf_ctrl_cmsg_mtu(bpf);
}
bpf->bpf_dev = bpf_offload_dev_create();
err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
if (err)
......
......@@ -61,6 +61,8 @@ enum nfp_relo_type {
/* internal jumps to parts of the outro */
RELO_BR_GO_OUT,
RELO_BR_GO_ABORT,
RELO_BR_GO_CALL_PUSH_REGS,
RELO_BR_GO_CALL_POP_REGS,
/* external jumps to fixed addresses */
RELO_BR_NEXT_PKT,
RELO_BR_HELPER,
......@@ -104,6 +106,7 @@ enum pkt_vec {
#define imma_a(np) reg_a(STATIC_REG_IMMA)
#define imma_b(np) reg_b(STATIC_REG_IMMA)
#define imm_both(np) reg_both(STATIC_REG_IMM)
#define ret_reg(np) imm_a(np)
#define NFP_BPF_ABI_FLAGS reg_imm(0)
#define NFP_BPF_ABI_FLAG_MARK 1
......@@ -121,12 +124,17 @@ enum pkt_vec {
* @cmsg_replies: received cmsg replies waiting to be consumed
* @cmsg_wq: work queue for waiting for cmsg replies
*
* @cmsg_key_sz: size of key in cmsg element array
* @cmsg_val_sz: size of value in cmsg element array
*
* @map_list: list of offloaded maps
* @maps_in_use: number of currently offloaded maps
* @map_elems_in_use: number of elements allocated to offloaded maps
*
* @maps_neutral: hash table of offload-neutral maps (on pointer)
*
* @abi_version: global BPF ABI version
*
* @adjust_head: adjust head capability
* @adjust_head.flags: extra flags for adjust head
* @adjust_head.off_min: minimal packet offset within buffer required
......@@ -164,12 +172,17 @@ struct nfp_app_bpf {
struct sk_buff_head cmsg_replies;
struct wait_queue_head cmsg_wq;
unsigned int cmsg_key_sz;
unsigned int cmsg_val_sz;
struct list_head map_list;
unsigned int maps_in_use;
unsigned int map_elems_in_use;
struct rhashtable maps_neutral;
u32 abi_version;
struct nfp_bpf_cap_adjust_head {
u32 flags;
int off_min;
......@@ -252,7 +265,9 @@ struct nfp_bpf_reg_state {
bool var_off;
};
#define FLAG_INSN_IS_JUMP_DST BIT(0)
#define FLAG_INSN_IS_JUMP_DST BIT(0)
#define FLAG_INSN_IS_SUBPROG_START BIT(1)
#define FLAG_INSN_PTR_CALLER_STACK_FRAME BIT(2)
/**
* struct nfp_insn_meta - BPF instruction wrapper
......@@ -269,6 +284,7 @@ struct nfp_bpf_reg_state {
* @xadd_maybe_16bit: 16bit immediate is possible
* @jmp_dst: destination info for jump instructions
* @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB
* @num_insns_after_br: number of insns following a branch jump, used for fixup
* @func_id: function id for call instructions
* @arg1: arg1 for call instructions
* @arg2: arg2 for call instructions
......@@ -279,6 +295,7 @@ struct nfp_bpf_reg_state {
* @off: index of first generated machine instruction (in nfp_prog.prog)
* @n: eBPF instruction number
* @flags: eBPF instruction extra optimization flags
* @subprog_idx: index of subprogram to which the instruction belongs
* @skip: skip this instruction (optimized out)
* @double_cb: callback for second part of the instruction
* @l: link on nfp_prog->insns list
......@@ -304,6 +321,7 @@ struct nfp_insn_meta {
struct {
struct nfp_insn_meta *jmp_dst;
bool jump_neg_op;
u32 num_insns_after_br; /* only for BPF-to-BPF calls */
};
/* function calls */
struct {
......@@ -325,6 +343,7 @@ struct nfp_insn_meta {
unsigned int off;
unsigned short n;
unsigned short flags;
unsigned short subprog_idx;
bool skip;
instr_cb_t double_cb;
......@@ -413,6 +432,34 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
}
static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta)
{
struct bpf_insn insn = meta->insn;
return insn.code == (BPF_JMP | BPF_CALL) &&
insn.src_reg != BPF_PSEUDO_CALL;
}
static inline bool is_mbpf_pseudo_call(const struct nfp_insn_meta *meta)
{
struct bpf_insn insn = meta->insn;
return insn.code == (BPF_JMP | BPF_CALL) &&
insn.src_reg == BPF_PSEUDO_CALL;
}
#define STACK_FRAME_ALIGN 64
/**
* struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
* @stack_depth: maximum stack depth used by this sub-program
* @needs_reg_push: whether sub-program uses callee-saved registers
*/
struct nfp_bpf_subprog_info {
u16 stack_depth;
u8 needs_reg_push : 1;
};
/**
* struct nfp_prog - nfp BPF program
* @bpf: backpointer to the bpf app priv structure
......@@ -424,12 +471,16 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
* @last_bpf_off: address of the last instruction translated from BPF
* @tgt_out: jump target for normal exit
* @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
* @tgt_call_push_regs: jump target for subroutine for saving R6~R9 to stack
* @tgt_call_pop_regs: jump target for subroutine used for restoring R6~R9
* @n_translated: number of successfully translated instructions (for errors)
* @error: error code if something went wrong
* @stack_depth: max stack depth from the verifier
* @stack_frame_depth: max stack depth for current frame
* @adjust_head_location: if program has single adjust head call - the insn no.
* @map_records_cnt: the number of map pointers recorded for this prog
* @subprog_cnt: number of sub-programs, including main function
* @map_records: the map record pointers from bpf->maps_neutral
* @subprog: pointer to an array of objects holding info about sub-programs
* @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
*/
struct nfp_prog {
......@@ -446,15 +497,19 @@ struct nfp_prog {
unsigned int last_bpf_off;
unsigned int tgt_out;
unsigned int tgt_abort;
unsigned int tgt_call_push_regs;
unsigned int tgt_call_pop_regs;
unsigned int n_translated;
int error;
unsigned int stack_depth;
unsigned int stack_frame_depth;
unsigned int adjust_head_location;
unsigned int map_records_cnt;
unsigned int subprog_cnt;
struct nfp_bpf_neutral_map **map_records;
struct nfp_bpf_subprog_info *subprog;
struct list_head insns;
};
......@@ -471,6 +526,7 @@ struct nfp_bpf_vnic {
unsigned int tgt_done;
};
bool nfp_is_subprog_start(struct nfp_insn_meta *meta);
void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt);
int nfp_bpf_jit(struct nfp_prog *prog);
bool nfp_bpf_supported_opcode(u8 code);
......@@ -492,6 +548,7 @@ nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv);
unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf);
long long int
nfp_bpf_ctrl_alloc_map(struct nfp_app_bpf *bpf, struct bpf_map *map);
void
......
......@@ -208,6 +208,8 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
{
struct nfp_insn_meta *meta, *tmp;
kfree(nfp_prog->subprog);
list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) {
list_del(&meta->l);
kfree(meta);
......@@ -250,18 +252,9 @@ nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net *nn,
static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
{
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
unsigned int stack_size;
unsigned int max_instr;
int err;
stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
if (prog->aux->stack_depth > stack_size) {
nn_info(nn, "stack too large: program %dB > FW stack %dB\n",
prog->aux->stack_depth, stack_size);
return -EOPNOTSUPP;
}
nfp_prog->stack_depth = round_up(prog->aux->stack_depth, 4);
max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
......
......@@ -34,10 +34,12 @@
#include <linux/bpf.h>
#include <linux/bpf_verifier.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/pkt_cls.h>
#include "../nfp_app.h"
#include "../nfp_main.h"
#include "../nfp_net.h"
#include "fw.h"
#include "main.h"
......@@ -155,8 +157,9 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env,
}
static int
nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
struct nfp_insn_meta *meta)
nfp_bpf_check_helper_call(struct nfp_prog *nfp_prog,
struct bpf_verifier_env *env,
struct nfp_insn_meta *meta)
{
const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1;
const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2;
......@@ -333,6 +336,9 @@ nfp_bpf_check_stack_access(struct nfp_prog *nfp_prog,
{
s32 old_off, new_off;
if (reg->frameno != env->cur_state->curframe)
meta->flags |= FLAG_INSN_PTR_CALLER_STACK_FRAME;
if (!tnum_is_const(reg->var_off)) {
pr_vlog(env, "variable ptr stack access\n");
return -EINVAL;
......@@ -620,8 +626,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return -EINVAL;
}
if (meta->insn.code == (BPF_JMP | BPF_CALL))
return nfp_bpf_check_call(nfp_prog, env, meta);
if (is_mbpf_helper_call(meta))
return nfp_bpf_check_helper_call(nfp_prog, env, meta);
if (meta->insn.code == (BPF_JMP | BPF_EXIT))
return nfp_bpf_check_exit(nfp_prog, env);
......@@ -640,6 +646,131 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return 0;
}
static int
nfp_assign_subprog_idx_and_regs(struct bpf_verifier_env *env,
struct nfp_prog *nfp_prog)
{
struct nfp_insn_meta *meta;
int index = 0;
list_for_each_entry(meta, &nfp_prog->insns, l) {
if (nfp_is_subprog_start(meta))
index++;
meta->subprog_idx = index;
if (meta->insn.dst_reg >= BPF_REG_6 &&
meta->insn.dst_reg <= BPF_REG_9)
nfp_prog->subprog[index].needs_reg_push = 1;
}
if (index + 1 != nfp_prog->subprog_cnt) {
pr_vlog(env, "BUG: number of processed BPF functions is not consistent (processed %d, expected %d)\n",
index + 1, nfp_prog->subprog_cnt);
return -EFAULT;
}
return 0;
}
static unsigned int
nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned int cnt)
{
struct nfp_insn_meta *meta = nfp_prog_first_meta(nfp_prog);
unsigned int max_depth = 0, depth = 0, frame = 0;
struct nfp_insn_meta *ret_insn[MAX_CALL_FRAMES];
unsigned short frame_depths[MAX_CALL_FRAMES];
unsigned short ret_prog[MAX_CALL_FRAMES];
unsigned short idx = meta->subprog_idx;
/* Inspired from check_max_stack_depth() from kernel verifier.
* Starting from main subprogram, walk all instructions and recursively
* walk all callees that given subprogram can call. Since recursion is
* prevented by the kernel verifier, this algorithm only needs a local
* stack of MAX_CALL_FRAMES to remember callsites.
*/
process_subprog:
frame_depths[frame] = nfp_prog->subprog[idx].stack_depth;
frame_depths[frame] = round_up(frame_depths[frame], STACK_FRAME_ALIGN);
depth += frame_depths[frame];
max_depth = max(max_depth, depth);
continue_subprog:
for (; meta != nfp_prog_last_meta(nfp_prog) && meta->subprog_idx == idx;
meta = nfp_meta_next(meta)) {
if (!is_mbpf_pseudo_call(meta))
continue;
/* We found a call to a subprogram. Remember instruction to
* return to and subprog id.
*/
ret_insn[frame] = nfp_meta_next(meta);
ret_prog[frame] = idx;
/* Find the callee and start processing it. */
meta = nfp_bpf_goto_meta(nfp_prog, meta,
meta->n + 1 + meta->insn.imm, cnt);
idx = meta->subprog_idx;
frame++;
goto process_subprog;
}
/* End of for() loop means the last instruction of the subprog was
* reached. If we popped all stack frames, return; otherwise, go on
* processing remaining instructions from the caller.
*/
if (frame == 0)
return max_depth;
depth -= frame_depths[frame];
frame--;
meta = ret_insn[frame];
idx = ret_prog[frame];
goto continue_subprog;
}
static int nfp_bpf_finalize(struct bpf_verifier_env *env)
{
unsigned int stack_size, stack_needed;
struct bpf_subprog_info *info;
struct nfp_prog *nfp_prog;
struct nfp_net *nn;
int i;
nfp_prog = env->prog->aux->offload->dev_priv;
nfp_prog->subprog_cnt = env->subprog_cnt;
nfp_prog->subprog = kcalloc(nfp_prog->subprog_cnt,
sizeof(nfp_prog->subprog[0]), GFP_KERNEL);
if (!nfp_prog->subprog)
return -ENOMEM;
nfp_assign_subprog_idx_and_regs(env, nfp_prog);
info = env->subprog_info;
for (i = 0; i < nfp_prog->subprog_cnt; i++) {
nfp_prog->subprog[i].stack_depth = info[i].stack_depth;
if (i == 0)
continue;
/* Account for size of return address. */
nfp_prog->subprog[i].stack_depth += REG_WIDTH;
/* Account for size of saved registers, if necessary. */
if (nfp_prog->subprog[i].needs_reg_push)
nfp_prog->subprog[i].stack_depth += BPF_REG_SIZE * 4;
}
nn = netdev_priv(env->prog->aux->offload->netdev);
stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
stack_needed = nfp_bpf_get_stack_usage(nfp_prog, env->prog->len);
if (stack_needed > stack_size) {
pr_vlog(env, "stack too large: program %dB > FW stack %dB\n",
stack_needed, stack_size);
return -EOPNOTSUPP;
}
return 0;
}
const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
.insn_hook = nfp_verify_insn,
.insn_hook = nfp_verify_insn,
.finalize = nfp_bpf_finalize,
};
......@@ -40,6 +40,8 @@
#include "nfp_net_repr.h"
#define NFP_APP_CTRL_MTU_MAX U32_MAX
struct bpf_prog;
struct net_device;
struct netdev_bpf;
......@@ -178,6 +180,7 @@ struct nfp_app_type {
* @ctrl: pointer to ctrl vNIC struct
* @reprs: array of pointers to representors
* @type: pointer to const application ops and info
* @ctrl_mtu: MTU to set on the control vNIC (set in .init())
* @priv: app-specific priv data
*/
struct nfp_app {
......@@ -189,6 +192,7 @@ struct nfp_app {
struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1];
const struct nfp_app_type *type;
unsigned int ctrl_mtu;
void *priv;
};
......
......@@ -82,6 +82,15 @@
#define OP_BR_BIT_ADDR_LO OP_BR_ADDR_LO
#define OP_BR_BIT_ADDR_HI OP_BR_ADDR_HI
#define OP_BR_ALU_BASE 0x0e800000000ULL
#define OP_BR_ALU_BASE_MASK 0x0ff80000000ULL
#define OP_BR_ALU_A_SRC 0x000000003ffULL
#define OP_BR_ALU_B_SRC 0x000000ffc00ULL
#define OP_BR_ALU_DEFBR 0x00000300000ULL
#define OP_BR_ALU_IMM_HI 0x0007fc00000ULL
#define OP_BR_ALU_SRC_LMEXTN 0x40000000000ULL
#define OP_BR_ALU_DST_LMEXTN 0x80000000000ULL
static inline bool nfp_is_br(u64 insn)
{
return (insn & OP_BR_BASE_MASK) == OP_BR_BASE ||
......
......@@ -3884,10 +3884,20 @@ int nfp_net_init(struct nfp_net *nn)
return err;
/* Set default MTU and Freelist buffer size */
if (nn->max_mtu < NFP_NET_DEFAULT_MTU)
if (!nfp_net_is_data_vnic(nn) && nn->app->ctrl_mtu) {
if (nn->app->ctrl_mtu <= nn->max_mtu) {
nn->dp.mtu = nn->app->ctrl_mtu;
} else {
if (nn->app->ctrl_mtu != NFP_APP_CTRL_MTU_MAX)
nn_warn(nn, "app requested MTU above max supported %u > %u\n",
nn->app->ctrl_mtu, nn->max_mtu);
nn->dp.mtu = nn->max_mtu;
}
} else if (nn->max_mtu < NFP_NET_DEFAULT_MTU) {
nn->dp.mtu = nn->max_mtu;
else
} else {
nn->dp.mtu = NFP_NET_DEFAULT_MTU;
}
nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp);
if (nfp_app_ctrl_uses_data_vnics(nn->app))
......
......@@ -264,7 +264,6 @@
* %NFP_NET_CFG_BPF_ADDR: DMA address of the buffer with JITed BPF code
*/
#define NFP_NET_CFG_BPF_ABI 0x0080
#define NFP_NET_BPF_ABI 2
#define NFP_NET_CFG_BPF_CAP 0x0081
#define NFP_NET_BPF_CAP_RELO (1 << 0) /* seamless reload */
#define NFP_NET_CFG_BPF_MAX_LEN 0x0082
......
......@@ -86,8 +86,14 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
return 0;
}
static int nsim_bpf_finalize(struct bpf_verifier_env *env)
{
return 0;
}
static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
.insn_hook = nsim_bpf_verify_insn,
.insn_hook = nsim_bpf_verify_insn,
.finalize = nsim_bpf_finalize,
};
static bool nsim_xdp_offload_active(struct netdevsim *ns)
......
......@@ -2,6 +2,7 @@
#ifndef _BPF_CGROUP_H
#define _BPF_CGROUP_H
#include <linux/bpf.h>
#include <linux/errno.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
......@@ -22,7 +23,11 @@ struct bpf_cgroup_storage;
extern struct static_key_false cgroup_bpf_enabled_key;
#define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
DECLARE_PER_CPU(void*, bpf_cgroup_storage);
DECLARE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
#define for_each_cgroup_storage_type(stype) \
for (stype = 0; stype < MAX_BPF_CGROUP_STORAGE_TYPE; stype++)
struct bpf_cgroup_storage_map;
......@@ -32,7 +37,10 @@ struct bpf_storage_buffer {
};
struct bpf_cgroup_storage {
struct bpf_storage_buffer *buf;
union {
struct bpf_storage_buffer *buf;
void __percpu *percpu_buf;
};
struct bpf_cgroup_storage_map *map;
struct bpf_cgroup_storage_key key;
struct list_head list;
......@@ -43,7 +51,7 @@ struct bpf_cgroup_storage {
struct bpf_prog_list {
struct list_head node;
struct bpf_prog *prog;
struct bpf_cgroup_storage *storage;
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
};
struct bpf_prog_array;
......@@ -101,18 +109,26 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
short access, enum bpf_attach_type type);
static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage)
static inline enum bpf_cgroup_storage_type cgroup_storage_type(
struct bpf_map *map)
{
struct bpf_storage_buffer *buf;
if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
return BPF_CGROUP_STORAGE_PERCPU;
return BPF_CGROUP_STORAGE_SHARED;
}
if (!storage)
return;
static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
*storage[MAX_BPF_CGROUP_STORAGE_TYPE])
{
enum bpf_cgroup_storage_type stype;
buf = READ_ONCE(storage->buf);
this_cpu_write(bpf_cgroup_storage, &buf->data[0]);
for_each_cgroup_storage_type(stype)
this_cpu_write(bpf_cgroup_storage[stype], storage[stype]);
}
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog);
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
enum bpf_cgroup_storage_type stype);
void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
struct cgroup *cgroup,
......@@ -121,6 +137,10 @@ void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map);
void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map);
int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
void *value, u64 flags);
/* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \
({ \
......@@ -265,15 +285,24 @@ static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
return -EINVAL;
}
static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) {}
static inline void bpf_cgroup_storage_set(
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog,
struct bpf_map *map) { return 0; }
static inline void bpf_cgroup_storage_release(struct bpf_prog *prog,
struct bpf_map *map) {}
static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
struct bpf_prog *prog) { return 0; }
struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return 0; }
static inline void bpf_cgroup_storage_free(
struct bpf_cgroup_storage *storage) {}
static inline int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key,
void *value) {
return 0;
}
static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
void *key, void *value, u64 flags) {
return 0;
}
#define cgroup_bpf_enabled (0)
#define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
......@@ -293,6 +322,8 @@ static inline void bpf_cgroup_storage_free(
#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
#define for_each_cgroup_storage_type(stype) for (; false; )
#endif /* CONFIG_CGROUP_BPF */
#endif /* _BPF_CGROUP_H */
......@@ -154,6 +154,7 @@ enum bpf_arg_type {
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING, /* any (initialized) argument is ok */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */
};
/* type of values returned from helper functions */
......@@ -162,6 +163,7 @@ enum bpf_return_type {
RET_VOID, /* function doesn't return anything */
RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
......@@ -213,6 +215,8 @@ enum bpf_reg_type {
PTR_TO_PACKET, /* reg points to skb->data */
PTR_TO_PACKET_END, /* skb->data + headlen */
PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */
PTR_TO_SOCKET, /* reg points to struct bpf_sock */
PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */
};
/* The information passed from prog-specific *_is_valid_access
......@@ -259,6 +263,7 @@ struct bpf_verifier_ops {
struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx);
int (*finalize)(struct bpf_verifier_env *env);
};
struct bpf_prog_offload {
......@@ -272,6 +277,14 @@ struct bpf_prog_offload {
u32 jited_len;
};
enum bpf_cgroup_storage_type {
BPF_CGROUP_STORAGE_SHARED,
BPF_CGROUP_STORAGE_PERCPU,
__BPF_CGROUP_STORAGE_MAX
};
#define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
struct bpf_prog_aux {
atomic_t refcnt;
u32 used_map_cnt;
......@@ -289,7 +302,7 @@ struct bpf_prog_aux {
struct bpf_prog *prog;
struct user_struct *user;
u64 load_time; /* ns since boottime */
struct bpf_map *cgroup_storage;
struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
char name[BPF_OBJ_NAME_LEN];
#ifdef CONFIG_SECURITY
void *security;
......@@ -335,6 +348,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
unsigned long off, unsigned long len);
typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
const struct bpf_insn *src,
struct bpf_insn *dst,
struct bpf_prog *prog,
u32 *target_size);
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
......@@ -358,7 +376,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
*/
struct bpf_prog_array_item {
struct bpf_prog *prog;
struct bpf_cgroup_storage *cgroup_storage;
struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
};
struct bpf_prog_array {
......@@ -828,4 +846,29 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
#if defined(CONFIG_NET)
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info);
u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size);
#else
static inline bool bpf_sock_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
return false;
}
static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size)
{
return 0;
}
#endif
#endif /* _LINUX_BPF_H */
......@@ -43,6 +43,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
#endif
#ifdef CONFIG_CGROUP_BPF
BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)
......
......@@ -58,6 +58,8 @@ struct bpf_reg_state {
* offset, so they can share range knowledge.
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
* came from, when one is tested for != NULL.
* For PTR_TO_SOCKET this is used to share which pointers retain the
* same reference to the socket, to determine proper reference freeing.
*/
u32 id;
/* For scalar types (SCALAR_VALUE), this represents our knowledge of
......@@ -102,6 +104,17 @@ struct bpf_stack_state {
u8 slot_type[BPF_REG_SIZE];
};
struct bpf_reference_state {
/* Track each reference created with a unique id, even if the same
* instruction creates the reference multiple times (eg, via CALL).
*/
int id;
/* Instruction where the allocation of this reference occurred. This
* is used purely to inform the user of a reference leak.
*/
int insn_idx;
};
/* state of the program:
* type of all registers and stack info
*/
......@@ -119,7 +132,9 @@ struct bpf_func_state {
*/
u32 subprogno;
/* should be second to last. See copy_func_state() */
/* The following fields should be last. See copy_func_state() */
int acquired_refs;
struct bpf_reference_state *refs;
int allocated_stack;
struct bpf_stack_state *stack;
};
......@@ -131,6 +146,17 @@ struct bpf_verifier_state {
u32 curframe;
};
#define bpf_get_spilled_reg(slot, frame) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
(frame->stack[slot].slot_type[0] == STACK_SPILL)) \
? &frame->stack[slot].spilled_ptr : NULL)
/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
#define bpf_for_each_spilled_reg(iter, frame, reg) \
for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \
iter < frame->allocated_stack / BPF_REG_SIZE; \
iter++, reg = bpf_get_spilled_reg(iter, frame))
/* linked list of verifier states used to prune search */
struct bpf_verifier_state_list {
struct bpf_verifier_state state;
......@@ -204,15 +230,21 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...);
static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *cur = env->cur_state;
return cur->frame[cur->curframe]->regs;
return cur->frame[cur->curframe];
}
static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
{
return cur_func(env)->regs;
}
int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx);
int bpf_prog_offload_finalize(struct bpf_verifier_env *env);
#endif /* _LINUX_BPF_VERIFIER_H */
......@@ -609,6 +609,9 @@ struct netdev_queue {
/* Subordinate device that the queue has been assigned to */
struct net_device *sb_dev;
#ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem;
#endif
/*
* write-mostly part
*/
......@@ -738,6 +741,9 @@ struct netdev_rx_queue {
struct kobject kobj;
struct net_device *dev;
struct xdp_rxq_info xdp_rxq;
#ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem;
#endif
} ____cacheline_aligned_in_smp;
/*
......
......@@ -86,6 +86,7 @@ struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
struct xdp_umem_fq_reuse *newq);
void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
{
......@@ -183,6 +184,12 @@ static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
{
}
static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
u16 queue_id)
{
return NULL;
}
static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
{
return NULL;
......
......@@ -127,6 +127,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH,
BPF_MAP_TYPE_CGROUP_STORAGE,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
};
enum bpf_prog_type {
......@@ -2143,6 +2144,77 @@ union bpf_attr {
* request in the skb.
* Return
* 0 on success, or a negative error in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for TCP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for UDP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* int bpf_sk_release(struct bpf_sock *sk)
* Description
* Release the reference held by *sock*. *sock* must be a non-NULL
* pointer that was returned from bpf_sk_lookup_xxx\ ().
* Return
* 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -2228,7 +2300,10 @@ union bpf_attr {
FN(get_current_cgroup_id), \
FN(get_local_storage), \
FN(sk_select_reuseport), \
FN(skb_ancestor_cgroup_id),
FN(skb_ancestor_cgroup_id), \
FN(sk_lookup_tcp), \
FN(sk_lookup_udp), \
FN(sk_release),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
......@@ -2398,6 +2473,23 @@ struct bpf_sock {
*/
};
struct bpf_sock_tuple {
union {
struct {
__be32 saddr;
__be32 daddr;
__be16 sport;
__be16 dport;
} ipv4;
struct {
__be32 saddr[4];
__be32 daddr[4];
__be16 sport;
__be16 dport;
} ipv6;
};
};
#define XDP_PACKET_HEADROOM 256
/* User return codes for XDP prog type.
......
......@@ -25,6 +25,7 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
*/
void cgroup_bpf_put(struct cgroup *cgrp)
{
enum bpf_cgroup_storage_type stype;
unsigned int type;
for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
......@@ -34,8 +35,10 @@ void cgroup_bpf_put(struct cgroup *cgrp)
list_for_each_entry_safe(pl, tmp, progs, node) {
list_del(&pl->node);
bpf_prog_put(pl->prog);
bpf_cgroup_storage_unlink(pl->storage);
bpf_cgroup_storage_free(pl->storage);
for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_unlink(pl->storage[stype]);
bpf_cgroup_storage_free(pl->storage[stype]);
}
kfree(pl);
static_branch_dec(&cgroup_bpf_enabled_key);
}
......@@ -97,6 +100,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
enum bpf_attach_type type,
struct bpf_prog_array __rcu **array)
{
enum bpf_cgroup_storage_type stype;
struct bpf_prog_array *progs;
struct bpf_prog_list *pl;
struct cgroup *p = cgrp;
......@@ -125,7 +129,9 @@ static int compute_effective_progs(struct cgroup *cgrp,
continue;
progs->items[cnt].prog = pl->prog;
progs->items[cnt].cgroup_storage = pl->storage;
for_each_cgroup_storage_type(stype)
progs->items[cnt].cgroup_storage[stype] =
pl->storage[stype];
cnt++;
}
} while ((p = cgroup_parent(p)));
......@@ -232,7 +238,9 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
{
struct list_head *progs = &cgrp->bpf.progs[type];
struct bpf_prog *old_prog = NULL;
struct bpf_cgroup_storage *storage, *old_storage = NULL;
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
enum bpf_cgroup_storage_type stype;
struct bpf_prog_list *pl;
bool pl_was_allocated;
int err;
......@@ -254,34 +262,44 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
return -E2BIG;
storage = bpf_cgroup_storage_alloc(prog);
if (IS_ERR(storage))
return -ENOMEM;
for_each_cgroup_storage_type(stype) {
storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
if (IS_ERR(storage[stype])) {
storage[stype] = NULL;
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
}
if (flags & BPF_F_ALLOW_MULTI) {
list_for_each_entry(pl, progs, node) {
if (pl->prog == prog) {
/* disallow attaching the same prog twice */
bpf_cgroup_storage_free(storage);
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -EINVAL;
}
}
pl = kmalloc(sizeof(*pl), GFP_KERNEL);
if (!pl) {
bpf_cgroup_storage_free(storage);
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
pl_was_allocated = true;
pl->prog = prog;
pl->storage = storage;
for_each_cgroup_storage_type(stype)
pl->storage[stype] = storage[stype];
list_add_tail(&pl->node, progs);
} else {
if (list_empty(progs)) {
pl = kmalloc(sizeof(*pl), GFP_KERNEL);
if (!pl) {
bpf_cgroup_storage_free(storage);
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
pl_was_allocated = true;
......@@ -289,12 +307,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
} else {
pl = list_first_entry(progs, typeof(*pl), node);
old_prog = pl->prog;
old_storage = pl->storage;
bpf_cgroup_storage_unlink(old_storage);
for_each_cgroup_storage_type(stype) {
old_storage[stype] = pl->storage[stype];
bpf_cgroup_storage_unlink(old_storage[stype]);
}
pl_was_allocated = false;
}
pl->prog = prog;
pl->storage = storage;
for_each_cgroup_storage_type(stype)
pl->storage[stype] = storage[stype];
}
cgrp->bpf.flags[type] = flags;
......@@ -304,21 +325,27 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
goto cleanup;
static_branch_inc(&cgroup_bpf_enabled_key);
if (old_storage)
bpf_cgroup_storage_free(old_storage);
for_each_cgroup_storage_type(stype) {
if (!old_storage[stype])
continue;
bpf_cgroup_storage_free(old_storage[stype]);
}
if (old_prog) {
bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key);
}
bpf_cgroup_storage_link(storage, cgrp, type);
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_link(storage[stype], cgrp, type);
return 0;
cleanup:
/* and cleanup the prog list */
pl->prog = old_prog;
bpf_cgroup_storage_free(pl->storage);
pl->storage = old_storage;
bpf_cgroup_storage_link(old_storage, cgrp, type);
for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_free(pl->storage[stype]);
pl->storage[stype] = old_storage[stype];
bpf_cgroup_storage_link(old_storage[stype], cgrp, type);
}
if (pl_was_allocated) {
list_del(&pl->node);
kfree(pl);
......@@ -339,6 +366,7 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
enum bpf_attach_type type, u32 unused_flags)
{
struct list_head *progs = &cgrp->bpf.progs[type];
enum bpf_cgroup_storage_type stype;
u32 flags = cgrp->bpf.flags[type];
struct bpf_prog *old_prog = NULL;
struct bpf_prog_list *pl;
......@@ -385,8 +413,10 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
/* now can actually delete it from this cgroup list */
list_del(&pl->node);
bpf_cgroup_storage_unlink(pl->storage);
bpf_cgroup_storage_free(pl->storage);
for_each_cgroup_storage_type(stype) {
bpf_cgroup_storage_unlink(pl->storage[stype]);
bpf_cgroup_storage_free(pl->storage[stype]);
}
kfree(pl);
if (list_empty(progs))
/* last program was detached, reset flags to zero */
......@@ -677,6 +707,8 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_current_uid_gid_proto;
case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto;
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_trace_printk:
if (capable(CAP_SYS_ADMIN))
return bpf_get_trace_printk_proto();
......
......@@ -194,16 +194,28 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
.ret_type = RET_INTEGER,
};
DECLARE_PER_CPU(void*, bpf_cgroup_storage);
#ifdef CONFIG_CGROUP_BPF
DECLARE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
{
/* map and flags arguments are not used now,
* but provide an ability to extend the API
* for other types of local storages.
* verifier checks that their values are correct.
/* flags argument is not used now,
* but provides an ability to extend the API.
* verifier checks that its value is correct.
*/
return (unsigned long) this_cpu_read(bpf_cgroup_storage);
enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
struct bpf_cgroup_storage *storage;
void *ptr;
storage = this_cpu_read(bpf_cgroup_storage[stype]);
if (stype == BPF_CGROUP_STORAGE_SHARED)
ptr = &READ_ONCE(storage->buf)->data[0];
else
ptr = this_cpu_ptr(storage->percpu_buf);
return (unsigned long)ptr;
}
const struct bpf_func_proto bpf_get_local_storage_proto = {
......@@ -214,3 +226,4 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
.arg2_type = ARG_ANYTHING,
};
#endif
#endif
......@@ -7,7 +7,8 @@
#include <linux/rbtree.h>
#include <linux/slab.h>
DEFINE_PER_CPU(void*, bpf_cgroup_storage);
DEFINE_PER_CPU(struct bpf_cgroup_storage*,
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
#ifdef CONFIG_CGROUP_BPF
......@@ -151,6 +152,71 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
return 0;
}
int bpf_percpu_cgroup_storage_copy(struct bpf_map *_map, void *_key,
void *value)
{
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
struct bpf_cgroup_storage_key *key = _key;
struct bpf_cgroup_storage *storage;
int cpu, off = 0;
u32 size;
rcu_read_lock();
storage = cgroup_storage_lookup(map, key, false);
if (!storage) {
rcu_read_unlock();
return -ENOENT;
}
/* per_cpu areas are zero-filled and bpf programs can only
* access 'value_size' of them, so copying rounded areas
* will not leak any kernel data
*/
size = round_up(_map->value_size, 8);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off,
per_cpu_ptr(storage->percpu_buf, cpu), size);
off += size;
}
rcu_read_unlock();
return 0;
}
int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key,
void *value, u64 map_flags)
{
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
struct bpf_cgroup_storage_key *key = _key;
struct bpf_cgroup_storage *storage;
int cpu, off = 0;
u32 size;
if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
return -EINVAL;
rcu_read_lock();
storage = cgroup_storage_lookup(map, key, false);
if (!storage) {
rcu_read_unlock();
return -ENOENT;
}
/* the user space will provide round_up(value_size, 8) bytes that
* will be copied into per-cpu area. bpf programs can only access
* value_size of it. During lookup the same extra bytes will be
* returned or zeros which were zero-filled by percpu_alloc,
* so no kernel data leaks possible
*/
size = round_up(_map->value_size, 8);
for_each_possible_cpu(cpu) {
bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
value + off, size);
off += size;
}
rcu_read_unlock();
return 0;
}
static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key,
void *_next_key)
{
......@@ -254,6 +320,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
{
enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
int ret = -EBUSY;
......@@ -261,11 +328,12 @@ int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
if (map->prog && map->prog != prog)
goto unlock;
if (prog->aux->cgroup_storage && prog->aux->cgroup_storage != _map)
if (prog->aux->cgroup_storage[stype] &&
prog->aux->cgroup_storage[stype] != _map)
goto unlock;
map->prog = prog;
prog->aux->cgroup_storage = _map;
prog->aux->cgroup_storage[stype] = _map;
ret = 0;
unlock:
spin_unlock_bh(&map->lock);
......@@ -275,70 +343,117 @@ int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *_map)
void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *_map)
{
enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
struct bpf_cgroup_storage_map *map = map_to_storage(_map);
spin_lock_bh(&map->lock);
if (map->prog == prog) {
WARN_ON(prog->aux->cgroup_storage != _map);
WARN_ON(prog->aux->cgroup_storage[stype] != _map);
map->prog = NULL;
prog->aux->cgroup_storage = NULL;
prog->aux->cgroup_storage[stype] = NULL;
}
spin_unlock_bh(&map->lock);
}
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog)
static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages)
{
size_t size;
if (cgroup_storage_type(map) == BPF_CGROUP_STORAGE_SHARED) {
size = sizeof(struct bpf_storage_buffer) + map->value_size;
*pages = round_up(sizeof(struct bpf_cgroup_storage) + size,
PAGE_SIZE) >> PAGE_SHIFT;
} else {
size = map->value_size;
*pages = round_up(round_up(size, 8) * num_possible_cpus(),
PAGE_SIZE) >> PAGE_SHIFT;
}
return size;
}
struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
enum bpf_cgroup_storage_type stype)
{
struct bpf_cgroup_storage *storage;
struct bpf_map *map;
gfp_t flags;
size_t size;
u32 pages;
map = prog->aux->cgroup_storage;
map = prog->aux->cgroup_storage[stype];
if (!map)
return NULL;
pages = round_up(sizeof(struct bpf_cgroup_storage) +
sizeof(struct bpf_storage_buffer) +
map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
size = bpf_cgroup_storage_calculate_size(map, &pages);
if (bpf_map_charge_memlock(map, pages))
return ERR_PTR(-EPERM);
storage = kmalloc_node(sizeof(struct bpf_cgroup_storage),
__GFP_ZERO | GFP_USER, map->numa_node);
if (!storage) {
bpf_map_uncharge_memlock(map, pages);
return ERR_PTR(-ENOMEM);
}
if (!storage)
goto enomem;
storage->buf = kmalloc_node(sizeof(struct bpf_storage_buffer) +
map->value_size, __GFP_ZERO | GFP_USER,
map->numa_node);
if (!storage->buf) {
bpf_map_uncharge_memlock(map, pages);
kfree(storage);
return ERR_PTR(-ENOMEM);
flags = __GFP_ZERO | GFP_USER;
if (stype == BPF_CGROUP_STORAGE_SHARED) {
storage->buf = kmalloc_node(size, flags, map->numa_node);
if (!storage->buf)
goto enomem;
} else {
storage->percpu_buf = __alloc_percpu_gfp(size, 8, flags);
if (!storage->percpu_buf)
goto enomem;
}
storage->map = (struct bpf_cgroup_storage_map *)map;
return storage;
enomem:
bpf_map_uncharge_memlock(map, pages);
kfree(storage);
return ERR_PTR(-ENOMEM);
}
static void free_shared_cgroup_storage_rcu(struct rcu_head *rcu)
{
struct bpf_cgroup_storage *storage =
container_of(rcu, struct bpf_cgroup_storage, rcu);
kfree(storage->buf);
kfree(storage);
}
static void free_percpu_cgroup_storage_rcu(struct rcu_head *rcu)
{
struct bpf_cgroup_storage *storage =
container_of(rcu, struct bpf_cgroup_storage, rcu);
free_percpu(storage->percpu_buf);
kfree(storage);
}
void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage)
{
u32 pages;
enum bpf_cgroup_storage_type stype;
struct bpf_map *map;
u32 pages;
if (!storage)
return;
map = &storage->map->map;
pages = round_up(sizeof(struct bpf_cgroup_storage) +
sizeof(struct bpf_storage_buffer) +
map->value_size, PAGE_SIZE) >> PAGE_SHIFT;
bpf_cgroup_storage_calculate_size(map, &pages);
bpf_map_uncharge_memlock(map, pages);
kfree_rcu(storage->buf, rcu);
kfree_rcu(storage, rcu);
stype = cgroup_storage_type(map);
if (stype == BPF_CGROUP_STORAGE_SHARED)
call_rcu(&storage->rcu, free_shared_cgroup_storage_rcu);
else
call_rcu(&storage->rcu, free_percpu_cgroup_storage_rcu);
}
void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
......
......@@ -24,7 +24,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
* in the verifier is not enough.
*/
if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE) {
inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE ||
inner_map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
}
......
......@@ -172,6 +172,24 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
return ret;
}
int bpf_prog_offload_finalize(struct bpf_verifier_env *env)
{
struct bpf_prog_offload *offload;
int ret = -ENODEV;
down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
if (offload) {
if (offload->dev_ops->finalize)
ret = offload->dev_ops->finalize(env);
else
ret = 0;
}
up_read(&bpf_devs_lock);
return ret;
}
static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
{
struct bpf_prog_offload *offload = prog->aux->offload;
......
......@@ -686,7 +686,8 @@ static int map_lookup_elem(union bpf_attr *attr)
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
value_size = round_up(map->value_size, 8) * num_possible_cpus();
else if (IS_FD_MAP(map))
value_size = sizeof(u32);
......@@ -705,6 +706,8 @@ static int map_lookup_elem(union bpf_attr *attr)
err = bpf_percpu_hash_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
err = bpf_stackmap_copy(map, key, value);
} else if (IS_FD_ARRAY(map)) {
......@@ -774,7 +777,8 @@ static int map_update_elem(union bpf_attr *attr)
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
value_size = round_up(map->value_size, 8) * num_possible_cpus();
else
value_size = map->value_size;
......@@ -809,6 +813,9 @@ static int map_update_elem(union bpf_attr *attr)
err = bpf_percpu_hash_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
err = bpf_percpu_cgroup_storage_update(map, key, value,
attr->flags);
} else if (IS_FD_ARRAY(map)) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
......@@ -988,10 +995,15 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
/* drop refcnt on maps used by eBPF program and free auxiliary data */
static void free_used_maps(struct bpf_prog_aux *aux)
{
enum bpf_cgroup_storage_type stype;
int i;
if (aux->cgroup_storage)
bpf_cgroup_storage_release(aux->prog, aux->cgroup_storage);
for_each_cgroup_storage_type(stype) {
if (!aux->cgroup_storage[stype])
continue;
bpf_cgroup_storage_release(aux->prog,
aux->cgroup_storage[stype]);
}
for (i = 0; i < aux->used_map_cnt; i++)
bpf_map_put(aux->used_maps[i]);
......
......@@ -6494,6 +6494,7 @@ static struct sk_buff *populate_skb(char *buf, int size)
skb->queue_mapping = SKB_QUEUE_MAP;
skb->vlan_tci = SKB_VLAN_TCI;
skb->vlan_proto = htons(ETH_P_IP);
dev_net_set(&dev, &init_net);
skb->dev = &dev;
skb->dev->ifindex = SKB_DEV_IFINDEX;
skb->dev->type = SKB_DEV_TYPE;
......
......@@ -12,7 +12,7 @@
#include <linux/sched/signal.h>
static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
struct bpf_cgroup_storage *storage)
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
{
u32 ret;
......@@ -28,13 +28,20 @@ static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
{
struct bpf_cgroup_storage *storage = NULL;
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
enum bpf_cgroup_storage_type stype;
u64 time_start, time_spent = 0;
u32 ret = 0, i;
storage = bpf_cgroup_storage_alloc(prog);
if (IS_ERR(storage))
return PTR_ERR(storage);
for_each_cgroup_storage_type(stype) {
storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
if (IS_ERR(storage[stype])) {
storage[stype] = NULL;
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return -ENOMEM;
}
}
if (!repeat)
repeat = 1;
......@@ -53,7 +60,8 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
do_div(time_spent, repeat);
*time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
bpf_cgroup_storage_free(storage);
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(storage[stype]);
return ret;
}
......
......@@ -27,6 +27,7 @@
#include <linux/rtnetlink.h>
#include <linux/sched/signal.h>
#include <linux/net.h>
#include <net/xdp_sock.h>
/*
* Some useful ethtool_ops methods that're device independent.
......@@ -1662,8 +1663,10 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev,
static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
void __user *useraddr)
{
struct ethtool_channels channels, max = { .cmd = ETHTOOL_GCHANNELS };
struct ethtool_channels channels, curr = { .cmd = ETHTOOL_GCHANNELS };
u16 from_channel, to_channel;
u32 max_rx_in_use = 0;
unsigned int i;
if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
......@@ -1671,13 +1674,13 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
if (copy_from_user(&channels, useraddr, sizeof(channels)))
return -EFAULT;
dev->ethtool_ops->get_channels(dev, &max);
dev->ethtool_ops->get_channels(dev, &curr);
/* ensure new counts are within the maximums */
if ((channels.rx_count > max.max_rx) ||
(channels.tx_count > max.max_tx) ||
(channels.combined_count > max.max_combined) ||
(channels.other_count > max.max_other))
if (channels.rx_count > curr.max_rx ||
channels.tx_count > curr.max_tx ||
channels.combined_count > curr.max_combined ||
channels.other_count > curr.max_other)
return -EINVAL;
/* ensure the new Rx count fits within the configured Rx flow
......@@ -1687,6 +1690,14 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
(channels.combined_count + channels.rx_count) <= max_rx_in_use)
return -EINVAL;
/* Disabling channels, query zero-copy AF_XDP sockets */
from_channel = channels.combined_count +
min(channels.rx_count, channels.tx_count);
to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
for (i = from_channel; i < to_channel; i++)
if (xdp_get_umem_from_qid(dev, i))
return -EINVAL;
return dev->ethtool_ops->set_channels(dev, &channels);
}
......
......@@ -58,13 +58,17 @@
#include <net/busy_poll.h>
#include <net/tcp.h>
#include <net/xfrm.h>
#include <net/udp.h>
#include <linux/bpf_trace.h>
#include <net/xdp_sock.h>
#include <linux/inetdevice.h>
#include <net/inet_hashtables.h>
#include <net/inet6_hashtables.h>
#include <net/ip_fib.h>
#include <net/flow.h>
#include <net/arp.h>
#include <net/ipv6.h>
#include <net/net_namespace.h>
#include <linux/seg6_local.h>
#include <net/seg6.h>
#include <net/seg6_local.h>
......@@ -4813,6 +4817,143 @@ static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
};
#endif /* CONFIG_IPV6_SEG6_BPF */
#ifdef CONFIG_INET
static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
struct sk_buff *skb, u8 family, u8 proto)
{
int dif = skb->dev->ifindex;
bool refcounted = false;
struct sock *sk = NULL;
if (family == AF_INET) {
__be32 src4 = tuple->ipv4.saddr;
__be32 dst4 = tuple->ipv4.daddr;
int sdif = inet_sdif(skb);
if (proto == IPPROTO_TCP)
sk = __inet_lookup(net, &tcp_hashinfo, skb, 0,
src4, tuple->ipv4.sport,
dst4, tuple->ipv4.dport,
dif, sdif, &refcounted);
else
sk = __udp4_lib_lookup(net, src4, tuple->ipv4.sport,
dst4, tuple->ipv4.dport,
dif, sdif, &udp_table, skb);
#if IS_REACHABLE(CONFIG_IPV6)
} else {
struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr;
struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr;
int sdif = inet6_sdif(skb);
if (proto == IPPROTO_TCP)
sk = __inet6_lookup(net, &tcp_hashinfo, skb, 0,
src6, tuple->ipv6.sport,
dst6, tuple->ipv6.dport,
dif, sdif, &refcounted);
else
sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport,
dst6, tuple->ipv6.dport,
dif, sdif, &udp_table, skb);
#endif
}
if (unlikely(sk && !refcounted && !sock_flag(sk, SOCK_RCU_FREE))) {
WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
sk = NULL;
}
return sk;
}
/* bpf_sk_lookup performs the core lookup for different types of sockets,
* taking a reference on the socket if it doesn't have the flag SOCK_RCU_FREE.
* Returns the socket as an 'unsigned long' to simplify the casting in the
* callers to satisfy BPF_CALL declarations.
*/
static unsigned long
bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
u8 proto, u64 netns_id, u64 flags)
{
struct net *caller_net;
struct sock *sk = NULL;
u8 family = AF_UNSPEC;
struct net *net;
family = len == sizeof(tuple->ipv4) ? AF_INET : AF_INET6;
if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
goto out;
if (skb->dev)
caller_net = dev_net(skb->dev);
else
caller_net = sock_net(skb->sk);
if (netns_id) {
net = get_net_ns_by_id(caller_net, netns_id);
if (unlikely(!net))
goto out;
sk = sk_lookup(net, tuple, skb, family, proto);
put_net(net);
} else {
net = caller_net;
sk = sk_lookup(net, tuple, skb, family, proto);
}
if (sk)
sk = sk_to_full_sk(sk);
out:
return (unsigned long) sk;
}
BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
return bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP, netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_tcp_proto = {
.func = bpf_sk_lookup_tcp,
.gpl_only = false,
.pkt_access = true,
.ret_type = RET_PTR_TO_SOCKET_OR_NULL,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_PTR_TO_MEM,
.arg3_type = ARG_CONST_SIZE,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
BPF_CALL_5(bpf_sk_lookup_udp, struct sk_buff *, skb,
struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
{
return bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP, netns_id, flags);
}
static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
.func = bpf_sk_lookup_udp,
.gpl_only = false,
.pkt_access = true,
.ret_type = RET_PTR_TO_SOCKET_OR_NULL,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_PTR_TO_MEM,
.arg3_type = ARG_CONST_SIZE,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
BPF_CALL_1(bpf_sk_release, struct sock *, sk)
{
if (!sock_flag(sk, SOCK_RCU_FREE))
sock_gen_put(sk);
return 0;
}
static const struct bpf_func_proto bpf_sk_release_proto = {
.func = bpf_sk_release,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_SOCKET,
};
#endif /* CONFIG_INET */
bool bpf_helper_changes_pkt_data(void *func)
{
if (func == bpf_skb_vlan_push ||
......@@ -5018,6 +5159,14 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_skb_cgroup_id_proto;
case BPF_FUNC_skb_ancestor_cgroup_id:
return &bpf_skb_ancestor_cgroup_id_proto;
#endif
#ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_tcp:
return &bpf_sk_lookup_tcp_proto;
case BPF_FUNC_sk_lookup_udp:
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
#endif
default:
return bpf_base_func_proto(func_id);
......@@ -5119,6 +5268,14 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_sk_redirect_hash_proto;
case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto;
#ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_tcp:
return &bpf_sk_lookup_tcp_proto;
case BPF_FUNC_sk_lookup_udp:
return &bpf_sk_lookup_udp_proto;
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
#endif
default:
return bpf_base_func_proto(func_id);
}
......@@ -5394,23 +5551,29 @@ static bool __sock_filter_check_size(int off, int size,
return size == size_default;
}
static bool sock_filter_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
if (off < 0 || off >= sizeof(struct bpf_sock))
return false;
if (off % size != 0)
return false;
if (!__sock_filter_check_attach_type(off, type,
prog->expected_attach_type))
return false;
if (!__sock_filter_check_size(off, size, info))
return false;
return true;
}
static bool sock_filter_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
if (!bpf_sock_is_valid_access(off, size, type, info))
return false;
return __sock_filter_check_attach_type(off, type,
prog->expected_attach_type);
}
static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog, int drop_verdict)
{
......@@ -6122,10 +6285,10 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
return insn - insn_buf;
}
static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog, u32 *target_size)
u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog, u32 *target_size)
{
struct bpf_insn *insn = insn_buf;
int off;
......@@ -7037,7 +7200,7 @@ const struct bpf_prog_ops lwt_seg6local_prog_ops = {
const struct bpf_verifier_ops cg_sock_verifier_ops = {
.get_func_proto = sock_filter_func_proto,
.is_valid_access = sock_filter_is_valid_access,
.convert_ctx_access = sock_filter_convert_ctx_access,
.convert_ctx_access = bpf_sock_convert_ctx_access,
};
const struct bpf_prog_ops cg_sock_prog_ops = {
......
......@@ -32,37 +32,49 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
{
unsigned long flags;
if (xs->dev) {
spin_lock_irqsave(&umem->xsk_list_lock, flags);
list_del_rcu(&xs->list);
spin_unlock_irqrestore(&umem->xsk_list_lock, flags);
if (umem->zc)
synchronize_net();
}
spin_lock_irqsave(&umem->xsk_list_lock, flags);
list_del_rcu(&xs->list);
spin_unlock_irqrestore(&umem->xsk_list_lock, flags);
}
int xdp_umem_query(struct net_device *dev, u16 queue_id)
/* The umem is stored both in the _rx struct and the _tx struct as we do
* not know if the device has more tx queues than rx, or the opposite.
* This might also change during run time.
*/
static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
u16 queue_id)
{
struct netdev_bpf bpf;
if (queue_id < dev->real_num_rx_queues)
dev->_rx[queue_id].umem = umem;
if (queue_id < dev->real_num_tx_queues)
dev->_tx[queue_id].umem = umem;
}
ASSERT_RTNL();
struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
u16 queue_id)
{
if (queue_id < dev->real_num_rx_queues)
return dev->_rx[queue_id].umem;
if (queue_id < dev->real_num_tx_queues)
return dev->_tx[queue_id].umem;
memset(&bpf, 0, sizeof(bpf));
bpf.command = XDP_QUERY_XSK_UMEM;
bpf.xsk.queue_id = queue_id;
return NULL;
}
if (!dev->netdev_ops->ndo_bpf)
return 0;
return dev->netdev_ops->ndo_bpf(dev, &bpf) ?: !!bpf.xsk.umem;
static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
{
if (queue_id < dev->real_num_rx_queues)
dev->_rx[queue_id].umem = NULL;
if (queue_id < dev->real_num_tx_queues)
dev->_tx[queue_id].umem = NULL;
}
int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
u32 queue_id, u16 flags)
u16 queue_id, u16 flags)
{
bool force_zc, force_copy;
struct netdev_bpf bpf;
int err;
int err = 0;
force_zc = flags & XDP_ZEROCOPY;
force_copy = flags & XDP_COPY;
......@@ -70,17 +82,23 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
if (force_zc && force_copy)
return -EINVAL;
if (force_copy)
return 0;
rtnl_lock();
if (xdp_get_umem_from_qid(dev, queue_id)) {
err = -EBUSY;
goto out_rtnl_unlock;
}
if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
xdp_reg_umem_at_qid(dev, umem, queue_id);
umem->dev = dev;
umem->queue_id = queue_id;
if (force_copy)
/* For copy-mode, we are done. */
goto out_rtnl_unlock;
rtnl_lock();
err = xdp_umem_query(dev, queue_id);
if (err) {
err = err < 0 ? -EOPNOTSUPP : -EBUSY;
goto err_rtnl_unlock;
if (!dev->netdev_ops->ndo_bpf ||
!dev->netdev_ops->ndo_xsk_async_xmit) {
err = -EOPNOTSUPP;
goto err_unreg_umem;
}
bpf.command = XDP_SETUP_XSK_UMEM;
......@@ -89,18 +107,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
err = dev->netdev_ops->ndo_bpf(dev, &bpf);
if (err)
goto err_rtnl_unlock;
goto err_unreg_umem;
rtnl_unlock();
dev_hold(dev);
umem->dev = dev;
umem->queue_id = queue_id;
umem->zc = true;
return 0;
err_rtnl_unlock:
err_unreg_umem:
xdp_clear_umem_at_qid(dev, queue_id);
if (!force_zc)
err = 0; /* fallback to copy mode */
out_rtnl_unlock:
rtnl_unlock();
return force_zc ? err : 0; /* fail or fallback */
return err;
}
static void xdp_umem_clear_dev(struct xdp_umem *umem)
......@@ -108,7 +128,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
struct netdev_bpf bpf;
int err;
if (umem->dev) {
if (umem->zc) {
bpf.command = XDP_SETUP_XSK_UMEM;
bpf.xsk.umem = NULL;
bpf.xsk.queue_id = umem->queue_id;
......@@ -119,9 +139,17 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
if (err)
WARN(1, "failed to disable umem!\n");
}
if (umem->dev) {
rtnl_lock();
xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
rtnl_unlock();
}
if (umem->zc) {
dev_put(umem->dev);
umem->dev = NULL;
umem->zc = false;
}
}
......
......@@ -9,7 +9,7 @@
#include <net/xdp_sock.h>
int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
u32 queue_id, u16 flags);
u16 queue_id, u16 flags);
bool xdp_umem_validate_queues(struct xdp_umem *umem);
void xdp_get_umem(struct xdp_umem *umem);
void xdp_put_umem(struct xdp_umem *umem);
......
......@@ -355,12 +355,18 @@ static int xsk_release(struct socket *sock)
local_bh_enable();
if (xs->dev) {
struct net_device *dev = xs->dev;
/* Wait for driver to stop using the xdp socket. */
synchronize_net();
dev_put(xs->dev);
xdp_del_sk_umem(xs->umem, xs);
xs->dev = NULL;
synchronize_net();
dev_put(dev);
}
xskq_destroy(xs->rx);
xskq_destroy(xs->tx);
sock_orphan(sk);
sock->sk = NULL;
......@@ -419,13 +425,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
}
qid = sxdp->sxdp_queue_id;
if ((xs->rx && qid >= dev->real_num_rx_queues) ||
(xs->tx && qid >= dev->real_num_tx_queues)) {
err = -EINVAL;
goto out_unlock;
}
flags = sxdp->sxdp_flags;
if (flags & XDP_SHARED_UMEM) {
......@@ -721,9 +720,6 @@ static void xsk_destruct(struct sock *sk)
if (!sock_flag(sk, SOCK_DEAD))
return;
xskq_destroy(xs->rx);
xskq_destroy(xs->tx);
xdp_del_sk_umem(xs->umem, xs);
xdp_put_umem(xs->umem);
sk_refcnt_debug_dec(sk);
......
......@@ -209,7 +209,7 @@ static int map_fd = -1;
static int prog_load_cnt(int verdict, int val)
{
int cgroup_storage_fd;
int cgroup_storage_fd, percpu_cgroup_storage_fd;
if (map_fd < 0)
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0);
......@@ -225,6 +225,14 @@ static int prog_load_cnt(int verdict, int val)
return -1;
}
percpu_cgroup_storage_fd = bpf_create_map(
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
sizeof(struct bpf_cgroup_storage_key), 8, 0, 0);
if (percpu_cgroup_storage_fd < 0) {
printf("failed to create map '%s'\n", strerror(errno));
return -1;
}
struct bpf_insn prog[] = {
BPF_MOV32_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
......@@ -235,11 +243,20 @@ static int prog_load_cnt(int verdict, int val)
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage),
BPF_MOV64_IMM(BPF_REG_1, val),
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 0, 0),
BPF_LD_MAP_FD(BPF_REG_1, percpu_cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage),
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 0x1),
BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_3, 0),
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(),
};
......
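The raw instructions above exercise both the shared and the per-cpu cgroup storage via BPF_FUNC_get_local_storage. For reference, a minimal C-level sketch of the per-cpu counter pattern follows; it assumes the selftests-style bpf_helpers.h declarations for SEC() and bpf_get_local_storage(), and the map/section names are placeholders, not part of this patch set:

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* assumed: provides SEC() and bpf_get_local_storage() */

/* Hypothetical per-cpu cgroup storage map; the key type is fixed by the kernel
 * and max_entries must be 0 (entries are created per attached cgroup).
 */
struct bpf_map_def SEC("maps") percpu_cnt = {
	.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
	.key_size = sizeof(struct bpf_cgroup_storage_key),
	.value_size = sizeof(__u64),
	.max_entries = 0,
};

SEC("cgroup/skb")
int count_packets(struct __sk_buff *skb)
{
	__u64 *cnt = bpf_get_local_storage(&percpu_cnt, 0);

	/* Per-cpu storage: a plain increment suffices, no atomics needed. */
	(*cnt)++;
	return 1;	/* allow the packet */
}

char _license[] SEC("license") = "GPL";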
......@@ -17,8 +17,6 @@
#include "bpf_load.h"
#include "bpf_util.h"
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#define SLOTS 100
static void clear_stats(int fd)
......
......@@ -72,13 +72,15 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_SOCKHASH] = "sockhash",
[BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage",
[BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray",
[BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE] = "percpu_cgroup_storage",
};
static bool map_is_per_cpu(__u32 type)
{
return type == BPF_MAP_TYPE_PERCPU_HASH ||
type == BPF_MAP_TYPE_PERCPU_ARRAY ||
type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE;
}
static bool map_is_map_of_maps(__u32 type)
......
......@@ -69,7 +69,9 @@ static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(netinfo->devices[netinfo->used_len].devname,
sizeof(netinfo->devices[netinfo->used_len].devname),
"%s",
tb[IFLA_IFNAME] ? nla_getattr_str(tb[IFLA_IFNAME]) : "");
tb[IFLA_IFNAME]
? libbpf_nla_getattr_str(tb[IFLA_IFNAME])
: "");
netinfo->used_len++;
return do_xdp_dump(ifinfo, tb);
......@@ -83,7 +85,7 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
if (tcinfo->is_qdisc) {
/* skip clsact qdisc */
if (tb[TCA_KIND] &&
strcmp(nla_data(tb[TCA_KIND]), "clsact") == 0)
strcmp(libbpf_nla_data(tb[TCA_KIND]), "clsact") == 0)
return 0;
if (info->tcm_handle == 0)
return 0;
......@@ -101,7 +103,9 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(tcinfo->handle_array[tcinfo->used_len].kind,
sizeof(tcinfo->handle_array[tcinfo->used_len].kind),
"%s",
tb[TCA_KIND] ? nla_getattr_str(tb[TCA_KIND]) : "unknown");
tb[TCA_KIND]
? libbpf_nla_getattr_str(tb[TCA_KIND])
: "unknown");
tcinfo->used_len++;
return 0;
......@@ -127,14 +131,14 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
tcinfo.array_len = 0;
tcinfo.is_qdisc = false;
ret = nl_get_class(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg,
&tcinfo);
ret = libbpf_nl_get_class(sock, nl_pid, dev->ifindex,
dump_class_qdisc_nlmsg, &tcinfo);
if (ret)
goto out;
tcinfo.is_qdisc = true;
ret = nl_get_qdisc(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg,
&tcinfo);
ret = libbpf_nl_get_qdisc(sock, nl_pid, dev->ifindex,
dump_class_qdisc_nlmsg, &tcinfo);
if (ret)
goto out;
......@@ -142,10 +146,9 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
filter_info.ifindex = dev->ifindex;
for (i = 0; i < tcinfo.used_len; i++) {
filter_info.kind = tcinfo.handle_array[i].kind;
ret = nl_get_filter(sock, nl_pid, dev->ifindex,
tcinfo.handle_array[i].handle,
dump_filter_nlmsg,
&filter_info);
ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex,
tcinfo.handle_array[i].handle,
dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
}
......@@ -153,22 +156,22 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
/* root, ingress and egress handle */
handle = TC_H_ROOT;
filter_info.kind = "root";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
filter_info.kind = "clsact/ingress";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
filter_info.kind = "clsact/egress";
ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
......@@ -196,7 +199,7 @@ static int do_show(int argc, char **argv)
usage();
}
sock = bpf_netlink_open(&nl_pid);
sock = libbpf_netlink_open(&nl_pid);
if (sock < 0) {
fprintf(stderr, "failed to open netlink sock\n");
return -1;
......@@ -211,7 +214,7 @@ static int do_show(int argc, char **argv)
jsonw_start_array(json_wtr);
NET_START_OBJECT;
NET_START_ARRAY("xdp", "%s:\n");
ret = nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
NET_END_ARRAY("\n");
if (!ret) {
......
......@@ -21,7 +21,7 @@ static void xdp_dump_prog_id(struct nlattr **tb, int attr,
if (new_json_object)
NET_START_OBJECT
NET_DUMP_STR("mode", " %s", mode);
NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[attr]))
NET_DUMP_UINT("id", " id %u", libbpf_nla_getattr_u32(tb[attr]))
if (new_json_object)
NET_END_OBJECT
}
......@@ -32,13 +32,13 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
struct nlattr *tb[IFLA_XDP_MAX + 1];
unsigned char mode;
if (nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
if (libbpf_nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
return -1;
if (!tb[IFLA_XDP_ATTACHED])
return 0;
mode = nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
mode = libbpf_nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
if (mode == XDP_ATTACHED_NONE)
return 0;
......@@ -75,14 +75,14 @@ int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb)
return 0;
return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index,
nla_getattr_str(tb[IFLA_IFNAME]));
libbpf_nla_getattr_str(tb[IFLA_IFNAME]));
}
static int do_bpf_dump_one_act(struct nlattr *attr)
{
struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
if (nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0)
if (libbpf_nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE;
if (!tb[TCA_ACT_BPF_PARMS])
......@@ -91,10 +91,10 @@ static int do_bpf_dump_one_act(struct nlattr *attr)
NET_START_OBJECT_NESTED2;
if (tb[TCA_ACT_BPF_NAME])
NET_DUMP_STR("name", "%s",
nla_getattr_str(tb[TCA_ACT_BPF_NAME]));
libbpf_nla_getattr_str(tb[TCA_ACT_BPF_NAME]));
if (tb[TCA_ACT_BPF_ID])
NET_DUMP_UINT("id", " id %u",
nla_getattr_u32(tb[TCA_ACT_BPF_ID]));
libbpf_nla_getattr_u32(tb[TCA_ACT_BPF_ID]));
NET_END_OBJECT_NESTED;
return 0;
}
......@@ -106,10 +106,11 @@ static int do_dump_one_act(struct nlattr *attr)
if (!attr)
return 0;
if (nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0)
if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE;
if (tb[TCA_ACT_KIND] && strcmp(nla_data(tb[TCA_ACT_KIND]), "bpf") == 0)
if (tb[TCA_ACT_KIND] &&
strcmp(libbpf_nla_data(tb[TCA_ACT_KIND]), "bpf") == 0)
return do_bpf_dump_one_act(tb[TCA_ACT_OPTIONS]);
return 0;
......@@ -120,7 +121,7 @@ static int do_bpf_act_dump(struct nlattr *attr)
struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
int act, ret;
if (nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0)
if (libbpf_nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE;
NET_START_ARRAY("act", " %s [");
......@@ -139,13 +140,15 @@ static int do_bpf_filter_dump(struct nlattr *attr)
struct nlattr *tb[TCA_BPF_MAX + 1];
int ret;
if (nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0)
if (libbpf_nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0)
return -LIBBPF_ERRNO__NLPARSE;
if (tb[TCA_BPF_NAME])
NET_DUMP_STR("name", " %s", nla_getattr_str(tb[TCA_BPF_NAME]));
NET_DUMP_STR("name", " %s",
libbpf_nla_getattr_str(tb[TCA_BPF_NAME]));
if (tb[TCA_BPF_ID])
NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[TCA_BPF_ID]));
NET_DUMP_UINT("id", " id %u",
libbpf_nla_getattr_u32(tb[TCA_BPF_ID]));
if (tb[TCA_BPF_ACT]) {
ret = do_bpf_act_dump(tb[TCA_BPF_ACT]);
if (ret)
......@@ -160,7 +163,8 @@ int do_filter_dump(struct tcmsg *info, struct nlattr **tb, const char *kind,
{
int ret = 0;
if (tb[TCA_OPTIONS] && strcmp(nla_data(tb[TCA_KIND]), "bpf") == 0) {
if (tb[TCA_OPTIONS] &&
strcmp(libbpf_nla_data(tb[TCA_KIND]), "bpf") == 0) {
NET_START_OBJECT;
if (devname[0] != '\0')
NET_DUMP_STR("devname", "%s", devname);
......
......@@ -16,7 +16,7 @@
jsonw_name(json_wtr, name); \
jsonw_start_object(json_wtr); \
} else { \
fprintf(stderr, "%s {", name); \
fprintf(stdout, "%s {", name); \
} \
}
......@@ -25,7 +25,7 @@
if (json_output) \
jsonw_start_object(json_wtr); \
else \
fprintf(stderr, "{"); \
fprintf(stdout, "{"); \
}
#define NET_END_OBJECT_NESTED \
......@@ -33,7 +33,7 @@
if (json_output) \
jsonw_end_object(json_wtr); \
else \
fprintf(stderr, "}"); \
fprintf(stdout, "}"); \
}
#define NET_END_OBJECT \
......@@ -47,7 +47,7 @@
if (json_output) \
jsonw_end_object(json_wtr); \
else \
fprintf(stderr, "\n"); \
fprintf(stdout, "\n"); \
}
#define NET_START_ARRAY(name, fmt_str) \
......@@ -56,7 +56,7 @@
jsonw_name(json_wtr, name); \
jsonw_start_array(json_wtr); \
} else { \
fprintf(stderr, fmt_str, name); \
fprintf(stdout, fmt_str, name); \
} \
}
......@@ -65,7 +65,7 @@
if (json_output) \
jsonw_end_array(json_wtr); \
else \
fprintf(stderr, "%s", endstr); \
fprintf(stdout, "%s", endstr); \
}
#define NET_DUMP_UINT(name, fmt_str, val) \
......@@ -73,7 +73,7 @@
if (json_output) \
jsonw_uint_field(json_wtr, name, val); \
else \
fprintf(stderr, fmt_str, val); \
fprintf(stdout, fmt_str, val); \
}
#define NET_DUMP_STR(name, fmt_str, str) \
......@@ -81,7 +81,7 @@
if (json_output) \
jsonw_string_field(json_wtr, name, str);\
else \
fprintf(stderr, fmt_str, str); \
fprintf(stdout, fmt_str, str); \
}
#define NET_DUMP_STR_ONLY(str) \
......@@ -89,7 +89,7 @@
if (json_output) \
jsonw_string(json_wtr, str); \
else \
fprintf(stderr, "%s ", str); \
fprintf(stdout, "%s ", str); \
}
#endif
......@@ -127,6 +127,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH,
BPF_MAP_TYPE_CGROUP_STORAGE,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
};
enum bpf_prog_type {
......@@ -2143,6 +2144,77 @@ union bpf_attr {
* request in the skb.
* Return
* 0 on success, or a negative error in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for TCP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for UDP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this is the netns of the device in the skb. For socket hooks,
* this is the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* int bpf_sk_release(struct bpf_sock *sk)
* Description
* Release the reference held by *sk*. *sk* must be a non-NULL
* pointer that was returned from **bpf_sk_lookup_xxx**\ ().
* Return
* 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -2228,7 +2300,10 @@ union bpf_attr {
FN(get_current_cgroup_id), \
FN(get_local_storage), \
FN(sk_select_reuseport), \
FN(skb_ancestor_cgroup_id),
FN(skb_ancestor_cgroup_id), \
FN(sk_lookup_tcp), \
FN(sk_lookup_udp), \
FN(sk_release),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
......@@ -2398,6 +2473,23 @@ struct bpf_sock {
*/
};
struct bpf_sock_tuple {
union {
struct {
__be32 saddr;
__be32 daddr;
__be16 sport;
__be16 dport;
} ipv4;
struct {
__be32 saddr[4];
__be32 daddr[4];
__be16 sport;
__be16 dport;
} ipv6;
};
};
#define XDP_PACKET_HEADROOM 256
/* User return codes for XDP prog type.
......
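To illustrate the helpers documented above, here is a minimal TC classifier sketch. It assumes the selftests-style bpf_helpers.h and bpf_endian.h declarations (SEC(), bpf_sk_lookup_tcp(), bpf_sk_release(), bpf_htons()/bpf_htonl()); the hard-coded tuple values and the program name are placeholders, not part of this patch set:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"	/* assumed: SEC(), bpf_sk_lookup_tcp(), bpf_sk_release() */
#include "bpf_endian.h"		/* assumed: bpf_htons(), bpf_htonl() */

SEC("classifier")
int demo_sk_lookup(struct __sk_buff *skb)
{
	struct bpf_sock_tuple tuple = {};
	struct bpf_sock *sk;

	/* Placeholder IPv4 4-tuple; a real program would parse it from the packet. */
	tuple.ipv4.saddr = bpf_htonl(0x7f000001);	/* 127.0.0.1 */
	tuple.ipv4.daddr = bpf_htonl(0x7f000001);
	tuple.ipv4.sport = bpf_htons(12345);
	tuple.ipv4.dport = bpf_htons(80);

	/* netns = 0: look up in the netns of the skb's device; flags must be 0. */
	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), 0, 0);
	if (!sk)
		return TC_ACT_SHOT;	/* no matching socket: drop */

	/* Any non-NULL result must be released before the program returns. */
	bpf_sk_release(sk);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";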
# SPDX-License-Identifier: GPL-2.0
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
# Most of this file is copied from tools/lib/traceevent/Makefile
BPF_VERSION = 0
......
// SPDX-License-Identifier: LGPL-2.1
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/*
* common eBPF ELF operations.
......
/* SPDX-License-Identifier: LGPL-2.1 */
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/*
* common eBPF ELF operations.
......@@ -20,8 +20,8 @@
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/
#ifndef __BPF_BPF_H
#define __BPF_BPF_H
#ifndef __LIBBPF_BPF_H
#define __LIBBPF_BPF_H
#include <linux/bpf.h>
#include <stdbool.h>
......@@ -111,4 +111,4 @@ int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len,
__u32 *prog_id, __u32 *fd_type, __u64 *probe_offset,
__u64 *probe_addr);
#endif
#endif /* __LIBBPF_BPF_H */
// SPDX-License-Identifier: LGPL-2.1
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2018 Facebook */
#include <stdlib.h>
......
/* SPDX-License-Identifier: LGPL-2.1 */
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (c) 2018 Facebook */
#ifndef __BPF_BTF_H
#define __BPF_BTF_H
#ifndef __LIBBPF_BTF_H
#define __LIBBPF_BTF_H
#include <linux/types.h>
......@@ -23,4 +23,4 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id);
int btf__fd(const struct btf *btf);
const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
#endif
#endif /* __LIBBPF_BTF_H */
/* SPDX-License-Identifier: LGPL-2.1 */
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/*
* Common eBPF ELF object loading operations.
......@@ -6,22 +6,9 @@
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/
#ifndef __BPF_LIBBPF_H
#define __BPF_LIBBPF_H
#ifndef __LIBBPF_LIBBPF_H
#define __LIBBPF_LIBBPF_H
#include <stdio.h>
#include <stdint.h>
......@@ -104,6 +91,8 @@ void *bpf_object__priv(struct bpf_object *prog);
int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type);
int libbpf_attach_type_by_name(const char *name,
enum bpf_attach_type *attach_type);
/* Accessors of bpf_program */
struct bpf_program;
......@@ -126,10 +115,13 @@ void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex);
const char *bpf_program__title(struct bpf_program *prog, bool needs_copy);
int bpf_program__load(struct bpf_program *prog, char *license,
__u32 kern_version);
int bpf_program__fd(struct bpf_program *prog);
int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
int instance);
int bpf_program__pin(struct bpf_program *prog, const char *path);
void bpf_program__unload(struct bpf_program *prog);
struct bpf_insn;
......@@ -299,18 +291,15 @@ int bpf_perf_event_read_simple(void *mem, unsigned long size,
void **buf, size_t *buf_len,
bpf_perf_event_print_t fn, void *priv);
struct nlmsghdr;
struct nlattr;
typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t,
void *cookie);
int bpf_netlink_open(unsigned int *nl_pid);
int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
void *cookie);
int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_class_nlmsg, void *cookie);
int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
dump_nlmsg_t dump_filter_nlmsg, void *cookie);
#endif
typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
int libbpf_netlink_open(unsigned int *nl_pid);
int libbpf_nl_get_link(int sock, unsigned int nl_pid,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie);
int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie);
#endif /* __LIBBPF_LIBBPF_H */
// SPDX-License-Identifier: LGPL-2.1
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
* Copyright (C) 2017 Nicira, Inc.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation;
* version 2.1 of the License (not later!)
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this program; if not, see <http://www.gnu.org/licenses>
*/
#include <stdio.h>
......
// SPDX-License-Identifier: LGPL-2.1
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2018 Facebook */
#include <stdlib.h>
......@@ -18,7 +18,10 @@
#define SOL_NETLINK 270
#endif
int bpf_netlink_open(__u32 *nl_pid)
typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
void *cookie);
int libbpf_netlink_open(__u32 *nl_pid)
{
struct sockaddr_nl sa;
socklen_t addrlen;
......@@ -62,7 +65,7 @@ int bpf_netlink_open(__u32 *nl_pid)
}
static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
__dump_nlmsg_t _fn, dump_nlmsg_t fn,
__dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
void *cookie)
{
bool multipart = true;
......@@ -100,7 +103,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
if (!err->error)
continue;
ret = err->error;
nla_dump_errormsg(nh);
libbpf_nla_dump_errormsg(nh);
goto done;
case NLMSG_DONE:
return 0;
......@@ -130,7 +133,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
} req;
__u32 nl_pid;
sock = bpf_netlink_open(&nl_pid);
sock = libbpf_netlink_open(&nl_pid);
if (sock < 0)
return sock;
......@@ -178,8 +181,8 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
return ret;
}
static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg,
void *cookie)
static int __dump_link_nlmsg(struct nlmsghdr *nlh,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
{
struct nlattr *tb[IFLA_MAX + 1], *attr;
struct ifinfomsg *ifi = NLMSG_DATA(nlh);
......@@ -187,14 +190,14 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi)));
if (nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
if (libbpf_nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE;
return dump_link_nlmsg(cookie, ifi, tb);
}
int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
void *cookie)
int libbpf_nl_get_link(int sock, unsigned int nl_pid,
libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
{
struct {
struct nlmsghdr nlh;
......@@ -216,7 +219,8 @@ int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
}
static int __dump_class_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_class_nlmsg, void *cookie)
libbpf_dump_nlmsg_t dump_class_nlmsg,
void *cookie)
{
struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh);
......@@ -224,14 +228,14 @@ static int __dump_class_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE;
return dump_class_nlmsg(cookie, t, tb);
}
int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_class_nlmsg, void *cookie)
int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie)
{
struct {
struct nlmsghdr nlh;
......@@ -254,7 +258,8 @@ int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
}
static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
libbpf_dump_nlmsg_t dump_qdisc_nlmsg,
void *cookie)
{
struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh);
......@@ -262,14 +267,14 @@ static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE;
return dump_qdisc_nlmsg(cookie, t, tb);
}
int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
{
struct {
struct nlmsghdr nlh;
......@@ -292,7 +297,8 @@ int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
}
static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
dump_nlmsg_t dump_filter_nlmsg, void *cookie)
libbpf_dump_nlmsg_t dump_filter_nlmsg,
void *cookie)
{
struct nlattr *tb[TCA_MAX + 1], *attr;
struct tcmsg *t = NLMSG_DATA(nlh);
......@@ -300,14 +306,14 @@ static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
return -LIBBPF_ERRNO__NLPARSE;
return dump_filter_nlmsg(cookie, t, tb);
}
int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
dump_nlmsg_t dump_filter_nlmsg, void *cookie)
int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie)
{
struct {
struct nlmsghdr nlh;
......
// SPDX-License-Identifier: LGPL-2.1
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
#undef _GNU_SOURCE
#include <string.h>
#include <stdio.h>
......@@ -9,7 +9,7 @@
* libc, while checking strerror_r() return to avoid having to check this in
* all places calling it.
*/
char *str_error(int err, char *dst, int len)
char *libbpf_strerror_r(int err, char *dst, int len)
{
int ret = strerror_r(err, dst, len);
if (ret)
......
// SPDX-License-Identifier: LGPL-2.1
#ifndef BPF_STR_ERROR
#define BPF_STR_ERROR
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __LIBBPF_STR_ERROR_H
#define __LIBBPF_STR_ERROR_H
char *str_error(int err, char *dst, int len);
#endif // BPF_STR_ERROR
char *libbpf_strerror_r(int err, char *dst, int len);
#endif /* __LIBBPF_STR_ERROR_H */
......@@ -23,7 +23,8 @@ $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
test_socket_cookie test_cgroup_storage test_select_reuseport
test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
......@@ -35,7 +36,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_sk_lookup_kern.o
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
......@@ -72,6 +73,7 @@ $(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c
$(OUTPUT)/test_progs: trace_helpers.c
$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
$(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
$(OUTPUT)/test_netcnt: cgroup_helpers.c
.PHONY: force
......
......@@ -158,11 +158,7 @@ static int run_test(int cgfd)
bpf_object__for_each_program(prog, pobj) {
prog_name = bpf_program__title(prog, /*needs_copy*/ false);
if (strcmp(prog_name, "cgroup/connect6") == 0) {
attach_type = BPF_CGROUP_INET6_CONNECT;
} else if (strcmp(prog_name, "sockops") == 0) {
attach_type = BPF_CGROUP_SOCK_OPS;
} else {
if (libbpf_attach_type_by_name(prog_name, &attach_type)) {
log_err("Unexpected prog: %s", prog_name);
goto err;
}
......