Commit 2ec1899e authored by Alexei Starovoitov's avatar Alexei Starovoitov

Merge branch 'bpf-sockopt-hooks'

Stanislav Fomichev says:

====================
This series implements two new per-cgroup hooks: getsockopt and
setsockopt along with a new sockopt program type. The idea is pretty
similar to recently introduced cgroup sysctl hooks, but
implementation is simpler (no need to convert to/from strings).

What this can be applied to:
* move business logic of what tos/priority/etc can be set by
  containers (either pass or reject)
* handle existing options (or introduce new ones) differently by
  propagating some information in cgroup/socket local storage

Compared to a simple syscall/{g,s}etsockopt tracepoint, those
hooks are context aware. Meaning, they can access underlying socket
and use cgroup and socket local storage.

v9:
* allow overwriting setsocktop arguments (Alexei Starovoitov)
  (see individual changes for more changelog details)
====================
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parents 3b1c667e f6d08d9d
...@@ -42,6 +42,7 @@ Program types ...@@ -42,6 +42,7 @@ Program types
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
prog_cgroup_sockopt
prog_cgroup_sysctl prog_cgroup_sysctl
prog_flow_dissector prog_flow_dissector
......
.. SPDX-License-Identifier: GPL-2.0
============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================
``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:
* ``BPF_CGROUP_GETSOCKOPT`` - called every time process executes ``getsockopt``
system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time process executes ``setsockopt``
system call.
The context (``struct bpf_sockopt``) has associated socket (``sk``) and
all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.
BPF_CGROUP_SETSOCKOPT
=====================
``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handling of
sockopt and it has writable context: it can modify the supplied arguments
before passing them down to the kernel. This hook has access to the cgroup
and socket local storage.
If BPF program sets ``optlen`` to -1, the control will be returned
back to the userspace after all other BPF programs in the cgroup
chain finish (i.e. kernel ``setsockopt`` handling will *not* be executed).
Note, that ``optlen`` can not be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
trigger ``EFAULT``.
Return Type
-----------
* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
* ``1`` - success, continue with next BPF program in the cgroup chain.
BPF_CGROUP_GETSOCKOPT
=====================
``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handing of
sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it's interested in whatever kernel has returned. BPF hook can override
the values above, adjust ``optlen`` and reset ``retval`` to 0. If ``optlen``
has been increased above initial ``getsockopt`` value (i.e. userspace
buffer is too small), ``EFAULT`` is returned.
This hook has access to the cgroup and socket local storage.
Note, that the only acceptable value to set to ``retval`` is 0 and the
original value that the kernel returned. Any other value will trigger
``EFAULT``.
Return Type
-----------
* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace, return
``retval`` from the syscall (note that this can be overwritten by
the BPF program from the parent cgroup).
Cgroup Inheritance
==================
Suppose, there is the following cgroup hierarchy where each cgroup
has ``BPF_CGROUP_GETSOCKOPT`` attached at each level with
``BPF_F_ALLOW_MULTI`` flag::
A (root, parent)
\
B (child)
When the application calls ``getsockopt`` syscall from the cgroup B,
the programs are executed from the bottom up: B, A. First program
(B) sees the result of kernel's ``getsockopt``. It can optionally
adjust ``optval``, ``optlen`` and reset ``retval`` to 0. After that
control will be passed to the second (A) program which will see the
same context as B including any potential modifications.
Same for ``BPF_CGROUP_SETSOCKOPT``: if the program is attached to
A and B, the trigger order is B, then A. If B does any changes
to the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
then the next program in the chain (A) will see those changes,
*not* the original input ``setsockopt`` arguments. The potentially
modified values will be then passed down to the kernel.
Example
=======
See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of BPF program that handles socket options.
...@@ -124,6 +124,14 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, ...@@ -124,6 +124,14 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
loff_t *ppos, void **new_buf, loff_t *ppos, void **new_buf,
enum bpf_attach_type type); enum bpf_attach_type type);
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
int *optname, char __user *optval,
int *optlen, char **kernel_optval);
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
int optname, char __user *optval,
int __user *optlen, int max_optlen,
int retval);
static inline enum bpf_cgroup_storage_type cgroup_storage_type( static inline enum bpf_cgroup_storage_type cgroup_storage_type(
struct bpf_map *map) struct bpf_map *map)
{ {
...@@ -286,6 +294,38 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, ...@@ -286,6 +294,38 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
__ret; \ __ret; \
}) })
#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen, \
kernel_optval) \
({ \
int __ret = 0; \
if (cgroup_bpf_enabled) \
__ret = __cgroup_bpf_run_filter_setsockopt(sock, level, \
optname, optval, \
optlen, \
kernel_optval); \
__ret; \
})
#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen) \
({ \
int __ret = 0; \
if (cgroup_bpf_enabled) \
get_user(__ret, optlen); \
__ret; \
})
#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, optlen, \
max_optlen, retval) \
({ \
int __ret = retval; \
if (cgroup_bpf_enabled) \
__ret = __cgroup_bpf_run_filter_getsockopt(sock, level, \
optname, optval, \
optlen, max_optlen, \
retval); \
__ret; \
})
int cgroup_bpf_prog_attach(const union bpf_attr *attr, int cgroup_bpf_prog_attach(const union bpf_attr *attr,
enum bpf_prog_type ptype, struct bpf_prog *prog); enum bpf_prog_type ptype, struct bpf_prog *prog);
int cgroup_bpf_prog_detach(const union bpf_attr *attr, int cgroup_bpf_prog_detach(const union bpf_attr *attr,
...@@ -357,6 +397,11 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, ...@@ -357,6 +397,11 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; }) #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; })
#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen) ({ 0; })
#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, \
optlen, max_optlen, retval) ({ retval; })
#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen, \
kernel_optval) ({ 0; })
#define for_each_cgroup_storage_type(stype) for (; false; ) #define for_each_cgroup_storage_type(stype) for (; false; )
......
...@@ -518,6 +518,7 @@ struct bpf_prog_array { ...@@ -518,6 +518,7 @@ struct bpf_prog_array {
struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags); struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags);
void bpf_prog_array_free(struct bpf_prog_array *progs); void bpf_prog_array_free(struct bpf_prog_array *progs);
int bpf_prog_array_length(struct bpf_prog_array *progs); int bpf_prog_array_length(struct bpf_prog_array *progs);
bool bpf_prog_array_is_empty(struct bpf_prog_array *array);
int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs, int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
__u32 __user *prog_ids, u32 cnt); __u32 __user *prog_ids, u32 cnt);
...@@ -1051,6 +1052,7 @@ extern const struct bpf_func_proto bpf_spin_unlock_proto; ...@@ -1051,6 +1052,7 @@ extern const struct bpf_func_proto bpf_spin_unlock_proto;
extern const struct bpf_func_proto bpf_get_local_storage_proto; extern const struct bpf_func_proto bpf_get_local_storage_proto;
extern const struct bpf_func_proto bpf_strtol_proto; extern const struct bpf_func_proto bpf_strtol_proto;
extern const struct bpf_func_proto bpf_strtoul_proto; extern const struct bpf_func_proto bpf_strtoul_proto;
extern const struct bpf_func_proto bpf_tcp_sock_proto;
/* Shared helpers among cBPF and eBPF. */ /* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void); void bpf_user_rnd_init_once(void);
......
...@@ -30,6 +30,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable) ...@@ -30,6 +30,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable)
#ifdef CONFIG_CGROUP_BPF #ifdef CONFIG_CGROUP_BPF
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev) BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl) BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt)
#endif #endif
#ifdef CONFIG_BPF_LIRC_MODE2 #ifdef CONFIG_BPF_LIRC_MODE2
BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2) BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
......
...@@ -1199,4 +1199,14 @@ struct bpf_sysctl_kern { ...@@ -1199,4 +1199,14 @@ struct bpf_sysctl_kern {
u64 tmp_reg; u64 tmp_reg;
}; };
struct bpf_sockopt_kern {
struct sock *sk;
u8 *optval;
u8 *optval_end;
s32 level;
s32 optname;
s32 optlen;
s32 retval;
};
#endif /* __LINUX_FILTER_H__ */ #endif /* __LINUX_FILTER_H__ */
...@@ -170,6 +170,7 @@ enum bpf_prog_type { ...@@ -170,6 +170,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_FLOW_DISSECTOR, BPF_PROG_TYPE_FLOW_DISSECTOR,
BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_PROG_TYPE_CGROUP_SYSCTL,
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
BPF_PROG_TYPE_CGROUP_SOCKOPT,
}; };
enum bpf_attach_type { enum bpf_attach_type {
...@@ -194,6 +195,8 @@ enum bpf_attach_type { ...@@ -194,6 +195,8 @@ enum bpf_attach_type {
BPF_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL,
BPF_CGROUP_UDP4_RECVMSG, BPF_CGROUP_UDP4_RECVMSG,
BPF_CGROUP_UDP6_RECVMSG, BPF_CGROUP_UDP6_RECVMSG,
BPF_CGROUP_GETSOCKOPT,
BPF_CGROUP_SETSOCKOPT,
__MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
}; };
...@@ -3541,4 +3544,15 @@ struct bpf_sysctl { ...@@ -3541,4 +3544,15 @@ struct bpf_sysctl {
*/ */
}; };
struct bpf_sockopt {
__bpf_md_ptr(struct bpf_sock *, sk);
__bpf_md_ptr(void *, optval);
__bpf_md_ptr(void *, optval_end);
__s32 level;
__s32 optname;
__s32 optlen;
__s32 retval;
};
#endif /* _UAPI__LINUX_BPF_H__ */ #endif /* _UAPI__LINUX_BPF_H__ */
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
#include <linux/bpf.h> #include <linux/bpf.h>
#include <linux/bpf-cgroup.h> #include <linux/bpf-cgroup.h>
#include <net/sock.h> #include <net/sock.h>
#include <net/bpf_sk_storage.h>
#include "../cgroup/cgroup-internal.h" #include "../cgroup/cgroup-internal.h"
...@@ -938,6 +939,188 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, ...@@ -938,6 +939,188 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
} }
EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl); EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl);
static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
enum bpf_attach_type attach_type)
{
struct bpf_prog_array *prog_array;
bool empty;
rcu_read_lock();
prog_array = rcu_dereference(cgrp->bpf.effective[attach_type]);
empty = bpf_prog_array_is_empty(prog_array);
rcu_read_unlock();
return empty;
}
static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
{
if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
return -EINVAL;
ctx->optval = kzalloc(max_optlen, GFP_USER);
if (!ctx->optval)
return -ENOMEM;
ctx->optval_end = ctx->optval + max_optlen;
ctx->optlen = max_optlen;
return 0;
}
static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
{
kfree(ctx->optval);
}
int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
int *optname, char __user *optval,
int *optlen, char **kernel_optval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
struct bpf_sockopt_kern ctx = {
.sk = sk,
.level = *level,
.optname = *optname,
};
int ret;
/* Opportunistic check to see whether we have any BPF program
* attached to the hook so we don't waste time allocating
* memory and locking the socket.
*/
if (!cgroup_bpf_enabled ||
__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT))
return 0;
ret = sockopt_alloc_buf(&ctx, *optlen);
if (ret)
return ret;
if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
ret = -EFAULT;
goto out;
}
lock_sock(sk);
ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT],
&ctx, BPF_PROG_RUN);
release_sock(sk);
if (!ret) {
ret = -EPERM;
goto out;
}
if (ctx.optlen == -1) {
/* optlen set to -1, bypass kernel */
ret = 1;
} else if (ctx.optlen > *optlen || ctx.optlen < -1) {
/* optlen is out of bounds */
ret = -EFAULT;
} else {
/* optlen within bounds, run kernel handler */
ret = 0;
/* export any potential modifications */
*level = ctx.level;
*optname = ctx.optname;
*optlen = ctx.optlen;
*kernel_optval = ctx.optval;
}
out:
if (ret)
sockopt_free_buf(&ctx);
return ret;
}
EXPORT_SYMBOL(__cgroup_bpf_run_filter_setsockopt);
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
int optname, char __user *optval,
int __user *optlen, int max_optlen,
int retval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
struct bpf_sockopt_kern ctx = {
.sk = sk,
.level = level,
.optname = optname,
.retval = retval,
};
int ret;
/* Opportunistic check to see whether we have any BPF program
* attached to the hook so we don't waste time allocating
* memory and locking the socket.
*/
if (!cgroup_bpf_enabled ||
__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
return retval;
ret = sockopt_alloc_buf(&ctx, max_optlen);
if (ret)
return ret;
if (!retval) {
/* If kernel getsockopt finished successfully,
* copy whatever was returned to the user back
* into our temporary buffer. Set optlen to the
* one that kernel returned as well to let
* BPF programs inspect the value.
*/
if (get_user(ctx.optlen, optlen)) {
ret = -EFAULT;
goto out;
}
if (ctx.optlen > max_optlen)
ctx.optlen = max_optlen;
if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
ret = -EFAULT;
goto out;
}
}
lock_sock(sk);
ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
&ctx, BPF_PROG_RUN);
release_sock(sk);
if (!ret) {
ret = -EPERM;
goto out;
}
if (ctx.optlen > max_optlen) {
ret = -EFAULT;
goto out;
}
/* BPF programs only allowed to set retval to 0, not some
* arbitrary value.
*/
if (ctx.retval != 0 && ctx.retval != retval) {
ret = -EFAULT;
goto out;
}
if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
put_user(ctx.optlen, optlen)) {
ret = -EFAULT;
goto out;
}
ret = ctx.retval;
out:
sockopt_free_buf(&ctx);
return ret;
}
EXPORT_SYMBOL(__cgroup_bpf_run_filter_getsockopt);
static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp, static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,
size_t *lenp) size_t *lenp)
{ {
...@@ -1198,3 +1381,153 @@ const struct bpf_verifier_ops cg_sysctl_verifier_ops = { ...@@ -1198,3 +1381,153 @@ const struct bpf_verifier_ops cg_sysctl_verifier_ops = {
const struct bpf_prog_ops cg_sysctl_prog_ops = { const struct bpf_prog_ops cg_sysctl_prog_ops = {
}; };
static const struct bpf_func_proto *
cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_sk_storage_get:
return &bpf_sk_storage_get_proto;
case BPF_FUNC_sk_storage_delete:
return &bpf_sk_storage_delete_proto;
#ifdef CONFIG_INET
case BPF_FUNC_tcp_sock:
return &bpf_tcp_sock_proto;
#endif
default:
return cgroup_base_func_proto(func_id, prog);
}
}
static bool cg_sockopt_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
const int size_default = sizeof(__u32);
if (off < 0 || off >= sizeof(struct bpf_sockopt))
return false;
if (off % size != 0)
return false;
if (type == BPF_WRITE) {
switch (off) {
case offsetof(struct bpf_sockopt, retval):
if (size != size_default)
return false;
return prog->expected_attach_type ==
BPF_CGROUP_GETSOCKOPT;
case offsetof(struct bpf_sockopt, optname):
/* fallthrough */
case offsetof(struct bpf_sockopt, level):
if (size != size_default)
return false;
return prog->expected_attach_type ==
BPF_CGROUP_SETSOCKOPT;
case offsetof(struct bpf_sockopt, optlen):
return size == size_default;
default:
return false;
}
}
switch (off) {
case offsetof(struct bpf_sockopt, sk):
if (size != sizeof(__u64))
return false;
info->reg_type = PTR_TO_SOCKET;
break;
case offsetof(struct bpf_sockopt, optval):
if (size != sizeof(__u64))
return false;
info->reg_type = PTR_TO_PACKET;
break;
case offsetof(struct bpf_sockopt, optval_end):
if (size != sizeof(__u64))
return false;
info->reg_type = PTR_TO_PACKET_END;
break;
case offsetof(struct bpf_sockopt, retval):
if (size != size_default)
return false;
return prog->expected_attach_type == BPF_CGROUP_GETSOCKOPT;
default:
if (size != size_default)
return false;
break;
}
return true;
}
#define CG_SOCKOPT_ACCESS_FIELD(T, F) \
T(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F), \
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sockopt_kern, F))
static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size)
{
struct bpf_insn *insn = insn_buf;
switch (si->off) {
case offsetof(struct bpf_sockopt, sk):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, sk);
break;
case offsetof(struct bpf_sockopt, level):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, level);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, level);
break;
case offsetof(struct bpf_sockopt, optname):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optname);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optname);
break;
case offsetof(struct bpf_sockopt, optlen):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optlen);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optlen);
break;
case offsetof(struct bpf_sockopt, retval):
if (type == BPF_WRITE)
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, retval);
else
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, retval);
break;
case offsetof(struct bpf_sockopt, optval):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval);
break;
case offsetof(struct bpf_sockopt, optval_end):
*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval_end);
break;
}
return insn - insn_buf;
}
static int cg_sockopt_get_prologue(struct bpf_insn *insn_buf,
bool direct_write,
const struct bpf_prog *prog)
{
/* Nothing to do for sockopt argument. The data is kzalloc'ated.
*/
return 0;
}
const struct bpf_verifier_ops cg_sockopt_verifier_ops = {
.get_func_proto = cg_sockopt_func_proto,
.is_valid_access = cg_sockopt_is_valid_access,
.convert_ctx_access = cg_sockopt_convert_ctx_access,
.gen_prologue = cg_sockopt_get_prologue,
};
const struct bpf_prog_ops cg_sockopt_prog_ops = {
};
...@@ -1809,6 +1809,15 @@ int bpf_prog_array_length(struct bpf_prog_array *array) ...@@ -1809,6 +1809,15 @@ int bpf_prog_array_length(struct bpf_prog_array *array)
return cnt; return cnt;
} }
bool bpf_prog_array_is_empty(struct bpf_prog_array *array)
{
struct bpf_prog_array_item *item;
for (item = array->items; item->prog; item++)
if (item->prog != &dummy_bpf_prog.prog)
return false;
return true;
}
static bool bpf_prog_array_copy_core(struct bpf_prog_array *array, static bool bpf_prog_array_copy_core(struct bpf_prog_array *array,
u32 *prog_ids, u32 *prog_ids,
......
...@@ -1590,6 +1590,14 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type, ...@@ -1590,6 +1590,14 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
default: default:
return -EINVAL; return -EINVAL;
} }
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
switch (expected_attach_type) {
case BPF_CGROUP_SETSOCKOPT:
case BPF_CGROUP_GETSOCKOPT:
return 0;
default:
return -EINVAL;
}
default: default:
return 0; return 0;
} }
...@@ -1840,6 +1848,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, ...@@ -1840,6 +1848,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
switch (prog->type) { switch (prog->type) {
case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK:
case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
return attach_type == prog->expected_attach_type ? 0 : -EINVAL; return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SKB:
return prog->enforce_expected_attach_type && return prog->enforce_expected_attach_type &&
...@@ -1912,6 +1921,10 @@ static int bpf_prog_attach(const union bpf_attr *attr) ...@@ -1912,6 +1921,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
case BPF_CGROUP_SYSCTL: case BPF_CGROUP_SYSCTL:
ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
break; break;
case BPF_CGROUP_GETSOCKOPT:
case BPF_CGROUP_SETSOCKOPT:
ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
break;
default: default:
return -EINVAL; return -EINVAL;
} }
...@@ -1995,6 +2008,10 @@ static int bpf_prog_detach(const union bpf_attr *attr) ...@@ -1995,6 +2008,10 @@ static int bpf_prog_detach(const union bpf_attr *attr)
case BPF_CGROUP_SYSCTL: case BPF_CGROUP_SYSCTL:
ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
break; break;
case BPF_CGROUP_GETSOCKOPT:
case BPF_CGROUP_SETSOCKOPT:
ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
break;
default: default:
return -EINVAL; return -EINVAL;
} }
...@@ -2031,6 +2048,8 @@ static int bpf_prog_query(const union bpf_attr *attr, ...@@ -2031,6 +2048,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
case BPF_CGROUP_SOCK_OPS: case BPF_CGROUP_SOCK_OPS:
case BPF_CGROUP_DEVICE: case BPF_CGROUP_DEVICE:
case BPF_CGROUP_SYSCTL: case BPF_CGROUP_SYSCTL:
case BPF_CGROUP_GETSOCKOPT:
case BPF_CGROUP_SETSOCKOPT:
break; break;
case BPF_LIRC_MODE2: case BPF_LIRC_MODE2:
return lirc_prog_query(attr, uattr); return lirc_prog_query(attr, uattr);
......
...@@ -2215,6 +2215,13 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env, ...@@ -2215,6 +2215,13 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
env->seen_direct_write = true; env->seen_direct_write = true;
return true; return true;
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
if (t == BPF_WRITE)
env->seen_direct_write = true;
return true;
default: default:
return false; return false;
} }
...@@ -6066,6 +6073,7 @@ static int check_return_code(struct bpf_verifier_env *env) ...@@ -6066,6 +6073,7 @@ static int check_return_code(struct bpf_verifier_env *env)
case BPF_PROG_TYPE_SOCK_OPS: case BPF_PROG_TYPE_SOCK_OPS:
case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
break; break;
default: default:
return 0; return 0;
......
...@@ -5651,7 +5651,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk) ...@@ -5651,7 +5651,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
return (unsigned long)NULL; return (unsigned long)NULL;
} }
static const struct bpf_func_proto bpf_tcp_sock_proto = { const struct bpf_func_proto bpf_tcp_sock_proto = {
.func = bpf_tcp_sock, .func = bpf_tcp_sock,
.gpl_only = false, .gpl_only = false,
.ret_type = RET_PTR_TO_TCP_SOCK_OR_NULL, .ret_type = RET_PTR_TO_TCP_SOCK_OR_NULL,
......
...@@ -2051,6 +2051,8 @@ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size, ...@@ -2051,6 +2051,8 @@ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size,
static int __sys_setsockopt(int fd, int level, int optname, static int __sys_setsockopt(int fd, int level, int optname,
char __user *optval, int optlen) char __user *optval, int optlen)
{ {
mm_segment_t oldfs = get_fs();
char *kernel_optval = NULL;
int err, fput_needed; int err, fput_needed;
struct socket *sock; struct socket *sock;
...@@ -2063,6 +2065,22 @@ static int __sys_setsockopt(int fd, int level, int optname, ...@@ -2063,6 +2065,22 @@ static int __sys_setsockopt(int fd, int level, int optname,
if (err) if (err)
goto out_put; goto out_put;
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level,
&optname, optval, &optlen,
&kernel_optval);
if (err < 0) {
goto out_put;
} else if (err > 0) {
err = 0;
goto out_put;
}
if (kernel_optval) {
set_fs(KERNEL_DS);
optval = (char __user __force *)kernel_optval;
}
if (level == SOL_SOCKET) if (level == SOL_SOCKET)
err = err =
sock_setsockopt(sock, level, optname, optval, sock_setsockopt(sock, level, optname, optval,
...@@ -2071,6 +2089,11 @@ static int __sys_setsockopt(int fd, int level, int optname, ...@@ -2071,6 +2089,11 @@ static int __sys_setsockopt(int fd, int level, int optname,
err = err =
sock->ops->setsockopt(sock, level, optname, optval, sock->ops->setsockopt(sock, level, optname, optval,
optlen); optlen);
if (kernel_optval) {
set_fs(oldfs);
kfree(kernel_optval);
}
out_put: out_put:
fput_light(sock->file, fput_needed); fput_light(sock->file, fput_needed);
} }
...@@ -2093,6 +2116,7 @@ static int __sys_getsockopt(int fd, int level, int optname, ...@@ -2093,6 +2116,7 @@ static int __sys_getsockopt(int fd, int level, int optname,
{ {
int err, fput_needed; int err, fput_needed;
struct socket *sock; struct socket *sock;
int max_optlen;
sock = sockfd_lookup_light(fd, &err, &fput_needed); sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (sock != NULL) { if (sock != NULL) {
...@@ -2100,6 +2124,8 @@ static int __sys_getsockopt(int fd, int level, int optname, ...@@ -2100,6 +2124,8 @@ static int __sys_getsockopt(int fd, int level, int optname,
if (err) if (err)
goto out_put; goto out_put;
max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
if (level == SOL_SOCKET) if (level == SOL_SOCKET)
err = err =
sock_getsockopt(sock, level, optname, optval, sock_getsockopt(sock, level, optname, optval,
...@@ -2108,6 +2134,10 @@ static int __sys_getsockopt(int fd, int level, int optname, ...@@ -2108,6 +2134,10 @@ static int __sys_getsockopt(int fd, int level, int optname,
err = err =
sock->ops->getsockopt(sock, level, optname, optval, sock->ops->getsockopt(sock, level, optname, optval,
optlen); optlen);
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
optval, optlen,
max_optlen, err);
out_put: out_put:
fput_light(sock->file, fput_needed); fput_light(sock->file, fput_needed);
} }
......
...@@ -29,7 +29,8 @@ CGROUP COMMANDS ...@@ -29,7 +29,8 @@ CGROUP COMMANDS
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* } | *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
| *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** | | *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** |
| **bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** | | **bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** |
| **sendmsg4** | **sendmsg6** | **recvmsg4** | **recvmsg6** | **sysctl** } | **sendmsg4** | **sendmsg6** | **recvmsg4** | **recvmsg6** | **sysctl** |
| **getsockopt** | **setsockopt** }
| *ATTACH_FLAGS* := { **multi** | **override** } | *ATTACH_FLAGS* := { **multi** | **override** }
DESCRIPTION DESCRIPTION
...@@ -90,7 +91,9 @@ DESCRIPTION ...@@ -90,7 +91,9 @@ DESCRIPTION
an unconnected udp4 socket (since 5.2); an unconnected udp4 socket (since 5.2);
**recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for **recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp6 socket (since 5.2); an unconnected udp6 socket (since 5.2);
**sysctl** sysctl access (since 5.2). **sysctl** sysctl access (since 5.2);
**getsockopt** call to getsockopt (since 5.3);
**setsockopt** call to setsockopt (since 5.3).
**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG* **bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
Detach *PROG* from the cgroup *CGROUP* and attach type Detach *PROG* from the cgroup *CGROUP* and attach type
......
...@@ -40,7 +40,8 @@ PROG COMMANDS ...@@ -40,7 +40,8 @@ PROG COMMANDS
| **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** | | **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** |
| **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** | | **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
| **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** | | **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** |
| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** | **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** |
| **cgroup/getsockopt** | **cgroup/setsockopt**
| } | }
| *ATTACH_TYPE* := { | *ATTACH_TYPE* := {
| **msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector** | **msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector**
......
...@@ -379,7 +379,8 @@ _bpftool() ...@@ -379,7 +379,8 @@ _bpftool()
cgroup/sendmsg4 cgroup/sendmsg6 \ cgroup/sendmsg4 cgroup/sendmsg6 \
cgroup/recvmsg4 cgroup/recvmsg6 \ cgroup/recvmsg4 cgroup/recvmsg6 \
cgroup/post_bind4 cgroup/post_bind6 \ cgroup/post_bind4 cgroup/post_bind6 \
cgroup/sysctl" -- \ cgroup/sysctl cgroup/getsockopt \
cgroup/setsockopt" -- \
"$cur" ) ) "$cur" ) )
return 0 return 0
;; ;;
...@@ -689,7 +690,8 @@ _bpftool() ...@@ -689,7 +690,8 @@ _bpftool()
attach|detach) attach|detach)
local ATTACH_TYPES='ingress egress sock_create sock_ops \ local ATTACH_TYPES='ingress egress sock_create sock_ops \
device bind4 bind6 post_bind4 post_bind6 connect4 \ device bind4 bind6 post_bind4 post_bind6 connect4 \
connect6 sendmsg4 sendmsg6 recvmsg4 recvmsg6 sysctl' connect6 sendmsg4 sendmsg6 recvmsg4 recvmsg6 sysctl \
getsockopt setsockopt'
local ATTACH_FLAGS='multi override' local ATTACH_FLAGS='multi override'
local PROG_TYPE='id pinned tag' local PROG_TYPE='id pinned tag'
case $prev in case $prev in
...@@ -699,7 +701,8 @@ _bpftool() ...@@ -699,7 +701,8 @@ _bpftool()
;; ;;
ingress|egress|sock_create|sock_ops|device|bind4|bind6|\ ingress|egress|sock_create|sock_ops|device|bind4|bind6|\
post_bind4|post_bind6|connect4|connect6|sendmsg4|\ post_bind4|post_bind6|connect4|connect6|sendmsg4|\
sendmsg6|recvmsg4|recvmsg6|sysctl) sendmsg6|recvmsg4|recvmsg6|sysctl|getsockopt|\
setsockopt)
COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \ COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
"$cur" ) ) "$cur" ) )
return 0 return 0
......
...@@ -26,7 +26,8 @@ ...@@ -26,7 +26,8 @@
" sock_ops | device | bind4 | bind6 |\n" \ " sock_ops | device | bind4 | bind6 |\n" \
" post_bind4 | post_bind6 | connect4 |\n" \ " post_bind4 | post_bind6 | connect4 |\n" \
" connect6 | sendmsg4 | sendmsg6 |\n" \ " connect6 | sendmsg4 | sendmsg6 |\n" \
" recvmsg4 | recvmsg6 | sysctl }" " recvmsg4 | recvmsg6 | sysctl |\n" \
" getsockopt | setsockopt }"
static const char * const attach_type_strings[] = { static const char * const attach_type_strings[] = {
[BPF_CGROUP_INET_INGRESS] = "ingress", [BPF_CGROUP_INET_INGRESS] = "ingress",
...@@ -45,6 +46,8 @@ static const char * const attach_type_strings[] = { ...@@ -45,6 +46,8 @@ static const char * const attach_type_strings[] = {
[BPF_CGROUP_SYSCTL] = "sysctl", [BPF_CGROUP_SYSCTL] = "sysctl",
[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4", [BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6", [BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
[__MAX_BPF_ATTACH_TYPE] = NULL, [__MAX_BPF_ATTACH_TYPE] = NULL,
}; };
......
...@@ -74,6 +74,7 @@ static const char * const prog_type_name[] = { ...@@ -74,6 +74,7 @@ static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_SK_REUSEPORT] = "sk_reuseport", [BPF_PROG_TYPE_SK_REUSEPORT] = "sk_reuseport",
[BPF_PROG_TYPE_FLOW_DISSECTOR] = "flow_dissector", [BPF_PROG_TYPE_FLOW_DISSECTOR] = "flow_dissector",
[BPF_PROG_TYPE_CGROUP_SYSCTL] = "cgroup_sysctl", [BPF_PROG_TYPE_CGROUP_SYSCTL] = "cgroup_sysctl",
[BPF_PROG_TYPE_CGROUP_SOCKOPT] = "cgroup_sockopt",
}; };
extern const char * const map_type_name[]; extern const char * const map_type_name[];
......
...@@ -1071,7 +1071,8 @@ static int do_help(int argc, char **argv) ...@@ -1071,7 +1071,8 @@ static int do_help(int argc, char **argv)
" cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n" " cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
" cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n" " cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
" cgroup/sendmsg4 | cgroup/sendmsg6 | cgroup/recvmsg4 |\n" " cgroup/sendmsg4 | cgroup/sendmsg6 | cgroup/recvmsg4 |\n"
" cgroup/recvmsg6 }\n" " cgroup/recvmsg6 | cgroup/getsockopt |\n"
" cgroup/setsockopt }\n"
" ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n" " ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n"
" flow_dissector }\n" " flow_dissector }\n"
" " HELP_SPEC_OPTIONS "\n" " " HELP_SPEC_OPTIONS "\n"
......
...@@ -170,6 +170,7 @@ enum bpf_prog_type { ...@@ -170,6 +170,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_FLOW_DISSECTOR, BPF_PROG_TYPE_FLOW_DISSECTOR,
BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_PROG_TYPE_CGROUP_SYSCTL,
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
BPF_PROG_TYPE_CGROUP_SOCKOPT,
}; };
enum bpf_attach_type { enum bpf_attach_type {
...@@ -194,6 +195,8 @@ enum bpf_attach_type { ...@@ -194,6 +195,8 @@ enum bpf_attach_type {
BPF_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL,
BPF_CGROUP_UDP4_RECVMSG, BPF_CGROUP_UDP4_RECVMSG,
BPF_CGROUP_UDP6_RECVMSG, BPF_CGROUP_UDP6_RECVMSG,
BPF_CGROUP_GETSOCKOPT,
BPF_CGROUP_SETSOCKOPT,
__MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
}; };
...@@ -3541,4 +3544,15 @@ struct bpf_sysctl { ...@@ -3541,4 +3544,15 @@ struct bpf_sysctl {
*/ */
}; };
struct bpf_sockopt {
__bpf_md_ptr(struct bpf_sock *, sk);
__bpf_md_ptr(void *, optval);
__bpf_md_ptr(void *, optval_end);
__s32 level;
__s32 optname;
__s32 optlen;
__s32 retval;
};
#endif /* _UAPI__LINUX_BPF_H__ */ #endif /* _UAPI__LINUX_BPF_H__ */
...@@ -2646,6 +2646,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type) ...@@ -2646,6 +2646,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE: case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
return false; return false;
case BPF_PROG_TYPE_KPROBE: case BPF_PROG_TYPE_KPROBE:
default: default:
...@@ -3604,6 +3605,10 @@ static const struct { ...@@ -3604,6 +3605,10 @@ static const struct {
BPF_CGROUP_UDP6_RECVMSG), BPF_CGROUP_UDP6_RECVMSG),
BPF_EAPROG_SEC("cgroup/sysctl", BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_EAPROG_SEC("cgroup/sysctl", BPF_PROG_TYPE_CGROUP_SYSCTL,
BPF_CGROUP_SYSCTL), BPF_CGROUP_SYSCTL),
BPF_EAPROG_SEC("cgroup/getsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
BPF_CGROUP_GETSOCKOPT),
BPF_EAPROG_SEC("cgroup/setsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
BPF_CGROUP_SETSOCKOPT),
}; };
#undef BPF_PROG_SEC_IMPL #undef BPF_PROG_SEC_IMPL
......
...@@ -101,6 +101,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, ...@@ -101,6 +101,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
case BPF_PROG_TYPE_SK_REUSEPORT: case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR: case BPF_PROG_TYPE_FLOW_DISSECTOR:
case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
default: default:
break; break;
} }
......
...@@ -39,3 +39,6 @@ libbpf.so.* ...@@ -39,3 +39,6 @@ libbpf.so.*
test_hashmap test_hashmap
test_btf_dump test_btf_dump
xdping xdping
test_sockopt
test_sockopt_sk
test_sockopt_multi
...@@ -26,7 +26,8 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test ...@@ -26,7 +26,8 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \ test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \
test_cgroup_storage test_select_reuseport test_section_names \ test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \ test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_btf_dump test_cgroup_attach xdping test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
test_sockopt_multi
BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
TEST_GEN_FILES = $(BPF_OBJ_FILES) TEST_GEN_FILES = $(BPF_OBJ_FILES)
...@@ -103,6 +104,9 @@ $(OUTPUT)/test_netcnt: cgroup_helpers.c ...@@ -103,6 +104,9 @@ $(OUTPUT)/test_netcnt: cgroup_helpers.c
$(OUTPUT)/test_sock_fields: cgroup_helpers.c $(OUTPUT)/test_sock_fields: cgroup_helpers.c
$(OUTPUT)/test_sysctl: cgroup_helpers.c $(OUTPUT)/test_sysctl: cgroup_helpers.c
$(OUTPUT)/test_cgroup_attach: cgroup_helpers.c $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
$(OUTPUT)/test_sockopt: cgroup_helpers.c
$(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
$(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
.PHONY: force .PHONY: force
......
// SPDX-License-Identifier: GPL-2.0
#include <netinet/in.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = 1;
SEC("cgroup/getsockopt/child")
int _getsockopt_child(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
if (optval[0] != 0x80)
return 0; /* EPERM, unexpected optval from the kernel */
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = 0x90;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/getsockopt/parent")
int _getsockopt_parent(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
if (optval[0] != 0x90)
return 0; /* EPERM, unexpected optval from the kernel */
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = 0xA0;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/setsockopt")
int _setsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
optval[0] += 0x10;
ctx->optlen = 1;
return 1;
}
// SPDX-License-Identifier: GPL-2.0
#include <netinet/in.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = 1;
#define SOL_CUSTOM 0xdeadbeef
struct sockopt_sk {
__u8 val;
};
struct bpf_map_def SEC("maps") socket_storage_map = {
.type = BPF_MAP_TYPE_SK_STORAGE,
.key_size = sizeof(int),
.value_size = sizeof(struct sockopt_sk),
.map_flags = BPF_F_NO_PREALLOC,
};
BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct sockopt_sk);
SEC("cgroup/getsockopt")
int _getsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
struct sockopt_sk *storage;
if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
/* Not interested in SOL_IP:IP_TOS;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
/* Not interested in SOL_SOCKET:SO_SNDBUF;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
}
if (ctx->level != SOL_CUSTOM)
return 0; /* EPERM, deny everything except custom level */
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
BPF_SK_STORAGE_GET_F_CREATE);
if (!storage)
return 0; /* EPERM, couldn't get sk storage */
if (!ctx->retval)
return 0; /* EPERM, kernel should not have handled
* SOL_CUSTOM, something is wrong!
*/
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = storage->val;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/setsockopt")
int _setsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
struct sockopt_sk *storage;
if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
/* Not interested in SOL_IP:IP_TOS;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
/* Overwrite SO_SNDBUF value */
if (optval + sizeof(__u32) > optval_end)
return 0; /* EPERM, bounds check */
*(__u32 *)optval = 0x55AA;
ctx->optlen = 4;
return 1;
}
if (ctx->level != SOL_CUSTOM)
return 0; /* EPERM, deny everything except custom level */
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
BPF_SK_STORAGE_GET_F_CREATE);
if (!storage)
return 0; /* EPERM, couldn't get sk storage */
storage->val = optval[0];
ctx->optlen = -1; /* BPF has consumed this option, don't call kernel
* setsockopt handler.
*/
return 1;
}
...@@ -134,6 +134,16 @@ static struct sec_name_test tests[] = { ...@@ -134,6 +134,16 @@ static struct sec_name_test tests[] = {
{0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL}, {0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL},
{0, BPF_CGROUP_SYSCTL}, {0, BPF_CGROUP_SYSCTL},
}, },
{
"cgroup/getsockopt",
{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_GETSOCKOPT},
{0, BPF_CGROUP_GETSOCKOPT},
},
{
"cgroup/setsockopt",
{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
{0, BPF_CGROUP_SETSOCKOPT},
},
}; };
static int test_prog_type_by_name(const struct sec_name_test *test) static int test_prog_type_by_name(const struct sec_name_test *test)
......
This diff is collapsed.
// SPDX-License-Identifier: GPL-2.0
#include <error.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/filter.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_rlimit.h"
#include "bpf_util.h"
#include "cgroup_helpers.h"
static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err) {
log_err("Failed to deduct types for %s BPF program", title);
return -1;
}
prog = bpf_object__find_program_by_title(obj, title);
if (!prog) {
log_err("Failed to find %s BPF program", title);
return -1;
}
err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
attach_type, BPF_F_ALLOW_MULTI);
if (err) {
log_err("Failed to attach %s BPF program", title);
return -1;
}
return 0;
}
static int prog_detach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err)
return -1;
prog = bpf_object__find_program_by_title(obj, title);
if (!prog)
return -1;
err = bpf_prog_detach2(bpf_program__fd(prog), cgroup_fd,
attach_type);
if (err)
return -1;
return 0;
}
static int run_getsockopt_test(struct bpf_object *obj, int cg_parent,
int cg_child, int sock_fd)
{
socklen_t optlen;
__u8 buf;
int err;
/* Set IP_TOS to the expected value (0x80). */
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80) {
log_err("Unexpected getsockopt 0x%x != 0x80 without BPF", buf);
err = -1;
goto detach;
}
/* Attach child program and make sure it returns new value:
* - kernel: -> 0x80
* - child: 0x80 -> 0x90
*/
err = prog_attach(obj, cg_child, "cgroup/getsockopt/child");
if (err)
goto detach;
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x90) {
log_err("Unexpected getsockopt 0x%x != 0x90", buf);
err = -1;
goto detach;
}
/* Attach parent program and make sure it returns new value:
* - kernel: -> 0x80
* - child: 0x80 -> 0x90
* - parent: 0x90 -> 0xA0
*/
err = prog_attach(obj, cg_parent, "cgroup/getsockopt/parent");
if (err)
goto detach;
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0xA0) {
log_err("Unexpected getsockopt 0x%x != 0xA0", buf);
err = -1;
goto detach;
}
/* Setting unexpected initial sockopt should return EPERM:
* - kernel: -> 0x40
* - child: unexpected 0x40, EPERM
* - parent: unexpected 0x40, EPERM
*/
buf = 0x40;
if (setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1) < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (!err) {
log_err("Unexpected success from getsockopt(IP_TOS)");
goto detach;
}
/* Detach child program and make sure we still get EPERM:
* - kernel: -> 0x40
* - parent: unexpected 0x40, EPERM
*/
err = prog_detach(obj, cg_child, "cgroup/getsockopt/child");
if (err) {
log_err("Failed to detach child program");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (!err) {
log_err("Unexpected success from getsockopt(IP_TOS)");
goto detach;
}
/* Set initial value to the one the parent program expects:
* - kernel: -> 0x90
* - parent: 0x90 -> 0xA0
*/
buf = 0x90;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0xA0) {
log_err("Unexpected getsockopt 0x%x != 0xA0", buf);
err = -1;
goto detach;
}
detach:
prog_detach(obj, cg_child, "cgroup/getsockopt/child");
prog_detach(obj, cg_parent, "cgroup/getsockopt/parent");
return err;
}
static int run_setsockopt_test(struct bpf_object *obj, int cg_parent,
int cg_child, int sock_fd)
{
socklen_t optlen;
__u8 buf;
int err;
/* Set IP_TOS to the expected value (0x80). */
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80) {
log_err("Unexpected getsockopt 0x%x != 0x80 without BPF", buf);
err = -1;
goto detach;
}
/* Attach child program and make sure it adds 0x10. */
err = prog_attach(obj, cg_child, "cgroup/setsockopt");
if (err)
goto detach;
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80 + 0x10) {
log_err("Unexpected getsockopt 0x%x != 0x80 + 0x10", buf);
err = -1;
goto detach;
}
/* Attach parent program and make sure it adds another 0x10. */
err = prog_attach(obj, cg_parent, "cgroup/setsockopt");
if (err)
goto detach;
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80 + 2 * 0x10) {
log_err("Unexpected getsockopt 0x%x != 0x80 + 2 * 0x10", buf);
err = -1;
goto detach;
}
detach:
prog_detach(obj, cg_child, "cgroup/setsockopt");
prog_detach(obj, cg_parent, "cgroup/setsockopt");
return err;
}
int main(int argc, char **argv)
{
struct bpf_prog_load_attr attr = {
.file = "./sockopt_multi.o",
};
int cg_parent = -1, cg_child = -1;
struct bpf_object *obj = NULL;
int sock_fd = -1;
int err = -1;
int ignored;
if (setup_cgroup_environment()) {
log_err("Failed to setup cgroup environment\n");
goto out;
}
cg_parent = create_and_get_cgroup("/parent");
if (cg_parent < 0) {
log_err("Failed to create cgroup /parent\n");
goto out;
}
cg_child = create_and_get_cgroup("/parent/child");
if (cg_child < 0) {
log_err("Failed to create cgroup /parent/child\n");
goto out;
}
if (join_cgroup("/parent/child")) {
log_err("Failed to join cgroup /parent/child\n");
goto out;
}
err = bpf_prog_load_xattr(&attr, &obj, &ignored);
if (err) {
log_err("Failed to load BPF object");
goto out;
}
sock_fd = socket(AF_INET, SOCK_STREAM, 0);
if (sock_fd < 0) {
log_err("Failed to create socket");
goto out;
}
if (run_getsockopt_test(obj, cg_parent, cg_child, sock_fd))
err = -1;
printf("test_sockopt_multi: getsockopt %s\n",
err ? "FAILED" : "PASSED");
if (run_setsockopt_test(obj, cg_parent, cg_child, sock_fd))
err = -1;
printf("test_sockopt_multi: setsockopt %s\n",
err ? "FAILED" : "PASSED");
out:
close(sock_fd);
bpf_object__close(obj);
close(cg_child);
close(cg_parent);
printf("test_sockopt_multi: %s\n", err ? "FAILED" : "PASSED");
return err ? EXIT_FAILURE : EXIT_SUCCESS;
}
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/filter.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_rlimit.h"
#include "bpf_util.h"
#include "cgroup_helpers.h"
#define CG_PATH "/sockopt"
#define SOL_CUSTOM 0xdeadbeef
static int getsetsockopt(void)
{
int fd, err;
char buf[4] = {};
socklen_t optlen;
fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
log_err("Failed to create socket");
return -1;
}
/* IP_TOS - BPF bypass */
buf[0] = 0x08;
err = setsockopt(fd, SOL_IP, IP_TOS, buf, 1);
if (err) {
log_err("Failed to call setsockopt(IP_TOS)");
goto err;
}
buf[0] = 0x00;
optlen = 1;
err = getsockopt(fd, SOL_IP, IP_TOS, buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto err;
}
if (buf[0] != 0x08) {
log_err("Unexpected getsockopt(IP_TOS) buf[0] 0x%02x != 0x08",
buf[0]);
goto err;
}
/* IP_TTL - EPERM */
buf[0] = 1;
err = setsockopt(fd, SOL_IP, IP_TTL, buf, 1);
if (!err || errno != EPERM) {
log_err("Unexpected success from setsockopt(IP_TTL)");
goto err;
}
/* SOL_CUSTOM - handled by BPF */
buf[0] = 0x01;
err = setsockopt(fd, SOL_CUSTOM, 0, buf, 1);
if (err) {
log_err("Failed to call setsockopt");
goto err;
}
buf[0] = 0x00;
optlen = 4;
err = getsockopt(fd, SOL_CUSTOM, 0, buf, &optlen);
if (err) {
log_err("Failed to call getsockopt");
goto err;
}
if (optlen != 1) {
log_err("Unexpected optlen %d != 1", optlen);
goto err;
}
if (buf[0] != 0x01) {
log_err("Unexpected buf[0] 0x%02x != 0x01", buf[0]);
goto err;
}
/* SO_SNDBUF is overwritten */
buf[0] = 0x01;
buf[1] = 0x01;
buf[2] = 0x01;
buf[3] = 0x01;
err = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, buf, 4);
if (err) {
log_err("Failed to call setsockopt(SO_SNDBUF)");
goto err;
}
buf[0] = 0x00;
buf[1] = 0x00;
buf[2] = 0x00;
buf[3] = 0x00;
optlen = 4;
err = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(SO_SNDBUF)");
goto err;
}
if (*(__u32 *)buf != 0x55AA*2) {
log_err("Unexpected getsockopt(SO_SNDBUF) 0x%x != 0x55AA*2",
*(__u32 *)buf);
goto err;
}
close(fd);
return 0;
err:
close(fd);
return -1;
}
static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err) {
log_err("Failed to deduct types for %s BPF program", title);
return -1;
}
prog = bpf_object__find_program_by_title(obj, title);
if (!prog) {
log_err("Failed to find %s BPF program", title);
return -1;
}
err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
attach_type, 0);
if (err) {
log_err("Failed to attach %s BPF program", title);
return -1;
}
return 0;
}
static int run_test(int cgroup_fd)
{
struct bpf_prog_load_attr attr = {
.file = "./sockopt_sk.o",
};
struct bpf_object *obj;
int ignored;
int err;
err = bpf_prog_load_xattr(&attr, &obj, &ignored);
if (err) {
log_err("Failed to load BPF object");
return -1;
}
err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt");
if (err)
goto close_bpf_object;
err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt");
if (err)
goto close_bpf_object;
err = getsetsockopt();
close_bpf_object:
bpf_object__close(obj);
return err;
}
int main(int args, char **argv)
{
int cgroup_fd;
int err = EXIT_SUCCESS;
if (setup_cgroup_environment())
goto cleanup_obj;
cgroup_fd = create_and_get_cgroup(CG_PATH);
if (cgroup_fd < 0)
goto cleanup_cgroup_env;
if (join_cgroup(CG_PATH))
goto cleanup_cgroup;
if (run_test(cgroup_fd))
err = EXIT_FAILURE;
printf("test_sockopt_sk: %s\n",
err == EXIT_SUCCESS ? "PASSED" : "FAILED");
cleanup_cgroup:
close(cgroup_fd);
cleanup_cgroup_env:
cleanup_cgroup_environment();
cleanup_obj:
return err;
}
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment