Merge branch 'netlink-bind'

Richard Guy Briggs says:

====================
audit: implement multicast socket for journald

This is a patch set Eric Paris and I have been working on to add a restricted
capability read-only netlink multicast socket to kernel audit to enable
userspace clients such as systemd/journald to receive audit logs, in addition
to the bidirectional auditd userspace client.

Currently, auditd has the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE capabilities
(but uses CAP_NET_ADMIN).  The CAP_AUDIT_READ capability will be added for use
by read-only AUDIT_NLGRP_READLOG multicast group clients to the kaudit
subsystem.  This will remove the dependence on CAP_NET_ADMIN for the multicast
read-only socket.

Patches 1-3 provide a way for per-protocol bind functions to signal an error
and to be able to clean up after themselves.

The first netfilter cleanup patch has already been accepted by a netfilter
maintainer, though I don't see it upstream yet, so it is included for
completeness.

The second patch adds the per-protocol bind function return code to signal to
the netlink code that no further processing should be done and to undo the work
already done.
V1: This rev fixes a bug introduced by flattening the code in the last
posting.
*V2: This rev moves the per-protocol bind call above the socket exposure
call and refactors out the unbind procedure.

The third provides a way per protocol to undo bind actions on DROP.

Patches 4-6 implement the audit multicast socket with capability checking.

The fourth patch adds the bind function capability check to multicast join
requests for audit.

The fifth patch adds the audit log read multicast group.  An assumption has
been made that systemd/journald reside in the initial network namespace.  This
could be changed to check the actual network namespace of systemd/journald
should this assumption no longer be true since audit now supports all network
namespaces.  This version of the patch now directly sends the broadcast when
the packet is ready rather than waiting until it passes the queue.

The sixth checks if any clients actually exist before sending.

Since the net tree is busier than the audit tree, conflicts are more likely and
the audit patches depend on the net patches, it is proposed to have the net
tree carry this entire patchset for 3.16.  Are the net maintainers ok with
this?

https://bugzilla.redhat.com/show_bug.cgi?id=887992
First posted:
	https://www.redhat.com/archives/linux-audit/2013-January/msg00008.html
	https://lkml.org/lkml/2013/1/27/279

Please find source for a test program at:
	http://people.redhat.com/rbriggs/audit-multicast-listen/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
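
For readers who want to see what a read-only client of the new group might look like, here is a minimal userspace sketch. It is not the test program linked above. It assumes the group number 1 for AUDIT_NLGRP_READLOG (taken from the enum this series adds) and defines it locally under a hypothetical name so it builds against older headers. All `example_*` identifiers are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Local stand-in (hypothetical name) for AUDIT_NLGRP_READLOG from this series. */
#define EXAMPLE_AUDIT_NLGRP_READLOG 1

/* Convert a 1-based netlink group number to its bit in sockaddr_nl.nl_groups.
 * Returns 0 for group numbers that do not fit the 32-bit mask. */
static uint32_t example_group_mask(unsigned int group)
{
    return (group >= 1 && group <= 32) ? (UINT32_C(1) << (group - 1)) : 0;
}

/* Open a NETLINK_AUDIT socket and join the read-log multicast group via
 * bind().  Returns the socket fd on success or a negative errno value. */
static int example_open_audit_listener(void)
{
    struct sockaddr_nl addr;
    uint32_t mask = example_group_mask(EXAMPLE_AUDIT_NLGRP_READLOG);
    int fd;

    assert(mask != 0);

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT);
    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = mask;      /* multicast groups to join */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int err = errno;        /* save errno before close() can change it */
        close(fd);
        return -err;
    }
    return fd;
}
```

With this series applied, a caller lacking CAP_AUDIT_READ should expect `example_open_audit_listener()` to return `-EPERM`, since the per-protocol bind hook rejects the join. On success the caller would `recv()` on the returned descriptor.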

a29b694a · David S. Miller · 4cd3675e · 7f74ecd7 · a29b694a · a29b694a
Commit a29b694a authored Apr 22, 2014 by David S. Miller
8 changed files
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -45,7 +45,8 @@ struct netlink_kernel_cfg {
 	unsigned int	flags;
 	void		(*input)(struct sk_buff *skb);
 	struct mutex	*cb_mutex;
-	void		(*bind)(int group);
+	int		(*bind)(int group);
+	void		(*unbind)(int group);
 	bool		(*compare)(struct net *net, struct sock *sk);
 };

--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -373,6 +373,14 @@ enum {
 */
 #define AUDIT_MESSAGE_TEXT_MAX	8560
+/* Multicast Netlink socket groups (default up to 32) */
+enum audit_nlgrps {
+	AUDIT_NLGRP_NONE,	/* Group 0 not used */
+	AUDIT_NLGRP_READLOG,	/* "best effort" read only socket */
+	__AUDIT_NLGRP_MAX
+};
+#define AUDIT_NLGRP_MAX                (__AUDIT_NLGRP_MAX - 1)
 struct audit_status {
 	__u32		mask;		/* Bit mask for valid entries */
 	__u32		enabled;	/* 1 = enabled, 0 = disabled */

--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -347,7 +347,12 @@ struct vfs_cap_data {
 #define CAP_BLOCK_SUSPEND    36
-#define CAP_LAST_CAP         CAP_BLOCK_SUSPEND
+/* Allow reading the audit log via multicast netlink socket */
+#define CAP_AUDIT_READ		37
+#define CAP_LAST_CAP         CAP_AUDIT_READ
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
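
The new capability is bit 37 in a process's capability sets. As a rough illustration, the sketch below inspects the effective set reported in /proc/self/status. This is only an approximation from userspace: the kernel-side check in this series uses capable(), which this code does not reproduce. The `EXAMPLE_`/`example_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Value taken from the capability.h hunk; hypothetical local name. */
#define EXAMPLE_CAP_AUDIT_READ 37

/* Return 1 if capability number `cap` is set in a 64-bit capability mask. */
static int example_cap_in_mask(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1);
}

/* Read the caller's effective capability mask from /proc/self/status.
 * Returns 0 on success, -1 if the CapEff line could not be read. */
static int example_read_cap_eff(uint64_t *mask)
{
    char line[256];
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long long value;

        /* The line looks like "CapEff:<tab><hex digits>". */
        if (sscanf(line, "CapEff: %llx", &value) == 1) {
            *mask = value;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A caller could combine the two: read the mask with `example_read_cap_eff()`, then pass it to `example_cap_in_mask(mask, EXAMPLE_CAP_AUDIT_READ)` before attempting to join the audit multicast group.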

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -423,6 +423,38 @@ static void kauditd_send_skb(struct sk_buff *skb)
 		consume_skb(skb);
 }
+/*
+ * kauditd_send_multicast_skb - send the skb to multicast userspace listeners
+ *
+ * This function doesn't consume an skb as might be expected since it has to
+ * copy it anyways.
+ */
+static void kauditd_send_multicast_skb(struct sk_buff *skb)
+{
+	struct sk_buff		*copy;
+	struct audit_net	*aunet = net_generic(&init_net, audit_net_id);
+	struct sock		*sock = aunet->nlsk;
+	if (!netlink_has_listeners(sock, AUDIT_NLGRP_READLOG))
+		return;
+	/*
+	 * The seemingly wasteful skb_copy() rather than bumping the refcount
+	 * using skb_get() is necessary because non-standard mods are made to
+	 * the skb by the original kaudit unicast socket send routine.  The
+	 * existing auditd daemon assumes this breakage.  Fixing this would
+	 * require co-ordinating a change in the established protocol between
+	 * the kaudit kernel subsystem and the auditd userspace code.  There is
+	 * no reason for new multicast clients to continue with this
+	 * non-compliance.
+	 */
+	copy = skb_copy(skb, GFP_KERNEL);
+	if (!copy)
+		return;
+	nlmsg_multicast(sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL);
+}
 /*
 * flush_hold_queue - empty the hold queue if auditd appears
 *
@@ -1076,10 +1108,22 @@ static void audit_receive(struct sk_buff  *skb)
 	mutex_unlock(&audit_cmd_mutex);
 }
+/* Run custom bind function on netlink socket group connect or bind requests. */
+static int audit_bind(int group)
+{
+	if (!capable(CAP_AUDIT_READ))
+		return -EPERM;
+	return 0;
+}
 static int __net_init audit_net_init(struct net *net)
 {
 	struct netlink_kernel_cfg cfg = {
 		.input	= audit_receive,
+		.bind	= audit_bind,
+		.flags	= NL_CFG_F_NONROOT_RECV,
+		.groups	= AUDIT_NLGRP_MAX,
 	};
 	struct audit_net *aunet = net_generic(net, audit_net_id);
@@ -1901,10 +1945,10 @@ void audit_log_link_denied(const char *operation, struct path *link)
 * audit_log_end - end one audit record
 * @ab: the audit_buffer
 *
- * The netlink_* functions cannot be called inside an irq context, so
- * the audit buffer is placed on a queue and a tasklet is scheduled to
- * remove them from the queue outside the irq context.  May be called in
- * any context.
+ * netlink_unicast() cannot be called inside an irq context because it blocks
+ * (last arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed
+ * on a queue and a tasklet is scheduled to remove them from the queue outside
+ * the irq context.  May be called in any context.
 */
 void audit_log_end(struct audit_buffer *ab)
 {
@@ -1914,6 +1958,18 @@ void audit_log_end(struct audit_buffer *ab)
 		audit_log_lost("rate limit exceeded");
 	} else {
 		struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
+		kauditd_send_multicast_skb(ab->skb);
+		/*
+		 * The original kaudit unicast socket sends up messages with
+		 * nlmsg_len set to the payload length rather than the entire
+		 * message length.  This breaks the standard set by netlink.
+		 * The existing auditd daemon assumes this breakage.  Fixing
+		 * this would require co-ordinating a change in the established
+		 * protocol between the kaudit kernel subsystem and the auditd
+		 * userspace code.
+		 */
 		nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;
 		if (audit_pid) {
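
The comments in the kernel/audit.c hunks describe two conventions for nlmsg_len: the netlink standard, where the field covers header plus payload, and the legacy auditd unicast convention, where it holds the payload length only. The sketch below shows how a receiver might compute the payload size under each. It is an illustration only, and the `example_*` function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/netlink.h>

/* Payload length when nlmsg_len follows the netlink standard,
 * i.e. it counts the header as well as the payload. */
static size_t example_payload_len_standard(const struct nlmsghdr *nlh)
{
    assert(nlh->nlmsg_len >= (uint32_t)NLMSG_HDRLEN);
    return (size_t)nlh->nlmsg_len - (size_t)NLMSG_HDRLEN;
}

/* Payload length under the legacy auditd unicast convention described in
 * the hunk, where nlmsg_len already excludes the header. */
static size_t example_payload_len_legacy_unicast(const struct nlmsghdr *nlh)
{
    return (size_t)nlh->nlmsg_len;
}
```

Applying the wrong helper to a message would misjudge the payload size by NLMSG_HDRLEN bytes, which is why the hunk's comments call out the difference.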

--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -400,19 +400,17 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 }
 #ifdef CONFIG_MODULES
-static void nfnetlink_bind(int group)
+static int nfnetlink_bind(int group)
 {
 	const struct nfnetlink_subsystem *ss;
 	int type = nfnl_group2type[group];
 	rcu_read_lock();
 	ss = nfnetlink_get_subsys(type);
-	if (!ss) {
 	rcu_read_unlock();
+	if (!ss)
 		request_module("nfnetlink-subsys-%d", type);
-		return;
+	return 0;
-	}
-	rcu_read_unlock();
 }
 #endif

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1206,7 +1206,8 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	struct module *module = NULL;
 	struct mutex *cb_mutex;
 	struct netlink_sock *nlk;
-	void (*bind)(int group);
+	int (*bind)(int group);
+	void (*unbind)(int group);
 	int err = 0;
 	sock->state = SS_UNCONNECTED;
@@ -1232,6 +1233,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 		err = -EPROTONOSUPPORT;
 	cb_mutex = nl_table[protocol].cb_mutex;
 	bind = nl_table[protocol].bind;
+	unbind = nl_table[protocol].unbind;
 	netlink_unlock_table();
 	if (err < 0)
@@ -1248,6 +1250,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	nlk = nlk_sk(sock->sk);
 	nlk->module = module;
 	nlk->netlink_bind = bind;
+	nlk->netlink_unbind = unbind;
 out:
 	return err;
@@ -1301,6 +1304,7 @@ static int netlink_release(struct socket *sock)
 			kfree_rcu(old, rcu);
 			nl_table[sk->sk_protocol].module = NULL;
 			nl_table[sk->sk_protocol].bind = NULL;
+			nl_table[sk->sk_protocol].unbind = NULL;
 			nl_table[sk->sk_protocol].flags = 0;
 			nl_table[sk->sk_protocol].registered = 0;
 		}
@@ -1411,6 +1415,19 @@ static int netlink_realloc_groups(struct sock *sk)
 	return err;
 }
+static void netlink_unbind(int group, long unsigned int groups,
+			   struct netlink_sock *nlk)
+{
+	int undo;
+	if (!nlk->netlink_unbind)
+		return;
+	for (undo = 0; undo < group; undo++)
+		if (test_bit(group, &groups))
+			nlk->netlink_unbind(undo);
+}
 static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 			int addr_len)
 {
@@ -1419,6 +1436,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
 	int err;
+	long unsigned int groups = nladdr->nl_groups;
 	if (addr_len < sizeof(struct sockaddr_nl))
 		return -EINVAL;
@@ -1427,7 +1445,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 		return -EINVAL;
 	/* Only superuser is allowed to listen multicasts */
-	if (nladdr->nl_groups) {
+	if (groups) {
 		if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
 			return -EPERM;
 		err = netlink_realloc_groups(sk);
@@ -1435,37 +1453,45 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 			return err;
 	}
-	if (nlk->portid) {
+	if (nlk->portid)
 		if (nladdr->nl_pid != nlk->portid)
 			return -EINVAL;
-	} else {
+	if (nlk->netlink_bind && groups) {
+		int group;
+		for (group = 0; group < nlk->ngroups; group++) {
+			if (!test_bit(group, &groups))
+				continue;
+			err = nlk->netlink_bind(group);
+			if (!err)
+				continue;
+			netlink_unbind(group, groups, nlk);
+			return err;
+		}
+	}
+	if (!nlk->portid) {
 		err = nladdr->nl_pid ?
 			netlink_insert(sk, net, nladdr->nl_pid) :
 			netlink_autobind(sock);
-		if (err)
+		if (err) {
+			netlink_unbind(nlk->ngroups - 1, groups, nlk);
 			return err;
 		}
+	}
-	if (!nladdr->nl_groups && (nlk->groups == NULL || !(u32)nlk->groups[0]))
+	if (!groups && (nlk->groups == NULL || !(u32)nlk->groups[0]))
 		return 0;
 	netlink_table_grab();
 	netlink_update_subscriptions(sk, nlk->subscriptions +
-					 hweight32(nladdr->nl_groups) -
+					 hweight32(groups) -
 					 hweight32(nlk->groups[0]));
-	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups;
+	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | groups;
 	netlink_update_listeners(sk);
 	netlink_table_ungrab();
-	if (nlk->netlink_bind && nlk->groups[0]) {
-		int i;
-		for (i = 0; i < nlk->ngroups; i++) {
-			if (test_bit(i, nlk->groups))
-				nlk->netlink_bind(i);
-		}
-	}
 	return 0;
 }
@@ -2103,13 +2129,17 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 			return err;
 		if (!val || val - 1 >= nlk->ngroups)
 			return -EINVAL;
+		if (optname == NETLINK_ADD_MEMBERSHIP && nlk->netlink_bind) {
+			err = nlk->netlink_bind(val);
+			if (err)
+				return err;
+		}
 		netlink_table_grab();
 		netlink_update_socket_mc(nlk, val,
 					 optname == NETLINK_ADD_MEMBERSHIP);
 		netlink_table_ungrab();
-		if (nlk->netlink_bind)
-			nlk->netlink_bind(val);
+		if (optname == NETLINK_DROP_MEMBERSHIP && nlk->netlink_unbind)
+			nlk->netlink_unbind(val);
 		err = 0;
 		break;
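
The bind-then-undo flow that the af_netlink.c changes aim for can be modelled in plain userspace C. The following is a simplified sketch, not the kernel code: it walks a group bitmask, calls a per-protocol bind hook for each requested group, and on the first failure calls the unbind hook for every requested group below the failing one, testing the bit for each index being undone. All `example_*` names and the demo protocol that rejects group 3 are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define EXAMPLE_NGROUPS 32

/* Per-protocol hooks, modelled on the bind/unbind pair in this series. */
struct example_proto {
    int  (*bind)(int group);    /* returns 0 or a negative errno */
    void (*unbind)(int group);  /* optional, may be NULL */
};

/* Undo bind() for every requested group with an index below `upto`. */
static void example_undo(const struct example_proto *p,
                         unsigned long groups, int upto)
{
    int undo;

    if (!p->unbind)
        return;
    for (undo = 0; undo < upto; undo++)
        if (groups & (1UL << undo))
            p->unbind(undo);
}

/* Bind all requested groups, rolling back on the first error. */
static int example_bind_groups(const struct example_proto *p,
                               unsigned long groups)
{
    int group;

    assert(p && p->bind);
    for (group = 0; group < EXAMPLE_NGROUPS; group++) {
        int err;

        if (!(groups & (1UL << group)))
            continue;
        err = p->bind(group);
        if (err) {
            example_undo(p, groups, group);
            return err;
        }
    }
    return 0;
}

/* Demo protocol: tracks bound groups and refuses group 3. */
static unsigned long example_demo_bound;

static int example_demo_bind(int group)
{
    if (group == 3)
        return -EPERM;
    example_demo_bound |= 1UL << group;
    return 0;
}

static void example_demo_unbind(int group)
{
    example_demo_bound &= ~(1UL << group);
}
```

With the demo protocol, requesting groups 0 and 1 succeeds and leaves both bound, while requesting groups 0, 1 and 3 returns -EPERM and leaves nothing bound.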

--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -38,7 +38,8 @@ struct netlink_sock {
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
 	void			(*netlink_rcv)(struct sk_buff *skb);
-	void			(*netlink_bind)(int group);
+	int			(*netlink_bind)(int group);
+	void			(*netlink_unbind)(int group);
 	struct module		*module;
 #ifdef CONFIG_NETLINK_MMAP
 	struct mutex		pg_vec_lock;
@@ -74,7 +75,8 @@ struct netlink_table {
 	unsigned int		groups;
 	struct mutex		*cb_mutex;
 	struct module		*module;
-	void			(*bind)(int group);
+	int			(*bind)(int group);
+	void			(*unbind)(int group);
 	bool			(*compare)(struct net *net, struct sock *sock);
 	int			registered;
 };

--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -147,7 +147,7 @@ struct security_class_mapping secclass_map[] = {
 	{ "peer", { "recv", NULL } },
 	{ "capability2",
 	  { "mac_override", "mac_admin", "syslog", "wake_alarm", "block_suspend",
-	    NULL } },
+	    "audit_read", NULL } },
 	{ "kernel_service", { "use_as_override", "create_files_as", NULL } },
 	{ "tun_socket",
 	  { COMMON_SOCK_PERMS, "attach_queue", NULL } },