Merge branch 'mctp'

Jeremy Kerr says: ==================== Add Management Component Transport Protocol support This series adds core MCTP support to the kernel. From the Kconfig description: Management Component Transport Protocol (MCTP) is an in-system protocol for communicating between management controllers and their managed devices (peripherals, host processors, etc.). The protocol is defined by DMTF specification DSP0236. This option enables core MCTP support. For communicating with other devices, you'll want to enable a driver for a specific hardware channel. This implementation allows a sockets-based API for sending and receiving MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack control is all via netlink, using existing RTM_* messages. The userspace ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHDR_ constants, a new sockaddr, and a new netlink attribute. For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy to alter this based on preferences here - an alternative would be the OpenBMC list (the main user of the MCTP interface), or we can create a new list entirely. We have a couple of interface drivers almost ready to go at the moment, but those can wait until the core code has some review. This is v4 of the series; v1 and v2 were both RFC. selinux folks: CCing 01/15 due to the new PF_MCTP protocol family. linux-doc folks: CCing 15/15 for the new MCTP overview document. Review, comments, questions etc. are most welcome. Cheers, Jeremy v2: - change to match spec terminology: controller -> component - require specific capabilities for bind() & sendmsg() - add address and tag defintions to uapi - add selinux AF_MCTP table definitions - remove strict cflags; warnings are present in common headers v3: - require caps for MCTP bind() & send() - comment typo fixes - switch to an array for local EIDs - fix addrinfo dump iteration & error path - add RTM_DELADDR - remove GENMASK() and BIT() from uapi v4: - drop tun patch; that can be submitted separately - keep nipa happy: add maintainer CCs, including doc and selinux - net-next rebase - Include AF_MCTP in af_family_slock_keys and pf_family_names - Introduce MODULE_ definitions earlier - upstream change: set_link_af no longer called with RTNL held - add kdoc for net_device.mctp_ptr - don't inline mctp_rt_match_eid - require rtm_type == RTN_UNICAST in route management handlers - remove unused RTAX policy table - fix mctp_sock->keys rcu annotations - fix spurious rcu_read_unlock in route input ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mctp'
Jeremy Kerr says: ==================== Add Management Component Transport Protocol support This series adds core MCTP support to the kernel. From the Kconfig description: Management Component Transport Protocol (MCTP) is an in-system protocol for communicating between management controllers and their managed devices (peripherals, host processors, etc.). The protocol is defined by DMTF specification DSP0236. This option enables core MCTP support. For communicating with other devices, you'll want to enable a driver for a specific hardware channel. This implementation allows a sockets-based API for sending and receiving MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack control is all via netlink, using existing RTM_* messages. The userspace ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHDR_ constants, a new sockaddr, and a new netlink attribute. For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy to alter this based on preferences here - an alternative would be the OpenBMC list (the main user of the MCTP interface), or we can create a new list entirely. We have a couple of interface drivers almost ready to go at the moment, but those can wait until the core code has some review. This is v4 of the series; v1 and v2 were both RFC. selinux folks: CCing 01/15 due to the new PF_MCTP protocol family. linux-doc folks: CCing 15/15 for the new MCTP overview document. Review, comments, questions etc. are most welcome. Cheers, Jeremy v2: - change to match spec terminology: controller -> component - require specific capabilities for bind() & sendmsg() - add address and tag defintions to uapi - add selinux AF_MCTP table definitions - remove strict cflags; warnings are present in common headers v3: - require caps for MCTP bind() & send() - comment typo fixes - switch to an array for local EIDs - fix addrinfo dump iteration & error path - add RTM_DELADDR - remove GENMASK() and BIT() from uapi v4: - drop tun patch; that can be submitted separately - keep nipa happy: add maintainer CCs, including doc and selinux - net-next rebase - Include AF_MCTP in af_family_slock_keys and pf_family_names - Introduce MODULE_ definitions earlier - upstream change: set_link_af no longer called with RTNL held - add kdoc for net_device.mctp_ptr - don't inline mctp_rt_match_eid - require rtm_type == RTN_UNICAST in route management handlers - remove unused RTAX policy table - fix mctp_sock->keys rcu annotations - fix spurious rcu_read_unlock in route input ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
e5fe3a5f · David S. Miller · 658e6b16 · 6a2d98b1 · e5fe3a5f · e5fe3a5f
Commit e5fe3a5f authored Jul 29, 2021 by David S. Miller
29 changed files
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -69,6 +69,7 @@ Contents:
   l2tp
   lapb-module
   mac80211-injection
+   mctp
   mpls-sysctl
   mptcp-sysctl
   multiqueue

--- a/Documentation/networking/mctp.rst
+++ b/Documentation/networking/mctp.rst
+.. SPDX-License-Identifier: GPL-2.0
+==============================================
+Management Component Transport Protocol (MCTP)
+==============================================
+net/mctp/ contains protocol support for MCTP, as defined by DMTF standard
+DSP0236. Physical interface drivers ("bindings" in the specification) are
+provided in drivers/net/mctp/.
+The core code provides a socket-based interface to send and receive MCTP
+messages, through an AF_MCTP, SOCK_DGRAM socket.
+Structure: interfaces & networks
+================================
+The kernel models the local MCTP topology through two items: interfaces and
+networks.
+An interface (or "link") is an instance of an MCTP physical transport binding
+(as defined by DSP0236, section 3.2.47), likely connected to a specific hardware
+device. This is represented as a ``struct netdevice``.
+A network defines a unique address space for MCTP endpoints by endpoint-ID
+(described by DSP0236, section 3.2.31). A network has a user-visible identifier
+to allow references from userspace. Route definitions are specific to one
+network.
+Interfaces are associated with one network. A network may be associated with one
+or more interfaces.
+If multiple networks are present, each may contain endpoint IDs (EIDs) that are
+also present on other networks.
+Sockets API
+===========
+Protocol definitions
+--------------------
+MCTP uses ``AF_MCTP`` / ``PF_MCTP`` for the address- and protocol- families.
+Since MCTP is message-based, only ``SOCK_DGRAM`` sockets are supported.
+.. code-block:: C
+    int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
+The only (current) value for the ``protocol`` argument is 0.
+As with all socket address families, source and destination addresses are
+specified with a ``sockaddr`` type, with a single-byte endpoint address:
+.. code-block:: C
+    typedef __u8		mctp_eid_t;
+    struct mctp_addr {
+            mctp_eid_t		s_addr;
+    };
+    struct sockaddr_mctp {
+            unsigned short int	smctp_family;
+            int			smctp_network;
+            struct mctp_addr	smctp_addr;
+            __u8		smctp_type;
+            __u8		smctp_tag;
+    };
+    #define MCTP_NET_ANY	0x0
+    #define MCTP_ADDR_ANY	0xff
+Syscall behaviour
+-----------------
+The following sections describe the MCTP-specific behaviours of the standard
+socket system calls. These behaviours have been chosen to map closely to the
+existing sockets APIs.
+``bind()`` : set local socket address
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Sockets that receive incoming request packets will bind to a local address,
+using the ``bind()`` syscall.
+.. code-block:: C
+    struct sockaddr_mctp addr;
+    addr.smctp_family = AF_MCTP;
+    addr.smctp_network = MCTP_NET_ANY;
+    addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
+    addr.smctp_type = MCTP_TYPE_PLDM;
+    addr.smctp_tag = MCTP_TAG_OWNER;
+    int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
+This establishes the local address of the socket. Incoming MCTP messages that
+match the network, address, and message type will be received by this socket.
+The reference to 'incoming' is important here; a bound socket will only receive
+messages with the TO bit set, to indicate an incoming request message, rather
+than a response.
+The ``smctp_tag`` value will configure the tags accepted from the remote side of
+this socket. Given the above, the only valid value is ``MCTP_TAG_OWNER``, which
+will result in remotely "owned" tags being routed to this socket. Since
+``MCTP_TAG_OWNER`` is set, the 3 least-significant bits of ``smctp_tag`` are not
+used; callers must set them to zero.
+A ``smctp_network`` value of ``MCTP_NET_ANY`` will configure the socket to
+receive incoming packets from any locally-connected network. A specific network
+value will cause the socket to only receive incoming messages from that network.
+The ``smctp_addr`` field specifies a local address to bind to. A value of
+``MCTP_ADDR_ANY`` configures the socket to receive messages addressed to any
+local destination EID.
+The ``smctp_type`` field specifies which message types to receive. Only the
+lower 7 bits of the type is matched on incoming messages (ie., the
+most-significant IC bit is not part of the match). This results in the socket
+receiving packets with and without a message integrity check footer.
+``sendto()``, ``sendmsg()``, ``send()`` : transmit an MCTP message
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+An MCTP message is transmitted using one of the ``sendto()``, ``sendmsg()`` or
+``send()`` syscalls. Using ``sendto()`` as the primary example:
+.. code-block:: C
+    struct sockaddr_mctp addr;
+    char buf[14];
+    ssize_t len;
+    /* set message destination */
+    addr.smctp_family = AF_MCTP;
+    addr.smctp_network = 0;
+    addr.smctp_addr.s_addr = 8;
+    addr.smctp_tag = MCTP_TAG_OWNER;
+    addr.smctp_type = MCTP_TYPE_ECHO;
+    /* arbitrary message to send, with message-type header */
+    buf[0] = MCTP_TYPE_ECHO;
+    memcpy(buf + 1, "hello, world!", sizeof(buf) - 1);
+    len = sendto(sd, buf, sizeof(buf), 0,
+                    (struct sockaddr_mctp *)&addr, sizeof(addr));
+The network and address fields of ``addr`` define the remote address to send to.
+If ``smctp_tag`` has the ``MCTP_TAG_OWNER``, the kernel will ignore any bits set
+in ``MCTP_TAG_VALUE``, and generate a tag value suitable for the destination
+EID. If ``MCTP_TAG_OWNER`` is not set, the message will be sent with the tag
+value as specified. If a tag value cannot be allocated, the system call will
+report an errno of ``EAGAIN``.
+The application must provide the message type byte as the first byte of the
+message buffer passed to ``sendto()``. If a message integrity check is to be
+included in the transmitted message, it must also be provided in the message
+buffer, and the most-significant bit of the message type byte must be 1.
+The ``sendmsg()`` system call allows a more compact argument interface, and the
+message buffer to be specified as a scatter-gather list. At present no ancillary
+message types (used for the ``msg_control`` data passed to ``sendmsg()``) are
+defined.
+Transmitting a message on an unconnected socket with ``MCTP_TAG_OWNER``
+specified will cause an allocation of a tag, if no valid tag is already
+allocated for that destination. The (destination-eid,tag) tuple acts as an
+implicit local socket address, to allow the socket to receive responses to this
+outgoing message. If any previous allocation has been performed (to for a
+different remote EID), that allocation is lost.
+Sockets will only receive responses to requests they have sent (with TO=1) and
+may only respond (with TO=0) to requests they have received.
+``recvfrom()``, ``recvmsg()``, ``recv()`` : receive an MCTP message
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+An MCTP message can be received by an application using one of the
+``recvfrom()``, ``recvmsg()``, or ``recv()`` system calls. Using ``recvfrom()``
+as the primary example:
+.. code-block:: C
+    struct sockaddr_mctp addr;
+    socklen_t addrlen;
+    char buf[14];
+    ssize_t len;
+    addrlen = sizeof(addr);
+    len = recvfrom(sd, buf, sizeof(buf), 0,
+                    (struct sockaddr_mctp *)&addr, &addrlen);
+    /* We can expect addr to describe an MCTP address */
+    assert(addrlen >= sizeof(buf));
+    assert(addr.smctp_family == AF_MCTP);
+    printf("received %zd bytes from remote EID %d\n", rc, addr.smctp_addr);
+The address argument to ``recvfrom`` and ``recvmsg`` is populated with the
+remote address of the incoming message, including tag value (this will be needed
+in order to reply to the message).
+The first byte of the message buffer will contain the message type byte. If an
+integrity check follows the message, it will be included in the received buffer.
+The ``recv()`` system call behaves in a similar way, but does not provide a
+remote address to the application. Therefore, these are only useful if the
+remote address is already known, or the message does not require a reply.
+Like the send calls, sockets will only receive responses to requests they have
+sent (TO=1) and may only respond (TO=0) to requests they have received.
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11032,6 +11032,18 @@ F:	drivers/mailbox/arm_mhuv2.c
 F:	include/linux/mailbox/arm_mhuv2_message.h
 F:	Documentation/devicetree/bindings/mailbox/arm,mhuv2.yaml
+MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP)
+M:	Jeremy Kerr <jk@codeconstruct.com.au>
+M:	Matt Johnston <matt@codeconstruct.com.au>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	Documentation/networking/mctp.rst
+F:	drivers/net/mctp/
+F:	include/net/mctp.h
+F:	include/net/mctpdevice.h
+F:	include/net/netns/mctp.h
+F:	net/mctp/
 MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
 M:	Michael Kerrisk <mtk.manpages@gmail.com>
 L:	linux-man@vger.kernel.org

--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -483,6 +483,8 @@ config NET_SB1000
 source "drivers/net/phy/Kconfig"
+source "drivers/net/mctp/Kconfig"
 source "drivers/net/mdio/Kconfig"
 source "drivers/net/pcs/Kconfig"

--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_WAN) += wan/
 obj-$(CONFIG_WLAN) += wireless/
 obj-$(CONFIG_IEEE802154) += ieee802154/
 obj-$(CONFIG_WWAN) += wwan/
+obj-$(CONFIG_MCTP) += mctp/
 obj-$(CONFIG_VMXNET3) += vmxnet3/
 obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o

--- a/drivers/net/mctp/Kconfig
+++ b/drivers/net/mctp/Kconfig
+if MCTP
+menu "MCTP Device Drivers"
+endmenu
+endif
--- a/drivers/net/mctp/Makefile
+++ b/drivers/net/mctp/Makefile
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1823,6 +1823,7 @@ enum netdev_ml_priv_type {
 *	@ieee802154_ptr: IEEE 802.15.4 low-rate Wireless Personal Area Network
 *			 device struct
 *	@mpls_ptr:	mpls_dev struct pointer
+ *	@mctp_ptr:	MCTP specific data
 *
 *	@dev_addr:	Hw address (before bcast,
 *			because most packets are unicast)
@@ -2110,6 +2111,9 @@ struct net_device {
 #if IS_ENABLED(CONFIG_MPLS_ROUTING)
 	struct mpls_dev __rcu	*mpls_ptr;
 #endif
+#if IS_ENABLED(CONFIG_MCTP)
+	struct mctp_dev __rcu	*mctp_ptr;
+#endif
 /*
 * Cache lines mostly used on receive path (including eth_type_trans())

--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -223,8 +223,11 @@ struct ucred {
 				 * reuses AF_INET address family
 				 */
 #define AF_XDP		44	/* XDP sockets			*/
+#define AF_MCTP		45	/* Management component
+				 * transport protocol
+				 */
-#define AF_MAX		45	/* For now.. */
+#define AF_MAX		46	/* For now.. */
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -274,6 +277,7 @@ struct ucred {
 #define PF_QIPCRTR	AF_QIPCRTR
 #define PF_SMC		AF_SMC
 #define PF_XDP		AF_XDP
+#define PF_MCTP		AF_MCTP
 #define PF_MAX		AF_MAX
 /* Maximum queue length specifiable by listen.  */

--- a/include/net/mctp.h
+++ b/include/net/mctp.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Management Component Transport Protocol (MCTP)
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#ifndef __NET_MCTP_H
+#define __NET_MCTP_H
+#include <linux/bits.h>
+#include <linux/mctp.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+/* MCTP packet definitions */
+struct mctp_hdr {
+	u8	ver;
+	u8	dest;
+	u8	src;
+	u8	flags_seq_tag;
+};
+#define MCTP_VER_MIN	1
+#define MCTP_VER_MAX	1
+/* Definitions for flags_seq_tag field */
+#define MCTP_HDR_FLAG_SOM	BIT(7)
+#define MCTP_HDR_FLAG_EOM	BIT(6)
+#define MCTP_HDR_FLAG_TO	BIT(3)
+#define MCTP_HDR_FLAGS		GENMASK(5, 3)
+#define MCTP_HDR_SEQ_SHIFT	4
+#define MCTP_HDR_SEQ_MASK	GENMASK(1, 0)
+#define MCTP_HDR_TAG_SHIFT	0
+#define MCTP_HDR_TAG_MASK	GENMASK(2, 0)
+#define MCTP_HEADER_MAXLEN	4
+#define MCTP_INITIAL_DEFAULT_NET	1
+static inline bool mctp_address_ok(mctp_eid_t eid)
+{
+	return eid >= 8 && eid < 255;
+}
+static inline struct mctp_hdr *mctp_hdr(struct sk_buff *skb)
+{
+	return (struct mctp_hdr *)skb_network_header(skb);
+}
+/* socket implementation */
+struct mctp_sock {
+	struct sock	sk;
+	/* bind() params */
+	int		bind_net;
+	mctp_eid_t	bind_addr;
+	__u8		bind_type;
+	/* list of mctp_sk_key, for incoming tag lookup. updates protected
+	 * by sk->net->keys_lock
+	 */
+	struct hlist_head keys;
+};
+/* Key for matching incoming packets to sockets or reassembly contexts.
+ * Packets are matched on (src,dest,tag).
+ *
+ * Lifetime requirements:
+ *
+ *  - keys are free()ed via RCU
+ *
+ *  - a mctp_sk_key contains a reference to a struct sock; this is valid
+ *    for the life of the key. On sock destruction (through unhash), the key is
+ *    removed from lists (see below), and will not be observable after a RCU
+ *    grace period.
+ *
+ *    any RX occurring within that grace period may still queue to the socket,
+ *    but will hit the SOCK_DEAD case before the socket is freed.
+ *
+ * - these mctp_sk_keys appear on two lists:
+ *     1) the struct mctp_sock->keys list
+ *     2) the struct netns_mctp->keys list
+ *
+ *        updates to either list are performed under the netns_mctp->keys
+ *        lock.
+ *
+ * - a key may have a sk_buff attached as part of an in-progress message
+ *   reassembly (->reasm_head). The reassembly context is protected by
+ *   reasm_lock, which may be acquired with the keys lock (above) held, if
+ *   necessary. Consequently, keys lock *cannot* be acquired with the
+ *   reasm_lock held.
+ *
+ * - there are two destruction paths for a mctp_sk_key:
+ *
+ *    - through socket unhash (see mctp_sk_unhash). This performs the list
+ *      removal under keys_lock.
+ *
+ *    - where a key is established to receive a reply message: after receiving
+ *      the (complete) reply, or during reassembly errors. Here, we clean up
+ *      the reassembly context (marking reasm_dead, to prevent another from
+ *      starting), and remove the socket from the netns & socket lists.
+ */
+struct mctp_sk_key {
+	mctp_eid_t	peer_addr;
+	mctp_eid_t	local_addr;
+	__u8		tag; /* incoming tag match; invert TO for local */
+	/* we hold a ref to sk when set */
+	struct sock	*sk;
+	/* routing lookup list */
+	struct hlist_node hlist;
+	/* per-socket list */
+	struct hlist_node sklist;
+	/* incoming fragment reassembly context */
+	spinlock_t	reasm_lock;
+	struct sk_buff	*reasm_head;
+	struct sk_buff	**reasm_tailp;
+	bool		reasm_dead;
+	u8		last_seq;
+	struct rcu_head	rcu;
+};
+struct mctp_skb_cb {
+	unsigned int	magic;
+	unsigned int	net;
+	mctp_eid_t	src;
+};
+/* skb control-block accessors with a little extra debugging for initial
+ * development.
+ *
+ * TODO: remove checks & mctp_skb_cb->magic; replace callers of __mctp_cb
+ * with mctp_cb().
+ *
+ * __mctp_cb() is only for the initial ingress code; we should see ->magic set
+ * at all times after this.
+ */
+static inline struct mctp_skb_cb *__mctp_cb(struct sk_buff *skb)
+{
+	struct mctp_skb_cb *cb = (void *)skb->cb;
+	cb->magic = 0x4d435450;
+	return cb;
+}
+static inline struct mctp_skb_cb *mctp_cb(struct sk_buff *skb)
+{
+	struct mctp_skb_cb *cb = (void *)skb->cb;
+	WARN_ON(cb->magic != 0x4d435450);
+	return (void *)(skb->cb);
+}
+/* Route definition.
+ *
+ * These are held in the pernet->mctp.routes list, with RCU protection for
+ * removed routes. We hold a reference to the netdev; routes need to be
+ * dropped on NETDEV_UNREGISTER events.
+ *
+ * Updates to the route table are performed under rtnl; all reads under RCU,
+ * so routes cannot be referenced over a RCU grace period. Specifically: A
+ * caller cannot block between mctp_route_lookup and passing the route to
+ * mctp_do_route.
+ */
+struct mctp_route {
+	mctp_eid_t		min, max;
+	struct mctp_dev		*dev;
+	unsigned int		mtu;
+	int			(*output)(struct mctp_route *route,
+					  struct sk_buff *skb);
+	struct list_head	list;
+	refcount_t		refs;
+	struct rcu_head		rcu;
+};
+/* route interfaces */
+struct mctp_route *mctp_route_lookup(struct net *net, unsigned int dnet,
+				     mctp_eid_t daddr);
+int mctp_do_route(struct mctp_route *rt, struct sk_buff *skb);
+int mctp_local_output(struct sock *sk, struct mctp_route *rt,
+		      struct sk_buff *skb, mctp_eid_t daddr, u8 req_tag);
+/* routing <--> device interface */
+unsigned int mctp_default_net(struct net *net);
+int mctp_default_net_set(struct net *net, unsigned int index);
+int mctp_route_add_local(struct mctp_dev *mdev, mctp_eid_t addr);
+int mctp_route_remove_local(struct mctp_dev *mdev, mctp_eid_t addr);
+void mctp_route_remove_dev(struct mctp_dev *mdev);
+/* neighbour definitions */
+enum mctp_neigh_source {
+	MCTP_NEIGH_STATIC,
+	MCTP_NEIGH_DISCOVER,
+};
+struct mctp_neigh {
+	struct mctp_dev		*dev;
+	mctp_eid_t		eid;
+	enum mctp_neigh_source	source;
+	unsigned char		ha[MAX_ADDR_LEN];
+	struct list_head	list;
+	struct rcu_head		rcu;
+};
+int mctp_neigh_init(void);
+void mctp_neigh_exit(void);
+// ret_hwaddr may be NULL, otherwise must have space for MAX_ADDR_LEN
+int mctp_neigh_lookup(struct mctp_dev *dev, mctp_eid_t eid,
+		      void *ret_hwaddr);
+void mctp_neigh_remove_dev(struct mctp_dev *mdev);
+int mctp_routes_init(void);
+void mctp_routes_exit(void);
+void mctp_device_init(void);
+void mctp_device_exit(void);
+#endif /* __NET_MCTP_H */
--- a/include/net/mctpdevice.h
+++ b/include/net/mctpdevice.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Management Component Transport Protocol (MCTP) - device
+ * definitions.
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#ifndef __NET_MCTPDEVICE_H
+#define __NET_MCTPDEVICE_H
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/refcount.h>
+struct mctp_dev {
+	struct net_device	*dev;
+	unsigned int		net;
+	/* Only modified under RTNL. Reads have addrs_lock held */
+	u8			*addrs;
+	size_t			num_addrs;
+	spinlock_t		addrs_lock;
+	struct rcu_head		rcu;
+};
+#define MCTP_INITIAL_DEFAULT_NET	1
+struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev);
+struct mctp_dev *__mctp_dev_get(const struct net_device *dev);
+struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev);
+#endif /* __NET_MCTPDEVICE_H */
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -34,6 +34,7 @@
 #include <net/netns/xdp.h>
 #include <net/netns/smc.h>
 #include <net/netns/bpf.h>
+#include <net/netns/mctp.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
@@ -167,6 +168,9 @@ struct net {
 #ifdef CONFIG_XDP_SOCKETS
 	struct netns_xdp	xdp;
 #endif
+#if IS_ENABLED(CONFIG_MCTP)
+	struct netns_mctp	mctp;
+#endif
 #if IS_ENABLED(CONFIG_CRYPTO_USER)
 	struct sock		*crypto_nlsk;
 #endif

--- a/include/net/netns/mctp.h
+++ b/include/net/netns/mctp.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * MCTP per-net structures
+ */
+#ifndef __NETNS_MCTP_H__
+#define __NETNS_MCTP_H__
+#include <linux/types.h>
+struct netns_mctp {
+	/* Only updated under RTNL, entries freed via RCU */
+	struct list_head routes;
+	/* Bound sockets: list of sockets bound by type.
+	 * This list is updated from non-atomic contexts (under bind_lock),
+	 * and read (under rcu) in packet rx
+	 */
+	struct mutex bind_lock;
+	struct hlist_head binds;
+	/* tag allocations. This list is read and updated from atomic contexts,
+	 * but elements are free()ed after a RCU grace-period
+	 */
+	spinlock_t keys_lock;
+	struct hlist_head keys;
+	/* MCTP network */
+	unsigned int default_net;
+	/* neighbour table */
+	struct mutex neigh_lock;
+	struct list_head neighbours;
+};
+#endif /* __NETNS_MCTP_H__ */
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -54,6 +54,7 @@
 #define ARPHRD_X25	271		/* CCITT X.25			*/
 #define ARPHRD_HWX25	272		/* Boards with X.25 in firmware	*/
 #define ARPHRD_CAN	280		/* Controller Area Network      */
+#define ARPHRD_MCTP	290
 #define ARPHRD_PPP	512
 #define ARPHRD_CISCO	513		/* Cisco HDLC	 		*/
 #define ARPHRD_HDLC	ARPHRD_CISCO

--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -151,6 +151,9 @@
 #define ETH_P_MAP	0x00F9		/* Qualcomm multiplexing and
 					 * aggregation protocol
 					 */
+#define ETH_P_MCTP	0x00FA		/* Management component transport
+					 * protocol packets
+					 */
 /*
 *	This is an Ethernet frame header.

--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1260,4 +1260,14 @@ struct ifla_rmnet_flags {
 	__u32	mask;
 };
+/* MCTP section */
+enum {
+	IFLA_MCTP_UNSPEC,
+	IFLA_MCTP_NET,
+	__IFLA_MCTP_MAX,
+};
+#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
 #endif /* _UAPI_LINUX_IF_LINK_H */
--- a/include/uapi/linux/mctp.h
+++ b/include/uapi/linux/mctp.h
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Management Component Transport Protocol (MCTP)
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#ifndef __UAPI_MCTP_H
+#define __UAPI_MCTP_H
+#include <linux/types.h>
+typedef __u8			mctp_eid_t;
+struct mctp_addr {
+	mctp_eid_t		s_addr;
+};
+struct sockaddr_mctp {
+	unsigned short int	smctp_family;
+	int			smctp_network;
+	struct mctp_addr	smctp_addr;
+	__u8			smctp_type;
+	__u8			smctp_tag;
+};
+#define MCTP_NET_ANY		0x0
+#define MCTP_ADDR_NULL		0x00
+#define MCTP_ADDR_ANY		0xff
+#define MCTP_TAG_MASK		0x07
+#define MCTP_TAG_OWNER		0x08
+#endif /* __UAPI_MCTP_H */
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -363,6 +363,7 @@ source "net/bluetooth/Kconfig"
 source "net/rxrpc/Kconfig"
 source "net/kcm/Kconfig"
 source "net/strparser/Kconfig"
+source "net/mctp/Kconfig"
 config FIB_RULES
 	bool

--- a/net/Makefile
+++ b/net/Makefile
@@ -78,3 +78,4 @@ obj-$(CONFIG_QRTR)		+= qrtr/
 obj-$(CONFIG_NET_NCSI)		+= ncsi/
 obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
 obj-$(CONFIG_MPTCP)		+= mptcp/
+obj-$(CONFIG_MCTP)		+= mctp/
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -226,6 +226,7 @@ static struct lock_class_key af_family_kern_slock_keys[AF_MAX];
  x "AF_IEEE802154",	x "AF_CAIF"	,	x "AF_ALG"      , \
  x "AF_NFC"   ,	x "AF_VSOCK"    ,	x "AF_KCM"      , \
  x "AF_QIPCRTR",	x "AF_SMC"	,	x "AF_XDP"	, \
+  x "AF_MCTP"  , \
  x "AF_MAX"
 static const char *const af_family_key_strings[AF_MAX+1] = {

--- a/net/mctp/Kconfig
+++ b/net/mctp/Kconfig
+menuconfig MCTP
+	depends on NET
+	tristate "MCTP core protocol support"
+	help
+	  Management Component Transport Protocol (MCTP) is an in-system
+	  protocol for communicating between management controllers and
+	  their managed devices (peripherals, host processors, etc.). The
+	  protocol is defined by DMTF specification DSP0236.
+	  This option enables core MCTP support. For communicating with other
+	  devices, you'll want to enable a driver for a specific hardware
+	  channel.
--- a/net/mctp/Makefile
+++ b/net/mctp/Makefile
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_MCTP) += mctp.o
+mctp-objs := af_mctp.o device.o route.o neigh.o
--- a/net/mctp/af_mctp.c
+++ b/net/mctp/af_mctp.c
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Management Component Transport Protocol (MCTP)
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#include <linux/if_arp.h>
+#include <linux/net.h>
+#include <linux/mctp.h>
+#include <linux/module.h>
+#include <linux/socket.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+#include <net/sock.h>
+/* socket implementation */
+static int mctp_release(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	if (sk) {
+		sock->sk = NULL;
+		sk->sk_prot->close(sk, 0);
+	}
+	return 0;
+}
+static int mctp_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
+{
+	struct sock *sk = sock->sk;
+	struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
+	struct sockaddr_mctp *smctp;
+	int rc;
+	if (addrlen < sizeof(*smctp))
+		return -EINVAL;
+	if (addr->sa_family != AF_MCTP)
+		return -EAFNOSUPPORT;
+	if (!capable(CAP_NET_BIND_SERVICE))
+		return -EACCES;
+	/* it's a valid sockaddr for MCTP, cast and do protocol checks */
+	smctp = (struct sockaddr_mctp *)addr;
+	lock_sock(sk);
+	/* TODO: allow rebind */
+	if (sk_hashed(sk)) {
+		rc = -EADDRINUSE;
+		goto out_release;
+	}
+	msk->bind_net = smctp->smctp_network;
+	msk->bind_addr = smctp->smctp_addr.s_addr;
+	msk->bind_type = smctp->smctp_type & 0x7f; /* ignore the IC bit */
+	rc = sk->sk_prot->hash(sk);
+out_release:
+	release_sock(sk);
+	return rc;
+}
+static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+{
+	DECLARE_SOCKADDR(struct sockaddr_mctp *, addr, msg->msg_name);
+	const int hlen = MCTP_HEADER_MAXLEN + sizeof(struct mctp_hdr);
+	int rc, addrlen = msg->msg_namelen;
+	struct sock *sk = sock->sk;
+	struct mctp_skb_cb *cb;
+	struct mctp_route *rt;
+	struct sk_buff *skb;
+	if (addr) {
+		if (addrlen < sizeof(struct sockaddr_mctp))
+			return -EINVAL;
+		if (addr->smctp_family != AF_MCTP)
+			return -EINVAL;
+		if (addr->smctp_tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER))
+			return -EINVAL;
+	} else {
+		/* TODO: connect()ed sockets */
+		return -EDESTADDRREQ;
+	}
+	if (!capable(CAP_NET_RAW))
+		return -EACCES;
+	if (addr->smctp_network == MCTP_NET_ANY)
+		addr->smctp_network = mctp_default_net(sock_net(sk));
+	rt = mctp_route_lookup(sock_net(sk), addr->smctp_network,
+			       addr->smctp_addr.s_addr);
+	if (!rt)
+		return -EHOSTUNREACH;
+	skb = sock_alloc_send_skb(sk, hlen + 1 + len,
+				  msg->msg_flags & MSG_DONTWAIT, &rc);
+	if (!skb)
+		return rc;
+	skb_reserve(skb, hlen);
+	/* set type as fist byte in payload */
+	*(u8 *)skb_put(skb, 1) = addr->smctp_type;
+	rc = memcpy_from_msg((void *)skb_put(skb, len), msg, len);
+	if (rc < 0) {
+		kfree_skb(skb);
+		return rc;
+	}
+	/* set up cb */
+	cb = __mctp_cb(skb);
+	cb->net = addr->smctp_network;
+	rc = mctp_local_output(sk, rt, skb, addr->smctp_addr.s_addr,
+			       addr->smctp_tag);
+	return rc ? : len;
+}
+static int mctp_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+			int flags)
+{
+	DECLARE_SOCKADDR(struct sockaddr_mctp *, addr, msg->msg_name);
+	struct sock *sk = sock->sk;
+	struct sk_buff *skb;
+	size_t msglen;
+	u8 type;
+	int rc;
+	if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK))
+		return -EOPNOTSUPP;
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &rc);
+	if (!skb)
+		return rc;
+	if (!skb->len) {
+		rc = 0;
+		goto out_free;
+	}
+	/* extract message type, remove from data */
+	type = *((u8 *)skb->data);
+	msglen = skb->len - 1;
+	if (len < msglen)
+		msg->msg_flags |= MSG_TRUNC;
+	else
+		len = msglen;
+	rc = skb_copy_datagram_msg(skb, 1, msg, len);
+	if (rc < 0)
+		goto out_free;
+	sock_recv_ts_and_drops(msg, sk, skb);
+	if (addr) {
+		struct mctp_skb_cb *cb = mctp_cb(skb);
+		/* TODO: expand mctp_skb_cb for header fields? */
+		struct mctp_hdr *hdr = mctp_hdr(skb);
+		hdr = mctp_hdr(skb);
+		addr = msg->msg_name;
+		addr->smctp_family = AF_MCTP;
+		addr->smctp_network = cb->net;
+		addr->smctp_addr.s_addr = hdr->src;
+		addr->smctp_type = type;
+		addr->smctp_tag = hdr->flags_seq_tag &
+					(MCTP_HDR_TAG_MASK | MCTP_HDR_FLAG_TO);
+		msg->msg_namelen = sizeof(*addr);
+	}
+	rc = len;
+	if (flags & MSG_TRUNC)
+		rc = msglen;
+out_free:
+	skb_free_datagram(sk, skb);
+	return rc;
+}
+static int mctp_setsockopt(struct socket *sock, int level, int optname,
+			   sockptr_t optval, unsigned int optlen)
+{
+	return -EINVAL;
+}
+static int mctp_getsockopt(struct socket *sock, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	return -EINVAL;
+}
+static const struct proto_ops mctp_dgram_ops = {
+	.family		= PF_MCTP,
+	.release	= mctp_release,
+	.bind		= mctp_bind,
+	.connect	= sock_no_connect,
+	.socketpair	= sock_no_socketpair,
+	.accept		= sock_no_accept,
+	.getname	= sock_no_getname,
+	.poll		= datagram_poll,
+	.ioctl		= sock_no_ioctl,
+	.gettstamp	= sock_gettstamp,
+	.listen		= sock_no_listen,
+	.shutdown	= sock_no_shutdown,
+	.setsockopt	= mctp_setsockopt,
+	.getsockopt	= mctp_getsockopt,
+	.sendmsg	= mctp_sendmsg,
+	.recvmsg	= mctp_recvmsg,
+	.mmap		= sock_no_mmap,
+	.sendpage	= sock_no_sendpage,
+};
+static int mctp_sk_init(struct sock *sk)
+{
+	struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
+	INIT_HLIST_HEAD(&msk->keys);
+	return 0;
+}
+static void mctp_sk_close(struct sock *sk, long timeout)
+{
+	sk_common_release(sk);
+}
+static int mctp_sk_hash(struct sock *sk)
+{
+	struct net *net = sock_net(sk);
+	mutex_lock(&net->mctp.bind_lock);
+	sk_add_node_rcu(sk, &net->mctp.binds);
+	mutex_unlock(&net->mctp.bind_lock);
+	return 0;
+}
+static void mctp_sk_unhash(struct sock *sk)
+{
+	struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
+	struct net *net = sock_net(sk);
+	struct mctp_sk_key *key;
+	struct hlist_node *tmp;
+	unsigned long flags;
+	/* remove from any type-based binds */
+	mutex_lock(&net->mctp.bind_lock);
+	sk_del_node_init_rcu(sk);
+	mutex_unlock(&net->mctp.bind_lock);
+	/* remove tag allocations */
+	spin_lock_irqsave(&net->mctp.keys_lock, flags);
+	hlist_for_each_entry_safe(key, tmp, &msk->keys, sklist) {
+		hlist_del_rcu(&key->sklist);
+		hlist_del_rcu(&key->hlist);
+		spin_lock(&key->reasm_lock);
+		if (key->reasm_head)
+			kfree_skb(key->reasm_head);
+		key->reasm_head = NULL;
+		key->reasm_dead = true;
+		spin_unlock(&key->reasm_lock);
+		kfree_rcu(key, rcu);
+	}
+	spin_unlock_irqrestore(&net->mctp.keys_lock, flags);
+	synchronize_rcu();
+}
+static struct proto mctp_proto = {
+	.name		= "MCTP",
+	.owner		= THIS_MODULE,
+	.obj_size	= sizeof(struct mctp_sock),
+	.init		= mctp_sk_init,
+	.close		= mctp_sk_close,
+	.hash		= mctp_sk_hash,
+	.unhash		= mctp_sk_unhash,
+};
+static int mctp_pf_create(struct net *net, struct socket *sock,
+			  int protocol, int kern)
+{
+	const struct proto_ops *ops;
+	struct proto *proto;
+	struct sock *sk;
+	int rc;
+	if (protocol)
+		return -EPROTONOSUPPORT;
+	/* only datagram sockets are supported */
+	if (sock->type != SOCK_DGRAM)
+		return -ESOCKTNOSUPPORT;
+	proto = &mctp_proto;
+	ops = &mctp_dgram_ops;
+	sock->state = SS_UNCONNECTED;
+	sock->ops = ops;
+	sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern);
+	if (!sk)
+		return -ENOMEM;
+	sock_init_data(sock, sk);
+	rc = 0;
+	if (sk->sk_prot->init)
+		rc = sk->sk_prot->init(sk);
+	if (rc)
+		goto err_sk_put;
+	return 0;
+err_sk_put:
+	sock_orphan(sk);
+	sock_put(sk);
+	return rc;
+}
+static struct net_proto_family mctp_pf = {
+	.family = PF_MCTP,
+	.create = mctp_pf_create,
+	.owner = THIS_MODULE,
+};
+static __init int mctp_init(void)
+{
+	int rc;
+	/* ensure our uapi tag definitions match the header format */
+	BUILD_BUG_ON(MCTP_TAG_OWNER != MCTP_HDR_FLAG_TO);
+	BUILD_BUG_ON(MCTP_TAG_MASK != MCTP_HDR_TAG_MASK);
+	pr_info("mctp: management component transport protocol core\n");
+	rc = sock_register(&mctp_pf);
+	if (rc)
+		return rc;
+	rc = proto_register(&mctp_proto, 0);
+	if (rc)
+		goto err_unreg_sock;
+	rc = mctp_routes_init();
+	if (rc)
+		goto err_unreg_proto;
+	rc = mctp_neigh_init();
+	if (rc)
+		goto err_unreg_proto;
+	mctp_device_init();
+	return 0;
+err_unreg_proto:
+	proto_unregister(&mctp_proto);
+err_unreg_sock:
+	sock_unregister(PF_MCTP);
+	return rc;
+}
+static __exit void mctp_exit(void)
+{
+	mctp_device_exit();
+	mctp_neigh_exit();
+	mctp_routes_exit();
+	proto_unregister(&mctp_proto);
+	sock_unregister(PF_MCTP);
+}
+module_init(mctp_init);
+module_exit(mctp_exit);
+MODULE_DESCRIPTION("MCTP core");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jeremy Kerr <jk@codeconstruct.com.au>");
+MODULE_ALIAS_NETPROTO(PF_MCTP);
--- a/net/mctp/device.c
+++ b/net/mctp/device.c
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Management Component Transport Protocol (MCTP) - device implementation.
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#include <linux/if_link.h>
+#include <linux/mctp.h>
+#include <linux/netdevice.h>
+#include <linux/rcupdate.h>
+#include <linux/rtnetlink.h>
+#include <net/addrconf.h>
+#include <net/netlink.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+#include <net/sock.h>
+struct mctp_dump_cb {
+	int h;
+	int idx;
+	size_t a_idx;
+};
+/* unlocked: caller must hold rcu_read_lock */
+struct mctp_dev *__mctp_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference(dev->mctp_ptr);
+}
+struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev)
+{
+	return rtnl_dereference(dev->mctp_ptr);
+}
+static void mctp_dev_destroy(struct mctp_dev *mdev)
+{
+	struct net_device *dev = mdev->dev;
+	dev_put(dev);
+	kfree_rcu(mdev, rcu);
+}
+static int mctp_fill_addrinfo(struct sk_buff *skb, struct netlink_callback *cb,
+			      struct mctp_dev *mdev, mctp_eid_t eid)
+{
+	struct ifaddrmsg *hdr;
+	struct nlmsghdr *nlh;
+	nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+			RTM_NEWADDR, sizeof(*hdr), NLM_F_MULTI);
+	if (!nlh)
+		return -EMSGSIZE;
+	hdr = nlmsg_data(nlh);
+	hdr->ifa_family = AF_MCTP;
+	hdr->ifa_prefixlen = 0;
+	hdr->ifa_flags = 0;
+	hdr->ifa_scope = 0;
+	hdr->ifa_index = mdev->dev->ifindex;
+	if (nla_put_u8(skb, IFA_LOCAL, eid))
+		goto cancel;
+	if (nla_put_u8(skb, IFA_ADDRESS, eid))
+		goto cancel;
+	nlmsg_end(skb, nlh);
+	return 0;
+cancel:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+static int mctp_dump_dev_addrinfo(struct mctp_dev *mdev, struct sk_buff *skb,
+				  struct netlink_callback *cb)
+{
+	struct mctp_dump_cb *mcb = (void *)cb->ctx;
+	int rc = 0;
+	for (; mcb->a_idx < mdev->num_addrs; mcb->a_idx++) {
+		rc = mctp_fill_addrinfo(skb, cb, mdev, mdev->addrs[mcb->a_idx]);
+		if (rc < 0)
+			break;
+	}
+	return rc;
+}
+static int mctp_dump_addrinfo(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct mctp_dump_cb *mcb = (void *)cb->ctx;
+	struct net *net = sock_net(skb->sk);
+	struct hlist_head *head;
+	struct net_device *dev;
+	struct ifaddrmsg *hdr;
+	struct mctp_dev *mdev;
+	int ifindex;
+	int idx, rc;
+	hdr = nlmsg_data(cb->nlh);
+	// filter by ifindex if requested
+	ifindex = hdr->ifa_index;
+	rcu_read_lock();
+	for (; mcb->h < NETDEV_HASHENTRIES; mcb->h++, mcb->idx = 0) {
+		idx = 0;
+		head = &net->dev_index_head[mcb->h];
+		hlist_for_each_entry_rcu(dev, head, index_hlist) {
+			if (idx >= mcb->idx &&
+			    (ifindex == 0 || ifindex == dev->ifindex)) {
+				mdev = __mctp_dev_get(dev);
+				if (mdev) {
+					rc = mctp_dump_dev_addrinfo(mdev,
+								    skb, cb);
+					// Error indicates full buffer, this
+					// callback will get retried.
+					if (rc < 0)
+						goto out;
+				}
+			}
+			idx++;
+			// reset for next iteration
+			mcb->a_idx = 0;
+		}
+	}
+out:
+	rcu_read_unlock();
+	mcb->idx = idx;
+	return skb->len;
+}
+static const struct nla_policy ifa_mctp_policy[IFA_MAX + 1] = {
+	[IFA_ADDRESS]		= { .type = NLA_U8 },
+	[IFA_LOCAL]		= { .type = NLA_U8 },
+};
+static int mctp_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
+			    struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFA_MAX + 1];
+	struct net_device *dev;
+	struct mctp_addr *addr;
+	struct mctp_dev *mdev;
+	struct ifaddrmsg *ifm;
+	unsigned long flags;
+	u8 *tmp_addrs;
+	int rc;
+	rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_mctp_policy,
+			 extack);
+	if (rc < 0)
+		return rc;
+	ifm = nlmsg_data(nlh);
+	if (tb[IFA_LOCAL])
+		addr = nla_data(tb[IFA_LOCAL]);
+	else if (tb[IFA_ADDRESS])
+		addr = nla_data(tb[IFA_ADDRESS]);
+	else
+		return -EINVAL;
+	/* find device */
+	dev = __dev_get_by_index(net, ifm->ifa_index);
+	if (!dev)
+		return -ENODEV;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return -ENODEV;
+	if (!mctp_address_ok(addr->s_addr))
+		return -EINVAL;
+	/* Prevent duplicates. Under RTNL so don't need to lock for reading */
+	if (memchr(mdev->addrs, addr->s_addr, mdev->num_addrs))
+		return -EEXIST;
+	tmp_addrs = kmalloc(mdev->num_addrs + 1, GFP_KERNEL);
+	if (!tmp_addrs)
+		return -ENOMEM;
+	memcpy(tmp_addrs, mdev->addrs, mdev->num_addrs);
+	tmp_addrs[mdev->num_addrs] = addr->s_addr;
+	/* Lock to write */
+	spin_lock_irqsave(&mdev->addrs_lock, flags);
+	mdev->num_addrs++;
+	swap(mdev->addrs, tmp_addrs);
+	spin_unlock_irqrestore(&mdev->addrs_lock, flags);
+	kfree(tmp_addrs);
+	mctp_route_add_local(mdev, addr->s_addr);
+	return 0;
+}
+static int mctp_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh,
+			    struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[IFA_MAX + 1];
+	struct net_device *dev;
+	struct mctp_addr *addr;
+	struct mctp_dev *mdev;
+	struct ifaddrmsg *ifm;
+	unsigned long flags;
+	u8 *pos;
+	int rc;
+	rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_mctp_policy,
+			 extack);
+	if (rc < 0)
+		return rc;
+	ifm = nlmsg_data(nlh);
+	if (tb[IFA_LOCAL])
+		addr = nla_data(tb[IFA_LOCAL]);
+	else if (tb[IFA_ADDRESS])
+		addr = nla_data(tb[IFA_ADDRESS]);
+	else
+		return -EINVAL;
+	/* find device */
+	dev = __dev_get_by_index(net, ifm->ifa_index);
+	if (!dev)
+		return -ENODEV;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return -ENODEV;
+	pos = memchr(mdev->addrs, addr->s_addr, mdev->num_addrs);
+	if (!pos)
+		return -ENOENT;
+	rc = mctp_route_remove_local(mdev, addr->s_addr);
+	// we can ignore -ENOENT in the case a route was already removed
+	if (rc < 0 && rc != -ENOENT)
+		return rc;
+	spin_lock_irqsave(&mdev->addrs_lock, flags);
+	memmove(pos, pos + 1, mdev->num_addrs - 1 - (pos - mdev->addrs));
+	mdev->num_addrs--;
+	spin_unlock_irqrestore(&mdev->addrs_lock, flags);
+	return 0;
+}
+static struct mctp_dev *mctp_add_dev(struct net_device *dev)
+{
+	struct mctp_dev *mdev;
+	ASSERT_RTNL();
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(-ENOMEM);
+	spin_lock_init(&mdev->addrs_lock);
+	mdev->net = mctp_default_net(dev_net(dev));
+	/* associate to net_device */
+	rcu_assign_pointer(dev->mctp_ptr, mdev);
+	dev_hold(dev);
+	mdev->dev = dev;
+	return mdev;
+}
+static int mctp_fill_link_af(struct sk_buff *skb,
+			     const struct net_device *dev, u32 ext_filter_mask)
+{
+	struct mctp_dev *mdev;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return -ENODATA;
+	if (nla_put_u32(skb, IFLA_MCTP_NET, mdev->net))
+		return -EMSGSIZE;
+	return 0;
+}
+static size_t mctp_get_link_af_size(const struct net_device *dev,
+				    u32 ext_filter_mask)
+{
+	struct mctp_dev *mdev;
+	unsigned int ret;
+	/* caller holds RCU */
+	mdev = __mctp_dev_get(dev);
+	if (!mdev)
+		return 0;
+	ret = nla_total_size(4); /* IFLA_MCTP_NET */
+	return ret;
+}
+static const struct nla_policy ifla_af_mctp_policy[IFLA_MCTP_MAX + 1] = {
+	[IFLA_MCTP_NET]		= { .type = NLA_U32 },
+};
+static int mctp_set_link_af(struct net_device *dev, const struct nlattr *attr,
+			    struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[IFLA_MCTP_MAX + 1];
+	struct mctp_dev *mdev;
+	int rc;
+	rc = nla_parse_nested(tb, IFLA_MCTP_MAX, attr, ifla_af_mctp_policy,
+			      NULL);
+	if (rc)
+		return rc;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return 0;
+	if (tb[IFLA_MCTP_NET])
+		WRITE_ONCE(mdev->net, nla_get_u32(tb[IFLA_MCTP_NET]));
+	return 0;
+}
+static void mctp_unregister(struct net_device *dev)
+{
+	struct mctp_dev *mdev;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return;
+	RCU_INIT_POINTER(mdev->dev->mctp_ptr, NULL);
+	mctp_route_remove_dev(mdev);
+	mctp_neigh_remove_dev(mdev);
+	kfree(mdev->addrs);
+	mctp_dev_destroy(mdev);
+}
+static int mctp_register(struct net_device *dev)
+{
+	struct mctp_dev *mdev;
+	/* Already registered? */
+	if (rtnl_dereference(dev->mctp_ptr))
+		return 0;
+	/* only register specific types; MCTP-specific and loopback for now */
+	if (dev->type != ARPHRD_MCTP && dev->type != ARPHRD_LOOPBACK)
+		return 0;
+	mdev = mctp_add_dev(dev);
+	if (IS_ERR(mdev))
+		return PTR_ERR(mdev);
+	return 0;
+}
+static int mctp_dev_notify(struct notifier_block *this, unsigned long event,
+			   void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	int rc;
+	switch (event) {
+	case NETDEV_REGISTER:
+		rc = mctp_register(dev);
+		if (rc)
+			return notifier_from_errno(rc);
+		break;
+	case NETDEV_UNREGISTER:
+		mctp_unregister(dev);
+		break;
+	}
+	return NOTIFY_OK;
+}
+static struct rtnl_af_ops mctp_af_ops = {
+	.family = AF_MCTP,
+	.fill_link_af = mctp_fill_link_af,
+	.get_link_af_size = mctp_get_link_af_size,
+	.set_link_af = mctp_set_link_af,
+};
+static struct notifier_block mctp_dev_nb = {
+	.notifier_call = mctp_dev_notify,
+	.priority = ADDRCONF_NOTIFY_PRIORITY,
+};
+void __init mctp_device_init(void)
+{
+	register_netdevice_notifier(&mctp_dev_nb);
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETADDR,
+			     NULL, mctp_dump_addrinfo, 0);
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_NEWADDR,
+			     mctp_rtm_newaddr, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_DELADDR,
+			     mctp_rtm_deladdr, NULL, 0);
+	rtnl_af_register(&mctp_af_ops);
+}
+void __exit mctp_device_exit(void)
+{
+	rtnl_af_unregister(&mctp_af_ops);
+	rtnl_unregister(PF_MCTP, RTM_DELADDR);
+	rtnl_unregister(PF_MCTP, RTM_NEWADDR);
+	rtnl_unregister(PF_MCTP, RTM_GETADDR);
+	unregister_netdevice_notifier(&mctp_dev_nb);
+}
--- a/net/mctp/neigh.c
+++ b/net/mctp/neigh.c
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Management Component Transport Protocol (MCTP) - routing
+ * implementation.
+ *
+ * This is currently based on a simple routing table, with no dst cache. The
+ * number of routes should stay fairly small, so the lookup cost is small.
+ *
+ * Copyright (c) 2021 Code Construct
+ * Copyright (c) 2021 Google
+ */
+#include <linux/idr.h>
+#include <linux/mctp.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+#include <net/netlink.h>
+#include <net/sock.h>
+static int mctp_neigh_add(struct mctp_dev *mdev, mctp_eid_t eid,
+			  enum mctp_neigh_source source,
+			  size_t lladdr_len, const void *lladdr)
+{
+	struct net *net = dev_net(mdev->dev);
+	struct mctp_neigh *neigh;
+	int rc;
+	mutex_lock(&net->mctp.neigh_lock);
+	if (mctp_neigh_lookup(mdev, eid, NULL) == 0) {
+		rc = -EEXIST;
+		goto out;
+	}
+	if (lladdr_len > sizeof(neigh->ha)) {
+		rc = -EINVAL;
+		goto out;
+	}
+	neigh = kzalloc(sizeof(*neigh), GFP_KERNEL);
+	if (!neigh) {
+		rc = -ENOMEM;
+		goto out;
+	}
+	INIT_LIST_HEAD(&neigh->list);
+	neigh->dev = mdev;
+	dev_hold(neigh->dev->dev);
+	neigh->eid = eid;
+	neigh->source = source;
+	memcpy(neigh->ha, lladdr, lladdr_len);
+	list_add_rcu(&neigh->list, &net->mctp.neighbours);
+	rc = 0;
+out:
+	mutex_unlock(&net->mctp.neigh_lock);
+	return rc;
+}
+static void __mctp_neigh_free(struct rcu_head *rcu)
+{
+	struct mctp_neigh *neigh = container_of(rcu, struct mctp_neigh, rcu);
+	dev_put(neigh->dev->dev);
+	kfree(neigh);
+}
+/* Removes all neighbour entries referring to a device */
+void mctp_neigh_remove_dev(struct mctp_dev *mdev)
+{
+	struct net *net = dev_net(mdev->dev);
+	struct mctp_neigh *neigh, *tmp;
+	mutex_lock(&net->mctp.neigh_lock);
+	list_for_each_entry_safe(neigh, tmp, &net->mctp.neighbours, list) {
+		if (neigh->dev == mdev) {
+			list_del_rcu(&neigh->list);
+			/* TODO: immediate RTM_DELNEIGH */
+			call_rcu(&neigh->rcu, __mctp_neigh_free);
+		}
+	}
+	mutex_unlock(&net->mctp.neigh_lock);
+}
+// TODO: add a "source" flag so netlink can only delete static neighbours?
+static int mctp_neigh_remove(struct mctp_dev *mdev, mctp_eid_t eid)
+{
+	struct net *net = dev_net(mdev->dev);
+	struct mctp_neigh *neigh, *tmp;
+	bool dropped = false;
+	mutex_lock(&net->mctp.neigh_lock);
+	list_for_each_entry_safe(neigh, tmp, &net->mctp.neighbours, list) {
+		if (neigh->dev == mdev && neigh->eid == eid) {
+			list_del_rcu(&neigh->list);
+			/* TODO: immediate RTM_DELNEIGH */
+			call_rcu(&neigh->rcu, __mctp_neigh_free);
+			dropped = true;
+		}
+	}
+	mutex_unlock(&net->mctp.neigh_lock);
+	return dropped ? 0 : -ENOENT;
+}
+static const struct nla_policy nd_mctp_policy[NDA_MAX + 1] = {
+	[NDA_DST]		= { .type = NLA_U8 },
+	[NDA_LLADDR]		= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
+};
+static int mctp_rtm_newneigh(struct sk_buff *skb, struct nlmsghdr *nlh,
+			     struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	struct mctp_dev *mdev;
+	struct ndmsg *ndm;
+	struct nlattr *tb[NDA_MAX + 1];
+	int rc;
+	mctp_eid_t eid;
+	void *lladdr;
+	int lladdr_len;
+	rc = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, nd_mctp_policy,
+			 extack);
+	if (rc < 0) {
+		NL_SET_ERR_MSG(extack, "lladdr too large?");
+		return rc;
+	}
+	if (!tb[NDA_DST]) {
+		NL_SET_ERR_MSG(extack, "Neighbour EID must be specified");
+		return -EINVAL;
+	}
+	if (!tb[NDA_LLADDR]) {
+		NL_SET_ERR_MSG(extack, "Neighbour lladdr must be specified");
+		return -EINVAL;
+	}
+	eid = nla_get_u8(tb[NDA_DST]);
+	if (!mctp_address_ok(eid)) {
+		NL_SET_ERR_MSG(extack, "Invalid neighbour EID");
+		return -EINVAL;
+	}
+	lladdr = nla_data(tb[NDA_LLADDR]);
+	lladdr_len = nla_len(tb[NDA_LLADDR]);
+	ndm = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (!dev)
+		return -ENODEV;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return -ENODEV;
+	if (lladdr_len != dev->addr_len) {
+		NL_SET_ERR_MSG(extack, "Wrong lladdr length");
+		return -EINVAL;
+	}
+	return mctp_neigh_add(mdev, eid, MCTP_NEIGH_STATIC,
+			lladdr_len, lladdr);
+}
+static int mctp_rtm_delneigh(struct sk_buff *skb, struct nlmsghdr *nlh,
+			     struct netlink_ext_ack *extack)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[NDA_MAX + 1];
+	struct net_device *dev;
+	struct mctp_dev *mdev;
+	struct ndmsg *ndm;
+	int rc;
+	mctp_eid_t eid;
+	rc = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, nd_mctp_policy,
+			 extack);
+	if (rc < 0) {
+		NL_SET_ERR_MSG(extack, "incorrect format");
+		return rc;
+	}
+	if (!tb[NDA_DST]) {
+		NL_SET_ERR_MSG(extack, "Neighbour EID must be specified");
+		return -EINVAL;
+	}
+	eid = nla_get_u8(tb[NDA_DST]);
+	ndm = nlmsg_data(nlh);
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (!dev)
+		return -ENODEV;
+	mdev = mctp_dev_get_rtnl(dev);
+	if (!mdev)
+		return -ENODEV;
+	return mctp_neigh_remove(mdev, eid);
+}
+static int mctp_fill_neigh(struct sk_buff *skb, u32 portid, u32 seq, int event,
+			   unsigned int flags, struct mctp_neigh *neigh)
+{
+	struct net_device *dev = neigh->dev->dev;
+	struct nlmsghdr *nlh;
+	struct ndmsg *hdr;
+	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+	hdr = nlmsg_data(nlh);
+	hdr->ndm_family = AF_MCTP;
+	hdr->ndm_ifindex = dev->ifindex;
+	hdr->ndm_state = 0; // TODO other state bits?
+	if (neigh->source == MCTP_NEIGH_STATIC)
+		hdr->ndm_state |= NUD_PERMANENT;
+	hdr->ndm_flags = 0;
+	hdr->ndm_type = RTN_UNICAST; // TODO: is loopback RTN_LOCAL?
+	if (nla_put_u8(skb, NDA_DST, neigh->eid))
+		goto cancel;
+	if (nla_put(skb, NDA_LLADDR, dev->addr_len, neigh->ha))
+		goto cancel;
+	nlmsg_end(skb, nlh);
+	return 0;
+cancel:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+static int mctp_rtm_getneigh(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	int rc, idx, req_ifindex;
+	struct mctp_neigh *neigh;
+	struct ndmsg *ndmsg;
+	struct {
+		int idx;
+	} *cbctx = (void *)cb->ctx;
+	ndmsg = nlmsg_data(cb->nlh);
+	req_ifindex = ndmsg->ndm_ifindex;
+	idx = 0;
+	rcu_read_lock();
+	list_for_each_entry_rcu(neigh, &net->mctp.neighbours, list) {
+		if (idx < cbctx->idx)
+			goto cont;
+		rc = 0;
+		if (req_ifindex == 0 || req_ifindex == neigh->dev->dev->ifindex)
+			rc = mctp_fill_neigh(skb, NETLINK_CB(cb->skb).portid,
+					     cb->nlh->nlmsg_seq,
+					     RTM_NEWNEIGH, NLM_F_MULTI, neigh);
+		if (rc)
+			break;
+cont:
+		idx++;
+	}
+	rcu_read_unlock();
+	cbctx->idx = idx;
+	return skb->len;
+}
+int mctp_neigh_lookup(struct mctp_dev *mdev, mctp_eid_t eid, void *ret_hwaddr)
+{
+	struct net *net = dev_net(mdev->dev);
+	struct mctp_neigh *neigh;
+	int rc = -EHOSTUNREACH; // TODO: or ENOENT?
+	rcu_read_lock();
+	list_for_each_entry_rcu(neigh, &net->mctp.neighbours, list) {
+		if (mdev == neigh->dev && eid == neigh->eid) {
+			if (ret_hwaddr)
+				memcpy(ret_hwaddr, neigh->ha,
+				       sizeof(neigh->ha));
+			rc = 0;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return rc;
+}
+/* namespace registration */
+static int __net_init mctp_neigh_net_init(struct net *net)
+{
+	struct netns_mctp *ns = &net->mctp;
+	INIT_LIST_HEAD(&ns->neighbours);
+	mutex_init(&ns->neigh_lock);
+	return 0;
+}
+static void __net_exit mctp_neigh_net_exit(struct net *net)
+{
+	struct netns_mctp *ns = &net->mctp;
+	struct mctp_neigh *neigh;
+	list_for_each_entry(neigh, &ns->neighbours, list)
+		call_rcu(&neigh->rcu, __mctp_neigh_free);
+}
+/* net namespace implementation */
+static struct pernet_operations mctp_net_ops = {
+	.init = mctp_neigh_net_init,
+	.exit = mctp_neigh_net_exit,
+};
+int __init mctp_neigh_init(void)
+{
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_NEWNEIGH,
+			     mctp_rtm_newneigh, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_DELNEIGH,
+			     mctp_rtm_delneigh, NULL, 0);
+	rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
+			     NULL, mctp_rtm_getneigh, 0);
+	return register_pernet_subsys(&mctp_net_ops);
+}
+void __exit mctp_neigh_exit(void)
+{
+	unregister_pernet_subsys(&mctp_net_ops);
+	rtnl_unregister(PF_MCTP, RTM_GETNEIGH);
+	rtnl_unregister(PF_MCTP, RTM_DELNEIGH);
+	rtnl_unregister(PF_MCTP, RTM_NEWNEIGH);
+}
--- a/net/mctp/route.c
+++ b/net/mctp/route.c
--- a/net/socket.c
+++ b/net/socket.c
@@ -212,6 +212,7 @@ static const char * const pf_family_names[] = {
 	[PF_QIPCRTR]	= "PF_QIPCRTR",
 	[PF_SMC]	= "PF_SMC",
 	[PF_XDP]	= "PF_XDP",
+	[PF_MCTP]	= "PF_MCTP",
 };
 /*

--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1330,7 +1330,9 @@ static inline u16 socket_type_to_security_class(int family, int type, int protoc
 			return SECCLASS_SMC_SOCKET;
 		case PF_XDP:
 			return SECCLASS_XDP_SOCKET;
-#if PF_MAX > 45
+		case PF_MCTP:
+			return SECCLASS_MCTP_SOCKET;
+#if PF_MAX > 46
 #error New address family defined, please update this function.
 #endif
 		}

--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -246,6 +246,8 @@ struct security_class_mapping secclass_map[] = {
 	    NULL } },
 	{ "xdp_socket",
 	  { COMMON_SOCK_PERMS, NULL } },
+	{ "mctp_socket",
+	  { COMMON_SOCK_PERMS, NULL } },
 	{ "perf_event",
 	  { "open", "cpu", "kernel", "tracepoint", "read", "write", NULL } },
 	{ "lockdown",
@@ -255,6 +257,6 @@ struct security_class_mapping secclass_map[] = {
 	{ NULL }
  };
-#if PF_MAX > 45
+#if PF_MAX > 46
 #error New address family defined, please update secclass_map.
 #endif