Commit e5fe3a5f authored by David S. Miller

Merge branch 'mctp'

Jeremy Kerr says:

====================
Add Management Component Transport Protocol support

This series adds core MCTP support to the kernel. From the Kconfig
description:

  Management Component Transport Protocol (MCTP) is an in-system
  protocol for communicating between management controllers and
  their managed devices (peripherals, host processors, etc.). The
  protocol is defined by DMTF specification DSP0236.

  This option enables core MCTP support. For communicating with other
  devices, you'll want to enable a driver for a specific hardware
  channel.

This implementation allows a sockets-based API for sending and receiving
MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack
control is all via netlink, using existing RTM_* messages. The userspace
ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHRD_
constants, a new sockaddr, and a new netlink attribute.

For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy
to alter this based on preferences here - an alternative would be the
OpenBMC list (the main user of the MCTP interface), or we can create a
new list entirely.

We have a couple of interface drivers almost ready to go at the moment,
but those can wait until the core code has some review.

This is v4 of the series; v1 and v2 were both RFC.

selinux folks: CCing 01/15 due to the new PF_MCTP protocol family.

linux-doc folks: CCing 15/15 for the new MCTP overview document.

Review, comments, questions etc. are most welcome.

Cheers,

Jeremy

v2:
 - change to match spec terminology: controller -> component
 - require specific capabilities for bind() & sendmsg()
 - add address and tag definitions to uapi
 - add selinux AF_MCTP table definitions
 - remove strict cflags; warnings are present in common headers

v3:
 - require caps for MCTP bind() & send()
 - comment typo fixes
 - switch to an array for local EIDs
 - fix addrinfo dump iteration & error path
 - add RTM_DELADDR
 - remove GENMASK() and BIT() from uapi

v4:
 - drop tun patch; that can be submitted separately
 - keep nipa happy: add maintainer CCs, including doc and selinux
 - net-next rebase
 - Include AF_MCTP in af_family_slock_keys and pf_family_names
 - Introduce MODULE_ definitions earlier
 - upstream change: set_link_af no longer called with RTNL held
 - add kdoc for net_device.mctp_ptr
 - don't inline mctp_rt_match_eid
 - require rtm_type == RTN_UNICAST in route management handlers
 - remove unused RTAX policy table
 - fix mctp_sock->keys rcu annotations
 - fix spurious rcu_read_unlock in route input
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 658e6b16 6a2d98b1
......@@ -69,6 +69,7 @@ Contents:
l2tp
lapb-module
mac80211-injection
mctp
mpls-sysctl
mptcp-sysctl
multiqueue
......
.. SPDX-License-Identifier: GPL-2.0
==============================================
Management Component Transport Protocol (MCTP)
==============================================
net/mctp/ contains protocol support for MCTP, as defined by DMTF standard
DSP0236. Physical interface drivers ("bindings" in the specification) are
provided in drivers/net/mctp/.
The core code provides a socket-based interface to send and receive MCTP
messages, through an AF_MCTP, SOCK_DGRAM socket.
Structure: interfaces & networks
================================
The kernel models the local MCTP topology through two items: interfaces and
networks.
An interface (or "link") is an instance of an MCTP physical transport binding
(as defined by DSP0236, section 3.2.47), likely connected to a specific hardware
device. This is represented as a ``struct net_device``.
A network defines a unique address space for MCTP endpoints by endpoint-ID
(described by DSP0236, section 3.2.31). A network has a user-visible identifier
to allow references from userspace. Route definitions are specific to one
network.
Interfaces are associated with one network. A network may be associated with one
or more interfaces.
If multiple networks are present, each may contain endpoint IDs (EIDs) that are
also present on other networks.
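Because the same EID may appear on more than one network, an application that
tracks remote endpoints would need to key them on the (network, EID) pair
rather than on the EID alone. A minimal sketch of that idea follows; the
``struct mctp_endpoint`` type and both helper functions are hypothetical
userspace names, not part of the kernel interface, and the 8..254 range check
mirrors ``mctp_address_ok()`` from ``include/net/mctp.h`` in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace type, not part of the kernel API: an endpoint is
 * identified by its network together with its EID. */
struct mctp_endpoint {
	int net;     /* network identifier, as carried in smctp_network */
	uint8_t eid; /* endpoint ID, as carried in smctp_addr.s_addr */
};

/* Build an endpoint key. The 8..254 range mirrors mctp_address_ok() in
 * include/net/mctp.h from this series. */
static struct mctp_endpoint mctp_endpoint_make(int net, uint8_t eid)
{
	struct mctp_endpoint ep = { .net = net, .eid = eid };

	assert(eid >= 8 && eid < 255);
	return ep;
}

/* Endpoints are equal only when both the network and the EID match. */
static bool mctp_endpoint_equal(struct mctp_endpoint a, struct mctp_endpoint b)
{
	return a.net == b.net && a.eid == b.eid;
}
```

With this, EID 9 on network 1 and EID 9 on network 2 compare as different
endpoints.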
Sockets API
===========
Protocol definitions
--------------------
MCTP uses ``AF_MCTP`` / ``PF_MCTP`` for the address and protocol families.
Since MCTP is message-based, only ``SOCK_DGRAM`` sockets are supported.
.. code-block:: C
int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
The only (current) value for the ``protocol`` argument is 0.
As with all socket address families, source and destination addresses are
specified with a ``sockaddr`` type, with a single-byte endpoint address:
.. code-block:: C
typedef __u8 mctp_eid_t;
struct mctp_addr {
mctp_eid_t s_addr;
};
struct sockaddr_mctp {
unsigned short int smctp_family;
int smctp_network;
struct mctp_addr smctp_addr;
__u8 smctp_type;
__u8 smctp_tag;
};
#define MCTP_NET_ANY 0x0
#define MCTP_ADDR_ANY 0xff
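One way to populate this structure is through a small helper that clears it
before setting the fields, so that padding bytes are not left uninitialised.
The sketch below is only an illustration: ``mctp_sockaddr_init()`` is a
hypothetical name, and the definitions are copied locally (with ``uint8_t``
standing in for ``__u8``) so that it builds without ``<linux/mctp.h>``.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef AF_MCTP
#define AF_MCTP 45 /* value added to include/linux/socket.h by this series */
#endif

/* Local copies of the uapi definitions, so the sketch builds without
 * <linux/mctp.h>. */
typedef uint8_t mctp_eid_t;

struct mctp_addr {
	mctp_eid_t s_addr;
};

struct sockaddr_mctp {
	unsigned short int smctp_family;
	int smctp_network;
	struct mctp_addr smctp_addr;
	uint8_t smctp_type;
	uint8_t smctp_tag;
};

/* Hypothetical helper: zero the whole structure, padding included, and then
 * fill in each field. */
static void mctp_sockaddr_init(struct sockaddr_mctp *sa, int net,
			       mctp_eid_t eid, uint8_t type, uint8_t tag)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->smctp_family = AF_MCTP;
	sa->smctp_network = net;
	sa->smctp_addr.s_addr = eid;
	sa->smctp_type = type;
	sa->smctp_tag = tag;
}
```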
Syscall behaviour
-----------------
The following sections describe the MCTP-specific behaviours of the standard
socket system calls. These behaviours have been chosen to map closely to the
existing sockets APIs.
``bind()`` : set local socket address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sockets that receive incoming request packets will bind to a local address,
using the ``bind()`` syscall.
.. code-block:: C
struct sockaddr_mctp addr;
addr.smctp_family = AF_MCTP;
addr.smctp_network = MCTP_NET_ANY;
addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
addr.smctp_type = MCTP_TYPE_PLDM;
addr.smctp_tag = MCTP_TAG_OWNER;
int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
This establishes the local address of the socket. Incoming MCTP messages that
match the network, address, and message type will be received by this socket.
The reference to 'incoming' is important here; a bound socket will only receive
messages with the TO bit set, to indicate an incoming request message, rather
than a response.
The ``smctp_tag`` value will configure the tags accepted from the remote side of
this socket. Given the above, the only valid value is ``MCTP_TAG_OWNER``, which
will result in remotely "owned" tags being routed to this socket. Since
``MCTP_TAG_OWNER`` is set, the 3 least-significant bits of ``smctp_tag`` are not
used; callers must set them to zero.
A ``smctp_network`` value of ``MCTP_NET_ANY`` will configure the socket to
receive incoming packets from any locally-connected network. A specific network
value will cause the socket to only receive incoming messages from that network.
The ``smctp_addr`` field specifies a local address to bind to. A value of
``MCTP_ADDR_ANY`` configures the socket to receive messages addressed to any
local destination EID.
The ``smctp_type`` field specifies which message types to receive. Only the
lower 7 bits of the type are matched on incoming messages (i.e., the
most-significant IC bit is not part of the match). This results in the socket
receiving packets with and without a message integrity check footer.
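The matching rules for a bound socket can be summarised as a single predicate.
The following is a userspace model of the rules as described in this section,
not the kernel's lookup code; ``struct bind_params``, ``struct incoming_msg``
and ``bind_matches()`` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the uapi definitions in this series. */
#define MCTP_NET_ANY   0x0
#define MCTP_ADDR_ANY  0xff
#define MCTP_TAG_OWNER 0x08

/* Hypothetical model of the values passed to bind(). */
struct bind_params {
	int net;
	uint8_t addr;
	uint8_t type;
};

/* Hypothetical model of the relevant fields of an incoming message. */
struct incoming_msg {
	int net;      /* network the message arrived on */
	uint8_t dest; /* destination EID */
	uint8_t type; /* message type byte, possibly with the IC bit set */
	uint8_t tag;  /* tag field, including the TO (tag owner) bit */
};

static bool bind_matches(const struct bind_params *b,
			 const struct incoming_msg *m)
{
	assert(b != NULL && m != NULL);

	/* Only requests (TO bit set) are delivered to bound sockets. */
	if (!(m->tag & MCTP_TAG_OWNER))
		return false;
	/* A specific network must match; MCTP_NET_ANY accepts all. */
	if (b->net != MCTP_NET_ANY && b->net != m->net)
		return false;
	/* A specific local EID must match; MCTP_ADDR_ANY accepts all. */
	if (b->addr != MCTP_ADDR_ANY && b->addr != m->dest)
		return false;
	/* The IC bit (bit 7) is not part of the type match. */
	return (b->type & 0x7f) == (m->type & 0x7f);
}
```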
``sendto()``, ``sendmsg()``, ``send()`` : transmit an MCTP message
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An MCTP message is transmitted using one of the ``sendto()``, ``sendmsg()`` or
``send()`` syscalls. Using ``sendto()`` as the primary example:
.. code-block:: C
struct sockaddr_mctp addr;
char buf[14];
ssize_t len;
/* set message destination */
addr.smctp_family = AF_MCTP;
addr.smctp_network = 0;
addr.smctp_addr.s_addr = 8;
addr.smctp_tag = MCTP_TAG_OWNER;
addr.smctp_type = MCTP_TYPE_ECHO;
/* arbitrary message to send, with message-type header */
buf[0] = MCTP_TYPE_ECHO;
memcpy(buf + 1, "hello, world!", sizeof(buf) - 1);
len = sendto(sd, buf, sizeof(buf), 0,
(struct sockaddr *)&addr, sizeof(addr));
The network and address fields of ``addr`` define the remote address to send to.
If ``smctp_tag`` has ``MCTP_TAG_OWNER`` set, the kernel will ignore any bits set
in ``MCTP_TAG_MASK``, and generate a tag value suitable for the destination
EID. If ``MCTP_TAG_OWNER`` is not set, the message will be sent with the tag
value as specified. If a tag value cannot be allocated, the system call will
report an errno of ``EAGAIN``.
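The layout of the ``smctp_tag`` field can be illustrated with a few helpers.
This is a sketch under the assumption that the uapi constants
``MCTP_TAG_MASK`` (0x07) and ``MCTP_TAG_OWNER`` (0x08) from this series are
used as-is; the three function names are hypothetical. The validity check is
the same condition that ``mctp_sendmsg()`` tests before returning ``-EINVAL``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from include/uapi/linux/mctp.h in this series. */
#define MCTP_TAG_MASK  0x07
#define MCTP_TAG_OWNER 0x08

/* True if no bits outside the tag value and the owner flag are set. */
static bool mctp_tag_field_valid(uint8_t tag)
{
	return (tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER)) == 0;
}

/* Hypothetical helper: tag field for a request, leaving the choice of tag
 * value to the kernel. */
static uint8_t mctp_tag_for_request(void)
{
	return MCTP_TAG_OWNER;
}

/* Hypothetical helper: tag field for a message sent with an explicit tag
 * value, with the owner flag clear. */
static uint8_t mctp_tag_explicit(uint8_t value)
{
	assert(value <= MCTP_TAG_MASK);
	return value;
}
```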
The application must provide the message type byte as the first byte of the
message buffer passed to ``sendto()``. If a message integrity check is to be
included in the transmitted message, it must also be provided in the message
buffer, and the most-significant bit of the message type byte must be 1.
The ``sendmsg()`` system call allows a more compact argument interface, and the
message buffer to be specified as a scatter-gather list. At present no ancillary
message types (used for the ``msg_control`` data passed to ``sendmsg()``) are
defined.
Transmitting a message on an unconnected socket with ``MCTP_TAG_OWNER``
specified will cause an allocation of a tag, if no valid tag is already
allocated for that destination. The (destination-eid,tag) tuple acts as an
implicit local socket address, to allow the socket to receive responses to this
outgoing message. If any previous allocation has been performed (for a
different remote EID), that allocation is lost.
Sockets will only receive responses to requests they have sent (with TO=1) and
may only respond (with TO=0) to requests they have received.
``recvfrom()``, ``recvmsg()``, ``recv()`` : receive an MCTP message
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An MCTP message can be received by an application using one of the
``recvfrom()``, ``recvmsg()``, or ``recv()`` system calls. Using ``recvfrom()``
as the primary example:
.. code-block:: C
struct sockaddr_mctp addr;
socklen_t addrlen;
char buf[14];
ssize_t len;
addrlen = sizeof(addr);
len = recvfrom(sd, buf, sizeof(buf), 0,
(struct sockaddr *)&addr, &addrlen);
/* We can expect addr to describe an MCTP address */
assert(addrlen >= sizeof(addr));
assert(addr.smctp_family == AF_MCTP);
printf("received %zd bytes from remote EID %d\n", len, addr.smctp_addr.s_addr);
The address argument to ``recvfrom`` and ``recvmsg`` is populated with the
remote address of the incoming message, including tag value (this will be needed
in order to reply to the message).
The first byte of the message buffer will contain the message type byte. If an
integrity check follows the message, it will be included in the received buffer.
The ``recv()`` system call behaves in a similar way, but does not provide a
remote address to the application. Therefore, it is only useful if the
remote address is already known, or the message does not require a reply.
Like the send calls, sockets will only receive responses to requests they have
sent (TO=1) and may only respond (TO=0) to requests they have received.
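To reply to a received request, the address returned by ``recvfrom()`` can be
reused as the destination. The sketch below assumes that a response carries
the same tag value as the request with ``MCTP_TAG_OWNER`` cleared, which is
one reading of "respond (TO=0)" above; ``mctp_reply_addr()`` is a hypothetical
helper, and the structure is copied locally so the example builds without
``<linux/mctp.h>``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from include/uapi/linux/mctp.h in this series. */
#define MCTP_TAG_MASK  0x07
#define MCTP_TAG_OWNER 0x08

/* Local copies of the uapi layout, with uint8_t standing in for __u8. */
struct mctp_addr {
	uint8_t s_addr;
};

struct sockaddr_mctp {
	unsigned short int smctp_family;
	int smctp_network;
	struct mctp_addr smctp_addr;
	uint8_t smctp_type;
	uint8_t smctp_tag;
};

/* Hypothetical helper: derive the destination address for a response from
 * the source address of a received request. Network, EID and type are kept;
 * the owner flag is cleared and the tag value preserved (an assumption, see
 * the text above). */
static struct sockaddr_mctp mctp_reply_addr(const struct sockaddr_mctp *req)
{
	struct sockaddr_mctp resp;

	assert(req != NULL);
	/* Only requests (owner flag set) can be responded to. */
	assert(req->smctp_tag & MCTP_TAG_OWNER);

	resp = *req;
	resp.smctp_tag = (uint8_t)(req->smctp_tag & MCTP_TAG_MASK);
	return resp;
}
```

The result would then be passed as the address argument to ``sendto()``.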
......@@ -11032,6 +11032,18 @@ F: drivers/mailbox/arm_mhuv2.c
F: include/linux/mailbox/arm_mhuv2_message.h
F: Documentation/devicetree/bindings/mailbox/arm,mhuv2.yaml
MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP)
M: Jeremy Kerr <jk@codeconstruct.com.au>
M: Matt Johnston <matt@codeconstruct.com.au>
L: netdev@vger.kernel.org
S: Maintained
F: Documentation/networking/mctp.rst
F: drivers/net/mctp/
F: include/net/mctp.h
F: include/net/mctpdevice.h
F: include/net/netns/mctp.h
F: net/mctp/
MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
M: Michael Kerrisk <mtk.manpages@gmail.com>
L: linux-man@vger.kernel.org
......
......@@ -483,6 +483,8 @@ config NET_SB1000
source "drivers/net/phy/Kconfig"
source "drivers/net/mctp/Kconfig"
source "drivers/net/mdio/Kconfig"
source "drivers/net/pcs/Kconfig"
......
......@@ -69,6 +69,7 @@ obj-$(CONFIG_WAN) += wan/
obj-$(CONFIG_WLAN) += wireless/
obj-$(CONFIG_IEEE802154) += ieee802154/
obj-$(CONFIG_WWAN) += wwan/
obj-$(CONFIG_MCTP) += mctp/
obj-$(CONFIG_VMXNET3) += vmxnet3/
obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o
......
if MCTP
menu "MCTP Device Drivers"
endmenu
endif
......@@ -1823,6 +1823,7 @@ enum netdev_ml_priv_type {
* @ieee802154_ptr: IEEE 802.15.4 low-rate Wireless Personal Area Network
* device struct
* @mpls_ptr: mpls_dev struct pointer
* @mctp_ptr: MCTP specific data
*
* @dev_addr: Hw address (before bcast,
* because most packets are unicast)
......@@ -2110,6 +2111,9 @@ struct net_device {
#if IS_ENABLED(CONFIG_MPLS_ROUTING)
struct mpls_dev __rcu *mpls_ptr;
#endif
#if IS_ENABLED(CONFIG_MCTP)
struct mctp_dev __rcu *mctp_ptr;
#endif
/*
* Cache lines mostly used on receive path (including eth_type_trans())
......
......@@ -223,8 +223,11 @@ struct ucred {
* reuses AF_INET address family
*/
#define AF_XDP 44 /* XDP sockets */
#define AF_MCTP 45 /* Management component
* transport protocol
*/
#define AF_MAX 45 /* For now.. */
#define AF_MAX 46 /* For now.. */
/* Protocol families, same as address families. */
#define PF_UNSPEC AF_UNSPEC
......@@ -274,6 +277,7 @@ struct ucred {
#define PF_QIPCRTR AF_QIPCRTR
#define PF_SMC AF_SMC
#define PF_XDP AF_XDP
#define PF_MCTP AF_MCTP
#define PF_MAX AF_MAX
/* Maximum queue length specifiable by listen. */
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Management Component Transport Protocol (MCTP)
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#ifndef __NET_MCTP_H
#define __NET_MCTP_H
#include <linux/bits.h>
#include <linux/mctp.h>
#include <net/net_namespace.h>
#include <net/sock.h>
/* MCTP packet definitions */
struct mctp_hdr {
u8 ver;
u8 dest;
u8 src;
u8 flags_seq_tag;
};
#define MCTP_VER_MIN 1
#define MCTP_VER_MAX 1
/* Definitions for flags_seq_tag field */
#define MCTP_HDR_FLAG_SOM BIT(7)
#define MCTP_HDR_FLAG_EOM BIT(6)
#define MCTP_HDR_FLAG_TO BIT(3)
#define MCTP_HDR_FLAGS GENMASK(5, 3)
#define MCTP_HDR_SEQ_SHIFT 4
#define MCTP_HDR_SEQ_MASK GENMASK(1, 0)
#define MCTP_HDR_TAG_SHIFT 0
#define MCTP_HDR_TAG_MASK GENMASK(2, 0)
#define MCTP_HEADER_MAXLEN 4
#define MCTP_INITIAL_DEFAULT_NET 1
static inline bool mctp_address_ok(mctp_eid_t eid)
{
return eid >= 8 && eid < 255;
}
static inline struct mctp_hdr *mctp_hdr(struct sk_buff *skb)
{
return (struct mctp_hdr *)skb_network_header(skb);
}
/* socket implementation */
struct mctp_sock {
struct sock sk;
/* bind() params */
int bind_net;
mctp_eid_t bind_addr;
__u8 bind_type;
/* list of mctp_sk_key, for incoming tag lookup. updates protected
* by sk->net->keys_lock
*/
struct hlist_head keys;
};
/* Key for matching incoming packets to sockets or reassembly contexts.
* Packets are matched on (src,dest,tag).
*
* Lifetime requirements:
*
* - keys are free()ed via RCU
*
* - a mctp_sk_key contains a reference to a struct sock; this is valid
* for the life of the key. On sock destruction (through unhash), the key is
* removed from lists (see below), and will not be observable after a RCU
* grace period.
*
* any RX occurring within that grace period may still queue to the socket,
* but will hit the SOCK_DEAD case before the socket is freed.
*
* - these mctp_sk_keys appear on two lists:
* 1) the struct mctp_sock->keys list
* 2) the struct netns_mctp->keys list
*
* updates to either list are performed under the netns_mctp->keys
* lock.
*
* - a key may have a sk_buff attached as part of an in-progress message
* reassembly (->reasm_head). The reassembly context is protected by
* reasm_lock, which may be acquired with the keys lock (above) held, if
* necessary. Consequently, keys lock *cannot* be acquired with the
* reasm_lock held.
*
* - there are two destruction paths for a mctp_sk_key:
*
* - through socket unhash (see mctp_sk_unhash). This performs the list
* removal under keys_lock.
*
* - where a key is established to receive a reply message: after receiving
* the (complete) reply, or during reassembly errors. Here, we clean up
* the reassembly context (marking reasm_dead, to prevent another from
* starting), and remove the socket from the netns & socket lists.
*/
struct mctp_sk_key {
mctp_eid_t peer_addr;
mctp_eid_t local_addr;
__u8 tag; /* incoming tag match; invert TO for local */
/* we hold a ref to sk when set */
struct sock *sk;
/* routing lookup list */
struct hlist_node hlist;
/* per-socket list */
struct hlist_node sklist;
/* incoming fragment reassembly context */
spinlock_t reasm_lock;
struct sk_buff *reasm_head;
struct sk_buff **reasm_tailp;
bool reasm_dead;
u8 last_seq;
struct rcu_head rcu;
};
struct mctp_skb_cb {
unsigned int magic;
unsigned int net;
mctp_eid_t src;
};
/* skb control-block accessors with a little extra debugging for initial
* development.
*
* TODO: remove checks & mctp_skb_cb->magic; replace callers of __mctp_cb
* with mctp_cb().
*
* __mctp_cb() is only for the initial ingress code; we should see ->magic set
* at all times after this.
*/
static inline struct mctp_skb_cb *__mctp_cb(struct sk_buff *skb)
{
struct mctp_skb_cb *cb = (void *)skb->cb;
cb->magic = 0x4d435450;
return cb;
}
static inline struct mctp_skb_cb *mctp_cb(struct sk_buff *skb)
{
struct mctp_skb_cb *cb = (void *)skb->cb;
WARN_ON(cb->magic != 0x4d435450);
return (void *)(skb->cb);
}
/* Route definition.
*
* These are held in the pernet->mctp.routes list, with RCU protection for
* removed routes. We hold a reference to the netdev; routes need to be
* dropped on NETDEV_UNREGISTER events.
*
* Updates to the route table are performed under rtnl; all reads under RCU,
* so routes cannot be referenced over a RCU grace period. Specifically: A
* caller cannot block between mctp_route_lookup and passing the route to
* mctp_do_route.
*/
struct mctp_route {
mctp_eid_t min, max;
struct mctp_dev *dev;
unsigned int mtu;
int (*output)(struct mctp_route *route,
struct sk_buff *skb);
struct list_head list;
refcount_t refs;
struct rcu_head rcu;
};
/* route interfaces */
struct mctp_route *mctp_route_lookup(struct net *net, unsigned int dnet,
mctp_eid_t daddr);
int mctp_do_route(struct mctp_route *rt, struct sk_buff *skb);
int mctp_local_output(struct sock *sk, struct mctp_route *rt,
struct sk_buff *skb, mctp_eid_t daddr, u8 req_tag);
/* routing <--> device interface */
unsigned int mctp_default_net(struct net *net);
int mctp_default_net_set(struct net *net, unsigned int index);
int mctp_route_add_local(struct mctp_dev *mdev, mctp_eid_t addr);
int mctp_route_remove_local(struct mctp_dev *mdev, mctp_eid_t addr);
void mctp_route_remove_dev(struct mctp_dev *mdev);
/* neighbour definitions */
enum mctp_neigh_source {
MCTP_NEIGH_STATIC,
MCTP_NEIGH_DISCOVER,
};
struct mctp_neigh {
struct mctp_dev *dev;
mctp_eid_t eid;
enum mctp_neigh_source source;
unsigned char ha[MAX_ADDR_LEN];
struct list_head list;
struct rcu_head rcu;
};
int mctp_neigh_init(void);
void mctp_neigh_exit(void);
// ret_hwaddr may be NULL, otherwise must have space for MAX_ADDR_LEN
int mctp_neigh_lookup(struct mctp_dev *dev, mctp_eid_t eid,
void *ret_hwaddr);
void mctp_neigh_remove_dev(struct mctp_dev *mdev);
int mctp_routes_init(void);
void mctp_routes_exit(void);
void mctp_device_init(void);
void mctp_device_exit(void);
#endif /* __NET_MCTP_H */
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Management Component Transport Protocol (MCTP) - device
* definitions.
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#ifndef __NET_MCTPDEVICE_H
#define __NET_MCTPDEVICE_H
#include <linux/list.h>
#include <linux/types.h>
#include <linux/refcount.h>
struct mctp_dev {
struct net_device *dev;
unsigned int net;
/* Only modified under RTNL. Reads have addrs_lock held */
u8 *addrs;
size_t num_addrs;
spinlock_t addrs_lock;
struct rcu_head rcu;
};
#define MCTP_INITIAL_DEFAULT_NET 1
struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev);
struct mctp_dev *__mctp_dev_get(const struct net_device *dev);
struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev);
#endif /* __NET_MCTPDEVICE_H */
......@@ -34,6 +34,7 @@
#include <net/netns/xdp.h>
#include <net/netns/smc.h>
#include <net/netns/bpf.h>
#include <net/netns/mctp.h>
#include <linux/ns_common.h>
#include <linux/idr.h>
#include <linux/skbuff.h>
......@@ -167,6 +168,9 @@ struct net {
#ifdef CONFIG_XDP_SOCKETS
struct netns_xdp xdp;
#endif
#if IS_ENABLED(CONFIG_MCTP)
struct netns_mctp mctp;
#endif
#if IS_ENABLED(CONFIG_CRYPTO_USER)
struct sock *crypto_nlsk;
#endif
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* MCTP per-net structures
*/
#ifndef __NETNS_MCTP_H__
#define __NETNS_MCTP_H__
#include <linux/types.h>
struct netns_mctp {
/* Only updated under RTNL, entries freed via RCU */
struct list_head routes;
/* Bound sockets: list of sockets bound by type.
* This list is updated from non-atomic contexts (under bind_lock),
* and read (under rcu) in packet rx
*/
struct mutex bind_lock;
struct hlist_head binds;
/* tag allocations. This list is read and updated from atomic contexts,
* but elements are free()ed after a RCU grace-period
*/
spinlock_t keys_lock;
struct hlist_head keys;
/* MCTP network */
unsigned int default_net;
/* neighbour table */
struct mutex neigh_lock;
struct list_head neighbours;
};
#endif /* __NETNS_MCTP_H__ */
......@@ -54,6 +54,7 @@
#define ARPHRD_X25 271 /* CCITT X.25 */
#define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */
#define ARPHRD_CAN 280 /* Controller Area Network */
#define ARPHRD_MCTP 290
#define ARPHRD_PPP 512
#define ARPHRD_CISCO 513 /* Cisco HDLC */
#define ARPHRD_HDLC ARPHRD_CISCO
......
......@@ -151,6 +151,9 @@
#define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and
* aggregation protocol
*/
#define ETH_P_MCTP 0x00FA /* Management component transport
* protocol packets
*/
/*
* This is an Ethernet frame header.
......
......@@ -1260,4 +1260,14 @@ struct ifla_rmnet_flags {
__u32 mask;
};
/* MCTP section */
enum {
IFLA_MCTP_UNSPEC,
IFLA_MCTP_NET,
__IFLA_MCTP_MAX,
};
#define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1)
#endif /* _UAPI_LINUX_IF_LINK_H */
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Management Component Transport Protocol (MCTP)
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#ifndef __UAPI_MCTP_H
#define __UAPI_MCTP_H
#include <linux/types.h>
typedef __u8 mctp_eid_t;
struct mctp_addr {
mctp_eid_t s_addr;
};
struct sockaddr_mctp {
unsigned short int smctp_family;
int smctp_network;
struct mctp_addr smctp_addr;
__u8 smctp_type;
__u8 smctp_tag;
};
#define MCTP_NET_ANY 0x0
#define MCTP_ADDR_NULL 0x00
#define MCTP_ADDR_ANY 0xff
#define MCTP_TAG_MASK 0x07
#define MCTP_TAG_OWNER 0x08
#endif /* __UAPI_MCTP_H */
......@@ -363,6 +363,7 @@ source "net/bluetooth/Kconfig"
source "net/rxrpc/Kconfig"
source "net/kcm/Kconfig"
source "net/strparser/Kconfig"
source "net/mctp/Kconfig"
config FIB_RULES
bool
......
......@@ -78,3 +78,4 @@ obj-$(CONFIG_QRTR) += qrtr/
obj-$(CONFIG_NET_NCSI) += ncsi/
obj-$(CONFIG_XDP_SOCKETS) += xdp/
obj-$(CONFIG_MPTCP) += mptcp/
obj-$(CONFIG_MCTP) += mctp/
......@@ -226,6 +226,7 @@ static struct lock_class_key af_family_kern_slock_keys[AF_MAX];
x "AF_IEEE802154", x "AF_CAIF" , x "AF_ALG" , \
x "AF_NFC" , x "AF_VSOCK" , x "AF_KCM" , \
x "AF_QIPCRTR", x "AF_SMC" , x "AF_XDP" , \
x "AF_MCTP" , \
x "AF_MAX"
static const char *const af_family_key_strings[AF_MAX+1] = {
......
menuconfig MCTP
depends on NET
tristate "MCTP core protocol support"
help
Management Component Transport Protocol (MCTP) is an in-system
protocol for communicating between management controllers and
their managed devices (peripherals, host processors, etc.). The
protocol is defined by DMTF specification DSP0236.
This option enables core MCTP support. For communicating with other
devices, you'll want to enable a driver for a specific hardware
channel.
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_MCTP) += mctp.o
mctp-objs := af_mctp.o device.o route.o neigh.o
// SPDX-License-Identifier: GPL-2.0
/*
* Management Component Transport Protocol (MCTP)
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#include <linux/if_arp.h>
#include <linux/net.h>
#include <linux/mctp.h>
#include <linux/module.h>
#include <linux/socket.h>
#include <net/mctp.h>
#include <net/mctpdevice.h>
#include <net/sock.h>
/* socket implementation */
static int mctp_release(struct socket *sock)
{
struct sock *sk = sock->sk;
if (sk) {
sock->sk = NULL;
sk->sk_prot->close(sk, 0);
}
return 0;
}
static int mctp_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
{
struct sock *sk = sock->sk;
struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
struct sockaddr_mctp *smctp;
int rc;
if (addrlen < sizeof(*smctp))
return -EINVAL;
if (addr->sa_family != AF_MCTP)
return -EAFNOSUPPORT;
if (!capable(CAP_NET_BIND_SERVICE))
return -EACCES;
/* it's a valid sockaddr for MCTP, cast and do protocol checks */
smctp = (struct sockaddr_mctp *)addr;
lock_sock(sk);
/* TODO: allow rebind */
if (sk_hashed(sk)) {
rc = -EADDRINUSE;
goto out_release;
}
msk->bind_net = smctp->smctp_network;
msk->bind_addr = smctp->smctp_addr.s_addr;
msk->bind_type = smctp->smctp_type & 0x7f; /* ignore the IC bit */
rc = sk->sk_prot->hash(sk);
out_release:
release_sock(sk);
return rc;
}
static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
DECLARE_SOCKADDR(struct sockaddr_mctp *, addr, msg->msg_name);
const int hlen = MCTP_HEADER_MAXLEN + sizeof(struct mctp_hdr);
int rc, addrlen = msg->msg_namelen;
struct sock *sk = sock->sk;
struct mctp_skb_cb *cb;
struct mctp_route *rt;
struct sk_buff *skb;
if (addr) {
if (addrlen < sizeof(struct sockaddr_mctp))
return -EINVAL;
if (addr->smctp_family != AF_MCTP)
return -EINVAL;
if (addr->smctp_tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER))
return -EINVAL;
} else {
/* TODO: connect()ed sockets */
return -EDESTADDRREQ;
}
if (!capable(CAP_NET_RAW))
return -EACCES;
if (addr->smctp_network == MCTP_NET_ANY)
addr->smctp_network = mctp_default_net(sock_net(sk));
rt = mctp_route_lookup(sock_net(sk), addr->smctp_network,
addr->smctp_addr.s_addr);
if (!rt)
return -EHOSTUNREACH;
skb = sock_alloc_send_skb(sk, hlen + 1 + len,
msg->msg_flags & MSG_DONTWAIT, &rc);
if (!skb)
return rc;
skb_reserve(skb, hlen);
/* set type as first byte in payload */
*(u8 *)skb_put(skb, 1) = addr->smctp_type;
rc = memcpy_from_msg((void *)skb_put(skb, len), msg, len);
if (rc < 0) {
kfree_skb(skb);
return rc;
}
/* set up cb */
cb = __mctp_cb(skb);
cb->net = addr->smctp_network;
rc = mctp_local_output(sk, rt, skb, addr->smctp_addr.s_addr,
addr->smctp_tag);
return rc ? : len;
}
static int mctp_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
int flags)
{
DECLARE_SOCKADDR(struct sockaddr_mctp *, addr, msg->msg_name);
struct sock *sk = sock->sk;
struct sk_buff *skb;
size_t msglen;
u8 type;
int rc;
if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK))
return -EOPNOTSUPP;
skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &rc);
if (!skb)
return rc;
if (!skb->len) {
rc = 0;
goto out_free;
}
/* extract message type, remove from data */
type = *((u8 *)skb->data);
msglen = skb->len - 1;
if (len < msglen)
msg->msg_flags |= MSG_TRUNC;
else
len = msglen;
rc = skb_copy_datagram_msg(skb, 1, msg, len);
if (rc < 0)
goto out_free;
sock_recv_ts_and_drops(msg, sk, skb);
if (addr) {
struct mctp_skb_cb *cb = mctp_cb(skb);
/* TODO: expand mctp_skb_cb for header fields? */
struct mctp_hdr *hdr = mctp_hdr(skb);
addr = msg->msg_name;
addr->smctp_family = AF_MCTP;
addr->smctp_network = cb->net;
addr->smctp_addr.s_addr = hdr->src;
addr->smctp_type = type;
addr->smctp_tag = hdr->flags_seq_tag &
(MCTP_HDR_TAG_MASK | MCTP_HDR_FLAG_TO);
msg->msg_namelen = sizeof(*addr);
}
rc = len;
if (flags & MSG_TRUNC)
rc = msglen;
out_free:
skb_free_datagram(sk, skb);
return rc;
}
static int mctp_setsockopt(struct socket *sock, int level, int optname,
sockptr_t optval, unsigned int optlen)
{
return -EINVAL;
}
static int mctp_getsockopt(struct socket *sock, int level, int optname,
char __user *optval, int __user *optlen)
{
return -EINVAL;
}
static const struct proto_ops mctp_dgram_ops = {
.family = PF_MCTP,
.release = mctp_release,
.bind = mctp_bind,
.connect = sock_no_connect,
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
.poll = datagram_poll,
.ioctl = sock_no_ioctl,
.gettstamp = sock_gettstamp,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
.setsockopt = mctp_setsockopt,
.getsockopt = mctp_getsockopt,
.sendmsg = mctp_sendmsg,
.recvmsg = mctp_recvmsg,
.mmap = sock_no_mmap,
.sendpage = sock_no_sendpage,
};
static int mctp_sk_init(struct sock *sk)
{
struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
INIT_HLIST_HEAD(&msk->keys);
return 0;
}
static void mctp_sk_close(struct sock *sk, long timeout)
{
sk_common_release(sk);
}
static int mctp_sk_hash(struct sock *sk)
{
struct net *net = sock_net(sk);
mutex_lock(&net->mctp.bind_lock);
sk_add_node_rcu(sk, &net->mctp.binds);
mutex_unlock(&net->mctp.bind_lock);
return 0;
}
static void mctp_sk_unhash(struct sock *sk)
{
struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk);
struct net *net = sock_net(sk);
struct mctp_sk_key *key;
struct hlist_node *tmp;
unsigned long flags;
/* remove from any type-based binds */
mutex_lock(&net->mctp.bind_lock);
sk_del_node_init_rcu(sk);
mutex_unlock(&net->mctp.bind_lock);
/* remove tag allocations */
spin_lock_irqsave(&net->mctp.keys_lock, flags);
hlist_for_each_entry_safe(key, tmp, &msk->keys, sklist) {
hlist_del_rcu(&key->sklist);
hlist_del_rcu(&key->hlist);
spin_lock(&key->reasm_lock);
if (key->reasm_head)
kfree_skb(key->reasm_head);
key->reasm_head = NULL;
key->reasm_dead = true;
spin_unlock(&key->reasm_lock);
kfree_rcu(key, rcu);
}
spin_unlock_irqrestore(&net->mctp.keys_lock, flags);
synchronize_rcu();
}
static struct proto mctp_proto = {
.name = "MCTP",
.owner = THIS_MODULE,
.obj_size = sizeof(struct mctp_sock),
.init = mctp_sk_init,
.close = mctp_sk_close,
.hash = mctp_sk_hash,
.unhash = mctp_sk_unhash,
};
static int mctp_pf_create(struct net *net, struct socket *sock,
int protocol, int kern)
{
const struct proto_ops *ops;
struct proto *proto;
struct sock *sk;
int rc;
if (protocol)
return -EPROTONOSUPPORT;
/* only datagram sockets are supported */
if (sock->type != SOCK_DGRAM)
return -ESOCKTNOSUPPORT;
proto = &mctp_proto;
ops = &mctp_dgram_ops;
sock->state = SS_UNCONNECTED;
sock->ops = ops;
sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern);
if (!sk)
return -ENOMEM;
sock_init_data(sock, sk);
rc = 0;
if (sk->sk_prot->init)
rc = sk->sk_prot->init(sk);
if (rc)
goto err_sk_put;
return 0;
err_sk_put:
sock_orphan(sk);
sock_put(sk);
return rc;
}
static struct net_proto_family mctp_pf = {
.family = PF_MCTP,
.create = mctp_pf_create,
.owner = THIS_MODULE,
};
static __init int mctp_init(void)
{
int rc;
/* ensure our uapi tag definitions match the header format */
BUILD_BUG_ON(MCTP_TAG_OWNER != MCTP_HDR_FLAG_TO);
BUILD_BUG_ON(MCTP_TAG_MASK != MCTP_HDR_TAG_MASK);
pr_info("mctp: management component transport protocol core\n");
rc = sock_register(&mctp_pf);
if (rc)
return rc;
rc = proto_register(&mctp_proto, 0);
if (rc)
goto err_unreg_sock;
rc = mctp_routes_init();
if (rc)
goto err_unreg_proto;
rc = mctp_neigh_init();
if (rc)
goto err_unreg_proto;
mctp_device_init();
return 0;
err_unreg_proto:
proto_unregister(&mctp_proto);
err_unreg_sock:
sock_unregister(PF_MCTP);
return rc;
}
static __exit void mctp_exit(void)
{
mctp_device_exit();
mctp_neigh_exit();
mctp_routes_exit();
proto_unregister(&mctp_proto);
sock_unregister(PF_MCTP);
}
module_init(mctp_init);
module_exit(mctp_exit);
MODULE_DESCRIPTION("MCTP core");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Jeremy Kerr <jk@codeconstruct.com.au>");
MODULE_ALIAS_NETPROTO(PF_MCTP);
// SPDX-License-Identifier: GPL-2.0
/*
* Management Component Transport Protocol (MCTP) - device implementation.
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#include <linux/if_link.h>
#include <linux/mctp.h>
#include <linux/netdevice.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <net/addrconf.h>
#include <net/netlink.h>
#include <net/mctp.h>
#include <net/mctpdevice.h>
#include <net/sock.h>
struct mctp_dump_cb {
int h;
int idx;
size_t a_idx;
};
/* unlocked: caller must hold rcu_read_lock */
struct mctp_dev *__mctp_dev_get(const struct net_device *dev)
{
return rcu_dereference(dev->mctp_ptr);
}
struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev)
{
return rtnl_dereference(dev->mctp_ptr);
}
static void mctp_dev_destroy(struct mctp_dev *mdev)
{
struct net_device *dev = mdev->dev;
dev_put(dev);
kfree_rcu(mdev, rcu);
}
static int mctp_fill_addrinfo(struct sk_buff *skb, struct netlink_callback *cb,
struct mctp_dev *mdev, mctp_eid_t eid)
{
struct ifaddrmsg *hdr;
struct nlmsghdr *nlh;
nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
RTM_NEWADDR, sizeof(*hdr), NLM_F_MULTI);
if (!nlh)
return -EMSGSIZE;
hdr = nlmsg_data(nlh);
hdr->ifa_family = AF_MCTP;
hdr->ifa_prefixlen = 0;
hdr->ifa_flags = 0;
hdr->ifa_scope = 0;
hdr->ifa_index = mdev->dev->ifindex;
if (nla_put_u8(skb, IFA_LOCAL, eid))
goto cancel;
if (nla_put_u8(skb, IFA_ADDRESS, eid))
goto cancel;
nlmsg_end(skb, nlh);
return 0;
cancel:
nlmsg_cancel(skb, nlh);
return -EMSGSIZE;
}
static int mctp_dump_dev_addrinfo(struct mctp_dev *mdev, struct sk_buff *skb,
struct netlink_callback *cb)
{
struct mctp_dump_cb *mcb = (void *)cb->ctx;
int rc = 0;
for (; mcb->a_idx < mdev->num_addrs; mcb->a_idx++) {
rc = mctp_fill_addrinfo(skb, cb, mdev, mdev->addrs[mcb->a_idx]);
if (rc < 0)
break;
}
return rc;
}
static int mctp_dump_addrinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
struct mctp_dump_cb *mcb = (void *)cb->ctx;
struct net *net = sock_net(skb->sk);
struct hlist_head *head;
struct net_device *dev;
struct ifaddrmsg *hdr;
struct mctp_dev *mdev;
int ifindex;
int idx = 0, rc;
hdr = nlmsg_data(cb->nlh);
// filter by ifindex if requested
ifindex = hdr->ifa_index;
rcu_read_lock();
for (; mcb->h < NETDEV_HASHENTRIES; mcb->h++, mcb->idx = 0) {
idx = 0;
head = &net->dev_index_head[mcb->h];
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx >= mcb->idx &&
(ifindex == 0 || ifindex == dev->ifindex)) {
mdev = __mctp_dev_get(dev);
if (mdev) {
rc = mctp_dump_dev_addrinfo(mdev,
skb, cb);
// Error indicates full buffer, this
// callback will get retried.
if (rc < 0)
goto out;
}
}
idx++;
// reset for next iteration
mcb->a_idx = 0;
}
}
out:
rcu_read_unlock();
mcb->idx = idx;
return skb->len;
}
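The dump callback above is resumable: it stores its position in the callback context, stops when the output buffer is full, and continues from the saved position on the next call. A minimal userspace sketch of that idea, assuming a hypothetical `struct dump_cursor` and a plain `int` array in place of devices and addresses (not the kernel's netlink API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor, playing the role of the saved dump context. */
struct dump_cursor {
	size_t idx;	/* next item to emit */
};

/*
 * Copy items into out[] until either the items or the room run out.
 * Returns the number written; 0 means the dump is complete.
 */
static size_t dump_some(const int *items, size_t n_items,
			struct dump_cursor *cur, int *out, size_t room)
{
	size_t written = 0;

	assert(items && cur && out);

	while (cur->idx < n_items && written < room)
		out[written++] = items[cur->idx++];

	return written;
}
```

A caller would invoke `dump_some()` repeatedly with the same cursor until it returns 0, much as a netlink dump is re-invoked until the callback adds no further data.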
static const struct nla_policy ifa_mctp_policy[IFA_MAX + 1] = {
[IFA_ADDRESS] = { .type = NLA_U8 },
[IFA_LOCAL] = { .type = NLA_U8 },
};
static int mctp_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct nlattr *tb[IFA_MAX + 1];
struct net_device *dev;
struct mctp_addr *addr;
struct mctp_dev *mdev;
struct ifaddrmsg *ifm;
unsigned long flags;
u8 *tmp_addrs;
int rc;
rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_mctp_policy,
extack);
if (rc < 0)
return rc;
ifm = nlmsg_data(nlh);
if (tb[IFA_LOCAL])
addr = nla_data(tb[IFA_LOCAL]);
else if (tb[IFA_ADDRESS])
addr = nla_data(tb[IFA_ADDRESS]);
else
return -EINVAL;
/* find device */
dev = __dev_get_by_index(net, ifm->ifa_index);
if (!dev)
return -ENODEV;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return -ENODEV;
if (!mctp_address_ok(addr->s_addr))
return -EINVAL;
/* Prevent duplicates. Under RTNL so don't need to lock for reading */
if (memchr(mdev->addrs, addr->s_addr, mdev->num_addrs))
return -EEXIST;
tmp_addrs = kmalloc(mdev->num_addrs + 1, GFP_KERNEL);
if (!tmp_addrs)
return -ENOMEM;
memcpy(tmp_addrs, mdev->addrs, mdev->num_addrs);
tmp_addrs[mdev->num_addrs] = addr->s_addr;
/* Lock to write */
spin_lock_irqsave(&mdev->addrs_lock, flags);
mdev->num_addrs++;
swap(mdev->addrs, tmp_addrs);
spin_unlock_irqrestore(&mdev->addrs_lock, flags);
kfree(tmp_addrs);
mctp_route_add_local(mdev, addr->s_addr);
return 0;
}
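mctp_rtm_newaddr() above rejects duplicate EIDs with memchr(), builds a new array one byte longer, and then swaps it in. A minimal userspace sketch of the same steps, without the locking, assuming a hypothetical `struct eid_set` in place of the per-device state:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-device address array. */
struct eid_set {
	uint8_t *addrs;
	size_t num_addrs;
};

/* Append eid unless already present; returns 0 or a negative errno. */
static int eid_set_add(struct eid_set *set, uint8_t eid)
{
	uint8_t *tmp;

	assert(set);

	/* skip the scan while the set is empty (addrs may still be NULL) */
	if (set->num_addrs && memchr(set->addrs, eid, set->num_addrs))
		return -EEXIST;

	tmp = malloc(set->num_addrs + 1);
	if (!tmp)
		return -ENOMEM;

	if (set->num_addrs)
		memcpy(tmp, set->addrs, set->num_addrs);
	tmp[set->num_addrs] = eid;

	/* switch to the new array, then release the old one */
	free(set->addrs);
	set->addrs = tmp;
	set->num_addrs++;
	return 0;
}
```

Building the replacement array before touching the live one keeps the update step short, which is why the kernel code only needs to hold its spinlock around the count increment and pointer swap.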
static int mctp_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct nlattr *tb[IFA_MAX + 1];
struct net_device *dev;
struct mctp_addr *addr;
struct mctp_dev *mdev;
struct ifaddrmsg *ifm;
unsigned long flags;
u8 *pos;
int rc;
rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_mctp_policy,
extack);
if (rc < 0)
return rc;
ifm = nlmsg_data(nlh);
if (tb[IFA_LOCAL])
addr = nla_data(tb[IFA_LOCAL]);
else if (tb[IFA_ADDRESS])
addr = nla_data(tb[IFA_ADDRESS]);
else
return -EINVAL;
/* find device */
dev = __dev_get_by_index(net, ifm->ifa_index);
if (!dev)
return -ENODEV;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return -ENODEV;
pos = memchr(mdev->addrs, addr->s_addr, mdev->num_addrs);
if (!pos)
return -ENOENT;
rc = mctp_route_remove_local(mdev, addr->s_addr);
// we can ignore -ENOENT in the case a route was already removed
if (rc < 0 && rc != -ENOENT)
return rc;
spin_lock_irqsave(&mdev->addrs_lock, flags);
memmove(pos, pos + 1, mdev->num_addrs - 1 - (pos - mdev->addrs));
mdev->num_addrs--;
spin_unlock_irqrestore(&mdev->addrs_lock, flags);
return 0;
}
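mctp_rtm_deladdr() above removes an EID by locating it with memchr() and closing the gap with memmove(). A minimal userspace sketch of that length arithmetic, assuming a hypothetical helper `eid_array_del()` operating on a plain byte array:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Remove the first occurrence of eid from addrs[0..*num); 0 or -ENOENT. */
static int eid_array_del(uint8_t *addrs, size_t *num, uint8_t eid)
{
	uint8_t *pos;

	assert(addrs && num);

	pos = memchr(addrs, eid, *num);
	if (!pos)
		return -ENOENT;

	/* bytes that follow pos: total - 1 - index of pos */
	memmove(pos, pos + 1, *num - 1 - (size_t)(pos - addrs));
	(*num)--;
	return 0;
}
```

When the matching entry is the last one, the computed length is 0 and memmove() copies nothing, so only the count changes.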
static struct mctp_dev *mctp_add_dev(struct net_device *dev)
{
struct mctp_dev *mdev;
ASSERT_RTNL();
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
if (!mdev)
return ERR_PTR(-ENOMEM);
spin_lock_init(&mdev->addrs_lock);
mdev->net = mctp_default_net(dev_net(dev));
/* associate to net_device */
rcu_assign_pointer(dev->mctp_ptr, mdev);
dev_hold(dev);
mdev->dev = dev;
return mdev;
}
static int mctp_fill_link_af(struct sk_buff *skb,
const struct net_device *dev, u32 ext_filter_mask)
{
struct mctp_dev *mdev;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return -ENODATA;
if (nla_put_u32(skb, IFLA_MCTP_NET, mdev->net))
return -EMSGSIZE;
return 0;
}
static size_t mctp_get_link_af_size(const struct net_device *dev,
u32 ext_filter_mask)
{
struct mctp_dev *mdev;
unsigned int ret;
/* caller holds RCU */
mdev = __mctp_dev_get(dev);
if (!mdev)
return 0;
ret = nla_total_size(4); /* IFLA_MCTP_NET */
return ret;
}
static const struct nla_policy ifla_af_mctp_policy[IFLA_MCTP_MAX + 1] = {
[IFLA_MCTP_NET] = { .type = NLA_U32 },
};
static int mctp_set_link_af(struct net_device *dev, const struct nlattr *attr,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[IFLA_MCTP_MAX + 1];
struct mctp_dev *mdev;
int rc;
rc = nla_parse_nested(tb, IFLA_MCTP_MAX, attr, ifla_af_mctp_policy,
NULL);
if (rc)
return rc;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return 0;
if (tb[IFLA_MCTP_NET])
WRITE_ONCE(mdev->net, nla_get_u32(tb[IFLA_MCTP_NET]));
return 0;
}
static void mctp_unregister(struct net_device *dev)
{
struct mctp_dev *mdev;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return;
RCU_INIT_POINTER(mdev->dev->mctp_ptr, NULL);
mctp_route_remove_dev(mdev);
mctp_neigh_remove_dev(mdev);
kfree(mdev->addrs);
mctp_dev_destroy(mdev);
}
static int mctp_register(struct net_device *dev)
{
struct mctp_dev *mdev;
/* Already registered? */
if (rtnl_dereference(dev->mctp_ptr))
return 0;
/* only register specific types; MCTP-specific and loopback for now */
if (dev->type != ARPHRD_MCTP && dev->type != ARPHRD_LOOPBACK)
return 0;
mdev = mctp_add_dev(dev);
if (IS_ERR(mdev))
return PTR_ERR(mdev);
return 0;
}
static int mctp_dev_notify(struct notifier_block *this, unsigned long event,
void *ptr)
{
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
int rc;
switch (event) {
case NETDEV_REGISTER:
rc = mctp_register(dev);
if (rc)
return notifier_from_errno(rc);
break;
case NETDEV_UNREGISTER:
mctp_unregister(dev);
break;
}
return NOTIFY_OK;
}
static struct rtnl_af_ops mctp_af_ops = {
.family = AF_MCTP,
.fill_link_af = mctp_fill_link_af,
.get_link_af_size = mctp_get_link_af_size,
.set_link_af = mctp_set_link_af,
};
static struct notifier_block mctp_dev_nb = {
.notifier_call = mctp_dev_notify,
.priority = ADDRCONF_NOTIFY_PRIORITY,
};
void __init mctp_device_init(void)
{
register_netdevice_notifier(&mctp_dev_nb);
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETADDR,
NULL, mctp_dump_addrinfo, 0);
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_NEWADDR,
mctp_rtm_newaddr, NULL, 0);
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_DELADDR,
mctp_rtm_deladdr, NULL, 0);
rtnl_af_register(&mctp_af_ops);
}
void __exit mctp_device_exit(void)
{
rtnl_af_unregister(&mctp_af_ops);
rtnl_unregister(PF_MCTP, RTM_DELADDR);
rtnl_unregister(PF_MCTP, RTM_NEWADDR);
rtnl_unregister(PF_MCTP, RTM_GETADDR);
unregister_netdevice_notifier(&mctp_dev_nb);
}
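The RTM_NEWADDR handler registered above expects an `ifaddrmsg` header followed by a one-byte IFA_LOCAL (or IFA_ADDRESS) attribute carrying the EID. A minimal userspace sketch of how such a request could be laid out is shown below. `struct mctp_addr_req` and `mctp_build_newaddr()` are hypothetical names, the fallback `AF_MCTP` value of 45 is inferred from the PF_MAX bump from 45 to 46 in this series, and the sketch only builds the message in memory without sending it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

#ifndef AF_MCTP
#define AF_MCTP 45	/* assumption: implied by PF_MAX going from 45 to 46 */
#endif

/* Request layout: netlink header, ifaddrmsg, then one u8 attribute. */
struct mctp_addr_req {
	struct nlmsghdr nh;
	struct ifaddrmsg ifa;
	char attrbuf[RTA_SPACE(sizeof(uint8_t))];
};

static void mctp_build_newaddr(struct mctp_addr_req *req,
			       unsigned int ifindex, uint8_t eid)
{
	struct rtattr *rta;

	assert(req);
	memset(req, 0, sizeof(*req));

	req->nh.nlmsg_type = RTM_NEWADDR;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifa));

	req->ifa.ifa_family = AF_MCTP;
	req->ifa.ifa_index = ifindex;

	/* the attribute starts right after the aligned fixed header */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = IFA_LOCAL;
	rta->rta_len = RTA_LENGTH(sizeof(eid));
	memcpy(RTA_DATA(rta), &eid, sizeof(eid));

	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) +
			    RTA_ALIGN(rta->rta_len);
}
```

To apply the address, a suitably privileged process would send the first `nh.nlmsg_len` bytes of the request on a `socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)` socket; that step is left out here.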
// SPDX-License-Identifier: GPL-2.0
/*
* Management Component Transport Protocol (MCTP) - neighbour
* implementation.
*
* Neighbour entries are kept on a simple per-namespace list and looked up
* linearly.
*
* Copyright (c) 2021 Code Construct
* Copyright (c) 2021 Google
*/
#include <linux/idr.h>
#include <linux/mctp.h>
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>
#include <net/mctp.h>
#include <net/mctpdevice.h>
#include <net/netlink.h>
#include <net/sock.h>
static int mctp_neigh_add(struct mctp_dev *mdev, mctp_eid_t eid,
enum mctp_neigh_source source,
size_t lladdr_len, const void *lladdr)
{
struct net *net = dev_net(mdev->dev);
struct mctp_neigh *neigh;
int rc;
mutex_lock(&net->mctp.neigh_lock);
if (mctp_neigh_lookup(mdev, eid, NULL) == 0) {
rc = -EEXIST;
goto out;
}
if (lladdr_len > sizeof(neigh->ha)) {
rc = -EINVAL;
goto out;
}
neigh = kzalloc(sizeof(*neigh), GFP_KERNEL);
if (!neigh) {
rc = -ENOMEM;
goto out;
}
INIT_LIST_HEAD(&neigh->list);
neigh->dev = mdev;
dev_hold(neigh->dev->dev);
neigh->eid = eid;
neigh->source = source;
memcpy(neigh->ha, lladdr, lladdr_len);
list_add_rcu(&neigh->list, &net->mctp.neighbours);
rc = 0;
out:
mutex_unlock(&net->mctp.neigh_lock);
return rc;
}
static void __mctp_neigh_free(struct rcu_head *rcu)
{
struct mctp_neigh *neigh = container_of(rcu, struct mctp_neigh, rcu);
dev_put(neigh->dev->dev);
kfree(neigh);
}
/* Removes all neighbour entries referring to a device */
void mctp_neigh_remove_dev(struct mctp_dev *mdev)
{
struct net *net = dev_net(mdev->dev);
struct mctp_neigh *neigh, *tmp;
mutex_lock(&net->mctp.neigh_lock);
list_for_each_entry_safe(neigh, tmp, &net->mctp.neighbours, list) {
if (neigh->dev == mdev) {
list_del_rcu(&neigh->list);
/* TODO: immediate RTM_DELNEIGH */
call_rcu(&neigh->rcu, __mctp_neigh_free);
}
}
mutex_unlock(&net->mctp.neigh_lock);
}
// TODO: add a "source" flag so netlink can only delete static neighbours?
static int mctp_neigh_remove(struct mctp_dev *mdev, mctp_eid_t eid)
{
struct net *net = dev_net(mdev->dev);
struct mctp_neigh *neigh, *tmp;
bool dropped = false;
mutex_lock(&net->mctp.neigh_lock);
list_for_each_entry_safe(neigh, tmp, &net->mctp.neighbours, list) {
if (neigh->dev == mdev && neigh->eid == eid) {
list_del_rcu(&neigh->list);
/* TODO: immediate RTM_DELNEIGH */
call_rcu(&neigh->rcu, __mctp_neigh_free);
dropped = true;
}
}
mutex_unlock(&net->mctp.neigh_lock);
return dropped ? 0 : -ENOENT;
}
static const struct nla_policy nd_mctp_policy[NDA_MAX + 1] = {
[NDA_DST] = { .type = NLA_U8 },
[NDA_LLADDR] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
};
static int mctp_rtm_newneigh(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct net_device *dev;
struct mctp_dev *mdev;
struct ndmsg *ndm;
struct nlattr *tb[NDA_MAX + 1];
int rc;
mctp_eid_t eid;
void *lladdr;
int lladdr_len;
rc = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, nd_mctp_policy,
extack);
if (rc < 0) {
NL_SET_ERR_MSG(extack, "lladdr too large?");
return rc;
}
if (!tb[NDA_DST]) {
NL_SET_ERR_MSG(extack, "Neighbour EID must be specified");
return -EINVAL;
}
if (!tb[NDA_LLADDR]) {
NL_SET_ERR_MSG(extack, "Neighbour lladdr must be specified");
return -EINVAL;
}
eid = nla_get_u8(tb[NDA_DST]);
if (!mctp_address_ok(eid)) {
NL_SET_ERR_MSG(extack, "Invalid neighbour EID");
return -EINVAL;
}
lladdr = nla_data(tb[NDA_LLADDR]);
lladdr_len = nla_len(tb[NDA_LLADDR]);
ndm = nlmsg_data(nlh);
dev = __dev_get_by_index(net, ndm->ndm_ifindex);
if (!dev)
return -ENODEV;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return -ENODEV;
if (lladdr_len != dev->addr_len) {
NL_SET_ERR_MSG(extack, "Wrong lladdr length");
return -EINVAL;
}
return mctp_neigh_add(mdev, eid, MCTP_NEIGH_STATIC,
lladdr_len, lladdr);
}
static int mctp_rtm_delneigh(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NDA_MAX + 1];
struct net_device *dev;
struct mctp_dev *mdev;
struct ndmsg *ndm;
int rc;
mctp_eid_t eid;
rc = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, nd_mctp_policy,
extack);
if (rc < 0) {
NL_SET_ERR_MSG(extack, "incorrect format");
return rc;
}
if (!tb[NDA_DST]) {
NL_SET_ERR_MSG(extack, "Neighbour EID must be specified");
return -EINVAL;
}
eid = nla_get_u8(tb[NDA_DST]);
ndm = nlmsg_data(nlh);
dev = __dev_get_by_index(net, ndm->ndm_ifindex);
if (!dev)
return -ENODEV;
mdev = mctp_dev_get_rtnl(dev);
if (!mdev)
return -ENODEV;
return mctp_neigh_remove(mdev, eid);
}
static int mctp_fill_neigh(struct sk_buff *skb, u32 portid, u32 seq, int event,
unsigned int flags, struct mctp_neigh *neigh)
{
struct net_device *dev = neigh->dev->dev;
struct nlmsghdr *nlh;
struct ndmsg *hdr;
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags);
if (!nlh)
return -EMSGSIZE;
hdr = nlmsg_data(nlh);
hdr->ndm_family = AF_MCTP;
hdr->ndm_ifindex = dev->ifindex;
hdr->ndm_state = 0; // TODO other state bits?
if (neigh->source == MCTP_NEIGH_STATIC)
hdr->ndm_state |= NUD_PERMANENT;
hdr->ndm_flags = 0;
hdr->ndm_type = RTN_UNICAST; // TODO: is loopback RTN_LOCAL?
if (nla_put_u8(skb, NDA_DST, neigh->eid))
goto cancel;
if (nla_put(skb, NDA_LLADDR, dev->addr_len, neigh->ha))
goto cancel;
nlmsg_end(skb, nlh);
return 0;
cancel:
nlmsg_cancel(skb, nlh);
return -EMSGSIZE;
}
static int mctp_rtm_getneigh(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
int rc, idx, req_ifindex;
struct mctp_neigh *neigh;
struct ndmsg *ndmsg;
struct {
int idx;
} *cbctx = (void *)cb->ctx;
ndmsg = nlmsg_data(cb->nlh);
req_ifindex = ndmsg->ndm_ifindex;
idx = 0;
rcu_read_lock();
list_for_each_entry_rcu(neigh, &net->mctp.neighbours, list) {
if (idx < cbctx->idx)
goto cont;
rc = 0;
if (req_ifindex == 0 || req_ifindex == neigh->dev->dev->ifindex)
rc = mctp_fill_neigh(skb, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWNEIGH, NLM_F_MULTI, neigh);
if (rc)
break;
cont:
idx++;
}
rcu_read_unlock();
cbctx->idx = idx;
return skb->len;
}
int mctp_neigh_lookup(struct mctp_dev *mdev, mctp_eid_t eid, void *ret_hwaddr)
{
struct net *net = dev_net(mdev->dev);
struct mctp_neigh *neigh;
int rc = -EHOSTUNREACH; // TODO: or ENOENT?
rcu_read_lock();
list_for_each_entry_rcu(neigh, &net->mctp.neighbours, list) {
if (mdev == neigh->dev && eid == neigh->eid) {
if (ret_hwaddr)
memcpy(ret_hwaddr, neigh->ha,
sizeof(neigh->ha));
rc = 0;
break;
}
}
rcu_read_unlock();
return rc;
}
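mctp_neigh_lookup() above walks the neighbour list, matches on device and EID, and optionally copies out the hardware address. A minimal userspace sketch of that lookup, assuming a hypothetical `struct demo_neigh` table where an interface index stands in for the `mctp_dev` pointer and the address size is fixed at 4 bytes:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HA_LEN 4	/* hypothetical hardware-address size */

struct demo_neigh {
	int ifindex;		/* stands in for the mctp_dev pointer */
	uint8_t eid;
	uint8_t ha[DEMO_HA_LEN];
};

/* Find (ifindex, eid); optionally copy out the hardware address. */
static int demo_neigh_lookup(const struct demo_neigh *tbl, size_t n,
			     int ifindex, uint8_t eid, void *ret_hwaddr)
{
	size_t i;

	assert(tbl || n == 0);

	for (i = 0; i < n; i++) {
		if (tbl[i].ifindex != ifindex || tbl[i].eid != eid)
			continue;
		if (ret_hwaddr)
			memcpy(ret_hwaddr, tbl[i].ha, sizeof(tbl[i].ha));
		return 0;
	}
	return -EHOSTUNREACH;
}
```

Passing NULL for `ret_hwaddr` turns the call into a pure existence check, which is how mctp_neigh_add() uses the kernel function to detect duplicates.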
/* namespace registration */
static int __net_init mctp_neigh_net_init(struct net *net)
{
struct netns_mctp *ns = &net->mctp;
INIT_LIST_HEAD(&ns->neighbours);
mutex_init(&ns->neigh_lock);
return 0;
}
static void __net_exit mctp_neigh_net_exit(struct net *net)
{
struct netns_mctp *ns = &net->mctp;
struct mctp_neigh *neigh;
list_for_each_entry(neigh, &ns->neighbours, list)
call_rcu(&neigh->rcu, __mctp_neigh_free);
}
/* net namespace implementation */
static struct pernet_operations mctp_net_ops = {
.init = mctp_neigh_net_init,
.exit = mctp_neigh_net_exit,
};
int __init mctp_neigh_init(void)
{
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_NEWNEIGH,
mctp_rtm_newneigh, NULL, 0);
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_DELNEIGH,
mctp_rtm_delneigh, NULL, 0);
rtnl_register_module(THIS_MODULE, PF_MCTP, RTM_GETNEIGH,
NULL, mctp_rtm_getneigh, 0);
return register_pernet_subsys(&mctp_net_ops);
}
void __exit mctp_neigh_exit(void)
{
unregister_pernet_subsys(&mctp_net_ops);
rtnl_unregister(PF_MCTP, RTM_GETNEIGH);
rtnl_unregister(PF_MCTP, RTM_DELNEIGH);
rtnl_unregister(PF_MCTP, RTM_NEWNEIGH);
}
@@ -212,6 +212,7 @@ static const char * const pf_family_names[] = {
 [PF_QIPCRTR] = "PF_QIPCRTR",
 [PF_SMC] = "PF_SMC",
 [PF_XDP] = "PF_XDP",
+[PF_MCTP] = "PF_MCTP",
 };
 /*
@@ -1330,7 +1330,9 @@ static inline u16 socket_type_to_security_class(int family, int type, int protoc
 return SECCLASS_SMC_SOCKET;
 case PF_XDP:
 return SECCLASS_XDP_SOCKET;
-#if PF_MAX > 45
+case PF_MCTP:
+return SECCLASS_MCTP_SOCKET;
+#if PF_MAX > 46
 #error New address family defined, please update this function.
 #endif
 }
@@ -246,6 +246,8 @@ struct security_class_mapping secclass_map[] = {
 NULL } },
 { "xdp_socket",
 { COMMON_SOCK_PERMS, NULL } },
+{ "mctp_socket",
+{ COMMON_SOCK_PERMS, NULL } },
 { "perf_event",
 { "open", "cpu", "kernel", "tracepoint", "read", "write", NULL } },
 { "lockdown",
@@ -255,6 +257,6 @@ struct security_class_mapping secclass_map[] = {
 { NULL }
 };
-#if PF_MAX > 45
+#if PF_MAX > 46
 #error New address family defined, please update secclass_map.
 #endif