Commits · 7c4c24168014f250241b6df66ca5bae37eda7ffc · Kirill Smelkov / linux

24 Jul, 2020 32 commits

Merge branch 'get-rid-of-the-address_space-override-in-setsockopt-v2' · 7c4c2416

David S. Miller authored Jul 24, 2020

Christoph Hellwig says:

====================
get rid of the address_space override in setsockopt v2

setsockopt is the last place in architecture-independ code that still
uses set_fs to force the uaccess routines to operate on kernel pointers.

This series adds a new sockptr_t type that can contained either a kernel
or user pointer, and which has accessors that do the right thing, and
then uses it for setsockopt, starting by refactoring some low-level
helpers and moving them over to it before finally doing the main
setsockopt method.

Note that apparently the eBPF selftests do not even cover this path, so
the series has been tested with a testing patch that always copies the
data first and passes a kernel pointer.  This is something that works for
most common sockopts (and is something that the ePBF support relies on),
but unfortunately in various corner cases we either don't use the passed
in length, or in one case actually copy data back from setsockopt, or in
case of bpfilter straight out do not work with kernel pointers at all.

Against net-next/master.

Changes since v1:
 - check that users don't pass in kernel addresses
 - more bpfilter cleanups
 - cosmetic mptcp tweak
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c4c2416

net: optimize the sockptr_t for unified kernel/user address spaces · 6d04fe15

Christoph Hellwig authored Jul 23, 2020

For architectures like x86 and arm64 we don't need the separate bit to
indicate that a pointer is a kernel pointer as the address spaces are
unified.  That way the sockptr_t can be reduced to a union of two
pointers, which leads to nicer calling conventions.

The only caveat is that we need to check that users don't pass in kernel
address and thus gain access to kernel memory.  Thus the USER_SOCKPTR
helper is replaced with a init_user_sockptr function that does this check
and returns an error if it fails.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d04fe15

net: pass a sockptr_t into ->setsockopt · a7b75c5a

Christoph Hellwig authored Jul 23, 2020

Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer.  This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7b75c5a

net/tcp: switch do_tcp_setsockopt to sockptr_t · d38d2b00

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d38d2b00

net/tcp: switch ->md5_parse to sockptr_t · d4c19c49

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4c19c49

net/udp: switch udp_lib_setsockopt to sockptr_t · 91ac1cca

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

91ac1cca

net/ipv6: switch do_ipv6_setsockopt to sockptr_t · 894cfbc0

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

894cfbc0

net/ipv6: factor out a ipv6_set_opt_hdr helper · b84d2b73

Christoph Hellwig authored Jul 23, 2020

Factour out a helper to set the IPv6 option headers from
do_ipv6_setsockopt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b84d2b73

net/ipv6: switch ipv6_flowlabel_opt to sockptr_t · 86298285

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Note that the get case is pretty weird in that it actually copies data
back to userspace from setsockopt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

86298285

net/ipv6: split up ipv6_flowlabel_opt · ff6a4cf2

Christoph Hellwig authored Jul 23, 2020

Split ipv6_flowlabel_opt into a subfunction for each action and a small
wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff6a4cf2

net/ipv6: switch ip6_mroute_setsockopt to sockptr_t · b43c6153

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b43c6153

net/ipv4: switch do_ip_setsockopt to sockptr_t · 89654c5f

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

89654c5f

net/ipv4: merge ip_options_get and ip_options_get_from_user · de40a3e8

Christoph Hellwig authored Jul 23, 2020

Use the sockptr_t type to merge the versions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

de40a3e8

net/ipv4: switch ip_mroute_setsockopt to sockptr_t · 01ccb5b4

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

01ccb5b4

bpfilter: switch bpfilter_ip_set_sockopt to sockptr_t · b03afaa8

Christoph Hellwig authored Jul 23, 2020

This is mostly to prepare for cleaning up the callers, as bpfilter by
design can't handle kernel pointers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b03afaa8

netfilter: switch nf_setsockopt to sockptr_t · c2f12630

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2f12630

netfilter: switch xt_copy_counters to sockptr_t · ab214d1b

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab214d1b

netfilter: remove the unused user argument to do_update_counters · 7e4b9dba
Christoph Hellwig authored Jul 23, 2020
```
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
7e4b9dba

net/xfrm: switch xfrm_user_policy to sockptr_t · c6d1b26a

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6d1b26a

net: switch sock_set_timeout to sockptr_t · c8c1bbb6

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8c1bbb6

net: switch sock_set_timeout to sockptr_t · c34645ac

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c34645ac

net: switch sock_setbindtodevice to sockptr_t · 5790642b

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

5790642b

net: switch copy_bpf_fprog_from_user to sockptr_t · b1ea9ff6

Christoph Hellwig authored Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1ea9ff6

net: add a new sockptr_t type · ba423fda

Christoph Hellwig authored Jul 23, 2020

Add a uptr_t type that can hold a pointer to either a user or kernel
memory region, and simply helpers to copy to and from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba423fda

bpfilter: reject kernel addresses · d200cf62

Christoph Hellwig authored Jul 23, 2020

The bpfilter user mode helper processes the optval address using
process_vm_readv.  Don't send it kernel addresses fed under
set_fs(KERNEL_DS) as that won't work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d200cf62

net/bpfilter: split __bpfilter_process_sockopt · c9ffebdd

Christoph Hellwig authored Jul 23, 2020

Split __bpfilter_process_sockopt into a low-level send request routine and
the actual setsockopt hook to split the init time ping from the actual
setsockopt processing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9ffebdd

bpfilter: fix up a sparse annotation · e024e008

Christoph Hellwig authored Jul 23, 2020

The __user doesn't make sense when casting to an integer type, just
switch to a uintptr_t cast which also removes the need for the __force.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e024e008

Merge branch 'TC-datapath-hash-api' · 197569f7

David S. Miller authored Jul 24, 2020

Ariel Levkovich says:

====================
TC datapath hash api

Hash based packet classification allows user to set up rules that
provide load balancing of traffic across multiple vports and
for ECMP path selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on a their hash value and distribute
the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The series introduces an extention to the cls flower classifier
and allows user to add rules that match on the hash value that
is stored in skb->hash while assuming the value was set prior to
the classification.

Setting the skb->hash can be done in various ways and is not defined
in this series - for example:
1. By the device driver upon processing an rx packet.
2. Using tc action bpf with a program which computes and sets the
skb->hash value.

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v3 -> v4:
 *Drop hash setting code leaving only the classidication parts.
  Setting the hash will be possible via existing tc action bpf.

v2 -> v3:
 *Split hash algorithm option into 2 different actions.
  Asym_l4 available via act_skbedit and bpf via new act_hash.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

197569f7

net/sched: cls_flower: Add hash info to flow classification · 5923b8f7

Ariel Levkovich authored Jul 23, 2020

Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5923b8f7

net/flow_dissector: add packet hash dissection · 0cb09aff

Ariel Levkovich authored Jul 23, 2020

Retreive a hash value from the SKB and store it
in the dissector key for future matching.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cb09aff

net: hyperv: dump TX indirection table to ethtool regs · 4a062d66

Chi Song authored Jul 23, 2020

An imbalanced TX indirection table causes netvsc to have low
performance. This table is created and managed during runtime. To help
better diagnose performance issues caused by imbalanced tables, it needs
make TX indirection tables visible.

Because TX indirection table is driver specified information, so
display it via ethtool register dump.
Signed-off-by: Chi Song <chisong@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a062d66

vrf: Handle CONFIG_SYSCTL not set · 1b6687e3

David Ahern authored Jul 23, 2020

Randy reported compile failure when CONFIG_SYSCTL is not set/enabled:

ERROR: modpost: "sysctl_vals" [drivers/net/vrf.ko] undefined!

Fix by splitting out the sysctl init and cleanup into helpers that
can be set to do nothing when CONFIG_SYSCTL is disabled. In addition,
move vrf_strict_mode and vrf_strict_mode_change to above
vrf_shared_table_handler (code move only) and wrap all of it
in the ifdef CONFIG_SYSCTL.

Update the strict mode tests to check for the existence of the
/proc/sys entry.

Fixes: 33306f1a ("vrf: add sysctl parameter for strict mode")
Cc: Andrea Mayer <andrea.mayer@uniroma2.it>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>

1b6687e3

23 Jul, 2020 8 commits

net: dsa: stop overriding master's ndo_get_phys_port_name · 5df5661a

Vladimir Oltean authored Jul 23, 2020

The purpose of this override is to give the user an indication of what
the number of the CPU port is (in DSA, the CPU port is a hardware
implementation detail and not a network interface capable of traffic).

However, it has always failed (by design) at providing this information
to the user in a reliable fashion.

Prior to commit 3369afba ("net: Call into DSA netdevice_ops
wrappers"), the behavior was to only override this callback if it was
not provided by the DSA master.

That was its first failure: if the DSA master itself was a DSA port or a
switchdev, then the user would not see the number of the CPU port in
/sys/class/net/eth0/phys_port_name, but the number of the DSA master
port within its respective physical switch.

But that was actually ok in a way. The commit mentioned above changed
that behavior, and now overrides the master's ndo_get_phys_port_name
unconditionally. That comes with problems of its own, which are worse in
a way.

The idea is that it's typical for switchdev users to have udev rules for
consistent interface naming. These are based, among other things, on
the phys_port_name attribute. If we let the DSA switch at the bottom
to start randomly overriding ndo_get_phys_port_name with its own CPU
port, we basically lose any predictability in interface naming, or even
uniqueness, for that matter.

So, there are reasons to let DSA override the master's callback (to
provide a consistent interface, a number which has a clear meaning and
must not be interpreted according to context), and there are reasons to
not let DSA override it (it breaks udev matching for the DSA master).

But, there is an alternative method for users to retrieve the number of
the CPU port of each DSA switch in the system:

  $ devlink port
  pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0
  pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2
  pci/0000:00:00.5/4: type notset flavour cpu port 4
  spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0
  spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1
  spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2
  spi/spi2.0/4: type notset flavour cpu port 4
  spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0
  spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1
  spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2
  spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3
  spi/spi2.1/4: type notset flavour cpu port 4

So remove this duplicated, unreliable and troublesome method. From this
patch on, the phys_port_name attribute of the DSA master will only
contain information about itself (if at all). If the users need reliable
information about the CPU port they're probably using devlink anyway.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5df5661a

cxgb4: add loopback ethtool self-test · 7235ffae

Vishal Kulkarni authored Jul 23, 2020

In this test, loopback pkt is created and sent on default queue.
The packet goes until the Multi Port Switch (MPS) just before
the MAC and based on the specified channel number, it either
goes outside the wire on one of the physical ports or looped
back to Rx path by MPS. In this case, we're specifying loopback
channel, instead of physical ports, so the packet gets looped
back to Rx path, instead of getting transmitted on the wire.

v3:
- Modify commit message to include test details.
v2:
- Add only loopback self-test.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7235ffae

Merge branch 'l2tp-further-checkpatch-pl-cleanups' · 15be4ea3

David S. Miller authored Jul 23, 2020

Tom Parkin says:

====================
l2tp: further checkpatch.pl cleanups

l2tp hasn't been kept up to date with the static analysis checks offered
by checkpatch.pl.

This patchset builds on the series "l2tp: cleanup checkpatch.pl
warnings".  It includes small refactoring changes which improve code
quality and resolve a subset of the checkpatch warnings for the l2tp
codebase.
====================
Reviewed-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>