Commits · be32a24372cf162e825332da1a7ccef058d4f20b · Kirill Smelkov / linux

10 Jun, 2019 4 commits

ibmvnic: Refresh device multicast list after reset · be32a243

Thomas Falcon authored Jun 07, 2019

It was observed that multicast packets were no longer received after
a device reset.  The fix is to resend the current multicast list to
the backing device after recovery.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be32a243

ibmvnic: Do not close unopened driver during reset · 1f94608b

Thomas Falcon authored Jun 07, 2019

Check driver state before halting it during a reset. If the driver is
not running, do nothing. Otherwise, a request to deactivate a down link
can cause an error and the reset will fail.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f94608b

Merge tag 'mlx5-fixes-2019-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4172eadb

David S. Miller authored Jun 09, 2019

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-06-07

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.17
  ('net/mlx5: Avoid reloading already removed devices')

For -stable v5.0
  ('net/mlx5e: Avoid detaching non-existing netdev under switchdev mode')

For -stable v5.1
  ('net/mlx5e: Fix source port matching in fdb peer flow rule')
  ('net/mlx5e: Support tagged tunnel over bond')
  ('net/mlx5e: Add ndo_set_feature for uplink representor')
  ('net/mlx5: Update pci error handler entries and command translation')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4172eadb

Merge tag 'linux-can-fixes-for-5.2-20190607' of... · 62f42a11

David S. Miller authored Jun 09, 2019

Merge tag 'linux-can-fixes-for-5.2-20190607' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2019-06-07

this is a pull reqeust of 9 patches for net/master.

The first patch is by Alexander Dahl and removes a duplicate menu entry from
the Kconfig. The next patch by Joakim Zhang fixes the timeout in the flexcan
driver when setting small bit rates. Anssi Hannula's patch for the xilinx_can
driver fixes the bittiming_const for CAN FD core. The two patches by Sean
Nyekjaer bring mcp25625 to the existing mcp251x driver. The patch by Eugen
Hristev implements an errata for the m_can driver. YueHaibing's patch fixes the
error handling ing can_init(). The patch by Fabio Estevam for the flexcan
driver removes an unneeded registration message during flexcan_probe(). And the
last patch is by Willem de Bruijn and adds the missing purging the  socket
error queue on sock destruct.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

62f42a11

09 Jun, 2019 4 commits

mpls: fix warning with multi-label encap · 2f3f7d1f

George Wilkie authored Jun 07, 2019

If you configure a route with multiple labels, e.g.
  ip route add 10.10.3.0/24 encap mpls 16/100 via 10.10.2.2 dev ens4
A warning is logged:
  kernel: [  130.561819] netlink: 'ip': attribute type 1 has an invalid
  length.

This happens because mpls_iptunnel_policy has set the type of
MPLS_IPTUNNEL_DST to fixed size NLA_U32.
Change it to a minimum size.
nla_get_labels() does the remaining validation.

Fixes: e3e4712e ("mpls: ip tunnel support")
Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f3f7d1f

net: phy: rename Asix Electronics PHY driver · a9520543

Michael Schmitz authored Jun 07, 2019

[Resent to net instead of net-next - may clash with Anders Roxell's patch
series addressing duplicate module names]

Commit 31dd83b9 ("net-next: phy: new Asix Electronics PHY driver")
introduced a new PHY driver drivers/net/phy/asix.c that causes a module
name conflict with a pre-existiting driver (drivers/net/usb/asix.c).

The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded
by that driver via its PHY ID. A rename of the driver looks unproblematic.

Rename PHY driver to ax88796b.c in order to resolve name conflict.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Fixes: 31dd83b9 ("net-next: phy: new Asix Electronics PHY driver")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9520543

ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero · 65a3c497

Eric Dumazet authored Jun 06, 2019

Before taking a refcount, make sure the object is not already
scheduled for deletion.

Same fix is needed in ipv6_flowlabel_opt()

Fixes: 18367681 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65a3c497

net: ipv4: fib_semantics: fix uninitialized variable · c3fee640

Enrico Weigelt authored Jun 06, 2019

fix an uninitialized variable:

  CC      net/ipv4/fib_semantics.o
net/ipv4/fib_semantics.c: In function 'fib_check_nh_v4_gw':
net/ipv4/fib_semantics.c:1027:12: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (!tbl || err) {
            ^~
Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3fee640

07 Jun, 2019 21 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 38e406f6

David S. Miller authored Jun 07, 2019

Daniel Borkmann says:

====================
pull-request: bpf 2019-06-07

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix several bugs in riscv64 JIT code emission which forgot to clear high
   32-bits for alu32 ops, from Björn and Luke with selftests covering all
   relevant BPF alu ops from Björn and Jiong.

2) Two fixes for UDP BPF reuseport that avoid calling the program in case of
   __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption
   that skb->data is pointing to transport header, from Martin.

3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog
   workqueue, and a missing restore of sk_write_space when psock gets dropped,
   from Jakub and John.

4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it
   breaks standard applications like DNS if reverse NAT is not performed upon
   receive, from Daniel.

5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6
   fails to verify that the length of the tuple is long enough, from Lorenz.

6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for
   {un,}successful probe) as that is expected to be propagated as an fd to
   load_sk_storage_btf() and thus closing the wrong descriptor otherwise,
   from Michal.

7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir.

8) Minor misc fixes in docs, samples and selftests, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

38e406f6

net/mlx5e: Support tagged tunnel over bond · 45e7d4c0

Eli Britstein authored Jun 02, 2019

Stacked devices like bond interface may have a VLAN device on top of
them. Detect lag state correctly under this condition, and return the
correct routed net device, according to it the encap header is built.

Fixes: e32ee6c7 ("net/mlx5e: Support tunnel encap over tagged Ethernet")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

45e7d4c0

net/mlx5e: Avoid detaching non-existing netdev under switchdev mode · 47c9d2c9

Alaa Hleihel authored May 26, 2019

After introducing dedicated uplink representor, the netdev instance
set over the esw manager vport (PF) became no longer in use, so it was
removed in the cited commit once we're on switchdev mode.
However, the mlx5e_detach function was not updated accordingly, and it
still tries to detach a non-existing netdev, causing a kernel crash.

This patch fixes this issue.

Fixes: aec002f6 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

47c9d2c9

net/mlx5e: Fix source port matching in fdb peer flow rule · b83c0730

Raed Salem authored Jun 02, 2019

The cited commit changed the initialization placement of the eswitch
attributes so it is done prior to parse tc actions function call,
including among others the in_rep and in_mdev fields which are mistakenly
reassigned inside the parse actions function.

This breaks the source port matching criteria of the peer redirect rule.

Fix by removing the now redundant reassignment of the already initialized
fields.

Fixes: 988ab9c7 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

b83c0730

net/mlx5e: Replace reciprocal_scale in TX select queue function · 57c70d87

Shay Agroskin authored Apr 28, 2019

The TX queue index returned by the fallback function ranges
between [0,NUM CHANNELS - 1] if QoS isn't set and
[0, (NUM CHANNELS)*(NUM TCs) -1] otherwise.

Our HW uses different TC mapping than the fallback function
(which is denoted as 'up', user priority) so we only need to extract
a channel number out of the returned value.

Since (NUM CHANNELS)*(NUM TCs) is a relatively small number, using
reciprocal scale almost always returns zero.
We instead access the 'txq2sq' table to extract the sq (and with it the
channel number) associated with the tx queue, thus getting
a more evenly distributed channel number.

Perf:

Rx/Tx side with Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz and ConnectX-5.
Used 'iperf' UDP traffic, 10 threads, and priority 5.

Before:	0.566Mpps
After:	 2.37Mpps

As expected, releasing the existing bottleneck of steering all traffic
to TX queue zero significantly improves transmission rates.

Fixes: 7ccdd084 ("net/mlx5e: Fix select queue callback")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

57c70d87

net/mlx5e: Add ndo_set_feature for uplink representor · d3cbd425

Chris Mi authored May 16, 2019

After we have a dedicated uplink representor, the new netdev ops
doesn't support ndo_set_feature. Because of that, we can't change
some features, eg. rxvlan. Now add it back.

In this patch, I also do a cleanup for the features flag handling,
eg. remove duplicate NETIF_F_HW_TC flag setting.

Fixes: aec002f6 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d3cbd425

net/mlx5: Avoid reloading already removed devices · dd80857b

Alaa Hleihel authored May 19, 2019

Prior to reloading a device we must first verify that it was not already
removed. Otherwise, the attempt to remove the device will do nothing, and
in that case we will end up proceeding with adding an new device that no
one was expecting to remove, leaving behind used resources such as EQs that
causes a failure to destroy comp EQs and syndrome (0x30f433).

Fix that by making sure that we try to remove and add a device (based on a
protocol) only if the device is already added.

Fixes: c5447c70 ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

dd80857b

net/mlx5: Update pci error handler entries and command translation · 6a6fabbf

Edward Srouji authored May 23, 2019

Add missing entries for create/destroy UCTX and UMEM commands.
This could get us wrong "unknown FW command" error in flows
where we unbind the device or reset the driver.

Also the translation of these commands from opcodes to string
was missing.

Fixes: 6e3722ba ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation")
Signed-off-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

6a6fabbf

can: purge socket error queue on sock destruct · fd704bd5

Willem de Bruijn authored Jun 07, 2019

CAN supports software tx timestamps as of the below commit. Purge
any queued timestamp packets on socket destroy.

Fixes: 51f31cab ("ip: support for TX timestamps on UDP and RAW sockets")
Reported-by: syzbot+a90604060cb40f5bdd16@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

fd704bd5

can: flexcan: Remove unneeded registration message · eb503004

Fabio Estevam authored Jun 04, 2019

Currently the following message is observed when the flexcan
driver is probed:

flexcan 2090000.flexcan: device registered (reg_base=(ptrval), irq=23)

The reason for printing 'ptrval' is explained at
Documentation/core-api/printk-formats.rst:

"Pointers printed without a specifier extension (i.e unadorned %p) are
hashed to prevent leaking information about the kernel memory layout. This
has the added benefit of providing a unique identifier. On 64-bit machines
the first 32 bits are zeroed. The kernel will print ``(ptrval)`` until it
gathers enough entropy."

Instead of passing %pK, which can print the correct address, simply
remove the entire message as it is not really that useful.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

eb503004

can: af_can: Fix error path of can_init() · c5a3aed1

YueHaibing authored May 16, 2019

This patch add error path for can_init() to avoid possible crash if some
error occurs.

Fixes: 0d66548a ("[CAN]: Add PF_CAN core module")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

c5a3aed1

can: m_can: implement errata "Needless activation of MRAF irq" · 3e82f2f3

Eugen Hristev authored Mar 04, 2019

During frame reception while the MCAN is in Error Passive state and the
Receive Error Counter has thevalue MCAN_ECR.REC = 127, it may happen
that MCAN_IR.MRAF is set although there was no Message RAM access
failure. If MCAN_IR.MRAF is enabled, an interrupt to the Host CPU is
generated.

Work around:
The Message RAM Access Failure interrupt routine needs to check whether

    MCAN_ECR.RP = '1' and MCAN_ECR.REC = '127'.

In this case, reset MCAN_IR.MRAF. No further action is required.
This affects versions older than 3.2.0

Errata explained on Sama5d2 SoC which includes this hardware block:
http://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Family-Silicon-Errata-and-Data-Sheet-Clarification-DS80000803B.pdf
chapter 6.2

Reproducibility: If 2 devices with m_can are connected back to back,
configuring different bitrate on them will lead to interrupt storm on
the receiving side, with error "Message RAM access failure occurred".
Another way is to have a bad hardware connection. Bad wire connection
can lead to this issue as well.

This patch fixes the issue according to provided workaround.
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Reviewed-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

3e82f2f3

can: mcp251x: add support for mcp25625 · 35b7fa4d

Sean Nyekjaer authored May 07, 2019

Fully compatible with mcp2515, the mcp25625 have integrated transceiver.

This patch adds support for the mcp25625 to the existing mcp251x driver.
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

35b7fa4d

dt-bindings: can: mcp251x: add mcp25625 support · 0df82dcd

Sean Nyekjaer authored May 07, 2019

Fully compatible with mcp2515, the mcp25625 have integrated transceiver.

This patch add the mcp25625 to the device tree bindings documentation.
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

0df82dcd

can: xilinx_can: use correct bittiming_const for CAN FD core · 904044dd

Anssi Hannula authored Sep 11, 2018

Commit 9e5f1b27 ("can: xilinx_can: add support for Xilinx CAN FD
core") added a new can_bittiming_const structure for CAN FD cores that
support larger values for tseg1, tseg2, and sjw than previous Xilinx CAN
cores, but the commit did not actually take that into use.

Fix that.

Tested with CAN FD core on a ZynqMP board.

Fixes: 9e5f1b27 ("can: xilinx_can: add support for Xilinx CAN FD core")
Reported-by: Shubhrajyoti Datta <shubhrajyoti.datta@gmail.com>
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Shubhrajyoti Datta <shubhrajyoti.datta@gmail.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

904044dd

can: flexcan: fix timeout when set small bitrate · 247e5356

Joakim Zhang authored Jan 31, 2019

Current we can meet timeout issue when setting a small bitrate like
10000 as follows on i.MX6UL EVK board (ipg clock = 66MHZ, per clock =
30MHZ):

| root@imx6ul7d:~# ip link set can0 up type can bitrate 10000

A link change request failed with some changes committed already.
Interface can0 may have been left with an inconsistent configuration,
please check.

| RTNETLINK answers: Connection timed out

It is caused by calling of flexcan_chip_unfreeze() timeout.

Originally the code is using usleep_range(10, 20) for unfreeze
operation, but the patch (8badd65e can: flexcan: avoid calling
usleep_range from interrupt context) changed it into udelay(10) which is
only a half delay of before, there're also some other delay changes.

After double to FLEXCAN_TIMEOUT_US to 100 can fix the issue.

Meanwhile, Rasmus Villemoes reported that even with a timeout of 100,
flexcan_probe() fails on the MPC8309, which requires a value of at least
140 to work reliably. 250 works for everyone.
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

247e5356

can: usb: Kconfig: Remove duplicate menu entry · 0ed89d77

Alexander Dahl authored Jan 22, 2019

This seems to have slipped in by accident when sorting the entries.

Fixes: ffbdd917Signed-off-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

0ed89d77

Merge tag 'wireless-drivers-for-davem-2019-06-07' of... · c7e3c93a

David S. Miller authored Jun 07, 2019

Merge tag 'wireless-drivers-for-davem-2019-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.2

First set of fixes for 5.2. Most important here are buffer overflow
fixes for mwifiex.

rtw88

* fix out of bounds compiler warning

* fix rssi handling to get 4x more throughput

* avoid circular locking

rsi

* fix unitilised data warning, these are hopefully the last ones so
  that the warning can be enabled by default

mwifiex

* fix buffer overflows

iwlwifi

* remove not used debugfs file

* various fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c7e3c93a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1e1d9263

Linus Torvalds authored Jun 07, 2019

Pull networking fixes from David Miller:

1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.

2) Read SFP eeprom in max 16 byte increments to avoid problems with
some SFP modules, from Russell King.

3) Fix UDP socket lookup wrt. VRF, from Tim Beale.

4) Handle route invalidation properly in s390 qeth driver, from Julian
Wiedmann.

5) Memory leak on unload in RDS, from Zhu Yanjun.

6) sctp_process_init leak, from Neil HOrman.

7) Fix fib_rules rule insertion semantic change that broke Android,
from Hangbin Liu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
pktgen: do not sleep with the thread lock held.
net: mvpp2: Use strscpy to handle stat strings
net: rds: fix memory leak in rds_ib_flush_mr_pool
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
net: aquantia: fix wol configuration not applied sometimes
ethtool: fix potential userspace buffer overflow
Fix memory leak in sctp_process_init
net: rds: fix memory leak when unload rds_rdma
ipv6: fix the check before getting the cookie in rt6_get_cookie
ipv4: not do cache for local delivery if bc_forwarding is enabled
s390/qeth: handle error when updating TX queue count
s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
s390/qeth: check dst entry before use
s390/qeth: handle limited IPv4 broadcast in L3 TX path
net: fix indirect calls helpers for ptype list hooks.
net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
udp: only choose unbound UDP socket for multicast when not in a VRF
net/tls: replace the sleeping lock around RX resync with a bit lock
...

1e1d9263

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 6e38335d

Linus Torvalds authored Jun 07, 2019

Pull rdma fixes from Jason Gunthorpe:
 "Things are looking pretty quiet here in RDMA, not too many bug fixes
  rolling in right now. The usual driver bug fixes and fixes for a
  couple of regressions introduced in 5.2:

   - Fix a race on bootup with RDMA device renaming and srp. SRP also
     needs to rename its internal sys files

   - Fix a memory leak in hns

   - Don't leak resources in efa on certain error unwinds

   - Don't panic in certain error unwinds in ib_register_device

   - Various small user visible bug fix patches for the hfi and efa
     drivers

   - Fix the 32 bit compilation break"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/efa: Remove MAYEXEC flag check from mmap flow
  mlx5: avoid 64-bit division
  IB/hfi1: Validate page aligned for a given virtual address
  IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
  IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
  IB/rdmavt: Fix alloc_qpn() WARN_ON()
  RDMA/core: Fix panic when port_data isn't initialized
  RDMA/uverbs: Pass udata on uverbs error unwind
  RDMA/core: Clear out the udata before error unwind
  RDMA/hns: Fix PD memory leak for internal allocation
  RDMA/srp: Rename SRP sysfs name after IB device rename trigger

6e38335d

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a02a532c

Linus Torvalds authored Jun 07, 2019

Pull arm64 fixes from Will Deacon:
 "Another round of mostly-benign fixes, the exception being a boot crash
  on SVE2-capable CPUs (although I don't know where you'd find such a
  thing, so maybe it's benign too).

  We're in the process of resolving some big-endian ptrace breakage, so
  I'll probably have some more for you next week.

  Summary:

   - Fix boot crash on platforms with SVE2 due to missing register
     encoding

   - Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y

   - Move cpu_logical_map into smp.h for use by upcoming irqchip drivers

   - Trivial typo fix in comment

   - Disable some useless, noisy warnings from GCC 9"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Silence gcc warnings about arch ABI drift
  ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
  arm64: arch_timer: mark functions as __always_inline
  arm64: smp: Moved cpu_logical_map[] to smp.h
  arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()

a02a532c

06 Jun, 2019 11 commits

Merge branch 'fix-unconnected-udp' · 4aeba328

Alexei Starovoitov authored Jun 06, 2019

Daniel Borkmann says:

====================
Please refer to the patch 1/6 as the main patch with the details
on the current sendmsg hook API limitations and proposal to fix
it in order to work with basic applications like DNS. Remaining
patches are the usual uapi and tooling updates as well as test
cases. Thanks a lot!

v2 -> v3:
  - Add attach types to test_section_names.c and libbpf (Andrey)
  - Added given Acks, rest as-is
v1 -> v2:
  - Split off uapi header sync and bpftool bits (Martin, Alexei)
  - Added missing bpftool doc and bash completion as well
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4aeba328

bpf: expand section tests for test_section_names · b714560f

Daniel Borkmann authored Jun 07, 2019

Add cgroup/recvmsg{4,6} to test_section_names as well. Test run output:

  # ./test_section_names
  libbpf: failed to guess program type based on ELF section name 'InvAliD'
  libbpf: supported section(type) names are: [...]
  libbpf: failed to guess attach type based on ELF section name 'InvAliD'
  libbpf: attachable section(type) names are: [...]
  libbpf: failed to guess program type based on ELF section name 'cgroup'
  libbpf: supported section(type) names are: [...]
  libbpf: failed to guess attach type based on ELF section name 'cgroup'
  libbpf: attachable section(type) names are: [...]
  Summary: 38 PASSED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b714560f

bpf: more msg_name rewrite tests to test_sock_addr · 1812291e

Daniel Borkmann authored Jun 07, 2019

Extend test_sock_addr for recvmsg test cases, bigger parts of the
sendmsg code can be reused for this. Below are the strace view of
the recvmsg rewrites; the sendmsg side does not have a BPF prog
connected to it for the context of this test:

IPv4 test case:

  [pid  4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x13 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0
  [pid  4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
  [pid  4846] bind(5, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, 128) = 0
  [pid  4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 6
  [pid  4846] sendmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
  [pid  4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999995})
  [pid  4846] recvmsg(5, {msg_name={sa_family=AF_INET, sin_port=htons(4040), sin_addr=inet_addr("192.168.1.254")}, msg_namelen=128->16, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
  [pid  4846] close(6)                    = 0
  [pid  4846] close(5)                    = 0
  [pid  4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x13 /* BPF_??? */}, 112) = 0

IPv6 test case:

  [pid  4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x14 /* BPF_??? */, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0
  [pid  4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 5
  [pid  4846] bind(5, {sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 128) = 0
  [pid  4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 6
  [pid  4846] sendmsg(6, {msg_name={sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
  [pid  4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999996})
  [pid  4846] recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(6060), inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128->28, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
  [pid  4846] close(6)                    = 0
  [pid  4846] close(5)                    = 0
  [pid  4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x14 /* BPF_??? */}, 112) = 0

test_sock_addr run w/o strace view:

  # ./test_sock_addr.sh
  [...]
  Test case: recvmsg4: return code ok .. [PASS]
  Test case: recvmsg4: return code !ok .. [PASS]
  Test case: recvmsg6: return code ok .. [PASS]
  Test case: recvmsg6: return code !ok .. [PASS]
  Test case: recvmsg4: rewrite IP & port (asm) .. [PASS]
  Test case: recvmsg6: rewrite IP & port (asm) .. [PASS]
  [...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1812291e

bpf, bpftool: enable recvmsg attach types · 000aa125

Daniel Borkmann authored Jun 07, 2019

Trivial patch to bpftool in order to complete enabling attaching programs
to BPF_CGROUP_UDP{4,6}_RECVMSG.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

000aa125

bpf, libbpf: enable recvmsg attach types · 9bb59ac1

Daniel Borkmann authored Jun 07, 2019

Another trivial patch to libbpf in order to enable identifying and
attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG by section name.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9bb59ac1

bpf: sync tooling uapi header · 3dbc6ada

Daniel Borkmann authored Jun 07, 2019

Sync BPF uapi header in order to pull in BPF_CGROUP_UDP{4,6}_RECVMSG
attach types. This is done and preferred as an extra patch in order
to ease sync of libbpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3dbc6ada

bpf: fix unconnected udp hooks · 983695fa

Daniel Borkmann authored Jun 07, 2019

Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
to applications as also stated in original motivation in 7828f20e ("Merge
branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
two hooks into Cilium to enable host based load-balancing with Kubernetes,
I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
typically sets up DNS as a service and is thus subject to load-balancing.

Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
is currently insufficient and thus not usable as-is for standard applications
shipped with most distros. To break down the issue we ran into with a simple
example:

  # cat /etc/resolv.conf
  nameserver 147.75.207.207
  nameserver 147.75.207.208

For the purpose of a simple test, we set up above IPs as service IPs and
transparently redirect traffic to a different DNS backend server for that
node:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

The attached BPF program is basically selecting one of the backends if the
service IP/port matches on the cgroup hook. DNS breaks here, because the
hooks are not transparent enough to applications which have built-in msg_name
address checks:

  # nslookup 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]
  ;; connection timed out; no servers could be reached

  # dig 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached

For comparison, if none of the service IPs is used, and we tell nslookup
to use 8.8.8.8 directly it works just fine, of course:

  # nslookup 1.1.1.1 8.8.8.8
  1.1.1.1.in-addr.arpa	name = one.one.one.one.

In order to fix this and thus act more transparent to the application,
this needs reverse translation on recvmsg() side. A minimal fix for this
API is to add similar recvmsg() hooks behind the BPF cgroups static key
such that the program can track state and replace the current sockaddr_in{,6}
with the original service IP. From BPF side, this basically tracks the
service tuple plus socket cookie in an LRU map where the reverse NAT can
then be retrieved via map value as one example. Side-note: the BPF cgroups
static key should be converted to a per-hook static key in future.

Same example after this fix:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

Lookups work fine now:

  # nslookup 1.1.1.1
  1.1.1.1.in-addr.arpa    name = one.one.one.one.

  Authoritative answers can be found from:

  # dig 1.1.1.1

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
  ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags:; udp: 512
  ;; QUESTION SECTION:
  ;1.1.1.1.                       IN      A

  ;; AUTHORITY SECTION:
  .                       23426   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400

  ;; Query time: 17 msec
  ;; SERVER: 147.75.207.207#53(147.75.207.207)
  ;; WHEN: Tue May 21 12:59:38 UTC 2019
  ;; MSG SIZE  rcvd: 111

And from an actual packet level it shows that we're using the back end
server when talking via 147.75.207.20{7,8} front end:

  # tcpdump -i any udp
  [...]
  12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  [...]

In order to be flexible and to have same semantics as in sendmsg BPF
programs, we only allow return codes in [1,1] range. In the sendmsg case
the program is called if msg->msg_name is present which can be the case
in both, connected and unconnected UDP.

The former only relies on the sockaddr_in{,6} passed via connect(2) if
passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
way to call into the BPF program whenever a non-NULL msg->msg_name was
passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
that for TCP case, the msg->msg_name is ignored in the regular recvmsg
path and therefore not relevant.

For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
the hook is not called. This is intentional as it aligns with the same
semantics as in case of TCP cgroup BPF hooks right now. This might be
better addressed in future through a different bpf_attach_type such
that this case can be distinguished from the regular recvmsg paths,
for example.

Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

983695fa

Merge branch 'parisc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 16d72dd4

Linus Torvalds authored Jun 06, 2019

Pull parisc fixes from Helge Deller:

 - Fix crashes when accessing PCI devices on some machines like C240 and
   J5000. The crashes were triggered because we replaced cache flushes
   by nops in the alternative coding where we shouldn't for some
   machines.

 - Dave fixed a race in the usage of the sr1 space register when used to
   load the coherence index.

 - Use the hardware lpa instruction to to load the physical address of
   kernel virtual addresses in the iommu driver code.

 - The kernel may fail to link when CONFIG_MLONGCALLS isn't set. Solve
   that by rearranging functions in the final vmlinux executeable.

 - Some defconfig cleanups and removal of compiler warnings.

* 'parisc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix crash due alternative coding for NP iopdir_fdc bit
  parisc: Use lpa instruction to load physical addresses in driver code
  parisc: configs: Remove useless UEVENT_HELPER_PATH
  parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
  parisc: Fix compiler warnings in float emulation code
  parisc/slab: cleanup after /proc/slab_allocators removal
  parisc: Allow building 64-bit kernel without -mlong-calls compiler option
  parisc: Kconfig: remove ARCH_DISCARD_MEMBLOCK

16d72dd4

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · ae876604

Linus Torvalds authored Jun 06, 2019

Pull crypto fixes from Herbert Xu:
 "This fixes a regression that breaks the jitterentropy RNG and a
  potential memory leak in hmac"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hmac - fix memory leak in hmac_init_tfm()
  crypto: jitterentropy - change back to module_init()

ae876604

Merge tag 'xfs-5.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 01047631

Linus Torvalds authored Jun 06, 2019

Pull xfs fixes from Darrick Wong:
 "Here are a couple more bug fixes for 5.2. Changes since last update:

   - Fix some forgotten strings in a log debugging function

   - Fix incorrect unit conversion in online fsck code"

* tag 'xfs-5.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: inode btree scrubber should calculate im_boffset correctly
  xfs: fix broken log reservation debugging

01047631

Merge tag 'gfs2-v5.2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · dc8ca9cc

Linus Torvalds authored Jun 06, 2019

Pull gfs2 fix from Andreas Gruenbacher:
 "A revert for a patch that turned out to be broken"

* tag 'gfs2-v5.2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Revert "gfs2: Replace gl_revokes with a GLF flag"

dc8ca9cc