Commits · 71685eb4ce80ae9c49eff82ca4dd15acab215de9 · Kirill Smelkov / linux

08 Nov, 2019 1 commit

inetpeer: fix data-race in inet_putpeer / inet_putpeer · 71685eb4

Eric Dumazet authored Nov 07, 2019

We need to explicitely forbid read/store tearing in inet_peer_gc()
and inet_putpeer().

The following syzbot report reminds us about inet_putpeer()
running without a lock held.

BUG: KCSAN: data-race in inet_putpeer / inet_putpeer

write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 0:
 inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263

write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 1:
 inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240
 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102
 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157
 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377
 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 4b9d9be8 ("inetpeer: remove unused list")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71685eb4

07 Nov, 2019 39 commits

net: phy: at803x: add missing dependency on CONFIG_REGULATOR · dddb318b

Madalin Bucur authored Nov 07, 2019

Compilation fails on PPC targets as CONFIG_REGULATOR is not set and
drivers/regulator/devres.c is not compiled in while functions exported
there are used by drivers/net/phy/at803x.c. Here's the error log:

LD .tmp_vmlinux1
drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_set_voltage_sel':
drivers/net/phy/at803x.c:294: undefined reference to `.rdev_get_drvdata'
drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_get_voltage_sel':
drivers/net/phy/at803x.c:306: undefined reference to `.rdev_get_drvdata'
drivers/net/phy/at803x.o: In function `at8031_register_regulators':
drivers/net/phy/at803x.c:359: undefined reference to `.devm_regulator_register'
drivers/net/phy/at803x.c:365: undefined reference to `.devm_regulator_register'
drivers/net/phy/at803x.o:(.data.rel+0x0): undefined reference to `regulator_list_voltage_table'
linux/Makefile:1074: recipe for target 'vmlinux' failed
make[1]: *** [vmlinux] Error 1

Fixes: 2f664823 ("net: phy: at803x: add device tree binding")
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dddb318b

dpaa2-eth: add ethtool MAC counters · 991df1fb

Ioana Ciornei authored Nov 07, 2019

When a DPNI is connected to a MAC, export its associated counters.
Ethtool related functions are added in dpaa2_mac for returning the
number of counters, their strings and also their values.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

991df1fb

enetc: ethtool: add wake-on-lan callbacks · 88c8562b

Michael Walle authored Nov 07, 2019

If there is an external PHY, pass the wake-on-lan request to the PHY.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

88c8562b

enetc: add ioctl() support for PHY-related ops · a613bafe

Michael Walle authored Nov 07, 2019

If there is an attached PHY try to handle the requested ioctl with its
handler, which allows the userspace to access PHY registers, for
example. This will make mii-diag and similar tools work.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a613bafe

mlxsw: spectrum: Fix error return code in mlxsw_sp_port_module_info_init() · 630d4e75

Wei Yongjun authored Nov 06, 2019

Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 4a7f970f ("mlxsw: spectrum: Replace port_to_module array with array of structs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

630d4e75

Merge branch 'cxgb4-add-support-for-TC-MQPRIO-Qdisc-Offload' · 69625ea7

David S. Miller authored Nov 07, 2019

Rahul Lakkireddy says:

====================
cxgb4: add support for TC-MQPRIO Qdisc Offload

This series of patches add support for offloading TC-MQPRIO Qdisc
to Chelsio T5/T6 NICs. Offloading QoS traffic shaping and pacing
requires using Ethernet Offload (ETHOFLD) resources available on
Chelsio NICs. The ETHOFLD resources are configured by firmware
and taken from the resource pool shared with other Chelsio Upper
Layer Drivers. Traffic flowing through ETHOFLD region requires a
software netdev Tx queue (EOSW_TXQ) exposed to networking stack,
and an underlying hardware Tx queue (EOHW_TXQ) used for sending
packets through hardware.

ETHOFLD region is addressed using EOTIDs, which are per-connection
resource. Hence, EOTIDs are capable of storing only a very small
number of packets in flight. To allow more connections to share
the the QoS rate limiting configuration, multiple EOTIDs must be
allocated to reduce packet drops. EOTIDs are 1-to-1 mapped with
software EOSW_TXQ. Several software EOSW_TXQs can post packets to
a single hardware EOHW_TXQ.

The series is broken down as follows:

Patch 1 queries firmware for maximum available traffic classes,
as well as, start and maximum available indices (EOTID) into ETHOFLD
region, supported by the underlying device.

Patch 2 reworks queue configuration and simplifies MSI-X allocation
logic in preparation for ETHOFLD queues support.

Patch 3 adds skeleton for validating and configuring TC-MQPRIO Qdisc
offload. Also, adds support for software EOSW_TXQs and exposes them
to network stack. Updates Tx queue selection to use fallback NIC Tx
path for unsupported traffic that can't go through ETHOFLD queues.

Patch 4 adds support for managing hardware queues to rate limit
traffic flowing through them. The queues are allocated/removed based
on enabling/disabling TC-MQPRIO Qdisc offload, respectively.

Patch 5 adds Tx path for traffic flowing through software EOSW_TXQ
and EOHW_TXQ. Also, adds Rx path to handle Tx completions.

Patch 6 updates exisiting SCHED API to configure FLOWC based QoS
offload. In the existing QUEUE based rate limiting, multiple queues
sharing a traffic class get the aggreagated max rate limit value.
On the other hand, in FLOWC based rate limiting, multiple queues
sharing a traffic class get their own individual max rate limit
value. For example, if 2 queues are bound to class 0, which is rate
limited to 1 Gbps, then in QUEUE based rate limiting, both the
queues get the aggregate max output of 1 Gbps only. In FLOWC based
rate limiting, each queue gets its own output of max 1 Gbps each;
i.e. 2 queues * 1 Gbps rate limit = 2 Gbps max output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

69625ea7

cxgb4: add FLOWC based QoS offload · 0e395b3c

Rahul Lakkireddy authored Nov 07, 2019

Rework SCHED API to allow offloading TC-MQPRIO QoS configuration.
The existing QUEUE based rate limiting throttles all queues sharing
a traffic class, to the specified max rate limit value. So, if
multiple queues share a traffic class, then all the queues get
the aggregate specified max rate limit.

So, introduce the new FLOWC based rate limiting, where multiple
queues can share a traffic class with each queue getting its own
individual specified max rate limit.

For example, if 2 queues are bound to class 0, which is rate limited
to 1 Gbps, then 2 queues using QUEUE based rate limiting, get the
aggregate output of 1 Gbps only. In FLOWC based rate limiting, each
queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps
rate limit = 2 Gbps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e395b3c

cxgb4: add Tx and Rx path for ETHOFLD traffic · 4846d533

Rahul Lakkireddy authored Nov 07, 2019

Implement Tx path for traffic flowing through software EOSW_TXQ
and EOHW_TXQ. Since multiple EOSW_TXQ can post packets to a single
EOHW_TXQ, protect the hardware queue with necessary spinlock. Also,
move common code used to generate TSO work request to a common
function.

Implement Rx path to handle Tx completions for successfully
transmitted packets.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4846d533

cxgb4: add ETHOFLD hardware queue support · 2d0cb84d

Rahul Lakkireddy authored Nov 07, 2019

Add support for configuring and managing ETHOFLD hardware queues.
Keep the queue count and MSI-X allocation scheme same as NIC queues.
ETHOFLD hardware queues are dynamically allocated/destroyed as
TC-MQPRIO Qdisc offload is enabled/disabled on the corresponding
interface, respectively.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d0cb84d

cxgb4: parse and configure TC-MQPRIO offload · b1396c2b

Rahul Lakkireddy authored Nov 07, 2019

Add logic for validation and configuration of TC-MQPRIO Qdisc
offload. Also, add support to manage EOSW_TXQ, which have 1-to-1
mapping with EOTIDs, and expose them to network stack.

Move common skb validation in Tx path to a separate function and
add minimal Tx path for ETHOFLD. Update Tx queue selection to return
normal NIC Txq to send traffic pattern that can't go through ETHOFLD
Tx path.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1396c2b

cxgb4: rework queue config and MSI-X allocation · 76c3a552

Rahul Lakkireddy authored Nov 07, 2019

Simplify queue configuration and MSI-X allocation logic. Use a single
MSI-X information table for both NIC and ULDs. Remove hard-coded
MSI-X indices for firmware event queue and non data interrupts.
Instead, use the MSI-X bitmap to obtain a free MSI-X index
dynamically. Save each Rxq's index into the MSI-X information table,
within the Rxq structures themselves, for easier cleanup.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76c3a552

cxgb4: query firmware for QoS offload resources · ab0367ea

Rahul Lakkireddy authored Nov 07, 2019

QoS offload needs Ethernet Offload (ETHOFLD) resources present in the
NIC. These resources are shared with other ULDs. So, query firmware
for the available number of traffic classes, as well as, start and
end indices (EOTID) of the ETHOFLD region.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab0367ea

net_sched: gen_estimator: extend packet counter to 64bit · 1c8dd9cb

Eric Dumazet authored Nov 06, 2019

I forgot to change last_packets field in struct net_rate_estimator.

Without this fix, rate estimators would misbehave after more
than 2^32 packets have been sent.

Another solution would be to be careful and only use the
32 least significant bits of packets counters, but we have
a hole in net_rate_estimator structure and this looks
easier to read/maintain.

Fixes: d0083d98 ("net_sched: extend packet counter to 64bit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c8dd9cb

dpaa2-ptp: fix compile error · 2d791e3b

Chenwandun authored Nov 07, 2019

phylink_set_port_modes will be compiled if CONFIG_PHYLINK enabled,
dpaa2_mac_validate will be compiled if CONFIG_FSL_DPAA2_ETH enabled,
it should select CONFIG_PHYLINK when dpaa2_mac_validate call
phylink_set_port_modes

drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_validate':
dpaa2-mac.c:(.text+0x3a1): undefined reference to `phylink_set_port_modes'
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_connect':
dpaa2-mac.c:(.text+0x91a): undefined reference to `phylink_create'
dpaa2-mac.c:(.text+0x94e): undefined reference to `phylink_of_phy_connect'
dpaa2-mac.c:(.text+0x97f): undefined reference to `phylink_destroy'
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.o: In function `dpaa2_mac_disconnect':
dpaa2-mac.c:(.text+0xa9f): undefined reference to `phylink_disconnect_phy'
dpaa2-mac.c:(.text+0xab0): undefined reference to `phylink_destroy'
make: *** [vmlinux] Error 1

Fixes: 71947923 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d791e3b

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · fdc66c3d

David S. Miller authored Nov 06, 2019

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-11-06

This series contains updates to ice driver only.

Scott adds ethtool -m support so that we can read eeprom data on SFP/OSFP
modules.

Anirudh updates the return value to properly reflect when SRIOV is not
supported.

Md Fahad updates the driver to handle a change in the NVM, where the
boot configuration section was moved to the Preserved Field Area (PFA)
of the NVM.

Paul resolves an issue when DCBx requests non-contiguous TCs, transmit
hangs could occur, so configure a default traffic class (TC0) in these
cases to prevent traffic hangs.  Adds a print statement to notify the
user when unsupported modules are inserted.

Bruce fixes up the driver unload code flow to ensure we do not clear the
interrupt scheme until the reset is complete, otherwise a hardware error
may occur.

Dave updates the DCB initialization to set is_sw_lldp boolean when the
firmware has been detected to be in an untenable state.  This will
ensure that the firmware is in a known state.

Michal saves off the PCI state and I/O BARs address after PCI bus reset
so that after the reset, device registers can be read.  Also adds a NULL
pointer check to prevent a potential kernel panic.

Mitch resolves an issue where VF's on PF's other than 0 were not seeing
resets by using the per-PF VF ID instead of the absolute VF ID.

Krzysztof does some code cleanup to remove a unneeded wrapper and
reduces the code complexity.

Brett reduces confusion by changing the name of ice_vc_dis_vf() to
ice_vc_reset_vf() to better describe what the function is actually
doing.

v2: dropped patch 3 "ice: Add support for FW recovery mode detection"
    from the origin al series, while Ani makes changes based on
    community feedback to implement devlink into the changes.
v3: dropped patch 1 "ice: implement set_eeprom functionality" due to a
    bug found and additional changes will be needed when Ani implements
    devlink in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fdc66c3d

net: dsa: mv8e6xxx: Fix stub function parameters · 64a26007

Andrew Lunn authored Nov 07, 2019

mv88e6xxx_g2_atu_stats_get() takes two parameters. Make the stub
function also take two, otherwise we get compile errors.

Fixes: c5f299d5 ("net: dsa: mv88e6xxx: global1_atu: Add helper for get next")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

64a26007

Merge branch 'net-phy-at803x-device-tree-binding' · 16cf4222

David S. Miller authored Nov 06, 2019

Michael Walle says:

====================
net: phy: at803x device tree binding

Adds a device tree binding to configure the clock and the RGMII voltage.

Changes since v1:
 - rebased to latest net-next
 - renamed "Atheros" to "Qualcomm Atheros"
 - add a new patch to remove config_init() from AR9331

Changes since the RFC:
 - renamed the Kconfig entry to "Qualcomm Atheros.." and reordered the
   item
 - renamed the prefix from atheros to qca
 - use the correct name AR803x (instead of AT803x) in new files and
   dt-bindings.
 - listed the PHY maintainers in the new schema. Hopefully, thats ok.
 - fixed a typo in the bindings schema
 - run dtb_checks and dt_binding_check and fixed the schema
 - dropped the rgmii-io-1v8 property; instead provide two regulators vddh
   and vddio, add one consumer vddio-supply
 - fix the clock settings for the AR8030/AR8035
 - only the AR8031 supports chaning the LDO and the PLL mode in software.
   Check if we have the correct PHY.
 - new patch to mention the AR8033 which is the same as the AR8031 just
   without PTP support
 - new patch which corrects any displayed PHY names and comments. Be
   consistent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

16cf4222

net: phy: at803x: remove config_init for AR9331 · ed7fa2ad

Michael Walle authored Nov 06, 2019

According to its datasheet, the internal PHY doesn't have debug
registers nor MMDs. Since config_init() only configures delays and
clocks and so on in these registers it won't be needed on this PHY.
Remove it.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed7fa2ad

net: phy: at803x: fix the PHY names · 96c36712

Michael Walle authored Nov 06, 2019

Fix at least the displayed strings. The actual name of the chip is
AR803x.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

96c36712

net: phy: at803x: mention AR8033 as same as AR8031 · 428061f7

Michael Walle authored Nov 06, 2019

The AR8033 is the AR8031 without PTP support. All other registers are
the same. Unfortunately, they share the same PHY ID. Therefore, we
cannot distinguish between the one with PTP support and the one without.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

428061f7

net: phy: at803x: add device tree binding · 2f664823

Michael Walle authored Nov 06, 2019

Add support for configuring the CLK_25M pin as well as the RGMII I/O
voltage by the device tree.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f664823

dt-bindings: net: phy: Add support for AT803X · 2c63221c

Michael Walle authored Nov 06, 2019

Document the Atheros AR803x PHY bindings.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c63221c

net: phy: at803x: fix Kconfig description · 4985dffc

Michael Walle authored Nov 06, 2019

The name of the PHY is actually AR803x not AT803x. Additionally, add the
name of the vendor and mention the AR8031 support.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

4985dffc

tcp: fix data-race in tcp_recvmsg() · a5a7daa5

Eric Dumazet authored Nov 06, 2019

Reading tp->recvmsg_inq after socket lock is released
raises a KCSAN warning [1]

Replace has_tss & has_cmsg by cmsg_flags and make
sure to not read tp->recvmsg_inq a second time.

[1]
BUG: KCSAN: data-race in tcp_chrono_stop / tcp_recvmsg

write to 0xffff888126adef24 of 2 bytes by interrupt on cpu 0:
 tcp_chrono_set net/ipv4/tcp_output.c:2309 [inline]
 tcp_chrono_stop+0x14c/0x280 net/ipv4/tcp_output.c:2338
 tcp_clean_rtx_queue net/ipv4/tcp_input.c:3165 [inline]
 tcp_ack+0x274f/0x3170 net/ipv4/tcp_input.c:3688
 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 tcp_v4_rcv+0x19dc/0x1bb0 net/ipv4/tcp_ipv4.c:1942
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5214
 napi_skb_finish net/core/dev.c:5677 [inline]
 napi_gro_receive+0x28f/0x330 net/core/dev.c:5710

read to 0xffff888126adef25 of 1 bytes by task 7275 on cpu 1:
 tcp_recvmsg+0x77b/0x1a30 net/ipv4/tcp.c:2187
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7275 Comm: sshd Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: b75eba76 ("tcp: send in-queue bytes in cmsg upon read")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5a7daa5

net: silence data-races on sk_backlog.tail · 9ed498c6

Eric Dumazet authored Nov 06, 2019

sk->sk_backlog.tail might be read without holding the socket spinlock,
we need to add proper READ_ONCE()/WRITE_ONCE() to silence the warnings.

KCSAN reported :

BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg

write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
 __sk_add_backlog include/net/sock.h:907 [inline]
 sk_add_backlog include/net/sock.h:938 [inline]
 tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
 tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
 netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
 napi_skb_finish net/core/dev.c:5596 [inline]
 napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
 receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
 virtnet_receive drivers/net/virtio_net.c:1323 [inline]
 virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
 napi_poll net/core/dev.c:6311 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
 ret_from_intr+0x0/0x19
 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
 start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
 tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ed498c6

dpaa2-eth: fix an always true condition in dpaa2_mac_get_if_mode · 226df3ef

Ioana Ciornei authored Nov 06, 2019

Convert the phy_mode() function to return the if_mode through an
argument, similar to the new form of of_get_phy_mode().
This will help with handling errors in a common manner and also will fix
an always true condition.

Fixes: 0c65b2b9 ("net: of_get_phy_mode: Change API to solve int/unit warnings")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

226df3ef

net: openvswitch: select vport upcall portid directly · 90ce9f23

Tonghao Zhang authored Nov 07, 2019

The commit 69c51582ff786 ("dpif-netlink: don't allocate per
thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
changed the number of allocated sockets to just one per port
by moving the socket array from a per handler structure to
a per datapath one. In the kernel datapath, a vport will have
only one socket in most case, if so select it directly in
fast-path.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

90ce9f23

net: axienet: Fix error return code in axienet_probe() · eb34e98b

Wei Yongjun authored Nov 06, 2019

In the DMA memory resource get failed case, the error is not
set and 0 will be returned. Fix it by removing redundant check
since devm_ioremap_resource() will handle it.

Fixes: 28ef9ebd ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb34e98b

net: aquantia: fix return value check in aq_ptp_init() · 1dcff44a

Wei Yongjun authored Nov 06, 2019

Function ptp_clock_register() returns ERR_PTR() and never returns
NULL. The NULL test should be removed.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1dcff44a

ptp: ptp_clockmatrix: Fix missing unlock on error in idtcm_probe() · b97fa0b5

Wei Yongjun authored Nov 06, 2019

Add the missing unlock before return from function idtcm_probe()
in the error handling case.

Fixes: 3a6ba7dc ("ptp: Add a ptp clock driver for IDT ClockMatrix.")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b97fa0b5

tipc: eliminate the dummy packet in link synching · d0d605c5

Tuong Lien authored Nov 06, 2019

When preparing tunnel packets for the link failover or synchronization,
as for the safe algorithm, we added a dummy packet on the pair link but
never sent it out. In the case of failover, the pair link will be reset
anyway. But for link synching, it will always result in retransmission
of the dummy packet after that.
We have also observed that such the retransmission at the early stage
when a new node comes in a large cluster will take some time and hard
to be done, leading to the repeated retransmit failures and the link is
reset.

Since in commit 4929a932 ("tipc: optimize link synching mechanism")
we have already built a dummy 'TUNNEL_PROTOCOL' message on the new link
for the synchronization, there's no need for the dummy on the pair one,
this commit will skip it when the new mechanism takes in place. In case
nothing exists in the pair link's transmq, the link synching will just
start and stop shortly on the peer side.

The patch is backward compatible.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d605c5

Merge branch 'lwtunnel-add-ip-and-ip6-options-setting-and-dumping' · 3924f72a

David S. Miller authored Nov 06, 2019

Xin Long says:

====================
lwtunnel: add ip and ip6 options setting and dumping

With this patchset, users can configure options by ip route encap
for geneve, vxlan and ersapn lwtunnel, like:

  # ip r a 1.1.1.0/24 encap ip id 1 geneve class 0 type 0 \
    data "1212121234567890" dst 10.1.0.2 dev geneve1

  # ip r a 1.1.1.0/24 encap ip id 1 vxlan gbp 456 \
    dst 10.1.0.2 dev erspan1

  # ip r a 1.1.1.0/24 encap ip id 1 erspan ver 1 idx 123 \
    dst 10.1.0.2 dev erspan1

iproute side patch is attached on the reply of this mail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3924f72a