- 11 Dec, 2020 1 commit
-
-
Emmanuel Grumbach authored
The WLAN device may exist yet not be usable. This can happen when the WLAN device is controllable by both the host and some platform internal component. We need some arbritration that is vendor specific, but when the device is not available for the host, we need to reflect this state towards the user space. Add a reason field to the rfkill object (and event) so that userspace can know why the device is in rfkill: because some other platform component currently owns the device, or because the actual hw rfkill signal is asserted. Capable userspace can now determine the reason for the rfkill and possibly do some negotiation on a side band channel using a proprietary protocol to gain ownership on the device in case the device is owned by some other component. When the host gains ownership on the device, the kernel can remove the RFKILL_HARD_BLOCK_NOT_OWNER reason and the hw rfkill state will be off. Then, the userspace can bring the device up and start normal operation. The rfkill_event structure is enlarged to include the additional byte, it is now 9 bytes long. Old user space will ask to read only 8 bytes so that the kernel can know not to feed them with more data. When the user space writes 8 bytes, new kernels will just read what is present in the file descriptor. This new byte is read only from the userspace standpoint anyway. If a new user space uses an old kernel, it'll ask to read 9 bytes but will get only 8, and it'll know that it didn't get the new state. When it'll write 9 bytes, the kernel will again ignore this new byte which is read only from the userspace standpoint. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20201104134641.28816-1-emmanuel.grumbach@intel.comSigned-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 10 Dec, 2020 39 commits
-
-
David S. Miller authored
Tom Parkin says: ==================== add ppp_generic ioctl(s) to bridge channels Following on from my previous RFC[1], this series adds two ioctl calls to the ppp code to implement "channel bridging". When two ppp channels are bridged, frames presented to ppp_input() on one channel are passed to the other channel's ->start_xmit function for transmission. The primary use-case for this functionality is in an L2TP Access Concentrator where PPP frames are typically presented in a PPPoE session (e.g. from a home broadband user) and are forwarded to the ISP network in a PPPoL2TP session. The two new ioctls, PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN form a symmetric pair. Userspace code testing and illustrating use of the ioctl calls is available in the go-l2tp[2] and l2tp-ktest[3] repositories. [1]. Previous RFC series: https://lore.kernel.org/netdev/20201106181647.16358-1-tparkin@katalix.com/ [2]. go-l2tp: a Go library for building L2TP applications on Linux systems. Support for the PPPIOCBRIDGECHAN ioctl is on a branch: https://github.com/katalix/go-l2tp/tree/tp_002_pppoe_2 [3]. l2tp-ktest: a test suite for the Linux Kernel L2TP subsystem. Support for the PPPIOCBRIDGECHAN ioctl is on a branch: https://github.com/katalix/l2tp-ktest/tree/tp_ac_pppoe_tests_2 Changelog: v4: * Fix NULL-pointer access in PPPIOCBRIDGECHAN in the case that the ID of the channel to be bridged wasn't found. * Add comment in ppp_unbridge_channels to better document the unbridge process. v3: * Use rcu_dereference_protected for accessing struct channel 'bridge' field during updates with lock 'upl' held. * Avoid race in ppp_unbridge_channels by ensuring that each channel in the bridge points to it's peer before decrementing refcounts. v2: * Add missing __rcu annotation to struct channel 'bridge' field in order to squash a sparse warning from a C=1 build * Integrate review comments from gnault@redhat.com * Have ppp_unbridge_channels return -EINVAL if the channel isn't part of a bridge: this better aligns with the return code from ppp_disconnect_channel. * Improve docs update by including information on ioctl arguments and error return codes. ==================== Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Parkin authored
Add documentation of the newly-added PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Parkin authored
This new ioctl pair allows two ppp channels to be bridged together: frames arriving in one channel are transmitted in the other channel and vice versa. The practical use for this is primarily to support the L2TP Access Concentrator use-case. The end-user session is presented as a ppp channel (typically PPPoE, although it could be e.g. PPPoA, or even PPP over a serial link) and is switched into a PPPoL2TP session for transmission to the LNS. At the LNS the PPP session is terminated in the ISP's network. When a PPP channel is bridged to another it takes a reference on the other's struct ppp_file. This reference is dropped when the channels are unbridged, which can occur either explicitly on userspace calling the PPPIOCUNBRIDGECHAN ioctl, or implicitly when either channel in the bridge is unregistered. In order to implement the channel bridge, struct channel is extended with a new field, 'bridge', which points to the other struct channel making up the bridge. This pointer is RCU protected to avoid adding another lock to the data path. To guard against concurrent writes to the pointer, the existing struct channel lock 'upl' coverage is extended rather than adding a new lock. The 'upl' lock is used to protect the existing unit pointer. Since the bridge effectively replaces the unit (they're mutually exclusive for a channel) it makes coding easier to use the same lock to cover them both. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We use rcu_assign_pointer to assign both the table and the entries, but the entries are not marked as __rcu. This generates sparse warnings. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willy Tarreau authored
This reverts commit 0a4e9ce1. The code was developed and tested on an MSC313E SoC, which seems to be half-way between the AT91RM9200 and the AT91SAM9260 in that it supports both the 2-descriptors mode and a Tx ring. It turns out that after the code was merged I could notice that the controller would sometimes lock up, and only when dealing with sustained bidirectional transfers, in which case it would report a Tx overrun condition right after having reported being ready, and will stop sending even after the status is cleared (a down/up cycle fixes it though). After adding lots of traces I couldn't spot a sequence pattern allowing to predict that this situation would happen. The chip comes with no documentation and other bits are often reported with no conclusive pattern either. It is possible that my change is wrong just like it is possible that the controller on the chip is bogus or at least unpredictable based on existing docs from other chips. I do not have an RM9200 at hand to test at the moment and a few tests run on a more recent 9G20 indicate that this code path cannot be used there to test the code on a 3rd platform. Since the MSC313E works fine in the single-descriptor mode, and that people using the old RM9200 very likely favor stability over performance, better revert this patch until we can test it on the original platform this part of the driver was written for. Note that the reverted patch was actually tested on MSC313E. Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Claudiu Beznea <claudiu.beznea@microchip.com> Cc: Daniel Palmer <daniel@0x0f.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/netdev/20201206092041.GA10646@1wt.eu/Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Subash Abhinov Kasiviswanathan authored
Packets sent by rmnet to the real device have variable MAP header lengths based on the data format configured. This patch adds checks to ensure that the real device MTU is sufficient to transmit the MAP packet comprising of the MAP header and the IP packet. This check is enforced when rmnet devices are created and updated and during MTU updates of both the rmnet and real device. Additionally, rmnet devices now have a default MTU configured which accounts for the real device MTU and the headroom based on the data format. Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Tested-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xie He authored
When the upper layer instruct us to connect (or disconnect), but we have already connected (or disconnected), consider this operation successful rather than failed. This can help the upper layer to correct its record about whether we are connected or not here in layer 2. The upper layer may not have the correct information about whether we are connected or not. This can happen if this driver has already been running for some time when the "x25" module gets loaded. Another X.25 driver (hdlc_x25) is already doing this, so we make this driver do this, too. Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sasha Neftin authored
Add new device ID for the next step of the silicon and reflect the I226_K part. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arjun Roy authored
A prior patch increased the size of struct tcp_zerocopy_receive but did not update do_tcp_getsockopt() handling to properly account for this. This patch simply reintroduces content erroneously cut from the referenced prior patch that handles the new struct size. Fixes: 18fb76ed ("net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.") Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zheng Yongjun authored
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zheng Yongjun authored
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zheng Yongjun authored
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-next-for-5.11-20201210' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-12-10 here's a pull request of 7 patches for net-next/master. The first patch is by Oliver Hartkopp for the CAN ISOTP, which adds support for functional addressing. A patch by Antonio Quartulli removes an unneeded unlikely() annotation from the rx-offload helper. The next three patches target the m_can driver. Sean Nyekjaers's patch removes a double clearing of clock stop request bit, Patrik Flykt's patch moves the runtime PM enable/disable to m_can_platform and Jarkko Nikula's patch adds a PCI glue code driver. Fabio Estevam's patch converts the flexcan driver to DT only. And Manivannan Sadhasivam's patchd for the mcp251xfd driver adds internal loopback mode support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antonio Quartulli authored
The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Manivannan Sadhasivam authored
MCP251xFD supports internal loopback mode which can be used to verify CAN functionality in the absence of a real CAN device. Link: https://lore.kernel.org/r/20201201054019.11012-1-manivannan.sadhasivam@linaro.orgSigned-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [mkl: mcp251xfd_get_normal_mode(): move CAN_CTRLMODE_LOOPBACK check to front] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Fabio Estevam authored
The flexcan driver runs only on DT platforms, so simplify the code by using of_device_get_match_data() to retrieve the driver data and also by removing the unused id_table. Signed-off-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20201128132855.7724-1-festevam@gmail.comSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Jarkko Nikula authored
Add support for M_CAN controller on Intel Elkhart Lake attached to the PCI bus. It integrates the Bosch M_CAN controller with Message RAM and the wrapper IP block with additional registers which all of them are within the same MMIO range. Currently only interrupt control register from wrapper IP is used and the MRAM configuration is expected to come from the firmware via "bosch,mram-cfg" device property and parsed by m_can.c core. Initial implementation is done by Felipe Balbi while he was working at Intel with later changes from Raymond Tan and me. Co-developed-by: Felipe Balbi (Intel) <balbi@kernel.org> Co-developed-by: Raymond Tan <raymond.tan@intel.com> Signed-off-by: Felipe Balbi (Intel) <balbi@kernel.org> Signed-off-by: Raymond Tan <raymond.tan@intel.com> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20201117160827.3636264-1-jarkko.nikula@linux.intel.comSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Patrik Flykt authored
This is a preparatory patch for upcoming PCI based M_CAN devices. The current PM implementation would cause PCI based drivers to enable PM twice, once when the PCI device is added and a second time in m_can_class_register(). This will cause 'Unbalanced pm_runtime_enable!' to be logged, and is a situation that should be avoided. Therefore, in anticipation of PCI devices, move PM enabling out from M_CAN class registration to its only user, the m_can_platform driver. Signed-off-by: Patrik Flykt <patrik.flykt@linux.intel.com> Link: https://lore.kernel.org/r/20201023115800.46538-2-patrik.flykt@linux.intel.com [mkl: m_can_plat_probe(): fix error handling m_can_class_register(): simplify error handling] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Sean Nyekjaer authored
The CSR bit is already cleared when arriving here so remove this section of duplicate code. The registers set in m_can_config_endisable() is set to same exact values as before this patch. Signed-off-by: Sean Nyekjaer <sean@geanix.com> Acked-by: Sriram Dash <sriram.dash@samsung.com> Acked-by: Dan Murphy <dmurphy@ti.com> Link: https://lore.kernel.org/r/20191211063227.84259-1-sean@geanix.com Fixes: f524f829 ("can: m_can: Create a m_can platform framework") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Antonio Quartulli authored
The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation. Signed-off-by: Antonio Quartulli <a@unstable.cc> Link: https://lore.kernel.org/r/20201210085321.18693-1-a@unstable.ccSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Oliver Hartkopp authored
When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags the CAN_ISOTP socket is switched into functional addressing mode, where only single frame (SF) protocol data units can be send on the specified CAN interface and the given tp.tx_id after bind(). In opposite to normal and extended addressing this socket does not register a CAN-ID for reception which would be needed for a 1-to-1 ISOTP connection with a segmented bi-directional data transfer. Sending SFs on this socket is therefore a TX-only 'broadcast' operation. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Thomas Wagner <thwa1@web.de> Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.netSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: updates for -next This patchset adds support for tc mqprio offload, hw tc offload of tc flower, and adpation for max rss size changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guojia Liao authored
For the max rss size of PF may be up to 512, the max queue number of single tc may be up to 512 too. For the total queue numbers may be up to 1280, so the queue offset of each tc may be more than 1024. So adjust the rss tc mode configuration command, including extend tc size field from 10 bits to 11 bits, and extend tc size field from 3 bits to 4 bits. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guojia Liao authored
For the max rss size of PF may be up to 512, so adjust the command of configuring rss indirection table to support queue id larger than 255. The width of queue id is extended from 8 bits to 10 bits. The high 2 bits are stored in filed rss_qid_h when the queue id is larger than 255. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guojia Liao authored
Currently, the driver gets the max rss size from configuration file when initialization. Both the PF and VF share the same max rss size, and no more than 128. For DEVICE_VERSION_V3, the max rss size for PF can be up to 512, so there is a new field in configuration file to store it, the old filed is used for VF. To be compatible with boards using old configure file, the PF will use the old filed if the one is zero. For the rss size may be larger than 256, so the type of rss_indirection_tbl of struct hclge_vport should be changed to u16 as well. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Some new device supports forwarding packet to queues of specified TC when flow director rule hit. So add support to configure flow director rule by tc flower. To avoid rule conflict, add a new flow director mode HCLGE_FD_TC_FLOWER_ACTIVE, and only one mode can be active at the same time. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
For some new device, it supports forwarding packet to queues of specified TC when flow director rule hit. So extend the command handle to support it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Currently, the HNS3 driver only supports offload for tc number and prio_tc. This patch adds support for other qopts, including queues count and offset for each tc. When enable tc mqprio offload, it's not allowed to change queue numbers by ethtool. For hardware limitation, the queue number of each tc should be power of 2. For the queues is not assigned to each tc by average, so it's should return vport->alloc_tqps for hclge_get_max_channels(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Currently, there are multiple members related to tc information in struct hnae3_knic_private_info. Merge them into a new struct hnae3_tc_info. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Test robot reports: drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_rx_resync_req': drivers/net/ethernet/netronome/nfp/crypto/tls.c:477:18: warning: variable 'ipv6h' set but not used [-Wunused-but-set-variable] 477 | struct ipv6hdr *ipv6h; | ^~~~~ In file included from include/linux/compiler_types.h:65, from <command-line>: drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_add': include/linux/compiler_attributes.h:208:41: warning: statement will never be executed [-Wswitch-unreachable] 208 | # define fallthrough __attribute__((__fallthrough__)) | ^~~~~~~~~~~~~ drivers/net/ethernet/netronome/nfp/crypto/tls.c:299:3: note: in expansion of macro 'fallthrough' 299 | fallthrough; | ^~~~~~~~~~~ Use the IPv6 header in the switch, it doesn't matter which header we use to read the version field. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wong Vee Khee authored
Assign stmmac's mdio_bus probe capabilities to MDIOBUS_C22_C45. This extended the probing of C45 PHY devices on the MDIO bus. Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
dule' Russell King says: ==================== Add support for VSOL V2801F/CarlitoxxPro CPGOS03 GPON module This patch set adds support for the V2801F / CarlitoxxPro module. This requires two changes: 1) the module only supports single byte reads to the ID EEPROM, while we need to still permit sequential reads to the diagnostics EEPROM for atomicity reasons. 2) we need to relax the encoding check when we have no reported capabilities to allow 1000base-X based on the module bitrate. Thanks to Pali Rohár for responsive testing over the last two days. (Resending, dropping the utf-8 characters in Pali's name so the patches get through vger. Added Andrew's r-b tags.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
Do not check the encoding when deriving 1000BASE-X from the bitrate when no other modes are discovered. Some GPON modules (VSOL V2801F and CarlitoxxPro CPGOS03-0490 v2.0) indicate NRZ encoding with a 1200Mbaud bitrate, but should be driven with 1000BASE-X on the host side. Tested-by: Pali Rohár <pali@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
Add a workaround for the detection of VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 GPON module which CarlitoxxPro states needs single byte I2C reads to the EEPROM. Pali Rohár reports that he also has a CarlitoxxPro-based V2801F module, which reports a manufacturer of "OEM". This manufacturer can't be matched as it appears in many different modules, so also match the part number too. Reported-by: Thomas Schreiber <tschreibe@gmail.com> Reported-by: Pali Rohár <pali@kernel.org> Tested-by: Pali Rohár <pali@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xie He authored
1. When the x25 module gets loaded, layer 2 may already be running and connected. In this case, although we are in X25_LINK_STATE_0, we still need to handle the Restart Request received, rather than ignore it. 2. When we are in X25_LINK_STATE_2, we have already sent a Restart Request and is waiting for the Restart Confirmation with t20timer. t20timer will restart itself repeatedly forever so it will always be there, as long as we are in State 2. So we don't need to check x25_t20timer_pending again. Fixes: d023b2b9 ("net/x25: fix restart request/confirm handling") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Paolo Abeni says: ==================== mptcp: a bunch of fixes This series includes a few fixes following-up the recent code refactor for the MPTCP RX and TX paths. Boundling them together, since the fixes are somewhat related. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
When the workqueue disposes of the msk, the subflows can still receive some data from the peer after __mptcp_close_ssk() completes. The above could trigger a race between the msk receive path and the msk destruction. Acquiring the mptcp_data_lock() in __mptcp_destroy_sock() will not save the day: the rx path could be reached even after msk destruction completes. Instead use the subflow 'disposable' flag to prevent entering the msk receive path after __mptcp_close_ssk(). Fixes: e16163b6 ("mptcp: refactor shutdown and close") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
When a MPTCP listener socket is closed with unaccepted children pending, the ULP release callback will be invoked, but nobody will call into __mptcp_close_ssk() on the corresponding subflow. As a consequence, at ULP release time, the 'disposable' flag will be cleared and the subflow context memory will be leaked. This change addresses the issue always freeing the context if the subflow is still in the accept queue at ULP release time. Additionally, this fixes an incorrect code reference in the related comment. Note: this fix leverages the changes introduced by the previous commit. Fixes: e16163b6 ("mptcp: refactor shutdown and close") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Christoph reported the following splat: WARNING: CPU: 0 PID: 4615 at net/ipv4/inet_connection_sock.c:1031 inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Modules linked in: CPU: 0 PID: 4615 Comm: syz-executor.4 Not tainted 5.9.0 #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Code: 03 00 00 00 e8 79 b2 3d ff e9 ad f9 ff ff e8 1f 76 ba fe be 02 00 00 00 4c 89 f7 e8 62 b2 3d ff e9 14 f9 ff ff e8 08 76 ba fe <0f> 0b e9 97 f8 ff ff e8 fc 75 ba fe be 03 00 00 00 4c 89 f7 e8 3f RSP: 0018:ffffc900037f7948 EFLAGS: 00010293 RAX: ffff88810a349c80 RBX: ffff888114ee1b00 RCX: ffffffff827b14cd RDX: 0000000000000000 RSI: ffffffff827b1c38 RDI: 0000000000000005 RBP: ffff88810a2a8000 R08: ffff88810a349c80 R09: fffff520006fef1f R10: 0000000000000003 R11: fffff520006fef1e R12: ffff888114ee2d00 R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888114ee1d68 FS: 00007f2ac1945700(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd44798bc0 CR3: 0000000109810002 CR4: 0000000000170ef0 Call Trace: __tcp_close+0xd86/0x1110 net/ipv4/tcp.c:2433 __mptcp_close_ssk+0x256/0x430 net/mptcp/protocol.c:1761 __mptcp_destroy_sock+0x49b/0x770 net/mptcp/protocol.c:2127 mptcp_close+0x62d/0x910 net/mptcp/protocol.c:2184 inet_release+0xe9/0x1f0 net/ipv4/af_inet.c:434 __sock_release+0xd2/0x280 net/socket.c:596 sock_close+0x15/0x20 net/socket.c:1277 __fput+0x276/0x960 fs/file_table.c:281 task_work_run+0x109/0x1d0 kernel/task_work.c:151 get_signal+0xe8f/0x1d40 kernel/signal.c:2561 arch_do_signal+0x88/0x1b60 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0x9b/0xf0 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x22/0x150 kernel/entry/common.c:266 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f2ac1254469 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 RSP: 002b:00007f2ac1944dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffbf RBX: 000000000069bf00 RCX: 00007f2ac1254469 RDX: 0000000000000000 RSI: 0000000000008982 RDI: 0000000000000003 RBP: 000000000069bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000069bf0c R13: 00007ffeb53f178f R14: 00000000004668b0 R15: 0000000000000003 After commit 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list"), the msk's workqueue and/or PM can touch the MPC subflow - and acquire its socket lock - even if it's still unaccepted. If the above event races with the relevant listener socket close, we can end-up with the above splat. This change addresses the issue delaying the MPC socket insertion in conn_list at accept time - that is, partially reverting the blamed commit. We must additionally ensure that mptcp_pm_fully_established() happens after accept() time, or the PM will not be able to handle properly such event - conn_list could be empty otherwise. In the receive path, we check the subflow list node to ensure it is out of the listener queue. Be sure client subflows do not match transiently such condition moving them into the join list earlier at creation time. Since we now have multiple mptcp_pm_fully_established() call sites from different code-paths, said helper can now race with itself. Use an additional PM status bit to avoid multiple notifications. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/103 Fixes: 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list"), Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-