- 23 Feb, 2022 38 commits
-
-
Hans Schultz authored
Supporting bridge ports in locked mode using the drop on lock feature in Marvell mv88e6xxx switchcores is described in the '88E6096/88E6097/88E6097F Datasheet', sections 4.4.6, 4.4.7 and 5.1.2.1 (Drop on Lock). This feature is implemented here facilitated by the locked port flag. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hans Schultz authored
Ensures that the DSA switch driver gets notified of changes to the BR_PORT_LOCKED flag as well, for the case when a DSA port joins or leaves a LAG that is a bridge port. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hans Schultz authored
Various switchcores support setting ports in locked mode, so that clients behind locked ports cannot send traffic through the port unless a fdb entry is added with the clients MAC address. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hans Schultz authored
In a 802.1X scenario, clients connected to a bridge port shall not be allowed to have traffic forwarded until fully authenticated. A static fdb entry of the clients MAC address for the bridge port unlocks the client and allows bidirectional communication. This scenario is facilitated with setting the bridge port in locked mode, which is also supported by various switchcore chipsets. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
drop_monitor is using an unique list on which all netdevices in the host have an element, regardless of their netns. This scales poorly, not only at device unregister time (what I caught during my netns dismantle stress tests), but also at packet processing time whenever trace_napi_poll_hit() is called. If the intent was to avoid adding one pointer in 'struct net_device' then surely we prefer O(1) behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: Various updates This patchset contains miscellaneous updates to mlxsw gathered over time. Patches #1-#2 fix recent regressions present in net-next. Patches #3-#11 are small cleanups performed while adding line card support in mlxsw. Patch #12 adds the SFF-8024 Identifier Value of OSFP transceiver in order to be able to dump their EEPROM contents over the ethtool IOCTL interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Danielle Ratson authored
The driver can already dump the EEPROM contents of QSFP-DD transceiver modules via its ethtool_ops::get_module_info() and ethtool_ops::get_module_eeprom() callbacks. Add support for OSFP transceiver modules by adding their SFF-8024 Identifier Value (0x19). This is required for future NVIDIA Spectrum-4 based systems that will be equipped with OSFP transceivers. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Since SwitchX-2 support was removed in commit b0d80c01 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"), all the ASICs supported by mlxsw support the resource query command. Therefore, remove the resource query check and always query resources from the device. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Currently there are several different features defined in 'mlxsw_driver' for trap support validation. There is no reason to have dedicated features for specific traps. Perform validation of all of them by testing feature 'MLXSW_BUS_F_TXRX'. Remove trap capability validation from 'core_env.c' which is redundant after validation has been added to mlxsw_core_trap_register(). Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
The FW minor and subminor versions are the same for all generations of Spectrum ASICs. Unify them into a single set of defines. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Remove unnecessary asserts for module index validation. Leave only one that is actually necessary in mlxsw_env_pmpe_listener_func() where the module index is directly read from the firmware event. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Do the same as for other registers and have "mgpir_" prefix for the MGPIR fields. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Remove obsolete API mlxsw_core_res_query_enabled(), which is only relevant for end-of-life SwitchX-2 ASICs. Support for these ASICs was removed in commit b0d80c01 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"). Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Rename labels for error flow handling in order to align with naming convention used in rest of 'mlxsw' code. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Replace all local variables 'mlwsw_hwmon_attr' by 'mlxsw_hwmon_attr'. All variable prefixes should start with 'mlxsw' according to the naming convention, so 'mlwsw' is changed to 'mlxsw'. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
The driver registers with both the hwmon and thermal subsystems. Therefore, there is no need for the thermal subsystem to automatically create hwmon entries upon registration of a thermal zone, as this results in duplicate information. Avoid creation of virtual hwmon objects by thermal subsystem by registering a thermal zone with 'no_hwmon' set to 'true'. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Only VLAN entries installed on the bridge device itself should be considered when checking whether a packet with a specific VLAN can be mirrored via a bridge device. VLAN entries only used to keep context (i.e., entries with 'BRIDGE_VLAN_INFO_BRENTRY' unset) should be ignored. Fix this by preventing mirroring when the VLAN entry does not have the 'BRIDGE_VLAN_INFO_BRENTRY' flag set. Fixes: ddaff504 ("mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Avoid trap group setting if driver is not capable of EMAD support. For example, "mlxsw_minimal" driver works over I2C bus, overs which EMADs cannot be sent. Validation is performed by testing feature 'MLXSW_BUS_F_TXRX'. Fixes: 74e0494d ("mlxsw: core: Move basic_trap_groups_set() call out of EMAD init code") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matt Johnston authored
net/mctp/device.c:140:11: warning: Assigned value is garbage or undefined [clang-analyzer-core.uninitialized.Assign] mcb->idx = idx; - Not a real problem due to how the callback runs, fix the warning. net/mctp/route.c:458:4: warning: Value stored to 'msk' is never read [clang-analyzer-deadcode.DeadStores] msk = container_of(key->sk, struct mctp_sock, sk); - 'msk' dead assignment can be removed here. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Matt Johnston says: ==================== mctp: Fix incorrect refs for extended addr This fixes an incorrect netdev unref and also addresses the race condition identified by Jakub in v2. Thanks for the review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matt Johnston authored
In the extended addressing local route output codepath dev_get_by_index_rcu() doesn't take a dev_hold() so we shouldn't dev_put(). Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matt Johnston authored
Previously there was a race that could allow the mctp_dev refcount to hit zero: rcu_read_lock(); mdev = __mctp_dev_get(dev); // mctp_unregister() happens here, mdev->refs hits zero mctp_dev_hold(dev); rcu_read_unlock(); Now we make __mctp_dev_get() take the hold itself. It is safe to test against the zero refcount because __mctp_dev_get() is called holding rcu_read_lock and mctp_dev uses kfree_rcu(). Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alvin Šipraga says: ==================== net: dsa: realtek: fix PHY register read corruption These two patches fix the issue reported by Arınç where PHY register reads sometimes return garbage data. v1 -> v2: - no code changes - just update the commit message of patch 2 to reflect the conclusion of further investigation requested by Vladimir ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alvin Šipraga authored
Realtek switches in the rtl8365mb family can access the PHY registers of the internal PHYs via the switch registers. This method is called indirect access. At a high level, the indirect PHY register access method involves reading and writing some special switch registers in a particular sequence. This works for both SMI and MDIO connected switches. Currently the rtl8365mb driver does not take any care to serialize the aforementioned access to the switch registers. In particular, it is permitted for other driver code to access other switch registers while the indirect PHY register access is ongoing. Locking is only done at the regmap level. This, however, is a bug: concurrent register access, even to unrelated switch registers, risks corrupting the PHY register value read back via the indirect access method described above. Arınç reported that the switch sometimes returns nonsense data when reading the PHY registers. In particular, a value of 0 causes the kernel's PHY subsystem to think that the link is down, but since most reads return correct data, the link then flip-flops between up and down over a period of time. The aforementioned bug can be readily observed by: 1. Enabling ftrace events for regmap and mdio 2. Polling BSMR PHY register for a connected port; it should always read the same (e.g. 0x79ed) 3. Wait for step 2 to give a different value Example command for step 2: while true; do phytool read swp2/2/0x01; done On my i.MX8MM, the above steps will yield a bogus value for the BSMR PHY register within a matter of seconds. The interleaved register access it then evident in the trace log: kworker/3:4-70 [003] ....... 1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd phytool-16816 [002] ....... 1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0 kworker/3:4-70 [003] ....... 1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0 phytool-16816 [002] ....... 1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69 kworker/3:4-70 [003] ....... 1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0 phytool-16816 [002] ....... 1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041 kworker/3:4-70 [003] ....... 1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0 phytool-16816 [002] ....... 1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1 kworker/3:4-70 [003] ....... 1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be phytool-16816 [002] ....... 1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0 kworker/3:4-70 [003] ....... 1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0 phytool-16816 [002] ....... 1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0 phytool-16816 [002] ....... 1927.142641: mdio_access: SMI-0 read phy:0x02 reg:0x01 val:0x0000 <- ?! kworker/3:4-70 [003] ....... 1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0 kworker/3:4-70 [003] ....... 1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89 kworker/3:4-70 [003] ....... 1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be kworker/3:4-70 [003] ....... 1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0 kworker/3:4-70 [003] ....... 1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0 kworker/3:4-70 [003] ....... 1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6 The kworker here is polling MIB counters for stats, as evidenced by the register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). This polling is performed every 3 seconds, but is just one example of such unsynchronized access. In Arınç's case, the driver was not using the switch IRQ, so the PHY subsystem was itself doing polling analogous to phytool in the above example. A test module was created [see second Link] to simulate such spurious switch register accesses while performing indirect PHY register reads and writes. Realtek was also consulted to confirm whether this is a known issue or not. The conclusion of these lines of inquiry is as follows: 1. Reading of PHY registers via indirect access will be aborted if, after executing the read operation (via a write to the INDIRECT_ACCESS_CTRL_REG), any register is accessed, other than INDIRECT_ACCESS_STATUS_REG. 2. The PHY register indirect read is only complete when INDIRECT_ACCESS_STATUS_REG reads zero. 3. The INDIRECT_ACCESS_DATA_REG, which is read to get the result of the PHY read, will contain the result of the last successful read operation. If there was spurious register access and the indirect read was aborted, then this register is not guaranteed to hold anything meaningful and the PHY read will silently fail. 4. PHY writes do not appear to be affected by this mechanism. 5. Other similar access routines, such as for MIB counters, although similar to the PHY indirect access method, are actually table access. Table access is not affected by spurious reads or writes of other registers. However, concurrent table access is not allowed. Currently this is protected via mib_lock, so there is nothing to fix. The above statements are corroborated both via the test module and through consultation with Realtek. In particular, Realtek states that this is simply a property of the hardware design and is not a hardware bug. To fix this problem, one must guard against regmap access while the PHY indirect register read is executing. Fix this by using the newly introduced "nolock" regmap in all PHY-related functions, and by aquiring the regmap mutex at the top level of the PHY register access callbacks. Although no issue has been observed with PHY register _writes_, this change also serializes the indirect access method there. This is done purely as a matter of convenience and for reasons of symmetry. Fixes: 4af2950c ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC") Link: https://lore.kernel.org/netdev/CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/871qzwjmtv.fsf@bang-olufsen.dk/Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alvin Šipraga authored
Currently there is no way for Realtek DSA subdrivers to serialize consecutive regmap accesses. In preparation for a bugfix relating to indirect PHY register access - which involves a series of regmap reads and writes - add a facility for subdrivers to serialize their regmap access. Specifically, a mutex is added to the driver private data structure and the standard regmap is initialized with custom lock/unlock ops which use this mutex. Then, a "nolock" variant of the regmap is added, which is functionally equivalent to the existing regmap except that regmap locking is disabled. Functions that wish to serialize a sequence of regmap accesses may then lock the newly introduced driver-owned mutex before using the nolock regmap. Doing things this way means that subdriver code that doesn't care about serialized register access - i.e. the vast majority of code - needn't worry about synchronizing register access with an external lock: it can just continue to use the original regmap. Another advantage of this design is that, while regmaps with locking disabled do not expose a debugfs interface for obvious reasons, there still exists the original regmap which does expose this interface. This interface remains safe to use even combined with driver codepaths that use the nolock regmap, because said codepaths will use the same mutex to synchronize access. With respect to disadvantages, it can be argued that having near-duplicate regmaps is confusing. However, the naming is rather explicit, and examples will abound. Finally, while we are at it, rename realtek_smi_mdio_regmap_config to realtek_smi_regmap_config. This makes it consistent with the naming realtek_mdio_regmap_config in realtek-mdio.c. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The logic from switchdev_handle_port_obj_add_foreign() is directly adapted from switchdev_handle_fdb_event_to_device(), which already detects events on foreign interfaces and reoffloads them towards the switchdev neighbors. However, when we have a simple br0 <-> bond0 <-> swp0 topology and the switchdev_handle_port_obj_add_foreign() gets called on bond0, we get stuck into an infinite recursion: 1. bond0 does not pass check_cb(), so we attempt to find switchdev neighbor interfaces. For that, we recursively call __switchdev_handle_port_obj_add() for bond0's bridge, br0. 2. __switchdev_handle_port_obj_add() recurses through br0's lowers, essentially calling __switchdev_handle_port_obj_add() for bond0 3. Go to step 1. This happens because switchdev_handle_fdb_event_to_device() and switchdev_handle_port_obj_add_foreign() are not exactly the same. The FDB event helper special-cases LAG interfaces with its lag_mod_cb(), so this is why we don't end up in an infinite loop - because it doesn't attempt to treat LAG interfaces as potentially foreign bridge ports. The problem is solved by looking ahead through the bridge's lowers to see whether there is any switchdev interface that is foreign to the @dev we are currently processing. This stops the recursion described above at step 1: __switchdev_handle_port_obj_add(bond0) will not create another call to __switchdev_handle_port_obj_add(br0). Going one step upper should only happen when we're starting from a bridge port that has been determined to be "foreign" to the switchdev driver that passes the foreign_dev_check_cb(). Fixes: c4076cdd ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shannon Nelson authored
The ever-vigilant Linux kernel test robot reminded us that we need to use the correct include files to be sure that all the build variations will work correctly. Adding the vmalloc.h include takes care of declaring our use of vzalloc() and vfree(). drivers/net/ethernet/pensando/ionic/ionic_lif.c:396:17: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? drivers/net/ethernet/pensando/ionic/ionic_lif.c:531:21: warning: assignment to 'struct ionic_desc_info *' from 'int' makes pointer from integer without a cast Fixes: 116dce0f ("ionic: Use vzalloc for large per-queue related buffers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shannon Nelson <snelson@pensando.io> Link: https://lore.kernel.org/r/20220223015731.22025-1-snelson@pensando.ioSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== tcp: take care of another syzbot issue This is a minor issue: It took months for syzbot to find a C repro, and even with it, I had to spend a lot of time to understand KFENCE was a prereq. With the default kfence 500ms interval, I had to be very patient to trigger the kernel warning and perform my analysis. This series targets net-next tree, because I added a new generic helper in the first patch, then fixed the issue in the second one. They can be backported once proven solid. ==================== Link: https://lore.kernel.org/r/20220222032113.4005821-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len) in skb_try_coalesce() [1] I was able to root cause the issue to kfence. When kfence is in action, the following assertion is no longer true: int size = xxxx; void *ptr1 = kmalloc(size, gfp); void *ptr2 = kmalloc(size, gfp); if (ptr1 && ptr2) ASSERT(ksize(ptr1) == ksize(ptr2)); We attempted to fix these issues in the blamed commits, but forgot that TCP was possibly shifting data after skb_unclone_keeptruesize() has been used, notably from tcp_retrans_try_collapse(). So we not only need to keep same skb->truesize value, we also need to make sure TCP wont fill new tailroom that pskb_expand_head() was able to get from a addr = kmalloc(...) followed by ksize(addr) Split skb_unclone_keeptruesize() into two parts: 1) Inline skb_unclone_keeptruesize() for the common case, when skb is not cloned. 2) Out of line __skb_unclone_keeptruesize() for the 'slow path'. WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Modules linked in: CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa RSP: 0018:ffffc900063af268 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000 RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003 RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000 R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8 R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155 FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline] tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630 tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914 tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025 tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947 tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719 sk_backlog_rcv include/net/sock.h:1037 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2779 release_sock+0x54/0x1b0 net/core/sock.c:3311 sk_wait_data+0x177/0x450 net/core/sock.c:2821 tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457 tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572 inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c4777efa ("net: add and use skb_unclone_keeptruesize() helper") Fixes: 097b9146 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
We have multiple places where this helper is convenient, and plan using it in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
All other skbs allocated for TCP tx are using MAX_TCP_HEADER already. MAX_HEADER can be too small for some cases (like eBPF based encapsulation), so this can avoid extra pskb_expand_head() in lower stacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220222031115.4005060-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Maciek Machnikowski authored
Add option to shift the clock by a specified number of nanoseconds. The new argument -n will specify the number of nanoseconds to add to the ptp clock. Since the API doesn't support negative shifts those needs to be calculated by subtracting full seconds and adding a nanosecond offset. Signed-off-by: Maciek Machnikowski <maciek@machnikowski.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20220221200637.125595-1-maciek@machnikowski.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gustavo A. R. Silva authored
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). This issue was found with the help of Coccinelle and audited and fixed, manually. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20220221173415.GA1149599@embeddedorSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Vladimir Oltean reports that probing on DSA drivers that aren't yet populating supported_interfaces now fails. Fix this by allowing phylink to detect whether DSA actually provides an underlying mac_select_pcs() implementation. Reported-by: Vladimir Oltean <olteanv@gmail.com> Fixes: bde01822 ("net: dsa: add support for phylink mac_select_pcs()") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1nMCD6-00A0wC-FG@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
30 seconds is too long interval especially if it used with ip -s l. Reduce polling interval to 5 sec. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20220221084129.3660124-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alexandra Winter says: ==================== s390/net: updates 2022-02-21 Just cleanup. No functional changes, as currently virt=phys in s390. ==================== Link: https://lore.kernel.org/r/20220221145633.3869621-1-wintera@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alexander Gordeev authored
Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alexander Gordeev authored
Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 22 Feb, 2022 2 commits
-
-
Eric Dumazet authored
Another thing making netns dismantles potentially very slow is located in gro_cells_destroy(), whenever cleanup_net() has to remove a device using gro_cells framework. RTNL is not held at this stage, so synchronize_net() is calling synchronize_rcu(): netdev_run_todo() ip_tunnel_dev_free() gro_cells_destroy() synchronize_net() synchronize_rcu() // Ouch. This patch uses call_rcu(), and gave me a 25x performance improvement in my tests. cleanup_net() is no longer blocked ~10 ms per synchronize_rcu() call. In the case we could not allocate the memory needed to queue the deferred free, use synchronize_rcu_expedited() v2: made percpu_free_defer_callback() static Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220220041155.607637-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Russell King says: ==================== net: dsa: b53: convert to phylink_generic_validate() and mark as non-legacy This series converts b53 to use phylink_generic_validate() and also marks this driver as non-legacy. Patch 1 cleans up an if() condition to be more readable before we proceed with the conversion. Patch 2 populates the supported_interfaces and mac_capabilities members of phylink_config. Patch 3 drops the use of phylink_helper_basex_speed() which is now not necessary. Patch 4 switches the driver to use phylink_generic_validate() Patch 5 marks the driver as non-legacy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-