- 17 Feb, 2022 40 commits
-
-
Eric Dumazet authored
Before freeing the hash table in addrconf_exit_net(), we need to make sure the work queue has completed, or risk NULL dereference or UAF. Thus, use cancel_delayed_work_sync() to enforce this. We do not hold RTNL in addrconf_exit_net(), making this safe. Fixes: 8805d13f ("ipv6/addrconf: use one delayed work per netns") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
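The ordering this fix enforces (stop the periodic work, wait for any in-flight run, and only then free the table) can be modelled in user space. The sketch below is a Python threading analogy, not kernel code: `NetnsAddrconf`, `_verify_loop`, and `exit_net` are hypothetical names, and `Event.set()` followed by `Thread.join()` merely stands in for cancel_delayed_work_sync().

```python
import threading


class NetnsAddrconf:
    """Toy per-namespace state: a table plus one periodic worker (hypothetical)."""

    def __init__(self, period=0.01):
        self.table = {}
        self._stop = threading.Event()
        self._worker = threading.Thread(
            target=self._verify_loop, args=(period,), daemon=True
        )
        self._worker.start()

    def _verify_loop(self, period):
        # Runs until asked to stop. Touching self.table after it has been
        # dropped (set to None) is the failure the ordering in exit_net avoids.
        while not self._stop.wait(period):
            for key in list(self.table):
                self.table[key] += 1

    def exit_net(self):
        self._stop.set()      # ask the worker to stop re-arming
        self._worker.join()   # wait until any in-flight run has finished
        self.table = None     # only now is it safe to drop the table
```

Swapping the last two statements of `exit_net()` would reintroduce the race: the worker could still be iterating when the table disappears.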
-
Jakub Kicinski authored
Sprinkle for each loops to allow netdevices to be unregistered out of order, as their refs are released. This prevents problems caused by dependencies between netdevs which want to release references in their ->priv_destructor. See commit d6ff94af ("vlan: move dev_put into vlan_dev_uninit") for example. Eric has removed the only known ordering requirement in commit c002496b ("Merge branch 'ipv6-loopback'") so let's try this and see if anything explodes... Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
In prep for unregistering netdevs out of order move the netdev state validation and change outside of the loop. While at it modernize this code and use WARN() instead of pr_err() + dump_stack(). Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Jakub Kicinski says: ==================== net: ping6: support setting basic SOL_IPV6 options via cmsg Support for IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG on ICMPv6 sockets and associated tests. I have no immediate plans to implement IPV6_FLOWINFO and all the extension header stuff. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Add a basic test to make sure ping sockets don't crash with IPV6_2292* options. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Test setting IPV6_HOPLIMIT via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case HOPLIMIT ICMP cmsg - packet data returned 1, expected 0 Case HOPLIMIT ICMP diff - packet data returned 1, expected 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Test setting IPV6_TCLASS via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case TCLASS ICMP cmsg - packet data returned 1, expected 0 Case TCLASS ICMP cmsg - rejection returned 0, expected 1 Case TCLASS ICMP diff - pass returned 1, expected 0 Case TCLASS ICMP diff - packet data returned 1, expected 0 Case TCLASS ICMP diff - rejection returned 0, expected 1 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Test setting IPV6_DONTFRAG via setsockopt and cmsg across socket types. Output without the kernel support (this series): Case DONTFRAG ICMP setsock returned 0, expected 1 Case DONTFRAG ICMP cmsg returned 0, expected 1 Case DONTFRAG ICMP both returned 0, expected 1 Case DONTFRAG ICMP diff returned 0, expected 1 FAIL - 4/24 cases failed Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Support setting IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG during sendmsg via SOL_IPV6 cmsgs. tclass and dontfrag are init'ed from struct ipv6_pinfo in ipcm6_init_sk(), while hlimit is inited to -1, so we need to handle it being populated via cmsg explicitly. Leave extension headers and flowlabel unimplemented. Those are slightly more laborious to test and users seem to primarily care about IPV6_TCLASS. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== Remove BRENTRY checks from switchdev drivers As discussed here: https://patchwork.kernel.org/project/netdevbpf/patch/20220214233111.1586715-2-vladimir.oltean@nxp.com/#24738869 no switchdev driver makes use of VLAN port objects that lack the BRIDGE_VLAN_INFO_BRENTRY flag. Notifying them in the first place rather seems like an omission of commit 9c86ce2c ("net: bridge: Notify about bridge VLANs"). Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag") that was just merged, the bridge no longer notifies switchdev upon creation of these VLANs, so we can remove the checks from drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag"), the bridge no longer emits switchdev notifiers for VLANs that don't have the BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== Support PTP over UDP with the ocelot-8021q DSA tagging protocol The alternative tag_8021q-based tagger for Ocelot switches, added here: https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/ gained support for PTP over L2 here: https://patchwork.kernel.org/project/netdevbpf/cover/20210213223801.1334216-1-olteanv@gmail.com/ mostly as a minimum viable requirement. That PTP support was mostly self-contained code that installed some rules to replicate PTP packets on the CPU queue, in felix_setup_mmio_filtering(). However ocelot-8021q starts to look more interesting for general purpose usage, so it is now time to reduce the technical debt by integrating the PTP traps used by Felix for tag_8021q with the rest of the Ocelot driver. There is further consolidation of traps to be done. The cookies used by MRP traps overlap with the cookies used for tag_8021q PTP traps, so those features could not be used at the same time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the expectation is that TX checksumming is offloaded and not done in software. Normally the DSA master takes care of this, but packets handled by ocelot_defer_xmit() are a very special exception, because they are actually injected into the switch through register-based MMIO. So the DSA master is not involved at all for these packets => no one calculates the checksum. This allows PTP over UDP to work using the ocelot-8021q tagging protocol. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Historically, the felix DSA driver has installed special traps such that PTP over L2 works with the ocelot-8021q tagging protocol; commit 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") has the details. Then the ocelot switch library also gained more comprehensive support for PTP traps through commit 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets"). Right now, PTP over L2 works using ocelot-8021q via the traps it has set for itself, but nothing else does. Consolidating the two code blocks would make ocelot-8021q gain support for PTP over L4 and tc-flower traps, and at the same time avoid some code and TCAM duplication. The traps are similar in intent, but different in execution, so some explanation is required. The traps set up by felix_setup_mmio_filtering() are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2 filter, and the IS2 is where the 'trap' action resides. The traps set up by ocelot_trap_add(), on the other hand, have a single filter, in VCAP IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure that the hardcoded traps take precedence and cannot be overridden by the Ocelot switch library. So in principle, the PTP traps needed for ocelot-8021q in the Felix driver can rely on ocelot_trap_add(), but the filters need to be patched to account for a quirk that LS1028A has: the quirk_no_xtr_irq described in commit 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping"). Live-patching is done by iterating through the trap list every time we know it has been updated, and transforming a trap into a redirect + CPU copy if ocelot-8021q is in use. Making the DSA ocelot-8021q tagger work with the Ocelot traps means we can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. 
To minimize the patch delta, OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO (the alternative would have been to left-shift all cookie numbers by 1). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
There has been some controversy related to the sanity check that a CPU port exists, and commit e8b1d769 ("net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering") even "corrected" an apparent memory leak as static analysis tools see it. However, the check is completely dead code, since the earliest point at which felix_setup_mmio_filtering() can be called is: felix_pci_probe -> dsa_register_switch -> dsa_switch_probe -> dsa_tree_setup -> dsa_tree_setup_cpu_ports -> dsa_tree_setup_default_cpu -> contains the "DSA: tree %d has no CPU port\n" check -> dsa_tree_setup_master -> dsa_master_setup -> sysfs_create_group(&dev->dev.kobj, &dsa_group); -> makes tagging_store() callable -> dsa_tree_change_tag_proto -> dsa_tree_notify -> dsa_switch_event -> dsa_switch_change_tag_proto -> ds->ops->change_tag_protocol -> felix_change_tag_protocol -> felix_set_tag_protocol -> felix_setup_tag_8021q -> felix_setup_mmio_filtering -> breaks at first CPU port So probing would have failed earlier if there wasn't any CPU port defined. To avoid all confusion, delete the dead code and replace it with a comment. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The ocelot switch library does not need this information, but the felix DSA driver does. As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line for packet extraction, so to be notified that a PTP packet needs to be dequeued, it receives that packet also over Ethernet, by setting up a packet trap. The Felix driver needs to install special kinds of traps for packets in need of RX timestamps, such that the packets are replicated both over Ethernet and over the CPU port module. But the Ocelot switch library sets up more than one trap for PTP event messages; it also traps PTP general messages, MRP control messages etc. Those packets don't need PTP timestamps, so there's no reason for the Felix driver to send them to the CPU port module. By knowing which traps need PTP timestamps, the Felix driver can adjust the traps installed using ocelot_trap_add() such that only those will actually get delivered to the CPU port module. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
When using the ocelot-8021q tagging protocol, the CPU port isn't configured as an NPI port, but is a regular port. So a "trap to CPU" operation is actually a "redirect" operation. So DSA needs to set up the trapping action one way or another, depending on the tagging protocol in use. To ease DSA's work of modifying the action, keep all currently installed traps in a list, so that DSA can live-patch them when the tagging protocol changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
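The idea of keeping installed traps in a list and rewriting their action when the tagging protocol changes can be sketched as a small model. This is an illustration in Python, not the driver's code: `Trap`, `takes_ts`, and `update_trapping_destinations` are hypothetical names, and the rule that only timestamp-needing traps keep a CPU copy under ocelot-8021q is taken from the neighbouring felix entries in this log.

```python
from dataclasses import dataclass


@dataclass
class Trap:
    name: str
    takes_ts: bool          # does this trap need an RX timestamp?
    action: str = "trap"    # "trap" (NPI mode) or "redirect" (ocelot-8021q)
    cpu_copy: bool = True   # also deliver to the CPU port module?


def update_trapping_destinations(traps, using_tag_8021q):
    """Live-patch every installed trap for the tagging protocol in use."""
    for trap in traps:
        if using_tag_8021q:
            # The CPU port is a regular port: redirect over Ethernet, and
            # keep the CPU port module copy only where a timestamp is needed.
            trap.action = "redirect"
            trap.cpu_copy = trap.takes_ts
        else:
            trap.action = "trap"
            trap.cpu_copy = True
    return traps
```

Because the list is the single source of truth, switching protocols is one pass over it rather than a teardown and reinstall of each filter.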
-
Vladimir Oltean authored
Use the helpers that avoid the quadratic complexity associated with calling dsa_to_port() indirectly: dsa_is_unused_port(), dsa_is_cpu_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN overlaps with OCELOT_VCAP_IS2_MRP_REDIRECT. To avoid this, make OCELOT_VCAP_IS2_MRP_REDIRECT take the cookie region from N to 2 * N - 1 (where N is ocelot->num_phys_ports). To avoid any risk that the singleton (not per port) VCAP IS2 filters overlap with per-port VCAP IS2 filters, we must ensure that the number of singleton filters is smaller than the number of physical ports. This is true right now, but may change in the future as switches with less ports get supported, or more singleton filters get added. So to be future-proof, let's move the singleton filters at the end of the range, where they won't overlap with anything to their right. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
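The cookie arithmetic can be checked with a small model. In this Python sketch, `is2_cookie_layout` and the singleton names passed to it are hypothetical; only the two per-port regions (0 to N - 1 and N to 2 * N - 1) and the placement of singletons after them come from the message above.

```python
def is2_cookie_layout(num_phys_ports, singletons=("singleton_a", "singleton_b")):
    """Assign VCAP IS2 cookies so that no two filters share an identifier."""
    n = num_phys_ports
    layout = {}
    for port in range(n):
        layout[("TAG_8021Q_TXVLAN", port)] = port      # region 0 .. N-1
        layout[("MRP_REDIRECT", port)] = n + port      # region N .. 2N-1
    for index, name in enumerate(singletons):
        # Singleton (not per-port) filters start after every per-port
        # region, so their count no longer has to stay below N.
        layout[(name, None)] = 2 * n + index
    return layout
```

With this placement, adding a singleton or supporting a switch with fewer ports cannot make a singleton cookie collide with a per-port one.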
-
Vladimir Oltean authored
The MRP assist code installs a VCAP IS2 trapping rule for each port, but since the key and the action is the same, just the ingress port mask differs, there isn't any need to do this. We can save some space in the TCAM by using a single filter and adjusting the ingress port mask. Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this purpose. Now that the cookies are no longer per port, we need to change the allocation scheme such that MRP traps use a fixed number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
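A minimal sketch of the single-filter approach, assuming a hypothetical `SharedTrap` class rather than the real ocelot_trap_add() and ocelot_trap_del() signatures: the filter is installed when the first port is added, later ports only widen the ingress port mask, and the filter is removed once the mask becomes empty.

```python
class SharedTrap:
    """One TCAM entry shared by all ports via an ingress port bitmask."""

    def __init__(self):
        self.ingress_port_mask = 0
        self.installed = False

    def add(self, port):
        # The first port installs the filter; later ports only set a bit.
        self.ingress_port_mask |= 1 << port
        self.installed = True

    def delete(self, port):
        # Clear the port's bit; drop the filter when no port is left.
        self.ingress_port_mask &= ~(1 << port)
        if self.ingress_port_mask == 0:
            self.installed = False
```

The TCAM cost is then one entry regardless of how many ports take part, instead of one entry per port.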
-
Vladimir Oltean authored
MRP frames are configured to be trapped to the CPU queue 7, and this number is reflected in the extraction header. However, the information isn't used anywhere, so just leave MRP frames to go to CPU queue 0 unless needed otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP, PTP traps) has hardcoded filter identifiers that worked well enough for that use case alone. But when two or more of those use cases would be used together, some of those identifiers would overlap, leading to breakage. Add definitions for each cookie and centralize them in ocelot_vcap.h, such that the overlaps are more obvious. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The driver uses an identifier equal to (ocelot->num_phys_ports + port) for MRP traps installed when the system is in the role of an MRC, and an identifier equal to (port) otherwise. Use the same identifier in both cases as a consolidation for the various cookie values spread throughout the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
David S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2022-02-16 Misc updates for mlx5: 1) Alex Liu Adds support for using xdp->data_meta 2) Aya Levin Adds PTP counters and port time stamp mode for representors and switchdev mode. 3) Tariq Toukan, Striding RQ simple improvements. 4) Roi Dayan (7): Create multiple attr instances per flow Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr instance after each multi table action and create a post action rule for it as a generic parsing step. Now multi table actions like CT, sample don't require to do it. When flow has multiple attr instances, the first flow attr is being offloaded normally and linked to the next attr (post action rule) by setting an id on reg_c for matching. Post action rule (rule created from second attr instance) match the id on reg_c and does rest of the actions. Example rule with actions CT,goto will be created with 2 attr instances as following: attr1(CT)->attr2(goto) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roi Dayan authored
Allow sample+CT actions but still block sample+CT NAT as it is not supported. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Before this commit, post_act could be used for normal rules but did not handle special cases like CT and sample. With this commit, a post_act rule can also handle the special cases when needed. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
When tc actions are being parsed, only the last flow attr created needs the counter flag; the flag is reset on the previous ones. Clean the flag from the tc action parsers. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
CT and sample actions use post actions for their implementation. Flag those actions as multi table actions so the post act infrastructure will handle the post actions allocation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Some TC actions use post actions for their implementation, for example CT and sample actions. Create a new flow attr after each multi table action and create a post action rule for it. The first flow attr is offloaded normally and linked to the next attr (post action rule) by setting an id on reg_c. Post action rules match the id on reg_c and continue to the next one. The flow counter is allocated on the last rule. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
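The splitting and chaining described here can be sketched as a toy model. The Python below is not the mlx5 implementation: `MULTI_TABLE_ACTIONS`, `split_into_attrs`, `build_chain`, and the dictionary fields are hypothetical, and the sketch assumes the action list ends with a non-multi-table action, as in the CT,goto example from the series description.

```python
MULTI_TABLE_ACTIONS = {"ct", "sample"}  # actions that end an attr instance


def split_into_attrs(actions):
    """Start a new attr instance after each multi table action."""
    attrs, current = [], []
    for action in actions:
        current.append(action)
        if action in MULTI_TABLE_ACTIONS:
            attrs.append(current)
            current = []
    if current:
        attrs.append(current)
    return attrs


def build_chain(actions):
    """Link attr instances through an id; only the last one gets the counter."""
    groups = split_into_attrs(actions)
    chain = []
    for index, group in enumerate(groups):
        last = index == len(groups) - 1
        chain.append({
            "actions": group,
            "match_id": index if index > 0 else None,  # id matched on reg_c
            "set_id": None if last else index + 1,     # id written for next rule
            "counter": last,
        })
    return chain
```

For `["ct", "goto"]` this yields two linked instances, the first performing CT and writing id 1, the second matching id 1, performing the goto, and owning the flow counter.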
-
Roi Dayan authored
Introduce mlx5e_tc_post_act_offload() and mlx5e_tc_post_act_unoffload() to be able to unoffload and reoffload existing post action rules handles. For example in neigh update events, the driver removes and readds rules in hardware. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Currently the mlx5_flow object contains a single mlx5_attr instance. However, multi table actions (e.g. CT) instantiate multiple attr instances. Currently actions_match_supported() reads the actions flag from the flow's attribute instance. Modify the function to receive the action flags as a parameter which is set by the calling function and pass the aggregated actions to actions_match_supported(). Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
To allow shared tc block offload between two or more reps of the same eswitch, move the tc flow hashtable to be per rep, instead of per eswitch. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
When turning on tx_port_ts (private flag) a PTP-SQ is created. Consider this queue when adding rules matching SQs to VPORTs. Otherwise the traffic on this queue won't reach the wire. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
There is a configuration where the uplink interface is the synchronizer. Add PTP counters for this interface for monitoring. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Tariq Toukan authored
In RQs of type multi-packet WQE (Striding RQ), each WQE is relatively large (typically 256KB) but their number is relatively small (8 in default). Re-mapping the descriptors' buffers before re-posting them is done via UMR (User-Mode Memory Registration) operations. On the one hand, posting UMR WQEs in bulks reduces communication overhead with the HW and better utilizes its processing units. On the other hand, delaying the WQE repost operations for a small RQ (say, of 4 WQEs) might drastically hit its performance, causing packet drops due to no receive buffer, for high or bursty incoming packets rate. Here we restrict the bulk size for too small RQs. Effectively, with the current constants, RQ of size 4 (minimum allowed) would have no bulking, while larger RQs will continue working with bulks of 2. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
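One formula that reproduces the numbers quoted above (no bulking for an RQ of 4 WQEs, bulks of 2 for larger RQs) is shown below as a Python sketch. The constant name `UMR_WQE_BULK`, the function name `min_wqe_bulk`, and the exact expression are assumptions for illustration; the log does not show the driver's code.

```python
UMR_WQE_BULK = 2  # assumed upper bound on WQEs reposted per UMR bulk


def min_wqe_bulk(rq_size):
    """Cap the repost bulk so that small RQs are not starved of buffers."""
    if rq_size < 4:
        raise ValueError("RQ size below the minimum of 4 WQEs")
    # Half the ring minus one keeps a 4-entry RQ at a bulk of 1.
    return min(UMR_WQE_BULK, rq_size // 2 - 1)
```

A bulk of 1 means every consumed WQE is reposted immediately, which trades some UMR posting overhead for a lower risk of drops on a tiny ring.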
-
Tariq Toukan authored
CQE compression is turned on by default on slow pci systems to help reduce the load on pci. In this case, Striding RQ was turned off as CQEs of packets that span several strides were not compressed, significantly reducing the compression effectiveness. This issue does not exist when using the newer mini_cqe format "stride_index". Hence, allow defaulting to Striding RQ in this case. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-