- 19 May, 2021 19 commits
-
-
Raju Rangoju authored
Hardware register having the server TID base can contain invalid values when adapter is in bad state (for example, due to AER fatal error). Reading these invalid values in the register can lead to out-of-bound memory access. So, fix by using the saved server TID base when clearing filters. Fixes: b1a79360 ("cxgb4: Delete all hash and TCAM filters before resource cleanup") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== mlx5 fixes 2021-05-18 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
YueHaibing authored
data->ctrl_stats should be memset with correct size. Fixes: bfad2b97 ("ethtool: add interface to read standard MAC Ctrl stats") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
It's not correct to call napi_schedule() in pure process context. Because we use __raise_softirq_irqoff() we require callers to be in a context which will eventually lead to softirq handling (hardirq, bh disabled, etc.). With code as is users will see: NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! Fixes: a8dd7ac1 ("net/mlx5e: Generalize RQ activation") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Ariel Levkovich authored
Termination tables are restricted to have the default miss action and cannot be set to forward to another table in case of a miss. If the fs prio of the termination table is not the last one in the list, fs_core will attempt to attach it to another table. Set the unmanaged ft flag when creating the termination table ft and select the tc offload prio for it to prevent fs_core from selecting the forwarding to next ft miss action and use the default one. In addition, set the flow that forwards to the termination table to ignore ft level restrictions since the ft level is not set by fs_core for unamanged fts. Fixes: 249ccc3c ("net/mlx5e: Add support for offloading traffic from uplink to uplink") Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
-
Leon Romanovsky authored
During driver probe of device that has dynamic MSI-X feature enabled, the following error is printed in some FW flavour (not released yet). mlx5_core 0000:06:00.0: firmware version: 4.7.4387 mlx5_core 0000:06:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link) mlx5_core 0000:06:00.0: mlx5_cmd_check:777:(pid 70599): SET_HCA_CAP(0x109) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x0) mlx5_core 0000:06:00.0: set_hca_cap:622:(pid 70599): handle_hca_cap failed mlx5_core 0000:06:00.0: mlx5_function_setup:1045:(pid 70599): set_hca_cap failed mlx5_core 0000:06:00.0: probe_one:1465:(pid 70599): mlx5_init_one failed with error code -22 mlx5_core: probe of 0000:06:00.0 failed with error -22 In order to make the setting capability of MSI-X future proof, let's query the current capabilities first. Fixes: 604774ad ("net/mlx5: Dynamically assign MSI-X vectors count") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Eli Cohen authored
net/mlx5: Expose MPFS configuration API MPFS is the multi physical function switch that bridges traffic between the physical port and any physical functions associated with it. The driver is required to add or remove MAC entries to properly forward incoming traffic to the correct physical function. We export the API to control MPFS so that other drivers, such as mlx5_vdpa are able to add MAC addresses of their network interfaces. The MAC address of the vdpa interface must be configured into the MPFS L2 address. Failing to do so could cause, in some NIC configurations, failure to forward packets to the vdpa network device instance. Fix this by adding calls to update the MPFS table. CC: <mst@redhat.com> CC: <jasowang@redhat.com> CC: <virtualization@lists.linux-foundation.org> Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
Avoid division by zero in the error flow. In the driver TC number can be either 1 or 8. When TC count is set to 1, driver zero netdev->num_tc. Hence, need to convert it back from 0 to 1 in the error flow. Fixes: fa374877 ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
Rules with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE dest flag are translated to destination FT in eswitch. Currently it is not possible to mirror such rules because firmware doesn't support mixing FT and Vport destinations in single rule when one of them adds encapsulation. Since the only use case for MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE destination is support for tunnel endpoints on VF and trying to offload such rule with mirror action causes either crash in fs_core or firmware error with syndrome 0xff6a1d, reject all such rules in mlx5 TC layer. Fixes: 10742efc ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Dima Chumak authored
When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route, lag activation can be missed if a stale (struct lag_mp)->mfi pointer exists, which was associated with an older multipath route that had been removed. Normally, when a route is removed, it triggers mlx5_lag_fib_event(), which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if mlx5_lag_check_prereq() condition isn't met, for example when eswitch is in legacy mode, the fib event is skipped and mfi pointer becomes stale. Fix by resetting mfi pointer to NULL every time mlx5_lag_mp_init() is called. Fixes: 544fe7c2 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Saeed Mahameed authored
mlx5e_attach_netdev can be called prior to registering the netdevice: Example stack: ipoib_new_child_link -> ipoib_intf_init-> rdma_init_netdev-> mlx5_rdma_setup_rn-> mlx5e_attach_netdev-> mlx5e_num_channels_changed -> mlx5e_set_default_xps_cpumasks -> netif_set_xps_queue -> __netif_set_xps_queue -> kmalloc If any later stage fails at any point after mlx5e_num_channels_changed() returns, XPS allocated maps will never be freed as they are only freed during netdev unregistration, which will never happen for yet to be registered netdevs. Fixes: 3909a12e ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
-
Roi Dayan authored
For unreachable route entry the fib dev does not exists. Fixes: 8914add2 ("net/mlx5e: Handle FIB events to update tunnel endpoint device") Reported-by: Dennis Afanasev <dennis.afanasev@stateless.net> Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
It could be the lag dev is null so stop processing the event. In bond_enslave() the active/backup slave being set before setting the upper dev so first event is without an upper dev. After setting the upper dev with bond_master_upper_dev_link() there is a second event and in that event we have an upper dev. Fixes: 7e51891a ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Dima Chumak authored
The result of __dev_get_by_index() is not checked for NULL, which then passed to mlx5e_attach_encap() and gets dereferenced. Also, in case of a successful lookup, the net_device reference count is not incremented, which may result in net_device pointer becoming invalid at any time during mlx5e_attach_encap() execution. Fix by using dev_get_by_index(), which does proper reference counting on the net_device pointer. Also, handle nullptr return value when mirred device is not found. It's safe to call dev_put() on the mirred net_device pointer, right after mlx5e_attach_encap() call, because it's not being saved/copied down the call chain. Fixes: 3c37745e ("net/mlx5e: Properly deal with encap flows add/del under neigh update") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Parav Pandit authored
When a SF is inactivated and when it is in a TEARDOWN_REQUEST state, driver still returns its state as active. This is incorrect. Fix it by treating TEARDOWN_REQEUST as inactive state. When a SF is still attached to the driver, on user request to reactivate EINVAL error is returned. Inform user about it with better code EBUSY and informative error message. Fixes: 6a327321 ("net/mlx5: SF, Port function state change support") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Fix print to print correct error code and not using IS_ERR() which will just result in always printing 1. Also return real err instead of always -EOPNOTSUPP. Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Jianbo Liu authored
For remote mirroring, after the tunnel packets are received, they are decapsulated and sent to representor, then re-encapsulated and sent out over another tunnel. So reformat action is set only when the destination is required to do encapsulation. Fixes: 249ccc3c ("net/mlx5e: Add support for offloading traffic from uplink to uplink") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Dima Chumak authored
The result of dev_get_by_index_rcu() is not checked for NULL and then gets dereferenced immediately. Also, the RCU lock must be held by the caller of dev_get_by_index_rcu(), which isn't satisfied by the call stack. Fix by handling nullptr return value when iflink device is not found. Add RCU locking around dev_get_by_index_rcu() to avoid possible adverse effects while iterating over the net_device's hlist. It is safe not to increment reference count of the net_device pointer in case of a successful lookup, because it's already handled by VLAN code during VLAN device registration (see register_vlan_dev and netdev_upper_dev_link). Fixes: 278748a9 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maor Gottlieb authored
mlx5_core_dev holds pointer to static profile, hence when the log_max_qp of the profile is override by some device, then it effect all other mlx5 devices that share the same profile. Fix it by having a profile instance for every mlx5 device. Fixes: 883371c4 ("net/mlx5: Check FW limitations on log_max_qp before setting it") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 18 May, 2021 7 commits
-
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: fixes for -net This series includes some bugfixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
Currently skb_checksum_help()'s return is ignored, but it may return error when it fails to allocate memory when linearizing. So adds checking for the return of skb_checksum_help(). Fixes: 76ad4f0e("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Fixes: 3db084d2("net: hns3: Fix for vxlan tx checksum bug") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
Currently, when adaptive is on, the user's coalesce configuration may be overwritten by the dynamic one. The reason is that user's configurations are saved in struct hns3_enet_tqp_vector whose value maybe changed by the dynamic algorithm. To fix it, use struct hns3_nic_priv instead of struct hns3_enet_tqp_vector to save and get the user's configuration. BTW, operations of storing and restoring coalesce info in the reset process are unnecessary now, so remove them as well. Fixes: 434776a5 ("net: hns3: add ethtool_ops.set_coalesce support to PF") Fixes: 7e96adc4 ("net: hns3: add ethtool_ops.get_coalesce support to PF") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
Currently, the netdevice is registered before client initializing complete. So there is a timewindow between netdevice available and usable. In this case, if user try to change the channel number or ring param, it may cause the hns3_set_rx_cpu_rmap() being called twice, and report bug. [47199.416502] hns3 0000:35:00.0 eth1: set channels: tqp_num=1, rxfh=0 [47199.430340] hns3 0000:35:00.0 eth1: already uninitialized [47199.438554] hns3 0000:35:00.0: rss changes from 4 to 1 [47199.511854] hns3 0000:35:00.0: Channels changed, rss_size from 4 to 1, tqps from 4 to 1 [47200.163524] ------------[ cut here ]------------ [47200.171674] kernel BUG at lib/cpu_rmap.c:142! [47200.177847] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [47200.185259] Modules linked in: hclge(+) hns3(-) hns3_cae(O) hns_roce_hw_v2 hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [last unloaded: hclge] [47200.205912] CPU: 1 PID: 8260 Comm: ethtool Tainted: G O 5.11.0-rc3+ #1 [47200.215601] Hardware name: , xxxxxx 02/04/2021 [47200.223052] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--) [47200.230188] pc : cpu_rmap_add+0x38/0x40 [47200.237472] lr : irq_cpu_rmap_add+0x84/0x140 [47200.243291] sp : ffff800010e93a30 [47200.247295] x29: ffff800010e93a30 x28: ffff082100584880 [47200.254155] x27: 0000000000000000 x26: 0000000000000000 [47200.260712] x25: 0000000000000000 x24: 0000000000000004 [47200.267241] x23: ffff08209ba03000 x22: ffff08209ba038c0 [47200.273789] x21: 000000000000003f x20: ffff0820e2bc1680 [47200.280400] x19: ffff0820c970ec80 x18: 00000000000000c0 [47200.286944] x17: 0000000000000000 x16: ffffb43debe4a0d0 [47200.293456] x15: fffffc2082990600 x14: dead000000000122 [47200.300059] x13: ffffffffffffffff x12: 000000000000003e [47200.306606] x11: ffff0820815b8080 x10: ffff53e411988000 [47200.313171] x9 : 0000000000000000 x8 : ffff0820e2bc1700 [47200.319682] x7 : 0000000000000000 x6 : 000000000000003f [47200.326170] x5 : 0000000000000040 x4 : ffff800010e93a20 [47200.332656] x3 : 0000000000000004 x2 : ffff0820c970ec80 [47200.339168] x1 : ffff0820e2bc1680 x0 : 0000000000000004 [47200.346058] Call trace: [47200.349324] cpu_rmap_add+0x38/0x40 [47200.354300] hns3_set_rx_cpu_rmap+0x6c/0xe0 [hns3] [47200.362294] hns3_reset_notify_init_enet+0x1cc/0x340 [hns3] [47200.370049] hns3_change_channels+0x40/0xb0 [hns3] [47200.376770] hns3_set_channels+0x12c/0x2a0 [hns3] [47200.383353] ethtool_set_channels+0x140/0x250 [47200.389772] dev_ethtool+0x714/0x23d0 [47200.394440] dev_ioctl+0x4cc/0x640 [47200.399277] sock_do_ioctl+0x100/0x2a0 [47200.404574] sock_ioctl+0x28c/0x470 [47200.409079] __arm64_sys_ioctl+0xb4/0x100 [47200.415217] el0_svc_common.constprop.0+0x84/0x210 [47200.422088] do_el0_svc+0x28/0x34 [47200.426387] el0_svc+0x28/0x70 [47200.431308] el0_sync_handler+0x1a4/0x1b0 [47200.436477] el0_sync+0x174/0x180 [47200.441562] Code: 11000405 79000c45 f8247861 d65f03c0 (d4210000) [47200.448869] ---[ end trace a01efe4ce42e5f34 ]--- The process is like below: excuting hns3_client_init | register_netdev() | hns3_set_channels() | | hns3_set_rx_cpu_rmap() hns3_reset_notify_uninit_enet() | | | quit without calling function | hns3_free_rx_cpu_rmap for flag | HNS3_NIC_STATE_INITED is unset. | | | hns3_reset_notify_init_enet() | | set HNS3_NIC_STATE_INITED call hns3_set_rx_cpu_rmap()-- crash Fix it by calling register_netdev() at the end of function hns3_client_init(). Fixes: 08a10068 ("net: hns3: re-organize vector handle") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiaran Zhang authored
In hclge_mbx_handler(), if there are two consecutive mailbox messages that requires resp_msg, the resp_msg is not cleared after processing the first message, which will cause the resp_msg data of second message incorrect. Fix it by clearing the resp_msg before processing every mailbox message. Fixes: bb5790b7 ("net: hns3: refactor mailbox response scheme between PF and VF") Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Markus Bloechl authored
lan78xx already calls skb_tx_timestamp() in its lan78xx_start_xmit(). Override .get_ts_info to also advertise this capability (SOF_TIMESTAMPING_TX_SOFTWARE) via ethtool. Signed-off-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
This patch is to use "struct work_struct" for the finalize work queue instead of "struct tipc_net_work", as it can get the "net" and "addr" from tipc_net's other members and there is no need to add extra net and addr in tipc_net by defining "struct tipc_net_work". Note that it's safe to get net from tn->bcl as bcl is always released after the finalize work queue is done. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 May, 2021 14 commits
-
-
Dan Carpenter authored
We spotted a bug recently during a review where a driver was unregistering a bus that wasn't registered, which would trigger this BUG_ON(). Let's handle that situation more gracefully, and just print a warning and return. Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
David Awogbemila says: ==================== GVE bug fixes This patch series includes fixes to some bugs in the gve driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Awogbemila authored
SKBs with skb_get_queue_mapping(skb) == tx_cfg.num_queues should also be considered invalid. Fixes: f5cedc84 ("gve: Add transmit and receive support") Signed-off-by: David Awogbemila <awogbemila@google.com> Acked-by: Willem de Brujin <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Catherine Sullivan authored
As currently written, if the driver checks for more work (via gve_tx_poll or gve_rx_poll) before the device posts work and the irq doorbell is not unmasked (via iowrite32be(GVE_IRQ_ACK | GVE_IRQ_EVENT, ...)) before the device attempts to raise an interrupt, an interrupt is lost and this could potentially lead to the traffic being completely halted. For example, if a tx queue has already been stopped, the driver won't get the chance to complete work and egress will be halted. We need a full memory barrier in the poll routine to ensure that the irq doorbell is unmasked before the driver checks for more work. Fixes: f5cedc84 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Acked-by: Willem de Brujin <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Awogbemila authored
When freeing notification blocks, we index priv->msix_vectors. If we failed to allocate priv->msix_vectors (see abort_with_msix_vectors) this could lead to a NULL pointer dereference if the driver is unloaded. Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: David Awogbemila <awogbemila@google.com> Acked-by: Willem de Brujin <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Awogbemila authored
If we do not get the expected number of vectors from pci_enable_msix_range, we update priv->num_ntfy_blks but not priv->mgmt_msix_idx. This patch fixes this so that priv->mgmt_msix_idx is updated accordingly. Fixes: f5cedc84 ("gve: Add transmit and receive support") Signed-off-by: David Awogbemila <awogbemila@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Catherine Sullivan authored
Correctly check the TX QPL was assigned and unassigned if other steps in the allocation fail. Fixes: f5cedc84 (gve: Add transmit and receive support) Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Syzbot reports that in mac80211 we have a potential deadlock between our "local->stop_queue_reasons_lock" (spinlock) and netlink's nl_table_lock (rwlock). This is because there's at least one situation in which we might try to send a netlink message with this spinlock held while it is also possible to take the spinlock from a hardirq context, resulting in the following deadlock scenario reported by lockdep: CPU0 CPU1 ---- ---- lock(nl_table_lock); local_irq_disable(); lock(&local->queue_stop_reason_lock); lock(nl_table_lock); <Interrupt> lock(&local->queue_stop_reason_lock); This seems valid, we can take the queue_stop_reason_lock in any kind of context ("CPU0"), and call ieee80211_report_ack_skb() with the spinlock held and IRQs disabled ("CPU1") in some code path (ieee80211_do_stop() via ieee80211_free_txskb()). Short of disallowing netlink use in scenarios like these (which would be rather complex in mac80211's case due to the deep callchain), it seems the only fix for this is to disable IRQs while nl_table_lock is held to avoid hitting this scenario, this disallows the "CPU0" portion of the reported deadlock. Note that the writer side (netlink_table_grab()) already disables IRQs for this lock. Unfortunately though, this seems like a huge hammer, and maybe the whole netlink table locking should be reworked. Reported-by: syzbot+69ff9dff50dcfe14ddd4@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
If the device_add() for a smcd_dev fails, there's no cleanup step that rolls back the earlier list_add(). The device subsequently gets freed, and we end up with a corrupted list. Add some error handling that removes the device from the list. Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
If bond_kobj_init() or later kzalloc() in bond_alloc_slave() fail, then we call kobject_put() on the slave->kobj. This in turn calls the release function slave_kobj_release() which will always try to cancel_delayed_work_sync(&slave->notify_work), which shouldn't be done on an uninitialized work struct. Always initialize the work struct earlier to avoid problems here. Syzbot bisected this down to a completely pointless commit, some fault injection may have been at work here that caused the alloc failure in the first place, which may interact badly with bisect. Reported-by: syzbot+bfda097c12a00c8cae67@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Krzysztof Kozlowski authored
The http://www.linuxfoundation.org/en/Net does not contain networking subsystem description ("Nothing found"). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
On some host, a crash could be triggered simply by repeating these commands several times: # modprobe tipc # tipc bearer enable media udp name UDP1 localip 127.0.0.1 # rmmod tipc [] BUG: unable to handle kernel paging request at ffffffffc096bb00 [] Workqueue: events 0xffffffffc096bb00 [] Call Trace: [] ? process_one_work+0x1a7/0x360 [] ? worker_thread+0x30/0x390 [] ? create_worker+0x1a0/0x1a0 [] ? kthread+0x116/0x130 [] ? kthread_flush_work_fn+0x10/0x10 [] ? ret_from_fork+0x35/0x40 When removing the TIPC module, the UDP tunnel sock will be delayed to release in a work queue as sock_release() can't be done in rtnl_lock(). If the work queue is schedule to run after the TIPC module is removed, kernel will crash as the work queue function cleanup_beareri() code no longer exists when trying to invoke it. To fix it, this patch introduce a member wq_count in tipc_net to track the numbers of work queues in schedule, and wait and exit until all work queues are done in tipc_exit_net(). Fixes: d0f91938 ("tipc: add ip/udp media type") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Taehee Yoo authored
mld_newpack() doesn't allow to allocate high order page, only order-0 allocation is allowed. If headroom size is too large, a kernel panic could occur in skb_put(). Test commands: ip netns del A ip netns del B ip netns add A ip netns add B ip link add veth0 type veth peer name veth1 ip link set veth0 netns A ip link set veth1 netns B ip netns exec A ip link set lo up ip netns exec A ip link set veth0 up ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0 ip netns exec B ip link set lo up ip netns exec B ip link set veth1 up ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1 for i in {1..99} do let A=$i-1 ip netns exec A ip link add ip6gre$i type ip6gre \ local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100 ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i ip netns exec A ip link set ip6gre$i up ip netns exec B ip link add ip6gre$i type ip6gre \ local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100 ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i ip netns exec B ip link set ip6gre$i up done Splat looks like: kernel BUG at net/core/skbuff.c:110! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:skb_panic+0x15d/0x15f Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83 41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89 34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20 RSP: 0018:ffff88810091f820 EFLAGS: 00010282 RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000 RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031 R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028 R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0 FS: 0000000000000000(0000) GS:ffff888117c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600 skb_put.cold.104+0x22/0x22 ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600 ? rcu_read_lock_sched_held+0x91/0xc0 mld_newpack+0x398/0x8f0 ? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600 ? lock_contended+0xc40/0xc40 add_grhead.isra.33+0x280/0x380 add_grec+0x5ca/0xff0 ? mld_sendpack+0xf40/0xf40 ? lock_downgrade+0x690/0x690 mld_send_initial_cr.part.34+0xb9/0x180 ipv6_mc_dad_complete+0x15d/0x1b0 addrconf_dad_completed+0x8d2/0xbb0 ? lock_downgrade+0x690/0x690 ? addrconf_rs_timer+0x660/0x660 ? addrconf_dad_work+0x73c/0x10e0 addrconf_dad_work+0x73c/0x10e0 Allowing high order page allocation could fix this problem. Fixes: 72e09ad1 ("ipv6: avoid high order allocations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zheyu Ma authored
'nj_setup' in netjet.c might fail with -EIO and in this case 'card->irq' is initialized and is bigger than zero. A subsequent call to 'nj_release' will free the irq that has not been requested. Fix this bug by deleting the previous assignment to 'card->irq' and just keep the assignment before 'request_irq'. The KASAN's log reveals it: [ 3.354615 ] WARNING: CPU: 0 PID: 1 at kernel/irq/manage.c:1826 free_irq+0x100/0x480 [ 3.355112 ] Modules linked in: [ 3.355310 ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00144-g25a12987 #13 [ 3.355816 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.356552 ] RIP: 0010:free_irq+0x100/0x480 [ 3.356820 ] Code: 6e 08 74 6f 4d 89 f4 e8 5e ac 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 4f ac 09 00 8b 75 c8 48 c7 c7 78 c1 2e 85 e8 e0 cf f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 72 33 0b 03 48 8b 43 40 4c 8b a0 80 [ 3.358012 ] RSP: 0000:ffffc90000017b48 EFLAGS: 00010082 [ 3.358357 ] RAX: 0000000000000000 RBX: ffff888104dc8000 RCX: 0000000000000000 [ 3.358814 ] RDX: ffff8881003c8000 RSI: ffffffff8124a9e6 RDI: 00000000ffffffff [ 3.359272 ] RBP: ffffc90000017b88 R08: 0000000000000000 R09: 0000000000000000 [ 3.359732 ] R10: ffffc900000179f0 R11: 0000000000001d04 R12: 0000000000000000 [ 3.360195 ] R13: ffff888107dc6000 R14: ffff888107dc6928 R15: ffff888104dc80a8 [ 3.360652 ] FS: 0000000000000000(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000 [ 3.361170 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.361538 ] CR2: 0000000000000000 CR3: 000000000582e000 CR4: 00000000000006f0 [ 3.362003 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.362175 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.362175 ] Call Trace: [ 3.362175 ] nj_release+0x51/0x1e0 [ 3.362175 ] nj_probe+0x450/0x950 [ 3.362175 ] ? pci_device_remove+0x110/0x110 [ 3.362175 ] local_pci_probe+0x45/0xa0 [ 3.362175 ] pci_device_probe+0x12b/0x1d0 [ 3.362175 ] really_probe+0x2a9/0x610 [ 3.362175 ] driver_probe_device+0x90/0x1d0 [ 3.362175 ] ? mutex_lock_nested+0x1b/0x20 [ 3.362175 ] device_driver_attach+0x68/0x70 [ 3.362175 ] __driver_attach+0x124/0x1b0 [ 3.362175 ] ? device_driver_attach+0x70/0x70 [ 3.362175 ] bus_for_each_dev+0xbb/0x110 [ 3.362175 ] ? rdinit_setup+0x45/0x45 [ 3.362175 ] driver_attach+0x27/0x30 [ 3.362175 ] bus_add_driver+0x1eb/0x2a0 [ 3.362175 ] driver_register+0xa9/0x180 [ 3.362175 ] __pci_register_driver+0x82/0x90 [ 3.362175 ] ? w6692_init+0x38/0x38 [ 3.362175 ] nj_init+0x36/0x38 [ 3.362175 ] do_one_initcall+0x7f/0x3d0 [ 3.362175 ] ? rdinit_setup+0x45/0x45 [ 3.362175 ] ? rcu_read_lock_sched_held+0x4f/0x80 [ 3.362175 ] kernel_init_freeable+0x2aa/0x301 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] kernel_init+0x18/0x190 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] ret_from_fork+0x1f/0x30 [ 3.362175 ] Kernel panic - not syncing: panic_on_warn set ... [ 3.362175 ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00144-g25a12987 #13 [ 3.362175 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.362175 ] Call Trace: [ 3.362175 ] dump_stack+0xba/0xf5 [ 3.362175 ] ? free_irq+0x100/0x480 [ 3.362175 ] panic+0x15a/0x3f2 [ 3.362175 ] ? __warn+0xf2/0x150 [ 3.362175 ] ? free_irq+0x100/0x480 [ 3.362175 ] __warn+0x108/0x150 [ 3.362175 ] ? free_irq+0x100/0x480 [ 3.362175 ] report_bug+0x119/0x1c0 [ 3.362175 ] handle_bug+0x3b/0x80 [ 3.362175 ] exc_invalid_op+0x18/0x70 [ 3.362175 ] asm_exc_invalid_op+0x12/0x20 [ 3.362175 ] RIP: 0010:free_irq+0x100/0x480 [ 3.362175 ] Code: 6e 08 74 6f 4d 89 f4 e8 5e ac 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 4f ac 09 00 8b 75 c8 48 c7 c7 78 c1 2e 85 e8 e0 cf f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 72 33 0b 03 48 8b 43 40 4c 8b a0 80 [ 3.362175 ] RSP: 0000:ffffc90000017b48 EFLAGS: 00010082 [ 3.362175 ] RAX: 0000000000000000 RBX: ffff888104dc8000 RCX: 0000000000000000 [ 3.362175 ] RDX: ffff8881003c8000 RSI: ffffffff8124a9e6 RDI: 00000000ffffffff [ 3.362175 ] RBP: ffffc90000017b88 R08: 0000000000000000 R09: 0000000000000000 [ 3.362175 ] R10: ffffc900000179f0 R11: 0000000000001d04 R12: 0000000000000000 [ 3.362175 ] R13: ffff888107dc6000 R14: ffff888107dc6928 R15: ffff888104dc80a8 [ 3.362175 ] ? vprintk+0x76/0x150 [ 3.362175 ] ? free_irq+0x100/0x480 [ 3.362175 ] nj_release+0x51/0x1e0 [ 3.362175 ] nj_probe+0x450/0x950 [ 3.362175 ] ? pci_device_remove+0x110/0x110 [ 3.362175 ] local_pci_probe+0x45/0xa0 [ 3.362175 ] pci_device_probe+0x12b/0x1d0 [ 3.362175 ] really_probe+0x2a9/0x610 [ 3.362175 ] driver_probe_device+0x90/0x1d0 [ 3.362175 ] ? mutex_lock_nested+0x1b/0x20 [ 3.362175 ] device_driver_attach+0x68/0x70 [ 3.362175 ] __driver_attach+0x124/0x1b0 [ 3.362175 ] ? device_driver_attach+0x70/0x70 [ 3.362175 ] bus_for_each_dev+0xbb/0x110 [ 3.362175 ] ? rdinit_setup+0x45/0x45 [ 3.362175 ] driver_attach+0x27/0x30 [ 3.362175 ] bus_add_driver+0x1eb/0x2a0 [ 3.362175 ] driver_register+0xa9/0x180 [ 3.362175 ] __pci_register_driver+0x82/0x90 [ 3.362175 ] ? w6692_init+0x38/0x38 [ 3.362175 ] nj_init+0x36/0x38 [ 3.362175 ] do_one_initcall+0x7f/0x3d0 [ 3.362175 ] ? rdinit_setup+0x45/0x45 [ 3.362175 ] ? rcu_read_lock_sched_held+0x4f/0x80 [ 3.362175 ] kernel_init_freeable+0x2aa/0x301 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] kernel_init+0x18/0x190 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] ? rest_init+0x2c0/0x2c0 [ 3.362175 ] ret_from_fork+0x1f/0x30 [ 3.362175 ] Dumping ftrace buffer: [ 3.362175 ] (ftrace buffer empty) [ 3.362175 ] Kernel Offset: disabled [ 3.362175 ] Rebooting in 1 seconds.. Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-