- 20 Jan, 2023 6 commits
-
-
Srujana Challa authored
The CN10K CPT coprocessor contains a context processor to accelerate updates to the IPsec security association contexts. The context processor contains a context cache. This patch updates the CPT LF ALLOC mailbox to configure the ctx_ilen requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of the initial context fetch. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nithin Dabilpuram authored
CN10K CPT coprocessor includes a component named RXC which is responsible for reassembly of inner IP packets. RXC has the feature to evict oldest entries based on age/threshold. The age/threshold is being set to minimum values to evict all entries at the time of teardown. This patch adds code to restore timeout and threshold config after teardown sequence is complete as it is global config. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
Optimize CPT PF identification in mbox handling for faster mbox response by doing it at AF driver probe instead of every mbox message. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
On OcteonTX2 platform CPT instruction enqueue is only possible via LMTST operations. The existing FLR sequence mentioned in HRM requires a dummy LMTST to CPT but LMTST can't be submitted from AF driver. So, HW team provided a new sequence to avoid dummy LMTST. This patch adds code for the same. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
On OcteonTX2 SoC, the admin function (AF) is the only one with all privileges to configure HW and alloc resources; PFs and their VFs have to request AF via mailbox for all their needs. This patch adds a new mailbox for CPT VFs to request a CPT LF reset. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
When CPT engine has uncorrectable errors, it will get halted and must be disabled and re-enabled. This patch adds code for the same. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Jan, 2023 10 commits
-
-
Rakesh Sankaranarayanan authored
The ALU table entry 2 register in KSZ9477 has bit positions reserved for the forwarding port map. This field is referenced in ksz9477_fdb_del() for clearing the forward port map and ALU table. But the current fdb_del refers to the ALU table entry 3 register for accessing the forward port map. Update ksz9477_fdb_del() to get the forward port map from the correct ALU table entry register. With this bug, an issue can be observed while deleting static MAC entries. Delete any specific MAC entry using the "bridge fdb del" command. This should clear all the specified MAC entries. But it is observed that entries with self static alone are retained. Tested on LAN9370 EVB since ksz9477_fdb_del() is used in common across LAN937x and KSZ series. Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230118174735.702377-1-rakesh.sankaranarayanan@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Willem de Bruijn authored
Avoid race between process wakeup and tpacket_v3 block timeout. The test waits for cfg_timeout_msec for packets to arrive. Packets arrive in tpacket_v3 rings, which pass packets ("frames") to the process in batches ("blocks"). The sk waits for req3.tp_retire_blk_tov msec to release a block. Set the block timeout lower than the process waiting time, else the process may find that no block has been released by the time it scans the socket list. Convert to a ring of more than one, smaller, blocks with shorter timeouts. Blocks must be page aligned, so >= 64KB. Fixes: 5ebfb4cc ("selftests/net: toeplitz test") Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230118151847.4124260-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
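The constraints named in this message can be written down as a small user-space check. This is only a sketch under stated assumptions: `toy_ring_cfg` and `toy_ring_cfg_ok()` are hypothetical names, not part of the selftest or of the packet socket API, and the "more than one block" condition mirrors the choice made in this commit rather than a kernel requirement.

```c
#include <stddef.h>

/* Hypothetical description of a block ring; not the kernel's tpacket_req3. */
struct toy_ring_cfg {
    size_t block_size;        /* bytes per block */
    unsigned block_nr;        /* number of blocks in the ring */
    unsigned retire_tov_ms;   /* block timeout, in milliseconds */
};

/*
 * Return 1 if the configuration satisfies the three conditions from the
 * commit message, 0 otherwise:
 *  - blocks are a whole multiple of the page size,
 *  - the ring has more than one block,
 *  - the block timeout is lower than the time the process waits.
 */
int toy_ring_cfg_ok(const struct toy_ring_cfg *c, size_t page_size,
                    unsigned wait_ms)
{
    if (page_size == 0 || c->block_size == 0 ||
        c->block_size % page_size != 0)
        return 0;
    if (c->block_nr < 2)
        return 0;
    return c->retire_tov_ms < wait_ms;
}
```

With a block timeout equal to or above the process wait time the check fails, which is the situation the commit describes as racy.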
-
Paolo Abeni authored
The referenced commit changed the error code returned by the kernel when preventing a non-established socket from attaching the ktls ULP. Prior to that commit, user space got ENOTCONN instead of EINVAL. The existing self-tests depend on that error code, and the change caused a failure:

  RUN global.non_established ...
  tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
  non_established: Test failed at step #3
  FAIL global.non_established

In the unlikely event existing applications do the same, address the issue by restoring the prior error code in the above scenario. Note that the only other ULP performing similar checks at init time - smc_ulp_ops - also fails with ENOTCONN when trying to attach the ULP to a non-established socket. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: 2c02d41d ("net/ulp: prevent ULP without clone op from entering the LISTEN status") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
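The restored behaviour can be illustrated with a minimal user-space sketch. It is not the kernel code: `toy_sock`, `toy_state` and `toy_ulp_attach()` are hypothetical names; only the two errno values come from the message above.

```c
#include <errno.h>

/* Hypothetical, simplified socket states for illustration only. */
enum toy_state { TOY_CLOSE, TOY_LISTEN, TOY_ESTABLISHED };

struct toy_sock {
    enum toy_state state;
};

/*
 * Sketch of an init-time check: refuse to attach a ULP to a socket that
 * is not established, and report ENOTCONN (the restored error code)
 * rather than EINVAL.
 */
int toy_ulp_attach(const struct toy_sock *sk)
{
    if (sk->state != TOY_ESTABLISHED)
        return -ENOTCONN;
    return 0;
}
```

A caller that compares the result against -ENOTCONN, as the self-test does with errno, keeps working with this return value.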
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Paolo Abeni authored
Saeed Mahameed says:

====================
This series provides bug fixes to mlx5 driver.
====================

* tag 'mlx5-fixes-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net: mlx5: eliminate anonymous module_init & module_exit
  net/mlx5: E-switch, Fix switchdev mode after devlink reload
  net/mlx5e: Protect global IPsec ASO
  net/mlx5e: Remove optimization which prevented update of ESN state
  net/mlx5e: Set decap action based on attr for sample
  net/mlx5e: QoS, Fix wrongfully setting parent_element_id on MODIFY_SCHEDULING_ELEMENT
  net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT
  net/mlx5e: Remove redundant xsk pointer check in mlx5e_mpwrq_validate_xsk
  net/mlx5e: Avoid false lock dependency warning on tc_ht even more
  net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()

Link: https://lore.kernel.org/r/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Kevin Hao authored
The commit 4af1b64f ("octeontx2-pf: Fix lmtst ID used in aura free") uses get/put_cpu() to protect the usage of the percpu pointer in the ->aura_freeptr() callback, but it also unnecessarily disables preemption for the blockable memory allocation. The commit 87b93b67 ("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to fix these sleep-inside-atomic warnings. But it only fixes the one for the non-rt kernel. For the rt kernel, we still get similar warnings like the one below.

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  3 locks held by swapper/0/1:
   #0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
   #1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
   #2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
  Preemption disabled at:
  [<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
  CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  Call trace:
   dump_backtrace.part.0+0xe8/0xf4
   show_stack+0x20/0x30
   dump_stack_lvl+0x9c/0xd8
   dump_stack+0x18/0x34
   __might_resched+0x188/0x224
   rt_spin_lock+0x64/0x110
   alloc_iova_fast+0x1ac/0x2ac
   iommu_dma_alloc_iova+0xd4/0x110
   __iommu_dma_map+0x80/0x144
   iommu_dma_map_page+0xe8/0x260
   dma_map_page_attrs+0xb4/0xc0
   __otx2_alloc_rbuf+0x90/0x150
   otx2_rq_aura_pool_init+0x1c8/0x284
   otx2_init_hw_resources+0xe4/0x3a4
   otx2_open+0xf0/0x610
   __dev_open+0x104/0x224
   __dev_change_flags+0x1e4/0x274
   dev_change_flags+0x2c/0x7c
   ic_open_devs+0x124/0x2f8
   ip_auto_config+0x180/0x42c
   do_one_initcall+0x90/0x4dc
   do_basic_setup+0x10c/0x14c
   kernel_init_freeable+0x10c/0x13c
   kernel_init+0x2c/0x140
   ret_from_fork+0x10/0x20

Of course, we can shuffle the get/put_cpu() to only wrap the invocation of ->aura_freeptr() as commit 87b93b67 does. But there are only two ->aura_freeptr() callbacks, otx2_aura_freeptr() and cn10k_aura_freeptr(). There is no usage of a percpu variable in otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant for it. We can move the get/put_cpu() into the corresponding callback which really has the percpu variable usage and avoid the sprinkling of get/put_cpu() in several places. Fixes: 4af1b64f ("octeontx2-pf: Fix lmtst ID used in aura free") Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jason Xing authored
While one cpu is working on looking up the right socket from ehash table, another cpu is done deleting the request socket and is about to add (or is adding) the big socket to the table. It means that we could miss both of them, even though it has little chance. Let me draw a call trace map of the server side.

   CPU 0                           CPU 1
   -----                           -----
tcp_v4_rcv()                  syn_recv_sock()
                            inet_ehash_insert()
                            -> sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                            -> __sk_nulls_add_node_rcu(sk, list)

Notice that CPU 0 is receiving the data after the final ack during the 3-way handshake and CPU 1 is still handling the final ack. Why could this be a real problem? This case happens only when the final ack and the first data are received by different CPUs. Then the server receiving data with the ACK flag tries to search for one proper established socket in the ehash table, but apparently it fails as my map shows above. After that, the server fetches a listener socket and then sends a RST because it finds an ACK flag in the skb (data), which obeys the RST definition in RFC 793. Besides, Eric pointed out there's one more race condition where it handles tw socket hashdance. Only by adding to the tail of the list before deleting the old one can we avoid the race if the reader has already begun the bucket traversal and it would possibly miss the head. Many thanks to Eric for great help from beginning to end. Fixes: 5e0724d0 ("tcp/dccp: fix hashdance race for passive sessions") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/ Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
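The "add to the tail first, delete the old entry second" ordering can be illustrated with a toy bucket. This is a single-threaded sketch under loud assumptions: the `toy_*` names are hypothetical, the list is a plain singly linked list rather than the kernel's nulls list, and RCU and concurrent readers are not modelled. It only shows that a lookup performed between the two steps still finds an entry for the key.

```c
#include <stddef.h>

/* Hypothetical stand-in for one hash bucket entry. */
struct toy_node {
    int key;
    struct toy_node *next;
};

struct toy_bucket {
    struct toy_node *head;
};

/* Return the first node matching key, or NULL if none is linked. */
struct toy_node *toy_lookup(const struct toy_bucket *b, int key)
{
    struct toy_node *n;

    for (n = b->head; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Append node at the tail of the bucket. */
void toy_add_tail(struct toy_bucket *b, struct toy_node *node)
{
    struct toy_node **pp = &b->head;

    while (*pp != NULL)
        pp = &(*pp)->next;
    node->next = NULL;
    *pp = node;
}

/* Unlink node from the bucket if it is present. */
void toy_del(struct toy_bucket *b, struct toy_node *node)
{
    struct toy_node **pp = &b->head;

    while (*pp != NULL && *pp != node)
        pp = &(*pp)->next;
    if (*pp == node)
        *pp = node->next;
}

/*
 * Replace old_node with new_node (same key): add first, delete second.
 * Returns 1 if a lookup between the two steps still finds the key.
 */
int toy_replace(struct toy_bucket *b, struct toy_node *old_node,
                struct toy_node *new_node)
{
    int visible_midway;

    toy_add_tail(b, new_node);
    visible_midway = toy_lookup(b, new_node->key) != NULL;
    toy_del(b, old_node);
    return visible_midway;
}
```

With the opposite order (delete, then add) there would be a window in which the bucket holds no entry for the key, which corresponds to the missed lookup described in the message.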
-
Xin Long authored
This reverts commit 0aa64df3. Currently IFF_NO_ADDRCONF is used to prevent all ipv6 addrconf for the slave ports of team, bonding and failover devices and it means no ipv6 packets can be sent out through these slave ports. However, for team device, "nsna_ping" link_watch requires ipv6 addrconf. Otherwise, the link will be marked as failed. This patch removes the IFF_NO_ADDRCONF flag set for team port, and we will fix the original issue in another patch, as Jakub suggested. Fixes: 0aa64df3 ("net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/63e09531fc47963d2e4eff376653d3db21b97058.1673980932.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
We often have to ping Willem asking for reviews of patches because he doesn't get included in the CC list. Add MAINTAINERS entries for some of the areas he covers so that ./scripts/ will know to add him. Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Link: https://lore.kernel.org/r/20230117190141.60795-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Naresh reports seeing a warning that gred is calling u64_stats_update_begin() with preemption enabled. Arnd points out it's coming from _bstats_update(). We should be holding the qdisc lock when writing to stats; they are also updated from the datapath. Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/all/CA+G9fYsTr9_r893+62u6UGD3dVaCE-kN9C-Apmb2m=hxjc1Cqg@mail.gmail.com/ Fixes: e49efd52 ("net: sched: gred: support reporting stats from offloads") Link: https://lore.kernel.org/r/20230113044137.1383067-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Jakub Kicinski authored
Kalle Valo says:

====================
wireless fixes for v6.2

Third set of fixes for v6.2. This time most of them are for drivers, only one revert for mac80211. For an important mt76 fix we had to cherry pick two commits from wireless-next.

* tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
  wifi: mt76: dma: fix a regression in adding rx buffers
  wifi: mt76: handle possible mt76_rx_token_consume failures
  wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails
  wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid
  wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
  wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device
  wifi: brcmfmac: avoid handling disabled channels for survey dump
====================

Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 18 Jan, 2023 18 commits
-
-
Eric Dumazet authored
lockdep complains with the following lock/unlock sequence:

     lock_sock(sk);
     write_lock_bh(&sk->sk_callback_lock);
[1]  release_sock(sk);
[2]  write_unlock_bh(&sk->sk_callback_lock);

We need to swap [1] and [2] to fix this issue. Fixes: 0b2c5972 ("l2tp: close all race conditions in l2tp_tunnel_register()") Reported-by: syzbot+bbd35b345c7cab0d9a08@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20230114030137.672706-1-xiyou.wangcong@gmail.com/T/#m1164ff20628671b0f326a24cb106ab3239c70ce3 Cc: Cong Wang <cong.wang@bytedance.com> Cc: Guillaume Nault <gnault@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
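The sequence after swapping [1] and [2] can be spelled out with stand-in functions. This is a sketch only: the `toy_*` functions are hypothetical and take no real locks; each one just records a character so the resulting call order can be inspected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer; records one character per call. */
static char toy_trace[8];
static size_t toy_pos;

static void toy_record(char c)
{
    if (toy_pos < sizeof(toy_trace) - 1)
        toy_trace[toy_pos++] = c;
    toy_trace[toy_pos] = '\0';
}

/* Stand-ins for the four calls named in the commit message. */
static void toy_lock_sock(void)       { toy_record('S'); }
static void toy_release_sock(void)    { toy_record('s'); }
static void toy_write_lock_bh(void)   { toy_record('W'); }
static void toy_write_unlock_bh(void) { toy_record('w'); }

/*
 * Corrected ordering: the callback lock taken second is dropped first,
 * and the socket lock taken first is released last.
 */
const char *toy_register_sequence(void)
{
    toy_pos = 0;
    toy_lock_sock();
    toy_write_lock_bh();
    /* ... work done while both are held ... */
    toy_write_unlock_bh();  /* previously step [2] */
    toy_release_sock();     /* previously step [1] */
    return toy_trace;
}
```

The recorded trace reads "SWws": acquisitions in upper case, releases in lower case, with the releases in the reverse order of the acquisitions.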
-
Jason Wang authored
Commit a7766ef1 ("virtio_net: disable cb aggressively") enables virtqueue callback via the following statement:

        do {
                if (use_napi)
                        virtqueue_disable_cb(sq->vq);

                free_old_xmit_skbs(sq, false);

        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

When NAPI is used and kick is false, the callback won't be enabled here. And when the virtqueue is about to be full, the tx will be disabled, but we still don't enable the tx interrupt, which will cause a TX hang. This could be observed when using pktgen with burst enabled. To be consistent with the logic that tries to disable cb only for NAPI, fix this by trying to enable delayed callback only when NAPI is enabled when the queue is about to be full. Fixes: a7766ef1 ("virtio_net: disable cb aggressively") Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Robert Hancock authored
PTP TX timestamp handling was observed to be broken with this driver when using the raw Layer 2 PTP encapsulation. ptp4l was not receiving the expected TX timestamp after transmitting a packet, causing it to enter a failure state. The problem appears to be due to the way that the driver pads packets which are smaller than the Ethernet minimum of 60 bytes. If headroom space was available in the SKB, this caused the driver to move the data back to utilize it. However, this appears to cause other data references in the SKB to become inconsistent. In particular, this caused the ptp_one_step_sync function to later (in the TX completion path) falsely detect the packet as a one-step SYNC packet, even when it was not, which caused the TX timestamp to not be processed when it should be. Using the headroom for this purpose seems like an unnecessary complexity as this is not a hot path in the driver, and in most cases it appears that there is sufficient tailroom to not require using the headroom anyway. Remove this usage of headroom to prevent this inconsistency from occurring and causing other problems. Fixes: 653e92a9 ("net: macb: add support for padding and fcs computation") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> # on SAMA7G5 Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
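Padding that uses only the space after the data can be sketched as follows. This is an illustration, not the driver code: `toy_pad_frame()` and `TOY_ETH_ZLEN` are hypothetical names, the buffer is a plain byte array rather than an SKB, and what the driver does when the tailroom is too small is not stated in the message, so the sketch simply reports failure in that case.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ZLEN 60  /* Ethernet minimum of 60 bytes, as in the text */

/*
 * Pad a frame held in buf (len bytes used, cap bytes available in total)
 * up to TOY_ETH_ZLEN by zero-filling the space after the data. The data
 * itself is never moved, so offsets into the frame stay valid.
 * Returns the new length, or 0 if there is not enough room after the data.
 */
size_t toy_pad_frame(unsigned char *buf, size_t len, size_t cap)
{
    if (len >= TOY_ETH_ZLEN)
        return len;            /* already long enough */
    if (cap < TOY_ETH_ZLEN)
        return 0;              /* insufficient tailroom */
    memset(buf + len, 0, TOY_ETH_ZLEN - len);
    return TOY_ETH_ZLEN;
}
```

Because the payload bytes keep their position, anything that recorded an offset into the frame before padding still points at the same data afterwards, which is the property the headroom-based approach did not preserve.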
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
David S. Miller authored
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for net:

1) Fix syn-retransmits until initiator gives up when connection is re-used due to rst marked as invalid, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Randy Dunlap authored
Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log. Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names.

Example 1: (System.map)
 ffffffff832fc78c t init
 ffffffff832fc79e t init
 ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
 calling  init+0x0/0x12 @ 1
 initcall init+0x0/0x12 returned 0 after 15 usecs
 calling  init+0x0/0x60 @ 1
 initcall init+0x0/0x60 returned 0 after 2 usecs
 calling  init+0x0/0x9a @ 1
 initcall init+0x0/0x9a returned 0 after 74 usecs

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Eli Cohen <eli@mellanox.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Chris Mi authored
The cited commit removes eswitch mode none. So after devlink reload in switchdev mode, eswitch mode is not changed. But actually eswitch is disabled during devlink reload. Fix it by setting eswitch mode to legacy when disabling eswitch which is called by reload_down. Fixes: f019679e ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Leon Romanovsky authored
ASO operations are global to the whole IPsec as they share one DMA address for all operations. As such, all WQE operations need to be protected with a lock. In this case, it must be a spinlock to allow mlx5e_ipsec_aso_query() to operate in atomic context. Fixes: 1ed78fc0 ("net/mlx5e: Update IPsec soft and hard limits") Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Leon Romanovsky authored
The aso->use_cache variable introduced in commit 8c582ddf ("net/mlx5e: Handle hardware IPsec limits events") was an optimization to skip recurrent calls to mlx5e_ipsec_aso_query(). Such calls are possible when a lifetime event is generated:

 -> mlx5e_ipsec_handle_event()
  -> mlx5e_ipsec_aso_query() - first call
  -> xfrm_state_check_expire()
   -> mlx5e_xfrm_update_curlft()
    -> mlx5e_ipsec_aso_query() - second call

However, such an optimization is not really effective, as mlx5e_ipsec_aso_query() needs to be called to update ESN anyway, which was missed due to a misplaced use_cache assignment. Fixes: cee137a6 ("net/mlx5e: Handle ESN update events") Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Chris Mi authored
Currently decap action is set based on tunnel_id. That means it is set unconditionally. But for decap, ct and sample actions, decap is done before ct. No need to decap again in sample. And the actions are set correctly when parsing. So set decap action based on attr instead of tunnel_id. Fixes: 2741f223 ("net/mlx5e: TC, Support sample offload action for tunneled traffic") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maor Dickman authored
According to the HW spec, the parent_element_id field should be reserved (0x0) when calling the MODIFY_SCHEDULING_ELEMENT command. This patch removes the wrong initialization of the reserved field, parent_element_id, in mlx5_qos_update_node. Fixes: 214baf22 ("net/mlx5e: Support HTB offload") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maor Dickman authored
According to the HW spec, the element_type, element_attributes and parent_element_id fields should be reserved (0x0) when calling the MODIFY_SCHEDULING_ELEMENT command. This patch removes the initialization of these fields when calling the command. Fixes: bd77bf1c ("net/mlx5: Add SRIOV VF max rate configuration support") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Adham Faris authored
This validation function is relevant only for XSK cases, hence it is assumed to be called only with xsk != NULL. Thus checking for an invalid xsk pointer is redundant and misleads static code analyzers. This commit removes the redundant xsk pointer check. This solves the following smatch warning:

drivers/net/ethernet/mellanox/mlx5/core/en/params.c:481 mlx5e_mpwrq_validate_xsk() error: we previously assumed 'xsk' could be null (see line 478)

Fixes: 6470d2e7 ("net/mlx5e: xsk: Use KSM for unaligned XSK") Signed-off-by: Adham Faris <afaris@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
The cited commit changed class of tc_ht internal mutex in order to avoid false lock dependency with fs_core node and flow_table hash table structures. However, hash table implementation internally also includes a workqueue task with its own lockdep map which causes similar bogus lockdep splat[0]. Fix it by also adding dedicated class for hash table workqueue work structure of tc_ht. [0]: [ 1139.672465] ====================================================== [ 1139.673552] WARNING: possible circular locking dependency detected [ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted [ 1139.675734] ------------------------------------------------------ [ 1139.676801] modprobe/5998 is trying to acquire lock: [ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.679662] but task is already holding lock: [ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.682223] which lock already depends on the new lock. 
[ 1139.683640] the existing dependency chain (in reverse order) is: [ 1139.684887] -> #2 (&tc_ht_lock_key){+.+.}-{3:3}: [ 1139.685975] __mutex_lock+0x12c/0x14b0 [ 1139.686659] rht_deferred_worker+0x35/0x1540 [ 1139.687405] process_one_work+0x7c2/0x1310 [ 1139.688134] worker_thread+0x59d/0xec0 [ 1139.688820] kthread+0x28f/0x330 [ 1139.689444] ret_from_fork+0x1f/0x30 [ 1139.690106] -> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}: [ 1139.691250] __flush_work+0xe8/0x900 [ 1139.691915] __cancel_work_timer+0x2ca/0x3f0 [ 1139.692655] rhashtable_free_and_destroy+0x22/0x6f0 [ 1139.693472] del_sw_flow_table+0x22/0xb0 [mlx5_core] [ 1139.694592] tree_put_node+0x24c/0x450 [mlx5_core] [ 1139.695686] tree_remove_node+0x6e/0x100 [mlx5_core] [ 1139.696803] mlx5_destroy_flow_table+0x187/0x690 [mlx5_core] [ 1139.698017] mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core] [ 1139.699217] mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core] [ 1139.700397] mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core] [ 1139.701571] mlx5e_suspend+0xdb/0x140 [mlx5_core] [ 1139.702665] mlx5e_remove+0x89/0x190 [mlx5_core] [ 1139.703756] auxiliary_bus_remove+0x52/0x70 [ 1139.704492] device_release_driver_internal+0x3c1/0x600 [ 1139.705360] bus_remove_device+0x2a5/0x560 [ 1139.706080] device_del+0x492/0xb80 [ 1139.706724] mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core] [ 1139.707961] mlx5_unregister_device+0x7a/0xa0 [mlx5_core] [ 1139.709138] mlx5_uninit_one+0x5f/0x160 [mlx5_core] [ 1139.710252] remove_one+0xd1/0x160 [mlx5_core] [ 1139.711297] pci_device_remove+0x96/0x1c0 [ 1139.722721] device_release_driver_internal+0x3c1/0x600 [ 1139.723590] unbind_store+0x1b1/0x200 [ 1139.724259] kernfs_fop_write_iter+0x348/0x520 [ 1139.725019] vfs_write+0x7b2/0xbf0 [ 1139.725658] ksys_write+0xf3/0x1d0 [ 1139.726292] do_syscall_64+0x3d/0x90 [ 1139.726942] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.727769] -> #0 (&node->lock){++++}-{3:3}: [ 1139.728698] __lock_acquire+0x2cf5/0x62f0 [ 1139.729415] lock_acquire+0x1c1/0x540 [ 
1139.730076] down_write+0x8e/0x1f0 [ 1139.730709] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.731841] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.732982] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.734207] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.735491] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.736716] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.738007] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.739213] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.740377] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.741534] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.742351] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.743512] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.744683] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.745860] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.747098] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.748372] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.749590] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.750813] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.752147] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.753293] auxiliary_bus_remove+0x52/0x70 [ 1139.754028] device_release_driver_internal+0x3c1/0x600 [ 1139.754885] driver_detach+0xc1/0x180 [ 1139.755553] bus_remove_driver+0xef/0x2e0 [ 1139.756260] auxiliary_driver_unregister+0x16/0x50 [ 1139.757059] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.758207] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.759295] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.760384] __x64_sys_delete_module+0x2b5/0x450 [ 1139.761166] do_syscall_64+0x3d/0x90 [ 1139.761827] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.762663] other info that might help us debug this: [ 1139.763925] Chain exists of: &node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key [ 1139.765743] Possible unsafe locking scenario: [ 1139.766688] CPU0 CPU1 [ 1139.767399] ---- 
---- [ 1139.768111] lock(&tc_ht_lock_key); [ 1139.768704] lock((work_completion)(&ht->run_work)); [ 1139.769869] lock(&tc_ht_lock_key); [ 1139.770770] lock(&node->lock); [ 1139.771326] *** DEADLOCK *** [ 1139.772345] 2 locks held by modprobe/5998: [ 1139.772994] #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 1139.774399] #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0 [ 1139.775822] stack backtrace: [ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 [ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 1139.779529] Call Trace: [ 1139.779992] <TASK> [ 1139.780409] dump_stack_lvl+0x57/0x7d [ 1139.781015] check_noncircular+0x278/0x300 [ 1139.781687] ? print_circular_bug+0x460/0x460 [ 1139.782381] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1139.783121] ? lock_release+0x487/0x7c0 [ 1139.783759] ? orc_find.part.0+0x1f1/0x330 [ 1139.784423] ? mark_lock.part.0+0xef/0x2fc0 [ 1139.785091] __lock_acquire+0x2cf5/0x62f0 [ 1139.785754] ? register_lock_class+0x18e0/0x18e0 [ 1139.786483] lock_acquire+0x1c1/0x540 [ 1139.787093] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.788195] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 [ 1139.788978] ? register_lock_class+0x18e0/0x18e0 [ 1139.789715] down_write+0x8e/0x1f0 [ 1139.790292] ? down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.791380] ? down_write_killable+0x220/0x220 [ 1139.792080] ? find_held_lock+0x2d/0x110 [ 1139.792713] down_write_ref_node+0x7c/0xe0 [mlx5_core] [ 1139.793795] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core] [ 1139.794879] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core] [ 1139.796032] ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core] [ 1139.797227] ? xa_load+0x11a/0x200 [ 1139.797800] ? 
__xa_clear_mark+0xf0/0xf0 [ 1139.798438] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core] [ 1139.799660] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core] [ 1139.800821] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core] [ 1139.802049] ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core] [ 1139.803260] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core] [ 1139.804398] ? __cancel_work_timer+0x1c2/0x3f0 [ 1139.805099] ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core] [ 1139.806387] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core] [ 1139.807481] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core] [ 1139.808564] rhashtable_free_and_destroy+0x3be/0x6f0 [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core] [ 1139.810455] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core] [ 1139.811552] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core] [ 1139.812655] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] [ 1139.813768] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core] [ 1139.814952] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] [ 1139.816166] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core] [ 1139.817336] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core] [ 1139.818507] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core] [ 1139.819788] ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core] [ 1139.821051] ? kernfs_find_ns+0x137/0x310 [ 1139.821705] mlx5e_rep_remove+0x62/0x80 [mlx5_core] [ 1139.822778] auxiliary_bus_remove+0x52/0x70 [ 1139.823449] device_release_driver_internal+0x3c1/0x600 [ 1139.824240] driver_detach+0xc1/0x180 [ 1139.824842] bus_remove_driver+0xef/0x2e0 [ 1139.825504] auxiliary_driver_unregister+0x16/0x50 [ 1139.826245] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core] [ 1139.827322] mlx5e_cleanup+0x12/0x30 [mlx5_core] [ 1139.828345] mlx5_cleanup+0xc/0x49 [mlx5_core] [ 1139.829382] __x64_sys_delete_module+0x2b5/0x450 [ 1139.830119] ? module_flags+0x300/0x300 [ 1139.830750] ? task_work_func_match+0x50/0x50 [ 1139.831440] ? 
task_work_cancel+0x20/0x20 [ 1139.832088] ? lockdep_hardirqs_on_prepare+0x273/0x3f0 [ 1139.832873] ? syscall_enter_from_user_mode+0x1d/0x50 [ 1139.833661] ? trace_hardirqs_on+0x2d/0x100 [ 1139.834328] do_syscall_64+0x3d/0x90 [ 1139.834922] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 1139.835700] RIP: 0033:0x7f153e71288b [ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48 [ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b [ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288 [ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000 [ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288 [ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8 Fixes: 9ba33339 ("net/mlx5e: Avoid false lock depenency warning on tc_ht") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yang Yingliang authored
Add missing mutex_unlock() before returning from mlx5_fw_fatal_reporter_err_work(). Fixes: 9078e843 ("net/mlx5: Avoid recovery in probe flows") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
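The kind of bug fixed here (an early return that leaves a lock held) is commonly avoided by routing every exit through one unlock label. Below is a minimal userspace sketch of that pattern, not the mlx5 code: `toy_mutex`, `toy_health`, `toy_fatal_err_work()` and the `-EBUSY` return value are hypothetical names chosen for illustration, and a C11 atomic flag stands in for the kernel mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for a kernel mutex (illustration only). */
struct toy_mutex {
	atomic_bool held;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
	/* Spin until this caller flips "held" from false to true. */
	while (atomic_exchange(&m->held, true))
		;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	bool was_held = atomic_exchange(&m->held, false);

	assert(was_held); /* unlocking an unlocked mutex is a bug */
	(void)was_held;
}

/* Hypothetical device health state; not the mlx5 structure. */
struct toy_health {
	struct toy_mutex lock;
	bool recovery_disabled;
	int recoveries;
};

/* Every exit path, including the early one, passes through "unlock". */
int toy_fatal_err_work(struct toy_health *h)
{
	int err = 0;

	toy_mutex_lock(&h->lock);
	if (h->recovery_disabled) {
		err = -EBUSY;
		goto unlock; /* a bare "return" here would leak the lock */
	}
	h->recoveries++;
unlock:
	toy_mutex_unlock(&h->lock);
	return err;
}
```

With a single unlock label, adding another early exit later cannot reintroduce the leak as long as it jumps to the label instead of returning directly.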
-
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Jakub Kicinski authored
Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix a buffer overflow in mgmt_mesh_add - Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2 - Fix hci_qca shutdown on closed serdev - Fix possible circular locking dependencies on ISO code - Fix possible deadlock in rfcomm_sk_state_change * tag 'for-net-2023-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix possible deadlock in rfcomm_sk_state_change Bluetooth: ISO: Fix possible circular locking dependency Bluetooth: hci_event: Fix Invalid wait context Bluetooth: ISO: Fix possible circular locking dependency Bluetooth: hci_sync: fix memory leak in hci_update_adv_data() Bluetooth: hci_qca: Fix driver shutdown on closed serdev Bluetooth: hci_conn: Fix memory leaks Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2 Bluetooth: Fix a buffer overflow in mgmt_mesh_add() ==================== Link: https://lore.kernel.org/r/20230118002944.1679845-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski authored
Daniel Borkmann says: ==================== bpf 2023-01-16 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 22 insertions(+), 24 deletions(-). The main changes are: 1) Mitigate a Spectre v4 leak in unprivileged BPF from speculative pointer-as-scalar type confusion, from Luis Gerhorst. 2) Fix a splat when pid 1 attaches a BPF program that attempts to send killing signal to itself, from Hao Sun. 3) Fix BPF program ID information in BPF_AUDIT_UNLOAD as well as PERF_BPF_EVENT_PROG_UNLOAD events, from Paul Moore. 4) Fix BPF verifier warning triggered from invalid kfunc call in backtrack_insn, also from Hao Sun. 5) Fix potential deadlock in htab_lock_bucket from same bucket index but different map_locked index, from Tonghao Zhang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation bpf: hash map, avoid deadlock with suitable hash mask bpf: remove the do_idr_lock parameter from bpf_prog_free_id() bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD bpf: Skip task with pid=1 in send_signal_common() bpf: Skip invalid kfunc call in backtrack_insn ==================== Link: https://lore.kernel.org/r/20230116230745.21742-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shyam Sundar S K authored
Due to other additional responsibilities, Tom will no longer be able to support the AMD XGBE driver. Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20230116085015.443127-1-Shyam-sundar.S-k@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Caleb Connolly authored
The IPA interrupt can fire while pm_runtime is disabled, because it races with the PM suspend/resume code. This causes a splat in the interrupt handler when it tries to call pm_runtime_get(). Explicitly disable the interrupt in our ->suspend callback, and re-enable it in ->resume to avoid this. The interrupt is a wake_irq, so even when disabled, if it fires it will wake the system from suspend and cancel any suspend transition that may be in progress. If there is an interrupt pending, the ipa_isr_thread handler will be called after resuming. Fixes: 1aac309d ("net: ipa: use autosuspend") Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20230115175925.465918-1-caleb.connolly@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 17 Jan, 2023 6 commits
-
-
Ying Hsu authored
syzbot reports a possible deadlock in rfcomm_sk_state_change [1]. While rfcomm_sock_connect acquires the sk lock and waits for the rfcomm lock, rfcomm_sock_release could hold the rfcomm lock and deadlock while acquiring the sk lock. Here's a simplified flow: rfcomm_sock_connect: lock_sock(sk) rfcomm_dlc_open: rfcomm_lock() rfcomm_sock_release: rfcomm_sock_shutdown: rfcomm_lock() __rfcomm_dlc_close: rfcomm_sk_state_change: lock_sock(sk) This patch drops the sk lock before calling rfcomm_dlc_open to avoid the possible deadlock and holds sk's reference count to prevent use-after-free after rfcomm_dlc_open completes. Reported-by: syzbot+d7ce59...@syzkaller.appspotmail.com Fixes: 1804fdf6 ("Bluetooth: btintel: Combine setting up MSFT extension") Link: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 [1] Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
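The approach described above (pin the socket, drop its lock, call into the code that takes the second lock, then re-lock) can be sketched in single-threaded userspace C. This is an illustration under assumptions, not the Bluetooth code: `toy_lock`, `toy_sock`, `toy_dlc_open()` and `toy_sock_connect()` are hypothetical names, and the "locks" are plain flags that only record whether they are held.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag-based stand-in for a lock; it records state, it does not block. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held); /* re-acquiring here would model a deadlock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_sock {
	struct toy_lock lock; /* plays the role of the sk lock */
	int refcnt;
	bool connected;
};

struct toy_lock toy_subsystem_lock;  /* plays the role of the rfcomm lock */
bool toy_sock_lock_held_during_open; /* observation used by callers/tests */

/* Takes only the subsystem lock and records whether the socket lock
 * was still held by the caller at that moment. */
int toy_dlc_open(struct toy_sock *sk)
{
	toy_lock_acquire(&toy_subsystem_lock);
	toy_sock_lock_held_during_open = sk->lock.held;
	toy_lock_release(&toy_subsystem_lock);
	return 0;
}

int toy_sock_connect(struct toy_sock *sk)
{
	int err;

	toy_lock_acquire(&sk->lock);
	sk->refcnt++;                /* pin the socket before unlocking */
	toy_lock_release(&sk->lock); /* never hold both locks at once */

	err = toy_dlc_open(sk);

	toy_lock_acquire(&sk->lock);
	if (!err)
		sk->connected = true;
	toy_lock_release(&sk->lock);
	sk->refcnt--;                /* drop the pin */
	return err;
}
```

Because the socket lock is released before the subsystem lock is taken, the connect path no longer creates a socket-then-subsystem ordering that could invert against the release path; the extra reference keeps the socket alive across the unlocked window.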
-
Luiz Augusto von Dentz authored
This attempts to fix the following trace: iso-tester/52 is trying to acquire lock: ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at: iso_sock_listen+0x29e/0x440 but task is already holding lock: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x1a3/0x630 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 hci_le_remote_feat_complete_evt+0x17e/0x320 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 -> #0 (&hdev->lock){+.+.}-{3:3}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 __mutex_lock+0x13b/0xf50 iso_sock_listen+0x29e/0x440 __sys_listen+0xe6/0x160 __x64_sys_listen+0x25/0x30 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by iso-tester/52: #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_sock_listen+0x8b/0x440 Fixes: f764a6c2 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-
Luiz Augusto von Dentz authored
This fixes the following trace caused by attempting to lock cmd_sync_work_lock while holding the rcu_read_lock: kworker/u3:2/212 is trying to lock: ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at: hci_cmd_sync_queue+0xad/0x140 other info that might help us debug this: context-{4:4} 4 locks held by kworker/u3:2/212: #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x4dc/0x9a0 #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at: hci_cc_le_set_cig_params+0x64/0x4f0 #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at: hci_cc_le_set_cig_params+0x2f9/0x4f0 Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-
Luiz Augusto von Dentz authored
This attempts to fix the following trace: kworker/u3:1/184 is trying to acquire lock: ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: iso_connect_cfm+0x2de/0x690 but task is already holding lock: ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at: iso_connect_cfm+0x265/0x690 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&conn->lock){+.+.}-{2:2}: lock_acquire+0x176/0x3d0 _raw_spin_lock+0x2a/0x40 __iso_sock_close+0x1dd/0x4f0 iso_sock_release+0xa0/0x1b0 sock_close+0x5e/0x120 __fput+0x102/0x410 task_work_run+0xf1/0x160 exit_to_user_mode_prepare+0x170/0x180 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x62/0xcc -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: check_prev_add+0xfc/0x1190 __lock_acquire+0x1e27/0x2750 lock_acquire+0x176/0x3d0 lock_sock_nested+0x32/0x80 iso_connect_cfm+0x2de/0x690 hci_cc_le_setup_iso_path+0x195/0x340 hci_cmd_complete_evt+0x1ae/0x500 hci_event_packet+0x38e/0x7c0 hci_rx_work+0x34c/0x980 process_one_work+0x5a5/0x9a0 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); lock(&conn->lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); *** DEADLOCK *** Fixes: ccf74f23 ("Bluetooth: Add BTPROTO_ISO socket type") Fixes: f764a6c2 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-
Zhengchao Shao authored
When hci_cmd_sync_queue() fails in hci_update_adv_data(), inst_ptr is not freed, which causes a memory leak. Convert to using ERR_PTR/PTR_ERR to pass the instance to the callback so that no memory needs to be allocated. Fixes: 651cd3d6 ("Bluetooth: convert hci_update_adv_data to hci_sync") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
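The idea of carrying a small integer inside the callback's `void *` argument, so that nothing is allocated and nothing can leak when queuing fails, can be sketched in userspace C as follows. The kernel's ERR_PTR()/PTR_ERR() helpers are not available outside the kernel, so this sketch uses plain `uintptr_t` casts in their place; `toy_sync_queue()`, `toy_update_adv_data()` and the other names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef int (*toy_sync_func_t)(void *data);

/* Encode a small integer in the pointer value itself: no allocation. */
static inline void *toy_instance_to_ptr(uint8_t instance)
{
	return (void *)(uintptr_t)instance;
}

static inline uint8_t toy_ptr_to_instance(const void *data)
{
	return (uint8_t)(uintptr_t)data;
}

int toy_last_instance = -1; /* records what the callback decoded */

int toy_update_adv_data_cb(void *data)
{
	toy_last_instance = toy_ptr_to_instance(data);
	return 0;
}

/* Hypothetical queue: either fails up front or runs the callback. */
int toy_sync_queue(toy_sync_func_t func, void *data, bool simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	return func(data);
}

int toy_update_adv_data(uint8_t instance, bool simulate_failure)
{
	/* If queuing fails there is nothing to free: "data" owns no memory. */
	return toy_sync_queue(toy_update_adv_data_cb,
			      toy_instance_to_ptr(instance),
			      simulate_failure);
}
```

The trade-off is that the callback must know the pointer is an encoded value and never dereference it.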
-
Krzysztof Kozlowski authored
The driver shutdown callback (which sends EDL_SOC_RESET to the device over serdev) should not be invoked when HCI device is not open (e.g. if hci_dev_open_sync() failed), because the serdev and its TTY are not open either. Also skip this step if device is powered off (qca_power_shutdown()). The shutdown callback causes use-after-free during system reboot with Qualcomm Atheros Bluetooth: Unable to handle kernel paging request at virtual address 0072662f67726fd7 ... CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G W 6.1.0-rt5-00325-g8a5f56bcfcca #8 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: tty_driver_flush_buffer+0x4/0x30 serdev_device_write_flush+0x24/0x34 qca_serdev_shutdown+0x80/0x130 [hci_uart] device_shutdown+0x15c/0x260 kernel_restart+0x48/0xac KASAN report: BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50 Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1 CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xdc/0xf0 show_stack+0x18/0x30 dump_stack_lvl+0x68/0x84 print_report+0x188/0x488 kasan_report+0xa4/0xf0 __asan_load8+0x80/0xac tty_driver_flush_buffer+0x1c/0x50 ttyport_write_flush+0x34/0x44 serdev_device_write_flush+0x48/0x60 qca_serdev_shutdown+0x124/0x274 device_shutdown+0x1e8/0x350 kernel_restart+0x48/0xb0 __do_sys_reboot+0x244/0x2d0 __arm64_sys_reboot+0x54/0x70 invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x44/0xf0 el0_svc+0x2c/0x6c el0t_64_sync_handler+0xbc/0x140 el0t_64_sync+0x190/0x194 Fixes: 7e7bbddd ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
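The guard described above (skip the reset when the HCI device is not open or the controller is already powered off) can be sketched as a shutdown callback that checks state before touching the serial device. This is a toy model with hypothetical names (`toy_hci`, `toy_serdev`, `toy_shutdown()`), not the hci_qca driver's actual flags or API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical serial device: "writes" counts reset commands sent. */
struct toy_serdev {
	int writes;
};

/* Hypothetical controller state. serdev is only valid while running. */
struct toy_hci {
	bool running;     /* HCI device was opened successfully */
	bool powered_off; /* controller has already been powered down */
	struct toy_serdev *serdev;
};

/* Returns true if a reset was sent, false if the step was skipped. */
bool toy_shutdown(struct toy_hci *hci)
{
	/* Do not touch the serial device unless it is known to be open. */
	if (!hci->running || hci->powered_off)
		return false;

	hci->serdev->writes++; /* stand-in for sending the reset command */
	return true;
}
```

Checking state first means the callback never dereferences a serial device that was closed or never opened, which is the access the KASAN report above flags.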
-