- 25 Jan, 2021 18 commits
-
-
Johannes Berg authored
If we get into a problem severe enough to attempt a reprobe, we schedule a worker to do that. However, if the problem gets more severe and the device is actually destroyed before this worker has a chance to run, we use a free device. Bump up the reference count of the device until the worker runs to avoid this situation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.871f0892e4b2.I94819e11afd68d875f3e242b98bef724b8236f1e@changeid
-
Matti Gottlieb authored
The bit that indicates if the device supports 160MHZ is bit #9. The macro checks bit #8. Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit. Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com> Fixes: d6f2134a ("iwlwifi: add mac/rf types and 160MHz to the device tables") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.bddbf9b57a75.I16e09e2b1404b16bfff70852a5a654aa468579e2@changeid
-
Shaul Triebitz authored
In D3 resume flow, avoid the following race where sending packets before updating the sequence number (sequence number received from the wowlan status command response): Thread 1: __iwl_mvm_resume clears IWL_MVM_STATUS_IN_D3 and is cut by thread 2 before reaching iwl_mvm_query_wakeup_reasons. Thread 2: iwl_mvm_mac_itxq_xmit calls iwl_mvm_tx_skb since IWL_MVM_STATUS_IN_D3 is not set using a wrong sequence number. Thread 1: __iwl_mvm_resume continues and calls iwl_mvm_query_wakeup_reasons updating the sequence number received from the firmware. The next packet that will be sent now will cause sysassert 0x1096. Fix the bug by moving 'clear IWL_MVM_STATUS_IN_D3' to after sending the wowlan status command and updating the sequence number. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.fe927ec939c6.I103d3321fb55da7e6c6c51582cfadf94eb8b6c58@changeid
-
Luca Coelho authored
Until now we have been relying on matching the PCI ID and subsystem device ID in order to recognize Qu devices with Hr2. Add rules to match these devices, so that we don't have to add a new rule for every new ID we get. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.591ce253ddd8.Ia4b9cc2c535625890c6d6b560db97ee9f2d5ca3b@changeid
-
Gregory Greenman authored
Having sta_id not set for aux_sta and snif_sta can potentially lead to a hard to debug issue in case remove station is called without an add. In this case sta_id 0, an unrelated regular station, will be removed. In fact, we do have a FW assert that occures rarely and from the debug data analysis it looks like sta_id 0 is removed by mistake, though it's hard to pinpoint the exact flow. The WARN_ON in this patch should help to find it. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.5dc6dd9b22d5.I2add1b5ad24d0d0a221de79d439c09f88fcaf15d@changeid
-
Matt Chen authored
The return type value of functions 1 and 2 were considered to be an integer inside a buffer, but they can also be only an integer, without the buffer. Fix the code in iwl_acpi_get_dsm_u8() to handle it as a single integer value, as well as packed inside a buffer. Signed-off-by: Matt Chen <matt.chen@intel.com> Fixes: 9db93491 ("iwlwifi: acpi: support device specific method (DSM)") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.5757092adcd6.Ic24524627b899c9a01af38107a62a626bdf5ae3a@changeid
-
Johannes Berg authored
If we spin for a long time in memory reads that (for some reason in hardware) take a long time, then we'll eventually get messages such as watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272] This is because the reading really does take a very long time, and we don't schedule, so we're hogging the CPU with this task, at least if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y. Previously I misinterpreted the situation and thought that this was only going to happen if we had interrupts disabled, and then fixed this (which is good anyway, however), but that didn't always help; looking at it again now I realized that the spin unlock will only reschedule if CONFIG_PREEMPT is used. In order to avoid this issue, change the code to cond_resched() if we've been spinning for too long here. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 04516706 ("iwlwifi: pcie: limit memory read spin time") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid
-
Johannes Berg authored
There's no reason to use ktime_get() since we don't need any better precision than jiffies, and since we no longer disable interrupts around this code (when grabbing NIC access), jiffies will work fine. Use jiffies instead of ktime_get(). This cleanup is preparation for the following patch "iwlwifi: pcie: reschedule in long-running memory reads". The code gets simpler with the weird clock use etc. removed before we add cond_resched(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130253.621c948b1fad.I3ee9f4bc4e74a0c9125d42fb7c35cd80df4698a1@changeid
-
Johannes Berg authored
If the image loader allocation fails, we leak all the previously allocated memory. Fix this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.97172cbaa67c.I3473233d0ad01a71aa9400832fb2b9f494d88a11@changeid
-
Emmanuel Grumbach authored
I hit a NULL pointer exception in this function when the init flow went really bad. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.2e8da9f2c132.I0234d4b8ddaf70aaa5028a20c863255e05bc1f84@changeid
-
Johannes Berg authored
To avoid completion timeouts during device boot, set up the LTR timeouts on more devices - similar to what we had before for AX210. This also corrects the AX210 workaround to be done only on discrete (non-integrated) devices, otherwise the registers have no effect. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: edb62520 ("iwlwifi: pcie: set LTR to avoid completion timeout") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.fb819e19530b.I0396f82922db66426f52fbb70d32a29c8fd66951@changeid
-
Emmanuel Grumbach authored
The code was really awkward, we would first dereference txq->entries when calling iwl_txq_genX_tfd_unmap and then we would check that txq->entries is non-NULL. Fix that by exiting if txq->entries is NULL. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.173359fc236d.I75c7c2397d20df8d7fbc24cb16a5232d5c551889@changeid
-
Emmanuel Grumbach authored
I noticed that the flow that triggers an NMI on the firmware for old devices (tested on 7265) doesn't work. Apparently, the firmware / device is still in low power when we write the register that triggers the NMI. We call the "grab_nic_access" function to make sure the device is awake but that wasn't enough. I played with this and noticed that if we wait 1 ms after the device reports it is awake before we write to the NMI register, the device always sees our write and the firmware gets properly asserted. Triggering an NMI to the firmware can be done with the debugfs hook: echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi What happened before is that the firmware would just stall without running its NMI routine. Because of that the driver wouldn't get the "firmware crashed" interrupt. After a while the driver would notice that the firmware is not responding to some command and it would read the error data from the firmware, but this data is populated in the NMI service routine in the firmware which was not called. So in the logs it looked like: iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms. iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34 iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0 iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1 iwlwifi 0000:00:03.0: 0x00000000 | branchlink2 iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1 iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2 iwlwifi 0000:00:03.0: 0x00000000 | data1 iwlwifi 0000:00:03.0: 0x00000000 | data2 iwlwifi 0000:00:03.0: 0x00000000 | data3 iwlwifi 0000:00:03.0: 0x00000000 | beacon time iwlwifi 0000:00:03.0: 0x00000000 | tsf low ... With this fix, immediately after we trigger the NMI to the firmware, we get the expected: iwlwifi 0000:00:03.0: Microcode SW error detected. Restarting 0x2000000. iwlwifi 0000:00:03.0: Start IWL Error Log Dump: iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6 iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0 iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1 iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2 iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1 iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2 iwlwifi 0000:00:03.0: 0x00000000 | data1 iwlwifi 0000:00:03.0: 0x00000080 | data2 iwlwifi 0000:00:03.0: 0x07030000 | data3 iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low iwlwifi 0000:00:03.0: 0x00000000 | tsf hi iwlwifi 0000:00:03.0: 0x00000000 | time gp1 iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2 iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type iwlwifi 0000:00:03.0: 0x0000001D | uCode version major Notice the first line: "Microcode SW error detected:" which is printed in the driver's ISR, which means that the driver actually got an interrupt from the firmware saying that it crashed. And then we have the properly populated error data. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid
-
Johannes Berg authored
If loading the PNVM file failed on the first try during the interface up, the file is unlikely to show up later, and we already don't try to reload it if it changes, so just don't try loading it again and again. This also fixes some issues where we may try to load it at resume time, which may not be possible yet. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 69725928 ("iwlwifi: read and parse PNVM file") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.5ac6828a0bbe.I7d308358b21d3c0c84b1086999dbc7267f86e219@changeid
-
Johannes Berg authored
Even if we don't reload the file from disk, we still need to trigger the PNVM load flow with the device; fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 69725928 ("iwlwifi: read and parse PNVM file") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.85ef56c4ef8c.I3b853ce041a0755d45e448035bef1837995d191b@changeid
-
Johannes Berg authored
If we erroneously try to set the PNVM data again after it has already been set, we could leak the old DMA memory. Avoid that and warn, we shouldn't be doing this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 69725928 ("iwlwifi: read and parse PNVM file") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.929c2d680429.I086b9490e6c005f3bcaa881b617e9f61908160f3@changeid
-
Johannes Berg authored
We need to take the mutex to call iwl_mvm_get_sync_time(), do it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.4bb5ccf881a6.I62973cbb081e80aa5b0447a5c3b9c3251a65cf6b@changeid
-
Sara Sharon authored
In the new CSA flow, we remain associated during CSA, but still do a unbind-bind to the vif. However, sending the power command right after when vif is unbound but still associated causes FW to assert (0x3400) since it cannot tell the LMAC id. Just skip this command, we will send it again in a bit, when assigning the new context. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210115130252.64a2254ac5c3.Iaa3a9050bf3d7c9cd5beaf561e932e6defc12ec3@changeid
-
- 18 Jan, 2021 2 commits
-
-
Lorenzo Bianconi authored
Similar to mt7601u driver, fix erroneous rx page refcounting Fixes: a66cbdd6 ("mt76: mt7615: introduce mt7663s support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/dca19c9d445156201bc41f7cbb6e894bbc9a678c.1610644945.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Fix the following crash due to erroneous page refcounting: [ 32.445919] BUG: Bad page state in process swapper/1 pfn:11f65a [ 32.447409] page:00000000938f0632 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x11f65a [ 32.449605] flags: 0x8000000000000000() [ 32.450421] raw: 8000000000000000 ffffffff825b0148 ffffea00045ae988 0000000000000000 [ 32.451795] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 [ 32.452999] page dumped because: nonzero mapcount [ 32.453888] Modules linked in: [ 32.454492] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-rc2+ #1976 [ 32.455695] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014 [ 32.457157] Call Trace: [ 32.457636] <IRQ> [ 32.457993] dump_stack+0x77/0x97 [ 32.458576] bad_page.cold+0x65/0x96 [ 32.459198] get_page_from_freelist+0x46a/0x11f0 [ 32.460008] __alloc_pages_nodemask+0x10a/0x2b0 [ 32.460794] mt7601u_rx_tasklet+0x651/0x720 [ 32.461505] tasklet_action_common.constprop.0+0x6b/0xd0 [ 32.462343] __do_softirq+0x152/0x46c [ 32.462928] asm_call_irq_on_stack+0x12/0x20 [ 32.463610] </IRQ> [ 32.463953] do_softirq_own_stack+0x5b/0x70 [ 32.464582] irq_exit_rcu+0x9f/0xe0 [ 32.465028] common_interrupt+0xae/0x1a0 [ 32.465536] asm_common_interrupt+0x1e/0x40 [ 32.466071] RIP: 0010:default_idle+0x18/0x20 [ 32.468981] RSP: 0018:ffffc90000077f00 EFLAGS: 00000246 [ 32.469648] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 32.470550] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81aac3dd [ 32.471463] RBP: ffff88810022ab00 R08: 0000000000000001 R09: 0000000000000001 [ 32.472335] R10: 0000000000000046 R11: 0000000000005aa0 R12: 0000000000000000 [ 32.473235] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 32.474139] ? default_idle_call+0x4d/0x200 [ 32.474681] default_idle_call+0x74/0x200 [ 32.475192] do_idle+0x1d5/0x250 [ 32.475612] cpu_startup_entry+0x19/0x20 [ 32.476114] secondary_startup_64_no_verify+0xb0/0xbb [ 32.476765] Disabling lock debugging due to kernel taint Fixes: c869f77d ("add mt7601u driver") Co-developed-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/62b2380c8c2091834cfad05e1059b55f945bd114.1610643952.git.lorenzo@kernel.org
-
- 14 Jan, 2021 2 commits
-
-
Takashi Iwai authored
The commit ba8f6f4a ("iwlwifi: dbg: add dumping special device memory") added a termination of name string just to be sure, and this seems causing a regression, a GPF triggered at firmware loading. Basically we shouldn't modify the firmware data that may be provided as read-only. This patch drops the code that caused the regression and keep the tlv data as is. Fixes: ba8f6f4a ("iwlwifi: dbg: add dumping special device memory") BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1180344 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=210733 Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210112132449.22243-2-tiwai@suse.de
-
Nathan Chancellor authored
Clang warns in both mt7615 and mt7915: drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:271:9: warning: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Wenum-conversion] txq = MT_MCUQ_FWDL; ~ ^~~~~~~~~~~~ drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:278:9: warning: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Wenum-conversion] txq = MT_MCUQ_WA; ~ ^~~~~~~~~~ drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:282:9: warning: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Wenum-conversion] txq = MT_MCUQ_WM; ~ ^~~~~~~~~~ 3 warnings generated. drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:238:9: warning: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Wenum-conversion] qid = MT_MCUQ_WM; ~ ^~~~~~~~~~ drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:240:9: warning: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Wenum-conversion] qid = MT_MCUQ_FWDL; ~ ^~~~~~~~~~~~ 2 warnings generated. Use the proper type for the queue ID variables to fix these warnings. Additionally, rename the txq variable in mt7915_mcu_send_message to be more neutral like mt7615_mcu_send_message. Fixes: e637763b ("mt76: move mcu queues to mt76_dev q_mcu array") Link: https://github.com/ClangBuiltLinux/linux/issues/1229Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201229211548.1348077-1-natechancellor@gmail.com
-
- 08 Jan, 2021 18 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull more networking fixes from Jakub Kicinski: "Slightly lighter pull request to get back into the Thursday cadence. Current release - always broken: - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions - dsa: hellcreek: fix led_classdev build errors Previous releases - regressions: - ipv6: fib: flush exceptions when purging route to avoid netdev reference leak - ip_tunnels: fix pmtu check in nopmtudisc mode - ip: always refragment ip defragmented packets to avoid MTU issues when forwarding through tunnels, correct "packet too big" message is prohibitively tricky to generate - s390/qeth: fix locking for discipline setup / removal and during recovery to prevent both deadlocks and races - mlx5: Use port_num 1 instead of 0 when delete a RoCE address Previous releases - always broken: - cdc_ncm: correct overhead calculation in delayed_ndp_size to prevent out of bound accesses with Huawei 909s-120 LTE module - fix stmmac dwmac-sun8i suspend/resume: - PHY being left powered off - MAC syscon configuration being reset - reference to the reset controller being improperly dropped - qrtr: fix null-ptr-deref in qrtr_ns_remove - can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver - mlx5e: CT: Use per flow counter when CT flow accounting is enabled - mlx5e: Fix SWP offsets when vlan inserted by driver Misc: - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict resolution And the usual many fixes to various error paths" * tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE s390/qeth: fix L2 header access in qeth_l3_osa_features_check() s390/qeth: fix locking for discipline setup / removal s390/qeth: fix deadlock during recovery selftests: fib_nexthops: Fix wrong mausezahn invocation nexthop: Bounce NHA_GATEWAY in FDB nexthop groups nexthop: Unlink nexthop group entry in error path nexthop: Fix off-by-one error in error path octeontx2-af: fix memory leak of lmac and lmac->name chtls: Fix chtls resources release sequence chtls: Added a check to avoid NULL pointer dereference chtls: Replace skb_dequeue with skb_peek chtls: Avoid unnecessary freeing of oreq pointer chtls: Fix panic when route to peer not configured chtls: Remove invalid set_tcb call chtls: Fix hardware tid leak net: ip: always refragment ip defragmented packets net: fix pmtu check in nopmtudisc mode selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking docs: octeontx2: tune rst markup ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fixes from Herbert Xu: "This fixes a functional bug in arm/chacha-neon as well as a potential buffer overflow in ecdh" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ecdh - avoid buffer overflow in ecdh_set_secret() crypto: arm/chacha-neon - add missing counter increment
-
Linus Torvalds authored
The kernel test robot reported a -5.8% performance regression on the "poll2" test of will-it-scale, and bisected it to commit d55564cf ("x86: Make __put_user() generate an out-of-line call"). I didn't expect an out-of-line __put_user() to matter, because no normal core code should use that non-checking legacy version of user access any more. But I had overlooked the very odd poll() usage, which does a __put_user() to update the 'revents' values of the poll array. Now, Al Viro correctly points out that instead of updating just the 'revents' field, it would be much simpler to just copy the _whole_ pollfd entry, and then we could just use "copy_to_user()" on the whole array of entries, the same way we use "copy_from_user()" a few lines earlier to get the original values. But that is not what we've traditionally done, and I worry that threaded applications might be concurrently modifying the other fields of the pollfd array. So while Al's suggestion is simpler - and perhaps worth trying in the future - this instead keeps the "just update revents" model. To fix the performance regression, use the modern "unsafe_put_user()" instead of __put_user(), with the proper "user_write_access_begin()" guarding in place. This improves code generation enormously. Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Laight <David.Laight@aculab.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Petr Mladek authored
This reverts commit 757055ae. The commit caused that ttynull was used as the default console on several systems[1][2][3]. As a result, the console was blank even when a better alternative existed. It happened when there was no console configured on the command line and ttynull_init() was the first initcall calling register_console(). Or it happened when /dev/ did not exist when console_on_rootfs() was called. It was not able to open /dev/console even though a console driver was registered. It tried to add ttynull console but it obviously did not help. But ttynull became the preferred console and was used by /dev/console when it was available later. The commit tried to fix a historical problem that have been there for ages. The primary motivation was the commit 3cffa06a ("printk/console: Allow to disable console output by using console="" or console=null"). It provided a clean solution for a workaround that was widely used and worked only by chance. This revert causes that the console="" or console=null command line options will again work only by chance. These options will cause that a particular console will be preferred and the default (tty) ones will not get enabled. There will be no console registered at all. As a result there won't be stdin, stdout, and stderr for the init process. But it worked exactly this way even before. The proper solution has to fulfill many conditions: + Register ttynull only when explicitly required or as the ultimate fallback. + ttynull should get associated with /dev/console but it must not become preferred console when used as a fallback. Especially, it must still be possible to replace it by a better console later. Such a change requires clean up of the register_console() code. Otherwise, it would be even harder to follow. Especially, the use of has_preferred_console and CON_CONSDEV flag is tricky. The clean up is risky. The ordering of consoles is not well defined. And any changes tend to break existing user settings. Do the revert at the least risky solution for now. [1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/ [2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/ [3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Vineet Gupta <vgupta@synopsys.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5 fixes 2021-01-07 * tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups net/mlx5e: Fix two double free cases net/mlx5: Release devlink object if adev fails net/mlx5e: ethtool, Fix restriction of autoneg with 56G net/mlx5e: In skb build skip setting mark in switchdev mode net/mlx5: E-Switch, fix changing vf VLANID net/mlx5e: Fix SWP offsets when vlan inserted by driver net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address net/mlx5e: Add missing capability check for uplink follow net/mlx5: Check if lag is supported before creating one ==================== Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Aleksander Jan Bajkowski authored
Exclude RMII from modes that report 1 GbE support. Reduced MII supports up to 100 MbE. Fixes: 14fceff4 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210107195818.3878-1-olek2@wp.plSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Julian Wiedmann says: ==================== s390/qeth: fixes 2021-01-07 This brings two locking fixes for the device control path. Also one fix for a path where our .ndo_features_check() attempts to access a non-existent L2 header. ==================== Link: https://lore.kernel.org/r/20210107172442.1737-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
ip_finish_output_gso() may call .ndo_features_check() even before the skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt to inspect the L2 header via vlan_eth_hdr(). Switch to vlan_get_protocol(), as already used further down in the common qeth_features_check() path. Fixes: f13ade19 ("s390/qeth: run non-offload L3 traffic over common xmit path") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
Due to insufficient locking, qeth_core_set_online() and qeth_dev_layer2_store() can run in parallel, both attempting to load & setup the discipline (and stepping on each other toes along the way). A similar race can also occur between qeth_core_remove_device() and qeth_dev_layer2_store(). Access to .discipline is meant to be protected by the discipline_mutex, so add/expand the locking in qeth_core_remove_device() and qeth_core_set_online(). Adjust the locking in qeth_l*_remove_device() accordingly, as it's now handled by the callers in a consistent manner. Based on an initial patch by Ursula Braun. Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Julian Wiedmann authored
When qeth_dev_layer2_store() - holding the discipline_mutex - waits inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete, we can hit a deadlock if qeth_do_reset() concurrently calls qeth_set_online() and thus tries to aquire the discipline_mutex. Move the discipline_mutex locking outside of qeth_set_online() and qeth_set_offline(), and turn the discipline into a parameter so that callers understand the dependency. To fix the deadlock, we can now relax the locking: As already established, qeth_l*_remove_device() waits for qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk of having card->discipline ripped out while it's running, and thus doesn't need to take the discipline_mutex. Fixes: 9dc48ccc ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== nexthop: Various fixes This series contains various fixes for the nexthop code. The bugs were uncovered during the development of resilient nexthop groups. Patches #1-#2 fix the error path of nexthop_create_group(). I was not able to trigger these bugs with current code, but it is possible with the upcoming resilient nexthop groups code which adds a user controllable memory allocation further in the function. Patch #3 fixes wrong validation of netlink attributes. Patch #4 fixes wrong invocation of mausezahn in a selftest. ==================== Link: https://lore.kernel.org/r/20210107144824.1135691-1-idosch@idosch.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an error is returned: # ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address. Invalid command line parameters! Fixes: 7c741868 ("selftests: Add torture tests to nexthop tests") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
The function nh_check_attr_group() is called to validate nexthop groups. The intention of that code seems to have been to bounce all attributes above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all these attributes except when NHA_FDB attribute is present--then it accepts them. NHA_FDB validation that takes place before, in rtm_to_nh_config(), already bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further back, NHA_GROUPS and NHA_MASTER are bounced unconditionally. But that still leaves NHA_GATEWAY as an attribute that would be accepted in FDB nexthop groups (with no meaning), so long as it keeps the address family as unspecified: # ip nexthop add id 1 fdb via 127.0.0.1 # ip nexthop add id 10 fdb via default group 1 The nexthop code is still relatively new and likely not used very broadly, and the FDB bits are newer still. Even though there is a reproducer out there, it relies on an improbable gateway arguments "via default", "via all" or "via any". Given all this, I believe it is OK to reformulate the condition to do the right thing and bounce NHA_GATEWAY. Fixes: 38428d68 ("nexthop: support for fdb ecmp nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
In case of error, remove the nexthop group entry from the list to which it was previously added. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
A reference was not taken for the current nexthop entry, so do not try to put it in the error path. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Colin Ian King authored
Currently the error return paths don't kfree lmac and lmac->name leading to some memory leaks. Fix this by adding two error return paths that kfree these objects Addresses-Coverity: ("Resource leak") Fixes: 1463f382 ("octeontx2-af: Add support for CGX link management") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ayush Sawal says: ==================== Bug fixes for chtls driver patch 1: Fix hardware tid leak. patch 2: Remove invalid set_tcb call. patch 3: Fix panic when route to peer not configured. patch 4: Avoid unnecessary freeing of oreq pointer. patch 5: Replace skb_dequeue with skb_peek. patch 6: Added a check to avoid NULL pointer dereference patch. patch 7: Fix chtls resources release sequence. ==================== Link: https://lore.kernel.org/r/20210106042912.23512-1-ayush.sawal@chelsio.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
CPL_ABORT_RPL is sent after releasing the resources by calling chtls_release_resources(sk); and chtls_conn_done(sk); eventually causing kernel panic. Fixing it by calling release in appropriate order. Fixes: cc35c88a ("crypto : chtls - CPL handler definition") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-