1. 10 Nov, 2020 3 commits
    • mwifiex: pcie: skip cancel_work_sync() on reset failure path · 4add4d98
      Tsuchiya Yuto authored
      If a reset is performed but the reset itself fails for some reason (e.g.,
      on Surface devices, the fw reset requires additional quirks),
      cancel_work_sync() hangs in mwifiex_cleanup_pcie().
      
          # firmware went into a bad state
          [...]
          [ 1608.281690] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
          [ 1608.282724] mwifiex_pcie 0000:03:00.0: rx_pending=0, tx_pending=1,	cmd_pending=0
          [ 1608.292400] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [ 1608.292405] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          # reset performed after firmware went into a bad state
          [ 1609.394320] mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
          [ 1609.394335] mwifiex_pcie 0000:03:00.0: WLAN FW is active
          # but even the reset failed
          [ 1619.499049] mwifiex_pcie 0000:03:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0xe000
          [ 1619.499094] mwifiex_pcie 0000:03:00.0: num_data_h2c_failure = 0
          [ 1619.499103] mwifiex_pcie 0000:03:00.0: num_cmd_h2c_failure = 0
          [ 1619.499110] mwifiex_pcie 0000:03:00.0: is_cmd_timedout = 1
          [ 1619.499117] mwifiex_pcie 0000:03:00.0: num_tx_timeout = 0
          [ 1619.499124] mwifiex_pcie 0000:03:00.0: last_cmd_index = 0
          [ 1619.499133] mwifiex_pcie 0000:03:00.0: last_cmd_id: fa 00 07 01 07 01 07 01 07 01
          [ 1619.499140] mwifiex_pcie 0000:03:00.0: last_cmd_act: 00 e0 00 00 00 00 00 00 00 00
          [ 1619.499147] mwifiex_pcie 0000:03:00.0: last_cmd_resp_index = 3
          [ 1619.499155] mwifiex_pcie 0000:03:00.0: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
          [ 1619.499162] mwifiex_pcie 0000:03:00.0: last_event_index = 2
          [ 1619.499169] mwifiex_pcie 0000:03:00.0: last_event: 58 00 58 00 58 00 58 00 58 00
          [ 1619.499177] mwifiex_pcie 0000:03:00.0: data_sent=0 cmd_sent=1
          [ 1619.499185] mwifiex_pcie 0000:03:00.0: ps_mode=0 ps_state=0
          [ 1619.499215] mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
          # mwifiex_pcie_work hang happening
          [ 1823.233923] INFO: task kworker/3:1:44 blocked for more than 122 seconds.
          [ 1823.233932]       Tainted: G        WC OE     5.10.0-rc1-1-mainline #1
          [ 1823.233935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          [ 1823.233940] task:kworker/3:1     state:D stack:    0 pid:   44 ppid:     2 flags:0x00004000
          [ 1823.233960] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
          [ 1823.233965] Call Trace:
          [ 1823.233981]  __schedule+0x292/0x820
          [ 1823.233990]  schedule+0x45/0xe0
          [ 1823.233995]  schedule_timeout+0x11c/0x160
          [ 1823.234003]  wait_for_completion+0x9e/0x100
          [ 1823.234012]  __flush_work.isra.0+0x156/0x210
          [ 1823.234018]  ? flush_workqueue_prep_pwqs+0x130/0x130
          [ 1823.234026]  __cancel_work_timer+0x11e/0x1a0
          [ 1823.234035]  mwifiex_cleanup_pcie+0x28/0xd0 [mwifiex_pcie]
          [ 1823.234049]  mwifiex_free_adapter+0x24/0xe0 [mwifiex]
          [ 1823.234060]  _mwifiex_fw_dpc+0x294/0x560 [mwifiex]
          [ 1823.234074]  mwifiex_reinit_sw+0x15d/0x300 [mwifiex]
          [ 1823.234080]  mwifiex_pcie_reset_done+0x50/0x80 [mwifiex_pcie]
          [ 1823.234087]  pci_try_reset_function+0x5c/0x90
          [ 1823.234094]  process_one_work+0x1d6/0x3a0
          [ 1823.234100]  worker_thread+0x4d/0x3d0
          [ 1823.234107]  ? rescuer_thread+0x410/0x410
          [ 1823.234112]  kthread+0x142/0x160
          [ 1823.234117]  ? __kthread_bind_mask+0x60/0x60
          [ 1823.234124]  ret_from_fork+0x22/0x30
          [...]
      
      This is a deadlock caused by calling cancel_work_sync() in
      mwifiex_cleanup_pcie():
      
      - Device resets are done via mwifiex_pcie_card_reset()
      - which schedules card->work to call mwifiex_pcie_card_reset_work()
      - which calls pci_try_reset_function().
      - This leads to mwifiex_pcie_reset_done() being called on the same workqueue,
        which in turn calls
      - mwifiex_reinit_sw() and that calls
      - _mwifiex_fw_dpc().
      
      The problem is that _mwifiex_fw_dpc() calls mwifiex_free_adapter()
      if firmware initialization fails. That ends up calling
      mwifiex_cleanup_pcie().
      
      Note that all those calls are still running on the workqueue. So when
      mwifiex_cleanup_pcie() now calls cancel_work_sync(), it's really waiting
      on itself to complete, causing a deadlock.
      
      This commit fixes the deadlock by skipping cancel_work_sync() on a reset
      failure path.
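
The self-wait described above can be modelled in plain userspace C. This is only an illustrative sketch, not the driver code: every name in it (model_card, reset_ongoing, model_cancel_work_sync() and so on) is hypothetical, and the "deadlock" is reported as a return value instead of actually blocking. It assumes a single-threaded model in which a running handler can only be the caller itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
enum cancel_result { CANCEL_OK, CANCEL_WOULD_DEADLOCK, CANCEL_SKIPPED };

struct model_work {
    bool running;       /* true while the work handler is executing */
};

struct model_card {
    struct model_work work;
    bool reset_ongoing; /* assumed flag: set while the reset handler runs */
};

/* Model of a synchronous cancel. In this single-threaded model a running
 * handler can only be the caller itself, so waiting for it would never end. */
static enum cancel_result model_cancel_work_sync(const struct model_work *w)
{
    return w->running ? CANCEL_WOULD_DEADLOCK : CANCEL_OK;
}

/* Model of the cleanup step. With skip_on_reset set, the synchronous cancel
 * is skipped whenever cleanup is reached from the reset failure path. */
static enum cancel_result model_cleanup(const struct model_card *card,
                                        bool skip_on_reset)
{
    if (skip_on_reset && card->reset_ongoing)
        return CANCEL_SKIPPED;
    return model_cancel_work_sync(&card->work);
}

/* Model of the reset work handler when firmware re-initialization fails:
 * cleanup is invoked from inside the handler that it would wait for. */
static enum cancel_result model_reset_work(struct model_card *card,
                                           bool skip_on_reset)
{
    enum cancel_result r;

    assert(card != NULL);
    card->work.running = true;
    card->reset_ongoing = true;
    r = model_cleanup(card, skip_on_reset);
    card->reset_ongoing = false;
    card->work.running = false;
    return r;
}
```

Calling model_reset_work() with skip_on_reset set to false yields CANCEL_WOULD_DEADLOCK, which corresponds to the hung task in the log above. With skip_on_reset set to true it yields CANCEL_SKIPPED, while a cleanup issued outside the handler still performs the cancel normally.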
      
      After this commit, when reset fails, the following output is
      expected to be shown:
      
          kernel: mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
          kernel: mwifiex: Failed to bring up adapter: -5
          kernel: mwifiex_pcie 0000:03:00.0: reinit failed: -5
      
      To reproduce this issue, try, for example, putting the root port of the
      wifi device into D3 (replace "00:1d.3" with the root port on your setup).
      
          # put into D3 (root port)
          sudo setpci -v -s 00:1d.3 CAP_PM+4.b=0b
      
      Cc: Maximilian Luz <luzmaximilian@gmail.com>
      Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201028142346.18355-1-kitakar@gmail.com
    • mwifiex: update comment for shutdown_sw()/reinit_sw() to reflect current state · 566b4cb9
      Tsuchiya Yuto authored
      The functions mwifiex_shutdown_sw() and mwifiex_reinit_sw() can be used
      for more general purposes than the PCIe function level reset. Also, they
      are not even PCIe-specific.
      
      So, let's update the comments at the top of each function accordingly.
      
      Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201028142110.18144-3-kitakar@gmail.com
    • mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure · fa74cb1d
      Tsuchiya Yuto authored
      When a PCIe function level reset (FLR) is performed without a fw reset for
      some reason (e.g., on Microsoft Surface devices, fw reset requires other
      quirks), it fails to reset wifi properly. You can trigger the issue on such
      devices via the debugfs entry for reset:
      
          $ echo 1 | sudo tee /sys/kernel/debug/mwifiex/mlan0/reset
      
      and the resulting dmesg log:
      
          [   45.740508] mwifiex_pcie 0000:03:00.0: Resetting per request
          [   45.742937] mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 3
          [   45.744666] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
          [   45.751530] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.751539] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771691] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771695] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   45.771697] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771698] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   45.771699] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771701] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   45.771702] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771703] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   45.771704] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771705] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   45.771707] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
          [   45.771708] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   53.099343] mwifiex_pcie 0000:03:00.0: info: trying to associate to '[SSID]' bssid [BSSID]
          [   53.241870] mwifiex_pcie 0000:03:00.0: info: associated to bssid [BSSID] successfully
          [   75.377942] mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
          [   85.385491] mwifiex_pcie 0000:03:00.0: info: successfully disconnected from [BSSID]: reason code 15
          [   87.539408] mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
          [   87.539412] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [   99.699917] mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
          [   99.699925] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [  111.859802] mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110
          [  111.859808] mwifiex_pcie 0000:03:00.0: deleting the crypto keys
          [...]
      
      Compared with mwifiex_pcie_remove(), mwifiex_shutdown_sw() lacks a call
      to mwifiex_init_shutdown_fw().
      
      This commit fixes mwifiex_shutdown_sw() by adding the missing
      mwifiex_init_shutdown_fw().
      
      Fixes: 4c5dae59 ("mwifiex: add PCIe function level reset support")
      Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201028142110.18144-2-kitakar@gmail.com
  2. 07 Nov, 2020 37 commits