• Pierre-Louis Bossart's avatar
    soundwire: revisit driver bind/unbind and callbacks · bd29c00e
    Pierre-Louis Bossart authored
    In the SoundWire probe, we store a pointer from the driver ops into
    the 'slave' structure. This can lead to kernel oopses when unbinding
    codec drivers, e.g. with the following sequence to remove machine
    driver and codec driver.
    
    /sbin/modprobe -r snd_soc_sof_sdw
    /sbin/modprobe -r snd_soc_rt711
    
    The full details can be found in the BugLink below, for reference the
    two following examples show different cases of driver ops/callbacks
    being invoked after the driver .remove().
    
    kernel: BUG: kernel NULL pointer dereference, address: 0000000000000150
    kernel: Workqueue: events cdns_update_slave_status_work [soundwire_cadence]
    kernel: RIP: 0010:mutex_lock+0x19/0x30
    kernel: Call Trace:
    kernel:  ? sdw_handle_slave_status+0x426/0xe00 [soundwire_bus 94ff184bf398570c3f8ff7efe9e32529f532e4ae]
    kernel:  ? newidle_balance+0x26a/0x400
    kernel:  ? cdns_update_slave_status_work+0x1e9/0x200 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]
    
    kernel: BUG: unable to handle page fault for address: ffffffffc07654c8
    kernel: Workqueue: pm pm_runtime_work
    kernel: RIP: 0010:sdw_bus_prep_clk_stop+0x6f/0x160 [soundwire_bus]
    kernel: Call Trace:
    kernel:  <TASK>
    kernel:  sdw_cdns_clock_stop+0xb5/0x1b0 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]
    kernel:  intel_suspend_runtime+0x5f/0x120 [soundwire_intel aca858f7c87048d3152a4a41bb68abb9b663a1dd]
    kernel:  ? dpm_sysfs_remove+0x60/0x60
    
    This was not detected earlier in Intel tests since the tests first
    remove the parent PCI device and shut down the bus. The sequence
    above is a corner case which keeps the bus operational but without a
    driver bound.
    
    While trying to solve this kernel oopses, it became clear that the
    existing SoundWire bus does not deal well with the unbind case.
    
    Commit 528be501 ("soundwire: sdw_slave: add probe_complete structure and new fields")
    added a 'probed' status variable and a 'probe_complete'
    struct completion. This status is however not reset on remove and
    likewise the 'probe complete' is not re-initialized, so the
    bind/unbind/bind test cases would fail. The timeout used before the
    'update_status' callback was also a bad idea in hindsight, there
    should really be no timing assumption as to if and when a driver is
    bound to a device.
    
    An initial draft was based on device_lock() and device_unlock() was
    tested. This proved too complicated, with deadlocks created during the
    suspend-resume sequences, which also use the same device_lock/unlock()
    as the bind/unbind sequences. On a CometLake device, a bad DSDT/BIOS
    caused spurious resumes and the use of device_lock() caused hangs
    during suspend. After multiple weeks or testing and painful
    reverse-engineering of deadlocks on different devices, we looked for
    alternatives that did not interfere with the device core.
    
    A bus notifier was used successfully to keep track of DRIVER_BOUND and
    DRIVER_UNBIND events. This solved the bind-unbind-bind case in tests,
    but it can still be defeated with a theoretical corner case where the
    memory is freed by a .remove while the callback is in use. The
    notifier only helps make sure the driver callbacks are valid, but not
    that the memory allocated in probe remains valid while the callbacks
    are invoked.
    
    This patch suggests the introduction of a new 'sdw_dev_lock' mutex
    protecting probe/remove and all driver callbacks. Since this mutex is
    'local' to SoundWire only, it does not interfere with existing locks
    and does not create deadlocks. In addition, this patch removes the
    'probe_complete' completion, instead we directly invoke the
    'update_status' from the probe routine. That removes any sort of
    timing dependency and a much better support for the device/driver
    model, the driver could be bound before the bus started, or eons after
    the bus started and the hardware would be properly initialized in all
    cases.
    
    BugLink: https://github.com/thesofproject/linux/issues/3531
    Fixes: 56d4fe31 ("soundwire: Add MIPI DisCo property helpers")
    Fixes: 528be501 ("soundwire: sdw_slave: add probe_complete structure and new fields")
    Signed-off-by: default avatarPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: default avatarRander Wang <rander.wang@intel.com>
    Reviewed-by: default avatarRanjani Sridharan <ranjani.sridharan@linux.intel.com>
    Reviewed-by: default avatarBard Liao <yung-chuan.liao@linux.intel.com>
    Reviewed-by: default avatarPéter Ujfalusi <peter.ujfalusi@linux.intel.com>
    Link: https://lore.kernel.org/r/20220621225641.221170-2-pierre-louis.bossart@linux.intel.comSigned-off-by: default avatarVinod Koul <vkoul@kernel.org>
    bd29c00e
bus.c 44.8 KB