    dmaengine: idxd: Let probe fail when workqueue cannot be enabled · b51b75f0
    Reinette Chatre authored
    The workqueue is enabled when the appropriate driver is loaded and
    disabled when the driver is removed. When the driver is removed it
    assumes that the workqueue was enabled successfully and proceeds to
    free allocations made during workqueue enabling.
    
    Failure during workqueue enabling does not prevent the driver from
    being loaded. This is because the error path within drv_enable_wq()
    returns success unless a second failure is encountered during the
    error path. By returning success it is possible to load the driver
    even if the workqueue cannot be enabled, and driver remove then
    attempts to free allocations that do not exist.
    
    Some examples of problematic flows:
    (a)
    
     idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
     In above flow, if idxd_wq_request_irq() fails then
     idxd_wq_unmap_portal() is called on error exit path, but
     drv_enable_wq() returns 0 because idxd_wq_disable() succeeds. The
     driver is thus loaded successfully.
    
     idxd_dmaengine_drv_remove()->drv_disable_wq()->idxd_wq_unmap_portal()
     Above flow on driver unload triggers the WARN in devm_iounmap() because
     the device resource has already been removed during error path of
     drv_enable_wq().
    
    (b)
    
     idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
     In above flow, if idxd_wq_request_irq() fails then
     idxd_wq_init_percpu_ref() is never called to initialize the percpu
     counter, yet the driver loads successfully because drv_enable_wq()
     returns 0.
    
     idxd_dmaengine_drv_remove()->__idxd_wq_quiesce()->percpu_ref_kill():
     Above flow on driver unload triggers a BUG when attempting to drop the
     initial ref of the uninitialized percpu ref:
     BUG: kernel NULL pointer dereference, address: 0000000000000010
    
    Fix the drv_enable_wq() error path by returning the original error that
    indicates failure of workqueue enabling. This ensures that the probe
    fails when an error is encountered and the driver remove paths are only
    attempted when the workqueue was enabled successfully.
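
    The error-path pattern described above can be sketched in user-space
    C. This is a simplified model under stated assumptions, not the
    actual drv_enable_wq() code: fake_request_irq() and fake_wq_disable()
    are hypothetical stand-ins for a failing setup step and a succeeding
    cleanup step, and the two enable functions only contrast returning
    the cleanup result with returning the original error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the kernel functions of similar name. */
static int fake_request_irq(void) { return -ENOMEM; } /* setup step always fails */
static int fake_wq_disable(void) { return 0; }        /* cleanup step succeeds */

/* Problematic shape: the cleanup result overwrites the original error. */
static int enable_wq_buggy(void)
{
	int rc;

	rc = fake_request_irq();
	if (rc < 0)
		goto err;
	return 0;

err:
	rc = fake_wq_disable();	/* original error code is lost here */
	if (rc < 0)
		fprintf(stderr, "cleanup failed: %d\n", rc);
	return rc;		/* 0 whenever cleanup succeeds */
}

/* Fixed shape: cleanup failure is only reported, original error is returned. */
static int enable_wq_fixed(void)
{
	int rc;

	rc = fake_request_irq();
	if (rc < 0)
		goto err;
	return 0;

err:
	if (fake_wq_disable() < 0)
		fprintf(stderr, "cleanup failed\n");
	return rc;		/* still the error from the failed setup step */
}
```

    In this model a caller of enable_wq_buggy() sees 0 and would treat
    the probe as successful, while a caller of enable_wq_fixed() sees
    -ENOMEM and can fail the probe, so the remove path never runs
    against a workqueue that was not enabled.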
    
    Fixes: 1f2bb403 ("dmaengine: idxd: move wq_enable() to device.c")
    Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/e8d8116e5efa0fd14fadc5adae6ffd319f0e5ff1.1670452419.git.reinette.chatre@intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
device.c 36 KB