1. 21 Jul, 2021 15 commits
  2. 20 Jul, 2021 4 commits
      dmaengine: idxd: fix sequence for pci driver remove() and shutdown() · 49c4959f
      Dave Jiang authored
      The ->shutdown() call should only be responsible for quiescing the device.
      Currently it also does PCI device teardown. This causes issues when things
      like the MMIO mapping are removed while idxd_unregister_devices() triggers
      removal of the idxd device sub-driver, which still initiates MMIO writes to
      the device. Another issue is that unregistering the idxd 'struct device'
      frees the memory context, so the teardown calls access freed memory and
      can cause a kernel oops. Move all the teardown bits that don't belong in
      shutdown to the ->remove() call. Move unregistering of the idxd conf_dev
      'struct device' to after all the teardown is done, to free all the memory
      that's no longer needed.
      
      Fixes: 47c16ac2 ("dmaengine: idxd: fix idxd conf_dev 'struct device' lifetime")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/162629983901.395844.17964803190905549615.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
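The ordering described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the driver's code: `struct toy_dev`, `toy_shutdown()`, `toy_remove()` and the `bad_accesses` counter are all hypothetical names invented for illustration. It shows shutdown limited to quiescing, and remove doing the teardown while the MMIO mapping and memory context are still valid, with the conf_dev unregistration last.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not from the idxd driver. */
struct toy_dev {
    bool quiesced;            /* device has been told to stop */
    bool mmio_mapped;         /* MMIO mapping is still valid */
    bool conf_dev_registered; /* memory context is still alive */
    int bad_accesses;         /* counts MMIO or memory use after teardown */
};

/* Any teardown step that touches the device needs both resources alive. */
static void toy_touch_device(struct toy_dev *d)
{
    if (!d->mmio_mapped || !d->conf_dev_registered)
        d->bad_accesses++;
}

/* shutdown: quiesce only, no unmapping and no freeing. */
static void toy_shutdown(struct toy_dev *d)
{
    toy_touch_device(d);
    d->quiesced = true;
}

/* remove: sub-driver removal first, then quiesce, then unmap, unregister last. */
static void toy_remove(struct toy_dev *d)
{
    toy_touch_device(d);            /* sub-driver removal still writes MMIO */
    toy_shutdown(d);
    d->mmio_mapped = false;         /* PCI teardown belongs in remove */
    d->conf_dev_registered = false; /* last step: memory context goes away */
}
```

Calling `toy_remove()` on a freshly initialised `struct toy_dev` leaves `bad_accesses` at zero. Moving either of the last two assignments ahead of the `toy_touch_device()` calls would make the counter non-zero, which corresponds to the class of bug the commit describes.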
      dmaengine: idxd: fix submission race window · 6b4b87f2
      Dave Jiang authored
      Konstantin observed that when descriptors are submitted, the descriptor is
      added to the pending list after the submission. This creates a race window
      with the slight possibility that the descriptor can complete before it
      gets added to the pending list and this window would cause the completion
      handler to miss processing the descriptor.
      
      To address the issue, the addition of the descriptor to the pending list
      must be done before it gets submitted to the hardware. However, submitting
      to the swq with the ENQCMDS instruction can fail when either the wq is
      full or the wq is not "active".
      
      With the descriptor allocation being the gate to the wq capacity, it is not
      possible to hit a retry with ENQCMDS submission to the swq. The only
      possible failure is when the wq is no longer "active" due to a hw
      error and therefore we are moving towards taking down the portal. Given
      this is a rare condition and there's no longer concern over I/O
      performance, the driver can walk the completion lists in order to retrieve
      and abort the descriptor.
      
      The error path will set the descriptor to aborted status. It will take the
      work list lock to prevent further processing of worklist. It will do a
      delete_all on the pending llist to retrieve all descriptors on the pending
      llist. The delete_all action does not require a lock. It will walk through
      the acquired llist to find the aborted descriptor while adding all remaining
      descriptors to the work list since it holds the lock. If it does not find
      the aborted descriptor on the llist, it will walk through the work
      list. And if it still does not find the descriptor, then it means the
      interrupt handler has removed the desc from the llist but is pending on
      the work list lock and will process it once the error path releases the
      lock.
      
      Fixes: eb15e715 ("dmaengine: idxd: add interrupt handle request and release support")
      Reported-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/162628855747.360485.10101925573082466530.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
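The submission order and error path described in the commit message above can be sketched as a single-threaded toy model. This is a hedged illustration, not the driver's implementation: `struct irq_entry`, `submit_desc()`, `abort_desc()`, the `hw_accepts` flag (standing in for ENQCMDS success) and the plain linked lists (standing in for the kernel llist and the locked work list) are all assumptions made for the example. Real locking and concurrency are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; all names are hypothetical and no real locking is performed. */
enum desc_status { DESC_PENDING, DESC_ABORTED };

struct desc {
    int id;
    enum desc_status status;
    struct desc *next;
};

struct irq_entry {
    struct desc *pending; /* stands in for the lock-less pending llist */
    struct desc *work;    /* stands in for the lock-protected work list */
    bool work_locked;     /* stands in for the work list lock */
};

static void list_push(struct desc **head, struct desc *d)
{
    d->next = *head;
    *head = d;
}

static size_t list_len(const struct desc *head)
{
    size_t n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}

/* Unlink target from the list if present; report whether it was found. */
static bool list_remove(struct desc **head, struct desc *target)
{
    for (struct desc **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            target->next = NULL;
            return true;
        }
    }
    return false;
}

/* Error path: mark aborted, drain pending, then fall back to the work list. */
static bool abort_desc(struct irq_entry *ie, struct desc *target)
{
    bool found = false;

    target->status = DESC_ABORTED;
    ie->work_locked = true;             /* block further work list processing */

    struct desc *d = ie->pending;       /* take everything, like a delete_all */
    ie->pending = NULL;
    while (d) {
        struct desc *next = d->next;
        if (d == target) {
            d->next = NULL;
            found = true;
        } else {
            list_push(&ie->work, d);    /* safe: the work list lock is held */
        }
        d = next;
    }
    if (!found)
        found = list_remove(&ie->work, target);

    ie->work_locked = false;
    /* false means the completion handler already owns the descriptor */
    return found;
}

/* Add to the pending list before handing the descriptor to the hardware. */
static int submit_desc(struct irq_entry *ie, struct desc *d, bool hw_accepts)
{
    d->status = DESC_PENDING;
    list_push(&ie->pending, d);
    if (!hw_accepts) {                  /* wq no longer "active" */
        abort_desc(ie, d);
        return -1;
    }
    return 0;
}
```

In this model, two successful submissions followed by a failed one leave the failed descriptor marked aborted and unlinked, while the two earlier descriptors are moved from the pending list to the work list, matching the walk described in the message.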
      dmaengine: idxd: fix sequence for pci driver remove() and shutdown() · 7eb25da1
      Dave Jiang authored
      The ->shutdown() call should only be responsible for quiescing the device.
      Currently it also does PCI device teardown. This causes issues when things
      like the MMIO mapping are removed while idxd_unregister_devices() triggers
      removal of the idxd device sub-driver, which still initiates MMIO writes to
      the device. Another issue is that unregistering the idxd 'struct device'
      frees the memory context, so the teardown calls access freed memory and
      can cause a kernel oops. Move all the teardown bits that don't belong in
      shutdown to the ->remove() call. Move unregistering of the idxd conf_dev
      'struct device' to after all the teardown is done, to free all the memory
      that's no longer needed.
      
      Fixes: 47c16ac2 ("dmaengine: idxd: fix idxd conf_dev 'struct device' lifetime")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/162629983901.395844.17964803190905549615.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      dmaengine: idxd: fix desc->vector that isn't being updated · 8ba89a3c
      Dave Jiang authored
      desc->vector is not updated when the wq vector gets updated. This
      causes desc->vector to always be 0.
      
      Fixes: da435aed ("dmaengine: idxd: fix array index when int_handles are being used")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Link: https://lore.kernel.org/r/162628784374.353761.4736602409627820431.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
  3. 15 Jul, 2021 2 commits
  4. 14 Jul, 2021 12 commits
  5. 11 Jul, 2021 7 commits