1. 08 Feb, 2019 3 commits
    • Geert Uytterhoeven's avatar
      driver core: Postpone DMA tear-down until after devres release · 376991db
      Geert Uytterhoeven authored
      When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
      (R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
      device pass-through for virtualization:
      
          echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind
      
      the kernel crashes with:
      
          Unable to handle kernel paging request at virtual address ffffffbf029ffffc
          Mem abort info:
            ESR = 0x96000006
            Exception class = DABT (current EL), IL = 32 bits
            SET = 0, FnV = 0
            EA = 0, S1PTW = 0
          Data abort info:
            ISV = 0, ISS = 0x00000006
            CM = 0, WnR = 0
          swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
          [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
          Internal error: Oops: 96000006 [#1] SMP
          Modules linked in:
          CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
          Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
          pstate: 60400005 (nZCv daif +PAN -UAO)
          pc : __free_pages+0x8/0x58
          lr : __dma_direct_free_pages+0x50/0x5c
          sp : ffffff801268baa0
          x29: ffffff801268baa0 x28: 0000000000000000
          x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
          x25: ffffffc6f9c60810 x24: 0000000000000000
          x23: 00000000fffff000 x22: ffffff8012145000
          x21: 0000000000000800 x20: ffffffbf029fffc8
          x19: 0000000000000000 x18: ffffffc6f86c42c8
          x17: 0000000000000000 x16: 0000000000000070
          x15: 0000000000000003 x14: 0000000000000000
          x13: ffffff801103d7f8 x12: 0000000000000028
          x11: ffffff8011117604 x10: 0000000000009ad8
          x9 : ffffff80110126d0 x8 : ffffffc6f7563000
          x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
          x5 : ffffff8011cf3cc8 x4 : 0000000000004000
          x3 : 0000000000080000 x2 : 0000000000000001
          x1 : 0000000000000000 x0 : ffffffbf029fffc8
          Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
          Call trace:
           __free_pages+0x8/0x58
           __dma_direct_free_pages+0x50/0x5c
           arch_dma_free+0x1c/0x98
           dma_direct_free+0x14/0x24
           dma_free_attrs+0x9c/0xdc
           dmam_release+0x18/0x20
           release_nodes+0x25c/0x28c
           devres_release_all+0x48/0x4c
           device_release_driver_internal+0x184/0x1f0
           device_release_driver+0x14/0x1c
           unbind_store+0x70/0xb8
           drv_attr_store+0x24/0x34
           sysfs_kf_write+0x4c/0x64
           kernfs_fop_write+0x154/0x1c4
           __vfs_write+0x34/0x164
           vfs_write+0xb4/0x16c
           ksys_write+0x5c/0xbc
           __arm64_sys_write+0x14/0x1c
           el0_svc_common+0x98/0x114
           el0_svc_handler+0x1c/0x24
           el0_svc+0x8/0xc
          Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
          ---[ end trace 8c564cdd3a1a840f ]---
      
      While I've bisected this to commit e8e683ae ("iommu/of: Fix
      probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
      does fix the problem, this turned out to be a red herring.
      
      On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
      Hence if a driver has used a managed DMA allocation API, the allocated
      DMA memory will be freed using the direct DMA ops, while it may have
      been allocated using a custom DMA ops (iommu_dma_ops in this case).
      
      Fix this by reversing the order of the calls to devres_release_all() and
      arch_teardown_dma_ops().
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable <stable@vger.kernel.org>
      Reviewed-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      376991db
    • Rafael J. Wysocki's avatar
      driver core: Document limitation related to DL_FLAG_RPM_ACTIVE · 70fb9a25
      Rafael J. Wysocki authored
      If device_link_add() is called twice in a row to create a stateless
      device link for the same consumer-supplier pair without an attempt
      to delete the link between these calls, and the second caller passes
      DL_FLAG_RPM_ACTIVE to it in flags, calling either device_link_del()
      or device_link_remove() immediately after that will leave the link's
      supplier device with nonzero PM-runtime usage counter, which may
      prevent the supplier from being runtime-suspended going forward
      until the link is deleted by another invocation of device_link_del()
      or device_link_remove() for it.
      
      Even though this is confusing and may lead to subtle issues, trying
      to avoid it in the framework also may cause problems to appear, so
      document it as a known limitation.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70fb9a25
    • Rafael J. Wysocki's avatar
      PM-runtime: Take suppliers into account in __pm_runtime_set_status() · 4080ab08
      Rafael J. Wysocki authored
      If the target device has any suppliers, as reflected by device links
      to them, __pm_runtime_set_status() does not take them into account,
      which is not consistent with the other parts of the PM-runtime
      framework and may lead to programming mistakes.
      
      Modify __pm_runtime_set_status() to take suppliers into account by
      activating them upfront if the new status is RPM_ACTIVE and
      deactivating them on exit if the new status is RPM_SUSPENDED.
      
      If the activation of one of the suppliers fails, the new status
      will be RPM_SUSPENDED and the (remaining) suppliers will be
      deactivated on exit (the child count of the device's parent
      will be dropped too then).
      
      Of course, adding device links locking to __pm_runtime_set_status()
      means that it cannot be run fron interrupt context, so make it use
      spin_lock_irq() and spin_unlock_irq() instead of spin_lock_irqsave()
      and spin_unlock_irqrestore(), respectively.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4080ab08
  2. 03 Feb, 2019 1 commit
  3. 01 Feb, 2019 10 commits
    • Stephen Rothwell's avatar
      firmware: intel_stratix10_service: remove COMPILE_TEST · 7cbc2b42
      Stephen Rothwell authored
      This does not build yet ...
      
      Cc: Richard Gong <richard.gong@intel.com>
      Fixes: 095ff29d ("firmware: intel_stratix10_service: add hardware dependency")
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cbc2b42
    • Rafael J. Wysocki's avatar
      driver core: Add device link flag DL_FLAG_AUTOPROBE_CONSUMER · e7dd4010
      Rafael J. Wysocki authored
      Add a new device link flag, DL_FLAG_AUTOPROBE_CONSUMER, to request the
      driver core to probe for a consumer driver automatically after binding
      a driver to the supplier device on a persistent managed device link.
      
      As unbinding the supplier driver on a managed device link causes the
      consumer driver to be detached from its device automatically, this
      flag provides a complementary mechanism which is needed to address
      some "composite device" use cases.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7dd4010
    • Rafael J. Wysocki's avatar
      driver core: Make driver core own stateful device links · 72175d4e
      Rafael J. Wysocki authored
      Even though stateful device links are managed by the driver core in
      principle, their creators are allowed and sometimes even expected
      to drop references to them via device_link_del() or
      device_link_remove(), but that doesn't really play well with the
      "persistent" link concept.
      
      If "persistent" managed device links are created from driver
      probe callbacks, device_link_add() called to do that will take a
      new reference on the link each time the callback runs and those
      references will never be dropped, which kind of isn't nice.
      
      This issues arises because of the link reference counting carried
      out by device_link_add() for existing links, but that is only done to
      avoid deleting device links that may still be necessary, which
      shouldn't be a concern for managed (stateful) links.  These device
      links are managed by the driver core and whoever creates one of them
      will need it at least as long as until the consumer driver is detached
      from its device and deleting it may be left to the driver core just
      fine.
      
      For this reason, rework device_link_add() to apply the reference
      counting to stateless links only and make device_link_del() and
      device_link_remove() drop references to stateless links only too.
      After this change, if called to add a stateful device link for
      a consumer-supplier pair for which a stateful device link is
      present already, device_link_add() will return the existing link
      without incrementing its reference counter.  Accordingly,
      device_link_del() and device_link_remove() will WARN() and do
      nothing when called to drop a reference to a stateful link.  Thus,
      effectively, all stateful device links will be owned by the driver
      core.
      
      In addition, clean up the handling of the link management flags,
      DL_FLAG_AUTOREMOVE_CONSUMER and DL_FLAG_AUTOREMOVE_SUPPLIER, so that
      (a) they are never set at the same time and (b) if device_link_add()
      is called for a consumer-supplier pair with an existing stateful link
      between them, the flags of that link will be combined with the flags
      passed to device_link_add() to ensure that the life time of the link
      is sufficient for all of the callers of device_link_add() for the
      same consumer-supplier pair.
      
      Update the device_link_add() kerneldoc comment to reflect the
      above changes.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72175d4e
    • Rafael J. Wysocki's avatar
      IOMMU: Make dwo drivers use stateless device links · ea4f6400
      Rafael J. Wysocki authored
      The device links used by rockchip-iommu and exynos-iommu are
      completely managed by these drivers within the IOMMU framework,
      so there is no reason to involve the driver core in the management
      of these links.
      
      For this reason, make rockchip-iommu and exynos-iommu pass
      DL_FLAG_STATELESS in flags to device_link_add(), so that the device
      links used by them are stateless.
      
      [Note that this change is requisite for a subsequent one that will
       rework the management of stateful device links in the driver core
       and it will not be compatible with the two drivers in question any
       more.]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea4f6400
    • Rafael J. Wysocki's avatar
      driver core: Do not call rpm_put_suppliers() in pm_runtime_drop_link() · a1fdbfbb
      Rafael J. Wysocki authored
      Calling rpm_put_suppliers() from pm_runtime_drop_link() is excessive
      as it affects all suppliers of the consumer device and not just the
      one pointed to by the device link being dropped.  Worst case it may
      cause the consumer device to stop working unexpectedly.  Moreover, in
      principle it is racy with respect to runtime PM of the consumer
      device.
      
      To avoid these problems drop runtime PM references on the particular
      supplier pointed to by the link in question only and do that after
      the link has been dropped from the consumer device's list of links to
      suppliers, which is in device_link_free().
      
      Fixes: a0504aec ("PM / runtime: Drop usage count for suppliers at device link removal")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1fdbfbb
    • Rafael J. Wysocki's avatar
      driver core: Fix adding device links to probing suppliers · 15cfb094
      Rafael J. Wysocki authored
      Currently, it is not valid to add a device link from a consumer
      driver ->probe callback to a supplier that is still probing too, but
      generally this is a valid use case.  For example, if the consumer has
      just acquired a resource that can only be available if the supplier
      is functional, adding a device link to that supplier right away
      should be safe (and even desirable arguably), but device_link_add()
      doesn't handle that case correctly and the initial state of the link
      created by it is wrong then.
      
      To address this problem, change the initial state of device links
      added between a probing supplier and a probing consumer to
      DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to
      skip such links on the supplier side.
      
      With this change, if the supplier probe completes first,
      device_links_driver_bound() called for it will skip the link state
      update and when it is called for the consumer, the link state will
      be updated to "active".  In turn, if the consumer probe completes
      first, device_links_driver_bound() called for it will change the
      state of the link to "active" and when it is called for the
      supplier, the link status update will be skipped.
      
      However, in principle the supplier or consumer probe may still fail
      after the link has been added, so modify device_links_no_driver() to
      change device links in the "active" or "consumer probe" state to
      "dormant" on the supplier side and update __device_links_no_driver()
      to change the link state to "available" only if it is "consumer
      probe" or "active".
      
      Then, if the supplier probe fails first, the leftover link to the
      probing consumer will become "dormant" and device_links_no_driver()
      called for the consumer (when its probe fails) will clean it up.
      In turn, if the consumer probe fails first, it will either drop the
      link, or change its state to "available" and, in the latter case,
      when device_links_no_driver() is called for the supplier, it will
      update the link state to "dormant".  [If the supplier probe fails,
      but the consumer probe succeeds, which should not happen as long as
      the consumer driver is correct, the link still will be around, but
      it will be "dormant" until the supplier is probed again.]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      15cfb094
    • Rafael J. Wysocki's avatar
      driver core: Fix handling of runtime PM flags in device_link_add() · e2f3cd83
      Rafael J. Wysocki authored
      After commit ead18c23 ("driver core: Introduce device links
      reference counting"), if there is a link between the given supplier
      and the given consumer already, device_link_add() will refcount it
      and return it unconditionally without updating its flags.  It is
      possible, however, that the second (or any subsequent) caller of
      device_link_add() for the same consumer-supplier pair will pass
      DL_FLAG_PM_RUNTIME, possibly along with DL_FLAG_RPM_ACTIVE, in flags
      to it and the existing link may not behave as expected then.
      
      First, if DL_FLAG_PM_RUNTIME is not set in the existing link's flags
      at all, it needs to be set like during the original initialization of
      the link.
      
      Second, if DL_FLAG_RPM_ACTIVE is passed to device_link_add() in flags
      (in addition to DL_FLAG_PM_RUNTIME), the existing link should to be
      updated to reflect the "active" runtime PM configuration of the
      consumer-supplier pair and extra care must be taken here to avoid
      possible destructive races with runtime PM of the consumer.
      
      To that end, redefine the rpm_active field in struct device_link
      as a refcount, initialize it to 1 and make rpm_resume() (for the
      consumer) and device_link_add() increment it whenever they acquire
      a runtime PM reference on the supplier device.  Accordingly, make
      rpm_suspend() (for the consumer) and pm_runtime_clean_up_links()
      decrement it and drop runtime PM references to the supplier
      device in a loop until rpm_active becones 1 again.
      
      Fixes: ead18c23 ("driver core: Introduce device links reference counting")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2f3cd83
    • Rafael J. Wysocki's avatar
      driver core: Do not resume suppliers under device_links_write_lock() · 5db25c9e
      Rafael J. Wysocki authored
      It is incorrect to call pm_runtime_get_sync() under
      device_links_write_lock(), because it may end up trying to take
      device_links_read_lock() while resuming the target device and that
      will deadlock in the non-SRCU case, so avoid that by resuming the
      supplier device in device_link_add() before calling
      device_links_write_lock().
      
      Fixes: 21d5c57b ("PM / runtime: Use device links")
      Fixes: baa8809f ("PM / runtime: Optimize the use of device links")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5db25c9e
    • Rafael J. Wysocki's avatar
      driver core: Avoid careless re-use of existing device links · f265df55
      Rafael J. Wysocki authored
      After commit ead18c23 ("driver core: Introduce device links
      reference counting"), if there is a link between the given supplier
      and the given consumer already, device_link_add() will refcount it
      and return it unconditionally.  However, if the flags passed to
      it on the second (or any subsequent) attempt to create a device
      link between the same consumer-supplier pair are not compatible with
      the existing link's flags, that is incorrect.
      
      First off, if the existing link is stateless and the next caller of
      device_link_add() for the same consumer-supplier pair wants a
      stateful one, or the other way around, the existing link cannot be
      returned, because it will not match the expected behavior, so make
      device_link_add() dump the stack and return NULL in that case.
      
      Moreover, if the DL_FLAG_AUTOREMOVE_CONSUMER flag is passed to
      device_link_add(), its caller will expect its reference to the link
      to be dropped automatically on consumer driver removal, which will
      not happen if that flag is not set in the link's flags (and
      analogously for DL_FLAG_AUTOREMOVE_SUPPLIER).  For this reason, make
      device_link_add() update the existing link's flags accordingly
      before returning it to the caller.
      
      Fixes: ead18c23 ("driver core: Introduce device links reference counting")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f265df55
    • Rafael J. Wysocki's avatar
      driver core: Fix DL_FLAG_AUTOREMOVE_SUPPLIER device link flag handling · c8d50986
      Rafael J. Wysocki authored
      Change the list walk in device_links_driver_cleanup() to a safe one
      to avoid use-after-free when dropping a link from the list during the
      walk.
      
      Also, while at it, fix device_link_add() to refuse to create
      stateless device links with DL_FLAG_AUTOREMOVE_SUPPLIER set, which is
      an invalid combination (setting that flag means that the driver core
      should manage the link, so it cannot be stateless), and extend the
      kerneldoc comment of device_link_add() to cover the
      DL_FLAG_AUTOREMOVE_SUPPLIER flag properly too.
      
      Fixes: 1689cac5 ("driver core: Add flag to autoremove device link on supplier unbind")
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8d50986
  4. 31 Jan, 2019 12 commits
    • Sergei Shtylyov's avatar
      devres: always use dev_name() in devm_ioremap_resource() · 8d84b18f
      Sergei Shtylyov authored
      devm_ioremap_resource() prefers calling devm_request_mem_region() with a
      resource name instead of a device name -- this looks pretty iff a resource
      name isn't specified via a device tree with a "reg-names" property (in this
      case, a resource name is set to a device node's full name), but if it is,
      it doesn't really scale since these names are only unique to a given device
      node, not globally; so, looking at the output of 'cat /proc/iomem', you do
      not have an idea which memory region belongs to which device (see "dirmap",
      "regs", and "wbuf" lines below):
      
      08000000-0bffffff : dirmap
      48000000-bfffffff : System RAM
        48000000-48007fff : reserved
        48080000-48b0ffff : Kernel code
        48b10000-48b8ffff : reserved
        48b90000-48c7afff : Kernel data
        bc6a4000-bcbfffff : reserved
        bcc0f000-bebfffff : reserved
        bec0e000-bec0efff : reserved
        bec11000-bec11fff : reserved
        bec12000-bec14fff : reserved
        bec15000-bfffffff : reserved
      e6050000-e605004f : gpio@e6050000
      e6051000-e605104f : gpio@e6051000
      e6052000-e605204f : gpio@e6052000
      e6053000-e605304f : gpio@e6053000
      e6054000-e605404f : gpio@e6054000
      e6055000-e605504f : gpio@e6055000
      e6060000-e606050b : pin-controller@e6060000
      e6e60000-e6e6003f : e6e60000.serial
      e7400000-e7400fff : ethernet@e7400000
      ee200000-ee2001ff : regs
      ee208000-ee2080ff : wbuf
      
      I think that devm_request_mem_region() should be called with dev_name()
      despite the region names won't look as pretty as before (however, we gain
      more consistency with e.g. the serial driver:
      
      08000000-0bffffff : ee200000.rpc
      48000000-bfffffff : System RAM
        48000000-48007fff : reserved
        48080000-48b0ffff : Kernel code
        48b10000-48b8ffff : reserved
        48b90000-48c7afff : Kernel data
        bc6a4000-bcbfffff : reserved
        bcc0f000-bebfffff : reserved
        bec0e000-bec0efff : reserved
        bec11000-bec11fff : reserved
        bec12000-bec14fff : reserved
        bec15000-bfffffff : reserved
      e6050000-e605004f : e6050000.gpio
      e6051000-e605104f : e6051000.gpio
      e6052000-e605204f : e6052000.gpio
      e6053000-e605304f : e6053000.gpio
      e6054000-e605404f : e6054000.gpio
      e6055000-e605504f : e6055000.gpio
      e6060000-e606050b : e6060000.pin-controller
      e6e60000-e6e6003f : e6e60000.serial
      e7400000-e7400fff : e7400000.ethernet
      ee200000-ee2001ff : ee200000.rpc
      ee208000-ee2080ff : ee200000.rpc
      
      Fixes: 72f8c0bf ("lib: devres: add convenience function to remap a resource")
      Signed-off-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d84b18f
    • Mathieu Malaterre's avatar
      drivers: base: Use __printf markup to silence compiler · fa548d79
      Mathieu Malaterre authored
      Silence warnings (triggered at W=1) by adding relevant __printf
      attributes.
      
        drivers/base/cpu.c:432:2: warning: function '__cpu_device_create' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa548d79
    • Richard Gong's avatar
      firmware: intel_stratix10_service: add hardware dependency · 095ff29d
      Richard Gong authored
      Add a Kconfig dependency to ensure Intel Stratix10 service layer driver
      can be built only on the platform that supports it.
      Signed-off-by: default avatarRichard Gong <richard.gong@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      095ff29d
    • Alexander Duyck's avatar
      driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity · 57ea974f
      Alexander Duyck authored
      The current async_probe test code is only testing one device allocated
      prior to driver load and only loading one device afterwards. Instead of
      doing things this way it makes much more sense to load one device per CPU
      in order to actually stress the async infrastructure. By doing this we
      should see delays significantly increase in the event of devices being
      serialized.
      
      In addition I have updated the test to verify that we are trying to place
      the work on the correct NUMA node when we are running in async mode. By
      doing this we can verify the best possible outcome for device and driver
      load times.
      
      I have added a timeout value that is used to disable the sleep and instead
      cause the probe routine to report an error indicating it timed out. By
      doing this we limit the maximum runtime for the test to 20 seconds or less.
      
      The last major change in this set is that I have gone through and tuned it
      for handling the massive number of possible events that will be scheduled.
      Instead of reporting the sleep for each individual device it is moved to
      only being displayed if we enable debugging.
      
      With this patch applied below are what a failing test and a passing test
      should look like. I elided a few hundred lines in the failing test that
      were duplicated since the system I was testing on had a massive number of
      CPU cores:
      
      -- Failing --
      [  243.524697] test_async_driver_probe: registering first set of asynchronous devices...
      [  243.535625] test_async_driver_probe: registering asynchronous driver...
      [  243.543038] test_async_driver_probe: registration took 0 msecs
      [  243.549559] test_async_driver_probe: registering second set of asynchronous devices...
      [  243.568350] platform test_async_driver.447: registration took 9 msecs
      [  243.575544] test_async_driver_probe: registering first synchronous device...
      [  243.583454] test_async_driver_probe: registering synchronous driver...
      [  248.825920] test_async_driver_probe: registration took 5235 msecs
      [  248.825922] test_async_driver_probe: registering second synchronous device...
      [  248.825928] test_async_driver test_async_driver.443: NUMA node mismatch 3 != 1
      [  248.825932] test_async_driver test_async_driver.445: NUMA node mismatch 3 != 1
      [  248.825935] test_async_driver test_async_driver.446: NUMA node mismatch 3 != 1
      [  248.825939] test_async_driver test_async_driver.440: NUMA node mismatch 3 != 1
      [  248.825943] test_async_driver test_async_driver.441: NUMA node mismatch 3 != 1
      ...
      [  248.827150] test_async_driver test_async_driver.229: NUMA node mismatch 0 != 1
      [  248.827158] test_async_driver test_async_driver.228: NUMA node mismatch 0 != 1
      [  248.827220] test_async_driver test_async_driver.281: NUMA node mismatch 2 != 1
      [  248.827229] test_async_driver test_async_driver.282: NUMA node mismatch 2 != 1
      [  248.827240] test_async_driver test_async_driver.280: NUMA node mismatch 2 != 1
      [  253.945834] test_async_driver test_async_driver.1: NUMA node mismatch 0 != 1
      [  253.945878] test_sync_driver test_sync_driver.1: registration took 5119 msecs
      [  253.961693] test_async_driver_probe: async events still pending, forcing timeout and synchronize
      [  259.065839] test_async_driver test_async_driver.2: NUMA node mismatch 0 != 1
      [  259.073786] test_async_driver test_async_driver.3: async probe took too long
      [  259.081669] test_async_driver test_async_driver.3: NUMA node mismatch 0 != 1
      [  259.089569] test_async_driver test_async_driver.4: async probe took too long
      [  259.097451] test_async_driver test_async_driver.4: NUMA node mismatch 0 != 1
      [  259.105338] test_async_driver test_async_driver.5: async probe took too long
      [  259.113204] test_async_driver test_async_driver.5: NUMA node mismatch 0 != 1
      [  259.121089] test_async_driver test_async_driver.6: async probe took too long
      [  259.128961] test_async_driver test_async_driver.6: NUMA node mismatch 0 != 1
      [  259.136850] test_async_driver test_async_driver.7: async probe took too long
      ...
      [  262.124062] test_async_driver test_async_driver.221: async probe took too long
      [  262.132130] test_async_driver test_async_driver.221: NUMA node mismatch 3 != 1
      [  262.140206] test_async_driver test_async_driver.222: async probe took too long
      [  262.148277] test_async_driver test_async_driver.222: NUMA node mismatch 3 != 1
      [  262.156351] test_async_driver test_async_driver.223: async probe took too long
      [  262.164419] test_async_driver test_async_driver.223: NUMA node mismatch 3 != 1
      [  262.172630] test_async_driver_probe: Test failed with 222 errors and 336 warnings
      
      -- Passing --
      [  105.419247] test_async_driver_probe: registering first set of asynchronous devices...
      [  105.432040] test_async_driver_probe: registering asynchronous driver...
      [  105.439718] test_async_driver_probe: registration took 0 msecs
      [  105.446239] test_async_driver_probe: registering second set of asynchronous devices...
      [  105.477986] platform test_async_driver.447: registration took 22 msecs
      [  105.485276] test_async_driver_probe: registering first synchronous device...
      [  105.493169] test_async_driver_probe: registering synchronous driver...
      [  110.597981] test_async_driver_probe: registration took 5097 msecs
      [  110.604806] test_async_driver_probe: registering second synchronous device...
      [  115.707490] test_sync_driver test_sync_driver.1: registration took 5094 msecs
      [  115.715478] test_async_driver_probe: completed successfully
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ea974f
    • Alexander Duyck's avatar
      libnvdimm: Schedule device registration on node local to the device · af87b9a7
      Alexander Duyck authored
      Force the device registration for nvdimm devices to be closer to the actual
      device. This is achieved by using either the NUMA node ID of the region, or
      of the parent. By doing this we can have everything above the region based
      on the region, and everything below the region based on the nvdimm bus.
      
      By guaranteeing NUMA locality I see an improvement of as high as 25% for
      per-node init of a system with 12TB of persistent memory.
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af87b9a7
    • Alexander Duyck's avatar
      PM core: Use new async_schedule_dev command · 8b9ec6b7
      Alexander Duyck authored
      Use the device specific version of the async_schedule commands to defer
      various tasks related to power management. By doing this we should see a
      slight improvement in performance as any device that is sensitive to
      latency/locality in the setup will now be initializing on the node closest
      to the device.
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b9ec6b7
    • Alexander Duyck's avatar
      driver core: Attach devices on CPU local to device node · c37e20ea
      Alexander Duyck authored
      Call the asynchronous probe routines on a CPU local to the device node. By
      doing this we should be able to improve our initialization time
      significantly as we can avoid having to access the device from a remote
      node which may introduce higher latency.
      
      For example, in the case of initializing memory for NVDIMM this can have a
      significant impact as initialing 3TB on remote node can take up to 39
      seconds while initialing it on a local node only takes 23 seconds. It is
      situations like this where we will see the biggest improvement.
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c37e20ea
    • Alexander Duyck's avatar
      async: Add support for queueing on specific NUMA node · 6be9238e
      Alexander Duyck authored
      Introduce four new variants of the async_schedule_ functions that allow
      scheduling on a specific NUMA node.
      
      The first two functions are async_schedule_near and
      async_schedule_near_domain end up mapping to async_schedule and
      async_schedule_domain, but provide NUMA node specific functionality. They
      replace the original functions which were moved to inline function
      definitions that call the new functions while passing NUMA_NO_NODE.
      
      The second two functions are async_schedule_dev and
      async_schedule_dev_domain which provide NUMA specific functionality when
      passing a device as the data member and that device has a NUMA node other
      than NUMA_NO_NODE.
      
      The main motivation behind this is to address the need to be able to
      schedule device specific init work on specific NUMA nodes in order to
      improve performance of memory initialization.
      
      I have seen a significant improvement in initialziation time for persistent
      memory as a result of this approach. In the case of 3TB of memory on a
      single node the initialization time in the worst case went from 36s down to
      about 26s for a 10s improvement. As such the data shows a general benefit
      for affinitizing the async work to the node local to the device.
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6be9238e
    • Alexander Duyck's avatar
      workqueue: Provide queue_work_node to queue work near a given NUMA node · 8204e0c1
      Alexander Duyck authored
      Provide a new function, queue_work_node, which is meant to schedule work on
      a "random" CPU of the requested NUMA node. The main motivation for this is
      to help assist asynchronous init to better improve boot times for devices
      that are local to a specific node.
      
      For now we just default to the first CPU that is in the intersection of the
      cpumask of the node and the online cpumask. The only exception is if the
      CPU is local to the node we will just use the current CPU. This should work
      for our purposes as we are currently only using this for unbound work so
      the CPU will be translated to a node anyway instead of being directly used.
      
      As we are only using the first CPU to represent the NUMA node for now I am
      limiting the scope of the function so that it can only be used with unbound
      workqueues.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8204e0c1
    • Alexander Duyck's avatar
      driver core: Probe devices asynchronously instead of the driver · ef0ff683
      Alexander Duyck authored
      Probe devices asynchronously instead of the driver. This results in us
      seeing the same behavior if the device is registered before the driver or
      after. This way we can avoid serializing the initialization should the
      driver not be loaded until after the devices have already been added.
      
      The motivation behind this is that if we have a set of devices that
      take a significant amount of time to load we can greatly reduce the time to
      load by processing them in parallel instead of one at a time. In addition,
      each device can exist on a different node so placing a single thread on one
      CPU to initialize all of the devices for a given driver can result in poor
      performance on a system with multiple nodes.
      
      This approach can reduce the time needed to scan SCSI LUNs significantly.
      The only way to realize that speedup is by enabling more concurrency which
      is what is achieved with this patch.
      
      To achieve this it was necessary to add a new member "async_driver" to the
      device_private structure to store the driver pointer while we wait on the
      deferred probe call.
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef0ff683
    • Alexander Duyck's avatar
      device core: Consolidate locking and unlocking of parent and device · ed88747c
      Alexander Duyck authored
      Try to consolidate all of the locking and unlocking of both the parent and
      device when attaching or removing a driver from a given device.
      
      To do that I first consolidated the lock pattern into two functions
      __device_driver_lock and __device_driver_unlock. After doing that I then
      created functions specific to attaching and detaching the driver while
      acquiring these locks. By doing this I was able to reduce the number of
      spots where we touch need_parent_lock from 12 down to 4.
      
      This patch should produce no functional changes, it is meant to be a code
      clean-up/consolidation only.
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed88747c
    • Alexander Duyck's avatar
      driver core: Establish order of operations for device_add and device_del via bitflag · 3451a495
      Alexander Duyck authored
      Add an additional bit flag to the device_private struct named "dead".
      
      This additional flag provides a guarantee that when a device_del is
      executed on a given interface an async worker will not attempt to attach
      the driver following the earlier device_del call. Previously this
      guarantee was not present and could result in the device_del call
      attempting to remove a driver from an interface only to have the async
      worker attempt to probe the driver later when it finally completes the
      asynchronous probe call.
      
      One additional change added was that I pulled the check for dev->driver
      out of the __device_attach_driver call and instead placed it in the
      __device_attach_async_helper call. This was motivated by the fact that the
      only other caller of this, __device_attach, had already taken the
      device_lock() and checked for dev->driver. Instead of testing for this
      twice in this path it makes more sense to just consolidate the dev->dead
      and dev->driver checks together into one set of checks.
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3451a495
  5. 22 Jan, 2019 14 commits