1. 16 Jul, 2014 8 commits
    • amdkfd: Add kernel queue module · ed6e6a34
      Ben Goz authored
      The kernel queue module enables the amdkfd to establish kernel queues, not
      exposed to user space.
      
      The kernel queues are used for HIQ (HSA Interface Queue) and DIQ (Debug
      Interface Queue) operations.
      
      v3: Removed use of internal typedefs and added use of the new gart allocation
      functions
      
      v4: Fixed a miscalculation in kernel queue wrapping
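      The commit message does not say what the miscalculation was. As a general
      illustration of the wrap arithmetic a ring-style queue needs, here is a
      minimal sketch; the function names and the one-slot-reserved convention are
      assumptions for illustration, not the amdkfd implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free entries in a ring of `size` entries. One entry is always left unused
 * so that rptr == wptr unambiguously means "empty".
 * Assumes rptr < size and wptr < size.
 */
static inline uint32_t example_ring_free(uint32_t rptr, uint32_t wptr,
                                         uint32_t size)
{
    return (rptr + size - wptr - 1) % size;
}

/* Advance a ring pointer by n entries, wrapping back to 0 at `size`. */
static inline uint32_t example_ring_advance(uint32_t ptr, uint32_t n,
                                            uint32_t size)
{
    return (ptr + n) % size;
}
```

      Adding `size` before subtracting keeps the unsigned arithmetic from
      underflowing when the write pointer is ahead of the read pointer.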
      
      v5:
      
      Move amdkfd from drm/radeon/ to drm/amd/
      Change format of mqd structure to match latest KV firmware
      Add support for AQL queues creation to enable working with open-source HSA
      runtime
      Add define for kernel queue size
      Various fixes
      Signed-off-by: Ben Goz <ben.goz@amd.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add mqd_manager module · 6e99df57
      Ben Goz authored
      The mqd_manager module handles MQD data structures.
      MQD stands for Memory Queue Descriptor, which is used by the H/W to
      keep the usermode queue state in memory.
      
      v3:
      
      Removed new typedefs
      Removed pragma pack 4
      Remove cik_mqds.h file
      Changed lower_32/upper_32 calls to use linux macros
      Used new gart allocation functions
      Added documentation
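      To illustrate the lower_32/upper_32 item above: the kernel provides
      lower_32_bits() and upper_32_bits() for splitting a 64-bit value into two
      32-bit register-sized halves. The sketch below shows userspace equivalents;
      the struct and field names are hypothetical and do not reflect the real MQD
      layout.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t example_lower_32(uint64_t n)
{
    return (uint32_t)(n & 0xffffffffu);
}

static inline uint32_t example_upper_32(uint64_t n)
{
    return (uint32_t)(n >> 32);
}

/* Hypothetical descriptor holding a 64-bit address as two 32-bit fields. */
struct example_addr_pair {
    uint32_t base_lo;
    uint32_t base_hi;
};

static inline void example_set_addr(struct example_addr_pair *p, uint64_t addr)
{
    p->base_lo = example_lower_32(addr);
    p->base_hi = example_upper_32(addr);
}

static inline uint64_t example_get_addr(const struct example_addr_pair *p)
{
    return ((uint64_t)p->base_hi << 32) | p->base_lo;
}
```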
      
      v4:
      
      Added missing initialization of the addr field in init_mqd()
      
      Set the hqd persistent.preload_req bit so that when queues are switched
      on and off, their context is kept and read back from the MQD when the CP
      reassigns them. The context of the dispatched workload thus stays
      consistent, without any interrupts.
      
      v5:
      
      Move amdkfd from drm/radeon/ to drm/amd/
      Change format of mqd structure to match latest KV firmware
      Add support for AQL queues creation to enable working with open-source HSA
      runtime.
      Various fixes
      Signed-off-by: Ben Goz <ben.goz@amd.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add queue module · ed8aab45
      Ben Goz authored
      The queue module enables allocating and initializing queues uniformly.
      
      v3: Removed typedef and redundant memset call. Broke long pr_debug prints into
      one-liners and added documentation.
      
      v5: Move amdkfd from drm/radeon/ to drm/amd/
      Signed-off-by: Ben Goz <ben.goz@amd.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add binding/unbinding calls to amd_iommu driver · b17f068a
      Oded Gabbay authored
      This patch adds the functions to bind and unbind pasid
      from a device through the amd_iommu driver.
      
      The unbind function is called when the mm_struct of the
      process is released.
      
      The bind function is not called here because it is called
      only in the IOCTLs which are not yet implemented at this
      stage of the patchset.
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add basic modules to amdkfd · 19f6d2a6
      Oded Gabbay authored
      This patch adds the process module and three helper modules:
      
      - kfd_process, which handles processes that open /dev/kfd
      
      - kfd_doorbell, which provides helper functions for doorbell allocation,
        release and mapping to userspace
      
      - kfd_pasid, which provides helper functions for pasid allocation and release
      
      - kfd_aperture, which provides helper functions for managing the LDS, Local GPU
        memory and Scratch memory apertures of the process
      
      This patch only contains the basic kfd_process module, which doesn't contain
      the reference to the queue scheduler. This was done to allow easier code review.
      
      Also, this patch doesn't contain the calls to the IOMMU driver for binding the
      pasid to the device. Again, this was done to allow easier code review.
      
      The kfd_process object is created when a process opens /dev/kfd and is destroyed
      when the mm_struct of that process is torn down.
      
      v3:
      
      Removed kfd_vidmem.c file
      Replaced direct mmput call to mmu_notifier release
      Removed typedefs
      Moved bool field to end of the structure
      Added new kernel params for gart usage limitation
      Added initialization of sa manager
      Fixed debug messages
      Remove support for LDS in 32 bit
      Changed code to support mmap of doorbell pages from userspace
      Added documentation for apertures
      
      v4: Replaced RCU by SRCU for kfd_process list management
      
      v5:
      
      Move amdkfd from drm/radeon/ to drm/amd/
      Rename kfd_aperture.c to kfd_flat_memory.c
      Protect against multiple init calls
      MQD size is H/W dependent so moved it to device info structure
      Rename kfd_mem_obj structure's members
      Use delayed function for process tear-down
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add topology module to amdkfd · 5b5c4e40
      Evgeny Pinchuk authored
      This patch adds the topology module to the driver. The topology is exposed to
      userspace through the sysfs.
      
      The calls to add and remove a device to/from topology are done by the radeon
      driver.
      
      v3:
      
      The CPU information provided in the topology section of the amdkfd driver is
      extracted from the CRAT table, unlike the CPU information located in
      /sys/devices/system/cpu/cpu*, which is extracted from the SRAT table.
      
      While the CPU information provided by the CRAT and the SRAT tables might be
      identical, the node topology might be different. The SRAT table contains the
      topology of CPU nodes only. The CRAT table contains the topology of CPU and GPU
      nodes together (and can be interleaved). For example, CPU node 1 in SRAT can be
      CPU node 3 in CRAT. Furthermore, it is worth mentioning that the CRAT table
      contains only HSA-compatible nodes (nodes which are compliant with the HSA
      spec).
      
      To recap, amdkfd exposes a different kind of topology than the one exposed by
      /sys/devices/system/cpu/cpu even though it may contain similar information.
      
      v4:
      
      The topology module doesn't support uevent handling and doesn't notify the
      userspace about runtime modifications. It is up to the userspace to acquire
      snapshots of the topology information created by the amdkfd and exposed
      in sysfs.
      
      The following is an example of how the topology looks on a Kaveri A10-7850K
      system with amdkfd installed:
      
      /sys/devices/virtual/kfd/kfd/
      |
      --- topology/
            |
            |--- generation_id
            |--- system_properties
            |--- nodes/
                  |
                  |--- 0/
                       |
                       |--- gpu_id
                       |--- name
                       |--- properties
                       |--- caches/
                            |
                            |--- 0/
                                 |
                                 |--- properties
                            |--- 1/
                                 |
                                 |--- properties
                            |--- 2/
                                 |
                                 |--- properties
                       |--- io_links/
                            |
                       |--- mem_banks/
                            |
                            |--- 0/
                                 |
                                 |--- properties
                            |--- 1/
                                 |
                                 |--- properties
                            |--- 2/
                                 |
                                 |--- properties
                            |--- 3/
                                 |
                                 |--- properties
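      
      Because there is no uevent notification, userspace has to check for itself
      that a snapshot is consistent. One possible approach, sketched below, is to
      read generation_id before and after copying a properties file and retry if
      the value changed. That generation_id changes whenever the topology is
      modified is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned integer from a sysfs-style text file; 0 on success. */
static int example_read_ulong(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Copy the contents of prop_path into buf (NUL-terminated, len >= 1).
 * The copy is accepted only if the number in gen_path is the same before
 * and after the read; otherwise retry, up to max_tries times.
 */
static int example_snapshot(const char *gen_path, const char *prop_path,
                            char *buf, size_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned long before, after;
        FILE *f;
        size_t n;

        if (example_read_ulong(gen_path, &before))
            return -1;
        f = fopen(prop_path, "r");
        if (!f)
            return -1;
        n = fread(buf, 1, len - 1, f);
        fclose(f);
        buf[n] = '\0';
        if (example_read_ulong(gen_path, &after))
            return -1;
        if (before == after)
            return 0;   /* nothing changed while we were reading */
    }
    return -1;          /* topology kept changing; give up */
}
```

      With the layout shown above, gen_path would be
      /sys/devices/virtual/kfd/kfd/topology/generation_id and prop_path one of the
      properties files beneath topology/.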
      
      v5:
      
      Move amdkfd from drm/radeon/ to drm/amd/
      
      Add a check that the dev->gpu pointer is not null before accessing it in the
      node_show function in kfd_topology.c.
      This situation may occur when amdkfd is loaded and there is a GPU with a CRAT
      table, but that GPU isn't supported by amdkfd.
      Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add amdkfd skeleton driver · 4a488a7a
      Oded Gabbay authored
      This patch adds the amdkfd skeleton driver. The driver does nothing except
      define a /dev/kfd device.
      
      It returns -ENODEV on all amdkfd IOCTLs.
      
      v3: Move bool field to the end of structure, removed the pmc ioctls and added
      a meaningful error message for ioctl error.
      
      v5:
      
      Create a new folder drm/amd and move amdkfd from drm/radeon/ to drm/amd/
      Remove scheduler_class from kfd_priv.h as it was never used
      Add skeleton implementation of the Get Version IOCTL
      
      v6:
      Update module version to the correct number and remove the "default m" from the
      Kconfig file
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • amdkfd: Add IOCTL set definitions of amdkfd · b7facbae
      Oded Gabbay authored
      - KFD_IOC_GET_VERSION:
      	Retrieves the interface version of amdkfd
      
      - KFD_IOC_CREATE_QUEUE:
      	Creates a usermode queue that runs on a specific GPU device
      
      - KFD_IOC_DESTROY_QUEUE:
      	Destroys an existing usermode queue
      
      - KFD_IOC_SET_MEMORY_POLICY:
      	Sets the memory policy of the default and alternate aperture of the
              calling process
      
      - KFD_IOC_GET_CLOCK_COUNTERS:
      	Retrieves counters (timestamps) of CPU and GPU
      
      - KFD_IOC_GET_PROCESS_APERTURES:
      	Retrieves information about process apertures that were initialized
              during the open() call of the amdkfd device
      
      - KFD_IOC_UPDATE_QUEUE:
      	Updates configuration of an existing usermode queue
      
      v3: Remove pragma pack and pmc ioctls. Added parameter for doorbell offset and
      a comment on counters
      
      v5:
      
      Add define for AQL queues.
      Fix arguments of Get Version IOCTL
      Make the IOCTL structures the same size on 32-bit and 64-bit
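      
      A common way to keep an IOCTL argument structure the same size for 32-bit
      and 64-bit callers is to use only fixed-width integer types, carry user
      pointers as 64-bit integers, and pad explicitly to a multiple of 8 bytes.
      The structure below is a hypothetical illustration of that technique, not
      the actual amdkfd argument layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block; field names are illustrative only. */
struct example_create_queue_args {
    uint64_t ring_base_address; /* user pointer carried as a 64-bit integer */
    uint32_t ring_size;
    uint32_t gpu_id;
    uint32_t queue_id;          /* would be written back by the driver */
    uint32_t pad;               /* explicit padding: no ABI-dependent tail */
};

/* Layout checks that hold on both 32-bit and 64-bit builds. */
_Static_assert(sizeof(struct example_create_queue_args) == 24,
               "size must not depend on the ABI");
_Static_assert(offsetof(struct example_create_queue_args, ring_size) == 8,
               "64-bit field must not shift later members");

/* Convert between a native pointer and its 64-bit wire representation. */
static inline uint64_t example_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

static inline void *example_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

      Without the explicit pad member, a 32-bit x86 build would typically give
      the structure a size of 20 bytes while a 64-bit build would round it up
      to 24.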
      
      v6: Change the version of the amdkfd-thunk interface
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  2. 15 Jul, 2014 2 commits
    • Update MAINTAINERS and CREDITS files with amdkfd info · 16423d67
      Oded Gabbay authored
      v6: Update entries to reflect new name & location of driver
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • drm/radeon: Add radeon <--> amdkfd interface · e28740ec
      Oded Gabbay authored
      This patch adds the interface between the radeon driver and the amdkfd driver.
      The interface implementation is contained in radeon_kfd.c and radeon_kfd.h.
      
      The interface itself is represented by a pointer to struct
      kfd_dev. The pointer is located inside radeon_device structure.
      
      All the register accesses that amdkfd needs are done using this interface. This
      allows us to avoid direct register accesses in amdkfd proper, while also
      avoiding locking between amdkfd and radeon.
      
      The single exception is the doorbells that are used in both of the drivers.
      However, because they are located in separate pci bar pages, the danger of
      sharing registers between the drivers is minimal.
      
      Having said that, we are planning to move the doorbells as well to radeon.
      
      v3:
      
      Add interface for sa manager init and fini. The init function will allocate a
      buffer on system memory and pin it to the GART address space via the radeon sa
      manager.
      
      All mappings of buffers to GART address space are done via the radeon sa
      manager. The interface of allocate memory will use the radeon sa manager to sub
      allocate from the single buffer that was allocated during the init function.
      
      Change lower_32/upper_32 calls to use linux macros
      
      Add documentation for the interface
      
      v4:
      
      Change ptr field type in kgd_mem from uint32_t* to void* to match the type that
      is returned by radeon_sa_bo_cpu_addr
      
      v5:
      
      Change format of mqd structure to work with latest KV firmware
      Add support for AQL queues creation to enable working with open-source HSA
      runtime.
      Move generic kfd-->kgd interface and other generic kgd definitions to a generic
      header file that will be used by AMD's radeon and amdgpu drivers
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  3. 14 Jul, 2014 1 commit
  4. 28 Jan, 2014 1 commit
  5. 11 Feb, 2014 1 commit
  6. 16 Jan, 2014 1 commit
  7. 10 Nov, 2014 1 commit
    • iommu/amd: fix accounting of device_state · a015c1e9
      Oded Gabbay authored
      This patch fixes a bug in the accounting of the device_state.
      In the current code, the device_state was put (decremented) too many times,
      which sometimes led to the driver getting stuck permanently in
      put_device_state_wait(). That happened because the device_state->count would go
      below zero, which is never supposed to happen.
      
      The root cause is that the device_state was decremented in put_pasid_state()
      and put_pasid_state_wait() but also in all the functions that call those
      functions. Therefore, the device_state was decremented twice in each of these
      code paths.
      
      The fix is to decouple the device_state accounting from the pasid_state
      accounting: remove the call to put_device_state() from
      put_pasid_state() and put_pasid_state_wait().
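      
      The decoupling can be pictured with a simplified reference-count model. The
      sketch below is not the driver's code: the types and function names are
      hypothetical, and plain integers stand in for the real atomic counters.

```c
#include <assert.h>

struct example_device_state {
    int count;
};

struct example_pasid_state {
    int count;
    struct example_device_state *dev;
};

static void example_get_device_state(struct example_device_state *d)
{
    d->count++;
}

static void example_put_device_state(struct example_device_state *d)
{
    d->count--;
}

static void example_get_pasid_state(struct example_pasid_state *p)
{
    p->count++;
}

/* After the fix: dropping a pasid reference leaves the device count alone. */
static void example_put_pasid_state(struct example_pasid_state *p)
{
    p->count--;
}

/* A caller pairs each get with exactly one put on the same object. */
static void example_code_path(struct example_pasid_state *p)
{
    example_get_device_state(p->dev);
    example_get_pasid_state(p);
    /* ... work that needs both objects alive ... */
    example_put_pasid_state(p);
    example_put_device_state(p->dev);
}
```

      If example_put_pasid_state() also decremented p->dev->count, every pass
      through example_code_path() would leave the device count one lower than it
      started, which is the kind of imbalance the commit describes.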
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  8. 13 Nov, 2014 4 commits
    • iommu/amd: use new invalidate_range mmu-notifier · e7cc3dd4
      Joerg Roedel authored
      Make use of the new invalidate_range mmu_notifier call-back and remove the
      old logic of assigning an empty page-table between invalidate_range_start
      and invalidate_range_end.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Tested-by: Oded Gabbay <oded.gabbay@amd.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Jay Cornwall <Jay.Cornwall@amd.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • mmu_notifier: add the callback for mmu_notifier_invalidate_range() · 0f0a327f
      Joerg Roedel authored
      Now that the mmu_notifier_invalidate_range() calls are in place, add the
      callback to allow subsystems to register against it.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Jay Cornwall <Jay.Cornwall@amd.com>
      Cc: Oded Gabbay <Oded.Gabbay@amd.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • mmu_notifier: call mmu_notifier_invalidate_range() from VMM · 34ee645e
      Joerg Roedel authored
      Add calls to the new mmu_notifier_invalidate_range() function to all
      places in the VMM that need it.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Jay Cornwall <Jay.Cornwall@amd.com>
      Cc: Oded Gabbay <Oded.Gabbay@amd.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
    • mmu_notifier: add mmu_notifier_invalidate_range() · 1897bdc4
      Joerg Roedel authored
      This notifier closes an important gap in the current mmu_notifier
      implementation: the existing callbacks are called too early or too late to
      reliably manage a non-CPU TLB.  Specifically, invalidate_range_start() is
      called when all pages are still mapped and invalidate_range_end() when all
      pages are unmapped and potentially freed.
      
      This is fine when the users of the mmu_notifiers manage their own SoftTLB,
      like KVM does.  When the TLB is managed in software it is easy to wipe out
      entries for a given range and prevent new entries from being established until
      invalidate_range_end is called.
      
      But when the user of mmu_notifiers has to manage a hardware TLB it can
      still wipe out TLB entries in invalidate_range_start, but it can't make
      sure that no new TLB entries in the given range are established between
      invalidate_range_start and invalidate_range_end.
      
      To avoid silent data corruption the entries in the non-CPU TLB need to be
      flushed when the pages are unmapped (at this point in time no _new_ TLB
      entries can be established in the non-CPU TLB) but not yet freed (as the
      non-CPU TLB may still have _existing_ entries pointing to the pages about
      to be freed).
      
      To fix this problem we need to catch the moment when the Linux VMM flushes
      remote TLBs (as a non-CPU TLB is not very different from a remote CPU TLB),
      as this is the point in time when the pages are unmapped but _not_ yet freed.
      
      The mmu_notifier_invalidate_range() function aims to catch that moment.
      
      IOMMU code will be one user of the notifier-callback.  Currently this is
      only the AMD IOMMUv2 driver, but its code is about to be more generalized
      and converted to a generic IOMMU-API extension to fit the needs of similar
      functionality in other IOMMUs as well.
      
      The current attempt in the AMD IOMMUv2 driver to work around the
      invalidate_range_start/end() shortcoming is to assign an empty page table
      to the non-CPU TLB between any invalidate_range_start/end calls.  With the
      empty page-table assigned, every page-table walk to re-fill the non-CPU
      TLB will cause a page-fault reported to the IOMMU driver via an interrupt,
      possibly causing interrupt storms.
      
      The page-fault handler in the AMD IOMMUv2 driver doesn't handle the fault
      if an invalidate_range_start/end pair is active; it just reports back
      SUCCESS to the device and lets it refault the page.  But existing hardware
      (newer Radeon GPUs) that makes use of this feature doesn't re-fault
      indefinitely: after a certain number of faults for the same address the
      device enters a failure state and needs to be reset.
      
      To avoid the GPUs entering a failure state we need to get rid of the
      empty-page-table workaround and use the mmu_notifier_invalidate_range()
      function introduced with this patch.
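      
      The ordering argument above can be illustrated with a toy model of one page
      and one device TLB entry. This is only a sketch of the reasoning, not kernel
      code: the types and functions are hypothetical, and the "device access" is
      simulated at the points where a race could occur.

```c
#include <assert.h>
#include <stdbool.h>

struct example_page {
    bool mapped;   /* the CPU page table still points at the page */
    bool freed;    /* the page has been handed back to the allocator */
    bool dev_tlb;  /* the device TLB holds a translation for the page */
};

/* A device access can refill its TLB only while the page is still mapped. */
static void example_device_access(struct example_page *pg)
{
    if (!pg->dev_tlb && pg->mapped)
        pg->dev_tlb = true;
}

static void example_flush_dev_tlb(struct example_page *pg)
{
    pg->dev_tlb = false;
}

/*
 * Tear down a mapping. With flush_after_unmap == false the device TLB is
 * flushed before the unmap (the invalidate_range_start() moment); with true
 * it is flushed after the unmap but before the free (the moment
 * mmu_notifier_invalidate_range() aims to catch).
 * Returns true if a stale translation to the freed page survives.
 */
static bool example_teardown(struct example_page *pg, bool flush_after_unmap)
{
    if (!flush_after_unmap)
        example_flush_dev_tlb(pg);
    example_device_access(pg);      /* device races in while still mapped */
    pg->mapped = false;             /* page-table entry cleared */
    if (flush_after_unmap)
        example_flush_dev_tlb(pg);
    example_device_access(pg);      /* no refill possible any more */
    pg->freed = true;
    return pg->freed && pg->dev_tlb;
}
```

      In this model, flushing early lets the simulated device re-establish the
      entry before the unmap, so the translation outlives the page; flushing
      between unmap and free leaves nothing behind.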
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Jay Cornwall <Jay.Cornwall@amd.com>
      Cc: Oded Gabbay <Oded.Gabbay@amd.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  9. 12 Nov, 2014 21 commits