Commits · cd05b51aaa6ea6f7e4c22802e3f2703ac6087912 · Kirill Smelkov / linux

03 Dec, 2019 12 commits

drm/amdgpu: should stop GFX ring in hw_fini · cd05b51a

Monk Liu authored Nov 29, 2019

To align with the scheme from gfx9

disabling GFX ring after VM shutdown could avoid
garbage data be fetched to GFX RB which may lead
to unnecessary screw up on GFX
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cd05b51a

drm/amdgpu: do autoload right after MEC loaded for SRIOV VF · dacf56e4

Monk Liu authored Nov 26, 2019

since we don't have RLCG ucode loading and no SRlist as well
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dacf56e4

drm/amdgpu: skip rlc ucode loading for SRIOV gfx10 · 6294017f

Monk Liu authored Nov 26, 2019

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6294017f

drm/amdgpu: fix calltrace during kmd unload(v3) · 747d4f71

Monk Liu authored Nov 26, 2019

issue:
kernel would report a warning from a double unpin
during the driver unloading on the CSB bo

why:
we unpin it during hw_fini, and there will be another
unpin in sw_fini on CSB bo.

fix:
actually we don't need to pin/unpin it during
hw_init/fini since it is created with kernel pinned,
we only need to fullfill the CSB again during hw_init
to prevent CSB/VRAM lost after S3

v2:
get_csb in init_rlc so hw_init() will make CSIB content
back even after reset or s3

v3:
use bo_create_kernel instead of bo_create_reserved for CSB
otherwise the bo_free_kernel() on CSB is not aligned and
would lead to its internal reserve pending there forever

take care of gfx7/8 as well
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

747d4f71

drm/amdgpu/gfx10: unlock srbm_mutex after queue programming finish · fa2b93e3

Xiaojie Yuan authored Nov 06, 2019

srbm_mutex is to guarantee atomicity for r/w of gfx indexed registers
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fa2b93e3

drm/amdgpu: Added ASIC specific checks in gfxhub V1.1 get XGMI info · f0312f45

John Clements authored Dec 02, 2019

Added max hive/node info checks per supported ASIC
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f0312f45

drm/amdgpu/powerplay: unify smu send message function · 76d8f83b

Likun Gao authored Dec 02, 2019

Drop smu_send_smc_msg function from ASIC specify structure.
Reuse smu_send_smc_msg_with_param function for smu_send_smc_msg.
Set paramer to 0 for smu_send_msg function, otherwise it will send
with previous paramer value (Not a certain value).
Materialize msg type for smu send message function definition.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

76d8f83b

drm/amd/display: re-enable wait in pipelock, but add timeout · 627f75d1

Alex Deucher authored Nov 15, 2019

Removing this causes hangs in some games, so re-add it, but add
a timeout so we don't hang while switching flip types.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205169
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=112266Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

627f75d1

drm/amd/display: Get NV14 specific ip params as needed · 30c51773

Zhan liu authored Dec 02, 2019

[Why]
NV14 is using its own ip params that's different from other
DCN2.0 ASICs.

[How]
Add ASIC revision check to make sure NV14 gets correct
ip params.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

30c51773

drm/amd/display: Adding NV14 IP Parameters · 516fb68d

Zhan liu authored Dec 02, 2019

[Why]
NV14 IP Parameters are missing.

[How]
Add IP Parameters in.
Signed-off-by: Zhan liu <zhan.liu@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

516fb68d

drm/amd/display: Include num_vmid and num_dsc within NV14's resource caps · c3d03c5a

Zhan Liu authored Nov 28, 2019

[Why]
"num_vmid" and "num_dsc" are missing within NV14's resource caps structure.

[How]
Add the missing parts.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c3d03c5a

drm/amdgpu: use CPU to flush vmhub if sched stopped · e2195f7d

Monk Liu authored Nov 26, 2019

otherwse the flush_gpu_tlb will hang if we unload the
KMD becuase the schedulers already stopped
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e2195f7d

26 Nov, 2019 6 commits

amdgpu: Enable KFD on POWER systems · c38402fe

Timothy Pearson authored Nov 24, 2019

KFD has been verified to function on POWER systems (Talos II / Vega 64).
It should be available as a kernel configuration option on these systems.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c38402fe

drm/amdgpu: Optimize KFD page table reservation · 29a39c90

Felix Kuehling authored Jul 15, 2019

Be less pessimistic about estimated page table use for KFD. Most
allocations use 2MB pages and therefore need less VRAM for page
tables. This allows more VRAM to be used for applications especially
on large systems with many GPUs and hundreds of GB of system memory.

Example: 8 GPUs with 32GB VRAM each + 256GB system memory = 512GB
Old page table reservation per GPU:  1GB
New page table reservation per GPU: 32MB
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29a39c90

drm/amdgpu: flag vram lost on baco reset for VI/CIK · dea8b900

Alex Deucher authored Nov 25, 2019

VI/CIK BACO was inflight when this fix landed for SOC15/NV.
Add the fix to VI/CIK as well.
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dea8b900

MAINTAINERS: Drop Rex Zhu for amdgpu powerplay · a0c2a84d

Alex Deucher authored Nov 22, 2019

No longer works on the driver.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a0c2a84d

drm/amdgpu: Resolved offchip EEPROM I/O issue · 5985ebbe

John Clements authored Nov 25, 2019

Updated target I2C address
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5985ebbe

drm/amd/display: add default clocks if not able to fetch them · 94662169

Alex Deucher authored Nov 19, 2019

dm_pp_get_clock_levels_by_type needs to add the default clocks
to the powerplay case as well. This was accidently dropped.

Fixes: b3ea88fe ("drm/amd/powerplay: add get_clock_by_type interface for display")
Bug: https://gitlab.freedesktop.org/drm/amd/issues/906Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

94662169

25 Nov, 2019 2 commits

Merge tag 'drm-next-5.5-2019-11-22' of git://people.freedesktop.org/~agd5f/linux into drm-next · acc61b89

Dave Airlie authored Nov 26, 2019

drm-next-5.5-2019-11-22:

amdgpu:
- Fix bad DMA on some PPC platforms
- MMHUB fix for powergating
- BACO fix for Navi
- Misc raven fixes
- Enable vbios fetch directly from rom on navi
- debugfs fix for DC
- SR-IOV fixes for arcturus
- Misc power fixes

radeon:
- Fix bad DMA on some PPC platforms
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122203025.3787-1-alexander.deucher@amd.com

acc61b89

Merge tag 'drm-intel-next-fixes-2019-11-22' of... · e639ea0f

Dave Airlie authored Nov 26, 2019

Merge tag 'drm-intel-next-fixes-2019-11-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

- Reverts a patch to avoid spinning forever when context's timeline
  is active but has no requests
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122155523.GA20167@jlahtine-desk.ger.corp.intel.com

e639ea0f

22 Nov, 2019 20 commits

drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10 · f920d1bb

changzhu authored Nov 19, 2019

It may lose gpuvm invalidate acknowldege state across power-gating off
cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire
before invalidation and semaphore release after invalidation.

After adding semaphore acquire before invalidation, the semaphore
register become read-only if another process try to acquire semaphore.
Then it will not be able to release this semaphore. Then it may cause
deadlock problem. If this deadlock problem happens, it needs a semaphore
firmware fix.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

f920d1bb

drm/amdgpu: initialize vm_inv_eng0_sem for gfxhub and mmhub · 6c2c8972

changzhu authored Nov 19, 2019

SW must acquire/release one of the vm_invalidate_eng*_sem around the
invalidation req/ack. Through this way,it can avoid losing invalidate
acknowledge state across power-gating off cycle.
To use vm_invalidate_eng*_sem, it needs to initialize
vm_invalidate_eng*_sem firstly.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

6c2c8972

drm/amd/amdgpu/sriov skip RLCG s/r list for arcturus VF. · 1b34de7c

Jack Zhang authored Nov 21, 2019

After rlcg fw 2.1, kmd driver starts to load extra fw for
LIST_CNTL,GPM_MEM,SRM_MEM. We needs to skip the three fw
because all rlcg related fw have been loaded by host driver.
Guest driver would load the three fw fail without this change.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1b34de7c

drm/amd/amdgpu/sriov temporarily skip ras,dtm,hdcp for arcturus VF · ef1c0cbc

Jack Zhang authored Nov 21, 2019

Temporarily skip ras,dtm,hdcp initialize and terminate for arcturus VF
Currently the three features haven't been enabled at SRIOV, it would
trigger guest driver load fail with the bare-metal path of the three
features.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ef1c0cbc

drm/amdgpu/gfx10: re-init clear state buffer after gpu reset · 210b3b3c

Xiaojie Yuan authored Nov 20, 2019

This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.

clear state buffer (resides in vram) is corrupted after 1st baco reset,
upon gfxoff exit, CPF gets garbage header in CSIB and hangs.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

210b3b3c

merge fix for "ftrace: Rework event_create_dir()" · a3511321

Stephen Rothwell authored Nov 21, 2019

Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a3511321

drm/amdgpu: Update Arcturus golden registers · 57fb0ab2

Jay Cornwall authored Nov 20, 2019

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

57fb0ab2

drm/amdgpu/gfx10: fix out-of-bound mqd_backup array access · 908a28be

Xiaojie Yuan authored Nov 22, 2019

Fixes: 0900a9ef ("drm/amdgpu/gfx10: fix mqd backup/restore for gfx rings (v2)")
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

908a28be

drm/amdgpu/gfx10: explicitly wait for cp idle after halt/unhalt · 1e902a6d

Xiaojie Yuan authored Nov 14, 2019

50us is not enough to wait for cp ready after gpu reset on some navi asics.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

1e902a6d

Revert "drm/amd/display: enable S/G for RAVEN chip" · 5e18d2b1

Alex Deucher authored Nov 15, 2019

This reverts commit 1c425915.

S/G display is not stable with the IOMMU enabled on some
platforms.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205523Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5e18d2b1

drm/amdgpu: disable gfxoff on original raven · 8fc41344

Alex Deucher authored Nov 15, 2019

There are still combinations of sbios and firmware that
are not stable.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204689Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8fc41344

drm/amdgpu: remove experimental flag for Navi14 · 5355d7e0

Alex Deucher authored Nov 15, 2019

5.4 and newer works fine with navi14.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5355d7e0

drm/amdgpu: disable gfxoff when using register read interface · 70f7eb63

Alex Deucher authored Nov 14, 2019

When gfxoff is enabled, accessing gfx registers via MMIO
can lead to a hang.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205497Acked-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

70f7eb63

drm/amdgpu/powerplay: properly set PP_GFXOFF_MASK (v2) · dda0f455

Alex Deucher authored Nov 13, 2019

So that the setting reflects what the hw supports. This will
be used in a subsequent patch so needs to be correct.

v2: squash in fix from Colin Ian King

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205497Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dda0f455

drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2 · 3d0e3ce5

Sam Bobroff authored Nov 18, 2019

The INTERRUPT_CNTL2 register expects a valid DMA address, but is
currently set with a GPU MC address.  This can cause problems on
systems that detect the resulting DMA read from an invalid address
(found on a Power8 guest).

Instead, use the DMA address of the dummy page because it will always
be safe.

Fixes: 27ae1064 ("drm/amdgpu: add interupt handler implementation for si v3")
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3d0e3ce5

drm/radeon: fix bad DMA from INTERRUPT_CNTL2 · 62d91dd2

Sam Bobroff authored Nov 18, 2019

The INTERRUPT_CNTL2 register expects a valid DMA address, but is
currently set with a GPU MC address.  This can cause problems on
systems that detect the resulting DMA read from an invalid address
(found on a Power8 guest).

Instead, use the DMA address of the dummy page because it will always
be safe.

Fixes: d8f60cfc ("drm/radeon/kms: Add support for interrupts on r6xx/r7xx chips (v3)")
Fixes: 25a857fb ("drm/radeon/kms: add support for interrupts on SI")
Fixes: a59781bb ("drm/radeon: add support for interrupts on CIK (v5)")
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

62d91dd2

drm/amd/display: Fix debugfs on MST connectors · e3dd3aa8

Mikita Lipski authored Oct 31, 2019

[why]
Previous patch allowed to initialize debugfs entries on both MST
and SST connectors, but MST connectors get registered much later
which exposed an issue of debugfs entries being initialized in the
same folder.

[how]
Return SST debugfs entries' initialization back to where it was.
For MST connectors we should initialize debugfs entries in connector
register function after the connector is registered.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e3dd3aa8

drm/amdgpu/nv: add asic func for fetching vbios from rom directly · f8a69a80

Alex Deucher authored Nov 13, 2019

Needed as a fallback if the vbios can't be fetched by other means.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f8a69a80

drm/amdgpu: put flush_delayed_work at first · c0e21ea1

Yintian Tao authored Nov 18, 2019

There is one regression from 042f3d7b745cd76aa
To put flush_delayed_work after adev->shutdown = true
which will make amdgpu_ih_process not response the irq
At last, all ib ring tests will be failed just like below

[drm] amdgpu: finishing device.
[drm] Fence fallback timer expired on ring gfx
[drm] Fence fallback timer expired on ring comp_1.0.0
[drm] Fence fallback timer expired on ring comp_1.1.0
[drm] Fence fallback timer expired on ring comp_1.2.0
[drm] Fence fallback timer expired on ring comp_1.3.0
[drm] Fence fallback timer expired on ring comp_1.0.1
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.1.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.2.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.3.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc_0.0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).

v2: replace cancel_delayed_work_sync() with flush_delayed_work()
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c0e21ea1

drm/amdgpu/vcn2.5: fix the enc loop with hw fini · 4e20f655

Leo Liu authored Nov 15, 2019

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4e20f655