Commits · 335d5d7bbd3c4fe71ae765ad106d21b39ba85fd1 · nexedi / linux

21 Mar, 2019 9 commits

drm/amd/display: change generic_reg_wait to void. · 335d5d7b

Yongqiang Sun authored Feb 26, 2019

we were only checking the return value in one place, thus changing
generic_reg_wait from int to void and reading the register instead of
getting it from generic_reg_wait, when we need the return value.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

335d5d7b

drm/amd/display: Reset alpha state for planes to the correct values · eec3d5ef

Nicholas Kazlauskas authored Feb 28, 2019

[Why]
The plane_reset callback is subclassed but hasn't been updated since
the drm helper got updated to include resetting alpha related state
(state->alpha and state->pixel_blend_mode). The overlay planes
exposed by amdgpu_dm were therefore being rendered as invisible by
default ever since supported was exposed for alpha blending properties
on overlays.

This caused regressions in igt@kms_plane_multiple@atomic-tiling-none
and igt@kms_plane@plane-position-covered-pipe tests.

[How]
Reset the plane state values to their correct values as defined in
the drm helper.

This fixes the IGT test regression.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

eec3d5ef

drm/amdgpu: use more entries for the first paging queue · 1d31408a

Christian König authored Mar 06, 2019

To aid recoverable page faults.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d31408a

drm/amdgpu: free up the first paging queue v2 · 4f8bc72f

Christian König authored Dec 05, 2018

We need the first paging queue to handle page faults.

v2: handle any number of SDMA instances gracefully
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4f8bc72f

drm/amdgpu: re-enable retry faults · f11a13ec

Christian König authored Nov 05, 2018

Now that we have re-reoute faults to the other IH
ring we can enable retries again.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f11a13ec

drm/amdkfd/sriov:Put the pre and post reset in exclusive mode v2 · f81e8d53

Wentao Lou authored Mar 13, 2019

add amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f81e8d53

drm/amd/display: Respect DRM framebuffer info for video surfaces · 1791e54f

Nicholas Kazlauskas authored Mar 13, 2019

[Why]
Incorrect hardcoded assumptions are made regarding luma and chroma
alignment. The actual values set for the DRM framebuffer should be used
when programming the address.

[How]
Respect the given pitch for both luma and chroma planes - it's not like
we can force the alignment to anything else at this point anyway.

Use the FB offset for the chroma planes directly. DRM already
provides this to us so there's no need to calculate it manually.

While we don't actually use the chroma surface size parameters on Raven,
these should have technically been fb->width / 2 and fb->height / 2
since the chroma plane is half size of the luma plane for NV12.

Leave a TODO indicating that those should be set based on the actual
surface format instead since this is only correct for YUV420 formats.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1791e54f

drm/amdgpu: Wait for newly allocated PTs to be idle · 98ae7f98

Felix Kuehling authored Mar 13, 2019

When page table are updated by the CPU, synchronize with the
allocation and initialization of newly allocated page tables.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

98ae7f98

drm/amdgpu: more descriptive message if HMM not enabled · 194f87dd

Philip Yang authored Mar 04, 2019

If using old kernel config file, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver
error message "Failed to register MMU notifier" is not clear. Inform
user with more descriptive message on how to fix the missing kernel
config option.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109808Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

194f87dd

19 Mar, 2019 31 commits

drm/amdgpu: support userptr cross VMAs case with HMM · 5aeaccca

Philip Yang authored Mar 04, 2019

userptr may cross two VMAs if the forked child process (not call exec
after fork) malloc buffer, then free it, and then malloc larger size
buf, kerenl will create new VMA adjacent to old VMA which was cloned
from parent process, some pages of userptr are in the first VMA, the
rest pages are in the second VMA.

HMM expects range only have one VMA, loop over all VMAs in the address
range, create multiple ranges to handle this case. See
is_mergeable_anon_vma in mm/mmap.c for details.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5aeaccca

drm/amdkfd: support concurrent userptr update for HMM · 386a68e7

Philip Yang authored Mar 04, 2019

Userptr restore may have concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list, needs call
hmm_vma_range_done to remove the range from hmm->ranges list first,
then reschedule the restore worker. Otherwise hmm_vma_fault will add
same range to the list, this will cause loop in the list because
range->next point to range itself.

Add function untrack_invalid_user_pages to reduce code duplication.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

386a68e7

drm/amdgpu: stop evicting busy PDs/PTs · 1bd4e4ca

Christian König authored Nov 07, 2018

Otherwise we won't be able to cleanly handle page faults.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1bd4e4ca

drm/amdgpu: wait for VM to become idle during flush · 56753e73

Christian König authored Jan 10, 2019

Make sure that not only the entities are flush, but that
we also wait for the HW to finish all processing.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

56753e73

drm/amdgpu: remove non-sense NULL ptr check · 3119e7f4

Christian König authored Jan 10, 2019

It's a bug having a dead pointer in the IDR, silently returning
is the worst we can do.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3119e7f4

drm/amdgpu: remove chash · 04ed8459

Christian König authored Nov 07, 2018

Remove the chash implementation for now since it isn't used any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

04ed8459

drm/amdgpu: use ring/hash for fault handling on GMC9 v3 · c1a8abd9

Christian König authored Nov 07, 2018

Further testing showed that the idea with the chash doesn't work as expected.
Especially we can't predict when we can remove the entries from the hash again.

So replace the chash with a ring buffer/hash mix where entries in the container
age automatically based on their timestamp.

v2: use ring buffer / hash mix
v3: check the timeout to make sure all entries age
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c1a8abd9

drm/amdgpu: limit the number of IVs processed at once · 8c65fe5f

Christian König authored Mar 05, 2019

Only process a maximum of 32 IVs before writing back the RPTR. This improves
hw handling when we get close to an overflow in the ring buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8c65fe5f

drm/amdgpu: enable IH ring 1&2 for Vega20 as well · b51cd19e

Christian König authored Mar 04, 2019

That doesn't seem to have any negative effects.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b51cd19e

drm/amdgpu: enable IH doorbell for ring 1&2 on Vega · 1ae64cec

Christian König authored Feb 27, 2019

The doorbells should already be reserved, just enable them.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1ae64cec

drm/amdgpu: change Vega IH ring 1 config · 0133690e

Christian König authored Feb 27, 2019

Disable overflow and enable full drain. This makes fault handling on ring 1
much more reliable since we don't generate back pressure any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0133690e

drm/amdgpu: Only clear dumb buffers if ring is enabled · 46846ba2

Nicholas Kazlauskas authored Mar 11, 2019

The buffers should be cleared when possible but we also don't want
buffer creation to fail in the rare case where the ring isn't ready
during the call. This could happen during some suspend/resume sequences.

Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

46846ba2

drm/amdgpu: Clear VRAM for DRM dumb_create buffers · 95b13468

Nicholas Kazlauskas authored Mar 08, 2019

The dumb_create API isn't intended for high performance rendering
and it's more useful for userspace (ie. IGT) to have them precleared.

The bonus here is that we also won't needlessly leak whatever was
previously in VRAM, but it also probably wasn't sensitive if it was
going through this API.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

95b13468

drm/amdgpu: fix semicolon.cocci warnings · 289d513b

kbuild test robot authored Mar 06, 2019

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:405:2-3: Unneeded semicolon
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:435:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

289d513b

drm/amdgpu: add new ras workflow control flags · 108c6a63

xinhui pan authored Mar 11, 2019

add ras post init function.
Do some initialization after all IP have finished their late init.

Add new member flags which will control the ras work flow.
For now, vbios enable ras for us on boot. That might change in the
future.
So there should be a flag from vbios to tell us if ras is enabled or not
on boot. Looks like there is no such info now.

Other bits of the flags are reserved to control other parts of ras.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

108c6a63

drm/amdgpu: let ras initialization a little noticeable · 5d0f903f

xinhui pan authored Mar 12, 2019

add drm info output if ras initialized successfully.
add ras atomfirmware sanity check.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5d0f903f

drm/amdgpu: Fix lockdep warning more gracely · 163def43

xinhui pan authored Mar 11, 2019

lockdep need a static key.
Previously we set ignore bit to avoid the warning.
Now call sysfs_attr_init to initialize the static key.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

163def43

drm/amdgpu: Fix ras debugfs data parse · b076296b

xinhui pan authored Mar 11, 2019

Unzero char is accepted by sscanf, so when data is structure but
unexpectedly return error invalid;
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b076296b

drm/amdgpu: add new member hw_supported · 5caf466a

xinhui pan authored Mar 11, 2019

Currently, it is not clear how ras is supported. Both software and
hardware can set the supported. That is confusing.

Fix it by adding new member hw_supported.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5caf466a

drm/amdgpu: Fix warning when lockdep is enabled · 2b9505e3

xinhui pan authored Mar 11, 2019

Set ignore bit to satisfy locpdep.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2b9505e3

drm/amdgpu: Fix NULL pointer when ta is missing · 54eb4ed6

xinhui pan authored Mar 11, 2019

Ta is optional, so check if ta firmware is loaded or not.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

54eb4ed6

drm/amdgpu: fix ras parameter descriptions · 2f3940e9

Evan Quan authored Mar 07, 2019

The descriptions of modinfo wrongly show two parameters
for each feature(see below). This patch can fix this
incorrect outputs.

parm:           amdgpu_ras_enable:Enable RAS features on the GPU (0 = disable, 1 = enable, -1 = auto (default))
parm:           ras_enable:int
parm:           amdgpu_ras_mask:Mask of RAS features to enable (default 0xffffffff), only valid when ras_enable == 1
parm:           ras_mask:uint
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2f3940e9

drm/amdgpu: export both supported and enabled ras features · 1febb00e

xinhui pan authored Mar 07, 2019

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1febb00e

drm/amdgpu: lookup vbios table to check ecc capability · b404ae82

xinhui pan authored Mar 07, 2019

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b404ae82

drm/amdgpu: query sram ecc/ecc availability from atombios · f49ea9f8

Hawking Zhang authored Mar 07, 2019

query sram ecc capability via amdgpu_atomfirmware_ecc_default_enabled
query ecc availability via amdgpu_atomfirmware_sram_ecc_supported
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f49ea9f8

drm/amdgpu: add atomfirmware helper function to query sram ecc caps · 8b6da23f

Hawking Zhang authored Mar 07, 2019

sram ecc capability could be get from firmware_capability field in firmwareinfo table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8b6da23f

drm/amdgpu: add atomfirmware helper function to query ecc status · 511c4348

Hawking Zhang authored Mar 07, 2019

ecc default status (enabled or disabled) could be get from umc_config field in umc_info table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

511c4348

drm/amdgpu: update atomfirmware header with ecc related members · ed606ca3

Hawking Zhang authored Mar 07, 2019

add new umc_info structures and new firmware_capability defines
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ed606ca3

drm/amdgpu: handle ras resume · acbbee01

xinhui pan authored Mar 07, 2019

Suspend will put irq, so resume need get irq back.
And in the same time, skip other ras initialization.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

acbbee01

drm/amdkfd: add RAS ECC event support (v3) · 9b54d201

Eric Huang authored Jan 11, 2019

RAS ECC event will combine with GPU reset event, due to
ECC interrupts are caused by uncorrectable error that triggers
GPU reset.

v2: Fix misleading-indentation warning
v3: fix build with CONFIG_HSA_AMD disabled
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9b54d201

drm/amdkfd: add RAS capabilities in topology for Vega20 (v2) · 0dee45a2

Eric Huang authored Jan 11, 2019

It is to collaborate with HSA_CAPABILITY in libhsakmt.

v2: squash in NULL pointer check
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0dee45a2