1. 25 Jul, 2024 3 commits
  2. 24 Jul, 2024 8 commits
    • drm/amdkfd: Validate queue cwsr area and eop buffer size · 629568d2
      Philip Yang authored
      When creating a KFD user compute queue, check that the queue EOP buffer
      size, CWSR area size, and control stack size equal the sizes stored in
      the KFD node properties.
      
      Check the entire CWSR area, which may be split into multiple SVM ranges
      aligned to the granularity boundary.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
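The size check described in this commit can be sketched as a plain equality comparison. This is a minimal illustration, not the kernel code: the struct `queue_buf_sizes` and the function `queue_sizes_valid` are hypothetical names, and the real driver may apply further rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three sizes named in the commit message.
 * One instance holds the values stored in the node properties, another
 * holds the values user space passes at queue creation. */
struct queue_buf_sizes {
    uint32_t eop_buffer_size;
    uint32_t cwsr_size;
    uint32_t ctl_stack_size;
};

/* Accept the queue only if every requested size equals the node's value. */
static bool queue_sizes_valid(const struct queue_buf_sizes *node,
                              const struct queue_buf_sizes *req)
{
    return req->eop_buffer_size == node->eop_buffer_size &&
           req->cwsr_size == node->cwsr_size &&
           req->ctl_stack_size == node->ctl_stack_size;
}
```

A caller would reject queue creation whenever `queue_sizes_valid` returns false.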
    • drm/amdkfd: Store queue cwsr area size to node properties · 517fff22
      Philip Yang authored
      Use the Thunk's calculation of the queue EOP buffer size, CWSR area
      size, and control stack size, and store the values in the KFD node
      properties.
      
      These values will be used to validate the queue EOP buffer size, CWSR
      area size, and control stack size when creating a KFD user compute queue.
      
      They will also be exposed to user space via the sysfs KFD node
      properties, so that the duplicate calculation code can be removed from
      the Thunk.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: reset vm state machine after gpu reset(vram lost) · 47c0388b
      ZhenGuo Yin authored
      [Why]
      The page tables of a compute VM in VRAM will be lost after a GPU reset.
      VRAM won't be restored since a compute VM has no shadows.
      
      [How]
      Use the upper 32 bits of vm->generation to record a vram_lost_counter.
      Reset the VM state machine when vm->generation is not equal to
      the new generation token.
      
      v2: Check vm->generation instead of calling drm_sched_entity_error
      in amdgpu_vm_validate.
      v3: Use new generation token instead of vram_lost_counter for check.
      Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
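The token scheme in this commit message can be illustrated with a small sketch. It assumes only what the message states: the upper 32 bits of a 64-bit generation value carry the VRAM-lost counter. The names `make_generation`, `vm_sketch`, and `vm_check_generation` are hypothetical, and the use of the lower 32 bits as a per-VM counter is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the VRAM-lost counter into the upper 32 bits of the token.
 * The lower 32 bits hold a per-VM counter (an assumption of this sketch). */
static uint64_t make_generation(uint32_t vram_lost_counter, uint32_t vm_counter)
{
    return ((uint64_t)vram_lost_counter << 32) | vm_counter;
}

/* Hypothetical VM object reduced to the fields this sketch needs. */
struct vm_sketch {
    uint64_t generation;
    bool state_reset;
};

/* Compare the stored token with a freshly computed one. On a mismatch,
 * adopt the new token and flag that the state machine must be reset.
 * Returns true if a reset was triggered. */
static bool vm_check_generation(struct vm_sketch *vm, uint64_t new_generation)
{
    if (vm->generation == new_generation)
        return false;
    vm->generation = new_generation;
    vm->state_reset = true;
    return true;
}
```

Because the counter sits in the high half of the token, any increment after a VRAM loss changes the token no matter what the low half contains.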
    • drm/amd/display: Add null check for set_output_gamma in dcn30_set_output_transfer_func · 08ae395e
      Srinivasan Shanmugam authored
      This commit adds a null check for the set_output_gamma function pointer
      in the dcn30_set_output_transfer_func function. Previously,
      set_output_gamma was checked for null at line 386, but it was then
      dereferenced without any null check at line 401. This could lead to a
      null pointer dereference if set_output_gamma is indeed null.
      
      To fix this, we now ensure that set_output_gamma is not null before
      dereferencing it. We do this by adding a nullity check for
      set_output_gamma before the call to set_output_gamma at line 401. If
      set_output_gamma is null, we log an error message and do not call the
      function.
      
      This fix prevents a potential null pointer dereference error.
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:401 dcn30_set_output_transfer_func()
      error: we previously assumed 'mpc->funcs->set_output_gamma' could be null (see line 386)
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c
          373 bool dcn30_set_output_transfer_func(struct dc *dc,
          374                                 struct pipe_ctx *pipe_ctx,
          375                                 const struct dc_stream_state *stream)
          376 {
          377         int mpcc_id = pipe_ctx->plane_res.hubp->inst;
          378         struct mpc *mpc = pipe_ctx->stream_res.opp->ctx->dc->res_pool->mpc;
          379         const struct pwl_params *params = NULL;
          380         bool ret = false;
          381
          382         /* program OGAM or 3DLUT only for the top pipe*/
          383         if (pipe_ctx->top_pipe == NULL) {
          384                 /*program rmu shaper and 3dlut in MPC*/
          385                 ret = dcn30_set_mpc_shaper_3dlut(pipe_ctx, stream);
          386                 if (ret == false && mpc->funcs->set_output_gamma) {
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If this is NULL
      
          387                         if (stream->out_transfer_func.type == TF_TYPE_HWPWL)
          388                                 params = &stream->out_transfer_func.pwl;
          389                         else if (pipe_ctx->stream->out_transfer_func.type ==
          390                                         TF_TYPE_DISTRIBUTED_POINTS &&
          391                                         cm3_helper_translate_curve_to_hw_format(
          392                                         &stream->out_transfer_func,
          393                                         &mpc->blender_params, false))
          394                                 params = &mpc->blender_params;
          395                          /* there are no ROM LUTs in OUTGAM */
          396                         if (stream->out_transfer_func.type == TF_TYPE_PREDEFINED)
          397                                 BREAK_TO_DEBUGGER();
          398                 }
          399         }
          400
      --> 401         mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Then it will crash
      
          402         return ret;
          403 }
      
      Fixes: d99f1387 ("drm/amd/display: Add DCN3 HWSEQ")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: Tom Chung <chiahsuan.chung@amd.com>
      Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Cc: Roman Li <roman.li@amd.com>
      Cc: Hersen Wu <hersenxs.wu@amd.com>
      Cc: Alex Hung <alex.hung@amd.com>
      Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
      Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
      Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
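The guard pattern this commit describes, checking an optional function pointer immediately before the call and logging when it is missing, can be sketched in isolation. The types and names below (`mpc_sketch`, `mpc_funcs_sketch`, `program_output_gamma`, `count_call`) are hypothetical stand-ins, not the display driver's definitions.

```c
#include <stddef.h>
#include <stdio.h>

struct mpc_sketch;

/* Hypothetical table of optional hardware hooks. */
struct mpc_funcs_sketch {
    void (*set_output_gamma)(struct mpc_sketch *mpc, int mpcc_id,
                             const void *params);
};

struct mpc_sketch {
    const struct mpc_funcs_sketch *funcs;
    int calls; /* counts hook invocations, for demonstration only */
};

/* Example hook that only records that it was called. */
static void count_call(struct mpc_sketch *mpc, int mpcc_id, const void *params)
{
    (void)mpcc_id;
    (void)params;
    mpc->calls++;
}

/* Call the hook only if it is implemented; otherwise log and skip.
 * Returns 0 if the hook ran and -1 if it was missing. */
static int program_output_gamma(struct mpc_sketch *mpc, int mpcc_id,
                                const void *params)
{
    if (!mpc->funcs->set_output_gamma) {
        fprintf(stderr, "set_output_gamma is not implemented\n");
        return -1;
    }
    mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
    return 0;
}
```

Keeping the check directly next to the call means a later refactor of the surrounding conditions cannot separate the two.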
    • drm/amdgpu: add missed harvest check for VCN IP v4/v5 · 0b071245
      Tim Huang authored
      To prevent the probe failure shown below, add a check for models with
      VCN IP v4.0.6, where VCN1 may be harvested.
      
      v2:
      Apply the same check to VCN IP v4.0 and v5.0.
      
      [   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
      [   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
      c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
      00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
      [   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
      [   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
      0000000000000000
      [   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
      ffff99a82f680000
      [   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
      0000000000000000
      [   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
      0000000000000000
      [   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
      0000000000000001
      [   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
      knlGS:0000000000000000
      [   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
      0000000000750ef0
      [   54.076927] PKRU: 55555554
      [   54.077132] Call Trace:
      [   54.077319]  <TASK>
      [   54.077484]  ? show_regs+0x69/0x80
      [   54.077747]  ? __die+0x28/0x70
      [   54.077979]  ? page_fault_oops+0x180/0x4b0
      [   54.078286]  ? do_user_addr_fault+0x2d2/0x680
      [   54.078610]  ? exc_page_fault+0x84/0x190
      [   54.078910]  ? asm_exc_page_fault+0x2b/0x30
      [   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
      [   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
      [   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
      [   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0
      [amdgpu]
      [   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
      [   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
      [   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
      [   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
      [   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
      [   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
      [   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
      [   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
      [   54.087215]  local_pci_probe+0x48/0xa0
      [   54.087509]  pci_device_probe+0xc9/0x250
      [   54.087812]  really_probe+0x1a4/0x3f0
      [   54.088101]  __driver_probe_device+0x7d/0x170
      [   54.088443]  driver_probe_device+0x24/0xa0
      [   54.088765]  __driver_attach+0xdd/0x1d0
      [   54.089068]  ? __pfx___driver_attach+0x10/0x10
      [   54.089417]  bus_for_each_dev+0x8e/0xe0
      [   54.089718]  driver_attach+0x22/0x30
      [   54.090000]  bus_add_driver+0x120/0x220
      [   54.090303]  driver_register+0x62/0x120
      [   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
      [   54.091255]  __pci_register_driver+0x62/0x70
      [   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
      [   54.092190]  do_one_initcall+0x5f/0x330
      [   54.092495]  do_init_module+0x68/0x240
      [   54.092794]  load_module+0x201c/0x2110
      [   54.093093]  init_module_from_file+0x97/0xd0
      [   54.093428]  ? init_module_from_file+0x97/0xd0
      [   54.093777]  idempotent_init_module+0x11c/0x2a0
      [   54.094134]  __x64_sys_finit_module+0x64/0xc0
      [   54.094476]  do_syscall_64+0x58/0x120
      [   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      Signed-off-by: Tim Huang <tim.huang@amd.com>
      Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
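A harvest check of the kind this commit adds can be sketched as a bitmask test inside the per-instance loop. This assumes a mask in which bit i marks instance i as harvested. The struct `vcn_sketch` and the function `start_available_instances` are hypothetical and do not reproduce the driver's code.

```c
#include <stdint.h>

#define MAX_VCN_INSTANCES 4

/* Hypothetical device state: bit i of harvest_config set means that
 * instance i is harvested (fused off) and must not be touched. */
struct vcn_sketch {
    unsigned int num_inst;
    uint32_t harvest_config;
    int started[MAX_VCN_INSTANCES];
};

/* Start every instance that is present, skipping harvested ones.
 * Returns the number of instances actually started. */
static unsigned int start_available_instances(struct vcn_sketch *vcn)
{
    unsigned int i, started = 0;

    for (i = 0; i < vcn->num_inst && i < MAX_VCN_INSTANCES; i++) {
        if (vcn->harvest_config & (1u << i))
            continue; /* harvested: skip instead of programming it */
        vcn->started[i] = 1;
        started++;
    }
    return started;
}
```

With two instances and the second one harvested, only instance 0 is started.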
    • drm/amdgpu: skip kfd init if GFX is not ready. · 3b37e272
      Yifan Zhang authored
      Avoid a KFD init crash in that case.
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
      Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Validate user queue update · 305cd109
      Philip Yang authored
      On queue update, ensure that the new ring buffer is mapped on the GPU
      with the correct size.
      
      Decrease the queue_refcount of the queue's old ring_bo and increase the
      queue_refcount of the new ring_bo.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
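The reference-count handover on queue update can be sketched as follows. The types `bo_sketch` and `queue_sketch` and the function `queue_swap_ring_bo` are hypothetical. The sketch shows only the bookkeeping and omits the mapping and size validation the commit also mentions.

```c
#include <stddef.h>

/* Hypothetical buffer object carrying only the counter of interest. */
struct bo_sketch {
    int queue_refcount;
};

/* Hypothetical queue holding a pointer to its current ring buffer. */
struct queue_sketch {
    struct bo_sketch *ring_bo;
};

/* Point the queue at a new ring buffer: take a reference on the new
 * buffer first, then drop the reference on the old one. */
static void queue_swap_ring_bo(struct queue_sketch *q, struct bo_sketch *new_bo)
{
    if (q->ring_bo == new_bo)
        return; /* same buffer: counts stay unchanged */

    new_bo->queue_refcount++;
    if (q->ring_bo)
        q->ring_bo->queue_refcount--;
    q->ring_bo = new_bo;
}
```

Taking the new reference before dropping the old one keeps at least one count held throughout the swap.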
    • drm/amdkfd: Validate user queue svm memory residency · b049504e
      Philip Yang authored
      The queue CWSR area may be registered to the GPU as SVM memory. When
      creating a queue, ensure that the SVM range is mapped to the GPU with
      the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.
      
      Add queue_refcount to struct svm_range to track queue CWSR area usage.
      
      Because the return value of the unmap MMU notifier callback is ignored,
      if the application unmaps the CWSR area while the queue is active, print
      a pr_warn message to the dmesg log. To be safe, evict the user queue.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
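One way to picture the residency check is a walk over the ranges that back the CWSR area. The walk fails on any gap and on any range that lacks the always-mapped flag. This is a sketch under stated assumptions: ranges are sorted by start address and do not overlap, `SKETCH_FLAG_ALWAYS_MAPPED` is a placeholder value standing in for KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED, and `range_sketch` and `cwsr_area_resident` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit, not the real UAPI value. */
#define SKETCH_FLAG_ALWAYS_MAPPED 0x1u

/* Hypothetical range covering addresses [start, last], both inclusive. */
struct range_sketch {
    uint64_t start;
    uint64_t last;
    uint32_t flags;
};

/* Return true if [addr, addr + size) is fully covered by ranges that all
 * carry the always-mapped flag. Ranges must be sorted and non-overlapping. */
static bool cwsr_area_resident(const struct range_sketch *ranges, size_t n,
                               uint64_t addr, uint64_t size)
{
    uint64_t next = addr;      /* first address not yet verified */
    uint64_t end = addr + size;
    size_t i;

    for (i = 0; i < n && next < end; i++) {
        if (ranges[i].last < next)
            continue;          /* range lies entirely before the area */
        if (ranges[i].start > next)
            return false;      /* gap: part of the area is unregistered */
        if (!(ranges[i].flags & SKETCH_FLAG_ALWAYS_MAPPED))
            return false;      /* range could be unmapped from the GPU */
        next = ranges[i].last + 1;
    }
    return next >= end;
}
```

The walk covers the case from the earlier commit in which one CWSR area spans several ranges, since each range is checked in turn until the whole area is accounted for.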
  3. 23 Jul, 2024 29 commits