    drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS · 7e89e4aa
    Ivan Mironov authored
    I updated my system with a Radeon VII from kernel 5.6 to kernel 5.7,
    and the following started to happen on each boot:
    
    	...
    	BUG: kernel NULL pointer dereference, address: 0000000000000128
    	...
    	CPU: 9 PID: 1940 Comm: modprobe Tainted: G            E     5.7.2-200.im0.fc32.x86_64 #1
    	Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
    	RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
    	...
    	Call Trace:
    	 i2c_smbus_xfer+0x3d/0xf0
    	 i2c_default_probe+0xf3/0x130
    	 i2c_detect.isra.0+0xfe/0x2b0
    	 ? kfree+0xa3/0x200
    	 ? kobject_uevent_env+0x11f/0x6a0
    	 ? i2c_detect.isra.0+0x2b0/0x2b0
    	 __process_new_driver+0x1b/0x20
    	 bus_for_each_dev+0x64/0x90
    	 ? 0xffffffffc0f34000
    	 i2c_register_driver+0x73/0xc0
    	 do_one_initcall+0x46/0x200
    	 ? _cond_resched+0x16/0x40
    	 ? kmem_cache_alloc_trace+0x167/0x220
    	 ? do_init_module+0x23/0x260
    	 do_init_module+0x5c/0x260
    	 __do_sys_init_module+0x14f/0x170
    	 do_syscall_64+0x5b/0xf0
    	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
    	...
    
    The error appears when an i2c device driver tries to probe for devices
    using the adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
    The code supporting this adapter requires `adev->psp.ras.ras` to be
    non-NULL, which is true only when `amdgpu_ras_init()` detects HW
    support by calling `amdgpu_ras_check_supported()`.
    
    Before 9015d60c, the adapter was registered by
    
    	-> amdgpu_device_ip_init()
    	  -> amdgpu_ras_recovery_init()
    	    -> amdgpu_ras_eeprom_init()
    	      -> smu_v11_0_i2c_eeprom_control_init()
    
    after verifying that `adev->psp.ras.ras` is not NULL in
    `amdgpu_ras_recovery_init()`. Currently it is registered
    unconditionally by
    
    	-> amdgpu_device_ip_init()
    	  -> pp_sw_init()
    	    -> hwmgr_sw_init()
    	      -> vega20_smu_init()
    	        -> smu_v11_0_i2c_eeprom_control_init()
    
    The fix simply adds a HW support check (ras == NULL => no support)
    before calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
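
    The guard can be illustrated with a minimal user-space sketch. This is
    not the actual patch: `struct device_model`, `struct ras_ctx` and the
    `*_model()` / `eeprom_control_*()` helpers are hypothetical stand-ins
    for the amdgpu structures and functions, and the `assert()` only
    models the adapter code's assumption that the RAS context is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu definitions. */
struct ras_ctx {
	int unused;
};

struct device_model {
	struct ras_ctx *ras;     /* models adev->psp.ras.ras; NULL => no RAS support */
	bool adapter_registered; /* models whether the EEPROM i2c adapter exists */
};

/* Models the adapter setup, whose supporting code assumes ras != NULL. */
static int eeprom_control_init(struct device_model *dev)
{
	assert(dev->ras != NULL); /* stands in for the NULL dereference in the kernel */
	dev->adapter_registered = true;
	return 0;
}

static void eeprom_control_fini(struct device_model *dev)
{
	assert(dev->ras != NULL);
	dev->adapter_registered = false;
}

/* Guarded init: register the adapter only when RAS support was detected. */
static int smu_init_model(struct device_model *dev)
{
	if (dev->ras)
		return eeprom_control_init(dev);
	return 0; /* no RAS: skip the adapter, init still succeeds */
}

/* Guarded fini: mirror the same check so teardown matches setup. */
static void smu_fini_model(struct device_model *dev)
{
	if (dev->ras)
		eeprom_control_fini(dev);
}
```

    With `ras == NULL` the adapter is never registered, so no i2c driver
    can probe through it and reach the code that assumes a RAS context.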
    
    Please note that a similar fix may also be required for
    CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware
    without RAS exists, or whether calling `smu_i2c_eeprom_init()` makes
    any sense when there is no HW support.
    
    Cc: stable@vger.kernel.org
    Fixes: 9015d60c ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
    Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
    Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>