mlxsw: core_thermal: Fix fan speed in maximum cooling state
The cooling levels array is supposed to prevent the system fans from being configured below a 20% duty cycle as otherwise some of them get stuck at 0 RPM. Due to an off-by-one error, the last element in the array was not initialized, causing it to be set to zero, which in turn lead to fans being configured with a 0% duty cycle in maximum cooling state. Since commit 332fdf95 ("mlxsw: thermal: Fix out-of-bounds memory accesses") the contents of the array are static. Therefore, instead of fixing the initialization of the array, simply remove it and adjust thermal_cooling_device_ops::set_cur_state() so that the configured duty cycle is never set below 20%. Before: # cat /sys/class/thermal/thermal_zone0/cdev0/type mlxsw_fan # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state # cat /sys/class/hwmon/hwmon0/name mlxsw # cat /sys/class/hwmon/hwmon0/pwm1 0 After: # cat /sys/class/thermal/thermal_zone0/cdev0/type mlxsw_fan # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state # cat /sys/class/hwmon/hwmon0/name mlxsw # cat /sys/class/hwmon/hwmon0/pwm1 255 This bug was uncovered when the thermal subsystem repeatedly tried to configure the cooling devices to their maximum state due to another issue [1]. This resulted in the fans being stuck at 0 RPM, which eventually lead to the system undergoing thermal shutdown. [1] https://lore.kernel.org/netdev/ZA3CFNhU4AbtsP4G@shredder/ Fixes: a421ce08 ("mlxsw: core: Extend cooling device with cooling levels") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Showing
Please register or sign in to comment