• Daniel Lezcano's avatar
    Revert "thermal: core: Don't update trip points inside the hysteresis range" · f67cf45d
    Daniel Lezcano authored
    It has been reported the commit cf3986f8 introduced a regression
    when the temperature is wavering in the hysteresis region. The
    mitigation stops leading to an uncontrolled temperature increase until
    reaching the critical trip point.
    
    Here what happens:
    
     * 'throttle' is when the current temperature is greater than the trip
       point temperature
     * 'target' is the mitigation level
     * 'passive' is positive when there is a mitigation, zero otherwise
     * these values are computed in the step_wise governor
    
    Configuration:
    
     trip point 1: temp=95°C, hyst=5°C (passive)
     trip point 2: temp=115°C, hyst=0°C (critical)
     governor: step_wise
    
     1. The temperature crosses the way up the trip point 1 at 95°C
    
       - trend=raising
       - throttle=1, target=1
       - passive=1
       - set_trips: low=90°C, high=115°C
    
     2. The temperature decreases but stays in the hysteresis region at
        93°C
    
       - trend=dropping
       - throttle=0, target=0
       - passive=1
    
       Before cf3986f8
       - set_trips: low=90°C, high=95°C
    
       After cf3986f8
       - set_trips: low=90°C, high=115°C
    
     3. The temperature increases a bit but stays in the hysteresis region
        at 94°C (so below the trip point 1 temp 95°C)
    
       - trend=raising
       - throttle=0, target=0
       - passive=1
    
       Before cf3986f8
       - set_trips: low=90°C, high=95°C
    
       After cf3986f8
       - set_trips: low=90°C, high=115°C
    
     4. The temperature decreases but stays in the hysteresis region at
        93°C
    
       - trend=dropping
       - throttle=0, target=THERMAL_NO_TARGET
       - passive=0
    
       Before cf3986f8
       - set_trips: low=90°C, high=95°C
    
       After cf3986f8
       - set_trips: low=90°C, high=115°C
    
    At this point, the 'passive' value is zero, there is no mitigation,
    the temperature is in the hysteresis region, the next trip point is
    115°C. As 'passive' is zero, the timer to monitor the thermal zone is
    disabled. Consequently if the temperature continues to increase, no
    mitigation will happen and it will reach the 115°C trip point and
    reboot.
    
    Before the optimization, the high boundary would have been 95°C, thus
    triggering the mitigation again and rearming the polling timer.
    
    The optimization make sense but given the current implementation of
    the step_wise governor collaborating via this 'passive' flag with the
    core framework it can not work.
    
    From a higher perspective it seems like there is a problem between the
    governor which sets a variable to be used by the core framework. That
    sounds akward and it would make much more sense if the core framework
    controls the governor and not the opposite. But as the devil hides in
    the details, there are some subtilities to be addressed before.
    
    Elaborating those would be out of the scope this changelog. So let's
    stay simple and revert the change first to fixup all broken mobile
    platforms.
    
    This reverts commit cf3986f8 ("thermal: core: Don't update trip
    points inside the hysteresis range") and takes a conflict with commit
    0c0c4740 ("0c0c4740 thermal: trip: Use for_each_trip() in
    __thermal_zone_set_trips()") in drivers/thermal/thermal_trip.c into
    account.
    
    Fixes: cf3986f8 ("thermal: core: Don't update trip points inside the hysteresis range")
    Reported-by: default avatarManaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
    Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
    Acked-by: default avatarNícolas F. R. A. Prado <nfraprado@collabora.com>
    Cc: 6.7+ <stable@vger.kernel.org> # 6.7+
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    f67cf45d
thermal_trip.c 3.76 KB