• Frederic Weisbecker's avatar
    timer/migration: Remove buggy early return on deactivation · 4b6f4c5a
    Frederic Weisbecker authored
    When a CPU enters into idle and deactivates itself from the timer
    migration hierarchy without any global timer of its own to propagate,
    the group event of that CPU is set to "ignore" and tmigr_update_events()
    accordingly performs an early return without considering timers queued
    by other CPUs.
    
    If the hierarchy has a single level, and the CPU is the last one to
    enter idle, it will ignore others' global timers, as in the following
    layout:
    
               [GRP0:0]
             migrator = 0
             active   = 0
             nextevt  = T0i
              /         \
             0           1
          active (T0i)  idle (T1)
    
    0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
    upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
    
               [GRP0:0]
             migrator = NONE
             active   = NONE
             nextevt  = T0i
              /         \
             0           1
          idle (T0i)  idle (T1)
    
    1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
    pushed as its next expiry and its own event kept as "ignore". As a result
    tmigr_update_events() ignores T1 and CPU 0 goes to idle with T1
    unhandled.
    
    This isn't proper to single level hierarchy though. A similar issue,
    although slightly different, may arise on multi-level:
    
                                [GRP1:0]
                             migrator = GRP0:0
                             active   = GRP0:0
                             nextevt  = T0:0i, T0:1
                             /              \
                  [GRP0:0]                  [GRP0:1]
               migrator = 0              migrator = NONE
               active   = 0              active   = NONE
               nextevt  = T0i            nextevt  = T2
               /         \                /         \
              0 (T0i)     1 (T1)         2 (T2)      3
          active         idle            idle       idle
    
    0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
    upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
    CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2
    
                                [GRP1:0]
                             migrator = GRP0:0
                             active   = GRP0:0
                             nextevt  = T0:0i, T0:1
                             /              \
                  [GRP0:0]                  [GRP0:1]
               migrator = NONE           migrator = NONE
               active   = NONE           active   = NONE
               nextevt  = T0i            nextevt  = T2
               /         \                /         \
              0 (T0i)     1 (T1)         2 (T2)      3
            idle         idle            idle         idle
    
    1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
    pushed as its next expiry and its own event kept as "ignore". As a result
    tmigr_update_events() ignores T1. The change only propagated up to 1st
    level so far.
    
                                [GRP1:0]
                             migrator = NONE
                             active   = NONE
                             nextevt  = T0:1
                             /              \
                  [GRP0:0]                  [GRP0:1]
               migrator = NONE           migrator = NONE
               active   = NONE           active   = NONE
               nextevt  = T0i            nextevt  = T2
               /         \                /         \
              0 (T0i)     1 (T1)         2 (T2)      3
            idle         idle            idle         idle
    
    2) The change now propagates up to the top. tmigr_update_events() finds
    that the child event is ignored and thus removes it. The top level next
    event is now T2 which is returned to CPU 0 as its next effective expiry
    to take account for as the global idle migrator. However T1 has been
    ignored along the way, leaving it unhandled.
    
    Fix those issues with removing the buggy related early return. Ignored
    child events must not prevent from evaluating the other events within
    the same group.
    Reported-by: default avatarBoqun Feng <boqun.feng@gmail.com>
    Reported-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
    Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarFrederic Weisbecker <frederic@kernel.org>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Tested-by: default avatarFlorian Fainelli <florian.fainelli@broadcom.com>
    Link: https://lore.kernel.org/r/ZfOhB9ZByTZcBy4u@lothringen
    4b6f4c5a
timer_migration.c 54.7 KB