Commit facd40aa authored by Anna-Maria Behnsen's avatar Anna-Maria Behnsen Committed by Thomas Gleixner

timers/migration: Do not rely always on group->parent

When reading group->parent without holding the group lock it is racy
against CPUs coming online the first time and thereby creating another
level of the hierarchy. This is not a problem when this value is read once
to decide whether to abort a propagation or not. The worst outcome is an
unnecessary/early CPU wake up. But it is racy when reading it several times
during a single 'action' (like activation, deactivation, checking for
remote timer expiry,...) and relying on the consitency of this value
without holding the lock. This happens at the moment e.g. in
tmigr_inactive_up() which is also calling tmigr_udpate_events(). Code relys
on group->parent not to change during this 'action'.

Update parent struct member description to explain the above only
once. Remove parent pointer checks when they are not mandatory (like update
of data->childmask). Remove a warning, which would be nice but the trigger
of this warning is not reliable and add expand the data structure member
description instead. Expand a comment, why it is safe to rely on parent
pointer here (inside hierarchy update).

Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
Reported-by: default avatarBorislav Petkov <bp@alien8.de>
Signed-off-by: default avatarAnna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
Reviewed-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-1-757baa7803fe@linutronix.de
parent 22a40d14
...@@ -507,7 +507,14 @@ static void walk_groups(up_f up, void *data, struct tmigr_cpu *tmc) ...@@ -507,7 +507,14 @@ static void walk_groups(up_f up, void *data, struct tmigr_cpu *tmc)
* (get_next_timer_interrupt()) * (get_next_timer_interrupt())
* @firstexp: Contains the first event expiry information when last * @firstexp: Contains the first event expiry information when last
* active CPU of hierarchy is on the way to idle to make * active CPU of hierarchy is on the way to idle to make
* sure CPU will be back in time. * sure CPU will be back in time. It is updated in top
* level group only. Be aware, there could occur a new top
* level of the hierarchy between the 'top level call' in
* tmigr_update_events() and the check for the parent group
* in walk_groups(). Then @firstexp might contain a value
* != KTIME_MAX even if it was not the final top
* level. This is not a problem, as the worst outcome is a
* CPU which might wake up a little early.
* @evt: Pointer to tmigr_event which needs to be queued (of idle * @evt: Pointer to tmigr_event which needs to be queued (of idle
* child group) * child group)
* @childmask: childmask of child group * @childmask: childmask of child group
...@@ -649,7 +656,7 @@ static bool tmigr_active_up(struct tmigr_group *group, ...@@ -649,7 +656,7 @@ static bool tmigr_active_up(struct tmigr_group *group,
} while (!atomic_try_cmpxchg(&group->migr_state, &curstate.state, newstate.state)); } while (!atomic_try_cmpxchg(&group->migr_state, &curstate.state, newstate.state));
if ((walk_done == false) && group->parent) if (walk_done == false)
data->childmask = group->childmask; data->childmask = group->childmask;
/* /*
...@@ -1317,20 +1324,9 @@ static bool tmigr_inactive_up(struct tmigr_group *group, ...@@ -1317,20 +1324,9 @@ static bool tmigr_inactive_up(struct tmigr_group *group,
/* Event Handling */ /* Event Handling */
tmigr_update_events(group, child, data); tmigr_update_events(group, child, data);
if (group->parent && (walk_done == false)) if (walk_done == false)
data->childmask = group->childmask; data->childmask = group->childmask;
/*
* data->firstexp was set by tmigr_update_events() and contains the
* expiry of the first global event which needs to be handled. It
* differs from KTIME_MAX if:
* - group is the top level group and
* - group is idle (which means CPU was the last active CPU in the
* hierarchy) and
* - there is a pending event in the hierarchy
*/
WARN_ON_ONCE(data->firstexp != KTIME_MAX && group->parent);
trace_tmigr_group_set_cpu_inactive(group, newstate, childmask); trace_tmigr_group_set_cpu_inactive(group, newstate, childmask);
return walk_done; return walk_done;
...@@ -1552,10 +1548,11 @@ static void tmigr_connect_child_parent(struct tmigr_group *child, ...@@ -1552,10 +1548,11 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
data.childmask = child->childmask; data.childmask = child->childmask;
/* /*
* There is only one new level per time. When connecting the * There is only one new level per time (which is protected by
* child and the parent and set the child active when the parent * tmigr_mutex). When connecting the child and the parent and
* is inactive, the parent needs to be the uppermost * set the child active when the parent is inactive, the parent
* level. Otherwise there went something wrong! * needs to be the uppermost level. Otherwise there went
* something wrong!
*/ */
WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent); WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent);
} }
......
...@@ -22,7 +22,17 @@ struct tmigr_event { ...@@ -22,7 +22,17 @@ struct tmigr_event {
* struct tmigr_group - timer migration hierarchy group * struct tmigr_group - timer migration hierarchy group
* @lock: Lock protecting the event information and group hierarchy * @lock: Lock protecting the event information and group hierarchy
* information during setup * information during setup
* @parent: Pointer to the parent group * @parent: Pointer to the parent group. Pointer is updated when a
* new hierarchy level is added because of a CPU coming
* online the first time. Once it is set, the pointer will
* not be removed or updated. When accessing parent pointer
* lock less to decide whether to abort a propagation or
* not, it is not a problem. The worst outcome is an
* unnecessary/early CPU wake up. But do not access parent
* pointer several times in the same 'action' (like
* activation, deactivation, check for remote expiry,...)
* without holding the lock as it is not ensured that value
* will not change.
* @groupevt: Next event of the group which is only used when the * @groupevt: Next event of the group which is only used when the
* group is !active. The group event is then queued into * group is !active. The group event is then queued into
* the parent timer queue. * the parent timer queue.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment