Commit d9cb236b authored by Qais Yousef, committed by Ingo Molnar

sched/rt: cpupri_find: Implement fallback mechanism for !fit case

When searching for the best lowest_mask with a fitness_fn passed, make
sure we record the lowest_level that returns a valid lowest_mask so that
we can use that as a fallback in case we fail to find a fitting CPU at
all levels.

The intention in the original patch was not to allow a down migration to an
unfitting CPU. But this missed the case where we are already running on an
unfitting one.

With this change, RT tasks can still move between unfitting CPUs when
they're already running on such a CPU.

And as Steve suggested, to adhere to the strict priority rules of RT: if
a task is already running on a fitting CPU but, due to priority, it can't
run on it, allow it to downmigrate to an unfitting CPU so it can run.
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402f ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-2-qais.yousef@arm.com
Link: https://lore.kernel.org/lkml/20200203142712.a7yvlyo2y3le5cpn@e107158-lin/
parent 5ab297ba
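At a glance, the reshaped cpupri_find() boils down to the flow sketched below. This is a condensed illustration of the diff that follows (declarations, memory barriers and the documentation comment elided), not a drop-in replacement for the patched function:

	/* Condensed view of the new cpupri_find() flow (details elided). */
	int best_unfit_idx = -1;

	for (idx = 0; idx < task_pri; idx++) {
		/* Probe one priority level; fills lowest_mask if it has eligible CPUs. */
		if (!__cpupri_find(cp, p, lowest_mask, idx))
			continue;

		/* Caller doesn't care about fitness: first eligible level wins. */
		if (!lowest_mask || !fitness_fn)
			return 1;

		/* Strip CPUs that fail the fitness check (capacity awareness today). */
		for_each_cpu(cpu, lowest_mask)
			if (!fitness_fn(p, cpu))
				cpumask_clear_cpu(cpu, lowest_mask);

		if (!cpumask_empty(lowest_mask))
			return 1;

		/* Remember the first level that was valid but unfitting. */
		if (best_unfit_idx == -1)
			best_unfit_idx = idx;
	}

	/* No fitting CPU at any level: fall back to the recorded unfitting level. */
	if (best_unfit_idx != -1)
		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);

	return 0;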
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -41,33 +41,9 @@ static int convert_prio(int prio)
 	return cpupri;
 }
 
-/**
- * cpupri_find - find the best (lowest-pri) CPU in the system
- * @cp: The cpupri context
- * @p: The task
- * @lowest_mask: A mask to fill in with selected CPUs (or NULL)
- * @fitness_fn: A pointer to a function to do custom checks whether the CPU
- *              fits a specific criteria so that we only return those CPUs.
- *
- * Note: This function returns the recommended CPUs as calculated during the
- * current invocation. By the time the call returns, the CPUs may have in
- * fact changed priorities any number of times. While not ideal, it is not
- * an issue of correctness since the normal rebalancer logic will correct
- * any discrepancies created by racing against the uncertainty of the current
- * priority configuration.
- *
- * Return: (int)bool - CPUs were found
- */
-int cpupri_find(struct cpupri *cp, struct task_struct *p,
-		struct cpumask *lowest_mask,
-		bool (*fitness_fn)(struct task_struct *p, int cpu))
+static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
+				struct cpumask *lowest_mask, int idx)
 {
-	int idx = 0;
-	int task_pri = convert_prio(p->prio);
-
-	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
-
-	for (idx = 0; idx < task_pri; idx++) {
-		struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
-		int skip = 0;
+	struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
+	int skip = 0;
 
@@ -95,14 +71,12 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	/* Need to do the rmb for every iteration */
 	if (skip)
-		continue;
+		return 0;
 
 	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
-		continue;
+		return 0;
 
 	if (lowest_mask) {
-		int cpu;
-
 		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
 
 		/*
@@ -114,9 +88,45 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		 * priority level and continue on.
 		 */
 		if (cpumask_empty(lowest_mask))
-			continue;
+			return 0;
+	}
+
+	return 1;
+}
+
+/**
+ * cpupri_find - find the best (lowest-pri) CPU in the system
+ * @cp: The cpupri context
+ * @p: The task
+ * @lowest_mask: A mask to fill in with selected CPUs (or NULL)
+ * @fitness_fn: A pointer to a function to do custom checks whether the CPU
+ *              fits a specific criteria so that we only return those CPUs.
+ *
+ * Note: This function returns the recommended CPUs as calculated during the
+ * current invocation. By the time the call returns, the CPUs may have in
+ * fact changed priorities any number of times. While not ideal, it is not
+ * an issue of correctness since the normal rebalancer logic will correct
+ * any discrepancies created by racing against the uncertainty of the current
+ * priority configuration.
+ *
+ * Return: (int)bool - CPUs were found
+ */
+int cpupri_find(struct cpupri *cp, struct task_struct *p,
+		struct cpumask *lowest_mask,
+		bool (*fitness_fn)(struct task_struct *p, int cpu))
+{
+	int task_pri = convert_prio(p->prio);
+	int best_unfit_idx = -1;
+	int idx = 0, cpu;
+
+	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
+
+	for (idx = 0; idx < task_pri; idx++) {
+		if (!__cpupri_find(cp, p, lowest_mask, idx))
+			continue;
 
-		if (!fitness_fn)
+		if (!lowest_mask || !fitness_fn)
 			return 1;
 
 		/* Ensure the capacity of the CPUs fit the task */
@@ -129,13 +139,48 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		 * If no CPU at the current priority can fit the task
 		 * continue looking
 		 */
-		if (cpumask_empty(lowest_mask))
+		if (cpumask_empty(lowest_mask)) {
+			/*
+			 * Store our fallback priority in case we
+			 * didn't find a fitting CPU
+			 */
+			if (best_unfit_idx == -1)
+				best_unfit_idx = idx;
+
 			continue;
+		}
 
 		return 1;
 	}
 
+	/*
+	 * If we failed to find a fitting lowest_mask, make sure we fall back
+	 * to the last known unfitting lowest_mask.
+	 *
+	 * Note that the map of the recorded idx might have changed since then,
+	 * so we must ensure to do the full dance to make sure that level still
+	 * holds a valid lowest_mask.
+	 *
+	 * As per above, the map could have been concurrently emptied while we
+	 * were busy searching for a fitting lowest_mask at the other priority
+	 * levels.
+	 *
+	 * This rule favours honouring priority over fitting the task in the
+	 * correct CPU (Capacity Awareness being the only user now).
+	 * The idea is that if a higher priority task can run, then it should
+	 * run even if this ends up being on unfitting CPU.
+	 *
+	 * The cost of this trade-off is not entirely clear and will probably
+	 * be good for some workloads and bad for others.
+	 *
+	 * The main idea here is that if some CPUs were overcommitted, we try
+	 * to spread which is what the scheduler traditionally did. Sys admins
+	 * must do proper RT planning to avoid overloading the system if they
+	 * really care.
+	 */
+	if (best_unfit_idx != -1)
+		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);
+
 	return 0;
 }
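For reference, the only fitness_fn user at this point is the RT capacity check introduced by the Fixes commit; a caller such as find_lowest_rq() in kernel/sched/rt.c invokes the function roughly as sketched below (an approximation of the call site, not part of this patch):

	/* Approximate call site in find_lowest_rq() (kernel/sched/rt.c). */
	if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask,
			 rt_task_fits_capacity))
		return -1; /* No targets found */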