- 02 Oct, 2023 3 commits
-
-
Cyril Hrubis authored
Standardize on a single variant. Signed-off-by:
Cyril Hrubis <chrubis@suse.cz> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-4-chrubis@suse.cz
-
Cyril Hrubis authored
- Describe explicitly that sched_rt_runtime_us is allocated from sched_rt_period_us and hence always less than or equal to that value. - The limit for sched_rt_runtime_us is not INT_MAX-1; rather, it is limited by the value of sched_rt_period_us. If sched_rt_period_us is INT_MAX then sched_rt_runtime_us can be set to INT_MAX as well. Signed-off-by:
Cyril Hrubis <chrubis@suse.cz> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-3-chrubis@suse.cz
-
Cyril Hrubis authored
The validation of the value written to sched_rt_period_us was broken because: - sysctl_sched_rt_period is declared as unsigned int - it is parsed by proc_dointvec() - the range is asserted only after the value has been parsed by proc_dointvec() Because of this, negative values written to the file were stored into an unsigned integer and later interpreted as large positive integers, which passed the check: if (sysctl_sched_rt_period <= 0) return -EINVAL; This commit fixes the parsing by setting an explicit range for both period_us and runtime_us in the sched_rt_sysctls table and processing the values with proc_dointvec_minmax() instead. Alternatively, if we wanted to use the full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it; however, even Documentation/scheduler/sched-rt-group.rst describes the range as 1 to INT_MAX. As far as I can tell, the only problem this causes is that the sysctl file allows writing negative values which, when read back, may confuse userspace. There is also an LTP test being submitted for these sysctl files at: http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@suse.cz/ Signed-off-by:
Cyril Hrubis <chrubis@suse.cz> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz
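For reference, a sysctl entry with an explicit range handled by proc_dointvec_minmax() generally looks like the sketch below. The table layout follows <linux/sysctl.h>; the default value and the exact bounds placed in .extra1/.extra2 here are illustrative, not the verbatim hunk from this patch.

    static unsigned int sysctl_sched_rt_period = 1000000;      /* example default */

    static struct ctl_table sched_rt_sysctls[] = {
            {
                    .procname       = "sched_rt_period_us",
                    .data           = &sysctl_sched_rt_period,
                    .maxlen         = sizeof(unsigned int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec_minmax,
                    .extra1         = SYSCTL_ONE,           /* reject values < 1 */
                    .extra2         = SYSCTL_INT_MAX,       /* reject values > INT_MAX */
            },
            {}
    };

With proc_dointvec_minmax(), out-of-range writes (including negative values) are rejected with -EINVAL before the stored value is ever updated.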
-
- 29 Sep, 2023 4 commits
-
-
Qais Yousef authored
It was useful to track feec() placement decisions and to debug spare capacity and optimization issues vs uclamp_max. Signed-off-by:
Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230916232955.2099394-4-qyousef@layalina.io
-
Qais Yousef authored
find_energy_efficient_cpu() bails out early if the effective util of the task is 0, as the delta at this point will be zero and there's nothing for EAS to do. When uclamp is being used, this could lead to wrong decisions when uclamp_max is set to 0. In this case the task is capped to performance point 0, but it is actually running and consuming energy, and we can benefit from EAS energy calculations. Rework the condition so that it bails out only when both util and uclamp_min are 0. We can do that without needing uclamp_task_util(); remove it. Fixes: d81304bc ("sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition") Signed-off-by:
Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230916232955.2099394-3-qyousef@layalina.io
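For reference, the reworked early-exit test amounts to something like the sketch below; the helper names follow the kernel's existing util/uclamp accessors, but the snippet is a simplified sketch, not the exact diff.

    unsigned long util = task_util_est(p);
    unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);

    /*
     * Bail out only when there is truly nothing for EAS to do: a task with
     * util > 0 but uclamp_max == 0 still runs and consumes energy, so it
     * must go through the energy calculations.
     */
    if (!util && !uclamp_min)
            goto unlock;            /* nothing for EAS to do */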
-
Qais Yousef authored
When uclamp_max is being used, the util of the task could be higher than the spare capacity of the CPU, but due to the uclamp_max value we force-fit it there. The way the condition for checking max_spare_cap in find_energy_efficient_cpu() was constructed, it ignored any CPU whose spare_cap was less than or _equal_ to max_spare_cap. Since we initialize max_spare_cap to 0, this led to never setting max_spare_cap_cpu, hence never performing compute_energy() for this cluster and missing an opportunity for a better, more energy-efficient placement that honours the uclamp_max setting.

    max_spare_cap = 0;
    cpu_cap = capacity_of(cpu) - cpu_util(p);  // 0 if cpu_util(p) is high
    ...
    util_fits_cpu(...);  // will return true if uclamp_max forces it to fit
    ...
    // this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
    if (cpu_cap > max_spare_cap) {
            max_spare_cap = cpu_cap;
            max_spare_cap_cpu = cpu;
    }

prev_spare_cap suffers from a similar problem. Fix the logic by converting the variables into long and treating a value of -1 as 'not populated' instead of 0, which is a viable and correct spare capacity value. We need to be careful to use signed comparison when comparing with cpu_cap in one of the conditions. Fixes: 1d42509e ("sched/fair: Make EAS wakeup placement consider uclamp restrictions") Signed-off-by:
Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230916232955.2099394-2-qyousef@layalina.io
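The gist of the fix can be pictured as follows; this is a condensed sketch (the spare-capacity computation is abbreviated and cpu_util_of() is a stand-in name), not the verbatim hunk.

    long max_spare_cap = -1;                /* was: unsigned long max_spare_cap = 0 */
    int max_spare_cap_cpu = -1;
    long cpu_cap;

    for_each_cpu(cpu, cpus) {
            /* Spare capacity left after placing p; may legitimately be 0. */
            cpu_cap = capacity_of(cpu) - cpu_util_of(cpu, p);   /* stand-in helper */

            if (!util_fits_cpu(util, uclamp_min, uclamp_max, cpu))
                    continue;

            /* Signed comparison: a spare capacity of exactly 0 now beats -1. */
            if (cpu_cap > max_spare_cap) {
                    max_spare_cap = cpu_cap;
                    max_spare_cap_cpu = cpu;
            }
    }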
-
Valentin Schneider authored
dl_rq->dl_nr_migratory is increased whenever a DL entity is enqueued and it has nr_cpus_allowed > 1. Unlike the pushable_dl_tasks tree, dl_rq->dl_nr_migratory includes a dl_rq's current task. This means a dl_rq can have a migratable current, N non-migratable queued tasks, be flagged as overloaded and have its CPU set in the dlo_mask, despite having an empty pushable_tasks tree. Make a dl_rq's overload logic be driven by {enqueue,dequeue}_pushable_dl_task(); in other words, only flag DL RQs as overloaded if they have at least one runnable-but-not-current migratable task. o push_dl_task() is unaffected, as it is a no-op if there are no pushable tasks. o pull_dl_task() no longer scans runqueues whose sole migratable task is their current one, which it can't do anything about anyway. It may also now pull tasks to a DL RQ with dl_nr_running > 1 if only its current task is migratable. Since dl_rq->dl_nr_migratory becomes unused, remove it. RT had the exact same mechanism (rt_rq->rt_nr_migratory), which was dropped in favour of relying on rt_rq->pushable_tasks, see: 612f769e ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask") Signed-off-by:
Valentin Schneider <vschneid@redhat.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Acked-by:
Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230928150251.463109-1-vschneid@redhat.com
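In outline, the new scheme ties the overload state directly to the pushable tree. The helpers named below exist in kernel/sched/deadline.c, but the bodies are a condensed sketch rather than the actual diff.

    static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
    {
            /* ... insert p into rq->dl.pushable_dl_tasks_root ... */
            if (!rq->dl.overloaded) {
                    dl_set_overload(rq);            /* set this CPU in dlo_mask */
                    rq->dl.overloaded = 1;
            }
    }

    static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
    {
            /* ... remove p from rq->dl.pushable_dl_tasks_root ... */
            if (!has_pushable_dl_tasks(rq) && rq->dl.overloaded) {
                    dl_clear_overload(rq);          /* clear this CPU from dlo_mask */
                    rq->dl.overloaded = 0;
            }
    }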
-
- 25 Sep, 2023 1 commit
-
-
Valentin Schneider authored
Sebastian noted that the rto_push_work IRQ work can be queued for a CPU that has an empty pushable_tasks list, which means nothing useful will be done in the IPI other than queueing the work for the next CPU on the rto_mask. rto_push_irq_work_func() only operates on tasks in the pushable_tasks list, but the conditions for that irq_work to be queued (and for a CPU to be added to the rto_mask) rely on rt_rq->rt_nr_migratory instead. rt_nr_migratory is increased whenever an RT task entity is enqueued and it has nr_cpus_allowed > 1. Unlike the pushable_tasks list, rt_nr_migratory includes an rt_rq's current task. This means an rt_rq can have a migratable current, N non-migratable queued tasks, and be flagged as overloaded / have its CPU set in the rto_mask, despite having an empty pushable_tasks list. Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task(). Since rt_rq->{rt_nr_migratory,rt_nr_total} become unused, remove them. Note that the case where the current task is pushed away to make way for a migration-disabled task remains unchanged: the migration-disabled task has to be in the pushable_tasks list in the first place, which means it has nr_cpus_allowed > 1. Reported-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Valentin Schneider <vschneid@redhat.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Tested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20230811112044.3302588-1-vschneid@redhat.com
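In effect (sketched below, not the verbatim change), the rt_rq's overload state now follows the pushable list directly, so the rto_push IRQ work is only queued for CPUs that actually have something to push.

    static inline bool has_pushable_tasks(struct rq *rq)
    {
            return !plist_head_empty(&rq->rt.pushable_tasks);
    }

    /* Driven from enqueue_pushable_task()/dequeue_pushable_task(): */
    if (has_pushable_tasks(rq) && !rq->rt.overloaded) {
            rt_set_overload(rq);            /* CPU becomes a candidate for rto_mask */
            rq->rt.overloaded = 1;
    } else if (!has_pushable_tasks(rq) && rq->rt.overloaded) {
            rt_clear_overload(rq);
            rq->rt.overloaded = 0;
    }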
-
- 24 Sep, 2023 3 commits
-
-
Wang Jinchao authored
Simplify the conditional logic for checking worker flags by splitting the original compound `if` statement into separate `if` and `else if` clauses. This modification retains the previous functionality while eliminating one redundant `if` check, improving code clarity and potentially enhancing performance. Signed-off-by:
Wang Jinchao <wangjinchao@xfusion.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ZOIMvURE99ZRAYEj@fedora
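The shape of the change is generic enough to illustrate with made-up flag names (FLAG_A/FLAG_B and the handlers below are placeholders, not the actual worker-flag code):

    /* Before: a compound test, followed by re-testing one of the flags. */
    if (flags & (FLAG_A | FLAG_B)) {
            if (flags & FLAG_A)
                    handle_a();
            else
                    handle_b();
    }

    /* After: each flag is tested at most once, with identical behaviour. */
    if (flags & FLAG_A)
            handle_a();
    else if (flags & FLAG_B)
            handle_b();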
-
Josh Don authored
We've observed the following warning being hit in distribute_cfs_runtime(): SCHED_WARN_ON(cfs_rq->runtime_remaining > 0) We have the following race: - CPU 0: running bandwidth distribution (distribute_cfs_runtime). It inspects the local cfs_rq and makes its runtime_remaining positive. However, we defer unthrottling the local cfs_rq until after considering all remote cfs_rq's. - CPU 1: starts running bandwidth distribution from the slack timer. When it finds the cfs_rq for CPU 0 on the throttled list, it observes that the cfs_rq is throttled, yet is not on the CSD list, and has a positive runtime_remaining, thus triggering the warning in distribute_cfs_runtime(). To fix this, rework the local unthrottling logic to put the local cfs_rq on a local list, so that any future bandwidth distributions will realize that the cfs_rq is about to be unthrottled. Signed-off-by:
Josh Don <joshdon@google.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922230535.296350-2-joshdon@google.com
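Conceptually, the fix looks like the sketch below: the local cfs_rq is parked on a private list the moment its runtime goes positive, so a concurrent distributor no longer observes the 'throttled, positive runtime, not on the CSD list' combination. Field and helper names mirror kernel/sched/fair.c, but the loop is heavily condensed.

    LIST_HEAD(local_unthrottle);

    list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq, throttled_list) {
            /* ... hand out runtime to this cfs_rq ... */
            if (cfs_rq->runtime_remaining <= 0)
                    continue;

            if (cpu_of(rq_of(cfs_rq)) != smp_processor_id())
                    unthrottle_cfs_rq_async(cfs_rq);        /* remote: via CSD */
            else
                    /* Local CPU: mark it as "about to be unthrottled". */
                    list_add_tail(&cfs_rq->throttled_csd_list, &local_unthrottle);
    }

    /* After the scan, unthrottle the entries on local_unthrottle directly. */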
-
Josh Don authored
This makes the following patch cleaner by avoiding extra CONFIG_SMP conditionals on the availability of rq->throttled_csd_list. Signed-off-by:
Josh Don <joshdon@google.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922230535.296350-1-joshdon@google.com
-
- 22 Sep, 2023 2 commits
-
-
Liming Wu authored
in_atomic_preempt_off() already gets called once in schedule_debug(), which is the only caller of __schedule_bug(). Skip the second call within __schedule_bug(); it should always be true at this point. [ mingo: Clarified the changelog. ] Signed-off-by:
Liming Wu <liming.wu@jaguarmicro.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230825023501.1848-1-liming.wu@jaguarmicro.com
-
Ingo Molnar authored
The list_head counterpart of list_for_each_entry_reverse() was missing, add it to complete the list handling APIs in <linux/list.h>. [ This new API is also relied upon by a WIP scheduler patch, so this variant is not merely a theoretical possibility. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org
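The new iterator presumably mirrors list_for_each(), walking the prev links instead of the next links; a minimal sketch:

    /**
     * list_for_each_reverse - iterate backwards over a list
     * @pos:  the &struct list_head to use as a loop cursor
     * @head: the head of the list
     */
    #define list_for_each_reverse(pos, head) \
            for (pos = (head)->prev; pos != (head); pos = pos->prev)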
-
- 21 Sep, 2023 7 commits
-
-
Ingo Molnar authored
Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Use the same _LINUX_SCHED_ prefix nomenclature as the other 29 header guards in include/linux/sched/ do. Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
It's the only non-trivial header in include/linux/sched/ missing a header guard. Signed-off-by:
Ingo Molnar <mingo@kernel.org>
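The usual pattern, using the _LINUX_SCHED_ prefix discussed above (the macro name below is a placeholder, not the one actually chosen for this header):

    #ifndef _LINUX_SCHED_EXAMPLE_H
    #define _LINUX_SCHED_EXAMPLE_H

    /* ... header contents ... */

    #endif /* _LINUX_SCHED_EXAMPLE_H */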
-
Finn Thain authored
Except on x86, preempt_count is always accessed with READ_ONCE(). Repeated invocations in macros like irq_count() produce repeated loads. These redundant instructions appear in various fast paths. In the one shown below, for example, irq_count() is evaluated during kernel entry if !tick_nohz_full_cpu(smp_processor_id()).

    0001ed0a <irq_enter_rcu>:
       1ed0a:  4e56 0000        linkw %fp,#0
       1ed0e:  200f             movel %sp,%d0
       1ed10:  0280 ffff e000   andil #-8192,%d0
       1ed16:  2040             moveal %d0,%a0
       1ed18:  2028 0008        movel %a0@(8),%d0
       1ed1c:  0680 0001 0000   addil #65536,%d0
       1ed22:  2140 0008        movel %d0,%a0@(8)
       1ed26:  082a 0001 000f   btst #1,%a2@(15)
       1ed2c:  670c             beqs 1ed3a <irq_enter_rcu+0x30>
       1ed2e:  2028 0008        movel %a0@(8),%d0
       1ed32:  2028 0008        movel %a0@(8),%d0
       1ed36:  2028 0008        movel %a0@(8),%d0
       1ed3a:  4e5e             unlk %fp
       1ed3c:  4e75             rts

This patch doesn't prevent the pointless btst and beqs instructions above, but it does eliminate 2 of the 3 pointless move instructions here and elsewhere. On x86, preempt_count is per-cpu data and the problem does not arise, presumably because the compiler is free to optimize more effectively. This patch was tested on m68k and x86. I was expecting no changes to object code for x86 and mostly that's what I saw. However, there were a few places where code generation was perturbed for some reason. The performance issue addressed here is minor on uniprocessor m68k. I got a 0.01% improvement from this patch for a simple "find /sys -false" benchmark. For architectures and workloads susceptible to cache line bounce the improvement is expected to be larger. The only SMP architecture I have is x86, and as x86 is unaffected I have not done any further measurements. Fixes: 15115830 ("preempt: Cleanup the macro maze a bit") Signed-off-by:
Finn Thain <fthain@linux-m68k.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org
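The issue can be illustrated as follows; the cached helper is a sketch of the approach (read preempt_count once and mask the cached value), not the exact hunk.

    /* Generic (non-x86) accessor: every use is an independent volatile load. */
    #define preempt_count()  READ_ONCE(current_thread_info()->preempt_count)

    /*
     * irq_count(), composed from nmi_count() | hardirq_count() | softirq_count(),
     * expands preempt_count() three times, hence the three movel instructions in
     * the listing above.  Masking one cached read needs a single load:
     */
    static __always_inline unsigned long irq_count_once(void)  /* illustrative name */
    {
            unsigned long pc = preempt_count();

            return pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK);
    }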
-
Sebastian Andrzej Siewior authored
Since commit: 8a99b683 ("sched: Move SCHED_DEBUG sysctl to debugfs") the sched_debug interface has moved from /proc to debugfs. The comment still mentions the outdated /proc interface. Update the comment to point to the current location of the interface. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230920130025.412071-3-bigeasy@linutronix.de
-
Sebastian Andrzej Siewior authored
The /proc/sys/kernel/sched_child_runs_first knob is no longer connected since: 5e963f2b ("sched/fair: Commit to EEVDF") Remove it. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230920130025.412071-2-bigeasy@linutronix.de
-
- 19 Sep, 2023 2 commits
-
-
Ingo Molnar authored
The name is a bit opaque - make it clear that this is about wakeup preemption. Also rename the ->check_preempt_curr() methods similarly. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org>
-
Ingo Molnar authored
Other scheduling classes already postfix their similar methods with the class name. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org>
-
- 18 Sep, 2023 4 commits
-
-
GUO Zihua authored
Remove duplicated includes of linux/cgroup.h and linux/psi.h. Both of these headers are included regardless of the config and are protected by ifndef guards, so there is no point in including them again. Signed-off-by:
GUO Zihua <guozihua@huawei.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230818015633.18370-1-guozihua@huawei.com
-
Aaron Lu authored
When using sysbench to benchmark Postgres in a single docker instance with sysbench's nr_threads set to nr_cpu, it is observed that there are times when update_cfs_group() and update_load_avg() show noticeable overhead on a 2-socket/112-core/224-CPU Intel Sapphire Rapids (SPR):

    13.75%  13.74%  [kernel.vmlinux]  [k] update_cfs_group
    10.63%  10.04%  [kernel.vmlinux]  [k] update_load_avg

Annotation shows the cycles are mostly spent on accessing tg->load_avg, with update_load_avg() being the write side and update_cfs_group() being the read side. tg->load_avg is per task group, and when different tasks of the same task group running on different CPUs frequently access tg->load_avg, it can be heavily contended. E.g. when running postgres_sysbench on a 2-socket/112-core/224-CPU Intel Sapphire Rapids, during a 5s window the wakeup number is 14 million and the migration number is 11 million; with each migration, the task's load will transfer from the src cfs_rq to the target cfs_rq and each change involves an update to tg->load_avg. Since the workload can trigger that many wakeups and migrations, the accesses (both read and write) to tg->load_avg can be unbounded. As a result, the two mentioned functions show noticeable overhead. With netperf/nr_client=nr_cpu/UDP_RR, the problem is worse: during a 5s window, the wakeup number is 21 million and the migration number is 14 million; update_cfs_group() costs ~25% and update_load_avg() costs ~16%. Reduce the overhead by limiting updates to tg->load_avg to at most once per ms. The update frequency is a tradeoff between tracking accuracy and overhead; 1ms is chosen because the PELT window is roughly 1ms and it delivered good results for the tests that I've done. After this change, the cost of accessing tg->load_avg is greatly reduced and performance improved. Detailed test results below.
==============================
postgres_sysbench on SPR:
 25% base:  42382±19.8%  patch:  50174±9.5%   (noise)
 50% base:  67626±1.3%   patch:  67365±3.1%   (noise)
 75% base: 100216±1.2%   patch: 112470±0.1%   +12.2%
100% base:  93671±0.4%   patch: 113563±0.2%   +21.2%

==============================
hackbench on ICL:
group=1  base: 114912±5.2%   patch: 117857±2.5%   (noise)
group=4  base: 359902±1.6%   patch: 361685±2.7%   (noise)
group=8  base: 461070±0.8%   patch: 491713±0.3%   +6.6%
group=16 base: 309032±5.0%   patch: 378337±1.3%   +22.4%

=============================
hackbench on SPR:
group=1  base: 100768±2.9%   patch: 103134±2.9%   (noise)
group=4  base: 413830±12.5%  patch: 378660±16.6%  (noise)
group=8  base: 436124±0.6%   patch: 490787±3.2%   +12.5%
group=16 base: 457730±3.2%   patch: 680452±1.3%   +48.8%

============================
netperf/udp_rr on ICL
 25% base: 114413±0.1%  patch: 115111±0.0%  +0.6%
 50% base:  86803±0.5%  patch:  86611±0.0%  (noise)
 75% base:  35959±5.3%  patch:  49801±0.6%  +38.5%
100% base:  61951±6.4%  patch:  70224±0.8%  +13.4%

===========================
netperf/udp_rr on SPR
 25% base: 104954±1.3%  patch: 107312±2.8%  (noise)
 50% base:  55394±4.6%  patch:  54940±7.4%  (noise)
 75% base:  13779±3.1%  patch:  36105±1.1%  +162%
100% base:   9703±3.7%  patch:  28011±0.2%  +189%

==============================================
netperf/tcp_stream on ICL (all in noise range)
 25% base: 43092±0.1%   patch: 42891±0.5%
 50% base: 19278±14.9%  patch: 22369±7.2%
 75% base: 16822±3.0%   patch: 17086±2.3%
100% base: 18216±0.6%   patch: 18078±2.9%

===============================================
netperf/tcp_stream on SPR (all in noise range)
 25% base: 34491±0.3%   patch: 34886±0.5%
 50% base: 19278±14.9%  patch: 22369±7.2%
 75% base: 16822±3.0%   patch: 17086±2.3%
100% base: 18216±0.6%   patch: 18078±2.9%

Reported-by:
Nitin Tekchandani <nitin.tekchandani@intel.com> Suggested-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Aaron Lu <aaron.lu@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by:
David Vernet <void@manifault.com> Tested-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Tested-by:
Swapnil Sapkal <Swapnil.Sapkal@amd.com> Link: https://lkml.kernel.org/r/20230912065808.2530-2-aaron.lu@intel.com
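The rate limiting boils down to something like the sketch below; last_update_tg_load_avg is a per-cfs_rq timestamp added for this purpose (the field name and exact placement are as understood from the patch, so treat the details as approximate).

    static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
    {
            long delta;
            u64 now;

            if (cfs_rq->tg == &root_task_group)
                    return;

            /* Limit updates of the shared tg->load_avg to ~once per ms,
             * roughly the width of one PELT window. */
            now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
            if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
                    return;

            delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
            if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
                    atomic_long_add(delta, &cfs_rq->tg->load_avg);
                    cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
                    cfs_rq->last_update_tg_load_avg = now;
            }
    }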
-
Elliot Berman authored
After commit f5d39b02 ("freezer,sched: Rewrite core freezer logic"), tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN are always woken up on the thaw path. Prior to that commit, tasks could ask the freezer to consider them "frozen enough" via freezer_do_not_count(). The commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which allows the freezer to immediately mark the task as TASK_FROZEN without waking up the task. This is efficient for the suspend path, but on the thaw path the task is always woken up, even if it didn't need to wake up and simply goes back to its TASK_(UN)INTERRUPTIBLE state. Although these tasks are capable of handling the wakeup, we can observe a power/perf impact from the extra wakeup. On Android we observed many tasks waiting in the TASK_FREEZABLE state (particularly due to many of them being binder clients). We observed nearly 4x the number of tasks and a corresponding linear increase in latency and power consumption when thawing the system. The latency increased from ~15ms to ~50ms. Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks. If the task was running before entering the TASK_FROZEN state (__refrigerator()), or if the task received a wake up for the saved state, then the task is woken on thaw. saved_state from PREEMPT_RT locks can be re-used because the freezer would not stomp on the rtlock wait flow: TASK_RTLOCK_WAIT isn't considered freezable. Reported-by:
Prakash Viswalingam <quic_prakashv@quicinc.com> Signed-off-by:
Elliot Berman <quic_eberman@quicinc.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
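A rough sketch of the idea (simplified pseudocode; the real code runs under the task's pi_lock and goes through the regular try_to_wake_up() path):

    /* Freezing: remember what the freezable task was doing. */
    if (READ_ONCE(p->__state) & TASK_FREEZABLE) {
            WRITE_ONCE(p->saved_state, READ_ONCE(p->__state));
            WRITE_ONCE(p->__state, TASK_FROZEN);
    }

    /*
     * Thawing: only tasks that were runnable when frozen (or that received a
     * wakeup matching their saved state) get a real wakeup; the others simply
     * have their saved sleeping state restored, avoiding the spurious wakeup.
     */
    if (p->saved_state == TASK_RUNNING)
            wake_up_state(p, TASK_FROZEN);
    else
            WRITE_ONCE(p->__state, p->saved_state);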
-
Elliot Berman authored
In preparation for freezer to also use saved_state, remove the CONFIG_PREEMPT_RT compilation guard around saved_state. On the arm64 platform I tested which did not have CONFIG_PREEMPT_RT, there was no statistically significant deviation by applying this patch. Test methodology: perf bench sched message -g 40 -l 40 Signed-off-by:
Elliot Berman <quic_eberman@quicinc.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- 15 Sep, 2023 9 commits
-
-
Uros Bizjak authored
Use equivalent do-while loop instead of infinite for loop. There are no asm code changes. Signed-off-by:
Uros Bizjak <ubizjak@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230228161426.4508-1-ubizjak@gmail.com
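The transformation is of this general shape (generic illustration; try_cmpxchg() is the kernel primitive, the surrounding names are placeholders):

    /* Infinite for loop with an explicit break: */
    for (;;) {
            new = compute_next(old);
            if (try_cmpxchg(&var, &old, new))
                    break;
    }

    /* Equivalent do-while form, with the loop condition stated up front: */
    do {
            new = compute_next(old);
    } while (!try_cmpxchg(&var, &old, new));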
-
Chengming Zhou authored
We don't need to maintain the per-queue leaf_cfs_rq_list on !SMP, since it's used for cfs_rq load tracking & balancing on SMP. But the sched debug interface uses it to print per-cfs_rq stats. This patch fixes the !SMP version of cfs_rq_is_decayed(), so the per-queue leaf_cfs_rq_list is also maintained correctly on !SMP, fixing the warning in assert_list_leaf_cfs_rq(). Fixes: 0a00a354 ("sched/fair: Delete useless condition in tg_unthrottle_up()") Reported-by:
Leo Yu-Chi Liang <ycliang@andestech.com> Signed-off-by:
Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Tested-by:
Leo Yu-Chi Liang <ycliang@andestech.com> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Closes: https://lore.kernel.org/all/ZN87UsqkWcFLDxea@swlinux02/ Link: https://lore.kernel.org/r/20230913132031.2242151-1-chengming.zhou@linux.dev
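The !SMP variant presumably reduces to checking whether the cfs_rq has any queued entities, along the lines of the sketch below (not the verbatim hunk):

    /* !CONFIG_SMP variant */
    static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
    {
            /* No PELT state to decay on UP; an empty cfs_rq can be removed. */
            return !cfs_rq->nr_running;
    }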
-
Yury Norov authored
Reword sched_numa_find_nth_cpu() comment and make it kernel-doc compatible. Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-7-yury.norov@gmail.com
-
Yury Norov authored
sched_numa_find_nth_cpu() doesn't handle NUMA_NO_NODE properly, and may crash the kernel if passed it. On the other hand, the only user of sched_numa_find_nth_cpu() has to check for the NUMA_NO_NODE case explicitly. It would be easier for users if this logic were moved into sched_numa_find_nth_cpu(). Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-6-yury.norov@gmail.com
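In other words, the early check moves into the helper itself, roughly as sketched below (the fallback helper name on the last line is a placeholder):

    int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
    {
            /* No node hint: no NUMA ordering to respect, pick the Nth online CPU. */
            if (node == NUMA_NO_NODE)
                    return cpumask_nth_and(cpu, cpus, cpu_online_mask);

            return numa_nth_cpu_search(cpus, cpu, node);    /* placeholder */
    }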
-
Yury Norov authored
When CONFIG_NUMA is enabled, sched_numa_find_nth_cpu() searches for a CPU in sched_domains_numa_masks. The masks includes only online CPUs, so effectively offline CPUs are skipped. When CONFIG_NUMA is disabled, the fallback function should be consistent. Fixes: cd7f5535 ("sched: add sched_numa_find_nth_cpu()") Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-5-yury.norov@gmail.com
-
Yury Norov authored
When the node provided by user is CPU-less, corresponding record in sched_domains_numa_masks is not set. Trying to dereference it in the following code leads to kernel crash. To avoid it, start searching from the nearest node with CPUs. Fixes: cd7f5535 ("sched: add sched_numa_find_nth_cpu()") Reported-by:
Yicong Yang <yangyicong@hisilicon.com> Reported-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Yicong Yang <yangyicong@hisilicon.com> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-4-yury.norov@gmail.com Closes: https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/ Closes: https://lore.kernel.org/lkml/ZMHSNQfv39HN068m@yury-ThinkPad/T/#mf6431cb0b7f6f05193c41adeee444bc95bf2b1c4
-
Yury Norov authored
task_numa_placement() searches for a nearest node to migrate by calling for_each_node_state(). Now that we have numa_nearest_node(), switch to using it. Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-3-yury.norov@gmail.com
-
Yury Norov authored
The function in fact searches the nearest node for a given one, based on a N_ONLINE state. This is a common pattern to search for a nearest node. This patch converts numa_map_to_online_node() to numa_nearest_node() so that others won't need to opencode the logic. Signed-off-by:
Yury Norov <yury.norov@gmail.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20230819141239.287290-2-yury.norov@gmail.com
-
Matthew Wilcox (Oracle) authored
list_for_each_entry_rcu() takes an optional fourth argument which allows RCU to assert that the correct lock is held. Several callers of for_each_thread() rely on their caller to be holding the appropriate lock, so this is a useful assertion to include. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20230821134428.2504912-1-willy@infradead.org
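For context, the optional fourth argument of list_for_each_entry_rcu() is a lockdep expression; the exact condition used for for_each_thread() is in the patch, so the one below is illustrative only.

    #define for_each_thread(p, t)                                           \
            list_for_each_entry_rcu(t, &(p)->signal->thread_head,           \
                                    thread_node,                            \
                                    lockdep_is_held(&tasklist_lock))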
-
- 13 Sep, 2023 5 commits
-
-
Peter Zijlstra authored
Random remaining guard use... Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
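For readers unfamiliar with the <linux/cleanup.h> helpers used across this series, the conversions follow this general pattern (illustrative lock and function names):

    static DEFINE_RAW_SPINLOCK(example_lock);

    /* Before: every exit path must remember to drop the lock. */
    raw_spin_lock_irqsave(&example_lock, flags);
    if (!cond) {
            raw_spin_unlock_irqrestore(&example_lock, flags);
            return;
    }
    do_work();
    raw_spin_unlock_irqrestore(&example_lock, flags);

    /* After: the guard releases the lock automatically when the scope is left. */
    scoped_guard (raw_spinlock_irqsave, &example_lock) {
            if (!cond)
                    return;
            do_work();
    }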
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-