- 23 Aug, 2022 6 commits
-
-
Chengming Zhou authored
commit 7dc603c9 ("sched/fair: Fix PELT integrity for new tasks") fixed two load tracking problems for new task, including detach on unattached new task problem. There still left another detach on unattached task problem for the task which has been woken up by try_to_wake_up() and waiting for actually being woken up by sched_ttwu_pending(). try_to_wake_up(p) cpu = select_task_rq(p) if (task_cpu(p) != cpu) set_task_cpu(p, cpu) migrate_task_rq_fair() remove_entity_load_avg() --> unattached se->avg.last_update_time = 0; __set_task_cpu() ttwu_queue(p, cpu) ttwu_queue_wakelist() __ttwu_queue_wakelist() task_change_group_fair() detach_task_cfs_rq() detach_entity_cfs_rq() detach_entity_load_avg() --> detach on unattached task set_task_rq() attach_task_cfs_rq() attach_entity_cfs_rq() attach_entity_load_avg() The reason of this problem is similar, we should check in detach_entity_cfs_rq() that se->avg.last_update_time != 0, before do detach_entity_load_avg(). Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220818124805.601-7-zhouchengming@bytedance.com
-
Chengming Zhou authored
When we are migrating task out of the CPU, we can combine detach and propagation into dequeue_entity() to save the detach_entity_cfs_rq() in migrate_task_rq_fair(). This optimization is like combining DO_ATTACH in the enqueue_entity() when migrating task to the CPU. So we don't have to traverse the CFS tree extra time to do the detach_entity_cfs_rq() -> propagate_entity_cfs_rq(), which wouldn't be called anymore with this patch's change. detach_task() deactivate_task() dequeue_task_fair() for_each_sched_entity(se) dequeue_entity() update_load_avg() /* (1) */ detach_entity_load_avg() set_task_cpu() migrate_task_rq_fair() detach_entity_cfs_rq() /* (2) */ update_load_avg(); detach_entity_load_avg(); propagate_entity_cfs_rq(); for_each_sched_entity() update_load_avg() This patch save the detach_entity_cfs_rq() called in (2) by doing the detach_entity_load_avg() for a CPU migrating task inside (1) (the task being the first se in the loop) Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220818124805.601-6-zhouchengming@bytedance.com
-
Chengming Zhou authored
When reading the sched_avg related code, I found the comments in enqueue/dequeue_entity() are not updated with the current code. We don't add/subtract entity's runnable_avg from cfs_rq->runnable_avg during enqueue/dequeue_entity(), those are done only for attach/detach. This patch updates the comments to reflect the current code working. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220818124805.601-5-zhouchengming@bytedance.com
-
Chengming Zhou authored
set_task_rq() -> set_task_rq_fair() will try to synchronize the blocked task's sched_avg when migrate, which is not needed for already detached task. task_change_group_fair() will detached the task sched_avg from prev cfs_rq first, so reset sched_avg last_update_time before set_task_rq() to avoid that. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220818124805.601-4-zhouchengming@bytedance.com
-
Chengming Zhou authored
We use cpu_cgrp_subsys->fork() to set task group for the new fair task in cgroup_post_fork(). Since commit b1e82065 ("sched: Fix yet more sched_fork() races") has already set_task_rq() for the new fair task in sched_cgroup_fork(), so cpu_cgrp_subsys->fork() can be removed. cgroup_can_fork() --> pin parent's sched_task_group sched_cgroup_fork() __set_task_cpu() set_task_rq() cgroup_post_fork() ss->fork() := cpu_cgroup_fork() sched_change_group(..., TASK_SET_GROUP) task_set_group_fair() set_task_rq() --> can be removed After this patch's change, task_change_group_fair() only need to care about task cgroup migration, make the code much simplier. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20220818124805.601-3-zhouchengming@bytedance.com
-
Chengming Zhou authored
Previously we only maintain task se depth in task_move_group_fair(), if a !fair task change task group, its se depth will not be updated, so commit eb7a59b2 ("sched/fair: Reset se-depth when task switched to FAIR") fix the problem by updating se depth in switched_to_fair() too. Then commit daa59407 ("sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair()") unified these two functions, moved se.depth setting to attach_task_cfs_rq(), which further into attach_entity_cfs_rq() with commit df217913 ("sched/fair: Factorize attach/detach entity"). This patch move task se depth maintenance from attach_entity_cfs_rq() to set_task_rq(), which will be called when CPU/cgroup change, so its depth will always be correct. This patch is preparation for the next patch. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220818124805.601-2-zhouchengming@bytedance.com
-
- 04 Aug, 2022 1 commit
-
-
Xin Gao authored
Signed-off-by: Xin Gao <gaoxin@cdjrlc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220719111044.7095-1-gaoxin@cdjrlc.com
-
- 03 Aug, 2022 2 commits
-
-
Bing Huang authored
The load_balance_mask and select_rq_mask percpu variables are only used in kernel/sched/fair.c. Make them static and move their allocation into init_sched_fair_class(). Replace kzalloc_node() with zalloc_cpumask_var_node() to get rid of the CONFIG_CPUMASK_OFFSTACK #ifdef and to align with per-cpu cpumask allocation for RT (local_cpu_mask in init_sched_rt_class()) and DL class (local_cpu_mask_dl in init_sched_dl_class()). [ mingo: Tidied up changelog & touched up the code. ] Signed-off-by: Bing Huang <huangbing@kylinos.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220722213609.3901-1-huangbing775@126.com
-
Hao Jia authored
After commit 7a82e5f5 ("sched/fair: Merge for each idle cpu loop of ILB"), _nohz_idle_balance()'s 'idle' parameter is not used anymore, so we can remove it. Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20220803130223.70419-1-jiahao.os@bytedance.com
-
- 02 Aug, 2022 4 commits
-
-
Zhen Lei authored
Currently, the values of some fields are printed right-aligned, causing the field value to be next to the next field name rather than next to its own field name. So print each field value left-aligned, to make it more readable. Before: stack: 0 pid: 307 ppid: 2 flags:0x00000008 After: stack:0 pid:308 ppid:2 flags:0x0000000a This also makes them print in the same style as the other two fields: task:demo0 state:R running task Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20220727060819.1085-1-thunder.leizhen@huawei.com
-
Dietmar Eggemann authored
Save a multiplication in dl_task_fits_capacity() by using already maintained per-sched_dl_entity (i.e. per-task) `dl_runtime/dl_deadline` (dl_density). cap_scale(dl_deadline, cap) >= dl_runtime dl_deadline * cap >> SCHED_CAPACITY_SHIFT >= dl_runtime cap >= dl_runtime << SCHED_CAPACITY_SHIFT / dl_deadline cap >= (dl_runtime << BW_SHIFT / dl_deadline) >> BW_SHIFT - SCHED_CAPACITY_SHIFT cap >= dl_density >> BW_SHIFT - SCHED_CAPACITY_SHIFT __sched_setscheduler()->__checkparam_dl() ensures that the 2 corner cases (if conditions) `runtime == RUNTIME_INF (-1)` and `period == 0` of to_ratio(deadline, runtime) are not met when setting dl_density in __sched_setscheduler()-> __setscheduler_params()->__setparam_dl(). Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220729111305.1275158-4-dietmar.eggemann@arm.com
-
Dietmar Eggemann authored
dl_cpuset_cpumask_can_shrink() is used to validate whether there is still enough CPU capacity for DL tasks in the reduced cpuset. Currently it still operates on `# remaining CPUs in the cpuset` (1). Change this to use the already capacity-aware DL admission control __dl_overflow() for the `cpumask can shrink` test. dl_b->bw = sched_rt_period << BW_SHIFT / sched_rt_period dl_b->bw * (1) >= currently allocated bandwidth in root_domain (rd) Replace (1) w/ `\Sum CPU capacity in rd >> SCHED_CAPACITY_SHIFT` Adapt __dl_bw_capacity() to take a cpumask instead of a CPU number argument so that `rd->span` and `cpumask of the reduced cpuset` can be used here. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220729111305.1275158-3-dietmar.eggemann@arm.com
-
Dietmar Eggemann authored
Create an inline helper for conditional code to be only executed on asymmetric CPU capacity systems. This makes these (currently ~10 and future) conditions a lot more readable. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220729111305.1275158-2-dietmar.eggemann@arm.com
-
- 01 Aug, 2022 27 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq updates from Thomas Gleixner: "Updates for interrupt core and drivers: Core: - Fix a few inconsistencies between UP and SMP vs interrupt affinities - Small updates and cleanups all over the place New drivers: - LoongArch interrupt controller - Renesas RZ/G2L interrupt controller Updates: - Hotpath optimization for SiFive PLIC - Workaround for broken PLIC edge triggered interrupts - Simall cleanups and improvements as usual" * tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) irqchip/mmp: Declare init functions in common header file irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() genirq: Use for_each_action_of_desc in actions_show() irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch irqchip: Add LoongArch CPU interrupt controller support irqchip: Add Loongson Extended I/O interrupt controller support irqchip/loongson-liointc: Add ACPI init support irqchip/loongson-pch-msi: Add ACPI init support irqchip/loongson-pch-pic: Add ACPI init support irqchip: Add Loongson PCH LPC controller support LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain LoongArch: Use ACPI_GENERIC_GSI for gsi handling genirq/generic_chip: Export irq_unmap_generic_chip ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback APCI: irq: Add support for multiple GSI domains LoongArch: Provisionally add ACPICA data structures irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains irqdomain: Report irq number for NOMAP domains irqchip/gic-v3: Fix comment typo dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer updates from Thomas Gleixner: "Timers, timekeeping and related drivers update: Core: - Make wait_event_hrtimeout() aware of RT/DL tasks New drivers: - R-Car Gen4 timer - Tegra186 timer - Mediatek MT6795 CPUXGPT timer Updates: - Rework suspend/resume handling in timer drivers so it takes inactive clocks into account. - The usual device tree compatible add ons - Small fixed and cleanups all over the place" * tag 'timers-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) wait: Fix __wait_event_hrtimeout for RT/DL tasks clocksource/drivers/sun5i: Remove unnecessary (void*) conversions dt-bindings: timer: allwinner,sun4i-a10-timer: Add D1 compatible dt-bindings: timer: ingenic,tcu: use absolute path to other schema clocksource/drivers/sun4i: Remove unnecessary (void*) conversions dt-bindings: timer: renesas,cmt: Fix R-Car Gen4 fall-out clocksource/drivers/tegra186: Put Kconfig option 'tristate' to 'bool' clocksource/drivers/timer-ti-dm: Make driver selection bool for TI K3 clocksource/drivers/timer-ti-dm: Add compatible for am6 SoCs clocksource/drivers/timer-ti-dm: Make timer selectable for ARCH_K3 clocksource/drivers/timer-ti-dm: Move inline functions to driver for am6 clocksource/drivers/sh_cmt: Add R-Car Gen4 support dt-bindings: timer: renesas,cmt: R-Car V3U is R-Car Gen4 dt-bindings: timer: renesas,cmt: Add r8a779f0 and generic Gen4 CMT support clocksource/drivers/timer-microchip-pit64b: Fix compilation warnings clocksource/drivers/timer-microchip-pit64b: Use mchp_pit64b_{suspend, resume} clocksource/drivers/timer-microchip-pit64b: Remove suspend/resume ops for ce thermal/drivers/rcar_gen3_thermal: Add r8a779f0 support clocksource/drivers/timer-mediatek: Implement CPUXGPT timers dt-bindings: timer: mediatek: Add CPUX System Timer and MT6795 compatible ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf events updates from Ingo Molnar: - Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs. - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data. - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost. - Add new IBS register bits definitions. - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements. * tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/ibs: Add new IBS register bits into header perf/x86/intel: Fix PEBS data source encoding for ADL perf/x86/intel: Fix PEBS memory access info encoding for ADL perf/core: Add a new read format to get a number of lost samples perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments perf/x86/amd/uncore: Add PerfMonV2 DF event format perf/x86/amd/uncore: Detect available DF counters perf/x86/amd/uncore: Use attr_update for format attributes perf/x86/amd/uncore: Use dynamic events array x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking updates from Ingo Molnar: "This was a fairly quiet cycle for the locking subsystem: - lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives that can lose the lock_type & cause false reports. No such mishap was observed in the wild. - jump_label improvements: simplify the cross-arch support of initial NOP patching by making it arch-specific code (used on MIPS only), and remove the s390 initial NOP patching that was superfluous" * tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix lockdep_init_map_*() confusion jump_label: make initial NOP patching the special case jump_label: mips: move module NOP patching into arch code jump_label: s390: avoid pointless initial NOP patching
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler updates from Ingo Molnar: "Load-balancing improvements: - Improve NUMA balancing on AMD Zen systems for affine workloads. - Improve the handling of reduced-capacity CPUs in load-balancing. - Energy Model improvements: fix & refine all the energy fairness metrics (PELT), and remove the conservative threshold requiring 6% energy savings to migrate a task. Doing this improves power efficiency for most workloads, and also increases the reliability of energy-efficiency scheduling. - Optimize/tweak select_idle_cpu() to spend (much) less time searching for an idle CPU on overloaded systems. There's reports of several milliseconds spent there on large systems with large workloads ... [ Since the search logic changed, there might be behavioral side effects. ] - Improve NUMA imbalance behavior. On certain systems with spare capacity, initial placement of tasks is non-deterministic, and such an artificial placement imbalance can persist for a long time, hurting (and sometimes helping) performance. The fix is to make fork-time task placement consistent with runtime NUMA balancing placement. Note that some performance regressions were reported against this, caused by workloads that are not memory bandwith limited, which benefit from the artificial locality of the placement bug(s). Mel Gorman's conclusion, with which we concur, was that consistency is better than random workload benefits from non-deterministic bugs: "Given there is no crystal ball and it's a tradeoff, I think it's better to be consistent and use similar logic at both fork time and runtime even if it doesn't have universal benefit." - Improve core scheduling by fixing a bug in sched_core_update_cookie() that caused unnecessary forced idling. - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly woken tasks. - Fix a newidle balancing bug that introduced unnecessary wakeup latencies. ABI improvements/fixes: - Do not check capabilities and do not issue capability check denial messages when a scheduler syscall doesn't require privileges. (Such as increasing niceness.) - Add forced-idle accounting to cgroups too. - Fix/improve the RSEQ ABI to not just silently accept unknown flags. (No existing tooling is known to have learned to rely on the previous behavior.) - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags. Optimizations: - Optimize & simplify leaf_cfs_rq_list() - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg(). Misc fixes & cleanups: - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems. - Fix a full-NOHZ bug that can in some cases result in the tick not being re-enabled when the last SCHED_RT task is gone from a runqueue but there's still SCHED_OTHER tasks around. - Various PREEMPT_RT related fixes. - Misc cleanups & smaller fixes" * tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rseq: Kill process when unknown flags are encountered in ABI structures rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags sched/core: Fix the bug that task won't enqueue into core tree when update cookie nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt() sched/core: Always flush pending blk_plug sched/fair: fix case with reduced capacity CPU sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling sched/core: add forced idle accounting for cgroups sched/fair: Remove the energy margin in feec() sched/fair: Remove task_util from effective utilization in feec() sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() sched/fair: Rename select_idle_mask to select_rq_mask sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() sched/fair: Decay task PELT values during wakeup migration sched/fair: Provide u64 read for 32-bits arch helper sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg sched: only perform capability check on privileged operation sched: Remove unused function group_first_cpu() sched/fair: Remove redundant word " *" selftests/rseq: check if libc rseq support is registered ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slabLinus Torvalds authored
Pull slab updates from Vlastimil Babka: - An addition of 'accounted' flag to slab allocation tracepoints to indicate memcg_kmem accounting, by Vasily - An optimization of memcg handling in freeing paths, by Muchun - Various smaller fixes and cleanups * tag 'slab-for-5.20_or_6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: move generic bulk alloc/free functions to SLOB mm/sl[au]b: use own bulk free function when bulk alloc failed mm: slab: optimize memcg_slab_free_hook() mm/tracing: add 'accounted' entry into output of allocation tracepoints tools/vm/slabinfo: Handle files in debugfs mm/slub: Simplify __kmem_cache_alias() mm, slab: fix bad alignments
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 updates from Will Deacon: "Highlights include a major rework of our kPTI page-table rewriting code (which makes it both more maintainable and considerably faster in the cases where it is required) as well as significant changes to our early boot code to reduce the need for data cache maintenance and greatly simplify the KASLR relocation dance. Summary: - Remove unused generic cpuidle support (replaced by PSCI version) - Fix documentation describing the kernel virtual address space - Handling of some new CPU errata in Arm implementations - Rework of our exception table code in preparation for handling machine checks (i.e. RAS errors) more gracefully - Switch over to the generic implementation of ioremap() - Fix lockdep tracking in NMI context - Instrument our memory barrier macros for KCSAN - Rework of the kPTI G->nG page-table repainting so that the MMU remains enabled and the boot time is no longer slowed to a crawl for systems which require the late remapping - Enable support for direct swapping of 2MiB transparent huge-pages on systems without MTE - Fix handling of MTE tags with allocating new pages with HW KASAN - Expose the SMIDR register to userspace via sysfs - Continued rework of the stack unwinder, particularly improving the behaviour under KASAN - More repainting of our system register definitions to match the architectural terminology - Improvements to the layout of the vDSO objects - Support for allocating additional bits of HWCAP2 and exposing FEAT_EBF16 to userspace on CPUs that support it - Considerable rework and optimisation of our early boot code to reduce the need for cache maintenance and avoid jumping in and out of the kernel when handling relocation under KASLR - Support for disabling SVE and SME support on the kernel command-line - Support for the Hisilicon HNS3 PMU - Miscellanous cleanups, trivial updates and minor fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits) arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr} arm64: fix KASAN_INLINE arm64/hwcap: Support FEAT_EBF16 arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long arm64/hwcap: Document allocation of upper bits of AT_HWCAP arm64: enable THP_SWAP for arm64 arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: errata: Remove AES hwcap for COMPAT tasks arm64: numa: Don't check node against MAX_NUMNODES drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node() docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags drivers/perf: hisi: add driver for HNS3 PMU drivers/perf: hisi: Add description for HNS3 PMU driver drivers/perf: riscv_pmu_sbi: perf format perf/arm-cci: Use the bitmap API to allocate bitmaps ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68kLinus Torvalds authored
Pull m68k updates from Geert Uytterhoeven: - Use RNG seed from bootinfo block on virt platform - defconfig updates - Minor fixes and improvements * tag 'm68k-for-v5.20-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v5.19-rc1 m68k: Add common forward declaration for show_registers() m68k: mac: Remove forward declaration for mac_nmi_handler() m68k: virt: Fix missing platform_device_unregister() on error in virt_platform_init() m68k: virt: Use RNG seed from bootinfo block m68k: bitops: Change __fls to return and accept unsigned long m68k: Kconfig.machine: Add endif comment m68k: Kconfig.debug: Replace single quotes m68k: Kconfig.cpu: Fix indentation and add endif comments m68k: q40: Align '*' in comments m68k: sun3: Use __func__ to get function's name in an output message m68k: mac: Fix typos in comments m68k: virt: Kconfig minor fixes
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 kdump updates from Borislav Petkov: - Add the ability to pass early an RNG seed to the kernel from the boot loader - Add the ability to pass the IMA measurement of kernel and bootloader to the kexec-ed kernel * tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Use rng seeds from setup_data x86/kexec: Carry forward IMA measurement log on kexec
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 build updates from Borislav Petkov: - Fix stack protector builds when cross compiling with Clang - Other Kbuild improvements and fixes * tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Omit use of bin2c x86/purgatory: Hard-code obj-y in Makefile x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 core updates from Borislav Petkov: - Have invalid MSR accesses warnings appear only once after a pr_warn_once() change broke that - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching infra take care of them instead of having unreadable alternative macros there * tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/extable: Fix ex_handler_msr() print condition x86,nospec: Simplify {JMP,CALL}_NOSPEC
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull misc x86 updates from Borislav Petkov: - Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp - Free the pmem platform device on the registration error path * tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hwmon: (k10temp): Add support for new family 17h and 19h models x86/amd_nb: Add AMD PCI IDs for SMN communication x86/pmem: Fix platform-device leak in error path
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 cpu updates from Borislav Petkov: - Remove the vendor check when selecting MWAIT as the default idle state - Respect idle=nomwait when supplied on the kernel cmdline - Two small cleanups * tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use MSR_IA32_MISC_ENABLE constants x86: Fix comment for X86_FEATURE_ZEN x86: Remove vendor checks from prefer_mwait_c1_over_halt x86: Handle idle=nomwait cmdline properly for x86_idle
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fpu update from Borislav Petkov: - Add machinery to initialize AMX register state in order for AMX-capable CPUs to be able to enter deeper low-power state * tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Add a new flag to initialize the AMX state x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 mm updates from Borislav Petkov: - Rename a PKRU macro to make more sense when reading the code - Update pkeys documentation - Avoid reading contended mm's TLB generation var if not absolutely necessary along with fixing a case where arch_tlbbatch_flush() doesn't adhere to the generation scheme and thus violates the conditions for the above avoidance. * tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/tlb: Ignore f->new_tlb_gen when zero x86/pkeys: Clarify PKRU_AD_KEY macro Documentation/protection-keys: Clean up documentation for User Space pkeys x86/mm/tlb: Avoid reading mm_tlb_gen when possible
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 cleanup from Borislav Petkov: - A single CONFIG_ symbol correction in a comment * tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Refer to the intended config STRICT_DEVMEM in a comment
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 vmware cleanup from Borislav Petkov: - A single statement simplification by using the BIT() macro * tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Use BIT() macro for shifting
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull RAS update from Borislav Petkov: "A single RAS change: - Probe whether hardware error injection (direct MSR writes) is possible when injecting errors on AMD platforms. In some cases, the platform could prohibit those" * tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check whether writes to MCA_STATUS are getting ignored
-
Linus Torvalds authored
Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull acl updates from Christian Brauner: "Last cycle we introduced support for mounting overlayfs on top of idmapped mounts. While looking into additional testing we realized that posix acls don't really work correctly with stacking filesystems on top of idmapped layers. We already knew what the fix were but it would require work that is more suitable for the merge window so we turned off posix acls for v5.19 for overlayfs on top of idmapped layers with Miklos routing my patch upstream in 72a8e05d ("Merge tag 'ovl-fixes-5.19-rc7' [..]"). This contains the work to support posix acls for overlayfs on top of idmapped layers. Since the posix acl fixes should use the new vfs{g,u}id_t work the associated branch has been merged in. (We sent a pull request for this earlier.) We've also pulled in Miklos pull request containing my patch to turn of posix acls on top of idmapped layers. This allowed us to avoid rebasing the branch which we didn't like because we were already at rc7 by then. Merging it in allows this branch to first fix posix acls and then to cleanly revert the temporary fix it brought in by commit 4a47c638 ("ovl: turn of SB_POSIXACL with idmapped layers temporarily"). The last patch in this series adds Seth Forshee as a co-maintainer for idmapped mounts. Seth has been integral to all of this work and is also the main architect behind the filesystem idmapping work which ultimately made filesystems such as FUSE and overlayfs available in containers. He continues to be active in both development and review. I'm very happy he decided to help and he has my full trust. This increases the bus factor which is always great for work like this. I'm honestly very excited about this because I think in general we don't do great in the bringing on new maintainers department" For more explanations of the ACL issues, see https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ * tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: Add Seth Forshee as co-maintainer for idmapped mounts Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" ovl: handle idmappings in ovl_get_acl() acl: make posix_acl_clone() available to overlayfs acl: port to vfs{g,u}id_t acl: move idmapped mount fixup into vfs_{g,s}etxattr() mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
-
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linuxLinus Torvalds authored
Pull fs idmapping updates from Christian Brauner: "This introduces the new vfs{g,u}id_t types we agreed on. Similar to k{g,u}id_t the new types are just simple wrapper structs around regular {g,u}id_t types. They allow to establish a type safety boundary in the VFS for idmapped mounts preventing confusion betwen {g,u}ids mapped into an idmapped mount and {g,u}ids mapped into the caller's or the filesystem's idmapping. An initial set of helpers is introduced that allows to operate on vfs{g,u}id_t types. We will remove all references to non-type safe idmapped mounts helpers in the very near future. The patches do already exist. This converts the core attribute changing codepaths which become significantly easier to reason about because of this change. Just a few highlights here as the patches give detailed overviews of what is happening in the commit messages: - The kernel internal struct iattr contains type safe vfs{g,u}id_t values clearly communicating that these values have to take a given mount's idmapping into account. - The ownership values placed in struct iattr to change ownership are identical for idmapped and non-idmapped mounts going forward. This also allows to simplify stacking filesystems such as overlayfs that change attributes In other words, they always represent the values. - Instead of open coding checks for whether ownership changes have been requested and an actual update of the inode is required we now have small static inline wrappers that abstract this logic away removing a lot of code duplication from individual filesystems that all open-coded the same checks" * tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: mnt_idmapping: align kernel doc and parameter order mnt_idmapping: use new helpers in mapped_fs{g,u}id() fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t mnt_idmapping: return false when comparing two invalid ids attr: fix kernel doc attr: port attribute changes to new types security: pass down mount idmapping to setattr hook quota: port quota helpers mount ids fs: port to iattr ownership update helpers fs: introduce tiny iattr ownership update helpers fs: use mount types in iattr fs: add two type safe mapping helpers mnt_idmapping: add vfs{g,u}id_t
-
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linuxLinus Torvalds authored
Pull file locking updates from Jeff Layton: "Just a couple of flock() patches from Kuniyuki Iwashima. The main change is that this moves a file_lock allocation from the slab to the stack" * tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs/lock: Rearrange ops in flock syscall. fs/lock: Don't allocate file_lock in flock_make_lock().
-
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofsLinus Torvalds authored
Pull erofs updates from Gao Xiang: "First of all, we'd like to add Yue Hu and Jeffle Xu as two new reviewers. Thank them for spending time working on EROFS! There is no major feature outstanding in this cycle, mainly a patchset I worked on to prepare for rolling hash deduplication and folios for compressed data as the next big features. It kills the unneeded PG_error flag dependency as well. Apart from that, there are bugfixes and cleanups as always. Details are listed below: - Add Yue Hu and Jeffle Xu as reviewers - Add the missing wake_up when updating lzma streams - Avoid consecutive detection for Highmem memory - Prepare for multi-reference pclusters and get rid of PG_error - Fix ctx->pos update for NFS export - minor cleanups" * tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (23 commits) erofs: update ctx->pos for every emitted dirent erofs: get rid of the leftover PAGE_SIZE in dir.c erofs: get rid of erofs_prepare_dio() helper erofs: introduce multi-reference pclusters (fully-referenced) erofs: record the longest decompressed size in this round erofs: introduce z_erofs_do_decompressed_bvec() erofs: try to leave (de)compressed_pages on stack if possible erofs: introduce struct z_erofs_decompress_backend erofs: get rid of `z_pagemap_global' erofs: clean up `enum z_erofs_collectmode' erofs: get rid of `enum z_erofs_page_type' erofs: rework online page handling erofs: switch compressed_pages[] to bufvec erofs: introduce `z_erofs_parse_in_bvecs' erofs: drop the old pagevec approach erofs: introduce bufvec to store decompressed buffers erofs: introduce `z_erofs_parse_out_bvecs()' erofs: clean up z_erofs_collector_begin() erofs: get rid of unneeded `inode', `map' and `sb' erofs: avoid consecutive detection for Highmem memory ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds authored
Pull fsnotify updates from Jan Kara: - support for FAN_MARK_IGNORE which untangles some of the not well defined corner cases with fanotify ignore masks - small cleanups * tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix comment typo fanotify: introduce FAN_MARK_IGNORE fanotify: cleanups for fanotify_mark() input validations fanotify: prepare for setting event flags in ignore mask fs: inotify: Fix typo in inotify comment
-
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds authored
Pull ext2 and reiserfs updates from Jan Kara: "A fix for ext2 handling of a corrupted fs image and cleanups in ext2 and reiserfs" * tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Add more validity checks for inode counts fs/reiserfs/inode: remove dead code in _get_block_create_0() fs/ext2: replace ternary operator with min_t()
-
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlmLinus Torvalds authored
Pull dlm updates from David Teigland: - Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead to extraneous warnings when the result arrived. - Tracepoint improvements, e.g. adding the lock resource name. - Delay the completion of lockspace creation until one full recovery cycle has completed. This allows more error cases to be returned to the caller. - Remove warnings from the locking layer about delayed network replies. The recently added midcomms warnings are much more useful. - Begin the process of deprecating two unused lock-timeout-related features. These features now require enabling via a Kconfig option, and enabling them triggers deprecation warnings. We expect to remove the code in v6.2. * tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: move kref_put assert for lkb structs fs: dlm: don't use deprecated timeout features by default fs: dlm: add deprecation Kconfig and warnings for timeouts fs: dlm: remove timeout from dlm_user_adopt_orphan fs: dlm: remove waiter warnings fs: dlm: fix grammar in lowcomms output fs: dlm: add comment about lkb IFL flags fs: dlm: handle recovery result outside of ls_recover fs: dlm: make new_lockspace() wait until recovery completes fs: dlm: call dlm_lsop_recover_prep once fs: dlm: update comments about recovery and membership handling fs: dlm: add resource name to tracepoints fs: dlm: remove additional dereference of lksb fs: dlm: change ast and bast trace order fs: dlm: change posix lock sigint handling fs: dlm: use dlm_plock_info for do_unlock_close fs: dlm: change plock interrupted message to debug again fs: dlm: add pid to debug log fs: dlm: plock use list_first_entry
-
Alexander Aring authored
The unhold_lkb() function decrements the lock's kref, and asserts that the ref count was not the final one. Use the kref_put release function (which should not be called) to call the assert, rather than doing the assert based on the kref_put return value. Using kill_lkb() as the release function doesn't make sense if we only want to assert. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
-
Alexander Aring authored
This patch will disable use of deprecated timeout features if CONFIG_DLM_DEPRECATED_API is not set. The deprecated features will be removed in upcoming kernel release v6.2. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
-