An error occurred fetching the project authors.
- 04 Jul, 2022 10 commits
-
-
Sudeep Holla authored
Finally let us add support for socket nodes in /cpu-map in the device tree. Since this may not be present in all the old platforms and even most of the existing platforms, we need to assume absence of the socket node indicates that it is a single socket system and handle appropriately. Also it is likely that most single socket systems skip to as the node since it is optional. Link: https://lore.kernel.org/r/20220704101605.1318280-20-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Let us set the cluster identifier as parsed from the device tree cluster nodes within /cpu-map. We don't support nesting of clusters yet as there are no real hardware to support clusters of clusters. Link: https://lore.kernel.org/r/20220704101605.1318280-19-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Ionela Voinescu authored
Currently the cluster identifier is not set on DT based platforms. The reset or default value is -1 for all the CPUs. Once we assign the cluster identifier values correctly, the cluster_sibling mask will be populated and returned by cpu_clustergroup_mask() to contribute in the creation of the CLS scheduling domain level, if SCHED_CLUSTER is enabled. To avoid topologies that will result in questionable or incorrect scheduling domains, impose restrictions regarding the span of clusters, as presented to scheduling domains building code: cluster_sibling should not span more or the same CPUs as cpu_coregroup_mask(). This is needed in order to obtain a strict separation between the MC and CLS levels, and maintain the same domains for existing platforms in the presence of CONFIG_SCHED_CLUSTER, where the new cluster information is redundant and irrelevant for the scheduler. While previously the scheduling domain builder code would have removed MC as redundant and kept CLS if SCHED_CLUSTER was enabled and the cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs, now CLS will be removed and MC kept. Link: https://lore.kernel.org/r/20220704101605.1318280-18-sudeep.holla@arm.com Cc: Darren Hart <darren@os.amperecomputing.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Acked-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Currently as we parse the CPU topology from /cpu-map node from the device tree, we assign generated cluster count as the physical package identifier for each CPU which is wrong. The device tree bindings for CPU topology supports sockets to infer the socket or physical package identifier for a given CPU. Since it is fairly new and not supported on most of the old and existing systems, we can assume all such systems have single socket/physical package. Fix the physical package identifier to 0 by removing the assignment of cluster identifier to the same. Link: https://lore.kernel.org/r/20220704101605.1318280-17-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
There is no point in looping through all the CPU's physical package identifier to check if it is valid or not once a CPU which is outside the topology(i.e. outlier CPU) is found. Let us just break out of the loop early in such case. Link: https://lore.kernel.org/r/20220704101605.1318280-16-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Instead of just comparing the cpu topology IDs with -1 to check their validity, improve that by checking for a valid non-negative value. Link: https://lore.kernel.org/r/20220704101605.1318280-15-sudeep.holla@arm.comSuggested-by:
Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Currently the cluster identifier is not set on the DT based platforms. The reset or default value is -1 for all the CPUs. Once we assign the cluster identifier values correctly that may result in getting the thread siblings wrong as the core identifiers can be same for 2 different CPUs belonging to 2 different cluster. So, in order to get the thread sibling cpumasks correct, we need to update them only if the cores they belong are in the same cluster within the socket. Let us skip updation of the thread sibling cpumaks if the cluster identifier doesn't match. This change won't affect even if the cluster identifiers are not set currently but will avoid any breakage once we set the same correctly. Link: https://lore.kernel.org/r/20220704101605.1318280-14-sudeep.holla@arm.comTested-by:
Gavin Shan <gshan@redhat.com> Tested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Since the cacheinfo LLC information is used directly in arch_topology, there is no need to parse and store the LLC ID information only for ACPI systems in the CPU topology. Remove the redundant LLC ID from the generic CPU arch_topology information. Link: https://lore.kernel.org/r/20220704101605.1318280-13-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
The cacheinfo is now initialised early along with the CPU topology initialisation. Instead of relying on the LLC ID information parsed separately only with ACPI PPTT elsewhere, migrate to use the similar information from the cacheinfo. This is generic for both DT and ACPI systems. The ACPI LLC ID information parsed separately can now be removed from arch specific code. Link: https://lore.kernel.org/r/20220704101605.1318280-11-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Sudeep Holla authored
Currently ACPI populates just the minimum information about the last level cache from PPTT in order to feed the same to build sched_domains. Similar support for DT platforms is not present. In order to enable the same, the entire cache hierarchy information can be built as part of CPU topoplogy parsing both on ACPI and DT platforms. Note that this change builds the cacheinfo early even on ACPI systems, but the current mechanism of building llc_sibling mask remains unchanged. Link: https://lore.kernel.org/r/20220704101605.1318280-10-sudeep.holla@arm.comTested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
- 06 May, 2022 1 commit
-
-
Lukasz Luba authored
Add trace event to capture the moment of the call for updating the thermal pressure value. It's helpful to investigate how often those events occur in a system dealing with throttling. This trace event is needed since the old 'cdev_update' might not be used by some drivers. The old 'cdev_update' trace event only provides a cooling state value: [0, n]. That state value then needs additional tools to translate it: state -> freq -> capacity -> thermal pressure. This new trace event just stores proper thermal pressure value in the trace buffer, no need for additional logic. This is helpful for cooperation when someone can simply sends to the list the trace buffer output from the platform (no need from additional information from other subsystems). There are also platforms which due to some design reasons don't use cooling devices and thus don't trigger old 'cdev_update' trace event. They are also important and measuring latency for the thermal signal raising/decaying characteristics is in scope. This new trace event would cover them as well. We already have a trace point 'pelt_thermal_tp' which after a change to trace event can be paired with this new 'thermal_pressure_update' and derive more insight what is going on in the system under thermal pressure (and why). Signed-off-by:
Lukasz Luba <lukasz.luba@arm.com> Acked-by:
Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20220427080806.1906-1-lukasz.luba@arm.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 20 Apr, 2022 2 commits
-
-
Wang Qing authored
When ACPI is not enabled, cpuid_topo->llc_id = cpu_topo->llc_id = -1, which will set llc_sibling 0xff(...), this is misleading. Don't set llc_sibling(default 0) if we don't know the cache topology. Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Wang Qing <wangqing@vivo.com> Fixes: 37c3ec2d ("arm64: topology: divorce MC scheduling domain from core_siblings") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1649644580-54626-1-git-send-email-wangqing@vivo.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darren Hart authored
Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop Control Unit, but have no shared CPU-side last level cache. cpu_coregroup_mask() will return a cpumask with weight 1, while cpu_clustergroup_mask() will return a cpumask with weight 2. As a result, build_sched_domain() will BUG() once per CPU with: BUG: arch topology borken the CLS domain not a subset of the MC domain The MC level cpumask is then extended to that of the CLS child, and is later removed entirely as redundant. This sched domain topology is an improvement over previous topologies, or those built without SCHED_CLUSTER, particularly for certain latency sensitive workloads. With the current scheduler model and heuristics, this is a desirable default topology for Ampere Altra and Altra Max system. Rather than create a custom sched domains topology structure and introduce new logic in arch/arm64 to detect these systems, update the core_mask so coregroup is never a subset of clustergroup, extending it to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER is enabled to avoid also changing the topology (MC) when CONFIG_SCHED_CLUSTER is disabled. This has the added benefit over a custom topology of working for both symmetric and asymmetric topologies. It does not address systems where the CLUSTER topology is above a populated MC topology, but these are not considered today and can be addressed separately if and when they appear. The final sched domain topology for a 2 socket Ampere Altra system is unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided: For CPU0: CONFIG_SCHED_CLUSTER=y CLS [0-1] DIE [0-79] NUMA [0-159] CONFIG_SCHED_CLUSTER is not set DIE [0-79] NUMA [0-159] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: D. Scott Phillips <scott@os.amperecomputing.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: <stable@vger.kernel.org> # 5.16.x Suggested-by:
Barry Song <song.bao.hua@hisilicon.com> Reviewed-by:
Barry Song <song.bao.hua@hisilicon.com> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Darren Hart <darren@os.amperecomputing.com> Link: https://lore.kernel.org/r/c8fe9fce7c86ed56b4c455b8c902982dc2303868.1649696956.git.darren@os.amperecomputing.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 Mar, 2022 1 commit
-
-
Ionela Voinescu authored
Define topology_init_cpu_capacity_cppc() to use highest performance values from _CPC objects to obtain and set maximum capacity information for each CPU. acpi_cppc_processor_probe() is a good point at which to trigger the initialization of CPU (u-arch) capacity values, as at this point the highest performance values can be obtained from each CPU's _CPC objects. Architectures can therefore use this functionality through arch_init_invariance_cppc(). The performance scale used by CPPC is a unified scale for all CPUs in the system. Therefore, by obtaining the raw highest performance values from the _CPC objects, and normalizing them on the [0, 1024] capacity scale, used by the task scheduler, we obtain the CPU capacity of each CPU. While an ACPI Notify(0x85) could alert about a change in the highest performance value, which should in turn retrigger the CPU capacity computations, this notification is not currently handled by the ACPI processor driver. When supported, a call to arch_init_invariance_cppc() would perform the update. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by:
Sudeep Holla <sudeep.holla@arm.com> Tested-by:
Valentin Schneider <valentin.schneider@arm.com> Tested-by:
Yicong Yang <yangyicong@hisilicon.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 23 Nov, 2021 2 commits
-
-
Lukasz Luba authored
There is no need of this function (and related) since code has been converted to use the new arch_update_thermal_pressure() API. The old code can be removed. Signed-off-by:
Lukasz Luba <lukasz.luba@arm.com> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
Lukasz Luba authored
The thermal pressure is a mechanism which is used for providing information about reduced CPU performance to the scheduler. Usually code has to convert the value from frequency units into capacity units, which are understandable by the scheduler. Create a common conversion code which can be just used via a handy API. Internally, the topology_update_thermal_pressure() operates on frequency in MHz and max CPU frequency is taken from 'freq_factor' (per-cpu). Signed-off-by:
Lukasz Luba <lukasz.luba@arm.com> Reviewed-by:
Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
- 11 Nov, 2021 1 commit
-
-
Wang ShaoBo authored
When testing cpu online and offline, warning happened like this: [ 146.746743] WARNING: CPU: 92 PID: 974 at kernel/sched/topology.c:2215 build_sched_domains+0x81c/0x11b0 [ 146.749988] CPU: 92 PID: 974 Comm: kworker/92:2 Not tainted 5.15.0 #9 [ 146.750402] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.79 08/21/2021 [ 146.751213] Workqueue: events cpuset_hotplug_workfn [ 146.751629] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 146.752048] pc : build_sched_domains+0x81c/0x11b0 [ 146.752461] lr : build_sched_domains+0x414/0x11b0 [ 146.752860] sp : ffff800040a83a80 [ 146.753247] x29: ffff800040a83a80 x28: ffff20801f13a980 x27: ffff20800448ae00 [ 146.753644] x26: ffff800012a858e8 x25: ffff800012ea48c0 x24: 0000000000000000 [ 146.754039] x23: ffff800010ab7d60 x22: ffff800012f03758 x21: 000000000000005f [ 146.754427] x20: 000000000000005c x19: ffff004080012840 x18: ffffffffffffffff [ 146.754814] x17: 3661613030303230 x16: 30303078303a3239 x15: ffff800011f92b48 [ 146.755197] x14: ffff20be3f95cef6 x13: 2e6e69616d6f642d x12: 6465686373204c4c [ 146.755578] x11: ffff20bf7fc83a00 x10: 0000000000000040 x9 : 0000000000000000 [ 146.755957] x8 : 0000000000000002 x7 : ffffffffe0000000 x6 : 0000000000000002 [ 146.756334] x5 : 0000000090000000 x4 : 00000000f0000000 x3 : 0000000000000001 [ 146.756705] x2 : 0000000000000080 x1 : ffff800012f03860 x0 : 0000000000000001 [ 146.757070] Call trace: [ 146.757421] build_sched_domains+0x81c/0x11b0 [ 146.757771] partition_sched_domains_locked+0x57c/0x978 [ 146.758118] rebuild_sched_domains_locked+0x44c/0x7f0 [ 146.758460] rebuild_sched_domains+0x2c/0x48 [ 146.758791] cpuset_hotplug_workfn+0x3fc/0x888 [ 146.759114] process_one_work+0x1f4/0x480 [ 146.759429] worker_thread+0x48/0x460 [ 146.759734] kthread+0x158/0x168 [ 146.760030] ret_from_fork+0x10/0x20 [ 146.760318] ---[ end trace 82c44aad6900e81a ]--- For some architectures like risc-v and arm64 which use common code clear_cpu_topology() in shutting down CPUx, When CONFIG_SCHED_CLUSTER is set, cluster_sibling in cpu_topology of each sibling adjacent to CPUx is missed clearing, this causes checking failed in topology_span_sane() and rebuilding topology failure at end when CPU online. Different sibling's cluster_sibling in cpu_topology[] when CPU92 offline (CPU 92, 93, 94, 95 are in one cluster): Before revision: CPU [92] [93] [94] [95] cluster_sibling [92] [92-95] [92-95] [92-95] After revision: CPU [92] [93] [94] [95] cluster_sibling [92] [93-95] [93-95] [93-95] Signed-off-by:
Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Barry Song <song.bao.hua@hisilicon.com> Tested-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20211110095856.469360-1-bobo.shaobowang@huawei.com
-
- 15 Oct, 2021 1 commit
-
-
Jonathan Cameron authored
Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Nodes Structure representing a higher level of topology. For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cpus. All clusters share L3 cache data, but each cluster has local L3 tag. On the other hand, each clusters will share some internal system bus. +-----------------------------------+ +---------+ | +------+ +------+ +--------------------------+ | | | CPU0 | | cpu1 | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ cluster | | tag | | | | | CPU2 | | CPU3 | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +----+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | L3 | | data | +-----------------------------------+ | | | +------+ +------+ | +-----------+ | | | | | | | | | | | | | +------+ +------+ +----+ L3 | | | | | | tag | | | | +------+ +------+ | | | | | | | | | | | +-----------+ | | | +------+ +------+ +--------------------------+ | +-----------------------------------| | | +-----------------------------------| | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | +----+ L3 | | | | +------+ +------+ | | tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +---+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | | | +-----------------------------------+ | | +-----------------------------------+ | | | +------+ +------+ +--------------------------+ | | | | | | | +-----------+ | | | +------+ +------+ | | | | | | | | L3 | | | | +------+ +------+ +--+ tag | | | | | | | | | | | | | | +------+ +------+ | +-----------+ | | | | +---------+ +-----------------------------------+ That means spreading tasks among clusters will bring more bandwidth while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly to achieve either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput. This patch exposes cluster topology to both kernel and userspace. Libraried like hwloc will know cluster by cluster_cpus and related sysfs attributes. PoC of HWLOC support at [2]. Note this patch only handle the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology but that may be unrelate to the cluster level. [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node structure (Type 0) [2] https://github.com/hisilicon/hwloc/tree/linux-clusterSigned-off-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Tian Tao <tiantao6@hisilicon.com> Signed-off-by:
Barry Song <song.bao.hua@hisilicon.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com
-
- 05 Oct, 2021 1 commit
-
-
Mianhan Liu authored
arch_topology.c hasn't use any macro or function declared in linux/percpu.h, linux/smp.h and linux/string.h. Thus, these files can be removed from arch_topology.c safely without affecting the compilation of the drivers/base/ module Signed-off-by:
Mianhan Liu <liumh1@shanghaitech.edu.cn> Link: https://lore.kernel.org/r/20210928193138.24192-1-liumh1@shanghaitech.edu.cnSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 30 Aug, 2021 1 commit
-
-
Thara Gopinath authored
Add interrupt support to notify the kernel of h/w initiated frequency throttling by LMh. Convey this to scheduler via thermal presssure interface. Signed-off-by:
Thara Gopinath <thara.gopinath@linaro.org> [Viresh: Added changes for arch_topology.c to fix build errors ] Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
- 01 Jul, 2021 1 commit
-
-
Viresh Kumar authored
Currently topology_scale_freq_tick() (which gets called from scheduler_tick()) may end up using a pointer to "struct scale_freq_data", which was previously cleared by topology_clear_scale_freq_source(), as there is no protection in place here. The users of topology_clear_scale_freq_source() though needs a guarantee that the previously cleared scale_freq_data isn't used anymore, so they can free the related resources. Since topology_scale_freq_tick() is called from scheduler tick, we don't want to add locking in there. Use the RCU update mechanism instead (which is already used by the scheduler's utilization update path) to guarantee race free updates here. synchronize_rcu() makes sure that all RCU critical sections that started before it is called, will finish before it returns. And so the callers of topology_clear_scale_freq_source() don't need to worry about their callback getting called anymore. Cc: Paul E. McKenney <paulmck@kernel.org> Fixes: 01e055c1 ("arch_topology: Allow multiple entities to provide sched_freq_tick() callback") Tested-by:
Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Qian Cai <quic_qiancai@quicinc.com> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
- 12 Mar, 2021 1 commit
-
-
Viresh Kumar authored
It is possible now for other parts of the kernel to provide their own implementation of sched_freq_tick() and they can very well be modules themselves (like CPPC cpufreq driver, which is going to use these in a later commit). Export arch_freq_scale and topology_{set|clear}_scale_freq_source(). Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
- 10 Mar, 2021 2 commits
-
-
Viresh Kumar authored
This patch attempts to make it generic enough so other parts of the kernel can also provide their own implementation of scale_freq_tick() callback, which is called by the scheduler periodically to update the per-cpu arch_freq_scale variable. The implementations now need to provide 'struct scale_freq_data' for the CPUs for which they have hardware counters available, and a callback gets registered for each possible CPU in a per-cpu variable. The arch specific (or ARM AMU) counters are updated to adapt to this and they take the highest priority if they are available, i.e. they will be used instead of CPPC based counters for example. The special code to rebuild the sched domains, in case invariance status change for the system, is moved out of arm64 specific code and is added to arch_topology.c. Note that this also defines SCALE_FREQ_SOURCE_CPUFREQ but doesn't use it and it is added to show that cpufreq is also acts as source of information for FIE and will be used by default if no other counters are supported for a platform. Reviewed-by:
Ionela Voinescu <ionela.voinescu@arm.com> Tested-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Will Deacon <will@kernel.org> # for arm64 Tested-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
Viresh Kumar authored
Rename freq_scale to a less generic name, as it will get exported soon for modules. Since x86 already names its own implementation of this as arch_freq_scale, lets stick to that. Suggested-by:
Will Deacon <will@kernel.org> Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org>
-
- 08 Oct, 2020 1 commit
-
-
Ionela Voinescu authored
Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 02 Oct, 2020 1 commit
-
-
Joe Perches authored
Convert the various sprintf fmaily calls in sysfs device show functions to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety. Done with: $ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 . And cocci script: $ cat sysfs_emit_dev.cocci @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Signed-off-by:
Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 Sep, 2020 3 commits
-
-
Valentin Schneider authored
arch_scale_freq_invariant() is used by schedutil to determine whether the scheduler's load-tracking signals are frequency invariant. Its definition is overridable, though by default it is hardcoded to 'true' if arch_scale_freq_capacity() is defined ('false' otherwise). This behaviour is not overridden on arm, arm64 and other users of the generic arch topology driver, which is somewhat precarious: arch_scale_freq_capacity() will always be defined, yet not all cpufreq drivers are guaranteed to drive the frequency invariance scale factor setting. In other words, the load-tracking signals may very well *not* be frequency invariant. Now that cpufreq can be queried on whether the current driver is driving the Frequency Invariance (FI) scale setting, the current situation can be improved. This combines the query of whether cpufreq supports the setting of the frequency scale factor, with whether all online CPUs are counter-based FI enabled. While cpufreq FI enablement applies at system level, for all CPUs, counter-based FI support could also be used for only a subset of CPUs to set the invariance scale factor. Therefore, if cpufreq-based FI support is present, we consider the system to be invariant. If missing, we require all online CPUs to be counter-based FI enabled in order for the full system to be considered invariant. If the system ends up not being invariant, a new condition is needed in the counter initialization code that disables all scale factor setting based on counters. Precedence of counters over cpufreq use is not important here. The invariant status is only given to the system if all CPUs have at least one method of setting the frequency scale factor. Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com> Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Valentin Schneider authored
The passed cpumask arguments to arch_set_freq_scale() and arch_freq_counters_available() are only iterated over, so reflect this in the prototype. This also allows to pass system cpumasks like cpu_online_mask without getting a warning. Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com> Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Ionela Voinescu authored
The current frequency passed to arch_set_freq_scale() could end up being 0, signaling an error in setting a new frequency. Also, if the maximum frequency in 0, this will result in a division by 0 error. Therefore, validate these input values before using them for the setting of the frequency scale factor. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 22 Jul, 2020 1 commit
-
-
Valentin Schneider authored
The following commit: 14533a16 ("thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code") moved the definition of arch_set_thermal_pressure() to sched/core.c, but kept its declaration in linux/arch_topology.h. When building e.g. an x86 kernel with CONFIG_SCHED_THERMAL_PRESSURE=y, cpufreq_cooling.c ends up getting the declaration of arch_set_thermal_pressure() from include/linux/arch_topology.h, which is somewhat awkward. On top of this, sched/core.c unconditionally defines o The thermal_pressure percpu variable o arch_set_thermal_pressure() while arch_scale_thermal_pressure() does nothing unless redefined by the architecture. arch_*() functions are meant to be defined by architectures, so revert the aforementioned commit and re-implement it in a way that keeps arch_set_thermal_pressure() architecture-definable, and doesn't define the thermal pressure percpu variable for kernels that don't need it (CONFIG_SCHED_THERMAL_PRESSURE=n). Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200712165917.9168-2-valentin.schneider@arm.com
-
- 18 Mar, 2020 1 commit
-
-
Jeffy Chen authored
Add a sanity check before putting the cpu clk. Fixes: b8fe128d (“arch_topology: Adjust initial CPU capacities with current freq") Signed-off-by:
Jeffy Chen <jeffy.chen@rock-chips.com> Link: https://lore.kernel.org/r/20200317063308.23209-1-jeffy.chen@rock-chips.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 11 Mar, 2020 2 commits
-
-
Zeng Tao authored
Currently there are only 10 bytes to store the cpu-topology 'name' information. Only 10 bytes copied into cluster/thread/core names. If the cluster ID exceeds 2-digit number, it will result in the data corruption, and ending up in a dead loop in the parsing routines. The same applies to the thread names with more that 3-digit number. This issue was found using the boundary tests under virtualised environment like QEMU. Let us increase the buffer to fix such potential issues. Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/1583294092-5929-1-git-send-email-prime.zeng@hisilicon.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jeffy Chen authored
The CPU freqs are not supposed to change before cpufreq policies properly registered, meaning that they should be used to calculate the initial CPU capacities. Doing this helps choosing the best CPU during early boot, especially for the initramfs decompressing. There's no functional changes for non-clk CPU DVFS mechanism. Signed-off-by:
Jeffy Chen <jeffy.chen@rock-chips.com> Link: https://lore.kernel.org/r/20200113034815.25924-1-jeffy.chen@rock-chips.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 Mar, 2020 3 commits
-
-
Ionela Voinescu authored
The Frequency Invariance Engine (FIE) is providing a frequency scaling correction factor that helps achieve more accurate load-tracking. So far, for arm and arm64 platforms, this scale factor has been obtained based on the ratio between the current frequency and the maximum supported frequency recorded by the cpufreq policy. The setting of this scale factor is triggered from cpufreq drivers by calling arch_set_freq_scale. The current frequency used in computation is the frequency requested by a governor, but it may not be the frequency that was implemented by the platform. This correction factor can also be obtained using a core counter and a constant counter to get information on the performance (frequency based only) obtained in a period of time. This will more accurately reflect the actual current frequency of the CPU, compared with the alternative implementation that reflects the request of a performance level from the OS. Therefore, implement arch_scale_freq_tick to use activity monitors, if present, for the computation of the frequency scale factor. The use of AMU counters depends on: - CONFIG_ARM64_AMU_EXTN - depents on the AMU extension being present - CONFIG_CPU_FREQ - the current frequency obtained using counter information is divided by the maximum frequency obtained from the cpufreq policy. While it is possible to have a combination of CPUs in the system with and without support for activity monitors, the use of counters for frequency invariance is only enabled for a CPU if all related CPUs (CPUs in the same frequency domain) support and have enabled the core and constant activity monitor counters. In this way, there is a clear separation between the policies for which arch_set_freq_scale (cpufreq based FIE) is used, and the policies for which arch_scale_freq_tick (counter based FIE) is used to set the frequency scale factor. For this purpose, a late_initcall_sync is registered to trigger validation work for policies that will enable or disable the use of AMU counters for frequency invariance. If CONFIG_CPU_FREQ is not defined, the use of counters is enabled on all CPUs only if all possible CPUs correctly support the necessary counters. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by:
Lukasz Luba <lukasz.luba@arm.com> Acked-by:
Sudeep Holla <sudeep.holla@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
Ingo Molnar authored
drivers/base/arch_topology.c is only built if CONFIG_GENERIC_ARCH_TOPOLOGY=y, resulting in such build failures: cpufreq_cooling.c:(.text+0x1e7): undefined reference to `arch_set_thermal_pressure' Move it to sched/core.c instead, and keep it enabled on x86 despite us not having a arch_scale_thermal_pressure() facility there, to build-test this thing. Cc: Thara Gopinath <thara.gopinath@linaro.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Thara Gopinath authored
Add architecture specific APIs to update and track thermal pressure on a per CPU basis. A per CPU variable thermal_pressure is introduced to keep track of instantaneous per CPU thermal pressure. Thermal pressure is the delta between maximum capacity and capped capacity due to a thermal event. topology_get_thermal_pressure can be hooked into the scheduler specified arch_scale_thermal_pressure to retrieve instantaneous thermal pressure of a CPU. arch_set_thermal_pressure can be used to update the thermal pressure. Considering topology_get_thermal_pressure reads thermal_pressure and arch_set_thermal_pressure writes into thermal_pressure, one can argue for some sort of locking mechanism to avoid a stale value. But considering topology_get_thermal_pressure can be called from a system critical path like scheduler tick function, a locking mechanism is not ideal. This means that it is possible the thermal_pressure value used to calculate average thermal pressure for a CPU can be stale for up to 1 tick period. Signed-off-by:
Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200222005213.3873-4-thara.gopinath@linaro.org
-
- 17 Jan, 2020 1 commit
-
-
Zeng Tao authored
When the kernel is configured with CONFIG_NR_CPUS smaller than the number of CPU nodes in the device tree(DT), all the CPU nodes parsing done to fetch topology information will fail. This is not reasonable as it is legal to have all the physical CPUs in the system in the DT. Let us just skip such CPU DT nodes that are not used in the kernel rather than returning an error. Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/1579225973-32423-1-git-send-email-prime.zeng@hisilicon.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Aug, 2019 1 commit
-
-
Viresh Kumar authored
CPUFREQ_NOTIFY is going to get removed soon, lets use CPUFREQ_CREATE_POLICY instead of that here. CPUFREQ_CREATE_POLICY is called only once (which is exactly what we want here) for each cpufreq policy when it is first created. Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 22 Jul, 2019 2 commits
-
-
Atish Patra authored
Currently, ARM32 and ARM64 uses different data structures to represent their cpu topologies. Since, we are moving the ARM64 topology to common code to be used by other architectures, we can reuse that for ARM32 as well. Take this opprtunity to remove the redundant functions from ARM32 and reuse the common code instead. To: Russell King <linux@armlinux.org.uk> Signed-off-by:
Atish Patra <atish.patra@wdc.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> (on TC2) Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Paul Walmsley <paul.walmsley@sifive.com>
-
Atish Patra authored
Both RISC-V & ARM64 are using cpu-map device tree to describe their cpu topology. It's better to move the relevant code to a common place instead of duplicate code. To: Will Deacon <will.deacon@arm.com> To: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Atish Patra <atish.patra@wdc.com> [Tested on QDF2400] Tested-by:
Jeffrey Hugo <jhugo@codeaurora.org> [Tested on Juno and other embedded platforms.] Tested-by:
Sudeep Holla <sudeep.holla@arm.com> Reviewed-by:
Sudeep Holla <sudeep.holla@arm.com> Acked-by:
Will Deacon <will.deacon@arm.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Paul Walmsley <paul.walmsley@sifive.com>
-