1. 10 Aug, 2021 4 commits
    • Laurent Dufour's avatar
      pseries/drmem: update LMBs after LPM · d144f4d5
      Laurent Dufour authored
      After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
      updated by the hypervisor in the case the NUMA topology of the LPAR's
      memory is updated.
      
      This is handled by the kernel, but the memory's node is not updated because
      there is no way to move a memory block between nodes from the Linux kernel
      point of view.
      
      If later a memory block is added or removed, drmem_update_dt() is called
      and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to
      match the added or removed LMB. But the LMB's associativity node has not
      been updated after the DT node update and thus the node is overwritten by
      the Linux's topology instead of the hypervisor one.
      
      Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
      updated to force an update of the LMB's associativity. However, ignore the
      call to that hook when the update has been triggered by drmem_update_dt().
      Because, in that case, the LMB tree has been used to set the DT property
      and thus it doesn't need to be updated back. Since drmem_update_dt() is
      called under the protection of the device_hotplug_lock and the hook is
      called in the same context, use a simple boolean variable to detect that
      call.
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: default avatarNathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
      d144f4d5
    • Laurent Dufour's avatar
      powerpc/numa: Consider the max NUMA node for migratable LPAR · 9c7248bb
      Laurent Dufour authored
      When a LPAR is migratable, we should consider the maximum possible NUMA
      node instead of the number of NUMA nodes from the actual system.
      
      The DT property 'ibm,current-associativity-domains' defines the maximum
      number of nodes the LPAR can see when running on that box. But if the
      LPAR is being migrated on another box, it may see up to the nodes
      defined by 'ibm,max-associativity-domains'. So if a LPAR is migratable,
      that value should be used.
      
      Unfortunately, there is no easy way to know if an LPAR is migratable or
      not. The hypervisor exports the property 'ibm,migratable-partition' in
      the case it set to migrate partition, but that would not mean that the
      current partition is migratable.
      
      Without this patch, when a LPAR is started on a 2 node box and then
      migrated to a 3 node box, the hypervisor may spread the LPAR's CPUs on
      the 3rd node. In that case if a CPU from that 3rd node is added to the
      LPAR, it will be wrongly assigned to the node because the kernel has
      been set to use up to 2 nodes (the configuration of the departure node).
      With this patch applies, the CPU is correctly added to the 3rd node.
      
      Fixes: f9f130ff ("powerpc/numa: Detect support for coregroup")
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210511073136.17795-1-ldufour@linux.ibm.com
      9c7248bb
    • Christophe Leroy's avatar
      powerpc/non-smp: Unconditionaly call smp_mb() on switch_mm · c8a6d910
      Christophe Leroy authored
      Commit 3ccfebed ("powerpc, membarrier: Skip memory barrier in
      switch_mm()") added some logic to skip the smp_mb() in
      switch_mm_irqs_off() before the call to switch_mmu_context().
      
      However, on non SMP smp_mb() is just a compiler barrier and doing
      it unconditionaly is simpler than the logic used to check whether the
      barrier is needed or not.
      
      After the patch:
      
      00000000 <switch_mm_irqs_off>:
      ...
         c:	7c 04 18 40 	cmplw   r4,r3
        10:	81 24 00 24 	lwz     r9,36(r4)
        14:	91 25 04 c8 	stw     r9,1224(r5)
        18:	4d 82 00 20 	beqlr
        1c:	48 00 00 00 	b       1c <switch_mm_irqs_off+0x1c>
      			1c: R_PPC_REL24	switch_mmu_context
      
      Before the patch:
      
      00000000 <switch_mm_irqs_off>:
      ...
         c:	7c 04 18 40 	cmplw   r4,r3
        10:	81 24 00 24 	lwz     r9,36(r4)
        14:	91 25 04 c8 	stw     r9,1224(r5)
        18:	4d 82 00 20 	beqlr
        1c:	81 24 00 28 	lwz     r9,40(r4)
        20:	71 29 00 0a 	andi.   r9,r9,10
        24:	40 82 00 34 	bne     58 <switch_mm_irqs_off+0x58>
        28:	48 00 00 00 	b       28 <switch_mm_irqs_off+0x28>
      			28: R_PPC_REL24	switch_mmu_context
      ...
        58:	2c 03 00 00 	cmpwi   r3,0
        5c:	41 82 ff cc 	beq     28 <switch_mm_irqs_off+0x28>
        60:	48 00 00 00 	b       60 <switch_mm_irqs_off+0x60>
      			60: R_PPC_REL24	switch_mmu_context
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
      c8a6d910
    • Christophe Leroy's avatar
      powerpc: Remove in_kernel_text() · 09ca4975
      Christophe Leroy authored
      Last user of in_kernel_text() stopped using in with
      commit 549e8152 ("powerpc: Make the 64-bit kernel as a
      position-independent executable").
      
      Generic function is_kernel_text() does the same.
      
      So remote it.
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
      09ca4975
  2. 04 Aug, 2021 8 commits
    • Nicholas Piggin's avatar
      powerpc/64s/perf: Always use SIAR for kernel interrupts · cf9c615c
      Nicholas Piggin authored
      If an interrupt is taken in kernel mode, always use SIAR for it rather than
      looking at regs_sipr. This prevents samples piling up around interrupt
      enable (hard enable or interrupt replay via soft enable) in PMUs / modes
      where the PR sample indication is not in synch with SIAR.
      
      This results in better sampling of interrupt entry and exit in particular.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Tested-by: default avatarAthira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
      cf9c615c
    • Parth Shah's avatar
      powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings · e9ef81e1
      Parth Shah authored
      On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
      in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
      itself to find both the L2 and L3 cache siblings.
      Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings,
      but fill the mask using same property "2" array.
      Signed-off-by: default avatarParth Shah <parth@linux.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
      e9ef81e1
    • Gautham R. Shenoy's avatar
      powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() · 69aa8e07
      Gautham R. Shenoy authored
      The helper function get_shared_cpu_map() was added in
      
      'commit 500fe5f5 ("powerpc/cacheinfo: Report the correct
      shared_cpu_map on big-cores")'
      
      and subsequently expanded upon in
      
      'commit 0be47634 ("powerpc/cacheinfo: Print correct cache-sibling
      map/list for L2 cache")'
      
      in order to help report the correct groups of threads sharing these caches
      on big-core systems where groups of threads within a core can share
      different sets of caches.
      
      Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property,
      cache->shared_cpu_map contains the correct set of thread-siblings
      sharing the cache. Hence we no longer need the functions
      get_shared_cpu_map(). This patch removes this function. We also remove
      the helper function index_dir_to_cpu() which was only called by
      get_shared_cpu_map().
      
      With these functions removed, we can still see the correct
      cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
      caches distributed among groups of threads in a core.
      
      With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
      are split between the two groups of threads in a core, for CPUs 8,9,
      the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
      follows:
      
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
      
      $ ppc64_cpu --smt=4
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
      
      $ ppc64_cpu --smt=2
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
      
      $ ppc64_cpu --smt=1
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-3-parth@linux.ibm.com
      69aa8e07
    • Gautham R. Shenoy's avatar
      powerpc/cacheinfo: Lookup cache by dt node and thread-group id · a4bec516
      Gautham R. Shenoy authored
      Currently the cacheinfo code on powerpc indexes the "cache" objects
      (modelling the L1/L2/L3 caches) where the key is device-tree node
      corresponding to that cache. On some of the POWER server platforms
      thread-groups within the core share different sets of caches (Eg: On
      SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
      threads 1,3,5,7 of the same core share another L1 cache). On such
      platforms, there is a single device-tree node corresponding to that
      cache and the cache-configuration within the threads of the core is
      indicated via "ibm,thread-groups" device-tree property.
      
      Since the current code is not aware of the "ibm,thread-groups"
      property, on the aforementoined systems, cacheinfo code still treats
      all the threads in the core to be sharing the cache because of the
      single device-tree node (In the earlier example, the cacheinfo code
      would says CPUs 0-7 share L1 cache).
      
      In this patch, we make the powerpc cacheinfo code aware of the
      "ibm,thread-groups" property. We indexe the "cache" objects by the
      key-pair (device-tree node, thread-group id). For any CPUX, for a
      given level of cache, the thread-group id is defined to be the first
      CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
      of cache which are not represented in "ibm,thread-groups" property,
      the thread-group id is -1.
      
      [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
      and "thread_group_l2_cache_map" to get rid of the compile error.]
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarParth Shah <parth@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
      a4bec516
    • Masahiro Yamada's avatar
      powerpc: move the install rule to arch/powerpc/Makefile · 86ff0bce
      Masahiro Yamada authored
      Currently, the install target in arch/powerpc/Makefile descends into
      arch/powerpc/boot/Makefile to invoke the shell script, but there is no
      good reason to do so.
      
      arch/powerpc/Makefile can run the shell script directly.
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-3-masahiroy@kernel.org
      86ff0bce
    • Masahiro Yamada's avatar
      powerpc: make the install target not depend on any build artifact · 9bef456b
      Masahiro Yamada authored
      The install target should not depend on any build artifact.
      
      The reason is explained in commit 19514fc6 ("arm, kbuild: make
      "make install" not depend on vmlinux").
      
      Change the PowerPC installation code in a similar way.
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-2-masahiroy@kernel.org
      9bef456b
    • Masahiro Yamada's avatar
      powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile · 156ca4e6
      Masahiro Yamada authored
      Commit c913e5f9 ("powerpc/boot: Don't install zImage.* from make
      install") added the zInstall target to arch/powerpc/boot/Makefile,
      but you cannot use it since the corresponding hook is missing in
      arch/powerpc/Makefile.
      
      It has never worked since its addition. Nobody has complained about
      it for 7 years, which means this code was unneeded.
      
      With this removal, the install.sh will be passed in with 4 parameters.
      Simplify the shell script.
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-1-masahiroy@kernel.org
      156ca4e6
    • Nathan Chancellor's avatar
      cpuidle: pseries: Mark pseries_idle_proble() as __init · d04691d3
      Nathan Chancellor authored
      After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only
      for POWER10 onwards"), pseries_idle_probe() is no longer inlined when
      compiling with clang, which causes a modpost warning:
      
      WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in
      reference from the function pseries_idle_probe() to the function
      .init.text:fixup_cede0_latency()
      The function pseries_idle_probe() references
      the function __init fixup_cede0_latency().
      This is often because pseries_idle_probe lacks a __init
      annotation or the annotation of fixup_cede0_latency is wrong.
      
      pseries_idle_probe() is a non-init function, which calls
      fixup_cede0_latency(), which is an init function, explaining the
      mismatch. pseries_idle_probe() is only called from
      pseries_processor_idle_init(), which is an init function, so mark
      pseries_idle_probe() as __init so there is no more warning.
      
      Fixes: 054e44ba ("cpuidle: pseries: Add function to parse extended CEDE records")
      Signed-off-by: default avatarNathan Chancellor <nathan@kernel.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210803211547.1093820-1-nathan@kernel.org
      d04691d3
  3. 03 Aug, 2021 3 commits
    • Michal Suchanek's avatar
      powerpc/stacktrace: Include linux/delay.h · a6cae77f
      Michal Suchanek authored
      commit 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      introduces udelay() call without including the linux/delay.h header.
      This may happen to work on master but the header that declares the
      functionshould be included nonetheless.
      
      Fixes: 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      Signed-off-by: default avatarMichal Suchanek <msuchanek@suse.de>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729180103.15578-1-msuchanek@suse.de
      a6cae77f
    • Gautham R. Shenoy's avatar
      cpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency() · 71737a6c
      Gautham R. Shenoy authored
      Currently in fixup_cede0_latency() code, we perform the fixup the
      CEDE(0) exit latency value only if minimum advertized extended CEDE
      latency values are less than 10us. This was done so as to not break
      the expected behaviour on POWER8 platforms where the advertised
      latency was higher than the default 10us, which would delay the SMT
      folding on the core.
      
      However, after the earlier patch "cpuidle/pseries: Fixup CEDE0 latency
      only for POWER10 onwards", we can be sure that the fixup of CEDE0
      latency is going to happen only from POWER10 onwards. Hence
      unconditionally use the minimum exit latency provided by the platform.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-3-git-send-email-ego@linux.vnet.ibm.com
      71737a6c
    • Gautham R. Shenoy's avatar
      cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards · 50741b70
      Gautham R. Shenoy authored
      Commit d947fb4c ("cpuidle: pseries: Fixup exit latency for
      CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
      of the Extended CEDE states advertised by the platform
      
      On POWER9 LPARs, the firmwares advertise a very low value of 2us for
      CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the
      PHYP hypervisor corresponds to the latency required to wakeup from the
      underlying hardware idle state. However the wakeup latency from the
      LPAR perspective should include
      
      1. The time taken to transition the CPU from the Hypervisor into the
         LPAR post wakeup from platform idle state
      
      2. Time taken to send the IPI from the source CPU (waker) to the idle
         target CPU (wakee).
      
      1. can be measured via timer idle test, where we queue a timer, say
      for 1ms, and enter the CEDE state. When the timer fires, in the timer
      handler we compute how much extra timer over the expected 1ms have we
      consumed. On a a POWER9 LPAR the numbers are
      
      CEDE latency measured using a timer (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     2601     5677     5668.74    5917    6413     9299   455.01
      
      1. and 2. combined can be determined by an IPI latency test where we
      send an IPI to an idle CPU and in the handler compute the time
      difference between when the IPI was sent and when the handler ran. We
      see the following numbers on POWER9 LPAR.
      
      CEDE latency measured using an IPI (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     711      7564     7369.43   8559    9514      9698   1200.01
      
      Suppose, we consider the 99th percentile latency value measured using
      the IPI to be the wakeup latency, the value would be 9.5us This is in
      the ballpark of the default value of 10us.
      
      Hence, use the exit latency of CEDE(0) based on the latency values
      advertized by platform only from POWER10 onwards. The values
      advertized on POWER10 platforms is more realistic and informed by the
      latency measurements. For earlier platforms stick to the default value
      of 10us. The fix was suggested by Michael Ellerman.
      
      Fixes: d947fb4c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
      Reported-by: default avatarEnrico Joedecke <joedecke@de.ibm.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-2-git-send-email-ego@linux.vnet.ibm.com
      50741b70
  4. 26 Jul, 2021 2 commits
  5. 23 Jul, 2021 2 commits
    • Nicholas Piggin's avatar
      KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state · d9c57d3e
      Nicholas Piggin authored
      The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
      by the L1 to switch the context of the vCPU over to that of its L2
      guest, and return with an interrupt indication. The L1 is responsible
      for switching some registers to guest context, and the L0 switches
      others (including all the hypervisor privileged state).
      
      If the L2 MSR has TM active, then the L1 is responsible for
      recheckpointing the L2 TM state. Then the L1 exits to L0 via the
      H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
      and then it recheckpoints the TM state as part of the nested entry and
      finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
      the simplest approach for something that's horrendously complicated.
      
      Problems arise if the L1 exits to the L0 with a TM state which does not
      match the L2 TM state being requested. For example if the L1 is
      transactional but the L2 MSR is non-transactional, or vice versa. The
      L0's HRFID can take a TM Bad Thing interrupt and crash.
      
      Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
      ensuring that if the L1 is suspended then the L2 must have TM active,
      and if the L1 is not suspended then the L2 must not have TM active.
      
      Fixes: 360cae31 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
      Cc: stable@vger.kernel.org # v4.20+
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d9c57d3e
    • Nicholas Piggin's avatar
      KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow · f62f3c20
      Nicholas Piggin authored
      The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
      the rtas_args.nargs that was provided by the guest. That guest nargs
      value is not range checked, so the guest can cause the host rets pointer
      to be pointed outside the args array. The individual rtas function
      handlers check the nargs and nrets values to ensure they are correct,
      but if they are not, the handlers store a -3 (0xfffffffd) failure
      indication in rets[0] which corrupts host memory.
      
      Fix this by testing up front whether the guest supplied nargs and nret
      would exceed the array size, and fail the hcall directly without storing
      a failure indication to rets[0].
      
      Also expand on a comment about why we kill the guest and try not to
      return errors directly if we have a valid rets[0] pointer.
      
      Fixes: 8e591cb7 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      f62f3c20
  6. 18 Jul, 2021 13 commits
    • Linus Torvalds's avatar
      Linux 5.14-rc2 · 2734d6c1
      Linus Torvalds authored
      2734d6c1
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of... · 8c25c447
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
         are offlined.
      
       - Fix 'perf test' problems related to the recently added hybrid
         (BIG/little) code.
      
       - Split ARM's coresight (hw tracing) decode by aux records to avoid
         fatal decoding errors.
      
       - Fix add event failure in 'perf probe' when running 32-bit perf in a
         64-bit kernel.
      
       - Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
      
       - Fix memory and refcount leaks detected by ASAn when running 'perf
         test', should be clean of warnings now.
      
       - Remove broken definition of __LITTLE_ENDIAN from tools'
         linux/kconfig.h, which was breaking the build in some systems.
      
       - Cast PTHREAD_STACK_MIN to int as it may turn into 'long
         sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some
         systems.
      
       - Fix libperf build error with LIBPFM4=1.
      
       - Sync UAPI files changed by the memfd_secret new syscall.
      
      * tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
        perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
        perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
        perf data: Close all files in close_dir()
        perf probe-file: Delete namelist in del_events() on the error path
        perf test bpf: Free obj_buf
        perf trace: Free strings in trace__parse_events_option()
        perf trace: Free syscall tp fields in evsel->priv
        perf trace: Free syscall->arg_fmt
        perf trace: Free malloc'd trace fields on exit
        perf lzma: Close lzma stream on exit
        perf script: Fix memory 'threads' and 'cpus' leaks on exit
        perf script: Release zstd data
        perf session: Cleanup trace_event
        perf inject: Close inject.output on exit
        perf report: Free generated help strings for sort option
        perf env: Fix memory leak of cpu_pmu_caps
        perf test maps__merge_in: Fix memory leak of maps
        perf dso: Fix memory leak in dso__new_map()
        perf test event_update: Fix memory leak of unit
        perf test event_update: Fix memory leak of evlist
        ...
      8c25c447
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f0eb870a
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "A few fixes for issues in the new online shrink code, additional
        corrections for my recent bug-hunt w.r.t. extent size hints on
        realtime, and improved input checking of the GROWFSRT ioctl.
      
        IOW, the usual 'I somehow got bored during the merge window and
        resumed auditing the farther reaches of xfs':
      
         - Fix shrink eligibility checking when sparse inode clusters enabled
      
         - Reset '..' directory entries when unlinking directories to prevent
           verifier errors if fs is shrinked later
      
         - Don't report unusable extent size hints to FSGETXATTR
      
         - Don't warn when extent size hints are unusable because the sysadmin
           configured them that way
      
         - Fix insufficient parameter validation in GROWFSRT ioctl
      
         - Fix integer overflow when adding rt volumes to filesystem"
      
      * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: detect misaligned rtinherit directory extent size hints
        xfs: fix an integer overflow error in xfs_growfs_rt
        xfs: improve FSGROWFSRT precondition checking
        xfs: don't expose misaligned extszinherit hints to userspace
        xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
        xfs: reset child dir '..' entry when unlinking child
        xfs: check for sparse inode clusters that cross new EOAG when shrinking
      f0eb870a
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · fbf1bddc
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
       "A handful of bugfixes for the iomap code.
      
        There's nothing especially exciting here, just fixes for UBSAN (not
        KASAN as I erroneously wrote in the tag message) warnings about
        undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some
        reshuffling of per-page block state info to fix some problems with
        gfs2.
      
         - Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE
      
         - Fix assertion errors when using inlinedata files on gfs2"
      
      * tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
        iomap: Don't create iomap_page objects for inline files
        iomap: Permit pages without an iop to enter writeback
        iomap: remove the length variable in iomap_seek_hole
        iomap: remove the length variable in iomap_seek_data
      fbf1bddc
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.14' of... · 6750691a
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Restore the original behavior of scripts/setlocalversion when
         LOCALVERSION is set to empty.
      
       - Show Kconfig prompts even for 'make -s'
      
       - Fix the combination of COFNIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y
         for older GNU Make versions
      
      * tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation: Fix intiramfs script name
        Kbuild: lto: fix module versionings mismatch in GNU make 3.X
        kbuild: do not suppress Kconfig prompts for silent build
        scripts/setlocalversion: fix a bug when LOCALVERSION is empty
      6750691a
    • Robert Richter's avatar
      Documentation: Fix intiramfs script name · 5e60f363
      Robert Richter authored
      Documentation was not changed when renaming the script in commit
      80e715a0 ("initramfs: rename gen_initramfs_list.sh to
      gen_initramfs.sh"). Fixing this.
      
      Basically does:
      
       $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
      
      Fixes: 80e715a0 ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
      Signed-off-by: default avatarRobert Richter <rrichter@amd.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      5e60f363
    • Lecopzer Chen's avatar
      Kbuild: lto: fix module versionings mismatch in GNU make 3.X · 1d11053d
      Lecopzer Chen authored
      When building modules(CONFIG_...=m), I found some of module versions
      are incorrect and set to 0.
      This can be found in build log for first clean build which shows
      
      WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed,
      symbol will not be versioned.
      
      But in second build(incremental build), the WARNING disappeared and the
      module version becomes valid CRC and make someone who want to change
      modules without updating kernel image can't insert their modules.
      
      The problematic code is
      +	$(foreach n, $(filter-out FORCE,$^),				\
      +		$(if $(wildcard $(n).symversions),			\
      +			; cat $(n).symversions >> $@.symversions))
      
      For example:
        rm -f fs/notify/built-in.a.symversions    ; rm -f fs/notify/built-in.a; \
      llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \
      fs/notify/notification.o fs/notify/group.o ...
      
      `foreach n` shows nothing to `cat` into $(n).symversions because
      `if $(wildcard $(n).symversions)` return nothing, but actually
      they do exist during this line was executed.
      
      -rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o
      -rw-r--r-- 1 root root    111 Jun 13 19:10 fs/notify/fsnotify.o.symversions
      
      The reason is the $(n).symversions are generated at runtime, but
      Makefile wildcard function expends and checks the file exist or not
      during parsing the Makefile.
      
      Thus fix this by use `test` shell command to check the file
      existence in runtime.
      
      Rebase from both:
      1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/]
      2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/]
      
      Fixes: 38e89184 ("kbuild: lto: fix module versioning")
      Co-developed-by: default avatarSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: default avatarLecopzer Chen <lecopzer.chen@mediatek.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      1d11053d
    • Masahiro Yamada's avatar
      kbuild: do not suppress Kconfig prompts for silent build · d952cfaf
      Masahiro Yamada authored
      When a new CONFIG option is available, Kbuild shows a prompt to get
      the user input.
      
        $ make
        [ snip ]
        Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW)
      
      This is the only interactive place in the build process.
      
      Commit 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      suppressed Kconfig prompts as well because syncconfig is invoked by
      the 'cmd' macro. You cannot notice the fact that Kconfig is waiting
      for the user input.
      
      Use 'kecho' to show the equivalent short log without suppressing stdout
      from sub-make.
      
      Fixes: 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Tested-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      d952cfaf
    • Mikulas Patocka's avatar
      scripts/setlocalversion: fix a bug when LOCALVERSION is empty · 5df99bec
      Mikulas Patocka authored
      The commit 042da426 ("scripts/setlocalversion: simplify the short
      version part") reduces indentation. Unfortunately, it also changes behavior
      in a subtle way - if the user has empty "LOCALVERSION" variable, the plus
      sign is appended to the kernel version. It wasn't appended before.
      
      This patch reverts to the old behavior - we append the plus sign only if
      the LOCALVERSION variable is not set.
      
      Fixes: 042da426 ("scripts/setlocalversion: simplify the short version part")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      5df99bec
    • Yang Jihong's avatar
      perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set · b0f00855
      Yang Jihong authored
      The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user
      if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events.
      As a result, the command fails.
      
      Before:
      
        #perf sched record sleep 1
        event syntax error: 'sched:sched_stat_wait'
                             \___ unknown tracepoint
      
        Error:  File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
        Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
      
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      Solution:
        Check whether schedstat tracepoints are exposed. If no, these events are not recorded.
      
      After:
        # perf sched record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
        # perf sched report
        run measurement overhead: 4736 nsecs
        sleep measurement overhead: 9059979 nsecs
        the run test took 999854 nsecs
        the sleep test took 8945271 nsecs
        nr_run_events:        716
        nr_sleep_events:      785
        nr_wakeup_events:     0
        ...
        ------------------------------------------------------------
      
      Fixes: 2a09b5de ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
      Signed-off-by: default avatarYang Jihong <yangjihong1@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b0f00855
    • Yang Jihong's avatar
      perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel · 22a66551
      Yang Jihong authored
      The "address" member of "struct probe_trace_point" uses long data type.
      If kernel is 64-bit and perf program is 32-bit, size of "address"
      variable is 32 bits.
      
      As a result, upper 32 bits of address read from kernel are truncated, an
      error occurs during address comparison in kprobe_warn_out_range().
      
      Before:
      
        # perf probe -a schedule
        schedule is out of .text, skip it.
          Error: Failed to add events.
      
      Solution:
        Change data type of "address" variable to u64 and change corresponding
      address printing and value assignment.
      
      After:
      
        # perf.new.new probe -a schedule
        Added new event:
          probe:schedule       (on schedule)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:schedule -aR sleep 1
      
        # perf probe -l
          probe:schedule       (on schedule@kernel/sched/core.c)
        # perf record -e probe:schedule -aR sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'probe:schedule'
        # Event count (approx.): 1366
        #
        # Overhead  Command          Shared Object      Symbol
        # ........  ...............  .................  ............
        #
             6.22%  migration/0      [kernel.kallsyms]  [k] schedule
             6.22%  migration/1      [kernel.kallsyms]  [k] schedule
             6.22%  migration/2      [kernel.kallsyms]  [k] schedule
             6.22%  migration/3      [kernel.kallsyms]  [k] schedule
             6.15%  migration/10     [kernel.kallsyms]  [k] schedule
             6.15%  migration/11     [kernel.kallsyms]  [k] schedule
             6.15%  migration/12     [kernel.kallsyms]  [k] schedule
             6.15%  migration/13     [kernel.kallsyms]  [k] schedule
             6.15%  migration/14     [kernel.kallsyms]  [k] schedule
             6.15%  migration/15     [kernel.kallsyms]  [k] schedule
             6.15%  migration/4      [kernel.kallsyms]  [k] schedule
             6.15%  migration/5      [kernel.kallsyms]  [k] schedule
             6.15%  migration/6      [kernel.kallsyms]  [k] schedule
             6.15%  migration/7      [kernel.kallsyms]  [k] schedule
             6.15%  migration/8      [kernel.kallsyms]  [k] schedule
             6.15%  migration/9      [kernel.kallsyms]  [k] schedule
             0.22%  rcu_sched        [kernel.kallsyms]  [k] schedule
        ...
        #
        # (Cannot load tips.txt file, please install perf!)
        #
      Signed-off-by: default avatarYang Jihong <yangjihong1@huawei.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jianlin Lv <jianlin.lv@arm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Li Huafei <lihuafei1@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      22a66551
    • Riccardo Mancini's avatar
      perf data: Close all files in close_dir() · d4b3eedc
      Riccardo Mancini authored
      When using 'perf report' in directory mode, the first file is not closed
      on exit, causing a memory leak.
      
      The problem is caused by the iterating variable never reaching 0.
      
      Fixes: 14552063 ("perf data: Add perf_data__(create_dir|close_dir) functions")
      Signed-off-by: default avatarRiccardo Mancini <rickyman7@gmail.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d4b3eedc
    • Riccardo Mancini's avatar
      perf probe-file: Delete namelist in del_events() on the error path · e0fa7ab4
      Riccardo Mancini authored
      ASan reports some memory leaks when running:
      
        # perf test "42: BPF filter"
      
      This second leak is caused by a strlist not being dellocated on error
      inside probe_file__del_events.
      
      This patch adds a goto label before the deallocation and makes the error
      path jump to it.
      Signed-off-by: default avatarRiccardo Mancini <rickyman7@gmail.com>
      Fixes: e7895e42 ("perf probe: Split del_perf_probe_events()")
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e0fa7ab4
  7. 17 Jul, 2021 8 commits