Commits · 26973fa5ac0e3b88d0d476caccfc10839b26098b · Kirill Smelkov / linux

14 Oct, 2018 9 commits

powerpc/mm: use pte helpers in generic code · 26973fa5

Christophe Leroy authored Oct 09, 2018

Get rid of platform specific _PAGE_XXXX in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

26973fa5

powerpc/mm: don't use _PAGE_EXEC for calling hash_preload() · 34eb138e

Christophe Leroy authored Oct 09, 2018

The 'access' parameter of hash_preload() is either 0 or _PAGE_EXEC.
Among the two versions of hash_preload(), only the PPC64 one is
doing something with this 'access' parameter.

In order to remove the use of _PAGE_EXEC outside platform code,
'access' parameter is replaced by 'is_exec' which will be either
true of false, and the PPC64 version of hash_preload() creates
the access flag based on 'is_exec'.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

34eb138e

powerpc/mm: add pte helpers to query and change pte flags · daba7902

Christophe Leroy authored Oct 09, 2018

In order to avoid using generic _PAGE_XXX flags in powerpc
core functions, define helpers for all needed flags:
- pte_mkuser() and pte_mkprivileged() to set/unset and/or
unset/set _PAGE_USER and/or _PAGE_PRIVILEGED
- pte_hashpte() to check if _PAGE_HASHPTE is set.
- pte_ci() check if cache is inhibited (already existing on book3s/64)
- pte_exprotect() to protect against execution
- pte_exec() and pte_mkexec() to query and set page execution
- pte_mkpte() to set _PAGE_PTE flag.
- pte_hw_valid() to check _PAGE_PRESENT since pte_present does
something different on book3s/64.

On book3s/32 there is no exec protection, so pte_mkexec() and
pte_exprotect() are nops and pte_exec() returns always true.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

daba7902

powerpc/mm: move some nohash pte helpers in nohash/[32:64]/pgtable.h · aa9cd505

Christophe Leroy authored Oct 09, 2018

In order to allow their use in nohash/32/pgtable.h, we have to move the
following helpers in nohash/[32:64]/pgtable.h:
- pte_mkwrite()
- pte_mkdirty()
- pte_mkyoung()
- pte_wrprotect()
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

aa9cd505

powerpc/mm: don't use _PAGE_EXEC in book3s/32 · d81e6f8b

Christophe Leroy authored Oct 09, 2018

book3s/32 doesn't define _PAGE_EXEC, so no need to use it.

All other platforms define _PAGE_EXEC so no need to check
it is not NUL when not book3s/32.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d81e6f8b

powerpc: handover page flags with a pgprot_t parameter · c766ee72

Christophe Leroy authored Oct 09, 2018

In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.

Do the same for __ioremap_caller() and __ioremap_at().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c766ee72

powerpc/mm: properly set PAGE_KERNEL flags in ioremap() · 56f3c141

Christophe Leroy authored Oct 09, 2018

Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

56f3c141

powerpc: don't use ioremap_prot() nor __ioremap() unless really needed. · aa91796e

Christophe Leroy authored Oct 09, 2018

In many places, ioremap_prot() and __ioremap() can be replaced with
higher level functions like ioremap(), ioremap_coherent(),
ioremap_cache(), ioremap_wc() ...
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

aa91796e

soc/fsl/qbman: use ioremap_cache() instead of ioremap_prot(0) · 402a5698

Christophe Leroy authored Oct 09, 2018

ioremap_prot() with flag set to 0 relies on a hack in
__ioremap_caller() which adds PAGE_KERNEL flags when the
handed flags don't look like a valid set of flags
(ie don't include _PAGE_PRESENT)

The intention being to map cached memory, use ioremap_cache() instead.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

402a5698

13 Oct, 2018 31 commits

drivers/block/z2ram: use ioremap_wt() instead of __ioremap(_PAGE_WRITETHRU) · ed18e423

Christophe Leroy authored Oct 09, 2018

_PAGE_WRITETHRU is a target specific flag. Prefer generic functions.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ed18e423

drivers/video/fbdev: use ioremap_wc/wt() instead of __ioremap() · e04e3950

Christophe Leroy authored Oct 09, 2018

_PAGE_NO_CACHE is a platform specific flag. In addition, this flag
is misleading because one would think it requests a noncached page
whereas a noncached page is _PAGE_NO_CACHE | _PAGE_GUARDED

_PAGE_NO_CACHE alone means write combined noncached page, so lets
use ioremap_wc() instead.

_PAGE_WRITETHRU is also platform specific flag. Use ioremap_wt()
instead.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e04e3950

powerpc/32: Add ioremap_wt() and ioremap_coherent() · 86c391bd

Christophe Leroy authored Oct 09, 2018

Other arches have ioremap_wt() to map IO areas write-through.
Implement it on PPC as well in order to avoid drivers using
__ioremap(_PAGE_WRITETHRU)

Also implement ioremap_coherent() to avoid drivers using
__ioremap(_PAGE_COHERENT)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

86c391bd

powerpc/rtas: Fix a potential race between CPU-Offline & Migration · dfd718a2

Gautham R. Shenoy authored Oct 01, 2018

Live Partition Migrations require all the present CPUs to execute the
H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
before initiating the migration for this purpose.

The commit 85a88cab
("powerpc/pseries: Disable CPU hotplug across migrations")
disables any CPU-hotplug operations once all the offline CPUs are
brought online to prevent any further state change. Once the
CPU-Hotplug operation is disabled, the code assumes that all the CPUs
are online.

However, there is a minor window in rtas_ibm_suspend_me() between
onlining the offline CPUs and disabling CPU-Hotplug when a concurrent
CPU-offline operations initiated by the userspace can succeed thereby
nullifying the the aformentioned assumption. In this unlikely case
these offlined CPUs will not call H_JOIN, resulting in a system hang.

Fix this by verifying that all the present CPUs are actually online
after CPU-Hotplug has been disabled, failing which we restore the
state of the offline CPUs in rtas_ibm_suspend_me() and return an
-EBUSY.

Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

dfd718a2

powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores · 500fe5f5

Gautham R. Shenoy authored Oct 11, 2018

Currently on POWER9 SMT8 cores systems, in sysfs, we report the
shared_cache_map for L1 caches (both data and instruction) to be the
cpu-ids of the threads in SMT8 cores. This is incorrect since on
POWER9 SMT8 cores there are two groups of threads, each of which
shares its own L1 cache.

This patch addresses this by reporting the shared_cpu_map correctly in
sysfs for L1 caches.

Before the patch
   /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000ff
   /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000ff

After the patch
   /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 00000055
   /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 00000055
   /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000aa
   /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000aa
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

500fe5f5

powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores · 8e8a31d7

Gautham R. Shenoy authored Oct 11, 2018

POWER9 SMT8 cores consist of two groups of threads, where threads in
each group shares L1-cache. The scheduler is not aware of this
distinction as the current sched-domain hierarchy has all the threads
of the core defined at the SMT domain.

	SMT  [Thread siblings of the SMT8 core]
	DIE  [CPUs in the same die]
	NUMA [All the CPUs in the system]

Due to this, we can observe run-to-run variance when we run a
multi-threaded benchmark bound to a single core based on how the
scheduler spreads the software threads across the two groups in the
core.

We fix this in this patch by defining each group of threads which
share L1-cache to be the SMT level. The group of threads in the SMT8
core is defined to be the CACHE level. The sched-domain hierarchy
after this patch will be :

	SMT	[Thread siblings in the core that share L1 cache]
	CACHE 	[Thread siblings that are in the SMT8 core]
	DIE  	[CPUs in the same die]
	NUMA 	[All the CPUs in the system]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8e8a31d7

powerpc: Detect the presence of big-cores via "ibm, thread-groups" · 425752c6

Gautham R. Shenoy authored Oct 11, 2018

On IBM POWER9, the device tree exposes a property array identifed by
"ibm,thread-groups" which will indicate which groups of threads share
a particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch adds helper functions to parse the contents of
"ibm,thread-groups" and populate a per-cpu variable to cache
information about siblings of each CPU that share the L1, traslation
cache and instruction data-flow.

It also defines a new global variable named "has_big_cores" which
indicates if the cores on this configuration have multiple groups of
threads that share L1 cache.

For each online CPU, it maintains a cpu_smallcore_mask, which
indicates the online siblings which share the L1-cache with it.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

425752c6

powerpc: Fix stackprotector detection for non-glibc toolchains · bf6cbd0c

Michael Ellerman authored Oct 12, 2018

If GCC is not built with glibc support then we must explicitly tell it
which register to use for TLS mode stack protector, otherwise it will
error out and the cc-option check will fail.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bf6cbd0c

powerpc/xmon: Show the stack protector canary in xmon · 50530f5e

Michael Ellerman authored Oct 12, 2018

This is helpful for debugging stack protector crashes.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

50530f5e

powerpc: Use SWITCH_FRAME_SIZE for prom and rtas entry · ed9e84a4

Joel Stanley authored Oct 12, 2018

Commit 6c171994 ("powerpc/of: Remove useless register save/restore
when calling OF back") removed the saving of srr0 and srr1 when calling
into OpenFirmware. Commit e31aa453 ("powerpc: Use LOAD_REG_IMMEDIATE
only for constants on 64-bit") did the same for rtas.

This means we don't need to save the extra stack space and can use
the common SWITCH_FRAME_SIZE.

There were already no users of _SRR0 and _SRR1 so we can remove them
too.

Link: https://github.com/linuxppc/linux/issues/83Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ed9e84a4

powerpc/pseries/mobility: Extend start/stop topology update scope · 65b9fdad

Michael Bringmann authored Oct 09, 2018

The powerpc mobility code may receive RTAS requests to perform PRRN
(Platform Resource Reassignment Notification) topology changes at any
time, including during LPAR migration operations.

In some configurations where the affinity of CPUs or memory is being
changed on that platform, the PRRN requests may apply or refer to
outdated information prior to the complete update of the device-tree.

This patch changes the duration for which topology updates are
suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
/ 'ibm,suspend-me' call(s) to cover the entire migration_store()
operation to allow all changes to the device-tree to be applied prior
to accepting and applying any PRRN requests.

For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

65b9fdad

powerpc/Makefile: Fix PPC_BOOK3S_64 ASFLAGS · 960e3002

Joel Stanley authored Oct 11, 2018

Ever since commit 15a3204d ("powerpc/64s: Set assembler machine type
to POWER4") we force -mpower4 to be passed to the assembler
irrespective of the CFLAGS used (for Book3s 64).

When building a powerpc64 kernel with clang, clang will not add -many
to the assembler flags, so any instructions that the compiler has
generated that are not available on power4 will cause an error:

  /usr/bin/as -a64 -mppc64 -mlittle-endian -mpower8 \
   -I ./arch/powerpc/include -I ./arch/powerpc/include/generated \
   -I ./include -I ./arch/powerpc/include/uapi \
   -I ./arch/powerpc/include/generated/uapi -I ./include/uapi \
   -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc \
   -maltivec -mpower4 -o init/do_mounts.o /tmp/do_mounts-3b0a3d.s
  /tmp/do_mounts-51ce54.s:748: Error: unrecognized opcode: `isel'

GCC does include -many, so the GCC driven gas call will succeed:

  as -v -I ./arch/powerpc/include -I ./arch/powerpc/include/generated -I
  ./include -I ./arch/powerpc/include/uapi
  -I ./arch/powerpc/include/generated/uapi -I ./include/uapi
  -I ./include/generated/uapi -I arch/powerpc -I arch/powerpc
   -a64 -mpower8 -many -mlittle -maltivec -mpower4 -o init/do_mounts.o

Note that isel is power7 and above for IBM CPUs. GCC only generates it
for Power9 and above, but the above test was run against the clang
generated assembly.

Peter Bergner explains:

  When using -many -mpower4, gas will first try and find a matching
  power4 mnemonic and failing that, it will then allow any valid
  mnemonic that gas knows about. GCC's use of -many predates me
  though.

  IIRC, Alan looked at trying to remove it, but I forget why he
  didn't. Could be either a gcc or gas issue at the time. I'm not sure
  whether issue still exists or not. He and I have modified how gas
  works internally a fair amount since he tried removing gcc use of
  -many.

  I will also note that when using -many, gas will choose the first
  mnemonic that matches in the mnemonic table and we have (mostly)
  sorted the table so that server mnemonics show up earlier in the
  table than other mnemonics, so they'll be seen/chosen first.

By explicitly setting -many we can build with Clang and GCC while
retaining the -mpower4 option.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

960e3002

powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index · b45e9d76

YueHaibing authored Oct 09, 2018

The variable 'aa_index' is defined as an unsigned value in
update_lmb_associativity_index(), but find_aa_index() may return -1
when dlpar_clone_property() fails. So change find_aa_index() to return
a bool, which indicates whether 'aa_index' was found or not.

Fixes: c05a5a40 ("powerpc/pseries: Dynamic add entires to associativity lookup array")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com>
[mpe: Tweak changelog, rename is_found to just found]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b45e9d76

powerpc/eeh: Cleanup control flow in eeh_handle_normal_event() · b90484ec

Sam Bobroff authored Sep 12, 2018

Rather than mixing "if (state)" blocks and gotos, convert entirely to
"if (state)" blocks to make the state machine behaviour clearer.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b90484ec

powerpc/eeh: Cleanup eeh_ops.wait_state() · fef7f905

Sam Bobroff authored Sep 12, 2018

The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.

While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
  EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
  it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
  because that's what it is.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fef7f905

powerpc/eeh: Cleanup eeh_pe_state_mark() · e762bb89

Sam Bobroff authored Sep 12, 2018

Currently, eeh_pe_state_mark() marks a PE (and it's children) with a
state and then performs additional processing if that state included
EEH_PE_ISOLATED.

The state parameter is always a constant at the call site, so
rearrange eeh_pe_state_mark() into two functions and just call the
appropriate one at each site.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e762bb89

powerpc/eeh: Cleanup unnecessary eeh_pe_state_mark_with_cfg() · eed4bdbe

Sam Bobroff authored Sep 12, 2018

The function eeh_pe_state_mark_with_cfg() just performs the work of
eeh_pe_state_mark() and then, conditionally, the work of
eeh_pe_state_clear(). However it is only ever called with a constant
state such that the condition is always true, so replace it by direct
calls.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

eed4bdbe

powerpc/eeh: Cleanup eeh_enabled() · 54644927

Sam Bobroff authored Sep 12, 2018

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

54644927

powerpc/eeh: Cleanup logic in eeh_rmv_from_parent_pe() · 9a3eda26

Sam Bobroff authored Sep 12, 2018

Move the call to eeh_dev_to_pe() up, so that later it's clear that
"pe" isn't NULL.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9a3eda26

powerpc/eeh: Cleanup field names in eeh_rmv_data · 1c5c533b

Sam Bobroff authored Sep 12, 2018

Change the name of the fields in eeh_rmv_data to clarify their usage.

Change "edev_list" to "removed_vf_list" because it does not contain
generic edevs, but rather only edevs that contain virtual functions
(which need to be removed during recovery).

Similarly, change "removed" to "removed_dev_count" because it is a
count of any removed devices, not just those in the above list.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1c5c533b

powerpc/eeh: Cleanup list_head field names · 80e65b00

Sam Bobroff authored Sep 12, 2018

Instances of struct eeh_pe are placed in a tree structure using the
fields "child_list" and "child", so place these next to each other
in the definition.

The field "child" is a list entry, so remove the unnecessary and
misleading use of the list initializer, LIST_HEAD(), on it.

The eeh_dev struct contains two list entry fields, called "list" and
"rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
initializing them with LIST_HEAD().
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

80e65b00

powerpc/eeh: Cleanup eeh_add_virt_device() · bf773df9

Sam Bobroff authored Sep 12, 2018

Remove the unnecessary cast through void * on the first parameter and
remove the unused second parameter (always NULL).
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bf773df9

powerpc/eeh: Cleanup unused field in eeh_dev · b95a4606

Sam Bobroff authored Sep 12, 2018

The 'bus' member of struct eeh_dev is assigned to once but never used,
so remove it.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b95a4606

powerpc/eeh: Cleanup EEH_POSTPONED_PROBE · bffc0176

Sam Bobroff authored Sep 12, 2018

Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
message "EEH: No capable adapters found" from being displayed during
the boot of powernv systems.

It is necessary because, on powernv, the call to eeh_probe_devices()
made from eeh_init() is too early and EEH can't yet be enabled. A
second call is made later from eeh_pnv_post_init(), which succeeds.

(On pseries, the first call succeeds because PCI devices are set up
early enough and no second call is made.)

This can be simplified by moving the early call to eeh_probe_devices()
from eeh_init() (where it's seen by both platforms) to
pSeries_final_fixup(), so that each platform only calls
eeh_probe_devices() once, at a point where it can succeed.
This is slightly later in the boot sequence, but but still early
enough and it is now in the same place in the sequence for both
platforms (the pcibios_fixup hook).

The display of the message can be cleaned up as well, by moving it
into eeh_probe_devices().
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bffc0176

powerpc/eeh: Fix use of EEH_PE_KEEP on wrong field · 473af09b

Sam Bobroff authored Sep 12, 2018

eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it
incorrectly removes it from pe->type, instead of pe->state.

However, rather than clearing it from the correct field, remove it.
Inspection of the code shows that it can't ever have had any effect
(even if it had been cleared from the correct field), because the
field is never tested after it is cleared by the statement in
question.

The clear statement was added by commit 807a827d ("powerpc/eeh:
Keep PE during hotplug"), but it didn't explain why it was necessary.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

473af09b

powerpc/eeh: Fix null deref for devices removed during EEH · bcbe3730

Sam Bobroff authored Sep 12, 2018

If a device is removed during EEH processing (either by a driver's
handler or as part of recovery), it can lead to a null dereference
in eeh_pe_report_edev().

To handle this, skip devices that have been removed.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bcbe3730

powerpc/eeh: Fix possible null deref in eeh_dump_dev_log() · f9bc28ae

Sam Bobroff authored Sep 12, 2018

If an error occurs during an unplug operation, it's possible for
eeh_dump_dev_log() to be called when edev->pdn is null, which
currently leads to dereferencing a null pointer.

Handle this by skipping the error log for those devices.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f9bc28ae

powerpc/boot: Build boot wrapper with optimisations · 747b2176

Joel Stanley authored Oct 10, 2018

The boot wrapper is currently built with -Os. By building with O2 we
can meaningfully reduce the time decompressing the kernel.

I tested by comparing 10 runs of each option in Qemu and on hardware.
The kernel is compressed with KERNEL_XZ built with GCC 8.2.0-7ubuntu1.
The values are counts of the timebase.

Qemu TCG powernv Power8:

              Os            O2            O3
 median       10221123889   6201518438    6568186825
 stddev        1361267211    429090641     657930076
 improvement                    39.33%        35.74%

Palmetto Power8:

              Os            O2            O3
 median           50279         50599          35790
 stddev       992144533     627130655      623721078
 improvement                   36.79%         37.13%

Romulus Power9:

              Os            O2            O3
 median       670312391     454733720      448881398
 stddev          157569        107276         108760
 improvement                   32.16%         33.03%

TCG was quite noisy, with every few runs producing an outlier. Even so,
O2 is faster than O3. On hardware the numbers were less noisy and O3 is
slightly faster than O2.

The wrapper size increases when moving from Os. Comparing zImage.epapr
to the existing Os build using bloat-o-meter:

  Before=43401, After=56837 (13KB), chg +30.96%
  Before=43401, After=64305 (20KB), chg +48.16%

I chose O2 for a balance between Qemu and hardware speed up.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

747b2176

powerpc/boot: Disable vector instructions · e8e132e6

Joel Stanley authored Oct 10, 2018

This will avoid auto-vectorisation when building with higher
optimisation levels.

We don't know if the machine can support VSX and even if it's present
it's probably not going to be enabled at this point in boot.

These flag were both added prior to GCC 4.6 which is the minimum
compiler version supported by upstream, thanks to Segher for the
details.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e8e132e6

powerpc/boot: Fix opal console in boot wrapper · 1a855eac

Joel Stanley authored Oct 10, 2018

As of commit 10c77dba ("powerpc/boot: Fix build failure in 32-bit
boot wrapper") the opal code is hidden behind CONFIG_PPC64_BOOT_WRAPPER,
but the boot wrapper avoids include/linux, so it does not get the normal
Kconfig flags.

We can drop the guard entirely as in commit f8e8e69c ("powerpc/boot:
Only build OPAL code when necessary") the makefile only includes opal.c
in the build if CONFIG_PPC64_BOOT_WRAPPER is set.

Fixes: 10c77dba ("powerpc/boot: Fix build failure in 32-bit boot wrapper")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1a855eac

powerpc/boot: Expose Kconfig symbols to wrapper · 5e9dcb61

Joel Stanley authored Oct 10, 2018

Currently the wrapper is built without including anything in
$(src)/include/, which means there are no CONFIG_ symbols defined.
This means the platform specific serial drivers were never enabled.

We now copy the definitions into the boot directory, so any C file can
now include autoconf.h to depend on configuration options.

Fixes: 866bfc75 ("powerpc: conditionally compile platform-specific serial drivers")
Signed-off-by: Joel Stanley <joel@jms.id.au>
[mpe: Fix to use $(objtree) to find autoconf.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

5e9dcb61