- 02 Jul, 2017 1 commit
-
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 28 Jun, 2017 20 commits
-
-
Nicholas Piggin authored
Current busy-wait loops are implemented by repeatedly calling cpu_relax() to give an arch option for a low-latency option to improve power and/or SMT resource contention. This poses some difficulties for powerpc, which has SMT priority setting instructions (priorities determine how ifetch cycles are apportioned). powerpc's cpu_relax() is implemented by setting a low priority then setting normal priority. This has several problems: - Changing thread priority can have some execution cost and potential impact to other threads in the core. It's inefficient to execute them every time around a busy-wait loop. - Depending on implementation details, a `low ; medium` sequence may not have much if any affect. Some software with similar pattern actually inserts a lot of nops between, in order to cause a few fetch cycles with the low priority. - The busy-wait loop runs with regular priority. This might only be a few fetch cycles, but if there are several threads running such loops, they could cause a noticable impact on a non-idle thread. Implement spin_begin, spin_end primitives that can be used around busy wait loops, which default to no-ops. And spin_cpu_relax which defaults to cpu_relax. This will allow architectures to hook the entry and exit of busy-wait loops, and will allow powerpc to set low SMT priority at entry, and normal priority at exit. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Akshay Adiga authored
pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup reason is an HMI in CHECK_HMI_INTERRUPT. When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is no point setting r12 to SRR1. However, we don't set r12 at all so r12 contains garbage (likely a kernel pointer), and is still used to check HMI assuming that it contained SRR1. This causes the OPAL msglog to be filled with the following print: HMI: Received HMI interrupt: HMER = 0x0040000000000000 This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of r12[42:45] corresponds to HMI as wakeup reason. Prior to commit 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1 to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1 value would just happen to never have bits 42:45 set. Fixes: 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths") Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log and comment massaging] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anshuman Khandual authored
Adds some explaination on how the vmemmap based struct page layout's physical mapping is allocated and tracked through linked list. It also keeps note of a possible race condition. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Anshuman Khandual authored
Add some explaination to the layout of vmemmap virtual address space and how physical page mapping is only used for valid PFNs present at any point on the system. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Santosh Sivaraj authored
nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a compile time constant, which shouldn't be compared against during cpu kick. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Santosh Sivaraj authored
During secondary start, we do not need to BUG_ON if an invalid CPU number is passed. We already print an error if secondary cannot be started, so just return an error instead. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Javier Martinez Canillas authored
The at24 driver allows to register I2C EEPROM chips using different vendor and devices, but the I2C subsystem does not take the vendor into account when matching using the I2C table since it only has device entries. But when matching using an OF table, both the vendor and device has to be taken into account so the driver defines only a set of compatible strings using the "atmel" vendor as a generic fallback for compatible I2C devices. So add this generic fallback to the device node compatible string to make the device to match the driver using the OF device ID table. Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Javier Martinez Canillas authored
The at24 driver allows to register I2C EEPROM chips using different vendor and devices, but the I2C subsystem does not take the vendor into account when matching using the I2C table since it only has device entries. But when matching using an OF table, both the vendor and device has to be taken into account so the driver defines only a set of compatible strings using the "atmel" vendor as a generic fallback for compatible I2C devices. So add this generic fallback to the device node compatible string to make the device to match the driver using the OF device ID table. Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Javier Martinez Canillas authored
The at24 driver allows to register I2C EEPROM chips using different vendor and devices, but the I2C subsystem does not take the vendor into account when matching using the I2C table since it only has device entries. But when matching using an OF table, both the vendor and device has to be taken into account so the driver defines only a set of compatible strings using the "atmel" vendor as a generic fallback for compatible I2C devices. So add this generic fallback to the device node compatible string to make the device to match the driver using the OF device ID table. Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Javier Martinez Canillas authored
The at24 driver allows to register I2C EEPROM chips using different vendor and devices, but the I2C subsystem does not take the vendor into account when matching using the I2C table since it only has device entries. But when matching using an OF table, both the vendor and device has to be taken into account so the driver defines only a set of compatible strings using the "atmel" vendor as a generic fallback for compatible I2C devices. So add this generic fallback to the device node compatible string to make the device to match the driver using the OF device ID table. Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Javier Martinez Canillas authored
The at24 driver allows to register I2C EEPROM chips using different vendor and devices, but the I2C subsystem does not take the vendor into account when matching using the I2C table since it only has device entries. But when matching using an OF table, both the vendor and device has to be taken into account so the driver defines only a set of compatible strings using the "atmel" vendor as a generic fallback for compatible I2C devices. So add this generic fallback to the device node compatible string to make the device to match the driver using the OF device ID table. Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
A memory barrier is not required after the task wakes up, only if we clear the polling flag before waking. The case where we have work to do is the important one, so optimise for it. Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Ensure these don't get put into bouncing cachelines. Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
local_irq_enable can cause interrupts to be taken which could take significant amount of processing time. The idle process should set its polling flag before this, so another process that wakes it during this time will not have to send an IPI. Expand the TIF_POLLING_NRFLAG coverage to as large as possible. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Hari Bathini authored
Around 95% of memory is reserved by fadump/capture kernel. All this memory is freed, one page at a time, on writing '1' to the node /sys/kernel/fadump_release_mem. On systems with large memory, this can take a long time to complete, leading to soft lockup warning messages. To avoid this, add reschedule points at regular intervals. Also, while memblock_reserve() implicitly takes care of holes in the given memory range while reserving memory, those holes need to be taken care of while releasing memory as memory is freed one page at a time. Add support to skip holes while releasing memory. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Hari Bathini authored
fadump fails to register when there are holes in boot memory area. Provide a helpful error message to the user in such case. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Hari Bathini authored
To register fadump, boot memory area - the size of low memory chunk that is required for a kernel to boot successfully when booted with restricted memory, is assumed to have no holes. But this memory area is currently not protected from hot-remove operations. So, fadump could fail to re-register after a memory hot-remove operation, if memory is removed from boot memory area. To avoid this, ensure that memory from boot memory area is not hot-removed when fadump is registered. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Hari Bathini authored
fadump sets up crash memory ranges to be used for creating PT_LOAD program headers in elfcore header. Memory chunk RMA_START through boot memory area size is added as the first memory range because firmware, at the time of crash, moves this memory chunk to different location specified during fadump registration making it necessary to create a separate program header for it with the correct offset. This memory chunk is skipped while setting up the remaining memory ranges. But currently, there is possibility that some of this memory may have duplicate entries like when it is hot-removed and added again. Ensure that no two memory ranges represent the same memory. When 5 lmbs are hot-removed and then hot-plugged before registering fadump, here is how the program headers in /proc/vmcore exported by fadump look like without this change: Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align NOTE 0x0000000000010000 0x0000000000000000 0x0000000000000000 0x0000000000001894 0x0000000000001894 0 LOAD 0x0000000000021020 0xc000000000000000 0x0000000000000000 0x0000000040000000 0x0000000040000000 RWE 0 LOAD 0x0000000040031020 0xc000000000000000 0x0000000000000000 0x0000000010000000 0x0000000010000000 RWE 0 LOAD 0x0000000050040000 0xc000000010000000 0x0000000010000000 0x0000000050000000 0x0000000050000000 RWE 0 LOAD 0x00000000a0040000 0xc000000060000000 0x0000000060000000 0x000000019ffe0000 0x000000019ffe0000 RWE 0 and with this change: Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align NOTE 0x0000000000010000 0x0000000000000000 0x0000000000000000 0x0000000000001894 0x0000000000001894 0 LOAD 0x0000000000021020 0xc000000000000000 0x0000000000000000 0x0000000040000000 0x0000000040000000 RWE 0 LOAD 0x0000000040030000 0xc000000040000000 0x0000000040000000 0x0000000020000000 0x0000000020000000 RWE 0 LOAD 0x0000000060030000 0xc000000060000000 0x0000000060000000 0x000000019ffe0000 0x000000019ffe0000 RWE 0 Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
Correct "branch" event code of Power9 is "r4d05e". Replace the current "branch" event code with "r4d05e" and add a hack to use "r10012" as event code for Power9 DD1. Fixes: d89f473f ("powerpc/perf: Fix PM_BRU_CMPL event code for power9") Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
There is no reason for that message to be pr_info(), it will be printed every time we start a KVM guest. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 Jun, 2017 10 commits
-
-
Benjamin Herrenschmidt authored
On POWER9 the ERAT may be incorrect on wakeup from some stop states that lose state. This causes random segvs and illegal instructions when these stop states are enabled. This patch invalidates the ERAT on wakeup on POWER9 to prevent this from causing a problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge comment change with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
From: Michael Neuling <mikey@neuling.org> On P9 (Nimbus) DD2 and later, in radix mode, the move to the PID register will implicitly invalidate the user space ERAT entries and leave the kernel ones alone. Thus the only thing needed is an isync() to synchronize this with subsequent uaccess's Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Russell Currey authored
On PHB3/POWER8 systems, devices can select between two different sections of address space, TVE#0 and TVE#1. TVE#0 is intended for 32bit devices that aren't capable of addressing more than 4GB. Selecting TVE#1 instead, with the capability of addressing over 4GB, is performed by setting bit 59 of a PCI address. However, some devices aren't capable of addressing at least 59 bits, but still want more than 4GB of DMA space. In order to enable this, reconfigure TVE#0 to be suitable for 64-bit devices by allocating memory past the initial 4GB that is inaccessible by 64-bit DMAs. This bypass mode is only enabled if a device requests 4GB or more of DMA address space, if the system has PHB3 (POWER8 systems), and if the device does not share a PE with any devices from different vendors. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Russell Currey authored
Add a helper that determines if all the devices contained in a given PE are all from the same vendor or not. This can be useful in determining if it's okay to make PE-wide changes that may be suitable for some devices but not for others. This is used later in the series. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Russell Currey authored
As with P7IOC and PHB3, add kernel-side support for decoding and printing diagnostic data for PHB4. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Russell Currey authored
Diagnostic data for PHBs currently works by allocated a fixed-sized buffer. This is simple, but either wastes memory (though only a few kilobytes) or in the case of PHB4 isn't enough to fit the whole data blob. For machines that don't describe the diagnostic data size in the device tree, use the hardcoded buffer size as before. For those that do, only allocate exactly what's needed. In the special case of P7IOC (which has two types of diag data), the larger should be specified in the device tree. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Russell Currey authored
Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs are affected, especially in the case where the whole PHB is frozen and 512 lines get printed. Check for duplicates when dumping the PEST to reduce useless output. For example: PE[0f8] A/B: 9700002600000000 80000080d00000f8 PE[0f9] A/B: 8000000000000000 0000000000000000 PE[..0fe] A/B: as above PE[0ff] A/B: 8440002b00000000 0000000000000000 instead of: PE[0f8] A/B: 9700002600000000 80000080d00000f8 PE[0f9] A/B: 8000000000000000 0000000000000000 PE[0fa] A/B: 8000000000000000 0000000000000000 PE[0fb] A/B: 8000000000000000 0000000000000000 PE[0fc] A/B: 8000000000000000 0000000000000000 PE[0fd] A/B: 8000000000000000 0000000000000000 PE[0fe] A/B: 8000000000000000 0000000000000000 PE[0ff] A/B: 8440002b00000000 0000000000000000 and you can imagine how much worse it can get for 512 PEs. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
Update to real function name. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
The asm code assumes the FP regs are at the start of fp_state. While this is true now, it may not always be the case and there is nothing enforcing it. This fixes the asm-offsets to point to the actual FP registers inside the fp_state. Similarly for VMX. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Neuling authored
The P9 PVR bits 12-15 don't indicate a revision but instead different chip configurations. From BookIV we have: Bits Configuration 0 : Scale out 12 cores 1 : Scale out 24 cores 2 : Scale up 12 cores 3 : Scale up 24 cores DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale out 24 core" configuration (ie. SMT4 not SMT8) which results in a PVR of 0x004e1200. The reported revision in /proc/cpuinfo is hence reported incorrectly as "18.0". This patch fixes this to mask off only the relevant bits for the major revision (ie. bits 8-11) for POWER9. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 Jun, 2017 1 commit
-
-
Balbir Singh authored
Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate Entry (Local)) instructions. The tlbie instruction has changed over the years, so not all versions accept the same operands. Use the ISA v3 field operands because they are the most verbose, we may change them in future. Example output: qemu-system-ppc-5371 [016] 1412.369519: tlbie: tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0 Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Add some missing trace_tlbie()s, reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 22 Jun, 2017 1 commit
-
-
Paul Mackerras authored
This converts the powerpc VDSO time update function to use the new interface introduced in commit 576094b7 ("time: Introduce new GENERIC_TIME_VSYSCALL", 2012-09-11). Where the old interface gave us the time as of the last update in seconds and whole nanoseconds, with the new interface we get the nanoseconds part effectively in a binary fixed-point format with tk->tkr_mono.shift bits to the right of the binary point. With the old interface, the fractional nanoseconds got truncated, meaning that the value returned by the VDSO clock_gettime function would have about 1ns of jitter in it compared to the value computed by the generic timekeeping code in the kernel. The powerpc VDSO time functions (clock_gettime and gettimeofday) already work in units of 2^-32 seconds, or 0.23283 ns, because that makes it simple to split the result into seconds and fractional seconds, and represent the fractional seconds in either microseconds or nanoseconds. This is good enough accuracy for now, so this patch avoids changing how the VDSO works or the interface in the VDSO data page. This patch converts the powerpc update_vsyscall_old to be called update_vsyscall and use the new interface. We convert the fractional second to units of 2^-32 seconds without truncating to whole nanoseconds. (There is still a conversion to whole nanoseconds for any legacy users of the vdso_data/systemcfg stamp_xtime field.) In addition, this improves the accuracy of the computation of tb_to_xs for those systems with high-frequency timebase clocks (>= 268.5 MHz) by doing the right shift in two parts, one before the multiplication and one after, rather than doing the right shift before the multiplication. (We can't do all of the right shift after the multiplication unless we use 128-bit arithmetic.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 21 Jun, 2017 4 commits
-
-
Santosh Sivaraj authored
Since trace_clock is in a different file and already marked with notrace, enable tracing in time.c by removing it from the disabled list in Makefile. Also annotate clocksource read functions and sched_clock with notrace. Testing: Timer and ftrace selftests run with different trace clocks. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid confusion over whether it runs in real or virtual mode - it runs in both. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
Michael Ellerman authored
slb_miss_realmode() doesn't always runs in real mode, which is what the name implies. So rename it to avoid confusing people. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
Michael Ellerman authored
All the callers of slb_miss_realmode currently open code the #ifndef CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. We have a macro to do this, BRANCH_TO_COMMON(), so use it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
- 20 Jun, 2017 3 commits
-
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
EX_R3 is used only for a small section of the bad stack handler. Merge it with EX_DAR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
EX_LR is used only for a small section of the SLB miss handler. Merge it with EX_DAR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-