- 21 Feb, 2019 6 commits
-
-
Peter Xu authored
The change_pte() notifier was designed to use as a quick path to update secondary MMU PTEs on write permission changes or PFN changes. For KVM, it could reduce the vm-exits when vcpu faults on the pages that was touched up by KSM. It's not used to do cache invalidations, for example, if we see the notifier will be called before the real PTE update after all (please see set_pte_at_notify that set_pte_at was called later). All the necessary cache invalidation should all be done in invalidate_range() already. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Merge commits we're sharing with kvm-ppc tree.
-
Paul Mackerras authored
This adds an "in_guest" parameter to machine_check_print_event_info() so that we can avoid trying to translate guest NIP values into symbolic form using the host kernel's symbol table. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Paul Mackerras authored
This makes the handling of machine check interrupts that occur inside a guest simpler and more robust, with less done in assembler code and in real mode. Now, when a machine check occurs inside a guest, we always get the machine check event struct and put a copy in the vcpu struct for the vcpu where the machine check occurred. We no longer call machine_check_queue_event() from kvmppc_realmode_mc_power7(), because on POWER8, when a vcpu is running on an offline secondary thread and we call machine_check_queue_event(), that calls irq_work_queue(), which doesn't work because the CPU is offline, but instead triggers the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires again and again because nothing clears the condition). All that machine_check_queue_event() actually does is to cause the event to be printed to the console. For a machine check occurring in the guest, we now print the event in kvmppc_handle_exit_hv() instead. The assembly code at label machine_check_realmode now just calls C code and then continues exiting the guest. We no longer either synthesize a machine check for the guest in assembly code or return to the guest without a machine check. The code in kvmppc_handle_exit_hv() is extended to handle the case where the guest is not FWNMI-capable. In that case we now always synthesize a machine check interrupt for the guest. Previously, if the host thinks it has recovered the machine check fully, it would return to the guest without any notification that the machine check had occurred. If the machine check was caused by some action of the guest (such as creating duplicate SLB entries), it is much better to tell the guest that it has caused a problem. Therefore we now always generate a machine check interrupt for guests that are not FWNMI-capable. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Merge hch's big DMA rework series. This is in a topic branch in case he wants to merge it to minimise conflicts.
-
Michael Ellerman authored
kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9 when guest and host are both running with the Radix MMU. Currently in that path we don't save the host AMR (Authority Mask Register) value, and we always restore 0 on return to the host. That is OK at the moment because the AMR is not used for storage keys with the Radix MMU. However we plan to start using the AMR on Radix to prevent the kernel from reading/writing to userspace outside of copy_to/from_user(). In order to make that work we need to save/restore the AMR value. We only restore the value if it is different from the guest value, which is already in the register when we exit to the host. This should mean we rarely need to actually restore the value when running a modern Linux as a guest, because it will be using the same value as us. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Russell Currey <ruscur@russell.cc>
-
- 19 Feb, 2019 1 commit
-
-
Michael Ellerman authored
There's a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.
-
- 18 Feb, 2019 32 commits
-
-
Christoph Hellwig authored
There is no need to provide anything but get_arch_dma_ops to <linux/dma-mapping.h>. More the remaining declarations to <asm/iommu.h> and drop all the includes. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
There is no good reason for this helper, just opencode it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Just fold the calculation into __phys_to_dma/__dma_to_phys as those are the only places that should know about it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Now that we've switched all the powerpc nommu and swiotlb methods to use the generic dma_direct_* calls we can remove these ops vectors entirely and rely on the common direct mapping bypass that avoids indirect function calls entirely. This also allows to remove a whole lot of boilerplate code related to setting up these operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Switch the streaming DMA mapping and ownership transfer methods to the functionally identical dma_direct_ versions. Factor the cache maintainance helpers into the form expected by the common code for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The generic code allows a few nice things such as node local allocations and dipping into the CMA area. The lookup of the right zone for a given dma mask works a little different, but the results should be the same. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The only user left is powerpc, but even there the generic dma-direct version works just as well, given that we guarantee that the swiotlb buffer must always be addressable. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
This function is largely identical to the generic version used everywhere else. Replace it with the generic version. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
This function is identical to the generic dma_direct_get_required_mask, except that the generic version also takes the bus_dma_mask account, which could lead to incorrect results in the powerpc version. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The coherent cache version of this function already is functionally identicall to the default version, and by defining the arch_dma_coherent_to_pfn hook the same is ture for the noncoherent version as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Use the standard portable helper instead of the powerpc specific one, which is about to go away. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Instead of letting the architecture supply all of dma_set_mask just give it an additional hook selected by Kconfig. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
We need to compare the last byte in the dma range and not the one after it for the bus_dma_mask, just like we do for the regular dma_mask. Fix this cleanly by merging the two comparisms into one. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The max_direct_dma_addr duplicates the bus_dma_mask field in struct device. Use the generic field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
pci_dma_dev_setup_swiotlb is only used by the fsl_pci code, and closely related to it, so fsl_pci.c seems like a better place for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
This function is only used by the Cell iommu code, which can keep track if it is using the iommu internally just as good. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
All iommu capable platforms now always use the iommu code with the internal bypass, so there is not need for this magic anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The ppc_md and pci_controller_ops methods are unused now and can be removed. The dma_nommu implementation is generic to the generic one except for using max_pfn instead of calling into the memblock API, and all other dma_map_ops instances implement a method of their own. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Use the generic iommu bypass code instead of overriding set_dma_mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
These devices are not PCIe devices and do not have associated dma map ops, so this is just dead code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
This function is completely bogus - the fact that two PCIe devices come from the same vendor has absolutely nothing to say about the DMA capabilities and characteristics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Use the generic iommu bypass code instead of overriding set_dma_mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
If dart_init failed we didn't have a chance to setup dma or controller ops yet, so there is no point in resetting them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
This gets rid of a lot of clumsy code and finally allows us to mark dma_iommu_ops const. Includes fixes from Michael Ellerman. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Configure the dma settings at device setup time, and stop playing games with get_pci_dma_ops. This prepares for using the common dma_configure code later on. Includes fixes from Michael Ellerman. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Use the generic iommu bypass code instead of overriding set_dma_mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Call dma_get_required_mask_pSeriesLP directly instead of dma_iommu_ops to simply the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
Add a new iommu_bypass flag to struct dev_archdata so that the dma_iommu implementation can handle the direct mapping transparently instead of switiching ops around. Setting of this flag is controlled by new pci_controller_ops method. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
vio_dma_mapping_ops currently does a lot of indirect calls through dma_iommu_ops, which not only make the code harder to follow but are also expensive in the post-spectre world. Unwind the indirect calls by calling the ppc_iommu_* or iommu_* APIs directly applicable, or just use the dma_iommu_* methods directly where we can. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
If there is no ZONE_DMA32 we might need GFP_DMA to be able to allocate memory that satisfies a 32-bit DMA mask. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christoph Hellwig authored
The pasemi driver never set a DMA mask, and given that the powerpc DMA mapping routines never check it this worked ok so far. But the generic dma-direct code which I plan to switch on for powerpc checks the DMA mask and fails unsupported mapping requests, so we need to make sure the proper 64-bit mask is set. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 17 Feb, 2019 1 commit
-
-
Michael Ellerman authored
In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT rather than just checking that the value is non-zero, e.g.: static inline int pgd_present(pgd_t pgd) { - return !pgd_none(pgd); + return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT)); } Unfortunately this is broken on big endian, as the result of the bitwise & is truncated to int, which is always zero because _PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and pud_present() are always false at compile time, and the compiler elides the subsequent code. Remarkably with that bug present we are still able to boot and run with few noticeable effects. However under some work loads we are able to trigger a warning in the ext4 code: WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0 CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1 ... NIP .ext4_set_page_dirty+0x70/0xb0 LR .set_page_dirty+0xa0/0x150 Call Trace: .set_page_dirty+0xa0/0x150 .unmap_page_range+0xbf0/0xe10 .unmap_vmas+0x84/0x130 .unmap_region+0xe8/0x190 .__do_munmap+0x2f0/0x510 .__vm_munmap+0x80/0x110 .__se_sys_munmap+0x14/0x30 system_call+0x5c/0x70 The fix is simple, we need to convert the result of the bitwise & to an int before returning it. Thanks to Erhard, Jan Kara and Aneesh for help with debugging. Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Erhard F. <erhard_f@mailbox.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-