- 21 Feb, 2017 4 commits
-
-
Vinod Koul authored
-
Vinod Koul authored
-
Vinod Koul authored
-
Vinod Koul authored
-
- 16 Feb, 2017 1 commit
-
-
Dan Williams authored
Back in 2011, Russell pointed out that the "async_tx channel switch" capability was violating expectations of the dma mapping api [1]. At the time the existing uses were reviewed as still usable, but that longer term we needed a rework of the raid offload implementation. While some of the framework for a fixed implementation was introduced in 2012 [2], the wider rewrite never materialized.

There continues to be interest in raid offload with new dma/raid engine drivers being submitted. Those drivers must not build on top of the broken channel switching capability.

Prevent async_tx from using an offload engine if the channel switching capability is enabled. This still allows the engine to be used for other purposes, but the broken way async_tx uses these engines for raid will be disabled. For configurations where this causes a performance regression the only solution is to start the work of eliminating the async_tx api and moving channel management into the raid code directly where it can manage marshalling an operation stream between multiple dma channels.

[1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html
[2]: https://lkml.org/lkml/2012/12/6/71

Cc: Anatolij Gustschin <agust@denx.de>
Cc: Anup Patel <anup.patel@broadcom.com>
Cc: Rameshwar Prasad Sahu <rsahu@apm.com>
Cc: Saeed Bishara <saeed.bishara@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
- 14 Feb, 2017 1 commit
-
-
Geert Uytterhoeven authored
By default, the DMA mask covers only the low 32-bit address space, which causes SWIOTLB on arm64 to fall back to a bounce buffer for DMA transfers involving memory outside the 32-bit address space. The R-Car DMA controller hardware supports a 40-bit address space, hence widen the DMA mask to 40 bits to actually make use of this feature. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
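Widening the masks is typically a one-liner in the driver's probe path; the following is a minimal sketch assuming the usual probe() context (pdev, ret), not the exact call site in rcar-dmac:

    /* Let the controller address the full 40-bit space instead of the
     * default 32-bit mask, avoiding SWIOTLB bouncing on arm64. */
    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
    if (ret)
            return ret;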
-
- 05 Feb, 2017 1 commit
-
-
Icenowy Zheng authored
As the 64-bit Allwinner H5 SoC has the same DMA engine as the H3, the DMA driver should be allowed to be built for ARM64 in order to make it work on the H5. Signed-off-by: Icenowy Zheng <icenowy@aosc.xyz> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
- 31 Jan, 2017 1 commit
-
-
Boris Brezillon authored
Almost all ->device_prep_dma_xx() methods have a wrapper defined in dmaengine.h. Add one for ->device_prep_dma_memcpy(). Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
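A sketch of such a wrapper, following the pattern of the other prep helpers in dmaengine.h (the exact layout in the header may differ slightly):

    static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
                    struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
                    size_t len, unsigned long flags)
    {
            /* Bail out if the channel's driver does not implement memcpy. */
            if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy)
                    return NULL;

            return chan->device->device_prep_dma_memcpy(chan, dest, src,
                                                         len, flags);
    }

Clients can then call dmaengine_prep_dma_memcpy() instead of dereferencing the device_prep_dma_memcpy callback by hand.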
-
- 25 Jan, 2017 9 commits
-
-
Jun Nie authored
Fix a build warning related to PAGE_SIZE. The maximum DMA length has nothing to do with PAGE_SIZE; just use a fixed number for the definition.

drivers/dma/zx_dma.c: In function 'zx_dma_prep_memcpy':
drivers/dma/zx_dma.c:523:8: warning: division by zero [-Wdiv-by-zero]
drivers/dma/zx_dma.c: In function 'zx_dma_prep_slave_sg':
drivers/dma/zx_dma.c:567:11: warning: division by zero [-Wdiv-by-zero]

Signed-off-by: Jun Nie <jun.nie@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
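The shape of the fix is simply a constant that no longer references PAGE_SIZE; the macro name and value below are illustrative assumptions, not necessarily what the driver uses:

    /* Maximum length of one hardware descriptor: a fixed number, since the
     * hardware limit has nothing to do with the kernel's PAGE_SIZE. */
    #define ZX_DMA_MAX_LEN          0x10000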
-
Andy Shevchenko authored
Intel Merrifield platform contains Intel integrated DMA (iDMA 32-bit) which has a slightly different register mapping, e.g. some bits in CTL_* and CFG_* channel registers, and has to use platform data since there is no autoconfiguration. The iDMA 32-bit specification is available in the publicly available documentation for Intel Braswell and BayTrail SoCs as LPE Audio DMA. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Andy Shevchenko authored
iDMA 32-bit is an Intel-designed DMA controller that behaves like the Synopsys DesignWare DMA. This patch adds support for the new Intel hardware. Because iDMA 32-bit has no autoconfiguration, the platform code must provide platform data to dw_dma_probe(). By default the full FIFO (1024 bytes) is assigned to channel 0; for the iDMA 32-bit case we slice the FIFO into equal parts between the channels. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
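As an illustration of the even split (the helper below is an assumption made for this write-up, not the driver's actual code):

    #define IDMA32_TOTAL_FIFO_SIZE  1024    /* bytes, per the commit message */

    /* Illustrative only: with 8 channels, each channel gets 128 bytes. */
    static unsigned int idma32_fifo_bytes_per_channel(unsigned int nr_channels)
    {
            return nr_channels ? IDMA32_TOTAL_FIFO_SIZE / nr_channels : 0;
    }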
-
Andy Shevchenko authored
The integrated DMA (iDMA 32-bit) is an Intel-designed DMA controller which mimics the Synopsys DesignWare DMA. This patch appends the register mappings for the parts which are slightly different from the DesignWare hardware. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Andy Shevchenko authored
The newly introduced helpers prepare the driver to support new DMA controller hardware. While here, introduce the DWC_CTLH_BLOCK_TS() macro as well. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
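A sketch of what such a macro typically looks like; the 12-bit field width is an assumption here, since the DesignWare BLOCK_TS field size is a synthesis-time parameter:

    /* Extract the block transfer size from a CTL_HI register value. */
    #define DWC_CTLH_BLOCK_TS_MASK  GENMASK(11, 0)
    #define DWC_CTLH_BLOCK_TS(x)    ((x) & DWC_CTLH_BLOCK_TS_MASK)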
-
Andy Shevchenko authored
iDMA 32-bit requires special handling of the FIFO during pause() / terminate_all(). Prepare the code to implement that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Andy Shevchenko authored
Replace convert_burst() with a one-liner in place. The change simplifies further extension of the driver to cover new DMA controller hardware. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Andy Shevchenko authored
It is really useful, and not only for debugging, to have the IRQ line and DMA pool labeled with the driver name and its instance ID. Do this for the DesignWare DMA driver. All current users of this IP will be enhanced later on. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
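A sketch of the idea, with the name buffer and field names assumed rather than taken from the driver: derive one per-instance label and reuse it for both resources.

    /* e.g. "dw:dma-controller@fe000000" then shows up in /proc/interrupts
     * and in the DMA pool statistics instead of an anonymous label. */
    snprintf(dw->name, sizeof(dw->name), "dw:%s", dev_name(chip->dev));

    err = request_irq(chip->irq, dw_dma_interrupt, IRQF_SHARED,
                      dw->name, dw);

    dw->desc_pool = dmam_pool_create(dw->name, chip->dev,
                                     sizeof(struct dw_desc), 4, 0);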
-
Jarkko Nikula authored
When transferring more data than the maximum block size supported by the HW multiplied by the source width, the transfer is split into smaller chunks. Currently the code calculates the memory width, and thus the alignment, before splitting, for both memory-to-device and device-to-memory transfers.

For memory-to-device transfers this works fine, since the alignment is preserved through the splitting and the split blocks are still memory width aligned. However, in device-to-memory transfers the alignment breaks when the maximum block size multiplied by the register width doesn't have the same alignment as the buffer. For instance, when transferring 4100 bytes (32-bit aligned) from an 8-bit register on a DW DMA that has a maximum block size of 4095 elements, an attempt to do such a transfer caused data corruption.

Fix this by calculating and setting the destination memory width after splitting, using the split block alignment and length.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
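A worked illustration of why the width must come from the split block rather than from the whole buffer; the helper is an assumption, not the driver's code:

    /* For a 4095-byte split block the lowest set bit of (addr | len) is bit 0,
     * so only byte-wide accesses are safe, whereas the full 4100-byte buffer
     * (32-bit aligned) would wrongly suggest 4-byte accesses. */
    static unsigned int example_mem_width(dma_addr_t addr, size_t len,
                                          unsigned int max_width)
    {
            return min_t(unsigned int, max_width, 1U << __ffs(addr | len));
    }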
-
- 14 Jan, 2017 2 commits
-
-
Linus Walleij authored
The ste_dma40 has burst level granularity on the residue registers, which is necessary for some clients to know, notably the UART. Before this patch we get this message: uart-pl011 80007000.uart: RX DMA disabled - no residue processing This patch fixes it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
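The capability is advertised through the residue_granularity field of struct dma_device; a minimal sketch, assuming it is set alongside the driver's other slave capabilities:

    /* Residue is reported per burst, which is what the PL011 UART needs
     * in order to enable RX DMA residue processing. */
    dma_dev->residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;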
-
Linus Walleij authored
Since the introduction of the .directions flags, ste_dma40 was never patched to indicate which transfer directions it can manage. This causes a problem when trying to use the dmaengine for generic ALSA SoC DMA: ux500-msp-i2s.1: Failed to get DMA channel capabilities, falling back to period counting: -6 This patch fixes this issue by indicating the supported transfer directions for slave and memcpy channels. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
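A sketch of the capability advertisement involved; the variable names are assumptions (ste_dma40 registers separate dma_device instances for its slave and memcpy channels):

    /* Slave channels move data between a device and memory. */
    slave_dev->directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);

    /* memcpy channels only do memory-to-memory transfers. */
    memcpy_dev->directions = BIT(DMA_MEM_TO_MEM);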
-
- 10 Jan, 2017 2 commits
-
-
M'boumba Cedric Madianga authored
This patch adds some error messages when a slave device fails to request a channel. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Reviewed-by: Ludovic BARRE <ludovic.barre@st.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Andy Shevchenko authored
The LPE Audio driver should take care of its DMA IPs by itself. Keeping an ID like this in dw_dma_pci.c is wrong anyway, since that block has two DMA controllers under one ID (like an MFD device). That's also why I didn't include the LPE Audio ID for Intel Merrifield previously. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
- 03 Jan, 2017 6 commits
-
-
M'boumba Cedric Madianga authored
This patch sets the max_burst value supported by the STM32 DMA. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
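The value is exposed through the max_burst field of struct dma_device, which dma_get_slave_caps() reports to clients; a minimal sketch, with the burst length itself left as an assumption:

    /* Longest burst the STM32 DMA streams accept (value assumed here). */
    dd->max_burst = STM32_DMA_MAX_BURST;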
-
M'boumba Cedric Madianga authored
Implement the new device_synchronize() callback to allow proper synchronization when stopping a channel. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
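For a virt-dma based driver this typically reduces to vchan_synchronize(); a sketch under that assumption, with helper and field names assumed:

    static void stm32_dma_synchronize(struct dma_chan *c)
    {
            struct stm32_dma_chan *chan = to_stm32_dma_chan(c);

            /* Wait until any in-flight vchan callback has finished. */
            vchan_synchronize(&chan->vchan);
    }

The callback is then wired up as .device_synchronize, which lets clients safely call dmaengine_synchronize() after dmaengine_terminate_async().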
-
M'boumba Cedric Madianga authored
This patch resolves the residue computation issue detected in cyclic mode. Now, in cyclic mode, we increment the next_sg variable as soon as a period is transferred, instead of after pushing a new sg request. Then, we take into account that after transferring a complete buffer, the next_sg variable is equal to 0. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Reviewed-by: Ludovic BARRE <ludovic.barre@st.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
M'boumba Cedric Madianga authored
This patch reworks the way transfer starting is managed. Now, starting the DMA is only allowed when the channel is not busy. Then, stm32_dma_start_transfer is declared as void. Finally, after each transfer completion, we start the next transfer if a new descriptor has been queued in the issued list during the ongoing transfer. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Reviewed-by: Ludovic BARRE <ludovic.barre@st.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
M'boumba Cedric Madianga authored
Only four cells are required for the DMA client binding, not five. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Reviewed-by: Ludovic BARRE <ludovic.barre@st.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
M'boumba Cedric Madianga authored
As the STM32 DMA driver is only used as a built-in driver, it cannot be built as a module. Signed-off-by: M'boumba Cedric Madianga <cedric.madianga@gmail.com> Reviewed-by: Ludovic BARRE <ludovic.barre@st.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
- 02 Jan, 2017 6 commits
-
-
Shawn Guo authored
The dma residue is defined as the free space to the end of the transfer buffer, which could be multiple segments chained together. So the residue calculation in zx_dma_tx_status() works for both the slave_sg and cyclic cases. But unfortunately, the 'index' is wrong. It should be increased by one, because the current segment is already occupied and shouldn't be counted into the free space. This fixes the HDMI audio noise issue we see on ZX296718 with the SPDIF interface. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Shawn Guo authored
The zx_dma driver supports cyclic transfer mode. Let's set DMA_CYCLIC cap_mask bit to make that clear, and avoid unnecessary failure when clients request channel via dma_request_chan_by_mask() with DMA_CYCLIC bit set in mask. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
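Advertising the capability is a one-liner on the channel mask next to the existing capability setup in probe (the structure field below is an assumption):

    /* Let dma_request_chan_by_mask() with DMA_CYCLIC set find this device. */
    dma_cap_set(DMA_CYCLIC, d->slave.cap_mask);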
-
Shawn Guo authored
The ZTE ZX dma driver is not ZX296702 specific. It works not only for ZX296702 but also for other ZTE ZX family platforms like ZX296718. Let's rename the file to reflect that. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Magnus Lilja authored
Commit 3d8cc000 ("dmaengine: ipu: Consolidate duplicated irq handlers") consolidated the two interrupts routines into one, but the remaining interrupt routine only checks the status of the error interrupts, not the normal interrupts. This patch fixes that problem (tested on i.MX31 PDK board). Fixes: 3d8cc000 ("dmaengine: ipu: Consolidate duplicated irq handlers") Cc: Vinod Koul <vinod.koul@intel.com> Cc: <stable@vger.kernel.org> # 4.1.x Signed-off-by: Magnus Lilja <lilja.magnus@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Amit Kumar authored
Fix a typo in the contact field: cudeaurora.org is used in place of codeaurora.org. Signed-off-by: Amit Kumar <free.amit.kumar@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
-
Matthew Wilcox authored
dmaengine currently uses an IDR to allocate DMA IDs, but it only needs to know whether IDs are in use or not; the ID to pointer functionality of the IDR is unused. That means it can use the more space-efficient IDA. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
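A sketch of the resulting allocation pattern with the IDA API of that era; the static variable name is an assumption:

    static DEFINE_IDA(dma_ida);

    /* Grab the lowest free device id; unlike an IDR, no pointer is stored. */
    rc = ida_simple_get(&dma_ida, 0, 0, GFP_KERNEL);
    if (rc < 0)
            return rc;
    device->dev_id = rc;

    /* On unregistration the id is simply handed back. */
    ida_simple_remove(&dma_ida, device->dev_id);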
-
- 01 Jan, 2017 2 commits
-
-
Linus Torvalds authored
-
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Linus Torvalds authored
Pull DAX updates from Dan Williams:
 "The completion of Jan's DAX work for 4.10.

  As I mentioned in the libnvdimm-for-4.10 pull request, these are some final fixes for the DAX dirty-cacheline-tracking invalidation work that was merged through the -mm, ext4, and xfs trees in -rc1. These patches were prepared prior to the merge window, but we waited for 4.10-rc1 to have a stable merge base after all the prerequisites were merged.

  Quoting Jan on the overall changes in these patches:

    "So I'd like all these 6 patches to go for rc2. The first three patches fix invalidation of exceptional DAX entries (a bug which is there for a long time) - without these patches data loss can occur on power failure even though user called fsync(2). The other three patches change locking of DAX faults so that ->iomap_begin() is called in a more relaxed locking context and we are safe to start a transaction there for ext4"

  These have received a build success notification from the kbuild robot, and pass the latest libnvdimm unit tests. There have not been any -next releases since -rc1, so they have not appeared there"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  ext4: Simplify DAX fault path
  dax: Call ->iomap_begin without entry lock during dax fault
  dax: Finish fault completely when loading holes
  dax: Avoid page invalidation races and unnecessary radix tree traversals
  mm: Invalidate DAX radix tree entries only if appropriate
  ext2: Return BH_New buffers for zeroed blocks
-
- 30 Dec, 2016 2 commits
-
-
Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux
Linus Torvalds authored
Pull documentation fixes from Jonathan Corbet:
 "Two small fixes:

  - A merge error on my part broke the DocBook build. I've requisitioned one of tglx's frozen sharks for appropriate disciplinary action and resolved to be more careful about testing the DocBook stuff as long as it's still around.

  - Fix an error in unaligned-memory-access.txt"

* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
  Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
  docs: Fix build failure
-
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds authored
Pull crypto fix from Herbert Xu:
 "This fixes a boot failure on some platforms when crypto self test is enabled along with the new acomp interface"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: testmgr - Use heap buffer for acomp test input
-
- 29 Dec, 2016 2 commits
-
-
Olof Johansson authored
mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
  return test_bit(PG_waiters);
         ^~~~~~~~

Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
In commit 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And on architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hiccup x86 has, for example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
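A sketch of the generic, non-optimized fallback described above (placement and exact spelling of the helper are assumptions based on the commit text):

    #ifndef clear_bit_unlock_is_negative_byte
    static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
                                                         volatile unsigned long *p)
    {
            clear_bit_unlock(nr, p);
            /* PG_waiters now sits at bit #7, i.e. the sign bit of the byte. */
            return test_bit(7, p);
    }
    #define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
    #endif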
-
- 28 Dec, 2016 1 commit
-
-
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds authored
Pull crypto fix from Herbert Xu:
 "This fixes a hash corruption bug in the marvell driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
-