- 21 Nov, 2019 2 commits
-
-
Nicolas Saenz Julienne authored
Using a mask to represent bus DMA constraints has a set of limitations. The biggest one being it can only hold a power of two (minus one). The DMA mapping code is already aware of this and treats dev->bus_dma_mask as a limit. This quirk is already used by some architectures although still rare. With the introduction of the Raspberry Pi 4 we've found a new contender for the use of bus DMA limits, as its PCIe bus can only address the lower 3GB of memory (of a total of 4GB). This is impossible to represent with a mask. To make things worse the device-tree code rounds non power of two bus DMA limits to the next power of two, which is unacceptable in this case. In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all over the tree and treat it as such. Note that dev->bus_dma_limit should contain the higher accessible DMA address. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Merge branch 'for-next/zone-dma' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into dma-mapping-for-next Pull in a stable branch from the arm64 tree that adds the zone_dma_bits variable to avoid creating hard to resolve conflicts with that addition.
-
- 20 Nov, 2019 7 commits
-
-
Christoph Hellwig authored
The valid memory address check in dma_capable only makes sense when mapping normal memory, not when using dma_map_resource to map a device resource. Add a new boolean argument to dma_capable to exclude that check for the dma_map_resource case. Fixes: b12d6627 ("dma-direct: check for overflows on 32 bit DMA addresses") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
-
Christoph Hellwig authored
When mapping resources we can't just use swiotlb ram for bounce buffering. Switch to a direct dma_capable check instead. Fixes: cfced786 ("dma-mapping: remove the default map_resource implementation") Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
-
Dan Carpenter authored
The put_hash_bucket() is a bit cleaner if it takes an unsigned long directly instead of a pointer to unsigned long. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Support for calling the DMA API functions without a valid device pointer was removed a while ago, so remove the stale support for that from the powerpc __phys_to_dma / __dma_to_phys helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
-
Christoph Hellwig authored
Move dma_capable down a bit so that we don't need a forward declaration for phys_to_dma. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
-
Christoph Hellwig authored
Currently each architectures that wants to override dma_to_phys and phys_to_dma also has to provide dma_capable. But there isn't really any good reason for that. powerpc and mips just have copies of the generic one minus the latests fix, and the arm one was the inspiration for said fix, but misses the bus_dma_mask handling. Make all architectures use the generic version instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
-
Christoph Hellwig authored
These are pure cache maintainance routines, so drop the unused struct device argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 11 Nov, 2019 7 commits
-
-
Nicolas Saenz Julienne authored
The devices found behind this PCIe chip have unusual DMA mapping constraints as there is an AMBA interconnect placed in between them and the different PCI endpoints. The offset between physical memory addresses and AMBA's view is provided by reading a PCI config register, which is saved and used whenever DMA mapping is needed. It turns out that this DMA setup can be represented by properly setting 'dma_pfn_offset', 'dma_bus_mask' and 'dma_mask' during the PCI device enable fixup. And ultimately allows us to get rid of this device's custom DMA functions. Aside from the code deletion and DMA setup, sta2x11_pdev_to_mapping() is moved to avoid warnings whenever CONFIG_PM is not enabled. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Nicolas Saenz Julienne authored
As seen on the new Raspberry Pi 4 and sta2x11's DMA implementation it is possible for a device configured with 32 bit DMA addresses and a partial DMA mapping located at the end of the address space to overflow. It happens when a higher physical address, not DMAable, is translated to it's DMA counterpart. For example the Raspberry Pi 4, configurable up to 4 GB of memory, has an interconnect capable of addressing the lower 1 GB of physical memory with a DMA offset of 0xc0000000. It transpires that, any attempt to translate physical addresses higher than the first GB will result in an overflow which dma_capable() can't detect as it only checks for addresses bigger then the maximum allowed DMA address. Fix this by verifying in dma_capable() if the DMA address range provided is at any point lower than the minimum possible DMA address on the bus. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Eric Dumazet authored
With modern NIC, it is not unusual having about ~256,000 active dma mappings and a hash size of 1024 buckets is too small. Forcing full cache line per bucket does not seem useful, especially now that we have contention on free_entries_lock for allocations and freeing of entries. Better use the space to fit more buckets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Eric Dumazet authored
Move all fields used during exact match lookups to the first cache line. This makes debug_dma_mapping_error() and friends about 50% faster. Since it removes two 32bit holes, force a cacheline alignment on struct dma_debug_entry. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Switch xtensa over to use the generic uncached support, and thus the generic implementations of dma_alloc_* and dma_alloc_*, which also gains support for mmaping DMA memory. The non-working nommu DMA support has been disabled, but could be re-enabled easily if platforms that actually have an uncached segment show up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com>
-
Christoph Hellwig authored
Integrate the generic dma remapping implementation into the main flow. This prepares for architectures like xtensa that use an uncached segment for pages in the kernel mapping, but can also remap highmem from CMA. To simplify that implementation we now always deduct the page from the physical address via the DMA address instead of the virtual address. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-
Christoph Hellwig authored
For dma-direct we know that the DMA address is an encoding of the physical address that we can trivially decode. Use that fact to provide implementations that do not need the arch_dma_coherent_to_pfn architecture hook. Note that we still can only support mmap of non-coherent memory only if the architecture provides a way to set an uncached bit in the page tables. This must be true for architectures that use the generic remap helpers, but other architectures can also manually select it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-
- 07 Nov, 2019 3 commits
-
-
Christoph Hellwig authored
The argument isn't used anywhere, so stop passing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-
Christoph Hellwig authored
We can just call dma_free_contiguous directly instead of wrapping it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-
Nicolas Saenz Julienne authored
With the introduction of ZONE_DMA in arm64 we moved the default CMA and crashkernel reservation into that area. This caused a regression on big machines that need big CMA and crashkernel reservations. Note that ZONE_DMA is only 1GB big. Restore the previous behavior as the wide majority of devices are OK with reserving these in ZONE_DMA32. The ones that need them in ZONE_DMA will configure it explicitly. Fixes: 1a8e1cef ("arm64: use both ZONE_DMA and ZONE_DMA32") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 01 Nov, 2019 1 commit
-
-
Nicolas Saenz Julienne authored
Some architectures, notably ARM, are interested in tweaking this depending on their runtime DMA addressing limitations. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 31 Oct, 2019 1 commit
-
-
Kees Cook authored
Now that the vmap area checks are being performed in the DMA infrastructure directly, there is no need to repeat them in USB. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 30 Oct, 2019 10 commits
-
-
Shyam Saini authored
These parameters are only referenced by __init routine calls during early boot so they should be marked as __initdata and __initconst accordingly. Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Eric Dumazet authored
debug_dma_dump_mappings() can take a lot of cpu cycles : lpk43:/# time wc -l /sys/kernel/debug/dma-api/dump 163435 /sys/kernel/debug/dma-api/dump real 0m0.463s user 0m0.003s sys 0m0.459s Let's add a cond_resched() to avoid holding cpu for too long. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Corentin Labbe <clabbe@baylibre.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Kees Cook authored
As we've seen from USB and other areas[1], we need to always do runtime checks for DMA operating on memory regions that might be remapped. This adds vmap checks (similar to those already in USB but missing in other places) into dma_map_single() so all callers benefit from the checking. [1] https://git.kernel.org/linus/3840c5b78803b2b6cc1ff820100a74a092c40cbbSuggested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> [hch: fixed the printk message] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Vladimir Murzin authored
Daniele reported that issue previously fixed in c41f9ea9 ("drivers: dma-coherent: Account dma_pfn_offset when used with device tree") reappear shortly after 43fc509c ("dma-coherent: introduce interface for default DMA pool") where fix was accidentally dropped. Lets put fix back in place and respect dma-ranges for reserved memory. Fixes: 43fc509c ("dma-coherent: introduce interface for default DMA pool") Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com> Tested-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com> Tested-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull iommu fixes from Joerg Roedel: - Follow-on fix for Renesas IPMMU to get rid of a redundant error message. - Quirk for AMD IOMMU to make it work on another Acer Laptop model with a broken IVRS ACPI table. - Fix for a panic at kdump in the Intel IOMMU driver. * tag 'iommu-fixes-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix panic after kexec -p for kdump iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41 iommu/ipmmu-vmsa: Remove dev_err() on platform_get_irq() failure
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds authored
Pull gfs2 fix from Andreas Gruenbacher: "Fix remounting (broken in -rc1)." * tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix initialisation of args for remount
-
Andrew Price authored
When gfs2 was converted to use fs_context, the initialisation of the mount args structure to the currently active args was lost with the removal of gfs2_remount_fs(), so the checks of the new args on remount became checks against the default values instead of the current ones. This caused unexpected remount behaviour and test failures (xfstests generic/294, generic/306 and generic/452). Reinstate the args initialisation, this time in gfs2_init_fs_context() and conditional upon fc->purpose, as that's the only time we get control before the mount args are parsed in the remount process. Fixes: 1f52aa08 ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
John Donnelly authored
This cures a panic on restart after a kexec operation on 5.3 and 5.4 kernels. The underlying state of the iommu registers (iommu->flags & VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as "DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping(). [ 43.654737] BUG: kernel NULL pointer dereference, address: 0000000000000056 [ 43.655720] #PF: supervisor read access in kernel mode [ 43.655720] #PF: error_code(0x0000) - not-present page [ 43.655720] PGD 0 P4D 0 [ 43.655720] Oops: 0000 [#1] SMP PTI [ 43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.2-1940.el8uek.x86_64 #1 [ 43.655720] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018 [ 43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0 [ 43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff 74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0 01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b [ 43.655720] RSP: 0018:ffffc9000001b9b0 EFLAGS: 00010246 [ 43.655720] RAX: 0000000000000001 RBX: 0000000000001000 RCX: fffffffffffffffd [ 43.655720] RDX: fffffffffffffffe RSI: ffff8880719b8000 RDI: ffff8880477460b0 [ 43.655720] RBP: ffffc9000001b9e8 R08: 0000000000000000 R09: ffff888047c01700 [ 43.655720] R10: 00002194036fc692 R11: 0000000000000000 R12: 0000000000000000 [ 43.655720] R13: ffff8880477460b0 R14: 0000000000000cc0 R15: ffff888072d2b558 [ 43.655720] FS: 0000000000000000(0000) GS:ffff888071c00000(0000) knlGS:0000000000000000 [ 43.655720] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.655720] CR2: 0000000000000056 CR3: 000000007440a002 CR4: 00000000001606b0 [ 43.655720] Call Trace: [ 43.655720] ? intel_alloc_coherent+0x2a/0x180 [ 43.655720] ? __schedule+0x2c2/0x650 [ 43.655720] dma_alloc_attrs+0x8c/0xd0 [ 43.655720] dma_pool_alloc+0xdf/0x200 [ 43.655720] ehci_qh_alloc+0x58/0x130 [ 43.655720] ehci_setup+0x287/0x7ba [ 43.655720] ? _dev_info+0x6c/0x83 [ 43.655720] ehci_pci_setup+0x91/0x436 [ 43.655720] usb_add_hcd.cold.48+0x1d4/0x754 [ 43.655720] usb_hcd_pci_probe+0x2bc/0x3f0 [ 43.655720] ehci_pci_probe+0x39/0x40 [ 43.655720] local_pci_probe+0x47/0x80 [ 43.655720] pci_device_probe+0xff/0x1b0 [ 43.655720] really_probe+0xf5/0x3a0 [ 43.655720] driver_probe_device+0xbb/0x100 [ 43.655720] device_driver_attach+0x58/0x60 [ 43.655720] __driver_attach+0x8f/0x150 [ 43.655720] ? device_driver_attach+0x60/0x60 [ 43.655720] bus_for_each_dev+0x74/0xb0 [ 43.655720] driver_attach+0x1e/0x20 [ 43.655720] bus_add_driver+0x151/0x1f0 [ 43.655720] ? ehci_hcd_init+0xb2/0xb2 [ 43.655720] ? do_early_param+0x95/0x95 [ 43.655720] driver_register+0x70/0xc0 [ 43.655720] ? ehci_hcd_init+0xb2/0xb2 [ 43.655720] __pci_register_driver+0x57/0x60 [ 43.655720] ehci_pci_init+0x6a/0x6c [ 43.655720] do_one_initcall+0x4a/0x1fa [ 43.655720] ? do_early_param+0x95/0x95 [ 43.655720] kernel_init_freeable+0x1bd/0x262 [ 43.655720] ? rest_init+0xb0/0xb0 [ 43.655720] kernel_init+0xe/0x110 [ 43.655720] ret_from_fork+0x24/0x50 Fixes: 8af46c78 ("iommu/vt-d: Implement is_attach_deferred iommu ops entry") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
Takashi Iwai authored
Acer Aspire A315-41 requires the very same workaround as the existing quirk for Dell Latitude 5495. Add the new entry for that. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1137799Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
YueHaibing authored
platform_get_irq() will call dev_err() itself on failure, so there is no need for the driver to also do this. This is detected by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
- 29 Oct, 2019 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds authored
Pull fuse fixes from Miklos Szeredi: "Mostly virtiofs fixes, but also fixes a regression and couple of longstanding data/metadata writeback ordering issues" * tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() fuse: Add changelog entries for protocols 7.1 - 7.8 fuse: truncate pending writes on O_TRUNC fuse: flush dirty data/metadata before non-truncate setattr virtiofs: Remove set but not used variable 'fc' virtiofs: Retry request submission from worker context virtiofs: Count pending forgets as in_flight forgets virtiofs: Set FR_SENT flag only after request has been sent virtiofs: No need to check fpq->connected state virtiofs: Do not end request in submission context fuse: don't advise readdirplus for negative lookup fuse: don't dereference req->args on finished request virtio-fs: don't show mount options virtio-fs: Change module name to virtiofs.ko
-
- 28 Oct, 2019 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arcLinus Torvalds authored
Pull ARC fixes from Vineet Gupta: "Small fixes for ARC: - perf fix for Big Endian build [Alexey] - hadk platform enable soem peripherals [Eugeniy]" * tag 'arc-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: perf: Accommodate big-endian CPU ARC: [plat-hsdk]: Enable on-boardi SPI ADC IC ARC: [plat-hsdk]: Enable on-board SPI NOR flash IC
-
Catalin Marinas authored
This variable is only used in the arch/arm64/mm/init.c file for ZONE_DMA32 initialisation, no need to expose it. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fixes from Jiri Kosina: - HID++ device support regression fixes (race condition during cleanup, device detection fix, opps fix) from Andrey Smirnov - disable PM on i2c-hid, as it's causing problems with a lot of devices; other OSes apparently don't implement/enable it either; from Kai-Heng Feng - error handling fix in intel-ish driver, from Zhang Lixu - syzbot fuzzer fix for HID core code from Alan Stern - a few other tiny fixups (printk message cleanup, new device ID) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: i2c-hid: add Trekstor Primebook C11B to descriptor override HID: logitech-hidpp: do all FF cleanup in hidpp_ff_destroy() HID: logitech-hidpp: rework device validation HID: logitech-hidpp: split g920_get_config() HID: i2c-hid: Remove runtime power management HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() HID: google: add magnemite/masterball USB ids HID: Fix assumption that devices have inputs HID: prodikeys: make array keys static const, makes object smaller HID: fix error message in hid_open_report()
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds authored
Pull virtio fixes from Michael Tsirkin: "Some minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vringh: fix copy direction of vringh_iov_push_kern() vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt' virtio_ring: fix stalls for packed rings
-
Jason Wang authored
We want to copy from iov to buf, so the direction was wrong. Note: no real user for the helper, but it will be used by future features. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Stefano Garzarella authored
The 'work' field was introduced with commit 06a8fc78 ("VSOCK: Introduce virtio_vsock_common.ko") but it is never used in the code, so we can remove it to save memory allocated in the per-packet 'struct virtio_vsock_pkt' Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Marvin Liu authored
When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can use virtqueue_enable_cb_delayed_packed to reduce the number of device interrupts. At the moment, this is the case for virtio-net when the napi_tx module parameter is set to false. In this case, the virtio driver selects an event offset and expects that the device will send a notification when rolling over the event offset in the ring. However, if this roll-over happens before the event suppression structure update, the notification won't be sent. To address this race condition the driver needs to check wether the device rolled over the offset after updating the event suppression structure. With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the flags field of the descriptor at the specified offset. Unfortunately, checking at the event offset isn't reliable: if descriptors are chained (e.g. when INDIRECT is off) not all descriptors are overwritten by the device, so it's possible that the device skipped the specific descriptor driver is checking when writing out used descriptors. If this happens, the driver won't detect the race condition and will incorrectly expect the device to send a notification. For virtio-net, the result will be a TX queue stall, with the transmission getting blocked forever. With the packed ring, it isn't easy to find a location which is guaranteed to change upon the roll-over, except the next device descriptor, as described in the spec: Writes of device and driver descriptors can generally be reordered, but each side (driver and device) are only required to poll (or test) a single location in memory: the next device descriptor after the one they processed previously, in circular order. while this might be sub-optimal, let's do exactly this for now. Cc: stable@vger.kernel.org Cc: Jason Wang <jasowang@redhat.com> Fixes: f51f9826 ("virtio_ring: leverage event idx in packed ring") Signed-off-by: Marvin Liu <yong.liu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 27 Oct, 2019 1 commit
-
-
Linus Torvalds authored
-