- 03 Aug, 2018 40 commits
-
-
Qu Wenruo authored
[ Upstream commit ff3d27a0 ] Under the following case, qgroup rescan can double account cowed tree blocks: In this case, extent tree only has one tree block. - | transid=5 last committed=4 | btrfs_qgroup_rescan_worker() | |- btrfs_start_transaction() | | transid = 5 | |- qgroup_rescan_leaf() | |- btrfs_search_slot_for_read() on extent tree | Get the only extent tree block from commit root (transid = 4). | Scan it, set qgroup_rescan_progress to the last | EXTENT/META_ITEM + 1 | now qgroup_rescan_progress = A + 1. | | fs tree get CoWed, new tree block is at A + 16K | transid 5 get committed - | transid=6 last committed=5 | btrfs_qgroup_rescan_worker() | btrfs_qgroup_rescan_worker() | |- btrfs_start_transaction() | | transid = 5 | |- qgroup_rescan_leaf() | |- btrfs_search_slot_for_read() on extent tree | Get the only extent tree block from commit root (transid = 5). | scan it using qgroup_rescan_progress (A + 1). | found new tree block beyong A, and it's fs tree block, | account it to increase qgroup numbers. - In above case, tree block A, and tree block A + 16K get accounted twice, while qgroup rescan should stop when it already reach the last leaf, other than continue using its qgroup_rescan_progress. Such case could happen by just looping btrfs/017 and with some possibility it can hit such double qgroup accounting problem. Fix it by checking the path to determine if we should finish qgroup rescan, other than relying on next loop to exit. Reported-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Sterba authored
[ Upstream commit 3d3a2e61 ] Currently the code assumes that there's an implied barrier by the sequence of code preceding the wakeup, namely the mutex unlock. As Nikolay pointed out: I think this is wrong (not your code) but the original assumption that the RELEASE semantics provided by mutex_unlock is sufficient. According to memory-barriers.txt: Section 'LOCK ACQUISITION FUNCTIONS' states: (2) RELEASE operation implication: Memory operations issued before the RELEASE will be completed before the RELEASE operation has completed. Memory operations issued after the RELEASE *may* be completed before the RELEASE operation has completed. (I've bolded the may portion) The example given there: As an example, consider the following: *A = a; *B = b; ACQUIRE *C = c; *D = d; RELEASE *E = e; *F = f; The following sequence of events is acceptable: ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE So if we assume that *C is modifying the flag which the waitqueue is checking, and *E is the actual wakeup, then those accesses can be re-ordered... IMHO this code should be considered broken... Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Omar Sandoval authored
[ Upstream commit 05522109 ] btrfs_free_extent() can fail because of ENOMEM. There's no reason to panic here, we can just abort the transaction. Fixes: f4b9aa8d ("btrfs_truncate") Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Omar Sandoval <osandov@fb.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Omar Sandoval authored
[ Upstream commit c08db7d8 ] In btrfs_evict_inode(), if btrfs_truncate_inode_items() fails, the inode item will still be in the tree but we still return the ino to the ino cache. That will blow up later when someone tries to allocate that ino, so don't return it to the cache. Fixes: 581bb050 ("Btrfs: Cache free inode numbers in memory") Reviewed-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans Verkuil authored
[ Upstream commit 90b2da89 ] When a buffer is queued or requeued in vb2_buffer_done, then don't call the finish memop. In this case the buffer is only returned to vb2, not to userspace. Calling 'finish' here will cause an unbalance when the queue is canceled, since the core will call the same memop again. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by:
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ezequiel Garcia authored
[ Upstream commit 636757ab ] When the driver is configured in the "memcpy" dma-mode, it uses vb2_vmalloc_memops, which is backed by a SLAB allocator and so shouldn't be using GFP_DMA32. Fix it. Signed-off-by:
Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by:
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Fuyun Liang authored
[ Upstream commit 7d0b130c ] RX Buffer Descriptor contains a VALID bit which indicates if the BD is valid and has some data. This field is set by HNS3 hardware to intimate the driver of some valid data present in the BD. nd should be reset by the driver when BD is being used again. In the existing code this bit was not being (re-)initialized properly and hence was causing problems. Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by:
Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by:
Peng Li <lipeng321@huawei.com> Signed-off-by:
Salil Mehta <salil.mehta@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eyal Reizer authored
[ Upstream commit 6e91d483 ] the wl pointer can be null In case only wlcore_sdio is probed while no WiLink module is successfully probed, as in the case of mounting a wl12xx module while using a device tree file configured with wl18xx related settings. In this case the system was crashing in wl1271_suspend() as platform device data is not set. Make sure wl the pointer is valid before using it. Signed-off-by:
Eyal Reizer <eyalr@ti.com> Signed-off-by:
Kalle Valo <kvalo@codeaurora.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ganapathi Bhat authored
[ Upstream commit b817047a ] Race condition is observed during rmmod of mwifiex_usb: 1. The rmmod thread will call mwifiex_usb_disconnect(), download SHUTDOWN command and do wait_event_interruptible_timeout(), waiting for response. 2. The main thread will handle the response and will do a wake_up_interruptible(), unblocking rmmod thread. 3. On getting unblocked, rmmod thread will make rx_cmd.urb = NULL in mwifiex_usb_free(). 4. The main thread will try to resubmit rx_cmd.urb in mwifiex_usb_submit_rx_urb(), which is NULL. To fix, wait for main thread to complete before calling mwifiex_usb_free(). Signed-off-by:
Ganapathi Bhat <gbhat@marvell.com> Signed-off-by:
Kalle Valo <kvalo@codeaurora.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vincent Palatin authored
[ Upstream commit 0dbbf255 ] If we cannot communicate with the EC chip to detect the protocol version and its features, it's very likely useless to continue. Else we will commit all kind of uninformed mistakes (using the wrong protocol, the wrong buffer size, mixing the EC with other chips). Signed-off-by:
Vincent Palatin <vpalatin@chromium.org> Acked-by:
Benson Leung <bleung@chromium.org> Signed-off-by:
Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by:
Gwendal Grignou <gwendal@chromium.org> Reviewed-by:
Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by:
Lee Jones <lee.jones@linaro.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kai Chieh Chuang authored
[ Upstream commit 9c0ac70a ] In case, one BE is used by two FE1/FE2 FE1--->BE--> | FE2----] when FE1/FE2 call dpcm_be_dai_hw_free() together the BE users will be 2 (> 1), hence cannot be hw_free the be state will leave at, ex. SND_SOC_DPCM_STATE_STOP later FE1/FE2 call dpcm_be_dai_shutdown(), will be skip due to wrong state. leaving the BE not being hw_free and shutdown. The BE dai will be hw_free later when calling dpcm_be_dai_shutdown() if still in invalid state. Signed-off-by:
KaiChieh Chuang <kaichieh.chuang@mediatek.com> Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jian-Hong Pan authored
[ Upstream commit 66d9975c ] Without this patch we cannot turn on the Bluethooth adapter on ASUS E406MA. T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2ff8 ProdID=b011 Rev= 2.00 S: Manufacturer=Realtek S: Product=802.11n WLAN Adapter S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by:
Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thierry Escande authored
[ Upstream commit 9960521c ] This patch fixes the following warning during boot: do not call blocking ops when !TASK_RUNNING; state=1 set at [<(ptrval)>] qca_setup+0x194/0x750 [hci_uart] WARNING: CPU: 2 PID: 1878 at kernel/sched/core.c:6135 __might_sleep+0x7c/0x88 In qca_set_baudrate(), the current task state is set to TASK_UNINTERRUPTIBLE before going to sleep for 300ms. It was then restored to TASK_INTERRUPTIBLE. This patch sets the current task state back to TASK_RUNNING instead. Signed-off-by:
Thierry Escande <thierry.escande@linaro.org> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shaul Triebitz authored
[ Upstream commit 0f22e400 ] Make sure the rx_allocator worker is canceled before running the rx_init routine. rx_init frees and re-allocates all rxb's pages. The rx_allocator worker also allocates pages for the used rxb's. Running rx_init and rx_allocator simultaniously causes a kernel panic. Fix that by canceling the work in rx_init. Signed-off-by:
Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by:
Luca Coelho <luciano.coelho@intel.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ethan Lien authored
[ Upstream commit e73e81b6 ] [Problem description and how we fix it] We should balance dirty metadata pages at the end of btrfs_finish_ordered_io, since a small, unmergeable random write can potentially produce dirty metadata which is multiple times larger than the data itself. For example, a small, unmergeable 4KiB write may produce: 16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree 16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree 16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree Although we do call balance dirty pages in write side, but in the buffered write path, most metadata are dirtied only after we reach the dirty background limit (which by far only counts dirty data pages) and wakeup the flusher thread. If there are many small, unmergeable random writes spread in a large btree, we'll find a burst of dirty pages exceeds the dirty_bytes limit after we wakeup the flusher thread - which is not what we expect. In our machine, it caused out-of-memory problem since a page cannot be dropped if it is marked dirty. Someone may worry about we may sleep in btrfs_btree_balance_dirty_nodelay, but since we do btrfs_finish_ordered_io in a separate worker, it will not stop the flusher consuming dirty pages. Also, we use different worker for metadata writeback endio, sleep in btrfs_finish_ordered_io help us throttle the size of dirty metadata pages. [Reproduce steps] To reproduce the problem, we need to do 4KiB write randomly spread in a large btree. In our 2GiB RAM machine: 1) Create 4 subvolumes. 2) Run fio on each subvolume: [global] direct=0 rw=randwrite ioengine=libaio bs=4k iodepth=16 numjobs=1 group_reporting size=128G runtime=1800 norandommap time_based randrepeat=0 3) Take snapshot on each subvolume and repeat fio on existing files. 4) Repeat step (3) until we get large btrees. In our case, by observing btrfs_root_item->bytes_used, we have 2GiB of metadata in each subvolume tree and 12GiB of metadata in extent tree. 5) Stop all fio, take snapshot again, and wait until all delayed work is completed. 6) Start all fio. Few seconds later we hit OOM when the flusher starts to work. It can be reproduced even when using nocow write. Signed-off-by:
Ethan Lien <ethanlien@synology.com> Reviewed-by:
David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kiszka authored
[ Upstream commit 3bbce531 ] Fix a memory leak by freeing the PCI resource list in devm_pci_release_host_bridge_dev(). Fixes: 5c3f18cc ("PCI: Add devm_pci_alloc_host_bridge() interface") Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shuah Khan (Samsung OSG) authored
[ Upstream commit 5c30a038 ] When intel_pstate test is skipped because of unmet dependencies and/or unsupported configuration, it returns 0 which is treated as a pass by the Kselftest framework. This leads to false positive result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shuah Khan (Samsung OSG) authored
[ Upstream commit b27f0259 ] When memfd test is skipped because of unmet dependencies and/or unsupported configuration, it returns non-zero value which is treated as a fail by the Kselftest framework. This leads to false negative result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Added an explicit check for root user at the start of memfd hugetlbfs test and return skip code if a non-root user attempts to run it. In addition, return skip code when not enough huge pages are available to run the test. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Díaz authored
[ Upstream commit e9d33f14 ] A few changes improve the overall usability of the test: * fix a hard-coded maximum frequency (3300), * don't adjust the CPU frequency if only evaluating results, * fix a comparison for multiple frequencies. A symptom of that last issue looked like this: ./run.sh: line 107: [: too many arguments ./run.sh: line 110: 3099 3099 3100-3100: syntax error in expression (error token is \"3099 3100-3100\") Because a check will count how many differente frequencies there are among the CPUs of the system, and after they are tallied another read is performed, which might produce different results. Signed-off-by:
Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kan Liang authored
[ Upstream commit d71f11c0 ] For Nehalem and Westmere, there is only one fixed counter for W-Box. There is no index which is bigger than UNCORE_PMC_IDX_FIXED. It is not correct to use >= to check fixed counter. The code quality issue will bring problem when new counter index is introduced. Signed-off-by:
Kan Liang <kan.liang@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-2-git-send-email-kan.liang@intel.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kan Liang authored
[ Upstream commit 4749f819 ] There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only exception is client IMC uncore, which has been specially handled. For generic code, it is not correct to use >= to check fixed counter. The code quality issue will bring problem when a new counter index is introduced. Signed-off-by:
Kan Liang <kan.liang@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-3-git-send-email-kan.liang@intel.comSigned-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Grzeschik authored
[ Upstream commit de19ca6f ] As the amount of available ports varies by the kernels build configuration. To remove the limitation of the fixed 128 ports we allocate the amount of idevs by using the number we get from the kernel. Signed-off-by:
Michael Grzeschik <m.grzeschik@pengutronix.de> Acked-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shuah Khan (Samsung OSG) authored
[ Upstream commit d179f99a ] detach_port() fails to call usbip_vhci_driver_close() from its error path after usbip_vhci_detach_device() returns failure, leaking memory allocated in usbip_vhci_driver_open() and holding udev_context and udev references. Fix it to call usbip_vhci_driver_close(). Signed-off-by:
Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filippo Muzzini authored
[ Upstream commit a12bffeb ] In bfq_requests_merged(), there is a deadlock because the lock on bfqq->bfqd->lock is held by the calling function, but the code of this function tries to grab the lock again. This deadlock is currently hidden by another bug (fixed by next commit for this source file), which causes the body of bfq_requests_merged() to be never executed. This commit removes the deadlock by removing the lock/unlock pair. Signed-off-by:
Filippo Muzzini <filippo.muzzini@outlook.it> Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
[ Upstream commit 27319ba4 ] Thread GC thread - f2fs_ioc_start_atomic_write - get_dirty_pages - filemap_write_and_wait_range - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - f2fs_is_atomic_file - set_page_dirty - set_inode_flag(, FI_ATOMIC_FILE) Dirty data page can still be generated by GC in race condition as above call stack. This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write to avoid such race. Signed-off-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
[ Upstream commit c22aecd7 ] dquot_initialize() can fail due to any exception inside quota subsystem, f2fs needs to be aware of it, and return correct return value to caller. Signed-off-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sahitya Tummala authored
[ Upstream commit 60b2b4ee ] f2fs_ioc_shutdown() ioctl gets stuck in the below path when issued with F2FS_GOING_DOWN_FULLSYNC option. __switch_to+0x90/0xc4 percpu_down_write+0x8c/0xc0 freeze_super+0xec/0x1e4 freeze_bdev+0xc4/0xcc f2fs_ioctl+0xc0c/0x1ce0 f2fs_compat_ioctl+0x98/0x1f0 Signed-off-by:
Sahitya Tummala <stummala@codeaurora.org> Reviewed-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
[ Upstream commit e5e5732d ] After revoking atomic write, related LBA can be reused by others, so we need to wait page writeback before reusing the LBA, in order to avoid interference between old atomic written in-flight IO and new IO. Signed-off-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
[ Upstream commit 64c74a7a ] - f2fs_fill_super - recover_fsync_data - recover_data - del_fsync_inode - iput - iput_final - write_inode_now - f2fs_write_inode - f2fs_balance_fs - f2fs_balance_fs_bg - sync_dirty_inodes With data_flush mount option, during recovery, in order to avoid entering above writeback flow, let's detect recovery status and do skip in f2fs_balance_fs_bg. Signed-off-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Yunlei He <heyunlei@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
[ Upstream commit 14a28559 ] This patch fixes error path of move_data_page: - clear cold data flag if it fails to write page. - redirty page for non-ENOMEM case. Signed-off-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anatoly Pugachev authored
[ Upstream commit 4071e67c ] The following patch disables loading of f2fs module on architectures which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on such architectures , log messages are: mount: /mnt: wrong fs type, bad option, bad superblock on /dev/vdiskb1, missing codepage or helper program, or other error. /dev/vdiskb1: F2FS filesystem, UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name "" May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS filesystem in 1th superblock May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS filesystem in 2th superblock May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB which was introduced by git commit 5c9b4692 tested on git kernel 4.17.0-rc6-00309-gec30dcf7 with patch applied: modprobe: ERROR: could not insert 'f2fs': Invalid argument May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096 Signed-off-by:
Anatoly Pugachev <matorola@gmail.com> Reviewed-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
[ Upstream commit ae55e59d ] If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Kodanev authored
[ Upstream commit 9c7f96fd ] The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before using nft_trans_set(trans). Otherwise we can get out of bounds read. For example, KASAN reported the one when running 0001_cache_handling_0 nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE: [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables] [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356 ... [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1 [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2 [75517.618129] Call Trace: [75517.648821] dump_stack+0xd1/0x13b [75517.691040] ? show_regs_print_info+0x5/0x5 [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5 [75517.799300] ? lock_acquire+0x143/0x310 [75517.846738] print_address_description+0x85/0x3a0 [75517.904547] kasan_report+0x18d/0x4b0 [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables] [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables] [75518.357154] ? nla_parse+0x1a5/0x300 [75518.401455] ? kasan_kmalloc+0xa6/0xd0 [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink] [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink] [75518.631711] ? lock_acquire+0x143/0x310 [75518.679133] ? netlink_deliver_tap+0x9b/0x1070 [75518.733840] ? kasan_unpoison_shadow+0x31/0x40 [75518.788542] netlink_unicast+0x45d/0x680 [75518.837111] ? __isolate_free_page+0x890/0x890 [75518.891913] ? netlink_attachskb+0x6b0/0x6b0 [75518.944542] netlink_sendmsg+0x6fa/0xd30 [75518.993107] ? netlink_unicast+0x680/0x680 [75519.043758] ? netlink_unicast+0x680/0x680 [75519.094402] sock_sendmsg+0xd9/0x160 [75519.138810] ___sys_sendmsg+0x64d/0x980 [75519.186234] ? copy_msghdr_from_user+0x350/0x350 [75519.243118] ? lock_downgrade+0x650/0x650 [75519.292738] ? do_raw_spin_unlock+0x5d/0x250 [75519.345456] ? _raw_spin_unlock+0x24/0x30 [75519.395065] ? __handle_mm_fault+0xbde/0x3410 [75519.448830] ? sock_setsockopt+0x3d2/0x1940 [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0 [75519.558448] ? lock_downgrade+0x650/0x650 [75519.608057] ? __audit_syscall_entry+0x317/0x720 [75519.664960] ? __fget_light+0x58/0x250 [75519.711325] ? __sys_sendmsg+0xde/0x170 [75519.758850] __sys_sendmsg+0xde/0x170 [75519.804193] ? __ia32_sys_shutdown+0x90/0x90 [75519.856725] ? syscall_trace_enter+0x897/0x10e0 [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920 [75519.979432] ? __audit_syscall_entry+0x720/0x720 [75520.036118] do_syscall_64+0xa3/0x3d0 [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0 [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75520.201680] RIP: 0033:0x7fc153320ba0 [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0 [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003 [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090 [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660 [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001 [75520.790946] Allocated by task 7356: [75520.833994] kasan_kmalloc+0xa6/0xd0 [75520.878088] __kmalloc+0x189/0x450 [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables] [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables] [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75521.108655] netlink_unicast+0x45d/0x680 [75521.157013] netlink_sendmsg+0x6fa/0xd30 [75521.205271] sock_sendmsg+0xd9/0x160 [75521.249365] ___sys_sendmsg+0x64d/0x980 [75521.296686] __sys_sendmsg+0xde/0x170 [75521.341822] do_syscall_64+0xa3/0x3d0 [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.467867] Freed by task 23454: [75521.507804] __kasan_slab_free+0x132/0x180 [75521.558137] kfree+0x14d/0x4d0 [75521.596005] free_rt_sched_group+0x153/0x280 [75521.648410] sched_autogroup_create_attach+0x19a/0x520 [75521.711330] ksys_setsid+0x2ba/0x400 [75521.755529] __ia32_sys_setsid+0xa/0x10 [75521.802850] do_syscall_64+0xa3/0x3d0 [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.929000] The buggy address belongs to the object at ffff881bdb643f80 which belongs to the cache kmalloc-96 of size 96 [75522.079797] The buggy address is located 72 bytes inside of 96-byte region [ffff881bdb643f80, ffff881bdb643fe0) [75522.221234] The buggy address belongs to the page: [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [75522.377443] flags: 0x2fffff80000100(slab) [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020 [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000 [75522.615601] page dumped because: kasan: bad access detected Fixes: 37a9cc52 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by:
Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Javier González authored
[ Upstream commit e37d0798 ] When cleaning up buffer entries as we wrap up, their state should be "completed". If any of the entries is in "submitted" state, it means that something bad has happened. Trigger a warning immediately instead of waiting for the state flag to eventually be updated, thus hiding the issue. Signed-off-by:
Javier González <javier@cnexlabs.com> Signed-off-by:
Matias Bjørling <mb@lightnvm.io> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
[ Upstream commit 2468b82d ] Let's perform checks in-place instead of BUG_ONs. Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Piggin authored
[ Upstream commit 926bc2f1 ] The stores to update the SLB shadow area must be made as they appear in the C code, so that the hypervisor does not see an entry with mismatched vsid and esid. Use WRITE_ONCE for this. GCC has been observed to elide the first store to esid in the update, which means that if the hypervisor interrupts the guest after storing to vsid, it could see an entry with old esid and new vsid, which may possibly result in memory corruption. Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stewart Smith authored
[ Upstream commit 447808bf ] time_init() will set up tb_ticks_per_usec based on reality. time_init() is called *after* udbg_init_opal_common() during boot. from arch/powerpc/kernel/time.c: unsigned long tb_ticks_per_usec = 100; /* sane default */ Currently, all powernv systems have a timebase frequency of 512mhz (512000000/1000000 == 0x200) - although there's nothing written down anywhere that I can find saying that we couldn't make that different based on the requirements in the ISA. So, we've been (accidentally) thwacking the (currently) correct (for powernv at least) value for tb_ticks_per_usec earlier than we otherwise would have. The "sane default" seems to be adequate for our purposes between udbg_init_opal_common() and time_init() being called, and if it isn't, then we should probably be setting it somewhere that isn't hvc_opal.c! Signed-off-by:
Stewart Smith <stewart@linux.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sam Bobroff authored
[ Upstream commit 46d4be41 ] Correct two cases where eeh_pcid_get() is used to reference the driver's module but the reference is dropped before the driver pointer is used. In eeh_rmv_device() also refactor a little so that only two calls to eeh_pcid_put() are needed, rather than three and the reference isn't taken at all if it wasn't needed. Signed-off-by:
Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Suchanek authored
[ Upstream commit a6b3964a ] A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op. Implement barrier_nospec using this instruction. mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time. Signed-off-by:
Michal Suchanek <msuchanek@suse.de> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe Leroy authored
[ Upstream commit 1128bb78 ] commit 87a156fb ("Align hot loops of some string functions") degraded the performance of string functions by adding useless nops A simple benchmark on an 8xx calling 100000x a memchr() that matches the first byte runs in 41668 TB ticks before this patch and in 35986 TB ticks after this patch. So this gives an improvement of approx 10% Another benchmark doing the same with a memchr() matching the 128th byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks after this patch, so regardless on the number of loops, removing those useless nops improves the test by 5683 TB ticks. Fixes: 87a156fb ("Align hot loops of some string functions") Signed-off-by:
Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-