- 22 Dec, 2020 2 commits
-
-
Chunguang Xu authored
Commit cfd73237 ("ext4: add prefetching for block allocation bitmaps") introduced block bitmap prefetch, and expects to read block bitmaps of flex_bg through an IO. However, it seems to ignore the value range of s_log_groups_per_flex. In the scenario where the value of s_log_groups_per_flex is greater than 27, s_mb_prefetch or s_mb_prefetch_limit will overflow, cause a divide zero exception. In addition, the logic of calculating nr is also flawed, because the size of flexbg is fixed during a single mount, but s_mb_prefetch can be modified, which causes nr to fail to meet the value condition of [1, flexbg_size]. To solve this problem, we need to set the upper limit of s_mb_prefetch. Since we expect to load block bitmaps of a flex_bg through an IO, we can consider determining a reasonable upper limit among the IO limit parameters. After consideration, we chose BLK_MAX_SEGMENT_SIZE. This is a good choice to solve divide zero problem and avoiding performance degradation. [ Some minor code simplifications to make the changes easy to follow -- TYT ] Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Samuel Liao <samuelliao@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1607051143-24508-1-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When filesystem inconsistency is detected with group locked, we currently try to modify superblock to store error there without blocking. However this can cause superblock checksum failures (or DIF/DIX failure) when the superblock is just being written out. Make error handling code just store error information in ext4_sb_info structure and copy it to on-disk superblock only in ext4_commit_super(). In case of error happening with group locked, we just postpone the superblock flushing to a workqueue. [ Added fixup so that s_first_error_* does not get updated after the file system is remounted. Also added fix for syzbot failure. - Ted ] Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com
-
- 17 Dec, 2020 15 commits
-
-
Jan Kara authored
We convert errno's to ext4 on-disk format error codes in save_error_info(). Add a function and a bit of macro magic to make this simpler. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-7-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Just move error info related functions in super.c close to ext4_handle_error(). We'll want to combine save_error_info() with ext4_handle_error() and this makes change more obvious and saves a forward declaration as well. No functional change. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-6-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
The only difference between __ext4_abort() and __ext4_error() is that the former one ignores errors=continue mount option. Unify the code to reduce duplication. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-5-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
We use __ext4_error() when ext4_protect_reserved_inode() finds filesystem corruption. However EXT4_ERROR_INODE_ERR() is perfectly capable of reporting all the needed information. So just use that. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-4-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Superblock is written out either through ext4_commit_super() or through ext4_handle_dirty_super(). In both cases we recompute the checksum so it is not necessary to recompute it after updating superblock free inodes & blocks counters. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-3-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
ext4_handle_error() with errors=continue mount option can accidentally remount the filesystem read-only when the system is rebooting. Fix that. Fixes: 1dc1097f ("ext4: avoid panic during forced reboot") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20201127113405.26867-2-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Xattr code using inodes with large xattr data can end up dropping last inode reference (and thus deleting the inode) from places like ext4_xattr_set_entry(). That function is called with transaction started and so ext4_evict_inode() can deadlock against fs freezing like: CPU1 CPU2 removexattr() freeze_super() vfs_removexattr() ext4_xattr_set() handle = ext4_journal_start() ... ext4_xattr_set_entry() iput(old_ea_inode) ext4_evict_inode(old_ea_inode) sb->s_writers.frozen = SB_FREEZE_FS; sb_wait_write(sb, SB_FREEZE_FS); ext4_freeze() jbd2_journal_lock_updates() -> blocks waiting for all handles to stop sb_start_intwrite() -> blocks as sb is already in SB_FREEZE_FS state Generally it is advisable to delete inodes from a separate transaction as it can consume quite some credits however in this case it would be quite clumsy and furthermore the credits for inode deletion are quite limited and already accounted for. So just tweak ext4_evict_inode() to avoid freeze protection if we have transaction already started and thus it is not really needed anyway. Cc: stable@vger.kernel.org Fixes: dec214d0 ("ext4: xattr inode deduplication") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127110649.24730-1-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Harshad Shirwadkar authored
Add a helper to read number of fast commit blocks from jbd2 superblock and also rename the JBD2_MIN_FC_BLKS to JBD2_DEFAULT_FAST_COMMIT_BLOCKS since this constant is just the default number of fast commit blocks to use in case number of fast commit blocks isn't set in jbd2 superblock. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201120202232.2240293-2-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Harshad Shirwadkar authored
This patch makes fast_commit.h byte by byte identical with e2fsprogs/fast_commit.h. This will help us ensure that there are no on-disk format inconsistencies between e2fsck and kernel ext4. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201120202232.2240293-1-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Gustavo A. R. Silva authored
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/03497331f088a938d7a728e7a689bd7953139429.1605896059.git.gustavoars@kernel.orgSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Harshad Shirwadkar authored
Fast commit on-disk format is designed such that the replay of these tags can be idempotent. This patch adds documentation in the code in form of comments and in form kernel docs that describes these characteristics. This patch also adds a TODO item needed to ensure kernel fast commit replay idempotence. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201119232822.1860882-1-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Kaixu Xia authored
There are no callers of the EXT4_CURRENT_REV macro, so remove it. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1605164202-31120-1-git-send-email-kaixuxia@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Dan Carpenter authored
The ext4_find_extent() function never returns NULL, it returns error pointers. Fixes: 44059e503b03 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201023112232.GB282278@mwandaSigned-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
Theodore Ts'o authored
Check for valid block size directly by validating s_log_block_size; we were doing this in two places. First, by calculating blocksize via BLOCK_SIZE << s_log_block_size, and then checking that the blocksize was valid. And then secondly, by checking s_log_block_size directly. The first check is not reliable, and can trigger an UBSAN warning if s_log_block_size on a maliciously corrupted superblock is greater than 22. This is harmless, since the second test will correctly reject the maliciously fuzzed file system, but to make syzbot shut up, and because the two checks are duplicative in any case, delete the blocksize check, and move the s_log_block_size earlier in ext4_fill_super(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: syzbot+345b75652b1d24227443@syzkaller.appspotmail.com
-
Chunguang Xu authored
When freeing metadata, we will create an ext4_free_data and insert it into the pending free list. After the current transaction is committed, the object will be freed. ext4_mb_free_metadata() will check whether the area to be freed overlaps with the pending free list. If true, return directly. At this time, ext4_free_data is leaked. Fortunately, the probability of this problem is small, since it only occurs if the file system is corrupted such that a block is claimed by more one inode and those inodes are deleted within a single jbd2 transaction. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/1604764698-4269-8-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 09 Dec, 2020 2 commits
-
-
Chunguang Xu authored
Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-7-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Chunguang Xu authored
Since ext4_data_block_valid() has been renamed to ext4_inode_block_valid(), the related comments need to be updated. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-5-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 03 Dec, 2020 9 commits
-
-
Chunguang Xu authored
The code of mb_find_order_for_block is a bit obscure, but we can simplify it with mb_find_buddy(), make the code more concise. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-3-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Chunguang Xu authored
After this patch (163a203d), if an abnormal bitmap is detected, we will mark the group as corrupt, and we will not use this group in the future. Therefore, it should be meaningless to regenerate the buddy bitmap of this group, It might be better to delete it. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-2-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Chunguang Xu authored
There are currently multiple forms of assertion, such as J_ASSERT(). J_ASEERT() is provided for the jbd module, which is a public module. Maybe we should use custom ASSERT() like other file systems, such as xfs, which would be better. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-1-git-send-email-brookxu@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Roman Anufriev authored
Right now, it is hard to understand which quota journalling type is enabled: you need to be quite familiar with kernel code and trace it or really understand what different combinations of fs flags/mount options lead to. This patch adds printing of current quota jounalling mode on each mount/remount, thus making it easier to check it at a glance/in autotests. The semantics is similar to ext4 data journalling modes: * journalled - quota configured, journalling will be enabled * writeback - quota configured, journalling won't be enabled * none - quota isn't configured * disabled - kernel compiled without CONFIG_QUOTA feature Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1603336860-16153-2-git-send-email-dotdot@yandex-team.ruSigned-off-by: Roman Anufriev <dotdot@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Roman Anufriev authored
Right now, there are several places, where we check whether fs is capable of enabling quota or if quota is journalled with quite long and non-self-descriptive condition statements. This patch wraps these statements into helpers for better readability and easier usage. Link: https://lore.kernel.org/r/1603336860-16153-1-git-send-email-dotdot@yandex-team.ruReviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Roman Anufriev <dotdot@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Colin Ian King authored
Variable ex is assigned a variable that is not being read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20201021132326.148052-1-colin.king@canonical.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Xianting Tian authored
bv_page can't be NULL in a valid bio_vec, so we can remove the NULL check, as we did in other places when calling bio_for_each_segment_all() to go through all bio_vec of a bio. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Xianting Tian <tian.xianting@h3c.com> Link: https://lore.kernel.org/r/20201020082201.34257-1-tian.xianting@h3c.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Kaixu Xia authored
The out_fail branch path don't release the bh and the second bh is valid only in the for statement, so we don't need to set them to NULL. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/1603194069-17557-1-git-send-email-kaixuxia@tencent.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Alexander Lochmann authored
We used LockDoc to derive locking rules for each member of struct transaction_t. Based on those results, we extended the existing documentation by more members of struct transaction_t, and updated the existing documentation. Link: https://lore.kernel.org/r/10cfbef1-994c-c604-f8a6-b1042fcc622f@tu-dortmund.deSigned-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 22 Nov, 2020 12 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fixes from Jiri Kosina: - Various functionality / regression fixes for Logitech devices from Hans de Goede - Fix for (recently added) GPIO support in mcp2221 driver from Lars Povlsen - Power management handling fix/quirk in i2c-hid driver for certain BIOSes that have strange aproach to power-cycle from Hans de Goede - a few device ID additions and device-specific quirks * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver HID: logitech-dj: Fix an error in mse_bluetooth_descriptor HID: Add Logitech Dinovo Edge battery quirk HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices HID: mcp2221: Fix GPIO output handling HID: hid-sensor-hub: Fix issue with devices with no report ID HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown HID: add support for Sega Saturn HID: cypress: Support Varmilo Keyboards' media hotkeys HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses HID: logitech-hidpp: Add PID for MX Anywhere 2 HID: uclogic: Add ID for Trust Flex Design Tablet
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Thomas Gleixner: "A couple of scheduler fixes: - Make the conditional update of the overutilized state work correctly by caching the relevant flags state before overwriting them and checking them afterwards. - Fix a data race in the wakeup path which caused loadavg on ARM64 platforms to become a random number generator. - Fix the ordering of the iowaiter accounting operations so it can't be decremented before it is incremented. - Fix a bug in the deadline scheduler vs. priority inheritance when a non-deadline task A has inherited the parameters of a deadline task B and then blocks on a non-deadline task C. The second inheritance step used the static deadline parameters of task A, which are usually 0, instead of further propagating task B's parameters. The zero initialized parameters trigger a bug in the deadline scheduler" * tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix priority inheritance with multiple scheduling classes sched: Fix rq->nr_iowait ordering sched: Fix data-race in wakeup sched/fair: Fix overutilized update in enqueue_task_fair()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fix from Thomas Gleixner: "A single fix for the x86 perf sysfs interfaces which used kobject attributes instead of device attributes and therefore making clang's control flow integrity checker upset" * tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: fix sysfs type mismatches
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fix from Thomas Gleixner: "A single fix for lockdep which makes the recursion protection cover graph lock/unlock" * tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Put graph lock/unlock under lock_recursion protection
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Borislav Petkov: "Forwarded EFI fixes from Ard Biesheuvel: - fix memory leak in efivarfs driver - fix HYP mode issue in 32-bit ARM version of the EFI stub when built in Thumb2 mode - avoid leaking EFI pgd pages on allocation failure" * tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Free efi_pgd with free_pages() efivarfs: fix memory leak in efivarfs_create() efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of same because the proper one is going through the IOMMU tree (Thomas Gleixner) - An Intel microcode loader fix to save the correct microcode patch to apply during resume (Chen Yu) - A fix to not access user memory of other processes when dumping opcode bytes (Thomas Gleixner) * tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account" x86/dumpstack: Do not try to access user space code of other tasks x86/microcode/intel: Check patch signature before saving microcode for early loading iommu/vt-d: Take CONFIG_PCI_ATS into account
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "8 patches. Subsystems affected by this patch series: mm (madvise, pagemap, readahead, memcg, userfaultfd), kbuild, and vfs" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix madvise WILLNEED performance problem libfs: fix error cast of negative value in simple_attr_write() mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() mm: memcg/slab: fix root memcg vmstats mm: fix readahead_page_batch for retry entries mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports compiler-clang: remove version check for BPF Tracing mm/madvise: fix memory leak from process_madvise
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging and IIO fixes from Greg KH: "Here are some small Staging and IIO driver fixes for 5.10-rc5. They include: - IIO fixes for reported regressions and problems - new device ids for IIO drivers - new device id for rtl8723bs driver - staging ralink driver Kconfig dependency fix - staging mt7621-pci bus resource fix All of these have been in linux-next all week with no reported issues" * tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI iio: light: fix kconfig dependency bug for VCNL4035 iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used iio/adc: ingenic: Fix battery VREF for JZ4770 SoC staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK staging: mt7621-pci: avoid to request pci bus resources iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout counter/ti-eqep: Fix regmap max_register iio: adc: stm32-adc: fix a regression when using dma and irq iio: adc: mediatek: fix unset field iio: cros_ec: Use default frequencies when EC returns invalid information
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty fixes from Greg KH: "Here are some small tty/serial fixes for 5.10-rc5 that resolve some reported issues: - speakup crash when telling the kernel to use a device that isn't really there - imx serial driver fixes for reported problems - ar933x_uart driver fix for probe error handling path All have been in linux-next for a while with no reported issues" * tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: ar933x_uart: disable clk on error handling path in probe tty: serial: imx: keep console clocks always on speakup: Do not let the line discipline be used several times tty: serial: imx: fix potential deadlock
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o: "A final set of miscellaneous bug fixes for ext4" * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix bogus warning in ext4_update_dx_flag() jbd2: fix kernel-doc markups ext4: drop fast_commit from /proc/mounts
-
David Howells authored
When doing a lookup in a directory, the afs filesystem uses a bulk status fetch to speculatively retrieve the statuses of up to 48 other vnodes found in the same directory and it will then either update extant inodes or create new ones - effectively doing 'lookup ahead'. To avoid the possibility of deadlocking itself, however, the filesystem doesn't lock all of those inodes; rather just the directory inode is locked (by the VFS). When the operation completes, afs_inode_init_from_status() or afs_apply_status() is called, depending on whether the inode already exists, to commit the new status. A case exists, however, where the speculative status fetch operation may straddle a modification operation on one of those vnodes. What can then happen is that the speculative bulk status RPC retrieves the old status, and whilst that is happening, the modification happens - which returns an updated status, then the modification status is committed, then we attempt to commit the speculative status. This results in something like the following being seen in dmesg: kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus say that the vnode had data version 8 when we'd already recorded version 9 due to a local modification. This was causing the cache to be invalidated for that vnode when it shouldn't have been. If it happens on a data file, this might lead to local changes being lost. Fix this by ignoring speculative status updates if the data version doesn't match the expected value. Note that it is possible to get a DV regression if a volume gets restored from a backup - but we should get a callback break in such a case that should trigger a recheck anyway. It might be worth checking the volume creation time in the volsync info and, if a change is observed in that (as would happen on a restore), invalidate all caches associated with the volume. Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-