1. 10 Jan, 2024 4 commits
    • ext4: fix inconsistent between segment fstrim and full fstrim · 68da4c44
      Ye Bin authored
      Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with
      minimum length of trimmed range set to 8 blocks. If we have, say, a range of
      blocks 10-22 free, this range will not be trimmed because it straddles the
      boundary of the two FITRIM ranges and neither part is big enough. This is a
      bit surprising to some users that call FITRIM on smaller ranges of blocks
      to limit impact on the system. Also XFS trims all free space extents that
      overlap with the specified range so we are inconsistent among filesystems.
      Let's change ext4_try_to_trim_range() to consider for trimming the whole
      free space extent that straddles the end of specified range, not just the
      part of it within the range.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231216010919.1995851-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
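The FITRIM example from commit 68da4c44 (ranges [0,15] and [16,31], minimum length 8, blocks 10-22 free) can be checked with a small userspace model. This is only a sketch of the decision described in the commit message, not the code of ext4_try_to_trim_range(); the function names and the inclusive-range convention are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Toy model, not kernel code. All ranges are inclusive block numbers.
 * Each function returns how many blocks of the free extent one FITRIM
 * call would discard, or 0 if the candidate is shorter than minlen.
 */

/* Behaviour before the change: clip the free extent to both ends of the request. */
unsigned long trim_len_clipped(unsigned long free_first, unsigned long free_last,
                               unsigned long req_first, unsigned long req_last,
                               unsigned long minlen)
{
    unsigned long first, last, len;

    assert(free_first <= free_last && req_first <= req_last);
    if (free_last < req_first || free_first > req_last)
        return 0; /* no overlap with the requested range */

    first = free_first > req_first ? free_first : req_first;
    last = free_last < req_last ? free_last : req_last;
    len = last - first + 1;
    return len >= minlen ? len : 0;
}

/* Behaviour after the change: keep the tail that runs past the end of the request. */
unsigned long trim_len_straddling(unsigned long free_first, unsigned long free_last,
                                  unsigned long req_first, unsigned long req_last,
                                  unsigned long minlen)
{
    unsigned long first, len;

    assert(free_first <= free_last && req_first <= req_last);
    if (free_last < req_first || free_first > req_last)
        return 0; /* no overlap with the requested range */

    first = free_first > req_first ? free_first : req_first;
    len = free_last - first + 1; /* the end is not clipped to req_last */
    return len >= minlen ? len : 0;
}
```

With the numbers from the commit message, the clipped variant sees candidates of 6 and 7 blocks and trims nothing, while the straddling variant sees a 13-block candidate on the first call and trims it.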
    • ext4: fallback to complex scan if aligned scan doesn't work · 1f6bc02f
      Ojaswin Mujoo authored
      Currently in case the goal length is a multiple of stripe size we use
      ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
      In case we are not able to find any, we again go back to calling
      ext4_mb_choose_next_group() to search for a different suitable block
      group. However, since the linear search always begins from the start,
      most of the time we end up with the same BG and the cycle continues.
      
      With large filesystems, the CPU can be stuck in this loop for hours
      which can slow down the whole system. Hence, until we figure out a
      better way to continue the search (rather than starting from beginning)
      in ext4_mb_choose_next_group(), let's just fall back to
      ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
      more likely to find the needed blocks.
      Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/ee033f6dfa0a7f2934437008a909c3788233950f.1702455010.git.ojaswin@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
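The fallback described in commit 1f6bc02f can be pictured with a toy allocator over a single group. This is a simplified sketch under assumptions: the free map is a plain byte array (1 means free), and scan_aligned(), scan_complex() and alloc_in_group() are hypothetical stand-ins, not the mballoc functions named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the first start index, stepping by `step`, where `len`
 * consecutive free blocks begin, or -1 if there is none. */
static long find_free_run(const unsigned char *free_map, size_t nblocks,
                          size_t len, size_t step)
{
    assert(step > 0);
    for (size_t start = 0; start + len <= nblocks; start += step) {
        size_t i = 0;

        while (i < len && free_map[start + i])
            i++;
        if (i == len)
            return (long)start;
    }
    return -1;
}

/* Only considers runs that start on a stripe boundary. */
long scan_aligned(const unsigned char *free_map, size_t nblocks,
                  size_t len, size_t stripe)
{
    return find_free_run(free_map, nblocks, len, stripe);
}

/* Considers every possible start position. */
long scan_complex(const unsigned char *free_map, size_t nblocks, size_t len)
{
    return find_free_run(free_map, nblocks, len, 1);
}

/* Try the aligned scan when the goal length is a multiple of the stripe,
 * then fall back to the complex scan in the same group if it found nothing. */
long alloc_in_group(const unsigned char *free_map, size_t nblocks,
                    size_t len, size_t stripe)
{
    long found = -1;

    if (stripe && len % stripe == 0)
        found = scan_aligned(free_map, nblocks, len, stripe);
    if (found < 0)
        found = scan_complex(free_map, nblocks, len);
    return found;
}
```

The design point is that a failed aligned scan no longer sends the caller back to pick a group again; the same group is searched once more without the alignment constraint.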
    • ext4: convert ext4_da_do_write_end() to take a folio · 4d5cdd75
      Matthew Wilcox (Oracle) authored
      There's nothing page-specific happening in ext4_da_do_write_end();
      it's merely used for its refcount & lock, both of which are folio
      properties.  Saves four calls to compound_head().
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231214053035.1018876-1-willy@infradead.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: allow for the last group to be marked as trimmed · 7c784d62
      Suraj Jitindar Singh authored
      The ext4 filesystem tracks the trim status of blocks at the group
      level.  When an entire group has been trimmed then it is marked as
      such and subsequent trim invocations with the same minimum trim size
      will not be attempted on that group unless it is marked as able to be
      trimmed again such as when a block is freed.
      
      Currently the last group can't be marked as trimmed due to incorrect
      logic in ext4_last_grp_cluster(). ext4_last_grp_cluster() is supposed
      to return the zero based index of the last cluster in a group. This is
      then used by ext4_try_to_trim_range() to determine if the trim
      operation spans the entire group and as such if the trim status of the
      group should be recorded.
      
      ext4_last_grp_cluster() takes a 0 based group index, thus the valid
      values for grp are 0..(ext4_get_groups_count - 1). Any group index
      less than (ext4_get_groups_count - 1) is not the last group and must
      have EXT4_CLUSTERS_PER_GROUP(sb) clusters. For the last group we need
      to calculate the number of clusters based on the number of blocks in
      the group. Finally subtract 1 from the number of clusters as zero
      based indexing is expected.  Rearrange the function slightly to make
      it clear what we are calculating and returning.
      
      Reproducer:
      // Create file system where the last group has fewer blocks than
      // blocks per group
      $ mkfs.ext4 -b 4096 -g 8192 /dev/nvme0n1 8191
      $ mount /dev/nvme0n1 /mnt
      
      Before Patch:
      $ fstrim -v /mnt
      /mnt: 25.9 MiB (27156480 bytes) trimmed
      // Group not marked as trimmed so second invocation still discards blocks
      $ fstrim -v /mnt
      /mnt: 25.9 MiB (27156480 bytes) trimmed
      
      After Patch:
      $ fstrim -v /mnt
      /mnt: 25.9 MiB (27156480 bytes) trimmed
      // Group marked as trimmed so second invocation DOESN'T discard any blocks
      $ fstrim -v /mnt
      /mnt: 0 B (0 bytes) trimmed
      
      Fixes: 45e4ab32 ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
      Cc: <stable@vger.kernel.org> # 4.19+
      Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231213051635.37731-1-surajjs@amazon.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
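The calculation that commit 7c784d62 describes for ext4_last_grp_cluster() can be followed in a userspace model. This is a sketch under assumptions: struct fs_geom and its field names are hypothetical and only carry the quantities the commit message mentions; it is not the kernel function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified filesystem geometry (illustrative names only). */
struct fs_geom {
    uint64_t blocks_count;     /* total blocks in the filesystem */
    uint64_t first_data_block; /* block number where group 0 starts */
    uint32_t blocks_per_group;
    uint32_t cluster_bits;     /* log2(blocks per cluster) */
    uint32_t groups_count;
};

/* Zero-based index of the last cluster in group `grp`. */
uint32_t last_grp_cluster(const struct fs_geom *g, uint32_t grp)
{
    uint64_t nr_clusters;

    assert(grp < g->groups_count);
    if (grp < g->groups_count - 1) {
        /* Not the last group, so it holds a full set of clusters. */
        nr_clusters = g->blocks_per_group >> g->cluster_bits;
    } else {
        /* Last group: count only the blocks that actually exist in it. */
        uint64_t first_block = g->first_data_block +
                               (uint64_t)grp * g->blocks_per_group;

        nr_clusters = (g->blocks_count - first_block) >> g->cluster_bits;
    }
    /* Callers expect a zero-based index, hence the final subtraction. */
    return (uint32_t)(nr_clusters - 1);
}
```

With the reproducer's geometry (8191 blocks, 8192 blocks per group, one block per cluster, and assuming the first data block is 0) the single group is also the last group, and the function returns 8190.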
  2. 05 Jan, 2024 10 commits
  3. 14 Dec, 2023 4 commits
  4. 12 Dec, 2023 1 commit
    • jbd2: fix soft lockup in journal_finish_inode_data_buffers() · 6c02757c
      Ye Bin authored
      There's an issue when doing an I/O test:
      WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
      CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G  OE
      Call trace:
       dump_backtrace+0x0/0x1a0
       show_stack+0x24/0x30
       dump_stack+0xb0/0x100
       watchdog_timer_fn+0x254/0x3f8
       __hrtimer_run_queues+0x11c/0x380
       hrtimer_interrupt+0xfc/0x2f8
       arch_timer_handler_phys+0x38/0x58
       handle_percpu_devid_irq+0x90/0x248
       generic_handle_irq+0x3c/0x58
       __handle_domain_irq+0x68/0xc0
       gic_handle_irq+0x90/0x320
       el1_irq+0xcc/0x180
       queued_spin_lock_slowpath+0x1d8/0x320
       jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
       kjournald2+0xec/0x2f0 [jbd2]
       kthread+0x134/0x138
       ret_from_fork+0x10/0x18
      
      Analysis of the vmcore shows the following:
      (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
      (2) The 855th jbd2_inode is currently being processed;
      (3) JBD2 task has TIF_NEED_RESCHED flag;
      (4) There are no pages in address_space around the 855th jbd2_inode;
      (5) Some processes are dropping caches;
      (6) Mounted with 'nodioread_nolock' option;
      (7) 128 CPUs;
      
      According to the vmcore, competition for the 'journal->j_list_lock' spin lock
      is fierce, so journal_finish_inode_data_buffers() may progress slowly.
      Theoretically, there is a scheduling point in filemap_fdatawait_range_keep_errors().
      However, if the inode's address_space has no pages tagged with PAGECACHE_TAG_WRITEBACK,
      cond_resched() is not called, which may lead to a soft lockup:
      journal_finish_inode_data_buffers
        filemap_fdatawait_range_keep_errors
          __filemap_fdatawait_range
            while (index <= end)
              nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
              if (!nr_pages)
                 break;    --> if 'nr_pages' is zero we break here and never call cond_resched()
              for (i = 0; i < nr_pages; i++)
                wait_on_page_writeback(page);
              cond_resched();
      
      To solve the above issue, add a scheduling point in journal_finish_inode_data_buffers().
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
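The missing scheduling point from commit 6c02757c can be illustrated with a userspace analogy. This is a sketch under assumptions: sched_yield() stands in for the kernel's cond_resched(), and struct toy_inode and both functions are hypothetical models, not jbd2 code.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a jbd2_inode: only tracks pages under writeback. */
struct toy_inode {
    unsigned int pages_under_writeback;
};

/*
 * Models the wait helper: it only reaches its own yield point when it
 * finds pages to wait on. Returns 1 if it yielded, 0 otherwise.
 */
int toy_wait_for_writeback(const struct toy_inode *inode)
{
    if (inode->pages_under_writeback == 0)
        return 0; /* early exit: no yield on this path */
    /* (waiting on each page would happen here) */
    sched_yield();
    return 1;
}

/*
 * Walks the inode list and returns how many times the loop yielded.
 * yield_each_iteration models the scheduling point added by the fix.
 */
size_t toy_finish_inode_data_buffers(const struct toy_inode *inodes, size_t n,
                                     int yield_each_iteration)
{
    size_t yields = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        yields += (size_t)toy_wait_for_writeback(&inodes[i]);
        if (yield_each_iteration) {
            sched_yield(); /* unconditional scheduling point */
            yields++;
        }
    }
    return yields;
}
```

When every inode in the list has no pages under writeback, the loop without the extra yield never gives up the CPU, which mirrors the soft lockup scenario; with the extra yield it does so once per inode.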
  5. 01 Dec, 2023 4 commits
  6. 27 Nov, 2023 2 commits
    • Linux 6.7-rc3 · 2cc14f52
      Linus Torvalds authored
    • Merge tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5b2b1173
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Eventfs fixes:
      
         - With the usage of simple_recursive_remove() recommended by Al Viro,
           the code should not be calling "d_invalidate()" itself. Doing so is
           causing crashes. The code was calling d_invalidate() on the race of
           trying to look up a file while the parent was being deleted. This
           was detected, and the added dentry was having d_invalidate() called
           on it, but the deletion of the directory was also calling
           d_invalidate() on that same dentry.
      
         - A fix to not free the eventfs_inode (ei) until the last dput() was
           called on its ei->dentry made the ei->dentry exist even after it
           was marked for free by setting the ei->is_freed. But code elsewhere
           still was checking if ei->dentry was NULL if ei->is_freed is set
           and would trigger WARN_ON if that was the case. That's no longer
           true and there should not be any warnings when it is true.
      
         - Use GFP_NOFS for allocations done under eventfs_mutex. The
           eventfs_mutex can be taken on file system reclaim, make sure that
           allocations done under that mutex do not trigger file system
           reclaim.
      
         - Clean up code by moving the taking of inode_lock out of the helper
           functions and into where they are needed, and not use the parameter
           to know to take it or not. It must always be held but some callers
           of the helper function have it taken when they were called.
      
         - Warn if the inode_lock is not held in the helper functions.
      
         - Warn if eventfs_start_creating() is called without a parent. As
           eventfs is underneath tracefs, all files created will have a parent
           (the top one will have a tracefs parent).
      
        Tracing update:
      
         - Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
      
      * tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
        eventfs: Make sure that parent->d_inode is locked in creating files/dirs
        eventfs: Do not allow NULL parent to eventfs_start_creating()
        eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
        eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
        eventfs: Do not invalidate dentry in create_file/dir_dentry()
        eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
  7. 26 Nov, 2023 6 commits
    • Merge tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · d2da77f4
      Linus Torvalds authored
      Pull parisc architecture fixes from Helge Deller:
       "This patchset fixes and enforces correct section alignments for the
        ex_table, altinstructions, parisc_unwind, jump_table and bug_table
        which are created by inline assembly.
      
        Because they are not correctly aligned at link & load time, they can
        unnecessarily trigger the kernel unaligned exception handler at
        runtime. While at it, I switched the bug table to use relative
        addresses which reduces the size of the table by half on 64-bit.
      
        We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs
        from HP-UX, which now trigger build-issues with glibc. We can simply
        remove them.
      
        Most of the patches are tagged for stable kernel series.
      
        Summary:
      
         - Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc
           build issues
      
         - Fix section alignments for ex_table, altinstructions, parisc unwind
           table, jump_table and bug_table
      
         - Reduce size of bug_table on 64-bit kernel by using relative
           pointers"
      
      * tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Reduce size of the bug_table on 64-bit kernel by half
        parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
        parisc: Use natural CPU alignment for bug_table
        parisc: Ensure 32-bit alignment on parisc unwind section
        parisc: Mark lock_aligned variables 16-byte aligned on SMP
        parisc: Mark jump_table naturally aligned
        parisc: Mark altinstructions read-only and 32-bit aligned
        parisc: Mark ex_table entries 32-bit aligned in uaccess.h
        parisc: Mark ex_table entries 32-bit aligned in assembly.h
    • Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4892711a
      Linus Torvalds authored
      Pull x86 microcode fixes from Ingo Molnar:
       "Fix/enhance x86 microcode version reporting: fix the bootup log spam,
        and remove the driver version announcement to avoid version confusion
        when distros backport fixes"
      
      * tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode: Rework early revisions reporting
        x86/microcode: Remove the driver announcement and version
    • Merge tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e81fe505
      Linus Torvalds authored
      Pull x86 perf event fix from Ingo Molnar:
       "Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
        code resulting in non-working events on those platforms"
      
      * tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
    • Merge tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d0dbc3d
      Linus Torvalds authored
      Pull locking fix from Ingo Molnar:
       "Fix lockdep block chain corruption resulting in KASAN warnings"
      
      * tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Fix block chain corruption
    • Merge tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 4515866d
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - use after free fix in releasing multichannel interfaces
      
       - fixes for special file types (report char, block, FIFOs properly when
         created e.g. by NFS to Windows)
      
       - fixes for reporting various special file types and symlinks properly
         when using SMB1
      
      * tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb: client: introduce cifs_sfu_make_node()
        smb: client: set correct file type from NFS reparse points
        smb: client: introduce ->parse_reparse_point()
        smb: client: implement ->query_reparse_point() for SMB1
        cifs: fix use after free for iface while disabling secondary channels
    • Merge tag 'usb-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 090472ed
      Linus Torvalds authored
      Pull USB / PHY / Thunderbolt fixes from Greg KH:
       "Here are a number of reverts, fixes, and new device ids for 6.7-rc3
        for the USB, PHY, and Thunderbolt driver subsystems. Included in here
        are:
      
         - reverts of some PHY drivers that went into 6.7-rc1 that shouldn't
           have been merged yet, the author is reworking them based on review
           comments as they were using older apis that shouldn't be used
           anymore for newer drivers
      
         - small thunderbolt driver fixes for reported issues
      
         - USB driver fixes for a variety of small issues in dwc3, typec,
           xhci, and other smaller drivers.
      
         - new device ids for usb-serial and onboard_usb_hub drivers.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
        USB: serial: option: add Luat Air72*U series products
        USB: dwc3: qcom: fix ACPI platform device leak
        USB: dwc3: qcom: fix software node leak on probe errors
        USB: dwc3: qcom: fix resource leaks on probe deferral
        USB: dwc3: qcom: simplify wakeup interrupt setup
        USB: dwc3: qcom: fix wakeup after probe deferral
        dt-bindings: usb: qcom,dwc3: fix example wakeup interrupt types
        usb: misc: onboard-hub: add support for Microchip USB5744
        dt-bindings: usb: microchip,usb5744: Add second supply
        usb: misc: ljca: Fix enumeration error on Dell Latitude 9420
        USB: serial: option: add Fibocom L7xx modules
        USB: xhci-plat: fix legacy PHY double init
        usb: typec: tipd: Supply also I2C driver data
        usb: xhci-mtk: fix in-ep's start-split check failure
        usb: dwc3: set the dma max_seg_size
        usb: config: fix iteration issue in 'usb_get_bos_descriptor()'
        usb: dwc3: add missing of_node_put and platform_device_put
        USB: dwc2: write HCINT with INTMASK applied
        usb: misc: ljca: Drop _ADR support to get ljca children devices
        usb: cdnsp: Fix deadlock issue during using NCM gadget
        ...
  8. 25 Nov, 2023 9 commits