1. 10 Jan, 2022 19 commits
    • ext4: remove useless resetting io_end_size in mpage_process_page() · effc5b3b
      Nghia Le authored
      The command "make clang-analyzer" detects dead stores in
      mpage_process_page() function.
      
      Do not reset io_end_size to 0 in the current paths, as the function
      exits on those paths without further using io_end_size.
      Signed-off-by: Nghia Le <nghialm78@gmail.com>
      Link: https://lore.kernel.org/r/20211025221803.3326-1-nghialm78@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: allow to change s_last_trim_minblks via sysfs · 4a69aecb
      Lukas Czerner authored
      Ext4 has an optimization mechanism for batched discard (FITRIM) that
      should help speed up subsequent calls of the FITRIM ioctl by skipping the
      groups that were previously trimmed. However, because FITRIM allows
      setting the minimum size of an extent to trim, ext4 stores the last
      minimum extent size and only avoids trimming the group if it was
      previously trimmed with a minimum extent size equal to, or smaller than,
      that of the current call.

      There is currently no way to bypass the optimization without a
      umount/mount cycle. This becomes a problem when the file system is
      live migrated to different storage, because the optimization will
      prevent possibly useful discard calls to the storage.

      Fix it by exporting s_last_trim_minblks via the sysfs interface, which
      allows us to set the stored minimum size to a number of blocks larger
      than that of the subsequent FITRIM call, effectively bypassing the
      optimization.
      
      By setting the s_last_trim_minblks to ULONG_MAX the optimization will be
      effectively cleared regardless of the previous state, or file system
      configuration.
      
      For example:
      getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reported-by: Laurent GUERBY <laurent@guerby.net>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20211103145122.17338-2-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
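The skip decision described in the s_last_trim_minblks commit above can be modelled in a few lines of userspace C. This is only a sketch of the rule as the commit message states it, not the kernel code: `struct group_state` and `trim_can_skip_group()` are hypothetical names introduced for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-group state; the kernel tracks this in its own structures. */
struct group_state {
    bool was_trimmed; /* group was trimmed by an earlier FITRIM call */
};

/*
 * Return true when a FITRIM call with minimum extent size `minblks` may
 * skip the group: it was already trimmed, and the previous call used a
 * minimum extent size equal to or smaller than the current one.
 */
static bool trim_can_skip_group(const struct group_state *grp,
                                unsigned long last_trim_minblks,
                                unsigned long minblks)
{
    return grp->was_trimmed && last_trim_minblks <= minblks;
}
```

In this model, storing ULONG_MAX as the last minimum size makes the comparison fail for any smaller `minblks`, which is why writing that value through sysfs forces the next FITRIM call to visit every group again.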
    • ext4: change s_last_trim_minblks type to unsigned long · 2327fb2e
      Lukas Czerner authored
      There is no good reason for the s_last_trim_minblks to be atomic. There is
      no data integrity needed and there is no real danger in setting and
      reading it in a racy manner. Change it to be unsigned long, the same type
      as s_clusters_per_group which is the maximum that's allowed.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Suggested-by: Andreas Dilger <adilger@dilger.ca>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: implement support for get/set fs label · bbc605cd
      Lukas Czerner authored
      Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for
      online reading and setting of file system label.
      
      ext4_ioctl_getlabel() is simple, just get the label from the primary
      superblock. This might not be the first sb on the file system if
      'sb=' mount option is used.
      
      In ext4_ioctl_setlabel() we update what ext4 currently views as a
      primary superblock and then proceed to update backup superblocks. There
      are two caveats:
       - the primary superblock might not be the first superblock and so it
         might not be the one used by userspace tools if read directly
         off the disk.
       - because the primary superblock might not be the first superblock we
         potentially have to update it as part of backup superblock update.
         However the first sb location is a bit more complicated than the rest
         so we have to account for that.

      The superblock modification is created generic enough so the
      infrastructure can be used for other potential superblock modification
      operations, such as changing UUID.
      
      Tested with generic/492 with various configurations. I also checked the
      behavior with 'sb=' mount options, including very large file systems
      with and without sparse_super/sparse_super2.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
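From userspace, the label ioctl named in the commit above can be exercised roughly as follows. This is a minimal sketch assuming a Linux system whose `<linux/fs.h>` defines `FS_IOC_GETFSLABEL` and `FSLABEL_MAX`; `read_fs_label()` is a hypothetical helper name, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFSLABEL, FSLABEL_MAX */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read the label of the file system containing `path` into `label`,
 * which must hold at least FSLABEL_MAX bytes.
 * Returns 0 on success, -1 on failure with errno set by open() or ioctl().
 */
static int read_fs_label(const char *path, char label[FSLABEL_MAX])
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, FS_IOC_GETFSLABEL, label);
    int saved_errno = errno;   /* keep the ioctl error across close() */
    close(fd);
    errno = saved_errno;
    return ret < 0 ? -1 : 0;
}
```

A caller would pass any path on the mounted file system, for example its mount point, and print `label` when the helper returns 0.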
    • ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified · 4c1bd5a9
      Lukas Czerner authored
      Only set EXT4_MOUNT_QUOTA when journalled quota file is specified,
      otherwise simply disabling specific quota type (usrjquota=) will also
      set the EXT4_MOUNT_QUOTA super block option.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Fixes: e6e268cb ("ext4: move quota configuration out of handle_mount_opt()")
      Link: https://lore.kernel.org/r/20220104143518.134465-2-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names · 13b215a9
      Lukas Czerner authored
      During ext4 mount api rework the commit e6e268cb ("ext4: move quota
      configuration out of handle_mount_opt()") introduced a bug where we
      would kfree(sbi->s_qf_names[i]) before assigning the new quota name in
      ext4_apply_quota_options().
      
      This is wrong because we're using kfree() on an rcu pointer that could be
      simultaneously accessed from ext4_show_quota_options() during remount.
      Fix it by using rcu_replace_pointer() to replace the old qname with the
      new one and then kfree_rcu() the old quota name.
      
      Also use get_qf_name() instead of sbi->s_qf_names in strcmp() to silence
      the sparse warning.
      
      Fixes: e6e268cb ("ext4: move quota configuration out of handle_mount_opt()")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Link: https://lore.kernel.org/r/20220104143518.134465-1-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: avoid trim error on fs with small groups · 173b6e38
      Jan Kara authored
      A user reported FITRIM ioctl failing for him on ext4 on some devices
      without apparent reason.  After some debugging we've found out that
      these devices (being LVM volumes) report rather large discard
      granularity of 42MB and the filesystem had 1k blocksize and thus group
      size of 8MB. Because ext4 FITRIM implementation puts discard
      granularity into minlen, ext4_trim_fs() declared the trim request as
      invalid. However just silently doing nothing seems to be a more
      appropriate reaction to such combination of parameters since user did
      not specify anything wrong.
      
      CC: Lukas Czerner <lczerner@redhat.com>
      Fixes: 5c2ed62f ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20211112152202.26614-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
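The numbers in the small-groups commit above can be checked with a short calculation. The sketch below assumes the default ext4 layout in which one bitmap block describes a group, so a group holds 8 * blocksize blocks, and it reads "42MB" as 42 MiB. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default layout: one bitmap block, 8 blocks tracked per byte. */
static uint64_t blocks_per_group(uint32_t blocksize)
{
    return 8ULL * blocksize;
}

static uint64_t group_size_bytes(uint32_t blocksize)
{
    return blocks_per_group(blocksize) * blocksize;
}

/*
 * True when the discard granularity, expressed in blocks, exceeds the
 * group size, so no free extent inside one group can ever reach minlen.
 */
static bool trim_is_noop(uint64_t discard_granularity_bytes, uint32_t blocksize)
{
    uint64_t minlen_blocks = discard_granularity_bytes / blocksize;
    return minlen_blocks > blocks_per_group(blocksize);
}
```

With a 1k block size the group is 8 MiB, and a 42 MiB granularity gives a minlen of 43008 blocks against 8192 blocks per group. That is the case the commit turns from an error into a silent no-op.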
    • ext4: fix an use-after-free issue about data=journal writeback mode · 5c48a7df
      Zhang Yi authored
      Our syzkaller run reported a use-after-free issue: the freed
      buffer_head on the writeback page is accessed in
      __ext4_journalled_writepage(). The problem is that if a truncate races
      with the data=journal writeback procedure, the writeback length can
      become zero and bget_one() refuses to take the buffer_head's refcount.
      The truncate procedure then releases the buffer once we drop the page
      lock, and finally the last ext4_walk_page_buffers() triggers the
      use-after-free.
      
      sync                               truncate
      ext4_sync_file()
       file_write_and_wait_range()
                                         ext4_setattr(0)
                                          inode->i_size = 0
        ext4_writepage()
         len = 0
         __ext4_journalled_writepage()
          page_bufs = page_buffers(page)
          ext4_walk_page_buffers(bget_one) <- does not get refcount
                                          do_invalidatepage()
                                            free_buffer_head()
          ext4_walk_page_buffers(page_bufs) <- trigger use-after-free
      
      After commit bdf96838 ("ext4: fix race between truncate and
      __ext4_journalled_writepage()"), we have already handled the racing
      case, so bget_one() and bput_one() are not needed. This patch
      simply removes these hunks and rechecks i_size to make it safe.
      
      Fixes: bdf96838 ("ext4: fix race between truncate and __ext4_journalled_writepage()")
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20211225090937.712867-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits' · 298b5c52
      Ye Bin authored
      We hit the following issue when running a syzkaller test:
      [ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user
      [ 1901.130901] Aborting journal on device vda-8.
      [ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal
      [ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal
      [ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal
      [ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal
      [ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted
      [ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal
      [ 1901.136915] ==================================================================
      [ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4]
      [ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal
      [ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal
      [ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968
      [ 1901.138817]
      [ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal
      [ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1
      [ 1901.146479] Hardware name: linux,dummy-virt (DT)
      [ 1901.147317] Call trace:
      [ 1901.147552]  dump_backtrace+0x0/0x2d8
      [ 1901.147898]  show_stack+0x28/0x38
      [ 1901.148215]  dump_stack+0xec/0x15c
      [ 1901.148746]  kasan_report+0x108/0x338
      [ 1901.149207]  __asan_load8+0x58/0xb0
      [ 1901.149753]  __ext4_journal_ensure_credits+0x74/0x140 [ext4]
      [ 1901.150579]  ext4_xattr_delete_inode+0xe4/0x700 [ext4]
      [ 1901.151316]  ext4_evict_inode+0x524/0xba8 [ext4]
      [ 1901.151985]  evict+0x1a4/0x378
      [ 1901.152353]  iput+0x310/0x428
      [ 1901.152733]  do_unlinkat+0x260/0x428
      [ 1901.153056]  __arm64_sys_unlinkat+0x6c/0xc0
      [ 1901.153455]  el0_svc_common+0xc8/0x320
      [ 1901.153799]  el0_svc_handler+0xf8/0x160
      [ 1901.154265]  el0_svc+0x10/0x218
      [ 1901.154682] ==================================================================
      
      This issue may happen like this:
      	Process1                               Process2
      ext4_evict_inode
        ext4_journal_start
         ext4_truncate
           ext4_ind_truncate
             ext4_free_branches
               ext4_ind_truncate_ensure_credits
      	   ext4_journal_ensure_credits_fn
      	     ext4_journal_restart
      	       handle->h_transaction = NULL;
                                                 mount -o remount,abort  /mnt
      					   -> trigger JBD abort
                     start_this_handle -> will return failed
        ext4_xattr_delete_inode
          ext4_journal_ensure_credits
            ext4_journal_ensure_credits_fn
              __ext4_journal_ensure_credits
      	  jbd2_handle_buffer_credits
      	    journal = handle->h_transaction->t_journal; ->null-ptr-deref
      
      Currently, the indirect truncate path does not handle this error. To
      solve this issue, simply adding a check for an aborted handle in
      '__ext4_journal_ensure_credits' may be enough, and I also think it is
      necessary.
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211224100341.3299128-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
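The guard proposed in the commit above can be illustrated with a userspace mock. All types and names below (`mock_handle`, `mock_ensure_credits()` and so on) are hypothetical stand-ins for the jbd2 structures, and returning -EROFS for an aborted handle is an assumption of this sketch, not something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the jbd2 types; only the fields used here. */
struct mock_journal { bool aborted; };
struct mock_transaction { struct mock_journal *t_journal; };
struct mock_handle {
    struct mock_transaction *h_transaction; /* NULL after a failed restart */
    bool h_aborted;
};

/* A handle counts as aborted if it is flagged, detached, or its journal aborted. */
static bool mock_handle_aborted(const struct mock_handle *h)
{
    if (h->h_aborted || h->h_transaction == NULL)
        return true;
    return h->h_transaction->t_journal->aborted;
}

static int mock_ensure_credits(const struct mock_handle *h)
{
    /* Check first, so the dereference below never sees a NULL transaction. */
    if (mock_handle_aborted(h))
        return -EROFS; /* assumed error code */

    struct mock_journal *journal = h->h_transaction->t_journal;
    (void)journal; /* a real implementation would inspect credits here */
    return 0;
}
```

The point of the sketch is the ordering: the abort check runs before anything follows `h_transaction`, which is the pointer left NULL by the failed restart in the trace above.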
    • ext4: initialize err_blk before calling __ext4_get_inode_loc · c27c29c6
      Harshad Shirwadkar authored
      It is not guaranteed that __ext4_get_inode_loc will definitely set
      err_blk pointer when it returns EIO. To avoid using uninitialized
      variables, let's first set err_blk to 0.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix a possible ABBA deadlock due to busy PA · 8c80fb31
      Chunguang Xu authored
      We found on an older kernel (3.10) that, when disk space is
      insufficient, the system may trigger an ABBA deadlock. The problem
      appears to still exist in the latest kernel, so try to fix it here.
      The deadlock arises as follows: task A holds the PA and waits for the
      jbd2 transaction to finish, the jbd2 transaction waits for the
      completion of task B's IO (plug_list), but task B waits for task A to
      release the PA so that it can finish the discard, which indirectly
      forms an ABBA deadlock. The related call trace is as follows:
      
          Task A
          vfs_write
          ext4_mb_new_blocks()
          ext4_mb_mark_diskspace_used()       JBD2
          jbd2_journal_get_write_access()  -> jbd2_journal_commit_transaction()
        ->schedule()                          filemap_fdatawait()
       |                                              |
       | Task B                                       |
       | do_unlinkat()                                |
       | ext4_evict_inode()                           |
       | jbd2_journal_begin_ordered_truncate()        |
       | filemap_fdatawrite_range()                   |
       | ext4_mb_new_blocks()                         |
        -ext4_mb_discard_group_preallocations() <-----
      
      Here, remove the internal retry in
      ext4_mb_discard_group_preallocations() when a PA is busy, and instead
      do a limited number of retries inside ext4_mb_discard_preallocations().
      This circumvents the above problem and also has some advantages:
      
      1. Since the PA is in a busy state, if other groups have free PAs,
         keeping the current PA may help to reduce fragmentation.
      2. Continue to traverse forward instead of waiting for the current
         group PA to be released. In most scenarios, the PA discard time
         can be reduced.
      
      However, in the case of smaller free space, if only a few groups have
      space, then due to multiple traversals of the group, it may increase
      CPU overhead. But in contrast, I feel that the overall benefit is
      better than the cost.
      Signed-off-by: Chunguang Xu <brookxu@tencent.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: replace snprintf in show functions with sysfs_emit · dfac1a16
      Qing Wang authored
      coccicheck complains about the use of snprintf() in sysfs show functions.
      
      Fix the coccicheck warning:
      WARNING: use scnprintf or sprintf.
      
      Using sysfs_emit instead of scnprintf or sprintf makes more sense.
      Signed-off-by: Qing Wang <wangqing@vivo.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/1634095731-4528-1-git-send-email-wangqing@vivo.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: make sure to reset inode lockdep class when quota enabling fails · 4013d47a
      Jan Kara authored
      When we succeed in enabling some quota type but fail to enable another
      one with quota feature, we correctly disable all enabled quota types.
      However we forget to reset i_data_sem lockdep class. When the inode gets
      freed and reused, it will inherit this lockdep class (i_data_sem is
      initialized only when a slab is created) and thus eventually lockdep
      barfs about possible deadlocks.
      
      Reported-and-tested-by: syzbot+3b6f9218b1301ddda3e2@syzkaller.appspotmail.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20211007155336.12493-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: make sure quota gets properly shutdown on error · 15fc69bb
      Jan Kara authored
      When we hit an error when enabling quotas and setting inode flags, we do
      not properly shutdown quota subsystem despite returning error from
      Q_QUOTAON quotactl. This can lead to some odd situations like kernel
      using quota file while it is still writeable for userspace. Make sure we
      properly cleanup the quota subsystem in case of error.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20211007155336.12493-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: Fix BUG_ON in ext4_bread when write quota data · 380a0091
      Ye Bin authored
      We hit the following issue when running syzkaller:
      [  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
      [  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
      [  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
      [  167.983601] ------------[ cut here ]------------
      [  167.984245] kernel BUG at fs/ext4/inode.c:847!
      [  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
      [  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
      [  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
      [  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
      [  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
      [  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
      [  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
      [  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
      [  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
      [  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
      [  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
      [  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  168.001899] Call Trace:
      [  168.002235]  <TASK>
      [  168.007167]  ext4_bread+0xd/0x53
      [  168.007612]  ext4_quota_write+0x20c/0x5c0
      [  168.010457]  write_blk+0x100/0x220
      [  168.010944]  remove_free_dqentry+0x1c6/0x440
      [  168.011525]  free_dqentry.isra.0+0x565/0x830
      [  168.012133]  remove_tree+0x318/0x6d0
      [  168.014744]  remove_tree+0x1eb/0x6d0
      [  168.017346]  remove_tree+0x1eb/0x6d0
      [  168.019969]  remove_tree+0x1eb/0x6d0
      [  168.022128]  qtree_release_dquot+0x291/0x340
      [  168.023297]  v2_release_dquot+0xce/0x120
      [  168.023847]  dquot_release+0x197/0x3e0
      [  168.024358]  ext4_release_dquot+0x22a/0x2d0
      [  168.024932]  dqput.part.0+0x1c9/0x900
      [  168.025430]  __dquot_drop+0x120/0x190
      [  168.025942]  ext4_clear_inode+0x86/0x220
      [  168.026472]  ext4_evict_inode+0x9e8/0xa22
      [  168.028200]  evict+0x29e/0x4f0
      [  168.028625]  dispose_list+0x102/0x1f0
      [  168.029148]  evict_inodes+0x2c1/0x3e0
      [  168.030188]  generic_shutdown_super+0xa4/0x3b0
      [  168.030817]  kill_block_super+0x95/0xd0
      [  168.031360]  deactivate_locked_super+0x85/0xd0
      [  168.031977]  cleanup_mnt+0x2bc/0x480
      [  168.033062]  task_work_run+0xd1/0x170
      [  168.033565]  do_exit+0xa4f/0x2b50
      [  168.037155]  do_group_exit+0xef/0x2d0
      [  168.037666]  __x64_sys_exit_group+0x3a/0x50
      [  168.038237]  do_syscall_64+0x3b/0x90
      [  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      In order to reproduce this problem, the following conditions need to be met:
      1. Ext4 filesystem with no journal;
      2. Filesystem image with incorrect quota data;
      3. Abort filesystem forced by user;
      4. umount filesystem;
      
      As in ext4_quota_write:
      ...
               if (EXT4_SB(sb)->s_journal && !handle) {
                       ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                               " cancelled because transaction is not started",
                               (unsigned long long)off, (unsigned long long)len);
                       return -EIO;
               }
      ...
      We only check whether handle is NULL when the filesystem has a journal.
      The handle needs to be checked for NULL even when the filesystem has no
      journal.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
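The difference between the check quoted in the commit above and the one it argues for can be shown with two small predicates. This is a userspace model only: the function names are hypothetical and the handle is reduced to an opaque pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Check as quoted in the commit message: a NULL handle is rejected only with a journal. */
static int quota_write_check_old(bool has_journal, const void *handle)
{
    if (has_journal && handle == NULL)
        return -EIO;
    return 0;
}

/* Check as the commit message proposes: a NULL handle is rejected unconditionally. */
static int quota_write_check_new(bool has_journal, const void *handle)
{
    (void)has_journal; /* no longer part of the decision */
    if (handle == NULL)
        return -EIO;
    return 0;
}
```

For a journal-less filesystem with a NULL handle, the old predicate returns 0 and lets the write proceed toward the assertion seen in the trace, while the new one returns -EIO.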
    • ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal · ab047d51
      Sebastian Andrzej Siewior authored
      The kmemcache for ext4_fc_dentry_cachep remains registered after module
      removal.
      
      Destroy ext4_fc_dentry_cachep kmemcache on module removal.
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de
      Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de
      Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fast commit may miss tracking unwritten range during ftruncate · 9725958b
      Xin Yin authored
      If FALLOC_FL_KEEP_SIZE is used to allocate an unwritten range at the
      end of a file, inode->i_size will not include the unwritten range.
      When ftruncate is then called with fast commit enabled, the unwritten
      range is not tracked.

      Change to track the full range during ftruncate.
      Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
      Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20211223032337.5198-3-yinxin.x@bytedance.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: use ext4_ext_remove_space() for fast commit replay delete range · 0b5b5a62
      Xin Yin authored
      For now, we use ext4_punch_hole() during the fast commit replay delete
      range procedure. But it is affected by inode->i_size, which may not be
      correct during the fast commit replay procedure. The following test
      will fail.
      
      -create & write foo (len 1000K)
      -falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K)
      -create & fsync bar
      -falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K)
      -fsync foo
      -crash before a full commit
      
      After the fast commit replay procedure, the range 400K-500K will not be
      removed, because in this case inode->i_size is 0 when ext4_punch_hole()
      is called, and it just returns without doing anything.
      
      Change to use ext4_ext_remove_space() instead of ext4_punch_hole()
      to remove blocks of inode directly.
      Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
      Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20211223032337.5198-2-yinxin.x@bytedance.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGE · 5e4d0eba
      Xin Yin authored
      When falloc is called with FALLOC_FL_ZERO_RANGE to set an already
      initialized range to unwritten, and the range is aligned to the block
      size, fast commit will not track the range for this change.
      
      Also track range for unwritten range in ext4_map_blocks().
      Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
      Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20211221022839.374606-1-yinxin.x@bytedance.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
  2. 23 Dec, 2021 7 commits
  3. 09 Dec, 2021 13 commits
  4. 05 Dec, 2021 1 commit