1. 15 Jun, 2019 40 commits
    • Eugen Hristev's avatar
      media: atmel: atmel-isc: fix asd memory allocation · 6f9ca872
      Eugen Hristev authored
      [ Upstream commit 1e4e25c4 ]
      
      The subsystem will free the asd memory on notifier cleanup, if the asd is
      added to the notifier.
      However the memory is freed using kfree.
      Thus, we cannot allocate the asd using devm_*
      This can lead to crashes and problems.
      To test this issue, just return an error at probe, but cleanup the
      notifier beforehand.
      
      Fixes: 10626744 ("[media] atmel-isc: add the Image Sensor Controller code")
      Signed-off-by: default avatarEugen Hristev <eugen.hristev@microchip.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6f9ca872
    • Chao Yu's avatar
      f2fs: fix to do checksum even if inode page is uptodate · b0395364
      Chao Yu authored
      [ Upstream commit b42b179b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0395364
    • Chao Yu's avatar
      f2fs: fix to retrieve inline xattr space · 0fd306ae
      Chao Yu authored
      [ Upstream commit 45a74688 ]
      
      With below mkfs and mount option, generic/339 of fstest will report that
      scratch image becomes corrupted.
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
      MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
      
      [ASSERT] (f2fs_check_dirent_position:1315)  --> Wrong position of dirent pino:1970, name: (...)
      level:8, dir_level:0, pgofs:951, correct range:[900, 901]
      
      In old kernel, inline data and directory always reserved 200 bytes in
      inode layout, even if inline_xattr is disabled, then new kernel tries
      to retrieve that space for non-inline xattr inode, but for inline dentry,
      its layout size should be fixed, so we just keep that reserved space.
      
      But the problem here is that, after inline dentry conversion, inline
      dentry layout no longer exists, if we still reserve inline xattr space,
      after dents updates, there will be a hole in inline xattr space, which
      can break hierarchy hash directory structure.
      
      This patch fixes this issue by retrieving inline xattr space after
      inline dentry conversion.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0fd306ae
    • Chao Yu's avatar
      f2fs: fix to avoid deadloop in foreground GC · e2877ff9
      Chao Yu authored
      [ Upstream commit 793ab1c8 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203211
      
      - Overview
      When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cd test
      touch t
      
      - Messages
      [   58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      ... (repeating)
      
      During GC, if segment type stored in SSA and SIT is inconsistent, we just
      skip migrating current segment directly, since we need to know the exact
      type to decide the migration function we use.
      
      So in foreground GC, we will easily run into a infinite loop as we may
      select the same victim segment which has inconsistent type due to greedy
      policy. In order to end up this, we choose to shutdown filesystem. For
      backgrond GC, we need to do that as well, so that we can avoid latter
      potential infinite looped foreground GC.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e2877ff9
    • Chao Yu's avatar
      f2fs: fix to do sanity check on valid block count of segment · 53275b02
      Chao Yu authored
      [ Upstream commit e95bcdb2 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      53275b02
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_node_count() · 076d876b
      Chao Yu authored
      [ Upstream commit ea6d7e72 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203213
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:2012!
       RIP: 0010:truncate_node+0x2c9/0x2e0
       Call Trace:
        f2fs_truncate_xattr_node+0xa1/0x130
        f2fs_remove_inode_page+0x82/0x2d0
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_node_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      076d876b
    • Chao Yu's avatar
      f2fs: fix to use inline space only if inline_xattr is enable · 7cc9fda7
      Chao Yu authored
      [ Upstream commit 622927f3 ]
      
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7cc9fda7
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_block_count() · 3f4a094e
      Chao Yu authored
      [ Upstream commit 5e159cd3 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3f4a094e
    • Chao Yu's avatar
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 83c46592
      Chao Yu authored
      [ Upstream commit 546d22f0 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      83c46592
    • Chao Yu's avatar
      f2fs: fix to do sanity check on free nid · c5722a10
      Chao Yu authored
      [ Upstream commit 626bcf2b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c5722a10
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · 54916125
      Chao Yu authored
      [ Upstream commit 8b6810f8 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      54916125
    • Chao Yu's avatar
      f2fs: fix error path of recovery · d6fa38f7
      Chao Yu authored
      [ Upstream commit 98838579 ]
      
      There are some places in where we missed to unlock page or unlock page
      incorrectly, fix them.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d6fa38f7
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · da06b18b
      Chao Yu authored
      [ Upstream commit 05573d6c ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_inplace_write_data() will trigger kernel panic due
      to data block locates in node type segment.
      
      To avoid panic, let's just return error code and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      da06b18b
    • Chao Yu's avatar
      f2fs: fix to avoid panic in do_recover_data() · 62f9300c
      Chao Yu authored
      [ Upstream commit 22d61e28 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203227
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/recovery.c:549!
       RIP: 0010:recover_data+0x167a/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During recovery, if ofs_of_node is inconsistent in between recovered
      node page and original checkpointed node page, let's just fail recovery
      instead of making kernel panic.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62f9300c
    • Miroslav Lichvar's avatar
      ntp: Allow TAI-UTC offset to be set to zero · 01e0054e
      Miroslav Lichvar authored
      [ Upstream commit fdc6bae9 ]
      
      The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
      It is typically set by NTP/PTP implementations and it is automatically
      updated by the kernel on leap seconds. The initial value is zero (which
      applications may interpret as unknown), but this value cannot be set by
      adjtimex. This limitation seems to go back to the original "nanokernel"
      implementation by David Mills.
      
      Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
      order to allow setting it back to the initial value.
      
      Fixes: 153b5d05 ("ntp: support for TAI")
      Suggested-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarMiroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: https://lkml.kernel.org/r/20190417084833.7401-1-mlichvar@redhat.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      01e0054e
    • Fabien Dessenne's avatar
      mailbox: stm32-ipcc: check invalid irq · 6b0cb6f1
      Fabien Dessenne authored
      [ Upstream commit 68a1c848 ]
      
      On failure of_irq_get() returns a negative value or zero, which is
      not handled as an error in the existing implementation.
      Instead of using this API, use platform_get_irq() that returns
      exclusively a negative value on failure.
      Also, do not output an error log in case of defer probe error.
      Signed-off-by: default avatarFabien Dessenne <fabien.dessenne@st.com>
      Signed-off-by: default avatarJassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6b0cb6f1
    • Martin Blumenstingl's avatar
      pwm: meson: Use the spin-lock only to protect register modifications · 3e5d4a6e
      Martin Blumenstingl authored
      [ Upstream commit f173747f ]
      
      Holding the spin-lock for all of the code in meson_pwm_apply() can
      result in a "BUG: scheduling while atomic". This can happen because
      clk_get_rate() (which is called from meson_pwm_calc()) may sleep.
      Only hold the spin-lock when modifying registers to solve this.
      
      The reason why we need a spin-lock in the driver is because the
      REG_MISC_AB register is shared between the two channels provided by one
      PWM controller. The only functions where REG_MISC_AB is modified are
      meson_pwm_enable() and meson_pwm_disable() so the register reads/writes
      in there need to be protected by the spin-lock.
      
      The original code also used the spin-lock to protect the values in
      struct meson_pwm_channel. This could be necessary if two consumers can
      use the same PWM channel. However, PWM core doesn't allow this so we
      don't need to protect the values in struct meson_pwm_channel with a
      lock.
      
      Fixes: 211ed630 ("pwm: Add support for Meson PWM Controller")
      Signed-off-by: default avatarMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Reviewed-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: default avatarThierry Reding <thierry.reding@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3e5d4a6e
    • Michael Ellerman's avatar
      EDAC/mpc85xx: Prevent building as a module · ffbb6164
      Michael Ellerman authored
      [ Upstream commit 2b8358a9 ]
      
      The mpc85xx EDAC driver can be configured as a module but then fails to
      build because it uses two unexported symbols:
      
        ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
        ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
      
      We don't want to export those symbols just for this driver, so make the
      driver only configurable as a built-in.
      
      This seems to have been broken since at least
      
        c92132f5 ("edac/85xx: Add PCIe error interrupt edac support")
      
      (Nov 2013).
      
       [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
         as a module. ]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarJohannes Thumshirn <jth@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: linuxppc-dev@ozlabs.org
      Cc: morbidrsa@gmail.com
      Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.auSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      ffbb6164
    • Krzesimir Nowak's avatar
      bpf: fix undefined behavior in narrow load handling · d16989b4
      Krzesimir Nowak authored
      [ Upstream commit e2f7fc0a ]
      
      Commit 31fd8581 ("bpf: permits narrower load from bpf program
      context fields") made the verifier add AND instructions to clear the
      unwanted bits with a mask when doing a narrow load. The mask is
      computed with
      
        (1 << size * 8) - 1
      
      where "size" is the size of the narrow load. When doing a 4 byte load
      of a an 8 byte field the verifier shifts the literal 1 by 32 places to
      the left. This results in an overflow of a signed integer, which is an
      undefined behavior. Typically, the computed mask was zero, so the
      result of the narrow load ended up being zero too.
      
      Cast the literal to long long to avoid overflows. Note that narrow
      load of the 4 byte fields does not have the undefined behavior,
      because the load size can only be either 1 or 2 bytes, so shifting 1
      by 8 or 16 places will not overflow it. And reading 4 bytes would not
      be a narrow load of a 4 bytes field.
      
      Fixes: 31fd8581 ("bpf: permits narrower load from bpf program context fields")
      Reviewed-by: default avatarAlban Crequy <alban@kinvolk.io>
      Reviewed-by: default avatarIago López Galeiras <iago@kinvolk.io>
      Signed-off-by: default avatarKrzesimir Nowak <krzesimir@kinvolk.io>
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d16989b4
    • Ben Skeggs's avatar
      drm/nouveau/kms/gv100-: fix spurious window immediate interlocks · 8966652b
      Ben Skeggs authored
      [ Upstream commit d2434e4d ]
      
      Cursor position updates were accidentally causing us to attempt to interlock
      window with window immediate, and without a matching window immediate update,
      NVDisplay could hang forever in some circumstances.
      
      Fixes suspend/resume on (at least) Quadro RTX4000 (TU104).
      Reported-by: default avatarLyude Paul <lyude@redhat.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8966652b
    • Josh Poimboeuf's avatar
      objtool: Don't use ignore flag for fake jumps · fa57e115
      Josh Poimboeuf authored
      [ Upstream commit e6da9567 ]
      
      The ignore flag is set on fake jumps in order to keep
      add_jump_destinations() from setting their jump_dest, since it already
      got set when the fake jump was created.
      
      But using the ignore flag is a bit of a hack.  It's normally used to
      skip validation of an instruction, which doesn't really make sense for
      fake jumps.
      
      Also, after the next patch, using the ignore flag for fake jumps can
      trigger a false "why am I validating an ignored function?" warning.
      
      Instead just add an explicit check in add_jump_destinations() to skip
      fake jumps.
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/71abc072ff48b2feccc197723a9c52859476c068.1557766718.git.jpoimboe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fa57e115
    • Matt Redfearn's avatar
      drm/bridge: adv7511: Fix low refresh rate selection · 06cd48e1
      Matt Redfearn authored
      [ Upstream commit 67793bd3 ]
      
      The driver currently sets register 0xfb (Low Refresh Rate) based on the
      value of mode->vrefresh. Firstly, this field is specified to be in Hz,
      but the magic numbers used by the code are Hz * 1000. This essentially
      leads to the low refresh rate always being set to 0x01, since the
      vrefresh value will always be less than 24000. Fix the magic numbers to
      be in Hz.
      Secondly, according to the comment in drm_modes.h, the field is not
      supposed to be used in a functional way anyway. Instead, use the helper
      function drm_mode_vrefresh().
      
      Fixes: 9c8af882 ("drm: Add adv7511 encoder driver")
      Reviewed-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@thinci.com>
      Signed-off-by: default avatarSean Paul <seanpaul@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424132210.26338-1-matt.redfearn@thinci.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      06cd48e1
    • Peteris Rudzusiks's avatar
      drm/nouveau: fix duplication of nv50_head_atom struct · 4ff8e3e1
      Peteris Rudzusiks authored
      [ Upstream commit c4a52d66 ]
      
      nv50_head_atomic_duplicate_state() makes a copy of nv50_head_atom
      struct. This patch adds copying of struct member named "or", which
      previously was left uninitialized in the duplicated structure.
      
      Due to this bug, incorrect nhsync and nvsync values were sometimes used.
      In my particular case, that lead to a mismatch between the output
      resolution of the graphics device (GeForce GT 630 OEM) and the reported
      input signal resolution on the display. xrandr reported 1680x1050, but
      the display reported 1280x1024. As a result of this mismatch, the output
      on the display looked like it was cropped (only part of the output was
      actually visible on the display).
      
      git bisect pointed to commit 2ca7fb5c ("drm/nouveau/kms/nv50: handle
      SetControlOutputResource from head"), which added the member "or" to
      nv50_head_atom structure, but forgot to copy it in
      nv50_head_atomic_duplicate_state().
      
      Fixes: 2ca7fb5c ("drm/nouveau/kms/nv50: handle SetControlOutputResource from head")
      Signed-off-by: default avatarPeteris Rudzusiks <peteris.rudzusiks@gmail.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4ff8e3e1
    • Ben Skeggs's avatar
      drm/nouveau/kms/gf119-gp10x: push HeadSetControlOutputResource() mthd when encoders change · 8e1e4616
      Ben Skeggs authored
      [ Upstream commit a0b694d0 ]
      
      HW has error checks in place which check that pixel depth is explicitly
      provided on DP, while HDMI has a "default" setting that we use.
      
      In multi-display configurations with identical modelines, but different
      protocols (HDMI + DP, in this case), it was possible for the DP head to
      get swapped to the head which previously drove the HDMI output, without
      updating HeadSetControlOutputResource(), triggering the error check and
      hanging the core update.
      Reported-by: default avatarLyude Paul <lyude@redhat.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8e1e4616
    • Stephane Eranian's avatar
      perf/x86/intel: Allow PEBS multi-entry in watermark mode · ffd946c4
      Stephane Eranian authored
      [ Upstream commit c7a28657 ]
      
      This patch fixes a restriction/bug introduced by:
      
         583feb08 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS")
      
      The original patch prevented using multi-entry PEBS when wakeup_events != 0.
      However given that wakeup_events is part of a union with wakeup_watermark, it
      means that in watermark mode, PEBS multi-entry is also disabled which is not the
      intent. This patch fixes this by checking is watermark mode is enabled.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: vincent.weaver@maine.edu
      Fixes: 583feb08 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS")
      Link: http://lkml.kernel.org/r/20190514003400.224340-1-eranian@google.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ffd946c4
    • Tony Lindgren's avatar
      mfd: twl6040: Fix device init errors for ACCCTL register · b5833cb3
      Tony Lindgren authored
      [ Upstream commit 48171d0e ]
      
      I noticed that we can get a -EREMOTEIO errors on at least omap4 duovero:
      
      twl6040 0-004b: Failed to write 2d = 19: -121
      
      And then any following register access will produce errors.
      
      There 2d offset above is register ACCCTL that gets written on twl6040
      powerup. With error checking added to the related regcache_sync() call,
      the -EREMOTEIO error is reproducable on twl6040 powerup at least
      duovero.
      
      To fix the error, we need to wait until twl6040 is accessible after the
      powerup. Based on tests on omap4 duovero, we need to wait over 8ms after
      powerup before register write will complete without failures. Let's also
      make sure we warn about possible errors too.
      
      Note that we have twl6040_patch[] reg_sequence with the ACCCTL register
      configuration and regcache_sync() will write the new value to ACCCTL.
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Acked-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b5833cb3
    • Ben Skeggs's avatar
      drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration · 126847e2
      Ben Skeggs authored
      [ Upstream commit 13d03e9d ]
      
      Where possible, we want the failsafe link configuration (one which won't
      hang the OR during modeset because of not enough bandwidth for the mode)
      to also be supported by the sink.
      
      This prevents "link rate unsupported by sink" messages when link training
      fails.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      126847e2
    • Binbin Wu's avatar
      mfd: intel-lpss: Set the device in reset state when init · 99f61a90
      Binbin Wu authored
      [ Upstream commit dad06532 ]
      
      In virtualized setup, when system reboots due to warm
      reset interrupt storm is seen.
      
      Call Trace:
      <IRQ>
      dump_stack+0x70/0xa5
      __report_bad_irq+0x2e/0xc0
      note_interrupt+0x248/0x290
      ? add_interrupt_randomness+0x30/0x220
      handle_irq_event_percpu+0x54/0x80
      handle_irq_event+0x39/0x60
      handle_fasteoi_irq+0x91/0x150
      handle_irq+0x108/0x180
      do_IRQ+0x52/0xf0
      common_interrupt+0xf/0xf
      </IRQ>
      RIP: 0033:0x76fc2cfabc1d
      Code: 24 28 bf 03 00 00 00 31 c0 48 8d 35 63 77 0e 00 48 8d 15 2e
      94 0e 00 4c 89 f9 49 89 d9 4c 89 d3 e8 b8 e2 01 00 48 8b 54 24 18
      <48> 89 ef 48 89 de 4c 89 e1 e8 d5 97 01 00 84 c0 74 2d 48 8b 04
      24
      RSP: 002b:00007ffd247c1fc0 EFLAGS: 00000293 ORIG_RAX: ffffffffffffffda
      RAX: 0000000000000000 RBX: 00007ffd247c1ff0 RCX: 000000000003d3ce
      RDX: 0000000000000000 RSI: 00007ffd247c1ff0 RDI: 000076fc2cbb6010
      RBP: 000076fc2cded010 R08: 00007ffd247c2210 R09: 00007ffd247c22a0
      R10: 000076fc29465470 R11: 0000000000000000 R12: 00007ffd247c1fc0
      R13: 000076fc2ce8e470 R14: 000076fc27ec9960 R15: 0000000000000414
      handlers:
      [<000000000d3fa913>] idma64_irq
      Disabling IRQ #27
      
      To avoid interrupt storm, set the device in reset state
      before bringing out the device from reset state.
      
      Changelog v2:
      - correct the subject line by adding "mfd: "
      Signed-off-by: default avatarBinbin Wu <binbin.wu@intel.com>
      Acked-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      99f61a90
    • Daniel Gomez's avatar
      mfd: tps65912-spi: Add missing of table registration · 6cddbdf2
      Daniel Gomez authored
      [ Upstream commit 9e364e87 ]
      
      MODULE_DEVICE_TABLE(of, <of_match_table> should be called to complete DT
      OF mathing mechanism and register it.
      
      Before this patch:
      modinfo drivers/mfd/tps65912-spi.ko | grep alias
      alias:          spi:tps65912
      
      After this patch:
      modinfo drivers/mfd/tps65912-spi.ko | grep alias
      alias:          of:N*T*Cti,tps65912C*
      alias:          of:N*T*Cti,tps65912
      alias:          spi:tps65912
      Reported-by: default avatarJavier Martinez Canillas <javier@dowhile0.org>
      Signed-off-by: default avatarDaniel Gomez <dagmcr@gmail.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6cddbdf2
    • Amit Kucheria's avatar
      drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER · 54566819
      Amit Kucheria authored
      [ Upstream commit fc7d18cf ]
      
      We print a calibration failure message on -EPROBE_DEFER from
      nvmem/qfprom as follows:
      [    3.003090] qcom-tsens 4a9000.thermal-sensor: version: 1.4
      [    3.005376] qcom-tsens 4a9000.thermal-sensor: tsens calibration failed
      [    3.113248] qcom-tsens 4a9000.thermal-sensor: version: 1.4
      
      This confuses people when, in fact, calibration succeeds later when
      nvmem/qfprom device is available. Don't print this message on a
      -EPROBE_DEFER.
      Signed-off-by: default avatarAmit Kucheria <amit.kucheria@linaro.org>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      54566819
    • Jiada Wang's avatar
      thermal: rcar_gen3_thermal: disable interrupt in .remove · a7ada939
      Jiada Wang authored
      [ Upstream commit 63f55fce ]
      
      Currently IRQ remains enabled after .remove, later if device is probed,
      IRQ is requested before .thermal_init, this may cause IRQ function be
      called before device is initialized.
      
      this patch disables interrupt in .remove, to ensure irq function
      only be called after device is fully initialized.
      Signed-off-by: default avatarJiada Wang <jiada_wang@mentor.com>
      Reviewed-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Reviewed-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a7ada939
    • Cyrill Gorcunov's avatar
      kernel/sys.c: prctl: fix false positive in validate_prctl_map() · b1859f99
      Cyrill Gorcunov authored
      [ Upstream commit a9e73998 ]
      
      While validating new map we require the @start_data to be strictly less
      than @end_data, which is fine for regular applications (this is why this
      nit didn't trigger for that long).  These members are set from executable
      loaders such as elf handers, still it is pretty valid to have a loadable
      data section with zero size in file, in such case the start_data is equal
      to end_data once kernel loader finishes.
      
      As a result when we're trying to restore such programs the procedure fails
      and the kernel returns -EINVAL.  From the image dump of a program:
      
       | "mm_start_code": "0x400000",
       | "mm_end_code": "0x8f5fb4",
       | "mm_start_data": "0xf1bfb0",
       | "mm_end_data": "0xf1bfb0",
      
      Thus we need to change validate_prctl_map from strictly less to less or
      equal operator use.
      
      Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
      Fixes: f606b77f ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      Cc: Andrey Vagin <avagin@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b1859f99
    • Qian Cai's avatar
      mm/slab.c: fix an infinite loop in leaks_show() · ff75e0ec
      Qian Cai authored
      [ Upstream commit 745e1014 ]
      
      "cat /proc/slab_allocators" could hang forever on SMP machines with
      kmemleak or object debugging enabled due to other CPUs running do_drain()
      will keep making kmemleak_object or debug_objects_cache dirty and unable
      to escape the first loop in leaks_show(),
      
      do {
      	set_store_user_clean(cachep);
      	drain_cpu_caches(cachep);
      	...
      
      } while (!is_store_user_clean(cachep));
      
      For example,
      
      do_drain
        slabs_destroy
          slab_destroy
            kmem_cache_free
              __cache_free
                ___cache_free
                  kmemleak_free_recursive
                    delete_object_full
                      __delete_object
                        put_object
                          free_object_rcu
                            kmem_cache_free
                              cache_free_debugcheck --> dirty kmemleak_object
      
      One approach is to check cachep->name and skip both kmemleak_object and
      debug_objects_cache in leaks_show().  The other is to set store_user_clean
      after drain_cpu_caches() which leaves a small window between
      drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
      be dirty again lead to slightly wrong information has been stored but
      could also speed up things significantly which sounds like a good
      compromise.  For example,
      
       # cat /proc/slab_allocators
       0m42.778s # 1st approach
       0m0.737s  # 2nd approach
      
      [akpm@linux-foundation.org: tweak comment]
      Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
      Fixes: d31676df ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ff75e0ec
    • Yue Hu's avatar
      mm/cma_debug.c: fix the break condition in cma_maxchunk_get() · ecadffd7
      Yue Hu authored
      [ Upstream commit f0fd5050 ]
      
      If not find zero bit in find_next_zero_bit(), it will return the size
      parameter passed in, so the start bit should be compared with bitmap_maxno
      rather than cma->count.  Although getting maxchunk is working fine due to
      zero value of order_per_bit currently, the operation will be stuck if
      order_per_bit is set as non-zero.
      
      Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.comSigned-off-by: default avatarYue Hu <huyue2@yulong.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ecadffd7
    • Aneesh Kumar K.V's avatar
      mm: page_mkclean vs MADV_DONTNEED race · 397b073f
      Aneesh Kumar K.V authored
      [ Upstream commit 024eee0e ]
      
      MADV_DONTNEED is handled with mmap_sem taken in read mode.  We call
      page_mkclean without holding mmap_sem.
      
      MADV_DONTNEED implies that pages in the region are unmapped and subsequent
      access to the pages in that range is handled as a new page fault.  This
      implies that if we don't have parallel access to the region when
      MADV_DONTNEED is run we expect those range to be unallocated.
      
      w.r.t page_mkclean() we need to make sure that we don't break the
      MADV_DONTNEED semantics.  MADV_DONTNEED check for pmd_none without holding
      pmd_lock.  This implies we skip the pmd if we temporarily mark pmd none.
      Avoid doing that while marking the page clean.
      
      Keep the sequence same for dax too even though we don't support
      MADV_DONTNEED for dax mapping
      
      The bug was noticed by code review and I didn't observe any failures w.r.t
      test run.  This is similar to
      
      commit 58ceeb6b
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:26 2017 -0700
      
          thp: fix MADV_DONTNEED vs. MADV_FREE race
      
      commit ced10803
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:20 2017 -0700
      
          thp: fix MADV_DONTNEED vs. numa balancing race
      
      Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.comSigned-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc:"Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      397b073f
    • Yue Hu's avatar
      mm/cma.c: fix the bitmap status to show failed allocation reason · 9e3c2b37
      Yue Hu authored
      [ Upstream commit 2b59e01a ]
      
      Currently one bit in cma bitmap represents number of pages rather than
      one page, cma->count means cma size in pages. So to find available pages
      via find_next_zero_bit()/find_next_bit() we should use cma size not in
      pages but in bits although current free pages number is correct due to
      zero value of order_per_bit. Once order_per_bit is changed the bitmap
      status will be incorrect.
      
      The size input in cma_debug_show_areas() is not correct.  It will
      affect the available pages at some position to debug the failure issue.
      
      This is an example with order_per_bit = 1
      
      Before this change:
      [    4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages
      
      After this change:
      [    4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages
      
      Obviously the bitmap status before is incorrect.
      
      Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.comSigned-off-by: default avatarYue Hu <huyue2@yulong.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9e3c2b37
    • Baoquan He's avatar
      mm/memory_hotplug.c: fix the wrong usage of N_HIGH_MEMORY · e39f78af
      Baoquan He authored
      [ Upstream commit d3ba3ae1 ]
      
      In node_states_check_changes_online(), N_HIGH_MEMORY is used to substitute
      ZONE_HIGHMEM directly.  This is not right.  N_HIGH_MEMORY is to mark the
      memory state of node.  Here zone index is checked, which should be
      compared with 'ZONE_HIGHMEM' accordingly.
      
      Replace it with ZONE_HIGHMEM.
      
      This is a code cleanup - no known runtime effects.
      
      Link: http://lkml.kernel.org/r/20190320080732.14933-1-bhe@redhat.com
      Fixes: 8efe33f4 ("mm/memory_hotplug.c: simplify node_states_check_changes_online")
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e39f78af
    • Qian Cai's avatar
      mm/compaction.c: fix an undefined behaviour · f3918ea4
      Qian Cai authored
      [ Upstream commit dd7ef7bd ]
      
      In a low-memory situation, cc->fast_search_fail can keep increasing as it
      is unable to find an available page to isolate in
      fast_isolate_freepages().  As the result, it could trigger an error below,
      so just compare with the maximum bits can be shifted first.
      
      UBSAN: Undefined behaviour in mm/compaction.c:1160:30
      shift exponent 64 is too large for 64-bit type 'unsigned long'
      CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
      W    L    5.0.0+ #17
      Call trace:
       dump_backtrace+0x0/0x450
       show_stack+0x20/0x2c
       dump_stack+0xc8/0x14c
       __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
       compaction_alloc+0x2344/0x2484
       unmap_and_move+0xdc/0x1dbc
       migrate_pages+0x274/0x1310
       compact_zone+0x26ec/0x43bc
       kcompactd+0x15b8/0x1a24
       kthread+0x374/0x390
       ret_from_fork+0x10/0x18
      
      [akpm@linux-foundation.org: code cleanup]
      Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw
      Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f3918ea4
    • Christoph Hellwig's avatar
      initramfs: free initrd memory if opening /initrd.image fails · a52e81f3
      Christoph Hellwig authored
      [ Upstream commit 54c7a891 ]
      
      Patch series "initramfs tidyups".
      
      I've spent some time chasing down behavior in initramfs and found
      plenty of opportunity to improve the code.  A first stab on that is
      contained in this series.
      
      This patch (of 7):
      
      We free the initrd memory for all successful or error cases except for the
      case where opening /initrd.image fails, which looks like an oversight.
      
      Steven said:
      
      : This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
      : - specifically it means that the initrd is freed (previously it was
      : ignored and never freed).  But that seems like reasonable behaviour and
      : the previous behaviour looks like another oversight.
      
      Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.deSigned-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarSteven Price <steven.price@arm.com>
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a52e81f3
    • Yue Hu's avatar
      mm/cma.c: fix crash on CMA allocation if bitmap allocation fails · 7df45315
      Yue Hu authored
      [ Upstream commit 1df3a339 ]
      
      f022d8cb ("mm: cma: Don't crash on allocation if CMA area can't be
      activated") fixes the crash issue when activation fails via setting
      cma->count as 0, same logic exists if bitmap allocation fails.
      
      Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.comSigned-off-by: default avatarYue Hu <huyue2@yulong.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7df45315