1. 08 Jun, 2016 40 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.5.7 · 8c596d17
      Greg Kroah-Hartman authored
      8c596d17
    • David Sterba's avatar
      btrfs: make state preallocation more speculative in __set_extent_bit · c396d335
      David Sterba authored
      commit 059f791c upstream.
      
      Similar to __clear_extent_bit, do not fail if the state preallocation
      fails as we might not need it. One less BUG_ON.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c396d335
    • Zhao Lei's avatar
      btrfs: scrub: Set bbio to NULL before calling btrfs_map_block · f8f2b9dd
      Zhao Lei authored
      commit f1fee653 upstream.
      
      We usually call btrfs_put_bbio() when btrfs_map_block() failed,
      btrfs_put_bbio() works right whether bbio is a valid value, or NULL.
      
      But there is a exception, in some case, btrfs_map_block() will return
      fail without touching *bbio(keeping its original value), and if bbio
      was not initialized yet, invalid memory accessing will happened.
      
      Above case is in scrub_missing_raid56_pages(), and similar case in
      scrub_raid56_parity().
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8f2b9dd
    • Liu Bo's avatar
      Btrfs: fix unexpected return value of fiemap · 327b1cf8
      Liu Bo authored
      commit 2d324f59 upstream.
      
      btrfs's fiemap is supposed to return 0 on success and return < 0 on
      error. however, ret becomes 1 after looking up the last file extent:
      
        btrfs_lookup_file_extent ->
          btrfs_search_slot(..., ins_len=0, cow=0)
      
      and if the offset is beyond EOF, we'll get 'path' pointed to the place
      of potentail insertion, and ret == 1.
      
      This may confuse applications using ioctl(FIEL_IOC_FIEMAP).
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      327b1cf8
    • Filipe Manana's avatar
      Btrfs: fix empty symlink after creating symlink and fsync parent dir · ac5414fc
      Filipe Manana authored
      commit 3f9749f6 upstream.
      
      If we create a symlink, fsync its parent directory, crash/power fail and
      mount the filesystem, we end up with an empty symlink, which not only is
      useless it's also not allowed in linux (the man page symlink(2) is well
      explicit about that).  So we just need to make sure to fully log an inode
      if it's a symlink, to ensure its inline extent gets logged, ensuring the
      same behaviour as ext3, ext4, xfs, reiserfs, f2fs, nilfs2, etc.
      
      Example reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkdir /mnt/testdir
        $ sync
        $ ln -s /mnt/foo /mnt/testdir/bar
        $ xfs_io -c fsync /mnt/testdir
        <power fail>
        $ mount /dev/sdb /mnt
        $ readlink /mnt/testdir/bar
        <empty string>
      
      A test case for fstests follows soon.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac5414fc
    • Filipe Manana's avatar
      Btrfs: fix for incorrect directory entries after fsync log replay · ec450cd5
      Filipe Manana authored
      commit 657ed1aa upstream.
      
      If we move a directory to a new parent and later log that parent and don't
      explicitly log the old parent, when we replay the log we can end up with
      entries for the moved directory in both the old and new parent directories.
      Besides being ilegal to have directories with multiple hard links in linux,
      it also resulted in the leaving the inode item with a link count of 1.
      A similar issue also happens if we move a regular file - after the log tree
      is replayed the file has a link in both the old and new parent directories,
      when it should be only at the new directory.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
        $ mkdir /mnt/x
        $ mkdir /mnt/y
        $ touch /mnt/x/foo
        $ mkdir /mnt/y/z
        $ sync
        $ ln /mnt/x/foo /mnt/x/bar
        $ mv /mnt/y/z /mnt/x/z
        < power fail >
        $ mount /dev/sdc /mnt
        $ ls -1Ri /mnt
        /mnt:
        257 x
        258 y
      
        /mnt/x:
        259 bar
        259 foo
        260 z
      
        /mnt/x/z:
      
        /mnt/y:
        260 z
      
        /mnt/y/z:
      
        $ umount /dev/sdc
        $ btrfs check /dev/sdc
        Checking filesystem on /dev/sdc
        UUID: a67e2c4a-a4b4-4fdc-b015-9d9af1e344be
        checking extents
        checking free space cache
        checking fs roots
        root 5 inode 260 errors 2000, link count wrong
              unresolved ref dir 257 index 4 namelen 1 name z filetype 2 errors 0
              unresolved ref dir 258 index 2 namelen 1 name z filetype 2 errors 0
        (...)
      
      Attempting to remove the directory becomes impossible:
      
        $ mount /dev/sdc /mnt
        $ rmdir /mnt/y/z
        $ ls -lh /mnt/y
        ls: cannot access /mnt/y/z: No such file or directory
        total 0
        d????????? ? ? ? ?            ? z
        $ rmdir /mnt/x/z
        rmdir: failed to remove ‘/mnt/x/z’: Stale file handle
        $ ls -lh /mnt/x
        ls: cannot access /mnt/x/z: Stale file handle
        total 0
        -rw-r--r-- 2 root root 0 Apr  6 18:06 bar
        -rw-r--r-- 2 root root 0 Apr  6 18:06 foo
        d????????? ? ?    ?    ?            ? z
      
      So make sure that on rename we set the last_unlink_trans value for our
      inode, even if it's a directory, to the value of the current transaction's
      ID and that if the new parent directory is logged that we fallback to a
      transaction commit.
      
      A test case for fstests is being submitted as well.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec450cd5
    • Anand Jain's avatar
      btrfs: pass the right error code to the btrfs_std_error · 5ea5cce5
      Anand Jain authored
      commit ad8403df upstream.
      
      Also drop the newline from the message.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      5ea5cce5
    • Scott Talbert's avatar
      btrfs: fix memory leak during RAID 5/6 device replacement · 7281989d
      Scott Talbert authored
      commit 4673272f upstream.
      
      A 'struct bio' is allocated in scrub_missing_raid56_pages(), but it was never
      freed anywhere.
      Signed-off-by: default avatarScott Talbert <scott.talbert@hgst.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7281989d
    • Vincent Stehlé's avatar
      Btrfs: fix fspath error deallocation · b9cd7ba4
      Vincent Stehlé authored
      commit 72928f24 upstream.
      
      Make sure to deallocate fspath with vfree() in case of error in
      init_ipath().
      
      fspath is allocated with vmalloc() in init_data_container() since
      commit 425d17a2 ("Btrfs: use larger limit for translation of logical to
      inode").
      Signed-off-by: default avatarVincent Stehlé <vincent.stehle@intel.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9cd7ba4
    • Adam Borowski's avatar
      btrfs: fix int32 overflow in shrink_delalloc(). · f97bce9f
      Adam Borowski authored
      commit 8eb0dfdb upstream.
      
      UBSAN: Undefined behaviour in fs/btrfs/extent-tree.c:4623:21
      signed integer overflow:
      10808 * 262144 cannot be represented in type 'int [8]'
      
      If 8192<=items<16384, we request a writeback of an insane number of pages
      which is benign (everything will be written).  But if items>=16384, the
      space reservation won't be enough.
      Signed-off-by: default avatarAdam Borowski <kilobyte@angband.pl>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f97bce9f
    • David Sterba's avatar
      btrfs: add write protection to SET_FEATURES ioctl · e5788f96
      David Sterba authored
      commit 7ab19625 upstream.
      
      Perform the want_write check if we get far enough to do any writes.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5788f96
    • Anand Jain's avatar
      btrfs: fix lock dep warning move scratch super outside of chunk_mutex · 686d38c6
      Anand Jain authored
      commit 48b3b9d4 upstream.
      
      Move scratch super outside of the chunk lock to avoid below
      lockdep warning. The better place to scratch super is in
      the function btrfs_rm_dev_replace_free_srcdev() just before
      free_device, which is outside of the chunk lock as well.
      
      To reproduce:
        (fresh boot)
        mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde
        mount /dev/sdc /btrfs
        dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
        (get devmgt from https://github.com/asj/devmgt.git)
        devmgt detach /dev/sde
        dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
        sync
        btrfs replace start -Brf 3 /dev/sdf /btrfs <--
        devmgt attach host7
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      4.6.0-rc2asj+ #1 Not tainted
      ---------------------------------------------------
      
      btrfs/2174 is trying to acquire lock:
      (sb_writers){.+.+.+}, at:
      [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0
      
      but task is already holding lock:
      (&fs_info->chunk_mutex){+.+.+.}, at:
      [<ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs]
      
      which lock already depends on the new lock.
      
      Chain exists of:
      sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex
      Possible unsafe locking scenario:
      CPU0				CPU1
      ----				----
      lock(&fs_info->chunk_mutex);
      				lock(&fs_devs->device_list_mutex);
      				lock(&fs_info->chunk_mutex);
      lock(sb_writers);
      686d38c6
    • Josef Bacik's avatar
      Btrfs: remove BUG_ON()'s in btrfs_map_block · d7f27d07
      Josef Bacik authored
      commit e042d1ec upstream.
      
      btrfs_map_block can go horribly wrong in the face of fs corruption, lets agree
      to not be assholes and panic at any possible chance things are all fucked up.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      [ removed type casts ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7f27d07
    • Liu Bo's avatar
      Btrfs: fix divide error upon chunk's stripe_len · b737cf51
      Liu Bo authored
      commit 3d8da678 upstream.
      
      The struct 'map_lookup' uses type int for @stripe_len, while
      btrfs_chunk_stripe_len() can return a u64 value, and it may end up with
      @stripe_len being undefined value and it can lead to 'divide error' in
       __btrfs_map_block().
      
      This changes 'map_lookup' to use type u64 for stripe_len, also right now
      we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a valid checker for
      BTRFS_STRIPE_LEN.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Reported-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      [ folded division fix to scrub_raid56_parity ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b737cf51
    • David Sterba's avatar
      btrfs: add check to sysfs handler of label · 1a0608e8
      David Sterba authored
      commit 66ac9fe7 upstream.
      
      Add a sanity check for the fs_info as we will dereference it, similar to
      what the 'store features' handler does.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a0608e8
    • David Sterba's avatar
      btrfs: add read-only check to sysfs handler of features · 296f8da3
      David Sterba authored
      commit ee611138 upstream.
      
      We don't want to trigger the change on a read-only filesystem, similar
      to what the label handler does.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      296f8da3
    • Anand Jain's avatar
      btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex · 72de21b0
      Anand Jain authored
      commit 779bf3fe upstream.
      
      When the replace target fails, the target device will be taken
      out of fs device list, scratch + update_dev_time and freed. However
      we could do the scratch  + update_dev_time and free part after the
      device has been taken out of device list, so that we don't have to
      hold the device_list_mutex and uuid_mutex locks.
      
      Reported issue:
      
      [ 5375.718845] ======================================================
      [ 5375.718846] [ INFO: possible circular locking dependency detected ]
      [ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted
      [ 5375.718849] -------------------------------------------------------
      [ 5375.718851] btrfs-health/4662 is trying to acquire lock:
      [ 5375.718861]  (sb_writers){.+.+.+}, at: [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.718862]
      [ 5375.718862] but task is already holding lock:
      [ 5375.718907]  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
      [ 5375.718907]
      [ 5375.718907] which lock already depends on the new lock.
      [ 5375.718907]
      [ 5375.718908]
      [ 5375.718908] the existing dependency chain (in reverse order) is:
      [ 5375.718911]
      [ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}:
      [ 5375.718917]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718921]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
      [ 5375.718940]        [<ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs]
      [ 5375.718945]        [<ffffffff81267079>] show_vfsmnt+0x49/0x150
      [ 5375.718948]        [<ffffffff81240b07>] m_show+0x17/0x20
      [ 5375.718951]        [<ffffffff81246868>] seq_read+0x2d8/0x3b0
      [ 5375.718955]        [<ffffffff8121df28>] __vfs_read+0x28/0xd0
      [ 5375.718959]        [<ffffffff8121e806>] vfs_read+0x86/0x130
      [ 5375.718962]        [<ffffffff8121f4c9>] SyS_read+0x49/0xa0
      [ 5375.718966]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.718968]
      [ 5375.718968] -> #2 (namespace_sem){+++++.}:
      [ 5375.718971]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718974]        [<ffffffff81635199>] down_write+0x49/0x80
      [ 5375.718977]        [<ffffffff81243593>] lock_mount+0x43/0x1c0
      [ 5375.718979]        [<ffffffff81243c13>] do_add_mount+0x23/0xd0
      [ 5375.718982]        [<ffffffff81244afb>] do_mount+0x27b/0xe30
      [ 5375.718985]        [<ffffffff812459dc>] SyS_mount+0x8c/0xd0
      [ 5375.718988]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.718991]
      [ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
      [ 5375.718994]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718996]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
      [ 5375.719001]        [<ffffffff8122d608>] path_openat+0x468/0x1360
      [ 5375.719004]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719007]        [<ffffffff8121da7b>] do_sys_open+0x12b/0x210
      [ 5375.719010]        [<ffffffff8121db7e>] SyS_open+0x1e/0x20
      [ 5375.719013]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.719015]
      [ 5375.719015] -> #0 (sb_writers){.+.+.+}:
      [ 5375.719018]        [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
      [ 5375.719021]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.719026]        [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
      [ 5375.719028]        [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.719031]        [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
      [ 5375.719035]        [<ffffffff8122ded2>] path_openat+0xd32/0x1360
      [ 5375.719037]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719040]        [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
      [ 5375.719043]        [<ffffffff8121d923>] filp_open+0x33/0x60
      [ 5375.719073]        [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
      [ 5375.719099]        [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
      [ 5375.719123]        [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
      [ 5375.719150]        [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
      [ 5375.719175]        [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
      [ 5375.719199]        [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
      [ 5375.719222]        [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
      [ 5375.719225]        [<ffffffff810a70df>] kthread+0xef/0x110
      [ 5375.719229]        [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
      [ 5375.719230]
      [ 5375.719230] other info that might help us debug this:
      [ 5375.719230]
      [ 5375.719233] Chain exists of:
      [ 5375.719233]   sb_writers --> namespace_sem --> &fs_devs->device_list_mutex
      [ 5375.719233]
      [ 5375.719234]  Possible unsafe locking scenario:
      [ 5375.719234]
      [ 5375.719234]        CPU0                    CPU1
      [ 5375.719235]        ----                    ----
      [ 5375.719236]   lock(&fs_devs->device_list_mutex);
      [ 5375.719238]                                lock(namespace_sem);
      [ 5375.719239]                                lock(&fs_devs->device_list_mutex);
      [ 5375.719241]   lock(sb_writers);
      [ 5375.719241]
      [ 5375.719241]  *** DEADLOCK ***
      [ 5375.719241]
      [ 5375.719243] 4 locks held by btrfs-health/4662:
      [ 5375.719266]  #0:  (&fs_info->health_mutex){+.+.+.}, at: [<ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs]
      [ 5375.719293]  #1:  (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs]
      [ 5375.719319]  #2:  (uuid_mutex){+.+.+.}, at: [<ffffffffa0282620>] btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs]
      [ 5375.719343]  #3:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
      [ 5375.719343]
      [ 5375.719343] stack backtrace:
      [ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40
      [ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015
      [ 5375.719352]  0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0
      [ 5375.719354]  ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930
      [ 5375.719357]  ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004
      [ 5375.719357] Call Trace:
      [ 5375.719363]  [<ffffffff813529e3>] dump_stack+0x85/0xc2
      [ 5375.719366]  [<ffffffff810d667c>] print_circular_bug+0x1ec/0x260
      [ 5375.719369]  [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
      [ 5375.719373]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
      [ 5375.719376]  [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.719378]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
      [ 5375.719383]  [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
      [ 5375.719385]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
      [ 5375.719387]  [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.719389]  [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
      [ 5375.719393]  [<ffffffff8122ded2>] path_openat+0xd32/0x1360
      [ 5375.719415]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
      [ 5375.719418]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
      [ 5375.719420]  [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719423]  [<ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80
      [ 5375.719426]  [<ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0
      [ 5375.719430]  [<ffffffff8122e7d4>] ? getname_kernel+0x34/0x120
      [ 5375.719433]  [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
      [ 5375.719436]  [<ffffffff8121d923>] filp_open+0x33/0x60
      [ 5375.719462]  [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
      [ 5375.719485]  [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
      [ 5375.719506]  [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
      [ 5375.719530]  [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
      [ 5375.719554]  [<ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs]
      [ 5375.719576]  [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
      [ 5375.719598]  [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
      [ 5375.719621]  [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
      [ 5375.719641]  [<ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs]
      [ 5375.719661]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
      [ 5375.719663]  [<ffffffff810a70df>] kthread+0xef/0x110
      [ 5375.719666]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
      [ 5375.719669]  [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
      [ 5375.719672]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
      [ 5375.719697] ------------[ cut here ]------------
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reported-by: default avatarYauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72de21b0
    • Luis de Bethencourt's avatar
      btrfs: avoid overflowing f_bfree · 30083555
      Luis de Bethencourt authored
      commit 41b34acc upstream.
      
      Since mixed block groups accounting isn't byte-accurate and f_bree is an
      unsigned integer, it could overflow. Avoid this.
      Signed-off-by: default avatarLuis de Bethencourt <luisbg@osg.samsung.com>
      Suggested-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30083555
    • Luis de Bethencourt's avatar
      btrfs: fix mixed block count of available space · 12e13702
      Luis de Bethencourt authored
      commit ae02d1bd upstream.
      
      Metadata for mixed block is already accounted in total data and should not
      be counted as part of the free metadata space.
      Signed-off-by: default avatarLuis de Bethencourt <luisbg@osg.samsung.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=114281Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12e13702
    • Austin S. Hemmelgarn's avatar
      btrfs: allow balancing to dup with multi-device · fd76c86b
      Austin S. Hemmelgarn authored
      commit 88be159c upstream.
      
      Currently, we don't allow the user to try and rebalance to a dup profile
      on a multi-device filesystem.  In most cases, this is a perfectly sensible
      restriction as raid1 uses the same amount of space and provides better
      protection.
      
      However, when reshaping a multi-device filesystem down to a single device
      filesystem, this requires the user to convert metadata and system chunks
      to single profile before deleting devices, and then convert again to dup,
      which leaves a period of time where metadata integrity is reduced.
      
      This patch removes the single-device-only restriction from converting to
      dup profile to remove this potential data integrity reduction.
      Signed-off-by: default avatarAustin S. Hemmelgarn <ahferroin7@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd76c86b
    • Liu Bo's avatar
      Btrfs: do not create empty block group if we have allocated data · aaef5d1d
      Liu Bo authored
      commit cf25ce51 upstream.
      
      Now we force to create empty block group to keep data profile alive,
      however, in the below example, we eventually get an empty block group
      while we're trying to get more space for other types (metadata/system),
      
      - Before,
      block group "A": size=2G, used=1.2G
      block group "B": size=2G, used=512M
      
      - After "btrfs balance start -dusage=50 mount_point",
      block group "A": size=2G, used=(1.2+0.5)G
      block group "C": size=2G, used=0
      
      Since there is no data in block group C, it won't be deleted
      automatically and we have to get the unused 2G until the next mount.
      
      Balance itself just moves data and doesn't remove data, so it's safe
      to not create such a empty block group if we already have data
       allocated in other block groups.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aaef5d1d
    • Luke Dashjr's avatar
      btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl · 8dbf3ff1
      Luke Dashjr authored
      commit 4c63c245 upstream.
      
      32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
      be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
      fail.
      Signed-off-by: default avatarLuke Dashjr <luke-jr+git@utopios.org>
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dbf3ff1
    • Dave Chinner's avatar
      xfs: skip stale inodes in xfs_iflush_cluster · edd7cd7a
      Dave Chinner authored
      commit 7d3aa7fe upstream.
      
      We don't write back stale inodes so we should skip them in
      xfs_iflush_cluster, too.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edd7cd7a
    • Dave Chinner's avatar
      xfs: fix inode validity check in xfs_iflush_cluster · a7fd4331
      Dave Chinner authored
      commit 51b07f30 upstream.
      
      Some careless idiot(*) wrote crap code in commit 1a3e8f3d ("xfs:
      convert inode cache lookups to use RCU locking") back in late 2010,
      and so xfs_iflush_cluster checks the wrong inode for whether it is
      still valid under RCU protection. Fix it to lock and check the
      correct inode.
      
      (*) Careless-idiot: Dave Chinner <dchinner@redhat.com>
      Discovered-by: default avatarBrain Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7fd4331
    • Dave Chinner's avatar
      xfs: xfs_iflush_cluster fails to abort on error · 431ea8f5
      Dave Chinner authored
      commit b1438f47 upstream.
      
      When a failure due to an inode buffer occurs, the error handling
      fails to abort the inode writeback correctly. This can result in the
      inode being reclaimed whilst still in the AIL, leading to
      use-after-free situations as well as filesystems that cannot be
      unmounted as the inode log items left in the AIL never get removed.
      
      Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
      the inode flush being aborted correctly.
      Reported-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Diagnosed-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Tested-by: default avatarShyam Kaushik <shyam@zadarastorage.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      431ea8f5
    • Dave Chinner's avatar
      xfs: Don't wrap growfs AGFL indexes · df89f87a
      Dave Chinner authored
      commit ad747e3b upstream.
      
      Commit 96f859d5 ("libxfs: pack the agfl header structure so
      XFS_AGFL_SIZE is correct") allowed the freelist to use the empty
      slot at the end of the freelist on 64 bit systems that was not
      being used due to sizeof() rounding up the structure size.
      
      This has caused versions of xfs_repair prior to 4.5.0 (which also
      has the fix) to report this as a corruption once the filesystem has
      been grown. Older kernels can also have problems (seen from a whacky
      container/vm management environment) mounting filesystems grown on a
      system with a newer kernel than the vm/container it is deployed on.
      
      To avoid this problem, change the initial free list indexes not to
      wrap across the end of the AGFL, hence avoiding the initialisation
      of agf_fllast to the last index in the AGFL.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df89f87a
    • Eric Sandeen's avatar
      xfs: disallow rw remount on fs with unknown ro-compat features · 80d8d896
      Eric Sandeen authored
      commit d0a58e83 upstream.
      
      Today, a kernel which refuses to mount a filesystem read-write
      due to unknown ro-compat features can still transition to read-write
      via the remount path.  The old kernel is most likely none the wiser,
      because it's unaware of the new feature, and isn't using it.  However,
      writing to the filesystem may well corrupt metadata related to that
      new feature, and moving to a newer kernel which understand the feature
      will have problems.
      
      Right now the only ro-compat feature we have is the free inode btree,
      which showed up in v3.16.  It would be good to push this back to
      all the active stable kernels, I think, so that if anyone is using
      newer mkfs (which enables the finobt feature) with older kernel
      releases, they'll be protected.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBill O'Donnell <billodo@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80d8d896
    • Arnd Bergmann's avatar
      gcov: disable tree-loop-im to reduce stack usage · 45654872
      Arnd Bergmann authored
      commit c87bf431 upstream.
      
      Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like
      
      lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
      lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      After some investigation, I found that this behavior started with gcc-4.9,
      and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
      A suggested workaround for it is to use the -fno-tree-loop-im
      flag that turns off one of the optimization stages in gcc, so the
      code runs a little slower but does not use excessive amounts
      of stack.
      
      We could make this conditional on the gcc version, but I could not
      find an easy way to do this in Kbuild and the benefit would be
      fairly small, given that most of the gcc version in production are
      affected now.
      
      I'm marking this for 'stable' backports because it addresses a bug
      with code generation in gcc that exists in all kernel versions
      with the affected gcc releases.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45654872
    • Kirill A. Shutemov's avatar
      mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap() · 45a9709c
      Kirill A. Shutemov authored
      commit 0798d3c0 upstream.
      
      If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail
      page from a pte, the "address" must be THP aligned in order for the
      page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.
      
      Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com
      Fixes: 6d0a07ed ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Tested-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45a9709c
    • Srinivas Pandruvada's avatar
      scripts/package/Makefile: rpmbuild add support of RPMOPTS · 5ba525be
      Srinivas Pandruvada authored
      commit 65a9f31c upstream.
      
      After commit 21a59991 ("scripts/package/Makefile: rpmbuild is needed
      for rpm targets"), it is no longer possible to specify RPMOPTS.
      For example, we can no longer able to control _topdir using the following
      make command.
      make RPMOPTS="--define '_topdir /home/xyz/workspace/'" binrpm-pkg
      
      Fixes: 21a59991 ("scripts/package/Makefile: rpmbuild is needed for rpm targets")
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ba525be
    • Ville Syrjälä's avatar
      dma-debug: avoid spinlock recursion when disabling dma-debug · e799b0ec
      Ville Syrjälä authored
      commit 3017cd63 upstream.
      
      With netconsole (at least) the pr_err("...  disablingn") call can
      recurse back into the dma-debug code, where it'll try to grab
      free_entries_lock again.  Avoid the problem by doing the printk after
      dropping the lock.
      
      Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e799b0ec
    • Rafael J. Wysocki's avatar
      PM / sleep: Handle failures in device_suspend_late() consistently · 0c42f6c9
      Rafael J. Wysocki authored
      commit 3a17fb32 upstream.
      
      Grygorii Strashko reports:
      
       The PM runtime will be left disabled for the device if its
       .suspend_late() callback fails and async suspend is not allowed
       for this device. In this case device will not be added in
       dpm_late_early_list and dpm_resume_early() will ignore this
       device, as result PM runtime will be disabled for it forever
       (side effect: after 8 subsequent failures for the same device
       the PM runtime will be reenabled due to disable_depth overflow).
      
      To fix this problem, add devices to dpm_late_early_list regardless
      of whether or not device_suspend_late() returns errors for them.
      
      That will ensure failures in there to be handled consistently for
      all devices regardless of their async suspend/resume status.
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c42f6c9
    • Weston Andros Adamson's avatar
      nfs: avoid race that crashes nfs_init_commit · 3ae21caf
      Weston Andros Adamson authored
      commit ade8febd upstream.
      
      Since the patch "NFS: Allow multiple commit requests in flight per file"
      we can run multiple simultaneous commits on the same inode.  This
      introduced a race over collecting pages to commit that made it possible
      to call nfs_init_commit() with an empty list - which causes crashes like
      the one below.
      
      The fix is to catch this race and avoid calling nfs_init_commit and
      initiate_commit when there is no work to do.
      
      Here is the crash:
      
      [600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0
      [600522.078972] Oops: 0000 [#1] SMP
      [600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
      [600522.081380]  vmw_pvscsi ata_generic pata_acpi
      [600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
      [600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
      [600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000
      [600522.083378] RIP: 0010:[<ffffffffa0479e72>]  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.083973] RSP: 0018:ffff88042ae87438  EFLAGS: 00010246
      [600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588
      [600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40
      [600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0
      [600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0
      [600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840
      [600522.087484] FS:  00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [600522.088070] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0
      [600522.089327] Stack:
      [600522.089926]  0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
      [600522.090549]  0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
      [600522.091169]  ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
      [600522.091789] Call Trace:
      [600522.092420]  [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
      [600522.093052]  [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
      [600522.093696]  [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
      [600522.094359]  [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs]
      [600522.095032]  [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs]
      [600522.095719]  [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs]
      [600522.096410]  [<ffffffff811a8122>] try_to_release_page+0x32/0x50
      [600522.097109]  [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30
      [600522.097812]  [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550
      [600522.098530]  [<ffffffff811bea65>] shrink_lruvec+0x635/0x840
      [600522.099250]  [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0
      [600522.099974]  [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470
      [600522.100709]  [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170
      [600522.101464]  [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970
      [600522.102235]  [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230
      [600522.103000]  [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50
      [600522.103774]  [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490
      [600522.104558]  [<ffffffff810e3438>] ? __wake_up+0x48/0x60
      [600522.105357]  [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0
      [600522.106137]  [<ffffffff810a1bbb>] ? release_task+0x36b/0x470
      [600522.106902]  [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0
      [600522.107659]  [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900
      [600522.108431]  [<ffffffff81067ef1>] __do_page_fault+0x181/0x420
      [600522.109173]  [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
      [600522.109893]  [<ffffffff810681c0>] do_page_fault+0x30/0x80
      [600522.110594]  [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120
      [600522.111288]  [<ffffffff81790a58>] page_fault+0x28/0x30
      [600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
      [600522.113343] RIP  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
      [600522.114003]  RSP <ffff88042ae87438>
      [600522.114636] CR2: 0000000000000040
      
      Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file)
      Signed-off-by: default avatarWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ae21caf
    • Nicolai Stange's avatar
      ext4: silence UBSAN in ext4_mb_init() · fec0f961
      Nicolai Stange authored
      commit 935244cd upstream.
      
      Currently, in ext4_mb_init(), there's a loop like the following:
      
        do {
          ...
          offset += 1 << (sb->s_blocksize_bits - i);
          i++;
        } while (i <= sb->s_blocksize_bits + 1);
      
      Note that the updated offset is used in the loop's next iteration only.
      
      However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
      the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
      and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
        shift exponent 4294967295 is too large for 32-bit type 'int'
        [...]
        Call Trace:
         [<ffffffff818c4d25>] dump_stack+0xbc/0x117
         [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
         [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
         [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
         [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
         [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
         [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
         [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
         [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
         [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
         [...]
      
      Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of offset is never used again.
      
      Silence UBSAN by introducing another variable, offset_incr, holding the
      next increment to apply to offset and adjust that one by right shifting it
      by one position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fec0f961
    • Nicolai Stange's avatar
      ext4: address UBSAN warning in mb_find_order_for_block() · 560f7098
      Nicolai Stange authored
      commit b5cb316c upstream.
      
      Currently, in mb_find_order_for_block(), there's a loop like the following:
      
        while (order <= e4b->bd_blkbits + 1) {
          ...
          bb += 1 << (e4b->bd_blkbits - order);
        }
      
      Note that the updated bb is used in the loop's next iteration only.
      
      However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
      the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
        shift exponent -1 is negative
        [...]
        Call Trace:
         [<ffffffff818c4d35>] dump_stack+0xbc/0x117
         [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
         [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
         [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
         [...]
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of bb is never used again.
      
      Silence UBSAN by introducing another variable, bb_incr, holding the next
      increment to apply to bb and adjust that one by right shifting it by one
      position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      560f7098
    • Jan Kara's avatar
      ext4: fix oops on corrupted filesystem · 6ce15a9a
      Jan Kara authored
      commit 74177f55 upstream.
      
      When filesystem is corrupted in the right way, it can happen
      ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
      subsequently remove inode from the in-memory orphan list. However this
      deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
      leave i_orphan list_head with a stale content. Later we can look at this
      content causing list corruption, oops, or other issues. The reported
      trace looked like:
      
      WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100()
      list_del corruption, 0000000061c1d6e0->next is LIST_POISON1
      0000000000100100)
      CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250
      Stack:
       60462947 62219960 602ede24 62219960
       602ede24 603ca293 622198f0 602f02eb
       62219950 6002c12c 62219900 601b4d6b
      Call Trace:
       [<6005769c>] ? vprintk_emit+0x2dc/0x5c0
       [<602ede24>] ? printk+0x0/0x94
       [<600190bc>] show_stack+0xdc/0x1a0
       [<602ede24>] ? printk+0x0/0x94
       [<602ede24>] ? printk+0x0/0x94
       [<602f02eb>] dump_stack+0x2a/0x2c
       [<6002c12c>] warn_slowpath_common+0x9c/0xf0
       [<601b4d6b>] ? __list_del_entry+0x6b/0x100
       [<6002c254>] warn_slowpath_fmt+0x94/0xa0
       [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0
       [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0
       [<60023ebf>] ? set_signals+0x3f/0x50
       [<600a205a>] ? kmem_cache_free+0x10a/0x180
       [<602f4e88>] ? mutex_lock+0x18/0x30
       [<601b4d6b>] __list_del_entry+0x6b/0x100
       [<601177ec>] ext4_orphan_del+0x22c/0x2f0
       [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0
       [<6010b973>] ? ext4_truncate+0x383/0x390
       [<6010bc8b>] ext4_write_begin+0x30b/0x4b0
       [<6001bb50>] ? copy_from_user+0x0/0xb0
       [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0
       [<60072c4f>] generic_perform_write+0xaf/0x1e0
       [<600c4166>] ? file_update_time+0x46/0x110
       [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0
       [<6010030f>] ext4_file_write_iter+0x15f/0x470
       [<60094e10>] ? unlink_file_vma+0x0/0x70
       [<6009b180>] ? unlink_anon_vmas+0x0/0x260
       [<6008f169>] ? free_pgtables+0xb9/0x100
       [<600a6030>] __vfs_write+0xb0/0x130
       [<600a61d5>] vfs_write+0xa5/0x170
       [<600a63d6>] SyS_write+0x56/0xe0
       [<6029fcb0>] ? __libc_waitpid+0x0/0xa0
       [<6001b698>] handle_syscall+0x68/0x90
       [<6002633d>] userspace+0x4fd/0x600
       [<6002274f>] ? save_registers+0x1f/0x40
       [<60028bd7>] ? arch_prctl+0x177/0x1b0
       [<60017bd5>] fork_handler+0x85/0x90
      
      Fix the problem by using list_del_init() as we always should with
      i_orphan list.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ce15a9a
    • Seth Forshee's avatar
      ext4: fix check of dqget() return value in ext4_ioctl_setproject() · 2d337f81
      Seth Forshee authored
      commit ff0bc084 upstream.
      
      A failed call to dqget() returns an ERR_PTR() and not null. Fix
      the check in ext4_ioctl_setproject() to handle this correctly.
      
      Fixes: 9b7365fc ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support")
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d337f81
    • Theodore Ts'o's avatar
      ext4: clean up error handling when orphan list is corrupted · fdfcb7de
      Theodore Ts'o authored
      commit 7827a7f6 upstream.
      
      Instead of just printing warning messages, if the orphan list is
      corrupted, declare the file system is corrupted.  If there are any
      reserved inodes in the orphaned inode list, declare the file system
      corrupted and stop right away to avoid doing more potential damage to
      the file system.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdfcb7de
    • Theodore Ts'o's avatar
      ext4: fix hang when processing corrupted orphaned inode list · b0bd19d3
      Theodore Ts'o authored
      commit c9eb13a9 upstream.
      
      If the orphaned inode list contains inode #5, ext4_iget() returns a
      bad inode (since the bootloader inode should never be referenced
      directly).  Because of the bad inode, we end up processing the inode
      repeatedly and this hangs the machine.
      
      This can be reproduced via:
      
         mke2fs -t ext4 /tmp/foo.img 100
         debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
         mount -o loop /tmp/foo.img /mnt
      
      (But don't do this if you are using an unpatched kernel if you care
      about the system staying functional.  :-)
      
      This bug was found by the port of American Fuzzy Lop into the kernel
      to find file system problems[1].  (Since it *only* happens if inode #5
      shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
      surprising that AFL needed two hours before it found it.)
      
      [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
      
      Reported by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0bd19d3
    • Philipp Zabel's avatar
      drm/imx: Match imx-ipuv3-crtc components using device node in platform data · 8bf62654
      Philipp Zabel authored
      commit 310944d1 upstream.
      
      The component master driver imx-drm-core matches component devices using
      their of_node. Since commit 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc
      module autoloading"), the imx-ipuv3-crtc dev->of_node is not set during
      probing. Before that, of_node was set and caused an of: modalias to be
      used instead of the platform: modalias, which broke module autoloading.
      
      On the other hand, if dev->of_node is not set yet when the imx-ipuv3-crtc
      probe function calls component_add, component matching in imx-drm-core
      fails. While dev->of_node will be set once the next component tries to
      bring up the component master, imx-drm-core component binding will never
      succeed if one of the crtc devices is probed last.
      
      Add of_node to the component platform data and match against the
      pdata->of_node instead of dev->of_node in imx-drm-core to work around
      this problem.
      
      Fixes: 950b410dd1ab ("gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading")
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Tested-by: default avatarFabio Estevam <fabio.estevam@nxp.com>
      Tested-by: default avatarLothar Waßmann <LW@KARO-electronics.de>
      Tested-by: default avatarHeiko Schocher <hs@denx.de>
      Tested-by: default avatarChris Ruehl <chris.ruehl@gtsys.com.hk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bf62654