1. 16 Dec, 2016 30 commits
    • ovl: show redirect_dir mount option · c5bef3a7
      Amir Goldstein authored
      Show the value of redirect_dir in /proc/mounts.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: allow setting max size of redirect · 3ea22a71
      Miklos Szeredi authored
      Add a module option to allow tuning the max size of absolute redirects.
      Default is 256.
      
      The size of relative redirects is naturally limited by the underlying
      filesystem's maximum filename length (usually 255).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
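A minimal Python model of the two limits described above, not the overlayfs code. `redirect_fits`, `REDIRECT_MAX_DEFAULT` and `NAME_MAX` are hypothetical names, and the model assumes the limit is measured in bytes of the whole redirect string. It only reports whether a value fits; what the kernel does with an oversized redirect is not covered here.

```python
# Default for the module option described above.
REDIRECT_MAX_DEFAULT = 256
# Assumed NAME_MAX of the underlying filesystem ("usually 255").
NAME_MAX = 255


def redirect_fits(value: str, redirect_max: int = REDIRECT_MAX_DEFAULT) -> bool:
    """Model: would this redirect value be acceptable under the limits above?"""
    size = len(value.encode())
    if value.startswith("/"):
        # Absolute redirect: a whole path, bounded by the tunable maximum.
        return size <= redirect_max
    if "/" in value:
        # Relative redirects must be a single name.
        return False
    # Relative redirect: one name, so NAME_MAX is the natural bound.
    return size <= NAME_MAX
```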
    • ovl: allow redirect_dir to default to "on" · 688ea0e5
      Miklos Szeredi authored
      This patch introduces a kernel config option and a module param.  Both can
      be used independently to turn the default value of redirect_dir on or off.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: check for emptiness of redirect dir · d1595119
      Amir Goldstein authored
      Before the redirect_dir feature was introduced, the condition
      !ovl_lower_positive(dentry) for a directory implied that it is a pure
      upper directory, which may be removed if empty.

      Now that a directory can be redirected, it is possible that upper does not
      cover any lower (i.e. !ovl_lower_positive(dentry)), but the directory is a
      merge (with a redirected path) and may be non-empty.
      
      Check for this case in ovl_remove_upper().
      
      This change fixes the following test case from rename-pop-dir.py
      of unionmount-testsuite:
      
          """Remove dir and rename old name"""
          d = ctx.non_empty_dir()
          d2 = ctx.no_dir()
      
          ctx.rmdir(d, err=ENOTEMPTY)
          ctx.rename(d, d2)
          ctx.rmdir(d, err=ENOENT)
          ctx.rmdir(d2, err=ENOTEMPTY)
      
      ./run --ov rename-pop-dir
      /mnt/a/no_dir103: Expected error (Directory not empty) was not produced
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
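To illustrate the missed case, here is a Python model, not the kernel code; the `OvlDir` fields and `rmdir_errno` are hypothetical and whiteout handling is ignored. With the old condition, a redirected directory whose new name covers nothing below is treated as pure upper, so an empty upper directory passes the check even though the merged view is not empty.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class OvlDir:
    upper_entries: set = field(default_factory=set)
    lower_entries: set = field(default_factory=set)  # reached via name or redirect
    lower_positive: bool = False  # does the current name cover anything below?
    redirected: bool = False      # has a "trusted.overlay.redirect" xattr


def rmdir_errno(d: OvlDir, check_redirect: bool = True) -> int:
    """Return 0 if rmdir may proceed, else a negative errno (whiteouts ignored)."""
    pure_upper = not d.lower_positive
    if check_redirect and d.redirected:
        # A redirected directory is a merge even if its name covers nothing.
        pure_upper = False
    entries = d.upper_entries if pure_upper else d.upper_entries | d.lower_entries
    return -errno.ENOTEMPTY if entries else 0
```

The `check_redirect=False` path reproduces the behaviour that made the rename-pop-dir test fail.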
    • ovl: redirect on rename-dir · a6c60655
      Miklos Szeredi authored
      Current code returns EXDEV when a directory would need to be copied up to
      move.  We could copy up the directory tree in this case, but there's
      another, simpler solution: point to old lower directory from moved upper
      directory.
      
      This is achieved with a "trusted.overlay.redirect" xattr storing the path
      relative to the root of the overlay.  After such an attribute has been set,
      the directory can be moved without further actions required.

      This is a backward-incompatible feature: old kernels won't be able to
      correctly mount an overlay containing redirected directories.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
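How the redirect value might be chosen on rename can be modelled in a few lines of Python. This is an assumption-laden sketch, not the patch: it assumes a rename within the same parent stores a relative redirect (the old name) and a rename to another parent stores an absolute path built up to the root, stopping early at any absolute redirect already present on an ancestor. `compute_redirect` and its arguments are hypothetical.

```python
def compute_redirect(parts, redirects, samedir):
    """
    parts: overlay path components of the directory *before* the move,
           e.g. ["a", "b"] for /a/b.
    redirects: maps a tuple of leading components to that directory's
               existing redirect, e.g. {("a",): "/x"}.
    samedir: True if the rename stays within the same parent directory.
    """
    if samedir:
        # Relative redirect: just the old name.
        return parts[-1]
    pieces = []
    for i in range(len(parts), 0, -1):
        # Use an existing redirect for this component if there is one.
        name = redirects.get(tuple(parts[:i]), parts[i - 1])
        pieces.append(name)
        if name.startswith("/"):
            break  # an absolute redirect on an ancestor anchors the path
    else:
        pieces.append("")  # reached the root: produce a leading slash
    return "/".join(reversed(pieces))
```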
    • ovl: lookup redirects · 02b69b28
      Miklos Szeredi authored
      If a directory has the "trusted.overlay.redirect" xattr, it means that the
      value of the xattr should be used to find the underlying directory on the
      next lower layer.
      
      The redirect may be relative or absolute.  Absolute redirects begin with a
      slash.
      
      A relative redirect means: instead of the current dentry's name use the
      value of the redirect to find the directory in the next lower
      layer. Relative redirects must not contain a slash.
      
      An absolute redirect means: look up the directory relative to the root of
      the overlay using the value of the redirect in the next lower layer.
      
      Redirects work on lower layers as well.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
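The lookup rules above can be modelled over layers represented as plain dicts. A simplified Python sketch, assuming each layer maps absolute directory paths to their redirect xattr (or None); opaque directories, whiteouts and non-directories are ignored, and `lookup_stack` is a hypothetical helper.

```python
def lookup_stack(layers, path):
    """
    layers: list of dicts, layers[0] being upper; each maps an absolute
            directory path in that layer to its redirect xattr or None.
    path: absolute overlay path being looked up, e.g. "/d/new".
    Returns the (layer index, path) pairs that make up the merged directory.
    """
    found = []
    for idx, layer in enumerate(layers):
        if path not in layer:
            continue  # absent here: keep the same path for the next layer
        found.append((idx, path))
        redirect = layer[path]
        if redirect is None:
            continue
        if redirect.startswith("/"):
            # Absolute: a path from the overlay root.
            path = redirect
        elif "/" in redirect:
            raise ValueError("relative redirect must not contain a slash")
        else:
            # Relative: replace only the last component.
            parent = path.rsplit("/", 1)[0]
            path = parent + "/" + redirect
    return found
```

Because the redirect found in one layer only changes the path used for the next one, a chain of redirects across several lower layers falls out naturally.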
    • ovl: consolidate lookup for underlying layers · e28edc46
      Miklos Szeredi authored
      Use a common helper for lookup of upper and lower layers.  This paves the
      way for looking up directory redirects.
      
      No functional change.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: fix nested overlayfs mount · 48fab5d7
      Amir Goldstein authored
      When the upper overlayfs checks "trusted.overlay.*" xattr on the underlying
      overlayfs mount, it gets -EPERM, which confuses the upper overlayfs.
      
      Fix this by returning -EOPNOTSUPP instead of -EPERM from
      ovl_own_xattr_get() and ovl_own_xattr_set().  This behavior is consistent
      with the behavior of ovl_listxattr(), which filters out the private
      overlayfs xattrs.
      
      Note: nested overlays are deprecated.  But this change makes sense
      regardless: these xattrs are private to the overlay and should always be
      hidden.  Hence getting and setting them should indicate this.
      
      [SzMi: Use EOPNOTSUPP instead of ENODATA and use it for both getting and
      setting "trusted.overlay." xattrs.  This is a perfectly valid error code
      for "we don't support this prefix", which is the case here.]
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
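A small Python model of the behaviour described: any name under the private "trusted.overlay." prefix fails with EOPNOTSUPP on get and is omitted from listings, as ovl_listxattr() does. The functions and the dict standing in for the real inode's xattrs are hypothetical; per the commit, setting such a name would return the same error.

```python
import errno

OVL_XATTR_PREFIX = "trusted.overlay."


def ovl_getxattr(name, real_xattrs):
    """Return the value, or a negative errno, for an xattr read through the overlay."""
    if name.startswith(OVL_XATTR_PREFIX):
        # Private to overlayfs: report "prefix not supported", not EPERM.
        return -errno.EOPNOTSUPP
    return real_xattrs.get(name, -errno.ENODATA)


def ovl_listxattr(real_xattrs):
    """Private overlay xattrs are filtered out of the listing."""
    return [n for n in real_xattrs if not n.startswith(OVL_XATTR_PREFIX)]
```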
    • ovl: check namelen · 6b2d5fe4
      Miklos Szeredi authored
      We already calculate f_namelen in statfs as the maximum of the name lengths
      provided by the filesystems taking part in the overlay.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
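The commit text only says that f_namelen is already computed; the sketch below assumes that "check namelen" means lookups of names longer than that value fail with ENAMETOOLONG. Both helpers are hypothetical Python, not kernel functions.

```python
import errno


def overlay_namelen(layer_namelens):
    """statfs f_namelen of the overlay: the maximum over all layers."""
    return max(layer_namelens)


def check_namelen(name: bytes, namelen: int) -> int:
    """Reject names longer than the overlay's namelen (assumed behaviour)."""
    return -errno.ENAMETOOLONG if len(name) > namelen else 0
```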
    • ovl: split super.c · bbb1e54d
      Miklos Szeredi authored
      fs/overlayfs/super.c is the biggest of the overlayfs source files and it
      contains various utility functions as well as the rather complicated lookup
      code.  Split these parts out to separate files.
      
      Before:
      
       1446 fs/overlayfs/super.c
      
      After:
      
        919 fs/overlayfs/super.c
        267 fs/overlayfs/namei.c
        235 fs/overlayfs/util.c
         51 fs/overlayfs/ovl_entry.h
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: use d_is_dir() · 2b8c30e9
      Miklos Szeredi authored
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: simplify lookup · 8ee6059c
      Miklos Szeredi authored
      If a non-directory is encountered, stop looking at lower layers.
      
      In this case the oe->opaque flag is not set anymore, which doesn't matter
      since existence of lower file is now checked at remove/rename time.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: check lower existence of rename target · 3ee23ff1
      Miklos Szeredi authored
      Check if something exists on the lower layer(s) under the rename target to
      decide if the directory needs to be marked "opaque".

      Marking opaque was done before the rename, and on failure the marking was
      undone.  Also the opaque xattr was removed if the target didn't cover
      anything.

      This patch changes behavior so that removal of "opaque" is not done in
      either of the above cases.  This means that a directory may have the opaque
      flag even if it doesn't cover anything.  However, this shouldn't affect the
      performance or semantics of the overlay, while simplifying the code.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: rename: simplify handling of lower/merged directory · 370e55ac
      Miklos Szeredi authored
      d_is_dir() is safe to call on a negative dentry.  Use this fact to simplify
      handling of the lower or merged directories.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: get rid of PURE type · 38e813db
      Miklos Szeredi authored
      The remaining uses of __OVL_PATH_PURE can be replaced by
      ovl_dentry_is_opaque().
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: check lower existence when removing · 2aff4534
      Miklos Szeredi authored
      Currently ovl_lookup() checks existence of lower file even if there's a
      non-directory on upper (which is always opaque).  This is done so that
      remove can decide whether a whiteout is needed or not.
      
      It would be better to defer this check to unlink, since most of the time
      the gathered information about opaqueness will be unused.
      
      This adds a helper ovl_lower_positive() that checks if there's anything on
      the lower layer(s).
      
      The following patches also introduce changes to how the "opaque" attribute
      is updated on directories: this attribute is added when the directory is
      created or moved over a whiteout or an object covering something on the
      lower layer.  However, following changes will allow the attribute to remain
      on the directory after being moved, even if the new location doesn't cover
      anything.  Because of this, we need to check lower layers even for opaque
      directories, so that a whiteout is only created when necessary.

      This function will later also be used to decide about marking a directory
      opaque, so it deals with negative dentries as well.  When dealing with a
      negative dentry, it's enough to check whether it is a whiteout.

      If the dentry is positive but not upper, then it obviously also needs a
      whiteout/opaque.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
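The decision described above for ovl_lower_positive(), reduced to a Python function over booleans. A simplified sketch: the parameters are hypothetical stand-ins for dentry state, each entry of `lower_lookups` is the result of looking the name up in one lower layer, top to bottom (None if absent), and opaque-directory handling is omitted.

```python
from typing import List, Optional


def lower_positive(overlay_positive: bool, is_whiteout: bool, has_upper: bool,
                   lower_lookups: List[Optional[str]]) -> bool:
    """Does anything on the lower layer(s) lie under this name?"""
    if not overlay_positive:
        # Negative dentry: something lies below only if upper is a whiteout.
        return is_whiteout
    if not has_upper:
        # Positive but not upper: the object itself comes from a lower layer.
        return True
    for entry in lower_lookups:
        if entry is None:
            continue  # absent in this layer, try the next one down
        # The first layer that has the name decides.
        return entry != "whiteout"
    return False
```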
    • ovl: add ovl_dentry_is_whiteout() · c412ce49
      Miklos Szeredi authored
      And use it instead of ovl_dentry_is_opaque() where appropriate.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: don't check sticky · 99f5d08e
      Miklos Szeredi authored
      Since commit 07a2daab ("ovl: Copy up underlying inode's ->i_mode to
      overlay inode") sticky checking on overlay inode is performed by the vfs,
      so checking against sticky on underlying inode is not needed.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: don't check rename to self · 804032fa
      Miklos Szeredi authored
      This is redundant: the vfs has already performed this check (and it was
      broken, see commit 9409e22a ("vfs: rename: check backing inode being equal")).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: treat special files like a regular fs · ca4c8a3a
      Miklos Szeredi authored
      No sense in opening special files on the underlying layers, they work just
      as well if opened on the overlay.
      
      Side effect is that it's no longer possible to connect one side of a pipe
      opened on overlayfs with the other side opened on the underlying layer.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • 6c02cb59
    • ovl: use vfs_clone_file_range() for copy up if possible · 2ea98466
      Amir Goldstein authored
      When copying up within the same fs, try to use vfs_clone_file_range().
      This is very efficient when lower and upper are on the same fs
      with file reflink support. If vfs_clone_file_range() fails for any
      reason, copy up falls back to the regular data copy code.
      
      Tested correct behavior when lower and upper are on:
      1. same ext4 (copy)
      2. same xfs + reflink patches + mkfs.xfs (copy)
      3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink)
      4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy)
      
      For comparison, on my laptop, xfstest overlay/001 (copy up of large
      sparse files) takes less than 1 second in the xfs reflink setup vs.
      25 seconds on the rest of the setups.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
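The clone-then-copy fallback can be sketched from userspace in Python. This is an illustration only, not the overlayfs copy-up path: it issues the Linux FICLONE ioctl (which reaches vfs_clone_file_range(); its request number is assumed here to be `_IOW(0x94, 9, int)`, i.e. 0x40049409) and falls back to an ordinary byte copy on any OSError. `copy_up_data` and `demo` are hypothetical helpers.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE is _IOW(0x94, 9, int) on Linux. Defined by hand because older
# Python versions may not export it from the fcntl module.
FICLONE = 0x40049409


def copy_up_data(src_path: str, dst_path: str) -> str:
    """Clone src into dst if the filesystem allows it, else copy the bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "clone"
        except OSError:
            # Any failure (no reflink support, different fs, ...) -> plain copy.
            shutil.copyfileobj(src, dst)
            return "copy"


def demo() -> tuple:
    """Copy a small file inside a temp dir; report the method and whether data matches."""
    data = b"overlay copy up " * 4096
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "lower"), os.path.join(d, "upper")
        with open(src, "wb") as f:
            f.write(data)
        how = copy_up_data(src, dst)
        with open(dst, "rb") as f:
            return how, f.read() == data
```

On ext4 or tmpfs `demo()` should report "copy"; on a reflink-capable XFS or Btrfs it may report "clone". Either way the destination holds the same bytes.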
    • Revert "ovl: get_write_access() in truncate" · 31c3a706
      Miklos Szeredi authored
      This reverts commit 03bea604.
      
      Commit 4d0c5ba2 ("vfs: do get_write_access() on upper layer of
      overlayfs") makes the writecount checks inside overlayfs superfluous: the
      file is already copied up and write access acquired on the upper inode when
      ovl_setattr is called with ATTR_SIZE.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: update doc · 2d8f2908
      Miklos Szeredi authored
      The quirk for file locks and leases no longer applies.
      
      Add missing info about renaming directory residing on lower layer.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • vfs: fix vfs_clone_file_range() for overlayfs files · b335e9d9
      Amir Goldstein authored
      With overlayfs, it is wrong to compare file_inode(inode)->i_sb
      of regular files with those of non-regular files, because the
      former reference the real (upper/lower) sb and the latter reference
      the overlayfs sb.
      
      Move the test for same super block after the sanity tests for
      clone range of directory and non-regular file.
      
      This change fixes xfstest generic/157, which returned EXDEV instead
      of EISDIR/EINVAL in the following test cases over overlayfs:
      
        echo "Try to reflink a dir"
        _reflink_range $testdir1/dir1 0 $testdir1/file2 0 $blksz
      
        echo "Try to reflink a device"
        _reflink_range $testdir1/dev1 0 $testdir1/file2 0 $blksz
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
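The reordering can be expressed as a Python model of the checks. `FileModel` and `clone_checks` are hypothetical; the point is only that the type checks run before the super block comparison, so an overlayfs directory or device (which reports the overlay sb) gets EISDIR or EINVAL instead of EXDEV.

```python
import errno
from dataclasses import dataclass


@dataclass
class FileModel:
    kind: str  # "reg", "dir", "chr", ...
    sb: str    # super block reported by file_inode()->i_sb


def clone_checks(src: FileModel, dst: FileModel) -> int:
    # Sanity checks on file type come first...
    if src.kind == "dir" or dst.kind == "dir":
        return -errno.EISDIR
    if src.kind != "reg" or dst.kind != "reg":
        return -errno.EINVAL
    # ...and only then the super block comparison. On overlayfs, regular
    # files report the real upper/lower sb while other types report the
    # overlay sb, so comparing earlier would yield a misleading EXDEV.
    if src.sb != dst.sb:
        return -errno.EXDEV
    return 0
```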
    • vfs: call vfs_clone_file_range() under freeze protection · 031a072a
      Amir Goldstein authored
      Move sb_start_write()/sb_end_write() out of the vfs helper and up into the
      ioctl handler.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • vfs: allow vfs_clone_file_range() across mount points · 913b86e9
      Amir Goldstein authored
      FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest
      files are not on the same mount point.
      In practice, clone only requires that src and dest files
      are on the same file system.
      
      Move the check for same mount point to ioctl handler and keep
      only the check for same super block in the vfs helper.
      
      A following patch is going to use the vfs_clone_file_range()
      helper in overlayfs to copy up between lower and upper
      mount points on the same file system.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
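The split of responsibilities can be sketched as two Python checks. The function names and the string identifiers for mounts and super blocks are hypothetical; the model shows that the ioctl path still refuses cross-mount clones while the helper, as an in-kernel caller such as overlayfs copy up would use it, only compares super blocks.

```python
import errno


def clone_helper_check(src_sb: str, dst_sb: str) -> int:
    """vfs helper: only the same filesystem (super block) is required."""
    return 0 if src_sb == dst_sb else -errno.EXDEV


def ficlone_ioctl_check(src_mnt: str, dst_mnt: str, src_sb: str, dst_sb: str) -> int:
    """ioctl handler: the same-mount restriction now lives here."""
    if src_mnt != dst_mnt:
        return -errno.EXDEV
    return clone_helper_check(src_sb, dst_sb)
```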
    • vfs: no mnt_want_write_file() in vfs_{copy,clone}_file_range() · 3616119d
      Miklos Szeredi authored
      We've checked for file_out being opened for write.  This ensures that we
      already have mnt_want_write() on target.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • Revert "vfs: rename: check backing inode being equal" · 8d3e2936
      Miklos Szeredi authored
      This reverts commit 9409e22a.
      
      Since commit 51f7e52d ("ovl: share inode for hard link") there's no
      need to call d_real_inode() to check two overlay inodes for equality.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • Revert "af_unix: fix hard linked sockets on overlay" · beef5121
      Miklos Szeredi authored
      This reverts commit eb0a4a47.
      
      Since commit 51f7e52d ("ovl: share inode for hard link") there's no
      need to call d_real_inode() to check two overlay inodes for equality.
      
      Side effect of this revert is that it's no longer possible to connect one
      socket on overlayfs to one on the underlying layer (something which didn't
      make sense anyway).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 04 Dec, 2016 2 commits
    • Linux 4.9-rc8 · 3e5de27e
      Linus Torvalds authored
    • Merge tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux · 0cb65c83
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A pretty small pull request: a couple of AMD powerxpress regression
        fixes and a power management fix, a couple of i915 fixes and one hdlcd
        fix, along with one core don't oops because of incorrect API usage fix"
      
      * tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: drop the struct_mutex when wedged or trying to reset
        drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
        drm: Don't call drm_for_each_crtc with a non-KMS driver
        drm/radeon: fix check for port PM availability
        drm/amdgpu: fix check for port PM availability
        drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr
        drm: hdlcd: Fix cleanup order
  3. 03 Dec, 2016 4 commits
    • Merge tag 'drm-intel-fixes-2016-12-01' of... · ab7cd8d8
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
      
      2 intel fixes.
      
      * tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel:
        drm/i915: drop the struct_mutex when wedged or trying to reset
        drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
    • Merge branch 'akpm' (patches from Andrew) · 3c49de52
      Linus Torvalds authored
      Merge more fixes from Andrew Morton:
       "2 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, vmscan: add cond_resched() into shrink_node_memcg()
        mm: workingset: fix NULL ptr in count_shadow_nodes
    • mm, vmscan: add cond_resched() into shrink_node_memcg() · bd041733
      Michal Hocko authored
      Boris Zhmurov has reported RCU stalls during the kswapd reclaim:
      
        INFO: rcu_sched detected stalls on CPUs/tasks:
         23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
         (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
        Task dump for CPU 23:
        kswapd1         R  running task        0   148      2 0x00000008
        Call Trace:
          shrink_node+0xd2/0x2f0
          kswapd+0x2cb/0x6a0
          mem_cgroup_shrink_node+0x160/0x160
          kthread+0xbd/0xe0
          __switch_to+0x1fa/0x5c0
          ret_from_fork+0x1f/0x40
          kthread_create_on_node+0x180/0x180
      
      A closer code inspection has shown that we might indeed miss all the
      scheduling points in the reclaim path if no pages can be isolated from
      the LRU list.  This is a pathological case but other reports from Donald
      Buczek have shown that we might indeed hit such a path:
      
              clusterd-989   [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
               kswapd1-86    [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
               kswapd1-86    [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
        [...]
               kswapd1-86    [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1
      
      This is a minute-long snapshot which didn't take a single page from the
      LRU.  It is not entirely clear why only 1303 pages have been scanned
      during that time (maybe there was heavy IRQ activity interfering).

      In any case it looks like we can really hit long periods without
      scheduling on non-preemptive kernels, so an explicit cond_resched() in
      shrink_node_memcg(), which is independent of the reclaim operation, is due.
      
      Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
      Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
      Reported-by: Donald Buczek <buczek@molgen.mpg.de>
      Reported-by: "Christopher S. Aker" <caker@theshore.net>
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: workingset: fix NULL ptr in count_shadow_nodes · 20ab67a5
      Michal Hocko authored
      Commit 0a6b76dd ("mm: workingset: make shadow node shrinker memcg
      aware") has made the workingset shadow nodes shrinker memcg aware.  The
      implementation is not correct, though, because memcg_kmem_enabled() might
      become true while we are doing a global reclaim, when sc->memcg might be
      NULL, which is exactly what Marek has seen:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
        IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
        PGD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 60 Comm: kswapd0 Tainted: G           O   4.8.10-12.pvops.qubes.x86_64 #1
        task: ffff880011863b00 task.stack: ffff880011868000
        RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40
        RSP: e02b:ffff88001186bc70  EFLAGS: 00010293
        RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002
        RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000
        R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6
        R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000
        CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660
        Call Trace:
          count_shadow_nodes+0x9a/0xa0
          shrink_slab.part.42+0x119/0x3e0
          shrink_node+0x22c/0x320
          kswapd+0x32c/0x700
          kthread+0xd8/0xf0
          ret_from_fork+0x1f/0x40
        Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1
        RIP  mem_cgroup_node_nr_lru_pages+0x20/0x40
         RSP <ffff88001186bc70>
        CR2: 0000000000000400
        ---[ end trace 100494b9edbdfc4d ]---
      
      This patch fixes the issue by checking sc->memcg rather than
      memcg_kmem_enabled() which is sufficient because shrink_slab makes sure
      that only memcg aware shrinkers will get non-NULL memcgs and only if
      memcg_kmem_enabled is true.
      
      Fixes: 0a6b76dd ("mm: workingset: make shadow node shrinker memcg aware")
      Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
      Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
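A Python model contrasting the two conditions. `ShrinkControl`, `count_buggy` and `count_fixed` are hypothetical stand-ins (a dict plays the memcg, lists play per-node LRU counts); they show why testing sc->memcg directly avoids dereferencing NULL during global reclaim.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShrinkControl:
    memcg: Optional[dict]  # None during global (non-memcg) reclaim
    nid: int = 0


def count_buggy(sc: ShrinkControl, node_lru_pages: List[int], kmem_enabled: bool) -> int:
    # Keyed on a global flag that can flip to True while sc.memcg is still None.
    if kmem_enabled:
        return sc.memcg["lru_pages"][sc.nid]  # fails when sc.memcg is None
    return node_lru_pages[sc.nid]


def count_fixed(sc: ShrinkControl, node_lru_pages: List[int]) -> int:
    # Keyed on the memcg actually passed in by the caller.
    if sc.memcg is not None:
        return sc.memcg["lru_pages"][sc.nid]
    return node_lru_pages[sc.nid]
```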
  4. 02 Dec, 2016 4 commits
    • kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabled · 86556392
      Nicolas Pitre authored
      When building a specific target such as bzImage, modules aren't normally
      built.  However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules
      means none of the exported symbols are used and therefore they will all
      be trimmed away from the final kernel.  A subsequent "make modules" will
      fail because modpost cannot find the needed symbols for those modules in
      the kernel binary.
      
      Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS
      is enabled and that the kernel binary is properly rebuilt accordingly.
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Tested-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8dc0f265
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "This should be the last set of bugfixes for arm-soc in v4.9. None of
        these are critical regressions, but it would be nice to still get them
        merged.
      
         - On the Juno platform, the idle latency was described wrong, leading
           to suboptimal cpuidle tuning.
      
         - Also on the same platform, PCI I/O space was set up incorrectly and
           could not work.
      
         - On the sti platform, a syntactically incorrect DT entry caused
           warnings.
      
         - The newly added 'gr8' platform has somewhat confusing file names,
           which we rename for consistency"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
        arm64: dts: juno: Correct PCI IO window
        ARM: dts: STiH407-family: fix i2c nodes
        ARM: gr8: Rename the DTSI and relevant DTS
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8bca927f
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Lots more phydev and probe error path leaks in various drivers by
          Johan Hovold.
      
       2) Fix race in packet_set_ring(), from Philip Pettersson.
      
       3) Use after free in dccp_invalid_packet(), from Eric Dumazet.
      
       4) Signedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
          Dumazet.
      
       5) When tunneling between ipv4 and ipv6 we can be left with the wrong
          skb->protocol value as we enter the IPSEC engine and this causes all
          kinds of problems. Set it before the output path does any
          dst_output() calls, from Eli Cooper.
      
       6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
          Florian Fainelli.
      
       7) Various netfilter nat bug fixes from Florian Westphal.
      
       8) Fix memory leak in ipvlan_link_new(), from Gao Feng.
      
       9) Locking fixes, particularly wrt. socket lookups, in l2tp from
          Guillaume Nault.
      
      10) Avoid invoking rhash teardowns in atomic context by running netlink
          cb->done() dump completion from a worker thread. Fix from Herbert
          Xu.
      
      11) Buffer refcount problems in tun and macvtap on errors, from Jason
          Wang.
      
      12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
          selects BBR. Fix from Julian Wollrath.
      
      13) Fix deadlock in transmit path on altera TSE driver, from Lino
          Sanfilippo.
      
      14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
          Yushchenko.
      
      15) tc_tunnel_key needs to be properly exported to userspace via uapi,
          fix from Roi Dayan.
      
      16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
          from Sowmini Varadhan.
      
      17) Stale packet header pointer access after pskb_expand_head() in
          geneve driver, fix from Sabrina Dubroca.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
        net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
        geneve: avoid use-after-free of skb->data
        tipc: check minimum bearer MTU
        net: renesas: ravb: unintialized return value
        sh_eth: remove unchecked interrupts for RZ/A1
        net: bcmgenet: Utilize correct struct device for all DMA operations
        NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
        cdc_ether: Fix handling connection notification
        ip6_offload: check segs for NULL in ipv6_gso_segment.
        RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
        Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
        ipv6: Set skb->protocol properly for local output
        ipv4: Set skb->protocol properly for local output
        packet: fix race condition in packet_set_ring
        net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
        net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
        net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
        net: ethernet: stmmac: platform: fix outdated function header
        net: ethernet: stmmac: dwmac-meson8b: fix probe error path
        net: ethernet: stmmac: dwmac-generic: fix probe error path
        ...
    • net: avoid signed overflows for SO_{SND|RCV}BUFFORCE · b98b0bc8
      Eric Dumazet authored
      CAP_NET_ADMIN users should not be allowed to set negative
      sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
      corruptions, crashes, OOM...
      
      Note that before commit 82981930 ("net: cleanups in
      sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
      and SO_RCVBUF were vulnerable.
      
      This needs to be backported to all known Linux kernels.
      
      Again, many thanks to syzkaller team for discovering this gem.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
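The commit text does not show the diff, so the Python sketch below only models the failure class: it compares a doubled 32-bit value against a minimum once as unsigned and once as signed. It assumes the multiply wraps like two's complement (signed overflow is undefined in C), and `SOCK_MIN_SNDBUF = 4608` is an illustrative stand-in for the kernel constant; the function names are hypothetical.

```python
# Illustrative floor; the kernel's SOCK_MIN_SNDBUF depends on sizeof(struct sk_buff).
SOCK_MIN_SNDBUF = 4608


def to_s32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed int (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def buggy_sndbuf(val: int) -> int:
    doubled = to_s32(val * 2)  # 32-bit int multiply, assumed to wrap
    # Comparing as unsigned: a negative doubled value looks huge and "wins".
    chosen = max(doubled & 0xFFFFFFFF, SOCK_MIN_SNDBUF)
    return to_s32(chosen)  # stored back into a signed int field


def fixed_sndbuf(val: int) -> int:
    doubled = to_s32(val * 2)
    # Comparing as signed: a negative value loses to the floor.
    return max(doubled, SOCK_MIN_SNDBUF)
```

With `val = 0x7FFFFFFF` the doubled value wraps to -2; the unsigned comparison keeps it and a negative buffer size results, while the signed comparison clamps to the minimum.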