1. 29 Jul, 2016 40 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · a867d734
      Linus Torvalds authored
      Pull userns vfs updates from Eric Biederman:
       "This tree contains some very long awaited work on generalizing the
        user namespace support for mounting filesystems to include filesystems
        with a backing store.  The real world target is fuse but the goal is
        to update the vfs to allow any filesystem to be supported.  This
        patchset is based on a lot of code review and testing to approach that
        goal.
      
        While looking at what is needed to support the fuse filesystem it
        became clear that there were things like xattrs for security modules
        that needed special treatment.  That the resolution of those concerns
        would not be fuse specific.  That sorting out these general issues
        made most sense at the generic level, where the right people could be
        drawn into the conversation, and the issues could be solved for
        everyone.
      
        At a high level what this patchset does a couple of simple things:
      
         - Add a user namespace owner (s_user_ns) to struct super_block.
      
         - Teach the vfs to handle filesystem uids and gids not mapping into
           to kuids and kgids and being reported as INVALID_UID and
           INVALID_GID in vfs data structures.
      
        By assigning a user namespace owner filesystems that are mounted with
        only user namespace privilege can be detected.  This allows security
        modules and the like to know which mounts may not be trusted.  This
        also allows the set of uids and gids that are communicated to the
        filesystem to be capped at the set of kuids and kgids that are in the
        owning user namespace of the filesystem.
      
        One of the crazier corner casees this handles is the case of inodes
        whose i_uid or i_gid are not mapped into the vfs.  Most of the code
        simply doesn't care but it is easy to confuse the inode writeback path
        so no operation that could cause an inode write-back is permitted for
        such inodes (aka only reads are allowed).
      
        This set of changes starts out by cleaning up the code paths involved
        in user namespace permirted mounts.  Then when things are clean enough
        adds code that cleanly sets s_user_ns.  Then additional restrictions
        are added that are possible now that the filesystem superblock
        contains owner information.
      
        These changes should not affect anyone in practice, but there are some
        parts of these restrictions that are changes in behavior.
      
         - Andy's restriction on suid executables that does not honor the
           suid bit when the path is from another mount namespace (think
           /proc/[pid]/fd/) or when the filesystem was mounted by a less
           privileged user.
      
         - The replacement of the user namespace implicit setting of MNT_NODEV
           with implicitly setting SB_I_NODEV on the filesystem superblock
           instead.
      
           Using SB_I_NODEV is a stronger form that happens to make this state
           user invisible.  The user visibility can be managed but it caused
           problems when it was introduced from applications reasonably
           expecting mount flags to be what they were set to.
      
        There is a little bit of work remaining before it is safe to support
        mounting filesystems with backing store in user namespaces, beyond
        what is in this set of changes.
      
         - Verifying the mounter has permission to read/write the block device
           during mount.
      
         - Teaching the integrity modules IMA and EVM to handle filesystems
           mounted with only user namespace root and to reduce trust in their
           security xattrs accordingly.
      
         - Capturing the mounters credentials and using that for permission
           checks in d_automount and the like.  (Given that overlayfs already
           does this, and we need the work in d_automount it make sense to
           generalize this case).
      
        Furthermore there are a few changes that are on the wishlist:
      
         - Get all filesystems supporting posix acls using the generic posix
           acls so that posix_acl_fix_xattr_from_user and
           posix_acl_fix_xattr_to_user may be removed.  [Maintainability]
      
         - Reducing the permission checks in places such as remount to allow
           the superblock owner to perform them.
      
         - Allowing the superblock owner to chown files with unmapped uids and
           gids to something that is mapped so the files may be treated
           normally.
      
        I am not considering even obvious relaxations of permission checks
        until it is clear there are no more corner cases that need to be
        locked down and handled generically.
      
        Many thanks to Seth Forshee who kept this code alive, and putting up
        with me rewriting substantial portions of what he did to handle more
        corner cases, and for his diligent testing and reviewing of my
        changes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
        fs: Call d_automount with the filesystems creds
        fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
        evm: Translate user/group ids relative to s_user_ns when computing HMAC
        dquot: For now explicitly don't support filesystems outside of init_user_ns
        quota: Handle quota data stored in s_user_ns in quota_setxquota
        quota: Ensure qids map to the filesystem
        vfs: Don't create inodes with a uid or gid unknown to the vfs
        vfs: Don't modify inodes with a uid or gid unknown to the vfs
        cred: Reject inodes with invalid ids in set_create_file_as()
        fs: Check for invalid i_uid in may_follow_link()
        vfs: Verify acls are valid within superblock's s_user_ns.
        userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
        fs: Refuse uid/gid changes which don't map into s_user_ns
        selinux: Add support for unprivileged mounts from user namespaces
        Smack: Handle labels consistently in untrusted mounts
        Smack: Add support for unprivileged mounts from user namespaces
        fs: Treat foreign mounts as nosuid
        fs: Limit file caps to the user namespace of the super block
        userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
        userns: Remove implicit MNT_NODEV fragility.
        ...
      a867d734
    • Linus Torvalds's avatar
      Merge tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 601f887d
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a nasty (and really hard to debug) memory corruption during resume
        from hibernation on x86-64 (that leads to a kernel panic most of the
        time) due to the use of a stale stack pointer value in FRAME_BEGIN
        (Josh Poimboeuf)"
      
      * tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        x86/power/64: Fix hibernation return address corruption
      601f887d
    • Linus Torvalds's avatar
      Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 574c7e23
      Linus Torvalds authored
      Pull more cgroup updates from Tejun Heo:
       "I forgot to include the patches which got applied to for-4.7-fixes
        late during last cycle.
      
        Eric's three patches fix bugs introduced with the namespace support"
      
      * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
        cgroupns: Close race between cgroup_post_fork and copy_cgroup_ns
        cgroupns: Fix the locking in copy_cgroup_ns
      574c7e23
    • Linus Torvalds's avatar
      Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a6408f6c
      Linus Torvalds authored
      Pull smp hotplug updates from Thomas Gleixner:
       "This is the next part of the hotplug rework.
      
         - Convert all notifiers with a priority assigned
      
         - Convert all CPU_STARTING/DYING notifiers
      
           The final removal of the STARTING/DYING infrastructure will happen
           when the merge window closes.
      
        Another 700 hundred line of unpenetrable maze gone :)"
      
      * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
        timers/core: Correct callback order during CPU hot plug
        leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
        powerpc/numa: Convert to hotplug state machine
        arm/perf: Fix hotplug state machine conversion
        irqchip/armada: Avoid unused function warnings
        ARC/time: Convert to hotplug state machine
        clocksource/atlas7: Convert to hotplug state machine
        clocksource/armada-370-xp: Convert to hotplug state machine
        clocksource/exynos_mct: Convert to hotplug state machine
        clocksource/arm_global_timer: Convert to hotplug state machine
        rcu: Convert rcutree to hotplug state machine
        KVM/arm/arm64/vgic-new: Convert to hotplug state machine
        smp/cfd: Convert core to hotplug state machine
        x86/x2apic: Convert to CPU hotplug state machine
        profile: Convert to hotplug state machine
        timers/core: Convert to hotplug state machine
        hrtimer: Convert to hotplug state machine
        x86/tboot: Convert to hotplug state machine
        arm64/armv8 deprecated: Convert to hotplug state machine
        hwtracing/coresight-etm4x: Convert to hotplug state machine
        ...
      a6408f6c
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide · 1a81a8f2
      Linus Torvalds authored
      Pull IDE updates from David Miller:
       "Just a couple small bug fixes, nothing overly exciting in here"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
        ide: missing break statement in set_timings_mdma()
        ide: hpt366: fix incorrect mask when checking at cmd_high_time
        ide-tape: fix misprint in failure handling in idetape_init()
        cmd640: add __init attribute
      1a81a8f2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 86505fc0
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
       1) Double spin lock bug in sunhv serial driver, from Dan Carpenter.
      
       2) Use correct RSS estimate when determining whether to grow the huge
          TSB or not, from Mike Kravetz.
      
       3) Don't use full three level page tables for hugepages, PMD level is
          sufficient.  From Nitin Gupta.
      
       4) Mask out extraneous bits from TSB_TAG_ACCESS register, we only want
          the address bits.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Trim page tables for 8M hugepages
        sparc64 mm: Fix base TSB sizing when hugetlb pages are used
        sparc: serial: sunhv: fix a double lock bug
        sparc32: off by ones in BUG_ON()
        sparc: Don't leak context bits into thread->fault_address
      86505fc0
    • Linus Torvalds's avatar
      Merge tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 9d3bc3d4
      Linus Torvalds authored
      Pull ARC updates from Vineet Gupta:
       "Things have been calm here - nothing much except for a few fixes"
      
      * tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: mm: don't loose PTE_SPECIAL in pte_modify()
        ARC: dma: fix address translation in arc_dma_free
        ARC: typo fix in mm/ioremap.c
        ARC: fix linux-next build breakage
      9d3bc3d4
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-sleep' · e148d0f8
      Rafael J. Wysocki authored
      * pm-sleep:
        x86/power/64: Fix hibernation return address corruption
      e148d0f8
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 · befff3bf
      Linus Torvalds authored
      Pull AVR32 updates from Hans-Christian Noren Egtvedt.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
        avr32: off by one in at32_init_pio()
        avr32: fixup code style in unistd.h and syscall_table.S
        avr32: wire up preadv2 and pwritev2 syscalls
      befff3bf
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · b5f00d18
      Linus Torvalds authored
      Pull ARM updates from Russell King:
       "Included in this update are:
      
         - Patches from Gregory Clement to fix the coherent DMA cases in our
           dma-mapping code.
      
         - A number of CPU errata updates and fixes.
      
         - ARM cpuidle improvements from Jisheng Zhang.
      
         - Fix from Kees for the location of _etext.
      
         - Cleanups from Masahiro Yamada to avoid duplicated messages during
           the kernel build, and remove CONFIG_ARCH_HAS_BARRIERS.
      
         - Remove a udelay loop limitation, allowing for faster CPUs to
           calibrate the delay correctly.
      
         - Cleanup some left-overs from the SW PAN implementation.
      
         - Ensure that a modified address limit is not visible to exception
           handlers"
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (21 commits)
        ARM: 8586/1: cpuidle: make arm_cpuidle_suspend() a bit more efficient
        ARM: 8585/1: cpuidle: fix !cpuidle_ops[cpu].init case during init
        ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is used
        ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherent
        ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421
        ARM: 8559/1: errata: Workaround erratum A12 821420
        ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423
        ARM: save and reset the address limit when entering an exception
        ARM: 8577/1: Fix Cortex-A15 798181 errata initialization
        ARM: 8584/1: floppy: avoid gcc-6 warning
        ARM: 8583/1: mm: fix location of _etext
        ARM: 8582/1: remove unused CONFIG_ARCH_HAS_BARRIERS
        ARM: 8306/1: loop_udelay: remove bogomips value limitation
        ARM: 8581/1: add missing <asm/prom.h> to arch/arm/kernel/devtree.c
        ARM: 8576/1: avoid duplicating "Kernel: arch/arm/boot/*Image is ready"
        ARM: 8556/1: on a generic DT system: do not touch l2x0
        ARM: uaccess: remove put_user() code duplication
        ARM: 8580/1: Remove orphaned __addr_ok() definition
        ARM: get rid of horrible *(unsigned int *)(regs + 1)
        ARM: introduce svc_pt_regs structure
        ...
      b5f00d18
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 27ae0c41
      Linus Torvalds authored
      Pull fuse updates from Miklos Szeredi:
       "This fixes error propagation from writeback to fsync/close for
        writeback cache mode as well as adding a missing capability flag to
        the INIT message.  The rest are cleanups.
      
        (The commits are recent but all the code actually sat in -next for a
        while now.  The recommits are due to conflict avoidance and the
        addition of Cc: stable@...)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: use filemap_check_errors()
        mm: export filemap_check_errors() to modules
        fuse: fix wrong assignment of ->flags in fuse_send_init()
        fuse: fuse_flush must check mapping->flags for errors
        fuse: fsync() did not return IO errors
        fuse: don't mess with blocking signals
        new helper: wait_event_killable_exclusive()
        fuse: improve aio directIO write performance for size extending writes
      27ae0c41
    • Linus Torvalds's avatar
      Revert "vfs: add lookup_hash() helper" · 20d00ee8
      Linus Torvalds authored
      This reverts commit 3c9fe8cd.
      
      As Miklos points out in commit c1b2cc1a, the "lookup_hash()" helper
      is now unused, and in fact, with the hash salting changes, since the
      hash of a dentry name now depends on the directory dentry it is in, the
      helper function isn't even really likely to be useful.
      
      So rather than keep it around in case somebody else might end up finding
      a use for it, let's just remove the helper and not trick people into
      thinking it might be a useful thing.
      
      For example, I had obviously completely missed how the helper didn't
      follow the normal dentry hashing patterns, and how the hash salting
      patch broke overlayfs.  Things would quietly build and look sane, but
      not work.
      Suggested-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      20d00ee8
    • Linus Torvalds's avatar
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · e7b4f2d8
      Linus Torvalds authored
      Pull overlayfs update from Miklos Szeredi:
       "First of all, this fixes a regression in overlayfs introduced by the
        dentry hash salting.  I've moved the patch fixing this to the front of
        the queue, so if (god forbid) something needs to be bisected in
        overlayfs this regression won't interfere with that.
      
        The biggest part is preparation for selinux support, done by Vivek
        Goyal.  Essentially this makes all operations on underlying
        filesystems be done with credentials of mounter.  This makes
        everything nicely consistent.
      
        There are also fixes for a number of known and recently discovered
        non-standard behavior (thanks to Eryu Guan for testing and improving
        the test suites)"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (23 commits)
        ovl: simplify empty checking
        qstr: constify instances in overlayfs
        ovl: clear nlink on rmdir
        ovl: disallow overlayfs as upperdir
        ovl: fix warning
        ovl: remove duplicated include from super.c
        ovl: append MAY_READ when diluting write checks
        ovl: dilute permission checks on lower only if not special file
        ovl: fix POSIX ACL setting
        ovl: share inode for hard link
        ovl: store real inode pointer in ->i_private
        ovl: permission: return ECHILD instead of ENOENT
        ovl: update atime on upper
        ovl: fix sgid on directory
        ovl: simplify permission checking
        ovl: do not require mounter to have MAY_WRITE on lower
        ovl: do operations on underlying file system in mounter's context
        ovl: modify ovl_permission() to do checks on two inodes
        ovl: define ->get_acl() for overlay inodes
        ovl: move some common code in a function
        ...
      e7b4f2d8
    • Linus Torvalds's avatar
      Merge tag 'freevxfs-for-4.8' of git://git.infradead.org/users/hch/freevxfs · 0a7736d0
      Linus Torvalds authored
      Pull freevxfs updates from Christoph Hellwig:
       "Support for foreign endianess and HP-UP superblocks from
        Krzysztof Błaszkowski"
      
      * tag 'freevxfs-for-4.8' of git://git.infradead.org/users/hch/freevxfs:
        freevxfs: update Kconfig information
        freevxfs: refactor readdir and lookup code
        freevxfs: fix lack of inode initialization
        freevxfs: fix memory leak in vxfs_read_fshead()
        freevxfs: update documentation and cresdits for HP-UX support
        freevxfs: implement ->alloc_inode and ->destroy_inode
        freevxfs: avoid the need for forward declaring the super operations
        freevxfs: move VFS inode allocation into vxfs_blkiget and vxfs_stiget
        freevxfs: remove vxfs_put_fake_inode
        freevxfs: handle big endian HP-UX file systems
      0a7736d0
    • Linus Torvalds's avatar
      Merge tag 'configfs-for-4.8' of git://git.infradead.org/users/hch/configfs · a54809f1
      Linus Torvalds authored
      Pull configfs update from Christoph Hellwig:
       "A simple error handling fix from Tal Shorer"
      
      * tag 'configfs-for-4.8' of git://git.infradead.org/users/hch/configfs:
        configfs: don't set buffer_needs_fill to zero if show() returns error
      a54809f1
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · b0c4e2ac
      Linus Torvalds authored
      Pull CIFS/SMB3 fixes from Steve French:
       "Various CIFS/SMB3 fixes, most for stable"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Fix a possible invalid memory access in smb2_query_symlink()
        fs/cifs: make share unaccessible at root level mountable
        cifs: fix crash due to race in hmac(md5) handling
        cifs: unbreak TCP session reuse
        cifs: Check for existing directory when opening file with O_CREAT
        Add MF-Symlinks support for SMB 2.0
      b0c4e2ac
    • Nitin Gupta's avatar
      sparc64: Trim page tables for 8M hugepages · 7bc3777c
      Nitin Gupta authored
      For PMD aligned (8M) hugepages, we currently allocate
      all four page table levels which is wasteful. We now
      allocate till PMD level only which saves memory usage
      from page tables.
      
      Also, when freeing page table for 8M hugepage backed region,
      make sure we don't try to access non-existent PTE level.
      
      Orabug: 22630259
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7bc3777c
    • Miklos Szeredi's avatar
      fuse: use filemap_check_errors() · 4a7f4e88
      Miklos Szeredi authored
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      4a7f4e88
    • Miklos Szeredi's avatar
      mm: export filemap_check_errors() to modules · d72d9e2a
      Miklos Szeredi authored
      Can be used by fuse, btrfs and f2fs to replace opencoded variants.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      d72d9e2a
    • Wei Fang's avatar
      fuse: fix wrong assignment of ->flags in fuse_send_init() · 9446385f
      Wei Fang authored
      FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo.
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 69fe05c9 ("fuse: add missing INIT flags")
      Cc: <stable@vger.kernel.org>
      9446385f
    • Maxim Patlasov's avatar
      fuse: fuse_flush must check mapping->flags for errors · 9ebce595
      Maxim Patlasov authored
      fuse_flush() calls write_inode_now() that triggers writeback, but actual
      writeback will happen later, on fuse_sync_writes(). If an error happens,
      fuse_writepage_end() will set error bit in mapping->flags. So, we have to
      check mapping->flags after fuse_sync_writes().
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Cc: <stable@vger.kernel.org> # v3.15+
      9ebce595
    • Alexey Kuznetsov's avatar
      fuse: fsync() did not return IO errors · ac7f052b
      Alexey Kuznetsov authored
      Due to implementation of fuse writeback filemap_write_and_wait_range() does
      not catch errors. We have to do this directly after fuse_sync_writes()
      Signed-off-by: default avatarAlexey Kuznetsov <kuznet@virtuozzo.com>
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Cc: <stable@vger.kernel.org> # v3.15+
      ac7f052b
    • Josh Poimboeuf's avatar
      x86/power/64: Fix hibernation return address corruption · 4ce827b4
      Josh Poimboeuf authored
      In kernel bug 150021, a kernel panic was reported when restoring a
      hibernate image.  Only a picture of the oops was reported, so I can't
      paste the whole thing here.  But here are the most interesting parts:
      
        kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
        BUG: unable to handle kernel paging request at ffff8804615cfd78
        ...
        RIP: ffff8804615cfd78
        RSP: ffff8804615f0000
        RBP: ffff8804615cfdc0
        ...
        Call Trace:
         do_signal+0x23
         exit_to_usermode_loop+0x64
         ...
      
      The RIP is on the same page as RBP, so it apparently started executing
      on the stack.
      
      The bug was bisected to commit ef0f3ed5 (x86/asm/power: Create
      stack frames in hibernate_asm_64.S), which in retrospect seems quite
      dangerous, since that code saves and restores the stack pointer from a
      global variable ('saved_context').
      
      There are a lot of moving parts in the hibernate save and restore paths,
      so I don't know exactly what caused the panic.  Presumably, a FRAME_END
      was executed without the corresponding FRAME_BEGIN, or vice versa.  That
      would corrupt the return address on the stack and would be consistent
      with the details of the above panic.
      
      [ rjw: One major problem is that by the time the FRAME_BEGIN in
        restore_registers() is executed, the stack pointer value may not
        be valid any more.  Namely, the stack area pointed to by it
        previously may have been overwritten by some image memory contents
        and that page frame may now be used for whatever different purpose
        it had been allocated for before hibernation.  In that case, the
        FRAME_BEGIN will corrupt that memory. ]
      
      Instead of doing the frame pointer save/restore around the bounds of the
      affected functions, just do it around the call to swsusp_save().
      
      That has the same effect of ensuring that if swsusp_save() sleeps, the
      frame pointers will be correct.  It's also a much more obviously safe
      way to do it than the original patch.  And objtool still doesn't report
      any warnings.
      
      Fixes: ef0f3ed5 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Reported-by: default avatarAndre Reinke <andre.reinke@mailbox.org>
      Tested-by: default avatarAndre Reinke <andre.reinke@mailbox.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4ce827b4
    • Miklos Szeredi's avatar
      ovl: simplify empty checking · 30c17ebf
      Miklos Szeredi authored
      The empty checking logic is duplicated in ovl_check_empty_and_clear() and
      ovl_remove_and_whiteout(), except the condition for clearing whiteouts is
      different:
      
      ovl_check_empty_and_clear() checked for being upper
      
      ovl_remove_and_whiteout() checked for merge OR lower
      
      Move the intersection of those checks (upper AND merge) into
      ovl_check_empty_and_clear() and simplify ovl_remove_and_whiteout().
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      30c17ebf
    • Al Viro's avatar
      qstr: constify instances in overlayfs · 29c42e80
      Al Viro authored
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      29c42e80
    • Miklos Szeredi's avatar
      ovl: clear nlink on rmdir · dbc816d0
      Miklos Szeredi authored
      To make delete notification work on fa/inotify.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      dbc816d0
    • Miklos Szeredi's avatar
      ovl: disallow overlayfs as upperdir · 76bc8e28
      Miklos Szeredi authored
      This does not work and does not make sense.  So instead of fixing it
      (probably not hard) just disallow.
      Reported-by: default avatarAndrei Vagin <avagin@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      76bc8e28
    • Miklos Szeredi's avatar
      ovl: fix warning · 656189d2
      Miklos Szeredi authored
      There's a superfluous newline in the warning message in ovl_d_real().
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      656189d2
    • Wei Yongjun's avatar
      ovl: remove duplicated include from super.c · 5f215013
      Wei Yongjun authored
      Remove duplicated include.
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5f215013
    • Vivek Goyal's avatar
      ovl: append MAY_READ when diluting write checks · 500cac3c
      Vivek Goyal authored
      Right now we remove MAY_WRITE/MAY_APPEND bits from mask if realfile is on
      lower/. This is done as files on lower will never be written and will be
      copied up. But to copy up a file, mounter should have MAY_READ permission
      otherwise copy up will fail. So set MAY_READ in mask when MAY_WRITE is
      reset.
      
      Dan Walsh noticed this when he did access(lowerfile, W_OK) and it returned
      True (context mounts) but when he tried to actually write to file, it
      failed as mounter did not have permission on lower file.
      
      [SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this
      won't trigger a copy-up.
      Reported-by: default avatarDan Walsh <dwalsh@redhat.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      500cac3c
    • Vivek Goyal's avatar
      ovl: dilute permission checks on lower only if not special file · e29841a0
      Vivek Goyal authored
      Right now if file is on lower/, we remove MAY_WRITE/MAY_APPEND bits from
      mask as lower/ will never be written and file will be copied up. But this
      is not true for special files. These files are not copied up and are opened
      in place. So don't dilute the checks for these types of files.
      Reported-by: default avatarDan Walsh <dwalsh@redhat.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      e29841a0
    • Miklos Szeredi's avatar
      ovl: fix POSIX ACL setting · d837a49b
      Miklos Szeredi authored
      Setting POSIX ACL needs special handling:
      
      1) Some permission checks are done by ->setxattr() which now uses mounter's
      creds ("ovl: do operations on underlying file system in mounter's
      context").  These permission checks need to be done with current cred as
      well.
      
      2) Setting ACL can fail for various reasons.  We do not need to copy up in
      these cases.
      
      In the mean time switch to using generic_setxattr.
      
      [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
      doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
      disabled, and I could not come up with an obvious way to do it.
      
      This instead avoids the link error by defining two sets of ACL operations
      and letting the compiler drop one of the two at compile time depending
      on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
      also leading to smaller code.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      d837a49b
    • Miklos Szeredi's avatar
      ovl: share inode for hard link · 51f7e52d
      Miklos Szeredi authored
      Inode attributes are copied up to overlay inode (uid, gid, mode, atime,
      mtime, ctime) so generic code using these fields works correcty.  If a hard
      link is created in overlayfs separate inodes are allocated for each link.
      If chmod/chown/etc. is performed on one of the links then the inode
      belonging to the other ones won't be updated.
      
      This patch attempts to fix this by sharing inodes for hard links.
      
      Use inode hash (with real inode pointer as a key) to make sure overlay
      inodes are shared for hard links on upper.  Hard links on lower are still
      split (which is not user observable until the copy-up happens, see
      Documentation/filesystems/overlayfs.txt under "Non-standard behavior").
      
      The inode is only inserted in the hash if it is non-directoy and upper.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      51f7e52d
    • Miklos Szeredi's avatar
      ovl: store real inode pointer in ->i_private · 39b681f8
      Miklos Szeredi authored
      To get from overlay inode to real inode we currently use 'struct
      ovl_entry', which has lifetime connected to overlay dentry.  This is okay,
      since each overlay dentry had a new overlay inode allocated.
      
      Following patch will break that assumption, so need to leave out ovl_entry.
      This patch stores the real inode directly in i_private, with the lowest bit
      used to indicate whether the inode is upper or lower.
      
      Lifetime rules remain, using ovl_inode_real() must only be done while
      caller holds ref on overlay dentry (and hence on real dentry), or within
      RCU protected regions.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      39b681f8
    • Miklos Szeredi's avatar
      ovl: permission: return ECHILD instead of ENOENT · a999d7e1
      Miklos Szeredi authored
      The error is due to RCU and is temporary.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      a999d7e1
    • Miklos Szeredi's avatar
      ovl: update atime on upper · d719e8f2
      Miklos Szeredi authored
      Fix atime update logic in overlayfs.
      
      This patch adds an i_op->update_time() handler to overlayfs inodes.  This
      forwards atime updates to the upper layer only.  No atime updates are done
      on lower layers.
      
      Remove implicit atime updates to underlying files and directories with
      O_NOATIME.  Remove explicit atime update in ovl_readlink().
      
      Clear atime related mnt flags from cloned upper mount.  This means atime
      updates are controlled purely by overlayfs mount options.
      
      Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> 
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      d719e8f2
    • Miklos Szeredi's avatar
      ovl: fix sgid on directory · bb0d2b8a
      Miklos Szeredi authored
      When creating directory in workdir, the group/sgid inheritance from the
      parent dir was omitted completely.  Fix this by calling inode_init_owner()
      on overlay inode and using the resulting uid/gid/mode to create the file.
      
      Unfortunately the sgid bit can be stripped off due to umask, so need to
      reset the mode in this case in workdir before moving the directory in
      place.
      Reported-by: default avatarEryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      bb0d2b8a
    • Miklos Szeredi's avatar
      ovl: simplify permission checking · 9c630ebe
      Miklos Szeredi authored
      The fact that we always do permission checking on the overlay inode and
      clear MAY_WRITE for checking access to the lower inode allows cruft to be
      removed from ovl_permission().
      
      1) "default_permissions" option effectively did generic_permission() on the
      overlay inode with i_mode, i_uid and i_gid updated from underlying
      filesystem.  This is what we do by default now.  It did the update using
      vfs_getattr() but that's only needed if the underlying filesystem can
      change (which is not allowed).  We may later introduce a "paranoia_mode"
      that verifies that mode/uid/gid are not changed.
      
      2) splitting out the IS_RDONLY() check from inode_permission() also becomes
      unnecessary once we remove the MAY_WRITE from the lower inode check.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      9c630ebe
    • Vivek Goyal's avatar
      ovl: do not require mounter to have MAY_WRITE on lower · 754f8cb7
      Vivek Goyal authored
      Now we have two levels of checks in ovl_permission(). overlay inode
      is checked with the creds of task while underlying inode is checked
      with the creds of mounter.
      
      Looks like mounter does not have to have WRITE access to files on lower/.
      So remove the MAY_WRITE from access mask for checks on underlying
      lower inode.
      
      This means task should still have the MAY_WRITE permission on lower
      inode and mounter is not required to have MAY_WRITE.
      
      It also solves the problem of read only NFS mounts being used as lower.
      If __inode_permission(lower_inode, MAY_WRITE) is called on read only
      NFS, it fails. By resetting MAY_WRITE, check succeeds and case of
      read only NFS shold work with overlay without having to specify any
      special mount options (default permission).
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      754f8cb7
    • Vivek Goyal's avatar
      ovl: do operations on underlying file system in mounter's context · 1175b6b8
      Vivek Goyal authored
      Given we are now doing checks both on overlay inode as well underlying
      inode, we should be able to do checks and operations on underlying file
      system using mounter's context.
      
      So modify all operations to do checks/operations on underlying dentry/inode
      in the context of mounter.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      1175b6b8