1. 09 Oct, 2014 8 commits
    • vfs: factor out lookup_mountpoint from new_mountpoint · e2dfa935
      Eric W. Biederman authored
      I am shortly going to add a new user of struct mountpoint that
      needs to look up existing entries but does not want to create
      a struct mountpoint if one does not exist.  Therefore to keep
      the code simple and easy to read split out lookup_mountpoint
      from new_mountpoint.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
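The split described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the hash table, the `mp_hash` helper, and the use of an opaque pointer as the key are hypothetical stand-ins, not the kernel's actual data structures. The point is that `lookup_mountpoint` never allocates, while `new_mountpoint` reuses it and creates an entry only on a miss.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace model: a mountpoint keyed by an opaque "dentry" pointer. */
struct mountpoint {
    struct mountpoint *next;   /* hash-chain link */
    const void *dentry;        /* lookup key (hypothetical stand-in) */
    int count;                 /* reference count */
};

#define MP_HASH_SIZE 16
static struct mountpoint *mp_table[MP_HASH_SIZE];

/* Hypothetical hash helper; the kernel hashes differently. */
static size_t mp_hash(const void *dentry)
{
    return (size_t)(((uintptr_t)dentry >> 4) % MP_HASH_SIZE);
}

/* Pure lookup: returns an existing entry with an extra reference,
 * or NULL if there is none. It never allocates. */
struct mountpoint *lookup_mountpoint(const void *dentry)
{
    struct mountpoint *mp;

    for (mp = mp_table[mp_hash(dentry)]; mp; mp = mp->next) {
        if (mp->dentry == dentry) {
            mp->count++;
            return mp;
        }
    }
    return NULL;
}

/* Get-or-create: reuse the lookup, allocate only on a miss. */
struct mountpoint *new_mountpoint(const void *dentry)
{
    struct mountpoint *mp = lookup_mountpoint(dentry);
    size_t h;

    if (mp)
        return mp;
    mp = malloc(sizeof(*mp));
    if (!mp)
        return NULL;
    h = mp_hash(dentry);
    mp->dentry = dentry;
    mp->count = 1;
    mp->next = mp_table[h];
    mp_table[h] = mp;
    return mp;
}
```

A caller that only wants to know whether an entry exists calls `lookup_mountpoint` and treats NULL as "no mountpoint here", without the side effect of creating one.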
    • vfs: Keep a list of mounts on a mount point · 0a5eb7c8
      Eric W. Biederman authored
      To spot any possible problems call BUG if a mountpoint
      is put when its list of mounts is not empty.
      
      AV: use hlist instead of list_head
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
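As a rough illustration of the idea, the following userspace sketch keeps an hlist-style list of mounts on each mountpoint and checks that the list is empty when the last reference is put. All names and field layouts here are hypothetical, and `assert` stands in for the kernel's BUG.

```c
#include <assert.h>
#include <stddef.h>

struct mountpoint;

/* Model of a mount linked into its mountpoint's list (hlist style:
 * a forward pointer plus a pointer to the previous link). */
struct mount {
    struct mount *mp_next;
    struct mount **mp_pprev;
    struct mountpoint *mp;
};

struct mountpoint {
    struct mount *mounts;   /* head of the list of mounts on this point */
    int count;              /* reference count */
};

void mp_add_mount(struct mountpoint *mp, struct mount *m)
{
    m->mp = mp;
    m->mp_next = mp->mounts;
    m->mp_pprev = &mp->mounts;
    if (mp->mounts)
        mp->mounts->mp_pprev = &m->mp_next;
    mp->mounts = m;
}

void mp_del_mount(struct mount *m)
{
    *m->mp_pprev = m->mp_next;
    if (m->mp_next)
        m->mp_next->mp_pprev = m->mp_pprev;
    m->mp_next = NULL;
    m->mp_pprev = NULL;
    m->mp = NULL;
}

/* Returns 1 when the last reference was dropped, 0 otherwise.
 * The assert models "call BUG if the list of mounts is not empty". */
int put_mountpoint(struct mountpoint *mp)
{
    if (--mp->count > 0)
        return 0;
    assert(mp->mounts == NULL);
    return 1;
}
```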
    • vfs: Don't allow overwriting mounts in the current mount namespace · 7af1364f
      Eric W. Biederman authored
      In preparation for allowing mountpoints to be renamed and unlinked
      in remote filesystems and in other mount namespaces test if on a dentry
      there is a mount in the local mount namespace before allowing it to
      be renamed or unlinked.
      
      The primary motivation here is old versions of fusermount, whose
      unmount is not safe if a path can be renamed or unlinked while it is
      verifying the mount is safe to unmount.  More recent versions are simpler
      and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
      in a directory owned by an arbitrary user.
      
      Miklos Szeredi <miklos@szeredi.hu> reports this approach is good
      enough to remove concerns about new kernels mixed with old versions
      of fusermount.
      
      A secondary motivation for restrictions here is that removing empty
      directories that have non-empty mount points on them appears to
      violate the rule that rmdir can not remove non-empty directories.  As
      Linus Torvalds pointed out this is useful for programs (like git) that
      test if a directory is empty with rmdir.
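The rmdir-based emptiness test mentioned here can be sketched as follows. This is a minimal illustration and not code taken from git; the helper name `remove_if_empty` is hypothetical. POSIX allows either ENOTEMPTY or EEXIST for a non-empty directory, so both are checked. A directory that is a mount point in the caller's namespace would typically fail with EBUSY instead, which this sketch reports as a generic error.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 1 if the directory was empty (and has now been removed),
 * 0 if it is not empty, and -1 on any other error (errno is left set,
 * for example EBUSY for a mount point or ENOENT for a missing path). */
int remove_if_empty(const char *path)
{
    if (rmdir(path) == 0)
        return 1;
    if (errno == ENOTEMPTY || errno == EEXIST)
        return 0;
    return -1;
}
```

Note that the probe is destructive on success: an empty directory is removed, so a caller that only wants the answer has to recreate it.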
      
      Therefore this patch arranges to enforce the existing mount point
      semantics for the local mount namespace.
      
      v2: Rewrote the test to be a drop-in replacement for d_mountpoint
      v3: Use bool instead of int as the return type of is_local_mountpoint
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
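A test of this shape might look like the userspace sketch below. It assumes a cheap per-dentry "mounted" flag is checked first and that the slower walk over the namespace's mounts only runs when the flag is set. The struct layouts and the explicit `ns` parameter are assumptions for illustration; the kernel helper consults the current task's mount namespace rather than taking one as an argument.

```c
#include <stdbool.h>
#include <stddef.h>

struct dentry {
    bool mounted;                     /* something is mounted here, in some namespace */
};

struct mount {
    struct mount *next;               /* next mount in the same namespace */
    const struct dentry *mountpoint;  /* dentry this mount sits on */
};

struct mnt_namespace {
    struct mount *list;               /* all mounts in this namespace */
};

/* Returns true only if a mount in the given namespace sits on dentry. */
bool is_local_mountpoint(const struct mnt_namespace *ns,
                         const struct dentry *dentry)
{
    const struct mount *m;

    if (!dentry->mounted)             /* fast path: nothing mounted anywhere */
        return false;
    for (m = ns->list; m; m = m->next)
        if (m->mountpoint == dentry)
            return true;
    return false;
}
```

Returning `bool` matches the v3 note above; a dentry that is mounted on only in some other namespace yields `false`, so rename and unlink would be allowed to proceed for it.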
    • vfs: More precise tests in d_invalidate · bafc9b75
      Eric W. Biederman authored
      The current comments in d_invalidate about what and why it is doing
      what it is doing are wildly off-base.  Which is not surprising as
      the comments date back to a last-minute bug fix of the 2.2 kernel.
      
      The big fat lie of a comment said: If it's a directory, we can't drop
      it for fear of somebody re-populating it with children (even though
      dropping it would make it unreachable from that root, we still might
      repopulate it if it was a working directory or similar).
      
      [AV] What we really need to avoid is multiple dentry aliases of the
      same directory inode; on all filesystems that have ->d_revalidate()
      we either declare all positive dentries always valid (and thus never
      fed to d_invalidate()) or use d_materialise_unique() and/or d_splice_alias(),
      which take care of alias prevention.
      
      The current rules are:
      - To prevent mount point leaks dentries that are mount points or that
        have children that are mount points may not be unhashed.
      - All other dentries may be unhashed.
      - Directories may be rehashed with d_materialise_unique
      
      check_submounts_and_drop implements this already for well maintained
      remote filesystems so implement the current rules in d_invalidate
      by just calling check_submounts_and_drop.
      
      The one difference between d_invalidate and check_submounts_and_drop
      is that d_invalidate must respect it when a d_revalidate method has
      earlier called d_drop so preserve the d_unhashed check in
      d_invalidate.
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
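The rules above can be modeled in userspace roughly as follows. This is a simplified sketch: the dentry tree, the `mounted` and `hashed` flags, and the recursive `has_submounts` helper are assumptions made for illustration, and the model ignores locking and dcache shrinking entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry {
    struct dentry *child;     /* first child */
    struct dentry *sibling;   /* next sibling */
    bool mounted;             /* is a mount point */
    bool hashed;              /* currently reachable by lookup */
};

/* True if d or anything below it is a mount point. */
static bool has_submounts(const struct dentry *d)
{
    const struct dentry *c;

    if (d->mounted)
        return true;
    for (c = d->child; c; c = c->sibling)
        if (has_submounts(c))
            return true;
    return false;
}

/* Refuse to unhash anything that would leak a mount point. */
int check_submounts_and_drop(struct dentry *d)
{
    if (has_submounts(d))
        return -EBUSY;
    d->hashed = false;
    return 0;
}

/* d_invalidate additionally respects an earlier d_drop: an already
 * unhashed dentry is reported as successfully invalidated. */
int d_invalidate(struct dentry *d)
{
    if (!d->hashed)
        return 0;
    return check_submounts_and_drop(d);
}
```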
    • vfs: Document the effect of d_revalidate on d_find_alias · 3ccb354d
      Eric W. Biederman authored
      d_drop or check_submounts_and_drop called from d_revalidate can result
      in renamed directories with child dentries being unhashed.  These
      renamed and dropped directory dentries can be rehashed after
      d_materialise_unique uses d_find_alias to find them.
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • delayed mntput · 9ea459e1
      Al Viro authored
      On final mntput() we want fs shutdown to happen before return to
      userland; however, the only case where we want it to happen right
      there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
      Those have to be fully synchronous - failure halfway through module
      init might count on having vfsmount killed right there.  Fortunately,
      final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
      So we handle those synchronously and do an analog of delayed fput
      logic for everything else.
      
      As a result, we are guaranteed that fs shutdown will always happen
      on shallow stack.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
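The split between synchronous and deferred teardown might be modeled like this. It is a userspace sketch under stated assumptions: the `internal` flag stands in for MNT_INTERNAL, `cleanup_mnt` for the filesystem shutdown, and `run_deferred_mntput` for the work the kernel would run before returning to userland. None of these model names are kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

struct mount_model {
    int refs;
    bool internal;              /* stands in for MNT_INTERNAL */
    bool shut_down;             /* set once teardown has run */
    struct mount_model *next;   /* link in the deferred list */
};

static struct mount_model *deferred_list;

/* Stand-in for the actual filesystem shutdown. */
static void cleanup_mnt(struct mount_model *m)
{
    m->shut_down = true;
}

/* Drop a reference. On the final put, internal mounts are torn down
 * right here; everything else is queued for later. */
void mntput_model(struct mount_model *m)
{
    if (--m->refs > 0)
        return;
    if (m->internal) {
        cleanup_mnt(m);
        return;
    }
    m->next = deferred_list;
    deferred_list = m;
}

/* Models the deferred work running on a shallow stack. */
void run_deferred_mntput(void)
{
    struct mount_model *m = deferred_list;

    deferred_list = NULL;
    while (m) {
        struct mount_model *next = m->next;

        m->next = NULL;
        cleanup_mnt(m);
        m = next;
    }
}
```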
    • autofs - remove obsolete d_invalidate() from expire · b3ca406f
      Ian Kent authored
      Biederman's umount-on-rmdir series changes d_invalidate() to summarily remove
      mounts under the passed in dentry regardless of whether they are busy
      or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy() is
      definitely the wrong thing to do because it will silently umount entries
      instead of just cleaning stale dentries.
      
      But this call shouldn't be needed and testing shows that automounting
      continues to function without it.
      
      As Al Viro correctly surmises, the original intent of the call was to
      perform what shrink_dcache_parent() does.
      
      If at some time in the future I see stale dentries accumulating
      following failed mounts I'll revisit the issue and possibly add a
      shrink_dcache_parent() call if needed.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Allow sharing external names after __d_move() · 8d85b484
      Al Viro authored
      * external dentry names get a small structure prepended to them
      (struct external_name).
      * it contains an atomic refcount, matching the number of struct dentry
      instances that have ->d_name.name pointing to that external name.  The
      first thing free_dentry() does is decrement the refcount of the external name,
      so the instances that are between the call of free_dentry() and
      RCU-delayed actual freeing do not contribute.
      * __d_move(x, y, false) makes the name of x equal to the name of y,
      external or not.  If y has an external name, extra reference is grabbed
      and put into x->d_name.name.  If x used to have an external name, the
      reference to the old name is dropped and, should it reach zero, freeing
      is scheduled via kfree_rcu().
      * free_dentry() in dentry with external name decrements the refcount of
      that name and, should it reach zero, does RCU-delayed call that will
      free both the dentry and external name.  Otherwise it does what it
      used to do, except that __d_free() doesn't even look at ->d_name.name;
      it simply frees the dentry.
      
      All non-RCU accesses to dentry external name are safe wrt freeing since they
      all should happen before free_dentry() is called.  RCU accesses might run
      into a dentry seen by free_dentry() or into an old name that got already
      dropped by __d_move(); however, in both cases dentry must have been
      alive and refer to that name at some point after we'd done rcu_read_lock(),
      which means that any freeing must be still pending.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
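The refcounting scheme can be sketched in userspace with C11 atomics. This is an illustrative model only: short names live in an inline buffer, long names get a refcounted header prepended, and copying a name shares the external buffer instead of duplicating it. `INLINE_LEN` and the `dentry_*` helper names are hypothetical, and a plain `free()` stands in for the RCU-delayed freeing the commit describes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_LEN 8                 /* hypothetical inline name capacity */

struct external_name {
    atomic_int count;                /* number of dentries using this name */
    char name[];                     /* the name itself, NUL-terminated */
};

struct dentry {
    char *name;                      /* points at iname or at an external name */
    char iname[INLINE_LEN];
};

static bool dname_external(const struct dentry *d)
{
    return d->name != d->iname;
}

static struct external_name *external_name(const struct dentry *d)
{
    return (struct external_name *)(d->name - offsetof(struct external_name, name));
}

int dentry_set_name(struct dentry *d, const char *s)
{
    size_t len = strlen(s);
    struct external_name *p;

    if (len < INLINE_LEN) {
        memcpy(d->iname, s, len + 1);
        d->name = d->iname;
        return 0;
    }
    p = malloc(sizeof(*p) + len + 1);
    if (!p)
        return -1;
    atomic_init(&p->count, 1);
    memcpy(p->name, s, len + 1);
    d->name = p->name;
    return 0;
}

/* Make x's name equal to y's, sharing y's external name if it has one. */
void dentry_copy_name(struct dentry *x, const struct dentry *y)
{
    struct external_name *old = dname_external(x) ? external_name(x) : NULL;

    if (dname_external(y)) {
        atomic_fetch_add(&external_name(y)->count, 1);  /* grab extra reference */
        x->name = y->name;
    } else {
        memcpy(x->iname, y->iname, INLINE_LEN);
        x->name = x->iname;
    }
    if (old && atomic_fetch_sub(&old->count, 1) == 1)
        free(old);                   /* the kernel would defer this via RCU */
}

/* Drop this dentry's reference to its name, freeing it on the last put. */
void dentry_free_name(struct dentry *d)
{
    if (dname_external(d) && atomic_fetch_sub(&external_name(d)->count, 1) == 1)
        free(external_name(d));
    d->iname[0] = '\0';
    d->name = d->iname;
}
```

Taking the new reference before dropping the old one keeps the copy safe even when both dentries already share the same external name.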
  2. 29 Sep, 2014 1 commit
    • missing data dependency barrier in prepend_name() · 6d13f694
      Al Viro authored
      AFAICS, prepend_name() is broken on SMP alpha.  Disclaimer: I don't have
      SMP alpha boxen to reproduce it on.  However, it really looks like the race
      is real.
      
      CPU1: d_path() on /mnt/ramfs/<255-character>/foo
      CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>
      
      CPU2 does d_alloc(), which allocates an external name, stores the name there
      including terminating NUL, does smp_wmb() and stores its address in
      dentry->d_name.name.  It proceeds to d_add(dentry, NULL) and d_move()
      old dentry over to that.  ->d_name.name value ends up in that dentry.
      
      In the meanwhile, CPU1 gets to prepend_name() for that dentry.  It fetches
      ->d_name.name and ->d_name.len; the former ends up pointing to new name
      (64-byte kmalloc'ed array), the latter - 255 (length of the old name).
      Nothing to force the ordering there, and normally that would be OK, since we'd
      run into the terminating NUL and stop.  Except that it's alpha, and we'd need
      a data dependency barrier to guarantee that we see that store of NUL
      __d_alloc() has done.  In a similar situation dentry_cmp() would survive; it
      does explicit smp_read_barrier_depends() after fetching ->d_name.name.
      prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
      and possibly oops due to taking a page fault in kernel mode.
      
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
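The hazard and one defensive way to read the name can be modeled in userspace C11. This sketch makes several assumptions: `qstr_model`, `publish_name`, and `prepend_name_model` are hypothetical names, and an acquire load is used in place of the kernel's data dependency barrier (acquire is stronger than strictly needed, but is simple to express portably). The reader tolerates a stale, too-large length by stopping at the terminating NUL.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct qstr_model {
    _Atomic(const char *) name;   /* pointer to a NUL-terminated name */
    atomic_uint len;              /* may be stale relative to name */
};

/* Writer: the name bytes (including the NUL) are written before this
 * call; the release store makes them visible to anyone who subsequently
 * loads the pointer with acquire ordering. */
void publish_name(struct qstr_model *q, const char *name, unsigned int len)
{
    atomic_store_explicit(&q->len, len, memory_order_relaxed);
    atomic_store_explicit(&q->name, name, memory_order_release);
}

/* Reader: prepend "/name" in front of *buffer, copying at most len bytes
 * but stopping early at a NUL so a stale length cannot overrun the name. */
int prepend_name_model(char **buffer, int *buflen, struct qstr_model *q)
{
    const char *dname = atomic_load_explicit(&q->name, memory_order_acquire);
    unsigned int dlen = atomic_load_explicit(&q->len, memory_order_relaxed);
    int need = (int)dlen + 1;
    char *p;

    if (*buflen < need)
        return -ENAMETOOLONG;
    *buflen -= need;
    *buffer -= need;
    p = *buffer;
    *p++ = '/';
    while (dlen--) {
        char c = *dname++;

        if (!c)
            break;
        *p++ = c;
    }
    return 0;
}
```

In the scenario from the commit message the length still says 255 while the pointer already refers to the shorter new name; the early exit at NUL is what keeps the copy inside the new allocation, and the ordering on the pointer load is what guarantees the NUL is actually visible.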
  3. 28 Sep, 2014 1 commit
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1e3827bf
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "Assorted fixes + unifying __d_move() and __d_materialise_dentry() +
        minimal regression fix for d_path() of victims of overwriting rename()
        ported on top of that"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vfs: Don't exchange "short" filenames unconditionally.
        fold swapping ->d_name.hash into switch_names()
        fold unlocking the children into dentry_unlock_parents_for_move()
        kill __d_materialise_dentry()
        __d_materialise_dentry(): flip the order of arguments
        __d_move(): fold manipulations with ->d_child/->d_subdirs
        don't open-code d_rehash() in d_materialise_unique()
        pull rehashing and unlocking the target dentry into __d_materialise_dentry()
        ufs: deal with nfsd/iget races
        fuse: honour max_read and max_write in direct_io mode
        shmem: fix nlink for rename overwrite directory
  4. 27 Sep, 2014 15 commits
  5. 26 Sep, 2014 14 commits
  6. 25 Sep, 2014 1 commit
    • Merge tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · f4cb707e
      Linus Torvalds authored
      Pull ACPI and power management fixes from Rafael Wysocki:
       "These are regression fixes (ACPI hotplug, cpufreq, hibernation, ACPI
        LPSS driver), fixes for stuff that never worked correctly (ACPI GPIO
        support in some cases and a wrong sign of an error code in the ACPI
        core in one place), and one blacklist item for ACPI backlight
        handling.
      
        Specifics:
      
         - Revert of a recent hibernation core commit that introduced a NULL
           pointer dereference during resume for at least one user (Rafael J
           Wysocki).
      
         - Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
           asynchronous PM callback execution for LPSS devices during system
           suspend/resume (introduced in 3.16) which turns out to break
           ordering expectations on some systems.  From Fu Zhonghui.
      
         - cpufreq core fix related to the handling of sysfs nodes during
           system suspend/resume that has been broken for intel_pstate since
           3.15 from Lan Tianyu.
      
         - Restore the generation of "online" uevents for ACPI container
           devices that was removed in 3.14, but some user space utilities
           turn out to need them (Rafael J Wysocki).
      
         - The cpufreq core fails to release a lock in an error code path
           after changes made in 3.14.  Fix from Prarit Bhargava.
      
         - ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
           operation regions (which means AML using GPIOs) work correctly in
           all cases from Bob Moore and Srinivas Pandruvada.
      
         - Fix for a wrong sign of the ACPI core's create_modalias() return
           value in case of an error from Mika Westerberg.
      
         - ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu"
      
      * tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
        gpio / ACPI: Use pin index and bit length
        ACPICA: Update to GPIO region handler interface.
        ACPI / platform / LPSS: disable async suspend/resume of LPSS devices
        cpufreq: release policy->rwsem on error
        cpufreq: fix cpufreq suspend/resume for intel_pstate
        ACPI / scan: Correct error return value of create_modalias()
        ACPI / video: disable native backlight for ThinkPad X201s
        ACPI / hotplug: Generate online uevents for ACPI containers