1. 01 Jul, 2015 4 commits
    • fs/file.c: __fget() and dup2() atomicity rules · 5ba97d28
      Eric Dumazet authored
      __fget() does lockless fetch of pointer from the descriptor
      table, attempts to grab a reference and treats "it was already
      zero" as "it's already gone from the table, we just hadn't
      seen the store, let's fail".  Unfortunately, that breaks the
      atomicity of dup2() - __fget() might see the old pointer,
      notice that it's been already dropped and treat that as
      "it's closed".  What we should be getting is either the
      old file or new one, depending whether we come before or after
      dup2().
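The retry idea described above can be sketched in user space with C11 atomics. This is only an illustration, not the kernel code: `file_sketch`, `slot`, `get_ref_unless_zero()` and `fget_sketch()` are hypothetical names, and the sketch assumes the memory behind a stale pointer stays readable (the kernel relies on RCU for that, which is not modelled here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only the reference count matters. */
struct file_sketch {
    atomic_long refcount;   /* 0 means the file is already being torn down */
};

/* One descriptor-table slot; dup2() replaces the pointer with a single store. */
static _Atomic(struct file_sketch *) slot;

/* Take a reference unless the count has already dropped to zero. */
static bool get_ref_unless_zero(struct file_sketch *f)
{
    assert(f != NULL);
    long old = atomic_load(&f->refcount);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Lockless lookup. A zero refcount is treated as "the pointer we fetched
 * is stale", so the slot is read again instead of reporting "closed".
 * NULL is returned only when the slot itself is empty.
 */
static struct file_sketch *fget_sketch(void)
{
    for (;;) {
        struct file_sketch *f = atomic_load(&slot);
        if (f == NULL)
            return NULL;
        if (get_ref_unless_zero(f))
            return f;
        /* Stale pointer: fetch the slot again and retry. */
    }
}
```

With this shape, a reader racing with dup2() ends up holding either the old file or the new one, which is the behaviour the commit message asks for.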
      
      Dmitry had the following test failing sometimes:
      
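      #include <errno.h>
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      int fd;  /* shared with the reader thread; dup2() replaces it during read() */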
      void *Thread(void *x) {
        char buf;
        int n = read(fd, &buf, 1);
        if (n != 1)
          exit(printf("read failed: n=%d errno=%d\n", n, errno));
        return 0;
      }
      
      int main()
      {
        fd = open("/dev/urandom", O_RDONLY);
        int fd2 = open("/dev/urandom", O_RDONLY);
        if (fd == -1 || fd2 == -1)
          exit(printf("open failed\n"));
        pthread_t th;
        pthread_create(&th, 0, Thread, 0);
        if (dup2(fd2, fd) == -1)
          exit(printf("dup2 failed\n"));
        pthread_join(th, 0);
        if (close(fd) == -1)
          exit(printf("close failed\n"));
        if (close(fd2) == -1)
          exit(printf("close failed\n"));
        printf("DONE\n");
        return 0;
      }
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/file.c: don't acquire files->file_lock in fd_install() · 8a81252b
      Eric Dumazet authored
      Mateusz Guzik reported:

       Currently, obtaining a new file descriptor results in locking the fdtable
       twice: once to reserve a slot and a second time to fill it.
      
      Holding the spinlock in __fd_install() is needed in case a resize is
      done, or to prevent a resize.
      
      Mateusz provided an RFC patch and a microbenchmark:
        http://people.redhat.com/~mguzik/pipebench.c
      
      A resize is an unlikely operation in a process lifetime,
      as table size is at least doubled at every resize.
      
      We can use RCU instead of the spinlock.
      
      __fd_install() must wait if a resize is in progress.
      
      The resize must block new __fd_install() callers from starting
      and wait for ongoing installs to finish (synchronize_sched()).

      A resize should be attempted by a single thread so as not to waste resources.
      
      The rcu_sched variant is used, as __fd_install() and expand_fdtable() run
      from process context.
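The handshake described above (installers wait while a resize is in progress; the resize blocks new installers and waits for ongoing ones) can be modelled in user space. This is a rough sketch under loud assumptions: every name below is hypothetical, a plain counter of active installers stands in for the RCU-sched read-side section and synchronize_sched(), busy-waiting stands in for sleeping, and lockless lookups of the table are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct fdtable_sketch {
    size_t max_fds;
    void **fd;                      /* one slot per descriptor */
};

struct files_sketch {
    _Atomic(struct fdtable_sketch *) fdt;
    atomic_bool resize_in_progress;
    atomic_int active_installs;     /* stands in for an RCU-sched read side */
};

static struct fdtable_sketch *alloc_table(size_t n)
{
    struct fdtable_sketch *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->fd = calloc(n, sizeof(*t->fd));
    if (t->fd == NULL) {
        free(t);
        return NULL;
    }
    t->max_fds = n;
    return t;
}

/* Fill a slot that was reserved earlier, without taking a lock. */
static void fd_install_sketch(struct files_sketch *files, size_t fd, void *file)
{
    for (;;) {
        atomic_fetch_add(&files->active_installs, 1);   /* enter read side */
        if (!atomic_load(&files->resize_in_progress))
            break;
        atomic_fetch_sub(&files->active_installs, 1);   /* back off */
        while (atomic_load(&files->resize_in_progress))
            ;                                           /* the kernel would sleep */
    }
    struct fdtable_sketch *t = atomic_load(&files->fdt);
    assert(fd < t->max_fds && t->fd[fd] == NULL);
    t->fd[fd] = file;
    atomic_fetch_sub(&files->active_installs, 1);       /* leave read side */
}

/* Grow the table; the caller must ensure a single thread resizes at a time. */
static int expand_table_sketch(struct files_sketch *files, size_t new_size)
{
    struct fdtable_sketch *old = atomic_load(&files->fdt);
    assert(new_size >= old->max_fds);
    struct fdtable_sketch *grown = alloc_table(new_size);
    if (grown == NULL)
        return -1;
    atomic_store(&files->resize_in_progress, true);     /* block new installers */
    while (atomic_load(&files->active_installs) != 0)
        ;                                               /* wait for ongoing installs */
    memcpy(grown->fd, old->fd, old->max_fds * sizeof(*old->fd));
    atomic_store(&files->fdt, grown);                   /* publish the new table */
    atomic_store(&files->resize_in_progress, false);
    free(old->fd);                                      /* safe only because this model */
    free(old);                                          /* has no lockless lookups */
    return 0;
}
```

The point of the design is that the common path (installing into an existing table) touches no lock; only the rare resize pays for the coordination.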
      
      It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
      CPU E5-2696 v2 @ 2.50GHz.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mateusz Guzik <mguzik@redhat.com>
      Acked-by: Mateusz Guzik <mguzik@redhat.com>
      Tested-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation · 1af95de6
      Wang YanQing authored
      Concurrent execution of get_anon_bdev(), or preemption on a preemptible
      kernel, can cause a race condition, so checking dev against its upper
      limit with the equality operator alone is not enough.

      This patch fixes it.
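Why an equality test is too weak can be shown with a small model. It assumes, purely for illustration, that IDs are handed out sequentially and that the limit check runs after the allocation without holding the same lock; `LIMIT_SKETCH` and all function names are hypothetical, and the kernel's real limit is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define LIMIT_SKETCH 4          /* hypothetical small upper limit */

static int next_dev_sketch;     /* next ID the model allocator hands out */

static int alloc_dev_sketch(void)
{
    return next_dev_sketch++;
}

/* Equality-only check: rejects just the ID exactly at the limit. */
static bool rejected_by_equality(int dev)
{
    return dev == LIMIT_SKETCH;
}

/* Range check: rejects every ID at or beyond the limit. */
static bool rejected_by_range(int dev)
{
    return dev >= LIMIT_SKETCH;
}

/* Two callers allocate before either one gets to run its limit check. */
static void demo_limit_check(void)
{
    next_dev_sketch = LIMIT_SKETCH;
    int dev_a = alloc_dev_sketch();     /* first caller gets the limit itself */
    int dev_b = alloc_dev_sketch();     /* second caller gets limit + 1 */

    assert(rejected_by_equality(dev_a));
    assert(!rejected_by_equality(dev_b));   /* slips past the equality test */
    assert(rejected_by_range(dev_a));
    assert(rejected_by_range(dev_b));       /* the range test catches both */
}
```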
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: avoid creation of inode number 0 in get_next_ino · 2adc376c
      Carlos Maiolino authored
      Currently, get_next_ino() is able to create inodes with inode number = 0.
      This has a bad impact on the filesystems relying on this function to generate
      inode numbers.

      While there is no problem at all in having inodes with number 0, userspace tools
      which handle file management tasks can have problems handling these files; for
      example, users cannot delete these files, since glibc will ignore them. So, I
      believe the best way is for the kernel to avoid creating them.
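A minimal sketch of the idea, assuming a single global counter: when the unsigned counter wraps around to 0, step past it. The names `last_ino_sketch` and `next_ino_sketch()` are hypothetical, and the kernel's actual allocation scheme is not reproduced here.

```c
#include <assert.h>
#include <limits.h>

static unsigned int last_ino_sketch;    /* last number handed out */

/* Return the next inode number, never 0. */
static unsigned int next_ino_sketch(void)
{
    unsigned int res = last_ino_sketch + 1;  /* unsigned arithmetic wraps to 0 */

    if (res == 0)
        res++;                               /* skip the reserved value 0 */
    last_ino_sketch = res;
    return res;
}

/* Drive the counter across the wrap-around point. */
static void demo_next_ino(void)
{
    last_ino_sketch = UINT_MAX - 1;
    assert(next_ino_sketch() == UINT_MAX);
    assert(next_ino_sketch() == 1);          /* 0 was skipped */
    assert(next_ino_sketch() == 2);
}
```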
      
      This problem has been raised previously, but the old thread didn't have any
      other update for a year+, and I've seen too many users hitting the same issue
      regarding the impossibility to delete files while using filesystems relying on
      this function. So, I'm starting the thread again, with the same patch
      that I believe is enough to address this problem.
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 29 Jun, 2015 1 commit
  3. 23 Jun, 2015 15 commits
  4. 19 Jun, 2015 2 commits
    • overlayfs: Make f_path always point to the overlay and f_inode to the underlay · 4bacc9c9
      David Howells authored
      Make file->f_path always point to the overlay dentry so that the path in
      /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
      overlay as well as the underlay (path-based LSMs probably don't need it).
      
      Using my union testsuite to set things up, before the patch I see:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 13381       Links: 1
      	...
      
      After the patch:
      
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	...
      	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	...
      	Device: 23h/35d Inode: 40346       Links: 1
      	...
      
      Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
      pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
      (which is correct).
      
      The inode accessed, however, is the lower layer.  The union layer is on device
      25h/37d and the upper layer on 24h/36d.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • overlay: Call ovl_drop_write() earlier in ovl_dentry_open() · f25801ee
      David Howells authored
      Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
      as we've done the copy up for which we needed the freeze-write lock by that
      point.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 17 Jun, 2015 2 commits
  6. 16 Jun, 2015 5 commits
  7. 14 Jun, 2015 2 commits
  8. 29 May, 2015 1 commit
    • d_walk() might skip too much · 2159184e
      Al Viro authored
      When we find that a child has died while we'd been trying to ascend,
      we should go into the first live sibling itself, rather than its sibling.
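The off-by-one can be illustrated on a plain singly linked sibling list. This is a simplified model, not the dcache code: `node_sketch` and both functions are hypothetical, and the locking and list layout of the real d_walk() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry on its parent's child list. */
struct node_sketch {
    bool killed;
    struct node_sketch *next;   /* next sibling, NULL at the end of the list */
};

/* Intended behaviour: the first live sibling after 'child', or NULL. */
static struct node_sketch *first_live_sibling(struct node_sketch *child)
{
    do {
        child = child->next;
    } while (child != NULL && child->killed);
    return child;
}

/* Off-by-one variant: finds the live sibling, then steps past it. */
static struct node_sketch *skips_too_much(struct node_sketch *child)
{
    struct node_sketch *live = first_live_sibling(child);

    return live != NULL ? live->next : NULL;
}

/* List: a (dead) -> b (dead) -> c (live) -> d (live). */
static void demo_sibling_walk(void)
{
    struct node_sketch d = { false, NULL };
    struct node_sketch c = { false, &d };
    struct node_sketch b = { true, &c };
    struct node_sketch a = { true, &b };

    assert(first_live_sibling(&a) == &c);   /* c is visited */
    assert(skips_too_much(&a) == &d);       /* c would never be visited */
}
```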
      
      Off-by-one in question had been introduced in "deal with deadlock in
      d_walk()" and the fix needs to be backported to all branches this one
      has been backported to.
      
      Cc: stable@vger.kernel.org # 3.2 and later
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. 15 May, 2015 8 commits