1. 07 Jan, 2011 33 commits
    • Nick Piggin's avatar
      fs: consolidate dentry kill sequence · 77812a1e
      Nick Piggin authored
      The tricky locking for disposing of a dentry is duplicated 3 times in the
      dcache (dput, pruning a dentry from the LRU, and pruning its ancestors).
      Consolidate them all into a single function dentry_kill.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      77812a1e
    • Nick Piggin's avatar
      ec33679d
    • Nick Piggin's avatar
      be182bff
    • Nick Piggin's avatar
      fs: dcache reduce prune_one_dentry locking · 89e60548
      Nick Piggin authored
      prune_one_dentry can avoid quite a bit of locking in the common case where
      ancestors have an elevated refcount. Alternatively, we could have gone the
      other way and made fewer trylocks in the case where d_count goes to zero, but
      is probably less common.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      89e60548
    • Nick Piggin's avatar
      fs: dcache reduce d_parent locking · a734eb45
      Nick Piggin authored
      Use RCU to simplify locking in dget_parent.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      a734eb45
    • Nick Piggin's avatar
      fs: dcache rationalise dget variants · dc0474be
      Nick Piggin authored
      dget_locked was a shortcut to avoid the lazy lru manipulation when we already
      held dcache_lock (lru manipulation was relatively cheap at that point).
      However, how that the lru lock is an innermost one, we never hold it at any
      caller, so the lock cost can now be avoided. We already have well working lazy
      dcache LRU, so it should be fine to defer LRU manipulations to scan time.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      dc0474be
    • Nick Piggin's avatar
      fs: dcache reduce dcache_inode_lock · 357f8e65
      Nick Piggin authored
      dcache_inode_lock can be avoided in d_delete() and d_materialise_unique()
      in cases where it is not required.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      357f8e65
    • Nick Piggin's avatar
      fs: dcache reduce locking in d_alloc · 89ad485f
      Nick Piggin authored
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      89ad485f
    • Nick Piggin's avatar
      fs: dcache reduce dput locking · 61f3dee4
      Nick Piggin authored
      It is possible to run dput without taking data structure locks up-front. In
      many cases where we don't kill the dentry anyway, these locks are not required.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      61f3dee4
    • Nick Piggin's avatar
      fs: dcache avoid starvation in dcache multi-step operations · 58db63d0
      Nick Piggin authored
      Long lived dcache "multi-step" operations which retry on rename seq can
      be starved with a lot of rename activity. If they fail after the 1st pass,
      take the rename_lock for writing to avoid further starvation.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      58db63d0
    • Nick Piggin's avatar
      fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin authored
      dcache_lock no longer protects anything. remove it.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b5c84bf6
    • Nick Piggin's avatar
      fs: Use rename lock and RCU for multi-step operations · 949854d0
      Nick Piggin authored
      The remaining usages for dcache_lock is to allow atomic, multi-step read-side
      operations over the directory tree by excluding modifications to the tree.
      Also, to walk in the leaf->root direction in the tree where we don't have
      a natural d_lock ordering.
      
      This could be accomplished by taking every d_lock, but this would mean a
      huge number of locks and actually gets very tricky.
      
      Solve this instead by using the rename seqlock for multi-step read-side
      operations, retry in case of a rename so we don't walk up the wrong parent.
      Concurrent dentry insertions are not serialised against.  Concurrent deletes
      are tricky when walking up the directory: our parent might have been deleted
      when dropping locks so also need to check and retry for that.
      
      We can also use the rename lock in cases where livelock is a worry (and it
      is introduced in subsequent patch).
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      949854d0
    • Nick Piggin's avatar
      fs: increase d_name lock coverage · 9abca360
      Nick Piggin authored
      Cover d_name with d_lock in more cases, where there may be concurrent
      modification to it.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      9abca360
    • Nick Piggin's avatar
      fs: scale inode alias list · b23fb0a6
      Nick Piggin authored
      Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
      from concurrent modification. d_alias is also protected by d_lock.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b23fb0a6
    • Nick Piggin's avatar
      fs: dcache scale subdirs · 2fd6b7f5
      Nick Piggin authored
      Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
      using dcache_lock for these anyway (eg. using i_mutex).
      
      Note: if we change the locking rule in future so that ->d_child protection is
      provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
      But it would be an exception to an otherwise regular locking scheme, so we'd
      have to see some good results. Probably not worthwhile.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      2fd6b7f5
    • Nick Piggin's avatar
      fs: dcache scale d_unhashed · da502956
      Nick Piggin authored
      Protect d_unhashed(dentry) condition with d_lock. This means keeping
      DCACHE_UNHASHED bit in synch with hash manipulations.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      da502956
    • Nick Piggin's avatar
      fs: dcache scale dentry refcount · b7ab39f6
      Nick Piggin authored
      Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
      0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
      we start protecting many other dentry members with d_lock.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b7ab39f6
    • Nick Piggin's avatar
      fs: dcache scale lru · 23044507
      Nick Piggin authored
      Add a new lock, dcache_lru_lock, to protect the dcache LRU list from concurrent
      modification. d_lru is also protected by d_lock, which allows LRU lists to be
      accessed without the lru lock, using RCU in future patches.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      23044507
    • Nick Piggin's avatar
      fs: dcache scale hash · 789680d1
      Nick Piggin authored
      Add a new lock, dcache_hash_lock, to protect the dcache hash table from
      concurrent modification. d_hash is also protected by d_lock.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      789680d1
    • Nick Piggin's avatar
      hostfs: simplify locking · ec2447c2
      Nick Piggin authored
      Remove dcache_lock locking from hostfs filesystem, and move it into dcache
      helpers. All that is required is a coherent path name. Protection from
      concurrent modification of the namespace after path name generation is not
      provided in current code, because dcache_lock is dropped before the path is
      used.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      ec2447c2
    • Nick Piggin's avatar
      fs: change d_hash for rcu-walk · b1e6a015
      Nick Piggin authored
      Change d_hash so it may be called from lock-free RCU lookups. See similar
      patch for d_compare for details.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      b1e6a015
    • Nick Piggin's avatar
      fs: change d_compare for rcu-walk · 621e155a
      Nick Piggin authored
      Change d_compare so it may be called from lock-free RCU lookups. This
      does put significant restrictions on what may be done from the callback,
      however there don't seem to have been any problems with in-tree fses.
      If some strange use case pops up that _really_ cannot cope with the
      rcu-walk rules, we can just add new rcu-unaware callbacks, which would
      cause name lookup to drop out of rcu-walk mode.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      621e155a
    • Nick Piggin's avatar
      fs: name case update method · fb2d5b86
      Nick Piggin authored
      smpfs and ncpfs want to update a live dentry name in-place. Rather than
      have them open code the locking, provide a documented dcache API.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      fb2d5b86
    • Nick Piggin's avatar
      jfs: dont overwrite dentry name in d_revalidate · 2bc334dc
      Nick Piggin authored
      Use vfat's method for dealing with negative dentries to preserve case,
      rather than overwrite dentry name in d_revalidate, which is a bit ugly
      and also gets in the way of doing lock-free path walking.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      2bc334dc
    • Nick Piggin's avatar
      cifs: dont overwrite dentry name in d_revalidate · 79eb4dde
      Nick Piggin authored
      Use vfat's method for dealing with negative dentries to preserve case,
      rather than overwrite dentry name in d_revalidate, which is a bit ugly
      and also gets in the way of doing lock-free path walking.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      79eb4dde
    • Nick Piggin's avatar
      fs: change d_delete semantics · fe15ce44
      Nick Piggin authored
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      fe15ce44
    • Nick Piggin's avatar
      fs: dcache documentation cleanup · 5eef7fa9
      Nick Piggin authored
      Remove redundant (and incorrect, since dcache RCU lookup) dentry locking
      documentation and point to the canonical document.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      5eef7fa9
    • Nick Piggin's avatar
      config fs: avoid switching ->d_op on live dentry · fbc8d4c0
      Nick Piggin authored
      Switching d_op on a live dentry is racy in general, so avoid it. In this case
      it is a negative dentry, which is safer, but there are still concurrent ops
      which may be called on d_op in that case (eg. d_revalidate). So in general
      a filesystem may not do this. Fix configfs so as not to do this.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      fbc8d4c0
    • Nick Piggin's avatar
      cgroup fs: avoid switching ->d_op on live dentry · 5adcee1d
      Nick Piggin authored
      Switching d_op on a live dentry is racy in general, so avoid it. In this case
      it is a negative dentry, which is safer, but there are still concurrent ops
      which may be called on d_op in that case (eg. d_revalidate). So in general
      a filesystem may not do this. Fix cgroupfs so as not to do this.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      5adcee1d
    • Nick Piggin's avatar
      fs: use fast counters for vfs caches · 3e880fb5
      Nick Piggin authored
      percpu_counter library generates quite nasty code, so unless you need
      to dynamically allocate counters or take fast approximate value, a
      simple per cpu set of counters is much better.
      
      The percpu_counter can never be made to work as well, because it has an
      indirection from pointer to percpu memory, and it can't use direct
      this_cpu_inc interfaces because it doesn't use static PER_CPU data, so
      code will always be worse.
      
      In the fastpath, it is the difference between this:
      
              incl %gs:nr_dentry      # nr_dentry
      
      and this:
      
              movl    percpu_counter_batch(%rip), %edx        # percpu_counter_batch,
              movl    $1, %esi        #,
              movq    $nr_dentry, %rdi        #,
              call    __percpu_counter_add    # (plus I clobber registers)
      
      __percpu_counter_add:
              pushq   %rbp    #
              movq    %rsp, %rbp      #,
              subq    $32, %rsp       #,
              movq    %rbx, -24(%rbp) #,
              movq    %r12, -16(%rbp) #,
              movq    %r13, -8(%rbp)  #,
              movq    %rdi, %rbx      # fbc, fbc
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              incl    -8124(%rax)     # <variable>.preempt_count
              movq    32(%rdi), %r12  # <variable>.counters, tcp_ptr__
      #APP
      # 78 "lib/percpu_counter.c" 1
              add %gs:this_cpu_off, %r12      # this_cpu_off, tcp_ptr__
      # 0 "" 2
      #NO_APP
              movslq  (%r12),%r13     #* tcp_ptr__, tmp73
              movslq  %edx,%rax       # batch, batch
              addq    %rsi, %r13      # amount, count
              cmpq    %rax, %r13      # batch, count
              jge     .L27    #,
              negl    %edx    # tmp76
              movslq  %edx,%rdx       # tmp76, tmp77
              cmpq    %rdx, %r13      # tmp77, count
              jg      .L28    #,
      .L27:
              movq    %rbx, %rdi      # fbc,
              call    _raw_spin_lock  #
              addq    %r13, 8(%rbx)   # count, <variable>.count
              movq    %rbx, %rdi      # fbc,
              movl    $0, (%r12)      #,* tcp_ptr__
              call    _raw_spin_unlock        #
      .L29:
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              decl    -8124(%rax)     # <variable>.preempt_count
              movq    -8136(%rax), %rax       #, D.14625
              testb   $8, %al #, D.14625
              jne     .L32    #,
      .L31:
              movq    -24(%rbp), %rbx #,
              movq    -16(%rbp), %r12 #,
              movq    -8(%rbp), %r13  #,
              leave
              ret
              .p2align 4,,10
              .p2align 3
      .L28:
              movl    %r13d, (%r12)   # count,*
              jmp     .L29    #
      .L32:
              call    preempt_schedule        #
              .p2align 4,,6
              jmp     .L31    #
              .size   __percpu_counter_add, .-__percpu_counter_add
              .p2align 4,,15
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      3e880fb5
    • Nick Piggin's avatar
      vfs: revert per-cpu nr_unused counters for dentry and inodes · 86c8749e
      Nick Piggin authored
      The nr_unused counters count the number of objects on an LRU, and as such they
      are synchronized with LRU object insertion and removal and scanning, and
      protected under the LRU lock.
      
      Making it per-cpu does not actually get any concurrency improvements because of
      this lock, and summing the counter is much slower, and
      incrementing/decrementing it costs more code size and is slower too.
      
      These counters should stay per-LRU, which currently means global.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      86c8749e
    • Nick Piggin's avatar
      kernel: kmem_ptr_validate considered harmful · ccd35fb9
      Nick Piggin authored
      This is a nasty and error prone API. It is no longer used, remove it.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      ccd35fb9
    • Nick Piggin's avatar
      fs: d_validate fixes · 786a5e15
      Nick Piggin authored
      d_validate has been broken for a long time.
      
      kmem_ptr_validate does not guarantee that a pointer can be dereferenced
      if it can go away at any time. Even rcu_read_lock doesn't help, because
      the pointer might be queued in RCU callbacks but not executed yet.
      
      So the parent cannot be checked, nor the name hashed. The dentry pointer
      can not be touched until it can be verified under lock. Hashing simply
      cannot be used.
      
      Instead, verify the parent/child relationship by traversing parent's
      d_child list. It's slow, but only ncpfs and the destaged smbfs care
      about it, at this point.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      786a5e15
  2. 05 Jan, 2011 2 commits
  3. 04 Jan, 2011 5 commits