• NeilBrown's avatar
    rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
    NeilBrown authored
    This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
    bucket pointer to lock the hash chain for that bucket.
    
    The benefits of a bit spin_lock are:
     - no need to allocate a separate array of locks.
     - no need to have a configuration option to guide the
       choice of the size of this array
     - locking cost is often a single test-and-set in a cache line
       that will have to be loaded anyway.  When inserting at, or removing
       from, the head of the chain, the unlock is free - writing the new
       address in the bucket head implicitly clears the lock bit.
       For __rhashtable_insert_fast() we ensure this always happens
       when adding a new key.
     - even when lockings costs 2 updates (lock and unlock), they are
       in a cacheline that needs to be read anyway.
    
    The cost of using a bit spin_lock is a little bit of code complexity,
    which I think is quite manageable.
    
    Bit spin_locks are sometimes inappropriate because they are not fair -
    if multiple CPUs repeatedly contend of the same lock, one CPU can
    easily be starved.  This is not a credible situation with rhashtable.
    Multiple CPUs may want to repeatedly add or remove objects, but they
    will typically do so at different buckets, so they will attempt to
    acquire different locks.
    
    As we have more bit-locks than we previously had spinlocks (by at
    least a factor of two) we can expect slightly less contention to
    go with the slightly better cache behavior and reduced memory
    consumption.
    
    To enhance type checking, a new struct is introduced to represent the
      pointer plus lock-bit
    that is stored in the bucket-table.  This is "struct rhash_lock_head"
    and is empty.  A pointer to this needs to be cast to either an
    unsigned lock, or a "struct rhash_head *" to be useful.
    Variables of this type are most often called "bkt".
    
    Previously "pprev" would sometimes point to a bucket, and sometimes a
    ->next pointer in an rhash_head.  As these are now different types,
    pprev is NULL when it would have pointed to the bucket. In that case,
    'blk' is used, together with correct locking protocol.
    Signed-off-by: default avatarNeilBrown <neilb@suse.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    8f0db018
rhashtable.c 29.4 KB