• Linus Torvalds's avatar
    vfs: get rid of batshit-insane pointless dentry hash calculations · 6d7d1a0d
    Linus Torvalds authored
    For some odd historical reason, the final mixing round for the dentry
    cache hash table lookup had an insane "xor with big constant" logic.  In
    two places.
    
    The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
    fairly random-looking number that is designed to be *multiplied* with so
    that the bits get spread out over a whole long-word.
    
    But xor'ing with it is insane.  It doesn't really even change the hash -
    it really only shifts the hash around in the hash table.  To make
    matters worse, the insane big constant is different on 32-bit and 64-bit
    builds, even though the name hash bits we use are always 32-bit (and the
    bits from the pointer we mix in effectively are too).
    
    It's all total voodoo programming, in other words.
    
    Now, some testing and analysis of the hash chains shows that the rest of
    the hash function seems to be fairly good.  It does pick the right bits
    of the parent dentry pointer, for example, and while it's generally a
    bad idea to use an xor to mix down the upper bits (because if there is a
    repeating pattern, the xor can cause "destructive interference"), it
    seems to not have been a disaster.
    
    For example, replacing the hash with the normal "hash_long()" code (that
    uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
    the hash worse.  The hand-picked hash knew which bits of the pointer had
    the highest entropy, and hash_long() ends up mixing bits less optimally
    at least in some trivial tests.
    
    So the hash function overall seems fine, it just has that really odd
    "shift result around by a constant xor".
    
    So get rid of the silly xor, and replace the down-mixing of the bits
    with an add instead of an xor that tends to not have the same kind of
    destructive interference issues.  Some stats on the resulting hash
    chains shows that they look statistically identical before and after,
    but the code is simpler and no longer makes you go "WTF?".
    
    Also, the incoming hash really is just "unsigned int", not a long, and
    there's no real point to worry about the high 26 bits of the dentry
    pointer for the 64-bit case, because they are all going to be identical
    anyway.
    
    So also change the hashing to be done in the more natural 'unsigned int'
    that is the real size of the actual hashed data anyway.
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    6d7d1a0d
dcache.c 77.4 KB