    xfs: improve buffer cache hash scalability · 9bc08a45
    Dave Chinner authored
    When doing large parallel file creates on a 16p machine, a large amount of
    time is being spent in _xfs_buf_find(). A system-wide profile with perf top
    shows this:
    
              1134740.00 19.3% _xfs_buf_find
               733142.00 12.5% __ticket_spin_lock
    
    The problem is that the hash contains 45,000 buffers, and the hash table width
    is only 256 buffers. That means we've got around 200 buffers per chain, and
    searching it is quite expensive. The hash table size needs to increase.
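The chain-length estimate above is simple arithmetic, sketched here in C. This is only an illustration, not code from the patch: `avg_chain_len` is a hypothetical helper, it assumes the hash spreads buffers evenly, and the 8192-bucket figure in the usage note is an arbitrary example rather than the size chosen by this commit.

```c
#include <assert.h>

/*
 * Hypothetical helper: average number of buffers per hash chain,
 * assuming a perfectly uniform hash. Integer division rounds down.
 */
static unsigned int avg_chain_len(unsigned int nbufs, unsigned int nbuckets)
{
    assert(nbuckets != 0);  /* a table with no buckets is meaningless */
    return nbufs / nbuckets;
}
```

With the numbers from the profile, `avg_chain_len(45000, 256)` gives 175, in line with the "around 200" estimate. An illustrative wider table, `avg_chain_len(45000, 8192)`, gives 5, which is why widening the table shortens each search.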
    
    Secondly, every time we do a lookup, we promote the buffer we find to the head
    of the hash chain. This is causing cachelines to be dirtied and causes
    invalidation of cachelines across all CPUs that may have walked the hash chain
    recently. Hence every walk of the hash chain is effectively a cold cache walk.
    Remove the promotion to avoid this invalidation.
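The difference between the two lookup styles can be sketched on a toy singly linked hash chain. This is a simplified model under stated assumptions, not the XFS code: `struct buf`, `struct bucket`, `lookup_plain` and `lookup_promote` are hypothetical names, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer and bucket types (hypothetical, not the XFS structures). */
struct buf {
    unsigned long key;
    struct buf *next;
};

struct bucket {
    struct buf *head;
};

/* Read-only lookup: walks the chain and never writes to it. */
static struct buf *lookup_plain(const struct bucket *b, unsigned long key)
{
    for (struct buf *p = b->head; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/*
 * Lookup with promotion: the buffer found is moved to the head of the
 * chain. Every hit stores to the bucket head and to list pointers,
 * dirtying cachelines that other CPUs walking this chain have cached.
 */
static struct buf *lookup_promote(struct bucket *b, unsigned long key)
{
    struct buf **link = &b->head;

    for (struct buf *p = b->head; p != NULL; link = &p->next, p = p->next) {
        if (p->key == key) {
            *link = p->next;    /* unlink from current position */
            p->next = b->head;  /* relink in front of the old head */
            b->head = p;
            return p;
        }
    }
    return NULL;
}
```

In this model `lookup_plain()` leaves the chain untouched, so repeated walks on different CPUs can share the cachelines, while `lookup_promote()` rewrites the chain on every hit.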
    
    The results are:
    
              1045043.00 21.2% __ticket_spin_lock
               326184.00  6.6% _xfs_buf_find
    
    A 70% drop in the CPU usage when looking up buffers. Unfortunately that does
    not result in an increase in performance under this workload as contention on
    the inode_lock soaks up most of the reduction in CPU usage.
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs_buf.c 44.8 KB