    [PATCH] fix swapcache packing in the radix tree · 02eaba7f
    Andrew Morton authored
    First some terminology: this patch introduces a kernel-wide `pgoff_t'
    type.  It is the index of a page into the pagecache.  The thing at
    page->index.  For most mappings it is also the offset of the page into
    that mapping.  This type has a very distinct function in the kernel and
    it needs a name.  I don't have any particular plans to go and migrate
    everything so we can support 64-bit pagecache indices on x86, but this
    would be the way to do it.
    
    This patch improves the packing density of swapcache pages in the radix
    tree.
    
    A swapcache page is identified by the `swap type' (indexes the swap
    device) and the `offset' (into that swap device).  These two numbers
    are encoded into a `swp_entry_t' machine word in arch-specific code
    because the resulting number is placed into pagetables in a form which
    will generate a fault.
    
    The kernel also needs to generate a pgoff_t for that page to index it
    into the swapper_space radix tree.  That pgoff_t is usually
    bitwise-identical to the swp_entry_t.  That worked OK when the
    pagecache was using a hash.  But with a radix tree, it produces
    catastrophically bad results.
    
    x86 (and many other architectures) place the `type' field into the
    low-order bits of the swp_entry_t.  So *all* swapcache pages are
    basically identical in the eight low-order bits.  This produces a very
    sparse radix tree for swapcache.  I'm observing packing densities of 1%
    to 2%: so the typical 128-slot radix tree node has only one or two
    pages in it.
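
    To make the sparseness concrete, here is a small userspace sketch
    (not kernel code).  The bit layout, with `type' in bits 1..6 and
    `offset' from bit 8 upwards, is an assumption modelled on the
    description above, and the names old_index() and
    old_leaves_touched() are hypothetical.  It counts how many 128-slot
    leaf nodes a run of consecutive swap offsets lands in:

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_SHIFT        7   /* 128 slots per radix tree node, as in the text */
#define OLD_TYPE_SHIFT    1   /* assumed: type sits in the low-order bits */
#define OLD_TYPE_LIMIT    64  /* assumed: 6-bit type field */
#define OLD_OFFSET_SHIFT  8   /* assumed: offset starts above the low 8 bits */

/* Index used when the pgoff_t is bitwise-identical to the old entry. */
static unsigned long old_index(unsigned type, unsigned long offset)
{
    assert(type < OLD_TYPE_LIMIT);
    return (offset << OLD_OFFSET_SHIFT) |
           ((unsigned long)type << OLD_TYPE_SHIFT);
}

/*
 * Count the distinct leaf nodes used by n consecutive offsets on one
 * swap device.  Indices grow monotonically with offset, so counting
 * changes of leaf number is enough.
 */
static size_t old_leaves_touched(unsigned type, unsigned long first, size_t n)
{
    size_t leaves = 0;
    unsigned long prev_leaf = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long leaf = old_index(type, first + i) >> LEAF_SHIFT;

        if (i == 0 || leaf != prev_leaf)
            leaves++;
        prev_leaf = leaf;
    }
    return leaves;
}
```

    Under these assumptions adjacent offsets are 256 apart in index
    space, so every page of a contiguous swapout run needs its own leaf
    node.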
    
    The end result is that the kernel needs to allocate approximately one
    new radix-tree node for each page which is added to the swapcache.  So
    no wonder we're having radix-tree node exhaustion during swapout!
    (It's actually quite encouraging that the kernel works as well as it
    does).
    
    The patch changes the encoding of the swp_entry_t so that its
    most-significant bits contain the `type' field and the
    least-significant bits contain the `offset' field, right-aligned.
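
    A minimal userspace sketch of such an encoding follows.  It is an
    illustration, not the code from the patch: the 5-bit width of the
    type field and the names sketch_entry_t, make_entry(), entry_type()
    and entry_offset() are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

#define TYPE_BITS    5   /* assumed width of the type field */
#define WORD_BITS    (sizeof(unsigned long) * CHAR_BIT)
#define TYPE_SHIFT   (WORD_BITS - TYPE_BITS)
#define OFFSET_MASK  ((1UL << TYPE_SHIFT) - 1)

/* Hypothetical stand-in for an arch-independent swap entry word. */
typedef struct { unsigned long val; } sketch_entry_t;

/* Type goes in the most-significant bits, offset is right-aligned. */
static sketch_entry_t make_entry(unsigned type, unsigned long offset)
{
    sketch_entry_t e;

    assert(type < (1u << TYPE_BITS));
    e.val = ((unsigned long)type << TYPE_SHIFT) | (offset & OFFSET_MASK);
    return e;
}

static unsigned entry_type(sketch_entry_t e)
{
    return (unsigned)(e.val >> TYPE_SHIFT);
}

static unsigned long entry_offset(sketch_entry_t e)
{
    return e.val & OFFSET_MASK;
}
```

    With the offset right-aligned, consecutive offsets on one device
    differ by exactly one in index space, so an aligned run of 128 of
    them shares a single 128-slot node.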
    
    That is: the encoding in swp_entry_t is now arch-independent.  The new
    file <linux/swapops.h> has conversion functions which convert the
    swp_entry_t to and from its machine pte representation.
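
    The shape of such a conversion can be sketched in userspace as
    below.  This is not the contents of <linux/swapops.h>: the pte
    layout (present bit 0, type in the bits just above it, offset from
    bit 8) and the names entry_to_pte() and pte_to_entry() are
    assumptions chosen for illustration, and real architectures each
    have their own layout.

```c
#include <assert.h>
#include <limits.h>

/* Assumed arch-independent layout: type in the top bits. */
#define TYPE_BITS    5
#define WORD_BITS    (sizeof(unsigned long) * CHAR_BIT)
#define TYPE_SHIFT   (WORD_BITS - TYPE_BITS)
#define OFFSET_MASK  ((1UL << TYPE_SHIFT) - 1)

/* Assumed machine pte layout for a swapped-out page. */
#define PTE_PRESENT       0x1UL
#define PTE_TYPE_SHIFT    1
#define PTE_TYPE_MASK     ((1UL << TYPE_BITS) - 1)
#define PTE_OFFSET_SHIFT  8

/* Repack a generic entry word into the assumed pte form. */
static unsigned long entry_to_pte(unsigned long entry)
{
    unsigned long type = entry >> TYPE_SHIFT;
    unsigned long offset = entry & OFFSET_MASK;
    unsigned long pte = (offset << PTE_OFFSET_SHIFT) |
                        (type << PTE_TYPE_SHIFT);

    /* The present bit must stay clear so that an access faults. */
    assert((pte & PTE_PRESENT) == 0);
    return pte;
}

/* Recover the generic entry word from the assumed pte form. */
static unsigned long pte_to_entry(unsigned long pte)
{
    unsigned long type = (pte >> PTE_TYPE_SHIFT) & PTE_TYPE_MASK;
    unsigned long offset = pte >> PTE_OFFSET_SHIFT;

    return (type << TYPE_SHIFT) | (offset & OFFSET_MASK);
}
```

    The round trip is lossless only while the offset fits in the pte's
    offset field, which in this sketch is three bits narrower than the
    generic one.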
    
    Packing density in the swapper_space mapping goes up to around 90%
    (observed) and the kernel is tons happier under swap load.
    
    
    An alternative approach would be to create new conversion functions
    which convert an arch-specific swp_entry_t to and from a pgoff_t.  I
    tried that.  It worked, but I liked it less.