    swap: add per-partition lock for swapfile · ec8acf20
    Shaohua Li authored
    swap_lock is heavily contended when I test swapping to 3 fast SSDs
    (it is even slightly slower than swapping to 2 such SSDs).  The main
    contention comes from swap_info_get().  This patch tries to close the
    gap by adding a new per-partition lock.
    
    Global data like nr_swapfiles, total_swap_pages, least_priority and
    swap_list are still protected by swap_lock.
    
    nr_swap_pages is now an atomic, so it can be changed without holding
    swap_lock.  In theory, get_swap_page() could find no swap pages even
    though free swap pages actually exist, but that does not sound like a
    big problem.
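
    A minimal userspace sketch of the lock-free counter idea, using C11
    atomics in place of the kernel's atomic type.  The names
    nr_swap_pages_sketch, sketch_slot_freed() and sketch_try_reserve() are
    hypothetical and exist only for illustration; this is not the code
    from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the atomic nr_swap_pages counter (starts at 0). */
static atomic_long nr_swap_pages_sketch;

/* Free path: bump the counter without taking any global lock. */
static void sketch_slot_freed(void)
{
    atomic_fetch_add(&nr_swap_pages_sketch, 1);
}

/*
 * Allocation path: the check and the decrement are separate atomic steps.
 * A caller can therefore see zero just before another thread frees a slot
 * and give up although a page is about to become available.
 */
static bool sketch_try_reserve(void)
{
    if (atomic_load(&nr_swap_pages_sketch) <= 0)
        return false;
    atomic_fetch_sub(&nr_swap_pages_sketch, 1);
    return true;
}
```

    The window between the load and the decrement is the "finds no swap
    pages" case described above; the caller simply fails that one
    allocation attempt.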
    
    Partition-specific data (as accessed by scan_swap_map() and so on) is
    protected only by swap_info_struct.lock.
    
    Changing swap_info_struct.flags requires holding both swap_lock and
    swap_info_struct.lock, because scan_swap_map() checks the flags.
    Reading the flags is OK with either of the locks held.
    
    If both swap_lock and swap_info_struct.lock must be held, we always
    take the former first to avoid deadlock.
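
    The locking rules above can be sketched in userspace as follows.
    pthread mutexes stand in for the kernel spinlocks, and every
    identifier prefixed with sketch_ or SKETCH_ is hypothetical (the flag
    value in particular is illustrative).  This shows the ordering
    discipline only, not the patch's implementation.

```c
#include <pthread.h>

#define SKETCH_WRITEOK 0x2u   /* illustrative flag bit, an assumption */

/* Stand-in for the global swap_lock. */
static pthread_mutex_t sketch_swap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for swap_info_struct with its per-partition lock. */
struct sketch_swap_info {
    pthread_mutex_t lock;
    unsigned int flags;
};

/* Writers of flags take both locks: global first, per-partition second. */
static void sketch_set_flags(struct sketch_swap_info *si,
                             unsigned int set, unsigned int clear)
{
    pthread_mutex_lock(&sketch_swap_lock);
    pthread_mutex_lock(&si->lock);
    si->flags = (si->flags & ~clear) | set;
    pthread_mutex_unlock(&si->lock);
    pthread_mutex_unlock(&sketch_swap_lock);
}

/* Readers of flags need only one of the two locks; here the per-partition one. */
static int sketch_is_writeok(struct sketch_swap_info *si)
{
    int ok;

    pthread_mutex_lock(&si->lock);
    ok = (si->flags & SKETCH_WRITEOK) != 0;
    pthread_mutex_unlock(&si->lock);
    return ok;
}
```

    Because every path that needs both locks takes them in the same
    order, two threads can never each hold one lock while waiting for the
    other.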
    
    swap_entry_free() can change swap_list.  To delete that code, we add a
    new highest_priority_index.  Whenever get_swap_page() is called, we
    check it.  If it's valid, we use it.
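
    One way such a hint could work is sketched below with C11 atomics.
    This is an assumption-laden illustration: the sketch_ names, the -1
    "no hint" sentinel, the fixed-size priority table and the exact
    comparison rules are all hypothetical and are not taken from the
    patch.

```c
#include <stdatomic.h>

#define SKETCH_MAX_SWAPFILES 4

/* Hypothetical per-device priorities; higher means preferred. */
static int sketch_prio[SKETCH_MAX_SWAPFILES];

/* Hint naming the best device that recently freed space; -1 means none. */
static atomic_int sketch_hp_index = -1;

/* Free path: publish `type` as the hint unless a better hint is already set. */
static void sketch_note_freed(int type)
{
    int idx = atomic_load(&sketch_hp_index);

    do {
        if (idx != -1 && sketch_prio[idx] >= sketch_prio[type])
            return;
        /* On failure the CAS reloads idx, so the check above is redone. */
    } while (!atomic_compare_exchange_weak(&sketch_hp_index, &idx, type));
}

/* Allocation path: consume the hint and use it if it beats `current`. */
static int sketch_pick_start(int current)
{
    int hint = atomic_exchange(&sketch_hp_index, -1);

    if (hint != -1 && hint != current &&
        sketch_prio[hint] > sketch_prio[current])
        return hint;
    return current;
}
```

    With this shape the free path never touches the shared list; it only
    leaves a hint that the next allocation may pick up or ignore.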
    
    It's a pity that get_swap_page() still holds swap_lock.  But in
    practice, swap_lock isn't heavily contended in my test with this patch
    (or I can say there are other, much heavier bottlenecks like TLB
    flush).  And BTW, it looks like get_swap_page() doesn't really need the
    lock.  We never free swap_info[] and we check the SWP_WRITEOK flag.
    The only risk without the lock is that we could swap out to some
    low-priority swap, but we can quickly recover after several rounds of
    swap, so it doesn't sound like a big deal to me.  But I'd prefer to fix
    this if it's a real problem.
    
    "swap: make each swap partition have one address_space" improved the
    swapout speed from 1.7G/s to 2G/s.  This patch further improves the
    speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
    so TLB flush isn't the biggest bottleneck before the patches.
    
    [arnd@arndb.de: fix it for nommu]
    [hughd@google.com: add missing unlock]
    [minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
    Signed-off-by: Shaohua Li <shli@fusionio.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
    Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>