    mm: introduce VM_MIXEDMAP · b379d790
    Jared Hulbert authored
    This series introduces some important infrastructure work.  The overall result
    is that:
    
    1. We now support XIP-backed filesystems using memory that has no
       struct page allocated to it. Patches 6 and 7 actually implement
       this for s390.
    
       This is pretty important in a number of cases. As far as I understand,
       in the case of virtualisation (e.g. s390), each guest may mount a
       read-only copy of the same filesystem (e.g. the distro). Currently,
       each guest needs to allocate struct pages for this image. So if you have
       100 guests, you already need to allocate more memory for the struct
       pages than the size of the image. I think. (Carsten?)
    
       For other (e.g. embedded) systems, you may have a very large
       non-volatile filesystem. If you have to have struct pages for this, then
       your RAM consumption will go up in proportion to the fs size. Even
       though that is only a small fraction of the fs size, RAM can be much
       more costly, e.g. in terms of power, so every KB less that Linux uses
       makes it more attractive to a lot of these guys.
    
    2. VM_MIXEDMAP allows us to support mappings where you actually do want
       to refcount _some_ pages in the mapping, but not others, and support
       COW on arbitrary (non-linear) mappings. Jared needs this for his NVRAM
       filesystem in progress. Future iterations of this filesystem will
       most likely want to migrate pages between pagecache and XIP backing,
       which is where the requirement for mixed (some refcounted, some not)
       comes from.
    
    3. pte_special also has a peripheral usage that I need for my lockless
       get_user_pages patch. That was shown to speed up "oltp" on db2 by
       10% on a 2 socket system, which is kind of significant because they
       scrounge for months to try to find 0.1% improvement on these
       workloads. I'm hoping we might finally be faster than AIX on
       pSeries with this :). My reference to lockless get_user_pages is not
       meant to justify this patchset (which doesn't include lockless gup),
       but just to show that pte_special is not some s390-specific thing that
       should be hidden in arch code or xip code: I definitely want to use it
       on at least x86 and powerpc as well.
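
    To make the struct page overhead in point 1 concrete, here is a rough
    back-of-the-envelope sketch. The page size and struct page size are passed
    in as parameters because they depend on architecture and kernel config;
    the 4 KiB and 64 byte figures used in the note below are assumptions, not
    numbers from this series, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of struct page metadata that ONE guest needs in order to cover an
 * image of image_bytes. page_size and struct_page_size are caller-supplied
 * assumptions (they vary by architecture and config).
 */
uint64_t struct_page_bytes(uint64_t image_bytes, uint64_t page_size,
                           uint64_t struct_page_size)
{
    assert(page_size > 0);
    /* Round up: a partially used page still needs a struct page. */
    uint64_t pages = (image_bytes + page_size - 1) / page_size;
    return pages * struct_page_size;
}

/* Total metadata when every guest allocates its own struct pages. */
uint64_t total_struct_page_bytes(uint64_t image_bytes, uint64_t page_size,
                                 uint64_t struct_page_size, unsigned guests)
{
    return (uint64_t)guests *
           struct_page_bytes(image_bytes, page_size, struct_page_size);
}
```

    Under those assumed sizes, a 1 GiB image costs each guest 16 MiB of
    struct pages, so 100 guests together spend about 1600 MiB on metadata for
    a single shared 1 GiB image.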
    
    This patch:
    
    Introduce a new type of mapping, VM_MIXEDMAP.  This is unlike VM_PFNMAP in
    that it can support COW mappings of arbitrary ranges including ranges without
    struct page *and* ranges with a struct page that we actually want to refcount
    (PFNMAP can only support COW in those cases where the un-COW-ed translations
    are mapped linearly in the virtual address, and can only support
    non-refcounted ranges).
    
    VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not
    refcounting !pfn_valid pages (which is not an option for VM_PFNMAP, because it
    needs to avoid refcounting pfn_valid pages, e.g. for /dev/mem mappings).
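
    The difference between the two mapping types can be sketched as a small
    userspace model. This is only an illustration of the rule described above,
    not the code in this patch: every name prefixed with model_ or MODEL_ is
    hypothetical, and pfn validity is faked with a simple upper bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; the values are not the kernel's. */
#define MODEL_PAGE_SHIFT     12
#define MODEL_MAX_VALID_PFN  1024UL   /* pfns below this "have a struct page" */

enum { MODEL_VM_PFNMAP = 1 << 0, MODEL_VM_MIXEDMAP = 1 << 1 };

struct model_vma {
    unsigned flags;
    unsigned long vm_start;  /* first virtual address of the mapping */
    unsigned long vm_pgoff;  /* pfn behind vm_start in a linear PFNMAP */
    bool cow;                /* mapping is private and writable */
};

/* Fake pfn_valid(): true if the pfn has a struct page in this model. */
bool model_pfn_valid(unsigned long pfn)
{
    return pfn < MODEL_MAX_VALID_PFN;
}

/* Should the page behind this translation be refcounted? */
bool model_is_refcounted(const struct model_vma *vma, unsigned long addr,
                         unsigned long pfn)
{
    assert(addr >= vma->vm_start);

    /* MIXEDMAP: refcount exactly those pfns that have a struct page. */
    if (vma->flags & MODEL_VM_MIXEDMAP)
        return model_pfn_valid(pfn);

    if (vma->flags & MODEL_VM_PFNMAP) {
        unsigned long off = (addr - vma->vm_start) >> MODEL_PAGE_SHIFT;

        /* Still the original linear translation: never refcounted,
         * even if the pfn happens to have a struct page. */
        if (pfn == vma->vm_pgoff + off)
            return false;
        /* Off the linear layout: only a COW copy is a normal page. */
        return vma->cow;
    }

    return true;  /* ordinary mapping */
}
```

    Note how the PFNMAP branch has to rely on the linear layout to tell raw
    translations from COW copies, while the MIXEDMAP branch can decide per pfn
    and therefore works for arbitrary layouts.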

    Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Acked-by: Carsten Otte <cotte@de.ibm.com>
    Cc: Jared Hulbert <jaredeh@gmail.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory.c 74.1 KB