include/asm-sparc64/rmap.h · 865fadf052c506ef75db12ccafb42b25664cc62b · Kirill Smelkov / linux

Andrew Morton authored Jul 18, 2002
This is the "minimal rmap" patch, written by Rik, ported to 2.5 by Craig
Kulesa.

Basically,

before: When the page reclaim code decides that it has scanned too many
unreclaimable pages on the LRU it does a scan of process virtual
address spaces for pages to add to swapcache.  ptes pointing at the
page are unmapped as the scan proceeds.  When all ptes referring to a
page have been unmapped and it has been written to swap the page is
reclaimable.

after: When an anonymous page is encountered on the tail of the LRU we
use the rmap to check whether it has been referenced lately.  If it has
not, add it to swapcache.  When the page is again encountered on the
LRU, if it is still unreferenced then try to unmap all ptes which refer
to it in one hit, and if it is clean (i.e. on swap) then free it.
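
The "after" behaviour can be sketched as a small user-space model.  This
is only an illustration, not the patch's code: every toy_* name is
hypothetical, the pte is reduced to two flags, and the handling of a
dirty page (unmap it, leave it for writeout) is an assumption which the
text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct toy_pte {
	bool present;		/* pte currently maps the page */
	bool referenced;	/* models the hardware "accessed" bit */
};

struct toy_pte_chain {
	struct toy_pte *pte;
	struct toy_pte_chain *next;
};

struct toy_page {
	struct toy_pte_chain *chain;	/* rmap: every pte mapping this page */
	bool in_swapcache;
	bool dirty;			/* contents not yet written to swap */
};

enum toy_action { TOY_KEEP, TOY_ADDED_TO_SWAPCACHE, TOY_UNMAPPED, TOY_FREED };

/* Test and clear the referenced bit in every pte that maps the page. */
static bool toy_page_referenced(struct toy_page *page)
{
	bool referenced = false;

	for (struct toy_pte_chain *pc = page->chain; pc; pc = pc->next) {
		if (pc->pte->referenced) {
			pc->pte->referenced = false;
			referenced = true;
		}
	}
	return referenced;
}

/* Unmap all ptes referring to the page "in one hit". */
static void toy_unmap_all(struct toy_page *page)
{
	for (struct toy_pte_chain *pc = page->chain; pc; pc = pc->next)
		pc->pte->present = false;
	page->chain = NULL;
}

/* One encounter of an anonymous page at the tail of the LRU. */
static enum toy_action toy_reclaim_anon(struct toy_page *page)
{
	assert(page != NULL);

	if (toy_page_referenced(page))
		return TOY_KEEP;		/* recently used: leave it alone */
	if (!page->in_swapcache) {
		page->in_swapcache = true;	/* first unreferenced encounter */
		return TOY_ADDED_TO_SWAPCACHE;
	}
	toy_unmap_all(page);			/* second unreferenced encounter */
	if (!page->dirty)
		return TOY_FREED;		/* clean, i.e. already on swap */
	return TOY_UNMAPPED;			/* assumption: wait for writeout */
}
```

Calling toy_reclaim_anon() repeatedly on the same page walks it through
the states described above: kept while referenced, then added to
swapcache, then unmapped and (if clean) freed.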

The rest of the VM - list management, the classzone concept, etc
remains unchanged.

There are a number of things which the per-page pte chain could be
used for.  Bill Irwin has identified the following.


(1)  page replacement no longer goes around randomly unmapping things

(2)  referenced bits are more accurate because there aren't several ms
        or even seconds between finding the multiple ptes mapping a page

(3)  reduces page replacement from O(total virtually mapped) to O(physical)

(4)  enables defragmentation of physical memory

(5)  enables cooperative offlining of memory for friendly guest instance
        behavior in UML and/or LPAR settings

(6)  demonstrable benefit in performance of swapping which is common in
        end-user interactive workstation workloads (I don't like the word
        "desktop").  Cf. Craig Kulesa's post wrt. swapping performance

(7)  evidence from 2.4-based rmap trees indicates approximate parity
        with mainline in kernel compiles with appropriate locking bits

(8)  partitioning of physical memory can reduce the complexity of page
        replacement searches by scanning only the "interesting" zones
        (implemented and merged in 2.4-based rmap)

(9)  partitioning of physical memory can increase the parallelism of page
        replacement searches by independently processing different zones
        (implemented, but not merged in 2.4-based rmap)

(10) the reverse mappings may be used for efficiently keeping pte cache
        attributes coherent

(11) they may be used for virtual cache invalidation (with changes)

(12) the reverse mappings enable proper RSS limit enforcement
        (implemented and merged in 2.4-based rmap)



The code adds a pointer to struct page, consumes additional storage for
the pte chains and adds computational expense to the page reclaim code
(I measured it at 3% additional load during streaming I/O).  The
benefits which we get back for all this are, I must say, theoretical
and unproven.  If it has real advantages (or, indeed, disadvantages)
then why has nobody demonstrated them?
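
To make the storage cost concrete, here is a rough user-space sketch of
a per-page pte chain.  It is not the patch's implementation: the toy_*
names, the singly linked layout and the use of malloc() are all
assumptions made for illustration.  What it shows is the one extra
pointer in the page structure plus one chain element for every pte that
maps the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel types differ. */
typedef unsigned long toy_pte_t;

struct toy_pte_chain {
	struct toy_pte_chain *next;
	toy_pte_t *ptep;		/* one pte which maps the page */
};

struct toy_page {
	struct toy_pte_chain *pte_chain;	/* the extra pointer per page */
};

/* Record that *ptep now maps the page.  Returns 0, or -1 on allocation failure. */
static int toy_add_rmap(struct toy_page *page, toy_pte_t *ptep)
{
	struct toy_pte_chain *pc = malloc(sizeof(*pc));

	if (pc == NULL)
		return -1;
	pc->ptep = ptep;
	pc->next = page->pte_chain;
	page->pte_chain = pc;
	return 0;
}

/* Forget that *ptep maps the page.  Returns 0, or -1 if it was not on the chain. */
static int toy_remove_rmap(struct toy_page *page, toy_pte_t *ptep)
{
	struct toy_pte_chain **link;

	for (link = &page->pte_chain; *link; link = &(*link)->next) {
		if ((*link)->ptep == ptep) {
			struct toy_pte_chain *victim = *link;

			*link = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}

/* Number of chain elements, i.e. the number of mappings being tracked. */
static size_t toy_chain_length(const struct toy_page *page)
{
	size_t n = 0;

	for (const struct toy_pte_chain *pc = page->pte_chain; pc; pc = pc->next)
		n++;
	return n;
}
```

In this model a page mapped by N ptes costs N * sizeof(struct
toy_pte_chain) bytes on top of the pointer in the page structure, and
every map and unmap pays for a chain update.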



There are a number of things remaining to be done:

1: Demonstrate the above advantages.

2: Make it work with pte-highmem  (Bill Irwin is signed up for this)

3: The "don't add pte_chains to non-shared pages" optimisation (Dave
   McCracken's patch does this)

4: Move the pte_chains into highmem too (Bill, I guess)

5: per-cpu pte_chain freelists (Rik?)

6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)

7: multithread the page reclaim code.  (I have patches).

8: clustered add-to-swap.  Not sure if I buy this.  anon pages are
   often well-ordered-by-virtual-address on the LRU, so it "just
   works" for benchmarky loads.  But there may be some other loads...

9: Fix bad IO latency in page reclaim (I have lame patches)

10: Develop tuning tools, use them.

11: The nightly updatedb run is still evicting everything.