Commit 00501b53 authored by Johannes Weiner's avatar Johannes Weiner Committed by Linus Torvalds

mm: memcontrol: rewrite charge API

These patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages.  This drastically simplifies the code and
reduces charging and uncharging overhead.  The most expensive part of
charging and uncharging is the page_cgroup bit spinlock, which is removed
entirely after this series.

Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
 executing in the root memcg).  Before:

    15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
    13.31%              cat  [kernel.kallsyms]   [k] memset
    11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
     4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
     2.38%              cat  [kernel.kallsyms]   [k] put_page
     2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
     2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
     1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
     1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
     1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn

After:

    15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
    13.48%           cat  [kernel.kallsyms]   [k] memset
    11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
     3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
     2.46%           cat  [kernel.kallsyms]   [k] put_page
     2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
     1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
     1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
     1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
     1.30%           cat  [kernel.kallsyms]   [k] kfree

As you can see, the memcg footprint has shrunk quite a bit.

   text    data     bss     dec     hex filename
  37970    9892     400   48262    bc86 mm/memcontrol.o.old
  35239    9892     400   45531    b1db mm/memcontrol.o

This patch (of 4):

The memcg charge API charges pages before they are rmapped - i.e.  have an
actual "type" - and so every callsite needs its own set of charge and
uncharge functions to know what type is being operated on.  Worse,
uncharge has to happen from a context that is still type-specific, rather
than at the end of the page's lifetime with exclusive access, and so
requires a lot of synchronization.

Rewrite the charge API to provide a generic set of try_charge(),
commit_charge() and cancel_charge() transaction operations, much like
what's currently done for swap-in:

  mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
  pages from the memcg if necessary.

  mem_cgroup_commit_charge() commits the page to the charge once it
  has a valid page->mapping and PageAnon() reliably tells the type.

  mem_cgroup_cancel_charge() aborts the transaction.

This reduces the charge API and enables subsequent patches to
drastically simplify uncharging.

As pages need to be committed after rmap is established but before they
are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
additions again.  Revive lru_cache_add_active_or_unevictable().

[hughd@google.com: fix shmem_unuse]
[hughd@google.com: Add comments on the private use of -EAGAIN]
Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: default avatarHugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 4449a51a
......@@ -24,24 +24,7 @@ Please note that implementation details can be changed.
a page/swp_entry may be charged (usage += PAGE_SIZE) at
mem_cgroup_charge_anon()
Called at new page fault and Copy-On-Write.
mem_cgroup_try_charge_swapin()
Called at do_swap_page() (page fault on swap entry) and swapoff.
Followed by charge-commit-cancel protocol. (With swap accounting)
At commit, a charge recorded in swap_cgroup is removed.
mem_cgroup_charge_file()
Called at add_to_page_cache()
mem_cgroup_cache_charge_swapin()
Called at shmem's swapin.
mem_cgroup_prepare_migration()
Called before migration. "extra" charge is done and followed by
charge-commit-cancel protocol.
At commit, charge against oldpage or newpage will be committed.
mem_cgroup_try_charge()
2. Uncharge
a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
......@@ -69,19 +52,14 @@ Please note that implementation details can be changed.
to new page is committed. At failure, charge to old page is committed.
3. charge-commit-cancel
In some case, we can't know this "charge" is valid or not at charging
(because of races).
To handle such case, there are charge-commit-cancel functions.
mem_cgroup_try_charge_XXX
mem_cgroup_commit_charge_XXX
mem_cgroup_cancel_charge_XXX
these are used in swap-in and migration.
Memcg pages are charged in two steps:
mem_cgroup_try_charge()
mem_cgroup_commit_charge() or mem_cgroup_cancel_charge()
At try_charge(), there are no flags to say "this page is charged".
at this point, usage += PAGE_SIZE.
At commit(), the function checks the page should be charged or not
and set flags or avoid charging.(usage -= PAGE_SIZE)
At commit(), the page is associated with the memcg.
At cancel(), simply usage -= PAGE_SIZE.
......
......@@ -54,28 +54,11 @@ struct mem_cgroup_reclaim_cookie {
};
#ifdef CONFIG_MEMCG
/*
* All "charge" functions with gfp_mask should use GFP_KERNEL or
* (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
* alloc memory but reclaims memory from all available zones. So, "where I want
* memory from" bits of gfp_mask has no meaning. So any bits of that field is
* available but adding a rule is better. charge functions' gfp_mask should
* be set to GFP_KERNEL or gfp_mask & GFP_RECLAIM_MASK for avoiding ambiguous
* codes.
* (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.)
*/
extern int mem_cgroup_charge_anon(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
/* for swap handling */
extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
extern void mem_cgroup_commit_charge_swapin(struct page *page,
struct mem_cgroup *memcg);
extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
extern int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask);
int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask, struct mem_cgroup **memcgp);
void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
bool lrucare);
void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
......@@ -233,30 +216,22 @@ void mem_cgroup_print_bad_page(struct page *page);
#else /* CONFIG_MEMCG */
struct mem_cgroup;
static inline int mem_cgroup_charge_anon(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask)
{
return 0;
}
static inline int mem_cgroup_charge_file(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask)
{
return 0;
}
static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
struct page *page, gfp_t gfp_mask, struct mem_cgroup **memcgp)
static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask,
struct mem_cgroup **memcgp)
{
*memcgp = NULL;
return 0;
}
static inline void mem_cgroup_commit_charge_swapin(struct page *page,
struct mem_cgroup *memcg)
static inline void mem_cgroup_commit_charge(struct page *page,
struct mem_cgroup *memcg,
bool lrucare)
{
}
static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
static inline void mem_cgroup_cancel_charge(struct page *page,
struct mem_cgroup *memcg)
{
}
......
......@@ -320,6 +320,9 @@ extern void swap_setup(void);
extern void add_page_to_unevictable_list(struct page *page);
extern void lru_cache_add_active_or_unevictable(struct page *page,
struct vm_area_struct *vma);
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *mask);
......
......@@ -167,6 +167,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
/* For mmu_notifiers */
const unsigned long mmun_start = addr;
const unsigned long mmun_end = addr + PAGE_SIZE;
struct mem_cgroup *memcg;
err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg);
if (err)
return err;
/* For try_to_free_swap() and munlock_vma_page() below */
lock_page(page);
......@@ -179,6 +184,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
get_page(kpage);
page_add_new_anon_rmap(kpage, vma, addr);
mem_cgroup_commit_charge(kpage, memcg, false);
lru_cache_add_active_or_unevictable(kpage, vma);
if (!PageAnon(page)) {
dec_mm_counter(mm, MM_FILEPAGES);
......@@ -200,6 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
err = 0;
unlock:
mem_cgroup_cancel_charge(kpage, memcg);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
unlock_page(page);
return err;
......@@ -315,18 +323,11 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
if (!new_page)
goto put_old;
if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL))
goto put_new;
__SetPageUptodate(new_page);
copy_highpage(new_page, old_page);
copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
ret = __replace_page(vma, vaddr, old_page, new_page);
if (ret)
mem_cgroup_uncharge_page(new_page);
put_new:
page_cache_release(new_page);
put_old:
put_page(old_page);
......
......@@ -31,6 +31,7 @@
#include <linux/security.h>
#include <linux/cpuset.h>
#include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
#include <linux/hugetlb.h>
#include <linux/memcontrol.h>
#include <linux/cleancache.h>
#include <linux/rmap.h>
......@@ -548,19 +549,24 @@ static int __add_to_page_cache_locked(struct page *page,
pgoff_t offset, gfp_t gfp_mask,
void **shadowp)
{
int huge = PageHuge(page);
struct mem_cgroup *memcg;
int error;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapBacked(page), page);
error = mem_cgroup_charge_file(page, current->mm,
gfp_mask & GFP_RECLAIM_MASK);
if (error)
return error;
if (!huge) {
error = mem_cgroup_try_charge(page, current->mm,
gfp_mask, &memcg);
if (error)
return error;
}
error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
if (error) {
mem_cgroup_uncharge_cache_page(page);
if (!huge)
mem_cgroup_cancel_charge(page, memcg);
return error;
}
......@@ -575,13 +581,16 @@ static int __add_to_page_cache_locked(struct page *page,
goto err_insert;
__inc_zone_page_state(page, NR_FILE_PAGES);
spin_unlock_irq(&mapping->tree_lock);
if (!huge)
mem_cgroup_commit_charge(page, memcg, false);
trace_mm_filemap_add_to_page_cache(page);
return 0;
err_insert:
page->mapping = NULL;
/* Leave page->index set: truncation relies upon it */
spin_unlock_irq(&mapping->tree_lock);
mem_cgroup_uncharge_cache_page(page);
if (!huge)
mem_cgroup_cancel_charge(page, memcg);
page_cache_release(page);
return error;
}
......
......@@ -715,13 +715,20 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
unsigned long haddr, pmd_t *pmd,
struct page *page)
{
struct mem_cgroup *memcg;
pgtable_t pgtable;
spinlock_t *ptl;
VM_BUG_ON_PAGE(!PageCompound(page), page);
if (mem_cgroup_try_charge(page, mm, GFP_TRANSHUGE, &memcg))
return VM_FAULT_OOM;
pgtable = pte_alloc_one(mm, haddr);
if (unlikely(!pgtable))
if (unlikely(!pgtable)) {
mem_cgroup_cancel_charge(page, memcg);
return VM_FAULT_OOM;
}
clear_huge_page(page, haddr, HPAGE_PMD_NR);
/*
......@@ -734,7 +741,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_none(*pmd))) {
spin_unlock(ptl);
mem_cgroup_uncharge_page(page);
mem_cgroup_cancel_charge(page, memcg);
put_page(page);
pte_free(mm, pgtable);
} else {
......@@ -742,6 +749,8 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr);
mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, haddr, pmd, entry);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
......@@ -827,13 +836,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
}
if (unlikely(mem_cgroup_charge_anon(page, mm, GFP_TRANSHUGE))) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
}
if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) {
mem_cgroup_uncharge_page(page);
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
......@@ -979,6 +982,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
struct page *page,
unsigned long haddr)
{
struct mem_cgroup *memcg;
spinlock_t *ptl;
pgtable_t pgtable;
pmd_t _pmd;
......@@ -999,20 +1003,21 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
__GFP_OTHER_NODE,
vma, address, page_to_nid(page));
if (unlikely(!pages[i] ||
mem_cgroup_charge_anon(pages[i], mm,
GFP_KERNEL))) {
mem_cgroup_try_charge(pages[i], mm, GFP_KERNEL,
&memcg))) {
if (pages[i])
put_page(pages[i]);
mem_cgroup_uncharge_start();
while (--i >= 0) {
mem_cgroup_uncharge_page(pages[i]);
memcg = (void *)page_private(pages[i]);
set_page_private(pages[i], 0);
mem_cgroup_cancel_charge(pages[i], memcg);
put_page(pages[i]);
}
mem_cgroup_uncharge_end();
kfree(pages);
ret |= VM_FAULT_OOM;
goto out;
}
set_page_private(pages[i], (unsigned long)memcg);
}
for (i = 0; i < HPAGE_PMD_NR; i++) {
......@@ -1041,7 +1046,11 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
pte_t *pte, entry;
entry = mk_pte(pages[i], vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
memcg = (void *)page_private(pages[i]);
set_page_private(pages[i], 0);
page_add_new_anon_rmap(pages[i], vma, haddr);
mem_cgroup_commit_charge(pages[i], memcg, false);
lru_cache_add_active_or_unevictable(pages[i], vma);
pte = pte_offset_map(&_pmd, haddr);
VM_BUG_ON(!pte_none(*pte));
set_pte_at(mm, haddr, pte, entry);
......@@ -1065,12 +1074,12 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
out_free_pages:
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
mem_cgroup_uncharge_start();
for (i = 0; i < HPAGE_PMD_NR; i++) {
mem_cgroup_uncharge_page(pages[i]);
memcg = (void *)page_private(pages[i]);
set_page_private(pages[i], 0);
mem_cgroup_cancel_charge(pages[i], memcg);
put_page(pages[i]);
}
mem_cgroup_uncharge_end();
kfree(pages);
goto out;
}
......@@ -1081,6 +1090,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
spinlock_t *ptl;
int ret = 0;
struct page *page = NULL, *new_page;
struct mem_cgroup *memcg;
unsigned long haddr;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
......@@ -1132,7 +1142,8 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
goto out;
}
if (unlikely(mem_cgroup_charge_anon(new_page, mm, GFP_TRANSHUGE))) {
if (unlikely(mem_cgroup_try_charge(new_page, mm,
GFP_TRANSHUGE, &memcg))) {
put_page(new_page);
if (page) {
split_huge_page(page);
......@@ -1161,7 +1172,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
put_user_huge_page(page);
if (unlikely(!pmd_same(*pmd, orig_pmd))) {
spin_unlock(ptl);
mem_cgroup_uncharge_page(new_page);
mem_cgroup_cancel_charge(new_page, memcg);
put_page(new_page);
goto out_mn;
} else {
......@@ -1170,6 +1181,8 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
pmdp_clear_flush(vma, haddr, pmd);
page_add_new_anon_rmap(new_page, vma, haddr);
mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
set_pmd_at(mm, haddr, pmd, entry);
update_mmu_cache_pmd(vma, address, pmd);
if (!page) {
......@@ -2413,6 +2426,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spinlock_t *pmd_ptl, *pte_ptl;
int isolated;
unsigned long hstart, hend;
struct mem_cgroup *memcg;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
......@@ -2423,7 +2437,8 @@ static void collapse_huge_page(struct mm_struct *mm,
if (!new_page)
return;
if (unlikely(mem_cgroup_charge_anon(new_page, mm, GFP_TRANSHUGE)))
if (unlikely(mem_cgroup_try_charge(new_page, mm,
GFP_TRANSHUGE, &memcg)))
return;
/*
......@@ -2510,6 +2525,8 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address);
mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
......@@ -2523,7 +2540,7 @@ static void collapse_huge_page(struct mm_struct *mm,
return;
out:
mem_cgroup_uncharge_page(new_page);
mem_cgroup_cancel_charge(new_page, memcg);
goto out_up_write;
}
......
This diff is collapsed.
......@@ -2049,6 +2049,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *dirty_page = NULL;
unsigned long mmun_start = 0; /* For mmu_notifiers */
unsigned long mmun_end = 0; /* For mmu_notifiers */
struct mem_cgroup *memcg;
old_page = vm_normal_page(vma, address, orig_pte);
if (!old_page) {
......@@ -2204,7 +2205,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
__SetPageUptodate(new_page);
if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL))
if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
goto oom_free_new;
mmun_start = address & PAGE_MASK;
......@@ -2234,6 +2235,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
ptep_clear_flush(vma, address, page_table);
page_add_new_anon_rmap(new_page, vma, address);
mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
* mmu page tables (such as kvm shadow page tables), we want the
......@@ -2271,7 +2274,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
new_page = old_page;
ret |= VM_FAULT_WRITE;
} else
mem_cgroup_uncharge_page(new_page);
mem_cgroup_cancel_charge(new_page, memcg);
if (new_page)
page_cache_release(new_page);
......@@ -2410,10 +2413,10 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
{
spinlock_t *ptl;
struct page *page, *swapcache;
struct mem_cgroup *memcg;
swp_entry_t entry;
pte_t pte;
int locked;
struct mem_cgroup *ptr;
int exclusive = 0;
int ret = 0;
......@@ -2489,7 +2492,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
goto out_page;
}
if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) {
ret = VM_FAULT_OOM;
goto out_page;
}
......@@ -2514,10 +2517,6 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
* while the page is counted on swap but not yet in mapcount i.e.
* before page_add_anon_rmap() and swap_free(); try_to_free_swap()
* must be called after the swap_free(), or it will never succeed.
* Because delete_from_swap_page() may be called by reuse_swap_page(),
* mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
* in page->private. In this case, a record in swap_cgroup is silently
* discarded at swap_free().
*/
inc_mm_counter_fast(mm, MM_ANONPAGES);
......@@ -2533,12 +2532,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (pte_swp_soft_dirty(orig_pte))
pte = pte_mksoft_dirty(pte);
set_pte_at(mm, address, page_table, pte);
if (page == swapcache)
if (page == swapcache) {
do_page_add_anon_rmap(page, vma, address, exclusive);
else /* ksm created a completely new copy */
mem_cgroup_commit_charge(page, memcg, true);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, address);
/* It's better to call commit-charge after rmap is established */
mem_cgroup_commit_charge_swapin(page, ptr);
mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
......@@ -2571,7 +2572,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
out:
return ret;
out_nomap:
mem_cgroup_cancel_charge_swapin(ptr);
mem_cgroup_cancel_charge(page, memcg);
pte_unmap_unlock(page_table, ptl);
out_page:
unlock_page(page);
......@@ -2627,6 +2628,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *page_table, pmd_t *pmd,
unsigned int flags)
{
struct mem_cgroup *memcg;
struct page *page;
spinlock_t *ptl;
pte_t entry;
......@@ -2660,7 +2662,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
__SetPageUptodate(page);
if (mem_cgroup_charge_anon(page, mm, GFP_KERNEL))
if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
goto oom_free_page;
entry = mk_pte(page, vma->vm_page_prot);
......@@ -2673,6 +2675,8 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
inc_mm_counter_fast(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address);
mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
set_pte_at(mm, address, page_table, entry);
......@@ -2682,7 +2686,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_unmap_unlock(page_table, ptl);
return 0;
release:
mem_cgroup_uncharge_page(page);
mem_cgroup_cancel_charge(page, memcg);
page_cache_release(page);
goto unlock;
oom_free_page:
......@@ -2919,6 +2923,7 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
{
struct page *fault_page, *new_page;
struct mem_cgroup *memcg;
spinlock_t *ptl;
pte_t *pte;
int ret;
......@@ -2930,7 +2935,7 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (!new_page)
return VM_FAULT_OOM;
if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL)) {
if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) {
page_cache_release(new_page);
return VM_FAULT_OOM;
}
......@@ -2950,12 +2955,14 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
goto uncharge_out;
}
do_set_pte(vma, address, new_page, pte, true, true);
mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
pte_unmap_unlock(pte, ptl);
unlock_page(fault_page);
page_cache_release(fault_page);
return ret;
uncharge_out:
mem_cgroup_uncharge_page(new_page);
mem_cgroup_cancel_charge(new_page, memcg);
page_cache_release(new_page);
return ret;
}
......
......@@ -1032,25 +1032,6 @@ void page_add_new_anon_rmap(struct page *page,
__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
hpage_nr_pages(page));
__page_set_anon_rmap(page, vma, address, 1);
VM_BUG_ON_PAGE(PageLRU(page), page);
if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
SetPageActive(page);
lru_cache_add(page);
return;
}
if (!TestSetPageMlocked(page)) {
/*
* We use the irq-unsafe __mod_zone_page_stat because this
* counter is not modified from interrupt context, and the pte
* lock is held(spinlock), which implies preemption disabled.
*/
__mod_zone_page_state(page_zone(page), NR_MLOCK,
hpage_nr_pages(page));
count_vm_event(UNEVICTABLE_PGMLOCKED);
}
add_page_to_unevictable_list(page);
}
/**
......
......@@ -621,7 +621,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
radswap = swp_to_radix_entry(swap);
index = radix_tree_locate_item(&mapping->page_tree, radswap);
if (index == -1)
return 0;
return -EAGAIN; /* tell shmem_unuse we found nothing */
/*
* Move _head_ to start search for next from here.
......@@ -680,7 +680,6 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
spin_unlock(&info->lock);
swap_free(swap);
}
error = 1; /* not an error, but entry was found */
}
return error;
}
......@@ -692,7 +691,7 @@ int shmem_unuse(swp_entry_t swap, struct page *page)
{
struct list_head *this, *next;
struct shmem_inode_info *info;
int found = 0;
struct mem_cgroup *memcg;
int error = 0;
/*
......@@ -707,26 +706,32 @@ int shmem_unuse(swp_entry_t swap, struct page *page)
* the shmem_swaplist_mutex which might hold up shmem_writepage().
* Charged back to the user (not to caller) when swap account is used.
*/
error = mem_cgroup_charge_file(page, current->mm, GFP_KERNEL);
error = mem_cgroup_try_charge(page, current->mm, GFP_KERNEL, &memcg);
if (error)
goto out;
/* No radix_tree_preload: swap entry keeps a place for page in tree */
error = -EAGAIN;
mutex_lock(&shmem_swaplist_mutex);
list_for_each_safe(this, next, &shmem_swaplist) {
info = list_entry(this, struct shmem_inode_info, swaplist);
if (info->swapped)
found = shmem_unuse_inode(info, swap, &page);
error = shmem_unuse_inode(info, swap, &page);
else
list_del_init(&info->swaplist);
cond_resched();
if (found)
if (error != -EAGAIN)
break;
/* found nothing in this: move on to search the next */
}
mutex_unlock(&shmem_swaplist_mutex);
if (found < 0)
error = found;
if (error) {
if (error != -ENOMEM)
error = 0;
mem_cgroup_cancel_charge(page, memcg);
} else
mem_cgroup_commit_charge(page, memcg, true);
out:
unlock_page(page);
page_cache_release(page);
......@@ -1030,6 +1035,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info;
struct shmem_sb_info *sbinfo;
struct mem_cgroup *memcg;
struct page *page;
swp_entry_t swap;
int error;
......@@ -1108,8 +1114,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto failed;
}
error = mem_cgroup_charge_file(page, current->mm,
gfp & GFP_RECLAIM_MASK);
error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg);
if (!error) {
error = shmem_add_to_page_cache(page, mapping, index,
swp_to_radix_entry(swap));
......@@ -1125,12 +1130,16 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
* Reset swap.val? No, leave it so "failed" goes back to
* "repeat": reading a hole and writing should succeed.
*/
if (error)
if (error) {
mem_cgroup_cancel_charge(page, memcg);
delete_from_swap_cache(page);
}
}
if (error)
goto failed;
mem_cgroup_commit_charge(page, memcg, true);
spin_lock(&info->lock);
info->swapped--;
shmem_recalc_inode(inode);
......@@ -1168,8 +1177,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
if (sgp == SGP_WRITE)
__SetPageReferenced(page);
error = mem_cgroup_charge_file(page, current->mm,
gfp & GFP_RECLAIM_MASK);
error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg);
if (error)
goto decused;
error = radix_tree_maybe_preload(gfp & GFP_RECLAIM_MASK);
......@@ -1179,9 +1187,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
radix_tree_preload_end();
}
if (error) {
mem_cgroup_uncharge_cache_page(page);
mem_cgroup_cancel_charge(page, memcg);
goto decused;
}
mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_anon(page);
spin_lock(&info->lock);
......
......@@ -687,6 +687,40 @@ void add_page_to_unevictable_list(struct page *page)
spin_unlock_irq(&zone->lru_lock);
}
/**
* lru_cache_add_active_or_unevictable
* @page: the page to be added to LRU
* @vma: vma in which page is mapped for determining reclaimability
*
* Place @page on the active or unevictable LRU list, depending on its
* evictability. Note that if the page is not evictable, it goes
* directly back onto it's zone's unevictable list, it does NOT use a
* per cpu pagevec.
*/
void lru_cache_add_active_or_unevictable(struct page *page,
struct vm_area_struct *vma)
{
VM_BUG_ON_PAGE(PageLRU(page), page);
if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
SetPageActive(page);
lru_cache_add(page);
return;
}
if (!TestSetPageMlocked(page)) {
/*
* We use the irq-unsafe __mod_zone_page_stat because this
* counter is not modified from interrupt context, and the pte
* lock is held(spinlock), which implies preemption disabled.
*/
__mod_zone_page_state(page_zone(page), NR_MLOCK,
hpage_nr_pages(page));
count_vm_event(UNEVICTABLE_PGMLOCKED);
}
add_page_to_unevictable_list(page);
}
/*
* If the page can not be invalidated, it is moved to the
* inactive list to speed up its reclaim. It is moved to the
......
......@@ -1106,15 +1106,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
if (mem_cgroup_try_charge_swapin(vma->vm_mm, page,
GFP_KERNEL, &memcg)) {
if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
ret = -ENOMEM;
goto out_nolock;
}
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!maybe_same_pte(*pte, swp_entry_to_pte(entry)))) {
mem_cgroup_cancel_charge_swapin(memcg);
mem_cgroup_cancel_charge(page, memcg);
ret = 0;
goto out;
}
......@@ -1124,11 +1123,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
get_page(page);
set_pte_at(vma->vm_mm, addr, pte,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache)
if (page == swapcache) {
page_add_anon_rmap(page, vma, addr);
else /* ksm created a completely new copy */
mem_cgroup_commit_charge(page, memcg, true);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr);
mem_cgroup_commit_charge_swapin(page, memcg);
mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
/*
* Move the page to the active list so it is not
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment