Commit 0fd44ab2 authored by Liu Shixin, committed by Andrew Morton

mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM

Patch series "Fix I/O high when memory almost met memcg limit", v2.

Recently, when installing a package in a docker container that had almost
reached its memory limit, the installer became unresponsive for more than
15 minutes.  During this period, I/O stayed high (~1G/s) and affected the
whole machine.  I've constructed a test case as follows:

  1. create a docker container:

	$ cat test.sh
	#!/bin/bash
  
	docker rm centos7 --force

	docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init
	docker start centos7
	sleep 1

	docker cp ./alloc_page centos7:/
	docker cp ./reproduce.sh centos7:/

	docker exec -it centos7 /bin/bash

  2. try to reproduce the problem in the container:

	$ cat reproduce.sh
	#!/bin/bash
  
	while true; do
		flag=$(ps -ef | grep -v grep | grep alloc_page| wc -l)
		if [ "$flag" -eq 0 ]; then
			/alloc_page &
		fi

		sleep 30

		start_time=$(date +%s)
		yum install -y expect > /dev/null 2>&1

		end_time=$(date +%s)

		elapsed_time=$((end_time - start_time))

		echo "$elapsed_time seconds"
		yum remove -y expect > /dev/null 2>&1
	done

	$ cat alloc_page.c
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <string.h>

	#define SIZE (1*1024*1024) /* 1M */

	int main()
	{
		void *addr = NULL;
		int i;

		for (i = 0; i < 1024 * 6 - 50;i++) {
			addr = (void *)malloc(SIZE);
			if (!addr)
				return -1;

			memset(addr, 0, SIZE);
		}

		sleep(99999);
		return 0;
	}
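
As a rough check of how close alloc_page pushes the container to its limit,
the arithmetic can be sketched as below.  This assumes that docker's
--memory-swap value is the combined memory-plus-swap limit and ignores
everything else running in the container; the helper names are hypothetical
and are not part of the reproducer.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Bytes touched by alloc_page: (1024 * 6 - 50) chunks of 1 MiB each. */
uint64_t alloc_page_bytes(void)
{
	return (1024ULL * 6 - 50) * MIB;
}

/*
 * Headroom left under a combined memory+swap limit (in bytes),
 * assuming alloc_page is the only consumer in the container.
 */
uint64_t headroom_bytes(uint64_t memsw_limit)
{
	return memsw_limit - alloc_page_bytes();
}
```

With the 6G limit from test.sh, headroom_bytes(6 * 1024 * MIB) comes to
50 MiB, and the allocation alone exceeds the 4G --memory value, so part of
it has to sit in swap.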


We found that this problem is caused by a lot of meaningless read-ahead.
Since the container has almost reached its memory limit, pages are
reclaimed immediately after read-ahead and are then read ahead again right
away.  The program runs slowly and wastes a lot of I/O resources.

These two patches aim to break the read-ahead loop in the above scenario.

[1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@huawei.com/T/
[2] https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/
[3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/


This patch (of 2):

When filemap_add_folio() returns -ENOMEM, break the read-ahead loop, as is
already done when filemap_alloc_folio() fails.
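
The intended control flow can be illustrated with a small userspace model.
This is only a sketch, not kernel code: model_readahead(), struct ra_stats
and the two stub callbacks are hypothetical names, the callback stands in
for filemap_add_folio(), and the batch submission and index skipping done
by the real loop are reduced to a plain continue.

```c
#include <errno.h>

/* Hypothetical stand-in for filemap_add_folio(): 0 or a negative errno. */
typedef int (*add_folio_fn)(unsigned long index);

struct ra_stats {
	unsigned long attempts;	/* calls made to the add callback */
	unsigned long added;	/* folios successfully added */
};

/*
 * Simplified model of the read-ahead loop.  With break_on_enomem set,
 * the loop stops at the first -ENOMEM; any other error only skips the
 * current index, as before.
 */
struct ra_stats model_readahead(unsigned long index, unsigned long nr_to_read,
				add_folio_fn add, int break_on_enomem)
{
	struct ra_stats st = { 0, 0 };
	unsigned long i;

	for (i = 0; i < nr_to_read; i++) {
		int ret = add(index + i);

		st.attempts++;
		if (ret < 0) {
			if (ret == -ENOMEM && break_on_enomem)
				break;
			continue;	/* skip this index and keep going */
		}
		st.added++;
	}
	return st;
}

/* Stub: every insertion fails as if the memcg were at its limit. */
int always_enomem(unsigned long index)
{
	(void)index;
	return -ENOMEM;
}

/* Stub: odd indices are already present, even ones succeed. */
int exists_at_odd(unsigned long index)
{
	return (index & 1) ? -EEXIST : 0;
}
```

In this model, a 32-page request against always_enomem makes 32 attempts
without the break and a single attempt with it, while -EEXIST from
exists_at_odd still only skips the affected indices.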

Link: https://lkml.kernel.org/r/20240322093555.226789-1-liushixin2@huawei.com
Link: https://lkml.kernel.org/r/20240322093555.226789-2-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent f238b8c3
@@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	 */
 	for (i = 0; i < nr_to_read; i++) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
+		int ret;
 
 		if (folio && !xa_is_value(folio)) {
 			/*
@@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		folio = filemap_alloc_folio(gfp_mask, 0);
 		if (!folio)
 			break;
-		if (filemap_add_folio(mapping, folio, index + i,
-					gfp_mask) < 0) {
+
+		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
+		if (ret < 0) {
 			folio_put(folio);
+			if (ret == -ENOMEM)
+				break;
 			read_pages(ractl);
 			ractl->_index++;
 			i = ractl->_index + ractl->_nr_pages - index - 1;