Commit 0fd44ab2, authored Mar 22, 2024 by Liu Shixin, committed by Andrew Morton Apr 25, 2024

mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM

Patch series "Fix I/O high when memory almost met memcg limit", v2.

Recently, when installing a package in a Docker container that had almost
reached its memory limit, the installer stopped responding for more than
15 minutes.  During this period, I/O stayed high (~1G/s) and affected the
whole machine.  I've constructed a reproducer as follows:

  1. Create a Docker container:

	$ cat test.sh
	#!/bin/bash
  
	docker rm centos7 --force

	docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init
	docker start centos7
	sleep 1

	docker cp ./alloc_page centos7:/
	docker cp ./reproduce.sh centos7:/

	docker exec -it centos7 /bin/bash

  2. Try to reproduce the problem inside the container:

	$ cat reproduce.sh
	#!/bin/bash
  
	while true; do
		flag=$(ps -ef | grep -v grep | grep alloc_page| wc -l)
		if [ "$flag" -eq 0 ]; then
			/alloc_page &
		fi

		sleep 30

		start_time=$(date +%s)
		yum install -y expect > /dev/null 2>&1

		end_time=$(date +%s)

		elapsed_time=$((end_time - start_time))

		echo "$elapsed_time seconds"
		yum remove -y expect > /dev/null 2>&1
	done

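	$ cat alloc_page.c
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <string.h>

	#define SIZE (1 * 1024 * 1024) /* 1M */

	int main(void)
	{
		void *addr = NULL;
		int i;

		/* Allocate and touch just under 6G, in 1M chunks. */
		for (i = 0; i < 1024 * 6 - 50; i++) {
			addr = malloc(SIZE);
			if (!addr)
				return -1;

			/* Write to the chunk so its pages are really allocated. */
			memset(addr, 0, SIZE);
		}

		/* Keep the memory charged (never freed) while yum runs. */
		sleep(99999);
		return 0;
	}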


We found that this problem is caused by a lot of meaningless read-ahead.
Since the container has almost reached its memory limit, pages are
reclaimed immediately after read-ahead and then have to be read ahead
again immediately.  The program executes slowly and wastes a lot of I/O
resources.

These two patches aim to break out of the read-ahead loop in the above
scenario.

[1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@huawei.com/T/
[2] https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/
[3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/


This patch (of 2):

When filemap_add_folio() returns -ENOMEM, break out of the read-ahead loop,
as is already done when filemap_alloc_folio() fails.

Link: https://lkml.kernel.org/r/20240322093555.226789-1-liushixin2@huawei.com
Link: https://lkml.kernel.org/r/20240322093555.226789-2-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

parent f238b8c3


@@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	 */
 	for (i = 0; i < nr_to_read; i++) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
+		int ret;
 
 		if (folio && !xa_is_value(folio)) {
 			/*
@@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		folio = filemap_alloc_folio(gfp_mask, 0);
 		if (!folio)
 			break;
-		if (filemap_add_folio(mapping, folio, index + i,
-					gfp_mask) < 0) {
+
+		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
+		if (ret < 0) {
 			folio_put(folio);
+			if (ret == -ENOMEM)
+				break;
 			read_pages(ractl);
 			ractl->_index++;
 			i = ractl->_index + ractl->_nr_pages - index - 1;
