Commits · 30a5537f9a9e91937aad6a47f55683f7ce0be257 · nexedi / linux

12 Feb, 2015 4 commits

f2fs: trigger correct checkpoint during umount · 30a5537f

Jaegeuk Kim authored Jan 14, 2015

This patch triggers a checkpoint with the umount flag when kill_sb is called.
kill_sb eventually calls f2fs_sync_fs, but at that point f2fs cannot do a
checkpoint with CP_UMOUNT.
After that, f2fs_put_super does not do a checkpoint either, since the
filesystem is no longer dirty.

So, this patch adds a flag to indicate that f2fs_sync_fs is called during umount.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: update memory footprint information · 6f0aacbc

Jaegeuk Kim authored Jan 10, 2015

This patch adds the missing memory usage figures and breaks them down in detail.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong memory footprint statistics in debugfs · 9066c6a7

Chao Yu authored Jan 10, 2015

The memory footprint statistics shown in debugfs are not calculated
correctly. Fix it in this patch.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid infinite loop on cp_error · 871f599f

Jaegeuk Kim authored Jan 09, 2015

If cp_error is set, we should avoid all infinite loops.
f2fs_sync_file still has one such hole, and this patch fixes it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10 Jan, 2015 32 commits

f2fs: pids_lock can be static · 08e4126e

kbuild test robot authored Jan 09, 2015

fs/f2fs/trace.c:19:12: sparse: symbol 'pids_lock' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add f2fs_destroy_trace_ios to free radix tree · 351f4fba

Jaegeuk Kim authored Jan 07, 2015

This patch frees the radix tree after IO tracing has finished.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add spin_lock to cover radix operations in IO tracer · c0508650

Jaegeuk Kim authored Jan 07, 2015

This patch adds spin_lock to cover radix tree operations in IO tracer.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add nat/sit entries into status · dd4e4b59

Jaegeuk Kim authored Jan 07, 2015

This patch adds NAT/SIT entry information.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: free radix_tree_nodes used by nat_set entries · 7aed0d45

Jaegeuk Kim authored Jan 07, 2015

In the normal case, the radix_tree_nodes are freed successfully.
But when cp_error is detected, we should destroy them forcefully.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong unlock_page call · df199139

Jaegeuk Kim authored Jan 06, 2015

This patch removes a wrongly placed unlock_page call.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: get rid of kzalloc in __recover_inline_status · 9e5ba77f

Chao Yu authored Jan 06, 2015

We use kzalloc to allocate memory in __recover_inline_status, and use this
all-zero memory to check the inline data content of the inode page by comparing
the two. This is inefficient and unnecessary; let's check the inline data
content directly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: make the code more neat]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
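
As a rough userspace illustration of the idea (not the code from this patch),
a buffer can be tested for non-zero content by scanning it directly, with no
all-zero scratch allocation to compare against. The helper name
has_nonzero_byte is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from f2fs: returns true if any byte in
 * buf[0..len) is non-zero. It scans the buffer in place instead of
 * allocating a zeroed buffer of the same size and comparing the two.
 */
static bool has_nonzero_byte(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);

	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return true;
	return false;
}
```

The scan stops at the first non-zero byte, so a populated area is usually
detected early, and the allocation failure path disappears altogether.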

f2fs: align direct_io'ed data to section · 38aa0889

Jaegeuk Kim authored Jan 05, 2015

This patch aligns the start block address of a file for direct io to the f2fs
section size.

Some flash devices manage a page larger than 4KB as a write unit, and if the
direct_io'ed data are written but not aligned to that unit, performance can
be degraded due to the partial page copies.

Thus, since f2fs has a section that is well aligned to FTL units, we can align
the block address to the section size so that f2fs avoids this misalignment.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
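
The alignment arithmetic alone can be sketched as follows. This is only an
illustration of rounding a block address up to a section boundary, not the
allocation code the patch changes; align_to_section and its parameters
main_start and blocks_per_sec are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;	/* 32-bit block address, as in f2fs */

/*
 * Hypothetical helper: round blkaddr up to the next section boundary.
 * Sections are assumed to start at main_start and to span
 * blocks_per_sec blocks each.
 */
static block_t align_to_section(block_t blkaddr, block_t main_start,
				block_t blocks_per_sec)
{
	assert(blocks_per_sec > 0 && blkaddr >= main_start);

	block_t rem = (blkaddr - main_start) % blocks_per_sec;

	/* Already aligned: keep the address; otherwise skip to the boundary. */
	return rem ? blkaddr + (blocks_per_sec - rem) : blkaddr;
}
```

With 512-block sections starting at block 0, address 1000 would move up to
1024, while 1024 stays where it is.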

f2fs: remove uncovered code path · 41ef94b3

Jaegeuk Kim authored Dec 30, 2014

This patch removes unnecessary function calls.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid potential unnecessary codes · 3547ea96

Jaegeuk Kim authored Dec 30, 2014

This patch relocates some operations to avoid unnecessary execution.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up to remove parameter · e1509cf2

Jaegeuk Kim authored Dec 30, 2014

This patch uses dn->data_blkaddr as a parameter for the destination block
address.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively · 06292073

Chao Yu authored Dec 29, 2014

There are two slab caches, inode_entry_slab and winode_slab, whose entries use
the same structure, as shown below:

struct dir_inode_entry {
	struct list_head list;	/* list head */
	struct inode *inode;	/* vfs inode pointer */
};

struct inode_entry {
	struct list_head list;
	struct inode *inode;
};

It is a little wasteful that the two caches cannot share their memory space
with each other.
So in this patch we remove the redundant winode_slab slab cache, use the more
universal name struct inode_entry for the remaining data structure, and reuse
inode_entry_slab to store both dirty dir items and gc items, for more effective
slab usage.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio · 2ace38e0

Chao Yu authored Dec 24, 2014

Clean up the parameters of trace_f2fs_submit_{read_,write_,page_,page_m}bio by
passing fio as a single parameter.
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASS · 3e1c8f12

Chao Yu authored Dec 23, 2014

This patch adds the missing parameter _type_ to trace_f2fs_submit_page_bio, then
uses the DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to clean up some trace
event code related to f2fs_submit_page_{m,}bio.

Additionally, removing the redundant code reduces the code size:
   text    data     bss     dec     hex filename
 176787    8712      56  185555   2d4d3 f2fs.ko.org
 174408    8648      56  183112   2cb48 f2fs.ko
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
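
The size table can be cross-checked with a few lines of C: the dec column is
the sum of text, data and bss, so the two rows differ by 2443 bytes (2379 in
text, 64 in data). The struct and function names below are hypothetical; only
the numbers come from the table.

```c
#include <assert.h>

/* One row of size(1) output: section sizes in bytes. */
struct mod_size {
	unsigned long text, data, bss;
};

/* The "dec" column is the sum of the three sections. */
static unsigned long dec_total(struct mod_size s)
{
	return s.text + s.data + s.bss;
}

/* Rows copied from the table in the commit message. */
static const struct mod_size before = { 176787, 8712, 56 };	/* f2fs.ko.org */
static const struct mod_size after  = { 174408, 8648, 56 };	/* f2fs.ko */

/* Bytes saved by the cleanup, computed from the two rows. */
static unsigned long bytes_saved(void)
{
	assert(dec_total(before) >= dec_total(after));
	return dec_total(before) - dec_total(after);
}
```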

f2fs: fix missing cold bit during recovery · 09eb483e

Jaegeuk Kim authored Dec 23, 2014

In do_recover_data, we find and update the previous node pages after updating
their new block addresses.
After that, we call fill_node_footer without the reset field, which erases the
cold bit, so the new cold node block is written to the wrong log area.
This patch makes sure the old flag is not lost.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add block count by in-place-update in stat info · b9a2c252

Changman Lee authored Dec 24, 2014

This patch adds the count of blocks written by in-place update to the stat info.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid double lock for cp_rwsem · dd802406

Jaegeuk Kim authored Dec 18, 2014

__f2fs_add_link is always covered by cp_rwsem.
It calls init_inode_metadata, which performs some acl operations, including
memory allocation that previously used GFP_KERNEL.
Under memory pressure, f2fs_write_data_page can then be called, which also
grabs cp_rwsem.

This can deadlock, as pointed out by Chao.
Thread #1        Thread #2
 down_read
                 down_write
  down_read
 -> here down_read should wait forever.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: activate f2fs_trace_ios · db9f7c1a

Jaegeuk Kim authored Dec 17, 2014

This patch activates f2fs_trace_ios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: activate f2fs_trace_pid · 9e4ded3f

Jaegeuk Kim authored Dec 17, 2014

This patch activates f2fs_trace_pid.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add key functions for f2fs_io_tracer · 0e689d03

Jaegeuk Kim authored Dec 17, 2014

This patch adds two key functions to trace process ids and IOs.
The basic idea is to
1. keep process ids, pids, in page->private.
2. show pids in IO traces.

So, later we can retrieve process information according to IO traces.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add f2fs_io_tracer support · 63f92ddc

Jaegeuk Kim authored Dec 17, 2014

This patch adds:
 o initial trace.c and trace.h with skeleton functions
 o Kconfig and Makefile to activate this feature
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use f2fs_io_info to clean up messy parameters during IO path · cf04e8eb

Jaegeuk Kim authored Dec 17, 2014

This patch cleans up parameters on IO paths.
The key idea is to add a block address field to f2fs_io_info, and then pass
this structure as the parameter.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary · 9ecf4b80

Chao Yu authored Dec 18, 2014

Use the more common function ra_meta_pages() with META_POR to readahead node
blocks in restore_node_summary() instead of ra_sum_pages(). This simplifies the
readahead code there and lets us remove the now-unused function ra_sum_pages().

changes from v2:
 o use invalidate_mapping_pages as before suggested by Changman Lee.
changes from v1:
 o fix one bug when using truncate_inode_pages_range which is pointed out by
   Jaegeuk Kim.
Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: merge two uchar variable in struct node_info to reduce memory cost · 5c27f4ee

Chao Yu authored Dec 18, 2014

This patch moves one member of struct nat_entry, _flag_, to struct node_info,
so that _version_ in struct node_info and _flag_, which are both unsigned char,
share one 32-bit slot in register/memory. The size of nat_entry is thus
reduced from 28 bytes to 24 bytes (on a 64-bit machine, from 40 bytes to 32
bytes), which in turn reduces the slab memory used by f2fs.

changes from v2:
 o update description of memory usage gain for 64-bit machine suggested by
   Changman Lee.
changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested
   by Jaegeuk Kim, it can avoid bug.
Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
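
The size change can be reproduced in userspace with stand-in structures. This
is a sketch: the field order and the _old/_new type names are assumptions made
for illustration, and the exact sizes depend on the ABI (40 and 32 bytes are
what a typical 64-bit target with 8-byte pointers gives, 28 and 24 bytes a
typical 32-bit one).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t nid_t;
typedef uint32_t block_t;

/* Stand-in for the kernel's struct list_head: two pointers. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed layout before the patch: flag lives in nat_entry. */
struct node_info_old {
	nid_t nid;
	nid_t ino;
	block_t blk_addr;
	unsigned char version;	/* followed by 3 bytes of padding */
};

struct nat_entry_old {
	struct list_head list;
	unsigned char flag;	/* followed by 3 more bytes of padding */
	struct node_info_old ni;
};

/* Assumed layout after the patch: flag shares version's 32-bit slot. */
struct node_info_new {
	nid_t nid;
	nid_t ino;
	block_t blk_addr;
	unsigned char version;
	unsigned char flag;	/* fits into the former padding */
};

struct nat_entry_new {
	struct list_head list;
	struct node_info_new ni;
};

/* The merged layout must never be larger than the old one. */
static_assert(sizeof(struct nat_entry_new) <= sizeof(struct nat_entry_old),
	      "moving flag into node_info should not grow nat_entry");
```

Both unsigned char members previously dragged their own padding along; placing
them side by side lets them share it.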

f2fs: readahead contiguous current summary blocks in checkpoint · 3fa06d7b

Chao Yu authored Dec 09, 2014

Let's add readahead code for reading contiguous compact/normal summary blocks
in checkpoint, then we will gain better performance in mount procedure.

Changes from v1
  o remove inappropriate 'unlikely' in npages_for_summary_flush.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use missing the use of f2fs_kunmap_page · 5df1f1da

Jaegeuk Kim authored Dec 13, 2014

This patch calls f2fs_kunmap_page which I missed before.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary call to invalidate inmemory pages · 042b7816

Jaegeuk Kim authored Dec 12, 2014

Now that we use inmemory pages for atomic writes only and provide an abort
procedure, we don't need to truncate them explicitly.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix small discards not to issue redundantly · d7bc2484

Jaegeuk Kim authored Dec 12, 2014

The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit.

In the case of small discards, the candidates are selected before the sync,
while fitrim selects candidates after the sync.

So, for small discards, we need to add only the candidates that have just
become obsolete.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: change atomic and volatile write policies · 1e84371f

Jaegeuk Kim authored Dec 09, 2014

This patch adds two new ioctls to release inmemory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If the transaction failed, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

In order to avoid huge memory consumption, which can cause OOM, this patch
changes volatile writes to use normal dirty pages, and blocks flushing them to
disk as long as the system is not under memory pressure.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't need to call lock_op and lock_page for abort · 70c640b1

Jaegeuk Kim authored Dec 10, 2014

We don't need to call lock_op and lock_page at the aborting path.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong condition check to trigger f2fs_sync_fs · 88a70a69

Jaegeuk Kim authored Dec 10, 2014

If there is not enough available memory, we need to trigger f2fs_sync_fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove checking dirty_exceed · cd52b636

Jaegeuk Kim authored Dec 10, 2014

We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg.
This flag was only meaningful to write bypassing conditions.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

09 Jan, 2015 1 commit

Merge branch 'akpm' (patches from Andrew) · b3d574ae

Linus Torvalds authored Jan 09, 2015

Merge misc fixes from Andrew Morton:
 "12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  memcg: fix destination cgroup leak on task charges migration
  mm: memcontrol: switch soft limit default back to infinity
  mm/debug_pagealloc: remove obsolete Kconfig options
  vfs: renumber FMODE_NONOTIFY and add to uniqueness check
  arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h
  ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
  MAINTAINERS: update rydberg's addresses
  mm: protect set_page_dirty() from ongoing truncation
  mm: prevent endless growth of anon_vma hierarchy
  exit: fix race between wait_consider_task() and wait_task_zombie()
  ocfs2: remove bogus check in dlm_process_recovery_data

08 Jan, 2015 3 commits

mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed · 9e5e3661

Vlastimil Babka authored Jan 08, 2015

Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
stuck in a busy loop with nothing left to balance, but
kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
to be a combination of several factors:

1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait

2. The process has been killed (by OOM in this case), but has not yet been
   scheduled to remove itself from the waitqueue and die.

3. kswapd checks for throttled processes in prepare_kswapd_sleep():

        if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                wake_up(&pgdat->pfmemalloc_wait);
                return false; // kswapd will not go to sleep
        }

   However, for a process that was already killed, wake_up() does not remove
   the process from the waitqueue, since try_to_wake_up() checks its state
   first and returns false when the process is no longer waiting.

4. kswapd is running on the same CPU as the only CPU that the process is
   allowed to run on (through cpus_allowed, or possibly single-cpu system).

5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
   encounters no voluntary preemption points and repeatedly fails
   prepare_kswapd_sleep(), blocking the process from running and removing
   itself from the waitqueue, which would let kswapd sleep.

So, the source of the problem is that we prevent kswapd from going to
sleep as long as there are processes waiting on the pfmemalloc_wait queue,
and a process waiting on a queue is guaranteed to be removed from the
queue only when it gets scheduled.  This was done to make sure that no
process is left sleeping on pfmemalloc_wait when kswapd itself goes to
sleep.

However, it isn't necessary to postpone kswapd sleep until the
pfmemalloc_wait queue actually empties.  To prevent processes from being
left sleeping, it's actually enough to guarantee that all processes
waiting on pfmemalloc_wait queue have been woken up by the time we put
kswapd to sleep.

This patch therefore fixes this issue by substituting 'wake_up' with
'wake_up_all' and removing 'return false' in the code snippet from
prepare_kswapd_sleep() above.  Note that if any process puts itself in
the queue after this waitqueue_active() check, or after the wake up
itself, it means that the process will also wake up kswapd - and since
we are under prepare_to_wait(), the wake up won't be missed.  Also we
update the comment in prepare_kswapd_sleep() to hopefully more clearly
describe the races it is preventing.

Fixes: 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
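
The livelock and the fix can be mimicked with a toy userspace model. This is
not kernel code: struct waiter, try_wake, wake_waiters, may_sleep_old and
may_sleep_new are hypothetical names, the model does not distinguish wake_up
from wake_up_all, and the balanced flag merely stands in for whatever else
prepare_kswapd_sleep() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a task on a wait queue. */
struct waiter {
	bool queued;	/* still linked on the wait queue */
	bool sleeping;	/* false once woken or killed */
};

/* Like try_to_wake_up(): succeeds only if the task was really sleeping. */
static bool try_wake(struct waiter *w)
{
	if (!w->sleeping)
		return false;
	w->sleeping = false;
	return true;
}

/* Wake every queued waiter; an entry is dequeued only on a successful wake. */
static void wake_waiters(struct waiter *q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].queued && try_wake(&q[i]))
			q[i].queued = false;
}

static bool queue_active(const struct waiter *q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].queued)
			return true;
	return false;
}

/* Old behaviour: refuse to sleep whenever the queue is non-empty. */
static bool may_sleep_old(struct waiter *q, size_t n, bool balanced)
{
	assert(q != NULL);
	if (queue_active(q, n)) {
		wake_waiters(q, n);
		return false;
	}
	return balanced;
}

/* New behaviour: wake everyone, then decide on the balance check alone. */
static bool may_sleep_new(struct waiter *q, size_t n, bool balanced)
{
	assert(q != NULL);
	if (queue_active(q, n))
		wake_waiters(q, n);
	return balanced;
}
```

In this model a killed waiter (queued but no longer sleeping) makes
may_sleep_old() return false on every call, which corresponds to the busy loop
described in the message, while may_sleep_new() lets the caller sleep once it
has issued the wakeups.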

memcg: fix destination cgroup leak on task charges migration · 4bdfc1c4

Vladimir Davydov authored Jan 08, 2015

We are supposed to take one css reference per each memory page and per
each swap entry accounted to a memory cgroup.  However, during task
charges migration we take a reference to the destination cgroup twice
per each swap entry: first in mem_cgroup_do_precharge()->try_charge()
and then in mem_cgroup_move_swap_account(), permanently leaking the
destination cgroup.

The hunk taking the second reference seems to be a leftover from the
pre-00501b53 ("mm: memcontrol: rewrite charge API") era.  Remove it
to fix the leak.

Fixes: e8ea14cc (mm: memcontrol: take a css reference for each charged page)
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: memcontrol: switch soft limit default back to infinity · 24d404dc

Johannes Weiner authored Jan 08, 2015

Commit 3e32cb2e ("mm: memcontrol: lockless page counters")
accidentally switched the soft limit default from infinity to zero,
which turns all memcgs with even a single page into soft limit excessors
and engages soft limit reclaim on all of them during global memory
pressure.  This makes global reclaim generally more aggressive, but also
inverts the meaning of existing soft limit configurations where unset
soft limits are usually more generous than set ones.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
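
Why a zero default turns every non-empty memcg into an excessor can be seen
from the excess calculation itself. The sketch below is a simplified userspace
version; SOFT_LIMIT_INFINITY is a stand-in chosen for this example, not the
kernel's constant, and the parameter names are assumptions.

```c
#include <limits.h>

/* Stand-in for "no soft limit"; an assumption for this sketch. */
#define SOFT_LIMIT_INFINITY ULONG_MAX

/*
 * Pages by which a group exceeds its soft limit, or 0 if it is within
 * the limit. Groups with a non-zero excess are soft limit reclaim
 * candidates.
 */
static unsigned long soft_limit_excess(unsigned long usage_pages,
				       unsigned long soft_limit_pages)
{
	return usage_pages > soft_limit_pages ?
		usage_pages - soft_limit_pages : 0;
}
```

With a default of zero, a group holding a single page already has an excess of
one page; with an infinite default, an unconfigured group never shows any
excess.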
