- 26 Feb, 2016 5 commits
-
-
David Sterba authored
-
David Sterba authored
-
David Sterba authored
-
David Sterba authored
-
David Sterba authored
# Conflicts:
#   fs/btrfs/file.c
-
- 18 Feb, 2016 13 commits
-
-
Zhao Lei authored
For a non-existent device, the old code bypasses adding it to the device's reada queue. To solve the problem of unfinished waiting on raid5/6, commit 5fbc7c59 ("Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting") added an exception for the first stripe; in short, the first stripe is always processed whether the device exists or not. There is a better way to handle this: simply bypass creation of the reada_extent for a non-existent device, which makes the code simpler and effective. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
The reada background work is not designed to finish all jobs completely; it will stop in the following cases: 1) a device reaches its workload limit (MAX_IN_FLIGHT), 2) the total number of reads reaches the maximum limit (10000), 3) no device has more jobs queued, which often happens in the DUP case. If all background works exit while jobs remain, btrfs_reada_wait() will wait indefinitely. This problem rarely happened in the old code, because: 1) every work queued 2x new works, so the large number of works reduced the chance of undone jobs; 2) a work would loop 10000 times when it had no jobs, which reduced the window with no running thread. But after the above cases were fixed, "undone reada extents" started happening frequently. Fix: in btrfs_reada_wait(), check that at least one worker is running while there are undone jobs. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
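A minimal user-space sketch of the wait-side idea described above (the names `worker_pass`, `jobs_remaining` and `MAX_IN_FLIGHT_MODEL` are invented for the sketch; this models the logic, it is not the kernel code): a worker may legitimately exit with jobs left over, so the waiter re-kicks a worker whenever jobs remain and nothing is running, instead of blocking forever.

```c
/*
 * Simplified model: a background "worker" may stop before all jobs are done
 * (it hits an artificial in-flight limit), so the wait side must re-kick a
 * worker whenever jobs remain and no worker is running.
 */
#include <stdio.h>

#define MAX_IN_FLIGHT_MODEL 3   /* worker stops after this many jobs */

static int jobs_remaining = 10; /* pending readahead jobs */
static int workers_running;     /* background workers currently active */

/* One background worker pass: handles a few jobs, then exits early. */
static void worker_pass(void)
{
    workers_running++;
    for (int done = 0; done < MAX_IN_FLIGHT_MODEL && jobs_remaining > 0; done++)
        jobs_remaining--;
    workers_running--;          /* may leave jobs behind */
}

/* Model of the fixed wait: never block while jobs exist but no worker runs. */
static void reada_wait_model(void)
{
    while (jobs_remaining > 0) {
        if (workers_running == 0) {
            printf("no worker but %d jobs left, kicking a new one\n",
                   jobs_remaining);
            worker_pass();
        }
        /* in the real code this path would wait for the running worker */
    }
    printf("all jobs done\n");
}

int main(void)
{
    reada_wait_model();
    return 0;
}
```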
-
Zhao Lei authored
Reada creates 2 works for each level of the tree recursively. For a tree with many levels, the number of created works is 2^level_of_tree. We don't actually need so many works in parallel; this patch limits the maximum number of works to BTRFS_MAX_MIRRORS * 2. The per-fs works_counter will also be used by btrfs_reada_wait() to check whether there are background workers. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
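A rough stand-alone illustration of the capping idea, using assumed names (`reada_works_cnt`, `BTRFS_MAX_MIRRORS_MODEL`, `queue_reada_work`); it sketches the counter check only and is not the btrfs implementation.

```c
/*
 * Model of bounding the number of queued readahead works with a shared
 * counter, instead of letting the count grow as 2^tree_level.
 */
#include <stdio.h>
#include <stdatomic.h>

#define BTRFS_MAX_MIRRORS_MODEL 3
#define MAX_WORKS (BTRFS_MAX_MIRRORS_MODEL * 2)

static atomic_int reada_works_cnt;   /* per-fs counter in the real code */

/* Try to queue one more background work; refuse once the cap is reached. */
static int queue_reada_work(void)
{
    if (atomic_load(&reada_works_cnt) >= MAX_WORKS)
        return 0;                    /* enough workers already queued */
    atomic_fetch_add(&reada_works_cnt, 1);
    return 1;
}

/* Called when a background work finishes. */
static void reada_work_done(void)
{
    atomic_fetch_sub(&reada_works_cnt, 1);
}

int main(void)
{
    int queued = 0;
    for (int i = 0; i < 100; i++)    /* recursion would try many works */
        queued += queue_reada_work();
    printf("queued %d of 100 requested works (cap %d)\n", queued, MAX_WORKS);
    while (queued--)
        reada_work_done();
    return 0;
}
```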
-
Zhao Lei authored
There is no need to decrease dev->reada_in_flight inside __readahead_hook() and in reada_extent_put(). reada_extent_put() has no chance to decrease dev->reada_in_flight on the free path, because a reada_extent holds an additional refcnt while it is scheduled to a device. Put the increment and decrement of dev->reada_in_flight in one place instead, to keep the logic simple and safe, and turn the now-useless reada_extent->scheduled_for into a bool flag. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
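A small stand-alone model of this refactoring (the struct and function names below are invented for the sketch): the in-flight counter is touched in exactly two places, paired with a bool `scheduled` flag, so it cannot be decremented twice or missed.

```c
/* Model: pair the counter with a flag so inc/dec stay in one place. */
#include <stdbool.h>
#include <stdio.h>

struct model_dev {
    int reada_in_flight;
};

struct model_extent {
    bool scheduled;             /* replaces the old scheduled_for pointer */
};

static void schedule_extent(struct model_dev *dev, struct model_extent *re)
{
    re->scheduled = true;
    dev->reada_in_flight++;     /* the only increment site */
}

static void extent_done(struct model_dev *dev, struct model_extent *re)
{
    if (re->scheduled) {
        re->scheduled = false;
        dev->reada_in_flight--; /* the only decrement site */
    }
}

int main(void)
{
    struct model_dev dev = { 0 };
    struct model_extent re = { false };

    schedule_extent(&dev, &re);
    extent_done(&dev, &re);
    extent_done(&dev, &re);     /* a second completion stays balanced */
    printf("in_flight = %d\n", dev.reada_in_flight);
    return 0;
}
```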
-
Zhao Lei authored
Remove one copy of the loop, fixing the typo in iterating over the zones. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
The current code sets nritems to 0 so that the for loop does nothing, and sets generation's value, which is not necessary. Jumping directly to the cleanup is the better choice. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
What __readahead_hook() actually needs is fs_info; there is no need to convert fs_info to a root in the caller and convert it back in __readahead_hook(). Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
reada_start_machine_dev() already has the reada_extent pointer; passing it into __readahead_hook() directly instead of searching the radix tree makes the code run faster. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
We can't release the reada_extent before __readahead_hook() runs, because __readahead_hook() still needs to use it; it is necessary to hold a refcnt so that it isn't freed. This is actually not a problem after the patch named "Avoid many times of empty loop", which ensures the reada_extent above includes at least one reada_extctl and therefore holds one additional refcnt on the reada_extent. But we still want this patch to keep the logic clean. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
level is not used in several functions; remove it from their arguments, along with the related code that obtains its value. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
When adding all dev_zones for a reada_extent fails, the extent has no chance to be selected to run and stays in memory forever. We should bypass such an extent to avoid the above case. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
If some device is not reachable, we should bypass it and continue adding the next one, instead of breaking out on the bad device. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
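An illustrative snippet of this control-flow change (not the actual btrfs loop; `model_dev` and its fields are made up): a missing device is skipped with `continue` so the remaining devices still get their zones added, where the old code would `break` out of the loop.

```c
/* Model: skip an unreachable device instead of aborting the whole loop. */
#include <stdio.h>
#include <stddef.h>

struct model_dev {
    const char *name;
    int missing;            /* device not reachable */
};

int main(void)
{
    struct model_dev devs[] = {
        { "devA", 0 }, { "devB", 1 }, { "devC", 0 },
    };
    int added = 0;

    for (size_t i = 0; i < sizeof(devs) / sizeof(devs[0]); i++) {
        if (devs[i].missing) {
            /* old behavior: break; -- would also skip devC */
            continue;       /* new behavior: bypass and try the next one */
        }
        printf("added zone for %s\n", devs[i].name);
        added++;
    }
    printf("%d zones added\n", added);
    return 0;
}
```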
-
Zhao Lei authored
Move the is_need_to_readahead condition earlier to avoid a useless loop gathering the related data for readahead. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
- 16 Feb, 2016 4 commits
-
-
Zhao Lei authored
We can see the following loop (10000 times) in the trace log:
[ 75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[ 75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[ 75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[ 75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
...(10000 times)
[ 124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[ 124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[ 124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
Reason: if more than one user triggers reada on the same extent, the first task finishes setting up the reada data structures and calls reada_start_machine() to start, while the second task has only taken a ref_count and has not yet completely added the reada_extctl struct; the reada_extent then cannot finish all jobs and keeps getting selected in __reada_start_machine() for 10000 iterations (the total loop count in __reada_start_machine()). Fix: for a reada_extent without a job, we don't need to run it; just return 0 to let the caller break. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
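A simplified stand-alone model of the fix (the names `model_extent`, `nr_jobs` and `start_one_extent` are invented for the sketch): an extent whose job list is still empty reports that nothing was dispatched, so the outer loop stops immediately instead of re-selecting it thousands of times.

```c
/* Model: a jobless extent returns 0 so the dispatcher loop can break. */
#include <stdio.h>

struct model_extent {
    int nr_jobs;            /* number of attached reada_extctl entries */
};

/* Returns how many jobs were started; 0 lets the caller break its loop. */
static int start_one_extent(struct model_extent *re)
{
    if (re->nr_jobs == 0)
        return 0;           /* no job attached yet, nothing to run */
    re->nr_jobs--;
    return 1;
}

int main(void)
{
    struct model_extent re = { .nr_jobs = 0 };
    int loops = 0;

    /* outer loop: stops as soon as the extent reports no work */
    while (start_one_extent(&re))
        loops++;
    printf("dispatched %d times for a jobless extent (was 10000 before)\n",
           loops);
    return 0;
}
```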
-
Zhao Lei authored
When rechecking that the zone is in the tree, we still need to check that the zone includes our logical address. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
We can avoid an additional lock acquisition and one pair of kref_get/put by combining the two conditions. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Zhao Lei authored
reada_zone->end is the end position of the segment: end = start + cache->key.offset - 1; so we need to use "<=" in the condition to judge whether a position is inside the segment. The problem happened rarely, because the logical position rarely points to the last 4k of a block group, but we need to fix it to make the code logically correct. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
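A tiny self-contained example of this boundary condition (using an invented `model_zone` struct, not the kernel structures): with an inclusive end offset, membership has to be tested with `<=`, otherwise the last block of the zone is treated as outside it.

```c
/* Model: end = start + len - 1 is inclusive, so membership needs "<=". */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct model_zone {
    uint64_t start;
    uint64_t end;           /* inclusive: start + len - 1 */
};

static int pos_in_zone(const struct model_zone *z, uint64_t logical)
{
    return logical >= z->start && logical <= z->end;   /* "<" would miss z->end */
}

int main(void)
{
    struct model_zone z = { .start = 0, .end = 1024 * 1024 - 1 };

    assert(pos_in_zone(&z, z.end));        /* last position belongs to the zone */
    assert(!pos_in_zone(&z, z.end + 1));   /* first position of the next zone */
    printf("inclusive end check ok\n");
    return 0;
}
```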
-
- 12 Feb, 2016 3 commits
-
-
Qu Wenruo authored
Introduce the new mount option alias "norecovery" for nologreplay, to keep the behavior of "norecovery" consistent with other filesystems. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Introduce a new mount option "nologreplay" that cooperates with the "ro" mount option to get a truly read-only mount, like "norecovery" in ext* and xfs. Since the new parse_options() needs to check the new flags at remount time, add a new parameter to parse_options(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Current "recovery" mount option will only try to use backup root. However the word "recovery" is too generic and may be confusing for some users. Here introduce a new and more specific mount option, "usebackuproot" to replace "recovery" mount option. "Recovery" will be kept for compatibility reason, but will be deprecated. Also, since "usebackuproot" will only affect mount behavior and after open_ctree() it has nothing to do with the filesystem, so clear the flag after mount succeeded. This provides the basis for later unified "norecovery" mount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ dropped usebackuproot from show_mount, added note about 'recovery' to docs ] Signed-off-by: David Sterba <dsterba@suse.com>
-
- 11 Feb, 2016 15 commits
-
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The number of distinct key types is not so big that we can waste one for every new thing we want to store in the tree. Similar to the temporary items, we introduce a new name for an existing key value and use the objectid for further extension. The victim is the BTRFS_DEV_STATS_KEY (249). The device stats are an example of a permanent item. Signed-off-by: David Sterba <dsterba@suse.com>
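A sketch of the key-sharing scheme this describes, using an invented `model_key` struct and made-up objectid constants (only the shared type value comes from the message above): one key type is reused and the objectid selects the concrete permanent item, so new items don't consume new type values.

```c
/* Model: one shared key type, objectid picks the actual item. */
#include <stdint.h>
#include <stdio.h>

struct model_key {
    uint64_t objectid;      /* selects the concrete item stored under the key */
    uint8_t  type;          /* shared key type value */
    uint64_t offset;
};

#define MODEL_PERSISTENT_ITEM_KEY  249  /* shared type value, as in the message above */
#define MODEL_DEV_STATS_OBJECTID     0  /* illustrative objectid choices */
#define MODEL_FUTURE_ITEM_OBJECTID   1

int main(void)
{
    struct model_key dev_stats = {
        .objectid = MODEL_DEV_STATS_OBJECTID,
        .type     = MODEL_PERSISTENT_ITEM_KEY,
        .offset   = 0,
    };
    struct model_key future_item = {
        .objectid = MODEL_FUTURE_ITEM_OBJECTID,
        .type     = MODEL_PERSISTENT_ITEM_KEY,  /* same type, new objectid */
        .offset   = 0,
    };

    printf("dev stats key:   (%llu, %u, %llu)\n",
           (unsigned long long)dev_stats.objectid,
           (unsigned)dev_stats.type,
           (unsigned long long)dev_stats.offset);
    printf("future item key: (%llu, %u, %llu)\n",
           (unsigned long long)future_item.objectid,
           (unsigned)future_item.type,
           (unsigned long long)future_item.offset);
    return 0;
}
```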
-
David Sterba authored
No visible change. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The number of distinct key types is not so big that we can waste one for every new thing we want to store in the tree. We introduce a new name for an existing key value and use the objectid for further extension. The victim is the BTRFS_BALANCE_ITEM_KEY (248). The balance status item is a good example of a temporary item: it exists from the beginning of the balance and keeps the status until the balance finishes. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Kcalloc is functionally equivalent and does overflow checks. Signed-off-by: David Sterba <dsterba@suse.com>
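A user-space analogue of why this conversion helps (the `checked_alloc` helper below is an illustration of the overflow check that kcalloc performs, not kernel code): multiplying count and size by hand can overflow and silently under-allocate, while the checked call refuses such requests.

```c
/* Model: a calloc-style helper rejects count * size overflow. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative checked allocator: returns NULL on multiplication overflow. */
static void *checked_alloc(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;                 /* n * size would overflow */
    return calloc(n, size);          /* calloc also zeroes, like kcalloc */
}

int main(void)
{
    size_t huge = SIZE_MAX / 2 + 1;

    /* an overflowing request is refused instead of under-allocating */
    if (!checked_alloc(huge, 4))
        printf("overflowing request rejected\n");

    void *ok = checked_alloc(16, sizeof(uint64_t));
    if (ok)
        printf("normal request succeeded\n");
    free(ok);
    return 0;
}
```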
-
David Sterba authored
We can safely use GFP_KERNEL in the functions called from the ioctl handlers. Here we can allocate up to 32k, so less pressure on the allocator could help. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can safely use GFP_KERNEL in the functions called from the ioctl handlers. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Readdir is initiated from userspace and is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Fallocate is initiated from userspace and is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We don't need to use GFP_NOFS in all contexts, eg. during mount or for the dummy root tree, but we might for the log tree creation. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Scrub is not on the critical writeback path, so we don't need to use GFP_NOFS for all allocations. The failures are handled and stats are passed back to userspace. Let's use GFP_KERNEL on the paths where everything is ok, ie. setting up the global structures and the IO submission paths. Functions that do the repair and fixups still use GFP_NOFS, as we might want to skip any other filesystem activity if we encounter an error. This could turn out to be unnecessary, but requires more review compared to the easy cases in this patch. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The readahead framework is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. All error paths are handled and the readahead failures are not fatal. The actual users (scrub, dev-replace) will trigger reads if the blocks are not found in cache. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The send operation is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. All error paths are handled and the whole operation is restartable. Signed-off-by: David Sterba <dsterba@suse.com>
-