Commits · cf108bca465dde0c015f32dd453b99457d31c7c7 · Kirill Smelkov / linux

11 Jul, 2008 20 commits

ext4: Invert the locking order of page_lock and transaction start · cf108bca

Jan Kara authored Jul 11, 2008

This changes are needed to support data=ordered mode handling via
inodes.  This enables us to get rid of the journal heads and buffer
heads for data buffers in the ordered mode.  With the changes, during
tranasaction commit we writeout the inode pages using the
writepages()/writepage(). That implies we take page lock during
transaction commit. This can cause a deadlock with the locking order
page_lock -> jbd2_journal_start, since the jbd2_journal_start can wait
for the journal_commit to happen and the journal_commit now needs to
take the page lock. To avoid this dead lock reverse the locking order.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cf108bca

vfs: Move mark_inode_dirty() from under page lock in generic_write_end() · c7d206b3

Jan Kara authored Jul 11, 2008

There's no need to call mark_inode_dirty() under page lock in
generic_write_end(). It unnecessarily makes hold time of page lock longer
and more importantly it forces locking order of page lock and transaction
start for journaling filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c7d206b3

ext4: Use page_mkwrite vma_operations to get mmap write notification. · 2e9ee850

Aneesh Kumar K.V authored Jul 11, 2008

We would like to get notified when we are doing a write on mmap section.
This is needed with respect to preallocated area. We split the preallocated
area into initialzed extent and uninitialzed extent in the call back. This
let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
that would result in data loss. The changes are also needed to handle ENOSPC
when writing to an mmap section of files with holes.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2e9ee850

ext4: Documentation updates. · 93e3270c

Jose R. Santos authored Jul 11, 2008

Some of the information in Documentation/filesystems/ext4.txt is out
of date and in need of an update.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

93e3270c

ext4: fix online resize with mballoc · 5f21b0e6

Frederic Bohe authored Jul 11, 2008

Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing will be allowed.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

5f21b0e6

ext4: use atomic functions to set bh_state · 953e622b

Eric Sandeen authored Jul 11, 2008

Use the BUFFER_FNS functions (set_buffer_foo) to set buffer
head state atomically instead of nonatomic __set_bit().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

953e622b

ext4: Set journal pointer to NULL when journal is released · 47b4a50b

Jan Kara authored Jul 11, 2008

Set sbi->s_journal to NULL after we call journal_destroy(). This
will be later needed because after journal_destroy() is called,
ext4_clear_inode() can still be called for some inodes (e.g. root
inode) and we'll need to detect there that journal doesn't exists
anymore.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

47b4a50b

ext4: mballoc avoid use root reserved blocks for non root allocation · 07031431

Mingming Cao authored Jul 11, 2008

mballoc allocation missed check for blocks reserved for root users. Add
ext4_has_free_blocks() check before allocation. Also modified
ext4_has_free_blocks() to support multiple block allocation request.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07031431

ext4: call blkdev_issue_flush on fsync · d755fb38

Eric Sandeen authored Jul 11, 2008

To ensure that bits are truly on-disk after an fsync,
we should call blkdev_issue_flush if barriers are supported.

Inspired by an old thread on barriers, by reiserfs & xfs
which do the same, and by a patch SuSE ships with their kernel
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d755fb38

ext4: cleanup block allocator · 654b4908

Aneesh Kumar K.V authored Jul 11, 2008

Move the code related to block allocation to a single function and add helper
funtions to differient allocation for data and meta data blocks
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

654b4908

ext4: Use inode preallocation with -o noextents · 7061eba7

Aneesh Kumar K.V authored Jul 11, 2008

When mballoc is enabled, block allocation for old block-based
files are allocated using mballoc allocator instead of old
block-based allocator. The old ext3 block reservation is turned
off when mballoc is turned on.

However, the in-core preallocation is not enabled for block-based/
non-extent based file block allocation. This result in performance
regression, as now we don't have "reservation" ore in-core preallocation
to prevent interleaved fragmentation in multiple writes workload.

This patch fix this by enable per inode in-core preallocation
for non extent files when mballoc is used.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

7061eba7

ext4: fix ext4_init_block_bitmap() for metablock block group · 6afd6707

Akinobu Mita authored Jul 11, 2008

When meta_bg feature is enabled and s_first_meta_bg != 0, 
ext4_init_block_bitmap() miscalculates the number of block used by
the group descriptor table (0 or 1 for metablock block group)

This patch fixes this by using ext4_bg_num_gdb()
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Andreas Dilger <adilger@sun.com>

6afd6707

ext4: Fix sparse warning · 7477827f

Aneesh Kumar K.V authored Jul 11, 2008

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

7477827f

ext4: cleanup never-used magic numbers from htree code · d9c769b7

Li Zefan authored Jul 11, 2008

dx_root_limit() will had some dead code which forced it to always return
20, and dx_node_limit to always return 22 for debugging purposes.
Remove it.
Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d9c769b7

ext4: Fix ext4_ext_journal_restart() to reflect errors up to the caller · 9102e4fa

Shen Feng authored Jul 11, 2008

Fix ext4_ext_journal_restart() so it returns any errors reported by 
ext4_journal_extend() and ext4_journal_restart().
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9102e4fa

ext4: Make ext4_ext_find_extent fills ext_path completely · 1973adcb

Shen Feng authored Jul 11, 2008

When pos=0 or depth, the fields of ext4_ext_path is are not
completely filled.  This patch also removes some unnecessary code.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1973adcb

ext4: return error when calling ext4_ext_split failed · 787e0981

Shen Feng authored Jul 11, 2008

ext4_ext_create_new_leaf must return error when its
calling to ext4_ext_split failed.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

787e0981

ext4: Update i_disksize properly when allocating from fallocate area. · a379cd1d

Aneesh Kumar K.V authored Jul 11, 2008

When allocating unitialized space at the end of file which had been
preallocated with the FALLOC_FL_KEEP_SIZE option, the file size is not
updated at that time.  But the later we are not updating the file size
when writing to that preallocated space.

These changes are for code correctness.  This patch allows us to update
the i_disksize at the write_end() callback of filesystem properly.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a379cd1d

ext4: remove quota allocation when ext4_mb_new_blocks fails · 363d4251

Shen Feng authored Jul 11, 2008

Quota allocation is not removed when ext4_mb_new_blocks calls
kmem_cache_alloc failed.  Also make sure the allocation context is freed
on the error path.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

363d4251

ext4: remove redundant code in ext4_fill_super() · f9a8ac99

Li Zefan authored Jul 11, 2008

The previous sb_min_blocksize() has already set the block size.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f9a8ac99

14 Jul, 2008 1 commit

jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction · 530576bb

Mingming Cao authored Jul 13, 2008

journal_try_to_free_buffers() could race with jbd commit transaction
when the later is holding the buffer reference while waiting for the
data buffer to flush to disk. If the caller of
journal_try_to_free_buffers() request tries hard to release the buffers,
it will treat the failure as error and return back to the caller. We
have seen the directo IO failed due to this race.  Some of the caller of
releasepage() also expecting the buffer to be dropped when passed with
GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers().

With this patch, if the caller is passing the GFP_KERNEL to indicating
this call could wait, in case of try_to_free_buffers() failed, let's
waiting for journal_commit_transaction() to finish commit the current
committing transaction , then try to free those buffers again with
journal locked.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

530576bb

11 Jul, 2008 2 commits

ext4: New inode allocation for FLEX_BG meta-data groups. · 772cb7c8

Jose R. Santos authored Jul 11, 2008

This patch mostly controls the way inode are allocated in order to
make ialloc aware of flex_bg block group grouping.  It achieves this
by bypassing the Orlov allocator when block group meta-data are packed
toghether through mke2fs.  Since the impact on the block allocator is
minimal, this patch should have little or no effect on other block
allocation algorithms. By controlling the inode allocation, it can
basically control where the initial search for new block begins and
thus indirectly manipulate the block allocator.

This allocator favors data and meta-data locality so the disk will
gradually be filled from block group zero upward.  This helps improve
performance by reducing seek time.  Since the group of inode tables
within one flex_bg are treated as one giant inode table, uninitialized
block groups would not need to partially initialize as many inode
table as with Orlov which would help fsck time as the filesystem usage
goes up.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

772cb7c8

jbd2: Add commit time into the commit block · 736603ab

Theodore Ts'o authored Jul 11, 2008

Carlo Wood has demonstrated that it's possible to recover deleted
files from the journal.  Something that will make this easier is if we
can put the time of the commit into commit block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

736603ab

14 Jul, 2008 3 commits

ext4: replace __FUNCTION__ occurrences · 4db9c54a

Stoyan Gaydarov authored Jul 13, 2008

__FUNCTION__ is gcc-specific, use __func__ instead
Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4db9c54a

ext4: fix error processing in mb_free_blocks · 7e5a8cdd

Shen Feng authored Jul 13, 2008

The error processing of the return value of mb_free_blocks is meanless
because it only returns 0.  This fix includes

- make mb_free_blocks return void

- remove the error processing part in callers

- unlock group before calling ext4_error in mb_free_blocks
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

7e5a8cdd

ext4: error proc entry creation when the fs/ext4 is not correctly created · cfbe7e4f

Shen Feng authored Jul 13, 2008

When the directory fs/ext4 is not correctly created under proc, the entry
under this directory should not be created.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

cfbe7e4f

11 Jul, 2008 14 commits

ext4: fix build failure if DX_DEBUG is enabled · f795e140

Li Zefan authored Jul 11, 2008

ext4_next_entry() is used by the debugging function dx_show_leaf(), so
it must be defined before that function.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f795e140

ext4: Remove unused variable from ext4_show_options · 7ad72ca6
Theodore Ts'o authored Jul 11, 2008
```
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
```
7ad72ca6

ext4: Rename read_block_bitmap() to ext4_read_block_bitmap() · 574ca174

Theodore Ts'o authored Jul 11, 2008

Since this a non-static function, make it be ext4 specific to avoid
conflicts with potentially other filesystems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

574ca174

ext4: remove double definitions of xattr macros · 3537576a

Shen Feng authored Jul 11, 2008

remove the definitions of macros XATTR_TRUSTED_PREFIX and XATTR_USER_PREFIX
since they are defined in linux/xattr.h
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3537576a

ext4: miscellaneous error checks and coding cleanups for mballoc · 74767c5a

Shen Feng authored Jul 11, 2008

ext4_mb_seq_history_open(): check if sbi->s_mb_history is NULL

ext4_mb_history_init(): replace kmalloc and memset with kzalloc

ext4_mb_init_backend(): remove memset since kzalloc is used

ext4_mb_init(): the return value of ext4_mb_init_backend is int,
	but i is unsigned, replace it with a new int variable.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

74767c5a

ext4: add error processing when calling ext4_mb_init_cache in mballoc · fdf6c7a7

Shen Feng authored Jul 11, 2008

Add error processing for ext4_mb_load_buddy when it calls
ext4_mb_init_cache.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fdf6c7a7

ext4: Fix ext4_mb_init_cache return error · 31b481dc

Mingming Cao authored Jul 11, 2008

ext4_mb_init_cache() incorrectly always return EIO on success. This
causes the caller of ext4_mb_init_cache() fail when it checks the return
value.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

31b481dc

ext4: improve some code in rb tree part of dir.c · 69baee06

Shen Feng authored Jul 11, 2008

* remove unnecessary code in free_rb_tree_fname

* rename free_rb_tree_fname to ext4_htree_create_dir_info
  since it and ext4_htree_free_dir_info are a pair

* replace kmalloc with kzalloc in ext4_htree_free_dir_info

All these make the code more readable and simple.
PS: this patch is also suitable for ext3.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

69baee06

ext4: switch to seq_files · 91d99827

Alexey Dobriyan authored Jul 11, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

91d99827

ext4: Use BUG_ON() instead of BUG() · 07d45f12

Julia Lawall authored Jul 11, 2008

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07d45f12

ext4: start searching for the right extent from the goal group. · ed8f9c75

Aneesh Kumar K.V authored Jul 11, 2008

With mballoc we search for the best extent using different
criteria. We should always use the goal group when we are
starting with a new criteria.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ed8f9c75

ext4: fix comments to say "ext4" · 8a35694e

Shen Feng authored Jul 11, 2008

Change second/third to fourth.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a35694e

ext4: Fix mb_find_next_bit not to return larger than max · e7dfb246

Aneesh Kumar K.V authored Jul 11, 2008

Some architectures implement ext4_find_next_bit and
ext4_find_next_zero_bit in such a way that they return
greater than max for some input values. Make sure
mb_find_next_bit and mb_find_next_zero_bit return the
right values.

On 2.6.25 we have include/asm-x86/bitops_32.h
static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
{
	unsigned x = 0;

	while (x < size) {
		unsigned long val = *addr++;
		if (val)
			return __ffs(val) + x;
		x += (sizeof(*addr)<<3);
	}
	return x;
}

This can return value greater than size.

Reported and fixed here for lustre

https://bugzilla.lustre.org/show_bug.cgi?id=15932
https://bugzilla.lustre.org/attachment.cgi?id=17205Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e7dfb246

ext4: validate directory entry data before use · f3b35f06

Duane Griffin authored Jul 11, 2008

ext4_dx_find_entry uses ext4_next_entry without verifying that the entry is
valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop
to check the validity of entries before checking whether they match and
moving onto the next one.

There are other uses of ext4_next_entry in this file which also look
problematic. They should be reviewed and fixed if/when we have a test-case
that triggers them.

This patch fixes the first case (image hdb.25.softlockup.gz) reported in
http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

f3b35f06