Commits · 5aca07eb7d8f14d90c740834d15ca15277f4820c · nexedi / linux

09 Dec, 2009 5 commits

Dmitry Monakhov authored Dec 08, 2009

Currently all quota block reservation macros contain a hard-coded "2",
i.e. the MAXQUOTAS value. This is not good because in some places it is
not obvious what this digit represents. Let's introduce a
new macro with a self-descriptive name.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5aca07eb
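
The idea in the message above can be sketched in a few lines of standalone C. This is only an illustration: the DEMO_* names and the per-quota cost of 2 blocks are hypothetical and are not the macros the patch actually adds; only MAXQUOTAS and its value of 2 come from the commit message.

```c
#include <assert.h>

/* Number of quota types, as stated in the commit message. */
#define MAXQUOTAS 2

/* Hypothetical per-quota-type cost, in blocks, of one quota update. */
#define DEMO_QUOTA_TRANS_BLOCKS 2

/* Before: the meaning of the leading 2 is not obvious at the call site. */
#define DEMO_TRANS_BLOCKS_OLD (2 * DEMO_QUOTA_TRANS_BLOCKS)

/* After: the multiplier is named, so the intent is self-describing. */
#define DEMO_MAXQUOTAS_TRANS_BLOCKS (MAXQUOTAS * DEMO_QUOTA_TRANS_BLOCKS)

/* Both spellings must reserve the same number of blocks. */
static void demo_check_quota_macros(void)
{
	assert(DEMO_TRANS_BLOCKS_OLD == DEMO_MAXQUOTAS_TRANS_BLOCKS);
}
```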

ext4: ext4_get_reserved_space() must return bytes instead of blocks · 8aa6790f

Dmitry Monakhov authored Dec 08, 2009

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8aa6790f

ext4: remove blocks from inode prealloc list on failure · b844167e

Curt Wohlgemuth authored Dec 08, 2009

This fixes a leak of blocks in an inode prealloc list if device failures
cause ext4_mb_mark_diskspace_used() to fail.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b844167e

ext4: wait for log to commit when umounting · d4edac31

Josef Bacik authored Dec 08, 2009

There is a potential race when a transaction is committing right when
the file system is being unmounted.  The race arises
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.

The fix is to wait for the transaction to finish committing before we
shut down the multiblock allocator.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d4edac31

ext4: Avoid data / filesystem corruption when write fails to copy data · b9a4207d

Jan Kara authored Dec 08, 2009

When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size.  Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped.  Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b9a4207d

07 Dec, 2009 2 commits

ext4: Use ext4 file system driver for ext2/ext3 file system mounts · 24b58424

Theodore Ts'o authored Dec 07, 2009

Add a new config option, CONFIG_EXT4_USE_FOR_EXT23, which, if enabled,
will cause ext4 to be used for either ext2 or ext3 file system mounts
when ext2 or ext3 is not enabled in the configuration.

This allows minimalist kernel fanatics to drop two file system drivers
from their compiled kernel without losing functionality.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

24b58424

ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() · c09eef30

Roel Kluin authored Dec 07, 2009

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c09eef30

01 Dec, 2009 1 commit

jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() · e6ec116b

Theodore Ts'o authored Dec 01, 2009

OOM happens.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e6ec116b

24 Nov, 2009 5 commits

ext4: remove unused parameter wbc from __ext4_journalled_writepage() · 3f0ca309

Wu Fengguang authored Nov 24, 2009

CC: Jan Kara <jack@suse.cz> 
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3f0ca309

ext4: remove encountered_congestion trace · b4d72415

Wu Fengguang authored Nov 24, 2009

It is no longer set and scheduled to be removed.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b4d72415

ext4: move_extent_per_page() cleanup · ac48b0a1

Akira Fujita authored Nov 24, 2009

Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ac48b0a1

ext4: initialize moved_len before calling ext4_move_extents() · 446aaa6e

Kazuya Mio authored Nov 24, 2009

The move_extent.moved_len is used to pass back the number of exchanged
blocks to user space.  Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

446aaa6e
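
A minimal userspace sketch of the approach described above, assuming a simplified, hypothetical demo_move_extent structure and worker function (neither is the kernel's definition): the entry point clears the output field itself rather than rejecting callers that left stale data in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the ioctl argument structure. */
struct demo_move_extent {
	uint64_t len;        /* number of blocks the caller asks to move */
	uint64_t moved_len;  /* out: number of blocks actually exchanged */
};

/* Hypothetical worker: accumulates into *moved_len as blocks are moved. */
static int demo_do_move(uint64_t len, uint64_t *moved_len)
{
	*moved_len += len;
	return 0;
}

/* Entry point: zero the out-field ourselves instead of checking that
 * the caller remembered to clear it. */
static int demo_move_ioctl(struct demo_move_extent *me)
{
	assert(me != NULL);
	me->moved_len = 0;
	return demo_do_move(me->len, &me->moved_len);
}
```

With this shape, a caller that passes garbage in moved_len still gets back only the count produced by the current call.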

ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT · 94d7c16c

Akira Fujita authored Nov 24, 2009

At the beginning of ext4_move_extents(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes.  But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().

1. Discard inode PAs of orig and donor inodes with
   ext4_discard_preallocations() in ext4_move_extents().

   orig : [ DATA1 ]
   donor: [ DATA2 ]

2. While data blocks are being exchanged between the orig and donor inodes,
   new inode PAs are created for orig by another process's block allocation
   (since there are semaphore gaps in ext4_move_extents()), and the new
   inode PAs are partially used (2-1).

   2-1 Create new inode PAs for orig inode
   orig : [ DATA1 | used PA1 | free PA1 ]
   donor: [ DATA2 ]

3. The donor inode, which now holds the old orig inode's blocks, is deleted
   after EXT4_IOC_MOVE_EXT finishes (3-1, 3-2).  So the block bitmap bits
   corresponding to the old orig inode's blocks are freed.

   3-1 After EXT4_IOC_MOVE_EXT finished
   orig : [ DATA2 |  free PA1 ]
   donor: [ DATA1 |  used PA1 ]

   3-2 Delete donor inode
   orig : [ DATA2 |  free PA1 ]
   donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

4. The double-free of blocks occurs when close() is called on the
   orig inode, because ext4_discard_preallocations() for the orig inode
   frees used PA1 and free PA1, though used PA1 was already freed in 3.

   4-1 Double-free of blocks occurs
   orig : [ DATA2 |  FREE SPACE(free PA1) ]
   donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

94d7c16c

23 Nov, 2009 4 commits

ext4: use ext4_data_block_valid() in ext4_free_blocks() · 9084d471

Theodore Ts'o authored Nov 22, 2009

The block validity framework does a more comprehensive set of checks,
and it saves object code space to use ext4_data_block_valid() rather than
the limited open-coded version that had been in ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9084d471

ext4: add check for wraparound in ext4_data_block_valid() · 1585d8d8

Theodore Ts'o authored Nov 22, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1585d8d8
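
The commit has no description, so the following is only a sketch of what a wraparound check on a block range generally looks like. The function name, its parameters, and the exact boundary rules are assumptions for illustration and do not reproduce the kernel's ext4_data_block_valid().

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if [start, start + count) is a sane range of data blocks.
 * first_data_block and blocks_count are hypothetical parameters that
 * stand in for values read from the superblock. */
static int demo_block_range_valid(uint64_t first_data_block,
				  uint64_t blocks_count,
				  uint64_t start, uint64_t count)
{
	assert(first_data_block <= blocks_count);
	if (count == 0)
		return 0;
	if (start < first_data_block)
		return 0;
	/* Unsigned overflow wraps, so a wrapped sum is smaller than start. */
	if (start + count < start)
		return 0;
	if (start + count > blocks_count)
		return 0;
	return 1;
}
```

Without the wraparound test, a huge start value plus a small count could wrap to a small sum and slip past the upper-bound comparison.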

ext4: print i_mode in octal in ext4 tracepoints · 6eebee62

Theodore Ts'o authored Nov 22, 2009

Inode permissions are much easier to understand if they are printed in
octal.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6eebee62
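
As a small illustration of why octal helps, here is a userspace sketch using snprintf(). The helper name and the "0%o" format string are assumptions for the example, not taken from the tracepoint definitions. A regular file with permissions rw-r--r-- has mode 33188 in decimal, which is hard to read, but 0100644 in octal, where the file type and permission bits are visible at a glance.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a mode value in octal with a leading 0, into buf. */
static int demo_format_mode(char *buf, size_t len, unsigned int mode)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0%o", mode);
}
```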

ext4: call ext4_forget() from ext4_free_blocks() · e6362609

Theodore Ts'o authored Nov 23, 2009

Add the facility for ext4_forget() to be called from
ext4_free_blocks().  This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.

Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion.  As a result, if the system crashed during or shortly after
the extents migration, and the released indirect blocks were reused as
data blocks, the journal replay would corrupt the data blocks.  With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

e6362609

22 Nov, 2009 1 commit

ext4: fold ext4_free_blocks() and ext4_mb_free_blocks() · 44338711

Theodore Ts'o authored Nov 22, 2009

ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much.  So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c.  This saves about 200 bytes of compiled text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

44338711

23 Nov, 2009 1 commit

ext4: fold ext4_journal_forget() into ext4_forget() · b7e57e7c

Theodore Ts'o authored Nov 22, 2009

Convert the last two callers of ext4_journal_forget() to use
ext4_forget() instead, and then fold ext4_journal_forget() into
ext4_forget().  This reduces our code complexity and shortens our call
stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b7e57e7c

24 Nov, 2009 1 commit

ext4: fold ext4_journal_revoke() into ext4_forget() · e4684b3f

Theodore Ts'o authored Nov 24, 2009

The only caller of ext4_journal_revoke() is ext4_forget(), so we can
fold ext4_journal_revoke() into ext4_forget() to simplify the code and
shorten the call stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e4684b3f

23 Nov, 2009 1 commit

ext4: move ext4_forget() to ext4_jbd2.c · d6797d14

Theodore Ts'o authored Nov 22, 2009

The ext4_forget() function better belongs in ext4_jbd2.c.  This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d6797d14

19 Nov, 2009 2 commits

ext4: make "norecovery" an alias for "noload" · e3bb52ae

Eric Sandeen authored Nov 19, 2009

Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.

In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.

Also show this status in /proc/mounts.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e3bb52ae
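
One common way to implement such an alias is to map both spellings to the same token in the option table. The sketch below shows that pattern in standalone C; the demo_* table and parser are hypothetical and do not reproduce ext4's actual option-parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_opt { DEMO_OPT_UNKNOWN, DEMO_OPT_NOLOAD };

struct demo_token {
	const char *name;
	enum demo_opt opt;
};

/* Two spellings map to the same token, so the rest of the option
 * handling never needs to know which one the user typed. */
static const struct demo_token demo_tokens[] = {
	{ "noload",     DEMO_OPT_NOLOAD },
	{ "norecovery", DEMO_OPT_NOLOAD },
};

/* Look up a single mount option string in the table. */
static enum demo_opt demo_parse_opt(const char *s)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < sizeof(demo_tokens) / sizeof(demo_tokens[0]); i++)
		if (strcmp(s, demo_tokens[i].name) == 0)
			return demo_tokens[i].opt;
	return DEMO_OPT_UNKNOWN;
}
```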

ext4: make trim/discard optional (and off by default) · 5328e635

Eric Sandeen authored Nov 19, 2009

It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues.  Make
this mount-time optional, and default it to off until we know
that things are working out OK.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5328e635

23 Nov, 2009 2 commits

ext4: fix error handling in ext4_ind_get_blocks() · 2bba702d

Jan Kara authored Nov 23, 2009

When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

2bba702d

ext4: avoid issuing unnecessary barriers · 6b17d902

Theodore Ts'o authored Nov 23, 2009

We don't need to issue an I/O barrier on an error or if we force commit
because we are doing data journaling.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org

6b17d902

15 Nov, 2009 1 commit

ext4: fix block validity checks so they work correctly with meta_bg · 1032988c

Theodore Ts'o authored Nov 15, 2009

The block validity checks used by ext4_data_block_valid() weren't
correctly written to check file systems with the meta_bg feature.  Fix
this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

1032988c

23 Nov, 2009 2 commits

ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zero · 8dadb198

Theodore Ts'o authored Nov 23, 2009

The number of old-style block group descriptor blocks is
s_meta_first_bg when the meta_bg feature flag is set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

8dadb198

ext4: don't update the superblock in ext4_statfs() · 3f8fb949

Theodore Ts'o authored Nov 23, 2009

Commit a71ce8c6 updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change.  This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

3f8fb949

15 Nov, 2009 2 commits

ext4: journal all modifications in ext4_xattr_set_handle · 86ebfd08

Eric Sandeen authored Nov 15, 2009

ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

86ebfd08

ext4: fix i_flags access in ext4_da_writepages_trans_blocks() · 30c6e07a

Julia Lawall authored Nov 15, 2009

We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

30c6e07a
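
The aliasing problem can be reproduced with two toy structures. The sketch below is an assumption-laden illustration: the demo_* types are heavily simplified, the flag value is illustrative, and DEMO_I() only mimics the container_of() pattern that filesystem-specific inode accessors use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the two structures. */
struct demo_inode {
	unsigned int i_flags;            /* generic VFS flags */
};

struct demo_ext4_inode_info {
	unsigned int i_flags;            /* filesystem-specific flags */
	struct demo_inode vfs_inode;     /* embedded generic inode */
};

#define DEMO_EXTENTS_FL 0x00080000u      /* illustrative flag value */

/* Same idea as container_of(): step back from the embedded generic
 * inode to the filesystem-specific structure that contains it. */
static struct demo_ext4_inode_info *DEMO_I(struct demo_inode *inode)
{
	return (struct demo_ext4_inode_info *)
		((char *)inode - offsetof(struct demo_ext4_inode_info, vfs_inode));
}

static int demo_uses_extents(struct demo_inode *inode)
{
	assert(inode != NULL);
	/* Test the filesystem-specific field, not inode->i_flags. */
	return (DEMO_I(inode)->i_flags & DEMO_EXTENTS_FL) != 0;
}
```

Because both fields share a name, code that reads inode->i_flags compiles cleanly but silently tests the wrong set of bits.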

23 Nov, 2009 3 commits

ext4: make sure directory and symlink blocks are revoked · 50689696

Theodore Ts'o authored Nov 23, 2009

When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks.  If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke().  Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.

Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

50689696

ext4: add tracepoint for ext4_forget() · beac2da7

Theodore Ts'o authored Nov 23, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

beac2da7

ext4: remove failed journal checksum check · cf40db13

Theodore Ts'o authored Nov 22, 2009

Now that we are checking for failed journal checksums in the jbd2
layer, we don't need to check in the ext4 mount path --- since a
checksum fail will result in ext4_load_journal() returning an error,
causing the file system to refuse to be mounted until e2fsck can deal
with the problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cf40db13

15 Nov, 2009 1 commit

jbd2: don't wipe the journal on a failed journal checksum · e6a47428

Theodore Ts'o authored Nov 15, 2009

If there is a failed journal checksum, don't reset the journal.  This
allows for userspace programs to decide how to recover from this
situation.  It may be that ignoring the journal checksum failure might
be a better way of recovering the file system.  Once we add per-block
checksums, we can definitely do better.  Until then, a system
administrator can try backing up the file system image (or taking a
snapshot) and try to determine experimentally whether ignoring
the checksum failure or aborting the journal replay results in less
data loss.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

e6a47428

14 Nov, 2009 1 commit

ext4: plug a buffer_head leak in an error path of ext4_iget() · 567f3e9a

Theodore Ts'o authored Nov 14, 2009

One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head.  Fix it by adding a brelse() in the common error
return path, which also simplifies the function.

Thanks to Andi Kleen <ak@linux.intel.com> for reporting the problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

567f3e9a
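
The "release in the common error return path" pattern looks roughly like this in standalone C. The demo_bread()/demo_brelse() helpers are hypothetical stand-ins built on calloc()/free(), and the two validation flags are invented for the example; only the single-exit structure reflects the fix described above.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_buffers;   /* counts buffers not yet released */

/* Hypothetical stand-in for reading a buffer. */
static char *demo_bread(void)
{
	char *bh = calloc(1, 64);

	if (bh)
		demo_live_buffers++;
	return bh;
}

/* Hypothetical stand-in for releasing a buffer; NULL is a no-op. */
static void demo_brelse(char *bh)
{
	if (bh) {
		demo_live_buffers--;
		free(bh);
	}
}

/* check1/check2 simulate validation steps; zero means "failed". */
static int demo_iget(int check1, int check2)
{
	char *bh = demo_bread();
	int ret = 0;

	if (!bh)
		return -1;
	if (!check1) {
		ret = -2;
		goto bad_inode;
	}
	if (!check2) {
		ret = -3;
		goto bad_inode;
	}
	demo_brelse(bh);
	return 0;

bad_inode:
	/* Single release point: no error branch can forget it. */
	demo_brelse(bh);
	return ret;
}
```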

23 Nov, 2009 5 commits

ext4: fix spelling typos in move_extent.c · 92c28159

Akira Fujita authored Nov 23, 2009

Fix a few spelling typos in move_extent.c
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

92c28159

ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT · 49bd22bc

Akira Fujita authored Nov 23, 2009

If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock. Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem. Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.

This problem was reported by Brian Rogers:

http://marc.info/?l=linux-ext4&m=125115356928011&w=1
Reported-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

49bd22bc
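
The ordering rule the message relies on can be sketched without any real locks. The helper below is hypothetical and only decides which of two inodes would be locked first; in the kernel, per the message above, the second acquisition is additionally annotated with down_write_nested() so the lock dependency checker accepts it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_inode {
	unsigned long i_ino;
};

/* Pick a lock order that depends only on the inode numbers, so two
 * callers passing the same pair in opposite order agree on it. */
static void demo_lock_order(struct demo_inode *a, struct demo_inode *b,
			    struct demo_inode **first,
			    struct demo_inode **second)
{
	assert(a != NULL && b != NULL && a != b);
	if (a->i_ino < b->i_ino) {
		*first = a;
		*second = b;
	} else {
		*first = b;
		*second = a;
	}
}
```

Since every caller takes the lower-numbered inode first, two tasks can never each hold one lock while waiting for the other.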

ext4: fix lock order problem in ext4_move_extents() · fc04cb49

Akira Fujita authored Nov 23, 2009

ext4_move_extents() checks the logical block contiguousness
of the original file with ext4_ext_find_extent() and mext_next_extent().
Therefore the extent which the ext4_ext_path structure indicates
must not be changed between the above functions.

But in the current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent().  So the extent
which the ext4_ext_path structure indicates may be overwritten by
delalloc.  As a result, ext4_move_extents() will exchange the wrong blocks
between the original and donor files.  I changed the place where
i_data_sem is acquired and released to solve this problem.

Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem.  Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():

* NOTE: "A", "B" and "C" mean different processes

A-1: ext4_move_extents() acquires i_data_sem of two inodes.

B:   do_page_fault() starts the transaction (T),
     and then tries to acquire i_data_sem.
     But process "A" is already holding it, so it is kept waiting.

C:   While "A" and "B" are running, kjournald2 tries to commit transaction (T)
     but it is still being updated, so kjournald2 waits for it.

A-2: Calls ext4_journal_start while holding i_data_sem,
     but transaction (T) is locked.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fc04cb49

ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails · f868a48d

Akira Fujita authored Nov 23, 2009

If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller.  Unfortunately, currently if the block size is not the same as
the page size, the block count that is returned is the
page-aligned block count instead of the actual block count.  This
commit addresses this bug.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f868a48d

ext4: avoid divide by zero when trying to mount a corrupted file system · 503358ae

Theodore Ts'o authored Nov 23, 2009

If s_log_groups_per_flex is greater than 31, then groups_per_flex
will overflow and cause a divide by zero error.  This can cause a kernel
BUG if such a file system is mounted.

Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.

http://bugzilla.kernel.org/show_bug.cgi?id=14287
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

503358ae
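
A minimal sketch of guarding against this class of bug, assuming a 32-bit result and a hypothetical helper name; it is not the check the patch itself adds. The point is to validate the on-disk log2 value before shifting, because a shift count of 32 or more cannot produce a usable 32-bit power of two and a resulting zero would later be used as a divisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores 2^log in *out when log fits a 32-bit result,
 * or returns 0 so the caller can refuse the corrupted value. */
static int demo_groups_per_flex(unsigned int log, uint32_t *out)
{
	assert(out != NULL);
	/* Shifting a 32-bit value by 32 or more is undefined in C, so
	 * reject such values before the shift rather than after it. */
	if (log > 31)
		return 0;
	*out = (uint32_t)1 << log;
	return 1;
}
```

A caller would then skip any division by the group count when the helper reports failure.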