- 11 Jun, 2010 4 commits
-
-
Miao Xie authored
when we use remap_file_pages() to remap a file, remap_file_pages always return error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Dan Carpenter authored
refs can be used with uninitialized data if btrfs_lookup_extent_info() fails on the first pass through the loop. In the original code if that happens then check_path_shared() probably returns 1, this patch changes it to return 1 for safety. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we dropped passing the mode to the function that actually does the preallocation. This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Miao Xie authored
We cannot use the loop device which has been connected to a file in the btrf The reproduce steps is following: # dd if=/dev/zero of=vdev0 bs=1M count=1024 # losetup /dev/loop0 vdev0 # mkfs.btrfs /dev/loop0 ... failed to zero device start -5 The reason is that the btrfs don't implement either ->write_begin or ->write the VFS API, so we fix it by setting ->write to do_sync_write(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 27 May, 2010 5 commits
-
-
Chris Mason authored
The ENOSPC code will now return ENOSPC to btrfs_start_transaction. btrfs_dirty_inode needs to check for this and error out appropriately. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
In order to support DIO that isn't aligned to the filesystem blocksize, we fall back to buffered for any unaligned DIOs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Less printk is good printk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
After the path is released, the generation number got from block pointer is no long valid. The race may cause disk corruption, because verify_parent_transid() calls clear_extent_buffer_uptodate() when generation numbers mismatch. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The O_DIRECT code wasn't checking for multiple references on preallocated or nodatacow extents. This means it wasn't honoring snapshots properly. The fix here is to add an explicit check for multiple references This also fixes the math for selecting the correct disk block, making sure not to go past the end of the extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 26 May, 2010 3 commits
-
-
Chris Mason authored
btrfs_dirty_inode tries to sneak in without much waiting or space reservation, mostly for performance reasons. This usually works well but can cause problems when there are many many writers. When btrfs_update_inode fails with ENOSPC, we fallback to a slower btrfs_start_transaction call that will reserve some space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
This moves the delalloc space reservation done for O_DIRECT into btrfs_direct_IO. This way we don't leak reserved space if the generic O_DIRECT write code errors out before it calls into btrfs_direct_IO. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
This changes O_DIRECT write code to mark extents as delalloc while it is processing them. Yan Zheng has reworked the enospc accounting based on tracking delalloc extents and this makes it much easier to track enospc in the O_DIRECT code. There are a few space cases with the O_DIRECT code though, it only sets the EXTENT_DELALLOC bits, instead of doing EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because we don't want to mess with clearing the dirty and uptodate bits when things go wrong. This is important because there are no pages in the page cache, so any extent state structs that we put in the tree won't get freed by releasepage. We have to clear them ourselves as the DIO ends. With this commit, we reserve space at in btrfs_file_aio_write, and then as each btrfs_direct_IO call progresses it sets EXTENT_DELALLOC on the range. btrfs_get_blocks_direct is responsible for clearing the delalloc at the same time it drops the extent lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 25 May, 2010 19 commits
-
-
Chris Mason authored
The async helper threads offload crc work onto all the CPUs, and make streaming writes much faster. This changes the O_DIRECT write code to use them. The only small complication was that we need to pass in the logical offset in the file for each bio, because we can't find it in the bio's pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Yan Zheng noticed two places we were doing a lot of work without task->state set to TASK_RUNNING. This sets the state properly after we get ready to sleep but decide not to. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
In order for AIO to work, we need to implement aio_write. This patch converts our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and nothing broke, and the AIO stuff magically started working. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
Btrfs cannot handle having logically non-contiguous requests submitted. For example if you have Logical: [0-4095][HOLE][8192-12287] Physical: [0-4095] [4096-8191] Normally the DIO code would put these into the same BIO's. The problem is we need to know exactly what offset is associated with what BIO so we can do our checksumming and unlocking properly, so putting them in the same BIO doesn't work. So add another check where we submit the current BIO if the physical blocks are not contigous OR the logical blocks are not contiguous. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
Because BTRFS can do RAID and such, we need our own submit hook so we can setup the bio's in the correct fashion, and handle checksum errors properly. So there are a few changes here 1) The submit_io hook. This is straightforward, just call this instead of submit_bio. 2) Allow the fs to return -ENOTBLK for reads. Usually this has only worked for writes, since writes can fallback onto buffered IO. But BTRFS needs the option of falling back on buffered IO if it encounters a compressed extent, since we need to read the entire extent in and decompress it. So if we get -ENOTBLK back from get_block we'll return back and fallback on buffered just like the write case. I've tested these changes with fsx and everything seems to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
This is similar to what already happens in the write case. If we have a short read while doing O_DIRECT, instead of just returning, fallthrough and try to read the rest via buffered IO. BTRFS needs this because if we encounter a compressed or inline extent during DIO, we need to fallback on buffered. If the extent is compressed we need to read the entire thing into memory and de-compress it into the users pages. I have tested this with fsx and everything works great. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
This patch adds metadata ENOSPC handling for the balance code. It is consisted by following major changes: 1. Avoid COW tree leave in the phrase of merging tree. 2. Handle interaction with snapshot creation. 3. make the backref cache can live across transactions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Previous patches make the allocater return -ENOSPC if there is no unreserved free metadata space. This patch updates tree log code and various other places to propagate/handle the ENOSPC error. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Introducing metadata reseravtion contexts has two major advantages. First, it makes metadata reseravtion more traceable. Second, it can reclaim freed space and re-add them to the itself after transaction committed. Besides add btrfs_block_rsv structure and related helper functions, This patch contains following changes: Move code that decides if freed tree block should be pinned into btrfs_free_tree_block(). Make space accounting more accurate, mainly for handling read only block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
All code in init_btrfs_i can be moved into btrfs_alloc_inode() Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
We already have fs_info->chunk_mutex to avoid concurrent chunk creation. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan, Zheng authored
The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 16 May, 2010 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: rtnetlink: make SR-IOV VF interface symmetric sctp: delete active ICMP proto unreachable timer when free transport tcp: fix MD5 (RFC2385) support
-
git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds authored
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: MIPS: Oprofile: Fix Loongson irq handler MIPS: N32: Use compat version for sys_ppoll. MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
-
Chris Wright authored
Now we have a set of nested attributes: IFLA_VFINFO_LIST (NESTED) IFLA_VF_INFO (NESTED) IFLA_VF_MAC IFLA_VF_VLAN IFLA_VF_TX_RATE This allows a single set to operate on multiple attributes if desired. Among other things, it means a dump can be replayed to set state. The current interface has yet to be released, so this seems like something to consider for 2.6.34. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
transport may be free before ICMP proto unreachable timer expire, so we should delete active ICMP proto unreachable timer when transport is going away. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP MD5 support uses percpu data for temporary storage. It currently disables preemption so that same storage cannot be reclaimed by another thread on same cpu. We also have to make sure a softirq handler wont try to use also same context. Various bug reports demonstrated corruptions. Fix is to disable preemption and BH. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 May, 2010 3 commits
-
-
Wu Zhangjin authored
The interrupt enable bit for the performance counters is in the Control Register $24, not in the counter register. loongson2_perfcount_handler(), we need to use Reported-by: Xu Hengyang <hengyang@mail.ustc.edu.cn> Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com> Cc: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/1198/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
-
Chandrakala Chavva authored
The sys_ppoll() takes struct 'struct timespec'. This is different for the N32 and N64 ABIs. Use the compat version to do the proper conversions. Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/1210/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
-
Shane McDonald authored
In the FPU emulator code of the MIPS, the Cause bits of the FCSR register are not currently writeable by the ctc1 instruction. In odd corner cases, this can cause problems. For example, a case existed where a divide-by-zero exception was generated by the FPU, and the signal handler attempted to restore the FPU registers to their state before the exception occurred. In this particular setup, writing the old value to the FCSR register would cause another divide-by-zero exception to occur immediately. The solution is to change the ctc1 instruction emulator code to allow the Cause bits of the FCSR register to be writeable. This is the behaviour of the hardware that the code is emulating. This problem was found by Shane McDonald, but the credit for the fix goes to Kevin Kissell. In Kevin's words: I submit that the bug is indeed in that ctc_op: case of the emulator. The Cause bits (17:12) are supposed to be writable by that instruction, but the CTC1 emulation won't let them be updated by the instruction. I think that actually if you just completely removed lines 387-388 [...] things would work a good deal better. At least, it would be a more accurate emulation of the architecturally defined FPU. If I wanted to be really, really pedantic (which I sometimes do), I'd also protect the reserved bits that aren't necessarily writable. Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com> To: anemo@mba.ocn.ne.jp To: kevink@paralogos.com To: sshtylyov@mvista.com Patchwork: http://patchwork.linux-mips.org/patch/1205/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> ---
-