- 29 Dec, 2003 2 commits
-
-
Andrew Morton authored
The locking rules say that b_committed_data is covered by jbd_lock_bh_state(), so implement that during the start of commit, while throwing away unused shadow buffers. I don't expect that there is really a race here, but them's the rules.
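A minimal sketch of the pattern being described, assuming the unused shadow copy was kmalloc()ed (an illustration of the locking rule, not the commit's code):

    /*
     * Hedged sketch: discard an unused b_committed_data copy under
     * jbd_lock_bh_state(), per the rule that this field is covered by
     * the bh state lock.
     */
    struct buffer_head *bh = jh2bh(jh);

    jbd_lock_bh_state(bh);
    if (jh->b_committed_data) {
            kfree(jh->b_committed_data);    /* assumes a kmalloc()ed copy */
            jh->b_committed_data = NULL;
    }
    jbd_unlock_bh_state(bh);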
-
Andrew Morton authored
Sometimes kjournald has to refile a huge number of buffers because someone else wrote them out beforehand - they are all clean. This happens under a lock, and scheduling latencies of 88 milliseconds on a 2.7GHz CPU were observed. The patch forward-ports a little bit of the 2.4 low-latency patch to fix this problem. Worst-case latency on ext3 is now sub-half-millisecond, except for when the RCU dentry reaping softirq cuts in :(
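The low-latency technique being forward-ported is, roughly, to drop the list lock and reschedule periodically inside the long refile loop; a hedged sketch of that pattern (lock name as used by JBD of this era):

    /*
     * Hedged sketch: inside the buffer-refiling loop, give the CPU away
     * whenever someone else needs it instead of holding j_list_lock for
     * tens of milliseconds.
     */
    if (need_resched()) {
            spin_unlock(&journal->j_list_lock);
            cond_resched();
            spin_lock(&journal->j_list_lock);
    }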
-
- 22 Oct, 2003 1 commit
-
-
Andrew Morton authored
Plug the two-megabyte-per-day memory leak.
-
- 01 Aug, 2003 1 commit
-
-
Andrew Morton authored
We're getting assertion failures during commit in data=journal mode. journal_unmap_buffer() has unexpectedly donated this buffer to the committing transaction, and the commit-time assertion doesn't expect that to happen. It doesn't happen in 2.4 because both paths are under lock_journal(). Simply remove the assertion: the commit code will uncheckpoint the buffer and then recheckpoint it if needed.
-
- 10 Jul, 2003 2 commits
-
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru>

start_this_handle() takes t_outstanding_credits into account when calculating log free space, but journal_next_log_block() also accounts for blocks being logged. Hence, blocks are accounted twice. This effectively reduces the amount of log space available to transactions and forces more commits. Fix it by decrementing t_outstanding_credits each time we allocate a new journal block.
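A hedged sketch of the fix as described; the lock choice and the exact call site inside journal_next_log_block() are assumptions here:

    /*
     * Hedged sketch: when a log block is consumed on behalf of the
     * committing transaction, give back one reserved credit so the block
     * is not counted both in t_outstanding_credits and in the free-space
     * estimate used by start_this_handle().
     */
    spin_lock(&transaction->t_handle_lock);
    transaction->t_outstanding_credits--;
    spin_unlock(&transaction->t_handle_lock);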
-
Andrew Morton authored
- Remove accidental debug code from ext3 commit.
- /proc/profile documentation fix (Randy Dunlap).
- Use sb_breadahead() in ext2_preread_inode().
- Unused variable in mpage_writepages().
-
- 02 Jul, 2003 1 commit
-
-
Andrew Morton authored
    CPU0                                    CPU1

    journal_get_write_access(bh)
      (Add buffer to t_reserved_list)

                                            journal_get_write_access(bh)
                                              (It's already on t_reserved_list:
                                               nothing to do)

    (We decide we don't want to journal
     the buffer after all)
    journal_release_buffer()
      (It gets pulled off the transaction)

                                            journal_dirty_metadata()
                                              (The buffer isn't on the reserved
                                               list!  The kernel explodes)

Simple fix: just leave the buffer on t_reserved_list in journal_release_buffer(). If nobody ends up claiming the buffer then it will get thrown away at start of transaction commit.
-
- 25 Jun, 2003 1 commit
-
-
Andrew Morton authored
We need to unconditionally brelse() the buffer in there, because journal_remove_journal_head() leaves a ref behind. release_buffer_page() does that, so call it all the time: we can usually strip the buffers and free the page even if it was not marked buffer_freed(). This mainly affects data=journal mode.
-
- 20 Jun, 2003 1 commit
-
-
Andrew Morton authored
ext3 and JBD still have enormous numbers of lines which end in tabs. Fix them all up.
-
- 18 Jun, 2003 18 commits
-
-
Andrew Morton authored
With data=ordered it is often the case that a quick write-and-truncate will leave large numbers of pages on the page LRU with no ->mapping but with attached buffers, because ext3 was not ready to let the pages go at the time of truncation. These pages are trivially reclaimable, but their apparent absence confuses the VM overcommit accounting (they count as neither "free" nor pagecache) and makes the /proc/meminfo stats look odd. So what we do here is try to strip the buffers from these pages as the buffers exit the journal commit.
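A heavily simplified, hedged sketch of the stripping idea; the real commit-time helper in fs/jbd/commit.c does more checking than this:

    /*
     * Hedged sketch: once commit no longer needs this buffer and its page
     * has already been truncated out of the pagecache (->mapping is NULL),
     * try to strip the page's buffers so the VM can reclaim the page.
     */
    struct page *page = bh->b_page;

    if (page && !page->mapping && !TestSetPageLocked(page)) {
            page_cache_get(page);   /* hold the page across the strip */
            __brelse(bh);           /* drop this reference to the buffer */
            try_to_free_buffers(page);
            unlock_page(page);
            page_cache_release(page);
    }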
-
Andrew Morton authored
start_this_handle() can decide to add this handle to a transaction, but kjournald can then move that transaction into its commit phase. Extend the coverage of j_state_lock so that start_this_handle()'s examination of journal->j_state is atomic wrt journal_commit_transaction().
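A hedged, simplified sketch of the shape of the fix; the retry logic and per-transaction locking inside start_this_handle() are omitted:

    /*
     * Hedged sketch: examine and join the running transaction while
     * holding j_state_lock, so journal_commit_transaction() cannot move
     * it into commit underneath us.
     */
    spin_lock(&journal->j_state_lock);
    transaction = journal->j_running_transaction;
    if (transaction && transaction->t_state == T_RUNNING) {
            /* safe to attach the new handle to this transaction */
            handle->h_transaction = transaction;
    }
    spin_unlock(&journal->j_state_lock);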
-
Andrew Morton authored
Plug a conceivable race with the freeing up of transactions, and add some more debug checks.
-
Andrew Morton authored
This filesystem-wide sleeping lock is no longer needed. Remove it.
-
Andrew Morton authored
lock_kernel() is no longer needed in JBD. Remove all the lock_kernel() calls from fs/jbd/. Here is where I get to say "ex-parrot".
-
Andrew Morton authored
Remove the remaining sleep_on() calls from JBD.
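The usual replacement for a racy sleep_on() is wait_event() against an explicit condition; a hedged example using JBD's commit-completion fields (the general pattern, not necessarily the literal diff):

    /*
     * Hedged example: wait for a particular tid to commit without the
     * lost-wakeup race inherent in sleep_on().
     */
    wait_event(journal->j_wait_done_commit,
               !tid_gt(tid, journal->j_commit_sequence));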
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru>

We're about to remove lock_journal(), and it is lock_journal() which separates the running and committing transactions' revokes on the single revoke table. So implement two revoke tables and rotate them at commit time.
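A hedged sketch of the rotation at commit time; j_revoke and j_revoke_table[] are the journal fields this change introduces, but the exact switch site is an assumption:

    /*
     * Hedged sketch: point j_revoke (where new revokes are recorded) at
     * whichever of the two tables the committing transaction is NOT
     * using, so running and committing revokes never share a table.
     */
    if (journal->j_revoke == journal->j_revoke_table[0])
            journal->j_revoke = journal->j_revoke_table[1];
    else
            journal->j_revoke = journal->j_revoke_table[0];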
-
Andrew Morton authored
Go through all sites which use j_committing_transaction and ensure that the designed locking is correctly implemented there.
-
Andrew Morton authored
Implement the designed locking around journal->j_running_transaction. A lot more of the new locking scheme falls into place.
-
Andrew Morton authored
Provide the designed locking around the transaction's t_jcb callback list. It turns out that this is wholly redundant at present.
-
Andrew Morton authored
Provide the designed locking for transaction_t.t_updates.
-
Andrew Morton authored
This was a system-wide spinlock. Simple transformation: make it a filesystem-wide spinlock, in the JBD journal. That's a bit lame, and later it might be nice to make it per-transaction_t. But there are interesting ranking and ordering problems with that, especially around __journal_refile_buffer().
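The transformation is mechanical; a hedged before/after sketch of a typical call site, assuming (as the rest of the series suggests) that the lock in question is journal_datalist_lock becoming journal->j_list_lock:

    /* Before: one spinlock shared by every journal in the system. */
    spin_lock(&journal_datalist_lock);
    __journal_refile_buffer(jh);
    spin_unlock(&journal_datalist_lock);

    /* After: a lock per journal, i.e. per filesystem. */
    spin_lock(&journal->j_list_lock);
    __journal_refile_buffer(jh);
    spin_unlock(&journal->j_list_lock);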
-
Andrew Morton authored
Go through all use of b_transaction and implement the rules. Fairly straightforward.
-
Andrew Morton authored
We now start to move across the JBD data structure's fields, from the "innermost" outwards. Start with journal_head.b_frozen_data, because the locking for this field was partially implemented in jbd-010-b_committed_data-race-fix.patch. It is protected by jbd_lock_bh_state(). We keep the lock_journal() and spin_lock(&journal_datalist_lock) calls in place; later, spin_lock(&journal_datalist_lock) is replaced by spin_lock(&journal->j_list_lock). Of course, this completion of the locking around b_frozen_data also puts a lot of the locking for other fields in place.
-
Andrew Morton authored
journal_unlock_journal_head() is misnamed: what it does is to drop a ref on the journal_head and free it if that ref fell to zero. It doesn't actually unlock anything. Rename it to journal_put_journal_head().
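A hedged sketch of the put semantics described (reconstructed, not quoted from the patch; the locking that serialises the refcount against the bh<->jh attachment is elided):

    void journal_put_journal_head(struct journal_head *jh)
    {
            struct buffer_head *bh = jh2bh(jh);

            J_ASSERT_JH(jh, jh->b_jcount > 0);
            --jh->b_jcount;
            if (!jh->b_jcount && !jh->b_transaction) {
                    /* last ref and unowned: detach, drop the bh ref it held */
                    __journal_remove_journal_head(bh);
                    __brelse(bh);
            }
    }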
-
Andrew Morton authored
buffer_heads and journal_heads are joined at the hip. We need a lock to protect the joint and its refcounts. JBD is currently using a global spinlock for that. Change it to use one bit in bh->b_state.
-
Andrew Morton authored
This was a strange spinlock which was designed to prevent another CPU from ripping a buffer's journal_head away while this CPU was inspecting its state. Really, we don't need it - we can inspect that state directly from bh->b_state. So kill it off, along with a few things which used it which are themselves not actually used any more.
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru>

We have a race wherein the block allocator can decide that journal_head.b_committed_data is present and then use it, while kjournald concurrently frees it and sets the pointer to NULL. It goes oops. We introduce per-buffer_head "spinlocking" based on a bit in b_state. To do this we abstract out pte_chain_lock() and reuse the implementation. The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning in there) and we may move this to a hashed spinlock later.
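A hedged sketch of what the bit-in-b_state lock looks like, using the generic bit-spinlock helpers; BH_State is assumed to be a state bit that JBD reserves above BH_PrivateStart:

    /*
     * Hedged sketch: a per-buffer "spinlock" implemented as one bit of
     * bh->b_state.  Cheap in space, CPU-hungry while contended - hence
     * the warning mentioned above.
     */
    static inline void jbd_lock_bh_state(struct buffer_head *bh)
    {
            bit_spin_lock(BH_State, &bh->b_state);
    }

    static inline void jbd_unlock_bh_state(struct buffer_head *bh)
    {
            bit_spin_unlock(BH_State, &bh->b_state);
    }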
-
- 03 Apr, 2003 1 commit
-
-
Andrew Morton authored
From: Hua Zhong <hzhong@cisco.com>

The current ext3 totally ignores I/O errors that happen during journal_force_commit, causing user space to falsely believe the operation succeeded when it actually did not. This patch checks for I/O errors during journal_commit_transaction and aborts the journal when one is found. Originally I thought about reporting the error without aborting the journal, but that would probably need a new flag. Aborting the journal seems to be the easy way to signal "hey, something is wrong..".
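A hedged sketch of the check being added; where exactly in journal_commit_transaction() the checks and the abort sit is not shown here:

    /*
     * Hedged sketch: after waiting for a journal or data buffer to be
     * written, treat a non-uptodate buffer as a write error rather than
     * silently swallowing it.
     */
    wait_on_buffer(bh);
    if (unlikely(!buffer_uptodate(bh)))
            err = -EIO;             /* remember the failure */

    /* ...and once the writes have been gathered up: */
    if (err)
            journal_abort(journal, err);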
-
- 10 Feb, 2003 1 commit
-
-
Andrew Morton authored
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> points out a bug in ll_rw_block() usage.

Typical usage is:

    mark_buffer_dirty(bh);
    ll_rw_block(WRITE, 1, &bh);
    wait_on_buffer(bh);

The problem is that if the buffer was locked on entry to this code sequence (due to in-progress I/O), ll_rw_block() will not wait for that I/O and will not start new I/O. So this code ends up waiting on the _old_ I/O and then continues execution, leaving the buffer dirty.

It turns out that all callers were only writing one buffer, and they were all waiting on that writeout, so I added a new sync_dirty_buffer() function:

    void sync_dirty_buffer(struct buffer_head *bh)
    {
            lock_buffer(bh);
            if (test_clear_buffer_dirty(bh)) {
                    get_bh(bh);
                    bh->b_end_io = end_buffer_io_sync;
                    submit_bh(WRITE, bh);
            } else {
                    unlock_buffer(bh);
            }
    }

which allowed a fair amount of code to be removed, while adding the desired data-integrity guarantees. UFS has its own wrappers around ll_rw_block() which got in the way, so this operation was open-coded in that case.
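A hedged example of what a converted caller looks like under the new helper; the buffer-flag error check is the conventional one, not quoted from this patch:

    mark_buffer_dirty(bh);
    sync_dirty_buffer(bh);          /* locks, writes if dirty, waits */
    if (buffer_req(bh) && !buffer_uptodate(bh))
            err = -EIO;             /* the write we waited for failed */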
-
- 14 Jan, 2003 1 commit
-
-
Andrew Morton authored
This is the leak which Con found. Long story:

- If a dirty page is fed into ext3_writepage() during truncate, block_write_full_page() will return -EIO (it's outside i_size) and will leave the buffers dirty, in the expectation that discard_buffer() will clean them.
- ext3_writepage() then adds the still-dirty buffers to the journal's "async data list". These are buffers which are known to have had IO started; all commit needs to do is to wait on them.
- Meanwhile, truncate will chop the pages off the address_space. But truncate cannot invalidate the buffers (in journal_unmap_buffer()) because the buffers are attached to the committing transaction. (hm. This behaviour in journal_unmap_buffer() is bogus. We just never need to write these buffers.)
- ext3 commit will "wait on writeout" of these writepage buffers (even though it was never started) and will then release them from the journalling system.

So we end up with pages which are attached to no mapping, which are clean and which have dirty buffers. These are unreclaimable.

Aside: ext3-ordered has two buffer lists: the "sync data list" and the "async data list". The sync list consists of probably-dirty buffers which were dirtied in commit_write(); transaction commit must write all these out and wait on them. The async list supposedly consists of clean buffers which were attached to the journal in ->writepage; these have had IO started (by writepage), so commit merely needs to wait on them.

This is all designed for the 2.4 VM, really. In 2.5, tons of writeback goes via writepage (instead of the buffer LRU) and these buffers end up madly hopping between the async and sync lists. Plus it's arguably incorrect to just wait on the writes in commit: if the buffers were set dirty again (say, by zap_pte_range()) then perhaps we should write them again before committing.

So what the patch does is to remove the async list. All ordered-data buffers are now attached to the single "sync data list". When we come to commit, those buffers which are dirty will have IO started, and all buffers are waited upon. This means that the dirty buffers against a clean page which came about from block_write_full_page()'s -EIO will be written to disk in commit, which cleans them, and the page becomes reclaimable. No leak.

It seems bogus to write these buffers in commit, and indeed it is. But ext3 will not allow those blocks to be reused until the commit has ended, so there is no corruption risk. And the amount of data involved is low - it only comes about as a race between truncate and writepage().
-
- 09 Oct, 2002 1 commit
-
-
Andrew Morton authored
From Peter Chubb:

- printk changes: a sector_t can be either 64 or 32 bits, so cast it to a printable type that is at least 64 bits wide on all platforms (i.e., cast to unsigned long long and use a %llu format).
- Transition to 64-bit sector_t: fix isofs_get_blocks by converting the (possibly 64-bit) arg to a long.
- SCSI 64-bit sector_t cleanup: capacity is now stored as a sector_t; make sure that the READ_CAPACITY command doesn't sign-extend its returned value; avoid 64-bit division when printing the size in MB.

Still to do:
- 16-byte SCSI commands
- individual SCSI drivers
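A hedged example of the printk convention described; report_size() and its arguments are illustrative, not from the patch:

    /*
     * sector_t may be 32 or 64 bits depending on configuration, so cast
     * to unsigned long long and print with %llu to be correct everywhere.
     */
    void report_size(const char *name, sector_t nr_sects)
    {
            printk(KERN_INFO "%s: %llu sectors\n",
                   name, (unsigned long long)nr_sects);
    }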
-
- 04 Jul, 2002 1 commit
-
-
Andrew Morton authored
This is a patch which Stephen has applied to ext3's 2.4 repository. Originally written by Andreas, generalised somewhat by Stephen. Add a JBD callback mechanism, requested for InterMezzo. We allow JBD's client to request notification when a given handle's IO finally commits to disk, so that clients can manage their own writeback state asynchronously.
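A hedged sketch of how a client might use the hook; the names journal_callback_set() and struct journal_callback are assumptions about the interface this patch adds, so treat the shape, not the names, as the point:

    struct my_commit_cb {
            struct journal_callback jcb;    /* embedded callback node */
            struct my_object *obj;          /* hypothetical client state */
    };

    /* Called by JBD once the handle's transaction has committed. */
    static void my_commit_done(struct journal_callback *jcb, int error)
    {
            struct my_commit_cb *cb = (struct my_commit_cb *)jcb;

            my_object_mark_stable(cb->obj, error);  /* hypothetical */
    }

    /* ...inside the transaction, before journal_stop(handle): */
    journal_callback_set(handle, my_commit_done, &cb->jcb);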
-
- 18 Jun, 2002 1 commit
-
-
Andrew Morton authored
Stephen and Neil Brown recently worked this out. It's a rare situation which only affects data=journal mode. Fix problem in data=journal mode where writeback could be left pending on a journaled, deleted disk block. If that block then gets reallocated, we can end up with an alias in which the old data can be written back to disk over the new. Thanks to Neil Brown for spotting this and coming up with the initial fix.
-
- 20 May, 2002 1 commit
-
-
Christoph Hellwig authored
The locks.h header contained some hand-crafted locking routines from the pre-SMP days. In 2.5 only lock_super/unlock_super are left, guarded by a number of completely unrelated (!) includes. This patch moves lock_super/unlock_super to fs.h, which defines the struct super_block they operate on, removes locks.h, and updates all callers to stop including it, adding the missing, previously nested includes where needed.
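For reference, the two helpers being relocated are trivial; a hedged sketch of how they look once they live in fs.h (the s_lock semaphore is the assumption here):

    static inline void lock_super(struct super_block *sb)
    {
            down(&sb->s_lock);
    }

    static inline void unlock_super(struct super_block *sb)
    {
            up(&sb->s_lock);
    }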
-
- 05 May, 2002 1 commit
-
-
Andrew Morton authored
Pages under writeback are not locked, so it is possible (and quite legal) for a page to be under readpage() while it is still under writeback - for a partially uptodate page with blocksize < PAGE_CACHE_SIZE. When this happens, the read and write I/O completion handlers get confused over the shared BH_Async usage, and the page ends up not getting PG_writeback cleared; truncate gets stuck in D state. The patch separates the read and write I/O completion state. It also shuffles the buffer_head fields around: putting the commonly-accessed b_state at offset zero shrinks the kernel by a few hundred bytes, because it can be accessed with indirect addressing rather than indirect+indexed.
-
- 30 Apr, 2002 4 commits
-
-
Andrew Morton authored
Implements hashed waitqueues for buffer_heads. Drops twelve bytes from struct buffer_head.
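A hedged sketch of the hashing idea; the table size and hash function are illustrative, and the real patch keeps the table cacheline-friendly:

    /*
     * Hedged sketch: instead of a wait_queue_head_t inside every
     * buffer_head, hash the bh pointer into a small shared table of
     * waitqueues.  Unrelated buffers may share a queue; waiters simply
     * recheck their own buffer's state after waking.
     */
    #define BH_WAIT_TABLE_ORDER     6
    static wait_queue_head_t bh_wait_queue[1 << BH_WAIT_TABLE_ORDER];

    static wait_queue_head_t *bh_waitq_head(struct buffer_head *bh)
    {
            return &bh_wait_queue[hash_ptr(bh, BH_WAIT_TABLE_ORDER)];
    }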
-
Andrew Morton authored
Moves all buffer_head-related stuff out of linux/fs.h and into linux/buffer_head.h. buffer_head.h is currently included at the very end of fs.h, so it is possible to include buffer_head.h directly from all .c files and remove this nested include.

Also rationalises all the set_buffer_foo() and mark_buffer_bar() functions. We have:

    set_buffer_foo(bh)
    clear_buffer_foo(bh)
    buffer_foo(bh)

and, in some cases where needed:

    test_set_buffer_foo(bh)
    test_clear_buffer_foo(bh)

And that's it. The BUFFER_FNS() and TAS_BUFFER_FNS() macros generate all the above real inline functions. Normally not a big fan of cpp abuse, but in this case it fits. These function-generating macros are available to filesystems to expand their own b_state functions. JBD uses this in one case.
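A hedged sketch of the generator pattern described; the real BUFFER_FNS() in buffer_head.h emits these same three accessors from a bit name:

    #define BUFFER_FNS(bit, name)                                       \
    static inline void set_buffer_##name(struct buffer_head *bh)       \
    {                                                                   \
            set_bit(BH_##bit, &bh->b_state);                            \
    }                                                                   \
    static inline void clear_buffer_##name(struct buffer_head *bh)     \
    {                                                                   \
            clear_bit(BH_##bit, &bh->b_state);                          \
    }                                                                   \
    static inline int buffer_##name(const struct buffer_head *bh)      \
    {                                                                   \
            return test_bit(BH_##bit, &bh->b_state);                    \
    }

    /* e.g. BUFFER_FNS(Uptodate, uptodate) generates buffer_uptodate() etc. */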
-
Andrew Morton authored
Removes the buffer_head unused list; use a mempool instead. The reduced lock contention provided about a 10% boost on Anton's 12-way.
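A hedged sketch of the setup implied, with era-appropriate mempool calls; the pool size and wrapper names are illustrative:

    static kmem_cache_t *bh_cachep;
    static mempool_t *bh_mempool;

    static void *bh_pool_alloc(int gfp_mask, void *pool_data)
    {
            return kmem_cache_alloc(pool_data, gfp_mask);
    }

    static void bh_pool_free(void *element, void *pool_data)
    {
            kmem_cache_free(pool_data, element);
    }

    void __init buffer_init(void)
    {
            bh_cachep = kmem_cache_create("buffer_head",
                            sizeof(struct buffer_head), 0, 0, NULL, NULL);
            bh_mempool = mempool_create(256 /* illustrative */,
                            bh_pool_alloc, bh_pool_free, bh_cachep);
    }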
-
Andrew Morton authored
[ I reversed the order in which writeback walks the superblock's dirty inodes. It sped up dbench's unlink phase greatly. I'm such a sleaze ]

The core writeback patch. Switches file writeback from the dirty buffer LRU over to address_space.dirty_pages.

- The buffer LRU is removed.
- The buffer hash is removed (uses blockdev pagecache lookups).
- The bdflush and kupdate functions are implemented against address_spaces, via pdflush.
- The relationship between pages and buffers is changed:
  - If a page has dirty buffers, it is marked dirty.
  - If a page is marked dirty, it *may* have dirty buffers.
  - A dirty page may be "partially dirty"; block_write_full_page discovers this.
- A bunch of consistency checks of the form

      if (!something_which_should_be_true())
              buffer_error();

  have been introduced. These fog the code up but are important for ensuring that the new buffer/page code is working correctly.
- New locking (inode.i_bufferlist_lock) is introduced for exclusion from try_to_free_buffers(). This is needed because set_page_dirty is called under spinlock, so it cannot lock the page, but it needs access to page->buffers to set them all dirty. i_bufferlist_lock is also used to protect inode.i_dirty_buffers.
- fs/inode.c has been split: all the code related to file data writeback has been moved into fs/fs-writeback.c.
- Code related to file data writeback at the address_space level is in the new mm/page-writeback.c.
- try_to_free_buffers() is now non-blocking.
- Switches vmscan.c over to understand that all pages with dirty data are now marked dirty.
- Introduces a new a_op for VM writeback:

      ->vm_writeback(struct page *page, int *nr_to_write)

  This is a bit half-baked at present. The intent is that the address_space is given the opportunity to perform clustered writeback: to allow it to opportunistically write out disk-contiguous dirty data which may be in other zones, and to allow delayed-allocate filesystems to get good disk layout.
- Added address_space.io_pages: pages which are being prepared for writeback. This is here for two reasons:
  1: It will be needed later, when BIOs are assembled direct against pagecache, bypassing the buffer layer. It avoids a deadlock which would occur if someone moved the page back onto the dirty_pages list after it was added to the BIO but before it was submitted. (hmm. This may not be a problem with PG_writeback logic.)
  2: It avoids a livelock which would occur if some other thread is continually redirtying pages.
- There are two known performance problems in this code:
  1: Pages which are locked for writeback cause undesirable blocking when they are being overwritten. A patch which leaves pages unlocked during writeback comes later in the series.
  2: While inodes are under writeback, they are locked. This causes namespace lookups against the file to get unnecessarily blocked in wait_on_inode(). This is a fairly minor problem. I don't have a fix for this at present - I'll fix it when I attach dirty address_spaces direct to super_blocks.
- The patch vastly increases the amount of dirty data which the kernel permits highmem machines to maintain. This is because the balancing decisions are made against the amount of memory in the machine, not against the amount of buffercache-allocatable memory. This may be very wrong, although it works fine for me (2.5 gigs). We can trivially go back to the old-style throttling with s/nr_free_pagecache_pages/nr_free_buffer_pages/ in balance_dirty_pages(). But better would be to allow blockdev mappings to use highmem (I'm thinking about this one, slowly), and to move writer-throttling and writeback decisions into the VM (modulo the file-overwriting problem).
- Drops 24 bytes from struct buffer_head. More to come.
- There's some gunk like super_block.flags:MS_FLUSHING which needs to be killed. We need a better way of providing collision avoidance between pdflush threads, to prevent more than one pdflush thread working a disk at the same time. The correct way to do that is to put a flag in the request queue to say "there's a pdflush thread working this disk". This is easy to do: just generalise the "ra_pages" pointer to point at a struct which includes ra_pages and the new collision-avoidance flag.
-
- 09 Feb, 2002 1 commit
-
-
Dave Jones authored
Big bits first; I'll redo the smaller bits tomorrow after some sleep. Same as last time, rediffed against pre5.
-