- 26 Oct, 2010 34 commits
-
-
Eric Dumazet authored
new_inode() dirties a contended cache line to get increasing inode numbers. This limits performance on workloads that cause significant parallel inode allocation. Solve this problem by using a per_cpu variable fed by the shared last_ino in batches of 1024 allocations. This reduces contention on the shared last_ino, and give same spreading ino numbers than before (i.e. same wraparound after 2^32 allocations). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Split up inode_add_to_list/__inode_add_to_list. Locking for the two lists will be split soon so these helpers really don't buy us much anymore. The __ prefixes for the sb list helpers will go away soon, but until inode_lock is gone we'll need them to distinguish between the locked and unlocked variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Now that iunique is not abusing find_inode anymore we can move the i_ref increment back to where it belongs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Stop abusing find_inode_fast for iunique and opencode the inode hash walk. Introduce a new iunique_lock to protect the iunique counters once inode_lock is removed. Based on a patch originally from Nick Piggin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Dave Chinner authored
Before replacing the inode hash locking with a more scalable mechanism, factor the removal of the inode from the hashes rather than open coding it in several places. Based on a patch originally from Nick Piggin. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
Convert the inode LRU to use lazy updates to reduce lock and cacheline traffic. We avoid moving inodes around in the LRU list during iget/iput operations so these frequent operations don't need to access the LRUs. Instead, we defer the refcount checks to reclaim-time and use a per-inode state flag, I_REFERENCED, to tell reclaim that iget has touched the inode in the past. This means that only reclaim should be touching the LRU with any frequency, hence significantly reducing lock acquisitions and the amount contention on LRU updates. This also removes the inode_in_use list, which means we now only have one list for tracking the inode LRU status. This makes it much simpler to split out the LRU list operations under it's own lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Dave Chinner authored
The number of inodes allocated does not need to be tied to the addition or removal of an inode to/from a list. If we are not tied to a list lock, we could update the counters when inodes are initialised or destroyed, but to do that we need to convert the counters to be per-cpu (i.e. independent of a lock). This means that we have the freedom to change the list/locking implementation without needing to care about the counters. Based on a patch originally from Eric Dumazet. [AV: cleaned up a bit, fixed build breakage on weird configs Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Miklos Szeredi authored
If clone_mnt() happens while mnt_make_readonly() is running, the cloned mount might have MNT_WRITE_HOLD flag set, which results in mnt_want_write() spinning forever on this mount. Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen accidentally. But if it does happen it can hang the machine. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Make node look as if it was on hlist, with hlist_del() working correctly. Usable without any locking... Convert a couple of places where we want to do that to inode->i_hash. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
note: for race-free uses you inode_lock held Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We are in fill_super(); again, no inodes with zero i_count could be around until we set MS_ACTIVE. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
In fill_super() we hadn't MS_ACTIVE set yet, so there won't be any inodes with zero i_count sitting around. In put_super() we already have MS_ACTIVE removed *and* we had called invalidate_inodes() since then. So again there won't be any inodes with zero i_count... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
It's pointless - we *do* have busy inodes (root directory, for one), so that call will fail and attempt to change XIP flag will be ignored. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Namhyung Kim authored
If we have the appropriate page already, call __block_write_begin() directly instead of releasing and regrabbing it inside of block_write_begin(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Namhyung Kim authored
Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be used to distinguish whether it is a dir or not. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Chris Mason authored
The aio batching code is using igrab to get an extra reference on the inode so it can safely batch. igrab will go ahead and take the global inode spinlock, which can be a bottleneck on large machines doing lots of AIO. In this case, igrab isn't required because we already have a reference on the file handle. It is safe to just bump the i_count directly on the inode. Benchmarking shows this patch brings IOP/s on tons of flash up by about 2.5X. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Christoph Hellwig authored
Updated Documentation/filesystems/Locking to match the code. Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Namhyung Kim authored
bh->b_private is initialized within init_buffer(), thus the assignment should be redundant. Remove it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Randy Dunlap authored
Move the EXPORTFS kconfig symbol out of the NETWORK_FILESYSTEMS block since it provides a library function that can be (and is) used by other (non-network) filesystems. This also eliminates a kconfig dependency warning: warning: (XFS_FS && BLOCK || NFSD && NETWORK_FILESYSTEMS && INET && FILE_LOCKING && BKL) selects EXPORTFS which has unmet direct dependencies (NETWORK_FILESYSTEMS) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alex Elder <aelder@sgi.com> Cc: xfs-masters@oss.sgi.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Use sync_dirty_buffer instead of the incorrect opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
KAMEZAWA Hiroyuki authored
Now, rw_verify_area() checsk f_pos is negative or not. And if negative, returns -EINVAL. But, some special files as /dev/(k)mem and /proc/<pid>/mem etc.. has negative offsets. And we can't do any access via read/write to the file(device). So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Richard Weinberger authored
365b1818 ("add f_flags to struct statfs(64)") resized f_spare within struct statfs which caused a UML crash. There is no need to copy f_spare. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Valerie Aurora authored
Documentation: Fix trivial typo in filesystems/sharedsubtree.txt This typo is easy to ignore unless you have spent a great deal of time thinking about how to eliminate duplicate dentries in unions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Dan Carpenter authored
The intent was to verify that bh = affs_bread_ino(...) returned a valid pointer. We checked "ext_bh" earlier in the function and it's valid here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Eric Dumazet authored
Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Jan Kara authored
Currently isofs_get_blocks() was limited to handle only 4TB files on 32-bit architectures because of unnecessary use of iblock variable which was signed long. Just remove the variable. The error messages that were using this variable should have rather used b_off anyway because that is the block we are currently mapping. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
__block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Hugetlbfs used to need it, but after the destroy_inode and evict_inode changes it's not required anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Add a new helper to write out the inode using the writeback code, that is including the correct dirty bit and list manipulation. A few of filesystems already opencode this, and a lot of others should be using it instead of using write_inode_now which also writes out the data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
The caller that didn't need it is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 25 Oct, 2010 6 commits
-
-
Linus Torvalds authored
Merge branch 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci * 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci: (50 commits) davinci: fix remaining board support after io_pgoffst removal davinci: mityomapl138: make file local data static arm/davinci: remove duplicated include davinci: Initial support for Omapl138-Hawkboard davinci: MityDSP-L138/MityARM-1808 read MAC address from I2C Prom davinci: add tnetv107x touchscreen platform device input: add driver for tnetv107x touchscreen controller davinci: add keypad config for tnetv107x evm board davinci: add tnetv107x keypad platform device input: add driver for tnetv107x on-chip keypad controller net: davinci_emac: cleanup unused cpdma code net: davinci_emac: switch to new cpdma layer net: davinci_emac: separate out cpdma code net: davinci_emac: cleanup unused mdio emac code omap: cleanup unused davinci mdio arch code davinci: cleanup mdio arch code and switch to phy_id net: davinci_emac: switch to new mdio omap: add mdio platform devices davinci: add mdio platform devices net: davinci_emac: separate out davinci mdio ... Fix up trivial conflict in drivers/input/keyboard/Kconfig (two entries added next to each other - one from the davinci merge, one from the input merge)
-
git://git.open-osd.org/linux-open-osdLinus Torvalds authored
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Remove inode->i_count manipulation in exofs_new_inode fs/exofs: typo fix of faild to failed exofs: Set i_mapping->backing_dev_info anyway exofs: Cleaup read path in regard with read_for_write
-
Borislav Petkov authored
Commit b40827fa ("x86-32, mm: Add an initial page table for core bootstrapping") added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Boaz Harrosh authored
exofs_new_inode() was incrementing the inode->i_count and decrementing it in create_done(), in a bad attempt to make sure the inode will still be there when the asynchronous create_done() finally arrives. This was very stupid because iput() was not called, and if it was actually needed, it would leak the inode. However all this is not needed, because at exofs_evict_inode() we already wait for create_done() by waiting for the object_created event. Therefore remove the superfluous ref counting and just Thicken the comment at exofs_evict_inode() a bit. While at it change places that open coded wait_obj_created() to call the already available wrapper. CC: Dave Chinner <dchinner@redhat.com> CC: Christoph Hellwig <hch@lst.de> CC: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-
Joe Perches authored
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-
git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds authored
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (48 commits) [S390] topology: export cpu topology via proc/sysinfo [S390] topology: move topology sysinfo code [S390] topology: clean up facility detection [S390] cleanup facility list handling [S390] enable ARCH_DMA_ADDR_T_64BIT with 64BIT [S390] dasd: ignore unsolicited interrupts for DIAG [S390] kvm: Enable z196 instruction facilities [S390] dasd: fix unsolicited interrupt recognition [S390] dasd: fix use after free in dbf [S390] kvm: Fix badness at include/asm/mmu_context.h:83 [S390] cio: fix I/O cancel function [S390] topology: change default [S390] smp: use correct cpu address in print_cpu_info() [S390] remove ieee_instruction_pointer from thread_struct [S390] cleanup system call parameter setup [S390] correct alignment of cpuid structure [S390] cleanup lowcore access from external interrupts [S390] cleanup lowcore access from program checks [S390] pgtable: move pte_mkhuge() from hugetlb.h to pgtable.h [S390] fix SIGBUS handling ...
-