- 10 Mar, 2005 40 commits
-
-
Paul Jackson authored
This my cpuset patch, with the following changes in the last two weeks: 1) Updated to 2.6.8.1-mm1 2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty, not copied from parent. Needed to avoid breaking exclusive property. 3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top cpuset from cpu_possible_map after smp_init() called. 4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages() if the current tasks cpuset mems_allowed has changed. Use a cpuset generation number, bumped on any cpuset memory placement change, to make this check efficient. Update the tasks mems_allowed from its cpuset, if the cpuset has changed. 5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset, then update its cpus_allowed, using set_cpus_allowed(). 6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to reflect above changes (4) and (5). I continue to recommend the following patch for inclusion in your 2.6.9-*mm series, when that opens. It provides an important facility for high performance computing on large systems. Simon Derr of Bull (France) and myself are the primary authors. Erich Focht has indicated that NEC is also a potential user of this patch on the TX-7 NUMA machines, and that he "would very much welcome the inclusion of cpusets." I offer this update to lkml, in order to invite continued feedback. The one prerequiste patch for this cpuset patch was just posted before this one. That was a patch to provide a new bitmap list format, of which cpusets is the first user. This patch has been built on top of 2.6.8.1-mm1, for the arch's: i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64 with and without CONFIG_CPUSET. It has been booted and tested on ia64 (sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in the final link step. === Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks. Cpusets constrain the CPU and Memory placement of tasks to only the processor and memory resources within a tasks current cpuset. They form a nested hierarchy visible in a virtual file system. These are the essential hooks, beyond what is already present, required to manage dynamic job placement on large systems. Cpusets require small kernel hooks in init, exit, fork, mempolicy, sched_setaffinity, page_alloc and vmscan. And they require a "struct cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t (to go along with the "cpus_allowed" cpumask_t that's already there) in each task struct. These hooks: 1) establish and propagate cpusets, 2) enforce CPU placement in sched_setaffinity, 3) enforce Memory placement in mbind and sys_set_mempolicy, 4) restrict page allocation and scanning to mems_allowed, and 5) restrict migration and set_cpus_allowed to cpus_allowed. The other required hook, restricting task scheduling to CPUs in a tasks cpus_allowed mask, is already present. Cpusets extend the usefulness of, the existing placement support that was added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and mbind() and set_mempolicy() for memory placement. On smaller or dedicated use systems, the existing calls are often sufficient. On larger NUMA systems, running more than one, performance critical, job, it is necessary to be able to manage jobs in their entirety. This includes providing a job with exclusive CPU and memory that no other job can use, and being able to list all tasks currently in a cpuset. A given job running within a cpuset, would likely use the existing placement calls to manage its CPU and memory placement in more detail. Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is represented by a directory in the cpuset virtual file system, normally mounted at /dev/cpuset. Each cpuset directory provides the following files, which can be read and written: cpus: List of CPUs allowed to tasks in that cpuset. mems: List of Memory Nodes allowed to tasks in that cpuset. tasks: List of pid's of tasks in that cpuset. cpu_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its CPUs (no sibling or cousin cpuset may overlap CPUs). mem_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its Memory Nodes (no sibling or cousin may overlap). notify_on_release: Flag (0 or 1) - if set, then /sbin/cpuset_release_agent will be invoked, with the name (/dev/cpuset relative path) of that cpuset in argv[1], when the last user of it (task or child cpuset) goes away. This supports automatic cleanup of abandoned cpusets. In addition one new filetype is added to the /proc file system: /proc/<pid>/cpuset: For each task (pid), list its cpuset path, relative to the root of the cpuset file system. This file is read-only. New cpusets are created using 'mkdir' (at the shell or in C). Old ones are removed using 'rmdir'. The above files are accessed using read(2) and write(2) system calls, or shell commands such as 'cat' and 'echo'. The CPUs and Memory Nodes in a given cpuset are always a subset of its parent. The root cpuset has all possible CPUs and Memory Nodes in the system. A cpuset may be exclusive (cpu or memory) only if its parent is similarly exclusive. See further Documentation/cpusets.txt, at the top of the following patch. /proc interface: It is useful, when learning and making new uses of cpusets and placement to be able to see what are the current value of a tasks cpus_allowed and mems_allowed, which are the actual placement used by the kernel scheduler and memory allocator. The cpus_allowed and mems_allowed values are needed by user space apps that are micromanaging placement, such as when moving an app to a obtained by that app within its cpuset using sched_setaffinity, mbind and set_mempolicy. The cpus_allowed value is also available via the sched_getaffinity system call. But since the entire rest of the cpuset API, including the display of mems_allowed added here, is via an ascii style presentation in /proc and /dev/cpuset, it is worth the extra couple lines of code to display cpus_allowed in the same way. This patch adds the display of these two fields to the 'status' file in the /proc/<pid> directory of each task. The fields are only added if CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed field of each task). The new output lines look like: $ tail -2 /proc/1/status Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff Mems_allowed: ffffffff,ffffffff Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Simon Derr <simon.derr@bull.net> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Paul Jackson authored
[This is a copy of the bitmap list format patch that I submitted to lkml on 9 Aug 2004, after removing the prefix character fluff^Wstuff. I include it here again just to get it associated with the current cpuset patch, which will follow in a second email 10 seconds later. -pj] A bitmap print and parse format that provides lists of ranges of numbers, to be first used for by cpusets (next patch). Cpusets provide a way to manage subsets of CPUs and Memory Nodes for scheduling and memory placement, via a new virtual file system, usually mounted at /dev/cpuset. Manipulation of cpusets can be done directly via this file system, from the shell. However, manipulating 512 bit cpumasks or 256 bit nodemasks (which will get bigger) via hex mask strings is painful for humans. The intention is to provide a format for the cpu and memory mask files in /dev/cpusets that will stand the test of time. This format is supported by a couple of new lib/bitmap.c routines, for printing and parsing these strings. Wrappers for cpumask and nodemask are provided. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Add a `:'. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
a) If old_dir == new_dir, we don't need to update the ".." entry, so this doesn't touch it if unneeded. b) old algorithm is using 1) add a new entry (doen't point the data cluster yet). 2) remove a old entry. 3) switch the data cluster to new entry. 4) update a ".." entry this order lose the data cluster when between 2) and 3). Instead of above, this is using 1) add a new entry (doen't point the data cluster yet). 2) switch the data cluster to new entry. 3) update a ".." entry if needed. 4) remove a old entry. this order would not lose the data cluster, but on disk metadata is corrupted on some point (later, fsck would recover this corruption without losing the data). c) use synchronous update. d) Fix the corrupted directory check created by 1 of new algorithm. 1) Fix fat_bmap(). If directory's ->i_start == 0, fat_bmap() is handling it as root directory, this removes that strange behavior. 2) On mkdir() path if directory's ->i_start == 0, returns -EIO. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Use the synchronous updates, in order to guarantee that the writing to a disk is completeing when a system call returns. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Fix a missing error check for sync_buffer_dirty(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Instead of mark_inode_dirty(inode); if (IS_SYNC(inode)) fat_sync_inode(inode); use this if (IS_SYNC(inode)) fat_sync_inode(inode); else mark_inode_dirty(inode); And if occurs a error, restore the ->i_start and ->i_logstart. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>- Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Some mark_inode_dirty() is unneeded. Those are already detached (it's not written) or change a ->i_nlink count only (fatfs don't have). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Since MSDOS_SB() is inline function, it increases text size at each calls. I don't know whether there is __attribute__ for avoiding this. This removes the multiple call. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
The "i_pos" can calculate later, so this makes the "i_pos" when it's needed. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
In order not to write out the same block repeatedly, rewrite the fat_add_entries(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
With this change, ->mkdir() uses the correct updating order. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
The msdos_add_entry() use similar interface to vfat_add_entry(). And use a same timestamp on some operations path. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Cleans up the msdos_rename(). (use the logic similar to vfat_rename().) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
With this, the vfat_build_slots() builds the completely data including the timestamp and cluster. (But this is not using "cluster", it's not complete yet) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
The msdos_find() provide the "struct fat_slot_info". Then some cleanups. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Use "struct fat_slot_info" for fat_scan(). But ".." entry can not provide valid informations for inode, so add the fat_get_dotdot_entry() as special case. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Just use ERR_PTR() instead of getting the error code by additional argument. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
This changes the fat_slot_info->nr_slot, now it's total counts which include a shortname entry. And this adds a fat_remove_entries() which use the ->nr_slots. In order not to write out the same block repeatedly, fat_remove_entries() was rewritten from vfat_remove_entries(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
The fat_search_long() provide the "struct fat_slot_info" by this change. So, vfat_find() became to be enough simple, and it just returns 0 or error. And the error check of vfat_find() is also simplify. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Add "struct fat_slot_info" for updating data of the directory entries. 1) Rename "struct vfat_slot_info" to "struct fat_slot_info" 2) Add "de" and "bh" to fat_slot_info instead of using argument. 3) Replace the "vfat_slot_info + de + bh" by new fat_slot_info Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
This changes ->free_clusters and ->prev_free from "int" to "unsigned int". These value should be never negative value (but it's using 0xffffffff(-1) as undefined state). With this changes, fatfs would handle the corruption of free_clusters more proper. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
In order not to write the same block repeatedly, rewrite the FAT entry access stuff. And this separates the "allocate the cluster" and "link to cluster chain" operations for expanding the file/dir safely. (fat_alloc_clusters() allocates the cluster, and fat_chain_add() links allocated cluster to the chain which inode has.) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
Adds MS_SYNCHRONOUS flag support. Signed-off-by: Colin Leroy <colin@colino.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
In the case of dotsOK, re-initialization of "ptname" pointer is needed, otherwise, "ptname" is pointing the previous start position. This fixes it. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
This updates the FAT attributes as well as (hopefully) corrects the handling of VFAT ctime. The FAT attributes are implemented as a 32-bit ioctl, per the previous discussions. Signed-Off-By: H. Peter Anvin <hpa@zytor.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hirofumi Ogawa authored
With Christoph Hellwig <hch@lst.de> These patches adds the `-o sync' and `-o dirsync' supports to fatfs. If user specified that option, the fatfs does traditional ordered updates by using synchronous writes. If compared to before, these patches will show a improvement of robustness I think. `-o sync' - writes all buffers out before returning from syscall. `-o dirsync' - writes the directory's metadata, and unreferencing operations of data block. remaining to be done fat_generic_ioctl(), fat_notify_change(), ATTR_ARCH of fat_xxx_write[v], and probably, filling hole in cont_prepare_write(), NOTE: Since fatfs doesn't have link-count, unfortunately ->rename() is not safe order at all. It may make the shared blocks, but user shouldn't lose the data by ->rename(). If you test this, please use the dosfstools at http://www.zip.com.au/~akpm/linux/patches/stuff/fatfsprogs.tar.bz2 This is fixing several bugs of dosfstools. And "2/29" patch from hpa adds new ioctl, the attached archive is also including the commands for testing it. This patch fixes vectored write support on fat to do the nessecary non-standard action done in write() aswell. Also adds aio support and makes read/write wrappers around the aio version. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
J. Bruce Fields authored
Removing the spurious (!clnt) check, as in the following, will also simplify that conflict resolution a bit. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
gam3@gam3.net authored
ETXTBSY doesn't have a direct anaolog in NFS, so just map it to nfserr_io. As this is the default error code this does not change the operation of nfsd. It only reduces logging. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Get rid of remove_lease, use setlease() with F_UNLCK Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Remove nfs4_check_deleg_recall(). Move its checks into __setlease() via two new lock_manager callbacks, fl_mylease and fl_change, so that all leases (not just NFSv4 lease as with nfs4_check_deleg_recall) are checked. Default implementations of fl_mylease and fl_change are provided for the sake of the fcntl_setlease interface. Both callbacks must always be defined. fl_mylease: for the NFSv4 server, this check is used to see if an existing lease comes from the same client. For the fcntl_setlease interface, the existing logic is preserved. the fl_mylease check sees if the existing lease is from the input filp. fl_change: called if the fl_mylease returns true the NFSv4 server does not hand out a delegation to a client that already has one. -EAGAIN is returned. Otherwise lease_modify is used. For the fcntl_setlease interface, the exisiting logic is preserved: The callback used in lease_modify(). Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
We get a NULL dereference here anyway, no need to BUG() explicitly. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
Note that the second parameter is unused. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
It is the responsibility of the code that runs in the do_recall thread to drop the reference to the delegation. So if we fail to run that thread, then we need to do it ourselves here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Neil Brown authored
The dl_state flag isn't actually useful. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-