- 18 Feb, 2004 40 commits
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> For some reason dm_table_create() was allocating with GFP_NOIO rather than GFP_KERNEL.
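A minimal sketch of the kind of change involved (the exact call site is an assumption, not quoted from the patch):

    /* Before: GFP_NOIO forbids the allocator from recursing into
     * the I/O path to reclaim memory - only needed when already
     * allocating on an I/O path. */
    t = kmalloc(sizeof(*t), GFP_NOIO);

    /* After: table creation happens in ordinary process context,
     * so the unrestricted flag is appropriate. */
    t = kmalloc(sizeof(*t), GFP_KERNEL);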
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Tidy up the error path for alloc_dev()
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Make sure that we maintain ordering when deferring bios.
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Remove struct dm_deferred_io from dm.c. [Christophe Saout]
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Move to_bytes() and to_sectors() into dm.h
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Export dm_vcalloc()
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> With this patch, md uses two major numbers for arrays. One major is number 9, with name 'md', and has unpartitioned md arrays, one per minor number. The other major is allocated dynamically, with name 'mdp', and has one array for every 64 minors, allowing for up to 63 partitions. The arrays under one major are completely separate from the arrays under the other. The preferred names for devices with the new major are of the form: /dev/md/d1p3 # partition 3 of device 1 - minor 67 When a partitioned md device is assembled, the partitions are not recognised until after the whole-array device is opened again. A future version of mdadm will perform this open so that the extra step is transparent.
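The minor-number arithmetic behind /dev/md/d1p3 works out as follows (a hypothetical helper for illustration, not code from the patch):

    /* 'mdp' major: 64 minors per array; minor 0 of each group of
     * 64 is the whole-array device, 1..63 are the partitions. */
    static inline int mdp_minor(int unit, int partition)
    {
        return (unit << 6) + partition;   /* 64 minors per unit */
    }

    /* /dev/md/d1p3: unit 1, partition 3 -> (1 << 6) + 3 = 67 */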
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> Currently raid1 uses PAGE_SIZE read/write requests for resync, as it doesn't know how to honour per-device restrictions. This patch uses bio_add_page to honour those restrictions, and ups the limit on request size to 64K. This has a measurable impact on rebuild speed (25M/s -> 60M/s).
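The key to the technique is bio_add_page()'s return value: it only adds a page if the resulting request still satisfies the device's restrictions. Roughly (the surrounding names are assumed):

    /* Grow the resync bio until the device's limits say stop. */
    while (size > 0) {
        int len = min(size, (int)PAGE_SIZE);

        /* bio_add_page() returns the number of bytes added; less
         * than we asked for means a per-device restriction would
         * be violated, so submit this bio and start a new one. */
        if (bio_add_page(bio, page, len, 0) < len)
            break;
        size -= len;
        page = next_resync_page(page);   /* assumed helper */
    }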
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> For each resync request, we allocate an "r1_bio" which has a bio "master_bio" attached that goes largely unused. We also allocate a read_bio which is used. This patch removes the read_bio and just uses the master_bio instead. This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was being used. We also introduce a new "sectors" field into the r1_bio, as we can no longer rely on master_bio->bi_sectors.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> next_r1 is never used, so it can just go. read_bio isn't needed as we can easily use one of the pointers in the write_bios array - write_bios[r1_bio->read_disk]. So rename "write_bios" to "bios" and store the pointer to the read bio in there.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> The only time it is really needed is to differentiate a retry-on-fail from a write-after-read-for-resync request to raid1d. So we use a bit in 'state' for that.
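A sketch of the idea (the flag and helper names here are assumptions, not necessarily what the patch uses):

    /* Mark resync requests when they are issued... */
    set_bit(R1BIO_IsSync, &r1_bio->state);

    /* ...so raid1d can tell the two cases apart later: */
    if (test_bit(R1BIO_IsSync, &r1_bio->state))
        sync_request_write(mddev, r1_bio);   /* write out to mirrors */
    else
        retry_read(r1_bio);                  /* normal read failed */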
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> Instead of having a single end_request handler that must determine whether it was a read or a write request, we have two separate handlers, which makes each of them easier to follow.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> The "START_ARRAY" ioctl depends on major/minor numbers (as stored in the raid superblock) being stable over reboots, which is increasingly untrue. There are better ways to start an array (e.g. with mdadm), so we mark the ioctl as deprecated for 2.6 and will remove it in 2.7.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> From: Stephen Hemminger <shemminger@osdl.org> Date: Fri, 12 Sep 2003 11:31:06 -0700 NFS won't build w/o CONFIG_PROC_FS. Looks like typos (or a C++ programmer) in stats.h.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> From: shemminger@osdl.org Sat Sep 6 09:19:50 2003 Date: Fri, 5 Sep 2003 16:19:30 -0700 Converts /proc/net/rpc/nfs and /proc/net/rpc/nfsd to use the simpler seq_file interface.
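For a simple statistics file, a seq_file conversion boils down to a show routine plus single_open(); a minimal sketch of the pattern (the nfs-specific names and counters are assumed):

    #include <linux/seq_file.h>

    static int nfs_stats_show(struct seq_file *m, void *v)
    {
        /* seq_printf() replaces hand-rolled sprintf-into-a-page
         * code; the seq_file core handles buffering and offsets. */
        seq_printf(m, "rpc calls %u retrans %u\n",
                   stats.calls, stats.retrans);   /* assumed counters */
        return 0;
    }

    static int nfs_stats_open(struct inode *inode, struct file *file)
    {
        return single_open(file, nfs_stats_show, NULL);
    }

    static struct file_operations nfs_stats_fops = {
        .open    = nfs_stats_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
    };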
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> There is no way to return an error from a cache init routine, so instead we make sure to pre-allocate the memory needed, and free it after the lookup if the lookup failed.
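The shape of the pattern (all names here are illustrative):

    /* Allocate up front, where failure can still be reported... */
    new = kmalloc(sizeof(*new), GFP_KERNEL);
    if (!new)
        return -ENOMEM;

    item = cache_lookup(detail, key);
    if (item) {
        /* ...and discard the pre-allocation if it goes unused. */
        kfree(new);
        return 0;
    }
    /* lookup failed: initialize and insert the pre-allocated item */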
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> When adding an item to a sunrpc/svc cache that contains kmalloced data, it is useful to move the malloced data out of the key object into the new cache object rather than copying it (as then we would need to cope with kmalloc failure and such). This means modifying the original. If the kmalloced data forms part of the key, then we must not move the data out until the key isn't needed any more. So this patch moves the call to "INIT" on a new item (which fills in the key) to *after* the item has been found (or not), and also makes sure we only call the HASH function once. Thanks to "J. Bruce Fields" <bfields@fieldses.org>. Also: 1/ remove an unnecessary assignment; 2/ fix comments that lag behind the implementation.
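Schematically, the ordering now looks like this (a sketch, not the actual sunrpc lookup macros):

    hash = HASH(key);                  /* computed exactly once */
    item = find_in_bucket(hash, key);  /* key must still be intact */
    if (!item) {
        /* Only now may INIT steal the kmalloced data out of the
         * key: the key is no longer needed for comparisons. */
        INIT(new, key);
        insert_in_bucket(hash, new);
    }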
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> We currently call cache_put, which can schedule(), under a spin_lock. This patch moves that call outside the spinlock.
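The shape of the fix (the unlink helper is assumed):

    struct cache_head *ch;

    spin_lock(&cache_list_lock);
    ch = unlink_expired_entry(detail);   /* assumed helper */
    spin_unlock(&cache_list_lock);

    /* cache_put() can drop the last reference and schedule(),
     * so it must run with no spinlock held. */
    if (ch)
        cache_put(ch, detail);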
-
Andrew Morton authored
From: Valdis.Kletnieks@vt.edu 15 changes of #if to #ifdef, and 2 places where CONFIG_FOO should be defined(CONFIG_FOO). This gets rid of spurious warnings when you build with "-Wundef", so that for a preprocessor directive like #if CONFIG_ETRAX_DS1302_RSTBIT == 27 you'll be told if it's substituting a zero, rather than getting silent weirdness and unexpected code generation.
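The distinction in a nutshell:

    /* Wrong for a present/absent test: if CONFIG_FOO is unset the
     * preprocessor silently substitutes 0 (and -Wundef warns). */
    #if CONFIG_FOO
    #endif

    /* Right for a present/absent test: */
    #ifdef CONFIG_FOO
    #endif

    /* Right when the test must combine with other conditions: */
    #if defined(CONFIG_FOO) && defined(CONFIG_BAR)
    #endif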
-
Andrew Morton authored
From: Ralf Baechle <ralf@linux-mips.org> Three new MIPS-specific serial drivers. ip22.c is derived from the sparc zilog driver; guess we should write a generic Zilog driver sometime ...
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Some x86-64 users were complaining that coredumps >2GB don't work. This will enable large coredumps for everybody. Apparently the 32bit gdb/binutils cannot handle them, but I hear the binutils people are working on fixing that. I doubt it will harm people - unreadable coredumps are no worse than no coredump, and it won't make any difference in space usage whether you get a 1.99GB or a 2.5GB coredump. So just enable it unconditionally. If it really should be a problem for 32bit, the rlimit defaults in resource.h could be changed. For file systems that don't support O_LARGEFILE you just get a truncated coredump for big address spaces.
-
Andrew Morton authored
From: Andrey Borzenkov <arvidjaar@mail.ru>
- use struct nameidata in devfs_d_revalidate_wait to detect when it is called without i_sem held; take i_sem on the parent in this case. This prevents both a deadlock with devfs_lookup (by allowing it to drop i_sem consistently) and an oops in d_instantiate (by ensuring that it always runs protected)
- remove dead code that deals with major number allocation. The only remaining user was devfs itself, and the patch changes it to use register_chrdev to get a device number for the internal /dev/.devfsd and /dev/.statd. Remove the dead auto-allocation flag as well
- remove code that does a module get on dev open - it is handled by fops_get. Use init_special_inode consistently
- get rid of struct cdev_type and bdev_type - both have just a single dev_t now
-
Andrew Morton authored
From: Juergen Quade <quade@hsnr.de> Lots of places in the kernel are using [v]snprintf wrongly: they assume it returns the number of characters copied. It doesn't. It returns the number of characters which _would_ have been copied had the buffer not been filled up. So create new functions vscnprintf() and scnprintf() which have the expected (sane) semantics, and migrate callers over to using them.
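The difference matters whenever the return value is used to advance a pointer into a fixed-size buffer. The new helper simply clamps the return value; its shape is roughly this (a sketch following the semantics described above):

    int scnprintf(char *buf, size_t size, const char *fmt, ...)
    {
        va_list args;
        int i;

        va_start(args, fmt);
        i = vsnprintf(buf, size, fmt, args);
        va_end(args);

        /* vsnprintf() returns the length that *would* have been
         * written; clamp it to what actually fits in the buffer. */
        return (i >= size) ? size - 1 : i;
    }

With this, code like len += scnprintf(buf + len, sizeof(buf) - len, ...) can no longer walk off the end of the buffer.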
-
Andrew Morton authored
We need to hold i_sem while running i_size_write(). But that seems like a lot of fuss and deadlock potential. So just write the dang thing.
-
Andrew Morton authored
The NGROUPS changes broke it, and we're not sure how to fix it, and nobody appears to be working on or testing intermezzo.
-
Andrew Morton authored
From: Tim Hockin <thockin@sun.com>, Neil Brown <neilb@cse.unsw.edu.au>, me New groups infrastructure. task->groups and task->ngroups are replaced by task->group_info. group_info is a refcounted, dynamic struct with an array of pages. This allows for large numbers of groups. The current limit of 32 groups has been raised to 64k groups. It can be raised further by changing the NGROUPS_MAX constant in limits.h.
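The structure is roughly of this shape (a simplified sketch; field names may not match the patch exactly):

    struct group_info {
        int      ngroups;                     /* gids in use */
        atomic_t usage;                       /* refcount */
        gid_t    small_block[NGROUPS_SMALL];  /* inline fast path */
        int      nblocks;
        gid_t    *blocks[0];                  /* per-page gid blocks */
    };

Small group sets fit in small_block; larger ones spill into separately allocated pages hung off blocks[], which is what removes the old 32-group ceiling.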
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Jeff Garzik disliked the bonding driver knowing it was called "bond0". Remove that alias, and revert documentation.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> New MODULE_ALIASes in:
1) arch/i386/kernel/microcode.c
2) drivers/char/genrtc.c
3) drivers/ide/ide-tape.c
4) drivers/net/bonding/bond_main.c
5) drivers/net/bsd_comp.c
6) drivers/net/ppp_deflate.c
7) drivers/net/ppp_generic.c
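A MODULE_ALIAS simply records an extra name under which a module can be demand-loaded; for example (a hypothetical illustration, not the exact aliases added by the patch):

    /* In a driver source file: a request for this device name or
     * number can now pull the module in via modprobe, with no
     * /etc/modules.conf alias line needed. */
    MODULE_ALIAS("char-major-10-135");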
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Someone complained about the number of references to /etc/modules.conf in the documentation. While fixing them up (and changing examples where needed), removed those which are redundant due to MODULE_ALIAS.
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de>
- AMD Elan is a different subarch; you can't configure a kernel that runs on both the AMD Elan and other i386 CPUs
- added optimizing CFLAGS for the AMD Elan
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> gcc 2.95 supports -march=k6 (no need for check_gcc)
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> Add Pentium M and Pentium-4 M options:
- add MPENTIUMM (equivalent to PENTIUMIII except for a bigger X86_L1_CACHE_SHIFT)
- document that MPENTIUM4 is the right choice for a Pentium-4 M
-
Andrew Morton authored
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com> The issue of exceedingly large hash tables has been discussed on the mailing list a while back, but seems to slip through the cracks. What we found is it's not a problem for x86 (and most other architectures) because __get_free_pages won't be able to get anything beyond order MAX_ORDER-1 (10) which means at most those hash tables are 4MB each (assume 4K page size). However, on ia64, in order to support larger hugeTLB page size, the MAX_ORDER is bumped up to 18, which now means a 2GB upper limits enforced by the page allocator (assume 16K page size). PPC64 is another example that bumps up MAX_ORDER. Last time I checked, the tcp ehash table is taking a whooping (insane!) 2GB on one of our large machine. dentry and inode hash tables also take considerable amount of memory. Setting the size of these tables is difficult: they need to be constrained on many-zone ia64 machines, but this could cause significant performance problems when there are (for example) 100 million dentries in cache. Large-memory machines which do not slice that memory up into huge numbers of zones do not need to run the risk of this slowdown. So the sizing algorithms remain essentially unchanged, and boot-time options are provided which permit the tables to be scaled down.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> The cpu hotplug code actually provides two notifiers: CPU_UP_PREPARE, which precedes the online and can fail, and CPU_ONLINE, which can't. Current usage is only done at boot, so this distinction doesn't matter, but it's a bad example to set. This also means that the migration threads do not have to be higher priority than the others, since they are ready to go before any CPU_ONLINE callbacks are done. This patch is experimental but fairly straightforward: I haven't been able to test it since extracting it from the hotplug cpu code, so it's possible I screwed something up.
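A sketch of a callback that uses the two-stage notification (the per-cpu helpers are assumed):

    static int my_cpu_notify(struct notifier_block *nb,
                             unsigned long action, void *hcpu)
    {
        int cpu = (long)hcpu;

        switch (action) {
        case CPU_UP_PREPARE:
            /* Runs before the cpu comes online, and may veto
             * the bring-up by returning failure. */
            if (alloc_percpu_data(cpu) < 0)    /* assumed helper */
                return NOTIFY_BAD;
            break;
        case CPU_ONLINE:
            /* The cpu is up; this stage must not fail. */
            start_percpu_thread(cpu);          /* assumed helper */
            break;
        }
        return NOTIFY_OK;
    }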
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Three more removed CPU notifiers extracted from the hotplug CPU patch.
- kernel/softirq.c: the tasklet cpu preparation callback is useless: the vectors are already initialized to NULL. Even with the hotplug CPU patches, they're of little or no use.
- fs/buffer.c: once again, they are already initialized to zero.
- mm/page_alloc.c: once again, already initialized to zero.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Move duplicated code to __queue_work(), and don't set the CPU for queue_delayed_work() until the timer goes off. The second one only has an effect on CONFIG_HOTPLUG_CPU where the CPU goes down and the timer goes off on a different CPU than it was scheduled on.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Some well-meaning person put a notifier in for CPUs to update the kstat structures in sched.c. However, it does nothing, and even with the full hotplug CPU patch, it still does nothing. Simple counters very rarely need anything done when CPUs come up or go down. If you have per-cpu caches, or per-cpu threads, you need to do something. But very rarely for stats.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> These two patches provide the framework for stopping kernel threads, to allow hotplug CPU. This one just adds kthread.c and kthread.h; the next one uses it. Most importantly, adds a Monty Python quote to the kernel.

Details: The hotplug CPU code introduces two major problems:
1) Threads which previously never stopped (migration thread, ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
2) Threads which previously never had to be created now have to be created when a CPU goes online.

Unfortunately, stopping a thread is fairly baroque, involving memory barriers, a completion and spinning until the task is actually dead (for example, complete_and_exit() must be used if inside a module). There are also three problems in starting a thread:
1) Doing it from a random process context risks environment contamination: better to do it from keventd to guarantee a clean environment, a la call_usermodehelper.
2) Getting the task struct without races is hard: see kernel/sched.c migration_call(), kernel/workqueue.c create_workqueue_thread().
3) There are races in starting a thread for a CPU which is not yet online: the migration thread does a complex dance at the moment for a similar reason (there may be no migration thread to migrate us).

Place all this logic in some primitives to make life easier: kthread_create() and kthread_stop(). These primitives require no extra data structures in the caller: they operate on normal "struct task_struct"s. A usage sketch follows this list of other changes:
- Expose keventd_up(), as keventd and migration threads will use kthread to launch, and kthread normally uses workqueues and must recognize this case.
- Kthreads created at boot before "keventd" are spawned directly. However, this means that they don't have all signals blocked, and hence can be killed. The simplest solution is to always explicitly block all signals in the kthread.
- Change over the migration threads, the workqueue threads and the ksoftirqd threads to use kthread.
- module.c currently spawns threads directly to stop the machine, so a module can be atomically tested for removal. Unfortunately, this means that the current task is manipulated (which races with set_cpus_allowed, for example), and it can't set its priority artificially high. Using a kernel thread can solve this cleanly, and with kthread_run, it's simple.
- kthreads use keventd, so they inherit its cpus_allowed mask. Unset it. All current users set it explicitly anyway, but it's nice to fix.
- call_usermode_helper uses keventd, so the process created inherits its cpus_allowed mask. Unset it.
- Prevent errors in boot when cpus_possible() contains a cpu which is not online (i.e. a cpu didn't come up). This doesn't happen on x86, since a boot failure makes that CPU no longer possible (hacky, but it works).
- When the cpu fails to come up, some callbacks do kthread_stop(), which doesn't work without keventd (which hasn't started yet). Call it directly, and take care that it restores signal state (note: do_sigaction does a flush on blocked signals, so we don't need to repeat it).
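Typical usage of the new primitives (the thread body and helper names are illustrative):

    #include <linux/kthread.h>

    static int my_thread(void *data)
    {
        /* Loop until someone calls kthread_stop() on us. */
        while (!kthread_should_stop()) {
            do_work(data);                      /* assumed helper */
            set_current_state(TASK_INTERRUPTIBLE);
            schedule_timeout(HZ);
        }
        return 0;
    }

    /* kthread_run() is kthread_create() plus wake_up_process(): */
    task = kthread_run(my_thread, NULL, "mythread/%d", cpu);

    /* Later, e.g. as the cpu goes offline: blocks until
     * my_thread() has actually returned. */
    if (!IS_ERR(task))
        kthread_stop(task);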
-
Andrew Morton authored
From: Dominik Brodowski <linux@dominikbrodowski.de>, John Stultz <johnstul@us.ibm.com>, Dmitry Torokhov Add the ACPI Power Management Timer as an x86 kernel timing source. Unlike the Time Stamp Counter, it is a reliable timing source which is not affected by aggressive power management features like CPU frequency scaling. Some ideas and some code are based on Arjan van de Ven's implementation for 2.4, and on R. Byron Moore's drivers/acpi/hardware/hwtimer.c. We also replace the loop-based delay_pmtmr with a TSC-based delay_pmtmr, which resolves a number of issues caused by the loop-based delay. Unsynced TSCs as well as frequency-changing TSCs will affect the length of __delay(), but it seems this method works best.
-
Andrew Morton authored
From: "Yury V. Umanets" <umka@namesys.com> This removes a redundant assignment in loop.
-