- 18 Feb, 2004 40 commits
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Tidy up the error path for alloc_dev()
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Make sure that we maintain ordering when deferring bios.
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Remove struct dm_deferred_io from dm.c. [Christophe Saout]
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Move to_bytes() and to_sectors() into dm.h
-
Andrew Morton authored
From: Joe Thornber <thornber@redhat.com> Export dm_vcalloc()
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> With this patch, md uses two major numbers for arrays. One major is number 9, with name 'md', and holds unpartitioned md arrays, one per minor number. The other major is allocated dynamically, with name 'mdp', and holds one array for every 64 minors, allowing for up to 63 partitions per array. The arrays under one major are completely separate from the arrays under the other. The preferred names for devices with the new major are of the form:
    /dev/md/d1p3   # partition 3 of device 1 - minor 67
When a partitioned md device is assembled, the partitions are not recognised until after the whole-array device is opened again. A future version of mdadm will perform this open so that the need for it will be transparent.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> Currently raid1 uses PAGE_SIZE read/write requests for resync, as it doesn't know how to honour per-device restrictions. This patch uses bio_add_page to honour those restrictions and ups the limit on request size to 64K. This has a measurable impact on rebuild speed (25M/s -> 60M/s).
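A minimal sketch of the bio_add_page idiom this relies on (illustrative only, not the raid1 resync code itself; the function name is made up). bio_add_page() refuses a page once adding it would violate one of the target device's restrictions, so the caller simply stops growing the bio at that point:

    #include <linux/bio.h>

    /*
     * Illustrative sketch (hypothetical helper, not from the patch).
     * bio_add_page() returns the number of bytes it accepted, or 0 if
     * the page could not be added without exceeding a device limit,
     * so per-device restrictions are honoured automatically.
     */
    static int fill_resync_bio(struct bio *bio, struct page **pages,
                               int nr_pages)
    {
        int i;

        for (i = 0; i < nr_pages; i++)
            if (bio_add_page(bio, pages[i], PAGE_SIZE, 0) < PAGE_SIZE)
                break;          /* device limit reached */
        return i;               /* number of pages actually added */
    }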
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> For each resync request, we allocate a "r1_bio" which has a bio "master_bio" attached that goes largely unused. We also allocate a read_bio which is used. This patch removes the read_bio and just uses the master_bio instead. This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was being used. We also introduce a new "sectors" field into the r1_bio, as we can no longer rely on master_bio->bi_sectors.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> next_r1 is never used, so it can just go. read_bio isn't needed, as we can easily use one of the pointers in the write_bios array - the entry for the read disk. So rename "write_bios" to "bios" and store the pointer to the read bio in there.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> The only time it is really needed is to differentiate a retry-on-fail from a write-after-read-for-resync request to raid1d. So we use a bit in 'state' for that.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> Instead of having a single end_request handler that must determine whether it was a read or a write request, we have two separate handlers, which makes each of them easier to follow.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> The "START_ARRAY" ioctl depends on major/minor numbers (as stored in the raid superblock) are stable over reboots, which is increasingly untrue. There are better ways to start an array (e.g. with mdadm) so we mark the ioctl as deprecated for 2.6, and will remove it in 2.7.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> From: Stephen Hemminger <shemminger@osdl.org> Date: Fri, 12 Sep 2003 11:31:06 -0700 NFS won't build w/o CONFIG_PROC_FS. Looks like typos (or a C++ programmer) in stats.h.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> From: shemminger@osdl.org Sat Sep 6 09:19:50 2003 Date: Fri, 5 Sep 2003 16:19:30 -0700 Converts /proc/net/rpc/nfs and /proc/net/rpc/nfsd to use the simpler seq_file interface.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> There is no way to return an error from a cache init routine, so instead we make sure to pre-allocate the memory needed, and free it after the lookup if the lookup failed.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> When adding an item to a sunrpc/svc cache that contains kmalloced data, it is useful to move the malloced data out of the key object into the new cache object rather than copying it (otherwise we would need to cope with kmalloc failure and such). This means modifying the original. If the kmalloced data forms part of the key, then we must not move the data out until the key is no longer needed. So this patch moves the call to "INIT" on a new item (which fills in the key) to *after* the item has been found (or not), and also makes sure we only call the HASH function once. Thanks to "J. Bruce Fields" <bfields@fieldses.org>. Also: 1/ remove an unnecessary assignment 2/ fix comments that lag behind the implementation.
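A schematic of the ordering this describes (illustrative C only; cache_find, bucket_search, bucket_insert and the types are hypothetical, not the real sunrpc cache interface). HASH runs once; INIT, which may steal kmalloced data from the key, runs only once the key is no longer needed for comparisons:

    /*
     * Illustrative sketch of the lookup ordering described above.
     * The hash is computed exactly once, the bucket is searched, and
     * only on a miss is the key's kmalloced data moved into the new
     * item - so a hit never disturbs the caller's key.
     */
    struct item *cache_find(struct cache *c, struct key *key,
                            struct item *new)
    {
        unsigned hash = HASH(key);              /* hash computed once */
        struct item *it = bucket_search(c, hash, key);

        if (it)
            return it;                          /* hit: key untouched */
        INIT(new, key);                         /* miss: safe to move the
                                                   kmalloced data now */
        bucket_insert(c, hash, new);
        return new;
    }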
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> We currently call cache_put, which can schedule(), under a spin_lock. This patch moves that call outside the spinlock.
-
Andrew Morton authored
From: Valdis.Kletnieks@vt.edu 15 changes of #if to #ifdef, and 2 places where CONFIG_FOO should be defined(CONFIG_FOO). This gets rid of spurious warnings if you build with "-Wundef", so that when you do have a preprocessor test like: #if CONFIG_ETRAX_DS1302_RSTBIT == 27 you'll be told that a zero is being substituted, rather than getting silent weirdness and unexpected code generation.
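A minimal illustration of the distinction (CONFIG_FOO is just an example name, as in the changelog):

    /*
     * If CONFIG_ETRAX_DS1302_RSTBIT is not defined at all, the #if
     * below silently evaluates it as 0; building with -Wundef makes
     * the preprocessor warn instead of guessing.
     */
    #if CONFIG_ETRAX_DS1302_RSTBIT == 27       /* warns under -Wundef if undefined */
    /* ... */
    #endif

    #ifdef CONFIG_FOO                          /* safe: tests definedness only */
    /* ... */
    #endif

    #if defined(CONFIG_FOO) && CONFIG_FOO == 2 /* safe way to test a value */
    /* ... */
    #endif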
-
Andrew Morton authored
From: Ralf Baechle <ralf@linux-mips.org> Three new MIPS-specific serial drivers. ip22.c is derived from the sparc zilog driver; guess we should write a generic Zilog driver somewhen ...
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Some x86-64 users were complaining that coredumps >2GB don't work. This will enable large coredumps for everybody. Apparently the 32bit gdb/binutils cannot handle them, but I hear the binutils people are working on fixing that. I doubt it will harm people - an unreadable coredump is no worse than no coredump, and it won't make any difference in space usage whether you get a 1.99GB or a 2.5GB coredump. So just enable it unconditionally. If it should really be a problem for 32bit, the rlimit defaults in resource.h could be changed. For file systems that don't support O_LARGEFILE you should just get a truncated coredump for big address spaces.
-
Andrew Morton authored
From: Andrey Borzenkov <arvidjaar@mail.ru>
- use struct nameidata in devfs_d_revalidate_wait to detect when it is called without i_sem held; take i_sem on the parent in this case. This prevents both a deadlock with devfs_lookup, by allowing it to drop i_sem consistently, and an oops in d_instantiate, by ensuring that it always runs protected
- remove dead code that deals with major number allocation. The only remaining user was devfs itself, and the patch changes it to use register_chrdev to get device numbers for the internal /dev/.devfsd and /dev/.statd
- remove the dead auto-allocation flag as well
- remove the code that does a module get on dev open - that is handled by fops_get. Use init_special_inode consistently
- get rid of struct cdev_type and bdev_type - both now hold just a single dev_t
-
Andrew Morton authored
From: Juergen Quade <quade@hsnr.de> Lots of places in the kernel are using [v]snprintf wrongly: they assume it returns the number of characters copied. It doesn't. It returns the number of characters which _would_ have been copied had the buffer not been filled up. So create new functions vscnprintf() and scnprintf(), which have the expected (sane) semantics, and migrate callers over to using them.
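A sketch of the bug pattern and the fix (illustrative; the function, buffer and some_string are made up, not from the patch):

    #include <linux/kernel.h>

    static void fill(const char *some_string)
    {
        char buf[64];
        int len = 0;

        /*
         * Buggy: snprintf() returns how many characters *would* have
         * been written given unlimited space.  If the output is
         * truncated, len can exceed sizeof(buf), and on the next call
         * "sizeof(buf) - len" underflows to a huge unsigned value.
         */
        len += snprintf(buf + len, sizeof(buf) - len, "%s", some_string);

        /*
         * Safe: scnprintf() returns the number of characters actually
         * written, so len never exceeds sizeof(buf).
         */
        len += scnprintf(buf + len, sizeof(buf) - len, "%s", some_string);
    }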
-
Andrew Morton authored
We need to hold i_sem while running i_size_write(). But that seems like a lot of fuss and deadlock potential. So just write the dang thing.
-
Andrew Morton authored
The NGROUPS changes broke it, we're not sure how to fix it, and nobody appears to be working on or testing InterMezzo.
-
Andrew Morton authored
From: Tim Hockin <thockin@sun.com>, Neil Brown <neilb@cse.unsw.edu.au>, me New groups infrastructure. task->groups and task->ngroups are replaced by task->group_info. group_info is a refcounted, dynamic struct with an array of pages. This allows for large numbers of groups: the current limit of 32 groups has been raised to 64k groups. It can be raised further by changing the NGROUPS_MAX constant in limits.h.
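A sketch of the shape such a structure takes (illustrative - the field and constant names here are an assumption, not necessarily exactly what the patch introduces). Small group sets live inline; larger sets spill into separately allocated page-sized blocks, which is how the limit scales to tens of thousands of groups:

    #define NGROUPS_SMALL 32                    /* assumed inline capacity */

    /*
     * Sketch of a refcounted, page-based group list.  Field names are
     * assumptions for illustration.
     */
    struct group_info {
        int       ngroups;                      /* groups in use */
        atomic_t  usage;                        /* refcount */
        gid_t     small_block[NGROUPS_SMALL];   /* inline fast path */
        int       nblocks;                      /* blocks allocated */
        gid_t     *blocks[0];                   /* page-sized gid arrays */
    };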
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Jeff Garzik disliked the bonding driver knowing it was called "bond0". Remove that alias, and revert documentation.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> New MODULE_ALIASes in:
1) arch/i386/kernel/microcode.c
2) drivers/char/genrtc.c
3) drivers/ide/ide-tape.c
4) drivers/net/bonding/bond_main.c
5) drivers/net/bsd_comp.c
6) drivers/net/ppp_deflate.c
7) drivers/net/ppp_generic.c
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Someone complained about the number of references to /etc/modules.conf in the documentation. While fixing them up (and updating examples), I removed those which are redundant due to MODULE_ALIAS.
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de>
- AMD Elan is a different subarch; you can't configure a kernel that runs on both the AMD Elan and other i386 CPUs
- added optimizing CFLAGS for the AMD Elan
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> gcc 2.95 supports -march=k6 (no need for check_gcc)
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> Add Pentium M and Pentium-4 M options:
- add MPENTIUMM (equivalent to PENTIUMIII except for a bigger X86_L1_CACHE_SHIFT)
- document that MPENTIUM4 is the right choice for a Pentium-4 M
-
Andrew Morton authored
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com> The issue of exceedingly large hash tables has been discussed on the mailing list a while back, but seems to slip through the cracks. What we found is it's not a problem for x86 (and most other architectures) because __get_free_pages won't be able to get anything beyond order MAX_ORDER-1 (10) which means at most those hash tables are 4MB each (assume 4K page size). However, on ia64, in order to support larger hugeTLB page size, the MAX_ORDER is bumped up to 18, which now means a 2GB upper limits enforced by the page allocator (assume 16K page size). PPC64 is another example that bumps up MAX_ORDER. Last time I checked, the tcp ehash table is taking a whooping (insane!) 2GB on one of our large machine. dentry and inode hash tables also take considerable amount of memory. Setting the size of these tables is difficult: they need to be constrained on many-zone ia64 machines, but this could cause significant performance problems when there are (for example) 100 million dentries in cache. Large-memory machines which do not slice that memory up into huge numbers of zones do not need to run the risk of this slowdown. So the sizing algorithms remain essentially unchanged, and boot-time options are provided which permit the tables to be scaled down.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> The cpu hotplug code actually provides two notifiers: CPU_UP_PREPARE, which precedes the online and can fail, and CPU_ONLINE, which can't. Current usage only happens at boot, so this distinction doesn't matter, but it's a bad example to set. This also means that the migration threads do not have to be higher priority than the others, since they are ready to go before any CPU_ONLINE callbacks are done. This patch is experimental but fairly straightforward: I haven't been able to test it since extracting it from the hotplug cpu code, so it's possible I screwed something up.
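A sketch of the two-stage notification from a subsystem's point of view (illustrative; the subsystem and its helpers are hypothetical, not code from the patch):

    #include <linux/notifier.h>
    #include <linux/cpu.h>

    static void *my_alloc_percpu_data(int cpu);   /* hypothetical helper */
    static void my_start_percpu_thread(int cpu);  /* hypothetical helper */

    /*
     * CPU_UP_PREPARE runs before the CPU comes online and may veto the
     * bringup by returning NOTIFY_BAD; CPU_ONLINE runs afterwards and
     * must not fail.
     */
    static int my_cpu_callback(struct notifier_block *nb,
                               unsigned long action, void *hcpu)
    {
        int cpu = (long)hcpu;

        switch (action) {
        case CPU_UP_PREPARE:
            if (!my_alloc_percpu_data(cpu))
                return NOTIFY_BAD;          /* abort the bringup */
            break;
        case CPU_ONLINE:
            my_start_percpu_thread(cpu);    /* cannot fail here */
            break;
        }
        return NOTIFY_OK;
    }

    static struct notifier_block my_cpu_nb = {
        .notifier_call = my_cpu_callback,
    };
    /* registered at init with register_cpu_notifier(&my_cpu_nb) */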
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Three more removed CPU notifiers, extracted from the hotplug CPU patch.
kernel/softirq.c: the tasklet cpu preparation callback is useless: the vectors are already initialized to NULL. Even with the hotplug CPU patches, they're of little or no use.
fs/buffer.c: once again, they are already initialized to zero.
mm/page_alloc.c: once again, already initialized to zero.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Move duplicated code to __queue_work(), and don't set the CPU for queue_delayed_work() until the timer goes off. The second one only has an effect on CONFIG_HOTPLUG_CPU where the CPU goes down and the timer goes off on a different CPU than it was scheduled on.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Some well-meaning person put a notifier in for CPUs to update the kstat structures in sched.c. However, it does nothing, and even with the full hotplug CPU patch, it still does nothing. Simple counters very rarely need anything done when CPUs come up or go down. If you have per-cpu caches, or per-cpu threads, you need to do something. But very rarely for stats.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> These two patches provide the framework for stopping kernel threads to allow hotplug CPU. This one just adds kthread.c and kthread.h; the next one uses it. Most importantly, it adds a Monty Python quote to the kernel.

Details: The hotplug CPU code introduces two major problems:
1) Threads which previously never stopped (migration thread, ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
2) Threads which previously never had to be created now have to be created when a CPU goes online.

Unfortunately, stopping a thread is fairly baroque, involving memory barriers, a completion and spinning until the task is actually dead (for example, complete_and_exit() must be used if inside a module). There are also three problems in starting a thread:
1) Doing it from a random process context risks environment contamination: better to do it from keventd to guarantee a clean environment, a-la call_usermodehelper.
2) Getting the task struct without races is hard: see kernel/sched.c migration_call(), kernel/workqueue.c create_workqueue_thread().
3) There are races in starting a thread for a CPU which is not yet online: the migration thread does a complex dance at the moment for a similar reason (there may be no migration thread to migrate us).

Place all this logic in some primitives to make life easier: kthread_create() and kthread_stop(). These primitives require no extra data structures in the caller: they operate on normal "struct task_struct"s.

Other changes:
- Expose keventd_up(), as keventd and migration threads will use kthread to launch, and kthread normally uses workqueues and must recognize this case.
- Kthreads created at boot before "keventd" are spawned directly. However, this means that they don't have all signals blocked, and hence can be killed. The simplest solution is to always explicitly block all signals in the kthread.
- Change over the migration threads, the workqueue threads and the ksoftirqd threads to use kthread.
- module.c currently spawns threads directly to stop the machine, so a module can be atomically tested for removal. Unfortunately, this means that the current task is manipulated (which races with set_cpus_allowed, for example), and it can't set its priority artificially high. Using a kernel thread can solve this cleanly, and with kthread_run, it's simple.
- kthreads use keventd, so they inherit its cpus_allowed mask. Unset it. All current users set it explicitly anyway, but it's nice to fix.
- call_usermodehelper uses keventd, so the process created inherits its cpus_allowed mask. Unset it.
- Prevent errors at boot when the possible-cpu map contains a cpu which is not online (i.e. a cpu didn't come up). This doesn't happen on x86, since a boot failure makes that CPU no longer possible (hacky, but it works).
- When a cpu fails to come up, some callbacks do kthread_stop(), which doesn't work without keventd (which hasn't started yet). Call it directly, and take care that it restores signal state (note: do_sigaction does a flush on blocked signals, so we don't need to repeat it).
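A sketch of the usage pattern these primitives enable (illustrative; the thread body and names are made up, not from the patch):

    #include <linux/kthread.h>
    #include <linux/err.h>

    /*
     * The thread loops until kthread_stop() is called, which makes
     * kthread_should_stop() return true and then waits for the thread
     * to exit.
     */
    static int my_thread_fn(void *data)
    {
        while (!kthread_should_stop()) {
            /* ... do per-cpu work, sleep, etc. ... */
            schedule();
        }
        return 0;       /* return value is handed back by kthread_stop() */
    }

    static struct task_struct *my_task;

    static int start_my_thread(int cpu)
    {
        my_task = kthread_create(my_thread_fn, NULL, "mythread/%d", cpu);
        if (IS_ERR(my_task))
            return PTR_ERR(my_task);
        wake_up_process(my_task);   /* kthread_run() combines these steps */
        return 0;
    }

    /* later, e.g. as the cpu goes offline: kthread_stop(my_task); */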
-
Andrew Morton authored
From: Dominik Brodowski <linux@dominikbrodowski.de>, John Stultz <johnstul@us.ibm.com>, Dmitry Torokhov Add the ACPI Power Management Timer as an x86 kernel timing source. Unlike the Time Stamp Counter, it is a reliable timing source which is not affected by aggressive power management features like CPU frequency scaling. Some ideas and some code are based on Arjan van de Ven's implementation for 2.4, and on R. Byron Moore's drivers/acpi/hardware/hwtimer.c. We also replace the loop-based delay_pmtmr with a TSC-based delay_pmtmr, which resolves a number of issues caused by the loop-based delay. Unsynced TSCs, as well as frequency-changing TSCs, will affect the length of __delay(), but it seems this method works best.
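A sketch of what reading this timer looks like (illustrative, not the patch's code; pmtmr_ioport is a made-up variable standing in for the I/O port discovered from the ACPI FADT). The PM timer is a free-running counter clocked at a fixed 3.579545 MHz, typically 24 bits wide, so it ticks at the same rate no matter what the CPU frequency does:

    #include <linux/types.h>
    #include <asm/io.h>

    #define PMTMR_TICKS_PER_SEC 3579545
    #define ACPI_PM_MASK        0xFFFFFF    /* assume a 24-bit counter */

    static unsigned int pmtmr_ioport;       /* taken from the ACPI FADT */

    static inline u32 read_pmtmr(void)
    {
        /* mask off reserved bits in case the counter is only 24 bits */
        return inl(pmtmr_ioport) & ACPI_PM_MASK;
    }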
-
Andrew Morton authored
From: "Yury V. Umanets" <umka@namesys.com> This removes a redundant assignment in loop.
-
Andrew Morton authored
From: BlaisorBlade <blaisorblade_spam@yahoo.it> loop_init doesn't fail gracefully, for two reasons:
1) If initialization of the loop driver fails, we have a call to devfs_add("loop") without any devfs_remove; I add that.
2) In the lwn.net 2.6 kernel docs, Jonathan Corbet says: "If you are calling add_disk() in your driver initialization routine, you should not fail the initialization process after the first call." So I make loop.c conform to this by moving add_disk after all memory allocations.
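A sketch of the resulting init ordering (illustrative only, not the actual loop.c code; the helpers and error labels are made up, and devfs_add/devfs_remove are used as named in the changelog). Everything that can fail happens before any add_disk(), and each failure unwinds exactly what was set up:

    static int alloc_disks_and_queues(void);   /* hypothetical helper */
    static void add_disks(void);               /* hypothetical: calls
                                                  add_disk() per device */

    static int __init my_loop_init(void)
    {
        if (register_blkdev(LOOP_MAJOR, "loop"))
            return -EIO;
        devfs_add("loop");                     /* paired with devfs_remove
                                                  on the error path */

        if (alloc_disks_and_queues())          /* all fallible work first */
            goto out_devfs;

        add_disks();                           /* only now, when nothing
                                                  below can fail */
        return 0;

    out_devfs:
        devfs_remove("loop");
        unregister_blkdev(LOOP_MAJOR, "loop");
        return -ENOMEM;
    }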
-