Commits · 85734c4777ef9ea13f46fcc5d666b4e662197878 · Kirill Smelkov / linux

29 Dec, 2003 40 commits

[PATCH] dm and bounce buffer panic fix · 85734c47

Andrew Morton authored Dec 29, 2003

From: Mark Haverkamp <markh@osdl.org>

About three weeks ago markw at osdl posted a mail about a panic that he
was seeing:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106737176716474&w=2

I believe what is happening is that the dm __clone_and_map function is
generating bio structures with a non-zero bi_idx field.  When
__blk_queue_bounce creates a new bio with bounce pages, it sets the bi_idx
field to 0 rather than to the bi_idx of the original.  This causes trouble
because bv_page pointers that are zero will be dereferenced later.  The
following patch uses the original bio structure's bi_idx in the new bio
structure and in copy_to_high_bio_irq and bounce_end_io.
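
A minimal userspace sketch of the failure mode described above, not the
kernel code itself.  The types `struct seg` and `struct req` and both
helpers are hypothetical stand-ins for struct bio_vec, struct bio and the
bounce path; the sketch assumes that only the slots from the original's
index onward get a page.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct bio_vec and struct bio. */
struct seg {
    void *page;              /* plays the role of bv_page */
};

struct req {
    struct seg *vec;         /* plays the role of bi_io_vec */
    unsigned short idx;      /* first valid segment, like bi_idx */
    unsigned short cnt;      /* one past the last segment, like bi_vcnt */
};

/*
 * Fill in a bounce copy of orig.  bounce->vec must have room for
 * orig->cnt entries.  Only slots idx..cnt-1 receive a page, so the
 * slots below idx stay NULL.
 */
static void make_bounce(struct req *bounce, const struct req *orig,
                        void *bounce_page)
{
    unsigned short i;

    for (i = 0; i < orig->cnt; i++)
        bounce->vec[i].page = NULL;
    for (i = orig->idx; i < orig->cnt; i++)
        bounce->vec[i].page = bounce_page;

    bounce->cnt = orig->cnt;
    bounce->idx = orig->idx;   /* carry the index over instead of using 0 */
}

/* Count the NULL pages a walker would hit when starting at r->idx. */
static int null_pages_seen(const struct req *r)
{
    int bad = 0;
    unsigned short i;

    for (i = r->idx; i < r->cnt; i++)
        if (r->vec[i].page == NULL)
            bad++;
    return bad;
}
```

With the index carried over, a walk from `idx` never sees a NULL page; if
`idx` is forced back to 0, the same walk runs into the unfilled slots.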

This has cleared up the panic when using the volume.

(acked by Joe Thornber)

85734c47

[PATCH] ext3: bd_claim for journal device · 9907e736

Andrew Morton authored Dec 29, 2003

From: Neil Brown <neilb@cse.unsw.edu.au>

Change ext3 to run bd_claim() against external journal devices.  This is
significant only for those who have ext3 journals on a separate device; ext3
now gets exclusive access to that device.

9907e736

[PATCH] remove include recursion from linux/pagemap.h · 1fcec52f

Andrew Morton authored Dec 29, 2003

From: Arnaldo Carvalho de Melo <acme@conectiva.com.br>

pagemap.h, do not include thyself.

1fcec52f

[PATCH] remove lock_kernel() from proc_bus_pci_lseek() · 1b6f967a

Andrew Morton authored Dec 29, 2003

Remove pointless lock_kernel(), replace with the standard-but-still-odd
i_sem-based lseek locking.

1b6f967a

[PATCH] fix oops in proc_kill_inodes() · 4617516d

Andrew Morton authored Dec 29, 2003

proc_kill_inodes() walks the s_files list, playing with ->f_dentry.

But there is a window in which __fput() will leave a file on that list with a
null f_dentry and f_vfsmnt.

I'm not sure it was ever confirmed that this fixed the reported oops, but it
seems much better to set those fields to null _after_ removing the filp from
the list.

(Actually, there's no need to null those pointers out at all.  But whatever;
it caught a bug).

4617516d

[PATCH] pagefault accounting fix · d2c585d3

Andrew Morton authored Dec 29, 2003

From: William Lee Irwin III <wli@holomorphy.com>

Our accounting of minor faults versus major faults is currently quite wrong.

To fix it up we need to propagate the actual fault type back to the
higher-level code.  Repurpose the currently-unused third arg to ->nopage
for this.

d2c585d3

[PATCH] Remove CLONE_FILES from init kernel thread creation · 282ed003

Andrew Morton authored Dec 29, 2003

From: James Morris <jmorris@redhat.com>

The patch below removes the CLONE_FILES flag from the kernel_thread() call
which starts init.

This is to prevent other kernel threads from sharing file descriptors
opened by init (try 'lsof /dev/initctl' on a 2.6 system :-).

The reason this patch is being proposed is so that usermode helper apps
launched via kernel threads (e.g. modprobe, hotplug) do not then inherit
any such file descriptors.  This is not a problem in itself so far (other
than being messy), but it is a problem for SELinux, which will otherwise
need to grant access to /dev/initctl by modprobe and hotplug, a somewhat
undesirable scenario.

As far as I can tell, there is no reason why init needs to be spawned with
CLONE_FILES.  Please let me know if there are any objections to the
change, which I would like to propose for 2.6.0+ as a cleanup.

282ed003

[PATCH] Add support for SGI's IOC4 chipset · 125a4634

Andrew Morton authored Dec 29, 2003

From: Aniket Malatpure <aniket@sgi.com>

Adds support for the IOC4 IDE part.

125a4634

[PATCH] new /proc/irq cpumask format; consolidate cpumask display and input code · 409c7f3a

Andrew Morton authored Dec 29, 2003

From: Paul Jackson <pj@sgi.com>

This patch is a followup to one from Bill Irwin.  On Nov
17, he had consolidated the half-dozen chunks of code
that displayed cpumasks in /proc/irq/prof_cpu_mask and
/proc/irq/<pid>/smp_affinity into a single routine, which he
called format_cpumask().

I believe that Andrew Morton has accepted Bill's patch into
his 2.6.0-test10-mm1 patch set as the "format_cpumask" patch.
I hope that the following patch will replace Bill's patch.
I look forward to Bill's feedback on this patch.

The following patch carries Bill's work further:

 1) It also consolidates the input side (write syscalls).
 2) It adopts a new format, same on input and output.
 3) The core routines work for any multi-word bitmask,
    not just cpumasks.
 4) The core routines avoid overrunning their output
    buffers.

Note esp. for David Mosberger:

    The small patch I sent you and the linux-ia64 list
    yesterday entitled: "check user access ok writing
    /proc/irq/<pid>/smp_affinity" for arch ia64 only is
    _separate_ from the following patch.  Neither presumes the
    other.  However, they do collide on one line.  Last one in
    is a Monkey's Uncle and will need an updated patch from me
    (or otherwise need to resolve the one obvious collision).

Details of the following patch:

Both the display and input of cpumasks on 9 arch's are
consolidated into a single pair of routines, which use the
same format for input and output, as recommended by Tony
Luck.  The two common routines work on any multi-word bitmask
(array of unsigned longs).  A pair of trivial inline wrappers
cpumask_snprintf() and cpumask_parse() hide this generality
for the common case of cpumask input and output.

My real motivation for consolidating this code will become
visible later - when I seek to add a nodemask_t that resembles
cpumask_t (just a different length).  These common underlying
routines will be used there as well, following up on a suggestion
of Christoph Hellwig that I investigate implementing nodemask_t
as an ADT sharing infrastructure with cpumask_t.  However, I
believe that this patch stands on its own merit, consolidating
a couple hundred lines of duplicated code, and making the
cpumask display format usable on very large systems.

There are two exceptions to the consolidation - the alpha and
sparc64 arch's manipulate bare unsigned longs, not cpumask_t's,
on input (write syscall), and do stuff that was more funky than
I could make sense of.  So the input side of these two arch's
was left as-is.  I'd welcome someone with access to either of
these systems to provide additional patches.

The new format consists of multiple 32 bit words, separated by
commas, displayed and input in hex.  The following comment from
this patch describes this format further:

* The ascii representation of multi-word bit masks displays each
* 32bit word in hex (not zero filled), and for masks longer than
* one word, uses a comma separator between words.  Words are
* displayed in big-endian order most significant first.  And hex
* digits within a word are also in big-endian order, of course.
*
* Examples:
*   A mask with just bit 0 set displays as "1".
*   A mask with just bit 127 set displays as "80000000,0,0,0".
*   A mask with just bit 64 set displays as "1,0,0".
*   A mask with bits 0, 1, 2, 4, 8, 16, 32 and 64 set displays
*     as "1,1,10117".  The first "1" is for bit 64, the second
*     for bit 32, the third for bit 16, and so forth, to the
*     "7", which is for bits 2, 1 and 0.
*   A mask with bits 32 through 39 set displays as "ff,0".
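
A minimal userspace sketch of the display format described in the comment
above, not the routine from the patch.  The name `mask_format` and its
signature are hypothetical, and the sketch assumes that the number of words
printed is fixed by the length of the mask (so a 96-bit mask with only bit
64 set gives "1,0,0").  It uses snprintf to track the remaining space.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/*
 * Format nwords 32-bit words as comma-separated hex, most significant
 * word first, without zero fill.  words[0] holds bits 0..31.
 * Returns the length of the string left in buf; the output is truncated
 * (but still NUL-terminated) if buf is too small.
 */
static int mask_format(char *buf, size_t buflen,
                       const uint32_t *words, size_t nwords)
{
    size_t used = 0;
    size_t i;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    for (i = nwords; i-- > 0; ) {
        int n = snprintf(buf + used, buflen - used, "%s%x",
                         i == nwords - 1 ? "" : ",",
                         (unsigned int)words[i]);
        if (n < 0) {
            buf[used] = '\0';        /* formatting error: drop this word */
            break;
        }
        if ((size_t)n >= buflen - used) {
            used = buflen - 1;       /* snprintf truncated and terminated */
            break;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, the words {0x10117, 1, 1} format as "1,1,10117", matching the
fourth example in the comment.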

The essential reason for adding the comma breaks was to make
the long masks from our (SGI's) big 512 CPU systems parsable by
humans.  An unbroken string of 128 hex digits is pretty difficult
to read.  For those who are compiling systems with CONFIG_NR_CPUS
of 32 or less, there should be no visible change in format.

There are of course a thousand possible output formats that
meet similar criteria.  If someone wants to lobby for and seek
consensus behind another such format, that's fine.  Now that
the format is consolidated into a single pair of routines,
it should be easy to adopt whatever we choose.

Internally, the display routine uses snprintf to track the
remaining space in its output buffer, to avoid the risk of
overrunning it.

A new file, lib/mask.c, is added to the lib directory, to
hold the two common routines.  I anticipate adding a few more
common routines for generic support of multi-word bit masks to
lib/mask.c, in subsequent patches that will add a nodemask_t
type as an ADT sharing implementation with cpumask_t.

409c7f3a

[PATCH] cpumask.h reorg · 89832108

Andrew Morton authored Dec 29, 2003

From: Paul Jackson <pj@sgi.com>

Push the cpumask implementation from linux/cpumask.h into asm/cpumask.h, so
that ia64 can do special things without breaking sparc64.

1) Each arch has its own include/asm-<arch>/cpumask.h file

2) That arch-specific header file can include <asm-generic/cpumask.h>,
   if it wants to make use of the generic cpumask implementation.

3) Using code should continue to include linux/cpumask.h, which
   in turn includes asm/cpumask.h.  Some common implementation
   independent cpumask related items, such as the cpu_online_map,
   are declared directly in linux/cpumask.h.

89832108

[PATCH] Add lib/parser.c kernel-doc · adf9a351

Andrew Morton authored Dec 29, 2003

From: Will Dyson <will_dyson@pobox.com>

Add documentation and comments to lib/parser.c and include/linux/parser.h

adf9a351

[PATCH] IDE capability elevation fix · cb8d8fe9

Andrew Morton authored Dec 29, 2003

From: Alan Cox <alan@redhat.com>

Capability elevation bug in 2.6.0 IDE.  Long fixed in 2.4.x, trivial to cure.

cb8d8fe9

[PATCH] IDE MMIO fix · 90c6dd77

Andrew Morton authored Dec 29, 2003

From: Alan Cox <alan@redhat.com>

IDE core code had the mmio==2 (ioremap) mode supported but two small changes
had been missed for ide-dma.c. Without this fix mmio IDE controllers bomb if
you have plenty of memory as it uses request_mem_region on an ioremap return.

90c6dd77

[PATCH] Can't disable IDE DMA · 22f4d9f1

Andrew Morton authored Dec 29, 2003

From: Peter Chubb <peterc@gelato.unsw.edu.au>

If you try to disable IDE DMA from Kconfig, you'll end up with an undefined
symbol, ide_hwif_setup_dma().

The attached rather ugly patch fixes the problem by defining a dummy
function.

22f4d9f1

[PATCH] PIIX5 Doesn't work on IA64 · c1f0e653

Andrew Morton authored Dec 29, 2003

From: Peter Chubb <peterc@gelato.unsw.edu.au>

The PIIX5 IDE controller on I2000 IA64 boxen using the 460GX chipset will
hang on startup if an ordinary harddrive is plugged into it (it seems to
work for the LSI120 and the CDROM drives).

This is because the 460GX chipset contains a PCI expansion bridge that
works like the 450NX PXB, and has the same PCI ID (but a later revision).
The PIIX driver, to work around interactions between PIIX4 and the 450NX
PXB, tries to disable DMA.

Unfortunately, the way it tries to disable DMA doesn't work, and the higher
layers think that DMA is still on, and so timeout waiting for DMA, and then
hang on bootup.

A simple workaround is to tighten the check for the buggy chipset, as in
the attached patch.  However, someone with more time (and who actually
*understands* the IDE subsystem) needs to fix the real bug as well.

c1f0e653

[PATCH] ide-tape update · 8179c97e

Andrew Morton authored Dec 29, 2003

From: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>,
      Stuart Hayes <stuart_hayes@dell.com>

- Check drive's write protect bit, try to return appropriate
  errors when attempting to write a write-protected tape.

- Moved "idetape_read_position" call in idetape_chrdev_open
  after the "wait_ready" call.

- Added IDETAPE_MEDIUM_PRESENT flag so driver would know
  not to rewind tape after ejecting it.

- Fixed bug with ide_abort_pipeline (it was deleting stages
  from tape->next_stage to end, instead of from
  new_last_stage->next; tape->next_stage was set to NULL
  by idetape_discard_read_pipeline before calling!).

- Made improvements to idetape_wait_ready.

- Added a few comments here and there.

- Made MTOFFL unlock tape drive door before attempting to eject.

- Added fixes to get Seagate STT3401A Travan working:
  Handle drives that don't support 0-length reads/writes.
  Increased timeout (retension takes ~10 minutes before irq is returned).
  Fixed request mode page packet command byte 3.

Also remove code depending on NO_LONGER_REQUIRED to match 2.4.x (me).

8179c97e

[PATCH] Minor bug fixes to the compat layer · 14209d06

Andrew Morton authored Dec 29, 2003

From: Arun Sharma <arun.sharma@intel.com>

- Several instances where we were using pid_t instead of uid_t

- If the caller passed a NULL `oldact' pointer into sys_sigprocmask then
  don't try to write the old sigmask there.

14209d06

[PATCH] watchdog write() return value fixes · 41339307

Andrew Morton authored Dec 29, 2003

From: gleb@nbase.co.il (Gleb Natapov)

There is an inconsistency in the fops->write() implementations of different
watchdog drivers.  Some of them return the number of bytes written while
others return 1.

I think the correct implementation should always return the number of bytes
written (we examine the whole buffer, after all); otherwise "echo V >
/dev/watchdog" doesn't work as expected (it doesn't stop the watchdog).
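
A minimal userspace sketch of why the return value matters, under stated
assumptions rather than taken from any particular driver.  All names here
(`expect_close`, `wdt_write`, `wdt_write_short`, `write_all`) are
hypothetical.  It assumes a handler that clears its magic-close flag on each
write and then scans the whole buffer for 'V', and a caller that retries the
unwritten remainder the way a shell would for "V\n".

```c
#include <stddef.h>
#include <sys/types.h>

static int expect_close;     /* set when the magic character 'V' is seen */

/* Scan the whole buffer and report all of it as consumed. */
static ssize_t wdt_write(const char *buf, size_t len)
{
    size_t i;

    if (len) {
        expect_close = 0;
        for (i = 0; i < len; i++)
            if (buf[i] == 'V')
                expect_close = 1;
    }
    return (ssize_t)len;
}

/* Same scan, but claims that only one byte was consumed. */
static ssize_t wdt_write_short(const char *buf, size_t len)
{
    wdt_write(buf, len);
    return len ? 1 : 0;
}

typedef ssize_t (*write_fn)(const char *, size_t);

/* Retry until everything is written, as a userspace writer would. */
static void write_all(write_fn fn, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = fn(buf, len);

        if (n <= 0)
            break;
        buf += n;
        len -= (size_t)n;
    }
}
```

With `wdt_write`, writing "V\n" leaves `expect_close` set.  With
`wdt_write_short`, the caller sends the trailing newline again in a second
write, which clears the flag.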

41339307

[PATCH] missing padding in cpio_mkfile in usr/gen_init_cpio.c · a7380b60

Andrew Morton authored Dec 29, 2003

From: Olaf Hering <olh@suse.de>

We need to update `offset' here so that the subsequent push_pad() (which
uses `offset') will do the right thing.
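
A minimal sketch of the offset bookkeeping, not the code from
usr/gen_init_cpio.c.  The `struct out` buffer and both helpers are
hypothetical; the sketch relies only on the fact that the newc cpio format
pads headers and file data to a 4-byte boundary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory archive with a running byte offset. */
struct out {
    unsigned char buf[64];
    size_t offset;
};

/* Append data and advance the offset; returns -1 if it does not fit. */
static int push_bytes(struct out *o, const void *data, size_t len)
{
    if (len > sizeof o->buf - o->offset)
        return -1;
    memcpy(o->buf + o->offset, data, len);
    o->offset += len;        /* skipping this step breaks the padding below */
    return 0;
}

/* Emit NUL bytes until the offset is a multiple of 4. */
static int push_pad(struct out *o)
{
    while (o->offset & 3) {
        if (o->offset >= sizeof o->buf)
            return -1;
        o->buf[o->offset++] = 0;
    }
    return 0;
}
```

After five bytes of file data, `push_pad()` adds three NUL bytes; if the
offset had not been advanced past the data, it would add none.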

a7380b60

[PATCH] document elevator= parameter · a5c9613f

Andrew Morton authored Dec 29, 2003

From: Valdis.Kletnieks@vt.edu

Nick wrote a nice as-iosched.txt file, but apparently nobody updated the
kernel-parameters.txt file...

a5c9613f

[PATCH] support centrino 1GHz · ce2da20e

Andrew Morton authored Dec 29, 2003

From: Jeremy Fitzhardinge <jeremy@goop.org>

I've been getting quite a lot of people mailing me about this CPU.  It
seems Toshiba has released a machine with it.  It would be nice if this
patch gets into a kernel soonish.  It's very low-impact.

ce2da20e

[PATCH] Intel 440gx PCI IDs · a77ef229

Andrew Morton authored Dec 29, 2003

- Add missing PCI ID

- Forward-port IRQ routing workaround from 2.4.

a77ef229

[PATCH] seq_file version of /proc/interrupts · ab6b1810

Andrew Morton authored Dec 29, 2003

From: corbet@lwn.net (Jonathan Corbet)

This converts all architectures' /proc/interrupts implementation over to
seq_file.  We need this for SMP machines with ridiculous numbers of CPUs and
if you convert one arch, you have to convert them all...

ab6b1810

[PATCH] eicon/ and hardware/eicon/ drivers using the same symbols · b031787e

Andrew Morton authored Dec 29, 2003

From: Adrian Bunk <bunk@fs.tum.de>

The legacy eicon driver in drivers/isdn/eicon is the old one and will be
removed as soon as all features have moved to the new driver.  In any case,
this old driver was never meant to be built as anything but a module.

b031787e

[PATCH] fix SOUND_CMPCI Configure help entry · 54f47272

Andrew Morton authored Dec 29, 2003

From: Adrian Bunk <bunk@fs.tum.de>

The issue below is only a minor documentation fix, but it confused
me when configuring a kernel for such a card.

54f47272

[PATCH] find_busiest_queue() commentary fix · 2d0014c7

Andrew Morton authored Dec 29, 2003

From: Ingo Molnar <mingo@elte.hu>

Clarify a comment in the CPU scheduler.

2d0014c7

[PATCH] use alloc_percpu in percpu_counters · 22565897

Andrew Morton authored Dec 29, 2003

From: Martin Hicks <mort@wildopensource.com>

Once NR_CPUS exceeds about 300, ext2 and ext3 will not compile, because the
percpu counters in the superblocks are so huge that they cannot be kmalloced.

Fix this by converting the percpu_counter mechanism to use alloc_percpu()
rather than an NR_CPUS-sized array.

22565897

[PATCH] lockless semop · 55e8b1a1

Andrew Morton authored Dec 29, 2003

From: Manfred Spraul <manfred@colorfullife.com>

Attached is the lockless semop patch.  I did another test run with
idle=poll on a Pentium III, and it remained unchanged: 99.9% direct
fast path, 0.1% race with wakeup against writing the final result code:

http://khack.osdl.org/stp/282936/environment/proc/slabinfo

That means there is no immediate need to add the two-stage
implementation to finish_wait.

It reduces the spinlock operations on the semaphore array spinlock by 1/3.

55e8b1a1

[PATCH] Fix writev atomicity on pipe/fifo · 1af764e1

Andrew Morton authored Dec 29, 2003

From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

The current writev() on a pipe/fifo can be interleaved with data from other
processes doing writes even when the request size is <= PIPE_BUF.  These
writes should in fact be atomic.

The readv() side is changed too, so that it behaves the same as read().  And
it is faster.

readv/writev version of bw_pipe in LMbench

2.6.0-test9-bk12
hirofumi@devron (i686-pc-linux-gnu)[1010]$ ./bw_pipe -m 4096 -M 5
Pipe bandwidth: 45.53 MB/sec
hirofumi@devron (i686-pc-linux-gnu)[1009]$ ./bw_pipe -m 1024 -M 5
Pipe bandwidth: 20.08 MB/sec

2.6.0-test9-bk12 + patch
hirofumi@devron (i686-pc-linux-gnu)[1001]$ ./bw_pipe -m 4096 -M 5
Pipe bandwidth: 65.98 MB/sec
hirofumi@devron (i686-pc-linux-gnu)[1002]$ ./bw_pipe -m 1024 -M 5
Pipe bandwidth: 32.19 MB/sec

1af764e1

[PATCH] optimize ia32 memmove · ed109bc5

Andrew Morton authored Dec 29, 2003

From: Manfred Spraul <manfred@colorfullife.com>

The memmove implementation of i386 is not optimized: it uses movsb, which is
far slower than movsd.  The optimization is trivial: if dest is less than
source, then call memcpy().  markw tried it on a 4xXeon with dbt2, it saved
around 300 million cpu ticks in cache_flusharray():

oprofile, GLOBAL_POWER_EVENTS, count 100k
Before:
c0144ed1 <cache_flusharray>: /* cache_flusharray total:  21823  0.0165 */
     6 4.5e-06 :c0144f8e:       cmp    %esi,%ebx
    11 8.3e-06 :c0144f90:       jae    c0144f9e <cache_flusharray+0xcd>
     3 2.3e-06 :c0144f92:       mov    %ebx,%edi
  7305  0.0055 :c0144f94:       repz movsb %ds:(%esi),%es:(%edi)
   201 1.5e-04 :c0144f96:       add    $0x10,%esp

After:
c0144f1d <cache_flusharray>: /* cache_flusharray total:  17959  0.0136 */
  1270 9.6e-04 :c0144f1d:       push   %ebp
[snip]
     6 4.6e-06 :c0144fdc:       cmp    %esi,%ebx
    13 9.9e-06 :c0144fde:       jae    c0145000 <cache_flusharray+0xe3>
     2 1.5e-06 :c0144fe0:       mov    %edx,%eax
     1 7.6e-07 :c0144fe2:       mov    %ebx,%edi
    11 8.4e-06 :c0144fe4:       shr    $0x2,%eax
     1 7.6e-07 :c0144fe7:       mov    %eax,%ecx
  4129  0.0031 :c0144fe9:       repz movsl %ds:(%esi),%es:(%edi)
   261 2.0e-04 :c0144feb:       test   $0x2,%dl
    27 2.1e-05 :c0144fee:       je     c0144ff2 <cache_flusharray+0xd5>
               :c0144ff0:       movsw  %ds:(%esi),%es:(%edi)
    95 7.2e-05 :c0144ff2:       test   $0x1,%dl
    96 7.3e-05 :c0144ff5:       je     c0144ff8 <cache_flusharray+0xdb>
               :c0144ff7:       movsb  %ds:(%esi),%es:(%edi)
   121 9.2e-05 :c0144ff8:       add    $0x1c,%esp
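
A minimal portable sketch of the dispatch described above, not the i386
implementation.  `sketch_memmove` and `forward_copy` are hypothetical names.
`forward_copy` stands in for the kernel's memcpy(), because ISO C does not
define memcpy() on overlapping buffers; the byte loops here only illustrate
the direction choice, not the movsd speedup.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy from low addresses to high; stand-in for a forward-copying memcpy. */
static void *forward_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

static void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* dest below src: a forward copy never overwrites unread source bytes */
    if ((uintptr_t)dest < (uintptr_t)src)
        return forward_copy(dest, src, n);

    /* otherwise copy backwards, from the last byte down to the first */
    while (n--)
        d[n] = s[n];
    return dest;
}
```

Only the forward branch can reuse the fast copy; the backward branch still
has to copy in the opposite direction.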

ed109bc5

[PATCH] Use NODES_SHIFT to calculate ZONE_SHIFT · e2c3c9e2

Andrew Morton authored Dec 29, 2003

From: jbarnes@sgi.com (Jesse Barnes)

Now that we have a proper NODES_SHIFT value, we need to use it to define
ZONE_SHIFT, otherwise we'll spill over 8 bits if we have more than 85 nodes.
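
One plausible reading of the "85 nodes" figure, as a small arithmetic
sketch.  It assumes three zones per node and a zone index of the form
node * 3 + zone that has to fit in 8 bits; both helper names are
hypothetical.

```c
/* Number of bits needed to represent max_value. */
static unsigned int bits_needed(unsigned int max_value)
{
    unsigned int bits = 0;

    while (max_value) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/*
 * Bits needed for the largest index node * zones_per_node + zone.
 * nodes and zones_per_node must both be at least 1.
 */
static unsigned int zone_index_bits(unsigned int nodes,
                                    unsigned int zones_per_node)
{
    return bits_needed(nodes * zones_per_node - 1);
}
```

Under these assumptions 85 nodes give a largest index of 254, which fits in
8 bits, while 86 nodes give 257, which needs 9.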

e2c3c9e2

[PATCH] Fix for more than 256 CPUs · e403669e

Andrew Morton authored Dec 29, 2003

From: Paul Jackson <pj@sgi.com>

The patch is needed to build NR_CPUS > 256.

Without this fix, you get compile errors:
    include/linux/cpumask.h: In function `next_online_cpu':
    include/linux/cpumask.h:56: structure has no member named `val'

e403669e

[PATCH] ia32 WP test cleanup · 6caf4668

Andrew Morton authored Dec 29, 2003

From: Zwane Mwaikambo <zwane@arm.linux.org.uk>

Make the test unconditional - we can always run it now we have fixmap
support.

6caf4668

[PATCH] Restore /proc/pid/maps formatting · 3f3a4378

Andrew Morton authored Dec 29, 2003

The seq_file conversion of /proc/pid/maps caused altered behaviour with
respect to 2.4.22.  Before the conversion, spaces and tabs in filenames were
displayed verbatim.  After the conversion they are escaped as \040, etc.

Also, if the mmapped file has been unlinked the output appears as

40017000-40018000 rw-p 00000000 03:02 1425800    /home/akpm/foo\040(deleted)

instead of

40017000-40018000 rw-p 00000000 03:02 1425800    /home/akpm/foo (deleted)

This could break applications which parse /proc/pid/maps (one person has
reported this).

The patch restores the 2.4.20 behaviour.
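
A minimal userspace sketch of the escaping behaviour shown above, not the
seq_file code.  `escape_path` is a hypothetical helper, and the escape sets
passed to it in the example are assumptions chosen to reproduce the two
output lines quoted in this message.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src to dst, writing every character found in esc as a backslash
 * followed by three octal digits.  Returns the output length, or -1 if
 * dst is too small.
 */
static int escape_path(char *dst, size_t dstlen,
                       const char *src, const char *esc)
{
    size_t used = 0;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            if (dstlen - used < 5)       /* 4 characters plus the NUL */
                return -1;
            snprintf(dst + used, 5, "\\%03o", (unsigned int)c);
            used += 4;
        } else {
            if (dstlen - used < 2)       /* 1 character plus the NUL */
                return -1;
            dst[used++] = (char)c;
        }
    }
    if (used >= dstlen)
        return -1;
    dst[used] = '\0';
    return (int)used;
}
```

With a space in the escape set, "/home/akpm/foo (deleted)" comes out as
"/home/akpm/foo\040(deleted)"; with only a newline in the set, it comes out
verbatim.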

3f3a4378

[PATCH] Get modpost to work properly with vmlinux in a different directory · e5d9d44e

Andrew Morton authored Dec 29, 2003

From: "Bryan O'Sullivan" <bos@pathscale.com>

The current version of modpost breaks if invoked from outside the build
tree.  This patch fixes that, and simplifies the code a bit while it's at
it.

e5d9d44e

[PATCH] Be verbose about the ia32 time source · 67fbc534

Andrew Morton authored Dec 29, 2003

From: john stultz <johnstul@us.ibm.com>

The patch arranges for each timesource type to have a name, and uses that to
tell the user which timesource is in use at bootup time.

67fbc534

[PATCH] vmscan: reset refill_counter after refilling the inactive list · 9c8c9492

Andrew Morton authored Dec 29, 2003

zone->refill_counter is only there to provide decent levels of work batching:
don't call refill_inactive_zone() just for a couple of pages.

But the logic in there allows it to build up to huge values and it can
overflow (go negative) which will disable refilling altogether until it wraps
positive again.

Just reset it to zero whenever we decide to do some refilling.
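
A minimal sketch of the batching logic, assuming a threshold and a cap that
are purely illustrative.  `struct zone_sketch`, `BATCH` and `note_scan` are
hypothetical names, not the vmscan code.

```c
#define BATCH 32             /* hypothetical batching threshold */

struct zone_sketch {
    long refill_counter;
};

/*
 * Account nr_pages of pending work.  Returns how many pages to refill
 * now, or 0 if the batch is still too small to be worth it.
 */
static long note_scan(struct zone_sketch *z, long nr_pages)
{
    long count;

    z->refill_counter += nr_pages;
    if (z->refill_counter <= BATCH)
        return 0;

    count = z->refill_counter;
    if (count > 4 * BATCH)
        count = 4 * BATCH;   /* cap the work done in one go */

    /*
     * Reset rather than subtract count: subtracting would leave any
     * excess above the cap in the counter, where it can keep growing.
     */
    z->refill_counter = 0;
    return count;
}
```

Because the counter restarts from zero after every refill, it never holds
more than one batch plus a single increment.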

9c8c9492

[PATCH] serial console registration bugfix · 6f222020

Andrew Morton authored Dec 29, 2003

From: Bjorn Helgaas <bjorn.helgaas@hp.com>

uart_set_options() can dereference a null pointer.  This happens if you
specify a console that hasn't previously been setup by early_serial_setup().

For example, on ia64, the HCDP typically tells us about line 0, so we call
early_serial_setup() for it.  If the user specifies "console=ttyS3", we
machine-check when trying to follow the uninitialized port->ops pointer.

It's not entirely clear to me whether we should return 0 or -ENODEV or
something.  The advantage of returning zero is that if the user specifies
"console=ttyS0" and we just lack the HCDP, the console doesn't work as early
as usual, but it does start working after the serial driver detects the port
(though the baud/parity/etc from the command line are lost).  Returning
-ENODEV seems to prevent it from ever working.

6f222020

[PATCH] Fix sysenter disabling in vm86 mode · 783faefa

Andrew Morton authored Dec 29, 2003

From: Brian Gerst <bgerst@didntduck.org>

The current code disables sysenter when first entering vm86 mode, but does
not disable it again when coming back to a vm86 task after a task switch.

783faefa

[PATCH] Add `gcc -Os' config option · ffd0cf49

Andrew Morton authored Dec 29, 2003

From: Adrian Bunk <bunk@fs.tum.de>

Allow the kernel to be built with `-Os'.

It requires CONFIG_EMBEDDED.  This is to make it "hard to get at" because
one gcc version (3.2.x I think) from RH9 generates crashy kernels with this
option set.

ffd0cf49