Commits · 33615dd1bbab1a6311c0549ba42638a71237acbf · Kirill Smelkov / linux

23 Dec, 2002 9 commits

Fix sysenter restart backwards jump, add offset comments, · 33615dd1
Linus Torvalds authored Dec 23, 2002

and make the alignment of the return point be saner.

[PATCH] more clustered-apic-mode work · 6c39ac1f

Martin J. Bligh authored Dec 22, 2002

Code mostly originally by James Cleverdon.

Abstracts out more clustered_apic_mode gunk into

 - ioapic_phys_id_map()
 - wakeup_secondary_cpu()
 - setup_portio_remap()

[PATCH] clustered IPI cleanups · 67382f14

Martin J. Bligh authored Dec 22, 2002

This one fixes up the IPI code to do something more sensible.  Sorry,
it was just too ugly to leave alone ...  but I did keep it separated out
;-) Though this is not an equivalent transform, it only affects NUMA-Q
and Summit, and it is the same change made twice because some twit just
split it out for both NUMA-Q and Summit in the last patch.

Because clustered APIC logical mode can't broadcast to an arbitrary set
of CPUs (the destination is not just a bitmap), I have to send IPIs as a
sequence of unicasts.  However, there's already a loop in the
generic send_IPI_mask code to do that ...  there's no need to call
send_IPI_mask once for each CPU.  The comment I wrote at the time even
noted that this was silly.
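
A minimal C sketch of the point above, assuming CPUs are numbered 0-31 in a single 32-bit mask. `send_IPI_unicast()` and `send_IPI_mask_sequence()` are hypothetical names standing in for the real APIC programming: the mask is walked once, one unicast is sent per set bit, and "all but self" needs just one call with its own bit cleared.

```c
/* Hypothetical bookkeeping so the sketch can be checked without an APIC. */
static unsigned int delivered_mask;
static int unicast_count;

/* Stand-in for programming the APIC to send one IPI to one CPU. */
static void send_IPI_unicast(int cpu, int vector)
{
    (void)vector;                      /* vector unused in this sketch */
    delivered_mask |= 1u << cpu;
    unicast_count++;
}

/* Clustered logical mode cannot address an arbitrary CPU set in one
 * message, so walk the mask once and send one unicast per set bit. */
static void send_IPI_mask_sequence(unsigned int mask, int vector)
{
    for (int cpu = 0; cpu < 32; cpu++)
        if (mask & (1u << cpu))
            send_IPI_unicast(cpu, vector);
}

/* "All but self" is one call with our own bit cleared,
 * not one call per CPU. */
static void send_IPI_allbutself(unsigned int online_mask, int self, int vector)
{
    send_IPI_mask_sequence(online_mask & ~(1u << self), vector);
}

/* "All" is simply the full online mask. */
static void send_IPI_all(unsigned int online_mask, int vector)
{
    send_IPI_mask_sequence(online_mask, vector);
}
```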

[PATCH] cleanup IPI code · 00047bcc

Martin J. Bligh authored Dec 22, 2002

Reformat the IPI stuff, specifically send_IPI_mask, send_IPI_allbutself,
and send_IPI_all.  Though the way they work is pretty silly for NUMA-Q,
I do an equivalent transform here, and fix the code in a separate patch
(the next one).  The code goes into mach_ipi.h.

[PATCH] mpparse cleanups · 37d1206a

Martin J. Bligh authored Dec 22, 2002

Most of the code originally by James Cleverdon.

More of the mpparse code is reworked here, this time not APIC stuff,
so we create mach_mpparse.h and stick it in there.

Abstracts out:
 - mpc_oem_bus_info() - stores mappings between buses and nodes/quads.
 - mpc_oem_pci_bus()  - stores mappings between global and local pci bus numbers

Changes summit_check() into mps_oem_check() to generalise it.

[PATCH] abstract out mpparse code · 43f1c206

Martin J. Bligh authored Dec 22, 2002

Most of the code originally by James Cleverdon.

Abstracts out code from the mpparse stuff into:

 - mpc_apic_id()
 - apicid_to_cpu_present()

instead of using clustered_apic_mode switching.

[PATCH] abstract out clustered APIC code · df0e5a8f

Martin J. Bligh authored Dec 22, 2002

Code originally by James Cleverdon.

This abstracts out some sections that were switched by
clustered_apic_mode into the following functions:

 - apic_id_registered()
 - init_apic_ldr()
 - multi_timer_check()

Changes the return check in balance_irq from testing clustered_apic_mode
to testing "no_balance_irq" to be more general.

The removal of:
	entry.dest.logical.logical_dest = TARGET_CPUS;
is because it's a duplicate (we do it twice in the same function for
no reason).

[PATCH] NUMA-Q subarch directory · 172a3ef7

Martin J. Bligh authored Dec 22, 2002

This adds a shell of a NUMA-Q subarch directory, and copies
mach-default/mach_apic.h into it.  I then edited the default version to
remove the NUMA-Q stuff, and the NUMA-Q version to remove the default
stuff.

[PATCH] x86 subarch header files · 47a62db5

Martin J. Bligh authored Dec 22, 2002

Patch from John Stultz.

This reorganises the subarch files to put all the headers under the
include dir, instead of mixing them up with the C files.  The only
interesting part is the top section where he makes it fall back from the
subarch dir to the default dir if there's no .h file under the subarch
dir.

This means we can create multiple subarches without copying every single
file that any subarch wants into all the directories.  And is much
tidier, IMHO.

22 Dec, 2002 9 commits

Handle single-stepping over fast system calls without polluting · 52a150d8

Linus Torvalds authored Dec 22, 2002

the fast case with a pushf/popf, by having the kernel debug trap
set the TIF_SINGLESTEP flag and causing the return path to do the right
thing.

[PATCH] Avoid overwriting boot_cpu_data from trampoline code · dd0f2bdf

Manfred Spraul authored Dec 22, 2002

boot_cpu_data should contain the common capabilities of all cpus in the
system. identify_cpu [arch/i386/kernel/cpu/common.c] tries to enforce
that. But right now, the SMP trampoline code [arch/i386/kernel/head.S]
overwrites boot_cpu_data when the secondary cpus are started, i.e.
boot_cpu_data contains the capabilities from the last cpu that booted :-(

The attached patch adds a new, __initdata variable for the asm code.
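
A minimal sketch of the intended semantics, using a simplified, hypothetical `struct cpuinfo` instead of the real per-CPU structure: folding each CPU's capability words in with bitwise AND keeps only features every CPU has, whereas copying the whole struct (what the trampoline effectively did) leaves whatever the last CPU to boot reported.

```c
#include <stdint.h>
#include <string.h>

#define NCAPWORDS 4   /* hypothetical number of 32-bit capability words */

struct cpuinfo {
    uint32_t caps[NCAPWORDS];
};

/* Correct: a feature survives only if every CPU seen so far has it. */
static void merge_caps(struct cpuinfo *common, const struct cpuinfo *c)
{
    for (int i = 0; i < NCAPWORDS; i++)
        common->caps[i] &= c->caps[i];
}

/* The bug: each secondary CPU overwrites the shared copy, so the
 * result reflects only the last CPU that booted. */
static void overwrite_caps(struct cpuinfo *common, const struct cpuinfo *c)
{
    memcpy(common, c, sizeof(*common));
}
```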

[PATCH] honour init= bootparm · 576d92d6
Rusty Russell authored Dec 22, 2002

Restore the accidentally dropped code to handle "init=xxx".

[PATCH] Fix pageattr with mem=nopentium · 4d59f610

Andi Kleen authored Dec 22, 2002

This fixes a hang in change_page_attr() that occurred with mem=nopentium.

Make sure a non-large-page kernel mapping is handled correctly.
Previously the page reference counter was handled incorrectly in this
case.

Also hardens change_page_attr against bogus addresses.  You get an
EINVAL now.

[PATCH] Make mem=nopentium clear cpu_has_pse · 0b9e43dc

Andi Kleen authored Dec 22, 2002

"mem=nopentium" would clear the PSE bit in boot_cpu_data, but the CPU
detection later would overwrite it again from CPUID.

The large pages would be correctly disabled, but cpu_has_pse was lying.

This patch makes sure it stays clear when the option is given.
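
A sketch of the ordering fix, assuming a single capability word taken from CPUID leaf 1 EDX (where PSE is bit 3). `pse_disabled_by_cmdline`, `parse_mem_option()` and `final_caps()` are hypothetical names; the point is only that the command-line override is applied after detection, so re-reading CPUID cannot set the bit again.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FEATURE_PSE (1u << 3)   /* PSE is bit 3 of CPUID leaf 1, EDX */

/* Hypothetical flag remembered from the command line. */
static bool pse_disabled_by_cmdline;

/* Hypothetical parser for the value of "mem=". */
static void parse_mem_option(const char *arg)
{
    if (strcmp(arg, "nopentium") == 0)
        pse_disabled_by_cmdline = true;
}

/* Apply user overrides after detection, so a later CPUID read cannot
 * turn a disabled feature back on. */
static uint32_t final_caps(uint32_t cpuid_edx)
{
    uint32_t caps = cpuid_edx;          /* what the hardware reports */

    if (pse_disabled_by_cmdline)
        caps &= ~FEATURE_PSE;
    return caps;
}
```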

I also took the liberty of removing these obnoxious cpu capability
printks, which give no useful information (the data can be obtained
either raw from CPUID in user space or processed from /proc/cpuinfo).

[PATCH] reorder 'rep;nop;' in the spinlock macro · 5e163a89

Manfred Spraul authored Dec 21, 2002

According to Intel's recommendation, 'rep;nop;' should be executed before
testing whether the lock variable was modified (i.e. rep nop;cmp;jcc). The
current implementation does it the wrong way around: first test, then
wait, then branch. I've asked Asit Mallik from Intel, and he recommended
to change it.

At the very least it should be consistent: right now, spinlock uses
'cmp;rep nop;jcc', rwlock uses 'rep nop;cmp;jcc'
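
A user-space sketch of the recommended ordering, using C11 atomics and a test-and-test-and-set lock rather than the kernel's actual lock sequence; `struct toy_spinlock` and the function names are assumptions for illustration. In the contended path the loop executes the pause ('rep; nop' is the encoding of PAUSE on x86) first, and only then re-reads the lock word and branches.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Spin-wait hint; a no-op on non-x86 builds of this sketch. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

struct toy_spinlock {
    atomic_int locked;   /* 0 = free, 1 = held */
};

static void spin_lock(struct toy_spinlock *l)
{
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire)) {
        /* Contended: pause, then test, then branch (rep nop; cmp; jcc). */
        do {
            cpu_relax();
        } while (atomic_load_explicit(&l->locked, memory_order_relaxed));
    }
}

static bool spin_trylock(struct toy_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, 1, memory_order_acquire);
}

static void spin_unlock(struct toy_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```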

Merge http://linux-voyager.bkbits.net/dma-generic-mapping-2.5 · d21918b6
Linus Torvalds authored Dec 21, 2002

into home.transmeta.com:/home/torvalds/v2.5/linux

remove PCI_NEW_DMA_COMPAT_API · e6241a27

James Bottomley authored Dec 21, 2002

use a #include mechanism for generic implementations of the pci_
API in terms of the dma_ one
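
A simplified sketch of the wrapper pattern, assuming toy versions of `struct device` and `struct pci_dev` and a made-up platform rule (a mask is supported only if it covers a 32-bit bus address range; `TOY_BUS_LIMIT` is hypothetical). The dma_ functions see only `struct device`; the pci_ versions unwrap the embedded device and pass NULL through for a NULL `pci_dev`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy versions of the kernel structures. */
struct device {
    uint64_t dma_mask;          /* mask accepted by dma_set_mask() */
};

struct pci_dev {
    struct device dev;          /* generic device embedded in the PCI one */
};

/* Hypothetical platform rule: the device must reach a 32-bit bus. */
#define TOY_BUS_LIMIT 0xFFFFFFFFull

/* Generic dma_ API, phrased purely in terms of struct device. */
static int dma_supported(struct device *dev, uint64_t mask)
{
    (void)dev;                  /* a real platform might inspect the bus */
    return mask >= TOY_BUS_LIMIT;
}

static int dma_set_mask(struct device *dev, uint64_t mask)
{
    if (!dev || !dma_supported(dev, mask))
        return -EIO;
    dev->dma_mask = mask;
    return 0;
}

/* pci_ API built on top: unwrap the embedded device. */
static inline int pci_dma_supported(struct pci_dev *hwdev, uint64_t mask)
{
    return dma_supported(hwdev ? &hwdev->dev : NULL, mask);
}

static inline int pci_set_dma_mask(struct pci_dev *hwdev, uint64_t mask)
{
    return dma_set_mask(hwdev ? &hwdev->dev : NULL, mask);
}
```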

Merge bk://linuxusb.bkbits.net/linus-2.5 · b163be65
Linus Torvalds authored Dec 21, 2002

into home.transmeta.com:/home/torvalds/v2.5/linux

21 Dec, 2002 22 commits

[ALPHA] Add __param support to link script. · ba96dab4
Richard Henderson authored Dec 21, 2002

Merge kroah.com:/home/linux/linux/BK/bleeding-2.5 · 7037193a
Greg Kroah-Hartman authored Dec 21, 2002

into kroah.com:/home/linux/linux/BK/gregkh-2.5

[PATCH] dev_printk macro · b874f98e
James Keniston authored Dec 21, 2002

[PATCH] scanner.c: Support for devices with only one bulk-in endpoint · 6f815233

Henning Meier-Geinitz authored Dec 21, 2002

This patch (originally from Sergey Vlasov) adds support for scanners
with only one bulk-in endpoint. It's needed by all the GT-6801 based
scanners like the Artec Ultima 2000 or some of the Mustek BearPaws.

[PATCH] scanner.h: add/fix vendor/product ids · 00945e82

Henning Meier-Geinitz authored Dec 21, 2002

This patch adds additional vendor and product ids for Nikon, Mustek,
Plustek, Genius, Epson, Canon, Umax, Hewlett-Packard, Benq, Agfa,
and Minolta scanners. The entries for Benq, Genius and Plustek
scanners have been updated.

I've also increased the version number to 0.4.9 and brought the
version numbers in scanner.c and scanner.h in sync.

[PATCH] ehci, qtd submit and completions · a37d3ccc

David Brownell authored Dec 21, 2002

 > ... usb-storage gets unhappy when
 > it decides (why?  and unsuccessfully) to reset high speed
 > devices.  ...

I don't know if that problem is resolved, but this patch
makes the question moot by handling an earlier error correctly.

The patch updates an incorrect test, so a short read will now
be treated as one.  Please merge.

This lets storage behave again.  As in, "mkfs -c" then copy
about 8 GB around, then 'dbench'.

Remove old pci_dma_supported(), this is done by the generic · 4e375211
Linus Torvalds authored Dec 21, 2002

device DMA now (see <linux/pci.h> for the compat wrapper).

allow pci primary busses to have parents in the device model · 8f66ebaf
James Bottomley authored Dec 21, 2002

generic device DMA API · 1ebad6d8

James Bottomley authored Dec 21, 2002

add dma_ API to mirror pci_ DMA API but phrased to use struct
device instead of struct pci_dev.

See Documentation/DMA-API.txt for details

More mtrr/if.c fixes · 011f5659

Linus Torvalds authored Dec 21, 2002

 - printk is not an acceptable substitute for errors
 - fix indentation of mtrr_close()
 - fix duplicate mtrr "release" fn pointer initializer

[PATCH] remove unused macro MAP_ALIGN() · 7a503673
Andrew Morton authored Dec 21, 2002

Patch from Christoph Hellwig <hch@lst.de>

remove unused macro MAP_ALIGN()

[PATCH] remove memclass() · 9a7e870f
Andrew Morton authored Dec 21, 2002

From hch.  Nothing is using the memclass() predicate.

[PATCH] don't cacheline-align radix_tree_nodes · 2a17c650

Andrew Morton authored Dec 21, 2002

They are 260 bytes.  We can get 15 per page without cacheline
alignment.  But we're currently only getting ten per page on P4.
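
The arithmetic as a small C helper, ignoring slab bookkeeping overhead and assuming a 4096-byte page and a 128-byte L1 cache line on P4: 4096 / 260 gives 15 whole objects, while rounding each node up to 384 bytes (three cache lines) gives 4096 / 384, i.e. 10.

```c
#include <stddef.h>

/* Round size up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Whole objects that fit in one page, ignoring slab bookkeeping. */
static size_t objs_per_page(size_t page_size, size_t obj_size, size_t align)
{
    return page_size / align_up(obj_size, align);
}
```

With `align` set to 1 (no cacheline alignment) the result is 15; with 128 it is 10, matching the figures above.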

[PATCH] hugetlbfs: set inode->i_size · 74bbb9c7

Andrew Morton authored Dec 21, 2002

An `ls' in hugetlbfs currently shows all files having zero size.

So, part-cosmetic, part-informative, we here set i_size to represent the
index of the highest present page in the mapping, plus one.

[PATCH] hugetlb: report shared memory attachment counts · 165eaa86

Andrew Morton authored Dec 21, 2002

From Rohit Seth

Attached is a patch that passes the correct information back to user
land for the number of attachments to a shared memory segment.  I could
have made a few more changes to the way nattach is set for the regular
cases, but I just want to limit it at this point.

[PATCH] hugetlb bugfixes · f19dc938

Andrew Morton authored Dec 21, 2002

From Rohit Seth

1) Bug fixes (mainly in the failure paths of hugepage allocation).

   i) not modifying the value of key for unsuccessful key
      allocation

   ii) Correct usage of mmap_sem in free_hugepages

   iii) Proper unlocking of key->lock for partial hugepage
        allocations


2) Include the IPC_LOCK capability check for permission to use hugepages
   via the syscall interface.  This brings the syscall interface into line
   with the hugetlbfs interface.

   It also permits users who are in the superuser group to
   access hugetlb resources.  This is so that database servers can run
   without elevated permissions.

3) Increment the key_counts during forks to correctly identify the
   number of processes referencing a key.

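A minimal sketch of the reference counting in item 3, using a hypothetical `struct hugetlb_key` with locking omitted (the real code takes key->lock): every process that inherits the mapping through fork takes its own reference, so the key is only released when the last user drops it.

```c
#include <stdbool.h>

/* Hypothetical shared hugetlb key (locking omitted in this sketch). */
struct hugetlb_key {
    int count;          /* processes currently referencing the key */
    bool released;      /* set once the last reference is dropped */
};

static void key_get(struct hugetlb_key *key)
{
    key->count++;
}

static void key_put(struct hugetlb_key *key)
{
    if (--key->count == 0)
        key->released = true;   /* real code would free the huge pages */
}

/* A forked child inherits the mapping, so it takes its own reference. */
static void key_on_fork(struct hugetlb_key *key)
{
    key_get(key);
}
```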

[PATCH] ext3: fix buffer dirtying · 0c74aabb

Andrew Morton authored Dec 21, 2002

This is a forward-port from 2.4.  One of Stephen's recent fixes.  I
managed to merge up only half of it.  Here is the rest.  It should fix
the assertion failure reported by Robert Macaulay
<robert_macaulay@dell.com>

"There was a race window in buffer refiling where we could temporarily
 expose the journal's internal BH_JBDDirty flag as BH_Dirty, which is
 visible to the rest of the VFS.  That doesn't affect the journaling,
 because we hold journal_head locks while the buffer is in this
 transient state, but bdflush can see the buffer and write it out
 unexpectedly, causing ext3 to find the buffer in an unexpected state
 later.

 The fix simply keeps the dirty bits clear during the internal buffer
 processing, restoring the state to the private BH_JBDDirty once
 refiling is complete."

[PATCH] ext3 use-after-free bugfix · dd2f1160

Andrew Morton authored Dec 21, 2002

If ext3_add_nondir() fails it will do an iput() of the inode.  But we
continue to run ext3_mark_inode_dirty() against the potentially-freed
inode.  This oopses when slab poisoning is enabled.

Fix it so that we only run ext3_mark_inode_dirty() if the inode was
successfully instantiated.
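
A sketch of the corrected control flow, with hypothetical stand-ins for `ext3_add_nondir()` and `ext3_mark_inode_dirty()`: on failure the helper frees the inode (as the final `iput()` may), so the caller must check the return value before touching the inode again.

```c
#include <stdlib.h>

struct inode {
    int dirty;
};

/* Stand-in for ext3_add_nondir(): on failure it drops the last
 * reference, which frees the inode. */
static int add_nondir(struct inode *inode, int fail)
{
    if (fail) {
        free(inode);            /* like iput(): the inode is gone now */
        return -1;
    }
    return 0;
}

static void mark_inode_dirty(struct inode *inode)
{
    inode->dirty = 1;
}

/* Fixed caller: only touch the inode if it was instantiated. */
static int toy_create(int fail, int *was_marked)
{
    struct inode *inode = calloc(1, sizeof(*inode));
    int err;

    if (!inode)
        return -1;
    err = add_nondir(inode, fail);
    if (!err) {
        mark_inode_dirty(inode);
        *was_marked = inode->dirty;
        free(inode);            /* toy cleanup; the kernel keeps it attached to its dentry */
    }
    return err;
}
```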

[PATCH] rename locals in ext2_new_block() · 02d0c3df

Andrew Morton authored Dec 21, 2002

Renames the local variables `bh2', `i', `j', `k', and `tmp' to
something meaningful.  This brings ext2_new_block() into line with
ext3_new_block().

[PATCH] ext2: smarter block allocation startup · 7dcaa802
Andrew Morton authored Dec 21, 2002

The same thing, for ext2.

[PATCH] ext3: smarter block allocation startup · d2562c9d

Andrew Morton authored Dec 21, 2002

When an ext3 (or ext2) file is first created the filesystem has to
choose the initial starting block for its data allocations. In the
usual (new-file) case, that initial goal block is the zeroeth block of
a particular blockgroup.

This is the worst possible choice, because it _guarantees_ that this
file's blocks will be pessimally intermingled with the blocks of
another file which is growing within the same blockgroup.

We've always had this problem with files in the same directory. With
the introduction of the Orlov allocator we now have the problem with
files in different directories. And it got noticed. This is the cause
of the post-Orlov 50% slowdown in dbench throughput on ext3 on
write-through caching SCSI on SMP, and of the 25% slowdown on ext2.

It doesn't happen on uniprocessor because a single CPU will not exhibit
sufficient concurrency in allocation against two or more files.

It will happen on uniprocessor if the files are growing slowly.

It has always happened if the files are in the same directory.

ext2 has the same problem but it is significantly less damaging there
because of ext2's eight-block per-inode preallocation window.

The patch largely solves this problem by not always starting the
allocation goal at the zeroeth block of the blockgroup. We instead
chop the blockgroup into sixteen starting points and select one of those
based on the lower four bits of the calling process's PID.
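
A sketch of the goal computation described above, assuming the usual layout where group N starts at `N * blocks_per_group + first_data_block`; `initial_goal()` is a hypothetical name, not the kernel's.

```c
/* Initial data-allocation goal for a new file in block_group: chop the
 * group into 16 starting points and pick one from the low 4 bits of
 * the PID. */
static unsigned long initial_goal(unsigned long block_group,
                                  unsigned long blocks_per_group,
                                  unsigned long first_data_block,
                                  int pid)
{
    unsigned long bg_start = block_group * blocks_per_group + first_data_block;
    unsigned long colour = (unsigned long)(pid & 15) * (blocks_per_group / 16);

    return bg_start + colour;
}
```

Two processes whose PIDs differ in the low four bits start 1/16 of a group apart, so their files no longer interleave from the very first block.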

The PID was chosen as the index because this will help to ensure that
related files have the same starting goal. If one process is slowly
writing two files in the same directory, we still lose.

Using the PID in the heuristic is a bit weird. As an alternative I
tried using the file's directory's i_ino. That fixed the dbench
problem OK but caused a 15% slowdown in the fast-growth `untar a kernel
tree' workload, because this approach will cause files which are in
different directories to spread out more. Suppressing that behaviour
when the files are all being created by the same process is a
reasonable heuristic.

I changed dbench to never unlink its files, and used e2fsck to
determine how many fragmented files were present after a `dbench 32'
run. With this patch and the next couple, ext2's fragmentation went
from 22% to 13% and ext3's from 25% to 10.4%.

[PATCH] ext2/3: better starting group for S_ISREG files · 1cdf4231

Andrew Morton authored Dec 21, 2002

ext2 places non-directory objects into the same blockgroup as their
directory, as long as that directory has free inodes.  It does this
even if there are no free blocks in that blockgroup (!).

This means that if there are lots of files being created at a common
point in the tree, they _all_ have the same starting blockgroup.  For
each file we do a big search forwards for the first block and the
allocations end up getting intermingled.

So this patch will avoid placing new inodes in block groups which have
no free blocks.

So far so good.  But this means that if a lot of new files are being
created under a directory (or multiple directories) which are in the
same blockgroup, all the new inodes will overflow into the same
blockgroup.  No improvement at all.

So the patch arranges for the new inode locations to be "spread out"
across different blockgroups if they are not going to be placed in
their directory's block group.  This is done by adding parent->i_ino
into the starting point for the quadratic hash.  i_ino was chosen so
that files which are in the same directory will tend to all land in the
same new blockgroup.
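
A sketch of the placement logic in this patch, with hypothetical per-group free-inode and free-block arrays standing in for the group descriptors; `toy_find_group_other()` is an illustrative name. The parent's group is used only if it has both free inodes and free blocks; otherwise the quadratic probe starts at an offset derived from `parent->i_ino`, so files from one directory overflow into the same new group while different directories spread out. The final linear scan, which accepts a group with no free blocks, is an assumption about the fallback rather than something stated above.

```c
/* Returns a group index, or -1 if no group has a free inode. */
static int toy_find_group_other(int parent_group, unsigned long parent_ino,
                                const int *free_inodes, const int *free_blocks,
                                int ngroups)
{
    int group = parent_group;
    int i;

    /* Prefer the parent's group, but only if it has free blocks too. */
    if (free_inodes[group] > 0 && free_blocks[group] > 0)
        return group;

    /* Quadratic probe starting at an offset derived from the parent. */
    group = (int)(((unsigned long)group + parent_ino) % (unsigned long)ngroups);
    for (i = 1; i < ngroups; i <<= 1) {
        group += i;
        if (group >= ngroups)
            group -= ngroups;
        if (free_inodes[group] > 0 && free_blocks[group] > 0)
            return group;
    }

    /* Last resort: any group with a free inode, even without free blocks. */
    group = parent_group;
    for (i = 0; i < ngroups; i++) {
        if (++group >= ngroups)
            group = 0;
        if (free_inodes[group] > 0)
            return group;
    }
    return -1;
}
```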
