Commits · b8fc4428d704b91911051cbea79304cbc746f3a5 · Kirill Smelkov / linux

06 Feb, 2003 40 commits

Hand-merge with Ingo's changes · b8fc4428
Daniel Jacobowitz authored Feb 06, 2003

b8fc4428
Signal handling bugs for thread exit + ptrace · a866697c
Daniel Jacobowitz authored Feb 06, 2003

a866697c
Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities. · 45c1a159
Daniel Jacobowitz authored Feb 06, 2003

45c1a159

[PATCH] fix megaraid driver compile error · 3fa327f8

Mark Haverkamp authored Feb 05, 2003

This moves access of the host element to device since host has been
removed from struct scsi_cmnd.

3fa327f8

[PATCH] signal-fixes-2.5.59-A4 · ebf5ebe3

Ingo Molnar authored Feb 05, 2003

this is the current threading patchset, which accumulated during the
past two weeks. The biggest set of changes is from Roland, to make
threaded signals work. There were still tons of testcases and boundary
conditions (mostly in the signal/exit/ptrace area) that we did not
handle correctly.

Roland's thread-signal semantics/behavior/ptrace fixes:

 - fix signal delivery race with do_exit() => signals are re-queued to the
   'process' if do_exit() finds pending unhandled ones. This prevents
   signals getting lost upon thread-sys_exit().

 - a non-main thread has died on one processor and gone to TASK_ZOMBIE,
   but before it's gotten to release_task a sys_wait4 on the other
   processor reaps it.  It's only because it's ptraced that this gets
   through eligible_child.  Somewhere in there the main thread is also
   dying so it reparents the child thread to hit that case.  This means
   that there is a race where P might be totally invalid.

 - forget_original_parent is not doing the right thing when the group
   leader dies, i.e. reparenting threads to init when there is a zombie
   group leader.  Perhaps it doesn't matter for any practical purpose
   without ptrace, though it makes for ppid=1 for each thread in core
   dumps, which looks funny. Incidentally, SIGCHLD here really should be
   p->exit_signal.

 - one of the gdb tests makes a questionable assumption about what kill
   will do when it has some threads stopped by ptrace and others running.

exit races:

1. Processor A is in sys_wait4 case TASK_STOPPED considering task P.
   Processor B is about to resume P and then switch to it.

   While A is inside that case block, B starts running P and it clears
   P->exit_code, or takes a pending fatal signal and sets it to a new
   value. Depending on the interleaving, the possible failure modes are:
        a. A gets to its put_user after B has cleared P->exit_code
           => returns with WIFSTOPPED, WSTOPSIG==0
        b. A gets to its put_user after B has set P->exit_code anew
           => returns with e.g. WIFSTOPPED, WSTOPSIG==SIGKILL

   A can spend an arbitrarily long time in that case block, because
   there's getrusage and put_user that can take page faults, and
   write_lock'ing of the tasklist_lock that can block.  But even if it's
   short the race is there in principle.

2. This is new with NPTL, i.e. CLONE_THREAD.
   Two processors A and B are both in sys_wait4 case TASK_STOPPED
   considering task P.

   Both get through their tests and fetches of P->exit_code before either
   gets to P->exit_code = 0.  => two threads return the same pid from
   waitpid.

   In other interleavings where one processor gets to its put_user after
   the other has cleared P->exit_code, it's like case 1(a).


3. SMP races with stop/cont signals

   First, take:

        kill(pid, SIGSTOP);
        kill(pid, SIGCONT);

   or:

        kill(pid, SIGSTOP);
        kill(pid, SIGKILL);

   It's possible for this to leave the process stopped with a pending
   SIGCONT/SIGKILL.  That's a state that should never be possible.
   Moreover, kill(pid, SIGKILL) without any repetition should always be
   enough to kill a process.  (Likewise SIGCONT when you know it's
   sequenced after the last stop signal, must be sufficient to resume a
   process.)

4. take:

        kill(pid, SIGKILL);     // or any fatal signal
        kill(pid, SIGCONT);     // or SIGKILL

    it's possible for this to cause pid to be reaped with status 0
    instead of its true termination status.  The equivalent scenario
    happens when the process being killed is in an _exit call or a
    trap-induced fatal signal before the kills.
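
A minimal user-space sketch of the stop-then-kill sequence from item 3,
assuming a POSIX system. The helper name stop_then_kill() is made up for
illustration and is not part of the patch. It forks a child that only
sleeps, sends SIGSTOP followed by a single SIGKILL, and reports which
signal terminated the child. If the invariant described above holds, the
result should be SIGKILL, not a process left stopped or reaped with
status 0:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): returns the signal that
 * terminated the child, or -1 on error or on a non-signal exit.
 */
static int stop_then_kill(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();    /* child: sleep until a signal arrives */
    }

    /* One SIGKILL after the stop must be enough; no repetition. */
    if (kill(pid, SIGSTOP) < 0 || kill(pid, SIGKILL) < 0)
        return -1;

    /* Without WUNTRACED, waitpid() does not report the stop and
     * blocks until the child has actually terminated. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Running this in a loop on an SMP machine is one way to exercise the race,
although a passing run does not prove the race is absent.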

plus i've done stability fixes for bugs that popped up during
beta-testing, and minor tidying of Roland's changes:

 - a rare tasklist corruption during exec, causing some very spurious and
   colorful crashes.

 - a copy_process()-related dereference of already freed thread structure
   if hit with a SIGKILL in the wrong moment.

 - SMP spinlock deadlocks in the signal code

this patchset has been tested quite well in the 2.4 backport of the
threading changes - and i've done some stresstesting on 2.5.59 SMP as
well, and did an x86 UP testcompile + testboot as well.

ebf5ebe3

[PATCH] ips driver 4/4: error messages · 44a5a59c

David Jeffery authored Feb 05, 2003

This small patch does 2 things. It reworks the firmware/driver
versioning messages to make them more understandable, and it
fixes one case where the 64bit addressing changes caused
error/success to not be properly reported to the serveraid tools.

44a5a59c

[PATCH] ips driver 3/4: 64bit dma addressing · d31bb16c

David Jeffery authored Feb 05, 2003

This large patch adds support for using 64bit addressing.

Special thanks go to Mike Anderson who did the initial
versions of this patch.

d31bb16c

[PATCH] ips driver 2/4: initialization reordering · 836f40cb

David Jeffery authored Feb 05, 2003

This large patch reworks much of the adapter initialization
code.

It splits the scsi initialization code from the pci
initialization.  It adds support for working with some
future cards.  It also removes the use of multiple pci_driver
registrations and instead does its own adapter ordering.

836f40cb

[PATCH] ips driver 1/4: fix struct length and remove dead code · 9d252c21

David Jeffery authored Feb 05, 2003

This small patch fixes the length of the IPS_ENQ
struct.  It was too short, which can cause the adapter
to write beyond the end of the struct during
driver initialization and corrupt part of memory.

9d252c21

Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · bd15d114
Linus Torvalds authored Feb 05, 2003
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
bd15d114
Merge raven.il.steeleye.com:/home/jejb/BK/scsi-misc-2.5 · 35766eb7
James Bottomley authored Feb 05, 2003
```
into raven.il.steeleye.com:/home/jejb/BK/scsi-for-linus-2.5
```
35766eb7

[PATCH] coding style updates for scsi_lib.c · 78ef52ec

Christoph Hellwig authored Feb 05, 2003

I just couldn't see the mess anymore..  Nuke the ifdefs and use sane
variable names.  Some more small nitpicks but no behaviour changes at
all.

78ef52ec

[PATCH] 2.5.59 add two help texts to drivers_scsi_Kconfig · baaf76dd

Rusty Russell authored Feb 05, 2003

From:  Steven Cole <elenstev@mesatop.com>

  Here are some help texts from 2.4.21-pre3 Configure.help which are
  needed in 2.5.59 drivers/scsi/Kconfig.

  Steven

baaf76dd

[PATCH] [patch, 2.5] scsi_qla1280.c free on error path · f8646d20

Rusty Russell authored Feb 05, 2003

From:  Marcus Alanen <maalanen@ra.abo.fi>

  Remove check_region in favour of request_region. Free resources
  properly on error path. Horribly subtle ioremap/iounmap lurks here I
  think, in qla1280_pci_config(), which the below patch should take care
  of.

  I'm wondering if there couldn't / shouldn't be a better way to
  allocate resources. Obviously lots of drivers have broken error paths.
  Is this even necessary?

  Marcus


  #
  # create_patch: qla1280_release_on_error_path-2002-12-08-A.patch
  # Date: Sun Dec  8 22:32:33 EET 2002
  #

f8646d20

[SCSI] Remove host_active · d30a24be
Christoph Hellwig authored Feb 05, 2003
```
It isn't used anywhere anymore
```
d30a24be
Merge http://linux-acpi.bkbits.net/linux-acpi · 26d987a7
Linus Torvalds authored Feb 05, 2003
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
26d987a7
ACPI: Enable compilation w/o cpufreq · 32dbc81b
Andy Grover authored Feb 05, 2003

32dbc81b
[PATCH] quota memleak · a2dd1464
Randy Dunlap authored Feb 05, 2003
```
The Stanford Checker found a memleak.
```
a2dd1464
Merge bk://kernel.bkbits.net/vojtech/x86-64 · d0d3f1f0
Linus Torvalds authored Feb 05, 2003
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
d0d3f1f0
x86-64: Minor fixes to make the kernel compile and remove warnings. · 4a69c79b
Vojtech Pavlik authored Feb 06, 2003

4a69c79b

[PATCH] Fix signed use of i_blocks in ext3 truncate · 9a3e1a96

Andrew Morton authored Feb 05, 2003

Patch from "Stephen C. Tweedie" <sct@redhat.com>

Fix "h_buffer_credits<0" assert failure during truncate.

The bug occurs when the "i_blocks" count in the file's inode overflows
past 2^31. That works fine most of the time, because i_blocks is an
unsigned long, and should go up to 2^32; but there's a place in truncate
where ext3 calculates the size of the next transaction chunk for the
delete, and that mistakenly uses a signed long instead. Because the
huge i_blocks gets cast to a negative value, ext3 does not reserve
enough credits for the transaction and the above error results.

This is usually only possible on filesystems corrupted for other
reasons, but it is reproducible if you create a single, non-sparse file
larger than 1TB on ext3 and then try to delete it.
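
A small sketch of the arithmetic, assuming a 32-bit "long" as on ia32 and
the usual 512-byte units for i_blocks. The function names are invented for
illustration and the ext3 credit calculation itself is not reproduced here.
It shows why 1TB is the threshold and what the signed view of the count
looks like:

```c
#include <assert.h>
#include <stdint.h>

/* i_blocks is counted in 512-byte units. */
static uint64_t blocks_for_bytes(uint64_t bytes)
{
    return bytes / 512;
}

/*
 * Model of the bug: a 32-bit unsigned block count read through a signed
 * 32-bit type.  The out-of-range conversion is implementation-defined in
 * C; on the usual two's-complement targets it yields a negative number.
 */
static int32_t blocks_seen_as_signed(uint32_t i_blocks)
{
    return (int32_t)i_blocks;
}
```

A 1TB file has 2^40 / 512 = 2^31 blocks, which is the first value that no
longer fits in a signed 32-bit long.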

9a3e1a96

[PATCH] CPU Hotplug mm/slab.c CPU_UP_CANCELED fix · 4f1cb3ff

Andrew Morton authored Feb 05, 2003

Patch from Manfred Spraul.

Fixes a bug which was exposed by Zwane's hotplug CPU work.  The
cache_cache.array pointer is initially given a temp bootstrap area, which is
later converted over to the final value after the CPU is brought up.

But if slab is enhanced to permit cancellation of a CPU bringup, this pointer
ends up pointing at stale memory.  So reinitialise it by hand when
kmem_cache_init() is run.

4f1cb3ff

[PATCH] spinlock debugging on uniprocessors · ecd2d220

Andrew Morton authored Feb 05, 2003

Patch from Manfred Spraul <manfred@colorfullife.com>

This enables spinlock debugging on uniprocessor builds, under
CONFIG_DEBUG_SPINLOCK.

The reason I want this is that one day we'll need to pull out the debugging
support from the timer code which detects uninitialised timers.  And once
that has gone, uniprocessor developers and testers have no way of detecting
uninitialised timers - there will be mysterious deadlocks on SMP machines.
And there will surely be more uninitialised timers.

The patch also removes the last pieces of the support for including
<asm/spinlock.h> directly.  That hasn't worked since (IIRC) 2.3.x.

ecd2d220

[PATCH] mm/mremap.c whitespace cleanup · 32738fbf
Andrew Morton authored Feb 05, 2003
```
- Not everyone uses 160-column xterms.

- Coding style consistency
```
32738fbf

[PATCH] hugetlb mremap fix · df79ea40

Andrew Morton authored Feb 05, 2003

If you attempt to perform a relocating 4k-aligned mremap and the new address
for the map lands on top of a hugepage VMA, do_mremap() will attempt to
perform a 4k-aligned unmap inside the hugetlb VMA.  The hugetlb layer goes
BUG.

Fix that by trapping the poorly-aligned unmap attempt in do_munmap().
do_mremap() will then fall through without having done anything to the place
where it tests for a hugetlb VMA.

It would be neater to perform these checks on entry to do_mremap(), but that
would incur another VMA lookup.

Also, if you attempt to perform a 4k-aligned and/or sized munmap() inside a
hugepage VMA the same BUG happens.  This patch fixes that too.

This all means that an mremap attempt against a hugetlb area will fail, but
only after having unmapped the source pages.  That's a bit messy, but
supporting hugetlb mremap doesn't seem worth it, and completely disallowing
it will add overhead to normal mremaps.

df79ea40

[PATCH] Fix hugetlb_vmtruncate_list() · 8a1335e9

Andrew Morton authored Feb 05, 2003

This function is quite wrong - has an "=" where it should have a "-" and
confuses PAGE_SIZE and HPAGE_SIZE in its address and file offset arithmetic.

8a1335e9

[PATCH] ia32 hugetlb cleanup · a20d5200
Andrew Morton authored Feb 05, 2003
```
- whitespace

- remove unneeded spinlocking no-op.
```
a20d5200

[PATCH] Fix hugetlbfs faults · 8b5111ec

Andrew Morton authored Feb 05, 2003

If the underlying mapping was truncated and someone references the
now-unmapped memory the kernel will enter handle_mm_fault() and will start
instantiating PAGE_SIZE pte's inside the hugepage VMA.  Everything goes
generally pear-shaped.

So trap this in handle_mm_fault().  It adds no overhead to non-hugepage
builds.

Another possible fix would be to not unmap the huge pages at all in truncate
- just anonymise them.

But I think we want full ftruncate semantics for hugepages for management
purposes.

8b5111ec

[PATCH] Give all architectures a hugetlb_nopage(). · 08a1cc4e

Andrew Morton authored Feb 05, 2003

If someone maps a hugetlbfs file, then truncates it, then references the part
of the mapping outside the truncation point, they take a pagefault and we end
up hitting hugetlb_nopage().

We want to prevent this from ever happening. This patch just makes sure that
all architectures have a goes-BUG hugetlb_nopage() to trap it.

08a1cc4e

[PATCH] hugetlbfs cleanups · 3cc33271

Andrew Morton authored Feb 05, 2003

- Remove quota code.

- Remove extraneous copy-n-paste code from truncate: that's only for
  physically-backed filesystems.

- Whitespace changes.

3cc33271

[PATCH] hugetlbfs i_size fixes · 05732657

Andrew Morton authored Feb 05, 2003

We're expanding hugetlbfs i_size in the wrong place.  If someone attempts to
mmap more pages than are available, i_size is updated to reflect the
attempted mapping size.

So set i_size only when pages are successfully added to the mapping.

i_size handling at truncate time is still a bit wrong - if the mapping has
pages at (say) page offset 100-200 and the mapping is truncated to (say) page
offset 50, i_size should be set to zero.  But it is instead set to
50*HPAGE_SIZE.  That's harmless.

05732657

[PATCH] hugetlbfs: fix truncate · 136963d1

Andrew Morton authored Feb 05, 2003

- Opening a hugetlbfs file O_TRUNC calls the generic vmtruncate() functions
  and nukes the kernel.

  Give S_ISREG hugetlbfs files an inode_operations, and hence a setattr
  which knows how to handle these files.

- Don't permit the user to truncate hugetlbfs files to sizes which are not
  a multiple of HPAGE_SIZE.

- We don't support expanding in ftruncate(), so remove that code.
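
The size rules in the last two points can be sketched as a standalone
check. This is only an illustration: the function name, the 4MB value for
the huge page size, and the choice of -EINVAL for both rejections are
assumptions, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Assumed huge page size for the example (4MB); the real HPAGE_SIZE
 * depends on the architecture and configuration. */
#define HPAGE_SIZE_EXAMPLE (4ULL << 20)

/*
 * Hypothetical check: accept a truncate only if the new size is a
 * multiple of the huge page size and does not grow the file.
 * Returns 0 if acceptable, -EINVAL otherwise (error code assumed).
 */
static int hugetlb_truncate_check(unsigned long long old_size,
                                  unsigned long long new_size)
{
    if (new_size & (HPAGE_SIZE_EXAMPLE - 1))
        return -EINVAL;     /* not a multiple of the huge page size */
    if (new_size > old_size)
        return -EINVAL;     /* expanding is not supported */
    return 0;
}
```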

136963d1

[PATCH] get_unmapped_area for hugetlbfs · 8ca8cd5b

Andrew Morton authored Feb 05, 2003

Having to specify the mapping address is a pain.  Give hugetlbfs files a
file_operations.get_unmapped_area().

The implementation is in hugetlbfs rather than in arch code because it's
probably common to several architectures.  If the architecture has special
needs it can define HAVE_ARCH_HUGETLB_UNMAPPED_AREA and go it alone.  Just
like HAVE_ARCH_UNMAPPED_AREA.

8ca8cd5b

[PATCH] convert hugetlb code to use compound pages · b3a656b6

Andrew Morton authored Feb 05, 2003

The odd thing about hugetlb is that it maintains its own freelist of pages.
And it has to do that, else it would trivially run out of pages due to buddy
fragmentation.

So we don't want callers of put_page() to be passing those pages
to __free_pages_ok() on the final put().

So hugetlb installs a destructor in the compound pages to point at
free_huge_page(), which knows how to put these pages back onto the free list.

Also, don't mark hugepages as all PageReserved any more. That's preventing
callers from doing proper refcounting. Any code which does a user pagetable
walk and hits part of a hugepage will now handle it transparently.

b3a656b6

[PATCH] Infrastructure for correct hugepage refcounting · eefb08ee

Andrew Morton authored Feb 05, 2003

We currently have a problem when things like ptrace, futexes and direct-io
try to pin user pages.  If the user's address is in a huge page we're
elevating the refcount of a constituent 4k page, not the head page of the
high-order allocation unit.

To solve this, a generic way of handling higher-order pages has been
implemented:

- A higher-order page is called a "compound page".  Chose this because
  "huge page", "large page", "super page", etc all seem to mean different
  things to different people.

- The first (controlling) 4k page of a compound page is referred to as the
  "head" page.

- The remaining pages are tail pages.

All pages of a compound page have PG_compound set.  All of them have their
lru.next pointing at the head page (even the head page has this).

The head page's lru.prev, if non-zero, holds the address of the compound
page's put_page() function.

The order of the allocation is stored in the first tail page's lru.prev.
This is only for debug at present.  This usage means that zero-order pages
may not be compound.
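
The layout described above can be sketched as a user-space model. This is
not the kernel's struct page: the struct, the field names, and every
function ending in _model are invented for illustration, and a union stands
in for the two uses of lru.prev.

```c
#include <assert.h>
#include <stddef.h>

struct page_model;
typedef void (*page_dtor_t)(struct page_model *head);

struct page_model {
    int compound;                   /* stands in for PG_compound */
    int count;                      /* reference count */
    struct page_model *lru_next;    /* stands in for lru.next: head page */
    union {
        page_dtor_t dtor;           /* head page's lru.prev */
        unsigned int order;         /* first tail page's lru.prev */
    } lru_prev;
};

/* Set up 2^order pages as one compound page. */
static void prep_compound_model(struct page_model *pages, unsigned int order,
                                page_dtor_t dtor)
{
    size_t i, nr = (size_t)1 << order;

    assert(order > 0);  /* zero-order pages have no tail page for the order */
    for (i = 0; i < nr; i++) {
        pages[i].compound = 1;
        pages[i].count = 0;
        pages[i].lru_next = &pages[0];
        pages[i].lru_prev.dtor = NULL;
    }
    pages[0].lru_prev.dtor = dtor;
    pages[1].lru_prev.order = order;
}

/* Pinning any constituent page takes the reference on the head page. */
static struct page_model *get_page_model(struct page_model *page)
{
    if (page->compound)
        page = page->lru_next;
    page->count++;
    return page;
}

/* The final put on a compound page calls the head page's destructor. */
static void put_page_model(struct page_model *page)
{
    if (page->compound) {
        page = page->lru_next;
        if (--page->count == 0 && page->lru_prev.dtor)
            page->lru_prev.dtor(page);
        return;
    }
    page->count--;
}

/* Example destructor: just counts how often it was called. */
static int freed_compound_pages;

static void free_compound_page_model(struct page_model *head)
{
    (void)head;
    freed_compound_pages++;
}
```

Pinning a tail page of a four-page (order 2) compound page in this model
raises the head page's count and leaves the tail page's count untouched.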

The above relationships are established for _all_ higher-order pages in the
page allocator.  Which has some cost, but not much - another atomic op during
fork(), mainly.

This functionality is only enabled if CONFIG_HUGETLB_PAGE, although it could
be turned on permanently.  There's a little extra cost in get_page/put_page.

These changes do not preclude adding compound pages to the LRU in the future
- we can add a new page flag to the head page and then move all the
additional data to the first tail page's lru.next, lru.prev, list.next,
list.prev, index, private, etc.

eefb08ee

[PATCH] give hugetlbfs a set_page_dirty a_op · 6725839b

Andrew Morton authored Feb 05, 2003

Seems that nobody has tested direct IO into hugetlb pages yet.  The VFS gets
upset about running set_page_dirty() against a non-uptodate page.

So give hugetlbfs inodes a private no-op ->set_page_dirty() to isolate them
from all that.

6725839b

[PATCH] pte_chain_alloc fixes · afcde6ef

Andrew Morton authored Feb 05, 2003

There are several places in which the return value from pte_chain_alloc() is
not being checked, and one place in which a GFP_KERNEL allocation is
happening inside a spinlock.

afcde6ef

[PATCH] loop inefficiency fix · a1329fe8

Andrew Morton authored Feb 05, 2003

Patch from Hugh Dickins <hugh@veritas.com>

The loop driver's loop over elements of bi_io_vec is in lo_send and
lo_receive: iterating that same transfer bi_vcnt times at the level above is,
er, excessive.  (And no need to increment bi_idx here.)

a1329fe8

[PATCH] default_idle micro-optimisation · 87afb5f6

Andrew Morton authored Feb 05, 2003

Patch from rwhron@earthlink.net

Micro-optimization of default_idle from -aa.  current_cpu_data.hlt_works_ok
is only false for some old 386/486 pcs.

87afb5f6

[PATCH] Optimise follow_page() for page-table-based hugepages · 1f1921fc

Andrew Morton authored Feb 05, 2003

ia32 and others can determine a page's hugeness by inspecting the pmd's value
directly.  No need to perform a VMA lookup against the user's virtual
address.

This patch ifdef's away the VMA-based implementation of
hugepage-aware-follow_page for ia32 and replaces it with a pmd-based
implementation.

The intent is that architectures will implement one or the other.  So the
architecture either:

1: Implements hugepage_vma()/follow_huge_addr(), and stubs out
   pmd_huge()/follow_huge_pmd() or

2: Implements pmd_huge()/follow_huge_pmd(), and stubs out
   hugepage_vma()/follow_huge_addr()

1f1921fc