Commits · 6e9d6b8ee4e0c37d3952256e6472c57490d6780d · nexedi / linux

30 Oct, 2005 40 commits

Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev · 6e9d6b8e
Linus Torvalds authored Oct 30, 2005

6e9d6b8e
[PATCH] NFS: Remove unbalanced spin_unlock() calls from nfs_refresh_inode() · d3f8cf48
Trond Myklebust authored Oct 30, 2005
```
Doh!
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
```
d3f8cf48

Linus Torvalds authored Oct 30, 2005

Petr Vandrovec correctly points out that the SMB region of the PIIX4 is
just 16 bytes, not 32.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

08db2a70

[libata] use dev_printk() throughout drivers · a9524a76

Jeff Garzik authored Oct 30, 2005

A few drivers were not following the standard meme of printing out
their driver name and version at module load time; this is fixed
as well.

a9524a76

[libata ata_piix] fix native mode probe, after recent updates · fbf30fba
Jeff Garzik authored Oct 30, 2005

fbf30fba
[libata ata_piix] use dev_printk() where appropriate · 6248e647
Jeff Garzik authored Oct 30, 2005

6248e647

[libata] fix legacy IDE probing · 0f0d5192

Jeff Garzik authored Oct 30, 2005

ata_pci_init_one() receives an array of struct ata_port_info.  Recent
updates to the code had always obtained port information from
array element 0, rather than array element N.

Change to avoid hardcoding port_info[0], thereby restoring proper
hardware information to secondary legacy ports.

0f0d5192

[libata] change ata_qc_complete() to take error mask as second arg · a7dac447

Jeff Garzik authored Oct 30, 2005

The second argument to ata_qc_complete() was being used for two
purposes: communicate the ATA Status register to the completion
function, and indicate an error. On legacy PCI IDE hardware, the latter
is often implicit in the former. On more modern hardware, the driver
often completely emulated a Status register value, passing ATA_ERR as an
indication that something went wrong.

Now that previous code changes have eliminated the need to use drv_stat
arg to communicate the ATA Status register value, we can convert it to a
mask of possible error classes.

This will lead to more flexible error handling in the future.

a7dac447

Merge branch 'master' · 81cfb886
Jeff Garzik authored Oct 30, 2005

81cfb886
Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 9f75e1ef
Linus Torvalds authored Oct 29, 2005

9f75e1ef

[PATCH] mm/filemap.c:filemap_populate(): move export. · b1459461

Nikita Danilov authored Oct 29, 2005

move EXPORT_SYMBOL(filemap_populate) to the proper place: just after
function itself: it's easy to miss that function is exported otherwise.
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b1459461

[PATCH] mm: wider use of for_each_*cpu() · 2f96996d

John Hawkes authored Oct 29, 2005

In 'mm' change the explicit use of a for-loop using NR_CPUS into the
general for_each_cpu() constructs.  This widens the scope of potential
future optimizations of the general constructs, as well as takes advantage
of the existing optimizations of first_cpu() and next_cpu(), which is
advantageous when the true CPU count is much smaller than NR_CPUS.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2f96996d

[PATCH] Remove policy contextualization from mbind · 5fcbb230

Christoph Lameter authored Oct 29, 2005

Policy contextualization is only useful for task based policies and not for
vma based policies.  It may be useful to define allowed nodes that are not
accessible from this thread because other threads may have access to these
nodes.  Without this patch strange memory policy situations may cause an
application to fail with out of memory.

Example:

Let's say we have two threads A and B that share the same address space and
a huge array computational array X.

Thread A is restricted by its cpuset to nodes 0 and 1 and thread B is
restricted by its cpuset to nodes 2 and 3.

Thread A now wants to restrict allocations to the first node and thus
applies a BIND policy on X to node 0 and 2.  The cpuset limits this to node
0.  Thus pages for X must be allocated on node 0 now.

Thread B now touches a page that has never been used in X and faults in a
page.  According to the BIND policy of the vma for X the page must be
allocated on page 0.  However, the cpuset of B does not allow allocation on
0 and 1.  Now the application fails in alloc_pages with out of memory.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5fcbb230

[PATCH] Implement sys_* do_* layering in the memory policy layer. · 8bccd85f

Christoph Lameter authored Oct 29, 2005

- Do a separation between do_xxx and sys_xxx functions. sys_xxx functions
  take variable sized bitmaps from user space as arguments. do_xxx functions
  take fixed sized nodemask_t as arguments and may be used from inside the
  kernel. Doing so simplifies the initialization code. There is no
  fs = kernel_ds assumption anymore.

- Split up get_nodes into get_nodes (which gets the node list) and
  contextualize_policy which restricts the nodes to those accessible
  to the task and updates cpusets.

- Add comments explaining limitations of bind policy
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8bccd85f

[PATCH] memory hotplug: ppc64 specific hot-add functions · bb7e7e03

Dave Hansen authored Oct 29, 2005

Here is a set of ppc64 specific patches that at least allow
compilation/booting with the following configurations:

FLATMEM
SPARSEMEN
SPARSEMEM + MEMORY_HOTPLUG
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bb7e7e03

[PATCH] memory hotplug: i386 addition functions · 05039b92

Dave Hansen authored Oct 29, 2005

Adds the necessary for non-NUMA hot-add of highmem to an existing zone on
i386.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

05039b92

[PATCH] memory hotplug: call setup_per_zone_pages_min after hotplug · 61b13993

Dave Hansen authored Oct 29, 2005

From: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
> I found the tests does not work well with Dave's patchset.
> I've found the followings:
>
> 	- setup_per_zone_pages_min() calls should be added in
> 	   capture_page_range() and online_pages()
> 	- lru_add_drain() should be called before try_to_migrate_pages()

The following patch deals with the first item.
Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

61b13993

[PATCH] memory hotplug: move section_mem_map alloc to sparse.c · 0b0acbec

Dave Hansen authored Oct 29, 2005

This basically keeps up from having to extern __kmalloc_section_memmap().

The vaddr_in_vmalloc_area() helper could go in a vmalloc header, but that
header gets hard to work with, because it needs some arch-specific macros.
Just stick it in here for now, instead of creating another header.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Lion Vollnhals <webmaster@schiggl.de>
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0b0acbec

[PATCH] memory hotplug: sysfs and add/remove functions · 3947be19

Dave Hansen authored Oct 29, 2005

This adds generic memory add/remove and supporting functions for memory
hotplug into a new file as well as a memory hotplug kernel config option.

Individual architecture patches will follow.

For now, disable memory hotplug when swsusp is enabled.  There's a lot of
churn there right now.  We'll fix it up properly once it calms down.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3947be19

[PATCH] memory hotplug locking: zone span seqlock · bdc8cb98

Dave Hansen authored Oct 29, 2005

See the "fixup bad_range()" patch for more information, but this actually
creates a the lock to protect things making assumptions about a zone's size
staying constant at runtime.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bdc8cb98

[PATCH] memory hotplug locking: node_size_lock · 208d54e5

Dave Hansen authored Oct 29, 2005

pgdat->node_size_lock is basically only neeeded in one place in the normal
code: show_mem(), which is the arch-specific sysrq-m printing function.

Strictly speaking, the architectures not doing memory hotplug do no need this
locking in show_mem(). However, they are all included for completeness. This
should also make any future consolidation of all of the implementations a
little more straightforward.

This lock is also held in the sparsemem code during a memory removal, as
sections are invalidated. This is the place there pfn_valid() is made false
for a memory area that's being removed. The lock is only required when doing
pfn_valid() operations on memory which the user does not already have a
reference on the page, such as in show_mem().
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

208d54e5

[PATCH] memory hotplug prep: fixup bad_range() · c6a57e19

Dave Hansen authored Oct 29, 2005

When doing memory hotplug operations, the size of existing zones can obviously
change.  This means that zone->zone_{start_pfn,spanned_pages} can change.

There are currently no locks that protect these structure members.  However,
they are rarely accessed at runtime.  Outside of swsusp, the only place that I
can find is bad_range().

So, split bad_range() up into two pieces: one that needs to be locked and
anther that doesn't.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c6a57e19

[PATCH] memory hotplug prep: __section_nr helper · 4ca644d9

Dave Hansen authored Oct 29, 2005

A little helper that we use in the hotplug code.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4ca644d9

[PATCH] memory hotplug prep: break out zone initialization · ed8ece2e

Dave Hansen authored Oct 29, 2005

If a zone is empty at boot-time and then hot-added to later, it needs to run
the same init code that would have been run on it at boot.

This patch breaks out zone table and per-cpu-pages functions for use by the
hotplug code.  You can almost see all of the free_area_init_core() function on
one page now.  :)
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ed8ece2e

[PATCH] memory hotplug prep: kill local_mapnr · 2774812f

Dave Hansen authored Oct 29, 2005

The following series implements memory hot-add for ppc64 and i386.  There are
x86_64 and ia64 implementations that will be submitted shortly as well,
through the normal maintainers.

This patch:

local_mapnr is unused, except for in an alpha header.  Keep the alpha one,
kill the rest.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2774812f

[PATCH] .text page fault SMP scalability optimization · 1a44e149

Andrea Arcangeli authored Oct 29, 2005

We had a problem on ppc64 where with more than 4 threads a large system
wouldn't scale well while faulting in the .text (most of the time was spent
in the kernel despite it was an userland compute intensive app).  The
reason is the useless overwrite of the same pte from all cpu.

I fixed it this way (verified on an older kernel but the forward port is
almost identical).  This will benefit all archs not just ppc64.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1a44e149

[PATCH] hugetlb: overcommit accounting check · 2e9b367c

Adam Litke authored Oct 29, 2005

Basic overcommit checking for hugetlb_file_map() based on an implementation
used with demand faulting in SLES9.

Since demand faulting can't guarantee the availability of pages at mmap
time, this patch implements a basic sanity check to ensure that the number
of huge pages required to satisfy the mmap are currently available.
Despite the obvious race, I think it is a good start on doing proper
accounting.  I'd like to work towards an accounting system that mimics the
semantics of normal pages (especially for the MAP_PRIVATE/COW case).  That
work is underway and builds on what this patch starts.

Huge page shared memory segments are simpler and still maintain their
commit on shmget semantics.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2e9b367c

[PATCH] hugetlb: demand fault handler · 4c887265

Adam Litke authored Oct 29, 2005

Below is a patch to implement demand faulting for huge pages. The main
motivation for changing from prefaulting to demand faulting is so that huge
page memory areas can be allocated according to NUMA policy.

Thanks to consolidated hugetlb code, switching the behavior requires changing
only one fault handler. The bulk of the patch just moves the logic from
hugelb_prefault() to hugetlb_pte_fault() and find_get_huge_page().
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4c887265

[PATCH] hugetlb: remove repeated code · 551110a9

Krishnakumar R authored Oct 29, 2005

Clean up some repeated code related to HugeTLB.  hugetlb_zero_setup would
have already allocated the file->f_op.
Signed-off-by: Krishnakumar. R <rkrishnakumar@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

551110a9

[PATCH] cleanup hugelbfs_forget_inode · 0b1533f6

Christoph Hellwig authored Oct 29, 2005

Reformat hugelbfs_forget_inode and add the missing but harmless
write_inode_now call.  It looks the same as generic_forget_inode now except
for the call to truncate_hugepages instead of truncate_inode_pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0b1533f6

[PATCH] kill hugelbfs_do_delete_inode · 6b09b9df

Christoph Hellwig authored Oct 29, 2005

hugetlbfs_do_delete_inode is the same as generic_delete_inode now, so remove
it in favour of the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6b09b9df

[PATCH] hugetlbfs: clean up hugetlbfs_delete_inode · 149f4211

Christoph Hellwig authored Oct 29, 2005

Make hugetlbfs looks the same as generic_detelte_inode, fixing a bunch of
missing updates to it at the same time.  Rename it to
hugetlbfs_do_delete_inode and add a real hugetlbfs_delete_inode that
implements ->delete_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

149f4211

[PATCH] hugetlbfs: move free_inodes accounting · 96527980

Christoph Hellwig authored Oct 29, 2005

Move hugetlbfs accounting into ->alloc_inode / ->destroy_inode.  This keeps
the code simpler, fixes a loeak where a failing inode allocation wouldn't
decrement the counter and moves hugetlbfs_delete_inode and
hugetlbfs_forget_inode closer to their generic counterparts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

96527980

[PATCH] mm: update comments to pte lock · b8072f09

Hugh Dickins authored Oct 29, 2005

Updated several references to page_table_lock in common code comments.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b8072f09

[PATCH] mm: fix rss and mmlist locking · f412ac08

Hugh Dickins authored Oct 29, 2005

A couple of oddities were guarded by page_table_lock, no longer properly
guarded when that is split.

The mm_counters of file_rss and anon_rss: make those an atomic_t, or an
atomic64_t if the architecture supports it, in such a case.  Definitions by
courtesy of Christoph Lameter: who spent considerable effort on more scalable
ways of counting, but found insufficient benefit in practice.

And adding an mm with swap to the mmlist for swapoff: the list is well-
guarded by its own lock, but the list_empty check now has to be repeated
inside it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f412ac08

[PATCH] mm: split page table lock · 4c21e2f2

Hugh Dickins authored Oct 29, 2005

Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4c21e2f2

[PATCH] mm: uml kill unused · b38c6845

Hugh Dickins authored Oct 29, 2005

In worrying over the various pte operations in different architectures, I came
across some unused functions in UML: remove mprotect_kernel_vm,
protect_vm_page and addr_pte.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b38c6845

[PATCH] mm: uml pte atomicity · 8f5cd76c

Hugh Dickins authored Oct 29, 2005

There's usually a good reason when a pte is examined without the lock; but it
makes me nervous when the pointer is dereferenced more than once.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8f5cd76c

[PATCH] mm: cris v32 mmu_context_lock · a7e4705b

Hugh Dickins authored Oct 29, 2005

The cris v32 switch_mm guards get_mmu_context with next->page_table_lock: good
it's not really SMP yet, since get_mmu_context messes with global variables
affecting other mms.  Replace by global mmu_context_lock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a7e4705b

[PATCH] mm: parisc pte atomicity · 92dc6fcc

Hugh Dickins authored Oct 29, 2005

There's a worrying function translation_exists in parisc cacheflush.h,
unaffected by split ptlock since flush_dcache_page is using it on some other
mm, without any relevant lock. Oh well, make it a slightly more robust by
factoring the pfn check within it. And it looked liable to confuse a
camouflaged swap or file entry with a good pte: fix that too.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

92dc6fcc