Commits · 4a81c9aaf48367e0c90bf39083875bd99e4128e3 · nexedi / linux

25 May, 2003 40 commits

Andrew Morton authored May 25, 2003

- Add an explanation for clearing the focus bit on P4 (zwane)

- __d_path kerneldoc fix (John Levon)

- generic-hdlc documentation fix (Krzysztof Halasa <khc@pm.waw.pl>)

- cmdline_read_proc cleanup (Oleg Drokin)

- remove a couple of unused vars from drivers/ide/pci/hpt366.c

- sound/core/sgbuf.c needs mm.h at least on alpha, for mem_map and other
  page stuff.  (Ivan Kokshaysky <ink@jurassic.park.msu.ru>)

- Don't use "u32 long" in cs46xx.c (Kevin Puetz <puetzk@puetzk.org>)

- fs/nfs/nfs4xdr.c warning fix: all the `goto out;' statements are
  commented away, so comment away the label too.

- net/ipv6/af_inet6.c: remove unused var

- drivers/media/video/bttv-cards.c: jiffies are unsigned long

- drivers/media/video/saa7134/saa7134-cards.c: unused var

- Fix Documentation/Changes comment wrt sparc compiler version

- drivers/pnp/quirks.c needs slab.h for kfree().  (Daniele Bellucci
  <bellucda@tiscali.it>)

4a81c9aa

[PATCH] extend-check_valid_hugepage_range.patch · 76e5699d

Andrew Morton authored May 25, 2003

From: David Gibson <david@gibson.dropbear.id.au>


Renames check_valid_hugepage_range() to is_hugepage_only_range(), which makes
more sense.

76e5699d

[PATCH] add notify_count for de_thread · 73accc3d

Andrew Morton authored May 25, 2003

From: Manfred Spraul <manfred@colorfullife.com>

de_thread is called by exec to kill all threads in the thread group except
the threads required for exec.

The waiting is implemented by waiting for a wakeup from __exit_signal: if
the reference count is less than or equal to 2, then the waiter is woken up.  If
exec is called by a non-leader thread, then two threads are required for
exec.

But if a thread group leader calls exec, then only one thread is required
for exec.  Thus the hardcoded "2" leads to a superfluous wakeup.  The patch
fixes that by adding a "notify_count" field to the signal structure.
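
The wakeup rule described above can be modelled in user space.  This is a
minimal sketch, not the kernel code: `struct signal_model`, `begin_exec()`
and the two `should_wake_*()` helpers are hypothetical names, and only the
`notify_count` field name comes from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's signal structure. */
struct signal_model {
    int count;        /* references held on the thread group */
    int notify_count; /* wake the exec'ing thread at or below this; 0 = no waiter */
};

/* exec side: record how many threads exec itself needs to keep alive. */
static void begin_exec(struct signal_model *sig, bool caller_is_leader)
{
    assert(sig != NULL);
    sig->notify_count = caller_is_leader ? 1 : 2;
}

/* Exit side, old rule: the threshold is hardcoded to 2. */
static bool should_wake_old(const struct signal_model *sig)
{
    assert(sig != NULL);
    return sig->count <= 2;
}

/* Exit side, new rule: compare against the per-exec threshold. */
static bool should_wake_new(const struct signal_model *sig)
{
    assert(sig != NULL);
    return sig->notify_count != 0 && sig->count <= sig->notify_count;
}
```

In this model, when the leader calls exec and the count drops to 2, the old
rule already reports a wakeup although one more thread still has to exit; the
new rule waits until the count reaches 1.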

73accc3d

[PATCH] net/sunrpc/sunrpc_syms.c typo fix · 9ee208ea

Andrew Morton authored May 25, 2003

From: Frank Cusack <fcusack@fcusack.com>

net/sunrpc/sunrpc_syms.c typo fix

9ee208ea

[PATCH] overcommit root margin · cf50f395

Andrew Morton authored May 25, 2003

From: Dave Hansen <haveblue@us.ibm.com>

This patch makes vm_enough_memory() more likely to return failure when
overcommit_memory==0 and !CAP_SYS_ADMIN.  I'm not sure it's worth having
another tunable just for this.

I also reworked the documentation a bit.  It should be a lot clearer to
read now.

cf50f395

[PATCH] devpts xattr handler for security labels · 4a3fbc84

Andrew Morton authored May 25, 2003

From: Stephen Smalley <sds@epoch.ncsc.mil>

This patch against 2.5.69-bk adds an xattr handler for security labels
to devpts and corresponding hooks to the LSM API to support conversion
between xattr values and the security labels stored in the inode
security field by the security module.

This allows userspace to get and set the security labels on devpts
nodes, e.g. so that sshd can set the security label for the pty using
setxattr, just as sshd already sets the ownership using chown.

SELinux uses this support to protect the pty in accordance with the user
process' security label. The changes to the LSM API are general and
should be re-useable by xattr handlers in other pseudo filesystems to
support similar security labeling. The xattr handler for devpts
includes the same generic framework as in ext[23], so handlers for other
kinds of attributes can be added easily in the future.

4a3fbc84

[PATCH] CONFIG_EPOLL · fb39f360

Andrew Morton authored May 25, 2003

From: Christopher Hoover <ch@murgatroid.com>

Here's a patch to drop some more text/data/bss out of 2.5.  This time
the ``victim'' is eventpollfs (epoll).

fb39f360

[PATCH] CONFIG_FUTEX · e8c0de6e

Andrew Morton authored May 25, 2003

From: Christopher Hoover <ch@murgatroid.com>

Not everyone needs futex support, so it should be optional.  This is needed
for small platforms.

e8c0de6e

[PATCH] /proc/pid inode security labels · 20378c29

Andrew Morton authored May 25, 2003

From: Stephen Smalley <sds@epoch.ncsc.mil>

This patch against 2.5.69-bk adds a hook to proc_pid_make_inode to allow
security modules to set the security attributes on /proc/pid inodes based on
the security attributes of the associated task. This is required by SELinux
in order to control access to the process state accessible via /proc/pid
inodes in accordance with the task's security label.

An alternative approach that was considered was to implement an xattr handler
for /proc/pid inodes. That approach would still require a hook call from the
xattr handler to the security module to obtain an xattr value based on the
task security attributes, so it would add a further level of
indirection/translation. The only benefit of implementing an xattr handler
for the /proc/pid inodes would be that the /proc/pid inode security labels
could then be exported to userspace. However, the /proc/pid inode security
labels are only used internally by the security module for access control
purposes, and userspace access to the full range of process attributes is
already provided via the /proc/pid/attr interface. Consequently, a simple
hook in proc_pid_make_inode seemed preferable.

20378c29

[PATCH] Process Attribute API for Security Modules (fixlet) · 09d35c2a

Andrew Morton authored May 25, 2003

From: Stephen Smalley <sds@epoch.ncsc.mil>

This patch, relative to the /proc/pid/attr patch against 2.5.69, fixes the
mode values of the /proc/pid/attr nodes to avoid interference by the normal
Linux access checks for these nodes (and also fixes the /proc/pid/attr/prev
mode to reflect its read-only nature).

Otherwise, when the dumpable flag is cleared by a set[ug]id or unreadable
executable, a process will lose the ability to set its own attributes via
writes to /proc/pid/attr due to a DAC failure (/proc/pid inodes are
assigned the root uid/gid if the task is not dumpable, and the original
mode only permitted the owner to write).

The security module should implement appropriate permission checking in its
[gs]etprocattr hook functions.  In the case of SELinux, the setprocattr
hook function only allows a process to write to its own /proc/pid/attr
nodes as well as imposing other policy-based restrictions, and the
getprocattr hook function performs a permission check between the security
labels of the current process and target process to determine whether the
operation is permitted.

09d35c2a

[PATCH] Process Attribute API for Security Modules · ea7870c8

Andrew Morton authored May 25, 2003

From: Stephen Smalley <sds@epoch.ncsc.mil>

This updated patch against 2.5.69 merges the readdir and lookup routines
for proc_base and proc_attr, fixes the copy_to_user call in proc_attr_read
and proc_info_read, moves the new data and code within CONFIG_SECURITY, and
uses ARRAY_SIZE, per the comments from Al Viro and Andrew Morton. As
before, this patch implements a process attribute API for security modules
via a set of nodes in a /proc/pid/attr directory. Credit for the idea of
implementing this API via /proc/pid/attr nodes goes to Al Viro. Jan Harkes
provided a nice cleanup of the implementation to reduce the code bloat.

ea7870c8

[PATCH] mark shrinkable slabs as being reclaimable · 6f333c22

Andrew Morton authored May 25, 2003

All slabs which can be reclaimed via VM pressure are marked as being
shrinkable, so the core slab code will keep count of their pages.

Except for the one in XFS. It has strange wrapper stuff.

6f333c22

[PATCH] slab: account for reclaimable caches · 8f542f30

Andrew Morton authored May 25, 2003

We have a problem at present in vm_enough_memory(): it uses smoke-n-mirrors
to try to work out how much memory can be reclaimed from dcache and icache.
It sometimes gets it quite wrong, especially if the slab has internal
fragmentation.  And it often does.

So here we take a new approach.  Rather than trying to work out how many
pages are reclaimable by counting up the number of inodes and dentries, we
change the slab allocator to keep count of how many pages are currently used
by slabs which can be shrunk by the VM.

The creator of the slab marks the slab as being reclaimable at
kmem_cache_create()-time.  Slab keeps a global counter of pages which are
currently in use by thus-tagged slabs.

Of course, we now slightly overestimate the amount of reclaimable memory,
because not _all_ of the icache, dcache, mbcache and quota caches are
reclaimable.

But I think it's better to be a bit permissive rather than bogusly failing
brk() calls as we do at present.
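
The accounting scheme described above can be sketched as a small user-space
model.  This is an illustration under stated assumptions, not the slab
allocator's code: `struct cache_model`, `cache_create()`, `cache_add_pages()`,
`cache_free_pages()` and the `reclaimable_pages` counter are all hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Global count of pages currently used by caches tagged as reclaimable. */
static long reclaimable_pages;

/* Hypothetical, simplified cache descriptor. */
struct cache_model {
    bool reclaimable; /* decided once, when the cache is created */
    long pages;       /* pages currently backing this cache */
};

/* The creator tags the cache at creation time. */
static struct cache_model cache_create(bool reclaimable)
{
    struct cache_model c = { .reclaimable = reclaimable, .pages = 0 };
    return c;
}

/* Growing a tagged cache also grows the global counter. */
static void cache_add_pages(struct cache_model *c, long n)
{
    assert(c != NULL && n >= 0);
    c->pages += n;
    if (c->reclaimable)
        reclaimable_pages += n;
}

/* Shrinking a tagged cache also shrinks the global counter. */
static void cache_free_pages(struct cache_model *c, long n)
{
    assert(c != NULL && n >= 0 && n <= c->pages);
    c->pages -= n;
    if (c->reclaimable)
        reclaimable_pages -= n;
}
```

A memory-availability check can then read the single counter instead of
estimating from object counts, at the cost of the overestimate noted above.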

8f542f30

[PATCH] Don't remove inode from hash until filesystem has · d6686d54

Andrew Morton authored May 25, 2003

From: Neil Brown <neilb@cse.unsw.edu.au>

When an NFS request arrives, it contains a filehandle which needs to be
converted to a dentry.  Many filesystems use find_exported_dentry in
fs/exportfs/expfs.c.  A key part of this on filesystems where a 32-bit inode
number uniquely locates a file is export_iget, which calls iget(sb, inum).

iget will either:

   1/ find the inode in the inode cache and return it

 or

   2/ create a new inode and call ->read_inode to load it from the
      storage device.

export_iget then verifies the inode is really a good inode (->read_inode
didn't detect any problems) and the right inode (based on the generation number
from the file handle).

For this to work reliably, it is important that whenever an inode is *not* in
the cache, the on-device version is up-to-date.  Otherwise, when read_inode
loads the inode it will get bad data.

For a file that has not been deleted, this condition always holds: a dirty
inode is always flushed to disc before the inode is unhashed.

However for a file that is being deleted this condition doesn't (didn't)
hold.  When iput -> iput_final -> generic_drop_inode -> generic_delete_inode
is called we would unhash the inode before calling into the filesystem through
->delete_inode.

So there is a small window between when generic_delete_inode unhashes the
inode, and when ->delete_inode writes something to disc, where a call to
->read_inode (for export_iget) might discover what it thinks is a valid
inode, but is really one that is in the process of being destroyed.

It is this window that I want to close by moving the unhashing to the end of
generic_delete_inode.
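
The window can be illustrated with a single-threaded model that inspects the
state between the two steps of a delete.  This is a sketch under stated
assumptions, not the VFS code: `struct inode_model`, `lookup()`, `unhash()`
and `fs_delete_inode()` are hypothetical stand-ins for the inode hash, iget,
and ->delete_inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lookup_result {
    LOOKUP_CACHED,          /* found in the cache; in-core state is authoritative */
    LOOKUP_STALE_FROM_DISK, /* not cached, but the on-disk copy still looks valid */
    LOOKUP_NOT_FOUND        /* not cached, and the on-disk copy is gone */
};

/* Hypothetical model of one inode that is being deleted. */
struct inode_model {
    bool hashed;                /* still present in the inode cache */
    bool disk_copy_looks_valid; /* ->delete_inode has not written to disk yet */
};

/* Models iget: use the cache if possible, otherwise read from disk. */
static enum lookup_result lookup(const struct inode_model *inode)
{
    assert(inode != NULL);
    if (inode->hashed)
        return LOOKUP_CACHED;
    return inode->disk_copy_looks_valid ? LOOKUP_STALE_FROM_DISK
                                        : LOOKUP_NOT_FOUND;
}

/* Models removing the inode from the hash. */
static void unhash(struct inode_model *inode)
{
    assert(inode != NULL);
    inode->hashed = false;
}

/* Models the filesystem's ->delete_inode updating the on-disk state. */
static void fs_delete_inode(struct inode_model *inode)
{
    assert(inode != NULL);
    inode->disk_copy_looks_valid = false;
}
```

Calling `unhash()` first and looking up before `fs_delete_inode()` yields
`LOOKUP_STALE_FROM_DISK`, which is the problematic case; calling
`fs_delete_inode()` first never exposes that state.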

d6686d54

[PATCH] Fix readdir error return value · 2eb4051e

Andrew Morton authored May 25, 2003

From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

There are a couple of places in the readdir code where it forgets to set
the returned error code to -EFAULT, leaving it at the default -EINVAL.

Fix that up, and rename getdents_callback64.count to "result", which makes
more sense.
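
The error-code distinction can be sketched in user space as follows.  This is
a simplified model, not the fs/readdir.c code: `struct getdents_model` and
`fill_entry()` are hypothetical, and the `copy_fails` parameter merely
simulates a failing copy to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified per-call context for a getdents-style callback. */
struct getdents_model {
    char *buf;    /* next free byte of the caller's buffer */
    size_t space; /* bytes still available in the buffer */
    int error;    /* error code reported back to the caller */
};

/* Append one name to the buffer; copy_fails simulates a faulting user copy. */
static int fill_entry(struct getdents_model *ctx, const char *name, int copy_fails)
{
    assert(ctx != NULL && name != NULL);
    size_t len = strlen(name) + 1;

    ctx->error = -EINVAL;      /* default: the entry does not fit */
    if (len > ctx->space)
        return -EINVAL;
    if (copy_fails) {
        ctx->error = -EFAULT;  /* a bad user address is a different failure */
        return -EFAULT;
    }
    memcpy(ctx->buf, name, len);
    ctx->buf += len;
    ctx->space -= len;
    ctx->error = 0;
    return 0;
}
```

The point of the model is that the fault path must overwrite the default
-EINVAL explicitly; if that assignment is forgotten, the caller sees the
wrong error.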

2eb4051e

[PATCH] xirc2ps_cs irq return fix · 61a6c177

Andrew Morton authored May 25, 2003

From zwane

We shutdown the MAC part of the card and have interrupts disabled, interrupt
gets queued, we reenable interrupts after shutting down device, service the
interrupt, check status and get 0xff from powered down device.

No idea what he's talking about here, but apparently the irq return handling
isn't working out.  Just return IRQ_HANDLED all the time.

61a6c177

[PATCH] reiserfs: inode attributes support. · 37c90629

Andrew Morton authored May 25, 2003

From: Oleg Drokin <green@namesys.com>

This is a forward port of 2.4's inode attributes support for reiserfs.
Original implementation for 2.4 was performed by Nikita Danilov.

In order to enable this support, one must use the "attrs" mount option, e.g.:

	mount /dev/hda1 /mount/point -t reiserfs -o attrs

Also either the filesystem must have been created with a recent mkreiserfs
or must have been modified by a recent version of reiserfsck with its
"--clean-attributes" option.

If that is not done, attributes support will not be enabled and a kernel
message will be printed.  This is necessary because old kernels left random
garbage in the place where these attributes now live.

These attributes are totally compatible with ext2's ones.  You can
manipulate them with chattr/lsattr etc.

Additionally the chattr 'd' option may be used to disable tail packing on a
specific file or a directory tree.  (The 'd' option normally means "don't
dump".  reiserfs has overloaded it).

37c90629

[PATCH] APM does unsafe conditional set_cpus_allowed · 0c85cefd

Andrew Morton authored May 25, 2003

From: Zwane Mwaikambo <zwane@linuxpower.ca>

kapmd does a conditional check in order to decide whether to set the task's
cpu affinity mask.  This can change during runtime, therefore we
unconditionally set it.  There is an early exit in set_cpus_allowed if the
current processor is in the allowed mask anyway.

0c85cefd

[PATCH] Fix dcache_lock/tasklist_lock ranking bug · 055e188d

Andrew Morton authored May 25, 2003

__unhash_process acquires the dcache_lock while holding the
tasklist_lock for writing. This can deadlock. Additionally,
fs/proc/base.c incorrectly assumed that p->pid would be set to 0 during
release_task.

The patch fixes that by adding a new spinlock to the task structure and
fixing all references to (!p->pid).

The alternative to the new spinlock would be to hold dcache_lock around
__unhash_process.

- fs/proc/base.c assumed that p->pid is reset to 0 during exit.  This is
  not the case anymore.  I now look at the count of the pid structure for
  PIDTYPE_PID.

- de_thread now tested - as broken as it was before: open handles to
  /proc/<pid> are either stale or invalid after an exec of a nptl process,
  if the exec was called from a secondary thread.

- a few lock_kernels removed - that part of /proc doesn't need it.

- additional instances of 'if(current->pid)' replaced with pid_alive.

055e188d

[PATCH] arch/i386/kernel/mpparse.c warning fixes · 05cdeac3

Andrew Morton authored May 25, 2003

From: William Lee Irwin III <wli@holomorphy.com>

mpc_apicid is a u8, and MAX_APICS can be 256.

05cdeac3

[PATCH] siocdevprivate_ioctl warning fix · 2a52198b

Andrew Morton authored May 25, 2003

fs/compat.c: In function `compat_sys_ioctl':
fs/compat.c:324: warning: implicit declaration of function `siocdevprivate_ioctl'

2a52198b

[PATCH] tty_io warning fix · 396382dc

Andrew Morton authored May 25, 2003

Don't assume the size of dev_t: on ppc64 it is unsigned long and this
generates a printk warning.
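
One common way to print a type whose width varies between architectures is to
cast it to a fixed, wide-enough type that matches the format specifier.  The
sketch below shows that idiom in user space with `snprintf`; it is not taken
from the patch, and `format_dev()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a dev_t without assuming its width: cast to match "%lu". */
static int format_dev(char *out, size_t n, dev_t dev)
{
    assert(out != NULL);
    return snprintf(out, n, "%lu", (unsigned long)dev);
}
```

The cast keeps the format string and the argument type in agreement whatever
the underlying definition of dev_t is on a given platform.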

396382dc

[PATCH] ppc64: more warning fixes · 4a6e2172

Andrew Morton authored May 25, 2003

arch/ppc64/kernel/htab.c:105: warning: implicit declaration of function `pSeries_lpar_hpte_insert'
arch/ppc64/kernel/htab.c:109: warning: implicit declaration of function `pSeries_hpte_insert'

4a6e2172

[PATCH] ppc64: arch/ppc64/kernel/traps.c warning fixes · c5ef8de3

Andrew Morton authored May 25, 2003

Fix a printk warning

c5ef8de3

[PATCH] ppc64: nail warnings in arch/ppc64/kernel/setup.c · 83599e3c

Andrew Morton authored May 25, 2003

two printk warnings

83599e3c

[PATCH] ppc64: ioctl32 warning fix · e806a036

Andrew Morton authored May 25, 2003

warning: assignment makes pointer from integer without a cast

e806a036

[PATCH] ppc64: build fix · 9b2a6123

Andrew Morton authored May 25, 2003

It needs sched.h for `current'.

9b2a6123

[PATCH] ppc64: Unused variables in ppc64 prom.c · 48df450c

Andrew Morton authored May 25, 2003

From: David Gibson <david@gibson.dropbear.id.au>

This removes a bunch of unused variables in prom_init(), squashing the
associated warnings.

48df450c

[PATCH] ppc64: Squash warning in ppc64 xics.c · ea8b5b2e

Andrew Morton authored May 25, 2003

From: David Gibson <david@gibson.dropbear.id.au>

xics.c uses ppc64_boot_msg() without a prototype; this fixes it by including
<asm/machdep.h>.

ea8b5b2e

[PATCH] ppc64: do_signal32 warning fix · 1cb4f432

Andrew Morton authored May 25, 2003

do_signal32() is used before it is defined, this prototype squashes the
warning.

1cb4f432

[PATCH] ppc64: Squash implicit declaration warning in ppc64 · 62c2905c

Andrew Morton authored May 25, 2003

From: David Gibson <david@gibson.dropbear.id.au>

Squash implicit declaration warning in ppc64 align.c

62c2905c

[PATCH] ppc64: Squash warning in ppc64 addnote tool · e6670878

Andrew Morton authored May 25, 2003

From: David Gibson <david@gibson.dropbear.id.au>

addnote in arch/ppc64/boot (a userspace tool, not kernel code) uses exit()
without including stdlib.h.

e6670878

[PATCH] ppc64: PPC64 irq return fix · ffe8c05d

Andrew Morton authored May 25, 2003

PPC64 irq return fix

ffe8c05d

[PATCH] ppc64: Fix some PPC64 compile warnings · d69b7c27

Andrew Morton authored May 25, 2003

Fix some warnings in the ppc64 build.

Also declare a couple of AIO functions in aio.h rather than aio.c.  They are
needed for 32-bit emulation support.

d69b7c27

[PATCH] ppc64: 32/64bit emulation for aio · 2b748116

Andrew Morton authored May 25, 2003

From: Anton Blanchard <anton@samba.org>

PPC64 32/64-bit emulation for AIO.

2b748116

Make cdev infrastructure initialize early · 276df1b2

Linus Torvalds authored May 25, 2003

Very early initialization (core_initcall) needs to have the cdev
initialization done.  So make it part of the pre-initcall sequence, the
same way the bdev caches were done.

276df1b2

Fix compile warning from Al's chardev cleanups. · 48554ca4

Linus Torvalds authored May 24, 2003

48554ca4

[PATCH] support "requeueing" futexes · 7149345c

Ingo Molnar authored May 24, 2003

This addresses a futex related SMP scalability problem of
glibc. A number of regressions have been reported to the NPTL mailing list
when going to many CPUs, for applications that use condition variables and
the pthread_cond_broadcast() API call. Using this functionality, testcode
shows a slowdown from 0.12 seconds runtime to over 237 seconds (!)
runtime, on 4-CPU systems.

pthread condition variables use two futex-backed mutex-alike locks: an
internal one for the glibc CV state itself, and a user-supplied mutex
which the API guarantees to take in certain codepaths. (Unfortunately the
user-supplied mutex cannot be used to protect the CV state, so we've got
to deal with two locks.)

The cause of the slowdown is a 'swarm effect': if lots of threads are
blocked on a condition variable, and pthread_cond_broadcast() is done,
then glibc first does a FUTEX_WAKE on the cv-internal mutex, then does a
mutex_down() on the user-supplied mutex. I.e. a swarm of threads is created
which all race to serialize on the user-supplied mutex. The more threads
are used, the more likely it becomes that the scheduler will balance them
over to other CPUs - where they just schedule, try to lock the mutex, and
go to sleep. This 'swarm effect' is purely technical, a side-effect of
glibc's use of futexes, and the imperfect coupling of the two locks.

The solution to this problem is to not wake up the swarm of threads, but
'requeue' them from the CV-internal mutex to the user-supplied mutex. The
attached patch adds the FUTEX_REQUEUE feature: FUTEX_REQUEUE requeues N
threads from futex address A to futex address B.

This way glibc can wake up a single thread (which will take the
user-mutex), and can requeue the rest, with a single system-call.
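
From user space, the operation can be reached through the raw futex system
call.  The following is a minimal sketch, assuming a Linux system whose
headers define `SYS_futex` and `FUTEX_REQUEUE`; the wrapper name
`futex_requeue()` is hypothetical and this is not glibc's implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Wake up to nr_wake threads waiting on `from` and move up to nr_requeue
 * of the remaining waiters onto `to`.  Returns the number of threads
 * woken, or -1 with errno set.
 */
static long futex_requeue(int *from, int *to, int nr_wake, int nr_requeue)
{
    assert(from != NULL && to != NULL);
    /* The requeue limit travels in the argument slot otherwise used for a timeout. */
    return syscall(SYS_futex, from, FUTEX_REQUEUE, nr_wake,
                   (unsigned long)nr_requeue, to, 0);
}
```

A broadcast in the style described above would call
`futex_requeue(&cv_futex, &mutex_futex, 1, INT_MAX)`, where both variable
names are placeholders: one waiter is woken and the rest are moved without
running.  With no waiters present the call simply returns 0.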

Ulrich Drepper has implemented FUTEX_REQUEUE support in glibc, and a
number of people have tested it over the past couple of weeks. Here are
the measurements done by Saurabh Desai:

System: 4xPIII 700MHz

 ./cond-perf -r 100 -n 200:        1p       2p         4p
 Default NPTL:                 0.120s   0.211s   237.407s
 requeue NPTL:                 0.124s   0.156s     0.040s

 ./cond-perf -r 1000 -n 100:
 Default NPTL:                 0.276s   0.412s     0.530s
 requeue NPTL:                 0.349s   0.503s     0.550s

 ./pp -v -n 128 -i 1000 -S 32768:
 Default NPTL: 128 games in    1.111s   1.270s    16.894s
 requeue NPTL: 128 games in    1.111s   1.959s     2.426s

 ./pp -v -n 1024 -i 10 -S 32768:
 Default NPTL: 1024 games in   0.181s   0.394s     incompleted 2m+
 requeue NPTL: 1024 games in   0.166s   0.254s     0.341s

The speedup with an increasing number of threads is quite significant; in the
128-thread case it's more than 8 times. In the cond-perf test, on 4 CPUs
it's almost infinitely faster than the 'swarm of threads' catastrophe
triggered by the old code.

7149345c

[PATCH] i_cdev/i_cindex · 9bda5f68

Alexander Viro authored May 24, 2003

New fields in struct inode - i_cdev and i_cindex. When we do open() on
a character device we cache result of cdev lookup in inode and put the
inode on a cyclic list anchored in cdev. If we already have that done,
we don't bother with any lookups. When inode disappears it's removed
from the list. When cdev gets unregistered we remove all cached
references to it (and remove such inodes from the list). cdev is held
until final fput() now.

9bda5f68

[PATCH] cdev-cidr, part 1 · 787d458a

Alexander Viro authored May 24, 2003

New object: struct cdev.  It contains a kobject, a pointer to
file_operations and a pointer to owner module.  These guys have a search
structure of the same sort as gendisks and chrdev_open() picks
file_operations from them.

Intended use: embed such animal in driver-owned structure (e.g.
tty_driver) and register it as associated with given range of device
numbers.  Generic code will do lookup for such object and use it for the
rest.

The behaviour of register_chrdev() is _not_ changed - it allocates
struct cdev and registers it; any old driver will work as if nothing had
changed.

At this stage we only use it during chrdev_open() to find
file_operations.  Later it will be cached in inode->i_cdev (and index in
range - in inode->i_cindex) so that ->open() could get whatever objects
it wants directly without any special-cased lookups, etc.
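
The range-based lookup described above can be sketched as a small user-space
model.  It is an illustration only, not the kernel's data structure: the
`*_model` types and functions are hypothetical, a flat array stands in for the
real search structure, and the kobject and owner-module fields are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations. */
struct file_ops_model {
    const char *name;
};

/* Hypothetical stand-in for struct cdev: a device-number range plus its ops. */
struct cdev_model {
    unsigned int first; /* first device number covered */
    unsigned int count; /* how many consecutive numbers are covered */
    const struct file_ops_model *ops;
};

#define MAX_CDEVS 8
static const struct cdev_model *registry[MAX_CDEVS];
static size_t registered;

/* Register a driver-owned object for its range; returns 0, or -1 if full. */
static int cdev_register_model(const struct cdev_model *cdev)
{
    assert(cdev != NULL && cdev->count > 0);
    if (registered == MAX_CDEVS)
        return -1;
    registry[registered++] = cdev;
    return 0;
}

/* Find the object covering `dev`; *index receives the offset within its range. */
static const struct cdev_model *cdev_lookup_model(unsigned int dev,
                                                  unsigned int *index)
{
    assert(index != NULL);
    for (size_t i = 0; i < registered; i++) {
        const struct cdev_model *c = registry[i];
        if (dev >= c->first && dev - c->first < c->count) {
            *index = dev - c->first;
            return c;
        }
    }
    return NULL;
}
```

The returned pointer and index play the roles that the text assigns to
inode->i_cdev and inode->i_cindex once the lookup result is cached.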

787d458a