Commits · 520c85346666d4d9a6fcaaa8450542302dc28b91 · Kirill Smelkov / linux

06 Jan, 2009 7 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 520c8534

Linus Torvalds authored Jan 05, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition

520c8534

mm lockless pagecache barrier fix · e8c82c2e

Nick Piggin authored Jan 06, 2009

An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.

This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.

Linus points out that ACCESS_ONCE is also required in that pointer load, even
if its absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
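
A minimal user-space sketch of the retry loop described above, offered as an illustration rather than the kernel code. The names demo_page, DEMO_ACCESS_ONCE, demo_get_unless_zero and demo_lookup_slot are hypothetical, the counter is not atomic, and max_tries only exists so the demo terminates. The point is that the slot is reloaded through a volatile access on every retry instead of reusing a cached pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ACCESS_ONCE: forces a fresh load of x. */
#define DEMO_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct demo_page {
    int count;                  /* 0 means the page has been freed */
};

/* "Increment reference count unless it was zero" (not atomic here). */
static int demo_get_unless_zero(struct demo_page *page)
{
    if (page->count == 0)
        return 0;
    page->count++;
    return 1;
}

/* Look up the page in *slot, reloading the slot on every retry. */
static struct demo_page *demo_lookup_slot(struct demo_page **slot,
                                          int max_tries)
{
    assert(slot != NULL);
    while (max_tries-- > 0) {
        struct demo_page *page = DEMO_ACCESS_ONCE(*slot);

        if (page == NULL)
            return NULL;        /* nothing in the slot any more */
        if (!demo_get_unless_zero(page))
            continue;           /* freed page: go back and reload */
        if (page != DEMO_ACCESS_ONCE(*slot)) {
            page->count--;      /* slot changed under us: drop and retry */
            continue;
        }
        return page;
    }
    return NULL;                /* demo-only bound reached */
}
```

Without the forced reload, a compiler is free to keep the first loaded pointer in a register, which is the "before" loop shown in the assembly.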

Assembly of find_get_pages,
before:
.L220:
        movq    (%rbx), %rax    #* ivtmp.1162, tmp82
        movq    (%rax), %rdi    #, prephitmp.1149
.L218:
        testb   $1, %dil        #, prephitmp.1149
        jne     .L217   #,
        testq   %rdi, %rdi      # prephitmp.1149
        je      .L203   #,
        cmpq    $-1, %rdi       #, prephitmp.1149
        je      .L217   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L218   #,

after:
.L212:
        movq    (%rbx), %rax    #* ivtmp.1109, tmp81
        movq    (%rax), %rdi    #, ret
        testb   $1, %dil        #, ret
        jne     .L211   #,
        testq   %rdi, %rdi      # ret
        je      .L197   #,
        cmpq    $-1, %rdi       #, ret
        je      .L211   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L212   #,

(notice the obvious infinite loop in the first example, if page->count remains 0)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e8c82c2e

i2o: Update my address · f1b11e50

Alan Cox authored Jan 05, 2009

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f1b11e50

mm: update my address · 046c6884

Alan Cox authored Jan 05, 2009

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

046c6884

X86_DEBUGCTLMSR won't work on uml · 5641f1fd

Al Viro authored Jan 05, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5641f1fd

uml got broken by commit · 7483cb7b

Al Viro authored Jan 05, 2009

... if you revert a commit, revert the fixups elsewhere that had been
triggered by it.  Such as 8c56250f
(lockdep, UML: fix compilation when CONFIG_TRACE_IRQFLAGS_SUPPORT is not set).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7483cb7b

get rid of the last symlink in uml build · 22409f9c

Al Viro authored Jan 05, 2009

We need to make asm-offsets.h contents visible for objects built
with userland headers.  Instead of creating a symlink, just have the
file with equivalent include (relative to location of header) created
once.  That kills the last symlink used in arch/um builds.

Additionally, both generated headers can become dependencies of
archprepare now, killing the misuse of prepare.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22409f9c

05 Jan, 2009 14 commits

inotify: fix type errors in interfaces · 4ae8978c

Michael Kerrisk authored Jan 05, 2009

The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.

For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
currently 'u32'; it should be '__s32'.  That is Robert's suggestion, and
is consistent with the other declarations of watch descriptors in the
kernel source, in particular, the inotify_event structure in
include/linux/inotify.h:

struct inotify_event {
        __s32           wd;             /* watch descriptor */
        __u32           mask;           /* watch mask */
        __u32           cookie;         /* cookie to synchronize two events */
        __u32           len;            /* length (including nulls) of name */
        char            name[0];        /* stub for possible name */
};

The patch makes the changes needed for inotify_rm_watch().
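
As a small illustration of why the signedness matters (a sketch only; demo_wd_seen_as_u32 and demo_wd_seen_as_s32 are hypothetical helpers, not kernel functions): a negative watch descriptor passed through an unsigned 32-bit parameter turns into a large positive number, while the signed type preserves it.

```c
#include <assert.h>
#include <stdint.h>

/* Both types have the same width; only the interpretation differs. */
static_assert(sizeof(int32_t) == sizeof(uint32_t), "same width expected");

/* Value the callee would observe if 'wd' were declared as u32. */
static int64_t demo_wd_seen_as_u32(int wd)
{
    return (uint32_t)wd;        /* conversion is modulo 2^32 */
}

/* Value the callee would observe if 'wd' were declared as __s32. */
static int64_t demo_wd_seen_as_s32(int wd)
{
    return (int32_t)wd;
}
```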
Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Robert Love <rlove@google.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4ae8978c

fix breakage in reiserfs_new_inode() · 2f1169e2

Al Viro authored Jan 02, 2009

now that we use ih.key earlier, we need to do all its setup early enough
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2f1169e2

fix the treatment of jfs special inodes · 5b45d96b

Al Viro authored Dec 29, 2008

We used to put them on a single list, without any locking.  Racy.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5b45d96b

vfs: remove duplicate code in get_fs_type() · d8e9650d

Li Zefan authored Dec 25, 2008

save 14 bytes:

   text    data     bss     dec     hex filename
   1354      32       4    1390     56e fs/filesystems.o.before
   text    data     bss     dec     hex filename
   1340      32       4    1376     560 fs/filesystems.o
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d8e9650d

add a vfs_fsync helper · 4c728ef5

Christoph Hellwig authored Dec 22, 2008

Fsync currently has a fdatawrite/fdatawait pair around the method call,
and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
to duplicate this, but we have a few and most of them don't quite get
it right.  This patch adds a new vfs_fsync that takes care of this.
It's a little more complicated than usual as ->fsync might get a NULL file
pointer and just a dentry from nfsd, but otherwise gets a file and we
want to take the mapping and file operations from it when it is there.

Notes on the fsync callers:

 - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
   lower file
 - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
   file, and was returning 0 when ->fsync was missing
 - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
   taking i_mutex.  Now given that shared memory doesn't have disk
   backing, not doing anything in fsync seems fine and I left it out of
   the vfs_fsync conversion for now, but in that case we might just
   not pass it through to the lower file at all but just call the no-op
   simple_sync_file directly.
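
The call ordering that every fsync caller had to duplicate can be sketched in user space as follows. This is a mock, not the helper added by the patch: demo_file, demo_fsync and the single-letter trace are hypothetical, and returning -EINVAL for a missing method is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static char demo_trace[16];
static size_t demo_trace_len;

/* Record one step so the call ordering can be inspected. */
static void demo_step(char c)
{
    if (demo_trace_len + 1 < sizeof demo_trace) {
        demo_trace[demo_trace_len++] = c;
        demo_trace[demo_trace_len] = '\0';
    }
}

static void demo_trace_reset(void)
{
    demo_trace_len = 0;
    demo_trace[0] = '\0';
}

struct demo_file {
    int (*fsync)(struct demo_file *f);  /* may be NULL */
};

static int demo_method(struct demo_file *f) { (void)f; demo_step('F'); return 0; }

static int demo_fsync(struct demo_file *f)
{
    int ret, err;

    assert(f != NULL);
    if (!f->fsync)
        return -EINVAL;     /* assumption: report it, don't return 0 */

    demo_step('W');         /* stands in for filemap_fdatawrite */
    ret = 0;
    demo_step('L');         /* stands in for taking the inode mutex */
    err = f->fsync(f);
    if (!ret)
        ret = err;
    demo_step('U');         /* stands in for dropping the inode mutex */
    demo_step('A');         /* stands in for filemap_fdatawait */
    return ret;
}
```

Keeping this sequence in one helper is what lets callers such as ecryptfs or coda stop open-coding it.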

[and now actually export vfs_fsync]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4c728ef5

sys_execve and sys_uselib do not call into fsnotify · 6110e3ab

Eric Paris authored Dec 17, 2008

sys_execve and sys_uselib do not call into fsnotify so inotify does not get
open events for these types of syscalls.  This patch simply makes the
requisite fsnotify calls.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6110e3ab

zero i_uid/i_gid on inode allocation · 56ff5efa

Al Viro authored Dec 09, 2008

... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56ff5efa

inode->i_op is never NULL · acfa4380

Al Viro authored Dec 04, 2008

We used to have a rather schizophrenic set of checks for NULL ->i_op even
though it had been eliminated years ago. You'd need to go out of your
way to set it to NULL explicitly _and_ a bunch of code would die on
such inodes anyway. After killing two remaining places that still
did that bogosity, all that crap can go away.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

acfa4380

ntfs: don't NULL i_op · 9742df33

Al Viro authored Dec 04, 2008

it's already set to empty table (and no, ntfs doesn't have any explicit
checks for NULL ->i_op or NULL ->i_fop)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9742df33

isofs check for NULL ->i_op in root directory is dead code · 261964c6

Al Viro authored Dec 04, 2008

for one thing it never happens, for another we check that inode
is a directory right after that place anyway (and we'd already
checked that reading it from disk has not failed).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

261964c6

affs: do not zero ->i_op · c765d479

Al Viro authored Dec 04, 2008

it is already set to empty table and should never be NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c765d479

kill suid bit only for regular files · 7f5ff766

Dmitri Monakhov authored Dec 01, 2008

We don't have to do it because it is useless for non-regular files.
In fact, a block device may trigger this path without dentry->d_inode->i_mutex.

(akpm: concerns were expressed (by me) about S_ISDIR inodes)
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7f5ff766

vfs: lseek(fd, 0, SEEK_CUR) race condition · 5b6f1eb9

Alain Knaff authored Nov 10, 2008

This patch fixes a race condition in lseek. While it is expected that
unpredictable behaviour may result while repositioning the offset of a
file descriptor concurrently with reading/writing to the same file
descriptor, this should not happen when merely *reading* the file
descriptor's offset.

Unfortunately, the only portable way in Unix to read a file
descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
concurrently with read/write may mess up the position.
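
One way to avoid the race, sketched in user space under stated assumptions (demo_file and demo_llseek are hypothetical, and this is not necessarily the exact change made by the patch): treat lseek(fd, 0, SEEK_CUR) as a pure query and return the current position without writing it back, so a concurrent read or write cannot have its update overwritten.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR */

struct demo_file {
    long long pos;
    int pos_writes;     /* counts stores to pos, for the demo only */
};

static long long demo_llseek(struct demo_file *f, long long offset, int whence)
{
    assert(f != NULL);
    switch (whence) {
    case SEEK_SET:
        break;
    case SEEK_CUR:
        /* Position query: never store the "same" value back. */
        if (offset == 0)
            return f->pos;
        offset += f->pos;
        break;
    default:
        return -EINVAL;
    }
    if (offset < 0)
        return -EINVAL;
    if (offset != f->pos) {
        f->pos = offset;
        f->pos_writes++;
    }
    return offset;
}
```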

[with fixes from akpm]
Signed-off-by: Alain Knaff <alain@knaff.lu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5b6f1eb9

Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current · fe0bdec6

Linus Torvalds authored Jan 04, 2009

* 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: validate comparison operations, store them in sane form
  clean up audit_rule_{add,del} a bit
  make sure that filterkey of task,always rules is reported
  audit rules ordering, part 2
  fixing audit rule ordering mess, part 1
  audit_update_lsm_rules() misses the audit_inode_hash[] ones
  sanitize audit_log_capset()
  sanitize audit_fd_pair()
  sanitize audit_mq_open()
  sanitize AUDIT_MQ_SENDRECV
  sanitize audit_mq_notify()
  sanitize audit_mq_getsetattr()
  sanitize audit_ipc_set_perm()
  sanitize audit_ipc_obj()
  sanitize audit_socketcall
  don't reallocate buffer in every audit_sockaddr()

fe0bdec6

04 Jan, 2009 19 commits

rtc: add alarm/update irq interfaces · 099e6576

Alessandro Zummo authored Jan 04, 2009

Add standard interfaces for enabling alarm/update irqs.  Drivers are no
longer required to implement equivalent ioctl code as rtc-dev will provide
it.

UIE emulation should now be handled correctly and will work even for those
RTC drivers that cannot be configured to do both UIE and AIE.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

099e6576

fs: symlink write_begin allocation context fix · 54566b2c

Nick Piggin authored Jan 04, 2009

With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in its write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).
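
A rough sketch of how a single flag can replace a full gfp_t mask. The DEMO_ constants and demo_write_begin_gfp are hypothetical and their values are chosen only for illustration; the idea is that the flag clears the "may enter the filesystem" bit from whatever mask the mapping would otherwise use.

```c
#include <assert.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define DEMO_GFP_IO         0x1u
#define DEMO_GFP_FS         0x2u
#define DEMO_GFP_KERNEL     (DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_AOP_FLAG_NOFS  0x4u

/* Compute the allocation mask a write_begin implementation would use. */
static unsigned demo_write_begin_gfp(unsigned mapping_gfp, unsigned aop_flags)
{
    unsigned notmask = 0;
    unsigned gfp;

    if (aop_flags & DEMO_AOP_FLAG_NOFS)
        notmask = DEMO_GFP_FS;      /* forbid re-entering the fs */
    gfp = mapping_gfp & ~notmask;

    /* With the flag set, the FS bit must be gone. */
    assert(!(aop_flags & DEMO_AOP_FLAG_NOFS) || !(gfp & DEMO_GFP_FS));
    return gfp;
}
```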

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

54566b2c

viafb: fix crashes due to 4k stack overflow · e687d691

Bruno Prémont authored Jan 04, 2009

The function viafb_cursor() uses 2 stack-variables of CURSOR_SIZE bits;
CURSOR_SIZE is defined as (8 * 1024).  Using up 1k twice on the stack is too
much for a 4k stack (though it works with 8k stacks).  Make those two
variables kzalloc'ed to preserve stack space.

Also merge the whole lot of local structs in viafb_ioctl into a union so
the stack usage gets minimized here as well.  (The structs are only accessed
in their individual IOCTL case.)  This second part is only compile-tested as
I know of no userspace app using the IOCTLs.
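
Both techniques can be sketched in user space, with calloc standing in for kzalloc. Everything prefixed demo_ is hypothetical, and the struct sizes are made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CURSOR_SIZE   (8 * 1024)          /* bits, per the message above */
#define CURSOR_BYTES  (CURSOR_SIZE / 8)   /* 1k per buffer */

/* Heap-allocate the two 1k buffers instead of placing them on the stack. */
static int demo_cursor(unsigned char fg_fill, unsigned char *first_out)
{
    unsigned char *bg = calloc(1, CURSOR_BYTES);    /* zeroed, like kzalloc */
    unsigned char *fg = calloc(1, CURSOR_BYTES);
    int ret = -1;

    assert(first_out != NULL);
    if (bg && fg) {
        memset(fg, fg_fill, CURSOR_BYTES);
        *first_out = (unsigned char)(fg[0] | bg[0]);
        ret = 0;
    }
    free(fg);       /* free(NULL) is a no-op */
    free(bg);
    return ret;
}

/* Only one ioctl case runs per call, so its arguments can share storage. */
struct demo_info_a { char data[256]; };
struct demo_info_b { char data[512]; };
union demo_ioctl_arg {
    struct demo_info_a a;
    struct demo_info_b b;
};
```

The union is as large as its largest member, rather than the sum of all members.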
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: <JosephChan@via.com.tw>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e687d691

fs: introduce bgl_lock_ptr() · c644f0e4

Pekka Enberg authored Jan 04, 2009

As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
<linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
filesystem specific header files to break the hidden dependency to
struct ext[234]_sb_info.

Also, while at it, convert the macros to static inlines to try to make up
for all the times I broke Andrew Morton's tree.
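
The shape of such helpers might look like the following user-space sketch. The demo_ types are hypothetical, a plain struct stands in for a spinlock, and hashing the block group with a power-of-two mask is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_BG_LOCKS 4      /* must be a power of two for the mask */

struct demo_lock { int locked; };

struct demo_blockgroup_lock {
    struct demo_lock locks[DEMO_NR_BG_LOCKS];
};

/* Generic helper: knows nothing about any filesystem's superblock info. */
static inline struct demo_lock *
demo_bgl_lock_ptr(struct demo_blockgroup_lock *bgl, unsigned int block_group)
{
    assert(bgl != NULL);
    return &bgl->locks[block_group & (DEMO_NR_BG_LOCKS - 1)];
}

/* Filesystem-specific wrapper, living in that filesystem's own header. */
struct demo_sb_info {
    struct demo_blockgroup_lock *s_blockgroup_lock;
};

static inline struct demo_lock *
demo_sb_bgl_lock(struct demo_sb_info *sbi, unsigned int block_group)
{
    return demo_bgl_lock_ptr(sbi->s_blockgroup_lock, block_group);
}
```

Unlike a macro, the static inline versions type-check their arguments, and the generic header no longer needs to see the layout of any particular sb_info.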
Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c644f0e4

spi.h uses/needs device.h · 0a30c5ce

Randy Dunlap authored Jan 04, 2009

Include header files as used/needed:

  In file included from drivers/leds/leds-dac124s085.c:16:
  include/linux/spi/spi.h:66: error: field 'dev' has incomplete type
  include/linux/spi/spi.h: In function 'to_spi_device':
  include/linux/spi/spi.h:100: warning: type defaults to 'int' in declaration of '__mptr'
  ...
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0a30c5ce

vmalloc.c: fix flushing in vmap_page_range() · 2e4e27c7

Adam Lackorzynski authored Jan 04, 2009

The flush_cache_vmap in vmap_page_range() is called with the end of the
range twice.  The following patch fixes this for me.
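
A plausible shape of this kind of bug and its fix, sketched in user space (demo_flush and demo_map_range are hypothetical): the loop advances the address until it reaches the end, so the start has to be saved beforehand for the flush to cover the whole range.

```c
#include <assert.h>

static unsigned long demo_flushed_start, demo_flushed_end;

/* Stand-in for a cache flush over [start, end). */
static void demo_flush(unsigned long start, unsigned long end)
{
    demo_flushed_start = start;
    demo_flushed_end = end;
}

static void demo_map_range(unsigned long addr, unsigned long end,
                           unsigned long step)
{
    unsigned long start = addr;     /* remember where the range began */

    assert(step > 0 && addr < end);
    while (addr < end)
        addr += step;               /* mapping work elided; addr walks to end */

    /* Passing addr here would flush the empty range (end, end). */
    demo_flush(start, end);
}
```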
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2e4e27c7

cgroups: fix a race between cgroup_clone and umount · 7b574b7b

Li Zefan authored Jan 04, 2009

The race occurs when cgroup_clone() is called while the ns cgroup subsys is
being umounted: cgroup_clone() might access an invalid cgroup_fs, or kill_sb()
might be called after cgroup_clone() has created a new dir in it.

The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);

  ------------[ cut here ]------------
  kernel BUG at kernel/cgroup.c:1093!
  invalid opcode: 0000 [#1] SMP
  ...
  Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
  ...
  Call Trace:
   [<c0493df7>] ? deactivate_super+0x3f/0x51
   [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
   [<c04a3ab2>] ? sys_umount+0x265/0x2ac
   [<c04a3b06>] ? sys_oldumount+0xd/0xf
   [<c0403911>] ? sysenter_do_call+0x12/0x31
  ...
  EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
  ---[ end trace c766c1be3bf944ac ]---

Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b574b7b

audit: validate comparison operations, store them in sane form · 5af75d8d

Al Viro authored Dec 16, 2008

Don't store the field->op in the messy (and very inconvenient for e.g.
audit_comparator()) form; translate to dense set of values and do full
validation of userland-submitted value while we are at it.

->audit_init_rule() and ->audit_match_rule() get new values now; in-tree
instances updated.
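
The translation can be pictured roughly like this. It is a sketch: the DEMO_ bit patterns and the demo_op enum are illustrative assumptions, not a copy of the kernel definitions. A userland-submitted operator is looked up in a table, which both validates it and maps it to a small dense value that is convenient for a switch statement.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sparse encodings, as userland might submit them. */
#define DEMO_BIT_MASK      0x08000000u
#define DEMO_LESS_THAN     0x10000000u
#define DEMO_GREATER_THAN  0x20000000u
#define DEMO_NOT_EQUAL     0x30000000u
#define DEMO_EQUAL         0x40000000u

/* Dense internal form; Demo_bad doubles as the table size. */
enum demo_op {
    Demo_equal, Demo_not_equal, Demo_bitmask, Demo_bittest,
    Demo_lt, Demo_gt, Demo_le, Demo_ge, Demo_bad
};

static const unsigned demo_ops[Demo_bad] = {
    [Demo_equal]     = DEMO_EQUAL,
    [Demo_not_equal] = DEMO_NOT_EQUAL,
    [Demo_bitmask]   = DEMO_BIT_MASK,
    [Demo_bittest]   = DEMO_BIT_MASK | DEMO_EQUAL,
    [Demo_lt]        = DEMO_LESS_THAN,
    [Demo_gt]        = DEMO_GREATER_THAN,
    [Demo_le]        = DEMO_LESS_THAN | DEMO_EQUAL,
    [Demo_ge]        = DEMO_GREATER_THAN | DEMO_EQUAL,
};

/* Validate and translate; anything not in the table is rejected. */
static enum demo_op demo_to_op(unsigned op)
{
    size_t n;

    for (n = 0; n < Demo_bad; n++)
        if (demo_ops[n] == op)
            return (enum demo_op)n;
    return Demo_bad;
}
```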
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5af75d8d

clean up audit_rule_{add,del} a bit · 36c4f1b1

Al Viro authored Dec 15, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

36c4f1b1

make sure that filterkey of task,always rules is reported · e048e02c

Al Viro authored Dec 16, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e048e02c

audit rules ordering, part 2 · e45aa212

Al Viro authored Dec 15, 2008

Fix the actual rule listing; add per-type lists _not_ used for matching,
with all exit,... sitting on one such list.  Simplifies "do something
for all rules" logics, while we are at it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e45aa212

fixing audit rule ordering mess, part 1 · 0590b933

Al Viro authored Dec 14, 2008

Problem: ordering between the rules on exit chain is currently lost;
all watch and inode rules are listed after everything else _and_
exit,never on one kind doesn't stop exit,always on another from
being matched.

Solution: assign priorities to rules, keep track of the current
highest-priority matching rule and its result (always/never).
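
The priority scheme can be sketched as below. The demo_rule layout and demo_decide are hypothetical; the sketch assumes a larger prio value means higher priority and that priorities are unique. Because the winner is chosen by priority rather than by position, it no longer matters which list a rule sits on or in what order the lists are walked.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_NONE, DEMO_NEVER, DEMO_ALWAYS };

struct demo_rule {
    unsigned long long prio;    /* assumption: larger means higher priority */
    int matches;                /* did the rule match this event? */
    enum demo_state action;     /* result to apply if it wins */
};

/* Return the action of the highest-priority matching rule, if any. */
static enum demo_state demo_decide(const struct demo_rule *rules, size_t n)
{
    unsigned long long best_prio = 0;
    enum demo_state state = DEMO_NONE;
    size_t i;

    assert(rules != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!rules[i].matches)
            continue;
        if (state != DEMO_NONE && rules[i].prio <= best_prio)
            continue;           /* a higher-priority match already won */
        best_prio = rules[i].prio;
        state = rules[i].action;
    }
    return state;
}
```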
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0590b933

audit_update_lsm_rules() misses the audit_inode_hash[] ones · 1a9d0797

Al Viro authored Dec 14, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1a9d0797

sanitize audit_log_capset() · 57f71a0a

Al Viro authored Jan 04, 2009

* no allocations
* return void
* don't duplicate the check for dummy context
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

57f71a0a

sanitize audit_fd_pair() · 157cf649

Al Viro authored Dec 14, 2008

* no allocations
* return void
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

157cf649

sanitize audit_mq_open() · 564f6993

Al Viro authored Dec 14, 2008

* don't bother with allocations
* don't do double copy_from_user()
* don't duplicate parts of check for audit_dummy_context()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

564f6993

sanitize AUDIT_MQ_SENDRECV · c32c8af4

Al Viro authored Dec 14, 2008

* logging the original value of *msg_prio in mq_timedreceive(2)
  is insane - the argument is write-only (i.e. syscall always
  ignores the original value and only overwrites it).
* merge __audit_mq_timed{send,receive}
* don't do copy_from_user() twice
* don't mess with allocations in auditsc part
* ... and don't bother checking !audit_enabled and !context in there -
  we'd already checked for audit_dummy_context().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c32c8af4

sanitize audit_mq_notify() · 20114f71

Al Viro authored Dec 10, 2008

* don't copy_from_user() twice
* don't bother with allocations
* don't duplicate parts of audit_dummy_context()
* make it return void
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20114f71

sanitize audit_mq_getsetattr() · 7392906e

Al Viro authored Dec 10, 2008

* get rid of allocations
* make it return void
* don't duplicate parts of audit_dummy_context()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7392906e