Commits · c6fc81afa2d7ef2f775e48672693d8a0a8a7325d · Kirill Smelkov / linux

26 Apr, 2010 40 commits

sched: Fix a race between ttwu() and migrate_task() · c6fc81af

John Wright authored Apr 13, 2010

Based on commit e2912009 upstream, but
done differently as this issue is not present in .33 or .34 kernels due
to rework in this area.

If a task is in the TASK_WAITING state, then try_to_wake_up() is working
on it, and it will place it on the correct cpu.

This commit ensures that neither migrate_task() nor __migrate_task()
calls set_task_cpu(p) while p is in the TASK_WAKING state.  Otherwise,
there could be two concurrent calls to set_task_cpu(p), resulting in
the task's cfs_rq being inconsistent with its cpu.
Signed-off-by: John Wright <john.wright@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c6fc81af

ecryptfs: fix error code for missing xattrs in lower fs · 328851d3

Christian Pulvermacher authored Mar 23, 2010

commit cfce08c6 upstream.

If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

328851d3

eCryptfs: Decrypt symlink target for stat size · 0bee0a7b

Tyler Hicks authored Mar 22, 2010

commit 3a60a168 upstream.

Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path.  Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path.  This
could lead to confusion in some userspace applications, since the two
values should be equal.

https://bugs.launchpad.net/bugs/524919Reported-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

0bee0a7b

ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode · c1fd3a9c

Jeff Mahoney authored Mar 19, 2010

commit 133b8f9d upstream.

Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.

When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.

The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.

As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().

This patch removes the d_drop call and fixes both issues.

This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887Reported-by: Árpád Bíró <biroa@demasz.hu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c1fd3a9c

b43: Optimize PIO scratchbuffer usage · b8798e6f

Michael Buesch authored Oct 09, 2009

commit 88499ab3 upstream.

This optimizes the PIO scratchbuffer usage.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b8798e6f

agp/hp: fixup hp agp after ACPI changes · 8bb98f7c

Bjorn Helgaas authored Jan 07, 2010

commit 67fe63b0 upstream.

Commit 15b8dd53 changed the string in info->hardware_id from a static
array to a pointer and added a length field.  But instead of changing
"sizeof(array)" to "length", we changed it to "sizeof(length)" (== 4),
which corrupts the string we're trying to null-terminate.

We no longer even need to null-terminate the string, but we *do* need to
check whether we found a HID.  If there's no HID, we used to have an empty
array, but now we have a null pointer.

The combination of these defects causes this oops:

  Unable to handle kernel NULL pointer dereference (address 0000000000000003)
  modprobe[895]: Oops 8804682956800 [1]
  ip is at zx1_gart_probe+0xd0/0xcc0 [hp_agp]

  http://marc.info/?l=linux-ia64&m=126264484923647&w=2Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reported-by: Émeric Maschino <emeric.maschino@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

8bb98f7c

ALSA: hda - Add position_fix quirk for Biostar mobo · fca6a735

Takashi Iwai authored Apr 15, 2010

commit 8815cd03 upstream.

The Biostar mobo seems to give a wrong DMA position, resulting in
stuttering or skipping sounds on 2.6.34.  Since the commit
7b3a177b, "ALSA: pcm_lib: fix "something
must be really wrong" condition", makes the position check more strictly,
the DMA position problem is revealed more clearly now.

The fix is to use only LPIB for obtaining the position, i.e. passing
position_fix=1.  This patch adds a static quirk to achieve it as default.
Reported-by: Frank Griffin <ftg@roadrunner.com>
Cc: Eric Piel <Eric.Piel@tremplin-utc.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fca6a735

b43: fall back gracefully to PIO mode after fatal DMA errors · 04f6e097

Linus Torvalds authored Feb 26, 2010

commit 9e3bd919 upstream.

This makes the b43 driver just automatically fall back to PIO mode when
DMA doesn't work.

The driver already told the user to do it, so rather than have the user
reload the module with a new flag, just make the driver do it
automatically. We keep the message as an indication that something is
wrong, but now just automatically fall back to the hopefully working PIO
case.

(Some post-2.6.33 merge fixups by Larry Finger <Larry.Finger@lwfinger.net>
and yours truly... -- JWL)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

04f6e097

b43: Allow PIO mode to be selected at module load · 2e53415d

Larry Finger authored Dec 10, 2009

commit b02914af upstream.

If userencounter the "Fatal DMA Problem" with a BCM43XX device, and
still wish to use b43 as the driver, their only option is to rebuild
the kernel with CONFIG_B43_FORCE_PIO. This patch removes this option and
allows PIO mode to be selected with a load-time parameter for the module.
Note that the configuration variable CONFIG_B43_PIO is also removed.

Once the DMA problem with the BCM4312 devices is solved, this patch will
likely be reverted.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: John Daiker <daikerjohn@gmail.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2e53415d

b43: Remove reset after fatal DMA error · a56d0404

Larry Finger authored Dec 09, 2009

commit 214ac9a4 upstream.

As shown in Kernel Bugzilla #14761, doing a controller restart after a
fatal DMA error does not accomplish anything other than consume the CPU
on an affected system. Accordingly, substitute a meaningful message for
the restart.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a56d0404

i2c-i801: Add Intel Cougar Point device IDs · c10fdc21

Seth Heasley authored Mar 02, 2010

commit 39376434 upstream.

Add the Intel Cougar Point (PCH) SMBus controller device IDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c10fdc21

ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs · 2e1efde6

Seth Heasley authored Jan 12, 2010

commit 5623cab8 upstream.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2e1efde6

ata_piix: IDE Mode SATA patch for Intel Cougar Point DeviceIDs · 3c1bd2b8

Seth Heasley authored Jan 12, 2010

commit 88e8201e upstream.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3c1bd2b8

ALSA: hda - enable snoop for Intel Cougar Point · d5771ad1

Seth Heasley authored Feb 22, 2010

commit 32679f95 upstream.

This patch enables snoop, eliminating static during playback.
This patch supersedes the previous Cougar Point audio patch.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d5771ad1

ALSA: hda_intel: ALSA HD Audio patch for Intel Cougar Point DeviceIDs · 397e6f81

Seth Heasley authored Jan 12, 2010

commit d2f2fcd2 upstream.

This patch adds the Intel Cougar Point (PCH) HD Audio Controller DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

397e6f81

x86/PCI: irq and pci_ids patch for Intel Cougar Point DeviceIDs · 9a9e24c8

Seth Heasley authored Jan 12, 2010

commit 93da6202 upstream.

This patch adds the Intel Cougar Point (PCH) LPC and SMBus Controller DeviceIDs.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9a9e24c8

IPoIB: Fix TX queue lockup with mixed UD/CM traffic · 098f33f9

Eli Cohen authored Mar 03, 2010

commit f0dc117a upstream.

The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP.  However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.

Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N.  If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped.  When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.

Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

098f33f9

PCI: fix nested spinlock hang in aer_inject · 90022a26

Andrew Patterson authored Jan 22, 2010

commit bd1f46de upstream.

The aer_inject module hangs in aer_inject() when checking the device's
error masks.  The hang is due to a recursive use of the aer_inject lock.
The aer_inject() routine grabs the lock while processing the error and then
calls pci_read_config_dword to read the masks. The pci_read_config_dword
routine is earlier overridden by pci_read_aer, which among other things,
grabs the aer_inject lock.

Fixed by moving the pci_read_config_dword calls to read the masks to before
the lock is taken.
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

90022a26

PCI: kill off pci_register_set_vga_state() symbol export. · 892cb6dc

Paul Mundt authored Mar 11, 2010

commit ded1d8f2 upstream.

When pci_register_set_vga_state() was made __init, the EXPORT_SYMBOL() was
retained, which now leaves us with a section mismatch.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

892cb6dc

pci: Update pci_set_vga_state() to call arch functions · a6f2691c

Mike Travis authored Feb 02, 2010

commit 95a8b6ef upstream.

Update pci_set_vga_state to call arch dependent functions to enable Legacy
VGA I/O transactions to be redirected to correct target.

[akpm@linux-foundation.org: make pci_register_set_vga_state() __init]
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McE1J018723@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a6f2691c

SCSI: fc-transport: Use packed modifier for fc_bsg_request structure. · cf1b6414

Harish Zunjarrao authored Jan 12, 2010

commit eda05a28 upstream.

The 32bit kernel does not add padding bytes in the fc_bsg_request structure
whereas the 64bit kernel adds padding bytes in the fc_bsg_request structure.
Due to this, structure elements gets mismatched with 32bit application and
64bit kernel.To resolve this, used packed modifier to avoid adding padding bytes.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cf1b6414

vgaarb: Fix VGA arbiter to accept PCI domains other than 0 · 7de7b92d

Mike Travis authored Feb 02, 2010

commit 773a38db upstream.

Update the VGA Arbiter to accept PCI Domains other than 0.
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McFe8018730@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

7de7b92d

PCIe AER: prevent AER injection if hardware masks error reporting · 9623f6b4

Youquan Song authored Dec 17, 2009

commit b49bfd32 upstream.

The Correcteable/Uncorrectable Error Mask Registers are used by PCIe AER
driver which will controls the reporting of individual errors to PCIe RC
via PCIe error messages.

If hardware masks special error reporting to RC, the aer_inject driver
should not inject aer error.
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9623f6b4

igb: add support for 82576NS SerDes adapter · a8fa34c4

Alexander Duyck authored Oct 05, 2009

commit 747d49ba upstream.

This patch adds the device ID necessary to support the 82576NS SerDes
adapter.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a8fa34c4

SCSI: add scsi target reset support to scsi ioctl · f435966f

Mike Christie authored Nov 05, 2009

commit 3f9daedf upstream.

The scsi ioctl code path was missing scsi target reset
support. This patch just adds it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f435966f

fc class: fail fast bsg requests · 285f4f11

Mike Christie authored Nov 05, 2009

commit 2bc1c59d upstream.

If the port state is blocked and the fast io fail tmo has
fired then this patch will fail bsg requests immediately.
This is needed if userspace is sending IOs to test the transport
like with fcping, so it will not have to wait for the dev loss tmo.
With this patch he bsg req fast io fail code behaves like the normal
and sg io/passthrough fast io fail.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-By: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

285f4f11

9p: Skip check for mandatory locks when unlocking · 56d70410

Sachin Prabhu authored Mar 13, 2010

commit f78233dd upstream.

While investigating a bug, I came across a possible bug in v9fs. The
problem is similar to the one reported for NFS by ASANO Masahiro in
http://lkml.org/lkml/2005/12/21/334.

v9fs_file_lock() will skip locks on file which has mode set to 02666.
This is a problem in cases where the mode of the file is changed after
a process has obtained a lock on the file. Such a lock will be skipped
during unlock and the machine will end up with a BUG in
locks_remove_flock().

v9fs_file_lock() should skip the check for mandatory locks when
unlocking a file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

56d70410

ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. · 26ec941a

Tao Ma authored Mar 03, 2010

commit 78c37eb0 upstream.

In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add some
additional condition check for it.

I also rename paramter "clean_error" to "resize", since the
old one is not clearly enough to indicate that we should only
meet with this case in resize.

btw, the correpsonding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

26ec941a

ocfs2: set i_mode on disk during acl operations · 80acb649

Mark Fasheh authored Mar 15, 2010

commit fcefd25a upstream.

ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were some times not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

80acb649

Staging: comedi: usbdux.c: fix locking up of the driver when the comedi ringbuffer runs empty · 36109605

Bernd Porr authored Nov 27, 2009

commit d4c3a565 upstream.

Jan-Matthias Braun spotted a bug which locks up the driver when the
comedi ring buffer runs empty and provided a patch. The driver would
still send the data to comedi but the reader won't wake up any more.
What's required is setting the flag COMEDI_CB_BLOCK after new data has
arrived which wakes up the reader and therefore the read() command.
Signed-off-by: Bernd Porr <berndporr@f2s.com>
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

36109605

Staging: comedi: fix usbdux timeout bug · 3410f6e8

Bernd Porr authored Nov 16, 2009

commit ea25371a upstream.

I've fixed a bug in the USBDUX driver which caused timeouts while
sending commands to the boards. This was mainly because of one bulk
transfer which had a timeout of 1ms (!). I've now set all timeouts to
1000ms.

From: Bernd Porr <BerndPorr@f2s.com>
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3410f6e8

quota: Fix possible dq_flags corruption · b297f8be

Andrew Perepechko authored Apr 12, 2010

commit 08261673 upstream.

dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty.  Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b297f8be

dm mpath: fix stall when requeueing io · 2548bd8d

Kiyoshi Ueda authored Mar 26, 2010

upstream commit 9eef87da backported to
2.6.32.10 by Nikanth Karthikesan <knikanth@suse.de>

This patch fixes the problem that system may stall if target's ->map_rq
returns DM_MAPIO_REQUEUE in map_request().
E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
     bounces between all-paths-down and paths-up on I/O load.

When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn().  Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map and requeue loop can be done with interrupt disabled,
so 1 CPU system can be stalled if this situation happens.

For example, commands below can stall my 1 CPU box within 1 minute or so:
  # dmsetup table mp
  mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
  # while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
  # while true; do \
  > dmsetup message mp 0 "fail_path 8:144" \
  > dmsetup suspend --noflush mp \
  > dmsetup resume mp \
  > dmsetup message mp 0 "reinstate_path 8:144" \
  > done

To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop once if a request is requeued in map_request().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2548bd8d

fix NFS4 handling of mountpoint stat · 839f573c

Al Viro authored Jan 30, 2010

commit 462d6057 upstream.

RFC says we need to follow the chain of mounts if there's more
than one stacked on that point.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

839f573c

x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write · e422a4a5

Avi Kivity authored Feb 13, 2010

commit 0d1622d7 upstream.

The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwading penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.

__downgrade_write() in rwsem incurs both penalties:  the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.

A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.

Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e422a4a5

x86: Fix breakage of UML from the changes in the rwsem system · f9a05e26

Linus Torvalds authored Jan 17, 2010

commit 4126faf0 upstream.

The patches 5d0b7235 and
bafaecd1 broke the UML build:

On Sun, 17 Jan 2010, Ingo Molnar wrote:
>
> FYI, -tip testing found that these changes break the UML build:
>
> kernel/built-in.o: In function `__up_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:192: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__up_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:210: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__downgrade_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:228: undefined reference to `call_rwsem_downgrade_wake'
> kernel/built-in.o: In function `__down_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:112: undefined reference to `call_rwsem_down_read_failed'
> kernel/built-in.o: In function `__down_write_nested':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:154: undefined reference to `call_rwsem_down_write_failed'
> collect2: ld returned 1 exit status

Add lib/rwsem_64.o to the UML subarch objects to fix.

LKML-Reference: <alpine.LFD.2.00.1001171023440.13231@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f9a05e26

x86-64: support native xadd rwsem implementation · 70cc92a7

Linus Torvalds authored Jan 12, 2010

commit bafaecd1 upstream.

This one is much faster than the spinlock based fallback rwsem code,
with certain artifical benchmarks having shown 300%+ improvement on
threaded page faults etc.

Again, note the 32767-thread limit here. So this really does need that
whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
extension on top of it, but that is conceptually a totally independent
issue.

NOT TESTED! The original patch that this all was based on were tested by
KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
cleaned-up series, so caveat emptor..

Also note that it _may_ be a good idea to mark some more registers
clobbered on x86-64 in the inline asms instead of saving/restoring them.
They are inline functions, but they are only used in places where there
are not a lot of live registers _anyway_, so doing for example the
clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
worse, and would make the slow-path code smaller.

(Not that the slow-path really matters to that degree. Saving a few
unnecessary registers is the _least_ of our problems when we hit the slow
path. The instruction/cycle counting really only matters in the fast
path).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

70cc92a7

x86-64, rwsem: 64-bit xadd rwsem implementation · 70701527

H. Peter Anvin authored Jan 18, 2010

commit 1838ef1d upstream.

For x86-64, 32767 threads really is not enough.  Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.

This required the following changes to the assembly code:

a) %z0 doesn't work on all versions of gcc!  At least gcc 4.4.2 as
   shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
   integer operands.  Newer gccs apparently do this correctly, but
   avoid this problem by using the _ASM_ macros instead of %z.
b) 64 bits immediates are only allowed in "movq $imm,%reg"
   constructs... no others.  Change some of the constraints to "e",
   and fix the one case where we would have had to use an invalid
   immediate -- in that case, we only care about the upper half
   anyway, so just access the upper half.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd1@git.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

70701527

x86: clean up rwsem type system · e8356820

Linus Torvalds authored Jan 12, 2010

commit 5d0b7235 upstream.

The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit.  We have
"long", we have "__s32", and we have "int".

To make it work on x86-64, the types suddenly matter a lot more.  It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users).  But whatever type you
choose, it needs to be used consistently.

This makes a new 'rwsem_counter_t', that is a 32-bit signed type.  For a
64-bit type, you'd need to also update the BIAS values.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e8356820

x86-32: clean up rwsem inline asm statements · e0c312c9

Linus Torvalds authored Jan 12, 2010

commit 59c33fa7 upstream.

This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.

So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.

This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".

Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally badly in that case.

[ hpa: modified the patch to skip size suffixes entirely when they are
redundant due to register operands. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e0c312c9