Commits · 499f286dc02cde6b668e2d757dfe100cb0c43445 · Kirill Smelkov / linux

12 Mar, 2012 1 commit

autofs: work around unhappy compat problem on x86-64 · 499f286d

Ian Kent authored Feb 22, 2012

commit a32744d4 upstream.

When the autofs protocol version 5 packet type was added in commit
5c0a32fc ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.

However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.

And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64.  And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.

End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.

As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.

Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode.  But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.

So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task.  That way we
will write the properly sized packet that user mode expects.

Not pretty.  Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

499f286d

01 Mar, 2012 39 commits

Linux 3.0.23 · bf6a68d2
Greg Kroah-Hartman authored Feb 29, 2012

bf6a68d2

cdrom: use copy_to_user() without the underscores · 6b06abac

Dan Carpenter authored Feb 06, 2012

commit 822bfa51 upstream.

"nframes" comes from the user and "nframes * CD_FRAMESIZE_RAW" can wrap
on 32 bit systems.  That would have been ok if we used the same wrapped
value for the copy, but we use a shifted value.  We should just use the
checked version of copy_to_user() because it's not going to make a
difference to the speed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6b06abac

epoll: limit paths · 54774023

Jason Baron authored Jan 12, 2012

commit 28d82dc1 upstream.

The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely.  A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.

To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited.  Thus, the loop detection
becomes proportional to the number of epoll file descriptor and links.
This dramatically decreases the run-time of the loop check algorithm.  In
one diabolical case I tried it reduced the run-time from 15 mintues (all
in kernel time) to .3 seconds.

Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is harder, since there can be multiple wakeups on different
cpus...Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.

This is accomplished, by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link), are
actually the sources for wakeup events.  I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'.  In the current implemetation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5.  Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links.  This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.

In terms of the path limit selection, I think its first worth noting that
the most common case for epoll, is probably the model where you have 1
epoll file descriptor that is monitoring n number of 'source file
descriptors'.  In this case, each 'source file descriptor' has a 1 path of
length 1.  Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous.  Thus, I'm hoping that the
proposed limits will not prevent any workloads that currently work to
fail.

In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations.  Currently its only used in a subset
of the add paths.  I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths.  I believe that
this additional locking is probably ok, since its in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead.  Also, worth noting is that the epmuex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.

Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4

This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1).  However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order.  Thus, this limit is currently easily bypassed.  The
newly added check for wakeup paths, stricly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.

Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL.  I've also
testing using the piptest.c epoll tester, which showed no difference in
performance.  I've also created a number of different epoll networks and
tested that they behave as expectded.

I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

54774023

epoll: ep_unregister_pollwait() can use the freed pwq->whead · d10e3b29

Oleg Nesterov authored Feb 24, 2012

commit 971316f0 upstream.

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d10e3b29

epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() · 391d7bfe

Oleg Nesterov authored Feb 24, 2012

commit d80e731e upstream.

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

391d7bfe

hwmon: (f75375s) Fix register write order when setting fans to full speed · 809a6d6f

Nikolaus Schulz authored Feb 22, 2012

commit c1c1a3d0 upstream.

By hwmon sysfs interface convention, setting pwm_enable to zero sets a fan
to full speed.  In the f75375s driver, this need be done by enabling
manual fan control, plus duty mode for the F875387 chip, and then setting
the maximum duty cycle.  Fix a bug where the two necessary register writes
were swapped, effectively discarding the setting to full-speed.
Signed-off-by: Nikolaus Schulz <mail@microschulz.de>
Cc: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

809a6d6f

hdpvr: fix race conditon during start of streaming · c79db0b5

Janne Grunau authored Feb 02, 2012

commit afa15953 upstream.

status has to be set to STREAMING before the streaming worker is
queued. hdpvr_transmit_buffers() will exit immediately otherwise.
Reported-by: Joerg Desch <vvd.joede@googlemail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c79db0b5

builddeb: Don't create files in /tmp with predictable names · 674b8d57

Ben Hutchings authored Feb 15, 2012

commit 6c635224 upstream.

The current use of /tmp for file lists is insecure.  Put them under
$objtree/debian instead.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: maximilian attems <max@stro.at>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

674b8d57

davinci_emac: Do not free all rx dma descriptors during init · 841099b9

Christian Riesch authored Feb 23, 2012

commit 5d697032 upstream.

This patch fixes a regression that was introduced by

commit 0a5f3846
davinci_emac: Add Carrier Link OK check in Davinci RX Handler

Said commit adds a check whether the carrier link is ok. If the link is
not ok, the skb is freed and no new dma descriptor added to the rx dma
channel. This causes trouble during initialization when the carrier
status has not yet been updated. If a lot of packets are received while
netif_carrier_ok returns false, all dma descriptors are freed and the
rx dma transfer is stopped.

The bug occurs when the board is connected to a network with lots of
traffic and the ifconfig down/up is done, e.g., when reconfiguring
the interface with DHCP.

The bug can be reproduced by flood pinging the davinci board while doing
ifconfig eth0 down
ifconfig eth0 up
on the board.

After that, the rx path stops working and the overrun value reported
by ifconfig is counting up.

This patch reverts commit 0a5f3846
and instead issues warnings only if cpdma_chan_submit returns -ENOMEM.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: <stable@vger.kernel.org>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Rajashekhara, Sudhakar <sudhakar.raj@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

841099b9

jme: Fix FIFO flush issue · 405b6959

Guo-Fu Tseng authored Feb 22, 2012

commit ba9adbe6 upstream.

Set the RX FIFO flush watermark lower.
According to Federico and JMicron's reply,
setting it to 16QW would be stable on most platforms.
Otherwise, user might experience packet drop issue.
Reported-by: Federico Quagliata <federico@quagliata.org>
Fixed-by: Federico Quagliata <federico@quagliata.org>
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

405b6959

ipvs: fix matching of fwmark templates during scheduling · 426f4568

Simon Horman authored Jan 27, 2012

commit e0aac52e upstream.

	Commit f11017ec (2.6.37)
moved the fwmark variable in subcontext that is invalidated before
reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer
in the param structure make sure the fwmark variable is in
same context. As the fwmark templates can not be matched,
more and more template connections are created and the
controlled connections can not go to single real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

426f4568

scsi_pm: Fix bug in the SCSI power management handler · 4b3b9e9e

Alan Stern authored Feb 17, 2012

commit fea6d607 upstream.

This patch (as1520) fixes a bug in the SCSI layer's power management
implementation.

LUN scanning can be carried out asynchronously in do_scan_async(), and
sd uses an asynchronous thread for the time-consuming parts of disk
probing in sd_probe_async().  Currently nothing coordinates these
async threads with system sleep transitions; they can and do attempt
to continue scanning/probing SCSI devices even after the host adapter
has been suspended.  As one might expect, the outcome is not ideal.

This is what the "prepare" stage of system suspend was created for.
After the prepare callback has been called for a host, target, or
device, drivers are not allowed to register any children underneath
them.  Currently the SCSI prepare callback is not implemented; this
patch rectifies that omission.

For SCSI hosts, the prepare routine calls scsi_complete_async_scans()
to wait until async scanning is finished.  It might be slightly more
efficient to wait only until the host in question has been scanned,
but there's currently no way to do that.  Besides, during a sleep
transition we will ultimately have to wait until all the host scanning
has finished anyway.

For SCSI devices, the prepare routine calls async_synchronize_full()
to wait until sd probing is finished.  The routine does nothing for
SCSI targets, because asynchronous target scanning is done only as
part of host scanning.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4b3b9e9e

scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost' · bbfe8a71

Huajun Li authored Feb 12, 2012

commit 267a6ad4 upstream.

In do_scan_async(), calling scsi_autopm_put_host(shost) may reference
freed shost, and cause Posison overwitten warning.
Yes, this case can happen, for example, an USB is disconnected just
when do_scan_async() thread starts to run, then scsi_host_put() called
in scsi_finish_async_scan() will lead to shost be freed(because the
refcount of shost->shost_gendev decreases to 1 after USB disconnects),
at this point, if references shost again, system will show following
warning msg.

To make scsi_autopm_put_host(shost) always reference a valid shost,
put it just before scsi_host_put() in function
scsi_finish_async_scan().

[  299.281565] =============================================================================
[  299.281634] BUG kmalloc-4096 (Tainted: G          I ): Poison overwritten
[  299.281682] -----------------------------------------------------------------------------
[  299.281684]
[  299.281752] INFO: 0xffff880056c305d0-0xffff880056c305d0. First byte
0x6a instead of 0x6b
[  299.281816] INFO: Allocated in scsi_host_alloc+0x4a/0x490 age=1688
cpu=1 pid=2004
[  299.281870] 	__slab_alloc+0x617/0x6c1
[  299.281901] 	__kmalloc+0x28c/0x2e0
[  299.281931] 	scsi_host_alloc+0x4a/0x490
[  299.281966] 	usb_stor_probe1+0x5b/0xc40 [usb_storage]
[  299.282010] 	storage_probe+0xa4/0xe0 [usb_storage]
[  299.282062] 	usb_probe_interface+0x172/0x330 [usbcore]
[  299.282105] 	driver_probe_device+0x257/0x3b0
[  299.282138] 	__driver_attach+0x103/0x110
[  299.282171] 	bus_for_each_dev+0x8e/0xe0
[  299.282201] 	driver_attach+0x26/0x30
[  299.282230] 	bus_add_driver+0x1c4/0x430
[  299.282260] 	driver_register+0xb6/0x230
[  299.282298] 	usb_register_driver+0xe5/0x270 [usbcore]
[  299.282337] 	0xffffffffa04ab03d
[  299.282364] 	do_one_initcall+0x47/0x230
[  299.282396] 	sys_init_module+0xa0f/0x1fe0
[  299.282429] INFO: Freed in scsi_host_dev_release+0x18a/0x1d0 age=85
cpu=0 pid=2008
[  299.282482] 	__slab_free+0x3c/0x2a1
[  299.282510] 	kfree+0x296/0x310
[  299.282536] 	scsi_host_dev_release+0x18a/0x1d0
[  299.282574] 	device_release+0x74/0x100
[  299.282606] 	kobject_release+0xc7/0x2a0
[  299.282637] 	kobject_put+0x54/0xa0
[  299.282668] 	put_device+0x27/0x40
[  299.282694] 	scsi_host_put+0x1d/0x30
[  299.282723] 	do_scan_async+0x1fc/0x2b0
[  299.282753] 	kthread+0xdf/0xf0
[  299.282782] 	kernel_thread_helper+0x4/0x10
[  299.282817] INFO: Slab 0xffffea00015b0c00 objects=7 used=7 fp=0x
      (null) flags=0x100000000004080
[  299.282882] INFO: Object 0xffff880056c30000 @offset=0 fp=0x          (null)
[  299.282884]
...
Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bbfe8a71

genirq: Handle pending irqs in irq_startup() · fd844dab

Thomas Gleixner authored Feb 08, 2012

commit b4bc724e upstream.

An interrupt might be pending when irq_startup() is called, but the
startup code does not invoke the resend logic. In some cases this
prevents the device from issuing another interrupt which renders the
device non functional.

Call the resend function in irq_startup() to keep things going.
Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fd844dab

genirq: Unmask oneshot irqs when thread was not woken · b7f0787d

Thomas Gleixner authored Feb 07, 2012

commit ac563761 upstream.

When the primary handler of an interrupt which is marked IRQ_ONESHOT
returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
woken and the unmask logic of the interrupt line is never
invoked. This keeps the interrupt masked forever.

This was not noticed as most IRQ_ONESHOT users wake the thread
unconditionally (usually because they cannot access the underlying
device from hard interrupt context). Though this behaviour was nowhere
documented and not necessarily intentional. Some drivers can avoid the
thread wakeup in certain cases and run into the situation where the
interrupt line s kept masked.

Handle it gracefully.
Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b7f0787d

ath9k: stop on rates with idx -1 in ath9k rate control's .tx_status · 6778e220

Pavel Roskin authored Feb 11, 2012

commit 2504a642 upstream.

Rate control algorithms are supposed to stop processing when they
encounter a rate with the index -1.  Checking for rate->count not being
zero is not enough.

Allowing a rate with negative index leads to memory corruption in
ath_debug_stat_rc().

One consequence of the bug is discussed at
https://bugzilla.redhat.com/show_bug.cgi?id=768639Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6778e220

x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors · 534b465e

Andreas Herrmann authored Feb 08, 2012

commit 32c32338 upstream.

For L1 instruction cache and L2 cache the shared CPU information
is wrong. On current AMD family 15h CPUs those caches are shared
between both cores of a compute unit.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=42607Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Petkov Borislav <Borislav.Petkov@amd.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20120208195229.GA17523@alberich.amd.comSigned-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

534b465e

USB: Don't fail USB3 probe on missing legacy PCI IRQ. · 5fa70afe

Sarah Sharp authored Feb 23, 2012

commit 68d07f64 upstream

Intel has a PCI USB xhci host controller on a new platform. It doesn't
have a line IRQ definition in BIOS.  The Linux driver refuses to
initialize this controller, but Windows works well because it only depends
on MSI.

Actually, Linux also can work for MSI.  This patch avoids the line IRQ
checking for USB3 HCDs in usb core PCI probe.  It allows the xHCI driver
to try to enable MSI or MSI-X first.  It will fail the probe if MSI
enabling failed and there's no legacy PCI IRQ.

This patch should be backported to kernels as old as 2.6.32.

[Maintainer note: This patch is a backport of commit
68d07f64 "USB: Don't fail USB3 probe on
missing legacy PCI IRQ." to the 3.0 kernel.  Note, the original patch
description was wrong.  We should not back port this to kernels older
than 2.6.36, since that was the first kernel to support MSI and MSI-X
for xHCI hosts.  These systems will just not work without MSI support,
so the probe should fail on kernels older than 2.6.36.]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5fa70afe

usb-storage: fix freezing of the scanning thread · 721eaa34

Alan Stern authored Feb 21, 2012

commit bb94a406 upstream.

This patch (as1521b) fixes the interaction between usb-storage's
scanning thread and the freezer.  The current implementation has a
race: If the device is unplugged shortly after being plugged in and
just as a system sleep begins, the scanning thread may get frozen
before the khubd task.  Khubd won't be able to freeze until the
disconnect processing is complete, and the disconnect processing can't
proceed until the scanning thread finishes, so the sleep transition
will fail.

The implementation in the 3.2 kernel suffers from an additional
problem.  There the scanning thread calls set_freezable_with_signal(),
and the signals sent by the freezer will mess up the thread's I/O
delays, which are all interruptible.

The solution to both problems is the same: Replace the kernel thread
used for scanning with a delayed-work routine on the system freezable
work queue.  Freezable work queues have the nice property that you can
cancel a work item even while the work queue is frozen, and no signals
are needed.

The 3.2 version of this patch solves the problem in Bugzilla #42730.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

721eaa34

i387: re-introduce FPU state preloading at context switch time · f4def3f8

Linus Torvalds authored Feb 18, 2012

commit 34ddc81a upstream.

After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870e ("i387:
do not preload FPU state at task switch time").

However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably

 - properly abstracted as a true FPU state switch, rather than as
   open-coded save and restore with various hacks.

   In particular, implementing it as a proper FPU state switch allows us
   to optimize the CR0.TS flag accesses: there is no reason to set the
   TS bit only to then almost immediately clear it again.  CR0 accesses
   are quite slow and expensive, don't flip the bit back and forth for
   no good reason.

 - Make sure that the same model works for both x86-32 and x86-64, so
   that there are no gratuitous differences between the two due to the
   way they save and restore segment state differently due to
   architectural differences that really don't matter to the FPU state.

 - Avoid exposing the "preload" state to the context switch routines,
   and in particular allow the concept of lazy state restore: if nothing
   else has used the FPU in the meantime, and the process is still on
   the same CPU, we can avoid restoring state from memory entirely, just
   re-expose the state that is still in the FPU unit.

   That optimized lazy restore isn't actually implemented here, but the
   infrastructure is set up for it.  Of course, older CPU's that use
   'fnsave' to save the state cannot take advantage of this, since the
   state saving also trashes the state.

In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage.  Hopefully it's easier to
follow as a result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f4def3f8

i387: move TS_USEDFPU flag from thread_info to task_struct · 0a9d89d9

Linus Torvalds authored Feb 17, 2012

commit f94edacf upstream.

This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

 - changing 'thread_info->status' from the scheduler causes nasty
   problems for the other users of that variable, since it is defined to
   be thread-synchronous (that's what the "TS_" part of the naming was
   supposed to indicate).

   So perfectly valid code could (and did) do

	ti->status |= TS_RESTORE_SIGMASK;

   and the compiler was free to do that as separate load, or and store
   instructions.  Which can cause problems with preemption, since a task
   switch could happen in between, and change the TS_USEDFPU bit. The
   change to TS_USEDFPU would be overwritten by the final store.

   In practice, this seldom happened, though, because the 'status' field
   was seldom used more than once, so gcc would generally tend to
   generate code that used a read-modify-write instruction and thus
   happened to avoid this problem - RMW instructions are naturally low
   fat and preemption-safe.

 - On x86-32, the current_thread_info() pointer would, during interrupts
   and softirqs, point to a *copy* of the real thread_info, because
   x86-32 uses %esp to calculate the thread_info address, and thus the
   separate irq (and softirq) stacks would cause these kinds of odd
   thread_info copy aliases.

   This is normally not a problem, since interrupts aren't supposed to
   look at thread information anyway (what thread is running at
   interrupt time really isn't very well-defined), but it confused the
   heck out of irq_fpu_usable() and the code that tried to squirrel
   away the FPU state.

   (It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling).  And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0a9d89d9

i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore · 70b5ef05

Linus Torvalds authored Feb 16, 2012

commit 4903062b upstream.

The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending.  In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state.  That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.

We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it

 (a) corrupts what is potentially perfectly good FPU state that we might
     want to lazy avoid restoring later and

 (b) on x86-64 it resulted in a very annoying ordering constraint, where
     "__unlazy_fpu()" in the task switch needs to be delayed until after
     the DS segment has been reloaded just to get the new DS value.

Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used).  It's
simply a much more natural place for the leaked state cleanup.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

70b5ef05

i387: do not preload FPU state at task switch time · 06f4bbda

Linus Torvalds authored Feb 16, 2012

commit b3b0870e upstream.

Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code. And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.

Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
to do better. If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.

In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has. For modern CPU's that don't destroy the FPU contents on save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

06f4bbda

i387: don't ever touch TS_USEDFPU directly, use helper functions · 9221484f

Linus Torvalds authored Feb 16, 2012

commit 6d59d7a9 upstream.

This creates three helper functions that do the TS_USEDFPU accesses, and
makes everybody that used to do it by hand use those helpers instead.

In addition, there's a couple of helper functions for the "change both
CR0.TS and TS_USEDFPU at the same time" case, and the places that do
that together have been changed to use those. That means that we have
fewer random places that open-code this situation.

The intent is partly to clarify the code without actually changing any
semantics yet (since we clearly still have some hard to reproduce bug in
this area), but also to make it much easier to use another approach
entirely to caching the CR0.TS bit for software accesses.

Right now we use a bit in the thread-info 'status' variable (this patch
does not change that), but we might want to make it a full field of its
own or even make it a per-cpu variable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9221484f

i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers · 0affff96

Linus Torvalds authored Feb 16, 2012

commit b6c66418 upstream.

Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
it.  By moving it into the callers, we always do the TS_USEDFPU next to
the CR0.TS accesses in the source code, and it's much easier to see how
the two go hand in hand.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0affff96

i387: fix x86-64 preemption-unsafe user stack save/restore · c3cb6440

Linus Torvalds authored Feb 16, 2012

commit 15d8791c upstream.

Commit 5b1cbac3 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.

However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.

Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.

This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid.  With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.

There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.

However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.

Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.

Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c3cb6440

i387: fix sense of sanity check · 09ffc93a

Linus Torvalds authored Feb 15, 2012

commit c38e2345 upstream.

The check for save_init_fpu() (introduced in commit 5b1cbac3: "i387:
make irq_fpu_usable() tests more robust") was the wrong way around, but
I hadn't noticed, because my "tests" were bogus: the FPU exceptions are
disabled by default, so even doing a divide by zero never actually
triggers this code at all unless you do extra work to enable them.

So if anybody did enable them, they'd get one spurious warning.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

09ffc93a

i387: make irq_fpu_usable() tests more robust · 00717d1f

Linus Torvalds authored Feb 13, 2012

commit 5b1cbac3 upstream.

Some code - especially the crypto layer - wants to use the x86
FP/MMX/AVX register set in what may be interrupt (typically softirq)
context.

That *can* be ok, but the tests for when it was ok were somewhat
suspect.  We cannot touch the thread-specific status bits either, so
we'd better check that we're not going to try to save FP state or
anything like that.

Now, it may be that the TS bit is always cleared *before* we set the
USEDFPU bit (and only set when we had already cleared the USEDFP
before), so the TS bit test may actually have been sufficient, but it
certainly was not obviously so.

So this explicitly verifies that we will not touch the TS_USEDFPU bit,
and adds a few related sanity-checks.  Because it seems that somehow
AES-NI is corrupting user FP state.  The cause is not clear, and this
patch doesn't fix it, but while debugging it I really wanted the code to
be more obviously correct and robust.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

00717d1f

i387: math_state_restore() isn't called from asm · 454d1471

Linus Torvalds authored Feb 13, 2012

commit be98c2cd upstream.

It was marked asmlinkage for some really old and stale legacy reasons.
Fix that and the equally stale comment.

Noticed when debugging the irq_fpu_usable() bugs.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

454d1471

USB: Set hub depth after USB3 hub reset · ac29c0ae

Elric Fu authored Feb 18, 2012

commit a45aa3b3 upstream.

The superspeed device attached to a USB 3.0 hub(such as VIA's)
doesn't respond the address device command after resume. The
root cause is the superspeed hub will miss the Hub Depth value
that is used as an offset into the route string to locate the
bits it uses to determine the downstream port number after
reset, and all packets can't be routed to the device attached
to the superspeed hub.

Hub driver sends a Set Hub Depth request to the superspeed hub
except for USB 3.0 root hub when the hub is initialized and
doesn't send the request again after reset due to the resume
process. So moving the code that sends the Set Hub Depth request
to the superspeed hub from hub_configure() to hub_activate()
is to cover those situations include initialization and reset.

The patch should be backported to kernels as old as 2.6.39.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ac29c0ae

xhci: Fix encoding for HS bulk/control NAK rate. · 5652021f

Sarah Sharp authored Feb 13, 2012

commit 340a3504 upstream.

The xHCI 0.96 spec says that HS bulk and control endpoint NAK rate must
be encoded as an exponent of two number of microframes.  The endpoint
descriptor has the NAK rate encoded in number of microframes.  We were
just copying the value from the endpoint descriptor into the endpoint
context interval field, which was not correct.  This lead to the VIA
host rejecting the add of a bulk OUT endpoint from any USB 2.0 mass
storage device.

The fix is to use the correct encoding.  Refactor the code to convert
number of frames to an exponential number of microframes, and make sure
we convert the number of microframes in HS bulk and control endpoints to
an exponent.

This should be back ported to kernels as old as 2.6.31, that contain the
commit dfa49c4a "USB: xhci - fix math
in xhci_get_endpoint_interval"
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Felipe Contreras <felipe.contreras@gmail.com>
Suggested-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5652021f

xhci: Fix oops caused by more USB2 ports than USB3 ports. · afa0cb70

Sarah Sharp authored Feb 09, 2012

commit 3278a55a upstream.

The code to set the device removable bits in the USB 2.0 roothub
descriptor was accidentally looking at the USB 3.0 port registers
instead of the USB 2.0 registers.  This can cause an oops if there are
more USB 2.0 registers than USB 3.0 registers.

This should be backported to kernels as old as 2.6.39, that contain the
commit 4bbb0ace "xhci: Return a USB 3.0
hub descriptor for USB3 roothub."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

afa0cb70

USB: Fix handoff when BIOS disables host PCI device. · cce0edb3

Sarah Sharp authored Feb 07, 2012

commit cab928ee upstream.

On some systems with an Intel Panther Point xHCI host controller, the
BIOS disables the xHCI PCI device during boot, and switches the xHCI
ports over to EHCI.  This allows the BIOS to access USB devices without
having xHCI support.

The downside is that the xHCI BIOS handoff mechanism will fail because
memory mapped I/O is not enabled for the disabled PCI device.
Jesse Barnes says this is expected behavior.  The PCI core will enable
BARs before quirks run, but it will leave it in an undefined state, and
it may not have memory mapped I/O enabled.

Make the generic USB quirk handler call pci_enable_device() to re-enable
MMIO, and call pci_disable_device() once the host-specific BIOS handoff
is finished.  This will balance the ref counts in the PCI core.  When
the PCI probe function is called, usb_hcd_pci_probe() will call
pci_enable_device() again.

This should be back ported to kernels as old as 2.6.31.  That was the
first kernel with xHCI support, and no one has complained about BIOS
handoffs failing due to memory mapped I/O being disabled on other hosts
(EHCI, UHCI, or OHCI).
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cce0edb3

USB: Remove duplicate USB 3.0 hub feature #defines. · 66278506

Sarah Sharp authored Jan 05, 2012

commit d9f5343e upstream.

Somehow we ended up with duplicate hub feature #defines in ch11.h.
Tatyana Brokhman first created the USB 3.0 hub feature macros in 2.6.38
with commit 0eadcc09 "usb: USB3.0 ch11
definitions".  In 2.6.39, I modified a patch from John Youn that added
similar macros in a different place in the same file, and committed
dbe79bbe "USB 3.0 Hub Changes".

Some of the #defines used different names for the same values.  Others
used exactly the same names with the same values, like these gems:

 #define USB_PORT_FEAT_BH_PORT_RESET     28
...
 #define USB_PORT_FEAT_BH_PORT_RESET            28

According to my very geeky husband (who looked it up in the C99 spec),
it is allowed to have object-like macros with duplicate names as long as
the replacement list is exactly the same.  However, he recalled that
some compilers will give warnings when they find duplicate macros.  It's
probably best to remove the duplicates in the stable tree, so that the
code compiles for everyone.

The macros are now fixed to move the feature requests that are specific
to USB 3.0 hubs into a new section (out of the USB 2.0 hub feature
section), and use the most common macro name.

This patch should be backported to 2.6.39.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Tatyana Brokhman <tlinder@codeaurora.org>
Cc: John Youn <johnyoun@synopsys.com>
Cc: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

66278506

USB: Serial: ti_usb_3410_5052: Add Abbot Diabetes Care cable id · 4ee72bce

Andrew Lunn authored Feb 20, 2012

commit 7fd25702 upstream.

This USB-serial cable with mini stereo jack enumerates as:
Bus 001 Device 004: ID 1a61:3410 Abbott Diabetes Care

It is a TI3410 inside.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4ee72bce

USB: option: cleanup zte 3g-dongle's pid in option.c · e4388320

Rui li authored Feb 14, 2012

commit b9e44fe5 upstream.

  1. Remove all old mass-storage ids's pid:
     0x0026,0x0053,0x0098,0x0099,0x0149,0x0150,0x0160;
  2. As the pid from 0x1401 to 0x1510 which have not surely assigned to
     use for serial-port or mass-storage port,so i think it should be
     removed now, and will re-add after it have assigned in future;
  3. sort the pid to WCDMA and CDMA.
Signed-off-by: Rui li <li.rui27@zte.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e4388320

USB: Added Kamstrup VID/PIDs to cp210x serial driver. · f887e1ac

Bruno Thomsen authored Feb 21, 2012

commit c6c1e449 upstream.
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f887e1ac

ipv4: fix redirect handling · 42ab5316

Eric Dumazet authored Nov 18, 2011

[ Upstream commit 9cc20b26 ]

commit f39925db (ipv4: Cache learned redirect information in
inetpeer.) introduced a regression in ICMP redirect handling.

It assumed ipv4_dst_check() would be called because all possible routes
were attached to the inetpeer we modify in ip_rt_redirect(), but thats
not true.

commit 7cc9150e (route: fix ICMP redirect validation) tried to fix
this but solution was not complete. (It fixed only one route)

So we must lookup existing routes (including different TOS values) and
call check_peer_redir() on them.
Reported-by: Ivan Zahariev <famzah@icdsoft.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

42ab5316

route: fix ICMP redirect validation · bebee22b

Flavio Leitner authored Oct 24, 2011

[ Upstream commit 7cc9150e ]

The commit f39925db
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
  A Redirect message SHOULD be silently discarded if the new
  gateway address it specifies is not on the same connected
  (sub-) net through which the Redirect arrived [INTRO:2,
  Appendix A], or if the source of the Redirect is not the
  current first-hop gateway for the specified destination (see
  Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bebee22b