- 20 Sep, 2010 40 commits
-
Oleg Nesterov authored
commit 897f0b3c upstream

This patch just states the fact that the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.

- cpuset_lock() doesn't really work. It is needed for
  cpuset_cpus_allowed_locked() but we can't take this lock in the
  try_to_wake_up()->select_fallback_rq() path.

- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
  callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex,
  stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
  cpuset_lock() and hangs forever because CPU is already dead and thus T
  can't be scheduled.

- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
  which is not irq-safe, but try_to_wake_up() can be called from irq.

Kill them, and change select_fallback_rq() to use cpu_possible_mask, like we
currently do without CONFIG_CPUSETS.

Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() paths. The subsequent patches try to fix these problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
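
For reference, a minimal sketch of the fallback described above (not the
verbatim upstream function; the name and exact ordering are simplified):
when no allowed CPU remains usable, affinity is broken and any possible CPU
is chosen instead of going through cpuset_lock().

    static int select_fallback_rq_sketch(int cpu, struct task_struct *p)
    {
            int dest_cpu;

            /* Prefer an active CPU that is still in the task's own mask. */
            dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
            if (dest_cpu < nr_cpu_ids)
                    return dest_cpu;

            /*
             * No usable CPU left: break affinity and fall back to
             * cpu_possible_mask, as the !CONFIG_CPUSETS case already did.
             */
            cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
            return cpumask_any(cpu_possible_mask);
    }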
-
Oleg Nesterov authored
commit 47a70985 upstream

Trivial typo fix. rq->migration_thread can be NULL after task_rq_unlock(),
this is why we have "mt" which should be used instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330165829.GA18284@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit 60db48ca upstream

rtmutex_set_prio() is used to implement priority inheritance for futexes.
When a task is deboosted it gets enqueued at the tail of its RT priority
list. This violates the POSIX scheduling semantics:

rt priority list X contains two runnable tasks A and B

task A runs with priority X and holds mutex M
task C preempts A and is blocked on mutex M
       -> task A is boosted to priority of task C (Y)
task A unlocks the mutex M and deboosts itself
       -> A is dequeued from rt priority list Y
       -> A is enqueued to the tail of rt priority list X
task C schedules away
task B runs

This is wrong as task A did not schedule away and therefore violates the
POSIX scheduling semantics.

Enqueue the task to the head of the priority list instead.

Reported-by: Mathias Weber <mathias.weber.mw1@roche.com>
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.809074113@linutronix.de>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit 37dad3fc upstream

The ability to enqueue a task to the head of a SCHED_FIFO priority list is
required to fix some violations of POSIX scheduling policy. Implement the
functionality in sched_rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.772169931@linutronix.de>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
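
A sketch of the mechanism this patch introduces (simplified; the function
name and flag are illustrative, not the literal diff): a "head" flag selects
which end of the per-priority run list the entity is queued on.

    static void enqueue_rt_list_sketch(struct sched_rt_entity *rt_se,
                                       struct list_head *queue, int head)
    {
            if (head)
                    list_add(&rt_se->run_list, queue);      /* head: runs next */
            else
                    list_add_tail(&rt_se->run_list, queue); /* tail: default  */
    }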
-
Thomas Gleixner authored
commit ea87bb78 upstream The ability of enqueueing a task to the head of a SCHED_FIFO priority list is required to fix some violations of POSIX scheduling policy. Extend the related functions with a "head" argument. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Peter Zijlstra <peterz@infradead.org> Tested-by:
Carsten Emde <cbe@osadl.org> Tested-by:
Mathias Weber <mathias.weber.mw1@roche.com> LKML-Reference: <20100120171629.734886007@linutronix.de> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 0970d299 upstream

Thomas found that due to ttwu() changing a task's cpu without holding the
rq->lock, task_rq_lock() might end up locking the wrong rq. Avoid this by
serializing against TASK_WAKING.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 11854247 upstream

We moved to migrate on wakeup, which means that sleeping tasks could still
be present on offline cpus. Amend the check to only test running tasks.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit fabf318e upstream

There are a number of issues:

1) TASK_WAKING vs cgroup_clone (cpusets)

copy_process():

  sched_fork()
    child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
  if (current->nsproxy != p->nsproxy)
    ns_cgroup_clone()
      cgroup_clone()
        mutex_lock(inode->i_mutex)
        mutex_lock(cgroup_mutex)
        cgroup_attach_task()
          ss->can_attach()
          ss->attach() [ -> cpuset_attach() ]
            cpuset_attach_task()
              set_cpus_allowed_ptr();
                while (child->state == TASK_WAKING)
                  cpu_relax();

will deadlock the system.

2) cgroup_clone (cpusets) vs copy_process

So even if the above would work we still have:

copy_process():

  if (current->nsproxy != p->nsproxy)
    ns_cgroup_clone()
      cgroup_clone()
        mutex_lock(inode->i_mutex)
        mutex_lock(cgroup_mutex)
        cgroup_attach_task()
          ss->can_attach()
          ss->attach() [ -> cpuset_attach() ]
            cpuset_attach_task()
              set_cpus_allowed_ptr();
  ...
  p->cpus_allowed = current->cpus_allowed

over-writing the modified cpus_allowed.

3) fork() vs hotplug

If we unplug the child's cpu after the sanity check, when the child gets
attached to the task_list but before wake_up_new_task(), shit will meet
with fan.

Solve all these issues by moving fork cpu selection into
wake_up_new_task().

Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264106190.4283.1314.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 70f11205 upstream

The hot-unplug kstopmachine usage does a wakeup after deactivating the cpu,
hence we cannot use cpu_active() here but must rely on the good olde online.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1261326987.4314.24.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 88ec22d3 upstream

In order to remove the cfs_rq dependency from set_task_cpu() we need to
ensure the task is cfs_rq invariant for all callsites.

The simple approach is to subtract cfs_rq->min_vruntime from se->vruntime
on dequeue, and add cfs_rq->min_vruntime on enqueue.

However, this has the downside of breaking FAIR_SLEEPERS since we lose the
old vruntime as we only maintain the relative position.

To solve this, we observe that we only migrate runnable tasks; we do this
using deactivate_task(.sleep=0) and activate_task(.wakeup=0), therefore we
can restrain the min_vruntime invariance to that state.

The only other case is wakeup balancing: since we want to maintain the old
vruntime we cannot make it relative on dequeue, but since we don't migrate
inactive tasks, we can do so right before we activate it again.

This is where we need the new pre-wakeup hook; we need to call this while
still holding the old rq->lock. We could fold it into ->select_task_rq(),
but since that has multiple callsites and would obfuscate the locking
requirements, that seems like a fudge.

This leaves the fork() case: simply make sure that ->task_fork() leaves the
->vruntime in a relative state.

This covers all cases where set_task_cpu() gets called, and ensures it sees
a relative vruntime.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.191697025@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
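
A minimal sketch of the min_vruntime trick described above (illustrative
helper names, not the full patch): vruntime becomes relative while the task
is being migrated and absolute again once it is queued on the new cfs_rq.

    /* On dequeue for migration: make vruntime cfs_rq invariant. */
    static void make_vruntime_relative(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            se->vruntime -= cfs_rq->min_vruntime;
    }

    /* On enqueue after migration: rebase onto the new runqueue's clock. */
    static void make_vruntime_absolute(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            se->vruntime += cfs_rq->min_vruntime;
    }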
-
Peter Zijlstra authored
commit efbbd05a upstream

As will be apparent in the next patch, we need a pre wakeup hook for
sched_fair task migration, hence rename the post wakeup hook and add a pre
wakeup hook.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.114746117@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 5da9a0fb upstream

Since select_task_rq() is now responsible for guaranteeing ->cpus_allowed
and cpu_active_mask, we need to verify this.

select_task_rq_rt() can blindly return smp_processor_id()/task_cpu()
without checking the valid masks; select_task_rq_fair() can do the same in
the rare case that all SD_flags are disabled.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.961475466@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 38022906 upstream

sched: Fix sched_exec() balancing

Since we access ->cpus_allowed without holding rq->lock we need a retry
loop to validate the result; this comes for near free when we merge
sched_migrate_task() into sched_exec() since that already does the needed
check.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.884743662@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
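
A rough sketch of the retry pattern described above (simplified; the
select_task_rq() call and lock signatures are approximations of the 2.6.3x
code, not the verbatim merge): the CPU is chosen without rq->lock, then
re-validated under task_rq_lock() and retried if ->cpus_allowed changed
underneath us.

    void sched_exec_sketch(void)
    {
            struct task_struct *p = current;
            unsigned long flags;
            struct rq *rq;
            int dest_cpu;

    again:
            /* Unlocked balance decision for the exec'ing task. */
            dest_cpu = select_task_rq(p, SD_BALANCE_EXEC, 0);

            rq = task_rq_lock(p, &flags);

            /* Re-check the choice now that we hold the lock; retry if stale. */
            if (unlikely(!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))) {
                    task_rq_unlock(rq, &flags);
                    goto again;
            }

            /* ... migrate p to dest_cpu while still holding rq->lock ... */
            task_rq_unlock(rq, &flags);
    }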
-
Peter Zijlstra authored
commit 077614ee upstream

There's a preemption race in the set_task_cpu() debug check in that when we
get preempted after setting task->state we'd still be on the rq proper, but
fail the test.

Check for preempted tasks, since those are always on the RQ.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091217121830.137155561@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
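
A sketch of the relaxed debug check (approximate reconstruction, not the
verbatim hunk): a task preempted right after setting ->state is still on
its runqueue, so PREEMPT_ACTIVE exempts it from the warning.

    WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
                 !(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));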
-
Ingo Molnar authored
commit 416eb395 upstream

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit e2912009 upstream

In order to clean up the set_task_cpu() rq dependencies we need to ensure
it is never called on blocked tasks because such usage does not pair with
consistent rq->lock usage.

This puts the migration burden on ttwu().

Furthermore we need to close a race against changing ->cpus_allowed, since
select_task_rq() runs with only preemption disabled. For sched_fork() this
is safe because the child isn't in the tasklist yet; for wakeup we fix this
by synchronizing set_cpus_allowed_ptr() against TASK_WAKING, which leaves
sched_exec to be a problem.

This also closes a hole in (6ad4c188 sched: Fix balance vs hotplug race)
where ->select_task_rq() doesn't validate the result against the
sched_domain/root_domain.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 06b83b5f upstream

For later convenience use TASK_WAKING for fresh tasks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.732561278@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Xiaotian Feng authored
commit 9ee349ad upstream

Sachin found cpu hotplug test failures on powerpc, which made the kernel
hang on his POWER box.

The problem is that we fail to re-activate a cpu when a hot-unplug fails.
Fix this by moving the de-activation into _cpu_down after doing the initial
checks.

Remove the synchronize_sched() calls and rely on those implied by
rebuilding the sched domains using the new mask.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.500272612@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit 1a551ae7 upstream

read_lock(&tasklist_lock) does not protect sys_sched_get_rr_param() against
a concurrent update of the policy or scheduler parameters as
do_sched_scheduler() does not take the tasklist_lock.

The access to task->sched_class->get_rr_interval is protected by
task_rq_lock(task).

Use rcu_read_lock() to protect find_task_by_vpid() and prevent the task
struct from going away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.862897167@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit 23f5d142 upstream

tasklist_lock is held read locked to protect the find_task_by_vpid() call
and to prevent the task going away. sched_setaffinity acquires a task
struct ref and drops tasklist lock right away. The access to the
cpus_allowed mask is protected by rq->lock.

rcu_read_lock() provides the same protection here.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.789059966@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
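
These tasklist_lock -> RCU conversions all follow the same pattern; a
sketch of it (illustrative: pid and new_mask come from the syscall context,
and set_cpus_allowed_ptr() stands in for whatever each syscall does with
the task):

    rcu_read_lock();
    p = find_task_by_vpid(pid);
    if (!p) {
            rcu_read_unlock();
            return -ESRCH;
    }
    get_task_struct(p);             /* pin the task beyond the RCU section */
    rcu_read_unlock();

    retval = set_cpus_allowed_ptr(p, new_mask);     /* rq->lock guards the mask */

    put_task_struct(p);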
-
Thomas Gleixner authored
commit 5fe85be0 upstream

read_lock(&tasklist_lock) does not protect sys_sched_getscheduler() and
sys_sched_getparam() against a concurrent update of the policy or scheduler
parameters as do_sched_setscheduler() does not take the tasklist_lock. The
accessed integers can be retrieved w/o locking and are snapshots anyway.

Using rcu_read_lock() to protect find_task_by_vpid() and prevent the task
struct from going away is not changing the above situation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.753790977@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Rafael J. Wysocki authored
commit 7539a3b3 upstream

Alan Stern noticed that all the wakeup side (and atomic) variants of the
completion APIs should be irq safe, but the newly introduced
completion_done() and try_wait_for_completion() aren't. The use of the irq
unsafe variants in IRQ contexts can cause crashes/hangs.

Fix the problem by making them use spin_lock_irqsave() and
spin_unlock_irqrestore().

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Chinner <david@fromorbit.com>
Cc: Lachlan McIlroy <lachlan@sgi.com>
LKML-Reference: <200912130007.30541.rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
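
A sketch of the fix for one of the two helpers (close to, but not
necessarily identical with, the upstream hunk; try_wait_for_completion()
gets the same irqsave treatment):

    bool completion_done(struct completion *x)
    {
            unsigned long flags;
            int ret = 1;

            /* irq-safe: this may now be called from IRQ context. */
            spin_lock_irqsave(&x->wait.lock, flags);
            if (!x->done)
                    ret = 0;
            spin_unlock_irqrestore(&x->wait.lock, flags);

            return ret;
    }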
-
Ingo Molnar authored
commit b9889ed1 upstream

This build warning:

 kernel/sched.c: In function 'set_task_cpu':
 kernel/sched.c:2070: warning: unused variable 'old_rq'

made me realize that the forced2_migrations stat looks pretty pointless
(and a misnomer) - remove it.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit cd29fe6f upstream

Currently we try to do task placement in wake_up_new_task() after we do the
load-balance pass in sched_fork(). This yields complicated semantics in
that we have to deal with tasks on different RQs and the set_task_cpu()
calls in copy_process() and sched_fork().

Rename ->task_new() to ->task_fork() and call it from sched_fork() before
the balancing; this gives the policy a clear point to place the task.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit ab19cb23 upstream

Since set_task_clock() doesn't rely on rq->clock anymore we can simplify
the mess in ttwu().

Optimize things a bit by not fiddling with the IRQ state there.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 5afcdab7 upstream

set_task_cpu() should be rq invariant and only touch task state; it
currently fails to do so, which opens up a few races, since not all callers
hold both rq->locks.

Remove the reliance on rq->clock, as any site calling set_task_cpu() should
also do a remote clock update, which should ensure the observed time
between these two cpus is monotonic, as per
kernel/sched_clock.c:sched_clock_remote().

Therefore we can simply remove the clock_offset bits and be happy.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Hiroshi Shimamoto authored
commit 9824a2b7 upstream

cpu_nr_migrations() is not used, remove it.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4AF12A66.6020609@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 970b13ba upstream

sched: Consolidate select_task_rq() callers

Small cleanup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
[ v2: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Thomas Gleixner authored
commit dba091b9 upstream

sched_rr_get_param calls task->sched_class->get_rr_interval(task) without
protection against a concurrent sched_setscheduler() call which modifies
task->sched_class.

Serialize the access with task_rq_lock(task) and hand the rq pointer into
get_rr_interval() as it's needed at least in the sched_fair implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
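
A sketch of the serialized call path described above (simplified from the
sys_sched_rr_get_interval() context; p, flags and t are assumed to come
from the surrounding syscall code):

    rq = task_rq_lock(p, &flags);   /* blocks a concurrent sched_setscheduler() */
    time_slice = p->sched_class->get_rr_interval(rq, p);
    task_rq_unlock(rq, &flags);

    jiffies_to_timespec(time_slice, &t);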
-
Thomas Gleixner authored
commit 31605683 upstream

sched_getaffinity() is not protected against a concurrent modification of
the task's affinity. Serialize the access with task_rq_lock(task).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091208202026.769251187@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Roland McGrath authored
commit eefdca04 upstream.

In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a
32-bit tracee in system call entry. A %rax value set via ptrace at the
entry tracing stop gets used whole as a 32-bit syscall number, while we
only check the low 32 bits for validity.

Fix it by truncating %rax back to 32 bits after syscall_trace_enter, in
addition to testing the full 64 bits as has already been added.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
H. Peter Anvin authored
commit c41d68a5 upstream.

compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area. A missing call could introduce
problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length. The
existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the implementation
of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either fail
or access userspace on all architectures. This should be followed by
checking the return value of compat_access_user_space() for NULL in the
callers, at which time the access_ok() in the callers can also be removed.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
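
A sketch of the new generic wrapper described above (the structure follows
the commit description; the exact length bound used upstream may differ
from the illustrative one here):

    void __user *compat_alloc_user_space(unsigned long len)
    {
            void __user *ptr;

            /* Illustrative sanity check on the length. */
            if (unlikely(len > (((compat_uptr_t)~0) >> 1)))
                    return NULL;

            /* The old per-arch implementation, now renamed. */
            ptr = arch_compat_alloc_user_space(len);

            /* The access_ok() check callers used to be responsible for. */
            if (unlikely(!access_ok(VERIFY_WRITE, ptr, len)))
                    return NULL;

            return ptr;
    }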
-
H. Peter Anvin authored
commit 36d001c7 upstream.

On 64 bits, we always, by necessity, jump through the system call table via
%rax. For 32-bit system calls, in theory the system call number is stored
in %eax, and the code was testing %eax for a valid system call number. At
one point we loaded the stored value back from the stack to enforce
zero-extension, but that was removed in checkin d4d67150. An actual 32-bit
process will not be able to introduce a non-zero-extended number, but it
can happen via ptrace.

Instead of re-introducing the zero-extension, test what we are actually
going to use, i.e. %rax. This only adds a handful of REX prefixes to the
code.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 55496c89 upstream.

Doh, a real life genuine preemption leak..

This caused a suspend failure.

Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nico Schottelius <nico-linux-20100709@schottelius.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Pritz <flo@xssn.at>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <1284150773.402.122.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Johannes Berg authored
commit 42da2f94 upstream.

Wireless extensions have an unfortunate, undocumented requirement which
requires drivers to always fill iwp->length when returning a successful
status. When a driver doesn't do this, it leads to a kernel heap content
leak when userspace offers a larger buffer than would have been necessary.

Arguably, this is a driver bug, as it should, if it returns 0, fill
iwp->length, even if it separately indicated that the buffer contents was
not valid.

However, we can also at least avoid the memory content leak if the driver
doesn't do this by setting the iwp length to max_tokens, which then
reflects how big the buffer is that the driver may fill, regardless of how
big the userspace buffer is.

To illustrate the point, this patch also fixes a corresponding cfg80211 bug
(since this requirement isn't documented nor was ever pointed out by anyone
during code review, I don't trust all drivers nor all cfg80211 handlers to
implement it correctly).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
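
A sketch of the point being made (simplified from the standard iw_point
handler path in wext; dev, info, descr and extra are assumptions taken from
that context, not the literal hunk):

    /*
     * Size the reply by the kernel-side buffer, not the (possibly larger)
     * length userspace offered; a driver that fills iwp->length correctly
     * will simply overwrite this.
     */
    iwp->length = descr->max_tokens;

    err = handler(dev, info, (union iwreq_data *) iwp, extra);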
-
John W. Linville authored
commit d8e1ba76 upstream.

This avoids a NULL pointer dereference as reported here:

  https://bugzilla.redhat.com/show_bug.cgi?id=625889

When the WARN condition is hit in ieee80211_get_tx_rate, it will return
NULL. So, we need to check the return value and avoid dereferencing it in
that case.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
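
A sketch of the driver-side defensive check (illustrative; hw, info and the
error path are assumptions, not the literal hunk):

    struct ieee80211_rate *rate;

    rate = ieee80211_get_tx_rate(hw, info);
    if (unlikely(!rate)) {
            /*
             * The WARN already fired inside ieee80211_get_tx_rate();
             * drop the frame instead of dereferencing NULL.
             */
            return;
    }

    /* safe to use rate->hw_value, rate->bitrate, ... */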
-
Christian Lamparter authored
commit f880c205 upstream.

Michael reported that p54* never really entered power save mode, even
though it was enabled.

It turned out that upon a power save mode change the firmware will set a
special flag onto the last outgoing frame tx status (which in this case is
almost always the designated PSM nullfunc frame). This flag confused the
driver; it erroneously reported transmission failures to the stack, which
then generated the next nullfunc. and so on...

Reported-by: Michael Buesch <mb@bu3sch.de>
Tested-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Frederic Weisbecker authored
commit 5225c458 upstream.

Each histogram entry has a callchain root that stores the callchain
samples. However we forgot to initialize the tracking of children hits of
these roots, which then got random values on their creation.

The root children hits is multiplied by the minimum percentage of hits
provided by the user, and the result becomes the minimum hits expected from
children branches. If the random value due to the uninitialization is big
enough, then this minimum number of hits can be huge and eventually filter
every children branches.

The end result was invisible callchains. All we need to fix this is to
initialize the children hits of the root.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
KAMEZAWA Hiroyuki authored
commit 0dcc48c1 upstream.

next_active_pageblock() is for finding the next _used_ free block. It skips
several blocks when it finds there is a chunk of free pages larger than a
pageblock. But it has 2 bugs:

 1. We have no lock, so page_order(page) - pageblock_order can be negative.
 2. pageblocks_stride += is wrong; it should skip page_order(p) pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
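
A sketch of the corrected skip logic (approximate reconstruction; the point
is that the unlocked page_order() read is range-checked and the skip uses
the actual buddy order rather than a fixed stride):

    static struct page *next_active_pageblock(struct page *page)
    {
            if (pageblock_free(page)) {
                    int order;

                    /* No lock held: page_order() can change under us. */
                    order = page_order(page);
                    if ((order < MAX_ORDER) && (order >= pageblock_order))
                            return page + (1 << order); /* skip the whole free chunk */
            }

            return page + pageblock_nr_pages;           /* next pageblock */
    }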
-
Dmitry Torokhov authored
commit af045b86 upstream.

We need to call platform_device_unregister(i8042_platform_device) before
calling platform_driver_unregister() because i8042_remove() resets
i8042_platform_device to NULL. This leaves the platform device instance
behind and prevents driver reload.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=16613

Reported-by: Seryodkin Victor <vvscore@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
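
A sketch of the corrected teardown order (simplified; remaining platform
cleanup elided):

    static void __exit i8042_exit(void)
    {
            /*
             * The device must go first: i8042_remove() dereferences and then
             * clears i8042_platform_device, so the driver must still be bound.
             */
            platform_device_unregister(i8042_platform_device);
            platform_driver_unregister(&i8042_driver);

            /* ... remaining platform teardown elided ... */
    }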
-