- 27 Sep, 2010 10 commits
-
-
David S. Miller authored
[ Upstream commit 4ce6b9e1621c187a32a47a17bf6be93b1dc4a3df ] In a similar vein to commit 17762060 ("bridge: Clear IPCB before possible entry into IP stack"), any time we call into the IP stack we have to make sure the state there is as expected by the ipv4 code. With help from Eric Dumazet and Herbert Xu. Reported-by:
Brandan Das <brandan.das@stratus.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Herbert Xu authored
[ Upstream commit 17762060 ] The bridge protocol lives dangerously by having incestuous relations with the IP stack. In this instance an abomination has been created where a bogus IPCB area from a bridged packet leads to a crash in the IP stack because it's interpreted as IP options. This patch papers over the problem by clearing the IPCB area in that particular spot. To fix this properly we'd also need to parse any IP options if present but I'm way too lazy for that. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
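For illustration, a minimal kernel-style sketch of the clearing pattern this commit describes (the helper name and call site are ours; IPCB() and struct inet_skb_parm are the mainline definitions of the IP layer's view of skb->cb):

    #include <linux/skbuff.h>
    #include <net/ip.h>                     /* IPCB(), struct inet_skb_parm */

    /* Hypothetical helper: wipe the IP control block before a bridged skb
     * is handed to the IPv4 stack, so stale bridge data left in skb->cb
     * cannot be misread as IP options. */
    static inline void br_clear_ipcb(struct sk_buff *skb)
    {
            memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
    }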
-
Eric Dumazet authored
[ Upstream commit c5ed63d6 ] As discovered by Anton Blanchard, the current code to autotune tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and sysctl_max_syn_backlog makes little sense. The bigger a page is, the smaller tcp_max_orphans is: 4096 on a 512GB machine in Anton's case. (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket)) is much bigger if spinlock debugging is on. It's wrong to select bigger limits in this case (where kernel structures are also bigger); bhash_size maxes out at 65536, and we get this value even for small machines. A better basis is the size of the ehash table; this also makes the code shorter and more obvious. Based on a patch from Anton, and another from David. Reported-and-tested-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
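A hedged sketch of the sizing idea (the divisors and field names here are illustrative, not a verbatim copy of the upstream hunk): derive all three limits from the number of ehash slots, which tracks available memory without being skewed by page size or lock-debugging bloat.

    /* cnt = number of established-hash slots, sized from memory at boot */
    unsigned int cnt = tcp_hashinfo.ehash_mask + 1;

    tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
    sysctl_tcp_max_orphans = cnt / 2;
    sysctl_max_syn_backlog = max(128U, cnt / 256);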
-
David S. Miller authored
[ Upstream commit ad1af0fe ] As reported by Anton Blanchard, when we use percpu_counter_read_positive() to make our orphan socket limit checks, the check can be off by up to num_cpus_online() * batch (which is 32 by default), which on a 128 cpu machine can be as large as the default orphan limit itself. Fix this by doing the full, expensive sum check if the optimized check triggers. Reported-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
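A minimal sketch of the two-step check described above (the limit variable and label are illustrative; percpu_counter_read_positive() and percpu_counter_sum_positive() are the generic percpu_counter APIs):

    /* The cheap per-cpu approximation can overshoot by up to
     * num_online_cpus() * batch, so only when it appears to exceed the
     * limit do we pay for the exact sum before declaring "too many orphans". */
    int orphans = percpu_counter_read_positive(&tcp_orphan_count);

    if (orphans > orphan_limit)
            orphans = percpu_counter_sum_positive(&tcp_orphan_count);
    if (orphans > orphan_limit)
            goto too_many_orphans;          /* illustrative label */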
-
KOSAKI Motohiro authored
[ Upstream commit d84ba638 ] This issue comes from the Ruby language community. The test program below hangs only when run on Linux. % uname -mrsv Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686 % ruby -rsocket -ve ' BasicSocket.do_not_reverse_lookup = true serv = TCPServer.open("127.0.0.1", 0) s1 = TCPSocket.open("127.0.0.1", serv.addr[1]) s2 = serv.accept s2.close s1.write("a") rescue p $! s1.write("a") rescue p $! Thread.new { s1.write("a") }.join' ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux] #<Errno::EPIPE: Broken pipe> [Hang Here] FreeBSD, Solaris and Mac OS don't, because Ruby's write() method calls select() internally, and tcp_poll() has a bug. SUS defines 'ready for writing' of select() as follows. | A descriptor shall be considered ready for writing when a call to an output | function with O_NONBLOCK clear would not block, whether or not the function | would transfer data successfully. That said, the EPIPE situation is clearly one of 'ready for writing'. We don't have a read-side issue because tcp_poll() already has read-side shutdown care. | if (sk->sk_shutdown & RCV_SHUTDOWN) | mask |= POLLIN | POLLRDNORM | POLLRDHUP; So, let's insert the same logic on the write side. Reference URLs: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068 Signed-off-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
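For readers without Ruby to hand, a user-space C rendition of the reproducer (this program is ours, not part of the patch; on a fixed kernel the final select() returns immediately with the socket writable, matching the SUS wording quoted above):

    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
            struct sockaddr_in addr;
            socklen_t len = sizeof(addr);
            int serv, s1, s2;
            fd_set wfds;

            signal(SIGPIPE, SIG_IGN);       /* we want EPIPE, not a signal */

            serv = socket(AF_INET, SOCK_STREAM, 0);
            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
            addr.sin_port = 0;              /* any free port */
            bind(serv, (struct sockaddr *)&addr, sizeof(addr));
            listen(serv, 1);
            getsockname(serv, (struct sockaddr *)&addr, &len);

            s1 = socket(AF_INET, SOCK_STREAM, 0);
            connect(s1, (struct sockaddr *)&addr, sizeof(addr));
            s2 = accept(serv, NULL, NULL);
            close(s2);                      /* peer closes, like s2.close above */

            write(s1, "a", 1);              /* provokes an RST from the peer */
            usleep(100 * 1000);             /* let the RST arrive */
            if (write(s1, "a", 1) < 0)
                    printf("second write: %s\n", strerror(errno)); /* EPIPE */

            FD_ZERO(&wfds);
            FD_SET(s1, &wfds);
            /* Per SUS this must not block: a write would fail, not block. */
            if (select(s1 + 1, NULL, &wfds, NULL, NULL) > 0)
                    printf("select: socket is ready for writing\n");
            return 0;
    }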
-
David S. Miller authored
[ Upstream commit 628e300c ] If irda_open_tsap() fails, the irda_bind() code tries to destroy the ->ias_obj object by hand, but does so wrongly. In particular, it fails to a) release the hashbin attached to the object and b) reset the self->ias_obj pointer to NULL. Fix both problems by using irias_delete_object() and explicitly setting self->ias_obj to NULL, just as irda_release() does. Reported-by:
Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jarek Poplawski authored
[ Upstream commit 64289c8e ] The patch "gro: fix different skb headrooms", in its part "2) allocate a minimal skb for head of frag_list", is buggy. The copied skb has p->data set at the IP header at the moment, and skb_gro_offset is the length of the IP + TCP headers. So, after the change, the length of the mac header is skipped. Later skb_set_mac_header() sets it into the NET_SKB_PAD area (if it's long enough) and the IP header is misaligned at NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the original skb was wrongly allocated, so let's copy it as it was. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16626 Fixes commit: 3d3be433 Reported-by:
Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by:
Jarek Poplawski <jarkao2@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Tested-by:
Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Dumazet authored
[ Upstream commit 3d3be433 ] Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We can't force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption that the headrooms of fragments are the same as the head's. When CHECKSUM_PARTIAL is used, this can give csum_start errors and crash later in skb_copy_and_csum_dev(). 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by the netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate the exact padding needed, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov for testing many debugging patches! With help from Jarek Poplawski. Reported-by:
Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
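As background for the padding discussion in both GRO patches above, a hedged illustration of the allocation difference the changelog describes (the helper is ours; NET_SKB_PAD and NET_IP_ALIGN are the mainline macros):

    /* netdev_alloc_skb(dev, len) allocates len + NET_SKB_PAD bytes and
     * already reserves NET_SKB_PAD of headroom itself, so passing it a
     * length that also contains headroom yields more padding than intended.
     * Allocating with alloc_skb() and reserving explicitly keeps the
     * headroom, and hence the IP header alignment, exact. */
    static struct sk_buff *alloc_with_exact_headroom(unsigned int len)
    {
            struct sk_buff *skb;

            skb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN + len, GFP_ATOMIC);
            if (!skb)
                    return NULL;
            skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
            return skb;
    }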
-
David S. Miller authored
[ Upstream commit 1bff4dbb ] Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Dan Rosenberg authored
commit a0846f18 upstream. The TIOCGICOUNT device ioctl in both mos7720.c and mos7840.c allows unprivileged users to read uninitialized stack memory, because the "reserved" member of the serial_icounter_struct struct declared on the stack is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by:
Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
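A hedged sketch of this class of fix (the function is illustrative, not the mos7720/mos7840 code; struct serial_icounter_struct and copy_to_user() are the real kernel definitions):

    #include <linux/serial.h>       /* struct serial_icounter_struct */
    #include <linux/string.h>       /* memset() */
    #include <linux/uaccess.h>      /* copy_to_user() */

    static int get_icount_sketch(struct serial_icounter_struct __user *argp)
    {
            struct serial_icounter_struct icount;

            /* Zero the whole struct first so "reserved" fields and padding
             * never leak uninitialized kernel stack to user space. */
            memset(&icount, 0, sizeof(icount));

            /* ... fill in only the counters the driver actually tracks ... */

            if (copy_to_user(argp, &icount, sizeof(icount)))
                    return -EFAULT;
            return 0;
    }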
-
- 20 Sep, 2010 30 commits
-
-
Greg Kroah-Hartman authored
-
Chris Wilson authored
commit 356ad3cd upstream. Otherwise when disabling the output we switch to the new fb (which is likely NULL) and skip the call to mode_set -- leaking driver private state on the old_fb. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29857 Reported-by:
Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by:
Dave Airlie <airlied@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Chris Wilson authored
commit 032d2a0d upstream. Arguably this is a bug in drm-core in that we should not be called twice in succession with DPMS_ON; however, this is still occurring and we see FDI link training failures on the second call, leading to the occasional blank display. For the time being, ignore the repeated call. Original patch by Dave Airlie <airlied@redhat.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Dan Carpenter authored
commit c877cdce upstream. copy_to_user() returns the number of bytes remaining to be copied and I'm pretty sure we want to return a negative error code here. Signed-off-by:
Dan Carpenter <error27@gmail.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Dan Carpenter authored
commit 9927a403 upstream. copy_to_user returns the number of bytes remaining to be copied, but we want to return a negative error code here. These are returned to userspace. Signed-off-by:
Dan Carpenter <error27@gmail.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
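The idiom that this fix and the previous one converge on, as a minimal hedged sketch (the helper name is ours):

    /* copy_to_user() returns the number of bytes it could NOT copy, never an
     * errno, so its result must not be passed back to user space directly. */
    static int copy_reply_to_user(void __user *dst, const void *src, size_t len)
    {
            if (copy_to_user(dst, src, len))
                    return -EFAULT; /* partial or failed copy -> real error */
            return 0;
    }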
-
Trond Myklebust authored
commit 5a67657a upstream. If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just after rpc_pipe_release calls rpc_purge_list(), but before it calls gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter will free a message without deleting it from the rpci->pipe list. We will be left with a freed object on the rpc->pipe list. Most frequent symptoms are kernel crashes in rpc.gssd system calls on the pipe in question. Reported-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Trond Myklebust authored
commit b20d37ca upstream. Reported-by:
Ben Greear <greearb@candelatech.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Anton Vorontsov authored
commit 1d220334 upstream. The missing break statement causes wrong capacity calculation for batteries that report energy. Reported-by:
d binderman <dcb314@hotmail.com> Signed-off-by:
Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
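The fix itself is a one-line break statement; a small self-contained illustration of why the missing break corrupts a switch-based calculation (the enum and formulas are ours, not the driver's):

    #include <stdio.h>

    enum reading { ENERGY_NOW, CHARGE_NOW };

    /* Capacity in percent.  Without the break, an ENERGY_NOW reading falls
     * through and is recomputed with the CHARGE_NOW formula. */
    static int capacity(enum reading kind, long now, long full, int simulate_bug)
    {
            int cap = 0;

            switch (kind) {
            case ENERGY_NOW:
                    cap = (int)(now * 100 / full);
                    if (!simulate_bug)
                            break;          /* the fix: stop here */
                    /* fall through to model the missing break */
            case CHARGE_NOW:
                    cap = (int)(now * 100 / (full * 2)); /* a different formula */
                    break;
            }
            return cap;
    }

    int main(void)
    {
            printf("with break:    %d%%\n", capacity(ENERGY_NOW, 50, 100, 0)); /* 50 */
            printf("without break: %d%%\n", capacity(ENERGY_NOW, 50, 100, 1)); /* 25 */
            return 0;
    }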
-
Guillem Jover authored
commit c3b327d6 upstream. All bits in the values read from registers to be used for the next write were getting overwritten; avoid doing so, to not mess with the current configuration. Signed-off-by:
Guillem Jover <guillem@hadrons.org> Cc: Riku Voipio <riku.voipio@iki.fi> Signed-off-by:
Jean Delvare <khali@linux-fr.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Guillem Jover authored
commit 96f36408 upstream. The spec notes that fan0 and fan1 control mode bits are located in bits 7-6 and 5-4 respectively, but the FAN_CTRL_MODE macro was making the bits shift by 5 instead of by 4. Signed-off-by:
Guillem Jover <guillem@hadrons.org> Cc: Riku Voipio <riku.voipio@iki.fi> Signed-off-by:
Jean Delvare <khali@linux-fr.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
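A self-contained illustration of the off-by-one shift for a register that packs two 2-bit mode fields at bits 7-6 and 5-4 (the macros are ours, written to mirror the bug description, not the driver's actual FAN_CTRL_MODE):

    #include <stdio.h>

    /* fan0 mode lives in bits 7-6, fan1 mode in bits 5-4. */
    #define FAN_MODE_FIXED(reg, nr) (((reg) >> (6 - (nr) * 2)) & 0x03)
    #define FAN_MODE_BUGGY(reg, nr) (((reg) >> (6 - (nr))) & 0x03)

    int main(void)
    {
            unsigned int reg = 0xB0;        /* fan0 mode = 2, fan1 mode = 3 */

            /* For fan1 the correct shift is 4; the buggy macro shifts by 5. */
            printf("fan1 mode, fixed shift: %u\n", FAN_MODE_FIXED(reg, 1)); /* 3 */
            printf("fan1 mode, buggy shift: %u\n", FAN_MODE_BUGGY(reg, 1)); /* 1 */
            return 0;
    }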
-
Al Viro authored
commit 653d48b2 upstream. If a signal hits us outside of a syscall and another gets delivered when we are in sigreturn (e.g. because it had been in sa_mask for the first one and got sent to us while we'd been in the first handler), we have a chance of returning from the second handler to location one insn prior to where we ought to return. If r0 happens to contain -513 (-ERESTARTNOINTR), sigreturn will get confused into doing restart syscall song and dance. Incredible joy to debug, since it manifests as random, infrequent and very hard to reproduce double execution of instructions in userland code... The fix is simple - mark it "don't bother with restarts" in wrapper, i.e. set r8 to 0 in sys_sigreturn and sys_rt_sigreturn wrappers, suppressing the syscall restart handling on return from these guys. They can't legitimately return a restart-worthy error anyway. Testcase: #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <sys/time.h> #include <errno.h> void f(int n) { __asm__ __volatile__( "ldr r0, [%0]\n" "b 1f\n" "b 2f\n" "1:b .\n" "2:\n" : : "r"(&n)); } void handler1(int sig) { } void handler2(int sig) { raise(1); } void handler3(int sig) { exit(0); } main() { struct sigaction s = {.sa_handler = handler2}; struct itimerval t1 = { .it_value = {1} }; struct itimerval t2 = { .it_value = {2} }; signal(1, handler1); sigemptyset(&s.sa_mask); sigaddset(&s.sa_mask, 1); sigaction(SIGALRM, &s, NULL); signal(SIGVTALRM, handler3); setitimer(ITIMER_REAL, &t1, NULL); setitimer(ITIMER_VIRTUAL, &t2, NULL); f(-513); /* -ERESTARTNOINTR */ write(1, "buggered\n", 9); return 1; } Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Acked-by:
Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Takashi Iwai authored
commit b08b1637 upstream. The pin NID 0x1a should be handled as well as NID 0x1b. Also added comments. Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: David Henningsson <david.henningsson@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Takashi Iwai authored
commit 5d4abf93 upstream. Since ALC259/269 use the same parser of ALC268, the pin 0x1b was ignored as an invalid widget. Just add this NID to handle properly. This will add the missing mixer controls for some devices. Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: David Henningsson <david.henningsson@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Anton Blanchard authored
commit fa535a77 upstream When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can call cpuacct_update_stats with values much larger than percpu_counter_batch. This means the call to percpu_counter_add will always add to the global count which is protected by a spinlock and we end up with a global spinlock in the scheduler. Based on an idea by KOSAKI Motohiro, this patch scales the batch value by cputime_one_jiffy such that we have the same batch limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot but that initialisation happened too early on PowerPC (before time_init) and it was never updated at runtime as a result of a hotplug cpu add/remove. This patch instead scales percpu_counter_batch by cputime_one_jiffy at runtime, which keeps the batch correct even after cpu hotplug operations. We cap it at INT_MAX in case of overflow. For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch at least on x86 and PowerPC. So there is no need to add an #ifdef. On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel: CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec Tested with: wget http://ozlabs.org/~anton/junkcode/context_switch.c make context_switch for i in `seq 0 63`; do taskset -c $i ./context_switch & done vmstat 1 Signed-off-by:
Anton Blanchard <anton@samba.org> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by:
Balbir Singh <balbir@linux.vnet.ibm.com> Tested-by:
Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
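A hedged sketch of the scaling step described above (variable names follow the changelog's description of cpuacct_update_stats(); the exact call site is not reproduced):

    /* Scale the generic batch by cputime_one_jiffy so that typical cputime
     * updates still accumulate per cpu instead of always hitting the
     * spinlock-protected global count; cap at INT_MAX to avoid overflow. */
    s32 batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);

    __percpu_counter_add(&ca->cpustat[idx], val, batch);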
-
Suresh Siddha authored
commit 99bd5e2f upstream Issues in the current select_idle_sibling() logic in select_task_rq_fair() in the context of a task wake-up: a) Once we select the idle sibling, we use that domain (spanning the cpu that the task is currently woken up on and the idle sibling that we found) in our wake_affine() decisions. This domain is completely different from the domain (which we are supposed to use) that spans the cpu that the task is currently woken up on and the cpu where the task previously ran. b) We do the select_idle_sibling() check only for the cpu that the task is currently woken up on. If select_task_rq_fair() selects the previously run cpu for waking the task, doing a select_idle_sibling() check for that cpu also helps, and we don't do this currently. c) In scenarios where the cpu that the task is woken up on is busy but its HT siblings are idle, we select the idle HT sibling for the task instead of a core where it previously ran and which is currently completely idle. i.e., we are not taking decisions based on wake_affine() but directly selecting an idle sibling, which can cause an imbalance at the SMT/MC level that will later be corrected by the periodic load balancer. Fix this by first going through the load-imbalance calculations using wake_affine(), and once we make a decision between the woken-up cpu and the previously-ran cpu, then choose a possible idle sibling to wake the task up on. Signed-off-by:
Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 669c55e9 upstream Dave reported that his large SPARC machines spend lots of time in hweight64(), try and optimize some of those needless cpumask_weight() invocations (esp. with the large offstack cpumasks these are very expensive indeed). Reported-by:
David Miller <davem@davemloft.net> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Mike Galbraith authored
commit 8b911acd upstream Don't bother with selection when the current cpu is idle. Recent load balancing changes also make it no longer necessary to check wake_affine() success before returning the selected sibling, so we now always use it. Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301369.6785.36.camel@marge.simson.net> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Mike Galbraith authored
commit 50b926e4 upstream SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't enabled, leading to many cache misses on large machines as we traverse looking for an idle shared cache to wake to. Change the enabler of select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the sibling domain level. Reported-by:
Lin Ming <ming.m.lin@intel.com> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1262612696.15495.15.camel@marge.simson.net> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit fe3bcfe1 upstream Instead of only considering SD_WAKE_AFFINE | SD_PREFER_SIBLING domains also allow all SD_PREFER_SIBLING domains below a SD_WAKE_AFFINE domain to change the affinity target. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091112145610.909723612@chello.nl> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit a50bde51 upstream Clean up the new affine to idle sibling bits while trying to grok them. Should not have any function differences. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091112145610.832503781@chello.nl> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Daniel J Blueman authored
commit f3b577de upstream The task_group() function returns a pointer that must be protected by either RCU, the ->alloc_lock, or the cgroup lock (see the rcu_dereference_check() in task_subsys_state(), which is invoked by task_group()). The wake_affine() function currently does none of these, which means that a concurrent update would be within its rights to free the structure returned by task_group(). Because wake_affine() uses this structure only to compute load-balancing heuristics, there is no reason to acquire either of the two locks. Therefore, this commit introduces an RCU read-side critical section that starts before the first call to task_group() and ends after the last use of the "tg" pointer returned from task_group(). Thanks to Li Zefan for pointing out the need to extend the RCU read-side critical section from that proposed by the original patch. Signed-off-by:
Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
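A minimal sketch of the pattern this commit adds (only the critical-section structure is shown; the surrounding wake_affine() heuristics are elided):

    /* Every dereference of the tg pointer returned by task_group() stays
     * inside one RCU read-side critical section, so a concurrent cgroup
     * update cannot free the structure underneath us.  No lock is taken
     * because the value only feeds load-balancing heuristics. */
    rcu_read_lock();
    tg = task_group(p);
    /* ... compute the load/weight heuristics that use tg ... */
    rcu_read_unlock();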
-
Peter Zijlstra authored
commit fb58bac5 upstream As Nick pointed out, and as I realized myself when doing "sched: Fix balance vs hotplug race", the patch "sched: for_each_domain() vs RCU" is wrong: sched_domains are freed after synchronize_sched(), which means disabling preemption is enough. Reported-by:
Nick Piggin <npiggin@suse.de> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 861d034e upstream sched_fork() -- we do task placement in ->task_fork_fair(); ensure we call update_rq_clock() so we work with the current time. We leave the vruntime in relative state, so the time delay until wake_up_new_task() doesn't matter. wake_up_new_task() -- since task_fork_fair() left p->vruntime in relative state we can safely migrate; the activate_task() on the remote rq will call update_rq_clock() and cause the clock to be synced (enough). Tested-by:
Jack Daniel <wanders.thirst@gmail.com> Tested-by:
Philby John <pjohn@mvista.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1281002322.1923.1708.camel@laptop> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit cc87f76a upstream The cpuload calculation in calc_load_account_active() assumes rq->nr_uninterruptible will not change on an offline cpu after migrate_nr_uninterruptible(). However the recent migrate on wakeup changes broke that and would result in decrementing the offline cpu's rq->nr_uninterruptible. Fix this by accounting the nr_uninterruptible on the waking cpu. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 65cc8e48 upstream Now that we hold the rq->lock over set_task_cpu() again, we can do away with most of the TASK_WAKING checks and reduce them again to set_cpus_allowed_ptr(). Removes some conditionals from scheduling hot-paths. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <new-submission> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
commit 0017d735 upstream Oleg noticed a few races with the TASK_WAKING usage on fork. - since TASK_WAKING is basically a spinlock, it should be IRQ safe - since we set TASK_WAKING (*) without holding rq->lock, it could be that there still is an rq->lock holder, thereby not actually providing full serialization. (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING. Cure the second issue by not setting TASK_WAKING in sched_fork(), but only temporarily in wake_up_new_task() while calling select_task_rq(). Cure the first by holding rq->lock around the select_task_rq() call; this will disable IRQs. This however requires that we push the rq->lock release down into select_task_rq_fair()'s cgroup stuff. Because select_task_rq_fair() still needs to drop the rq->lock we cannot fully get rid of TASK_WAKING. Reported-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Oleg Nesterov authored
commit 9084bb82 upstream Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems with select_fallback_rq(). It can be called from any context and can't use any cpuset locks including task_lock(). It is called when the task doesn't have online cpus in ->cpus_allowed but ttwu/etc must be able to find a suitable cpu. I am not proud of this patch. Everything which needs such a fat comment can't be good even if correct. But I'd prefer to not change the locking rules in the code I hardly understand, and in any case I believe this simple change make the code much more correct compared to deadlocks we currently have. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091027.GA9155@redhat.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Oleg Nesterov authored
commit 6a1bdc1b upstream _cpu_down() changes the current task's affinity and then recovers it at the end. The problems are well known: we can't restore old_allowed if it was bound to the now-dead-cpu, and we can race with the userspace which can change cpu-affinity during unplug. _cpu_down() should not play with current->cpus_allowed at all. Instead, take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable() removes the dying cpu from cpu_online_mask. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Acked-by:
Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091023.GA9148@redhat.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Oleg Nesterov authored
commit 30da688e upstream. sched_exec()->select_task_rq() reads/updates ->cpus_allowed locklessly. This can race with other CPUs updating our ->cpus_allowed, and this looks meaningless to me. The task is current and running; it must have online cpus in ->cpus_allowed, so the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu, this likely means we raced with set_cpus_allowed(), which was called for a reason; why should sched_exec() retry and call ->select_task_rq() again? Change the code to call sched_class->select_task_rq() directly and do nothing if the returned cpu is wrong after re-checking under rq->lock. From now on, task_struct->cpus_allowed is always stable under TASK_WAKING, and select_fallback_rq() is always called under rq->lock or by a caller that owns TASK_WAKING (select_task_rq). Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091019.GA9141@redhat.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Oleg Nesterov authored
commit c1804d54 upstream The previous patch preserved the retry logic, but it looks unneeded. __migrate_task() can only fail if we raced with migration after we dropped the lock, but in this case the caller of set_cpus_allowed/etc must initiate migration itself if ->on_rq == T. We already fixed p->cpus_allowed, and the changes in the active/online masks must be visible to the racer; it should migrate the task to an online cpu correctly. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091014.GA9138@redhat.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-