- 02 Mar, 2011 11 commits
-
-
Jeremy Fitzhardinge authored
commit e7a3481c upstream. If the guest domain has been suspended/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (i.e., it can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by:
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
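A userspace sketch of the clamp logic the changelog relies on (illustrative names; the kernel's version works on a shared atomic64 last_value with cmpxchg): readers never see time go backwards, and the resume path forgets the old maximum so a smaller post-migration clock is accepted again.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_value;

uint64_t pvclock_read(uint64_t system_time)
{
    uint64_t last = atomic_load(&last_value);
    uint64_t ret = system_time;

    /* never let the returned time go backwards */
    do {
        if (ret < last)
            return last;
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));

    return ret;
}

void pvclock_resume(void)
{
    /* after migration/save-restore the host clock may be smaller;
       forget the old maximum, or the guest clock would appear to
       stall until the new clock catches up */
    atomic_store(&last_value, 0);
}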
-
Alan Stern authored
commit 3df7169e upstream. This patch (as1417) fixes a problem affecting some (or all) nVidia chipsets. When the computer is shut down, the OHCI controllers continue to power the USB buses and evidently they drive a Reset signal out all their ports. This prevents attached devices from going to low power. Mouse LEDs stay on, for example, which is disconcerting for users and a drain on laptop batteries. The fix involves leaving each OHCI controller in the OPERATIONAL state during system shutdown rather than putting it in the RESET state. Although this nominally means the controller is running, in fact it's not doing very much since all the schedules are all disabled. However there is ongoing DMA to the Host Controller Communications Area, so the patch also disables the bus-master capability of all PCI USB controllers after the shutdown routine runs. The fix is applied only to nVidia-based PCI OHCI controllers, so it shouldn't cause problems on systems using other hardware. As an added safety measure, in case the kernel encounters one of these running controllers during boot, the patch changes quirk_usb_handoff_ohci() (which runs early on during PCI discovery) to reset the controller before anything bad can happen. Reported-by:
Pali Rohár <pali.rohar@gmail.com> Signed-off-by:
Alan Stern <stern@rowland.harvard.edu> CC: David Brownell <david-b@pacbell.net> Tested-by:
Pali Rohár <pali.rohar@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Shaohua Li authored
commit 39fe05e5 upstream. If the CPU supports an always-running local APIC timer, the per-CPU HPET timers can be disabled, since they are useless and wasteful in that case. Let's leave the timers to other users. The effect is that we reserve fewer timers. Signed-off-by:
Shaohua Li <shaohua.li@intel.com> Cc: venkatesh.pallipadi@intel.com LKML-Reference: <20090812031612.GA10062@sli10-desk.sh.intel.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Apollon Oikonomopoulos authored
commit 171995e5 upstream. x25 does not decrement the network device reference counts on module unload. Thus unregistering any pre-existing interface after unloading the x25 module hangs and results in unregister_netdevice: waiting for tap0 to become free. Usage count = 1 This patch decrements the reference counts of all interfaces in x25_link_free, the way it is already done in x25_link_device_down for NETDEV_DOWN events. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
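A toy model of the refcount imbalance, with dev_hold/dev_put reduced to a plain counter; the structure names are illustrative, not the real x25 ones:

#include <assert.h>
#include <stdio.h>

struct device { int refcnt; };

static void dev_hold(struct device *d) { d->refcnt++; }
static void dev_put(struct device *d)  { d->refcnt--; }

struct x25_link { struct device *dev; };

static void link_create(struct x25_link *l, struct device *d)
{
    l->dev = d;
    dev_hold(d);              /* reference taken at link setup */
}

static void link_free(struct x25_link *l)
{
    dev_put(l->dev);          /* the fix: drop the reference here too */
}

int main(void)
{
    struct device tap0 = { .refcnt = 0 };
    struct x25_link l;

    link_create(&l, &tap0);
    link_free(&l);
    /* without the put in link_free, the count stays at 1 and
       unregister_netdevice() waits forever for it to drop */
    assert(tap0.refcnt == 0);
    printf("usage count = %d\n", tap0.refcnt);
    return 0;
}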
-
David S. Miller authored
commit 57fe93b3 upstream. There is a possibility that malicious users can get limited information about the uninitialized stack mem[] array. Even though the sk_run_filter() result is bound to the packet length (0 .. 65535), we can imagine this being used by a hostile user. Initializing the mem[] array, as Dan Rosenberg suggested in his patch, is expensive since most filters don't even use this array. It's hard to do the filter validation in sk_chk_filter() because of the jumps. This might be done later. In this patch, I use a bitmap (a single long var) so that only filters using mem[] loads/stores pay the price of the added security checks. For other filters, the additional cost is a single instruction. [ Since we access fentry->k a lot now, cache it in a local variable and mark filter entry pointer as const. -DaveM ] Reported-by:
Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> [Backported by dann frazier <dannf@debian.org>] Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
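A rough userspace sketch of the bitmap technique (not the real sk_run_filter; the opcodes and sizes are invented for illustration, and slot indices are assumed to have been range-checked at filter-load time):

#include <stdint.h>
#include <stdio.h>

#define MEM_WORDS 16

enum op { OP_ST, OP_LD };                  /* store/load scratch slot */
struct insn { enum op code; uint32_t k; }; /* k < MEM_WORDS, validated
                                              at filter-load time */

static uint32_t run_filter(const struct insn *prog, int len)
{
    uint32_t mem[MEM_WORDS];    /* intentionally left uninitialized */
    unsigned long written = 0;  /* bitmap: bit k set after a store */
    uint32_t acc = 1234;        /* accumulator stand-in */

    for (int i = 0; i < len; i++) {
        const struct insn *fentry = &prog[i];
        uint32_t k = fentry->k;     /* cache fentry->k, as in the patch */

        switch (fentry->code) {
        case OP_ST:
            mem[k] = acc;
            written |= 1UL << k;
            break;
        case OP_LD:
            /* one-instruction check: never leak stale stack data */
            acc = (written & (1UL << k)) ? mem[k] : 0;
            break;
        }
    }
    return acc;
}

int main(void)
{
    struct insn leak_attempt[] = { { OP_LD, 5 } };  /* load w/o store */
    printf("%u\n", run_filter(leak_attempt, 1));    /* prints 0 */
    return 0;
}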
-
Dan Rosenberg authored
commit 252a52aa upstream. The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a pktcdvd_device from the global pkt_devs array. The index into this array is provided directly by the user and is a signed integer, so the comparison to ensure that it falls within the bounds of this array will fail when provided with a negative index. This can be used to read arbitrary kernel memory or cause a crash due to an invalid pointer dereference. This can be exploited by users with permission to open /dev/pktcdvd/control (on many distributions, this is readable by group "cdrom"). Signed-off-by:
Dan Rosenberg <dan.j.rosenberg@gmail.com> [ Rather than add a cast, just make the function take the right type -Linus ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
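The bug class in miniature; the function and array names below are stand-ins, not the pktcdvd code itself:

#define MAX_DEVS 8
static const char *pkt_devs[MAX_DEVS];

/* broken: a signed index compared only against the upper bound lets
   a negative value (e.g. -1) slip through and index backwards out of
   the array */
const char *find_dev_bad(int dev_minor)
{
    if (dev_minor >= MAX_DEVS)
        return NULL;
    return pkt_devs[dev_minor];     /* pkt_devs[-1]: out of bounds */
}

/* fixed, in the spirit of Linus' suggestion: make the parameter
   unsigned, so a negative input becomes a huge value that the same
   single bound check rejects */
const char *find_dev_good(unsigned int dev_minor)
{
    if (dev_minor >= MAX_DEVS)
        return NULL;
    return pkt_devs[dev_minor];
}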
-
dann frazier authored
commit 226291aa upstream. If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return a pointer to the LIST_HEAD, cast as an ocfs2_live_connection. This can cause an oops when ocfs2_control_send_down() dereferences c->oc_conn: Call Trace: [<ffffffffa00c2a3c>] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user] [<ffffffffa00c2a95>] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user] [<ffffffff81143a88>] vfs_write+0xb8/0x1a0 [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0 [<ffffffff811442f1>] sys_write+0x51/0x80 [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b Fix by explicitly returning NULL if no match is found. Signed-off-by:
dann frazier <dann.frazier@canonical.com> Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
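A compact illustration of the list-search pattern and the fix; this is generic C, not the ocfs2 code:

#include <stddef.h>

struct node { struct node *next; int key; void *payload; };

/* head is a bare list head in a circular list, not a real entry */
struct node *find_bad(struct node *head, int key)
{
    struct node *c;
    for (c = head->next; c != head; c = c->next)
        if (c->key == key)
            break;
    return c;   /* BUG: on no match, c == head, and the caller then
                   dereferences fields the head doesn't really have */
}

struct node *find_good(struct node *head, int key)
{
    struct node *c;
    for (c = head->next; c != head; c = c->next)
        if (c->key == key)
            return c;
    return NULL; /* explicit "no match", as the ocfs2 fix does */
}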
-
Dan Rosenberg authored
commit 51e97a12 upstream. The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by:
Dan Rosenberg <drosenberg@vsecurity.com> Acked-by:
Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
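Sketch of the loop fix, with a made-up table standing in for the kernel's hmac list:

#include <stddef.h>
#include <stdint.h>

#define HMAC_ID_MAX 3
static const char *hmacs[HMAC_ID_MAX + 1] = { NULL, "sha1", NULL, "sha256" };

const char *get_hmac(const uint16_t *ids, int n)
{
    uint16_t id = 0;
    int i;

    for (i = 0; i < n; i++) {
        id = ids[i];
        if (id > HMAC_ID_MAX || !hmacs[id]) {
            id = 0;     /* the fix: don't let an invalid id survive
                           the loop, especially from the last entry */
            continue;
        }
        break;          /* found a supported algorithm */
    }
    if (!id)
        return NULL;    /* caller must handle "nothing usable" */
    return hmacs[id];
}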
-
Kashyap, Desai authored
commit bcfe42e9 upstream. There's a branch at the end of this function that is supposed to normalize the return value with what the mid-layer expects. In this one case, we get it wrong. Also increase the verbosity of the INFO level printk at the end of mptscsih_abort to include the actual return value and the scmd->serial_number. The reason is that success or failure is actually determined by the state of the internal tag list when a TMF is issued, and not by the return value of the TMF cmd. The serial_number is also used in this decision, thus it's useful to know for debugging purposes. Reported-by:
Peter M. Petrakis <peter.petrakis@canonical.com> Signed-off-by:
Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Kashyap, Desai authored
commit 84857c8b upstream. Add the missing release callback for the file_operations mptctl_fops. Without a release callback the entry is never freed: it remains on mptctl's event list even after the file is closed and released. The relevant RHEL bugzilla is 660871. Signed-off-by:
Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
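A userspace toy showing the shape of the omission: an open hook that sets up per-file state needs a matching release hook to tear it down. The names and the malloc'd "entry" are illustrative, not the MPT driver's:

#include <stdlib.h>

struct file { void *private_data; };

struct file_operations {
    int (*open)(struct file *);
    int (*release)(struct file *);   /* this hook was missing */
};

static int toy_open(struct file *f)
{
    f->private_data = malloc(64);    /* per-open event-list entry */
    return f->private_data ? 0 : -1;
}

static int toy_release(struct file *f)
{
    free(f->private_data);           /* without this, every close leaks */
    f->private_data = NULL;
    return 0;
}

const struct file_operations toy_fops = {
    .open    = toy_open,
    .release = toy_release,
};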
-
Konstantin Khorenko authored
commit 3aa6e0aa upstream. If nfsd fails to find a file exported via NFS in the readahead cache, it should increment the corresponding nfsdstats counter (ra_depth[10]), but due to a bug it may instead write to ra_depth[11], corrupting the following field. In a kernel with NFSDv4 compiled in, the corruption takes the form of an increment of a counter of the number of NFSv4 operation 0's received; since there is no operation 0, this is harmless. In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the memory beyond nfsdstats. Signed-off-by:
Konstantin Khorenko <khorenko@openvz.org> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
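The off-by-one in miniature, with invented field names; the real nfsdstats layout differs, but the failure mode is the same:

#define RA_DEPTH_SLOTS 11            /* 0..9: hit depth, 10: cache miss */

struct stats {
    unsigned long ra_depth[RA_DEPTH_SLOTS];
    unsigned long next_counter;      /* whatever follows in memory */
};

static struct stats st;

void record_ra_lookup(unsigned int depth, int miss)
{
    /* the bug amounted to computing an index of 11 for the miss
       case, landing on next_counter; clamping to the last valid
       slot keeps the increment inside the array */
    unsigned int i = miss ? RA_DEPTH_SLOTS - 1 : depth;
    if (i >= RA_DEPTH_SLOTS)
        i = RA_DEPTH_SLOTS - 1;
    st.ra_depth[i]++;
}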
-
- 18 Feb, 2011 1 commit
-
-
Greg Kroah-Hartman authored
-
- 17 Feb, 2011 28 commits
-
-
Namhyung Kim authored
commit 571428be upstream. free_user() releases uidhash_lock but was missing the annotation. Add it. This removes the following sparse warnings:
include/linux/spinlock.h:339:9: warning: context imbalance in 'free_user' - unexpected unlock
kernel/user.c:120:6: warning: context imbalance in 'free_uid' - wrong count at exit
Signed-off-by:
Namhyung Kim <namhyung@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dhaval Giani <dhaval.giani@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
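For reference, this is roughly what the annotation looks like; the macro stub makes the declaration valid C outside a kernel tree, while sparse (which defines __CHECKER__) sees the context attribute and stops reporting the imbalance:

#ifdef __CHECKER__                       /* defined when sparse runs */
#define __releases(x) __attribute__((context(x, 1, 0)))
#else
#define __releases(x)
#endif

struct user_struct;
struct spinlock;
extern struct spinlock uidhash_lock;

/* entered with uidhash_lock held; unlocks it before returning */
void free_user(struct user_struct *up, unsigned long flags)
    __releases(&uidhash_lock);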
-
Dan Carpenter authored
commit 618765801ebc271fe0ba3eca99fcfd62a1f786e1 upstream. This was left over from "7c941438 sched: Remove USER_SCHED" Signed-off-by:
Dan Carpenter <error27@gmail.com> Acked-by:
Dhaval Giani <dhaval.giani@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> LKML-Reference: <20100315082148.GD18181@bicker> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
Commit: e51fd5e2 upstream Mike reports that since e9e9250b (sched: Scale down cpu_power due to RT tasks), wake_affine() goes funny on RT tasks due to them still having a !0 weight and wake_affine() still subtracts that from the rq weight. Since nobody should be using se->weight for RT tasks, set the value to zero. Also, since we now use ->cpu_power to normalize rq weights to account for RT cpu usage, add that factor into the imbalance computation. Reported-by:
Mike Galbraith <efault@gmx.de> Tested-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1275316109.27810.22969.camel@twins> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nikhil Rao authored
Commit: d5ad140b upstream An earlier commit reverts idle balancing throttling reset to fix a 30% regression in volanomark throughput. We still need to reset idle_stamp when we pull a task in newidle balance. Reported-by:
Alex Shi <alex.shi@intel.com> Signed-off-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290022924-3548-1-git-send-email-ncrao@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Alex Shi authored
Commit: b5482cfa upstream Commit fab47622 triggers excessive idle balancing, causing a ~30% loss in volanomark throughput. Remove idle balancing throttle reset. Originally-by:
Alex Shi <alex.shi@intel.com> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289928732.5169.211.camel@maggy.simson.net> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
Commit: 1e5a7405 upstream Instead of dealing with sched classes inside each check_preempt_curr() implementation, pull out this logic into the generic wakeup preemption path. This fixes a hang in KVM (and others) where we are waiting for the stop machine thread to run ... Reported-by:
Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by:
Marcelo Tosatti <mtosatti@redhat.com> Tested-by:
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1288891946.2039.31.camel@laptop> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Suresh Siddha authored
Commit: aae6d3dd upstream Currently we consider a sched domain to be well balanced when the imbalance is less than the domain's imbalance_pct. As the number of cores and threads increases, current values of imbalance_pct (for example 25% for a NUMA domain) are not enough to detect imbalances like: a) On a WSM-EP system (two sockets, each having 6 cores and 12 logical threads), 24 cpu-hogging tasks get scheduled as 13 on one socket and 11 on another socket, leading to an idle HT cpu. b) On a hypothetical 2 socket NHM-EX system (each socket having 8 cores and 16 logical threads), 16 cpu-hogging tasks can get scheduled as 9 on one socket and 7 on another socket, leaving one core in a socket idle whereas in another socket we have a core having both its HT siblings busy. While this issue can be fixed by decreasing the domain's imbalance_pct (by making it a function of the number of logical cpus in the domain), it can potentially cause more task migrations across sched groups in an overloaded case. Fix this by using imbalance_pct only during newly_idle and busy load balancing. And during idle load balancing, check if there is an imbalance in the number of idle CPUs between the busiest group and this sched_group, or if the busiest group has more tasks than its weight that the idle cpu in this_group can pull. Reported-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1284760952.2676.11.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Peter Zijlstra authored
Commit: b2b5ce02 upstream Dima noticed that we fail to correct the ->vruntime of sleeping tasks when we move them between cgroups. Reported-by:
Dima Zavin <dima@android.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by:
Mike Galbraith <efault@gmx.de> LKML-Reference: <1287150604.29097.1513.camel@twins> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
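Conceptually, the fix re-bases a sleeping task's vruntime when it changes runqueues; a minimal sketch with an invented struct layout, since vruntime is only meaningful relative to its own runqueue's min_vruntime:

struct cfs_rq { unsigned long long min_vruntime; };

/* called when a (sleeping) task is moved between group runqueues */
void normalize_vruntime(unsigned long long *vruntime,
                        const struct cfs_rq *from,
                        const struct cfs_rq *to)
{
    *vruntime -= from->min_vruntime;  /* make it relative to the old rq */
    *vruntime += to->min_vruntime;    /* re-base onto the new rq */
}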
-
Ingo Molnar authored
Commit: b7dadc38 upstream KVM uses it for example: ERROR: "account_system_vtime" [arch/x86/kvm/kvm.ko] undefined! Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-3-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: d267f87f upstream When the CPU is idle, on the first interrupt irq_enter calls tick_check_idle() to signal the interruption from idle. But there is a problem if this call is done after __irq_enter, as all routines in __irq_enter may see stale time because tick_check_idle has not yet run. This specifically affects trace calls in __irq_enter when they use the global clock, and also the account_system_vtime change in this patch, which wants to use sched_clock_cpu() to do proper irq timing. However, tick_check_idle was moved after __irq_enter intentionally, to prevent unneeded ksoftirqd wakeups, by commit ee5f80a9: irq: call __irq_enter() before calling the tick_idle_check Impact: avoid spurious ksoftirqd wakeups Moving tick_check_idle() before __irq_enter and wrapping it with local_bh_enable/disable solves both problems. Fixed-by:
Yong Zhang <yong.zhang0@gmail.com> Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-9-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
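The resulting ordering, paraphrased from the description above; the declarations are stand-ins so the sketch compiles outside the kernel:

void rcu_irq_enter(void);
int  idle_cpu(int cpu);
int  in_interrupt(void);
void local_bh_disable(void);
void _local_bh_enable(void);
void tick_check_idle(int cpu);
void __irq_enter(void);
int  smp_processor_id(void);

void irq_enter(void)
{
    int cpu = smp_processor_id();

    rcu_irq_enter();
    if (idle_cpu(cpu) && !in_interrupt()) {
        local_bh_disable();   /* avoid spurious ksoftirqd wakeups */
        tick_check_idle(cpu); /* flush stale time before __irq_enter */
        _local_bh_enable();
    }
    __irq_enter();
}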
-
Venkatesh Pallipadi authored
Commit: aa483808 upstream The idea was suggested by Peter Zijlstra here: http://marc.info/?l=linux-kernel&m=127476934517534&w=2 irq time is technically not available to the tasks running on the CPU. This patch removes irq time from CPU power, piggybacking on sched_rt_avg_update(). I tested this by keeping CPU X busy with a network intensive task having 75% of a single CPU's worth of irq processing (hard+soft) on a 4-way system, and starting seven cycle soakers on the system. Without this change, there will be two tasks on each CPU. With this change, there is a single task on the irq-busy CPU X and the remaining 7 tasks are spread among the other 3 CPUs. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-8-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
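A back-of-envelope version of the scaling idea; the kernel's actual arithmetic differs (it flows through sched_rt_avg_update and scale_rt_power), this only illustrates the proportionality:

unsigned long scale_power_for_irq(unsigned long power,
                                  unsigned long long irq_time,
                                  unsigned long long total_time)
{
    if (!total_time || irq_time >= total_time)
        return 1;   /* never advertise zero capacity */
    /* capacity shrinks by the fraction of time eaten by irqs */
    return (unsigned long)(power * (total_time - irq_time) / total_time);
}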
-
Venkatesh Pallipadi authored
Commit: 305e6835 upstream The scheduler accounts both softirq and interrupt processing times to the currently running task. This means that if the interrupt processing was for some other task in the system, then the current task ends up being penalized, as it gets a shorter runtime than otherwise. Change sched task accounting to account only actual task time to the currently running task. Now update_curr() modifies the delta_exec to depend on rq->clock_task. Note that this change only handles the CONFIG_IRQ_TIME_ACCOUNTING case. We can extend this to CONFIG_VIRT_CPU_ACCOUNTING with minimal effort. But that's for later. This change will impact scheduling behavior in interrupt heavy conditions. Tested on a 4-way system with eth0 handled by CPU 2 and a network heavy task (nc) running on CPU 3 (and no RSS/RFS). With that I have CPU 2 spending 75%+ of its time in irq processing and CPU 3 spending around 35% of its time running the nc task. Now, if I run another CPU intensive task on CPU 2, without this change /proc/<pid>/schedstat shows 100% of time accounted to this task. With this change, it rightly shows less than 25% accounted to this task, as the remaining time is actually spent on irq processing. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-7-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
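A minimal model of the two-clock split described above, with an invented struct; the kernel's update path is more involved, but the invariant is the same: task runtime deltas come from clock_task, which excludes irq time.

struct rq_clocks {
    unsigned long long clock;       /* scheduler clock, keeps running */
    unsigned long long clock_task;  /* clock minus hard+soft irq time */
};

void update_rq_clock(struct rq_clocks *rq,
                     unsigned long long delta,
                     unsigned long long irq_delta)
{
    rq->clock += delta;
    if (irq_delta > delta)
        irq_delta = delta;          /* irq time can't exceed elapsed */
    rq->clock_task += delta - irq_delta;
}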
-
Venkatesh Pallipadi authored
Commit: e82b8e4e upstream This patch adds the IRQ_TIME_ACCOUNTING option on x86 and runtime-enables it when TSC is enabled. This change just enables fine-grained irq time accounting; it isn't used yet. The following patches use it for different purposes. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: b52bfee4 upstream s390/powerpc/ia64 have support for CONFIG_VIRT_CPU_ACCOUNTING which does the fine granularity accounting of user, system, hardirq, softirq times. Adding that option on archs like x86 will be challenging however, given the state of TSC reliability on various platforms and also the overhead it will add in syscall entry exit. Instead, add a lighter variant that only does finer accounting of hardirq and softirq times, providing precise irq times (instead of timer tick based samples). This accounting is added with a new config option CONFIG_IRQ_TIME_ACCOUNTING so that there won't be any overhead for users not interested in paying the perf penalty. This accounting is based on sched_clock, with the code being generic. So, other archs may find it useful as well. This patch just adds the core logic and does not enable this logic yet. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-5-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: 6cdd5199 upstream To account softirq time cleanly in scheduler, we need to identify whether softirq is invoked in ksoftirqd context or softirq at hardirq tail context. Add PF_KSOFTIRQD for that purpose. As all PF flag bits are currently taken, create space by moving one of the infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along with some other state fields. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-4-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Dave Young authored
Commit: 637bbdc5 upstream PF_ALIGNWARN is not implemented and, as its comment says, is meant for the 486. It is unlikely that someone will ever implement this flag's feature. So here remove this flag and leave the valuable 0x00000001 free for future use. Signed-off-by:
Dave Young <hidave.darkstar@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20100913121903.GB22238@darkstar> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: e1e10a26 upstream Just a minor cleanup patch that makes things easier to the following patches. No functionality change in this patch. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-3-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: 75e1056f upstream Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html The problem is that softirq processing uses local_bh_disable internally. There is no way, later in the flow, to differentiate whether softirq is being processed or bh has merely been disabled. So, a hardirq arriving while bh is disabled results in time being wrongly accounted as softirq. Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well, as account_system_time() in normal tick-based accounting also uses softirq_count, which will be set even when not in softirq with bh disabled. Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq count for local_bh_{disable,enable} and just SOFTIRQ_OFFSET while processing softirq. The patch below does that and adds an API, in_serving_softirq(), which returns whether we are currently processing softirq or not. It also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. It looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis. Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
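A self-contained model of the counting trick (offsets simplified; the kernel derives them from the preempt_count bit layout): bh-disable bumps the softirq field by two, serving a softirq bumps it by exactly one, so the low bit of the field distinguishes the two states that softirq_count() alone conflates.

#define SOFTIRQ_SHIFT   8
#define SOFTIRQ_OFFSET  (1 << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK    (0xff << SOFTIRQ_SHIFT)

static int preempt_count;

#define softirq_count()      (preempt_count & SOFTIRQ_MASK)
#define in_softirq()         (softirq_count() != 0)
#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)

void local_bh_disable(void)    { preempt_count += 2 * SOFTIRQ_OFFSET; }
void local_bh_enable(void)     { preempt_count -= 2 * SOFTIRQ_OFFSET; }
void softirq_serve_enter(void) { preempt_count += SOFTIRQ_OFFSET; }
void softirq_serve_exit(void)  { preempt_count -= SOFTIRQ_OFFSET; }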
-
Nikhil Rao authored
Commit: 75dd321d upstream When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1 only if the local group has extra capacity. The extra check prevents the case where you always pull from the heaviest group when it is already under-utilized (possible when a large-weight task outweighs the rest of the tasks on the system). For example, consider a 16-cpu quad-core quad-socket machine with MC and NUMA scheduling domains. Let's say we spawn 15 nice0 tasks and one nice-15 task, and each task is running on one core. In this case, we observe the following events when balancing at the NUMA domain:
- find_busiest_group() will always pick the sched group containing the niced task to be the busiest group.
- find_busiest_queue() will then always pick one of the cpus running the nice0 task (never picks the cpu with the nice -15 task since weighted_cpuload > imbalance).
- The load balancer fails to migrate the task since it is the running task and increments sd->nr_balance_failed.
- It repeats the above steps a few more times until sd->nr_balance_failed > 5, at which point it kicks off the active load balancer, wakes up the migration thread and kicks the nice 0 task off the cpu.
The load balancer doesn't stop until we kick out all nice 0 tasks from the sched group, leaving you with 3 idle cpus and one cpu running the nice -15 task. When balancing at the NUMA domain, we drop sgs.group_capacity to 1 if the child domain (in this case MC) has SD_PREFER_SIBLING set. Subsequent load checks are not relevant because the niced task has a very large weight. In this patch, we add an extra condition to the "if (prefer_sibling)" check in update_sd_lb_stats(). We drop the capacity of a group only if the local group has extra capacity, ie. nr_running < group_capacity. This patch preserves the original intent of the prefer_siblings check (to spread tasks across the system in low utilization scenarios) and fixes the case above. It helps in the following ways:
- In low utilization cases (where nr_tasks << nr_cpus), we still drop group_capacity down to 1 if we prefer siblings.
- On very busy systems (where nr_tasks >> nr_cpus), sgs.nr_running will most likely be > sgs.group_capacity.
- When balancing large weight tasks, if the local group does not have extra capacity, we do not pick the group with the niced task as the busiest group. This prevents failed balances, active migration and the under-utilization described above.
Signed-off-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1287173550-30365-5-git-send-email-ncrao@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nikhil Rao authored
Commit: fab47622 upstream This patch forces a load balance on a newly idle cpu when the local group has extra capacity and the busiest group does not have any. It improves system utilization when balancing tasks with a large weight differential. Under certain situations, such as a niced down task (i.e. nice = -15) in the presence of nr_cpus NICE0 tasks, the niced task lands on a sched group and kicks away other tasks because of its large weight. This leads to sub-optimal utilization of the machine. Even though the sched group has capacity, it does not pull tasks because sds.this_load >> sds.max_load, and f_b_g() returns NULL. With this patch, if the local group has extra capacity, we shortcut the checks in f_b_g() and try to pull a task over. A sched group has extra capacity if the group capacity is greater than the number of running tasks in that group. Thanks to Mike Galbraith for discussions leading to this patch and for the insight to reuse SD_NEWIDLE_BALANCE. Signed-off-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1287173550-30365-4-git-send-email-ncrao@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nikhil Rao authored
Commit: 2582f0eb upstream When cycling through sched groups to determine the busiest group, set group_imb only if the busiest cpu has more than 1 runnable task. This patch fixes the case where two cpus in a group have one runnable task each, but there is a large weight differential between these two tasks. The load balancer is unable to migrate any task from this group, and hence we do not consider this group to be imbalanced. Signed-off-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286996978-7007-3-git-send-email-ncrao@google.com> [ small code readability edits ] Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nikhil Rao authored
Commit: ef8002f6 upstream This patch adds a check in task_hot to return if the task has SCHED_IDLE policy. SCHED_IDLE tasks have very low weight, and when run with regular workloads, are typically scheduled many milliseconds apart. There is no need to consider these tasks hot for load balancing. Signed-off-by:
Nikhil Rao <ncrao@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1287173550-30365-2-git-send-email-ncrao@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
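The check in miniature, with a toy task struct; SCHED_IDLE's numeric value here matches Linux, but the rest is illustrative:

#define SCHED_NORMAL 0
#define SCHED_IDLE   5

struct task { int policy; unsigned long long last_ran; };

static unsigned long long sysctl_migration_cost = 500000ULL; /* ns */

int task_hot(const struct task *p, unsigned long long now)
{
    if (p->policy == SCHED_IDLE)   /* the added check: never "hot" */
        return 0;
    return now - p->last_ran < sysctl_migration_cost;
}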
-
Peter Zijlstra authored
Commit: 6506cf6c upstream This addresses the following RCU lockdep splat: [0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03 [0.052999] lockdep: fixing up alternatives. [0.054105] [0.054106] =================================================== [0.054999] [ INFO: suspicious rcu_dereference_check() usage. ] [0.054999] --------------------------------------------------- [0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection! [0.054999] [0.054999] other info that might help us debug this: [0.054999] [0.054999] [0.054999] rcu_scheduler_active = 1, debug_locks = 1 [0.054999] 3 locks held by swapper/1: [0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a [0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51 [0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113 [0.054999] [0.054999] stack backtrace: [0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1 [0.054999] Call Trace: [0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3 [0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a [0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40 [0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113 [0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7 [0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b [0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b [0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724 [0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b [0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127 [0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a [0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff [0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10 [0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30 [0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff [0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10 [0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives. [0.130045] #2lockdep: fixing up alternatives. [0.203089] #3 Ok. [0.275286] Brought up 4 CPUs [0.276005] Total of 4 processors activated (16017.17 BogoMIPS). The cgroup_subsys_state structures referenced by idle tasks are never freed, because the idle tasks should be part of the root cgroup, which is not removable. The problem is that while we do in-fact hold rq->lock, the newly spawned idle thread's cpu is not yet set to the correct cpu so the lockdep check in task_group(): lockdep_is_held(&task_rq(p)->lock) will fail. But this is a chicken and egg problem. Setting the CPU's runqueue requires that the CPU's runqueue already be set. ;-) So insert an RCU read-side critical section to avoid the complaint. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Paul E. McKenney authored
Commit: b0a0f667 upstream > =================================================== > [ INFO: suspicious rcu_dereference_check() usage. ] > --------------------------------------------------- > /home/greearb/git/linux.wireless-testing/kernel/sched.c:618 invoked rcu_dereference_check() without protection! > > other info that might help us debug this: > > rcu_scheduler_active = 1, debug_locks = 1 > 1 lock held by ifup/23517: > #0: (&rq->lock){-.-.-.}, at: [<c042f782>] task_fork_fair+0x3b/0x108 > > stack backtrace: > Pid: 23517, comm: ifup Not tainted 2.6.36-rc6-wl+ #5 > Call Trace: > [<c075e219>] ? printk+0xf/0x16 > [<c0455842>] lockdep_rcu_dereference+0x74/0x7d > [<c0426854>] task_group+0x6d/0x79 > [<c042686e>] set_task_rq+0xe/0x57 > [<c042f79e>] task_fork_fair+0x57/0x108 > [<c042e965>] sched_fork+0x82/0xf9 > [<c04334b3>] copy_process+0x569/0xe8e > [<c0433ef0>] do_fork+0x118/0x262 > [<c076302f>] ? do_page_fault+0x16a/0x2cf > [<c044b80c>] ? up_read+0x16/0x2a > [<c04085ae>] sys_clone+0x1b/0x20 > [<c04030a5>] ptregs_clone+0x15/0x30 > [<c0402f1c>] ? sysenter_do_call+0x12/0x38 Here a newly created task is having its runqueue assigned. The new task is not yet on the tasklist, so it cannot go away. This is therefore a false positive; suppress it with an RCU read-side critical section. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
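Both this fix and the init_idle fix above share the same shape, sketched here with stub declarations for the kernel internals:

void rcu_read_lock(void);
void rcu_read_unlock(void);
struct task_struct;
void __set_task_cpu(struct task_struct *p, unsigned int cpu);

/* the task-group lookup reached via __set_task_cpu() ->
   set_task_rq() -> task_group() is wrapped in an RCU read-side
   critical section purely to satisfy the lockdep check; the
   structures cannot actually go away in these windows */
void set_cpu_for_new_task(struct task_struct *p, unsigned int cpu)
{
    rcu_read_lock();
    __set_task_cpu(p, cpu);
    rcu_read_unlock();
}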
-
stable-bot for Steven Rostedt authored
From: Steven Rostedt <srostedt@redhat.com> Commit: b3bc211c upstream If a high priority task is waking up on a CPU that is running a lower priority task that is bound to a CPU, see if we can move the high RT task to another CPU first. Note, if all other CPUs are running higher priority tasks than the CPU-bound current task, then it will be preempted regardless. Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Gregory Haskins <ghaskins@novell.com> LKML-Reference: <20100921024138.888922071@goodmis.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Steven Rostedt authored
Commit: 43fa5460 upstream When first working on the RT scheduler design, we concentrated on keeping all CPUs running RT tasks instead of having multiple RT tasks on a single CPU waiting for the migration thread to move them. Instead we take a more proactive stance and push or pull RT tasks from one CPU to another on wakeup or scheduling. When an RT task wakes up on a CPU that is running another RT task, instead of preempting it and killing the cache of the running RT task, we look to see if we can migrate the RT task that is waking up, even if the RT task waking up is of higher priority. This may sound a bit odd, but RT tasks should be limited in migration by the user anyway. But in practice, people do not do this, which causes high prio RT tasks to bounce around the CPUs. This becomes even worse when we have priority inheritance, because a high prio task can block on a lower prio task and boost its priority. When the lower prio task wakes up the high prio task, if it happens to be on the same CPU it will migrate off of it. But in reality, the above does not happen much either, because if the lower prio task, which has already been boosted, was on the same CPU as the higher prio task, the woken task would then migrate off of it. But anyway, we do not want to migrate them either. To examine the scheduling, I created a test program and examined it under kernelshark. The test program created CPU * 2 threads, where each thread had a different priority. The program takes different options. The options used in this change log were to have priority inheritance mutexes or not. All threads did the following loop:

static void grab_lock(long id, int iter, int l)
{
	ftrace_write("thread %ld iter %d, taking lock %d\n", id, iter, l);
	pthread_mutex_lock(&locks[l]);
	ftrace_write("thread %ld iter %d, took lock %d\n", id, iter, l);
	busy_loop(nr_tasks - id);
	ftrace_write("thread %ld iter %d, unlock lock %d\n", id, iter, l);
	pthread_mutex_unlock(&locks[l]);
}

void *start_task(void *id)
{
	[...]
	while (!done) {
		for (l = 0; l < nr_locks; l++) {
			grab_lock(id, i, l);
			ftrace_write("thread %ld iter %d sleeping\n", id, i);
			ms_sleep(id);
		}
		i++;
	}
	[...]
}

The busy_loop(ms) keeps the CPU spinning for ms milliseconds. The ms_sleep(ms) sleeps for ms milliseconds. The ftrace_write() writes to the ftrace buffer to help analyze via ftrace. The higher the id, the higher the prio, the shorter it does the busy loop, but the longer it spins. This is usually the case with RT tasks; the lower priority tasks usually run longer than higher priority tasks. At the end of the test, it records the number of loops each thread took, as well as the number of voluntary preemptions, non-voluntary preemptions, and number of migrations each thread took, taking the information from /proc/$$/sched and /proc/$$/status. Running this on a 4 CPU processor, the results without changes to the kernel looked like this:

Task vol nonvol migrated iterations
---- --- ------ -------- ----------
0: 53 3220 1470 98
1: 562 773 724 98
2: 752 933 1375 98
3: 749 39 697 98
4: 758 5 515 98
5: 764 2 679 99
6: 761 2 535 99
7: 757 3 346 99
total: 5156 4977 6341 787

Each thread regardless of priority migrated a few hundred times. The higher priority tasks were a little better but still took quite an impact.
By letting higher priority tasks bump the lower prio task from the CPU, things changed a bit:

Task vol nonvol migrated iterations
---- --- ------ -------- ----------
0: 37 2835 1937 98
1: 666 1821 1865 98
2: 654 1003 1385 98
3: 664 635 973 99
4: 698 197 352 99
5: 703 101 159 99
6: 708 1 75 99
7: 713 1 2 99
total: 4843 6594 6748 789

The total # of migrations did not change (several runs showed the difference all within the noise). But we now see a dramatic improvement to the higher priority tasks. (kernelshark showed that the watchdog timer bumped the highest priority task to give it the 2 count. This was actually consistent with every run). Notice that the # of iterations did not change either. The above was with priority inheritance mutexes. That is, when the higher priority task blocked on a lower priority task, the lower priority task would inherit the higher priority task's priority (which shows why task 6 was bumped so many times). When not using priority inheritance mutexes, the current kernel shows this:

Task vol nonvol migrated iterations
---- --- ------ -------- ----------
0: 56 3101 1892 95
1: 594 713 937 95
2: 625 188 618 95
3: 628 4 491 96
4: 640 7 468 96
5: 631 2 501 96
6: 641 1 466 96
7: 643 2 497 96
total: 4458 4018 5870 765

Not much changed with or without priority inheritance mutexes. But if we let the high priority task bump lower priority tasks on wakeup we see:

Task vol nonvol migrated iterations
---- --- ------ -------- ----------
0: 115 3439 2782 98
1: 633 1354 1583 99
2: 652 919 1218 99
3: 645 713 934 99
4: 690 3 3 99
5: 694 1 4 99
6: 720 3 4 99
7: 747 0 1 100

Which shows an even bigger change. The big difference between task 3 and task 4 is because we have only 4 CPUs on the machine, causing the 4 highest prio tasks to always have preference. Although I did not measure cache misses, and I'm sure there would be little to measure since the test was not data intensive, I could imagine large improvements for higher priority tasks when dealing with lower priority tasks. Thus, I'm satisfied with making the change and agreeing with what Gregory Haskins argued a few years ago when we first had this discussion. One final note. All tasks in the above tests were RT tasks. Any RT task will always preempt a non-RT task that is running on the CPU the RT task wants to run on. Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Gregory Haskins <ghaskins@novell.com> LKML-Reference: <20100921024138.605460343@goodmis.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Venkatesh Pallipadi authored
Commit: 58b26c4c upstream The scheduler uses cache_nice_tries as an indicator to do cache_hot and active load balance when normal load balance fails. Currently, this value is changed on any failed load balance attempt. That ends up being not so nice to workloads that enter/exit idle often, as they do new_idle balance more frequently, and that pretty soon results in cache-hot tasks being pulled in. Making cache_nice_tries ignore failed new_idle balances seems to make better sense. With that, only failed periodic load balances get accounted, and the rate of accumulation of cache_nice_tries does not depend on idle entry/exit (short-running sleep-wakeup kinds of tasks). This reduces movement of cache_hot tasks. schedstat diff (after-before) excerpt from a workload that has a frequent and short wakeup-idle pattern (:2 in the cpu column below refers to the NEWIDLE idx). This snapshot was across ~400 seconds. Without this change:

domainstats: domain0
cpu cnt bln fld imb gain hgain nobusyq nobusyg
0:2 306487 219575 73167 110069413 44583 19070 1172 218403
1:2 292139 194853 81421 120893383 50745 21902 1259 193594
2:2 283166 174607 91359 129699642 54931 23688 1287 173320
3:2 273998 161788 93991 132757146 57122 24351 1366 160422
4:2 289851 215692 62190 83398383 36377 13680 851 214841
5:2 316312 222146 77605 117582154 49948 20281 988 221158
6:2 297172 195596 83623 122133390 52801 21301 929 194667
7:2 283391 178078 86378 126622761 55122 22239 928 177150
8:2 297655 210359 72995 110246694 45798 19777 1125 209234
9:2 297357 202011 79363 119753474 50953 22088 1089 200922
10:2 278797 178703 83180 122514385 52969 22726 1128 177575
11:2 272661 167669 86978 127342327 55857 24342 1195 166474
12:2 293039 204031 73211 110282059 47285 19651 948 203083
13:2 289502 196762 76803 114712942 49339 20547 1016 195746
14:2 264446 169609 78292 115715605 50459 21017 982 168627
15:2 260968 163660 80142 116811793 51483 21281 1064 162596

With this change:

domainstats: domain0
cpu cnt bln fld imb gain hgain nobusyq nobusyg
0:2 272347 187380 77455 105420270 24975 1 953 186427
1:2 267276 172360 86234 116242264 28087 6 1028 171332
2:2 259769 156777 93281 123243134 30555 1 1043 155734
3:2 250870 143129 97627 127370868 32026 6 1188 141941
4:2 248422 177116 64096 78261112 22202 2 757 176359
5:2 275595 180683 84950 116075022 29400 6 778 179905
6:2 262418 162609 88944 119256898 31056 4 817 161792
7:2 252204 147946 92646 122388300 32879 4 824 147122
8:2 262335 172239 81631 110477214 26599 4 864 171375
9:2 261563 164775 88016 117203621 28331 3 849 163926
10:2 243389 140949 93379 121353071 29585 2 909 140040
11:2 242795 134651 98310 124768957 30895 2 1016 133635
12:2 255234 166622 79843 104696912 26483 4 746 165876
13:2 244944 151595 83855 109808099 27787 3 801 150794
14:2 241301 140982 89935 116954383 30403 6 845 140137
15:2 232271 128564 92821 119185207 31207 4 1416 127148

Signed-off-by:
Venkatesh Pallipadi <venki@google.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1284167957-3675-1-git-send-email-venki@google.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
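The gist of the change as a sketch, with the enum and struct reduced to only the fields involved:

enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

struct sched_domain { unsigned int nr_balance_failed; };

/* on a failed balance attempt, only periodic (non-NEWIDLE) balancing
   now bumps the failure count that feeds the cache_nice_tries logic */
void note_balance_failure(struct sched_domain *sd, enum cpu_idle_type idle)
{
    if (idle != CPU_NEWLY_IDLE)
        sd->nr_balance_failed++;
}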
-
Suresh Siddha authored
Commit: da2b71ed upstream Currently sched_avg_update() (which updates rt_avg stats in the rq) is getting called from scale_rt_power() (in the load balance context) which doesn't take rq->lock. Fix it by moving the sched_avg_update() to more appropriate update_cpu_load() where the CFS load gets updated as well. Signed-off-by:
Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Mike Galbraith <efault@gmx.de> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-