- 12 Mar, 2010 40 commits
-
-
Oleg Nesterov authored
zap_pid_ns_processes() uses force_sig(SIGKILL) to ensure SIGKILL will be delivered to sub-namespace inits as well. This is correct, but we are going to change force_sig_info() semantics. See http://bugzilla.kernel.org/show_bug.cgi?id=15395#c31 We can use send_sig_info(SEND_SIG_NOINFO) instead, since 614c517d ("signals: SEND_SIG_NOINFO should be considered as SI_FROMUSER()") SEND_SIG_NOINFO means "from user" and therefore send_signal() will get the correct from_ancestor_ns = T flag. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Corey Minyard authored
There is no need for linux/ipmi_smi.h to include itself. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Bela Lubkin authored
Actually use the slave_addrs module parameter if it is specified, and make things consistent about passing zero in for the slave address for the default. Signed-off-by: Bela Lubkin <blubkin@vmware.com> Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Martin Wilck authored
In some cases kipmid can use a lot of CPU. This adds a way to tune the CPU used by kipmid to help in those cases. By setting kipmid_max_busy_us to a value between 100 and 500, it is possible to bring down kipmid CPU load to practically 0 without loosing too much ipmi throughput performance. Not setting the value, or setting the value to zero, operation is unaffected. Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jiri Slaby authored
Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in 3e10e716 ("resource: add helpers for fetching rlimits") or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Veaceslav Falico authored
Remove unneeded initialization in tty_audit_fork(). It is called only via copy_signal() and is useless after the kmem_cache_zalloc() was used. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Veaceslav Falico authored
Remove unneeded initializations in thread_group_cputime_init() and in posix_cpu_timers_init_group(). They are useless after kmem_cache_zalloc() was used in copy_signal(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Veaceslav Falico authored
Kill unused functions taskstats_tgid_init() and acct_init_pacct() because we don't use them anywhere after using kmem_cache_zalloc() in copy_signal(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Veaceslav Falico authored
Use kmem_cache_zalloc() on signal creation and remove unneeded initialization lines in copy_signal(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't, which is consistent with all architectures using the modern ptrace code. The old code only disables the breakpoints on PTRACE_KILL, while after this patch this also happens for PTRACE_CONT and PTRACE_SYSCALL which matches the behaviour of the other architetures. I think this is a bugfixes, but please double verify this is correct. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. The way breakpoints are disabled is entirely inconsistent currently, I tried to make some sense of it, but I suspect all of the content of ptrace_disable should be moved into user_disable_single_step, this defintively needs some revisting as the current patch changes behaviour in not quite designed ways. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT and PTRACE_KILL. This also makes PTRACE_SINGLESTEP return -EIO while it previously succeeded despite not actually causing any kind of single stepping. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. XXX: I'm not sure arch_has_single_step() is placed in the exactly correct location, please verify in which of the ptrace headers it should really be. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT and PTRACE_KILL. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT and PTRACE_KILL. This also makes PTRACE_SINGLESTEP return -EIO while it previously succeeded despite not actually causing any kind of single stepping. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: Michal Simek <monstr@monstr.eu> Cc: John Williams <john.williams@petalogix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. m68knommu already defines the nessecary user_enable_single_step and user_disable_single_step functions for this. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't which is consistent with all architectures using the modern ptrace code. Currently avr32 doesn't implement any code to disable single stepping when one of the non-syscall requests is called which seems wrong, but I've left it as-is for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't and the single stepping disable only happens if the tracee process isn't a zombie yet, which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining arch_has_single_step in <asm/ptrace.h> and implementing the user_enable_single_step and user_disable_single_step functions, which also causes the breakpoint information to be cleared on fork, which could be considered a bug fix. Also the TIF_SYSCALL_TRACE thread flag is now cleared on PTRACE_KILL which it previously wasn't, which is consistent with all architectures using the modern ptrace code. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: Matt Turner <mattst88@gmail.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
While in theory user_enable_single_step/user_disable_single_step/ user_enable_blockstep could also be provided as an inline or macro there's no good reason to do so, and having the prototype in one places keeps code size and confusion down. Roland said: The original thought there was that user_enable_single_step() et al might well be only an instruction or three on a sane machine (as if we have any of those!), and since there is only one call site inlining would be beneficial. But I agree that there is no strong reason to care about inlining it. As to the arch changes, there is only one thought I'd add to the record. It was always my thinking that for an arch where PTRACE_SINGLESTEP does text-modifying breakpoint insertion, user_enable_single_step() should not be provided. That is, arch_has_single_step()=>true means that there is an arch facility with "pure" semantics that does not have any unexpected side effects. Inserting a breakpoint might do very unexpected strange things in multi-threaded situations. Aside from that, it is a peculiar side effect that user_{enable,disable}_single_step() should cause COW de-sharing of text pages and so forth. For PTRACE_SINGLESTEP, all these peculiarities are the status quo ante for that arch, so having arch_ptrace() itself do those is one thing. But for building other things in the future, it is nicer to have a uniform "pure" semantics that arch-independent code can expect. OTOH, all such arch issues are really up to the arch maintainer. As of today, there is nothing but ptrace using user_enable_single_step() et al so it's a distinction without a practical difference. If/when there are other facilities that use user_enable_single_step() and might care, the affected arch's can revisit the question when someone cares about the quality of the arch support for said new facility. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Use ptrace_request() in the three remaining architectures that didn't use it (m68knommu, h8300, microblaze). This means: - ptrace_request now handles PTRACE_{PEEK,POKE}{TEXT,DATA} and PTRACE_DETATCH calls that were previously called directly, or in case of h8300 even open coded. - adds new support for PTRACE_SETOPTIONS/PTRACE_GETEVENTMSG/ PTRACE_GETSIGINFO/PTRACE_SETSIGINFO Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Greg Ungerer <gerg@uclinux.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Miao Xie authored
we can't declarate two variable at the same scope by NODEMASK_ALLOC(). This patch fixes it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Nishimura-san have been working for memcg very good. His review and tests give us much improvements and account migraiton which he is now challenging is really important. He is a stakeholder. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
In current page-fault code, handle_mm_fault() -> ... -> mem_cgroup_charge() -> map page or handle error. -> check return code. If page fault's return code is VM_FAULT_OOM, page_fault_out_of_memory() is called. But if it's caused by memcg, OOM should have been already invoked. Then, I added a patch: a636b327. That patch records last_oom_jiffies for memcg's sub-hierarchy and prevents page_fault_out_of_memory from being invoked in near future. But Nishimura-san reported that check by jiffies is not enough when the system is terribly heavy. This patch changes memcg's oom logic as. * If memcg causes OOM-kill, continue to retry. * remove jiffies check which is used now. * add memcg-oom-lock which works like perzone oom lock. * If current is killed(as a process), bypass charge. Something more sophisticated can be added but this pactch does fundamental things. TODO: - add oom notifier - add permemcg disable-oom-kill flag and freezer at oom. - more chances for wake up oom waiter (when changing memory limit etc..) Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Decription of sanity check for memory thresholds. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
An example of cgroup notification API usage. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Events should be removed after rmdir of cgroup directory, but before destroying subsystem state objects. Let's take reference to cgroup directory dentry to do that. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Notify userspace about cgroup removing only after rmdir of cgroup directory to avoid race between userspace and kernelspace. eventfd are used to notify about two types of event: - control file-specific, like crossing memory threshold; - cgroup removing. To understand what really happen, userspace can check if the cgroup still exists. To avoid race beetween userspace and kernelspace we have to notify userspace about cgroup removing only after rmdir of cgroup directory. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Presently, if panic_on_oom=2, the whole system panics even if the oom happend in some special situation (as cpuset, mempolicy....). Then, panic_on_oom=2 means painc_on_oom_always. Now, memcg doesn't check panic_on_oom flag. This patch adds a check. BTW, how it's useful ? kdump+panic_on_oom=2 is the last tool to investigate what happens in oom-ed system. When a task is killed, the sysytem recovers and there will be few hint to know what happnes. In mission critical system, oom should never happen. Then, panic_on_oom=2+kdump is useful to avoid next OOM by knowing precise information via snapshot. TODO: - For memcg, it's for isolate system's memory usage, oom-notiifer and freeze_at_oom (or rest_at_oom) should be implemented. Then, management daemon can do similar jobs (as kdump) or taking snapshot per cgroup. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Daisuke Nishimura authored
Update memcg_test.txt to describe how to test the move-charge feature. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Memcg has 2 eventcountes which counts "the same" event. Just usages are different from each other. This patch tries to reduce event counter. Now logic uses "only increment, no reset" counter and masks for each checks. Softlimit chesk was done per 1000 evetns. So, the similar check can be done by !(new_counter & 0x3ff). Threshold check was done per 100 events. So, the similar check can be done by (!new_counter & 0x7f) ALL event checks are done right after EVENT percpu counter is updated. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Presently, move_task does "batched" precharge. Because res_counter or css's refcnt are not-scalable jobs for memcg, try_charge_().. tend to be done in batched manner if allowed. Now, softlimit and threshold check their event counter in try_charge, but the charge is not a per-page event. And event counter is not updated at charge(). Moreover, precharge doesn't pass "page" to try_charge() and softlimit tree will be never updated until uncharge() causes an event." So the best place to check the event counter is commit_charge(). This is per-page event by its nature. This patch move checks to there. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
When per-cpu counter for memcg was implemneted, dynamic percpu allocator was not very good. But now, we have good one and useful macros. This patch replaces memcg's private percpu counter implementation with generic dynamic percpu allocator. The benefits are - We can remove private implementation. - The counters will be NUMA-aware. (Current one is not...) - This patch makes sizeof struct mem_cgroup smaller. Then, struct mem_cgroup may be fit in page size on small config. - About basic performance aspects, see below. [Before] # size mm/memcontrol.o text data bss dec hex filename 24373 2528 4132 31033 7939 mm/memcontrol.o [page-fault-throuput test on 8cpu/SMP in root cgroup] # /root/bin/perf stat -a -e page-faults,cache-misses --repeat 5 ./multi-fault-fork 8 Performance counter stats for './multi-fault-fork 8' (5 runs): 45878618 page-faults ( +- 0.110% ) 602635826 cache-misses ( +- 0.105% ) 61.005373262 seconds time elapsed ( +- 0.004% ) Then cache-miss/page fault = 13.14 [After] #size mm/memcontrol.o text data bss dec hex filename 23913 2528 4132 30573 776d mm/memcontrol.o # /root/bin/perf stat -a -e page-faults,cache-misses --repeat 5 ./multi-fault-fork 8 Performance counter stats for './multi-fault-fork 8' (5 runs): 48179400 page-faults ( +- 0.271% ) 588628407 cache-misses ( +- 0.136% ) 61.004615021 seconds time elapsed ( +- 0.004% ) Then cache-miss/page fault = 12.22 Text size is reduced. This performance improvement is not big and will be invisible in real world applications. But this result shows this patch has some good effect even on (small) SMP. Here is a test program I used. 1. fork() processes on each cpus. 2. do page fault repeatedly on each process. 3. after 60secs, kill all childredn and exit. (3 is necessary for getting stable data, this is improvement from previous one.) #define _GNU_SOURCE #include <stdio.h> #include <sched.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <signal.h> #include <stdlib.h> /* * For avoiding contention in page table lock, FAULT area is * sparse. If FAULT_LENGTH is too large for your cpus, decrease it. */ #define FAULT_LENGTH (2 * 1024 * 1024) #define PAGE_SIZE 4096 #define MAXNUM (128) void alarm_handler(int sig) { } void *worker(int cpu, int ppid) { void *start, *end; char *c; cpu_set_t set; int i; CPU_ZERO(&set); CPU_SET(cpu, &set); sched_setaffinity(0, sizeof(set), &set); start = mmap(NULL, FAULT_LENGTH, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); if (start == MAP_FAILED) { perror("mmap"); exit(1); } end = start + FAULT_LENGTH; pause(); //fprintf(stderr, "run%d", cpu); while (1) { for (c = (char*)start; (void *)c < end; c += PAGE_SIZE) *c = 0; madvise(start, FAULT_LENGTH, MADV_DONTNEED); } return NULL; } int main(int argc, char *argv[]) { int num, i, ret, pid, status; int pids[MAXNUM]; if (argc < 2) return 0; setpgid(0, 0); signal(SIGALRM, alarm_handler); num = atoi(argv[1]); pid = getpid(); for (i = 0; i < num; ++i) { ret = fork(); if (!ret) { worker(i, pid); exit(0); } pids[i] = ret; } sleep(1); kill(-pid, SIGALRM); sleep(60); for (i = 0; i < num; i++) kill(pids[i], SIGKILL); for (i = 0; i < num; i++) waitpid(pids[i], &status, 0); return 0; } Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
s/mem_cgroup_print_mem_info/mem_cgroup_print_oom_info/ Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
It allows to register multiple memory and memsw thresholds and gets notifications when it crosses. To register a threshold application need: - create an eventfd; - open memory.usage_in_bytes or memory.memsw.usage_in_bytes; - write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to cgroup.event_control. Application will be notified through eventfd when memory usage crosses threshold in any direction. It's applicable for root and non-root cgroup. It uses stats to track memory usage, simmilar to soft limits. It checks if we need to send event to userspace on every 100 page in/out. I guess it's good compromise between performance and accuracy of thresholds. [akpm@linux-foundation.org: coding-style fixes] [nishimura@mxp.nes.nec.co.jp: fix documentation merge issue] Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Vladislav Buzov <vbuzov@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Alexander Shishkin <virtuoso@slind.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Instead of incrementing counter on each page in/out and comparing it with constant, we set counter to constant, decrement counter on each page in/out and compare it with zero. We want to make comparing as fast as possible. On many RISC systems (probably not only RISC) comparing with zero is more effective than comparing with a constant, since not every constant can be immediate operand for compare instruction. Also, I've renamed MEM_CGROUP_STAT_EVENTS to MEM_CGROUP_STAT_SOFTLIMIT, since really it's not a generic counter. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Vladislav Buzov <vbuzov@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Alexander Shishkin <virtuoso@slind.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Helper to get memory or mem+swap usage of the cgroup. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Vladislav Buzov <vbuzov@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Alexander Shishkin <virtuoso@slind.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-