- 18 Feb, 2015 11 commits
-
-
Yan, Zheng authored
When the LBR call stack is enabled, it is necessary to save/restore the LBR stack on context switch. We can use pmu specific data to store LBR stack when task is scheduled out. This patch adds code that allocates the pmu specific data. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-8-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
If two tasks were both forked from the same parent task, Events in their perf task contexts can be the same. Perf core may leave out switching the perf event contexts. Previous patch inroduces pmu specific data. The data is for saving the LBR stack, it is task specific. So we need to switch the data even when context switch is optimized out. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-7-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
Introduce a new flag PERF_ATTACH_TASK_DATA for perf event's attach stata. The flag is set by PMU's event_init() callback, it indicates that perf event needs PMU specific data. The PMU specific data are initialized to zeros. Later patches will use PMU specific data to save LBR stack. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-6-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
Haswell has a new feature that utilizes the existing LBR facility to record call chains. To enable this feature, bits (JCC, NEAR_IND_JMP, NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK) in LBR_SELECT must be set to 1, bits (NEAR_REL_CALL, NEAR-IND_CALL, NEAR_RET) must be cleared. Due to a hardware bug of Haswell, this feature doesn't work well with FREEZE_LBRS_ON_PMI. When the call stack feature is enabled, the LBR stack will capture unfiltered call data normally, but as return instructions are executed, the last captured branch record is flushed from the on-chip registers in a last-in first-out (LIFO) manner. Thus, branch information relative to leaf functions will not be captured, while preserving the call stack information of the main line execution path. This patch defines a separate lbr_sel map for Haswell. The map contains a new entry for the call stack feature. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-5-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
Previous commit introduces context switch callback, its function overlaps with the flush branch stack callback. So we can use the context switch callback to flush LBR stack. This patch adds code that uses the flush branch callback to flush the LBR stack when task is being scheduled in. The callback is enabled only when there are events use the LBR hardware. This patch also removes all old flush branch stack code. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-4-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
The callback is invoked when process is scheduled in or out. It provides mechanism for later patches to save/store the LBR stack. For the schedule in case, the callback is invoked at the same place that flush branch stack callback is invoked. So it also can replace the flush branch stack callback. To avoid unnecessary overhead, the callback is enabled only when there are events use the LBR stack. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-3-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Yan, Zheng authored
The index of lbr_sel_map is bit value of perf branch_sample_type. PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map uses 4096 bytes. By using bit shift as index, we can reduce lbr_sel_map size to 40 bytes. This patch defines 'bit shift' for branch types, and use 'bit shift' to define lbr_sel_maps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@redhat.com Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1415156173-10035-2-git-send-email-kan.liang@intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Aravind Gopalakrishnan authored
The caller of force_ibs_eilvt_setup() is ibs_eilvt_setup() which does not care about the return values. So mark it void and clean up the return statements. Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <hpa@zytor.com> Cc: <paulus@samba.org> Cc: <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422037175-20957-1-git-send-email-aravind.gopalakrishnan@amd.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Shaohua Li authored
For hardware events, the userspace page of the event gets updated in context switches, so if we read the timestamp in the page, we get fresh info. For software events, this is missing currently. This patch makes the behavior consistent. With this patch, we can implement clock_gettime(THREAD_CPUTIME) with PERF_COUNT_SW_DUMMY in userspace as suggested by Andy and Peter. Code like this: if (pc->cap_user_time) { do { seq = pc->lock; barrier(); running = pc->time_running; cyc = rdtsc(); time_mult = pc->time_mult; time_shift = pc->time_shift; time_offset = pc->time_offset; barrier(); } while (pc->lock != seq); quot = (cyc >> time_shift); rem = cyc & ((1 << time_shift) - 1); delta = time_offset + quot * time_mult + ((rem * time_mult) >> time_shift); running += delta; return running; } I tried it on a busy system, the userspace page updating doesn't have noticeable overhead. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/aa2dd2e4f1e9f2225758be5ba00f14d6909a8ce1.1423180257.git.shli@fb.com [ Improved the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Shaohua Li authored
Update the shadow timestamp before start event, because .add might use the timestamp. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/9cd0276d6a047cb7c2885994f25e3a1f7c8c28af.1423180257.git.shli@fb.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Markus Elfring authored
The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/54D0B59C.2060106@users.sourceforge.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 04 Feb, 2015 16 commits
-
-
Andy Lutomirski authored
While perfmon2 is a sufficiently evil library (it pokes MSRs directly) that breaking it is fair game, it's still useful, so we might as well try to support it. This allows users to write 2 to /sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack like perfmon2 can continue to work. At some point, if perf_event becomes fast enough to replace perfmon2, then this can go. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
We currently allow any process to use rdpmc. This significantly weakens the protection offered by PR_TSC_DISABLED, and it could be helpful to users attempting to exploit timing attacks. Since we can't enable access to individual counters, use a very coarse heuristic to limit access to rdpmc: allow access only when a perf_event is mmapped. This protects seccomp sandboxes. There is plenty of room to further tighen these restrictions. For example, this allows rdpmc for any x86_pmu event, but it's only useful for self-monitoring tasks. As a side effect, cap_user_rdpmc will now be false for AMD uncore events. This isn't a real regression, since .event_idx is disabled for these events anyway for the time being. Whenever that gets re-added, the cap_user_rdpmc code can be adjusted or refactored accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0fea9a7fac3c1eea86cb0a5954184e74f4213666.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/266afcba1d1f91ea5501e4e16e94bbbc1a9339b6.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mark Rutland authored
Currently the adjusments made as part of perf_event_task_tick() use the percpu rotation lists to iterate over any active PMU contexts, but these are not used by the context rotation code, having been replaced by separate (per-context) hrtimer callbacks. However, some manipulation of the rotation lists (i.e. removal of contexts) has remained in perf_rotate_context(). This leads to the following issues: * Contexts are not always removed from the rotation lists. Removal of PMUs which have been placed in rotation lists, but have not been removed by a hrtimer callback can result in corruption of the rotation lists (when memory backing the context is freed). This has been observed to result in hangs when PMU drivers built as modules are inserted and removed around the creation of events for said PMUs. * Contexts which do not require rotation may be removed from the rotation lists as a result of a hrtimer, and will not be considered by the unthrottling code in perf_event_task_tick. This patch fixes the issue by updating the rotation ist when events are scheduled in/out, ensuring that each rotation list stays in sync with the HW state. As each event holds a refcount on the module of its PMU, this ensures that when a PMU module is unloaded none of its CPU contexts can be in a rotation list. By maintaining a list of perf_event_contexts rather than perf_event_cpu_contexts, we don't need separate paths to handle the cpu and task contexts, which also makes the code a little simpler. As the rotation_list variables are not used for rotation, these are renamed to active_ctx_list, which better matches their current function. perf_pmu_rotate_{start,stop} are renamed to perf_pmu_ctx_{activate,deactivate}. Reported-by: Johannes Jensen <johannes.jensen@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostejSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Mark Rutland authored
When initialising an event, perf_init_event will call try_module_get() to ensure that the PMU's module cannot be removed for the lifetime of the event, with __free_event() dropping the reference when the event is finally destroyed. If something fails after the event has been initialised, but before the event is installed, perf_event_alloc will drop the reference on the module. However, if we fail to initialise an event for some reason (e.g. we ask an uncore PMU to perform sampling, and it refuses to initialise the event), we do not drop the refcount. If we try to open such a bogus event without a precise IDR type, we will loop over each PMU in the pmus list, incrementing each of their refcounts without decrementing them. This patch adds a module_put when pmu->event_init(event) fails, ensuring that the refcounts are balanced in failure cases. As the innards of the precise and search based initialisation look very similar, this logic is hoisted out into a new helper function. While the early return for the failed try_module_get is removed from the search case, this is handled by the remaining return when ret is not -ENOENT. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Jiri Olsa authored
Currently we flag available data (via poll syscall) on perf fd with POLL_IN macro, which is normally used for SIGIO interface. We've been lucky, because POLLIN (0x1) is subset of POLL_IN (0x20001) and sys_poll (do_pollfd function) cut the extra bit out (0x20000). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
So what I suspect; but I'm in zombie mode today it seems; is that while I initially thought that it was impossible for ctx to change when refcount dropped to 0, I now suspect its possible. Note that until perf_remove_from_context() the event is still active and visible on the lists. So a concurrent sys_perf_event_open() from another task into this task can race. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: mark.rutland@arm.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra (Intel) authored
Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over the weekend: event_sched_out.isra.79+0x2b9/0x2d0 group_sched_out+0x69/0xc0 ctx_sched_out+0x106/0x130 task_ctx_sched_out+0x37/0x70 __perf_install_in_context+0x70/0x1a0 remote_function+0x48/0x60 generic_exec_single+0x15b/0x1d0 smp_call_function_single+0x67/0xa0 task_function_call+0x53/0x80 perf_install_in_context+0x8b/0x110 I think the below should cure this; if we install a group leader it will iterate the (still intact) group list and find its siblings and try and install those too -- even though those still have the old event->ctx -- in the new ctx. Upon installing the first group sibling we'd try and schedule out the group and trigger the above warn. Fix this by installing the group leader last, installing siblings would have no effect, they're not reachable through the group lists and therefore we don't schedule them. Also delay resetting the state until we're absolutely sure the events are quiescent. Reported-by: Jiri Olsa <jolsa@redhat.com> Reported-by: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.netSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Add a few WARN()s to catch things that should never happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 Feb, 2015 3 commits
-
-
Ingo Molnar authored
Merge tag 'pr-20150201-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull "x86: Entry cleanups and a bugfix for 3.20" from Andy Lutomirski: " This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net " Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Merge tag 'pr-20150201-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull VDSO fix fro Andy Lutomirski: "x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it." Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 02 Feb, 2015 1 commit
-
-
Linus Torvalds authored
-
- 01 Feb, 2015 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds authored
Pull ARM SoC fixes from Olof Johansson: "One more week's worth of fixes. Worth pointing out here are: - A patch fixing detaching of iommu registrations when a device is removed -- earlier the ops pointer wasn't managed properly - Another set of Renesas boards get the same GIC setup fixup as others have in previous -rcs - Serial port aliases fixups for sunxi. We did the same to tegra but we caught that in time before the merge window due to more machines being affected. Here it took longer for anyone to notice. - A couple more DT tweaks on sunxi - A follow-up patch for the mvebu coherency disabling in last -rc batch" * tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device() ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds ARM: mvebu: don't set the PL310 in I/O coherency mode when I/O coherency is disabled ARM: sunxi: dt: Fix aliases ARM: dts: sun4i: Add simplefb node with de_fe0-de_be0-lcd0-hdmi pipeline ARM: dts: sun6i: ippo-q8h-v5: Fix serial0 alias ARM: dts: sunxi: Fix usb-phy support for sun4i/sun5i
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input layer updates from Dmitry Torokhov: "Just a few quirks for PS/2 this time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elantech - add more Fujtisu notebooks to force crc_enabled Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857) Input: synaptics - adjust min/max for Lenovo ThinkPad X1 Carbon 2nd
-
Linus Torvalds authored
Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rainer Koenig authored
Add two more Fujitsu LIFEBOOK models that also ship with the Elantech touchpad and don't work with crc_disabled to the quirk list. Signed-off-by: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
Olof Johansson authored
Merge tag 'renesas-soc-fixes3-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes Merge "Third Round of Renesas ARM Based SoC Fixes for v3.19" from Simon Horman: * Instantiate GIC from C board code in legacy builds on r8a7790 and r8a73a4 * tag 'renesas-soc-fixes3-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds Signed-off-by: Olof Johansson <olof@lixom.net>
-
Andy Lutomirski authored
We used to optimize rescheduling and audit on syscall exit. Now that the full slow path is reasonably fast, remove these optimizations. Syscall exit auditing is now handled exclusively by syscall_trace_leave. This adds something like 10ns to the previously optimized paths on my computer, presumably due mostly to SAVE_REST / RESTORE_REST. I think that we should eventually replace both the syscall and non-paranoid interrupt exit slow paths with a pair of C functions along the lines of the syscall entry hooks. Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.netAcked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
-
Andy Lutomirski authored
The x86_64 entry code currently jumps through complex and inconsistent hoops to try to minimize the impact of syscall exit work. For a true fast-path syscall, almost nothing needs to be done, so returning is just a check for exit work and sysret. For a full slow-path return from a syscall, the C exit hook is invoked if needed and we join the iret path. Using iret to return to userspace is very slow, so the entry code has accumulated various special cases to try to do certain forms of exit work without invoking iret. This is error-prone, since it duplicates assembly code paths, and it's dangerous, since sysret can malfunction in interesting ways if used carelessly. It's also inefficient, since a lot of useful cases aren't optimized and therefore force an iret out of a combination of paranoia and the fact that no one has bothered to write even more asm code to avoid it. I would argue that this approach is backwards. Rather than trying to avoid the iret path, we should instead try to make the iret path fast. Under a specific set of conditions, iret is unnecessary. In particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not set, and both SS and CS are as expected, then movq 32(%rsp),%rsp;sysret does the same thing as iret. This set of conditions is nearly always satisfied on return from syscalls, and it can even occasionally be satisfied on return from an irq. Even with the careful checks for sysret applicability, this cuts nearly 80ns off of the overhead from syscalls with unoptimized exit work. This includes tracing and context tracking, and any return that invokes KVM's user return notifier. For example, the cost of getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to ~280ns on my computer. This may allow the removal and even eventual conversion to C of a respectable amount of exit asm. This may require further tweaking to give the full benefit on Xen. It may be worthwhile to adjust signal delivery and exec to try hit the sysret path. This does not optimize returns to 32-bit userspace. Making the same optimization for CS == __USER32_CS is conceptually straightforward, but it will require some tedious code to handle the differences between sysretl and sysexitl. Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.netSigned-off-by: Andy Lutomirski <luto@amacapital.net>
-
Andy Lutomirski authored
context_tracking_user_exit() has no effect if in_interrupt() returns true, so ist_enter() didn't work. Fix it by calling exception_enter(), and thus context_tracking_user_exit(), before incrementing the preempt count. This also adds an assertion that will catch the problem reliably if CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced. Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net Fixes: 95927475 x86, traps: Track entry into and exit from IST context Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
-
- 31 Jan, 2015 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "i2c driver bugfixes (s3c2410, slave-eeprom, sh_mobile), size regression "bugfix" (i2c slave), documentation bugfix (st). Also, one documentation update (da9063), so some devicetrees can now be verified" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sh_mobile: terminate DMA reads properly i2c: Only include slave support if selected i2c: s3c2410: fix ABBA deadlock by keeping clock prepared i2c: slave-eeprom: fix boundary check when using sysfs i2c: st: Rename clock reference to something that exists DT: i2c: Add devices handled by the da9063 MFD driver
-