Commit b02ac6b1 authored by Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Improve accuracy of perf/sched clock on x86.  (Adrian Hunter)

   - Intel DS and BTS updates.  (Alexander Shishkin)

   - Intel cstate PMU support.  (Kan Liang)

   - Add group read support to perf_event_read(); see the userspace
     sketch after this list.  (Peter Zijlstra)

   - Branch call hardware sampling support, implemented on x86 and
     PowerPC.  (Stephane Eranian)

   - Event groups transactional interface enhancements.  (Sukadev
     Bhattiprolu)

   - Enable proper x86/intel/uncore PMU support on multi-segment PCI
     systems.  (Taku Izumi)

   - ... misc fixes and cleanups.

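  The group read support is also visible from userspace: with
  PERF_FORMAT_GROUP set in attr.read_format on a group leader, a
  single read() returns the counts of all group members, which the
  kernel can now gather in one PMU transaction. A hypothetical
  consumer-side sketch (event setup omitted; the fixed values[]
  bound is illustrative):

     #include <stdio.h>
     #include <unistd.h>
     #include <linux/types.h>

     /* Read format for PERF_FORMAT_GROUP with no other format
      * flags set: a member count, then one value per member. */
     struct group_read {
             __u64 nr;
             __u64 values[64];
     };

     static int read_group_counts(int group_leader_fd)
     {
             struct group_read gr;
             __u64 i;

             if (read(group_leader_fd, &gr, sizeof(gr)) < 0)
                     return -1;
             for (i = 0; i < gr.nr; i++)
                     printf("event %llu: %llu\n",
                            (unsigned long long)i,
                            (unsigned long long)gr.values[i]);
             return 0;
     }
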
  The perf tooling team was very busy again with 200+ commits, the full
  diff doesn't fit into lkml size limits.  Here's an (incomplete) list
  of the tooling highlights:

  New features:

   - Change the default event used in all tools (record/top): use the
     most precise "cycles" hw counter available, i.e. when the user
     doesn't specify any event, it will try using cycles:ppp, cycles:pp,
     etc. and fall back transparently until it finds a working counter.
     (Arnaldo Carvalho de Melo)

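     A rough sketch of that fallback idea, not the actual tools/perf
     code (assumes the caller zeroed 'attr' and set attr->size):

        #include <unistd.h>
        #include <sys/syscall.h>
        #include <linux/perf_event.h>

        /* Sketch: try cycles:ppp first, then strip one level of
         * precision (:pp, :p, plain cycles) until the kernel
         * accepts the event. */
        static int open_most_precise_cycles(struct perf_event_attr *attr)
        {
                attr->type = PERF_TYPE_HARDWARE;
                attr->config = PERF_COUNT_HW_CPU_CYCLES;

                for (attr->precise_ip = 3; ; attr->precise_ip--) {
                        int fd = syscall(SYS_perf_event_open, attr,
                                         0, -1, -1, 0);
                        if (fd >= 0 || attr->precise_ip == 0)
                                return fd;
                }
        }
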
   - Integration of perf with eBPF: given an eBPF .c source file (or a
     .o file built for the 'bpf' target with clang), perf automatically
     builds, validates and loads it into the kernel via the sys_bpf
     syscall, where it can then be used and seen using 'perf trace' and
     other tools.

     (Wang Nan)

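     To illustrate, such a scriptlet can be tiny. The sketch below is
     hypothetical: the probed kernel function is made up, while the
     'func=' section-name convention and the 'license'/'version'
     sections follow what these patches expect:

        #define SEC(name) __attribute__((section(name), used))

        /* Hypothetical probe: the section name selects the kernel
         * function to attach to. */
        SEC("func=generic_perform_write")
        int probe_write(void *ctx)
        {
                return 1; /* returning 0 would filter the sample out */
        }

        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = 0x40300; /* i.e. LINUX_VERSION_CODE */

     which could then be used with something like:

        $ perf record --event probe_write.c -a usleep 10
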
  Various user interface improvements:

   - Automatic pager invocation on long help output.  (Namhyung Kim)

   - Search for more options when passing args to -h, e.g.: (Arnaldo
     Carvalho de Melo)

        $ perf report -h interface

        Usage: perf report [<options>]

         --gtk    Use the GTK2 interface
         --stdio  Use the stdio interface
         --tui    Use the TUI interface

   - Show ordered command line options when -h is used or when an
     unknown option is specified.  (Arnaldo Carvalho de Melo)

   - If options are passed after -h, show just their descriptions, not
     all options.  (Arnaldo Carvalho de Melo)

   - Implement column based horizontal scrolling in the hists browser
     (top, report), making it possible to use the TUI for things like
     'perf mem report' where there are many more columns than can fit in
     a terminal.  (Arnaldo Carvalho de Melo)

   - Enhance the error reporting of tracepoint event parsing, e.g.:

       $ oldperf record -e sched:sched_switc usleep 1
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint
       Run 'perf list' for a list of valid events

     Now we get the much nicer:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ can't access trace events

       Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
       Hint:  Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

     And after we have those mount point permissions fixed:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint

       Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
       Hint:  Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

      I.e. basically now the event parsing routine uses the strerror_open()
      routines introduced by and used in the 'perf trace' work.  (Jiri Olsa)

   - Fail properly when pattern matching fails to find a tracepoint,
     i.e. '-e non:existent' was being correctly handled, with a proper
     error message about that not being a valid event, but '-e
     non:existent*' wasn't, fix it.  (Jiri Olsa)

   - Do event name substring search as last resort in 'perf list'.
     (Arnaldo Carvalho de Melo)

     E.g.:

       # perf list clock

       List of pre-defined events (to be used in -e):

        cpu-clock                                          [Software event]
        task-clock                                         [Software event]

        uncore_cbox_0/clockticks/                          [Kernel PMU event]
        uncore_cbox_1/clockticks/                          [Kernel PMU event]

        kvm:kvm_pvclock_update                             [Tracepoint event]
        kvm:kvm_update_master_clock                        [Tracepoint event]
        power:clock_disable                                [Tracepoint event]
        power:clock_enable                                 [Tracepoint event]
        power:clock_set_rate                               [Tracepoint event]
        syscalls:sys_enter_clock_adjtime                   [Tracepoint event]
        syscalls:sys_enter_clock_getres                    [Tracepoint event]
        syscalls:sys_enter_clock_gettime                   [Tracepoint event]
        syscalls:sys_enter_clock_nanosleep                 [Tracepoint event]
        syscalls:sys_enter_clock_settime                   [Tracepoint event]
        syscalls:sys_exit_clock_adjtime                    [Tracepoint event]
        syscalls:sys_exit_clock_getres                     [Tracepoint event]
        syscalls:sys_exit_clock_gettime                    [Tracepoint event]
        syscalls:sys_exit_clock_nanosleep                  [Tracepoint event]
        syscalls:sys_exit_clock_settime                    [Tracepoint event]

  Intel PT hardware tracing enhancements:

   - Accept a zero --itrace period, meaning "as often as possible".  In
     the case of Intel PT that is the same as a period of 1 and a unit
     of 'instructions' (i.e.  --itrace=i1i).  (Adrian Hunter)

   - Harmonize itrace's synthesized callchains with the existing
     --max-stack tool option.  (Adrian Hunter)

   - Allow time to be displayed in nanoseconds in 'perf script'.
     (Adrian Hunter)

   - Fix potential infinite loop when handling Intel PT timestamps.
     (Adrian Hunter)

   - Slightly improve Intel PT debug logging.  (Adrian Hunter)

   - Warn when AUX data has been lost, just like when processing
     PERF_RECORD_LOST.  (Adrian Hunter)

   - Further document export-to-postgresql.py script.  (Adrian Hunter)

   - Add option to synthesize branch stack from auxtrace data.  (Adrian
     Hunter)

  Misc notable changes:

   - Switch the default callchain output mode to 'graph,0.5,caller', to
     make it look like the default for other tools, reducing the
     learning curve for people used to 'caller' based viewing.  (Arnaldo
     Carvalho de Melo)

   - Various call chain usability enhancements.  (Namhyung Kim)

   - Introduce the 'P' event modifier, meaning 'max precision level,
     please', i.e.:

        $ perf record -e cycles:P usleep 1

     Is now similar to:

        $ perf record usleep 1

     Useful, for instance, when specifying multiple events.  (Jiri Olsa)

   - Add 'socket' sort entry, to sort by the processor socket in 'perf
     top' and 'perf report'.  (Kan Liang)

   - Introduce --socket-filter to 'perf report', for filtering by
     processor socket.  (Kan Liang)
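
      A hypothetical invocation, keeping only samples from processor
      socket 0:

        $ perf report --socket-filter 0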

   - Add new "Zoom into Processor Socket" operation in the perf hists
     browser, used in 'perf top' and 'perf report'.  (Kan Liang)

   - Allow probing on kmodules without DWARF.  (Masami Hiramatsu)

   - Fix 'perf probe -l' for probes added to kernel module functions.
     (Masami Hiramatsu)

   - Preparatory work for the 'perf stat record' feature that will allow
     generating perf.data files with counting data in addition to the
     sampling mode we have now.  (Jiri Olsa)

   - Update libtraceevent KVM plugin.  (Paolo Bonzini)

   - ... plus lots of other enhancements that I failed to list properly,
     by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
     Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
     Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
     Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
     Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
     Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
     Nan, Yang Shi and Yunlong Song"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
  perf unwind: Pass symbol source to libunwind
  tools build: Fix libiberty feature detection
  perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
  perf record: Add clang options for compiling BPF scripts
  perf bpf: Attach eBPF filter to perf event
  perf tools: Make sure fixdep is built before libbpf
  perf script: Enable printing of branch stack
  perf trace: Add cmd string table to decode sys_bpf first arg
  perf bpf: Collect perf_evsel in BPF object files
  perf tools: Load eBPF object into kernel
  perf tools: Create probe points for BPF programs
  perf tools: Enable passing bpf object file to --event
  perf ebpf: Add the libbpf glue
  perf tools: Make perf depend on libbpf
  perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
  perf tools: Enable pre-event inherit setting by config terms
  perf symbols: we can now read separate debug-info files based on a build ID
  perf symbols: Fix type error when reading a build-id
  perf tools: Search for more options when passing args to -h
  perf stat: Cache aggregated map entries in extra cpumap
  ...
parents 105ff3cb bebd23a2
......@@ -48,7 +48,7 @@ struct cpu_hw_events {
unsigned long amasks[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
unsigned long avalues[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
unsigned int group_flag;
unsigned int txn_flags;
int n_txn_start;
/* BHRB bits */
......@@ -1441,7 +1441,7 @@ static int power_pmu_add(struct perf_event *event, int ef_flags)
* skip the schedulability test here, it will be performed
* at commit time(->commit_txn) as a whole
*/
if (cpuhw->group_flag & PERF_EVENT_TXN)
if (cpuhw->txn_flags & PERF_PMU_TXN_ADD)
goto nocheck;
if (check_excludes(cpuhw->event, cpuhw->flags, n0, 1))
......@@ -1586,13 +1586,22 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags)
* Start group events scheduling transaction
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*
* We only support PERF_PMU_TXN_ADD transactions. Save the
* transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
* transactions.
*/
static void power_pmu_start_txn(struct pmu *pmu)
static void power_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(cpuhw->txn_flags); /* txn already in flight */
cpuhw->txn_flags = txn_flags;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
perf_pmu_disable(pmu);
cpuhw->group_flag |= PERF_EVENT_TXN;
cpuhw->n_txn_start = cpuhw->n_events;
}
......@@ -1604,8 +1613,15 @@ static void power_pmu_start_txn(struct pmu *pmu)
static void power_pmu_cancel_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
unsigned int txn_flags;
WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */
txn_flags = cpuhw->txn_flags;
cpuhw->txn_flags = 0;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
cpuhw->group_flag &= ~PERF_EVENT_TXN;
perf_pmu_enable(pmu);
}
......@@ -1621,7 +1637,15 @@ static int power_pmu_commit_txn(struct pmu *pmu)
if (!ppmu)
return -EAGAIN;
cpuhw = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */
if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD) {
cpuhw->txn_flags = 0;
return 0;
}
n = cpuhw->n_events;
if (check_excludes(cpuhw->event, cpuhw->flags, 0, n))
return -EAGAIN;
......@@ -1632,7 +1656,7 @@ static int power_pmu_commit_txn(struct pmu *pmu)
for (i = cpuhw->n_txn_start; i < n; ++i)
cpuhw->event[i]->hw.config = cpuhw->events[i];
cpuhw->group_flag &= ~PERF_EVENT_TXN;
cpuhw->txn_flags = 0;
perf_pmu_enable(pmu);
return 0;
}
......
......@@ -142,6 +142,15 @@ static struct attribute_group event_long_desc_group = {
static struct kmem_cache *hv_page_cache;
DEFINE_PER_CPU(int, hv_24x7_txn_flags);
DEFINE_PER_CPU(int, hv_24x7_txn_err);
struct hv_24x7_hw {
struct perf_event *events[255];
};
DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
/*
* request_buffer and result_buffer are not required to be 4k aligned,
* but are not allowed to cross any 4k boundary. Aligning them to 4k is
......@@ -1231,9 +1240,48 @@ static void update_event_count(struct perf_event *event, u64 now)
static void h_24x7_event_read(struct perf_event *event)
{
u64 now;
struct hv_24x7_request_buffer *request_buffer;
struct hv_24x7_hw *h24x7hw;
int txn_flags;
txn_flags = __this_cpu_read(hv_24x7_txn_flags);
/*
* If in a READ transaction, add this counter to the list of
* counters to read during the next HCALL (i.e. commit_txn()).
* If not in a READ transaction, go ahead and make the HCALL
* to read this counter by itself.
*/
if (txn_flags & PERF_PMU_TXN_READ) {
int i;
int ret;
now = h_24x7_get_value(event);
update_event_count(event, now);
if (__this_cpu_read(hv_24x7_txn_err))
return;
request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
ret = add_event_to_24x7_request(event, request_buffer);
if (ret) {
__this_cpu_write(hv_24x7_txn_err, ret);
} else {
/*
* Associate the event with the HCALL request index,
* so ->commit_txn() can quickly find/update count.
*/
i = request_buffer->num_requests - 1;
h24x7hw = &get_cpu_var(hv_24x7_hw);
h24x7hw->events[i] = event;
put_cpu_var(h24x7hw);
}
put_cpu_var(hv_24x7_reqb);
} else {
now = h_24x7_get_value(event);
update_event_count(event, now);
}
}
static void h_24x7_event_start(struct perf_event *event, int flags)
......@@ -1255,6 +1303,117 @@ static int h_24x7_event_add(struct perf_event *event, int flags)
return 0;
}
/*
* 24x7 counters only support READ transactions. They are
* always counting and don't need/support ADD transactions.
* Cache the flags, but otherwise ignore transactions that
* are not PERF_PMU_TXN_READ.
*/
static void h_24x7_event_start_txn(struct pmu *pmu, unsigned int flags)
{
struct hv_24x7_request_buffer *request_buffer;
struct hv_24x7_data_result_buffer *result_buffer;
/* We should not be called if we are already in a txn */
WARN_ON_ONCE(__this_cpu_read(hv_24x7_txn_flags));
__this_cpu_write(hv_24x7_txn_flags, flags);
if (flags & ~PERF_PMU_TXN_READ)
return;
request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
result_buffer = (void *)get_cpu_var(hv_24x7_resb);
init_24x7_request(request_buffer, result_buffer);
put_cpu_var(hv_24x7_resb);
put_cpu_var(hv_24x7_reqb);
}
/*
* Clean up transaction state.
*
* NOTE: Ignore state of request and result buffers for now.
* We will initialize them during the next read/txn.
*/
static void reset_txn(void)
{
__this_cpu_write(hv_24x7_txn_flags, 0);
__this_cpu_write(hv_24x7_txn_err, 0);
}
/*
* 24x7 counters only support READ transactions. They are always counting
* and don't need/support ADD transactions. Clear ->txn_flags but otherwise
* ignore transactions that are not of type PERF_PMU_TXN_READ.
*
* For READ transactions, submit all pending 24x7 requests (i.e. requests
* that were queued by h_24x7_event_read()), to the hypervisor and update
* the event counts.
*/
static int h_24x7_event_commit_txn(struct pmu *pmu)
{
struct hv_24x7_request_buffer *request_buffer;
struct hv_24x7_data_result_buffer *result_buffer;
struct hv_24x7_result *resb;
struct perf_event *event;
u64 count;
int i, ret, txn_flags;
struct hv_24x7_hw *h24x7hw;
txn_flags = __this_cpu_read(hv_24x7_txn_flags);
WARN_ON_ONCE(!txn_flags);
ret = 0;
if (txn_flags & ~PERF_PMU_TXN_READ)
goto out;
ret = __this_cpu_read(hv_24x7_txn_err);
if (ret)
goto out;
request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
result_buffer = (void *)get_cpu_var(hv_24x7_resb);
ret = make_24x7_request(request_buffer, result_buffer);
if (ret) {
log_24x7_hcall(request_buffer, result_buffer, ret);
goto put_reqb;
}
h24x7hw = &get_cpu_var(hv_24x7_hw);
/* Update event counts from hcall */
for (i = 0; i < request_buffer->num_requests; i++) {
resb = &result_buffer->results[i];
count = be64_to_cpu(resb->elements[0].element_data[0]);
event = h24x7hw->events[i];
h24x7hw->events[i] = NULL;
update_event_count(event, count);
}
put_cpu_var(hv_24x7_hw);
put_reqb:
put_cpu_var(hv_24x7_resb);
put_cpu_var(hv_24x7_reqb);
out:
reset_txn();
return ret;
}
/*
* 24x7 counters only support READ transactions. They are always counting
* and don't need/support ADD transactions. However, regardless of the type
* of transaction, all we need to do is clean up, so we don't have to check
* the type of transaction.
*/
static void h_24x7_event_cancel_txn(struct pmu *pmu)
{
WARN_ON_ONCE(!__this_cpu_read(hv_24x7_txn_flags));
reset_txn();
}
static struct pmu h_24x7_pmu = {
.task_ctx_nr = perf_invalid_context,
......@@ -1266,6 +1425,9 @@ static struct pmu h_24x7_pmu = {
.start = h_24x7_event_start,
.stop = h_24x7_event_stop,
.read = h_24x7_event_read,
.start_txn = h_24x7_event_start_txn,
.commit_txn = h_24x7_event_commit_txn,
.cancel_txn = h_24x7_event_cancel_txn,
};
static int hv_24x7_init(void)
......
......@@ -676,6 +676,9 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
return -1;
if (branch_sample_type & PERF_SAMPLE_BRANCH_CALL)
return -1;
if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
return pmu_bhrb_filter;
......
......@@ -72,6 +72,7 @@ struct cpu_hw_events {
atomic_t ctr_set[CPUMF_CTR_SET_MAX];
u64 state, tx_state;
unsigned int flags;
unsigned int txn_flags;
};
static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.ctr_set = {
......@@ -82,6 +83,7 @@ static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
},
.state = 0,
.flags = 0,
.txn_flags = 0,
};
static int get_counter_set(u64 event)
......@@ -538,7 +540,7 @@ static int cpumf_pmu_add(struct perf_event *event, int flags)
* For group events transaction, the authorization check is
* done in cpumf_pmu_commit_txn().
*/
if (!(cpuhw->flags & PERF_EVENT_TXN))
if (!(cpuhw->txn_flags & PERF_PMU_TXN_ADD))
if (validate_ctr_auth(&event->hw))
return -ENOENT;
......@@ -576,13 +578,22 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
/*
* Start group events scheduling transaction.
* Set flags to perform a single test at commit time.
*
* We only support PERF_PMU_TXN_ADD transactions. Save the
* transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
* transactions.
*/
static void cpumf_pmu_start_txn(struct pmu *pmu)
static void cpumf_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(cpuhw->txn_flags); /* txn already in flight */
cpuhw->txn_flags = txn_flags;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
perf_pmu_disable(pmu);
cpuhw->flags |= PERF_EVENT_TXN;
cpuhw->tx_state = cpuhw->state;
}
......@@ -593,11 +604,18 @@ static void cpumf_pmu_start_txn(struct pmu *pmu)
*/
static void cpumf_pmu_cancel_txn(struct pmu *pmu)
{
unsigned int txn_flags;
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */
txn_flags = cpuhw->txn_flags;
cpuhw->txn_flags = 0;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
WARN_ON(cpuhw->tx_state != cpuhw->state);
cpuhw->flags &= ~PERF_EVENT_TXN;
perf_pmu_enable(pmu);
}
......@@ -611,13 +629,20 @@ static int cpumf_pmu_commit_txn(struct pmu *pmu)
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
u64 state;
WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */
if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD) {
cpuhw->txn_flags = 0;
return 0;
}
/* check if the updated state can be scheduled */
state = cpuhw->state & ~((1 << CPUMF_LCCTL_ENABLE_SHIFT) - 1);
state >>= CPUMF_LCCTL_ENABLE_SHIFT;
if ((state & cpuhw->info.auth_ctl) != state)
return -ENOENT;
cpuhw->flags &= ~PERF_EVENT_TXN;
cpuhw->txn_flags = 0;
perf_pmu_enable(pmu);
return 0;
}
......
......@@ -108,7 +108,7 @@ struct cpu_hw_events {
/* Enabled/disable state. */
int enabled;
unsigned int group_flag;
unsigned int txn_flags;
};
static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = { .enabled = 1, };
......@@ -1379,7 +1379,7 @@ static int sparc_pmu_add(struct perf_event *event, int ef_flags)
* skip the schedulability test here, it will be performed
* at commit time(->commit_txn) as a whole
*/
if (cpuc->group_flag & PERF_EVENT_TXN)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto nocheck;
if (check_excludes(cpuc->event, n0, 1))
......@@ -1494,12 +1494,17 @@ static int sparc_pmu_event_init(struct perf_event *event)
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*/
static void sparc_pmu_start_txn(struct pmu *pmu)
static void sparc_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(cpuhw->txn_flags); /* txn already in flight */
cpuhw->txn_flags = txn_flags;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
perf_pmu_disable(pmu);
cpuhw->group_flag |= PERF_EVENT_TXN;
}
/*
......@@ -1510,8 +1515,15 @@ static void sparc_pmu_start_txn(struct pmu *pmu)
static void sparc_pmu_cancel_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
unsigned int txn_flags;
WARN_ON_ONCE(!cpuhw->txn_flags); /* no txn in flight */
txn_flags = cpuhw->txn_flags;
cpuhw->txn_flags = 0;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
cpuhw->group_flag &= ~PERF_EVENT_TXN;
perf_pmu_enable(pmu);
}
......@@ -1528,14 +1540,20 @@ static int sparc_pmu_commit_txn(struct pmu *pmu)
if (!sparc_pmu)
return -EINVAL;
cpuc = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(!cpuc->txn_flags); /* no txn in flight */
if (cpuc->txn_flags & ~PERF_PMU_TXN_ADD) {
cpuc->txn_flags = 0;
return 0;
}
n = cpuc->n_events;
if (check_excludes(cpuc->event, 0, n))
return -EINVAL;
if (sparc_check_constraints(cpuc->event, cpuc->events, n))
return -EAGAIN;
cpuc->group_flag &= ~PERF_EVENT_TXN;
cpuc->txn_flags = 0;
perf_pmu_enable(pmu);
return 0;
}
......
......@@ -41,6 +41,7 @@ obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_rapl.o perf_event_intel_cqm.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_pt.o perf_event_intel_bts.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_cstate.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
perf_event_intel_uncore_snb.o \
......
......@@ -157,7 +157,7 @@ struct _cpuid4_info_regs {
struct amd_northbridge *nb;
};
unsigned short num_cache_leaves;
static unsigned short num_cache_leaves;
/* AMD doesn't have CPUID4. Emulate it here to report the same
information to the user. This makes some assumptions about the machine:
......@@ -326,7 +326,7 @@ static void amd_calc_l3_indices(struct amd_northbridge *nb)
*
* @returns: the disabled index if used or negative value if slot free.
*/
int amd_get_l3_disable_slot(struct amd_northbridge *nb, unsigned slot)
static int amd_get_l3_disable_slot(struct amd_northbridge *nb, unsigned slot)
{
unsigned int reg = 0;
......@@ -403,8 +403,8 @@ static void amd_l3_disable_index(struct amd_northbridge *nb, int cpu,
*
* @return: 0 on success, error status on failure
*/
int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu, unsigned slot,
unsigned long index)
static int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu,
unsigned slot, unsigned long index)
{
int ret = 0;
......
......@@ -1175,7 +1175,7 @@ static int x86_pmu_add(struct perf_event *event, int flags)
* skip the schedulability test here, it will be performed
* at commit time (->commit_txn) as a whole.
*/
if (cpuc->group_flag & PERF_EVENT_TXN)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto done_collect;
ret = x86_pmu.schedule_events(cpuc, n, assign);
......@@ -1326,7 +1326,7 @@ static void x86_pmu_del(struct perf_event *event, int flags)
* XXX assumes any ->del() called during a TXN will only be on
* an event added during that same TXN.
*/
if (cpuc->group_flag & PERF_EVENT_TXN)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
return;
/*
......@@ -1748,11 +1748,22 @@ static inline void x86_pmu_read(struct perf_event *event)
* Start group events scheduling transaction
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*
* We only support PERF_PMU_TXN_ADD transactions. Save the
* transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
* transactions.
*/
static void x86_pmu_start_txn(struct pmu *pmu)
static void x86_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(cpuc->txn_flags); /* txn already in flight */
cpuc->txn_flags = txn_flags;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
perf_pmu_disable(pmu);
__this_cpu_or(cpu_hw_events.group_flag, PERF_EVENT_TXN);
__this_cpu_write(cpu_hw_events.n_txn, 0);
}
......@@ -1763,7 +1774,16 @@ static void x86_pmu_start_txn(struct pmu *pmu)
*/
static void x86_pmu_cancel_txn(struct pmu *pmu)
{
__this_cpu_and(cpu_hw_events.group_flag, ~PERF_EVENT_TXN);
unsigned int txn_flags;
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
WARN_ON_ONCE(!cpuc->txn_flags); /* no txn in flight */
txn_flags = cpuc->txn_flags;
cpuc->txn_flags = 0;
if (txn_flags & ~PERF_PMU_TXN_ADD)
return;
/*
* Truncate collected array by the number of events added in this
* transaction. See x86_pmu_add() and x86_pmu_*_txn().
......@@ -1786,6 +1806,13 @@ static int x86_pmu_commit_txn(struct pmu *pmu)
int assign[X86_PMC_IDX_MAX];
int n, ret;
WARN_ON_ONCE(!cpuc->txn_flags); /* no txn in flight */
if (cpuc->txn_flags & ~PERF_PMU_TXN_ADD) {
cpuc->txn_flags = 0;
return 0;
}
n = cpuc->n_events;
if (!x86_pmu_initialized())
......@@ -1801,7 +1828,7 @@ static int x86_pmu_commit_txn(struct pmu *pmu)
*/
memcpy(cpuc->assign, assign, n*sizeof(int));
cpuc->group_flag &= ~PERF_EVENT_TXN;
cpuc->txn_flags = 0;
perf_pmu_enable(pmu);
return 0;
}
......
......@@ -196,7 +196,7 @@ struct cpu_hw_events {
int n_excl; /* the number of exclusive events */
unsigned int group_flag;
unsigned int txn_flags;
int is_fake;
/*
......
......@@ -495,6 +495,19 @@ static int bts_event_init(struct perf_event *event)
if (x86_add_exclusive(x86_lbr_exclusive_bts))
return -EBUSY;
/*
* BTS leaks kernel addresses even when CPL0 tracing is
* disabled, so disallow intel_bts driver for unprivileged
* users on paranoid systems since it provides trace data
* to the user in a zero-copy fashion.
*
* Note that the default paranoia setting permits unprivileged
* users to profile the kernel.
*/
if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
!capable(CAP_SYS_ADMIN))
return -EACCES;
ret = x86_reserve_hardware();
if (ret) {
x86_del_exclusive(x86_lbr_exclusive_bts);
......
......@@ -510,10 +510,11 @@ int intel_pmu_drain_bts_buffer(void)
u64 flags;
};
struct perf_event *event = cpuc->events[INTEL_PMC_IDX_FIXED_BTS];
struct bts_record *at, *top;
struct bts_record *at, *base, *top;
struct perf_output_handle handle;
struct perf_event_header header;
struct perf_sample_data data;
unsigned long skip = 0;
struct pt_regs regs;
if (!event)
......@@ -522,10 +523,10 @@ int intel_pmu_drain_bts_buffer(void)
if (!x86_pmu.bts_active)
return 0;
at = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
top = (struct bts_record *)(unsigned long)ds->bts_index;
base = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
top = (struct bts_record *)(unsigned long)ds->bts_index;
if (top <= at)
if (top <= base)
return 0;
memset(&regs, 0, sizeof(regs));
......@@ -534,6 +535,27 @@ int intel_pmu_drain_bts_buffer(void)
perf_sample_data_init(&data, 0, event->hw.last_period);
/*
* BTS leaks kernel addresses in branches across the cpl boundary,
* such as traps or system calls, so unless the user is asking for
* kernel tracing (and right now it's not possible), we'd need to
* filter them out. But first we need to count how many of those we
* have in the current batch. This is an extra O(n) pass, however,
* it's much faster than the other one especially considering that
* n <= 2560 (BTS_BUFFER_SIZE / BTS_RECORD_SIZE * 15/16; see the
* alloc_bts_buffer()).
*/
for (at = base; at < top; at++) {
/*
* Note that right now *this* BTS code only works if
* attr::exclude_kernel is set, but let's keep this extra
* check here in case that changes.
*/
if (event->attr.exclude_kernel &&
(kernel_ip(at->from) || kernel_ip(at->to)))
skip++;
}
/*
* Prepare a generic sample, i.e. fill in the invariant fields.
* We will overwrite the from and to address before we output
......@@ -541,10 +563,16 @@ int intel_pmu_drain_bts_buffer(void)
*/
perf_prepare_sample(&header, &data, event, &regs);
if (perf_output_begin(&handle, event, header.size * (top - at)))
if (perf_output_begin(&handle, event, header.size *
(top - base - skip)))
return 1;
for (; at < top; at++) {
for (at = base; at < top; at++) {
/* Filter out any records that contain kernel addresses. */
if (event->attr.exclude_kernel &&
(kernel_ip(at->from) || kernel_ip(at->to)))
continue;
data.ip = at->from;
data.addr = at->to;
......
......@@ -151,10 +151,10 @@ static void __intel_pmu_lbr_enable(bool pmi)
* No need to reprogram LBR_SELECT in a PMI, as it
* did not change.
*/
if (cpuc->lbr_sel && !pmi) {
if (cpuc->lbr_sel)
lbr_select = cpuc->lbr_sel->config;
if (!pmi)
wrmsrl(MSR_LBR_SELECT, lbr_select);
}
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
orig_debugctl = debugctl;
......@@ -555,6 +555,8 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_IND_JUMP)
mask |= X86_BR_IND_JMP;
if (br_type & PERF_SAMPLE_BRANCH_CALL)
mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
......@@ -890,6 +892,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
[PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_JCC,
[PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT] = LBR_IND_JMP,
[PERF_SAMPLE_BRANCH_CALL_SHIFT] = LBR_REL_CALL,
};
static const int hsw_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
......@@ -905,6 +908,7 @@ static const int hsw_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
| LBR_RETURN | LBR_CALL_STACK,
[PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT] = LBR_IND_JMP,
[PERF_SAMPLE_BRANCH_CALL_SHIFT] = LBR_REL_CALL,
};
/* core */
......
......@@ -139,9 +139,6 @@ static int __init pt_pmu_hw_init(void)
long i;
attrs = NULL;
ret = -ENODEV;
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
goto fail;
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
......@@ -1130,6 +1127,10 @@ static __init int pt_init(void)
int ret, cpu, prior_warn = 0;
BUILD_BUG_ON(sizeof(struct topa) > PAGE_SIZE);
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
return -ENODEV;
get_online_cpus();
for_each_online_cpu(cpu) {
u64 ctl;
......
......@@ -7,7 +7,8 @@ struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
static bool pcidrv_registered;
struct pci_driver *uncore_pci_driver;
/* pci bus to socket mapping */
int uncore_pcibus_to_physid[256] = { [0 ... 255] = -1, };
DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
static DEFINE_RAW_SPINLOCK(uncore_box_lock);
......@@ -20,6 +21,59 @@ static struct event_constraint uncore_constraint_fixed =
struct event_constraint uncore_constraint_empty =
EVENT_CONSTRAINT(0, 0, 0);
int uncore_pcibus_to_physid(struct pci_bus *bus)
{
struct pci2phy_map *map;
int phys_id = -1;
raw_spin_lock(&pci2phy_map_lock);
list_for_each_entry(map, &pci2phy_map_head, list) {
if (map->segment == pci_domain_nr(bus)) {
phys_id = map->pbus_to_physid[bus->number];
break;
}
}
raw_spin_unlock(&pci2phy_map_lock);
return phys_id;
}
struct pci2phy_map *__find_pci2phy_map(int segment)
{
struct pci2phy_map *map, *alloc = NULL;
int i;
lockdep_assert_held(&pci2phy_map_lock);
lookup:
list_for_each_entry(map, &pci2phy_map_head, list) {
if (map->segment == segment)
goto end;
}
if (!alloc) {
raw_spin_unlock(&pci2phy_map_lock);
alloc = kmalloc(sizeof(struct pci2phy_map), GFP_KERNEL);
raw_spin_lock(&pci2phy_map_lock);
if (!alloc)
return NULL;
goto lookup;
}
map = alloc;
alloc = NULL;
map->segment = segment;
for (i = 0; i < 256; i++)
map->pbus_to_physid[i] = -1;
list_add_tail(&map->list, &pci2phy_map_head);
end:
kfree(alloc);
return map;
}
ssize_t uncore_event_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
......@@ -809,7 +863,7 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
int phys_id;
bool first_box = false;
phys_id = uncore_pcibus_to_physid[pdev->bus->number];
phys_id = uncore_pcibus_to_physid(pdev->bus);
if (phys_id < 0)
return -ENODEV;
......@@ -856,9 +910,10 @@ static void uncore_pci_remove(struct pci_dev *pdev)
{
struct intel_uncore_box *box = pci_get_drvdata(pdev);
struct intel_uncore_pmu *pmu;
int i, cpu, phys_id = uncore_pcibus_to_physid[pdev->bus->number];
int i, cpu, phys_id;
bool last_box = false;
phys_id = uncore_pcibus_to_physid(pdev->bus);
box = pci_get_drvdata(pdev);
if (!box) {
for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
......
......@@ -117,6 +117,15 @@ struct uncore_event_desc {
const char *config;
};
struct pci2phy_map {
struct list_head list;
int segment;
int pbus_to_physid[256];
};
int uncore_pcibus_to_physid(struct pci_bus *bus);
struct pci2phy_map *__find_pci2phy_map(int segment);
ssize_t uncore_event_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf);
......@@ -317,7 +326,8 @@ u64 uncore_shared_reg_config(struct intel_uncore_box *box, int idx);
extern struct intel_uncore_type **uncore_msr_uncores;
extern struct intel_uncore_type **uncore_pci_uncores;
extern struct pci_driver *uncore_pci_driver;
extern int uncore_pcibus_to_physid[256];
extern raw_spinlock_t pci2phy_map_lock;
extern struct list_head pci2phy_map_head;
extern struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
extern struct event_constraint uncore_constraint_empty;
......
......@@ -420,15 +420,25 @@ static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
static int snb_pci2phy_map_init(int devid)
{
struct pci_dev *dev = NULL;
int bus;
struct pci2phy_map *map;
int bus, segment;
dev = pci_get_device(PCI_VENDOR_ID_INTEL, devid, dev);
if (!dev)
return -ENOTTY;
bus = dev->bus->number;
uncore_pcibus_to_physid[bus] = 0;
segment = pci_domain_nr(dev->bus);
raw_spin_lock(&pci2phy_map_lock);
map = __find_pci2phy_map(segment);
if (!map) {
raw_spin_unlock(&pci2phy_map_lock);
pci_dev_put(dev);
return -ENOMEM;
}
map->pbus_to_physid[bus] = 0;
raw_spin_unlock(&pci2phy_map_lock);
pci_dev_put(dev);
......
......@@ -1087,7 +1087,8 @@ static struct pci_driver snbep_uncore_pci_driver = {
static int snbep_pci2phy_map_init(int devid)
{
struct pci_dev *ubox_dev = NULL;
int i, bus, nodeid;
int i, bus, nodeid, segment;
struct pci2phy_map *map;
int err = 0;
u32 config = 0;
......@@ -1106,16 +1107,27 @@ static int snbep_pci2phy_map_init(int devid)
err = pci_read_config_dword(ubox_dev, 0x54, &config);
if (err)
break;
segment = pci_domain_nr(ubox_dev->bus);
raw_spin_lock(&pci2phy_map_lock);
map = __find_pci2phy_map(segment);
if (!map) {
raw_spin_unlock(&pci2phy_map_lock);
err = -ENOMEM;
break;
}
/*
* every three bits in the Node ID mapping register maps
* to a particular node.
*/
for (i = 0; i < 8; i++) {
if (nodeid == ((config >> (3 * i)) & 0x7)) {
uncore_pcibus_to_physid[bus] = i;
map->pbus_to_physid[bus] = i;
break;
}
}
raw_spin_unlock(&pci2phy_map_lock);
}
if (!err) {
......@@ -1123,13 +1135,17 @@ static int snbep_pci2phy_map_init(int devid)
* For PCI bus with no UBOX device, find the next bus
* that has UBOX device and use its mapping.
*/
i = -1;
for (bus = 255; bus >= 0; bus--) {
if (uncore_pcibus_to_physid[bus] >= 0)
i = uncore_pcibus_to_physid[bus];
else
uncore_pcibus_to_physid[bus] = i;
raw_spin_lock(&pci2phy_map_lock);
list_for_each_entry(map, &pci2phy_map_head, list) {
i = -1;
for (bus = 255; bus >= 0; bus--) {
if (map->pbus_to_physid[bus] >= 0)
i = map->pbus_to_physid[bus];
else
map->pbus_to_physid[bus] = i;
}
}
raw_spin_unlock(&pci2phy_map_lock);
}
pci_dev_put(ubox_dev);
......@@ -2444,7 +2460,7 @@ static struct intel_uncore_type *bdx_pci_uncores[] = {
NULL,
};
static DEFINE_PCI_DEVICE_TABLE(bdx_uncore_pci_ids) = {
static const struct pci_device_id bdx_uncore_pci_ids[] = {
{ /* Home Agent 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f30),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_HA, 0),
......
......@@ -168,21 +168,20 @@ static void cyc2ns_write_end(int cpu, struct cyc2ns_data *data)
* ns = cycles * cyc2ns_scale / SC
*
* And since SC is a constant power of two, we can convert the div
* into a shift.
* into a shift. The larger SC is, the more accurate the conversion, but
* cyc2ns_scale needs to be a 32-bit value so that 32-bit multiplication
* (64-bit result) can be used.
*
* We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* We can use khz divisor instead of mhz to keep a better precision.
* (mathieu.desnoyers@polymtl.ca)
*
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
*/
#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
static void cyc2ns_data_init(struct cyc2ns_data *data)
{
data->cyc2ns_mul = 0;
data->cyc2ns_shift = CYC2NS_SCALE_FACTOR;
data->cyc2ns_shift = 0;
data->cyc2ns_offset = 0;
data->__count = 0;
}
......@@ -216,14 +215,14 @@ static inline unsigned long long cycles_2_ns(unsigned long long cyc)
if (likely(data == tail)) {
ns = data->cyc2ns_offset;
ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, data->cyc2ns_shift);
} else {
data->__count++;
barrier();
ns = data->cyc2ns_offset;
ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
ns += mul_u64_u32_shr(cyc, data->cyc2ns_mul, data->cyc2ns_shift);
barrier();
......@@ -257,12 +256,22 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
* time function is continuous; see the comment near struct
* cyc2ns_data.
*/
data->cyc2ns_mul =
DIV_ROUND_CLOSEST(NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR,
cpu_khz);
data->cyc2ns_shift = CYC2NS_SCALE_FACTOR;
clocks_calc_mult_shift(&data->cyc2ns_mul, &data->cyc2ns_shift, cpu_khz,
NSEC_PER_MSEC, 0);
/*
* cyc2ns_shift is exported via arch_perf_update_userpage() where it is
* not expected to be greater than 31 due to the original published
* conversion algorithm shifting a 32-bit value (now specifies a 64-bit
* value) - refer perf_event_mmap_page documentation in perf_event.h.
*/
if (data->cyc2ns_shift == 32) {
data->cyc2ns_shift = 31;
data->cyc2ns_mul >>= 1;
}
data->cyc2ns_offset = ns_now -
mul_u64_u32_shr(tsc_now, data->cyc2ns_mul, CYC2NS_SCALE_FACTOR);
mul_u64_u32_shr(tsc_now, data->cyc2ns_mul, data->cyc2ns_shift);
cyc2ns_write_end(cpu, data);
......
......@@ -353,8 +353,12 @@ AVXcode: 1
17: vmovhps Mq,Vq (v1) | vmovhpd Mq,Vq (66),(v1)
18: Grp16 (1A)
19:
1a: BNDCL Ev,Gv | BNDCU Ev,Gv | BNDMOV Gv,Ev | BNDLDX Gv,Ev,Gv
1b: BNDCN Ev,Gv | BNDMOV Ev,Gv | BNDMK Gv,Ev | BNDSTX Ev,GV,Gv
# Intel SDM opcode map does not list MPX instructions. For now using Gv for
# bnd registers and Ev for everything else is OK because the instruction
# decoder does not use the information except as an indication that there is
# a ModR/M byte.
1a: BNDCL Gv,Ev (F3) | BNDCU Gv,Ev (F2) | BNDMOV Gv,Ev (66) | BNDLDX Gv,Ev
1b: BNDCN Gv,Ev (F2) | BNDMOV Ev,Gv (66) | BNDMK Gv,Ev (F3) | BNDSTX Ev,Gv
1c:
1d:
1e:
......@@ -732,6 +736,12 @@ bd: vfnmadd231ss/d Vx,Hx,Wx (66),(v),(v1)
be: vfnmsub231ps/d Vx,Hx,Wx (66),(v)
bf: vfnmsub231ss/d Vx,Hx,Wx (66),(v),(v1)
# 0x0f 0x38 0xc0-0xff
c8: sha1nexte Vdq,Wdq
c9: sha1msg1 Vdq,Wdq
ca: sha1msg2 Vdq,Wdq
cb: sha256rnds2 Vdq,Wdq
cc: sha256msg1 Vdq,Wdq
cd: sha256msg2 Vdq,Wdq
db: VAESIMC Vdq,Wdq (66),(v1)
dc: VAESENC Vdq,Hdq,Wdq (66),(v1)
dd: VAESENCLAST Vdq,Hdq,Wdq (66),(v1)
......@@ -790,6 +800,7 @@ AVXcode: 3
61: vpcmpestri Vdq,Wdq,Ib (66),(v1)
62: vpcmpistrm Vdq,Wdq,Ib (66),(v1)
63: vpcmpistri Vdq,Wdq,Ib (66),(v1)
cc: sha1rnds4 Vdq,Wdq,Ib
df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v)
EndTable
......@@ -874,7 +885,7 @@ GrpTable: Grp7
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
3: LIDT Ms
4: SMSW Mw/Rv
5:
5: rdpkru (110),(11B) | wrpkru (111),(11B)
6: LMSW Ew
7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
EndTable
......@@ -888,6 +899,9 @@ EndTable
GrpTable: Grp9
1: CMPXCHG8B/16B Mq/Mdq
3: xrstors
4: xsavec
5: xsaves
6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
EndTable
......@@ -932,8 +946,8 @@ GrpTable: Grp15
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
4: XSAVE
5: XRSTOR | lfence (11B)
6: XSAVEOPT | mfence (11B)
7: clflush | sfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B) | pcommit (66),(11B)
EndTable
GrpTable: Grp16
......
......@@ -140,33 +140,67 @@ struct hw_perf_event {
};
#endif
};
/*
* If the event is a per task event, this will point to the task in
* question. See the comment in perf_event_alloc().
*/
struct task_struct *target;
/*
* hw_perf_event::state flags; used to track the PERF_EF_* state.
*/
#define PERF_HES_STOPPED 0x01 /* the counter is stopped */
#define PERF_HES_UPTODATE 0x02 /* event->count up-to-date */
#define PERF_HES_ARCH 0x04
int state;
/*
* The last observed hardware counter value, updated with a
* local64_cmpxchg() such that pmu::read() can be called nested.
*/
local64_t prev_count;
/*
* The period to start the next sample with.
*/
u64 sample_period;
/*
* The period we started this sample with.
*/
u64 last_period;
/*
* However much is left of the current period; note that this is
* a full 64bit value and allows for generation of periods longer
* than hardware might allow.
*/
local64_t period_left;
/*
* State for throttling the event, see __perf_event_overflow() and
* perf_adjust_freq_unthr_context().
*/
u64 interrupts_seq;
u64 interrupts;
/*
* State for freq target events, see __perf_event_overflow() and
* perf_adjust_freq_unthr_context().
*/
u64 freq_time_stamp;
u64 freq_count_stamp;
#endif
};
/*
* hw_perf_event::state flags
*/
#define PERF_HES_STOPPED 0x01 /* the counter is stopped */
#define PERF_HES_UPTODATE 0x02 /* event->count up-to-date */
#define PERF_HES_ARCH 0x04
struct perf_event;
/*
* Common implementation detail of pmu::{start,commit,cancel}_txn
*/
#define PERF_EVENT_TXN 0x1
#define PERF_PMU_TXN_ADD 0x1 /* txn to add/schedule event on PMU */
#define PERF_PMU_TXN_READ 0x2 /* txn to read event group from PMU */
/**
* pmu::capabilities flags
......@@ -210,7 +244,19 @@ struct pmu {
/*
* Try and initialize the event for this PMU.
* Should return -ENOENT when the @event doesn't match this PMU.
*
* Returns:
* -ENOENT -- @event is not for this PMU
*
* -ENODEV -- @event is for this PMU but PMU not present
* -EBUSY -- @event is for this PMU but PMU temporarily unavailable
* -EINVAL -- @event is for this PMU but @event is not valid
* -EOPNOTSUPP -- @event is for this PMU, @event is valid, but not supported
* -EACCES -- @event is for this PMU, @event is valid, but no privileges
*
* 0 -- @event is for this PMU and valid
*
* Other error return values are allowed.
*/
int (*event_init) (struct perf_event *event);
......@@ -221,27 +267,61 @@ struct pmu {
void (*event_mapped) (struct perf_event *event); /*optional*/
void (*event_unmapped) (struct perf_event *event); /*optional*/
/*
* Flags for ->add()/->del()/ ->start()/->stop(). There are
* matching hw_perf_event::state flags.
*/
#define PERF_EF_START 0x01 /* start the counter when adding */
#define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
#define PERF_EF_UPDATE 0x04 /* update the counter when stopping */
/*
* Adds/Removes a counter to/from the PMU, can be done inside
* a transaction, see the ->*_txn() methods.
* Adds/Removes a counter to/from the PMU, can be done inside a
* transaction, see the ->*_txn() methods.
*
* The add/del callbacks will reserve all hardware resources required
* to service the event, this includes any counter constraint
* scheduling etc.
*
* Called with IRQs disabled and the PMU disabled on the CPU the event
* is on.
*
* ->add() called without PERF_EF_START should result in the same state
* as ->add() followed by ->stop().
*
* ->del() must always PERF_EF_UPDATE stop an event. If it calls
* ->stop() that must deal with already being stopped without
* PERF_EF_UPDATE.
*/
int (*add) (struct perf_event *event, int flags);
void (*del) (struct perf_event *event, int flags);
/*
* Starts/Stops a counter present on the PMU. The PMI handler
* should stop the counter when perf_event_overflow() returns
* !0. ->start() will be used to continue.
* Starts/Stops a counter present on the PMU.
*
* The PMI handler should stop the counter when perf_event_overflow()
* returns !0. ->start() will be used to continue.
*
* Also used to change the sample period.
*
* Called with IRQs disabled and the PMU disabled on the CPU the event
* is on -- will be called from NMI context when the PMU generates
* NMIs.
*
* ->stop() with PERF_EF_UPDATE will read the counter and update
* period/count values like ->read() would.
*
* ->start() with PERF_EF_RELOAD will reprogram the counter
* value, must be preceded by a ->stop() with PERF_EF_UPDATE.
*/
void (*start) (struct perf_event *event, int flags);
void (*stop) (struct perf_event *event, int flags);
/*
* Updates the counter value of the event.
*
* For sampling capable PMUs this will also update the software period
* hw_perf_event::period_left field.
*/
void (*read) (struct perf_event *event);
......@@ -252,20 +332,26 @@ struct pmu {
*
* Start the transaction, after this ->add() doesn't need to
* do schedulability tests.
*
* Optional.
*/
void (*start_txn) (struct pmu *pmu); /* optional */
void (*start_txn) (struct pmu *pmu, unsigned int txn_flags);
/*
* If ->start_txn() disabled the ->add() schedulability test
* then ->commit_txn() is required to perform one. On success
* the transaction is closed. On error the transaction is kept
* open until ->cancel_txn() is called.
*
* Optional.
*/
int (*commit_txn) (struct pmu *pmu); /* optional */
int (*commit_txn) (struct pmu *pmu);
/*
* Will cancel the transaction, assumes ->del() is called
* for each successful ->add() during the transaction.
*
* Optional.
*/
void (*cancel_txn) (struct pmu *pmu); /* optional */
void (*cancel_txn) (struct pmu *pmu);
/*
* Will return the value for perf_event_mmap_page::index for this event,
......
......@@ -168,6 +168,7 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = 11, /* call/ret stack */
PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT = 12, /* indirect jumps */
PERF_SAMPLE_BRANCH_CALL_SHIFT = 13, /* direct call */
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
......@@ -188,6 +189,7 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_CALL_STACK = 1U << PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
PERF_SAMPLE_BRANCH_IND_JUMP = 1U << PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT,
PERF_SAMPLE_BRANCH_CALL = 1U << PERF_SAMPLE_BRANCH_CALL_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
......@@ -476,7 +478,7 @@ struct perf_event_mmap_page {
* u64 delta;
*
* quot = (cyc >> time_shift);
* rem = cyc & ((1 << time_shift) - 1);
* rem = cyc & (((u64)1 << time_shift) - 1);
* delta = time_offset + quot * time_mult +
* ((rem * time_mult) >> time_shift);
*
......@@ -507,7 +509,7 @@ struct perf_event_mmap_page {
* And vice versa:
*
* quot = cyc >> time_shift;
* rem = cyc & ((1 << time_shift) - 1);
* rem = cyc & (((u64)1 << time_shift) - 1);
* timestamp = time_zero + quot * time_mult +
* ((rem * time_mult) >> time_shift);
*/
......
fixdep-y := fixdep.o
......@@ -54,15 +54,26 @@ make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1)))))
# PHONY targets skipped in both cases.
any-prereq = $(filter-out $(PHONY),$?) $(filter-out $(PHONY) $(wildcard $^),$^)
###
# Copy dependency data into .cmd file
# - gcc -M dependency info
# - command line to create object 'cmd_object :='
dep-cmd = $(if $(wildcard $(fixdep)), \
$(fixdep) $(depfile) $@ '$(make-cmd)' > $(dot-target).tmp; \
rm -f $(depfile); \
mv -f $(dot-target).tmp $(dot-target).cmd, \
printf '\# cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \
printf '\# using basic dep data\n\n' >> $(dot-target).cmd; \
cat $(depfile) >> $(dot-target).cmd; \
printf '%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd)
###
# if_changed_dep - execute command if any prerequisite is newer than
# target, or command line has changed and update
# dependencies in the cmd file
if_changed_dep = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)); \
cat $(depfile) > $(dot-target).cmd; \
printf '%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd)
$(echo-cmd) $(cmd_$(1)) && $(dep-cmd))
# if_changed - execute command if any prerequisite is newer than
# target, or command line has changed
......
......@@ -11,8 +11,9 @@ Unlike the kernel we don't have a single build object 'obj-y' list that where
we set up source objects, but we support more. This allows one 'Build' file to
carry a sources list for multiple build objects.
a) Build framework makefiles
----------------------------
Build framework makefiles
-------------------------
The build framework consists of 2 Makefiles:
......@@ -23,7 +24,7 @@ While the 'Build.include' file contains just some generic definitions, the
'Makefile.build' file is the makefile used from the outside. Its
interface/usage is as follows:
$ make -f tools/build/Makefile srctree=$(KSRC) dir=$(DIR) obj=$(OBJECT)
$ make -f tools/build/Makefile.build srctree=$(KSRC) dir=$(DIR) obj=$(OBJECT)
where:
......@@ -38,8 +39,9 @@ called $(OBJECT)-in.o:
which includes all compiled sources described in 'Build' makefiles.
a) Build makefiles
------------------
Build makefiles
---------------
The user supplies 'Build' makefiles that contain an objects list and
connect the build to nested directories.
......@@ -95,8 +97,31 @@ It's only a matter of 2 single commands to create the final binaries:
You can check the 'ex' example in 'tools/build/tests/ex' for more details.
b) Rules
--------
Makefile.include
----------------
The tools/build/Makefile.include makefile can be included
from user makefiles to get useful definitions.
It defines the following interface:
- build macro definition:
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
to make it easier to invoke build like:
make $(build)=ex
Fixdep
------
It is necessary to build the fixdep helper before invoking the build.
The Makefile.include file adds a fixdep target that can be
invoked by the user.
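
For example, a user makefile can order things the way the 'ex'
test updated later in this series does:

  ex-in.o: fixdep FORCE
          make $(build)=ex
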
Rules
-----
The build framework provides standard compilation rules to handle .S and .c
compilation.
......@@ -104,8 +129,9 @@ compilation.
It's possible to include special rule if needed (like we do for flex or bison
code generation).
c) CFLAGS
---------
CFLAGS
------
It's possible to alter the standard object C flags in the following way:
......@@ -115,8 +141,8 @@ It's possible to alter the standard object C flags in the following way:
These C flags changes have the scope of the Build makefile they are defined in.
d) Dependencies
---------------
Dependencies
------------
For each built object file 'a.o' the '.a.cmd' is created and holds:
......@@ -130,8 +156,8 @@ All existing '.cmd' files are included in the Build process to follow properly
the dependencies and trigger a rebuild when necessary.
e) Single rules
---------------
Single rules
------------
It's possible to build a single object file by choice, like:
......
ifeq ($(srctree),)
srctree := $(patsubst %/,%,$(dir $(shell pwd)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
endif
include $(srctree)/tools/scripts/Makefile.include
define allow-override
$(if $(or $(findstring environment,$(origin $(1))),\
$(findstring command line,$(origin $(1)))),,\
$(eval $(1) = $(2)))
endef
$(call allow-override,CC,$(CROSS_COMPILE)gcc)
$(call allow-override,LD,$(CROSS_COMPILE)ld)
ifeq ($(V),1)
Q =
else
Q = @
endif
export Q srctree CC LD
MAKEFLAGS := --no-print-directory
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
all: fixdep
clean:
$(call QUIET_CLEAN, fixdep)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)rm -f fixdep
$(OUTPUT)fixdep-in.o: FORCE
$(Q)$(MAKE) $(build)=fixdep
$(OUTPUT)fixdep: $(OUTPUT)fixdep-in.o
$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $<
FORCE:
.PHONY: FORCE
......@@ -21,6 +21,13 @@ endif
build-dir := $(srctree)/tools/build
# Define $(fixdep) for dep-cmd function
ifeq ($(OUTPUT),)
fixdep := $(build-dir)/fixdep
else
fixdep := $(OUTPUT)/fixdep
endif
# Generic definitions
include $(build-dir)/Build.include
......
......@@ -53,7 +53,8 @@ FEATURE_TESTS ?= \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid
get_cpuid \
bpf
FEATURE_DISPLAY ?= \
dwarf \
......@@ -71,7 +72,8 @@ FEATURE_DISPLAY ?= \
libdw-dwarf-unwind \
zlib \
lzma \
get_cpuid
get_cpuid \
bpf
# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
......@@ -121,8 +123,9 @@ define feature_print_text_code
MSG = $(shell printf '...%30s: %s' $(1) $(2))
endef
FEATURE_DUMP_FILENAME = $(OUTPUT)FEATURE-DUMP$(FEATURE_USER)
FEATURE_DUMP := $(foreach feat,$(FEATURE_DISPLAY),feature-$(feat)($(feature-$(feat))))
FEATURE_DUMP_FILE := $(shell touch $(OUTPUT)FEATURE-DUMP; cat $(OUTPUT)FEATURE-DUMP)
FEATURE_DUMP_FILE := $(shell touch $(FEATURE_DUMP_FILENAME); cat $(FEATURE_DUMP_FILENAME))
ifeq ($(dwarf-post-unwind),1)
FEATURE_DUMP += dwarf-post-unwind($(dwarf-post-unwind-text))
......@@ -131,16 +134,16 @@ endif
# The $(feature_display) controls the default detection message
# output. It's set if:
# - detected features differ from stored features from
# last build (in FEATURE-DUMP file)
# last build (in $(FEATURE_DUMP_FILENAME) file)
# - one of the $(FEATURE_DISPLAY) is not detected
# - VF is enabled
ifneq ("$(FEATURE_DUMP)","$(FEATURE_DUMP_FILE)")
$(shell echo "$(FEATURE_DUMP)" > $(OUTPUT)FEATURE-DUMP)
$(shell echo "$(FEATURE_DUMP)" > $(FEATURE_DUMP_FILENAME))
feature_display := 1
endif
feature_display_check = $(eval $(feature_check_code))
feature_display_check = $(eval $(feature_check_display_code))
define feature_display_check_code
ifneq ($(feature-$(1)), 1)
feature_display := 1
......
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
ifdef CROSS_COMPILE
fixdep:
else
fixdep:
$(Q)$(MAKE) -C $(srctree)/tools/build fixdep
endif
.PHONY: fixdep
......@@ -132,10 +132,10 @@ test-libbfd.bin:
$(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl
test-liberty.bin:
$(CC) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty
$(CC) $(CFLAGS) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' $(LDFLAGS) -lbfd -ldl -liberty
test-liberty-z.bin:
$(CC) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty -lz
$(CC) $(CFLAGS) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' $(LDFLAGS) -lbfd -ldl -liberty -lz
test-cplus-demangle.bin:
$(BUILD) -liberty
......
/*
* "Optimize" a list of dependencies as spit out by gcc -MD
* for the build framework.
*
* Original author:
* Copyright 2002 by Kai Germaschewski <kai.germaschewski@gmx.de>
*
* This code has been borrowed from kbuild's fixdep (scripts/basic/fixdep.c),
* Please check it for a detailed explanation. This fixdep borrows only the
* base transformation of dependencies without the CONFIG mangling.
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
char *target;
char *depfile;
char *cmdline;
static void usage(void)
{
fprintf(stderr, "Usage: fixdep <depfile> <target> <cmdline>\n");
exit(1);
}
/*
* Print out the commandline prefixed with cmd_<target filename> :=
*/
static void print_cmdline(void)
{
printf("cmd_%s := %s\n\n", target, cmdline);
}
/*
* Important: The below generated source_foo.o and deps_foo.o variable
* assignments are parsed not only by make, but also by the rather simple
* parser in scripts/mod/sumversion.c.
*/
static void parse_dep_file(void *map, size_t len)
{
char *m = map;
char *end = m + len;
char *p;
char s[PATH_MAX];
int is_target;
int saw_any_target = 0;
int is_first_dep = 0;
while (m < end) {
/* Skip any "white space" */
while (m < end && (*m == ' ' || *m == '\\' || *m == '\n'))
m++;
/* Find next "white space" */
p = m;
while (p < end && *p != ' ' && *p != '\\' && *p != '\n')
p++;
/* Is the token we found a target name? */
is_target = (*(p-1) == ':');
/* Don't write any target names into the dependency file */
if (is_target) {
/* The /next/ file is the first dependency */
is_first_dep = 1;
} else {
/* Save this token/filename */
memcpy(s, m, p-m);
s[p - m] = 0;
/*
* Do not list the source file as dependency,
* so that kbuild is not confused if a .c file
* is rewritten into .S or vice versa. Storing
* it in source_* is needed for modpost to
* compute srcversions.
*/
if (is_first_dep) {
/*
* If processing the concatenation of
* multiple dependency files, only
* process the first target name, which
* will be the original source name,
* and ignore any other target names,
* which will be intermediate temporary
* files.
*/
if (!saw_any_target) {
saw_any_target = 1;
printf("source_%s := %s\n\n",
target, s);
printf("deps_%s := \\\n",
target);
}
is_first_dep = 0;
} else
printf(" %s \\\n", s);
}
/*
* Start searching for next token immediately after the first
* "whitespace" character that follows this token.
*/
m = p + 1;
}
if (!saw_any_target) {
fprintf(stderr, "fixdep: parse error; no targets found\n");
exit(1);
}
printf("\n%s: $(deps_%s)\n\n", target, target);
printf("$(deps_%s):\n", target);
}
static void print_deps(void)
{
struct stat st;
int fd;
void *map;
fd = open(depfile, O_RDONLY);
if (fd < 0) {
fprintf(stderr, "fixdep: error opening depfile: ");
perror(depfile);
exit(2);
}
if (fstat(fd, &st) < 0) {
fprintf(stderr, "fixdep: error fstat'ing depfile: ");
perror(depfile);
exit(2);
}
if (st.st_size == 0) {
fprintf(stderr, "fixdep: %s is empty\n", depfile);
close(fd);
return;
}
map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if ((long) map == -1) {
perror("fixdep: mmap");
close(fd);
return;
}
parse_dep_file(map, st.st_size);
munmap(map, st.st_size);
close(fd);
}
int main(int argc, char **argv)
{
if (argc != 4)
usage();
depfile = argv[1];
target = argv[2];
cmdline = argv[3];
print_cmdline();
print_deps();
return 0;
}
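For concreteness, here is roughly what this tool emits for a small, hypothetical dependency file (file names invented for illustration). Given ex.d containing "ex.o: ex.c ex.h", running "fixdep ex.d ex.o 'gcc -c -o ex.o ex.c'" prints:

    cmd_ex.o := gcc -c -o ex.o ex.c

    source_ex.o := ex.c

    deps_ex.o := \
       ex.h \

    ex.o: $(deps_ex.o)

    $(deps_ex.o):

The build framework includes this output instead of the raw .d file: the cmd_* variable lets it re-check the command line, the deps_* list makes the object depend on its headers, and the empty $(deps_ex.o): rule keeps make from aborting when a listed header is later deleted.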
......@@ -4,6 +4,7 @@ ex-y += b.o
ex-y += b.o
ex-y += empty/
ex-y += empty2/
ex-y += inc.o
libex-y += c.o
libex-y += d.o
......
export srctree := ../../../..
export srctree := $(abspath ../../../..)
export CC := gcc
export LD := ld
export AR := ar
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
ex:
include $(srctree)/tools/build/Makefile.include
ex: ex-in.o libex-in.o
gcc -o $@ $^
ex.%: FORCE
ex.%: fixdep FORCE
make -f $(srctree)/tools/build/Makefile.build dir=. $@
ex-in.o: FORCE
ex-in.o: fixdep FORCE
make $(build)=ex
libex-in.o: FORCE
libex-in.o: fixdep FORCE
make $(build)=libex
clean:
......
......@@ -5,6 +5,7 @@ int c(void);
int d(void);
int e(void);
int f(void);
int inc(void);
int main(void)
{
......@@ -14,6 +15,7 @@ int main(void)
d();
e();
f();
inc();
return 0;
}
#ifdef INCLUDE
#include "krava.h"
#endif
int inc(void)
{
return 0;
}
......@@ -34,9 +34,36 @@ function test_ex_suffix {
make -C ex V=1 clean > /dev/null 2>&1
rm -f ex.out
}
function test_ex_include {
make -C ex V=1 clean > ex.out 2>&1
# build with krava.h include
touch ex/krava.h
make -C ex V=1 CFLAGS=-DINCLUDE >> ex.out 2>&1
if [ ! -x ./ex/ex ]; then
echo FAILED
exit -1
fi
# build without the include
rm -f ex/krava.h ex/ex
make -C ex V=1 >> ex.out 2>&1
if [ ! -x ./ex/ex ]; then
echo FAILED
exit -1
fi
make -C ex V=1 clean > /dev/null 2>&1
rm -f ex.out
}
echo -n Testing..
test_ex
test_ex_suffix
test_ex_include
echo OK
......@@ -43,13 +43,29 @@
#include <linux/types.h>
/*
* Following functions are taken from kernel sources and
* break aliasing rules in their original form.
*
* While the kernel is compiled with -fno-strict-aliasing,
* perf uses -Wstrict-aliasing=3, which makes the build fail
* under gcc 4.4.
*
* Using an extra __may_alias__ type allows aliasing
* in this case.
*/
typedef __u8 __attribute__((__may_alias__)) __u8_alias_t;
typedef __u16 __attribute__((__may_alias__)) __u16_alias_t;
typedef __u32 __attribute__((__may_alias__)) __u32_alias_t;
typedef __u64 __attribute__((__may_alias__)) __u64_alias_t;
static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(__u8 *)res = *(volatile __u8 *)p; break;
case 2: *(__u16 *)res = *(volatile __u16 *)p; break;
case 4: *(__u32 *)res = *(volatile __u32 *)p; break;
case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
case 1: *(__u8_alias_t *) res = *(volatile __u8_alias_t *) p; break;
case 2: *(__u16_alias_t *) res = *(volatile __u16_alias_t *) p; break;
case 4: *(__u32_alias_t *) res = *(volatile __u32_alias_t *) p; break;
case 8: *(__u64_alias_t *) res = *(volatile __u64_alias_t *) p; break;
default:
barrier();
__builtin_memcpy((void *)res, (const void *)p, size);
......@@ -60,10 +76,10 @@ static __always_inline void __read_once_size(const volatile void *p, void *res,
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
case 1: *(volatile __u8_alias_t *) p = *(__u8_alias_t *) res; break;
case 2: *(volatile __u16_alias_t *) p = *(__u16_alias_t *) res; break;
case 4: *(volatile __u32_alias_t *) p = *(__u32_alias_t *) res; break;
case 8: *(volatile __u64_alias_t *) p = *(__u64_alias_t *) res; break;
default:
barrier();
__builtin_memcpy((void *)p, (const void *)res, size);
......
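A minimal standalone sketch of the aliasing fix above (the helper and type names below are local stand-ins, not part of this diff; only the __may_alias__ mechanics mirror it). Per the comment in the change, the attribute is what keeps perf's -Wstrict-aliasing=3 build from failing under gcc 4.4:

    #include <stdio.h>

    typedef unsigned int u32;
    /* reads through this type are exempt from strict-aliasing analysis */
    typedef u32 __attribute__((__may_alias__)) u32_alias_t;

    /* mirrors the size == 4 case of __read_once_size() above */
    static inline u32 read_once_u32(const volatile void *p)
    {
    	return *(const volatile u32_alias_t *)p;
    }

    int main(void)
    {
    	u32 v = 42;

    	printf("%u\n", read_once_u32(&v));
    	return 0;
    }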
#ifndef __TOOLS_LINUX_ERR_H
#define __TOOLS_LINUX_ERR_H
#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/errno.h>
/*
* Original kernel header comment:
*
* Kernel pointers have redundant information, so we can use a
* scheme where we can return either an error code or a normal
* pointer with the same return value.
*
* This should be a per-architecture thing, to allow different
* error and pointer decisions.
*
* Userspace note:
* The same principle works for userspace, because 'error' pointers
* fall down to the unused hole far from user space, as described
* in Documentation/x86/x86_64/mm.txt for x86_64 arch:
*
* 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm hole caused by [48:63] sign extension
* ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
*
* It should be the same case for other architectures, because
* this code is used in generic kernel code.
*/
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
static inline void * __must_check ERR_PTR(long error_)
{
return (void *) error_;
}
static inline long __must_check PTR_ERR(__force const void *ptr)
{
return (long) ptr;
}
static inline bool __must_check IS_ERR(__force const void *ptr)
{
return IS_ERR_VALUE((unsigned long)ptr);
}
#endif /* __TOOLS_LINUX_ERR_H */
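As a usage sketch, this scheme lets one pointer return value carry either a valid object or a negative errno. A self-contained userspace example (the re-definitions below stand in for the header so the snippet builds alone; get_buffer() is hypothetical):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ERRNO 4095

    static inline void *ERR_PTR(long error) { return (void *)error; }
    static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
    static inline int IS_ERR(const void *ptr)
    {
    	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
    }

    /* return a buffer, or the reason we could not get one */
    static void *get_buffer(size_t size)
    {
    	void *buf = malloc(size);

    	return buf ? buf : ERR_PTR(-ENOMEM);
    }

    int main(void)
    {
    	void *buf = get_buffer(64);

    	if (IS_ERR(buf)) {
    		fprintf(stderr, "error: %ld\n", PTR_ERR(buf));
    		return 1;
    	}
    	free(buf);
    	return 0;
    }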
/*
* Linux Socket Filter Data Structures
*/
#ifndef __TOOLS_LINUX_FILTER_H
#define __TOOLS_LINUX_FILTER_H
#include <linux/bpf.h>
/* ArgX, context and stack frame pointer register positions. Note,
* Arg1, Arg2, Arg3, etc are used as argument mappings of function
* calls in BPF_CALL instruction.
*/
#define BPF_REG_ARG1 BPF_REG_1
#define BPF_REG_ARG2 BPF_REG_2
#define BPF_REG_ARG3 BPF_REG_3
#define BPF_REG_ARG4 BPF_REG_4
#define BPF_REG_ARG5 BPF_REG_5
#define BPF_REG_CTX BPF_REG_6
#define BPF_REG_FP BPF_REG_10
/* Additional register mappings for converted user programs. */
#define BPF_REG_A BPF_REG_0
#define BPF_REG_X BPF_REG_7
#define BPF_REG_TMP BPF_REG_8
/* BPF program can access up to 512 bytes of stack space. */
#define MAX_BPF_STACK 512
/* Helper macros for filter block array initializers. */
/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
#define BPF_ALU64_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_ALU32_REG(OP, DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
#define BPF_ALU64_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_ALU32_IMM(OP, DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Endianness conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
#define BPF_ENDIAN(TYPE, DST, LEN) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = LEN })
/* Short form of mov, dst_reg = src_reg */
#define BPF_MOV64_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
#define BPF_MOV32_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = 0 })
/* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
#define BPF_MOV32_IMM(DST, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = IMM })
#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \
.dst_reg = DST, \
.src_reg = SRC, \
.off = 0, \
.imm = IMM })
/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
#define BPF_LD_ABS(SIZE, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = IMM })
/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
#define BPF_LD_IND(SIZE, SRC, IMM) \
((struct bpf_insn) { \
.code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \
.dst_reg = 0, \
.src_reg = SRC, \
.off = 0, \
.imm = IMM })
/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
((struct bpf_insn) { \
.code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
#define BPF_JMP_REG(OP, DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_OP(OP) | BPF_K, \
.dst_reg = DST, \
.src_reg = 0, \
.off = OFF, \
.imm = IMM })
/* Function call */
#define BPF_EMIT_CALL(FUNC) \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_CALL, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = ((FUNC) - BPF_FUNC_unspec) })
/* Raw code statement block */
#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
((struct bpf_insn) { \
.code = CODE, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = IMM })
/* Program exit */
#define BPF_EXIT_INSN() \
((struct bpf_insn) { \
.code = BPF_JMP | BPF_EXIT, \
.dst_reg = 0, \
.src_reg = 0, \
.off = 0, \
.imm = 0 })
#endif /* __TOOLS_LINUX_FILTER_H */
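As a usage sketch (not part of this diff), these initializer macros let a tool hand-assemble an eBPF program as a plain instruction array; the two instructions below form the canonical "return 1" program. Assumes the tools/include paths so that <linux/bpf.h> and the header above resolve:

    #include <stdio.h>
    #include <linux/bpf.h>
    #include <linux/filter.h>

    int main(void)
    {
    	/* r0 = 1; exit */
    	struct bpf_insn prog[] = {
    		BPF_MOV64_IMM(BPF_REG_0, 1),
    		BPF_EXIT_INSN(),
    	};

    	/* a real loader would now feed prog to the bpf(2) syscall */
    	printf("%zu instructions\n", sizeof(prog) / sizeof(prog[0]));
    	return 0;
    }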
libapi-y += fd/
libapi-y += fs/
libapi-y += cpu.o
......@@ -21,12 +21,14 @@ CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
RM = rm -f
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
API_IN := $(OUTPUT)libapi-in.o
all:
export srctree OUTPUT CC LD CFLAGS V
include $(srctree)/tools/build/Makefile.include
all: $(LIBFILE)
all: fixdep $(LIBFILE)
$(API_IN): FORCE
@$(MAKE) $(build)=libapi
......
#include <stdio.h>
#include "cpu.h"
#include "fs/fs.h"
int cpu__get_max_freq(unsigned long long *freq)
{
char entry[PATH_MAX];
int cpu;
if (sysfs__read_int("devices/system/cpu/online", &cpu) < 0)
return -1;
snprintf(entry, sizeof(entry),
"devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", cpu);
return sysfs__read_ull(entry, freq);
}
#ifndef __API_CPU__
#define __API_CPU__
int cpu__get_max_freq(unsigned long long *freq);
#endif /* __API_CPU__ */
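A hypothetical caller of the new helper (assumes building against libapi so "cpu.h" resolves; cpuinfo_max_freq is reported by sysfs in kHz):

    #include <stdio.h>
    #include "cpu.h"

    int main(void)
    {
    	unsigned long long freq;

    	if (cpu__get_max_freq(&freq) < 0) {
    		fprintf(stderr, "cannot read cpuinfo_max_freq\n");
    		return 1;
    	}
    	printf("max freq: %llu kHz\n", freq);
    	return 0;
    }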
libapi-y += fs.o
libapi-y += debugfs.o
libapi-y += findfs.o
libapi-y += tracefs.o
libapi-y += tracing_path.o
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/vfs.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <linux/kernel.h>
#include "debugfs.h"
#include "tracefs.h"
#ifndef DEBUGFS_DEFAULT_PATH
#define DEBUGFS_DEFAULT_PATH "/sys/kernel/debug"
#endif
char debugfs_mountpoint[PATH_MAX + 1] = DEBUGFS_DEFAULT_PATH;
static const char * const debugfs_known_mountpoints[] = {
DEBUGFS_DEFAULT_PATH,
"/debug",
0,
};
static bool debugfs_found;
bool debugfs_configured(void)
{
return debugfs_find_mountpoint() != NULL;
}
/* find the path to the mounted debugfs */
const char *debugfs_find_mountpoint(void)
{
const char *ret;
if (debugfs_found)
return (const char *)debugfs_mountpoint;
ret = find_mountpoint("debugfs", (long) DEBUGFS_MAGIC,
debugfs_mountpoint, PATH_MAX + 1,
debugfs_known_mountpoints);
if (ret)
debugfs_found = true;
return ret;
}
/* mount the debugfs somewhere if it's not mounted */
char *debugfs_mount(const char *mountpoint)
{
/* see if it's already mounted */
if (debugfs_find_mountpoint())
goto out;
/* if not mounted and no argument */
if (mountpoint == NULL) {
/* see if environment variable set */
mountpoint = getenv(PERF_DEBUGFS_ENVIRONMENT);
/* if no environment variable, use default */
if (mountpoint == NULL)
mountpoint = DEBUGFS_DEFAULT_PATH;
}
if (mount(NULL, mountpoint, "debugfs", 0, NULL) < 0)
return NULL;
/* save the mountpoint */
debugfs_found = true;
strncpy(debugfs_mountpoint, mountpoint, sizeof(debugfs_mountpoint));
out:
return debugfs_mountpoint;
}
int debugfs__strerror_open(int err, char *buf, size_t size, const char *filename)
{
char sbuf[128];
switch (err) {
case ENOENT:
if (debugfs_found) {
snprintf(buf, size,
"Error:\tFile %s/%s not found.\n"
"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
debugfs_mountpoint, filename);
break;
}
snprintf(buf, size, "%s",
"Error:\tUnable to find debugfs\n"
"Hint:\tWas your kernel compiled with debugfs support?\n"
"Hint:\tIs the debugfs filesystem mounted?\n"
"Hint:\tTry 'sudo mount -t debugfs nodev /sys/kernel/debug'");
break;
case EACCES: {
const char *mountpoint = debugfs_mountpoint;
if (!access(debugfs_mountpoint, R_OK) && strncmp(filename, "tracing/", 8) == 0) {
const char *tracefs_mntpoint = tracefs_find_mountpoint();
if (tracefs_mntpoint)
mountpoint = tracefs_mntpoint;
}
snprintf(buf, size,
"Error:\tNo permissions to read %s/%s\n"
"Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
debugfs_mountpoint, filename, mountpoint);
}
break;
default:
snprintf(buf, size, "%s", strerror_r(err, sbuf, sizeof(sbuf)));
break;
}
return 0;
}
int debugfs__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name)
{
char path[PATH_MAX];
snprintf(path, PATH_MAX, "tracing/events/%s/%s", sys, name ?: "*");
return debugfs__strerror_open(err, buf, size, path);
}
#ifndef __API_DEBUGFS_H__
#define __API_DEBUGFS_H__
#include "findfs.h"
#ifndef DEBUGFS_MAGIC
#define DEBUGFS_MAGIC 0x64626720
#endif
#ifndef PERF_DEBUGFS_ENVIRONMENT
#define PERF_DEBUGFS_ENVIRONMENT "PERF_DEBUGFS_DIR"
#endif
bool debugfs_configured(void);
const char *debugfs_find_mountpoint(void);
char *debugfs_mount(const char *mountpoint);
extern char debugfs_mountpoint[];
int debugfs__strerror_open(int err, char *buf, size_t size, const char *filename);
int debugfs__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name);
#endif /* __API_DEBUGFS_H__ */
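A short, hypothetical consumer showing the intended error-reporting flow of this API ("tracing" is just an example path under the mount point):

    #include <errno.h>
    #include <stdio.h>
    #include "debugfs.h"

    int main(void)
    {
    	char buf[512];

    	if (!debugfs_configured()) {
    		debugfs__strerror_open(ENOENT, buf, sizeof(buf), "tracing");
    		fprintf(stderr, "%s\n", buf);
    		return 1;
    	}
    	printf("debugfs mounted at %s\n", debugfs_find_mountpoint());
    	return 0;
    }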
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <sys/vfs.h>
#include "findfs.h"
/* verify that a mountpoint is actually the type we want */
int valid_mountpoint(const char *mount, long magic)
{
struct statfs st_fs;
if (statfs(mount, &st_fs) < 0)
return -ENOENT;
else if ((long)st_fs.f_type != magic)
return -ENOENT;
return 0;
}
/* find the path to a mounted file system */
const char *find_mountpoint(const char *fstype, long magic,
char *mountpoint, int len,
const char * const *known_mountpoints)
{
const char * const *ptr;
char format[128];
char type[100];
FILE *fp;
if (known_mountpoints) {
ptr = known_mountpoints;
while (*ptr) {
if (valid_mountpoint(*ptr, magic) == 0) {
strncpy(mountpoint, *ptr, len - 1);
mountpoint[len-1] = 0;
return mountpoint;
}
ptr++;
}
}
/* give up and parse /proc/mounts */
fp = fopen("/proc/mounts", "r");
if (fp == NULL)
return NULL;
snprintf(format, 128, "%%*s %%%ds %%99s %%*s %%*d %%*d\n", len - 1);
while (fscanf(fp, format, mountpoint, type) == 2) {
if (strcmp(type, fstype) == 0)
break;
}
fclose(fp);
if (strcmp(type, fstype) != 0)
return NULL;
return mountpoint;
}
#ifndef __API_FINDFS_H__
#define __API_FINDFS_H__
#include <stdbool.h>
#define _STR(x) #x
#define STR(x) _STR(x)
/*
* On most systems <limits.h> would have given us this, but not on some systems
* (e.g. GNU/Hurd).
*/
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
const char *find_mountpoint(const char *fstype, long magic,
char *mountpoint, int len,
const char * const *known_mountpoints);
int valid_mountpoint(const char *mount, long magic);
#endif /* __API_FINDFS_H__ */
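For illustration, find_mountpoint() can also be driven directly; this hypothetical snippet probes for debugfs with the same magic number and candidate list used by debugfs.c above:

    #include <stdio.h>
    #include "findfs.h"

    #define DEBUGFS_MAGIC 0x64626720

    int main(void)
    {
    	static const char * const known[] = {
    		"/sys/kernel/debug", "/debug", 0,
    	};
    	char path[PATH_MAX];
    	const char *mnt = find_mountpoint("debugfs", (long)DEBUGFS_MAGIC,
    					  path, PATH_MAX, known);

    	printf("debugfs: %s\n", mnt ?: "not mounted");
    	return 0;
    }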
/* TODO merge/factor in debugfs.c here */
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
......@@ -11,10 +10,29 @@
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mount.h>
#include "debugfs.h"
#include "fs.h"
#define _STR(x) #x
#define STR(x) _STR(x)
#ifndef SYSFS_MAGIC
#define SYSFS_MAGIC 0x62656572
#endif
#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0
#endif
#ifndef DEBUGFS_MAGIC
#define DEBUGFS_MAGIC 0x64626720
#endif
#ifndef TRACEFS_MAGIC
#define TRACEFS_MAGIC 0x74726163
#endif
static const char * const sysfs__fs_known_mountpoints[] = {
"/sys",
0,
......@@ -25,19 +43,48 @@ static const char * const procfs__known_mountpoints[] = {
0,
};
#ifndef DEBUGFS_DEFAULT_PATH
#define DEBUGFS_DEFAULT_PATH "/sys/kernel/debug"
#endif
static const char * const debugfs__known_mountpoints[] = {
DEBUGFS_DEFAULT_PATH,
"/debug",
0,
};
#ifndef TRACEFS_DEFAULT_PATH
#define TRACEFS_DEFAULT_PATH "/sys/kernel/tracing"
#endif
static const char * const tracefs__known_mountpoints[] = {
TRACEFS_DEFAULT_PATH,
"/sys/kernel/debug/tracing",
"/tracing",
"/trace",
0,
};
struct fs {
const char *name;
const char * const *mounts;
char path[PATH_MAX + 1];
char path[PATH_MAX];
bool found;
long magic;
};
enum {
FS__SYSFS = 0,
FS__PROCFS = 1,
FS__SYSFS = 0,
FS__PROCFS = 1,
FS__DEBUGFS = 2,
FS__TRACEFS = 3,
};
#ifndef TRACEFS_MAGIC
#define TRACEFS_MAGIC 0x74726163
#endif
static struct fs fs__entries[] = {
[FS__SYSFS] = {
.name = "sysfs",
......@@ -49,6 +96,16 @@ static struct fs fs__entries[] = {
.mounts = procfs__known_mountpoints,
.magic = PROC_SUPER_MAGIC,
},
[FS__DEBUGFS] = {
.name = "debugfs",
.mounts = debugfs__known_mountpoints,
.magic = DEBUGFS_MAGIC,
},
[FS__TRACEFS] = {
.name = "tracefs",
.mounts = tracefs__known_mountpoints,
.magic = TRACEFS_MAGIC,
},
};
static bool fs__read_mounts(struct fs *fs)
......@@ -159,14 +216,54 @@ static const char *fs__mountpoint(int idx)
return fs__get_mountpoint(fs);
}
#define FS__MOUNTPOINT(name, idx) \
const char *name##__mountpoint(void) \
{ \
return fs__mountpoint(idx); \
static const char *mount_overload(struct fs *fs)
{
size_t name_len = strlen(fs->name);
/* "PERF_" + name + "_ENVIRONMENT" + '\0' */
char upper_name[5 + name_len + 12 + 1];
snprintf(upper_name, sizeof(upper_name), "PERF_%s_ENVIRONMENT", fs->name);
mem_toupper(upper_name + 5, name_len);
return getenv(upper_name) ?: *fs->mounts;
}
static const char *fs__mount(int idx)
{
struct fs *fs = &fs__entries[idx];
const char *mountpoint;
if (fs__mountpoint(idx))
return (const char *)fs->path;
mountpoint = mount_overload(fs);
if (mount(NULL, mountpoint, fs->name, 0, NULL) < 0)
return NULL;
return fs__check_mounts(fs) ? fs->path : NULL;
}
#define FS(name, idx) \
const char *name##__mountpoint(void) \
{ \
return fs__mountpoint(idx); \
} \
\
const char *name##__mount(void) \
{ \
return fs__mount(idx); \
} \
\
bool name##__configured(void) \
{ \
return name##__mountpoint() != NULL; \
}
FS__MOUNTPOINT(sysfs, FS__SYSFS);
FS__MOUNTPOINT(procfs, FS__PROCFS);
FS(sysfs, FS__SYSFS);
FS(procfs, FS__PROCFS);
FS(debugfs, FS__DEBUGFS);
FS(tracefs, FS__TRACEFS);
int filename__read_int(const char *filename, int *value)
{
......@@ -185,6 +282,50 @@ int filename__read_int(const char *filename, int *value)
return err;
}
int filename__read_ull(const char *filename, unsigned long long *value)
{
char line[64];
int fd = open(filename, O_RDONLY), err = -1;
if (fd < 0)
return -1;
if (read(fd, line, sizeof(line)) > 0) {
*value = strtoull(line, NULL, 10);
if (*value != ULLONG_MAX)
err = 0;
}
close(fd);
return err;
}
int sysfs__read_ull(const char *entry, unsigned long long *value)
{
char path[PATH_MAX];
const char *sysfs = sysfs__mountpoint();
if (!sysfs)
return -1;
snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
return filename__read_ull(path, value);
}
int sysfs__read_int(const char *entry, int *value)
{
char path[PATH_MAX];
const char *sysfs = sysfs__mountpoint();
if (!sysfs)
return -1;
snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
return filename__read_int(path, value);
}
int sysctl__read_int(const char *sysctl, int *value)
{
char path[PATH_MAX];
......
#ifndef __API_FS__
#define __API_FS__
#ifndef SYSFS_MAGIC
#define SYSFS_MAGIC 0x62656572
#endif
#include <stdbool.h>
#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0
#endif
/*
* On most systems <limits.h> would have given us this, but not on some systems
* (e.g. GNU/Hurd).
*/
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
const char *sysfs__mountpoint(void);
const char *procfs__mountpoint(void);
#define FS(name) \
const char *name##__mountpoint(void); \
const char *name##__mount(void); \
bool name##__configured(void); \
FS(sysfs)
FS(procfs)
FS(debugfs)
FS(tracefs)
#undef FS
int filename__read_int(const char *filename, int *value);
int filename__read_ull(const char *filename, unsigned long long *value);
int sysctl__read_int(const char *sysctl, int *value);
int sysfs__read_int(const char *entry, int *value);
int sysfs__read_ull(const char *entry, unsigned long long *value);
#endif /* __API_FS__ */
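A hypothetical caller of the FS()-generated API together with the new read helper (the online-cpu file is the same one cpu.c above parses):

    #include <stdio.h>
    #include "fs.h"

    int main(void)
    {
    	int cpu;

    	if (!sysfs__configured())
    		return 1;
    	if (sysfs__read_int("devices/system/cpu/online", &cpu) < 0)
    		return 1;
    	printf("sysfs at %s, first online cpu: %d\n",
    	       sysfs__mountpoint(), cpu);
    	return 0;
    }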
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/vfs.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <linux/kernel.h>
#include "tracefs.h"
#ifndef TRACEFS_DEFAULT_PATH
#define TRACEFS_DEFAULT_PATH "/sys/kernel/tracing"
#endif
char tracefs_mountpoint[PATH_MAX + 1] = TRACEFS_DEFAULT_PATH;
static const char * const tracefs_known_mountpoints[] = {
TRACEFS_DEFAULT_PATH,
"/sys/kernel/debug/tracing",
"/tracing",
"/trace",
0,
};
static bool tracefs_found;
bool tracefs_configured(void)
{
return tracefs_find_mountpoint() != NULL;
}
/* find the path to the mounted tracefs */
const char *tracefs_find_mountpoint(void)
{
const char *ret;
if (tracefs_found)
return (const char *)tracefs_mountpoint;
ret = find_mountpoint("tracefs", (long) TRACEFS_MAGIC,
tracefs_mountpoint, PATH_MAX + 1,
tracefs_known_mountpoints);
if (ret)
tracefs_found = true;
return ret;
}
/* mount the tracefs somewhere if it's not mounted */
char *tracefs_mount(const char *mountpoint)
{
/* see if it's already mounted */
if (tracefs_find_mountpoint())
goto out;
/* if not mounted and no argument */
if (mountpoint == NULL) {
/* see if environment variable set */
mountpoint = getenv(PERF_TRACEFS_ENVIRONMENT);
/* if no environment variable, use default */
if (mountpoint == NULL)
mountpoint = TRACEFS_DEFAULT_PATH;
}
if (mount(NULL, mountpoint, "tracefs", 0, NULL) < 0)
return NULL;
/* save the mountpoint */
tracefs_found = true;
strncpy(tracefs_mountpoint, mountpoint, sizeof(tracefs_mountpoint));
out:
return tracefs_mountpoint;
}
#ifndef __API_TRACEFS_H__
#define __API_TRACEFS_H__
#include "findfs.h"
#ifndef TRACEFS_MAGIC
#define TRACEFS_MAGIC 0x74726163
#endif
#ifndef PERF_TRACEFS_ENVIRONMENT
#define PERF_TRACEFS_ENVIRONMENT "PERF_TRACEFS_DIR"
#endif
bool tracefs_configured(void);
const char *tracefs_find_mountpoint(void);
int tracefs_valid_mountpoint(const char *debugfs);
char *tracefs_mount(const char *mountpoint);
extern char tracefs_mountpoint[];
#endif /* __API_TRACEFS_H__ */
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include "fs.h"
#include "tracing_path.h"
char tracing_mnt[PATH_MAX] = "/sys/kernel/debug";
char tracing_path[PATH_MAX] = "/sys/kernel/debug/tracing";
char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events";
static void __tracing_path_set(const char *tracing, const char *mountpoint)
{
snprintf(tracing_mnt, sizeof(tracing_mnt), "%s", mountpoint);
snprintf(tracing_path, sizeof(tracing_path), "%s/%s",
mountpoint, tracing);
snprintf(tracing_events_path, sizeof(tracing_events_path), "%s/%s%s",
mountpoint, tracing, "events");
}
static const char *tracing_path_tracefs_mount(void)
{
const char *mnt;
mnt = tracefs__mount();
if (!mnt)
return NULL;
__tracing_path_set("", mnt);
return mnt;
}
static const char *tracing_path_debugfs_mount(void)
{
const char *mnt;
mnt = debugfs__mount();
if (!mnt)
return NULL;
__tracing_path_set("tracing/", mnt);
return mnt;
}
const char *tracing_path_mount(void)
{
const char *mnt;
mnt = tracing_path_tracefs_mount();
if (mnt)
return mnt;
mnt = tracing_path_debugfs_mount();
return mnt;
}
void tracing_path_set(const char *mntpt)
{
__tracing_path_set("tracing/", mntpt);
}
char *get_tracing_file(const char *name)
{
char *file;
if (asprintf(&file, "%s/%s", tracing_path, name) < 0)
return NULL;
return file;
}
void put_tracing_file(char *file)
{
free(file);
}
static int strerror_open(int err, char *buf, size_t size, const char *filename)
{
char sbuf[128];
switch (err) {
case ENOENT:
/*
* We will get here if we can't find the tracepoint, but one of
* debugfs or tracefs is configured, which means you probably
* want some tracepoint which wasn't compiled in your kernel.
* - jirka
*/
if (debugfs__configured() || tracefs__configured()) {
snprintf(buf, size,
"Error:\tFile %s/%s not found.\n"
"Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
tracing_events_path, filename);
break;
}
snprintf(buf, size, "%s",
"Error:\tUnable to find debugfs/tracefs\n"
"Hint:\tWas your kernel compiled with debugfs/tracefs support?\n"
"Hint:\tIs the debugfs/tracefs filesystem mounted?\n"
"Hint:\tTry 'sudo mount -t debugfs nodev /sys/kernel/debug'");
break;
case EACCES: {
snprintf(buf, size,
"Error:\tNo permissions to read %s/%s\n"
"Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
tracing_events_path, filename, tracing_mnt);
}
break;
default:
snprintf(buf, size, "%s", strerror_r(err, sbuf, sizeof(sbuf)));
break;
}
return 0;
}
int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name)
{
char path[PATH_MAX];
snprintf(path, PATH_MAX, "%s/%s", sys, name ?: "*");
return strerror_open(err, buf, size, path);
}
#ifndef __API_FS_TRACING_PATH_H
#define __API_FS_TRACING_PATH_H
#include <linux/types.h>
extern char tracing_path[];
extern char tracing_events_path[];
void tracing_path_set(const char *mountpoint);
const char *tracing_path_mount(void);
char *get_tracing_file(const char *name);
void put_tracing_file(char *file);
int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name);
#endif /* __API_FS_TRACING_PATH_H */
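A hypothetical user of the tracing_path API; per the implementation above, tracing_path_mount() prefers tracefs and falls back to debugfs' tracing/ directory:

    #include <stdio.h>
    #include "tracing_path.h"

    int main(void)
    {
    	char *file;

    	if (!tracing_path_mount()) {
    		fprintf(stderr, "no tracefs or debugfs available\n");
    		return 1;
    	}
    	file = get_tracing_file("available_events");
    	if (!file)
    		return 1;
    	printf("events list: %s\n", file);
    	put_tracing_file(file);
    	return 0;
    }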
......@@ -64,8 +64,9 @@ srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif
FEATURE_DISPLAY = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_TESTS = libelf bpf
FEATURE_USER = .libbpf
FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_DISPLAY = libelf bpf
INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
......@@ -122,8 +123,10 @@ endif
# the same command line setup.
MAKEOVERRIDES=
all:
export srctree OUTPUT CC LD CFLAGS V
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
include $(srctree)/tools/build/Makefile.include
BPF_IN := $(OUTPUT)libbpf-in.o
LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
......@@ -132,7 +135,7 @@ CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
all: $(VERSION_FILES) all_cmd
all: fixdep $(VERSION_FILES) all_cmd
all_cmd: $(CMD_TARGETS)
......
......@@ -93,8 +93,10 @@ else
print_install = echo ' INSTALL '$1' to $(DESTDIR_SQ)$2';
endif
all:
export srctree OUTPUT CC LD CFLAGS V
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
include $(srctree)/tools/build/Makefile.include
do_compile_shared_library = \
($(print_shared_lib_compile) \
......@@ -109,7 +111,7 @@ CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
all: all_cmd
all: fixdep all_cmd
all_cmd: $(CMD_TARGETS)
......
......@@ -2,6 +2,12 @@
#include <stdio.h>
#include <stdlib.h>
u8 kallsyms2elf_type(char type)
{
type = tolower(type);
return (type == 't' || type == 'w') ? STT_FUNC : STT_OBJECT;
}
int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
char type, u64 start))
......
......@@ -9,7 +9,7 @@
#define KSYM_NAME_LEN 256
#endif
static inline u8 kallsyms2elf_type(char type)
static inline u8 kallsyms2elf_binding(char type)
{
if (type == 'W')
return STB_WEAK;
......@@ -17,6 +17,8 @@ static inline u8 kallsyms2elf_type(char type)
return isupper(type) ? STB_GLOBAL : STB_LOCAL;
}
u8 kallsyms2elf_type(char type);
int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
char type, u64 start));
......
......@@ -848,6 +848,7 @@ static void free_arg(struct print_arg *arg)
free(arg->bitmask.bitmask);
break;
case PRINT_DYNAMIC_ARRAY:
case PRINT_DYNAMIC_ARRAY_LEN:
free(arg->dynarray.index);
break;
case PRINT_OP:
......@@ -2728,6 +2729,42 @@ process_dynamic_array(struct event_format *event, struct print_arg *arg, char **
return EVENT_ERROR;
}
static enum event_type
process_dynamic_array_len(struct event_format *event, struct print_arg *arg,
char **tok)
{
struct format_field *field;
enum event_type type;
char *token;
if (read_expect_type(EVENT_ITEM, &token) < 0)
goto out_free;
arg->type = PRINT_DYNAMIC_ARRAY_LEN;
/* Find the field */
field = pevent_find_field(event, token);
if (!field)
goto out_free;
arg->dynarray.field = field;
arg->dynarray.index = 0;
if (read_expected(EVENT_DELIM, ")") < 0)
goto out_err;
type = read_token(&token);
*tok = token;
return type;
out_free:
free_token(token);
out_err:
*tok = NULL;
return EVENT_ERROR;
}
static enum event_type
process_paren(struct event_format *event, struct print_arg *arg, char **tok)
{
......@@ -2975,6 +3012,10 @@ process_function(struct event_format *event, struct print_arg *arg,
free_token(token);
return process_dynamic_array(event, arg, tok);
}
if (strcmp(token, "__get_dynamic_array_len") == 0) {
free_token(token);
return process_dynamic_array_len(event, arg, tok);
}
func = find_func_handler(event->pevent, token);
if (func) {
......@@ -3655,14 +3696,25 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg
goto out_warning_op;
}
break;
case PRINT_DYNAMIC_ARRAY_LEN:
offset = pevent_read_number(pevent,
data + arg->dynarray.field->offset,
arg->dynarray.field->size);
/*
* The total allocated length of the dynamic array is
* stored in the top half of the field, and the offset
* is in the bottom half of the 32 bit field.
*/
val = (unsigned long long)(offset >> 16);
break;
case PRINT_DYNAMIC_ARRAY:
/* Without [], we pass the address to the dynamic data */
offset = pevent_read_number(pevent,
data + arg->dynarray.field->offset,
arg->dynarray.field->size);
/*
* The actual length of the dynamic array is stored
* in the top half of the field, and the offset
* The total allocated length of the dynamic array is
* stored in the top half of the field, and the offset
* is in the bottom half of the 32 bit field.
*/
offset &= 0xffff;
......@@ -4853,8 +4905,8 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct event
else
ls = 2;
if (*(ptr+1) == 'F' ||
*(ptr+1) == 'f') {
if (*(ptr+1) == 'F' || *(ptr+1) == 'f' ||
*(ptr+1) == 'S' || *(ptr+1) == 's') {
ptr++;
show_func = *ptr;
} else if (*(ptr+1) == 'M' || *(ptr+1) == 'm') {
......
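The field encoding the two comments above describe can be sketched in isolation (the raw value below is invented for illustration):

    #include <stdio.h>

    int main(void)
    {
    	/* a dynamic array descriptor packs: high 16 bits = allocated
    	 * length, low 16 bits = offset of the data within the record */
    	unsigned int field = (12u << 16) | 0x40u;
    	unsigned int len = field >> 16;
    	unsigned int offset = field & 0xffff;

    	printf("len=%u offset=0x%x\n", len, offset);
    	return 0;
    }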
......@@ -294,6 +294,7 @@ enum print_arg_type {
PRINT_OP,
PRINT_FUNC,
PRINT_BITMASK,
PRINT_DYNAMIC_ARRAY_LEN,
};
struct print_arg {
......
......@@ -124,7 +124,10 @@ static const char *disassemble(unsigned char *insn, int len, uint64_t rip,
_ER(WBINVD, 54) \
_ER(XSETBV, 55) \
_ER(APIC_WRITE, 56) \
_ER(INVPCID, 58)
_ER(INVPCID, 58) \
_ER(PML_FULL, 62) \
_ER(XSAVES, 63) \
_ER(XRSTORS, 64)
#define SVM_EXIT_REASONS \
_ER(EXIT_READ_CR0, 0x000) \
......@@ -352,15 +355,18 @@ static int kvm_nested_vmexit_handler(struct trace_seq *s, struct pevent_record *
union kvm_mmu_page_role {
unsigned word;
struct {
unsigned glevels:4;
unsigned level:4;
unsigned cr4_pae:1;
unsigned quadrant:2;
unsigned pad_for_nice_hex_output:6;
unsigned direct:1;
unsigned access:3;
unsigned invalid:1;
unsigned cr4_pge:1;
unsigned nxe:1;
unsigned cr0_wp:1;
unsigned smep_and_not_wp:1;
unsigned smap_and_not_wp:1;
unsigned pad_for_nice_hex_output:8;
unsigned smm:8;
};
};
......@@ -385,15 +391,18 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct pevent_record *record,
if (pevent_is_file_bigendian(event->pevent) ==
pevent_is_host_bigendian(event->pevent)) {
trace_seq_printf(s, "%u/%u q%u%s %s%s %spge %snxe",
trace_seq_printf(s, "%u q%u%s %s%s %spae %snxe %swp%s%s%s",
role.level,
role.glevels,
role.quadrant,
role.direct ? " direct" : "",
access_str[role.access],
role.invalid ? " invalid" : "",
role.cr4_pge ? "" : "!",
role.nxe ? "" : "!");
role.cr4_pae ? "" : "!",
role.nxe ? "" : "!",
role.cr0_wp ? "" : "!",
role.smep_and_not_wp ? " smep" : "",
role.smap_and_not_wp ? " smap" : "",
role.smm ? " smm" : "");
} else
trace_seq_printf(s, "WORD: %08x", role.word);
......
......@@ -671,6 +671,7 @@ The letters are:
e synthesize tracing error events
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
"Instructions" events look like they were recorded by "perf record -e
instructions".
......@@ -707,12 +708,26 @@ on the sample is *not* adjusted and reflects the last known value of TSC.
For Intel PT, the default period is 100us.
Setting it to a zero period means "as often as possible".
In the case of Intel PT that is the same as a period of 1 and a unit of
'instructions' (i.e. --itrace=i1i).
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified. e.g.
--itrace=ig32
--itrace=xg32
Also the number of last branch entries (default 64, max. 1024) for instructions or
transactions events can be specified. e.g.
--itrace=il10
--itrace=xl10
Note that last branch entries are cleared for each sample, so there is no overlap
from one sample to the next.
To disable trace decoding entirely, use the option --no-itrace.
......@@ -749,3 +764,32 @@ perf inject also accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.
perf inject --itrace -i perf.data -o perf.data.new
Below is an example of using Intel PT with autofdo. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.
$ gcc-5 -O3 sort.c -o sort_optimized
$ ./sort_optimized 30000
Bubble sorting array of 30000 elements
2254 ms
$ cat ~/.perfconfig
[intel-pt]
mispred-all
$ perf record -e intel_pt//u ./sort 3000
Bubble sorting array of 3000 elements
58 ms
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.939 MB perf.data ]
$ perf inject -i perf.data -o inj --itrace=i100usle --strip
$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
$ ./sort_autofdo 30000
Bubble sorting array of 30000 elements
2155 ms
Note there is currently no advantage to using Intel PT instead of LBR, but
that may change in the future if greater use is made of the data.
......@@ -6,6 +6,7 @@
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
......@@ -20,3 +21,6 @@
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
Also the number of last branch entries (default 64, max. 1024) for
instructions or transactions events can be specified.
......@@ -82,7 +82,7 @@ Be multi thread instead of multi process
Specify number of groups
-l::
--loop=::
--nr_loops=::
Specify number of loops
Example of *messaging*
......@@ -139,64 +139,48 @@ Suite for evaluating performance of simple memory copy in various ways.
Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-l::
--length::
Specify length of memory to copy (default: 1MB).
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).
-r::
--routine::
Specify routine to copy (default: default).
Available routines depend on the architecture.
-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.
-i::
--iterations::
-l::
--nr_loops::
Repeat memcpy invocation this number of times.
-c::
--cycle::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
-o::
--only-prefault::
Show only the result with page faults before memcpy.
-n::
--no-prefault::
Show only the result without page faults before memcpy.
*memset*::
Suite for evaluating performance of simple memory set in various ways.
Options of *memset*
^^^^^^^^^^^^^^^^^^^
-l::
--length::
Specify length of memory to set (default: 1MB).
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).
-r::
--routine::
Specify routine to set (default: default).
Available routines depend on the architecture.
-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.
-i::
--iterations::
-l::
--nr_loops::
Repeat memset invocation this number of times.
-c::
--cycle::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
-o::
--only-prefault::
Show only the result with page faults before memset.
-n::
--no-prefault::
Show only the result without page faults before memset.
SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
......
......@@ -50,6 +50,9 @@ OPTIONS
include::itrace.txt[]
--strip::
Use with --itrace to strip out non-synthesized events.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1]
......@@ -30,6 +30,7 @@ counted. The following modifiers exist:
G - guest counting (in KVM guests)
H - host counting (not in KVM guests)
p - precise level
P - use maximum detected precise level
S - read sample value (PERF_SAMPLE_READ)
D - pin the event to the PMU
......@@ -125,6 +126,8 @@ To limit the list use:
. If none of the above is matched, it will apply the supplied glob to all
events, printing the ones that match.
. As a last resort, it will do a substring search in all event names.
One or more types can be used at the same time, listing the events for the
types specified.
......
......@@ -144,7 +144,7 @@ OPTIONS
--call-graph::
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g.
implies -g. Default is "fp".
Allows specifying "fp" (frame pointer) or "dwarf"
(DWARF's CFI - Call Frame Information) or "lbr"
......@@ -154,13 +154,18 @@ OPTIONS
On some systems, where binaries are built with gcc
-fomit-frame-pointer, using the "fp" method will produce bogus
call graphs; "dwarf", if available (perf tools linked to
the libunwind library) should be used instead.
the libunwind or libdw library) should be used instead.
Using the "lbr" method doesn't require any compiler options. It
will produce call graphs from the hardware LBR registers. The
main limitation is that it is only available on new Intel
platforms, such as Haswell. It can only get the user call chain. It
doesn't work with branch stack sampling at the same time.
When "dwarf" recording is used, perf also records (user) stack dump
when sampled. Default size of the stack dump is 8192 (bytes).
User can change the size by passing the size after comma like
"--call-graph dwarf,4096".
-q::
--quiet::
Don't print any message, useful for scripting.
......@@ -236,6 +241,7 @@ following filters are defined:
- any_call: any function call or system call
- any_ret: any function return or system call return
- ind_call: any indirect branch
- call: direct calls, including far (to/from kernel) calls
- u: only when the branch target is at the user level
- k: only when the branch target is in the kernel
- hv: only when the target is at the hypervisor level
......@@ -308,6 +314,12 @@ This option sets the time out limit. The default value is 500 ms.
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.
--clang-path::
Path to clang binary to use for compiling BPF scriptlets.
--clang-opt::
Options passed to clang when compiling BPF scriptlets.
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
......@@ -29,7 +29,7 @@ OPTIONS
--show-nr-samples::
Show the number of samples for each symbol
--showcpuutilization::
--show-cpu-utilization::
Show sample percentage for different cpu modes.
-T::
......@@ -68,7 +68,7 @@ OPTIONS
--sort=::
Sort histogram entries by given key(s) - multiple keys can be specified
in CSV format. Following sort keys are available:
pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.
pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
Each key has following meaning:
......@@ -79,6 +79,7 @@ OPTIONS
- parent: name of function matched to the parent regex filter. Unmatched
entries are displayed as "[other]".
- cpu: cpu number the task ran at the time of sample
- socket: processor socket number the task ran at the time of sample
- srcline: filename and line number executed at the time of sample. The
DWARF debugging info must be provided.
- srcfile: file name of the source file of the same. Requires dwarf
......@@ -168,30 +169,40 @@ OPTIONS
--dump-raw-trace::
Dump raw trace in ASCII.
-g [type,min[,limit],order[,key][,branch]]::
--call-graph::
Display call chains using type, min percent threshold, optional print
limit and order.
type can be either:
-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key,branch>::
Display call chains using type, min percent threshold, print limit,
call order, sort key and branch. Note that the ordering of parameters is not
fixed, so any parameter can be given in an arbitrary order. One exception
is the print_limit, which should be preceded by threshold.
print_type can be either:
- flat: single column, linear exposure of call chains.
- graph: use a graph tree, displaying absolute overhead rates.
- graph: use a graph tree, displaying absolute overhead rates. (default)
- fractal: like graph, but displays relative rates. Each branch of
the tree is considered as a new profiled object. +
the tree is considered as a new profiled object.
- none: disable call chain display.
threshold is a percentage value which specifies a minimum percent to be
included in the output call graph. Default is 0.5 (%).
print_limit is only applied when the stdio interface is used. It's to limit
the number of call graph entries in a single hist entry. Note that it needs
to be given after threshold (but not necessarily consecutively).
Default is 0 (unlimited).
order can be either:
- callee: callee based call graph.
- caller: inverted caller based call graph.
Default is 'caller' when --children is used, otherwise 'callee'.
key can be:
- function: compare on functions
sort_key can be:
- function: compare on functions (default)
- address: compare on individual code addresses
branch can be:
- branch: include last branch information in callgraph
when available. Usually more convenient to use --branch-history
for this.
Default: fractal,0.5,callee,function.
- branch: include last branch information in callgraph when available.
Usually more convenient to use --branch-history for this.
--children::
Accumulate callchain of children to parent entry so that they can
......@@ -204,6 +215,8 @@ OPTIONS
beyond the specified depth will be ignored. This is a trade-off
between information loss and faster processing especially for
workloads that can have a very long callchain stack.
Note that when using the --itrace option the synthesized callchain size
will override this value if the synthesized callchain size is bigger.
Default: 127
......@@ -349,6 +362,9 @@ include::itrace.txt[]
This option extends the perf report to show reference callgraphs,
which collected by reference event, in no callgraph event.
--socket-filter::
Only report the samples on the processor socket that matches this filter
include::callchain-overhead-calculation.txt[]
SEE ALSO
......
......@@ -112,11 +112,11 @@ OPTIONS
--debug-mode::
Do various checks like samples ordering and lost events.
-f::
-F::
--fields::
Comma separated list of fields to print. Options are:
comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
srcline, period, iregs, flags.
srcline, period, iregs, brstack, brstacksym, flags.
Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -f sw:comm,tid,time,ip,sym and -f trace:time,cpu,trace
......@@ -175,6 +175,16 @@ OPTIONS
Finally, a user may not set fields to none for all event types.
i.e., -f "" is not allowed.
The brstack output includes branch related information with raw addresses using the
/v/v/v/v/ syntax in the following order:
FROM: branch source instruction
TO : branch target instruction
M/P/-: M=branch target mispredicted or branch direction was mispredicted, P=target predicted or direction predicted, -=not supported
X/- : X=branch inside a transactional region, -=not in transaction region or not supported
A/- : A=TSX abort entry, -=not aborted region or not supported
The brstacksym is identical to brstack, except that the FROM and TO addresses are printed in a symbolic form if possible.
-k::
--vmlinux=<file>::
vmlinux pathname
......@@ -249,6 +259,9 @@ include::itrace.txt[]
--full-source-path::
Show the full path for source files for srcline output.
--ns::
Use 9 decimal places when displaying time (i.e. show the nanoseconds)
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],
......
......@@ -128,8 +128,9 @@ perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- m
-I msecs::
--interval-print msecs::
Print count deltas every N milliseconds (minimum: 100ms)
example: perf stat -I 1000 -e cycles -a sleep 5
Print count deltas every N milliseconds (minimum: 10ms)
The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution.
example: 'perf stat -I 1000 -e cycles -a sleep 5'
--per-socket::
Aggregate counts per processor socket for system-wide mode measurements. This
......
......@@ -160,9 +160,10 @@ Default is to monitor all CPUS.
-g::
Enables call-graph (stack chain/backtrace) recording.
--call-graph::
--call-graph [mode,type,min[,limit],order[,key][,branch]]::
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g.
implies -g. See `--call-graph` section in perf-record and
perf-report man pages for details.
--children::
Accumulate callchain of children to parent entry so that they can
......
......@@ -27,6 +27,14 @@ OPTIONS
Setup buildid cache directory. It has higher priority than
buildid.dir config file option.
-v::
--version::
Display perf version.
-h::
--help::
Run perf help command.
DESCRIPTION
-----------
Performance counters for Linux are a new kernel-based subsystem
......
......@@ -17,6 +17,7 @@ tools/build
tools/arch/x86/include/asm/atomic.h
tools/arch/x86/include/asm/rmwcc.h
tools/lib/traceevent
tools/lib/bpf
tools/lib/api
tools/lib/bpf
tools/lib/hweight.c
......@@ -41,6 +42,7 @@ tools/include/asm-generic/bitops.h
tools/include/linux/atomic.h
tools/include/linux/bitops.h
tools/include/linux/compiler.h
tools/include/linux/filter.h
tools/include/linux/hash.h
tools/include/linux/kernel.h
tools/include/linux/list.h
......@@ -49,6 +51,7 @@ tools/include/linux/poison.h
tools/include/linux/rbtree.h
tools/include/linux/rbtree_augmented.h
tools/include/linux/types.h
tools/include/linux/err.h
include/asm-generic/bitops/arch_hweight.h
include/asm-generic/bitops/const_hweight.h
include/asm-generic/bitops/fls64.h
......@@ -67,6 +70,8 @@ arch/*/lib/memset*.S
include/linux/poison.h
include/linux/hw_breakpoint.h
include/uapi/linux/perf_event.h
include/uapi/linux/bpf.h
include/uapi/linux/bpf_common.h
include/uapi/linux/const.h
include/uapi/linux/swab.h
include/uapi/linux/hw_breakpoint.h
......
......@@ -75,6 +75,8 @@ include config/utilities.mak
# Define NO_LZMA if you do not want to support compressed (xz) kernel modules
#
# Define NO_AUXTRACE if you do not want AUX area tracing support
#
# Define NO_LIBBPF if you do not want BPF support
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
......@@ -145,6 +147,7 @@ AWK = awk
LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
BPF_DIR = $(srctree)/tools/lib/bpf/
# include config/Makefile by default and rule out
# non-config cases
......@@ -180,6 +183,7 @@ strip-libs = $(filter-out -l%,$(1))
ifneq ($(OUTPUT),)
TE_PATH=$(OUTPUT)
BPF_PATH=$(OUTPUT)
ifneq ($(subdir),)
LIB_PATH=$(OUTPUT)/../lib/api/
else
......@@ -188,6 +192,7 @@ endif
else
TE_PATH=$(TRACE_EVENT_DIR)
LIB_PATH=$(LIB_DIR)
BPF_PATH=$(BPF_DIR)
endif
LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
......@@ -199,6 +204,8 @@ LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS = -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYN
LIBAPI = $(LIB_PATH)libapi.a
export LIBAPI
LIBBPF = $(BPF_PATH)libbpf.a
# python extension build directories
PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
......@@ -251,6 +258,9 @@ export PERL_PATH
LIB_FILE=$(OUTPUT)libperf.a
PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
ifndef NO_LIBBPF
PERFLIBS += $(LIBBPF)
endif
# We choose to avoid "if .. else if .. else .. endif endif"
# because maintaining the nesting to match is a pain. If
......@@ -297,16 +307,16 @@ strip: $(PROGRAMS) $(OUTPUT)perf
PERF_IN := $(OUTPUT)perf-in.o
export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
include $(srctree)/tools/build/Makefile.include
$(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
$(PERF_IN): prepare FORCE
$(Q)$(MAKE) $(build)=perf
$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN) $(LIBTRACEEVENT_DYNAMIC_LIST)
$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS) \
$(PERF_IN) $(LIBS) -o $@
$(GTK_IN): FORCE
$(GTK_IN): fixdep FORCE
$(Q)$(MAKE) $(build)=gtk
$(OUTPUT)libperf-gtk.so: $(GTK_IN) $(PERFLIBS)
......@@ -349,27 +359,27 @@ endif
__build-dir = $(subst $(OUTPUT),,$(dir $@))
build-dir = $(if $(__build-dir),$(__build-dir),.)
single_dep: $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h
prepare: $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h fixdep
$(OUTPUT)%.o: %.c single_dep FORCE
$(OUTPUT)%.o: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%.i: %.c single_dep FORCE
$(OUTPUT)%.i: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%.s: %.c single_dep FORCE
$(OUTPUT)%.s: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%-bison.o: %.c single_dep FORCE
$(OUTPUT)%-bison.o: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%-flex.o: %.c single_dep FORCE
$(OUTPUT)%-flex.o: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%.o: %.S single_dep FORCE
$(OUTPUT)%.o: %.S prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)%.i: %.S single_dep FORCE
$(OUTPUT)%.i: %.S prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
$(OUTPUT)perf-%: %.o $(PERFLIBS)
......@@ -389,7 +399,7 @@ $(patsubst perf-%,%.o,$(PROGRAMS)): $(wildcard */*.h)
LIBPERF_IN := $(OUTPUT)libperf-in.o
$(LIBPERF_IN): FORCE
$(LIBPERF_IN): fixdep FORCE
$(Q)$(MAKE) $(build)=libperf
$(LIB_FILE): $(LIBPERF_IN)
......@@ -397,10 +407,10 @@ $(LIB_FILE): $(LIBPERF_IN)
LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ)
$(LIBTRACEEVENT): FORCE
$(LIBTRACEEVENT): fixdep FORCE
$(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) $(OUTPUT)libtraceevent.a
libtraceevent_plugins: FORCE
libtraceevent_plugins: fixdep FORCE
$(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) plugins
$(LIBTRACEEVENT_DYNAMIC_LIST): libtraceevent_plugins
......@@ -413,13 +423,20 @@ $(LIBTRACEEVENT)-clean:
install-traceevent-plugins: $(LIBTRACEEVENT)
$(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) install_plugins
$(LIBAPI): FORCE
$(LIBAPI): fixdep FORCE
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) $(OUTPUT)libapi.a
$(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null
$(LIBBPF): fixdep FORCE
$(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
$(LIBBPF)-clean:
$(call QUIET_CLEAN, libbpf)
$(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
help:
@echo 'Perf make targets:'
@echo ' doc - make *all* documentation (see below)'
......@@ -459,7 +476,7 @@ INSTALL_DOC_TARGETS += quick-install-doc quick-install-man quick-install-html
$(DOC_TARGETS):
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all)
TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol
TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol ../include ../lib/bpf
TAG_FILES= ../../include/uapi/linux/perf_event.h
TAGS:
......@@ -567,7 +584,7 @@ config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean config-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
......@@ -591,6 +608,6 @@ FORCE:
.PHONY: all install clean config-clean strip install-gtk
.PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
.PHONY: $(GIT-HEAD-PHONY) TAGS tags cscope FORCE single_dep
.PHONY: $(GIT-HEAD-PHONY) TAGS tags cscope FORCE prepare
.PHONY: libtraceevent_plugins
......@@ -128,9 +128,8 @@ static const char *normalize_arch(char *arch)
return arch;
}
static int perf_session_env__lookup_binutils_path(struct perf_env *env,
const char *name,
const char **path)
static int perf_env__lookup_binutils_path(struct perf_env *env,
const char *name, const char **path)
{
int idx;
const char *arch, *cross_env;
......@@ -206,7 +205,7 @@ static int perf_session_env__lookup_binutils_path(struct perf_env *env,
return -1;
}
int perf_session_env__lookup_objdump(struct perf_env *env)
int perf_env__lookup_objdump(struct perf_env *env)
{
/*
* For live mode, env->arch will be NULL and we can use
......@@ -215,6 +214,5 @@ int perf_session_env__lookup_objdump(struct perf_env *env)
if (env->arch == NULL)
return 0;
-	return perf_session_env__lookup_binutils_path(env, "objdump",
-						      &objdump_path);
+	return perf_env__lookup_binutils_path(env, "objdump", &objdump_path);
}
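
The rename is purely mechanical (the perf_session_env__ prefix became perf_env__); the lookup logic is untouched. For readers unfamiliar with what such a lookup does, here is a rough, self-contained C sketch of the idea: turn a recorded arch into a cross-prefixed binutils tool name, letting a CROSS_COMPILE environment variable win. The triplet table and names below are illustrative assumptions, not perf's actual tables.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* illustrative triplet prefixes; perf keeps real per-arch tables */
static const char *cross_prefix(const char *arch)
{
	if (!strcmp(arch, "arm"))
		return "arm-linux-gnueabi-";
	if (!strcmp(arch, "powerpc"))
		return "powerpc64-linux-gnu-";
	return "";	/* host arch: plain "objdump" */
}

int main(void)
{
	const char *env_cross = getenv("CROSS_COMPILE");
	const char *arch = "arm";	/* stand-in for env->arch */
	char path[128];

	/* an explicit CROSS_COMPILE overrides the arch-derived prefix */
	snprintf(path, sizeof(path), "%sobjdump",
		 env_cross ? env_cross : cross_prefix(arch));
	printf("objdump binary: %s\n", path);
	return 0;
}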
#ifndef ARCH_PERF_COMMON_H
#define ARCH_PERF_COMMON_H
#include "../util/session.h"
#include "../util/env.h"
extern const char *objdump_path;
-int perf_session_env__lookup_objdump(struct perf_env *env);
+int perf_env__lookup_objdump(struct perf_env *env);
#endif /* ARCH_PERF_COMMON_H */
libperf-y += util/
-libperf-$(CONFIG_DWARF_UNWIND) += tests/
+libperf-y += tests/
@@ -2,3 +2,4 @@ ifndef NO_DWARF
PERF_HAVE_DWARF_REGS := 1
endif
HAVE_KVM_STAT_SUPPORT := 1
+PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
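
This new flag advertises that the arch implements regs_query_register_offset(), which 'perf probe' uses to map a register name to its offset in the arch's pt_regs. A minimal self-contained sketch of that service follows; the two-register dummy struct and table are made up for illustration, not the real x86 table.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct dummy_pt_regs { unsigned long ax, bx; };

struct pt_regs_offset {
	const char *name;
	int offset;
};

/* NULL-name entry terminates the table */
static const struct pt_regs_offset regoffset_table[] = {
	{ "ax", offsetof(struct dummy_pt_regs, ax) },
	{ "bx", offsetof(struct dummy_pt_regs, bx) },
	{ NULL, -1 },
};

static int regs_query_register_offset(const char *name)
{
	const struct pt_regs_offset *r;

	for (r = regoffset_table; r->name; r++)
		if (!strcmp(r->name, name))
			return r->offset;
	return -1;	/* unknown register name */
}

int main(void)
{
	printf("bx lives at offset %d\n", regs_query_register_offset("bx"));
	return 0;
}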
#ifndef ARCH_TESTS_H
#define ARCH_TESTS_H
/* Tests */
int test__rdpmc(void);
int test__perf_time_to_tsc(void);
int test__insn_x86(void);
int test__intel_cqm_count_nmi_context(void);
#ifdef HAVE_DWARF_UNWIND_SUPPORT
struct thread;
struct perf_sample;
int test__arch_unwind_sample(struct perf_sample *sample,
struct thread *thread);
#endif
extern struct test arch_tests[];
#endif
-libperf-y += regs_load.o
-libperf-y += dwarf-unwind.o
+libperf-$(CONFIG_DWARF_UNWIND) += regs_load.o
+libperf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
+libperf-y += arch-tests.o
+libperf-y += rdpmc.o
+libperf-y += perf-time-to-tsc.o
+libperf-$(CONFIG_AUXTRACE) += insn-x86.o
+libperf-y += intel-cqm.o
#include <string.h>
#include "tests/tests.h"
#include "arch-tests.h"
struct test arch_tests[] = {
{
.desc = "x86 rdpmc test",
.func = test__rdpmc,
},
{
.desc = "Test converting perf time to TSC",
.func = test__perf_time_to_tsc,
},
#ifdef HAVE_DWARF_UNWIND_SUPPORT
{
.desc = "Test dwarf unwind",
.func = test__dwarf_unwind,
},
#endif
#ifdef HAVE_AUXTRACE_SUPPORT
{
.desc = "Test x86 instruction decoder - new instructions",
.func = test__insn_x86,
},
#endif
{
.desc = "Test intel cqm nmi context read",
.func = test__intel_cqm_count_nmi_context,
},
{
.func = NULL,
},
};
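
The closing { .func = NULL } entry is a sentinel: the generic test runner walks the table until it hits a NULL function pointer. A self-contained sketch of that pattern (the runner below is illustrative, not perf's actual tests/builtin-test.c, and the struct mirrors only the .desc/.func pair used above):

#include <stdio.h>

struct test {
	const char *desc;
	int (*func)(void);
};

static int test_ok(void)   { return 0; }
static int test_fail(void) { return -1; }

static struct test demo_tests[] = {
	{ .desc = "always passes", .func = test_ok },
	{ .desc = "always fails",  .func = test_fail },
	{ .func = NULL },	/* sentinel: stops the runner */
};

int main(void)
{
	struct test *t;

	for (t = demo_tests; t->func; t++)
		printf("%-13s: %s\n", t->desc,
		       t->func() == 0 ? "Ok" : "FAILED");
	return 0;
}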
@@ -5,6 +5,7 @@
#include "event.h"
#include "debug.h"
#include "tests/tests.h"
#include "arch-tests.h"
#define STACK_SIZE 8192
...
#!/bin/awk -f
# gen-insn-x86-dat.awk: script to convert data for the insn-x86 test
# Copyright (c) 2015, Intel Corporation.
#
# This program is free software; you can redistribute it and/or modify it
# under the terms and conditions of the GNU General Public License,
# version 2, as published by the Free Software Foundation.
#
# This program is distributed in the hope it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
# more details.
BEGIN {
print "/*"
print " * Generated by gen-insn-x86-dat.sh and gen-insn-x86-dat.awk"
print " * from insn-x86-dat-src.c for inclusion by insn-x86.c"
print " * Do not change this code."
print "*/\n"
op = ""
branch = ""
rel = 0
going = 0
}
/ Start here / {
going = 1
}
/ Stop here / {
going = 0
}
/^\s*[0-9a-fA-F]+\:/ {
if (going) {
colon_pos = index($0, ":")
useful_line = substr($0, colon_pos + 1)
first_pos = match(useful_line, "[0-9a-fA-F]")
useful_line = substr(useful_line, first_pos)
gsub("\t", "\\t", useful_line)
printf "{{"
len = 0
for (i = 2; i <= NF; i++) {
if (match($i, "^[0-9a-fA-F][0-9a-fA-F]$")) {
printf "0x%s, ", $i
len += 1
} else {
break
}
}
printf "}, %d, %s, \"%s\", \"%s\",", len, rel, op, branch
printf "\n\"%s\",},\n", useful_line
op = ""
branch = ""
rel = 0
}
}
/ Expecting: / {
expecting_str = " Expecting: "
expecting_len = length(expecting_str)
expecting_pos = index($0, expecting_str)
useful_line = substr($0, expecting_pos + expecting_len)
for (i = 1; i <= NF; i++) {
if ($i == "Expecting:") {
i++
op = $i
i++
branch = $i
i++
rel = $i
break
}
}
}
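
The printf calls above emit C array initializers that the insn-x86 test compiles in and feeds to the instruction decoder. A sketch of the entry shape they produce is below; the struct and field names are guesses for illustration, since the real definitions live in insn-x86.c and the generated insn-x86-dat-*.c files.

#include <stdio.h>

struct test_data {
	unsigned char data[16];	/* raw instruction bytes */
	int expected_length;	/* "len": bytes counted by the script */
	int expected_rel;	/* "rel" from the "Expecting:" comment */
	const char *op;		/* e.g. "jmp" */
	const char *branch;	/* e.g. "indirect" */
	const char *asm_rep;	/* the objdump line, kept for diagnostics */
};

/* One entry in the format the printf calls produce: "ff e0" decodes
 * as jmp *%rax, a 2-byte indirect branch, recorded with rel 0. */
static struct test_data entry = {
	{0xff, 0xe0, }, 2, 0, "jmp", "indirect",
	"ff e0 \tjmp    *%rax",
};

int main(void)
{
	printf("%d-byte %s (%s): %s\n", entry.expected_length,
	       entry.op, entry.branch, entry.asm_rep);
	return 0;
}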