1. 21 Jul, 2017 3 commits
    • Rob Herring's avatar
      x86/devicetree: Convert to using %pOF instead of ->full_name · db15e7f2
      Rob Herring authored
      
      Now that we have a custom printf format specifier, convert users of
      full_name to use %pOF instead. This is preparation to remove storing
      of the full path string for each device node.
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devicetree@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
      
      
      [ Clarify the error message while at it, as 'node' is ambiguous. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      db15e7f2
    • Jiri Olsa's avatar
      perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Jiri Olsa authored
      
      We have 2 functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/store
      
      Both of them are called from intel_pmu_sched_task() and
      either of them can be unwillingly triggered when the
      other one is configured to run.
      
      Let's say there's PEBS drain configured in sched_task
      callback for the event, but in the callback itself
      (intel_pmu_sched_task()) we will also run the code for
      LBR save/restore, which we did not ask for, but the
      code in intel_pmu_sched_task() does not check for that.
      
      This can lead to extra cycles in some perf monitoring,
      like when we monitor PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need PEBS, non freq/non timestamp event to enable
         the sched_task callback)
      
      The perf stat of cycles and msr:write_msr for above
      command before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no affect on cycles:k because the sched_task happens
      with events switched off, however the msr:write_msr tracepoint
      counter together with almost 50% of time speedup show the
      improvement.
      
      Monitoring LBR event and having extra PEBS drain processing
      in sched_task callback showed just a little speedup, because
      the drain function does not do much extra work in case there
      is no PEBS data.
      
      Adding conditions to recognize the configured work that needs
      to be done in the x86_pmu's sched_task callback.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      df6c3db8
    • Andrew Banman's avatar
      x86/platform/uv/BAU: Disable BAU on single hub configurations · 2fe9a5c6
      Andrew Banman authored
      
      The BAU confers no benefit to a UV system running with only one hub/socket.
      Permanently disable the BAU driver if there are less than two hubs online
      to avoid BAU overhead. We have observed failed boots on single-socket UV4
      systems caused by BAU that are avoided with this patch.
      
      Also, while at it, consolidate initialization error blocks and fix a
      memory leak.
      Signed-off-by: default avatarAndrew Banman <abanman@hpe.com>
      Acked-by: default avatarRuss Anderson <rja@hpe.com>
      Acked-by: default avatarMike Travis <mike.travis@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: tony.ernst@hpe.com
      Link: http://lkml.kernel.org/r/1500588351-78016-1-git-send-email-abanman@hpe.com
      
      
      [ Minor cleanups. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2fe9a5c6
  2. 20 Jul, 2017 12 commits
    • Linus Torvalds's avatar
      x86: mark kprobe templates as character arrays, not single characters · 54a7d50b
      Linus Torvalds authored
      
      They really are, and the "take the address of a single character" makes
      the string fortification code unhappy (it believes that you can now only
      acccess one byte, rather than a byte range, and then raises errors for
      the memory copies going on in there).
      
      We could now remove a few 'addressof' operators (since arrays naturally
      degrade to pointers), but this is the minimal patch that just changes
      the C prototypes of those template arrays (the templates themselves are
      defined in inline asm).
      Reported-by: default avatarkernel test robot <xiaolong.ye@intel.com>
      Acked-and-tested-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      54a7d50b
    • Josh Poimboeuf's avatar
      debug: Fix WARN_ON_ONCE() for modules · 325cdacd
      Josh Poimboeuf authored
      Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
      code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
      completely broken when called from a module.
      
      The bug was introduced with the following commit:
      
        19d43626
      
       ("debug: Add _ONCE() logic to report_bug()")
      
      That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
      trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
      bit can be written to the flags to indicate the first warning has
      occurred.
      
      The bug table was made writable for vmlinux, which relies on
      vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
      it wasn't made writable for modules, which rely on the ELF section
      header flags.
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Tested-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 19d43626 ("debug: Add _ONCE() logic to report_bug()")
      Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      325cdacd
    • Arnd Bergmann's avatar
      x86/platform/intel-mid: Fix a format string overflow warning · 0bc73048
      Arnd Bergmann authored
      
      We have space for exactly three characters for the index in "max7315_%d_base",
      but as GCC points out having more would cause an string overflow:
      
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c: In function 'max7315_platform_data':
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: error: '%d' directive writing between 1 and 11 bytes into a region of size 9 [-Werror=format-overflow=]
           sprintf(base_pin_name, "max7315_%d_base", nr);
                                ^~~~~~~~~~~~~~~~~
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26: note: directive argument in the range [-2147483647, 2147483647]
        arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:3: note: 'sprintf' output between 15 and 25 bytes into a destination of size 17
           sprintf(base_pin_name, "max7315_%d_base", nr);
      
      This makes it use an snprintf() to truncate the string if that happened
      rather than overflowing the stack. In practice, this is safe, because
      there won't be a large number of max7315 devices in the systems, and
      both the format and the length are defined by the firmware interface.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-9-arnd@arndb.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0bc73048
    • Arnd Bergmann's avatar
      x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG · d689c64d
      Arnd Bergmann authored
      
      The IOSF_MBI option requires PCI support, without it we get a harmless
      Kconfig warning when it gets selected by PUNIT_ATOM_DEBUG:
      
        warning: (X86_INTEL_LPSS && SND_SST_IPC_ACPI && MMC_SDHCI_ACPI && PUNIT_ATOM_DEBUG) selects IOSF_MBI which has unmet direct dependencies (PCI)
      
      This adds another dependency to avoid the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-8-arnd@arndb.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d689c64d
    • Arnd Bergmann's avatar
      x86/build: Silence the build with "make -s" · d460131d
      Arnd Bergmann authored
      
      Every kernel build on x86 will result in some output:
      
        Setup is 13084 bytes (padded to 13312 bytes).
        System is 4833 kB
        CRC 6d35fa35
        Kernel: arch/x86/boot/bzImage is ready  (#2)
      
      This shuts it up, so that 'make -s' is truely silent as long as
      everything works. Building without '-s' should produce unchanged
      output.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-6-arnd@arndb.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d460131d
    • Arnd Bergmann's avatar
      x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl · 7206f9bf
      Arnd Bergmann authored
      
      The x86 version of insb/insw/insl uses an inline assembly that does
      not have the target buffer listed as an output. This can confuse
      the compiler, leading it to think that a subsequent access of the
      buffer is uninitialized:
      
        drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
        drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c: In function 'sb1000_rx':
        drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I tried to mark the exact input buffer as an output here, but couldn't
      figure it out. As suggested by Linus, marking all memory as clobbered
      however is good enough too. For the outs operations, I also add the
      memory clobber, to force the input to be written to local variables.
      This is probably already guaranteed by the "asm volatile", but it can't
      hurt to do this for symmetry.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
      Link: https://lkml.org/lkml/2017/7/12/605
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7206f9bf
    • Arnd Bergmann's avatar
      x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning · 5623452a
      Arnd Bergmann authored
      
      gcc-7.1.1 produces this warning:
      
        arch/x86/math-emu/reg_add_sub.c: In function 'FPU_add':
        arch/x86/math-emu/reg_add_sub.c:80:48: error: ?: using integer constants in boolean context [-Werror=int-in-bool-context]
      
      This appears to be a bug in gcc-7.1.1, and I have reported it as
      PR81484. The compiler suggests that code written as
      
      	if (a & b ? c : d)
      
      is usually incorrect and should have been
      
      	if (a & (b ? c : d))
      
      However, in this case, we correctly write
      
      	if ((a & b) ? c : d)
      
      and should not get a warning for it.
      
      This adds a dirty workaround for the problem, adding a comparison with
      zero inside of the macro. The warning is currently disabled in the kernel,
      so we may decide not to apply the patch, and instead wait for future gcc
      releases to fix the problem. On the other hand, it seems to be the
      only instance of this particular problem.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-4-arnd@arndb.de
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5623452a
    • Arnd Bergmann's avatar
      x86/fpu/math-emu: Fix possible uninitialized variable use · 75e2f0a6
      Arnd Bergmann authored
      
      When building the kernel with "make EXTRA_CFLAGS=...", this overrides
      the "PARANOID" preprocessor macro defined in arch/x86/math-emu/Makefile,
      and we run into a build warning:
      
        arch/x86/math-emu/reg_compare.c: In function ‘compare_i_st_st’:
        arch/x86/math-emu/reg_compare.c:254:6: error: ‘f’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This fixes the implementation to work correctly even without the PARANOID
      flag, and also fixes the Makefile to not use the EXTRA_CFLAGS variable
      but instead use the ccflags-y variable in the Makefile that is meant
      for this purpose.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-3-arnd@arndb.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      75e2f0a6
    • Arnd Bergmann's avatar
      perf/x86: Shut up false-positive -Wmaybe-uninitialized warning · 11d8b058
      Arnd Bergmann authored
      
      The intialization function checks for various failure scenarios, but
      unfortunately the compiler gets a little confused about the possible
      combinations, leading to a false-positive build warning when
      -Wmaybe-uninitialized is set:
      
        arch/x86/events/core.c: In function ‘init_hw_perf_events’:
        arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n",
      
      We can't actually run into this case, so this shuts up the warning
      by initializing the variables to a known-invalid state.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-2-arnd@arndb.de
      Link: https://patchwork.kernel.org/patch/9392595/
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      11d8b058
    • Krzysztof Kozlowski's avatar
      x86/defconfig: Remove stale, old Kconfig options · 0e7f0b6c
      Krzysztof Kozlowski authored
      Remove old, dead Kconfig options (in order appearing in this commit):
      
       - EXPERIMENTAL is gone since v3.9;
       - IP_NF_TARGET_ULOG: commit d4da843e ("netfilter: kill remnants of ulog targets");
       - USB_LIBUSUAL: commit f61870ee
      
       ("usb: remove libusual");
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1500526885-4341-1-git-send-email-krzk@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0e7f0b6c
    • Seunghun Han's avatar
      x86/ioapic: Pass the correct data to unmask_ioapic_irq() · e708e35b
      Seunghun Han authored
      One of the rarely executed code pathes in check_timer() calls
      unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
      
      That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
      interrupt 0. irq_get_chip_data(0) returns NULL, so the following
      dereference in unmask_ioapic_irq() causes a kernel panic.
      
      The issue went unnoticed in the first place because irq_get_chip_data()
      returns a void pointer so the compiler cannot do a type check on the
      argument. The code path was added for machines with broken configuration,
      but it seems that those machines are either not running current kernels or
      simply do not longer exist.
      
      Hand in irq_get_irq_data(0) as argument which provides the correct data.
      
      [ tglx: Rewrote changelog ]
      
      Fixes: 4467715a
      
       ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
      Signed-off-by: default avatarSeunghun Han <kkamagui@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1500369644-45767-1-git-send-email-kkamagui@gmail.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e708e35b
    • Seunghun Han's avatar
      x86/acpi: Prevent out of bound access caused by broken ACPI tables · dad5ab0d
      Seunghun Han authored
      
      The bus_irq argument of mp_override_legacy_irq() is used as the index into
      the isa_irq_to_gsi[] array. The bus_irq argument originates from
      ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
      tables, but is nowhere sanity checked.
      
      That allows broken or malicious ACPI tables to overwrite memory, which
      might cause malfunction, panic or arbitrary code execution.
      
      Add a sanity check and emit a warning when that triggers.
      
      [ tglx: Added warning and rewrote changelog ]
      Signed-off-by: default avatarSeunghun Han <kkamagui@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dad5ab0d
  3. 18 Jul, 2017 4 commits
    • Jiri Olsa's avatar
      perf/x86/intel: Fix debug_store reset field for freq events · dc853e26
      Jiri Olsa authored
      
      There's a bug in PEBs event enabling code, that prevents PEBS
      freq events to work properly after non freq PEBS event was run.
      
      freq events - perf_event_attr::freq set
                    -F <freq> option of perf record
      
      PEBS events - perf_event_attr::precise_ip > 0
                    default for perf record
      
      Like in following example with CPU 0 busy, we expect ~10000 samples
      for following perf tool run:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ]
      
      Everything's fine, but once we run non freq PEBS event like:
      
        # perf record -c 10000 -C 0 sleep 1
        [ perf record: Woken up 4 times to write data ]
        [ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ]
      
      the freq events start to fail like this:
      
        # perf record -F 10000 -C 0 sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ]
      
      The issue is in non freq PEBs event initialization of debug_store reset
      field, which value is used to auto-reload the counter value after PEBS
      event drain. This value is not being used for PEBS freq events, but once
      we run non freq event it stays in debug_store data and screws the
      sample_freq counting for PEBS freq events.
      
      Setting the reset field to 0 for freq events.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170714163551.19459-1-jolsa@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dc853e26
    • Kan Liang's avatar
      perf/x86/intel: Add Goldmont Plus CPU PMU support · dd0b06b5
      Kan Liang authored
      
      Add perf core PMU support for Intel Goldmont Plus CPU cores:
      
       - The init code is based on Goldmont.
       - There is a new cache event list, based on the Goldmont cache event
         list.
       - All four general-purpose performance counters support PEBS.
       - The first general-purpose performance counter is for reduced skid
         PEBS mechanism. Using :ppp to indicate the event which want to do
         reduced skid PEBS.
       - Goldmont Plus has 4-wide pipeline for Topdown
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dd0b06b5
    • Harry Pan's avatar
      perf/x86/intel: Enable C-state residency events for Apollo Lake · 5c10b048
      Harry Pan authored
      
      Goldmont microarchitecture supports C1/C3/C6, PC2/PC3/PC6/PC10 state
      residency counters, the patch enables them for Apollo Lake platform.
      
      The MSR information is based on Intel Software Developers' Manual,
      Vol. 4, Order No. 335592, Table 2-6 and 2-12.
      Signed-off-by: default avatarHarry Pan <harry.pan@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@suse.de
      Cc: davidcc@google.com
      Cc: gs0622@gmail.com
      Cc: lukasz.odzioba@intel.com
      Cc: piotr.luc@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170717103749.24337-1-harry.pan@intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5c10b048
    • Roman Kagan's avatar
      x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT · 4c07f904
      Roman Kagan authored
      A recent commit:
      
        d6e41f11
      
       ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      
      introduced a VM_WARN_ON(!in_atomic()) which generates false positives
      on every VM entry on !CONFIG_PREEMPT_COUNT kernels.
      
      Replace it with a test for preemptible(), which appears to match the
      original intent and works across different CONFIG_PREEMPT* variations.
      Signed-off-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-mm@kvack.org
      Fixes: d6e41f11
      
       ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4c07f904
  4. 17 Jul, 2017 1 commit
    • Geert Uytterhoeven's avatar
      Blackfin: flat: Use %x to format u32 · cb0fbbf2
      Geert Uytterhoeven authored
      Several variables had their types changed from unsigned long to u32,
      but the printk()-style format to print them wasn't updated, leading to:
      
          arch/blackfin/kernel/flat.c: In function 'bfin_get_addr_from_rp':
          arch/blackfin/kernel/flat.c:35:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'u32' [-Wformat]
          arch/blackfin/kernel/flat.c: In function 'bfin_put_addr_at_rp':
          arch/blackfin/kernel/flat.c:80:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'u32' [-Wformat]
      
      Fixes: 468138d7
      
       ("binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cb0fbbf2
  5. 16 Jul, 2017 4 commits
  6. 14 Jul, 2017 7 commits
    • Tobias Klauser's avatar
      xtensa: use generic fb.h · 20cf0c54
      Tobias Klauser authored
      The arch uses a verbatim copy of the asm-generic version and does not
      add any own implementations to the header, so use asm-generic/fb.h
      instead of duplicating code.
      
      Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.ch
      
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Acked-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      20cf0c54
    • Jane Chu's avatar
      sparc64: Measure receiver forward progress to avoid send mondo timeout · 9d53caec
      Jane Chu authored
      
      A large sun4v SPARC system may have moments of intensive xcall activities,
      usually caused by unmapping many pages on many CPUs concurrently. This can
      flood receivers with CPU mondo interrupts for an extended period, causing
      some unlucky senders to hit send-mondo timeout. This problem gets worse
      as cpu count increases because sometimes mappings must be invalidated on
      all CPUs, and sometimes all CPUs may gang up on a single CPU.
      
      But a busy system is not a broken system. In the above scenario, as long
      as the receiver is making forward progress processing mondo interrupts,
      the sender should continue to retry.
      
      This patch implements the receiver's forward progress meter by introducing
      a per cpu counter 'cpu_mondo_counter[cpu]' where 'cpu' is in the range
      of 0..NR_CPUS. The receiver increments its counter as soon as it receives
      a mondo and the sender tracks the receiver's counter. If the receiver has
      stopped making forward progress when the retry limit is reached, the sender
      declares send-mondo-timeout and panic; otherwise, the receiver is allowed
      to keep making forward progress.
      
      In addition, it's been observed that PCIe hotplug events generate Correctable
      Errors that are handled by hypervisor and then OS. Hypervisor 'borrows'
      a guest cpu strand briefly to provide the service. If the cpu strand is
      simultaneously the only cpu targeted by a mondo, it may not be available
      for the mondo in 20msec, causing SUN4V mondo timeout. It appears that 1 second
      is the agreed wait time between hypervisor and guest OS, this patch makes
      the adjustment.
      
      Orabug: 25476541
      Orabug: 26417466
      Signed-off-by: default avatarJane Chu <jane.chu@oracle.com>
      Reviewed-by: default avatarSteve Sistare <steven.sistare@oracle.com>
      Reviewed-by: default avatarAnthony Yznaga <anthony.yznaga@oracle.com>
      Reviewed-by: default avatarRob Gardner <rob.gardner@oracle.com>
      Reviewed-by: default avatarThomas Tai <thomas.tai@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d53caec
    • Roman Kagan's avatar
      kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87
      Roman Kagan authored
      Hyper-V identifies vCPUs by Virtual Processor Index, which can be
      queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
      sequential number which can't exceed the maximum number of vCPUs per VM.
      APIC ids can be sparse and thus aren't a valid replacement for VP
      indices.
      
      Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
      it predictable and persistent across VM migrations, the userspace has to
      control the value of VP_INDEX.
      
      This patch achieves that, by storing vp_index explicitly on vcpu, and
      allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
      compatibility it's initialized to KVM vcpu index.  Also a few variables
      are renamed to make clear distinction betweed this Hyper-V vp_index and
      KVM vcpu_id (== APIC id).  Besides, a new capability,
      KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
      attempting msr writes where unsupported, to avoid spamming error logs.
      
      Signed-off-by: Roman Kagan <rkagan@vi...
      d3457c87
    • Wanpeng Li's avatar
      KVM: async_pf: Let guest support delivery of async_pf from guest mode · 52a5c155
      Wanpeng Li authored
      
      Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
      async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
      kvm_can_do_async_pf returns 0 if in guest mode.
      
      This is similar to what svm.c wanted to do all along, but it is only
      enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
      receive async page faults as vmexits, because they'd probably be very
      confused about that.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      52a5c155
    • Wanpeng Li's avatar
      KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf · adfe20fb
      Wanpeng Li authored
      
      Add an nested_apf field to vcpu->arch.exception to identify an async page
      fault, and constructs the expected vm-exit information fields. Force a
      nested VM exit from nested_vmx_check_exception() if the injected #PF is
      async page fault.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      adfe20fb
    • Wanpeng Li's avatar
      KVM: async_pf: Add L1 guest async_pf #PF vmexit handler · 1261bfa3
      Wanpeng Li authored
      
      This patch adds the L1 guest async page fault #PF vmexit handler, such
      by L1 similar to ordinary async page fault.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      [Passed insn parameters to kvm_mmu_page_fault().]
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      1261bfa3
    • Wanpeng Li's avatar
      KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list · cfcd20e5
      Wanpeng Li authored
      
      This patch removes all arguments except the first in
      kvm_x86_ops->queue_exception since they can extract the arguments from
      vcpu->arch.exception themselves.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      cfcd20e5
  7. 13 Jul, 2017 2 commits
    • Roman Kagan's avatar
      kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Roman Kagan authored
      
      There is a flaw in the Hyper-V SynIC implementation in KVM: when message
      page or event flags page is enabled by setting the corresponding msr,
      KVM zeroes it out.  This is problematic because on migration the
      corresponding MSRs are loaded on the destination, so the content of
      those pages is lost.
      
      This went unnoticed so far because the only user of those pages was
      in-KVM hyperv synic timers, which could continue working despite that
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: default avatarRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      efc479e6
    • Ladi Prosek's avatar
      KVM: x86: make backwards_tsc_observed a per-VM variable · a826faf1
      Ladi Prosek authored
      The backwards_tsc_observed global introduced in commit 16a96021
      
       is never
      reset to false. If a VM happens to be running while the host is suspended
      (a common source of the TSC jumping backwards), master clock will never
      be enabled again for any VM. In contrast, if no VM is running while the
      host is suspended, master clock is unaffected. This is inconsistent and
      unnecessarily strict. Let's track the backwards_tsc_observed variable
      separately and let each VM start with a clean slate.
      
      Real world impact: My Windows VMs get slower after my laptop undergoes a
      suspend/resume cycle. The only way to get the perf back is unloading and
      reloading the kvm module.
      Signed-off-by: default avatarLadi Prosek <lprosek@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      a826faf1
  8. 12 Jul, 2017 7 commits