1. 12 Aug, 2016 17 commits
    • Linus Torvalds's avatar
      Merge tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9710cb66
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Two hibernation fixes allowing it to work with the recently added
        randomization of the kernel identity mapping base on x86-64 and one
        cpufreq driver regression fix.
      
        Specifics:
      
         - Fix the x86 identity mapping creation helpers to avoid the
           assumption that the base address of the mapping will always be
           aligned at the PGD level, as it may be aligned at the PUD level if
           address space randomization is enabled (Rafael Wysocki).
      
         - Fix the hibernation core to avoid executing tracing functions
           before restoring the processor state completely during resume
           (Thomas Garnier).
      
         - Fix a recently introduced regression in the powernv cpufreq driver
           that causes it to crash due to an out-of-bounds array access
           (Akshay Adiga)"
      
      * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / hibernate: Restore processor state before using per-CPU variables
        x86/power/64: Always create temporary identity mapping correctly
        cpufreq: powernv: Fix crash in gpstate_timer_handler()
      9710cb66
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01ea4439
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "This is bigger than usual - the reason is partly a pent-up stream of
        fixes after the merge window and partly accidental.  The fixes are:
      
         - five patches to fix a boot failure on Andy Lutomirsky's laptop
         - four SGI UV platform fixes
         - KASAN fix
         - warning fix
         - documentation update
         - swap entry definition fix
         - pkeys fix
         - irq stats fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
        x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
        x86/boot: Rework reserve_real_mode() to allow multiple tries
        x86/boot: Defer setup_real_mode() to early_initcall time
        x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
        x86/boot: Run reserve_bios_regions() after we initialize the memory map
        x86/irq: Do not substract irq_tlb_count from irq_call_count
        x86/mm: Fix swap entry comment and macro
        x86/mm/kaslr: Fix -Wformat-security warning
        x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
        x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
        x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
        x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
        x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
        x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
        x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
        x86/mm: Disable preemption during CR3 read+write
        x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
        x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
        x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
      01ea4439
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3bc6d8c1
      Linus Torvalds authored
      Pull timer fixes from Ingo Molnar:
       "Misc fixes: a /dev/rtc regression fix, two APIC timer period
        calibration fixes, an ARM clocksource driver fix and a NOHZ
        power use regression fix"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
        x86/timers/apic: Inform TSC deadline clockevent device about recalibration
        x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
        timers: Fix get_next_timer_interrupt() computation
        clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
      3bc6d8c1
    • Rafael J. Wysocki's avatar
      Merge branches 'pm-sleep' and 'pm-cpufreq' · 0aeeb3e7
      Rafael J. Wysocki authored
      * pm-sleep:
        PM / hibernate: Restore processor state before using per-CPU variables
        x86/power/64: Always create temporary identity mapping correctly
      
      * pm-cpufreq:
        cpufreq: powernv: Fix crash in gpstate_timer_handler()
      0aeeb3e7
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e6e7214f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Misc fixes: cputime fixes, two deadline scheduler fixes and a cgroups
        scheduling fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/cputime: Fix omitted ticks passed in parameter
        sched/cputime: Fix steal time accounting
        sched/deadline: Fix lock pinning warning during CPU hotplug
        sched/cputime: Mitigate performance regression in times()/clock_gettime()
        sched/fair: Fix typo in sync_throttle()
        sched/deadline: Fix wrap-around in DL heap
      e6e7214f
    • Thomas Garnier's avatar
      PM / hibernate: Restore processor state before using per-CPU variables · 62822e2e
      Thomas Garnier authored
      Restore the processor state before calling any other functions to
      ensure per-CPU variables can be used with KASLR memory randomization.
      
      Tracing functions use per-CPU variables (GS based on x86) and one was
      called just before restoring the processor state fully. It resulted
      in a double fault when both the tracing & the exception handler
      functions tried to use a per-CPU variable.
      
      Fixes: bb3632c6 (PM / sleep: trace events for suspend/resume)
      Reported-and-tested-by: default avatarBorislav Petkov <bp@suse.de>
      Reported-by: default avatarJiri Kosina <jikos@kernel.org>
      Tested-by: default avatarRafael J. Wysocki <rafael@kernel.org>
      Tested-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarThomas Garnier <thgarnie@google.com>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      62822e2e
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ad83242a
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
        perf-cgroups fix and an AUX events fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Add enable_box for client MSR uncore
        perf/x86/intel/uncore: Fix uncore num_counters
        uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
        perf/core: Set cgroup in CPU contexts for new cgroup events
        perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
        perf probe ppc64le: Fix probe location when using DWARF
        perf probe: Add function to post process kernel trace events
        tools: Sync cpufeatures headers with the kernel
        toops: Sync tools/include/uapi/linux/bpf.h with the kernel
        tools: Sync cpufeatures.h and vmx.h with the kernel
        perf probe: Support signedness casting
        perf stat: Avoid skew when reading events
        perf probe: Fix module name matching
        perf probe: Adjust map->reloc offset when finding kernel symbol from map
        perf hists: Trim libtraceevent trace_seq buffers
        perf script: Add 'bpf-output' field to usage message
      ad83242a
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f8083c6
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Misc fixes: lockstat fix, futex fix on !MMU systems, big endian fix
        for qrwlocks and a race fix for pvqspinlocks"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/pvqspinlock: Fix a bug in qstat_read()
        locking/pvqspinlock: Fix double hash race
        locking/qrwlock: Fix write unlock bug on big endian systems
        futex: Assume all mappings are private on !MMU systems
      1f8083c6
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 25db6918
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "A fix for an MSI regression"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/msi: Make sure PCI MSIs are activated early
      25db6918
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0e1117b2
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "A fix for EFI capsules and an SGI UV platform fix"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/capsule: Allocate whole capsule into virtual memory
        x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case
      0e1117b2
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 99091700
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
         - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
           needs multiple different security services (e.g.  krb5i and krb5p).
      
         - Stable patch to fix a regression introduced by the use of
           SO_REUSEPORT, and that prevented the use of multiple different NFS
           versions to the same server.
      
         - TCP socket reconnection timer fixes.
      
         - Patch from Neil to disable the use of IPv6 temporary addresses"
      
      * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4: Cap the transport reconnection timer at 1/2 lease period
        NFSv4: Cleanup the setting of the nfs4 lease period
        SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
        SUNRPC: Fix reconnection timeouts
        NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
        SUNRPC: disable the use of IPv6 temporary addresses.
        SUNRPC: allow for upcalls for same uid but different gss service
        SUNRPC: Fix up socket autodisconnect
        SUNRPC: Handle EADDRNOTAVAIL on connection failures
      99091700
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · c239ae10
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - Fix for the nd_blk (NVDIMM Block Window Aperture) driver.
      
         A spec clarification requires the driver to mask off reserved bits in
         status register.  This is tagged for -stable back to the v4.2 kernel.
      
       - Fix for a kernel crash in the nvdimm unit tests when module loading
         is interrupted with SIGTERM.  Tagged for -stable since validation
         efforts external to Intel use the unit tests for qualifying
         backports.
      
       - Add a new 'size' sysfs attribute for the BTT (NVDIMM Block
         Translation Table) driver to make it symmetric with the other
         namespace personality drivers (PFN and DAX) that provide a size
         attribute for indicating how much namespace capacity is lost to
         metadata.
      
         The BTT change arrived at the start of the merge window and has
         appeared in a -next release.  It can technically wait for 4.9, but it
         is small, fixes asymmetry in the libnvdimm-sysfs interface, and
         something I would have squeezed into the v4.8 pull request had it
         arrived a few days earlier.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        tools/testing/nvdimm: fix SIGTERM vs hotplug crash
        nvdimm, btt: add a size attribute for BTTs
        libnvdimm, nd_blk: mask off reserved status bits
      c239ae10
    • Linus Torvalds's avatar
      Merge tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 86fc0488
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A regression fix of HD-audio runtime PM and two USB quirks"
      
      * tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Manage power well properly for resume
        ALSA: usb-audio: Add quirk for ELP HD USB Camera
        ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)
      86fc0488
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 8766dc68
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Some powerpc fixes for 4.8:
      
        Misc:
         - powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
         - powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
         - powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
         - cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
         - powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
         - cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
         - powerpc/vdso: Add missing include file from Guenter Roeck
         - powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
         - powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
         - powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
         - crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
         - powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
      
        Benjamin Herrenschmidt:
         - powerpc/32: Fix crash during static key init
         - powerpc: Update obsolete comment in setup_32.c about early_init()
         - powerpc: Print the kernel load address at the end of prom_init()
         - powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
         - powerpc/xics: Properly set Edge/Level type and enable resend
      
        Mahesh Salgaonkar:
         - powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
         - powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
         - powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
         - powerpc/powernv: Load correct TOC pointer while waking up from winkle.
      
        Andrew Donnellan:
         - cxl: Fix sparse warnings
         - cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
      
        Michael Ellerman:
         - selftests/powerpc: Specify we expect to build with std=gnu99
         - powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
         - powerpc/pci: Fix endian bug in fixed PHB numbering"
      
      * tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
        selftests/powerpc: Specify we expect to build with std=gnu99
        powerpc/vdso: Fix build rules to rebuild vdsos correctly
        powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
        powerpc/32: Fix crash during static key init
        powerpc: Update obsolete comment in setup_32.c about early_init()
        powerpc: Print the kernel load address at the end of prom_init()
        powerpc/ptrace: Fix coredump since ptrace TM changes
        powerpc/32: Fix csum_partial_copy_generic()
        cxl: Set psl_fir_cntl to production environment value
        powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
        powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
        powerpc/pci: Fix endian bug in fixed PHB numbering
        powerpc/eeh: Switch to conventional PCI address output in EEH log
        cxl: Fix sparse warnings
        cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
        cxl: Use fixed width predefined types in data structure.
        powerpc/vdso: Add missing include file
        powerpc: Fix unused function warning 'lmb_to_memblock'
        powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
        powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
        ...
      8766dc68
    • Kan Liang's avatar
      perf/x86/intel/uncore: Add enable_box for client MSR uncore · 95f3be79
      Kan Liang authored
      There are bug reports about miscounting uncore counters on some
      client machines like Sandybridge, Broadwell and Skylake. It is
      very likely to be observed on idle systems.
      
      This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
      cleared after Package C7, and nothing will be count.
      The related errata (HSD 158) could be found in:
      
        www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
      
      This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
      in ->enable_box(). The workaround does not cover all cases. It helps for new
      events after returning from C7. But it cannot prevent C7, it will still
      miscount if a counter is already active.
      
      There is no drawback in leaving it enabled, so it does not need
      disable_box() here.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      95f3be79
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix uncore num_counters · 10e9e7bd
      Kan Liang authored
      Some uncore boxes' num_counters value for Haswell server and
      Broadwell server are not correct (too large, off by one).
      
      This issue was found by comparing the code with the document. Although
      there is no bug report from users yet, accessing non-existent counters
      is dangerous and the behavior is undefined: it may cause miscounting or
      even crashes.
      
      This patch makes them consistent with the uncore document.
      Reported-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      10e9e7bd
    • Denys Vlasenko's avatar
      uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · 68187872
      Denys Vlasenko authored
      Since instruction decoder now supports EVEX-encoded instructions, two fixes
      are needed to correctly handle them in uprobes.
      
      Extended bits for MODRM.rm field need to be sanitized just like we do it
      for VEX3, to avoid encoding wrong register for register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now super easy because instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.1+
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68187872
  2. 11 Aug, 2016 23 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 4b9eaf33
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "7 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats
        mm, oom: fix uninitialized ret in task_will_free_mem()
        kasan: remove the unnecessary WARN_ONCE from quarantine.c
        mm: memcontrol: fix memcg id ref counter on swap charge move
        mm: memcontrol: fix swap counter leak on swapout from offline cgroup
        proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
        mm/hugetlb: fix incorrect hugepages count during mem hotplug
      4b9eaf33
    • Reza Arbab's avatar
      mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats · 5830169f
      Reza Arbab authored
      The following oops occurs after a pgdat is hotadded:
      
        Unable to handle kernel paging request for data at address 0x00c30001
        Faulting instruction address: 0xc00000000022f8f4
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W 4.8.0-rc1-device #110
        task: c000000000ef3080 task.stack: c000000000f6c000
        NIP: c00000000022f8f4 LR: c00000000022f948 CTR: 0000000000000000
        REGS: c000000000f6fa50 TRAP: 0300   Tainted: G        W (4.8.0-rc1-device)
        MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 84002028  XER: 20000000
        CFAR: d000000001d2013c DAR: 0000000000c30001 DSISR: 40000000 SOFTE: 0
        NIP refresh_cpu_vm_stats+0x1a4/0x2f0
        LR refresh_cpu_vm_stats+0x1f8/0x2f0
        Call Trace:
          refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable)
      
      Add per_cpu_nodestats initialization to the hotplug codepath.
      
      Link: http://lkml.kernel.org/r/1470931473-7090-1-git-send-email-arbab@linux.vnet.ibm.comSigned-off-by: default avatarReza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5830169f
    • Geert Uytterhoeven's avatar
      mm, oom: fix uninitialized ret in task_will_free_mem() · f33e6f06
      Geert Uytterhoeven authored
          mm/oom_kill.c: In function `task_will_free_mem':
          mm/oom_kill.c:767: warning: `ret' may be used uninitialized in this function
      
      If __task_will_free_mem() is never called inside the for_each_process()
      loop, ret will not be initialized.
      
      Fixes: 1af8bb43 ("mm, oom: fortify task_will_free_mem()")
      Link: http://lkml.kernel.org/r/1470255599-24841-1-git-send-email-geert@linux-m68k.orgSigned-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f33e6f06
    • Alexander Potapenko's avatar
      kasan: remove the unnecessary WARN_ONCE from quarantine.c · bcbf0d56
      Alexander Potapenko authored
      It's quite unlikely that the user will so little memory that the per-CPU
      quarantines won't fit into the given fraction of the available memory.
      Even in that case he won't be able to do anything with the information
      given in the warning.
      
      Link: http://lkml.kernel.org/r/1470929182-101413-1-git-send-email-glider@google.comSigned-off-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bcbf0d56
    • Vladimir Davydov's avatar
      mm: memcontrol: fix memcg id ref counter on swap charge move · 615d66c3
      Vladimir Davydov authored
      Since commit 73f576c0 ("mm: memcontrol: fix cgroup creation failure
      after many small jobs") swap entries do not pin memcg->css.refcnt
      directly.  Instead, they pin memcg->id.ref.  So we should adjust the
      reference counters accordingly when moving swap charges between cgroups.
      
      Fixes: 73f576c0 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
      Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.comSigned-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.19+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      615d66c3
    • Vladimir Davydov's avatar
      mm: memcontrol: fix swap counter leak on swapout from offline cgroup · 1f47b61f
      Vladimir Davydov authored
      An offline memory cgroup might have anonymous memory or shmem left
      charged to it and no swap.  Since only swap entries pin the id of an
      offline cgroup, such a cgroup will have no id and so an attempt to
      swapout its anon/shmem will not store memory cgroup info in the swap
      cgroup map.  As a result, memcg->swap or memcg->memsw will never get
      uncharged from it and any of its ascendants.
      
      Fix this by always charging swapout to the first ancestor cgroup that
      hasn't released its id yet.
      
      [hannes@cmpxchg.org: add comment to mem_cgroup_swapout]
      [vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()]
        Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza
      Fixes: 73f576c0 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
      Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.comSigned-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>	[3.19+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1f47b61f
    • Mel Gorman's avatar
      proc, meminfo: use correct helpers for calculating LRU sizes in meminfo · 2f95ff90
      Mel Gorman authored
      meminfo_proc_show() and si_mem_available() are using the wrong helpers
      for calculating the size of the LRUs.  The user-visible impact is that
      there appears to be an abnormally high number of unevictable pages.
      
      Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f95ff90
    • zhong jiang's avatar
      mm/hugetlb: fix incorrect hugepages count during mem hotplug · c1470b33
      zhong jiang authored
      When memory hotplug operates, free hugepages will be freed if the
      movable node is offline.  Therefore, /proc/sys/vm/nr_hugepages will be
      incorrect.
      
      Fix it by reducing max_huge_pages when the node is offlined.
      
      n-horiguchi@ah.jp.nec.com said:
      
      : dissolve_free_huge_page intends to break a hugepage into buddy, and the
      : destination hugepage is supposed to be allocated from the pool of the
      : destination node, so the system-wide pool size is reduced.  So adding
      : h->max_huge_pages-- makes sense to me.
      
      Link: http://lkml.kernel.org/r/1470624546-902-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c1470b33
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · d3396e1e
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "A couple of bug fixes have come in for v4.8 so far.  Since the first
        few were originally meant to go into -rc1 (but didn't get sent in time
        for travel reasons), the branch is unfortunately based on top of a
        commit in the middle of the merge window rather than -rc1.
      
        Content-wise we have:
      
         - a fix for the last remaining broken build in kernelci, getting
           mach-shmobile to build again with SMP disabled
      
         - a fix for a realview regression that broke real hardware but not
           the qemu model that everyone uses in practice (needed for v4.7 as
           well)
      
         - a merge conflict fix for Tegra that also broke v4.7
      
         - two Kconfig fixes for arm64 build regressions
      
         - a couple of arm32 build warning fixes (all harmless)
      
         - fix the RTC on Exynos7 Espresso (which apparently never worked
           right)"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        Merge tag 'pxa-fixes-v4.8' of https://github.com/rjarzmik/linux into randconfig-4.8
        arm64: Kconfig: select HISILICON_IRQ_MBIGEN only if PCI is selected
        arm64: Kconfig: select ALPINE_MSI only if PCI is selected
        ARM: dts: realview: Fix PBX-A9 cache description
        ARM: tegra: fix erroneous address in dts
        ARM: dts: add syscon compatible string for AP syscon
        ARM: dts: add syscon compatible string for CP syscon
        ARM: oxnas: select reset controller framework
        ARM: hide mach-*/ include for ARM_SINGLE_ARMV7M
        ARM: don't include removed directories
        Revert "ARM: aspeed: adapt defconfigs for new CONFIG_PRINTK_TIME"
        ARM: shmobile: don't call platform_can_secondary_boot on UP
        MAINTAINER: alpine: add a mailing list
        ARM: do away with final ARCH_REQUIRE_GPIOLIB
        arm64: dts: Fix RTC by providing rtc_src clock
      d3396e1e
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 6da7e953
      Linus Torvalds authored
      Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
       "Misc fixes and cleanups all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio/s390: deprecate old transport
        virtio/s390: keep early_put_chars
        virtio_blk: Fix a slient kernel panic
        virtio-vsock: fix include guard typo
        vhost/vsock: fix vhost virtio_vsock_pkt use-after-free
        9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc()
        virtio: fix error handling for debug builds
        virtio: fix memory leak in virtqueue_add()
      6da7e953
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client · 3b3ce01a
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful
        of static checker fixes"
      
      * tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client:
        ceph: initialize pathbase in the !dentry case in encode_caps_cb()
        rbd: nuke the 32-bit pool id check
        rbd: destroy header_oloc in rbd_dev_release()
        ceph: fix null pointer dereference in ceph_flush_snaps()
        libceph: using kfree_rcu() to simplify the code
        libceph: make cancel_generic_request() static
        libceph: fix return value check in alloc_msg_with_page_vector()
      3b3ce01a
    • Sebastian Andrzej Siewior's avatar
      x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe() · d52c0569
      Sebastian Andrzej Siewior authored
      I made a mistake while converting the driver to the hotplug state
      machine and as a result x2apic_cluster_probe() was accessing
      cpus_in_cluster before allocating it.
      
      This patch fixes it by setting the cpumask after the allocation the
      memory succeeded.
      
      While at it, I marked two functions static which are only used within
      this file.
      Reported-by: default avatarLaura Abbott <labbott@redhat.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6b2c2847 ("x86/x2apic: Convert to CPU hotplug state machine")
      Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d52c0569
    • Frederic Weisbecker's avatar
      sched/cputime: Fix omitted ticks passed in parameter · 26f2c75c
      Frederic Weisbecker authored
      Commit:
      
        f9bcf1e0 ("sched/cputime: Fix steal time accounting")
      
      ... fixes a leak on steal time accounting but forgets to account
      the ticks passed in parameters, assuming there is only one to
      take into account.
      
      Let's consider that parameter back.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarWanpeng Li <kernellwp@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: linux-tip-commits@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160811125822.GB4214@lerougeSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      26f2c75c
    • Austin Christ's avatar
      efi/capsule: Allocate whole capsule into virtual memory · 6862e6ad
      Austin Christ authored
      According to UEFI 2.6 section 7.5.3, the capsule should be in contiguous
      virtual memory and firmware may consume the capsule immediately. To
      correctly implement this functionality, the kernel driver needs to vmap
      the entire capsule at the time it is made available to firmware.
      
      The virtual allocation of the capsule update has been changed from kmap,
      which was only allocating the first page of the update, to vmap, and
      allocates the entire data payload.
      Signed-off-by: default avatarAustin Christ <austinwc@codeaurora.org>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Cc: <stable@vger.kernel.org> # v4.7
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-3-git-send-email-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6862e6ad
    • Alex Thorlton's avatar
      x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case · f72075c9
      Alex Thorlton authored
      This problem has actually been in the UV code for a while, but we didn't
      catch it until recently, because we had been relying on EFI_OLD_MEMMAP
      to allow our systems to boot for a period of time.  We noticed the issue
      when trying to kexec a recent community kernel, where we hit this NULL
      pointer dereference in efi_sync_low_kernel_mappings():
      
       [    0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
       [    0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0
      
      The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
      chunk of setup_efi_state() that sets the efi_loader_signature for the
      kexec'd kernel.  When the kexec'd kernel boots, it won't set EFI_BOOT in
      setup_arch, so we completely avoid the bug.
      
      We always kexec with noefi on the command line, so this shouldn't be an
      issue, but since we're not actually checking for efi_runtime_disabled in
      uv_bios_init(), we end up trying to do EFI runtime callbacks when we
      shouldn't be. This patch just adds a check for efi_runtime_disabled in
      uv_bios_init() so that we don't map in uv_systab when runtime_disabled ==
      true.
      Signed-off-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.7
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f72075c9
    • Andy Lutomirski's avatar
      x86/efi: Allocate a trampoline if needed in efi_free_boot_services() · 5bc653b7
      Andy Lutomirski authored
      On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
      Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
      like:
      
       efi: mem00: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
       efi: mem01: [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
       efi: mem02: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
       efi: mem03: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
       efi: mem04: [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
       efi: mem05: [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
       efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
       efi: mem07: [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
       efi: mem08: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
      
      My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
      up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
      in the loader data range at 0x28000.
      
      Without this patch, it panics due to a failure to allocate the
      trampoline.  With this patch, it works:
      
       [  +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5bc653b7
    • Andy Lutomirski's avatar
      x86/boot: Rework reserve_real_mode() to allow multiple tries · 5ff3e2c3
      Andy Lutomirski authored
      If reserve_real_mode() fails, panicing immediately means we're
      doomed.  Make it safe to try more than once to allocate the
      trampoline:
      
       - Degrade a failure from panic() to pr_info().  (If we make it to
         setup_real_mode() without reserving the trampoline, we'll panic
         them.)
      
       - Factor out helpers so that platform code can supply a specific
         address to try.
      
       - Warn if reserve_real_mode() is called after we're done with the
         memblock allocator.  If that were to happen, we would behave
         unpredictably.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5ff3e2c3
    • Andy Lutomirski's avatar
      x86/boot: Defer setup_real_mode() to early_initcall time · d0de0f68
      Andy Lutomirski authored
      There's no need to run setup_real_mode() as early as we run it.
      Defer it to the same early_initcall that sets up the page
      permissions for the real mode code.
      
      This should be a code size reduction.  More importantly, it give us
      a longer window in which we can allocate the real mode trampoline.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d0de0f68
    • Andy Lutomirski's avatar
      x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly · 18bc7bd5
      Andy Lutomirski authored
      The initialization process for trampoline_cr4_features and
      mmu_cr4_features was confusing.  The intent is for mmu_cr4_features
      and *trampoline_cr4_features to stay in sync, but
      trampoline_cr4_features is NULL until setup_real_mode() runs.  The
      old code synchronized *trampoline_cr4_features *twice*, once in
      setup_real_mode() and once in setup_arch().  It also initialized
      mmu_cr4_features in setup_real_mode(), which causes the actual value
      of mmu_cr4_features to potentially depend on when setup_real_mode()
      is called.
      
      With this patch, mmu_cr4_features is initialized directly in
      setup_arch(), and *trampoline_cr4_features is synchronized to
      mmu_cr4_features when the trampoline is set up.
      
      After this patch, it should be safe to defer setup_real_mode().
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/d48a263f9912389b957dd495a7127b009259ffe0.1470821230.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      18bc7bd5
    • Andy Lutomirski's avatar
      x86/boot: Run reserve_bios_regions() after we initialize the memory map · 007b7560
      Andy Lutomirski authored
      reserve_bios_regions() is a quirk that reserves memory that we might
      otherwise think is available.  There's no need to run it so early,
      and running it before we have the memory map initialized with its
      non-quirky inputs makes it hard to make reserve_bios_regions() more
      intelligent.
      
      Move it right after we populate the memblock state.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matt Fleming <mfleming@suse.de>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      007b7560
    • Aaron Lu's avatar
      x86/irq: Do not substract irq_tlb_count from irq_call_count · 82ba4fac
      Aaron Lu authored
      Since commit:
      
        52aec330 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")
      
      the TLB remote shootdown is done through call function vector. That
      commit didn't take care of irq_tlb_count, which a later commit:
      
        fd0f5869 ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")
      
      ... tried to fix.
      
      The fix assumes every increase of irq_tlb_count has a corresponding
      increase of irq_call_count. So the irq_call_count is always bigger than
      irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
      
      Unfortunately this is not true for the smp_call_function_single() case.
      The IPI is only sent if the target CPU's call_single_queue is empty when
      adding a csd into it in generic_exec_single. That means if two threads
      are both adding flush tlb csds to the same CPU's call_single_queue, only
      one IPI is sent. In other words, the irq_call_count is incremented by 1
      but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
      bigger than irq_call_count and the substract will produce a very large
      irq_call_count value due to overflow.
      
      Considering that:
      
        1) it's not worth to send more IPIs for the sake of accurate counting of
           irq_call_count in generic_exec_single();
      
        2) it's not easy to tell if the call function interrupt is for TLB
           shootdown in __smp_call_function_single_interrupt().
      
      Not to exclude TLB shootdown from call function count seems to be the
      simplest fix and this patch just does that.
      
      This bug was found by LKP's cyclic performance regression tracking recently
      with the vm-scalability test suite. I have bisected to commit:
      
        3dec0ba0 ("mm/rmap: share the i_mmap_rwsem")
      
      This commit didn't do anything wrong but revealed the irq_call_count
      problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
      concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
      then multiple threads could queue flush TLB to the same CPU but only
      one IPI will be sent.
      
      Since the commit was added in Linux v3.19, the counting problem only
      shows up from v3.19 onwards.
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
      Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      82ba4fac
    • Dave Hansen's avatar
      x86/mm: Fix swap entry comment and macro · ace7fab7
      Dave Hansen authored
      A recent patch changed the format of a swap PTE.
      
      The comment explaining the format of the swap PTE is wrong about
      the bits used for the swap type field.  Amusingly, the ASCII art
      and the patch description are correct, but the comment itself
      is wrong.
      
      As I was looking at this, I also noticed that the
      SWP_OFFSET_FIRST_BIT has an off-by-one error.  This does not
      really hurt anything.  It just wasted a bit of space in the PTE,
      giving us 2^59 bytes of addressable space in our swapfiles
      instead of 2^60.  But, it doesn't match with the comments, and it
      wastes a bit of space, so fix it.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Fixes: 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
      Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ace7fab7
    • Wanpeng Li's avatar
      sched/cputime: Fix steal time accounting · f9bcf1e0
      Wanpeng Li authored
      Commit:
      
        57430218 ("sched/cputime: Count actually elapsed irq & softirq time")
      
      ... didn't take steal time into consideration with passing the noirqtime
      kernel parameter.
      
      As Paolo pointed out before:
      
      | Why not? If idle=poll, for example, any time the guest is suspended (and
      | thus cannot poll) does count as stolen time.
      
      This patch fixes it by reducing steal time from idle time accounting when
      the noirqtime parameter is true. The average idle time drops from 56.8%
      to 54.75% for nohz idle kvm guest(noirqtime, idle=poll, four vCPUs running
      on one pCPU).
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1470893795-3527-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f9bcf1e0