1. 23 Feb, 2017 40 commits
    • Merge tag 'gfs2-4.11.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 15192b02
      Linus Torvalds authored
      Pull GFS2 fix from Bob Peterson:
       "This is an addendum for the 4.11 merge window.
      
        Andy Price wrote this patch to close a nasty race condition that
        allows access to glocks that are being destroyed. Without this patch,
        GFS2 is vulnerable to random corruption and kernel panic"
      
      * tag 'gfs2-4.11.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Add missing rcu locking for glock lookup
    • Merge tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 28cbc335
      Linus Torvalds authored
      Pull sound updates from Takashi Iwai:
       "Here is the update of sound bits for 4.11: again at this time, no big
        changes in ALSA and ASoC core but only cosmetic changes like
        constification.
      
        Meanwhile, quite a lot of development is seen on the driver side.
      
        ALSA Core:
         - Clean up, constification of some ops
      
        HD-audio:
         - A slight behavior change of single_cmd option
         - Quirks for AmigaOne X1000, Samsung Ativ Book 8, Dell AiO, ALC221
           HP, and fixes for Lewisburg controller
         - Realtek ALC299, ALC1220 codecs
      
        Others:
         - USB-audio: Tascam US-16x08 DSP mixer quirk
         - Intel HDMI LPE audio support for Baytrail / Cherrytrail; this
           contains some updates in drm/i915 for the new platform binding
      
        ASoC:
         - Lots of updates in Intel drivers, mostly for DisplayPort and HDMI
           on Skylake and onwards, as well as more Baytrail / Cherrytrail
           boards support
         - Channel mapping support for HDMI
         - Support for AllWinner A31 and A33, Everest Semiconductor ES8328,
           Nuvoton NAU8540.
      
      * tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (323 commits)
        ALSA: usb-audio: Tidy up mixer_us16x08.c
        ALSA: usb-audio: Fix memory leak and corruption in mixer_us16x08.c
        ALSA: usb-audio: purge needless variable length array
        ALSA: x86: hdmi: select CONFIG_SND_PCM
        ALSA: x86: Don't enable runtime PM as default
        ALSA: x86: Use runtime PM autosuspend
        ALSA: usb-audio: localize function without external linkage
        ALSA: usb-audio: localize one-referrer variable
        ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk
        ALSA: emu10k1: constify snd_emux_operators structure
        ASoC: sun4i-spdif: drop unnessary snd_soc_unregister_component()
        ASoC: Intel: bxt: Add jack port initialize in bxt_rt298 machine
        ASoC: nau8825: automatic BCLK and LRC divde in master mode
        ASoC: hdac_hdmi: Add device id for Geminilake
        ASoC: Intel: Skylake: Add Geminlake IDs
        ASoC: rt298: Add DMI match for Geminilake reference platform
        ASoC: Intel: Skylake: Check device type to get endpoint configuration
        ASoC: Intel: bxt: Add jack port initialize in da7219_max98357a machine
        ASoC: Intel: Skylake: Add jack port initialize in nau88l25_ssm4567 machine
        ASoC: Intel: Skylake: Add jack port initialize in nau88l25_max98357a machine
        ...
    • Merge tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1ec5c186
      Linus Torvalds authored
      Pull GPIO updates from Linus Walleij:
       "This is the bulk of GPIO changes for the v4.11 cycle
      
        Core changes:
      
         - Augment fwnode_get_named_gpiod() to configure the GPIO pin
           immediately after requesting it like all other APIs do. This is a
           treewide change also updating all users.
      
         - Pass a GPIO label down to gpiod_request() from
           fwnode_get_named_gpiod(). This makes debugfs and the userspace ABI
           correctly reflect the current in-kernel consumer of a pin taken
           using this abstraction. This is a treewide change also updating all
           users.
      
         - Rename devm_get_gpiod_from_child() to
           devm_fwnode_get_gpiod_from_child() to reflect the fact that this
           function is operating on a fwnode object. This is a treewide change
           also updating all users.
      
         - Make it possible to take multiple GPIOs in a single hog of device
           tree hogs.
      
         - The refactorings switching GPIO chips to use the .set_config()
           callback using standard pin control properties and providing a
           backend into the pin control subsystem that were also merged into
           the pin control tree naturally appear here too.
      
        Testing instrumentation:
      
         - A whole slew of cleanups and improvements to the mockup GPIO
           driver. We now have an extended userspace test exercising the
           subsystem, and we can inject interrupts etc from userspace to fully
           test the core GPIO functionality.
      
        New drivers:
      
         - New driver for the Cortina Systems Gemini GPIO controller.
      
         - New driver for the Exar XR17V352/354/358 chips.
      
         - New driver for the ACCES PCI-IDIO-16 PCI GPIO card.
      
        Driver changes:
      
         - RCAR: set the irqchip parent device, add fine-grained runtime PM
           support.
      
         - pca953x: support optional RESET control line on the chip.
      
         - DaVinci: cleanups and simplifications. Add support for multiple
           instances.
      
         - .set_multiple() and naming of lines on more or less all of the
           ISA/PCI GPIO controllers.
      
         - mcp23s08: refactored to use regmap as a first step to further
           rewrites and modernizations"
      
      * tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
        gpio: reintroduce devm_get_gpiod_from_child()
        gpio: pci-idio-16: Fix PCI BAR index
        gpio: pci-idio-16: Fix PCI device ID code
        gpio: mockup: implement event injecting over debugfs
        gpio: mockup: add a dummy irqchip
        gpio: mockup: implement naming the lines
        gpio: mockup: code shrink
        gpio: mockup: readability tweaks
        gpio: Add GPIO support for the ACCES PCI-IDIO-16
        gpio: Add the devm_fwnode_get_index_gpiod_from_child() helper
        gpio: Rename devm_get_gpiod_from_child()
        gpio: mcp23s08: Select REGMAP/REGMAP_I2C to fix build error
        gpio: ws16c48: Add support for GPIO names
        gpio: gpio-mm: Add support for GPIO names
        gpio: 104-idio-16: Add support for GPIO names
        gpio: 104-idi-48: Add support for GPIO names
        gpio: 104-dio-48e: Add support for GPIO names
        gpio: ws16c48: Remove unnecessary driver_data set
        gpio: gpio-mm: Remove unnecessary driver_data set
        gpio: 104-idio-16: Remove unnecessary driver_data set
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · d5dee39b
      Linus Torvalds authored
      Pull input updates from Dmitry:
      
       - a new driver for Zeitec touchscreen controller
      
       - a new driver for Samsung "touchkeys"
      
       - touchscreen driver for Moorestown platform has been removed because
         platform support is gone
      
       - MPU3050 accelerometer driver was removed in favor of IIO driver
      
       - miscellaneous driver cleanup and fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits)
        Input: zet6223 - export OF device ID as module aliases
        Input: tsc2004/5 - switch to using generic device properties
        Input: tsc2004/5 - fix regulator handling
        Input: tsc2005 - add OF device table
        Input: add driver for Zeitec ZET6223
        Input: joydev - do not report stale values on first open
        Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest
        Input: synaptics-rmi4 - clean up F30 implementation
        Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons
        Input: psmouse - add a custom serio protocol to send extra information
        Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts()
        Input: xpad - restore LED state after device resume
        Input: synaptics-rmi4 - add rmi_find_function()
        Input: xpad - fix stuck mode button on Xbox One S pad
        Input: joydev - use clamp() macro
        Input: refuse to register absolute devices without absinfo
        Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs
        Input: synaptics-rmi4 - add sysfs attribute update_fw_status
        Input: mousedev - stop offering PS/2 to userspace by default
        Input: tca8418 - switch to using generic device properties
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 4cc4b932
      Linus Torvalds authored
      Pull rdma updates from Doug Ledford:
       "First set of updates for 4.11 kernel merge window
      
         - Add new Broadcom bnxt_re RoCE driver
         - rxe driver updates
         - ioctl cleanups
         - ETH_P_IBOE declaration cleanup
         - IPoIB changes
         - Add port state cache
         - Allow srpt driver to accept guids as port names in config
         - Update to hfi1 driver
         - Update to srp driver
         - Lots of misc minor changes all over"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits)
        RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
        rdma_cm: fail iwarp accepts w/o connection params
        IB/srp: Drain the send queue before destroying a QP
        IB/core: Add support for draining IB_POLL_DIRECT completion queues
        IB/srp: Improve an error path
        IB/srp: Make a diagnostic message more informative
        IB/srp: Document locking conventions
        IB/srp: Fix race conditions related to task management
        IB/srp: Avoid that duplicate responses trigger a kernel bug
        IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
        RDMA/qedr: Fix some error handling
        RDMA/bnxt_re: add DCB dependency
        IB/hns: include linux/module.h
        IB/vmw_pvrdma: Expose vendor error to ULPs
        vmw_pvrdma: switch to pci_alloc_irq_vectors
        IB/hfi1: use size_t for passing array length
        IB/ipoib: Remove redudant label
        IB/ipoib: remove the unnecessary memory free
        IB/mthca: switch to pci_alloc_irq_vectors
        IB/hfi1: Code reuse with memdup_copy
        ...
    • Merge tag 'backlight-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · a57eaa1f
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
       "Core Frameworks:
         - Add Daniel Thompson as co-maintainer
      
        Fix-ups:
         - Improve error handling; adp5520_bl
         - Split initial power checks into dedicated function; pwm_bl
         - Check current PWM status; pwm_bl
      
        Bug Fixes:
         - Fix potential race; lcd
         - Fix module auto-loading; da9052"
      
      * tag 'backlight-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        MAINTAINERS: Rework entry for Backlight
        backlight: da9052: Fix module autoload
        backlight: pwm_bl: Check the PWM state for initial backlight power state
        backlight: pwm_bl: Move the checks for initial power state to a separate function
        backlight: adp5520: Fix error handling in adp5520_bl_probe()
        backlight: lcd: Fix race condition during register
    • Merge tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · df9cdc17
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "Core Frameworks:
         - Add new !TOUCHSCREEN_SUN4I dependency for SUN4I_GPADC
         - List include/dt-bindings/mfd/* to files supported in MAINTAINERS
      
        New Drivers:
         - Intel Apollo Lake SPI NOR
         - ST STM32 Timers (Advanced, Basic and PWM)
         - Motorola 6556002 CPCAP (PMIC)
      
        New Device Support:
         - Add support for AXP221 to axp20x
         - Add support for Intel Gemini Lake to intel-lpss-pci
         - Add support for MT6323 LED to mt6397-core
         - Add support for COMe-bBD#, COMe-bSL6, COMe-bKL6, COMe-cAL6 and
           COMe-cKL6 to kempld-core
      
        New Functionality:
         - Add support for Analog CODEC to sun6i-prcm
         - Add support for Watchdog to lpc_ich
      
        Fix-ups:
         - Error handling improvements; axp288_charger, axp20x, ab8500-sysctrl
         - Adapt platform data handling; axp20x
         - IRQ handling improvements; arizona, axp20x
         - Remove superfluous code; arizona, axp20x, lpc_ich
         - Trivial coding style/spelling fixes; axp20x, abx500, mfd.txt
         - Regmap fix-ups; axp20x
         - DT changes; mfd.txt, aspeed-lpc, aspeed-gfx, ab8500-core, tps65912,
           mt6397
         - Use new I2C probing mechanism; max77686
         - Constification; rk808
      
        Bug Fixes:
         - Stop data transfer whilst suspended; cros_ec"
      
      * tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (43 commits)
        mfd: lpc_ich: Enable watchdog on Intel Apollo Lake PCH
        mfd: lpc_ich: Remove useless comments in core part
        mfd: Add support for several boards to Kontron PLD driver
        mfd: constify regmap_irq_chip structures
        MAINTAINERS: Add include/dt-bindings/mfd to MFD entry
        mfd: cpcap: Add minimal support
        mfd: mt6397: Add MT6323 LED support into MT6397 driver
        Documentation: devicetree: Add LED subnode binding for MT6323 PMIC
        mfd: tps65912: Export OF device ID table as module aliases
        mfd: ab8500-core: Rename clock device and compatible
        mfd: cros_ec: Send correct suspend/resume event to EC
        mfd: max77686: Remove I2C device ID table
        mfd: max77686: Use the struct i2c_driver .probe_new instead of .probe
        mfd: max77686: Use of_device_get_match_data() helper
        mfd: max77686: Don't attempt to get i2c_device_id .data
        mfd: ab8500-sysctrl: Handle probe deferral
        mfd: intel-lpss: Add Intel Gemini Lake PCI IDs
        mfd: axp20x: Fix AXP806 access errors on cold boot
        mfd: cros_ec: Send suspend state notification to EC
        mfd: cros_ec: Prevent data transfer while device is suspended
        ...
    • gfs2: Add missing rcu locking for glock lookup · f38e5fb9
      Andrew Price authored
      We must hold the rcu read lock across looking up glocks and trying to
      bump their refcount to prevent the glocks from being freed in between.
      
      Cc: <stable@vger.kernel.org> # 4.3+
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
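
The pattern this commit message describes (stay inside a read-side critical section from the lookup until the reference count has been bumped) can be sketched in userspace C. This is an illustration only, not the GFS2 code: `struct obj`, `ref_get_unless_zero()`, `lookup_and_get()`, and the no-op `read_lock()`/`read_unlock()` stand-ins for the kernel's `rcu_read_lock()`/`rcu_read_unlock()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; stands in for a glock. */
struct obj {
    int key;
    atomic_int refcount;
};

/* No-op stand-ins for rcu_read_lock()/rcu_read_unlock() (assumption: in the
 * kernel, the read-side section is what delays the final free). */
static void read_lock(void) {}
static void read_unlock(void) {}

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_get_unless_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Look up by key and try to take a reference, all inside one read-side
 * section, so the object cannot go away between the two steps. */
static struct obj *lookup_and_get(struct obj *table, size_t n, int key)
{
    struct obj *found = NULL;

    assert(table != NULL || n == 0);

    read_lock();
    for (size_t i = 0; i < n; i++) {
        if (table[i].key == key) {
            if (ref_get_unless_zero(&table[i].refcount))
                found = &table[i];
            break;
        }
    }
    read_unlock();
    return found;
}
```

An object whose count has already reached zero is treated as not found, so the caller never receives an entry that is in the middle of being torn down.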
    • Merge branch 'akpm' (patches from Andrew) · bc49a783
      Linus Torvalds authored
      Merge updates from Andrew Morton:
       "142 patches:
      
         - DAX updates
      
         - various misc bits
      
         - OCFS2 updates
      
         - most of MM"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits)
        mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
        mm: fix <linux/pagemap.h> stray kernel-doc notation
        zram: remove obsolete sysfs attrs
        mm/memblock.c: remove unnecessary log and clean up
        oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
        mm: drop unused argument of zap_page_range()
        mm: drop zap_details::check_swap_entries
        mm: drop zap_details::ignore_dirty
        mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
        mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
        mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
        mm: consolidate GFP_NOFAIL checks in the allocator slowpath
        lib/show_mem.c: teach show_mem to work with the given nodemask
        arch, mm: remove arch specific show_mem
        mm, page_alloc: warn_alloc print nodemask
        mm, page_alloc: do not report all nodes in show_mem
        Revert "mm: bail out in shrink_inactive_list()"
        mm, vmscan: consider eligible zones in get_scan_count
        mm, vmscan: cleanup lru size claculations
        mm, vmscan: do not count freed pages as PGDEACTIVATE
        ...
    • Merge tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · be5165a5
      Linus Torvalds authored
      Pull DeviceTree updates from Rob Herring:
       "Pretty standard stuff with dtc upstream sync being the biggest piece.
      
         - Sync dtc to upstream commit 0931cea3ba20. This picks up overlay
           support in dtc.
      
         - Set dma_ops for reserved memory users.
      
         - Make references to IOMMU consistent in DT bindings.
      
         - Cleanup references to pm_power_off in bindings.
      
         - Move some display bindings that snuck into the old bindings/video/
           path.
      
         - Fix some wrong documentation paths caused by binding
           restructuring.
      
         - Vendor prefixes for Faraday and Fujitsu.
      
         - Fix an of_node ref counting leak in of_find_node_opts_by_path
      
         - Introduce new graph helper of_graph_get_remote_node() which will be
           used by DRM drivers in 4.12"
      
      * tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits)
        DT: add Faraday Tec. as vendor
        of: introduce of_graph_get_remote_node
        of: Add missing space at end of pr_fmt().
        of: make of_device_make_bus_id() static
        of: fix of_node leak caused in of_find_node_opts_by_path
        dt-bindings: net: remove reference to fixed link support
        dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off
        dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off
        dt-bindings: mfd: as3722: Drop reference to pm_power_off
        dt-bindings: display: move ANX7814 and SiI8620 bridge bindings
        of/unittest: Swap arguments of of_unittest_apply_overlay()
        Documentation: usb: fix wrong documentation paths
        serial: fsl-imx-uart.txt: Remove generic property
        devicetree: Add Fujitsu Ltd. vendor prefix
        Documentation: display: fix wrong documentation paths
        of: remove redundant memset in overlay
        bus:qcom : Fix typo in qcom,ebi2.txt
        dt-bindings: qman: Remove pool channel node
        Documentation: panel-dpi: fix path to display-timing.txt
        devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP
        ...
    • Merge tag 'docs-4.11' of git://git.lwn.net/linux · c1aac62f
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "A slightly quieter cycle for documentation this time around.
      
        Three more DocBook template files have been converted to RST; only 21
        to go. There are various build improvements and the usual array of
        documentation improvements and fixes"
      
      * tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
        docs / driver-api: Fix structure references in device_link.rst
        PM / docs: Fix structure references in device.rst
        Add a target to check broken external links in the Documentation
        Documentation: Fix linux-api list typo
        Documentation: DocBook/Makefile comment typo
        Improve sparse documentation
        Documentation: make Makefile.sphinx no-ops quieter
        Documentation: DMA-ISA-LPC.txt
        Documentation: input: fix path to input code definitions
        docs: Remove the copyright year from conf.py
        docs: Fix a warning in the Korean HOWTO.rst translation
        PM / sleep / docs: Convert PM notifiers document to reST
        PM / core / docs: Convert sleep states API document to reST
        PM / core: Update kerneldoc comments in pm.h
        doc-rst: Fix recursive make invocation from macros
        doc-rst: Delete output of failed dot-SVG conversion
        doc-rst: Break shell command sequences on failure
        Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
        doc-rst: fixed cleandoc target when used with O=dir
        Documentation/sphinx: prevent generation of .pyc files in the source tree
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · fd7e9a88
      Linus Torvalds authored
      Pull KVM updates from Paolo Bonzini:
       "4.11 is going to be a relatively large release for KVM, with a little
        over 200 commits and noteworthy changes for most architectures.
      
        ARM:
         - GICv3 save/restore
         - cache flushing fixes
         - working MSI injection for GICv3 ITS
         - physical timer emulation
      
        MIPS:
         - various improvements under the hood
         - support for SMP guests
         - a large rewrite of MMU emulation. KVM MIPS can now use MMU
           notifiers to support copy-on-write, KSM, idle page tracking,
           swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
           also supported, so that writes to some memory regions can be
           treated as MMIO. The new MMU also paves the way for hardware
           virtualization support.
      
        PPC:
         - support for POWER9 using the radix-tree MMU for host and guest
         - resizable hashed page table
         - bugfixes.
      
        s390:
         - expose more features to the guest
         - more SIMD extensions
         - instruction execution protection
         - ESOP2
      
        x86:
         - improved hashing in the MMU
         - faster PageLRU tracking for Intel CPUs without EPT A/D bits
         - some refactoring of nested VMX entry/exit code, preparing for live
           migration support of nested hypervisors
         - expose yet another AVX512 CPUID bit
         - host-to-guest PTP support
         - refactoring of interrupt injection, with some optimizations thrown
           in and some duct tape removed.
         - remove lazy FPU handling
         - optimizations of user-mode exits
         - optimizations of vcpu_is_preempted() for KVM guests
      
        generic:
         - alternative signaling mechanism that doesn't pound on
           tsk->sighand->siglock"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
        x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
        x86/paravirt: Change vcp_is_preempted() arg type to long
        KVM: VMX: use correct vmcs_read/write for guest segment selector/base
        x86/kvm/vmx: Defer TR reload after VM exit
        x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
        x86/kvm/vmx: Simplify segment_base()
        x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
        x86/kvm/vmx: Don't fetch the TSS base from the GDT
        x86/asm: Define the kernel TSS limit in a macro
        kvm: fix page struct leak in handle_vmon
        KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
        KVM: Return an error code only as a constant in kvm_get_dirty_log()
        KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
        KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
        KVM: x86: remove code for lazy FPU handling
        KVM: race-free exit from KVM_RUN without POSIX signals
        KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
        KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
        KVM: Support vCPU-based gfn->hva cache
        KVM: use separate generations for each address space
        ...
    • Merge tag 'iommu-fix-v4.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 5066e4a3
      Linus Torvalds authored
      Pull IOMMU fix from Joerg Roedel:
       "Fix a boot crash caused by the VT-d driver when booted with IOMMU
        disabled. This was introduced with the recent IOMMU changes"
      
      * tag 'iommu-fix-v4.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix crash on boot when DMAR is disabled
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · b4642c10
      Linus Torvalds authored
      Pull seccomp fix from James Morris:
       "A fix for a regression in the seccomp code (it was supposed to be in
        the first pull req but I had it queued in the wrong branch)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        seccomp: Only dump core when single-threaded
    • Merge tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · a27fcb0c
      Linus Torvalds authored
      Pull xfs updates from Darrick Wong:
       "Here are the XFS changes for 4.11. We aren't introducing any major
        features in this release cycle except for this being the first merge
        window I've managed on my own. :)
      
        Changes since last update:
      
         - Various cleanups
      
         - Livelock fixes for eofblocks scanning
      
         - Improved input verification for on-disk metadata
      
         - Fix races in the copy on write remap mechanism
      
         - Fix buffer io error timeout controls
      
         - Streamlining of directio copy on write
      
         - Asynchronous discard support
      
         - Fix asserts when splitting delalloc reservations
      
         - Don't bloat bmbt when right shifting extents
      
         - Inode alignment fixes for 32k block sizes"
      
      * tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits)
        xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG
        xfs: simplify xfs_rtallocate_extent
        xfs: tune down agno asserts in the bmap code
        xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
        xfs: don't reserve blocks for right shift transactions
        xfs: fix len comparison in xfs_extent_busy_trim
        xfs: fix uninitialized variable in _reflink_convert_cow
        xfs: split indlen reservations fairly when under reserved
        xfs: handle indlen shortage on delalloc extent merge
        xfs: resurrect debug mode drop buffered writes mechanism
        xfs: clear delalloc and cache on buffered write failure
        xfs: don't block the log commit handler for discards
        xfs: improve busy extent sorting
        xfs: improve handling of busy extents in the low-level allocator
        xfs: don't fail xfs_extent_busy allocation
        xfs: correct null checks and error processing in xfs_initialize_perag
        xfs: update ctime and mtime on clone destinatation inodes
        xfs: allocate direct I/O COW blocks in iomap_begin
        xfs: go straight to real allocations for direct I/O COW writes
        xfs: return the converted extent in __xfs_reflink_convert_cow
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk · 7d91de74
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
         Rostedt as the printk reviewer. This idea came up after the
         discussion about printk issues at Kernel Summit. It was formulated
         and discussed at lkml[1].
      
       - Extend a lock-less NMI per-cpu buffers idea to handle recursive
         printk() calls by Sergey Senozhatsky[2]. It is the first step in
         sanitizing printk as discussed at Kernel Summit.
      
         The change makes it possible to see messages that would normally get
         ignored or would cause a deadlock.
      
         It also makes it possible to enable lockdep in printk(). This already paid off.
         The testing in linux-next helped to discover two old problems that
         were hidden before[3][4].
      
       - Remove unused parameter by Sergey Senozhatsky. Clean up after a past
         change.
      
      [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
      [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
      [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
      [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
        printk: drop call_console_drivers() unused param
        printk: convert the rest to printk-safe
        printk: remove zap_locks() function
        printk: use printk_safe buffers in printk
        printk: report lost messages in printk safe/nmi contexts
        printk: always use deferred printk when flush printk_safe lines
        printk: introduce per-cpu safe_print seq buffer
        printk: rename nmi.c and exported api
        printk: use vprintk_func in vprintk()
        MAINTAINERS: Add printk maintainers
    • Merge tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 6ef192f2
      Linus Torvalds authored
      Pull modules updates from Jessica Yu:
       "Summary of modules changes for the 4.11 merge window:
      
         - A few small code cleanups
      
         - Add modules git tree url to MAINTAINERS"
      
      * tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        MAINTAINERS: add tree for modules
        module: fix memory leak on early load_module() failures
        module: Optimize search_module_extables()
        modules: mark __inittest/__exittest as __maybe_unused
        livepatch/module: print notice of TAINT_LIVEPATCH
        module: Drop redundant declaration of struct module
    • mm/z3fold.c: limit first_num to the actual range of possible buddy indexes · f201ebd8
      zhong jiang authored
      At present, tying the size of first_num to NCHUNKS_ORDER is confusing: the
      number of chunks is completely unrelated to the number of buddies.
      
      This patch limits first_num to the actual range of possible buddy indexes,
      which is more reasonable and obvious. There is no functional change.
      
      Link: http://lkml.kernel.org/r/1476776569-29504-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Suggested-by: Dan Streetman <ddstreet@ieee.org>
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Acked-by: Vitaly Wool <vitalywool@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f201ebd8
    • mm: fix <linux/pagemap.h> stray kernel-doc notation · 083fb8ed
      Randy Dunlap authored
      Delete stray (second) function description in find_lock_page()
      kernel-doc notation.
      
      Note: scripts/kernel-doc just ignores the second function description.
      
      Fixes: 2457aec6 ("mm: non-atomically mark page accessed during page cache allocation where possible")
      Link: http://lkml.kernel.org/r/b037e9a3-516c-ec02-6c8e-fa5479747ba6@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      083fb8ed
    • zram: remove obsolete sysfs attrs · c87d1655
      Sergey Senozhatsky authored
      We had a deprecated_attr_warn() warning for 2 years and now the time has
      come and we finally can do the cleanup.
      
      The plan was as follows:
      
      : per-stat sysfs attributes are considered to be deprecated.
      : The basic strategy is:
      : -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
      : -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
      :
      : The list of deprecated attributes can be found here:
      : Documentation/ABI/obsolete/sysfs-block-zram
      :
      : Basically, every attribute that has its own read accessible sysfs
      : node (e.g. num_reads) *AND* is accessible via one of the stat files
      : (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
      : to be deprecated.
      
      The patch also removes `obsolete/sysfs-block-zram', cleans up
      `testing/sysfs-block-zram' and tweaks the zram.txt files.
      
      Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c87d1655
    • mm/memblock.c: remove unnecessary log and clean up · 5d63f81c
      Miles Chen authored
      There is no variable named flags in memblock_add() and
      memblock_reserve() so remove it from the log messages.
      
      This patch also cleans up the type casting for phys_addr_t by using %pa
      to print them.
      
      Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d63f81c
    • oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA · 23519073
      Kirill A. Shutemov authored
      The logic on whether we can reap pages from the VMA should match what
      we have in madvise_dontneed().  In particular, we should skip VM_PFNMAP
      VMAs, but currently we don't.

      Let's extract the condition under which we can shoot down pages from a
      VMA with MADV_DONTNEED into a separate function and use it in both places.
      
      Link: http://lkml.kernel.org/r/20170118122429.43661-4-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23519073
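The idea in the oom-reaper commit above (one predicate shared by the MADV_DONTNEED path and the OOM reaper) might be sketched in userspace C roughly as follows. This is only an illustration: `struct vma_sketch`, the function names and the flag values are hypothetical, and the inclusion of VM_LOCKED and VM_HUGETLB next to VM_PFNMAP is an assumption about what madvise_dontneed() rejects, since the commit message only names VM_PFNMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
#define VM_LOCKED   0x1UL
#define VM_HUGETLB  0x2UL
#define VM_PFNMAP   0x4UL
#define VM_SHARED   0x8UL

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct vma_sketch {
	unsigned long vm_flags;
};

/* Single predicate shared by both callers: may pages be dropped from this VMA? */
static bool can_dontneed_vma(const struct vma_sketch *vma)
{
	assert(vma != NULL);
	/* Assumption: locked, hugetlb and PFN-mapped VMAs are all off limits. */
	return !(vma->vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
}

/* Caller 1: the MADV_DONTNEED path reports unsuitable VMAs as an error. */
static int madvise_dontneed_sketch(const struct vma_sketch *vma)
{
	if (!can_dontneed_vma(vma))
		return -1; /* stands in for an -EINVAL style error */
	return 0;
}

/* Caller 2: the OOM reaper silently skips VMAs it must not reap. */
static bool oom_reaper_should_reap(const struct vma_sketch *vma)
{
	if (!can_dontneed_vma(vma))
		return false;
	/* Simplified extra rule of the reaper: shared mappings are skipped. */
	return !(vma->vm_flags & VM_SHARED);
}
```

With a shared predicate, a VM_PFNMAP VMA is refused by both callers, so the two paths cannot drift apart again.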
    • mm: drop unused argument of zap_page_range() · ecf1385d
      Kirill A. Shutemov authored
      There are no users of zap_page_range() that want a non-NULL 'details'.
      Let's drop it.
      
      Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ecf1385d
    • mm: drop zap_details::check_swap_entries · 3e8715fd
      Kirill A. Shutemov authored
      detail == NULL would give the same functionality as
      .check_swap_entries==true.
      
      Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e8715fd
    • mm: drop zap_details::ignore_dirty · da162e93
      Kirill A. Shutemov authored
      The only user of ignore_dirty is oom-reaper.  But it doesn't really use
      it.
      
      ignore_dirty only has an effect on file pages mapped with a dirty pte.
      But oom-reaper skips shared VMAs, so there's no way we can dirty file
      pte in them.
      
      Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da162e93
    • mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled · 685dbf6f
      David Rientjes authored
      The patch "mm, page_alloc: warn_alloc print nodemask" implicitly sets
      the allocation nodemask to cpuset_current_mems_allowed when there is no
      effective mempolicy.  cpuset_current_mems_allowed is only effective when
      cpusets are enabled, which is also printed by warn_alloc(), so setting
      the nodemask to cpuset_current_mems_allowed is redundant and prevents
      debugging issues where ac->nodemask is not set properly in the page
      allocator.
      
      This provides better debugging output since
      cpuset_print_current_mems_allowed() is already provided.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701181347320.142399@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      685dbf6f
    • mm: help __GFP_NOFAIL allocations which do not trigger OOM killer · 6c18ba7a
      Michal Hocko authored
      Now that __GFP_NOFAIL doesn't override decisions to skip the oom killer
      we are left with requests which have to loop inside the allocator
      without invoking the oom killer (e.g.  GFP_NOFS|__GFP_NOFAIL used by fs
      code) and so they might, in very unlikely situations, loop forever,
      e.g. because other parallel requests could starve them.

      This patch tries to limit the likelihood of such a lockup by giving
      these __GFP_NOFAIL requests a chance to move on by consuming a small
      part of memory reserves.  We are using ALLOC_HARDER which should be
      enough to prevent starvation by regular allocation requests, yet it
      shouldn't consume enough from the reserves to disrupt high priority
      requests (ALLOC_HIGH).

      While we are at it, let's introduce a helper __alloc_pages_cpuset_fallback
      which enforces the cpusets but allows falling back to ignoring them if
      the first attempt fails.  __GFP_NOFAIL requests can be considered
      important enough to allow cpuset runaway in order for the system to move
      on.  It is highly unlikely that any of these will be GFP_USER anyway.
      
      Link: http://lkml.kernel.org/r/20161220134904.21023-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c18ba7a
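The "enforce the cpuset first, then fall back to ignoring it" behaviour that the commit above attributes to __alloc_pages_cpuset_fallback can be sketched in plain C as below. This is a minimal userspace illustration, not the kernel helper: the `alloc_fn` callback, the toy allocator and the flag values are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_CPUSET 0x1u  /* illustrative bit: honour cpuset restrictions */
#define ALLOC_HARDER 0x2u  /* illustrative bit: dip a little into reserves */

/* Hypothetical allocator hook: returns a page pointer or NULL on failure. */
typedef void *(*alloc_fn)(unsigned int alloc_flags, void *ctx);

/* First try inside the cpuset; if that fails, retry while ignoring it. */
static void *alloc_cpuset_fallback(alloc_fn try_alloc,
				   unsigned int alloc_flags, void *ctx)
{
	void *page;

	assert(try_alloc != NULL);
	page = try_alloc(alloc_flags | ALLOC_CPUSET, ctx);
	if (!page)
		page = try_alloc(alloc_flags & ~ALLOC_CPUSET, ctx);
	return page;
}

/* Toy allocator: the cpuset's nodes are exhausted, other nodes are not. */
struct toy_state {
	int calls;
	int page;
};

static void *toy_alloc(unsigned int alloc_flags, void *ctx)
{
	struct toy_state *st = ctx;

	st->calls++;
	if (alloc_flags & ALLOC_CPUSET)
		return NULL;     /* nothing left inside the cpuset */
	return &st->page;        /* succeeds once the cpuset is ignored */
}
```

Calling `alloc_cpuset_fallback(toy_alloc, ALLOC_HARDER, &st)` makes two attempts with the toy allocator: the restricted one fails and the unrestricted retry succeeds, which is the "cpuset runaway" the message accepts for __GFP_NOFAIL requests.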
    • mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically · 06ad276a
      Michal Hocko authored
      __alloc_pages_may_oom makes sure to skip the OOM killer depending on the
      allocation request.  This includes lowmem requests, costly high order
      requests and others.  For a long time __GFP_NOFAIL acted as an override
      for all those rules.  This is not documented and it can be quite
      surprising as well.  E.g.  GFP_NOFS requests do not invoke the OOM
      killer but GFP_NOFS|__GFP_NOFAIL does, so if we try to convert some of
      the existing open-coded loops around the allocator to nofail requests
      (and we have done that in the past) then such a change would have a
      non-trivial side effect which is far from obvious.  Note that the
      primary motivation for skipping the OOM killer is to prevent premature
      invocation.
      
      The exception has been added by commit 82553a93 ("oom: invoke oom
      killer for __GFP_NOFAIL").  The changelog points out that the oom killer
      has to be invoked otherwise the request would be looping for ever.  But
      this argument is rather weak because the OOM killer doesn't really
      guarantee forward progress for those exceptional cases:
      
      - it will hardly help to form a costly-order page, which in turn can
        result in a system panic because no oom killable task is left in the
        end.  I believe we certainly do not want to bring the system down just
        because a nasty driver asks for an order-9 page with GFP_NOFAIL without
        realizing all the consequences.  It is much better for this request to
        loop forever than to cause massive system disruption

      - lowmem is also highly unlikely to be freed by the OOM killer
      
      - GFP_NOFS request could trigger while there is still a lot of memory
        pinned by filesystems.
      
      This patch simply removes the __GFP_NOFAIL special case in order to have
      clearer semantics without surprising side effects.

      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Nils Holland <nholland@tisys.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      06ad276a
    • mm: consolidate GFP_NOFAIL checks in the allocator slowpath · 9a67f648
      Michal Hocko authored
      Tetsuo Handa has pointed out that commit 0a0337e0 ("mm, oom: rework
      oom detection") has subtly changed the semantics for costly high order
      requests with __GFP_NOFAIL and without __GFP_REPEAT, and those can fail
      right now.  My code inspection didn't reveal any such users in the tree
      but it is true that this might lead to unexpected allocation failures
      and subsequent OOPs.

      __alloc_pages_slowpath wrt.  GFP_NOFAIL is hard to follow currently.
      There are a few special cases but we are lacking a catch-all place to be
      sure we will not miss any case where the non-failing allocation might
      fail.  This patch reorganizes the code a bit and puts all those special
      cases under the nopage label, which is the generic go-to-fail path.
      Non-failing allocations are retried, while those that cannot retry,
      like non-sleeping allocations, go to the failure point directly.  This
      should make the code flow much easier to follow and make it less error
      prone for future changes.
      
      While we are there we have to move the stall check up to catch
      potentially looping non-failing allocations.
      
      [akpm@linux-foundation.org: fix alloc_flags may-be-used-uninitialized]
      Link: http://lkml.kernel.org/r/20161220134904.21023-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a67f648
    • lib/show_mem.c: teach show_mem to work with the given nodemask · 9af744d7
      Michal Hocko authored
      show_mem() allows filtering out node-specific data which is irrelevant
      to the allocation request via SHOW_MEM_FILTER_NODES.  The filtering is
      done in skip_free_areas_node which skips all nodes which are not in the
      mems_allowed of the current process.  This works most of the time as
      expected because the nodemask shouldn't be outside of the allocating
      task but there are some exceptions.  E.g.  memory hotplug might want to
      request allocations from outside of the allowed nodes (see
      new_node_page).
      
      Get rid of this hardcoded behavior and push the allocation mask down the
      show_mem path and use it instead of cpuset_current_mems_allowed.  NULL
      nodemask is interpreted as cpuset_current_mems_allowed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9af744d7
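The node filtering described in the show_mem commit above, including the rule that a NULL nodemask means "use the current task's allowed nodes", could look roughly like this in userspace C. The bitmask type, the `current_mems_allowed` constant and the function name are hypothetical stand-ins for the kernel's nodemask_t, cpuset_current_mems_allowed and the filtering helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy nodemask: bit n set means NUMA node n is allowed (up to 64 nodes). */
typedef struct {
	unsigned long long bits;
} nodemask_sketch;

/* Stand-in for cpuset_current_mems_allowed; here nodes 0 and 1. */
static const nodemask_sketch current_mems_allowed = { 0x3ULL };

/* Returns true when the node's data should be left out of the report. */
static bool skip_node(int nid, bool filter_nodes,
		      const nodemask_sketch *nodemask)
{
	assert(nid >= 0 && nid < 64);
	if (!filter_nodes)      /* no filtering requested: show everything */
		return false;
	if (!nodemask)          /* NULL falls back to the task's allowed nodes */
		nodemask = &current_mems_allowed;
	return !((nodemask->bits >> nid) & 1ULL);
}
```

A caller such as memory hotplug, which allocates from a node outside the task's allowed set, can pass its own mask so that the report covers the node it actually asked for.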
    • arch, mm: remove arch specific show_mem · 6d23f8a5
      Michal Hocko authored
      We have had a generic implementation for quite some time already.  If
      there is any arch specific information to be printed then we should add
      a callback called from the generic code rather than duplicate the whole
      show_mem.

      The current code has resulted in code duplication and output
      divergence, which is both confusing and adds maintenance costs.
      
      Let's just get rid of this mess.
      
      Link: http://lkml.kernel.org/r/20170117091543.25850-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [UniCore32]
      Acked-by: Helge Deller <deller@gmx.de> [for parisc]
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d23f8a5
    • mm, page_alloc: warn_alloc print nodemask · a8e99259
      Michal Hocko authored
      warn_alloc is currently used to report an allocation failure or an
      allocation stall.  We print some details of the allocation request like
      the gfp mask and the request order.  We do not print the allocation
      nodemask which is important when debugging the reason for the allocation
      failure as well.  We already print the nodemask in the OOM report.
      
      Add nodemask to warn_alloc and print it in warn_alloc as well.
      
      Link: http://lkml.kernel.org/r/20170117091543.25850-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8e99259
    • mm, page_alloc: do not report all nodes in show_mem · c02e50bb
      Michal Hocko authored
      Patch series "show_mem updates", v2.
      
      This is a mixture of one bug fix (patch 1), an enhancement (patch 2) and
      cleanups (the rest of the series).  First two patches should be really
      straightforward.  Patch 3 removes some arch specific show_mem
      implementations because I think they are quite outdated and do not
      really serve any useful purpose anymore.  I think we should really
      strive to have a consistent show_mem output regardless of the
      architecture.  If some architecture is really special and wants to dump
      something additional we should do that via an arch specific hook.
      
      The last patch adds nodemask parameter so that we do not rely on the
      hardcoded mems_allowed of the current task when doing the node
      filtering.  I consider this more a cleanup than a fix because basically
      all users use a nodemask which is a subset of mems_allowed.  There is
      only one call path in the memory hotplug which doesn't comply with this
      but that is hardly something to worry about.
      
      This patch (of 4):
      
      Commit 599d0c95 ("mm, vmscan: move LRU lists to node") has added per
      numa node statistics to show_mem but it forgot to add
      skip_free_areas_node to filter out nodes which are outside of the
      allocating task's numa policy.  Add this check so as not to pollute the
      output with pointless information.
      
      Link: http://lkml.kernel.org/r/20170117091543.25850-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c02e50bb
    • Revert "mm: bail out in shrink_inactive_list()" · abd6e8a7
      Michal Hocko authored
      This reverts commit 91dcade4.
      
      inactive_reclaimable_pages shouldn't be needed anymore now that
      get_scan_count is aware of the eligible zones ("mm, vmscan: consider
      eligible zones in get_scan_count").
      
      Link: http://lkml.kernel.org/r/20170117103702.28542-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abd6e8a7
    • mm, vmscan: consider eligible zones in get_scan_count · 71ab6cfe
      Michal Hocko authored
      get_scan_count() considers the whole node LRU size when
      
       - doing SCAN_FILE due to many page cache inactive pages
       - calculating the number of pages to scan
      
      In both cases this might lead to unexpected behavior especially on 32b
      systems where we can expect lowmem memory pressure very often.
      
      A large highmem zone can easily distort the SCAN_FILE heuristic because
      there might be only a few file pages from the eligible zones on the node
      lru and we would still enforce file lru scanning, which can lead to
      thrashing while we could still scan anonymous pages.

      The latter use of lruvec_lru_size can be problematic as well,
      especially when there are not many pages from the eligible zones.  We
      would have to skip over many pages to find anything to reclaim but
      shrink_node_memcg would only reduce the remaining number to scan by
      SWAP_CLUSTER_MAX at maximum.  Therefore we can end up going over a large
      LRU many times without actually having a chance to reclaim much, if
      anything at all.  The closer we are to running out of memory on the
      lowmem zone, the worse the problem will be.
      
      Fix this by filtering out all the ineligible zones when calculating the
      lru size for both paths and consider only sc->reclaim_idx zones.
      
      The patch would need to be tweaked a bit to apply to 4.10 and older but
      I will do that as soon as it hits the Linus tree in the next merge
      window.
      
      Link: http://lkml.kernel.org/r/20170117103702.28542-3-mhocko@kernel.org
      Fixes: b2e18757 ("mm, vmscan: begin reclaiming pages on a per-node basis")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Trevor Cordes <trevor@tecnopolis.ca>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org> [4.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71ab6cfe
    • mm, vmscan: cleanup lru size claculations · fd538803
      Michal Hocko authored
      lruvec_lru_size returns the full size of the LRU list while we sometimes
      need a value reduced only to eligible zones (e.g.  for lowmem requests).
      inactive_list_is_low is one such user.  Later patches will add more of
      them.  Add a new parameter to lruvec_lru_size and allow it to filter out
      zones which are not eligible for the given context.
      
      Link: http://lkml.kernel.org/r/20170117103702.28542-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd538803
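To illustrate why an LRU size restricted to eligible zones matters, here is a small userspace sketch of the kind of filtering the lruvec_lru_size commit above describes. The structure, the zone count and the function name are hypothetical, and the sketch sums the eligible zones directly, which may differ from how the kernel arrives at the same number.

```c
#include <assert.h>

#define MAX_ZONES_SKETCH 4  /* illustrative zone count, lowest zone first */

/* Hypothetical per-zone page counts for one LRU list of one node. */
struct lru_sketch {
	unsigned long zone_pages[MAX_ZONES_SKETCH];
};

/* Size of the LRU restricted to zones 0..zone_idx (inclusive). */
static unsigned long lru_size_eligible(const struct lru_sketch *lru,
				       int zone_idx)
{
	unsigned long size = 0;
	int zid;

	assert(zone_idx >= 0 && zone_idx < MAX_ZONES_SKETCH);
	for (zid = 0; zid <= zone_idx; zid++)
		size += lru->zone_pages[zid];
	return size;
}
```

For a node whose zones hold 10, 200, 100000 and 0 pages on the list, a request limited to zone index 1 sees 210 pages instead of 100210, so a large highmem zone no longer inflates the scan target for a lowmem request.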
    • mm, vmscan: do not count freed pages as PGDEACTIVATE · f0958906
      Michal Hocko authored
      PGDEACTIVATE represents the number of pages moved from the active list
      to the inactive list.  At least this sounds like the original motivation
      of the counter.  move_active_pages_to_lru, however, counts pages which
      got freed in the mean time as deactivated as well.  This is a very rare
      event and counting them as deactivation in itself is not harmful but it
      makes the code more convoluted than necessary - we have to count both
      all pages and those which are freed which is a bit confusing.
      
      After this patch PGDEACTIVATE should have slightly clearer semantics
      and only count those pages which are moved from the active to the
      inactive list, which is a plus.
      
      Link: http://lkml.kernel.org/r/20170112211221.17636-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0958906
    • mm/backing-dev.c: use rb_entry() · bc71226b
      Geliang Tang authored
      To make the code clearer, use rb_entry() instead of container_of() to
      deal with rbtree.
      
      Link: http://lkml.kernel.org/r/671275de093d93ddc7c6f77ddc0d357149691a39.1484306840.git.geliangtang@gmail.com
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc71226b
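The rb_entry() commit above is a readability change: rb_entry() performs the same member-to-parent pointer arithmetic as container_of(), under a name that signals rbtree usage. A userspace sketch, with `_sketch` names that are hypothetical simplifications of the kernel macros and types:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct rb_node. */
struct rb_node_sketch {
	struct rb_node_sketch *left, *right;
};

/* Simplified container_of(): step back from a member to its parent object. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() is the same operation under an rbtree-specific name. */
#define rb_entry_sketch(ptr, type, member) \
	container_of_sketch(ptr, type, member)

/* Hypothetical object linked into a tree through its rb_node member. */
struct congested_sketch {
	int blkcg_id;
	struct rb_node_sketch rb_node;
};

/* Recover the enclosing object from a tree node and read a field from it. */
static int id_from_node(struct rb_node_sketch *node)
{
	assert(node != NULL);
	return rb_entry_sketch(node, struct congested_sketch, rb_node)->blkcg_id;
}
```

Both macros yield the same pointer; the rbtree-specific name only makes the intent at the call site easier to read.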
    • mm, thp: add new defer+madvise defrag option · 21440d7e
      David Rientjes authored
      There is no thp defrag option that currently allows MADV_HUGEPAGE
      regions to do direct compaction and reclaim while all other thp
      allocations simply trigger kswapd and kcompactd in the background and
      fail immediately.
      
      The "defer" setting simply triggers background reclaim and compaction
      for all regions, regardless of MADV_HUGEPAGE, which makes it unusable
      for our userspace where MADV_HUGEPAGE is being used to indicate the
      application is willing to wait for work for thp memory to be available.
      
      The "madvise" setting will do direct compaction and reclaim for these
      MADV_HUGEPAGE regions, but does not trigger kswapd and kcompactd in the
      background for anybody else.
      
      For reasonable usage, there needs to be a mesh between the two options.
      This patch introduces a fifth mode, "defer+madvise", that will do direct
      reclaim and compaction for MADV_HUGEPAGE regions and trigger background
      reclaim and compaction for everybody else so that hugepages may be
      available in the near future.
      
      A proposal to allow direct reclaim and compaction for MADV_HUGEPAGE
      regions as part of the "defer" mode, which would make it a very
      powerful setting and avoid breaking userspace, was offered:
           http://marc.info/?t=148236612700003
      This additional mode is a compromise.

      A second proposal to allow both "defer" and "madvise" to be selected at
      the same time was also offered:
           http://marc.info/?t=148357345300001
      This is possible, but there was a concern that it might break existing
      userspaces that parse the output of the defrag mode, so the fifth option
      was introduced instead.
      
      This patch also cleans up the helper function for storing to "enabled"
      and "defrag" since the former supports three modes while the latter
      supports five and triple_flag_store() was getting unnecessarily messy.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701101614330.41805@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      21440d7e
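The behaviour of the defrag modes described in the commit above can be summarised as a small decision function. This is a sketch only: the enum and struct names are hypothetical, the "always" and "never" rows are not spelled out in the commit message and reflect an assumption about those modes, and side effects such as direct reclaim also waking background threads are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* The five defrag modes; the enum names are illustrative. */
enum defrag_mode {
	DEFRAG_ALWAYS,
	DEFRAG_DEFER,
	DEFRAG_DEFER_MADVISE,
	DEFRAG_MADVISE,
	DEFRAG_NEVER,
};

/* What a THP fault may do when no huge page is immediately available. */
struct defrag_action {
	bool direct;      /* stall the task for direct reclaim and compaction */
	bool background;  /* wake kswapd/kcompactd and fail the attempt now */
};

static struct defrag_action thp_defrag_action(enum defrag_mode mode,
					      bool madv_hugepage)
{
	struct defrag_action act = { false, false };

	switch (mode) {
	case DEFRAG_ALWAYS:          /* assumption: everyone stalls */
		act.direct = true;
		break;
	case DEFRAG_DEFER:           /* background work for all regions */
		act.background = true;
		break;
	case DEFRAG_DEFER_MADVISE:   /* the new fifth mode */
		if (madv_hugepage)
			act.direct = true;
		else
			act.background = true;
		break;
	case DEFRAG_MADVISE:         /* only MADV_HUGEPAGE regions do anything */
		act.direct = madv_hugepage;
		break;
	case DEFRAG_NEVER:           /* assumption: no defrag work at all */
		break;
	}
	return act;
}
```

The new mode is the only one in which the two kinds of region behave differently while both still make progress toward future huge pages: MADV_HUGEPAGE regions stall for direct work, everything else only kicks the background threads.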
    • mm/swap: skip readahead only when swap slot cache is enabled · ba81f838
      Huang Ying authored
      During swap off, a swap entry may have swap_map[] == SWAP_HAS_CACHE
      (for example, when it has just been allocated).  If we return NULL in
      __read_swap_cache_async(), the swap off will abort.  So when the swap
      slot cache is disabled (for swap off), we will wait for the page to be
      put into the swap cache in such a race condition.  This should not be a
      problem for the swap slot cache, because the swap slot cache should be
      drained after clearing swap_slot_cache_enabled.
      
      [ying.huang@intel.com: fix memory leak in __read_swap_cache_async()]
        Link: http://lkml.kernel.org/r/874lzt6znd.fsf@yhuang-dev.intel.com
      Link: http://lkml.kernel.org/r/5e2c5f6abe8e6eb0797408897b1bba80938e9b9d.1484082593.git.tim.c.chen@linux.intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba81f838