1. 16 Nov, 2017 40 commits
    • Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 051089a2
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "Xen features and fixes for v4.15-rc1
      
        Apart from several small fixes it contains the following features:
      
         - a series by Joao Martins to add vdso support of the pv clock
           interface
      
         - a series by Juergen Gross to add support for Xen pv guests to be
           able to run on 5 level paging hosts
      
         - a series by Stefano Stabellini adding the Xen pvcalls frontend
           driver using a paravirtualized socket interface"
      
      * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
        xen/pvcalls: fix potential endless loop in pvcalls-front.c
        xen/pvcalls: Add MODULE_LICENSE()
        MAINTAINERS: xen, kvm: track pvclock-abi.h changes
        x86/xen/time: setup vcpu 0 time info page
        x86/xen/time: set pvclock flags on xen_time_init()
        x86/pvclock: add setter for pvclock_pvti_cpu0_va
        ptp_kvm: probe for kvm guest availability
        xen/privcmd: remove unused variable pageidx
        xen: select grant interface version
        xen: update arch/x86/include/asm/xen/cpuid.h
        xen: add grant interface version dependent constants to gnttab_ops
        xen: limit grant v2 interface to the v1 functionality
        xen: re-introduce support for grant v2 interface
        xen: support priv-mapping in an HVM tools domain
        xen/pvcalls: remove redundant check for irq >= 0
        xen/pvcalls: fix unsigned less than zero error check
        xen/time: Return -ENODEV from xen_get_wallclock()
        xen/pvcalls-front: mark expected switch fall-through
        xen: xenbus_probe_frontend: mark expected switch fall-throughs
        xen/time: do not decrease steal time after live migration on xen
        ...
      051089a2
    • Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 974aa563
      Linus Torvalds authored
      Pull KVM updates from Radim Krčmář:
       "First batch of KVM changes for 4.15
      
        Common:
         - Python 3 support in kvm_stat
         - Accounting of slabs to kmemcg
      
        ARM:
         - Optimized arch timer handling for KVM/ARM
         - Improvements to the VGIC ITS code and introduction of an ITS reset
           ioctl
         - Unification of the 32-bit fault injection logic
         - More exact external abort matching logic
      
        PPC:
         - Support for running hashed page table (HPT) MMU mode on a host that
           is using the radix MMU mode; single threaded mode on POWER 9 is
           added as a pre-requisite
         - Resolution of merge conflicts with the last second 4.14 HPT fixes
         - Fixes and cleanups
      
        s390:
         - Some initial preparation patches for exitless interrupts and crypto
         - New capability for AIS migration
         - Fixes
      
        x86:
         - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
           and after-reset state
         - Refined dependencies for VMX features
         - Fixes for nested SMI injection
         - A lot of cleanups"
      
      * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
        KVM: s390: provide a capability for AIS state migration
        KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
        KVM: s390: abstract conversion between isc and enum irq_types
        KVM: s390: vsie: use common code functions for pinning
        KVM: s390: SIE considerations for AP Queue virtualization
        KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
        KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
        KVM: arm/arm64: fix the incompatible matching for external abort
        KVM: arm/arm64: Unify 32bit fault injection
        KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
        KVM: arm/arm64: vgic-its: New helper functions to free the caches
        KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
        arm/arm64: KVM: Load the timer state when enabling the timer
        KVM: arm/arm64: Rework kvm_timer_should_fire
        KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
        KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
        KVM: arm/arm64: Move phys_timer_emulate function
        KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
        ...
      974aa563
    • Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 441692aa
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - add support for ELF fdpic binaries on both MMU and noMMU platforms
      
       - linker script cleanups
      
       - support for compressed .data section for XIP images
      
       - discard memblock arrays when possible
      
       - various cleanups
      
       - atomic DMA pool updates
      
       - better diagnostics of missing/corrupt device tree
      
       - export information to allow userspace kexec tool to place images more
         intelligently, so that the device tree isn't overwritten by the
         booting kernel
      
       - make early_printk more efficient on semihosted systems
      
       - noMMU cleanups
      
       - SA1111 PCMCIA update in preparation for further cleanups
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits)
        ARM: 8719/1: NOMMU: work around maybe-uninitialized warning
        ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r"
        ARM: 8713/1: NOMMU: Support MPU in XIP configuration
        ARM: 8712/1: NOMMU: Use more MPU regions to cover memory
        ARM: 8711/1: V7M: Add support for MPU to M-class
        ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE
        ARM: 8709/1: NOMMU: Disallow MPU for XIP
        ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C
        ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers
        ARM: 8706/1: NOMMU: Move out MPU setup in separate module
        ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
        ARM: 8705/1: early_printk: use printascii() rather than printch()
        ARM: 8703/1: debug.S: move hexbuf to a writable section
        ARM: add additional table to compressed kernel
        ARM: decompressor: fix BSS size calculation
        pcmcia: sa1111: remove special sa1111 mmio accessors
        pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources
        ARM: better diagnostics with missing/corrupt dtb
        ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size()
        ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init
        ...
      441692aa
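The 8717/2 entry in the shortlog above changes the order of the emitted line ending: a newline becomes carriage return followed by line feed, not the reverse. A minimal Python sketch of that translation follows. The helper name `translate_newlines` is hypothetical and this is not the kernel's assembly routine, only an illustration of the intended output order.

```python
def translate_newlines(text: str) -> str:
    """Emit "\r\n" for every "\n" in text (carriage return first)."""
    out = []
    for ch in text:
        if ch == "\n":
            # The carriage return goes before the line feed, not after it.
            out.append("\r")
        out.append(ch)
    return "".join(out)
```

A serial console that expects CR LF then sees `"ok\r\n"` for an input of `"ok\n"`.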
    • Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5b0e2cb0
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
       "A bit of a small release, I suspect in part due to me travelling for
        KS. But my backlog of patches to review is smaller than usual, so I
        think in part folks just didn't send as much this cycle.
      
        Non-highlights:
      
         - Five fixes for the >128T address space handling, both to fix bugs
           in our implementation and to bring the semantics exactly into line
           with x86.
      
        Highlights:
      
         - Support for a new OPAL call on bare metal machines which gives us a
           true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
      
         - Support for Power9 DD2 in the CXL driver.
      
         - Improvements to machine check handling so that uncorrectable errors
           can be reported into the generic memory_failure() machinery.
      
         - Some fixes and improvements for VPHN, which is used under PowerVM
           to notify the Linux partition of topology changes.
      
         - Plumbing to enable TM (transactional memory) without suspend on
           some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
      
         - Support for emulating vector loads from cache-inhibited memory, on
           some Power9 revisions.
      
         - Disable the fast-endian switch "syscall" by default (behind a
           CONFIG), we believe it has never had any users.
      
         - A major rework of the API drivers use when initiating and waiting
           for long running operations performed by OPAL firmware, and changes
           to the powernv_flash driver to use the new API.
      
         - Several fixes for the handling of FP/VMX/VSX while processes are
           using transactional memory.
      
         - Optimisations of TLB range flushes when using the radix MMU on
           Power9.
      
         - Improvements to the VAS facility used to access coprocessors on
           Power9, and related improvements to the way the NX crypto driver
           handles requests.
      
         - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
        Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
        Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
        Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
        Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
        Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
        Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
        Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
        Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
        Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
        Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
        Kennington III"
      
      * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
        powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
        powerpc/64s: Fix masking of SRR1 bits on instruction fault
        powerpc/64s: mm_context.addr_limit is only used on hash
        powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
        powerpc/64s/hash: Fix fork() with 512TB process address space
        powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Fix 512T hint detection to use >= 128T
        powerpc: Fix DABR match on hash based systems
        powerpc/signal: Properly handle return value from uprobe_deny_signal()
        powerpc/fadump: use kstrtoint to handle sysfs store
        powerpc/lib: Implement UACCESS_FLUSHCACHE API
        powerpc/lib: Implement PMEM API
        powerpc/powernv/npu: Don't explicitly flush nmmu tlb
        powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
        powerpc/powernv/idle: Round up latency and residency values
        powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
        powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
        powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
        powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
        ...
      5b0e2cb0
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 758f8758
      Linus Torvalds authored
      Pull user namespace update from Eric Biederman:
       "The only change that is production ready this round is the work to
        increase the number of uid and gid mappings a user namespace can
        support from 5 to 340.
      
        This code was carefully benchmarked and it was confirmed that in the
        existing cases the performance remains the same. In the worst case
        with 340 mappings, cache-cold stat times go from 158ns to 248ns.
        That is noticeable but still quite small, and only the people who are
        doing crazy things pay the cost.
      
        This work uncovered some documentation and cleanup opportunities in
        the mapping code, and patches to make those cleanups and improve the
        documentation will be coming in the next merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        userns: Simplify insert_extent
        userns: Make map_id_down a wrapper for map_id_range_down
        userns: Don't read extents twice in m_start
        userns: Simplify the user and group mapping functions
        userns: Don't special case a count of 0
        userns: bump idmap limits to 340
        userns: use union in {g,u}idmap struct
      758f8758
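To make the cost discussion above concrete, here is a minimal Python sketch of an ID-mapping lookup over extents. It assumes that small maps are scanned linearly and larger ones are searched with a binary search over sorted extents. The `Extent` and `IdMap` types, the `SMALL_MAP` cut-over, and the `map_down` method are illustrative names, not the kernel's actual data structures.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    first: int        # first ID of the range as seen inside the namespace
    lower_first: int  # first ID of the matching range outside the namespace
    count: int        # number of consecutive IDs the range covers


SMALL_MAP = 5  # assumed cut-over point, taken from the old limit quoted above


class IdMap:
    def __init__(self, extents):
        # Extents are assumed not to overlap; sort them once up front.
        self.extents = sorted(extents, key=lambda e: e.first)
        self.starts = [e.first for e in self.extents]

    def map_down(self, ident):
        """Translate a namespace ID to the outer ID, or None if unmapped."""
        if len(self.extents) <= SMALL_MAP:
            # Few extents: a linear scan touches very little memory.
            candidates = self.extents
        else:
            # Many extents: binary search for the last extent starting
            # at or before ident.
            idx = bisect_right(self.starts, ident) - 1
            candidates = self.extents[idx:idx + 1] if idx >= 0 else []
        for ext in candidates:
            if ext.first <= ident < ext.first + ext.count:
                return ext.lower_first + (ident - ext.first)
        return None
```

With this shape, the common case of a handful of mappings keeps its cheap scan, and only maps that use many extents pay for the search.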
    • Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · a02cd422
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "In this round, we introduce sysfile-based quota support which is
        required for Android by default. In addition, we allow users to
        reserve some blocks at runtime to mitigate performance drops when
        free space is low.
      
        Enhancements:
         - assign proper data segments according to write_hints given by user
         - issue cache_flush on dirty devices only among multiple devices
         - exploit cp_error flag and add more faults to enhance fault
           injection test
         - conduct more readaheads during f2fs_readdir
         - add a range for discard commands
      
        Bug fixes:
         - fix zero stat->st_blocks when inline_data is set
         - drop crypto key and free stale memory pointer while evict_inode is
           failing
         - fix some corner cases in free space and segment management
         - fix wrong last_disk_size
      
        This series includes lots of clean-ups and code enhancement in terms
        of xattr operations, discard/flush command control. In addition, it
        adds versatile debugfs entries to monitor f2fs status"
      
      * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits)
        f2fs: deny accessing encryption policy if encryption is off
        f2fs: inject fault in inc_valid_node_count
        f2fs: fix to clear FI_NO_PREALLOC
        f2fs: expose quota information in debugfs
        f2fs: separate nat entry mem alloc from nat_tree_lock
        f2fs: validate before set/clear free nat bitmap
        f2fs: avoid opened loop codes in __add_ino_entry
        f2fs: apply write hints to select the type of segments for buffered write
        f2fs: introduce scan_curseg_cache for cleanup
        f2fs: optimize the way of traversing free_nid_bitmap
        f2fs: keep scanning until enough free nids are acquired
        f2fs: trace checkpoint reason in fsync()
        f2fs: keep isize once block is reserved cross EOF
        f2fs: avoid race in between GC and block exchange
        f2fs: save a multiplication for last_nid calculation
        f2fs: fix summary info corruption
        f2fs: remove dead code in update_meta_page
        f2fs: remove unneeded semicolon
        f2fs: don't bother with inode->i_version
        f2fs: check curseg space before foreground GC
        ...
      a02cd422
    • Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 487e2c9f
      Linus Torvalds authored
      Pull AFS updates from David Howells:
       "kAFS filesystem driver overhaul.
      
        The major points of the overhaul are:
      
         (1) Preliminary groundwork is laid for supporting network-namespacing
             of kAFS. The remainder of the namespacing work requires some way
             to pass namespace information to submounts triggered by an
             automount. This requires something like the mount overhaul that's
             in progress.
      
         (2) sockaddr_rxrpc is used in preference to in_addr for holding
             addresses internally, and support is added for talking to the YFS VL
             server. With this, kAFS can do everything over IPv6 as well as
             IPv4 if it's talking to servers that support it.
      
         (3) Callback handling is overhauled to be generally passive rather
             than active. 'Callbacks' are promises by the server to tell us
             about data and metadata changes. Callbacks are now checked when
             we next touch an inode rather than actively going and looking for
             it where possible.
      
         (4) File access permit caching is overhauled to store the caching
             information per-inode rather than per-directory, shared over
             subordinate files. Whilst older AFS servers only allow ACLs on
             directories (shared to the files in that directory), newer AFS
             servers break that restriction.
      
             To improve memory usage and to make it easier to do mass-key
             removal, permit combinations are cached and shared.
      
         (5) Cell database management is overhauled to allow lighter locks to
             be used and to make cell records autonomous state machines that
             look after getting their own DNS records and cleaning themselves
             up, in particular preventing races in acquiring and relinquishing
             the fscache token for the cell.
      
         (6) Volume caching is overhauled. The afs_vlocation record is got rid
             of to simplify things and the superblock is now keyed on the cell
             and the numeric volume ID only. The volume record is tied to a
             superblock and normal superblock management is used to mediate
             the lifetime of the volume fscache token.
      
         (7) File server record caching is overhauled to make server records
             independent of cells and volumes. A server can be in multiple
             cells (in such a case, the administrator must make sure that the
             VL services for all cells correctly reflect the volumes shared
             between those cells).
      
             Server records are now indexed using the UUID of the server
             rather than the address since a server can have multiple
             addresses.
      
         (8) File server rotation is overhauled to handle VMOVED, VBUSY (and
             similar), VOFFLINE and VNOVOL indications and to handle rotation
             both of servers and addresses of those servers. The rotation will
             also wait and retry if the server says it is busy.
      
         (9) Data writeback is overhauled. Each inode no longer stores a list
             of modified sections tagged with the key that authorised it in
             favour of noting the modified region of a page in page->private
             and storing a list of keys that made modifications in the inode.
      
             This simplifies things and allows other keys to be used to
             actually write to the server if a key that made a modification
             becomes useless.
      
        (10) Writable mmap() is implemented. This allows a kernel to be built
             entirely on AFS.
      
        Note that Pre AFS-3.4 servers are no longer supported, though this can
        be added back if necessary (AFS-3.4 was released in 1998)"
      
      * tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits)
        afs: Protect call->state changes against signals
        afs: Trace page dirty/clean
        afs: Implement shared-writeable mmap
        afs: Get rid of the afs_writeback record
        afs: Introduce a file-private data record
        afs: Use a dynamic port if 7001 is in use
        afs: Fix directory read/modify race
        afs: Trace the sending of pages
        afs: Trace the initiation and completion of client calls
        afs: Fix documentation on # vs % prefix in mount source specification
        afs: Fix total-length calculation for multiple-page send
        afs: Only progress call state at end of Tx phase from rxrpc callback
        afs: Make use of the YFS service upgrade to fully support IPv6
        afs: Overhaul volume and server record caching and fileserver rotation
        afs: Move server rotation code into its own file
        afs: Add an address list concept
        afs: Overhaul cell database management
        afs: Overhaul permit caching
        afs: Overhaul the callback handling
        afs: Rename struct afs_call server member to cm_server
        ...
      487e2c9f
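Point (8) above describes rotating over both servers and the addresses of each server, with a wait and retry when a server reports that it is busy. The following Python sketch shows one way such a loop could be shaped. Everything named here (`rotate_call`, `ServerBusy`, `AddressUnreachable`, the `retries` and `backoff` parameters) is hypothetical, and the sketch does not model the VMOVED, VOFFLINE or VNOVOL handling.

```python
import time


class ServerBusy(Exception):
    """Raised by the call when the server asks the client to come back later."""


class AddressUnreachable(Exception):
    """Raised by the call when one address of a server does not respond."""


def rotate_call(servers, call, retries=3, backoff=0.0, sleep=time.sleep):
    """Try call(server_id, address) across servers and their addresses.

    servers is a list of (server_id, [address, ...]) pairs.
    """
    for _ in range(retries):
        saw_busy = False
        for server_id, addresses in servers:
            for addr in addresses:
                try:
                    return call(server_id, addr)
                except AddressUnreachable:
                    continue  # try the next address of the same server
                except ServerBusy:
                    saw_busy = True
                    break  # busy applies to the server, so move to the next one
        if not saw_busy:
            break  # nothing was merely busy, so waiting will not help
        sleep(backoff)  # wait before the next pass over the server list
    raise RuntimeError("no server could service the request")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.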
    • Merge tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b630a23a
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the bulk of pin control changes for the v4.15 kernel cycle:
      
        Core:
      
         - The pin control Kconfig entry PINCTRL is now turned into a
           menuconfig option. This obviously has the implication of making the
           subsystem menu visible in menuconfig. This is happening because of
           two things:
      
            (a) Intel have started to deploy and depend on pin controllers in
                a way that is affecting users directly. This happens on the
                highly integrated laptop chipsets named after geographical
                places: baytrail, broxton, cannonlake, cedarfork, cherryview,
                denverton, geminilake, lewisburg, merrifield, sunrisepoint...
                It started a while back and now it is ever more evident that
                this is crucial infrastructure for x86 laptops and not an
                embedded obscurity anymore. Users need to be aware.
      
            (b) Pin control expanders on I2C and SPI that are arch-agnostic.
                Currently Semtech SX150X and Microchip MCP28x08 but more are
                expected. Users will have to be able to configure these in
                directly for their set-up.
      
         - Just go and select GPIOLIB now that we made sure that GPIOLIB is a
           very vanilla subsystem. Do not depend on it, if we need it, select
           it.
      
         - Exposing the pin control subsystem in menuconfig uncovered a bunch
           of obscure bugs that are now hopefully fixed, all more or less
           pertaining to Blackfin.
      
         - Unified namespace for cross-calls between pin control and GPIO.
      
         - New support for clock skew/delay generic DT bindings and generic
           pin config options for this.
      
         - Minor documentation improvements.
      
        Various:
      
         - The Renesas SH-PFC pin controller has evolved a lot. It seems
           Renesas are churning out new SoCs by the minute.
      
         - A bunch of non-critical fixes for the Rockchip driver.
      
         - Improve the use of library functions instead of open coding.
      
         - Support the MCP28018 variant in the MCP28x08 driver.
      
         - Static constifying"
      
      * tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (91 commits)
        pinctrl: gemini: Fix missing pad descriptions
        pinctrl: Add some depends on HAS_IOMEM
        pinctrl: samsung/s3c24xx: add CONFIG_OF dependency
        pinctrl: gemini: Fix GMAC groups
        pinctrl: qcom: spmi-gpio: Add pmi8994 gpio support
        pinctrl: ti-iodelay: remove redundant unused variable dev
        pinctrl: max77620: Use common error handling code in max77620_pinconf_set()
        pinctrl: gemini: Implement clock skew/delay config
        pinctrl: gemini: Use generic DT parser
        pinctrl: Add skew-delay pin config and bindings
        pinctrl: armada-37xx: Add edge both type gpio irq support
        pinctrl: uniphier: remove eMMC hardware reset pin-mux
        pinctrl: rockchip: Add iomux-route switching support for rk3288
        pinctrl: intel: Add Intel Cedar Fork PCH pin controller support
        pinctrl: intel: Make offset to interrupt status register configurable
        pinctrl: sunxi: Enforce the strict mode by default
        pinctrl: sunxi: Disable strict mode for old pinctrl drivers
        pinctrl: sunxi: Introduce the strict flag
        pinctrl: sh-pfc: Save/restore registers for PSCI system suspend
        pinctrl: sh-pfc: r8a7796: Use generic IOCTRL register description
        ...
      b630a23a
    • Merge tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 9c7a867e
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
      
         - handle 32bit overflow in pwm_bl
      
         - remove redundant code/checks in tps65217_bl and ili922x
      
      * tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        backlight: ili922x: Remove redundant variable len
        backlight: tps65217_bl: Remove unnecessary default brightness check
        backlight: pwm_bl: Fix overflow condition
      9c7a867e
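The pwm_bl entry above concerns a 32-bit overflow. The commit text quoted here does not give the arithmetic, so the following Python sketch only illustrates the general class of bug under an assumed scale-then-divide duty cycle formula. The function names and the formula itself are assumptions, not the driver's code. Python integers do not overflow, so the 32-bit case is simulated with a mask.

```python
U32_MASK = 0xFFFFFFFF


def duty_cycle_32bit(brightness, period_ns, scale, lth=0):
    """Assumed formula with the intermediate product truncated to 32 bits."""
    product = (brightness * (period_ns - lth)) & U32_MASK  # wraps like a u32
    return lth + product // scale


def duty_cycle_wide(brightness, period_ns, scale, lth=0):
    """Same assumed formula with a wide intermediate product (no wrap)."""
    return lth + (brightness * (period_ns - lth)) // scale
```

With a large maximum brightness and a long period, for example `brightness=65535`, `period_ns=1_000_000` and `scale=65535`, the product exceeds 2^32, so the truncated version yields a duty cycle far below the full period while the wide version returns the full period.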
    • Merge tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · d3092e4e
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New drivers:
         - Add support for Cherry Trail Dollar Cove TI PMIC
         - Add support for Spreadtrum SC27xx series PMICs
      
        New device support:
         - Add regulator support to axp20x
      
        New functionality:
         - Add DT support; aspeed-scu, sc27xx-pmic
         - Add power saving support; rts5249
      
        Fix-ups:
         - DT clean-up/rework; tps65217, max77693, iproc-cdru, iproc-mhb, tps65218
         - Staticise/constify; stw481x
         - Use new succinct IRQ API; fsl-imx25-tsadc
         - Kconfig fix-ups; MFD_TPS65218
         - Identify SPI method; lpc_ich
         - Use managed resources (devm_*) calls; ssbi
         - Remove unused/obsolete code/documentation; mc13xxx
      
        Bug fixes:
         - Fix typo in MAINTAINERS
         - Fix error handling; mxs-lradc
         - Clean-up IRQs on .remove; fsl-imx25-tsadc"
      
      * tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (21 commits)
        dt-bindings: mfd: mc13xxx: Remove obsolete property
        mfd: axp20x: Add axp20x-regulator cell for AXP813
        mfd: Add Spreadtrum SC27xx series PMICs driver
        dt-bindings: mfd: Add Spreadtrum SC27xx PMIC documentation
        mfd: ssbi: Use devm_of_platform_populate()
        mfd: fsl-imx25: Clean up irq settings during removal
        mfd: mxs-lradc: Fix error handling in mxs_lradc_probe()
        mfd: lpc_ich: Avoton/Rangeley uses SPI_BYT method
        mfd: tps65218: Introduce dependency on CONFIG_OF
        mfd: tps65218: Correct the config description
        MAINTAINERS: Fix Dialog search term for watchdog binding file
        mfd: fsl-imx25: Set irq handler and data in one go
        mfd: rts5249: Add support for RTS5250S power saving
        ACPI / PMIC: Add opregion driver for Intel Dollar Cove TI PMIC
        mfd: Add support for Cherry Trail Dollar Cove TI PMIC
        syscon: dt-bindings: Add binding document for iProc MHB block
        syscon: dt-bindings: Add binding doc for Broadcom iProc CDRU
        mfd: max77693: Add muic of_compatible in mfd_cell
        mfd: stw481x: Make three arrays static const, reduces object code size
        mfd: tps65217: Introduce dependency on CONFIG_OF
        ...
      d3092e4e
    • Merge tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 2bf16b7a
      Linus Torvalds authored
      
      Pull char/misc updates from Greg KH:
       "Here is the big set of char/misc and other driver subsystem patches
        for 4.15-rc1.
      
        There are small changes all over here, hyperv driver updates, pcmcia
        driver updates, w1 driver updates, vme driver updates, nvmem driver
        updates, and lots of other little one-off driver updates as well. The
        shortlog has the full details.
      
        All of these have been in linux-next for quite a while with no
        reported issues"
      
      * tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
        VME: Return -EBUSY when DMA list in use
        w1: keep balance of mutex locks and refcnts
        MAINTAINERS: Update VME subsystem tree.
        nvmem: sunxi-sid: add support for A64/H5's SID controller
        nvmem: imx-ocotp: Update module description
        nvmem: imx-ocotp: Enable i.MX7D OTP write support
        nvmem: imx-ocotp: Add i.MX7D timing write clock setup support
        nvmem: imx-ocotp: Move i.MX6 write clock setup to dedicated function
        nvmem: imx-ocotp: Add support for banked OTP addressing
        nvmem: imx-ocotp: Pass parameters via a struct
        nvmem: imx-ocotp: Restrict OTP write to IMX6 processors
        nvmem: uniphier: add UniPhier eFuse driver
        dt-bindings: nvmem: add description for UniPhier eFuse
        nvmem: set nvmem->owner to nvmem->dev->driver->owner if unset
        nvmem: qfprom: fix different address space warnings of sparse
        nvmem: mtk-efuse: fix different address space warnings of sparse
        nvmem: mtk-efuse: use stack for nvmem_config instead of malloc'ing it
        nvmem: imx-iim: use stack for nvmem_config instead of malloc'ing it
        thunderbolt: tb: fix use after free in tb_activate_pcie_devices
        MAINTAINERS: Add git tree for Thunderbolt development
        ...
      2bf16b7a
    • Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · b9743042
      Linus Torvalds authored
      
      Pull driver core updates from Greg KH:
       "Here is the set of driver core / debugfs patches for 4.15-rc1.
      
        Not many here, mostly all are debugfs fixes to resolve some
        long-reported problems with files going away with references to them
        in userspace. There's also some SPDX cleanups for the debugfs code, as
        well as a few other minor driver core changes for issues reported by
        people.
      
        All of these have been in linux-next for a week or more with no
        reported issues"
      
      * tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Fix device link deferred probe
        debugfs: Remove redundant license text
        debugfs: add SPDX identifiers to all debugfs files
        debugfs: defer debugfs_fsdata allocation to first usage
        debugfs: call debugfs_real_fops() only after debugfs_file_get()
        debugfs: purge obsolete SRCU based removal protection
        IB/hfi1: convert to debugfs_file_get() and -put()
        debugfs: convert to debugfs_file_get() and -put()
        debugfs: debugfs_real_fops(): drop __must_hold sparse annotation
        debugfs: implement per-file removal protection
        debugfs: add support for more elaborate ->d_fsdata
        driver core: Move device_links_purge() after bus_remove_device()
        arch_topology: Fix section miss match warning due to free_raw_capacity()
        driver-core: pr_err() strings should end with newlines
      b9743042
    • Radim Krčmář's avatar
      Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · a6014f1a
      Radim Krčmář authored
      KVM: s390: fixes and improvements for 4.15
      
      - Some initial preparation patches for exitless interrupts and crypto
      - New capability for AIS migration
      - Fixes
      - merge of the sthyi tree from the base s390 team, which moves the sthyi
      out of KVM into a shared function also for non-KVM
      a6014f1a
    • Linus Torvalds's avatar
      Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux · e60e1ee6
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is the main drm pull request for v4.15.
      
        Core:
         - Atomic object lifetime fixes
         - Atomic iterator improvements
         - Sparse/smatch fixes
         - Legacy kms ioctls to be interruptible
         - EDID override improvements
         - fb/gem helper cleanups
         - Simple outreachy patches
         - Documentation improvements
         - Fix dma-buf rcu races
         - DRM mode object leasing for improving VR use cases.
         - vgaarb improvements for non-x86 platforms.
      
        New driver:
         - tve200: Faraday Technology TVE200 block.
      
           This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
           the StorLink SL3516 (later Cortina Systems CS3516) as well as the
           Grain Media GM8180.
      
        New bridges:
         - SiI9234 support
      
        New panels:
         - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
           LT089AC19000, Innolux AT043TN24
      
        i915:
         - Remove Coffeelake from alpha support
         - Cannonlake workarounds
         - Infoframe refactoring for DisplayPort
         - VBT updates
         - DisplayPort vswing/emph/buffer translation refactoring
         - CCS fixes
         - Restore GPU clock boost on missed vblanks
         - Scatter list updates for userptr allocations
         - Gen9+ transition watermarks
         - Display IPC (Isochronous Priority Control)
         - Private PAT management
         - GVT: improved error handling and pci config sanitizing
         - Execlist refactoring
         - Transparent Huge Page support
         - User defined priorities support
         - HuC/GuC firmware refactoring
         - DP MST fixes
         - eDP power sequencing fixes
         - Use RCU instead of stop_machine
         - PSR state tracking support
         - Eviction fixes
         - BDW DP aux channel timeout fixes
         - LSPCON fixes
         - Cannonlake PLL fixes
      
        amdgpu:
         - Per VM BO support
         - Powerplay cleanups
         - CI powerplay support
         - PASID mgr for kfd
         - SR-IOV fixes
         - initial GPU reset for vega10
         - Prime mmap support
         - TTM updates
         - Clock query interface for Raven
         - Fence to handle ioctl
         - UVD encode ring support on Polaris
         - Transparent huge page DMA support
         - Compute LRU pipe tweaks
         - BO flag to allow buffers to opt out of implicit sync
         - CTX priority setting API
         - VRAM lost infrastructure plumbing
      
        qxl:
         - fix flicker since atomic rework
      
        amdkfd:
         - Further improvements from internal AMD tree
         - Usermode events
         - Drop radeon support
      
        nouveau:
         - Pascal temperature sensor support
         - Improved BAR2 handling
         - MMU rework to support Pascal MMU
      
        exynos:
         - Improved HDMI/mixer support
         - HDMI audio interface support
      
        tegra:
         - Prep work for tegra186
         - Cleanup/fixes
      
        msm:
         - Preemption support for a5xx
         - Display fixes for 8x96 (snapdragon 820)
         - Async cursor plane fixes
         - FW loading rework
         - GPU debugging improvements
      
        vc4:
         - Prep for DSI panels
         - fix T-format tiling scanout
         - New madvise ioctl
      
        Rockchip:
         - LVDS support
      
        omapdrm:
         - omap4 HDMI CEC support
      
        etnaviv:
         - GPU performance counters groundwork
      
        sun4i:
         - refactor driver load + TCON backend
         - HDMI improvements
         - A31 support
         - Misc fixes
      
        udl:
         - Probe/EDID read fixes.
      
        tilcdc:
         - Misc fixes.
      
        pl111:
         - Support more variants
      
        adv7511:
         - Improve EDID handling.
         - HDMI CEC support
      
        sii8620:
         - Add remote control support"
      
      * tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
        drm/rockchip: analogix_dp: Use mutex rather than spinlock
        drm/mode_object: fix documentation for object lookups.
        drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
        drm/i915: Move init_clock_gating() back to where it was
        drm/i915: Prune the reservation shared fence array
        drm/i915: Idle the GPU before shinking everything
        drm/i915: Lock llist_del_first() vs llist_del_all()
        drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
        drm/i915: Disable lazy PPGTT page table optimization for vGPU
        drm/i915/execlists: Remove the priority "optimisation"
        drm/i915: Filter out spurious execlists context-switch interrupts
        drm/amdgpu: use irq-safe lock for kiq->ring_lock
        drm/amdgpu: bypass lru touch for KIQ ring submission
        drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
        drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
        drm/amd/powerplay: initialize a variable before using it
        drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
        drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
        drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
        drm/rockchip: add CONFIG_OF dependency for lvds
        ...
      e60e1ee6
    • Linus Torvalds's avatar
      Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 5d352e69
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - Documentation for digital TV (both kAPI and uAPI) is now in sync
         with the implementation (except for legacy/deprecated ioctls). This
         is a major step, as there was always a gap there
      
       - New sensor driver: imx274
      
       - New cec driver: cec-gpio
      
       - New platform drivers for Rockchip RGA and Tegra CEC
      
       - New RC driver: tango-ir
      
       - Several cleanups in the atomisp driver
      
       - Core improvements for RC, CEC, V4L2 async probing support and DVB
      
       - Lots of drivers cleanup, fixes and improvements.
      
      * tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits)
        dvb_frontend: don't use-after-free the frontend struct
        media: dib0700: fix invalid dvb_detach argument
        media: v4l2-ctrls: Don't validate BITMASK twice
        media: s5p-mfc: fix lockdep warning
        media: dvb-core: always call invoke_release() in fe_free()
        media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep
        media: au0828: make const array addr_list static
        media: cx88: make const arrays default_addr_list and pvr2000_addr_list static
        media: drxd: make const array fastIncrDecLUT static
        media: usb: fix spelling mistake: "synchronuously" -> "synchronously"
        media: ddbridge: fix build warnings
        media: av7110: avoid 2038 overflow in debug print
        media: Don't do DMA on stack for firmware upload in the AS102 driver
        media: v4l: async: fix unregister for implicitly registered sub-device notifiers
        media: v4l: async: fix return of unitialized variable ret
        media: imx274: fix missing return assignment from call to imx274_mode_regs
        media: camss-vfe: always initialize reg at vfe_set_xbar_cfg()
        media: atomisp: make function calls cleaner
        media: atomisp: get rid of storage_class.h
        media: atomisp: get rid of wrong stddef.h include
        ...
      5d352e69
    • Linus Torvalds's avatar
      Merge tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux · 93ea0eb7
      Linus Torvalds authored
      Pull leaking_addresses script updates from Tobin Harding:
       "Here are development patches for the leaking_addresses.pl script.
      
        Changes include:
      
         - add summary reporting to the script
      
         - add 'SigIgn' to false positives
      
         - add a file read timeout so the script doesn't block indefinitely
      
         - add infrastructure to enable multi-arch support and add support for ppc
      
         - add some exclude files/paths suggested by various people
      
         - code clean up and refactoring
      
         - overhaul command line options"
      
      * tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux:
        leaking_addresses: add SigIgn to false positives
        leaking_addresses: add timeout on file read
        leaking_addresses: add support for ppc64
        leaking_addresses: add summary reporting options
        leaking_addresses: add to exclude files/paths list
        leaking_addresses: fix comment string typo
        leaking_addresses: remove command line options
        leaking_addresses: remove dead/unused code
        leaking_addresses: use tabs instead of spaces
      93ea0eb7
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 7c225c69
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - a few misc bits
      
       - ocfs2 updates
      
       - almost all of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
        memory hotplug: fix comments when adding section
        mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
        mm: simplify nodemask printing
        mm,oom_reaper: remove pointless kthread_run() error check
        mm/page_ext.c: check if page_ext is not prepared
        writeback: remove unused function parameter
        mm: do not rely on preempt_count in print_vma_addr
        mm, sparse: do not swamp log with huge vmemmap allocation failures
        mm/hmm: remove redundant variable align_end
        mm/list_lru.c: mark expected switch fall-through
        mm/shmem.c: mark expected switch fall-through
        mm/page_alloc.c: broken deferred calculation
        mm: don't warn about allocations which stall for too long
        fs: fuse: account fuse_inode slab memory as reclaimable
        mm, page_alloc: fix potential false positive in __zone_watermark_ok
        mm: mlock: remove lru_add_drain_all()
        mm, sysctl: make NUMA stats configurable
        shmem: convert shmem_init_inodecache() to void
        Unify migrate_pages and move_pages access checks
        mm, pagevec: rename pagevec drained field
        ...
      7c225c69
    • Oscar Salvador's avatar
      mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP · 0cd842f9
      Oscar Salvador authored
      free_area_init_node() calls alloc_node_mem_map(), but this function does
      nothing unless we have CONFIG_FLAT_NODE_MEM_MAP.
      
      As a cleanup, we can move the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" within
      alloc_node_mem_map() out of the function, and define an empty
      alloc_node_mem_map() { } when CONFIG_FLAT_NODE_MEM_MAP is not present.
      
      This also moves the printk that lies within the "#ifdef
      CONFIG_FLAT_NODE_MEM_MAP" block from free_area_init_node() to
      alloc_node_mem_map(), getting rid of the "#ifdef
      CONFIG_FLAT_NODE_MEM_MAP" in free_area_init_node().
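
      The cleanup described above can be sketched in plain C. This is a
      minimal userspace illustration under stated assumptions, not the kernel
      code: struct node_data, its fields, and the printed message are
      hypothetical stand-ins. Only CONFIG_FLAT_NODE_MEM_MAP and the two
      function names come from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's per-node data. */
struct node_data {
    int node_id;
    unsigned long spanned_pages;
    void *mem_map;
};

#ifdef CONFIG_FLAT_NODE_MEM_MAP
/* Real variant: allocate the map and report it from inside the function. */
static void alloc_node_mem_map(struct node_data *pgdat)
{
    if (!pgdat->spanned_pages)
        return;
    if (!pgdat->mem_map)
        pgdat->mem_map = calloc(pgdat->spanned_pages, sizeof(long));
    printf("node %d: mem_map at %p\n", pgdat->node_id, pgdat->mem_map);
}
#else
/* Empty stub: the caller needs no #ifdef of its own. */
static void alloc_node_mem_map(struct node_data *pgdat)
{
    (void)pgdat;
}
#endif

/* The caller is identical in both configurations. */
static void free_area_init_node(struct node_data *pgdat)
{
    alloc_node_mem_map(pgdat);
}
```

      With the stub in place, the caller compiles unchanged whether or not
      the configuration macro is defined.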
      
      [akpm@linux-foundation.org: clean up the printk while we're there]
      Link: http://lkml.kernel.org/r/20171114111935.GA11758@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@techadventures.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0cd842f9
    • Michal Hocko's avatar
      mm: simplify nodemask printing · 0205f755
      Michal Hocko authored
      alloc_warn() and dump_header() have to explicitly handle NULL nodemask
      which forces both paths to use pr_cont.  We can do better.  printk
      already handles NULL pointers properly so all we need is to teach
      nodemask_pr_args to handle NULL nodemask carefully.  This allows
      simplification of both alloc_warn() and dump_header() and gets rid of
      pr_cont altogether.
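
      The idea of handling a NULL mask inside the formatting helper, so that
      callers can emit one complete line, can be sketched as follows. This is
      a userspace illustration only: struct nodemask (a single word here) and
      format_nodemask() are hypothetical names and do not reproduce the
      kernel's nodemask_pr_args macro.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical single-word node mask. */
struct nodemask {
    unsigned long bits;
};

/*
 * Format the mask into buf in a single call. The NULL case is handled
 * here, so callers do not need a second, continuation-style print.
 * Returns the number of characters snprintf() would have written.
 */
static int format_nodemask(char *buf, size_t len, const struct nodemask *mask)
{
    if (!mask)
        return snprintf(buf, len, "nodemask=(null)");
    return snprintf(buf, len, "nodemask=0x%lx", mask->bits);
}
```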
      
      This patch has been motivated by patch from Joe Perches
      
        http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com
      
      [akpm@linux-foundation.org: fix tile warning, per Arnd]
      Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Joe Perches <joe@perches.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0205f755
    • Tetsuo Handa's avatar
      mm,oom_reaper: remove pointless kthread_run() error check · c50842c8
      Tetsuo Handa authored
      Since oom_init() is called before userspace processes start, memory
      allocation failure for creating the OOM reaper kernel thread will let
      the OOM killer call panic() rather than wake up the OOM reaper.
      
      Link: http://lkml.kernel.org/r/1510137800-4602-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c50842c8
    • Jaewon Kim's avatar
      mm/page_ext.c: check if page_ext is not prepared · e492080e
      Jaewon Kim authored
      online_page_ext() and page_ext_init() allocate page_ext for each
      section, but they do not allocate if the first PFN is !pfn_present(pfn)
      or !pfn_valid(pfn).  In that case section->page_ext remains NULL.
      lookup_page_ext checks for NULL only if CONFIG_DEBUG_VM is enabled.  For a
      valid PFN, __set_page_owner will try to get page_ext through
      lookup_page_ext.  Without CONFIG_DEBUG_VM, lookup_page_ext will treat the
      NULL pointer as base address 0, which results in an invalid address access.
      
      This is the panic example when PFN 0x100000 is not valid but PFN
      0x13FC00 is being used for page_ext.  section->page_ext is NULL,
      get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
      0x13FC00.
      
      To avoid this panic, the CONFIG_DEBUG_VM guard should be removed so that
      page_ext is checked for NULL at all times.
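
      The address arithmetic behind the numbers above can be checked with a
      small userspace sketch. The 24-byte entry size is an assumption
      inferred from the quoted values (0x1DFA000 / 0x13FC00 = 24), and the
      struct and function bodies below are simplified stand-ins rather than
      the kernel implementation.

```c
#include <stdint.h>

/* Assumed entry size, inferred from 0x1DFA000 / 0x13FC00 = 24. */
#define PAGE_EXT_SIZE 24u

/* Simplified section: a base of 0 models section->page_ext == NULL. */
struct section {
    uintptr_t page_ext_base;
};

/* base + size * index, as described in the commit message. */
static uintptr_t get_entry(uintptr_t base, uint64_t pfn)
{
    return base + (uintptr_t)(PAGE_EXT_SIZE * pfn);
}

/* Unconditional NULL check: returns 0 when the section has no page_ext. */
static uintptr_t lookup_page_ext(const struct section *s, uint64_t pfn)
{
    if (!s->page_ext_base)
        return 0;
    return get_entry(s->page_ext_base, pfn);
}
```

      Without the check, get_entry(0, 0x13FC00) evaluates to 0x1DFA000, which
      matches the bogus address quoted above. With the check, the caller gets
      0 and can bail out.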
      
        Unable to handle kernel paging request at virtual address 01dfa014
        ------------[ cut here ]------------
        Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
        Internal error: Oops: 96000045 [#1] PREEMPT SMP
        Modules linked in:
        PC is at __set_page_owner+0x48/0x78
        LR is at __set_page_owner+0x44/0x78
          __set_page_owner+0x48/0x78
          get_page_from_freelist+0x880/0x8e8
          __alloc_pages_nodemask+0x14c/0xc48
          __do_page_cache_readahead+0xdc/0x264
          filemap_fault+0x2ac/0x550
          ext4_filemap_fault+0x3c/0x58
          __do_fault+0x80/0x120
          handle_mm_fault+0x704/0xbb0
          do_page_fault+0x2e8/0x394
          do_mem_abort+0x88/0x124
      
      Pre-4.7 kernels also need commit f86e4271 ("mm: check the return
      value of lookup_page_ext for all call sites").
      
      Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
      Fixes: eefa864b ("mm/page_ext: resurrect struct page extending code for debugging")
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: <stable@vger.kernel.org>	[depends on f86e4271, see above]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e492080e
    • Wang Long's avatar
      writeback: remove unused function parameter · 2bce774e
      Wang Long authored
      The parameter `struct bdi_writeback *wb` is not used in the
      function body.  Remove it.
      
      Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com
      Signed-off-by: Wang Long <wanglong19@meituan.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2bce774e
    • Michal Hocko's avatar
      mm: do not rely on preempt_count in print_vma_addr · 0a7f682d
      Michal Hocko authored
      The preempt count check on print_vma_addr has been added by commit
      e8bff74a ("x86: fix "BUG: sleeping function called from invalid
      context" in print_vma_addr()") and it relied on the elevated preempt
      count from preempt_conditional_sti because preempt_count check doesn't
      work on non preemptive kernels by default.
      
      The code has evolved though, and commit d99e1bd1 ("x86/entry/traps:
      Refactor preemption and interrupt flag handling") replaced
      preempt_conditional_sti with an explicit preempt_disable, which is a
      no-op on !PREEMPT, so the check in print_vma_addr is broken.
      
      Fix the issue by using trylock on mmap_sem rather than checking the
      preempt count.  The allocation we are relying on has to be GFP_NOWAIT as
      well.  There is a chance that we won't dump the vma state if the lock is
      contended or memory is short, but this is an acceptable outcome and much
      less fragile than the broken preemption check or tricks around it.
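
      The "try the lock, otherwise skip the dump" approach can be sketched
      in userspace C. This is an illustration under stated assumptions: a C11
      atomic_flag stands in for mmap_sem (which in the kernel is a
      reader/writer semaphore), and map_trylock(), map_unlock() and
      print_state_best_effort() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the lock protecting the state we want to dump. */
static atomic_flag map_lock = ATOMIC_FLAG_INIT;

/* Non-blocking acquire: true if we got the lock, false if it is held. */
static bool map_trylock(void)
{
    return !atomic_flag_test_and_set(&map_lock);
}

static void map_unlock(void)
{
    atomic_flag_clear(&map_lock);
}

/*
 * Best-effort dump: never sleeps. Returns true if the state was printed,
 * false if the lock was contended and the dump was skipped.
 */
static bool print_state_best_effort(const char *what)
{
    if (!map_trylock())
        return false;
    printf("state: %s\n", what);
    map_unlock();
    return true;
}
```

      The caller loses some diagnostic output under contention, which is the
      trade-off the commit message accepts.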
      
      Link: http://lkml.kernel.org/r/20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz
      Fixes: d99e1bd1 ("x86/entry/traps: Refactor preemption and interrupt flag handling")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Yang Shi <yang.s@alibaba-inc.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a7f682d
    • Michal Hocko's avatar
      mm, sparse: do not swamp log with huge vmemmap allocation failures · fcdaf842
      Michal Hocko authored
      While doing memory hotplug tests under heavy memory pressure we
      noticed too many page allocation failures when allocating a vmemmap
      memmap backed by huge pages:
      
        kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
        [...]
        Call Trace:
          dump_trace+0x59/0x310
          show_stack_log_lvl+0xea/0x170
          show_stack+0x21/0x40
          dump_stack+0x5c/0x7c
          warn_alloc_failed+0xe2/0x150
          __alloc_pages_nodemask+0x3ed/0xb20
          alloc_pages_current+0x7f/0x100
          vmemmap_alloc_block+0x79/0xb6
          __vmemmap_alloc_block_buf+0x136/0x145
          vmemmap_populate+0xd2/0x2b9
          sparse_mem_map_populate+0x23/0x30
          sparse_add_one_section+0x68/0x18e
          __add_pages+0x10a/0x1d0
          arch_add_memory+0x4a/0xc0
          add_memory_resource+0x89/0x160
          add_memory+0x6d/0xd0
          acpi_memory_device_add+0x181/0x251
          acpi_bus_attach+0xfd/0x19b
          acpi_bus_scan+0x59/0x69
          acpi_device_hotplug+0xd2/0x41f
          acpi_hotplug_work_fn+0x1a/0x23
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xbd/0xe0
          ret_from_fork+0x3f/0x70
      
      and we do see many of those because essentially every allocation fails
      for each memory section.  This is an excessive way to tell the user that
      there is nothing to really worry about because we do have a fallback
      mechanism to use base pages.  The only downside might be a performance
      degradation due to TLB pressure.
      
      This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
      explicitly once on the first allocation failure.  This will reduce the
      noise in the kernel log considerably, while we still have an indication
      that performance might be impacted.
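
      The "suppress the per-failure warning, warn once explicitly" behaviour
      can be sketched in userspace C. This is a simplified illustration:
      alloc_block(), its simulate_failure parameter, and the
      warnings_printed counter are hypothetical and exist only to make the
      behaviour observable.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how many warnings were actually emitted (for illustration). */
static int warnings_printed;

/*
 * Hypothetical allocator wrapper. A failed allocation is reported only
 * the first time; later failures stay silent so the log is not flooded.
 */
static void *alloc_block(size_t size, bool simulate_failure)
{
    static bool warned;
    void *p = simulate_failure ? NULL : malloc(size);

    if (!p && !warned) {
        warned = true;
        warnings_printed++;
        fprintf(stderr, "alloc failure: size %zu (reported once)\n", size);
    }
    return p;
}
```

      Callers still see NULL on every failure and can fall back to a smaller
      allocation; only the logging is rate-limited to a single message.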
      
      [mhocko@kernel.org: forgot to git add the follow up fix]
        Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fcdaf842
    • Colin Ian King's avatar
      mm/hmm: remove redundant variable align_end · fec11bc0
      Colin Ian King authored
      Variable align_end is assigned a value but it is never read, so the
      variable is redundant and can be removed.  Cleans up the clang warning:
      Value stored to 'align_end' is never read
      
      Link: http://lkml.kernel.org/r/20171017143837.23207-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fec11bc0
    • Gustavo A. R. Silva's avatar
      mm/list_lru.c: mark expected switch fall-through · 5b568acc
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
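
      A minimal example of an annotated fall-through, assuming the
      comment-style marker that GCC recognises by default with
      -Wimplicit-fallthrough. The function score() and its values are
      hypothetical and unrelated to the code touched by this commit.

```c
/* Hypothetical example: level 2 includes everything level 1 does. */
static int score(int level)
{
    int total = 0;

    switch (level) {
    case 2:
        total += 10;
        /* fall through */
    case 1:
        total += 1;
        break;
    default:
        break;
    }
    return total;
}
```

      The comment documents that the missing break after case 2 is
      intentional, so the compiler warning stays reserved for accidental
      fall-throughs.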
      
      Link: http://lkml.kernel.org/r/20171020190754.GA24332@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b568acc
    • Gustavo A. R. Silva's avatar
      mm/shmem.c: mark expected switch fall-through · c8402871
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Link: http://lkml.kernel.org/r/20171020191058.GA24427@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8402871
    • Pavel Tatashin's avatar
      mm/page_alloc.c: broken deferred calculation · d135e575
      Pavel Tatashin authored
      In reset_deferred_meminit() we determine the number of pages that must
      not be deferred.  We initialize pages for at least 2G of memory, but also
      pages for reserved memory in this node.

      The reserved memory is determined in this function:
      memblock_reserved_memory_within(), which operates over physical
      addresses, and returns size in bytes.  However, reset_deferred_meminit()
      assumes that this function operates with pfns, and returns a page
      count.

      The result is that in the best case the machine boots slower than
      expected due to initializing more pages than needed in a single thread,
      and in the worst case it panics because fewer pages than needed are
      initialized early.
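
      The unit mismatch can be illustrated with a small userspace sketch.
      It assumes 4 KiB pages, and the PHYS_PFN macro here is a local
      definition modelled on the kernel helper of the same name. The two
      static_init_pages_*() functions are hypothetical and only show the
      effect of adding a byte count to a page count.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Convert a size in bytes to a number of pages (local definition). */
#define PHYS_PFN(bytes) ((uint64_t)(bytes) >> PAGE_SHIFT)

/* Buggy: treats a size in bytes as if it were a page count. */
static uint64_t static_init_pages_buggy(uint64_t base_pages,
                                        uint64_t reserved_bytes)
{
    return base_pages + reserved_bytes;
}

/* Fixed: converts bytes to pages before adding. */
static uint64_t static_init_pages_fixed(uint64_t base_pages,
                                        uint64_t reserved_bytes)
{
    return base_pages + PHYS_PFN(reserved_bytes);
}
```

      For 2 GiB of base memory (524288 pages) and 1 MiB reserved, the fixed
      version yields 524544 pages, while the buggy one yields 1572864, about
      three times as many pages to initialize early.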
      
      Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
      Fixes: 864b9a39 ("mm: consider memblock reservations for deferred memory initialization sizing")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d135e575
    • Tetsuo Handa's avatar
      mm: don't warn about allocations which stall for too long · 400e2249
      Tetsuo Handa authored
      Commit 63f53dea ("mm: warn about allocations which stall for too
      long") was a great step towards reducing the possibility of silent
      hangs caused by memory allocation stalls.  But this commit reverts it,
      because with the current implementation of printk() it is possible to
      trigger OOM lockups and/or soft lockups when many threads concurrently
      call warn_alloc() (in order to warn about memory allocation stalls), and
      because the synchronous warning approach makes it difficult to obtain
      useful information.
      
      Current printk() implementation flushes all pending logs using the
      context of a thread which called console_unlock().  printk() should be
      able to flush all pending logs eventually unless somebody continues
      appending to printk() buffer.
      
      Since warn_alloc() started appending to the printk() buffer while
      waiting for oom_kill_process() to make forward progress, at a time when
      oom_kill_process() is itself processing pending logs, it became possible
      for warn_alloc() to force oom_kill_process() to loop inside printk().  As
      a result, warn_alloc() significantly increased the possibility of
      preventing oom_kill_process() from making forward progress.
      
      ---------- Pseudo code start ----------
      Before warn_alloc() was introduced:
      
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          }
          goto retry;
      
      After warn_alloc() was introduced:
      
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          } else if (waited_for_10seconds()) {
            atomic_inc(&printk_pending_logs);
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      Although waited_for_10seconds() becomes true only once per 10 seconds,
      an unbounded number of threads can call waited_for_10seconds() at the
      same time.  Also, since threads calling waited_for_10seconds() keep
      running an almost busy loop, the thread doing print_one_log() gets
      little CPU time.  Therefore, this situation can be simplified to
      
      ---------- Pseudo code start ----------
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          } else {
            atomic_inc(&printk_pending_logs);
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      when printk() is called faster than print_one_log() can process a log.
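
      The effect of the simplified loop can be modelled with a deterministic
      toy simulation. This is an assumption-laden sketch, not kernel code:
      pending_after() is a hypothetical function in which the lock holder
      prints exactly one log per step while stalled threads together append a
      fixed number of logs per step.

```c
/*
 * Toy model of the simplified pseudo code: returns the number of pending
 * logs after at most `steps` iterations. The loop ends early once the
 * backlog is drained, i.e. when the lock holder could make progress.
 */
static long pending_after(long steps, long appended_per_step, long initial)
{
    long pending = initial;

    for (long i = 0; i < steps && pending > 0; i++) {
        pending -= 1;                 /* print_one_log() by the lock holder */
        pending += appended_per_step; /* warnings appended by stalled threads */
    }
    return pending;
}
```

      With no appenders the backlog drains; with one appended log per step
      it never shrinks; with two it grows without bound, so the lock holder
      never reaches the point where it sends SIGKILL.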
      
      One possible mitigation would be to introduce a new lock to make sure
      that no other series of printk() calls (either oom_kill_process() or
      warn_alloc()) can append to the printk() buffer while one series of
      printk() calls (either oom_kill_process() or warn_alloc()) is already
      in progress.

      Such serialization would also help in obtaining kernel messages in
      readable form.
      
      ---------- Pseudo code start ----------
        retry:
          if (mutex_trylock(&oom_lock)) {
            mutex_lock(&oom_printk_lock);
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_printk_lock);
            mutex_unlock(&oom_lock);
          } else {
            if (mutex_trylock(&oom_printk_lock)) {
              atomic_inc(&printk_pending_logs);
              mutex_unlock(&oom_printk_lock);
            }
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      But this commit does not go in that direction, because we don't want to
      introduce a new lock dependency, and we are unlikely to be able to
      obtain useful information even if we serialized oom_kill_process() and
      warn_alloc().
      
      The synchronous approach is prone to unexpected results (e.g. too late
      [1], too frequent [2], overlooked [3]).  As far as I know, warn_alloc()
      never helped with providing information other than "something is going
      wrong".  I want to consider an asynchronous approach which can obtain
      information during stalls from possibly relevant threads (e.g. the owner
      of oom_lock and kswapd-like threads) and serve as a trigger for actions
      (e.g. turn on/off tracepoints, ask the libvirt daemon to take a memory
      dump of a stalling KVM guest for diagnostic purposes).
      
      This commit temporarily loses the ability to report e.g. an OOM lockup
      caused by being unable to invoke the OOM killer for a !__GFP_FS
      allocation request.  But an asynchronous approach will be able to detect
      such a situation and emit a warning.  Thus, let's remove warn_alloc().
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
      [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
      [3] commit db73ee0d ("mm, vmscan: do not loop on too_many_isolated for ever")
      
      Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
      Reported-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      400e2249
    • Johannes Weiner's avatar
      fs: fuse: account fuse_inode slab memory as reclaimable · df206988
      Johannes Weiner authored
      Fuse inodes are currently included in the unreclaimable slab counts -
      SUnreclaim in /proc/meminfo, slab_unreclaimable in /proc/vmstat and the
      per-cgroup memory.stat.  But they are reclaimable just like other
      filesystems' inodes, and /proc/sys/vm/drop_caches frees them easily.
      
      Mark the slab cache reclaimable.
      
      Link: http://lkml.kernel.org/r/20171102202727.12539-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df206988
    • Vlastimil Babka's avatar
      mm, page_alloc: fix potential false positive in __zone_watermark_ok · b050e376
      Vlastimil Babka authored
      Since commit 97a16fc8 ("mm, page_alloc: only enforce watermarks for
      order-0 allocations"), the __zone_watermark_ok() check for high-order
      allocations shortcuts the per-migratetype free list checks for
      ALLOC_HARDER allocations, and returns true as long as there's a free
      page of any migratetype.  The intention is that ALLOC_HARDER can allocate
      from MIGRATE_HIGHATOMIC free lists, while normal allocations can't.
      
      However, as a side effect, the watermark check will then also return
      true when there are pages only on the MIGRATE_ISOLATE list, or (prior to
      CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list.  Since the
      allocation cannot actually obtain isolated pages, and might not be able
      to obtain CMA pages, this can result in a false positive.
      
      The condition should be rare and perhaps the outcome is not a fatal one.
      Still, it's better if the watermark check is correct.  There also
      shouldn't be a performance tradeoff here.
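
      Illustration (not part of the commit): a minimal user-space sketch of
      the false positive described above.  The types, the constants and the
      two high_order_ok_*() functions are hypothetical simplifications, not
      the kernel's code; CMA handling is left out for brevity.  The "old"
      variant returns true for ALLOC_HARDER as soon as any free block of a
      sufficient order exists, the "new" variant only counts migratetypes
      the request could actually allocate from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's migratetypes. */
enum model_migratetype {
    MT_UNMOVABLE,
    MT_MOVABLE,
    MT_RECLAIMABLE,
    MT_PCPTYPES,                  /* number of ordinary types above */
    MT_HIGHATOMIC = MT_PCPTYPES,
    MT_ISOLATE,
    MT_COUNT
};

#define MODEL_MAX_ORDER 4

struct model_free_area {
    unsigned long nr_free;          /* free blocks of this order, all types */
    unsigned long count[MT_COUNT];  /* free blocks per migratetype */
};

/* Old behaviour: ALLOC_HARDER shortcuts the per-migratetype checks. */
bool high_order_ok_old(const struct model_free_area *areas,
                       unsigned int order, bool alloc_harder)
{
    assert(order < MODEL_MAX_ORDER);
    for (unsigned int o = order; o < MODEL_MAX_ORDER; o++) {
        const struct model_free_area *area = &areas[o];

        if (!area->nr_free)
            continue;
        if (alloc_harder)
            return true;            /* isolated pages count too */
        for (int mt = 0; mt < MT_PCPTYPES; mt++)
            if (area->count[mt])
                return true;
    }
    return false;
}

/* New behaviour: only count lists the request can really use. */
bool high_order_ok_new(const struct model_free_area *areas,
                       unsigned int order, bool alloc_harder)
{
    assert(order < MODEL_MAX_ORDER);
    for (unsigned int o = order; o < MODEL_MAX_ORDER; o++) {
        const struct model_free_area *area = &areas[o];

        if (!area->nr_free)
            continue;
        for (int mt = 0; mt < MT_PCPTYPES; mt++)
            if (area->count[mt])
                return true;
        if (alloc_harder && area->count[MT_HIGHATOMIC])
            return true;
    }
    return false;
}
```

      With a single free block that sits on the isolate list, the old
      variant reports success for an ALLOC_HARDER request while the new one
      does not.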
      
      Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz
      Fixes: 97a16fc8 ("mm, page_alloc: only enforce watermarks for order-0 allocations")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b050e376
    • Shakeel Butt's avatar
      mm: mlock: remove lru_add_drain_all() · 72b03fcd
      Shakeel Butt authored
      lru_add_drain_all() is not required by mlock() and it will drain
      everything that has been cached at the time mlock is called.  And that
      is not really related to the memory which will be faulted in (and
      cached) and mlocked by the syscall itself.
      
      If anything lru_add_drain_all() should be called _after_ pages have been
      mlocked and faulted in but even that is not strictly needed because
      those pages would get to the appropriate LRUs lazily during the reclaim
      path.  Moreover follow_page_pte (gup) will drain the local pcp LRU
      cache.
      
      On larger machines the overhead of lru_add_drain_all() in mlock() can be
      significant when mlocking data already in memory.  We have observed high
      latency in mlock() due to lru_add_drain_all() when users were
      mlocking in-memory tmpfs files.
      
      [mhocko@suse.com: changelog fix]
      Link: http://lkml.kernel.org/r/20171019222507.2894-1-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72b03fcd
    • Kemi Wang's avatar
      mm, sysctl: make NUMA stats configurable · 4518085e
      Kemi Wang authored
      This is the second step, which introduces a tunable interface that
      makes NUMA stats configurable in order to optimize zone_statistics(),
      as suggested by Dave Hansen and Ying Huang.
      
      =========================================================================
      
      When page allocation performance becomes a bottleneck and you can
      tolerate some possible tool breakage and decreased numa counter
      precision, you can do:
      
      	echo 0 > /proc/sys/vm/numa_stat
      
      In this case, numa counter update is ignored.  We can see about
      *4.8%*(185->176) drop of cpu cycles per single page allocation and
      reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315)
      drop of cpu cycles per single page allocation and reclaim on Jesper's
      page_bench03 (88 threads) running on a 2-Socket Broadwell-based server
      (88 threads, 126G memory).
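
      Illustration (not part of the commit): the percentages quoted above
      follow directly from the cycle counts.  The helper name percent_drop()
      is hypothetical; it only redoes the arithmetic, giving roughly 4.86%
      for 185 -> 176 and roughly 8.16% for 343 -> 315, which the text
      quotes truncated to one decimal place.

```c
#include <assert.h>

/* Relative drop, in percent, from `before` to `after` cycles per page. */
double percent_drop(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```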
      
      Benchmark link provided by Jesper D Brouer (increase loop times to
      10000000):
      
        https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
      
      =========================================================================
      
      When page allocation performance is not a bottleneck and you want all
      tooling to work, you can do:
      
      	echo 1 > /proc/sys/vm/numa_stat
      
      This is the system default setting.
      
      Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka
      for comments to help improve the original patch.
      
      [keescook@chromium.org: make sure mutex is a global static]
        Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast
      Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com
      Signed-off-by: Kemi Wang <kemi.wang@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Suggested-by: Ying Huang <ying.huang@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4518085e
    • weiping zhang's avatar
      shmem: convert shmem_init_inodecache() to void · 9a8ec03e
      weiping zhang authored
      shmem_inode_cachep is created with the SLAB_PANIC flag, so
      shmem_init_inodecache() never returns non-zero.  Convert this
      function to return void.
      
      Link: http://lkml.kernel.org/r/20170909124542.GA35224@bogon.didichuxing.com
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a8ec03e
    • Otto Ebeling's avatar
      Unify migrate_pages and move_pages access checks · 31367466
      Otto Ebeling authored
      Commit 197e7e52 ("Sanitize 'move_pages()' permission checks") fixed
      a security issue I reported in the move_pages syscall, and made it so
      that you can't act on set-uid processes unless you have the
      CAP_SYS_PTRACE capability.
      
      Unify the access check logic of migrate_pages to match the new behavior
      of move_pages.  We discussed this a bit on the security@ list and
      thought it would be good for consistency even though there's no evident
      security impact.  The NUMA node access checks are left intact and
      require CAP_SYS_NICE as before.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1710011830320.6333@lakka.kapsi.fi
      Signed-off-by: Otto Ebeling <otto.ebeling@iki.fi>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31367466
    • Mel Gorman's avatar
      mm, pagevec: rename pagevec drained field · 7f0b5fb9
      Mel Gorman authored
      According to Vlastimil Babka, the drained field in pagevec is
      potentially misleading because it might be interpreted as draining this
      pagevec instead of the percpu lru pagevecs.  Rename the field for
      clarity.
      
      Link: http://lkml.kernel.org/r/20171019093346.ylahzdpzmoriyf4v@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f0b5fb9
    • Vlastimil Babka's avatar
      mm, page_alloc: simplify list handling in rmqueue_bulk() · 0fac3ba5
      Vlastimil Babka authored
      rmqueue_bulk() fills an empty pcplist with pages from the free list.  It
      tries to preserve increasing order by pfn to the caller, because it
      leads to better performance with some I/O controllers, as explained in
      commit e084b2d9 ("page-allocator: preserve PFN ordering when
      __GFP_COLD is set").
      
      To preserve the order, it's sufficient to add pages to the tail of the
      list as they are retrieved.  The current code instead adds to the head
      of the list, but then updates the list head pointer to the last added
      page, in each step.  This does result in the same order, but is
      needlessly confusing and potentially wasteful, with no apparent benefit.
      This patch simplifies the code and adjusts the comment accordingly.
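
      Illustration (not part of the commit): a small user-space sketch of
      why both strategies yield the same order.  The list helpers and
      struct model_page are hypothetical stand-ins modelled loosely on a
      circular doubly linked list, not the kernel's list API.  fill_old()
      inserts each page after a moving position and then advances that
      position to the new page; fill_new() simply appends at the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list (hypothetical stand-in). */
struct list_node {
    struct list_node *prev, *next;
};

struct model_page {
    struct list_node lru;
    unsigned long pfn;
};

void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Link n between two adjacent nodes. */
void list_insert_between(struct list_node *n, struct list_node *prev,
                         struct list_node *next)
{
    assert(prev->next == next && next->prev == prev);
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Old style: insert after a moving position, then advance the position. */
void fill_old(struct list_node *head, struct model_page *pages, size_t n)
{
    struct list_node *pos = head;

    for (size_t i = 0; i < n; i++) {
        list_insert_between(&pages[i].lru, pos, pos->next);
        pos = &pages[i].lru;
    }
}

/* New style: always append at the tail of the list. */
void fill_new(struct list_node *head, struct model_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        list_insert_between(&pages[i].lru, head->prev, head);
}

/* Copy the pfns in list order into out[]; returns how many were copied. */
size_t list_pfns(const struct list_node *head, unsigned long *out, size_t max)
{
    size_t k = 0;

    for (const struct list_node *p = head->next; p != head && k < max;
         p = p->next) {
        const struct model_page *page = (const struct model_page *)
            ((const char *)p - offsetof(struct model_page, lru));
        out[k++] = page->pfn;
    }
    return k;
}
```

      Filling two empty lists with pages of increasing pfn, one through
      each function, leaves both lists in increasing pfn order; the tail
      append just gets there without touching a position pointer in every
      step.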
      
      Link: http://lkml.kernel.org/r/f6505442-98a9-12e4-b2cd-0fa83874c159@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0fac3ba5
    • Mel Gorman's avatar
      mm: remove __GFP_COLD · 453f85d4
      Mel Gorman authored
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Judging from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter, which simplifies a number of paths in the page
      allocator.
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no significant difference, which is not surprising
      given that the __GFP_COLD branches are a minuscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      453f85d4
    • Mel Gorman's avatar
      mm: remove cold parameter from free_hot_cold_page* · 2d4894b5
      Mel Gorman authored
      Most callers of free_hot_cold_page claim the pages being released
      are cache hot.  The exception is the page reclaim paths, where it is
      likely that enough pages will be freed in the near future that the
      per-cpu lists are going to be recycled and the cache hotness information
      is lost.  As no one really cares about the hotness of pages being
      released to the allocator, just ditch the parameter.
      
      The APIs are renamed to indicate that it's no longer about hot/cold
      pages.  It should also be less confusing as there are subtle differences
      between them.  __free_pages drops a reference and frees a page when the
      refcount reaches zero.  free_hot_cold_page handled pages whose refcount
      was already zero which is non-obvious from the name.  free_unref_page
      should be more obvious.
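
      Illustration (not part of the commit): a user-space sketch of the
      semantic difference described above.  struct model_page and both
      model_*() functions are hypothetical and only mimic the contract in
      the text: one entry point drops a reference and frees on zero, the
      other expects the refcount to be zero already.

```c
#include <assert.h>
#include <stdbool.h>

struct model_page {
    int refcount;
    bool freed;
};

/* Models free_unref_page(): the caller guarantees refcount is already 0. */
void model_free_unref_page(struct model_page *page)
{
    assert(page->refcount == 0);
    page->freed = true;
}

/* Models __free_pages(): drop one reference, free when it reaches zero. */
void model_free_pages(struct model_page *page)
{
    assert(page->refcount > 0);
    if (--page->refcount == 0)
        model_free_unref_page(page);
}
```

      A page with two references survives the first model_free_pages()
      call and is freed by the second, while model_free_unref_page() is
      only valid once no references remain.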
      
      No performance impact is expected as the overhead is marginal.  The
      parameter is removed simply because it is a bit stupid to have a useless
      parameter copied everywhere.
      
      [mgorman@techsingularity.net: add pages to head, not tail]
        Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
      Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d4894b5