1. 01 Feb, 2018 40 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk · ab486bc9
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Add a console_msg_format command line option:
      
           The value "default" keeps the old "[timestamp] text\n" format. The
           value "syslog" selects the syslog-like "<log level>[timestamp]
           text" format.
      
           This feature was requested by people doing regression tests, for
           example, 0day robot. They want to have both filtered and full logs
           at hand.
      
       - Reduce the risk of softlockup:
      
           Pass the console owner in a busy loop.
      
           This is a new approach to the old problem. It was first proposed by
           Steven Rostedt at Kernel Summit 2017. It marks a context in which
           the console_lock owner calls console drivers and cannot sleep.
           On the other side, printk() callers could detect this state and use
           a busy wait instead of a simple console_trylock(). Finally, the
           console_lock owner checks if there is a busy waiter at the end of
           the special context and eventually passes the console_lock to the
           waiter.
      
           The hand-off works surprisingly well and helps in many situations.
           Well, there is still a possibility of the softlockup, for example,
           when the flood of messages stops and the last owner still has too
           much to flush.
      
           An increasing number of people are having problems with
           printk-related softlockups. We might eventually need a better
           solution. Anyway, this looks like a good start and a promising
           direction.
      
       - Do not allow scheduling in console_unlock() when called from printk():
      
           This reverts an older controversial commit. The reschedule helped
           to avoid softlockups. But it also slowed down the console output.
           This patch is obsoleted by the new console waiter logic described
           above. In fact, the reschedule made the hand-off less effective.
      
       - Deprecate "%pf" and "%pF" format specifiers:
      
           It was needed on ia64, ppc64 and parisc64 to dereference function
           descriptors and show the real function address. It is done
           transparently by the "%ps" and "%pS" format specifiers now.
      
           Sergey Senozhatsky found that all the function descriptors were in
           a special elf section and could be easily detected.
      
       - Remove print_symbol() API:
      
           It has been obsoleted by the "%pS" format specifier, and this change
           helped to remove a few continuation lines and a less intuitive old API.
      
       - Remove redundant memsets:
      
           Sergey removed unnecessary memset when processing printk.devkmsg
           command line option.
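
The two console_msg_format values described in the first item above can be illustrated with a small userspace sketch. It is not the kernel implementation: the function name, the enum, and the exact timestamp layout are assumptions made for the example, and only the "[timestamp] text" versus "<log level>[timestamp] text" shapes come from the pull message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration only; not kernel identifiers. */
enum msg_format { MSG_FORMAT_DEFAULT, MSG_FORMAT_SYSLOG };

/*
 * Format one console line into buf.
 *   default: "[timestamp] text\n"
 *   syslog:  "<level>[timestamp] text\n"
 * The "%5lu.%06lu" timestamp layout is an assumption of this sketch.
 * Returns what snprintf() returns.
 */
static int format_console_line(char *buf, size_t size, enum msg_format fmt,
                               unsigned int level, unsigned long sec,
                               unsigned long usec, const char *text)
{
    assert(buf != NULL && text != NULL);

    if (fmt == MSG_FORMAT_SYSLOG)
        return snprintf(buf, size, "<%u>[%5lu.%06lu] %s\n",
                        level, sec, usec, text);

    return snprintf(buf, size, "[%5lu.%06lu] %s\n", sec, usec, text);
}
```

With the syslog-like prefix a test harness can filter by log level after the fact, while the default form stays unchanged for existing log parsers.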
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
        printk: drop redundant devkmsg_log_str memsets
        printk: Never set console_may_schedule in console_trylock()
        printk: Hide console waiter logic into helpers
        printk: Add console owner and waiter logic to load balance console writes
        kallsyms: remove print_symbol() function
        checkpatch: add pF/pf deprecation warning
        symbol lookup: introduce dereference_symbol_descriptor()
        parisc64: Add .opd based function descriptor dereference
        powerpc64: Add .opd based function descriptor dereference
        ia64: Add .opd based function descriptor dereference
        sections: split dereference_function_descriptor()
        openrisc: Fix conflicting types for _exext and _stext
        lib: do not use print_symbol()
        irq debug: do not use print_symbol()
        sysfs: do not use print_symbol()
        drivers: do not use print_symbol()
        x86: do not use print_symbol()
        unicore32: do not use print_symbol()
        sh: do not use print_symbol()
        mn10300: do not use print_symbol()
        ...
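
The console owner and waiter hand-off described in this pull request can be modelled as a few state transitions. The sketch below is a single-threaded model with hypothetical names, meant only to show the order of events; it leaves out the spinlock that protects the shared fields and the busy-wait loop itself, and it should not be read as the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical model state; real code would guard these fields with a lock. */
struct console_model {
    bool locked;   /* the console lock is held by some context */
    int  owner;    /* id of the context calling console drivers, or NO_OWNER */
    bool waiter;   /* a printk() caller is busy-waiting for a hand-off */
};

/* Plain trylock: succeeds only if nobody holds the console lock. */
static bool model_trylock(struct console_model *c)
{
    if (c->locked)
        return false;
    c->locked = true;
    return true;
}

/* The lock holder enters the section where it calls drivers and cannot sleep. */
static void model_owner_enter(struct console_model *c, int id)
{
    assert(c->locked);
    c->owner = id;
}

/*
 * A printk() caller whose trylock failed registers as the single waiter,
 * but only while a different context is inside the owner section.
 * Returns true if the caller should now busy-wait for the hand-off.
 */
static bool model_become_waiter(struct console_model *c, int id)
{
    if (c->waiter || c->owner == NO_OWNER || c->owner == id)
        return false;
    c->waiter = true;
    return true;
}

/*
 * The owner leaves the section. If a waiter registered in the meantime,
 * the lock stays held and passes to the waiter, and the old owner must
 * stop flushing. Returns true when such a hand-off happened.
 */
static bool model_owner_exit(struct console_model *c)
{
    bool handed_off = c->waiter;

    c->owner = NO_OWNER;
    c->waiter = false;   /* this is what releases the busy-waiting caller */
    return handed_off;
}
```

The point of the design is that the lock is never released during a hand-off, so the waiter continues flushing without competing for the lock again.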
    • Merge tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio · 34b1cf60
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
      
       - Mask INTx from user if pdev->irq is zero (Alexey Kardashevskiy)
      
       - Capability helper cleanup (Alex Williamson)
      
       - Allow mmaps overlapping MSI-X vector table with region capability
         exposing this feature (Alexey Kardashevskiy)
      
       - mdev static cleanups (Xiongwei Song)
      
      * tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio:
        vfio: mdev: make a couple of functions and structure vfio_mdev_driver static
        vfio-pci: Allow mapping MSIX BAR
        vfio: Simplify capability helper
        vfio-pci: Mask INTx if a device is not capabable of enabling it
    • Merge tag 'trace-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 27529c89
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
       "There aren't many changes for the tracing system this release. Mostly
        small clean ups and fixes.
      
        The biggest change is to how bprintf works. bprintf is used by
        trace_printk() to just save the format and args of a printf call, and
        the formatting is done when the trace buffer is read. This is done to
        keep the formatting out of the fast path (this was recommended by
        you). The issue is when arguments are de-referenced.
      
        If a pointer is saved, and the format has something like "%*pbl", when
        the buffer is read, it will de-reference the argument then. The
        problem is if the data no longer exists. This can cause the kernel to
        oops.
      
        The fix for this was to make these de-referenced pointers do the
        formatting at the time it is called (the fast path), as this
        guarantees that the data exists (and doesn't change later)"
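
The difference between saving a pointer for later formatting and formatting at call time can be shown with a small userspace sketch. The struct and function names are hypothetical and this is not the bprintf code; to stay well-defined, the example changes the pointed-to data instead of freeing it, which is the milder version of the problem described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Deferred record: keeps only the pointer and formats when read. */
struct deferred_record {
    const char *name;
};

/* Eager record: keeps the already formatted text. */
struct eager_record {
    char text[32];
};

static void record_deferred(struct deferred_record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    r->name = name;   /* cheap, but the data must still exist at read time */
}

static void read_deferred(const struct deferred_record *r, char *out, size_t size)
{
    /* The pointer is dereferenced here, long after the event was recorded. */
    snprintf(out, size, "name=%s", r->name);
}

static void record_eager(struct eager_record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    /* Formatting at call time captures the data as it was. */
    snprintf(r->text, sizeof(r->text), "name=%s", name);
}
```

If the buffer behind `name` had been freed before `read_deferred()` ran, the deferred path would read invalid memory, which corresponds to the oops the pull message mentions.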
      
      * tag 'trace-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        vsprintf: Do not have bprintf dereference pointers
        ftrace: Mark function tracer test functions noinline/noclone
        trace_uprobe: Display correct offset in uprobe_events
        tracing: Make sure the parsed string always terminates with '\0'
        tracing: Clear parser->idx if only spaces are read
        tracing: Detect the string nul character when parsing user input string
    • Merge branch 'KASAN-read_word_at_a_time' · 8e44e660
      Linus Torvalds authored
      Merge KASAN word-at-a-time fixups from Andrey Ryabinin.
      
      The word-at-a-time optimizations have caused headaches for KASAN, since
      the whole point is that we access byte streams in bigger chunks, and
      KASAN can be unhappy about the potential extra access at the end of the
      string.
      
      We used to have a horrible hack in dcache, and then people got
      complaints from the strscpy() case.  This fixes it all up properly, by
      adding an explicit helper for the "access byte stream one word at a
      time" case.
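
A rough userspace sketch of the word-at-a-time idea follows. It is not the kernel's read_word_at_a_time() helper: the function names and the 4 KiB page size are assumptions. The word read may cover bytes past the string terminator, which is why a checker that validates every byte of an access complains; here the read is only attempted when it stays within one page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Nonzero if any byte of v is zero (classic bit trick). */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Nonzero if an 8-byte read starting at p stays within one page. */
static int word_within_page(const void *p)
{
    return ((uintptr_t)p & (SKETCH_PAGE_SIZE - 1))
           <= SKETCH_PAGE_SIZE - sizeof(uint64_t);
}

/* strlen() variant that scans eight bytes per step where that is safe. */
static size_t wordwise_strlen(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (;;) {
        if (word_within_page(s + n)) {
            uint64_t w;

            /* May read bytes beyond the terminator, but not beyond the page. */
            memcpy(&w, s + n, sizeof(w));
            if (!has_zero_byte(w)) {
                n += sizeof(w);
                continue;
            }
        }
        /* A NUL is in this word, or the word would cross a page: go bytewise. */
        for (size_t i = 0; i < sizeof(uint64_t); i++)
            if (s[n + i] == '\0')
                return n + i;
        n += sizeof(uint64_t);
    }
}
```

In strict ISO C, reading past the end of an object is undefined even inside the same page, so this only illustrates the access pattern; callers in the sketch should pass buffers padded to a multiple of eight bytes.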
      
      * emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>:
        fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
        fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
        lib/strscpy: Shut up KASAN false-positives in strscpy()
        compiler.h: Add read_word_at_a_time() function.
        compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
    • fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports" · babcbbc7
      Andrey Ryabinin authored
      This reverts commit df4c0e36.
      
      It's no longer needed since dentry_string_cmp() now uses
      read_word_at_a_time() to avoid kasan's reports.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/dcache: Use read_word_at_a_time() in dentry_string_cmp() · bfe7aa6c
      Andrey Ryabinin authored
      dentry_string_cmp() performs word-at-a-time reads from 'cs' and may
      read slightly more than was requested in kmalloc().  Normally this
      would make KASAN report an out-of-bounds access, but this was
      worked around by commit df4c0e36 ("fs: dcache: manually unpoison
      dname after allocation to shut up kasan's reports").
      
      This workaround is not perfect, since it allows out-of-bounds access to
      dentry's name for all the code, not just in dentry_string_cmp().
      
      So it would be better to use read_word_at_a_time() instead and revert
      commit df4c0e36.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/strscpy: Shut up KASAN false-positives in strscpy() · 1a3241ff
      Andrey Ryabinin authored
      strscpy() performs the word-at-a-time optimistic reads.  So it may
      access the memory past the end of the object, which is perfectly fine
      since strscpy() doesn't use that (past-the-end) data and makes sure the
      optimistic read won't cross a page boundary.
      
      Use new read_word_at_a_time() to shut up the KASAN.
      
      Note that this potentially could hide some bugs.  In the example below,
      strscpy() will copy more than we should (1-3 extra uninitialized bytes):
      
              char dst[8];
              char *src;
      
              src = kmalloc(5, GFP_KERNEL);
              memset(src, 0xff, 5);
              strscpy(dst, src, 8);
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • compiler.h: Add read_word_at_a_time() function. · 7f1e541f
      Andrey Ryabinin authored
      Sometimes we know that it's safe to do potentially out-of-bounds access
      because we know it won't cross a page boundary.  Still, KASAN will
      report this as a bug.
      
      Add read_word_at_a_time() function which is supposed to be used in such
      cases.  In read_word_at_a_time() KASAN performs relaxed check - only the
      first byte of access is validated.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • compiler.h, kasan: Avoid duplicating __read_once_size_nocheck() · bdb5ac80
      Andrey Ryabinin authored
      Instead of having two identical __read_once_size_nocheck() functions
      with different attributes, consolidate all the difference in new macro
      __no_kasan_or_inline and use it. No functional changes.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'kconfig-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 562f36ed
      Linus Torvalds authored
      Pull Kconfig updates from Masahiro Yamada:
       "A pretty big batch of Kconfig updates.
      
        I have to mention the lexer and parser of Kconfig are now built from
        real .l and .y sources. So, flex and bison are now required for
        building the kernel. Both of them (unlike gperf) have been stable for
        a long time. This change has been tested for several weeks in
        linux-next, and I did not receive any problem reports about it.
      
        Summary:
      
         - add checks for mistakes, such as a choice default that is not in
           the choice, or duplicated help text
      
         - document data structure and complex code
      
         - fix various memory leaks
      
         - change Makefile to build lexer and parser instead of using
           pre-generated C files
      
         - drop 'boolean' keyword, which is equivalent to 'bool'
      
         - use default 'yy' prefix and remove unneeded Make variables
      
         - fix gettext() check for xconfig
      
         - announce that oldnoconfig will be finally removed
      
         - make 'Selected by:' and 'Implied by' readable in help and search
           result
      
         - hide silentoldconfig from 'make help' to stop confusing people
      
         - fix misc things and cleanups"
      
      * tag 'kconfig-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (37 commits)
        kconfig: Remove silentoldconfig from help and docs; fix kconfig/conf's help
        kconfig: make "Selected by:" and "Implied by:" readable
        kconfig: announce removal of oldnoconfig if used
        kconfig: fix make xconfig when gettext is missing
        kconfig: Clarify menu and 'if' dependency propagation
        kconfig: Document 'if' flattening logic
        kconfig: Clarify choice dependency propagation
        kconfig: Document SYMBOL_OPTIONAL logic
        kbuild: remove unnecessary LEX_PREFIX and YACC_PREFIX
        kconfig: use default 'yy' prefix for lexer and parser
        kconfig: make conf_unsaved a local variable of conf_read()
        kconfig: make xfgets() really static
        kconfig: make input_mode static
        kconfig: Warn if there is more than one help text
        kconfig: drop 'boolean' keyword
        kconfig: use bool instead of boolean for type definition attributes, again
        kconfig: Remove menu_end_entry()
        kconfig: Document important expression functions
        kconfig: Document automatic submenu creation code
        kconfig: Fix choice symbol expression leak
        ...
    • Merge tag 'kbuild-misc-v4.16' of... · a659f159
      Linus Torvalds authored
      Merge tag 'kbuild-misc-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild misc updates from Masahiro Yamada:
      
       - add snap-pkg target to create Linux kernel snap package
      
       - make out-of-tree creation of source packages fail correctly
      
       - improve and fix several semantic patches
      
      * tag 'kbuild-misc-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Coccinelle: coccicheck: fix typo
        Coccinelle: memdup: drop spurious line
        Coccinelle: kzalloc-simple: Rename kzalloc-simple to zalloc-simple
        Coccinelle: ifnullfree: Trim the warning reported in report mode
        Coccinelle: alloc_cast: Add more memory allocating functions to the list
        Coccinelle: array_size: report even if include is missing
        Coccinelle: kzalloc-simple: Add all zero allocating functions
        kbuild: pkg: make out-of-tree rpm/deb-pkg build immediately fail
        scripts/package: snap-pkg target
    • Merge tag 'kbuild-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 06c8f7a7
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - terminate the build correctly in case of fixdep errors
      
       - clean up fixdep
      
       - suppress packed-not-aligned warnings from GCC-8
      
       - fix W= handling for extra DTC warnings
      
      * tag 'kbuild-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: fix W= option checks for extra DTC warnings
        Kbuild: suppress packed-not-aligned warning for default setting only
        fixdep: use existing helper to check modular CONFIG options
        fixdep: refactor parse_dep_file()
        fixdep: move global variables to local variables of main()
        fixdep: remove unneeded memcpy() in parse_dep_file()
        fixdep: factor out common code for reading files
        fixdep: use malloc() and read() to load dep_file to buffer
        fixdep: remove unnecessary <arpa/inet.h> inclusion
        fixdep: exit with error code in error branches of do_config_file()
    • Merge tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 2bed2660
      Linus Torvalds authored
      Pull DeviceTree updates from Rob Herring:
      
       - Convert to use memblock_virt_alloc in DT code which supports
         bootmem arches. With this we can remove the arch specific
         early_init_dt_alloc_memory_arch() functions.
      
       - Enable running the DT unittests on UML
      
       - Use SPDX license tags on DT files
      
       - Fix early FDT kconfig ifdef logic
      
       - Clean-up unittest Makefile
      
       - Fix function comment for of_irq_parse_raw
      
       - Add missing documentation for linux,initrd-{start,end} properties
      
       - Clean-up of binding examples using uppercase hex
      
       - Add trivial devices W83773G and Infineon TLV493D-A1B6
      
       - Add missing STM32 SoC bindings
      
       - Various small binding doc fixes
      
      * tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits)
        xtensa: remove arch specific early DT functions
        x86: remove arch specific early_init_dt_alloc_memory_arch
        nios2: remove arch specific early_init_dt_alloc_memory_arch
        mips: remove arch specific early_init_dt_alloc_memory_arch
        metag: remove arch specific early DT functions
        cris: remove arch specific early DT functions
        libfdt: remove unnecessary include directive from <linux/libfdt.h>
        of: unittest: refactor Makefile
        of/fdt: use memblock_virt_alloc for early alloc
        of: Use SPDX license tag for DT files
        of/fdt: Fix #ifdef dependency of early flattree declarations
        dt-bindings: h8300 clocksource: correct spelling of pulse
        dt-bindings: imx6q-pcie: Add required property for i.MX6SX
        mmc: Don't reference Linux-specific OF_GPIO_ACTIVE_LOW flag in DT binding
        dt-bindings: Use lower case hex in unit-addresses
        dt-bindings: display: panel: Fix compatible string for Toshiba LT089AC29000
        dt-bindings: Add Infineon TLV493D-A1B6
        dt-bindings: mailbox: ti,message-manager: Fix interrupt name error
        dt-bindings: chosen: Document linux,initrd-{start,end}
        dt-bindings: arm: document supported STM32 SoC family
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · eea43ed8
      Linus Torvalds authored
      Pull input layer updates from Dmitry Torokhov:
      
       - evdev interface has been adjusted to extend the life of timestamps on
         32 bit systems to the year of 2108
      
       - Synaptics RMI4 driver's PS/2 guest handling has been updated to
         improve chances of detecting trackpoints on the pass-through port
      
       - mms114 touchscreen controller driver has been updated to support
         generic device properties and work with mms152 controllers
      
       - Goodix driver now supports generic touchscreen properties
      
       - couple of drivers for AVR32 architecture are gone as the architecture
         support has been removed from the kernel
      
       - gpio-tilt driver has been removed as there are no mainline users and
         the driver itself is using legacy APIs and relies on platform data
      
       - MODULE_LICENSE/MODULE_VERSION cleanups
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
        Input: goodix - use generic touchscreen_properties
        Input: mms114 - fix typo in definition
        Input: mms114 - use BIT() macro instead of explicit shifting
        Input: mms114 - replace mdelay with msleep
        Input: mms114 - add support for mms152
        Input: mms114 - drop platform data and use generic APIs
        Input: mms114 - mark as direct input device
        Input: mms114 - do not clobber interrupt trigger
        Input: edt-ft5x06 - fix error handling for factory mode on non-M06
        Input: stmfts - set IRQ_NOAUTOEN to the irq flag
        Input: auo-pixcir-ts - delete an unnecessary return statement
        Input: auo-pixcir-ts - remove custom log for a failed memory allocation
        Input: da9052_tsi - remove unused mutex
        Input: docs - use PROPERTY_ENTRY_U32() directly
        Input: synaptics-rmi4 - log when we create a guest serio port
        Input: synaptics-rmi4 - unmask F03 interrupts when port is opened
        Input: synaptics-rmi4 - do not delete interrupt memory too early
        Input: ad7877 - use managed resource allocations
        Input: stmfts,s6sy671 - add SPDX identifier
        Input: remove atmel-wm97xx touchscreen driver
        ...
    • Merge tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f6cff79f
      Linus Torvalds authored
      Pull char/misc driver updates from Greg KH:
       "Here is the big pull request for char/misc drivers for 4.16-rc1.
      
        There's a lot of stuff in here. Three new driver subsystems were added
        for various types of hardware busses:
      
         - siox
         - slimbus
         - soundwire
      
        as well as a new vboxguest subsystem for the VirtualBox hypervisor
        drivers.
      
        There are also big updates from the FPGA subsystem, lots of Android
        binder fixes, the usual handful of hyper-v updates, and lots of other
        smaller driver updates.
      
        All of these have been in linux-next for a long time, with no reported
        issues"
      
      * tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits)
        char: lp: use true or false for boolean values
        android: binder: use VM_ALLOC to get vm area
        android: binder: Use true and false for boolean values
        lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN
        EISA: Delete error message for a failed memory allocation in eisa_probe()
        EISA: Whitespace cleanup
        misc: remove AVR32 dependencies
        virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES
        soundwire: Fix a signedness bug
        uio_hv_generic: fix new type mismatch warnings
        uio_hv_generic: fix type mismatch warnings
        auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
        uio_hv_generic: add rescind support
        uio_hv_generic: check that host supports monitor page
        uio_hv_generic: create send and receive buffers
        uio: document uio_hv_generic regions
        doc: fix documentation about uio_hv_generic
        vmbus: add monitor_id and subchannel_id to sysfs per channel
        vmbus: fix ABI documentation
        uio_hv_generic: use ISR callback method
        ...
    • Merge tag 'driver-core-4.16-rc1' of... · 47fcc036
      Linus Torvalds authored
      Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core updates from Greg KH:
       "Here is the set of "big" driver core patches for 4.16-rc1.
      
        The majority of the work here is in the firmware subsystem, with
        reworks that attempt to make the code easier to handle in the
        long run, but no functional change. There's also some tree-wide sysfs
        attribute fixups with lots of acks from the various subsystem
        maintainers, as well as a handful of other normal fixes and changes.
      
        And finally, some license cleanups for the driver core and sysfs code.
      
        All have been in linux-next for a while with no reported issues"
      
      * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
        device property: Define type of PROPERTY_ENRTY_*() macros
        device property: Reuse property_entry_free_data()
        device property: Move property_entry_free_data() upper
        firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
        firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
        USB: serial: keyspan: Drop firmware Kconfig options
        sysfs: remove DEBUG defines
        sysfs: use SPDX identifiers
        drivers: base: add coredump driver ops
        sysfs: add attribute specification for /sysfs/devices/.../coredump
        test_firmware: fix missing unlock on error in config_num_requests_store()
        test_firmware: make local symbol test_fw_config static
        sysfs: turn WARN() into pr_warn()
        firmware: Fix a typo in fallback-mechanisms.rst
        treewide: Use DEVICE_ATTR_WO
        treewide: Use DEVICE_ATTR_RO
        treewide: Use DEVICE_ATTR_RW
        sysfs.h: Use octal permissions
        component: add debugfs support
        bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
        ...
    • Merge tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 5d8515bc
      Linus Torvalds authored
      Pull staging/IIO updates from Greg KH:
       "Here are the big Staging and IIO driver patches for 4.16-rc1.
      
        There is the normal amount of new IIO drivers added, like all
        releases.
      
        The networking IPX and the ncpfs filesystem are moved into the staging
        tree, as they are on their way out of the kernel due to lack of use
        anymore.
      
        The visorbus subsystem has finally started moving out of the staging
        tree to the "real" part of the kernel, and the most and fsl-mc
        codebases are almost ready to move out, which will probably happen
        for 4.17-rc1 if all goes well.
      
        Other than that, there is a bunch of license header cleanups in the
        tree, along with the normal amount of coding style churn that we all
        know and love for this codebase. I also got frustrated at the
        Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
        huge chunks of it that were never even being used.
      
        Full details of everything are in the shortlog.
      
        All of these patches have been in linux-next for a while with no
        reported issues"
      
      * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
        staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
        staging: rtl8723bs: remove a couple of redundant initializations
        staging: comedi: reformat lines to 80 chars or less
        staging: lustre: separate a connection destroy from free struct kib_conn
        Staging: rtl8723bs: Use !x instead of NULL comparison
        Staging: rtl8723bs: Remove dead code
        Staging: rtl8723bs: Change names to conform to the kernel code
        staging: ccree: Fix missing blank line after declaration
        staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
        staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
        staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
        staging: comedi: dt2811: remove redundant initialization of 'ns'
        staging: wilc1000: fix alignments to match open parenthesis
        staging: wilc1000: removed unnecessary defined enums typedef
        staging: wilc1000: remove unnecessary use of parentheses
        staging: rtl8192u: remove redundant initialization of 'timeout'
        staging: sm750fb: fix CamelCase for dispSet var
        staging: lustre: lnet/selftest: fix compile error on UP build
        staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
        staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
        ...
    • Merge tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · db593322
      Linus Torvalds authored
      Pull tty/staging driver updates from Greg KH:
       "Here is the big tty/serial driver update for 4.16-rc1.
      
        The usual number of various serial driver fixes and updates to try to
        get them to work with crazy hardware configurations (seriously, how
        many different ways are hardware engineers going to come up with to
        hook up a simple UART?)
      
        There are also some serdev bugfixes and updates, as well as a
        smattering of other small fixes in here.
      
        All have been in the linux-next tree for a while, with no reported
        issues"
      
      * tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
        tty: serial: exar: Relocate sleep wake-up handling
        tty: fix data race between tty_init_dev and flush of buf
        serial: imx: fix endless loop during suspend
        serial: core: mark port as initialized after successful IRQ change
        serdev: only match serdev devices
        serdev: do not generate modaliases for controllers
        serial: mxs-auart: don't use GPIOF_* with gpiod_get_direction
        serial: 8250_dw: Revert "Improve clock rate setting"
        MAINTAINERS: Add myself as designated reviewer for 8250_dw
        gpio: serial: max310x: Support open-drain configuration for GPIOs
        serdev: Fix serdev_uevent failure on ACPI enumerated serdev-controllers
        serial: 8250_ingenic: Parse earlycon options
        serial: 8250_ingenic: Add support for the JZ4770 SoC
        serial: core: Make uart_parse_options take const char* argument
        serial: 8250_of: fix return code when probe function fails to get reset
        serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
        serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
        tty: n_gsm: Allow ADM response in addition to UA for control dlci
        tty: omap-serial: Fix initial on-boot RTS GPIO level
        tty: serial: jsm: Add one check against NULL pointer dereference
        ...
    • Merge tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · e4ee8b85
      Linus Torvalds authored
      Pull USB/PHY updates from Greg KH:
       "Here is the big USB and PHY driver update for 4.16-rc1.
      
        Along with the normally expected XHCI, MUSB, and Gadget driver
        patches, there are some PHY driver fixes, license cleanups, sysfs
        attribute cleanups, usbip changes, and a raft of other smaller fixes
        and additions.
      
        Full details are in the shortlog.
      
        All of these have been in the linux-next tree for a long time with no
        reported issues"
      
      * tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (137 commits)
        USB: serial: pl2303: new device id for Chilitag
        USB: misc: fix up some remaining DEVICE_ATTR() usages
        USB: musb: fix up one odd DEVICE_ATTR() usage
        USB: atm: fix up some remaining DEVICE_ATTR() usage
        USB: move many drivers to use DEVICE_ATTR_WO
        USB: move many drivers to use DEVICE_ATTR_RO
        USB: move many drivers to use DEVICE_ATTR_RW
        USB: misc: chaoskey: Use true and false for boolean values
        USB: storage: remove old wording about how to submit a change
        USB: storage: remove invalid URL from drivers
        usb: ehci-omap: don't complain on -EPROBE_DEFER when no PHY found
        usbip: list: don't list devices attached to vhci_hcd
        usbip: prevent bind loops on devices attached to vhci_hcd
        USB: serial: remove redundant initializations of 'mos_parport'
        usb/gadget: Fix "high bandwidth" check in usb_gadget_ep_match_desc()
        usb: gadget: compress return logic into one line
        usbip: vhci_hcd: update 'status' file header and format
        USB: serial: simple: add Motorola Tetra driver
        CDC-ACM: apply quirk for card reader
        usb: option: Add support for FS040U modem
        ...
      e4ee8b85
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide · 7109a04e
      Linus Torvalds authored
      Pull small IDE cleanup from David Miller.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
        ide: remove duplicated assignment to 'cursg'
      7109a04e
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next · ba49097e
      Linus Torvalds authored
      Pull sparc updates from David Miller:
       "Of note is the addition of a driver for the Data Analytics
        Accelerator, and some small cleanups"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
        oradax: Fix return value check in dax_attach()
        sparc: vDSO: remove an extra tab
        sparc64: drop unneeded compat include
        sparc64: Oracle DAX driver
        sparc64: Oracle DAX infrastructure
      ba49097e
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · ca0c836d
      Linus Torvalds authored
      Pull s390 updates from Martin Schwidefsky:
       "Bug fixes, small improvements and one notable change: the system call
        table and the unistd.h header are now generated automatically with a
        shell script from a text file"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/decompressor: discard __ksymtab and .eh_frame sections
        s390: fix handling of -1 in set{,fs}[gu]id16 syscalls
        s390/tools: generate header files in arch/s390/include/generated/
        s390/syscalls: use generated syscall_table.h and unistd.h header files
        s390/syscalls: add Makefile to generate system call header files
        s390/syscalls: add syscalltbl script
        s390/syscalls: add system call table
        s390/decompressor: swap .text and .rodata.compressed sections
        s390/sclp: fix .data section specification
        s390/ipl: avoid usage of __section(.data)
        s390/head: replace hard coded values with constants
        s390/disassembler: add generated gen_opcode_table tool to .gitignore
        s390: remove bogus system call table entries
        s390/kprobes: remove duplicate includes
        s390/dasd: Remove dead return code checks
        s390/dasd: Simplify code
        s390/vdso: revise CFI annotations of vDSO functions
        s390/kernel: emit CFI data in .debug_frame and discard .eh_frame sections
      ca0c836d
    • Julia Lawall's avatar
      Coccinelle: coccicheck: fix typo · 1640eea3
      Julia Lawall authored
      Correct spelling of "coccinelle".
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      1640eea3
    • Dmitry Torokhov's avatar
      Merge branch 'next' into for-linus · d67ad78e
      Dmitry Torokhov authored
      Prepare input updates for 4.16 merge window.
      d67ad78e
    • Linus Torvalds's avatar
      Merge tag 'docs-4.16' of git://git.lwn.net/linux · 255442c9
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "Documentation updates for 4.16.
      
        New stuff includes refcount_t documentation, errseq documentation,
        kernel-doc support for nested structure definitions, the removal of
        lots of crufty kernel-doc support for unused formats, SPDX tag
        documentation, the beginnings of a manual for subsystem maintainers,
        and lots of fixes and updates.
      
        As usual, some of the changesets reach outside of Documentation/ to
        effect kerneldoc comment fixes. It also adds the new LICENSES
        directory, of which Thomas promises I do not need to be the
        maintainer"
      
      * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
        linux-next: docs-rst: Fix typos in kfigure.py
        linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
        Documentation: Fix misconversion of #if
        docs: add index entry for networking/msg_zerocopy
        Documentation: security/credentials.rst: explain need to sort group_list
        LICENSES: Add MPL-1.1 license
        LICENSES: Add the GPL 1.0 license
        LICENSES: Add Linux syscall note exception
        LICENSES: Add the MIT license
        LICENSES: Add the BSD-3-clause "Clear" license
        LICENSES: Add the BSD 3-clause "New" or "Revised" License
        LICENSES: Add the BSD 2-clause "Simplified" license
        LICENSES: Add the LGPL-2.1 license
        LICENSES: Add the LGPL 2.0 license
        LICENSES: Add the GPL 2.0 license
        Documentation: Add license-rules.rst to describe how to properly identify file licenses
        scripts: kernel_doc: better handle show warnings logic
        fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
        doc: md: Fix a file name to md-fault.c in fault-injection.txt
        errseq: Add to documentation tree
        ...
      255442c9
    • Linus Torvalds's avatar
      Merge branch 'work.vmci' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d76e0a05
      Linus Torvalds authored
      Pull vmci iov_iter updates from Al Viro:
       "Get rid of "is it an iovec or an entire array?" flags in vmxi - just
        use iov_iter. Simplifies the living hell out of that code..."
      
      * 'work.vmci' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vmci: the same on the send side...
        vmci: simplify qp_dequeue_locked()
        vmci: get rid of qp_memcpy_from_queue()
        vmci: fix buf_size in case of iovec-based accesses
      d76e0a05
    • Linus Torvalds's avatar
      Merge branch 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 40b9672a
      Linus Torvalds authored
      Pull asm/uaccess.h whack-a-mole from Al Viro:
       "It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop
        cropping up..."
      
      * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h
        riscv: use linux/uaccess.h, not asm/uaccess.h...
        ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
      40b9672a
    • Linus Torvalds's avatar
      Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · dc1efc3c
      Linus Torvalds authored
      Pull dcache updates from Al Viro:
       "Neil Brown's d_move()/d_path() race fix"
      
      * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        VFS: close race between getcwd() and d_move()
      dc1efc3c
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 73da9e1a
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - misc fixes
      
       - ocfs2 updates
      
       - most of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
        mm: remove PG_highmem description
        tools, vm: new option to specify kpageflags file
        mm/swap.c: make functions and their kernel-doc agree
        mm, memory_hotplug: fix memmap initialization
        mm: correct comments regarding do_fault_around()
        mm: numa: do not trap faults on shared data section pages.
        hugetlb, mbind: fall back to default policy if vma is NULL
        hugetlb, mempolicy: fix the mbind hugetlb migration
        mm, hugetlb: further simplify hugetlb allocation API
        mm, hugetlb: get rid of surplus page accounting tricks
        mm, hugetlb: do not rely on overcommit limit during migration
        mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
        mm, hugetlb: unify core page allocation accounting and initialization
        mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
        mm/memcontrol.c: make local symbol static
        mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
        include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
        mm/compaction.c: fix comment for try_to_compact_pages()
        mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
        zsmalloc: use U suffix for negative literals being shifted
        ...
      73da9e1a
    • Miles Chen's avatar
      mm: remove PG_highmem description · 3f56a2f8
      Miles Chen authored
      Commit cbe37d09 ("[PATCH] mm: remove PG_highmem") removed PG_highmem
      to save a page flag.  So the description of PG_highmem is no longer
      needed.
      
      Link: http://lkml.kernel.org/r/1517391212-2950-1-git-send-email-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f56a2f8
    • David Rientjes's avatar
      tools, vm: new option to specify kpageflags file · c7905f20
      David Rientjes authored
      page-types currently hardcodes /proc/kpageflags as the file to parse.
      This works when using the tool to examine the state of pageflags on the
      same system, but does not allow storing a snapshot of pageflags at a
      given time to debug issues nor on a different system.
      
      This allows the user to specify a saved version of kpageflags with a new
      page-types -F option.
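
      A saved snapshot can also be inspected without page-types. The sketch
      below is not the tool's implementation; it assumes the layout described
      in the kernel's pagemap documentation (one 64-bit flags word per page,
      indexed by PFN, in host byte order). The helper name
      count_pages_with_flag() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions as listed in the kernel's pagemap documentation. */
#define KPF_LRU   5
#define KPF_ANON  12
#define KPF_HUGE  17

/*
 * Count the entries in a saved kpageflags image that have bit `bit` set.
 * Each entry is assumed to be one 64-bit word in host byte order.
 * Returns -1 if the bit is out of range or the file cannot be opened.
 */
long count_pages_with_flag(const char *path, unsigned int bit)
{
	FILE *f;
	uint64_t flags;
	long count = 0;

	if (bit >= 64)
		return -1;

	f = fopen(path, "rb");
	if (!f)
		return -1;

	/* Read one 64-bit entry at a time until EOF or a short read. */
	while (fread(&flags, sizeof(flags), 1, f) == 1) {
		if (flags & (UINT64_C(1) << bit))
			count++;
	}

	fclose(f);
	return count;
}
```

      A snapshot would typically be taken by copying /proc/kpageflags to a
      regular file as root; the copy can then be examined later or on a
      different machine.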
      
      [akpm@linux-foundation.org: add "filename" to fix usage() string]
      [rientjes@google.com: fix layout]
        Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301840050.140969@chino.kir.corp.google.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301458180.153857@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7905f20
    • Randy Dunlap's avatar
      mm/swap.c: make functions and their kernel-doc agree · e02a9f04
      Randy Dunlap authored
      Fix some basic kernel-doc notation in mm/swap.c:
      
       - for function lru_cache_add_anon(), make its kernel-doc function name
         match its function name and change colon to hyphen following the
         function name
      
       - for function pagevec_lookup_entries(), change the function parameter
         name from nr_pages to nr_entries since that is more descriptive of
         what the parameter actually is and then it matches the kernel-doc
         comments also
      
      Fix function kernel-doc to match the change in commit 67fd707f:
      
       - drop the kernel-doc notation for @nr_pages from
         pagevec_lookup_range() and correct the function description for that
         change
      
      Link: http://lkml.kernel.org/r/3b42ee3e-04a9-a6ca-6be4-f00752a114fe@infradead.org
      Fixes: 67fd707f ("mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e02a9f04
    • Michal Hocko's avatar
      mm, memory_hotplug: fix memmap initialization · 9bb5a391
      Michal Hocko authored
      Bharata has noticed that onlining newly added memory doesn't increase
      the total memory, pointing to commit f7f99100 ("mm: stop zeroing
      memory during allocation in vmemmap") as the culprit.  That commit
      changed how the memory for memmaps is initialized and moved it
      from allocation time to initialization time.  This works
      properly for the early memmap init path.
      
      It doesn't work for memory hotplug though, because we need to mark
      the page as reserved when the sparsemem section is created and later
      initialize it completely during onlining.  memmap_init_zone is called in
      the early stage of onlining.  With the current code it calls
      __init_single_page and as such it clears the whole state, and
      therefore online_pages_range skips those pages.
      
      Fix this by skipping mm_zero_struct_page in __init_single_page for
      the memory hotplug path.  This is quite ugly, but unifying both the early
      init and memory hotplug init paths is a large project.  Make sure we plug
      the regression at least.
      
      Link: http://lkml.kernel.org/r/20180130101141.GW21609@dhcp22.suse.cz
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9bb5a391
    • William Kucharski's avatar
      mm: correct comments regarding do_fault_around() · da391d64
      William Kucharski authored
      There are multiple comments surrounding do_fault_around that mention
      fault_around_pages() and fault_around_mask(), two routines that do not
      exist.  These comments should be reworded to reference
      fault_around_bytes, the value which is used to determine how much
      do_fault_around() will attempt to read when processing a fault.
      
      These comments should have been updated when fault_around_pages() and
      fault_around_mask() were removed in commit aecd6f44 ("mm: close race
      between do_fault_around() and fault_around_bytes_set()").
      
      Fixes: aecd6f44 ("mm: close race between do_fault_around() and fault_around_bytes_set()")
      Link: http://lkml.kernel.org/r/302D0B14-C7E9-44C6-8BED-033F9ACBD030@oracle.com
      Signed-off-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Larry Bassel <larry.bassel@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da391d64
    • Henry Willard's avatar
      mm: numa: do not trap faults on shared data section pages. · 859d4adc
      Henry Willard authored
      Workloads consisting of a large number of processes running the same
      program with a very large shared data segment may experience performance
      problems when numa balancing attempts to migrate the shared cow pages.
      This manifests itself with many processes or tasks in
      TASK_UNINTERRUPTIBLE state waiting for the shared pages to be migrated.
      
      The program listed below simulates the conditions with these results
      when run with 288 processes on a 144 core/8 socket machine.
      
      Average throughput     Average throughput     Average throughput
      with numa_balancing=0  with numa_balancing=1  with numa_balancing=1
                             without the patch      with the patch
      ---------------------  ---------------------  ---------------------
      2118782                2021534                2107979
      
      Complex production environments show less variability and fewer poorly
      performing outliers accompanied with a smaller number of processes
      waiting on NUMA page migration with this patch applied.  In some cases,
      %iowait drops from 16%-26% to 0.
      
        // SPDX-License-Identifier: GPL-2.0
        /*
         * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved.
         */
        #include <sys/time.h>
        #include <sys/wait.h>
        #include <sys/mman.h>
        #include <stdio.h>
        #include <stdlib.h>	/* atoi() */
        #include <unistd.h>	/* fork(), usleep() */
      
        int a[1000000] = {13};
      
        int  main(int argc, const char **argv)
        {
      	int n = 0;
      	int i;
      	pid_t pid;
      	int stat;
      	int *count_array;
      	int cpu_count = 288;
      	long total = 0;
      
      	struct timeval t1, t2 = {(argc > 1 ? atoi(argv[1]) : 10), 0};
      
      	if (argc > 2)
      		cpu_count = atoi(argv[2]);
      
      	count_array = mmap(NULL, cpu_count * sizeof(int),
      			   (PROT_READ|PROT_WRITE),
      			   (MAP_SHARED|MAP_ANONYMOUS), 0, 0);
      
      	if (count_array == MAP_FAILED) {
      		perror("mmap:");
      		return 0;
      	}
      
      	for (i = 0; i < cpu_count; ++i) {
      		pid = fork();
      		if (pid <= 0)
      			break;
      		if ((i & 0xf) == 0)
      			usleep(2);
      	}
      
      	if (pid != 0) {
      		if (i == 0) {
      			perror("fork:");
      			return 0;
      		}
      
      		for (;;) {
      			pid = wait(&stat);
      			if (pid < 0)
      				break;
      		}
      
      		for (i = 0; i < cpu_count; ++i)
      			total += count_array[i];
      
      		printf("Total %ld\n", total);
      		munmap(count_array, cpu_count * sizeof(int));
      		return 0;
      	}
      
      	gettimeofday(&t1, 0);
      	timeradd(&t1, &t2, &t1);
      	while (timercmp(&t2, &t1, <)) {
      		int b = 0;
      		int j;
      
      		for (j = 0; j < 1000000; j++)
      			b += a[j];
      		gettimeofday(&t2, 0);
      		n++;
      	}
      	count_array[i] = n;
      	return 0;
        }
      
      This patch changes change_pte_range() to skip shared copy-on-write pages
      when called from change_prot_numa().
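
      The skip condition can be modelled in userspace as follows. This is a
      sketch, not the kernel code: it assumes that a "shared copy-on-write
      page" means a page in a private, potentially writable mapping that is
      still mapped more than once. The DEMO_ constants and demo_ function
      names are hypothetical stand-ins for the kernel's VMA flags and helpers.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's VM_SHARED / VM_MAYWRITE bits. */
#define DEMO_VM_SHARED   0x08UL
#define DEMO_VM_MAYWRITE 0x20UL

/* A mapping is copy-on-write when it is private but may become writable. */
static bool demo_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (DEMO_VM_SHARED | DEMO_VM_MAYWRITE)) ==
	       DEMO_VM_MAYWRITE;
}

/*
 * Return true if NUMA hinting should leave this page alone: it lives in
 * a COW mapping and is currently mapped by something other than exactly
 * one user, so migrating it would stall every process sharing it.
 */
bool demo_skip_numa_hint(unsigned long vm_flags, int mapcount)
{
	return demo_is_cow_mapping(vm_flags) && mapcount != 1;
}
```

      In the test program above, the initialized array a[] sits in the data
      segment, which is a private writable mapping inherited by every forked
      child, so its pages match this condition until a child writes to them.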
      
      NOTE: change_prot_numa() is nominally called from task_numa_work() and
      queue_pages_test_walk().  task_numa_work() is the auto NUMA balancing
      path, and queue_pages_test_walk() is part of explicit NUMA policy
      management.  However, queue_pages_test_walk() only calls
      change_prot_numa() when MPOL_MF_LAZY is specified and currently that is
      not allowed, so change_prot_numa() is only called from auto NUMA
      balancing.
      
      In the case of explicit NUMA policy management, shared pages are not
      migrated unless MPOL_MF_MOVE_ALL is specified, and MPOL_MF_MOVE_ALL
      depends on CAP_SYS_NICE.  Currently, there is no way to pass information
      about MPOL_MF_MOVE_ALL to change_pte_range.  This will have to be fixed
      if MPOL_MF_LAZY is enabled and MPOL_MF_MOVE_ALL is to be honored in lazy
      migration mode.
      
      task_numa_work() skips the read-only VMAs of programs and shared
      libraries.
      
      Link: http://lkml.kernel.org/r/1516751617-7369-1-git-send-email-henry.willard@oracle.com
      Signed-off-by: Henry Willard <henry.willard@oracle.com>
      Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      859d4adc
    • Michal Hocko's avatar
      hugetlb, mbind: fall back to default policy if vma is NULL · 389c8178
      Michal Hocko authored
      Dan Carpenter has noticed that the mbind migration callback (new_page) can
      get a NULL vma pointer and choke on it inside alloc_huge_page_vma, which
      relies on the VMA to get the hstate.  We used to BUG_ON this case but
      the BUG_ON has been removed recently by "hugetlb, mempolicy: fix the
      mbind hugetlb migration".
      
      The proper way to handle this is to get the hstate from the migrated
      page and rely on huge_node (resp.  get_vma_policy) to do the right thing
      with a NULL VMA.  We are currently falling back to the default mempolicy
      in that case, which is in line with what the THP path is doing here.
      
      Link: http://lkml.kernel.org/r/20180110104712.GR1732@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      389c8178
    • Michal Hocko's avatar
      hugetlb, mempolicy: fix the mbind hugetlb migration · ebd63723
      Michal Hocko authored
      The do_mbind migration code relies on alloc_huge_page_noerr for hugetlb
      pages.  alloc_huge_page_noerr uses alloc_huge_page, which is a high-level
      allocation function that has to take care of reserves, overcommit or
      hugetlb cgroup accounting.  None of that is really required for page
      migration because the new page is only temporary and will either replace
      the original page or be dropped.  This is essentially the same as for
      other migration call paths and there shouldn't be any reason to handle
      mbind in a special way.
      
      The current implementation is even suboptimal because the migration
      might fail just because the hugetlb cgroup limit is reached, or the
      overcommit is saturated.
      
      Fix this by making mbind like other hugetlb migration paths.  Add a new
      migration helper alloc_huge_page_vma as a wrapper around
      alloc_huge_page_nodemask with additional mempolicy handling.
      
      alloc_huge_page_noerr has no more users and it can go.
      
      Link: http://lkml.kernel.org/r/20180103093213.26329-7-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebd63723
    • Michal Hocko's avatar
      mm, hugetlb: further simplify hugetlb allocation API · 0c397dae
      Michal Hocko authored
      The hugetlb allocator has several layers of allocation functions depending
      on the purpose of the allocation.  There are two allocators depending
      on whether the page can be allocated from the page allocator or we need
      a contiguous allocator.  This is currently opencoded in
      alloc_fresh_huge_page which is the only path that might allocate giga
      pages which require the latter allocator.  Create alloc_fresh_huge_page
      which hides this implementation detail and use it in all callers which
      hardcoded the buddy allocator path (__hugetlb_alloc_buddy_huge_page).
      This shouldn't introduce any functional change because both migration and
      surplus allocators exclude giga pages explicitly.
      
      While we are at it let's do some renaming.  The current scheme is not
      consistent and overly painful to read and understand.  Get rid of
      prefix underscores from most functions.  There is no real reason to make
      names longer.
      
      * alloc_fresh_huge_page is the new layer to abstract underlying
        allocator
      * __hugetlb_alloc_buddy_huge_page becomes shorter and neater
        alloc_buddy_huge_page.
      * Former alloc_fresh_huge_page becomes alloc_pool_huge_page because we put
        the new page directly to the pool
      * alloc_surplus_huge_page can drop the opencoded prep_new_huge_page code
        as it uses alloc_fresh_huge_page now
      * others lose their excessive prefix underscores to make names shorter
      
      [dan.carpenter@oracle.com: fix double unlock bug in alloc_surplus_huge_page()]
        Link: http://lkml.kernel.org/r/20180109200559.g3iz5kvbdrz7yydp@mwanda
      Link: http://lkml.kernel.org/r/20180103093213.26329-6-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c397dae
    • Michal Hocko's avatar
      mm, hugetlb: get rid of surplus page accounting tricks · 9980d744
      Michal Hocko authored
      alloc_surplus_huge_page increases the pool size and the number of
      surplus pages opportunistically to prevent races with pool size
      changes.  See commit d1c3fb1f ("hugetlb: introduce
      nr_overcommit_hugepages sysctl") for more details.
      
      The resulting code is unnecessarily hairy, causes code duplication and
      doesn't allow sharing the allocation paths.  Moreover, pool size changes
      tend to be very rare, so optimizing for them is not really reasonable.
      Simplify the code and allow allocating a fresh surplus page as long as
      we are under the overcommit limit, then recheck the condition after
      the allocation and drop the new page if the situation has changed.  This
      should provide a reasonable guarantee that abrupt allocation requests
      will not go way off the limit.
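
      The check, allocate, recheck pattern described above can be sketched in
      userspace as follows. This is a model under stated assumptions, not the
      hugetlb code: struct demo_pool, its fields and demo_alloc_surplus() are
      hypothetical, a pthread mutex stands in for the hugetlb lock, and
      malloc() stands in for the page allocator.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the pool accounting; not a kernel structure. */
struct demo_pool {
	pthread_mutex_t lock;
	unsigned long surplus;     /* surplus pages currently accounted */
	unsigned long overcommit;  /* upper limit on surplus pages */
};

/*
 * Allocate one surplus "page". Returns NULL if the overcommit limit is
 * already reached, or if it is reached while the allocation was running.
 */
void *demo_alloc_surplus(struct demo_pool *p, size_t size)
{
	void *page;
	bool over;

	/* Cheap check first: bail out early when already at the limit. */
	pthread_mutex_lock(&p->lock);
	over = p->surplus >= p->overcommit;
	pthread_mutex_unlock(&p->lock);
	if (over)
		return NULL;

	/* The potentially slow allocation runs without the lock held. */
	page = malloc(size);
	if (!page)
		return NULL;

	/* Recheck: the limit or the surplus count may have changed. */
	pthread_mutex_lock(&p->lock);
	if (p->surplus >= p->overcommit) {
		pthread_mutex_unlock(&p->lock);
		free(page);  /* drop the new page, wasting only CPU time */
		return NULL;
	}
	p->surplus++;
	pthread_mutex_unlock(&p->lock);
	return page;
}
```

      Accounting happens only after the recheck succeeds, so a racing caller
      never sees a surplus count that has to be rolled back.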
      
      If we consider races with the pool shrinking and enlarging then we
      should be reasonably safe as well.  In the first case we are off by one
      in the worst case and the second case should work OK because the page is
      not yet visible.  We can waste CPU cycles for the allocation but that
      should be acceptable for a relatively rare condition.
      
      Link: http://lkml.kernel.org/r/20180103093213.26329-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9980d744
    • Michal Hocko's avatar
      mm, hugetlb: do not rely on overcommit limit during migration · ab5ac90a
      Michal Hocko authored
      hugepage migration relies on __alloc_buddy_huge_page to get a new page.
      This has 2 main disadvantages.
      
      1) it doesn't allow to migrate any huge page if the pool is used
         completely which is not an exceptional case as the pool is static and
         unused memory is just wasted.
      
      2) it leads to a weird semantic when migration between two numa nodes
         might increase the pool size of the destination NUMA node while the
         page is in use.  The issue is caused by per NUMA node surplus pages
         tracking (see free_huge_page).
      
      Address both issues by changing how we allocate and account
      pages allocated for migration.  Those should be temporary by definition.  So
      we mark them that way (we will abuse page flags in the 3rd page) and
      update free_huge_page to free such pages to the page allocator.  The page
      migration path then just transfers the temporary status from the new page
      to the old one, which will be freed on the last reference.  The global
      surplus count will never change during this path but we still have to be
      careful when migrating a per-node surplus page.  This is now handled in
      move_hugetlb_state, which is called from the migration path; it copies
      the hugetlb specific page state and fixes up the accounting when needed.
      
      Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better
      reflect its purpose.  The new allocation routine for the migration path
      is __alloc_migrate_huge_page.
      
      The user visible effect of this patch is that migrated pages are really
      temporary and they travel between NUMA nodes as per the migration
      request:
      
      Before migration
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
      
      After
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
      
      With the previous implementation, both nodes would have nr_hugepages:1
      until the page is freed.
      
      Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab5ac90a