1. 30 Sep, 2019 4 commits
    • tools headers kvm: Sync kvm headers with the kernel sources · b7ad6108
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        200824f5 ("KVM: s390: Disallow invalid bits in kvm_valid_regs and kvm_dirty_regs")
        4a53d99d ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
        7396d337 ("KVM: x86: Return to userspace with internal error on unexpected exit reason")
        92f35b75 ("KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE")
      
      None of them trigger any changes in tooling; this time it is just to silence
      these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
        diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
        Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
        diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
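
The warnings above come from comparing each copy under tools/ with its counterpart in the kernel tree. A minimal sketch of that idea follows, assuming a plain byte-for-byte comparison; `check_header` is a hypothetical helper written for illustration, and the real check in the perf build is more elaborate than this.

```bash
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not the perf build's
# actual check): warn when the tools/ copy of a header no longer matches the
# kernel copy.
#   $1: root of the kernel tree
#   $2: header path relative to that root, e.g. include/uapi/linux/kvm.h
check_header() {
	local root=$1 hdr=$2

	# cmp -s is silent and returns non-zero when the files differ
	if ! cmp -s "$root/tools/$hdr" "$root/$hdr"; then
		echo "Warning: Kernel ABI header at 'tools/$hdr' differs from latest version at '$hdr'"
		echo "diff -u tools/$hdr $hdr"
	fi
}
```

Called as `check_header . include/uapi/linux/kvm.h` from the top of a kernel tree, it prints nothing once the two copies are identical, which is the state these sync commits restore.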
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Huth <thuth@redhat.com>
      Link: https://lkml.kernel.org/n/tip-akuugvvjxte26kzv23zp5d2z@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b7ad6108
    • tools headers uapi: Sync linux/fs.h with the kernel sources · 0ae40612
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        78a1b96b ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl")
        23c688b5 ("fscrypt: allow unprivileged users to add/remove keys for v2 policies")
        5dae460c ("fscrypt: v2 encryption policy support")
        5a7e2992 ("fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl")
        b1c0ec35 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
        22d94f49 ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
        3b6df59b ("fscrypt: use FSCRYPT_* definitions, not FS_*")
        2336d0de ("fscrypt: use FSCRYPT_ prefix for uapi constants")
        7af0ab0d ("fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>")
      
      These don't trigger any changes in tooling, as so far fs.h is used only
      for:
      
        $ grep -l 'fs\.h' tools/perf/trace/beauty/*.sh | xargs grep regex=
        tools/perf/trace/beauty/rename_flags.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+RENAME_([[:alnum:]_]+)[[:space:]]+\(1[[:space:]]*<<[[:space:]]*([[:xdigit:]]+)[[:space:]]*\)[[:space:]]*.*'
        tools/perf/trace/beauty/sync_file_range.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+SYNC_FILE_RANGE_([[:alnum:]_]+)[[:space:]]+([[:xdigit:]]+)[[:space:]]*.*'
        tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)(\(\w+\))?[[:space:]]+_IO[CWR]{0,2}\([[:space:]]*(_IOC_\w+,[[:space:]]*)?'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
        tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)[[:space:]]+_IO[WR]{0,2}\([[:space:]]*'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
        $
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
        diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-44g48exl9br9ba0t64chqb4i@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      0ae40612
    • tools headers uapi: Sync linux/usbdevice_fs.h with the kernel sources · 05f371f8
      Arnaldo Carvalho de Melo authored
      To pick up the changes from:
      
        4ed33505 ("USB: usbfs: Add a capability flag for runtime suspend")
        7794f486 ("usbfs: Add ioctls for runtime power management")
      
      This triggers these changes in the kernel sources, automagically
      supporting these new ioctls in the 'perf trace' beautifiers.
      
      Soon this will be used in things like filter expressions for tracepoints
      in 'perf record', 'perf trace', 'perf top', i.e. filter expressions will
      do a lookup to turn things like USBDEVFS_WAIT_FOR_RESUME into _IO('U',
      35) before associating the tracepoint expression to tracepoint perf
      event.
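
As a rough illustration of the lookup described above, the numeric value behind `_IO('U', 35)` can be computed by hand. This is a sketch that assumes the asm-generic ioctl layout (some architectures encode the direction bits differently); `io_nr` is a hypothetical name used only here.

```bash
#!/bin/bash
# Assumption: asm-generic layout, i.e. nr in bits 0-7, type in bits 8-15,
# size in bits 16-29 and direction in bits 30-31. _IO() carries no data, so
# size is 0 and the direction is _IOC_NONE (0 in this layout).
# io_nr is a hypothetical helper, not something perf provides.
io_nr() {
	local type_char=$1 nr=$2 type_code

	# printf with a leading quote yields the character code ('U' -> 85)
	type_code=$(printf '%d' "'$type_char")
	printf '0x%x\n' $(( (type_code << 8) | nr ))
}

io_nr U 35	# USBDEVFS_WAIT_FOR_RESUME, prints 0x5523
```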
      
        $ tools/perf/trace/beauty/usbdevfs_ioctl.sh  > before
        $ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
        $ git diff
        diff --git a/tools/include/uapi/linux/usbdevice_fs.h b/tools/include/uapi/linux/usbdevice_fs.h
        index 78efe870c2b7..cf525cddeb94 100644
        --- a/tools/include/uapi/linux/usbdevice_fs.h
        +++ b/tools/include/uapi/linux/usbdevice_fs.h
        @@ -158,6 +158,7 @@ struct usbdevfs_hub_portinfo {
         #define USBDEVFS_CAP_MMAP                      0x20
         #define USBDEVFS_CAP_DROP_PRIVILEGES           0x40
         #define USBDEVFS_CAP_CONNINFO_EX               0x80
        +#define USBDEVFS_CAP_SUSPEND                   0x100
      
         /* USBDEVFS_DISCONNECT_CLAIM flags & struct */
      
        @@ -223,5 +224,8 @@ struct usbdevfs_streams {
          * extending size of the data returned.
          */
         #define USBDEVFS_CONNINFO_EX(len)  _IOC(_IOC_READ, 'U', 32, len)
        +#define USBDEVFS_FORBID_SUSPEND    _IO('U', 33)
        +#define USBDEVFS_ALLOW_SUSPEND     _IO('U', 34)
        +#define USBDEVFS_WAIT_FOR_RESUME   _IO('U', 35)
      
         #endif /* _UAPI_LINUX_USBDEVICE_FS_H */
        $ tools/perf/trace/beauty/usbdevfs_ioctl.sh  > after
        $ diff -u before after
        --- before	2019-09-27 11:41:50.634867620 -0300
        +++ after	2019-09-27 11:42:07.453102978 -0300
        @@ -24,6 +24,9 @@
         	[30] = "DROP_PRIVILEGES",
         	[31] = "GET_SPEED",
         	[32] = "CONNINFO_EX",
        +	[33] = "FORBID_SUSPEND",
        +	[34] = "ALLOW_SUSPEND",
        +	[35] = "WAIT_FOR_RESUME",
         	[3] = "RESETEP",
         	[4] = "SETINTERFACE",
         	[5] = "SETCONFIGURATION",
        $
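
The before/after tables in the session above are produced by matching `#define` lines in the header. A simplified sketch of that kind of extraction is shown below; it is not the regex used by usbdevfs_ioctl.sh, it only handles plain `_IO('U', nr)` definitions, and `usbdevfs_table` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical, simplified extractor: read header text on stdin and print one
# C array initializer line per USBDEVFS_* ioctl defined as _IO('U', nr).
# Definitions taking arguments or using _IOR/_IOW/_IOC are deliberately
# ignored in this sketch.
usbdevfs_table() {
	local re="^#[[:space:]]*define[[:space:]]+USBDEVFS_([[:alnum:]_]+)[[:space:]]+_IO\([[:space:]]*'U'[[:space:]]*,[[:space:]]*([[:digit:]]+)[[:space:]]*\)"
	local line

	while IFS= read -r line; do
		if [[ $line =~ $re ]]; then
			# BASH_REMATCH[1] is the name, BASH_REMATCH[2] the number
			printf '\t[%s] = "%s",\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
		fi
	done
}
```

Feeding it the three new definitions, for example with `usbdevfs_table < tools/include/uapi/linux/usbdevice_fs.h`, would be expected to yield entries of the same shape as the `[33]`, `[34]` and `[35]` lines in the diff above.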
      
      This addresses the following perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
        diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-x1rb109b9nfi7pukota82xhj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      05f371f8
    • tools headers uapi: Sync asm-generic/mman-common.h with the kernel · b1ba55cf
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        1a4e58cc ("mm: introduce MADV_PAGEOUT")
        9c276cc6 ("mm: introduce MADV_COLD")
      
      That result in these changes in the tools:
      
        $ tools/perf/trace/beauty/madvise_behavior.sh > before
        $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
        $ git diff
        diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
        index 63b1f506ea67..c160a5354eb6 100644
        --- a/tools/include/uapi/asm-generic/mman-common.h
        +++ b/tools/include/uapi/asm-generic/mman-common.h
        @@ -67,6 +67,9 @@
         #define MADV_WIPEONFORK 18             /* Zero memory on fork, child only */
         #define MADV_KEEPONFORK 19             /* Undo MADV_WIPEONFORK */
      
        +#define MADV_COLD      20              /* deactivate these pages */
        +#define MADV_PAGEOUT   21              /* reclaim these pages */
        +
         /* compatibility flags */
         #define MAP_FILE       0
      
        $ tools/perf/trace/beauty/madvise_behavior.sh > after
        $ diff -u before after
        --- before	2019-09-27 11:29:43.346320100 -0300
        +++ after	2019-09-27 11:30:03.838570439 -0300
        @@ -16,6 +16,8 @@
         	[17] = "DODUMP",
         	[18] = "WIPEONFORK",
         	[19] = "KEEPONFORK",
        +	[20] = "COLD",
        +	[21] = "PAGEOUT",
         	[100] = "HWPOISON",
         	[101] = "SOFT_OFFLINE",
         };
        $
      
      I.e. now when madvise gets those behaviours as args, it will be able to
      translate from the number to a human readable string.
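
The number-to-string translation mentioned above amounts to a sparse table lookup. A minimal sketch, assuming only the entries visible in the diff and a fallback of printing the raw number for unknown values (both the function name and the fallback are assumptions, not perf's behaviour):

```bash
#!/bin/bash
# Hypothetical lookup mirroring a few entries of the generated table.
# Unknown values fall back to the number itself; that fallback is an
# assumption made for this sketch.
madvise_behavior_name() {
	local nr=$1
	local -a names=(
		[18]="WIPEONFORK"
		[19]="KEEPONFORK"
		[20]="COLD"
		[21]="PAGEOUT"
	)

	printf '%s\n' "${names[$nr]:-$nr}"
}

madvise_behavior_name 21	# prints PAGEOUT
```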
      
      This addresses the following perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
        diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b1ba55cf
  2. 27 Sep, 2019 4 commits
  3. 26 Sep, 2019 32 commits
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · da05b5ea
      Linus Torvalds authored
      Pull timer fix from Ingo Molnar:
       "Fix a timer expiry bug that would cause spurious delay of timers"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timer: Read jiffies once when forwarding base clk
      da05b5ea
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a7b7b772
      Linus Torvalds authored
      Pull more perf updates from Ingo Molnar:
       "The only kernel change is comment typo fixes.
      
        The rest is mostly tooling fixes, but also new vendor event additions
        and updates, a bigger libperf/libtraceevent library and a header files
        reorganization that came in a bit late"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (108 commits)
        perf unwind: Fix libunwind build failure on i386 systems
        perf parser: Remove needless include directives
        perf build: Add detection of java-11-openjdk-devel package
        perf jvmti: Include JVMTI support for s390
        perf vendor events: Remove P8 HW events which are not supported
        perf evlist: Fix access of freed id arrays
        perf stat: Fix free memory access / memory leaks in metrics
        perf tools: Replace needless mmap.h with what is needed, event.h
        perf evsel: Move config terms to a separate header
        perf evlist: Remove unused perf_evlist__fprintf() method
        perf evsel: Introduce evsel_fprintf.h
        perf evsel: Remove need for symbol_conf in evsel_fprintf.c
        perf copyfile: Move copyfile routines to separate files
        libperf: Add perf_evlist__poll() function
        libperf: Add perf_evlist__add_pollfd() function
        libperf: Add perf_evlist__alloc_pollfd() function
        libperf: Add libperf_init() call to the tests
        libperf: Merge libperf_set_print() into libperf_init()
        libperf: Add libperf dependency for tests targets
        libperf: Use sys/types.h to get ssize_t, not unistd.h
        ...
      a7b7b772
    • Merge tag 'trace-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 7897c04a
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Srikar Dronamraju fixed a bug in the new multi probe code"
      
      * tag 'trace-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/probe: Fix same probe event argument matching
      7897c04a
    • perf unwind: Fix libunwind build failure on i386 systems · 26acf400
      Arnaldo Carvalho de Melo authored
      Naresh Kamboju reported that on the i386 build pr_err()
      doesn't get defined properly due to header ordering:
      
        perf-in.o: In function `libunwind__x86_reg_id':
        tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109:
        undefined reference to `pr_err'

      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      26acf400
    • Merge tag 'usercopy-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 0576f060
      Linus Torvalds authored
      Pull usercopy fix from Kees Cook:
       "Fix hardened usercopy under CONFIG_DEBUG_VIRTUAL"
      
      * tag 'usercopy-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        usercopy: Avoid HIGHMEM pfn warning
      0576f060
    • Merge tag 'linux-kselftest-5.4-rc1.1' of... · 797a3242
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.4-rc1.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest updates from Shuah Khan:
       "Fixes to existing tests"
      
      * tag 'linux-kselftest-5.4-rc1.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: tpm2: install python files
        selftests: livepatch: add missing fragments to config
        selftests: watchdog: cleanup whitespace in usage options
        selftest/ftrace: Fix typo in trigger-snapshot.tc
        selftests: watchdog: Add optional file argument
        selftests/seccomp: fix build on older kernels
        selftests: use "$(MAKE)" instead of "make"
      797a3242
    • Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 972a2bf7
      Linus Torvalds authored
      Pull NFS client updates from Anna Schumaker:
       "Stable bugfixes:
         - Dequeue the request from the receive queue while we're re-encoding
           # v4.20+
         - Fix buffer handling of GSS MIC without slack # 5.1
      
        Features:
         - Increase xprtrdma maximum transport header and slot table sizes
         - Add support for nfs4_call_sync() calls using a custom
           rpc_task_struct
         - Optimize the default readahead size
         - Enable pNFS filelayout LAYOUTGET on OPEN
      
        Other bugfixes and cleanups:
         - Fix possible null-pointer dereferences and memory leaks
         - Various NFS over RDMA cleanups
         - Various NFS over RDMA comment updates
         - Don't receive TCP data into a reset request buffer
         - Don't try to parse incomplete RPC messages
         - Fix congestion window race with disconnect
         - Clean up pNFS return-on-close error handling
         - Fixes for NFS4ERR_OLD_STATEID handling"
      
      * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
        pNFS/filelayout: enable LAYOUTGET on OPEN
        NFS: Optimise the default readahead size
        NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
        NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
        NFSv4: Fix OPEN_DOWNGRADE error handling
        pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
        NFSv4: Add a helper to increment stateid seqids
        NFSv4: Handle RPC level errors in LAYOUTRETURN
        NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
        NFSv4: Clean up pNFS return-on-close error handling
        pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
        NFS: remove unused check for negative dentry
        NFSv3: use nfs_add_or_obtain() to create and reference inodes
        NFS: Refactor nfs_instantiate() for dentry referencing callers
        SUNRPC: Fix congestion window race with disconnect
        SUNRPC: Don't try to parse incomplete RPC messages
        SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
        SUNRPC: Fix buffer handling of GSS MIC without slack
        SUNRPC: RPC level errors should always set task->tk_rpc_status
        SUNRPC: Don't receive TCP data into a request buffer that has been reset
        ...
      972a2bf7
    • binfmt_elf: Do not move brk for INTERP-less ET_EXEC · 7be3cb01
      Kees Cook authored
      When brk was moved for binaries without an interpreter, it should have
      been limited to ET_DYN only. In other words, the special case was an
      ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
      The bug manifested for giant static executables, where the brk would end
      up in the middle of the text area on 32-bit architectures.

      Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in>
      Fixes: bbdc6076 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7be3cb01
    • Merge tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 2268419e
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "There are a couple of bug fixes and some small code cleanups that came
        in recently:
      
         - Minor code cleanups
      
         - Fix a superblock logging error
      
         - Ensure that collapse range converts the data fork to extents format
           when necessary
      
         - Revert the ALLOC_USERDATA cleanup because it caused subtle behavior
           regressions"
      
      * tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: avoid unused to_mp() function warning
        xfs: log proper length of superblock
        xfs: revert 1baa2800 ("xfs: remove the unused XFS_ALLOC_USERDATA flag")
        xfs: removed unneeded variable
        xfs: convert inode to extent format after extent merge due to shift
      2268419e
    • Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · dadedd85
      Linus Torvalds authored
      Pull jffs2 fix from Al Viro:
       "braino fix for mount API conversion for jffs2"
      
      * 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        jffs2: Fix mounting under new mount API
      dadedd85
    • Merge tag 's390-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 16cdf084
      Linus Torvalds authored
      Pull more s390 updates from Vasily Gorbik:
      
       - Fix three kasan findings
      
       - Add PERF_EVENT_IOC_PERIOD ioctl support
      
       - Add Crypto Express7S support and extend sysfs attributes for pkey
      
       - Minor common I/O layer documentation corrections
      
      * tag 's390-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/cio: exclude subchannels with no parent from pseudo check
        s390/cio: avoid calling strlen on null pointer
        s390/topology: avoid firing events before kobjs are created
        s390/cpumf: Remove mixed white space
        s390/cpum_sf: Support ioctl PERF_EVENT_IOC_PERIOD
        s390/zcrypt: CEX7S exploitation support
        s390/cio: fix intparm documentation
        s390/pkey: Add sysfs attributes to emit AES CIPHER key blobs
      16cdf084
    • Merge tag 'for-linus-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · ec56103e
      Linus Torvalds authored
      Pull xen update from Juergen Gross:
       "Only two small patches this time:
      
         - a small cleanup for swiotlb-xen
      
         - a fix for PCI initialization for some platforms"
      
      * tag 'for-linus-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pci: reserve MCFG areas earlier
        swiotlb-xen: Convert to use macro
      ec56103e
    • Merge branch 'akpm' (patches from Andrew) · cbafe18c
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
      
       - almost all of the rest of -mm
      
       - various other subsystems
      
      Subsystems affected by this patch series:
        memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork,
        cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise,
        cleanups, pagemap
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits)
        arch/sparc/include/asm/pgtable_64.h: fix build
        mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
        ntfs: remove (un)?likely() from IS_ERR() conditions
        IB/hfi1: remove unlikely() from IS_ERR*() condition
        xfs: remove unlikely() from WARN_ON() condition
        wimax/i2400m: remove unlikely() from WARN*() condition
        fs: remove unlikely() from WARN_ON() condition
        xen/events: remove unlikely() from WARN() condition
        checkpatch: check for nested (un)?likely() calls
        hexagon: drop empty and unused free_initrd_mem
        mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
        mm: introduce MADV_PAGEOUT
        mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
        mm: introduce MADV_COLD
        mm: untag user pointers in mmap/munmap/mremap/brk
        vfio/type1: untag user pointers in vaddr_get_pfn
        tee/shm: untag user pointers in tee_shm_register
        media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
        drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
        drm/amdgpu: untag user pointers
        ...
      cbafe18c
    • arch/sparc/include/asm/pgtable_64.h: fix build · a22fea94
      Andrew Morton authored
      A last-minute fixlet which I'd failed to merge at the appropriate time
      had the predictable effect.
      
      Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user")
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a22fea94
    • mm: treewide: clarify pgtable_page_{ctor,dtor}() naming · b4ed71f5
      Mark Rutland authored
      The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
      people, and until recently arm64 used these erroneously/pointlessly for
      other levels of page table.
      
      To make it incredibly clear that these only apply to the PTE level, and to
      align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
      to pgtable_pte_page_{ctor,dtor}().
      
      These changes were generated with the following shell script:
      
      ----
      git grep -lw 'pgtable_page_.tor' | while read FILE; do
          sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
          sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
      done
      ----
      
      ... with the documentation re-flowed to remain under 80 columns, and
      whitespace fixed up in macros to keep backslashes aligned.
      
      There should be no functional change as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4ed71f5
    • ntfs: remove (un)?likely() from IS_ERR() conditions · cc22c800
      Denis Efremov authored
      "likely(!IS_ERR(x))" is excessive. IS_ERR() already uses
      unlikely() internally.
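
Redundant wrappers of this kind can be located with a text search. The sketch below is a grep-based heuristic written for illustration; it is not the check that checkpatch implements, `find_nested_likely` is a hypothetical name, and the list of wrapped macros is an assumption based on the commits in this series.

```bash
#!/bin/bash
# Hypothetical heuristic: print (with line numbers) lines where likely() or
# unlikely() directly wraps IS_ERR*() or WARN*(), optionally negated.
# Reads the named files, or stdin when no file is given.
find_nested_likely() {
	grep -nE '(^|[^[:alnum:]_])(un)?likely[[:space:]]*\([[:space:]]*!?[[:space:]]*(IS_ERR(_OR_NULL|_VALUE)?|WARN(_ON(_ONCE)?|_ONCE)?)[[:space:]]*\(' "$@"
}
```

Run against a source file, for example `find_nested_likely path/to/file.c`, it lists candidate lines to review by hand; being purely textual, it can miss calls split across lines.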
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-11-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc22c800
    • IB/hfi1: remove unlikely() from IS_ERR*() condition · 7b0b6925
      Denis Efremov authored
      "unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
      unlikely() internally.
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Joe Perches <joe@perches.com>
      Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b0b6925
    • xfs: remove unlikely() from WARN_ON() condition · 14ed8688
      Denis Efremov authored
      "unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
      internally.
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-7-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14ed8688
    • wimax/i2400m: remove unlikely() from WARN*() condition · 77c0e745
      Denis Efremov authored
      "unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
      internally.
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-6-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      77c0e745
    • fs: remove unlikely() from WARN_ON() condition · 7159d544
      Denis Efremov authored
      "unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely()
      internally.
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-5-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7159d544
    • xen/events: remove unlikely() from WARN() condition · 89f40354
      Denis Efremov authored
      "unlikely(WARN(x))" is excessive. WARN() already uses unlikely()
      internally.
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-4-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Joe Perches <joe@perches.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89f40354
    • jffs2: Fix mounting under new mount API · a3bc18a4
      David Howells authored
      The mounting of jffs2 is broken due to the changes from the new mount API
      because it specifies a "source" operation, but then doesn't actually
      process it.  But because it specified it, it doesn't return -ENOPARAM and
      the caller doesn't process it either and the source gets lost.
      
      Fix this by simply removing the source parameter from jffs2 and letting the
      VFS deal with it in the default manner.
      
      To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase
      block size parameters, then try to mount the resulting /dev/mtdblock<N>
      device as jffs2.  No need to initialise it.
      
      Fixes: ec10a24f ("vfs: Convert jffs2 to use the new mount API")
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: David Woodhouse <dwmw2@infradead.org>
      cc: Richard Weinberger <richard@nod.at>
      cc: linux-mtd@lists.infradead.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a3bc18a4
    • Merge tag 'perf-core-for-mingo-5.5-20190925' of... · b11f7244
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.5-20190925' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf record:
      
        Stephane Eranian:
      
        - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks
          if perf_event_attr.exclude_hv is set in addition to .exclude_kernel,
          so reset both to zero.
      
        Arnaldo Carvalho de Melo:
      
        - Don't warn about not being able to read kernel maps (kallsyms, etc) when
          kernel samples aren't being collected.
      
      perf list:
      
        Kim Phillips:
      
        - Allow plurals for metric, metricgroup, i.e.:
      
          $ perf list metrics
      
          was showing nothing, which is very confusing, make it work like:
      
          $ perf list metric
      
      perf stat:
      
        Andi Kleen:
      
        - Free memory access/leaks detected via valgrind, related to metrics.
      
      Libraries:
      
      libperf:
      
        Jiri Olsa:
      
        - Move more stuff from tools/perf, this time a first stab at moving perf_mmap
          methods.
      
      libtraceevent:
      
        Steven Rostedt (VMware):
      
        - Round up in tep_print_event() time precision.
      
        Tzvetomir Stoyanov (VMware):
      
        - Man pages for event print and related and plugins APIs.
      
        - Move traceevent plugins into their own subdirectory.
      
      Feature detection:
      
        Thomas Richter:
      
        - Add detection of java-11-openjdk-devel package, in addition to the older
          versions supported.
      
      Architecture specific:
      
      S/390:
      
        Thomas Richter (2):
      
        - Include JVMTI support for s390
      
      Vendor events:
      
      AMD:
      
        Kim Phillips:
      
        - Add L3 cache events for Family 17h.
      
        - Remove redundant '['.
      
      PowerPC:
      
        Mamatha Inamdar:
      
        - Remove P8 HW events which are not supported.
      
      Cleanups:
      
        Arnaldo Carvalho de Melo:
      
        - Remove needless headers, add needed ones, move things around to reduce the
          headers dependency tree, speeding up builds by not doing needless compiles
          when unrelated stuff gets changed.
      
        - Ditch unused code that was dragging headers.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b11f7244
    • Denis Efremov's avatar
      checkpatch: check for nested (un)?likely() calls · de3f186f
      Denis Efremov authored
      IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain
      unlikely() optimization internally.  Thus, there is no point in calling
      these functions and defines under likely()/unlikely().
      
      This check is based on the coccinelle rule developed by Enrico Weigelt
      https://lore.kernel.org/lkml/1559767582-11081-1-git-send-email-info@metux.net/
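
      As an illustration of the pattern being flagged, here is a userspace
      sketch.  The macros below are simplified approximations written for this
      example (they assume a compiler with __builtin_expect, such as GCC or
      Clang) and are not copied from the kernel headers; check_nested() and
      check_plain() are hypothetical callers.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace approximations of the kernel helpers. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define MAX_ERRNO 4095
/* The branch hint already lives inside the helper. */
#define IS_ERR_VALUE(x) unlikely((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline int IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE(ptr);
}

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* Redundant: wraps a helper that already contains unlikely(). */
int check_nested(const void *p)
{
    if (unlikely(IS_ERR(p)))
        return -1;
    return 0;
}

/* Equivalent and simpler: rely on the hint inside IS_ERR(). */
int check_plain(const void *p)
{
    if (IS_ERR(p))
        return -1;
    return 0;
}
```

      Both functions return the same results; the outer unlikely() in
      check_nested() adds nothing.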
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-1-efremov@linux.com
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Boris Pismenny <borisp@mellanox.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Denis Efremov <efremov@linux.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Sean Paul <sean@poorly.run>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de3f186f
    • Mike Rapoport's avatar
      hexagon: drop empty and unused free_initrd_mem · c7cc8d77
      Mike Rapoport authored
      hexagon never reserves or initializes initrd and the only mention of it is
      the empty free_initrd_mem() function.
      
      As we have a generic implementation of free_initrd_mem(), there is no need
      to define an empty stub for the hexagon implementation and it can be
      dropped.
      
      Link: http://lkml.kernel.org/r/1565858133-25852-1-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7cc8d77
    • Minchan Kim's avatar
      mm: factor out common parts between MADV_COLD and MADV_PAGEOUT · d616d512
      Minchan Kim authored
      There are many common parts between MADV_COLD and MADV_PAGEOUT.
      This patch factors them out to reduce code duplication.
      
      Link: http://lkml.kernel.org/r/20190726023435.214162-6-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d616d512
    • Minchan Kim's avatar
      mm: introduce MADV_PAGEOUT · 1a4e58cc
      Minchan Kim authored
      When a process expects no accesses to a certain memory range for a long
      time, it could hint to the kernel that the pages can be reclaimed instantly
      while the data is preserved for future use.  This could reduce workingset
      eviction so it ends up increasing performance.
      
      This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall.
      MADV_PAGEOUT can be used by a process to mark a memory range as not
      expected to be used for a long time so that kernel reclaims *any LRU*
      pages instantly.  The hint can help kernel in deciding which pages to
      evict proactively.
      
      A note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
      isolation limit because it's automatically bounded by PMD size.  If the PMD
      size (e.g., 256) causes trouble, we could fix it later by limiting it to
      SWAP_CLUSTER_MAX[1].
      
      - man-page material
      
      MADV_PAGEOUT (since Linux x.x)
      
      Do not expect access in the near future, so pages in the specified
      regions can be reclaimed instantly regardless of memory pressure.
      Thus, an access in the range after a successful operation could cause a
      major page fault but never loses the up-to-date contents, unlike
      MADV_DONTNEED. Pages belonging to a shared mapping are only processed
      if write access is allowed for the calling process.
      
      MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
      VM_PFNMAP pages.
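
      A minimal userspace sketch of the hint described above.  It assumes a
      Linux system; the fallback value 21 for MADV_PAGEOUT is the asm-generic
      definition and is only an assumption for libc headers that predate the
      hint.  hint_pageout() and pageout_keeps_contents() are hypothetical
      helpers written for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21   /* assumption: asm-generic value, for older headers */
#endif

/* Ask the kernel to reclaim [addr, addr + len) now.  Returns 0 or -errno. */
int hint_pageout(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    return madvise(addr, len, MADV_PAGEOUT) == 0 ? 0 : -errno;
}

/*
 * Fill an anonymous private mapping, hint it away, then read it back.
 * Returns 1 if every byte survived, 0 if not, -1 if mmap() failed.
 * The result of the hint is stored in *hint_rc; kernels that do not
 * know MADV_PAGEOUT are expected to report -EINVAL there.
 */
int pageout_keeps_contents(size_t len, int *hint_rc)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    size_t i;
    int intact = 1;

    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0xA5, len);
    *hint_rc = hint_pageout(buf, len);
    for (i = 0; i < len; i++) {
        if (buf[i] != 0xA5) {   /* may take a major fault, must not lose data */
            intact = 0;
            break;
        }
    }
    munmap(buf, len);
    return intact;
}
```

      Unlike MADV_DONTNEED, reading the range back after the hint is expected
      to return the original bytes, whether or not the pages were actually
      written to swap.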
      
      [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/
      
      [minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
        Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a4e58cc
    • Minchan Kim's avatar
      mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM · 8940b34a
      Minchan Kim authored
      The local variable "references" in shrink_page_list defaults to
      PAGEREF_RECLAIM_CLEAN.  This prevents reclaiming dirty pages when CMA tries
      to migrate pages.  Strictly speaking, we don't need it because CMA already
      disallows writeout via .may_writepage = 0 in reclaim_clean_pages_from_list.
      
      Moreover, it would prevent anonymous pages from being swapped out even
      with force_reclaim = true in shrink_page_list in an upcoming patch.  So
      this patch changes the default value of "references" to PAGEREF_RECLAIM and
      renames force_reclaim to ignore_references to make it clearer.
      
      This is preparatory work for the next patch.
      
      Link: http://lkml.kernel.org/r/20190726023435.214162-3-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8940b34a
    • Minchan Kim's avatar
      mm: introduce MADV_COLD · 9c276cc6
      Minchan Kim authored
      Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
      
      - Background
      
      The Android terminology used for forking a new process and starting an app
      from scratch is a cold start, while resuming an existing app is a hot
      start.  While we continually try to improve the performance of cold
      starts, hot starts will always be significantly less power hungry as well
      as faster so we are trying to make hot start more likely than cold start.
      
      To increase hot start, Android userspace manages the order that apps
      should be killed in a process called ActivityManagerService.
      ActivityManagerService tracks every Android app or service that the user
      could be interacting with at any time and translates that into a ranked
      list for lmkd(low memory killer daemon).  They are likely to be killed by
      lmkd if the system has to reclaim memory.  In that sense they are similar
      to entries in any other cache.  Those apps are kept alive for
      opportunistic performance improvements but those performance improvements
      will vary based on the memory requirements of individual workloads.
      
      - Problem
      
      Naturally, cached apps were dominant consumers of memory on the system.
      However, they were not significant consumers of swap even though they are
      good candidates for swap.  On investigation, swapping out only begins
      once the low zone watermark is hit and kswapd wakes up, but the overall
      allocation rate in the system might trip lmkd thresholds and cause a
      cached process to be killed (we measured the performance of swapping out
      vs. zapping the memory by killing a process.  Unsurprisingly, zapping is
      10x faster even though we use zram, which is much faster than real
      storage), so a kill from lmkd will often satisfy the high zone watermark,
      resulting in very few pages actually being moved to swap.
      
      - Approach
      
      The approach we chose was to use a new interface to allow userspace to
      proactively reclaim entire processes by leveraging platform information.
      This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
      that are known to be cold from userspace and to avoid races with lmkd by
      reclaiming apps as soon as they entered the cached state.  Additionally,
      it gives the platform many opportunities to use the information it has to
      optimize memory efficiency.
      
      To achieve the goal, the patchset introduces two new options for madvise.
      One is MADV_COLD, which will deactivate activated pages, and the other is
      MADV_PAGEOUT, which will reclaim private pages instantly.  These new
      options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
      ways to gain some free memory space.  MADV_PAGEOUT is similar to
      MADV_DONTNEED in that it hints to the kernel that the memory region is not
      currently needed and should be reclaimed immediately; MADV_COLD is similar
      to MADV_FREE in that it hints to the kernel that the memory region is not
      currently needed and should be reclaimed when memory pressure rises.
      
      This patch (of 5):
      
      When a process expects no accesses to a certain memory range, it could
      give a hint to kernel that the pages can be reclaimed when memory pressure
      happens but data should be preserved for future use.  This could reduce
      workingset eviction so it ends up increasing performance.
      
      This patch introduces the new MADV_COLD hint to madvise(2) syscall.
      MADV_COLD can be used by a process to mark a memory range as not expected
      to be used in the near future.  The hint can help kernel in deciding which
      pages to evict early during memory pressure.
      
      It works for every LRU page like MADV_[DONTNEED|FREE]. IOW, it moves
      
      	active file page -> inactive file LRU
      	active anon page -> inactive anon LRU
      
      Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
      file LRU's head because MADV_COLD has slightly different semantics.
      MADV_FREE means it's okay to discard the pages under memory pressure because
      the content of the page is *garbage*, so freeing such pages has almost zero
      overhead since we don't need to swap out and a later access causes just a
      minor fault.  Thus, it makes sense to put those freeable pages in the
      inactive file LRU to compete with other used-once pages.  It makes sense
      from an implementation point of view, too, because the memory is no longer
      swap-backed until it is re-dirtied.  It even gives a bonus in that they can
      be reclaimed on a swapless system.  However, MADV_COLD doesn't mean
      garbage, so reclaiming them requires swap-out/in in the end, which is a
      bigger cost.  Since we have designed VM LRU aging based on a cost model,
      anonymous cold pages are better positioned on the inactive anon LRU list,
      not the file LRU.  Furthermore, it helps to avoid unnecessary scanning if
      the system doesn't have a swap device.  Let's start the simpler way without
      adding complexity at this moment.  However, keep in mind the caveat
      that workloads with a lot of page cache are likely to ignore MADV_COLD on
      anonymous memory because we rarely age anonymous LRU lists.
      
      * man-page material
      
      MADV_COLD (since Linux x.x)
      
      Pages in the specified regions will be treated as less-recently-accessed
      compared to pages in the system with similar access frequencies.  In
      contrast to MADV_FREE, the contents of the region are preserved regardless
      of subsequent writes to pages.
      
      MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
      pages.
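
      A minimal sketch of how userspace might apply the hint to part of a
      buffer.  madvise(2) requires a page-aligned start address, so this
      example shrinks the range inward to whole pages first; that choice, the
      helper names whole_pages() and mark_cold(), and the fallback value 20 for
      MADV_COLD (the asm-generic definition, for libc headers that predate the
      hint) are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20   /* assumption: asm-generic value, for older headers */
#endif

/*
 * Shrink [addr, addr + len) to the whole pages it fully contains.
 * page must be a power of two.  Returns 1 and fills the outputs,
 * or 0 if no whole page fits in the range.
 */
int whole_pages(uintptr_t addr, size_t len, size_t page,
                uintptr_t *out_addr, size_t *out_len)
{
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t first = (addr + page - 1) & mask;   /* round start up */
    uintptr_t last = (addr + len) & mask;         /* round end down */

    if (last <= first)
        return 0;
    *out_addr = first;
    *out_len = last - first;
    return 1;
}

/* Mark the whole pages inside buf as cold.  Returns 0 or -errno. */
int mark_cold(void *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start;
    size_t span;

    assert(page > 0);
    if (!whole_pages((uintptr_t)buf, len, (size_t)page, &start, &span))
        return 0;                                 /* nothing to hint */
    if (madvise((void *)start, span, MADV_COLD) == 0)
        return 0;
    return -errno;   /* e.g. -EINVAL on kernels without MADV_COLD */
}
```

      Because the hint is non-destructive, a caller can treat any error from
      mark_cold() as advisory and carry on.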
      
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c276cc6
    • Catalin Marinas's avatar
      mm: untag user pointers in mmap/munmap/mremap/brk · ce18d171
      Catalin Marinas authored
      There isn't a good reason to differentiate between the user address space
      layout modification syscalls and the other memory permission/attributes
      ones (e.g.  mprotect, madvise) w.r.t.  the tagged address ABI.  Untag the
      user addresses on entry to these functions.
      
      Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Dave P Martin <Dave.Martin@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce18d171
    • Andrey Konovalov's avatar
      vfio/type1: untag user pointers in vaddr_get_pfn · 6cf5354c
      Andrey Konovalov authored
      This patch is part of a series that extends the kernel ABI to allow passing
      tagged user pointers (with the top byte set to something other than
      0x00) as syscall arguments.
      
      vaddr_get_pfn() uses provided user pointers for vma lookups, which can
      only be done with untagged pointers.
      
      Untag user pointers in this function.
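
      To make the idea of untagging concrete, here is a toy userspace model.
      It is not the kernel helper: toy_set_tag() and toy_untag() are
      hypothetical names, and the model assumes 64-bit addresses whose tag
      lives in the top byte, with untagging done by sign-extending from bit 55
      so that both user-style and kernel-style addresses come out unchanged.
      It also relies on arithmetic right shift of signed values, which GCC and
      Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Place an 8-bit tag in the top byte of a 64-bit address. */
uint64_t toy_set_tag(uint64_t addr, uint8_t tag)
{
    return (addr & 0x00ffffffffffffffULL) | ((uint64_t)tag << 56);
}

/*
 * Drop the tag by sign-extending from bit 55: the top byte becomes
 * 0x00 for addresses with bit 55 clear and 0xff for those with it set.
 */
uint64_t toy_untag(uint64_t addr)
{
    return (uint64_t)((int64_t)(addr << 8) >> 8);
}

/* A lookup keyed on the raw address only matches after untagging. */
int toy_same_page(uint64_t a, uint64_t b)
{
    const uint64_t page_mask = ~0xfffULL;

    assert(page_mask != 0);
    return (toy_untag(a) & page_mask) == (toy_untag(b) & page_mask);
}
```

      A lookup that compared the tagged value directly would miss, which is why
      the address is untagged before it is used to find a vma.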
      
      Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jens Wiklander <jens.wiklander@linaro.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6cf5354c
    • Andrey Konovalov's avatar
      tee/shm: untag user pointers in tee_shm_register · 78063a9d
      Andrey Konovalov authored
      This patch is part of a series that extends the kernel ABI to allow passing
      tagged user pointers (with the top byte set to something other than
      0x00) as syscall arguments.
      
      tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided
      user pointers for vma lookups (via __check_mem_type()), which can only be
      done with untagged pointers.
      
      Untag user pointers in this function.
      
      Link: http://lkml.kernel.org/r/4b993f33196b3566ac81285ff8453219e2079b45.1563904656.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Auger <eric.auger@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78063a9d