- 31 Jul, 2014 5 commits
-
-
Dave Hansen authored
We don't have any good way to figure out what kinds of flushes are being attempted. Right now, we can try to use the vm counters, but those only tell us what we actually did with the hardware (one-by-one vs full) and don't tell us what was actually _requested_. This allows us to select out "interesting" TLB flushes that we might want to optimize (like the ranged ones) and ignore the ones that we have very little control over (the ones at context switch). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.comAcked-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Dave Hansen authored
There are currently three paths through the remote flush code: 1. full invalidation 2. single page invalidation using invlpg 3. ranged invalidation using invlpg This takes 2 and 3 and combines them in to a single path by making the single-page one just be the start and end be start plus a single page. This makes placement of our tracepoint easier. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154058.E0F90408@viggo.jf.intel.com Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Dave Hansen authored
If we take the if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) { local_flush_tlb(); goto out; } path out of flush_tlb_mm_range(), we will have flushed the tlb, but not incremented NR_TLB_LOCAL_FLUSH_ALL. This unifies the way out of the function so that we always take a single path when doing a full tlb flush. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154056.FF763B76@viggo.jf.intel.comAcked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Dave Hansen authored
I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.comAcked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Dave Hansen authored
The if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) line of code is not exactly the easiest to audit, especially when it ends up at two different indentation levels. This eliminates one of the the copy-n-paste versions. It also gives us a unified exit point for each path through this function. We need this in a minute for our tracepoint. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154054.44F1CDDC@viggo.jf.intel.comAcked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 12 Jun, 2014 1 commit
-
-
Jiri Kosina authored
If pagefault triggers due to SMEP triggering, it can't be really easily distinguished from any other oops-causing pagefault, which might lead to quite some confusion when trying to understand the reason for the oops. Print an explanatory message in case the fault happened during instruction fetch for _PAGE_USER page which is present and executable on SMEP-enabled CPUs. This is consistent with what we are doing for NX already; in addition to immediately seeing from the oops what might be happening, it can even easily give a good indication to sysadmins who are carefully monitoring their kernel logs that someone might be trying to pwn them. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1406102248490.1321@pobox.suse.czSigned-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 08 Jun, 2014 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 vdso build fix from Peter Anvin: "This fixes building the vdso code on older Linux systems, and probably some non-Linux systems" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, vdso: Use <tools/le_byteshift.h> for littleendian access
-
Linus Torvalds authored
Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...
-
Linus Torvalds authored
-
Linus Torvalds authored
This reverts commit 3e1a878b. It came in very late, and already has one reported failure: Sitsofe reports that the current tree fails to boot on his EeePC, and bisected it down to this. Rather than waste time trying to figure out what's wrong, just revert it. Reported-by: Sitsofe Wheeler <sitsofe@gmail.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.linaro.org/people/mike.turquette/linuxLinus Torvalds authored
Pull clock framework updates from Mike Turquette: "The clock framework changes for 3.16 are pretty typical: mostly clock driver additions and fixes. There are additions to the clock core code for some of the basic types (e.g. the common divider type has some fixes and featured added to it). One minor annoyance is a last-minute dependency that wasn't handled quite right. Commit ba0fae3b ("clk: berlin: add core clock driver for BG2/BG2CD") in this pull request depends on include/dt-bindings/clock/berlin2.h, which is already in your tree via the arm-soc pull request. Building for the berlin platform will break when the clk tree is built on it's own, but merged into your master branch everything should be fine" * tag 'clk-for-linus-3.16' of git://git.linaro.org/people/mike.turquette/linux: (75 commits) mmc: sunxi: Add driver for SD/MMC hosts found on Allwinner sunxi SoCs clk: export __clk_round_rate for providers clk: versatile: free icst on error return clk: qcom: Return error pointers for unimplemented clocks clk: qcom: Support msm8974pro global clock control hardware clk: qcom: Properly support display clocks on msm8974 clk: qcom: Support display RCG clocks clk: qcom: Return highest rate when round_rate() exceeds plan clk: qcom: Fix mmcc-8974's PLL configurations clk: qcom: Fix clk_rcg2_is_enabled() check clk: berlin: add core clock driver for BG2Q clk: berlin: add core clock driver for BG2/BG2CD clk: berlin: add driver for BG2x complex divider cells clk: berlin: add driver for BG2x simple PLLs clk: berlin: add driver for BG2x audio/video PLL clk: st: Terminate of match table clk/exynos4: Fix compilation warning ARM: shmobile: r8a7779: Add clock index macros for DT sources clk: divider: Fix overflow in clk_divider_bestdiv clk: u300: Terminate of match table ...
-
git://github.com/awilliam/linux-vfioLinus Torvalds authored
Pull VFIO updates from Alex Williamson: "A handful of VFIO bug fixes for v3.16" * tag 'vfio-v3.16-rc1' of git://github.com/awilliam/linux-vfio: drivers/vfio/pci: Fix wrong MSI interrupt count drivers/vfio: Rework offsetofend() vfio/iommu_type1: Avoid overflow vfio/pci: Fix unchecked return value vfio/pci: Fix sizing of DPA and THP express capabilities
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6Linus Torvalds authored
Pull crypto updates from Herbert Xu: "Here is the crypto update for 3.16: - Added test vectors for SHA/AES-CCM/DES-CBC/3DES-CBC. - Fixed a number of error-path memory leaks in tcrypt. - Fixed error-path memory leak in caam. - Removed unnecessary global mutex from mxs-dcp. - Added ahash walk interface that can actually be asynchronous. - Cleaned up caam error reporting. - Allow crypto_user get operation to be used by non-root users. - Add support for SSS module on Exynos. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6: (60 commits) crypto: testmgr - add aead cbc des, des3_ede tests crypto: testmgr - Fix DMA-API warning crypto: cesa - tfm->__crt_alg->cra_type directly crypto: sahara - tfm->__crt_alg->cra_name directly crypto: padlock - tfm->__crt_alg->cra_name directly crypto: n2 - tfm->__crt_alg->cra_name directly crypto: dcp - tfm->__crt_alg->cra_name directly crypto: cesa - tfm->__crt_alg->cra_name directly crypto: ccp - tfm->__crt_alg->cra_name directly crypto: geode - Don't use tfm->__crt_alg->cra_name directly crypto: geode - Weed out printk() from probe() crypto: geode - Consistently use AES_KEYSIZE_128 crypto: geode - Kill AES_IV_LENGTH crypto: geode - Kill AES_MIN_BLOCK_SIZE crypto: mxs-dcp - Remove global mutex crypto: hash - Add real ahash walk interface hwrng: n2-drv - Introduce the use of the managed version of kzalloc crypto: caam - reinitialize keys_fit_inline for decrypt and givencrypt crypto: s5p-sss - fix multiplatform build hwrng: timeriomem - remove unnecessary OOM messages ...
-
git://git.open-osd.org/linux-open-osdLinus Torvalds authored
Pull exofs raid6 support from Boaz Harrosh: "These simple patches will enable raid6 using the kernel's raid6_pq engine for support under exofs and pnfs-objects. There is nothing needed to do at exofs and pnfs-obj. Just fire your mkfs.exofs with --raid=6 (that was already supported before) and off you go as usual. The ORE will pick up the new map and will start writing two devices of redundancy bits. The patches are so simple because most of the ORE was already for the general raid case, only a few bug fixes were needed and the actual wiring into the raid6_pq engine" * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Support for raid 6 ore: Remove redundant dev_order(), more cleanups ore: (trivial) reformat some code
-
- 07 Jun, 2014 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds authored
Pull btrfs fix from Chris Mason: "I had this in my 3.16 merge window queue, but it is small and obvious enough for 3.15. I cherry-picked and retested against current rc8" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: send, fix corrupted path strings for long paths
-
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds authored
Pull SCSI target fixes from Nicholas Bellinger: "Here are the remaining fixes for v3.15. This series includes: - iser-target fix for ImmediateData exception reference count bug (Sagi + nab) - iscsi-target fix for MC/S login + potential iser-target MRDSL buffer overrun (Santosh + Roland) - iser-target fix for v3.15-rc multi network portal shutdown regression (nab) - target fix for allowing READ_CAPCITY during ALUA Standby access state (Chris + nab) - target fix for NULL pointer dereference of alua_access_state for un-configured devices (Chris + nab)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix alua_access_state attribute OOPs for un-configured devices target: Allow READ_CAPACITY opcode in ALUA Standby access state iser-target: Fix multi network portal shutdown regression iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value() iser-target: Add missing target_put_sess_cmd for ImmedateData failure
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Peter Anvin: "A significantly larger than I'd like set of patches for just below the wire. All of these, however, fix real problems. The one thing that is genuinely scary in here is the change of SMP initialization, but that *does* fix a confirmed hang when booting virtual machines. There is also a patch to actually do the right thing about not offlining a CPU when there are not enough interrupt vectors available in the system; the accounting was done incorrectly. The worst case for that patch is that we fail to offline CPUs when we should (the new code is strictly more conservative than the old), so is not particularly risky. Most of the rest is minor stuff; the EFI patches are all about exporting correct information to boot loaders and kexec" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: EFI_MIXED should not prohibit loading above 4G x86/smpboot: Initialize secondary CPU only if master CPU will wait for it x86/smpboot: Log error on secondary CPU wakeup failure at ERR level x86: Fix list/memory corruption on CPU hotplug x86: irq: Get correct available vectors for cpu disable x86/efi: Do not export efi runtime map in case old map x86/efi: earlyprintk=efi,keep fix
-
Matt Fleming authored
commit 7d453eee ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a regression for the functionality to load kernels above 4G. The relevant (incorrect) reasoning behind this change can be seen in the commit message, "The xloadflags field in the bzImage header is also updated to reflect that the kernel supports both entry points by setting both of XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y. XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is guaranteed to be addressable with 32-bits." This is obviously bogus since 32-bit EFI loaders will never place the kernel above the 4G mark. So this restriction is entirely unnecessary. But things are worse than that - since we want to encourage people to always compile with CONFIG_EFI_MIXED=y so that their kernels work out of the box for both 32-bit and 64-bit firmware, commit 7d453eee effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely. Remove the overzealous and superfluous restriction and restore the XLF_CAN_BE_LOADED_ABOVE_4G functionality. Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.orgSigned-off-by: H. Peter Anvin <hpa@zytor.com>
-
- 06 Jun, 2014 22 commits
-
-
Linus Torvalds authored
Merge more updates from Andrew Morton: - Most of the rest of MM. This includes "mark remap_file_pages syscall as deprecated" but the actual "replace remap_file_pages syscall with emulation" is held back. I guess we'll need to work out when to pull the trigger on that one. - various minor cleanups to obscure filesystems - the drivers/rtc queue - hfsplus updates - ufs, hpfs, fatfs, affs, reiserfs - Documentation/ - signals - procfs - cpu hotplug - lib/idr.c - rapidio - sysctl - ipc updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (171 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...
-
Fabian Frederick authored
Commit 788257d6 ("ufs: remove the BKL") replaced BKL with mutex protection using functions lock_ufs, unlock_ufs and struct mutex 'mutex' in sb_info. Commit b6963327 ("ufs: drop lock/unlock super") removed lock/unlock super and added struct mutex 's_lock' in sb_info. Those 2 mutexes are generally locked/unlocked at the same time except in allocation (balloc, ialloc). This patch merges the 2 mutexes and propagates first commit solution. It also adds mutex destruction before kfree during ufs_fill_super failure and ufs_put_super. [akpm@linux-foundation.org: avoid ifdefs, return -EROFS not -EINVAL] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: "Chen, Jet" <jet.chen@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Geert Uytterhoeven authored
As of commit 799fef06 ("powerpc: Use generic idle loop"), this applies to arch_cpu_idle() instead of cpu_idle(). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Geert Uytterhoeven authored
As of commit 8dc7c5ec ("cris: Use generic idle loop"), cris no longer provides cpu_idle(). - On cris-v10, etrax_gpio_wake_up_check() is called from default_idle() instead of cpu_idle(), - On cris-v32, etrax_gpio_wake_up_check() is not called from default_idle(), so remove this (copy-and-paste?) part. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Geert Uytterhoeven authored
After all architectures were converted to the generic idle framework, commit d190e819 ("idle: Remove GENERIC_IDLE_LOOP config switch") removed the last caller of cpu_idle(). The forward declarations in header files were forgotten. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hani Benhabiles authored
Len field is already set to zero, but not the from field which is sent as 0xfffffffffffffe00. This makes no sense, and may cause confuse server implementations doing sanity checks (qemu-nbd is an example.) Signed-off-by: Hani Benhabiles <hani@linux.com> Cc: Paul Clements <paul.clements@us.sios.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mitchel Humpherys authored
printk is meant to be used with an associated log level. There are some instances of printk scattered around the mm code where the log level is missing. Add a log level and adhere to suggestions by scripts/checkpatch.pl by moving to the pr_* macros. Also add the typical pr_fmt definition so that print statements can be easily traced back to the modules where they occur, correlated one with another, etc. This will require the removal of some (now redundant) prefixes on a few print statements. Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Richard Weinberger authored
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Josh Triplett authored
This makes it more likely that patch submitters will CC API/ABI changes to the linux-api list, and tools like get_maintainer.pl will do so automatically. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Michael Kerrisk <mtk.man-pages@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Replace seq_printf where possible. This patch also fixes the following checkpatch warning "unnecessary whitespace before a quoted newline" Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Replace obsolete functions. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Replace obsolete functions simple_strtoul/kstrtouint simple_strtol/kstrtoint (kstr __must_check requires the right function to be applied) Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
The remap_file_pages() system call is used to create a nonlinear mapping, that is, a mapping in which the pages of the file are mapped into a nonsequential order in memory. The advantage of using remap_file_pages() over using repeated calls to mmap(2) is that the former approach does not require the kernel to create additional VMA (Virtual Memory Area) data structures. Supporting of nonlinear mapping requires significant amount of non-trivial code in kernel virtual memory subsystem including hot paths. Also to get nonlinear mapping work kernel need a way to distinguish normal page table entries from entries with file offset (pte_file). Kernel reserves flag in PTE for this purpose. PTE flags are scarce resource especially on some CPU architectures. It would be nice to free up the flag for other usage. Fortunately, there are not many users of remap_file_pages() in the wild. It's only known that one enterprise RDBMS implementation uses the syscall on 32-bit systems to map files bigger than can linearly fit into 32-bit virtual address space. This use-case is not critical anymore since 64-bit systems are widely available. The plan is to deprecate the syscall and replace it with an emulation. The emulation will create new VMAs instead of nonlinear mappings. It's going to work slower for rare users of remap_file_pages() but ABI is preserved. One side effect of emulation (apart from performance) is that user can hit vm.max_map_count limit more easily due to additional VMAs. See comment for DEFAULT_MAX_MAP_COUNT for more details on the limit. [akpm@linux-foundation.org: fix spello] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Jones <davej@redhat.com> Cc: Armin Rigo <arigo@tunes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jianyu Zhan authored
Memcg zoneinfo lookup sites have either the page, the zone, or the node id and zone index, but sites that only have the zone have to look up the node id and zone index themselves, whereas sites that already have those two integers use a function for a simple pointer chase. Provide mem_cgroup_zone_zoneinfo() that takes a zone pointer and let sites that already have node id and zone index - all for each node, for each zone iterators - use &memcg->nodeinfo[nid]->zoneinfo[zid]. Rename page_cgroup_zoneinfo() to mem_cgroup_page_zoneinfo() to match. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
Kmemleak could ignore memory blocks allocated via memblock_alloc() leading to false positives during scanning. This patch adds the corresponding callbacks and removes kmemleak_free_* calls in mm/nobootmem.c to avoid duplication. The kmemleak_alloc() in mm/nobootmem.c is kept since __alloc_memory_core_early() does not use memblock_alloc() directly. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
When mempool_alloc() returns an existing pool object, kmemleak_alloc() is no longer called and the stack trace corresponds to the original object allocation. This patch updates the kmemleak allocation stack trace for such objects to make it more useful for debugging. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
Since radix_tree_preload() stack trace is not always useful for debugging an actual radix tree memory leak, this patch updates the kmemleak allocation stack trace in the radix_tree_node_alloc() function. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Catalin Marinas authored
The memory allocation stack trace is not always useful for debugging a memory leak (e.g. radix_tree_preload). This function, when called, updates the stack trace for an already allocated object. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jianpeng Ma authored
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Memory reclaim always uses swappiness of the reclaim target memcg (origin of the memory pressure) or vm_swappiness for global memory reclaim. This behavior was consistent (except for difference between global and hard limit reclaim) because swappiness was enforced to be consistent within each memcg hierarchy. After "mm: memcontrol: remove hierarchy restrictions for swappiness and oom_control" each memcg can have its own swappiness independent of hierarchical parents, though, so the consistency guarantee is gone. This can lead to an unexpected behavior. Say that a group is explicitly configured to not swapout by memory.swappiness=0 but its memory gets swapped out anyway when the memory pressure comes from its parent with a It is also unexpected that the knob is meaningless without setting the hard limit which would trigger the reclaim and enforce the swappiness. There are setups where the hard limit is configured higher in the hierarchy by an administrator and children groups are under control of somebody else who is interested in the swapout behavior but not necessarily about the memory limit. From a semantic point of view swappiness is an attribute defining anon vs. file proportional scanning of LRU which is memcg specific (unlike charges which are propagated up the hierarchy) so it should be applied to the particular memcg's LRU regardless where the memory pressure comes from. This patch removes vmscan_swappiness() and stores the swappiness into the scan_control structure. mem_cgroup_swappiness is then used to provide the correct value before shrink_lruvec is called. The global vm_swappiness is used for the root memcg. [hughd@google.com: oopses immediately when booted with cgroup_disable=memory] Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-