- 22 Apr, 2019 17 commits
-
-
Guo Ren authored
The function of __va() will return "void *", but the pgd_base is unsigned long. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Mao Han authored
This patch implements the perf registers sampling and validation API for csky arch. The valid registers and their register ID are defined in perf_regs.h. Perf tool can backtrace in userspace with unwind library and the registers/user stack dump support. Signed-off-by:
Mao Han <han_mao@c-sky.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Mao Han authored
This patch add support for page fault count, major fault count and minorfault count. Without this patch page faults are not sampled for perf event. Performance counter stats for '/usr/lib/perf-test/callchain_test': 0 page-faults # 0.000 K/sec Signed-off-by:
Mao Han <han_mao@c-sky.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Guo Ren authored
The name of phys_offset is so common for global export and it may conflict with some local name. So change phys_offset to va_pa_offset which also used by riscv. Also use __pa() and __va() instead of using phys_offset directly. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Guo Ren authored
Modify SETUP_MMU macro to fit on both MMU-on or MMU-off enviornment and vmlinux could bootup from MMU off enviornment for some cases. Unify the style of _start and _start_smp_secondary in head.S to make head.S looks more concise and easy to understand. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Mao Han authored
In trace events as tracepoints context are not able to be retrieve with task_pt_regs. Without arch caller regs support the pt_regs context will be all zero, perf can not parsing the callchain and resolving the symbols correctly, some time will even get into deadlock while handling the page fault, eg: perf kmem —page record ls Changelog - Add test case cmd in comment - Use regs_fp(regs) which is defined in abi/regdef.h Signed-off-by:
Mao Han <han_mao@c-sky.com> Signed-off-by:
Guo Ren <guoren@kernel.org>
-
Guo Ren authored
In our stress test, we found some crash problem caused by: if (!(vma->vm_flags & VM_EXEC)) return; in update_mmu_cache(). Seems current update_mmu_cache implementation is wrong and we retread to the conservative implementation. Also the usage of kmap_atomic in update_mmu_cache is risky, page-virtual may be scheduled out and changed, so we must use preempt_disable & pagefault_disable which is called by kmap_atomic(). Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Guo Ren authored
Before this patch csky-linux need CONFIG_RAM_BASE to determine start physical address. Now we use phys_offset variable to replace the macro of PHYS_OFFSET and we setup phys_offset with real physical address which is determined during startup in head.S. With this patch we needn't re-compile kernel for different start physical address. ie: 0x0 / 0xc0000000 start physical address could use the same vmlinux, be care different start address must be 512MB aligned. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Guo Ren authored
Linux kernel has provided some apis for arch signal's implementation. For example: restore_saved_sigmask() set_current_blocked() restore_altstack() But in last version of csky signal.c didn't use them and some codes are confusing, so reconstruct signal.c with reference to riscv's code. Now csky signal.c implementation are very close to riscv and we can get the following benefits: - Clear code structure - The signal code of riscv and csky can be reviewed together - Promoting the unification of arch's signal implementation Also modified the related code in entry.S Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de>
-
Guo Ren authored
We could use regs->sr 16-24 bits to detect syscall: VEC_TRAP0 and r11_sig is no necessary for current implementation. In this patch, we implement the in_syscall and forget_syscall which are inspired from arm & nds32, but csky pt_regs has no syscall_num element and we just set zero to regs->sr's vector-bits-field instead. For ret_from_fork, current task was forked from parent which is in syscall progress and its regs->sr has been already setted with VEC_TRAP0. See: arch/csky/kernel/process.c: copy_thread() Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Guo Ren authored
Move #ifdef __KERNEL__ code in the uapi namespace to non-uapi include/asm/ptrace.h namespace and remove #ifdef __KERNEL__ in include/asm/ptrace.h. Seperate ptrace.h in uapi and non-uapi is more common and clear. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Dmitry V. Levin <ldv@altlinux.org>
-
Jagadeesh Pagadala authored
Remove duplicate header which is included twice. Signed-off-by:
Jagadeesh Pagadala <jagdsh.linux@gmail.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Masahiro Yamada authored
Since commit 7cbbbb8b ("kbuild: warn redundant generic-y"), redundant generic-y is reported. I missed to delete this one. scripts/Makefile.asm-generic:25: redundant generic-y found in arch/csky/include/asm/Kbuild: ftrace.h In this case, csky-specific implementation exists in arch/csky/include/asm/ftrace.h Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Guo Ren authored
Previous syscall_trace implementation couldn't support AUDITSYSCALL and SYSCALL_TRACEPOINTS. Now we redesign it to support audit_syscall and syscall_tracepoints just like other archs'. Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Arnd Bergmann <arnd@arndb.de>
-
Mao Han authored
This patch add support for perf callchain sampling on csky platform. As fp is used to unwind the stack, the program being sampled and the C library need to be compiled with -mbacktrace for user callchains, kernel callchains require CONFIG_STACKTRACE = y. Changelog: - Coding convention with Christoph's advice for riscv's. Signed-off-by:
Mao Han <han_mao@c-sky.com> Signed-off-by:
Guo Ren <ren_guo@c-sky.com> Cc: Christoph Hellwig <hch@infradead.org>
-
Guo Ren authored
Support dynamic ftrace including dynamic graph tracer. Gcc-csky with -pg will produce call site in every function prologue and we can use these call site to hook trace function. gcc with -pg origin call site: push lr jbsr _mcount nop32 nop32 If the (callee - caller)'s offset is in range of bsr instruction, we'll modify code with: push lr bsr _mcount nop32 nop32 Else if the (callee - caller)'s offset is out of bsr instrunction, we'll modify code with: push lr movih r26, ... ori r26, ... jsr r26 (r26 is reserved for jsr link reg in csky abiv2 spec.) Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
Guo Ren authored
This fixup is continue to commit 35ff802a (csky: fixup remove vdsp implement for kernel.) and in that patch I didn't finish the job. We must forbid gcc to generate any vdsp & fpu instructions and remove vdsp asm in memmove.S. eg: For GCC it's -mcpu=ck860 and For AS it's -Wa,-mcpu=ck860fv Signed-off-by:
Guo Ren <ren_guo@c-sky.com>
-
- 21 Apr, 2019 1 commit
-
-
Linus Torvalds authored
-
- 20 Apr, 2019 11 commits
-
-
git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds authored
Pull NFS client bugfix from Trond Myklebust: "Fix a regression in which an RPC call can be tagged with an error despite the transmission being successful" * tag 'nfs-for-5.1-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Ignore queue transmission errors on successful transmission
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Three minor fixes: two obvious ones in drivers and a fix to the SG_IO path to correctly return status on error" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: aic7xxx: fix EISA support Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO" scsi: core: set result when the command cannot be dispatched
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "A set of small fixes that should go into this series. This contains: - Removal of unused queue member (Hou) - Overflow bvec fix (Ming) - Various little io_uring tweaks (me) - kthread parking - Only call cpu_possible() for verified CPU - Drop unused 'file' argument to io_file_put() - io_uring_enter vs io_uring_register deadlock fix - CQ overflow fix - BFQ internal depth update fix (me)" * tag 'for-linus-20190420' of git://git.kernel.dk/linux-block: block: make sure that bvec length can't be overflow block: kill all_q_node in request_queue io_uring: fix CQ overflow condition io_uring: fix possible deadlock between io_uring_{enter,register} io_uring: drop io_file_put() 'file' argument bfq: update internal depth state when queue depth changes io_uring: only test SQPOLL cpu after we've verified it io_uring: park SQPOLL thread if it's percpu
-
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linuxLinus Torvalds authored
Pill i3c fixes from Boris Brezillon: - fix the random PID check - fix the disable controller logic in the designware driver - fix I3C entry in MAINTAINERS * tag 'i3c/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: MAINTAINERS: Fix the I3C entry i3c: dw: Fix dw_i3c_master_disable controller by using correct mask i3c: Fix the verification of random PID
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Two core fixes for long-standing bugs for the races at concurrent device creation and deletion that were (unsurprisingly) spotted by syzkaller with usb-fuzzer. The rest are usual small HD-audio fixes" * tag 'sound-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - add two more pin configuration sets to quirk table ALSA: core: Fix card races between register and disconnect ALSA: info: Fix racy addition/deletion of nodes ALSA: hda: Initialize power_state field properly
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fixes from Ingo Molnar: "Misc clocksource driver fixes, and a sched-clock wrapping fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() clocksource/drivers/timer-ti-dm: Remove omap_dm_timer_set_load_start clocksource/drivers/oxnas: Fix OX820 compatible clocksource/drivers/arm_arch_timer: Remove unneeded pr_fmt macro clocksource/drivers/npcm: select TIMER_OF
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: a console spam fix, section attributes fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot quirk, boot options related warnings fix, an LTO fix, a deadlock fix and an RDT fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority x86/cpu/bugs: Use __initconst for 'const' init data x86/mm/KASLR: Fix the size of the direct mapping section x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T x86/mm: Prevent bogus warnings with "noexec=off" x86/build/lto: Fix truncated .bss with -fdata-sections x86/speculation: Prevent deadlock on ssb_state::lock x86/resctrl: Do not repeat rdtgroup mode initialization
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Ingo Molnar: "A deadline scheduler warning/race fix, and a cfs_period_us quota calculation workaround where the real fix looks too involved to merge immediately" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Correctly handle active 0-lag timers sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fixes from Ingo Molnar: "A lockdep warning fix and a script execution fix when atomics are generated" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomics: Don't assume that scripts are executable locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup fix from Tejun Heo: "A patch to fix a RCU imbalance error in the devices cgroup configuration error path" * 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: device_cgroup: fix RCU imbalance in error case
-
- 19 Apr, 2019 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpuLinus Torvalds authored
Pull percpu fixlet from Dennis Zhou: "This stops printing the base address of percpu memory on initialization" * 'for-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: stop printing kernel addresses
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty/serial fixes from Greg KH: "Here are five small fixes for some tty/serial/vt issues that have been reported. The vt one has been around for a while, it is good to finally get that resolved. The others fix a build warning that showed up in 5.1-rc1, and resolve a problem in the sh-sci driver. Note, the second patch for build warning fix for the sc16is7xx driver was just applied to the tree, as it resolves a problem with the previous patch to try to solve the issue. It has not shown up in linux-next yet, unlike all of the other patches, but it has passed 0-day testing and everyone seems to agree that it is correct" * tag 'tty-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: sc16is7xx: put err_spi and err_i2c into correct #ifdef vt: fix cursor when clearing the screen sc16is7xx: move label 'err_spi' to correct section serial: sh-sci: Fix HSCIF RX sampling point adjustment serial: sh-sci: Fix HSCIF RX sampling point calculation
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping mm/kmemleak.c: fix unused-function warning init: initialize jump labels before command line option parsing kernel/watchdog_hld.c: hard lockup message should end with a newline kcov: improve CONFIG_ARCH_HAS_KCOV help text mm: fix inactive list balancing between NUMA nodes and cgroups mm/hotplug: treat CMA pages as unmovable proc: fixup proc-pid-vm test proc: fix map_files test on F29 mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock mm: swapoff: shmem_unuse() stop eviction without igrab() mm: swapoff: take notice of completion sooner mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES mm: swapoff: shmem_find_swap_entries() filter out other types slab: store tagged freelist for off-slab slabmgmt
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging and IIO fixes from Greg KH: "Here is a bunch of IIO driver fixes, and some smaller staging driver fixes, for 5.1-rc6. The IIO fixes were delayed due to my vacation, but all resolve a number of reported issues and have been in linux-next for a few weeks with no reported issues. The other staging driver fixes are all tiny, resolving some reported issues in the comedi and most drivers, as well as some erofs fixes. All of these patches have been in linux-next with no reported issues" * tag 'staging-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (24 commits) staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf staging: comedi: ni_usb6501: Fix use of uninitialized mutex staging: erofs: fix unexpected out-of-bound data access staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf staging: comedi: vmk80xx: Fix use of uninitialized semaphore staging: most: core: use device description as name iio: core: fix a possible circular locking dependency iio: ad_sigma_delta: select channel when reading register iio: pms7003: select IIO_TRIGGERED_BUFFER iio: cros_ec: Fix the maths for gyro scale calculation iio: adc: xilinx: prevent touching unclocked h/w on remove iio: adc: xilinx: fix potential use-after-free on probe iio: adc: xilinx: fix potential use-after-free on remove iio: dac: mcp4725: add missing powerdown bits in store eeprom io: accel: kxcjk1013: restore the range after resume. iio:chemical:bme680: Fix SPI read interface iio:chemical:bme680: Fix, report temperature in millidegrees iio: chemical: fix missing Kconfig block for sgp30 iio: adc: at91: disable adc channel interrupt in timeout case iio: gyro: mpu3050: fix chip ID reading ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char/misc fixes from Greg KH: "Here are four small misc driver fixes for 5.1-rc6. Nothing major at all, they fix up a Kconfig issues, a SPDX invalid license tag, and two tiny bugfixes. All have been in linux-next for a while with no reported issues" * tag 'char-misc-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: drivers: power: supply: goldfish_battery: Fix bogus SPDX identifier extcon: ptn5150: fix COMPILE_TEST dependencies misc: fastrpc: add checked value for dma_set_mask habanalabs: remove low credit limit of DMA #0
-
Ming Lei authored
bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as, when one bio is splitted in the middle of one bvec via bio_split(), and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of remained bio. And the remained bio's bvec may be re-submitted to fs layer via ITER_IBVEC, such as loop and nvme-loop. So we have to make sure that every bvec's offset is less than PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop, nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC. This patch fixes this issue reported by Zhang Yi When running nvme/011. Cc: Christoph Hellwig <hch@lst.de> Cc: Yi Zhang <yi.zhang@redhat.com> Reported-by:
Yi Zhang <yi.zhang@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Fixes: 6dc4f100 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hou Tao authored
all_q_node has not been used since commit 4b855ad3 ("blk-mq: Create hctx for each present CPU"), so remove it. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Hou Tao <houtao1@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input updates from Dmitry Torokhov: - several new key mappings for HID - a host of new ACPI IDs used to identify Elan touchpads in Lenovo laptops * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ HID: input: add mapping for "Toggle Display" key HID: input: add mapping for "Full Screen" key HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys HID: input: add mapping for Expose/Overview key HID: input: fix mapping of aspect ratio key [media] doc-rst: switch to new names for Full Screen/Aspect keys Input: document meanings of KEY_SCREEN and KEY_ZOOM Input: elan_i2c - add hardware ID for multiple Lenovo laptops
-
Hans de Goede authored
The "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'" message triggers on pretty much every Intel machine. The purpose of log messages with a warning level is to notify the user of something which potentially is a problem, or at least somewhat unexpected. This message clearly does not match those criteria, so lower its log priority from warning to info. Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181230172715.17469-1-hdegoede@redhat.comSigned-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Merge tag 'perf-urgent-for-mingo-5.1-20190419' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf top: Jiri Olsa: - Fix 'perf top --pid', it needs PERF_SAMPLE_TIME since we switched to using a different thread to sort the events and then even for just a single thread we now need timestamps. BPF: Jiri Olsa: - Fix bpf_prog and btf lookup functions failure path to to properly return NULL. - Fix side band thread draining, used to process PERF_RECORD_BPF_EVENT metadata records. core: Jiri Olsa: - Fix map lookup by name to get a refcount when the name is already in the tree. Found Song Liu: - Fix __map__is_kmodule() by taking into account recently added BPF maps. UAPI: Arnaldo Carvalho de Melo: - Sync sound/asound.h copy Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Andrea Arcangeli authored
The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by:
Andrea Arcangeli <aarcange@redhat.com> Reported-by:
Jann Horn <jannh@google.com> Suggested-by:
Oleg Nesterov <oleg@redhat.com> Acked-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Mike Rapoport <rppt@linux.ibm.com> Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Jann Horn <jannh@google.com> Acked-by:
Jason Gunthorpe <jgg@mellanox.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-