- 22 Apr, 2020 24 commits
-
-
Peter Zijlstra authored
Now that objtool is capable of processing vmlinux.o and actually has something useful to do there, (conditionally) add it to the final link pass. This will increase build time by a few seconds.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.287494491@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
In preparation for find_insn_containing(), change insn_hash to use sec_offset_hash(). This actually reduces runtime; probably because mixing in the section index reduces the collisions due to text sections all starting their instructions at offset 0.

Runtime on vmlinux.o from 3.1 to 2.5 seconds.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.227240432@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
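To illustrate why folding the section index in helps, here is a minimal sketch of such a combined key (names and the mixing constant are illustrative assumptions, not the actual objtool code, which uses the kernel's hash helpers):

    #include <stdint.h>

    /* Illustrative only: fold the section index into the key so that
     * instructions at offset 0 in different .text sections no longer
     * land in the same hash bucket. */
    static inline uint32_t sec_offset_hash_sketch(uint32_t sec_idx, uint64_t offset)
    {
            uint64_t key = ((uint64_t)sec_idx << 32) ^ offset;

            key *= 0x9e3779b97f4a7c15ULL;   /* simple 64->32 bit mix */
            return (uint32_t)(key >> 32);
    }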
-
Peter Zijlstra authored
When doing kbuild tests to see if the objtool changes affected those, I found that there was a measurable regression:

                pre             post
    real        1m13.594        1m16.488s
    user        34m58.246s      35m23.947s
    sys         4m0.393s        4m27.312s

Perf showed that for small files the increased hash-table sizes were a measurable difference. Since we already have -l "vmlinux" to distinguish between the modes, make it also use a smaller portion of the hash-tables. This flips it into a small win:

    real        1m14.143s
    user        34m49.292s
    sys         3m44.746s

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.167588731@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Validate that any call out of .noinstr.text is in between instr_begin() and instr_end() annotations.

This annotation is useful to ensure correct behaviour wrt tracing sensitive code like entry/exit and idle code. When we run code in a sensitive context we want a guarantee no unknown code is run.

Since this validation relies on knowing the section of call destination symbols, we must run it on vmlinux.o instead of on individual object files.

Add two options:

    -d/--duplicate      "duplicate validation for vmlinux"
    -l/--vmlinux        "vmlinux.o validation"

Where the latter auto-detects when objname ends with "vmlinux.o" and the former will force all validations, also those already done on !vmlinux object files.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.106268040@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
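For reference, the annotation pattern being validated looks roughly like the sketch below. The stand-in macro definitions and the function names are assumptions made only so the snippet is self-contained; in the kernel they come from the compiler attributes and instrumentation headers.

    /* Stand-in definitions so the sketch compiles on its own. */
    #define noinstr                 /* would place the function in .noinstr.text */
    #define instr_begin()   do { } while (0)
    #define instr_end()     do { } while (0)

    static void trace_something(void) { }   /* hypothetical call target in normal .text */

    /* Any call that leaves .noinstr.text must be bracketed by the
     * annotations; that is exactly what the vmlinux.o pass checks. */
    static noinstr void example_idle_path(void)
    {
            /* ... non-instrumentable entry/idle work ... */
            instr_begin();
            trace_something();      /* only allowed between the annotations */
            instr_end();
            /* ... more non-instrumentable work ... */
    }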
-
Peter Zijlstra authored
Objtool keeps per instruction CFI state in struct insn_state and will save/restore this where required. However, insn_state has grown some !CFI state, and this must not be saved/restored (that would lose/destroy state).

Fix this by moving the CFI specific parts of insn_state into struct cfi_state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.045821071@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
There's going to be a new struct cfi_state, rename this one to make room.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.986441913@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
The SAVE/RESTORE hints are now unused; remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.926738768@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
'Optimize' ftrace_regs_caller. Instead of comparing against an immediate, the more natural way to test for zero on x86 is: 'test %r,%r'.

    48 83 f8 00             cmp    $0x0,%rax
    74 49                   je     226 <ftrace_regs_call+0xa3>

    48 85 c0                test   %rax,%rax
    74 49                   je     225 <ftrace_regs_call+0xa2>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.867411350@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
There's a convenient macro for 'SS+8' called FRAME_SIZE. Use it to clarify things.

(entry/calling.h calls this SIZEOF_PTREGS but we're using asm/ptrace-abi.h)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.808485515@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
The ftrace_regs_caller() trampoline does something 'funny' when there is a direct-caller present. In that case it stuffs the 'direct-caller' address on the return stack and then exits the function. This then results in 'returning' to the direct-caller with the exact registers we came in with -- an indirect tail-call without using a register.

This however (rightfully) confuses objtool because the function shares a few instructions in order to have a single exit path, but the stack layout is different for them, depending on which path we took to get there.

This is currently kludged by forcing the stack state to the non-direct case, but this generates actively wrong (ORC) unwind information for the direct case, leading to potentially broken unwinds.

Fix this issue by fully separating the exit paths. This results in having to poke a second RET into the trampoline copy, see ftrace_regs_caller_ret.

This brings us to a second objtool problem: in order for it to perceive the 'jmp ftrace_epilogue' as a function exit, it needs to be recognised as a tail call. In order to make that happen, ftrace_epilogue needs to be the start of an STT_FUNC, so re-arrange code to make this so.

Finally, a third issue is that objtool requires functions to exit with the same stack layout they started with, which is obviously violated in the direct case; employ the new HINT_RET_OFFSET to tell objtool this is an expected exception.

Together, this results in generating correct ORC unwind information for the ftrace_regs_caller() function and its trampoline copies.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.749606694@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Normally objtool ensures a function keeps the stack layout invariant. But there is a useful exception: it is possible to stuff the return stack in order to 'inject' a 'call':

    push $fun
    ret

In this case the invariant mentioned above is violated.

Add an objtool HINT to annotate this and allow a function exit with a modified stack frame.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.690601403@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
Teach objtool a little more about IRET so that we can avoid using the SAVE/RESTORE annotation. In particular, make the weird corner case in insn->restore go away.

The purpose of that corner case is to deal with the fact that UNWIND_HINT_RESTORE lands on the instruction after IRET, but that instruction can end up being outside the basic block, consider:

    if (cond)
        sync_core()
    foo();

Then the hint will land on foo(), and we'll encounter the restore hint without ever having seen the save hint.

By teaching objtool about the arch specific exception frame size, and assuming that any IRET in an STT_FUNC symbol is an exception frame sized POP, we can remove the use of save/restore hints for this code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.631224674@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
Instruction sets can include more or less complex operations which might not fit the currently defined set of stack_ops.

Combining more than one stack_op provides more flexibility to describe the behaviour of an instruction. This also reduces the need to define new stack_ops specific to a single instruction set.

Allow instruction decoders to generate multiple stack_op per instruction.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200327152847.15294-11-jthierry@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
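The data-structure idea can be pictured with the small sketch below; the type and field names are illustrative assumptions only (objtool itself keeps the ops on a linked list with its own op encoding), the point is simply that one decoded instruction may now carry several stack operations.

    #include <stddef.h>

    enum op_kind { OP_PUSH, OP_POP, OP_ADD_SP, OP_MOV_SP };

    struct stack_op_sketch {
            enum op_kind kind;
            int operand;            /* register or immediate, depending on kind */
    };

    struct decoded_insn_sketch {
            unsigned long offset;                   /* offset within the section */
            size_t nr_ops;                          /* how many ops this insn produced */
            struct stack_op_sketch ops[4];          /* e.g. an ISA insn = push + SP adjust */
    };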
-
Muchun Song authored
If the prefix of the section name is not '.rodata', the following function call can never return 0:

    strcmp(sec->name, C_JUMP_TABLE_SECTION)

So the name comparison is pointless, just remove it.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Nick Desaulniers authored
Compiling with Clang and CONFIG_KASAN=y was exposing a few warnings:

    call to memset() with UACCESS enabled

Document how to fix these for future travelers.

Link: https://github.com/ClangBuiltLinux/linux/issues/876
Suggested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Suggested-by: Matt Helsley <mhelsley@vmware.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
Some CFI definitions used by generic objtool code have no reason to vary from one architecture to another. Keep those definitions in generic code and move the arch-specific ones to a new arch-specific header.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Raphael Gault authored
The jump and call destination relocation offsets are x86-specific. Abstract them by calling arch-specific implementations.

[ jthierry: Remove superfluous comment; replace other addend offsets with arch_dest_rela_offset() ]

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
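As a rough illustration of the abstraction, an x86 implementation would look something like the sketch below; treat the exact signature and the +4 adjustment as assumptions for illustration (the displacement addend on x86 is taken relative to the end of the 4-byte field), not as the verified objtool source.

    /* Illustrative only: each architecture supplies its own version. */
    unsigned long arch_dest_rela_offset(int addend)
    {
            return addend + 4;      /* x86: skip the 4-byte PC-relative displacement */
    }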
-
Julien Thierry authored
The initial register state is set up by arch specific code. Use the value the arch code has set when restoring registers from the stack.

Suggested-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
The .alternatives section can contain entries with no original instructions. Objtool will currently crash when handling such an entry.

Just skip that entry, but still give a warning to discourage useless entries.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
When a function fails its validation, it might leave a stale state that will be used for the validation of other functions. That would cause false warnings on potentially valid functions.

Reset the instruction state before the validation of each individual function.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
POP operations are already in the code path where the destination operand is OP_DEST_REG. There is no need to check the operand type again.

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
Currently, the check of tools files against kernel equivalents is only done after every object file has been built. This means one might fix build issues against outdated headers without seeing a warning about this.

Check headers before any object is built. Also, make it part of a FORCE'd recipe so every attempt to build objtool will report the outdated headers (if any).

Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Julien Thierry authored
Sometimes, WARN_FUNC() and other users of symbol_by_offset() will associate the first instruction of a symbol with the symbol preceding it. This is because symbol->offset + symbol->len is already outside of the symbol's range.

Fixes: 2a362ecc ("objtool: Optimize find_symbol_*() and read_symbols()")
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
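The boundary rule being enforced is the usual half-open interval; the helper below is only an illustration of that rule (not the actual objtool lookup code): a symbol covering [offset, offset + len) must exclude the end address, otherwise the first instruction of the next symbol is attributed to this one.

    /* Illustrative check: addr belongs to the symbol iff it falls in
     * the half-open range [sym_offset, sym_offset + sym_len). */
    static int symbol_contains(unsigned long sym_offset, unsigned long sym_len,
                               unsigned long addr)
    {
            return addr >= sym_offset && addr < sym_offset + sym_len;
    }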
-
Peter Zijlstra authored
Apparently there's people doing 64bit builds on 32bit machines.

Fixes: 74b873e4 ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: youling257@gmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 21 Apr, 2020 16 commits
-
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  tools/vm: fix cross-compile build
  coredump: fix null pointer dereference on coredump
  mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path
  shmem: fix possible deadlocks on shmlock_user_lock
  vmalloc: fix remap_vmalloc_range() bounds checks
  mm/shmem: fix build without THP
  mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
  tools/build: tweak unused value workaround
  checkpatch: fix a typo in the regex for $allocFunctions
  mm, gup: return EINTR when gup is interrupted by fatal signals
  mm/hugetlb: fix a addressing exception caused by huge_pte_offset
  MAINTAINERS: add an entry for kfifo
  mm/userfaultfd: disable userfaultfd-wp on x86_32
  slub: avoid redzone when choosing freepointer location
  sh: fix build error in mm/init.c
-
git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds authored
Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, and a few cleanups to the newly-introduced assembly language vmentry code for AMD"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions
  kvm: Disable objtool frame pointer checking for vmenter.S
  MAINTAINERS: add a reviewer for KVM/s390
  KVM: s390: Fix PV check in deliverable_irqs()
  kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GP
  KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
  KVM: SVM: Fix __svm_vcpu_run declaration.
  KVM: SVM: Do not setup frame pointer in __svm_vcpu_run
  KVM: SVM: Fix build error due to missing release_pages() include
  KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD
  kvm: nVMX: match comment with return type for nested_vmx_exit_reflected
  kvm: nVMX: reflect MTF VM-exits if injected by L1
  KVM: s390: Return last valid slot if approx index is out-of-bounds
  KVM: Check validity of resolved slot when searching memslots
  KVM: VMX: Enable machine check support for 32bit targets
  KVM: SVM: move more vmentry code to assembly
  KVM: SVM: fix compilation with modular PSP and non-modular KVM
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull virtio fixes and cleanups from Michael Tsirkin:

 - Some bug fixes

 - Cleanup a couple of issues that surfaced meanwhile

 - Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release.

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits)
  vhost: disable for OABI
  virtio: drop vringh.h dependency
  virtio_blk: add a missing include
  virtio-balloon: Avoid using the word 'report' when referring to free page hinting
  virtio-balloon: make virtballoon_free_page_report() static
  vdpa: fix comment of vdpa_register_device()
  vdpa: make vhost, virtio depend on menu
  vdpa: allow a 32 bit vq alignment
  drm/virtio: fix up for include file changes
  remoteproc: pull in slab.h
  rpmsg: pull in slab.h
  virtio_input: pull in slab.h
  remoteproc: pull in slab.h
  virtio-rng: pull in slab.h
  virtgpu: pull in uaccess.h
  tools/virtio: make asm/barrier.h self contained
  tools/virtio: define aligned attribute
  virtio/test: fix up after IOTLB changes
  vhost: Create accessors for virtqueues private_data
  vdpasim: Return status in vdpasim_get_status
  ...
-
git://git.infradead.org/users/jjs/linux-tpmdd
Linus Torvalds authored
Pull tpm fixes from Jarkko Sakkinen:
 "A few bug fixes"

* tag 'tpmdd-next-20200421' of git://git.infradead.org/users/jjs/linux-tpmdd:
  tpm/tpm_tis: Free IRQ if probing fails
  tpm: fix wrong return value in tpm_pcr_extend
  tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
  tpm: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
-
git://github.com/ojeda/linux
Linus Torvalds authored
Pull clang-format fixlets from Miguel Ojeda:
 "Two trivial clang-format changes:

   - Don't indent C++ namespaces (Ian Rogers)

   - The usual clang-format macro list update (Miguel Ojeda)"

* tag 'clang-format-for-linus-v5.7-rc3' of git://github.com/ojeda/linux:
  clang-format: Update with the latest for_each macro list
  clang-format: don't indent namespaces
-
Lucas Stach authored
Commit 7ed1c190 ("tools: fix cross-compile var clobbering") moved the setup of the CC variable to tools/scripts/Makefile.include to make the behavior consistent across all the tools Makefiles.

As the vm tools missed the include we end up with the wrong CC in a cross-compiling environment.

Fixes: 7ed1c190 (tools: fix cross-compile var clobbering)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Kelly <martin@martingkelly.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416104748.25243-1-l.stach@pengutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sudip Mukherjee authored
If the core_pattern is set to "|" and any process segfaults then we get a null pointer dereference while trying to coredump. The call stack shows:

    RIP: do_coredump+0x628/0x11c0

When the core_pattern has only "|" there is no point in trying the coredump, so we can check for that while formatting the corename and exit with an error. After this change I get:

    format_corename failed
    Aborting core

Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template")
Reported-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416194612.21418-1-sudipm.mukherjee@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yang Shi authored
Syzbot reported the below lockdep splat:

    WARNING: possible irq lock inversion dependency detected
    5.6.0-rc7-syzkaller #0 Not tainted
    --------------------------------------------------------
    syz-executor.0/10317 just changed the state of lock:
    ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
    ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: shmem_mfill_atomic_pte+0x1012/0x21c0 mm/shmem.c:2407
    but this lock was taken by another, SOFTIRQ-safe lock in the past:
     (&(&xa->xa_lock)->rlock#5){..-.}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
     Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&(&info->lock)->rlock);
                                   local_irq_disable();
                                   lock(&(&xa->xa_lock)->rlock#5);
                                   lock(&(&info->lock)->rlock);
      <Interrupt>
        lock(&(&xa->xa_lock)->rlock#5);

     *** DEADLOCK ***

The full report is quite lengthy, please see:

  https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2004152007370.13597@eggly.anvils/T/#m813b412c5f78e25ca8c6c7734886ed4de43f241d

It is because CPU 0 held info->lock with IRQ enabled in userfaultfd_copy path, then CPU 1 is splitting a THP which held xa_lock and info->lock in IRQ disabled context at the same time. If softirq comes in to acquire xa_lock, the deadlock would be triggered.

The fix is to acquire/release info->lock with *_irq version instead of plain spin_{lock,unlock} to make it softirq safe.

Fixes: 4c27fe4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/1587061357-122619-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
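The locking pattern the fix moves to looks roughly like the sketch below (a simplified kernel-context illustration with an assumed helper name, not the real shmem code): by disabling interrupts around info->lock, a softirq that wants the irq-safe xa_lock can never interrupt a CPU already holding info->lock, which removes the inversion lockdep complains about.

    #include <linux/spinlock.h>

    /* Sketch only: take info->lock with the _irq variants. */
    static void shmem_acct_sketch(spinlock_t *info_lock, long *alloced, long delta)
    {
            spin_lock_irq(info_lock);       /* was: spin_lock(info_lock) */
            *alloced += delta;              /* ... update inode accounting ... */
            spin_unlock_irq(info_lock);     /* was: spin_unlock(info_lock) */
    }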
-
Hugh Dickins authored
Recent commit 71725ed1 ("mm: huge tmpfs: try to split_huge_page() when punching hole") has allowed syzkaller to probe deeper, uncovering a long-standing lockdep issue between the irq-unsafe shmlock_user_lock, the irq-safe xa_lock on mapping->i_pages, and shmem inode's info->lock which nests inside xa_lock (or tree_lock) since 4.8's shmem_uncharge().

user_shm_lock(), servicing SysV shmctl(SHM_LOCK), wants shmlock_user_lock while its caller shmem_lock() holds info->lock with interrupts disabled; but hugetlbfs_file_setup() calls user_shm_lock() with interrupts enabled, and might be interrupted by a writeback endio wanting xa_lock on i_pages.

This may not risk an actual deadlock, since shmem inodes do not take part in writeback accounting, but there are several easy ways to avoid it.

Requiring interrupts disabled for shmlock_user_lock would be easy, but it's a high-level global lock for which that seems inappropriate. Instead, recall that the use of info->lock to guard info->flags in shmem_lock() dates from pre-3.1 days, when races with SHMEM_PAGEIN and SHMEM_TRUNCATE could occur: nowadays it serves no purpose, the only flag added or removed is VM_LOCKED itself, and calls to shmem_lock() an inode are already serialized by the caller.

Take info->lock out of the chain and the possibility of deadlock or lockdep warning goes away.

Fixes: 4595ef88 ("shmem: make shmem_inode_info::lock irq-safe")
Reported-by: syzbot+c8a8197c8852f566b9d9@syzkaller.appspotmail.com
Reported-by: syzbot+40b71e145e73f78f81ad@syzkaller.appspotmail.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004161707410.16322@eggly.anvils
Link: https://lore.kernel.org/lkml/000000000000e5838c05a3152f53@google.com/
Link: https://lore.kernel.org/lkml/0000000000003712b305a331d3b1@google.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jann Horn authored
remap_vmalloc_range() has had various issues with the bounds checks it promises to perform ("This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma") over time, e.g.:

 - not detecting pgoff<<PAGE_SHIFT overflow

 - not detecting (pgoff<<PAGE_SHIFT)+usize overflow

 - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same vmalloc allocation

 - comparing a potentially wildly out-of-bounds pointer with the end of the vmalloc region

In particular, since commit fc970227 ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer dereferences by calling mmap() on a BPF map with a size that is bigger than the distance from the start of the BPF map to the end of the address space.

This could theoretically be used as a kernel ASLR bypass, by using whether mmap() with a given offset oopses or returns an error code to perform a binary search over the possible address range.

To allow remap_vmalloc_range_partial() to verify that addr and addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset to remap_vmalloc_range_partial() instead of adding it to the pointer in remap_vmalloc_range().

In remap_vmalloc_range_partial(), fix the check against get_vm_area_size() by using size comparisons instead of pointer comparisons, and add checks for pgoff.

Fixes: 83342314 ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
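The overflow-aware shape of such a size-based check can be sketched as follows (an illustration of the reasoning, not the exact remap_vmalloc_range_partial() code): compare sizes rather than possibly-wrapped pointers, and reject a pgoff that would push the start past the end of the vmalloc area.

    #include <stdbool.h>

    /* Illustrative only: all quantities in bytes. */
    static bool range_fits(unsigned long area_size, unsigned long pgoff_bytes,
                           unsigned long usize)
    {
            if (pgoff_bytes > area_size)
                    return false;                   /* offset alone already out of bounds */
            if (usize > area_size - pgoff_bytes)    /* avoids pgoff_bytes + usize wrapping */
                    return false;
            return true;
    }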
-
Hugh Dickins authored
Some optimizers don't notice that shmem_punch_compound() is always true (PageTransCompound() being false) without CONFIG_TRANSPARENT_HUGEPAGE==y.

Use IS_ENABLED to help them to avoid the BUILD_BUG inside HPAGE_PMD_NR.

Fixes: 71725ed1 ("mm: huge tmpfs: try to split_huge_page() when punching hole")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004142339170.10035@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
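The general shape of the IS_ENABLED() trick is shown below as a generic kernel-context illustration (the function is made up; it is not the actual shmem patch): IS_ENABLED() is a compile-time constant, so when the option is off the guarded branch is dropped entirely and nothing inside it that would trigger a BUILD_BUG() ever gets evaluated.

    static unsigned int pages_in_chunk(void)
    {
            if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
                    return 1;               /* branch constant-folded away when THP is off */
            return HPAGE_PMD_NR;            /* contains BUILD_BUG() without THP */
    }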
-
Muchun Song authored
find_mergeable_vma() can return NULL. In this case, it leads to a crash when we access vm_mm (its offset is 0x40) later in write_protect_page. And this case did happen on our server. The following call trace is captured in kernel 4.19 with the following patch applied and KSM zero page enabled on our server.

  commit e86c59b1 ("mm/ksm: improve deduplication of zero pages with colouring")

So add a vma check to fix it.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
  Oops: 0000 [#1] SMP NOPTI
  CPU: 9 PID: 510 Comm: ksmd Kdump: loaded Tainted: G OE 4.19.36.bsk.9-amd64 #4.19.36.bsk.9
  RIP: try_to_merge_one_page+0xc7/0x760
  Code: 24 58 65 48 33 34 25 28 00 00 00 89 e8 0f 85 a3 06 00 00 48 83 c4 60 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 46 08 a8 01 75 b8 <49> 8b 44 24 40 4c 8d 7c 24 20 b9 07 00 00 00 4c 89 e6 4c 89 ff 48
  RSP: 0018:ffffadbdd9fffdb0 EFLAGS: 00010246
  RAX: ffffda83ffd4be08 RBX: ffffda83ffd4be40 RCX: 0000002c6e800000
  RDX: 0000000000000000 RSI: ffffda83ffd4be40 RDI: 0000000000000000
  RBP: ffffa11939f02ec0 R08: 0000000094e1a447 R09: 00000000abe76577
  R10: 0000000000000962 R11: 0000000000004e6a R12: 0000000000000000
  R13: ffffda83b1e06380 R14: ffffa18f31f072c0 R15: ffffda83ffd4be40
  FS: 0000000000000000(0000) GS:ffffa0da43b80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000040 CR3: 0000002c77c0a003 CR4: 00000000007626e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   ksm_scan_thread+0x115e/0x1960
   kthread+0xf5/0x130
   ret_from_fork+0x1f/0x30

[songmuchun@bytedance.com: if the vma is out of date, just exit]
  Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com
[akpm@linux-foundation.org: add the conventional braces, replace /** with /*]
Fixes: e86c59b1 ("mm/ksm: improve deduplication of zero pages with colouring")
Co-developed-by: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416025034.29780-1-songmuchun@bytedance.com
Link: http://lkml.kernel.org/r/20200414132905.83819-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
George Burgess IV authored
Clang has -Wself-assign enabled by default under -Wall, which always gets -Werror'ed on this file, causing sync-compare-and-swap to be disabled by default.

The generally-accepted way to spell "this value is intentionally unused" is casting it to `void`. This is accepted by both GCC and Clang with -Wall enabled:

  https://godbolt.org/z/qqZ9r3

Signed-off-by: George Burgess IV <gbiv@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lkml.kernel.org/r/20200414195638.156123-1-gbiv@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
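A minimal illustration of the "(void)" idiom in a feature-probe style (this is a sketch, not the actual tools/build test source): the otherwise unused result is cast to void so neither GCC's -Wunused-value nor Clang's -Wself-assign fires under -Wall.

    #include <stdint.h>

    int main(void)
    {
            static uint64_t guard;
            uint64_t old = __sync_val_compare_and_swap(&guard, 0, 1);

            (void)old;      /* intentionally unused; we only care that the builtin links */
            return 0;
    }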
-
Christophe JAILLET authored
Here, we look for functions such as 'netdev_alloc_skb_ip_align', so a '_' is missing in the regex.

To make sure:

    grep -r --include=*.c skbip_a * | wc      ==> 0 results
    grep -r --include=*.c skb_ip_a * | wc     ==> 112 results

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200407190029.892-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
EINTR is the usual error code which other killable interfaces return. This is the case for the other fatal_signal_pending break out from the same function. Make the code consistent.

ERESTARTSYS is also quite confusing because the signal is fatal and so no restart will happen before returning to the userspace.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Link: http://lkml.kernel.org/r/20200409071133.31734-1-mhocko@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
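The shape of the change is roughly the fragment below; the surrounding gup loop and the use of `i` as the count of pages pinned so far are assumptions for illustration, not the verbatim patch.

    /* Illustrative sketch: a fatal signal never restarts the syscall,
     * so report -EINTR like the other killable bail-outs do. */
    if (fatal_signal_pending(current))
            return i ? i : -EINTR;          /* was: -ERESTARTSYS */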
-
Longpeng authored
Our machine encountered a panic (addressing exception) after running for a long time and the calltrace is:

  RIP: hugetlb_fault+0x307/0xbe0
  RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
  RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
  RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
  RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
  R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
  R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
  FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   follow_hugetlb_page+0x175/0x540
   __get_user_pages+0x2a0/0x7e0
   __get_user_pages_unlocked+0x15d/0x210
   __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
   try_async_pf+0x6e/0x2a0 [kvm]
   tdp_page_fault+0x151/0x2d0 [kvm]
   ...
   kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
   kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
   do_vfs_ioctl+0x3f0/0x540
   SyS_ioctl+0xa1/0xc0
   system_call_fastpath+0x22/0x27

For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it may return a wrong 'pmdp' if there is a race. Please look at the following code snippet:

    ...
    pud = pud_offset(p4d, addr);
    if (sz != PUD_SIZE && pud_none(*pud))
            return NULL;
    /* hugepage or swap? */
    if (pud_huge(*pud) || !pud_present(*pud))
            return (pte_t *)pud;

    pmd = pmd_offset(pud, addr);
    if (sz != PMD_SIZE && pmd_none(*pmd))
            return NULL;
    /* hugepage or swap? */
    if (pmd_huge(*pmd) || !pmd_present(*pmd))
            return (pte_t *)pmd;
    ...

The following sequence would trigger this bug:

 - CPU0: sz = PUD_SIZE and *pud = 0, continue
 - CPU0: "pud_huge(*pud)" is false
 - CPU1: calling hugetlb_no_page and set *pud to xxxx8e7 (PRESENT)
 - CPU0: "!pud_present(*pud)" is false, continue
 - CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp

However, we want CPU0 to return NULL or pudp in this case. We must make sure there is exactly one dereference of pud and pmd.

Signed-off-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
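One way to get that single dereference is to snapshot the entry once, e.g. with READ_ONCE(), and test only the snapshot; the fragment below is a sketch of that idea against the snippet quoted above, and may differ in detail from the actual upstream patch.

    /* Illustrative sketch: one snapshot of the entry, every decision
     * is made against that snapshot rather than re-reading *pud. */
    pud_t pude = READ_ONCE(*pud);

    if (sz != PUD_SIZE && pud_none(pude))
            return NULL;
    /* hugepage or swap? */
    if (pud_huge(pude) || !pud_present(pude))
            return (pte_t *)pud;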
-