- 12 Jun, 2019 1 commit
-
-
Joel Savitz authored
In the case that a process is constrained by taskset(1) (i.e. sched_setaffinity(2)) to a subset of available cpus, and all of those are subsequently offlined, the scheduler will set tsk->cpus_allowed to the current value of task_cs(tsk)->effective_cpus. This is done via a call to do_set_cpus_allowed() in the context of cpuset_cpus_allowed_fallback() made by the scheduler when this case is detected. This is the only call made to cpuset_cpus_allowed_fallback() in the latest mainline kernel. However, this is not sane behavior. I will demonstrate this on a system running the latest upstream kernel with the following initial configuration: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffff Cpus_allowed_list: 0-63 (Where cpus 32-63 are provided via smt.) If we limit our current shell process to cpu2 only and then offline it and reonline it: # taskset -p 4 $$ pid 2272's current affinity mask: ffffffffffffffff pid 2272's new affinity mask: 4 # echo off > /sys/devices/system/cpu/cpu2/online # dmesg | tail -3 [ 2195.866089] process 2272 (bash) no longer affine to cpu2 [ 2195.872700] IRQ 114: no longer affine to CPU2 [ 2195.879128] smpboot: CPU 2 is now offline # echo on > /sys/devices/system/cpu/cpu2/online # dmesg | tail -1 [ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4 We see that our current process now has an affinity mask containing every cpu available on the system _except_ the one we originally constrained it to: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,fffffffb Cpus_allowed_list: 0-1,3-63 This is not sane behavior, as the scheduler can now not only place the process on previously forbidden cpus, it can't even schedule it on the cpu it was originally constrained to! Other cases result in even more exotic affinity masks. Take for instance a process with an affinity mask containing only cpus provided by smt at the moment that smt is toggled, in a configuration such as the following: # taskset -p f000000000 $$ # grep -i cpu /proc/$$/status Cpus_allowed: 000000f0,00000000 Cpus_allowed_list: 36-39 A double toggle of smt results in the following behavior: # echo off > /sys/devices/system/cpu/smt/control # echo on > /sys/devices/system/cpu/smt/control # grep -i cpus /proc/$$/status Cpus_allowed: ffffff00,ffffffff Cpus_allowed_list: 0-31,40-63 This is even less sane than the previous case, as the new affinity mask excludes all smt-provided cpus with ids less than those that were previously in the affinity mask, as well as those that were actually in the mask. With this patch applied, both of these cases end in the following state: # grep -i cpu /proc/$$/status Cpus_allowed: ffffffff,ffffffff Cpus_allowed_list: 0-63 The original policy is discarded. Though not ideal, it is the simplest way to restore sanity to this fallback case without reinventing the cpuset wheel that rolls down the kernel just fine in cgroup v2. A user who wishes for the previous affinity mask to be restored in this fallback case can use that mechanism instead. This patch modifies scheduler behavior by instead resetting the mask to task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy mode. I tested the cases above on both modes. Note that the scheduler uses this fallback mechanism if and only if _every_ other valid avenue has been traveled, and it is the last resort before calling BUG(). Suggested-by: Waiman Long <longman@redhat.com> Suggested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Joel Savitz <jsavitz@redhat.com> Acked-by: Phil Auld <pauld@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 10 Jun, 2019 1 commit
-
-
Tejun Heo authored
While adding handling for dying task group leaders c03cd773 ("cgroup: Include dying leaders with live threads in PROCS iterations") added an inverted cset skip condition to css_task_iter_advance_css_set(). It should skip cset if it's completely empty but was incorrectly testing for the inverse condition for the dying_tasks list. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: c03cd773 ("cgroup: Include dying leaders with live threads in PROCS iterations") Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com
-
- 05 Jun, 2019 1 commit
-
-
Tejun Heo authored
b636fd38 ("cgroup: Implement css_task_iter_skip()") introduced css_task_iter_skip() which is used to fix task iterations skipping dying threadgroup leaders with live threads. Skipping is implemented as a subportion of full advancing but css_task_iter_next() forgot to fully advance a skipped iterator before determining the next task to visit causing it to return invalid task pointers. Fix it by making css_task_iter_next() fully advance the iterator if it has been skipped since the previous iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com Fixes: b636fd38 ("cgroup: Implement css_task_iter_skip()")
-
- 31 May, 2019 3 commits
-
-
Tejun Heo authored
CSS_TASK_ITER_PROCS currently iterates live group leaders; however, this means that a process with dying leader and live threads will be skipped. IOW, cgroup.procs might be empty while cgroup.threads isn't, which is confusing to say the least. Fix it by making cset track dying tasks and include dying leaders with live threads in PROCS iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com>
-
Tejun Heo authored
When a task is moved out of a cset, task iterators pointing to the task are advanced using the normal css_task_iter_advance() call. This is fine but we'll be tracking dying tasks on csets and thus moving tasks from cset->tasks to (to be added) cset->dying_tasks. When we remove a task from cset->tasks, if we advance the iterators, they may move over to the next cset before we had the chance to add the task back on the dying list, which can allow the task to escape iteration. This patch separates out skipping from advancing. Skipping only moves the affected iterators to the next pointer rather than fully advancing it and the following advancing will recognize that the cursor has already been moved forward and do the rest of advancing. This ensures that when a task moves from one list to another in its cset, as long as it moves in the right direction, it's always visible to iteration. This doesn't cause any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
-
Tejun Heo authored
cgroup_release() calls cgroup_subsys->release() which is used by the pids controller to uncharge its pid. We want to use it to manage iteration of dying tasks which requires putting it before __unhash_process(). Move cgroup_release() above __exit_signal(). While this makes it uncharge before the pid is freed, pid is RCU freed anyway and the window is very narrow. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
-
- 30 May, 2019 1 commit
-
-
Odin Ugedal authored
Add another example to clarify that HugePages smaller than 1MB will be displayed using "KB", with an uppercased K (eg. 20KB), and not the normal SI prefix kilo (small k). Because of a misunderstanding/copy-paste error inside runc (see https://github.com/opencontainers/runc/pull/2065), it tried accessing the cgroup control file of a 64kB HugePage using "hugetlb.64kB._____" instead of the correct "hugetlb.64KB._____". Adding a new example will make it clear how sizes smaller than 1MB are handled. Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 29 May, 2019 1 commit
-
-
Tejun Heo authored
A PF_EXITING task can stay associated with an offline css. If such task calls task_get_css(), it can get stuck indefinitely. This can be triggered by BSD process accounting which writes to a file with PF_EXITING set when racing against memcg disable as in the backtrace at the end. After this change, task_get_css() may return a css which was already offline when the function was called. None of the existing users are affected by this change. INFO: rcu_sched self-detected stall on CPU INFO: rcu_sched detected stalls on CPUs/tasks: ... NMI backtrace for cpu 0 ... Call Trace: <IRQ> dump_stack+0x46/0x68 nmi_cpu_backtrace.cold.2+0x13/0x57 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x9e/0xce rcu_check_callbacks.cold.74+0x2af/0x433 update_process_times+0x28/0x60 tick_sched_timer+0x34/0x70 __hrtimer_run_queues+0xee/0x250 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x56/0x110 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0 ... btrfs_file_write_iter+0x31b/0x563 __vfs_write+0xfa/0x140 __kernel_write+0x4f/0x100 do_acct_process+0x495/0x580 acct_process+0xb9/0xdb do_exit+0x748/0xa00 do_group_exit+0x3a/0xa0 get_signal+0x254/0x560 do_signal+0x23/0x5c0 exit_to_usermode_loop+0x5d/0xa0 prepare_exit_to_usermode+0x53/0x80 retint_user+0x8/0x8 Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v4.2+ Fixes: ec438699 ("cgroup, block: implement task_get_css() and use it in bio_associate_current()")
-
- 28 May, 2019 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: "The commits that stand out are the Intel fixes that arrived during the merge window and I got relayed by pull request from Andy. Apart from that a minor Kconfig noise. - Interrupt clearing fix for the Intel pin controllers affecting touchpads on some laptops. - Compile Kconfig fix for the STMFX expander pin controller" * tag 'pinctrl-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: stmfx: Fix compile issue when CONFIG_OF_GPIO is not defined pinctrl: intel: Clear interrupt status in mask/unmask callback pinctrl: intel: Use GENMASK() consistently
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpioLinus Torvalds authored
Pull GPIO fix from Linus Walleij: "Fix a build error in gpio-adp5588" * tag 'gpio-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: fix gpio-adp5588 build errors
-
Randy Dunlap authored
Fix build errors on ia64 when DISCONTIGMEM=y and NUMA=y by exporting paddr_to_nid(). Fixes these build errors: ERROR: "paddr_to_nid" [sound/core/snd-pcm.ko] undefined! ERROR: "paddr_to_nid" [net/sunrpc/sunrpc.ko] undefined! ERROR: "paddr_to_nid" [fs/cifs/cifs.ko] undefined! ERROR: "paddr_to_nid" [drivers/video/fbdev/core/fb.ko] undefined! ERROR: "paddr_to_nid" [drivers/usb/mon/usbmon.ko] undefined! ERROR: "paddr_to_nid" [drivers/usb/core/usbcore.ko] undefined! ERROR: "paddr_to_nid" [drivers/md/raid1.ko] undefined! ERROR: "paddr_to_nid" [drivers/md/dm-mod.ko] undefined! ERROR: "paddr_to_nid" [drivers/md/dm-crypt.ko] undefined! ERROR: "paddr_to_nid" [drivers/md/dm-bufio.ko] undefined! ERROR: "paddr_to_nid" [drivers/ide/ide-core.ko] undefined! ERROR: "paddr_to_nid" [drivers/ide/ide-cd_mod.ko] undefined! ERROR: "paddr_to_nid" [drivers/gpu/drm/drm.ko] undefined! ERROR: "paddr_to_nid" [drivers/char/agp/agpgart.ko] undefined! ERROR: "paddr_to_nid" [drivers/block/nbd.ko] undefined! ERROR: "paddr_to_nid" [drivers/block/loop.ko] undefined! ERROR: "paddr_to_nid" [drivers/block/brd.ko] undefined! ERROR: "paddr_to_nid" [crypto/ccm.ko] undefined! Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 May, 2019 1 commit
-
-
Linus Walleij authored
Merge tag 'intel-pinctrl-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel into fixes intel-pinctrl for v5.2-2 Fix a laggish ELAN touchpad responsiveness due to an odd interrupt masking. The following is an automated git shortlog grouped by driver: intel: - Clear interrupt status in mask/unmask callback - Use GENMASK() consistently
-
- 26 May, 2019 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull tracing warning fix from Steven Rostedt: "Make the GCC 9 warning for sub struct memset go away. GCC 9 now warns about calling memset() on partial structures when it goes across multiple fields. This adds a helper for the place in tracing that does this type of clearing of a structure" * tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Silence GCC 9 array bounds warning
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM fixes from Paolo Bonzini: "The usual smattering of fixes and tunings that came in too late for the merge window, but should not wait four months before they appear in a release. I also travelled a bit more than usual in the first part of May, which didn't help with picking up patches and reports promptly" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (33 commits) KVM: x86: fix return value for reserved EFER tools/kvm_stat: fix fields filter for child events KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard kvm: selftests: aarch64: compile with warnings on kvm: selftests: aarch64: fix default vm mode kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION KVM: x86/pmu: do not mask the value that is written to fixed PMUs KVM: x86/pmu: mask the result of rdpmc according to the width of the counters x86/kvm/pmu: Set AMD's virt PMU version to 1 KVM: x86: do not spam dmesg with VMCS/VMCB dumps kvm: Check irqchip mode before assign irqfd kvm: svm/avic: fix off-by-one in checking host APIC ID KVM: selftests: do not blindly clobber registers in guest asm KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow kvm: vmx: Fix -Wmissing-prototypes warnings KVM: nVMX: Fix using __this_cpu_read() in preemptible context kvm: fix compilation on s390 ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/randomLinus Torvalds authored
Pull /dev/random fix from Ted Ts'o: "Fix a soft lockup regression when reading from /dev/random in early boot" * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: fix soft lockup when trying to read from an uninitialized blocking pool
-
Theodore Ts'o authored
Fixes: eb9d1bf0: "random: only read from /dev/random after its pool has received 128 bits" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Miguel Ojeda authored
Starting with GCC 9, -Warray-bounds detects cases when memset is called starting on a member of a struct but the size to be cleared ends up writing over further members. Such a call happens in the trace code to clear, at once, all members after and including `seq` on struct trace_iterator: In function 'memset', inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3: ./include/linux/string.h:344:9: warning: '__builtin_memset' offset [8505, 8560] from the object at 'iter' is out of the bounds of referenced subobject 'seq' with type 'struct trace_seq' at offset 4368 [-Warray-bounds] 344 | return __builtin_memset(p, c, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to avoid GCC complaining about it, we compute the address ourselves by adding the offsetof distance instead of referring directly to the member. Since there are two places doing this clear (trace.c and trace_kdb.c), take the chance to move the workaround into a single place in the internal header. Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.comSigned-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [ Removed unnecessary parenthesis around "iter" ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 25 May, 2019 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o: "Bug fixes (including a regression fix) for ext4" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix dcache lookup of !casefolded directories ext4: do not delete unlinked inode from orphan list on failed truncate ext4: wait for outstanding dio during truncate in nojournal mode ext4: don't perform block validity checks on the journal inode
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds authored
Pull libnvdimm fixes from Dan Williams: - Fix a regression that disabled device-mapper dax support - Remove unnecessary hardened-user-copy overhead (>30%) for dax read(2)/write(2). - Fix some compilation warnings. * tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead dax: Arrange for dax_supported check to span multiple devices libnvdimm: Fix compilation warnings with W=1
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: "Tom Zanussi sent me some small fixes and cleanups to the histogram code and I forgot to incorporate them. I also added a small clean up patch that was sent to me a while ago and I just noticed it" * tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kernel/trace/trace.h: Remove duplicate header of trace_seq.h tracing: Add a check_val() check before updating cond_snapshot() track_val tracing: Check keys for variable references in expressions too tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
-
Gabriel Krisman Bertazi authored
Found by visual inspection, this wasn't caught by my xfstest, since it's effect is ignoring positive dentries in the cache the fallback just goes to the disk. it was introduced in the last iteration of the case-insensitive patch. d_compare should return 0 when the entries match, so make sure we are correctly comparing the entire string if the encoding feature is set and we are on a case-INsensitive directory. Fixes: b886ee3e ("ext4: Support case-insensitive file name lookups") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "This is the same set of patches sent in the merge window as the final pull except that Martin's read only rework is replaced with a simple revert of the original change that caused the regression. Everything else is an obvious fix or small cleanup" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: Revert "scsi: sd: Keep disk read-only when re-reading partition" scsi: bnx2fc: fix incorrect cast to u64 on shift operation scsi: smartpqi: Reporting unhandled SCSI errors scsi: myrs: Fix uninitialized variable scsi: lpfc: Update lpfc version to 12.2.0.2 scsi: lpfc: add check for loss of ndlp when sending RRQ scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show scsi: lpfc: resolve lockdep warnings scsi: qedi: remove set but not used variables 'cdev' and 'udev' scsi: qedi: remove memset/memcpy to nfunc and use func instead scsi: qla2xxx: Add cleanup for PCI EEH recovery
-
- 24 May, 2019 17 commits
-
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: - NVMe pull request from Keith, with fixes from a few folks. - bio and sbitmap before atomic barrier fixes (Andrea) - Hang fix for blk-mq freeze and unfreeze (Bob) - Single segment count regression fix (Christoph) - AoE now has a new maintainer - tools/io_uring/ Makefile fix, and sync with liburing (me) * tag 'for-linus-20190524' of git://git.kernel.dk/linux-block: (23 commits) tools/io_uring: sync with liburing tools/io_uring: fix Makefile for pthread library link blk-mq: fix hang caused by freeze/unfreeze sequence block: remove the bi_seg_{front,back}_size fields in struct bio block: remove the segment size check in bio_will_gap block: force an unlimited segment size on queues with a virt boundary block: don't decrement nr_phys_segments for physically contigous segments sbitmap: fix improper use of smp_mb__before_atomic() bio: fix improper use of smp_mb__before_atomic() aoe: list new maintainer for aoe driver nvme-pci: use blk-mq mapping for unmanaged irqs nvme: update MAINTAINERS nvme: copy MTFA field from identify controller nvme: fix memory leak for power latency tolerance nvme: release namespace SRCU protection before performing controller ioctls nvme: merge nvme_ns_ioctl into nvme_ioctl nvme: remove the ifdef around nvme_nvm_ioctl nvme: fix srcu locking on error return in nvme_get_ns_from_disk nvme: Fix known effects nvme-pci: Sync queues on reset ...
-
Linus Torvalds authored
Merge tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - Two fixes to regressions introduced in kselftest Makefile test run output refactoring work (Kees Cook) - Adding Atom support to syscall_arg_fault test (Tong Bo) * tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/timers: Add missing fflush(stdout) calls selftests: Remove forced unbuffering for test running selftests/x86: Support Atom for syscall_arg_fault test
-
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds authored
Pull Devicetree fixes from Rob Herring: - Update checkpatch.pl to use DT vendor-prefixes.yaml - Fix DT binding references to files converted to DT schema - Clean-up Arm CPU binding examples to match schema - Add Sifive block versioning scheme documentation - Pass binding directory base to validation tools for reference lookups * tag 'devicetree-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: checkpatch.pl: Update DT vendor prefix check dt: bindings: mtd: replace references to nand.txt with nand-controller.yaml dt-bindings: interrupt-controller: arm,gic: Fix schema errors in example dt-bindings: arm: Clean up CPU binding examples dt: fix refs that were renamed to json with the same file name dt-bindings: Pass binding directory to validation tools dt-bindings: sifive: describe sifive-blocks versioning
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds authored
Pule more SPDX updates from Greg KH: "Here is another set of reviewed patches that adds SPDX tags to different kernel files, based on a set of rules that are being used to parse the comments to try to determine that the license of the file is "GPL-2.0-or-later". Only the "obvious" versions of these matches are included here, a number of "non-obvious" variants of text have been found but those have been postponed for later review and analysis. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers" * tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 ...
-
Waiman Long authored
The kernel test robot has reported that the use of __this_cpu_add() causes bug messages like: BUG: using __this_cpu_add() in preemptible [00000000] code: ... Given the imprecise nature of the count and the possibility of resetting the count and doing the measurement again, this is not really a big problem to use the unprotected __this_cpu_*() functions. To make the preemption checking code happy, the this_cpu_*() functions will be used if CONFIG_DEBUG_PREEMPT is defined. The imprecise nature of the locking counts are also documented with the suggestion that we should run the measurement a few times with the counts reset in between to get a better picture of what is going on under the hood. Fixes: a8654596 ("locking/rwsem: Enable lock event counting") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paolo Bonzini authored
Commit 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes", 2019-04-02) introduced a "return false" in a function returning int, and anyway set_efer has a "nonzero on error" conventon so it should be returning 1. Reported-by: Pavel Machek <pavel@denx.de> Fixes: 11988499 ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes") Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Stefan Raspl authored
The fields filter would not work with child fields, as the respective parents would not be included. No parents displayed == no childs displayed. To reproduce, run on s390 (would work on other platforms, too, but would require a different filter name): - Run 'kvm_stat -d' - Press 'f' - Enter 'instruct' Notice that events like instruction_diag_44 or instruction_diag_500 are not displayed - the output remains empty. With this patch, we will filter by matching events and their parents. However, consider the following example where we filter by instruction_diag_44: kvm statistics - summary regex filter: instruction_diag_44 Event Total %Total CurAvg/s exit_instruction 276 100.0 12 instruction_diag_44 256 92.8 11 Total 276 12 Note that the parent ('exit_instruction') displays the total events, but the childs listed do not match its total (256 instead of 276). This is intended (since we're filtering all but one child), but might be confusing on first sight. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Thomas Huth authored
struct kvm_nested_state is only available on x86 so far. To be able to compile the code on other architectures as well, we need to wrap the related code with #ifdefs. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
aarch64 fixups needed to compile with warnings as errors. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
VM_MODE_P52V48_4K is not a valid mode for AArch64. Replace its use in vm_create_default() with a mode that works and represents a good AArch64 default. (We didn't ever see a problem with this because we don't have any unit tests using vm_create_default(), but it's good to get it fixed in advance.) Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
The memory slot size must be aligned to the host's page size. When testing a guest with a 4k page size on a host with a 64k page size, then 3 guest pages are not host page size aligned. Since we just need a nearly arbitrary number of extra pages to ensure the memslot is not aligned to a 64 host-page boundary for this test, then we can use 16, as that's 64k aligned, but not 64 * 64k aligned. Fixes: 76d58e0f ("KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size", 2019-04-17) Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christian Borntraeger authored
kselftests exposed a problem in the s390 handling for memory slots. Right now we only do proper memory slot handling for creation of new memory slots. Neither MOVE, nor DELETION are handled properly. Let us implement those. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
According to the SDM, for MSR_IA32_PERFCTR0/1 "the lower-order 32 bits of each MSR may be written with any value, and the high-order 8 bits are sign-extended according to the value of bit 31", but the fixed counters in real hardware are limited to the width of the fixed counters ("bits beyond the width of the fixed-function counter are reserved and must be written as zeros"). Fix KVM to do the same. Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This patch will simplify the changes in the next, by enforcing the masking of the counters to RDPMC and RDMSR. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
After commit: 672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU") my AMD guests started #GPing like this: general protection fault: 0000 [#1] PREEMPT SMP CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:x86_perf_event_update+0x3b/0xa0 with Code: pointing to RDPMC. It is RDPMC because the guest has the hardware watchdog CONFIG_HARDLOCKUP_DETECTOR_PERF enabled which uses perf. Instrumenting kvm_pmu_rdpmc() some, showed that it fails due to: if (!pmu->version) return 1; which the above commit added. Since AMD's PMU leaves the version at 0, that causes the #GP injection into the guest. Set pmu->version arbitrarily to 1 and move it above the non-applicable struct kvm_pmu members. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Cc: Mihai Carabas <mihai.carabas@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org Fixes: 672ff6cf ("KVM: x86: Raise #GP when guest vCPU do not support PMU") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Userspace can easily set up invalid processor state in such a way that dmesg will be filled with VMCS or VMCB dumps. Disable this by default using a module parameter. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
When assigning kvm irqfd we didn't check the irqchip mode but we allow KVM_IRQFD to succeed with all the irqchip modes. However it does not make much sense to create irqfd even without the kernel chips. Let's provide a arch-dependent helper to check whether a specific irqfd is allowed by the arch. At least for x86, it should make sense to check: - when irqchip mode is NONE, all irqfds should be disallowed, and, - when irqchip mode is SPLIT, irqfds that are with resamplefd should be disallowed. For either of the case, previously we'll silently ignore the irq or the irq ack event if the irqchip mode is incorrect. However that can cause misterious guest behaviors and it can be hard to triage. Let's fail KVM_IRQFD even earlier to detect these incorrect configurations. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Radim Krčmář <rkrcmar@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-