- 23 Feb, 2019 1 commit
-
-
Thomas Gleixner authored
Merge tag 'irqchip-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier - Core pseudo-NMI handling code - Allow the default irq domain to be retrieved - A new interrupt controller for the Loongson LS1X platform - Affinity support for the SiFive PLIC - Better support for the iMX irqsteer driver - NUMA aware memory allocations for GICv3 - A handful of other fixes (i8259, GICv3, PLIC)
-
- 22 Feb, 2019 4 commits
-
-
Aisheng Dong authored
One irqsteer channel can support up to 8 output interrupts. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Aisheng Dong authored
One group can manage 64 interrupts by using two registers (e.g. STATUS/SET). However, the integrated irqsteer may support only 32 interrupts which needs only one register in a group. But the current driver assume there's a mininum of two registers in a group which result in a wrong register map for 32 interrupts per channel irqsteer. Let's use the reg_num caculated by interrupts per channel instead of irq_group to cover this case. Cc: Rob Herring <robh+dt@kernel.org> Cc: Shawn Guo <shawnguo@kernel.org> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Aisheng Dong authored
One irqsteer channel can support up to 8 output interrupts. Cc: Rob Herring <robh+dt@kernel.org> Cc: Shawn Guo <shawnguo@kernel.org> Cc: devicetree@vger.kernel.org Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Aisheng Dong authored
Not all 64 interrupts may be used in one group. e.g. most irqsteer in imx8qxp and imx8qm subsystems supports only 32 interrupts. As the IP integration parameters are Channel number and interrupts number, let's use fsl,irqs-num to represents how many interrupts supported by this irqsteer channel. Note this will break the compatibility of old binding. As the original fsl,irq-groups was born out of a misunderstanding of the HW config options and we are not aware of any users of the current binding. And the old binding was just published in recent months, so it's worth to change now to avoid confusing in the future. Cc: Rob Herring <robh+dt@kernel.org> Cc: Shawn Guo <shawnguo@kernel.org> Cc: devicetree@vger.kernel.org Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- 21 Feb, 2019 7 commits
-
-
Doug Berger authored
Using the irq_gc_lock/irq_gc_unlock functions in the suspend and resume functions creates the opportunity for a deadlock during suspend, resume, and shutdown. Using the irq_gc_lock_irqsave/ irq_gc_unlock_irqrestore variants prevents this possible deadlock. Cc: stable@vger.kernel.org Fixes: 7f646e92 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> [maz: tidied up $SUBJECT] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Shanker Donthineni authored
The NUMA node information is visible to ITS driver but not being used other than handling hardware errata. ITS/GICR hardware accesses to the local NUMA node is usually quicker than the remote NUMA node. How slow the remote NUMA accesses are depends on the implementation details. This patch allocates memory for ITS management tables and command queue from the corresponding NUMA node using the appropriate NUMA aware functions. This change improves the performance of the ITS tables read latency on systems where it has more than one ITS block, and with the slower inter node accesses. Apache Web server benchmarking using ab tool on a HiSilicon D06 board with multiple numa mem nodes shows Time per request and Transfer rate improvements of ~3.6% with this patch. Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The default irq domain allows legacy code to create irqdomain mappings without having to track the domain it is allocating from. Setting the default domain is a one shot, fire and forget operation, and no effort was made to be able to retrieve this information at a later point in time. Newer irqdomain APIs (the hierarchical stuff) relies on both the irqchip code to track the irqdomain it is allocating from, as well as some form of firmware abstraction to easily identify which piece of HW maps to which irq domain (DT, ACPI). For systems without such firmware (or legacy platform that are getting dragged into the 21st century), things are a bit harder. For these cases (and these cases only!), let's provide a way to retrieve the default domain, allowing the use of the v2 API without having to resort to platform-specific hacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Anup Patel authored
Currently on SMP host, all CPUs take external interrupts routed via PLIC. All CPUs will try to claim a given external interrupt but only one of them will succeed while other CPUs would simply resume whatever they were doing before. This means if we have N CPUs then for every external interrupt N-1 CPUs will always fail to claim it and waste their CPU time. Instead of above, external interrupts should be taken by only one CPU and we should have provision to explicitly specify IRQ affinity from kernel-space or user-space. This patch provides irq_set_affinity() implementation for PLIC driver. It also updates irq_enable() such that PLIC interrupts are only enabled for one of CPUs specified in IRQ affinity mask. With this patch in-place, we can change IRQ affinity at any-time from user-space using procfs. Example: / # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 8: 44 0 0 0 SiFive PLIC 8 virtio0 10: 48 0 0 0 SiFive PLIC 10 ttyS0 IPI0: 55 663 58 363 Rescheduling interrupts IPI1: 0 1 3 16 Function call interrupts / # / # / # echo 4 > /proc/irq/10/smp_affinity / # / # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 8: 45 0 0 0 SiFive PLIC 8 virtio0 10: 160 0 17 0 SiFive PLIC 10 ttyS0 IPI0: 68 693 77 410 Rescheduling interrupts IPI1: 0 2 3 16 Function call interrupts Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Anup Patel authored
We explicitly differentiate between PLIC handler and context because PLIC context is for given mode of HART whereas PLIC handler is per-CPU software construct meant for handling interrupts from a particular PLIC context. To achieve this differentiation, we rename "nr_handlers" to "nr_contexts" and "nr_mapped" to "nr_handlers" in plic_init(). Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Anup Patel authored
We have two enteries (one for M-mode and another for S-mode) in the interrupts-extended DT property of PLIC DT node for each HART. It is expected that firmware/bootloader will set M-mode HWIRQ line of each HART to 0xffffffff (i.e. -1) in interrupts-extended DT property because Linux runs in S-mode only. If firmware/bootloader is buggy then it will not correctly update interrupts-extended DT property which might result in a plic_handler configured twice. This patch adds a warning in plic_init() if a plic_handler is already marked present. This warning provides us a hint about incorrectly updated interrupts-extended DT property. Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Anup Patel authored
This patch does following optimizations: 1. Pre-compute hart base for each context handler 2. Pre-compute enable base for each context handler 3. Have enable lock for each context handler instead of global plic_toggle_lock Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- 18 Feb, 2019 6 commits
-
-
Thomas Gleixner authored
Multiple interrupt sets for affinity spreading are now handled in the core code and the number of sets and their size is recalculated via a driver supplied callback. That avoids the requirement to invoke pci_alloc_irq_vectors_affinity() with the arguments minvecs and maxvecs set to the same value and the callsite handling the ENOSPC situation. Remove the now obsolete sanity checks and the related comments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.778630549@linutronix.de
-
Thomas Gleixner authored
Now that the NVME driver is converted over to the calc_set() callback, the workarounds of the original set support can be removed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.689834224@linutronix.de
-
Ming Lei authored
The NVME PCI driver contains a tedious mechanism for interrupt allocation, which is necessary to adjust the number and size of interrupt sets to the maximum available number of interrupts which depends on the underlying PCI capabilities and the available CPU resources. It works around the former short comings of the PCI and core interrupt allocation mechanims in combination with interrupt sets. The PCI interrupt allocation function allows to provide a maximum and a minimum number of interrupts to be allocated and tries to allocate as many as possible. This worked without driver interaction as long as there was only a single set of interrupts to handle. With the addition of support for multiple interrupt sets in the generic affinity spreading logic, which is invoked from the PCI interrupt allocation, the adaptive loop in the PCI interrupt allocation did not work for multiple interrupt sets. The reason is that depending on the total number of interrupts which the PCI allocation adaptive loop tries to allocate in each step, the number and the size of the interrupt sets need to be adapted as well. Due to the way the interrupt sets support was implemented there was no way for the PCI interrupt allocation code or the core affinity spreading mechanism to invoke a driver specific function for adapting the interrupt sets configuration. As a consequence the driver had to implement another adaptive loop around the PCI interrupt allocation function and calling that with maximum and minimum interrupts set to the same value. This ensured that the allocation either succeeded or immediately failed without any attempt to adjust the number of interrupts in the PCI code. The core code now allows drivers to provide a callback to recalculate the number and the size of interrupt sets during PCI interrupt allocation, which in turn allows the PCI interrupt allocation function to be called in the same way as with a single set of interrupts. The PCI code handles the adaptive loop and the interrupt affinity spreading mechanism invokes the driver callback to adapt the interrupt set configuration to the current loop value. This replaces the adaptive loop in the driver completely. Implement the NVME specific callback which adjusts the interrupt sets configuration and remove the adaptive allocation loop. [ tglx: Simplify the callback further and restore the dropped adjustment of number of sets ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.602546658@linutronix.de
-
Ming Lei authored
The interrupt affinity spreading mechanism supports to spread out affinities for one or more interrupt sets. A interrupt set contains one or more interrupts. Each set is mapped to a specific functionality of a device, e.g. general I/O queues and read I/O queus of multiqueue block devices. The number of interrupts per set is defined by the driver. It depends on the total number of available interrupts for the device, which is determined by the PCI capabilites and the availability of underlying CPU resources, and the number of queues which the device provides and the driver wants to instantiate. The driver passes initial configuration for the interrupt allocation via a pointer to struct irq_affinity. Right now the allocation mechanism is complex as it requires to have a loop in the driver to determine the maximum number of interrupts which are provided by the PCI capabilities and the underlying CPU resources. This loop would have to be replicated in every driver which wants to utilize this mechanism. That's unwanted code duplication and error prone. In order to move this into generic facilities it is required to have a mechanism, which allows the recalculation of the interrupt sets and their size, in the core code. As the core code does not have any knowledge about the underlying device, a driver specific callback is required in struct irq_affinity, which can be invoked by the core code. The callback gets the number of available interupts as an argument, so the driver can calculate the corresponding number and size of interrupt sets. At the moment the struct irq_affinity pointer which is handed in from the driver and passed through to several core functions is marked 'const', but for the callback to be able to modify the data in the struct it's required to remove the 'const' qualifier. Add the optional callback to struct irq_affinity, which allows drivers to recalculate the number and size of interrupt sets and remove the 'const' qualifier. For simple invocations, which do not supply a callback, a default callback is installed, which just sets nr_sets to 1 and transfers the number of spreadable vectors to the set_size array at index 0. This is for now guarded by a check for nr_sets != 0 to keep the NVME driver working until it is converted to the callback mechanism. To make sure that the driver configuration is correct under all circumstances the callback is invoked even when there are no interrupts for queues left, i.e. the pre/post requirements already exhaust the numner of available interrupts. At the PCI layer irq_create_affinity_masks() has to be invoked even for the case where the legacy interrupt is used. That ensures that the callback is invoked and the device driver can adjust to that situation. [ tglx: Fixed the simple case (no sets required). Moved the sanity check for nr_sets after the invocation of the callback so it catches broken drivers. Fixed the kernel doc comments for struct irq_affinity and de-'This patch'-ed the changelog ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de
-
Ming Lei authored
The interrupt affinity spreading mechanism supports to spread out affinities for one or more interrupt sets. A interrupt set contains one or more interrupts. Each set is mapped to a specific functionality of a device, e.g. general I/O queues and read I/O queus of multiqueue block devices. The number of interrupts per set is defined by the driver. It depends on the total number of available interrupts for the device, which is determined by the PCI capabilites and the availability of underlying CPU resources, and the number of queues which the device provides and the driver wants to instantiate. The driver passes initial configuration for the interrupt allocation via a pointer to struct irq_affinity. Right now the allocation mechanism is complex as it requires to have a loop in the driver to determine the maximum number of interrupts which are provided by the PCI capabilities and the underlying CPU resources. This loop would have to be replicated in every driver which wants to utilize this mechanism. That's unwanted code duplication and error prone. In order to move this into generic facilities it is required to have a mechanism, which allows the recalculation of the interrupt sets and their size, in the core code. As the core code does not have any knowledge about the underlying device, a driver specific callback will be added to struct affinity_desc, which will be invoked by the core code. The callback will get the number of available interupts as an argument, so the driver can calculate the corresponding number and size of interrupt sets. To support this, two modifications for the handling of struct irq_affinity are required: 1) The (optional) interrupt sets size information is contained in a separate array of integers and struct irq_affinity contains a pointer to it. This is cumbersome and as the maximum number of interrupt sets is small, there is no reason to have separate storage. Moving the size array into struct affinity_desc avoids indirections and makes the code simpler. 2) At the moment the struct irq_affinity pointer which is handed in from the driver and passed through to several core functions is marked 'const'. With the upcoming callback to recalculate the number and size of interrupt sets, it's necessary to remove the 'const' qualifier. Otherwise the callback would not be able to update the data. Implement #1 and store the interrupt sets size in 'struct irq_affinity'. No functional change. [ tglx: Fixed the memcpy() size so it won't copy beyond the size of the source. Fixed the kernel doc comments for struct irq_affinity and de-'This patch'-ed the changelog ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.423723127@linutronix.de
-
Thomas Gleixner authored
All information and calculations in the interrupt affinity spreading code is strictly unsigned int. Though the code uses int all over the place. Convert it over to unsigned int. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: Keith Busch <keith.busch@intel.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com> Link: https://lkml.kernel.org/r/20190216172228.336424556@linutronix.de
-
- 14 Feb, 2019 7 commits
-
-
Thomas Gleixner authored
Pick up upstream changes to avoid conflicts for pending patches.
-
Atish Patra authored
riscv_hartid_to_cpuid can return invalid cpuid for a hart that is present in DT but was never brought up. Print the appropriate warning message and continue. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Aaro Koskinen authored
When using cpufreq on Loongson 2F MIPS platform, "poweroff" command gets frequently stuck in syscore_shutdown(). The reason is that i8259A_shutdown() gets called before cpufreq_suspend(), and if we have pending work then irq_work_sync() in cpufreq_dbs_governor_stop() gets stuck forever as we have all interrupts masked already. irq-i8259 is registering syscore_ops using device_initcall(), while cpufreq uses core_initcall(). Fix the shutdown order simply by registering the irq syscore_ops during the early IRQ init instead of using a separate initcall at later stage. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Jiaxun Yang authored
Dt-bindings doc about Loongson-1 interrupt controller. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Jiaxun Yang authored
This controller appeared on Loongson-1 family MCUs including Loongson-1B and Loongson-1C. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Zenghui Yu authored
In current logic, its_parse_indirect_baser() will be invoked twice when allocating Device tables. Add a *break* to omit the unnecessary and annoying (might be ...) invoking. Fixes: 32bd44dc ("irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size") Cc: stable@vger.kernel.org Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Julien Thierry authored
ready_percpu_nmi() was the previous name of prepare_percpu_nmi(). Update request_percpu_nmi() comment with the correct function name. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Reported-by: Li Wei <liwei391@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- 13 Feb, 2019 1 commit
-
-
Waiman Long authored
Commit: 1136b072 ("genirq: Avoid summation loops for /proc/stat") adds a new irq_desc::tot_count field, without documenting it. Add the missing piece of documentation. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1549983253-19107-1-git-send-email-longman@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 10 Feb, 2019 5 commits
-
-
Matthias Kaehlcke authored
When a CPU is unplugged the kernel threads of this CPU are parked (see smpboot_park_threads()). kthread_park() is used to mark each thread as parked and wake it up, so it can complete the process of parking itselfs (see smpboot_thread_fn()). If local softirqs are pending on interrupt exit invoke_softirq() is called to process the softirqs, however it skips processing when the softirq kernel thread of the local CPU is scheduled to run. The softirq kthread is one of the threads that is parked when a CPU is unplugged. Parking the kthread wakes it up, however only to complete the parking process, not to process the pending softirqs. Hence processing of softirqs at the end of an interrupt is skipped, but not done elsewhere, which can result in warnings about pending softirqs when a CPU is unplugged: /sys/devices/system/cpu # echo 0 > cpu4/online [ ... ] NOHZ: local_softirq_pending 02 [ ... ] NOHZ: local_softirq_pending 202 [ ... ] CPU4: shutdown [ ... ] psci: CPU4 killed. Don't skip processing of softirqs at the end of an interrupt when the softirq thread of the CPU is parking. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Douglas Anderson <dianders@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Link: https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
-
Matthias Kaehlcke authored
kthread_should_park() is used to check if the calling kthread ('current') should park, but there is no function to check whether an arbitrary kthread should be parked. The latter is required to plug a CPU hotplug race vs. a parking ksoftirqd thread. The new __kthread_should_park() receives a task_struct as parameter to check if the corresponding kernel thread should be parked. Call __kthread_should_park() from kthread_should_park() to avoid code duplication. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Douglas Anderson <dianders@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Link: https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
-
Thomas Gleixner authored
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. The interrupt core provides now a per interrupt summary counter which can be used to avoid the summation loops completely except for interrupts marked PER_CPU which are only a small fraction of the interrupt space if at all. Another simplification is to iterate only over the active interrupts and skip the potentially large gaps in the interrupt number space and just print zeros for the gaps without going into the interrupt core in the first place. Waiman provided test results from a 4-socket IvyBridge-EX system (60-core 120-thread, 3016 irqs) excuting a test program which reads /proc/stat 50,000 times: Before: 18.436s (sys 18.380s) After: 3.769s (sys 3.742s) Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de
-
Thomas Gleixner authored
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. This can be largely avoided for interrupts which are not marked as 'PER_CPU' interrupts by simply adding a per interrupt summation counter which is incremented along with the per interrupt per cpu counter. The PER_CPU interrupts need to avoid that and use only per cpu accounting because they share the interrupt number and the interrupt descriptor and concurrent updates would conflict or require unwanted synchronization. Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de 8<------------- v2: Undo the unintentional layout change of struct irq_desc. include/linux/irqdesc.h | 1 + kernel/irq/chip.c | 12 ++++++++++-- kernel/irq/internals.h | 8 +++++++- kernel/irq/irqdesc.c | 7 ++++++- 4 files changed, 24 insertions(+), 4 deletions(-)
-
Ming Lei authored
'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(), so move it into irq_build_affinity_masks(). No functioanl change. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com
-
- 07 Feb, 2019 9 commits
-
-
git://git.infradead.org/linux-platform-drivers-x86Linus Torvalds authored
Pull x86 platform driver fixlet from Darren Hart: "Correct Documentation/ABI 4.21 KernelVersion to 5.0" * tag 'platform-drivers-x86-v5.0-2' of git://git.infradead.org/linux-platform-drivers-x86: Documentation/ABI: Correct mlxreg-io KernelVersion for 5.0
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM fixes from Paolo Bonzini: "Three security fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
-
git://linux-nfs.org/~bfields/linuxLinus Torvalds authored
Pull nfsd fixes from Bruce Fields: "Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug" * tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux: svcrdma: Remove max_sge check at connect time nfsd: Fix error return values for nfsd4_clone_file_range()
-
Linus Torvalds authored
Merge tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Both of these fixes address issues in changes merged for 5.0-rc4: - Fix DM core's missing memory barrier before waitqueue_active() calls. - Fix DM core's clone_bio() to work when cloning a subset of a bio with an integrity payload; bio_integrity_trim() wasn't getting called due to bio_trim()'s early return" * tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: don't use bio_trim() afterall dm: add memory barrier before waitqueue_active
-
Peter Shier authored
Bugzilla: 1671904 There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit where the hrtimer is normally cancelled. Unconditionally cancel in free_nested to cover all cases. Embargoed until Feb 7th 2019. Signed-off-by: Peter Shier <pshier@google.com> Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org Message-Id: <20181011184646.154065-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Bugzilla: 1671930 Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with memory operand, INVEPT, INVVPID) can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Embargoed until Feb 7th 2019. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jann Horn authored
kvm_ioctl_create_device() does the following: 1. creates a device that holds a reference to the VM object (with a borrowed reference, the VM's refcount has not been bumped yet) 2. initializes the device 3. transfers the reference to the device to the caller's file descriptor table 4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real reference The ownership transfer in step 3 must not happen before the reference to the VM becomes a proper, non-borrowed reference, which only happens in step 4. After step 3, an attacker can close the file descriptor and drop the borrowed reference, which can cause the refcount of the kvm object to drop to zero. This means that we need to grab a reference for the device before anon_inode_getfd(), otherwise the VM can disappear from under us. Fixes: 852b6d57 ("kvm: add device control API") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID fix from Jiri Kosina: "A fix for a bug in hid-debug that can lock up the kernel in infinite loop (CVE-2019-3819), from Vladis Dronov" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: debug: fix the ring buffer implementation
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "A collection of a few small fixes. The most significant one is the fix for the possible race at loading HD-audio drivers. This has been present for long time and surfaced only in a rare occasion, but finally spotted out" * tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/ca0132 - Fix build error without CONFIG_PCI ALSA: compress: Fix stop handling on compressed capture streams ALSA: usb-audio: Add support for new T+A USB DAC ALSA: hda - Serialize codec registrations ALSA: hda/realtek - Use a common helper for hp pin reference ALSA: hda/realtek - Fix lose hp_pins for disable auto mute ALSA: hda/realtek - Headset microphone support for System76 darp5
-