1. 23 Feb, 2019 1 commit
    • Merge tag 'irqchip-5.1' of... · a324ca9c
      Thomas Gleixner authored
      Merge tag 'irqchip-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
      
      Pull irqchip updates from Marc Zyngier
      
      - Core pseudo-NMI handling code
      - Allow the default irq domain to be retrieved
      - A new interrupt controller for the Loongson LS1X platform
      - Affinity support for the SiFive PLIC
      - Better support for the iMX irqsteer driver
      - NUMA aware memory allocations for GICv3
      - A handful of other fixes (i8259, GICv3, PLIC)
      a324ca9c
  2. 22 Feb, 2019 4 commits
  3. 21 Feb, 2019 7 commits
    • irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code · 33517881
      Doug Berger authored
      Using the irq_gc_lock/irq_gc_unlock functions in the suspend and
      resume functions creates the opportunity for a deadlock during
      suspend, resume, and shutdown. Using the irq_gc_lock_irqsave/
      irq_gc_unlock_irqrestore variants prevents this possible deadlock.
      
      Cc: stable@vger.kernel.org
      Fixes: 7f646e92 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      [maz: tidied up $SUBJECT]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      33517881
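
      A minimal sketch of the locking pattern this change adopts; the handler
      name and body below are illustrative, not the actual driver code:

        static void brcmstb_l2_intc_suspend(struct irq_data *d)
        {
                struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
                unsigned long flags;

                /* The _irqsave variants are safe in non-interrupt context with
                 * interrupts enabled, avoiding the possible deadlock during
                 * suspend/resume/shutdown. */
                irq_gc_lock_irqsave(gc, flags);
                /* ... save or mask the generic chip register state ... */
                irq_gc_unlock_irqrestore(gc, flags);
        }
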
    • irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables · 539d3782
      Shanker Donthineni authored
      The NUMA node information is visible to the ITS driver but is not used
      for anything other than handling hardware errata. ITS/GICR hardware
      accesses to the local NUMA node are usually quicker than accesses to a
      remote node; how much slower the remote accesses are depends on the
      implementation.

      This patch allocates memory for the ITS management tables and command
      queue from the corresponding NUMA node using the appropriate NUMA-aware
      functions. This change improves ITS table read latency on systems with
      more than one ITS block and slower inter-node accesses.

      Apache web server benchmarking with the ab tool on a HiSilicon D06
      board with multiple NUMA memory nodes shows Time-per-request and
      Transfer-rate improvements of ~3.6% with this patch.
      Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
      Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      539d3782
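
      A hedged sketch of the NUMA-aware allocation pattern described above;
      its->numa_node is the node reported for this ITS, and the surrounding
      names (sz, the error path) are illustrative rather than the exact
      driver code:

        int node = its->numa_node;     /* NUMA node local to this ITS */
        struct page *page;

        /* Allocate the table/command-queue pages on the ITS-local node when
         * known, falling back to the current node otherwise. */
        page = alloc_pages_node(node != NUMA_NO_NODE ? node : numa_node_id(),
                                GFP_KERNEL | __GFP_ZERO, get_order(sz));
        if (!page)
                return -ENOMEM;
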
    • irqdomain: Allow the default irq domain to be retrieved · 9f199dd3
      Marc Zyngier authored
      The default irq domain allows legacy code to create irqdomain
      mappings without having to track the domain it is allocating
      from. Setting the default domain is a one-shot, fire-and-forget
      operation, and no effort was made to retrieve this information at
      a later point in time.

      Newer irqdomain APIs (the hierarchical ones) rely both on the
      irqchip code tracking the irqdomain it is allocating from, and on
      some form of firmware abstraction (DT, ACPI) to easily identify
      which piece of HW maps to which irq domain.

      For systems without such firmware (or legacy platforms that are
      getting dragged into the 21st century), things are a bit harder.
      For these cases (and these cases only!), let's provide a way to
      retrieve the default domain, allowing the use of the v2 API
      without having to resort to platform-specific hacks.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      9f199dd3
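
      A minimal usage sketch, assuming the accessor added here is
      irq_get_default_host() and that platform code has already set the
      default domain; hwirq is a placeholder hardware interrupt number:

        struct irq_domain *d = irq_get_default_host();
        unsigned int virq;

        if (!d)
                return -ENODEV;

        /* Map a hardware interrupt through the retrieved default domain. */
        virq = irq_create_mapping(d, hwirq);
        if (!virq)
                return -EINVAL;
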
    • irqchip/sifive-plic: Implement irq_set_affinity() for SMP host · cc9f04f9
      Anup Patel authored
      Currently on an SMP host, all CPUs take external interrupts routed via
      the PLIC. All CPUs will try to claim a given external interrupt, but
      only one of them will succeed while the other CPUs simply resume
      whatever they were doing before. This means that if we have N CPUs,
      then for every external interrupt N-1 CPUs will always fail to claim
      it and waste their CPU time.

      Instead of the above, external interrupts should be taken by only one
      CPU, and we should have a provision to explicitly specify IRQ affinity
      from kernel space or user space.

      This patch provides an irq_set_affinity() implementation for the PLIC
      driver. It also updates irq_enable() such that PLIC interrupts are only
      enabled for one of the CPUs specified in the IRQ affinity mask.

      With this patch in place, we can change IRQ affinity at any time from
      user space using procfs.
      
      Example:
      
      / # cat /proc/interrupts
                 CPU0       CPU1       CPU2       CPU3
        8:         44          0          0          0  SiFive PLIC   8  virtio0
       10:         48          0          0          0  SiFive PLIC  10  ttyS0
      IPI0:        55        663         58        363  Rescheduling interrupts
      IPI1:         0          1          3         16  Function call interrupts
      / #
      / #
      / # echo 4 > /proc/irq/10/smp_affinity
      / #
      / # cat /proc/interrupts
                 CPU0       CPU1       CPU2       CPU3
        8:         45          0          0          0  SiFive PLIC   8  virtio0
       10:        160          0         17          0  SiFive PLIC  10  ttyS0
      IPI0:        68        693         77        410  Rescheduling interrupts
      IPI1:         0          2          3         16  Function call interrupts
      Signed-off-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      cc9f04f9
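
      A hedged sketch of the affinity handler described above; plic_irq_toggle()
      stands in for the driver's per-context enable helper and the exact
      signatures may differ:

        static int plic_set_affinity(struct irq_data *d,
                                     const struct cpumask *mask_val, bool force)
        {
                unsigned int cpu;

                if (force)
                        cpu = cpumask_first(mask_val);
                else
                        cpu = cpumask_any_and(mask_val, cpu_online_mask);

                if (cpu >= nr_cpu_ids)
                        return -EINVAL;

                /* Disable the interrupt everywhere, then enable it only on
                 * the selected CPU. */
                plic_irq_toggle(cpu_possible_mask, d->hwirq, 0);
                plic_irq_toggle(cpumask_of(cpu), d->hwirq, 1);

                irq_data_update_effective_affinity(d, cpumask_of(cpu));

                return IRQ_SET_MASK_OK_DONE;
        }
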
    • irqchip/sifive-plic: Differentiate between PLIC handler and context · 6adfe8d2
      Anup Patel authored
      We explicitly differentiate between a PLIC handler and a PLIC context:
      a PLIC context corresponds to a given mode of a HART, whereas a PLIC
      handler is the per-CPU software construct meant for handling interrupts
      from a particular PLIC context.

      To achieve this differentiation, we rename "nr_handlers" to "nr_contexts"
      and "nr_mapped" to "nr_handlers" in plic_init().
      Signed-off-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      6adfe8d2
    • irqchip/sifive-plic: Add warning in plic_init() if handler already present · 3fecb5aa
      Anup Patel authored
      We have two entries (one for M-mode and another for S-mode) in the
      interrupts-extended DT property of the PLIC DT node for each HART. It
      is expected that the firmware/bootloader will set the M-mode HWIRQ line
      of each HART to 0xffffffff (i.e. -1) in the interrupts-extended DT
      property, because Linux runs in S-mode only.

      If the firmware/bootloader is buggy, it will not correctly update the
      interrupts-extended DT property, which might result in a plic_handler
      being configured twice. This patch adds a warning in plic_init() if a
      plic_handler is already marked present. This warning gives us a hint
      about an incorrectly updated interrupts-extended DT property.
      Signed-off-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      3fecb5aa
    • irqchip/sifive-plic: Pre-compute context hart base and enable base · 86c7cbf1
      Anup Patel authored
      This patch makes the following optimizations:
      1. Pre-compute the hart base for each context handler
      2. Pre-compute the enable base for each context handler
      3. Use a per-context-handler enable lock instead of the global
         plic_toggle_lock
      Signed-off-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      86c7cbf1
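
      A hedged sketch of the per-context handler state after this change; the
      field names follow the changelog and may not match the source exactly:

        struct plic_handler {
                bool            present;
                void __iomem    *hart_base;   /* pre-computed per-context hart base */
                raw_spinlock_t  enable_lock;  /* replaces the global plic_toggle_lock */
                void __iomem    *enable_base; /* pre-computed per-context enable base */
        };
        static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
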
  4. 18 Feb, 2019 6 commits
    • PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets · 4e6b26d2
      Thomas Gleixner authored
      Multiple interrupt sets for affinity spreading are now handled in the
      core code, and the number of sets and their size are recalculated via a
      driver-supplied callback.

      That removes the requirement to invoke pci_alloc_irq_vectors_affinity()
      with the minvecs and maxvecs arguments set to the same value, and for
      the call site to handle the ENOSPC situation.

      Remove the now obsolete sanity checks and the related comments.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.778630549@linutronix.de
      
      4e6b26d2
    • genirq/affinity: Remove the leftovers of the original set support · a6a309ed
      Thomas Gleixner authored
      Now that the NVME driver is converted over to the calc_sets() callback,
      the workarounds of the original set support can be removed.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.689834224@linutronix.de
      a6a309ed
    • nvme-pci: Simplify interrupt allocation · 612b7286
      Ming Lei authored
      The NVME PCI driver contains a tedious mechanism for interrupt
      allocation, which is necessary to adjust the number and size of
      interrupt sets to the maximum available number of interrupts, which in
      turn depends on the underlying PCI capabilities and the available CPU
      resources.

      It works around the former shortcomings of the PCI and core interrupt
      allocation mechanisms in combination with interrupt sets.

      The PCI interrupt allocation function allows the caller to provide a
      maximum and a minimum number of interrupts to be allocated and tries to
      allocate as many as possible. This worked without driver interaction as
      long as there was only a single set of interrupts to handle.
      
      With the addition of support for multiple interrupt sets in the generic
      affinity spreading logic, which is invoked from the PCI interrupt
      allocation, the adaptive loop in the PCI interrupt allocation did not
      work for multiple interrupt sets. The reason is that depending on the
      total number of interrupts which the PCI allocation adaptive loop tries
      to allocate in each step, the number and the size of the interrupt sets
      need to be adapted as well. Due to the way the interrupt sets support was
      implemented there was no way for the PCI interrupt allocation code or the
      core affinity spreading mechanism to invoke a driver specific function
      for adapting the interrupt sets configuration.
      
      As a consequence the driver had to implement another adaptive loop around
      the PCI interrupt allocation function and calling that with maximum and
      minimum interrupts set to the same value. This ensured that the
      allocation either succeeded or immediately failed without any attempt to
      adjust the number of interrupts in the PCI code.
      
      The core code now allows drivers to provide a callback to recalculate the
      number and the size of interrupt sets during PCI interrupt allocation,
      which in turn allows the PCI interrupt allocation function to be called
      in the same way as with a single set of interrupts. The PCI code handles
      the adaptive loop and the interrupt affinity spreading mechanism invokes
      the driver callback to adapt the interrupt set configuration to the
      current loop value. This replaces the adaptive loop in the driver
      completely.
      
      Implement the NVME specific callback which adjusts the interrupt sets
      configuration and remove the adaptive allocation loop.
      
      [ tglx: Simplify the callback further and restore the dropped adjustment of
        	number of sets ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.602546658@linutronix.de
      
      612b7286
    • genirq/affinity: Add new callback for (re)calculating interrupt sets · c66d4bd1
      Ming Lei authored
      The interrupt affinity spreading mechanism supports spreading out
      affinities for one or more interrupt sets. An interrupt set contains one
      or more interrupts. Each set is mapped to a specific functionality of a
      device, e.g. general I/O queues and read I/O queues of multiqueue block
      devices.

      The number of interrupts per set is defined by the driver. It depends on
      the total number of available interrupts for the device, which is
      determined by the PCI capabilities and the availability of underlying CPU
      resources, and the number of queues which the device provides and the
      driver wants to instantiate.

      The driver passes the initial configuration for the interrupt allocation
      via a pointer to struct irq_affinity.

      Right now the allocation mechanism is complex as it requires a loop in
      the driver to determine the maximum number of interrupts which are
      provided by the PCI capabilities and the underlying CPU resources. This
      loop would have to be replicated in every driver which wants to utilize
      this mechanism. That is unwanted code duplication and error prone.

      In order to move this into generic facilities, a mechanism is required
      which allows the recalculation of the interrupt sets and their size in
      the core code. As the core code does not have any knowledge about the
      underlying device, a driver-specific callback is required in struct
      irq_affinity which can be invoked by the core code. The callback gets
      the number of available interrupts as an argument, so the driver can
      calculate the corresponding number and size of interrupt sets.
      
      At the moment the struct irq_affinity pointer which is handed in from the
      driver and passed through to several core functions is marked 'const', but for
      the callback to be able to modify the data in the struct it's required to
      remove the 'const' qualifier.
      
      Add the optional callback to struct irq_affinity, which allows drivers to
      recalculate the number and size of interrupt sets and remove the 'const'
      qualifier.
      
      For simple invocations, which do not supply a callback, a default callback
      is installed, which just sets nr_sets to 1 and transfers the number of
      spreadable vectors to the set_size array at index 0.
      
      This is for now guarded by a check for nr_sets != 0 to keep the NVME
      driver working until it is converted to the callback mechanism.

      To make sure that the driver configuration is correct under all
      circumstances, the callback is invoked even when there are no interrupts
      for queues left, i.e. when the pre/post requirements already exhaust the
      number of available interrupts.
      
      At the PCI layer irq_create_affinity_masks() has to be invoked even for the
      case where the legacy interrupt is used. That ensures that the callback is
      invoked and the device driver can adjust to that situation.
      
      [ tglx: Fixed the simple case (no sets required). Moved the sanity check
        	for nr_sets after the invocation of the callback so it catches
        	broken drivers. Fixed the kernel doc comments for struct
        	irq_affinity and de-'This patch'-ed the changelog ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de
      
      c66d4bd1
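
      A hedged driver-side sketch of the callback described above; the split
      policy and names (my_calc_irq_sets, the read/write queue split) are
      illustrative, not the NVME implementation:

        /* Recalculate the number and size of interrupt sets for 'nvecs'
         * spreadable vectors. */
        static void my_calc_irq_sets(struct irq_affinity *affd, unsigned int nvecs)
        {
                unsigned int read_queues = nvecs / 2;    /* illustrative split */

                affd->nr_sets = 2;
                affd->set_size[0] = nvecs - read_queues; /* default/write queues */
                affd->set_size[1] = read_queues;         /* read queues */
        }

        /* Handed to pci_alloc_irq_vectors_affinity() by the driver: */
        struct irq_affinity affd = {
                .pre_vectors = 1,                /* e.g. an admin queue */
                .calc_sets   = my_calc_irq_sets,
        };
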
    • genirq/affinity: Store interrupt sets size in struct irq_affinity · 9cfef55b
      Ming Lei authored
      The interrupt affinity spreading mechanism supports spreading out
      affinities for one or more interrupt sets. An interrupt set contains one
      or more interrupts. Each set is mapped to a specific functionality of a
      device, e.g. general I/O queues and read I/O queues of multiqueue block
      devices.

      The number of interrupts per set is defined by the driver. It depends on
      the total number of available interrupts for the device, which is
      determined by the PCI capabilities and the availability of underlying CPU
      resources, and the number of queues which the device provides and the
      driver wants to instantiate.

      The driver passes the initial configuration for the interrupt allocation
      via a pointer to struct irq_affinity.

      Right now the allocation mechanism is complex as it requires a loop in
      the driver to determine the maximum number of interrupts which are
      provided by the PCI capabilities and the underlying CPU resources. This
      loop would have to be replicated in every driver which wants to utilize
      this mechanism. That is unwanted code duplication and error prone.

      In order to move this into generic facilities, a mechanism is required
      which allows the recalculation of the interrupt sets and their size in
      the core code. As the core code does not have any knowledge about the
      underlying device, a driver-specific callback will be added to struct
      irq_affinity, which will be invoked by the core code. The callback will
      get the number of available interrupts as an argument, so the driver can
      calculate the corresponding number and size of interrupt sets.
      
      To support this, two modifications for the handling of struct irq_affinity
      are required:
      
      1) The (optional) interrupt sets size information is contained in a
         separate array of integers and struct irq_affinity contains a
         pointer to it.

         This is cumbersome and, as the maximum number of interrupt sets is
         small, there is no reason to have separate storage. Moving the size
         array into struct irq_affinity avoids indirections and makes the
         code simpler.
      
      2) At the moment the struct irq_affinity pointer which is handed in from
         the driver and passed through to several core functions is marked
         'const'.
      
         With the upcoming callback to recalculate the number and size of
         interrupt sets, it's necessary to remove the 'const'
         qualifier. Otherwise the callback would not be able to update the data.
      
      Implement #1 and store the interrupt sets size in 'struct irq_affinity'.
      
      No functional change.
      
      [ tglx: Fixed the memcpy() size so it won't copy beyond the size of the
        	source. Fixed the kernel doc comments for struct irq_affinity and
        	de-'This patch'-ed the changelog ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.423723127@linutronix.de
      
      9cfef55b
    • genirq/affinity: Code consolidation · 0145c30e
      Thomas Gleixner authored
      All information and calculations in the interrupt affinity spreading
      code are strictly unsigned int, yet the code uses int all over the
      place.

      Convert it over to unsigned int.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.336424556@linutronix.de
      0145c30e
  5. 14 Feb, 2019 7 commits
  6. 13 Feb, 2019 1 commit
  7. 10 Feb, 2019 5 commits
    • softirq: Don't skip softirq execution when softirq thread is parking · 1342d808
      Matthias Kaehlcke authored
      When a CPU is unplugged, the kernel threads of this CPU are parked (see
      smpboot_park_threads()). kthread_park() is used to mark each thread as
      parked and wake it up, so it can complete the process of parking itself
      (see smpboot_thread_fn()).

      If local softirqs are pending on interrupt exit, invoke_softirq() is
      called to process the softirqs; however, it skips processing when the
      softirq kernel thread of the local CPU is scheduled to run. The softirq
      kthread is one of the threads that is parked when a CPU is unplugged.
      Parking the kthread wakes it up, but only to complete the parking
      process, not to process the pending softirqs. Hence processing of
      softirqs at the end of an interrupt is skipped, but not done elsewhere,
      which can result in warnings about pending softirqs when a CPU is
      unplugged:
      
      /sys/devices/system/cpu # echo 0 > cpu4/online
      [ ... ] NOHZ: local_softirq_pending 02
      [ ... ] NOHZ: local_softirq_pending 202
      [ ... ] CPU4: shutdown
      [ ... ] psci: CPU4 killed.
      
      Don't skip processing of softirqs at the end of an interrupt when the
      softirq thread of the CPU is parking.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Link: https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
      1342d808
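
      A hedged sketch of the shape of the fix, assuming the check lives in the
      helper that decides whether ksoftirqd will take care of pending softirqs
      (simplified from the softirq code, names may differ):

        static bool ksoftirqd_running(void)
        {
                struct task_struct *tsk = __this_cpu_read(ksoftirqd);

                /* A parking ksoftirqd will not process softirqs, so do not
                 * treat it as "running"; invoke_softirq() then handles the
                 * pending softirqs directly. */
                return tsk && (tsk->state == TASK_RUNNING) &&
                       !__kthread_should_park(tsk);
        }
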
    • kthread: Add __kthread_should_park() · 0121805d
      Matthias Kaehlcke authored
      kthread_should_park() is used to check if the calling kthread ('current')
      should park, but there is no function to check whether an arbitrary kthread
      should be parked. The latter is required to plug a CPU hotplug race vs. a
      parking ksoftirqd thread.
      
      The new __kthread_should_park() receives a task_struct as parameter to
      check if the corresponding kernel thread should be parked.
      
      Call __kthread_should_park() from kthread_should_park() to avoid code
      duplication.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Link: https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
      0121805d
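
      A minimal sketch of the new helper and the wrapper, per the changelog;
      the flag and accessor names are those of kernel/kthread.c and are
      assumed here:

        bool __kthread_should_park(struct task_struct *k)
        {
                /* Check the parking request flag of an arbitrary kthread. */
                return test_bit(KTHREAD_SHOULD_PARK, &to_kthread(k)->flags);
        }

        /* kthread_should_park() now simply checks the calling thread. */
        bool kthread_should_park(void)
        {
                return __kthread_should_park(current);
        }
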
    • proc/stat: Make the interrupt statistics more efficient · c2da3f1b
      Thomas Gleixner authored
      Waiman reported that on large systems with a large number of interrupts,
      the readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise-quality software reads /proc/stat with a high frequency.

      The reason for this is that interrupt statistics are accounted per CPU,
      so the /proc/stat logic has to sum up the per-CPU stats for each
      interrupt.

      The interrupt core now provides a per-interrupt summary counter which
      can be used to avoid the summation loops completely, except for
      interrupts marked PER_CPU, which are only a small fraction of the
      interrupt space, if present at all.

      Another simplification is to iterate only over the active interrupts,
      skip the potentially large gaps in the interrupt number space, and just
      print zeros for the gaps without going into the interrupt core in the
      first place.

      Waiman provided test results from a 4-socket IvyBridge-EX system
      (60-core, 120-thread, 3016 irqs) executing a test program which reads
      /proc/stat 50,000 times:
      
         Before: 18.436s (sys 18.380s)
         After:   3.769s (sys  3.742s)
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de
      c2da3f1b
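
      A hedged sketch of the /proc/stat side of this change: walk only the
      active interrupts, print zeros for the gaps, and read the per-interrupt
      count via kstat_irqs_usr(); the helper names are illustrative:

        /* Emit 'gap' zero entries without consulting the interrupt core. */
        static void show_irq_gap(struct seq_file *p, unsigned int gap)
        {
                static const char zeros[] = " 0 0 0 0 0 0 0 0";

                while (gap > 0) {
                        unsigned int inc = min_t(unsigned int, gap,
                                                 ARRAY_SIZE(zeros) / 2);

                        seq_write(p, zeros, 2 * inc);
                        gap -= inc;
                }
        }

        static void show_all_irqs(struct seq_file *p)
        {
                unsigned int i, next = 0;

                for_each_active_irq(i) {
                        show_irq_gap(p, i - next);
                        seq_put_decimal_ull(p, " ", kstat_irqs_usr(i));
                        next = i + 1;
                }
                show_irq_gap(p, nr_irqs - next);
        }
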
    • genirq: Avoid summation loops for /proc/stat · 1136b072
      Thomas Gleixner authored
      Waiman reported that on large systems with a large number of interrupts,
      the readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise-quality software reads /proc/stat with a high frequency.

      The reason for this is that interrupt statistics are accounted per CPU,
      so the /proc/stat logic has to sum up the per-CPU stats for each
      interrupt.

      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per-interrupt summation counter
      which is incremented along with the per-interrupt per-CPU counter.

      The PER_CPU interrupts need to avoid that and use only per-CPU
      accounting, because they share the interrupt number and the interrupt
      descriptor, and concurrent updates would conflict or require unwanted
      synchronization.
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      
      1136b072
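
      A hedged sketch of the accounting change: bump a per-interrupt total
      alongside the existing per-CPU count; tot_count is the assumed name of
      the new field in struct irq_desc:

        /* Per-CPU-only accounting, used by PER_CPU interrupts which share the
         * descriptor and must not touch a shared counter concurrently. */
        static inline void __kstat_incr_irqs_this_cpu(struct irq_desc *desc)
        {
                __this_cpu_inc(*desc->kstat_irqs);
                __this_cpu_inc(kstat.irqs_sum);
        }

        /* Regular interrupts additionally bump the per-interrupt total, which
         * /proc/stat can read without summing over all CPUs. */
        static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
        {
                __kstat_incr_irqs_this_cpu(desc);
                desc->tot_count++;
        }
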
    • genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks() · 347253c4
      Ming Lei authored
      'node_to_cpumask' is just a temporary variable for
      irq_build_affinity_masks(), so move it into irq_build_affinity_masks().

      No functional change.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com
      347253c4
  8. 07 Feb, 2019 9 commits