Commit a0b13403, authored by Michael Kelley, committed by Wei Liu

Documentation: hyperv: Improve synic and interrupt handling description

Current documentation does not describe how Linux handles the synthetic
interrupt controller (synic) that Hyper-V provides to guest VMs, nor how
VMBus or timer interrupts are handled. Add text describing the synic and
reorganize existing text to make this more clear.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20240511133818.19649-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240511133818.19649-2-mhklinux@outlook.com>
parent 4c5a65fd
@@ -62,12 +62,21 @@ shared page with scale and offset values into user space. User
 space code performs the same algorithm of reading the TSC and
 applying the scale and offset to get the constant 10 MHz clock.

-Linux clockevents are based on Hyper-V synthetic timer 0. While
-Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
-timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
-/proc/interrupts. Clockevents based on the virtualized PIT and
-local APIC timer also work, but the Hyper-V synthetic timer is
-preferred.
+Linux clockevents are based on Hyper-V synthetic timer 0 (stimer0).
+While Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
+timer 0. In older versions of Hyper-V, an interrupt from stimer0
+results in a VMBus control message that is demultiplexed by
+vmbus_isr() as described in the Documentation/virt/hyperv/vmbus.rst
+documentation. In newer versions of Hyper-V, stimer0 interrupts can
+be mapped to an architectural interrupt, which is referred to as
+"Direct Mode". Linux prefers to use Direct Mode when available. Since
+x86/x64 doesn't support per-CPU interrupts, Direct Mode statically
+allocates an x86 interrupt vector (HYPERV_STIMER0_VECTOR) across all CPUs
+and explicitly codes it to call the stimer0 interrupt handler. Hence
+interrupts from stimer0 are recorded on the "HVS" line in /proc/interrupts
+rather than being associated with a Linux IRQ. Clockevents based on the
+virtualized PIT and local APIC timer also work, but Hyper-V stimer0
+is preferred.

 The driver for the Hyper-V synthetic system clock and timers is
 drivers/clocksource/hyperv_timer.c.
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
 approximately 1280 Mbytes. For versions prior to Windows Server
 2019, the limit is approximately 384 Mbytes.

-VMBus messages
---------------
-All VMBus messages have a standard header that includes the message
-length, the offset of the message payload, some flags, and a
+VMBus channel messages
+----------------------
+All messages sent in a VMBus channel have a standard header that includes
+the message length, the offset of the message payload, some flags, and a
 transactionID. The portion of the message after the header is
 unique to each VSP/VSC pair.
@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
 buffer. For example, the storvsc driver uses this approach to
 specify the data buffers to/from which disk I/O is done.

-Three functions exist to send VMBus messages:
+Three functions exist to send VMBus channel messages:

 1. vmbus_sendpacket(): Control-only messages and messages with
    embedded data -- no GPAs
@@ -165,6 +165,37 @@ performed in this temporary buffer without the risk of Hyper-V
 maliciously modifying the message after it is validated but before
 it is used.

+Synthetic Interrupt Controller (synic)
+--------------------------------------
+Hyper-V provides each guest CPU with a synthetic interrupt controller
+that is used by VMBus for host-guest communication. While each synic
+defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
+(VMBUS_MESSAGE_SINT). All interrupts related to communication between
+the Hyper-V host and a guest CPU use that SINT.
+
+The SINT is mapped to a single per-CPU architectural interrupt (i.e,
+an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
+each CPU in the guest has a synic and may receive VMBus interrupts,
+they are best modeled in Linux as per-CPU interrupts. This model works
+well on arm64 where a single per-CPU Linux IRQ is allocated for
+VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
+"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
+interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
+across all CPUs and explicitly coded to call vmbus_isr(). In this case,
+there's no Linux IRQ, and the interrupts are visible in aggregate in
+/proc/interrupts on the "HYP" line.
+
+The synic provides the means to demultiplex the architectural interrupt into
+one or more logical interrupts and route the logical interrupt to the proper
+VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
+related functions that access synic data structures.
+
+The synic is not modeled in Linux as an irq chip or irq domain,
+and the demultiplexed logical interrupts are not Linux IRQs. As such,
+they don't appear in /proc/interrupts or /proc/irq. The CPU
+affinity for one of these logical interrupts is controlled via an
+entry under /sys/bus/vmbus as described below.
+
 VMBus interrupts
 ----------------
 VMBus provides a mechanism for the guest to interrupt the host when
@@ -176,16 +207,18 @@ unnecessary. If a guest sends an excessive number of unnecessary
 interrupts, the host may throttle that guest by suspending its
 execution for a few seconds to prevent a denial-of-service attack.

-Similarly, the host will interrupt the guest when it sends a new
-message on the VMBus control path, or when a VMBus channel "in" ring
-buffer transitions from empty to non-empty. Each CPU in the guest
-may receive VMBus interrupts, so they are best modeled as per-CPU
-interrupts in Linux. This model works well on arm64 where a single
-per-CPU IRQ is allocated for VMBus. Since x86/x64 lacks support for
-per-CPU IRQs, an x86 interrupt vector is statically allocated (see
-HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
-call the VMBus interrupt service routine. These interrupts are
-visible in /proc/interrupts on the "HYP" line.
+Similarly, the host will interrupt the guest via the synic when
+it sends a new message on the VMBus control path, or when a VMBus
+channel "in" ring buffer transitions from empty to non-empty due to
+the host inserting a new VMBus channel message. The control message stream
+and each VMBus channel "in" ring buffer are separate logical interrupts
+that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
+for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
+bitmap to determine which channels have pending interrupts on this CPU.
+If multiple channels have pending interrupts for this CPU, they are
+processed sequentially. When all channel interrupts have been processed,
+vmbus_isr() checks for and processes any messages received on the VMBus
+control path.

 The guest CPU that a VMBus channel will interrupt is selected by the
 guest when the channel is created, and the host is informed of that
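The demultiplexing order described in the hunk above (pending channel interrupts first, processed sequentially, then any control-path messages) can be illustrated with a small user-space model in C. This is only a sketch under stated assumptions: struct fake_synic, struct demux_trace, fake_isr() and the 64-channel limit are hypothetical names and sizes chosen for illustration, and the code does not reproduce the kernel's vmbus_isr() or vmbus_chan_sched().

```c
#include <stdint.h>

#define MAX_CHANNELS 64                 /* hypothetical; small for illustration */
#define TRACE_CAP    (MAX_CHANNELS + 8) /* capacity of the event trace */

/* Hypothetical per-CPU view of the state the interrupt handler inspects. */
struct fake_synic {
    uint64_t chan_pending;     /* bit n set: channel n has a pending interrupt */
    int ctrl_msgs_pending;     /* number of queued control-path messages */
};

/* Records the order in which logical interrupts were handled. */
struct demux_trace {
    int events[TRACE_CAP];     /* channel number, or -1 for a control message */
    int count;
};

static void fake_isr(struct fake_synic *s, struct demux_trace *trace)
{
    /* Step 1: handle pending channel interrupts one at a time, in bit order. */
    for (int ch = 0; ch < MAX_CHANNELS && trace->count < TRACE_CAP; ch++) {
        uint64_t bit = 1ULL << ch;

        if (s->chan_pending & bit) {
            s->chan_pending &= ~bit;   /* mark this channel as no longer pending */
            trace->events[trace->count++] = ch;
        }
    }

    /* Step 2: only after all channels are done, drain control messages. */
    while (s->ctrl_msgs_pending > 0 && trace->count < TRACE_CAP) {
        s->ctrl_msgs_pending--;
        trace->events[trace->count++] = -1;
    }
}
```

In this model a single call to fake_isr() stands for one architectural interrupt on one CPU, and the trace makes the ordering visible: every channel entry precedes every control-message entry.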
@@ -212,10 +245,9 @@ neither "unmanaged" nor "managed" interrupts.
 The CPU that a VMBus channel will interrupt can be seen in
 /sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
 When running on later versions of Hyper-V, the CPU can be changed
-by writing a new value to this sysfs entry. Because the interrupt
-assignment is done outside of the normal Linux affinity mechanism,
-there are no entries in /proc/irq corresponding to individual
-VMBus channel interrupts.
+by writing a new value to this sysfs entry. Because VMBus channel
+interrupts are not Linux IRQs, there are no entries in /proc/interrupts
+or /proc/irq corresponding to individual VMBus channel interrupts.

 An online CPU in a Linux guest may not be taken offline if it has
 VMBus channel interrupts assigned to it. Any such channel
@@ -223,15 +255,6 @@ interrupts must first be manually reassigned to another CPU as
 described above. When no channel interrupts are assigned to the
 CPU, it can be taken offline.

-When a guest CPU receives a VMBus interrupt from the host, the
-function vmbus_isr() handles the interrupt. It first checks for
-channel interrupts by calling vmbus_chan_sched(), which looks at a
-bitmap setup by the host to determine which channels have pending
-interrupts on this CPU. If multiple channels have pending
-interrupts for this CPU, they are processed sequentially. When all
-channel interrupts have been processed, vmbus_isr() checks for and
-processes any message received on the VMBus control path.
-
 The VMBus channel interrupt handling code is designed to work
 correctly even if an interrupt is received on a CPU other than the
 CPU assigned to the channel. Specifically, the code does not use