pcie: add GICv3 ITS support for aarch64 MSI delivery #3441

Open
jstarks wants to merge 3 commits into microsoft:main from jstarks:its

@jstarks jstarks commented May 8, 2026

Replace the GICv2m MSI controller with KVM's in-kernel GICv3 ITS for aarch64 PCIe MSI/MSI-X delivery. GICv2m maps MSI writes to a fixed pool of 64 SPIs, which doesn't scale (a single NVMe device with 128 queues exhausts it) and is incompatible with the ITS-based device ID model needed for future SMMU support. The ITS routes MSIs via LPIs using (DeviceID, EventID) lookup, supporting thousands of interrupt vectors across all devices.

KVM provides a complete in-kernel ITS (KVM_DEV_TYPE_ARM_VGIC_ITS) that handles all guest MMIO and command queue processing. The VMM creates the device, sets its base address, and initializes it. For emulated devices, MSIs are injected via KVM_SIGNAL_MSI with KVM_MSI_VALID_DEVID. For irqfd (VFIO passthrough), the kvm_irq_routing_msi entry carries the devid so the kernel signals the ITS directly.
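
As a rough illustration of the emulated-device path, the sketch below mirrors the `struct kvm_msi` layout from the Linux KVM UAPI and fills it in the way a `KVM_SIGNAL_MSI` caller would. The `KvmMsi` and `build_kvm_msi` names are hypothetical and are not taken from this PR; the ioctl call itself is omitted.

```rust
/// Mirrors `struct kvm_msi` from the Linux KVM UAPI (`<linux/kvm.h>`).
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct KvmMsi {
    pub address_lo: u32,
    pub address_hi: u32,
    pub data: u32,
    pub flags: u32,
    pub devid: u32,
    pub pad: [u8; 12],
}

/// UAPI flag telling KVM that the `devid` field is valid.
pub const KVM_MSI_VALID_DEVID: u32 = 1 << 0;

/// Hypothetical helper (an assumption, not the PR's code): builds the
/// argument for a `KVM_SIGNAL_MSI` ioctl. With an ITS, `address` is the
/// doorbell the guest programmed and `data` carries the EventID.
pub fn build_kvm_msi(address: u64, data: u32, devid: Option<u32>) -> KvmMsi {
    let mut msi = KvmMsi {
        address_lo: address as u32,
        address_hi: (address >> 32) as u32,
        data,
        ..Default::default()
    };
    if let Some(devid) = devid {
        // Only set the flag when a device ID is actually supplied.
        msi.flags |= KVM_MSI_VALID_DEVID;
        msi.devid = devid;
    }
    msi
}
```

Passing `None` leaves `flags` at zero, which matches how a controller without device-ID routing would be signaled.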

The main design challenge is that PCIe devices don't know their own requester ID (bus/device/function), since bus numbers are assigned dynamically by guest firmware. This is solved with a per-device AssignedBusRange that the PCIe port updates atomically when the guest programs secondary/subordinate bus numbers. ITS wrappers (ItsSignalMsi, ItsIrqFd) compose the full 32-bit device ID as (segment << 16 | BDF) at interrupt delivery time, transparent to the devices themselves.
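
The shared bus range and device ID composition could look roughly like the sketch below. The `AssignedBusRange(Arc<AtomicU16>)` shape and the `set_bus_range(secondary, subordinate)` call appear in the diff under review; the bit packing inside the `u16`, the `bus_range` getter, and the `its_devid` helper are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;

/// Shared bus range. Clones share the same atomic, so the port and the
/// interrupt wrappers observe the same value.
#[derive(Clone, Debug)]
pub struct AssignedBusRange(Arc<AtomicU16>);

impl AssignedBusRange {
    pub fn new() -> Self {
        Self(Arc::new(AtomicU16::new(0)))
    }

    /// Called by the port when the guest programs its bus numbers.
    /// Assumed packing: subordinate in the high byte, secondary in the low.
    pub fn set_bus_range(&self, secondary: u8, subordinate: u8) {
        let packed = (u16::from(subordinate) << 8) | u16::from(secondary);
        self.0.store(packed, Ordering::Relaxed);
    }

    /// Returns `(secondary, subordinate)` as last written by the port.
    pub fn bus_range(&self) -> (u8, u8) {
        let packed = self.0.load(Ordering::Relaxed);
        (packed as u8, (packed >> 8) as u8)
    }

    /// Composes `(segment << 16) | BDF`, taking the bus from the secondary
    /// bus number and the device/function from `devfn`.
    pub fn its_devid(&self, segment: u16, devfn: u8) -> u32 {
        let (secondary, _) = self.bus_range();
        (u32::from(segment) << 16) | (u32::from(secondary) << 8) | u32::from(devfn)
    }
}
```

Because both bus numbers live in one atomic, a reader never sees a secondary number from one guest write paired with a subordinate number from another.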

The SignalMsi trait changes from `rid: u32` (always passed as 0) to `devid: Option<u32>`, and IrqFdRoute::enable gains a matching parameter. This is a mechanical change across all backends (KVM, WHP, MSHV, HVF).
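
A minimal sketch of how an optional device ID can be threaded through a signaling trait follows. The trait's exact method signature is an assumption (only the `devid: Option<u32>` parameter is stated above), and `RecordingBackend` and `WithDevId` are hypothetical names; the device ID is fixed here for brevity, whereas the PR's wrappers compute it at delivery time.

```rust
use std::sync::Mutex;

/// Assumed shape of the trait after the change.
pub trait SignalMsi {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32);
}

/// Hypothetical backend that records what it was asked to deliver.
#[derive(Default)]
pub struct RecordingBackend {
    pub log: Mutex<Vec<(Option<u32>, u64, u32)>>,
}

impl SignalMsi for RecordingBackend {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32) {
        self.log.lock().unwrap().push((devid, address, data));
    }
}

/// Hypothetical wrapper: the device passes `None`, and the wrapper
/// supplies the device ID before forwarding to the backend.
pub struct WithDevId<T> {
    pub inner: T,
    pub devid: u32,
}

impl<T: SignalMsi> SignalMsi for WithDevId<T> {
    fn signal_msi(&self, _devid: Option<u32>, address: u64, data: u32) {
        self.inner.signal_msi(Some(self.devid), address, data);
    }
}
```

Backends without device-ID routing can ignore the argument, which is why the change is mechanical for them.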

Also adds ACPI IORT (IO Remapping Table) generation for aarch64, with ITS Group and PCI Root Complex nodes with ID mappings. The MADT gains a GIC ITS entry. DeviceTree generation emits an ITS child node under the GIC when ITS is configured, with msi-parent on PCIe host bridges pointing to the ITS phandle instead of v2m.
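
For orientation, an IORT ID mapping entry is a 20-byte record that maps a range of input IDs (PCI requester IDs, for a root complex node) to output IDs (ITS DeviceIDs). The sketch below follows the field layout in the IORT specification; the `translate` helper and the example values in the note after it are illustrative assumptions, not the PR's code.

```rust
/// One IORT ID mapping entry (field layout per the IORT specification).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IortIdMapping {
    /// First input ID covered by this mapping.
    pub input_base: u32,
    /// Number of IDs in the range, minus one.
    pub id_count: u32,
    /// Output ID that `input_base` maps to.
    pub output_base: u32,
    /// Offset of the output node (e.g. the ITS Group) from the table start.
    pub output_reference: u32,
    pub flags: u32,
}

impl IortIdMapping {
    /// Illustrative helper: applies the mapping to one input ID.
    pub fn translate(&self, input: u32) -> Option<u32> {
        let offset = input.checked_sub(self.input_base)?;
        if offset <= self.id_count {
            self.output_base.checked_add(offset)
        } else {
            None
        }
    }
}
```

Under the `(segment << 16) | BDF` scheme, a root complex on segment 1 would plausibly map requester IDs `0..=0xffff` to an output base of `0x1_0000`, though the exact values this PR emits are not shown here.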

ITS support is probed at KVM init time via KVM_CREATE_DEVICE_TEST, falling back to GICv2m on kernels or hardware without ITS. A --gic-msi CLI option (auto/its/v2m) allows overriding the default selection. GICv2m remains available for GICv2-only configurations.
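
The selection logic described above might reduce to something like the following sketch. `GicMsiCli` and `supports_its` are names that appear in this PR's file summary; the `MsiChoice` enum, the `select_msi_controller` function, and the choice to return an error when `its` is forced on a host without ITS are assumptions.

```rust
/// Value of the `--gic-msi` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GicMsiCli {
    Auto,
    Its,
    V2m,
}

/// Hypothetical result type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MsiChoice {
    Its,
    V2m,
}

/// Resolves the CLI request against the probed platform capability.
pub fn select_msi_controller(
    cli: GicMsiCli,
    supports_its: bool,
) -> Result<MsiChoice, String> {
    match (cli, supports_its) {
        // Auto prefers ITS and falls back to v2m.
        (GicMsiCli::Auto, true) => Ok(MsiChoice::Its),
        (GicMsiCli::Auto, false) => Ok(MsiChoice::V2m),
        (GicMsiCli::Its, true) => Ok(MsiChoice::Its),
        // Assumption: an explicit request that cannot be met is an error.
        (GicMsiCli::Its, false) => Err("ITS requested but not supported".to_string()),
        (GicMsiCli::V2m, _) => Ok(MsiChoice::V2m),
    }
}
```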

Copilot AI review requested due to automatic review settings May 8, 2026 16:45
@github-actions github-actions Bot added Guide unsafe Related to unsafe code labels May 8, 2026
github-actions Bot commented May 8, 2026

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copilot AI left a comment

Pull request overview

This PR upgrades aarch64 PCIe MSI/MSI-X delivery by adding support for KVM’s in-kernel GICv3 ITS (LPI-based MSI routing) while keeping GICv2m as a fallback, and threads the required device-identity plumbing through the PCIe and hypervisor layers.

Changes:

  • Add an aarch64 MSI-controller selection model (GicMsiController / --gic-msi) and probe KVM ITS support, falling back to GICv2m when needed.
  • Introduce PCIe bus-range tracking (AssignedBusRange) and ITS wrappers that compose (segment << 16) | BDF for MSI signaling and irqfd routing.
  • Generate the required firmware descriptions for ITS routing (ACPI MADT ITS entry + IORT; device tree ITS child + msi-parent updates).

Reviewed changes

Copilot reviewed 54 out of 55 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vmm_core/vmotherboard/src/lib.rs | Re-export AssignedBusRange for PCIe identity tracking integration. |
| vmm_core/vmotherboard/src/chipset/builder/mod.rs | Thread optional AssignedBusRange through PCIe device registration. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/services.rs | Extend static PCIe registration to carry optional bus-range identity. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/pci.rs | Store and pass per-device optional AssignedBusRange during enumeration. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/device.rs | Add builder API to attach an AssignedBusRange to PCIe devices. |
| vmm_core/vmotherboard/src/base_chipset.rs | Propagate optional AssignedBusRange into PCIe device addition. |
| vmm_core/virt/src/x86/apic_software_device.rs | Adapt SignalMsi signature to Option<u32> device identity. |
| vmm_core/virt/src/generic.rs | Add supports_its to PlatformInfo; adapt SignalMsi signature. |
| vmm_core/virt/src/aarch64/gic_v2m.rs | Update SignalMsi signature for v2m signaler. |
| vmm_core/virt/src/aarch64/gic_software_device.rs | Update SignalMsi signature for software GIC implementation. |
| vmm_core/virt_whp/src/synic.rs | Update SignalMsi signature for WHP MSI injection path. |
| vmm_core/virt_whp/src/lib.rs | Replace gic_v2m with gic_msi controller selection; set supports_its=false. |
| vmm_core/virt_whp/src/device.rs | Update SignalMsi signature for WHP device interrupt injection. |
| vmm_core/virt_mshv/src/x86_64/mod.rs | Update SignalMsi signature for MSHV MSI injection. |
| vmm_core/virt_mshv/src/irqfd.rs | Extend irqfd route enable to accept optional device identity. |
| vmm_core/virt_mshv/src/aarch64/mod.rs | Add supports_its=false; adapt MSI signaling signature/behavior. |
| vmm_core/virt_kvm/src/lib.rs | Track ITS device lifetime; store gic_msi controller selection. |
| vmm_core/virt_kvm/src/gsi.rs | Extend KVM irqfd enable path to pass through optional device identity. |
| vmm_core/virt_kvm/src/arch/x86_64/mod.rs | Plumb devid through KVM routing entries (unused on x86). |
| vmm_core/virt_kvm/src/arch/aarch64/mod.rs | Probe/create/init ITS device; add ITS MSI injection and irqfd routing support; expose supports_its. |
| vmm_core/virt_hvf/src/lib.rs | Add supports_its=false for HVF platform info. |
| vmm_core/src/device_builder.rs | Pass optional AssignedBusRange into PCIe device build path. |
| vmm_core/src/acpi_builder.rs | Add IORT generation for aarch64 + MADT ITS entry; add unit tests for IORT. |
| vm/vmcore/vm_topology/src/processor/aarch64.rs | Replace gic_v2m option with GicMsiController (None/V2m/Its) in topology. |
| vm/vmcore/src/irqfd.rs | Extend IrqFdRoute::enable with optional devid for ITS routing. |
| vm/kvm/src/lib.rs | Extend KVM IRQ routing MSI entry to include optional devid + flags. |
| vm/devices/user_driver_emulated_mock/src/lib.rs | Adapt mock MSI controller to new SignalMsi signature. |
| vm/devices/storage/nvme/src/tests/test_helpers.rs | Update NVMe test MSI controller to new signature. |
| vm/devices/storage/nvme_test/src/tests/test_helpers.rs | Update NVMe_test MSI controller to new signature. |
| vm/devices/pci/vpci/src/test_helpers/mod.rs | Update VPCI test MSI signaling to new signature. |
| vm/devices/pci/pcie/src/switch.rs | Add optional AssignedBusRange propagation into downstream port setup; route cfg writes via write_cfg. |
| vm/devices/pci/pcie/src/root.rs | Add optional AssignedBusRange propagation to ports/hotplug; route cfg writes via write_cfg. |
| vm/devices/pci/pcie/src/port.rs | Implement port-side bus-range tracking and cfg-write side effects for identity updates. |
| vm/devices/pci/pcie/src/lib.rs | Export new bus_range and its modules. |
| vm/devices/pci/pcie/src/its.rs | Add ITS wrappers (ItsSignalMsi, ItsIrqFd) composing segment+BDF device IDs. |
| vm/devices/pci/pcie/src/bus_range.rs | Add shared atomic bus-range container and ITS devid composition helper. |
| vm/devices/pci/pcie/fuzz/fuzz_pcie.rs | Update fuzz harness to match new PCIe root API signature. |
| vm/devices/pci/pcie/Cargo.toml | Add pal_event dependency required by irqfd wrapper interface. |
| vm/devices/pci/pci_core/src/test_helpers/mod.rs | Update test MSI controller signature. |
| vm/devices/pci/pci_core/src/msi.rs | Change SignalMsi to Option<u32> devid; add route/target helpers for rid-aware signaling. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Update MSI-X delivery path to new MsiTarget API and irqfd enable signature. |
| vm/acpi_spec/src/madt.rs | Add MADT GIC ITS entry type/struct. |
| vm/acpi_spec/src/lib.rs | Export new iort module. |
| vm/acpi_spec/src/iort.rs | Introduce IORT table/node/type definitions for aarch64 PCIe + ITS mapping. |
| tmk/tmk_vmm/src/run.rs | Update TMK aarch64 topology config to gic_msi=None. |
| openvmm/openvmm_entry/src/lib.rs | Wire --gic-msi into config (GicMsiConfig). |
| openvmm/openvmm_entry/src/cli_args.rs | Add --gic-msi CLI option and GicMsiCli enum. |
| openvmm/openvmm_defs/src/config.rs | Add ITS base/size constants and GicMsiConfig config enum. |
| openvmm/openvmm_core/src/worker/vm_loaders/linux.rs | Emit device tree ITS child node + msi-parent selection (ITS vs v2m). |
| openvmm/openvmm_core/src/worker/dispatch.rs | Select ITS vs v2m from platform/config; wrap PCIe MSI/irqfd with ITS devid injection; propagate AssignedBusRange. |
| openhcl/virt_mshv_vtl/src/lib.rs | Update OpenHCL MSI signaling signature to Option<u32>. |
| openhcl/bootloader_fdt_parser/src/lib.rs | Update parsed topology to gic_msi=None default. |
| Guide/src/reference/emulated/pcie/overview.md | Document aarch64 MSI routing via ITS vs v2m and the --gic-msi override. |
| Guide/src/reference/devices/firmware/linux_direct.md | Update ACPI table list to include ITS/IORT behavior for aarch64. |
| Cargo.lock | Lockfile update for added pal_event dependency in pcie crate. |

@jstarks jstarks marked this pull request as ready for review May 8, 2026 17:14
@jstarks jstarks requested a review from a team as a code owner May 8, 2026 17:14
Copilot AI review requested due to automatic review settings May 8, 2026 17:14
.copied()
.expect("switch parent port must be a known downstream port");
for i in 0..switch.num_downstream_ports {
let port_name: Arc<str> = format!("{}-downstream-{}", switch.name, i).into();

It seems non-ideal to take a dependency on the string formatting here. The switch_device that gets created already exposes a query for all downstream ports; can we use that?

Comment on lines +90 to 126
/// Sets the shared bus range for the downstream device.
///
/// The port will update this bus range when the guest programs the
/// secondary bus number. The same bus range is shared with MSI/irqfd
/// wrappers so that interrupt delivery uses the correct requester ID.
///
/// The bus range is immediately initialized from the port's current
/// config space so that hotplugged devices see already-assigned bus
/// numbers without waiting for a guest write.
pub fn set_bus_range(&mut self, bus_range: AssignedBusRange) {
    let secondary = *self.cfg_space.assigned_bus_range().start();
    let subordinate = *self.cfg_space.assigned_bus_range().end();
    bus_range.set_bus_range(secondary, subordinate);
    self.bus_range = Some(bus_range);
}

/// Writes to the port's config space and handles any side effects
/// (e.g., bus number changes affecting downstream device identity).
pub fn write_cfg(&mut self, offset: u16, value: u32) -> IoResult {
    let old_secondary = *self.cfg_space.assigned_bus_range().start();
    let old_subordinate = *self.cfg_space.assigned_bus_range().end();
    let result = self.cfg_space.write_u32(offset, value);
    let new_secondary = *self.cfg_space.assigned_bus_range().start();
    let new_subordinate = *self.cfg_space.assigned_bus_range().end();
    if old_secondary != new_secondary || old_subordinate != new_subordinate {
        self.on_bus_range_changed(new_secondary, new_subordinate);
    }
    result
}

/// Called when the bus range has changed. Updates the downstream
/// device's bus range to match.
fn on_bus_range_changed(&self, secondary_bus: u8, subordinate_bus: u8) {
    if let Some(bus_range) = &self.bus_range {
        bus_range.set_bus_range(secondary_bus, subordinate_bus);
    }
}

Should all of this just go in ConfigSpaceType1Emulator directly?

/// Attach the provided `GenericPciBusDevice` to the port identified.
///
/// `device_id` is an optional shared identity that the port will update
/// with the device's RID when the guest programs the secondary bus number.

Maybe I have missed this in the change somewhere, but how does this approach (snooping on parent port bus number configuration) work for multifunction devices where there are multiple RIDs under the same port?

///
/// Clone is cheap (just an `Arc` bump).
#[derive(Clone, Debug)]
pub struct AssignedBusRange(Arc<AtomicU16>);

Should this also include a segment number (i.e., SegmentBusRange)?

//
// Each device gets an AssignedBusRange that the root port updates when
// the guest programs the secondary and subordinate bus numbers. When
// ITS is configured, wrappers compose (segment << 16 | rid) at

Suggested change
- // ITS is configured, wrappers compose (segment << 16 | rid) at
+ // ITS is configured, wrappers compose the RID at

/// update the device's RID when the secondary bus number changes.
/// Also available for SMMU stream ID mapping.
#[inspect(skip)]
bus_range: Option<AssignedBusRange>,

Should this be non-optional and always passed by callers?

partition.irqfd(),
signal_msi,
irqfd,
Some(bus_range),

I am a little confused about the ownership and updating here. Each downstream port (root port or switch port) has a bus range that it updates on config space writes, but then we also have a separate AssignedBusRange given to the endpoint devices? And since the endpoint's bus range is handed over to ItsSignalMsi / ItsIrqFd, how does it get any information about bus number configuration?
