pcie: add GICv3 ITS support for aarch64 MSI delivery #3441
jstarks wants to merge 3 commits into microsoft:main
Replace the GICv2m MSI controller with KVM's in-kernel GICv3 ITS for aarch64 PCIe MSI/MSI-X delivery. GICv2m maps MSI writes to a fixed pool of 64 SPIs, which doesn't scale (a single NVMe device with 128 queues exhausts it) and is incompatible with the ITS-based device ID model needed for future SMMU support. The ITS routes MSIs via LPIs using (DeviceID, EventID) lookup, supporting thousands of interrupt vectors across all devices.
KVM provides a complete in-kernel ITS (KVM_DEV_TYPE_ARM_VGIC_ITS) that handles all guest MMIO and command queue processing. The VMM creates the device, sets its base address, and initializes it. For emulated devices, MSIs are injected via KVM_SIGNAL_MSI with KVM_MSI_VALID_DEVID. For irqfd (VFIO passthrough), the kvm_irq_routing_msi entry carries the devid so the kernel signals the ITS directly.
The main design challenge is that PCIe devices don't know their own requester ID (bus/device/function), since bus numbers are assigned dynamically by guest firmware. This is solved with a per-device AssignedBusRange that the PCIe port updates atomically when the guest programs secondary/subordinate bus numbers. ITS wrappers (ItsSignalMsi, ItsIrqFd) compose the full 32-bit device ID as (segment << 16 | BDF) at interrupt delivery time, transparent to the devices themselves.
The SignalMsi trait changes from `rid: u32` (always passed as 0) to `devid: Option<u32>`, and IrqFdRoute::enable gains a matching parameter. This is a mechanical change across all backends (KVM, WHP, MSHV, HVF).
Also adds ACPI IORT (IO Remapping Table) generation for aarch64, with ITS Group and PCI Root Complex nodes with ID mappings. The MADT gains a GIC ITS entry. DeviceTree generation emits an ITS child node under the GIC when ITS is configured, with msi-parent on PCIe host bridges pointing to the ITS phandle instead of v2m.
ITS support is probed at KVM init time via KVM_CREATE_DEVICE_TEST, falling back to GICv2m on kernels or hardware without ITS. A --gic-msi CLI option (auto/its/v2m) allows overriding the default selection. GICv2m remains available for GICv2-only configurations.
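As a rough illustration of the injection path described above, the sketch below composes an ITS device ID as `(segment << 16) | BDF` and fills in a payload shaped like the Linux UAPI `struct kvm_msi`. The names `KvmMsi`, `its_device_id`, and `build_msi` are hypothetical and are not taken from this PR; only the field layout and the `KVM_MSI_VALID_DEVID` value come from the Linux headers. The actual ioctl call is omitted.

```rust
/// Mirror of the Linux UAPI `struct kvm_msi` used with `KVM_SIGNAL_MSI`.
/// (Hypothetical Rust name; the field layout follows the kernel header.)
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct KvmMsi {
    pub address_lo: u32,
    pub address_hi: u32,
    pub data: u32,
    pub flags: u32,
    pub devid: u32,
    pub pad: [u8; 12],
}

/// Flag telling KVM that `devid` is meaningful (value from the Linux UAPI).
pub const KVM_MSI_VALID_DEVID: u32 = 1 << 0;

/// Composes an ITS DeviceID as `(segment << 16) | BDF`, where the BDF is
/// bus[15:8], device[7:3], function[2:0]. Hypothetical helper.
pub fn its_device_id(segment: u16, bus: u8, device: u8, function: u8) -> u32 {
    let bdf = ((bus as u32) << 8)
        | (((device & 0x1f) as u32) << 3)
        | ((function & 0x7) as u32);
    ((segment as u32) << 16) | bdf
}

/// Builds the MSI payload. With `devid == None` the flag stays clear,
/// which is the shape a non-ITS path (e.g. GICv2m) would use.
pub fn build_msi(address: u64, data: u32, devid: Option<u32>) -> KvmMsi {
    KvmMsi {
        address_lo: address as u32,
        address_hi: (address >> 32) as u32,
        data,
        flags: if devid.is_some() { KVM_MSI_VALID_DEVID } else { 0 },
        devid: devid.unwrap_or(0),
        pad: [0; 12],
    }
}
```

For example, a function at segment 0, bus 1, device 0, function 0 would get device ID `0x100` under this scheme.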
Pull request overview
This PR upgrades aarch64 PCIe MSI/MSI-X delivery by adding support for KVM’s in-kernel GICv3 ITS (LPI-based MSI routing) while keeping GICv2m as a fallback, and threads the required device-identity plumbing through the PCIe and hypervisor layers.
Changes:
- Add an aarch64 MSI-controller selection model (`GicMsiController` / `--gic-msi`) and probe KVM ITS support, falling back to GICv2m when needed.
- Introduce PCIe bus-range tracking (`AssignedBusRange`) and ITS wrappers that compose `(segment << 16) | BDF` for MSI signaling and irqfd routing.
- Generate the required firmware descriptions for ITS routing (ACPI MADT ITS entry + IORT; device tree ITS child + `msi-parent` updates).
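The selection-with-fallback behavior in the first bullet could look roughly like the sketch below. `GicMsiChoice`, `Resolved`, and `resolve` are hypothetical names, and returning an error when ITS is forced on a platform without it is an assumption; the PR text only states that the automatic mode falls back to GICv2m.

```rust
/// Hypothetical stand-in for the `--gic-msi` values (auto/its/v2m).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GicMsiChoice {
    Auto,
    Its,
    V2m,
}

/// The controller actually used after probing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolved {
    Its,
    V2m,
}

/// Resolves the requested controller against the probed platform capability.
/// `supports_its` stands for the result of the KVM_CREATE_DEVICE_TEST probe.
pub fn resolve(choice: GicMsiChoice, supports_its: bool) -> Result<Resolved, String> {
    match (choice, supports_its) {
        // Prefer ITS whenever it is available and not explicitly overridden.
        (GicMsiChoice::Auto, true) | (GicMsiChoice::Its, true) => Ok(Resolved::Its),
        // Automatic fallback, or an explicit request for v2m.
        (GicMsiChoice::Auto, false) | (GicMsiChoice::V2m, _) => Ok(Resolved::V2m),
        // Assumption: forcing ITS without platform support is reported as an error.
        (GicMsiChoice::Its, false) => {
            Err("ITS requested but not supported by this platform".to_string())
        }
    }
}
```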
Reviewed changes
Copilot reviewed 54 out of 55 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vmm_core/vmotherboard/src/lib.rs | Re-export AssignedBusRange for PCIe identity tracking integration. |
| vmm_core/vmotherboard/src/chipset/builder/mod.rs | Thread optional AssignedBusRange through PCIe device registration. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/services.rs | Extend static PCIe registration to carry optional bus-range identity. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/pci.rs | Store and pass per-device optional AssignedBusRange during enumeration. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/device.rs | Add builder API to attach an AssignedBusRange to PCIe devices. |
| vmm_core/vmotherboard/src/base_chipset.rs | Propagate optional AssignedBusRange into PCIe device addition. |
| vmm_core/virt/src/x86/apic_software_device.rs | Adapt SignalMsi signature to Option<u32> device identity. |
| vmm_core/virt/src/generic.rs | Add supports_its to PlatformInfo; adapt SignalMsi signature. |
| vmm_core/virt/src/aarch64/gic_v2m.rs | Update SignalMsi signature for v2m signaler. |
| vmm_core/virt/src/aarch64/gic_software_device.rs | Update SignalMsi signature for software GIC implementation. |
| vmm_core/virt_whp/src/synic.rs | Update SignalMsi signature for WHP MSI injection path. |
| vmm_core/virt_whp/src/lib.rs | Replace gic_v2m with gic_msi controller selection; set supports_its=false. |
| vmm_core/virt_whp/src/device.rs | Update SignalMsi signature for WHP device interrupt injection. |
| vmm_core/virt_mshv/src/x86_64/mod.rs | Update SignalMsi signature for MSHV MSI injection. |
| vmm_core/virt_mshv/src/irqfd.rs | Extend irqfd route enable to accept optional device identity. |
| vmm_core/virt_mshv/src/aarch64/mod.rs | Add supports_its=false; adapt MSI signaling signature/behavior. |
| vmm_core/virt_kvm/src/lib.rs | Track ITS device lifetime; store gic_msi controller selection. |
| vmm_core/virt_kvm/src/gsi.rs | Extend KVM irqfd enable path to pass through optional device identity. |
| vmm_core/virt_kvm/src/arch/x86_64/mod.rs | Plumb devid through KVM routing entries (unused on x86). |
| vmm_core/virt_kvm/src/arch/aarch64/mod.rs | Probe/create/init ITS device; add ITS MSI injection and irqfd routing support; expose supports_its. |
| vmm_core/virt_hvf/src/lib.rs | Add supports_its=false for HVF platform info. |
| vmm_core/src/device_builder.rs | Pass optional AssignedBusRange into PCIe device build path. |
| vmm_core/src/acpi_builder.rs | Add IORT generation for aarch64 + MADT ITS entry; add unit tests for IORT. |
| vm/vmcore/vm_topology/src/processor/aarch64.rs | Replace gic_v2m option with GicMsiController (None/V2m/Its) in topology. |
| vm/vmcore/src/irqfd.rs | Extend IrqFdRoute::enable with optional devid for ITS routing. |
| vm/kvm/src/lib.rs | Extend KVM IRQ routing MSI entry to include optional devid + flags. |
| vm/devices/user_driver_emulated_mock/src/lib.rs | Adapt mock MSI controller to new SignalMsi signature. |
| vm/devices/storage/nvme/src/tests/test_helpers.rs | Update NVMe test MSI controller to new signature. |
| vm/devices/storage/nvme_test/src/tests/test_helpers.rs | Update NVMe_test MSI controller to new signature. |
| vm/devices/pci/vpci/src/test_helpers/mod.rs | Update VPCI test MSI signaling to new signature. |
| vm/devices/pci/pcie/src/switch.rs | Add optional AssignedBusRange propagation into downstream port setup; route cfg writes via write_cfg. |
| vm/devices/pci/pcie/src/root.rs | Add optional AssignedBusRange propagation to ports/hotplug; route cfg writes via write_cfg. |
| vm/devices/pci/pcie/src/port.rs | Implement port-side bus-range tracking and cfg-write side effects for identity updates. |
| vm/devices/pci/pcie/src/lib.rs | Export new bus_range and its modules. |
| vm/devices/pci/pcie/src/its.rs | Add ITS wrappers (ItsSignalMsi, ItsIrqFd) composing segment+BDF device IDs. |
| vm/devices/pci/pcie/src/bus_range.rs | Add shared atomic bus-range container and ITS devid composition helper. |
| vm/devices/pci/pcie/fuzz/fuzz_pcie.rs | Update fuzz harness to match new PCIe root API signature. |
| vm/devices/pci/pcie/Cargo.toml | Add pal_event dependency required by irqfd wrapper interface. |
| vm/devices/pci/pci_core/src/test_helpers/mod.rs | Update test MSI controller signature. |
| vm/devices/pci/pci_core/src/msi.rs | Change SignalMsi to Option<u32> devid; add route/target helpers for rid-aware signaling. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Update MSI-X delivery path to new MsiTarget API and irqfd enable signature. |
| vm/acpi_spec/src/madt.rs | Add MADT GIC ITS entry type/struct. |
| vm/acpi_spec/src/lib.rs | Export new iort module. |
| vm/acpi_spec/src/iort.rs | Introduce IORT table/node/type definitions for aarch64 PCIe + ITS mapping. |
| tmk/tmk_vmm/src/run.rs | Update TMK aarch64 topology config to gic_msi=None. |
| openvmm/openvmm_entry/src/lib.rs | Wire --gic-msi into config (GicMsiConfig). |
| openvmm/openvmm_entry/src/cli_args.rs | Add --gic-msi CLI option and GicMsiCli enum. |
| openvmm/openvmm_defs/src/config.rs | Add ITS base/size constants and GicMsiConfig config enum. |
| openvmm/openvmm_core/src/worker/vm_loaders/linux.rs | Emit device tree ITS child node + msi-parent selection (ITS vs v2m). |
| openvmm/openvmm_core/src/worker/dispatch.rs | Select ITS vs v2m from platform/config; wrap PCIe MSI/irqfd with ITS devid injection; propagate AssignedBusRange. |
| openhcl/virt_mshv_vtl/src/lib.rs | Update OpenHCL MSI signaling signature to Option<u32>. |
| openhcl/bootloader_fdt_parser/src/lib.rs | Update parsed topology to gic_msi=None default. |
| Guide/src/reference/emulated/pcie/overview.md | Document aarch64 MSI routing via ITS vs v2m and the --gic-msi override. |
| Guide/src/reference/devices/firmware/linux_direct.md | Update ACPI table list to include ITS/IORT behavior for aarch64. |
| Cargo.lock | Lockfile update for added pal_event dependency in pcie crate. |
        .copied()
        .expect("switch parent port must be a known downstream port");
    for i in 0..switch.num_downstream_ports {
        let port_name: Arc<str> = format!("{}-downstream-{}", switch.name, i).into();
It seems non-ideal to take a dependency on the string formatting here, the switch_device that gets created already exposes a query for all downstream ports, can we use that?
    /// Sets the shared bus range for the downstream device.
    ///
    /// The port will update this bus range when the guest programs the
    /// secondary bus number. The same bus range is shared with MSI/irqfd
    /// wrappers so that interrupt delivery uses the correct requester ID.
    ///
    /// The bus range is immediately initialized from the port's current
    /// config space so that hotplugged devices see already-assigned bus
    /// numbers without waiting for a guest write.
    pub fn set_bus_range(&mut self, bus_range: AssignedBusRange) {
        let secondary = *self.cfg_space.assigned_bus_range().start();
        let subordinate = *self.cfg_space.assigned_bus_range().end();
        bus_range.set_bus_range(secondary, subordinate);
        self.bus_range = Some(bus_range);
    }

    /// Writes to the port's config space and handles any side effects
    /// (e.g., bus number changes affecting downstream device identity).
    pub fn write_cfg(&mut self, offset: u16, value: u32) -> IoResult {
        let old_secondary = *self.cfg_space.assigned_bus_range().start();
        let old_subordinate = *self.cfg_space.assigned_bus_range().end();
        let result = self.cfg_space.write_u32(offset, value);
        let new_secondary = *self.cfg_space.assigned_bus_range().start();
        let new_subordinate = *self.cfg_space.assigned_bus_range().end();
        if old_secondary != new_secondary || old_subordinate != new_subordinate {
            self.on_bus_range_changed(new_secondary, new_subordinate);
        }
        result
    }

    /// Called when the bus range has changed. Updates the downstream
    /// device's bus range to match.
    fn on_bus_range_changed(&self, secondary_bus: u8, subordinate_bus: u8) {
        if let Some(bus_range) = &self.bus_range {
            bus_range.set_bus_range(secondary_bus, subordinate_bus);
        }
    }
Should all of this just go in ConfigSpaceType1Emulator directly?
    /// Attach the provided `GenericPciBusDevice` to the port identified.
    ///
    /// `device_id` is an optional shared identity that the port will update
    /// with the device's RID when the guest programs the secondary bus number.
Maybe I have missed this in the change somewhere, but how does this approach (snooping on parent port bus number configuration) work for multifunction devices where there are multiple RIDs under the same port?
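To make the arithmetic behind this question concrete: functions of a multifunction device sit on the same secondary bus and differ only in the function bits of the requester ID. The sketch below is illustrative only; `requester_id` and `function_rids` are hypothetical helpers, the device number is assumed to be 0, and whether the PR tracks per-function numbers this way is not stated in the thread.

```rust
/// Standard PCI requester ID layout: bus[15:8], device[7:3], function[2:0].
pub fn requester_id(bus: u8, device: u8, function: u8) -> u16 {
    ((bus as u16) << 8) | (((device & 0x1f) as u16) << 3) | ((function & 0x7) as u16)
}

/// RIDs for several functions of one device (assumed device number 0)
/// behind a port whose secondary bus number is `secondary_bus`.
pub fn function_rids(secondary_bus: u8, functions: &[u8]) -> Vec<u16> {
    functions
        .iter()
        .map(|&f| requester_id(secondary_bus, 0, f))
        .collect()
}
```

With secondary bus 5 and functions 0 through 2, this yields `0x500`, `0x501`, and `0x502`: one bus number observed at the port, but three distinct RIDs.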
    ///
    /// Clone is cheap (just an `Arc` bump).
    #[derive(Clone, Debug)]
    pub struct AssignedBusRange(Arc<AtomicU16>);
Should this also include a segment number (ie. SegmentBusRange)
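For context on what fits in the `AtomicU16`, here is a minimal sketch of one plausible packing: secondary bus in the high byte and subordinate bus in the low byte, so both update in a single atomic store. `BusRangeSketch` is a hypothetical name and the byte order is an assumption; the PR's actual layout is not shown in this thread. Note that 16 bits leave no room for a segment number, which would need a wider atomic or a separate field.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;

/// Hypothetical shared bus-range cell: clones share the same atomic.
#[derive(Clone, Debug)]
pub struct BusRangeSketch(Arc<AtomicU16>);

impl BusRangeSketch {
    /// Starts with both bus numbers at 0 (unassigned).
    pub fn new() -> Self {
        Self(Arc::new(AtomicU16::new(0)))
    }

    /// Publishes both bus numbers in one atomic store, so a reader never
    /// observes a new secondary paired with an old subordinate.
    pub fn set_bus_range(&self, secondary: u8, subordinate: u8) {
        let packed = ((secondary as u16) << 8) | subordinate as u16;
        self.0.store(packed, Ordering::Release);
    }

    /// Returns `(secondary, subordinate)` as last published.
    pub fn bus_range(&self) -> (u8, u8) {
        let packed = self.0.load(Ordering::Acquire);
        ((packed >> 8) as u8, packed as u8)
    }
}
```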
    //
    // Each device gets an AssignedBusRange that the root port updates when
    // the guest programs the secondary and subordinate bus numbers. When
    // ITS is configured, wrappers compose (segment << 16 | rid) at
Suggested change:
    - // ITS is configured, wrappers compose (segment << 16 | rid) at
    + // ITS is configured, wrappers compose the RID at
    /// update the device's RID when the secondary bus number changes.
    /// Also available for SMMU stream ID mapping.
    #[inspect(skip)]
    bus_range: Option<AssignedBusRange>,
Should this be non-optional and always passed by callers?
    partition.irqfd(),
    signal_msi,
    irqfd,
    Some(bus_range),
I am a little confused about the ownership and updating here. Each downstream port (root port or switch port) has a bus range that it updates on config space writes, but then we also have a separate AssignedBusRange given to the endpoint devices? And since the endpoint's bus range is handed over to ItsSignalMsi / ItsIrqFd, how does it get any information about bus number configuration?
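One way to picture the sharing the description implies: the port and the MSI wrapper hold clones of the same `Arc`, so a store made by the port on a config write is visible to the wrapper at delivery time. The sketch below is a simplified model under stated assumptions, not the PR's code: `MsiSink`, `Recorder`, `PortSketch`, and `ItsWrapperSketch` are hypothetical, only the secondary bus is tracked, and the endpoint is assumed to be device 0, function 0.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an MSI signaling trait taking `devid: Option<u32>`.
pub trait MsiSink {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32);
}

/// Test sink that records every delivered MSI.
#[derive(Default)]
pub struct Recorder(pub Mutex<Vec<(Option<u32>, u64, u32)>>);

impl MsiSink for Recorder {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32) {
        self.0.lock().unwrap().push((devid, address, data));
    }
}

/// Port side: writes the secondary bus number into the shared cell.
pub struct PortSketch {
    pub secondary: Arc<AtomicU8>,
}

impl PortSketch {
    pub fn program_secondary_bus(&self, bus: u8) {
        self.secondary.store(bus, Ordering::Release);
    }
}

/// Wrapper side: reads the shared cell at delivery time and fills in the devid.
pub struct ItsWrapperSketch<T> {
    pub inner: T,
    pub segment: u16,
    pub secondary: Arc<AtomicU8>,
}

impl<T: MsiSink> MsiSink for ItsWrapperSketch<T> {
    fn signal_msi(&self, _devid: Option<u32>, address: u64, data: u32) {
        let bus = self.secondary.load(Ordering::Acquire) as u32;
        // Assumption: device 0, function 0, so the BDF is just `bus << 8`.
        let devid = ((self.segment as u32) << 16) | (bus << 8);
        self.inner.signal_msi(Some(devid), address, data);
    }
}
```

In this model the endpoint never learns its bus number; it signals with no device ID, and the wrapper supplies whatever the port last published.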