
On 30/05/2024 18:00, Dario Faggioli via Devel wrote:
On Thu, 2024-05-30 at 14:45 +0200, Igor Mammedov wrote:
The usability of huge VMs (including those with a large number of vCPUs) depends heavily on the guest OS used. What works for one might not work for another. For a general-purpose OS, adding an IOMMU is typically necessary to make vCPUs over 254 usable; for Windows, adjusting -smp might make a difference.
On the other hand, a specially built guest with a tailored QEMU config might work just fine without an IOMMU (David was the one who patched KVM to that effect, if I recall correctly).
So, as far as I know, there is no way for any OS, with any configuration, to bring up more than 255 vCPUs without a vIOMMU.
IIUC, it's a matter of the number of bits available in the I/O APIC IRQ destination register. With only that register available, and it being only 8 bits wide, it's just not doable.
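To make the arithmetic concrete, here is a minimal sketch in Python. It assumes xAPIC physical destination mode, where the all-ones ID (0xFF) is reserved for broadcast; the function names are made up for illustration and do not come from any kernel or QEMU code.

```python
# Sketch only: counts the APIC IDs reachable through a destination field
# of a given width. Assumes the all-ones ID is reserved for broadcast
# (true for 8-bit xAPIC physical destination mode).

DEST_ID_BITS = 8  # width of the destination field discussed above


def total_ids(bits: int) -> int:
    """Number of distinct values a destination field of `bits` bits can hold."""
    return 1 << bits


def usable_unicast_ids(bits: int) -> int:
    """Distinct IDs left for individual CPUs once broadcast is set aside."""
    return total_ids(bits) - 1


if __name__ == "__main__":
    print(total_ids(DEST_ID_BITS))           # all encodable IDs
    print(usable_unicast_ids(DEST_ID_BITS))  # IDs usable for single vCPUs
```

With an 8-bit field this leaves IDs 0 through 254, i.e. 255 individually addressable vCPUs, which would be consistent with both figures quoted in this thread.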
On VMs, there is alternatively a KVM PV feature that bumps the limit to 32k vCPUs, i.e. KVM_FEATURE_MSI_EXT_DEST_ID (in QEMU the CPU feature name is +kvm-msi-ext-dest-id). It uses 7 additional bits for the destination ID, giving 15 bits in total (which on hardware would cross a page boundary in the IOAPIC entry, IIUC), without needing IOMMU interrupt remapping. But the guest needs to understand that feature (support has been there since Linux v5.15 or around that timeframe). I think this is what Igor is referring to.

Joao
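For illustration, a rough Python sketch of how an extended destination ID could be packed into an MSI address. It assumes the layout given in the kernel's KVM CPUID documentation for KVM_FEATURE_MSI_EXT_DEST_ID: the low 8 bits of the ID in address bits 19-12 as usual, and 7 extra bits in address bits 11-5. The helper name and its parameters are hypothetical, and the other address bits (destination mode, redirection hint) are left at zero for simplicity.

```python
# Sketch only: packs a destination APIC ID into an x86 MSI address.
# Assumed layout: bits 31-20 = 0xFEE, bits 19-12 = ID bits 7-0,
# bits 11-5 = ID bits 14-8 (only when the extended-destination feature is on).

MSI_BASE = 0xFEE00000  # fixed upper bits of an x86 MSI address


def msi_address(dest_id: int, ext_dest_id: bool) -> int:
    """Return the MSI address targeting `dest_id` (hypothetical helper)."""
    max_bits = 15 if ext_dest_id else 8
    if not 0 <= dest_id < (1 << max_bits):
        raise ValueError(f"destination ID {dest_id} needs more than {max_bits} bits")
    low8 = dest_id & 0xFF   # classic 8-bit destination field
    high7 = dest_id >> 8    # extended bits, zero without the feature
    return MSI_BASE | (low8 << 12) | (high7 << 5)


if __name__ == "__main__":
    print(1 << 15)                                  # ID space with the feature
    print(hex(msi_address(300, ext_dest_id=True)))  # a vCPU beyond the 8-bit range
```

Without the feature, `msi_address(300, ext_dest_id=False)` raises `ValueError`, which is the 8-bit limit in a nutshell; with it, the ID space grows to 2^15 = 32768, matching the 32k figure.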