Hi,
I'm trying to understand how libvirt VMs interact with QEMU's default pci-hole64-size
setting on q35 machines, to work out why my libvirt VMs behave differently from a
similarly configured lxd VM. As I understand it, both libvirt and lxd use QEMU q35 VMs
under the hood, and both inherit their pci-hole64-size from QEMU's default (correct me
if that's wrong), but in my tests they behave differently. I know lxd is probably out of
scope for the libvirt project, so consider this a libvirt question with some added lxd
context.
All of this is on a DGX B200 host, which contains large (~180GB VRAM) GPUs.
With libvirt/virt-install, I created a q35 virtual machine with CPU host passthrough and
one or more GPUs passed through via --host-device. Without additional modifications, this
works as expected, and I can initialize the GPU driver in the VM and run nvidia-smi.
With lxd (which creates a q35 virtual machine with CPU host passthrough by default), I
attached 1 GPU via "lxc config device add passthroughtest gpu gpu pci=1b:00.0".
In that VM, the pci-hole64-size appears to be too small by default, since I see these
messages in dmesg:
[ 1.099110] pci 0000:00:01.5: bridge window [mem size 0x6000000000 64bit pref]: can't assign; no space
[ 1.120274] pci 0000:00:01.5: bridge window [mem size 0x6000000000 64bit pref]: can't assign; no space
[ 1.183281] pci 0000:06:00.0: BAR 2 [mem size 0x4000000000 64bit pref]: can't assign; no space
[ 1.186320] pci 0000:06:00.0: BAR 0 [mem size 0x04000000 64bit pref]: can't assign; no space
[ 1.189340] pci 0000:06:00.0: BAR 4 [mem size 0x02000000 64bit pref]: can't assign; no space
and I cannot initialize the GPU driver since the BARs weren't mapped correctly.
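
For scale, the sizes in the messages above can be converted to GiB with a few lines of Python. This is only a sketch of the arithmetic; the helper name `to_gib` is made up for illustration, and it assumes the `mem size` values are byte counts in hex.

```python
# Convert the hex "mem size" values from the dmesg output to GiB.
# Assumption: the values are byte counts; to_gib is a hypothetical helper.
GIB = 1 << 30


def to_gib(size_hex: str) -> float:
    """Return the size given as a hex string (e.g. '0x4000000000') in GiB."""
    return int(size_hex, 16) / GIB


if __name__ == "__main__":
    sizes = [
        ("bridge window 00:01.5", "0x6000000000"),
        ("BAR 2 06:00.0", "0x4000000000"),
        ("BAR 0 06:00.0", "0x04000000"),
        ("BAR 4 06:00.0", "0x02000000"),
    ]
    for label, size_hex in sizes:
        print(f"{label}: {to_gib(size_hex):g} GiB")
```

So the bridge window alone asks for 384 GiB and BAR 2 for 256 GiB, which gives a sense of how large the 64-bit hole has to be for a single GPU.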
When I apply a larger hole size to my lxd VM via
`lxc config set passthroughtest raw.qemu=' -global q35-pcihost.pci-hole64-size=8192G'`,
I don't see any "can't assign; no space" messages, and the driver works as expected.
My question about libvirt is: where, if at all, does libvirt interact with QEMU's
pci-hole64-size value? If libvirt does not automatically do something equivalent to the
hole-size change I had to make for lxd, and is in fact just using QEMU's default, is
there some other related interaction in libvirt that might explain why my libvirt VMs
don't need a manual change to pci-hole64-size, even though the relevant parts of the
underlying QEMU machine should be the same?
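
One way to compare the two setups would be to check whether each running QEMU process was started with an explicit hole size on its command line. The sketch below assumes the option, when present, appears in the process arguments in the same `q35-pcihost.pci-hole64-size=...` form as in the `raw.qemu` setting; it would not catch a value supplied some other way (for example through a config file). The function name and the sample strings are illustrative only.

```python
import re
from typing import Optional


def find_pci_hole64_size(cmdline: str) -> Optional[str]:
    """Return the explicit q35 pci-hole64-size value from a QEMU command line.

    Accepts either a space-separated string or the NUL-separated contents of
    /proc/<pid>/cmdline. Returns None if the option is not present.
    """
    normalized = cmdline.replace("\0", " ")
    match = re.search(r"q35-pcihost\.pci-hole64-size=(\S+)", normalized)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative command lines, not captured from a real host.
    with_option = (
        "qemu-system-x86_64 -machine q35 "
        "-global q35-pcihost.pci-hole64-size=8192G"
    )
    without_option = "qemu-system-x86_64 -machine q35"
    print(find_pci_hole64_size(with_option))
    print(find_pci_hole64_size(without_option))
```

On a live host, the input would be the contents of `/proc/<pid>/cmdline` for each VM's QEMU process. A result of `None` for both would suggest that neither tool sets the hole size explicitly in the arguments and that the difference comes from somewhere else.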