Predictable and consistent net interface naming in guests

Hi Igor and Laine,

I would like to revive a two-year-old discussion [1] about consistent network interface naming in the guest.

That discussion mentioned that a guest PCI address may change in two cases:
- The PCI topology changes.
- The machine type changes.

Usually, the machine type is not expected to change, especially if one wants to allow migration between nodes. I would argue this should not be problematic in practice, because guest images would be built for a specific machine type.

Regarding the PCI topology, I am not sure I understand what changes need to occur to the domxml for a defined guest PCI address to change. The only thing I can think of is a scenario where hotplug/unplug is used, but even then I would expect existing devices to preserve their PCI addresses and the hot(un)plugged device to have a reserved address managed by whoever acts on it (the management system).

Could you please clarify in which scenarios the PCI topology can disrupt the naming of interfaces in the guest?

Are there any plans to add acpi_index support?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1874096#c15

Thank you,
Edy.

On Mon, Oct 31, 2022 at 16:32:27 +0200, Edward Haas wrote: [...]
Are there any plans to add the acpi_index support?
https://www.libvirt.org/formatdomain.html#network-interfaces

"Since 7.3.0, one can set the ACPI index against network interfaces. With some operating systems (eg Linux with systemd), the ACPI index is used to provide network interface device naming, that is stable across changes in PCI addresses assigned to the device. This value is required to be unique across all devices and be between 1 and (16*1024-1)."
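For example, a minimal interface definition using this element might look like the following sketch (the network name is illustrative):

    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <acpi index='1'/>
    </interface>

With systemd's default naming policy, a Linux guest would then typically name this device "eno1", regardless of the PCI address it ends up with.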

On Mon, Oct 31, 2022 at 04:32:27PM +0200, Edward Haas wrote:
[...]
Are there any plans to add the acpi_index support?
This was implemented a year & a half ago
https://libvirt.org/formatdomain.html#network-interfaces
though due to QEMU limitations this only works for the old i440fx chipset, not Q35 yet.

With regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Thank you both for the quick response.

On Mon, Oct 31, 2022 at 4:49 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
[...]
This was implemented a year & a half ago
https://libvirt.org/formatdomain.html#network-interfaces
though due to QEMU limitations this only works for the old i440fx chipset, not Q35 yet.
I think most deployments today use Q35. Are there plans to resolve it there?

BTW, should this limitation be added to the documentation?

On Mon, 31 Oct 2022 16:59:53 +0200 Edward Haas <edwardh@redhat.com> wrote:
[...]
I think most deployments today use Q35. Are there plans to resolve it there?
I'm working on it actively. It won't make it into this QEMU release, but tentatively it might make it into the next one if no complications arise. I'm trying to make it work not only for root-ports/bridges on Q35 but also for non-hotpluggable NICs attached to the root bus (aka integrated endpoints), so it would be on par with i440fx machines.
BTW, should this limitation be added to the documentation?
The limitation (which applies to both i440fx and Q35) is that the NIC should be attached to a hotpluggable bus whose hotplug is managed by the ACPI PCI hotplug handlers.

On Mon, 31 Oct 2022 14:48:54 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
[...]
This was implemented a year & a half ago
https://libvirt.org/formatdomain.html#network-interfaces
though due to QEMU limitations this only works for the old i440fx chipset, not Q35 yet.
Q35 should work partially too. In its case acpi-index support is limited to hotplug-enabled root-ports and PCIe-PCI bridges. One also has to enable ACPI PCI hotplug (it's enabled by default on recent machine types) for it to work (i.e. it's not supported in native PCIe hotplug mode). So if mgmt can put NICs on root-ports/bridges, then acpi-index should just work on Q35 as well.
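In libvirt domain XML terms, that placement might look like this sketch (the controller index and PCI addresses are illustrative): a pcie-root-port is added and the NIC is put on the bus it provides, rather than directly on pcie.0:

    <controller type='pci' index='1' model='pcie-root-port'/>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <acpi index='1'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

Here bus 0x01 is the bus provided by the root port, i.e. the hotpluggable, ACPI-managed bus that the acpi-index mechanism requires.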

On 11/1/22 7:46 AM, Igor Mammedov wrote:
[...]
Q35 should work partially too. In its case acpi-index support is limited to hotplug-enabled root-ports and PCIe-PCI bridges. One also has to enable ACPI PCI hotplug (it's enabled by default on recent machine types) for it to work (i.e. it's not supported in native PCIe hotplug mode).
So if mgmt can put NICs on root-ports/bridges, then acpi-index should just work on Q35 as well.
With only a few exceptions (e.g. the first ich9 audio device, which is placed directly on the root bus at 00:1B.0 because that is where the ich9 audio device is located on actual Q35 hardware), libvirt will automatically put all PCI devices (including network interfaces) on a pcie-root-port.

After seeing reports that "acpi index doesn't work with Q35 machinetypes" I just assumed that was correct and didn't try it. But after seeing the "should work partially" statement above, I tried it just now: an <interface> of a Q35 guest that had its PCI address auto-assigned by libvirt (and so was placed on a pcie-root-port) and had <acpi index='4'/> was given the name "eno4". So what exactly is it that *doesn't* work?

On Wed, 2 Nov 2022 10:43:10 -0400 Laine Stump <laine@redhat.com> wrote:
[...]
So what exactly is it that *doesn't* work?
From the QEMU side, acpi-index requires:
1. ACPI PCI hotplug enabled (which is the default on relatively new Q35 machine types)
2. a hotpluggable PCI bus (root-port, various PCI bridges)
3. the NIC can be cold- or hotplugged; the guest should pick up the acpi-index of the device currently plugged into the slot

What doesn't work:
1. a device attached to the host-bridge directly (work in progress) (q35)
2. devices attached to any PXB port and any hierarchy hanging off it (there are no plans to make it work) (q35, pc)
3. devices plugged into hot-plugged bridges/root-ports (a hotplugged bridge lacks an ACPI description) (hard to fix, maybe not possible) (q35, pc)
4. multifunction devices (it's undefined by the spec, hence not supported)
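To illustrate the first "doesn't work" case with a sketch (addresses illustrative, and assuming libvirt accepts the explicit placement): a NIC put directly on the Q35 root complex, i.e. on bus 0x00 rather than behind a root port, would currently not have its acpi-index honored:

    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <acpi index='5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>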

On Wed, Nov 02, 2022 at 04:08:43PM +0100, Igor Mammedov wrote:
[...]
2. devices attached to any PXB port and any hierarchy hanging off it (there are no plans to make it work) (q35, pc)
I'd say this is still relatively important, as the PXBs are needed to create a NUMA-placement-aware topology for guests, and I'd say it is undesirable to lose acpi-index if a guest is updated to be NUMA aware, or if a guest image can be deployed in either normal or NUMA-aware setups.
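For reference, the kind of NUMA-aware layout in question might be sketched as follows (busNr, controller indexes, NUMA node and addresses are all illustrative); per the list above, the NIC here would not get acpi-index treatment because it hangs off a PXB:

    <controller type='pci' index='1' model='pcie-expander-bus'>
      <target busNr='180'>
        <node>1</node>
      </target>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <acpi index='2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>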
3. devices plugged into hot-plugged bridges/root-ports (hotplugged bridge lacks ACPI description) (hard to fix, maybe not possible) (q35, pc)
Not so bothered about that, since I think generally mgmt apps pre-plug sufficient bridges to cope.
4. multifunction devices (it's undefined by spec, hence not supported)
Not a big deal.

With regards,
Daniel

On Wed, 2 Nov 2022 15:20:39 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
[...]
I'd say this is still relatively important, as the PXBs are needed to create a NUMA-placement-aware topology for guests, and I'd say it is undesirable to lose acpi-index if a guest is updated to be NUMA aware, or if a guest image can be deployed in either normal or NUMA-aware setups.
It's not only Q35 but also PC. We basically do not generate an ACPI hierarchy for PXBs at all, so neither ACPI hotplug nor the dependent acpi-index would work. It's been so for many years and no one has asked to enable ACPI hotplug on them so far. CCing Amnon so he could ask around if we have a possible customer for this.

On 11/2/22 11:58 AM, Igor Mammedov wrote:
[...]
It's not only Q35 but also PC. We basically do not generate an ACPI hierarchy for PXBs at all, so neither ACPI hotplug nor the dependent acpi-index would work. It's been so for many years and no one has asked to enable ACPI hotplug on them so far.
I'm guessing (based on absolutely 0 information :-)) that there would be more demand for acpi-index (and the resulting predictable interface names) than for ACPI hotplug in NUMA-aware setups. Anyway, it sounds like (*within the confines of how libvirt constructs the PCI topology*) we actually have functional parity of acpi-index between 440fx and Q35.

On Mon, Oct 31, 2022 at 04:32:27PM +0200, Edward Haas wrote:
That discussion mentioned that a guest PCI address may change in two cases: - The PCI topology changes. - The machine type changes.
Usually, the machine type is not expected to change, especially if one wants to allow migrations between nodes. I would hope to argue this should not be problematic in practice, because guest images would be made per a specific machine type.
The machine type might not change from q35 to i440fx and vice versa, but since the domain XML is constructed every time a KubeVirt VM is started, the machine type might be q35-6.0 on one boot and q35-7.0 the next one if a KubeVirt upgrade that comes with a new version of QEMU has happened in between. This is unlikely to make a difference in terms of PCI addresses seen in the guest OS, but it's still not accurate to say that the machine type will not change. Live migration is a separate matter, as the machine type will definitely not change while the VM is running.
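Concretely, the machine type recorded in the domain XML is versioned, so between two boots the same VM might go from something like

    <os>
      <type arch='x86_64' machine='pc-q35-6.0'>hvm</type>
    </os>

to machine='pc-q35-7.0' after an upgrade (version numbers illustrative), even though it remains "a Q35 guest" throughout.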
[...]
Could you please help clarify in which scenarios the PCI topology can cause a mess to the naming of interfaces in the guest?
A change in libvirt (again, due to a KubeVirt upgrade in between two boots of the same VM) might result in different PCI addresses being assigned to devices despite the same input XML.

We generally try fairly hard to avoid this kind of situation, but we can only really guarantee stable PCI addresses for the lifetime of a VM that has been defined, and can't promise that the same input XML will result in the same guest ABI when using different versions of libvirt.

--
Andrea Bolognani / Red Hat / Virtualization

On Mon, Oct 31, 2022 at 6:55 PM Andrea Bolognani <abologna@redhat.com> wrote:
[...]
The machine type might not change from q35 to i440fx and vice versa, but since the domain XML is constructed every time a KubeVirt VM is started, the machine type might be q35-6.0 on one boot and q35-7.0 the next one if a KubeVirt upgrade that comes with a new version of QEMU has happened in between.
This is unlikely to make a difference in terms of PCI addresses seen in the guest OS, but it's still not accurate to say that the machine type will not change.
Thank you for the clarification. It makes me wonder what the actual implications of a machine type change are.
Live migration is a separate matter, as the machine type will definitely not change while the VM is running.
[...]
A change in libvirt (again, due to a KubeVirt upgrade in between two boots of the same VM) might result in different PCI addresses being assigned to devices despite the same input XML.
We generally try fairly hard to avoid this kind of situation, but we can only really guarantee stable PCI addresses for the lifetime of a VM that has been defined and can't promise that the same input XML will result in the same guest ABI when using different versions of libvirt.
I would expect the PCI addresses that have been explicitly set in the domxml [2] to be honored. Can we not assume that?

I mainly referred to that input option, not to the expectation that the generated configuration (of the domxml) be identical between different versions.

[2] https://libvirt.org/formatdomain.html#device-addresses
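That is, an interface carrying an explicit address element along these lines (values illustrative):

    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>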

On 10/31/22 2:21 PM, Edward Haas wrote:
On Mon, Oct 31, 2022 at 6:55 PM Andrea Bolognani <abologna@redhat.com> wrote:
[...]
I would expect the PCI addresses that have been explicitly set in the domxml [2] to be honored. Can we not assume that?

[2] https://libvirt.org/formatdomain.html#device-addresses
*If* the PCI address has been set in the original XML, that address will be honored any and every time a new domain is defined from that XML. Alternately, if a domain is defined once (without explicitly specifying any PCI addresses) and then run multiple times from the same definition, libvirt will auto-generate PCI addresses at initial definition time, and then use those same addresses each time the domain is run.

The issue is that no management application, including KubeVirt, explicitly sets the PCI addresses of devices (and we believe that hands-off practice should continue), *AND* KubeVirt re-defines the domain each time it is run, without querying libvirt for (and so never saving) the PCI addresses that were assigned to the devices. So each time the domain is stopped, all the PCI address info from that run is thrown away. And each time the domain is re-started (by re-defining it from the original XML that has no PCI address info), libvirt starts from scratch assigning addresses based on the information it receives from KubeVirt. If the conditions have changed, then addresses are assigned differently.

The potential situation Andrea described, where the PCI addresses could change merely due to an upgrade of KubeVirt/libvirt/qemu from one run to the next in spite of being fed the same (address-less) XML, is actually extremely rare (I don't remember such a case), but theoretically it could happen. The more common change would be if a device was added or removed during one run of the guest, and then remained added/removed the next time it was run - that could change the PCI addresses of one or more of the remaining devices, depending on their ordering in the XML.

So, libvirt provides two avenues to maintaining stable PCI addresses (and thus, network device names) across multiple runs of a domain: either define once and run many times, or else query the XML of the running domain and use that XML (containing PCI addresses) the next time the domain is started. But KubeVirt doesn't use either of these (and if memory serves me correctly, it really can't due to its design). And delegating management of PCI addresses to KubeVirt would push too much complexity out to KubeVirt.
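In virsh terms, the second avenue might look like this sketch (the domain name "mydomain" is illustrative):

    # capture the domain's XML, including the auto-assigned PCI addresses
    virsh dumpxml mydomain > mydomain.xml
    # on the next run, define and start the domain from the saved XML,
    # so libvirt honors the recorded addresses instead of assigning anew
    virsh define mydomain.xml
    virsh start mydomain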