[libvirt PATCH 0/2] conf: log error when attempting to live update device acpi index

This isn't possible (acpi index is set when devices are probed at guest startup) and libvirt wasn't even trying to communicate the change to the guest in any way, but instead of logging an error, we were just pretending the request had succeeded. Laine Stump (2): conf: reformat virDomainDefCompatibleDevice for upcoming additional check conf: log error on attempts to modify ACPI index of active device src/conf/domain_conf.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) -- 2.31.1

The next patch will add another check similar to the existing check for a change in alias name. This patch reformats the code in preparation so that the next patch's purpose will be clear. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index cb9e7218ff..fefc527901 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28477,13 +28477,14 @@ virDomainDefCompatibleDevice(virDomainDef *def, data.oldInfo = virDomainDeviceGetInfo(oldDev); if (action == VIR_DOMAIN_DEVICE_ACTION_UPDATE && - live && - (data.newInfo && data.oldInfo && - data.newInfo->alias && data.oldInfo->alias && - STRNEQ(data.newInfo->alias, data.oldInfo->alias))) { - virReportError(VIR_ERR_OPERATION_DENIED, "%s", - _("changing device alias is not allowed")); - return -1; + live && data.newInfo && data.oldInfo) { + + if (data.newInfo->alias && data.oldInfo->alias && + STRNEQ(data.newInfo->alias, data.oldInfo->alias)) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device alias is not allowed")); + return -1; + } } if (!virDomainDefHasUSB(def) && -- 2.31.1

The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed. Resolves: https://bugzilla.redhat.com/1998920 Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + } } if (!virDomainDefHasUSB(def) && -- 2.31.1

On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide. If that's the case you'll need to specifically exclude index '0' from the above check.

On 9/10/21 9:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
While I agree that it would be user friendly I think we mandate users to provide full device XML and looking into the doc we really do: * virDomainUpdateDeviceFlags: * @domain: pointer to domain object * @xml: pointer to XML description of one device * @flags: bitwise-OR of virDomainDeviceModifyFlags * * The supplied XML description of the device should contain all * the information that is found in the corresponding domain XML. * Leaving out any piece of information may be treated as a * request for its removal, which may be denied. We can special case acpiIndex as you suggest but I think that's the farthest we should go. Otherwise it'll be hard to distinguish user's laziness to provide full XML from genuine request for removal. Michal

On Fri, Sep 10, 2021 at 10:57:37 +0200, Michal Prívozník wrote:
On 9/10/21 9:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
While I agree that it would be user friendly I think we mandate users to provide full device XML and looking into the doc we really do:
* virDomainUpdateDeviceFlags: * @domain: pointer to domain object * @xml: pointer to XML description of one device * @flags: bitwise-OR of virDomainDeviceModifyFlags *
* The supplied XML description of the device should contain all * the information that is found in the corresponding domain XML. * Leaving out any piece of information may be treated as a * request for its removal, which may be denied.
We can special case acpiIndex as you suggest but I think that's the farthest we should go. Otherwise it'll be hard to distinguish user's laziness to provide full XML from genuine request for removal.
I definitely agree. It may be in certain cases even impossible to do the right thing, as e.g. removing an attribute/element may be a genuine user request to remove something. That's what acutally surprised me in case of the 'alias' check. If we want to go the proper way and require a full XML we should also remove the exemption for alias.

On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
The difference here is that alias is something that normally is not set by the user, but is instead auto-generated by libvirt when the guest starts and not stored/reported in the persistent config. So in the common case of "grab config XML, extract device element, modify it, send it to update-device" the alias wouldn't be in the XML, but if the user instead grabs the *status* XML then the alias *would* be there. (However, if the alias had been set by the user (with the prefix "ua-"), the alias would be present in the config XML.) So we allow for either blank (in the case that config XML was used) or no-change (in case alias was user-set and/or the status XML is used). The acpi index, on the other hand, is always unset unless the user specifies it, and so is never auto-generated and always matches in both the config and status XMLs. So whichever way the user gets the original XML (status or config) they are always going to have the actual setting. There are many other attributes that are checked to assure they match in (e.g.) qemuDomainChangeNet() (for example, network device model name, virtio options, upscript name,...) and (with one exception) none of those check for "not specified or not changed" - they all just check for "not changed". The one exception is ifname, which is another attribute that (like alias) is autogenerated every time the guest starts, and is never output in config XML (unless it was user-specified); for ifname if it's been left unspecified in the update XML, then the value from the status is copied over, making it also effectively ("unspecified or unchanged"). But again, this special handling isn't done for any attribute that isn't both auto-generated and ephemeral (as alias and ifname are). So in the end, I think it's correct that validation of acpi index *shouldn't* ignore an "apparently missing" value in the update. ====== BTW, something forgot to mention in the commit log - in the case of, e.g., network interfaces which is the only type of device where an alias is useful, most of the checks are in qemuDomainChangeNet() rather than in this virDomainDefCompatibleDevice() function. I was originally going to add this check into qemuDomainChangeNet() as well, but it is an attribute that's technically valid for any PCI device, not just interfaces; searching around, I found that virDomainDefCompatibleDevice() is called on update for all devices, and is already used to pretect against alias change (another attribute that's valid for all devices), so in the interest of consistency I put the check there. That said, although I used it based on precendent, I can't say that I really like virDomainDefCompatibleDevice(). It is used for multiple unrelated things - the bit at the beginning of the function is only executed if action is UPDATE and live is true, and all the rest of the function is only useful if action is ATTACH - it's checking things that could verifiably never happen during a live device update (e.g. checking if the device is USB but the domain doesn't support USB devices, or checking attributes for a device type that isn't allowed to be updated at all.). So it's really 2 completely independent functions in one, where the final two args act as an extension to the function name. I notice its action and live args previously weren't there - I did the spelunking into git history to find when/how; essentially there was at one time an unused action arg which was removed, but then re-added in order to add the check for alias). I didn't dig much deeper, but it seems like this might be an unnecessary tangle that should be split up into multiple functions. I didn't want to mess with that for a simple bugfix, though.

On 9/10/21 5:18 PM, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
The difference here is that alias is something that normally is not set by the user, but is instead auto-generated by libvirt when the guest starts and not stored/reported in the persistent config. So in the common case of "grab config XML, extract device element, modify it, send it to update-device" the alias wouldn't be in the XML, but if the user instead grabs the *status* XML then the alias *would* be there. (However, if the alias had been set by the user (with the prefix "ua-"), the alias would be present in the config XML.) So we allow for either blank (in the case that config XML was used) or no-change (in case alias was user-set and/or the status XML is used).
The acpi index, on the other hand, is always unset unless the user specifies it, and so is never auto-generated and always matches in both the config and status XMLs. So whichever way the user gets the original XML (status or config) they are always going to have the actual setting.
There are many other attributes that are checked to assure they match in (e.g.) qemuDomainChangeNet() (for example, network device model name, virtio options, upscript name,...) and (with one exception) none of those check for "not specified or not changed" - they all just check for "not changed". The one exception is ifname, which is another attribute that (like alias) is autogenerated every time the guest starts, and is never output in config XML (unless it was user-specified); for ifname if it's been left unspecified in the update XML, then the value from the status is copied over, making it also effectively ("unspecified or unchanged"). But again, this special handling isn't done for any attribute that isn't both auto-generated and ephemeral (as alias and ifname are).
So in the end, I think it's correct that validation of acpi index *shouldn't* ignore an "apparently missing" value in the update.
======
BTW, something forgot to mention in the commit log - in the case of, e.g., network interfaces which is the only type of device where an alias is useful, most of the checks are in qemuDomainChangeNet() rather than in this virDomainDefCompatibleDevice() function. I was originally going to add this check into qemuDomainChangeNet() as well, but it is an attribute that's technically valid for any PCI device, not just interfaces; searching around, I found that virDomainDefCompatibleDevice() is called on update for all devices, and is already used to pretect against alias change (another attribute that's valid for all devices), so in the interest of consistency I put the check there.
That said, although I used it based on precendent, I can't say that I really like virDomainDefCompatibleDevice(). It is used for multiple unrelated things - the bit at the beginning of the function is only executed if action is UPDATE and live is true, and all the rest of the function is only useful if action is ATTACH - it's checking things that could verifiably never happen during a live device update (e.g. checking if the device is USB but the domain doesn't support USB devices, or checking attributes for a device type that isn't allowed to be updated at all.). So it's really 2 completely independent functions in one, where the final two args act as an extension to the function name.
I notice its action and live args previously weren't there - I did the spelunking into git history to find when/how; essentially there was at one time an unused action arg which was removed, but then re-added in order to add the check for alias). I didn't dig much deeper, but it seems like this might be an unnecessary tangle that should be split up into multiple functions. I didn't want to mess with that for a simple bugfix, though.
Thank you Laine for your detailed analysis. I think you can merge this patch. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal

On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
The difference here is that alias is something that normally is not set by the user, but is instead auto-generated by libvirt when the guest starts and not stored/reported in the persistent config. So in the common case of "grab
I strongly disagree with this statement. Firstly we now have user aliases and secondly there's always runtime related stuff in the XML.
config XML, extract device element, modify it, send it to update-device" the alias wouldn't be in the XML, but if the user instead grabs the *status* XML then the alias *would* be there. (However, if the alias had been set by the user (with the prefix "ua-"), the alias would be present in the config XML.) So we allow for either blank (in the case that config XML was used) or no-change (in case alias was user-set and/or the status XML is used).
Picking config XML is semantically as wrong as generating a partial XML. It can simply be a completely different device. If we want the users to use the full XML when updating the device it _must_ be the full XML snippet from the running config. Otherwise we are back at a "partial XML" and all the problems connected to that.

On 9/13/21 11:38 AM, Peter Krempa wrote:
On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
The ACPI index of a device in a running guest can't be modified, and libvirt doesn't actually attempt to modify it, but it was possible for a user to request such a modification, and libvirt wouldn't complain, thus misleading the user into thinking that it had actually been changed.
Resolves: https://bugzilla.redhat.com/1998920
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fefc527901..7ff890d8b7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def, _("changing device alias is not allowed")); return -1; } + + if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) { + virReportError(VIR_ERR_OPERATION_DENIED, "%s", + _("changing device 'acpi index' is not allowed")); + return -1; + }
I don't remember any more what the intended semantics of the 'update' API are in regards to asking the user to provide a complete definition of the device, but according to the 'alias' check above we seem to be specifically ignoring the situation when the new definition didn't provide an alias. This seems to be hinting that we ignore stuff that the used didn't provide.
If that's the case you'll need to specifically exclude index '0' from the above check.
The difference here is that alias is something that normally is not set by the user, but is instead auto-generated by libvirt when the guest starts and not stored/reported in the persistent config. So in the common case of "grab
I strongly disagree with this statement. Firstly we now have user aliases and secondly there's always runtime related stuff in the XML.
Yeah, I covered the user-specified aliases in my response. user-specified aliases are the only reason that code is in there at all. I did a bit of evening spelunking and found this: 1) originally the alias was completely ignored in device-update, since it was always set automatically by libvirt, and so it was a read-only attribute from the application's PoV. 2) Then user-specified aliases were added, with no associated change to the update-device code, and QE testing found that a requested change to alias during live device update was silently ignored: https://bugzilla.redhat.com/1585108 3) This led to a patch that checked for any change in alias in the update-device XML, with no special case for "not specified" (i.e. basically what you're suggesting the code should be): https://gitlab.com/libvirt/libvirt/-/commit/4ad54a417a1bbe10aeffcce783f4381d... 4) That patch resulted in a bug filed by RHV saying that the behavior of the update-device API had changed, which broke their existing code for changing the media in a CD drive or something like that (they don't read back the normalized XML after they've started the guest, so they don't have any info about auto-generated values, and thus weren't including alias in the XML they sent to update-device (this was before RHV started specifying their own aliases): https://bugzilla.redhat.com/1621910 5) *That* bug resulted in the code being changed to what it is today, with this patch: https://gitlab.com/libvirt/libvirt/-/commit/b48d9e939bcf32a8d6e571313637e2ee... So essentially we are trapped into this "ignore unspecified alias" semantic because alias previously existed but wasn't user-specified, and suddenly requiring it broke existing applications trying to do things completely unrelated to aliases. TL;NMFTG (Too Long; No More ... To Give) - As with many warts in libvirt, I think we need to leave the check for alias as it is.
config XML, extract device element, modify it, send it to update-device" the alias wouldn't be in the XML, but if the user instead grabs the *status* XML then the alias *would* be there. (However, if the alias had been set by the user (with the prefix "ua-"), the alias would be present in the config XML.) So we allow for either blank (in the case that config XML was used) or no-change (in case alias was user-set and/or the status XML is used).
Picking config XML is semantically as wrong as generating a partial XML. It can simply be a completely different device. > If we want the users to use the full XML when updating the device it _must_ be the full XML snippet from the running config. Otherwise we are back at a "partial XML" and all the problems connected to that.
Unfortunately I think the train has long ago left the station on this (used that metaphor just for you :-))(BTW, KVM Forum was just no fun at all this year without a talk like the one a few years ago about using virtualization on locomotives.) Anyway, based on Michal's ACK, I pushed my check for ACPI index, which *doesn't* ignore in the case that it's unspecified. As with all other attributes that are only present if they are user-specified, I think that is the correct behavior. (Feel free to joust with the "unspecified alias" windmill as much as you like though :-P)

On Thu, Sep 16, 2021 at 21:16:03 -0400, Laine Stump wrote:
On 9/13/21 11:38 AM, Peter Krempa wrote:
On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
Look, my problem is that we have an utterly inconsistent user story in regards to device update and we are repeating our own mistakes. If we are stating that the user should use the FULL xml for update, then they should use the full XML and we should not arbitrarily choose what we allow. Similarly if we want to allow partial XML then that's fine as long as we are consistent with ourselves, the user expectations and the docs. Now for the repeating of our mistake part:
1) originally the alias was completely ignored in device-update, since it was always set automatically by libvirt, and so it was a read-only attribute from the application's PoV.
2) Then user-specified aliases were added, with no associated change to the update-device code, and QE testing found that a requested change to alias during live device update was silently ignored:
Same for the ACPI index. It was added, and device-update was neglected.
3) This led to a patch that checked for any change in alias in the update-device XML, with no special case for "not specified" (i.e. basically what you're suggesting the code should be):
https://gitlab.com/libvirt/libvirt/-/commit/4ad54a417a1bbe10aeffcce783f4381d...
Yup, that's also exactly what this patch is doing.
4) That patch resulted in a bug filed by RHV saying that the behavior of the update-device API had changed, which broke their existing code for changing the media in a CD drive or something like that (they don't read back the normalized XML after they've started the guest, so they don't have any info about auto-generated values, and thus weren't including alias in the XML they sent to update-device (this was before RHV started specifying their own aliases):
So this is still going to be a hypothetical. Imagine a management application that uses partial XML when updating a device because they generate it rather than taking a snippet from dumpxml and modifying it. (We do have an example at least in terms of disk update here.) The same way as in libvirt, they figure out that they need to start using the ACPI index for some reason. They implement it in their full domain XML generator, but similarly to libvirt where we neglect the update-device code they don't modify the generator that generates the update XML. So now they are using the ACPI index when defining the XML but not when updating the device. The update-device code in libvirt thinks now that they are attempting to change the ACPI index, which would seem to be a regression, because nobody actually touched the update device XML generator. This is also amplified by the fact libvirt actually fixed a similar bug before. Bam a new BZ gets filed for this. I grant you a mitigating circumstance that ACPI index is not really a commonly used feature, so there's a chance that the scenario above might not happen, but that doesn't mean that it's not a latent problem given what we've done before.
5) *That* bug resulted in the code being changed to what it is today, with this patch:
https://gitlab.com/libvirt/libvirt/-/commit/b48d9e939bcf32a8d6e571313637e2ee...
So essentially we are trapped into this "ignore unspecified alias" semantic because alias previously existed but wasn't user-specified, and suddenly requiring it broke existing applications trying to do things completely unrelated to aliases.
See, we've acknowledged actually that partial XMLs are fine and thus there was my reasoning that the same logic as with the alias should be applied, otherwise it's opening a gate for another history repeat.

On 9/17/21 3:22 AM, Peter Krempa wrote:
On Thu, Sep 16, 2021 at 21:16:03 -0400, Laine Stump wrote:
On 9/13/21 11:38 AM, Peter Krempa wrote:
On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
Look, my problem is that we have an utterly inconsistent user story in regards to device update and we are repeating our own mistakes.
While I also think that consistency is paramount in making an API understandable, predictable, and thus usable, I think that description is a bit hyperbolic...[*]
If we are stating that the user should use the FULL xml for update, then they should use the full XML and we should not arbitrarily choose what we allow.
Similarly if we want to allow partial XML then that's fine as long as we are consistent with ourselves, the user expectations and the docs.
While the man page for virsh update-device says nothing on this subject, the documentation for virDomainUpateDeviceFlags() does say this: The supplied XML description of the device should contain all the information that is found in the corresponding domain XML. Leaving out any piece of information may be treated as a request for its removal, which may be denied. For instance, when users want to change CDROM media only for live XML, they must provide live disk XML as found in the corresponding live domain XML with only the disk path changed. So the official documentation says (in a clause that was ironically added in response to the same Bug report (1621910) that added the "unspecified exception" for alias!!) that omitting an attribute may (*may*! - what a "weasel word" :-P) be treated as a request for its removal. As far as requiring the user to read the current live XML and use that as the starting point for a device-update - while I think that is a noble goal, many (well, at least 2!) management applications don't ever read the domain XML back from libvirt - RHV at least didn't in the past when that bug was filed (don't know if it's still the case), and Kubevirt/CNV simply *can't* (as far as I understand it); all domain config is a one way street for them (or at least that's what was explained to us a couple years ago). So requiring update-device to be given a modified "current live XML" of a device is just going to be impossible.
Now for the repeating of our mistake part:
1) originally the alias was completely ignored in device-update, since it was always set automatically by libvirt, and so it was a read-only attribute from the application's PoV.
2) Then user-specified aliases were added, with no associated change to the update-device code, and QE testing found that a requested change to alias during live device update was silently ignored:
Same for the ACPI index. It was added, and device-update was neglected.
Yes. This has happened several times. Just as it used to happen with, e.g. new attributes being added and not documented (or not properly validated during parsing), or new devices being added and hotplug support forgotten. Those others have been internalized by everyone now, so invariably they'll be caught in review. Hopefully "remember to add the check in update-device" will also be internalized and caught more often in review (some things actually *can* be changed at runtime, and so remembering this might add useful functionality, not just a hand-slap for trying :-))
3) This led to a patch that checked for any change in alias in the update-device XML, with no special case for "not specified" (i.e. basically what you're suggesting the code should be):
https://gitlab.com/libvirt/libvirt/-/commit/4ad54a417a1bbe10aeffcce783f4381d...
Yup, that's also exactly what this patch is doing.
Yes, as the documentation says we should.
4) That patch resulted in a bug filed by RHV saying that the behavior of the update-device API had changed, which broke their existing code for changing the media in a CD drive or something like that (they don't read back the normalized XML after they've started the guest, so they don't have any info about auto-generated values, and thus weren't including alias in the XML they sent to update-device (this was before RHV started specifying their own aliases):
So this is still going to be a hypothetical.
Imagine a management application that uses partial XML when updating a device because they generate it rather than taking a snippet from dumpxml and modifying it. (We do have an example at least in terms of disk update here.)
The same way as in libvirt, they figure out that they need to start using the ACPI index for some reason. They implement it in their full domain XML generator, but similarly to libvirt where we neglect the update-device code they don't modify the generator that generates the update XML.
So now they are using the ACPI index when defining the XML but not when updating the device. The update-device code in libvirt thinks now that they are attempting to change the ACPI index, which would seem to be a regression, because nobody actually touched the update device XML generator. This is also amplified by the fact libvirt actually fixed a similar bug before. Bam a new BZ gets filed for this.
... and we reject it, saying "*You* updated part of your application to support setting ACPI index, and *you* didn't update all of it, so *you* have a bug that *you* need to fix in *your* application."
I grant you a mitigating circumstance that ACPI index is not really a commonly used feature, so there's a chance that the scenario above might not happen, but that doesn't mean that it's not a latent problem given what we've done before.
acpi index is a totally different (to be hyperbolic!) situation from alias (which, again, was 1) originally not settable by the user, and 2) automatically generated by libvirt, while acpi index has, for as long as it has existed, 1) been settable (only) by the user, and 2) is never automatically generated by libvirt.
5) *That* bug resulted in the code being changed to what it is today, with this patch:
https://gitlab.com/libvirt/libvirt/-/commit/b48d9e939bcf32a8d6e571313637e2ee...
So essentially we are trapped into this "ignore unspecified alias" semantic because alias previously existed but wasn't user-specified, and suddenly requiring it broke existing applications trying to do things completely unrelated to aliases.
See, we've acknowledged actually that partial XMLs are fine and thus there was my reasoning that the same logic as with the alias should be applied, otherwise it's opening a gate for another history repeat.
I disagree. In the case of alias, the bug was reported when an existing piece of functionality in the management application (RHV) was broken *with no change to RHV*, simply by updating libvirt. ("I didn't change *anything* in my application, just updated libvirt and now my application (which doesn't know or care about the new libvirt feature) doesn't work!") In your hypothetical situation, the management application consciously added support for acpi index, but only did it halfway. As long as they ignored the existence of acpi index in the rest of their application, they would never see a problem when using update-device; this is because libvirt will never set acpi index, the application itself must set it. (the same can be said for virtio options, which have no exception for "not specified"); if they're going to go to the trouble of setting it in the initial config, then they know what it's set to, and can/should set it in the XML sent to device-update. ("I updated my application to support this new libvirt feature, and now my application doesn't work!") ([*]P.S. On the other hand, there are a few rom-related attributes that are 1) only validated when updating an <interface>, but not when changing any other type of PCI device (of course we only support changing of interface, graphics, and disk), and 2) *do* have the "not specified" exception as alias does (added in response to https://bugzilla.redhat.com/1599513 - found as a result of my "morning spelunking") in spite of the fact that those options are *never* auto-generated by libvirt, so maybe your "bole" isn't so "hyper" after all :-P.) (My opinion is that attributes that are auto-generated/auto-filled by libvirt when they aren't specified should always get the exception, while items that remain blank when unspecified should not. This will assure that a) existing applications won't be broken by a libvirt update while b) applications *halfway* updated to support a new feature will break. This can be easily described and is consistent.)
participants (3)
-
Laine Stump
-
Michal Prívozník
-
Peter Krempa