On 9/17/21 3:22 AM, Peter Krempa wrote:
On Thu, Sep 16, 2021 at 21:16:03 -0400, Laine Stump wrote:
> On 9/13/21 11:38 AM, Peter Krempa wrote:
>> On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
>>> On 9/10/21 3:30 AM, Peter Krempa wrote:
>>>> On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
Look, my problem is that we have an utterly inconsistent user story in
regards to device update and we are repeating our own mistakes.
While I also think that consistency is paramount in making an API
understandable, predictable, and thus usable, I think that description
is a bit hyperbolic...[*]
If we are stating that the user should use the FULL xml for update,
then
they should use the full XML and we should not arbitrarily choose what
we allow.
Similarly if we want to allow partial XML then that's fine as long as we
are consistent with ourselves, the user expectations and the docs.
While the man page for virsh update-device says nothing on this subject,
the documentation for virDomainUpateDeviceFlags() does say this:
The supplied XML description of the device should
contain all the information that is found in the
corresponding domain XML. Leaving out any piece of
information may be treated as a request for its
removal, which may be denied. For instance, when
users want to change CDROM media only for live
XML, they must provide live disk XML as found in
the corresponding live domain XML with only the
disk path changed.
So the official documentation says (in a clause that was ironically
added in response to the same Bug report (1621910) that added the
"unspecified exception" for alias!!) that omitting an attribute may
(*may*! - what a "weasel word" :-P) be treated as a request for its removal.
As far as requiring the user to read the current live XML and use that
as the starting point for a device-update - while I think that is a
noble goal, many (well, at least 2!) management applications don't ever
read the domain XML back from libvirt - RHV at least didn't in the past
when that bug was filed (don't know if it's still the case), and
Kubevirt/CNV simply *can't* (as far as I understand it); all domain
config is a one way street for them (or at least that's what was
explained to us a couple years ago). So requiring update-device to be
given a modified "current live XML" of a device is just going to be
impossible.
Now for the repeating of our mistake part:
> 1) originally the alias was completely ignored in device-update, since it
> was always set automatically by libvirt, and so it was a read-only attribute
> from the application's PoV.
>
> 2) Then user-specified aliases were added, with no associated change to the
> update-device code, and QE testing found that a requested change to alias
> during live device update was silently ignored:
>
>
https://bugzilla.redhat.com/1585108
Same for the ACPI index. It was added, and device-update was neglected.
Yes. This has happened several times. Just as it used to happen with,
e.g. new attributes being added and not documented (or not properly
validated during parsing), or new devices being added and hotplug
support forgotten. Those others have been internalized by everyone now,
so invariably they'll be caught in review. Hopefully "remember to add
the check in update-device" will also be internalized and caught more
often in review (some things actually *can* be changed at runtime, and
so remembering this might add useful functionality, not just a hand-slap
for trying :-))
>
> 3) This led to a patch that checked for any change in alias in the
> update-device XML, with no special case for "not specified" (i.e.
basically
> what you're suggesting the code should be):
>
>
>
https://gitlab.com/libvirt/libvirt/-/commit/4ad54a417a1bbe10aeffcce783f43...
Yup, that's also exactly what this patch is doing.
Yes, as the documentation says we should.
> 4) That patch resulted in a bug filed by RHV saying that the behavior of the
> update-device API had changed, which broke their existing code for changing
> the media in a CD drive or something like that (they don't read back the
> normalized XML after they've started the guest, so they don't have any info
> about auto-generated values, and thus weren't including alias in the XML
> they sent to update-device (this was before RHV started specifying their own
> aliases):
>
>
https://bugzilla.redhat.com/1621910
So this is still going to be a hypothetical.
Imagine a management application that uses partial XML when updating a
device because they generate it rather than taking a snippet from
dumpxml and modifying it. (We do have an example at least in terms of
disk update here.)
The same way as in libvirt, they figure out that they need to start
using the ACPI index for some reason. They implement it in their full
domain XML generator, but similarly to libvirt where we neglect the
update-device code they don't modify the generator that generates the
update XML.
So now they are using the ACPI index when defining the XML but not when
updating the device. The update-device code in libvirt thinks now that
they are attempting to change the ACPI index, which would seem to be a
regression, because nobody actually touched the update device XML
generator. This is also amplified by the fact libvirt actually fixed a
similar bug before. Bam a new BZ gets filed for this.
... and we reject it, saying "*You* updated part of your application to
support setting ACPI index, and *you* didn't update all of it, so *you*
have a bug that *you* need to fix in *your* application."
I grant you a mitigating circumstance that ACPI index is not really a
commonly used feature, so there's a chance that the scenario above might
not happen, but that doesn't mean that it's not a latent problem given
what we've done before.
acpi index is a totally different (to be hyperbolic!) situation from
alias (which, again, was 1) originally not settable by the user, and 2)
automatically generated by libvirt, while acpi index has, for as long as
it has existed, 1) been settable (only) by the user, and 2) is never
automatically generated by libvirt.
> 5) *That* bug resulted in the code being changed to what it is today, with
> this patch:
>
>
>
https://gitlab.com/libvirt/libvirt/-/commit/b48d9e939bcf32a8d6e571313637e...
>
> So essentially we are trapped into this "ignore unspecified alias"
semantic
> because alias previously existed but wasn't user-specified, and suddenly
> requiring it broke existing applications trying to do things completely
> unrelated to aliases.
See, we've acknowledged actually that partial XMLs are fine and thus
there was my reasoning that the same logic as with the alias should be
applied, otherwise it's opening a gate for another history repeat.
I disagree. In the case of alias, the bug was reported when an existing
piece of functionality in the management application (RHV) was broken
*with no change to RHV*, simply by updating libvirt. ("I didn't change
*anything* in my application, just updated libvirt and now my
application (which doesn't know or care about the new libvirt feature)
doesn't work!")
In your hypothetical situation, the management application consciously
added support for acpi index, but only did it halfway. As long as they
ignored the existence of acpi index in the rest of their application,
they would never see a problem when using update-device; this is because
libvirt will never set acpi index, the application itself must set it.
(the same can be said for virtio options, which have no exception for
"not specified"); if they're going to go to the trouble of setting it in
the initial config, then they know what it's set to, and can/should set
it in the XML sent to device-update. ("I updated my application to
support this new libvirt feature, and now my application doesn't work!")
([*]P.S. On the other hand, there are a few rom-related attributes that
are 1) only validated when updating an <interface>, but not when
changing any other type of PCI device (of course we only support
changing of interface, graphics, and disk), and 2) *do* have the "not
specified" exception as alias does (added in response to
https://bugzilla.redhat.com/1599513 - found as a result of my "morning
spelunking") in spite of the fact that those options are *never*
auto-generated by libvirt, so maybe your "bole" isn't so "hyper"
after
all :-P.)
(My opinion is that attributes that are auto-generated/auto-filled by
libvirt when they aren't specified should always get the exception,
while items that remain blank when unspecified should not. This will
assure that a) existing applications won't be broken by a libvirt update
while b) applications *halfway* updated to support a new feature will
break. This can be easily described and is consistent.)