Updating domains definitions via API

Hi, Looking into a bug in vagrant-libvirt where an error during the update will cause the domain to be completely discarded. https://github.com/vagrant-libvirt/vagrant-libvirt/issues/949 Basically I think it stems from doing an undefine -> create with XML new process, which if there is an issue with the new XML due to KVM module not loaded or something similar it will be rejected, but unfortunately it is also unlikely to allow the old definition to be restored either. I'm looking around to try and see if there is an API (specfically in ruby-libvirt) for updating the domain definition, so that if the new XML is rejected at least the old definition remains, and so far I'm drawing a blank. Is the only option here to write using a temporary domain name, then remove the old domain and rename the new definition to the old domain? Or have I missed the obvious API analogous to the edit functionality? -- Darragh Bailey "Nothing is foolproof to a sufficiently talented fool" - unknown

On 5/12/22 4:03 PM, Darragh Bailey wrote:
Hi,
Looking into a bug in vagrant-libvirt where an error during the update will cause the domain to be completely discarded.
https://github.com/vagrant-libvirt/vagrant-libvirt/issues/949 <https://github.com/vagrant-libvirt/vagrant-libvirt/issues/949>
Basically I think it stems from doing an undefine -> create with XML new process, which if there is an issue with the new XML due to KVM module not loaded or something similar it will be rejected, but unfortunately it is also unlikely to allow the old definition to be restored either.
I'm looking around to try and see if there is an API (specfically in ruby-libvirt) for updating the domain definition, so that if the new XML is rejected at least the old definition remains, and so far I'm drawing a blank.
Is the only option here to write using a temporary domain name, then remove the old domain and rename the new definition to the old domain?
Or have I missed the obvious API analogous to the edit functionality?
The virDomainDefineXMLFlags API (and also the older/deprecated virDomainDefineXML API) doesn't require that the domain first be undefined (with one notable exception - see below[*]). If you define a domain that already exists with the same name and uuid, then the effect is to "redefine" (or "update" if you prefer) the existing domain of that name. If the domain is currently active, then the changes will take effect the next time the domain is shut down ("Destroy"ed in libvirt API parlance) and re-started. If any error is encountered during this redefinition, then no changes are made to the existing domain definition. [*]The exception to this - if you attempt to Define a domain that has the same name or uuid as an existing domain, but the uuid/name is different, that is an error.

Hi, On Thu 12 May 2022, 21:34 Laine Stump, <laine@redhat.com> wrote:
The virDomainDefineXMLFlags API (and also the older/deprecated virDomainDefineXML API) doesn't require that the domain first be undefined (with one notable exception - see below[*]). If you define a domain that already exists with the same name and uuid, then the effect is to "redefine" (or "update" if you prefer) the existing domain of that name. If the domain is currently active, then the changes will take effect the next time the domain is shut down ("Destroy"ed in libvirt API parlance) and re-started.
That makes it much easier, and makes what the code was previously doing rather stupid. Probably a case that I never thought to read the API doc for virDomainDefineXML If any error is encountered during this redefinition, then no changes
are made to the existing domain definition.
That is the ideal behaviour
[*]The exception to this - if you attempt to Define a domain that has the same name or uuid as an existing domain, but the uuid/name is different, that is an error.
Are there any gotchas/limitations dealing with NVRAM when redefining domains that I should be aware of? Typically setting some flags to allow the undefine to work as expected, just not clear whether a NVRAM file could be orphaned by updating the domain XML, or if a reference will be maintained so if a domain once had NVRAM it's necessary to pass the flag to remove on undefine. Hopefully nget to test around some of this later, just not sure I know enough to ensure I check all behaviours. -- Darragh Bailey "Nothing is foolproof to a sufficiently talented fool" - unknown

On Fri, 13 May 2022 at 00:17, Darragh Bailey <daragh.bailey@gmail.com> wrote:
Hi,
On Thu 12 May 2022, 21:34 Laine Stump, <laine@redhat.com> wrote:
The virDomainDefineXMLFlags API (and also the older/deprecated virDomainDefineXML API) doesn't require that the domain first be undefined (with one notable exception - see below[*]). If you define a domain that already exists with the same name and uuid, then the effect is to "redefine" (or "update" if you prefer) the existing domain of that name. If the domain is currently active, then the changes will take effect the next time the domain is shut down ("Destroy"ed in libvirt API parlance) and re-started.
Unfortunately trying to call this via ruby-libvirt doesn't appear to behave as expected. It appears that if I add an nvram element without a loader element to the os block, the following code block will execute without issue but also without changing the domain XML: # Apply XML changes directly if descr_changed begin env[:ui].info("Updating domain definition due to configuration change") new_descr = String.new xml_descr.write new_descr # env[:machine].provider.driver.connection.client provides access to the ruby-libvirt connection object libvirt_domain = env[:machine].provider.driver.connection.client.define_domain_xml(new_descr, 1) domain = env[:machine].provider.driver.connection.servers.get(env[:machine].id.to_s) rescue Fog::Errors::Error => e raise Errors::UpdateServerError, error_message: e.message end end I've by-passing the fog-libvirt to call the ruby-libvirt connection object directly to try and avoid any unexpected interference when testing the API call out e.g. https://libvirt.org/ruby/api/Libvirt/Connect.html#method-i-define_domain_xml So not only is there no exception thrown if the XML change is ignored, I've noticed there doesn't appear to be an easy way to check using the API either, in that https://libvirt.org/ruby/api/Libvirt/Domain.html#method-i-updated-3F doesn't indicate that the domain has been updated whether I call it on the libvirt_domain object returned, or an instance from before the define_domain_xml call is made. It appears that the only way is to perform a call to get the xml definition of the libvirt_domain object returned in the above block and see if that matches the xml that was sent, if not, error. Is this the expected usage of the API? Or should the call to define_domain_xml raise an exception if it cannot update the domain XML? as opposed to a schema validation error which does appear to be detected when I did something stupid. -- Darragh

On 5/14/22 3:23 PM, Darragh Bailey wrote:
On Fri, 13 May 2022 at 00:17, Darragh Bailey <daragh.bailey@gmail.com <mailto:daragh.bailey@gmail.com>> wrote:
Hi,
On Thu 12 May 2022, 21:34 Laine Stump, <laine@redhat.com <mailto:laine@redhat.com>> wrote:
The virDomainDefineXMLFlags API (and also the older/deprecated virDomainDefineXML API) doesn't require that the domain first be undefined (with one notable exception - see below[*]). If you define a domain that already exists with the same name and uuid, then the effect is to "redefine" (or "update" if you prefer) the existing domain of that name. If the domain is currently active, then the changes will take effect the next time the domain is shut down ("Destroy"ed in libvirt API parlance) and re-started.
Unfortunately trying to call this via ruby-libvirt doesn't appear to behave as expected. It appears that if I add an nvram element without a loader element to the os block, the following code block will execute without issue but also without changing the domain XML:
# Apply XML changes directly if descr_changed begin env[:ui].info("Updating domain definition due to configuration change") new_descr = String.new xml_descr.write new_descr # env[:machine].provider.driver.connection.client provides access to the ruby-libvirt connection object libvirt_domain = env[:machine].provider.driver.connection.client.define_domain_xml(new_descr, 1) domain = env[:machine].provider.driver.connection.servers.get(env[:machine].id.to_s) rescue Fog::Errors::Error => e raise Errors::UpdateServerError, error_message: e.message end end
I've by-passing the fog-libvirt to call the ruby-libvirt connection object directly to try and avoid any unexpected interference when testing the API call out e.g. https://libvirt.org/ruby/api/Libvirt/Connect.html#method-i-define_domain_xml <https://libvirt.org/ruby/api/Libvirt/Connect.html#method-i-define_domain_xml>
So not only is there no exception thrown if the XML change is ignored, I've noticed there doesn't appear to be an easy way to check using the API either, in that https://libvirt.org/ruby/api/Libvirt/Domain.html#method-i-updated-3F <https://libvirt.org/ruby/api/Libvirt/Domain.html#method-i-updated-3F> doesn't indicate that the domain has been updated whether I call it on the libvirt_domain object returned, or an instance from before the define_domain_xml call is made.
It appears that the only way is to perform a call to get the xml definition of the libvirt_domain object returned in the above block and see if that matches the xml that was sent, if not, error.
Is this the expected usage of the API? Or should the call to define_domain_xml raise an exception if it cannot update the domain XML? as opposed to a schema validation error which does appear to be detected when I did something stupid.
Caveat - I'm completely unfamiliar with ruby and the libvirt-ruby API bindings. If there is a problem that causes the domain config to not be updated, libvirt will return an error. So I would suspect one of the two things is happening: 1) there may be a problem in the libvirt-ruby bindings that causes the error reported by the call (in whatever C code is behind the ruby bindings) to libvirt to be properly propagated to ruby. I would hope this isn't the case, but "bugs happen", so it should be considered as a possibility. 2) As I said in my earlier mail, any changes that are made will take effect the next time the domain is destroyed and restarted. This also means that the changes won't be reflected in the "live/status" XML of the domain until that time. If you want to see the new configuration after you've made changes, you should add the VIR_DOMAIN_XML_INACTIVE flag when requesting the domain XML. Possibly you haven't included this flag, and that's why you think that your change hasn't taken effect?

Hi, On Sat 14 May 2022, 21:11 Laine Stump, <laine@redhat.com> wrote:
Caveat - I'm completely unfamiliar with ruby and the libvirt-ruby API bindings.
If there is a problem that causes the domain config to not be updated, libvirt will return an error. So I would suspect one of the two things is happening:
Thanks, that's what I was expecting should happen, just wanted to be sure that there wasn't some other behaviour in place for compatibility reasons. 1) there may be a problem in the libvirt-ruby bindings that causes the
error reported by the call (in whatever C code is behind the ruby bindings) to libvirt to be properly propagated to ruby. I would hope this isn't the case, but "bugs happen", so it should be considered as a possibility.
A quick look suggests that the code looks to raise an exception if the dom pointer returned is NULL, so I think the bindings are correct. But I will check that what version of ruby-libvirt I have installed matches the source code I'm looked at. 2) As I said in my earlier mail, any changes that are made will take
effect the next time the domain is destroyed and restarted. This also means that the changes won't be reflected in the "live/status" XML of the domain until that time. If you want to see the new configuration after you've made changes, you should add the VIR_DOMAIN_XML_INACTIVE flag when requesting the domain XML. Possibly you haven't included this flag, and that's why you think that your change hasn't taken effect?
Ah, I forgot to outline where in the lifecycle the update is taking place. The domain isn't running when the code attempts to update the definition. Does that still mean that the VIR_DOMAIN_XML_INACTIVE flag is needed? I was assuming when the domain is inactive the XML changes would be reflected immediately. Oddly I thought during some experiments when the added NVRAM XML element was ignored, the updated number of CPUs which was in the same XML definition passed in was applied. Will dig further tomorrow or Monday on the version of ruby-libvirt installed into my rvm dev env as well as checking passing in the flag. I'm sure it'll turn out to be something obvious that I'm overlooking. Thanks, -- Darragh

Hi, On Sat, 14 May 2022 at 23:42, Darragh Bailey <daragh.bailey@gmail.com> wrote:
Hi,
On Sat 14 May 2022, 21:11 Laine Stump, <laine@redhat.com> wrote:
Caveat - I'm completely unfamiliar with ruby and the libvirt-ruby API bindings.
If there is a problem that causes the domain config to not be updated, libvirt will return an error. So I would suspect one of the two things is happening:
Looks like I've stumbled across an edge case here regarding domain config not being fully updated but also not returning an error.
1) there may be a problem in the libvirt-ruby bindings that causes the
error reported by the call (in whatever C code is behind the ruby bindings) to libvirt to be properly propagated to ruby. I would hope this isn't the case, but "bugs happen", so it should be considered as a possibility.
I decided to test the behaviour slightly more directly via virsh rather than through the ruby bindings and based on replicating the same there, and a review of the ruby binding API code, I believe the binding code is working fine, the problem is unexpected behaviour in libvirt. It appears that if the XML passed in contains an nvram XML tag without a corresponding loader tag, then the nvram tag will be dropped without an error. In fact if you change any other information such as the vcpu definition in the same update, that will still be applied while the nvram tag is ignored. Simply the reason the ruby code isn't raising an exception is that libvirt thinks nothing went wrong and is returning a non null pointer. Put together a gist to make it easier to show what I'm seeing https://gist.github.com/electrofelix/6f66714c14a0d6e3b1078037aadae398 I'm assuming at this point that this is a bug, the domain XML in test_with_nvram.xml should be rejected because it's not all applied. -- Darragh

On 5/14/22 6:42 PM, Darragh Bailey wrote:
Hi,
On Sat 14 May 2022, 21:11 Laine Stump, <laine@redhat.com <mailto:laine@redhat.com>> wrote:
Caveat - I'm completely unfamiliar with ruby and the libvirt-ruby API bindings.
If there is a problem that causes the domain config to not be updated, libvirt will return an error. So I would suspect one of the two things is happening:
Thanks, that's what I was expecting should happen, just wanted to be sure that there wasn't some other behaviour in place for compatibility reasons.
1) there may be a problem in the libvirt-ruby bindings that causes the error reported by the call (in whatever C code is behind the ruby bindings) to libvirt to be properly propagated to ruby. I would hope this isn't the case, but "bugs happen", so it should be considered as a possibility.
A quick look suggests that the code looks to raise an exception if the dom pointer returned is NULL, so I think the bindings are correct. But I will check that what version of ruby-libvirt I have installed matches the source code I'm looked at.
2) As I said in my earlier mail, any changes that are made will take effect the next time the domain is destroyed and restarted. This also means that the changes won't be reflected in the "live/status" XML of the domain until that time. If you want to see the new configuration after you've made changes, you should add the VIR_DOMAIN_XML_INACTIVE flag when requesting the domain XML. Possibly you haven't included this flag, and that's why you think that your change hasn't taken effect?
Ah, I forgot to outline where in the lifecycle the update is taking place. The domain isn't running when the code attempts to update the definition.
Does that still mean that the VIR_DOMAIN_XML_INACTIVE flag is needed? I was assuming when the domain is inactive the XML changes would be reflected immediately.
No, your thinking was correct - if the domain isn't active, then the change should take effect immediately, and there is no difference whether or not you have VIR_DOMAIN_XML_INACTIVE. I've never done anything directly with the nvram setting (just accepted whatever virt-manager put in there), but from your other message, it sounds like you've found a bonafide libvirt bug (either that, or I just don't know enough about how the nvram settings work :-)). Can you file an issue at https://gitlab.com/groups/libvirt/-/issues ?
Oddly I thought during some experiments when the added NVRAM XML element was ignored, the updated number of CPUs which was in the same XML definition passed in was applied.
Another indication that it's a bug - updates to the domain config are always an all-or-nothing thing.
Will dig further tomorrow or Monday on the version of ruby-libvirt installed into my rvm dev env as well as checking passing in the flag.
I'm sure it'll turn out to be something obvious that I'm overlooking.
Thanks, -- Darragh

On 5/14/22 21:23, Darragh Bailey wrote:
On Fri, 13 May 2022 at 00:17, Darragh Bailey <daragh.bailey@gmail.com <mailto:daragh.bailey@gmail.com>> wrote:
Hi,
On Thu 12 May 2022, 21:34 Laine Stump, <laine@redhat.com <mailto:laine@redhat.com>> wrote:
The virDomainDefineXMLFlags API (and also the older/deprecated virDomainDefineXML API) doesn't require that the domain first be undefined (with one notable exception - see below[*]). If you define a domain that already exists with the same name and uuid, then the effect is to "redefine" (or "update" if you prefer) the existing domain of that name. If the domain is currently active, then the changes will take effect the next time the domain is shut down ("Destroy"ed in libvirt API parlance) and re-started.
Unfortunately trying to call this via ruby-libvirt doesn't appear to behave as expected. It appears that if I add an nvram element without a loader element to the os block, the following code block will execute without issue but also without changing the domain XML:
I think that's kind of expected. If you take a look how libvirt parses that part of XML: https://gitlab.com/libvirt/libvirt/-/blob/master/src/conf/domain_conf.c#L182... if no <loader/> is found then the function exits early without looking at <nvram/> at all. It kind of makes sense - what good does nvram do without loader? Michal

On Mon 16 May 2022, 10:10 Michal Prívozník, <mprivozn@redhat.com> wrote:
On 5/14/22 21:23, Darragh Bailey wrote:
On Fri, 13 May 2022 at 00:17, Darragh Bailey <daragh.bailey@gmail.com <mailto:daragh.bailey@gmail.com>> wrote:
Unfortunately trying to call this via ruby-libvirt doesn't appear to behave as expected. It appears that if I add an nvram element without a loader element to the os block, the following code block will execute without issue but also without changing the domain XML:
I think that's kind of expected. If you take a look how libvirt parses that part of XML:
https://gitlab.com/libvirt/libvirt/-/blob/master/src/conf/domain_conf.c#L182...
if no <loader/> is found then the function exits early without looking at <nvram/> at all. It kind of makes sense - what good does nvram do without loader?
I don't know, I realised that there was a bug in our vagrant-libvirt in that it should be checking for both loader and NVRAM config settings to be passed in if the end user was updating the machine config. And subsequently set both elements or neither. However I was not expecting the silent discarding of the XML element and instead to get an error saying that an nvram XML element without a loader XML element is invalid and for the entire request to be rejected. Basically that the provided XML domain definition was invalid. Based on the previous explanation of how the define_domain_xml should work along with the response on the example reproducer this seems like a bug. Does your reply mean this is expected behaviour? Or were you looking for clarification on what I expected to see? -- Darragh Bailey

On 5/16/22 12:46, Darragh Bailey wrote:
On Mon 16 May 2022, 10:10 Michal Prívozník, <mprivozn@redhat.com <mailto:mprivozn@redhat.com>> wrote:
On 5/14/22 21:23, Darragh Bailey wrote: > > On Fri, 13 May 2022 at 00:17, Darragh Bailey <daragh.bailey@gmail.com <mailto:daragh.bailey@gmail.com> > <mailto:daragh.bailey@gmail.com <mailto:daragh.bailey@gmail.com>>> wrote: > > Unfortunately trying to call this via ruby-libvirt doesn't appear to > behave as expected. It appears that if I add an nvram element without a > loader element to the os block, the following code block will execute > without issue but also without changing the domain XML:
I think that's kind of expected. If you take a look how libvirt parses that part of XML:
https://gitlab.com/libvirt/libvirt/-/blob/master/src/conf/domain_conf.c#L182... <https://gitlab.com/libvirt/libvirt/-/blob/master/src/conf/domain_conf.c#L18257>
if no <loader/> is found then the function exits early without looking at <nvram/> at all. It kind of makes sense - what good does nvram do without loader?
I don't know, I realised that there was a bug in our vagrant-libvirt in that it should be checking for both loader and NVRAM config settings to be passed in if the end user was updating the machine config. And subsequently set both elements or neither.
However I was not expecting the silent discarding of the XML element and instead to get an error saying that an nvram XML element without a loader XML element is invalid and for the entire request to be rejected. Basically that the provided XML domain definition was invalid.
Based on the previous explanation of how the define_domain_xml should work along with the response on the example reproducer this seems like a bug.
Does your reply mean this is expected behaviour? Or were you looking for clarification on what I expected to see?
I mean, fixing this particular problem is trivial. But what's not as trivial is fixing ALL occurrences of this problem (where an attribute/element is ignored because of something we've parsed earlier). Mostly because we don't know what those are. BTW: Have you spotted the other demonstration of this problem (in the same function)? Yes, if fw autoselection is enabled (<os firmware='something'>) then <nvram template=''/> is ignored. Anyway, there used to be an unwritten agreement with users that they get XML of a domain they just defined to see if it contains everything they want. This is used to be because different drivers support different features. But I guess it's hard to argue with an unwritten rule esp. if it's not known to everybody. Michal

On Mon, 16 May 2022 at 12:30, Michal Prívozník <mprivozn@redhat.com> wrote:
I mean, fixing this particular problem is trivial. But what's not as trivial is fixing ALL occurrences of this problem (where an attribute/element is ignored because of something we've parsed earlier). Mostly because we don't know what those are.
tbf, this is the clarification I needed. I was previously operating under the assumption that given the description define_domain_xml was checking if the resulting XML for the domain matched what was passed in.
BTW: Have you spotted the other demonstration of this problem (in the same function)? Yes, if fw autoselection is enabled (<os firmware='something'>) then <nvram template=''/> is ignored.
No, I think it was probably a bit of a fluck that I hit the issue as I was looking at trying to avoid the special handling around nvram with the current code undefining a domain before creating the new definition. Could easily have missed it altogether.
Anyway, there used to be an unwritten agreement with users that they get XML of a domain they just defined to see if it contains everything they want. This is used to be because different drivers support different features. But I guess it's hard to argue with an unwritten rule esp. if it's not known to everybody.
Michal
I had started to lean towards checking the XML for the domain returned as a final validation step.Perhaps the API docs for https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDefineXML and https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDefineXMLFlags could mention that it doesnot guarantee that the entire XML document has been applied to the domain? So perhaps this is less a bug with the loader/nvram XML element handling and more a documentation bug and a possible enhancement that possibly the virDomainDefineXMLFlags could consider accepting a flag to verify the returned domain XML is equivalent as a general fix for those applications that would find this useful? -- Darragh Bailey "Nothing is foolproof to a sufficiently talented fool"

On 5/16/22 14:52, Darragh Bailey wrote:
So perhaps this is less a bug with the loader/nvram XML element handling and more a documentation bug and a possible enhancement that possibly the virDomainDefineXMLFlags could consider accepting a flag to verify the returned domain XML is equivalent as a general fix for those applications that would find this useful?
Yes to the first part, but no the second. Comparing XMLs is not as easy as you would think. For instance: <domain id='1' type='kvm'> <uuid>123456</uuid> <name>myGuest</name> </domain> <domain type='kvm' id='1'> <name>myGuest</uuid> <uuid>123456</uuid> </domain> The former is just an example of possible user input, the latter is how libvirt would format it. Obviously, these XMLs are equivalent, but not stcmp() equal. Michal

On Mon 16 May 2022, 14:32 Michal Prívozník, <mprivozn@redhat.com> wrote:
On 5/16/22 14:52, Darragh Bailey wrote:
So perhaps this is less a bug with the loader/nvram XML element handling and more a documentation bug and a possible enhancement that possibly the virDomainDefineXMLFlags could consider accepting a flag to verify the returned domain XML is equivalent as a general fix for those applications that would find this useful?
Yes to the first part, but no the second. Comparing XMLs is not as easy as you would think. For instance:
<domain id='1' type='kvm'> <uuid>123456</uuid> <name>myGuest</name> </domain>
<domain type='kvm' id='1'> <name>myGuest</uuid> <uuid>123456</uuid> </domain>
The former is just an example of possible user input, the latter is how libvirt would format it. Obviously, these XMLs are equivalent, but not stcmp() equal.
I'm familiar with the fun of checking for equivalence, I was hoping that as libvirt was working with XML it might already be using a library that could handle this. A quick look at libxml2 suggests it's been left up to the consumer to implement. So each project that becomes aware of this behaviour will have to find a library to do the comparison or implement it directly themselves. -- Darragh Bailey
participants (3)
-
Darragh Bailey
-
Laine Stump
-
Michal Prívozník