[libvirt] ignore vs. error when inappropriate or unrecognized attributes/elements are present in XML

This is related to: https://bugzilla.redhat.com/show_bug.cgi?id=638633#c14 I had started to reply to it in the comments of the bug, but my reply became too long, and expanded into an issue wider than that single bug, so I figured I'd better discuss it here instead. A synopsis: It's possible to specify a script to be executed when bringing up a guest interface device. This is done by adding a <script path='xxx'/> element to the <interface> definition. The bug report was that <script path='blah'/> was added to an <interface type='bridge'> definition in the config for a qemu domain, and nobody complained that 'blah' didn't exist. This eventually led to the realization that when interface type='bridge' and driver=qemu, <script> isn't used, so it's ignored. More exactly, the script element is only recognized by the parser when interface type == 'bridge' or 'ethernet', and is only used by the hypervisor driver when: 1) the interface type='bridge' and the xen driver is being used. or 2) the interface type='ethernet' and the qemu driver is being used. After this all came to light, in comment 14, Dave says:
That's being the case, we need to explicitly reject the attempt to specify a script.
(implied: when interface type='bridge' and the hypervisor is qemu). On the surface that sounds like a good idea, but it's opening a pretty big can of worms. Specifically for this issue, this rejection must take place in the hypervisor because different hypervisors support scripts for different types of interfaces - a script for type='bridge' is perfectly acceptable for xen, but not for qemu. That can be handled with an error log in the qemu driver in the case of type='bridge'. But if someone defines <interface type='network'> and specifies a script, the parser ignores the <script> element (because normally it's stored either in a union that's only valid when type='bridge' or a different union only valid when type='ethernet'), so qemu has no way of knowing a script was specified and therefore can't log an error. That's not a terribly difficult problem, though. Just a bit more code than a simple "else { log an error; }" somewhere in the code - the parser needs to be changed to move the script attribute out of the unions and into the main struct, and always populate it, then the drivers need to be taught to log an error when appropriate). But if we're doing that for the <script> subelement of <interface>, we're creating a prerequisite for what's expected in the case of all the *other* attributes/elements that are currently ignored under similar circumstances by libvirt's parsers. Essentially anything that's in a union contained in any of the virXxxDef structs is probably treated the same way. As a simple first example, look at "keymap", which is an attribute of a domain's devices/graphics element, but only valid if the type is spice or vnc. The parser ignores it in all other cases, and since it's stored in data.spice or data.vnc, there's not even a possibility of qemu (or other hypervisor) indicating any kind of error when someone specifies e.g. <graphics type='sdl' 'keymap='blah' .../>. Similarly, if someone were to add "tlsPort='5910' to a <graphics> of any type other than spice. Of course, from the point of view of all the code associated with <graphics type='vnc' ... />, this is no different than if someone had added "frozzle='fib'" - it's totally unexpected (in this case by any type), so it's ignored. Actually, I can see now there are several different classes of this problem. Here are the first few that come to mind: 1) an attribute/element is completely unknown/unexpected in all cases (e.g. "frozzle='fib'" anywhere, or more insidious, something that *looks* correct, but isn't, e.g. "<script name='/path/to/script'/>"*) 2) an attribute/element is useful/expected only when some other attribute is set to a particular value (usually one called "type", but could be something else), for example keymap='blah' is only expected in a <graphics> element when type='spice' or type='vnc'. 3) an attribute/element is useful/expected only for certain combinations of the value of some other attribute and which driver is using the element, e.g. the subject of this bug - script='blah' is only expected when type='bridge' and it's used by the Xen driver, or type='ethernet' and it's used by the qemu driver. So what are the rules of engagement for these various cases? When do we ignore, when do we log an error during parsing, and when do we log an error in the code that's using the parsed data? (*Another example of (1) from recent real-life - during testing of https://bugzilla.redhat.com/show_bug.cgi?id=727982, a QA person claimed that the bugt hadn't been fixed because the hosts file wasn't being updated; it turned out that they were adding the new <host> element at the toplevel under <network> rather than inside the <dhcp> subelement, so the parser was ignoring it (since it's unrecognized at that level) rather than logging an error).

On Tue, Aug 16, 2011 at 04:44:42AM -0400, Laine Stump wrote:
This is related to: https://bugzilla.redhat.com/show_bug.cgi?id=638633#c14
I had started to reply to it in the comments of the bug, but my reply became too long, and expanded into an issue wider than that single bug, so I figured I'd better discuss it here instead.
[snip]
Actually, I can see now there are several different classes of this problem. Here are the first few that come to mind:
1) an attribute/element is completely unknown/unexpected in all cases (e.g. "frozzle='fib'" anywhere, or more insidious, something that *looks* correct, but isn't, e.g. "<script name='/path/to/script'/>"*)
RNG schema validation is the only sane way to catch this
2) an attribute/element is useful/expected only when some other attribute is set to a particular value (usually one called "type", but could be something else), for example keymap='blah' is only expected in a <graphics> element when type='spice' or type='vnc'.
We should always catch these when parsing, since this is done via our enumeration helpers.
3) an attribute/element is useful/expected only for certain combinations of the value of some other attribute and which driver is using the element, e.g. the subject of this bug - script='blah' is only expected when type='bridge' and it's used by the Xen driver, or type='ethernet' and it's used by the qemu driver.
IMHO this is just another case of 1) really.
So what are the rules of engagement for these various cases? When do we ignore, when do we log an error during parsing, and when do we log an error in the code that's using the parsed data?
I think we should add a flag to 'virDomainDefine' and virDomainCreateXML VIR_DOMAIN_VALIDATE_XML and when that is set, run the user specified XML through the RNG schema validator. Virsh could be extended to have a --validate flag too. We'd add an explicit error code VIR_ERROR_XML_VALIDATION to let apps catch schema failures. This would fix a major annoyance with 'virsh edit' where you make XML changes and they get lost because you typod. ie virsh edit sets the validate flag. If it gets a failure it should ask the user whether they want to abandon the edit, force the edit (ie define without the validate flag), or re-launch the editor to correct the mistake. If we did this we'd get much more use of the RNG schemas and so find mistakes in them sooner Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/16/2011 11:47 AM, Daniel P. Berrange wrote:
On Tue, Aug 16, 2011 at 04:44:42AM -0400, Laine Stump wrote:
This is related to: https://bugzilla.redhat.com/show_bug.cgi?id=638633#c14
I had started to reply to it in the comments of the bug, but my reply became too long, and expanded into an issue wider than that single bug, so I figured I'd better discuss it here instead. [snip]
Actually, I can see now there are several different classes of this problem. Here are the first few that come to mind:
1) an attribute/element is completely unknown/unexpected in all cases (e.g. "frozzle='fib'" anywhere, or more insidious, something that *looks* correct, but isn't, e.g. "<script name='/path/to/script'/>"*) RNG schema validation is the only sane way to catch this
Agreed.
2) an attribute/element is useful/expected only when some other attribute is set to a particular value (usually one called "type", but could be something else), for example keymap='blah' is only expected in a<graphics> element when type='spice' or type='vnc'. We should always catch these when parsing, since this is done via our enumeration helpers.
"Enumeration helpers"? My brain isn't making the connection to what you're talking about... :-)
3) an attribute/element is useful/expected only for certain combinations of the value of some other attribute and which driver is using the element, e.g. the subject of this bug - script='blah' is only expected when type='bridge' and it's used by the Xen driver, or type='ethernet' and it's used by the qemu driver. IMHO this is just another case of 1) really.
I guess as long as the hypervisor being used is part of the XML, and the RNG is detailed enough to note that, eg, when the hypervisor is xen and the interface type is bridge, a script element is okay, or when the hypervisor is qemu and the interface type is ethernet, a script is okay, but not in other cases. Wouldn't an RNG file that adequately described that be fairly gigantic and filled with redundancies, though?
So what are the rules of engagement for these various cases? When do we ignore, when do we log an error during parsing, and when do we log an error in the code that's using the parsed data? I think we should add a flag to 'virDomainDefine' and virDomainCreateXML
VIR_DOMAIN_VALIDATE_XML
and when that is set, run the user specified XML through the RNG schema validator. Virsh could be extended to have a --validate flag too.
We'd add an explicit error code VIR_ERROR_XML_VALIDATION to let apps catch schema failures.
This would fix a major annoyance with 'virsh edit' where you make XML changes and they get lost because you typod. ie virsh edit sets the validate flag. If it gets a failure it should ask the user whether they want to abandon the edit, force the edit (ie define without the validate flag), or re-launch the editor to correct the mistake.
If we did this we'd get much more use of the RNG schemas and so find mistakes in them sooner
+10 I like it!

On Tue, Aug 16, 2011 at 03:29:04PM -0400, Laine Stump wrote:
On 08/16/2011 11:47 AM, Daniel P. Berrange wrote:
On Tue, Aug 16, 2011 at 04:44:42AM -0400, Laine Stump wrote:
This is related to: https://bugzilla.redhat.com/show_bug.cgi?id=638633#c14
I had started to reply to it in the comments of the bug, but my reply became too long, and expanded into an issue wider than that single bug, so I figured I'd better discuss it here instead. [snip]
Actually, I can see now there are several different classes of this problem. Here are the first few that come to mind:
1) an attribute/element is completely unknown/unexpected in all cases (e.g. "frozzle='fib'" anywhere, or more insidious, something that *looks* correct, but isn't, e.g. "<script name='/path/to/script'/>"*) RNG schema validation is the only sane way to catch this
Agreed.
2) an attribute/element is useful/expected only when some other attribute is set to a particular value (usually one called "type", but could be something else), for example keymap='blah' is only expected in a<graphics> element when type='spice' or type='vnc'. We should always catch these when parsing, since this is done via our enumeration helpers.
"Enumeration helpers"? My brain isn't making the connection to what you're talking about... :-)
3) an attribute/element is useful/expected only for certain combinations of the value of some other attribute and which driver is using the element, e.g. the subject of this bug - script='blah' is only expected when type='bridge' and it's used by the Xen driver, or type='ethernet' and it's used by the qemu driver. IMHO this is just another case of 1) really.
I guess as long as the hypervisor being used is part of the XML, and the RNG is detailed enough to note that, eg, when the hypervisor is xen and the interface type is bridge, a script element is okay, or when the hypervisor is qemu and the interface type is ethernet, a script is okay, but not in other cases. Wouldn't an RNG file that adequately described that be fairly gigantic and filled with redundancies, though?
Forgot to mention that the RNG schemas shouldn't contain hypervisor specific conditionals. For cases where the XML is valid, but the config is not supported by the driver in question, should report a VIR_ERR_CONFIG_UNSUPPORTED error Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/16/2011 04:58 PM, Daniel P. Berrange wrote:
On Tue, Aug 16, 2011 at 03:29:04PM -0400, Laine Stump wrote:
On 08/16/2011 11:47 AM, Daniel P. Berrange wrote:
On Tue, Aug 16, 2011 at 04:44:42AM -0400, Laine Stump wrote:
This is related to: https://bugzilla.redhat.com/show_bug.cgi?id=638633#c14
I had started to reply to it in the comments of the bug, but my reply became too long, and expanded into an issue wider than that single bug, so I figured I'd better discuss it here instead. [snip]
Actually, I can see now there are several different classes of this problem. Here are the first few that come to mind:
1) an attribute/element is completely unknown/unexpected in all cases (e.g. "frozzle='fib'" anywhere, or more insidious, something that *looks* correct, but isn't, e.g. "<script name='/path/to/script'/>"*) RNG schema validation is the only sane way to catch this
Agreed.
2) an attribute/element is useful/expected only when some other attribute is set to a particular value (usually one called "type", but could be something else), for example keymap='blah' is only expected in a<graphics> element when type='spice' or type='vnc'. We should always catch these when parsing, since this is done via our enumeration helpers.
"Enumeration helpers"? My brain isn't making the connection to what you're talking about... :-)
Okay, I think I finally see what happened here - I believe my example misled you into thinking the problem I was talking about was having the value "blah" for the keymap attribute rather than some recognized value (otherwise why would you talk about enumeration helpers?). (of course for keymap, your suggestion isn't valid either, since it's stored as a char*, not as an enum value, so maybe I'm still misunderstanding you). The problem I was *really* trying to point out is that of a keymap attribute being given *at all* when type is something other than "spice" or "vnc". The problem is that keymap is stored in a union, and the parts of the union that are active when type != "(spice|vnc)" don't have any keymap members. So even if keymap has something that would have been valid when type="(spice|vnc)" (but not otherwise), it isn't flagged in the parser (because the parser ignores keymap when it's not appropriate, rather than logging an error) and the driver *can't* do anything about it (because it has no way of knowing that a keymap was given - the parser can't put keymap into the config object because there's no place to put it). Specifically for the bug report I'm looking at, this can't be flagged as an error: <interface type='network'> <script path='/path/to/script'/> </interface> because the script path is in a union, and only active if type="(bridge|ethernet)".
3) an attribute/element is useful/expected only for certain combinations of the value of some other attribute and which driver is using the element, e.g. the subject of this bug - script='blah' is only expected when type='bridge' and it's used by the Xen driver, or type='ethernet' and it's used by the qemu driver. IMHO this is just another case of 1) really. I guess as long as the hypervisor being used is part of the XML, and the RNG is detailed enough to note that, eg, when the hypervisor is xen and the interface type is bridge, a script element is okay, or when the hypervisor is qemu and the interface type is ethernet, a script is okay, but not in other cases. Wouldn't an RNG file that adequately described that be fairly gigantic and filled with redundancies, though? Forgot to mention that the RNG schemas shouldn't contain hypervisor specific conditionals. For cases where the XML is valid, but the config is not supported by the driver in question, should report a VIR_ERR_CONFIG_UNSUPPORTED error
Ah, so it *isn't* really just another case of (1) :-) (since your answer to (1) was to catch it in the RNG, and now you're saying the opposite) I'll point out that a "hypervisor specific conditional" is just a larger scale example of the same phenomenon as a "interface-type specific conditional". What is it that makes hypervisor a special case? Isn't this exactly the way we should also handle, e.g. having keymap specified when graphic type="sdp"? (i.e., the hypervisor *does* support keymap, but graphics is a type that doesn't support it). To try and get at a solution - I think one of the main problems here is the use of unions in the config objects - they're encoding driver-specific stuff into the config object, but the parser (and therefore the config object) should be driver-agnostic. Having those unions makes it impossible for the driver to report VIR_ERR_CONFIG_UNSUPPORTED in many cases, because the parser is unable to tell the driver that such an attempt was made. The unions look all pretty and nicely organized, and make it easy to tell which bits of config are used in which cases by just looking in the .h file, but I think in a majority of cases they are inappropriate and should be eliminated. In particular, to solve https://bugzilla.redhat.com/show_bug.cgi?id=638633 according to your recommendation, I think what I need to do is: 1) Remove the script attribute from the bridge & ethernet unions in virDomainNetDef and place it in the common part of the struct. 2) Change the parser to always populate script, no matter what is the type of the virDomainNetDef. 3) In the driver code that uses the virDomainNetDef, check for script in *all* cases (not just the cases where the driver *wants* to see a script), and log VIR_ERR_CONFIG_UNSUPPORTED when it is specified at an inappropriate time. Does this sound correct? Or do you think that the only appropriate solution to that bug is to add the "validate" flag you mentioned earlier, and tell people to add that flag if they want an error like this caught? (I would argue that it should be up to the driver, not the parser, to decide whether or not a script is appropriate for any particular type of network interface)

On 08/22/2011 01:36 AM, Laine Stump wrote:
The problem I was *really* trying to point out is that of a keymap attribute being given *at all* when type is something other than "spice" or "vnc". The problem is that keymap is stored in a union, and the parts of the union that are active when type != "(spice|vnc)" don't have any keymap members. So even if keymap has something that would have been valid when type="(spice|vnc)" (but not otherwise), it isn't flagged in the parser (because the parser ignores keymap when it's not appropriate, rather than logging an error) and the driver *can't* do anything about it (because it has no way of knowing that a keymap was given - the parser can't put keymap into the config object because there's no place to put it).
To some degree, rng validation can map unions, if the distinguishing choice between unions is an attribute also exposed in the xml. It is done via <choice> and <group>, but it does make the rng larger; it also helps to have more <define>s in order to avoid some repetition of subelements that are common between more than one choice.
In particular, to solve https://bugzilla.redhat.com/show_bug.cgi?id=638633 according to your recommendation, I think what I need to do is:
1) Remove the script attribute from the bridge & ethernet unions in virDomainNetDef and place it in the common part of the struct.
2) Change the parser to always populate script, no matter what is the type of the virDomainNetDef.
Not necessarily. We can instead rewrite the rng to forbid script attributes from all but bridge and ethernet attributes. However, there's still the issue that some, but not all, hypervisors can support scripts with a bridge, so there's still some validation that will have to be done in drivers to reject unsupported items even though the rng allows it. For example, look at <define name="disk"> - it has a large <choice> with a number of groups, each group defines one value for <attribute name="type">, along with the sub-<elements> that are appropriate for that attribute choice. I think we could do the same with <define name="interface">, and split up <define name="interface-options"> to instead list which subelements belong under a given choice of interface type.
3) In the driver code that uses the virDomainNetDef, check for script in *all* cases (not just the cases where the driver *wants* to see a script), and log VIR_ERR_CONFIG_UNSUPPORTED when it is specified at an inappropriate time.
This part sounds right - anywhere that we allow parsing a script, we have to then check in the driver whether a script was given, whether or not the driver can support a script in that use case. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 08/22/2011 09:10 AM, Eric Blake wrote:
On 08/22/2011 01:36 AM, Laine Stump wrote:
The problem I was *really* trying to point out is that of a keymap attribute being given *at all* when type is something other than "spice" or "vnc". The problem is that keymap is stored in a union, and the parts of the union that are active when type != "(spice|vnc)" don't have any keymap members. So even if keymap has something that would have been valid when type="(spice|vnc)" (but not otherwise), it isn't flagged in the parser (because the parser ignores keymap when it's not appropriate, rather than logging an error) and the driver *can't* do anything about it (because it has no way of knowing that a keymap was given - the parser can't put keymap into the config object because there's no place to put it).
To some degree, rng validation can map unions, if the distinguishing choice between unions is an attribute also exposed in the xml. It is done via <choice> and <group>, but it does make the rng larger; it also helps to have more <define>s in order to avoid some repetition of subelements that are common between more than one choice.
In particular, to solve https://bugzilla.redhat.com/show_bug.cgi?id=638633 according to your recommendation, I think what I need to do is:
1) Remove the script attribute from the bridge & ethernet unions in virDomainNetDef and place it in the common part of the struct.
2) Change the parser to always populate script, no matter what is the type of the virDomainNetDef.
Not necessarily. We can instead rewrite the rng to forbid script attributes from all but bridge and ethernet attributes. However, there's still the issue that some, but not all, hypervisors can support scripts with a bridge, so there's still some validation that will have to be done in drivers to reject unsupported items even though the rng allows it.
That's all great, but the fact remains that we don't use the RNG anywhere in the libvirt API, so it does us no good, for example, when running "virsh edit mydomain". Until/unless we do that (validate input XML to the relevant RNG), having RNGs that properly mirror unions in the config objects is just preventing us from detecting improper usage. (and I'm doubtful this validation is a good candidate for backporting)
For example, look at <define name="disk"> - it has a large <choice> with a number of groups, each group defines one value for <attribute name="type">, along with the sub-<elements> that are appropriate for that attribute choice. I think we could do the same with <define name="interface">, and split up <define name="interface-options"> to instead list which subelements belong under a given choice of interface type.
Again, the RNG has a very thorough/correct representation of what should be allowed, but the only place the RNG is used is when running the test suite.
3) In the driver code that uses the virDomainNetDef, check for script in *all* cases (not just the cases where the driver *wants* to see a script), and log VIR_ERR_CONFIG_UNSUPPORTED when it is specified at an inappropriate time.
This part sounds right - anywhere that we allow parsing a script, we have to then check in the driver whether a script was given, whether or not the driver can support a script in that use case.

On Mon, Aug 22, 2011 at 01:52:07PM -0400, Laine Stump wrote:
On 08/22/2011 09:10 AM, Eric Blake wrote:
On 08/22/2011 01:36 AM, Laine Stump wrote:
The problem I was *really* trying to point out is that of a keymap attribute being given *at all* when type is something other than "spice" or "vnc". The problem is that keymap is stored in a union, and the parts of the union that are active when type != "(spice|vnc)" don't have any keymap members. So even if keymap has something that would have been valid when type="(spice|vnc)" (but not otherwise), it isn't flagged in the parser (because the parser ignores keymap when it's not appropriate, rather than logging an error) and the driver *can't* do anything about it (because it has no way of knowing that a keymap was given - the parser can't put keymap into the config object because there's no place to put it).
To some degree, rng validation can map unions, if the distinguishing choice between unions is an attribute also exposed in the xml. It is done via <choice> and <group>, but it does make the rng larger; it also helps to have more <define>s in order to avoid some repetition of subelements that are common between more than one choice.
In particular, to solve https://bugzilla.redhat.com/show_bug.cgi?id=638633 according to your recommendation, I think what I need to do is:
1) Remove the script attribute from the bridge & ethernet unions in virDomainNetDef and place it in the common part of the struct.
2) Change the parser to always populate script, no matter what is the type of the virDomainNetDef.
Not necessarily. We can instead rewrite the rng to forbid script attributes from all but bridge and ethernet attributes. However, there's still the issue that some, but not all, hypervisors can support scripts with a bridge, so there's still some validation that will have to be done in drivers to reject unsupported items even though the rng allows it.
That's all great, but the fact remains that we don't use the RNG anywhere in the libvirt API, so it does us no good, for example, when running "virsh edit mydomain".
Until/unless we do that (validate input XML to the relevant RNG), having RNGs that properly mirror unions in the config objects is just preventing us from detecting improper usage. (and I'm doubtful this validation is a good candidate for backporting)
As I mentioned earlier in this thread, we can easily add a new flag to the XML related APIs to request validation against the RNG. virsh edit would set this flag and if it fails, would prompt the user whether they want to correct the XML, or force the change ignoring validation errors. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump