[Libvir] XML escaping patch

This is related to bug #206653 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=206653 basically we are very optimistic when generating the XML files, and sometimes this can break. The most common case is if some string inherited from the user input or some other config file embeds one of > < or & , another one would be if the strings are containing character outside of ACSII range and not encoded in UTF-8. We can at least cope with the easy case of escaping the 3 characters. This patch adds a simple buffer printing routing working with a simple string argument, and use it for the 2 cases where I think it's most likely to be needed i.e. cmdline and bootloader_args. There is a number of places where paths are used and the user might use weird character names, but since those cases can't be handled properly (you can't change that path or try to convert encoding on the fly since we can't guess reliably which one is used) I didn't tried to change those. This makes for a relatively simple patch which should IMHO cover most case where we may break while we really should not. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel Veillard wrote: [...]
/** + * virBufferEscapeString: + * @buf: the buffer to dump + * @format: a printf like format string but with only one %s parameter + * @str: the string argument which need to be escaped + * + * Do a formatted print with a single string to an XML buffer. The string + * is escaped to avoid generating a not well-formed XML instance. + * + * Returns 0 successful, -1 in case of internal or API error. + */ +int +virBufferEscapeString(virBufferPtr buf, const char *format, const char *str)
I spent a bit of time pondering if it would be possible to either make this call type-safe, or else handle arbitrary format strings. I'm not sure I can see a good way to do either, so in the meantime, +1. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

On Fri, Jul 06, 2007 at 03:05:09PM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote: [...]
/** + * virBufferEscapeString: + * @buf: the buffer to dump + * @format: a printf like format string but with only one %s parameter + * @str: the string argument which need to be escaped + * + * Do a formatted print with a single string to an XML buffer. The string + * is escaped to avoid generating a not well-formed XML instance. + * + * Returns 0 successful, -1 in case of internal or API error. + */ +int +virBufferEscapeString(virBufferPtr buf, const char *format, const char *str)
I spent a bit of time pondering if it would be possible to either make this call type-safe, or else handle arbitrary format strings. I'm not sure I can see a good way to do either, so in the meantime, +1.
If we want safety then we should be using an API based on structural notion of XML elements & attributes, rather than printf. In our use cases any single attribute / element is either a boolean, int or a string, so if we had APIs based on idea of creating elements / adding attributes the whole thing could be typesafe & have no need of any printf formatting. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
If we want safety then we should be using an API based on structural notion of XML elements & attributes, rather than printf.
... Or use a language which integrates XML directly in a completely type-safe way: http://www.cduce.org/tutorial_getting_started.html#xmldoc Yup, it's friday. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

On Fri, Jul 06, 2007 at 09:49:46AM -0400, Daniel Veillard wrote:
This is related to bug #206653 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=206653 basically we are very optimistic when generating the XML files, and sometimes this can break. The most common case is if some string inherited from the user input or some other config file embeds one of > < or & , another one would be if the strings are containing character outside of ACSII range and not encoded in UTF-8. We can at least cope with the easy case of escaping the 3 characters. This patch adds a simple buffer printing routing working with a simple string argument, and use it for the 2 cases where I think it's most likely to be needed i.e. cmdline and bootloader_args. There is a number of places where paths are used and the user might use weird character names, but since those cases can't be handled properly (you can't change that path or try to convert encoding on the fly since we can't guess reliably which one is used) I didn't tried to change those. This makes for a relatively simple patch which should IMHO cover most case where we may break while we really should not.
To be honest the whole way of building up XML documents through string concatenation is rather disgusting. Isn't there some simple API that we could use to build up based on logical elements, attributes & text...
diff -u -p -r1.126 xend_internal.c --- src/xend_internal.c 5 Jul 2007 16:04:12 -0000 1.126 +++ src/xend_internal.c 6 Jul 2007 13:31:34 -0000 @@ -1337,7 +1337,7 @@ xend_parse_sexp_desc_os(virConnectPtr xe virBufferVSprintf(buf, " <root>%s</root>\n", tmp); tmp = sexpr_node(node, "domain/image/linux/args"); if ((tmp != NULL) && (tmp[0] != 0)) - virBufferVSprintf(buf, " <cmdline>%s</cmdline>\n", tmp); + virBufferEscapeString(buf, " <cmdline>%s</cmdline>\n", tmp); }
virBufferAdd(buf, " </os>\n", 8); @@ -1423,7 +1423,7 @@ xend_parse_sexp_desc(virConnectPtr conn, /* * Only insert bootloader_args if there is also a bootloader param */ - virBufferVSprintf(&buf, " <bootloader_args>%s</bootloader_args>\n", tmp); + virBufferEscapeString(&buf, " <bootloader_args>%s</bootloader_args>\n", tmp); }
So instead of that we could do dom = xmlRootElement("domain") xmlAttributeInt(dom, "id", id); xmlAttributeString(dom, "type", "xen"); os = xmlElement(dom, "os") xmlElementString(os, "bootloader", tmp) char *doc = xmlString(dom); xmlFree(dom); return doc; Of course this is a much large change than the one you're suggesting. We surely need to fixup much more than the two 'cmdline' and 'bootloader_args' elements that you describe - practically any element / attribute which takes a %s format string needs this fix. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
So instead of that we could do
dom = xmlRootElement("domain") xmlAttributeInt(dom, "id", id); xmlAttributeString(dom, "type", "xen");
os = xmlElement(dom, "os") xmlElementString(os, "bootloader", tmp)
char *doc = xmlString(dom); xmlFree(dom); return doc;
Even better might be to use a templating library to load external XML docs containing placeholders which can be substituted. But I think it's a waste of time to worry about this too much when Daniel [Veillard]'s patch solves the immediate bugzilla. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

On Fri, Jul 06, 2007 at 03:06:37PM +0100, Daniel P. Berrange wrote:
On Fri, Jul 06, 2007 at 09:49:46AM -0400, Daniel Veillard wrote:
This is related to bug #206653 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=206653 basically we are very optimistic when generating the XML files, and sometimes this can break. The most common case is if some string inherited from the user input or some other config file embeds one of > < or & , another one would be if the strings are containing character outside of ACSII range and not encoded in UTF-8. We can at least cope with the easy case of escaping the 3 characters. This patch adds a simple buffer printing routing working with a simple string argument, and use it for the 2 cases where I think it's most likely to be needed i.e. cmdline and bootloader_args. There is a number of places where paths are used and the user might use weird character names, but since those cases can't be handled properly (you can't change that path or try to convert encoding on the fly since we can't guess reliably which one is used) I didn't tried to change those. This makes for a relatively simple patch which should IMHO cover most case where we may break while we really should not.
To be honest the whole way of building up XML documents through string concatenation is rather disgusting. Isn't there some simple API that we could use to build up based on logical elements, attributes & text...
The xmlWriter part of libxml2 http://xmlsoft.org/html/libxml-xmlwriter.html but 1/ that's one of the few module I didn't wrote :-) 2/ I never used it 3/ documentation is rather limited. But the main reason I'm not sure it would really help is that the main kind of errors we may face is IMHO problems of encodings, where we may not know what encoding us associated with a char * and if we need to plug the string in XML we have absolutely no garantee. And since for example the sexpr don't have an encoding we are just crossing fingers when talking to Xend. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Fri, Jul 06, 2007 at 03:06:37PM +0100, Daniel P. Berrange wrote:
So instead of that we could do
dom = xmlRootElement("domain") xmlAttributeInt(dom, "id", id); xmlAttributeString(dom, "type", "xen");
os = xmlElement(dom, "os") xmlElementString(os, "bootloader", tmp)
char *doc = xmlString(dom); xmlFree(dom); return doc;
We could create intermediate functions around libxml2 tree API, but it would still be a bit more heavy than that.
Of course this is a much large change than the one you're suggesting. We surely need to fixup much more than the two 'cmdline' and 'bootloader_args' elements that you describe - practically any element / attribute which takes a %s format string needs this fix.
Well would a name with < > or & actually work ? A path would not, most attributes are predefined strings values which are just ascii name token, that would probably be a bit over the top, no ? Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
participants (3)
-
Daniel P. Berrange
-
Daniel Veillard
-
Richard W.M. Jones