[libvirt] [PATCH] Strip control codes in virBufferEscapeString

These cannot be represented in XML. We have been stripping them, but only if the string had characters that needed escaping: <>"'& Extend the strcspn check to include control codes, and strip them even if we don't do any escaping. https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++++++++++--- tests/virbuftest.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/src/util/virbuffer.c b/src/util/virbuffer.c index 706dbfa..3d13c90 100644 --- a/src/util/virbuffer.c +++ b/src/util/virbuffer.c @@ -438,6 +438,13 @@ virBufferEscapeString(virBufferPtr buf, const char *format, const char *str) int len; char *escaped, *out; const char *cur; + const char forbidden_characters[] = { + 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, + /*\t*/ /*\n*/ 0x0B, 0x0C, /*\r*/ 0x0E, 0x0F, 0x10, + 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, + 0x19, '"', '&', '\'', '<', '>', + '\0' + }; if ((format == NULL) || (buf == NULL) || (str == NULL)) return; @@ -446,7 +453,7 @@ virBufferEscapeString(virBufferPtr buf, const char *format, const char *str) return; len = strlen(str); - if (strcspn(str, "<>&'\"") == len) { + if (strcspn(str, forbidden_characters) == len) { virBufferAsprintf(buf, format, str); return; } @@ -490,8 +497,7 @@ virBufferEscapeString(virBufferPtr buf, const char *format, const char *str) *out++ = 'o'; *out++ = 's'; *out++ = ';'; - } else if (((unsigned char)*cur >= 0x20) || (*cur == '\n') || (*cur == '\t') || - (*cur == '\r')) { + } else if (!strchr(forbidden_characters, *cur)) { /* * default case, just copy ! * Note that character over 0x80 are likely to give problem @@ -499,6 +505,8 @@ virBufferEscapeString(virBufferPtr buf, const char *format, const char *str) * it's hard to handle properly we have to assume it's UTF-8 too */ *out++ = *cur; + } else { + /* silently ignore control characters */ } cur++; } diff --git a/tests/virbuftest.c b/tests/virbuftest.c index 21cb18b..10398d5 100644 --- a/tests/virbuftest.c +++ b/tests/virbuftest.c @@ -349,6 +349,39 @@ testBufAddStr(const void *opaque ATTRIBUTE_UNUSED) static int +testBufEscapeStr(const void *opaque ATTRIBUTE_UNUSED) +{ + const struct testBufAddStrData *data = opaque; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *actual; + int ret = -1; + + virBufferAddLit(&buf, "<c>\n"); + virBufferAdjustIndent(&buf, 2); + virBufferEscapeString(&buf, "<el>%s</el>\n", data->data); + virBufferAdjustIndent(&buf, -2); + virBufferAddLit(&buf, "</c>"); + + if (!(actual = virBufferContentAndReset(&buf))) { + TEST_ERROR("buf is empty"); + goto cleanup; + } + + if (STRNEQ_NULLABLE(actual, data->expect)) { + TEST_ERROR("testBufEscapeStr(): Strings don't match:\n"); + virtTestDifference(stderr, data->expect, actual); + goto cleanup; + } + + ret = 0; + + cleanup: + VIR_FREE(actual); + return ret; +} + + +static int mymain(void) { int ret = 0; @@ -379,6 +412,22 @@ mymain(void) DO_TEST_ADD_STR("<a/>\n", "<c>\n <a/>\n</c>"); DO_TEST_ADD_STR("<b>\n <a/>\n</b>\n", "<c>\n <b>\n <a/>\n </b>\n</c>"); +#define DO_TEST_ESCAPE(data, expect) \ + do { \ + struct testBufAddStrData info = { data, expect }; \ + if (virtTestRun("Buf: EscapeStr", testBufEscapeStr, &info) < 0) \ + ret = -1; \ + } while (0) + + DO_TEST_ESCAPE("<td></td><td></td>", + "<c>\n <el><td></td><td></td></el>\n</c>"); + DO_TEST_ESCAPE("\007\"&&\"\x15", + "<c>\n <el>"&&"</el>\n</c>"); + DO_TEST_ESCAPE(",,'..',,", + "<c>\n <el>,,'..',,</el>\n</c>"); + DO_TEST_ESCAPE("\x01\x01\x02\x03\x05\x08", + "<c>\n <el></el>\n</c>"); + return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } -- 2.0.5

On Mon, Mar 30, 2015 at 01:02:49PM +0200, Ján Tomko wrote:
These cannot be represented in XML.
We have been stripping them, but only if the string had characters that needed escaping: <>"'&
Extend the strcspn check to include control codes, and strip them even if we don't do any escaping.
https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++++++++++--- tests/virbuftest.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 3 deletions(-)
ACK, according to XML documentation characters below 0x20 except 0x09, 0x0A and 0x0D are forbidden. Pavel

On 03/30/2015 05:02 AM, Ján Tomko wrote:
These cannot be represented in XML.
Yes they can, via entities. DV would know for sure, but I think that is the entity for the C byte '\1'.
We have been stripping them, but only if the string had characters that needed escaping: <>"'&
Extend the strcspn check to include control codes, and strip them even if we don't do any escaping.
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++++++++++--- tests/virbuftest.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 3 deletions(-)
As there are real bugs that will be fixed once we use the correct entities, I'm looking forward to v2. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote:
On 03/30/2015 05:02 AM, Ján Tomko wrote:
These cannot be represented in XML.
Yes they can, via entities. DV would know for sure, but I think that is the entity for the C byte '\1'.
Only in XML 1.1. For XML 1.0: http://www.w3.org/TR/xml/#dt-charref Well-formedness constraint: Legal Character Characters referred to using character references MUST match the production for Char. Which is: http://www.w3.org/TR/xml/#NT-Char [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */ Both libvirt and virt-xml-validate choke on those entities error: (domain_definition):2: xmlParseCharRef: invalid xmlChar value 1 <name>f21</name> DV was the one who wrote the code to skip over control characters in commit b36f453a581f27a4a43558978724a52df32045bb (v0.3.0~1) new function virBufferEscapeString() to format a string while escaping its content for XML, and apply it to a couple of obvious places, should fix bug #206653 It was the optimization in commit 0af02cb2e8d8192958735880e135ab69beb437c5 (v0.8.6~57) buf: Simplify virBufferEscapeString which broke this for strings that do have control codes, but not escapable characters.
We have been stripping them, but only if the string had characters that needed escaping: <>"'&
Extend the strcspn check to include control codes, and strip them even if we don't do any escaping.
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++++++++++--- tests/virbuftest.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 3 deletions(-)
As there are real bugs that will be fixed once we use the correct entities, I'm looking forward to v2.
This fixes the real bug of libvirt generating unparsable XML, which breaks creation of any VMs in virt-manager. Jan

On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote:
On 03/30/2015 05:02 AM, Ján Tomko wrote:
These cannot be represented in XML.
Yes they can, via entities. DV would know for sure, but I think that is the entity for the C byte '\1'.
no they can't :-) A character must match prod Char, even when using a CharRef http://www.w3.org/TR/REC-xml/#NT-Char [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
We have been stripping them, but only if the string had characters that needed escaping: <>"'&
Extend the strcspn check to include control codes, and strip them even if we don't do any escaping.
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
you can't escape them with a CharRef for sure http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char. That time Ján is right :-) Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

On 03/30/2015 09:50 AM, Daniel Veillard wrote:
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
you can't escape them with a CharRef for sure
http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char.
That time Ján is right :-)
Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect) -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 03/30/2015 12:56 PM, Eric Blake wrote:
On 03/30/2015 09:50 AM, Daniel Veillard wrote:
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
you can't escape them with a CharRef for sure
http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char.
That time Ján is right :-)
Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect)
I'd say just make a follow up patch/bz to reject unrepresentable filenames before they hit the XML. It's a much less serious problem IMO - Cole

On Mon, Mar 30, 2015 at 10:56:11AM -0600, Eric Blake wrote:
On 03/30/2015 09:50 AM, Daniel Veillard wrote:
NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them.
you can't escape them with a CharRef for sure
http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char.
That time Ján is right :-)
Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect)
Since if such an invalid CharRef were to hit libxml2 you would get a parser error and no result. So you can safely assume nobody ever has experienced those. Then you can try to push an additional patch doing a libvirt escaping but of only those problematic characters prior to the encoding in the XML. Then escape them back when reading from the XML to libvirt internals. This should not affect any deployed instance since they would be unparseable if that was the case. I would suggest using the same charref escaping but before passing to XML, e.g. real path: /foo\3bar libvirt encoded: /foobar XML encoded: /foobar you also need to catch & and give him special status real path: /foo&bar libvirt encoded: /foo&bar XML encoded: /foo&bar after libvirt parsing you end up with /foobar and each time you see numericsequence; you translate that to the equivalent UTF-8 character. Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] As a first approach, I would suggest just detecting bytes 1-8 0xB-0x1F and giving them the treatment, the probability of hitting surrogates in UTF-8 filesnames seems low enough that the patch should work in general. Whether using /foobar vs. /foox3;bar is a matter of taste you only need to handle one IMHO. Add a little regression tests with all the lower caracter and & use in the path and I think you're covered. Sounds too late for 1.2.14 though, Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/
participants (5)
-
Cole Robinson
-
Daniel Veillard
-
Eric Blake
-
Ján Tomko
-
Pavel Hrdina