Re: [libvirt] [PATCH] esx: Handle name escaping properly

13 Oct 2010

On Wed, Oct 13, 2010 at 11:06:44AM +0200, Matthias Bolte wrote:
...
VMware uses a mix of percent-, pipe- and base64-encoding in
different combinations in different places.
Add a testcase for this.
---
 src/esx/README                           |   25 ++++
 src/esx/esx_driver.c                     |   72 ++++++-----
 src/esx/esx_storage_driver.c             |   42 ++++++-
 src/esx/esx_util.c                       |  198 ++++++++++++++++++++++++++++++
 src/esx/esx_util.h                       |   18 +++
 src/esx/esx_vi.c                         |    6 +
 src/esx/esx_vmx.c                        |   88 +++++---------
 tests/esxutilstest.c                     |   51 ++++++++
 tests/xml2vmxdata/xml2vmx-annotation.vmx |    2 +-
 9 files changed, 405 insertions(+), 97 deletions(-)
That sounds vaguely familiar, I think I reviewed such a patch last
  month, right ?

[...]
...
-    virBufferURIEncodeString(&buffer, def->name);
+    escapedName = esxUtil_EscapeDatastoreItem(def->name);
+
okay, we really need to make sure the escaping is XML compatible
  though
...
@@ -621,3 +622,200 @@ esxUtil_ReformatUuid(const char *input, char *output)
[...]
+char *
+esxUtil_EscapeDatastoreItem(const char *string)
+{
+    char *replaced = strdup(string);
+    char *escaped1;
+    char *escaped2 = NULL;
+
+    if (replaced == NULL) {
+        virReportOOMError();
+        return NULL;
+    }
+
+    esxUtil_ReplaceSpecialWindowsPathChars(replaced);
+
+    escaped1 = esxUtil_EscapeHexPercent(replaced);
+
+    if (escaped1 == NULL) {
+        goto cleanup;
+    }
+
+    escaped2 = esxUtil_EscapeBase64(escaped1);
+
Okay none of those may lead to a problem for XML ...

[...]
...

--- a/tests/esxutilstest.c
+++ b/tests/esxutilstest.c
@@ -227,6 +227,56 @@ testConvertDateTimeToCalendarTime(const void *data ATTRIBUTE_UNUSED)
+struct testDatastoreItem {
+    const char *string;
+    const char *escaped;
+};
+
+static struct testDatastoreItem datastoreItems[] = {
+    { "normal", "normal" },
+    { "Aä1ö2ü3ß4#5~6!7§8/9%Z",
+      "A+w6Q-1+w7Y-2+w7w-3+w58-4+Iw-5+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-25Z" },
+    { "Z~6!7§8/9%0#1\"2'3`4&A",
+      "Z+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-250+Iw-1_2'3+YA-4+Jg-A" },
+    { "標準語", "+5qiZ5rqW6Kqe" },
+    { "!\"#$%&'()*+,-./0123456789:;<=>?",
+      "+IQ-_+IyQl-25+Jg-'()_+Kw-,-.+JQ-2f0123456789_+Ow-_+PQ-__" },
+    { "A Z[\\]^_B", "A Z+WyU-5c+XV4-_B" },
+    { "A`B@{|}~DEL", "A+YA-B+QHs-_+fX4-DEL" },
+    { "hÀÁÂÃÄÅH", "h+w4DDgcOCw4PDhMOF-H" },
+    { "A쿀Z", "A+7L+A-Z" },
+    { "!쿀A", "+Iey,gA-A" },
+    { "~~~", "+fn5+" },
+    { "~~~A", "+fn5+-A" },
+    { "K%U/H\\Z", "K+JQ-25U+JQ-2fH+JQ-5cZ" },
+};
Ouch :-)
One question for the C purists. How do we know how characters outside of
the ASCII range may be interpreted by a compiler ?  The mail as received is
made of one mime part, not multipart, it's defined as

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

so my mail reader probably assumes that this is utf8 characters, carried
as a base64 encoded chunk of ASCII. Since my terminal is displaying
in UTF-8 I apparently see your patch as it's expected, but as soon
as it's saved on disk, most Unix/Linux tools will interpret those
character using the locale of the user, i.e. if I were to compile this
in a terminal using a non-UTF-8 locale, the compiler may very well
interpret this slightly differently, okay this is data so maybe it's
just bytes being copied without risk, but I'm still a bit worried there.

I like the idea of testing this API seriously, but at the C level
shouldn't we encode those in an ASCII way to be sure ?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/