Re: [libvirt] [PATCH] esx: Handle name escaping properly

Wednesday, 13 October 2010

2010/10/13 Daniel Veillard <veillard(a)redhat.com&gt;:
...
 On Wed, Oct 13, 2010 at 11:06:44AM +0200, Matthias Bolte wrote:
> VMware uses a mix of percent-, pipe- and base64-encoding in
> different combinations in different places.
>
> Add a testcase for this.
> ---
>  src/esx/README                           |   25 ++++
>  src/esx/esx_driver.c                     |   72 ++++++-----
>  src/esx/esx_storage_driver.c             |   42 ++++++-
>  src/esx/esx_util.c                       |  198 ++++++++++++++++++++++++++++++
>  src/esx/esx_util.h                       |   18 +++
>  src/esx/esx_vi.c                         |    6 +
>  src/esx/esx_vmx.c                        |   88 +++++---------
>  tests/esxutilstest.c                     |   51 ++++++++
>  tests/xml2vmxdata/xml2vmx-annotation.vmx |    2 +-
>  9 files changed, 405 insertions(+), 97 deletions(-)

  That sounds vaguely familiar, I think I reviewed such a patch last
  month, right ? 
I've sworn about this on IRC last week, so that's probably why it
sounds familiar to you :)

...
 [...]
>
> -    virBufferURIEncodeString(&buffer, def->name);
> +    escapedName = esxUtil_EscapeDatastoreItem(def->name);
> +

  okay, we really need to make sure the escaping is XML compatible
  though 
ESX in general accepts UTF-8 and the escaping as it is doesn't add XML
incompatible characters like <. But I'm not sure if the SOAP binding
in the driver uses virBufferEscapeString in all critical places.
During testing this I also used domain names that contained < and I
didn't hit any obvious problems.

I'll have to review the SOAP bindings for proper XML escaping. I think
there are some places that don't escape properly, but didn't surface
as problems yet.

...
> @@ -621,3 +622,200 @@ esxUtil_ReformatUuid(const char *input,
char *output)
 [...]
> +char *
> +esxUtil_EscapeDatastoreItem(const char *string)
> +{
> +    char *replaced = strdup(string);
> +    char *escaped1;
> +    char *escaped2 = NULL;
> +
> +    if (replaced == NULL) {
> +        virReportOOMError();
> +        return NULL;
> +    }
> +
> +    esxUtil_ReplaceSpecialWindowsPathChars(replaced);
> +
> +    escaped1 = esxUtil_EscapeHexPercent(replaced);
> +
> +    if (escaped1 == NULL) {
> +        goto cleanup;
> +    }
> +
> +    escaped2 = esxUtil_EscapeBase64(escaped1);
> +

  Okay none of those may lead to a problem for XML ... 
As I said the newly added escaping doesn't create new XML related problems.

...
 [...]
> --- a/tests/esxutilstest.c
> +++ b/tests/esxutilstest.c
> @@ -227,6 +227,56 @@ testConvertDateTimeToCalendarTime(const void *data
ATTRIBUTE_UNUSED)
>
>
>
> +struct testDatastoreItem {
> +    const char *string;
> +    const char *escaped;
> +};
> +
> +static struct testDatastoreItem datastoreItems[] = {
> +    { "normal", "normal" },
> +    { "Aä1ö2ü3ß4#5~6!7§8/9%Z",
> +      "A+w6Q-1+w7Y-2+w7w-3+w58-4+Iw-5+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-25Z" },
> +    { "Z~6!7§8/9%0#1\"2'3`4&A",
> +      "Z+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-250+Iw-1_2'3+YA-4+Jg-A" },
> +    { "標準語", "+5qiZ5rqW6Kqe" },
> +    { "!\"#$%&'()*+,-./0123456789:;<=>?",
> +      "+IQ-_+IyQl-25+Jg-'()_+Kw-,-.+JQ-2f0123456789_+Ow-_+PQ-__" },
> +    { "A Z[\\]^_B", "A Z+WyU-5c+XV4-_B" },
> +    { "A`B@{|}~DEL", "A+YA-B+QHs-_+fX4-DEL" },
> +    { "hÀÁÂÃÄÅH", "h+w4DDgcOCw4PDhMOF-H" },
> +    { "A쿀Z", "A+7L+A-Z" },
> +    { "!쿀A", "+Iey,gA-A" },
> +    { "~~~", "+fn5+" },
> +    { "~~~A", "+fn5+-A" },
> +    { "K%U/H\\Z", "K+JQ-25U+JQ-2fH+JQ-5cZ" },
> +};

  Ouch :-)
 One question for the C purists. How do we know how characters outside of
 the ASCII range may be interpreted by a compiler ?  The mail as received is
 made of one mime part, not multipart, it's defined as

 Content-Type: text/plain; charset="utf-8"
 Content-Transfer-Encoding: base64

 so my mail reader probably assumes that this is utf8 characters, carried
 as a base64 encoded chunk of ASCII. Since my terminal is displaying
 in UTF-8 I apparently see your patch as it's expected, but as soon
 as it's saved on disk, most Unix/Linux tools will interpret those
 character using the locale of the user, i.e. if I were to compile this
 in a terminal using a non-UTF-8 locale, the compiler may very well
 interpret this slightly differently, okay this is data so maybe it's
 just bytes being copied without risk, but I'm still a bit worried there.

 I like the idea of testing this API seriously, but at the C level
 shouldn't we encode those in an ASCII way to be sure ?

 Daniel

I didn't thought of this issue.

As discussed on IRC, I replaced all non-ASCII chars with their UTF-8
encoding in octal, for example ä -> \303\244.

So here is v2 attached that fixes the test case encoding issue.

Matthias

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH] esx: Handle name escaping properly