On Mon, Oct 04, 2010 at 08:38:57AM +0200, Daniel Veillard wrote:
On Sun, Oct 03, 2010 at 11:51:12PM +1100, Justin Clift wrote:
> On 10/03/2010 08:33 PM, Richard W.M. Jones wrote:
> <snip>
> >Indeed. I'm sure we need a whitelist, not a blacklist as suggested by
> >the other comment. All domains I'd ever want to create would match
> >the regexp
> >
> >^[[:alpha:]][-_[:alnum:]]*$
> >
> >This might break existing users however.
>
> Wonder if there are characters supported by some hypervisors, but not
> others?
I remember we had trouble with Xen, a long time ago, yes.
So unfortunately this is really hypervisor specific... Maybe we could
have a generic checking routine that only issues a warning when
the name isn't a simple name the XML way. One of the problems with
checking is that most of the hypervisor APIs don't say a word about
encoding, so you're not manipulating characters but 0-terminated byte
strings. From there even your simple regexp breaks down, because deciding
what is an alphanumeric character requires character analysis, and you
need the encoding for this. At least at the libvirt API level things are
rather clear: in XML data there is no ambiguity possible, and outside we
expect strings to be UTF-8.
Actually I think that for ESX since all exchanges with the hypervisor
are XML based there isn't that ambiguity about encoding at least.
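
To make the byte-string point concrete, here is a small Python sketch.
The helper name decode_name_or_none is hypothetical and only for
illustration; it assumes a hypervisor hands back raw bytes with no
declared encoding, and shows that the same bytes can be valid in one
encoding and invalid (or different text) in another.

```python
from typing import Optional


def decode_name_or_none(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return the name if `raw` is valid UTF-8, else None."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # "cafe" with an acute accent on the e, written with an escape so the
    # example does not depend on the encoding of this file.
    name = "caf\u00e9"

    as_latin1 = name.encode("latin-1")  # 4 bytes, last one is 0xE9
    as_utf8 = name.encode("utf-8")      # 5 bytes, ends with 0xC3 0xA9

    # The Latin-1 bytes are not valid UTF-8, so a UTF-8 check rejects them.
    print(decode_name_or_none(as_latin1))

    # The UTF-8 bytes decode cleanly as UTF-8 ...
    print(decode_name_or_none(as_utf8) == name)

    # ... but read as Latin-1 they silently become a different string.
    print(as_utf8.decode("latin-1") == name)
```

Without knowing the encoding, a check for "alphanumeric" on these bytes
has no well-defined answer.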
> ie maybe Xen supports '/', '*', '+' in guest names, but
> ESX doesn't
>
> That could lead to some interesting guest import problems. :(
It goes beyond that: someone using any non-ASCII name will hit
hypervisor-specific behaviour (ISO-Latin, Asian languages ...), and we
have no control over this except for some checking and the possibility
of a warning.
I think any reasonable analysis of this should start with where the
names come from:
- virDomainDefineXML (eg. virsh define, virt-install, V2V import etc)
- a list of existing domains from a hypervisor API
(eg. /etc/xen files, Xen hypercall, ESX XMLRPC call)
- already defined in an older version of libvirt which didn't do checking
- [any others?]
For the virDomainDefineXML route, we (a) know the names are UTF-8, and
(b) know that these domains are being created for the first time. And
I think for this route we should add a regexp-like restriction. (Note
when I wrote [:alnum:] before, that ought to cover all Unicode
characters in the alphanumeric classes, so it doesn't exclude non-US
characters).
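
As a rough illustration of what such a restriction on the define path
might look like, here is a minimal Python sketch. The function name
is_simple_domain_name is hypothetical and not part of libvirt. It
approximates ^[[:alpha:]][-_[:alnum:]]*$ using Python's Unicode-aware
str.isalpha() and str.isalnum(), which may not classify every character
the same way a given POSIX locale would.

```python
def is_simple_domain_name(name: str) -> bool:
    """Hypothetical check approximating ^[[:alpha:]][-_[:alnum:]]*$.

    Assumes `name` has already been decoded from UTF-8 into a str.
    """
    if not name:
        return False
    # First character must be alphabetic (Unicode-aware).
    if not name[0].isalpha():
        return False
    # Remaining characters: alphanumeric, hyphen, or underscore.
    return all(ch.isalnum() or ch in "-_" for ch in name[1:])


if __name__ == "__main__":
    # "G\u00e4st_2" contains a precomposed a-umlaut, a non-US letter.
    for candidate in ["guest-01", "G\u00e4st_2", "1guest", "a/b", ""]:
        print(repr(candidate), is_simple_domain_name(candidate))
```

One caveat with this sketch: a name written with combining marks (for
example a plain "a" followed by a combining diaeresis) would be rejected,
so a real check would probably want to normalise the string first.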
There are further points that may need to be fixed within the drivers.
The drivers are probably just passing the UTF-8 strings through to
everything, but may need to do conversion. eg. If I've learned
anything about Microsoft developers, then a hypothetical Hyper-V
driver would almost certainly need to convert between UTF-8 and
UTF-16LE.
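
For what it's worth, the conversion itself is simple. A sketch in Python,
with hypothetical helper names (no such Hyper-V driver code is implied),
assuming the hypervisor side wants UTF-16LE without a byte order mark:

```python
def utf8_to_utf16le(name_utf8: bytes) -> bytes:
    """Hypothetical helper: re-encode a UTF-8 name as UTF-16LE (no BOM)."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(name_utf16: bytes) -> bytes:
    """Hypothetical helper: re-encode a UTF-16LE name as UTF-8."""
    return name_utf16.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    original = "G\u00e4st_2".encode("utf-8")
    wide = utf8_to_utf16le(original)
    # Each of these characters takes two bytes in UTF-16LE.
    print(len(original), len(wide))
    # Converting back should give the original bytes.
    print(utf16le_to_utf8(wide) == original)
```

The harder part is making sure each driver does this consistently at
every boundary, rather than passing the bytes through untouched.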
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v