
On Mon, Oct 04, 2010 at 08:38:57AM +0200, Daniel Veillard wrote:
> On Sun, Oct 03, 2010 at 11:51:12PM +1100, Justin Clift wrote:
> > On 10/03/2010 08:33 PM, Richard W.M. Jones wrote:
> > <snip>
> > > Indeed. I'm sure we need a whitelist, not a blacklist as suggested by the other comment. All domains I'd ever want to create would match the regexp
> > >
> > >   ^[[:alpha:]][-_[:alnum:]]*$
> > >
> > > This might break existing users however.
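The regexp above uses POSIX bracket classes, which Python's `re` module does not support. Here is a minimal Python sketch of the same whitelist. It assumes the name is already a decoded `str` (e.g. taken from UTF-8 XML) and approximates `[[:alpha:]]` with `[^\W\d_]` and `[-_[:alnum:]]` with `[\w-]`. The helper `is_simple_name` is hypothetical and not a libvirt API:

```python
import re

# Approximation of ^[[:alpha:]][-_[:alnum:]]*$ for Unicode str input:
#   [^\W\d_]  a word character that is not a decimal digit or underscore
#             (roughly [[:alpha:]], including non-ASCII letters)
#   [\w-]     Unicode letters/digits, underscore, or a literal hyphen
#             (roughly [-_[:alnum:]])
_SIMPLE_NAME = re.compile(r"[^\W\d_][\w-]*")


def is_simple_name(name: str) -> bool:
    """Return True if name matches the proposed whitelist (hypothetical helper)."""
    # fullmatch() anchors at both ends; unlike a bare '$', it does not
    # accept a trailing newline.
    return _SIMPLE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["fedora-14_x86", "Gäste", "14-guest", "web/db", ""]:
        print(f"{candidate!r}: {is_simple_name(candidate)}")
```

Because Python's `\w` is Unicode-aware for `str` patterns, a name like "Gäste" passes, while names starting with a digit or containing '/' are rejected.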
> > Wonder if there are characters supported by some hypervisors, but not others?
>
> I remember we had trouble with Xen a long time ago, yes. So unfortunately this is really hypervisor specific... Maybe we could have a generic checking routine that only emits a warning when the name isn't a simple name in the XML sense. One of the problems with checking, too, is that most hypervisor APIs say nothing about encoding, so you're not manipulating characters but 0-terminated byte strings. From there even your simple regexp breaks down, because deciding what is an alphanumeric character requires character analysis, and for that you need the encoding. At least at the libvirt API level things are rather clear: in XML data there is no possible ambiguity, and outside of it we expect strings to be UTF-8. Actually, I think that for ESX, since all exchanges with the hypervisor are XML based, at least there isn't that ambiguity about encoding.
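A generic routine that warns instead of rejecting could look like this minimal Python sketch. It assumes the name has already been decoded to text. It also uses an approximation of Rich's whitelist as the definition of a "simple" name, standing in for the broader XML Name production. The function `check_domain_name` and the logger name are hypothetical, not libvirt APIs:

```python
import logging
import re

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("name-check")

# Approximation of ^[[:alpha:]][-_[:alnum:]]*$ for Unicode str input.
_SIMPLE_NAME = re.compile(r"[^\W\d_][\w-]*")


def check_domain_name(name: str) -> bool:
    """Warn, but never reject, when a name is not a 'simple' name.

    Returns True if the name is simple and False otherwise; the caller
    is expected to proceed either way.
    """
    if _SIMPLE_NAME.fullmatch(name):
        return True
    logger.warning(
        "domain name %r is not a simple name; behaviour may be "
        "hypervisor specific", name)
    return False
```

Because the routine returns a boolean instead of raising, existing domains with unusual names keep working. That sidesteps the concern that a hard whitelist might break existing users.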
> > i.e. maybe Xen supports '/', '*', '+' in guest names, but ESX doesn't.
> > That could lead to some interesting guest import problems. :(
>
> It goes beyond that: someone using any non-ASCII name will hit hypervisor-specific behaviour (ISO-Latin, Asian languages, ...), and we have no control over this except for some checking and the possibility of a warning.
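The following minimal Python sketch shows why the check depends on the encoding. The same byte string can pass, fail, or not decode at all, depending on whether it is read as UTF-8 or ISO-8859-1 (Latin-1). It reuses an approximation of the regexp above. The helper `classify` is hypothetical:

```python
import re

# Approximation of ^[[:alpha:]][-_[:alnum:]]*$ for decoded text.
_SIMPLE_NAME = re.compile(r"[^\W\d_][\w-]*")


def classify(raw: bytes, encoding: str) -> str:
    """Decode a raw name with an assumed encoding, then apply the whitelist."""
    # Hypervisor APIs may hand back NUL-terminated C strings; keep only
    # the bytes before the first NUL.
    raw = raw.split(b"\0", 1)[0]
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return "undecodable"
    return "simple" if _SIMPLE_NAME.fullmatch(text) else "not simple"


utf8_bytes = "Gäste".encode("utf-8")      # b'G\xc3\xa4ste'
latin1_bytes = "Gäste".encode("latin-1")  # b'G\xe4ste'

print(classify(utf8_bytes, "utf-8"))      # simple
print(classify(utf8_bytes, "latin-1"))    # not simple: decodes to 'GÃ¤ste'
print(classify(latin1_bytes, "utf-8"))    # undecodable
print(classify(latin1_bytes, "latin-1"))  # simple
```

The second case fails because the UTF-8 bytes for "ä", read as Latin-1, produce "Ã¤", and "¤" (U+00A4, CURRENCY SIGN) is not alphanumeric.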
I think any reasonable analysis of this should start with where the names come from:

 - virDomainDefineXML (e.g. virsh define, virt-install, V2V import etc.)
 - a list of existing domains from a hypervisor API (e.g. /etc/xen files, Xen hypercall, ESX XMLRPC call)
 - already defined in an older version of libvirt which didn't do checking
 - [any others?]

For the virDomainDefineXML route, we (a) know the names are UTF-8, and (b) know that these domains are being created for the first time. And I think for this route we should add a regexp-like restriction. (Note that when I wrote [:alnum:] before, that ought to cover all Unicode characters in the alphanumeric classes, so it doesn't exclude non-US characters.)

There are further points that may need to be fixed within the drivers. The drivers are probably just passing the UTF-8 strings through to everything, but may need to do conversion. E.g. if I've learned anything about Microsoft developers, then a hypothetical Hyper-V driver would almost certainly need to convert between UTF-8 and UTF-16LE.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v