[libvirt] PATCH: Support container XML enhancements

This is something I previously submitted as part of one of the LXC patches,
but I figure it makes sense on its own, since OpenVZ needs this now too.

This adds two new XML elements to the domain XML format:

 - An <init> block within <os> allowing specification of the path for
   a binary to run when starting the container - aka 'init' by any other
   name. First we also specify that all containers will use an OS type
   of 'exe' - as in executable - the container equivalent of 'hvm'

     <os>
       <type>exe</type>
       <init>/sbin/init</init>
     </os>

 - A <filesystem> element for specifying how the container's filesystem
   is to be provided. This can actually be useful for full-machine virt
   too, such as KVM, which has host filesystem pass-through. There are
   various ways to configure it:

   eg to use '/some/directory' as the root filesystem for a container

     <filesystem type='mount'>
       <source dir='/some/directory'/>
       <target dir='/'/>
     </filesystem>

   eg to use a template called 'fedora9web' as the root filesystem for
   a container

     <filesystem type='template'>
       <source name='fedora9web'/>
       <target dir='/'/>
     </filesystem>

   eg to use a file containing a filesystem as the root filesystem

     <filesystem type='file'>
       <source file='/some/file.img'/>
       <target dir='/'/>
     </filesystem>

   eg to use a disk partition or other block device (eg LVM) containing
   a filesystem as the root filesystem

     <filesystem type='block'>
       <source dev='/dev/VolGroup00/Fedora9Web'/>
       <target dir='/'/>
     </filesystem>

   If setting the root filesystem, the target path will be '/'. Some
   container-based virt allows the host OS root filesystem to be used in
   the guest, and merely specifies additive mounts at specific locations,
   eg to override just /home within a container

     <filesystem type='mount'>
       <source dir='/some/directory'/>
       <target dir='/home'/>
     </filesystem>

I believe this should satisfy all the OpenVZ, LXC and Linux-VServer
drivers' requirements around filesystems

Daniel

diff -r 5614da5fe9ef src/domain_conf.c
--- a/src/domain_conf.c Fri Jul 25 15:38:46 2008 +0100
+++ b/src/domain_conf.c Tue Jul 29 16:03:38 2008 +0100
@@ -86,6 +86,12 @@
               "virtio",
               "xen")
 
+VIR_ENUM_IMPL(virDomainFS, VIR_DOMAIN_FS_TYPE_LAST,
+              "mount",
+              "block",
+              "file",
+              "template")
+
 VIR_ENUM_IMPL(virDomainNet, VIR_DOMAIN_NET_TYPE_LAST,
               "user",
               "ethernet",
@@ -234,6 +240,18 @@
     VIR_FREE(def->driverType);
 
     virDomainDiskDefFree(def->next);
+    VIR_FREE(def);
+}
+
+void virDomainFSDefFree(virDomainFSDefPtr def)
+{
+    if (!def)
+        return;
+
+    VIR_FREE(def->src);
+    VIR_FREE(def->dst);
+
+    virDomainFSDefFree(def->next);
     VIR_FREE(def);
 }
 
@@ -345,6 +363,7 @@
     virDomainGraphicsDefFree(def->graphics);
     virDomainInputDefFree(def->inputs);
     virDomainDiskDefFree(def->disks);
+    virDomainFSDefFree(def->fss);
     virDomainNetDefFree(def->nets);
     virDomainChrDefFree(def->serials);
     virDomainChrDefFree(def->parallels);
@@ -355,6 +374,7 @@
     VIR_FREE(def->os.type);
     VIR_FREE(def->os.arch);
     VIR_FREE(def->os.machine);
+    VIR_FREE(def->os.init);
     VIR_FREE(def->os.kernel);
     VIR_FREE(def->os.initrd);
     VIR_FREE(def->os.cmdline);
@@ -620,6 +640,89 @@
 
  error:
     virDomainDiskDefFree(def);
+    def = NULL;
+    goto cleanup;
+}
+
+
+/* Parse the XML definition for a disk
+ * @param node XML nodeset to parse for disk definition
+ */
+static virDomainFSDefPtr
+virDomainFSDefParseXML(virConnectPtr conn,
+                       xmlNodePtr node) {
+    virDomainFSDefPtr def;
+    xmlNodePtr cur;
+    char *type = NULL;
+    char *source = NULL;
+    char *target = NULL;
+
+    if (VIR_ALLOC(def) < 0) {
+        virDomainReportError(conn, VIR_ERR_NO_MEMORY, NULL);
+        return NULL;
+    }
+
+    type = virXMLPropString(node, "type");
+    if (type) {
+        if ((def->type = virDomainFSTypeFromString(type)) < 0) {
+            virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                                 _("unknown filesystem type '%s'"), type);
+            goto error;
+        }
+    } else {
+        def->type = VIR_DOMAIN_FS_TYPE_MOUNT;
+    }
+
+    cur = node->children;
+    while (cur != NULL) {
+        if (cur->type == XML_ELEMENT_NODE) {
+            if ((source == NULL) &&
+                (xmlStrEqual(cur->name, BAD_CAST "source"))) {
+
+                if (def->type == VIR_DOMAIN_FS_TYPE_MOUNT)
+                    source = virXMLPropString(cur, "dir");
+                else if (def->type == VIR_DOMAIN_FS_TYPE_FILE)
+                    source = virXMLPropString(cur, "file");
+                else if (def->type == VIR_DOMAIN_FS_TYPE_BLOCK)
+                    source = virXMLPropString(cur, "dev");
+                else if (def->type == VIR_DOMAIN_FS_TYPE_TEMPLATE)
+                    source = virXMLPropString(cur, "name");
+            } else if ((target == NULL) &&
+                       (xmlStrEqual(cur->name, BAD_CAST "target"))) {
+                target = virXMLPropString(cur, "dir");
+            } else if (xmlStrEqual(cur->name, BAD_CAST "readonly")) {
+                def->readonly = 1;
+            }
+        }
+        cur = cur->next;
+    }
+
+    if (source == NULL) {
+        virDomainReportError(conn, VIR_ERR_NO_SOURCE,
+                             target ? "%s" : NULL, target);
+        goto error;
+    }
+
+    if (target == NULL) {
+        virDomainReportError(conn, VIR_ERR_NO_TARGET,
+                             source ? "%s" : NULL, source);
+        goto error;
+    }
+
+    def->src = source;
+    source = NULL;
+    def->dst = target;
+    target = NULL;
+
+cleanup:
+    VIR_FREE(type);
+    VIR_FREE(target);
+    VIR_FREE(source);
+
+    return def;
+
+ error:
+    virDomainFSDefFree(def);
     def = NULL;
     goto cleanup;
 }
@@ -1351,6 +1454,10 @@
         dev->type = VIR_DOMAIN_DEVICE_DISK;
         if (!(dev->data.disk = virDomainDiskDefParseXML(conn, node)))
             goto error;
+    } else if (xmlStrEqual(node->name, BAD_CAST "filesystem")) {
+        dev->type = VIR_DOMAIN_DEVICE_FS;
+        if (!(dev->data.fs = virDomainFSDefParseXML(conn, node)))
+            goto error;
     } else if (xmlStrEqual(node->name, BAD_CAST "interface")) {
         dev->type = VIR_DOMAIN_DEVICE_NET;
         if (!(dev->data.net = virDomainNetDefParseXML(conn, node)))
@@ -1560,7 +1667,21 @@
         }
     }
 
-    if (!def->os.bootloader) {
+    /*
+     * Booting options for different OS types....
+     *
+     * - A bootloader (and optional kernel+initrd) (xen)
+     * - A kernel + initrd (xen)
+     * - A boot device (and optional kernel+initrd) (hvm)
+     * - An init script (exe)
+     */
+
+    if (STREQ(def->os.type, "exe")) {
+        def->os.init = virXPathString(conn, "string(./os/init[1])", ctxt);
+    }
+
+    if (STREQ(def->os.type, "xen") ||
+        STREQ(def->os.type, "hvm")) {
         def->os.kernel = virXPathString(conn, "string(./os/kernel[1])", ctxt);
         def->os.initrd = virXPathString(conn, "string(./os/initrd[1])", ctxt);
         def->os.cmdline = virXPathString(conn, "string(./os/cmdline[1])", ctxt);
@@ -1610,12 +1731,9 @@
                                                         def->os.type,
                                                         def->os.arch,
                                                         type);
-        if (!emulator) {
-            virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
-                                 "%s", _("unsupported guest type"));
-            goto error;
-        }
-        if (!(def->emulator = strdup(emulator))) {
+
+        if (emulator &&
+            !(def->emulator = strdup(emulator))) {
             virDomainReportError(conn, VIR_ERR_NO_MEMORY, NULL);
             goto error;
         }
@@ -1648,6 +1766,23 @@
                 ptr = ptr->next;
             }
         }
+    }
+    VIR_FREE(nodes);
+
+    /* analysis of the filesystems */
+    if ((n = virXPathNodeSet(conn, "./devices/filesystem", ctxt, &nodes)) < 0) {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             "%s", _("cannot extract filesystem devices"));
+        goto error;
+    }
+    for (i = n - 1 ; i >= 0 ; i--) {
+        virDomainFSDefPtr fs = virDomainFSDefParseXML(conn,
+                                                      nodes[i]);
+        if (!fs)
+            goto error;
+
+        fs->next = def->fss;
+        def->fss = fs;
     }
     VIR_FREE(nodes);
 
@@ -2202,6 +2337,57 @@
 }
 
 static int
+virDomainFSDefFormat(virConnectPtr conn,
+                     virBufferPtr buf,
+                     virDomainFSDefPtr def)
+{
+    const char *type = virDomainFSTypeToString(def->type);
+
+    if (!type) {
+        virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+                             _("unexpected filesystem type %d"), def->type);
+        return -1;
+    }
+
+    virBufferVSprintf(buf,
+                      " <filesystem type='%s'>\n",
+                      type);
+
+    if (def->src) {
+        switch (def->type) {
+        case VIR_DOMAIN_FS_TYPE_MOUNT:
+            virBufferEscapeString(buf, " <source dir='%s'/>\n",
+                                  def->src);
+            break;
+
+        case VIR_DOMAIN_FS_TYPE_BLOCK:
+            virBufferEscapeString(buf, " <source dev='%s'/>\n",
+                                  def->src);
+            break;
+
+        case VIR_DOMAIN_FS_TYPE_FILE:
+            virBufferEscapeString(buf, " <source file='%s'/>\n",
+                                  def->src);
+            break;
+
+        case VIR_DOMAIN_FS_TYPE_TEMPLATE:
+            virBufferEscapeString(buf, " <source name='%s'/>\n",
+                                  def->src);
+        }
+    }
+
+    virBufferVSprintf(buf, " <target dir='%s'/>\n",
+                      def->dst);
+
+    if (def->readonly)
+        virBufferAddLit(buf, " <readonly/>\n");
+
+    virBufferAddLit(buf, " </filesystem>\n");
+
+    return 0;
+}
+
+static int
 virDomainNetDefFormat(virConnectPtr conn,
                       virBufferPtr buf,
                       virDomainNetDefPtr def)
@@ -2479,6 +2665,7 @@
     unsigned char *uuid;
     char uuidstr[VIR_UUID_STRING_BUFLEN];
     virDomainDiskDefPtr disk;
+    virDomainFSDefPtr fs;
     virDomainNetDefPtr net;
     virDomainSoundDefPtr sound;
     virDomainInputDefPtr input;
@@ -2548,6 +2735,9 @@
     else
         virBufferVSprintf(&buf, ">%s</type>\n", def->os.type);
 
+    if (def->os.init)
+        virBufferEscapeString(&buf, " <init>%s</init>\n",
+                              def->os.init);
     if (def->os.loader)
         virBufferEscapeString(&buf, " <loader>%s</loader>\n",
                               def->os.loader);
@@ -2621,6 +2811,13 @@
         if (virDomainDiskDefFormat(conn, &buf, disk) < 0)
             goto cleanup;
         disk = disk->next;
+    }
+
+    fs = def->fss;
+    while (fs) {
+        if (virDomainFSDefFormat(conn, &buf, fs) < 0)
+            goto cleanup;
+        fs = fs->next;
     }
 
     net = def->nets;
diff -r 5614da5fe9ef src/domain_conf.h
--- a/src/domain_conf.h Fri Jul 25 15:38:46 2008 +0100
+++ b/src/domain_conf.h Tue Jul 29 16:03:38 2008 +0100
@@ -91,6 +91,28 @@
     unsigned int shared : 1;
 
     virDomainDiskDefPtr next;
+};
+
+
+/* Two types of disk backends */
+enum virDomainFSType {
+    VIR_DOMAIN_FS_TYPE_MOUNT, /* Better named 'bind' */
+    VIR_DOMAIN_FS_TYPE_BLOCK,
+    VIR_DOMAIN_FS_TYPE_FILE,
+    VIR_DOMAIN_FS_TYPE_TEMPLATE,
+
+    VIR_DOMAIN_FS_TYPE_LAST
+};
+
+typedef struct _virDomainFSDef virDomainFSDef;
+typedef virDomainFSDef *virDomainFSDefPtr;
+struct _virDomainFSDef {
+    int type;
+    char *src;
+    char *dst;
+    unsigned int readonly : 1;
+
+    virDomainFSDefPtr next;
 };
 
 
@@ -262,6 +284,7 @@
 /* Flags for the 'type' field in next struct */
 enum virDomainDeviceType {
     VIR_DOMAIN_DEVICE_DISK,
+    VIR_DOMAIN_DEVICE_FS,
     VIR_DOMAIN_DEVICE_NET,
     VIR_DOMAIN_DEVICE_INPUT,
     VIR_DOMAIN_DEVICE_SOUND,
@@ -273,6 +296,7 @@
     int type;
     union {
         virDomainDiskDefPtr disk;
+        virDomainFSDefPtr fs;
         virDomainNetDefPtr net;
         virDomainInputDefPtr input;
         virDomainSoundDefPtr sound;
@@ -318,6 +342,7 @@
     char *machine;
     int nBootDevs;
     int bootDevs[VIR_DOMAIN_BOOT_LAST];
+    char *init;
     char *kernel;
     char *initrd;
     char *cmdline;
@@ -357,6 +382,7 @@
 
     virDomainGraphicsDefPtr graphics;
     virDomainDiskDefPtr disks;
+    virDomainFSDefPtr fss;
     virDomainNetDefPtr nets;
     virDomainInputDefPtr inputs;
     virDomainSoundDefPtr sounds;
@@ -411,6 +437,7 @@
 void virDomainGraphicsDefFree(virDomainGraphicsDefPtr def);
 void virDomainInputDefFree(virDomainInputDefPtr def);
 void virDomainDiskDefFree(virDomainDiskDefPtr def);
+void virDomainFSDefFree(virDomainFSDefPtr def);
 void virDomainNetDefFree(virDomainNetDefPtr def);
 void virDomainChrDefFree(virDomainChrDefPtr def);
 void virDomainSoundDefFree(virDomainSoundDefPtr def);
@@ -481,6 +508,7 @@
 VIR_ENUM_DECL(virDomainDisk)
 VIR_ENUM_DECL(virDomainDiskDevice)
 VIR_ENUM_DECL(virDomainDiskBus)
+VIR_ENUM_DECL(virDomainFS)
 VIR_ENUM_DECL(virDomainNet)
 VIR_ENUM_DECL(virDomainChr)
 VIR_ENUM_DECL(virDomainSoundModel)

-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
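
The per-type <source> attribute used by the parser and formatter in the patch above (mount uses 'dir', block uses 'dev', file uses 'file', template uses 'name') can be sketched in isolation. This is a minimal standalone illustration, not libvirt code: the names fs_type, fs_type_from_string, fs_source_attr and fs_format are hypothetical, plain snprintf stands in for the virBuffer helpers, and no XML escaping is done.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's virDomainFSType values. */
enum fs_type {
    FS_TYPE_MOUNT,
    FS_TYPE_BLOCK,
    FS_TYPE_FILE,
    FS_TYPE_TEMPLATE,
    FS_TYPE_LAST
};

/* Type names, in enum order, as in the VIR_ENUM_IMPL table of the patch. */
static const char *const fs_type_names[FS_TYPE_LAST] = {
    "mount", "block", "file", "template"
};

/* Attribute on <source> that carries the location for each type. */
static const char *const fs_source_attrs[FS_TYPE_LAST] = {
    "dir", "dev", "file", "name"
};

/* Map a type string to its enum value. A missing type defaults to
 * 'mount', as the parser in the patch does. Returns -1 if unknown. */
static int fs_type_from_string(const char *type)
{
    if (type == NULL)
        return FS_TYPE_MOUNT;
    for (int i = 0; i < FS_TYPE_LAST; i++) {
        if (strcmp(type, fs_type_names[i]) == 0)
            return i;
    }
    return -1;
}

/* Return the <source> attribute name for a type, or NULL if out of range. */
static const char *fs_source_attr(int type)
{
    if (type < 0 || type >= FS_TYPE_LAST)
        return NULL;
    return fs_source_attrs[type];
}

/* Write a <filesystem> element into buf (no XML escaping in this sketch).
 * Returns 0 on success, -1 on an invalid type or if buf is too small. */
static int fs_format(char *buf, size_t len, int type,
                     const char *src, const char *dst, int readonly)
{
    assert(buf != NULL && src != NULL && dst != NULL);

    const char *attr = fs_source_attr(type);
    if (attr == NULL)
        return -1;

    int n = snprintf(buf, len,
                     "<filesystem type='%s'>\n"
                     "  <source %s='%s'/>\n"
                     "  <target dir='%s'/>\n"
                     "%s"
                     "</filesystem>\n",
                     fs_type_names[type], attr, src, dst,
                     readonly ? "  <readonly/>\n" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling fs_format(buf, sizeof buf, FS_TYPE_MOUNT, "/some/directory", "/home", 0) would produce the additive /home mount from the examples. Keeping the attribute names in one table, rather than in separate if/else and switch chains, is only a design choice of this sketch.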

Daniel P. Berrange wrote:
> eg to use a template called 'fedora9web' as the root filesystem for a container
> <filesystem type='template'> <source name='fedora9web'/> <target dir='/'/> </filesystem>

Daniel, OpenVZ also requires quota tags:

  <quota type="size" max="10000"/>
  <quota type="inodes" max="200000"/>

On Tue, Jul 29, 2008 at 07:33:29PM +0400, Evgeniy Sokolov wrote:
> Daniel P. Berrange wrote:
> > eg to use a template called 'fedora9web' as the root filesystem for a container
> > <filesystem type='template'> <source name='fedora9web'/> <target dir='/'/> </filesystem>
> Daniel, OpenVZ also require quota tags
> <quota type="size" max="10000"/> <quota type="inodes" max="200000"/>

I'd like to deal with those in a separate patch from the core filesystem functionality support. Since they're just a tuning parameter, it isn't critical to have them supported immediately.

Daniel

On Tue, Jul 29, 2008 at 04:20:14PM +0100, Daniel P. Berrange wrote:
> This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.

No update to libvirt.rng? Or test cases?

> <os> <type>exe</type> <init>/sbin/init</init> </os>

Will this be optional? Whilst it doesn't exist in any form now, Solaris Zones has no such concept (it's /always/ init, but it's also completely private to the implementation and should be exposed).

> <filesystem type='mount'> <source dir='/some/directory'/> <target dir='/'/> </filesystem>

No facility for mount options? I don't think any of the stated options would work for ZFS dataset delegation, though I suppose that could be added later if it happens.

What is template?

regards
john

On Tue, Jul 29, 2008 at 04:44:32PM +0100, John Levon wrote:
> On Tue, Jul 29, 2008 at 04:20:14PM +0100, Daniel P. Berrange wrote:
> > This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.
> No update to libvirt.rng? Or test cases?

Mmm, yes, we need the RNG file at least. We need to have a generic test suite for validating the RNG against the XML files in tests/, as well as ad-hoc XML files we may add.

> > <os> <type>exe</type> <init>/sbin/init</init> </os>
> Will this be optional? Whilst it doesn't exist in any form now, Solaris Zones has no such concept (it's /always/ init, but it's also completely private to the implementation and should be exposed).

Yes, it is completely optional - it's not required upon input. If the driver has a default value it wants to expose, it can set it, so it's visible upon XML dump.

> > <filesystem type='mount'> <source dir='/some/directory'/> <target dir='/'/> </filesystem>
> No facility for mount options?

Not until we have a concrete need for them in one of the drivers. We can add them as attributes on the <source> tag, or child elements, as we find need for them.

> I don't think any of the stated options would work for ZFS dataset delegation, though I suppose that could be added later if it happens.

What is ZFS dataset delegation?

> What is template?

Templates are a concept OpenVZ / VServer have to simplify management of container filesystems. Basically a 'canned' filesystem image that is instantiated for each container on demand, eg a tar.gz containing the FS, which is extracted once for each container using it.

Daniel
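
Going back to the point in this reply that <init> is optional on input and that a driver may fill in its own default for the XML dump, that behaviour can be sketched as below. This is only an illustration under stated assumptions: struct os_def, os_uses_init and os_effective_init are hypothetical names, not part of the patch or of libvirt, and the choice of default is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down view of the <os> fields discussed in the thread. */
struct os_def {
    const char *type;   /* "exe", "xen" or "hvm" */
    const char *init;   /* value of <init>, or NULL when it was omitted */
};

/* Only the 'exe' (container) OS type uses an init binary. */
static int os_uses_init(const struct os_def *os)
{
    return os->type != NULL && strcmp(os->type, "exe") == 0;
}

/* Return the init path a driver would run and expose in the XML dump:
 * the configured <init> when present, otherwise the driver's default.
 * Returns NULL for OS types that do not use an init binary. */
static const char *os_effective_init(const struct os_def *os,
                                     const char *driver_default)
{
    assert(os != NULL);

    if (!os_uses_init(os))
        return NULL;
    return os->init != NULL ? os->init : driver_default;
}
```

With a definition that has type 'exe' and no <init>, os_effective_init(&def, "/sbin/init") would return the driver default; an explicit <init> always wins.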

On Tue, Jul 29, 2008 at 04:54:09PM +0100, Daniel P. Berrange wrote:
> > > This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.
> > No update to libvirt.rng? Or test cases?
> Mmm, yes need the RNG file at least. We need to have a generic test suite for validating the RNG against the XML files in tests/, as well as ad-hoc XML files we may add

Right. I have a small virt-convert test suite, but of course that uses Python. But it's basically just doing

  xmllint --noout --relaxng doc/libvirt.rng foo.xml

I'm a little short on time to do the same for libvirt right now, alas.

> > I don't think any of the stated options would work for ZFS dataset delegation, though I suppose that could be added later if it happens.
> What is ZFS dataset delegation ?

It allows you to place an entire ZFS hierarchy under control of a zone, e.g. I can say that the ZFS dataset "export/foo" is accessible to the zone and it can freely create sub-filesystems, snapshot, etc. It could almost hijack one of the other types if it weren't for the absolute path thing.

regards
john

On Tue, Jul 29, 2008 at 05:15:02PM +0100, John Levon wrote:
> On Tue, Jul 29, 2008 at 04:54:09PM +0100, Daniel P. Berrange wrote:
> > > I don't think any of the stated options would work for ZFS dataset delegation, though I suppose that could be added later if it happens.
> > What is ZFS dataset delegation ?
> It allows you to place an entire ZFS hierarchy under control of a zone, e.g. I can say that the ZFS dataset "export/foo" is accessible to the zone and it can freely create sub-filesystems, snapshot, etc. It could almost hijack one of the other types if it weren't for the absolute path thing.

So does 'export/foo' become the root filesystem (/) of the zone? Or is it sharing the same root filesystem, and more akin to granting permissions over 'export/foo'? I'm not against adding other types if it doesn't fit the model of any others I've suggested.

Daniel

On Tue, Jul 29, 2008 at 05:56:38PM +0100, Daniel P. Berrange wrote:
> > It allows you to place an entire ZFS hierarchy under control of a zone, e.g. I can say that the ZFS dataset "export/foo" is accessible to the zone and it can freely create sub-filesystems, snapshot, etc. It could almost hijack one of the other types if it weren't for the absolute path thing.
> So does 'export/foo' become the root filesystem (/) of the zone ? Or is

Nope, it's just available (typically it'd get mounted as /export/foo in the zone). You can do the same with the root filesystem of course, but then the config could use an absolute path, since it (called 'zonepath') does have to be absolute, essentially.

> over 'export/foo' ? I'm not against to adding other types if it doesn't fit the model of any others I've suggested.

OK

regards
john

On Tue, Jul 29, 2008 at 04:44:32PM +0100, John Levon wrote:
> On Tue, Jul 29, 2008 at 04:20:14PM +0100, Daniel P. Berrange wrote:
> > This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.
> No update to libvirt.rng? Or test cases?

I added the libvirt.rng update and a test XML data file to validate when I committed this patch.

Daniel

On Tue, Jul 29, 2008 at 04:20:14PM +0100, Daniel P. Berrange wrote:
> This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.
> This adds two new XML elements to the domain XML format:

Yep, this was discussed at the time when the old OpenVZ format was on the table. Looks fine to me.

> - An <init> block within <os> allowing specification of the path for a binary to run when starting the container - aka 'init' by any other [...]
> - A <filesystem> element for specifying how the container's filesystem is to be provided. This can actually be useful for full-machine virt [...]
> eg to use a template called 'fedora9web' as the root filesystem for a container
> <filesystem type='template'> <source name='fedora9web'/> <target dir='/'/> </filesystem>

I think the template type is the only concept which is really OpenVZ-specific (so far); everything else makes sense on a more general basis, as long as the hypervisor exposes a notion of filesystem and not just block devices.

[...]

> I believe this should satisfy all the OpenVZ, LXC and Linux-VServer drivers' requirements around filesystems

+1 from me. The RNG should really be extended to support it, as John pointed out; we should really try to keep it in sync with the parsing routines now that they are centralized.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, Jul 30, 2008 at 05:04:25AM -0400, Daniel Veillard wrote:
> On Tue, Jul 29, 2008 at 04:20:14PM +0100, Daniel P. Berrange wrote:
> > This is something I previously submitted as part of one of the LXC patches, but I figure it makes sense on its own, since OpenVZ needs this now too.
> > This adds two new XML elements to the domain XML format:
> Yep this was discussed at the time when the old OpenVZ format was on the table. Looks fine to me
> > - An <init> block within <os> allowing specification of the path for a binary to run when starting the container - aka 'init' by any other [...]
> > - A <filesystem> element for specifying how the container's filesystem is to be provided. This can actually be useful for full-machine virt [...]
> > eg to use a template called 'fedora9web' as the root filesystem for a container
> > <filesystem type='template'> <source name='fedora9web'/> <target dir='/'/> </filesystem>
> I think the template type is the only concept which is really OpenVZ specific (so far) everything else makes sense on a more general basis as long as the hypervisor exposes a notion of filesystem and not just block devices.

I believe Linux-VServer uses this template capability as well.

Daniel
participants (4)
 - Daniel P. Berrange
 - Daniel Veillard
 - Evgeniy Sokolov
 - John Levon