[libvirt] [PATCH 0/7] Support NBD volumes with LXC containers

This series adds support for using non-raw disks with LXC containers by leverging NBD + qemu-nbd as an alternative to the loopback device.

From: "Daniel P. Berrange" <berrange@redhat.com> Currently the LXC controller creates the cgroup, configures the resources and adds the task all in one go. This is not sufficiently flexible for the forthcoming NBD integration. We need to make sure the NBD process gets into the right cgroup immediately, but we can not have limits (in particular the device ACL) applied at the point where we start qemu-nbd. So create a virLXCCgroupCreate method which creates the cgroup and adds the current ask to be called early, and leave virLXCCgroupSetup to only do resource config. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 39 +++++++++++++++++++++++++++------------ src/lxc/lxc_cgroup.h | 4 +++- src/lxc/lxc_controller.c | 12 +++++++++--- 3 files changed, 39 insertions(+), 16 deletions(-) diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c index a075335..fa47229 100644 --- a/src/lxc/lxc_cgroup.c +++ b/src/lxc/lxc_cgroup.c @@ -472,7 +472,7 @@ cleanup: } -int virLXCCgroupSetup(virDomainDefPtr def) +virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def) { virCgroupPtr driver = NULL; virCgroupPtr cgroup = NULL; @@ -494,6 +494,32 @@ int virLXCCgroupSetup(virDomainDefPtr def) goto cleanup; } + rc = virCgroupAddTask(cgroup, getpid()); + if (rc != 0) { + virReportSystemError(-rc, + _("Unable to add task %d to cgroup for domain %s"), + getpid(), def->name); + goto cleanup; + } + + ret = 0; + +cleanup: + virCgroupFree(&driver); + if (ret < 0) { + virCgroupFree(&cgroup); + return NULL; + } + + return cgroup; +} + + +int virLXCCgroupSetup(virDomainDefPtr def, + virCgroupPtr cgroup) +{ + int ret = -1; + if (virLXCCgroupSetupCpuTune(def, cgroup) < 0) goto cleanup; @@ -506,19 +532,8 @@ int virLXCCgroupSetup(virDomainDefPtr def) if (virLXCCgroupSetupDeviceACL(def, cgroup) < 0) goto cleanup; - rc = virCgroupAddTask(cgroup, getpid()); - if (rc != 0) { - virReportSystemError(-rc, - _("Unable to add task %d to cgroup for domain %s"), - getpid(), def->name); - goto cleanup; - } - ret = 0; cleanup: - virCgroupFree(&cgroup); - virCgroupFree(&driver); - return ret; } diff --git a/src/lxc/lxc_cgroup.h b/src/lxc/lxc_cgroup.h index fff554b..18f54e6 100644 --- a/src/lxc/lxc_cgroup.h +++ b/src/lxc/lxc_cgroup.h @@ -26,7 +26,9 @@ # include "lxc_fuse.h" # include "virusb.h" -int virLXCCgroupSetup(virDomainDefPtr def); +virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def); +int virLXCCgroupSetup(virDomainDefPtr def, + virCgroupPtr cgroup); int virLXCCgroupGetMeminfo(virLXCMeminfoPtr meminfo); int diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index becf811..1508b9c 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -628,7 +628,8 @@ static int virLXCControllerSetupCpuAffinity(virLXCControllerPtr ctrl) * * Returns 0 on success or -1 in case of error */ -static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl) +static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl, + virCgroupPtr cgroup) { if (virLXCControllerSetupCpuAffinity(ctrl) < 0) @@ -637,7 +638,7 @@ static int virLXCControllerSetupResourceLimits(virLXCControllerPtr ctrl) if (virLXCControllerSetupNUMAPolicy(ctrl) < 0) return -1; - return virLXCCgroupSetup(ctrl->def); + return virLXCCgroupSetup(ctrl->def, cgroup); } @@ -1473,6 +1474,7 @@ virLXCControllerRun(virLXCControllerPtr ctrl) int containerhandshake[2] = { -1, -1 }; char **containerTTYPaths = NULL; size_t i; + virCgroupPtr cgroup = NULL; if (VIR_ALLOC_N(containerTTYPaths, ctrl->nconsoles) < 0) { virReportOOMError(); @@ -1494,10 +1496,13 @@ virLXCControllerRun(virLXCControllerPtr ctrl) if (virLXCControllerSetupPrivateNS() < 0) goto cleanup; + if (!(cgroup = virLXCCgroupCreate(ctrl->def))) + goto cleanup; + if (virLXCControllerSetupLoopDevices(ctrl) < 0) goto cleanup; - if (virLXCControllerSetupResourceLimits(ctrl) < 0) + if (virLXCControllerSetupResourceLimits(ctrl, cgroup) < 0) goto cleanup; if (virLXCControllerSetupDevPTS(ctrl) < 0) @@ -1570,6 +1575,7 @@ cleanup: VIR_FREE(containerTTYPaths[i]); VIR_FREE(containerTTYPaths); + virCgroupFree(&cgroup); virLXCControllerStopInit(ctrl); return rc; -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Currently the LXC controller creates the cgroup, configures the resources and adds the task all in one go. This is not sufficiently flexible for the forthcoming NBD integration. We need to make sure the NBD process gets into the right cgroup immediately, but we can not have limits (in particular the device ACL) applied at the point where we start qemu-nbd. So create a virLXCCgroupCreate method which creates the cgroup and adds the current ask to be called
s/ask/task
early, and leave virLXCCgroupSetup to only do resource config.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_cgroup.c | 39 +++++++++++++++++++++++++++------------ src/lxc/lxc_cgroup.h | 4 +++- src/lxc/lxc_controller.c | 12 +++++++++--- 3 files changed, 39 insertions(+), 16 deletions(-)
ACK - seems reasonable to me.

From: "Daniel P. Berrange" <berrange@redhat.com> Extend the <driver> element in filesystem devices to allow a storage format to be set. The new attribute uses 'format' to reflect the storage format. This is different from the <driver> element in disk devices which use 'type' to reflect the storage format. THis is because the 'type' attribute on filesystem devices is already used for the driver backend, for which the disk devices use the 'name' attribte. Arggggh. Anyway for disks we have <driver name="qemu" type="raw"/> And for filesystems this change means we now have <driver type="loop" format="raw"/> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- docs/formatdomain.html.in | 24 ++++++++++++++ docs/schemas/domaincommon.rng | 73 ++++++++++++++++++++++++++++++------------- src/conf/domain_conf.c | 25 ++++++++++++--- src/conf/domain_conf.h | 6 ++-- src/qemu/qemu_command.c | 3 +- 5 files changed, 102 insertions(+), 29 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1ef80b0..c0a65a5 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1811,6 +1811,13 @@ <target dir='/import/from/host'/> <readonly/> </filesystem> + <filesystem type='file' accessmode='passthrough'> + <driver name='loop' type='raw'/> + <driver type='path' wrpolicy='immediate'/> + <source file='/export/to/guest.img'/> + <target dir='/import/from/host'/> + <readonly/> + </filesystem> ... </devices> ...</pre> @@ -1902,6 +1909,23 @@ </dd> + <dt><code>driver</code></dt> + <dd> + The optional driver element allows specifying further details + related to the hypervisor driver used to provide the filesystem. + <span class="since">Since 1.0.4</span> + <ul> + <li> + If the hypervisor supports multiple backend drivers, then + the <code>type</code> attribute selects the primary + backend driver name, while the <code>format</code> + attribute provides the format type. For example, LXC + supports a type of "loop", with a format of "raw". QEMU + supports a type of "path" or "handle", but no formats. + </li> + </ul> + </dd> + <dt><code>source</code></dt> <dd> The resource on the host that is being accessed in the guest. The diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 9792065..a75c515 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -853,7 +853,7 @@ <define name="diskspec"> <interleave> <optional> - <ref name="driver"/> + <ref name="diskDriver"/> </optional> <optional> <ref name='diskMirror'/> @@ -1205,7 +1205,7 @@ <!-- Disk may use a special driver for access. --> - <define name="driver"> + <define name="diskDriver"> <element name="driver"> <choice> <group> @@ -1249,13 +1249,13 @@ <optional> <attribute name='type'> <choice> - <ref name='diskFormat'/> + <ref name='storageFormat'/> <value>aio</value> <!-- back-compat for 'raw' --> </choice> </attribute> </optional> </define> - <define name='diskFormat'> + <define name='storageFormat'> <choice> <value>raw</value> <value>dir</value> @@ -1414,6 +1414,9 @@ <attribute name="type"> <value>file</value> </attribute> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="file"> @@ -1427,6 +1430,9 @@ <attribute name="type"> <value>block</value> </attribute> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="dev"> @@ -1443,6 +1449,9 @@ <value>mount</value> </attribute> </optional> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="dir"> @@ -1450,22 +1459,6 @@ </attribute> <empty/> </element> - <optional> - <element name="driver"> - <attribute name="type"> - <choice> - <value>path</value> - <value>handle</value> - </choice> - </attribute> - <optional> - <attribute name="wrpolicy"> - <value>immediate</value> - </attribute> - </optional> - <empty/> - </element> - </optional> </interleave> </group> <group> @@ -1474,6 +1467,9 @@ <value>bind</value> </attribute> </optional> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="dir"> @@ -1487,6 +1483,9 @@ <attribute name="type"> <value>template</value> </attribute> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="name"> @@ -1500,6 +1499,9 @@ <attribute name="type"> <value>ram</value> </attribute> + <optional> + <ref name="diskDriver"/> + </optional> <interleave> <element name="source"> <attribute name="usage"> @@ -1557,6 +1559,35 @@ </interleave> </element> </define> + <define name="fsDriver"> + <element name="driver"> + <!-- Annoying inconsistency. 'disk' uses 'name' + for this kind of info, and 'type' for the + storage format. We need the latter too, so + had to invent a new attribute name --> + <optional> + <attribute name="type"> + <choice> + <value>path</value> + <value>handle</value> + <value>loop</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="format"> + <ref name="storageFormat"/> + </attribute> + </optional> + <optional> + <attribute name="wrpolicy"> + <value>immediate</value> + </attribute> + </optional> + <empty/> + </element> + </define> + <!-- An interface description can either be of type bridge in which case it will use a bridging source, or of type ethernet which uses a device @@ -3609,7 +3640,7 @@ </attribute> <optional> <attribute name='format'> - <ref name='diskFormat'/> + <ref name='storageFormat'/> </attribute> </optional> <optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 3278e9c..c344fca 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -329,7 +329,8 @@ VIR_ENUM_IMPL(virDomainFS, VIR_DOMAIN_FS_TYPE_LAST, VIR_ENUM_IMPL(virDomainFSDriverType, VIR_DOMAIN_FS_DRIVER_TYPE_LAST, "default", "path", - "handle") + "handle", + "loop") VIR_ENUM_IMPL(virDomainFSAccessMode, VIR_DOMAIN_FS_ACCESSMODE_LAST, "passthrough", @@ -5015,6 +5016,7 @@ virDomainFSDefParseXML(xmlNodePtr node, char *fsdriver = NULL; char *source = NULL; char *target = NULL; + char *format = NULL; char *accessmode = NULL; char *wrpolicy = NULL; char *usage = NULL; @@ -5083,9 +5085,13 @@ virDomainFSDefParseXML(xmlNodePtr node, target = virXMLPropString(cur, "dir"); } else if (xmlStrEqual(cur->name, BAD_CAST "readonly")) { def->readonly = 1; - } else if (!fsdriver && xmlStrEqual(cur->name, BAD_CAST "driver")) { - fsdriver = virXMLPropString(cur, "type"); - wrpolicy = virXMLPropString(cur, "wrpolicy"); + } else if (xmlStrEqual(cur->name, BAD_CAST "driver")) { + if (!fsdriver) + fsdriver = virXMLPropString(cur, "type"); + if (!wrpolicy) + wrpolicy = virXMLPropString(cur, "wrpolicy"); + if (!format) + format = virXMLPropString(cur, "format"); } } cur = cur->next; @@ -5093,12 +5099,20 @@ virDomainFSDefParseXML(xmlNodePtr node, if (fsdriver) { if ((def->fsdriver = virDomainFSDriverTypeTypeFromString(fsdriver)) <= 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("unknown fs driver type '%s'"), fsdriver); goto error; } } + if (format) { + if ((def->format = virStorageFileFormatTypeFromString(format)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown driver format value '%s'"), format); + goto error; + } + } + if (wrpolicy) { if ((def->wrpolicy = virDomainFSWrpolicyTypeFromString(wrpolicy)) <= 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -5158,6 +5172,7 @@ cleanup: VIR_FREE(wrpolicy); VIR_FREE(usage); VIR_FREE(unit); + VIR_FREE(format); return def; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 96f11ba..e4bc42b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -749,6 +749,7 @@ enum virDomainFSDriverType { VIR_DOMAIN_FS_DRIVER_TYPE_DEFAULT = 0, VIR_DOMAIN_FS_DRIVER_TYPE_PATH, VIR_DOMAIN_FS_DRIVER_TYPE_HANDLE, + VIR_DOMAIN_FS_DRIVER_TYPE_LOOP, VIR_DOMAIN_FS_DRIVER_TYPE_LAST }; @@ -775,9 +776,10 @@ enum virDomainFSWrpolicy { struct _virDomainFSDef { int type; - int fsdriver; - int accessmode; + int fsdriver; /* enum virDomainFSDriverType */ + int accessmode; /* enum virDomainFSAccessMode */ int wrpolicy; /* enum virDomainFSWrpolicy */ + int format; /* enum virStorageFileFormat */ unsigned long long usage; char *src; char *dst; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 03f068c..37ddfb3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -129,7 +129,8 @@ VIR_ENUM_DECL(qemuDomainFSDriver) VIR_ENUM_IMPL(qemuDomainFSDriver, VIR_DOMAIN_FS_DRIVER_TYPE_LAST, "local", "local", - "handle"); + "handle", + NULL); /** -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Extend the <driver> element in filesystem devices to allow a storage format to be set. The new attribute uses 'format' to reflect the storage format. This is different from the <driver> element in disk devices which use 'type' to reflect the storage format. THis
s/THis/This
is because the 'type' attribute on filesystem devices is already used for the driver backend, for which the disk devices use the 'name' attribte. Arggggh.
s/attribte/attribute Your call on the Arggggh
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- docs/formatdomain.html.in | 24 ++++++++++++++ docs/schemas/domaincommon.rng | 73 ++++++++++++++++++++++++++++++------------- src/conf/domain_conf.c | 25 ++++++++++++--- src/conf/domain_conf.h | 6 ++-- src/qemu/qemu_command.c | 3 +- 5 files changed, 102 insertions(+), 29 deletions(-)
ACK Although the rng defs are still a bit of black magic to me. I suppose one could say the <driver element options could be better described with respect to their possible/valid options in both the hard drives/floppy disks/cdroms and filesystems section of the documentation. With your new description you list since 1.0.4 for the driver element, but it seems using driver was available before that and the new 1.0.4 addition is the loop/format options. At least that's how I read it. John

From: "Daniel P. Berrange" <berrange@redhat.com> The current code for setting up loop devices to LXC disks first does a switch() based on the disk format, then looks at the disk driver name. Reverse this so it first looks at the driver name, and then the disk format. This is more useful since the list of supported disk formats depends on what driver is used. The code for setting loop devices for LXC fs entries also needs to have the same logic added, now the XML schema supports this. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 76 +++++++++++++++++++++++++++--------------------- 1 file changed, 43 insertions(+), 33 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 1508b9c..8f3ca6a 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -415,17 +415,31 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) if (fs->type != VIR_DOMAIN_FS_TYPE_FILE) continue; - fd = virLXCControllerSetupLoopDeviceFS(fs); - if (fd < 0) - goto cleanup; + if (fs->fsdriver == VIR_DOMAIN_FS_DRIVER_TYPE_LOOP) { + if (fs->format != VIR_STORAGE_FILE_RAW && + fs->format != VIR_STORAGE_FILE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("fs format %s is not supported"), + virStorageFileFormatTypeToString(fs->format)); + goto cleanup; + } - VIR_DEBUG("Saving loop fd %d", fd); - if (VIR_EXPAND_N(ctrl->loopDevFds, ctrl->nloopDevs, 1) < 0) { - VIR_FORCE_CLOSE(fd); - virReportOOMError(); - goto cleanup; + fd = virLXCControllerSetupLoopDeviceFS(fs); + if (fd < 0) + goto cleanup; + + VIR_DEBUG("Saving loop fd %d", fd); + if (VIR_EXPAND_N(ctrl->loopDevFds, ctrl->nloopDevs, 1) < 0) { + VIR_FORCE_CLOSE(fd); + virReportOOMError(); + goto cleanup; + } + ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("fs driver %s is not supported"), + virDomainFSDriverTypeTypeToString(fs->fsdriver)); } - ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; } for (i = 0 ; i < ctrl->def->ndisks ; i++) { @@ -435,40 +449,36 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) if (disk->type != VIR_DOMAIN_DISK_TYPE_FILE) continue; - switch (disk->format) { - /* We treat 'none' as meaning 'raw' since we - * don't want to go into the auto-probing - * business for security reasons - */ - case VIR_STORAGE_FILE_RAW: - case VIR_STORAGE_FILE_NONE: - if (disk->driverName && - STRNEQ(disk->driverName, "loop")) { + if (!disk->driverName || + STREQ(disk->driverName, "loop")) { + if (disk->format != VIR_STORAGE_FILE_RAW && + disk->format != VIR_STORAGE_FILE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("disk driver %s is not supported"), - disk->driverName); + _("disk format %s is not supported"), + virStorageFileFormatTypeToString(disk->format)); goto cleanup; } + /* We treat 'none' as meaning 'raw' since we + * don't want to go into the auto-probing + * business for security reasons + */ fd = virLXCControllerSetupLoopDeviceDisk(disk); if (fd < 0) goto cleanup; - break; - default: + VIR_DEBUG("Saving loop fd %d", fd); + if (VIR_EXPAND_N(ctrl->loopDevFds, ctrl->nloopDevs, 1) < 0) { + VIR_FORCE_CLOSE(fd); + virReportOOMError(); + goto cleanup; + } + ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("disk format %s is not supported"), - virStorageFileFormatTypeToString(disk->format)); - goto cleanup; - } - - VIR_DEBUG("Saving loop fd %d", fd); - if (VIR_EXPAND_N(ctrl->loopDevFds, ctrl->nloopDevs, 1) < 0) { - VIR_FORCE_CLOSE(fd); - virReportOOMError(); - goto cleanup; + _("disk driver %s is not supported"), + disk->driverName); } - ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; } VIR_DEBUG("Setup all loop devices"); -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The current code for setting up loop devices to LXC disks first does a switch() based on the disk format, then looks at the disk driver name. Reverse this so it first looks at the driver name, and then the disk format. This is more useful since the list of supported disk formats depends on what driver is used.
The code for setting loop devices for LXC fs entries also needs to have the same logic added, now the XML schema supports this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 76 +++++++++++++++++++++++++++--------------------- 1 file changed, 43 insertions(+), 33 deletions(-)
ACK I take it the checking for FS_DRIVER_TYPE_LOOP in the first loop is just an optimization (or sanity check) over making more calls and determining the same thing. IOW previous the first loop didn't care about fsdriver type before making the SetupLoopDeviceFS call, but that could fail if it wasn't a loop device. John

From: "Daniel P. Berrange" <berrange@redhat.com> Add a virFileNBDDeviceAssociate method, which given a filename will setup a NBD device, using qemu-nbd as the server. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 ++ 3 files changed, 129 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5cad990..0607bae 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1253,6 +1253,7 @@ virFileDirectFdFlag; virFileFclose; virFileFdopen; virFileLoopDeviceAssociate; +virFileNBDDeviceAssociate; virFileRewrite; virFileTouch; virFileUpdatePerm; diff --git a/src/util/virfile.c b/src/util/virfile.c index 4a9fa81..095559e 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -631,6 +631,119 @@ cleanup: return lofd; } + +# define SYSFS_BLOCK_DIR "/sys/block" + + +static int +virFileNBDDeviceIsBusy(const char *devname) +{ + char *path; + int ret = -1; + + if (virAsprintf(&path, SYSFS_BLOCK_DIR "/%s/pid", + devname) < 0) { + virReportOOMError(); + return -1; + } + + if (access(path, F_OK) < 0) { + if (errno == ENOENT) + ret = 0; + else + virReportSystemError(errno, + _("Cannot check NBD device %s pid"), + devname); + goto cleanup; + } + ret = 1; + +cleanup: + VIR_FREE(path); + return ret; +} + + +static char * +virFileNBDDeviceFindUnused(void) +{ + DIR *dh; + char *ret = NULL; + struct dirent *de; + + if (!(dh = opendir(SYSFS_BLOCK_DIR))) { + virReportSystemError(errno, + _("Cannot read directory %s"), + SYSFS_BLOCK_DIR); + return NULL; + } + + while ((de = readdir(dh)) != NULL) { + if (STRPREFIX(de->d_name, "nbd")) { + int rv = virFileNBDDeviceIsBusy(de->d_name); + if (rv < 0) + goto cleanup; + if (rv == 0) { + if (virAsprintf(&ret, "/dev/%s", de->d_name) < 0) { + virReportOOMError(); + goto cleanup; + } + goto cleanup; + } + } + } + + virReportSystemError(EBUSY, "%s", + _("No free NBD devices")); + +cleanup: + closedir(dh); + return ret; +} + + +int virFileNBDDeviceAssociate(const char *file, + char **dev, + bool readonly) +{ + char *nbddev; + char *qemunbd; + virCommandPtr cmd = NULL; + int ret = -1; + + if (!(nbddev = virFileNBDDeviceFindUnused())) + goto cleanup; + + if (!(qemunbd = virFindFileInPath("qemu-nbd"))) { + virReportSystemError(ENOENT, "%s", + _("Unable to find 'qemu-nbd' binary in $PATH")); + goto cleanup; + } + + cmd = virCommandNew(qemunbd); + if (readonly) + virCommandAddArg(cmd, "-r"); + virCommandAddArgList(cmd, + "-n", /* Don't cache in qemu-nbd layer */ + "-c", nbddev, + file, NULL); + + /* qemu-nbd will daemonize itself */ + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + *dev = nbddev; + nbddev = NULL; + ret = 0; + +cleanup: + VIR_FREE(nbddev); + VIR_FREE(qemunbd); + virCommandFree(cmd); + return ret; +} + #else /* __linux__ */ int virFileLoopDeviceAssociate(const char *file, @@ -643,4 +756,15 @@ int virFileLoopDeviceAssociate(const char *file, return -1; } +int virFileNBDDeviceAssociate(const char *file, + char **dev ATTRIBUTE_UNUSED, + bool readonly ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, + _("Unable to associate file %s with NBD device"), + file); + *dev = NULL; + return -1; +} + #endif /* __linux__ */ diff --git a/src/util/virfile.h b/src/util/virfile.h index c885b73..6c9179b 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -108,4 +108,8 @@ int virFileUpdatePerm(const char *path, int virFileLoopDeviceAssociate(const char *file, char **dev); +int virFileNBDDeviceAssociate(const char *file, + char **dev, + bool readonly); + #endif /* __VIR_FILES_H */ -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add a virFileNBDDeviceAssociate method, which given a filename will setup a NBD device, using qemu-nbd as the server.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 ++ 3 files changed, 129 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5cad990..0607bae 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1253,6 +1253,7 @@ virFileDirectFdFlag; virFileFclose; virFileFdopen; virFileLoopDeviceAssociate; +virFileNBDDeviceAssociate; virFileRewrite; virFileTouch; virFileUpdatePerm; diff --git a/src/util/virfile.c b/src/util/virfile.c index 4a9fa81..095559e 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -631,6 +631,119 @@ cleanup: return lofd; }
+ +# define SYSFS_BLOCK_DIR "/sys/block" + + +static int +virFileNBDDeviceIsBusy(const char *devname) +{ + char *path; + int ret = -1; + + if (virAsprintf(&path, SYSFS_BLOCK_DIR "/%s/pid", + devname) < 0) { + virReportOOMError(); + return -1; + } + + if (access(path, F_OK) < 0) { + if (errno == ENOENT) + ret = 0; + else + virReportSystemError(errno, + _("Cannot check NBD device %s pid"), + devname); + goto cleanup; + } + ret = 1; + +cleanup: + VIR_FREE(path); + return ret; +} + + +static char * +virFileNBDDeviceFindUnused(void) +{ + DIR *dh; + char *ret = NULL; + struct dirent *de; + + if (!(dh = opendir(SYSFS_BLOCK_DIR))) { + virReportSystemError(errno, + _("Cannot read directory %s"), + SYSFS_BLOCK_DIR); + return NULL; + } + + while ((de = readdir(dh)) != NULL) { + if (STRPREFIX(de->d_name, "nbd")) { + int rv = virFileNBDDeviceIsBusy(de->d_name); + if (rv < 0) + goto cleanup; + if (rv == 0) { + if (virAsprintf(&ret, "/dev/%s", de->d_name) < 0) { + virReportOOMError(); + goto cleanup; + } + goto cleanup; + } + } + } + + virReportSystemError(EBUSY, "%s", + _("No free NBD devices")); + +cleanup: + closedir(dh); + return ret; +} + + +int virFileNBDDeviceAssociate(const char *file, + char **dev, + bool readonly) +{ + char *nbddev; + char *qemunbd; + virCommandPtr cmd = NULL; + int ret = -1; + + if (!(nbddev = virFileNBDDeviceFindUnused())) + goto cleanup; + + if (!(qemunbd = virFindFileInPath("qemu-nbd"))) { + virReportSystemError(ENOENT, "%s", + _("Unable to find 'qemu-nbd' binary in $PATH")); + goto cleanup; + } + + cmd = virCommandNew(qemunbd); + if (readonly) + virCommandAddArg(cmd, "-r"); + virCommandAddArgList(cmd, + "-n", /* Don't cache in qemu-nbd layer */ + "-c", nbddev, + file, NULL); + + /* qemu-nbd will daemonize itself */ + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + *dev = nbddev; + nbddev = NULL; + ret = 0; + +cleanup: + VIR_FREE(nbddev); + VIR_FREE(qemunbd); + virCommandFree(cmd); + return ret; +} + #else /* __linux__ */
int virFileLoopDeviceAssociate(const char *file, @@ -643,4 +756,15 @@ int virFileLoopDeviceAssociate(const char *file, return -1; }
+int virFileNBDDeviceAssociate(const char *file, + char **dev ATTRIBUTE_UNUSED, + bool readonly ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, + _("Unable to associate file %s with NBD device"), + file); + *dev = NULL;
Since this is done - should this still be UNUSED in the header?
+ return -1; +} + #endif /* __linux__ */ diff --git a/src/util/virfile.h b/src/util/virfile.h index c885b73..6c9179b 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -108,4 +108,8 @@ int virFileUpdatePerm(const char *path, int virFileLoopDeviceAssociate(const char *file, char **dev);
+int virFileNBDDeviceAssociate(const char *file, + char **dev, + bool readonly); + #endif /* __VIR_FILES_H */
ACK in any case.

On 03/19/2013 07:39 AM, John Ferlan wrote:
On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Add a virFileNBDDeviceAssociate method, which given a filename will setup a NBD device, using qemu-nbd as the server.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> ---
+virFileNBDDeviceFindUnused(void) +{ + DIR *dh; + char *ret = NULL; + struct dirent *de; + + if (!(dh = opendir(SYSFS_BLOCK_DIR))) { + virReportSystemError(errno, + _("Cannot read directory %s"), + SYSFS_BLOCK_DIR); + return NULL; + } + + while ((de = readdir(dh)) != NULL) {
readdir() is an annoying interface. To properly distinguish errors, you have to assign errno to 0 before each iteration, then check if errno is set when the while loop terminates.
+ if (STRPREFIX(de->d_name, "nbd")) { + int rv = virFileNBDDeviceIsBusy(de->d_name); + if (rv < 0) + goto cleanup; + if (rv == 0) { + if (virAsprintf(&ret, "/dev/%s", de->d_name) < 0) { + virReportOOMError(); + goto cleanup; + } + goto cleanup; + } + } + } + + virReportSystemError(EBUSY, "%s", + _("No free NBD devices"));
Your choice of error reporting is to blindly overwrite any (potential) error from readdir. That potentially loses information. Then again, since you normally succeed only by finding a winning existing entry, and since you are at least giving a reasonable error message, even if you overwrote the readdir() errno, I can live with it.
+int virFileNBDDeviceAssociate(const char *file, + char **dev ATTRIBUTE_UNUSED, + bool readonly ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, + _("Unable to associate file %s with NBD device"), + file); + *dev = NULL;
Since this is done - should this still be UNUSED in the header?
It doesn't hurt to leave it in, but it also doesn't hurt to take out the ATTRIBUTE_UNUSED or even drop the '*dev = NULL' line.
ACK in any case.
Agreed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The <filesystem> element can now accept a <driver type='nbd'/> as an alternative to 'loop'. The benefit of NBD is support for non-raw disk image formats. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 3 ++- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c344fca..f7e0104 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -330,7 +330,8 @@ VIR_ENUM_IMPL(virDomainFSDriverType, VIR_DOMAIN_FS_DRIVER_TYPE_LAST, "default", "path", "handle", - "loop") + "loop", + "nbd") VIR_ENUM_IMPL(virDomainFSAccessMode, VIR_DOMAIN_FS_ACCESSMODE_LAST, "passthrough", diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e4bc42b..12a1cd2 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -750,6 +750,7 @@ enum virDomainFSDriverType { VIR_DOMAIN_FS_DRIVER_TYPE_PATH, VIR_DOMAIN_FS_DRIVER_TYPE_HANDLE, VIR_DOMAIN_FS_DRIVER_TYPE_LOOP, + VIR_DOMAIN_FS_DRIVER_TYPE_NBD, VIR_DOMAIN_FS_DRIVER_TYPE_LAST }; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 37ddfb3..37719e5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -130,6 +130,7 @@ VIR_ENUM_IMPL(qemuDomainFSDriver, VIR_DOMAIN_FS_DRIVER_TYPE_LAST, "local", "local", "handle", + NULL, NULL); -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The <filesystem> element can now accept a <driver type='nbd'/> as an alternative to 'loop'. The benefit of NBD is support for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 3 ++- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 1 + 3 files changed, 4 insertions(+), 1 deletion(-)
ACK John

On 03/15/2013 10:32 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The <filesystem> element can now accept a <driver type='nbd'/> as an alternative to 'loop'. The benefit of NBD is support for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 3 ++- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 1 + 3 files changed, 4 insertions(+), 1 deletion(-)
No RNG or formatindomain.html.in changes? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The LXC driver can already configure <disk> or <filesystem> devices to use the loop device. This extends it to also allow for use of the NBD device, to support non-raw formats. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 8f3ca6a..c433fb1 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -403,6 +403,46 @@ static int virLXCControllerSetupLoopDeviceDisk(virDomainDiskDefPtr disk) } +static int virLXCControllerSetupNBDDeviceFS(virDomainFSDefPtr fs) +{ + char *dev; + + if (virFileNBDDeviceAssociate(fs->src, &dev, + !!fs->readonly) < 0) + return -1; + + /* + * We now change it into a block device type, so that + * the rest of container setup 'just works' + */ + fs->type = VIR_DOMAIN_DISK_TYPE_BLOCK; + VIR_FREE(fs->src); + fs->src = dev; + + return 0; +} + + +static int virLXCControllerSetupNBDDeviceDisk(virDomainDiskDefPtr disk) +{ + char *dev; + + if (virFileNBDDeviceAssociate(disk->src, &dev, + !!disk->readonly) < 0) + return -1; + + /* + * We now change it into a block device type, so that + * the rest of container setup 'just works' + */ + disk->type = VIR_DOMAIN_DISK_TYPE_BLOCK; + VIR_FREE(disk->src); + disk->src = dev; + + return 0; +} + + static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) { size_t i; @@ -435,6 +475,9 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) goto cleanup; } ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else if (fs->fsdriver == VIR_DOMAIN_FS_DRIVER_TYPE_NBD) { + if (virLXCControllerSetupNBDDeviceFS(fs) < 0) + goto cleanup; } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("fs driver %s is not supported"), @@ -449,8 +492,14 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) if (disk->type != VIR_DOMAIN_DISK_TYPE_FILE) continue; - if (!disk->driverName || - STREQ(disk->driverName, "loop")) { + /* If no driverName is set, we prefer 'loop' for + * dealing with raw or undefined formats. Only + * default to 'nbd' for non-raw formats. + */ + if ((disk->driverName && STREQ(disk->driverName, "loop")) || + (!disk->driverName && + (disk->format == VIR_STORAGE_FILE_RAW || + disk->format == VIR_STORAGE_FILE_NONE))) { if (disk->format != VIR_STORAGE_FILE_RAW && disk->format != VIR_STORAGE_FILE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -474,6 +523,17 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) goto cleanup; } ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else if (!disk->driverName || + STREQ(disk->driverName, "nbd")) { + if (disk->cachemode != VIR_DOMAIN_DISK_CACHE_DEFAULT && + disk->cachemode != VIR_DOMAIN_DISK_CACHE_DISABLE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Disk cache mode %s is not supported"), + virDomainDiskCacheTypeToString(disk->cachemode)); + goto cleanup; + } + if (virLXCControllerSetupNBDDeviceDisk(disk) < 0) + goto cleanup; } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("disk driver %s is not supported"), -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC driver can already configure <disk> or <filesystem> devices to use the loop device. This extends it to also allow for use of the NBD device, to support non-raw formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 8f3ca6a..c433fb1 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -403,6 +403,46 @@ static int virLXCControllerSetupLoopDeviceDisk(virDomainDiskDefPtr disk) }
+static int virLXCControllerSetupNBDDeviceFS(virDomainFSDefPtr fs) +{ + char *dev; + + if (virFileNBDDeviceAssociate(fs->src, &dev, + !!fs->readonly) < 0) + return -1; + + /* + * We now change it into a block device type, so that + * the rest of container setup 'just works' + */ + fs->type = VIR_DOMAIN_DISK_TYPE_BLOCK; + VIR_FREE(fs->src); + fs->src = dev; + + return 0; +} + + +static int virLXCControllerSetupNBDDeviceDisk(virDomainDiskDefPtr disk) +{ + char *dev; + + if (virFileNBDDeviceAssociate(disk->src, &dev, + !!disk->readonly) < 0) + return -1; + + /* + * We now change it into a block device type, so that + * the rest of container setup 'just works' + */ + disk->type = VIR_DOMAIN_DISK_TYPE_BLOCK; + VIR_FREE(disk->src); + disk->src = dev; + + return 0; +} + + static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) { size_t i; @@ -435,6 +475,9 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) goto cleanup; } ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else if (fs->fsdriver == VIR_DOMAIN_FS_DRIVER_TYPE_NBD) { + if (virLXCControllerSetupNBDDeviceFS(fs) < 0) + goto cleanup; } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("fs driver %s is not supported"), @@ -449,8 +492,14 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) if (disk->type != VIR_DOMAIN_DISK_TYPE_FILE) continue;
- if (!disk->driverName || - STREQ(disk->driverName, "loop")) {
This was if no driverName or if driverName is loop, then make sure RAW or NONE for format... That logic doesn't seem to change with the new checks; however...
+ /* If no driverName is set, we prefer 'loop' for + * dealing with raw or undefined formats. Only + * default to 'nbd' for non-raw formats. + */ + if ((disk->driverName && STREQ(disk->driverName, "loop")) || + (!disk->driverName && + (disk->format == VIR_STORAGE_FILE_RAW || + disk->format == VIR_STORAGE_FILE_NONE))) { if (disk->format != VIR_STORAGE_FILE_RAW && disk->format != VIR_STORAGE_FILE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -474,6 +523,17 @@ static int virLXCControllerSetupLoopDevices(virLXCControllerPtr ctrl) goto cleanup; } ctrl->loopDevFds[ctrl->nloopDevs - 1] = fd; + } else if (!disk->driverName || + STREQ(disk->driverName, "nbd")) {
if driverName is not provided we fall into here? Which doesn't seem to be what was intended. Seems this should be if driverName && nbd...
+ if (disk->cachemode != VIR_DOMAIN_DISK_CACHE_DEFAULT && + disk->cachemode != VIR_DOMAIN_DISK_CACHE_DISABLE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Disk cache mode %s is not supported"), + virDomainDiskCacheTypeToString(disk->cachemode)); + goto cleanup; + } + if (virLXCControllerSetupNBDDeviceDisk(disk) < 0) + goto cleanup; } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("disk driver %s is not supported"),
Which potentially causes issues hereif driverName == NULL, right? Also might be good to document the nbd format on the formatdomain page. John John

From: "Daniel P. Berrange" <berrange@redhat.com> The LXC controller is closing loop devices as soon as the container has started. This is fine if the loop device was setup as a mounted filesystem, but if we're just passing through the loop device as a disk, nothing else is keeping it open. Thus we must keep the loop device FDs open for as long the libvirt_lxc process is running. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index c433fb1..9545df3 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -199,22 +199,12 @@ error: } -static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl, - bool force) +static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl) { size_t i; - for (i = 0 ; i < ctrl->nloopDevs ; i++) { - if (force) { - VIR_FORCE_CLOSE(ctrl->loopDevFds[i]); - } else { - if (VIR_CLOSE(ctrl->loopDevFds[i]) < 0) { - virReportSystemError(errno, "%s", - _("Unable to close loop device")); - return -1; - } - } - } + for (i = 0 ; i < ctrl->nloopDevs ; i++) + VIR_FORCE_CLOSE(ctrl->loopDevFds[i]); return 0; } @@ -1616,10 +1606,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl) /* Now the container is fully setup... */ - /* ...we can close the loop devices... */ - if (virLXCControllerCloseLoopDevices(ctrl, false) < 0) - goto cleanup; - /* ...and reduce our privileges */ if (lxcControllerClearCapabilities() < 0) goto cleanup; -- 1.7.11.7

On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC controller is closing loop devices as soon as the container has started. This is fine if the loop device was setup as a mounted filesystem, but if we're just passing through the loop device as a disk, nothing else is keeping it open. Thus we must keep the loop device FDs open for as long the libvirt_lxc process is running.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index c433fb1..9545df3 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -199,22 +199,12 @@ error: }
-static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl, - bool force) +static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl) { size_t i;
- for (i = 0 ; i < ctrl->nloopDevs ; i++) { - if (force) { - VIR_FORCE_CLOSE(ctrl->loopDevFds[i]); - } else { - if (VIR_CLOSE(ctrl->loopDevFds[i]) < 0) { - virReportSystemError(errno, "%s", - _("Unable to close loop device")); - return -1; - } - } - } + for (i = 0 ; i < ctrl->nloopDevs ; i++) + VIR_FORCE_CLOSE(ctrl->loopDevFds[i]);
return 0; } @@ -1616,10 +1606,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl)
/* Now the container is fully setup... */
- /* ...we can close the loop devices... */ - if (virLXCControllerCloseLoopDevices(ctrl, false) < 0) - goto cleanup; - /* ...and reduce our privileges */ if (lxcControllerClearCapabilities() < 0) goto cleanup;
Doesn't the call to virLXCControllerCloseLoopDevices() in virLXCControllerStopInit() need to lose the ", true" parameter? John

On Tue, Mar 19, 2013 at 10:17:59AM -0400, John Ferlan wrote:
On 03/15/2013 12:32 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC controller is closing loop devices as soon as the container has started. This is fine if the loop device was setup as a mounted filesystem, but if we're just passing through the loop device as a disk, nothing else is keeping it open. Thus we must keep the loop device FDs open for as long the libvirt_lxc process is running.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index c433fb1..9545df3 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -199,22 +199,12 @@ error: }
-static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl, - bool force) +static int virLXCControllerCloseLoopDevices(virLXCControllerPtr ctrl) { size_t i;
- for (i = 0 ; i < ctrl->nloopDevs ; i++) { - if (force) { - VIR_FORCE_CLOSE(ctrl->loopDevFds[i]); - } else { - if (VIR_CLOSE(ctrl->loopDevFds[i]) < 0) { - virReportSystemError(errno, "%s", - _("Unable to close loop device")); - return -1; - } - } - } + for (i = 0 ; i < ctrl->nloopDevs ; i++) + VIR_FORCE_CLOSE(ctrl->loopDevFds[i]);
return 0; } @@ -1616,10 +1606,6 @@ virLXCControllerRun(virLXCControllerPtr ctrl)
/* Now the container is fully setup... */
- /* ...we can close the loop devices... */ - if (virLXCControllerCloseLoopDevices(ctrl, false) < 0) - goto cleanup; - /* ...and reduce our privileges */ if (lxcControllerClearCapabilities() < 0) goto cleanup;
Doesn't the call to virLXCControllerCloseLoopDevices() in virLXCControllerStopInit() need to lose the ", true" parameter?
Yes, of course :-) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Ping, any reviews welcome On Fri, Mar 15, 2013 at 04:32:37PM +0000, Daniel P. Berrange wrote:
This series adds support for using non-raw disks with LXC containers by leverging NBD + qemu-nbd as an alternative to the loopback device.
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/19/2013 04:50 AM, Daniel P. Berrange wrote:
Ping, any reviews welcome
On Fri, Mar 15, 2013 at 04:32:37PM +0000, Daniel P. Berrange wrote:
This series adds support for using non-raw disks with LXC containers by leverging NBD + qemu-nbd as an alternative to the loopback device.
I also reviewed the series; John did a good job of pointing out the bulk of the things worth cleaning up. Looks like an interesting setup. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
John Ferlan