[libvirt] [PATCH 0/4] RFC: grant KVM guest rawio capability

This patch is as a result of the following dispute: http://www.redhat.com/archives/libvir-list/2011-December/msg00857.html http://www.redhat.com/archives/libvir-list/2011-December/msg00950.html http://www.redhat.com/archives/libvir-list/2012-January/msg00449.html This patchset achieves #1 and #2 of the following tasks:
1. rawio="yes|nmo" on the <disk> element somewhere 2. Give the QEMU process CAP_SYS_RAWIO 3. Use the devices cgroup to specify which individual disks can use rawio.
In short, if you specify the following XML: <disk type='block' device='lun' rawio='yes'> the domain will be granted CAP_SYS_RAWIO. # virsh start VM # cat /proc/<VM's PID>/status ... CapInh: 0000000000000000 CapPrm: fffffffc00020000 CapEff: fffffffc00020000 CapBnd: fffffffc00020000 ... *[PATCH 1/4] conf: add rawio attribute to disk element of domain XML *[PATCH 2/4] util: add functions to keep capabilities *[PATCH 3/4] util: extend virExecWithHook() *[PATCH 4/4] qemu: make qemu processes to retain capabilities - Best regards, Taku Izumi -- Taku Izumi <izumi.taku@jp.fujitsu.com>

This patch adds a new attribute "rawio" to the "disk" element of domain XML. Valid values of "rawio" attribute are "yes" and "no". rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO. If you specify the following XML: <disk type='block' device='lun' rawio='yes'> ... </disk> the domain will be granted CAP_SYS_RAWIO. (of course, the domain have to be executed with root privilege) NOTE: - "rawio" attribute is only valid when device='lun' - At the moment, any other disks you won't use rawio can use rawio. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> --- docs/formatdomain.html.in | 7 +++++-- docs/schemas/domaincommon.rng | 8 ++++++++ src/conf/domain_conf.c | 36 ++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 3 +++ 4 files changed, 52 insertions(+), 2 deletions(-) Index: libvirt/docs/schemas/domaincommon.rng =================================================================== --- libvirt.orig/docs/schemas/domaincommon.rng +++ libvirt/docs/schemas/domaincommon.rng @@ -806,6 +806,14 @@ </attribute> </optional> <optional> + <attribute name="rawio"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> + <optional> <ref name="snapshot"/> </optional> <choice> Index: libvirt/src/conf/domain_conf.c =================================================================== --- libvirt.orig/src/conf/domain_conf.c +++ libvirt/src/conf/domain_conf.c @@ -30,6 +30,7 @@ #include <dirent.h> #include <sys/time.h> #include <strings.h> +#include <linux/capability.h> #include "virterror_internal.h" #include "datatypes.h" @@ -2751,6 +2752,7 @@ virDomainDiskDefParseXML(virCapsPtr caps char *type = NULL; char *device = NULL; char *snapshot = NULL; + char *rawio = NULL; char *driverName = NULL; char *driverType = NULL; char *source = NULL; @@ -2795,6 +2797,8 @@ virDomainDiskDefParseXML(virCapsPtr caps snapshot = virXMLPropString(node, "snapshot"); + rawio = virXMLPropString(node, "rawio"); + cur = node->children; while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { @@ -3103,6 +3107,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; } + def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'")); + goto error; + } + } + if (bus) { if ((def->bus = virDomainDiskBusTypeFromString(bus)) < 0) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -7517,6 +7541,13 @@ static virDomainDefPtr virDomainDefParse if (!disk) goto error; + /* cap_sys_rawio check */ + if (disk->rawio == 1 && + (def->process_caps & (1ULL << CAP_SYS_RAWIO)) == 0) { + def->process_caps |= (1ULL << CAP_SYS_RAWIO); + VIR_WARN("domain %s will be granted CAP_SYS_RAWIO", def->name); + } + virDomainDiskInsertPreAlloced(def, disk); } VIR_FREE(nodes); @@ -9930,6 +9961,11 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferAsprintf(buf, " <disk type='%s' device='%s'", type, device); + if (def->rawio == 1) { + virBufferAddLit(buf, " rawio='yes'"); + } else if (def->rawio == 0) { + virBufferAddLit(buf, " rawio='no'"); + } if (def->snapshot && !(def->snapshot == VIR_DOMAIN_DISK_SNAPSHOT_NO && def->readonly)) virBufferAsprintf(buf, " snapshot='%s'", Index: libvirt/src/conf/domain_conf.h =================================================================== --- libvirt.orig/src/conf/domain_conf.h +++ libvirt/src/conf/domain_conf.h @@ -401,6 +401,7 @@ struct _virDomainDiskDef { unsigned int transient : 1; virDomainDeviceInfo info; virStorageEncryptionPtr encryption; + int rawio; /* unspecified:-1 no:0 yes:1 */ }; @@ -1464,6 +1465,8 @@ struct _virDomainDef { char *emulator; int features; + unsigned long long process_caps; + virDomainClockDef clock; int ngraphics; Index: libvirt/docs/formatdomain.html.in =================================================================== --- libvirt.orig/docs/formatdomain.html.in +++ libvirt/docs/formatdomain.html.in @@ -1096,8 +1096,11 @@ - also note that device='lun' will only be recognized for actual raw devices, never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic - SCSI commands, making it identical to device='disk'). The - optional <code>snapshot</code> attribute indicates the default + SCSI commands, making it identical to device='disk'). + The optional <code>rawio</code> attribute indicates that the disk + is desirous of rawio capability. This attribute is only valid when + device is "lun". + The optional <code>snapshot</code> attribute indicates the default behavior of the disk during disk snapshots: "internal" requires a file format such as qcow2 that can store both the snapshot and the data changes since the snapshot;

On 01/30/2012 10:08 AM, Taku Izumi wrote:
+ /* cap_sys_rawio check */ + if (disk->rawio == 1&& + (def->process_caps& (1ULL<< CAP_SYS_RAWIO)) == 0) { + def->process_caps |= (1ULL<< CAP_SYS_RAWIO); + VIR_WARN("domain %s will be granted CAP_SYS_RAWIO", def->name); + } +
This is missing tainting. Also, I'm not sure that this code (and the tainting) would need to go here or rather in src/qemu. Paolo

On Mon, Jan 30, 2012 at 06:08:35PM +0900, Taku Izumi wrote:
This patch adds a new attribute "rawio" to the "disk" element of domain XML. Valid values of "rawio" attribute are "yes" and "no". rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO.
If you specify the following XML:
<disk type='block' device='lun' rawio='yes'> ... </disk>
the domain will be granted CAP_SYS_RAWIO. (of course, the domain have to be executed with root privilege)
NOTE: - "rawio" attribute is only valid when device='lun' - At the moment, any other disks you won't use rawio can use rawio.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> --- docs/formatdomain.html.in | 7 +++++-- docs/schemas/domaincommon.rng | 8 ++++++++ src/conf/domain_conf.c | 36 ++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 3 +++ 4 files changed, 52 insertions(+), 2 deletions(-)
Index: libvirt/docs/schemas/domaincommon.rng =================================================================== --- libvirt.orig/docs/schemas/domaincommon.rng +++ libvirt/docs/schemas/domaincommon.rng @@ -806,6 +806,14 @@ </attribute> </optional> <optional> + <attribute name="rawio"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> + <optional> <ref name="snapshot"/> </optional> <choice> Index: libvirt/src/conf/domain_conf.c =================================================================== --- libvirt.orig/src/conf/domain_conf.c +++ libvirt/src/conf/domain_conf.c @@ -30,6 +30,7 @@ #include <dirent.h> #include <sys/time.h> #include <strings.h> +#include <linux/capability.h>
Remove this include.
#include "virterror_internal.h" #include "datatypes.h" @@ -2751,6 +2752,7 @@ virDomainDiskDefParseXML(virCapsPtr caps char *type = NULL; char *device = NULL; char *snapshot = NULL; + char *rawio = NULL; char *driverName = NULL; char *driverType = NULL; char *source = NULL; @@ -2795,6 +2797,8 @@ virDomainDiskDefParseXML(virCapsPtr caps
snapshot = virXMLPropString(node, "snapshot");
+ rawio = virXMLPropString(node, "rawio"); + cur = node->children; while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { @@ -3103,6 +3107,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; }
+ def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'")); + goto error; + } + } + if (bus) { if ((def->bus = virDomainDiskBusTypeFromString(bus)) < 0) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -7517,6 +7541,13 @@ static virDomainDefPtr virDomainDefParse if (!disk) goto error;
+ /* cap_sys_rawio check */ + if (disk->rawio == 1 && + (def->process_caps & (1ULL << CAP_SYS_RAWIO)) == 0) { + def->process_caps |= (1ULL << CAP_SYS_RAWIO); + VIR_WARN("domain %s will be granted CAP_SYS_RAWIO", def->name); + } +
Don't do this here. 'process_caps' is an implementation detail for the QEMU driver. We don't need to store any field for that, since the QEMU driver can figure it out from the 'rawio' field when it comes to start the domain.
virDomainDiskInsertPreAlloced(def, disk); } VIR_FREE(nodes); @@ -9930,6 +9961,11 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferAsprintf(buf, " <disk type='%s' device='%s'", type, device); + if (def->rawio == 1) { + virBufferAddLit(buf, " rawio='yes'"); + } else if (def->rawio == 0) { + virBufferAddLit(buf, " rawio='no'"); + } if (def->snapshot && !(def->snapshot == VIR_DOMAIN_DISK_SNAPSHOT_NO && def->readonly)) virBufferAsprintf(buf, " snapshot='%s'", Index: libvirt/src/conf/domain_conf.h =================================================================== --- libvirt.orig/src/conf/domain_conf.h +++ libvirt/src/conf/domain_conf.h @@ -401,6 +401,7 @@ struct _virDomainDiskDef { unsigned int transient : 1; virDomainDeviceInfo info; virStorageEncryptionPtr encryption; + int rawio; /* unspecified:-1 no:0 yes:1 */ };
@@ -1464,6 +1465,8 @@ struct _virDomainDef { char *emulator; int features;
+ unsigned long long process_caps; +
Remove this field.
virDomainClockDef clock;
int ngraphics;
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/30/2012 02:08 AM, Taku Izumi wrote:
This patch adds a new attribute "rawio" to the "disk" element of domain XML. Valid values of "rawio" attribute are "yes" and "no". rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO.
If you specify the following XML:
<disk type='block' device='lun' rawio='yes'> ... </disk>
the domain will be granted CAP_SYS_RAWIO. (of course, the domain have to be executed with root privilege)
NOTE: - "rawio" attribute is only valid when device='lun' - At the moment, any other disks you won't use rawio can use rawio.
Trailing space in your commit message. In addition to Dan's comments,
@@ -3103,6 +3107,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; }
+ def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'")); + goto error; + }
This rejects any use of rawio, even rawio='no', for non-lun,
@@ -9930,6 +9961,11 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferAsprintf(buf, " <disk type='%s' device='%s'", type, device); + if (def->rawio == 1) { + virBufferAddLit(buf, " rawio='yes'"); + } else if (def->rawio == 0) { + virBufferAddLit(buf, " rawio='no'"); + }
but since rawio defaults to 0, this generates rawio='no' for all disks, making it impossible to restart libvirtd if you use non-lun. You need to conditionalize the output.
Index: libvirt/docs/formatdomain.html.in =================================================================== --- libvirt.orig/docs/formatdomain.html.in +++ libvirt/docs/formatdomain.html.in @@ -1096,8 +1096,11 @@ - also note that device='lun' will only be recognized for actual raw devices, never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic - SCSI commands, making it identical to device='disk'). The - optional <code>snapshot</code> attribute indicates the default + SCSI commands, making it identical to device='disk'). + The optional <code>rawio</code> attribute indicates that the disk + is desirous of rawio capability. This attribute is only valid when + device is "lun".
I was about to say that you were missing a <span class="since"> notation; but since device='lun' was also introduced in 0.9.10, and rawio only makes sense with lun, you don't need an extra tag. Even though you got acks later in the series, I couldn't quickly repair the items that Dan pointed out about patch 1, so I'll wait for the next revision of this series. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This patch introduces virKeepCapabilities() function and implements virCommandAllowCap() function. Existing virClearCapabilities() is function to clear all capabilities. Instead virKeepCapabilities() is function to keep arbitrary capabilities. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 45 ++++++++++++++++++++++++++++++++++++++------- src/util/command.h | 4 +--- 2 files changed, 39 insertions(+), 10 deletions(-) Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -103,6 +103,8 @@ struct _virCommand { pid_t pid; char *pidfile; bool reap; + + unsigned long long capabilities; }; /* @@ -182,6 +184,33 @@ static int virClearCapabilities(void) return 0; } + +/** + * virKeepCapabilities: + * @capabilities - capability flag to keep. + * In case of 0, this function is identical to + * virClearCapabilities() + * + */ +static int virKeepCapabilities(unsigned long long capabilities) +{ + int ret, i; + + capng_clear(CAPNG_SELECT_BOTH); + + for (i = 0; i <= CAP_LAST_CAP; i++) { + if (capabilities & (1ULL << i)) + capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); + } + + if (ret = capng_apply(CAPNG_SELECT_BOTH) < 0) { + virCommandError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), ret); + return -1; + } + + return 0; +} # else static int virClearCapabilities(void) { @@ -189,6 +218,11 @@ static int virClearCapabilities(void) // "capabilities"); return 0; } + +static int virKeepCapabilities(unsigned long long capabilities) +{ + return 0; +} # endif /** @@ -883,26 +917,23 @@ virCommandClearCaps(virCommandPtr cmd) cmd->flags |= VIR_EXEC_CLEAR_CAPS; } -#if 0 /* XXX Enable if we have a need for capability management. */ - /** * virCommandAllowCap: * @cmd: the command to modify - * @capability: what to allow + * @capabilities: what to allow * - * Re-allow a specific capability + * Allow specific capabilities */ void virCommandAllowCap(virCommandPtr cmd, - int capability ATTRIBUTE_UNUSED) + unsigned long long capabilities) { if (!cmd || cmd->has_error) return; - /* XXX ? */ + cmd->capabilities = capabilities; } -#endif /* 0 */ /** Index: libvirt/src/util/command.h =================================================================== --- libvirt.orig/src/util/command.h +++ libvirt/src/util/command.h @@ -60,10 +60,8 @@ void virCommandSetPidFile(virCommandPtr void virCommandClearCaps(virCommandPtr cmd); -# if 0 void virCommandAllowCap(virCommandPtr cmd, - int capability); -# endif + unsigned long long capabilities); void virCommandDaemonize(virCommandPtr cmd);

On Mon, Jan 30, 2012 at 06:09:44PM +0900, Taku Izumi wrote:
This patch introduces virKeepCapabilities() function and implements virCommandAllowCap() function.
Existing virClearCapabilities() is function to clear all capabilities. Instead virKeepCapabilities() is function to keep arbitrary capabilities.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 45 ++++++++++++++++++++++++++++++++++++++------- src/util/command.h | 4 +--- 2 files changed, 39 insertions(+), 10 deletions(-)
Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -103,6 +103,8 @@ struct _virCommand { pid_t pid; char *pidfile; bool reap; + + unsigned long long capabilities; };
/* @@ -182,6 +184,33 @@ static int virClearCapabilities(void)
return 0; } + +/** + * virKeepCapabilities: + * @capabilities - capability flag to keep. + * In case of 0, this function is identical to + * virClearCapabilities() + * + */ +static int virKeepCapabilities(unsigned long long capabilities) +{ + int ret, i; + + capng_clear(CAPNG_SELECT_BOTH); + + for (i = 0; i <= CAP_LAST_CAP; i++) { + if (capabilities & (1ULL << i)) + capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); + } + + if (ret = capng_apply(CAPNG_SELECT_BOTH) < 0) { + virCommandError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), ret); + return -1; + } + + return 0; +} # else static int virClearCapabilities(void) { @@ -189,6 +218,11 @@ static int virClearCapabilities(void) // "capabilities"); return 0; } + +static int virKeepCapabilities(unsigned long long capabilities) +{ + return 0; +} # endif
I suggest calling this 'virSetCapabilities' instead, since it is possible this is called with no capabilities to be kept.
/** @@ -883,26 +917,23 @@ virCommandClearCaps(virCommandPtr cmd) cmd->flags |= VIR_EXEC_CLEAR_CAPS; }
-#if 0 /* XXX Enable if we have a need for capability management. */ - /** * virCommandAllowCap: * @cmd: the command to modify - * @capability: what to allow + * @capabilities: what to allow * - * Re-allow a specific capability + * Allow specific capabilities */ void virCommandAllowCap(virCommandPtr cmd, - int capability ATTRIBUTE_UNUSED) + unsigned long long capabilities)
IMHO this shouldn't be changed.
{ if (!cmd || cmd->has_error) return;
- /* XXX ? */ + cmd->capabilities = capabilities;
I'd prefer to see cmd->capabilities |= capability; Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch extends virExecWithHook() to receive capability information. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -393,6 +393,7 @@ prepareStdFd(int fd, int std) * @hook optional virExecHook function to call prior to exec * @data data to pass to the hook function * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) + * @capabilities capabilities to keep */ static int virExecWithHook(const char *const*argv, @@ -404,7 +405,8 @@ virExecWithHook(const char *const*argv, unsigned int flags, virExecHook hook, void *data, - char *pidfile) + char *pidfile, + unsigned long long capabilities) { pid_t pid; int null = -1, i, openmax; @@ -633,9 +635,9 @@ virExecWithHook(const char *const*argv, /* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if ((flags & VIR_EXEC_CLEAR_CAPS) && - virClearCapabilities() < 0) - goto fork_error; + if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) + if (virKeepCapabilities(capabilities) < 0) + goto fork_error; /* Close logging again to ensure no FDs leak to child */ virLogReset(); @@ -723,7 +725,8 @@ virExecWithHook(const char *const*argv A int flags_unused ATTRIBUTE_UNUSED, virExecHook hook ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED) + char *pidfile ATTRIBUTE_UNUSED, + unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2171,7 +2174,8 @@ virCommandRunAsync(virCommandPtr cmd, pi cmd->flags, virCommandHook, cmd, - cmd->pidfile); + cmd->pidfile, + cmd->capabilities); VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid);

On Mon, Jan 30, 2012 at 06:10:50PM +0900, Taku Izumi wrote:
This patch extends virExecWithHook() to receive capability information.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -393,6 +393,7 @@ prepareStdFd(int fd, int std) * @hook optional virExecHook function to call prior to exec * @data data to pass to the hook function * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) + * @capabilities capabilities to keep */ static int virExecWithHook(const char *const*argv, @@ -404,7 +405,8 @@ virExecWithHook(const char *const*argv, unsigned int flags, virExecHook hook, void *data, - char *pidfile) + char *pidfile, + unsigned long long capabilities) { pid_t pid; int null = -1, i, openmax; @@ -633,9 +635,9 @@ virExecWithHook(const char *const*argv,
/* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if ((flags & VIR_EXEC_CLEAR_CAPS) && - virClearCapabilities() < 0) - goto fork_error; + if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) + if (virKeepCapabilities(capabilities) < 0) + goto fork_error;
/* Close logging again to ensure no FDs leak to child */ virLogReset(); @@ -723,7 +725,8 @@ virExecWithHook(const char *const*argv A int flags_unused ATTRIBUTE_UNUSED, virExecHook hook ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED) + char *pidfile ATTRIBUTE_UNUSED, + unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2171,7 +2174,8 @@ virCommandRunAsync(virCommandPtr cmd, pi cmd->flags, virCommandHook, cmd, - cmd->pidfile); + cmd->pidfile, + cmd->capabilities);
VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid);
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch revises qemuProcessStart() function for qemu processes to retain arbitrary capabilities. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/qemu/qemu_process.c | 1 + 1 file changed, 1 insertion(+) Index: libvirt/src/qemu/qemu_process.c =================================================================== --- libvirt.orig/src/qemu/qemu_process.c +++ libvirt/src/qemu/qemu_process.c @@ -3334,6 +3334,7 @@ int qemuProcessStart(virConnectPtr conn, driver->clearEmulatorCapabilities); if (driver->clearEmulatorCapabilities) virCommandClearCaps(cmd); + virCommandAllowCap(cmd, vm->def->process_caps); virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData);

On Mon, Jan 30, 2012 at 06:11:50PM +0900, Taku Izumi wrote:
This patch revises qemuProcessStart() function for qemu processes to retain arbitrary capabilities.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/qemu/qemu_process.c | 1 + 1 file changed, 1 insertion(+)
Index: libvirt/src/qemu/qemu_process.c =================================================================== --- libvirt.orig/src/qemu/qemu_process.c +++ libvirt/src/qemu/qemu_process.c @@ -3334,6 +3334,7 @@ int qemuProcessStart(virConnectPtr conn, driver->clearEmulatorCapabilities); if (driver->clearEmulatorCapabilities) virCommandClearCaps(cmd); + virCommandAllowCap(cmd, vm->def->process_caps);
Following on from my command in the first patch, we should be doing for (i = 0 ; i < vm->def->ndisks ; i++) { if (vm->def->disks[i].rawio) virCommandAllowCap(cmd, CAP_SYS_RAWIO); } And, in qemuDomainObjCheckDiskTaint() you need to add code which does if (disk->rawio) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD); Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Paolo Bonzini
-
Taku Izumi