[libvirt] [PATCH v2 0/4] RFC: grant KVM guest rawio capability

This is the new version of http://www.redhat.com/archives/libvir-list/2012-January/msg01312.html Thank you for reviewing, Danie, Paolo, and Eric. -- This patchset achieves #1 and #2 of the following tasks:
1. rawio="yes|nmo" on the <disk> element somewhere 2. Give the QEMU process CAP_SYS_RAWIO 3. Use the devices cgroup to specify which individual disks can use rawio.
In short, if you specify the following XML: <disk type='block' device='lun' rawio='yes'> the domain will be granted CAP_SYS_RAWIO. # virsh start VM # cat /proc/<VM's PID>/status ... CapInh: 0000000000000000 CapPrm: fffffffc00020000 CapEff: fffffffc00020000 CapBnd: fffffffc00020000 ... *[PATCH v2 1/4] conf: add rawio attribute to disk element of domain XML *[PATCH v2 2/4] util: add functions to keep capabilities *[PATCH v2 3/4] util: extend virExecWithHook() *[PATCH v2 4/4] qemu: make qemu processes to retain rawio capability - Best regards, Taku Izumi

This patch adds a new attribute "rawio" to the "disk" element of domain XML. Valid values of "rawio" attribute are "yes" and "no". rawio='yes' indicates the disk is desirous of CAP_SYS_RAWIO. If you specify the following XML: <disk type='block' device='lun' rawio='yes'> ... </disk> the domain will be granted CAP_SYS_RAWIO. (of course, the domain have to be executed with root privilege) NOTE: - "rawio" attribute is only valid when device='lun' - At the moment, any other disks you won't use rawio can use rawio. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> --- docs/formatdomain.html.in | 7 +++++-- docs/schemas/domaincommon.rng | 8 ++++++++ src/conf/domain_conf.c | 28 ++++++++++++++++++++++++++++ src/conf/domain_conf.h | 1 + 4 files changed, 42 insertions(+), 2 deletions(-) Index: libvirt/docs/schemas/domaincommon.rng =================================================================== --- libvirt.orig/docs/schemas/domaincommon.rng +++ libvirt/docs/schemas/domaincommon.rng @@ -806,6 +806,14 @@ </attribute> </optional> <optional> + <attribute name="rawio"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> + <optional> <ref name="snapshot"/> </optional> <choice> Index: libvirt/src/conf/domain_conf.c =================================================================== --- libvirt.orig/src/conf/domain_conf.c +++ libvirt/src/conf/domain_conf.c @@ -2806,6 +2806,7 @@ virDomainDiskDefParseXML(virCapsPtr caps char *type = NULL; char *device = NULL; char *snapshot = NULL; + char *rawio = NULL; char *driverName = NULL; char *driverType = NULL; char *source = NULL; @@ -2850,6 +2851,8 @@ virDomainDiskDefParseXML(virCapsPtr caps snapshot = virXMLPropString(node, "snapshot"); + rawio = virXMLPropString(node, "rawio"); + cur = node->children; while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { @@ -3156,6 +3159,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; } + def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'")); + goto error; + } + } + if (bus) { if ((def->bus = virDomainDiskBusTypeFromString(bus)) < 0) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -9972,6 +9995,11 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferAsprintf(buf, " <disk type='%s' device='%s'", type, device); + if (def->rawio == 1) { + virBufferAddLit(buf, " rawio='yes'"); + } else if (def->rawio == 0) { + virBufferAddLit(buf, " rawio='no'"); + } if (def->snapshot && !(def->snapshot == VIR_DOMAIN_DISK_SNAPSHOT_NO && def->readonly)) virBufferAsprintf(buf, " snapshot='%s'", Index: libvirt/src/conf/domain_conf.h =================================================================== --- libvirt.orig/src/conf/domain_conf.h +++ libvirt/src/conf/domain_conf.h @@ -413,6 +413,7 @@ struct _virDomainDiskDef { unsigned int transient : 1; virDomainDeviceInfo info; virStorageEncryptionPtr encryption; + int rawio; /* unspecified = -1, no = 0, yes = 1 */ }; Index: libvirt/docs/formatdomain.html.in =================================================================== --- libvirt.orig/docs/formatdomain.html.in +++ libvirt/docs/formatdomain.html.in @@ -1096,8 +1096,11 @@ - also note that device='lun' will only be recognized for actual raw devices, never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic - SCSI commands, making it identical to device='disk'). The - optional <code>snapshot</code> attribute indicates the default + SCSI commands, making it identical to device='disk'). + The optional <code>rawio</code> attribute indicates that the disk + is desirous of rawio capability. This attribute is only valid when + device is "lun". + The optional <code>snapshot</code> attribute indicates the default behavior of the disk during disk snapshots: "internal" requires a file format such as qcow2 that can store both the snapshot and the data changes since the snapshot;

On Tue, Jan 31, 2012 at 01:49:11PM +0900, Taku Izumi wrote:
@@ -3156,6 +3159,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; }
+ def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'"));
s/INTERNAL_ERROR/XML_ERROR/ ACK with that change Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/31/2012 07:03 AM, Daniel P. Berrange wrote:
On Tue, Jan 31, 2012 at 01:49:11PM +0900, Taku Izumi wrote:
@@ -3156,6 +3159,26 @@ virDomainDiskDefParseXML(virCapsPtr caps def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_NO; }
+ def->rawio = -1; /* unspecified */ + if (rawio) { + if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) { + if (STREQ(rawio, "yes")) { + def->rawio = 1; + } else if (STREQ(rawio, "no")) { + def->rawio = 0; + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown disk rawio setting '%s'"), + rawio); + goto error; + } + } else { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("rawio can be used only with device='lun'")); s/INTERNAL_ERROR/XML_ERROR/
ACK with that change
I've pushed this patch with that, and some other, changes (see attached diff for all changes aside from what you pointed out - I managed to squash that in before I thought to save a diff). Just now I realize I had misinterpreted your comment as being only about the first of those two messages (my brain stopped looking at the email as soon as I hit the first one - you should have done "s/.../.../g" :-), but when I looked at the file I saw the 2nd, I decided it should be VIR_ERR_CONFIG_UNSUPPORTED. Should I push a change to XML_ERROR, or leave it as is? (I *still* get confused about which is more appropriate where, but this one is in kind of a gray area.) I also found that "make check wouldn't pass, which was mostly traced to the concept of making the default value of rawio "-1". The problem with this is that there are other functions that create and fill-in domain structures, and they hadn't been taught about this default value. I changed the object to have a "bool rawio_specified", and modified the parse and format functions accordingly. Also there was no test that exercised the rawio attribute - I added that into the virtio-lun test. (and just now, I realize that I forgot to add a blurb in the docs. I'll do that when I'm done with these emails)

On 01/31/2012 11:53 AM, Laine Stump wrote:
I also found that "make check wouldn't pass, which was mostly traced to the concept of making the default value of rawio "-1". The problem with this is that there are other functions that create and fill-in domain structures, and they hadn't been taught about this default value. I changed the object to have a "bool rawio_specified", and modified the parse and format functions accordingly.
We could have also solved this by using a three-state enum with the default state being 0 rather than -1: enum { VIR_DOMAIN_DISK_RAWIO_DEFAULT = 0, VIR_DOMAIN_DISK_RAWIO_NO = 1, VIR_DOMAIN_DISK_RAWIO_YES = 2, } which we have done for other setups, as that avoids the need for a separate bool value. But I'm not too worried about what you did to bother changing things. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This patch introduces virSetCapabilities() function and implements virCommandAllowCap() function. Existing virClearCapabilities() is function to clear all capabilities. Instead virSetCapabilities() is function to set arbitrary capabilities. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 43 +++++++++++++++++++++++++++++++++++++------ src/util/command.h | 2 -- 2 files changed, 37 insertions(+), 8 deletions(-) Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -103,6 +103,8 @@ struct _virCommand { pid_t pid; char *pidfile; bool reap; + + unsigned long long capabilities; }; /* @@ -182,6 +184,33 @@ static int virClearCapabilities(void) return 0; } + +/** + * virSetCapabilities: + * @capabilities - capability flag to set. + * In case of 0, this function is identical to + * virClearCapabilities() + * + */ +static int virSetCapabilities(unsigned long long capabilities) +{ + int ret, i; + + capng_clear(CAPNG_SELECT_BOTH); + + for (i = 0; i <= CAP_LAST_CAP; i++) { + if (capabilities & (1ULL << i)) + capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); + } + + if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virCommandError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), ret); + return -1; + } + + return 0; +} # else static int virClearCapabilities(void) { @@ -189,6 +218,11 @@ static int virClearCapabilities(void) // "capabilities"); return 0; } + +static int virSetCapabilities(unsigned long long capabilities) +{ + return 0; +} # endif /** @@ -883,26 +917,23 @@ virCommandClearCaps(virCommandPtr cmd) cmd->flags |= VIR_EXEC_CLEAR_CAPS; } -#if 0 /* XXX Enable if we have a need for capability management. */ - /** * virCommandAllowCap: * @cmd: the command to modify * @capability: what to allow * - * Re-allow a specific capability + * Allow specific capabilities */ void virCommandAllowCap(virCommandPtr cmd, - int capability ATTRIBUTE_UNUSED) + int capability) { if (!cmd || cmd->has_error) return; - /* XXX ? */ + cmd->capabilities |= (1ULL << capability); } -#endif /* 0 */ /** Index: libvirt/src/util/command.h =================================================================== --- libvirt.orig/src/util/command.h +++ libvirt/src/util/command.h @@ -60,10 +60,8 @@ void virCommandSetPidFile(virCommandPtr void virCommandClearCaps(virCommandPtr cmd); -# if 0 void virCommandAllowCap(virCommandPtr cmd, int capability); -# endif void virCommandDaemonize(virCommandPtr cmd);

On Tue, Jan 31, 2012 at 01:50:42PM +0900, Taku Izumi wrote:
This patch introduces virSetCapabilities() function and implements virCommandAllowCap() function.
Existing virClearCapabilities() is function to clear all capabilities. Instead virSetCapabilities() is function to set arbitrary capabilities.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 43 +++++++++++++++++++++++++++++++++++++------ src/util/command.h | 2 -- 2 files changed, 37 insertions(+), 8 deletions(-)
Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -103,6 +103,8 @@ struct _virCommand { pid_t pid; char *pidfile; bool reap; + + unsigned long long capabilities; };
/* @@ -182,6 +184,33 @@ static int virClearCapabilities(void)
return 0; } + +/** + * virSetCapabilities: + * @capabilities - capability flag to set. + * In case of 0, this function is identical to + * virClearCapabilities() + * + */ +static int virSetCapabilities(unsigned long long capabilities) +{ + int ret, i; + + capng_clear(CAPNG_SELECT_BOTH); + + for (i = 0; i <= CAP_LAST_CAP; i++) { + if (capabilities & (1ULL << i)) + capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); + } + + if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virCommandError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), ret); + return -1; + } + + return 0; +} # else static int virClearCapabilities(void) { @@ -189,6 +218,11 @@ static int virClearCapabilities(void) // "capabilities"); return 0; } + +static int virSetCapabilities(unsigned long long capabilities) +{ + return 0; +} # endif
/** @@ -883,26 +917,23 @@ virCommandClearCaps(virCommandPtr cmd) cmd->flags |= VIR_EXEC_CLEAR_CAPS; }
-#if 0 /* XXX Enable if we have a need for capability management. */ - /** * virCommandAllowCap: * @cmd: the command to modify * @capability: what to allow * - * Re-allow a specific capability + * Allow specific capabilities */ void virCommandAllowCap(virCommandPtr cmd, - int capability ATTRIBUTE_UNUSED) + int capability) { if (!cmd || cmd->has_error) return;
- /* XXX ? */ + cmd->capabilities |= (1ULL << capability); }
-#endif /* 0 */
/** Index: libvirt/src/util/command.h =================================================================== --- libvirt.orig/src/util/command.h +++ libvirt/src/util/command.h @@ -60,10 +60,8 @@ void virCommandSetPidFile(virCommandPtr
void virCommandClearCaps(virCommandPtr cmd);
-# if 0 void virCommandAllowCap(virCommandPtr cmd, int capability); -# endif
void virCommandDaemonize(virCommandPtr cmd);
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch extends virExecWithHook() to receive capability information. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -393,6 +393,7 @@ prepareStdFd(int fd, int std) * @hook optional virExecHook function to call prior to exec * @data data to pass to the hook function * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) + * @capabilities capabilities to keep */ static int virExecWithHook(const char *const*argv, @@ -404,7 +405,8 @@ virExecWithHook(const char *const*argv, unsigned int flags, virExecHook hook, void *data, - char *pidfile) + char *pidfile, + unsigned long long capabilities) { pid_t pid; int null = -1, i, openmax; @@ -633,9 +635,9 @@ virExecWithHook(const char *const*argv, /* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if ((flags & VIR_EXEC_CLEAR_CAPS) && - virClearCapabilities() < 0) - goto fork_error; + if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(capabilities) < 0) + goto fork_error; /* Close logging again to ensure no FDs leak to child */ virLogReset(); @@ -723,7 +725,8 @@ virExecWithHook(const char *const*argv A int flags_unused ATTRIBUTE_UNUSED, virExecHook hook ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED) + char *pidfile ATTRIBUTE_UNUSED, + unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2171,7 +2174,8 @@ virCommandRunAsync(virCommandPtr cmd, pi cmd->flags, virCommandHook, cmd, - cmd->pidfile); + cmd->pidfile, + cmd->capabilities); VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid);

On Tue, Jan 31, 2012 at 01:51:22PM +0900, Taku Izumi wrote:
This patch extends virExecWithHook() to receive capability information.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/util/command.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
Index: libvirt/src/util/command.c =================================================================== --- libvirt.orig/src/util/command.c +++ libvirt/src/util/command.c @@ -393,6 +393,7 @@ prepareStdFd(int fd, int std) * @hook optional virExecHook function to call prior to exec * @data data to pass to the hook function * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) + * @capabilities capabilities to keep */ static int virExecWithHook(const char *const*argv, @@ -404,7 +405,8 @@ virExecWithHook(const char *const*argv, unsigned int flags, virExecHook hook, void *data, - char *pidfile) + char *pidfile, + unsigned long long capabilities) { pid_t pid; int null = -1, i, openmax; @@ -633,9 +635,9 @@ virExecWithHook(const char *const*argv,
/* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if ((flags & VIR_EXEC_CLEAR_CAPS) && - virClearCapabilities() < 0) - goto fork_error; + if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(capabilities) < 0) + goto fork_error;
/* Close logging again to ensure no FDs leak to child */ virLogReset(); @@ -723,7 +725,8 @@ virExecWithHook(const char *const*argv A int flags_unused ATTRIBUTE_UNUSED, virExecHook hook ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED) + char *pidfile ATTRIBUTE_UNUSED, + unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2171,7 +2174,8 @@ virCommandRunAsync(virCommandPtr cmd, pi cmd->flags, virCommandHook, cmd, - cmd->pidfile); + cmd->pidfile, + cmd->capabilities);
VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid);
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch revises qemuProcessStart() function for qemu processes to retain CAP_SYS_RAWIO if needed. And in case of that, add taint flag to domain. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/qemu/qemu_domain.c | 3 +++ src/qemu/qemu_process.c | 8 ++++++++ 2 files changed, 11 insertions(+) Index: libvirt/src/qemu/qemu_process.c =================================================================== --- libvirt.orig/src/qemu/qemu_process.c +++ libvirt/src/qemu/qemu_process.c @@ -27,6 +27,7 @@ #include <sys/stat.h> #include <sys/time.h> #include <sys/resource.h> +#include <linux/capability.h> #include "qemu_process.h" #include "qemu_domain.h" @@ -3083,6 +3084,7 @@ int qemuProcessStart(virConnectPtr conn, virCommandPtr cmd = NULL; struct qemuProcessHookData hookData; unsigned long cur_balloon; + int i; hookData.conn = conn; hookData.vm = vm; @@ -3335,6 +3337,12 @@ int qemuProcessStart(virConnectPtr conn, if (driver->clearEmulatorCapabilities) virCommandClearCaps(cmd); + /* in case a certain disk is desirous of CAP_SYS_RAWIO, add this */ + for (i = 0; i < vm->def->ndisks; i++) { + if (vm->def->disks[i]->rawio == 1) + virCommandAllowCap(cmd, CAP_SYS_RAWIO); + } + virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData); virCommandSetOutputFD(cmd, &logfile); Index: libvirt/src/qemu/qemu_domain.c =================================================================== --- libvirt.orig/src/qemu/qemu_domain.c +++ libvirt/src/qemu/qemu_domain.c @@ -1259,6 +1259,9 @@ void qemuDomainObjCheckDiskTaint(struct if (!disk->driverType && driver->allowDiskFormatProbing) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_DISK_PROBING, logFD); + + if (disk->rawio) + qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD); }

On Tue, Jan 31, 2012 at 01:52:27PM +0900, Taku Izumi wrote:
This patch revises qemuProcessStart() function for qemu processes to retain CAP_SYS_RAWIO if needed. And in case of that, add taint flag to domain.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp> --- src/qemu/qemu_domain.c | 3 +++ src/qemu/qemu_process.c | 8 ++++++++ 2 files changed, 11 insertions(+)
Index: libvirt/src/qemu/qemu_process.c =================================================================== --- libvirt.orig/src/qemu/qemu_process.c +++ libvirt/src/qemu/qemu_process.c @@ -27,6 +27,7 @@ #include <sys/stat.h> #include <sys/time.h> #include <sys/resource.h> +#include <linux/capability.h>
#include "qemu_process.h" #include "qemu_domain.h" @@ -3083,6 +3084,7 @@ int qemuProcessStart(virConnectPtr conn, virCommandPtr cmd = NULL; struct qemuProcessHookData hookData; unsigned long cur_balloon; + int i;
hookData.conn = conn; hookData.vm = vm; @@ -3335,6 +3337,12 @@ int qemuProcessStart(virConnectPtr conn, if (driver->clearEmulatorCapabilities) virCommandClearCaps(cmd);
+ /* in case a certain disk is desirous of CAP_SYS_RAWIO, add this */ + for (i = 0; i < vm->def->ndisks; i++) { + if (vm->def->disks[i]->rawio == 1) + virCommandAllowCap(cmd, CAP_SYS_RAWIO); + } + virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData);
virCommandSetOutputFD(cmd, &logfile); Index: libvirt/src/qemu/qemu_domain.c =================================================================== --- libvirt.orig/src/qemu/qemu_domain.c +++ libvirt/src/qemu/qemu_domain.c @@ -1259,6 +1259,9 @@ void qemuDomainObjCheckDiskTaint(struct if (!disk->driverType && driver->allowDiskFormatProbing) qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_DISK_PROBING, logFD); + + if (disk->rawio) + qemuDomainObjTaint(driver, obj, VIR_DOMAIN_TAINT_HIGH_PRIVILEGES, logFD); }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/31/2012 07:04 AM, Daniel P. Berrange wrote:
On Tue, Jan 31, 2012 at 01:52:27PM +0900, Taku Izumi wrote:
This patch revises qemuProcessStart() function for qemu processes to retain CAP_SYS_RAWIO if needed. And in case of that, add taint flag to domain. ACK
Pushed.

The original doc entry for rawio didn't mention the values it could have, the default, or the fact that setting it to "yes" for one disk effectively set it to "yes" for all disks in the domain. --- Pushed under the trivial rule. docs/formatdomain.html.in | 11 ++++++++--- 1 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a025a8e..d58a5e1 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1097,9 +1097,14 @@ actual raw devices, never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic SCSI commands, making it identical to device='disk'). - The optional <code>rawio</code> attribute indicates that the disk - is desirous of rawio capability. This attribute is only valid when - device is "lun". + The optional <code>rawio</code> attribute + (<span class="since">since 0.9.10</span>) indicates whether + the disk is needs rawio capability; valid settings are "yes" + or "no" (default is "no"). If any one disk in a domain has + rawio='yes', rawio capability will be enabled for all disks in + the domain (because, in the case of QEMU, this capability can + only be set on a per-process basis). This attribute is only + valid when device is "lun". The optional <code>snapshot</code> attribute indicates the default behavior of the disk during disk snapshots: "internal" requires a file format such as qcow2 that can store both the -- 1.7.7.5

On 01/31/2012 12:58 PM, Laine Stump wrote:
The original doc entry for rawio didn't mention the values it could have, the default, or the fact that setting it to "yes" for one disk effectively set it to "yes" for all disks in the domain.
That's true for now, but hopefully something we can change in the future (then again, if we change it in the future, we can also update the docs at that time). The idea is that the kernel already has the ability to maintain whitelists of valid ioctls per device, and it already has a cgroup for maintaining which devices can be accessed by various process groups. The logical conclusion - we need a new cgroup control file under the 'devices' cgroup which lets us tune the per-device whitelist of whether rawio ioctls are allowed. Once that is in place in the kernel, then libvirt can be taught to expose rawio only for the specified device, rather than its current behavior of forcing the enablement of rawio on all other devices tied to the process.
---
Pushed under the trivial rule.
docs/formatdomain.html.in | 11 ++++++++--- 1 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a025a8e..d58a5e1 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1097,9 +1097,14 @@ actual raw devices, never for individual partitions or LVM partitions (in those cases, the kernel will reject the generic SCSI commands, making it identical to device='disk'). - The optional <code>rawio</code> attribute indicates that the disk - is desirous of rawio capability. This attribute is only valid when - device is "lun". + The optional <code>rawio</code> attribute + (<span class="since">since 0.9.10</span>) indicates whether + the disk is needs rawio capability; valid settings are "yes" + or "no" (default is "no"). If any one disk in a domain has + rawio='yes', rawio capability will be enabled for all disks in + the domain (because, in the case of QEMU, this capability can + only be set on a per-process basis). This attribute is only + valid when device is "lun".
If anything, it might be worth tweaking this wording to state: Currently, if any one disk in a domain has rawio='yes', this implies that all other devices can use rawio; but this is expected to change in the future if the kernel is improved to limit rawio to a per-device rather than a per-process basis. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump
-
Taku Izumi