[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people that are who can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This is
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Current, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the later
makes sense.
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
Another model would be to have libvirt see an SR-IOV adapter as a
network pool whereas it handled all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
10 months, 3 weeks
[libvirt] [PATCH 0/4] Multiple problems with saving to block devices
by Daniel P. Berrange
This patch series makes it possible to save to a block device,
instead of a plain file. There were multiple problems
- WHen save failed, we might de-reference a NULL pointer
- When save failed, we unlinked the device node !!
- The approach of using >> to append, doesn't work with block devices
- CGroups was blocking QEMU access to the block device when enabled
One remaining problem is not in libvirt, but rather QEMU. The QEMU
exec: based migration often fails to detect failure of the command
and will thus hang forever attempting a migration that'll never
succeed! Fortunately you can now work around this in libvirt using
the virsh domjobabort command
11 years, 5 months
[libvirt] libvirt(-java): virDomainMigrateSetMaxDowntime
by Thomas Treutner
Hi,
I'm facing some troubles with virDomainMigrate &
virDomainMigrateSetMaxDowntime. The core problem is that KVM's default
value for the maximum allowed downtime is 30ms (max_downtime in
migration.c, it's nanoseconds there; 0.12.3) which is too low for my VMs
when they're busy (~50% CPU util and above). Migrations then take
literally forever, I had to abort them after 15 minutes or so. I'm using
GBit Ethernet, so plenty bandwidth should be available. Increasing the
allowed downtime to 50ms seems to help, but I have not tested situations
where the VM is completely utilized. Anyways, the default value is too
low for me, so I tried virDomainMigrateSetMaxDowntime resp. the Java
wrapper function.
Here I'm facing a problem I can overcome only with a quite crude hack:
org.libvirt.Domain.migrate(..) blocks until the migration is done, which
is of course reasonable. So I tried calling migrateSetMaxDowntime(..)
before migrating, causing an error:
"Requested operation is not valid: domain is not being migrated"
This tells me that calling migrateSetMaxDowntime is only allowed during
migrations. As I'm migrating VMs automatically and without any user
intervention I'd need to create some glue code that runs in an extra
thread, waiting "some time" hoping that the migration was kicked off in
the main thread yet and then calling migrateSetMaxDowntime. I'd like to
avoid such quirks in the long run, if possible.
So my question: Would it be possible to extend the migrate() method
resp. virDomainMigrate() function with an optional maxDowntime parameter
that is passed down as QEMU_JOB_SIGNAL_MIGRATE_DOWNTIME so that
qemuDomainWaitForMigrationComplete would set the value? Or are there
easier ways?
Thanks and regards,
-t
13 years
[libvirt] (how much) support for kqemu domain
by John Lumby
I am wondering about the extent to which "old" qemu-0.11.1 and kqemu-1.4.0 are supported by virt-manager.
I see I can specify --virt-type=kqemu on virt-install and it remembers domain type='kqemu', and does things such as refusing to start the vm if the kqemu kernel mod not loaded, but it seems it does not tack on the
-enable-kqemu -kernel-kqemu
options on to the qemu command line. There is really not much point in trying to start a qemu-based vm with neither hardware kvm nor kqemu ...
I can work around it with an override script to intercept the qemu command, but does anyone think virt-manager ought to do this for me?
John Lumby
_________________________________________________________________
13 years, 3 months
[libvirt] xen:/// vs. xen://FQDN/ vs xen+unix:/// discrepancy
by Philipp Hahn
Hello,
I regularly observe the problem, that depending on the libvirt-URL I get
different information:
root@xen4# virsh -c xen://xen4.domain.name/ list
Id Name State
----------------------------------
root@xen4# virsh -c xen:/// list
Id Name State
----------------------------------
0 Domain-0 running
1 dos4 idle
root@xen4# virsh -c xen+unix:/// list
Id Name State
----------------------------------
root@xen4# xm li
Name ID Mem VCPUs State
Time(s)
Domain-0 0 2044 2 r----- 419.7
dos4 1 4 1 -b---- 33.4
Has somebody observed the same problem and/or can give me any advise on where
to look for the problem?
I'm running a Debian-Lenny based OS with libvirt-0.8.2 on Xen-3.4.3 with a
2.6.32 kernel.
Sincerely
Philipp Hahn
--
Philipp Hahn Open Source Software Engineer hahn(a)univention.de
Univention GmbH Linux for Your Business fon: +49 421 22 232- 0
Mary-Somerville-Str.1 28359 Bremen fax: +49 421 22 232-99
http://www.univention.de
13 years, 6 months
[libvirt] [PATCH 1/2] build-sys: fix build when daemon is disabled by not installing libvirtd.8
by Diego Elio Pettenò
Since the rule to build libvirtd.8 is within the WITH_LIBVIRTD conditional,
so declare the man page in there as well. Without this change, build
without daemon will fail.
---
.gnulib | 2 +-
daemon/Makefile.am | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/.gnulib b/.gnulib
index 1629006..11fbc57 160000
--- a/.gnulib
+++ b/.gnulib
@@ -1 +1 @@
-Subproject commit 1629006348e1f66f07ce3ddcf3ebd2d14556cfce
+Subproject commit 11fbc57405a118e6ec9a3ebc19bbf5ececdae4d6
diff --git a/daemon/Makefile.am b/daemon/Makefile.am
index 963d64f..dbf0ac3 100644
--- a/daemon/Makefile.am
+++ b/daemon/Makefile.am
@@ -41,12 +41,12 @@ EXTRA_DIST = \
$(AVAHI_SOURCES) \
$(DAEMON_SOURCES)
-man_MANS = libvirtd.8
-
BUILT_SOURCES =
if WITH_LIBVIRTD
+man_MANS = libvirtd.8
+
sbin_PROGRAMS = libvirtd
confdir = $(sysconfdir)/libvirt/
--
1.7.2
13 years, 7 months
[libvirt] [PATCH v2] Support for qemu aio drive option
by Matthias Dahl
Revised patch against libvirt 0.7.6 to support qemu's aio option.
qemu allows the user to choose what io storage api should be used, either the
default (threads) or native (linux aio) which in the latter case can result in
better performance.
This patch exposes this functionality through libvirt.
Thanks a lot to Eric Blake and Matthias Bolte for their comments.
---
docs/formatdomain.html.in | 4 +++-
docs/schemas/domain.rng | 13 +++++++++++--
src/conf/domain_conf.c | 23 +++++++++++++++++++++++
src/conf/domain_conf.h | 10 ++++++++++
src/qemu/qemu_conf.c | 14 ++++++++++++++
src/qemu/qemu_conf.h | 2 ++
6 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index ce49f7d..ec85ef3 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -478,7 +478,9 @@
attribute is the primary backend driver name, while the optional <code>type</code>
attribute provides the sub-type. The optional <code>cache</code> attribute
controls the cache mechanism, possible values are "default", "none",
- "writethrough" and "writeback". <span class="since">Since 0.1.8</span>
+ "writethrough" and "writeback". Specific to KVM guests is the optional <code>aio</code>
+ attribute which controls what storage api is used for io operations, its possible
+ values are "threads" and "native". <span class="since">Since 0.1.8; "aio" attribute since 0.7.7</span>
</dd>
<dt><code>encryption</code></dt>
<dd>If present, specifies how the volume is encrypted. See
diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng
index bb6d00d..5d2dafe 100644
--- a/docs/schemas/domain.rng
+++ b/docs/schemas/domain.rng
@@ -499,8 +499,9 @@
<ref name="driverCache"/>
</group>
</choice>
- <empty/>
- </element>
+ <optional>
+ <ref name="driverAIO"/>
+ </optional>
</define>
<define name="driverFormat">
<attribute name="name">
@@ -521,6 +522,14 @@
</choice>
</attribute>
</define>
+ <define name="driverAIO">
+ <attribute name="aio">
+ <choice>
+ <value>threads</value>
+ <value>native</value>
+ </choice>
+ </attribute>
+ </define>
<define name="controller">
<element name="controller">
<optional>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 766993c..8bee7b8 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -120,6 +120,11 @@ VIR_ENUM_IMPL(virDomainDiskCache, VIR_DOMAIN_DISK_CACHE_LAST,
"writethrough",
"writeback")
+VIR_ENUM_IMPL(virDomainDiskAIO, VIR_DOMAIN_DISK_AIO_LAST,
+ "default",
+ "native",
+ "threads")
+
VIR_ENUM_IMPL(virDomainController, VIR_DOMAIN_CONTROLLER_TYPE_LAST,
"ide",
"fdc",
@@ -1228,6 +1233,7 @@ virDomainDiskDefParseXML(virConnectPtr conn,
char *target = NULL;
char *bus = NULL;
char *cachetag = NULL;
+ char *aiotag = NULL;
char *devaddr = NULL;
virStorageEncryptionPtr encryption = NULL;
char *serial = NULL;
@@ -1293,6 +1299,7 @@ virDomainDiskDefParseXML(virConnectPtr conn,
driverName = virXMLPropString(cur, "name");
driverType = virXMLPropString(cur, "type");
cachetag = virXMLPropString(cur, "cache");
+ aiotag = virXMLPropString(cur, "aio");
} else if (xmlStrEqual(cur->name, BAD_CAST "readonly")) {
def->readonly = 1;
} else if (xmlStrEqual(cur->name, BAD_CAST "shareable")) {
@@ -1409,6 +1416,13 @@ virDomainDiskDefParseXML(virConnectPtr conn,
goto error;
}
+ if (aiotag &&
+ (def->aiomode = virDomainDiskAIOTypeFromString(aiotag)) < 0) {
+ virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+ _("unknown disk aio mode '%s'"), aiotag);
+ goto error;
+ }
+
if (devaddr) {
if (sscanf(devaddr, "%x:%x:%x",
&def->info.addr.pci.domain,
@@ -1450,6 +1464,7 @@ cleanup:
VIR_FREE(driverType);
VIR_FREE(driverName);
VIR_FREE(cachetag);
+ VIR_FREE(aiotag);
VIR_FREE(devaddr);
VIR_FREE(serial);
virStorageEncryptionFree(encryption);
@@ -4565,6 +4580,7 @@ virDomainDiskDefFormat(virConnectPtr conn,
const char *device = virDomainDiskDeviceTypeToString(def->device);
const char *bus = virDomainDiskBusTypeToString(def->bus);
const char *cachemode = virDomainDiskCacheTypeToString(def->cachemode);
+ const char *aiomode = virDomainDiskAIOTypeToString(def->aiomode);
if (!type) {
virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
@@ -4586,6 +4602,11 @@ virDomainDiskDefFormat(virConnectPtr conn,
_("unexpected disk cache mode %d"), def->cachemode);
return -1;
}
+ if (!aiomode) {
+ virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+ _("unexpected disk aio mode %d"), def->aiomode);
+ return -1;
+ }
virBufferVSprintf(buf,
" <disk type='%s' device='%s'>\n",
@@ -4597,6 +4618,8 @@ virDomainDiskDefFormat(virConnectPtr conn,
virBufferVSprintf(buf, " type='%s'", def->driverType);
if (def->cachemode)
virBufferVSprintf(buf, " cache='%s'", cachemode);
+ if (def->aiomode)
+ virBufferVSprintf(buf, " aio='%s'", aiomode);
virBufferVSprintf(buf, "/>\n");
}
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 0b79e88..07805a6 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -141,6 +141,14 @@ enum virDomainDiskCache {
VIR_DOMAIN_DISK_CACHE_LAST
};
+enum virDomainDiskAIO {
+ VIR_DOMAIN_DISK_AIO_DEFAULT,
+ VIR_DOMAIN_DISK_AIO_NATIVE,
+ VIR_DOMAIN_DISK_AIO_THREADS,
+
+ VIR_DOMAIN_DISK_AIO_LAST
+};
+
/* Stores the virtual disk configuration */
typedef struct _virDomainDiskDef virDomainDiskDef;
typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -154,6 +162,7 @@ struct _virDomainDiskDef {
char *driverType;
char *serial;
int cachemode;
+ int aiomode;
unsigned int readonly : 1;
unsigned int shared : 1;
virDomainDeviceInfo info;
@@ -897,6 +906,7 @@ VIR_ENUM_DECL(virDomainDisk)
VIR_ENUM_DECL(virDomainDiskDevice)
VIR_ENUM_DECL(virDomainDiskBus)
VIR_ENUM_DECL(virDomainDiskCache)
+VIR_ENUM_DECL(virDomainDiskAIO)
VIR_ENUM_DECL(virDomainController)
VIR_ENUM_DECL(virDomainFS)
VIR_ENUM_DECL(virDomainNet)
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 3d83a8f..fd3a670 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -83,6 +83,12 @@ VIR_ENUM_IMPL(qemuDiskCacheV2, VIR_DOMAIN_DISK_CACHE_LAST,
"writethrough",
"writeback");
+VIR_ENUM_DECL(qemuDiskAIO)
+VIR_ENUM_IMPL(qemuDiskAIO, VIR_DOMAIN_DISK_AIO_LAST,
+ "default",
+ "native",
+ "threads");
+
VIR_ENUM_DECL(qemuVideo)
VIR_ENUM_IMPL(qemuVideo, VIR_DOMAIN_VIDEO_TYPE_LAST,
@@ -1137,6 +1143,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
flags |= QEMUD_CMD_FLAG_DRIVE_CACHE_V2;
if (strstr(help, "format="))
flags |= QEMUD_CMD_FLAG_DRIVE_FORMAT;
+ if (strstr(help, "aio=threads|native"))
+ flags |= QEMUD_CMD_FLAG_DRIVE_AIO;
}
if (strstr(help, "-vga") && !strstr(help, "-std-vga"))
flags |= QEMUD_CMD_FLAG_VGA;
@@ -2340,6 +2348,12 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk,
virBufferAddLit(&opt, ",cache=off");
}
+ if (disk->aiomode && (qemuCmdFlags & QEMUD_CMD_FLAG_DRIVE_AIO)) {
+ const char * mode = qemuDiskAIOTypeToString(disk->aiomode);
+
+ virBufferVSprintf(&opt, ",aio=%s", mode);
+ }
+
if (virBufferError(&opt)) {
virReportOOMError(NULL);
goto error;
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index 101f187..780ae6e 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -82,6 +82,8 @@ enum qemud_cmd_flags {
QEMUD_CMD_FLAG_SDL = (1 << 27), /* Is the new -sdl arg available */
QEMUD_CMD_FLAG_SMP_TOPOLOGY = (1 << 28), /* Is sockets=s,cores=c,threads=t available for -smp? */
QEMUD_CMD_FLAG_NETDEV = (1 << 29), /* The -netdev flag & netdev_add/remove monitor commands */
+
+ QEMUD_CMD_FLAG_DRIVE_AIO = (1 << 30), /* Is -drive aio= avail */
};
/* Main driver state */
--
1.7.0.4
13 years, 8 months
[libvirt] [PATCH] initgroups() in qemudOpenAsUID()
by Dan Kenigsberg
qemudOpenAsUID is intended to open a file with the credentials of a
specified uid. Current implementation fails if the file is accessible to
one of uid's groups but not owned by uid.
This patch replaces the supplementary group list that the child process
inherited from libvirtd with the default group list of uid.
---
src/qemu/qemu_driver.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0ce2d40..a1027d4 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6353,6 +6353,7 @@ parent_cleanup:
char *buf = NULL;
size_t bufsize = 1024 * 1024;
int bytesread;
+ struct passwd *pwd;
/* child doesn't need the read side of the pipe */
close(pipefd[0]);
@@ -6365,6 +6366,21 @@ parent_cleanup:
goto child_cleanup;
}
+ /* we can avoid getpwuid_r() in threadless child */
+ if ((pwd = getpwuid(uid)) == NULL) {
+ exit_code = errno;
+ virReportSystemError(errno,
+ _("cannot setuid(%d) to read '%s'"),
+ uid, path);
+ goto child_cleanup;
+ }
+ if (initgroups(pwd->pw_name, pwd->pw_gid) != 0) {
+ exit_code = errno;
+ virReportSystemError(errno,
+ _("cannot setuid(%d) to read '%s'"),
+ uid, path);
+ goto child_cleanup;
+ }
if (setuid(uid) != 0) {
exit_code = errno;
virReportSystemError(errno,
--
1.7.2.3
13 years, 8 months