[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people who are, and who can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Currently, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the latter
makes sense.
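A minimal sketch of the "always use vhost when it's available" policy, in C. The "vhost,fd=X" form is only the guess made above (vhost-net is not merged yet), and build_net_arg() and its vhost_available parameter are hypothetical names for illustration, not libvirt or qemu API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the value of the -net option for an already-open tun/tap fd.
 * ASSUMPTION: "vhost,fd=X" is the speculative syntax from this mail,
 * not a confirmed qemu option. Returns the number of characters
 * written, or -1 if the buffer is too small.
 */
int build_net_arg(char *buf, size_t len, int tapfd, int vhost_available)
{
    assert(buf != NULL && tapfd >= 0);

    /* Prefer the in-kernel backend, fall back to plain tap. */
    const char *backend = vhost_available ? "vhost" : "tap";
    int n = snprintf(buf, len, "%s,fd=%d", backend, tapfd);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With this shape no user-visible switch is needed: the caller probes once whether vhost is usable and passes the result in, and the rest of the command line stays the same.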
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
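As a rough sketch of what "probing the device type" could mean on Linux: bridge devices expose a "bridge" directory under their /sys/class/net entry, so its presence can distinguish br0 from a plain NIC such as eth0. The function name net_is_bridge() and the sysfs_root parameter (there so the check can be pointed at a different tree) are assumptions for illustration, not existing libvirt code.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if <sysfs_root>/<ifname>/bridge is a directory, which is how
 * the Linux bridge driver marks bridge devices under /sys/class/net;
 * return 0 otherwise (including when the interface does not exist).
 */
int net_is_bridge(const char *sysfs_root, const char *ifname)
{
    char path[512];
    struct stat sb;

    assert(sysfs_root != NULL && ifname != NULL);

    int n = snprintf(path, sizeof(path), "%s/%s/bridge", sysfs_root, ifname);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;

    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}
```

A caller could use net_is_bridge("/sys/class/net", name) to keep today's behaviour for bridges and choose the direct-to-device mode for anything else.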
Another model would be to have libvirt see an SR-IOV adapter as a
network pool where it handles all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
1 year, 1 month
[libvirt] [PATCH 0/4] Multiple problems with saving to block devices
by Daniel P. Berrange
This patch series makes it possible to save to a block device,
instead of a plain file. There were multiple problems:
- When save failed, we might de-reference a NULL pointer
- When save failed, we unlinked the device node !!
- The approach of using >> to append doesn't work with block devices
- CGroups was blocking QEMU access to the block device when enabled
One remaining problem is not in libvirt, but rather QEMU. The QEMU
exec: based migration often fails to detect failure of the command
and will thus hang forever attempting a migration that'll never
succeed! Fortunately you can now work around this in libvirt using
the virsh domjobabort command.
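To illustrate the "unlinked the device node" problem, here is a minimal sketch of a cleanup helper that removes the save target only when it is a regular file. This is not the code from the series; cleanup_failed_save() is a hypothetical name and the approach (checking the file type with stat()) is an assumption about one reasonable way to guard the unlink.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Clean up a save target after a failed save.
 * Only regular files are removed; block devices and other special
 * nodes are left in place. Returns 1 if the path was unlinked,
 * 0 if it was left alone, -1 on stat/unlink error.
 */
int cleanup_failed_save(const char *path)
{
    struct stat sb;

    assert(path != NULL);

    if (stat(path, &sb) < 0)
        return -1;
    if (!S_ISREG(sb.st_mode))
        return 0;            /* e.g. a block device: never remove the node */

    return unlink(path) == 0 ? 1 : -1;
}
```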
11 years, 9 months
[libvirt] [PATCH v2] Support for qemu aio drive option
by Matthias Dahl
Revised patch against libvirt 0.7.6 to support qemu's aio option.
qemu lets the user choose which I/O storage API is used: either the
default (threads) or native (Linux AIO), the latter of which can result in
better performance.
This patch exposes this functionality through libvirt.
Thanks a lot to Eric Blake and Matthias Bolte for their comments.
---
docs/formatdomain.html.in | 4 +++-
docs/schemas/domain.rng | 13 +++++++++++--
src/conf/domain_conf.c | 23 +++++++++++++++++++++++
src/conf/domain_conf.h | 10 ++++++++++
src/qemu/qemu_conf.c | 14 ++++++++++++++
src/qemu/qemu_conf.h | 2 ++
6 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index ce49f7d..ec85ef3 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -478,7 +478,9 @@
attribute is the primary backend driver name, while the optional <code>type</code>
attribute provides the sub-type. The optional <code>cache</code> attribute
controls the cache mechanism, possible values are "default", "none",
- "writethrough" and "writeback". <span class="since">Since 0.1.8</span>
+ "writethrough" and "writeback". Specific to KVM guests is the optional <code>aio</code>
+ attribute which controls what storage api is used for io operations, its possible
+ values are "threads" and "native". <span class="since">Since 0.1.8; "aio" attribute since 0.7.7</span>
</dd>
<dt><code>encryption</code></dt>
<dd>If present, specifies how the volume is encrypted. See
diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng
index bb6d00d..5d2dafe 100644
--- a/docs/schemas/domain.rng
+++ b/docs/schemas/domain.rng
@@ -499,8 +499,9 @@
<ref name="driverCache"/>
</group>
</choice>
- <empty/>
- </element>
+ <optional>
+ <ref name="driverAIO"/>
+ </optional>
</define>
<define name="driverFormat">
<attribute name="name">
@@ -521,6 +522,14 @@
</choice>
</attribute>
</define>
+ <define name="driverAIO">
+ <attribute name="aio">
+ <choice>
+ <value>threads</value>
+ <value>native</value>
+ </choice>
+ </attribute>
+ </define>
<define name="controller">
<element name="controller">
<optional>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 766993c..8bee7b8 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -120,6 +120,11 @@ VIR_ENUM_IMPL(virDomainDiskCache, VIR_DOMAIN_DISK_CACHE_LAST,
"writethrough",
"writeback")
+VIR_ENUM_IMPL(virDomainDiskAIO, VIR_DOMAIN_DISK_AIO_LAST,
+ "default",
+ "native",
+ "threads")
+
VIR_ENUM_IMPL(virDomainController, VIR_DOMAIN_CONTROLLER_TYPE_LAST,
"ide",
"fdc",
@@ -1228,6 +1233,7 @@ virDomainDiskDefParseXML(virConnectPtr conn,
char *target = NULL;
char *bus = NULL;
char *cachetag = NULL;
+ char *aiotag = NULL;
char *devaddr = NULL;
virStorageEncryptionPtr encryption = NULL;
char *serial = NULL;
@@ -1293,6 +1299,7 @@ virDomainDiskDefParseXML(virConnectPtr conn,
driverName = virXMLPropString(cur, "name");
driverType = virXMLPropString(cur, "type");
cachetag = virXMLPropString(cur, "cache");
+ aiotag = virXMLPropString(cur, "aio");
} else if (xmlStrEqual(cur->name, BAD_CAST "readonly")) {
def->readonly = 1;
} else if (xmlStrEqual(cur->name, BAD_CAST "shareable")) {
@@ -1409,6 +1416,13 @@ virDomainDiskDefParseXML(virConnectPtr conn,
goto error;
}
+ if (aiotag &&
+ (def->aiomode = virDomainDiskAIOTypeFromString(aiotag)) < 0) {
+ virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+ _("unknown disk aio mode '%s'"), aiotag);
+ goto error;
+ }
+
if (devaddr) {
if (sscanf(devaddr, "%x:%x:%x",
&def->info.addr.pci.domain,
@@ -1450,6 +1464,7 @@ cleanup:
VIR_FREE(driverType);
VIR_FREE(driverName);
VIR_FREE(cachetag);
+ VIR_FREE(aiotag);
VIR_FREE(devaddr);
VIR_FREE(serial);
virStorageEncryptionFree(encryption);
@@ -4565,6 +4580,7 @@ virDomainDiskDefFormat(virConnectPtr conn,
const char *device = virDomainDiskDeviceTypeToString(def->device);
const char *bus = virDomainDiskBusTypeToString(def->bus);
const char *cachemode = virDomainDiskCacheTypeToString(def->cachemode);
+ const char *aiomode = virDomainDiskAIOTypeToString(def->aiomode);
if (!type) {
virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
@@ -4586,6 +4602,11 @@ virDomainDiskDefFormat(virConnectPtr conn,
_("unexpected disk cache mode %d"), def->cachemode);
return -1;
}
+ if (!aiomode) {
+ virDomainReportError(conn, VIR_ERR_INTERNAL_ERROR,
+ _("unexpected disk aio mode %d"), def->aiomode);
+ return -1;
+ }
virBufferVSprintf(buf,
" <disk type='%s' device='%s'>\n",
@@ -4597,6 +4618,8 @@ virDomainDiskDefFormat(virConnectPtr conn,
virBufferVSprintf(buf, " type='%s'", def->driverType);
if (def->cachemode)
virBufferVSprintf(buf, " cache='%s'", cachemode);
+ if (def->aiomode)
+ virBufferVSprintf(buf, " aio='%s'", aiomode);
virBufferVSprintf(buf, "/>\n");
}
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 0b79e88..07805a6 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -141,6 +141,14 @@ enum virDomainDiskCache {
VIR_DOMAIN_DISK_CACHE_LAST
};
+enum virDomainDiskAIO {
+ VIR_DOMAIN_DISK_AIO_DEFAULT,
+ VIR_DOMAIN_DISK_AIO_NATIVE,
+ VIR_DOMAIN_DISK_AIO_THREADS,
+
+ VIR_DOMAIN_DISK_AIO_LAST
+};
+
/* Stores the virtual disk configuration */
typedef struct _virDomainDiskDef virDomainDiskDef;
typedef virDomainDiskDef *virDomainDiskDefPtr;
@@ -154,6 +162,7 @@ struct _virDomainDiskDef {
char *driverType;
char *serial;
int cachemode;
+ int aiomode;
unsigned int readonly : 1;
unsigned int shared : 1;
virDomainDeviceInfo info;
@@ -897,6 +906,7 @@ VIR_ENUM_DECL(virDomainDisk)
VIR_ENUM_DECL(virDomainDiskDevice)
VIR_ENUM_DECL(virDomainDiskBus)
VIR_ENUM_DECL(virDomainDiskCache)
+VIR_ENUM_DECL(virDomainDiskAIO)
VIR_ENUM_DECL(virDomainController)
VIR_ENUM_DECL(virDomainFS)
VIR_ENUM_DECL(virDomainNet)
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 3d83a8f..fd3a670 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -83,6 +83,12 @@ VIR_ENUM_IMPL(qemuDiskCacheV2, VIR_DOMAIN_DISK_CACHE_LAST,
"writethrough",
"writeback");
+VIR_ENUM_DECL(qemuDiskAIO)
+VIR_ENUM_IMPL(qemuDiskAIO, VIR_DOMAIN_DISK_AIO_LAST,
+ "default",
+ "native",
+ "threads");
+
VIR_ENUM_DECL(qemuVideo)
VIR_ENUM_IMPL(qemuVideo, VIR_DOMAIN_VIDEO_TYPE_LAST,
@@ -1137,6 +1143,8 @@ static unsigned int qemudComputeCmdFlags(const char *help,
flags |= QEMUD_CMD_FLAG_DRIVE_CACHE_V2;
if (strstr(help, "format="))
flags |= QEMUD_CMD_FLAG_DRIVE_FORMAT;
+ if (strstr(help, "aio=threads|native"))
+ flags |= QEMUD_CMD_FLAG_DRIVE_AIO;
}
if (strstr(help, "-vga") && !strstr(help, "-std-vga"))
flags |= QEMUD_CMD_FLAG_VGA;
@@ -2340,6 +2348,12 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk,
virBufferAddLit(&opt, ",cache=off");
}
+ if (disk->aiomode && (qemuCmdFlags & QEMUD_CMD_FLAG_DRIVE_AIO)) {
+ const char * mode = qemuDiskAIOTypeToString(disk->aiomode);
+
+ virBufferVSprintf(&opt, ",aio=%s", mode);
+ }
+
if (virBufferError(&opt)) {
virReportOOMError(NULL);
goto error;
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index 101f187..780ae6e 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -82,6 +82,8 @@ enum qemud_cmd_flags {
QEMUD_CMD_FLAG_SDL = (1 << 27), /* Is the new -sdl arg available */
QEMUD_CMD_FLAG_SMP_TOPOLOGY = (1 << 28), /* Is sockets=s,cores=c,threads=t available for -smp? */
QEMUD_CMD_FLAG_NETDEV = (1 << 29), /* The -netdev flag & netdev_add/remove monitor commands */
+
+ QEMUD_CMD_FLAG_DRIVE_AIO = (1 << 30), /* Is -drive aio= avail */
};
/* Main driver state */
--
1.7.0.4
13 years, 11 months
[libvirt] process= support for 'qemu-kvm -name' [Bug 576950]
by John Morrissey
I wrote (attached here, and to the bug) a quick patch that sets the process
name to the same value as the window title.
I'm unsure where to go from here. Should I add support for converting
"native" QEMU command lines to libvirt XML? What would that look like, since
I'm not modifying the libvirt format? Should it just drop any ,process= from
the QEMU command line it's parsing? I also imagine the test cases will need
updating.
If someone could give me some high-level guidance, I'd be happy to keep
going on this.
john
--
John Morrissey _o /\ ---- __o
jwm(a)horde.net _-< \_ / \ ---- < \,
www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__
14 years, 1 month
[libvirt] [PATCH 0/2] speed up qemu domain save by increasing dd blocksize
by Laine Stump
These two patches are posted together because applying the 2nd exposes
the bug fixed by the first.
Here are the results of tests I made with various block sizes before
deciding that 1MB really was the best balance (all tests were done on a
paused 512MB domain, saving to local disk on a Lenovo T61 laptop):
BS      M:SS   save image size (bytes)
-----   ----   -----------------------
2048K   0:56   476135451
1024K   0:56   475090953
 512K   1:02   474564173
 256K   1:10   474303797
 128K   1:25   474176859
  512   3:47   474085423   (the original)
I didn't bother testing sizes between 512 and 128k, as there was still
significant improvement from 128k to 256k.
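One way to read the table is in terms of how many blocks dd has to move. The sketch below is only arithmetic and assumes, which this mail does not state, that per-block overhead is a large part of the cost; blocks_needed() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Number of block-sized chunks needed to move total_bytes with a
 * given block size (rounding the final partial block up).
 */
unsigned long long blocks_needed(unsigned long long total_bytes,
                                 unsigned long long block_size)
{
    assert(block_size > 0);
    return (total_bytes + block_size - 1) / block_size;
}
```

For the image sizes measured above, the original 512-byte block size works out to 925949 blocks, while 1024K needs only 454.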
14 years, 2 months
[libvirt] [PATCH 0/3] test cases for spoofing prevention
by gstenzel@linux.vnet.ibm.com
The following patches add a set of test cases to verify that several spoofing attacks are prevented by the nwfilter subsystem.
In order to have a well-defined test machine, a virtual disk is installed from scratch over the network.
I am currently trying to find a suitable location for the kickstart file.
--
Best regards,
Gerhard Stenzel,
-----------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
14 years, 3 months
[libvirt] inability to open local read-only connection
by Tavares, John
I have been experimenting with using libvirt (0.3.3) on a variety of systems (RHEL, CentOS and Oracle VM). I have run into an issue when I try to open a local read-only connection to the hypervisor: it fails only on Oracle VM server release 2.2.0. I have created a root-owned setuid executable that is effectively running as root, but even so, it still cannot open the local read-only connection to the hypervisor. It only works if I run it directly as root, which is not an option. I do not understand why it works as is on my RHEL and CentOS machines, but not on my Oracle machine. It would seem as though it is not checking whether the effective uid is root, just the real uid.
Has anyone run into a similar issue, or have any suggestions for what I might try to fix it? Or can anyone tell me whether this is a defect that needs to be (or already has been) fixed?
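To make the real-versus-effective uid distinction concrete, here is a small sketch that can be built as a setuid-root binary to see which ids the process actually has. It does not reproduce libvirt's or Oracle VM's check, which is unknown here; the two is_root_*() helpers are hypothetical and only model the two styles of test.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Privilege check keyed on the real uid only. */
int is_root_real(uid_t ruid, uid_t euid)
{
    (void)euid;
    return ruid == 0;
}

/* Privilege check keyed on the effective uid. */
int is_root_effective(uid_t ruid, uid_t euid)
{
    (void)ruid;
    return euid == 0;
}

/* Print both ids and what each style of check would conclude. */
void report_ids(void)
{
    uid_t ruid = getuid();
    uid_t euid = geteuid();

    printf("real uid=%ld effective uid=%ld\n", (long)ruid, (long)euid);
    printf("real-uid check: %d, effective-uid check: %d\n",
           is_root_real(ruid, euid), is_root_effective(ruid, euid));
}
```

In a setuid-root executable started by an ordinary user, the real uid stays that of the user while the effective uid is 0, so the two checks disagree. That is the situation the symptom above would be consistent with.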
Thanks.
14 years, 4 months