[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people who are, and they can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Currently, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the latter
makes sense.
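
A minimal sketch of the "always use vhost when it's available" choice
described above. The -net vhost,fd=X form is only the guess made in this
mail, not a confirmed qemu option, and build_net_args and vhost_available
are hypothetical names used for illustration.

```python
def build_net_args(tap_fd, vhost_available):
    """Return the qemu -net arguments for an already-open tun/tap fd.

    Assumption: '-net vhost,fd=X' is the speculative syntax from this
    thread; '-net tap,fd=X' is what libvirt passes today.
    """
    if tap_fd < 0:
        raise ValueError("tap_fd must be an open file descriptor")
    # No user-visible switch: prefer vhost whenever the host offers it.
    backend = "vhost" if vhost_available else "tap"
    return ["-net", f"{backend},fd={tap_fd}"]
```

With a user-visible switch instead, vhost_available would be combined with
a per-interface setting before choosing the backend.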
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
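
A rough sketch of what "probing the device type" could look like on Linux,
assuming the usual sysfs layout: a bridge has a "bridge" directory under
/sys/class/net/<name>, and a hardware-backed interface has a "device"
entry. classify_netdev and its sysfs_root parameter are hypothetical and
are not part of libvirt.

```python
import os


def classify_netdev(name, sysfs_root="/sys/class/net"):
    """Classify an interface as 'bridge', 'physical', 'virtual' or 'missing'.

    sysfs_root is a parameter only so the check can be exercised against a
    fake directory tree.
    """
    dev = os.path.join(sysfs_root, name)
    if not os.path.isdir(dev):
        return "missing"
    if os.path.isdir(os.path.join(dev, "bridge")):
        return "bridge"      # e.g. br0: keep today's bridged tap behaviour
    if os.path.exists(os.path.join(dev, "device")):
        return "physical"    # e.g. eth0 or a VF: candidate for direct vhost
    return "virtual"         # e.g. lo, tap devices
```

A caller could then route 'bridge' to the existing tap setup and 'physical'
to the direct vhost-net mode.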
Another model would be to have libvirt see an SR-IOV adapter as a
network pool where it handles all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
1 year
[libvirt] Libvirt multi queue support
by Naor Shlomo
Hello experts,
Could anyone please tell me if Multi Queue is fully supported in Libvirt and, if so, which version contains it?
Thanks,
Naor
8 years, 4 months
[libvirt] [PATCH] qemu: add PCI-multibus support for ppc
by Olivia Yin
Signed-off-by: Olivia Yin <hong-hua.yin(a)freescale.com>
---
src/qemu/qemu_capabilities.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 7bc1ebc..7d7791d 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2209,6 +2209,11 @@ virQEMUCapsInitHelp(virQEMUCapsPtr qemuCaps, uid_t runUid, gid_t runGid)
virQEMUCapsClear(qemuCaps, QEMU_CAPS_NO_ACPI);
}
+ /* ppc support PCI-multibus */
+ if (qemuCaps->arch == VIR_ARCH_PPC) {
+ virQEMUCapsSet(qemuCaps, QEMU_CAPS_PCI_MULTIBUS);
+ }
+
/* virQEMUCapsExtractDeviceStr will only set additional caps if qemu
* understands the 0.13.0+ notion of "-device driver,". */
if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE) &&
@@ -2450,6 +2455,11 @@ virQEMUCapsInitQMP(virQEMUCapsPtr qemuCaps,
virQEMUCapsSet(qemuCaps, QEMU_CAPS_NO_ACPI);
}
+ /* ppc support PCI-multibus */
+ if (qemuCaps->arch == VIR_ARCH_PPC) {
+ virQEMUCapsSet(qemuCaps, QEMU_CAPS_PCI_MULTIBUS);
+ }
+
if (virQEMUCapsProbeQMPCommands(qemuCaps, mon) < 0)
goto cleanup;
if (virQEMUCapsProbeQMPEvents(qemuCaps, mon) < 0)
--
1.6.4
10 years, 8 months
[libvirt] Add patches to allow users to join running containers.
by dwalsh@redhat.com
[PATCH 1/2] Add virGetUserDirectoryByUID to retrieve users homedir
[PATCH 2/2] virt-login-shell joins users into lxc container.
This patch implements most of the changes suggested by Dan Berrange and
Eric Blake.
Some replies to suggested changes.
Removed mingw-libvirt.spec.in changes since virt lxc probably can not be
supported in Windows. Not sure if I need to make changes so my code will not
build on that platform.
Did not make the changes to install virt-login-shell as 4755 automatically.
I guess I want a firmer "make that change" request first...
I did not make a helper function to parse a list of strings out of conf file.
The getuid and getgid calls return the user that executed the program; when the app is setuid, geteuid and getegid return "0". I believe getuid and getgid are correct.
Added virt-login-shell --help, not sure what --program would do?
The program is hard coded to LXC because there is no way that I know of for a
process to join a running qemu instance.
I have heard back from one security review from Miloslav Trmac, who had similar comments as Eric.
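
To illustrate the getuid versus geteuid point above, here is a small Python
analogue of looking up the invoking user's home directory by UID. It is
only a sketch of the idea behind virGetUserDirectoryByUID, not the C code
from the patch, and invoking_user_home is a hypothetical name.

```python
import os
import pwd


def invoking_user_home():
    """Return (uid, home) for the user who ran the program.

    os.getuid() is the real UID, i.e. the caller, even when the binary is
    setuid; os.geteuid() would be 0 in that case. home is None when the
    UID has no passwd entry.
    """
    uid = os.getuid()
    try:
        home = pwd.getpwuid(uid).pw_dir
    except KeyError:
        home = None
    return uid, home
```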
10 years, 10 months
[libvirt] JNA Error Callback could cause core dump.
by Benjamin Wang (gendwang)
Hi,
When I changed the code as follows:
public class Connect {
// Load the native part
static {
Libvirt.INSTANCE.virInitialize();
try {
ErrorHandler.processError(Libvirt.INSTANCE);
} catch (Exception e) {
e.printStackTrace();
}
+ Libvirt.INSTANCE.virSetErrorFunc(null, new ErrorCallback());
}
The server will generate the following core dump:
Program terminated with signal 6, Aborted.
#0 0x0000003f9b030265 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x0000003f9b030265 in raise () from /lib64/libc.so.6
#1 0x0000003f9b031d10 in abort () from /lib64/libc.so.6
#2 0x0000003f9b06a84b in __libc_message () from /lib64/libc.so.6
#3 0x0000003f9b07230f in _int_free () from /lib64/libc.so.6
#4 0x0000003f9b07276b in free () from /lib64/libc.so.6
#5 0x00002aaaacf46868 in ?? ()
#6 0x0000000000000000 in ?? ()
The problem is that when JNA calls virSetErrorFunc, the ErrorCallback object is created, but when GC runs, the object is collected. Even if I change the code as follows, the callback object will be moved when GC is executed, and then C can't find the object. Both scenarios will cause a core dump. It seems that JNA shouldn't provide the ErrorCallback class,
because nobody can use it.
Please correct me if I am wrong.
public class Connect {
+ private static final ErrorCallback callback = new ErrorCallback();
// Load the native part
static {
Libvirt.INSTANCE.virInitialize();
try {
ErrorHandler.processError(Libvirt.INSTANCE);
} catch (Exception e) {
e.printStackTrace();
}
+ Libvirt.INSTANCE.virSetErrorFunc(null, callback);
}
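
The lifetime difference between the two variants in this mail can be
imitated in plain Python. This is not JNA and it does not model object
relocation; FakeNativeSide is a hypothetical stand-in for native code that
remembers a callback without keeping it alive. Whether a strongly
referenced JNA callback can still be invalidated by GC is the open question
of this mail and is not settled by the sketch.

```python
import gc
import weakref


class ErrorCallback:
    def __call__(self, message):
        return f"error: {message}"


class FakeNativeSide:
    """Remembers the callback weakly, as native code holds no managed reference."""

    def __init__(self):
        self._ref = None

    def set_error_func(self, callback):
        self._ref = weakref.ref(callback)

    def callback_alive(self):
        return self._ref is not None and self._ref() is not None


# First variant: a temporary object that nothing else references.
native_a = FakeNativeSide()
native_a.set_error_func(ErrorCallback())
gc.collect()

# Second variant: a module-level constant holds a strong reference.
CALLBACK = ErrorCallback()
native_b = FakeNativeSide()
native_b.set_error_func(CALLBACK)
gc.collect()
```

After this runs, native_a no longer has a live callback while native_b
does, which mirrors why the first variant is the more fragile one.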
B.R.
Benjamin Wang
10 years, 10 months
[libvirt] [PATCH 0/4] Support for integrating cgroups with systemd
by Daniel P. Berrange
From: "Daniel P. Berrange" <berrange(a)redhat.com>
This is a much changed / expanded version of my previous work to
create cgroups via systemd. The difference is that this time it
actually works :-)
I'm not proposing this for merge until after the 1.1.1 release.
Daniel P. Berrange (4):
Add APIs for formatting systemd slice/scope names
Add support for systemd cgroup mount
Cope with races while killing processes
Enable support for systemd-machined in cgroups creation
src/libvirt_private.syms | 2 +
src/lxc/lxc_process.c | 10 +-
src/qemu/qemu_cgroup.c | 1 +
src/util/vircgroup.c | 270 +++++++++++++++++++++++++++++++++++++++++------
src/util/vircgroup.h | 2 +
src/util/virsystemd.c | 96 ++++++++++++++++-
src/util/virsystemd.h | 5 +
tests/vircgrouptest.c | 9 ++
tests/virsystemdtest.c | 48 +++++++++
9 files changed, 403 insertions(+), 40 deletions(-)
--
1.8.1.4
10 years, 10 months
[libvirt] pvpanic plans?
by Paolo Bonzini
The thread from yesterday has died off (perhaps also because of
my inappropriate answer to Michael, for which I apologize to him
and everyone). I took some time to discuss the libvirt requirements
further with Daniel Berrange and Eric Blake on IRC. If anyone is
interested, I can give logs. This is a suggestion for how to
proceed in both QEMU and libvirt.
== Builtin pvpanic ==
QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not
break migration.
== Support in libvirt for current functionality ==
libvirt will add a <panic-notifier/> element, and possibly a capability
for it accessible via "virsh capabilities". There are two possibilities:
1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type
other than pc-1.5), <on_crash> will only work if the element is there.
On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type,
<on_crash> will be obeyed always, and may override e.g. reboot-on-panic
if a guest driver exists.
2) On all versions, <on_crash> will only work if the element is there.
In turn, there are two ways to implement (2):
2a) libvirt will always add -global pvpanic.iobase=0 to neutralize
the builtin pvpanic device if present. <panic-notifier/>
will create the device with -device pvpanic,iobase=0x505
Advantage: no changes to QEMU
Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0}
and pc-1.5 machine type will write to a pvpanic device instead of
the DMA controller. Probably harmless, and limited to some QEMU
versions.
Disadvantage 2: libvirt has knowledge of the pvpanic port number
2b) QEMU will provide a way for libvirt to detect that no machine type
has the builtin pvpanic. If some machine type may have the builtin
pvpanic, and <panic-notifier/> is absent, libvirt will add
"-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt
will create the device normally.
A possible way for libvirt to detect "good" machine types is a
dummy property. This is a bit ugly in that the property would not
affect the behavior of the device. The property would remain in
the long term.
Another possibility is for QEMU to rename the device, e.g. to
isa-pvpanic. This is also somewhat gross, but not visible in the
long term when the "pvpanic" name will be lost in history.
Advantage 1: libvirt has no knowledge of the pvpanic port number
Disadvantage 1: same as above
Disadvantage 2: need a somewhat gross change in QEMU
This method also provides an (also somewhat gross on the QEMU side)
way to detect other changes in the pvpanic semantics. One example,
mentioned below, is making the panicked state temporary.
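
Option (2a) above can be sketched as follows. The two option strings are
the ones quoted in (2a); pvpanic_args_2a and PVPANIC_IOBASE are
hypothetical names, and this is not libvirt's actual command-line builder.

```python
PVPANIC_IOBASE = 0x505  # port number quoted in option (2a)


def pvpanic_args_2a(panic_notifier):
    """Build the pvpanic-related QEMU arguments for option (2a).

    The builtin device is always neutralized; an explicit device is added
    only when <panic-notifier/> is present in the domain XML.
    """
    args = ["-global", "pvpanic.iobase=0"]
    if panic_notifier:
        args += ["-device", f"pvpanic,iobase={PVPANIC_IOBASE:#x}"]
    return args
```

The hard-coded port is exactly "Disadvantage 2" of (2a): libvirt has to
know the pvpanic port number.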
== Possible improvements to pvpanic ==
The current implementation of pvpanic supports three modes: reset system
on panic, destroy domain on panic, preserve domain with no possibility
to resume it. (Optionally a domain can be dumped too).
Long term, the choice to include pvpanic should not be on the guest
admin's shoulders, but rather in libosinfo. Thus, it would be nice to
have a fourth mode where the panic is logged but the guest otherwise
keeps running. This mode would let libosinfo add pvpanic by default
without affecting the guest's behavior on panic.
With this change, <on_crash>ignore</on_crash> will behave as follows
for the three possibilities above:
(1) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting,
never (even if no <panic-notifier/> is specified).
libvirt will have to pick a fallback action.
advantage of destroy as fallback: it is the default (but
note that restart is the default for virt-install)
advantage of preserve as fallback: lets the admin examine
the panic
advantage of restart as fallback: maximum availability of
the VM, it is the default for virt-install
(2a) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting
if <panic-notifier/> is specified. libvirt has _no way_ to learn
about this, so the capability would always be present with these
QEMU versions and libosinfo would always add <panic-notifier/> with
these versions. Given the libosinfo scenario being considered here,
this is not very different from (1).
(2b) With QEMU 1.5.0 to 1.6.1, the <panic-notifier/> element will not
be available and not exposed in libvirt capabilities. Thus with
this version libosinfo would omit <panic-notifier/> from the XML.
Guest policy will always be followed correctly.
The problem in both (1) and (2a) can be summarized as follows. First,
libvirt will have to implement and document a fallback action for buggy
QEMU. Second, even though the problems would be limited to some version
of QEMU, they would be relatively hard to debug for a casual user, could
start happening randomly by updating any one of QEMU, libvirt, libosinfo
or the guest kernel, and there is no fallback action for libvirt that is
always correct.
Thus, considering future libosinfo support for pvpanic, (2b) is preferable
in my opinion.
Now, making pvpanic reversible requires a change in QEMU (patch already
posted). Andreas proposed using a pvpanic property to determine whether
the panicked state is temporary or definitive. libvirt could piggyback
on such a property to detect the "goodness" of machine types (as mentioned
regarding solution 2b above). However:
First, this would require a more intrusive patch, less appealing for
1.5 and 1.6 stable branches. Second, there is no reason why libvirt would
want to make the panicked state definitive. To achieve the same effect,
libvirt can just not issue the "continue" monitor command when the guest
is panicked. Thus the new property would be useless except to communicate
pvpanic behavior---and renaming the device still seems preferable to me.
Thanks for reading up to here!
Paolo
11 years