[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people that are who can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This is
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Current, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the later
makes sense.
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
Another model would be to have libvirt see an SR-IOV adapter as a
network pool whereas it handled all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
1 year
[libvirt] Libvirt multi queue support
by Naor Shlomo
Hello experts,
Could anyone please tell me if Multi Queue it fully supported in Libvirt and if so what version contains it?
Thanks,
Naor
8 years, 5 months
[libvirt] [PATCH/RFC] Add missing delta from Ubuntu to apparmor profiles
by Stefan Bader
This had been on the Debian package list before but its time to take
this onwards. So the goal would be to have one set to rule them all
(when using apparmor) and drop the seperate set of definitions which
exist at least in the Ubuntu packaging.
Right now the patch would be at a state which adds all missing files
and rules to the current examples in libvirt and installs them when
using --with-apparmor-profiles.
One problem seems to be that some of the definitions might cause
parse failures on certain versions of apparmor. I checked this morning
and this looks a bit hairy. So some apparmor 2.8 versions potentially
have issues, but not all apparmor 2.8 are the same (gah).
I could imagine (but John, we really could use some guidance here ;))
that at least some changes could be related to version 2.8.95~2430:
+ debian/patches/mediate-signals.patch,
debian/patches/change-signal-syntax.patch: Parse signal rules with
apparmor_parser. See the apparmor.d(5) man page for syntax details.
+ debian/patches/change-ptrace-syntax.patch,
debian/patches/mediate-ptrace.patch: Parse ptrace rules with
apparmor_parser. See the apparmor.d(5) man page for syntax details.
But, regardless of the when, the apparmor rules maybe need a way to handle
versioned features of the parser. One proposal was to comment out problematic
rules and allow the packager to re-enable things. Maybe going one step
further and have some pre-processing that handles version based sections
(like #if (APPARMOR_VERSION >= xxx)).
So that is where we stand. Ideas are very welcome.
-Stefan
---
>From aec5cf8cc30c80492a37856626264c3d4c27a31f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader(a)canonical.com>
Date: Thu, 18 Sep 2014 14:15:17 +0200
Subject: [PATCH] Add missing delta from Ubuntu to apparmor profiles
This fixes up the upstream profiles and would allow to drop apparmor
related delta from the Ubuntu package.
Thanks to Serge Hallyn for the Makefile.am install hook that allows
to rename the local file.
Signed-off-by: Stefan Bader <stefan.bader(a)canonical.com>
---
examples/apparmor/Makefile.am | 10 ++++++++
examples/apparmor/libvirt-lxc | 15 +++++++++++-
examples/apparmor/libvirt-qemu | 31 +++++++++++++++++++++++-
examples/apparmor/local-usr.sbin.libvirtd | 2 ++
examples/apparmor/usr.lib.libvirt.virt-aa-helper | 25 ++++++++++++++++---
examples/apparmor/usr.sbin.libvirtd | 17 ++++++++++++-
6 files changed, 94 insertions(+), 6 deletions(-)
create mode 100644 examples/apparmor/local-usr.sbin.libvirtd
diff --git a/examples/apparmor/Makefile.am b/examples/apparmor/Makefile.am
index 7a20e16..aa46cb9 100644
--- a/examples/apparmor/Makefile.am
+++ b/examples/apparmor/Makefile.am
@@ -20,6 +20,7 @@ EXTRA_DIST= \
libvirt-qemu \
libvirt-lxc \
usr.lib.libvirt.virt-aa-helper \
+ local-usr.sbin.libvirtd \
usr.sbin.libvirtd
if WITH_APPARMOR_PROFILES
@@ -29,6 +30,15 @@ apparmor_DATA = \
usr.sbin.libvirtd \
$(NULL)
+localdir = $(apparmordir)/local
+local_DATA = \
+ local-usr.sbin.libvirtd \
+ $(NULL)
+
+install-data-hook:
+ mv $(DESTDIR)$(localdir)/local-usr.sbin.libvirtd \
+ $(DESTDIR)$(localdir)/usr.sbin.libvirtd
+
abstractionsdir = $(apparmordir)/abstractions
abstractions_DATA = \
libvirt-qemu \
diff --git a/examples/apparmor/libvirt-lxc b/examples/apparmor/libvirt-lxc
index 4bfb503..4705e0a 100644
--- a/examples/apparmor/libvirt-lxc
+++ b/examples/apparmor/libvirt-lxc
@@ -1,12 +1,18 @@
-# Last Modified: Fri Feb 7 13:01:36 2014
+# Last Modified: Thu, 18 Sep 2014 13:56:49 +0200
#include <abstractions/base>
umount,
+ dbus,
+ signal,
+ ptrace,
# ignore DENIED message on / remount
deny mount options=(ro, remount) -> /,
+ # support use of cgmanager proxy
+ mount options=(move) /sys/fs/cgroup/cgmanager/ -> /sys/fs/cgroup/cgmanager.lower/,
+
# allow tmpfs mounts everywhere
mount fstype=tmpfs,
@@ -33,8 +39,15 @@
mount fstype=fusectl -> /sys/fs/fuse/connections/,
mount fstype=securityfs -> /sys/kernel/security/,
mount fstype=debugfs -> /sys/kernel/debug/,
+ deny mount fstype=debugfs -> /var/lib/ureadahead/debugfs/,
mount fstype=proc -> /proc/,
mount fstype=sysfs -> /sys/,
+
+ mount options=(rw nosuid nodev noexec remount) -> /sys/,
+ mount options=(rw remount) -> /sys/kernel/security/,
+ mount options=(rw remount) -> /sys/fs/pstore/,
+ mount options=(ro remount) -> /sys/fs/pstore/,
+
deny /sys/firmware/efi/efivars/** rwklx,
deny /sys/kernel/security/** rwklx,
diff --git a/examples/apparmor/libvirt-qemu b/examples/apparmor/libvirt-qemu
index c6de6dd..b69e64c 100644
--- a/examples/apparmor/libvirt-qemu
+++ b/examples/apparmor/libvirt-qemu
@@ -1,4 +1,4 @@
-# Last Modified: Wed Sep 3 21:52:03 2014
+# Last Modified: Thu, 18 Sep 2014 16:41:21 +0200
#include <abstractions/base>
#include <abstractions/consoles>
@@ -13,15 +13,22 @@
capability setgid,
capability setuid,
+ # this is needed with libcap-ng support, however it breaks a lot of things
+ # atm, so just silence the denial until libcap-ng works right. LP: #522845
+ deny capability setpcap,
+
network inet stream,
network inet6 stream,
/dev/net/tun rw,
+ /dev/tap* rw,
/dev/kvm rw,
/dev/ptmx rw,
/dev/kqemu rw,
@{PROC}/*/status r,
@{PROC}/sys/kernel/cap_last_cap r,
+ owner @{PROC}/*/auxv r,
+ @{PROC}/sys/vm/overcommit_memory r,
# For hostdev access. The actual devices will be added dynamically
/sys/bus/usb/devices/ r,
@@ -38,6 +45,9 @@
/dev/snd/* rw,
capability ipc_lock,
# spice
+ /usr/bin/qemu-system-i386-spice rmix,
+ /usr/bin/qemu-system-x86_64-spice rmix,
+ /{dev,run}/shm/ r,
owner /{dev,run}/shm/spice.* rw,
# 'kill' is not required for sound and is a security risk. Do not enable
# unless you absolutely need it.
@@ -73,6 +83,7 @@
# the various binaries
/usr/bin/kvm rmix,
/usr/bin/qemu rmix,
+ /usr/bin/qemu-system-aarch64 rmix,
/usr/bin/qemu-system-arm rmix,
/usr/bin/qemu-system-cris rmix,
/usr/bin/qemu-system-i386 rmix,
@@ -91,6 +102,7 @@
/usr/bin/qemu-system-sparc rmix,
/usr/bin/qemu-system-sparc64 rmix,
/usr/bin/qemu-system-x86_64 rmix,
+ /usr/bin/qemu-system-x86_64-spice rmix,
/usr/bin/qemu-alpha rmix,
/usr/bin/qemu-arm rmix,
/usr/bin/qemu-armeb rmix,
@@ -117,6 +129,16 @@
/bin/dash rmix,
/bin/dd rmix,
/bin/cat rmix,
+ /etc/pki/CA/ r,
+ /etc/pki/CA/* r,
+ /etc/pki/libvirt/ r,
+ /etc/pki/libvirt/** r,
+
+ # for rbd
+ /etc/ceph/ceph.conf r,
+
+ # for access to hugepages
+ owner "/run/hugepages/kvm/libvirt/qemu/**" rw,
# for usb access
/dev/bus/usb/ r,
@@ -124,6 +146,13 @@
/sys/bus/ r,
/sys/class/ r,
+ signal (receive) peer=/usr/sbin/libvirtd,
+ ptrace (tracedby) peer=/usr/sbin/libvirtd,
+
+ # for ppc device-tree access
+ @{PROC}/device-tree/ r,
+ @{PROC}/device-tree/** r,
+
/usr/{lib,libexec}/qemu-bridge-helper Cx -> qemu_bridge_helper,
# child profile for bridge helper process
profile qemu_bridge_helper {
diff --git a/examples/apparmor/local-usr.sbin.libvirtd b/examples/apparmor/local-usr.sbin.libvirtd
new file mode 100644
index 0000000..6e19f20
--- /dev/null
+++ b/examples/apparmor/local-usr.sbin.libvirtd
@@ -0,0 +1,2 @@
+# Site-specific additions and overrides for usr.sbin.libvirtd.
+# For more details, please see /etc/apparmor.d/local/README.
diff --git a/examples/apparmor/usr.lib.libvirt.virt-aa-helper b/examples/apparmor/usr.lib.libvirt.virt-aa-helper
index bceaaff..4df86b0 100644
--- a/examples/apparmor/usr.lib.libvirt.virt-aa-helper
+++ b/examples/apparmor/usr.lib.libvirt.virt-aa-helper
@@ -1,8 +1,9 @@
-# Last Modified: Mon Apr 5 15:10:27 2010
+# Last Modified: Thu, 18 Sep 2014 14:05:36 +0200
#include <tunables/global>
/usr/lib/libvirt/virt-aa-helper {
#include <abstractions/base>
+ #include <abstractions/user-tmp>
# needed for searching directories
capability dac_override,
@@ -19,6 +20,12 @@
# for hostdev
/sys/devices/ r,
/sys/devices/** r,
+ /sys/bus/usb/devices/ r,
+ /sys/bus/usb/devices/** r,
+ deny /dev/sd* r,
+ deny /dev/dm-* r,
+ deny /dev/mapper/ r,
+ deny /dev/mapper/* r,
/usr/lib/libvirt/virt-aa-helper mr,
/sbin/apparmor_parser Ux,
@@ -26,8 +33,11 @@
/etc/apparmor.d/libvirt/* r,
/etc/apparmor.d/libvirt/libvirt-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]* rw,
- # for backingstore -- allow access to non-hidden files in @{HOME} as well
- # as storage pools
+ # For backingstore, virt-aa-helper needs to peek inside the disk image, so
+ # allow access to non-hidden files in @{HOME} as well as storage pools, and
+ # removable media and filesystems, and certain file extentions. A
+ # virt-aa-helper failure when checking a disk for backinsgstore is non-fatal
+ # (but obviously the backingstore won't be added).
audit deny @{HOME}/.* mrwkl,
audit deny @{HOME}/.*/ rw,
audit deny @{HOME}/.*/** mrwkl,
@@ -35,8 +45,17 @@
audit deny @{HOME}/bin/** mrwkl,
@{HOME}/ r,
@{HOME}/** r,
+ @{HOME}/.Private/** mrwlk,
+ @{HOMEDIRS}/.ecryptfs/*/.Private/** mrwlk,
+
/var/lib/libvirt/images/ r,
/var/lib/libvirt/images/** r,
+ /var/lib/nova/images/** r,
+ /var/lib/nova/instances/_base/** r,
+ /var/lib/nova/instances/snapshots/** r,
+ /var/lib/eucalyptus/instances/**/disk* r,
+ /var/lib/eucalyptus/instances/**/loader* r,
+ /var/lib/uvtool/libvirt/images/** r,
/{media,mnt,opt,srv}/** r,
/**.img r,
diff --git a/examples/apparmor/usr.sbin.libvirtd b/examples/apparmor/usr.sbin.libvirtd
index 3011eff..814b4d81 100644
--- a/examples/apparmor/usr.sbin.libvirtd
+++ b/examples/apparmor/usr.sbin.libvirtd
@@ -1,10 +1,12 @@
-# Last Modified: Mon Apr 5 15:03:58 2010
+# Last Modified: Tue, 23 Sep 2014 09:28:07 +0200
#include <tunables/global>
@{LIBVIRT}="libvirt"
/usr/sbin/libvirtd {
#include <abstractions/base>
#include <abstractions/dbus>
+ # Site-specific additions and overrides. See local/README for details.
+ #include <local/usr.sbin.libvirtd>
capability kill,
capability net_admin,
@@ -23,6 +25,7 @@
capability setpcap,
capability mknod,
capability fsetid,
+ capability ipc_lock,
capability audit_write,
# Needed for vfio
@@ -33,6 +36,12 @@
network inet6 stream,
network inet6 dgram,
network packet dgram,
+ network netlink,
+
+ dbus bus=system,
+ signal,
+ ptrace,
+ unix,
# Very lenient profile for libvirtd since we want to first focus on confining
# the guests. Guests will have a very restricted profile.
@@ -45,6 +54,12 @@
/usr/sbin/* PUx,
/lib/udev/scsi_id PUx,
/usr/lib/xen-common/bin/xen-toolstack PUx,
+ /usr/lib/xen-*/bin/pygrub PUx,
+ /usr/lib/xen-*/bin/libxl-save-helper PUx,
+
+ # Required by nwfilter_ebiptables_driver.c:ebiptablesWriteToTempFile() to
+ # write and run an ebtables script.
+ /var/lib/libvirt/virtd* ixr,
# force the use of virt-aa-helper
audit deny /sbin/apparmor_parser rwxl,
--
1.9.1
8 years, 6 months
[libvirt] [PATCH] qemu: don't refuse to undefine a guest with NVRAM file
by Daniel P. Berrange
The undefine operation should always be allowed to succeed
regardless of whether any NVRAM file exists. ie we should
not force the application to use the VIR_DOMAIN_UNDEFINE_NVRAM
flag. It is valid for the app to decide it wants the NVRAM
file left on disk, in the same way that disk images are left
on disk at undefine.
---
src/qemu/qemu_driver.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index bec05d4..302bf48 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6985,19 +6985,13 @@ qemuDomainUndefineFlags(virDomainPtr dom,
if (!virDomainObjIsActive(vm) &&
vm->def->os.loader && vm->def->os.loader->nvram &&
- virFileExists(vm->def->os.loader->nvram)) {
- if (!(flags & VIR_DOMAIN_UNDEFINE_NVRAM)) {
- virReportError(VIR_ERR_OPERATION_INVALID, "%s",
- _("cannot delete inactive domain with nvram"));
- goto cleanup;
- }
-
- if (unlink(vm->def->os.loader->nvram) < 0) {
- virReportSystemError(errno,
- _("failed to remove nvram: %s"),
- vm->def->os.loader->nvram);
- goto cleanup;
- }
+ virFileExists(vm->def->os.loader->nvram) &&
+ (flags & VIR_DOMAIN_UNDEFINE_NVRAM) &&
+ (unlink(vm->def->os.loader->nvram) < 0)) {
+ virReportSystemError(errno,
+ _("failed to remove nvram: %s"),
+ vm->def->os.loader->nvram);
+ goto cleanup;
}
if (virDomainDeleteConfig(cfg->configDir, cfg->autostartDir, vm) < 0)
--
2.1.0
8 years, 6 months
[libvirt] [PATCH v2 0/8] Add support for fetching statistics of completed jobs
by Jiri Denemark
Using virDomainGetJobStats, we can monitor running jobs but sometimes it
may be useful to get statistics about a job that already finished, for
example, to get the final amount of data transferred during migration or
to get an idea about total downtime. This is what the following patches
are about.
Version 2:
- changed according to John's review (see individual patches for
details)
Jiri Denemark (8):
Refactor job statistics
qemu: Avoid incrementing jobs_queued if virTimeMillisNow fails
Add support for fetching statistics of completed jobs
qemu: Silence coverity on optional migration stats
virsh: Add support for completed job stats
qemu: Transfer migration statistics to destination
qemu: Recompute downtime and total time when migration completes
qemu: Transfer recomputed stats back to source
include/libvirt/libvirt.h.in | 11 ++
src/libvirt.c | 11 +-
src/qemu/qemu_domain.c | 189 ++++++++++++++++++++++++++-
src/qemu/qemu_domain.h | 32 ++++-
src/qemu/qemu_driver.c | 130 ++++--------------
src/qemu/qemu_migration.c | 304 ++++++++++++++++++++++++++++++++++++-------
src/qemu/qemu_monitor_json.c | 10 +-
src/qemu/qemu_process.c | 9 +-
tools/virsh-domain.c | 27 +++-
tools/virsh.pod | 10 +-
10 files changed, 557 insertions(+), 176 deletions(-)
--
2.1.0
8 years, 6 months
[libvirt] Libvirt bite sized tasks
by Michal Privoznik
Dear lists,
I've just started new wiki page which aim is to summarize small and
trivial tasks, that starting contributors can take, investigate and
implement. The aim is to give them something easy to start with while
not scaring them out about complexity of our code. The page can be found
here:
http://wiki.libvirt.org/page/BiteSizedTasks
The name is copied from qemu wiki:
http://wiki.qemu.org/BiteSizedTasks
Please, feel free to extend the list. I plan to use it in the future
when interviewing GSoC candidates.
Michal
8 years, 8 months
[libvirt] [[PATCH v6] autocreate tap device for VIR_DOMAIN_NET_TYPE_ETHERNET] autocreate tap device for VIR_DOMAIN_NET_TYPE_ETHERNET
by Vasiliy Tolstov
If a user specify ehernet device create it via libvirt and run
script if it provided. After this commit user does not need to
run external script to create tap device or add root to qemu
process.
Signed-off-by: Vasiliy Tolstov <v.tolstov(a)selfip.ru>
---
src/qemu/qemu_command.c | 142 +++++++++++++++++++++++++++++++-----------------
src/qemu/qemu_hotplug.c | 13 ++---
src/qemu/qemu_process.c | 6 ++
3 files changed, 101 insertions(+), 60 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 3886b4f..6d26d28 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -332,10 +332,39 @@ static int qemuCreateInBridgePortWithHelper(virQEMUDriverConfigPtr cfg,
return *tapfd < 0 ? -1 : 0;
}
+/**
+ * qemuExecuteEthernetScript:
+ * @ifname: the interface name
+ * @script: the script name
+ *
+ * This function executes script for new tap device created by libvirt.
+ * Returns 0 in case of success or -1 on failure
+ */
+static int
+qemuExecuteEthernetScript(const char *ifname, const char *script)
+{
+ virCommandPtr cmd;
+ int ret;
+
+ cmd = virCommandNew(script);
+ virCommandAddArgFormat(cmd, "%s", ifname);
+ virCommandClearCaps(cmd);
+#ifdef CAP_NET_ADMIN
+ virCommandAllowCap(cmd, CAP_NET_ADMIN);
+#endif
+ virCommandAddEnvPassCommon(cmd);
+
+ ret = virCommandRun(cmd, NULL);
+
+ virCommandFree(cmd);
+ return ret;
+}
+
/* qemuNetworkIfaceConnect - *only* called if actualType is
- * VIR_DOMAIN_NET_TYPE_NETWORK or VIR_DOMAIN_NET_TYPE_BRIDGE (i.e. if
- * the connection is made with a tap device connecting to a bridge
- * device)
+ * VIR_DOMAIN_NET_TYPE_NETWORK, VIR_DOMAIN_NET_TYPE_BRIDGE
+ * VIR_DOMAIN_NET_TYPE_ETHERNET (i.e. if the connection is
+ * made with a tap device connecting to a bridge device or
+ * use plain tap device)
*/
int
qemuNetworkIfaceConnect(virDomainDefPtr def,
@@ -351,6 +380,7 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
bool template_ifname = false;
virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
const char *tunpath = "/dev/net/tun";
+ virMacAddr tapmac;
if (net->backend.tap) {
tunpath = net->backend.tap;
@@ -361,11 +391,6 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
}
}
- if (!(brname = virDomainNetGetActualBridgeName(net))) {
- virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Missing bridge name"));
- goto cleanup;
- }
-
if (!net->ifname ||
STRPREFIX(net->ifname, VIR_NET_GENERATED_PREFIX) ||
strchr(net->ifname, '%')) {
@@ -381,40 +406,65 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR;
}
- if (cfg->privileged) {
- if (virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac,
- def->uuid, tunpath, tapfd, *tapfdSize,
- virDomainNetGetActualVirtPortProfile(net),
- virDomainNetGetActualVlan(net),
- tap_create_flags) < 0) {
+ if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_ETHERNET) {
+ if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, *tapfdSize,
+ tap_create_flags) < 0) {
virDomainAuditNetDevice(def, net, tunpath, false);
goto cleanup;
}
- if (virDomainNetGetActualBridgeMACTableManager(net)
- == VIR_NETWORK_BRIDGE_MAC_TABLE_MANAGER_LIBVIRT) {
- /* libvirt is managing the FDB of the bridge this device
- * is attaching to, so we need to turn off learning and
- * unicast_flood on the device to prevent the kernel from
- * adding any FDB entries for it. We will add add an fdb
- * entry ourselves (during qemuInterfaceStartDevices(),
- * using the MAC address from the interface config.
- */
- if (virNetDevBridgePortSetLearning(brname, net->ifname, false) < 0)
- goto cleanup;
- if (virNetDevBridgePortSetUnicastFlood(brname, net->ifname, false) < 0)
+ virMacAddrSet(&tapmac, &net->mac);
+
+ if (virNetDevSetMAC(net->ifname, &tapmac) < 0)
+ goto cleanup;
+
+ if (virNetDevSetOnline(net->ifname, true) < 0)
+ goto cleanup;
+
+ if (net->script) {
+ if (qemuExecuteEthernetScript(net->ifname, net->script) < 0)
goto cleanup;
}
} else {
- if (qemuCreateInBridgePortWithHelper(cfg, brname,
- &net->ifname,
- tapfd, tap_create_flags) < 0) {
- virDomainAuditNetDevice(def, net, tunpath, false);
+ if (!(brname = virDomainNetGetActualBridgeName(net))) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Missing bridge name"));
goto cleanup;
}
- /* qemuCreateInBridgePortWithHelper can only create a single FD */
- if (*tapfdSize > 1) {
- VIR_WARN("Ignoring multiqueue network request");
- *tapfdSize = 1;
+
+ if (cfg->privileged) {
+ if (virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac,
+ def->uuid, tunpath, tapfd, *tapfdSize,
+ virDomainNetGetActualVirtPortProfile(net),
+ virDomainNetGetActualVlan(net),
+ tap_create_flags) < 0) {
+ virDomainAuditNetDevice(def, net, tunpath, false);
+ goto cleanup;
+ }
+ if (virDomainNetGetActualBridgeMACTableManager(net)
+ == VIR_NETWORK_BRIDGE_MAC_TABLE_MANAGER_LIBVIRT) {
+ /* libvirt is managing the FDB of the bridge this device
+ * is attaching to, so we need to turn off learning and
+ * unicast_flood on the device to prevent the kernel from
+ * adding any FDB entries for it. We will add add an fdb
+ * entry ourselves (during qemuInterfaceStartDevices(),
+ * using the MAC address from the interface config.
+ */
+ if (virNetDevBridgePortSetLearning(brname, net->ifname, false) < 0)
+ goto cleanup;
+ if (virNetDevBridgePortSetUnicastFlood(brname, net->ifname, false) < 0)
+ goto cleanup;
+ }
+ } else {
+ if (qemuCreateInBridgePortWithHelper(cfg, brname,
+ &net->ifname,
+ tapfd, tap_create_flags) < 0) {
+ virDomainAuditNetDevice(def, net, tunpath, false);
+ goto cleanup;
+ }
+ /* qemuCreateInBridgePortWithHelper can only create a single FD */
+ if (*tapfdSize > 1) {
+ VIR_WARN("Ignoring multiqueue network request");
+ *tapfdSize = 1;
+ }
}
}
@@ -5221,6 +5271,7 @@ qemuBuildHostNetStr(virDomainNetDefPtr net,
case VIR_DOMAIN_NET_TYPE_BRIDGE:
case VIR_DOMAIN_NET_TYPE_NETWORK:
case VIR_DOMAIN_NET_TYPE_DIRECT:
+ case VIR_DOMAIN_NET_TYPE_ETHERNET:
virBufferAsprintf(&buf, "tap%c", type_sep);
/* for one tapfd 'fd=' shall be used,
* for more than one 'fds=' is the right choice */
@@ -5238,20 +5289,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net,
is_tap = true;
break;
- case VIR_DOMAIN_NET_TYPE_ETHERNET:
- virBufferAddLit(&buf, "tap");
- if (net->ifname) {
- virBufferAsprintf(&buf, "%cifname=%s", type_sep, net->ifname);
- type_sep = ',';
- }
- if (net->script) {
- virBufferAsprintf(&buf, "%cscript=%s", type_sep,
- net->script);
- type_sep = ',';
- }
- is_tap = true;
- break;
-
case VIR_DOMAIN_NET_TYPE_CLIENT:
virBufferAsprintf(&buf, "socket%cconnect=%s:%d",
type_sep,
@@ -8226,7 +8263,8 @@ qemuBuildInterfaceCommandLine(virCommandPtr cmd,
/* Currently nothing besides TAP devices supports multiqueue. */
if (net->driver.virtio.queues > 0 &&
!(actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
- actualType == VIR_DOMAIN_NET_TYPE_BRIDGE)) {
+ actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+ actualType == VIR_DOMAIN_NET_TYPE_ETHERNET)) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("Multiqueue network is not supported for: %s"),
virDomainNetTypeToString(actualType));
@@ -8235,7 +8273,8 @@ qemuBuildInterfaceCommandLine(virCommandPtr cmd,
if (net->backend.tap &&
!(actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
- actualType == VIR_DOMAIN_NET_TYPE_BRIDGE)) {
+ actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+ actualType == VIR_DOMAIN_NET_TYPE_ETHERNET)) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("Custom tap device path is not supported for: %s"),
virDomainNetTypeToString(actualType));
@@ -8245,7 +8284,8 @@ qemuBuildInterfaceCommandLine(virCommandPtr cmd,
cfg = virQEMUDriverGetConfig(driver);
if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
- actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) {
+ actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+ actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
tapfdSize = net->driver.virtio.queues;
if (!tapfdSize)
tapfdSize = 1;
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index cc86a3b..21ea3fd 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -908,7 +908,8 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
/* Currently nothing besides TAP devices supports multiqueue. */
if (net->driver.virtio.queues > 0 &&
!(actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
- actualType == VIR_DOMAIN_NET_TYPE_BRIDGE)) {
+ actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+ actualType == VIR_DOMAIN_NET_TYPE_ETHERNET)) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("Multiqueue network is not supported for: %s"),
virDomainNetTypeToString(actualType));
@@ -916,7 +917,8 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
}
if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
- actualType == VIR_DOMAIN_NET_TYPE_NETWORK) {
+ actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
+ actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
tapfdSize = vhostfdSize = net->driver.virtio.queues;
if (!tapfdSize)
tapfdSize = vhostfdSize = 1;
@@ -947,13 +949,6 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
iface_connected = true;
if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, vhostfd, &vhostfdSize) < 0)
goto cleanup;
- } else if (actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
- vhostfdSize = 1;
- if (VIR_ALLOC(vhostfd) < 0)
- goto cleanup;
- *vhostfd = -1;
- if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, vhostfd, &vhostfdSize) < 0)
- goto cleanup;
}
/* Set device online immediately */
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 64ee049..d866e44 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5205,6 +5205,12 @@ void qemuProcessStop(virQEMUDriverPtr driver,
cfg->stateDir));
VIR_FREE(net->ifname);
break;
+ case VIR_DOMAIN_NET_TYPE_ETHERNET:
+ if (net->ifname) {
+ ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap));
+ VIR_FREE(net->ifname);
+ }
+ break;
case VIR_DOMAIN_NET_TYPE_BRIDGE:
case VIR_DOMAIN_NET_TYPE_NETWORK:
#ifdef VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP
--
2.3.3
9 years
[libvirt] [PATCH] [RFC] virSetUIDGID: Don't leak supplementary groups
by Richard Weinberger
The LXC driver uses virSetUIDGID() to become UID/GID 0.
It passes an empty groups list to virSetUIDGID()
to get rid of all supplementary groups from the host side.
But virSetUIDGID() calls setgroups() only if the supplied list
is larger than 0.
This leads to a container root with unrelated supplementary groups.
In most cases this issue is unoticed as libvirtd runs as UID/GID 0
without any supplementary groups.
Signed-off-by: Richard Weinberger <richard(a)nod.at>
---
I've marked that patch as RFC as I'm not sure if all users of virSetUIDGID()
expect this behavior too.
Thanks,
//richard
---
src/util/virutil.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virutil.c b/src/util/virutil.c
index cddc78a..ea697a3 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -1103,7 +1103,7 @@ virSetUIDGID(uid_t uid, gid_t gid, gid_t *groups ATTRIBUTE_UNUSED,
}
# if HAVE_SETGROUPS
- if (ngroups && setgroups(ngroups, groups) < 0) {
+ if (setgroups(ngroups, groups) < 0) {
virReportSystemError(errno, "%s",
_("cannot set supplemental groups"));
return -1;
--
2.4.2
9 years
[libvirt] [PATCH] lxc: Bind mount container TTYs
by Richard Weinberger
Instead of creating symlinks, bind mount the devices to
/dev/pts/XY.
Using bind mounts it is no longer needed to add pts devices
to files like /dev/securetty.
Signed-off-by: Richard Weinberger <richard(a)nod.at>
---
src/lxc/lxc_container.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 7d531e2..ea76370 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -1141,6 +1141,20 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def,
return ret;
}
+static int lxcContainerBindMountDevice(const char *src, const char *dst)
+{
+ if (virFileTouch(dst, 0666) < 0)
+ return -1;
+
+ if (mount(src, dst, "none", MS_BIND, NULL) < 0) {
+ virReportSystemError(errno, _("Failed to bind %s on to %s"), src,
+ dst);
+ return -1;
+ }
+
+ return 0;
+}
+
static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
{
size_t i;
@@ -1164,34 +1178,24 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths)
}
/* We have private devpts capability, so bind that */
- if (virFileTouch("/dev/ptmx", 0666) < 0)
+ if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0)
return -1;
- if (mount("/dev/pts/ptmx", "/dev/ptmx", "ptmx", MS_BIND, NULL) < 0) {
- virReportSystemError(errno, "%s",
- _("Failed to bind /dev/pts/ptmx on to /dev/ptmx"));
- return -1;
- }
-
for (i = 0; i < nttyPaths; i++) {
char *tty;
if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0)
return -1;
- if (symlink(ttyPaths[i], tty) < 0) {
- virReportSystemError(errno,
- _("Failed to symlink %s to %s"),
- ttyPaths[i], tty);
- VIR_FREE(tty);
+
+ if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) {
return -1;
+ VIR_FREE(tty);
}
+
VIR_FREE(tty);
+
if (i == 0 &&
- symlink(ttyPaths[i], "/dev/console") < 0) {
- virReportSystemError(errno,
- _("Failed to symlink %s to /dev/console"),
- ttyPaths[i]);
+ lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0)
return -1;
- }
}
return 0;
}
--
2.4.2
9 years