[libvirt] [PATCH v2] Add some notes about security considerations when using LXC
by Daniel P. Berrange
From: "Daniel P. Berrange" <berrange(a)redhat.com>
Describe some of the issues to be aware of when configuring LXC
guests with security isolation as a goal.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
docs/drvlxc.html.in | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
In v2:
- Clarify UNIX domain socket issues wrt filesystem & network namespaces
diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in
index 1e6aa1d..66d97e4 100644
--- a/docs/drvlxc.html.in
+++ b/docs/drvlxc.html.in
@@ -168,6 +168,109 @@ Further block or character devices will be made available to containers
depending on their configuration.
</p>
+<h2><a name="security">Security considerations</a></h2>
+
+<p>
+The libvirt LXC driver is fairly flexible in how it can be configured,
+and as such does not enforce a requirement for strict security
+separation between a container and the host. This allows it to be used
+in scenarios where only resource control capabilities are important,
+and resource sharing is desired. Applications wishing to ensure secure
+isolation between a container and the host must ensure that they are
+writing a suitable configuration.
+</p>
+
+<h3><a name="securenetworking">Network isolation</a></h3>
+
+<p>
+If the guest configuration does not list any network interfaces,
+the <code>network</code> namespace will not be activated, and thus
+the container will see all the host's network interfaces. This will
+allow apps in the container to bind to/connect from TCP/UDP addresses
+and ports from the host OS. It also allows applications to access
+UNIX domain sockets associated with the host OS, which are in the
+abstract namespace. If access to UNIX domain sockets in the abstract
+namespace is not wanted, then applications should set the
+<code>&lt;privnet/&gt;</code> flag in the
+<code>&lt;features&gt;....&lt;/features&gt;</code> element.
+</p>
+
+<h3><a name="securefs">Filesystem isolation</a></h3>
+
+<p>
+If the guest configuration does not list any filesystems, then
+the container will be set up with a root filesystem that matches
+the host's root filesystem. As noted earlier, only a few locations
+such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code>
+will be altered. This means that, in the absence of restrictions
+from sVirt, a process running as user/group N:M inside the container
+will be able to access almost exactly the same files as a process
+running as user/group N:M in the host.
+</p>
+
+<p>
+There are multiple options for restricting this. It is possible to
+simply map the existing root filesystem through to the container in
+read-only mode. Alternatively a completely separate root filesystem
+can be configured for the guest. In both cases, further sub-mounts
+can be applied to customize the content that is made visible. Note
+that in the absence of sVirt controls, it is still possible for the
+root user in a container to unmount any sub-mounts applied. The user
+namespace feature can also be used to restrict access to files based
+on the UID/GID mappings.
+</p>
+
+<p>
+Sharing the host filesystem tree also allows applications to access
+UNIX domain sockets associated with the host OS, which are in the
+filesystem namespace. It should be noted that a number of init
+systems including at least <code>systemd</code> and <code>upstart</code>
+have UNIX domain sockets which are used to control their operation.
+Thus, if the directory/filesystem holding their UNIX domain socket is
+exposed to the container, it will be possible for a user in the container
+to invoke operations on the init service in the same way it could if
+outside the container. This also applies to other applications in the
+host which use UNIX domain sockets in the filesystem, such as DBus,
+Libvirtd, and many more. If this is not desired, then applications
+should either specify the UID/GID mapping in the configuration to
+enable user namespaces &amp; thus block access to the UNIX domain socket
+based on permissions, or should ensure the relevant directories have
+a bind mount to hide them. This is particularly important for the
+<code>/run</code> or <code>/var/run</code> directories.
+</p>
+
+
+<h3><a name="secureusers">User and group isolation</a></h3>
+
+<p>
+If the guest configuration does not list any ID mapping, then the
+user and group IDs used inside the container will match those used
+outside the container. In addition, the capabilities associated with
+a process in the container will confer the same privileges they would
+for a process in the host. This has obvious implications for security,
+since a root user inside the container will be able to access any
+file owned by root that is visible to the container, and perform more
+or less any privileged kernel operation. In the absence of additional
+protection from sVirt, this means that the root user inside a container
+is effectively as powerful as the root user in the host. There is no
+security isolation of the root user.
+</p>
+
+<p>
+The ID mapping facility was introduced to allow for stricter control
+over the privileges of users inside the container. It allows apps to
+define rules such as "user ID 0 in the container maps to user ID 1000
+in the host". In addition the privileges associated with capabilities
+are somewhat reduced so that they can not be used to escape from the
+container environment. A full description of user namespaces is outside
+the scope of this document, however LWN has
+<a href="https://lwn.net/Articles/532593/">a good write-up on the topic</a>.
+From the libvirt point of view, the key thing to remember is that defining
+an ID mapping for users and groups in the container XML configuration
+causes libvirt to activate the user namespace feature.
+</p>
+
+
<h2><a name="activation">Systemd Socket Activation Integration</a></h2>
<p>
--
1.8.3.1
11 years, 3 months
[libvirt] [PATCH] Fix naming of permission for detecting storage pools
by Daniel P. Berrange
From: "Daniel P. Berrange" <berrange(a)redhat.com>
The VIR_ACCESS_PERM_CONNECT_DETECT_STORAGE_POOLS enum
constant had its string form set to 'detect_storage_pool';
note the missing trailing 's'. This prevented the ACL
check from ever succeeding. Fix this and add a simple
test script to validate that the names match.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
src/Makefile.am | 8 ++++-
src/access/viraccessperm.c | 2 +-
src/check-aclperms.pl | 75 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 83 insertions(+), 2 deletions(-)
create mode 100755 src/check-aclperms.pl
diff --git a/src/Makefile.am b/src/Makefile.am
index 711da32..9f9dcd9 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -528,10 +528,16 @@ check-aclrules:
$(REMOTE_PROTOCOL) \
$(addprefix $(srcdir)/,$(filter-out /%,$(STATEFUL_DRIVER_SOURCE_FILES)))
+check-aclperms:
+ $(AM_V_GEN)$(PERL) $(srcdir)/check-aclperms.pl \
+ $(srcdir)/access/viraccessperm.h \
+ $(srcdir)/access/viraccessperm.c
+
EXTRA_DIST += check-driverimpls.pl check-aclrules.pl
check-local: check-protocol check-symfile check-symsorting \
- check-drivername check-driverimpls check-aclrules
+ check-drivername check-driverimpls check-aclrules \
+ check-aclperms
.PHONY: check-protocol $(PROTOCOL_STRUCTS:structs=struct)
# Mock driver, covering domains, storage, networks, etc
diff --git a/src/access/viraccessperm.c b/src/access/viraccessperm.c
index 9c720f9..d517c66 100644
--- a/src/access/viraccessperm.c
+++ b/src/access/viraccessperm.c
@@ -30,7 +30,7 @@ VIR_ENUM_IMPL(virAccessPermConnect,
"search_storage_pools", "search_node_devices",
"search_interfaces", "search_secrets",
"search_nwfilters",
- "detect_storage_pool", "pm_control",
+ "detect_storage_pools", "pm_control",
"interface_transaction");
VIR_ENUM_IMPL(virAccessPermDomain,
diff --git a/src/check-aclperms.pl b/src/check-aclperms.pl
new file mode 100755
index 0000000..b7fadcd
--- /dev/null
+++ b/src/check-aclperms.pl
@@ -0,0 +1,75 @@
+#!/usr/bin/perl
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library. If not, see
+# <http://www.gnu.org/licenses/>.
+#
+# This script just validates that the stringified version of
+# a virAccessPerm enum matches the enum constant name. We do
+# a lot of auto-generation of code, so when these don't match
+# problems occur, preventing auth from succeeding at all.
+
+my $hdr = shift;
+my $impl = shift;
+
+my %perms;
+
+my @perms;
+
+open HDR, $hdr or die "cannot read $hdr: $!";
+
+while (<HDR>) {
+ if (/^\s+VIR_ACCESS_PERM_([_A-Z]+)(,?|\s|$)/) {
+ my $perm = $1;
+
+ $perms{$perm} = 1 unless ($perm =~ /_LAST$/);
+ }
+}
+
+close HDR;
+
+
+open IMPL, $impl or die "cannot read $impl: $!";
+
+my $group;
+my $warned = 0;
+
+while (defined (my $line = <IMPL>)) {
+ if ($line =~ /VIR_ACCESS_PERM_([_A-Z]+)_LAST/) {
+ $group = $1;
+ } elsif ($line =~ /"[_a-z]+"/) {
+ my @bits = split /,/, $line;
+ foreach my $bit (@bits) {
+ if ($bit =~ /"([_a-z]+)"/) {
+ #print $1, "\n";
+
+ my $perm = uc($group . "_" . $1);
+ if (!exists $perms{$perm}) {
+ print STDERR "Unknown perm string $1 for group $group\n";
+ $warned = 1;
+ }
+ delete $perms{$perm};
+ }
+ }
+ }
+}
+close IMPL;
+
+foreach my $perm (keys %perms) {
+ print STDERR "Perm $perm has no string form\n";
+ $warned = 1;
+}
+
+exit $warned;
--
1.8.3.1
11 years, 3 months
[libvirt] [PATCH] rbd: Use rbd_create3 to create RBD format 2 images by default
by Wido den Hollander
This new RBD format supports snapshotting and cloning. By having
libvirt create images in format 2, end users of the created images
can benefit from the new RBD format.
Signed-off-by: Wido den Hollander <wido(a)widodh.nl>
---
src/storage/storage_backend_rbd.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
index d9e1789..e79873f 100644
--- a/src/storage/storage_backend_rbd.c
+++ b/src/storage/storage_backend_rbd.c
@@ -435,6 +435,26 @@ cleanup:
return ret;
}
+static int virStorageBackendRBDCreateImage(rados_ioctx_t io,
+ char *name, long capacity)
+{
+ int order = 0;
+ #if LIBRBD_VERSION_CODE > 260
+ uint64_t features = 3;
+ uint64_t stripe_count = 1;
+ uint64_t stripe_unit = 4194304;
+
+ if (rbd_create3(io, name, capacity, features, &order,
+ stripe_count, stripe_unit) < 0) {
+ #else
+ if (rbd_create(io, name, capacity, &order) < 0) {
+ #endif
+ return -1;
+ }
+
+ return 0;
+}
+
static int virStorageBackendRBDCreateVol(virConnectPtr conn,
virStoragePoolObjPtr pool,
virStorageVolDefPtr vol)
@@ -442,7 +462,6 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
virStorageBackendRBDStatePtr ptr;
ptr.cluster = NULL;
ptr.ioctx = NULL;
- int order = 0;
int ret = -1;
VIR_DEBUG("Creating RBD image %s/%s with size %llu",
@@ -467,7 +486,7 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
goto cleanup;
}
- if (rbd_create(ptr.ioctx, vol->name, vol->capacity, &order) < 0) {
+ if (virStorageBackendRBDCreateImage(ptr.ioctx, vol->name, vol->capacity) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("failed to create volume '%s/%s'"),
pool->def->source.name,
--
1.7.9.5
11 years, 3 months
[libvirt] [PATCH] Allow root users to have their own configuration file
by Martin Kletzander
Currently, we have two configuration file paths, one global (where
"global" means root-only and we're probably not changing this in the
near future) and one per-user. Unfortunately the root user cannot use
the second option because until now we were choosing the file path
depending only on whether the user is root or not.
This patch modifies the mentioned behavior for root only, allowing root
to set its own configuration files without changing anything in
system-wide configuration folders.
This also makes the virsh-uriprecedence test pass its first test case
when run as root.
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
Notes:
I'm playing along with the previously mentioned "proper behavior" in
this patch. However, IMNSHO, our "global" or "system-wide" configuration
file (defaulting to '/etc/libvirt/libvirt.conf') should be accessible
to all users since this has no security impact (security information
may be in files 'libvirtd.conf' or 'qemu.conf'). This file should
also be read and used for all users. After that, settings in the user
configuration file (defaulting to '~/.config/libvirt/libvirt.conf')
may override some of these settings for that user.
This is how all sensible configurations are loaded and that's also
what I'd prefer. Unfortunately some developers feel this should be
done in a completely different way.
src/libvirt.c | 56 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 20 deletions(-)
diff --git a/src/libvirt.c b/src/libvirt.c
index 20a2d4c..bfc466b 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -957,28 +957,34 @@ error:
return -1;
}
-static char *
-virConnectGetConfigFilePath(void)
+/*
+ * Return code 0 means no error, but doesn't guarantee path != NULL.
+ */
+static int
+virConnectGetConfigFilePath(char **path, bool global)
{
- char *path;
- if (geteuid() == 0) {
- if (virAsprintf(&path, "%s/libvirt/libvirt.conf",
+ char *userdir = NULL;
+ int ret = -1;
+ *path = NULL;
+
+ /* Don't provide the global configuration file to non-root users */
+ if (geteuid() != 0 && global)
+ return 0;
+
+ if (global) {
+ if (virAsprintf(path, "%s/libvirt/libvirt.conf",
SYSCONFDIR) < 0)
- return NULL;
+ goto cleanup;
} else {
- char *userdir = virGetUserConfigDirectory();
- if (!userdir)
- return NULL;
-
- if (virAsprintf(&path, "%s/libvirt.conf",
- userdir) < 0) {
- VIR_FREE(userdir);
- return NULL;
- }
- VIR_FREE(userdir);
+ if (!(userdir = virGetUserConfigDirectory()) ||
+ virAsprintf(path, "%s/libvirt.conf", userdir) < 0)
+ goto cleanup;
}
- return path;
+ ret = 0;
+ cleanup:
+ VIR_FREE(userdir);
+ return ret;
}
static int
@@ -989,12 +995,22 @@ virConnectGetConfigFile(virConfPtr *conf)
*conf = NULL;
- if (!(filename = virConnectGetConfigFilePath()))
+ /* Try reading user configuration file unconditionally */
+ if (virConnectGetConfigFilePath(&filename, false) < 0)
goto cleanup;
if (!virFileExists(filename)) {
- ret = 0;
- goto cleanup;
+ /* and in case there is none, try the global one. */
+
+ VIR_FREE(filename);
+ if (virConnectGetConfigFilePath(&filename, true) < 0)
+ goto cleanup;
+
+ if (!filename ||
+ !virFileExists(filename)) {
+ ret = 0;
+ goto cleanup;
+ }
}
VIR_DEBUG("Loading config file '%s'", filename);
--
1.8.3.2
11 years, 3 months
[libvirt] [PATCHv2] netcf driver: use a single netcf handle for all connections
by Laine Stump
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=983026
The netcf interface driver previously had no state driver associated
with it - as a connection was opened, it would create a new netcf
instance just for that connection, and close it when it was
finished. The problem with this is that each connection to libvirt
used up a netlink socket, and there is a per-process maximum of ~1000
netlink sockets.
The solution is to create a state driver to go along with the netcf
driver. The state driver opens a netcf instance, then all
connections share that same netcf instance, thus only a single
netlink socket will be used no matter how many connections are made to
libvirtd.
This was rather simple to do - a new virObjectLockable class is
created for the single driverState object, which is created in
netcfStateInitialize and contains the single netcf handle; instead of
creating a new object for each client connection, netcfInterfaceOpen
now just increments the driverState object's reference count and puts
a pointer to it into the connection's privateData. Similarly,
netcfInterfaceClose() just un-refs the driverState object (as does
netcfStateCleanup()), and virNetcfDriverStateDispose()
handles closing the netcf instance. Since all the functions already
have locking around them, the static lock functions used by all
functions just needed to be changed to call virObjectLock() and
virObjectUnlock() instead of directly calling the virMutex* functions.
---
Changes from V1:
* make driverState a static.
* switch to using a virObjectLockable for driverState, at
Eric's suggestion.
* add a simple error message if ncf_init() fails.
Again, I've tried this with a small number of simultaneous connections
(including virt-manager), but I don't have a ready-made stress test.
src/interface/interface_backend_netcf.c | 173 +++++++++++++++++++++++---------
1 file changed, 125 insertions(+), 48 deletions(-)
diff --git a/src/interface/interface_backend_netcf.c b/src/interface/interface_backend_netcf.c
index f47669e..627c225 100644
--- a/src/interface/interface_backend_netcf.c
+++ b/src/interface/interface_backend_netcf.c
@@ -41,19 +41,119 @@
/* Main driver state */
typedef struct
{
- virMutex lock;
+ virObjectLockable parent;
struct netcf *netcf;
} virNetcfDriverState, *virNetcfDriverStatePtr;
+static virClassPtr virNetcfDriverStateClass;
+static void virNetcfDriverStateDispose(void *obj);
-static void interfaceDriverLock(virNetcfDriverStatePtr driver)
+static int
+virNetcfDriverStateOnceInit(void)
+{
+ if (!(virNetcfDriverStateClass = virClassNew(virClassForObjectLockable(),
+ "virNetcfDriverState",
+ sizeof(virNetcfDriverState),
+ virNetcfDriverStateDispose)))
+ return -1;
+ return 0;
+}
+
+VIR_ONCE_GLOBAL_INIT(virNetcfDriverState)
+
+static virNetcfDriverStatePtr driverState = NULL;
+
+static void
+virNetcfDriverStateDispose(void *obj)
+{
+ virNetcfDriverStatePtr driver = obj;
+
+ if (driver->netcf)
+ ncf_close(driver->netcf);
+}
+
+static void
+interfaceDriverLock(virNetcfDriverStatePtr driver)
+{
+ virObjectLock(driver);
+}
+
+static void
+interfaceDriverUnlock(virNetcfDriverStatePtr driver)
+{
+ virObjectUnlock(driver);
+}
+
+static int
+netcfStateInitialize(bool privileged ATTRIBUTE_UNUSED,
+ virStateInhibitCallback callback ATTRIBUTE_UNUSED,
+ void *opaque ATTRIBUTE_UNUSED)
+{
+ if (virNetcfDriverStateInitialize() < 0)
+ return -1;
+
+ if (!(driverState = virObjectLockableNew(virNetcfDriverStateClass)))
+ return -1;
+
+ /* open netcf */
+ if (ncf_init(&driverState->netcf, NULL) != 0) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("failed to initialize netcf"));
+ virObjectUnref(driverState);
+ driverState = NULL;
+ return -1;
+ }
+ return 0;
+}
+
+static int
+netcfStateCleanup(void)
{
- virMutexLock(&driver->lock);
+ if (!driverState) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Attempt to close netcf state driver already closed"));
+ return -1;
+ }
+
+ if (virObjectUnref(driverState)) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Attempt to close netcf state driver "
+ "with open connections"));
+ return -1;
+ }
+ driverState = NULL;
+ return 0;
}
-static void interfaceDriverUnlock(virNetcfDriverStatePtr driver)
+static int
+netcfStateReload(void)
{
- virMutexUnlock(&driver->lock);
+ int ret = -1;
+
+ if (!driverState)
+ return 0;
+
+ interfaceDriverLock(driverState);
+ ncf_close(driverState->netcf);
+ if (ncf_init(&driverState->netcf, NULL) != 0)
+ {
+ /* this isn't a good situation, because we can't shut down the
+ * driver as there may still be connections to it. If we set
+ * the netcf handle to NULL, any subsequent calls to netcf
+ * will just fail rather than causing a crash. Not ideal, but
+ * livable (since this should never happen).
+ */
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("failed to re-init netcf"));
+ driverState->netcf = NULL;
+ goto cleanup;
+ }
+
+ ret = 0;
+cleanup:
+ interfaceDriverUnlock(driverState);
+
+ return ret;
}
/*
@@ -148,61 +248,30 @@ static struct netcf_if *interfaceDriverGetNetcfIF(struct netcf *ncf, virInterfac
return iface;
}
-static virDrvOpenStatus netcfInterfaceOpen(virConnectPtr conn,
- virConnectAuthPtr auth ATTRIBUTE_UNUSED,
- unsigned int flags)
+static virDrvOpenStatus
+netcfInterfaceOpen(virConnectPtr conn,
+ virConnectAuthPtr auth ATTRIBUTE_UNUSED,
+ unsigned int flags)
{
- virNetcfDriverStatePtr driverState;
-
virCheckFlags(VIR_CONNECT_RO, VIR_DRV_OPEN_ERROR);
- if (VIR_ALLOC(driverState) < 0)
- goto alloc_error;
-
- /* initialize non-0 stuff in driverState */
- if (virMutexInit(&driverState->lock) < 0)
- {
- /* what error to report? */
- goto mutex_error;
- }
-
- /* open netcf */
- if (ncf_init(&driverState->netcf, NULL) != 0)
- {
- /* what error to report? */
- goto netcf_error;
- }
+ if (!driverState)
+ return VIR_DRV_OPEN_ERROR;
+ virObjectRef(driverState);
conn->interfacePrivateData = driverState;
return VIR_DRV_OPEN_SUCCESS;
-
-netcf_error:
- if (driverState->netcf)
- {
- ncf_close(driverState->netcf);
- }
- virMutexDestroy(&driverState->lock);
-mutex_error:
- VIR_FREE(driverState);
-alloc_error:
- return VIR_DRV_OPEN_ERROR;
}
-static int netcfInterfaceClose(virConnectPtr conn)
+static int
+netcfInterfaceClose(virConnectPtr conn)
{
if (conn->interfacePrivateData != NULL)
{
- virNetcfDriverStatePtr driver = conn->interfacePrivateData;
-
- /* close netcf instance */
- ncf_close(driver->netcf);
- /* destroy lock */
- virMutexDestroy(&driver->lock);
- /* free driver state */
- VIR_FREE(driver);
+ virObjectUnref(conn->interfacePrivateData);
+ conn->interfacePrivateData = NULL;
}
- conn->interfacePrivateData = NULL;
return 0;
}
@@ -1070,7 +1139,7 @@ static int netcfInterfaceChangeRollback(virConnectPtr conn, unsigned int flags)
#endif /* HAVE_NETCF_TRANSACTIONS */
static virInterfaceDriver interfaceDriver = {
- "netcf",
+ .name = INTERFACE_DRIVER_NAME,
.interfaceOpen = netcfInterfaceOpen, /* 0.7.0 */
.interfaceClose = netcfInterfaceClose, /* 0.7.0 */
.connectNumOfInterfaces = netcfConnectNumOfInterfaces, /* 0.7.0 */
@@ -1093,11 +1162,19 @@ static virInterfaceDriver interfaceDriver = {
#endif /* HAVE_NETCF_TRANSACTIONS */
};
+static virStateDriver interfaceStateDriver = {
+ .name = INTERFACE_DRIVER_NAME,
+ .stateInitialize = netcfStateInitialize,
+ .stateCleanup = netcfStateCleanup,
+ .stateReload = netcfStateReload,
+};
+
int netcfIfaceRegister(void) {
if (virRegisterInterfaceDriver(&interfaceDriver) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("failed to register netcf interface driver"));
return -1;
}
+ virRegisterStateDriver(&interfaceStateDriver);
return 0;
}
--
1.7.11.7
11 years, 3 months
[libvirt] Doc: How to use NPIV in libvirt
by Osier Yang
Before posting this to the wiki or somewhere else, I want to see if there
are any suggestions on it, or if I missed something.
============================================
How to use NPIV in libvirt
I planned to write a document about how to use NPIV in libvirt after
more features are supported, but it looks like I can't wait till then:
I got lots of questions from both bugs and mails. So here we go.
The document tries to summarize what libvirt supports for NPIV so far,
and the TODO list. Feedback and suggestions are welcome.
1) How to find out which HBA(s) support vHBA
For libvirt newer than "1.0.4", you can find it out simply by:
# virsh nodedev-list --cap vports
"--cap vports" tells "nodedev-list" to output only the devices
which support the "vports" capability, i.e. support vHBA.
Also since version "1.0.4", you can see the maximum number of vports
the HBA supports and the current number of vports in the HBA's XML,
e.g.
# virsh nodedev-dumpxml scsi_host5
<device>
<name>scsi_host5</name>
<parent>pci_0000_04_00_1</parent>
<capability type='scsi_host'>
<host>5</host>
<capability type='fc_host'>
<wwnn>2001001b32a9da4e</wwnn>
<wwpn>2101001b32a9da4e</wwpn>
<fabric_wwn>2001000dec9877c1</fabric_wwn>
</capability>
<capability type='vport_ops'>
<max_vports>164</max_vports>
<vports>5</vports>
</capability>
</capability>
</device>
For libvirt older than "1.0.4", it's a bit more complicated than above:
First you need to find out all the HBAs, e.g.
# virsh nodedev-list --cap scsi_host
scsi_host0
scsi_host1
scsi_host2
scsi_host3
scsi_host4
scsi_host5
And then, to see if the HBA supports vHBA, check if the dumped
XML contains "vport_ops" capability. E.g.
# virsh nodedev-dumpxml scsi_host3
<device>
<name>scsi_host3</name>
<parent>pci_0000_00_08_0</parent>
<capability type='scsi_host'>
<host>3</host>
</capability>
</device>
That says "scsi_host3" doesn't support vHBA
# virsh nodedev-dumpxml scsi_host5
<device>
<name>scsi_host5</name>
<parent>pci_0000_04_00_1</parent>
<capability type='scsi_host'>
<host>5</host>
<capability type='fc_host'>
<wwnn>2001001b32a9da4e</wwnn>
<wwpn>2101001b32a9da4e</wwpn>
<fabric_wwn>2001000dec9877c1</fabric_wwn>
</capability>
<capability type='vport_ops' />
</capability>
</device>
But "scsi_host5" supports it.
One might be confused by the node device naming style in this document
(e.g. scsi_host5) versus the one in the RHEL6 Virtualization Guide [1]
(pci_10df_fe00_scsi_host_0). This is because libvirt has two backends for
the node device driver: udev and HAL. We prefer the udev backend over the
HAL backend in the internal implementation, and I think there is good enough
reason to do so (HAL is in maintenance mode now). I believe the udev backend
is used more than the HAL backend, but if your distribution packager builds
libvirt without the udev backend, don't be surprised by node device names
like the ones in [1].
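The "vport_ops" check described above can also be scripted. The following is a minimal Python sketch, not part of libvirt: the function name "vport_info" is hypothetical, and it assumes the XML has the same shape as the "nodedev-dumpxml" output shown in this section.

```python
import xml.etree.ElementTree as ET


def vport_info(nodedev_xml):
    """Inspect `virsh nodedev-dumpxml` output for the 'vport_ops' capability.

    Returns None if the device does not support vHBA. Otherwise returns a
    (vports, max_vports) tuple; either value is None when the XML omits it,
    as in the output shown for libvirt older than 1.0.4.
    """
    root = ET.fromstring(nodedev_xml)
    # iter() walks nested <capability> elements as well as direct children.
    for cap in root.iter("capability"):
        if cap.get("type") == "vport_ops":
            current = cap.findtext("vports")
            maximum = cap.findtext("max_vports")
            return (
                int(current) if current is not None else None,
                int(maximum) if maximum is not None else None,
            )
    return None
```

The XML string would typically come from running "virsh nodedev-dumpxml scsi_hostN" for each name printed by "virsh nodedev-list --cap scsi_host".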
2) How to create a vHBA
Pick one HBA which supports vHBA, use its "node device name" as the
"parent" of the vHBA, and specify the "wwnn" and "wwpn" in the vHBA's XML. E.g.
<device>
<name>scsi_host6</name>
<parent>scsi_host5</parent>
<capability type='scsi_host'>
<capability type='fc_host'>
<wwnn>2001001b32a9da5e</wwnn>
<wwpn>2101001b32a9da5e</wwpn>
</capability>
</capability>
</device>
Then create the vHBA with virsh command "nodedev-create" (assuming above
XML file is named "vhba.xml"):
# virsh nodedev-create vhba.xml
Node device scsi_host6 created from vhba.xml
Since "0.9.10", libvirt will generate "wwnn" and "wwpn" automatically if
they are not specified. This means one can create the vHBA with simpler
XML like:
<device>
<parent>scsi_host5</parent>
<capability type='scsi_host'>
<capability type='fc_host'>
</capability>
</capability>
</device>
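If the vHBA XML has to be produced by a script, something like the following Python sketch could generate both forms shown above. The helper name "build_vhba_xml" is hypothetical, and the rule that "wwnn" and "wwpn" must be given together or not at all is an assumption of this sketch, not something libvirt is known to enforce in this way.

```python
import xml.etree.ElementTree as ET


def build_vhba_xml(parent, wwnn=None, wwpn=None):
    """Build node device XML for a vHBA on the given parent HBA.

    When wwnn/wwpn are omitted, the <capability type='fc_host'> element is
    left empty so that libvirt (0.9.10 or newer) generates them itself.
    """
    # Assumption of this sketch: a half-specified pair is treated as an error.
    if (wwnn is None) != (wwpn is None):
        raise ValueError("specify both wwnn and wwpn, or neither")

    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", {"type": "scsi_host"})
    fc_host = ET.SubElement(scsi_host, "capability", {"type": "fc_host"})
    if wwnn is not None:
        ET.SubElement(fc_host, "wwnn").text = wwnn
        ET.SubElement(fc_host, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")
```

The returned string can be written to a file such as "vhba.xml" and passed to "virsh nodedev-create".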
3) How to destroy a vHBA
As usual, destroying something is always simpler than creating it:
# virsh nodedev-destroy scsi_host6
Destroyed node device 'scsi_host6'
You might already realize that the vHBA is removed permanently. Don't be
surprised, that's life: the node device driver doesn't support persistent
config. I won't say it's a nightmare for users who scream when realizing the
vHBA disappeared after a system reboot, but it's not good either
(assuming that you got the wwnn:wwpn pair from the storage admin, but didn't
record it). Fortunately, we support persistent vHBAs now, see the next
section for details.
4) How to create a persistent vHBA
Let's go back into the history a bit first.
Prior to libvirt "1.0.5", one could define a "scsi" type pool based on a
(v)HBA by its scsi host name (e.g. "host0" in the XML below). E.g.
<pool type='scsi'>
<name>poolhba0</name>
<uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
<capacity unit='bytes'>0</capacity>
<allocation unit='bytes'>0</allocation>
<available unit='bytes'>0</available>
<source>
<adapter name='host0'/>
</source>
<target>
<path>/dev/disk/by-path</path>
<permissions>
<mode>0700</mode>
<owner>0</owner>
<group>0</group>
</permissions>
</target>
</pool>
Quite nice? Yeah, at least it looks so, but the problem is that the scsi host
number is *unstable* (it can change after a system reboot, a kernel
module reload, a vHBA recreation, etc.), and thus the "scsi" type pool
based on a (v)HBA becomes unstable too. Obviously it doesn't help with the
"persistent vHBA" problem.
To solve this, since libvirt "1.0.5" we introduced a new XML schema
to indicate the (v)HBA. An example of the XML:
<pool type='scsi'>
<name>poolvhba0</name>
<uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
<source>
<adapter type='fc_host' parent='scsi_host5'
wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
</source>
<target>
<path>/dev/disk/by-path</path>
<permissions>
<mode>0700</mode>
<owner>0</owner>
<group>0</group>
</permissions>
</target>
</pool>
It allows defining a "scsi" type pool based on either an HBA or a vHBA. For
an HBA, the "parent" attribute can be omitted. For a vHBA, if "parent" is not
specified, libvirt will automatically pick the first HBA which supports vHBA
and has not exceeded the maximum number of vports it supports.
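To illustrate the automatic parent selection described above, here is a small Python sketch. It is only a model of the described behavior, not libvirt's actual implementation; the names "Hba" and "pick_parent" are hypothetical, and reading "doesn't exceed the maximum" as "vports < max_vports" is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Hba(NamedTuple):
    name: str        # node device name, e.g. "scsi_host5"
    vports: int      # number of vports currently in use
    max_vports: int  # maximum number of vports; 0 means no vHBA support


def pick_parent(hbas: Sequence[Hba]) -> Optional[str]:
    """Return the first HBA that supports vHBA and still has a free vport."""
    for hba in hbas:
        # Assumption: an HBA is usable while it has room for one more vport.
        if hba.max_vports > 0 and hba.vports < hba.max_vports:
            return hba.name
    return None
```

For example, given a full "scsi_host4" followed by a "scsi_host5" with 5 of 164 vports in use, the sketch picks "scsi_host5".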
For a pool based on a vHBA, when the pool is starting, libvirt will check
whether the specified vHBA (wwnn:wwpn) exists on the host; if it doesn't
exist yet, libvirt will create it automatically. When the pool is being
stopped, the vHBA is destroyed. But since the storage driver supports
persistent config, one can easily get the vHBA with the same "wwnn:wwpn"
on the next start (don't scream if your pool is transient).
That's not the end, though: if you want the vHBA to be created automatically
after a system reboot, you will need to set the pool as "autostart":
# virsh pool-autostart poolvhba0
One might be curious why we don't support persistent config for the node
device driver, and support creating persistent vHBAs there. One reason is
that it would duplicate what the storage pool does. Another reason (the
important one) is that we want to associate the libvirt storage pool/volume
with the domain (see section 6 below).
5) How to find out the LUN's path
If you have defined the "scsi" type pool based on the (v)HBA, it's simple to
look up which LUNs are attached to the (v)HBA with the virsh command
"vol-list", e.g.
# virsh vol-list poolvhba0 --details
Name        Path                                                            Type   Capacity   Allocation
--------------------------------------------------------------------------------------------------------
unit:0:2:0  /dev/disk/by-path/pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0  block  20.01 GiB  20.01 GiB
If you have not defined a "scsi" type pool based on the (v)HBA, you can find
the LUNs of the (v)HBA either with the virsh command "nodedev-list --tree", or
by iterating over sysfs manually.
To find the LUNs with the virsh command "nodedev-list" (irrelevant output is
omitted):
# virsh nodedev-list --tree
+- pci_0000_00_0d_0
| |
| +- pci_0000_04_00_0
| | |
| | +- scsi_host4
| |
| +- pci_0000_04_00_1
| |
| +- scsi_host5
| |
| +- scsi_host7
| +- scsi_target5_0_0
| | |
| | +- scsi_5_0_0_0
| |
| +- scsi_target5_0_1
| | |
| | +- scsi_5_0_1_0
| |
| +- scsi_target5_0_2
| | |
| | +- scsi_5_0_2_0
| | |
| | +- block_sdb_3600a0b80005adb0b0000ab2d4cae9254
| |
| +- scsi_target5_0_3
| |
| +- scsi_5_0_3_0
"scsi_host5" is an HBA on my host, it has a LUN named
"block_sdb_3600a0b80005adb0b0000ab2d4cae9254", don't be confused with
the naming, it's the naming style libvirt uses, meaningful only to libvirt. It
indicates that the LUN has the short device path "/dev/sdb" and the ID
"3600a0b80005adb0b0000ab2d4cae9254":
# ls /dev/disk/by-id/ | grep 3600a0b80005adb0b0000ab2d4cae9254
scsi-3600a0b80005adb0b0000ab2d4cae9254
To manually find the LUNs of a (v)HBA:
First, you need to iterate over all the directories beginning with the SCSI
host number of the (v)HBA under "/sys/bus/scsi/devices". E.g. I will look
up the LUNs of the HBA with SCSI host number 5 on my host:
# ls /sys/bus/scsi/devices/5:* -d
/sys/bus/scsi/devices/5:0:0:0 /sys/bus/scsi/devices/5:0:1:0
/sys/bus/scsi/devices/5:0:2:0 /sys/bus/scsi/devices/5:0:3:0
# ls /sys/bus/scsi/devices/5\:0\:3\:0/block/sdc
It means scsi_host5 has a LUN attached with device name "sdc" on address
"5:0:3:0".
# ls /sys/bus/scsi/devices/5\:0\:1\:0/ | grep block
device_blocked
scsi_host5 doesn't have a LUN attached at address "5:0:1:0".
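The manual sysfs lookup can be scripted. The following is a sketch, not a
definitive tool: list_host_luns is a hypothetical helper name, and the
optional second argument (an alternative directory to scan instead of
"/sys/bus/scsi/devices") is an addition made here only so the function can be
tried on a test tree. It treats an address without a "block" directory as
having no LUN attached, as in the examples above.

```sh
# List the block devices (LUNs) behind one SCSI host by walking sysfs.
# list_host_luns is a hypothetical helper; the second argument lets you
# point it at another directory tree instead of /sys/bus/scsi/devices.
list_host_luns() {
    host=$1
    root=${2:-/sys/bus/scsi/devices}
    for addr in "$root/$host":*; do
        [ -d "$addr/block" ] || continue      # no "block" dir: no LUN here
        for blk in "$addr"/block/*; do
            [ -e "$blk" ] || continue         # glob matched nothing
            # print "<address> <device name>", e.g. "5:0:3:0 sdc"
            printf '%s %s\n' "${addr##*/}" "${blk##*/}"
        done
    done
}

# Example: LUNs of the HBA with SCSI host number 5
list_host_luns 5
```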
A device name like "sdc" is not stable. To find the stable path, look for
the symbolic link that points to the device name. E.g.
# ls -l /dev/disk/by-path/
lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0 -> ../../sdc
Then "/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0" is the
stable path of the LUN attached at address "5:0:3:0". Of course, you can use
a similar method to get the "by-id | by-uuid | by-label" stable path.
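The symlink lookup can also be scripted rather than read off "ls -l" output.
A minimal sketch, assuming a "readlink" that supports "-f" (as GNU coreutils
does); stable_paths is a hypothetical helper name, and the second argument
selects which "/dev/disk/by-*" directory to search.

```sh
# Print every symlink in a /dev/disk/by-* directory that resolves to the
# given device node. stable_paths is a hypothetical helper name.
stable_paths() {
    dev=$1                               # e.g. /dev/sdc
    dir=${2:-/dev/disk/by-path}          # or by-id, by-uuid, by-label
    target=$(readlink -f "$dev") || return 1
    for link in "$dir"/*; do
        [ -L "$link" ] || continue       # only symlinks are of interest
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example: stable by-path name(s) of /dev/sdc
stable_paths /dev/sdc
```

Calling it with "/dev/disk/by-id" as the second argument gives the by-id
path in the same way.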
6) Use the LUN in a guest
Since libvirt "1.0.5", a storage volume can be used as a disk source through
two new attributes ("pool" and "volume") of the disk "<source>" element. E.g.
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='poolvhba0' volume='unit:0:2:0'/>
<target dev='hda' bus='ide'/>
</disk>
There are many advantages to doing so. Since the main purpose of this
document is "how to use", I will mention only two here to persuade you to
use it. First, you don't need to look up the LUN's path yourself. Second,
assuming that you want to migrate a domain which uses a LUN attached to a
vHBA, do you want to create the vHBA manually on the target host? With the
pool, you can simply define/start a pool with the same config on the target
host. So, if your libvirt is "1.0.5" or newer, we recommend that you define
the "scsi" type pool based on the (v)HBA and use the "pool/volume" names to
use the LUN as disk source.
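Defining and starting the same pool on the target host could look like the
sketch below. It uses the existing virsh commands "pool-dumpxml",
"pool-define" and "pool-start"; everything else is an assumption made for
illustration: copy_pool is a hypothetical helper name, the target URI in the
comment is a placeholder, and the VIRSH variable exists only so the commands
can be previewed (e.g. VIRSH=echo) without touching libvirt. Whether the
dumped XML can be reused unchanged depends on the target host, so check it
before relying on this.

```sh
# Copy a pool definition to another host and start it there.
# copy_pool is a hypothetical helper; VIRSH can be overridden (e.g. with
# "echo") to preview the commands instead of running them.
copy_pool() {
    pool=$1
    target_uri=$2            # placeholder, e.g. qemu+ssh://target.example.com/system
    virsh=${VIRSH:-virsh}
    xml=$(mktemp) || return 1
    # dump the pool XML locally, then define and start it on the target
    "$virsh" pool-dumpxml "$pool" > "$xml" &&
        "$virsh" -c "$target_uri" pool-define "$xml" &&
        "$virsh" -c "$target_uri" pool-start "$pool"
    rc=$?
    rm -f "$xml"
    return $rc
}
```

For example, "copy_pool poolvhba0 <target URI>" would carry the pool used in
the disk XML above over to the target host.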
You can either use the LUN as a qemu emulated disk, or pass it through to
the guest.
To use it as a qemu emulated disk, specify the "device" attribute as
"device='disk|cdrom|floppy'". E.g.
<disk type='volume' device='disk'>
<driver name='qemu' type='raw'/>
<source pool='blk-pool0' volume='blk-pool0-vol0'/>
<target dev='hda' bus='ide'/>
</disk>
Or (using the LUN's path directly):
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
<target dev='sda' bus='scsi'/>
</disk>
To pass the LUN through, specify the "device" attribute as
"device='lun'", e.g.
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
<target dev='sda' bus='scsi'/>
</disk>
7) Future work
* NPIV based SCSI host passthrough
This is what users ask for: how to pass a (v)HBA through to a guest?
* Expose vendor information, LUN's path, state of (v)HBA in its XML
* Maybe a virsh command to simplify vHBA creation with options
[1]
http://www.linuxtopia.org/online_books/rhel6/rhel_6_virtualization/rhel_6...
Regards,
Osier
11 years, 3 months
[libvirt] [v0.9.12-maint v2 00/12] Debian's 0.9.12 patches
by Guido Günther
These are the patches Debian is currently carrying on 0.9.12. Most are
straight cherry-picks. Since we're maintaining 0.9.12 for our current
stable release I'm happy to push these to v0.9.12-maint.
Daniel P. Berrange (2):
Don't ignore return value of qemuProcessKill
Fix race condition when destroying guests
Eric Blake (1):
build: fix virnetlink on glibc 2.11
Jiri Denemark (3):
daemon: Fix crash in virTypedParameterArrayClear
Revert "rpc: Discard non-blocking calls only when necessary"
qemu: Add support for -no-user-config
Luca Tettamanti (1):
Make sure regfree is called close to it's usage
Martin Kletzander (1):
security: Fix libvirtd crash possibility
Peter Krempa (4):
qemu: Fix off-by-one error while unescaping monitor strings
rpc: Fix crash on error paths of message dispatching
conf: Remove callback from stream when freeing entries in console hash
conf: Remove console stream callback only when freeing console helper
cfg.mk | 3 +-
daemon/remote.c | 16 +-
src/conf/virconsole.c | 13 ++
src/qemu/qemu_capabilities.c | 7 +-
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 11 +-
src/qemu/qemu_driver.c | 21 ++-
src/qemu/qemu_monitor.c | 11 +-
src/rpc/virnetclient.c | 21 +--
src/rpc/virnetserverclient.c | 3 +
src/rpc/virnetserverprogram.c | 11 +-
src/storage/storage_backend_logical.c | 5 +-
src/util/virnetlink.h | 2 +
tests/qemuhelpdata/qemu-1.1 | 268 ++++++++++++++++++++++++++++++++++
tests/qemuhelpdata/qemu-1.1-device | 160 ++++++++++++++++++++
tests/qemuhelptest.c | 75 ++++++++++
16 files changed, 586 insertions(+), 42 deletions(-)
create mode 100644 tests/qemuhelpdata/qemu-1.1
create mode 100644 tests/qemuhelpdata/qemu-1.1-device
--
1.8.4.rc3
11 years, 3 months
[libvirt] [PATCH] LXC: don't try to mount selinux filesystem when user namespace enabled
by Gao feng
Right now we mount selinuxfs even when user namespace is enabled and
ignore the error. But we shouldn't ignore these errors when user
namespace is not enabled.
This patch skips mounting selinuxfs when user namespace is enabled.
Signed-off-by: Gao feng <gaofeng(a)cn.fujitsu.com>
---
src/lxc/lxc_container.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 661ac52..84b1b57 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -797,7 +797,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
#if WITH_SELINUX
if (STREQ(mnts[i].src, SELINUX_MOUNT) &&
- !is_selinux_enabled())
+ (!is_selinux_enabled() || userns_enabled))
continue;
#endif
@@ -814,12 +814,6 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
VIR_DEBUG("Mount %s on %s type=%s flags=%x, opts=%s",
srcpath, mnts[i].dst, mnts[i].type, mnts[i].mflags, mnts[i].opts);
if (mount(srcpath, mnts[i].dst, mnts[i].type, mnts[i].mflags, mnts[i].opts) < 0) {
-#if WITH_SELINUX
- if (STREQ(mnts[i].src, SELINUX_MOUNT) &&
- (errno == EINVAL || errno == EPERM))
- continue;
-#endif
-
virReportSystemError(errno,
_("Failed to mount %s on %s type %s flags=%x opts=%s"),
srcpath, mnts[i].dst, NULLSTR(mnts[i].type),
--
1.8.3.1
11 years, 3 months
[libvirt] [PATCH] qemu: Fix checking of guest ABI compatibility when reverting snapshots
by Peter Krempa
When reverting a live internal snapshot with a live guest, the ABI
compatibility check was comparing a "migratable" definition with a normal
one. This resulted in the check failing with:
revert requires force: Target device address type none does not match source pci
This patch generates a "migratable" definition from the actual one to
check against the definition from the snapshot to avoid this problem.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1006886
---
src/qemu/qemu_driver.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index bbf2d23..ae1948f 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -13037,6 +13037,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
qemuDomainObjPrivatePtr priv;
int rc;
virDomainDefPtr config = NULL;
+ virDomainDefPtr migratableDef = NULL;
virQEMUDriverConfigPtr cfg = NULL;
virCapsPtr caps = NULL;
@@ -13151,8 +13152,13 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
* to have finer control. */
if (virDomainObjIsActive(vm)) {
/* Transitions 5, 6, 8, 9 */
- /* Check for ABI compatibility. */
- if (config && !virDomainDefCheckABIStability(vm->def, config)) {
+ /* Check for ABI compatibility. We need to do this check against
+ * the migratable XML or it will always fail otherwise */
+ if (!(migratableDef = qemuDomainDefCopy(driver, vm->def,
+ VIR_DOMAIN_XML_MIGRATABLE)))
+ goto cleanup;
+
+ if (config && !virDomainDefCheckABIStability(migratableDef, config)) {
virErrorPtr err = virGetLastError();
if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
@@ -13357,6 +13363,7 @@ cleanup:
}
if (vm)
virObjectUnlock(vm);
+ virDomainDefFree(migratableDef);
virObjectUnref(caps);
virObjectUnref(cfg);
--
1.8.3.2
11 years, 3 months