[libvirt] [PATCH] Fix dlopen dependency
by Matthias Bolte
Since the addition of the lock manager framework in 6a943419c528fdd7,
dlopen is always required, but the checks in configure weren't changed
to reflect that. This didn't show up directly because the VirtualBox
driver pulled dlopen in and masked the problem. Disabling the VirtualBox
driver, however, makes the build fail due to the missing dlopen.
Change the dlopen check in configure to pick up dlopen when available.
Reported by Ruben Kerkhof.
---
configure.ac | 46 ++++++++++++++++++++++++++--------------------
src/Makefile.am | 2 +-
2 files changed, 27 insertions(+), 21 deletions(-)
diff --git a/configure.ac b/configure.ac
index 985b8c2..f816696 100644
--- a/configure.ac
+++ b/configure.ac
@@ -417,6 +417,28 @@ fi
dnl
+dnl check for libdl
+dnl
+
+dlfcn_found=yes
+dlopen_found=yes
+
+AC_CHECK_HEADER([dlfcn.h],, [dlfcn_found=no])
+AC_SEARCH_LIBS([dlopen], [dl],, [dlopen_found=no])
+
+case $ac_cv_search_dlopen:$host_os in
+ 'none required'* | *:mingw* | *:msvc*) DLOPEN_LIBS= ;;
+ no*) AC_MSG_ERROR([Unable to find dlopen()]) ;;
+ *) if test "x$dlfcn_found" != "xyes"; then
+ AC_MSG_ERROR([Unable to find dlfcn.h])
+ fi
+ DLOPEN_LIBS=$ac_cv_search_dlopen ;;
+esac
+
+AC_SUBST([DLOPEN_LIBS])
+
+
+dnl
dnl check for VirtualBox XPCOMC location
dnl
@@ -432,14 +454,6 @@ AC_DEFINE_UNQUOTED([VBOX_XPCOMC_DIR], ["$vbox_xpcomc_dir"],
[Location of directory containing VirtualBox XPCOMC library])
if test "x$with_vbox" = "xyes"; then
- AC_SEARCH_LIBS([dlopen], [dl],,)
- case $ac_cv_search_dlopen:$host_os in
- 'none required'* | *:mingw* | *:msvc*) DLOPEN_LIBS= ;;
- no*) AC_MSG_ERROR([Unable to find dlopen()]) ;;
- *) DLOPEN_LIBS=$ac_cv_search_dlopen ;;
- esac
- AC_SUBST([DLOPEN_LIBS])
-
case "$host" in
*-*-mingw* | *-*-msvc*) MSCOM_LIBS="-lole32 -loleaut32" ;;
*) MSCOM_LIBS= ;;
@@ -2138,19 +2152,10 @@ AC_ARG_WITH([driver-modules],
DRIVER_MODULE_CFLAGS=
DRIVER_MODULE_LIBS=
-if test "x$with_driver_modules" = "xyes" ; then
- old_cflags="$CFLAGS"
- old_libs="$LIBS"
- fail=0
- AC_CHECK_HEADER([dlfcn.h],[],[fail=1])
- AC_SEARCH_LIBS([dlopen], [dl], [], [fail=1])
- test $fail = 1 &&
- AC_MSG_ERROR([You must have dlfcn.h / dlopen() support to build driver modules])
-
- CFLAGS="$old_cflags"
- LIBS="$old_libs"
-fi
if test "$with_driver_modules" = "yes"; then
+ if test "$dlfcn_found" != "yes" || test "$dlopen_found" != "yes"; then
+ AC_MSG_ERROR([You must have dlfcn.h / dlopen() support to build driver modules])
+ fi
DRIVER_MODULE_CFLAGS="-export-dynamic"
case $ac_cv_search_dlopen in
no*) DRIVER_MODULE_LIBS= ;;
@@ -2468,6 +2473,7 @@ AC_MSG_NOTICE([])
AC_MSG_NOTICE([Libraries])
AC_MSG_NOTICE([])
AC_MSG_NOTICE([ libxml: $LIBXML_CFLAGS $LIBXML_LIBS])
+AC_MSG_NOTICE([ dlopen: $DLOPEN_LIBS])
if test "$with_esx" = "yes" ; then
AC_MSG_NOTICE([ libcurl: $LIBCURL_CFLAGS $LIBCURL_LIBS])
else
diff --git a/src/Makefile.am b/src/Makefile.am
index 3612a24..4f9bfc9 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -526,7 +526,7 @@ libvirt_driver_la_SOURCES = $(DRIVER_SOURCES)
libvirt_driver_la_CFLAGS = $(NUMACTL_CFLAGS) $(GNUTLS_CFLAGS) \
-I@top_srcdir@/src/conf $(AM_CFLAGS)
-libvirt_driver_la_LIBADD = $(NUMACTL_LIBS) $(GNUTLS_LIBS)
+libvirt_driver_la_LIBADD = $(NUMACTL_LIBS) $(GNUTLS_LIBS) $(DLOPEN_LIBS)
USED_SYM_FILES = libvirt_private.syms
--
1.7.0.4
13 years, 5 months
[libvirt] [PATCH] qemu: Faster response time to qemu startup errors
by Stefan Berger
The patch below decreases libvirt's response time to errors
reported by QEMU upon startup by checking whether the qemu process is
still alive while polling for the local socket to show up.
This patch also introduces special handling of signal 0 in the Win32
part of virKillProcess, so that probing a process with signal 0 does
not terminate it.
Signed-off-by: Stefan Berger <stefanb(a)linux.vnet.ibm.com>
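The idea can be illustrated outside libvirt with a small, self-contained
sketch. It is not the patch itself: the helper name wait_for_socket, its
retry count and its delay are hypothetical, and plain kill(pid, 0) stands
in for virKillProcess(cpid, 0). Like the libvirt code, it simply retries
connect() on the same socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: connect to the UNIX socket at 'path', retrying
 * while the socket is not there yet, but giving up as soon as process
 * 'pid' no longer exists.
 * Returns a connected fd, or -1 with errno set (ETIMEDOUT when all
 * retries were used up, ESRCH when the process is gone). */
static int wait_for_socket(const char *path, pid_t pid,
                           int retries, unsigned int delay_ms)
{
    struct sockaddr_un addr;
    struct timespec ts;
    int err = ETIMEDOUT;
    int fd;
    int i;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;

    for (i = 0; i < retries; i++) {
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;

        /* Keep polling only for "not there yet" errors, and only while
         * the process that should create the socket is still alive.
         * Signal 0 performs the existence check without sending anything. */
        if ((errno == ENOENT || errno == ECONNREFUSED) &&
            kill(pid, 0) == 0) {
            nanosleep(&ts, NULL);
            continue;
        }

        err = errno;   /* ESRCH from kill(), or the connect() error */
        break;
    }

    close(fd);
    errno = err;
    return -1;
}
```

With a dead pid the function returns after the first failed connect()
instead of sleeping through every remaining retry, which is the speed-up
the patch is after.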
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 26bb814..92c44bf 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -247,7 +247,7 @@ qemuMonitorUnwatch(void *monitor)
}
static int
-qemuMonitorOpenUnix(const char *monitor)
+qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
{
struct sockaddr_un addr;
int monfd;
@@ -274,7 +274,8 @@ qemuMonitorOpenUnix(const char *monitor)
if (ret == 0)
break;
- if (errno == ENOENT || errno == ECONNREFUSED) {
+ if ((errno == ENOENT || errno == ECONNREFUSED) &&
+ virKillProcess(cpid, 0) == 0) {
/* ENOENT : Socket may not have shown up yet
* ECONNREFUSED : Leftover socket hasn't been removed yet */
continue;
@@ -691,7 +692,7 @@ qemuMonitorOpen(virDomainObjPtr vm,
switch (config->type) {
case VIR_DOMAIN_CHR_TYPE_UNIX:
mon->hasSendFD = 1;
- mon->fd = qemuMonitorOpenUnix(config->data.nix.path);
+ mon->fd = qemuMonitorOpenUnix(config->data.nix.path, vm->pid);
break;
case VIR_DOMAIN_CHR_TYPE_PTY:
diff --git a/src/util/util.c b/src/util/util.c
index d00f065..df4dfac 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2010,7 +2010,7 @@ int virKillProcess(pid_t pid, int sig)
* TerminateProcess is more or less equiv to SIG_KILL, in that
* a process can't trap / block it
*/
- if (!TerminateProcess(proc, sig)) {
+ if (sig != 0 && !TerminateProcess(proc, sig)) {
errno = ESRCH;
return -1;
}
[libvirt] [PATCH] build: update to latest gnulib
by Eric Blake
* .gnulib: Update to latest, for more strerror_r fixes.
---
strerror_r has proven trickier than I first thought. There are a
couple of other useful improvements in here, too.
* .gnulib 9d196fa...79d4e75 (70):
> strerror_r-posix: fix on MacOS
> gnulib-tool: Better isolation between different gnulib-tool invocations.
> strerror: simplify replacement
> strerror_r-posix: Tweaks.
> perror: document fixed bugs
> stat-time: get_stat_birthtime failure is better-defined
> strerror_r-posix: work around cygwin 1.7.9
> test-perror: relax test to ignore cygwin bug
> strerror: Move AC_LIBOBJ invocations to module description.
> perror: Use common idiom.
> autoupdate
> tests: fix usage message in 'mktempd_'
> tests init: new function 'fatal_', for hard errors
> doc/lgpl-2.1.texi
> canonicalize-lgpl: use common idiom
> canonicalize-lgpl: work around AIX realpath bug
> strerror: work around FreeBSD bug
> strerror-override: avoid bloating errno module
> Typo in recent ChangeLog entry.
> spawn-pipe tests: Rename program.
> spawn-pipe tests: Like the child program only against libc.
> careadlinkat: Avoid mismatch between ssize_t and int.
> gnulib-common.m4: add _GL_ATTRIBUTE_CONST and _GL_ATTRIBUTE_PURE
> ansi-c++-opt: Interoperability with libtool.
> acl: Fix test failure on AIX 7.
> pipe-filter-ii: Fix test failure on AIX and IRIX.
> localename: Fix link dependencies.
> error: Avoid gcc warning.
> unsetenv: Avoid gcc warning.
> setenv: Avoid gcc warning.
> sys_select: Ensure memset is declared also on AIX 7.
> maint.mk: sc_unmarked_diagnostics: don't hard-code "error"
> getopt: Avoid gcc warning.
> strerror_r: Fix comments.
> perror: Fix compilation error.
> setlocale: Enable replacement on Cygwin 1.5.
> strerror-override: Don't disable symbol renamings.
> Copyright: Use LGPL 2.1 instead of LGPL 2.0.
> doc: Fix a module name.
> pipe2: Remove dependency on 'nonblocking' module.
> maint.mk: add three prohibit-header-without-use rules
> allocator: 'die' routine is now given requested size
> strerror: drop strerror_r dependency
> perror: call strerror_r directly
> strerror_r: fix includes for FreeBSD
> Fix link errors in tests: openat-die uses gettext-h.
> build-aux/config.sub
> Fix link errors in tests: wait-process uses gettext-h.
> * modules/assert-h (assert.h): Substitute the symbol-prefix more consistently.
> assert-h: work around 'verify' incompatibility
> trim: remove three superfluous assignments
> wctype-h: Avoid namespace pollution on Solaris 2.6.
> parse-datetime.y: accommodate -Wstrict-overflow
> trim: avoid a warning from -O2 -Wstrict-overflow
> gnulib-tool: Fix bug in yesterday's commit.
> Allow multiple gnulib generated include files to be combined.
> assert-h: Allow multiple gnulib generated replacements to coexist.
> argp: Allow coexistence with strerror_r-posix module.
> Status of work-in-progress around libposix.
> gnulib-tool: Alternative structure of testdirs, similar to --import.
> getloadavg: Remove an unreliable safety check.
> doc: Cleanup yet another file produced by texinfo.tex.
> Finish the conditional dependencies mechanism.
> doc: Use a recent texinfo.tex.
> intprops.h: adjust another comment to match code change * lib/intprops.h (_GL_INT_SIGNED): Now, E may have side effects.
> intprops.h: adjust comment to match code change
> gen-uni-tables: Say "gen-uni-tables.c" consistently.
> mbsrchr: Avoid collision with system function on Interix.
> getopt: for ambiguous options, enumerate the possibilities.
> getcwd: work around mingw bug
.gnulib | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/.gnulib b/.gnulib
index 9d196fa..79d4e75 160000
--- a/.gnulib
+++ b/.gnulib
@@ -1 +1 @@
-Subproject commit 9d196fad055a448c5732a8e950cc044b353d2615
+Subproject commit 79d4e75d8e14dee5d91f58413942fe875857d4f5
--
1.7.4.4
[libvirt] [PATCH 00/12] Coverity cleanups, round 2
by Eric Blake
Well, I guess I didn't send these in time for 0.9.2.
Again, some are bigger in impact than others.
Eric Blake (12):
build: detect Coverity 5.3.0
storage: avoid mishandling backing store > 2GB
build: silence coverity false positive
python: avoid unlikely sign extension bug
debug: avoid null dereference on uuid lookup api
uuid: annotate non-null requirements
qemu: reorder checks for safety
secret: drop dead code
esx: avoid dead code
build: silence coverity false positives
qemu: add missing break statement
build: break some long lines
configure.ac | 4 ++-
python/libvirt-override.c | 2 +-
src/conf/nwfilter_conf.c | 4 +++
src/esx/esx_vi.c | 4 +-
src/libvirt.c | 42 ++++++++++++++++++++++++---------------
src/qemu/qemu_cgroup.c | 4 +-
src/qemu/qemu_hotplug.c | 47 +++++++++++++++++++++++++++++---------------
src/secret/secret_driver.c | 8 +------
src/util/storage_file.c | 3 +-
src/util/util.c | 2 +-
src/util/uuid.h | 8 ++++--
tools/virsh.c | 3 ++
12 files changed, 81 insertions(+), 50 deletions(-)
--
1.7.4.4
[libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd
by Daniel P. Berrange
What follows is a document outlining some thoughts I've been having
on extending sVirt to allow confinement of applications which talk
to libvirtd on the host, primarily focusing on use of SELinux, but
also allowing a simple non-SELinux RBAC mechanism.
Securing KVM virtualization hosts with MAC
==========================================
This document looks at the task of securing KVM virtualization
hosts using mandatory access control technologies, with focus
on SELinux. At the time of writing there have been two phases
of development, and this document makes proposals for a third
phase.
Phase 1: circa 2006
-------------------
Goal: Protect the host from a compromised virtual machine.
The first phase of development had the modest goal of
protecting the host from attack by a compromised virtual
machine. To achieve this, the KVM processes are configured
such that they will run under a confined security context
('virt_t' in the SELinux reference policy), which blocks
access to any host resources not labelled ('virt_image_t')
for use by virtual machines.
The primary limitation of this initial implementation
is that while the host is secured, there is no
protection between virtual machines. This can be considered
a regression in isolation as compared to that offered by
non-virtualized hosts. The second limitation is that the
virtualization admin has to take care to ensure the host
resources intended for use by the virtual machines are
correctly labelled. This is a manual setup task unless
the images are kept in a preset location (/var/lib/libvirt/images
in the SELinux reference policy).
Phase 2: March 2009
-------------------
Goal: Protect virtual machines from each other
The second phase of development has the goal of providing
isolation between virtual machines that is comparable to
that achieved between physical machines. This piece of
work is commonly referred to as "svirt". To achieve this,
the KVM processes are each configured to run under a
dedicated security context, which blocks access to any
resources not explicitly assigned to that virtual machine.
In the SELinux implementation, the base context "svirt_t"
has a unique MCS category ("c240,c955") appended to form
a unique security context "system_u:system_r:svirt_t:s0:c240,c955".
For each host resource to be assigned to the virtual machine,
the base context "svirt_image_t" is combined with the same
MCS category to form a unique resource security context
"system_u:object_r:svirt_image_t:s0:c240,c955".
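As a rough illustration of how such a context string is put together,
here is a minimal sketch. The helper name format_mcs_context is
hypothetical and this is not libvirt's actual code; it only appends a
category pair to a base context and assumes categories lie in the
c0..c1023 range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append an MCS category pair to a base context
 * such as "system_u:system_r:svirt_t:s0".
 * Returns 0 on success, -1 on invalid input or if 'buf' is too small. */
static int format_mcs_context(char *buf, size_t buflen,
                              const char *base, int cat1, int cat2)
{
    int n;

    assert(buf != NULL && base != NULL);
    if (strlen(base) == 0)
        return -1;

    /* Assumed valid range is c0..c1023; the two categories of a pair
     * must differ. */
    if (cat1 < 0 || cat2 < 0 || cat1 > 1023 || cat2 > 1023 || cat1 == cat2)
        return -1;

    /* Emit the smaller category first so that the same pair always
     * yields the same string. */
    if (cat1 > cat2) {
        int tmp = cat1;
        cat1 = cat2;
        cat2 = tmp;
    }

    n = snprintf(buf, buflen, "%s:c%d,c%d", base, cat1, cat2);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Calling it once with the process base context and once with the image
base context, using the same category pair, gives the matching
process/resource labels described above.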
The assignment of virtual machine security contexts and
labelling of resources can be done statically by the
administrator / management application, or dynamically
by the libvirtd daemon. The latter removes much of the
administrator burden.
The second phase has addressed the major guest security
limitation of the first phase, and eased the burden placed
on host administrators. Attention can now focus on the security
of the host management software stack. Client applications
communicate with the libvirtd daemon using a simple sockets
based RPC protocol. Thus operations initiated by client
applications which run under one security context are in
fact invoked under the libvirtd daemon's security context.
Since the libvirtd daemon is a highly privileged, almost
unconfined process, this provides a means for applications
to elevate their privileges.
A second problem with the current model is seen when looking
at guest migration between hosts. During migration, there
are two QEMU processes running for the same virtual machine,
one process on each host. The dynamic assignment of MCS
values to form unique security contexts is done on a per host
basis, so there is no guarantee that the VM on host A will be
using (or be able to use) the same security context on the
target host of migration. This is not necessarily a problem
if the guest is using block devices, since block device inode
labels are only visible to a single host. With a shared
filesystem that supports SELinux labelling, like GFS2, both
QEMU processes must run in the same security context to allow
them both to access the associated files.
Phase 3: June 2011
------------------
Goal: Protect virtual machines from host applications
The third phase of development has the primary goal of
honouring the confinement of client applications talking
to libvirtd, when performing operations on virtual machines
and other managed objects (storage pools, host devices,
virtual networks, secrets, etc). Every application connecting
to libvirt has an associated security context. Every object
managed by libvirtd will have an associated security context.
When an operation is invoked via a libvirt API the client
application security context will be checked against the
target object context, before proceeding. Thus applications
will not be able to make use of a libvirtd connection to
perform operations that are otherwise blocked.
The secondary goal is to add further flexibility and safety
to the way MCS categories are assigned, and files are relabelled.
Instead of maintaining a local database of assigned labels, there
must be some shared storage where label usage can be recorded.
At its simplest this can be an NFS share, with one file per MCS
category and locking with fcntl(). An alternative would be to
acquire leases using a lock manager such as sanlock. In addition,
the guest configuration will be enhanced such that a guest can
be assigned a statically chosen security context, but still make
use of dynamic relabelling of resources. Finally the existing
boolean mode of 'static' vs 'dynamic' label generation will be
turned into a tri-state, introducing a 'hybrid' mode where the
client supplies a custom base context, and the MCS part is still
auto-generated.
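The "one file per MCS category" idea could look roughly like the sketch
below. It is only an illustration under stated assumptions: the helper
name claim_mcs_category and the "<dir>/c<N>" file layout are made up
here, and whether fcntl() locks are reliable on a given NFS setup would
still need to be verified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: claim MCS category 'cat' by taking an exclusive
 * fcntl() lock on "<dir>/c<cat>" (the layout is an assumption).
 * Returns an fd holding the claim (close it to release), or -1 if the
 * category is held by another process or an error occurred. */
static int claim_mcs_category(const char *dir, int cat)
{
    char path[4096];
    struct flock fl = {0};
    int fd;
    int n;

    assert(dir != NULL);
    n = snprintf(path, sizeof(path), "%s/c%d", dir, cat);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "until end of file": whole file */

    /* F_SETLK does not block: it fails with EACCES or EAGAIN when
     * another process already holds a conflicting lock. */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that fcntl() locks belong to the process, not the fd: a second
claim from the same process succeeds, and closing any fd for the file
drops the lock. A real implementation would have to account for that.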
Usage scenarios
---------------
To aid in development a couple of relevant core use cases
or usage scenarios have been identified:
1. A virtual machine monitoring application
For this example, consider the simple monitoring application
'virt-top'. This application displays a list of all virtual
machines on the host and their associated resource utilization
(CPU, disk, network). This application has no need to be able
to stop/start/define virtual machines, nor do any operation
related to host devices, storage, or networking. Traditionally
this application is written to use a read only libvirt connection.
With enhanced access control from libvirtd, the policy would define
a new security context 'virt_top_t' for the 'virt-top' application.
This policy would allow 'list', 'read', 'readstats' on the 'domain'
object type.
2. A multi-guest, multi-user MLS enabled host
For this example, consider a virtualization host with MLS policy
that is running multiple virtual machines, for a variety of
different users. A user with the security level "restricted"
must not be allowed to control virtual machines with a security
level of "confidential". Conversely a user with security level
"secret" must not be allowed to create virtual machines with a
security level of "unclassified".
With enhanced access control from libvirtd, getpeercon() would
provide the security context of the client application (user).
The client context would be used to perform an AVC when any API
operation is invoked, thus ensuring that the client's MLS
label is honoured in access control checks. The effect would be
that when a 'restricted' user asked for a list of virtual machines
only virtual machines at level 'restricted' or below would be
returned. Or when a "secret" user asked to start a guest with
a security level of 'unclassified', the operation would be denied.
3. Identity transitions from trusted agents
For this example, consider a trusted agent such as libvirt-qpid,
or libvirt-snmp, which translates the libvirt API from its native
model, into an alternate access model. In such an example, the
agent talking to libvirtd will have authenticated itself. The
peer identity that libvirtd sees, however, is that of the agent,
not the ultimate (end-user) client. In such a case it will be desirable
to allow a trusted agent to transition to a different identity when
performing operations.
An end user running under context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023"
may talk to the libvirt-qpid agent which runs under the context
"system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". The libvirt-qpid
connects to libvirtd which sees 'virt_qpid_t' as the client type.
The policy is written to allow transitions from 'virt_qpid_t' to
the 'virt_top_t' type, so when the virt-top client connects to
libvirt-qpid, it changes its identity to 'virt_top_t'. From that
point onwards, all AVC checks honour the privileges of the ultimate
end user application, rather than the libvirt-qpid intermediary.
The same mechanism also ensures that the client application MLS
level is transferred via the libvirt-qpid agent to libvirtd.
Anticipated Development tasks
-----------------------------
1. Extend the domain XML to add a third attribute to the <seclabel>
element relabel="yes|no", to control whether libvirtd will
automatically label resources assigned to a guest. If the
existing 'mode' attribute is "dynamic", then relabelling will
default to enabled, while if it is 'static', then relabelling
will default to disabled. Also change 'mode' to allow a new
'hybrid' value.
2. Determine how to maintain/identify security labels for other
managed objects, including virStoragePoolPtr, virStorageVolPtr,
virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr,
and host level APIs without any explicit managed object.
3. Extend XML for non-domain objects to implant security labels
as identified in step 2.
4. Create an internal virIdentity struct to store the identity
of the client. This will include at least the x509 distinguished
name, the SASL username, the SELinux context (getpeercon())
and UNIX username/group (SCM_CREDENTIALS).
5. Create a new public API to allow a client application to
supply a new identity, allowing them to pass a new x509
distinguished name, SASL username, SELinux context and
UNIX username/group.
6. Extend the libvirtd daemon such that the current identity
is stored in a thread local whenever invoking a public
API operation.
7. Extend the QEMU driver such that a suitable identity is
set when performing autonomous background operations
such as domain auto-start and core dump, in a non-API
thread.
8. Create a set of internal access control helper APIs in
$libvirt/src/accesscontrol/. There will be one API for each
managed object, taking an object pointer, and an operation
identifier (from an enum).
9. Create a simple impl of the access control APIs which defines
roles for groups of user identities, and grants privileges to
each role based on the operation names. This allows for simple
testing of internal infrastructure, and an RBAC mechanism for
users who lack SELinux in their OS.
10. Implant access control checks into the main codepaths of every
driver method implementation in the QEMU driver.
11. Change the SELinux reference policy to define the new security
types and access vectors for the libvirt objects & associated
API calls.
12. Create a SELinux impl of the access control APIs which invokes
avc_has_perm() using the client's SELinux context. This is
intended to be the primary RBAC mechanism for Fedora/RHEL
virtualization hosts.
13. Write policy to confine targeted applications like virt-top,
virt-mem.
14. Extend libvirt-snmp, libvirt-cim, libvirt-qpid to pass through
the client identity to libvirtd.
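To make tasks 8 and 9 a little more concrete, here is a minimal sketch
of a role-based check for the 'domain' object type. Everything in it is
hypothetical (the enum, the role names, the function name); it only
shows the shape of "role grants a set of operations" and leaves out the
mapping from user identities to roles. The "monitor" role mirrors the
'list', 'read', 'readstats' permissions from the virt-top scenario.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation identifiers for the 'domain' object type. */
enum domain_op {
    DOMAIN_OP_LIST,
    DOMAIN_OP_READ,
    DOMAIN_OP_READSTATS,
    DOMAIN_OP_START,
    DOMAIN_OP_STOP,
    DOMAIN_OP_DEFINE,
    DOMAIN_OP_LAST
};

#define OP_BIT(op) (1u << (op))

/* A role is a name plus a bitmask of permitted operations. */
struct role {
    const char *name;
    unsigned int allowed;
};

/* Illustrative policy only: a monitoring role and an admin role. */
static const struct role roles[] = {
    { "monitor", OP_BIT(DOMAIN_OP_LIST) | OP_BIT(DOMAIN_OP_READ) |
                 OP_BIT(DOMAIN_OP_READSTATS) },
    { "admin",   OP_BIT(DOMAIN_OP_LAST) - 1 },  /* every operation */
};

/* Returns 1 if 'role_name' may perform 'op' on a domain, else 0.
 * Unknown roles and out-of-range operations are denied. */
static int domain_access_allowed(const char *role_name, enum domain_op op)
{
    size_t i;

    assert(role_name != NULL);
    if ((int)op < 0 || (int)op >= DOMAIN_OP_LAST)
        return 0;

    for (i = 0; i < sizeof(roles) / sizeof(roles[0]); i++) {
        if (strcmp(roles[i].name, role_name) == 0)
            return (roles[i].allowed & OP_BIT(op)) != 0;
    }
    return 0;
}
```

A driver method would call such a helper before doing any work and
return an access-denied error when it yields 0.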
Technical Notes / Issues
------------------------
1. Adding new SELinux security classes / access vectors
The SELinux security classes are defined in /usr/include/selinux/flask.h
and access vectors in /usr/include/selinux/av_permissions.h. Both of these
files are automatically generated by a script in the SELinux reference policy code
'$serefpolicy/policy/flask/flask.py'. The master data files are in the
same directory, 'access_vectors' and 'security_classes'. Once generated,
the headers need to be manually copied into the libselinux package
sources.
APIs are added to libvirt on a very frequent basis. What is the process
for applying access control to them if the SELinux policy does not yet
have a suitable access vector / security class defined ? Do we need a
generic 'admin' access vector we can use as a catch-all, until more
specific vectors can be defined for the new APIs ? It is desirable to avoid
having to lock-step upgrade libvirt with SELinux policy for all additions
to the libvirt public API.
2. Security contexts for libvirt managed objects
virDomainPtr: Already embedded in XML, unless using dynamic labelling
in which case context is assigned at startup.
virNetworkPtr: No existing security context, nor any object on disk
that could be used. Follow example of domains and embed
<seclabel> in the XML. Assign unique MCS category per
network and ensure that daemons launched per network
(dnsmasq, radvd) inherit the MCS category.
virSecretPtr: No existing security context. Secrets may be associated
with disk paths for VMs. Could copy the security context
of the guests and apply it to the secret, or have a
dedicated type svirt_secret_t and just copy the MCS
category. Hard to make it work for guests with dynamic
MCS assignment.
virStoragePoolPtr: No existing security context. Some pool types have
objects existing on the host filesystem eg SCSI
HBAs have a directory in sysfs, filesystem dirs
have a directory somewhere, LVM has directory
for the volume group in /dev. Other pool types have
no object on disk anywhere convenient. eg Sheepdog.
Other pool types only have an object on disk when
the pool is active (eg iSCSI, NFS). So there is
nothing to use for API checks when the pool is
inactive.
Likely have to ignore whatever associated resource
is on disk and just store a security context in the
XML config as with virDomainPtr/virNetworkPtr.
virStorageVolPtr: Currently reports the SELinux security label associated
with the file on disk. Not all pool types necessarily
have volumes with a corresponding file on disk (eg
Sheepdog).
virNodeDevicePtr: No existing security context. Most data comes from udev
or HAL databases, though ultimately much is available
in sysfs.
When detaching PCI devices from host drivers, files
in sysfs are used. When creating/deleting NPIV adapters
sysfs is used. Thus could use sysfs file labels for AVC
checks ?
virConnectPtr: All host level APIs for which there is no other object
aside from the nebulous concept of the 'host'. APIs are
all readonly, eg query host capabilities, query free
memory, CPU stats, etc. What if we gain APIs that make
write calls ?
virInterfacePtr: No existing security context. Currently using netcf to
get data from /etc/sysconfig/network-scripts/ifcfg-XXX
files, but can't assume those file names since that is
Fedora/RHEL specific. Might not even use netcf if it
talks directly to network manager. Does netcf need to
expose a security label based on the ifcfg-XXX file ?
3. Security labelling config modes
When creating a guest the following XML snippets can be used.
a. Default type, dynamic MCS, automatic relabelling
<seclabel type='selinux' mode='dynamic' relabel='yes'/>
b. Custom type, dynamic MCS, automatic relabelling
<seclabel type='selinux' mode='hybrid' relabel='yes'>
<label>system_u:system_r:mysvirt_t</label>
<imagelabel>system_u:object_r:mysvirt_image_t</imagelabel>
</seclabel>
c. Default type, dynamic MCS, no relabelling
<seclabel type='selinux' mode='dynamic' relabel='no'/>
Does this mode make any sense, since admin doesn't know
MCS category upfront ? Possibly only useful if the guest
only has readonly disks.
d. Custom type, dynamic MCS, no relabelling
<seclabel type='selinux' mode='hybrid' relabel='no'>
<label>system_u:system_r:mysvirt_t</label>
</seclabel>
Same question about whether it makes sense
e. Custom type, static MCS, auto relabelling
<seclabel type='selinux' mode='static' relabel='yes'>
<label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
<imagelabel>system_u:object_r:mysvirt_image_t:s0:c123,c456</imagelabel>
</seclabel>
f. Custom type, static MCS, no relabelling
<seclabel type='selinux' mode='static' relabel='no'>
<label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
</seclabel>
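The way the 'mode' and 'relabel' attributes interact can be sketched as
follows. All names are hypothetical and this is not libvirt code. The
defaults for 'dynamic' and 'static' follow development task 1 above;
the default for 'hybrid' is not specified in this document, so treating
it like 'dynamic' is purely an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tri-state for the <seclabel mode='...'> attribute. */
enum seclabel_mode {
    SECLABEL_MODE_DYNAMIC,
    SECLABEL_MODE_STATIC,
    SECLABEL_MODE_HYBRID,
    SECLABEL_MODE_INVALID
};

static enum seclabel_mode seclabel_mode_parse(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "dynamic") == 0)
        return SECLABEL_MODE_DYNAMIC;
    if (strcmp(s, "static") == 0)
        return SECLABEL_MODE_STATIC;
    if (strcmp(s, "hybrid") == 0)
        return SECLABEL_MODE_HYBRID;
    return SECLABEL_MODE_INVALID;
}

/* 'relabel' is "yes", "no", or NULL when the attribute is absent.
 * Returns 1 if resources should be relabelled, 0 if not, and -1 on
 * invalid input. */
static int seclabel_relabel_enabled(enum seclabel_mode mode,
                                    const char *relabel)
{
    if (relabel != NULL) {
        if (strcmp(relabel, "yes") == 0)
            return 1;
        if (strcmp(relabel, "no") == 0)
            return 0;
        return -1;
    }

    switch (mode) {
    case SECLABEL_MODE_DYNAMIC:
        return 1;   /* default per task 1: relabelling enabled */
    case SECLABEL_MODE_STATIC:
        return 0;   /* default per task 1: relabelling disabled */
    case SECLABEL_MODE_HYBRID:
        return 1;   /* ASSUMPTION: treated like 'dynamic' here */
    default:
        return -1;
    }
}
```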
4. Time at which to apply checks / source context
It would be desirable to restrict the ability to use automatic file
relabelling within the policy. If a client application defines a
guest with the 'relabel=yes' attribute set, at what time should this
usage be validated ?
Validate at the time the guest is defined ? This ensures the app
defining the guest is suitably privileged, but the file labels
might be changed by the time the guest starts.
Validate at the time the guest is started ? This minimises the
window between access check being performed, and libvirtd actually
performing the relabel operation. The app starting the guest might
be different from the one defining the guest though ?
Check at both define + start time ?
What source security context should we use when performing autostart
of virtual machines ? Normally when starting a VM, the check would be
performed using the context of the client invoking the start API, but
there is no such client when autostart occurs.
Should we instead perform a 'start' operation check whenever the
'autostart' flag is turned on by a client ? Or check the autostart
operation against some generic source context ?
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[libvirt] [PATCH v2] qemu: Parse current balloon value returned by query_balloon
by Osier Yang
QEMU once supported the following memory stats, which are returned by
"query_balloon":
stat_put(dict, "actual", actual);
stat_put(dict, "mem_swapped_in", dev->stats[VIRTIO_BALLOON_S_SWAP_IN]);
stat_put(dict, "mem_swapped_out", dev->stats[VIRTIO_BALLOON_S_SWAP_OUT]);
stat_put(dict, "major_page_faults", dev->stats[VIRTIO_BALLOON_S_MAJFLT]);
stat_put(dict, "minor_page_faults", dev->stats[VIRTIO_BALLOON_S_MINFLT]);
stat_put(dict, "free_mem", dev->stats[VIRTIO_BALLOON_S_MEMFREE]);
stat_put(dict, "total_mem", dev->stats[VIRTIO_BALLOON_S_MEMTOT]);
But it later disabled all the stats except "actual" by commit
07b0403dfc2b2ac179ae5b48105096cc2d03375a.
libvirt doesn't parse "actual", so the user will always see an empty result
with "virsh dommemstat $domain". Even if qemu hadn't disabled the stats,
we should support parsing "actual".
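The JSON side of the change boils down to "append one more stat if
there is room, converting bytes to KB". A stand-alone sketch of that
pattern, with hypothetical names (struct mem_stat, append_actual_balloon)
rather than the real libvirt types, could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the tag/value pair used for domain memory stats. */
struct mem_stat {
    int tag;
    unsigned long long val;   /* in KB */
};

/* Mirrors the value the patch assigns to
 * VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON. */
#define STAT_ACTUAL_BALLOON 6

/* Hypothetical helper: store the balloon size, reported in bytes, as a
 * KB value in the next free slot of 'stats'. 'got' is the number of
 * slots already filled and 'nr_stats' the array capacity.
 * Returns the new number of filled slots. */
static size_t append_actual_balloon(struct mem_stat *stats, size_t got,
                                    size_t nr_stats,
                                    unsigned long long actual_bytes)
{
    assert(stats != NULL);
    if (got >= nr_stats)
        return got;           /* no room left: leave the array untouched */

    stats[got].tag = STAT_ACTUAL_BALLOON;
    stats[got].val = actual_bytes / 1024;
    return got + 1;
}
```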
---
include/libvirt/libvirt.h.in | 4 +++-
src/libvirt.c | 2 ++
src/qemu/qemu_monitor_json.c | 12 ++++++++++++
src/qemu/qemu_monitor_text.c | 4 +++-
tools/virsh.c | 2 ++
5 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index df213f1..0930622 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -467,11 +467,13 @@ typedef enum {
*/
VIR_DOMAIN_MEMORY_STAT_AVAILABLE = 5,
+ /* Current balloon value (in KB). */
+ VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON = 6,
/*
* The number of statistics supported by this version of the interface.
* To add new statistics, add them to the enum and increase this value.
*/
- VIR_DOMAIN_MEMORY_STAT_NR = 6,
+ VIR_DOMAIN_MEMORY_STAT_NR = 7,
} virDomainMemoryStatTags;
typedef struct _virDomainMemoryStat virDomainMemoryStatStruct;
diff --git a/src/libvirt.c b/src/libvirt.c
index 18c4e08..08a7d4c 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -5737,6 +5737,8 @@ error:
* The amount of memory which is not being used for any purpose (in kb).
* VIR_DOMAIN_MEMORY_STAT_AVAILABLE:
* The total amount of memory available to the domain's OS (in kb).
+ * VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON:
+ * Current balloon value (in kb).
*
* Returns: The number of stats provided or -1 in case of failure.
*/
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 75adf66..2680b3c 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -1119,6 +1119,18 @@ int qemuMonitorJSONGetMemoryStats(qemuMonitorPtr mon,
goto cleanup;
}
+ if (virJSONValueObjectHasKey(data, "actual") && (got < nr_stats)) {
+ if (virJSONValueObjectGetNumberUlong(data, "actual", &mem) < 0) {
+ qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("info balloon reply was missing balloon actual"));
+ ret = -1;
+ goto cleanup;
+ }
+ stats[got].tag = VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON;
+ stats[got].val = (mem/1024);
+ got++;
+ }
+
if (virJSONValueObjectHasKey(data, "mem_swapped_in") && (got < nr_stats)) {
if (virJSONValueObjectGetNumberUlong(data, "mem_swapped_in", &mem) < 0) {
qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c
index 3b42e7a..d432027 100644
--- a/src/qemu/qemu_monitor_text.c
+++ b/src/qemu/qemu_monitor_text.c
@@ -549,7 +549,9 @@ static int qemuMonitorParseExtraBalloonInfo(char *text,
parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_UNUSED,
",free_mem=", &stats[nr_stats_found]) ||
parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_AVAILABLE,
- ",total_mem=", &stats[nr_stats_found]))
+ ",total_mem=", &stats[nr_stats_found]) ||
+ parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON,
+ ",actual=", &stats[nr_stats_found]))
nr_stats_found++;
/* Skip to the next label. When *p is ',' the last match attempt
diff --git a/tools/virsh.c b/tools/virsh.c
index d98be1c..17f6a22 100644
--- a/tools/virsh.c
+++ b/tools/virsh.c
@@ -1147,6 +1147,8 @@ cmdDomMemStat(vshControl *ctl, const vshCmd *cmd)
vshPrint (ctl, "unused %llu\n", stats[i].val);
if (stats[i].tag == VIR_DOMAIN_MEMORY_STAT_AVAILABLE)
vshPrint (ctl, "available %llu\n", stats[i].val);
+ if (stats[i].tag == VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON)
+ vshPrint (ctl, "actual %llu\n", stats[i].val);
}
virDomainFree(dom);
--
1.7.4
13 years, 5 months
[libvirt] [PATCH] storage: Deactive lv before remove it
by Osier Yang
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=702260.
Even with this patch the user might still see an error like
"Unable to deactivate logical volume", but it can fix the problem when
the lv is referred to by other existing LVs, allowing the user to remove
the lv successfully without seeing an error like "Can't remove open
logical volume".
For the error "Unable to deactivate logical volume", libvirt can't do
more; it is a problem in lvm, see:
https://bugzilla.redhat.com/show_bug.cgi?id=570359
The patch applied to upstream lvm to fix it:
https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html
---
configure.ac | 4 ++++
src/storage/storage_backend_logical.c | 30 ++++++++++++++++++++++++------
2 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/configure.ac b/configure.ac
index 7982e21..5c2eeb8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1666,6 +1666,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then
AC_PATH_PROG([PVREMOVE], [pvremove], [], [$PATH:/sbin:/usr/sbin])
AC_PATH_PROG([VGREMOVE], [vgremove], [], [$PATH:/sbin:/usr/sbin])
AC_PATH_PROG([LVREMOVE], [lvremove], [], [$PATH:/sbin:/usr/sbin])
+ AC_PATH_PROG([LVCHANGE], [lvchange], [], [$PATH:/sbin:/usr/sbin])
AC_PATH_PROG([VGCHANGE], [vgchange], [], [$PATH:/sbin:/usr/sbin])
AC_PATH_PROG([VGSCAN], [vgscan], [], [$PATH:/sbin:/usr/sbin])
AC_PATH_PROG([PVS], [pvs], [], [$PATH:/sbin:/usr/sbin])
@@ -1679,6 +1680,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then
if test -z "$PVREMOVE" ; then AC_MSG_ERROR([We need pvremove for LVM storage driver]) ; fi
if test -z "$VGREMOVE" ; then AC_MSG_ERROR([We need vgremove for LVM storage driver]) ; fi
if test -z "$LVREMOVE" ; then AC_MSG_ERROR([We need lvremove for LVM storage driver]) ; fi
+ if test -z "$LVCHANGE" ; then AC_MSG_ERROR([We need lvchange for LVM storage driver]) ; fi
if test -z "$VGCHANGE" ; then AC_MSG_ERROR([We need vgchange for LVM storage driver]) ; fi
if test -z "$VGSCAN" ; then AC_MSG_ERROR([We need vgscan for LVM storage driver]) ; fi
if test -z "$PVS" ; then AC_MSG_ERROR([We need pvs for LVM storage driver]) ; fi
@@ -1691,6 +1693,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then
if test -z "$PVREMOVE" ; then with_storage_lvm=no ; fi
if test -z "$VGREMOVE" ; then with_storage_lvm=no ; fi
if test -z "$LVREMOVE" ; then with_storage_lvm=no ; fi
+ if test -z "$LVCHANGE" ; then with_storage_lvm=no ; fi
if test -z "$VGCHANGE" ; then with_storage_lvm=no ; fi
if test -z "$VGSCAN" ; then with_storage_lvm=no ; fi
if test -z "$PVS" ; then with_storage_lvm=no ; fi
@@ -1708,6 +1711,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then
AC_DEFINE_UNQUOTED([PVREMOVE],["$PVREMOVE"],[Location of pvremove program])
AC_DEFINE_UNQUOTED([VGREMOVE],["$VGREMOVE"],[Location of vgremove program])
AC_DEFINE_UNQUOTED([LVREMOVE],["$LVREMOVE"],[Location of lvremove program])
+ AC_DEFINE_UNQUOTED([LVCHANGE],["$LVCHANGE"],[Location of lvchange program])
AC_DEFINE_UNQUOTED([VGCHANGE],["$VGCHANGE"],[Location of vgchange program])
AC_DEFINE_UNQUOTED([VGSCAN],["$VGSCAN"],[Location of vgscan program])
AC_DEFINE_UNQUOTED([PVS],["$PVS"],[Location of pvs program])
diff --git a/src/storage/storage_backend_logical.c b/src/storage/storage_backend_logical.c
index 4de5442..03d7321 100644
--- a/src/storage/storage_backend_logical.c
+++ b/src/storage/storage_backend_logical.c
@@ -667,14 +667,32 @@ virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
virStorageVolDefPtr vol,
unsigned int flags ATTRIBUTE_UNUSED)
{
- const char *cmdargv[] = {
- LVREMOVE, "-f", vol->target.path, NULL
- };
+ int ret = -1;
+ virCommandPtr lvchange_cmd = NULL;
+ virCommandPtr lvremove_cmd = NULL;
- if (virRun(cmdargv, NULL) < 0)
- return -1;
+ lvchange_cmd = virCommandNewArgList(LVCHANGE,
+ "-an",
+ vol->target.path,
+ NULL);
- return 0;
+ if (virCommandRun(lvchange_cmd, NULL) < 0)
+ goto cleanup;
+
+ lvremove_cmd = virCommandNewArgList(LVREMOVE,
+ "-f",
+ vol->target.path,
+ NULL);
+
+ if (virCommandRun(lvremove_cmd, NULL) < 0)
+ goto cleanup;
+
+ ret = 0;
+
+cleanup:
+ virCommandFree(lvchange_cmd);
+ virCommandFree(lvremove_cmd);
+ return ret;
}
--
1.7.4
13 years, 5 months
[libvirt] [PATCH] test: Remove unused timeval
by Jiri Denemark
---
src/test/test_driver.c | 7 -------
1 files changed, 0 insertions(+), 7 deletions(-)
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index 2da24f1..68ab2fe 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -499,7 +499,6 @@ cleanup:
static int testOpenDefault(virConnectPtr conn) {
int u;
- struct timeval tv;
testConnPtr privconn;
virDomainDefPtr domdef = NULL;
virDomainObjPtr domobj = NULL;
@@ -526,12 +525,6 @@ static int testOpenDefault(virConnectPtr conn) {
testDriverLock(privconn);
conn->privateData = privconn;
- if (gettimeofday(&tv, NULL) < 0) {
- virReportSystemError(errno,
- "%s", _("getting time of day"));
- goto error;
- }
-
if (virDomainObjListInit(&privconn->domains) < 0)
goto error;
--
1.7.5.3
13 years, 5 months
[libvirt] CFS Hardlimits and the libvirt cgroups implementation
by Adam Litke
Hi all. In this post I would like to bring up 3 issues which are
tightly related: 1. unwanted behavior when using CFS hardlimits with
libvirt, 2. scaling cputune.share according to the number of vcpus, 3.
an API proposal for CFS hardlimits support.
=== 1 ===
Mark Peloquin (on cc:) has been looking at implementing CFS hard limit
support on top of the existing libvirt cgroups implementation and he has
run into some unwanted behavior when enabling quotas that seems to be
affected by the cgroup hierarchy being used by libvirt.
Here are Mark's words on the subject (posted by me while Mark joins this
mailing list):
------------------
I've conducted a number of measurements using CFS.
The system config is a 2 socket Nehalem system with 64GB ram. Installed
is RHEL6.1-snap4. The guest VMs being used have RHEL5.5 - 32bit. I've
replaced the kernel with 2.6.39-rc6+ with patches from
Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth. The test config
uses 5 VMs of various vcpu and memory sizes. Being used are 2 VMs with 2
vcpus and 4GB of memory, 1 VM with 4vcpus/8GB, another VM with
8vcpus/16GB and finally a VM with 16vcpus/16GB.
Thus far the tests have been limited to cpu intensive workloads. Each VM
runs a single instance of the workload. The workload is configured to
create one thread for each vcpu in the VM. The workload is then capable
of completely saturating each vcpu in each VM.
CFS was tested using two different topologies.
First, vcpu cgroups were created under each VM created by libvirt. The
vcpu threads from the VM's cgroup/tasks were moved to the tasks list of
each vcpu cgroup, one thread to each vcpu cgroup. This tree structure
permits setting CFS quota and period per vcpu. Default values for
cpu.shares (1024), quota (-1) and period (500000us) were used in each VM
cgroup and inherited by the vcpu cgroup. With these settings the workload
generated system cpu utilization (measured in the host) of >99% guest,
>0.1 idle, 0.14% user and 0.38 system.
Second, using the same topology, the CFS quota in each vcpu's cgroup was
set to 250000us, allowing each vcpu to consume 50% of a cpu. The cpu
workload was run again. This time the total system cpu utilization was
measured at 75% guest, ~24% idle, 0.15% user and 0.40% system.
The topology was changed such that a cgroup for each vcpu was created in
/cgroup/cpu.
The first test used the default/inherited shares and CFS quota and
period. The measured system cpu utilization was >99% guest, ~0.5 idle,
0.13 user and 0.38 system, similar to the default settings using vcpu
cgroups under libvirt.
The next test, like before the topology change, set the vcpu quota
values to 250000us or 50% of a cpu. In this case the measured system cpu
utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38% system.
We can see that moving the vcpu cgroups out from under libvirt/qemu
makes a big difference in idle cpu time.
Does this suggest a possible problem with libvirt?
------------------
Has anyone else seen this type of behavior when using cgroups with CFS
hardlimits? We are working with the kernel community to see if there
might be a bug in cgroups itself.
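To make the quota arithmetic in the measurements above concrete, here is a
minimal sketch in C. The function names are hypothetical and exist only for
illustration; they are not part of libvirt or the kernel. It assumes the
usual CFS bandwidth interpretation, where quota/period is the fraction of
one cpu a cgroup may consume, and it uses the vcpu counts of the five test
VMs (2, 2, 4, 8 and 16). The number of host cpus is not stated in the
report, so the sketch only computes the aggregate demand, not the expected
idle time.

```c
#include <stddef.h>

/* Hypothetical helper: fraction of one host cpu a cgroup may use per
 * period. A negative quota (the default is -1) means "no limit", and an
 * invalid period is rejected; both cases return -1.0. */
double cfs_cpu_fraction(long long quota_us, long long period_us)
{
    if (period_us <= 0 || quota_us < 0)
        return -1.0;
    return (double)quota_us / (double)period_us;
}

/* Hypothetical helper: total cpu demand, in units of "host cpus", when
 * every vcpu is fully busy but capped at the same fraction. */
double aggregate_cpu_demand(const int *vcpus, size_t nvms, double fraction)
{
    int total = 0;
    size_t i;

    for (i = 0; i < nvms; i++)
        total += vcpus[i];
    return total * fraction;
}
```

With a quota of 250000us and a period of 500000us the fraction is 0.5, and
the 32 vcpus of the test configuration then ask for 16 cpus' worth of time
in total. How that compares to the measured 75% and 92% guest utilization
depends on the host cpu count, which would need to be confirmed.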
=== 2 ===
Something else we are seeing is that libvirt's default setting for
cputune.share is 1024 for any domain (regardless of how many vcpus are
configured). This ends up hindering performance of really large VMs
(with lots of vcpus) as compared to smaller ones since all domains are
given equal share. Would folks consider changing the default for
'shares' to be a quantity scaled by the number of vcpus such that bigger
domains get to use proportionally more host cpu resource?
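As a rough sketch of what such scaling could look like, the C fragment
below multiplies the current default of 1024 by the vcpu count and shows
the resulting share of cpu time under full contention. Both function names
are hypothetical, and the linear scaling rule is only one possible reading
of the proposal, not an agreed design.

```c
#include <stddef.h>

#define DEFAULT_CPU_SHARES 1024ULL

/* Hypothetical default: scale shares linearly with the vcpu count.
 * A domain reporting 0 vcpus is treated as having 1. */
unsigned long long scaled_cpu_shares(unsigned int nvcpus)
{
    if (nvcpus == 0)
        nvcpus = 1;
    return DEFAULT_CPU_SHARES * nvcpus;
}

/* Fraction of cpu time a sibling cgroup receives when all siblings are
 * runnable: its shares divided by the sum of all sibling shares. */
double share_fraction(unsigned long long mine,
                      const unsigned long long *all, size_t n)
{
    unsigned long long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += all[i];
    if (sum == 0)
        return 0.0;
    return (double)mine / (double)sum;
}
```

For five domains with 2, 2, 4, 8 and 16 vcpus, equal shares give every
domain one fifth of the contended cpu time, so the 16-vcpu domain gets no
more than a 2-vcpu one. With the scaled values the 16-vcpu domain would
get half (16384 out of 32768) and each 2-vcpu domain one sixteenth.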
=== 3 ===
Besides the above issues, I would like to open a discussion on what the
libvirt API for enabling cpu hardlimits should look like. Here is what
I was thinking:
Two additional scheduler parameters (based on the names given in the
cgroup fs) will be recognized for qemu domains: 'cfs_period' and
'cfs_quota'. These can use the existing
virDomain[Get|Set]SchedulerParameters() API. The Domain XML schema
would be updated to permit the following:
--- snip ---
<cputune>
...
<cfs_period>1000000</cfs_period>
<cfs_quota>500000</cfs_quota>
</cputune>
--- snip ---
To actuate these configuration settings, we simply apply the values to
the appropriate cgroup(s) for the domain. We would prefer that each
vcpu be in its own cgroup to ensure equal and fair scheduling across all
vcpus running on the system. (We will need to resolve the issues
described by Mark in order to figure out where to hang these cgroups).
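Applying the values could be as simple as the following C sketch, which
writes the period and the quota into a per-vcpu cgroup directory. The two
function names are hypothetical and this is not libvirt's cgroup code. The
file names cpu.cfs_period_us and cpu.cfs_quota_us are an assumption based
on the names used by the CFS bandwidth control interface in mainline
kernels; the patch series under test may use different ones.

```c
#include <stdio.h>

/* Hypothetical helper: write one integer to <dir>/<file>.
 * Returns 0 on success, -1 on any failure. */
int cgroup_write_ll(const char *dir, const char *file, long long value)
{
    char path[4096];
    FILE *fp;
    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path did not fit */
    fp = fopen(path, "w");
    if (!fp)
        return -1;
    if (fprintf(fp, "%lld\n", value) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: apply cfs_period and cfs_quota (both in
 * microseconds) to one vcpu cgroup. The period is written first so the
 * quota is interpreted against the new period. A quota of -1 means
 * "no limit". */
int apply_cfs_limits(const char *vcpu_cgroup_dir,
                     long long period_us, long long quota_us)
{
    if (period_us <= 0)
        return -1;
    if (cgroup_write_ll(vcpu_cgroup_dir, "cpu.cfs_period_us", period_us) < 0)
        return -1;
    return cgroup_write_ll(vcpu_cgroup_dir, "cpu.cfs_quota_us", quota_us);
}
```

A caller would invoke apply_cfs_limits() once per vcpu cgroup with the
values from the <cputune> element, for example 1000000 and 500000 for the
XML shown above. Where those per-vcpu directories live in the hierarchy is
exactly the open question from part 1.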
Thanks for sticking with me through this long email. I greatly
appreciate your thoughts and comments on these topics.
--
Adam Litke
IBM Linux Technology Center
13 years, 5 months