June 2011 - Devel - Libvirt List Archives

[libvirt] [PATCH 0/2] Update virsh to use stdin where appropriate

by Michael Williams

Allow virsh to use stdin for XML and other config file input where it would otherwise require a file. This makes it easier to pass configs through pipes, for example:

  virsh dumpxml 6 | ssh remote-host "virsh define"

Michael Williams (2):
  Use '-' to read from stdin
  Try stdin for input when no file is specified

 src/util/util.c |   16 ++++++++-
 tools/virsh.c   |   99 +++++++++++++++++++++++++++++++++++--------------------
 2 files changed, 78 insertions(+), 37 deletions(-)

-- 
1.7.3.4
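The two patches describe one input convention: a path of '-', or no file at all, means "read from stdin". A minimal C sketch of that convention follows. `open_input` and `close_input` are hypothetical helpers written for illustration. They are not the code the series adds to src/util/util.c or tools/virsh.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a NULL path (no file given) or the path "-"
 * selects stdin; anything else is opened as a regular file.
 * Returns NULL if the file cannot be opened. */
static FILE *open_input(const char *path)
{
    if (path == NULL || strcmp(path, "-") == 0)
        return stdin;
    return fopen(path, "r");
}

/* Close the stream only if open_input() opened it, never stdin. */
static void close_input(FILE *fp)
{
    if (fp != NULL && fp != stdin)
        fclose(fp);
}
```

Keeping the "is this stdin?" decision in one place lets every command that takes an XML file accept piped input the same way.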

[libvirt] [PATCH] Fix dlopen dependency

by Matthias Bolte

Since the addition of the lock manager framework in 6a943419c528fdd7, dlopen is always required, but the checks in configure weren't changed to reflect that. This didn't show up directly because the VirtualBox driver, which links in dlopen, covered it up. But disabling the VirtualBox driver makes the build fail due to missing dlopen.

Change the dlopen check in configure to pick up dlopen when available.

Reported by Ruben Kerkhof.
---
 configure.ac    |   46 ++++++++++++++++++++++++++--------------------
 src/Makefile.am |    2 +-
 2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/configure.ac b/configure.ac
index 985b8c2..f816696 100644
--- a/configure.ac
+++ b/configure.ac
@@ -417,6 +417,28 @@ fi


 dnl
+dnl check for libdl
+dnl
+
+dlfcn_found=yes
+dlopen_found=yes
+
+AC_CHECK_HEADER([dlfcn.h],, [dlfcn_found=no])
+AC_SEARCH_LIBS([dlopen], [dl],, [dlopen_found=no])
+
+case $ac_cv_search_dlopen:$host_os in
+  'none required'* | *:mingw* | *:msvc*) DLOPEN_LIBS= ;;
+  no*) AC_MSG_ERROR([Unable to find dlopen()]) ;;
+  *) if test "x$dlfcn_found" != "xyes"; then
+       AC_MSG_ERROR([Unable to find dlfcn.h])
+     fi
+     DLOPEN_LIBS=$ac_cv_search_dlopen ;;
+esac
+
+AC_SUBST([DLOPEN_LIBS])
+
+
+dnl
 dnl check for VirtualBox XPCOMC location
 dnl

@@ -432,14 +454,6 @@ AC_DEFINE_UNQUOTED([VBOX_XPCOMC_DIR], ["$vbox_xpcomc_dir"],
                    [Location of directory containing VirtualBox XPCOMC library])

 if test "x$with_vbox" = "xyes"; then
-    AC_SEARCH_LIBS([dlopen], [dl],,)
-    case $ac_cv_search_dlopen:$host_os in
-      'none required'* | *:mingw* | *:msvc*) DLOPEN_LIBS= ;;
-      no*) AC_MSG_ERROR([Unable to find dlopen()]) ;;
-      *) DLOPEN_LIBS=$ac_cv_search_dlopen ;;
-    esac
-    AC_SUBST([DLOPEN_LIBS])
-
     case "$host" in
       *-*-mingw* | *-*-msvc*) MSCOM_LIBS="-lole32 -loleaut32" ;;
       *) MSCOM_LIBS= ;;
@@ -2138,19 +2152,10 @@ AC_ARG_WITH([driver-modules],

 DRIVER_MODULE_CFLAGS=
 DRIVER_MODULE_LIBS=
-if test "x$with_driver_modules" = "xyes" ; then
-  old_cflags="$CFLAGS"
-  old_libs="$LIBS"
-  fail=0
-  AC_CHECK_HEADER([dlfcn.h],[],[fail=1])
-  AC_SEARCH_LIBS([dlopen], [dl], [], [fail=1])
-  test $fail = 1 &&
-      AC_MSG_ERROR([You must have dlfcn.h / dlopen() support to build driver modules])
-
-  CFLAGS="$old_cflags"
-  LIBS="$old_libs"
-fi
 if test "$with_driver_modules" = "yes"; then
+  if test "$dlfcn_found" != "yes" || test "$dlopen_found" != "yes"; then
+    AC_MSG_ERROR([You must have dlfcn.h / dlopen() support to build driver modules])
+  fi
   DRIVER_MODULE_CFLAGS="-export-dynamic"
   case $ac_cv_search_dlopen in
   no*) DRIVER_MODULE_LIBS= ;;
@@ -2468,6 +2473,7 @@ AC_MSG_NOTICE([])
 AC_MSG_NOTICE([Libraries])
 AC_MSG_NOTICE([])
 AC_MSG_NOTICE([  libxml: $LIBXML_CFLAGS $LIBXML_LIBS])
+AC_MSG_NOTICE([  dlopen: $DLOPEN_LIBS])
 if test "$with_esx" = "yes" ; then
 AC_MSG_NOTICE([  libcurl: $LIBCURL_CFLAGS $LIBCURL_LIBS])
 else
diff --git a/src/Makefile.am b/src/Makefile.am
index 3612a24..4f9bfc9 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -526,7 +526,7 @@ libvirt_driver_la_SOURCES = $(DRIVER_SOURCES)

 libvirt_driver_la_CFLAGS = $(NUMACTL_CFLAGS) $(GNUTLS_CFLAGS) \
 		-I@top_srcdir@/src/conf $(AM_CFLAGS)
-libvirt_driver_la_LIBADD = $(NUMACTL_LIBS) $(GNUTLS_LIBS)
+libvirt_driver_la_LIBADD = $(NUMACTL_LIBS) $(GNUTLS_LIBS) $(DLOPEN_LIBS)

 USED_SYM_FILES = libvirt_private.syms

-- 
1.7.0.4
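The commit message says dlopen is "always required" once the lock manager framework exists. Any unconditionally built code that calls dlopen() has to link against whatever configure places in DLOPEN_LIBS. On glibc before 2.34 that is typically -ldl; later glibc versions provide dlopen in libc itself. The sketch below shows the kind of plugin-loading call involved. `load_plugin_symbol` and its error handling are hypothetical and are not libvirt's lock manager code.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical plugin loader: open a shared object and look up one
 * symbol. On success, returns the symbol and stores the handle in
 * *handle_out (the caller later calls dlclose()); on failure, prints
 * the dlerror() message and returns NULL without touching *handle_out. */
static void *load_plugin_symbol(const char *path, const char *symbol,
                                void **handle_out)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return NULL;
    }

    dlerror();                       /* clear any stale error state */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();     /* non-NULL only if lookup failed */
    if (err != NULL) {
        fprintf(stderr, "cannot find %s in %s: %s\n", symbol, path, err);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```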

[libvirt] [PATCH] qemu: Faster response time to qemu startup errors

by Stefan Berger

The patch below reduces the time libvirt takes to react to errors reported by QEMU at startup. While polling for the monitor's local socket to show up, it checks whether the qemu process is still alive.

This patch also adds special handling of signal 0 to the Win32 part of virKillProcess, so that it can be used to check whether a process exists without terminating it.

Signed-off-by: Stefan Berger <stefanb(a)linux.vnet.ibm.com>

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 26bb814..92c44bf 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -247,7 +247,7 @@ qemuMonitorUnwatch(void *monitor)
 }

 static int
-qemuMonitorOpenUnix(const char *monitor)
+qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
 {
     struct sockaddr_un addr;
     int monfd;
@@ -274,7 +274,8 @@ qemuMonitorOpenUnix(const char *monitor)
         if (ret == 0)
             break;

-        if (errno == ENOENT || errno == ECONNREFUSED) {
+        if ((errno == ENOENT || errno == ECONNREFUSED) &&
+            virKillProcess(cpid, 0) == 0) {
             /* ENOENT : Socket may not have shown up yet
              * ECONNREFUSED : Leftover socket hasn't been removed yet */
             continue;
@@ -691,7 +692,7 @@ qemuMonitorOpen(virDomainObjPtr vm,
     switch (config->type) {
     case VIR_DOMAIN_CHR_TYPE_UNIX:
         mon->hasSendFD = 1;
-        mon->fd = qemuMonitorOpenUnix(config->data.nix.path);
+        mon->fd = qemuMonitorOpenUnix(config->data.nix.path, vm->pid);
         break;

     case VIR_DOMAIN_CHR_TYPE_PTY:
diff --git a/src/util/util.c b/src/util/util.c
index d00f065..df4dfac 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2010,7 +2010,7 @@ int virKillProcess(pid_t pid, int sig)
      * TerminateProcess is more or less equiv to SIG_KILL, in that
      * a process can't trap / block it
      */
-    if (!TerminateProcess(proc, sig)) {
+    if (sig != 0 && !TerminateProcess(proc, sig)) {
         errno = ESRCH;
         return -1;
     }
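Outside libvirt, the same idea can be sketched with plain POSIX calls. The function keeps retrying the connect while the socket is missing (ENOENT) or stale (ECONNREFUSED), but only as long as `kill(pid, 0)` reports that the child still exists. Signal 0 performs the existence and permission check without delivering a signal. That is also why the Win32 virKillProcess must not call TerminateProcess when sig is 0.

`open_unix_with_liveness`, the retry limit and the 100 ms interval are hypothetical. This sketch also creates a fresh socket for each attempt instead of reusing one. Note that an exited but unreaped child is a zombie and still passes the `kill(pid, 0)` check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: connect to a UNIX socket, retrying up to
 * `retries` times while the socket is not there yet, but giving up
 * immediately once the child process no longer exists.
 * Returns a connected fd, or -1. */
static int open_unix_with_liveness(const char *path, pid_t child, int retries)
{
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (int i = 0; i < retries; i++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;                       /* connected */

        int saved = errno;
        close(fd);

        /* ENOENT: socket not created yet; ECONNREFUSED: stale socket.
         * Keep waiting only while the child still exists. */
        if ((saved == ENOENT || saved == ECONNREFUSED) && kill(child, 0) == 0) {
            struct timespec ts = { 0, 100 * 1000 * 1000 };  /* 100 ms */
            nanosleep(&ts, NULL);
            continue;
        }
        return -1;                           /* child gone, or other error */
    }
    return -1;                               /* retries exhausted */
}
```

With the liveness check, a QEMU process that dies on startup is noticed on the next failed connect attempt instead of after the whole retry budget.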

[libvirt] [PATCH] build: update to latest gnulib

by Eric Blake

* .gnulib: Update to latest, for more strerror_r fixes.
---

strerror_r has proven trickier than I first thought. There are a couple of other useful improvements in here, too.

* .gnulib 9d196fa...79d4e75 (70):
  > strerror_r-posix: fix on MacOS
  > gnulib-tool: Better isolation between different gnulib-tool invocations.
  > strerror: simplify replacement
  > strerror_r-posix: Tweaks.
  > perror: document fixed bugs
  > stat-time: get_stat_birthtime failure is better-defined
  > strerror_r-posix: work around cygwin 1.7.9
  > test-perror: relax test to ignore cygwin bug
  > strerror: Move AC_LIBOBJ invocations to module description.
  > perror: Use common idiom.
  > autoupdate
  > tests: fix usage message in 'mktempd_'
  > tests init: new function 'fatal_', for hard errors
  > doc/lgpl-2.1.texi
  > canonicalize-lgpl: use common idiom
  > canonicalize-lgpl: work around AIX realpath bug
  > strerror: work around FreeBSD bug
  > strerror-override: avoid bloating errno module
  > Typo in recent ChangeLog entry.
  > spawn-pipe tests: Rename program.
  > spawn-pipe tests: Like the child program only against libc.
  > careadlinkat: Avoid mismatch between ssize_t and int.
  > gnulib-common.m4: add _GL_ATTRIBUTE_CONST and _GL_ATTRIBUTE_PURE
  > ansi-c++-opt: Interoperability with libtool.
  > acl: Fix test failure on AIX 7.
  > pipe-filter-ii: Fix test failure on AIX and IRIX.
  > localename: Fix link dependencies.
  > error: Avoid gcc warning.
  > unsetenv: Avoid gcc warning.
  > setenv: Avoid gcc warning.
  > sys_select: Ensure memset is declared also on AIX 7.
  > maint.mk: sc_unmarked_diagnostics: don't hard-code "error"
  > getopt: Avoid gcc warning.
  > strerror_r: Fix comments.
  > perror: Fix compilation error.
  > setlocale: Enable replacement on Cygwin 1.5.
  > strerror-override: Don't disable symbol renamings.
  > Copyright: Use LGPL 2.1 instead of LGPL 2.0.
  > doc: Fix a module name.
  > pipe2: Remove dependency on 'nonblocking' module.
  > maint.mk: add three prohibit-header-without-use rules
  > allocator: 'die' routine is now given requested size
  > strerror: drop strerror_r dependency
  > perror: call strerror_r directly
  > strerror_r: fix includes for FreeBSD
  > Fix link errors in tests: openat-die uses gettext-h.
  > build-aux/config.sub
  > Fix link errors in tests: wait-process uses gettext-h.
  > * modules/assert-h (assert.h): Substitute the symbol-prefix more consistently.
  > assert-h: work around 'verify' incompatibility
  > trim: remove three superfluous assignments
  > wctype-h: Avoid namespace pollution on Solaris 2.6.
  > parse-datetime.y: accommodate -Wstrict-overflow
  > trim: avoid a warning from -O2 -Wstrict-overflow
  > gnulib-tool: Fix bug in yesterday's commit.
  > Allow multiple gnulib generated include files to be combined.
  > assert-h: Allow multiple gnulib generated replacements to coexist.
  > argp: Allow coexistence with strerror_r-posix module.
  > Status of work-in-progress around libposix.
  > gnulib-tool: Alternative structure of testdirs, similar to --import.
  > getloadavg: Remove an unreliable safety check.
  > doc: Cleanup yet another file produced by texinfo.tex.
  > Finish the conditional dependencies mechanism.
  > doc: Use a recent texinfo.tex.
  > intprops.h: adjust another comment to match code change * lib/intprops.h (_GL_INT_SIGNED): Now, E may have side effects.
  > intprops.h: adjust comment to match code change
  > gen-uni-tables: Say "gen-uni-tables.c" consistently.
  > mbsrchr: Avoid collision with system function on Interix.
  > getopt: for ambiguous options, enumerate the possibilities.
  > getcwd: work around mingw bug

 .gnulib |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/.gnulib b/.gnulib
index 9d196fa..79d4e75 160000
--- a/.gnulib
+++ b/.gnulib
@@ -1 +1 @@
-Subproject commit 9d196fad055a448c5732a8e950cc044b353d2615
+Subproject commit 79d4e75d8e14dee5d91f58413942fe875857d4f5
-- 
1.7.4.4

[libvirt] [PATCH 00/12] Coverity cleanups, round 2

by Eric Blake

Well, I guess I didn't send these in time for 0.9.2. Again, some are bigger in impact than others.

Eric Blake (12):
  build: detect Coverity 5.3.0
  storage: avoid mishandling backing store > 2GB
  build: silence coverity false positive
  python: avoid unlikely sign extension bug
  debug: avoid null dereference on uuid lookup api
  uuid: annotate non-null requirements
  qemu: reorder checks for safety
  secret: drop dead code
  esx: avoid dead code
  build: silence coverity false positives
  qemu: add missing break statement
  build: break some long lines

 configure.ac               |    4 ++-
 python/libvirt-override.c  |    2 +-
 src/conf/nwfilter_conf.c   |    4 +++
 src/esx/esx_vi.c           |    4 +-
 src/libvirt.c              |   42 ++++++++++++++++++++++++---------------
 src/qemu/qemu_cgroup.c     |    4 +-
 src/qemu/qemu_hotplug.c    |   47 +++++++++++++++++++++++++++++---------------
 src/secret/secret_driver.c |    8 +------
 src/util/storage_file.c    |    3 +-
 src/util/util.c            |    2 +-
 src/util/uuid.h            |    8 ++++--
 tools/virsh.c              |    3 ++
 12 files changed, 81 insertions(+), 50 deletions(-)

-- 
1.7.4.4

[libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd

by Daniel P. Berrange

What follows is a document outlining some thoughts I've been having on extending sVirt to allow confinement of applications which talk to libvirtd on the host. It focuses primarily on SELinux, but also allows for a simple non-SELinux RBAC mechanism.

Securing KVM virtualization hosts with MAC
==========================================

This document looks at the task of securing KVM virtualization hosts using mandatory access control technologies, with a focus on SELinux. At the time of writing there have been two phases of development, and this document makes proposals for a third phase.

Phase 1: circa 2006
-------------------

Goal: Protect the host from a compromised virtual machine.

The first phase of development had the modest goal of protecting the host from attack by a compromised virtual machine. To achieve this, the KVM processes are configured to run under a confined security context ('virt_t' in the SELinux reference policy), which blocks access to any host resources not labelled ('virt_image_t') for use by virtual machines.

The primary limitation of this initial implementation is that while the host is secured, there is no protection between virtual machines. This can be considered a regression in isolation compared to that offered by non-virtualized hosts. The second limitation is that the virtualization admin has to take care to ensure that the host resources intended for use by the virtual machines are correctly labelled. This is a manual setup task unless the images are kept in a preset location (/var/lib/libvirt/images in the SELinux reference policy).

Phase 2: March 2009
-------------------

Goal: Protect virtual machines from each other.

The second phase of development has the goal of providing isolation between virtual machines that is comparable to that achieved between physical machines. This piece of work is commonly referred to as "svirt".
To achieve this, the KVM processes are each configured to run under a dedicated security context, which blocks access to any resources not explicitly assigned to that virtual machine. In the SELinux implementation, the base context "svirt_t" has a unique MCS category ("c240,c955") appended, forming a unique security context "system_u:system_r:svirt_t:s0:c240,c955". For each host resource to be assigned to the virtual machine, the base context "svirt_image_t" is combined with the same MCS category to form a unique resource security context "system_u:object_r:svirt_image_t:s0:c240,c955".

The assignment of virtual machine security contexts and labelling of resources can be done statically by the administrator / management application, or dynamically by the libvirtd daemon. The latter removes much of the burden from the administrator.

The second phase has addressed the major guest security limitation of the first phase, and eased the burden placed on host administrators. Attention can now focus on the security of the host management software stack.

Client applications communicate with the libvirtd daemon using a simple socket-based RPC protocol. Thus operations initiated by client applications which run under one security context are in fact invoked under the libvirtd daemon's security context. Since the libvirtd daemon is a highly privileged, almost unconfined process, this provides a means for applications to elevate their privileges.

A second problem with the current model is seen when looking at guest migration between hosts. During migration, there are two QEMU processes running for the same virtual machine, one on each host. The dynamic assignment of MCS values to form unique security contexts is done on a per-host basis, so there is no guarantee that the VM on host A will be using (or be able to use) the same security context on the target host of the migration.
This is not necessarily a problem if the guest is using block devices, since block device inode labels are only visible to a single host. With a shared filesystem that supports SELinux labelling, like GFS2, however, both QEMU processes must run in the same security context so that both can access the associated files.

Phase 3: June 2011
------------------

Goal: Protect virtual machines from host applications.

The third phase of development has the primary goal of honouring the confinement of client applications talking to libvirtd when performing operations on virtual machines and other managed objects (storage pools, host devices, virtual networks, secrets, etc). Every application connecting to libvirt has an associated security context. Every object managed by libvirtd will have an associated security context. When an operation is invoked via a libvirt API, the client application's security context will be checked against the target object's context before proceeding. Thus applications will not be able to use a libvirtd connection to perform operations that are otherwise blocked.

The secondary goal is to add further flexibility and safety to the way MCS categories are assigned and files are relabelled. Instead of maintaining a local database of assigned labels, there must be some shared storage where label usage can be recorded. At its simplest this can be an NFS share, with one file per MCS category and locking with fcntl(). An alternative would be to acquire leases using a lock manager such as sanlock.

In addition, the guest configuration will be enhanced so that a guest can be assigned a statically chosen security context but still make use of dynamic relabelling of resources. Finally, the existing boolean mode of 'static' vs 'dynamic' label generation will be turned into a tri-state, introducing a 'hybrid' mode where the client supplies a custom base context and the MCS part is still auto-generated.
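Two ideas from the text above can be sketched together: the Phase 2 label format and the Phase 3 "one file per MCS category, locked with fcntl()" scheme. The sketch makes several assumptions. It treats the default c0..c1023 category range as given, and it requires an ordered pair c1 < c2 so that "c240,c955" and "c955,c240" cannot both be reserved. It also assumes a shared directory such as an NFS mount, where fcntl() locking depends on the NFS lock manager being available.

`reserve_mcs`, `format_mcs_label` and the "c<N>,c<M>" file naming are hypothetical. The `base` parameter corresponds to the custom base context of the proposed 'hybrid' mode. POSIX record locks belong to the process, which has two consequences: a second reservation from the same process succeeds, and closing any descriptor for the file drops the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed category range c0..c1023, with an ordered pair c1 < c2. */
static int valid_pair(int c1, int c2)
{
    return c1 >= 0 && c1 < c2 && c2 <= 1023;
}

/* Hypothetical: reserve the pair by locking "<dir>/c<c1>,c<c2>".
 * Returns an fd that must stay open while the reservation is held,
 * or -1 if the pair is invalid or already locked by another process. */
static int reserve_mcs(const char *dir, int c1, int c2)
{
    char path[4096];
    struct flock fl;
    int fd, n;

    if (!valid_pair(c1, c2))
        return -1;
    n = snprintf(path, sizeof(path), "%s/c%d,c%d", dir, c1, c2);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;       /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "to end of file": whole file */

    /* F_SETLK does not block: it fails with EACCES or EAGAIN if
     * another process already holds a conflicting lock. */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Append the MCS part to a base context, e.g.
 * "system_u:system_r:svirt_t" -> "system_u:system_r:svirt_t:s0:c240,c955". */
static int format_mcs_label(char *buf, size_t len, const char *base,
                            int c1, int c2)
{
    int n;

    if (!valid_pair(c1, c2))
        return -1;
    n = snprintf(buf, len, "%s:s0:c%d,c%d", base, c1, c2);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the lock lives on shared storage rather than in a per-host database, a migration target host could in principle see that a category pair is already in use before assigning it.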
Usage scenarios
---------------

To aid in development, a couple of relevant core use cases or usage scenarios have been identified:

1. A virtual machine monitoring application

   For this example, consider the simple monitoring application 'virt-top'. This application displays a list of all virtual machines on the host and their associated resource utilization (CPU, disk, network). It has no need to stop/start/define virtual machines, nor to perform any operation related to host devices, storage, or networking. Traditionally this application is written to use a read-only libvirt connection.

   With enhanced access control from libvirtd, the policy would define a new security context 'virt_top_t' for the 'virt-top' application. This policy would allow 'list', 'read' and 'readstats' on the 'domain' object type.

2. A multi-guest, multi-user MLS enabled host

   For this example, consider a virtualization host with an MLS policy that is running multiple virtual machines for a variety of different users. A user with the security level "restricted" must not be allowed to control virtual machines with a security level of "confidential". Conversely, a user with security level "secret" must not be allowed to create virtual machines with a security level of "unclassified".

   With enhanced access control from libvirtd, getpeercon() would provide the security context of the client application (user). The client context would be used to perform an AVC check whenever any API operation is invoked, ensuring that the client's MLS label is honoured in access control checks. The effect would be that when a 'restricted' user asked for a list of virtual machines, only virtual machines at level 'restricted' or below would be returned. Likewise, when a "secret" user asked to start a guest with a security level of 'unclassified', the operation would be denied.

3. Identity transitions from trusted agents

   For this example, consider a trusted agent such as libvirt-qpid or libvirt-snmp, which translates the libvirt API from its native model into an alternate access model. In such an example, the agent talking to libvirtd will have authenticated itself. The peer identity that libvirtd sees, however, is that of the agent, not the ultimate (end-user) client. In such a case it will be desirable to allow a trusted agent to transition to a different identity when performing operations.

   An end user running under the context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023" may talk to the libvirt-qpid agent, which runs under the context "system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". libvirt-qpid connects to libvirtd, which sees 'virt_qpid_t' as the client type. The policy is written to allow transitions from the 'virt_qpid_t' type to the 'virt_top_t' type, so when the virt-top client connects to libvirt-qpid, the agent changes its identity to 'virt_top_t'. From that point onwards, all AVC checks honour the privileges of the ultimate end-user application rather than those of the libvirt-qpid intermediary. The same mechanism also ensures that the client application's MLS level is transferred via the libvirt-qpid agent to libvirtd.

Anticipated Development tasks
-----------------------------

 1. Extend the domain XML to add a third attribute, relabel="yes|no", to the <seclabel> element, to control whether libvirtd will automatically label resources assigned to a guest. If the existing 'mode' attribute is "dynamic", relabelling will default to enabled, while if it is 'static', relabelling will default to disabled. Also change 'mode' to allow a new 'hybrid' value.

 2. Determine how to maintain/identify security labels for other managed objects, including virStoragePoolPtr, virStorageVolPtr, virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr, and host level APIs without any explicit managed object.

 3. Extend the XML for non-domain objects to implant security labels as identified in step 2.

 4. Create an internal virIdentity struct to store the identity of the client. This will include at least the x509 distinguished name, the SASL username, the SELinux context (getpeercon()) and the UNIX username/group (SCM_CREDENTIALS).

 5. Create a new public API to allow a client application to supply a new identity, passing a new x509 distinguished name, SASL username, SELinux context and UNIX username/group.

 6. Extend the libvirtd daemon so that the current identity is stored in a thread-local variable whenever a public API operation is invoked.

 7. Extend the QEMU driver so that a suitable identity is set when performing autonomous background operations, such as domain auto-start and core dump, in a non-API thread.

 8. Create a set of internal access control helper APIs in $libvirt/src/accesscontrol/. There will be one API for each managed object, taking an object pointer and an operation identifier (from an enum).

 9. Create a simple implementation of the access control APIs which defines roles for groups of user identities and grants privileges to each role based on the operation names. This allows for simple testing of the internal infrastructure, and provides an RBAC mechanism for users who lack SELinux in their OS.

10. Implant access control checks into the main codepaths of every driver method implementation in the QEMU driver.

11. Change the SELinux reference policy to define the new security types and access vectors for the libvirt objects and associated API calls.

12. Create an SELinux implementation of the access control APIs which invokes avc_has_perm() using the client's SELinux context. This is intended to be the primary RBAC mechanism for Fedora/RHEL virtualization hosts.

13. Write policy to confine targeted applications like virt-top and virt-mem.

14. Extend libvirt-snmp, libvirt-cim and libvirt-qpid to pass the client identity through to libvirtd.

Technical Notes / Issues
------------------------

1. Adding new SELinux security classes / access vectors

   The SELinux security classes are defined in /usr/include/selinux/flask.h and the access vectors in /usr/include/selinux/av_permissions.h. Both of these files are automatically generated by a script in the SELinux reference policy code, '$serefpolicy/policy/flask/flask.py'. The master data files, 'access_vectors' and 'security_classes', are in the same directory. Once generated, the headers need to be manually copied into the libselinux package sources.

   APIs are added to libvirt very frequently. What is the process for applying access control to them if the SELinux policy does not yet have a suitable access vector / security class defined? Do we need a generic 'admin' access vector we can use as a catch-all until more specific vectors can be defined for the new APIs? It is desirable to avoid having to upgrade libvirt in lock-step with the SELinux policy for every addition to the libvirt public API.

2. Security contexts for libvirt managed objects

   virDomainPtr: Already embedded in the XML, unless using dynamic labelling, in which case the context is assigned at startup.

   virNetworkPtr: No existing security context, nor any object on disk that could be used. Follow the example of domains and embed <seclabel> in the XML. Assign a unique MCS category per network and ensure that daemons launched per network (dnsmasq, radvd) inherit the MCS category.

   virSecretPtr: No existing security context. Secrets may be associated with disk paths for VMs. We could copy the security context of the guests and apply it to the secret, or have a dedicated type svirt_secret_t and just copy the MCS category. This is hard to make work for guests with dynamic MCS assignment.

   virStoragePoolPtr: No existing security context. Some pool types have objects existing on the host filesystem, e.g. SCSI HBAs have a directory in sysfs, filesystem pools have a directory somewhere, and LVM has a directory for the volume group in /dev. Other pool types have no object on disk anywhere convenient, e.g. Sheepdog.
   Other pool types only have an object on disk when the pool is active (e.g. iSCSI, NFS), so there is nothing to use for API checks when the pool is inactive. We will likely have to ignore whatever associated resource is on disk and just store a security context in the XML config, as with virDomainPtr/virNetworkPtr.

   virStorageVolPtr: Currently reports the SELinux security label associated with the file on disk. Not all pool types necessarily have volumes with a corresponding file on disk (e.g. Sheepdog).

   virNodeDevicePtr: No existing security context. Most data comes from the udev or HAL databases, though ultimately much of it is available in sysfs. Files in sysfs are used when detaching PCI devices from host drivers, and when creating/deleting NPIV adapters. Could we therefore use sysfs file labels for AVC checks?

   virConnectPtr: All host level APIs for which there is no other object aside from the nebulous concept of the 'host'. These APIs are all read-only, e.g. query host capabilities, query free memory, CPU stats, etc. What if we gain APIs that make write calls?

   virInterfacePtr: No existing security context. netcf is currently used to get data from /etc/sysconfig/network-scripts/ifcfg-XXX files, but we can't assume those file names since they are Fedora/RHEL specific. libvirt might not even use netcf if it talks directly to NetworkManager. Does netcf need to expose a security label based on the ifcfg-XXX file?

3. Security labelling config modes

   When creating a guest, the following XML snippets can be used.

   a. Default type, dynamic MCS, automatic relabelling

        <seclabel type='selinux' mode='dynamic' relabel='yes'/>

   b. Custom type, dynamic MCS, automatic relabelling

        <seclabel type='selinux' mode='hybrid' relabel='yes'>
          <label>system_u:system_r:mysvirt_t</label>
          <imagelabel>system_u:object_r:mysvirt_image_t</imagelabel>
        </seclabel>

   c. Default type, dynamic MCS, no relabelling

        <seclabel type='selinux' mode='dynamic' relabel='no'/>

      Does this mode make any sense, since the admin doesn't know the MCS category upfront?
      Possibly only useful if the guest only has read-only disks.

   d. Custom type, dynamic MCS, no relabelling

        <seclabel type='selinux' mode='hybrid' relabel='no'>
          <label>system_u:system_r:mysvirt_t</label>
        </seclabel>

      Same question about whether this makes sense.

   e. Custom type, static MCS, automatic relabelling

        <seclabel type='selinux' mode='static' relabel='yes'>
          <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
          <imagelabel>system_u:object_r:mysvirt_image_t:s0:c123,c456</imagelabel>
        </seclabel>

   f. Custom type, static MCS, no relabelling

        <seclabel type='selinux' mode='static' relabel='no'>
          <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
        </seclabel>

4. Time at which to apply checks / source context

   It would be desirable to restrict the ability to use automatic file relabelling within the policy. If a client application defines a guest with the 'relabel=yes' attribute set, at what time should this usage be validated?

   Validate at the time the guest is defined? This ensures the app defining the guest is suitably privileged, but the file labels might be changed by the time the guest starts.

   Validate at the time the guest is started? This minimises the window between the access check being performed and libvirtd actually performing the relabel operation. The app starting the guest might be different from the one defining the guest, though.

   Check at both define and start time?

   What source security context should we use when performing autostart of virtual machines? Normally when starting a VM, the check would be performed using the context of the client invoking the start API, but there is no such client when autostart occurs. Should we instead perform a 'start' operation check whenever the 'autostart' flag is turned on by a client? Or check the autostart operation against some generic source context?
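Development tasks 6 and 9 above describe a thread-local "current identity" and a simple role-based implementation of the access control APIs for hosts without SELinux. They can be sketched roughly as follows. Everything in the sketch is hypothetical: the operation enum, the `identity` struct (reduced to a UNIX group rather than the full virIdentity of task 4), the group names 'virtmon' and 'virtadmin', and all function names. None of it is the proposed src/accesscontrol/ API. The 'virtmon' role mirrors usage scenario 1 by granting only list/read/readstats on domains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation identifiers (task 8 proposes an enum). */
typedef enum {
    OP_DOMAIN_LIST      = 1 << 0,
    OP_DOMAIN_READ      = 1 << 1,
    OP_DOMAIN_READSTATS = 1 << 2,
    OP_DOMAIN_START     = 1 << 3,
    OP_DOMAIN_DEFINE    = 1 << 4
} access_op;

/* Hypothetical, much reduced stand-in for the virIdentity of task 4. */
typedef struct {
    const char *unix_group;
} identity;

/* A role grants a bitmask of operations to members of one UNIX group. */
typedef struct {
    const char *group;
    unsigned int allowed;
} role;

static const role roles[] = {
    /* monitoring tools such as virt-top (usage scenario 1) */
    { "virtmon", OP_DOMAIN_LIST | OP_DOMAIN_READ | OP_DOMAIN_READSTATS },
    /* full domain administrators */
    { "virtadmin", OP_DOMAIN_LIST | OP_DOMAIN_READ | OP_DOMAIN_READSTATS |
                   OP_DOMAIN_START | OP_DOMAIN_DEFINE },
};

/* Default deny: allow only if the client's group maps to a role
 * that grants the requested operation. */
static bool access_check(const identity *id, access_op op)
{
    if (id == NULL || id->unix_group == NULL)
        return false;
    for (size_t i = 0; i < sizeof(roles) / sizeof(roles[0]); i++) {
        if (strcmp(roles[i].group, id->unix_group) == 0)
            return (roles[i].allowed & (unsigned int)op) != 0;
    }
    return false;
}

/* Task 6: identity of the client whose API call this thread serves. */
static _Thread_local const identity *current_identity;

static void set_current_identity(const identity *id)
{
    current_identity = id;
}

static bool access_check_current(access_op op)
{
    return access_check(current_identity, op);
}
```

Unknown groups and a missing identity are denied by default, which seems the safer choice for an access-control layer. A background operation such as autostart (task 7) would need some identity to be set explicitly before its checks could pass.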
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

[libvirt] [PATCH v2] qemu: Parse current balloon value returned by query_balloon

by Osier Yang

Qemu once supported following memory stats which will returned by "query_balloon": stat_put(dict, "actual", actual); stat_put(dict, "mem_swapped_in", dev->stats[VIRTIO_BALLOON_S_SWAP_IN]); stat_put(dict, "mem_swapped_out", dev->stats[VIRTIO_BALLOON_S_SWAP_OUT]); stat_put(dict, "major_page_faults", dev->stats[VIRTIO_BALLOON_S_MAJFLT]); stat_put(dict, "minor_page_faults", dev->stats[VIRTIO_BALLOON_S_MINFLT]); stat_put(dict, "free_mem", dev->stats[VIRTIO_BALLOON_S_MEMFREE]); stat_put(dict, "total_mem", dev->stats[VIRTIO_BALLOON_S_MEMTOT]); But it later disabled all the stats except "actual" by commit 07b0403dfc2b2ac179ae5b48105096cc2d03375a. libvirt doesn't parse "actual", so user will always see a empty result with "virsh dommemstat $domain". Even qemu haven't disabled the stats, we should support parsing "actual". --- include/libvirt/libvirt.h.in | 4 +++- src/libvirt.c | 2 ++ src/qemu/qemu_monitor_json.c | 12 ++++++++++++ src/qemu/qemu_monitor_text.c | 4 +++- tools/virsh.c | 2 ++ 5 files changed, 22 insertions(+), 2 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index df213f1..0930622 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -467,11 +467,13 @@ typedef enum { */ VIR_DOMAIN_MEMORY_STAT_AVAILABLE = 5, + /* Current balloon value (in KB). */ + VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON = 6, /* * The number of statistics supported by this version of the interface. * To add new statistics, add them to the enum and increase this value. */ - VIR_DOMAIN_MEMORY_STAT_NR = 6, + VIR_DOMAIN_MEMORY_STAT_NR = 7, } virDomainMemoryStatTags; typedef struct _virDomainMemoryStat virDomainMemoryStatStruct; diff --git a/src/libvirt.c b/src/libvirt.c index 18c4e08..08a7d4c 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5737,6 +5737,8 @@ error: * The amount of memory which is not being used for any purpose (in kb). * VIR_DOMAIN_MEMORY_STAT_AVAILABLE: * The total amount of memory available to the domain's OS (in kb). 
+ * VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON: + * Current balloon value (in kb). * * Returns: The number of stats provided or -1 in case of failure. */ diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 75adf66..2680b3c 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1119,6 +1119,18 @@ int qemuMonitorJSONGetMemoryStats(qemuMonitorPtr mon, goto cleanup; } + if (virJSONValueObjectHasKey(data, "actual") && (got < nr_stats)) { + if (virJSONValueObjectGetNumberUlong(data, "actual", &mem) < 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("info balloon reply was missing balloon actual")); + ret = -1; + goto cleanup; + } + stats[got].tag = VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON; + stats[got].val = (mem/1024); + got++; + } + if (virJSONValueObjectHasKey(data, "mem_swapped_in") && (got < nr_stats)) { if (virJSONValueObjectGetNumberUlong(data, "mem_swapped_in", &mem) < 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 3b42e7a..d432027 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -549,7 +549,9 @@ static int qemuMonitorParseExtraBalloonInfo(char *text, parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_UNUSED, ",free_mem=", &stats[nr_stats_found]) || parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_AVAILABLE, - ",total_mem=", &stats[nr_stats_found])) + ",total_mem=", &stats[nr_stats_found]) || + parseMemoryStat(&p, VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON, + ",actual=", &stats[nr_stats_found])) nr_stats_found++; /* Skip to the next label. 
When *p is ',' the last match attempt diff --git a/tools/virsh.c b/tools/virsh.c index d98be1c..17f6a22 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1147,6 +1147,8 @@ cmdDomMemStat(vshControl *ctl, const vshCmd *cmd) vshPrint (ctl, "unused %llu\n", stats[i].val); if (stats[i].tag == VIR_DOMAIN_MEMORY_STAT_AVAILABLE) vshPrint (ctl, "available %llu\n", stats[i].val); + if (stats[i].tag == VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON) + vshPrint (ctl, "actual %llu\n", stats[i].val); } virDomainFree(dom); -- 1.7.4
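A minimal sketch of what the patch adds on the JSON and virsh paths, written as standalone C rather than against the real headers. The JSON hunk appends the balloon value only while `got < nr_stats` and divides the reported number by 1024, which implies QEMU reports "actual" in bytes; the virsh hunk prints it under the label "actual". The enum values mirror the ones in the libvirt.h.in hunk; `struct memstat`, `append_actual_balloon()`, `memstat_label()` and `format_memstat()` are hypothetical stand-ins, not libvirt API.

```c
#include <stdio.h>

/* Local stand-ins for the tag values shown in the libvirt.h.in hunk
 * (hypothetical copies, not the real <libvirt/libvirt.h>). */
enum {
    MEMSTAT_UNUSED = 4,
    MEMSTAT_AVAILABLE = 5,
    MEMSTAT_ACTUAL_BALLOON = 6,
    MEMSTAT_NR = 7
};

struct memstat {
    int tag;
    unsigned long long val;
};

/* Hypothetical helper mirroring the JSON monitor hunk: append the balloon
 * size, given in bytes, as KiB, but only while the caller's array has room.
 * Returns the updated count. */
static unsigned int
append_actual_balloon(struct memstat *stats, unsigned int got,
                      unsigned int nr_stats, unsigned long long actual_bytes)
{
    if (got < nr_stats) {
        stats[got].tag = MEMSTAT_ACTUAL_BALLOON;
        stats[got].val = actual_bytes / 1024;
        got++;
    }
    return got;
}

/* Hypothetical helper mirroring the virsh hunk: the label printed for a
 * tag, or NULL for tags this sketch does not cover. */
static const char *
memstat_label(int tag)
{
    switch (tag) {
    case MEMSTAT_UNUSED:         return "unused";
    case MEMSTAT_AVAILABLE:      return "available";
    case MEMSTAT_ACTUAL_BALLOON: return "actual";
    default:                     return NULL;
    }
}

/* Format one stat the way virsh prints it ("label value").
 * Returns the snprintf result, or -1 for an unknown tag. */
static int
format_memstat(char *buf, size_t len, const struct memstat *s)
{
    const char *label = memstat_label(s->tag);

    if (!label)
        return -1;
    return snprintf(buf, len, "%s %llu", label, s->val);
}
```

For a 1 GiB balloon reported as 1073741824 bytes, `append_actual_balloon()` stores 1048576, and `format_memstat()` produces "actual 1048576", the same line shape virsh prints.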

[libvirt] [PATCH] storage: Deactive lv before remove it

by Osier Yang

This addresses BZ# https://bugzilla.redhat.com/show_bug.cgi?id=702260. Even with this patch the user might still see an error like "Unable to deactivate logical volume", but it can fix the problem when the lv is referred to by other existing LVs, allowing the user to remove the lv successfully without seeing an error like "Can't remove open logical volume". For the error "Unable to deactivate logical volume", libvirt can't do more; it's a problem in lvm, see BZ#: https://bugzilla.redhat.com/show_bug.cgi?id=570359 and the patch applied to upstream lvm to fix it: https://www.redhat.com/archives/lvm-devel/2011-May/msg00025.html --- configure.ac | 4 ++++ src/storage/storage_backend_logical.c | 30 ++++++++++++++++++++++++------ 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/configure.ac b/configure.ac index 7982e21..5c2eeb8 100644 --- a/configure.ac +++ b/configure.ac @@ -1666,6 +1666,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then AC_PATH_PROG([PVREMOVE], [pvremove], [], [$PATH:/sbin:/usr/sbin]) AC_PATH_PROG([VGREMOVE], [vgremove], [], [$PATH:/sbin:/usr/sbin]) AC_PATH_PROG([LVREMOVE], [lvremove], [], [$PATH:/sbin:/usr/sbin]) + AC_PATH_PROG([LVCHANGE], [lvchange], [], [$PATH:/sbin:/usr/sbin]) AC_PATH_PROG([VGCHANGE], [vgchange], [], [$PATH:/sbin:/usr/sbin]) AC_PATH_PROG([VGSCAN], [vgscan], [], [$PATH:/sbin:/usr/sbin]) AC_PATH_PROG([PVS], [pvs], [], [$PATH:/sbin:/usr/sbin]) @@ -1679,6 +1680,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then if test -z "$PVREMOVE" ; then AC_MSG_ERROR([We need pvremove for LVM storage driver]) ; fi if test -z "$VGREMOVE" ; then AC_MSG_ERROR([We need vgremove for LVM storage driver]) ; fi if test -z "$LVREMOVE" ; then AC_MSG_ERROR([We need lvremove for LVM storage driver]) ; fi + if test -z "$LVCHANGE" ; then AC_MSG_ERROR([We need lvchange for LVM storage driver]) ; fi if test -z "$VGCHANGE" ; then AC_MSG_ERROR([We need vgchange for LVM storage driver]) ; fi 
if test -z "$VGSCAN" ; then AC_MSG_ERROR([We need vgscan for LVM storage driver]) ; fi if test -z "$PVS" ; then AC_MSG_ERROR([We need pvs for LVM storage driver]) ; fi @@ -1691,6 +1693,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then if test -z "$PVREMOVE" ; then with_storage_lvm=no ; fi if test -z "$VGREMOVE" ; then with_storage_lvm=no ; fi if test -z "$LVREMOVE" ; then with_storage_lvm=no ; fi + if test -z "$LVCHANGE" ; then with_storage_lvm=no ; fi if test -z "$VGCHANGE" ; then with_storage_lvm=no ; fi if test -z "$VGSCAN" ; then with_storage_lvm=no ; fi if test -z "$PVS" ; then with_storage_lvm=no ; fi @@ -1708,6 +1711,7 @@ if test "$with_storage_lvm" = "yes" || test "$with_storage_lvm" = "check"; then AC_DEFINE_UNQUOTED([PVREMOVE],["$PVREMOVE"],[Location of pvremove program]) AC_DEFINE_UNQUOTED([VGREMOVE],["$VGREMOVE"],[Location of vgremove program]) AC_DEFINE_UNQUOTED([LVREMOVE],["$LVREMOVE"],[Location of lvremove program]) + AC_DEFINE_UNQUOTED([LVCHANGE],["$LVCHANGE"],[Location of lvchange program]) AC_DEFINE_UNQUOTED([VGCHANGE],["$VGCHANGE"],[Location of vgchange program]) AC_DEFINE_UNQUOTED([VGSCAN],["$VGSCAN"],[Location of vgscan program]) AC_DEFINE_UNQUOTED([PVS],["$PVS"],[Location of pvs program]) diff --git a/src/storage/storage_backend_logical.c b/src/storage/storage_backend_logical.c index 4de5442..03d7321 100644 --- a/src/storage/storage_backend_logical.c +++ b/src/storage/storage_backend_logical.c @@ -667,14 +667,32 @@ virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED, virStorageVolDefPtr vol, unsigned int flags ATTRIBUTE_UNUSED) { - const char *cmdargv[] = { - LVREMOVE, "-f", vol->target.path, NULL - }; + int ret = -1; + virCommandPtr lvchange_cmd = NULL; + virCommandPtr lvremove_cmd = NULL; - if (virRun(cmdargv, NULL) < 0) - return -1; + lvchange_cmd = virCommandNewArgList(LVCHANGE, + "-an", + vol->target.path, + NULL); - return 0; + if (virCommandRun(lvchange_cmd, NULL) < 0) + goto cleanup; 
+ + lvremove_cmd = virCommandNewArgList(LVREMOVE, + "-an", + vol->target.path, + NULL); + + if (virCommandRun(lvremove_cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(lvchange_cmd); + virCommandFree(lvremove_cmd); + return ret; } -- 1.7.4
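The ordering the new virStorageBackendLogicalDeleteVol() follows (deactivate with lvchange, then remove with lvremove, stopping at the first failure) can be sketched in isolation. The runner callback stands in for virCommandRun(); `run_fn`, `delete_logical_volume()`, `struct call_log` and `record_run()` are hypothetical names used only for illustration. The sketch passes `-an` to lvchange as the patch does, and `-f` to lvremove as the removed `cmdargv` array did.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical runner type standing in for virCommandRun(): it receives a
 * NULL-terminated argv and returns 0 on success, -1 on failure. */
typedef int (*run_fn)(const char *const argv[], void *opaque);

/* Deactivate the LV, then remove it; stop at the first failure.
 * "lvchange"/"lvremove" are placeholders for the configure-detected
 * LVCHANGE and LVREMOVE paths. */
static int
delete_logical_volume(const char *path, run_fn run, void *opaque)
{
    const char *lvchange_argv[] = { "lvchange", "-an", path, NULL };
    const char *lvremove_argv[] = { "lvremove", "-f", path, NULL };

    /* If the LV cannot be deactivated, do not attempt the removal. */
    if (run(lvchange_argv, opaque) < 0)
        return -1;

    if (run(lvremove_argv, opaque) < 0)
        return -1;

    return 0;
}

/* Fake runner for trying the sequencing without LVM: it records each
 * program name and fails when that name matches fail_program (if set). */
struct call_log {
    int ncalls;
    const char *fail_program;
    const char *programs[4];
};

static int
record_run(const char *const argv[], void *opaque)
{
    struct call_log *calls = opaque;

    if (calls->ncalls < 4)
        calls->programs[calls->ncalls] = argv[0];
    calls->ncalls++;

    if (calls->fail_program && strcmp(argv[0], calls->fail_program) == 0)
        return -1;
    return 0;
}
```

Passing a runner that fails on "lvchange" shows that lvremove is never attempted for a volume that could not be deactivated; a real runner would execute the configure-detected programs instead of recording them.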

[libvirt] [PATCH] test: Remove unused timeval

by Jiri Denemark

--- src/test/test_driver.c | 7 ------- 1 files changed, 0 insertions(+), 7 deletions(-) diff --git a/src/test/test_driver.c b/src/test/test_driver.c index 2da24f1..68ab2fe 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -499,7 +499,6 @@ cleanup: static int testOpenDefault(virConnectPtr conn) { int u; - struct timeval tv; testConnPtr privconn; virDomainDefPtr domdef = NULL; virDomainObjPtr domobj = NULL; @@ -526,12 +525,6 @@ static int testOpenDefault(virConnectPtr conn) { testDriverLock(privconn); conn->privateData = privconn; - if (gettimeofday(&tv, NULL) < 0) { - virReportSystemError(errno, - "%s", _("getting time of day")); - goto error; - } - if (virDomainObjListInit(&privconn->domains) < 0) goto error; -- 1.7.5.3

[libvirt] CFS Hardlimits and the libvirt cgroups implementation

by Adam Litke

Hi all. In this post I would like to bring up 3 closely related issues:

1. unwanted behavior when using CFS hardlimits with libvirt,
2. scaling cputune.shares according to the number of vcpus,
3. an API proposal for CFS hardlimits support.

=== 1 ===

Mark Peloquin (on cc:) has been looking at implementing CFS hard limit support on top of the existing libvirt cgroups implementation, and he has run into some unwanted behavior when enabling quotas that seems to be affected by the cgroup hierarchy being used by libvirt. Here are Mark's words on the subject (posted by me while Mark joins this mailing list):

------------------

I've conducted a number of measurements using CFS.

The system config is a 2 socket Nehalem system with 64GB ram. Installed is RHEL6.1-snap4. The guest VMs being used have RHEL5.5 - 32bit. I've replaced the kernel with 2.6.39-rc6+ with patches from Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth. The test config uses 5 VMs of various vcpu and memory sizes: 2 VMs with 2 vcpus and 4GB of memory, 1 VM with 4vcpus/8GB, another VM with 8vcpus/16GB and finally a VM with 16vcpus/16GB.

Thus far the tests have been limited to cpu intensive workloads. Each VM runs a single instance of the workload. The workload is configured to create one thread for each vcpu in the VM, so it is capable of completely saturating each vcpu in each VM.

CFS was tested using two different topologies.

First, vcpu cgroups were created under each VM created by libvirt. The vcpu threads from the VM's cgroup/tasks were moved to the tasks list of each vcpu cgroup, one thread to each vcpu cgroup. This tree structure permits setting CFS quota and period per vcpu. Default values for cpu.shares (1024), quota (-1) and period (500000us) were used in each VM cgroup and inherited by the vcpu cgroup. With these settings the workload generated system cpu utilization (measured in the host) of >99% guest, >0.1% idle, 0.14% user and 0.38% system.

Second, using the same topology, the CFS quota in each vcpu's cgroup was set to 250000us, allowing each vcpu to consume 50% of a cpu. The cpu workload was run again. This time the total system cpu utilization was measured at 75% guest, ~24% idle, 0.15% user and 0.40% system.

The topology was then changed such that a cgroup for each vcpu was created directly in /cgroup/cpu.

The first test used the default/inherited shares and CFS quota and period. The measured system cpu utilization was >99% guest, ~0.5% idle, 0.13% user and 0.38% system, similar to the default settings using vcpu cgroups under libvirt.

The next test, as before the topology change, set the vcpu quota values to 250000us, or 50% of a cpu. In this case the measured system cpu utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38% system.

We can see that moving the vcpu cgroups out from under libvirt/qemu makes a big difference in idle cpu time.

Does this suggest a possible problem with libvirt?

------------------

Has anyone else seen this type of behavior when using cgroups with CFS hardlimits? We are working with the kernel community to see if there might be a bug in cgroups itself.

=== 2 ===

Something else we are seeing is that libvirt's default setting for cputune.shares is 1024 for any domain, regardless of how many vcpus are configured. This ends up hindering the performance of really large VMs (with lots of vcpus) compared to smaller ones, since all domains are given an equal share. Would folks consider changing the default for 'shares' to a quantity scaled by the number of vcpus, so that bigger domains get to use proportionally more host cpu resource?

=== 3 ===

Besides the above issues, I would like to open a discussion on what the libvirt API for enabling cpu hardlimits should look like. Here is what I was thinking:

Two additional scheduler parameters (based on the names given in the cgroup fs) will be recognized for qemu domains: 'cfs_period' and 'cfs_quota'. These can use the existing virDomain[Get|Set]SchedulerParameters() API. The Domain XML schema would be updated to permit the following:

--- snip ---
<cputune>
  ...
  <cfs_period>1000000</cfs_period>
  <cfs_quota>500000</cfs_quota>
</cputune>
--- snip ---

To actuate these configuration settings, we simply apply the values to the appropriate cgroup(s) for the domain. We would prefer that each vcpu be in its own cgroup to ensure equal and fair scheduling across all vcpus running on the system. (We will need to resolve the issues described by Mark in order to figure out where to hang these cgroups.)

Thanks for sticking with me through this long email. I greatly appreciate your thoughts and comments on these topics.

-- 
Adam Litke
IBM Linux Technology Center
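The arithmetic behind the numbers above can be written out as a short sketch. A cgroup may run for quota/period of one CPU per period: 250000us out of 500000us is the 50% in Mark's tests, and the proposed 500000/1000000 XML values give the same fraction. A domain whose n vcpus each get that quota can use at most n times it. The suggestion in issue 2 is read here as 1024 shares per vcpu; that scaling rule and the function names are assumptions for illustration, not an agreed libvirt default. A negative quota is treated as "no limit", following the -1 default mentioned above.

```c
/* Share of one host CPU a cgroup may use per period, or -1.0 when
 * quota_us is negative (-1 means no limit). period_us must be positive. */
static double
cfs_cpu_fraction(long long quota_us, long long period_us)
{
    if (quota_us < 0)
        return -1.0;
    return (double)quota_us / (double)period_us;
}

/* Upper bound, in host CPUs, for a domain whose vcpus each sit in their
 * own cgroup with the same quota and period; -1.0 when unlimited. */
static double
domain_cpu_cap(unsigned int nvcpus, long long quota_us, long long period_us)
{
    double per_vcpu = cfs_cpu_fraction(quota_us, period_us);

    if (per_vcpu < 0)
        return -1.0;
    return nvcpus * per_vcpu;
}

/* Hypothetical vcpu-scaled default for cpu.shares: 1024 per vcpu instead
 * of 1024 per domain. A domain reporting 0 vcpus is treated as 1. */
static unsigned long long
scaled_default_shares(unsigned int nvcpus)
{
    const unsigned long long per_vcpu_shares = 1024;

    return per_vcpu_shares * (nvcpus ? nvcpus : 1);
}
```

Under these assumptions, the 16-vcpu guest in Mark's setup would get 16384 shares against 2048 for a 2-vcpu guest, and with a 250000us quota in a 500000us period it could use at most 8 host CPUs.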

Devel June 2011