[libvirt] [PATCH] Fix memory leak while scanning snapshots
by Philipp Hahn
If a snapshot with the same name already exists, virDomainSnapshotAssignDef()
just returns NULL, in which case the snapshot definition is leaked.
Currently this leak is not a big problem, since qemuDomainSnapshotLoad()
is only called once during initial startup of libvirtd.
Signed-off-by: Philipp Hahn <hahn(a)univention.de>
---
src/qemu/qemu_driver.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ce19be7..b815046 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -293,6 +293,7 @@ static void qemuDomainSnapshotLoad(void *payload,
int ret;
char *fullpath;
virDomainSnapshotDefPtr def = NULL;
+ virDomainSnapshotObjPtr snap = NULL;
char ebuf[1024];
virDomainObjLock(vm);
@@ -344,7 +345,10 @@ static void qemuDomainSnapshotLoad(void *payload,
continue;
}
- virDomainSnapshotAssignDef(&vm->snapshots, def);
+ snap = virDomainSnapshotAssignDef(&vm->snapshots, def);
+ if (snap == NULL) {
+ virDomainSnapshotDefFree(def);
+ }
VIR_FREE(fullpath);
VIR_FREE(xmlStr);
--
1.7.1
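The ownership rule the fix enforces is: virDomainSnapshotAssignDef() returns NULL on a duplicate name, and the caller then still owns (and must free) the definition. The same contract can be sketched in Python, with a toy registry standing in for vm->snapshots (names and behavior are illustrative only, not libvirt API):

```python
class SnapshotRegistry:
    """Toy analogue of vm->snapshots: assign_def() refuses duplicate names."""

    def __init__(self):
        self._snapshots = {}

    def assign_def(self, definition):
        # Like virDomainSnapshotAssignDef(): return None if the name is
        # already taken, leaving ownership of 'definition' with the caller.
        name = definition["name"]
        if name in self._snapshots:
            return None
        self._snapshots[name] = definition
        return definition


registry = SnapshotRegistry()
first = registry.assign_def({"name": "pre-upgrade"})
second = registry.assign_def({"name": "pre-upgrade"})  # duplicate

# In the C code, 'second is None' is exactly the case in which the caller
# must call virDomainSnapshotDefFree(def) to avoid the leak.
print(first is not None, second is None)  # True True
```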
13 years, 4 months
[libvirt] [RFC v4] Export KVM Host Power Management capabilities
by Srivatsa S. Bhat
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from the pm-utils package) is run to discover
whether Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them is supported, a new tag "<power_management>" is
introduced in the XML under the <host> tag.
E.g., when the host supports both S3 and S4, the XML looks like this:
<capabilities>
<host>
<uuid>dc699581-48a2-11cb-b8a8-9a0265a79bbe</uuid>
<cpu>
<arch>i686</arch>
<model>coreduo</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='2' threads='1'/>
<feature name='xtpr'/>
<feature name='tm2'/>
<feature name='est'/>
<feature name='vmx'/>
<feature name='pbe'/>
<feature name='tm'/>
<feature name='ht'/>
<feature name='ss'/>
<feature name='acpi'/>
<feature name='ds'/>
</cpu>
<power_management> <<<=== New host power management features
<S3/>
<S4/>
</power_management>
<migration_features>
<live/>
<uri_transports>
<uri_transport>tcp</uri_transport>
</uri_transports>
</migration_features>
</host>
.
.
.
If the query for power management features succeeds but the host does not
support any such feature, the XML will contain an empty <power_management/>
tag. If the PM query itself fails, the XML will not contain any
"power_management" tag at all.
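A management application consuming this XML therefore has to distinguish three cases: a tag with children, an empty tag, and no tag. A minimal sketch with Python's xml.etree (element names taken from the patch; the parsing code itself is illustrative):

```python
import xml.etree.ElementTree as ET

def host_pm_features(capabilities_xml):
    """Return None if there is no <power_management> tag (the PM query
    failed), [] if the tag is present but empty (no features supported),
    otherwise the list of supported feature names, e.g. ['S3', 'S4']."""
    root = ET.fromstring(capabilities_xml)
    pm = root.find("./host/power_management")
    if pm is None:          # note: 'is None', an empty element is falsy
        return None
    return [child.tag for child in pm]

print(host_pm_features(
    "<capabilities><host><power_management><S3/><S4/>"
    "</power_management></host></capabilities>"))  # ['S3', 'S4']
```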
Open issues:
-----------
1. Design new APIs in libvirt to exploit power management features
such as S3/S4. This was discussed in [3] and [4].
Please let me know your comments and feedback.
Changelog:
---------
v1: The idea of exporting host power management capabilities through
libvirt was discussed in [1].
v2: A working implementation was presented for review in [2].
v3: Omissions and improvements pointed out in v2 were taken care of in [5].
References:
----------
[1] Exporting KVM host power saving capabilities through libvirt
http://thread.gmane.org/gmane.comp.emulators.libvirt/40886
[2] http://www.redhat.com/archives/libvir-list/2011-August/msg00238.html
[3] http://www.redhat.com/archives/libvir-list/2011-August/msg00248.html
[4] http://www.redhat.com/archives/libvir-list/2011-August/msg00302.html
[5] http://www.redhat.com/archives/libvir-list/2011-August/msg00282.html
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat(a)linux.vnet.ibm.com>
---
docs/formatcaps.html.in | 19 +++++++---
docs/schemas/capability.rng | 18 +++++++++
include/libvirt/virterror.h | 1 +
libvirt.spec.in | 2 +
src/conf/capabilities.c | 27 +++++++++++++-
src/conf/capabilities.h | 4 ++
src/libvirt_private.syms | 1 +
src/qemu/qemu_capabilities.c | 18 +++++++++
src/util/util.c | 82 ++++++++++++++++++++++++++++++++++++++++++
src/util/util.h | 14 +++++++
src/util/virterror.c | 3 ++
11 files changed, 183 insertions(+), 6 deletions(-)
diff --git a/docs/formatcaps.html.in b/docs/formatcaps.html.in
index a4297ce..ce6f9a6 100644
--- a/docs/formatcaps.html.in
+++ b/docs/formatcaps.html.in
@@ -28,6 +28,10 @@ BIOS you will see</p>
<feature name='xtpr'/>
...
</cpu>
+ <power_management>
+ <S3/>
+ <S4/>
+ </power_management>
</host></span>
<!-- xen-3.0-x86_64 -->
@@ -61,11 +65,16 @@ BIOS you will see</p>
...
</capabilities></pre>
<p>The first block (in red) indicates the host hardware capabilities, currently
-it is limited to the CPU properties but other information may be available,
-it shows the CPU architecture, topology, model name, and additional features
-which are not included in the model but the CPU provides them. Features of the
-chip are shown within the feature block (the block is similar to what you will
-find in a Xen fully virtualized domain description).</p>
+it is limited to the CPU properties and the power management features of
+the host platform, but other information may be available. It shows the CPU
+architecture, topology, model name, and additional features which are not
+included in the model but are provided by the CPU. Features of the chip are
+shown within the feature block (the block is similar to what you will find
+in a Xen fully virtualized domain description). In addition, the power
+management features supported by the host, such as Suspend-to-RAM (S3) and
+Suspend-to-Disk (S4), are shown. If the query for power management features
+succeeds but the host supports none of them, an empty <power_management/>
+tag is shown; if the query itself fails, no power_management tag appears at all.
<p>The second block (in blue) indicates the paravirtualization support of the
Xen support, you will see the os_type of xen to indicate a paravirtual
kernel, then architecture information and potential features.</p>
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng
index 99b4a9a..8238a37 100644
--- a/docs/schemas/capability.rng
+++ b/docs/schemas/capability.rng
@@ -35,6 +35,9 @@
</optional>
</element>
<optional>
+ <ref name='power_management'/>
+ </optional>
+ <optional>
<ref name='migration'/>
</optional>
<optional>
@@ -105,6 +108,21 @@
</zeroOrMore>
</define>
+ <define name='power_management'>
+ <element name='power_management'>
+ <optional>
+ <element name='S3'>
+ <empty/>
+ </element>
+ </optional>
+ <optional>
+ <element name='S4'>
+ <empty/>
+ </element>
+ </optional>
+ </element>
+ </define>
+
<define name='migration'>
<element name='migration_features'>
<optional>
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
index 9cac437..a831c73 100644
--- a/include/libvirt/virterror.h
+++ b/include/libvirt/virterror.h
@@ -82,6 +82,7 @@ typedef enum {
VIR_FROM_EVENT = 40, /* Error from event loop impl */
VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */
VIR_FROM_LOCKING = 42, /* Error from lock manager */
+ VIR_FROM_CAPABILITIES = 43, /* Error from capabilities */
} virErrorDomain;
diff --git a/libvirt.spec.in b/libvirt.spec.in
index e2b7f65..3193de3 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -482,6 +482,8 @@ Requires: nc
Requires: gettext
# Needed by virt-pki-validate script.
Requires: gnutls-utils
+# Needed for probing the power management features of the host.
+Requires: pm-utils
%if %{with_sasl}
Requires: cyrus-sasl
# Not technically required, but makes 'out-of-box' config
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 2f243ae..e8ab599 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -29,6 +29,13 @@
#include "util.h"
#include "uuid.h"
#include "cpu_conf.h"
+#include "virterror_internal.h"
+
+
+#define VIR_FROM_THIS VIR_FROM_CAPABILITIES
+
+VIR_ENUM_IMPL(virHostPMCapability, VIR_HOST_PM_LAST,
+ "S3", "S4")
/**
* virCapabilitiesNew:
@@ -201,7 +208,6 @@ virCapabilitiesAddHostFeature(virCapsPtr caps,
return 0;
}
-
/**
* virCapabilitiesAddHostMigrateTransport:
* @caps: capabilities to extend
@@ -686,6 +692,25 @@ virCapabilitiesFormatXML(virCapsPtr caps)
virBufferAddLit(&xml, " </cpu>\n");
+ if(caps->host.powerMgmt_valid) {
+ /* The PM query was successful. */
+ if(caps->host.powerMgmt) {
+ /* The host supports some PM features. */
+ unsigned int pm = caps->host.powerMgmt;
+ virBufferAddLit(&xml, " <power_management>\n");
+ while(pm) {
+ int bit = ffs(pm) - 1;
+ virBufferAsprintf(&xml, " <%s/>\n",
+ virHostPMCapabilityTypeToString(bit));
+ pm &= ~(1U << bit);
+ }
+ virBufferAddLit(&xml, " </power_management>\n");
+ } else {
+ /* The host does not support any PM feature. */
+ virBufferAddLit(&xml, " <power_management/>\n");
+ }
+ }
+
if (caps->host.offlineMigrate) {
virBufferAddLit(&xml, " <migration_features>\n");
if (caps->host.liveMigrate)
diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index e2fa1d6..c51f220 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -105,6 +105,10 @@ struct _virCapsHost {
size_t nfeatures;
size_t nfeatures_max;
char **features;
+ bool powerMgmt_valid;
+ unsigned int powerMgmt; /* Bitmask of the PM capabilities.
+ * See enum virHostPMCapability.
+ */
int offlineMigrate;
int liveMigrate;
size_t nmigrateTrans;
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 830222b..40fc4d0 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1058,6 +1058,7 @@ virFormatMacAddr;
virGenerateMacAddr;
virGetGroupID;
virGetHostname;
+virGetPMCapabilities;
virGetUserDirectory;
virGetUserID;
virGetUserName;
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 3f36212..581b80f 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -794,6 +794,8 @@ virCapsPtr qemuCapsInit(virCapsPtr old_caps)
struct utsname utsname;
virCapsPtr caps;
int i;
+ int status = -1;
+ unsigned int pmbitmask = 0;
char *xenner = NULL;
/* Really, this never fails - look at the man-page. */
@@ -824,6 +826,22 @@ virCapsPtr qemuCapsInit(virCapsPtr old_caps)
old_caps->host.cpu = NULL;
}
+ /* Add the power management features of the host */
+
+ status = virGetPMCapabilities(&pmbitmask);
+ if(status < 0) {
+ caps->host.powerMgmt_valid = false;
+ VIR_WARN("Failed to get host power management capabilities");
+ } else {
+ /* The PM query succeeded. */
+ caps->host.powerMgmt_valid = true;
+
+ /* The power management features supported by the host are
+ * represented as a bitmask by 'pmbitmask'.
+ */
+ caps->host.powerMgmt = pmbitmask;
+ }
+
virCapabilitiesAddHostMigrateTransport(caps,
"tcp");
diff --git a/src/util/util.c b/src/util/util.c
index 03a9e1a..b1a6434 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2641,3 +2641,85 @@ or other application using the libvirt API.\n\
return 0;
}
+
+/**
+ * Get the Power Management Capabilities of the host system.
+ * The script 'pm-is-supported' (from the pm-utils package) is run
+ * to find out all the power management features supported by the host,
+ * such as Suspend-to-RAM (S3) and Suspend-to-Disk (S4).
+ *
+ * @bitmask: Pointer to the bitmask that must be set appropriately to
+ * indicate all the supported host power management features.
+ * This will be set to zero if the host does not support any
+ * power management feature.
+ *
+ * Return values:
+ * 0 if the query was successful.
+ * -1 on error like 'pm-is-supported' is not found.
+ */
+int
+virGetPMCapabilities(unsigned int * bitmask)
+{
+
+ char *path = NULL;
+ int status = -1;
+ int ret = -1;
+ virCommandPtr cmd;
+
+ *bitmask = 0;
+ if((path = virFindFileInPath("pm-is-supported")) == NULL) {
+ virUtilError(VIR_ERR_INTERNAL_ERROR,
+ "%s", _("Failed to get the path of pm-is-supported"));
+ return -1;
+ }
+
+ /* Check support for Suspend-to-RAM (S3) */
+ cmd = virCommandNew(path);
+ virCommandAddArg(cmd, "--suspend");
+ if(virCommandRun(cmd, &status) < 0) {
+ virUtilError(VIR_ERR_INTERNAL_ERROR,
+ "%s", _("Failed to run command"
+ "'pm-is-supported --suspend'"));
+ virCommandFree(cmd);
+ ret = -1;
+ goto cleanup;
+ } else {
+ ret = 0;
+
+ /* Check return code of command == 0 for success
+ * (i.e., the PM capability is supported)
+ */
+ if(status == 0)
+ *bitmask |= 1U << VIR_HOST_PM_S3;
+
+ virCommandFree(cmd);
+ }
+
+ /* Check support for Suspend-to-Disk (S4) */
+ cmd = virCommandNew(path);
+ virCommandAddArg(cmd, "--hibernate");
+ if(virCommandRun(cmd, &status) < 0) {
+ virUtilError(VIR_ERR_INTERNAL_ERROR,
+ "%s", _("Failed to run command"
+ "'pm-is-supported --hibernate'"));
+
+ virCommandFree(cmd);
+ ret = -1;
+ goto cleanup;
+ } else {
+ ret = 0;
+
+ /* Check return code of command == 0 for success
+ * (i.e., the PM capability is supported)
+ */
+ if(status == 0)
+ *bitmask |= 1U << VIR_HOST_PM_S4;
+
+ virCommandFree(cmd);
+ }
+
+cleanup:
+ VIR_FREE(path);
+ return ret;
+}
+
diff --git a/src/util/util.h b/src/util/util.h
index af8b15d..24a87ff 100644
--- a/src/util/util.h
+++ b/src/util/util.h
@@ -272,4 +272,18 @@ bool virIsDevMapperDevice(const char *devname) ATTRIBUTE_NONNULL(1);
int virEmitXMLWarning(int fd,
const char *name,
const char *cmd) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
+
+/* Power Management Capabilities of the host system */
+
+enum virHostPMCapability {
+ VIR_HOST_PM_S3, /* Suspend-to-RAM */
+ VIR_HOST_PM_S4, /* Suspend-to-Disk */
+
+ VIR_HOST_PM_LAST
+};
+
+VIR_ENUM_DECL(virHostPMCapability)
+
+int virGetPMCapabilities(unsigned int *);
+
#endif /* __VIR_UTIL_H__ */
diff --git a/src/util/virterror.c b/src/util/virterror.c
index 9a27feb..e07de61 100644
--- a/src/util/virterror.c
+++ b/src/util/virterror.c
@@ -172,6 +172,9 @@ static const char *virErrorDomainName(virErrorDomain domain) {
case VIR_FROM_LOCKING:
dom = "Locking ";
break;
+ case VIR_FROM_CAPABILITIES:
+ dom = "Capabilities ";
+ break;
}
return(dom);
}
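The formatting loop added to virCapabilitiesFormatXML() above peels set bits off the mask with ffs(). The same walk can be sketched in Python (bit positions 0 and 1 standing in for VIR_HOST_PM_S3/S4; this is an illustration, not libvirt code):

```python
PM_NAMES = {0: "S3", 1: "S4"}  # mirrors VIR_ENUM_IMPL(virHostPMCapability, ...)

def pm_tags(mask):
    """Yield one tag per set bit, lowest bit first, like the C loop."""
    tags = []
    while mask:
        bit = (mask & -mask).bit_length() - 1   # equivalent of ffs(mask) - 1
        tags.append("<%s/>" % PM_NAMES[bit])
        mask &= ~(1 << bit)                     # clear the bit just emitted
    return tags

print(pm_tags(0b11))  # ['<S3/>', '<S4/>']
```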
[libvirt] Allow to migrate to same host?
by Osier Yang
The request is from a PhD student in France who is developing a distributed
system based on libvirt; before deploying the code to real instances, he
wants to test it on his own box first.
Libvirt checks whether it is migrating to the same host by comparing
hostnames. I'm wondering if we can add a flag to switch this check on/off
(probably a new property in qemu.conf?), so that migration to the same host
can be tested for development purposes.
Thoughts?
Regards
Osier
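For context, the check under discussion boils down to comparing the source and destination hostnames. A rough Python sketch of such a guard, with a hypothetical override flag (nothing here is actual libvirt code or an existing qemu.conf setting):

```python
def migration_allowed(src_hostname, dst_hostname, allow_same_host=False):
    """Refuse same-host migration unless a (hypothetical) override is set."""
    if src_hostname == dst_hostname and not allow_same_host:
        return False
    return True

print(migration_allowed("node1", "node1"))        # False
print(migration_allowed("node1", "node1", True))  # True
print(migration_allowed("node1", "node2"))        # True
```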
[libvirt] [test-API][PATCH 1/2] Declare hypervisor connection variable as a global
by Guannan Ren
This solves the problem where a failure of open() leaves the conn variable
null; when close() is called later, it reports that the ConnectAPI object
has no 'conn' attribute.
Also remove a duplicated close() function.
---
lib/connectAPI.py | 19 ++++++-------------
1 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/lib/connectAPI.py b/lib/connectAPI.py
index 5d5b94f..702a088 100644
--- a/lib/connectAPI.py
+++ b/lib/connectAPI.py
@@ -40,11 +40,11 @@ import exception
class ConnectAPI(object):
def __init__(self):
- pass
+ self.conn = None
def open(self, uri):
try:
- self.conn = libvirt.open(uri)
+ conn = libvirt.open(uri)
return self.conn
except libvirt.libvirtError, e:
message = e.get_error_message()
@@ -53,7 +53,7 @@ class ConnectAPI(object):
def open_read_only(self, uri):
try:
- self.conn = libvirt.openReadOnly(uri)
+ conn = libvirt.openReadOnly(uri)
return self.conn
except libvirt.libvirtError, e:
message = e.get_error_message()
@@ -62,21 +62,13 @@ class ConnectAPI(object):
def openAuth(self, uri, auth, flags = 0):
try:
- self.conn = libvirt.openAuth(uri, auth, flags)
+ conn = libvirt.openAuth(uri, auth, flags)
return self.conn
except libvirt.libvirtError, e:
message = e.get_error_message()
code = e.get_error_code()
raise exception.LibvirtAPI(message, code)
- def close(self):
- try:
- self.conn.close()
- except libvirt.libvirtError, e:
- message = e.get_error_message()
- code = e.get_error_code()
- raise exception.LibvirtAPI(message, code)
-
def get_caps(self):
try:
caps = self.conn.getCapabilities()
@@ -398,7 +390,8 @@ class ConnectAPI(object):
def close(self):
try:
- return self.conn.close()
+ if self.conn:
+ return self.conn.close()
except libvirt.libvirtError, e:
message = e.get_error_message()
code = e.get_error_code()
--
1.7.1
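The pattern the patch converges on, initializing the connection attribute to None in __init__ and guarding close(), looks like this in isolation (a distilled sketch, not the test-API module itself):

```python
class Connection:
    """Wrapper whose close() is safe even if open() never succeeded."""

    def __init__(self):
        self.conn = None  # always present, so close() cannot hit AttributeError

    def open(self, handle):
        self.conn = handle
        return self.conn

    def close(self):
        if self.conn:          # no-op when open() failed or was never called
            handle, self.conn = self.conn, None
            return handle.close()


conn = Connection()
conn.close()  # safe: nothing was opened, nothing to close
```

Resetting self.conn inside close() additionally makes a double close() harmless, which the original class did not guarantee.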
[libvirt] [test-API][PATCH] Add new testcase for libvirtd connection with tcp socket and with SASL authentication
by Guannan Ren
---
repos/remoteAccess/tcp_setup.py | 242 +++++++++++++++++++++++++++++++++++++++
1 files changed, 242 insertions(+), 0 deletions(-)
create mode 100644 repos/remoteAccess/tcp_setup.py
diff --git a/repos/remoteAccess/tcp_setup.py b/repos/remoteAccess/tcp_setup.py
new file mode 100644
index 0000000..8f88810
--- /dev/null
+++ b/repos/remoteAccess/tcp_setup.py
@@ -0,0 +1,242 @@
+#!/usr/bin/env python
+""" Configure and test libvirt tcp connection
+ remoteAccess:tcp_setup
+ target_machine
+ xx.xx.xx.xx
+ username
+ root
+ password
+ xxxxxx
+ listen_tcp
+ enable|disable
+ auth_tcp
+ none|sasl
+"""
+
+__author__ = 'Guannan Ren: gren(a)redhat.com'
+__date__ = 'Sun Aug 7, 2011'
+__version__ = '0.1.0'
+__credits__ = 'Copyright (C) 2011 Red Hat, Inc.'
+__all__ = ['tcp_setup', 'tcp_libvirtd_set', 'hypervisor_connecting_test']
+
+import os
+import re
+import sys
+
+def append_path(path):
+ """Append root path of package"""
+ if path in sys.path:
+ pass
+ else:
+ sys.path.append(path)
+
+pwd = os.getcwd()
+result = re.search('(.*)libvirt-test-API', pwd)
+append_path(result.group(0))
+
+from lib import connectAPI
+from utils.Python import utils
+from exception import LibvirtAPI
+
+SASLPASSWD2 = "/usr/sbin/saslpasswd2"
+LIBVIRTD_CONF = "/etc/libvirt/libvirtd.conf"
+SYSCONFIG_LIBVIRTD = "/etc/sysconfig/libvirtd"
+
+def check_params(params):
+ """check the arguments required for this testcase"""
+ logger = params['logger']
+ keys = ['target_machine', 'username', 'password', 'listen_tcp', 'auth_tcp']
+ for key in keys:
+ if key not in params:
+ logger.error("Argument %s is required" % key)
+ return 1
+ return 0
+
+def sasl_user_add(target_machine, username, password, util, logger):
+ """ execute saslpasswd2 to add sasl user """
+ logger.info("add sasl user on server side")
+ saslpasswd2_add = "echo %s | %s -a libvirt %s" % (password, SASLPASSWD2, username)
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, saslpasswd2_add)
+ if ret:
+ logger.error("failed to add sasl user")
+ return 1
+
+ return 0
+
+def tcp_libvirtd_set(target_machine, username, password,
+ listen_tcp, auth_tcp, util, logger):
+ """ configure libvirtd.conf on libvirt server """
+ logger.info("setting libvirtd.conf on libvirt server")
+ # open libvirtd --listen option
+ listen_open_cmd = "echo 'LIBVIRTD_ARGS=\"--listen\"' >> %s" % SYSCONFIG_LIBVIRTD
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, listen_open_cmd)
+ if ret:
+ logger.error("failed to enable --listen in %s" % SYSCONFIG_LIBVIRTD)
+ return 1
+
+ # set listen_tls
+ logger.info("set listen_tls to 0 in %s" % LIBVIRTD_CONF)
+ listen_tls_disable = "echo \"listen_tls = 0\" >> %s" % LIBVIRTD_CONF
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, listen_tls_disable)
+ if ret:
+ logger.error("failed to set listen_tls to 0 in %s" % LIBVIRTD_CONF)
+ return 1
+
+ # set listen_tcp
+ if listen_tcp == 'enable':
+ logger.info("enable listen_tcp = 1 in %s" % LIBVIRTD_CONF)
+ listen_tcp_set = "echo 'listen_tcp = 1' >> %s" % LIBVIRTD_CONF
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, listen_tcp_set)
+ if ret:
+ logger.error("failed to set listen_tcp in %s" % LIBVIRTD_CONF)
+ return 1
+
+ # set auth_tcp
+ logger.info("set auth_tcp to \"%s\" in %s" % (auth_tcp, LIBVIRTD_CONF))
+ auth_tcp_set = "echo 'auth_tcp = \"%s\"' >> %s" % (auth_tcp, LIBVIRTD_CONF)
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, auth_tcp_set)
+ if ret:
+ logger.error("failed to set auth_tcp in %s" % LIBVIRTD_CONF)
+ return 1
+
+ # restart remote libvirtd service
+ libvirtd_restart_cmd = "service libvirtd restart"
+ logger.info("libvirtd restart")
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, libvirtd_restart_cmd)
+ if ret:
+ logger.error("failed to restart libvirtd service")
+ return 1
+
+ logger.info("finished libvirtd configuration")
+ return 0
+
+def request_credentials(credentials, user_data):
+ for credential in credentials:
+ if credential[0] == connectAPI.VIR_CRED_AUTHNAME:
+ credential[4] = user_data[0]
+
+ if len(credential[4]) == 0:
+ credential[4] = credential[3]
+ elif credential[0] == connectAPI.VIR_CRED_PASSPHRASE:
+ credential[4] = user_data[1]
+ else:
+ return -1
+
+ return 0
+
+def hypervisor_connecting_test(uri, auth_tcp, username,
+ password, logger, expected_result):
+ """ connect remote server """
+ ret = 1
+ try:
+ conn = connectAPI.ConnectAPI()
+ if auth_tcp == 'none':
+ virconn = conn.open(uri)
+ elif auth_tcp == 'sasl':
+ user_data = [username, password]
+ auth = [[connectAPI.VIR_CRED_AUTHNAME, connectAPI.VIR_CRED_PASSPHRASE], request_credentials, user_data]
+ virconn = conn.openAuth(uri, auth, 0)
+
+ ret = 0
+ conn.close()
+ except LibvirtAPI, e:
+ logger.error("API error message: %s, error code is %s" % \
+ (e.response()['message'], e.response()['code']))
+
+ ret = 1
+ conn.close()
+
+ if ret == 0 and expected_result == 'success':
+ logger.info("tcp connection succeeded")
+ return 0
+ elif ret == 1 and expected_result == 'fail':
+ logger.info("tcp connection failed, but that is expected")
+ return 0
+ elif ret == 0 and expected_result == 'fail':
+ logger.error("tcp connection succeeded, but failure was expected")
+ return 1
+ elif ret == 1 and expected_result == 'success':
+ logger.error("tcp connection failed")
+ return 1
+
+ return 0
+
+def tcp_setup(params):
+ """ configure libvirt and connect to it through TCP socket"""
+ logger = params['logger']
+ params_check_result = check_params(params)
+ if params_check_result:
+ return 1
+
+ target_machine = params['target_machine']
+ username = params['username']
+ password = params['password']
+ listen_tcp = params['listen_tcp']
+ auth_tcp = params['auth_tcp']
+
+ uri = "qemu+tcp://%s/system" % target_machine
+
+ util = utils.Utils()
+
+ logger.info("the hostname of server is %s" % target_machine)
+ logger.info("the value of listen_tcp is %s" % listen_tcp)
+ logger.info("the value of auth_tcp is %s" % auth_tcp)
+
+ if not util.do_ping(target_machine, 0):
+ logger.error("failed to ping host %s" % target_machine)
+ return 1
+
+ if auth_tcp == 'sasl':
+ if sasl_user_add(target_machine, username, password, util, logger):
+ return 1
+
+ if tcp_libvirtd_set(target_machine, username, password,
+ listen_tcp, auth_tcp, util, logger):
+ return 1
+
+ if listen_tcp == 'disable':
+ if hypervisor_connecting_test(uri, auth_tcp, username,
+ password, logger, 'fail'):
+ return 1
+ elif listen_tcp == 'enable':
+ if hypervisor_connecting_test(uri, auth_tcp, username,
+ password, logger, 'success'):
+ return 1
+
+ return 0
+
+def tcp_setup_clean(params):
+ """cleanup testing environment"""
+
+ logger = params['logger']
+ target_machine = params['target_machine']
+ username = params['username']
+ password = params['password']
+ listen_tcp = params['listen_tcp']
+ auth_tcp = params['auth_tcp']
+
+ util = utils.Utils()
+
+ if auth_tcp == 'sasl':
+ saslpasswd2_delete = "%s -a libvirt -d %s" % (SASLPASSWD2, username)
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, saslpasswd2_delete)
+ if ret:
+ logger.error("failed to delete sasl user")
+ libvirtd_conf_restore = "sed -i -n '/^[ #]/p' %s" % LIBVIRTD_CONF
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, libvirtd_conf_restore)
+ if ret:
+ logger.error("failed to restore %s" % LIBVIRTD_CONF)
+
+ sysconfig_libvirtd_restore = "sed -i -n '/^[ #]/p' %s" % SYSCONFIG_LIBVIRTD
+ ret = util.remote_exec_pexpect(target_machine, username,
+ password, sysconfig_libvirtd_restore)
+ if ret:
+ logger.error("failed to restore %s" % SYSCONFIG_LIBVIRTD)
--
1.7.1
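The pass/fail bookkeeping at the end of hypervisor_connecting_test() is just a comparison of the actual outcome with the expected one, and can be factored into a small pure function (an illustrative refactoring, not part of the patch):

```python
def evaluate(ret, expected_result):
    """Return 0 when the outcome matches expectations, 1 otherwise.

    ret is 0 for a successful connection and 1 for a failed one, matching
    the convention used throughout the test-API scripts; expected_result
    is the string 'success' or 'fail'.
    """
    succeeded = (ret == 0)
    expected_success = (expected_result == 'success')
    return 0 if succeeded == expected_success else 1

print(evaluate(0, 'success'), evaluate(1, 'fail'),
      evaluate(0, 'fail'), evaluate(1, 'success'))  # 0 0 1 1
```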
[libvirt] [test-API][PATCH] Add testcases for testing permission control and sasl authentication of unix socket
by Guannan Ren
add new testcases repos/remoteAccess/unix_perm_sasl.py
---
repos/remoteAccess/unix_perm_sasl.py | 234 ++++++++++++++++++++++++++++++++++
1 files changed, 234 insertions(+), 0 deletions(-)
create mode 100644 repos/remoteAccess/unix_perm_sasl.py
diff --git a/repos/remoteAccess/unix_perm_sasl.py b/repos/remoteAccess/unix_perm_sasl.py
new file mode 100644
index 0000000..9bb2600
--- /dev/null
+++ b/repos/remoteAccess/unix_perm_sasl.py
@@ -0,0 +1,234 @@
+#!/usr/bin/env python
+""" testing for permission and authentication of unix domain socket
+ remoteAccess:unix_perm_sasl
+ auth_unix_ro
+ none|sasl
+ auth_unix_rw
+ none|sasl
+ unix_sock_group(optional)
+ libvirt
+"""
+
+__author__ = 'Guannan Ren: gren(a)redhat.com'
+__date__ = 'Fri Aug 5, 2011'
+__version__ = '0.1.0'
+__credits__ = 'Copyright (C) 2011 Red Hat, Inc.'
+__all__ = ['unix_perm_sasl', 'group_sasl_set',
+ 'libvirt_configure', 'hypervisor_connecting_test']
+
+import os
+import re
+import sys
+import commands
+
+from pwd import getpwnam
+
+def append_path(path):
+ """Append root path of package"""
+ if path in sys.path:
+ pass
+ else:
+ sys.path.append(path)
+
+pwd = os.getcwd()
+result = re.search('(.*)libvirt-test-API', pwd)
+append_path(result.group(0))
+
+from lib import connectAPI
+from exception import LibvirtAPI
+
+TESTING_USER = 'testapi'
+LIBVIRTD_CONF = "/etc/libvirt/libvirtd.conf"
+SASLPASSWD2 = "/usr/sbin/saslpasswd2"
+
+def check_params(params):
+ """check the arguments required for the testcase"""
+ logger = params['logger']
+ keys = ['auth_unix_ro', 'auth_unix_rw']
+ for key in keys:
+ if key not in params:
+ logger.error("Argument %s is required" % key)
+ return 1
+ return 0
+
+def get_output(command, flag, logger):
+ """execute shell command
+ """
+ status, ret = commands.getstatusoutput(command)
+ if not flag and status:
+ logger.error("executing "+ "\"" + command + "\"" + " failed")
+ logger.error(ret)
+ return status, ret
+
+def libvirt_configure(unix_sock_group, auth_unix_ro, auth_unix_rw, logger):
+ """configure libvirt.conf """
+ logger.info("configuring libvirt.conf")
+
+ # uncomment unix_sock_group
+ unix_group_add = "echo 'unix_sock_group = \"%s\"' >> %s" % \
+ (unix_sock_group, LIBVIRTD_CONF)
+ status, output = get_output(unix_group_add, 0, logger)
+ if status:
+ logger.error("setting unix_sock_group to %s failed" % unix_sock_group)
+ return 1
+
+ auth_unix_ro_add = "echo 'auth_unix_ro = \"%s\"' >> %s" % \
+ (auth_unix_ro, LIBVIRTD_CONF)
+ status, output = get_output(auth_unix_ro_add, 0, logger)
+ if status:
+ logger.error("setting auth_unix_ro to %s failed" % auth_unix_ro)
+ return 1
+
+ auth_unix_rw_add = "echo 'auth_unix_rw = \"%s\"' >> %s" % \
+ (auth_unix_rw, LIBVIRTD_CONF)
+ status, output = get_output(auth_unix_rw_add, 0, logger)
+ if status:
+ logger.error("setting auth_unix_rw to %s failed" % auth_unix_rw)
+ return 1
+
+ return 0
+
+def group_sasl_set(unix_sock_group, auth_unix_ro, auth_unix_rw, logger):
+ """add libvirt group and set sasl authentication if needed"""
+ logger.info("add unix socket group and sasl user if needed")
+
+ # add unix socket group
+ libvirt_group_add = "groupadd %s" % unix_sock_group
+ status, output = get_output(libvirt_group_add, 0, logger)
+ if status:
+ logger.error("failed to add %s group" % unix_sock_group)
+ return 1
+
+ # add "testapi" as the testing user
+ libvirt_user_add = "useradd -g %s %s" % (unix_sock_group, TESTING_USER)
+ status, output = get_output(libvirt_user_add, 0, logger)
+ if status:
+ logger.error("failed to add user %s to group %s" % \
+ (TESTING_USER, unix_sock_group))
+ return 1
+
+ # add sasl user
+ if auth_unix_ro == 'sasl' or auth_unix_rw == 'sasl':
+ saslpasswd2_add = "echo %s | %s -a libvirt %s" % \
+ (TESTING_USER, SASLPASSWD2, TESTING_USER)
+ status, output = get_output(saslpasswd2_add, 0, logger)
+ if status:
+ logger.error("failed to set sasl user %s" % TESTING_USER)
+ return 1
+
+ return 0
+
+def request_credentials(credentials, user_data):
+ for credential in credentials:
+ if credential[0] == connectAPI.VIR_CRED_AUTHNAME:
+ credential[4] = user_data[0]
+
+ if len(credential[4]) == 0:
+ credential[4] = credential[3]
+ elif credential[0] == connectAPI.VIR_CRED_PASSPHRASE:
+ credential[4] = user_data[1]
+ else:
+ return -1
+
+ return 0
+
+def hypervisor_connecting_test(uri, auth_unix_ro, auth_unix_rw, logger):
+ """connect to hypervisor"""
+ logger.info("connect to hypervisor")
+ orginal_user = os.geteuid()
+ testing_user_id = getpwnam(TESTING_USER)[2]
+ logger.info("the testing_user id is %d" % testing_user_id)
+
+ logger.info("set euid to %d" % testing_user_id)
+ os.seteuid(testing_user_id)
+
+ try:
+ conn = connectAPI.ConnectAPI()
+ if auth_unix_ro == 'none':
+ virconn = conn.open_read_only(uri)
+ elif auth_unix_ro == 'sasl':
+ user_data = [TESTING_USER, TESTING_USER]
+ auth = [[connectAPI.VIR_CRED_AUTHNAME, \
+ connectAPI.VIR_CRED_PASSPHRASE],
+ request_credentials, user_data]
+ virconn = conn.openAuth(uri, auth, 0)
+
+ if auth_unix_rw == 'none':
+ virconn = conn.open(uri)
+ elif auth_unix_rw == 'sasl':
+ user_data = [TESTING_USER, TESTING_USER]
+ auth = [[connectAPI.VIR_CRED_AUTHNAME, \
+ connectAPI.VIR_CRED_PASSPHRASE],
+ request_credentials, user_data]
+ virconn = conn.openAuth(uri, auth, 0)
+ conn.close()
+ except LibvirtAPI, e:
+ logger.error("API error message: %s, error code is %s" % \
+ (e.response()['message'], e.response()['code']))
+ logger.info("set euid back to %d" % orginal_user)
+ os.seteuid(orginal_user)
+ conn.close()
+ return 1
+
+ logger.info("set euid back to %d" % orginal_user)
+ os.seteuid(orginal_user)
+ return 0
+
+def unix_perm_sasl(params):
+ """ test unix socket group function and sasl authentication"""
+ logger = params['logger']
+ params_check_result = check_params(params)
+ if params_check_result:
+ return 1
+
+ auth_unix_ro = params['auth_unix_ro']
+ auth_unix_rw = params['auth_unix_rw']
+
+ unix_sock_group = 'libvirt'
+ if params.has_key('unix_sock_group'):
+ unix_sock_group = params['unix_sock_group']
+
+ uri = "qemu:///system"
+
+
+ if group_sasl_set(unix_sock_group, auth_unix_ro, auth_unix_rw, logger):
+ return 1
+
+ if libvirt_configure(unix_sock_group, auth_unix_ro, auth_unix_rw, logger):
+ return 1
+
+ if hypervisor_connecting_test(uri, auth_unix_ro, auth_unix_rw, logger):
+ return 1
+
+ return 0
+
+def unix_perm_sasl_clean(params):
+ """clean testing environment"""
+ logger = params['logger']
+
+ auth_unix_ro = params['auth_unix_ro']
+ auth_unix_rw = params['auth_unix_rw']
+
+ unix_sock_group = 'libvirt'
+ if params.has_key('unix_sock_group'):
+ unix_sock_group = params['unix_sock_group']
+
+ # delete "testapi" user
+ libvirt_user_del = "userdel %s" % TESTING_USER
+ status, output = get_output(libvirt_user_del, 0, logger)
+ if status:
+ logger.error("failed to delete user %s" % TESTING_USER)
+
+ # delete unix socket group
+ libvirt_group_del = "groupdel %s" % unix_sock_group
+ status, output = get_output(libvirt_group_del, 0, logger)
+ if status:
+ logger.error("failed to delete group %s" % unix_sock_group)
+
+ # delete sasl user
+ if auth_unix_ro == 'sasl' or auth_unix_rw == 'sasl':
+ saslpasswd2_delete = "%s -a libvirt -d %s" % (SASLPASSWD2, TESTING_USER)
+ status, output = get_output(saslpasswd2_delete, 0, logger)
+ if status:
+ logger.error("failed to delete sasl user %s" % TESTING_USER)
+
--
1.7.1
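One fragility in hypervisor_connecting_test() above is that the euid is restored by hand on both the success and error paths. A try/finally (or a context manager) makes the restore unconditional. A sketch with the seteuid/geteuid calls injected as parameters, so the logic can be exercised without root (the context-manager form is a suggestion, not what the patch does):

```python
from contextlib import contextmanager

@contextmanager
def as_euid(uid, seteuid, geteuid):
    """Temporarily switch the effective uid; always switch back."""
    original = geteuid()
    seteuid(uid)
    try:
        yield
    finally:
        seteuid(original)  # runs even if the body raises

# Exercise with fake syscalls that just record the transitions; in real use
# you would pass os.seteuid and os.geteuid.
calls = []
with as_euid(1001, seteuid=calls.append, geteuid=lambda: 0):
    calls.append("connect")
print(calls)  # [1001, 'connect', 0]
```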
[libvirt] [Libvirt] [PATCH v2] Fix bug #611823 prohibit pools with duplicate storage
by Lei Li
Make sure each storage pool is defined and created with a unique target directory, to avoid two pools presenting inconsistent views of the same volumes.
Signed-off-by: Lei Li <lilei(a)linux.vnet.ibm.com>
---
src/conf/storage_conf.c | 36 ++++++++++++++++++++++++++++++++++++
src/conf/storage_conf.h | 4 ++++
src/libvirt_private.syms | 2 ++
src/storage/storage_driver.c | 6 ++++++
4 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/src/conf/storage_conf.c b/src/conf/storage_conf.c
index 995f9a6..9078f78 100644
--- a/src/conf/storage_conf.c
+++ b/src/conf/storage_conf.c
@@ -1317,6 +1317,21 @@ virStoragePoolObjFindByName(virStoragePoolObjListPtr pools,
return NULL;
}
+virStoragePoolObjPtr
+virStoragePoolObjFindByPath(virStoragePoolObjListPtr pools,
+ const char *path) {
+ unsigned int i;
+
+ for (i = 0 ; i < pools->count ; i++) {
+ virStoragePoolObjLock(pools->objs[i]);
+ if (STREQ(pools->objs[i]->def->target.path, path))
+ return pools->objs[i];
+ virStoragePoolObjUnlock(pools->objs[i]);
+ }
+
+ return NULL;
+}
+
void
virStoragePoolObjClearVols(virStoragePoolObjPtr pool)
{
@@ -1707,6 +1722,27 @@ cleanup:
return ret;
}
+int virStoragePoolTargetDuplicate(virStoragePoolObjListPtr pools,
+ virStoragePoolDefPtr def)
+{
+ int ret = 1;
+ virStoragePoolObjPtr pool = NULL;
+
+ /* Check the pool list if defined target path already exist */
+ pool = virStoragePoolObjFindByPath(pools, def->target.path);
+ if (pool) {
+ virStorageReportError(VIR_ERR_OPERATION_FAILED,
+ _("target path '%s' is already in use"),
+ pool->def->target.path);
+ ret = -1;
+ goto cleanup;
+ }
+
+cleanup:
+ if (pool)
+ virStoragePoolObjUnlock(pool);
+ return ret;
+}
void virStoragePoolObjLock(virStoragePoolObjPtr obj)
{
diff --git a/src/conf/storage_conf.h b/src/conf/storage_conf.h
index 271441a..454c43d 100644
--- a/src/conf/storage_conf.h
+++ b/src/conf/storage_conf.h
@@ -335,6 +335,8 @@ virStoragePoolObjPtr virStoragePoolObjFindByUUID(virStoragePoolObjListPtr pools,
const unsigned char *uuid);
virStoragePoolObjPtr virStoragePoolObjFindByName(virStoragePoolObjListPtr pools,
const char *name);
+virStoragePoolObjPtr virStoragePoolObjFindByPath(virStoragePoolObjListPtr pools,
+ const char *path);
virStorageVolDefPtr virStorageVolDefFindByKey(virStoragePoolObjPtr pool,
const char *key);
@@ -387,6 +389,8 @@ char *virStoragePoolSourceListFormat(virStoragePoolSourceListPtr def);
int virStoragePoolObjIsDuplicate(virStoragePoolObjListPtr pools,
virStoragePoolDefPtr def,
unsigned int check_active);
+int virStoragePoolTargetDuplicate(virStoragePoolObjListPtr pools,
+ virStoragePoolDefPtr def);
void virStoragePoolObjLock(virStoragePoolObjPtr obj);
void virStoragePoolObjUnlock(virStoragePoolObjPtr obj);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 830222b..37afaf2 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -937,7 +937,9 @@ virStoragePoolObjClearVols;
virStoragePoolObjDeleteDef;
virStoragePoolObjFindByName;
virStoragePoolObjFindByUUID;
+virStoragePoolObjFindByPath;
virStoragePoolObjIsDuplicate;
+virStoragePoolTargetDuplicate;
virStoragePoolObjListFree;
virStoragePoolObjLock;
virStoragePoolObjRemove;
diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
index 9c353e3..b757911 100644
--- a/src/storage/storage_driver.c
+++ b/src/storage/storage_driver.c
@@ -536,6 +536,9 @@ storagePoolCreate(virConnectPtr conn,
if (virStoragePoolObjIsDuplicate(&driver->pools, def, 1) < 0)
goto cleanup;
+ if (virStoragePoolTargetDuplicate(&driver->pools, def) < 0)
+ goto cleanup;
+
if ((backend = virStorageBackendForType(def->type)) == NULL)
goto cleanup;
@@ -589,6 +592,9 @@ storagePoolDefine(virConnectPtr conn,
if (virStoragePoolObjIsDuplicate(&driver->pools, def, 0) < 0)
goto cleanup;
+ if (virStoragePoolTargetDuplicate(&driver->pools, def) < 0)
+ goto cleanup;
+
if (virStorageBackendForType(def->type) == NULL)
goto cleanup;
--
1.7.1
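The rule the patch enforces can be modelled in a few lines of Python; this is an illustrative sketch with pool objects reduced to plain dicts, not libvirt code:

```python
def check_target_duplicate(pools, new_def):
    # Refuse a pool definition whose target path is already used by an
    # existing pool, the same rule virStoragePoolTargetDuplicate()
    # enforces in the patch above.
    for pool in pools:
        if pool["target_path"] == new_def["target_path"]:
            raise ValueError("target path '%s' is already in use"
                             % new_def["target_path"])

pools = [{"name": "default", "target_path": "/var/lib/libvirt/images"}]
check_target_duplicate(pools, {"name": "other", "target_path": "/srv/pool"})  # accepted
```

A definition with a distinct target path passes; one reusing an existing path is rejected before the pool is ever created.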
[libvirt] How to avoid failure of migration/restoring/starting if cdrom is ejected inside guest?
by Osier Yang
Hello list,
There is a problem with migration when a changeable medium is ejected inside
the guest: qemu closes the block driver backend once the medium is ejected,
but it gives libvirt no way to learn about this. So libvirt will try to
migrate the guest as if the medium were still present, and the migration
fails because qemu has already closed the block driver backend.
This can also break domain restore and start (if the domain has a managed
save image and the medium was ejected before the save or managed save).
Ideally qemu would provide an event so that libvirt could learn about a
medium change immediately, but unfortunately qemu upstream will not produce
a patch for this in the short term.
As an alternative solution, they proposed a patch to expose the status of
changeable media via the monitor command "info block":
http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00408.html
The output of the improved "info block" looks like the following:
(qemu) info block
disk0: removable=0 file=/home/armbru/work/images/test.qcow2
backing_file=test.img ro=0 drv=qcow2 encrypted=0
cd: removable=1 locked=0 ejected file=x.iso ro=1 drv=raw encrypted=0
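For illustration, output in the proposed format could be parsed with a short Python sketch (the line format is assumed from the example above; bare tokens such as `ejected` become boolean flags):

```python
def parse_info_block(text):
    """Parse 'info block' output lines of the form
    'dev: key=value ... flag ...' into {device: {key: value}};
    bare tokens (e.g. 'ejected') become True flags."""
    devices = {}
    for line in text.strip().splitlines():
        name, _, rest = line.partition(":")
        fields = {}
        for token in rest.split():
            if "=" in token:
                key, _, value = token.partition("=")
                fields[key] = value
            else:
                fields[token] = True
        devices[name.strip()] = fields
    return devices

info = "cd: removable=1 locked=0 ejected file=x.iso ro=1 drv=raw encrypted=0"
assert parse_info_block(info)["cd"]["ejected"] is True
```

A migration pre-check could then refuse, or at least warn, when any removable device reports `ejected`.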
With this qemu improvement, libvirt could check the medium status at the
time of migration, but that does not kill all the bugs: during a live
migration one can still eject the medium inside the guest, which creates
a race and causes the same failure. Moreover, it will not solve the problem
for restoring and starting with a managed save image, since the medium
status cannot be queried while the guest is not active.
So I'm hesitant to use "info block" to resolve these problems, as it cannot
address the root cause thoroughly.
Or have I missed a better idea? Any thoughts are welcome, thanks.
By the way, it might still be worthwhile to report the cdrom tray status
using the improved "info block", although qemu may keep extending the
command to output more information, such as the case where a medium is
inserted but the tray is still open (in which case reporting the tray
status as "closed" whenever "info block" says "inserted" would be wrong,
and the code would need to change).
Patches of qemu side to improve the tray handling:
http://lists.nongnu.org/archive/html/qemu-devel/2011-06/msg00381.html
Regards
Osier
[libvirt] [RFC v3] Export KVM Host Power Management capabilities
by Srivatsa S. Bhat
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from pm-utils package) is run to discover if
Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them is supported, then a new tag "<power_management>" is
introduced in the XML under the <host> tag.
Eg: When the host supports both S3 and S4, the XML looks like this:
<capabilities>
<host>
<uuid>dc699581-48a2-11cb-b8a8-9a0265a79bbe</uuid>
<cpu>
<arch>i686</arch>
<model>coreduo</model>
<vendor>Intel</vendor>
<topology sockets='1' cores='2' threads='1'/>
<feature name='xtpr'/>
<feature name='tm2'/>
<feature name='est'/>
<feature name='vmx'/>
<feature name='pbe'/>
<feature name='tm'/>
<feature name='ht'/>
<feature name='ss'/>
<feature name='acpi'/>
<feature name='ds'/>
</cpu>
<power_management> <<<=== New host power management features
<S3/>
<S4/>
</power_management>
<migration_features>
<live/>
<uri_transports>
<uri_transport>tcp</uri_transport>
</uri_transports>
</migration_features>
</host>
.
.
.
However, if the query for power management features succeeded but the host
does not support any such feature, the XML will contain an empty
<power_management/> tag. If the PM query itself failed, the XML will not
contain any "power_management" tag.
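The three cases described above (query failed, query succeeded with no features, query succeeded with supported features) can be sketched as:

```python
def pm_capabilities_xml(query_ok, features):
    # Query failed: emit no tag at all. Query succeeded but nothing is
    # supported: emit an empty tag. Otherwise: one child element per
    # supported feature, mirroring the capabilities XML shown above.
    if not query_ok:
        return ""
    if not features:
        return "    <power_management/>\n"
    body = "".join("      <%s/>\n" % f for f in features)
    return "    <power_management>\n" + body + "    </power_management>\n"
```

This is only a model of the XML-formatting logic, not the patch's C implementation in virCapabilitiesFormatXML().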
Open issues:
-----------
1. Design new APIs in libvirt to actually exploit the host power management
features instead of relying on external programs. This was discussed in
[4].
2. Decide on whether to include "pm-utils" package in the libvirt.spec
file considering the fact that the package name (pm-utils) may differ
from one Linux distribution to another.
Please let me know your comments and feedback.
Changelog:
---------
v1: The idea of exporting host power management capabilities through
libvirt was discussed in [1]. The choice to name the new tag as
"power_management" was discussed in [2].
v2: A working implementation was presented for review in [3].
References:
----------
[1] Exporting KVM host power saving capabilities through libvirt
http://thread.gmane.org/gmane.comp.emulators.libvirt/40886
[2] http://article.gmane.org/gmane.comp.emulators.libvirt/41688
[3] http://www.redhat.com/archives/libvir-list/2011-August/msg00238.html
[4] http://www.redhat.com/archives/libvir-list/2011-August/msg00248.html
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat(a)linux.vnet.ibm.com>
---
docs/formatcaps.html.in | 19 ++++++++++---
docs/schemas/capability.rng | 23 ++++++++++++++++
include/libvirt/virterror.h | 1 +
src/conf/capabilities.c | 50 ++++++++++++++++++++++++++++++++++
src/conf/capabilities.h | 8 ++++++
src/libvirt_private.syms | 2 +
src/qemu/qemu_capabilities.c | 27 +++++++++++++++++++
src/util/util.c | 61 ++++++++++++++++++++++++++++++++++++++++++
src/util/util.h | 14 ++++++++++
src/util/virterror.c | 3 ++
10 files changed, 203 insertions(+), 5 deletions(-)
diff --git a/docs/formatcaps.html.in b/docs/formatcaps.html.in
index a4297ce..ce6f9a6 100644
--- a/docs/formatcaps.html.in
+++ b/docs/formatcaps.html.in
@@ -28,6 +28,10 @@ BIOS you will see</p>
<feature name='xtpr'/>
...
</cpu>
+ <power_management>
+ <S3/>
+ <S4/>
+ </power_management>
</host></span>
<!-- xen-3.0-x86_64 -->
@@ -61,11 +65,16 @@ BIOS you will see</p>
...
</capabilities></pre>
<p>The first block (in red) indicates the host hardware capabilities, currently
-it is limited to the CPU properties but other information may be available,
-it shows the CPU architecture, topology, model name, and additional features
-which are not included in the model but the CPU provides them. Features of the
-chip are shown within the feature block (the block is similar to what you will
-find in a Xen fully virtualized domain description).</p>
+it is limited to the CPU properties and the power management features of
+the host platform, but other information may be available; it shows the CPU architecture,
+topology, model name, and additional features which are not included in the model but
+provided by the CPU. Features of the chip are shown within the feature block (the block is
+similar to what you will find in a Xen fully virtualized domain description). Further,
+the power management features supported by the host are shown, such as Suspend-to-RAM (S3)
+and Suspend-to-Disk (S4). If the query for power management features succeeded but the
+host does not support any such feature, an empty <power_management/>
+tag is shown. If the query itself failed, no such tag is
+displayed (i.e., there is neither a power_management block nor an empty tag in the XML).
<p>The second block (in blue) indicates the paravirtualization support of the
Xen support, you will see the os_type of xen to indicate a paravirtual
kernel, then architecture information and potential features.</p>
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng
index 99b4a9a..930374c 100644
--- a/docs/schemas/capability.rng
+++ b/docs/schemas/capability.rng
@@ -35,6 +35,9 @@
</optional>
</element>
<optional>
+ <ref name='power_management'/>
+ </optional>
+ <optional>
<ref name='migration'/>
</optional>
<optional>
@@ -105,6 +108,26 @@
</zeroOrMore>
</define>
+ <define name='power_management'>
+ <choice>
+ <element name='power_management'>
+ <optional>
+ <element name='S3'>
+ <empty/>
+ </element>
+ </optional>
+ <optional>
+ <element name='S4'>
+ <empty/>
+ </element>
+ </optional>
+ </element>
+ <element name='power_management'>
+ <empty/>
+ </element>
+ </choice>
+ </define>
+
<define name='migration'>
<element name='migration_features'>
<optional>
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
index 9cac437..a831c73 100644
--- a/include/libvirt/virterror.h
+++ b/include/libvirt/virterror.h
@@ -82,6 +82,7 @@ typedef enum {
VIR_FROM_EVENT = 40, /* Error from event loop impl */
VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */
VIR_FROM_LOCKING = 42, /* Error from lock manager */
+ VIR_FROM_CAPABILITIES = 43, /* Error from capabilities */
} virErrorDomain;
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 2f243ae..d39a3f9 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -29,6 +29,13 @@
#include "util.h"
#include "uuid.h"
#include "cpu_conf.h"
+#include "virterror_internal.h"
+
+
+#define VIR_FROM_THIS VIR_FROM_CAPABILITIES
+
+VIR_ENUM_IMPL(virHostPMCapability, VIR_HOST_PM_LAST,
+ "S3", "S4")
/**
* virCapabilitiesNew:
@@ -166,6 +173,8 @@ virCapabilitiesFree(virCapsPtr caps) {
virCapabilitiesFreeNUMAInfo(caps);
+ VIR_FREE(caps->host.powerMgmt);
+
for (i = 0 ; i < caps->host.nmigrateTrans ; i++)
VIR_FREE(caps->host.migrateTrans[i]);
VIR_FREE(caps->host.migrateTrans);
@@ -201,6 +210,28 @@ virCapabilitiesAddHostFeature(virCapsPtr caps,
return 0;
}
+/**
+ * virCapabilitiesAddHostPowerManagement:
+ * @caps: capabilities to extend
+ * @feature: the power management feature to be added
+ *
+ * Registers a new host power management feature, eg: 'S3' or 'S4'
+ */
+int
+virCapabilitiesAddHostPowerManagement(virCapsPtr caps,
+ int feature)
+{
+ if(VIR_RESIZE_N(caps->host.powerMgmt, caps->host.npowerMgmt_max,
+ caps->host.npowerMgmt, 1) < 0) {
+ virReportOOMError();
+ return -1;
+ }
+
+ caps->host.powerMgmt[caps->host.npowerMgmt] = feature;
+ caps->host.npowerMgmt++;
+
+ return 0;
+}
/**
* virCapabilitiesAddHostMigrateTransport:
@@ -686,6 +717,25 @@ virCapabilitiesFormatXML(virCapsPtr caps)
virBufferAddLit(&xml, " </cpu>\n");
+ if(caps->host.isPMQuerySuccess) {
+ if(caps->host.npowerMgmt) {
+ /* The PM Query was successful and the host supports
+ * some PM features.
+ */
+ virBufferAddLit(&xml, " <power_management>\n");
+ for (i = 0; i < caps->host.npowerMgmt ; i++) {
+ virBufferAsprintf(&xml, " <%s/>\n",
+ virHostPMCapabilityTypeToString(caps->host.powerMgmt[i]));
+ }
+ virBufferAddLit(&xml, " </power_management>\n");
+ } else {
+ /* The PM Query was successful but the host does not
+ * support any PM feature.
+ */
+ virBufferAddLit(&xml, " <power_management/>\n");
+ }
+ }
+
if (caps->host.offlineMigrate) {
virBufferAddLit(&xml, " <migration_features>\n");
if (caps->host.liveMigrate)
diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index e2fa1d6..afbf732 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -105,6 +105,10 @@ struct _virCapsHost {
size_t nfeatures;
size_t nfeatures_max;
char **features;
+ bool isPMQuerySuccess;
+ size_t npowerMgmt;
+ size_t npowerMgmt_max;
+ int *powerMgmt; /* enum virHostPMCapability */
int offlineMigrate;
int liveMigrate;
size_t nmigrateTrans;
@@ -186,6 +190,10 @@ virCapabilitiesAddHostFeature(virCapsPtr caps,
const char *name);
extern int
+virCapabilitiesAddHostPowerManagement(virCapsPtr caps,
+ int feature);
+
+extern int
virCapabilitiesAddHostMigrateTransport(virCapsPtr caps,
const char *name);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 830222b..5754fdd 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -41,6 +41,7 @@ virCapabilitiesAddGuestFeature;
virCapabilitiesAddHostFeature;
virCapabilitiesAddHostMigrateTransport;
virCapabilitiesAddHostNUMACell;
+virCapabilitiesAddHostPowerManagement;
virCapabilitiesAllocMachines;
virCapabilitiesDefaultGuestArch;
virCapabilitiesDefaultGuestEmulator;
@@ -1025,6 +1026,7 @@ safezero;
virArgvToString;
virAsprintf;
virBuildPathInternal;
+virCheckPMCapability;
virDirCreate;
virEmitXMLWarning;
virEnumFromString;
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 3f36212..f3d0c0a 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -794,6 +794,7 @@ virCapsPtr qemuCapsInit(virCapsPtr old_caps)
struct utsname utsname;
virCapsPtr caps;
int i;
+ int status = -1;
char *xenner = NULL;
/* Really, this never fails - look at the man-page. */
@@ -824,6 +825,32 @@ virCapsPtr qemuCapsInit(virCapsPtr old_caps)
old_caps->host.cpu = NULL;
}
+ /* Add the power management features of the host */
+
+ /* Check for Suspend-to-RAM support (S3) */
+ status = virCheckPMCapability(VIR_HOST_PM_S3);
+ if(status < 0) {
+ caps->host.isPMQuerySuccess = false;
+ VIR_WARN("Failed to get host power management features");
+ } else {
+ /* The PM Query succeeded */
+ caps->host.isPMQuerySuccess = true;
+ if(status == 1) /* S3 is supported */
+ virCapabilitiesAddHostPowerManagement(caps, VIR_HOST_PM_S3);
+ }
+
+ /* Check for Suspend-to-Disk support (S4) */
+ status = virCheckPMCapability(VIR_HOST_PM_S4);
+ if(status < 0) {
+ caps->host.isPMQuerySuccess = false;
+ VIR_WARN("Failed to get host power management features");
+ } else {
+ /* The PM Query succeeded */
+ caps->host.isPMQuerySuccess = true;
+ if(status == 1) /* S4 is supported */
+ virCapabilitiesAddHostPowerManagement(caps, VIR_HOST_PM_S4);
+ }
+
virCapabilitiesAddHostMigrateTransport(caps,
"tcp");
diff --git a/src/util/util.c b/src/util/util.c
index 03a9e1a..489c4d6 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2641,3 +2641,64 @@ or other application using the libvirt API.\n\
return 0;
}
+
+/**
+ * Check the Power Management Capabilities of the host system.
+ * The script 'pm-is-supported' (from the pm-utils package) is run
+ * to find out if the capability is supported by the host.
+ *
+ * @capability: capability to check for
+ * VIR_HOST_PM_S3: Check for Suspend-to-RAM support
+ * VIR_HOST_PM_S4: Check for Suspend-to-Disk support
+ *
+ * Return values:
+ * 1 if the capability is supported.
+ * 0 if the query was successful but the capability is
+ * not supported by the host.
+ * -1 on error like 'pm-is-supported' is not found.
+ */
+int
+virCheckPMCapability(int capability)
+{
+
+ char *path = NULL;
+ int status = -1;
+ int ret = -1;
+ virCommandPtr cmd;
+
+ if((path = virFindFileInPath("pm-is-supported")) == NULL) {
+ virUtilError(VIR_ERR_INTERNAL_ERROR,
+ "%s", _("Failed to get the path of pm-is-supported"));
+ return -1;
+ }
+
+ cmd = virCommandNew(path);
+ switch(capability) {
+ case VIR_HOST_PM_S3:
+ /* Check support for suspend (S3) */
+ virCommandAddArg(cmd, "--suspend");
+ break;
+
+ case VIR_HOST_PM_S4:
+ /* Check support for hibernation (S4) */
+ virCommandAddArg(cmd, "--hibernate");
+ break;
+
+ default:
+ goto cleanup;
+ }
+
+ if(virCommandRun(cmd, &status) < 0)
+ goto cleanup;
+
+ /* Check return code of command == 0 for success
+ * (i.e., the PM capability is supported)
+ */
+ ret = (status == 0) ? 1 : 0;
+
+cleanup:
+ virCommandFree(cmd);
+ VIR_FREE(path);
+ return ret;
+}
+
diff --git a/src/util/util.h b/src/util/util.h
index af8b15d..dfb8c1a 100644
--- a/src/util/util.h
+++ b/src/util/util.h
@@ -272,4 +272,18 @@ bool virIsDevMapperDevice(const char *devname) ATTRIBUTE_NONNULL(1);
int virEmitXMLWarning(int fd,
const char *name,
const char *cmd) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
+
+/* Power Management Capabilities of the host system */
+
+enum virHostPMCapability {
+ VIR_HOST_PM_S3, /* Suspend-to-RAM */
+ VIR_HOST_PM_S4, /* Suspend-to-Disk */
+
+ VIR_HOST_PM_LAST
+};
+
+VIR_ENUM_DECL(virHostPMCapability)
+
+int virCheckPMCapability(int capability);
+
#endif /* __VIR_UTIL_H__ */
diff --git a/src/util/virterror.c b/src/util/virterror.c
index 9a27feb..26d6011 100644
--- a/src/util/virterror.c
+++ b/src/util/virterror.c
@@ -148,6 +148,9 @@ static const char *virErrorDomainName(virErrorDomain domain) {
case VIR_FROM_CPU:
dom = "CPU ";
break;
+ case VIR_FROM_CAPABILITIES:
+ dom = "Capabilities ";
+ break;
case VIR_FROM_NWFILTER:
dom = "Network Filter ";
break;
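The essence of the new virCheckPMCapability() helper is: locate the query tool, run it with `--suspend` or `--hibernate`, and map the exit status to supported/unsupported/error. The same check in Python (generalized to take any tool name, since `pm-is-supported` may not be installed on every machine):

```python
import shutil
import subprocess

def check_capability(tool, flag):
    # Mirror of virCheckPMCapability(): return 1 if the query tool exits
    # with status 0 (capability supported), 0 if it ran but reported no
    # support, and -1 if the tool cannot be found at all.
    path = shutil.which(tool)
    if path is None:
        return -1
    status = subprocess.call([path, flag],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    return 1 if status == 0 else 0

s3 = check_capability("pm-is-supported", "--suspend")    # S3 check
s4 = check_capability("pm-is-supported", "--hibernate")  # S4 check
```

The `--suspend` and `--hibernate` flags are the ones the patch passes; the three return values match the contract documented in the util.c comment above.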
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
by Anthony Liguori
On 08/08/2011 03:42 AM, Shribman, Aidan wrote:
> Subject: [PATCH v4] XBZRLE delta for live migration of large memory apps
> From: Aidan Shribman<aidan.shribman(a)sap.com>
>
> By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM downtime
> and total live-migration time of VMs running memory write intensive workloads
> typical of large enterprise applications such as SAP ERP Systems, and generally
> speaking for any application with a sparse memory update pattern.
>
> On the sender side XBZRLE is used as a compact delta encoding of page updates,
> retrieving the old page content from an LRU cache (default size of 64 MB). The
> receiving side uses the existing page content and XBZRLE to decode the new page
> content.
>
> Work was originally based on research results published at VEE 2011: Evaluation of
> Delta Compression Techniques for Efficient Live Migration of Large Virtual
> Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
> XBRLE was improved further using XBZRLE instead.
>
> XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
> ideal for in-line, real-time encoding such as is needed for live-migration.
>
> A typical usage scenario:
> {qemu} migrate_set_cachesize 256m
> {qemu} migrate -x -d tcp:destination.host:4444
> {qemu} info migrate
> ...
> transferred ram-duplicate: A kbytes
> transferred ram-duplicate: B pages
> transferred ram-normal: C kbytes
> transferred ram-normal: D pages
> transferred ram-xbrle: E kbytes
> transferred ram-xbrle: F pages
> overflow ram-xbrle: G pages
> cache-hit ram-xbrle: H pages
> cache-lookup ram-xbrle: J pages
>
> Testing: live migration with XBZRLE completed in 110 seconds; without it, live
> migration was not able to complete.
>
> A simple synthetic memory r/w load generator:
> .. include<stdlib.h>
> .. include<stdio.h>
> .. int main()
> .. {
> .. char *buf = (char *) calloc(4096, 4096);
> .. while (1) {
> .. int i;
> .. for (i = 0; i< 4096 * 4; i++) {
> .. buf[i * 4096 / 4]++;
> .. }
> .. printf(".");
> .. }
> .. }
>
> Signed-off-by: Benoit Hudzia<benoit.hudzia(a)sap.com>
> Signed-off-by: Petter Svard<petters(a)cs.umu.se>
> Signed-off-by: Aidan Shribman<aidan.shribman(a)sap.com>
One thing that strikes me about this algorithm is that it's very good
for a particular type of workload--shockingly good really.
I think workload aware migration compression is possible for a lot of
different types of workloads. That makes me a bit wary of QEMU growing
quite a lot of compression mechanisms.
It makes me think that this logic may really belong at a higher level
where more information is known about the workload. For instance, I can
imagine XBZRLE living in something like libvirt.
Today, parsing migration traffic is pretty horrible but I think we're
pretty strongly committed to fixing that in 1.0. That makes me wonder
if it would be nicer architecturally for a higher level tool to own
something like this.
Originally, when I added migration, I had the view that we would have
transport plugins based on the exec: protocol. That hasn't really
happened since libvirt really owns migration but I think having XBZRLE
as a transport plugin for libvirt is something worth considering.
I'm curious what people think about this type of approach. CC'ing
libvirt to get their input.
Regards,
Anthony Liguori
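To make the encoding concrete: XBZRLE XORs the old and new copies of a page and run-length encodes the zero runs in the XOR delta, which is mostly zero when only a few bytes changed. A toy Python sketch of the idea (not the patch's actual wire format):

```python
def xbzrle_encode(old, new):
    # XOR the old and new copies of a page, then encode the (mostly zero)
    # delta as (zero_run_length, literal_bytes) pairs. Toy container
    # format; the real patch packs this into a compact byte stream.
    delta = bytes(a ^ b for a, b in zip(old, new))
    pairs = []
    i = 0
    while i < len(delta):
        run_start = i
        while i < len(delta) and delta[i] == 0:
            i += 1
        zeros = i - run_start
        lit_start = i
        while i < len(delta) and delta[i] != 0:
            i += 1
        pairs.append((zeros, delta[lit_start:i]))
    return pairs

def xbzrle_decode(old, pairs):
    # Rebuild the new page by XOR-applying the literals onto the old page.
    page = bytearray(old)
    pos = 0
    for zeros, literal in pairs:
        pos += zeros
        for b in literal:
            page[pos] ^= b
            pos += 1
    return bytes(page)
```

For a sparsely updated page the pair list stays tiny, which is why the scheme suits the sparse memory update patterns the commit message describes.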
>
> --
>
> Makefile.target | 1 +
> arch_init.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++------
> block-migration.c | 3 +-
> hash.h | 72 +++++++++++
> hmp-commands.hx | 36 ++++--
> hw/hw.h | 3 +-
> lru.c | 142 +++++++++++++++++++++
> lru.h | 13 ++
> migration-exec.c | 6 +-
> migration-fd.c | 6 +-
> migration-tcp.c | 6 +-
> migration-unix.c | 6 +-
> migration.c | 119 +++++++++++++++++-
> migration.h | 25 +++-
> qmp-commands.hx | 43 ++++++-
> savevm.c | 13 ++-
> sysemu.h | 13 ++-
> xbzrle.c | 126 +++++++++++++++++++
> xbzrle.h | 12 ++
> 19 files changed, 917 insertions(+), 79 deletions(-)
>
> diff --git a/Makefile.target b/Makefile.target
> index 2800f47..b3215de 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -186,6 +186,7 @@ endif #CONFIG_BSD_USER
> ifdef CONFIG_SOFTMMU
>
> obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o
> +obj-y += lru.o xbzrle.o
> # virtio has to be here due to weird dependency between PCI and virtio-net.
> # need to fix this properly
> obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o
> diff --git a/arch_init.c b/arch_init.c
> old mode 100644
> new mode 100755
> index 4486925..d67dc82
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -40,6 +40,17 @@
> #include "net.h"
> #include "gdbstub.h"
> #include "hw/smbios.h"
> +#include "lru.h"
> +#include "xbzrle.h"
> +
> +//#define DEBUG_ARCH_INIT
> +#ifdef DEBUG_ARCH_INIT
> +#define DPRINTF(fmt, ...) \
> + do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> + do { } while (0)
> +#endif
>
> #ifdef TARGET_SPARC
> int graphic_width = 1024;
> @@ -88,6 +99,161 @@ const uint32_t arch_type = QEMU_ARCH;
> #define RAM_SAVE_FLAG_PAGE 0x08
> #define RAM_SAVE_FLAG_EOS 0x10
> #define RAM_SAVE_FLAG_CONTINUE 0x20
> +#define RAM_SAVE_FLAG_XBZRLE 0x40
> +
> +/***********************************************************/
> +/* RAM Migration State */
> +typedef struct ArchMigrationState {
> + int use_xbrle;
> + int64_t xbrle_cache_size;
> +} ArchMigrationState;
> +
> +static ArchMigrationState arch_mig_state;
> +
> +void arch_set_params(int blk_enable, int shared_base, int use_xbrle,
> + int64_t xbrle_cache_size, void *opaque)
> +{
> + arch_mig_state.use_xbrle = use_xbrle;
> + arch_mig_state.xbrle_cache_size = xbrle_cache_size;
> +}
> +
> +#define BE16_MAGIC 0x0123
> +
> +/***********************************************************/
> +/* XBZRLE (Xor Binary Zero Run-Length Encoding) */
> +typedef struct XBZRLEHeader {
> + uint32_t xh_cksum; /* not used */
> + uint16_t xh_magic;
> + uint16_t xh_len;
> + uint8_t xh_flags;
> +} XBZRLEHeader;
> +
> +static uint8_t dup_buf[TARGET_PAGE_SIZE];
> +
> +/***********************************************************/
> +/* accounting */
> +typedef struct AccountingInfo{
> + uint64_t dup_pages;
> + uint64_t norm_pages;
> + uint64_t xbrle_bytes;
> + uint64_t xbrle_pages;
> + uint64_t xbrle_overflow;
> + uint64_t xbrle_cache_lookup;
> + uint64_t xbrle_cache_hit;
> + uint64_t iterations;
> +} AccountingInfo;
> +
> +static AccountingInfo acct_info;
> +
> +static void acct_clear(void)
> +{
> + memset(&acct_info, 0, sizeof(acct_info));
> +}
> +
> +uint64_t dup_mig_bytes_transferred(void)
> +{
> + return acct_info.dup_pages;
> +}
> +
> +uint64_t dup_mig_pages_transferred(void)
> +{
> + return acct_info.dup_pages;
> +}
> +
> +uint64_t norm_mig_bytes_transferred(void)
> +{
> + return acct_info.norm_pages * TARGET_PAGE_SIZE;
> +}
> +
> +uint64_t norm_mig_pages_transferred(void)
> +{
> + return acct_info.norm_pages;
> +}
> +
> +uint64_t xbrle_mig_bytes_transferred(void)
> +{
> + return acct_info.xbrle_bytes;
> +}
> +
> +uint64_t xbrle_mig_pages_transferred(void)
> +{
> + return acct_info.xbrle_pages;
> +}
> +
> +uint64_t xbrle_mig_pages_overflow(void)
> +{
> + return acct_info.xbrle_overflow;
> +}
> +
> +uint64_t xbrle_mig_pages_cache_hit(void)
> +{
> + return acct_info.xbrle_cache_hit;
> +}
> +
> +uint64_t xbrle_mig_pages_cache_lookup(void)
> +{
> + return acct_info.xbrle_cache_lookup;
> +}
> +
> +static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> + int cont, int flag)
> +{
> + qemu_put_be64(f, offset | cont | flag);
> + if (!cont) {
> + qemu_put_byte(f, strlen(block->idstr));
> + qemu_put_buffer(f, (uint8_t *)block->idstr,
> + strlen(block->idstr));
> + }
> +}
> +
> +#define ENCODING_FLAG_XBZRLE 0x1
> +
> +static int save_xbrle_page(QEMUFile *f, uint8_t *current_page,
> + ram_addr_t current_addr, RAMBlock *block, ram_addr_t offset, int cont)
> +{
> + int encoded_len = 0, bytes_sent = 0;
> + XBZRLEHeader hdr = {0, BE16_MAGIC};
> + uint8_t *encoded = NULL, *old_page;
> +
> + /* abort if page not cached */
> + acct_info.xbrle_cache_lookup++;
> + old_page = lru_lookup(current_addr);
> + if (!old_page) {
> + goto done;
> + }
> + acct_info.xbrle_cache_hit++;
> +
> + /* XBZRLE (XOR+ZRLE) encoding */
> + encoded = (uint8_t *) qemu_malloc(TARGET_PAGE_SIZE);
> + encoded_len = xbzrle_encode(encoded, old_page, current_page,
> + TARGET_PAGE_SIZE);
> +
> + if (encoded_len < 0) {
> + DPRINTF("XBZRLE encoding overflow - sending uncompressed\n");
> + acct_info.xbrle_overflow++;
> + goto done;
> + }
> +
> + hdr.xh_len = encoded_len;
> + hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
> +
> + /* Send XBZRLE compressed page */
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
> +
> + qemu_put_be32(f, hdr.xh_cksum);
> + qemu_put_buffer(f, (uint8_t *)&hdr.xh_magic, sizeof (hdr.xh_magic));
> + qemu_put_be16(f, hdr.xh_len);
> + qemu_put_byte(f, hdr.xh_flags);
> +
> + qemu_put_buffer(f, encoded, encoded_len);
> + acct_info.xbrle_pages++;
> + bytes_sent = encoded_len + sizeof(hdr);
> + acct_info.xbrle_bytes += bytes_sent;
> +
> +done:
> + qemu_free(encoded);
> + return bytes_sent;
> +}
>
> static int is_dup_page(uint8_t *page, uint8_t ch)
> {
> @@ -107,7 +273,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
> static RAMBlock *last_block;
> static ram_addr_t last_offset;
>
> -static int ram_save_block(QEMUFile *f)
> +static int ram_save_block(QEMUFile *f, int stage)
> {
> RAMBlock *block = last_block;
> ram_addr_t offset = last_offset;
> @@ -120,6 +286,7 @@ static int ram_save_block(QEMUFile *f)
> current_addr = block->offset + offset;
>
> do {
> + lru_free_cb_t free_cb = qemu_free;
> if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
> uint8_t *p;
> int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
> @@ -128,28 +295,35 @@ static int ram_save_block(QEMUFile *f)
> current_addr + TARGET_PAGE_SIZE,
> MIGRATION_DIRTY_FLAG);
>
> - p = block->host + offset;
> + if (arch_mig_state.use_xbrle) {
> + p = qemu_malloc(TARGET_PAGE_SIZE);
> + memcpy(p, block->host + offset, TARGET_PAGE_SIZE);
> + } else {
> + p = block->host + offset;
> + }
>
> if (is_dup_page(p, *p)) {
> - qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
> - if (!cont) {
> - qemu_put_byte(f, strlen(block->idstr));
> - qemu_put_buffer(f, (uint8_t *)block->idstr,
> - strlen(block->idstr));
> - }
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
> qemu_put_byte(f, *p);
> bytes_sent = 1;
> - } else {
> - qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
> - if (!cont) {
> - qemu_put_byte(f, strlen(block->idstr));
> - qemu_put_buffer(f, (uint8_t *)block->idstr,
> - strlen(block->idstr));
> + acct_info.dup_pages++;
> + if (arch_mig_state.use_xbrle && !*p) {
> + p = dup_buf;
> + free_cb = NULL;
> }
> + } else if (stage == 2 && arch_mig_state.use_xbrle) {
> + bytes_sent = save_xbrle_page(f, p, current_addr, block,
> + offset, cont);
> + }
> + if (!bytes_sent) {
> + save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
> qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> bytes_sent = TARGET_PAGE_SIZE;
> + acct_info.norm_pages++;
> + }
> + if (arch_mig_state.use_xbrle) {
> + lru_insert(current_addr, p, free_cb);
> }
> -
> break;
> }
>
> @@ -221,6 +395,9 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
>
> if (stage < 0) {
> cpu_physical_memory_set_dirty_tracking(0);
> + if (arch_mig_state.use_xbrle) {
> + lru_fini();
> + }
> return 0;
> }
>
> @@ -235,6 +412,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> last_block = NULL;
> last_offset = 0;
>
> + if (arch_mig_state.use_xbrle) {
> + lru_init(arch_mig_state.xbrle_cache_size/TARGET_PAGE_SIZE, 0);
> + acct_clear();
> + }
> +
> /* Make sure all dirty bits are set */
> QLIST_FOREACH(block, &ram_list.blocks, next) {
> for (addr = block->offset; addr < block->offset + block->length;
> @@ -264,8 +446,9 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> while (!qemu_file_rate_limit(f)) {
> int bytes_sent;
>
> - bytes_sent = ram_save_block(f);
> + bytes_sent = ram_save_block(f, stage);
> bytes_transferred += bytes_sent;
> + acct_info.iterations++;
> if (bytes_sent == 0) { /* no more blocks */
> break;
> }
> @@ -285,19 +468,79 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> int bytes_sent;
>
> /* flush all remaining blocks regardless of rate limiting */
> - while ((bytes_sent = ram_save_block(f)) != 0) {
> + while ((bytes_sent = ram_save_block(f, stage))) {
> bytes_transferred += bytes_sent;
> }
> cpu_physical_memory_set_dirty_tracking(0);
> + if (arch_mig_state.use_xbrle) {
> + lru_fini();
> + }
> }
>
> qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
>
> expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
>
> + DPRINTF("ram_save_live: expected(%ld) <= max(%ld)?\n", expected_time,
> + migrate_max_downtime());
> +
> return (stage == 2) && (expected_time <= migrate_max_downtime());
> }
>
> +static int load_xbrle(QEMUFile *f, ram_addr_t addr, void *host)
> +{
> + int len, rc = -1;
> + uint8_t *encoded;
> + XBZRLEHeader hdr = {0};
> +
> + /* extract ZRLE header */
> + hdr.xh_cksum = qemu_get_be32(f);
> + qemu_get_buffer(f, (uint8_t *)&hdr.xh_magic, sizeof (hdr.xh_magic));
> + hdr.xh_len = qemu_get_be16(f);
> + hdr.xh_flags = qemu_get_byte(f);
> +
> + if (!(hdr.xh_flags & ENCODING_FLAG_XBZRLE)) {
> + fprintf(stderr, "Failed to load XZBRLE page - wrong compression!\n");
> + goto done;
> + }
> +
> + if (hdr.xh_len > TARGET_PAGE_SIZE) {
> + fprintf(stderr, "Failed to load XZBRLE page - len overflow!\n");
> + goto done;
> + }
> +
> + /* load data and decode */
> + encoded = (uint8_t *) qemu_malloc(hdr.xh_len);
> + qemu_get_buffer(f, encoded, hdr.xh_len);
> + /* convert endianness if magic indicates destination differs from source */
> + if (hdr.xh_magic != BE16_MAGIC) {
> + const uint64_t *end = (uint64_t *) encoded +
> + hdr.xh_len / sizeof (uint64_t);
> + uint64_t *p;
> + for (p = (uint64_t *) encoded; p < end; p++) {
> + bswap64s(p);
> + }
> + }
> +
> + /* decode ZRLE */
> + len = xbzrle_decode(host, host, encoded, hdr.xh_len);
> + if (len == -1) {
> + fprintf(stderr, "Failed to load XBZRLE page - decode error!\n");
> + goto done;
> + }
> +
> + if (len != TARGET_PAGE_SIZE) {
> + fprintf(stderr, "Failed to load XBZRLE page - size %d expected %d!\n",
> + len, TARGET_PAGE_SIZE);
> + goto done;
> + }
> +
> + rc = 0;
> +done:
> + qemu_free(encoded);
> + return rc;
> +}
> +
> static inline void *host_from_stream_offset(QEMUFile *f,
> ram_addr_t offset,
> int flags)
> @@ -328,16 +571,38 @@ static inline void *host_from_stream_offset(QEMUFile *f,
> return NULL;
> }
>
> +static inline void *host_from_stream_offset_versioned(int version_id,
> + QEMUFile *f, ram_addr_t offset, int flags)
> +{
> + void *host;
> + if (version_id == 3) {
> + host = qemu_get_ram_ptr(offset);
> + } else {
> + host = host_from_stream_offset(f, offset, flags);
> + }
> + if (!host) {
> + fprintf(stderr, "Failed to convert RAM address to host"
> + " for offset 0x%lX!\n", offset);
> + abort();
> + }
> + return host;
> +}
> +
> int ram_load(QEMUFile *f, void *opaque, int version_id)
> {
> ram_addr_t addr;
> - int flags;
> + int flags, ret = 0;
> + static uint64_t seq_iter;
> +
> + seq_iter++;
>
> if (version_id < 3 || version_id > 4) {
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
>
> do {
> + void *host;
> addr = qemu_get_be64(f);
>
> flags = addr & ~TARGET_PAGE_MASK;
> @@ -346,7 +611,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> if (flags & RAM_SAVE_FLAG_MEM_SIZE) {
> if (version_id == 3) {
> if (addr != ram_bytes_total()) {
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
> } else {
> /* Synchronize RAM block list */
> @@ -365,8 +631,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
>
> QLIST_FOREACH(block, &ram_list.blocks, next) {
> if (!strncmp(id, block->idstr, sizeof(id))) {
> - if (block->length != length)
> - return -EINVAL;
> + if (block->length != length) {
> + ret = -EINVAL;
> + goto done;
> + }
> break;
> }
> }
> @@ -374,7 +642,8 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> if (!block) {
> fprintf(stderr, "Unknown ramblock \"%s\", cannot "
> "accept migration\n", id);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto done;
> }
>
> total_ram_bytes -= length;
> @@ -383,17 +652,10 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
>
> if (flags & RAM_SAVE_FLAG_COMPRESS) {
> - void *host;
> uint8_t ch;
>
> - if (version_id == 3)
> - host = qemu_get_ram_ptr(addr);
> - else
> - host = host_from_stream_offset(f, addr, flags);
> - if (!host) {
> - return -EINVAL;
> - }
> -
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> ch = qemu_get_byte(f);
> memset(host, ch, TARGET_PAGE_SIZE);
> #ifndef _WIN32
> @@ -403,21 +665,28 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
> #endif
> } else if (flags & RAM_SAVE_FLAG_PAGE) {
> - void *host;
> -
> - if (version_id == 3)
> - host = qemu_get_ram_ptr(addr);
> - else
> - host = host_from_stream_offset(f, addr, flags);
> -
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> + } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
> + host = host_from_stream_offset_versioned(version_id,
> + f, addr, flags);
> + if (load_xbrle(f, addr, host) < 0) {
> + ret = -EINVAL;
> + goto done;
> + }
> }
> +
> if (qemu_file_has_error(f)) {
> - return -EIO;
> + ret = -EIO;
> + goto done;
> }
> } while (!(flags & RAM_SAVE_FLAG_EOS));
>
> - return 0;
> +done:
> + DPRINTF("Completed load of VM with exit code %d seq iteration %ld\n",
> + ret, seq_iter);
> + return ret;
> }
>
> void qemu_service_io(void)
> diff --git a/block-migration.c b/block-migration.c
> index 3e66f49..504df70 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -689,7 +689,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
> return 0;
> }
>
> -static void block_set_params(int blk_enable, int shared_base, void *opaque)
> +static void block_set_params(int blk_enable, int shared_base,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque)
> {
> block_mig_state.blk_enable = blk_enable;
> block_mig_state.shared_base = shared_base;
> diff --git a/hash.h b/hash.h
> new file mode 100644
> index 0000000..7109905
> --- /dev/null
> +++ b/hash.h
> @@ -0,0 +1,72 @@
> +#ifndef _LINUX_HASH_H
> +#define _LINUX_HASH_H
> +/* Fast hashing routine for ints, longs and pointers.
> + (C) 2002 William Lee Irwin III, IBM */
> +
> +/*
> + * Knuth recommends primes in approximately golden ratio to the maximum
> + * integer representable by a machine word for multiplicative hashing.
> + * Chuck Lever verified the effectiveness of this technique:
> + * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
> + *
> + * These primes are chosen to be bit-sparse, that is operations on
> + * them can use shifts and additions instead of multiplications for
> + * machines where multiplications are slow.
> + */
> +
> +typedef uint64_t u64;
> +typedef uint32_t u32;
> +#define BITS_PER_LONG TARGET_LONG_BITS
> +
> +/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
> +#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
> +/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
> +#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
> +
> +#if BITS_PER_LONG == 32
> +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
> +#define hash_long(val, bits) hash_32(val, bits)
> +#elif BITS_PER_LONG == 64
> +#define hash_long(val, bits) hash_64(val, bits)
> +#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
> +#else
> +#error Wordsize not 32 or 64
> +#endif
> +
> +static inline u64 hash_64(u64 val, unsigned int bits)
> +{
> + u64 hash = val;
> +
> + /* Sigh, gcc can't optimise this alone like it does for 32 bits. */
> + u64 n = hash;
> + n <<= 18;
> + hash -= n;
> + n <<= 33;
> + hash -= n;
> + n <<= 3;
> + hash += n;
> + n <<= 3;
> + hash -= n;
> + n <<= 4;
> + hash += n;
> + n <<= 2;
> + hash += n;
> +
> + /* High bits are more random, so use them. */
> + return hash >> (64 - bits);
> +}
> +
> +static inline u32 hash_32(u32 val, unsigned int bits)
> +{
> + /* On some cpus multiply is faster, on others gcc will do shifts */
> + u32 hash = val * GOLDEN_RATIO_PRIME_32;
> +
> + /* High bits are more random, so use them. */
> + return hash >> (32 - bits);
> +}
> +
> +static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
> +{
> + return hash_long((unsigned long)ptr, bits);
> +}
> +#endif /* _LINUX_HASH_H */
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> old mode 100644
> new mode 100755
> index e5585ba..e49d5be
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -717,24 +717,27 @@ ETEXI
>
> {
> .name = "migrate",
> - .args_type = "detach:-d,blk:-b,inc:-i,uri:s",
> - .params = "[-d] [-b] [-i] uri",
> - .help = "migrate to URI (using -d to not wait for completion)"
> - "\n\t\t\t -b for migration without shared storage with"
> - " full copy of disk\n\t\t\t -i for migration without "
> - "shared storage with incremental copy of disk "
> - "(base image shared between src and destination)",
> + .args_type = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> + .params = "[-d] [-b] [-i] [-x] uri",
> + .help = "migrate to URI"
> + "\n\t -d to not wait for completion"
> + "\n\t -b for migration without shared storage with"
> + " full copy of disk"
> + "\n\t -i for migration without"
> + " shared storage with incremental copy of disk"
> + " (base image shared between source and destination)"
> + "\n\t -x to use XBRLE page delta compression",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate,
> },
>
> -
> STEXI
> -@item migrate [-d] [-b] [-i] @var{uri}
> +@item migrate [-d] [-b] [-i] [-x] @var{uri}
> @findex migrate
> Migrate to @var{uri} (using -d to not wait for completion).
> -b for migration with full copy of disk
> -i for migration with incremental copy of disk (base image is shared)
> + -x to use XBRLE page delta compression
> ETEXI
>
> {
> @@ -753,10 +756,23 @@ Cancel the current VM migration.
> ETEXI
>
> {
> + .name = "migrate_set_cachesize",
> + .args_type = "value:s",
> + .params = "value",
> + .help = "set cache size (in MB) for XBRLE migrations",
> + .mhandler.cmd = do_migrate_set_cachesize,
> + },
> +
> +STEXI
> +@item migrate_set_cachesize @var{value}
> +Set cache size (in MB) for xbrle migrations.
> +ETEXI
> +
> + {
> .name = "migrate_set_speed",
> .args_type = "value:o",
> .params = "value",
> - .help = "set maximum speed (in bytes) for migrations. "
> + .help = "set maximum XBRLE cache size (in bytes) for migrations. "
> "Defaults to MB if no size suffix is specified, ie. B/K/M/G/T",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate_set_speed,
> diff --git a/hw/hw.h b/hw/hw.h
> index 9d2cfc2..aa336ec 100644
> --- a/hw/hw.h
> +++ b/hw/hw.h
> @@ -239,7 +239,8 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
> int64_t qemu_ftell(QEMUFile *f);
> int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
>
> -typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
> +typedef void SaveSetParamsHandler(int blk_enable, int shared,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> typedef void SaveStateHandler(QEMUFile *f, void *opaque);
> typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile *f, int stage,
> void *opaque);
> diff --git a/lru.c b/lru.c
> new file mode 100644
> index 0000000..e7230d0
> --- /dev/null
> +++ b/lru.c
> @@ -0,0 +1,142 @@
> +#include <assert.h>
> +#include <math.h>
> +#include "qemu-common.h"
> +#include "qemu-queue.h"
> +#include "host-utils.h"
> +#include "lru.h"
> +#include "hash.h"
> +
> +typedef struct CacheItem {
> + ram_addr_t it_addr;
> + uint8_t *it_data;
> + lru_free_cb_t it_free;
> + QCIRCLEQ_ENTRY(CacheItem) it_lru_next;
> + QCIRCLEQ_ENTRY(CacheItem) it_bucket_next;
> +} CacheItem;
> +
> +typedef QCIRCLEQ_HEAD(, CacheItem) CacheBucket;
> +static CacheBucket *page_hash;
> +static int64_t cache_table_size;
> +static uint64_t cache_max_items;
> +static int64_t cache_num_items;
> +static uint8_t cache_hash_bits;
> +
> +static QCIRCLEQ_HEAD(page_lru, CacheItem) page_lru;
> +
> +static uint64_t next_pow_of_2(uint64_t v)
> +{
> + v--;
> + v |= v >> 1;
> + v |= v >> 2;
> + v |= v >> 4;
> + v |= v >> 8;
> + v |= v >> 16;
> + v |= v >> 32;
> + v++;
> + return v;
> +}
> +
> +void lru_init(int64_t max_items, void *param)
> +{
> + int i;
> +
> + cache_num_items = 0;
> + cache_max_items = max_items;
> + /* add 20% to table size to reduce collisions */
> + cache_table_size = next_pow_of_2(1.2 * max_items);
> + cache_hash_bits = ctz64(cache_table_size) - 1;
> +
> + QCIRCLEQ_INIT(&page_lru);
> +
> + page_hash = qemu_mallocz(sizeof(CacheBucket) * cache_table_size);
> + assert(page_hash);
> + for (i = 0; i < cache_table_size; i++) {
> + QCIRCLEQ_INIT(&page_hash[i]);
> + }
> +}
> +
> +static CacheBucket *page_bucket_list(ram_addr_t addr)
> +{
> + return &page_hash[hash_long(addr, cache_hash_bits)];
> +}
> +
> +static void do_lru_remove(CacheItem *it)
> +{
> + assert(it);
> +
> + QCIRCLEQ_REMOVE(&page_lru, it, it_lru_next);
> + QCIRCLEQ_REMOVE(page_bucket_list(it->it_addr), it, it_bucket_next);
> + if (it->it_free) {
> + (*it->it_free)(it->it_data);
> + }
> + qemu_free(it);
> + cache_num_items--;
> +}
> +
> +static int do_lru_remove_first(void)
> +{
> + CacheItem *first;
> +
> + if (QCIRCLEQ_EMPTY(&page_lru)) {
> + return -1;
> + }
> + first = QCIRCLEQ_FIRST(&page_lru);
> + do_lru_remove(first);
> + return 0;
> +}
> +
> +
> +void lru_fini(void)
> +{
> + while (!do_lru_remove_first()) {
> + }
> + qemu_free(page_hash);
> +}
> +
> +static CacheItem *do_lru_lookup(ram_addr_t addr)
> +{
> + CacheBucket *head = page_bucket_list(addr);
> + CacheItem *it;
> +
> + if (QCIRCLEQ_EMPTY(head)) {
> + return NULL;
> + }
> + QCIRCLEQ_FOREACH(it, head, it_bucket_next) {
> + if (addr == it->it_addr) {
> + return it;
> + }
> + }
> + return NULL;
> +}
> +
> +uint8_t *lru_lookup(ram_addr_t addr)
> +{
> + CacheItem *it = do_lru_lookup(addr);
> + return it ? it->it_data : NULL;
> +}
> +
> +void lru_insert(ram_addr_t addr, uint8_t *data, lru_free_cb_t free_cb)
> +{
> + CacheItem *it;
> +
> + /* remove old if item exists */
> + it = do_lru_lookup(addr);
> + if (it) {
> + do_lru_remove(it);
> + }
> +
> + /* evict the LRU entry if we need free space */
> + if (cache_num_items == cache_max_items) {
> + do_lru_remove_first();
> + }
> +
> + /* add new entry */
> + it = qemu_mallocz(sizeof(*it));
> + it->it_addr = addr;
> + it->it_data = data;
> + it->it_free = free_cb;
> + QCIRCLEQ_INSERT_HEAD(page_bucket_list(addr), it, it_bucket_next);
> + QCIRCLEQ_INSERT_TAIL(&page_lru, it, it_lru_next);
> + cache_num_items++;
> +}
> +
> diff --git a/lru.h b/lru.h
> new file mode 100644
> index 0000000..6c70095
> --- /dev/null
> +++ b/lru.h
> @@ -0,0 +1,13 @@
> +#ifndef _LRU_H_
> +#define _LRU_H_
> +
> +#include <unistd.h>
> +#include <stdint.h>
> +#include "cpu-all.h"
> +typedef void (*lru_free_cb_t)(void *);
> +void lru_init(ssize_t num_items, void *param);
> +void lru_fini(void);
> +void lru_insert(ram_addr_t id, uint8_t *pdata, lru_free_cb_t free_cb);
> +uint8_t *lru_lookup(ram_addr_t addr);
> +#endif
> +
> diff --git a/migration-exec.c b/migration-exec.c
> index 14718dd..fe8254a 100644
> --- a/migration-exec.c
> +++ b/migration-exec.c
> @@ -67,7 +67,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
> FILE *f;
> @@ -99,6 +101,8 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-fd.c b/migration-fd.c
> index 6d14505..4a1ddbd 100644
> --- a/migration-fd.c
> +++ b/migration-fd.c
> @@ -56,7 +56,9 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
>
> @@ -82,6 +84,8 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-tcp.c b/migration-tcp.c
> index b55f419..4ca5bf6 100644
> --- a/migration-tcp.c
> +++ b/migration-tcp.c
> @@ -81,7 +81,9 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> struct sockaddr_in addr;
> FdMigrationState *s;
> @@ -101,6 +103,8 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration-unix.c b/migration-unix.c
> index 57232c0..0813902 100644
> --- a/migration-unix.c
> +++ b/migration-unix.c
> @@ -80,7 +80,9 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc)
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> FdMigrationState *s;
> struct sockaddr_un addr;
> @@ -100,6 +102,8 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
>
> s->mig_state.blk = blk;
> s->mig_state.shared = inc;
> + s->mig_state.use_xbrle = use_xbrle;
> + s->mig_state.xbrle_cache_size = xbrle_cache_size;
>
> s->state = MIG_STATE_ACTIVE;
> s->mon = NULL;
> diff --git a/migration.c b/migration.c
> old mode 100644
> new mode 100755
> index 9ee8b17..ccacf81
> --- a/migration.c
> +++ b/migration.c
> @@ -34,6 +34,11 @@
> /* Migration speed throttling */
> static uint32_t max_throttle = (32 << 20);
>
> +/* Migration XBRLE cache size */
> +#define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
> +
> +static int64_t migrate_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
> +
> static MigrationState *current_migration;
>
> int qemu_start_incoming_migration(const char *uri)
> @@ -80,6 +85,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
> int detach = qdict_get_try_bool(qdict, "detach", 0);
> int blk = qdict_get_try_bool(qdict, "blk", 0);
> int inc = qdict_get_try_bool(qdict, "inc", 0);
> + int use_xbrle = qdict_get_try_bool(qdict, "xbrle", 0);
> const char *uri = qdict_get_str(qdict, "uri");
>
> if (current_migration &&
> @@ -90,17 +96,21 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
>
> if (strstart(uri, "tcp:",&p)) {
> s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> #if !defined(WIN32)
> } else if (strstart(uri, "exec:",&p)) {
> s = exec_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> } else if (strstart(uri, "unix:",&p)) {
> s = unix_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> } else if (strstart(uri, "fd:",&p)) {
> s = fd_start_outgoing_migration(mon, p, max_throttle, detach,
> - blk, inc);
> + blk, inc, use_xbrle,
> + migrate_cache_size);
> #endif
> } else {
> monitor_printf(mon, "unknown migration protocol: %s\n", uri);
> @@ -185,6 +195,36 @@ static void migrate_print_status(Monitor *mon, const char *name,
> qdict_get_int(qdict, "total")>> 10);
> }
>
> +static void migrate_print_ram_status(Monitor *mon, const char *name,
> + const QDict *status_dict)
> +{
> + QDict *qdict;
> + uint64_t overflow, cache_hit, cache_lookup;
> +
> + qdict = qobject_to_qdict(qdict_get(status_dict, name));
> +
> + monitor_printf(mon, "transferred %s: %" PRIu64 " kbytes\n", name,
> + qdict_get_int(qdict, "bytes")>> 10);
> + monitor_printf(mon, "transferred %s: %" PRIu64 " pages\n", name,
> + qdict_get_int(qdict, "pages"));
> + overflow = qdict_get_int(qdict, "overflow");
> + if (overflow > 0) {
> + monitor_printf(mon, "overflow %s: %" PRIu64 " pages\n", name,
> + overflow);
> + }
> + cache_hit = qdict_get_int(qdict, "cache-hit");
> + if (cache_hit > 0) {
> + monitor_printf(mon, "cache-hit %s: %" PRIu64 " pages\n", name,
> + cache_hit);
> + }
> + cache_lookup = qdict_get_int(qdict, "cache-lookup");
> + if (cache_lookup > 0) {
> + monitor_printf(mon, "cache-lookup %s: %" PRIu64 " pages\n", name,
> + cache_lookup);
> + }
> +
> +}
> +
> void do_info_migrate_print(Monitor *mon, const QObject *data)
> {
> QDict *qdict;
> @@ -198,6 +238,18 @@ void do_info_migrate_print(Monitor *mon, const QObject *data)
> migrate_print_status(mon, "ram", qdict);
> }
>
> + if (qdict_haskey(qdict, "ram-duplicate")) {
> + migrate_print_ram_status(mon, "ram-duplicate", qdict);
> + }
> +
> + if (qdict_haskey(qdict, "ram-normal")) {
> + migrate_print_ram_status(mon, "ram-normal", qdict);
> + }
> +
> + if (qdict_haskey(qdict, "ram-xbrle")) {
> + migrate_print_ram_status(mon, "ram-xbrle", qdict);
> + }
> +
> if (qdict_haskey(qdict, "disk")) {
> migrate_print_status(mon, "disk", qdict);
> }
> @@ -214,6 +266,23 @@ static void migrate_put_status(QDict *qdict, const char *name,
> qdict_put_obj(qdict, name, obj);
> }
>
> +static void migrate_put_ram_status(QDict *qdict, const char *name,
> + uint64_t bytes, uint64_t pages,
> + uint64_t overflow, uint64_t cache_hit,
> + uint64_t cache_lookup)
> +{
> + QObject *obj;
> +
> + obj = qobject_from_jsonf("{ 'bytes': %" PRId64 ", "
> + "'pages': %" PRId64 ", "
> + "'overflow': %" PRId64 ", "
> + "'cache-hit': %" PRId64 ", "
> + "'cache-lookup': %" PRId64 " }",
> + bytes, pages, overflow, cache_hit,
> + cache_lookup);
> + qdict_put_obj(qdict, name, obj);
> +}
> +
> void do_info_migrate(Monitor *mon, QObject **ret_data)
> {
> QDict *qdict;
> @@ -228,6 +297,21 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
> migrate_put_status(qdict, "ram", ram_bytes_transferred(),
> ram_bytes_remaining(), ram_bytes_total());
>
> + if (s->use_xbrle) {
> + migrate_put_ram_status(qdict, "ram-duplicate",
> + dup_mig_bytes_transferred(),
> + dup_mig_pages_transferred(), 0, 0, 0);
> + migrate_put_ram_status(qdict, "ram-normal",
> + norm_mig_bytes_transferred(),
> + norm_mig_pages_transferred(), 0, 0, 0);
> + migrate_put_ram_status(qdict, "ram-xbrle",
> + xbrle_mig_bytes_transferred(),
> + xbrle_mig_pages_transferred(),
> + xbrle_mig_pages_overflow(),
> + xbrle_mig_pages_cache_hit(),
> + xbrle_mig_pages_cache_lookup());
> + }
> +
> if (blk_mig_active()) {
> migrate_put_status(qdict, "disk", blk_mig_bytes_transferred(),
> blk_mig_bytes_remaining(),
> @@ -341,7 +425,8 @@ void migrate_fd_connect(FdMigrationState *s)
>
> DPRINTF("beginning savevm\n");
> ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
> - s->mig_state.shared);
> + s->mig_state.shared, s->mig_state.use_xbrle,
> + s->mig_state.xbrle_cache_size);
> if (ret < 0) {
> DPRINTF("failed, %d\n", ret);
> migrate_fd_error(s);
> @@ -448,3 +533,27 @@ int migrate_fd_close(void *opaque)
> qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
> return s->close(s);
> }
> +
> +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
> +{
> + ssize_t bytes;
> + const char *value = qdict_get_str(qdict, "value");
> +
> + bytes = strtosz(value, NULL);
> + if (bytes < 0) {
> + monitor_printf(mon, "invalid cache size: %s\n", value);
> + return;
> + }
> +
> + /* On 32-bit hosts, QEMU is limited by virtual address space */
> + if (bytes > (2047 << 20) && HOST_LONG_BITS == 32) {
> + monitor_printf(mon, "cache can't exceed 2047 MB RAM limit on host\n");
> + return;
> + }
> + if (bytes != (uint64_t) bytes) {
> + monitor_printf(mon, "cache size too large\n");
> + return;
> + }
> + migrate_cache_size = bytes;
> +}
> +
> diff --git a/migration.h b/migration.h
> index d13ed4f..6dc0543 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -32,6 +32,8 @@ struct MigrationState
> void (*release)(MigrationState *s);
> int blk;
> int shared;
> + int use_xbrle;
> + int64_t xbrle_cache_size;
> };
>
> typedef struct FdMigrationState FdMigrationState;
> @@ -76,7 +78,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int tcp_start_incoming_migration(const char *host_port);
>
> @@ -85,7 +89,9 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int unix_start_incoming_migration(const char *path);
>
> @@ -94,7 +100,9 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> int fd_start_incoming_migration(const char *path);
>
> @@ -103,7 +111,9 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
> int64_t bandwidth_limit,
> int detach,
> int blk,
> - int inc);
> + int inc,
> + int use_xbrle,
> + int64_t xbrle_cache_size);
>
> void migrate_fd_monitor_suspend(FdMigrationState *s, Monitor *mon);
>
> @@ -134,4 +144,11 @@ static inline FdMigrationState *migrate_to_fms(MigrationState *mig_state)
> return container_of(mig_state, FdMigrationState, mig_state);
> }
>
> +void do_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
> +
> +void arch_set_params(int blk_enable, int shared_base,
> + int use_xbrle, int64_t xbrle_cache_size, void *opaque);
> +
> +int xbrle_mig_active(void);
> +
> #endif
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 793cf1c..8fbe64b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -431,13 +431,16 @@ EQMP
>
> {
> .name = "migrate",
> - .args_type = "detach:-d,blk:-b,inc:-i,uri:s",
> - .params = "[-d] [-b] [-i] uri",
> - .help = "migrate to URI (using -d to not wait for completion)"
> - "\n\t\t\t -b for migration without shared storage with"
> - " full copy of disk\n\t\t\t -i for migration without "
> - "shared storage with incremental copy of disk "
> - "(base image shared between src and destination)",
> + .args_type = "detach:-d,blk:-b,inc:-i,xbrle:-x,uri:s",
> + .params = "[-d] [-b] [-i] [-x] uri",
> + .help = "migrate to URI"
> + "\n\t -d to not wait for completion"
> + "\n\t -b for migration without shared storage with"
> + " full copy of disk"
> + "\n\t -i for migration without"
> + " shared storage with incremental copy of disk"
> + " (base image shared between source and destination)"
> + "\n\t -x to use XBRLE page delta compression",
> .user_print = monitor_user_noop,
> .mhandler.cmd_new = do_migrate,
> },
> @@ -453,6 +456,7 @@ Arguments:
> - "blk": block migration, full disk copy (json-bool, optional)
> - "inc": incremental disk copy (json-bool, optional)
> - "uri": Destination URI (json-string)
> +- "xbrle": to use XBRLE page delta compression
>
> Example:
>
> @@ -494,6 +498,31 @@ Example:
> EQMP
>
> {
> + .name = "migrate_set_cachesize",
> + .args_type = "value:s",
> + .params = "value",
> + .help = "set cache size (in MB) for xbrle migrations",
> + .mhandler.cmd = do_migrate_set_cachesize,
> + },
> +
> +SQMP
> +migrate_set_cachesize
> +---------------------
> +
> +Set cache size to be used by XBRLE migration
> +
> +Arguments:
> +
> +- "value": cache size in bytes (json-number)
> +
> +Example:
> +
> +-> { "execute": "migrate_set_cachesize", "arguments": { "value": 500M } }
> +<- { "return": {} }
> +
> +EQMP
> +
> + {
> .name = "migrate_set_speed",
> .args_type = "value:f",
> .params = "value",
> diff --git a/savevm.c b/savevm.c
> index 4e49765..93b512b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1141,7 +1141,8 @@ int register_savevm(DeviceState *dev,
> void *opaque)
> {
> return register_savevm_live(dev, idstr, instance_id, version_id,
> - NULL, NULL, save_state, load_state, opaque);
> + arch_set_params, NULL, save_state,
> + load_state, opaque);
> }
>
> void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
> @@ -1428,15 +1429,17 @@ static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
> #define QEMU_VM_SUBSECTION 0x05
>
> int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
> - int shared)
> + int shared, int use_xbrle,
> + int64_t xbrle_cache_size)
> {
> SaveStateEntry *se;
>
> QTAILQ_FOREACH(se, &savevm_handlers, entry) {
> if(se->set_params == NULL) {
> continue;
> - }
> - se->set_params(blk_enable, shared, se->opaque);
> + }
> + se->set_params(blk_enable, shared, use_xbrle, xbrle_cache_size,
> + se->opaque);
> }
>
> qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> @@ -1577,7 +1580,7 @@ static int qemu_savevm_state(Monitor *mon, QEMUFile *f)
>
> bdrv_flush_all();
>
> - ret = qemu_savevm_state_begin(mon, f, 0, 0);
> + ret = qemu_savevm_state_begin(mon, f, 0, 0, 0, 0);
> if (ret < 0)
> goto out;
>
> diff --git a/sysemu.h b/sysemu.h
> index b81a70e..eb53bf7 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -44,6 +44,16 @@ uint64_t ram_bytes_remaining(void);
> uint64_t ram_bytes_transferred(void);
> uint64_t ram_bytes_total(void);
>
> +uint64_t dup_mig_bytes_transferred(void);
> +uint64_t dup_mig_pages_transferred(void);
> +uint64_t norm_mig_bytes_transferred(void);
> +uint64_t norm_mig_pages_transferred(void);
> +uint64_t xbrle_mig_bytes_transferred(void);
> +uint64_t xbrle_mig_pages_transferred(void);
> +uint64_t xbrle_mig_pages_overflow(void);
> +uint64_t xbrle_mig_pages_cache_lookup(void);
> +uint64_t xbrle_mig_pages_cache_hit(void);
> +
> int64_t cpu_get_ticks(void);
> void cpu_enable_ticks(void);
> void cpu_disable_ticks(void);
> @@ -74,7 +84,8 @@ void qemu_announce_self(void);
> void main_loop_wait(int nonblocking);
>
> int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
> - int shared);
> + int shared, int use_xbrle,
> + int64_t xbrle_cache_size);
> int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
> int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
> void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
> diff --git a/xbzrle.c b/xbzrle.c
> new file mode 100644
> index 0000000..e9285e0
> --- /dev/null
> +++ b/xbzrle.c
> @@ -0,0 +1,126 @@
> +#include <stdint.h>
> +#include <string.h>
> +#include <assert.h>
> +#include "cpu-all.h"
> +#include "xbzrle.h"
> +
> +typedef struct {
> + uint64_t c;
> + uint64_t num;
> +} zero_encoding_t;
> +
> +typedef struct {
> + uint64_t c;
> +} char_encoding_t;
> +
> +static int rle_encode(uint64_t *in, int slen, uint8_t *out, const int dlen)
> +{
> + int dl = 0;
> + uint64_t cp = 0, c, run_len = 0;
> +
> + if (slen <= 0)
> + return -1;
> +
> + while (1) {
> + if (!slen)
> + break;
> + c = *in++;
> + slen--;
> + if (!(cp || c)) {
> + run_len++;
> + } else if (!cp) {
> + ((zero_encoding_t *)out)->c = cp;
> + ((zero_encoding_t *)out)->num = run_len;
> + dl += sizeof(zero_encoding_t);
> + out += sizeof(zero_encoding_t);
> + run_len = 1;
> + } else {
> + ((char_encoding_t *)out)->c = cp;
> + dl += sizeof(char_encoding_t);
> + out += sizeof(char_encoding_t);
> + }
> + cp = c;
> + }
> +
> + if (!cp) {
> + ((zero_encoding_t *)out)->c = cp;
> + ((zero_encoding_t *)out)->num = run_len;
> + dl += sizeof(zero_encoding_t);
> + out += sizeof(zero_encoding_t);
> + } else {
> + ((char_encoding_t *)out)->c = cp;
> + dl += sizeof(char_encoding_t);
> + out += sizeof(char_encoding_t);
> + }
> + return dl;
> +}
> +
> +static int rle_decode(const uint8_t *in, int slen, uint64_t *out, int dlen)
> +{
> + int tb = 0;
> + uint64_t run_len, c;
> +
> + while (slen > 0) {
> + c = ((char_encoding_t *) in)->c;
> + if (c) {
> + slen -= sizeof(char_encoding_t);
> + in += sizeof(char_encoding_t);
> + *out++ = c;
> + tb++;
> + continue;
> + }
> + run_len = ((zero_encoding_t *) in)->num;
> + slen -= sizeof(zero_encoding_t);
> + in += sizeof(zero_encoding_t);
> + while (run_len-- > 0) {
> + *out++ = c;
> + tb++;
> + }
> + }
> + return tb;
> +}
> +
> +static void xor_encode_word(uint8_t *dst, const uint8_t *src1,
> + const uint8_t *src2)
> +{
> + int len = TARGET_PAGE_SIZE / sizeof (uint64_t);
> + uint64_t *dstw = (uint64_t *) dst;
> + const uint64_t *srcw1 = (const uint64_t *) src1;
> + const uint64_t *srcw2 = (const uint64_t *) src2;
> +
> + while (len--) {
> + *dstw++ = *srcw1++ ^ *srcw2++;
> + }
> +}
> +
> +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
> + const size_t max_compressed_len)
> +{
> + int compressed_len;
> + uint8_t xor_buf[TARGET_PAGE_SIZE];
> + uint8_t work_buf[TARGET_PAGE_SIZE * 2]; /* worst case xbzrle is 150% */
> +
> + xor_encode_word(xor_buf, old, curr);
> + compressed_len = rle_encode((uint64_t *)xor_buf,
> + sizeof(xor_buf)/sizeof(uint64_t), work_buf,
> + sizeof(work_buf));
> + if (compressed_len > max_compressed_len) {
> + return -1;
> + }
> + memcpy(xbzrle, work_buf, compressed_len);
> + return compressed_len;
> +}
> +
> +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const uint8_t *xbrle,
> + const size_t compressed_len)
> +{
> + uint8_t xor_buf[TARGET_PAGE_SIZE];
> +
> + int len = rle_decode(xbrle, compressed_len,
> + (uint64_t *)xor_buf, sizeof(xor_buf)/sizeof(uint64_t));
> + if (len < 0) {
> + return len;
> + }
> + xor_encode_word(curr, old, xor_buf);
> + return len * sizeof(uint64_t);
> +}
> diff --git a/xbzrle.h b/xbzrle.h
> new file mode 100644
> index 0000000..5d625a0
> --- /dev/null
> +++ b/xbzrle.h
> @@ -0,0 +1,12 @@
> +#ifndef _XBZRLE_H_
> +#define _XBZRLE_H_
> +
> +#include <stdio.h>
> +
> +int xbzrle_encode(uint8_t *xbrle, const uint8_t *old, const uint8_t *curr,
> + const size_t len);
> +int xbzrle_decode(uint8_t *curr, const uint8_t *old, const uint8_t *xbrle,
> + const size_t len);
> +
> +#endif
> +
>