[Libvir] [PATCH] virDomainMigrate version 4 (for discussion only!)

This is version 4 of the virDomainMigrate patch. It includes remote support, but no qemu, virsh or python yet. And it needs lots more testing which I intend to do more of tomorrow.

Interface
---------

The interface is now this:

  virDomainPtr
  virDomainMigrate (virDomainPtr domain, virConnectPtr dconn,
                    unsigned long flags, const char *dname,
                    const char *uri, unsigned long resource);

The caller may set dname, uri and resource to 0/NULL and forget about them. Or else the caller may set, in particular, uri to allow for more complicated migration strategies (especially for qemu).

https://www.redhat.com/archives/libvir-list/2007-July/msg00249.html

Driver support
--------------

As outlined in the diagram in this email:
https://www.redhat.com/archives/libvir-list/2007-July/msg00264.html
migration happens in two stages.

Firstly we send a "prepare" message to the destination host. The destination host may reply with a cookie. It may also suggest a URI (in the current Xen implementation it just returns gethostname). Secondly we send a "perform" message to the source host.

Correspondingly, there are two new driver API functions:

  typedef int
  (*virDrvDomainMigratePrepare) (virConnectPtr dconn,
                                 char **cookie, int *cookielen,
                                 const char *uri_in, char **uri_out,
                                 unsigned long flags, const char *dname,
                                 unsigned long resource);

  typedef int
  (*virDrvDomainMigratePerform) (virDomainPtr domain,
                                 const char *cookie, int cookielen,
                                 const char *uri, unsigned long flags,
                                 const char *dname, unsigned long resource);

Remote support
--------------

To make this work in the remote case I have had to export two private functions from the API which only remote should call:

  __virDomainMigratePrepare
  __virDomainMigratePerform

The reason for this is that libvirtd is just linked to the regular libvirt.so, so can only make calls into libvirt through exported symbols in the dynamic symbol table.

There are two corresponding wire messages (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument shuffling, albeit rather complicated because of the number of arguments passed in and out.

The complete list of messages which go across the wire during a migration is:

  client --      prepare       --> destination host
  client <--   prepare reply   --  destination host
  client --      perform       --> source host
  client <--   perform reply   --  source host
  client --    lookupbyname    --> destination host
  client <-- lookupbyname reply --  destination host

Xen URIs
--------

Xen recognises the following forms of URI:

  hostname
  hostname:port
  tcp://hostname/
  tcp://hostname:port/

Capabilities
------------

I have extended capabilities with <migration_features>. For Xen this is:

  <capabilities>
    <host>
      <migration_features>
        <live/>
        <uri_transports>
          <uri_transport>tcp</uri_transport>
        </uri_transports>
      </migration_features>

Rich.
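For readers who want to see the call in context, here is a minimal sketch of how a management client might drive the proposed API. The connection URIs and domain name are placeholders, not part of the patch; passing NULL/0 for dname, uri and resource corresponds to the "forget about them" case described above.

  /* Sketch only: a client of the proposed virDomainMigrate API.
   * The two connection URIs and the domain name are placeholders. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <libvirt/libvirt.h>

  int
  main (void)
  {
      virConnectPtr src, dst;
      virDomainPtr dom, newdom;

      src = virConnectOpen ("xen://source.example.com/");
      dst = virConnectOpen ("xen://dest.example.com/");
      if (!src || !dst) exit (EXIT_FAILURE);

      dom = virDomainLookupByName (src, "mydomain");
      if (!dom) exit (EXIT_FAILURE);

      /* NULL dname/uri and 0 resource: let libvirt pick the defaults. */
      newdom = virDomainMigrate (dom, dst, VIR_MIGRATE_LIVE, NULL, NULL, 0);
      if (!newdom) {
          fprintf (stderr, "migration failed\n");
          exit (EXIT_FAILURE);
      }

      virDomainFree (newdom);
      virDomainFree (dom);
      virConnectClose (dst);
      virConnectClose (src);
      return 0;
  }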

Attached is an updated patch which adds a "virsh migrate" subcommand. I also rebuilt the API & Python binding using the generator.

Two known problems with the auto-generated Python binding at the moment:

(1) It's attached to the connection object (conn.migrate (...)) instead of the domain object (dom.migrate (...)). It looks like the generator is confused about a function which takes both a domain ptr and a connect ptr.

(2) I need to get the generator to export the VIR_MIGRATE_LIVE flag. At the moment you need to pass 0|1 for the flag parameter.

Rich.

On Thu, Jul 19, 2007 at 05:52:35PM +0100, Richard W.M. Jones wrote:
Attached is an updated patch which adds a "virsh migrate" subcommand.
patch looks fine to me. That will allow testing :-)
I also rebuilt the API & Python binding using the generator.
Two known problems with the auto-generated Python binding at the moment:
(1) It's attached to the connection object (conn.migrate (...)) instead of the domain object (dom.migrate (...)). It looks like the generator is confused about a function which takes both a domain ptr and a connect ptr.
Hum, maybe I could look at what I did for libxml2; usually the function can then attach to both classes. I wonder why it's different here.
(2) I need to get the generator to export the VIR_MIGRATE_LIVE flag. At the moment you need to pass 0|1 for the flag parameter.
Strange, macros should show up, at least they do at the XML level (e.g. <exports symbol='VIR_DOMAIN_SCHED_FIELD_LENGTH' type='macro'/>). Maybe that's something I did in the libxml2 generator but didn't backport to libvirt.

Daniel

Richard W.M. Jones wrote:
Attached is an updated patch which adds a "virsh migrate" subcommand.
I also rebuilt the API & Python binding using the generator.
Two known problems with the auto-generated Python binding at the moment:
(1) It's attached to the connection object (conn.migrate (...)) instead of the domain object (dom.migrate (...)). It looks like the generator is confused about a function which takes both a domain ptr and a connect ptr.
Actually the above isn't right. The generator makes two copies of the function:

  class virDomain:
      def migrate(self, dconn, flags, dname, uri, bandwidth):

and:

  class virConnect:
      def migrate(self, domain, flags, dname, uri, bandwidth):

The first (virDomain.migrate) binding unfortunately still contains a problem: the generator doesn't understand that the domain object which is returned exists in the scope of the destination connection (dconn, not conn). This was solved in the general case when I added the virDomainGetConnect[1] function, so I will change the Python generator accordingly in a future patch.

The second (virConnect.migrate) binding is correct, although a little peculiar. You would have to use it like this: dconn.migrate (domain, etc.). If people are happy with this, I will leave it.
(2) I need to get the generator to export the VIR_MIGRATE_LIVE flag. At the moment you need to pass 0|1 for the flag parameter.
Added this to the upcoming (version 5) patch.

Rich.

[1] http://www.redhat.com/archives/libvir-list/2007-June/thread.html#00365

Some observations about Xen migration and error handling.

The Xen migration protocol isn't stable between releases. It changed between 3.0.3 and 3.1.0. There doesn't seem to be any versioning, and incompatible versions of Xen seem happy to attempt migrations between them, even though these will certainly fail.

The source host's xend forks an xc_save process. It appears to me that xc_save will happily write to _anything_ listening on port 8002, even if that thing closes the socket prematurely. (Try running 'xm migrate' on one host and at the same time 'nc -l 8002 > /dev/null' on the target host. The 'xm migrate' will happily complete without error. Meanwhile the domain you "migrated" just gets deleted.)

Partly because of the lack of error reporting, and partly because the xend -> xc_save fork will make error reporting difficult to add, libvirt has a hard time displaying errors in this situation. It is quite possible to call virDomainMigrate and get a domain back which shortly afterwards "disappears", all without any indication of error.

Rich.

On Thu, Jul 19, 2007 at 06:03:29PM +0100, Richard W.M. Jones wrote:
Some observations about Xen migration and error handling.
It's a horrible mess. I'm in a large group of people who want to see this all fixed (with yet another break) at the same time as secure migration is implemented... regards, john

On Thu, Jul 19, 2007 at 06:03:29PM +0100, Richard W.M. Jones wrote:
Some observations about Xen migration and error handling.
The Xen migration protocol isn't stable between releases. It changed between 3.0.3 and 3.1.0. There doesn't seem to be any versioning, and incompatible versions of Xen seem happy to attempt migrations between them, even though these will certainly fail.
The source host's xend forks an xc_save process. It appears to me that xc_save will happily write to _anything_ listening on port 8002, even if that thing closes the socket prematurely. (Try running 'xm migrate' on one host and at the same time 'nc -l 8002 > /dev/null' on the target host. The 'xm migrate' will happily complete without error. Meanwhile the domain you "migrated" just gets deleted.)
Partly because of the lack of error reporting, and partly because the xend -> xc_save fork will make error reporting difficult to add, libvirt has a hard time displaying errors in this situation. It is quite possible to call virDomainMigrate and get a domain back which shortly afterwards "disappears", all without any indication of error.
We need to be careful about where we draw the line here. We can jump through all sorts of hoops in libvirt, but at the end of the day there is some majorly broken stuff in Xen that really just needs fixing rather than working around. I'd rather submit fixes to upstream Xen where needed to make migration more reliable than put too much complexity into libvirt, even if it means we have more limited error reporting for current XenD.

The mailing list archive link eludes me right now, but upstream Xen was reasonably receptive to the idea of bringing xc_save/restore back into the XenD process, which would resolve a huge class of error reporting problems. The original motivation for making them separate processes was that the code was fragile and crashed XenD a fair bit, but that's likely no longer a problem.

Dan.

On Wed, Jul 18, 2007 at 08:22:37PM +0100, Richard W.M. Jones wrote:
This is version 4 of the virDomainMigrate patch. It includes remote support, but no qemu, virsh or python yet. And it needs lots more testing which I intend to do more of tomorrow.
Interface ---------
The interface is now this:
virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn, unsigned long flags, const char *dname, const char *uri, unsigned long resource);
I would probably rename resource to 'bandwidth'; it's more precise.
The caller may set dname, uri and resource to 0/NULL and forget about them. Or else the caller may set, in particular, uri to allow for more complicated migration strategies (especially for qemu).
https://www.redhat.com/archives/libvir-list/2007-July/msg00249.html
Sounds fine, I would have loved a simpler API though.
As outlined in the diagram in this email: https://www.redhat.com/archives/libvir-list/2007-July/msg00264.html migration happens in two stages.
To some extent that's "implementation details" compared to the API, but I understand why two steps are needed under the hood, fine by me.
Firstly we send a "prepare" message to the destination host. The destination host may reply with a cookie. It may also suggest a URI (in the current Xen implementation it just returns gethostname). Secondly we send a "perform" message to the source host.
Correspondingly, there are two new driver API functions:
typedef int (*virDrvDomainMigratePrepare) (virConnectPtr dconn, char **cookie, int *cookielen, const char *uri_in, char **uri_out, unsigned long flags, const char *dname, unsigned long resource);
typedef int (*virDrvDomainMigratePerform) (virDomainPtr domain, const char *cookie, int cookielen, const char *uri, unsigned long flags, const char *dname, unsigned long resource);
I wonder if 2 steps are really sufficient. I have the feeling that a third step virDrvDomainMigrateFinish() might be needed, it could for example resume on the target side and also verify the domain is actually okay. That could improve error handling and feels a lot more like a transactional system where you really want an atomic work/fail operation and nothing else.
There are two corresponding wire messages (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument shuffling, albeit rather complicated because of the number of arguments passed in and out.
The complete list of messages which go across the wire during a migration is:
  client --      prepare       --> destination host
  client <--   prepare reply   --  destination host
  client --      perform       --> source host
  client <--   perform reply   --  source host
  client --    lookupbyname    --> destination host
  client <-- lookupbyname reply --  destination host
Okay, instead of trying to reuse lookupbyname to assert completion, I would rather make a third special entry point. Sounds more generic to me, but again it's an implementation point, not a blocker, all this is hidden behind the API.
Xen URIs --------
Xen recognises the following forms of URI:

  hostname
  hostname:port
  tcp://hostname/
  tcp://hostname:port/
From an URI perspective "tcp" is not proper IMHO, it should be a protocol name, make that xenmigr or something.
Capabilities ------------
I have extended capabilities with <migration_features>. For Xen this is:
  <capabilities>
    <host>
      <migration_features>
        <live/>
        <uri_transports>
          <uri_transport>tcp</uri_transport>
        </uri_transports>
      </migration_features>
Nice, but what is the expected set of values for uri_transport? Theoretically that can be any scheme name from RFC 2396 (or later):

  scheme = alpha *( alpha | digit | "+" | "-" | "." )

unless I misunderstood this. [...]
+/* Domain migration flags. */
+#define VIR_MIGRATE_LIVE 1
+
+/* Domain migration. */
+virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn,
+                               unsigned long flags, const char *dname,
+                               const char *uri, unsigned long resource);
I would rename resource to bandwidth :-) [...]
 #define REMOTE_CPUMAPS_MAX 16384
+#define REMOTE_MIGRATE_COOKIE_MAX 256
hum, what is that? Sorry to show my ignorance, feel free to point me to an XDR FAQ!
@@ -632,6 +663,8 @@ enum remote_procedure {
     REMOTE_PROC_DOMAIN_GET_SCHEDULER_PARAMETERS = 57,
     REMOTE_PROC_DOMAIN_SET_SCHEDULER_PARAMETERS = 58,
     REMOTE_PROC_GET_HOSTNAME = 59,
+    REMOTE_PROC_DOMAIN_MIGRATE_PERFORM = 60,
+    REMOTE_PROC_DOMAIN_MIGRATE_PREPARE = 61,
 };
I would like to see a distinct third step FINISH, as I think it would be more generic. [...]
+ * virDomainMigrate:
+ * @domain: a domain object
+ * @dconn: destination host (a connection object)
+ * @flags: flags
+ * @dname: (optional) rename domain to this at destination
+ * @uri: (optional) dest hostname/URI as seen from the source host
+ * @resource: (optional) specify resource limit in Mbps
+ *
+ * Migrate the domain object from its current host to the destination
+ * host given by dconn (a connection to the destination host).
+ *
+ * Flags may be one of more of the following:
+ *   VIR_MIGRATE_LIVE   Attempt a live migration.
+ *
+ * If a hypervisor supports renaming domains during migration,
+ * then you may set the dname parameter to the new name (otherwise
+ * it keeps the same name). If this is not supported by the
+ * hypervisor, dname must be NULL or else you will get an error.
hum, sounds a bit drastic to me, but might be the right thing. In any case the capability to rename should be added to the <capabilities><migration_features> probably as a <rename/> optional element.
+ * Since typically the two hypervisors connect directly to each
+ * other in order to perform the migration, you may need to specify
+ * a path from the source to the destination. This is the purpose
+ * of the uri parameter. If uri is NULL, then libvirt will try to
+ * find the best method. Uri may specify the hostname or IP address
+ * of the destination host as seen from the source. Or uri may be
+ * a URI giving transport, hostname, user, port, etc. in the usual
+ * form. Refer to driver documentation for the particular URIs
+ * supported.
+ *
+ * The maximum bandwidth (in Mbps) that will be used to do migration
+ * can be specified with the resource parameter. If set to 0,
+ * libvirt will choose a suitable default. Some hypervisors do
+ * not support this feature and will return an error if resource
+ * is not 0.
Do you really want to fail there too? Similarly the capability to limit bandwidth should be added to the <capabilities><migration_features>, possibly as a <bandwidth/> optional element.
+ * To see which features are supported by the current hypervisor,
+ * see virConnectGetCapabilities, /capabilities/host/migration_features.
+ *
+ * There are many limitations on migration imposed by the underlying
+ * technology - for example it may not be possible to migrate between
+ * different processors even with the same architecture, or between
+ * different types of hypervisor.
+ *
+ * Returns the new domain object if the migration was successful,
+ * or NULL in case of error.
+ */
+virDomainPtr
+virDomainMigrate (virDomainPtr domain,
+                  virConnectPtr dconn,
+                  unsigned long flags,
+                  const char *dname,
+                  const char *uri,
+                  unsigned long resource)
+{
+    virConnectPtr conn;
+    char *uri_out = NULL;
+    char *cookie = NULL;
+    int cookielen = 0, ret;
+    DEBUG("domain=%p, dconn=%p, flags=%lu, dname=%s, uri=%s, resource=%lu",
+          domain, dconn, flags, dname, uri, resource);
+
+    if (!VIR_IS_DOMAIN (domain)) {
+        virLibDomainError(domain, VIR_ERR_INVALID_DOMAIN, __FUNCTION__);
+        return NULL;
+    }
+    conn = domain->conn; /* Source connection. */
+    if (!VIR_IS_CONNECT (dconn)) {
+        virLibConnError (conn, VIR_ERR_INVALID_CONN, __FUNCTION__);
+        return NULL;
+    }
+
+    /* Check that migration is supported.
+     * Note that in the remote case these functions always exist.
+     * But the remote driver will throw NO_SUPPORT errors later.
+     */
+    if (!conn->driver->domainMigratePerform ||
+        !dconn->driver->domainMigratePrepare) {
+        virLibConnError (conn, VIR_ERR_NO_SUPPORT, __FUNCTION__);
+        return NULL;
+    }
+
+    /* Prepare the migration.
+     *
+     * The destination host may return a cookie, or leave cookie as
+     * NULL.
+     *
+     * The destination host MUST set uri_out if uri_in is NULL.
+     *
+     * If uri_in is non-NULL, then the destination host may modify
+     * the URI by setting uri_out. If it does not wish to modify
+     * the URI, it should leave uri_out as NULL.
+     */
+    ret = dconn->driver->domainMigratePrepare
+        (dconn, &cookie, &cookielen, uri, &uri_out, flags, dname, resource);
+    if (ret == -1) return NULL;
+    if (uri == NULL && uri_out == NULL) {
+        virLibConnError (conn, VIR_ERR_INTERNAL_ERROR,
+                         "domainMigratePrepare did not set uri");
+        if (cookie) free (cookie);
+        return NULL;
+    }
+    if (uri_out) uri = uri_out; /* Did domainMigratePrepare change URI? */
+
+    assert (uri != NULL);
+
+    /* Perform the migration. The driver isn't supposed to return
+     * until the migration is complete.
+     */
+    ret = conn->driver->domainMigratePerform
+        (domain, cookie, cookielen, uri, flags, dname, resource);
+
+    if (uri_out) free (uri_out);
+    if (cookie) free (cookie);
+
+    if (ret == -1) return NULL;
+
+    /* Get the destination domain and return it or error. */
+    return virDomainLookupByName (dconn, dname ? dname : domain->name);
+}
--- src/xen_internal.c 12 Jul 2007 08:57:52 -0000 1.85
+++ src/xen_internal.c 18 Jul 2007 19:00:20 -0000
@@ -2173,6 +2173,12 @@ xenHypervisorMakeCapabilitiesXML(virConn
 "\
 </features>\n\
 </cpu>\n\
+    <migration_features>\n\
+      <live/>\n\
Looks fine, but a finish operation returning a virDomainPtr may be better. It would carry the cookie and names to identify the migration. Crazy question: what happens if the remote task gets migrated away again, or just killed, before the lookup? :-) I'm not sure we can come up with an atomic operation in the general case, but it would be nice to allow for it in the design if the hypervisor makes it possible. [...] plus <bandwidth/> <rename/>
+      <uri_transports>\n\
+        <uri_transport>tcp</uri_transport>\n\
+      </uri_transports>\n\
+    </migration_features>\n\
 </host>\n", -1);
     if (r == -1) goto vir_buffer_failed;

[...]

+    /* Xen (at least up to 3.1.0) takes a resource parameter but
+     * ignores it.
+     */
+    if (resource) {
+        virXendError (conn, VIR_ERR_NO_SUPPORT,
+                      "xenDaemonDomainMigrate: Xen does not support resource limits during migration");
+        return -1;
+    }
Hum, strange, the classical Xen "bandwidth during live migration of http" picture uses a 100Mbps capped transfer, so that ought to have worked at least at some point in the past, weird. [...]
+    /* Set hostname and port.
+     *
+     * URI is non-NULL (guaranteed by caller). We expect either
+     * "hostname", "hostname:port" or "tcp://hostname[:port]/".
+     */
+    if (strstr (uri, "//")) {   /* Full URI. */
+        xmlURIPtr uriptr = xmlParseURI (uri);
+        if (!uriptr) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: invalid URI");
+            return -1;
+        }
+        if (uriptr->scheme && STRCASENEQ (uriptr->scheme, "tcp")) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: only tcp:// migrations are supported by Xen");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (!uriptr->server) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: a hostname must be specified in the URI");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        hostname = strdup (uriptr->server);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (uriptr->port)
+            snprintf (port, sizeof port, "%d", uriptr->port);
+        xmlFreeURI (uriptr);
+    }
+    else if ((p = strrchr (uri, ':')) != NULL) {   /* "hostname:port" */
+        int port_nr, n;
+
+        if (sscanf (p+1, "%d", &port_nr) != 1) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: invalid port number");
+            return -1;
+        }
+        snprintf (port, sizeof port, "%d", port_nr);
+
+        /* Get the hostname. */
+        n = p - uri;   /* n = Length of hostname in bytes. */
+        hostname = strdup (uri);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            return -1;
+        }
+        hostname[n] = '\0';
+    }
+    else {   /* "hostname" (or IP address) */
+        hostname = strdup (uri);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            return -1;
+        }
+    }
Hum, we say it's a URI, but we interpret it differently if it's not absolute; this could lead to confusion. But I'm not sure being a purist here would help that much from a user POV.

In general this looks really good to me. I'm just worried about the 3rd part of the operation. I don't think it would be that hard to add, but it would IMHO make the protocol at the network level more future-proof; if we could avoid breaking it in the future, that would be nice to put in now. But that's not a blocker to me.

thanks !

Daniel
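To make the two-stage prepare/perform flow discussed above concrete, here is a purely illustrative sketch of a driver-side prepare callback that suggests a default URI from gethostname and returns no cookie. It is not the Xen implementation from the patch; everything below is an assumption made for the example.

  /* Illustrative only: a hypothetical driver-side "prepare" along the
   * lines described above.  Not the actual Xen code from the patch. */
  #include <string.h>
  #include <unistd.h>
  #include <libvirt/libvirt.h>

  static int
  exampleDomainMigratePrepare (virConnectPtr dconn,
                               char **cookie, int *cookielen,
                               const char *uri_in, char **uri_out,
                               unsigned long flags,
                               const char *dname,
                               unsigned long resource)
  {
      /* Nothing needs to be carried back to the source in this sketch. */
      *cookie = NULL;
      *cookielen = 0;

      /* If the caller gave no URI, suggest one based on our hostname. */
      if (uri_in == NULL) {
          char name[256];
          if (gethostname (name, sizeof name) == -1)
              return -1;
          *uri_out = strdup (name);
          if (*uri_out == NULL)
              return -1;
      } else {
          *uri_out = NULL;        /* keep the URI the caller passed in */
      }

      (void) dconn; (void) flags; (void) dname; (void) resource;
      return 0;
  }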

Daniel Veillard wrote:
virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn, unsigned long flags, const char *dname, const char *uri, unsigned long resource);
I would probably rename resource as 'bandwidth' it's more precise.
Yes, I think you're right.
The caller may set dname, uri and resource to 0/NULL and forget about them. Or else the caller may set, in particular, uri to allow for more complicated migration strategies (especially for qemu).
https://www.redhat.com/archives/libvir-list/2007-July/msg00249.html
Sounds fine, I would have loved a simpler API though.
Me too. However I think that all the parameters have their place. The only change I might suggest would be to bring back the list of optional parameters (see the first version of the API), and move things like flags, dname and resource^Wbandwidth there. (The URI however is quite fundamental). I don't know what you think of that. Having lists of parameters is flexible but in its own way somewhat complicated.
As outlined in the diagram in this email: https://www.redhat.com/archives/libvir-list/2007-July/msg00264.html migration happens in two stages.
To some extend that's "implementation details" compared to the API but I understand why two steps are needed under the hood, fine by me.
Firstly we send a "prepare" message to the destination host. The destination host may reply with a cookie. It may also suggest a URI (in the current Xen implementation it just returns gethostname). Secondly we send a "perform" message to the source host.
Correspondingly, there are two new driver API functions:
typedef int (*virDrvDomainMigratePrepare) (virConnectPtr dconn, char **cookie, int *cookielen, const char *uri_in, char **uri_out, unsigned long flags, const char *dname, unsigned long resource);
typedef int (*virDrvDomainMigratePerform) (virDomainPtr domain, const char *cookie, int cookielen, const char *uri, unsigned long flags, const char *dname, unsigned long resource);
I wonder if 2 steps are really sufficient. I have the feeling that a third step virDrvDomainMigrateFinish() might be needed, it could for example resume on the target side and also verify the domain is actually okay. That could improve error handling and feels a lot more like a transactional system where you really want an atomic work/fail operation and nothing else.
Yes. It's important to note that the internals may be changed later, although that may complicate things if we want to allow people to run incompatible versions of the daemons. [Another email on that subject is coming up after this one.]

I'm not sure exactly how what you propose would work in the Xen case. In the common error case (incompatible xend leading to domains being eaten), the domain is actually created for a short while on the dest host. It dies later, but it seems possible to me that there is a race where domainMigrateFinish could return "OK", and yet the domain would fail later. In another case -- where the domain could not connect to its iSCSI backend -- this could be even more common.

Note also that there is a third step in the current code. After domainMigrate has finished, the controlling client then does "virDomainLookupByName" in order to fetch the destination domain object. This is subject to the race conditions in the preceding paragraph.
There are two corresponding wire messages (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument shuffling, albeit rather complicated because of the number of arguments passed in and out.
The complete list of messages which go across the wire during a migration is:
  client --      prepare       --> destination host
  client <--   prepare reply   --  destination host
  client --      perform       --> source host
  client <--   perform reply   --  source host
  client --    lookupbyname    --> destination host
  client <-- lookupbyname reply --  destination host
Okay, instead of trying to reuse lookupbyname to assert completion, I would rather make a third special entry point. Sounds more generic to me, but again it's an implementation point, not a blocker, all this is hidden behind the API.
Noted.
Xen URIs --------
Xen recognises the following forms of URI:

  hostname
  hostname:port
  tcp://hostname/
  tcp://hostname:port/
From an URI perspective "tcp" is not proper IMHO, it should be a protocol name, make that xenmigr or something.
OK.
Capabilities ------------
I have extended capabilities with <migration_features>. For Xen this is:
  <capabilities>
    <host>
      <migration_features>
        <live/>
        <uri_transports>
          <uri_transport>tcp</uri_transport>
        </uri_transports>
      </migration_features>
Nice, but what is the expected set of values for uri_transport ? Theorically that can be any scheme name from rfc2396 (or later) scheme = alpha *( alpha | digit | "+" | "-" | "." )
unless I misunderstood this.
I think I'm misunderstanding you. In the Xen case it would be changed to <uri_transport>xenmigr</uri_transport>.
[...]
+/* Domain migration flags. */
+#define VIR_MIGRATE_LIVE 1
+
+/* Domain migration. */
+virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn,
+                               unsigned long flags, const char *dname,
+                               const char *uri, unsigned long resource);
I would rename resource to bandwidth :-)
[...]
 #define REMOTE_CPUMAPS_MAX 16384
+#define REMOTE_MIGRATE_COOKIE_MAX 256
hum, what is that ? Sorry to show up ignorance, feel free to point me to an xdr FAQ !
In XDR you can either have unlimited strings (well, the limit is 2^32-1), or you can impose a limit on their length. Some common types in XDR notation:

  Unlimited strings:       string foo<>;
  Limited-length strings:  string foo<1000>;
  Fixed-length strings:    char foo[1000];
  Byte arrays:             opaque foo[1000];

Now if we just defined all strings as <> type, then the automatically-built XDR receivers would accept unlimited amounts of data from a buggy or malicious client. In particular they would attempt to allocate large amounts of memory, crashing the server[1]. So instead we impose upper limits on the length of various strings. They are defined at the top of qemud/remote_protocol.x.
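For illustration, this is roughly the shape of the C decoders that rpcgen emits for bounded versus unbounded strings (a sketch, not the exact generated code); the third argument to xdr_string() is the maximum length the receiver will accept.

  /* Sketch of rpcgen-style XDR decoders (illustrative only). */
  #include <rpc/rpc.h>

  #define REMOTE_MIGRATE_COOKIE_MAX 256

  bool_t
  xdr_bounded_cookie (XDR *xdrs, char **cookie)
  {
      /* Bounded: a peer cannot make us allocate more than 256 bytes. */
      return xdr_string (xdrs, cookie, REMOTE_MIGRATE_COOKIE_MAX);
  }

  bool_t
  xdr_unbounded_string (XDR *xdrs, char **s)
  {
      /* Unbounded ("string foo<>;"): accepts up to 2^32-1 bytes. */
      return xdr_string (xdrs, s, ~0);
  }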
@@ -632,6 +663,8 @@ enum remote_procedure {
     REMOTE_PROC_DOMAIN_GET_SCHEDULER_PARAMETERS = 57,
     REMOTE_PROC_DOMAIN_SET_SCHEDULER_PARAMETERS = 58,
     REMOTE_PROC_GET_HOSTNAME = 59,
+    REMOTE_PROC_DOMAIN_MIGRATE_PERFORM = 60,
+    REMOTE_PROC_DOMAIN_MIGRATE_PREPARE = 61,
 };
I would like to see a distinct third step FINISH, as I think it would be more generic.
[...]
+ * virDomainMigrate:
+ * @domain: a domain object
+ * @dconn: destination host (a connection object)
+ * @flags: flags
+ * @dname: (optional) rename domain to this at destination
+ * @uri: (optional) dest hostname/URI as seen from the source host
+ * @resource: (optional) specify resource limit in Mbps
+ *
+ * Migrate the domain object from its current host to the destination
+ * host given by dconn (a connection to the destination host).
+ *
+ * Flags may be one of more of the following:
+ *   VIR_MIGRATE_LIVE   Attempt a live migration.
+ *
+ * If a hypervisor supports renaming domains during migration,
+ * then you may set the dname parameter to the new name (otherwise
+ * it keeps the same name). If this is not supported by the
+ * hypervisor, dname must be NULL or else you will get an error.
hum, sounds a bit drastic to me, but might be the right thing. In any case the capability to rename should be added to the <capabilities><migration_features> probably as a <rename/> optional element.
+ * Since typically the two hypervisors connect directly to each
+ * other in order to perform the migration, you may need to specify
+ * a path from the source to the destination. This is the purpose
+ * of the uri parameter. If uri is NULL, then libvirt will try to
+ * find the best method. Uri may specify the hostname or IP address
+ * of the destination host as seen from the source. Or uri may be
+ * a URI giving transport, hostname, user, port, etc. in the usual
+ * form. Refer to driver documentation for the particular URIs
+ * supported.
+ *
+ * The maximum bandwidth (in Mbps) that will be used to do migration
+ * can be specified with the resource parameter. If set to 0,
+ * libvirt will choose a suitable default. Some hypervisors do
+ * not support this feature and will return an error if resource
+ * is not 0.
Do you really want to fail there too ?
I'm not sure ... It seemed like the safest thing to do since we are unable to enforce the requested limit so Bad Things might happen (effectively a DoS on the network).
Similary the capability to limit bandwidth should be added to the <capabilities><migration_features> possibly as a <bandwidth/> optional element.
Yes. Note that although xend takes and indeed requires a resource parameter, the implementation in xen 3.1 completely ignores it. For this reason, <bandwidth/> is _not_ a xen 3.1 capability.
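A hedged sketch of how a client could test for that capability before passing a non-zero bandwidth, assuming the <bandwidth/> element discussed here ends up under /capabilities/host/migration_features (the helper name is invented for the example):

  /* Sketch: check the proposed <migration_features> capability before
   * passing a non-zero bandwidth limit. */
  #include <stdlib.h>
  #include <string.h>
  #include <libvirt/libvirt.h>
  #include <libxml/parser.h>
  #include <libxml/xpath.h>

  static int
  supports_migrate_bandwidth (virConnectPtr conn)
  {
      char *caps = virConnectGetCapabilities (conn);
      xmlDocPtr doc;
      xmlXPathContextPtr ctxt;
      xmlXPathObjectPtr obj;
      int ret = 0;

      if (!caps) return 0;
      doc = xmlReadMemory (caps, strlen (caps), "capabilities.xml", NULL, 0);
      free (caps);
      if (!doc) return 0;

      ctxt = xmlXPathNewContext (doc);
      obj = xmlXPathEval
          (BAD_CAST "/capabilities/host/migration_features/bandwidth", ctxt);
      if (obj && obj->type == XPATH_NODESET &&
          obj->nodesetval && obj->nodesetval->nodeNr > 0)
          ret = 1;                /* element present: bandwidth supported */

      xmlXPathFreeObject (obj);
      xmlXPathFreeContext (ctxt);
      xmlFreeDoc (doc);
      return ret;
  }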
+ * To see which features are supported by the current hypervisor,
+ * see virConnectGetCapabilities, /capabilities/host/migration_features.
+ *
+ * There are many limitations on migration imposed by the underlying
+ * technology - for example it may not be possible to migrate between
+ * different processors even with the same architecture, or between
+ * different types of hypervisor.
+ *
+ * Returns the new domain object if the migration was successful,
+ * or NULL in case of error.
+ */
+virDomainPtr
+virDomainMigrate (virDomainPtr domain,
+                  virConnectPtr dconn,
+                  unsigned long flags,
+                  const char *dname,
+                  const char *uri,
+                  unsigned long resource)
+{
+    virConnectPtr conn;
+    char *uri_out = NULL;
+    char *cookie = NULL;
+    int cookielen = 0, ret;
+    DEBUG("domain=%p, dconn=%p, flags=%lu, dname=%s, uri=%s, resource=%lu",
+          domain, dconn, flags, dname, uri, resource);
+
+    if (!VIR_IS_DOMAIN (domain)) {
+        virLibDomainError(domain, VIR_ERR_INVALID_DOMAIN, __FUNCTION__);
+        return NULL;
+    }
+    conn = domain->conn; /* Source connection. */
+    if (!VIR_IS_CONNECT (dconn)) {
+        virLibConnError (conn, VIR_ERR_INVALID_CONN, __FUNCTION__);
+        return NULL;
+    }
+
+    /* Check that migration is supported.
+     * Note that in the remote case these functions always exist.
+     * But the remote driver will throw NO_SUPPORT errors later.
+     */
+    if (!conn->driver->domainMigratePerform ||
+        !dconn->driver->domainMigratePrepare) {
+        virLibConnError (conn, VIR_ERR_NO_SUPPORT, __FUNCTION__);
+        return NULL;
+    }
+
+    /* Prepare the migration.
+     *
+     * The destination host may return a cookie, or leave cookie as
+     * NULL.
+     *
+     * The destination host MUST set uri_out if uri_in is NULL.
+     *
+     * If uri_in is non-NULL, then the destination host may modify
+     * the URI by setting uri_out. If it does not wish to modify
+     * the URI, it should leave uri_out as NULL.
+     */
+    ret = dconn->driver->domainMigratePrepare
+        (dconn, &cookie, &cookielen, uri, &uri_out, flags, dname, resource);
+    if (ret == -1) return NULL;
+    if (uri == NULL && uri_out == NULL) {
+        virLibConnError (conn, VIR_ERR_INTERNAL_ERROR,
+                         "domainMigratePrepare did not set uri");
+        if (cookie) free (cookie);
+        return NULL;
+    }
+    if (uri_out) uri = uri_out; /* Did domainMigratePrepare change URI? */
+
+    assert (uri != NULL);
+
+    /* Perform the migration. The driver isn't supposed to return
+     * until the migration is complete.
+     */
+    ret = conn->driver->domainMigratePerform
+        (domain, cookie, cookielen, uri, flags, dname, resource);
+
+    if (uri_out) free (uri_out);
+    if (cookie) free (cookie);
+
+    if (ret == -1) return NULL;
+
+    /* Get the destination domain and return it or error. */
+    return virDomainLookupByName (dconn, dname ? dname : domain->name);
+}
Looks fine, but a finish operation returning a virDomainPtr may be better. It would carry the cookie and names to identify the migration. Crazy question, what happens if the remote task gets migrated away again or just killed before the lookup :-) , I'm not sure we can come up with an atomic operation in the general case, but that would be nice to allow it in the design if the hypervisor makes it possible.
--- src/xen_internal.c 12 Jul 2007 08:57:52 -0000 1.85
+++ src/xen_internal.c 18 Jul 2007 19:00:20 -0000
@@ -2173,6 +2173,12 @@ xenHypervisorMakeCapabilitiesXML(virConn
 "\
 </features>\n\
 </cpu>\n\
+    <migration_features>\n\
+      <live/>\n\
[...] plus <bandwidth/> <rename/>
+      <uri_transports>\n\
+        <uri_transport>tcp</uri_transport>\n\
+      </uri_transports>\n\
+    </migration_features>\n\
 </host>\n", -1);
     if (r == -1) goto vir_buffer_failed;

[...]

+    /* Xen (at least up to 3.1.0) takes a resource parameter but
+     * ignores it.
+     */
+    if (resource) {
+        virXendError (conn, VIR_ERR_NO_SUPPORT,
+                      "xenDaemonDomainMigrate: Xen does not support resource limits during migration");
+        return -1;
+    }
Hum, strange, the classical Xen bandwidth during live migration of http picture uses a 100Mbps capped transfer, so that ought to work at least at some point in the past, weird.
[...]
+    /* Set hostname and port.
+     *
+     * URI is non-NULL (guaranteed by caller). We expect either
+     * "hostname", "hostname:port" or "tcp://hostname[:port]/".
+     */
+    if (strstr (uri, "//")) {   /* Full URI. */
+        xmlURIPtr uriptr = xmlParseURI (uri);
+        if (!uriptr) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: invalid URI");
+            return -1;
+        }
+        if (uriptr->scheme && STRCASENEQ (uriptr->scheme, "tcp")) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: only tcp:// migrations are supported by Xen");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (!uriptr->server) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: a hostname must be specified in the URI");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        hostname = strdup (uriptr->server);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (uriptr->port)
+            snprintf (port, sizeof port, "%d", uriptr->port);
+        xmlFreeURI (uriptr);
+    }
+    else if ((p = strrchr (uri, ':')) != NULL) {   /* "hostname:port" */
+        int port_nr, n;
+
+        if (sscanf (p+1, "%d", &port_nr) != 1) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: invalid port number");
+            return -1;
+        }
+        snprintf (port, sizeof port, "%d", port_nr);
+
+        /* Get the hostname. */
+        n = p - uri;   /* n = Length of hostname in bytes. */
+        hostname = strdup (uri);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            return -1;
+        }
+        hostname[n] = '\0';
+    }
+    else {   /* "hostname" (or IP address) */
+        hostname = strdup (uri);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            return -1;
+        }
+    }
Hum, we say it's an uri, but we interpret is differently if it's not absolute this could lead to confusion. But I'm not sure being a purist here would help that much from an user POV.
In general this looks really good to me. I'm just worried about the 3rd part of the operation, I don't think it would be that hard to add, but would IMHO make the protocol at the network level more future-proof, if we could avoid breaking it in the future that woud be nice to put in now. But that's not a blocker to me.
Rich. [1] Actually there are various other measures to prevent this from happening, including a maximum total message size that we will accept, but defense in depth ...

On Mon, Jul 23, 2007 at 11:00:21AM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote:
Firstly we send a "prepare" message to the destination host. The destination host may reply with a cookie. It may also suggest a URI (in the current Xen implementation it just returns gethostname). Secondly we send a "perform" message to the source host.
Correspondingly, there are two new driver API functions:
typedef int (*virDrvDomainMigratePrepare) (virConnectPtr dconn, char **cookie, int *cookielen, const char *uri_in, char **uri_out, unsigned long flags, const char *dname, unsigned long resource);
typedef int (*virDrvDomainMigratePerform) (virDomainPtr domain, const char *cookie, int cookielen, const char *uri, unsigned long flags, const char *dname, unsigned long resource);
I wonder if 2 steps are really sufficient. I have the feeling that a third step virDrvDomainMigrateFinish() might be needed, it could for example resume on the target side and also verify the domain is actually okay. That could improve error handling and feels a lot more like a transactional system where you really want an atomic work/fail operation and nothing else.
Yes. It's important to note that the internals may be changed later, although that may complicate things if we want to allow people to run incompatible versions of the daemons. [Another email on that subject is coming up after this one].
I'm not sure how exactly what you propose would work in the Xen case. In the common error case (incompatible xend leading to domains being eaten), the domain is actually created for a short while on the dest host. It dies later, but it seems possible to me that there is a race where domainMigrateFinish could return "OK", and yet the domain would fail later. In another case -- where the domain could not connect to its iSCSI backend -- this could be even more common.
Note also that there is a third step in the current code. After domainMigrate has finished, the controlling client then does "virDomainLookupByName" in order to fetch the destination domain object. This is subject to the race conditions in the preceeding paragraph.
Yes, my point is that other migration implementations may need a more complex third step, for example to activate the domain on the target after the transfer of the data in step 2. In the case of Xen the semantics of the migrate command include the restart on the other side, but that may not always be the case. In the case of Xen, then, the backend can just call virDomainLookupByName on the remote target node.
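Purely as an illustration of this suggestion, one possible shape for such a third driver entry point is sketched below. The name comes from the discussion above; the exact signature and the way the coordination function would call it are invented here, not part of the patch.

  /* Hypothetical only: a possible third "finish" step. */
  #include <libvirt/libvirt.h>

  typedef virDomainPtr
  (*virDrvDomainMigrateFinish) (virConnectPtr dconn,
                                const char *dname,
                                const char *cookie,
                                int cookielen,
                                const char *uri,
                                unsigned long flags);

  /* The coordination function in src/libvirt.c could then end with
   * something like:
   *
   *     if (dconn->driver->domainMigrateFinish)
   *         return dconn->driver->domainMigrateFinish
   *             (dconn, dname ? dname : domain->name,
   *              cookie, cookielen, uri, flags);
   *     return virDomainLookupByName (dconn, dname ? dname : domain->name);
   *
   * i.e. the finish step resumes/verifies the domain on the target and
   * returns the new virDomainPtr, instead of a bare lookupbyname. */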
There are two corresponding wire messages (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument shuffling, albeit rather complicated because of the number of arguments passed in and out.
The complete list of messages which go across the wire during a migration is:
  client --      prepare       --> destination host
  client <--   prepare reply   --  destination host
  client --      perform       --> source host
  client <--   perform reply   --  source host
  client --    lookupbyname    --> destination host
  client <-- lookupbyname reply --  destination host
Okay, instead of trying to reuse lookupbyname to assert completion, I would rather make a third special entry point. Sounds more generic to me, but again it's an implementation point, not a blocker, all this is hidden behind the API.
Noted.
At some point we should make the network protocol part of the ABI, but let's try to avoid breaking it too often :-)
Capabilities ------------
I have extended capabilities with <migration_features>. For Xen this is:
  <capabilities>
    <host>
      <migration_features>
        <live/>
        <uri_transports>
          <uri_transport>tcp</uri_transport>
        </uri_transports>
      </migration_features>
Nice, but what is the expected set of values for uri_transport ? Theorically that can be any scheme name from rfc2396 (or later) scheme = alpha *( alpha | digit | "+" | "-" | "." )
unless I misunderstood this.
I think I'm misunderstanding you. In the Xen case it would be changed to <uri_transport>xenmigr</uri_transport>.
yes, if you think it's okay.
 #define REMOTE_CPUMAPS_MAX 16384
+#define REMOTE_MIGRATE_COOKIE_MAX 256
hum, what is that ? Sorry to show up ignorance, feel free to point me to an xdr FAQ !
In XDR you can either have unlimited strings (well, the limit is 2^32-1), or you can impose a limit on their length. Some common types in XDR notation:
  Unlimited strings:       string foo<>;
  Limited-length strings:  string foo<1000>;
  Fixed-length strings:    char foo[1000];
  Byte arrays:             opaque foo[1000];
Now if we just defined all strings as <> type, then the automatically-built XDR receivers would accept unlimited amounts of data from a buggy or malicious client. In particular they would attempt to allocate large amounts of memory, crashing the server[1]. So instead we impose upper limits on the length of various strings. They are defined at the top of qemud/remote_protocol.x.
okay, thanks for the explanations :-)
+ * Since typically the two hypervisors connect directly to each
+ * other in order to perform the migration, you may need to specify
+ * a path from the source to the destination. This is the purpose
+ * of the uri parameter. If uri is NULL, then libvirt will try to
+ * find the best method. Uri may specify the hostname or IP address
+ * of the destination host as seen from the source. Or uri may be
+ * a URI giving transport, hostname, user, port, etc. in the usual
+ * form. Refer to driver documentation for the particular URIs
+ * supported.
+ *
+ * The maximum bandwidth (in Mbps) that will be used to do migration
+ * can be specified with the resource parameter. If set to 0,
+ * libvirt will choose a suitable default. Some hypervisors do
+ * not support this feature and will return an error if resource
+ * is not 0.
Do you really want to fail there too ?
I'm not sure ... It seemed like the safest thing to do since we are unable to enforce the requested limit so Bad Things might happen (effectively a DoS on the network).
Okay, it's probably better to avoid any fuzziness in the definition of the API, and then error in that case. I'm convinced !
Similary the capability to limit bandwidth should be added to the <capabilities><migration_features> possibly as a <bandwidth/> optional element.
Yes. Note that although xend takes and indeed requires a resource parameter, the implementation in xen 3.1 completely ignores it. For this reason, <bandwidth/> is _not_ a xen 3.1 capability.
okay thanks !

Daniel

On Mon, Jul 23, 2007 at 11:00:21AM +0100, Richard W.M. Jones wrote:
step virDrvDomainMigrateFinish() might be needed, it could for example resume on the target side and also verify the domain is actually okay. That could improve error handling and feels a lot more like a transactional system where you really want an atomic work/fail operation and nothing else.
Yes. It's important to note that the internals may be changed later, although that may complicate things if we want to allow people to run incompatible versions of the daemons. [Another email on that subject is coming up after this one].
We need to treat the binary on-the-wire ABI for the client<->daemon protocol the same way as our public ELF library ABI: never break it, only add to it. So if we think the client<->server internals may need to change, we should avoid including this code in a release until we're more satisfied with its longevity.

Regards, Dan.

Daniel P. Berrange wrote:
On Mon, Jul 23, 2007 at 11:00:21AM +0100, Richard W.M. Jones wrote:
step virDrvDomainMigrateFinish() might be needed, it could for example resume on the target side and also verify the domain is actually okay. That could improve error handling and feels a lot more like a transactional system where you really want an atomic work/fail operation and nothing else. Yes. It's important to note that the internals may be changed later, although that may complicate things if we want to allow people to run incompatible versions of the daemons. [Another email on that subject is coming up after this one].
We need to treat the binary on-the-wire ABI for the client<->daemon the same way as our public ELF library ABI. Never break it, only add to it. So if we think the internals client<->server needs may change we should avoid including this code in a release until we're more satisfied with its longevity.
Yesterday I wrote: "It's important to note that the internals may be changed later, although that may complicate things if we want to allow people to run incompatible versions of the daemons. [Another email on that subject is coming up after this one]."

When I actually looked into it, it turned out to be trickier and less useful than I thought, so the promised email never came.

At the moment the function src/libvirt.c: virDomainMigrate is a kind of "coordination function", in that it coordinates the steps necessary (ie. domainMigratePrepare -> domainMigratePerform -> etc.). One of the things which this coordination function does first is to check that the drivers support migration at all. The code is:

  /* Check that migration is supported.
   * Note that in the remote case these functions always exist.
   * But the remote driver will throw NO_SUPPORT errors later.
   */
  if (!conn->driver->domainMigratePerform ||
      !dconn->driver->domainMigratePrepare) {
      virLibConnError (conn, VIR_ERR_NO_SUPPORT, __FUNCTION__);
      return NULL;
  }

As noted in the comment, this test is only functional in the local case. The remote driver registers functions for every driver entrypoint.

So I began to think about having an "is_available" function. The test above would be rewritten:

  if (!is_available (conn->driver->domainMigratePerform) ||
      !is_available (dconn->driver->domainMigratePrepare)) {
      ...
  }

and the idea is that is_available expands to a straightforward non-NULL test for local drivers, but something more complicated if the driver is the remote driver.

Implementing is_available turns out to be less than simple however. I had some ideas along these lines:

  #define is_available(drv,fn) \
      (!(drv)->_available \
       ? (drv)->(fn) \
       : (drv)->_available((drv,offsetof(typeof(drv),(fn)))))

but this is wrong in several ways: (1) it relies on the GCC typeof extension, (2) it's not easy to write the _available function to do the right thing across different word sizes or structure packing (which might happen at the two ends of the remote connection).

What seems to be better is some sort of private driver capabilities function. We might use it like this:

  if (!conn->driver->supports_feature (VIR_DRV_MIGRATION_V1) ||
      !dconn->driver->supports_feature (VIR_DRV_MIGRATION_V1)) {
      ...
  }

(Note that this is a completely private interface.) In the remote case, the supports_feature call is passed through to the end driver.

Now if we decide that the current internal implementation of virDomainMigrate is bogus, we can add the extra driver calls for a new migration implementation, then support both old and new by changing the virDomainMigrate coordination function like so:

  if (conn->driver->supports_feature (VIR_DRV_MIGRATION_V2) &&
      dconn->driver->supports_feature (VIR_DRV_MIGRATION_V2)) {
      // shiny new implementation of migration
      // ...
  } else if (conn->driver->supports_feature (VIR_DRV_MIGRATION_V1) &&
             dconn->driver->supports_feature (VIR_DRV_MIGRATION_V1)) {
      // bogus old implementation of migration
      // ...
  } else {
      // no compatible implementation
      virLibConnError (conn, VIR_ERR_NO_SUPPORT, __FUNCTION__);
      return NULL;
  }

The above allows us to do migrations from old libvirtd -> old libvirtd, or from new libvirtd -> new libvirtd. Plausibly we could add the other cases as well, depending upon how the new migration implementation worked.

So that is my proposal. No code, but obviously simple to implement.

Rich.
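As an illustration only, the per-driver side of such a check might look like the following; the enum values are the feature names used in the proposal above, while the function name and numeric values are made up for the sketch.

  /* Sketch of a per-driver feature check (names illustrative only). */
  #include <libvirt/libvirt.h>

  enum {
      VIR_DRV_MIGRATION_V1 = 1,   /* current prepare/perform scheme */
      VIR_DRV_MIGRATION_V2 = 2,   /* some future, revised scheme */
  };

  /* A driver that only implements the current scheme would answer: */
  static int
  exampleSupportsFeature (virConnectPtr conn, int feature)
  {
      (void) conn;
      switch (feature) {
      case VIR_DRV_MIGRATION_V1:
          return 1;
      default:
          return 0;
      }
  }

The remote driver would implement the same entry point by forwarding the feature number over the wire and returning whatever the far end answers.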

On Fri, Jul 20, 2007 at 10:58:24AM -0400, Daniel Veillard wrote:
+    if (resource) {
+        virXendError (conn, VIR_ERR_NO_SUPPORT,
+                      "xenDaemonDomainMigrate: Xen does not support resource limits during migration");
+        return -1;
+    }
Hum, strange, the classical Xen bandwidth during live migration of http picture uses a 100Mbps capped transfer, so that ought to work at least at some point in the past, weird.
Yeah, it used to be implemented, but got dropped somewhere along the line, probably in the Xen 2 -> Xen 3 re-write. I wouldn't be surprised if it comes back again when migrate in XenD is next touched.
+    if (strstr (uri, "//")) {   /* Full URI. */
+        xmlURIPtr uriptr = xmlParseURI (uri);
+        if (!uriptr) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: invalid URI");
+            return -1;
+        }
+        if (uriptr->scheme && STRCASENEQ (uriptr->scheme, "tcp")) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: only tcp:// migrations are supported by Xen");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (!uriptr->server) {
+            virXendError (conn, VIR_ERR_INVALID_ARG,
+                          "xenDaemonDomainMigrate: a hostname must be specified in the URI");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        hostname = strdup (uriptr->server);
+        if (!hostname) {
+            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
+            xmlFreeURI (uriptr);
+            return -1;
+        }
+        if (uriptr->port)
+            snprintf (port, sizeof port, "%d", uriptr->port);
+        xmlFreeURI (uriptr);
+    }
+    else if ((p = strrchr (uri, ':')) != NULL) {   /* "hostname:port" */
[snip]
+ else { /* "hostname" (or IP address) */
[snip]
Hum, we say it's an uri, but we interpret is differently if it's not absolute this could lead to confusion. But I'm not sure being a purist here would help that much from an user POV.
I agree - the fact that the 'uri' to virConnectOpen doesn't technically have to always be a URI (eg, NULL, or Xen, or xen) is a major cause of pain in virt-manager, since we have to special-case parsing of it rather than just handing off to a generic URI parser module. We should mandate well-formed URIs for the migrate API, where well-formed is defined to be whatever libxml is able to parse :-)

Regards, Dan.

On Tue, Jul 24, 2007 at 01:33:08AM +0100, Daniel P. Berrange wrote:
On Fri, Jul 20, 2007 at 10:58:24AM -0400, Daniel Veillard wrote:
Hum, we say it's an uri, but we interpret is differently if it's not absolute this could lead to confusion. But I'm not sure being a purist here would help that much from an user POV.
I agree - the fact that the 'uri' to virConnectOpen doesn't technically have to always be a URI (eg, NULL, or Xen, or xen) is a major cause of pain virt-manager since we have to special case parsing of it, rather than just handing off to a generic URI parser module. We should mandate wellformed URIs for the migrate API, where wellformed is defined to be whatever libxml is able to parse :-)
libxml2 is pedantically following RFC 2396, though at some point I will need to update to its successor, RFC 3986.

Daniel

Daniel P. Berrange wrote:
On Fri, Jul 20, 2007 at 10:58:24AM -0400, Daniel Veillard wrote:
Hum, we say it's an uri, but we interpret is differently if it's not absolute this could lead to confusion. But I'm not sure being a purist here would help that much from an user POV.
I agree - the fact that the 'uri' to virConnectOpen doesn't technically have to always be a URI (eg, NULL, or Xen, or xen) is a major cause of pain virt-manager since we have to special case parsing of it, rather than just handing off to a generic URI parser module. We should mandate wellformed URIs for the migrate API, where wellformed is defined to be whatever libxml is able to parse :-)
I'd be a bit happier if libxml2 could parse a bare string like "hostname" and "hostname:1234" more like my browser does. At the moment this is what it does:

  hostname       --->  scheme=(null)    server=(null)  port=0  path=hostname  query_raw=(null)
  hostname:1234  --->  scheme=hostname  server=(null)  port=0  path=(null)    query_raw=(null)

(Original program attached.)

I'm sure there's some smartypants standards reason for it, but it's counterintuitive to me.

Rich.
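The attached program itself is not reproduced in the archive; a minimal sketch of that kind of test, printing how xmlParseURI splits each argument, would be something like:

  /* Not the attached program -- just a sketch of the same kind of test. */
  #include <stdio.h>
  #include <libxml/uri.h>

  int
  main (int argc, char **argv)
  {
      int i;
      for (i = 1; i < argc; i++) {
          xmlURIPtr uri = xmlParseURI (argv[i]);
          if (!uri) {
              printf ("%s ---> parse error\n", argv[i]);
              continue;
          }
          printf ("%s ---> scheme=%s server=%s port=%d path=%s\n",
                  argv[i],
                  uri->scheme ? uri->scheme : "(null)",
                  uri->server ? uri->server : "(null)",
                  uri->port,
                  uri->path ? uri->path : "(null)");
          xmlFreeURI (uri);
      }
      return 0;
  }

Build with something like: gcc test-uri.c -o test-uri `xml2-config --cflags --libs` (the file name is a placeholder).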

On Tue, Jul 24, 2007 at 01:25:31PM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
I agree - the fact that the 'uri' to virConnectOpen doesn't technically have to always be a URI (eg, NULL, or Xen, or xen) is a major cause of pain virt-manager since we have to special case parsing of it, rather than just handing off to a generic URI parser module. We should mandate wellformed URIs for the migrate API, where wellformed is defined to be whatever libxml is able to parse :-)
I'd be a bit happier if libxml2 could parse a bare string like "hostname" and "hostname:1234" more like my browser does. At the moment this is what it does:
With my libxml2 maintainer hat on: no. What libxml2 provides is URI handling.
hostname ---> scheme=(null) server=(null) port=0 path=hostname query_raw=(null)
makes no sense. From an URI perspective "hostname" is a valid URI Reference, it could not be interpreted in any other way.
hostname:1234 ---> scheme=hostname server=(null) port=0 path=(null) query_raw=(null)
No better: "hostname:1234" is also a valid URI Reference.
I'm sure there's some smartypants standards reason for it, but it's counterintuitive to me.
Your browser in general interprets "hostname" as a URI Reference (hopefully), but the small window where a user can type something relates more to guessing where you tried to direct him than to any sensible and predictable behaviour. Again, sorry, no - it makes no sense from a libxml2 perspective. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, Jul 18, 2007 at 08:22:37PM +0100, Richard W.M. Jones wrote:
This is version 4 of the virDomainMigrate patch. It includes remote support, but no qemu, virsh or python yet. And it needs lots more testing which I intend to do more of tomorrow.
Interface
---------
The interface is now this:
virDomainPtr virDomainMigrate (virDomainPtr domain,
                               virConnectPtr dconn,
                               unsigned long flags,
                               const char *dname,
                               const char *uri,
                               unsigned long resource);
The caller may set dname, uri and resource to 0/NULL and forget about them. Or else the caller may set, in particular, uri to allow for more complicated migration strategies (especially for qemu).
https://www.redhat.com/archives/libvir-list/2007-July/msg00249.html
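[For illustration only - this is not part of the patch - a caller taking all the defaults might look roughly like this; the destination connection URI and the assumption that NULL is returned on failure are mine:]

  /* Hypothetical caller sketch.  'dconn' is an open connection to the
   * destination host, eg virConnectOpen ("xen://desthost/") - that URI is
   * illustrative only. */
  #include <libvirt/libvirt.h>

  static virDomainPtr
  migrate_with_defaults (virDomainPtr dom, virConnectPtr dconn)
  {
      return virDomainMigrate (dom, dconn,
                               VIR_MIGRATE_LIVE,  /* flags: or 0 for non-live */
                               NULL,              /* dname: keep the same name */
                               NULL,              /* uri: let the driver decide */
                               0);                /* resource: default */
  }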
Driver support
--------------
As outlined in the diagram in this email: https://www.redhat.com/archives/libvir-list/2007-July/msg00264.html migration happens in two stages.
Firstly we send a "prepare" message to the destination host. The destination host may reply with a cookie. It may also suggest a URI (in the current Xen implementation it just returns gethostname). Secondly we send a "perform" message to the source host.
Correspondingly, there are two new driver API functions:
typedef int
    (*virDrvDomainMigratePrepare) (virConnectPtr dconn,
                                   char **cookie,
                                   int *cookielen,
                                   const char *uri_in,
                                   char **uri_out,
                                   unsigned long flags,
                                   const char *dname,
                                   unsigned long resource);

typedef int
    (*virDrvDomainMigratePerform) (virDomainPtr domain,
                                   const char *cookie,
                                   int cookielen,
                                   const char *uri,
                                   unsigned long flags,
                                   const char *dname,
                                   unsigned long resource);
To make this work in the remote case I have had to export two private functions from the API which only remote should call:

  __virDomainMigratePrepare
  __virDomainMigratePerform

The reason for this is that libvirtd is just linked to the regular libvirt.so, so can only make calls into libvirt through exported symbols in the dynamic symbol table.
There are two corresponding wire messages (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument shuffling, albeit rather complicated because of the number of arguments passed in and out.
The complete list of messages which go across the wire during a migration is:
client -- prepare --> destination host
client <-- prepare reply -- destination host
client -- perform --> source host
client <-- perform reply -- source host
client -- lookupbyname --> destination host
client <-- lookupbyname reply -- destination host
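[To make the sequencing concrete, here is a rough sketch - not the patch's actual code - of how the top-level call might drive the two driver entry points and the final lookup. The helper name and the use of the driver typedefs as plain function pointers are assumptions; those typedefs live in the internal driver header, so this fragment is purely illustrative:]

  /* Sketch only.  'prepare' and 'perform' stand for whatever driver entry
   * points end up being dispatched. */
  #include <stdlib.h>
  #include <libvirt/libvirt.h>

  static virDomainPtr
  migrate_sequence (virDomainPtr domain, virConnectPtr dconn,
                    virDrvDomainMigratePrepare prepare,
                    virDrvDomainMigratePerform perform,
                    unsigned long flags, const char *dname,
                    const char *uri, unsigned long resource)
  {
      char *cookie = NULL, *uri_out = NULL;
      int cookielen = 0;
      virDomainPtr newdom = NULL;

      /* 1. "prepare" on the destination: may return a cookie and suggest
       *    a URI (the current Xen impl just returns gethostname). */
      if (prepare (dconn, &cookie, &cookielen, uri, &uri_out,
                   flags, dname, resource) < 0)
          goto out;

      /* 2. "perform" on the source, passing back the cookie and the
       *    destination-suggested URI, if any. */
      if (perform (domain, cookie, cookielen,
                   uri_out ? uri_out : uri, flags, dname, resource) < 0)
          goto out;

      /* 3. Look the guest up by name on the destination so the caller
       *    gets a virDomainPtr for the migrated domain. */
      newdom = virDomainLookupByName (dconn,
                                      dname ? dname : virDomainGetName (domain));

  out:
      free (cookie);
      free (uri_out);
      return newdom;
  }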
I wonder whether we should allow for multiple back-and-forths between prepare and perform. The 'perform' method could return a tri-state - success, failure or continue. In the latter case it would call prepare on the destination again before calling perform on the source a second time, eg you might end up with an exchange like:

1. --> client calls prepare at destination
2. <-- destination returns some metadata
3. --> client calls perform at source with metadata
4. <-- source returns 'continue'
5. --> client calls prepare at destination again
6. <-- destination returns more metadata
7. --> client calls perform at source with more metadata
8. <-- source returns 'success'

In fact, thinking about DV's suggestion that we should have a 'finalize' method at the destination: if we generalized just one step more we could handle this. Have just 2 methods:

virDrvDomainMigrateSource
virDrvDomainMigrateDestination

Both methods return a tri-state, 'complete', 'failed', 'continue'. The impl of virDomainMigrate starts with virDrvDomainMigrateDestination, then calls virDrvDomainMigrateSource, then calls virDrvDomainMigrateDestination again, and then virDrvDomainMigrateSource again, etc, etc. The sequence stops when both virDrvDomainMigrateSource & virDrvDomainMigrateDestination have returned 'complete', or one of them has returned 'failure'. In the case of one returning 'failure' it is possible we might still need to talk to the other end to initiate a recovery process.

Finally, the 'cookie' param is currently an 'out' parameter for prepare, and an 'in' parameter for perform. If we allowed for the arbitrary back-and-forth, then I think the cookie should probably be an 'in' and 'out' parameter for both calls - to allow them to pass metadata in both directions.

Maybe this is all overkill, and hopefully no impl would ever need more than a single call to either method, but I'm worried that we're designing this wire level protocol with a single exchange+cookie, based on a rough idea for a potential future impl in Xen which has not actually been implemented yet. I think it could be useful future-proofing to allow for the arbitrary back-and-forth at the wire level, so a future impl could make use of it without needing to break client <-> daemon compatibility.

Alternatively, we could hold off on adding a migrate API to libvirt, and get involved in upstream Xen to sort out secure migration for XenD. Then come back and finalize the libvirt design based on more knowledge of what we're going to be talking to.
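[A sketch of the loop that tri-state proposal implies might look like the following; the enum and function names are purely illustrative and do not exist anywhere in libvirt:]

  /* Illustrative only: tri-state driver methods plus the driving loop
   * described above.  None of these names exist in libvirt. */
  #include <libvirt/libvirt.h>

  typedef enum {
      MIGRATE_COMPLETE,
      MIGRATE_FAILED,
      MIGRATE_CONTINUE
  } migrateStatus;

  /* Hypothetical entry points; both can read and update the cookie. */
  migrateStatus migrateDestination (virConnectPtr dconn, char **cookie, int *cookielen);
  migrateStatus migrateSource      (virDomainPtr domain, char **cookie, int *cookielen);

  static int
  do_migrate_loop (virDomainPtr domain, virConnectPtr dconn)
  {
      char *cookie = NULL;
      int cookielen = 0;
      migrateStatus dst = MIGRATE_CONTINUE, src = MIGRATE_CONTINUE;

      /* Alternate destination/source calls until both report 'complete',
       * or either reports 'failed'. */
      while (dst == MIGRATE_CONTINUE || src == MIGRATE_CONTINUE) {
          if (dst != MIGRATE_COMPLETE) {
              dst = migrateDestination (dconn, &cookie, &cookielen);
              if (dst == MIGRATE_FAILED)
                  return -1;   /* recovery on the source may still be needed */
          }
          if (src != MIGRATE_COMPLETE) {
              src = migrateSource (domain, &cookie, &cookielen);
              if (src == MIGRATE_FAILED)
                  return -1;   /* recovery on the destination may still be needed */
          }
      }
      return 0;
  }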
I have extended capabilities with <migration_features>. For Xen this is:
<capabilities>
  <host>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
Makes a lot of sense. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
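[As a usage note on the capabilities side: a client wanting to know whether live migration is available could fetch the capabilities XML and look for <live/> under <migration_features>. A crude sketch, with strstr standing in for real XML parsing:]

  /* Crude sketch: does the host advertise live migration?  Real code
   * would parse the capabilities XML properly instead of using strstr. */
  #include <stdlib.h>
  #include <string.h>
  #include <libvirt/libvirt.h>

  static int
  host_supports_live_migration (virConnectPtr conn)
  {
      char *caps = virConnectGetCapabilities (conn);
      int live;

      if (caps == NULL)
          return -1;
      live = strstr (caps, "<migration_features>") != NULL &&
             strstr (caps, "<live/>") != NULL;
      free (caps);
      return live;
  }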

On Tue, Jul 24, 2007 at 01:56:08AM +0100, Daniel P. Berrange wrote:
On Wed, Jul 18, 2007 at 08:22:37PM +0100, Richard W.M. Jones wrote: [...] I wonder whether we should allow for multiple back-and-forths between prepare and perform. The 'perform' method could return a tri-state - success, failure or continue. In the latter case it would call prepare on the destination again before calling perform on the source a second time, eg you might end up with an exchange like:
1. --> client calls prepare at destination
2. <-- destination returns some metadata
3. --> client calls perform at source with metadata
4. <-- source returns 'continue'
5. --> client calls prepare at destination again
6. <-- destination returns more metadata
7. --> client calls perform at source with more metadata
8. <-- source returns 'success'
In fact, thinking about DV's suggestion that we should have a 'finalize' method at the destination: if we generalized just one step more we could handle this. Have just 2 methods:
virDrvDomainMigrateSource
virDrvDomainMigrateDestination
Both methods return a tri-state, 'complete', 'failed', 'continue'. The impl of virDomainMigrate starts with virDrvDomainMigrateDestination, then calls virDrvDomainMigrateSource, then calls virDrvDomainMigrateDestination again, and then virDrvDomainMigrateSource again, etc, etc. The sequence stops when both virDrvDomainMigrateSource & virDrvDomainMigrateDestination have returned 'complete', or one of them has returned 'failure'.
Interesting. This would mean we could implement a low level transfer protocol within libvirt, bypassing the hypervisor's own support :-) I still think you need a function which finalizes and returns the handle to the new domain in a possibly atomic fashion. Calling a lookup introduces a hazard IMHO; it may fail even if the migration succeeded.
Maybe this is all overkill, and hopefully no impl would ever need more than a single call to either method, but I'm worried that we're designing this wire level protocol with a single exchange+cookie, based on a rough idea for a potential future impl in Xen which has not actually been implemented yet.
I think it could be useful future-proofing to allow for the arbitrary back-and-forth at the wire level, so a future impl could make use of it without needing to break client <-> daemon compatibility.
Alternatively, we could hold off on adding a migrate API to libvirt, and get involved in upstream Xen to sort out secure migration for XenD. Then come back and finalize the libvirt design based on more knowledge of what we're going to be talking to.
Let's be frank, I wasn't thinking about a future new Xen migration impl. I was thinking about the possibility of implementing an in-libvirt migration based on save/transfer/restore, and of being able to make this an atomic, transactional operation. Those were the two kinds of use I was thinking of. W.r.t. the protocol ABI, I agree we should really be careful about it; I think we should just state that, for migration, at this point we may not have a stable protocol, and it may break in a future release. I think that's reasonable at the moment. I would really like to make a new release today or tomorrow to try to push the version for testing into the Fedora 8 test 1 beta. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
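[For what it's worth, the save/transfer/restore idea could be sketched with existing public calls roughly as follows; copy_file() is a hypothetical helper (scp, shared storage, whatever), and the atomic/transactional part is exactly the piece omitted here:]

  /* Rough sketch of an in-libvirt migration via save/transfer/restore.
   * copy_file() is hypothetical; error recovery and atomicity are omitted. */
  #include <libvirt/libvirt.h>

  static int copy_file (const char *src_path, const char *dst_path);  /* hypothetical */

  static int
  migrate_by_save_restore (virDomainPtr dom, virConnectPtr dconn,
                           const char *src_path, const char *dst_path)
  {
      /* 1. Suspend the domain to a file on the source host. */
      if (virDomainSave (dom, src_path) < 0)
          return -1;

      /* 2. Transfer the image to the destination host (hypothetical helper). */
      if (copy_file (src_path, dst_path) < 0)
          return -1;

      /* 3. Restore it through the destination connection. */
      if (virDomainRestore (dconn, dst_path) < 0)
          return -1;

      return 0;
  }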

Daniel P. Berrange wrote:
I wonder whether we should allow for multiple back-and-forths between prepare and perform. The 'perform' method could return a tri-state - success, failure or continue. In the latter case it would call prepare on the destination again before calling perform on the source a second time, eg you might end up with an exchange like:
1. --> client calls prepare at destination
2. <-- destination returns some metadata
3. --> client calls perform at source with metadata
4. <-- source returns 'continue'
5. --> client calls prepare at destination again
6. <-- destination returns more metadata
7. --> client calls perform at source with more metadata
8. <-- source returns 'success'
[...]
I sense we're allowing this to grow into a private protocol between the source and destination. In reality, no current migration system that we support will use anything more than a single call to domainMigratePrepare followed by a single call to domainMigratePerform (eg. for qemu, you need to start a listener at the destination, then at the source send a few console commands). Xen only uses domainMigratePerform, but may possibly in future use a single call to domainMigratePrepare (but there isn't even code for this, never mind acceptance from upstream).
Alternatively, we could hold off on adding a migrate API to libvirt, and get involved in upstream Xen to sort out secure migration for XenD. Then come back and finalize the libvirt design based on more knowledge of what we're going to be talking to.
Now this I totally agree with. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903