On Mon, Jul 23, 2007 at 11:00:21AM +0100, Richard W.M. Jones wrote:
Daniel Veillard wrote:
>>Firstly we send a "prepare" message to the destination host. The
>>destination host may reply with a cookie. It may also suggest a URI (in
>>the current Xen implementation it just returns gethostname). Secondly
>>we send a "perform" message to the source host.
>>
>>Correspondingly, there are two new driver API functions:
>>
>> typedef int
>> (*virDrvDomainMigratePrepare)
>> (virConnectPtr dconn,
>> char **cookie,
>> int *cookielen,
>> const char *uri_in,
>> char **uri_out,
>> unsigned long flags,
>> const char *dname,
>> unsigned long resource);
>>
>> typedef int
>> (*virDrvDomainMigratePerform)
>> (virDomainPtr domain,
>> const char *cookie,
>> int cookielen,
>> const char *uri,
>> unsigned long flags,
>> const char *dname,
>> unsigned long resource);
>
> I wonder if 2 steps are really sufficient. I have the feeling that a
>third step virDrvDomainMigrateFinish() might be needed; it could for
>example resume on the target side and also verify the domain is actually
>okay. That could improve error handling and feels a lot more like a
>transactional system where you really want an atomic work/fail operation
>and nothing else.
> Yes. It's important to note that the internals may be changed later,
> although that may complicate things if we want to allow people to run
> incompatible versions of the daemons. [Another email on that subject is
> coming up after this one.]
> I'm not sure exactly how what you propose would work in the Xen case.
> In the common error case (an incompatible xend leading to domains being
> eaten), the domain is actually created for a short while on the dest
> host. It dies later, but it seems possible to me that there is a race
> where domainMigrateFinish could return "OK", and yet the domain would
> fail later. In another case -- where the domain could not connect to
> its iSCSI backend -- this could be even more common.
> Note also that there is a third step in the current code. After
> domainMigrate has finished, the controlling client then does
> "virDomainLookupByName" in order to fetch the destination domain object.
> This is subject to the race conditions in the preceding paragraph.
  Yes, my point is that other migration schemes may need a more complex
third step, for example to activate the domain on the target after the
transfer of the data in step 2. In the case of Xen, the semantics of the
migrate command include the restart on the other side, but that may not
always be the case. In the case of Xen the backend can then just call
virDomainLookupByName on the remote target node.
>>There are two corresponding wire messages
>>(REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and
>>REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument
>>shuffling, albeit rather complicated because of the number of arguments
>>passed in and out.
>>
>>The complete list of messages which go across the wire during a
>>migration is:
>>
>> client -- prepare --> destination host
>> client <-- prepare reply -- destination host
>> client -- perform --> source host
>> client <-- perform reply -- source host
>> client -- lookupbyname --> destination host
>> client <-- lookupbyname reply -- destination host
>
> Okay, instead of trying to reuse lookupbyname to assert completion,
>I would rather make a third special entry point. Sounds more generic
>to me, but again it's an implementation point, not a blocker, all this
>is hidden behind the API.
> Noted.
At some point we should make the network protocol part of the ABI,
but let's try to avoid breaking it too often :-)
>>Capabilities
>>------------
>>
>>I have extended capabilities with <migration_features>. For Xen this is:
>>
>><capabilities>
>> <host>
>> <migration_features>
>> <live/>
>> <uri_transports>
>> <uri_transport>tcp</uri_transport>
>> </uri_transports>
>> </migration_features>
>
> Nice, but what is the expected set of values for uri_transport ?
>Theoretically that can be any scheme name from RFC 2396 (or later):
>
>      scheme = alpha *( alpha | digit | "+" | "-" | "." )
>
> unless I misunderstood this.
> I think I'm misunderstanding you. In the Xen case it would be changed
> to <uri_transport>xenmigr</uri_transport>.
  Yes, if you think it's okay.
>> #define REMOTE_CPUMAPS_MAX 16384
>>+#define REMOTE_MIGRATE_COOKIE_MAX 256
>
> hum, what is that ? Sorry to show my ignorance, feel free to point
>me to an XDR FAQ !
> In XDR you can either have unlimited strings (well, the limit is
> 2^32-1), or you can impose a limit on their length. Some common types
> in XDR notation:
>
>   Unlimited strings:       string foo<>;
>   Limited-length strings:  string foo<1000>;
>   Fixed-length strings:    char foo[1000];
>   Byte arrays:             opaque foo[1000];
>
> Now if we just defined all strings as <> type, then the
> automatically-built XDR receivers would accept unlimited amounts of data
> from a buggy or malicious client. In particular they would attempt to
> allocate large amounts of memory, crashing the server[1]. So instead we
> impose upper limits on the length of various strings. They are defined
> at the top of qemud/remote_protocol.x.
okay, thanks for the explanations :-)
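The bounds check the explanation above describes can be sketched as follows.
Only the REMOTE_MIGRATE_COOKIE_MAX constant comes from remote_protocol.x;
the function name and shape are invented, not the rpcgen-generated decoder:

```c
/* Sketch of the kind of check a bounded XDR receiver performs: the
 * length field read off the wire is validated against the protocol
 * limit before any memory is allocated. */
#include <stdlib.h>
#include <string.h>

#define REMOTE_MIGRATE_COOKIE_MAX 256

/* Returns a freshly allocated copy of the cookie, or NULL if the
 * claimed length exceeds the limit (a buggy or malicious client). */
static char *decode_cookie(const char *wire, unsigned long len)
{
    if (len > REMOTE_MIGRATE_COOKIE_MAX)
        return NULL;          /* refuse rather than try to allocate 4 GB */
    char *out = malloc(len + 1);
    if (!out)
        return NULL;
    memcpy(out, wire, len);
    out[len] = '\0';
    return out;
}
```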
>>+ * Since typically the two hypervisors connect directly to each
>>+ * other in order to perform the migration, you may need to specify
>>+ * a path from the source to the destination. This is the purpose
>>+ * of the uri parameter. If uri is NULL, then libvirt will try to
>>+ * find the best method. Uri may specify the hostname or IP address
>>+ * of the destination host as seen from the source. Or uri may be
>>+ * a URI giving transport, hostname, user, port, etc. in the usual
>>+ * form. Refer to driver documentation for the particular URIs
>>+ * supported.
>>+ *
>>+ * The maximum bandwidth (in Mbps) that will be used to do migration
>>+ * can be specified with the resource parameter. If set to 0,
>>+ * libvirt will choose a suitable default. Some hypervisors do
>>+ * not support this feature and will return an error if resource
>>+ * is not 0.
>
> Do you really want to fail there too ?
> I'm not sure ... It seemed like the safest thing to do, since we are
> unable to enforce the requested limit so Bad Things might happen
> (effectively a DoS on the network).
Okay, it's probably better to avoid any fuzziness in the definition
of the API, and then error in that case. I'm convinced !
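The agreed error behaviour for the resource parameter can be sketched like
this. The driver struct and function are invented for illustration; only the
semantics (non-zero bandwidth cap on an unsupporting driver is an error, not
silently ignored) come from the discussion above:

```c
/* Sketch of the agreed semantics for the "resource" (bandwidth)
 * argument: a driver that cannot enforce a cap rejects a non-zero
 * value rather than silently ignoring it. */

struct fake_driver {
    int supports_bandwidth;  /* would be advertised as <bandwidth/> */
};

/* Returns 0 on success, -1 if the requested cap cannot be honoured.
 * resource is in Mbps; 0 means "let the driver pick a default". */
static int check_migrate_resource(const struct fake_driver *drv,
                                  unsigned long resource)
{
    if (resource != 0 && !drv->supports_bandwidth)
        return -1;  /* fail loudly rather than risk a DoS on the network */
    return 0;
}
```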
>Similarly, the capability to limit bandwidth should be added to the
><capabilities><migration_features>, possibly as a <bandwidth/> optional
>element.
> Yes. Note that although xend takes, and indeed requires, a resource
> parameter, the implementation in xen 3.1 completely ignores it. For
> this reason, <bandwidth/> is _not_ a xen 3.1 capability.
okay
thanks !
Daniel
--
Red Hat Virtualization group  http://redhat.com/virtualization/
Daniel Veillard       | virtualization library  http://libvirt.org/
veillard(a)redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/  | Rpmfind RPM search engine  http://rpmfind.net/