On Fri, Jul 17, 2009 at 02:59:36PM +0200, Chris Lalancette wrote:
All,
Attached is the current version of the tunnelled migration patch, based
upon danpb's generic datastream work. In order to use this work, you must first
grab danpb's data-streams git branch here:
http://gitorious.org/~berrange/libvirt/staging
and then apply this patch on top.
In some basic testing, this seems to work fine for me, although I have not
subjected it to a difficult scenario or measured CPU utilization with these
patches in place.
DanB, these patches take a slightly different approach than you and I
discussed yesterday on IRC. Just to recap, you suggested a new version of
virMigratePrepare (called virMigratePrepareTunnel) that would take in as one of
the arguments a datastream, and during the prepare step properly setup the
datastream. Unless I'm missing something (which is entirely possible), this
would also require passing that same datastream into the perform and finish
stages, meaning that I'd essentially have an all-new migration protocol, version 3.
To try to avoid that, during the prepare I store the port that we used to
start the listening qemu in a new field in the virDomainObj structure. Then
during the perform step, I create a datastream on the destination and run a new
RPC function called virDomainMigratePrepareTunnel. This looks that port back
up, associates it with the current stream, and returns back to the caller. Then
the source side just does virStreamSend for all the data, and we have tunnelled
migration.
Ah, now I understand why you were having trouble with this.
With this patch, the flow of control is logically
virDomainMigrate(src, dst, uri)
+- virDomainMigratePrepare(dst)
+- virDomainMigratePerform(src, uri)
| +- dst2 = virConnectOpen(uri)
| +- virDomainMigratePrepareTunnel(dst2)
| +- while (1)
| | +- virStreamSend(dst2, data)
| +- virConnectClose(uri)
+- virDomainMigrateFinish(dst)
If we remember the requirement from the libvirt-qpid guys, which is
to remove the need for an application to pass in the destination
handle, this scheme won't work because 'dst' will be NULL.
To cope with that requirement we'd need the logical flow to be
virDomainMigrate(src, NULL, uri)
+- virDomainMigratePerform(src, uri)
+- dst = virConnectOpen(uri)
+- virDomainMigratePrepare(dst)
+- virDomainMigratePrepareTunnel(dst)
+- while (1)
| +- virStreamSend(dst, data)
+- virDomainMigrateFinish(dst)
+- virConnectClose(uri)
At which point, having separate virDomainMigratePrepare vs
virDomainMigratePrepareTunnel is overkill; we might as well
just have virDomainMigratePrepareTunnel, which does all
the virDomainMigratePrepare logic itself, avoiding the need
to pass a TCP port around in virDomainObjPtr.
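Something along these lines is what I have in mind for the combined entry
point (the exact parameter list here is only a guess, mirroring the existing
virDomainMigratePrepare arguments plus the stream):

```c
/* Hypothetical combined entry point: does everything the current
 * virDomainMigratePrepare does, but associates the incoming qemu
 * process directly with the passed-in stream, instead of recording
 * a TCP port in virDomainObjPtr for a later lookup. */
int virDomainMigratePrepareTunnel(virConnectPtr dconn,
                                  virStreamPtr st,
                                  unsigned long flags,
                                  const char *dname,
                                  unsigned long bandwidth,
                                  const char *dom_xml);
```

The source side would then open the destination connection, create the
stream, and call this once, with no separate Prepare step needed.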
TODO:
- More testing, especially under worst-case scenarios (VM constantly
changing its memory during migration)
- CPU utilization testing to make sure that we aren't using a lot of CPU
time doing this
- Wall-clock testing
- Switch over to using Unix domain sockets instead of localhost TCP
migration. With a patch I put into upstream qemu (which is now in F-12), we can
completely get rid of scanning localhost ports to find a free one, and just
use Unix domain sockets. That should make the whole thing more robust.
The downside of using exec+dd to open a UNIX domain socket is that
it would add in yet another data copy. It really is annoying that
QEMU doesn't have more of this stuff built-in.
+ st = virStreamNew(dconn, 0);
+ if (st == NULL)
+ /* FIXME: do we need to set an error here or did virStreamNew do it? */
+ goto close_dconn;
+
+ if (virDomainMigratePrepareTunnel(dconn, st, vm->def->uuid, 0) < 0)
+ /* FIXME: do we need to set an error here or did PrepareTunnel do it? */
+ goto close_stream;
+
+ for (;;) {
+ bytes = saferead(client_sock, buffer, MAX_BUFFER);
+ if (bytes < 0) {
+ qemudReportError (dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED,
+ _("Failed to read from qemu: %s"),
+ virStrerror(errno, ebuf, sizeof ebuf));
+ goto close_stream;
You should call virStreamAbort() here before virStreamFree() to
inform the remote end that you're terminating the data channel
abnormally.
+ }
+ else if (bytes == 0)
+ /* EOF; get out of here */
+ break;
+
+ if (virStreamSend(st, buffer, bytes) < 0) {
+ qemudReportError (dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED,
+                              _("Failed to write migration data to remote libvirtd"));
+ goto close_stream;
+ }
+ }
+
+ virStreamFinish(st);
+ /* FIXME: check for errors */
A simple check of the virStreamFinish return code, as you already
do for virStreamSend, would be sufficient here. virStreamFinish
does a round-trip handshake to ensure it sees any of the
async errors from virStreamSend.
+
+ retval = 0;
+
+close_stream:
+ virStreamFree(st);
+
+close_dconn:
+ virConnectClose(dconn);
+
+close_client_sock:
+ close(client_sock);
+
+qemu_cancel_migration:
+ if (retval != 0)
+ qemudMonitorCommand(vm, "migrate_cancel", &info);
+ VIR_FREE(info);
+
+close_qemu_sock:
+ close(qemu_sock);
+
+ return retval;
+}
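Putting my two comments above together, the tail of that loop would look
something like this (sketch only; error-reporting arguments elided):

```c
        if (virStreamSend(st, buffer, bytes) < 0) {
            qemudReportError(...);
            virStreamAbort(st);  /* tell the remote end we're bailing out */
            goto close_stream;
        }
    }

    if (virStreamFinish(st) < 0)  /* round-trip; surfaces async send errors */
        goto close_stream;

    retval = 0;
```

The same virStreamAbort() call belongs on the saferead failure path too, for
the same reason.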
+
/* Perform is the second step, and it runs on the source host. */
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|