[libvirt] [PATCH 0/3] fix migration of paused vms

From: Paolo Bonzini <pbonzini@gnu.org> This is yet another try on fixing migration of paused vms. Since the driver-only solution didn't work out, I'm now fixing it directly in libvirt.c. The first two patches are the same as for the previous series. The first is a bugfix for failed migration for the QEMU driver; the second adds a "virsh migrate --suspend" option to leave the VM not running on the destination machine. As in the previous version of the patch, --suspend/VIR_MIGRATE_PAUSED is the bulk of implementing the new feature. I just query the domain state and, if it is paused, I implicitly add the VIR_MIGRATE_PAUSED flag to the migration. This is the third patch. I thought about transmitting the state in the end (since it works well if VIR_MIGRATE_PAUSED is passed only to MigrateFinish and not to MigratePerform), but I don't think it's a good idea to commit it now as it's basically untestable. It's in the fourth patch, but if you want it now and think the approach is sound, tell me and I'll test it more heavily.

This makes a small change on the failed-migration path. Up to now, all VMs that failed non-live migration after the "stop" command were restarted. This must not be done when the VM was paused in the first place. * src/qemu/qemu_driver.c (qemudDomainMigratePerform): Do not restart a paused VM that fails migration. Set paused state after "stop", reset it after failure. --- src/qemu/qemu_driver.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 356e4e7..44cec6c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7182,7 +7182,7 @@ qemudDomainMigratePerform (virDomainPtr dom, goto endjob; } - if (!(flags & VIR_MIGRATE_LIVE)) { + if (!(flags & VIR_MIGRATE_LIVE) && vm->state == VIR_DOMAIN_RUNNING) { qemuDomainObjPrivatePtr priv = vm->privateData; /* Pause domain for non-live migration */ qemuDomainObjEnterMonitorWithDriver(driver, vm); @@ -7193,6 +7193,7 @@ qemudDomainMigratePerform (virDomainPtr dom, qemuDomainObjExitMonitorWithDriver(driver, vm); paused = 1; + vm->state = VIR_DOMAIN_PAUSED; event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_SUSPENDED, VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED); @@ -7240,6 +7241,7 @@ endjob: } qemuDomainObjExitMonitorWithDriver(driver, vm); + vm->state = VIR_DOMAIN_RUNNING; event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED, VIR_DOMAIN_EVENT_RESUMED_MIGRATED); -- 1.6.5.2

On Wed, Nov 25, 2009 at 04:40:06PM +0100, Paolo Bonzini wrote:
This makes a small change on the failed-migration path. Up to now, all VMs that failed non-live migration after the "stop" command were restarted. This must not be done when the VM was paused in the first place.
* src/qemu/qemu_driver.c (qemudDomainMigratePerform): Do not restart a paused VM that fails migration. Set paused state after "stop", reset it after failure. --- src/qemu/qemu_driver.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-)
ACK, looks good. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

This adds a new flag, VIR_MIGRATE_PAUSED, that mandates pausing the migrated VM before starting it. * include/libvirt/libvirt.h.in (virDomainMigrateFlags): Add VIR_MIGRATE_PAUSED. * src/qemu/qemu_driver.c (qemudDomainMigrateFinish2): Handle VIR_MIGRATE_PAUSED. * tools/virsh.c (opts_migrate): Add --suspend. (cmdMigrate): Handle it. * tools/virsh.pod (migrate): Document it. --- The complicated part of the patch is simply this with diff -b: @@ -7320,6 +7320,7 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjPrivatePtr priv = vm->privateData; dom = virGetDomain (dconn, vm->def->name, vm->def->uuid); + if (!(flags & VIR_MIGRATE_PAUSED)) { /* run 'cont' on the destination, which allows migration on qemu * >= 0.10.6 to work properly. This isn't strictly necessary on * older qemu's, but it also doesn't hurt anything there @@ -7335,9 +7336,17 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjExitMonitorWithDriver(driver, vm); vm->state = VIR_DOMAIN_RUNNING; + } include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 1 + src/qemu/qemu_driver.c | 33 +++++++++++++++++++++------------ tools/virsh.c | 5 ++++- tools/virsh.pod | 7 ++++--- 5 files changed, 31 insertions(+), 16 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 5bc7694..0488cbf 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -341,6 +341,7 @@ typedef enum { VIR_MIGRATE_TUNNELLED = (1 << 2), /* tunnel migration data over libvirtd connection */ VIR_MIGRATE_PERSIST_DEST = (1 << 3), /* persist the VM on the destination */ VIR_MIGRATE_UNDEFINE_SOURCE = (1 << 4), /* undefine the VM on the source */ + VIR_MIGRATE_PAUSED = (1 << 5), /* pause on remote side */ } virDomainMigrateFlags; /* Domain migration. */ diff --git a/src/libvirt.c b/src/libvirt.c index 05e45f3..2ced604 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -3179,6 +3179,7 @@ virDomainMigrateDirect (virDomainPtr domain, * on the destination host. * VIR_MIGRATE_UNDEFINE_SOURCE If the migration is successful, undefine the * domain on the source host. + * VIR_MIGRATE_PAUSED Leave the domain suspended on the remote side. * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 44cec6c..4d20fb7 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7320,24 +7320,33 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjPrivatePtr priv = vm->privateData; dom = virGetDomain (dconn, vm->def->name, vm->def->uuid); - /* run 'cont' on the destination, which allows migration on qemu - * >= 0.10.6 to work properly. This isn't strictly necessary on - * older qemu's, but it also doesn't hurt anything there - */ - qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (qemuMonitorStartCPUs(priv->mon, dconn) < 0) { - if (virGetLastError() == NULL) - qemudReportError(dconn, NULL, NULL, VIR_ERR_INTERNAL_ERROR, - "%s", _("resume operation failed")); + if (!(flags & VIR_MIGRATE_PAUSED)) { + /* run 'cont' on the destination, which allows migration on qemu + * >= 0.10.6 to work properly. This isn't strictly necessary on + * older qemu's, but it also doesn't hurt anything there + */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + if (qemuMonitorStartCPUs(priv->mon, dconn) < 0) { + if (virGetLastError() == NULL) + qemudReportError(dconn, NULL, NULL, VIR_ERR_INTERNAL_ERROR, + "%s", _("resume operation failed")); + qemuDomainObjExitMonitorWithDriver(driver, vm); + goto endjob; + } qemuDomainObjExitMonitorWithDriver(driver, vm); - goto endjob; + + vm->state = VIR_DOMAIN_RUNNING; } - qemuDomainObjExitMonitorWithDriver(driver, vm); - vm->state = VIR_DOMAIN_RUNNING; event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED, VIR_DOMAIN_EVENT_RESUMED_MIGRATED); + if (vm->state == VIR_DOMAIN_PAUSED) { + qemuDomainEventQueue(driver, event); + event = virDomainEventNewFromObj(vm, + VIR_DOMAIN_EVENT_SUSPENDED, + VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); + } virDomainSaveStatus(dconn, driver->stateDir, vm); } else { qemudShutdownVMDaemon (dconn, driver, vm); diff --git a/tools/virsh.c b/tools/virsh.c index 9faac35..9871b4b 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -2478,6 +2478,7 @@ static const vshCmdOptDef opts_migrate[] = { {"tunnelled", VSH_OT_BOOL, 0, gettext_noop("tunnelled migration")}, {"persistent", VSH_OT_BOOL, 0, gettext_noop("persist VM on destination")}, {"undefinesource", VSH_OT_BOOL, 0, gettext_noop("undefine VM on source")}, + {"suspend", VSH_OT_BOOL, 0, gettext_noop("do not restart the domain on the destination host")}, {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("domain name, id or uuid")}, {"desturi", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("connection URI of the destination host")}, {"migrateuri", VSH_OT_DATA, 0, gettext_noop("migration URI, usually can be omitted")}, @@ -2519,10 +2520,12 @@ cmdMigrate (vshControl *ctl, const vshCmd *cmd) if (vshCommandOptBool (cmd, "persistent")) flags |= VIR_MIGRATE_PERSIST_DEST; - if (vshCommandOptBool (cmd, "undefinesource")) flags |= VIR_MIGRATE_UNDEFINE_SOURCE; + if (vshCommandOptBool (cmd, "suspend")) + flags |= VIR_MIGRATE_PAUSED; + if ((flags & VIR_MIGRATE_PEER2PEER) || vshCommandOptBool (cmd, "direct")) { /* For peer2peer migration or direct migration we only expect one URI diff --git a/tools/virsh.pod b/tools/virsh.pod index 6ff0151..3830464 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -302,10 +302,11 @@ except that it does some error checking. The editor used can be supplied by the C<$EDITOR> environment variable, or if that is not defined defaults to C<vi>. -=item B<migrate> optional I<--live> I<domain-id> I<desturi> I<migrateuri> +=item B<migrate> optional I<--live> I<--suspend> I<domain-id> I<desturi> I<migrateuri> -Migrate domain to another host. Add --live for live migration. The I<desturi> -is the connection URI of the destination host, and I<migrateuri> is the +Migrate domain to another host. Add --live for live migration; --suspend +leaves the domain paused on the destination host. The I<desturi> is the +connection URI of the destination host, and I<migrateuri> is the migration URI, which usually can be omitted. =item B<reboot> I<domain-id> -- 1.6.5.2

On Wed, Nov 25, 2009 at 04:40:07PM +0100, Paolo Bonzini wrote:
This adds a new flag, VIR_MIGRATE_PAUSED, that mandates pausing the migrated VM before starting it.
* include/libvirt/libvirt.h.in (virDomainMigrateFlags): Add VIR_MIGRATE_PAUSED. * src/qemu/qemu_driver.c (qemudDomainMigrateFinish2): Handle VIR_MIGRATE_PAUSED. * tools/virsh.c (opts_migrate): Add --suspend. (cmdMigrate): Handle it. * tools/virsh.pod (migrate): Document it. --- The complicated part of the patch is simply this with diff -b:
@@ -7320,6 +7320,7 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjPrivatePtr priv = vm->privateData; dom = virGetDomain (dconn, vm->def->name, vm->def->uuid);
+ if (!(flags & VIR_MIGRATE_PAUSED)) { /* run 'cont' on the destination, which allows migration on qemu * >= 0.10.6 to work properly. This isn't strictly necessary on * older qemu's, but it also doesn't hurt anything there @@ -7335,9 +7336,17 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjExitMonitorWithDriver(driver, vm);
vm->state = VIR_DOMAIN_RUNNING; + }
include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 1 + src/qemu/qemu_driver.c | 33 +++++++++++++++++++++------------ tools/virsh.c | 5 ++++- tools/virsh.pod | 7 ++++--- 5 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 5bc7694..0488cbf 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -341,6 +341,7 @@ typedef enum { VIR_MIGRATE_TUNNELLED = (1 << 2), /* tunnel migration data over libvirtd connection */ VIR_MIGRATE_PERSIST_DEST = (1 << 3), /* persist the VM on the destination */ VIR_MIGRATE_UNDEFINE_SOURCE = (1 << 4), /* undefine the VM on the source */ + VIR_MIGRATE_PAUSED = (1 << 5), /* pause on remote side */ } virDomainMigrateFlags;
/* Domain migration. */ diff --git a/src/libvirt.c b/src/libvirt.c index 05e45f3..2ced604 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -3179,6 +3179,7 @@ virDomainMigrateDirect (virDomainPtr domain, * on the destination host. * VIR_MIGRATE_UNDEFINE_SOURCE If the migration is successful, undefine the * domain on the source host. + * VIR_MIGRATE_PAUSED Leave the domain suspended on the remote side. * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 44cec6c..4d20fb7 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7320,24 +7320,33 @@ qemudDomainMigrateFinish2 (virConnectPtr dconn, qemuDomainObjPrivatePtr priv = vm->privateData; dom = virGetDomain (dconn, vm->def->name, vm->def->uuid);
- /* run 'cont' on the destination, which allows migration on qemu - * >= 0.10.6 to work properly. This isn't strictly necessary on - * older qemu's, but it also doesn't hurt anything there - */ - qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (qemuMonitorStartCPUs(priv->mon, dconn) < 0) { - if (virGetLastError() == NULL) - qemudReportError(dconn, NULL, NULL, VIR_ERR_INTERNAL_ERROR, - "%s", _("resume operation failed")); + if (!(flags & VIR_MIGRATE_PAUSED)) { + /* run 'cont' on the destination, which allows migration on qemu + * >= 0.10.6 to work properly. This isn't strictly necessary on + * older qemu's, but it also doesn't hurt anything there + */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + if (qemuMonitorStartCPUs(priv->mon, dconn) < 0) { + if (virGetLastError() == NULL) + qemudReportError(dconn, NULL, NULL, VIR_ERR_INTERNAL_ERROR, + "%s", _("resume operation failed")); + qemuDomainObjExitMonitorWithDriver(driver, vm); + goto endjob; + } qemuDomainObjExitMonitorWithDriver(driver, vm); - goto endjob; + + vm->state = VIR_DOMAIN_RUNNING; } - qemuDomainObjExitMonitorWithDriver(driver, vm);
- vm->state = VIR_DOMAIN_RUNNING; event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED, VIR_DOMAIN_EVENT_RESUMED_MIGRATED); + if (vm->state == VIR_DOMAIN_PAUSED) { + qemuDomainEventQueue(driver, event); + event = virDomainEventNewFromObj(vm, + VIR_DOMAIN_EVENT_SUSPENDED, + VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); + } virDomainSaveStatus(dconn, driver->stateDir, vm); } else { qemudShutdownVMDaemon (dconn, driver, vm); diff --git a/tools/virsh.c b/tools/virsh.c index 9faac35..9871b4b 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -2478,6 +2478,7 @@ static const vshCmdOptDef opts_migrate[] = { {"tunnelled", VSH_OT_BOOL, 0, gettext_noop("tunnelled migration")}, {"persistent", VSH_OT_BOOL, 0, gettext_noop("persist VM on destination")}, {"undefinesource", VSH_OT_BOOL, 0, gettext_noop("undefine VM on source")}, + {"suspend", VSH_OT_BOOL, 0, gettext_noop("do not restart the domain on the destination host")}, {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("domain name, id or uuid")}, {"desturi", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("connection URI of the destination host")}, {"migrateuri", VSH_OT_DATA, 0, gettext_noop("migration URI, usually can be omitted")}, @@ -2519,10 +2520,12 @@ cmdMigrate (vshControl *ctl, const vshCmd *cmd)
if (vshCommandOptBool (cmd, "persistent")) flags |= VIR_MIGRATE_PERSIST_DEST; - if (vshCommandOptBool (cmd, "undefinesource")) flags |= VIR_MIGRATE_UNDEFINE_SOURCE;
+ if (vshCommandOptBool (cmd, "suspend")) + flags |= VIR_MIGRATE_PAUSED; + if ((flags & VIR_MIGRATE_PEER2PEER) || vshCommandOptBool (cmd, "direct")) { /* For peer2peer migration or direct migration we only expect one URI diff --git a/tools/virsh.pod b/tools/virsh.pod index 6ff0151..3830464 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -302,10 +302,11 @@ except that it does some error checking. The editor used can be supplied by the C<$EDITOR> environment variable, or if that is not defined defaults to C<vi>.
-=item B<migrate> optional I<--live> I<domain-id> I<desturi> I<migrateuri> +=item B<migrate> optional I<--live> I<--suspend> I<domain-id> I<desturi> I<migrateuri>
-Migrate domain to another host. Add --live for live migration. The I<desturi> -is the connection URI of the destination host, and I<migrateuri> is the +Migrate domain to another host. Add --live for live migration; --suspend +leaves the domain paused on the destination host. The I<desturi> is the +connection URI of the destination host, and I<migrateuri> is the migration URI, which usually can be omitted.
=item B<reboot> I<domain-id>
ACK, though one thing I notice is that the QEMU driver isn't validating the flags passed in For example Xen does if (flags & VIR_MIGRATE_UNDEFINE_SOURCE) { undefined_source = 1; flags &= ~VIR_MIGRATE_UNDEFINE_SOURCE; } /* Ignore the persist_dest flag here */ if (flags & VIR_MIGRATE_PERSIST_DEST) flags &= ~VIR_MIGRATE_PERSIST_DEST; /* XXX we could easily do tunnelled & peer2peer migration too if we want to. support these... */ if (flags != 0) { virXendError (conn, VIR_ERR_NO_SUPPORT, "%s", _("xenDaemonDomainMigrate: unsupported flag")); return -1; } This means if you tried to use this new '--suspend' feature with Xen you'd get a nice error. If you used it with an old QEMU it'd silently complete and do nothing. We can't do much about this now, but I think it is worth adding in such a check, so if we add more migration flags later we wil be validating them ACK to this patch anyway Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

This patch fixes the bug where paused/running state is not transmitted during migration. As a result, in the QEMU driver for example the machine was always started on the destination end. In order to do so, I just read the state and if it is appropriate I set the VIR_MIGRATE_PAUSED flag. * src/libvirt.c (virDomainMigrateVersion1, virDomainMigrateVersion2): Automatically add VIR_MIGRATE_PAUSED when appropriate. * src/xen/xend_internal.c (xenDaemonDomainMigratePerform): Give a nicer error message when migration of paused virtual machines is attempted. --- src/libvirt.c | 12 ++++++++++++ src/xen/xend_internal.c | 9 +++++++++ 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index 2ced604..9d03e13 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2964,6 +2964,12 @@ virDomainMigrateVersion1 (virDomainPtr domain, char *uri_out = NULL; char *cookie = NULL; int cookielen = 0; + virDomainInfo info; + + virDomainGetInfo (domain, &info); + if (info.state == VIR_DOMAIN_PAUSED) { + flags |= VIR_MIGRATE_PAUSED; + } /* Prepare the migration. * @@ -3028,6 +3034,7 @@ virDomainMigrateVersion2 (virDomainPtr domain, char *cookie = NULL; char *dom_xml = NULL; int cookielen = 0, ret; + virDomainInfo info; /* Prepare the migration. * @@ -3054,6 +3061,11 @@ virDomainMigrateVersion2 (virDomainPtr domain, if (!dom_xml) return NULL; + virDomainGetInfo (domain, &info); + if (info.state == VIR_DOMAIN_PAUSED) { + flags |= VIR_MIGRATE_PAUSED; + } + ret = dconn->driver->domainMigratePrepare2 (dconn, &cookie, &cookielen, uri, &uri_out, flags, dname, bandwidth, dom_xml); diff --git a/src/xen/xend_internal.c b/src/xen/xend_internal.c index e370eb8..36148d9 100644 --- a/src/xen/xend_internal.c +++ b/src/xen/xend_internal.c @@ -4434,6 +4434,15 @@ xenDaemonDomainMigratePerform (virDomainPtr domain, if (flags & VIR_MIGRATE_PERSIST_DEST) flags &= ~VIR_MIGRATE_PERSIST_DEST; + /* This could be supported in principle (disregard VIR_MIGRATE_LIVE, pause + * the VM in advance) but is buggy in Xend; give a nice error message. + */ + if (flags & VIR_MIGRATE_PAUSED) { + virXendError (conn, VIR_ERR_NO_SUPPORT, + "%s", _("xenDaemonDomainMigrate: xend cannot migrate paused virtual machines")); + return -1; + } + /* XXX we could easily do tunnelled & peer2peer migration too if we want to. support these... */ if (flags != 0) { -- 1.6.5.2

On Wed, Nov 25, 2009 at 04:40:08PM +0100, Paolo Bonzini wrote:
This patch fixes the bug where paused/running state is not transmitted during migration. As a result, in the QEMU driver for example the machine was always started on the destination end.
In order to do so, I just read the state and if it is appropriate I set the VIR_MIGRATE_PAUSED flag.
* src/libvirt.c (virDomainMigrateVersion1, virDomainMigrateVersion2): Automatically add VIR_MIGRATE_PAUSED when appropriate. * src/xen/xend_internal.c (xenDaemonDomainMigratePerform): Give a nicer error message when migration of paused virtual machines is attempted. --- src/libvirt.c | 12 ++++++++++++ src/xen/xend_internal.c | 9 +++++++++ 2 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/src/libvirt.c b/src/libvirt.c index 2ced604..9d03e13 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2964,6 +2964,12 @@ virDomainMigrateVersion1 (virDomainPtr domain, char *uri_out = NULL; char *cookie = NULL; int cookielen = 0; + virDomainInfo info; + + virDomainGetInfo (domain, &info); + if (info.state == VIR_DOMAIN_PAUSED) { + flags |= VIR_MIGRATE_PAUSED; + }
There's a non-trivial chance of virDomainGetInfo failing so I think we should check return value here Also, since we now have the state, we might as well also short-circuit shutoff domains if (info.state == VIR_DOMAIN_SHUTOFF) ...cannot migrate inactive domains... Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

This patch would fix the case where paused/running state is changed during live migration. In order to do so, libvirt attaches an event handler that tracks the paused/running state on the source side. Then, based on the state _just before_ the end of the migration, it can stick a VIR_MIGRATE_PAUSED flag into the flags passed to MigrateFinish or MigrateFinish2. It is not a problem for the QEMU driver if the flag is not passed in the Perform callback. This would have the advantage of working even with asynchronous changes to the state that can happen during the migration itself. However, since the driver lock is held during migration, this is basically impossible to test until asynchronous notifications are in place. Also, it is somewhat inherently racy: maybe when notifications are added to QEMU, it should also have a "sync" monitor command to flush all notifications. Hence, this is just an RFC. * src/libvirt.c (struct _migrateEventState, migrateEventState, virMigrateEventCallback): New. (virDomainMigrateVersion1, virDomainMigrateVersion2): Use them. --- src/libvirt.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 60 insertions(+), 8 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index 9d03e13..12e90cd 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2951,6 +2951,42 @@ error: return NULL; } +typedef struct _migrateEventData migrateEventData; + +struct _migrateEventData { + virDomainPtr domain; + virDomainInfo info; +}; + +static int +virMigrateEventCallback(virConnectPtr conn ATTRIBUTE_UNUSED, + virDomainPtr dom, + int event, + int detail, + void *opaque) +{ + migrateEventData *data = opaque; + + if (dom != data->domain) + return 0; + + switch (event) + { + case VIR_DOMAIN_EVENT_SUSPENDED: + if (detail != VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED) + data->info.state = VIR_DOMAIN_PAUSED; + break; + case VIR_DOMAIN_EVENT_STOPPED: + if (detail == VIR_DOMAIN_EVENT_STOPPED_CRASHED) + data->info.state = VIR_DOMAIN_CRASHED; + else if (detail != VIR_DOMAIN_EVENT_STOPPED_MIGRATED) + data->info.state = VIR_DOMAIN_SHUTOFF; + break; + } + + return 0; +} + static virDomainPtr virDomainMigrateVersion1 (virDomainPtr domain, @@ -2964,12 +3000,14 @@ virDomainMigrateVersion1 (virDomainPtr domain, char *uri_out = NULL; char *cookie = NULL; int cookielen = 0; - virDomainInfo info; + migrateEventData data; - virDomainGetInfo (domain, &info); - if (info.state == VIR_DOMAIN_PAUSED) { - flags |= VIR_MIGRATE_PAUSED; + data.domain = domain; + if (virConnectDomainEventRegister (domain->conn, virMigrateEventCallback, + &data, NULL) == -1) { + return NULL; } + virDomainGetInfo (domain, &data.info); /* Prepare the migration. * @@ -3003,6 +3041,11 @@ virDomainMigrateVersion1 (virDomainPtr domain, (domain, cookie, cookielen, uri, flags, dname, bandwidth) == -1) goto done; + /* TODO: what if the host has crashed during live migration? */ + if (data.info.state == VIR_DOMAIN_PAUSED) { + flags |= VIR_MIGRATE_PAUSED; + } + /* Get the destination domain and return it or error. * 'domain' no longer actually exists at this point * (or so we hope), but we still use the object in memory @@ -3016,6 +3059,7 @@ virDomainMigrateVersion1 (virDomainPtr domain, ddomain = virDomainLookupByName (dconn, dname); done: + virConnectDomainEventDeregister (domain->conn, virMigrateEventCallback); VIR_FREE (uri_out); VIR_FREE (cookie); return ddomain; @@ -3034,7 +3078,7 @@ virDomainMigrateVersion2 (virDomainPtr domain, char *cookie = NULL; char *dom_xml = NULL; int cookielen = 0, ret; - virDomainInfo info; + migrateEventData data; /* Prepare the migration. * @@ -3061,10 +3105,12 @@ virDomainMigrateVersion2 (virDomainPtr domain, if (!dom_xml) return NULL; - virDomainGetInfo (domain, &info); - if (info.state == VIR_DOMAIN_PAUSED) { - flags |= VIR_MIGRATE_PAUSED; + data.domain = domain; + if (virConnectDomainEventRegister (domain->conn, virMigrateEventCallback, + &data, NULL) == -1) { + return NULL; } + virDomainGetInfo (domain, &data.info); ret = dconn->driver->domainMigratePrepare2 (dconn, &cookie, &cookielen, uri, &uri_out, flags, dname, @@ -3088,6 +3134,11 @@ virDomainMigrateVersion2 (virDomainPtr domain, ret = domain->conn->driver->domainMigratePerform (domain, cookie, cookielen, uri, flags, dname, bandwidth); + /* TODO: what if the host has crashed during live migration? */ + if (data.info.state == VIR_DOMAIN_PAUSED) { + flags |= VIR_MIGRATE_PAUSED; + } + /* In version 2 of the migration protocol, we pass the * status code from the sender to the destination host, * so it can do any cleanup if the migration failed. @@ -3097,6 +3148,7 @@ virDomainMigrateVersion2 (virDomainPtr domain, (dconn, dname, cookie, cookielen, uri, flags, ret); done: + virConnectDomainEventDeregister (domain->conn, virMigrateEventCallback); VIR_FREE (uri_out); VIR_FREE (cookie); return ddomain; -- 1.6.5.2

On Wed, Nov 25, 2009 at 04:40:09PM +0100, Paolo Bonzini wrote:
This patch would fix the case where paused/running state is changed during live migration. In order to do so, libvirt attaches an event handler that tracks the paused/running state on the source side. Then, based on the state _just before_ the end of the migration, it can stick a VIR_MIGRATE_PAUSED flag into the flags passed to MigrateFinish or MigrateFinish2. It is not a problem for the QEMU driver if the flag is not passed in the Perform callback.
This would have the advantage of working even with asynchronous changes to the state that can happen during the migration itself. However, since the driver lock is held during migration, this is basically impossible to test until asynchronous notifications are in place. Also, it is somewhat inherently racy: maybe when notifications are added to QEMU, it should also have a "sync" monitor command to flush all notifications.
Hence, this is just an RFC.
* src/libvirt.c (struct _migrateEventState, migrateEventState, virMigrateEventCallback): New. (virDomainMigrateVersion1, virDomainMigrateVersion2): Use them. --- src/libvirt.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 60 insertions(+), 8 deletions(-)
diff --git a/src/libvirt.c b/src/libvirt.c index 9d03e13..12e90cd 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2951,6 +2951,42 @@ error: return NULL; }
- virDomainGetInfo (domain, &info); - if (info.state == VIR_DOMAIN_PAUSED) { - flags |= VIR_MIGRATE_PAUSED; + data.domain = domain; + if (virConnectDomainEventRegister (domain->conn, virMigrateEventCallback, + &data, NULL) == -1) { + return NULL; }
Calling virConnectDomainEventRegister() is not valid unless the application has registered an event loop impl with libvit (virEventRegister), but we cannot assume it has done so. Daniel. -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Calling virConnectDomainEventRegister() is not valid unless the application has registered an event loop impl with libvit (virEventRegister), but we cannot assume it has done so.
I'll withdraw this RFC and redo part 3 to check GetDomainInfo. Thanks for the review! Paolo

On 11/25/2009 04:40 PM, Paolo Bonzini wrote:
From: Paolo Bonzini<pbonzini@gnu.org>
This is yet another try on fixing migration of paused vms. Since the driver-only solution didn't work out, I'm now fixing it directly in libvirt.c.
The first two patches are the same as for the previous series. The first is a bugfix for failed migration for the QEMU driver; the second adds a "virsh migrate --suspend" option to leave the VM not running on the destination machine.
As in the previous version of the patch, --suspend/VIR_MIGRATE_PAUSED is the bulk of implementing the new feature. I just query the domain state and, if it is paused, I implicitly add the VIR_MIGRATE_PAUSED flag to the migration. This is the third patch.
I thought about transmitting the state in the end (since it works well if VIR_MIGRATE_PAUSED is passed only to MigrateFinish and not to MigratePerform), but I don't think it's a good idea to commit it now as it's basically untestable. It's in the fourth patch, but if you want it now and think the approach is sound, tell me and I'll test it more heavily.
PING Paolo
participants (2)
-
Daniel P. Berrange
-
Paolo Bonzini