[libvirt] [PATCH] Do not try to cancel non-existent migration on source

If migration failed on source daemon, the migration is automatically canceled by the daemon itself. Thus we don't need to call virDomainMigrateConfirm3(cancelled=1). Calling it doesn't cause any harm but the resulting error message printed in logs may confuse people. --- src/libvirt.c | 41 +++++++++++++++++++++++++---------------- 1 files changed, 25 insertions(+), 16 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index c8af3e1..256828c 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -4141,6 +4141,7 @@ virDomainMigrateVersion3(virDomainPtr domain, virErrorPtr orig_err = NULL; int cancelled = 1; unsigned long protection = 0; + bool notify_source = true; VIR_DOMAIN_DEBUG(domain, "dconn=%p xmlin=%s, flags=%lx, " "dname=%s, uri=%s, bandwidth=%lu", @@ -4221,8 +4222,13 @@ virDomainMigrateVersion3(virDomainPtr domain, uri, flags | protection, dname, bandwidth); /* Perform failed. Make sure Finish doesn't overwrite the error */ - if (ret < 0) + if (ret < 0) { orig_err = virSaveLastError(); + /* Perform failed so we don't need to call confirm to let source know + * about the failure. + */ + notify_source = false; + } /* If Perform returns < 0, then we need to cancel the VM * startup on the destination @@ -4265,22 +4271,25 @@ finish: confirm: /* - * If cancelled, then src VM will be restarted, else - * it will be killed - */ - VIR_DEBUG("Confirm3 %p ret=%d domain=%p", domain->conn, ret, domain); - VIR_FREE(cookiein); - cookiein = cookieout; - cookieinlen = cookieoutlen; - cookieout = NULL; - cookieoutlen = 0; - ret = domain->conn->driver->domainMigrateConfirm3 - (domain, cookiein, cookieinlen, - flags | protection, cancelled); - /* If Confirm3 returns -1, there's nothing more we can - * do, but fortunately worst case is that there is a - * domain left in 'paused' state on source. + * If cancelled, then src VM will be restarted, else it will be killed. + * Don't do this if migration failed on source and thus it was already + * cancelled there. */ + if (notify_source) { + VIR_DEBUG("Confirm3 %p ret=%d domain=%p", domain->conn, ret, domain); + VIR_FREE(cookiein); + cookiein = cookieout; + cookieinlen = cookieoutlen; + cookieout = NULL; + cookieoutlen = 0; + ret = domain->conn->driver->domainMigrateConfirm3 + (domain, cookiein, cookieinlen, + flags | protection, cancelled); + /* If Confirm3 returns -1, there's nothing more we can + * do, but fortunately worst case is that there is a + * domain left in 'paused' state on source. + */ + } done: if (orig_err) { -- 1.7.6

On 08/16/2011 04:44 AM, Jiri Denemark wrote:
If migration failed on source daemon, the migration is automatically canceled by the daemon itself. Thus we don't need to call virDomainMigrateConfirm3(cancelled=1). Calling it doesn't cause any harm but the resulting error message printed in logs may confuse people. --- src/libvirt.c | 41 +++++++++++++++++++++++++---------------- 1 files changed, 25 insertions(+), 16 deletions(-)
Shouldn't qemu_migration.c get the same fix, since peer2peer and tunneled migration basically re-implement the same migration driver function?
+ if (notify_source) { + VIR_DEBUG("Confirm3 %p ret=%d domain=%p", domain->conn, ret, domain); + VIR_FREE(cookiein); + cookiein = cookieout; + cookieinlen = cookieoutlen; + cookieout = NULL; + cookieoutlen = 0; + ret = domain->conn->driver->domainMigrateConfirm3 + (domain, cookiein, cookieinlen, + flags | protection, cancelled);
And this brings us right back to Alex's question about ret being a dead store - do we need to add a VIR_WARN here based on ret, or just drop the assignment? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Tue, Aug 16, 2011 at 08:33:04 -0600, Eric Blake wrote:
On 08/16/2011 04:44 AM, Jiri Denemark wrote:
If migration failed on source daemon, the migration is automatically canceled by the daemon itself. Thus we don't need to call virDomainMigrateConfirm3(cancelled=1). Calling it doesn't cause any harm but the resulting error message printed in logs may confuse people. --- src/libvirt.c | 41 +++++++++++++++++++++++++---------------- 1 files changed, 25 insertions(+), 16 deletions(-)
Shouldn't qemu_migration.c get the same fix, since peer2peer and tunneled migration basically re-implement the same migration driver function?
It's reimplementing the same thing but in a different context. The code in libvirt.c runs on a client, while qemu_migration.c runs within the qemu driver in libvirtd. The virDomainMigratePrform3 API called from libvirt.c cleans up after itself; it resumes a domain and finishes the migration job in case of error. On the other hand, everything in doPeer2PeerMigrate3 is run within a single public API and the function called to do the Perform phase is an internal function which doesn't clean up after itself in the same way as the public API has to. Hence, we need to call qemuMigrationConfirm to actually resume the domain even if the Perform phase failed.
+ if (notify_source) { + VIR_DEBUG("Confirm3 %p ret=%d domain=%p", domain->conn, ret, domain); + VIR_FREE(cookiein); + cookiein = cookieout; + cookieinlen = cookieoutlen; + cookieout = NULL; + cookieoutlen = 0; + ret = domain->conn->driver->domainMigrateConfirm3 + (domain, cookiein, cookieinlen, + flags | protection, cancelled);
And this brings us right back to Alex's question about ret being a dead store - do we need to add a VIR_WARN here based on ret, or just drop the assignment?
AFAIK we do not log reported errors until they are dispatched to clients so the log doesn't contain any error from domainMigrateConfirm3. So IMHO VIR_WARNing about the failure seems like the way to go. Jirka

On Tue, Aug 16, 2011 at 21:39:41 +0200, Jiri Denemark wrote:
On Tue, Aug 16, 2011 at 08:33:04 -0600, Eric Blake wrote:
On 08/16/2011 04:44 AM, Jiri Denemark wrote:
If migration failed on source daemon, the migration is automatically canceled by the daemon itself. Thus we don't need to call virDomainMigrateConfirm3(cancelled=1). Calling it doesn't cause any harm but the resulting error message printed in logs may confuse people. --- src/libvirt.c | 41 +++++++++++++++++++++++++---------------- 1 files changed, 25 insertions(+), 16 deletions(-)
Shouldn't qemu_migration.c get the same fix, since peer2peer and tunneled migration basically re-implement the same migration driver function?
It's reimplementing the same thing but in a different context. The code in libvirt.c runs on a client, while qemu_migration.c runs within the qemu driver in libvirtd. The virDomainMigratePrform3 API called from libvirt.c cleans up after itself; it resumes a domain and finishes the migration job in case of error. On the other hand, everything in doPeer2PeerMigrate3 is run within a single public API and the function called to do the Perform phase is an internal function which doesn't clean up after itself in the same way as the public API has to. Hence, we need to call qemuMigrationConfirm to actually resume the domain even if the Perform phase failed.
Ping. Did I manage to convince you this patch is complete? :-) Jirka

On 08/25/2011 01:03 PM, Jiri Denemark wrote:
On Tue, Aug 16, 2011 at 21:39:41 +0200, Jiri Denemark wrote:
On Tue, Aug 16, 2011 at 08:33:04 -0600, Eric Blake wrote:
On 08/16/2011 04:44 AM, Jiri Denemark wrote:
If migration failed on source daemon, the migration is automatically canceled by the daemon itself. Thus we don't need to call virDomainMigrateConfirm3(cancelled=1). Calling it doesn't cause any harm but the resulting error message printed in logs may confuse people. --- src/libvirt.c | 41 +++++++++++++++++++++++++---------------- 1 files changed, 25 insertions(+), 16 deletions(-)
Shouldn't qemu_migration.c get the same fix, since peer2peer and tunneled migration basically re-implement the same migration driver function?
It's reimplementing the same thing but in a different context. The code in libvirt.c runs on a client, while qemu_migration.c runs within the qemu driver in libvirtd. The virDomainMigratePrform3 API called from libvirt.c cleans up after itself; it resumes a domain and finishes the migration job in case of error. On the other hand, everything in doPeer2PeerMigrate3 is run within a single public API and the function called to do the Perform phase is an internal function which doesn't clean up after itself in the same way as the public API has to. Hence, we need to call qemuMigrationConfirm to actually resume the domain even if the Perform phase failed.
Ping. Did I manage to convince you this patch is complete? :-)
Yes. ACK, although since I just pushed Alex' patch, you might have a merge conflict to fix when you rebase. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Aug 25, 2011 at 13:30:43 -0600, Eric Blake wrote:
On 08/25/2011 01:03 PM, Jiri Denemark wrote:
On Tue, Aug 16, 2011 at 21:39:41 +0200, Jiri Denemark wrote:
On Tue, Aug 16, 2011 at 08:33:04 -0600, Eric Blake wrote:
On 08/16/2011 04:44 AM, Jiri Denemark wrote: Shouldn't qemu_migration.c get the same fix, since peer2peer and tunneled migration basically re-implement the same migration driver function?
It's reimplementing the same thing but in a different context. The code in libvirt.c runs on a client, while qemu_migration.c runs within the qemu driver in libvirtd. The virDomainMigratePrform3 API called from libvirt.c cleans up after itself; it resumes a domain and finishes the migration job in case of error. On the other hand, everything in doPeer2PeerMigrate3 is run within a single public API and the function called to do the Perform phase is an internal function which doesn't clean up after itself in the same way as the public API has to. Hence, we need to call qemuMigrationConfirm to actually resume the domain even if the Perform phase failed.
Ping. Did I manage to convince you this patch is complete? :-)
Yes. ACK, although since I just pushed Alex' patch, you might have a merge conflict to fix when you rebase.
No problem, thanks and pushed. Jirka
participants (2)
-
Eric Blake
-
Jiri Denemark