On Fri, Apr 07, 2017 at 02:12:31PM +0200, Kashyap Chamarthy wrote:
On Fri, Apr 07, 2017 at 08:22:01AM +0200, Jiri Denemark wrote:
> On Thu, Apr 06, 2017 at 18:14:07 +0200, Kashyap Chamarthy wrote:
> > [Filed this bug -- https://bugzilla.redhat.com/show_bug.cgi?id=1439841]
> >
> > Easy reproducer:
> >
> > $ virsh migrate --verbose --copy-storage-all \
> > --p2p --live l2-f25 qemu+ssh://root@devstack-a/system
> > error: invalid argument: monitor must not be NULL
>
> This is caused by the TLS migration code and most likely fixed by
> https://www.redhat.com/archives/libvir-list/2017-April/msg00219.html
Thanks. I'll test with your series & report back on that thread.
[Since the above series is pushed, responding here.]
I just built RPMs from libvirt Git, which includes the above series ("qemu:
Properly reset all migration capabilities"). This is the commit I tested:
$ git describe
v3.2.0-80-gbe193c4
I did two tests (same reproducer command-line as above):
(Test-1) Migrate a guest from source to destination:
Result: Succeeds (the migrated guest successfully runs on the
destination)
(Test-2) Once 'Test-1' has finished and the guest is running on the
destination, migrate it back to the source:
Result: Fails.
$ virsh migrate --verbose --copy-storage-all \
--p2p --live l2-f25 qemu+ssh://root@l1-f25/system
error: operation failed: migration job: is not active
Looking at the source debug log (URLs to complete logs further below), I
see the dreaded "cannot acquire state change lock" error.
[...]
2017-04-10 06:29:23.322+0000: 22676: warning : qemuDomainObjBeginJobInternal:3607 : Cannot
start job (modify, none) for domain l2-f25; current job is (none, migration out) owned by
(0 <null>, 16698 remoteDispatchDomainMigratePerform3Params) for (0s, 96s)
2017-04-10 06:29:23.322+0000: 22676: error : qemuDomainObjBeginJobInternal:3619 : Timed
out during operation: cannot acquire state change lock (held by
remoteDispatchDomainMigratePerform3Params)
[...]
2017-04-10 06:31:57.525+0000: 16698: error : qemuMigrationCheckJobStatus:1420 : operation
failed: migration job: is not active
2017-04-10 06:31:57.525+0000: 16698: debug : qemuMigrationCancelDriveMirror:785 :
Cancelling drive mirrors for domain l2-f25
[...]
2017-04-10 06:31:57.538+0000: 16698: debug : qemuMigrationDriveMirrorCancelled:700 : All
disk mirrors are gone
2017-04-10 06:31:57.538+0000: 16698: debug : doPeer2PeerMigrate3:4428 : Finish3
0x7f39d801e3d0 ret=-1
2017-04-10 06:31:57.539+0000: 16698: debug : qemuDomainObjEnterRemote:3918 : Entering
remote (vm=0x563b26a60e60 name=l2-f25)
2017-04-10 06:31:57.783+0000: 16698: error : virNetClientProgramDispatchError:177 :
migration successfully aborted
[...]
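For anyone who wants to pick the job owner out of the complete logs without
scrolling, here is a rough sketch. The function name 'job_owners' is
hypothetical (it is not part of libvirt), and it assumes the log has one
message per line, i.e. not wrapped the way the excerpt above is. It only
matches the "Cannot start job ... owned by (...)" warning format shown above.

```shell
# Hypothetical helper, not part of libvirt: print "<thread-id> <API name>"
# for each thread reported as the job owner in "Cannot start job" warnings.
# Reads an unwrapped libvirtd debug log on stdin.
job_owners() {
    grep 'Cannot start job' |
        sed -n 's/.*owned by ([0-9]* [^,]*, \([0-9]*\) \([A-Za-z0-9_]*\)).*/\1 \2/p' |
        sort -u
}
```

Feeding it the source host log (saved locally under whatever name you like)
via 'job_owners < source-libvirtd.log' should list thread 16698 for the
excerpt above; grepping the same file for ': 16698: ' then shows what that
thread was doing while it held the job.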
Complete libvirt debug logs (with appropriate log filters):
- libvirtd debug log of source host (after a failed migration from
destination to source) --
https://bugzilla.redhat.com/attachment.cgi?id=1270407
- libvirtd debug log of destination host (after a failed migration from
destination to source) --
https://bugzilla.redhat.com/attachment.cgi?id=1270406
--
/kashyap