Please fix your address book, it's 'libvir-list@redhat.com' not
'libvirt-list@redhat.com'.
On Tue, Sep 21, 2021 at 00:52:57 +0800, wangjie (P) wrote:
> Bug reproduction process:
> 1. Perform migrateToURI3.
> 2. Kill libvirtd when it enters the memory migration phase, and restart libvirtd.
I presume this is a reproducer and not a normal approach.
> 3. Perform migrateToURI3 again; every subsequent migrateToURI3 will fail
> with the error message "Requested operation is not valid: domain has active
> block job".
> I found the reason that triggers the bug, as follows:
> 1. The qemuBlockJobData is not persisted across a libvirtd restart, so the job
> returned from qemuBlockJobDiskGetJob will always be NULL and the
> qemuMigrationSrcNBDCopyCancel path will not be taken.
> 2. Call trace:
> qemuProcessReconnect
>   -> qemuProcessRecoverJob
>     -> qemuProcessRecoverMigrationOut
>       -> qemuMigrationSrcCancel
> 3. Code as follows:
> qemuMigrationSrcCancel(virQEMUDriver *driver,
>                        virDomainObj *vm)
> {
>     ...
>     for (i = 0; i < vm->def->ndisks; i++) {
>         virDomainDiskDef *disk = vm->def->disks[i];
>         qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>         qemuBlockJobData *job;
>
>         if (!(job = qemuBlockJobDiskGetJob(disk)) || // the job is always NULL !!!
>             !qemuBlockJobIsRunning(job))
I'll have a look. The blockjob data should have been recovered at this
point. There's a possibility that it's just a wrong ordering of the function
calls.
>             diskPriv->migrating = false;
>
>         if (diskPriv->migrating) {
>             qemuBlockJobSyncBegin(job);
>             storage = true;
>         }
>
>         virObjectUnref(job);
>     }
>     ...
>     if (storage &&
>         qemuMigrationSrcNBDCopyCancel(driver, vm, true,
>                                       QEMU_ASYNC_JOB_NONE, NULL) < 0)
>         return -1;
>     ...
> }
Next time please file an issue in the upstream bug tracker.