Hi developers,
I have recently been looking into migration with non-shared disks (flags
VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC).
As we know, non-shared disk migration uses block jobs to copy the disk
images from the source host to the destination host. Here are my questions
about non-shared disk migration:
q1. Does the bandwidth param of virDomainMigrate3 also set the bandwidth of
the block jobs?
q2. Does virDomainMigrateSetMaxSpeed set the bandwidth of the block jobs?
q3. Does the domain job abort API, virDomainAbortJob, stop the block job of
a non-shared disk migration?
q4. Can the block job bandwidth API, virDomainBlockJobSetSpeed, set the
bandwidth of the block job of a non-shared disk migration?
q5. Can the block job abort API, virDomainBlockJobAbort, stop the block job
of a non-shared disk migration?
I tested these with libvirt-8.4.0-1.el9.x86_64 and
qemu-kvm-7.0.0-4.el9.x86_64, with the following results:
q1: The bandwidth limit of virDomainMigrate3 also applies to the block job:
➜ ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p
--tls --tls-destination hhan-rhel9--1 --copy-storage-all --disks-uri
tcp://hhan-rhel9--1:49156 --bandwidth 2
➜ ~ virsh blockjob OVMF vda
Block Copy: [ 0 %] Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)
q2: virDomainMigrateSetMaxSpeed doesn't change the bandwidth of the block
jobs:
➜ ~ virsh migrate-setspeed OVMF 8
➜ ~ virsh blockjob OVMF vda
Block Copy: [ 9 %] Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)
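To make the q1/q2 comparison repeatable, the limit can be read out of the
virsh blockjob output with a small script. This is only a sketch: the helper
names blockjob_limit_bytes and limit_matches are mine, and it assumes the
output keeps the "Bandwidth limit: N bytes/s" format shown above and that
the migration bandwidth is given in MiB/s.

```python
import re

# Matches the "Bandwidth limit: N bytes/s" part of the virsh blockjob
# output quoted above. The output format is assumed, not guaranteed.
_LIMIT_RE = re.compile(r"Bandwidth limit:\s*(\d+)\s*bytes/s")

MIB = 1024 * 1024


def blockjob_limit_bytes(output):
    """Return the block job bandwidth limit in bytes/s, or None if absent."""
    match = _LIMIT_RE.search(output)
    return int(match.group(1)) if match else None


def limit_matches(output, expected_mib_per_s):
    """True if the reported limit equals the expected value in MiB/s."""
    return blockjob_limit_bytes(output) == expected_mib_per_s * MIB


# The two lines captured in q1 and q2 above.
after_migrate = "Block Copy: [ 0 %] Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)"
after_setspeed = "Block Copy: [ 9 %] Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)"
```

With the two captured lines, limit_matches(after_migrate, 2) holds, while
limit_matches(after_setspeed, 8) does not, which is the q2 observation.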
q3: virDomainAbortJob stops the block job of a non-shared disk migration.
Start the migration:
➜ ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p
--tls --tls-destination hhan-rhel9--1 --copy-storage-all --disks-uri
tcp://hhan-rhel9--1:49156 --bandwidth 2
Then start virsh event in another terminal:
➜ ~ virsh event --loop --all
Abort the domain job:
➜ ~ virsh domjobabort OVMF
The error "error: operation aborted: migration out: canceled by client"
appears in the terminal running "virsh migrate".
The terminal running "virsh event" shows that the block job has failed:
event 'block-job' for domain 'OVMF': Block Copy for
/var/lib/libvirt/images/OVMF.qcow2 failed
event 'block-job-2' for domain 'OVMF': Block Copy for vda failed
q4: The block job bandwidth of non-shared disk migration cannot be set by
virDomainBlockJobSetSpeed:
➜ ~ virsh blockjob OVMF vda --bandwidth 10
error: Timed out during operation: cannot acquire state change lock (held
by monitor=remoteDispatchDomainMigratePerform3Params)
q5: The block job of non-shared disk migration cannot be aborted by
virDomainBlockJobAbort:
➜ ~ virsh blockjob OVMF vda --abort
error: Timed out during operation: cannot acquire state change lock (held
by monitor=remoteDispatchDomainMigratePerform3Params)
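For anyone who wants to repeat q2 and q4 from the API side rather than
through virsh, a rough sketch with the libvirt-python bindings might look
like this. The function name probe_blockjob_controls and the result keys are
mine. It assumes dom is a virDomain object for a guest whose storage
migration is in progress, and it catches Exception only so that the snippet
does not have to import libvirt (the bindings raise libvirt.libvirtError).
I left virDomainBlockJobAbort out on purpose, because a successful call
would cancel the copy.

```python
def probe_blockjob_controls(dom, disk="vda", mib_per_s=10):
    """Try the q2 and q4 calls on a domain and record what happens.

    dom is expected to behave like a libvirt-python virDomain object.
    Bandwidth values are in MiB/s, the default unit when flags is 0.
    """
    calls = {
        # q2: virDomainMigrateSetMaxSpeed
        "migrateSetMaxSpeed": lambda: dom.migrateSetMaxSpeed(mib_per_s, 0),
        # q4: virDomainBlockJobSetSpeed
        "blockJobSetSpeed": lambda: dom.blockJobSetSpeed(disk, mib_per_s, 0),
    }
    results = {}
    for name, call in calls.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # libvirt.libvirtError in real use
            results[name] = "error: %s" % exc

    # Read back the limit that the block job reports after both attempts.
    info = dom.blockJobInfo(disk, 0)
    results["limit_mib_per_s"] = info.get("bandwidth") if info else None
    return results
```

On the setup above I would expect "ok" for migrateSetMaxSpeed, an error for
blockJobSetSpeed, and an unchanged limit of 2, but that is simply the virsh
results restated, not something I have run through the bindings.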
Are the results above expected?
Here are my personal thoughts:
For the bandwidth in q1 and q2, both are documented as migration bandwidth (
https://gitlab.com/libvirt/libvirt/-/blob/master/include/libvirt/libvirt-...
,
https://gitlab.com/libvirt/libvirt/-/blob/master/src/libvirt-domain.c#L9696
), but one applies to block jobs while the other doesn't. So the comments
should state clearly whether they cover only the VM migration bandwidth or
also the bandwidth of the migration's block jobs. In addition, a flag could
be added to virDomainMigrateMaxSpeedFlags to support setting the bandwidth
of the block jobs in a migration.
For q4 and q5, if we do not intend to support changing the block job of a
non-shared disk migration through the block job APIs, we should note that
in the migration doc or the block job doc, to make clear how this type of
block job differs from the others.