On Thu, Sep 30, 2021 at 04:17:44PM -0400, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
> > If we want to save a snapshot of a VM to a file, we used to take the
> > following steps:
> >
> > 1- stop the VM:
> > (qemu) stop
> >
> > 2- migrate the VM to a file:
> > (qemu) migrate "exec:cat > snapshot"
> >
> > 3- resume the VM:
> > (qemu) cont
> >
> > After that we can restore the snapshot with:
> > qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> > (qemu) cont
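A minimal sketch of the QMP messages that correspond to the three HMP steps above, assuming the standard `qmp_capabilities`, `stop`, `migrate` (with a `uri` argument) and `cont` commands. The helper names `qmp` and `save_to_file_commands` are made up for illustration. Unlike HMP `migrate`, QMP `migrate` returns as soon as the migration has been started, so a real client has to wait for it to finish before sending `cont`; the sketch only marks that spot with a comment.

```python
import json
import shlex

def qmp(command, **arguments):
    """Serialize one QMP command as a JSON string (hypothetical helper)."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)

def save_to_file_commands(path):
    """QMP messages mirroring the HMP stop / migrate / cont sequence."""
    return [
        qmp("qmp_capabilities"),                                # mandatory handshake
        qmp("stop"),                                            # 1- pause the vCPUs
        qmp("migrate", uri="exec:cat > " + shlex.quote(path)),  # 2- stream state to file
        # ...wait here until query-migrate reports "completed" or "failed"...
        qmp("cont"),                                            # 3- resume the guest
    ]

for message in save_to_file_commands("snapshot"):
    print(message)
```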
>
> These are the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
>
> In those cases, it seems like this new parameter could work for us: instead
> of explicitly pausing the guest prior to migrating it to disk, we would set
> this new parameter to on, then directly migrate-to-disk (relying on qemu to
> do the pause). Care will need to be taken to ensure that error recovery
> behaves the same, though.
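A rough sketch of how the two managedsave orderings would differ on the wire. It is hypothetical throughout: the helper names are made up, and it assumes the proposed "pause-vm" parameter would be set through `migrate-set-parameters`, which is a guess at an interface that only exists in the patch under discussion, not in a released QEMU.

```python
import json
import shlex

def qmp(command, **arguments):
    """Serialize one QMP command as a JSON string (hypothetical helper)."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)

def managedsave_commands(path, use_pause_vm):
    """Commands to save a guest to `path` before the QEMU process is terminated."""
    commands = []
    if use_pause_vm:
        # HYPOTHETICAL: "pause-vm" is the parameter proposed in this thread,
        # shown as a migrate-set-parameters argument; the real interface
        # (if merged) may be spelled or exposed differently.
        commands.append(qmp("migrate-set-parameters", **{"pause-vm": True}))
    else:
        # Existing flow: pause the guest explicitly before migrating.
        commands.append(qmp("stop"))
    commands.append(qmp("migrate", uri="exec:cat > " + shlex.quote(path)))
    return commands

print(managedsave_commands("snapshot", use_pause_vm=True))
print(managedsave_commands("snapshot", use_pause_vm=False))
```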

What libvirt does is actually quite different from this in a significant
way. In the HMP example here, 'migrate' is a blocking command that does
not return until migration is finished.

Libvirt uses QMP, and 'migrate' there is an asynchronous command that
merely launches the migration and returns control to the client.

IOW, what libvirt does is

   stop
   migrate
   while status not in (failed, completed)
      query-migrate
      ...also receive any QMP migration events...
      ...possibly modify migration parameters...
   cont
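A minimal Python sketch of that loop, assuming a hypothetical `qmp_execute(command, arguments=None)` callable that sends one QMP command and returns the decoded "return" object. It polls `query-migrate` only; event handling and parameter changes are left as a comment. `FakeQMP` is a made-up test double so the sketch runs without a QEMU process.

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def save_guest(qmp_execute, uri="exec:cat > snapshot", poll_interval=0.5):
    """stop / migrate / poll query-migrate / cont, as in the outline above."""
    qmp_execute("stop")
    qmp_execute("migrate", {"uri": uri})      # returns immediately in QMP
    while True:
        status = qmp_execute("query-migrate").get("status")
        if status in TERMINAL_STATES:
            break
        # A real client would also consume QMP migration events here and
        # could adjust migration parameters while the migration runs.
        time.sleep(poll_interval)
    qmp_execute("cont")
    return status

class FakeQMP:
    """Test double that replays a scripted sequence of migration states."""
    def __init__(self, states):
        self.states = list(states)
        self.log = []

    def __call__(self, command, arguments=None):
        self.log.append(command)
        if command == "query-migrate":
            return {"status": self.states.pop(0)}
        return {}

fake = FakeQMP(["setup", "active", "active", "completed"])
print(save_guest(fake, poll_interval=0))  # completed
print(fake.log)
```

Because the guest is paused before `migrate` is issued, the guest CPUs never run while the migration is active in this ordering.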

With this pattern I'm not seeing any need for a new migration parameter
for libvirt. The migration status lets us distinguish when QEMU is in
the "waiting for unplug" phase vs the "active" phase. So AFAICT, libvirt
can do:

   migrate
   while status not in (failed, completed)
      query-migrate
      ...also receive any QMP migration events...
      if status changed from wait-for-unplug to active
         stop
      ...possibly modify migration parameters...
   cont
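A sketch of this alternative ordering under the same assumptions (hypothetical `qmp_execute` callable, polling only, made-up `FakeQMP` double). It assumes QEMU spells the "waiting for unplug" phase as `wait-unplug` in `query-migrate` output. As a simplification it pauses on the first `active` it observes rather than checking for the exact transition, so it also covers guests without a failover device, which as far as I know go from `setup` straight to `active`. Polling alone can also miss the `active` window entirely if migration finishes between two polls.

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def save_guest_after_unplug(qmp_execute, uri="exec:cat > snapshot", poll_interval=0.5):
    """Start migrating with the guest running; pause once unplug is done."""
    qmp_execute("migrate", {"uri": uri})
    paused = False
    while True:
        status = qmp_execute("query-migrate").get("status")
        if status in TERMINAL_STATES:
            break
        # Assumed: "wait-unplug" is reported while a failover device is
        # being unplugged, so the first "active" means that phase is over
        # (or never happened) and the vCPUs can be paused now.
        if status == "active" and not paused:
            qmp_execute("stop")
            paused = True
        time.sleep(poll_interval)
    qmp_execute("cont")
    return status, paused

class FakeQMP:
    """Test double that replays a scripted sequence of migration states."""
    def __init__(self, states):
        self.states = list(states)
        self.log = []

    def __call__(self, command, arguments=None):
        self.log.append(command)
        if command == "query-migrate":
            return {"status": self.states.pop(0)}
        return {}

fake = FakeQMP(["setup", "wait-unplug", "wait-unplug", "active", "active", "completed"])
print(save_guest_after_unplug(fake, poll_interval=0))  # ('completed', True)
print(fake.log)
```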

There is a small window here when the guest CPUs are running
but migration is active. In most cases for libvirt that is
harmless. If there are cases where libvirt needs a strong
guarantee to synchronize the 'stop' with some other operation,
then the newly proposed "pause-vm" parameter has the same problem,
as libvirt can't synchronize against that either.

> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea how common/important that is
> (or even if my assessment of the code is correct), I'm Cc'ing this patch to
> libvir-list to make sure it catches the attention of someone who knows the
> answers and implications.

IIUC, the problem with unplug only happens when libvirt pauses
the guest. So surely if there are some scenarios where we're not
pausing the guest, there's no problem to solve for those.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|