On Mon, Jun 06, 2022 at 15:37:45 +0200, Peter Krempa wrote:
> On Wed, Jun 01, 2022 at 14:50:21 +0200, Jiri Denemark wrote:
>> QEMU keeps guest CPUs running even in postcopy-paused migration state so
>> that processes that already have all memory pages they need migrated to
>> the destination can keep running. However, this behavior might bring
>> unexpected delays in interprocess communication as some processes will
>> be stopped until migration is recovered and their memory pages migrated.
>> So let's make sure all guest CPUs are paused while postcopy migration is
>> paused.
>> ---
>>
>> Notes:
>>     Version 2:
>>     - new patch
>>
>>     - this patch does not currently work as QEMU cannot handle the "stop"
>>       QMP command while in postcopy-paused state... the monitor just
>>       hangs (see https://gitlab.com/qemu-project/qemu/-/issues/1052 )
>
> Does it then somehow self-heal? Because if not ...
>
>> +    } else if (state == VIR_DOMAIN_PAUSED) {
>> +        qemuProcessStopCPUs(driver, vm, reason, asyncJob);
>
> Then this will obviously break our ability to control QEMU. If that is
> forever, then we certainly should not be doing this.
>
> In that case, if we want to go ahead with pausing the CPUs ourselves once
> QEMU fixes the issue you've mentioned above, QEMU also needs to add a
> 'feature' flag to QMP which we can probe, so that we avoid knowingly
> breaking QEMU.

Exactly. We either need QEMU to stop the CPUs by itself or fix the bug
and add a way for us to probe it was fixed. Currently our code would
just hang waiting for QEMU reply.

Because of this, pushing the series without this RFC patch (until the
QEMU issue is sorted out in some way) is actually better than keeping
the current "always pause, even if QEMU is still migrating" behavior,
since with the current code we may get stuck after sending "stop" while
QEMU migration is in postcopy-paused state.

Jirka