On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
> When reverting a running domain to a snapshot (active state), we need to
> use the FORCE flag for snapshot-revert if the current domain configuration
> is different from the target domain configuration, and this starts a
> new qemu instance for the target domain.
>
> In this situation, if there is an existing connection to the domain, say
> Spice or VNC through virt-manager, then libvirtd crashes during
> snapshot revert, because both the snapshot revert worker and the new worker
> job 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
> in qemuMonitorSend(). If I/O processing results in an error with a
> message, the libvirtd main thread calls qemuMonitorIO() to wake up the waiter.
> mon->msg is then set to NULL in qemuMonitorSend() once the GraphicsFD
> worker is woken up, which causes the snapshot revert worker to dereference
> this NULL pointer.
[....]
> Signed-off-by: Lin Ma <lma(a)suse.com>
> ---
> src/qemu/qemu_monitor.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index ea3e62dc8e..a8344e698b 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
> "mon=%p msg=%s fd=%d",
> mon, mon->msg->txBuffer, mon->msg->txFD);
>
> - while (!mon->msg->finished) {
> + while (mon->msg && !mon->msg->finished) {

This fixes only the symptom. The actual problem is in the handling of our
job state when restarting the qemu process.
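To make the symptom-versus-cause distinction concrete, here is a minimal, generic C sketch (POSIX threads, build with `-pthread`) of a monitor with a single in-flight message slot. Every name in it (`struct monitor`, `monitor_send`, `fake_reply`, the `job` mutex, `run_demo`) is hypothetical and far simpler than libvirt's qemuMonitor; it is not what d75f865fb9 does. It only illustrates the invariant that job serialization is meant to provide: while one sender waits on mon->msg, no other sender may replace or clear it. Without that guarantee, a waiter can wake up to find mon->msg pointing at someone else's message or already NULL, and a NULL check in the wait loop merely hides that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; NOT libvirt's qemuMonitor. */
struct msg {
    bool finished;              /* set by the "I/O side" when a reply arrives */
};

struct monitor {
    pthread_mutex_t lock;       /* protects msg and msg->finished */
    pthread_cond_t notify;      /* broadcast when a message completes */
    pthread_mutex_t job;        /* hypothetical job: one sender at a time */
    struct msg *msg;            /* the single in-flight message slot */
};

/* Stands in for the I/O thread: completes the in-flight message and
 * wakes every waiter. */
static void *fake_reply(void *opaque)
{
    struct monitor *mon = opaque;

    pthread_mutex_lock(&mon->lock);
    if (mon->msg)
        mon->msg->finished = true;
    pthread_cond_broadcast(&mon->notify);
    pthread_mutex_unlock(&mon->lock);
    return NULL;
}

/* Send one message and wait for its reply. Holding mon->job for the whole
 * exchange keeps a second sender from replacing or clearing mon->msg while
 * we wait. Without it, two senders can race: one of them may wake up to
 * find mon->msg replaced or already NULL. */
static int monitor_send(struct monitor *mon)
{
    struct msg m = { .finished = false };
    pthread_t replier;

    pthread_mutex_lock(&mon->job);
    pthread_mutex_lock(&mon->lock);
    mon->msg = &m;

    if (pthread_create(&replier, NULL, fake_reply, mon) != 0) {
        mon->msg = NULL;
        pthread_mutex_unlock(&mon->lock);
        pthread_mutex_unlock(&mon->job);
        return -1;
    }

    /* Re-check after every wakeup (broadcasts, spurious wakeups). */
    while (!mon->msg->finished)
        pthread_cond_wait(&mon->notify, &mon->lock);

    /* The invariant the job provides: the slot still holds our message. */
    assert(mon->msg == &m);
    mon->msg = NULL;

    pthread_mutex_unlock(&mon->lock);
    pthread_join(replier, NULL);
    pthread_mutex_unlock(&mon->job);
    return 0;
}

static void *sender(void *opaque)
{
    return (void *)(intptr_t)monitor_send(opaque);
}

/* Run two concurrent senders; returns how many completed successfully. */
int run_demo(void)
{
    struct monitor mon;
    pthread_t t[2];
    int created = 0, ok = 0;

    pthread_mutex_init(&mon.lock, NULL);
    pthread_cond_init(&mon.notify, NULL);
    pthread_mutex_init(&mon.job, NULL);
    mon.msg = NULL;

    for (int i = 0; i < 2; i++)
        if (pthread_create(&t[created], NULL, sender, &mon) == 0)
            created++;

    for (int i = 0; i < created; i++) {
        void *ret;
        pthread_join(t[i], &ret);
        if ((intptr_t)ret == 0)
            ok++;
    }

    pthread_mutex_destroy(&mon.job);
    pthread_cond_destroy(&mon.notify);
    pthread_mutex_destroy(&mon.lock);
    return ok;
}
```

Called from a small main(), run_demo() should return 2 when both sender threads can be created. The point of the design is that correctness comes from who may touch the slot, not from checking it: with `while (mon->msg && !mon->msg->finished)` and no serialization, the losing sender would simply stop waiting and return as if its own message had completed.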

Please see the following patches, which aim to fix the same problem:
https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html

In fact, I pushed the patch yesterday:
d75f865fb9 qemu: fix concurrency crash bug in snapshot revert

Does it fix the problem you're seeing?

Michal