[libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert

When reverting a running domain to a snapshot (active state), we need to use
the FORCE flag for snapshot-revert if the current domain configuration differs
from the target domain configuration, and this will start a new qemu instance
for the target domain.

In this situation, if there is an existing connection to the domain, say Spice
or VNC through virt-manager, then libvirtd crashes during the snapshot revert.
Both the snapshot revert worker and the new worker job
'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished in
qemuMonitorSend(). If I/O processing results in an error with a message
pending, the libvirtd main thread calls qemuMonitorIO() to wake up the
waiters. Once the OpenGraphicsFd worker is woken up, qemuMonitorSend() sets
mon->msg to NULL, and the snapshot revert worker then dereferences this null
pointer. Not sure whether this scenario makes sense, but at least libvirtd
should not crash, so fix it.

Thread 6 "libvirtd" hit Breakpoint 1, qemuMonitorSend
987         if (mon->lastError.code != VIR_ERR_OK) {
(gdb) bt
#0  in qemuMonitorSend
#1  in qemuMonitorJSONCommandWithFd
#2  in qemuMonitorJSONSendFileHandle
#3  in qemuMonitorSendFileHandle
#4  in qemuMonitorOpenGraphics
#5  in qemuDomainOpenGraphicsFD
#6  in virDomainOpenGraphicsFD
#7  in remoteDispatchDomainOpenGraphicsFd
#8  in remoteDispatchDomainOpenGraphicsFdHelper
#9  in virNetServerProgramDispatchCall
#10 in virNetServerProgramDispatch
#11 in virNetServerProcessMsg
#12 in virNetServerHandleJob
#13 in virThreadPoolWorker
#14 in virThreadHelper
#15 in start_thread
#16 in clone
(gdb) c
Continuing.

Thread 2 "libvirtd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fbd20fe2700 (LWP 19014)]
0x00007fbd1c3be838 in qemuMonitorSend (mon=mon@entry=0x7fbd1804a940,
    msg=msg@entry=0x7fbd20fe15b0) at ../../src/qemu/qemu_monitor.c:1001
1001        while (!mon->msg->finished) {
(gdb) bt
#0  in qemuMonitorSend
#1  in qemuMonitorJSONCommandWithFd
#2  in qemuMonitorJSONCommand
#3  in qemuMonitorJSONGetChardevInfo
#4  in qemuMonitorGetChardevInfo
#5  in qemuProcessWaitForMonitor
#6  in qemuProcessLaunch
#7  in qemuProcessStart
#8  in qemuDomainRevertToSnapshot
#9  in virDomainRevertToSnapshot
#10 in remoteDispatchDomainRevertToSnapshot
#11 in remoteDispatchDomainRevertToSnapshotHelper
#12 in virNetServerProgramDispatchCall
#13 in virNetServerProgramDispatch
#14 in virNetServerProcessMsg
#15 in virNetServerHandleJob
#16 in virThreadPoolWorker
#17 in virThreadHelper
#18 in start_thread
#19 in clone

Signed-off-by: Lin Ma <lma@suse.com>
---
 src/qemu/qemu_monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index ea3e62dc8e..a8344e698b 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
               "mon=%p msg=%s fd=%d",
               mon, mon->msg->txBuffer, mon->msg->txFD);
 
-    while (!mon->msg->finished) {
+    while (mon->msg && !mon->msg->finished) {
         if (virCondWait(&mon->notify, &mon->parent.lock) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("Unable to wait on monitor condition"));
-- 
2.23.0

On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
> When reverting a running domain to a snapshot (active state), we need to
> use the FORCE flag for snapshot-revert if the current domain configuration
> differs from the target domain configuration, and this will start a new
> qemu instance for the target domain.
>
> In this situation, if there is an existing connection to the domain, say
> Spice or VNC through virt-manager, then libvirtd crashes during the
> snapshot revert. Both the snapshot revert worker and the new worker job
> 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished in
> qemuMonitorSend(). If I/O processing results in an error with a message
> pending, the libvirtd main thread calls qemuMonitorIO() to wake up the
> waiters. Once the OpenGraphicsFd worker is woken up, qemuMonitorSend()
> sets mon->msg to NULL, and the snapshot revert worker then dereferences
> this null pointer.

[....]

> Signed-off-by: Lin Ma <lma@suse.com>
> ---
>  src/qemu/qemu_monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index ea3e62dc8e..a8344e698b 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
>                "mon=%p msg=%s fd=%d",
>                mon, mon->msg->txBuffer, mon->msg->txFD);
>
> -    while (!mon->msg->finished) {
> +    while (mon->msg && !mon->msg->finished) {
This fixes only the symptom. The actual problem is in the handling of our job
state when restarting the qemu process.

Please see the following patches, which aim to fix the same problem:

https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html

On 12/13/19 8:55 AM, Peter Krempa wrote:
> On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
>> When reverting a running domain to a snapshot (active state), we need to
>> use the FORCE flag for snapshot-revert if the current domain
>> configuration differs from the target domain configuration, and this
>> will start a new qemu instance for the target domain.
>>
>> In this situation, if there is an existing connection to the domain, say
>> Spice or VNC through virt-manager, then libvirtd crashes during the
>> snapshot revert. Both the snapshot revert worker and the new worker job
>> 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
>> in qemuMonitorSend(). If I/O processing results in an error with a
>> message pending, the libvirtd main thread calls qemuMonitorIO() to wake
>> up the waiters. Once the OpenGraphicsFd worker is woken up,
>> qemuMonitorSend() sets mon->msg to NULL, and the snapshot revert worker
>> then dereferences this null pointer.

[....]

>> -    while (!mon->msg->finished) {
>> +    while (mon->msg && !mon->msg->finished) {
>
> This fixes only the symptom. The actual problem is in the handling of our
> job state when restarting the qemu process.
>
> Please see the following patches, which aim to fix the same problem:
>
> https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html
In fact, I've pushed the patch yesterday:

d75f865fb9 qemu: fix concurrency crash bug in snapshot revert

Does it fix the problem you're seeing?

Michal

-----Original Message-----
From: Michal Privoznik <mprivozn@redhat.com>
Sent: December 13, 2019 17:30
To: Peter Krempa <pkrempa@redhat.com>; Lin Ma <LMa@suse.com>
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
> On 12/13/19 8:55 AM, Peter Krempa wrote:
>> On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
>>> When reverting a running domain to a snapshot (active state), we need
>>> to use the FORCE flag for snapshot-revert if the current domain
>>> configuration differs from the target domain configuration, and this
>>> will start a new qemu instance for the target domain.
>>>
>>> In this situation, if there is an existing connection to the domain,
>>> say Spice or VNC through virt-manager, then libvirtd crashes during
>>> the snapshot revert. Both the snapshot revert worker and the new
>>> worker job 'remoteDispatchDomainOpenGraphicsFd' are waiting for
>>> mon->msg->finished in qemuMonitorSend(). If I/O processing results in
>>> an error with a message pending, the libvirtd main thread calls
>>> qemuMonitorIO() to wake up the waiters. Once the OpenGraphicsFd worker
>>> is woken up, qemuMonitorSend() sets mon->msg to NULL, and the snapshot
>>> revert worker then dereferences this null pointer.

[....]

>>> -    while (!mon->msg->finished) {
>>> +    while (mon->msg && !mon->msg->finished) {
>>
>> This fixes only the symptom. The actual problem is in the handling of
>> our job state when restarting the qemu process.
>>
>> Please see the following patches, which aim to fix the same problem:
>>
>> https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html
>
> In fact, I've pushed the patch yesterday:
>
> d75f865fb9 qemu: fix concurrency crash bug in snapshot revert
>
> Does it fix the problem you're seeing?
Yes, it does fix the problem.

Thanks,
Lin

-----Original Message-----
From: Peter Krempa <pkrempa@redhat.com>
Sent: December 13, 2019 15:56
To: Lin Ma <LMa@suse.com>
Cc: libvir-list@redhat.com; berrange@redhat.com
Subject: Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
> On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
>> When reverting a running domain to a snapshot (active state), we need to
>> use the FORCE flag for snapshot-revert if the current domain
>> configuration differs from the target domain configuration, and this
>> will start a new qemu instance for the target domain.
>>
>> In this situation, if there is an existing connection to the domain, say
>> Spice or VNC through virt-manager, then libvirtd crashes during the
>> snapshot revert. Both the snapshot revert worker and the new worker job
>> 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
>> in qemuMonitorSend(). If I/O processing results in an error with a
>> message pending, the libvirtd main thread calls qemuMonitorIO() to wake
>> up the waiters. Once the OpenGraphicsFd worker is woken up,
>> qemuMonitorSend() sets mon->msg to NULL, and the snapshot revert worker
>> then dereferences this null pointer.

[....]

>> -    while (!mon->msg->finished) {
>> +    while (mon->msg && !mon->msg->finished) {
>
> This fixes only the symptom. The actual problem is in the handling of our
> job state when restarting the qemu process.
>
> Please see the following patches, which aim to fix the same problem:
>
> https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html
Indeed, it fixes the root cause. Thanks for the information.

Lin