On 2020/05/04 17:13, Michal Privoznik wrote:
> On 5/4/20 10:07 AM, Peter Krempa wrote:
>> On Fri, May 01, 2020 at 16:09:04 +0900, MIKI Nobuhiro wrote:
>>> The waiting time to acquire the lock times out, which leads to a segmentation fault.
>>
>> Could you please elaborate here? Adding this band-aid is pointless if it
>> can time out later. We do want to fix any locking issue, but without
>> more information we can't really.
>>
>>> In essence we should make improvements around locks, but as a workaround we
>>> will change the timeout to allow the user to increase it.
>>> This value was defined as 30 seconds, so use it as the default value.
>>> The logs are as follows:
>>>
>>> ```
>>> Timed out during operation: cannot acquire state change lock \
>>> (held by monitor=remoteDispatchDomainCreateWithFlags)
>>> libvirtd.service: main process exited, code=killed,status=11/SEGV
>>> ```
>>
>> Unfortunately I don't consider this a proper justification for the
>> change below. Either re-state why you want this, e.g. saying that
>> shortening the time may give users quicker feedback, but mentioning that it
>> works around a crash is not acceptable as a justification for something
>> which doesn't fix the crash.
> Agreed. Allowing users to configure the timeout makes sense - we already
> do that for other timeouts - but if it is masking a real bug we need to
> fix that first. Do you have any steps to reproduce the bug? Are you able
> to get a stack trace from the core dump?
Here is a stack trace from the core dump.
However, I tested again today on the master branch (commit
eea5d63a221a8f36a3ed5b1189fe619d4fa1fde2), and every virtual machine booted
successfully, so it seems this bug has already been fixed.
I apologize for any time you have spent on this.
```
(gdb) p mon
$1 = (qemuMonitor *) 0x7fe0dc0142e0
(gdb) p mon->msg
$2 = (qemuMonitorMessagePtr) 0x0   # I suspect mon is shared between worker threads and some thread set mon->msg = NULL
(gdb) bt
#0  qemuMonitorSend (mon=mon@entry=0x7fe0dc0142e0, msg=msg@entry=0x7fe0e3f32350) at qemu/qemu_monitor.c:981
#1  0x00007fe0d23c4428 in qemuMonitorJSONCommandWithFd (mon=0x7fe0dc0142e0, cmd=cmd@entry=0x7fe0dc014660, scm_fd=scm_fd@entry=-1, reply=reply@entry=0x7fe0e3f323e0) at qemu/qemu_monitor_json.c:333
#2  0x00007fe0d23c61cf in qemuMonitorJSONCommand (reply=0x7fe0e3f323e0, cmd=0x7fe0dc014660, mon=<optimized out>) at qemu/qemu_monitor_json.c:358
#3  qemuMonitorJSONSetCapabilities (mon=<optimized out>) at qemu/qemu_monitor_json.c:1611
#4  0x00007fe0d23b6453 in qemuMonitorSetCapabilities (mon=<optimized out>) at qemu/qemu_monitor.c:1582
#5  0x00007fe0d2394e43 in qemuProcessInitMonitor (asyncJob=QEMU_ASYNC_JOB_START, vm=0x7fe0cc028670, driver=0x7fe0801290c0) at qemu/qemu_process.c:1928
#6  qemuConnectMonitor (driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, asyncJob=asyncJob@entry=6, retry=retry@entry=false, logCtxt=logCtxt@entry=0x7fe0dc044b40) at qemu/qemu_process.c:2003
#7  0x00007fe0d239b69c in qemuProcessWaitForMonitor (logCtxt=0x7fe0dc044b40, asyncJob=6, vm=0x7fe0cc028670, driver=0x7fe0801290c0) at qemu/qemu_process.c:2413
#8  qemuProcessLaunch (conn=conn@entry=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=flags@entry=17) at qemu/qemu_process.c:6993
#9  0x00007fe0d239f8f2 in qemuProcessStart (conn=conn@entry=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=migrateFd@entry=-1, migratePath=migratePath@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17, flags@entry=1) at qemu/qemu_process.c:7230
#10 0x00007fe0d2402d59 in qemuDomainObjStart (conn=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=0x7fe0cc028670, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7650
#11 0x00007fe0d2403436 in qemuDomainCreateWithFlags (dom=0x7fe0dc0050d0, flags=0) at qemu/qemu_driver.c:7703
#12 0x00007fe0f394f88d in virDomainCreateWithFlags (domain=domain@entry=0x7fe0dc0050d0, flags=0) at libvirt-domain.c:6600
#13 0x000055d9e00348a2 in remoteDispatchDomainCreateWithFlags (server=0x55d9e1c95140, msg=0x55d9e1cb7d10, ret=0x7fe0dc004b80, args=0x7fe0dc005110, rerr=0x7fe0e3f32c10, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:4819
#14 remoteDispatchDomainCreateWithFlagsHelper (server=0x55d9e1c95140, client=<optimized out>, msg=0x55d9e1cb7d10, rerr=0x7fe0e3f32c10, args=0x7fe0dc005110, ret=0x7fe0dc004b80) at remote/remote_daemon_dispatch_stubs.h:4797
#15 0x00007fe0f387c0d9 in virNetServerProgramDispatchCall (msg=0x55d9e1cb7d10, client=0x55d9e1cb6ce0, server=0x55d9e1c95140, prog=0x55d9e1cb3a40) at rpc/virnetserverprogram.c:435
#16 virNetServerProgramDispatch (prog=0x55d9e1cb3a40, server=server@entry=0x55d9e1c95140, client=0x55d9e1cb6ce0, msg=0x55d9e1cb7d10) at rpc/virnetserverprogram.c:302
#17 0x00007fe0f388137d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55d9e1c95140) at rpc/virnetserver.c:137
#18 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55d9e1c95140) at rpc/virnetserver.c:158
#19 0x00007fe0f37a9c31 in virThreadPoolWorker (opaque=opaque@entry=0x55d9e1c94e50) at util/virthreadpool.c:163
#20 0x00007fe0f37a9038 in virThreadHelper (data=<optimized out>) at util/virthread.c:196
#21 0x00007fe0f0d8ce65 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fe0f0ab588d in clone () from /lib64/libc.so.6
```
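[If the crash does resurface, the failure mode suspected above - one thread clearing mon->msg while another is still inside qemuMonitorSend - can be illustrated with a deliberately simplified sketch. FakeMonitor, monitor_send, and monitor_clear are invented names; the real qemuMonitor locking is considerably more involved.]

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical simplification of the shared-monitor pattern: 'msg' may
 * be cleared by another thread (e.g. an EOF or error handler), so every
 * dereference must happen under the lock with a NULL check. */
typedef struct {
    pthread_mutex_t lock;
    const char *msg;        /* stands in for mon->msg */
} FakeMonitor;

/* Returns 0 on success, -1 if msg was already cleared.
 * Without the NULL check this is exactly the SIGSEGV seen in frame #0. */
static int monitor_send(FakeMonitor *mon)
{
    int ret = -1;
    pthread_mutex_lock(&mon->lock);
    if (mon->msg) {
        printf("sending: %s\n", mon->msg);
        ret = 0;
    }
    pthread_mutex_unlock(&mon->lock);
    return ret;
}

/* What some other worker thread may do, e.g. on monitor EOF. */
static void monitor_clear(FakeMonitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->msg = NULL;
    pthread_mutex_unlock(&mon->lock);
}
```

The guarded path fails cleanly with an error return instead of dereferencing NULL, which is the behavior a real fix (rather than a longer timeout) would have to guarantee.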
>> Changes to news.xml must always be in a separate commit.
> Just a short explanation - this is to ease possible backports. For
> instance, if there is a bug fix in version X, but a distro wants to
> backport it to version X-1, then news.xml looks completely different
> there and the cherry-pick won't apply cleanly.
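[As an aside, the backport argument is easy to demonstrate with a toy repository; the file names and commit messages below are invented for illustration. A fix commit that leaves news.xml untouched cherry-picks cleanly onto a branch whose news.xml has diverged:]

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo

printf 'old code\n' > driver.c
printf 'news for X-1\n' > news.xml
git add . && git commit -qm 'base'
git branch release-X-1                 # distro maintenance branch

printf 'news for X\n' > news.xml       # news.xml diverges on master
git commit -qam 'update news for X'

printf 'fixed code\n' > driver.c       # the actual bug fix, code only
git commit -qam 'fix driver bug'
fix=$(git rev-parse HEAD)

git checkout -q release-X-1
git cherry-pick -x "$fix"              # applies cleanly: no news.xml hunk
grep -q 'fixed code' driver.c && echo 'backport ok'
```

Had the fix and the news.xml update been squashed into one commit, the cherry-pick would conflict on news.xml, since release-X-1 still carries "news for X-1".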
Thank you for your reviews.
I think this change may still be useful in other situations, so I'll
rework the patch and submit it again.