On 2020/05/04 17:13, Michal Privoznik wrote:
> On 5/4/20 10:07 AM, Peter Krempa wrote:
>> On Fri, May 01, 2020 at 16:09:04 +0900, MIKI Nobuhiro wrote:
>>> The waiting time to acquire the lock times out, which leads to a segmentation fault.
>>
>> Could you please elaborate here? Adding this band-aid is pointless if it
>> can time out later. We do want to fix any locking issue, but without
>> more information we can't really.
>>
>>> In essence we should make improvements around locks, but as a workaround we
>>> will change the timeout to allow the user to increase it.
>>> This value was defined as 30 seconds, so use it as the default value.
>>> The logs are as follows:
>>>
>>> ```
>>> Timed out during operation: cannot acquire state change lock \
>>> (held by monitor=remoteDispatchDomainCreateWithFlags)
>>> libvirtd.service: main process exited, code=killed,status=11/SEGV
>>> ```
>>
>> Unfortunately I don't consider this a proper justification for the
>> change below. Either re-state why you want this, e.g. saying that
>> shortening the time may give users quicker feedback, but mentioning that it
>> works around a crash is not acceptable as a justification for something
>> which doesn't fix the crash.
> Agreed. Allowing users to configure the timeout makes sense - we already
> do that for other timeouts - but if it is masking a real bug we need to
> fix that first. Do you have any steps to reproduce the bug? Are you able
> to get a stack trace from the core dump?
Here is a stack trace from the core dump.
However, I tested again today on the master branch (commit
eea5d63a221a8f36a3ed5b1189fe619d4fa1fde2), and every virtual machine booted
successfully, so it seems this bug has already been fixed.
I apologize for any time you have spent on this.
```
(gdb) p mon
$1 = (qemuMonitor *) 0x7fe0dc0142e0
(gdb) p mon->msg
$2 = (qemuMonitorMessagePtr) 0x0   # I suspect mon is shared between worker threads and some thread set mon->msg = NULL
(gdb) bt
#0  qemuMonitorSend (mon=mon@entry=0x7fe0dc0142e0, msg=msg@entry=0x7fe0e3f32350) at qemu/qemu_monitor.c:981
#1  0x00007fe0d23c4428 in qemuMonitorJSONCommandWithFd (mon=0x7fe0dc0142e0, cmd=cmd@entry=0x7fe0dc014660, scm_fd=scm_fd@entry=-1, reply=reply@entry=0x7fe0e3f323e0) at qemu/qemu_monitor_json.c:333
#2  0x00007fe0d23c61cf in qemuMonitorJSONCommand (reply=0x7fe0e3f323e0, cmd=0x7fe0dc014660, mon=<optimized out>) at qemu/qemu_monitor_json.c:358
#3  qemuMonitorJSONSetCapabilities (mon=<optimized out>) at qemu/qemu_monitor_json.c:1611
#4  0x00007fe0d23b6453 in qemuMonitorSetCapabilities (mon=<optimized out>) at qemu/qemu_monitor.c:1582
#5  0x00007fe0d2394e43 in qemuProcessInitMonitor (asyncJob=QEMU_ASYNC_JOB_START, vm=0x7fe0cc028670, driver=0x7fe0801290c0) at qemu/qemu_process.c:1928
#6  qemuConnectMonitor (driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, asyncJob=asyncJob@entry=6, retry=retry@entry=false, logCtxt=logCtxt@entry=0x7fe0dc044b40) at qemu/qemu_process.c:2003
#7  0x00007fe0d239b69c in qemuProcessWaitForMonitor (logCtxt=0x7fe0dc044b40, asyncJob=6, vm=0x7fe0cc028670, driver=0x7fe0801290c0) at qemu/qemu_process.c:2413
#8  qemuProcessLaunch (conn=conn@entry=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=flags@entry=17) at qemu/qemu_process.c:6993
#9  0x00007fe0d239f8f2 in qemuProcessStart (conn=conn@entry=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=migrateFd@entry=-1, migratePath=migratePath@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17, flags@entry=1) at qemu/qemu_process.c:7230
#10 0x00007fe0d2402d59 in qemuDomainObjStart (conn=0x7fe0c4000a00, driver=driver@entry=0x7fe0801290c0, vm=0x7fe0cc028670, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7650
#11 0x00007fe0d2403436 in qemuDomainCreateWithFlags (dom=0x7fe0dc0050d0, flags=0) at qemu/qemu_driver.c:7703
#12 0x00007fe0f394f88d in virDomainCreateWithFlags (domain=domain@entry=0x7fe0dc0050d0, flags=0) at libvirt-domain.c:6600
#13 0x000055d9e00348a2 in remoteDispatchDomainCreateWithFlags (server=0x55d9e1c95140, msg=0x55d9e1cb7d10, ret=0x7fe0dc004b80, args=0x7fe0dc005110, rerr=0x7fe0e3f32c10, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:4819
#14 remoteDispatchDomainCreateWithFlagsHelper (server=0x55d9e1c95140, client=<optimized out>, msg=0x55d9e1cb7d10, rerr=0x7fe0e3f32c10, args=0x7fe0dc005110, ret=0x7fe0dc004b80) at remote/remote_daemon_dispatch_stubs.h:4797
#15 0x00007fe0f387c0d9 in virNetServerProgramDispatchCall (msg=0x55d9e1cb7d10, client=0x55d9e1cb6ce0, server=0x55d9e1c95140, prog=0x55d9e1cb3a40) at rpc/virnetserverprogram.c:435
#16 virNetServerProgramDispatch (prog=0x55d9e1cb3a40, server=server@entry=0x55d9e1c95140, client=0x55d9e1cb6ce0, msg=0x55d9e1cb7d10) at rpc/virnetserverprogram.c:302
#17 0x00007fe0f388137d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55d9e1c95140) at rpc/virnetserver.c:137
#18 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55d9e1c95140) at rpc/virnetserver.c:158
#19 0x00007fe0f37a9c31 in virThreadPoolWorker (opaque=opaque@entry=0x55d9e1c94e50) at util/virthreadpool.c:163
#20 0x00007fe0f37a9038 in virThreadHelper (data=<optimized out>) at util/virthread.c:196
#21 0x00007fe0f0d8ce65 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fe0f0ab588d in clone () from /lib64/libc.so.6
```
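[If the crash does resurface, the failure mode suspected above - one thread clearing mon->msg while another is still inside qemuMonitorSend - can be illustrated with a deliberately simplified sketch. FakeMonitor, monitor_send, and monitor_clear are invented names; the real qemuMonitor locking is considerably more involved.]

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical simplification of the shared-monitor pattern: 'msg' may
 * be cleared by another thread (e.g. an EOF or error handler), so every
 * dereference must happen under the lock with a NULL check. */
typedef struct {
    pthread_mutex_t lock;
    const char *msg;        /* stands in for mon->msg */
} FakeMonitor;

/* Returns 0 on success, -1 if msg was already cleared.
 * Without the NULL check this is exactly the SIGSEGV seen in frame #0. */
static int monitor_send(FakeMonitor *mon)
{
    int ret = -1;
    pthread_mutex_lock(&mon->lock);
    if (mon->msg) {
        printf("sending: %s\n", mon->msg);
        ret = 0;
    }
    pthread_mutex_unlock(&mon->lock);
    return ret;
}

/* What some other worker thread may do, e.g. on monitor EOF. */
static void monitor_clear(FakeMonitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->msg = NULL;
    pthread_mutex_unlock(&mon->lock);
}
```

The guarded path fails cleanly with an error return instead of dereferencing NULL, which is the behavior a real fix (rather than a longer timeout) would have to guarantee.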
>> Changes to news.xml must always be in a separate commit.
> Just a short explanation - this is to ease possible backports. For
> instance, if there is a bug fix in version X, but a distro wants to
> backport it to version X-1, then news.xml looks completely different
> there and the cherry-pick won't apply cleanly.
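[As an aside, the backport argument is easy to demonstrate with a toy repository; the file names and commit messages below are invented for illustration. A fix commit that leaves news.xml untouched cherry-picks cleanly onto a branch whose news.xml has diverged:]

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo

printf 'old code\n' > driver.c
printf 'news for X-1\n' > news.xml
git add . && git commit -qm 'base'
git branch release-X-1                 # distro maintenance branch

printf 'news for X\n' > news.xml       # news.xml diverges on master
git commit -qam 'update news for X'

printf 'fixed code\n' > driver.c       # the actual bug fix, code only
git commit -qam 'fix driver bug'
fix=$(git rev-parse HEAD)

git checkout -q release-X-1
git cherry-pick -x "$fix"              # applies cleanly: no news.xml hunk
grep -q 'fixed code' driver.c && echo 'backport ok'
```

Had the fix and the news.xml update been squashed into one commit, the cherry-pick would conflict on news.xml, since release-X-1 still carries "news for X-1".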
Thank you for your reviews.
I think this change may still be useful in other situations, so I'll
rework the patch and submit it again.