
On 3/12/21 1:44 AM, Peter Krempa wrote:
On Thu, Mar 11, 2021 at 16:47:54 -0700, Jim Fehlig wrote:
On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel
https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html
I tested your patches but notice locks are still lost on re-exec.
qemu.conf: lock_manager = "lockd"
qemu-lockd.conf: file_lockspace_dir = "/var/lib/libvirt/lockspace"
/var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After starting a VM:
# ls /var/lib/libvirt/lockspace/
a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c
# lslocks | grep lockd
virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c
virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid
# systemctl reload virtlockd
Could you make sure that the virtlockd process has the same pid before and after, so that it wasn't actually restarted by systemctl?
I thought I checked it, but apparently not...
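
The pid matters because POSIX record locks (the type lslocks reports above) belong to the process that took them: they are preserved across an execve() in the same process, but a freshly started process with a new pid holds none of them. A minimal C sketch of that ownership rule follows. It does not perform an exec; it only uses a forked child to ask the kernel which pid owns the lock. The helper names take_write_lock and lock_owner_seen_by_child are made up for illustration and are not libvirt code.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive POSIX record lock covering the whole file.
 * Returns 0 on success, -1 on failure (same as fcntl). */
static int take_write_lock(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,            /* 0 means "up to end of file" */
    };
    return fcntl(fd, F_SETLK, &fl);
}

/* Fork a child and let it query the lock with F_GETLK. The child inherits
 * the descriptor but not the lock, so it sees the real owner.
 * Returns the owning pid, 0 if the file is unlocked, -1 on error. */
static pid_t lock_owner_seen_by_child(int fd)
{
    int pipefd[2];
    pid_t owner = -1;
    pid_t child;

    if (pipe(pipefd) < 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (child == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        pid_t seen = -1;

        /* F_GETLK reports a conflicting lock, or F_UNLCK if there is none */
        if (fcntl(fd, F_GETLK, &fl) == 0)
            seen = (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
        if (write(pipefd[1], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
            _exit(1);
        _exit(0);
    }

    close(pipefd[1]);
    if (read(pipefd[0], &owner, sizeof(owner)) != (ssize_t)sizeof(owner))
        owner = -1;
    close(pipefd[0]);
    waitpid(child, NULL, 0);
    return owner;
}
```

After take_write_lock(fd) succeeds in a process, lock_owner_seen_by_child(fd) returns that process's own pid. This is the same owner pid that lslocks prints, which is why comparing the virtlockd pid before and after the reload tells a re-exec apart from a restart.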
I'm asking because in my current test I've encountered another crash when exec-restarting:
2021-03-12 08:41:31.649+0000: 2765718: error : virJSONValueToBuffer:1946 : internal error: failed to convert virJSONValue to yajl data
double free or corruption (fasttop)
Program received signal SIGABRT, Aborted.
0x00007ffff77819d5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff77819d5 in raise () at /lib64/libc.so.6
#1  0x00007ffff776a8a4 in abort () at /lib64/libc.so.6
#2  0x00007ffff77c4177 in __libc_message () at /lib64/libc.so.6
#3  0x00007ffff77cbe6c in annobin_top_check.start () at /lib64/libc.so.6
#4  0x00007ffff77cd393 in _int_free () at /lib64/libc.so.6
#5  0x00007ffff7a0b70d in g_free () at /lib64/libglib-2.0.so.0
#6  0x00007ffff7c0977f in virJSONValueFree (value=0x5555555710b0) at ../../../libvirt/src/util/virjson.c:401
#7  0x000055555555c3f2 in glib_autoptr_clear_virJSONValue (_ptr=0x5555555c4250) at ../../../libvirt/src/util/virjson.h:173
#8  glib_autoptr_cleanup_virJSONValue (_ptr=<synthetic pointer>) at ../../../libvirt/src/util/virjson.h:173
#9  virLockDaemonPreExecRestart (argv=0x7fffffffe428, dmn=<optimized out>, state_file=<optimized out>) at ../../../libvirt/src/locking/lock_daemon.c:700
#10 main (argc=<optimized out>, argv=0x7fffffffe428) at ../../../libvirt/src/locking/lock_daemon.c:1148
because looking again I'm seeing the same crash. Facepalm!
Looks like a double free. I'll post patches later for this.
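
The backtrace shows the abort inside an automatic cleanup (glib_autoptr_cleanup_virJSONValue). One common way a double free arises with that style of cleanup is that a value is handed over to a parent object while the local auto-freed pointer still points at it, so both the parent and the local variable free it. Whether that is what happened at lock_daemon.c:700 is an assumption; the thread does not say. The sketch below shows the general pattern and the usual remedy (clear the local pointer when ownership moves) with the GCC/Clang cleanup attribute. Value, value_steal, value_attach and build_state are hypothetical names, not libvirt or GLib API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a tree-shaped value; NOT libvirt's virJSONValue. */
typedef struct Value {
    const char *name;      /* points at a string literal, not owned */
    struct Value *child;   /* owned by this value */
} Value;

static int live_values;    /* allocations that have not been freed yet */

static Value *value_new(const char *name)
{
    Value *v = calloc(1, sizeof(*v));
    if (!v)
        return NULL;
    v->name = name;
    live_values++;
    return v;
}

/* Free a value together with the child it owns. NULL is accepted. */
static void value_free(Value *v)
{
    if (!v)
        return;
    value_free(v->child);
    free(v);
    live_values--;
}

/* Cleanup handler: runs when an AUTO_VALUE variable goes out of scope. */
static void value_autofree(Value **p)
{
    value_free(*p);
}
#define AUTO_VALUE __attribute__((cleanup(value_autofree))) Value *

/* Hand the pointer to a new owner and clear the local copy, so the
 * scope-exit cleanup sees NULL and does nothing. */
static Value *value_steal(Value **p)
{
    Value *v = *p;
    *p = NULL;
    return v;
}

/* The parent takes ownership of *child; the caller's pointer becomes NULL. */
static void value_attach(Value *parent, Value **child)
{
    parent->child = value_steal(child);
}

static int build_state(void)
{
    AUTO_VALUE parent = value_new("daemon");
    AUTO_VALUE child = value_new("lockspaces");

    if (!parent || !child)
        return -1;

    /* Writing "parent->child = child;" here instead would leave both
     * variables pointing at the same allocation, and it would be freed
     * twice at scope exit. */
    value_attach(parent, &child);
    return 0;
}
```

With value_attach() the child is freed exactly once, through the parent. The live_values counter is only there to make that observable: it is back at 0 after build_state() returns.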
I noticed your patches are pushed. A quick test verified all is working well now. Thanks!

Regards,
Jim