[PATCH 0/2] Fix exec-restart of virtlogd and virtlockd

Peter Krempa (2): virnetdaemon: Introduce virNetDaemonQuitExecRestart virtlo(g|ck)d: Fix exec-restart src/libvirt_remote.syms | 1 + src/locking/lock_daemon.c | 2 +- src/logging/log_daemon.c | 2 +- src/rpc/virnetdaemon.c | 19 +++++++++++++++++++ src/rpc/virnetdaemon.h | 1 + 5 files changed, 23 insertions(+), 2 deletions(-) -- 2.29.2

Recent changes which meant to fix daemon shutdown broke the exec-restart capability of virtlogd and virtlockd, since the code actually closed all the sockets and shut down all the internals. Add virNetDaemonQuitExecRestart, which requests a shutdown of the process, but keeps all the services open and registered since they are preserved across the restart. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt_remote.syms | 1 + src/rpc/virnetdaemon.c | 19 +++++++++++++++++++ src/rpc/virnetdaemon.h | 1 + 3 files changed, 21 insertions(+) diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 3cd84a0854..605271bcb2 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -85,6 +85,7 @@ virNetDaemonNew; virNetDaemonNewPostExecRestart; virNetDaemonPreExecRestart; virNetDaemonQuit; +virNetDaemonQuitExecRestart; virNetDaemonRemoveShutdownInhibition; virNetDaemonRun; virNetDaemonSetShutdownCallbacks; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 327540c4b4..cac60ef488 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -76,6 +76,7 @@ struct _virNetDaemon { bool quit; bool finished; bool graceful; + bool execRestart; unsigned int autoShutdownTimeout; size_t autoShutdownInhibitions; @@ -857,6 +858,10 @@ virNetDaemonRun(virNetDaemonPtr dmn) virHashForEach(dmn->servers, daemonServerProcessClients, NULL); + /* don't shutdown services when performing an exec-restart */ + if (dmn->quit && dmn->execRestart) + goto cleanup; + if (dmn->quit && dmn->finishTimer == -1) { virHashForEach(dmn->servers, daemonServerClose, NULL); if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0) @@ -912,6 +917,20 @@ virNetDaemonQuit(virNetDaemonPtr dmn) virObjectUnlock(dmn); } + +void +virNetDaemonQuitExecRestart(virNetDaemonPtr dmn) +{ + virObjectLock(dmn); + + VIR_DEBUG("Exec-restart requested %p", dmn); + dmn->quit = true; + dmn->execRestart = true; + + virObjectUnlock(dmn); +} + + static int daemonServerClose(void *payload, const char *key G_GNUC_UNUSED, diff --git a/src/rpc/virnetdaemon.h b/src/rpc/virnetdaemon.h index fcc6e1fdff..b3d81b6aaf 100644 --- a/src/rpc/virnetdaemon.h +++ b/src/rpc/virnetdaemon.h @@ -75,6 +75,7 @@ void virNetDaemonSetStateStopWorkerThread(virNetDaemonPtr dmn, void virNetDaemonRun(virNetDaemonPtr dmn); void virNetDaemonQuit(virNetDaemonPtr dmn); +void virNetDaemonQuitExecRestart(virNetDaemonPtr dmn); bool virNetDaemonHasClients(virNetDaemonPtr dmn); -- 2.29.2

On 3/10/21 5:37 PM, Peter Krempa wrote:
Recent changes which meant to fix daemon shutdown broke the exec-restart capability of virtlogd and virtlockd, since the code actually closed all the sockets and shut down all the internals.
Add virNetDaemonQuitExecRestart, which requests a shutdown of the process, but keeps all the services open and registered since they are preserved across the restart.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt_remote.syms | 1 + src/rpc/virnetdaemon.c | 19 +++++++++++++++++++ src/rpc/virnetdaemon.h | 1 + 3 files changed, 21 insertions(+)
diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 3cd84a0854..605271bcb2 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -85,6 +85,7 @@ virNetDaemonNew; virNetDaemonNewPostExecRestart; virNetDaemonPreExecRestart; virNetDaemonQuit; +virNetDaemonQuitExecRestart; virNetDaemonRemoveShutdownInhibition; virNetDaemonRun; virNetDaemonSetShutdownCallbacks; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 327540c4b4..cac60ef488 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -76,6 +76,7 @@ struct _virNetDaemon { bool quit; bool finished; bool graceful; + bool execRestart;
unsigned int autoShutdownTimeout; size_t autoShutdownInhibitions; @@ -857,6 +858,10 @@ virNetDaemonRun(virNetDaemonPtr dmn)
virHashForEach(dmn->servers, daemonServerProcessClients, NULL);
+ /* don't shutdown services when performing an exec-restart */ + if (dmn->quit && dmn->execRestart)
1: you'd check only for @execRestart here.
+ goto cleanup; + if (dmn->quit && dmn->finishTimer == -1) { virHashForEach(dmn->servers, daemonServerClose, NULL); if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0) @@ -912,6 +917,20 @@ virNetDaemonQuit(virNetDaemonPtr dmn) virObjectUnlock(dmn); }
+ +void +virNetDaemonQuitExecRestart(virNetDaemonPtr dmn) +{ + virObjectLock(dmn); + + VIR_DEBUG("Exec-restart requested %p", dmn); + dmn->quit = true; + dmn->execRestart = true;
I think you don't need to set @quit flag, if [1].
+ + virObjectUnlock(dmn); +} + +
Michal

On Fri, Mar 12, 2021 at 13:13:15 +0100, Michal Privoznik wrote:
On 3/10/21 5:37 PM, Peter Krempa wrote:
Recent changes which meant to fix daemon shutdown broke the exec-restart capability of virtlogd and virtlockd, since the code actually closed all the sockets and shut down all the internals.
Add virNetDaemonQuitExecRestart, which requests a shutdown of the process, but keeps all the services open and registered since they are preserved across the restart.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt_remote.syms | 1 + src/rpc/virnetdaemon.c | 19 +++++++++++++++++++ src/rpc/virnetdaemon.h | 1 + 3 files changed, 21 insertions(+)
diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 3cd84a0854..605271bcb2 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -85,6 +85,7 @@ virNetDaemonNew; virNetDaemonNewPostExecRestart; virNetDaemonPreExecRestart; virNetDaemonQuit; +virNetDaemonQuitExecRestart; virNetDaemonRemoveShutdownInhibition; virNetDaemonRun; virNetDaemonSetShutdownCallbacks; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 327540c4b4..cac60ef488 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -76,6 +76,7 @@ struct _virNetDaemon { bool quit; bool finished; bool graceful; + bool execRestart;
unsigned int autoShutdownTimeout; size_t autoShutdownInhibitions; @@ -857,6 +858,10 @@ virNetDaemonRun(virNetDaemonPtr dmn)
virHashForEach(dmn->servers, daemonServerProcessClients, NULL);
+ /* don't shutdown services when performing an exec-restart */ + if (dmn->quit && dmn->execRestart)
1: you'd check only for @execRestart here.
+ goto cleanup; + if (dmn->quit && dmn->finishTimer == -1) { virHashForEach(dmn->servers, daemonServerClose, NULL); if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0) @@ -912,6 +917,20 @@ virNetDaemonQuit(virNetDaemonPtr dmn) virObjectUnlock(dmn); }
+ +void +virNetDaemonQuitExecRestart(virNetDaemonPtr dmn) +{ + virObjectLock(dmn); + + VIR_DEBUG("Exec-restart requested %p", dmn); + dmn->quit = true; + dmn->execRestart = true;
I think you don't need to set @quit flag, if [1].
I wasn't sure whether quit doesn't have any other special meaning outside of virNetDaemonRun. For now I'll probably leave both flags enabled.
+ + virObjectUnlock(dmn); +} + +
Michal

Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1912243 Fixes: 94e45d1042e Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/locking/lock_daemon.c | 2 +- src/logging/log_daemon.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index 04038d2668..ffde2017ac 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -336,7 +336,7 @@ virLockDaemonExecRestartHandler(virNetDaemonPtr dmn, void *opaque G_GNUC_UNUSED) { execRestart = true; - virNetDaemonQuit(dmn); + virNetDaemonQuitExecRestart(dmn); } static int diff --git a/src/logging/log_daemon.c b/src/logging/log_daemon.c index aa76dcd329..e81de50899 100644 --- a/src/logging/log_daemon.c +++ b/src/logging/log_daemon.c @@ -283,7 +283,7 @@ virLogDaemonExecRestartHandler(virNetDaemonPtr dmn, void *opaque G_GNUC_UNUSED) { execRestart = true; - virNetDaemonQuit(dmn); + virNetDaemonQuitExecRestart(dmn); } static int -- 2.29.2

On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html I tested your patches but notice locks are still lost on re-exec. qemu.conf: lock_manager = "lockd" qemu-lockd.conf: file_lockspace_dir = "/var/lib/libvirt/lockspace" /var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After starting a VM # ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid # systemctl reload virtlockd # ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd # Regards, Jim
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1912243 Fixes: 94e45d1042e Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/locking/lock_daemon.c | 2 +- src/logging/log_daemon.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index 04038d2668..ffde2017ac 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -336,7 +336,7 @@ virLockDaemonExecRestartHandler(virNetDaemonPtr dmn, void *opaque G_GNUC_UNUSED) { execRestart = true; - virNetDaemonQuit(dmn); + virNetDaemonQuitExecRestart(dmn); }
static int diff --git a/src/logging/log_daemon.c b/src/logging/log_daemon.c index aa76dcd329..e81de50899 100644 --- a/src/logging/log_daemon.c +++ b/src/logging/log_daemon.c @@ -283,7 +283,7 @@ virLogDaemonExecRestartHandler(virNetDaemonPtr dmn, void *opaque G_GNUC_UNUSED) { execRestart = true; - virNetDaemonQuit(dmn); + virNetDaemonQuitExecRestart(dmn); }
static int

On Thu, Mar 11, 2021 at 16:47:54 -0700, Jim Fehlig wrote:
On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel
https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html
I tested your patches but notice locks are still lost on re-exec.
qemu.conf: lock_manager = "lockd"
qemu-lockd.conf: file_lockspace_dir = "/var/lib/libvirt/lockspace"
/var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After starting a VM
# ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid # systemctl reload virtlockd
Could you make sure that the virtlockd process before and after has the same pid, so that it wasn't actually restarted by systemct? I'm asking because in my current test I've encountered another crash when exec-restarting: 2021-03-12 08:41:31.649+0000: 2765718: error : virJSONValueToBuffer:1946 : internal error: failed to convert virJSONValue to yajl data double free or corruption (fasttop) Program received signal SIGABRT, Aborted. 0x00007ffff77819d5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff77819d5 in raise () at /lib64/libc.so.6 #1 0x00007ffff776a8a4 in abort () at /lib64/libc.so.6 #2 0x00007ffff77c4177 in __libc_message () at /lib64/libc.so.6 #3 0x00007ffff77cbe6c in annobin_top_check.start () at /lib64/libc.so.6 #4 0x00007ffff77cd393 in _int_free () at /lib64/libc.so.6 #5 0x00007ffff7a0b70d in g_free () at /lib64/libglib-2.0.so.0 #6 0x00007ffff7c0977f in virJSONValueFree (value=0x5555555710b0) at ../../../libvirt/src/util/virjson.c:401 #7 0x000055555555c3f2 in glib_autoptr_clear_virJSONValue (_ptr=0x5555555c4250) at ../../../libvirt/src/util/virjson.h:173 #8 glib_autoptr_cleanup_virJSONValue (_ptr=<synthetic pointer>) at ../../../libvirt/src/util/virjson.h:173 #9 virLockDaemonPreExecRestart (argv=0x7fffffffe428, dmn=<optimized out>, state_file=<optimized out>) at ../../../libvirt/src/locking/lock_daemon.c:700 #10 main (argc=<optimized out>, argv=0x7fffffffe428) at ../../../libvirt/src/locking/lock_daemon.c:1148 Looks like a double free. I'll post patches later for this.
# ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd

On Fri, Mar 12, 2021 at 09:44:09 +0100, Peter Krempa wrote:
On Thu, Mar 11, 2021 at 16:47:54 -0700, Jim Fehlig wrote:
On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel
https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html
I tested your patches but notice locks are still lost on re-exec.
qemu.conf: lock_manager = "lockd"
qemu-lockd.conf: file_lockspace_dir = "/var/lib/libvirt/lockspace"
/var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After starting a VM
# ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid # systemctl reload virtlockd
Could you make sure that the virtlockd process before and after has the same pid, so that it wasn't actually restarted by systemct?
I'm asking because in my current test I've encountered another crash when exec-restarting:
2021-03-12 08:41:31.649+0000: 2765718: error : virJSONValueToBuffer:1946 : internal error: failed to convert virJSONValue to yajl data double free or corruption (fasttop)
Looks like a double free. I'll post patches later for this.
https://listman.redhat.com/archives/libvir-list/2021-March/msg00569.html

On 3/12/21 1:44 AM, Peter Krempa wrote:
On Thu, Mar 11, 2021 at 16:47:54 -0700, Jim Fehlig wrote:
On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the code waiting for the daemon shutdown closed the daemons before exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel
https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html
I tested your patches but notice locks are still lost on re-exec.
qemu.conf: lock_manager = "lockd"
qemu-lockd.conf: file_lockspace_dir = "/var/lib/libvirt/lockspace"
/var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After starting a VM
# ls /var/lib/libvirt/lockspace/ a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c # lslocks | grep lockd virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid # systemctl reload virtlockd
Could you make sure that the virtlockd process before and after has the same pid, so that it wasn't actually restarted by systemct?
I thought I checked it, but apparently not...
I'm asking because in my current test I've encountered another crash when exec-restarting:
2021-03-12 08:41:31.649+0000: 2765718: error : virJSONValueToBuffer:1946 : internal error: failed to convert virJSONValue to yajl data double free or corruption (fasttop)
Program received signal SIGABRT, Aborted. 0x00007ffff77819d5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff77819d5 in raise () at /lib64/libc.so.6 #1 0x00007ffff776a8a4 in abort () at /lib64/libc.so.6 #2 0x00007ffff77c4177 in __libc_message () at /lib64/libc.so.6 #3 0x00007ffff77cbe6c in annobin_top_check.start () at /lib64/libc.so.6 #4 0x00007ffff77cd393 in _int_free () at /lib64/libc.so.6 #5 0x00007ffff7a0b70d in g_free () at /lib64/libglib-2.0.so.0 #6 0x00007ffff7c0977f in virJSONValueFree (value=0x5555555710b0) at ../../../libvirt/src/util/virjson.c:401 #7 0x000055555555c3f2 in glib_autoptr_clear_virJSONValue (_ptr=0x5555555c4250) at ../../../libvirt/src/util/virjson.h:173 #8 glib_autoptr_cleanup_virJSONValue (_ptr=<synthetic pointer>) at ../../../libvirt/src/util/virjson.h:173 #9 virLockDaemonPreExecRestart (argv=0x7fffffffe428, dmn=<optimized out>, state_file=<optimized out>) at ../../../libvirt/src/locking/lock_daemon.c:700 #10 main (argc=<optimized out>, argv=0x7fffffffe428) at ../../../libvirt/src/locking/lock_daemon.c:1148
because looking again I'm seeing the same crash. Facepalm!
Looks like a double free. I'll post patches later for this.
I noticed your patches are pushed. A quick test verified all is working well now. Thanks! Regards, Jim

On 3/10/21 5:37 PM, Peter Krempa wrote:
Peter Krempa (2): virnetdaemon: Introduce virNetDaemonQuitExecRestart virtlo(g|ck)d: Fix exec-restart
src/libvirt_remote.syms | 1 + src/locking/lock_daemon.c | 2 +- src/logging/log_daemon.c | 2 +- src/rpc/virnetdaemon.c | 19 +++++++++++++++++++ src/rpc/virnetdaemon.h | 1 + 5 files changed, 23 insertions(+), 2 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
participants (3)
-
Jim Fehlig
-
Michal Privoznik
-
Peter Krempa