On Tue, Feb 14, 2023 at 11:08:12 -0600, Jonathon Jongsma wrote:
Adds the ability to monitor the nbdkit process so that we can take
action in case the child exits unexpectedly.
When the nbdkit process exits, we pause the vm, restart nbdkit, and then
resume the vm. This allows the vm to continue working in the event of a
nbdkit failure.
Eventually we may want to generalize this functionality since we may
need something similar for e.g. qemu-storage-daemon, etc.
The process is monitored with the pidfd_open() syscall if it exists
(since linux 5.3). Otherwise it resorts to checking whether the process
is alive once a second. The one-second time period was chosen somewhat
arbitrarily.
Signed-off-by: Jonathon Jongsma <jjongsma(a)redhat.com>
---
meson.build | 7 ++
src/qemu/qemu_nbdkit.c | 147 ++++++++++++++++++++++++++++++++++++++--
src/qemu/qemu_nbdkit.h | 4 +-
src/qemu/qemu_process.c | 4 +-
4 files changed, 155 insertions(+), 7 deletions(-)
[...]
diff --git a/src/qemu/qemu_nbdkit.c b/src/qemu/qemu_nbdkit.c
index e6d3232a1e..74e2ceee19 100644
--- a/src/qemu/qemu_nbdkit.c
+++ b/src/qemu/qemu_nbdkit.c
[...]
+#if WITH_DECL_SYS_PIDFD_OPEN
+ pidfd = syscall(SYS_pidfd_open, proc->pid, 0);
+ if (pidfd < 0)
+ return -1;
+
+ proc->eventwatch = virEventAddHandle(pidfd,
+ VIR_EVENT_HANDLE_READABLE,
+ qemuNbdkitProcessPidfdCb,
+ qemuNbdkitProcessEventDataNew(proc, vm),
+
(virFreeCallback)qemuNbdkitProcessEventDataFree);
+#else
+ /* fall back to checking once a second */
+ proc->eventwatch = virEventAddTimeout(1000,
+ qemuNbdkitProcessTimeoutCb,
+ qemuNbdkitProcessEventDataNew(proc, vm),
+
(virFreeCallback)qemuNbdkitProcessEventDataFree);
I don't think we want to do this this polling. I'll rather disable all
of nbdkit than do this if the event interface is not available.
Additionally any failure in this function would result into the default
error of "unable to connect to nbdkit" but this time with no
information.