[PATCH] bhyve: process: improve domain startup error handling
After executing the bhyve binary, it might happen that it fails very early due to configuration issues (missing/inaccessible files, incorrect custom args), bugs, etc. In this case it'll look like the domain has started normally, but quickly turned off. Improve that by waiting for the domain's vmm entity to appear in /dev/vmm. Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com> --- Ideally, I'd also like to display the errors that the bhyve binary logs in this case. But as I'm doing: virCommandSetErrorFD(cmd, &logfd); I'm not sure if I should read that again from logfd, or are there more optimal ways to do that? src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c index 4fb7e642e1..591c2d1ded 100644 --- a/src/bhyve/bhyve_process.c +++ b/src/bhyve/bhyve_process.c @@ -143,7 +143,12 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, g_autoptr(virCommand) cmd = NULL; g_autoptr(virCommand) load_cmd = NULL; bhyveDomainObjPrivate *priv = vm->privateData; + g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER; + g_autofree char *domain_vmm_path = NULL; int ret = -1, rc; + size_t i = 0; + int timeout = 3; + bool vmm_appeared = false; logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name); if ((logfd = open(logfile, O_WRONLY | O_APPEND | O_CREAT, @@ -228,6 +233,25 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, goto cleanup; } + virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name); + domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf); + + do { + if (virFileExists(domain_vmm_path)) { + vmm_appeared = true; + break; + } + + g_usleep(500000); + } while (!vmm_appeared && ++i < timeout); + + if (!vmm_appeared) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Domain %1$s didn't show up in /dev/vmm"), + vm->def->name); + goto cleanup; + } + vm->def->id = vm->pid; virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason); priv->mon = bhyveMonitorOpen(vm, driver); -- 2.51.0
On 10/28/25 19:22, Roman Bogorodskiy wrote:
After executing the bhyve binary, it might happen that it fails very early due to configuration issues (missing/inaccessible files, incorrect custom args), bugs, etc. In this case it'll look like the domain has started normally, but quickly turned off.
Improve that by waiting for the domain's vmm entity to appear in /dev/vmm.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com> --- Ideally, I'd also like to display the errors that the bhyve binary logs in this case. But as I'm doing:
virCommandSetErrorFD(cmd, &logfd);
I'm not sure if I should read that again from logfd, or are there more optimal ways to do that?
src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
Please consider a squashing this in: diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c index 591c2d1ded..61638abf85 100644 --- a/src/bhyve/bhyve_process.c +++ b/src/bhyve/bhyve_process.c @@ -51,6 +51,7 @@ #include "virnetdev.h" #include "virnetdevbridge.h" #include "virnetdevtap.h" +#include "virtime.h" #define VIR_FROM_THIS VIR_FROM_BHYVE @@ -143,11 +144,9 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, g_autoptr(virCommand) cmd = NULL; g_autoptr(virCommand) load_cmd = NULL; bhyveDomainObjPrivate *priv = vm->privateData; - g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER; g_autofree char *domain_vmm_path = NULL; + virTimeBackOffVar timebackoff; int ret = -1, rc; - size_t i = 0; - int timeout = 3; bool vmm_appeared = false; logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name); @@ -233,17 +232,16 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, goto cleanup; } - virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name); - domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf); + domain_vmm_path = g_strdup_printf("/dev/vmm/%s", vm->def->name); - do { + if (virTimeBackOffStart(&timebackoff, 1, 1000) < 0) + goto cleanup; + while (virTimeBackOffWait(&timebackoff)) { if (virFileExists(domain_vmm_path)) { vmm_appeared = true; break; } - - g_usleep(500000); - } while (!vmm_appeared && ++i < timeout); + } if (!vmm_appeared) { virReportError(VIR_ERR_INTERNAL_ERROR, Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
Michal Prívozník wrote:
On 10/28/25 19:22, Roman Bogorodskiy wrote:
After executing the bhyve binary, it might happen that it fails very early due to configuration issues (missing/inaccessible files, incorrect custom args), bugs, etc. In this case it'll look like the domain has started normally, but quickly turned off.
Improve that by waiting for the domain's vmm entity to appear in /dev/vmm.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com> --- Ideally, I'd also like to display the errors that the bhyve binary logs in this case. But as I'm doing:
virCommandSetErrorFD(cmd, &logfd);
I'm not sure if I should read that again from logfd, or are there more optimal ways to do that?
src/bhyve/bhyve_process.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
Please consider a squashing this in:
diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c index 591c2d1ded..61638abf85 100644 --- a/src/bhyve/bhyve_process.c +++ b/src/bhyve/bhyve_process.c @@ -51,6 +51,7 @@ #include "virnetdev.h" #include "virnetdevbridge.h" #include "virnetdevtap.h" +#include "virtime.h"
#define VIR_FROM_THIS VIR_FROM_BHYVE
@@ -143,11 +144,9 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, g_autoptr(virCommand) cmd = NULL; g_autoptr(virCommand) load_cmd = NULL; bhyveDomainObjPrivate *priv = vm->privateData; - g_auto(virBuffer) domain_vmm_path_buf = VIR_BUFFER_INITIALIZER; g_autofree char *domain_vmm_path = NULL; + virTimeBackOffVar timebackoff; int ret = -1, rc; - size_t i = 0; - int timeout = 3; bool vmm_appeared = false;
logfile = g_strdup_printf("%s/%s.log", BHYVE_LOG_DIR, vm->def->name); @@ -233,17 +232,16 @@ virBhyveProcessStartImpl(struct _bhyveConn *driver, goto cleanup; }
- virBufferAsprintf(&domain_vmm_path_buf, "/dev/vmm/%s", vm->def->name); - domain_vmm_path = virBufferContentAndReset(&domain_vmm_path_buf); + domain_vmm_path = g_strdup_printf("/dev/vmm/%s", vm->def->name);
- do { + if (virTimeBackOffStart(&timebackoff, 1, 1000) < 0) + goto cleanup; + while (virTimeBackOffWait(&timebackoff)) { if (virFileExists(domain_vmm_path)) { vmm_appeared = true; break; } - - g_usleep(500000); - } while (!vmm_appeared && ++i < timeout); + }
if (!vmm_appeared) { virReportError(VIR_ERR_INTERNAL_ERROR,
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal
Thanks, I was looking for something like that, but somehow overlooked the virTimeBackOff* functions. I squashed that in and changed 1000 to 1500 in virTimeBackOffStart() to keep the original 1.5 seconds timeout. Then I thought a bit more about it and bumped it to 5000 to add some safety gap. I'll test it a bit more and merge after the freeze. By the way, any pointers how to improve logging for these situation, i.e. so I could somehow keep the error log not only in the actual log FD, but also accessible in the code, so I could report that to a user? Thanks, Roman
participants (2)
-
Michal Prívozník -
Roman Bogorodskiy