[libvirt] [PATCH 0/3] Improve LXC startup error reporting

From: "Daniel P. Berrange" <berrange@redhat.com> LXC has long suffered from pretty poor error reporting of failures at startup. This series addresses those problems. Daniel P. Berrange (3): Fix exit status of lxc controller Improve error reporting with LXC controller Don't ignore all dbus connection errors src/lxc/lxc_controller.c | 2 +- src/lxc/lxc_process.c | 31 +++++++++++++++++++++++++------ src/nwfilter/nwfilter_driver.c | 5 +++-- src/util/virdbus.c | 22 +++++++++++++++++++--- src/util/virsystemd.c | 6 ++++-- 5 files changed, 52 insertions(+), 14 deletions(-) -- 1.8.3.1

From: "Daniel P. Berrange" <berrange@redhat.com> The LXC controller main() method initialized 'rc' to 1 rather than '-1'. In the cleanup path it will print any error to stderr, if-and-only-if rc < 0. Hence the incorrect initialization caused errors to be lost. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index b881f17..1c6aed6 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2230,7 +2230,7 @@ cleanup: int main(int argc, char *argv[]) { pid_t pid; - int rc = 1; + int rc = -1; char *name = NULL; size_t nveths = 0; char **veths = NULL; -- 1.8.3.1

Reviewed-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Daniel P. Berrange Sent: Saturday, October 12, 2013 12:54 AM To: libvir-list@redhat.com Subject: [libvirt] [PATCH 1/3] Fix exit status of lxc controller
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC controller main() method initialized 'rc' to 1 rather than '-1'. In the cleanup path it will print any error to stderr, if-and-only-if rc < 0. Hence the incorrect initialization caused errors to be lost.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_controller.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index b881f17..1c6aed6 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2230,7 +2230,7 @@ cleanup: int main(int argc, char *argv[]) { pid_t pid; - int rc = 1; + int rc = -1; char *name = NULL; size_t nveths = 0; char **veths = NULL; -- 1.8.3.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

From: "Daniel P. Berrange" <berrange@redhat.com> The LXC code would read the log file if an LXC guest failed to startup. There were a number of failure cases where the guest will not start and libvirtd never gets as far as looking at the log file. Fix this by replacing some earlier generic errors with messages from the log. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index d07ff13..840e138 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -980,6 +980,7 @@ int virLXCProcessStart(virConnectPtr conn, virErrorPtr err = NULL; virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver); virCgroupPtr selfcgroup; + int status; if (virCgroupNewSelf(&selfcgroup) < 0) return -1; @@ -1182,9 +1183,18 @@ int virLXCProcessStart(virConnectPtr conn, VIR_WARN("Unable to seek to end of logfile: %s", virStrerror(errno, ebuf, sizeof(ebuf))); - if (virCommandRun(cmd, NULL) < 0) + if (virCommandRun(cmd, &status) < 0) goto cleanup; + if (status != 0) { + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) <= 0) + snprintf(ebuf, sizeof(ebuf), "unexpected exit status %d", status); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + goto cleanup; + } + + if (VIR_CLOSE(handshakefds[1]) < 0) { virReportSystemError(errno, "%s", _("could not close handshake fd")); goto cleanup; @@ -1193,16 +1203,25 @@ int virLXCProcessStart(virConnectPtr conn, /* Connect to the controller as a client *first* because * this will block until the child has written their * pid file out to disk & created their cgroup */ - if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) + if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) { + VIR_ERROR("Here"); + /* Intentionally overwrite the real monitor error message, + * since a better one is almost always found in the logs + */ + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) { + VIR_ERROR("tHere"); + virResetLastError(); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + } goto cleanup; + } /* And get its pid */ if ((r = virPidFileRead(cfg->stateDir, vm->def->name, &vm->pid)) < 0) { - char out[1024]; - - if (virLXCProcessReadLogOutput(vm, logfile, pos, out, 1024) > 0) + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) virReportError(VIR_ERR_INTERNAL_ERROR, - _("guest failed to start: %s"), out); + _("guest failed to start: %s"), ebuf); else virReportSystemError(-r, _("Failed to read pid file %s/%s.pid"), -- 1.8.3.1

This would be far more convenient than checking logs. Reviewed-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Daniel P. Berrange Sent: Saturday, October 12, 2013 12:54 AM To: libvir-list@redhat.com Subject: [libvirt] [PATCH 2/3] Improve error reporting with LXC controller
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC code would read the log file if an LXC guest failed to startup. There were a number of failure cases where the guest will not start and libvirtd never gets as far as looking at the log file.
Fix this by replacing some earlier generic errors with messages from the log.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index d07ff13..840e138 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -980,6 +980,7 @@ int virLXCProcessStart(virConnectPtr conn, virErrorPtr err = NULL; virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver); virCgroupPtr selfcgroup; + int status;
if (virCgroupNewSelf(&selfcgroup) < 0) return -1; @@ -1182,9 +1183,18 @@ int virLXCProcessStart(virConnectPtr conn, VIR_WARN("Unable to seek to end of logfile: %s", virStrerror(errno, ebuf, sizeof(ebuf)));
- if (virCommandRun(cmd, NULL) < 0) + if (virCommandRun(cmd, &status) < 0) goto cleanup;
+ if (status != 0) { + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) <= 0) + snprintf(ebuf, sizeof(ebuf), "unexpected exit status %d", status); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + goto cleanup; + } + + if (VIR_CLOSE(handshakefds[1]) < 0) { virReportSystemError(errno, "%s", _("could not close handshake fd")); goto cleanup; @@ -1193,16 +1203,25 @@ int virLXCProcessStart(virConnectPtr conn, /* Connect to the controller as a client *first* because * this will block until the child has written their * pid file out to disk & created their cgroup */ - if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) + if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) { + VIR_ERROR("Here"); + /* Intentionally overwrite the real monitor error message, + * since a better one is almost always found in the logs + */ + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) { + VIR_ERROR("tHere"); + virResetLastError(); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + } goto cleanup; + }
/* And get its pid */ if ((r = virPidFileRead(cfg->stateDir, vm->def->name, &vm->pid)) < 0) { - char out[1024]; - - if (virLXCProcessReadLogOutput(vm, logfile, pos, out, 1024) > 0) + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) virReportError(VIR_ERR_INTERNAL_ERROR, - _("guest failed to start: %s"), out); + _("guest failed to start: %s"), ebuf); else virReportSystemError(-r, _("Failed to read pid file %s/%s.pid"), -- 1.8.3.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 11.10.2013 18:53, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The LXC code would read the log file if an LXC guest failed to startup. There were a number of failure cases where the guest will not start and libvirtd never gets as far as looking at the log file.
Fix this by replacing some earlier generic errors with messages from the log.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_process.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index d07ff13..840e138 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -980,6 +980,7 @@ int virLXCProcessStart(virConnectPtr conn, virErrorPtr err = NULL; virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver); virCgroupPtr selfcgroup; + int status;
if (virCgroupNewSelf(&selfcgroup) < 0) return -1; @@ -1182,9 +1183,18 @@ int virLXCProcessStart(virConnectPtr conn, VIR_WARN("Unable to seek to end of logfile: %s", virStrerror(errno, ebuf, sizeof(ebuf)));
- if (virCommandRun(cmd, NULL) < 0) + if (virCommandRun(cmd, &status) < 0) goto cleanup;
+ if (status != 0) { + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) <= 0) + snprintf(ebuf, sizeof(ebuf), "unexpected exit status %d", status); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + goto cleanup; + } + + if (VIR_CLOSE(handshakefds[1]) < 0) { virReportSystemError(errno, "%s", _("could not close handshake fd")); goto cleanup; @@ -1193,16 +1203,25 @@ int virLXCProcessStart(virConnectPtr conn, /* Connect to the controller as a client *first* because * this will block until the child has written their * pid file out to disk & created their cgroup */ - if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) + if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) { + VIR_ERROR("Here");
You probably don't want this line ^^
+ /* Intentionally overwrite the real monitor error message, + * since a better one is almost always found in the logs + */ + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) { + VIR_ERROR("tHere");
Nor this one ^^.
+ virResetLastError(); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("guest failed to start: %s"), ebuf); + } goto cleanup; + }
/* And get its pid */ if ((r = virPidFileRead(cfg->stateDir, vm->def->name, &vm->pid)) < 0) { - char out[1024]; - - if (virLXCProcessReadLogOutput(vm, logfile, pos, out, 1024) > 0) + if (virLXCProcessReadLogOutput(vm, logfile, pos, ebuf, sizeof(ebuf)) > 0) virReportError(VIR_ERR_INTERNAL_ERROR, - _("guest failed to start: %s"), out); + _("guest failed to start: %s"), ebuf); else virReportSystemError(-r, _("Failed to read pid file %s/%s.pid"),
Michal

From: "Daniel P. Berrange" <berrange@redhat.com> Previous commit commit 7ada155cdf2bbfac16ce08f64abb455a940e2cf7 Author: Gao feng <gaofeng@cn.fujitsu.com> Date: Wed Sep 11 11:15:02 2013 +0800 DBus: introduce virDBusIsServiceEnabled Made the cgroups code fallback to non-systemd based setup when dbus is not running. It was too big a hammer though, as it did not check what error code was received when the dbus connection failed. Thus it silently ignored serious errors from dbus such as "too many client connections", which should always be treated as fatal. We only want to ignore errors if the dbus unix socket does not exist, or if nothing is listening on it. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/nwfilter/nwfilter_driver.c | 5 +++-- src/util/virdbus.c | 22 +++++++++++++++++++--- src/util/virsystemd.c | 6 ++++-- 3 files changed, 26 insertions(+), 7 deletions(-) diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c index d25c6f2..6602d73 100644 --- a/src/nwfilter/nwfilter_driver.c +++ b/src/nwfilter/nwfilter_driver.c @@ -175,8 +175,9 @@ nwfilterStateInitialize(bool privileged, DBusConnection *sysbus = NULL; #if WITH_DBUS - if (virDBusHasSystemBus()) - sysbus = virDBusGetSystemBus(); + if (virDBusHasSystemBus() && + !(sysbus = virDBusGetSystemBus())) + return -1; #endif /* WITH_DBUS */ if (VIR_ALLOC(driverState) < 0) diff --git a/src/util/virdbus.c b/src/util/virdbus.c index dc4ace0..4e4c267 100644 --- a/src/util/virdbus.c +++ b/src/util/virdbus.c @@ -111,14 +111,29 @@ virDBusGetSystemBus(void) } +/** + * virDBusHasSystemBus: + * + * Check if dbus system bus is running. This does not + * imply that we have a connection. DBus might be running + * and refusing connections due to its client limit. The + * latter must be treated as a fatal error. + * + * Return false if dbus is not available, true if probably available. + */ bool virDBusHasSystemBus(void) { if (virDBusGetSystemBusInternal()) return true; - VIR_DEBUG("System DBus not available: %s", NULLSTR(systemdbuserr.message)); - return false; + if (systemdbuserr.name && + (STREQ(systemdbuserr.name, "org.freedesktop.DBus.Error.FileNotFound") || + STREQ(systemdbuserr.name, "org.freedesktop.DBus.Error.NoServer"))) { + VIR_DEBUG("System DBus not available: %s", NULLSTR(systemdbuserr.message)); + return false; + } + return true; } @@ -1257,7 +1272,8 @@ int virDBusIsServiceEnabled(const char *name) if (!virDBusHasSystemBus()) return -2; - conn = virDBusGetSystemBus(); + if (!(conn = virDBusGetSystemBus())) + return -1; if (virDBusCallMethod(conn, &reply, diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 1ba37cc..503fff7 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -169,7 +169,8 @@ int virSystemdCreateMachine(const char *name, if (ret < 0) return ret; - conn = virDBusGetSystemBus(); + if (!(conn = virDBusGetSystemBus())) + return -1; ret = -1; if (!(machinename = virSystemdMakeMachineName(name, drivername, privileged))) @@ -267,7 +268,8 @@ int virSystemdTerminateMachine(const char *name, if (ret < 0) return ret; - conn = virDBusGetSystemBus(); + if (!(conn = virDBusGetSystemBus())) + return -1; ret = -1; if (!(machinename = virSystemdMakeMachineName(name, drivername, privileged))) -- 1.8.3.1

On 11.10.2013 18:53, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
LXC has long suffered from pretty poor error reporting of failures at startup. This series addresses those problems.
Daniel P. Berrange (3): Fix exit status of lxc controller Improve error reporting with LXC controller Don't ignore all dbus connection errors
src/lxc/lxc_controller.c | 2 +- src/lxc/lxc_process.c | 31 +++++++++++++++++++++++++------ src/nwfilter/nwfilter_driver.c | 5 +++-- src/util/virdbus.c | 22 +++++++++++++++++++--- src/util/virsystemd.c | 6 ++++-- 5 files changed, 52 insertions(+), 14 deletions(-)
ACK series Michal
participants (3)
-
Chen Hanxiao
-
Daniel P. Berrange
-
Michal Privoznik