[libvirt] [[PATCH v2] 0/4] try harder to get dest qemu errors on migation

Hi, all. In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]). Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches. * diff from v1: 1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function [1] log example: CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133 Nikolay Shirokovskiy (4): util: remove 1k limit for error messages virtlogd: stop reading on EOF instead of hangup virtlogd: add flag to wait for log end on read qemu: if virtlogd is used then read log tail correctly src/logging/log_handler.c | 46 ++++++++++++++++++++++++++++++++++++++++------ src/logging/log_protocol.x | 5 +++++ src/qemu/qemu_domain.c | 7 ++++++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 2 ++ src/util/virerror.c | 9 ++++----- 6 files changed, 58 insertions(+), 12 deletions(-) -- 1.8.3.1

This limit is the issue for example when we want to return in the error message the log of the abnormally exited qemu process of the destination side in the process of migration. virVasprintf is already used in the process of logging in the callers down the stack. --- src/util/virerror.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/src/util/virerror.c b/src/util/virerror.c index 2958308..ca74c14 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -1427,22 +1427,21 @@ void virReportErrorHelper(int domcode, { int save_errno = errno; va_list args; - char errorMessage[1024]; + char *errorMessage = NULL; const char *virerr; if (fmt) { va_start(args, fmt); - vsnprintf(errorMessage, sizeof(errorMessage)-1, fmt, args); + ignore_value(virVasprintfQuiet(&errorMessage, fmt, args)); va_end(args); - } else { - errorMessage[0] = '\0'; } - virerr = virErrorMsg(errorcode, (errorMessage[0] ? errorMessage : NULL)); + virerr = virErrorMsg(errorcode, errorMessage); virRaiseErrorFull(filename, funcname, linenr, domcode, errorcode, VIR_ERR_ERROR, virerr, errorMessage, NULL, -1, -1, virerr, errorMessage); + VIR_FREE(errorMessage); errno = save_errno; } -- 1.8.3.1

Otherwise we can miss log tail. Handling hangup in a special way in not nessessary. --- src/logging/log_handler.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/logging/log_handler.c b/src/logging/log_handler.c index cd0ba6e..ca294a0 100644 --- a/src/logging/log_handler.c +++ b/src/logging/log_handler.c @@ -140,7 +140,7 @@ virLogHandlerGetLogFileFromWatch(virLogHandlerPtr handler, static void virLogHandlerDomainLogFileEvent(int watch, int fd, - int events, + int events ATTRIBUTE_UNUSED, void *opaque) { virLogHandlerPtr handler = opaque; @@ -167,12 +167,12 @@ virLogHandlerDomainLogFileEvent(int watch, goto error; } + if (len == 0) + goto error; + if (virRotatingFileWriterAppend(logfile->file, buf, len) != len) goto error; - if (events & VIR_EVENT_HANDLE_HANGUP) - goto error; - virObjectUnlock(handler); return; -- 1.8.3.1

--- src/logging/log_handler.c | 38 ++++++++++++++++++++++++++++++++++++-- src/logging/log_protocol.x | 5 +++++ 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/src/logging/log_handler.c b/src/logging/log_handler.c index ca294a0..93b893b 100644 --- a/src/logging/log_handler.c +++ b/src/logging/log_handler.c @@ -23,6 +23,7 @@ #include <config.h> #include "log_handler.h" +#include "log_protocol.h" #include "virerror.h" #include "virobject.h" #include "virfile.h" @@ -54,10 +55,12 @@ struct _virLogHandlerLogFile { char *driver; unsigned char domuuid[VIR_UUID_BUFLEN]; char *domname; + char *path; }; struct _virLogHandler { virObjectLockable parent; + virCond condition; bool privileged; size_t max_size; @@ -102,6 +105,7 @@ virLogHandlerLogFileFree(virLogHandlerLogFilePtr file) VIR_FREE(file->driver); VIR_FREE(file->domname); + VIR_FREE(file->path); VIR_FREE(file); } @@ -137,6 +141,21 @@ virLogHandlerGetLogFileFromWatch(virLogHandlerPtr handler, } +static virLogHandlerLogFilePtr +virLogHandlerGetLogFileFromPath(virLogHandlerPtr handler, + const char* path) +{ + size_t i; + + for (i = 0; i < handler->nfiles; i++) { + if (STREQ(handler->files[i]->path, path)) + return handler->files[i]; + } + + return NULL; +} + + static void virLogHandlerDomainLogFileEvent(int watch, int fd, @@ -179,6 +198,7 @@ virLogHandlerDomainLogFileEvent(int watch, error: handler->inhibitor(false, handler->opaque); virLogHandlerLogFileClose(handler, logfile); + virCondBroadcast(&handler->condition); virObjectUnlock(handler); } @@ -198,6 +218,9 @@ virLogHandlerNew(bool privileged, if (!(handler = virObjectLockableNew(virLogHandlerClass))) goto error; + if (virCondInit(&handler->condition) < 0) + goto error; + handler->privileged = privileged; handler->max_size = max_size; handler->max_backups = max_backups; @@ -357,6 +380,7 @@ virLogHandlerDispose(void *obj) virLogHandlerLogFileFree(handler->files[i]); } VIR_FREE(handler->files); + virCondDestroy(&handler->condition); } @@ -401,7 +425,8 @@ virLogHandlerDomainOpenLogFile(virLogHandlerPtr handler, pipefd[0] = -1; memcpy(file->domuuid, domuuid, VIR_UUID_BUFLEN); if (VIR_STRDUP(file->driver, driver) < 0 || - VIR_STRDUP(file->domname, domname) < 0) + VIR_STRDUP(file->domname, domname) < 0 || + VIR_STRDUP(file->path, path) < 0) goto error; if ((file->file = virRotatingFileWriterNew(path, @@ -492,10 +517,19 @@ virLogHandlerDomainReadLogFile(virLogHandlerPtr handler, char *data = NULL; ssize_t got; - virCheckFlags(0, NULL); + virCheckFlags(VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT, NULL); virObjectLock(handler); + if (flags & VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT) { + while (virLogHandlerGetLogFileFromPath(handler, path)) { + if (virCondWait(&handler->condition, &handler->parent.lock) < 0) { + virReportSystemError(errno, "%s", _("failed to wait for EOF")); + goto error; + } + } + } + if (!(file = virRotatingFileReaderNew(path, handler->max_backups))) goto error; diff --git a/src/logging/log_protocol.x b/src/logging/log_protocol.x index 2434043..194bc2e 100644 --- a/src/logging/log_protocol.x +++ b/src/logging/log_protocol.x @@ -34,6 +34,11 @@ enum virLogManagerProtocolDomainOpenLogFileFlags { VIR_LOG_MANAGER_PROTOCOL_DOMAIN_OPEN_LOG_FILE_TRUNCATE = 1 }; +enum virLogManagerProtocolDomainReadLogFileFlags { + /* wait until EOF from writing side */ + VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT = 1 +}; + /* Obtain a file handle suitable for writing to a * log file for a domain */ -- 1.8.3.1

On Mon, Sep 12, 2016 at 17:34:42 +0300, Nikolay Shirokovskiy wrote: This is a pretty big change but you did not write anything to describe or justify it.
--- src/logging/log_handler.c | 38 ++++++++++++++++++++++++++++++++++++-- src/logging/log_protocol.x | 5 +++++ 2 files changed, 41 insertions(+), 2 deletions(-)
[...]
@@ -492,10 +517,19 @@ virLogHandlerDomainReadLogFile(virLogHandlerPtr handler, char *data = NULL; ssize_t got;
- virCheckFlags(0, NULL); + virCheckFlags(VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT, NULL);
virObjectLock(handler);
+ if (flags & VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT) { + while (virLogHandlerGetLogFileFromPath(handler, path)) { + if (virCondWait(&handler->condition, &handler->parent.lock) < 0) { + virReportSystemError(errno, "%s", _("failed to wait for EOF")); + goto error; + } + }
E.g why do you need this? The qemu process is dead at the point when libvirt attempts to read the log file so I don't see a reason to wait here. Peter

On 05.10.2016 18:08, Peter Krempa wrote:
On Mon, Sep 12, 2016 at 17:34:42 +0300, Nikolay Shirokovskiy wrote:
This is a pretty big change but you did not write anything to describe or justify it.
--- src/logging/log_handler.c | 38 ++++++++++++++++++++++++++++++++++++-- src/logging/log_protocol.x | 5 +++++ 2 files changed, 41 insertions(+), 2 deletions(-)
[...]
@@ -492,10 +517,19 @@ virLogHandlerDomainReadLogFile(virLogHandlerPtr handler, char *data = NULL; ssize_t got;
- virCheckFlags(0, NULL); + virCheckFlags(VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT, NULL);
virObjectLock(handler);
+ if (flags & VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT) { + while (virLogHandlerGetLogFileFromPath(handler, path)) { + if (virCondWait(&handler->condition, &handler->parent.lock) < 0) { + virReportSystemError(errno, "%s", _("failed to wait for EOF")); + goto error; + } + }
E.g why do you need this? The qemu process is dead at the point when libvirt attempts to read the log file so I don't see a reason to wait here.
Peter
When we read log it can be written partially at that moment. Qemu is dead but there is a chance the pipe that connects the process and the log daemon is not drained. Thus if we want read everything the qemu process has written we need to wait for EOF. Log file handler for path of interest will disappear in case of EOF. By the way patch can be simplified. No need to store path in _virLogHandlerLogFile because it's child store this path already. Should I send v2 of the patch? Nikolay

First we need to specify flag to wait while qemu finishes its logging. Second virtlogd will not receive EOF as both qemu and libvirtd keep logging pipe fd. Thus let's close it after qemu process startup is finished as it will not be used anymore. All following logging on behalf of qemu by libvirt is done either thru daemon API or by receiving new writing descriptor. Beware, qemuDomainLogContextFree is unref actually. --- src/qemu/qemu_domain.c | 7 ++++++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index fd79390..6662907 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3841,12 +3841,13 @@ ssize_t qemuDomainLogContextRead(qemuDomainLogContextPtr ctxt, char *buf; size_t buflen; if (ctxt->manager) { + int flags = VIR_LOG_MANAGER_PROTOCOL_DOMAIN_READ_LOG_FILE_WAIT; buf = virLogManagerDomainReadLogFile(ctxt->manager, ctxt->path, ctxt->inode, ctxt->pos, 1024 * 128, - 0); + flags); if (!buf) return -1; buflen = strlen(buf); @@ -3974,6 +3975,10 @@ virLogManagerPtr qemuDomainLogContextGetManager(qemuDomainLogContextPtr ctxt) return ctxt->manager; } +void qemuDomainLogContextHalfClose(qemuDomainLogContextPtr ctxt) +{ + VIR_FORCE_CLOSE(ctxt->writefd); +} void qemuDomainLogContextFree(qemuDomainLogContextPtr ctxt) { diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 13c0372..71fe6ad 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -499,6 +499,7 @@ ssize_t qemuDomainLogContextRead(qemuDomainLogContextPtr ctxt, int qemuDomainLogContextGetWriteFD(qemuDomainLogContextPtr ctxt); void qemuDomainLogContextMarkPosition(qemuDomainLogContextPtr ctxt); void qemuDomainLogContextRef(qemuDomainLogContextPtr ctxt); +void qemuDomainLogContextHalfClose(qemuDomainLogContextPtr ctxt); void qemuDomainLogContextFree(qemuDomainLogContextPtr ctxt); virLogManagerPtr qemuDomainLogContextGetManager(qemuDomainLogContextPtr ctxt); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7596579..737b2e0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5493,6 +5493,8 @@ qemuProcessLaunch(virConnectPtr conn, cleanup: qemuDomainSecretDestroy(vm); virCommandFree(cmd); + if (logCtxt) + qemuDomainLogContextHalfClose(logCtxt); qemuDomainLogContextFree(logCtxt); virObjectUnref(cfg); virObjectUnref(caps); -- 1.8.3.1

ping On 12.09.2016 17:34, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
Nikolay Shirokovskiy (4): util: remove 1k limit for error messages virtlogd: stop reading on EOF instead of hangup virtlogd: add flag to wait for log end on read qemu: if virtlogd is used then read log tail correctly
src/logging/log_handler.c | 46 ++++++++++++++++++++++++++++++++++++++++------ src/logging/log_protocol.x | 5 +++++ src/qemu/qemu_domain.c | 7 ++++++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 2 ++ src/util/virerror.c | 9 ++++----- 6 files changed, 58 insertions(+), 12 deletions(-)

On Mon, Sep 12, 2016 at 17:34:39 +0300, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
I don't think that reporting all of the above is a good idea. We should perhaps report at most two last lines.

On 05.10.2016 18:13, Peter Krempa wrote:
On Mon, Sep 12, 2016 at 17:34:39 +0300, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
I don't think that reporting all of the above is a good idea. We should perhaps report at most two last lines.
We already report about half of this, this patch just removes random truncation. As to most two lines, AFAIU one can not say what part of this log will be useful for crash investigation. Nikolay

On Thu, Oct 06, 2016 at 10:23:05 +0300, Nikolay Shirokovskiy wrote:
On 05.10.2016 18:13, Peter Krempa wrote:
On Mon, Sep 12, 2016 at 17:34:39 +0300, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
I don't think that reporting all of the above is a good idea. We should perhaps report at most two last lines.
We already report about half of this, this patch just removes random truncation. As to most two lines, AFAIU one can not say what part of this log will be useful for crash investigation.
This is not about the log but about the error message. The error message containing ALL of the above stuff is useless for any user. For crash investigation you can always get the full log from the actual log file. When I've implemented this I did not see such error message. I'd otherwise truncate it to the end since all the above in a error message is clearly ridiculous. Peter

On 06.10.2016 10:29, Peter Krempa wrote:
On Thu, Oct 06, 2016 at 10:23:05 +0300, Nikolay Shirokovskiy wrote:
On 05.10.2016 18:13, Peter Krempa wrote:
On Mon, Sep 12, 2016 at 17:34:39 +0300, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
I don't think that reporting all of the above is a good idea. We should perhaps report at most two last lines.
We already report about half of this, this patch just removes random truncation. As to most two lines, AFAIU one can not say what part of this log will be useful for crash investigation.
This is not about the log but about the error message. The error message containing ALL of the above stuff is useless for any user. For crash investigation you can always get the full log from the actual log file.
Isn't leaving 1-2 lines is random? Sometimes it will help, sometimes not. If before die qemu writes 10 lines (besides registers dump) the first line will probably be the most interesting. I think we'd better just print something like "qemu died" ("go and see it's log if you want to").
When I've implemented this I did not see such error message. I'd otherwise truncate it to the end since all the above in a error message is clearly ridiculous.
It is pretty impressive actually )) It scared me a couple of times when I started with libvirt. Looks like BSOD or something. (Hmm, this can be a reason - not to print the dump for unprepared user )) Nikolay

On Thu, Oct 06, 2016 at 10:23:05 +0300, Nikolay Shirokovskiy wrote:
On 05.10.2016 18:13, Peter Krempa wrote:
On Mon, Sep 12, 2016 at 17:34:39 +0300, Nikolay Shirokovskiy wrote:
Hi, all.
In case migration fails due to destination qemu exits unexpectedly user recevies the qemu log in the error message. Unfortunately log is truncated and the most interesting part is missed (below is the example of such a log [1]).
Actually for the most cases the first patch will be enough to fix the issue. Originally I thought the problem is qemu logging and reading the log are not in sync (which is true) so I tried to fix it as well in the next patches.
* diff from v1:
1. split changes to libvirtd and virtlogd to different patches 2. split virtlogd patch further 3. simplify handling eofs and hangups in draining function
[1] log example:
CPU Reset (CPU 0) EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0 ES =0000 00000000 00000000 00000000 CS =0000 00000000 00000000 00000000 SS =0000 00000000 00000000 00000000 DS =0000 00000000 00000000 00000000 FS =0000 00000000 00000000 00000000 GS =0000 00000000 00000000 00000000 LDT=0000 00000000 00000000 00000000 TR =0000 00000000 00000000 00000000 GDT= 00000000 00000000 IDT= 00000000 00000000 CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=0000000000000000 DR7=0000000000000000 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 CPU Reset (CPU 1) EAX=00000000 EBX=00000000 ECX=00000000 EDX=000206a1 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 CCS=00000000 CCD=00000000 CCO=DYNAMIC EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=000 qemu: terminating on signal 15 from pid 168133
I don't think that reporting all of the above is a good idea. We should perhaps report at most two last lines.
We already report about half of this, this patch just removes random truncation. As to most two lines, AFAIU one can not say what part of this log will be useful for crash investigation.
Well, we're talking about error messages. And I doubt anyone would like to see a 50 lines long error message. We have logs for investigating crashes, the user should just get a reasonable message about why QEMU died, i.e., showing something like the last two lines from the log looks like a reasonable thing to do. Jirka
participants (3)
-
Jiri Denemark
-
Nikolay Shirokovskiy
-
Peter Krempa