[libvirt] [PATCH 0/6 v2] Allow debug dump in case of crashes

Changes in v2 based on danpb feedback: - make libvirtd.log the default instead of syslog - do not open the libvirtd.log directly - create a new API to dump debug buffer to file output - use that new API now in the fatal signal handler This patch set main goal is to allow dumping of full debug informations for operations occuring in libvirt daemon before a crash. The principle is to save all logs to a round-robbin memory buffer (which we already do, except we never use that buffer), save the general daemon logs to a libvirtd.log file and upon reception of a fatal signal, save the memory buffer directly to the log file(s) or stderr. There is quite a few remarks about this, we already have that buffer but we don't log everything in, only what's output, that's what patch 1 changes, then we need to provide a function to dump it in emergency from signal handler so this also adds a new API for it. We also need to be cautious about what system call we operate from the signal handler, but basically we should limit ourselves here to write and sigaction which are safe there dixit POSIX. Something like killall -USR2 libvirtd allows to see the kind of output one get, an idle libvirtd is quiet, but handle/timer/fdpolls tend to be very verbose, Maybe now that this part is well debugged some of those could be suppressed or commented off. Also the buffer is a statically allocated 64KB, this should be made more flexible based on a configuration entry but it's left for another patch. Daniel

Initially only the log actually written out by libvirt were saved on the memory buffer, this patch forces all informations including info and debug to be saved in memory too. This is useful to get full data in case of crash. Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/util/logging.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-)

On Fri, Mar 04, 2011 at 06:30:52PM +0800, Daniel Veillard wrote:
Initially only the log actually written out by libvirt were saved on the memory buffer, this patch forces all informations including info and debug to be saved in memory too. This is useful to get full data in case of crash.
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/util/logging.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/04/2011 03:30 AM, Daniel Veillard wrote: In the subject - s/robbin/robin/ Oh well, you already pushed it with the typo.
Initially only the log actually written out by libvirt were saved on the memory buffer, this patch forces all informations including info and debug to be saved in memory too. This is useful to get full data in case of crash.
-- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Mar 04, 2011 at 08:50:54AM -0700, Eric Blake wrote:
On 03/04/2011 03:30 AM, Daniel Veillard wrote:
In the subject - s/robbin/robin/
Oh well, you already pushed it with the typo.
yeah, grin :-\ ... to my defense, I really though it was spelled that way, it's not a typo, a real mistake ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

Syslog is not the best place to go search for libvirt error logs, change it to a default file output libvirtd.log, but still keep standard error if not run as a daemon. Depending on whether it's run as root or user, the log is saved in the local state dir or in $HOME/.libvirt. * daemon/libvirtd.c: change default logging to go to libvirtd.log Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/libvirtd.c | 56 ++++++++++++++++++++++++++++++++++++++++------------ 1 files changed, 43 insertions(+), 13 deletions(-)

On Fri, Mar 04, 2011 at 06:30:53PM +0800, Daniel Veillard wrote:
+++ b/daemon/libvirtd.c @@ -197,6 +197,8 @@ static int audit_logging = 0;
#define DH_BITS 1024
+static int logFD = -1; +
Ah, I though I had that cleaned up, will remove before any commit, or next push, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Fri, Mar 04, 2011 at 06:30:53PM +0800, Daniel Veillard wrote:
Syslog is not the best place to go search for libvirt error logs, change it to a default file output libvirtd.log, but still keep standard error if not run as a daemon. Depending on whether it's run as root or user, the log is saved in the local state dir or in $HOME/.libvirt. * daemon/libvirtd.c: change default logging to go to libvirtd.log
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/libvirtd.c | 56 ++++++++++++++++++++++++++++++++++++++++------------ 1 files changed, 43 insertions(+), 13 deletions(-)
ACK
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index b2e5e20..ebed828 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -197,6 +197,8 @@ static int audit_logging = 0;
#define DH_BITS 1024
+static int logFD = -1;
Just need to delete this.
+ +/* + * Stop logging + */ +static void +qemudStopLogging(void) +{ + virLogShutdown(); + VIR_FORCE_CLOSE(logFD); }
No need for this new function now.
/* Read the config file if it exists. @@ -2805,7 +2835,7 @@ remoteReadConfigFile (struct qemud_server *server, const char *filename) /* * First get all the logging settings and activate them */ - if (qemudSetLogging(conf, filename) < 0) + if (qemudSetLogging(server, conf, filename) < 0) goto free_and_fail;
GET_CONF_INT (conf, filename, listen_tcp); @@ -3369,6 +3399,6 @@ error: qemudCleanup(server); if (pid_file) unlink (pid_file); - virLogShutdown(); + qemudStopLogging();
So this chunk can just disappear too Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

As the file may grow quite a bit especially with debug turned on. * daemon/libvirtd.logrotate.in daemon/Makefile.am libvirt.spec.in: add new logrotate file for the daemon log Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/Makefile.am | 10 +++++++++- daemon/libvirtd.logrotate.in | 9 +++++++++ libvirt.spec.in | 1 + 3 files changed, 19 insertions(+), 1 deletions(-) create mode 100644 daemon/libvirtd.logrotate.in

On Fri, Mar 04, 2011 at 06:30:54PM +0800, Daniel Veillard wrote:
As the file may grow quite a bit especially with debug turned on. * daemon/libvirtd.logrotate.in daemon/Makefile.am libvirt.spec.in: add new logrotate file for the daemon log
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/Makefile.am | 10 +++++++++- daemon/libvirtd.logrotate.in | 9 +++++++++ libvirt.spec.in | 1 + 3 files changed, 19 insertions(+), 1 deletions(-) create mode 100644 daemon/libvirtd.logrotate.in
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* src/util/logging.c: the start pointer need to wrap around too Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/util/logging.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-)

On Fri, Mar 04, 2011 at 06:30:55PM +0800, Daniel Veillard wrote:
* src/util/logging.c: the start pointer need to wrap around too
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/util/logging.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/libvirt_private.syms | 1 + src/util/logging.c | 87 +++++++++++++++++++++++++++------------------ src/util/logging.h | 2 +- 3 files changed, 54 insertions(+), 36 deletions(-)

On Fri, Mar 04, 2011 at 06:30:56PM +0800, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/libvirt_private.syms | 1 + src/util/logging.c | 87 +++++++++++++++++++++++++++------------------ src/util/logging.h | 2 +- 3 files changed, 54 insertions(+), 36 deletions(-)
ACK with one small fix....
+static void virLogDumpAllFD(const char *msg, int len) { + int i, found = 0; + + for (i = 0; i < virLogNbOutputs;i++) { + if (virLogOutputs[i].f == virLogOutputToFd) { + int fd = (long) virLogOutputs[i].data; + + if (fd >= 0) + ignore_value (safewrite(fd, msg, len));
Need to be: if (fd >= 0) { ignore_value (safewrite(fd, msg, len)); found = 1; }
+ } + } + if (!found) + ignore_value (safewrite(STDERR_FILENO, msg, len)); +}
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Mar 04, 2011 at 11:33:16AM +0000, Daniel P. Berrange wrote:
On Fri, Mar 04, 2011 at 06:30:56PM +0800, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- src/libvirt_private.syms | 1 + src/util/logging.c | 87 +++++++++++++++++++++++++++------------------ src/util/logging.h | 2 +- 3 files changed, 54 insertions(+), 36 deletions(-)
ACK with one small fix....
+static void virLogDumpAllFD(const char *msg, int len) { + int i, found = 0; + + for (i = 0; i < virLogNbOutputs;i++) { + if (virLogOutputs[i].f == virLogOutputToFd) { + int fd = (long) virLogOutputs[i].data; + + if (fd >= 0) + ignore_value (safewrite(fd, msg, len));
Need to be:
if (fd >= 0) { ignore_value (safewrite(fd, msg, len)); found = 1; }
Oops, right :-) Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On 03/04/2011 03:30 AM, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
+void +virLogEmergencyDumpAll(int signum) { + int ret = 0, len; + char buf[100]; + + if (virLogLen == 0) + return;
- if ((virLogLen == 0) || (f == NULL)) - return 0; virLogLock();
Is virLogLock async-signal-safe?
+ snprintf(buf, sizeof(buf) - 1, + "Caught signal %d, dumping internal log buffer:\n", signum);
snprintf is _not_ safe; it can call malloc. We probably ought to use a manual decimal-to-string conversion loop instead.
+ buf[sizeof(buf) - 1] = 0; + virLogDumpAllFD(buf, strlen(buf)); + snprintf(buf, sizeof(buf) - 1, "\n\n ====== start of log =====\n\n");
Why snprintf here, rather than strcpy? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
On 03/04/2011 03:30 AM, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
+void +virLogEmergencyDumpAll(int signum) { + int ret = 0, len; + char buf[100]; + + if (virLogLen == 0) + return;
- if ((virLogLen == 0) || (f == NULL)) - return 0; virLogLock();
Is virLogLock async-signal-safe?
I could not find, I'm afraid it's implementation dependant. I would expect the lock to be done in user space using an atomic test and set op and hence being safe at least on i386 and x86_64. The problem is that if we drop the lock we can crash if used on USR2 while another thread is legitimately running.
+ snprintf(buf, sizeof(buf) - 1, + "Caught signal %d, dumping internal log buffer:\n", signum);
snprintf is _not_ safe; it can call malloc. We probably ought to use a manual decimal-to-string conversion loop instead.
Even better I think a case on signum and outputting the signal description is even more useful, and simpler.
+ buf[sizeof(buf) - 1] = 0; + virLogDumpAllFD(buf, strlen(buf)); + snprintf(buf, sizeof(buf) - 1, "\n\n ====== start of log =====\n\n");
Why snprintf here, rather than strcpy?
pure lazyness. I'm suggesting the following patch to fix those 2 issues, it get rids of the temporary buffer which was used to compute the length, better done in the logging routine directly. It also get rid of ret variable which was never used :-\ I'm also removing the early virLogLen == 0 test because we should at least log the fact we got the signal, even if the buffer may be empty. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Mar 07, 2011 at 03:06:18PM +0800, Daniel Veillard wrote:
On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
On 03/04/2011 03:30 AM, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
+void +virLogEmergencyDumpAll(int signum) { + int ret = 0, len; + char buf[100]; + + if (virLogLen == 0) + return;
- if ((virLogLen == 0) || (f == NULL)) - return 0; virLogLock();
Is virLogLock async-signal-safe?
I could not find, I'm afraid it's implementation dependant. I would expect the lock to be done in user space using an atomic test and set op and hence being safe at least on i386 and x86_64. The problem is that if we drop the lock we can crash if used on USR2 while another thread is legitimately running.
+ snprintf(buf, sizeof(buf) - 1, + "Caught signal %d, dumping internal log buffer:\n", signum);
snprintf is _not_ safe; it can call malloc. We probably ought to use a manual decimal-to-string conversion loop instead.
Even better I think a case on signum and outputting the signal description is even more useful, and simpler.
+ buf[sizeof(buf) - 1] = 0; + virLogDumpAllFD(buf, strlen(buf)); + snprintf(buf, sizeof(buf) - 1, "\n\n ====== start of log =====\n\n");
Why snprintf here, rather than strcpy?
pure lazyness.
I'm suggesting the following patch to fix those 2 issues, it get rids of the temporary buffer which was used to compute the length, better done in the logging routine directly. It also get rid of ret variable which was never used :-\ I'm also removing the early virLogLen == 0 test because we should at least log the fact we got the signal, even if the buffer may be empty.
Daniel
-- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
diff --git a/src/util/logging.c b/src/util/logging.c index e39e6c5..956e046 100644 --- a/src/util/logging.c +++ b/src/util/logging.c @@ -30,6 +30,7 @@ #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> +#include <signal.h> #if HAVE_SYSLOG_H # include <syslog.h> #endif @@ -283,6 +284,9 @@ static void virLogStr(const char *str, int len) { static void virLogDumpAllFD(const char *msg, int len) { int i, found = 0;
+ if (len <= 0) + len = strlen(msg); + for (i = 0; i < virLogNbOutputs;i++) { if (virLogOutputs[i].f == virLogOutputToFd) { int fd = (long) virLogOutputs[i].data; @@ -308,24 +312,38 @@ static void virLogDumpAllFD(const char *msg, int len) { */ void virLogEmergencyDumpAll(int signum) { - int ret = 0, len; - char buf[100]; - - if (virLogLen == 0) - return; + int len;
virLogLock(); - snprintf(buf, sizeof(buf) - 1, - "Caught signal %d, dumping internal log buffer:\n", signum); - buf[sizeof(buf) - 1] = 0; - virLogDumpAllFD(buf, strlen(buf)); - snprintf(buf, sizeof(buf) - 1, "\n\n ====== start of log =====\n\n"); - virLogDumpAllFD(buf, strlen(buf)); + switch (signum) { + case SIGFPE: + virLogDumpAllFD( "Caught signal Floating-point exception", -1); + break; + case SIGSEGV: + virLogDumpAllFD( "Caught Segmentation violation", -1); + break; + case SIGILL: + virLogDumpAllFD( "Caught illegal instruction", -1); + break; + case SIGABRT: + virLogDumpAllFD( "Caught abort signal", -1); + break; + case SIGBUS: + virLogDumpAllFD( "Caught bus error", -1); + break; + case SIGUSR2: + virLogDumpAllFD( "Caught User-defined signal 2", -1); + break; + default: + virLogDumpAllFD( "Caught unexpected signal", -1); + break; + } + virLogDumpAllFD(" dumping internal log buffer:\n", -1); + virLogDumpAllFD("\n\n ====== start of log =====\n\n", -1); while (virLogLen > 0) { if (virLogStart + virLogLen < LOG_BUFFER_SIZE) { virLogBuffer[virLogStart + virLogLen] = 0; virLogDumpAllFD(&virLogBuffer[virLogStart], virLogLen); - ret += virLogLen; virLogStart += virLogLen; virLogLen = 0; } else { @@ -336,8 +354,7 @@ virLogEmergencyDumpAll(int signum) { virLogStart = 0; } } - snprintf(buf, sizeof(buf) - 1, "\n\n ====== end of log =====\n\n"); - virLogDumpAllFD(buf, strlen(buf)); + virLogDumpAllFD("\n\n ====== end of log =====\n\n", -1); virLogUnlock(); }
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Mar 07, 2011 at 10:06:35AM +0000, Daniel P. Berrange wrote:
On Mon, Mar 07, 2011 at 03:06:18PM +0800, Daniel Veillard wrote:
On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
snprintf is _not_ safe; it can call malloc. We probably ought to use a manual decimal-to-string conversion loop instead.
Even better I think a case on signum and outputting the signal description is even more useful, and simpler.
+ buf[sizeof(buf) - 1] = 0; + virLogDumpAllFD(buf, strlen(buf)); + snprintf(buf, sizeof(buf) - 1, "\n\n ====== start of log =====\n\n");
Why snprintf here, rather than strcpy?
pure lazyness.
I'm suggesting the following patch to fix those 2 issues, it get rids of the temporary buffer which was used to compute the length, better done in the logging routine directly. It also get rid of ret variable which was never used :-\ I'm also removing the early virLogLen == 0 test because we should at least log the fact we got the signal, even if the buffer may be empty. [...]
ACK
thanks, pushed ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Mar 07, 2011 at 03:06:18PM +0800, Daniel Veillard wrote:
On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
On 03/04/2011 03:30 AM, Daniel Veillard wrote:
virLogEmergencyDumpAll() allows to dump the content of the debug buffer from within a signal handler. It saves to all log file or stderr if none is found * src/util/logging.h src/util/logging.c: add the new API and cleanup the old virLogDump code * src/libvirt_private.syms: exports it as a private symbol
+void +virLogEmergencyDumpAll(int signum) { + int ret = 0, len; + char buf[100]; + + if (virLogLen == 0) + return;
- if ((virLogLen == 0) || (f == NULL)) - return 0; virLogLock();
Is virLogLock async-signal-safe?
I could not find, I'm afraid it's implementation dependant. I would expect the lock to be done in user space using an atomic test and set op and hence being safe at least on i386 and x86_64. The problem is that if we drop the lock we can crash if used on USR2 while another thread is legitimately running.
Even if the impl was robust there is an architectural problem with mixing of mutexes and signals. If a thread is running virLogMessage(), it will be holding the log mutex. If a signal is processed, and it calls into this new log code, it will likely deadlock. Is there a way we can somehow make the global log buffer into something that can be lockless for readers. ie although our main log funcs would still need to use the mutex in general when writing to the buffer, we could read the log buffer without any locks. It would probably suffice to make sure we use atomic integer arithmetic on virLogLen, virLogStart and virLogEnd, and update them in a specific order to avoid a reader seeing garbage. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Mar 07, 2011 at 10:18:57AM +0000, Daniel P. Berrange wrote:
On Mon, Mar 07, 2011 at 03:06:18PM +0800, Daniel Veillard wrote:
On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
Is virLogLock async-signal-safe?
I could not find, I'm afraid it's implementation dependant. I would expect the lock to be done in user space using an atomic test and set op and hence being safe at least on i386 and x86_64. The problem is that if we drop the lock we can crash if used on USR2 while another thread is legitimately running.
Even if the impl was robust there is an architectural problem with mixing of mutexes and signals. If a thread is running virLogMessage(), it will be holding the log mutex. If a signal is processed, and it calls into this new log code, it will likely deadlock.
right this is the only thing that really makes the USR2 debug dangerous IMHO, and we should try to fix it.
Is there a way we can somehow make the global log buffer into something that can be lockless for readers. ie although our main log funcs would still need to use the mutex in general when writing to the buffer, we could read the log buffer without any locks.
the problem is that I really want to empty the buffer as a result of emitting the logs, i.e. the reader will emit only once the content at most.
It would probably suffice to make sure we use atomic integer arithmetic on virLogLen, virLogStart and virLogEnd, and update them in a specific order to avoid a reader seeing garbage.
honnestly I would not be chagrined by the probability of emitting a little bit of garbage when flushing the logs or one or two items. I think we can work lockless there, by registering the initial values of virLogStart and virLogEnd when entering the routine, emit whever is in those boundaries with the tiny risk that some of it may be overwritten (at the beginning hence the less interesting stuff) and on exit set back virLogStart = virLogEnd and virLogLen = 0, possibly loosing whetver might have been emitted between the beginning of the save and the exit from the routine. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Mar 07, 2011 at 09:18:46PM +0800, Daniel Veillard wrote:
On Mon, Mar 07, 2011 at 10:18:57AM +0000, Daniel P. Berrange wrote:
On Mon, Mar 07, 2011 at 03:06:18PM +0800, Daniel Veillard wrote:
On Fri, Mar 04, 2011 at 08:53:55AM -0700, Eric Blake wrote:
Is virLogLock async-signal-safe?
I could not find, I'm afraid it's implementation dependant. I would expect the lock to be done in user space using an atomic test and set op and hence being safe at least on i386 and x86_64. The problem is that if we drop the lock we can crash if used on USR2 while another thread is legitimately running.
Even if the impl was robust there is an architectural problem with mixing of mutexes and signals. If a thread is running virLogMessage(), it will be holding the log mutex. If a signal is processed, and it calls into this new log code, it will likely deadlock.
right this is the only thing that really makes the USR2 debug dangerous IMHO, and we should try to fix it.
Is there a way we can somehow make the global log buffer into something that can be lockless for readers. ie although our main log funcs would still need to use the mutex in general when writing to the buffer, we could read the log buffer without any locks.
the problem is that I really want to empty the buffer as a result of emitting the logs, i.e. the reader will emit only once the content at most.
While emptying the buffer is nice, I don't actually think it is too important. Upon SEGV,BUS,FPE etc we're going to abort the entire process anyway, so clearing the buffer isn't neccessary. With SIGUSR2, if the user triggers it multiple times, they will likely have left enough time between invocation that the original data has been already overwritten. So clearing the buffer is a minor non-critical optmization for that.
It would probably suffice to make sure we use atomic integer arithmetic on virLogLen, virLogStart and virLogEnd, and update them in a specific order to avoid a reader seeing garbage.
honnestly I would not be chagrined by the probability of emitting a little bit of garbage when flushing the logs or one or two items. I think we can work lockless there, by registering the initial values of virLogStart and virLogEnd when entering the routine, emit whever is in those boundaries with the tiny risk that some of it may be overwritten (at the beginning hence the less interesting stuff) and on exit set back virLogStart = virLogEnd and virLogLen = 0, possibly loosing whetver might have been emitted between the beginning of the save and the exit from the routine.
Agreed, the only thing we need to avoid is crashing. If we emit a little bit of garbage data, that's not too bad and outweighed by the benefit of useful data in most scenarios. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Mar 07, 2011 at 01:22:09PM +0000, Daniel P. Berrange wrote:
On Mon, Mar 07, 2011 at 09:18:46PM +0800, Daniel Veillard wrote:
the problem is that I really want to empty the buffer as a result of emitting the logs, i.e. the reader will emit only once the content at most.
While emptying the buffer is nice, I don't actually think it is too important. Upon SEGV,BUS,FPE etc we're going to abort the entire process anyway, so clearing the buffer isn't neccessary. With SIGUSR2, if the user triggers it multiple times, they will likely have left enough time between invocation that the original data has been already overwritten. So clearing the buffer is a minor non-critical optmization for that.
Well it's actually interesting to see what my have happen between exactly 2 instruction on an otherwise idle system, and consecutive USR2 gives this if we flush, I palyed with this a bit and I like the feature, it would be sad to drop this if we can avoid.
It would probably suffice to make sure we use atomic integer arithmetic on virLogLen, virLogStart and virLogEnd, and update them in a specific order to avoid a reader seeing garbage.
honnestly I would not be chagrined by the probability of emitting a little bit of garbage when flushing the logs or one or two items. I think we can work lockless there, by registering the initial values of virLogStart and virLogEnd when entering the routine, emit whever is in those boundaries with the tiny risk that some of it may be overwritten (at the beginning hence the less interesting stuff) and on exit set back virLogStart = virLogEnd and virLogLen = 0, possibly loosing whetver might have been emitted between the beginning of the save and the exit from the routine.
Agreed, the only thing we need to avoid is crashing. If we emit a little bit of garbage data, that's not too bad and outweighed by the benefit of useful data in most scenarios.
Okay, I'm gonna look more closely. The good thing is that virLogEmergencyDumpAll() is really the only function reading that buffer so it should be easy to make things safe without locking. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Mar 07, 2011 at 09:32:59PM +0800, Daniel Veillard wrote:
On Mon, Mar 07, 2011 at 01:22:09PM +0000, Daniel P. Berrange wrote:
On Mon, Mar 07, 2011 at 09:18:46PM +0800, Daniel Veillard wrote:
the problem is that I really want to empty the buffer as a result of emitting the logs, i.e. the reader will emit only once the content at most.
While emptying the buffer is nice, I don't actually think it is too important. Upon SEGV,BUS,FPE etc we're going to abort the entire process anyway, so clearing the buffer isn't neccessary. With SIGUSR2, if the user triggers it multiple times, they will likely have left enough time between invocation that the original data has been already overwritten. So clearing the buffer is a minor non-critical optmization for that.
Well it's actually interesting to see what my have happen between exactly 2 instruction on an otherwise idle system, and consecutive USR2 gives this if we flush, I palyed with this a bit and I like the feature, it would be sad to drop this if we can avoid.
IIUC, the messages written in the log buffer have the timestamp prepended to each entry, you could likely identify any overlap betweeen two USR2 dumps via the timestamps. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

In case of imminent crash or upon request (signal USR2), dump the logging buffer to the libvirtd.log file for post-mortem analysis * daemon/libvirtd.c: create a sig_fatal() handler connected to SIGFPE SIGSEGV SIGILL SIGABRT SIGBUS and SIGUSR2, just dumping the log buffer using virLogEmergencyDumpAll Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/libvirtd.c | 32 ++++++++++++++++++++++++++++++++ 1 files changed, 32 insertions(+), 0 deletions(-)

On Fri, Mar 04, 2011 at 06:30:57PM +0800, Daniel Veillard wrote:
In case of imminent crash or upon request (signal USR2), dump the logging buffer to the libvirtd.log file for post-mortem analysis * daemon/libvirtd.c: create a sig_fatal() handler connected to SIGFPE SIGSEGV SIGILL SIGABRT SIGBUS and SIGUSR2, just dumping the log buffer using virLogEmergencyDumpAll
Signed-off-by: Daniel Veillard <veillard@redhat.com> --- daemon/libvirtd.c | 32 ++++++++++++++++++++++++++++++++ 1 files changed, 32 insertions(+), 0 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Mar 04, 2011 at 06:30:51PM +0800, Daniel Veillard wrote:
Changes in v2 based on danpb feedback: - make libvirtd.log the default instead of syslog - do not open the libvirtd.log directly - create a new API to dump debug buffer to file output - use that new API now in the fatal signal handler
This patch set main goal is to allow dumping of full debug informations for operations occuring in libvirt daemon before a crash.
Okay, I pushed the patch set with the 2 small fixes raised on review, thanks ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
participants (3)
-
Daniel P. Berrange
-
Daniel Veillard
-
Eric Blake