On Thu, Sep 20, 2012 at 08:23:59PM +0200, Miloslav Trmač wrote:
This patch set adds structured syslog support to libvirt.
In short, the idea is to embed JSON in a syslog record. This format
is used in a new log output type called "syslog/json".
Example "syslog/json" record (line-wrapped for readability):
Sep 20 19:59:40 kulicka libvirt: @cee: {"msg":"libvirt version:
0.10.1",
"category":"../../src/test/test_driver.c","priority":"info",
"funcname":"virLogVMessage","line":769,
"timestamp":"2012-09-20 17:59:40.853+0000"}
compared with existing "syslog" record format:
Sep 20 20:02:24 kulicka libvirt: 5896: info : libvirt version: 0.10.1
This JSON embedding format is defined by the Lumberjack project led
by the major Open Source syslog implementations
(
https://fedorahosted.org/lumberjack/). The format is also a base
for a future MITRE CEE standard. Some of the fields are specified
at
https://fedorahosted.org/lumberjack/wiki/FieldList , but applications
are free to use whatever fields make sense for them.
The JSON structure makes easy to process logs by programs without writing
code that understands the specific format used by each application,
or using unreliable regexps (
https://fedorahosted.org/ceelog/ is a
companion project that allows processing such logs).
Current systems will store and these records unmodified and present them
to users unmodified, but tools (and probably /var/log/messages) will
be able to show only the "msg" field; this allows us to add more detailed
information without worsening the user experience (the example above
already shows added information that is currently being thrown away when
logging to syslog).
While I agree with the effort to try to standardize on metadata to
be collected with log messages, I can't say I like the Lumberjack
design, or this proposed usage in libvirt.
First it is not possible for libvirt to detect whether messages it
sends to syslog() are going to be processed by a syslog daemon which
actually understands the JSON format. This means we can't turn on
use of this JSON format automatically, without causing all existing
distros to get their logs filled with JSON data.
I also don't like the use of JSON for this purpose in the first place.
Application logging facilities really need to be simple, reliable, and
safe to use in any context. For example in between a fork() and execve()
where POSIX requires only use of async signal safe functions. Now I know
that technically speaking syslog() itself is not async signal safe in
general, but the way libvirt uses it for logging makes it safe(ish) in
our case. We can not, however, imagine that JSON (or really any general
JSON library) is going to be async signal safe.
Note, I don't care what data format is used server side for storing
the logs - if syslogd wants to use JSON format, that's fine. Pushing
this format into client applications is what I dislike.
I can see the appeal of JSON in that it allows applications to send
arbitrarily complex nested data structures into the logs, but that
kind of generality for what is IMHO niche usage, really compromises
the common case. What makes it worse is that the standard field list
quoted above actually demonstrates that you don't actually even need
the nesting in many cases, because it shows use of
{"user!name":"bob"}
as an alternative to {"user":{"name":"bob"}}
This patch series goes to great lengths to introduce a whole series of
new JSON APIs in libvirt, to avoid memory allocation from the heap, but
this work is negated by the fact that the YAJL APIs used at the end all
allocate from the heap anyway.
Finally alot of the metadata items listed in the "standard fields" doc
above would be better filled in by the server (from SCM_RIGHTS), than
the client, so that the server can prevent an application from lying
about its uid/gid/pid etc.
By way of comparison the systemd journal APIs deal with all theses
issues.
The actual journald client side logging API is possible to implement
and use without doing any memory allocation at all, in a completely
async signal safe manner, using only socket() + sendmsg() with a
stack allocated iovec[] for storing individual metadata fields to be
sent. The API is soooo simple to use too, for example, a traditional
syslog() call, can be replaced with something like:
sd_journal_send("MESSAGE=%s", str,
"PRIORITY=%d", prio,
"CODE_LINE=%lld", linenr,
"CODE_FUNC=%s", funcname,
...arbirary other key=value metadata...
NULL);
The journald communication uses a new UNIX socket, so apps can automatically
determine whether journald is available and if not, fallback to plain
syslog for old systems. It uses SCM_RIGHTS to automatically fill in a
number of key fields in an unspoofable way. The way the data is transferred
using sendmsg + iovec means there is no need for client applications to
do any kind of magic escaping of special characters in strings.
The journald log processing tools can easily output the log data in many
different formats, including JSON, without the apps having to get involved
in dealing with JSON themselves.
As I mentioned right at the start, I think it is good to have structured
standardized metadata associated with log messages. I just don't think
that the way applications/libraries like libvirt are being asked to
implement this is a good idea. It would be entirely practical for rsyslog
or syslog-ng to provide a client/daemon protocol (+library) that worked
in exactly the same way systemd's journal protocol (+library) do, and then
turn the received raw data into their preferred JSON format on the server.
FYI, I already have a local patch which adds support for journald to
libvirt logging which I am yet to submit, but is so much simpler than
this lumberjack support.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|