Hello,
Thanks for the review, and I'm sorry about the late response.
----- Original Message -----
While I agree with the effort to try to standardize on metadata to
be collected with log messages, I can't say I like the Lumberjack
design, or this proposed usage in libvirt.
First it is not possible for libvirt to detect whether messages it
sends to syslog() are going to be processed by a syslog daemon which
actually understands the JSON format. This means we can't turn on
use of this JSON format automatically, without causing all existing
distros to get their logs filled with JSON data.
True, but that's an inevitable consequence of being compatible with the syslog
protocol.[1]
Yes, it does make the question of a default awkward. However, considering that journal
has been switched on by default, and MESSAGE= of everything sent to journal goes to
syslog, is there still a chance of enabling a syslog output by default at the same time?
Doing so would cause each message to appear in syslog twice on systems that have
journald.
I also don't like the use of JSON for this purpose in the first
place.
I'm afraid this is not really open to a change, however I'll try to
explain why I don't see JSON as a problem.
Application logging facilities really need to be simple, reliable,
and
safe to use in any context. For example in between a fork() and execve()
where POSIX requires only use of async signal safe functions. Now I know
that technically speaking syslog() itself is not async signal safe in
general, but the way libvirt uses it for logging makes it safe(ish) in
our case. We can not, however, imagine that JSON (or really any general
JSON library) is going to be async signal safe.
You can also view JSON syntax as a sequence of (fixed field identifier string, field value
with some characters escaped) pairs, which is pretty much the minimum standard for _any_
representation of data in a text format. Choosing JSON was just simpler than trying to
committee-invent something new.[2]
The journal protocol has a similarly inconvenient case of handling \n - it's only not
as noticeable because there are fewer characters to escape, and \n is rarer.
Re: async-signal-safe use: Yes, using a JSON library is more convenient and generally
preferred to writing a yet another JSON encoder, and yes, a typical JSON library will not
be async-signal-safe. However, applications are not required to use all JSON facilities.
An application that wants to log a static string between fork() and execve() can just as
easily log a static JSON string.
If the multi-threaded application wanted to log something dynamic at that inconvenient
time, the application author would have to write string formatting code manually in any
case (sprintf() is also not async signal safe) - and I don't think JSON escaping is
really difficult. Sure, it is more difficult than no escaping, but there needs to be
_some_ escaping if we want to keep the existing syslog transport.
Regarding libvirt specifically, virLogVMessage is rather far from async-signal-safe (e.g.
virVasprintf(), virLogLock()), so this doesn't seem to be a concern for libvirt.
This patch series goes to great lengths to introduce a whole series
of
new JSON APIs in libvirt, to avoid memory allocation from the heap, but
this work is negated by the fact that the YAJL APIs used at the end all
allocate from the heap anyway.
The major problem here was not memory allocation itself (as noted above, other parts of
the log path allocate memory), but possible recursion through virReportOOMError() ->
virLogMessage() and deadlock on virLogMutex. Alocating data on the stack is primarily a
way to avoid OOM, not signal safety.
In retrospect, the virJSON* data structure is unnecessarily complex for the simple list of
key-type-value triples that libvirt actually needs for logging, making the code rather
more complex than necessary. I'll replace that, see the other reply.
Finally alot of the metadata items listed in the "standard
fields" doc
above would be better filled in by the server (from SCM_RIGHTS), than
the client, so that the server can prevent an application from lying
about its uid/gid/pid etc.
rsyslog does add these fields automatically, and the patch set does not add them manually.
This is a bug in the FieldList wiki page, I'll get it fixed.
By way of comparison the systemd journal APIs deal with all theses
issues.
The actual journald client side logging API is possible to implement
and use without doing any memory allocation at all,
virLogOutputToJournald can be implemented like that, but not the libsystemd-journal API:
you'll find a fairly liberal use of alloca() in journal-send.c. If alloca() is
allowed, a JSON formatting API can be implemented without "memory allocation" as
well.
in a completely
async signal safe manner, using only socket() + sendmsg() with a
stack allocated iovec[] for storing individual metadata fields to be
sent.
Looking at how libvirt currently sends journal data, is the /run/systemd/journal/socket
protocol a stable ABI? I can't find anything documenting it within the systemd git
tree, and it is not mentioned at
http://www.freedesktop.org/wiki/Software/systemd/InterfaceStabilityPromise .
The API is soooo simple to use too, for example, a traditional
syslog() call, can be replaced with something like:
sd_journal_send("MESSAGE=%s", str,
"PRIORITY=%d", prio,
"CODE_LINE=%lld", linenr,
"CODE_FUNC=%s", funcname,
...arbirary other key=value metadata...
NULL);
There actually is a library (libumberlog) providing just such a simple API. A
corresponding call is
ul_syslog(prio, str,
"code_line", "%s", service,
"code_func", "%s", auth_method,
... arbitrary other metadata...
NULL);
However, I thought that libvirt would prefer not adding yet another dependency, especially
when the extensions for virRaiseErrorFull require a per-field data structure in any
case.[3]
Mirek
[1] Whether JSON is desired or not also ultimately depends on the final recipient of the
data, not on the system that generates it - an "old" system could generate JSON
that will be forwarded to a log aggregation server.
[2] Especially because the major syslog implementations already supported the JSON format
for other reasons.
[3] And, to be honest, the library does not nicely support the case of virRaiseErrorFull
adding more data. It would be possible, but ugly - we'll make it easier in the
future.