[libvirt] [PATCH 0/3] Debug: Improve log messages.

Libvirt logs include many snippets for debugging daemon state, but some of those messages are either missing vital information or end up logging "errors" for normal operating conditions as well. This series improves log messages, also adding additional WARN'ings for connection instantiation and closure. Prerna Saxena (3): Debug: Add WARN'ing messages for when a client has opened/closed connection. Debug: Remove unnecessary errors reported while parsing non-existent sysfs files. Debug: Report VM errors more concisely. src/qemu/qemu_process.c | 4 ++-- src/rpc/virnetserverclient.c | 18 ++++++++++++++---- src/util/virnetdev.c | 2 +- 3 files changed, 17 insertions(+), 7 deletions(-) -- 1.8.1.2

While tracing connections from a remote client, it helps to keep track of the connection lifecycle. Messages such as the following : error : virNetSocketReadWire:1574 : End of file while reading data: Input/output error are rather unhelpful. They do not indicate if the client had earlier asked for connection closure via libvirt API. This patch introduces messages to annotate when a client connected/disconnected. Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/rpc/virnetserverclient.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c index 85857bc..a77feaa 100644 --- a/src/rpc/virnetserverclient.c +++ b/src/rpc/virnetserverclient.c @@ -678,14 +678,19 @@ int virNetServerClientGetTLSKeySize(virNetServerClientPtr client) return size; } #endif - +/* + * This mostly just needs to publish the client socket FD to logs. + * It does not necessarily need a lock, or will add lock contention problems. + * Replace the lock with a reference counting mechanism, to prevent the client + * object from being deallocated while this is being run + */ int virNetServerClientGetFD(virNetServerClientPtr client) { int fd = -1; - virObjectLock(client); + virObjectRef(client); if (client->sock) fd = virNetSocketGetFD(client->sock); - virObjectUnlock(client); + virObjectUnref(client); return fd; } @@ -965,7 +970,9 @@ void virNetServerClientClose(virNetServerClientPtr client) virKeepAlivePtr ka; virObjectLock(client); - VIR_DEBUG("client=%p", client); + VIR_WARN("Free'ing up resources for client=%p sock=%d", client, + virNetServerClientGetFD(client)); + if (!client->sock) { virObjectUnlock(client); return; @@ -1039,6 +1046,8 @@ void virNetServerClientDelayedClose(virNetServerClientPtr client) virObjectLock(client); client->delayedClose = true; virObjectUnlock(client); + VIR_WARN("Client=%p sock=%d requested closure of connection.", + client, virNetServerClientGetFD(client)); } void virNetServerClientImmediateClose(virNetServerClientPtr client) @@ -1151,6 +1160,7 @@ static void virNetServerClientDispatchRead(virNetServerClientPtr client) if (client->rx->nfds == 0) { if (virNetServerClientRead(client) < 0) { client->wantClose = true; + VIR_WARN("Client=%p sock=%p closed connection", client, client->sock); return; /* Error */ } } -- 1.8.1.2

On Wed, Mar 22, 2017 at 01:02:17 -0700, Prerna Saxena wrote:
While tracing connections from a remote client, it helps to keep track of the connection lifecycle. Messages such as the following :
error : virNetSocketReadWire:1574 : End of file while reading data: Input/output error
are rather unhelpful. They do not indicate if the client had earlier asked for connection closure via libvirt API. This patch introduces messages to annotate when a client connected/disconnected.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/rpc/virnetserverclient.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c index 85857bc..a77feaa 100644 --- a/src/rpc/virnetserverclient.c +++ b/src/rpc/virnetserverclient.c @@ -678,14 +678,19 @@ int virNetServerClientGetTLSKeySize(virNetServerClientPtr client) return size; } #endif - +/* + * This mostly just needs to publish the client socket FD to logs. + * It does not necessarily need a lock, or will add lock contention problems. + * Replace the lock with a reference counting mechanism, to prevent the client + * object from being deallocated while this is being run + */ int virNetServerClientGetFD(virNetServerClientPtr client) { int fd = -1; - virObjectLock(client); + virObjectRef(client); if (client->sock) fd = virNetSocketGetFD(client->sock); - virObjectUnlock(client); + virObjectUnref(client);
This change is not justified in any way. Also looks wrong. You can't access an unlocked object.
return fd; }
@@ -965,7 +970,9 @@ void virNetServerClientClose(virNetServerClientPtr client) virKeepAlivePtr ka;
virObjectLock(client); - VIR_DEBUG("client=%p", client); + VIR_WARN("Free'ing up resources for client=%p sock=%d", client, + virNetServerClientGetFD(client));
NACK using warnings instead of debug messages is not desired. For debug purposes you should use debug logs.

I looked through what you were suggesting. I was assuming virNetSocketGetFD()would do a NULL check for the sock arg, and would return immediately if a different client executed a virNetServerClientClose() setting client->sock to null. Since this check is missing, I understand the implicit assumption is to always have virNetServerClientGetFD() lock the client, and consequently, I will rearrange debug messages around to prevent lock contention. Sending out a V2. On Wed, Mar 22, 2017 at 1:41 PM, Peter Krempa <pkrempa@redhat.com> wrote:
While tracing connections from a remote client, it helps to keep track of the connection lifecycle. Messages such as the following :
error : virNetSocketReadWire:1574 : End of file while reading data: Input/output error
are rather unhelpful. They do not indicate if the client had earlier asked for connection closure via libvirt API. This patch introduces messages to annotate when a client connected/disconnected.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/rpc/virnetserverclient.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c index 85857bc..a77feaa 100644 --- a/src/rpc/virnetserverclient.c +++ b/src/rpc/virnetserverclient.c @@ -678,14 +678,19 @@ int virNetServerClientGetTLSKeySize(virNetServerClientPtr client) return size; } #endif - +/* + * This mostly just needs to publish the client socket FD to logs. + * It does not necessarily need a lock, or will add lock contention
On Wed, Mar 22, 2017 at 01:02:17 -0700, Prerna Saxena wrote: problems.
+ * Replace the lock with a reference counting mechanism, to prevent the client + * object from being deallocated while this is being run + */ int virNetServerClientGetFD(virNetServerClientPtr client) { int fd = -1; - virObjectLock(client); + virObjectRef(client); if (client->sock) fd = virNetSocketGetFD(client->sock); - virObjectUnlock(client); + virObjectUnref(client);
This change is not justified in any way. Also looks wrong. You can't access an unlocked object.
return fd; }
@@ -965,7 +970,9 @@ void virNetServerClientClose(virNetServerClientPtr
client)
virKeepAlivePtr ka;
virObjectLock(client); - VIR_DEBUG("client=%p", client); + VIR_WARN("Free'ing up resources for client=%p sock=%d", client, + virNetServerClientGetFD(client));
NACK using warnings instead of debug messages is not desired. For debug purposes you should use debug logs.

Sample from current logs: error : virFileReadAll:1290 : Failed to open file '/sys/class/net/tap3/operstate': No such file or directory error : virNetDevGetLinkInfo:1895 : unable to read: /sys/class/net/tap3/operstate: No such file or directory These have no useful data point and are redundant. Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/util/virnetdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d123248..3e2f962 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1874,7 +1874,7 @@ virNetDevGetLinkInfo(const char *ifname, if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) goto cleanup; - if (virFileReadAll(path, 1024, &buf) < 0) { + if (virFileReadAllQuiet(path, 1024, &buf) < 0 && errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path); -- 1.8.1.2

On Wed, Mar 22, 2017 at 01:02:18 -0700, Prerna Saxena wrote:
Sample from current logs: error : virFileReadAll:1290 : Failed to open file '/sys/class/net/tap3/operstate': No such file or directory error : virNetDevGetLinkInfo:1895 : unable to read: /sys/class/net/tap3/operstate: No such file or directory
These have no useful data point and are redundant.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/util/virnetdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d123248..3e2f962 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1874,7 +1874,7 @@ virNetDevGetLinkInfo(const char *ifname, if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) goto cleanup;
- if (virFileReadAll(path, 1024, &buf) < 0) { + if (virFileReadAllQuiet(path, 1024, &buf) < 0 && errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path);
Remove this message here instead of switching to virFileReadAllQuiet. virFileReadAll reports messages according to the failure itself.

On Wed, Mar 22, 2017 at 09:14:41 +0100, Peter Krempa wrote:
On Wed, Mar 22, 2017 at 01:02:18 -0700, Prerna Saxena wrote:
Sample from current logs: error : virFileReadAll:1290 : Failed to open file '/sys/class/net/tap3/operstate': No such file or directory error : virNetDevGetLinkInfo:1895 : unable to read: /sys/class/net/tap3/operstate: No such file or directory
These have no useful data point and are redundant.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/util/virnetdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d123248..3e2f962 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1874,7 +1874,7 @@ virNetDevGetLinkInfo(const char *ifname, if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) goto cleanup;
- if (virFileReadAll(path, 1024, &buf) < 0) { + if (virFileReadAllQuiet(path, 1024, &buf) < 0 && errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path);
Remove this message here instead of switching to virFileReadAllQuiet. virFileReadAll reports messages according to the failure itself.
Hmm, So you want to avoid the error message altogether. So in that case you should rather call virFileAccess and also note what happens in an comment. Additionally since that would be a semantic change you need to mention it in the commit message and also make sure that all callers would ignore such failure. Otherwise you may cause a regression where we'd report an "unknown error" in cases where it was actually necessary. Since I found at least one code path that propagates the error (nodeDeviceGetXMLDesc) you should re-evaluate your approach.

I agree. Thanks for pointing out that this is a behavioural change, which should not happen. I should be doing something like this: if (virFileReadAllQuiet(path, 1024, &buf) < 0 ) { if (errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path); } goto cleanup; } On Wed, Mar 22, 2017 at 1:56 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Wed, Mar 22, 2017 at 09:14:41 +0100, Peter Krempa wrote:
On Wed, Mar 22, 2017 at 01:02:18 -0700, Prerna Saxena wrote:
Sample from current logs: error : virFileReadAll:1290 : Failed to open file '/sys/class/net/tap3/operstate': No such file or directory error : virNetDevGetLinkInfo:1895 : unable to read: /sys/class/net/tap3/operstate: No such file or directory
These have no useful data point and are redundant.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/util/virnetdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d123248..3e2f962 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1874,7 +1874,7 @@ virNetDevGetLinkInfo(const char *ifname, if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) goto cleanup;
- if (virFileReadAll(path, 1024, &buf) < 0) { + if (virFileReadAllQuiet(path, 1024, &buf) < 0 && errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path);
Remove this message here instead of switching to virFileReadAllQuiet. virFileReadAll reports messages according to the failure itself.
Hmm, So you want to avoid the error message altogether. So in that case you should rather call virFileAccess and also note what happens in an comment.
Additionally since that would be a semantic change you need to mention it in the commit message and also make sure that all callers would ignore such failure. Otherwise you may cause a regression where we'd report an "unknown error" in cases where it was actually necessary.
Since I found at least one code path that propagates the error (nodeDeviceGetXMLDesc) you should re-evaluate your approach.

On Wed, Mar 22, 2017 at 14:13:25 +0530, Prerna wrote:
I agree. Thanks for pointing out that this is a behavioural change, which should not happen.
Please don't top post on technical lists.
I should be doing something like this:
if (virFileReadAllQuiet(path, 1024, &buf) < 0 ) { if (errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path);
} goto cleanup; }
No, that still would break at least the one case I've pointed out below [1]. You need to make sure that callers are okay with such change. Or you need to find out the code path to the message you want to get rid of. But doing this chagne without regards to odther code paths is not the right approach.
On Wed, Mar 22, 2017 at 1:56 PM, Peter Krempa <pkrempa@redhat.com> wrote:
On Wed, Mar 22, 2017 at 09:14:41 +0100, Peter Krempa wrote:
On Wed, Mar 22, 2017 at 01:02:18 -0700, Prerna Saxena wrote:
Sample from current logs: error : virFileReadAll:1290 : Failed to open file '/sys/class/net/tap3/operstate': No such file or directory error : virNetDevGetLinkInfo:1895 : unable to read: /sys/class/net/tap3/operstate: No such file or directory
These have no useful data point and are redundant.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/util/virnetdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d123248..3e2f962 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1874,7 +1874,7 @@ virNetDevGetLinkInfo(const char *ifname, if (virNetDevSysfsFile(&path, ifname, "operstate") < 0) goto cleanup;
- if (virFileReadAll(path, 1024, &buf) < 0) { + if (virFileReadAllQuiet(path, 1024, &buf) < 0 && errno != ENOENT) { virReportSystemError(errno, _("unable to read: %s"), path);
Remove this message here instead of switching to virFileReadAllQuiet. virFileReadAll reports messages according to the failure itself.
Hmm, So you want to avoid the error message altogether. So in that case you should rather call virFileAccess and also note what happens in an comment.
Additionally since that would be a semantic change you need to mention it in the commit message and also make sure that all callers would ignore such failure. Otherwise you may cause a regression where we'd report an "unknown error" in cases where it was actually necessary.
[1]
Since I found at least one code path that propagates the error (nodeDeviceGetXMLDesc) you should re-evaluate your approach.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Current logs: error : qemuProcessFindDomainDiskByAlias:411 : internal error: no disk found with alias ide0-0-0 There is no way to find which VM was seeing this error. Makes debugging very hard, and the message itself is no good. Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/qemu/qemu_process.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ec0e36d..9f5bc3a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -366,8 +366,8 @@ qemuProcessFindDomainDiskByAlias(virDomainObjPtr vm, } virReportError(VIR_ERR_INTERNAL_ERROR, - _("no disk found with alias %s"), - alias); + _("VM %s: no disk found with alias %s"), + vm->def->name, alias); return NULL; } -- 1.8.1.2

On Wed, Mar 22, 2017 at 01:02:19 -0700, Prerna Saxena wrote:
Current logs: error : qemuProcessFindDomainDiskByAlias:411 : internal error: no disk found with alias ide0-0-0
There is no way to find which VM was seeing this error. Makes debugging very hard, and the message itself is no good.
Signed-off-by: Prerna Saxena <saxenap.ltc@gmail.com> --- src/qemu/qemu_process.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ec0e36d..9f5bc3a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -366,8 +366,8 @@ qemuProcessFindDomainDiskByAlias(virDomainObjPtr vm, }
virReportError(VIR_ERR_INTERNAL_ERROR, - _("no disk found with alias %s"), - alias); + _("VM %s: no disk found with alias %s"), + vm->def->name, alias);
I think a better option is to enable debug logs anyways. This message won't really help you debug the issue if you don't get the additional data from where the problem happened.
participants (3)
-
Peter Krempa
-
Prerna
-
Prerna Saxena