[libvirt] [PATCHv2] Don't log an internal error when the guest hasn't updated balloon stats

If virDomainMemoryStats is called too soon after domain startup,
QEMU returns:
"error":{"class":"GenericError","desc":"guest hasn't updated any stats yet"}
when we try to query balloon stats.

Check for this reply and log it as OPERATION_INVALID instead of
INTERNAL_ERROR. This means the daemon only logs it at the debug
level, without polluting system logs.

Reported by Laszlo Pal:
https://www.redhat.com/archives/libvirt-users/2014-May/msg00023.html
---
v1: https://www.redhat.com/archives/libvir-list/2014-May/msg00420.html
v2: return 0 in this case - even though balloon stats are not yet
    available, we can still return 'rss' in qemuDomainMemoryStats
    jump to cleanup if CheckError returns < 0

 src/qemu/qemu_monitor_json.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index f8ab975..914f3ef 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -1465,12 +1465,22 @@ int qemuMonitorJSONGetMemoryStats(qemuMonitorPtr mon,
                                                      NULL)))
         goto cleanup;

-    ret = qemuMonitorJSONCommand(mon, cmd, &reply);
+    if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0)
+        goto cleanup;

-    if (ret == 0)
-        ret = qemuMonitorJSONCheckError(cmd, reply);
+    if ((data = virJSONValueObjectGet(reply, "error"))) {
+        const char *klass = virJSONValueObjectGetString(data, "class");
+        const char *desc = virJSONValueObjectGetString(data, "desc");

-    if (ret < 0)
+        if (STREQ_NULLABLE(klass, "GenericError") &&
+            STREQ_NULLABLE(desc, "guest hasn't updated any stats yet")) {
+            virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                           _("the guest hasn't updated any stats yet"));
+            goto cleanup;
+        }
+    }
+
+    if ((ret = qemuMonitorJSONCheckError(cmd, reply)) < 0)
         goto cleanup;

     if (!(data = virJSONValueObjectGet(reply, "return"))) {
--
1.8.3.2
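For readers without the libvirt tree at hand, the decision made by this patch can be sketched as a standalone C fragment. This is only an illustration under stated assumptions: `streq_nullable`, `classify_reply`, `balloon_stat_count` and the `reply_kind` enum are hypothetical names that do not exist in libvirt, and the reply is reduced to its already-extracted "class" and "desc" strings instead of a JSON object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NULL-safe string equality, written in the spirit of STREQ_NULLABLE
 * (hypothetical helper, not the libvirt macro itself). */
static bool streq_nullable(const char *a, const char *b)
{
    if (!a || !b)
        return a == b;          /* equal only if both are NULL */
    return strcmp(a, b) == 0;
}

/* Hypothetical classification of a QMP reply. */
enum reply_kind {
    REPLY_OK,            /* reply carried no "error" member */
    REPLY_NO_STATS_YET,  /* benign: guest has not reported stats yet */
    REPLY_ERROR          /* any other error */
};

/* has_error: whether the reply contained an "error" object.
 * klass/desc: its "class" and "desc" strings, NULL when absent. */
static enum reply_kind classify_reply(bool has_error,
                                      const char *klass,
                                      const char *desc)
{
    if (!has_error)
        return REPLY_OK;
    if (streq_nullable(klass, "GenericError") &&
        streq_nullable(desc, "guest hasn't updated any stats yet"))
        return REPLY_NO_STATS_YET;
    return REPLY_ERROR;
}

/* Mirrors the v2 behaviour described above: "no stats yet" yields 0
 * balloon stats rather than a failure, so a caller can still append
 * values it obtains elsewhere (such as 'rss'). */
static int balloon_stat_count(enum reply_kind kind, int nstats_in_reply)
{
    switch (kind) {
    case REPLY_OK:
        return nstats_in_reply;
    case REPLY_NO_STATS_YET:
        return 0;
    default:
        return -1;
    }
}
```

The point of separating classification from the return value is that only the exact class/desc pair is downgraded; every other error still takes the normal failure path.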

On 05/15/2014 01:22 AM, Ján Tomko wrote:
> If virDomainMemoryStats is called too soon after domain startup,
> QEMU returns:
> "error":{"class":"GenericError","desc":"guest hasn't updated any stats yet"}
> when we try to query balloon stats.
>
> Check for this reply and log it as OPERATION_INVALID instead of
> INTERNAL_ERROR. This means the daemon only logs it at the debug
> level, without polluting system logs.
>
> Reported by Laszlo Pal:
> https://www.redhat.com/archives/libvirt-users/2014-May/msg00023.html
> ---
> v1: https://www.redhat.com/archives/libvir-list/2014-May/msg00420.html
> v2: return 0 in this case - even though balloon stats are not yet
>     available, we can still return 'rss' in qemuDomainMemoryStats
>     jump to cleanup if CheckError returns < 0
>
>  src/qemu/qemu_monitor_json.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> +    if ((data = virJSONValueObjectGet(reply, "error"))) {
> +        const char *klass = virJSONValueObjectGetString(data, "class");
> +        const char *desc = virJSONValueObjectGetString(data, "desc");
>
> -    if (ret < 0)
> +        if (STREQ_NULLABLE(klass, "GenericError") &&
> +            STREQ_NULLABLE(desc, "guest hasn't updated any stats yet")) {

Adding qemu.

Uggh - the qemu documentation of QMP states:

- The "desc" member is a human-readable error message. Clients should
  not attempt to parse this message.

because the contents of that field are NOT guaranteed to be stable.
We're stuck parsing that field for old versions of qemu, but this is
one case where upstream qemu (for future versions) should change the
"class" member of that particular error case to a distinct value other
than GenericError so that it is trivially obvious when this particular
condition has occurred, since it is a case where libvirt wants to treat
it as a non-error.

reluctant ACK, while hoping that we can do something more reliable in
the future.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
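The concern about parsing "desc" can be illustrated with a small C sketch of a client that prefers a dedicated error class and only falls back to matching the description for older QEMU. Everything here is an assumption for illustration: the class name "StatsNotReady" is invented (nothing in this thread says QEMU defines such a class), and `is_no_stats_yet` is a hypothetical helper, not a libvirt function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class name, invented for this sketch only. */
#define NO_STATS_CLASS "StatsNotReady"

/* Returns true when the error describes the benign "no stats yet"
 * condition. klass and desc may be NULL if the members are missing. */
static bool is_no_stats_yet(const char *klass, const char *desc)
{
    if (!klass)
        return false;

    /* Preferred path: a distinct class identifies the condition
     * without looking at the human-readable text at all. */
    if (strcmp(klass, NO_STATS_CLASS) == 0)
        return true;

    /* Fallback for QEMU versions that only report GenericError:
     * exact match on "desc", which QMP does not promise to keep
     * stable, so this branch is fragile by design. */
    return strcmp(klass, "GenericError") == 0 &&
           desc != NULL &&
           strcmp(desc, "guest hasn't updated any stats yet") == 0;
}
```

With a distinct class, a reworded description would no longer silently turn the benign case back into a logged internal error; the fallback branch keeps exactly that weakness.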

Copying Luiz...

Eric Blake <eblake@redhat.com> writes:

> On 05/15/2014 01:22 AM, Ján Tomko wrote:
>> If virDomainMemoryStats is called too soon after domain startup,
>> QEMU returns:
>> "error":{"class":"GenericError","desc":"guest hasn't updated any stats yet"}
>> when we try to query balloon stats.
>>
>> Check for this reply and log it as OPERATION_INVALID instead of
>> INTERNAL_ERROR. This means the daemon only logs it at the debug
>> level, without polluting system logs.
>>
>> Reported by Laszlo Pal:
>> https://www.redhat.com/archives/libvirt-users/2014-May/msg00023.html
>> ---
>> v1: https://www.redhat.com/archives/libvir-list/2014-May/msg00420.html
>> v2: return 0 in this case - even though balloon stats are not yet
>>     available, we can still return 'rss' in qemuDomainMemoryStats
>>     jump to cleanup if CheckError returns < 0
>>
>>  src/qemu/qemu_monitor_json.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> +    if ((data = virJSONValueObjectGet(reply, "error"))) {
>> +        const char *klass = virJSONValueObjectGetString(data, "class");
>> +        const char *desc = virJSONValueObjectGetString(data, "desc");
>>
>> -    if (ret < 0)
>> +        if (STREQ_NULLABLE(klass, "GenericError") &&
>> +            STREQ_NULLABLE(desc, "guest hasn't updated any stats yet")) {

You snipped so much of the diff that I have trouble finding the place
this applies.

> Adding qemu.
>
> Uggh - the qemu documentation of QMP states:
>
> - The "desc" member is a human-readable error message. Clients should
>   not attempt to parse this message.
>
> because the contents of that field are NOT guaranteed to be stable.
> We're stuck parsing that field for old versions of qemu, but this is
> one case where upstream qemu (for future versions) should change the
> "class" member of that particular error case to a distinct value other
> than GenericError so that it is trivially obvious when this particular
> condition has occurred, since it is a case where libvirt wants to treat
> it as a non-error.
>
> reluctant ACK, while hoping that we can do something more reliable in
> the future.

Is "no stats yet" really an error? Libvirt has done nothing wrong, and
I'd argue the guest hasn't done anything wrong, either. Should we
simply return an empty result? Like "cat" on a file that hasn't gotten
its data, yet.

On 05/15/2014 11:59 PM, Markus Armbruster wrote:
> Copying Luiz...
>
>>>  src/qemu/qemu_monitor_json.c | 18 ++++++++++++++----
>>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>>
>>> +    if ((data = virJSONValueObjectGet(reply, "error"))) {
>>> +        const char *klass = virJSONValueObjectGetString(data, "class");
>>> +        const char *desc = virJSONValueObjectGetString(data, "desc");
>>>
>>> -    if (ret < 0)
>>> +        if (STREQ_NULLABLE(klass, "GenericError") &&
>>> +            STREQ_NULLABLE(desc, "guest hasn't updated any stats yet")) {
>
> You snipped so much of the diff that I have trouble finding the place
> this applies.

Apologies; it is for qemuMonitorJSONGetMemoryStats() calling qom-get:
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_monitor_json.c;...

> Is "no stats yet" really an error? Libvirt has done nothing wrong, and
> I'd argue the guest hasn't done anything wrong, either. Should we
> simply return an empty result? Like "cat" on a file that hasn't gotten
> its data, yet.

Yes, that would be reasonable.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, 16 May 2014 00:11:24 -0600 Eric Blake <eblake@redhat.com> wrote:

>> Is "no stats yet" really an error?

This is a special case where the guest hasn't ever filled QEMU with
balloon stats. There are two possible cases. Either the guest hasn't
done it yet, but will do in the future or the guest will never do it
(eg. the guest doesn't support balloon, the guest crashed, etc).

>> Libvirt has done nothing wrong, and I'd argue the guest hasn't done
>> anything wrong, either. Should we simply return an empty result?
>> Like "cat" on a file that hasn't gotten its data, yet.
>
> Yes, that would be reasonable.

I'm fine with the two possible solutions here: adding a new TryAgain
error class or returning an "empty" result.

I say "empty" because those fields are not optionals, so we'll have to
fill them with some value. Shouldn't be a problem for most fields, as
the spec (docs/virtio-balloon-stats.txt) already defines that stats
that the guest doesn't report are returned as -1. The only exception
here is the last-update field, which can't hold a negative iirc. The
only choice is to return 0 there. I guess that this shouldn't be a
problem either.

Who volunteers to fix this?
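The "empty" result proposed above can be sketched in C as follows. This is a minimal illustration, not QEMU code: the struct, its field names and both helpers are hypothetical, chosen only to show unreported stats filled with -1 and the last-update field set to 0 because it cannot hold a negative value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of a balloon stats reply; the field
 * names are illustrative and not taken from QEMU sources. */
struct balloon_stats {
    int64_t swap_in;
    int64_t swap_out;
    int64_t major_faults;
    int64_t minor_faults;
    int64_t free_memory;
    int64_t total_memory;
    uint64_t last_update;   /* unsigned, so -1 is not an option here */
};

/* Build the "empty" result: every stat the guest has not reported is
 * -1, and last_update is 0 since no update has ever happened. */
static struct balloon_stats balloon_stats_empty(void)
{
    struct balloon_stats s = {
        .swap_in = -1,
        .swap_out = -1,
        .major_faults = -1,
        .minor_faults = -1,
        .free_memory = -1,
        .total_memory = -1,
        .last_update = 0,
    };
    return s;
}

/* A client-side check under the same convention: -1 means the guest
 * did not report this stat, so it should be skipped, not displayed. */
static bool stat_is_reported(int64_t value)
{
    return value != -1;
}
```

Under this convention a client needs no special error path: it walks the fields, skips those equal to -1, and ends up with nothing to report from the balloon.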

On 05/16/2014 03:13 PM, Luiz Capitulino wrote:
> On Fri, 16 May 2014 00:11:24 -0600 Eric Blake <eblake@redhat.com> wrote:
>
>>> Is "no stats yet" really an error?
>
> This is a special case where the guest hasn't ever filled QEMU with
> balloon stats. There are two possible cases. Either the guest hasn't
> done it yet, but will do in the future or the guest will never do it
> (eg. the guest doesn't support balloon, the guest crashed, etc).
>
>>> Libvirt has done nothing wrong, and I'd argue the guest hasn't done
>>> anything wrong, either. Should we simply return an empty result?
>>> Like "cat" on a file that hasn't gotten its data, yet.
>>
>> Yes, that would be reasonable.
>
> I'm fine with the two possible solutions here: adding a new TryAgain
> error class or returning an "empty" result.
>
> I say "empty" because those fields are not optionals, so we'll have to
> fill them with some value. Shouldn't be a problem for most fields, as
> the spec (docs/virtio-balloon-stats.txt) already defines that stats
> that the guest doesn't report are returned as -1. The only exception
> here is the last-update field, which can't hold a negative iirc. The
> only choice is to return 0 there. I guess that this shouldn't be a
> problem either.
>
> Who volunteers to fix this?

I've tried:
http://marc.info/?l=qemu-devel&m=140048179520115&w=2

Jan

On 05/15/2014 11:19 PM, Eric Blake wrote:
> On 05/15/2014 01:22 AM, Ján Tomko wrote:
>> If virDomainMemoryStats is called too soon after domain startup,
>> QEMU returns:
>> "error":{"class":"GenericError","desc":"guest hasn't updated any stats yet"}
>> when we try to query balloon stats.
>>
>> Check for this reply and log it as OPERATION_INVALID instead of
>> INTERNAL_ERROR. This means the daemon only logs it at the debug
>> level, without polluting system logs.
>>
>> Reported by Laszlo Pal:
>> https://www.redhat.com/archives/libvirt-users/2014-May/msg00023.html
>> ---
>> v1: https://www.redhat.com/archives/libvir-list/2014-May/msg00420.html
>> v2: return 0 in this case - even though balloon stats are not yet
>>     available, we can still return 'rss' in qemuDomainMemoryStats
>>     jump to cleanup if CheckError returns < 0
>>
>>  src/qemu/qemu_monitor_json.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> +    if ((data = virJSONValueObjectGet(reply, "error"))) {
>> +        const char *klass = virJSONValueObjectGetString(data, "class");
>> +        const char *desc = virJSONValueObjectGetString(data, "desc");
>>
>> -    if (ret < 0)
>> +        if (STREQ_NULLABLE(klass, "GenericError") &&
>> +            STREQ_NULLABLE(desc, "guest hasn't updated any stats yet")) {
>
> Adding qemu.
>
> Uggh - the qemu documentation of QMP states:
>
> - The "desc" member is a human-readable error message. Clients should
>   not attempt to parse this message.
>
> because the contents of that field are NOT guaranteed to be stable.
> We're stuck parsing that field for old versions of qemu, but this is
> one case where upstream qemu (for future versions) should change the
> "class" member of that particular error case to a distinct value other
> than GenericError so that it is trivially obvious when this particular
> condition has occurred, since it is a case where libvirt wants to treat
> it as a non-error.
>
> reluctant ACK, while hoping that we can do something more reliable in
> the future.

I have pushed the patch now.

The qemu patch reporting empty stats instead of this error should be on
its way:
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg04295.html

Jan

participants (4)
- Eric Blake
- Ján Tomko
- Luiz Capitulino
- Markus Armbruster