[PATCH v2 00/15] qemu: Store I/O error messages for disks

v2: - ACK'd patches were pushed - the QEMU 'qom-path' regression is no longer mentioned - it's worker around by prefering lookup via node-name - the patch is justified without the need to mention it - 'qom-path' is now properly used for lookup of the disk - I/O error messages are now stored in virStorageSource - the messages can be viewed via virDomainGetMessages() - various refactors to make the addition clean added - news added Peter Krempa (15): qemu: Handle quirks of 'device' field of BLOCK_IO_ERROR event in monitor code qemu: Rename 'diskAlias' to 'device' in qemu IO error event handling qemuProcessHandleIOError: Rename local variables qemuMonitorJSONHandleIOError: Do not munge 'reason' field of IO error event qemuProcessHandleIOError: Prefer lookup by node name qemuMonitorJSONHandleIOError: Propagate new 'qom-path' field virStorageSource: Add fields for storing last I/O error message qemuProcessHandleIOError: Populate I/O error reason to virStorageSource qemuProcessHandleIOError: Log IO errors in the VM log file libxlDomainGetMessages: Add existing flags to 'virCheckFlags' virDomainObjGetMessages: Refactor using GPtrArray virDomainGetMessages: Introduce VIR_DOMAIN_MESSAGE_IOERRORS include: libvirt-domain: Reword documentation for @reason of VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON include: libvirt-domain: Add 'hypervisor-message' @reason of VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON NEWS: Mention preserving I/O error messages for qemu VMs NEWS.rst | 6 +++ include/libvirt/libvirt-domain.h | 19 ++++++-- src/conf/domain_conf.c | 80 +++++++++++++++++++++----------- src/conf/domain_conf.h | 9 +++- src/conf/storage_source_conf.c | 5 ++ src/conf/storage_source_conf.h | 6 +++ src/libvirt_private.syms | 1 + src/libxl/libxl_driver.c | 13 +++++- src/qemu/qemu_driver.c | 24 +++++++++- src/qemu/qemu_monitor.c | 6 ++- src/qemu/qemu_monitor.h | 8 +++- src/qemu/qemu_monitor_json.c | 23 ++++++--- src/qemu/qemu_process.c | 61 +++++++++++++++--------- src/test/test_driver.c | 12 ++++- 14 files changed, 202 insertions(+), 71 deletions(-) -- 2.48.1

BLOCK_IO_ERROR's 'device' field is an empty string in case when it isn't applicable as it was originally mandatory in the qemu API docs. Move the logic that convert's empty string back to NULL from 'qemuProcessHandleIOError()' to 'qemuMonitorJSONHandleIOError()' This also fixes a hypothetical NULL-dereference if qemu would indeed report an IO error without the 'device' field present. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor_json.c | 9 ++++++++- src/qemu/qemu_process.c | 3 --- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 9f417d27c6..5ca76d2d80 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -708,8 +708,15 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) action = "ignore"; } - if ((device = virJSONValueObjectGetString(data, "device")) == NULL) + if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in disk io error event"); + } else { + /* 'device' was documented as mandatory in the qemu event, but later became + * optional, in which case an empty string is sent by qemu. Convert it back + * to NULL */ + if (*device == '\0') + device = NULL; + } nodename = virJSONValueObjectGetString(data, "node-name"); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 34a755a49a..edcd8ac3a8 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -840,9 +840,6 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virObjectLock(vm); priv = QEMU_DOMAIN_PRIVATE(vm); - if (*diskAlias == '\0') - diskAlias = NULL; - if (diskAlias) disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL); else if (nodename) -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:05PM +0100, Peter Krempa wrote:
BLOCK_IO_ERROR's 'device' field is an empty string in case when it isn't applicable as it was originally mandatory in the qemu API docs.
Move the logic that convert's empty string back to NULL from 'qemuProcessHandleIOError()' to 'qemuMonitorJSONHandleIOError()'
This also fixes a hypothetical NULL-dereference if qemu would indeed report an IO error without the 'device' field present.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor_json.c | 9 ++++++++- src/qemu/qemu_process.c | 3 --- 2 files changed, 8 insertions(+), 4 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 9f417d27c6..5ca76d2d80 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -708,8 +708,15 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) action = "ignore"; }
- if ((device = virJSONValueObjectGetString(data, "device")) == NULL) + if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in disk io error event"); + } else { + /* 'device' was documented as mandatory in the qemu event, but later became + * optional, in which case an empty string is sent by qemu. Convert it back + * to NULL */ + if (*device == '\0') + device = NULL;
Oh, I'm surprised that QEMU doesn't send a JSON "null" in this case already.
+ }
nodename = virJSONValueObjectGetString(data, "node-name");
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 34a755a49a..edcd8ac3a8 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -840,9 +840,6 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virObjectLock(vm); priv = QEMU_DOMAIN_PRIVATE(vm);
- if (*diskAlias == '\0') - diskAlias = NULL; - if (diskAlias) disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL); else if (nodename) -- 2.48.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Jan 30, 2025 at 11:10:18 +0000, Daniel P. Berrangé wrote:
On Tue, Jan 28, 2025 at 05:28:05PM +0100, Peter Krempa wrote:
BLOCK_IO_ERROR's 'device' field is an empty string in case when it isn't applicable as it was originally mandatory in the qemu API docs.
Move the logic that convert's empty string back to NULL from 'qemuProcessHandleIOError()' to 'qemuMonitorJSONHandleIOError()'
This also fixes a hypothetical NULL-dereference if qemu would indeed report an IO error without the 'device' field present.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor_json.c | 9 ++++++++- src/qemu/qemu_process.c | 3 --- 2 files changed, 8 insertions(+), 4 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 9f417d27c6..5ca76d2d80 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -708,8 +708,15 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) action = "ignore"; }
- if ((device = virJSONValueObjectGetString(data, "device")) == NULL) + if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in disk io error event"); + } else { + /* 'device' was documented as mandatory in the qemu event, but later became + * optional, in which case an empty string is sent by qemu. Convert it back + * to NULL */ + if (*device == '\0') + device = NULL;
Oh, I'm surprised that QEMU doesn't send a JSON "null" in this case already.
The idea of sending empty string is to prevent API breakage as the 'device' field was documented to be mandatory and a string when BLOCK_IO_ERROR was initially added. Switching to JSON 'null' would be almost equivalent to just skipping the field (as virJSONValueObjectGetString would return NULL) at least for the above code in libvirt.

The field is named 'device' in the event so unify our naming. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 4 ++-- src/qemu/qemu_monitor.h | 4 ++-- src/qemu/qemu_process.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index e0b1bf1d37..1227298a45 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1127,7 +1127,7 @@ qemuMonitorEmitWatchdog(qemuMonitor *mon, int action) void qemuMonitorEmitIOError(qemuMonitor *mon, - const char *diskAlias, + const char *device, const char *nodename, int action, const char *reason) @@ -1135,7 +1135,7 @@ qemuMonitorEmitIOError(qemuMonitor *mon, VIR_DEBUG("mon=%p", mon); QEMU_MONITOR_CALLBACK(mon, domainIOError, mon->vm, - diskAlias, nodename, action, reason); + device, nodename, action, reason); } diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index c74892c4dc..3ec8e7227f 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -206,7 +206,7 @@ typedef void (*qemuMonitorDomainWatchdogCallback)(qemuMonitor *mon, int action); typedef void (*qemuMonitorDomainIOErrorCallback)(qemuMonitor *mon, virDomainObj *vm, - const char *diskAlias, + const char *device, const char *nodename, int action, const char *reason); @@ -450,7 +450,7 @@ void qemuMonitorEmitResume(qemuMonitor *mon); void qemuMonitorEmitRTCChange(qemuMonitor *mon, long long offset); void qemuMonitorEmitWatchdog(qemuMonitor *mon, int action); void qemuMonitorEmitIOError(qemuMonitor *mon, - const char *diskAlias, + const char *device, const char *nodename, int action, const char *reason); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index edcd8ac3a8..c2a5361b79 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -824,7 +824,7 @@ qemuProcessHandleWatchdog(qemuMonitor *mon G_GNUC_UNUSED, static void qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virDomainObj *vm, - const char *diskAlias, + const char *device, const char *nodename, int action, const char *reason) @@ -840,8 +840,8 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virObjectLock(vm); priv = QEMU_DOMAIN_PRIVATE(vm); - if (diskAlias) - disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL); + if (device) + disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, device, NULL); else if (nodename) disk = qemuDomainDiskLookupByNodename(vm->def, NULL, nodename, NULL); else -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:06PM +0100, Peter Krempa wrote:
The field is named 'device' in the event so unify our naming.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 4 ++-- src/qemu/qemu_monitor.h | 4 ++-- src/qemu/qemu_process.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Prefix the helper variables used to supply data to the event by 'event'. Declare them with the default value of an empty string rather than doing it later. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c2a5361b79..aaf718e552 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -833,8 +833,8 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virObjectEvent *ioErrorEvent = NULL; virObjectEvent *ioErrorEvent2 = NULL; virObjectEvent *lifecycleEvent = NULL; - const char *srcPath; - const char *devAlias; + const char *eventPath = ""; + const char *eventAlias = ""; virDomainDiskDef *disk; virObjectLock(vm); @@ -848,15 +848,12 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, disk = NULL; if (disk) { - srcPath = virDomainDiskGetSource(disk); - devAlias = disk->info.alias; - } else { - srcPath = ""; - devAlias = ""; + eventPath = virDomainDiskGetSource(disk); + eventAlias = disk->info.alias; } - ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, srcPath, devAlias, action); - ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, srcPath, devAlias, action, reason); + ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, eventPath, eventAlias, action); + ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, reason); if (action == VIR_DOMAIN_EVENT_IO_ERROR_PAUSE && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:07PM +0100, Peter Krempa wrote:
Prefix the helper variables used to supply data to the event by 'event'. Declare them with the default value of an empty string rather than doing it later.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Leave the interpretation of the event to 'qemuProcessHandleIOError()' which will create it's own variant of the messages for the user-facing libvirt events. qemuMonitorJSONHandleIOError() will pass through the raw data it got from qemu. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 3 ++- src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 12 ++++++------ src/qemu/qemu_process.c | 9 +++++++-- 4 files changed, 17 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 1227298a45..cae7dc6bf5 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1130,12 +1130,13 @@ qemuMonitorEmitIOError(qemuMonitor *mon, const char *device, const char *nodename, int action, + bool nospace, const char *reason) { VIR_DEBUG("mon=%p", mon); QEMU_MONITOR_CALLBACK(mon, domainIOError, mon->vm, - device, nodename, action, reason); + device, nodename, action, nospace, reason); } diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 3ec8e7227f..c1aa36fe83 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -209,6 +209,7 @@ typedef void (*qemuMonitorDomainIOErrorCallback)(qemuMonitor *mon, const char *device, const char *nodename, int action, + bool nospace, const char *reason); typedef void (*qemuMonitorDomainGraphicsCallback)(qemuMonitor *mon, virDomainObj *vm, @@ -453,6 +454,7 @@ void qemuMonitorEmitIOError(qemuMonitor *mon, const char *device, const char *nodename, int action, + bool nospace, const char *reason); void qemuMonitorEmitGraphics(qemuMonitor *mon, int phase, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 5ca76d2d80..db232cee78 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -695,8 +695,8 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) const char *device; const char *nodename; const char *action; - const char *reason = ""; - bool nospc = false; + const char *reason; + bool nospace = false; int actionID; /* Throughout here we try our best to carry on upon errors, @@ -719,16 +719,16 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) } nodename = virJSONValueObjectGetString(data, "node-name"); - - if (virJSONValueObjectGetBoolean(data, "nospace", &nospc) == 0 && nospc) - reason = "enospc"; + reason = virJSONValueObjectGetString(data, "reason"); + /* 'nospace' flag is relevant only when true */ + ignore_value(virJSONValueObjectGetBoolean(data, "nospace", &nospace)); if ((actionID = qemuMonitorIOErrorActionTypeFromString(action)) < 0) { VIR_WARN("unknown disk io error action '%s'", action); actionID = VIR_DOMAIN_EVENT_IO_ERROR_NONE; } - qemuMonitorEmitIOError(mon, device, nodename, actionID, reason); + qemuMonitorEmitIOError(mon, device, nodename, actionID, nospace, reason); } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index aaf718e552..80a73643f0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -827,7 +827,8 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, const char *device, const char *nodename, int action, - const char *reason) + bool nospace, + const char *reason G_GNUC_UNUSED) { qemuDomainObjPrivate *priv; virObjectEvent *ioErrorEvent = NULL; @@ -835,6 +836,7 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virObjectEvent *lifecycleEvent = NULL; const char *eventPath = ""; const char *eventAlias = ""; + const char *eventReason = ""; virDomainDiskDef *disk; virObjectLock(vm); @@ -852,8 +854,11 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, eventAlias = disk->info.alias; } + if (nospace) + eventReason = "enospc"; + ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, eventPath, eventAlias, action); - ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, reason); + ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, eventReason); if (action == VIR_DOMAIN_EVENT_IO_ERROR_PAUSE && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:08PM +0100, Peter Krempa wrote:
Leave the interpretation of the event to 'qemuProcessHandleIOError()' which will create it's own variant of the messages for the user-facing libvirt events. qemuMonitorJSONHandleIOError() will pass through the raw data it got from qemu.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 3 ++- src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 12 ++++++------ src/qemu/qemu_process.c | 9 +++++++-- 4 files changed, 17 insertions(+), 9 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

When qemu reports a node name for an I/O error we should prefer the lookup by node name instead as it gives us the path to the specific image which caused the error instead of the top level image path. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 80a73643f0..1849f68634 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -837,22 +837,26 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, const char *eventPath = ""; const char *eventAlias = ""; const char *eventReason = ""; - virDomainDiskDef *disk; + virDomainDiskDef *disk = NULL; + virStorageSource *src = NULL; virObjectLock(vm); priv = QEMU_DOMAIN_PRIVATE(vm); - if (device) + if (nodename) + disk = qemuDomainDiskLookupByNodename(vm->def, priv->backup, nodename, &src); + + if (!disk) disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, device, NULL); - else if (nodename) - disk = qemuDomainDiskLookupByNodename(vm->def, NULL, nodename, NULL); - else - disk = NULL; - if (disk) { - eventPath = virDomainDiskGetSource(disk); + if (!src && disk) + src = disk->src; + + if (disk) eventAlias = disk->info.alias; - } + + if (src && src->path) + eventPath = src->path; if (nospace) eventReason = "enospc"; -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:09PM +0100, Peter Krempa wrote:
When qemu reports a node name for an I/O error we should prefer the lookup by node name instead as it gives us the path to the specific image which caused the error instead of the top level image path.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

QEMU commit v9.1.0-1065-ge67b7aef7c added 'qom-path' as an optional field for the BLOCK_IO_ERROR event. Extract and propagate it as an alternative to lookup via 'node-name' and 'device' (alias). Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 3 ++- src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 4 +++- src/qemu/qemu_process.c | 3 ++- 4 files changed, 9 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index cae7dc6bf5..830ecbad1c 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1128,6 +1128,7 @@ qemuMonitorEmitWatchdog(qemuMonitor *mon, int action) void qemuMonitorEmitIOError(qemuMonitor *mon, const char *device, + const char *qompath, const char *nodename, int action, bool nospace, @@ -1136,7 +1137,7 @@ qemuMonitorEmitIOError(qemuMonitor *mon, VIR_DEBUG("mon=%p", mon); QEMU_MONITOR_CALLBACK(mon, domainIOError, mon->vm, - device, nodename, action, nospace, reason); + device, qompath, nodename, action, nospace, reason); } diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index c1aa36fe83..072f452e79 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -207,6 +207,7 @@ typedef void (*qemuMonitorDomainWatchdogCallback)(qemuMonitor *mon, typedef void (*qemuMonitorDomainIOErrorCallback)(qemuMonitor *mon, virDomainObj *vm, const char *device, + const char *qompath, const char *nodename, int action, bool nospace, @@ -452,6 +453,7 @@ void qemuMonitorEmitRTCChange(qemuMonitor *mon, long long offset); void qemuMonitorEmitWatchdog(qemuMonitor *mon, int action); void qemuMonitorEmitIOError(qemuMonitor *mon, const char *device, + const char *qompath, const char *nodename, int action, bool nospace, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index db232cee78..890d9b7dfd 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -693,6 +693,7 @@ static void qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) { const char *device; + const char *qompath; const char *nodename; const char *action; const char *reason; @@ -718,6 +719,7 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) device = NULL; } + qompath = virJSONValueObjectGetString(data, "qom-path"); nodename = virJSONValueObjectGetString(data, "node-name"); reason = virJSONValueObjectGetString(data, "reason"); /* 'nospace' flag is relevant only when true */ @@ -728,7 +730,7 @@ qemuMonitorJSONHandleIOError(qemuMonitor *mon, virJSONValue *data) actionID = VIR_DOMAIN_EVENT_IO_ERROR_NONE; } - qemuMonitorEmitIOError(mon, device, nodename, actionID, nospace, reason); + qemuMonitorEmitIOError(mon, device, qompath, nodename, actionID, nospace, reason); } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1849f68634..4497d089d0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -825,6 +825,7 @@ static void qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, virDomainObj *vm, const char *device, + const char *qompath, const char *nodename, int action, bool nospace, @@ -847,7 +848,7 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, disk = qemuDomainDiskLookupByNodename(vm->def, priv->backup, nodename, &src); if (!disk) - disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, device, NULL); + disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, device, qompath); if (!src && disk) src = disk->src; -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:10PM +0100, Peter Krempa wrote:
QEMU commit v9.1.0-1065-ge67b7aef7c added 'qom-path' as an optional field for the BLOCK_IO_ERROR event. Extract and propagate it as an alternative to lookup via 'node-name' and 'device' (alias).
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 3 ++- src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 4 +++- src/qemu/qemu_process.c | 3 ++- 4 files changed, 9 insertions(+), 3 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Hypervisors may report a I/O error message (unstable; for human use) to libvirt. In order to store it with the appropriate virStorageSource so that it can be later queried we need to add fields to virStorageSource to store the timestamp and message. The message is deliberately not copied via virStorageSourceCopy. The messages are also not serialized to the status XML as losing them on a daemon restart as they're likely to be stale anyways. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/storage_source_conf.c | 5 +++++ src/conf/storage_source_conf.h | 6 ++++++ 2 files changed, 11 insertions(+) diff --git a/src/conf/storage_source_conf.c b/src/conf/storage_source_conf.c index 2b658dd485..ca956a1b7c 100644 --- a/src/conf/storage_source_conf.c +++ b/src/conf/storage_source_conf.c @@ -917,6 +917,8 @@ virStorageSourceCopy(const virStorageSource *src, def->nfs_uid = src->nfs_uid; def->nfs_gid = src->nfs_gid; + /* 'ioerror_timestamp' and 'ioerror_message' are deliberately not copied */ + return g_steal_pointer(&def); } @@ -1203,6 +1205,9 @@ virStorageSourceClear(virStorageSource *def) g_clear_pointer(&def->fdtuple, g_object_unref); + VIR_FREE(def->ioerror_timestamp); + VIR_FREE(def->ioerror_message); + /* clear everything except the class header as the object APIs * will break otherwise */ memset((char *) def + sizeof(def->parent), 0, diff --git a/src/conf/storage_source_conf.h b/src/conf/storage_source_conf.h index 9463722518..e6cbb93c06 100644 --- a/src/conf/storage_source_conf.h +++ b/src/conf/storage_source_conf.h @@ -445,6 +445,12 @@ struct _virStorageSource { * to do this decision. */ bool seclabelSkipRemember; + + /* Last instance of hypervisor originated I/O error message and timestamp. + * The error stored here is not designed to be stable. The message + * is also not copied via virStorageSourceCopy */ + char *ioerror_timestamp; + char *ioerror_message; }; G_DEFINE_AUTOPTR_CLEANUP_FUNC(virStorageSource, virObjectUnref); -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:11PM +0100, Peter Krempa wrote:
Hypervisors may report a I/O error message (unstable; for human use) to libvirt. In order to store it with the appropriate virStorageSource so that it can be later queried we need to add fields to virStorageSource to store the timestamp and message.
The message is deliberately not copied via virStorageSourceCopy.
The messages are also not serialized to the status XML as losing them on a daemon restart as they're likely to be stale anyways.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/storage_source_conf.c | 5 +++++ src/conf/storage_source_conf.h | 6 ++++++ 2 files changed, 11 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Record the last I/O error reason and timestamp which happened with the corresponding virStorageSource struct. This will later allow querying the last error e.g. via the virDomainGetMessages() API. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4497d089d0..d73c52542b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -840,6 +840,7 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, const char *eventReason = ""; virDomainDiskDef *disk = NULL; virStorageSource *src = NULL; + g_autofree char *timestamp = NULL; virObjectLock(vm); priv = QEMU_DOMAIN_PRIVATE(vm); @@ -865,6 +866,15 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, eventPath, eventAlias, action); ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, eventReason); + if ((timestamp = virTimeStringNow()) != NULL) { + if (src) { + g_free(src->ioerror_timestamp); + g_free(src->ioerror_message); + src->ioerror_timestamp = g_steal_pointer(×tamp); + src->ioerror_message = g_strdup(reason); + } + } + if (action == VIR_DOMAIN_EVENT_IO_ERROR_PAUSE && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { VIR_WARN("Transitioned guest %s to paused state due to IO error", vm->def->name); -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:12PM +0100, Peter Krempa wrote:
Record the last I/O error reason and timestamp which happened with the corresponding virStorageSource struct.
This will later allow querying the last error e.g. via the virDomainGetMessages() API.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Add a log entry to the VM log file for every time we receive an IO error event from qemu. The log entry is as follows: 2025-01-24 16:03:28.928+0000: IO error device='virtio-disk0' node-name='libvirt-1-storage' reason='other: Input/output error' Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index d73c52542b..e9342287ab 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -829,7 +829,7 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, const char *nodename, int action, bool nospace, - const char *reason G_GNUC_UNUSED) + const char *reason) { qemuDomainObjPrivate *priv; virObjectEvent *ioErrorEvent = NULL; @@ -867,6 +867,9 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, eventReason); if ((timestamp = virTimeStringNow()) != NULL) { + qemuDomainLogAppendMessage(priv->driver, vm, "%s: IO error device='%s' node-name='%s' reason='%s'\n", + timestamp, NULLSTR(eventAlias), NULLSTR(nodename), NULLSTR(reason)); + if (src) { g_free(src->ioerror_timestamp); g_free(src->ioerror_message); -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:13PM +0100, Peter Krempa wrote:
Add a log entry to the VM log file for every time we receive an IO error event from qemu. The log entry is as follows:
2025-01-24 16:03:28.928+0000: IO error device='virtio-disk0' node-name='libvirt-1-storage' reason='other: Input/output error'
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

The two VIR_DOMAIN_MESSAGE_* flags were not listed in the virCheckFlags check in 'libxl' but were present in 'test' and 'qemu' driver impls. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libxl/libxl_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 494b1ad9bc..058fee0706 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -6578,7 +6578,8 @@ libxlDomainGetMessages(virDomainPtr dom, virDomainObj *vm = NULL; int ret = -1; - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_MESSAGE_DEPRECATION | + VIR_DOMAIN_MESSAGE_TAINTING, -1); if (!(vm = libxlDomObjFromDomain(dom))) return -1; -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:14PM +0100, Peter Krempa wrote:
The two VIR_DOMAIN_MESSAGE_* flags were not listed in the virCheckFlags check in 'libxl' but were present in 'test' and 'qemu' driver impls.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libxl/libxl_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Simplify the function especially by rewriting it using GPtrArray to construct the string list, especially for the upcoming case when the number of added elements will not be known beforehand and when hypervisor specific data will be added. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/domain_conf.c | 36 +++++++----------------------------- src/conf/domain_conf.h | 4 ++-- src/libxl/libxl_driver.c | 9 ++++++++- src/qemu/qemu_driver.c | 9 ++++++++- src/test/test_driver.c | 9 ++++++++- 5 files changed, 33 insertions(+), 34 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 45c2cd09f1..00d486e774 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31685,53 +31685,31 @@ virHostdevIsPCIDevice(const virDomainHostdevDef *hostdev) /** * virDomainObjGetMessages: * @vm: domain object - * @msgs: pointer to a variable to store messages + * @m: GPtrArray to be filled with messages * @flags: zero or more virDomainMessageType flags - * - * Returns number of messages stored in @msgs, -1 otherwise. */ -int +void virDomainObjGetMessages(virDomainObj *vm, - char ***msgs, + GPtrArray *m, unsigned int flags) { size_t i = 0; - size_t n = 0; - int nmsgs = 0; - int rv = -1; - - *msgs = NULL; if (!flags || (flags & VIR_DOMAIN_MESSAGE_TAINTING)) { - nmsgs += __builtin_popcount(vm->taint); - *msgs = g_renew(char *, *msgs, nmsgs+1); - for (i = 0; i < VIR_DOMAIN_TAINT_LAST; i++) { if (vm->taint & (1 << i)) { - (*msgs)[n++] = g_strdup_printf( - _("tainted: %1$s"), - _(virDomainTaintMessageTypeToString(i))); + g_ptr_array_add(m, g_strdup_printf(_("tainted: %1$s"), + _(virDomainTaintMessageTypeToString(i)))); } } } if (!flags || (flags & VIR_DOMAIN_MESSAGE_DEPRECATION)) { - nmsgs += vm->ndeprecations; - *msgs = g_renew(char *, *msgs, nmsgs+1); - for (i = 0; i < vm->ndeprecations; i++) { - (*msgs)[n++] = g_strdup_printf( - _("deprecated configuration: %1$s"), - vm->deprecations[i]); + g_ptr_array_add(m, g_strdup_printf(_("deprecated configuration: %1$s"), + vm->deprecations[i])); } } - - if (*msgs) - (*msgs)[nmsgs] = NULL; - - rv = nmsgs; - - return rv; } bool diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index ba1a495764..e996d3c0de 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -4588,9 +4588,9 @@ bool virHostdevIsPCIDevice(const virDomainHostdevDef *hostdev) ATTRIBUTE_NONNULL(1); -int +void virDomainObjGetMessages(virDomainObj *vm, - char ***msgs, + GPtrArray *m, unsigned int flags); bool diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 058fee0706..426c2b4278 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -6575,6 +6575,7 @@ libxlDomainGetMessages(virDomainPtr dom, char ***msgs, unsigned int flags) { + g_autoptr(GPtrArray) m = g_ptr_array_new_with_free_func(g_free); virDomainObj *vm = NULL; int ret = -1; @@ -6587,7 +6588,13 @@ libxlDomainGetMessages(virDomainPtr dom, if (virDomainGetMessagesEnsureACL(dom->conn, vm->def) < 0) goto cleanup; - ret = virDomainObjGetMessages(vm, msgs, flags); + virDomainObjGetMessages(vm, m, flags); + + ret = m->len; + if (m->len > 0) { + g_ptr_array_add(m, NULL); + *msgs = (char **) g_ptr_array_steal(m, NULL); + } cleanup: virDomainObjEndAPI(&vm); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a1fc61bae2..8327e7079c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19818,6 +19818,7 @@ qemuDomainGetMessages(virDomainPtr dom, char ***msgs, unsigned int flags) { + g_autoptr(GPtrArray) m = g_ptr_array_new_with_free_func(g_free); virDomainObj *vm = NULL; int rv = -1; @@ -19830,7 +19831,13 @@ qemuDomainGetMessages(virDomainPtr dom, if (virDomainGetMessagesEnsureACL(dom->conn, vm->def) < 0) goto cleanup; - rv = virDomainObjGetMessages(vm, msgs, flags); + virDomainObjGetMessages(vm, m, flags); + + rv = m->len; + if (m->len > 0) { + g_ptr_array_add(m, NULL); + *msgs = (char **) g_ptr_array_steal(m, NULL); + } cleanup: virDomainObjEndAPI(&vm); diff --git a/src/test/test_driver.c b/src/test/test_driver.c index f1cefb5c50..a10ec3bc41 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -9522,6 +9522,7 @@ testDomainGetMessages(virDomainPtr dom, char ***msgs, unsigned int flags) { + g_autoptr(GPtrArray) m = g_ptr_array_new_with_free_func(g_free); virDomainObj *vm = NULL; int rv = -1; @@ -9531,7 +9532,13 @@ testDomainGetMessages(virDomainPtr dom, if (!(vm = testDomObjFromDomain(dom))) return -1; - rv = virDomainObjGetMessages(vm, msgs, flags); + virDomainObjGetMessages(vm, m, flags); + + rv = m->len; + if (m->len > 0) { + g_ptr_array_add(m, NULL); + *msgs = (char **) g_ptr_array_steal(m, NULL); + } virDomainObjEndAPI(&vm); return rv; -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:15PM +0100, Peter Krempa wrote:
Simplify the function especially by rewriting it using GPtrArray to construct the string list, especially for the upcoming case when the number of added elements will not be known beforehand and when hypervisor specific data will be added.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/domain_conf.c | 36 +++++++----------------------------- src/conf/domain_conf.h | 4 ++-- src/libxl/libxl_driver.c | 9 ++++++++- src/qemu/qemu_driver.c | 9 ++++++++- src/test/test_driver.c | 9 ++++++++- 5 files changed, 33 insertions(+), 34 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Report any stored I/O error messages reported by the hypervisor when reporting messages of a domain. As the I/O error may be already stale we report also the timestamp when it was recorded. Example message: I/O error: disk='vda', index='1', path='/dev/mapper/errdev0', timestamp='2025-01-28 15:47:52.776+0000', message'Input/output error' Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 3 ++ src/conf/domain_conf.c | 50 ++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 5 ++++ src/libvirt_private.syms | 1 + src/libxl/libxl_driver.c | 3 +- src/qemu/qemu_driver.c | 15 +++++++++- src/test/test_driver.c | 3 +- 7 files changed, 77 insertions(+), 3 deletions(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 74016c6c46..9e9016cfe7 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -6520,6 +6520,9 @@ int virDomainAuthorizedSSHKeysSet(virDomainPtr domain, typedef enum { VIR_DOMAIN_MESSAGE_DEPRECATION = (1 << 0), /* (Since: 7.1.0) */ VIR_DOMAIN_MESSAGE_TAINTING = (1 << 1), /* (Since: 7.1.0) */ + VIR_DOMAIN_MESSAGE_IOERRORS = (1 << 2), /* Report available stored I/O + errors messages for disk images + (Since: 11.1.0) */ } virDomainMessageType; int virDomainGetMessages(virDomainPtr domain, diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 00d486e774..548bc82308 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31682,6 +31682,47 @@ virHostdevIsPCIDevice(const virDomainHostdevDef *hostdev) } +static void +virDomainObjGetMessagesIOErrorsSrc(virStorageSource *src, + const char *diskdst, + GPtrArray *m) +{ + if (!src || + !src->ioerror_message) + return; + + g_ptr_array_add(m, g_strdup_printf(_("I/O error: disk='%1$s', index='%2$d', path='%3$s', timestamp='%4$s', message'%5$s'"), + NULLSTR_MINUS(diskdst), + src->id, + NULLSTR_MINUS(src->path), + src->ioerror_timestamp, + src->ioerror_message)); +} + + +void +virDomainObjGetMessagesIOErrorsChain(virStorageSource *src, + const char *diskdst, + GPtrArray *m) +{ + virStorageSource *n; + + for (n = src; n; n = n->backingStore) { + virDomainObjGetMessagesIOErrorsSrc(n, diskdst, m); + virDomainObjGetMessagesIOErrorsSrc(n->dataFileStore, diskdst, m); + } +} + + +static void +virDomainObjGetMessagesIOErrorsDisk(virDomainDiskDef *disk, + GPtrArray *m) +{ + virDomainObjGetMessagesIOErrorsChain(disk->src, disk->dst, m); + virDomainObjGetMessagesIOErrorsChain(disk->mirror, disk->dst, m); +} + + /** * virDomainObjGetMessages: * @vm: domain object @@ -31710,6 +31751,15 @@ virDomainObjGetMessages(virDomainObj *vm, vm->deprecations[i])); } } + + if (!flags || (flags & VIR_DOMAIN_MESSAGE_IOERRORS)) { + if (vm->def->os.loader) + virDomainObjGetMessagesIOErrorsChain(vm->def->os.loader->nvram, NULL, m); + + for (i = 0; i < vm->def->ndisks; i++) + virDomainObjGetMessagesIOErrorsDisk(vm->def->disks[i], m); + } + } bool diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e996d3c0de..e51c74b6d1 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -4588,6 +4588,11 @@ bool virHostdevIsPCIDevice(const virDomainHostdevDef *hostdev) ATTRIBUTE_NONNULL(1); +void +virDomainObjGetMessagesIOErrorsChain(virStorageSource *src, + const char *diskdst, + GPtrArray *m); + void virDomainObjGetMessages(virDomainObj *vm, GPtrArray *m, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2fe0a07944..406e6583a3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -592,6 +592,7 @@ virDomainObjEndAPI; virDomainObjFormat; virDomainObjGetDefs; virDomainObjGetMessages; +virDomainObjGetMessagesIOErrorsChain; virDomainObjGetMetadata; virDomainObjGetOneDef; virDomainObjGetOneDefState; diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 426c2b4278..a76545c9ff 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -6580,7 +6580,8 @@ libxlDomainGetMessages(virDomainPtr dom, int ret = -1; virCheckFlags(VIR_DOMAIN_MESSAGE_DEPRECATION | - VIR_DOMAIN_MESSAGE_TAINTING, -1); + VIR_DOMAIN_MESSAGE_TAINTING | + VIR_DOMAIN_MESSAGE_IOERRORS, -1); if (!(vm = libxlDomObjFromDomain(dom))) return -1; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8327e7079c..50733a5b3a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19821,9 +19821,11 @@ qemuDomainGetMessages(virDomainPtr dom, g_autoptr(GPtrArray) m = g_ptr_array_new_with_free_func(g_free); virDomainObj *vm = NULL; int rv = -1; + qemuDomainObjPrivate *priv; virCheckFlags(VIR_DOMAIN_MESSAGE_DEPRECATION | - VIR_DOMAIN_MESSAGE_TAINTING, -1); + VIR_DOMAIN_MESSAGE_TAINTING | + VIR_DOMAIN_MESSAGE_IOERRORS, -1); if (!(vm = qemuDomainObjFromDomain(dom))) return -1; @@ -19831,8 +19833,19 @@ qemuDomainGetMessages(virDomainPtr dom, if (virDomainGetMessagesEnsureACL(dom->conn, vm->def) < 0) goto cleanup; + priv = vm->privateData; + virDomainObjGetMessages(vm, m, flags); + if (priv->backup) { + size_t i; + + for (i = 0; i < priv->backup->ndisks; i++) + virDomainObjGetMessagesIOErrorsChain(priv->backup->disks[i].store, + priv->backup->disks[i].name, + m); + } + rv = m->len; if (m->len > 0) { g_ptr_array_add(m, NULL); diff --git a/src/test/test_driver.c b/src/test/test_driver.c index a10ec3bc41..6f18b2b2c8 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -9527,7 +9527,8 @@ testDomainGetMessages(virDomainPtr dom, int rv = -1; virCheckFlags(VIR_DOMAIN_MESSAGE_DEPRECATION | - VIR_DOMAIN_MESSAGE_TAINTING, -1); + VIR_DOMAIN_MESSAGE_TAINTING | + VIR_DOMAIN_MESSAGE_IOERRORS, -1); if (!(vm = testDomObjFromDomain(dom))) return -1; -- 2.48.1

On Tue, Jan 28, 2025 at 17:28:16 +0100, Peter Krempa wrote:
Report any stored I/O error messages reported by the hypervisor when reporting messages of a domain. As the I/O error may be already stale we report also the timestamp when it was recorded.
Example message:
I/O error: disk='vda', index='1', path='/dev/mapper/errdev0', timestamp='2025-01-28 15:47:52.776+0000', message'Input/output error'
Missing = after 'message'
Signed-off-by: Peter Krempa <pkrempa@redhat.com> ---
+static void +virDomainObjGetMessagesIOErrorsSrc(virStorageSource *src, + const char *diskdst, + GPtrArray *m) +{ + if (!src || + !src->ioerror_message) + return; + + g_ptr_array_add(m, g_strdup_printf(_("I/O error: disk='%1$s', index='%2$d', path='%3$s', timestamp='%4$s', message'%5$s'"),
here.
+ NULLSTR_MINUS(diskdst), + src->id, + NULLSTR_MINUS(src->path), + src->ioerror_timestamp, + src->ioerror_message)); +}

On Tue, Jan 28, 2025 at 05:28:16PM +0100, Peter Krempa wrote:
Report any stored I/O error messages reported by the hypervisor when reporting messages of a domain. As the I/O error may be already stale we report also the timestamp when it was recorded.
Example message:
I/O error: disk='vda', index='1', path='/dev/mapper/errdev0', timestamp='2025-01-28 15:47:52.776+0000', message'Input/output error'
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 3 ++ src/conf/domain_conf.c | 50 ++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 5 ++++ src/libvirt_private.syms | 1 + src/libxl/libxl_driver.c | 3 +- src/qemu/qemu_driver.c | 15 +++++++++- src/test/test_driver.c | 3 +- 7 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8327e7079c..50733a5b3a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19821,9 +19821,11 @@ qemuDomainGetMessages(virDomainPtr dom, g_autoptr(GPtrArray) m = g_ptr_array_new_with_free_func(g_free); virDomainObj *vm = NULL; int rv = -1; + qemuDomainObjPrivate *priv;
virCheckFlags(VIR_DOMAIN_MESSAGE_DEPRECATION | - VIR_DOMAIN_MESSAGE_TAINTING, -1); + VIR_DOMAIN_MESSAGE_TAINTING | + VIR_DOMAIN_MESSAGE_IOERRORS, -1);
if (!(vm = qemuDomainObjFromDomain(dom))) return -1; @@ -19831,8 +19833,19 @@ qemuDomainGetMessages(virDomainPtr dom, if (virDomainGetMessagesEnsureACL(dom->conn, vm->def) < 0) goto cleanup;
+ priv = vm->privateData; + virDomainObjGetMessages(vm, m, flags);
+ if (priv->backup) {
&& (!flags || (flags & VIR_DOMAIN_MESSAGE_IOERRORS)) with that Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+ size_t i; + + for (i = 0; i < priv->backup->ndisks; i++) + virDomainObjGetMessagesIOErrorsChain(priv->backup->disks[i].store, + priv->backup->disks[i].name, + m); + }
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Emphasise that it's an enumeration and convert the possibilities to a list of values with explanation. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 9e9016cfe7..941e67aab1 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4805,11 +4805,13 @@ typedef void (*virConnectDomainEventIOErrorCallback)(virConnectPtr conn, * The callback signature to use when registering for an event of type * VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON with virConnectDomainEventRegisterAny() * - * If the I/O error is known to be caused by an ENOSPC condition in - * the host (where resizing the disk to be larger will allow the guest - * to be resumed as if nothing happened), @reason will be "enospc". - * Otherwise, @reason will be "", although future strings may be added - * if determination of other error types becomes possible. + * Although @reason is a string, it is considered to be an enumeration of the + * following values: + * + * - "" (empty string): unknown I/O error reason + * - "enospc": The I/O error is known to be caused by an ENOSPC condition in + * the host. Resizing the disk source to be larger will allow the + * guest to be resumed as if nothing happened. * * Since: 0.8.1 */ -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:17PM +0100, Peter Krempa wrote:
Emphasise that it's an enumeration and convert the possibilities to a list of values with explanation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

In case when the hypervisor does report the reason for the I/O error as an unstable string to display to users we can add a @reason possibility for the I/O error event noting that the error is available. Add 'hypervisor-message' as a reason enumeration value and document it to instruct users to look at the logs or virDomainGetMessages(). The resulting event looks like: event 'io-error' for domain 'cd': /dev/mapper/errdev0 (virtio-disk0) report due to hypervisor-message Users then can look at the virDomainGetMessages() API: I/O error: disk='vda', index='1', path='/dev/mapper/errdev0', timestamp='2025-01-28 15:47:52.776+0000', message'Input/output error' Or at the VM log file: 2025-01-28 15:47:52.776+0000: IO error device='virtio-disk0' node-name='libvirt-1-storage' reason='Input/output error' Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 4 ++++ src/qemu/qemu_process.c | 2 ++ 2 files changed, 6 insertions(+) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 941e67aab1..2615d7b76a 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4812,6 +4812,10 @@ typedef void (*virConnectDomainEventIOErrorCallback)(virConnectPtr conn, * - "enospc": The I/O error is known to be caused by an ENOSPC condition in * the host. Resizing the disk source to be larger will allow the * guest to be resumed as if nothing happened. + * - "hypervisor-message": The hypervisor reported a string description of the + * I/O error. The errors are usually logged into the + * domain log file or the last instance of the error + * string can be queried via virDomainGetMessages(). * * Since: 0.8.1 */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e9342287ab..cf6e0517ef 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -862,6 +862,8 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED, if (nospace) eventReason = "enospc"; + else if (reason) + eventReason = "hypervisor-message"; ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, eventPath, eventAlias, action); ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, eventPath, eventAlias, action, eventReason); -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:18PM +0100, Peter Krempa wrote:
In case when the hypervisor does report the reason for the I/O error as an unstable string to display to users we can add a @reason possibility for the I/O error event noting that the error is available.
Add 'hypervisor-message' as a reason enumeration value and document it to instruct users to look at the logs or virDomainGetMessages().
The resulting event looks like:
event 'io-error' for domain 'cd': /dev/mapper/errdev0 (virtio-disk0) report due to hypervisor-message
Users then can look at the virDomainGetMessages() API:
I/O error: disk='vda', index='1', path='/dev/mapper/errdev0', timestamp='2025-01-28 15:47:52.776+0000', message'Input/output error'
Or at the VM log file:
2025-01-28 15:47:52.776+0000: IO error device='virtio-disk0' node-name='libvirt-1-storage' reason='Input/output error'
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- include/libvirt/libvirt-domain.h | 4 ++++ src/qemu/qemu_process.c | 2 ++ 2 files changed, 6 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 941e67aab1..2615d7b76a 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4812,6 +4812,10 @@ typedef void (*virConnectDomainEventIOErrorCallback)(virConnectPtr conn, * - "enospc": The I/O error is known to be caused by an ENOSPC condition in * the host. Resizing the disk source to be larger will allow the * guest to be resumed as if nothing happened. + * - "hypervisor-message": The hypervisor reported a string description of the + * I/O error. The errors are usually logged into the + * domain log file or the last instance of the error + * string can be queried via virDomainGetMessages().
Or shorten it to just "message" ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- NEWS.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 5eae2706b7..452754a190 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -35,6 +35,12 @@ v11.1.0 (unreleased) * **Improvements** + * qemu: I/O error messages can be queried via ``virDomainGetMessages()`` + + The qemu hypervisor driver now preserves the last I/O error message along + with the timestamp when it was recorded and preserves it to be queried via + ``virDomainGetMessages()``. + * **Bug fixes** -- 2.48.1

On Tue, Jan 28, 2025 at 05:28:19PM +0100, Peter Krempa wrote:
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- NEWS.rst | 6 ++++++ 1 file changed, 6 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (2)
-
Daniel P. Berrangé
-
Peter Krempa