[libvirt] [PATCH v3 0/5] Add virDomainGetDiskErrors API

We already provide ways to detect when a domain has been paused as a result of I/O error, but there was no way of getting the exact error or even the device that experienced it. This new API may be used for both. Changes in version 3: - special case errors == NULL, maxerrors == 0 returns size of the array the caller should pass - explicitly document that disks with no error are not returned and that the caller must free disk names - explicitly document what happens when maxerrors is lower then the actual number of disk errors - maxerrors turned unsigned - the first patch was split into two (public API and RPC) - python binding is back Changes in version 2: - the API is now called virDomainGetDiskErrors - it returns a list (allocated by caller) of disks with errors instead so that users don't have to call it multiple times to get all errors - there's no python API yet since it can't be autogenerated; I'll send a patch for it tomorrow Jiri Denemark (5): virDomainGetDiskErrors public API Remote protocol for virDomainGetDiskErrors qemu: Implement virDomainGetDiskErrors virsh: Implement domblkerror command python: Add binding for virDomainGetDiskErrors daemon/remote.c | 103 +++++++++++++++++++++++++++++++++++++++ include/libvirt/libvirt.h.in | 32 ++++++++++++ python/generator.py | 3 +- python/libvirt-override-api.xml | 6 ++ python/libvirt-override.c | 50 +++++++++++++++++++ src/driver.h | 7 +++ src/libvirt.c | 65 ++++++++++++++++++++++++ src/libvirt_public.syms | 1 + src/qemu/qemu_conf.h | 1 + src/qemu/qemu_driver.c | 86 ++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 40 +++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 8 +++ src/qemu/qemu_monitor_text.c | 15 ++++++ src/remote/remote_driver.c | 77 +++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 23 ++++++++- src/remote_protocol-structs | 17 ++++++ tools/virsh.c | 78 +++++++++++++++++++++++++++++ tools/virsh.pod | 7 +++ 19 files changed, 618 insertions(+), 2 deletions(-) -- 1.7.8.4

We already provide ways to detect when a domain has been paused as a result of I/O error, but there was no way of getting the exact error or even the device that experienced it. This new API may be used for both. --- include/libvirt/libvirt.h.in | 32 ++++++++++++++++++++ python/generator.py | 3 +- src/driver.h | 7 ++++ src/libvirt.c | 65 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + 5 files changed, 107 insertions(+), 1 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d9b9b95..cf53cf2 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1967,6 +1967,38 @@ virDomainGetBlockIoTune(virDomainPtr dom, int *nparams, unsigned int flags); +/** + * virDomainDiskErrorCode: + * + * Disk I/O error. + */ +typedef enum { + VIR_DOMAIN_DISK_ERROR_NONE = 0, /* no error */ + VIR_DOMAIN_DISK_ERROR_UNSPEC = 1, /* unspecified I/O error */ + VIR_DOMAIN_DISK_ERROR_NO_SPACE = 2, /* no space left on the device */ + +#ifdef VIR_ENUM_SENTINELS + VIR_DOMAIN_DISK_ERROR_LAST +#endif +} virDomainDiskErrorCode; + +/** + * virDomainDiskError: + * + */ +typedef struct _virDomainDiskError virDomainDiskError; +typedef virDomainDiskError *virDomainDiskErrorPtr; + +struct _virDomainDiskError { + char *disk; /* disk target */ + int error; /* virDomainDiskErrorCode */ +}; + +int virDomainGetDiskErrors(virDomainPtr dom, + virDomainDiskErrorPtr errors, + unsigned int maxerrors, + unsigned int flags); + /* * NUMA support diff --git a/python/generator.py b/python/generator.py index 6f813ae..b514af5 100755 --- a/python/generator.py +++ b/python/generator.py @@ -423,7 +423,8 @@ skip_impl = ( 'virDomainGetBlockIoTune', 'virDomainSetInterfaceParameters', 'virDomainGetInterfaceParameters', - 'virDomainGetCPUStats' # not implemented now. + 'virDomainGetCPUStats', # not implemented now. + 'virDomainGetDiskErrors', ) qemu_skip_impl = ( diff --git a/src/driver.h b/src/driver.h index 2e2042e..9ff5edf 100644 --- a/src/driver.h +++ b/src/driver.h @@ -810,6 +810,12 @@ typedef int unsigned int ncpus, unsigned int flags); +typedef int + (*virDrvDomainGetDiskErrors)(virDomainPtr dom, + virDomainDiskErrorPtr errors, + unsigned int maxerrors, + unsigned int flags); + /** * _virDriver: * @@ -981,6 +987,7 @@ struct _virDriver { virDrvDomainSetBlockIoTune domainSetBlockIoTune; virDrvDomainGetBlockIoTune domainGetBlockIoTune; virDrvDomainGetCPUStats domainGetCPUStats; + virDrvDomainGetDiskErrors domainGetDiskErrors; }; typedef int diff --git a/src/libvirt.c b/src/libvirt.c index c609202..e84447e 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18282,3 +18282,68 @@ error: virDispatchError(domain->conn); return -1; } + +/** + * virDomainGetDiskErrors: + * @dom: a domain object + * @errors: array to populate on output + * @maxerrors: size of @errors array + * @flags: extra flags; not used yet, so callers should always pass 0 + * + * The function populates @errors array with all disks that encountered an + * I/O error. Disks with no error will not be returned in the @errors array. + * Each disk is identified by its target (the dev attribute of target + * subelement in domain XML), such as "vda", and accompanied with the error + * that was seen on it. The caller is also responsible for calling free() + * on each disk name returned. + * + * In a special case when @errors is NULL and @maxerrors is 0, the function + * returns preferred size of @errors that the caller should use to get all + * disk errors. + * + * Since calling virDomainGetDiskErrors(dom, NULL, 0, 0) to get preferred size + * of @errors array and getting the errors are two separate operations, new + * disks may be hotplugged to the domain and new errors may be encountered + * between the two calls. Thus, this function may not return all disk errors + * because the supplied array is not large enough. Such errors may, however, + * be detected by listening to domain events. + * + * Returns number of disks with errors filled in the @errors array or -1 on + * error. + */ +int +virDomainGetDiskErrors(virDomainPtr dom, + virDomainDiskErrorPtr errors, + unsigned int maxerrors, + unsigned int flags) +{ + VIR_DOMAIN_DEBUG(dom, "errors=%p, maxerrors=%u, flags=%x", + errors, maxerrors, flags); + + virResetLastError(); + + if (!VIR_IS_DOMAIN(dom)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + + if ((!errors && maxerrors) || (errors && !maxerrors)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + goto error; + } + + if (dom->conn->driver->domainGetDiskErrors) { + int ret = dom->conn->driver->domainGetDiskErrors(dom, errors, + maxerrors, flags); + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(dom->conn); + return -1; +} diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 1c4e0a3..ced9fb3 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -519,6 +519,7 @@ LIBVIRT_0.9.9 { LIBVIRT_0.9.10 { global: virDomainGetCPUStats; + virDomainGetDiskErrors; virDomainPMSuspendForDuration; virDomainShutdownFlags; virStorageVolResize; -- 1.7.8.4

On 01/31/2012 12:26 PM, Jiri Denemark wrote:
We already provide ways to detect when a domain has been paused as a result of I/O error, but there was no way of getting the exact error or even the device that experienced it. This new API may be used for both. --- include/libvirt/libvirt.h.in | 32 ++++++++++++++++++++ python/generator.py | 3 +- src/driver.h | 7 ++++ src/libvirt.c | 65 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + 5 files changed, 107 insertions(+), 1 deletions(-)
+ * + * Since calling virDomainGetDiskErrors(dom, NULL, 0, 0) to get preferred size + * of @errors array and getting the errors are two separate operations, new + * disks may be hotplugged to the domain and new errors may be encountered + * between the two calls. Thus, this function may not return all disk errors + * because the supplied array is not large enough. Such errors may, however, + * be detected by listening to domain events.
I like it. Nice improvement from v1. ACK. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- daemon/remote.c | 103 ++++++++++++++++++++++++++++++++++++++++++ src/remote/remote_driver.c | 77 +++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 23 +++++++++- src/remote_protocol-structs | 17 +++++++ 4 files changed, 219 insertions(+), 1 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index cb8423a..3d7021a 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -98,6 +98,12 @@ remoteDeserializeTypedParameters(remote_typed_param *args_params_val, int limit, int *nparams); +static int +remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors, + int nerrors, + remote_domain_disk_error **ret_errors_val, + u_int *ret_errors_len); + #include "remote_dispatch.h" #include "qemu_dispatch.h" @@ -3595,6 +3601,69 @@ cleanup: return rv; } +static int remoteDispatchDomainGetDiskErrors( + virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_get_disk_errors_args *args, + remote_domain_get_disk_errors_ret *ret) +{ + int rv = -1; + virDomainPtr dom = NULL; + virDomainDiskErrorPtr errors = NULL; + int len; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virNetError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + if (args->maxerrors > REMOTE_DOMAIN_DISK_ERRORS_MAX) { + virNetError(VIR_ERR_INTERNAL_ERROR, "%s", + _("maxerrors too large")); + goto cleanup; + } + + if (args->maxerrors && + VIR_ALLOC_N(errors, args->maxerrors) < 0) { + virReportOOMError(); + goto cleanup; + } + + if ((len = virDomainGetDiskErrors(dom, errors, + args->maxerrors, + args->flags)) < 0) + goto cleanup; + + ret->nerrors = len; + if (errors && + remoteSerializeDomainDiskErrors(errors, len, + &ret->errors.errors_val, + &ret->errors.errors_len) < 0) + goto cleanup; + + rv = 0; + +cleanup: + if (rv < 0) + virNetMessageSaveError(rerr); + if (dom) + virDomainFree(dom); + if (errors) { + int i; + for (i = 0; i < len; i++) + VIR_FREE(errors[i].disk); + } + VIR_FREE(errors); + return rv; +} + /*----- Helpers. -----*/ /* get_nonnull_domain and get_nonnull_network turn an on-wire @@ -3725,3 +3794,37 @@ make_nonnull_domain_snapshot(remote_nonnull_domain_snapshot *snapshot_dst, virDo snapshot_dst->name = strdup(snapshot_src->name); make_nonnull_domain(&snapshot_dst->dom, snapshot_src->domain); } + +static int +remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors, + int nerrors, + remote_domain_disk_error **ret_errors_val, + u_int *ret_errors_len) +{ + remote_domain_disk_error *val = NULL; + int i = 0; + + if (VIR_ALLOC_N(val, nerrors) < 0) + goto no_memory; + + for (i = 0; i < nerrors; i++) { + if (!(val[i].disk = strdup(errors[i].disk))) + goto no_memory; + val[i].error = errors[i].error; + } + + *ret_errors_len = nerrors; + *ret_errors_val = val; + + return 0; + +no_memory: + if (val) { + int j; + for (j = 0; j < i; j++) + VIR_FREE(val[j].disk); + VIR_FREE(val); + } + virReportOOMError(); + return -1; +} diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 61b96e9..9c3dbab 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -1427,6 +1427,39 @@ cleanup: } static int +remoteDeserializeDomainDiskErrors(remote_domain_disk_error *ret_errors_val, + u_int ret_errors_len, + int limit, + virDomainDiskErrorPtr errors, + int maxerrors) +{ + int i = 0; + int j; + + if (ret_errors_len > limit || ret_errors_len > maxerrors) { + remoteError(VIR_ERR_RPC, "%s", + _("returned number of disk errors exceeds limit")); + goto error; + } + + for (i = 0; i < ret_errors_len; i++) { + if (!(errors[i].disk = strdup(ret_errors_val[i].disk))) { + virReportOOMError(); + goto error; + } + errors[i].error = ret_errors_val[i].error; + } + + return 0; + +error: + for (j = 0; j < i; j++) + VIR_FREE(errors[i].disk); + + return -1; +} + +static int remoteDomainBlockStatsFlags(virDomainPtr domain, const char *path, virTypedParameterPtr params, @@ -4447,6 +4480,49 @@ remoteIsAlive(virConnectPtr conn) } +static int +remoteDomainGetDiskErrors(virDomainPtr dom, + virDomainDiskErrorPtr errors, + unsigned int maxerrors, + unsigned int flags) +{ + int rv = -1; + struct private_data *priv = dom->conn->privateData; + remote_domain_get_disk_errors_args args; + remote_domain_get_disk_errors_ret ret; + + remoteDriverLock(priv); + + make_nonnull_domain(&args.dom, dom); + args.maxerrors = maxerrors; + args.flags = flags; + + memset(&ret, 0, sizeof ret); + + if (call(dom->conn, priv, 0, REMOTE_PROC_DOMAIN_GET_DISK_ERRORS, + (xdrproc_t) xdr_remote_domain_get_disk_errors_args, + (char *) &args, + (xdrproc_t) xdr_remote_domain_get_disk_errors_ret, + (char *) &ret) == -1) + goto done; + + if (remoteDeserializeDomainDiskErrors(ret.errors.errors_val, + ret.errors.errors_len, + REMOTE_DOMAIN_DISK_ERRORS_MAX, + errors, + maxerrors) < 0) + goto cleanup; + + rv = ret.nerrors; + +cleanup: + xdr_free((xdrproc_t) xdr_remote_domain_get_disk_errors_ret, (char *) &ret); + +done: + remoteDriverUnlock(priv); + return rv; +} + #include "remote_client_bodies.h" #include "qemu_client_bodies.h" @@ -4849,6 +4925,7 @@ static virDriver remote_driver = { .domainSetNumaParameters = remoteDomainSetNumaParameters, /* 0.9.9 */ .domainGetNumaParameters = remoteDomainGetNumaParameters, /* 0.9.9 */ .domainGetCPUStats = remoteDomainGetCPUStats, /* 0.9.10 */ + .domainGetDiskErrors = remoteDomainGetDiskErrors, /* 0.9.10 */ }; static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index b2c8426..10fd294 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -219,6 +219,11 @@ const REMOTE_DOMAIN_GET_CPU_STATS_NCPUS_MAX = 128; */ const REMOTE_DOMAIN_GET_CPU_STATS_MAX = 2048; +/* + * Upper limit on number of disks with errors + */ +const REMOTE_DOMAIN_DISK_ERRORS_MAX = 256; + /* UUID. VIR_UUID_BUFLEN definition comes from libvirt.h */ typedef opaque remote_uuid[VIR_UUID_BUFLEN]; @@ -359,6 +364,10 @@ struct remote_node_get_memory_stats { unsigned hyper value; }; +struct remote_domain_disk_error { + remote_nonnull_string disk; + int error; +}; /*----- Calls. -----*/ @@ -2397,6 +2406,17 @@ struct remote_domain_shutdown_flags_args { unsigned int flags; }; +struct remote_domain_get_disk_errors_args { + remote_nonnull_domain dom; + unsigned int maxerrors; + unsigned int flags; +}; + +struct remote_domain_get_disk_errors_ret { + remote_domain_disk_error errors<REMOTE_DOMAIN_DISK_ERRORS_MAX>; + int nerrors; +}; + /*----- Protocol. -----*/ @@ -2708,7 +2728,8 @@ enum remote_procedure { REMOTE_PROC_STORAGE_VOL_RESIZE = 260, /* autogen autogen */ REMOTE_PROC_DOMAIN_PM_SUSPEND_FOR_DURATION = 261, /* autogen autogen */ - REMOTE_PROC_DOMAIN_GET_CPU_STATS = 262 /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_GET_CPU_STATS = 262, /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_GET_DISK_ERRORS = 263 /* skipgen skipgen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index e9137a9..ee2207c 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -94,6 +94,10 @@ struct remote_node_get_memory_stats { remote_nonnull_string field; uint64_t value; }; +struct remote_domain_disk_error { + remote_nonnull_string disk; + int error; +}; struct remote_open_args { remote_string name; u_int flags; @@ -1866,6 +1870,18 @@ struct remote_domain_shutdown_flags_args { remote_nonnull_domain dom; u_int flags; }; +struct remote_domain_get_disk_errors_args { + remote_nonnull_domain dom; + u_int maxerrors; + u_int flags; +}; +struct remote_domain_get_disk_errors_ret { + struct { + u_int errors_len; + remote_domain_disk_error * errors_val; + } errors; + int nerrors; +}; enum remote_procedure { REMOTE_PROC_OPEN = 1, REMOTE_PROC_CLOSE = 2, @@ -2129,4 +2145,5 @@ enum remote_procedure { REMOTE_PROC_STORAGE_VOL_RESIZE = 260, REMOTE_PROC_DOMAIN_PM_SUSPEND_FOR_DURATION = 261, REMOTE_PROC_DOMAIN_GET_CPU_STATS = 262, + REMOTE_PROC_DOMAIN_GET_DISK_ERRORS = 263, }; -- 1.7.8.4

On 01/31/2012 12:26 PM, Jiri Denemark wrote:
--- daemon/remote.c | 103 ++++++++++++++++++++++++++++++++++++++++++ src/remote/remote_driver.c | 77 +++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 23 +++++++++- src/remote_protocol-structs | 17 +++++++ 4 files changed, 219 insertions(+), 1 deletions(-)
ACK. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- src/qemu/qemu_conf.h | 1 + src/qemu/qemu_driver.c | 86 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 40 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 8 ++++ src/qemu/qemu_monitor_text.c | 15 +++++++ 6 files changed, 151 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index 7d79823..0b65d7d 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -175,6 +175,7 @@ struct qemuDomainDiskInfo { bool removable; bool locked; bool tray_open; + int io_status; }; #endif /* __QEMUD_CONF_H */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 1b147a9..a7ac75a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11740,6 +11740,91 @@ cleanup: return ret; } +static int +qemuDomainGetDiskErrors(virDomainPtr dom, + virDomainDiskErrorPtr errors, + unsigned int nerrors, + unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + virHashTablePtr table = NULL; + int ret = -1; + int i; + int n = 0; + + virCheckFlags(0, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + qemuDriverUnlock(driver); + + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + priv = vm->privateData; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + if (!errors) { + ret = vm->def->ndisks; + goto endjob; + } + + qemuDomainObjEnterMonitor(driver, vm); + table = qemuMonitorGetBlockInfo(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + if (!table) + goto endjob; + + for (i = n = 0; i < vm->def->ndisks; i++) { + struct qemuDomainDiskInfo *info; + virDomainDiskDefPtr disk = vm->def->disks[i]; + + if ((info = virHashLookup(table, disk->info.alias)) && + info->io_status != VIR_DOMAIN_DISK_ERROR_NONE) { + if (n == nerrors) + break; + + if (!(errors[n].disk = strdup(disk->dst))) { + virReportOOMError(); + goto endjob; + } + errors[n].error = info->io_status; + n++; + } + } + + ret = n; + +endjob: + if (qemuDomainObjEndJob(driver, vm) == 0) + vm = NULL; + +cleanup: + if (vm) + virDomainObjUnlock(vm); + virHashFree(table); + if (ret < 0) { + for (i = 0; i < n; i++) + VIR_FREE(errors[i].disk); + } + return ret; +} + static virDriver qemuDriver = { .no = VIR_DRV_QEMU, .name = "QEMU", @@ -11893,6 +11978,7 @@ static virDriver qemuDriver = { .domainGetNumaParameters = qemuDomainGetNumaParameters, /* 0.9.9 */ .domainGetInterfaceParameters = qemuDomainGetInterfaceParameters, /* 0.9.9 */ .domainSetInterfaceParameters = qemuDomainSetInterfaceParameters, /* 0.9.9 */ + .domainGetDiskErrors = qemuDomainGetDiskErrors, /* 0.9.10 */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index dda0521..7872e3d 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -87,6 +87,20 @@ VIR_ENUM_IMPL(qemuMonitorVMStatus, "postmigrate", "prelaunch", "finish-migrate", "restore-vm", "running", "save-vm", "shutdown", "watchdog") +typedef enum { + QEMU_MONITOR_BLOCK_IO_STATUS_OK, + QEMU_MONITOR_BLOCK_IO_STATUS_FAILED, + QEMU_MONITOR_BLOCK_IO_STATUS_NOSPACE, + + QEMU_MONITOR_BLOCK_IO_STATUS_LAST +} qemuMonitorBlockIOStatus; + +VIR_ENUM_DECL(qemuMonitorBlockIOStatus) + +VIR_ENUM_IMPL(qemuMonitorBlockIOStatus, + QEMU_MONITOR_BLOCK_IO_STATUS_LAST, + "ok", "failed", "nospace") + char *qemuMonitorEscapeArg(const char *in) { int len = 0; @@ -1227,6 +1241,32 @@ int qemuMonitorGetMemoryStats(qemuMonitorPtr mon, return ret; } +int +qemuMonitorBlockIOStatusToError(const char *status) +{ + int st = qemuMonitorBlockIOStatusTypeFromString(status); + + if (st < 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("unknown block IO status: %s"), status); + return -1; + } + + switch ((qemuMonitorBlockIOStatus) st) { + case QEMU_MONITOR_BLOCK_IO_STATUS_OK: + return VIR_DOMAIN_DISK_ERROR_NONE; + case QEMU_MONITOR_BLOCK_IO_STATUS_FAILED: + return VIR_DOMAIN_DISK_ERROR_UNSPEC; + case QEMU_MONITOR_BLOCK_IO_STATUS_NOSPACE: + return VIR_DOMAIN_DISK_ERROR_NO_SPACE; + + /* unreachable */ + case QEMU_MONITOR_BLOCK_IO_STATUS_LAST: + break; + } + return -1; +} + virHashTablePtr qemuMonitorGetBlockInfo(qemuMonitorPtr mon) { diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index a9471fe..5945d3e 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -236,6 +236,7 @@ int qemuMonitorGetMemoryStats(qemuMonitorPtr mon, virDomainMemoryStatPtr stats, unsigned int nr_stats); +int qemuMonitorBlockIOStatusToError(const char *status); virHashTablePtr qemuMonitorGetBlockInfo(qemuMonitorPtr mon); struct qemuDomainDiskInfo * qemuMonitorBlockInfoLookup(virHashTablePtr blockInfo, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 0bd9f46..b0ae570 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1389,6 +1389,7 @@ int qemuMonitorJSONGetBlockInfo(qemuMonitorPtr mon, virJSONValuePtr dev = virJSONValueArrayGet(devices, i); struct qemuDomainDiskInfo *info; const char *thisdev; + const char *status; if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -1434,6 +1435,13 @@ int qemuMonitorJSONGetBlockInfo(qemuMonitorPtr mon, */ ignore_value(virJSONValueObjectGetBoolean(dev, "tray-open", &info->tray_open)); + + /* Missing io-status indicates no error */ + if ((status = virJSONValueObjectGetString(dev, "io-status"))) { + info->io_status = qemuMonitorBlockIOStatusToError(status); + if (info->io_status < 0) + goto cleanup; + } } ret = 0; diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index edeb435..f051a86 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -839,6 +839,21 @@ int qemuMonitorTextGetBlockInfo(qemuMonitorPtr mon, VIR_DEBUG("error reading tray_open: %s", p); else info->tray_open = (tmp != 0); + } else if (STRPREFIX(p, "io-status=")) { + char *end; + char c; + + p += strlen("io-status="); + end = strchr(p, ' '); + if (!end || end > eol) + end = eol; + + c = *end; + *end = '\0'; + info->io_status = qemuMonitorBlockIOStatusToError(p); + *end = c; + if (info->io_status < 0) + goto cleanup; } else { /* ignore because we don't parse all options */ } -- 1.7.8.4

On 01/31/2012 12:26 PM, Jiri Denemark wrote:
--- src/qemu/qemu_conf.h | 1 + src/qemu/qemu_driver.c | 86 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 40 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 8 ++++ src/qemu/qemu_monitor_text.c | 15 +++++++ 6 files changed, 151 insertions(+), 0 deletions(-)
ACK. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This command lists all disk devices with errors --- tools/virsh.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 7 +++++ 2 files changed, 85 insertions(+), 0 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 3a59746..23962bf 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -16037,6 +16037,83 @@ cleanup: } /* + * "domblkerror" command + */ +static const char * +vshDomainIOErrorToString(int error) +{ + switch ((virDomainDiskErrorCode) error) { + case VIR_DOMAIN_DISK_ERROR_NONE: + return _("no error"); + case VIR_DOMAIN_DISK_ERROR_UNSPEC: + return _("unspecified error"); + case VIR_DOMAIN_DISK_ERROR_NO_SPACE: + return _("no space"); + case VIR_DOMAIN_DISK_ERROR_LAST: + ; + } + + return _("unknown error"); +} + +static const vshCmdInfo info_domblkerror[] = { + {"help", N_("Show errors on block devices")}, + {"desc", N_("Show block device errors")}, + {NULL, NULL} +}; + +static const vshCmdOptDef opts_domblkerror[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id, or uuid")}, + {NULL, 0, 0, NULL} +}; + +static bool +cmdDomBlkError(vshControl *ctl, const vshCmd *cmd) +{ + virDomainPtr dom; + virDomainDiskErrorPtr disks = NULL; + unsigned int ndisks; + int i; + int count; + bool ret = false; + + if (!vshConnectionUsability(ctl, ctl->conn)) + return false; + + if (!(dom = vshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if ((count = virDomainGetDiskErrors(dom, NULL, 0, 0)) < 0) + goto cleanup; + ndisks = count; + + if (ndisks) { + if (VIR_ALLOC_N(disks, ndisks) < 0) + goto cleanup; + + if ((count = virDomainGetDiskErrors(dom, disks, ndisks, 0)) == -1) + goto cleanup; + } + + if (count == 0) { + vshPrint(ctl, _("No errors found\n")); + } else { + for (i = 0; i < count; i++) { + vshPrint(ctl, "%s: %s\n", + disks[i].disk, + vshDomainIOErrorToString(disks[i].error)); + } + } + + ret = true; + +cleanup: + VIR_FREE(disks); + virDomainFree(dom); + return ret; +} + +/* * "qemu-monitor-command" command */ static const vshCmdInfo info_qemu_monitor_command[] = { @@ -16236,6 +16313,7 @@ static const vshCmdDef domManagementCmds[] = { }; static const vshCmdDef domMonitoringCmds[] = { + {"domblkerror", cmdDomBlkError, opts_domblkerror, info_domblkerror, 0}, {"domblkinfo", cmdDomblkinfo, opts_domblkinfo, info_domblkinfo, 0}, {"domblklist", cmdDomblklist, opts_domblklist, info_domblklist, 0}, {"domblkstat", cmdDomblkstat, opts_domblkstat, info_domblkstat, 0}, diff --git a/tools/virsh.pod b/tools/virsh.pod index 4bc25bf..e9598ac 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -507,6 +507,13 @@ on hypervisor. Get memory stats for a running domain. +=item B<domblkerror> I<domain-id> + +Show errors on block devices. This command usually comes handy when +B<domstate> command says that a domain was paused due to I/O error. +The B<domblkerror> command lists all block devices in error state and +the error seen on each of them. + =item B<domblkinfo> I<domain> I<block-device> Get block device size info for a domain. A I<block-device> corresponds -- 1.7.8.4

On 01/31/2012 12:26 PM, Jiri Denemark wrote:
This command lists all disk devices with errors --- tools/virsh.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 7 +++++ 2 files changed, 85 insertions(+), 0 deletions(-)
ACK. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- python/libvirt-override-api.xml | 6 ++++ python/libvirt-override.c | 50 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 0 deletions(-) diff --git a/python/libvirt-override-api.xml b/python/libvirt-override-api.xml index 704fee9..b2b8152 100644 --- a/python/libvirt-override-api.xml +++ b/python/libvirt-override-api.xml @@ -421,5 +421,11 @@ <arg name='flags' type='unsigned int' info='an OR'ed set of virDomainMemoryFlags'/> <return type='char *' info='the returned buffer or None in case of error'/> </function> + <function name='virDomainGetDiskErrors' file='python'> + <info>Extract errors on disk devices.</info> + <return type='virDomainDiskErrorPtr' info='dictionary of disks and their errors or None in case of error'/> + <arg name='domain' type='virDomainPtr' info='a domain object'/> + <arg name='flags' type='unsigned int' info='unused, always pass 0'/> + </function> </symbols> </api> diff --git a/python/libvirt-override.c b/python/libvirt-override.c index d2aad0f..7f1c093 100644 --- a/python/libvirt-override.c +++ b/python/libvirt-override.c @@ -3484,6 +3484,55 @@ libvirt_virDomainGetBlockIoTune(PyObject *self ATTRIBUTE_UNUSED, return(pyreply); } +static PyObject * +libvirt_virDomainGetDiskErrors(PyObject *self ATTRIBUTE_UNUSED, + PyObject *args) +{ + PyObject *py_retval = VIR_PY_NONE; + virDomainPtr domain; + PyObject *pyobj_domain; + unsigned int flags; + virDomainDiskErrorPtr disks = NULL; + unsigned int ndisks; + int count; + int i; + + if (!PyArg_ParseTuple(args, (char *) "Oi:virDomainGetDiskErrors", + &pyobj_domain, &flags)) + return NULL; + + domain = (virDomainPtr) PyvirDomain_Get(pyobj_domain); + + if ((count = virDomainGetDiskErrors(domain, NULL, 0, 0)) < 0) + return VIR_PY_NONE; + ndisks = count; + + if (ndisks) { + if (!(disks = malloc(sizeof(*disks) * ndisks))) + return VIR_PY_NONE; + + LIBVIRT_BEGIN_ALLOW_THREADS; + count = virDomainGetDiskErrors(domain, disks, ndisks, 0); + LIBVIRT_END_ALLOW_THREADS; + + if (count < 0) + goto cleanup; + } + + if (!(py_retval = PyDict_New())) + goto cleanup; + + for (i = 0; i < count; i++) { + PyDict_SetItem(py_retval, + libvirt_charPtrWrap(disks[i].disk), + libvirt_intWrap(disks[i].error)); + } + +cleanup: + free(disks); + return py_retval; +} + /******************************************* * Helper functions to avoid importing modules * for every callback @@ -5206,6 +5255,7 @@ static PyMethodDef libvirtMethods[] = { {(char *) "virDomainMigrateGetMaxSpeed", libvirt_virDomainMigrateGetMaxSpeed, METH_VARARGS, NULL}, {(char *) "virDomainBlockPeek", libvirt_virDomainBlockPeek, METH_VARARGS, NULL}, {(char *) "virDomainMemoryPeek", libvirt_virDomainMemoryPeek, METH_VARARGS, NULL}, + {(char *) "virDomainGetDiskErrors", libvirt_virDomainGetDiskErrors, METH_VARARGS, NULL}, {NULL, NULL, 0, NULL} }; -- 1.7.8.4

On 01/31/2012 12:26 PM, Jiri Denemark wrote:
--- python/libvirt-override-api.xml | 6 ++++ python/libvirt-override.c | 50 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 0 deletions(-)
+ if ((count = virDomainGetDiskErrors(domain, NULL, 0, 0)) < 0) + return VIR_PY_NONE;
This one's good.
+ ndisks = count; + + if (ndisks) { + if (!(disks = malloc(sizeof(*disks) * ndisks))) + return VIR_PY_NONE;
You're not the first offender, so I don't care if you check this in as-is and let a subsequent patch clean this up, but this should be a place where we return a python OOM exception rather than None.
+ + LIBVIRT_BEGIN_ALLOW_THREADS; + count = virDomainGetDiskErrors(domain, disks, ndisks, 0); + LIBVIRT_END_ALLOW_THREADS; + + if (count < 0) + goto cleanup; + } + + if (!(py_retval = PyDict_New())) + goto cleanup;
This properly propagates the python exception.
+ + for (i = 0; i < count; i++) { + PyDict_SetItem(py_retval, + libvirt_charPtrWrap(disks[i].disk), + libvirt_intWrap(disks[i].error));
Also a case where you're not the first offender (so fixing it can be saved for a later global cleanup), but we should: 1. check that libvirt_charPtrWrap() and libvirt_intWrap() aren't returning NULL (since PyDict_SetItem doesn't handle NULL well), and 2. check for PyDict_SetItem failures (in which case we free the portion of the dictionary collected so far and propagate the python exception).
+ } + +cleanup: + free(disks);
I think this is a memory leak - if I'm correct, libvirt_charPtrWrap creates a new object that copies the incoming string to the new object's content, rather than stealing a reference to your malloc'd string. That means you need to loop over count and free each disk name before freeing disks. ACK with the memory leak in cleanup: fixed; you can leave the other issues for a later global cleanup of the python overrides. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Jan 31, 2012 at 20:57:54 -0700, Eric Blake wrote:
On 01/31/2012 12:26 PM, Jiri Denemark wrote:
--- python/libvirt-override-api.xml | 6 ++++ python/libvirt-override.c | 50 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 0 deletions(-)
+ if ((count = virDomainGetDiskErrors(domain, NULL, 0, 0)) < 0) + return VIR_PY_NONE;
This one's good.
+ ndisks = count; + + if (ndisks) { + if (!(disks = malloc(sizeof(*disks) * ndisks))) + return VIR_PY_NONE;
You're not the first offender, so I don't care if you check this in as-is and let a subsequent patch clean this up, but this should be a place where we return a python OOM exception rather than None.
Yeah, I hate creating python binding so I just copied&pasted the boring parts of this function from other places :-)
+ + LIBVIRT_BEGIN_ALLOW_THREADS; + count = virDomainGetDiskErrors(domain, disks, ndisks, 0); + LIBVIRT_END_ALLOW_THREADS; + + if (count < 0) + goto cleanup; + } + + if (!(py_retval = PyDict_New())) + goto cleanup;
This properly propagates the python exception.
+ + for (i = 0; i < count; i++) { + PyDict_SetItem(py_retval, + libvirt_charPtrWrap(disks[i].disk), + libvirt_intWrap(disks[i].error));
Also a case where you're not the first offender (so fixing it can be saved for a later global cleanup), but we should: 1. check that libvirt_charPtrWrap() and libvirt_intWrap() aren't returning NULL (since PyDict_SetItem doesn't handle NULL well), and 2. check for PyDict_SetItem failures (in which case we free the portion of the dictionary collected so far and propagate the python exception).
+ } + +cleanup: + free(disks);
I think this is a memory leak - if I'm correct, libvirt_charPtrWrap creates a new object that copies the incoming string to the new object's content, rather than stealing a reference to your malloc'd string.
That's what libvirt_constcharPtrWrap does. libvirt_charPtrWrap is nice that it also calls free() on the original string at the end. But we can get to cleanup without going through the for loop so I need to change it to call libvirt_constcharPtrWrap and free the disks here. Thanks for pointing that out. Jirka

On Tue, Jan 31, 2012 at 20:26:09 +0100, Jiri Denemark wrote:
We already provide ways to detect when a domain has been paused as a result of I/O error, but there was no way of getting the exact error or even the device that experienced it. This new API may be used for both.
I fixed the memory leak in error path in python bindings and pushed this series. Thanks for the reviews. Jirka
participants (2)
-
Eric Blake
-
Jiri Denemark