[libvirt] [RFC PATCHv2 0/8] new API virDomainBlockSetWriteThreshold

Still in RFC stage because I need more work on qemu to turn on auto-node-naming, then corresponding work in libvirt to do node name lookups for both setting the threshold and for mapping events back to the right device name. But with this series in place and a simple hack to qemu to allow threshold settings on the active layer of a device (wrong node), I was able to do round trip threshold modifications (set a number, and read it back correctly) and rudimentary threshold event testing (I set such a low threshold that pretty much any guest write was enough to fire the event, although the corresponding length past the offset was unpredictable and had no bearing on whether the host allocation had actually changed as a result of the guest write). Posting now, since most of these patches will be fairly similar to their final form; only those touching src/qemu/ will need changes, once I get the counterpart qemu patches for auto-node-naming working and integrated. Eric Blake (8): threshold: new API virDomainBlockSetWriteThreshold threshold: expose new API in virsh threshold: new event object for tracking write threshold threshold: wire up threshold event in RPC threshold: add qemu capability bit threshold: add threshold event handling in qemu threshold: scrape threshold data from QMP threshold: add write threshold setting in qemu daemon/remote.c | 39 +++++++++++++++ include/libvirt/libvirt-domain.h | 47 ++++++++++++++++++ src/conf/domain_event.c | 93 +++++++++++++++++++++++++++++++++- src/conf/domain_event.h | 14 +++++- src/driver-hypervisor.h | 7 +++ src/libvirt-domain.c | 95 +++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 + src/libvirt_public.syms | 5 ++ src/qemu/qemu_capabilities.c | 4 +- src/qemu/qemu_capabilities.h | 3 +- src/qemu/qemu_driver.c | 104 +++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 27 ++++++++++ src/qemu/qemu_monitor.h | 17 +++++++ src/qemu/qemu_monitor_json.c | 79 +++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ src/qemu/qemu_process.c | 34 +++++++++++++ src/remote/remote_driver.c | 34 +++++++++++++ src/remote/remote_protocol.x | 29 ++++++++++- src/remote_protocol-structs | 15 ++++++ tests/qemumonitortest.c | 13 ++--- tools/virsh-domain-monitor.c | 81 ++++++++++++++++++++++++++++++ tools/virsh-domain.c | 23 +++++++++ tools/virsh.pod | 25 ++++++++++ 23 files changed, 782 insertions(+), 11 deletions(-) -- 2.4.2

qemu 2.3 added a new QMP command block-set-write-threshold, which allows callers to get an interrupt when a file hits a write threshold, rather than the current approach of repeatedly polling for file allocation. This patch prepares the API for callers to register to receive the event, as well as a way to query the threshold via virDomainListGetStats(). The event is one-shot in qemu - a guest must re-register a new threshold after each time it triggers. However, the virConnectDomainEventRegisterAny() call does not allow parameterization, so callers must use a pair of APIs - one to register the callback (one-time call) that will be used each time a threshold triggers for any guest disk, and another to repeatedly set the desired threshold (must be called each time a threshold should be changed). Note that the threshold can either be registered by a byte offset, or by a thousandth of a percentage (a value between 0 and 100000, scaled to the disk size). But the value is always reported as a byte offset, even when registered as a percentage. I also considered having the setup parameter be a double, to allow a finer resolution on percentage; with the choice of an integer fixed-point scale, this means a 100G disk can only set a threshold to a granularity of 1M, but that is probably sufficient for the usage. To make the patch series more digestible, this patch intentionally omits remote support, by using a couple of placeholders at a point where the compiler forces the addition of a case label within a switch statement. * include/libvirt/libvirt-domain.h (virDomainBlockSetWriteThreshold): New API. (virConnectDomainEventWriteThresholdCallback): New event. * src/libvirt_public.syms (LIBVIRT_1.2.17): Export it. * src/libvirt-domain.c (virDomainBlockSetWriteThreshold): New API. (virConnectGetAllDomainStats): New stat. * src/driver-hypervisor.h (virDrvDomainBlockSetWriteThreshold): New hypervisor entry point. * tools/virsh-domain.c (vshEventWriteThresholdPrint): Print new event. * tools/virsh.pod (domstats): Document new stat. * daemon/remote.c (domainEventCallbacks): Add stub. * src/conf/domain_event.c (virDomainEventDispatchDefaultFunc): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- daemon/remote.c | 2 + include/libvirt/libvirt-domain.h | 47 ++++++++++++++++++++ src/conf/domain_event.c | 4 +- src/driver-hypervisor.h | 7 +++ src/libvirt-domain.c | 95 ++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++ tools/virsh-domain.c | 23 ++++++++++ tools/virsh.pod | 1 + 8 files changed, 183 insertions(+), 1 deletion(-) diff --git a/daemon/remote.c b/daemon/remote.c index e9e2dca..283ece2 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1102,6 +1102,8 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventTunable), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventAgentLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventDeviceAdded), + /* TODO: Implement RPC support for this */ + VIR_DOMAIN_EVENT_CALLBACK(NULL), }; verify(ARRAY_CARDINALITY(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index d851225..7514656 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1297,6 +1297,17 @@ int virDomainBlockStatsFlags (virDomainPtr dom, virTypedParameterPtr params, int *nparams, unsigned int flags); + +typedef enum { + /* threshold is thousandth of a percentage (0 to 100000) relative to + * image size rather than byte limit */ + VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE = (1 << 0), +} virDomainBlockSetWriteThresholdFlags; +int virDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned int flags); + int virDomainInterfaceStats (virDomainPtr dom, const char *path, virDomainInterfaceStatsPtr stats, @@ -3246,6 +3257,41 @@ typedef void (*virConnectDomainEventDeviceAddedCallback)(virConnectPtr conn, void *opaque); /** + * virConnectDomainEventWriteThresholdCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @devAlias: device alias + * @threshold: threshold that was exceeded, in bytes + * @length: length beyond @threshold that was involved in the triggering + * write, or 0 if not known + * @opaque: application specified data + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD with virConnectDomainEventRegisterAny() + * + * This callback occurs when a block device detects a write event that + * exceeds a non-zero threshold set by + * virDomainBlockSetWriteThreshold(). When this event occurs, the + * threshold is reset to 0, and a new limit must be installed to see + * the event again on the same device. The intent of this event is to + * allow time for the underlying storage to be resized dynamically + * prior to the point where the guest would be paused due to running + * out of space, without having to poll for allocation values. + * + * The contents of @devAlias will be "vda" when the threshold is triggered + * on the active layer of guest disk vda. Some hypervisors also support + * threshold reporting on backing images, such as during a block commit; + * when that happens, @devAlias will be "vda[1]" for the backingStore at + * index 1 within the chain of host resources for guest disk vda. + */ +typedef void (*virConnectDomainEventWriteThresholdCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *devAlias, + unsigned long long threshold, + unsigned long long length, + void *opaque); + +/** * VIR_DOMAIN_TUNABLE_CPU_VCPUPIN: * * Macro represents formatted pinning for one vcpu specified by id which is @@ -3528,6 +3574,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_TUNABLE = 17, /* virConnectDomainEventTunableCallback */ VIR_DOMAIN_EVENT_ID_AGENT_LIFECYCLE = 18,/* virConnectDomainEventAgentLifecycleCallback */ VIR_DOMAIN_EVENT_ID_DEVICE_ADDED = 19, /* virConnectDomainEventDeviceAddedCallback */ + VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD = 20, /* virConnectDomainEventWriteThreshold */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 20d66e1..c43799f 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -1,7 +1,7 @@ /* * domain_event.c: domain event queue processing helpers * - * Copyright (C) 2010-2014 Red Hat, Inc. + * Copyright (C) 2010-2015 Red Hat, Inc. * Copyright (C) 2008 VirtualIron * Copyright (C) 2013 SUSE LINUX Products GmbH, Nuernberg, Germany. * @@ -1614,6 +1614,8 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD: + /* TODO: Implement RPC support for this */ case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index 3275343..b5b51f1 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -484,6 +484,12 @@ typedef int unsigned int flags); typedef int +(*virDrvDomainBlockSetWriteThreshold)(virDomainPtr domain, + const char *disk, + unsigned long long threshold, + unsigned int flags); + +typedef int (*virDrvDomainInterfaceStats)(virDomainPtr domain, const char *path, virDomainInterfaceStatsPtr stats); @@ -1324,6 +1330,7 @@ struct _virHypervisorDriver { virDrvDomainBlockResize domainBlockResize; virDrvDomainBlockStats domainBlockStats; virDrvDomainBlockStatsFlags domainBlockStatsFlags; + virDrvDomainBlockSetWriteThreshold domainBlockSetWriteThreshold; virDrvDomainInterfaceStats domainInterfaceStats; virDrvDomainSetInterfaceParameters domainSetInterfaceParameters; virDrvDomainGetInterfaceParameters domainGetInterfaceParameters; diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 7e6d749..53114d3 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -5743,6 +5743,99 @@ virDomainBlockStatsFlags(virDomainPtr dom, /** + * virDomainBlockSetWriteThreshold: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @threshold: limit at which a write threshold event can trigger + * @flags: bitwise-OR of virDomainBlockSetWriteThresholdFlags + * + * This function is used to set a one-shot write threshold. It should + * be used in tandem with virConnectDomainEventRegisterAny() + * installing a handler for VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD. If + * the hypervisor detects that a write request (whether guest data, or + * host metadata) would exceed the host byte offset specified in + * @threshold, then an event is raised, and the threshold is reset to + * 0 at that time. The event registration is only needed once, but + * this function must be called each time a new threshold is desired; + * the event will only fire if a non-zero threshold is + * exceeded. + * + * By default, @threshold is specified in bytes, and must not exceed + * the size of the block device. However, when @flags includes + * VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE, @threshold is + * instead a value between 0 an 100,000, as a thousandth of a percent + * of the current size of the disk, and the driver will compute the + * corresponding byte value. For example, 80000 represents 80.000%. + * A driver may round the requested threshold to a granularity that + * can actually be supported. + * + * Setting a threshold allows event-based resizing of host resources + * that back a guest disk without having to poll the current disk + * allocation, while still having enough time to complete the resize + * before the guest would end up halting due to insufficient space. + * Calling this function to set the threshold back to zero will stop + * further firing of the event. virConnectGetAllDomainStats() can be + * used to track the current threshold value, always in the form + * normalized to bytes. + * + * The @disk parameter is either the device target shorthand (the + * <target dev='...'/> sub-element, such as "vda"), or (since 0.9.8) + * an unambiguous source name of the block device (the <source + * file='...'/> sub-element, such as "/path/to/image"). Valid names + * can be found by calling virDomainGetXMLDesc() and inspecting + * elements within //domain/devices/disk. Some drivers might also + * accept strings such as "vda[1]" for setting the threshold of a + * backing image, useful when doing a block commit into the backing + * image. + * + * Domains may have more than one block device. To set thresholds for + * each you should make multiple calls to this function. If write + * thresholds are not supported, an application will have to instead + * poll virDomainGetBlockInfo() or similar to track allocation. + * + * Returns -1 in case of error, 0 in case of success. + */ +int +virDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(dom, "disk=%s, threshold=%llu, flags=%x", + disk, threshold, flags); + + virResetLastError(); + + virCheckDomainReturn(dom, -1); + virCheckNonNullArgGoto(disk, error); + if (flags & VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE && + threshold > 100000) { + virReportError(VIR_ERR_INVALID_ARG, + _("threshold in %s is larger than 100%%"), + __FUNCTION__); + goto error; + } + conn = dom->conn; + + if (conn->driver->domainBlockSetWriteThreshold) { + int ret; + ret = conn->driver->domainBlockSetWriteThreshold(dom, disk, threshold, + flags); + if (ret < 0) + goto error; + return ret; + } + virReportUnsupportedError(); + + error: + virDispatchError(dom->conn); + return -1; +} + + +/** * virDomainInterfaceStats: * @dom: pointer to the domain object * @path: path to the interface @@ -11182,6 +11275,8 @@ virConnectGetDomainCapabilities(virConnectPtr conn, * unsigned long long. * "block.<num>.errors" - Xen only: the 'oo_req' value as * unsigned long long. + * "block.<num>.write-threshold" - byte at which a write threshold event + * will fire, as unsigned long long. * "block.<num>.allocation" - offset of the highest written sector * as unsigned long long. * "block.<num>.capacity" - logical size in bytes of the block device backing diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 716dd2f..d2bf1fa 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -715,4 +715,9 @@ LIBVIRT_1.2.16 { virDomainSetUserPassword; } LIBVIRT_1.2.15; +LIBVIRT_1.2.17 { + global: + virDomainBlockSetWriteThreshold; +} LIBVIRT_1.2.16; + # .... define new API here using predicted next version number .... diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 4c47473..f334570 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -12056,6 +12056,27 @@ vshEventDeviceAddedPrint(virConnectPtr conn ATTRIBUTE_UNUSED, } static void +vshEventWriteThresholdPrint(virConnectPtr conn ATTRIBUTE_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + vshDomEventData *data = opaque; + + if (!data->loop && *data->count) + return; + vshPrint(data->ctl, + _("event 'write-threshold' for domain %s disk %s: threshold %lld " + "exceeded by %lld bytes\n"), + virDomainGetName(dom), alias, threshold, length); + (*data->count)++; + if (!data->loop) + vshEventDone(data->ctl); +} + +static void vshEventTunablePrint(virConnectPtr conn ATTRIBUTE_UNUSED, virDomainPtr dom, virTypedParameterPtr params, @@ -12162,6 +12183,8 @@ static vshEventCallback vshEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(vshEventAgentLifecyclePrint), }, { "device-added", VIR_DOMAIN_EVENT_CALLBACK(vshEventDeviceAddedPrint), }, + { "write-threshold", + VIR_DOMAIN_EVENT_CALLBACK(vshEventWriteThresholdPrint), }, }; verify(VIR_DOMAIN_EVENT_ID_LAST == ARRAY_CARDINALITY(vshEventCallbacks)); diff --git a/tools/virsh.pod b/tools/virsh.pod index 9b57c8c..84e8fc4 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -919,6 +919,7 @@ local file or block device, "block.<num>.fl.reqs" - total flush requests, "block.<num>.fl.times" - total time (ns) spent on cache flushing, "block.<num>.errors" - Xen only: the 'oo_req' value, +"block.<num>.write-threshold" - write threshold event trigger, in bytes, "block.<num>.allocation" - offset of highest written sector in bytes, "block.<num>.capacity" - logical size of source file in bytes, "block.<num>.physical" - physical size of source file in bytes -- 2.4.2

On Fri, Jun 12, 2015 at 13:29:25 -0600, Eric Blake wrote:
qemu 2.3 added a new QMP command block-set-write-threshold, which allows callers to get an interrupt when a file hits a write threshold, rather than the current approach of repeatedly polling for file allocation. This patch prepares the API for callers to register to receive the event, as well as a way to query the threshold via virDomainListGetStats().
The event is one-shot in qemu - a guest must re-register a new threshold after each time it triggers. However, the virConnectDomainEventRegisterAny() call does not allow parameterization, so callers must use a pair of APIs - one to register the callback (one-time call) that will be used each time a threshold triggers for any guest disk, and another to repeatedly set the desired threshold (must be called each time a threshold should be changed).
Note that the threshold can either be registered by a byte offset, or by a thousandth of a percentage (a value between 0 and 100000, scaled to the disk size). But the value is always reported as a byte offset, even when registered as a percentage. I also considered having the setup parameter be a double, to allow a finer resolution on percentage; with the choice of an integer fixed-point scale, this means a 100G disk can only set a threshold to a granularity of 1M, but that is probably sufficient for the usage.
To make the patch series more digestible, this patch intentionally omits remote support, by using a couple of placeholders at a point where the compiler forces the addition of a case label within a switch statement.
* include/libvirt/libvirt-domain.h (virDomainBlockSetWriteThreshold): New API. (virConnectDomainEventWriteThresholdCallback): New event. * src/libvirt_public.syms (LIBVIRT_1.2.17): Export it. * src/libvirt-domain.c (virDomainBlockSetWriteThreshold): New API. (virConnectGetAllDomainStats): New stat. * src/driver-hypervisor.h (virDrvDomainBlockSetWriteThreshold): New hypervisor entry point. * tools/virsh-domain.c (vshEventWriteThresholdPrint): Print new event. * tools/virsh.pod (domstats): Document new stat. * daemon/remote.c (domainEventCallbacks): Add stub. * src/conf/domain_event.c (virDomainEventDispatchDefaultFunc): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com> --- daemon/remote.c | 2 + include/libvirt/libvirt-domain.h | 47 ++++++++++++++++++++ src/conf/domain_event.c | 4 +- src/driver-hypervisor.h | 7 +++ src/libvirt-domain.c | 95 ++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++ tools/virsh-domain.c | 23 ++++++++++ tools/virsh.pod | 1 + 8 files changed, 183 insertions(+), 1 deletion(-)
...
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index d851225..7514656 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1297,6 +1297,17 @@ int virDomainBlockStatsFlags (virDomainPtr dom, virTypedParameterPtr params, int *nparams, unsigned int flags); + +typedef enum { + /* threshold is thousandth of a percentage (0 to 100000) relative to
You managed to choose a unusual unit. Commonly used ones are 1/1000 and 1/1 000 000. Financial world also uses 1/10 000. Your unit of 1/100 000 is not among: https://en.wikipedia.org/wiki/Parts-per_notation#Parts-per_expressions I'd again suggest to use 1/1 000 000. Or if you want to be uber preciese you might choose 1/(2^64 - 1).
+ * image size rather than byte limit */ + VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE = (1 << 0), +} virDomainBlockSetWriteThresholdFlags; +int virDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned int flags); + int virDomainInterfaceStats (virDomainPtr dom, const char *path, virDomainInterfaceStatsPtr stats, @@ -3246,6 +3257,41 @@ typedef void (*virConnectDomainEventDeviceAddedCallback)(virConnectPtr conn, void *opaque);
/** + * virConnectDomainEventWriteThresholdCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @devAlias: device alias + * @threshold: threshold that was exceeded, in bytes + * @length: length beyond @threshold that was involved in the triggering + * write, or 0 if not known + * @opaque: application specified data + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD with virConnectDomainEventRegisterAny() + * + * This callback occurs when a block device detects a write event that + * exceeds a non-zero threshold set by + * virDomainBlockSetWriteThreshold(). When this event occurs, the + * threshold is reset to 0, and a new limit must be installed to see + * the event again on the same device. The intent of this event is to + * allow time for the underlying storage to be resized dynamically + * prior to the point where the guest would be paused due to running + * out of space, without having to poll for allocation values. + * + * The contents of @devAlias will be "vda" when the threshold is triggered + * on the active layer of guest disk vda. Some hypervisors also support + * threshold reporting on backing images, such as during a block commit; + * when that happens, @devAlias will be "vda[1]" for the backingStore at + * index 1 within the chain of host resources for guest disk vda.
Is it perhaps worth to include a optional field that will contain the file path since most use cases of this event will use a local block device with the event? iscsi and NBD block devices then could return the field empty as we now do in the bulk stats API
+ */ +typedef void (*virConnectDomainEventWriteThresholdCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *devAlias, + unsigned long long threshold, + unsigned long long length, + void *opaque); + +/** * VIR_DOMAIN_TUNABLE_CPU_VCPUPIN: * * Macro represents formatted pinning for one vcpu specified by id which is ...
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 7e6d749..53114d3 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -5743,6 +5743,99 @@ virDomainBlockStatsFlags(virDomainPtr dom,
/** + * virDomainBlockSetWriteThreshold: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @threshold: limit at which a write threshold event can trigger + * @flags: bitwise-OR of virDomainBlockSetWriteThresholdFlags + * + * This function is used to set a one-shot write threshold. It should + * be used in tandem with virConnectDomainEventRegisterAny() + * installing a handler for VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD. If + * the hypervisor detects that a write request (whether guest data, or + * host metadata) would exceed the host byte offset specified in + * @threshold, then an event is raised, and the threshold is reset to + * 0 at that time. The event registration is only needed once, but + * this function must be called each time a new threshold is desired; + * the event will only fire if a non-zero threshold is + * exceeded. + * + * By default, @threshold is specified in bytes, and must not exceed + * the size of the block device. However, when @flags includes + * VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE, @threshold is + * instead a value between 0 an 100,000, as a thousandth of a percent + * of the current size of the disk, and the driver will compute the + * corresponding byte value. For example, 80000 represents 80.000%. + * A driver may round the requested threshold to a granularity that + * can actually be supported. + * + * Setting a threshold allows event-based resizing of host resources + * that back a guest disk without having to poll the current disk + * allocation, while still having enough time to complete the resize + * before the guest would end up halting due to insufficient space. + * Calling this function to set the threshold back to zero will stop + * further firing of the event. virConnectGetAllDomainStats() can be + * used to track the current threshold value, always in the form + * normalized to bytes. + * + * The @disk parameter is either the device target shorthand (the + * <target dev='...'/> sub-element, such as "vda"), or (since 0.9.8)
Since this will be added in 1.3.0 (or 1.2.15) the "since" statement is not exactly true.
+ * an unambiguous source name of the block device (the <source + * file='...'/> sub-element, such as "/path/to/image"). Valid names + * can be found by calling virDomainGetXMLDesc() and inspecting + * elements within //domain/devices/disk. Some drivers might also + * accept strings such as "vda[1]" for setting the threshold of a + * backing image, useful when doing a block commit into the backing + * image. + * + * Domains may have more than one block device. To set thresholds for + * each you should make multiple calls to this function. If write + * thresholds are not supported, an application will have to instead + * poll virDomainGetBlockInfo() or similar to track allocation. + * + * Returns -1 in case of error, 0 in case of success. + */
Otherwise looks good to me. Peter

On 06/15/2015 07:19 AM, Peter Krempa wrote:
On Fri, Jun 12, 2015 at 13:29:25 -0600, Eric Blake wrote:
qemu 2.3 added a new QMP command block-set-write-threshold, which allows callers to get an interrupt when a file hits a write threshold, rather than the current approach of repeatedly polling for file allocation. This patch prepares the API for callers to register to receive the event, as well as a way to query the threshold via virDomainListGetStats().
+ +typedef enum { + /* threshold is thousandth of a percentage (0 to 100000) relative to
You managed to choose a unusual unit. Commonly used ones are 1/1000 and 1/1 000 000. Financial world also uses 1/10 000. Your unit of 1/100 000 is not among:
https://en.wikipedia.org/wiki/Parts-per_notation#Parts-per_expressions
I'd again suggest to use 1/1 000 000. Or if you want to be uber preciese you might choose 1/(2^64 - 1).
Francesco, what precision would you like? Parts per million seems okay to me, if we want an order of magnitude closer; and I don't think we need anything beyond that. Or if parts per thousand is sufficient, that leads to smaller numbers on input. But it's pretty trivial for me to adjust the code to a different base, for whatever people would like.
+ * The contents of @devAlias will be "vda" when the threshold is triggered + * on the active layer of guest disk vda. Some hypervisors also support + * threshold reporting on backing images, such as during a block commit; + * when that happens, @devAlias will be "vda[1]" for the backingStore at + * index 1 within the chain of host resources for guest disk vda.
Is it perhaps worth to include a optional field that will contain the file path since most use cases of this event will use a local block device with the event? iscsi and NBD block devices then could return the field empty as we now do in the bulk stats API
Good idea - since the input accepts local path names instead of "vda", the output can display both where a local name is available. I'll update my code.
+ * + * The @disk parameter is either the device target shorthand (the + * <target dev='...'/> sub-element, such as "vda"), or (since 0.9.8)
Since this will be added in 1.3.0 (or 1.2.15) the "since" statement is not exactly true.
Copy-and-paste strikes again :) -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

----- Original Message -----
From: "Eric Blake" <eblake@redhat.com> To: "Peter Krempa" <pkrempa@redhat.com> Cc: libvir-list@redhat.com, fromani@redhat.com Sent: Monday, June 15, 2015 6:21:13 PM Subject: Re: [libvirt] [RFC PATCHv2 1/8] threshold: new API virDomainBlockSetWriteThreshold
On 06/15/2015 07:19 AM, Peter Krempa wrote:
On Fri, Jun 12, 2015 at 13:29:25 -0600, Eric Blake wrote:
qemu 2.3 added a new QMP command block-set-write-threshold, which allows callers to get an interrupt when a file hits a write threshold, rather than the current approach of repeatedly polling for file allocation. This patch prepares the API for callers to register to receive the event, as well as a way to query the threshold via virDomainListGetStats().
+ +typedef enum { + /* threshold is thousandth of a percentage (0 to 100000) relative to
You managed to choose a unusual unit. Commonly used ones are 1/1000 and 1/1 000 000. Financial world also uses 1/10 000. Your unit of 1/100 000 is not among:
https://en.wikipedia.org/wiki/Parts-per_notation#Parts-per_expressions
I'd again suggest to use 1/1 000 000. Or if you want to be uber preciese you might choose 1/(2^64 - 1).
Francesco, what precision would you like? Parts per million seems okay to me, if we want an order of magnitude closer; and I don't think we need anything beyond that. Or if parts per thousand is sufficient, that leads to smaller numbers on input. But it's pretty trivial for me to adjust the code to a different base, for whatever people would like.
We (in oVirt) use very coarse thresholds. For our current needs, I believe even parts per thousand is sufficient. Trying to be a bit forward thinking, I believe parts per million is perfectly fine. Bests, -- Francesco Romani RedHat Engineering Virtualization R & D Phone: 8261328 IRC: fromani

Add a new 'virsh domblkthreshold' command to use the new API. * tools/virsh.pod (domblkthreshold): Document it. * tools/virsh-domain-monitor.c (cmdDomblkthreshold): New function. (domMonitoringCmds): Register it. Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virsh-domain-monitor.c | 81 ++++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 24 +++++++++++++ 2 files changed, 105 insertions(+) diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index 1d4dc25..66f7571 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -571,6 +571,81 @@ cmdDomblklist(vshControl *ctl, const vshCmd *cmd) } /* + * "domblkthreshold" command + */ +static const vshCmdInfo info_domblkthreshold[] = { + {.name = "help", + .data = N_("set domain block device write thresholds") + }, + {.name = "desc", + .data = N_("Set a threshold to get a one-shot event if block " + "allocation exceeds that size") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_domblkthreshold[] = { + {.name = "domain", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("domain name, id or uuid"), + }, + {.name = "device", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("block device"), + }, + {.name = "threshold", + .type = VSH_OT_INT, + .flags = VSH_OFLAG_REQ, + .help = N_("new threshold, or 0 to disable"), + }, + {.name = "percentage", + .type = VSH_OT_BOOL, + .help = N_("threshold is in units of .001% instead of bytes"), + }, + {.name = NULL} +}; + +static bool +cmdDomblkthreshold(vshControl *ctl, const vshCmd *cmd) +{ + virDomainPtr dom; + bool ret = false; + const char *device = NULL; + unsigned long long threshold; + bool percentage = vshCommandOptBool(cmd, "percentage"); + unsigned int flags = 0; + + if (percentage) + flags |= VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE; + + if (!(dom = vshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (vshCommandOptStringReq(ctl, cmd, "device", &device) < 0) + goto cleanup; + if (vshCommandOptULongLong(ctl, cmd, "threshold", &threshold) < 0) + goto cleanup; + + if (virDomainBlockSetWriteThreshold(dom, device, threshold, flags) < 0) + goto cleanup; + + if (percentage) + vshPrint(ctl, _("threshold of %s set to %llu.%03llu%%\n"), + device, threshold / 1000, threshold % 1000); + else + vshPrint(ctl, _("threshold of %s set to %llu bytes\n"), + device, threshold); + + ret = true; + + cleanup: + virDomainFree(dom); + return ret; +} + +/* * "domiflist" command */ static const vshCmdInfo info_domiflist[] = { @@ -2358,6 +2433,12 @@ const vshCmdDef domMonitoringCmds[] = { .info = info_domblkstat, .flags = 0 }, + {.name = "domblkthreshold", + .handler = cmdDomblkthreshold, + .opts = opts_domblkthreshold, + .info = info_domblkthreshold, + .flags = 0 + }, {.name = "domcontrol", .handler = cmdDomControl, .opts = opts_domcontrol, diff --git a/tools/virsh.pod b/tools/virsh.pod index 84e8fc4..1c94651 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -843,6 +843,30 @@ that require a block device name (such as I<domblkinfo> or I<snapshot-create> for disk snapshots) will accept either target or unique source names printed by this command. +=item B<domblkthreshold> I<domain> I<block-device> I<threshold> +[I<--percentage>] + +Set the write threshold for a block device of a domain. A +I<block-device> corresponds to a unique target name (<target +dev='name'/>) or source file (<source file='name'/>) for one of the +disk devices attached to I<domain> (see also B<domblklist> for listing +these names). Some hypervisors also allow "vda[1]" to set a threshold +on the first backing file of the target "vda", useful when doing a +block commit. + +By default, I<threshold> is the byte offset within the host that will +trigger an event; but if I<--percentage> is used, it is instead a +thousandth of a percent of the disk size (80000 maps to an 80% +threshold). + +Setting a write threshold causes an event to be delivered if the +allocation of I<block-device> passes that point, allowing a user to +resize the underlying storage before the guest would be forcefully +paused due to an ENOSPC scenario. The B<event> command can be used to +listen for such events. If the hypervisor does not support event +notification, then B<domblkinfo> must instead be polled to track +allocation over time. The current threshold is listed in B<domstats>. + =item B<domstats> [I<--raw>] [I<--enforce>] [I<--backing>] [I<--state>] [I<--cpu-total>] [I<--balloon>] [I<--vcpu>] [I<--interface>] [I<--block>] [[I<--list-active>] [I<--list-inactive>] [I<--list-persistent>] -- 2.4.2

On Fri, Jun 12, 2015 at 13:29:26 -0600, Eric Blake wrote:
Add a new 'virsh domblkthreshold' command to use the new API.
* tools/virsh.pod (domblkthreshold): Document it. * tools/virsh-domain-monitor.c (cmdDomblkthreshold): New function. (domMonitoringCmds): Register it.
Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virsh-domain-monitor.c | 81 ++++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 24 +++++++++++++ 2 files changed, 105 insertions(+)
diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index 1d4dc25..66f7571 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -571,6 +571,81 @@ cmdDomblklist(vshControl *ctl, const vshCmd *cmd) }
/* + * "domblkthreshold" command + */ +static const vshCmdInfo info_domblkthreshold[] = { + {.name = "help", + .data = N_("set domain block device write thresholds") + }, + {.name = "desc", + .data = N_("Set a threshold to get a one-shot event if block " + "allocation exceeds that size") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_domblkthreshold[] = { + {.name = "domain", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("domain name, id or uuid"), + }, + {.name = "device", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("block device"), + }, + {.name = "threshold", + .type = VSH_OT_INT, + .flags = VSH_OFLAG_REQ, + .help = N_("new threshold, or 0 to disable"), + }, + {.name = "percentage",
I'd rather use "proportional" or something else that does not hint to parts per hundred.
+ .type = VSH_OT_BOOL, + .help = N_("threshold is in units of .001% instead of bytes"), + }, + {.name = NULL} +}; + +static bool +cmdDomblkthreshold(vshControl *ctl, const vshCmd *cmd) +{ + virDomainPtr dom; + bool ret = false; + const char *device = NULL; + unsigned long long threshold; + bool percentage = vshCommandOptBool(cmd, "percentage"); + unsigned int flags = 0; + + if (percentage) + flags |= VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE;
As well as here.
+ + if (!(dom = vshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (vshCommandOptStringReq(ctl, cmd, "device", &device) < 0) + goto cleanup; + if (vshCommandOptULongLong(ctl, cmd, "threshold", &threshold) < 0) + goto cleanup; +
Peter

On 06/15/2015 07:29 AM, Peter Krempa wrote:
On Fri, Jun 12, 2015 at 13:29:26 -0600, Eric Blake wrote:
Add a new 'virsh domblkthreshold' command to use the new API.
* tools/virsh.pod (domblkthreshold): Document it. * tools/virsh-domain-monitor.c (cmdDomblkthreshold): New function. (domMonitoringCmds): Register it.
Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virsh-domain-monitor.c | 81 ++++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 24 +++++++++++++ 2 files changed, 105 insertions(+)
+ {.name = "percentage",
I'd rather use "proportional" or something else that does not hint to parts per hundred.
Good idea. It also means I need to tweak the flag name in patch 1. I also wonder if I should teach virsh to allow '--percentage 72.5' (meaning 72.5%) in addition to '--proportion 725000' (if we switch to parts-per-million)? Floating point parameter parsing in virsh could be interesting...
+ if (percentage) + flags |= VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE;
As well as here.
Ah, so you caught the same naming conundrum as I mentioned above :) -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Common code for tracking a write threshold event from creation to delivery through a callback. * src/conf/domain_event.c (_virDomainEventWriteThreshold): New struct. (virDomainEventsOnceInit, virDomainEventWriteThresholdDispose) (virDomainEventWriteThresholdNew) (virDomainEventWriteThresholdNewFromObj) (virDomainEventWriteThresholdNewFromDom) (virDomainEventDispatchDefaultFunc): Wire it up. * src/conf/domain_event.h: Expose new functions. * src/libvirt-private.syms (domain_event.h): Export them. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/conf/domain_event.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++- src/conf/domain_event.h | 14 +++++++- src/libvirt_private.syms | 2 ++ 3 files changed, 105 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index c43799f..968ce94 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -56,6 +56,7 @@ static virClassPtr virDomainQemuMonitorEventClass; static virClassPtr virDomainEventTunableClass; static virClassPtr virDomainEventAgentLifecycleClass; static virClassPtr virDomainEventDeviceAddedClass; +static virClassPtr virDomainEventWriteThresholdClass; static void virDomainEventDispose(void *obj); @@ -74,6 +75,7 @@ static void virDomainQemuMonitorEventDispose(void *obj); static void virDomainEventTunableDispose(void *obj); static void virDomainEventAgentLifecycleDispose(void *obj); static void virDomainEventDeviceAddedDispose(void *obj); +static void virDomainEventWriteThresholdDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -236,6 +238,16 @@ struct _virDomainEventAgentLifecycle { typedef struct _virDomainEventAgentLifecycle virDomainEventAgentLifecycle; typedef virDomainEventAgentLifecycle *virDomainEventAgentLifecyclePtr; +struct _virDomainEventWriteThreshold { + virDomainEvent parent; + + char *disk; + unsigned long long threshold; + unsigned long long length; +}; +typedef struct _virDomainEventWriteThreshold virDomainEventWriteThreshold; +typedef virDomainEventWriteThreshold *virDomainEventWriteThresholdPtr; + static int virDomainEventsOnceInit(void) @@ -336,6 +348,12 @@ virDomainEventsOnceInit(void) sizeof(virDomainEventAgentLifecycle), virDomainEventAgentLifecycleDispose))) return -1; + if (!(virDomainEventWriteThresholdClass = + virClassNew(virDomainEventClass, + "virDomainEventWriteThreshold", + sizeof(virDomainEventWriteThreshold), + virDomainEventWriteThresholdDispose))) + return -1; return 0; } @@ -496,6 +514,15 @@ virDomainEventAgentLifecycleDispose(void *obj) VIR_DEBUG("obj=%p", event); }; +static void +virDomainEventWriteThresholdDispose(void *obj) +{ + virDomainEventWriteThresholdPtr event = obj; + VIR_DEBUG("obj=%p", event); + + VIR_FREE(event->disk); +}; + static void * virDomainEventNew(virClassPtr klass, @@ -1390,6 +1417,57 @@ virDomainEventTunableNewFromDom(virDomainPtr dom, nparams); } +static virObjectEventPtr +virDomainEventWriteThresholdNew(int id, + const char *name, + unsigned char *uuid, + const char *disk, + unsigned long long threshold, + unsigned long long length) +{ + virDomainEventWriteThresholdPtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventWriteThresholdClass, + VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD, + id, name, uuid))) + return NULL; + + if (VIR_STRDUP(ev->disk, disk) < 0) + goto error; + ev->threshold = threshold; + ev->length = length; + + return (virObjectEventPtr)ev; + + error: + virObjectUnref(ev); + return NULL; +} + +virObjectEventPtr +virDomainEventWriteThresholdNewFromObj(virDomainObjPtr obj, + const char *disk, + unsigned long long threshold, + unsigned long long length) +{ + return virDomainEventWriteThresholdNew(obj->def->id, obj->def->name, + obj->def->uuid, disk, + threshold, length); +} + +virObjectEventPtr +virDomainEventWriteThresholdNewFromDom(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned long long length) +{ + return virDomainEventWriteThresholdNew(dom->id, dom->name, dom->uuid, + disk, threshold, length); +} + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -1615,7 +1693,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, } case VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD: - /* TODO: Implement RPC support for this */ + { + virDomainEventWriteThresholdPtr ev; + + ev = (virDomainEventWriteThresholdPtr)event; + ((virConnectDomainEventWriteThresholdCallback)cb)(conn, dom, + ev->disk, + ev->threshold, + ev->length, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index afbed89..3421fef 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -1,7 +1,7 @@ /* * domain_event.h: domain event queue processing helpers * - * Copyright (C) 2012-2014 Red Hat, Inc. + * Copyright (C) 2012-2015 Red Hat, Inc. * Copyright (C) 2008 VirtualIron * Copyright (C) 2013 SUSE LINUX Products GmbH, Nuernberg, Germany. * @@ -209,6 +209,18 @@ virDomainEventAgentLifecycleNewFromDom(virDomainPtr dom, int state, int reason); +virObjectEventPtr +virDomainEventWriteThresholdNewFromObj(virDomainObjPtr obj, + const char *disk, + unsigned long long threshold, + unsigned long long length); +virObjectEventPtr +virDomainEventWriteThresholdNewFromDom(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned long long length); + + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventStatePtr state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dc8a52d..bd1a600 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -534,6 +534,8 @@ virDomainEventTunableNewFromDom; virDomainEventTunableNewFromObj; virDomainEventWatchdogNewFromDom; virDomainEventWatchdogNewFromObj; +virDomainEventWriteThresholdNewFromDom; +virDomainEventWriteThresholdNewFromObj; virDomainQemuMonitorEventNew; virDomainQemuMonitorEventStateRegisterID; -- 2.4.2

Pass the new virDomainBlockSetWriteThreshold() API and WRITE_THRESHOLD event across RPC. The event takes the bulk of the patch, but everything here is pretty mechanical and copies patterns from existing code. I debated about splitting the event separately from the API, in case we ever want to support setting a write threshold by overloading an existing API such as virDomainSetBlockIoTune(); but consensus on the design was that overloading existing API was not appropriate, and that no one is expecting to use this functionality without also being prepared to require the .so bump. * daemon/remote.c (remoteRelayDomainEventDeviceAdded): Fix odd use of comma operator. (remoteRelayDomainEventWriteThreshold): New function. * src/remote/remote_driver.c (remoteDomainBuildEventCallbackWriteThreshold): Likewise. * src/remote/remote_protocol.x (remote_domain_event_callback_write_threshold_msg) (remote_domain_block_set_write_threshold_args): New structs. * src/remote_protocol-structs: Update. Signed-off-by: Eric Blake <eblake@redhat.com> --- daemon/remote.c | 41 +++++++++++++++++++++++++++++++++++++++-- src/remote/remote_driver.c | 34 ++++++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 29 ++++++++++++++++++++++++++++- src/remote_protocol-structs | 15 +++++++++++++++ 4 files changed, 116 insertions(+), 3 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index 283ece2..cc9936b 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1079,6 +1079,44 @@ remoteRelayDomainEventDeviceAdded(virConnectPtr conn, } +static int +remoteRelayDomainEventWriteThreshold(virConnectPtr conn, + virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + daemonClientEventCallbackPtr callback = opaque; + remote_domain_event_callback_write_threshold_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + VIR_DEBUG("Relaying domain write threshold event %s %d %s %lld %lld, " + "callback %d", dom->name, dom->id, + disk, threshold, length, callback->callbackID); + + /* build return data */ + memset(&data, 0, sizeof(data)); + + if (VIR_STRDUP(data.disk, disk) < 0) + return -1; + + make_nonnull_domain(&data.dom, dom); + data.callbackID = callback->callbackID; + data.threshold = threshold; + data.length = length; + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD, + (xdrproc_t)xdr_remote_domain_event_callback_write_threshold_msg, + &data); + + return 0; +} + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { @@ -1102,8 +1140,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventTunable), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventAgentLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventDeviceAdded), - /* TODO: Implement RPC support for this */ - VIR_DOMAIN_EVENT_CALLBACK(NULL), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventWriteThreshold), }; verify(ARRAY_CARDINALITY(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 273799b..2e631ba 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -342,6 +342,12 @@ remoteDomainBuildEventCallbackAgentLifecycle(virNetClientProgramPtr prog, void *evdata, void *opaque); static void +remoteDomainBuildEventCallbackWriteThreshold(virNetClientProgramPtr prog, + virNetClientPtr client, + void *evdata, void *opaque); + + +static void remoteNetworkBuildEventLifecycle(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, virNetClientPtr client ATTRIBUTE_UNUSED, void *evdata, void *opaque); @@ -504,6 +510,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventCallbackDeviceAdded, sizeof(remote_domain_event_callback_device_added_msg), (xdrproc_t)xdr_remote_domain_event_callback_device_added_msg }, + { REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD, + remoteDomainBuildEventCallbackWriteThreshold, + sizeof(remote_domain_event_callback_write_threshold_msg), + (xdrproc_t)xdr_remote_domain_event_callback_write_threshold_msg }, }; @@ -5518,6 +5528,29 @@ remoteDomainBuildEventCallbackAgentLifecycle(virNetClientProgramPtr prog ATTRIBU } static void +remoteDomainBuildEventCallbackWriteThreshold(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, + virNetClientPtr client ATTRIBUTE_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_callback_write_threshold_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEventPtr event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventWriteThresholdNewFromDom(dom, msg->disk, + msg->threshold, + msg->length); + + virObjectUnref(dom); + + remoteEventQueue(priv, event, msg->callbackID); +} + +static void remoteNetworkBuildEventLifecycle(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, virNetClientPtr client ATTRIBUTE_UNUSED, void *evdata, void *opaque) @@ -8275,6 +8308,7 @@ static virHypervisorDriver hypervisor_driver = { .domainBlockResize = remoteDomainBlockResize, /* 0.9.8 */ .domainBlockStats = remoteDomainBlockStats, /* 0.3.2 */ .domainBlockStatsFlags = remoteDomainBlockStatsFlags, /* 0.9.5 */ + .domainBlockSetWriteThreshold = remoteDomainBlockSetWriteThreshold, /* 1.2.17 */ .domainInterfaceStats = remoteDomainInterfaceStats, /* 0.3.2 */ .domainSetInterfaceParameters = remoteDomainSetInterfaceParameters, /* 0.9.9 */ .domainGetInterfaceParameters = remoteDomainGetInterfaceParameters, /* 0.9.9 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 9f1be6b..93f0952 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3230,6 +3230,21 @@ struct remote_domain_set_user_password_args { unsigned int flags; }; +struct remote_domain_event_callback_write_threshold_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string disk; + unsigned hyper threshold; + unsigned hyper length; +}; + +struct remote_domain_block_set_write_threshold_args { + remote_nonnull_domain dom; + remote_nonnull_string disk; + unsigned hyper threshold; + unsigned int flags; +}; + /*----- Protocol. -----*/ @@ -5696,5 +5711,17 @@ enum remote_procedure { * @generate:both * @acl: domain:set_password */ - REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357 + REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357, + + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD = 358, + + /** + * @generate: both + * @acl: domain:block_write + */ + REMOTE_PROC_DOMAIN_BLOCK_SET_WRITE_THRESHOLD = 359 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 48c3bd8..991357f 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -2684,6 +2684,19 @@ struct remote_domain_set_user_password_args { remote_string password; u_int flags; }; +struct remote_domain_event_callback_write_threshold_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string disk; + uint64_t threshold; + uint64_t length; +}; +struct remote_domain_block_set_write_threshold_args { + remote_nonnull_domain dom; + remote_nonnull_string disk; + uint64_t threshold; + u_int flags; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3042,4 +3055,6 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_ADD_IOTHREAD = 355, REMOTE_PROC_DOMAIN_DEL_IOTHREAD = 356, REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357, + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD = 358, + REMOTE_PROC_DOMAIN_BLOCK_SET_WRITE_THRESHOLD = 359, }; -- 2.4.2

Track whether qemu is new enough to do block thresholds on the active layer. The plan is that even if qemu is too old, the event handler can still be registered, but will never fire (since it is useful to bulk-install handlers); while the request to set a threshold will honor the capability bit and fail up front if it is not possible. FIXME: Note that qemu requires that libvirt use a node name and not a device name to actually use the feature. What's more, a single qcow2 host resource results in two separate qemu nodes (one node for the guest view served by qcow2 protocol, the other node for the underlying host file access), so I'm working on a patch to qemu to automatically name all nodes (rather than having to hack up libvirt to supply two separate node names for a much more complex command line), at which point I'll need code in libvirt to probe the node name that got assigned by qemu. So I may still need another capability bit for whether qemu is new enough to have the patch to auto-name all nodes. * src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_WRITE_THRESHOLD): New bit. * src/qemu/qemu_capabilities.c (virQEMUCapsCommands): Enable it. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_capabilities.c | 4 +++- src/qemu/qemu_capabilities.h | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 8a64422..81c63088 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1,7 +1,7 @@ /* * qemu_capabilities.c: QEMU capabilities generation * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -287,6 +287,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "aarch64-off", "vhost-user-multiq", /* 190 */ + "block-write-threshold", ); @@ -1495,6 +1496,7 @@ struct virQEMUCapsStringFlags virQEMUCapsCommands[] = { { "nbd-server-start", QEMU_CAPS_NBD_SERVER }, { "change-backing-file", QEMU_CAPS_CHANGE_BACKING_FILE }, { "rtc-reset-reinjection", QEMU_CAPS_RTC_RESET_REINJECTION }, + { "block-set-write-threshold", QEMU_CAPS_BLOCK_WRITE_THRESHOLD }, }; struct virQEMUCapsStringFlags virQEMUCapsMigration[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 3c166b6..e018043 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -1,7 +1,7 @@ /* * qemu_capabilities.h: QEMU capabilities generation * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -230,6 +230,7 @@ typedef enum { QEMU_CAPS_DEVICE_PCI_SERIAL = 188, /* -device pci-serial */ QEMU_CAPS_CPU_AARCH64_OFF = 189, /* -cpu ...,aarch64=off */ QEMU_CAPS_VHOSTUSER_MULTIQ = 190, /* vhost-user with -netdev queues= */ + QEMU_CAPS_BLOCK_WRITE_THRESHOLD = 191, /* block-set-write-threshold */ QEMU_CAPS_LAST, /* this must always be the last item */ } virQEMUCapsFlags; -- 2.4.2

On Fri, Jun 12, 2015 at 13:29:29 -0600, Eric Blake wrote:
Track whether qemu is new enough to do block thresholds on the active layer. The plan is that even if qemu is too old, the event handler can still be registered, but will never fire (since
Well the event handler can be registered, but the API for setting the actual threshold value should return failure in case when qemu does not support it.
it is useful to bulk-install handlers); while the request to set a threshold will honor the capability bit and fail up front if it is not possible.
FIXME: Note that qemu requires that libvirt use a node name and not a device name to actually use the feature. What's more, a single qcow2 host resource results in two separate qemu nodes (one node for the guest view served by qcow2 protocol, the other node for the underlying host file access), so I'm working on a patch to qemu to automatically name all nodes (rather than having to hack up libvirt to supply two separate node names for a much more complex command line), at which point I'll need code in libvirt to probe the node name that got assigned by qemu. So I may still need another capability bit for whether qemu is new enough to have the patch to auto-name all nodes.
Or perhaps just refuse to use this unless qemu is new enough to support it?
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_WRITE_THRESHOLD): New bit. * src/qemu/qemu_capabilities.c (virQEMUCapsCommands): Enable it.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_capabilities.c | 4 +++- src/qemu/qemu_capabilities.h | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-)

On 06/15/2015 07:33 AM, Peter Krempa wrote:
On Fri, Jun 12, 2015 at 13:29:29 -0600, Eric Blake wrote:
Track whether qemu is new enough to do block thresholds on the active layer. The plan is that even if qemu is too old, the event handler can still be registered, but will never fire (since
Well the event handler can be registered, but the API for setting the actual threshold value should return failure in case when qemu does not support it.
it is useful to bulk-install handlers); while the request to set a threshold will honor the capability bit and fail up front if it is not possible.
Yes, that's what the second half of the sentence was trying to say. I can reorder things to make it more obvious: If qemu is too old, attempts to register a threshold value will return failure; but the event handler callback can still be registered successfully (and will just never fire).
FIXME: Note that qemu requires that libvirt use a node name and not a device name to actually use the feature. What's more, a single qcow2 host resource results in two separate qemu nodes (one node for the guest view served by qcow2 protocol, the other node for the underlying host file access), so I'm working on a patch to qemu to automatically name all nodes (rather than having to hack up libvirt to supply two separate node names for a much more complex command line), at which point I'll need code in libvirt to probe the node name that got assigned by qemu. So I may still need another capability bit for whether qemu is new enough to have the patch to auto-name all nodes.
Or perhaps just refuse to use this unless qemu is new enough to support it?
Yeah, after the weekend, my current work-in-progress is to patch qemu to auto-name nodes, and to make libvirt refuse to allow threshold settings unless auto-named nodes are detected in addition to the write-threshold command, since the qemu auto-name patch can be fairly easy backported by distros. It's thus turning into two capability bits, both of which must be present to use thresholds (but where the callback can be registered even if either or both capabilities are missing, just that the callback will never fire in those cases). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

With this patch, block write threshold events delivered by qemu are converted into libvirt events. FIXME: Qemu reports stats based on node name, but my plan is to rely on a patch where qemu auto-assigns node names rather than libvirt having to create and track it all. However, there's still the matter that libvirt needs to map a node name back to the right host resource. I tested this patch using a qemu hack patch: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg02858.html which reported the device name for the active layer node, but the active layer node is the guest view, not the host view. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_process.c (qemuProcessHandleBlockThreshold): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_monitor.c | 14 ++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 33 +++++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 34 ++++++++++++++++++++++++++++++++++ 4 files changed, 92 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 33600f0..27e5a32 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1426,6 +1426,20 @@ qemuMonitorEmitBlockJob(qemuMonitorPtr mon, } +int qemuMonitorEmitBlockThreshold(qemuMonitorPtr mon, + const char *diskAlias, + unsigned long long threshold, + unsigned long long length) +{ + int ret = -1; + VIR_DEBUG("mon=%p", mon); + + QEMU_MONITOR_CALLBACK(mon, ret, domainBlockThreshold, mon->vm, + diskAlias, threshold, length); + return ret; +} + + int qemuMonitorEmitBalloonChange(qemuMonitorPtr mon, unsigned long long actual) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d30b514..381c0a6 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -146,6 +146,12 @@ typedef int (*qemuMonitorDomainBlockJobCallback)(qemuMonitorPtr mon, int type, int status, void *opaque); +typedef int (*qemuMonitorDomainBlockThresholdCallback)(qemuMonitorPtr mon, + virDomainObjPtr vm, + const char *diskAlias, + unsigned long long threshold, + unsigned long long length, + void *opaque); typedef int (*qemuMonitorDomainTrayChangeCallback)(qemuMonitorPtr mon, virDomainObjPtr vm, const char *devAlias, @@ -200,6 +206,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainIOErrorCallback domainIOError; qemuMonitorDomainGraphicsCallback domainGraphics; qemuMonitorDomainBlockJobCallback domainBlockJob; + qemuMonitorDomainBlockThresholdCallback domainBlockThreshold; qemuMonitorDomainTrayChangeCallback domainTrayChange; qemuMonitorDomainPMWakeupCallback domainPMWakeup; qemuMonitorDomainPMSuspendCallback domainPMSuspend; @@ -296,6 +303,10 @@ int qemuMonitorEmitBlockJob(qemuMonitorPtr mon, const char *diskAlias, int type, int status); +int qemuMonitorEmitBlockThreshold(qemuMonitorPtr mon, + const char *diskAlias, + unsigned long long threshold, + unsigned long long length); int qemuMonitorEmitBalloonChange(qemuMonitorPtr mon, unsigned long long actual); int qemuMonitorEmitPMSuspendDisk(qemuMonitorPtr mon); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 13c57d2..8dd1279 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -77,6 +77,8 @@ static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr d static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockWriteThreshold(qemuMonitorPtr mon, + virJSONValuePtr data); static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspendDisk(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleGuestPanic(qemuMonitorPtr mon, virJSONValuePtr data); @@ -95,6 +97,7 @@ static qemuEventHandler eventHandlers[] = { { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, { "BLOCK_JOB_READY", qemuMonitorJSONHandleBlockJobReady, }, + { "BLOCK_WRITE_THRESHOLD", qemuMonitorJSONHandleBlockWriteThreshold, }, { "DEVICE_DELETED", qemuMonitorJSONHandleDeviceDeleted, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, @@ -845,6 +848,36 @@ qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, VIR_DOMAIN_BLOCK_JOB_READY); } + +static void +qemuMonitorJSONHandleBlockWriteThreshold(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + const char *device; + unsigned long long offset = 0, len = 0; + + /* TODO: Once we support node names for more than just the active + * layer, we will need to map this into 'vda[1]' notation. */ + if ((device = virJSONValueObjectGetString(data, "node-name")) == NULL) { + VIR_WARN("missing device in block threshold event"); + goto out; + } + + if (virJSONValueObjectGetNumberUlong(data, "write-threshold", + &offset) < 0) { + VIR_WARN("missing threshold in block threshold event"); + goto out; + } + + if (virJSONValueObjectGetNumberUlong(data, "amount-exceeded", &len) < 0) { + VIR_WARN("missing len in block threshold event"); + goto out; + } + + out: + qemuMonitorEmitBlockThreshold(mon, device, offset, len); +} + static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 64ee049..3109c80 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1037,6 +1037,39 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, static int +qemuProcessHandleBlockThreshold(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *diskAlias, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + virQEMUDriverPtr driver = opaque; + virDomainDiskDefPtr disk; + virObjectEventPtr event = NULL; + + virObjectLock(vm); + + VIR_DEBUG("Block threshold for device %s (domain: %p,%s) threshold %llu " + "length %llu", diskAlias, vm, vm->def->name, threshold, length); + + if (!(disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias))) + goto cleanup; + + /* TODO: Once we support node names for more than just the active + * layer, we will need to map this into 'vda[1]' notation. */ + event = virDomainEventWriteThresholdNewFromObj(vm, disk->dst, threshold, + length); + + cleanup: + virObjectUnlock(vm); + if (event) + qemuDomainEventQueue(driver, event); + return 0; +} + + +static int qemuProcessHandleGraphics(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm, int phase, @@ -1495,6 +1528,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainIOError = qemuProcessHandleIOError, .domainGraphics = qemuProcessHandleGraphics, .domainBlockJob = qemuProcessHandleBlockJob, + .domainBlockThreshold = qemuProcessHandleBlockThreshold, .domainTrayChange = qemuProcessHandleTrayChange, .domainPMWakeup = qemuProcessHandlePMWakeup, .domainPMSuspend = qemuProcessHandlePMSuspend, -- 2.4.2

Expose threshold information by collecting the information from QMP commands. qemu has a way to get threshold information on all elements of a block chain, but only if libvirt provides node names for each node, and then correlates 'query-named-block-nodes' back to the names on each chain element. So for now, we only worry about the threshold of the active layer of a chain. * src/qemu/qemu_monitor.h (_qemuBlockStats): Add member. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockStatsUpdateCapacity): Populate it. * src/qemu/qemu_driver.c (qemuDomainGetStatsOneBlock): Expose it. * tests/qemumonitortest.c (testBlockInfoData): Deal with new member. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 16 ++++++++++++++++ tests/qemumonitortest.c | 13 +++++++------ 4 files changed, 30 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6bb8549..74a6680 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19370,6 +19370,12 @@ qemuDomainGetStatsOneBlock(virQEMUDriverPtr driver, QEMU_ADD_BLOCK_PARAM_LL(record, maxparams, block_idx, "fl.times", entry->flush_total_times); + /* TODO: Until we can set thresholds on backing elements, we only + * report the threshold on the active layer. */ + if (!backing_idx) + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, + "write-threshold", entry->write_threshold); + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, "allocation", entry->wr_highest_offset); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 381c0a6..45afba4 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -384,6 +384,7 @@ struct _qemuBlockStats { unsigned long long capacity; unsigned long long physical; unsigned long long wr_highest_offset; + unsigned long long write_threshold; }; int qemuMonitorGetAllBlockStatsInfo(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 8dd1279..2cb3c4b 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1942,6 +1942,7 @@ qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, virJSONValuePtr dev = virJSONValueArrayGet(devices, i); virJSONValuePtr inserted; virJSONValuePtr image; + virJSONValuePtr threshold; const char *dev_name; if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { @@ -1967,6 +1968,21 @@ qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, stats, backingChain) < 0) goto cleanup; + + /* For the top layer only, additionally populate the + * write_threshold stat. TODO: populating it for other layers + * requires naming all nodes, and correlating + * query-named-block-nodes back to the data learned here. */ + if ((threshold = virJSONValueObjectGet(inserted, "write_threshold"))) { + char *entry_name = qemuDomainStorageAlias(dev_name, 0); + qemuBlockStatsPtr bstats = virHashLookup(stats, entry_name); + VIR_FREE(entry_name); + + if (bstats && + virJSONValueGetNumberUlong(threshold, + &bstats->write_threshold) < 0) + goto cleanup; + } } ret = 0; diff --git a/tests/qemumonitortest.c b/tests/qemumonitortest.c index 0125962..6dfb7ad 100644 --- a/tests/qemumonitortest.c +++ b/tests/qemumonitortest.c @@ -93,12 +93,13 @@ struct blockInfoData { static const struct blockInfoData testBlockInfoData[] = { -/* NAME, rd_req, rd_bytes, wr_req, wr_bytes, rd_total_time, wr_total_time, flush_req, flush_total_time */ - {"vda", {11, 12, 13, 14, 15, 16, 17, 18, 0, 0, 0}}, - {"vdb", {21, 22, 23, 24, 25, 26, 27, 28, 0, 0, 0}}, - {"vdc", {31, 32, 33, -1, 35, 36, 37, 38, 0, 0, 0}}, - {"vdd", {-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0}}, - {"vde", {41, 42, 43, 44, 45, 46, 47, 48, 0, 0, 0}} + /* NAME, rd_req, rd_bytes, wr_req, wr_bytes, rd_total_time, wr_total_time, + flush_req, flush_total_time, write_threshold */ + {"vda", {11, 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, 0}}, + {"vdb", {21, 22, 23, 24, 25, 26, 27, 28, 0, 0, 0, 0}}, + {"vdc", {31, 32, 33, -1, 35, 36, 37, 38, 0, 0, 0, 0}}, + {"vdd", {-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0}}, + {"vde", {41, 42, 43, 44, 45, 46, 47, 48, 0, 0, 0, 0}} }; static const char testBlockInfoReply[] = -- 2.4.2

Time to wire up the new API to call into the QMP command for setting a write threshold. FIXME: This patch does not quite do the right thing. At one point, I tried patching qemu to allow thresholds tied to the device, rather than to a node name, since libvirt is not yet setting node names: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg02023.html But that is the wrong node (remember, each qcow2 resource has two nodes; one for the guest view, which is tied to the device; and another for the host view which is tied to managing the underlying file; we want the allocation of the underlying file based on host offsets, and not the offset seen by the guest). So this patch will need tweaking to coordinate with a version of qemu that auto-assigns node names, to learn the correct node name rather than merely reusing the device name. * src/qemu/qemu_driver.c (qemuDomainBlockSetWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockSetWriteThreshold): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 98 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 13 ++++++ src/qemu/qemu_monitor.h | 5 +++ src/qemu/qemu_monitor_json.c | 30 ++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ 5 files changed, 149 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 74a6680..43073d9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -17994,6 +17994,103 @@ qemuDomainGetBlockIoTune(virDomainPtr dom, return ret; } + +static int +qemuDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *path, + unsigned long long threshold, + unsigned int flags) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + virDomainDiskDefPtr disk = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg = NULL; + char *node = NULL; + int ret = -1; + + virCheckFlags(VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE, -1); + + if (!(vm = qemuDomObjFromDomain(dom))) + return -1; + priv = vm->privateData; + + if (virDomainBlockSetWriteThresholdEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCK_WRITE_THRESHOLD)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("block write threshold not supported by this " + "qemu binary")); + goto cleanup; + } + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + if (!(disk = virDomainDiskByName(vm->def, path, false))) { + virReportError(VIR_ERR_INVALID_ARG, + _("invalid path %s not assigned to domain"), path); + goto endjob; + } + + if (virStorageSourceIsEmpty(disk->src)) { + virReportError(VIR_ERR_INVALID_ARG, + _("disk '%s' does not currently have a source assigned"), + path); + goto endjob; + } + if (virAsprintf(&node, "drive-%s", disk->info.alias) < 0) + goto endjob; + + cfg = virQEMUDriverGetConfig(driver); + if (qemuStorageLimitsRefresh(driver, cfg, vm, disk->src) < 0) + goto endjob; + + if (flags & VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE) { + /* Caller already sanitized max value to 100000. Use of + * floating point intermediary reduces (but does not + * eliminate) rounding error, but since we already document a + * granularity of a thousandth of a percentage, it shouldn't + * matter too much if the answer is a few bytes off. */ + threshold *= disk->src->physical / 100000.0; + } else { + if (threshold > disk->src->physical) { + virReportError(VIR_ERR_INVALID_ARG, + _("threshold %lld exceeds disk size %lld"), + threshold, disk->src->physical); + goto endjob; + } + } + + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorBlockSetWriteThreshold(priv->mon, node, threshold); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + ret = -1; + + + endjob: + qemuDomainObjEndJob(driver, vm); + + cleanup: + VIR_FREE(node); + virDomainObjEndAPI(&vm); + virObjectUnref(caps); + virObjectUnref(cfg); + return ret; +} + + static int qemuDomainGetDiskErrors(virDomainPtr dom, virDomainDiskErrorPtr errors, @@ -20121,6 +20218,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .nodeSuspendForDuration = qemuNodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ .domainGetBlockIoTune = qemuDomainGetBlockIoTune, /* 0.9.8 */ + .domainBlockSetWriteThreshold = qemuDomainBlockSetWriteThreshold, /* 1.2.17 */ .domainSetNumaParameters = qemuDomainSetNumaParameters, /* 0.9.9 */ .domainGetNumaParameters = qemuDomainGetNumaParameters, /* 0.9.9 */ .domainGetInterfaceParameters = qemuDomainGetInterfaceParameters, /* 0.9.9 */ diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 27e5a32..8d9ca84 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1824,6 +1824,19 @@ qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, int +qemuMonitorBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) +{ + VIR_DEBUG("dev_name=%s, threshold=%llu", dev_name, threshold); + + QEMU_CHECK_MONITOR_JSON(mon); + + return qemuMonitorJSONBlockSetWriteThreshold(mon, dev_name, threshold); +} + + +int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 45afba4..e2832e8 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -397,6 +397,11 @@ int qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, bool backingChain) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 2cb3c4b..96168d9 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1994,6 +1994,36 @@ qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, } +int +qemuMonitorJSONBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) +{ + int ret = -1; + int rc; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("block-set-write-threshold", + "s:node-name", dev_name, + "U:write-threshold", threshold, + NULL))) + return -1; + + if ((rc = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + ret = 0; + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + + static int qemuMonitorJSONReportBlockExtentError(qemuMonitorBlockExtentError error) { diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index fb77930..8b41536 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -77,6 +77,9 @@ int qemuMonitorJSONGetAllBlockStatsInfo(qemuMonitorPtr mon, int qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, virHashTablePtr stats, bool backingChain); +int qemuMonitorJSONBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold); int qemuMonitorJSONGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); -- 2.4.2
participants (3)
-
Eric Blake
-
Francesco Romani
-
Peter Krempa