On Fri, Jun 12, 2015 at 13:29:25 -0600, Eric Blake wrote:
qemu 2.3 added a new QMP command block-set-write-threshold,
which allows callers to get an interrupt when a file hits a
write threshold, rather than the current approach of repeatedly
polling for file allocation. This patch prepares the API for
callers to register to receive the event, as well as a way
to query the threshold via virDomainListGetStats().
The event is one-shot in qemu - a guest must re-register a new
threshold after each time it triggers. However, the
virConnectDomainEventRegisterAny() call does not allow
parameterization, so callers must use a pair of APIs - one
to register the callback (one-time call) that will be used each
time a threshold triggers for any guest disk, and another to
repeatedly set the desired threshold (must be called each time
a threshold should be changed).
Note that the threshold can either be registered by a byte
offset, or by a thousandth of a percentage (a value between
0 and 100000, scaled to the disk size). But the value is
always reported as a byte offset, even when registered as a
percentage. I also considered having the setup parameter be
a double, to allow a finer resolution on percentage; with the
choice of an integer fixed-point scale, this means a 100G
disk can only set a threshold to a granularity of 1M, but
that is probably sufficient for the usage.
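
To make the granularity estimate above concrete, here is a minimal sketch of the percentage-to-bytes conversion a driver might perform. The helper name scaled_to_bytes and the THRESHOLD_SCALE macro are illustrative only and are not part of this patch; splitting the multiplication is just one assumed way to avoid overflow on very large disks.

```c
/* Scale implied by the proposed PERCENTAGE flag: 100000 == 100.000%. */
#define THRESHOLD_SCALE 100000ULL

/*
 * Hypothetical helper (not from the patch): convert a threshold given in
 * thousandths of a percent into a byte offset for a disk of disk_size bytes.
 * Returns 0 on success, -1 if scaled is outside 0..100000.
 */
static int
scaled_to_bytes(unsigned long long disk_size,
                unsigned long long scaled,
                unsigned long long *bytes)
{
    if (scaled > THRESHOLD_SCALE)
        return -1;

    /* Split the product so that disk_size * scaled can never overflow;
     * the result still equals floor(disk_size * scaled / THRESHOLD_SCALE). */
    *bytes = (disk_size / THRESHOLD_SCALE) * scaled
           + (disk_size % THRESHOLD_SCALE) * scaled / THRESHOLD_SCALE;
    return 0;
}
```

For a 100 GiB disk (107374182400 bytes), a single step of the scale works out to 1073741 bytes, which matches the "granularity of 1M" estimate.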
To make the patch series more digestible, this patch
intentionally omits remote support, by using a couple of
placeholders at a point where the compiler forces the addition
of a case label within a switch statement.
* include/libvirt/libvirt-domain.h
(virDomainBlockSetWriteThreshold): New API.
(virConnectDomainEventWriteThresholdCallback): New event.
* src/libvirt_public.syms (LIBVIRT_1.2.17): Export it.
* src/libvirt-domain.c (virDomainBlockSetWriteThreshold): New API.
(virConnectGetAllDomainStats): New stat.
* src/driver-hypervisor.h (virDrvDomainBlockSetWriteThreshold):
New hypervisor entry point.
* tools/virsh-domain.c (vshEventWriteThresholdPrint): Print new
event.
* tools/virsh.pod (domstats): Document new stat.
* daemon/remote.c (domainEventCallbacks): Add stub.
* src/conf/domain_event.c (virDomainEventDispatchDefaultFunc):
Likewise.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
daemon/remote.c | 2 +
include/libvirt/libvirt-domain.h | 47 ++++++++++++++++++++
src/conf/domain_event.c | 4 +-
src/driver-hypervisor.h | 7 +++
src/libvirt-domain.c | 95 ++++++++++++++++++++++++++++++++++++++++
src/libvirt_public.syms | 5 +++
tools/virsh-domain.c | 23 ++++++++++
tools/virsh.pod | 1 +
8 files changed, 183 insertions(+), 1 deletion(-)
...
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index d851225..7514656 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -1297,6 +1297,17 @@ int virDomainBlockStatsFlags (virDomainPtr dom,
virTypedParameterPtr params,
int *nparams,
unsigned int flags);
+
+typedef enum {
+ /* threshold is thousandth of a percentage (0 to 100000) relative to
You managed to choose an unusual unit. Commonly used ones are 1/1000 and
1/1 000 000. The financial world also uses 1/10 000. Your unit of 1/100 000
is not among those listed at:
https://en.wikipedia.org/wiki/Parts-per_notation#Parts-per_expressions
I'd again suggest using 1/1 000 000. Or, if you want to be uber precise,
you might choose 1/(2^64 - 1).
+ * image size rather than byte limit */
+ VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE = (1 << 0),
+} virDomainBlockSetWriteThresholdFlags;
+int virDomainBlockSetWriteThreshold(virDomainPtr dom,
+ const char *disk,
+ unsigned long long threshold,
+ unsigned int flags);
+
int virDomainInterfaceStats (virDomainPtr dom,
const char *path,
virDomainInterfaceStatsPtr stats,
@@ -3246,6 +3257,41 @@ typedef void (*virConnectDomainEventDeviceAddedCallback)(virConnectPtr conn,
void *opaque);
/**
+ * virConnectDomainEventWriteThresholdCallback:
+ * @conn: connection object
+ * @dom: domain on which the event occurred
+ * @devAlias: device alias
+ * @threshold: threshold that was exceeded, in bytes
+ * @length: length beyond @threshold that was involved in the triggering
+ * write, or 0 if not known
+ * @opaque: application specified data
+ *
+ * The callback signature to use when registering for an event of type
+ * VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD with virConnectDomainEventRegisterAny()
+ *
+ * This callback occurs when a block device detects a write event that
+ * exceeds a non-zero threshold set by
+ * virDomainBlockSetWriteThreshold(). When this event occurs, the
+ * threshold is reset to 0, and a new limit must be installed to see
+ * the event again on the same device. The intent of this event is to
+ * allow time for the underlying storage to be resized dynamically
+ * prior to the point where the guest would be paused due to running
+ * out of space, without having to poll for allocation values.
+ *
+ * The contents of @devAlias will be "vda" when the threshold is triggered
+ * on the active layer of guest disk vda. Some hypervisors also support
+ * threshold reporting on backing images, such as during a block commit;
+ * when that happens, @devAlias will be "vda[1]" for the backingStore at
+ * index 1 within the chain of host resources for guest disk vda.
Would it perhaps be worth including an optional field that contains the
file path, since most use cases of this event will involve a local block
device? iSCSI and NBD block devices could then return the field empty,
as we now do in the bulk stats API.
+ */
+typedef void (*virConnectDomainEventWriteThresholdCallback)(virConnectPtr conn,
+ virDomainPtr dom,
+ const char *devAlias,
+ unsigned long long threshold,
+ unsigned long long length,
+ void *opaque);
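
As an aside on the @devAlias format documented above, a handler would presumably need to split strings such as "vda" and "vda[1]" into the disk target and the backing chain index. Below is a minimal sketch of that parsing. The function name parse_dev_alias is hypothetical, and reporting index 0 for the active layer is an assumption of this sketch rather than something the patch specifies.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): split "vda" or "vda[1]" into
 * the disk target and a backing chain index. The active layer is reported
 * as index 0, which is a convention of this sketch only.
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int
parse_dev_alias(const char *alias, char *target, size_t target_len,
                unsigned int *index)
{
    const char *open = strchr(alias, '[');
    size_t len = open ? (size_t)(open - alias) : strlen(alias);
    unsigned long value;
    char *end;

    if (len == 0 || len >= target_len)
        return -1;
    memcpy(target, alias, len);
    target[len] = '\0';

    if (!open) {
        *index = 0;             /* no suffix: active layer */
        return 0;
    }

    /* Require exactly "[<digits>]" with nothing after the bracket. */
    if (!isdigit((unsigned char)open[1]))
        return -1;
    errno = 0;
    value = strtoul(open + 1, &end, 10);
    if (errno == ERANGE || value > UINT_MAX)
        return -1;
    if (end[0] != ']' || end[1] != '\0')
        return -1;

    *index = (unsigned int)value;
    return 0;
}
```

With this sketch, "vda[1]" yields target "vda" and index 1, while inputs such as "vda[1" or "[1]" are rejected.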
+
+/**
* VIR_DOMAIN_TUNABLE_CPU_VCPUPIN:
*
* Macro represents formatted pinning for one vcpu specified by id which is
...
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 7e6d749..53114d3 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -5743,6 +5743,99 @@ virDomainBlockStatsFlags(virDomainPtr dom,
/**
+ * virDomainBlockSetWriteThreshold:
+ * @dom: pointer to domain object
+ * @disk: path to the block device, or device shorthand
+ * @threshold: limit at which a write threshold event can trigger
+ * @flags: bitwise-OR of virDomainBlockSetWriteThresholdFlags
+ *
+ * This function is used to set a one-shot write threshold. It should
+ * be used in tandem with virConnectDomainEventRegisterAny()
+ * installing a handler for VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD. If
+ * the hypervisor detects that a write request (whether guest data, or
+ * host metadata) would exceed the host byte offset specified in
+ * @threshold, then an event is raised, and the threshold is reset to
+ * 0 at that time. The event registration is only needed once, but
+ * this function must be called each time a new threshold is desired;
+ * the event will only fire if a non-zero threshold is
+ * exceeded.
+ *
+ * By default, @threshold is specified in bytes, and must not exceed
+ * the size of the block device. However, when @flags includes
+ * VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PERCENTAGE, @threshold is
+ * instead a value between 0 and 100,000, as a thousandth of a percent
+ * of the current size of the disk, and the driver will compute the
+ * corresponding byte value. For example, 80000 represents 80.000%.
+ * A driver may round the requested threshold to a granularity that
+ * can actually be supported.
+ *
+ * Setting a threshold allows event-based resizing of host resources
+ * that back a guest disk without having to poll the current disk
+ * allocation, while still having enough time to complete the resize
+ * before the guest would end up halting due to insufficient space.
+ * Calling this function to set the threshold back to zero will stop
+ * further firing of the event. virConnectGetAllDomainStats() can be
+ * used to track the current threshold value, always in the form
+ * normalized to bytes.
+ *
+ * The @disk parameter is either the device target shorthand (the
+ * <target dev='...'/> sub-element, such as "vda"), or (since 0.9.8)
Since this will be added in 1.3.0 (or 1.2.15) the "since" statement is
not exactly true.
+ * an unambiguous source name of the block device (the <source
+ * file='...'/> sub-element, such as "/path/to/image"). Valid names
+ * can be found by calling virDomainGetXMLDesc() and inspecting
+ * elements within //domain/devices/disk. Some drivers might also
+ * accept strings such as "vda[1]" for setting the threshold of a
+ * backing image, useful when doing a block commit into the backing
+ * image.
+ *
+ * Domains may have more than one block device. To set thresholds for
+ * each you should make multiple calls to this function. If write
+ * thresholds are not supported, an application will have to instead
+ * poll virDomainGetBlockInfo() or similar to track allocation.
+ *
+ * Returns -1 in case of error, 0 in case of success.
+ */
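
To check my reading of the one-shot semantics documented above, here is a toy model of the arm, fire, reset, re-arm cycle. Nothing in it is libvirt or qemu code: struct toy_disk, toy_set_threshold(), toy_write() and toy_rearm() are made-up names, and "a write whose end offset goes past the threshold triggers the event" is an assumption of the sketch.

```c
#include <stddef.h>

/* Toy stand-in for one block device; purely illustrative. */
struct toy_disk {
    unsigned long long threshold;   /* 0 means no threshold is armed */
    unsigned long long events;      /* how many times the event fired */
};

typedef void (*toy_threshold_cb)(struct toy_disk *disk,
                                 unsigned long long threshold,
                                 unsigned long long length,
                                 void *opaque);

/* Arm (non-zero) or disarm (zero) the one-shot threshold. */
static void
toy_set_threshold(struct toy_disk *disk, unsigned long long threshold)
{
    disk->threshold = threshold;
}

/* Model a write of len bytes at offset; fire the event at most once. */
static void
toy_write(struct toy_disk *disk, unsigned long long offset,
          unsigned long long len, toy_threshold_cb cb, void *opaque)
{
    unsigned long long end = offset + len;
    unsigned long long fired;

    if (disk->threshold == 0 || end <= disk->threshold)
        return;

    fired = disk->threshold;
    disk->threshold = 0;    /* reset first, so the handler can re-arm */
    disk->events++;
    if (cb)
        cb(disk, fired, end - fired, opaque);
}

/* Example handler: re-arm a fixed step above the threshold that fired. */
static void
toy_rearm(struct toy_disk *disk, unsigned long long threshold,
          unsigned long long length, void *opaque)
{
    unsigned long long step = *(unsigned long long *)opaque;

    (void)length;
    toy_set_threshold(disk, threshold + step);
}
```

In this model a handler that does not re-arm leaves the threshold at 0, so later writes stay silent, which is how I understand the documented behaviour.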
Otherwise looks good to me.
Peter