[libvirt] [PATCHv3 00/10] new API virDomainBlockSetWriteThreshold

v2 was here: https://www.redhat.com/archives/libvir-list/2015-June/msg00591.html This series depends on my yajl/json cleanups: https://www.redhat.com/archives/libvir-list/2015-June/msg01098.html and can also be found here: git clone git://repo.or.cz/libvirt/ericb.git threshold http://repo.or.cz/w/libvirt/ericb.git/shortlog/refs/heads/threshold Since then, I've rebased to master, addressed Peter's findings on v2, and tested that things actually work when combined with a single pending patch on qemu (I'll reply to this mail with the list id of that patch, where that patch in turn points to this series as justification). Here's a transcript of how I tested: I modified my 'f20' domain to include: <devices> <emulator>/home/eblake/qemu/x86_64-softmmu/qemu-system-x86_64</emulator> <disk type='block' device='disk'> <driver name='qemu' type='qcow2'/> <source dev='/dev/sda6'/> <target dev='vda' bus='virtio'/> </disk> then created a raw partition wrapper around the usual disk image: # qemu-img create -f qcow2 -o backing_file=/var/lib/libvirt/images/f20.img,backing_fmt=qcow2 /dev/sda6 Formatting '/dev/sda6', fmt=qcow2 size=12884901888 backing_file='/var/lib/libvirt/images/f20.img' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 I then started the domain paused (as long as no guest I/O happens, the threshold cannot be exceeded, which gives me time to register a threshold and set up an event handler): # virsh start --paused f20 Domain f20 started # ./run tools/virsh -k0 domblkthreshold f20 vda 10000 threshold of vda set to 10000 bytes I was then able to play with domstats to see different thresholds as I registered them (such as playing with the parts-per-million or even with virsh 'percentage sugar'): # virsh -k0 domstats f20 --block | grep -C1 allo block.0.write-threshold=10000 block.0.allocation=0 block.0.capacity=12884901888 In another window, I set up my event loop: $ ./run tools/virsh -k0 event f20 --loop --all then in the first window, I resumed things: # virsh resume f20 and got this message in the second window, after several seconds had elapsed (the first few seconds in the guest is under control of Grub, which doesn't write to disk; when it transitioned over to the Linux boot, disk activity started happening and easily causes the qcow2 wrapper to exceed the threshold I set earlier): $ ./run tools/virsh -k0 -c qemu:///system event --loop --all event 'lifecycle' for domain f20: Resumed Unpaused event 'lifecycle' for domain f20: Resumed Unpaused event 'write-threshold' for domain f20 disk vda (/dev/sda6): threshold 10000 exceeded by 186608 bytes Sure enough, the registration had cleared, and the qcow2 wrapper had grown (it grew more than once in the process of booting, so by the time I re-queried, allocation had grown even further). # virsh -k0 domstats f20 --block | grep -C1 allo block.0.write-threshold=0 block.0.allocation=13434368 block.0.capacity=12884901888 I'd really like to see this series in 1.3.0 to commit to the API; there are still some further improvements we can make (in particular 'vda[1]' support, and also improving qemu event handling to re-grab a job lock if the cache of node names is not populated after a libvirtd restart), but as those improvements won't affect ABI, they can be deferred if they don't make it into 1.3.0. [I'm headed on vacation, so I will be really slow to respond in the next 3 weeks; someone else may need to push this if it is approved for 1.3.0] Eric Blake (10): threshold: new API virDomainBlockSetWriteThreshold threshold: expose new API in virsh threshold: new event object for tracking write threshold threshold: wire up threshold event in RPC threshold: add qemu capability bits threshold: learn a node name for a given qcow2 disk threshold: track an allocation node name for a storage source threshold: scrape threshold data from QMP threshold: add threshold event handling in qemu threshold: add write threshold setting in qemu daemon/remote.c | 51 +++ include/libvirt/libvirt-domain.h | 53 +++ src/conf/domain_event.c | 101 ++++- src/conf/domain_event.h | 16 +- src/driver-hypervisor.h | 7 + src/libvirt-domain.c | 98 +++++ src/libvirt_private.syms | 2 + src/libvirt_public.syms | 1 + src/qemu/qemu_capabilities.c | 12 +- src/qemu/qemu_capabilities.h | 4 +- src/qemu/qemu_domain.c | 105 ++++++ src/qemu/qemu_domain.h | 10 +- src/qemu/qemu_driver.c | 108 +++++- src/qemu/qemu_migration.c | 2 +- src/qemu/qemu_monitor.c | 79 +++- src/qemu/qemu_monitor.h | 28 ++ src/qemu/qemu_monitor_json.c | 305 ++++++++++++++- src/qemu/qemu_monitor_json.h | 11 + src/qemu/qemu_process.c | 44 +++ src/remote/remote_driver.c | 35 ++ src/remote/remote_protocol.x | 30 +- src/remote_protocol-structs | 16 + src/util/virstoragefile.c | 4 +- src/util/virstoragefile.h | 3 +- tests/qemucapabilitiesdata/caps_2.1.1-1.replies | 470 ++++++++++++------------ tests/qemumonitortest.c | 13 +- tools/virsh-domain-monitor.c | 108 ++++++ tools/virsh-domain.c | 24 ++ tools/virsh.pod | 26 ++ 29 files changed, 1508 insertions(+), 258 deletions(-) -- 2.4.3

qemu 2.3 added a new QMP command block-set-write-threshold, which allows callers to get an interrupt when a file hits a write threshold, rather than the current approach of repeatedly polling for file allocation. This patch prepares the API for callers to register to receive the event, as well as a way to query the threshold via virDomainListGetStats(). The event is one-shot in qemu - a guest must re-register a new threshold after each time it triggers. However, the virConnectDomainEventRegisterAny() call does not allow parameterization, so callers must use a pair of APIs - one to register the callback (one-time call) that will be used each time a threshold triggers for any guest disk, and another to repeatedly set the desired threshold (must be called each time a threshold should be changed). Note that the threshold can either be registered by a byte offset, or by a parts-per-million proportion (a value between 0 and 1000000, scaled to the disk size). But the value is always reported as a byte offset, even when registered as a proportion. I also considered having the setup parameter be a double, to allow a finer resolution rather than fixed-point proportion; but that much resolution is probably not necessary (for a 100G disk, the resulting 100k granularity is pretty much in the noise). To make the patch series more digestible, this patch intentionally omits remote support, by using a couple of placeholders at a point where the compiler forces the addition of a case label within a switch statement. * include/libvirt/libvirt-domain.h (virDomainBlockSetWriteThreshold): New API. (virConnectDomainEventWriteThresholdCallback): New event. * src/libvirt_public.syms (LIBVIRT_1.3.0): Export it. * src/libvirt-domain.c (virDomainBlockSetWriteThreshold): New API. (virConnectGetAllDomainStats): New stat. * src/driver-hypervisor.h (virDrvDomainBlockSetWriteThreshold): New hypervisor entry point. * tools/virsh-domain.c (vshEventWriteThresholdPrint): Print new event. * tools/virsh.pod (domstats): Document new stat. * daemon/remote.c (domainEventCallbacks): Add stub. * src/conf/domain_event.c (virDomainEventDispatchDefaultFunc): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- daemon/remote.c | 2 + include/libvirt/libvirt-domain.h | 53 ++++++++++++++++++++++ src/conf/domain_event.c | 4 +- src/driver-hypervisor.h | 7 +++ src/libvirt-domain.c | 98 ++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + tools/virsh-domain.c | 24 ++++++++++ tools/virsh.pod | 1 + 8 files changed, 189 insertions(+), 1 deletion(-) diff --git a/daemon/remote.c b/daemon/remote.c index e9e2dca..283ece2 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1102,6 +1102,8 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventTunable), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventAgentLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventDeviceAdded), + /* TODO: Implement RPC support for this */ + VIR_DOMAIN_EVENT_CALLBACK(NULL), }; verify(ARRAY_CARDINALITY(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 7564c20..ca2f929 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1306,6 +1306,18 @@ int virDomainBlockStatsFlags (virDomainPtr dom, virTypedParameterPtr params, int *nparams, unsigned int flags); + +typedef enum { + /* threshold is a parts-per-million proportion of the image size + * rather than byte limit */ + VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION = (1 << 0), +} virDomainBlockSetWriteThresholdFlags; + +int virDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned int flags); + int virDomainInterfaceStats (virDomainPtr dom, const char *path, virDomainInterfaceStatsPtr stats, @@ -3255,6 +3267,46 @@ typedef void (*virConnectDomainEventDeviceAddedCallback)(virConnectPtr conn, void *opaque); /** + * virConnectDomainEventWriteThresholdCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @devAlias: device alias + * @path: a local path name of the host resource, or NULL if not available + * @threshold: threshold that was exceeded, in bytes + * @length: length beyond @threshold that was involved in the triggering + * write, or 0 if not known + * @opaque: application specified data + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD with virConnectDomainEventRegisterAny() + * + * This callback occurs when a block device detects a write event that + * exceeds a non-zero threshold set by + * virDomainBlockSetWriteThreshold(). When this event occurs, the + * threshold is reset to 0, and a new limit must be installed to see + * the event again on the same device. The intent of this event is to + * allow time for the underlying storage to be resized dynamically + * prior to the point where the guest would be paused due to running + * out of space, without having to poll for allocation values. + * + * The contents of @devAlias will be "vda" when the threshold is + * triggered on the active layer of guest disk vda. Some hypervisors + * also support threshold reporting on backing images, such as during + * a block commit; when that happens, @devAlias will be "vda[1]" for + * the backingStore at index 1 within the chain of host resources for + * guest disk vda. For convenience, if the host resource has a local + * file name, that will be listed in @path (note that @path will be + * NULL for network resources). + */ +typedef void (*virConnectDomainEventWriteThresholdCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *devAlias, + const char *path, + unsigned long long threshold, + unsigned long long length, + void *opaque); + +/** * VIR_DOMAIN_TUNABLE_CPU_VCPUPIN: * * Macro represents formatted pinning for one vcpu specified by id which is @@ -3537,6 +3589,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_TUNABLE = 17, /* virConnectDomainEventTunableCallback */ VIR_DOMAIN_EVENT_ID_AGENT_LIFECYCLE = 18,/* virConnectDomainEventAgentLifecycleCallback */ VIR_DOMAIN_EVENT_ID_DEVICE_ADDED = 19, /* virConnectDomainEventDeviceAddedCallback */ + VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD = 20, /* virConnectDomainEventWriteThreshold */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 20d66e1..c43799f 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -1,7 +1,7 @@ /* * domain_event.c: domain event queue processing helpers * - * Copyright (C) 2010-2014 Red Hat, Inc. + * Copyright (C) 2010-2015 Red Hat, Inc. * Copyright (C) 2008 VirtualIron * Copyright (C) 2013 SUSE LINUX Products GmbH, Nuernberg, Germany. * @@ -1614,6 +1614,8 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD: + /* TODO: Implement RPC support for this */ case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index 3275343..b5b51f1 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -484,6 +484,12 @@ typedef int unsigned int flags); typedef int +(*virDrvDomainBlockSetWriteThreshold)(virDomainPtr domain, + const char *disk, + unsigned long long threshold, + unsigned int flags); + +typedef int (*virDrvDomainInterfaceStats)(virDomainPtr domain, const char *path, virDomainInterfaceStatsPtr stats); @@ -1324,6 +1330,7 @@ struct _virHypervisorDriver { virDrvDomainBlockResize domainBlockResize; virDrvDomainBlockStats domainBlockStats; virDrvDomainBlockStatsFlags domainBlockStatsFlags; + virDrvDomainBlockSetWriteThreshold domainBlockSetWriteThreshold; virDrvDomainInterfaceStats domainInterfaceStats; virDrvDomainSetInterfaceParameters domainSetInterfaceParameters; virDrvDomainGetInterfaceParameters domainGetInterfaceParameters; diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 4d7b88a..4ac9325 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -5743,6 +5743,102 @@ virDomainBlockStatsFlags(virDomainPtr dom, /** + * virDomainBlockSetWriteThreshold: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @threshold: limit at which a write threshold event can trigger + * @flags: bitwise-OR of virDomainBlockSetWriteThresholdFlags + * + * This function is used to set a one-shot write threshold. It should + * be used in tandem with virConnectDomainEventRegisterAny() + * installing a handler for VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD. If + * the hypervisor detects that a write request (whether guest data, or + * host metadata) would exceed the host byte offset specified in + * @threshold, then an event is raised, and the threshold is reset to + * 0 at that time. The event registration is only needed once, but + * this function must be called each time a new threshold is desired; + * the event will only fire if a non-zero threshold is + * exceeded. + * + * By default, @threshold is specified in bytes, and must not exceed + * the size of the block device. However, when @flags includes + * VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION, @threshold is + * instead a value between 0 an 1,000,000, as a parts-per-million + * proportion to the current size of the disk, and the driver will + * compute the corresponding byte value. For example, 500000 + * represents a threshold when half the disk has been allocated. A + * driver may round the requested threshold to a granularity that can + * actually be supported. + * + * Setting a threshold allows event-based resizing of host resources + * that back a guest disk without having to poll the current disk + * allocation, while still having enough time to complete the resize + * before the guest would end up halting due to insufficient space. + * Calling this function to set the threshold back to zero will stop + * further firing of the event. virConnectGetAllDomainStats() can be + * used to track the current threshold value, always in the form + * normalized to bytes. + * + * The @disk parameter is either the device target shorthand (the + * <target dev='...'/> sub-element, such as "vda"), or an unambiguous + * source name of the block device (the <source file='...'/> + * sub-element, such as "/path/to/image"). Valid names can be found + * by calling virDomainGetXMLDesc() and inspecting elements within + * //domain/devices/disk. Some drivers might also accept strings such + * as "vda[1]" for setting the threshold of a backing image, useful + * when doing a block commit into the backing image. Hypervisors may + * restrict threshold reporting to certain types of host resources, + * such as a qcow2 format on top of a block device (as allocation + * tracking differs according to the type of host resource). + * + * Domains may have more than one block device. To set thresholds for + * each you should make multiple calls to this function. If write + * thresholds are not supported, an application will have to instead + * poll virDomainGetBlockInfo() or similar to track allocation. + * + * Returns -1 in case of error, 0 in case of success. + */ +int +virDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *disk, + unsigned long long threshold, + unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(dom, "disk=%s, threshold=%llu, flags=%x", + disk, threshold, flags); + + virResetLastError(); + + virCheckDomainReturn(dom, -1); + virCheckNonNullArgGoto(disk, error); + if (flags & VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION && + threshold > 1000000) { + virReportError(VIR_ERR_INVALID_ARG, + _("threshold in %s is larger than 100%%"), + __FUNCTION__); + goto error; + } + conn = dom->conn; + + if (conn->driver->domainBlockSetWriteThreshold) { + int ret; + ret = conn->driver->domainBlockSetWriteThreshold(dom, disk, threshold, + flags); + if (ret < 0) + goto error; + return ret; + } + virReportUnsupportedError(); + + error: + virDispatchError(dom->conn); + return -1; +} + + +/** * virDomainInterfaceStats: * @dom: pointer to the domain object * @path: path to the interface @@ -11176,6 +11272,8 @@ virConnectGetDomainCapabilities(virConnectPtr conn, * unsigned long long. * "block.<num>.errors" - Xen only: the 'oo_req' value as * unsigned long long. + * "block.<num>.write-threshold" - byte at which a write threshold event + * will fire, as unsigned long long. * "block.<num>.allocation" - offset of the highest written sector * as unsigned long long. * "block.<num>.capacity" - logical size in bytes of the block device backing diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 59d8c12..14b2373 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -717,6 +717,7 @@ LIBVIRT_1.2.16 { LIBVIRT_1.3.0 { global: + virDomainBlockSetWriteThreshold; virTypedParamsAddStringList; } LIBVIRT_1.2.16; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index baf4fa3..141be3a 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -12082,6 +12082,28 @@ vshEventDeviceAddedPrint(virConnectPtr conn ATTRIBUTE_UNUSED, } static void +vshEventWriteThresholdPrint(virConnectPtr conn ATTRIBUTE_UNUSED, + virDomainPtr dom, + const char *alias, + const char *path, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + vshDomEventData *data = opaque; + + if (!data->loop && *data->count) + return; + vshPrint(data->ctl, + _("event 'write-threshold' for domain %s disk %s (%s): " + "threshold %llu exceeded by %llu bytes\n"), + virDomainGetName(dom), alias, NULLSTR(path), threshold, length); + (*data->count)++; + if (!data->loop) + vshEventDone(data->ctl); +} + +static void vshEventTunablePrint(virConnectPtr conn ATTRIBUTE_UNUSED, virDomainPtr dom, virTypedParameterPtr params, @@ -12188,6 +12210,8 @@ static vshEventCallback vshEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(vshEventAgentLifecyclePrint), }, { "device-added", VIR_DOMAIN_EVENT_CALLBACK(vshEventDeviceAddedPrint), }, + { "write-threshold", + VIR_DOMAIN_EVENT_CALLBACK(vshEventWriteThresholdPrint), }, }; verify(VIR_DOMAIN_EVENT_ID_LAST == ARRAY_CARDINALITY(vshEventCallbacks)); diff --git a/tools/virsh.pod b/tools/virsh.pod index bcfa165..600ea42 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -919,6 +919,7 @@ local file or block device, "block.<num>.fl.reqs" - total flush requests, "block.<num>.fl.times" - total time (ns) spent on cache flushing, "block.<num>.errors" - Xen only: the 'oo_req' value, +"block.<num>.write-threshold" - write threshold event trigger, in bytes, "block.<num>.allocation" - offset of highest written sector in bytes, "block.<num>.capacity" - logical size of source file in bytes, "block.<num>.physical" - physical size of source file in bytes -- 2.4.3

Add a new 'virsh domblkthreshold' command to use the new API. The main use is an obvious mapping to the new API: virsh domblkthreshold $dom $disk 10000000 # 10M bytes or virsh domblkthreshold $dom $disk 101000 --proportion # 10.1% but I also wanted to be lazy at computing parts per million, so I allow: virsh domblkthreshold $dom $disk --percentage 10.1% # as before * tools/virsh.pod (domblkthreshold): Document it. * tools/virsh-domain-monitor.c (cmdDomblkthreshold): New function. (domMonitoringCmds): Register it. Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virsh-domain-monitor.c | 108 +++++++++++++++++++++++++++++++++++++++++++ tools/virsh.pod | 25 ++++++++++ 2 files changed, 133 insertions(+) diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index 1d4dc25..5a0287f 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -571,6 +571,108 @@ cmdDomblklist(vshControl *ctl, const vshCmd *cmd) } /* + * "domblkthreshold" command + */ +static const vshCmdInfo info_domblkthreshold[] = { + {.name = "help", + .data = N_("set domain block device write thresholds") + }, + {.name = "desc", + .data = N_("Set a threshold to get a one-shot event if block " + "allocation exceeds that size") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_domblkthreshold[] = { + {.name = "domain", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("domain name, id or uuid"), + }, + {.name = "device", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("block device"), + }, + {.name = "threshold", + .type = VSH_OT_INT, + .help = N_("new threshold, or 0 to disable"), + }, + {.name = "proportion", + .type = VSH_OT_BOOL, + .help = N_("threshold is in parts-per-million instead of bytes"), + }, + {.name = "percentage", + .type = VSH_OT_STRING, /* floating point doesn't have an option type */ + .flags = VSH_OFLAG_REQ_OPT, + .help = N_("determine threshold from a percentage"), + }, + {.name = NULL} +}; + +static bool +cmdDomblkthreshold(vshControl *ctl, const vshCmd *cmd) +{ + virDomainPtr dom; + bool ret = false; + const char *device = NULL; + unsigned long long threshold; + bool proportion = vshCommandOptBool(cmd, "proportion"); + unsigned int flags = 0; + const char *percentage = NULL; + + VSH_EXCLUSIVE_OPTIONS("threshold", "percentage"); + VSH_EXCLUSIVE_OPTIONS("proportion", "percentage"); + + if (vshCommandOptStringReq(ctl, cmd, "percentage", &percentage) < 0) + return false; + if (percentage) { + char *end; + double raw; + + if (virStrToDouble(percentage, &end, &raw) < 0 || + end[*end == '%'] || raw < 0.0 || raw > 100.0) { + vshError(ctl, _("unable to parse '%s' as percentage"), percentage); + return false; + } + threshold = raw * 10000; + proportion = true; + } else if (!vshCommandOptBool(cmd, "threshold")) { + vshError(ctl, "%s", + _("either --threshold or --percentage is required")); + return false; + } else if (vshCommandOptULongLong(ctl, cmd, "threshold", &threshold) < 0) { + return false; + } + + if (proportion) + flags |= VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION; + + if (!(dom = vshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (vshCommandOptStringReq(ctl, cmd, "device", &device) < 0) + goto cleanup; + + if (virDomainBlockSetWriteThreshold(dom, device, threshold, flags) < 0) + goto cleanup; + + if (proportion) + vshPrint(ctl, _("threshold of %s set to %llu.%04llu%%\n"), + device, threshold / 10000, threshold % 10000); + else + vshPrint(ctl, _("threshold of %s set to %llu bytes\n"), + device, threshold); + + ret = true; + + cleanup: + virDomainFree(dom); + return ret; +} + +/* * "domiflist" command */ static const vshCmdInfo info_domiflist[] = { @@ -2358,6 +2460,12 @@ const vshCmdDef domMonitoringCmds[] = { .info = info_domblkstat, .flags = 0 }, + {.name = "domblkthreshold", + .handler = cmdDomblkthreshold, + .opts = opts_domblkthreshold, + .info = info_domblkthreshold, + .flags = 0 + }, {.name = "domcontrol", .handler = cmdDomControl, .opts = opts_domcontrol, diff --git a/tools/virsh.pod b/tools/virsh.pod index 600ea42..1b67a72 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -843,6 +843,31 @@ that require a block device name (such as I<domblkinfo> or I<snapshot-create> for disk snapshots) will accept either target or unique source names printed by this command. +=item B<domblkthreshold> I<domain> I<block-device> { I<threshold> +[I<--proportion>] | I<--percentage> I<value> } + +Set the write threshold for a block device of a domain. A +I<block-device> corresponds to a unique target name (<target +dev='name'/>) or source file (<source file='name'/>) for one of the +disk devices attached to I<domain> (see also B<domblklist> for listing +these names). Some hypervisors also allow "vda[1]" to set a threshold +on the first backing file of the target "vda", useful when doing a +block commit. + +By default, I<threshold> is the byte offset within the host that will +trigger an event; but if I<--proportion> is used, the threshold is +instead a parts-per-million relative to the disk size (800000 maps to +an 80% threshold). It is also possible to use I<--percentage=80.00> +as shorthand for I<--proportion --threshold=800000>. + +Setting a write threshold causes an event to be delivered if the +allocation of I<block-device> passes that point, allowing a user to +resize the underlying storage before the guest would be forcefully +paused due to an ENOSPC scenario. The B<event> command can be used to +listen for such events. If the hypervisor does not support event +notification, then B<domblkinfo> must instead be polled to track +allocation over time. The current threshold is listed in B<domstats>. + =item B<domstats> [I<--raw>] [I<--enforce>] [I<--backing>] [I<--state>] [I<--cpu-total>] [I<--balloon>] [I<--vcpu>] [I<--interface>] [I<--block>] [[I<--list-active>] [I<--list-inactive>] [I<--list-persistent>] -- 2.4.3

Common code for tracking a write threshold event from creation to delivery through a callback. * src/conf/domain_event.c (_virDomainEventWriteThreshold): New struct. (virDomainEventsOnceInit, virDomainEventWriteThresholdDispose) (virDomainEventWriteThresholdNew) (virDomainEventWriteThresholdNewFromObj) (virDomainEventWriteThresholdNewFromDom) (virDomainEventDispatchDefaultFunc): Wire it up. * src/conf/domain_event.h: Expose new functions. * src/libvirt-private.syms (domain_event.h): Export them. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/conf/domain_event.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++- src/conf/domain_event.h | 16 +++++++- src/libvirt_private.syms | 2 + 3 files changed, 115 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index c43799f..400cfa0 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -56,6 +56,7 @@ static virClassPtr virDomainQemuMonitorEventClass; static virClassPtr virDomainEventTunableClass; static virClassPtr virDomainEventAgentLifecycleClass; static virClassPtr virDomainEventDeviceAddedClass; +static virClassPtr virDomainEventWriteThresholdClass; static void virDomainEventDispose(void *obj); @@ -74,6 +75,7 @@ static void virDomainQemuMonitorEventDispose(void *obj); static void virDomainEventTunableDispose(void *obj); static void virDomainEventAgentLifecycleDispose(void *obj); static void virDomainEventDeviceAddedDispose(void *obj); +static void virDomainEventWriteThresholdDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -236,6 +238,17 @@ struct _virDomainEventAgentLifecycle { typedef struct _virDomainEventAgentLifecycle virDomainEventAgentLifecycle; typedef virDomainEventAgentLifecycle *virDomainEventAgentLifecyclePtr; +struct _virDomainEventWriteThreshold { + virDomainEvent parent; + + char *disk; + char *path; + unsigned long long threshold; + unsigned long long length; +}; +typedef struct _virDomainEventWriteThreshold virDomainEventWriteThreshold; +typedef virDomainEventWriteThreshold *virDomainEventWriteThresholdPtr; + static int virDomainEventsOnceInit(void) @@ -336,6 +349,12 @@ virDomainEventsOnceInit(void) sizeof(virDomainEventAgentLifecycle), virDomainEventAgentLifecycleDispose))) return -1; + if (!(virDomainEventWriteThresholdClass = + virClassNew(virDomainEventClass, + "virDomainEventWriteThreshold", + sizeof(virDomainEventWriteThreshold), + virDomainEventWriteThresholdDispose))) + return -1; return 0; } @@ -496,6 +515,16 @@ virDomainEventAgentLifecycleDispose(void *obj) VIR_DEBUG("obj=%p", event); }; +static void +virDomainEventWriteThresholdDispose(void *obj) +{ + virDomainEventWriteThresholdPtr event = obj; + VIR_DEBUG("obj=%p", event); + + VIR_FREE(event->disk); + VIR_FREE(event->path); +}; + static void * virDomainEventNew(virClassPtr klass, @@ -1390,6 +1419,62 @@ virDomainEventTunableNewFromDom(virDomainPtr dom, nparams); } +static virObjectEventPtr +virDomainEventWriteThresholdNew(int id, + const char *name, + unsigned char *uuid, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length) +{ + virDomainEventWriteThresholdPtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventWriteThresholdClass, + VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD, + id, name, uuid))) + return NULL; + + if (VIR_STRDUP(ev->disk, disk) < 0) + goto error; + if (VIR_STRDUP(ev->path, path) < 0) + goto error; + ev->threshold = threshold; + ev->length = length; + + return (virObjectEventPtr)ev; + + error: + virObjectUnref(ev); + return NULL; +} + +virObjectEventPtr +virDomainEventWriteThresholdNewFromObj(virDomainObjPtr obj, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length) +{ + return virDomainEventWriteThresholdNew(obj->def->id, obj->def->name, + obj->def->uuid, disk, path, + threshold, length); +} + +virObjectEventPtr +virDomainEventWriteThresholdNewFromDom(virDomainPtr dom, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length) +{ + return virDomainEventWriteThresholdNew(dom->id, dom->name, dom->uuid, + disk, path, threshold, length); +} + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -1615,7 +1700,19 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, } case VIR_DOMAIN_EVENT_ID_WRITE_THRESHOLD: - /* TODO: Implement RPC support for this */ + { + virDomainEventWriteThresholdPtr ev; + + ev = (virDomainEventWriteThresholdPtr)event; + ((virConnectDomainEventWriteThresholdCallback)cb)(conn, dom, + ev->disk, + ev->path, + ev->threshold, + ev->length, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index afbed89..6e31e3e 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -1,7 +1,7 @@ /* * domain_event.h: domain event queue processing helpers * - * Copyright (C) 2012-2014 Red Hat, Inc. + * Copyright (C) 2012-2015 Red Hat, Inc. * Copyright (C) 2008 VirtualIron * Copyright (C) 2013 SUSE LINUX Products GmbH, Nuernberg, Germany. * @@ -209,6 +209,20 @@ virDomainEventAgentLifecycleNewFromDom(virDomainPtr dom, int state, int reason); +virObjectEventPtr +virDomainEventWriteThresholdNewFromObj(virDomainObjPtr obj, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length); +virObjectEventPtr +virDomainEventWriteThresholdNewFromDom(virDomainPtr dom, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length); + + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventStatePtr state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1566d11..bd16860 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -541,6 +541,8 @@ virDomainEventTunableNewFromDom; virDomainEventTunableNewFromObj; virDomainEventWatchdogNewFromDom; virDomainEventWatchdogNewFromObj; +virDomainEventWriteThresholdNewFromDom; +virDomainEventWriteThresholdNewFromObj; virDomainQemuMonitorEventNew; virDomainQemuMonitorEventStateRegisterID; -- 2.4.3

Pass the new virDomainBlockSetWriteThreshold() API and WRITE_THRESHOLD event across RPC. The event takes the bulk of the patch, but everything here is pretty mechanical and copies patterns from existing code. Yes, it is intentional that registration takes a single mandatory string (which can be either "vda" or path notation), while the reported event has two strings (one mandatory, for "vda"; the other optional for when path is available). I debated about splitting the event separately from the API for setting the threshold, in case we ever want to support setting a write threshold by overloading an existing API such as virDomainSetBlockIoTune(); but consensus on the design was that overloading existing API was not appropriate, and that no one is expecting to use this functionality without also being prepared to require the .so bump. * daemon/remote.c (remoteRelayDomainEventWriteThreshold): New function. * src/remote/remote_driver.c (remoteDomainBuildEventCallbackWriteThreshold): Likewise. * src/remote/remote_protocol.x (remote_domain_event_callback_write_threshold_msg) (remote_domain_block_set_write_threshold_args): New structs. * src/remote_protocol-structs: Update. Signed-off-by: Eric Blake <eblake@redhat.com> --- daemon/remote.c | 53 ++++++++++++++++++++++++++++++++++++++++++-- src/remote/remote_driver.c | 35 +++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 30 ++++++++++++++++++++++++- src/remote_protocol-structs | 16 +++++++++++++ 4 files changed, 131 insertions(+), 3 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index 283ece2..457e220 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1079,6 +1079,56 @@ remoteRelayDomainEventDeviceAdded(virConnectPtr conn, } +static int +remoteRelayDomainEventWriteThreshold(virConnectPtr conn, + virDomainPtr dom, + const char *disk, + const char *path, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + daemonClientEventCallbackPtr callback = opaque; + remote_domain_event_callback_write_threshold_msg data; + char **path_p = NULL; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + VIR_DEBUG("Relaying domain write threshold event %s %d %s (%s) %llu %llu, " + "callback %d", dom->name, dom->id, + disk, NULLSTR(path), threshold, length, callback->callbackID); + + /* build return data */ + memset(&data, 0, sizeof(data)); + + if (VIR_STRDUP(data.disk, disk) < 0) + goto error; + if (path && + ((VIR_ALLOC(path_p) < 0) || + VIR_STRDUP(*path_p, path) < 0)) + goto error; + + make_nonnull_domain(&data.dom, dom); + data.callbackID = callback->callbackID; + data.path = path_p; + data.threshold = threshold; + data.length = length; + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD, + (xdrproc_t)xdr_remote_domain_event_callback_write_threshold_msg, + &data); + + return 0; + error: + VIR_FREE(data.disk); + VIR_FREE(data.path); + VIR_FREE(path_p); + return -1; +} + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { @@ -1102,8 +1152,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventTunable), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventAgentLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventDeviceAdded), - /* TODO: Implement RPC support for this */ - VIR_DOMAIN_EVENT_CALLBACK(NULL), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventWriteThreshold), }; verify(ARRAY_CARDINALITY(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 273799b..f0b1941 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -342,6 +342,12 @@ remoteDomainBuildEventCallbackAgentLifecycle(virNetClientProgramPtr prog, void *evdata, void *opaque); static void +remoteDomainBuildEventCallbackWriteThreshold(virNetClientProgramPtr prog, + virNetClientPtr client, + void *evdata, void *opaque); + + +static void remoteNetworkBuildEventLifecycle(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, virNetClientPtr client ATTRIBUTE_UNUSED, void *evdata, void *opaque); @@ -504,6 +510,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventCallbackDeviceAdded, sizeof(remote_domain_event_callback_device_added_msg), (xdrproc_t)xdr_remote_domain_event_callback_device_added_msg }, + { REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD, + remoteDomainBuildEventCallbackWriteThreshold, + sizeof(remote_domain_event_callback_write_threshold_msg), + (xdrproc_t)xdr_remote_domain_event_callback_write_threshold_msg }, }; @@ -5518,6 +5528,30 @@ remoteDomainBuildEventCallbackAgentLifecycle(virNetClientProgramPtr prog ATTRIBU } static void +remoteDomainBuildEventCallbackWriteThreshold(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, + virNetClientPtr client ATTRIBUTE_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_callback_write_threshold_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEventPtr event = NULL; + const char *path = msg->path ? *msg->path : NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventWriteThresholdNewFromDom(dom, msg->disk, path, + msg->threshold, + msg->length); + + virObjectUnref(dom); + + remoteEventQueue(priv, event, msg->callbackID); +} + +static void remoteNetworkBuildEventLifecycle(virNetClientProgramPtr prog ATTRIBUTE_UNUSED, virNetClientPtr client ATTRIBUTE_UNUSED, void *evdata, void *opaque) @@ -8275,6 +8309,7 @@ static virHypervisorDriver hypervisor_driver = { .domainBlockResize = remoteDomainBlockResize, /* 0.9.8 */ .domainBlockStats = remoteDomainBlockStats, /* 0.3.2 */ .domainBlockStatsFlags = remoteDomainBlockStatsFlags, /* 0.9.5 */ + .domainBlockSetWriteThreshold = remoteDomainBlockSetWriteThreshold, /* 1.3.0 */ .domainInterfaceStats = remoteDomainInterfaceStats, /* 0.3.2 */ .domainSetInterfaceParameters = remoteDomainSetInterfaceParameters, /* 0.9.9 */ .domainGetInterfaceParameters = remoteDomainGetInterfaceParameters, /* 0.9.9 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 9f1be6b..0461319 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3230,6 +3230,22 @@ struct remote_domain_set_user_password_args { unsigned int flags; }; +struct remote_domain_event_callback_write_threshold_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string disk; + remote_string path; + unsigned hyper threshold; + unsigned hyper length; +}; + +struct remote_domain_block_set_write_threshold_args { + remote_nonnull_domain dom; + remote_nonnull_string disk; + unsigned hyper threshold; + unsigned int flags; +}; + /*----- Protocol. -----*/ @@ -5696,5 +5712,17 @@ enum remote_procedure { * @generate:both * @acl: domain:set_password */ - REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357 + REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357, + + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD = 358, + + /** + * @generate: both + * @acl: domain:block_write + */ + REMOTE_PROC_DOMAIN_BLOCK_SET_WRITE_THRESHOLD = 359 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 48c3bd8..bfa3d3c 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -2684,6 +2684,20 @@ struct remote_domain_set_user_password_args { remote_string password; u_int flags; }; +struct remote_domain_event_callback_write_threshold_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string disk; + remote_string path; + uint64_t threshold; + uint64_t length; +}; +struct remote_domain_block_set_write_threshold_args { + remote_nonnull_domain dom; + remote_nonnull_string disk; + uint64_t threshold; + u_int flags; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3042,4 +3056,6 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_ADD_IOTHREAD = 355, REMOTE_PROC_DOMAIN_DEL_IOTHREAD = 356, REMOTE_PROC_DOMAIN_SET_USER_PASSWORD = 357, + REMOTE_PROC_DOMAIN_EVENT_CALLBACK_WRITE_THRESHOLD = 358, + REMOTE_PROC_DOMAIN_BLOCK_SET_WRITE_THRESHOLD = 359, }; -- 2.4.3

Track whether qemu is new enough to do block thresholds on the active layer (feature added in qemu 2.3). The plan is that even if qemu is too old, attempts to register a threshold value will return failure; but the event handler callback can still be registered successfuly (and will just never fire). Thresholds only work if you use named nodes. But overhauling libvirt to track named nodes is a pain; particularly since a single qcow2 libvirt <disk> element has multiple qemu nodes (one for the qcow2 protocol, that controls what the guest sees, and one for the underlying file, that controls what is done on the host); the threshold information we are interested in is associated with the host view, but that is NOT the node at the active layer of a block device. So while we may eventually want to track node names in libvirt XML and have control over the names of (some) nodes used by qemu, there is no guarantee that we want to do it for all nodes. For now, it is easier to just assume that qemu will supply a generated node name when we don't (a feature added in qemu 2.4), and the rest of this series will merely cache generated node names (re-querying it from qemu as needed after a libvirtd restart) rather than trying to control a fixed name. The ability to set thresholds and the ability to auto name nodes are indepenent enough to be backported separately in downstream qemu, so we have to use two capability bits, and will require both to be set before allowing a write threshold registration. If my assumptions during probing are ever broken (for example, if fdset 0 is no longer wired to read-only /dev/null, or if a named node already exists prior to our point of probing), the code could be made more robust by using the 'null-co' driver instead of 'file', and then by iterating through the array to ensure one of the nodes is visiting the just-registered fdset file. But this patch works for now. I _hate_ the way qemucapabilitiestests works - it is NOT easy to debug how to fix failures in that test. But I don't have time to revamp it today (ideally, the .replies file should include as a comment the command that was associated with a given reply, so that injecting new commands in the code can more easily match that to the .replies file. Furthermore, the replies should not be so sensitive to changing "id" fields). * src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_WRITE_THRESHOLD) (QEMU_CAPS_AUTO_NODE_NAMES): New bits * src/qemu/qemu_capabilities.c (virQEMUCapsCommands): Enable them... (virQEMUCapsProbeQMPCommands): ...but only if they actually work. * src/qemu/qemu_monitor.c (qemuMonitorSupportsNodeNames): New function. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONSupportsNodeNames): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorSupportsNodeNames): New prototype. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONSupportsNodeNames): Likewise. * tests/qemucapabilitiesdata/caps_2.1.1-1.replies: Update test. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_capabilities.c | 12 +- src/qemu/qemu_capabilities.h | 4 +- src/qemu/qemu_monitor.c | 12 + src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 65 ++++ src/qemu/qemu_monitor_json.h | 3 + tests/qemucapabilitiesdata/caps_2.1.1-1.replies | 470 ++++++++++++------------ 7 files changed, 336 insertions(+), 231 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index e7002a3..af597e3 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1,7 +1,7 @@ /* * qemu_capabilities.c: QEMU capabilities generation * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -288,6 +288,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "vhost-user-multiqueue", /* 190 */ "arm-virt-pci", + "block-write-threshold", + "auto-node-names", ); @@ -1499,6 +1501,8 @@ struct virQEMUCapsStringFlags virQEMUCapsCommands[] = { { "nbd-server-start", QEMU_CAPS_NBD_SERVER }, { "change-backing-file", QEMU_CAPS_CHANGE_BACKING_FILE }, { "rtc-reset-reinjection", QEMU_CAPS_RTC_RESET_REINJECTION }, + { "block-set-write-threshold", QEMU_CAPS_BLOCK_WRITE_THRESHOLD }, + { "query-named-block-nodes", QEMU_CAPS_AUTO_NODE_NAMES }, }; struct virQEMUCapsStringFlags virQEMUCapsMigration[] = { @@ -2359,6 +2363,12 @@ virQEMUCapsProbeQMPCommands(virQEMUCapsPtr qemuCaps, qemuMonitorSupportsActiveCommit(mon)) virQEMUCapsSet(qemuCaps, QEMU_CAPS_ACTIVE_COMMIT); + /* Qemu 2.3 did not automatically set node names, so querying node + * information is useless if we don't supply them. */ + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES) && + !qemuMonitorSupportsNodeNames(mon)) + virQEMUCapsClear(qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES); + return 0; } diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index f4180a8..32877c8 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -1,7 +1,7 @@ /* * qemu_capabilities.h: QEMU capabilities generation * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -231,6 +231,8 @@ typedef enum { QEMU_CAPS_CPU_AARCH64_OFF = 189, /* -cpu ...,aarch64=off */ QEMU_CAPS_VHOSTUSER_MULTIQUEUE = 190, /* vhost-user with -netdev queues= */ QEMU_CAPS_ARM_VIRT_PCI = 191, /* ARM 'virt' machine has PCI bus */ + QEMU_CAPS_BLOCK_WRITE_THRESHOLD = 192, /* block-set-write-threshold */ + QEMU_CAPS_AUTO_NODE_NAMES = 193, /* query-named-block-nodes is useful */ QEMU_CAPS_LAST, /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 904d682..2e9e2de 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3082,6 +3082,18 @@ qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon) } +/* Probe whether node names are automatically assigned by a given qemu + * binary; assumes that fd set 0 is available for device hotplug. */ +bool +qemuMonitorSupportsNodeNames(qemuMonitorPtr mon) +{ + if (!mon || !mon->json) + return false; + + return qemuMonitorJSONSupportsNodeNames(mon); +} + + /* Determine the name that qemu is using for tracking the backing * element TARGET within the chain starting at TOP. */ char * diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 1afc344..e198c06 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -739,6 +739,7 @@ int qemuMonitorBlockCommit(qemuMonitorPtr mon, unsigned long long bandwidth) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); bool qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon); +bool qemuMonitorSupportsNodeNames(qemuMonitorPtr mon); char *qemuMonitorDiskNameLookup(qemuMonitorPtr mon, const char *device, virStorageSourcePtr top, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 71e8f4b..e4701aa 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3900,6 +3900,71 @@ qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, } +/* Used only for capability probing. Assumes that fd set 0 is already + * connected to /dev/null and that we have no existing nodes; we then + * try to add the fd as a block device, and see if the new device has + * a node name assigned by qemu. */ +bool +qemuMonitorJSONSupportsNodeNames(qemuMonitorPtr mon) +{ + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr options = virJSONValueNewObject(); + bool ret = false; + virJSONValuePtr nodes; + + /* Create a node. */ + /* TODO: We could try using 'null-co' instead of 'file' as the + * driver, if we don't want to depend on /dev/fdset/0 already + * being present. */ + if (!options || + virJSONValueObjectAdd(options, + "s:driver", "file", + "s:id", "drive-fdset0", + "s:filename", "/dev/fdset/0", + "b:read-only", true, + NULL) < 0) + goto cleanup; + cmd = qemuMonitorJSONMakeCommand("blockdev-add", + "a:options", options, + NULL); + if (!cmd) + goto cleanup; + options = NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + if (virJSONValueObjectGet(reply, "error")) + goto cleanup; + + /* Then query for all nodes. */ + virJSONValueFree(cmd); + virJSONValueFree(reply); + cmd = qemuMonitorJSONMakeCommand("query-named-block-nodes", NULL); + if (!cmd) + goto cleanup; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + if (virJSONValueObjectGet(reply, "error")) + goto cleanup; + + /* The array will be empty unless an automatic node name was assigned. */ + /* TODO: If we are paranoid that other named nodes might already + * exist before this probe, we could iterate the returned array + * and look for the fdset file name we just added. */ + nodes = virJSONValueObjectGet(reply, "return"); + if (nodes && virJSONValueArraySize(nodes) > 0) + ret = true; + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + virJSONValueFree(options); + + return ret; + +} + + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index c0ee4ce..a8ae411 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -282,6 +282,9 @@ char *qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +bool qemuMonitorJSONSupportsNodeNames(qemuMonitorPtr mon) + ATTRIBUTE_NONNULL(1); + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/tests/qemucapabilitiesdata/caps_2.1.1-1.replies b/tests/qemucapabilitiesdata/caps_2.1.1-1.replies index 511461a..04d4431 100644 --- a/tests/qemucapabilitiesdata/caps_2.1.1-1.replies +++ b/tests/qemucapabilitiesdata/caps_2.1.1-1.replies @@ -391,6 +391,18 @@ } { + "return": { + }, + "id": "libvirt-7" +} + +{ + "return": [ + ], + "id": "libvirt-8" +} + +{ "return": [ { "name": "VSERPORT_CHANGE" @@ -489,7 +501,7 @@ "name": "BLOCK_IMAGE_CORRUPTED" } ], - "id": "libvirt-7" + "id": "libvirt-9" } { @@ -1290,7 +1302,7 @@ "name": "fusbh200-ehci-usb" } ], - "id": "libvirt-8" + "id": "libvirt-10" } { @@ -1400,7 +1412,7 @@ "type": "uint32" } ], - "id": "libvirt-9" + "id": "libvirt-11" } { @@ -1562,11 +1574,11 @@ "type": "on/off" } ], - "id": "libvirt-10" + "id": "libvirt-12" } { - "id": "libvirt-11", + "id": "libvirt-13", "error": { "class": "DeviceNotFound", "desc": "Device 'virtio-blk-ccw' not found" @@ -1574,7 +1586,7 @@ } { - "id": "libvirt-12", + "id": "libvirt-14", "error": { "class": "DeviceNotFound", "desc": "Device 'virtio-net-ccw' not found" @@ -1582,7 +1594,7 @@ } { - "id": "libvirt-13", + "id": "libvirt-15", "error": { "class": "DeviceNotFound", "desc": "Device 'virtio-blk-s390' not found" @@ -1590,7 +1602,7 @@ } { - "id": "libvirt-14", + "id": "libvirt-16", "error": { "class": "DeviceNotFound", "desc": "Device 'virtio-net-s390' not found" @@ -1598,7 +1610,7 @@ } { - "id": "libvirt-15", + "id": "libvirt-17", "error": { "class": "DeviceNotFound", "desc": "Device 'pci-assign' not found" @@ -1648,7 +1660,7 @@ "type": "pci-host-devaddr" } ], - "id": "libvirt-16" + "id": "libvirt-18" } { @@ -1690,7 +1702,7 @@ "type": "pci-host-devaddr" } ], - "id": "libvirt-17" + "id": "libvirt-19" } { @@ -1776,7 +1788,7 @@ "type": "drive" } ], - "id": "libvirt-18" + "id": "libvirt-20" } { @@ -1830,7 +1842,7 @@ "type": "drive" } ], - "id": "libvirt-19" + "id": "libvirt-21" } { @@ -1880,7 +1892,7 @@ "type": "uint32" } ], - "id": "libvirt-20" + "id": "libvirt-22" } { @@ -1918,7 +1930,7 @@ "type": "chr" } ], - "id": "libvirt-21" + "id": "libvirt-23" } { @@ -1980,76 +1992,76 @@ "type": "uint32" } ], - "id": "libvirt-22" -} - -{ - "return": [ - { - "name": "lun", - "type": "uint32" - }, - { - "name": "scsi-id", - "type": "uint32" - }, - { - "name": "channel", - "type": "uint32" - }, - { - "name": "bootindex", - "type": "int32" - }, - { - "name": "drive", - "type": "drive" - } - ], - "id": "libvirt-23" -} - -{ - "return": [ - { - "name": "pci-hole64-end", - "type": "int" - }, - { - "name": "pci-hole64-start", - "type": "int" - }, - { - "name": "pci-hole-end", - "type": "int" - }, - { - "name": "pci-hole-start", - "type": "int" - }, - { - "name": "pci-conf-data[0]", - "type": "child<qemu:memory-region>" - }, - { - "name": "pci-conf-idx[0]", - "type": "child<qemu:memory-region>" - }, - { - "name": "short_root_bus", - "type": "uint32" - }, - { - "name": "pci-hole64-size", - "type": "size" - } - ], "id": "libvirt-24" } { "return": [ { + "name": "lun", + "type": "uint32" + }, + { + "name": "scsi-id", + "type": "uint32" + }, + { + "name": "channel", + "type": "uint32" + }, + { + "name": "bootindex", + "type": "int32" + }, + { + "name": "drive", + "type": "drive" + } + ], + "id": "libvirt-25" +} + +{ + "return": [ + { + "name": "pci-hole64-end", + "type": "int" + }, + { + "name": "pci-hole64-start", + "type": "int" + }, + { + "name": "pci-hole-end", + "type": "int" + }, + { + "name": "pci-hole-start", + "type": "int" + }, + { + "name": "pci-conf-data[0]", + "type": "child<qemu:memory-region>" + }, + { + "name": "pci-conf-idx[0]", + "type": "child<qemu:memory-region>" + }, + { + "name": "short_root_bus", + "type": "uint32" + }, + { + "name": "pci-hole64-size", + "type": "size" + } + ], + "id": "libvirt-26" +} + +{ + "return": [ + { "name": "mcfg_size", "type": "int" }, @@ -2094,7 +2106,7 @@ "type": "uint64" } ], - "id": "libvirt-25" + "id": "libvirt-27" } { @@ -2148,7 +2160,7 @@ "type": "drive" } ], - "id": "libvirt-26" + "id": "libvirt-28" } { @@ -2162,7 +2174,7 @@ "type": "uint32" } ], - "id": "libvirt-27" + "id": "libvirt-29" } { @@ -2196,106 +2208,6 @@ "type": "uint32" } ], - "id": "libvirt-28" -} - -{ - "return": [ - { - "name": "command_serr_enable", - "type": "on/off" - }, - { - "name": "multifunction", - "type": "on/off" - }, - { - "name": "rombar", - "type": "uint32" - }, - { - "name": "romfile", - "type": "string" - }, - { - "name": "addr", - "type": "pci-devfn" - }, - { - "name": "vgamem_mb", - "type": "uint32" - } - ], - "id": "libvirt-29" -} - -{ - "return": [ - { - "name": "command_serr_enable", - "type": "on/off" - }, - { - "name": "multifunction", - "type": "on/off" - }, - { - "name": "rombar", - "type": "uint32" - }, - { - "name": "romfile", - "type": "string" - }, - { - "name": "addr", - "type": "pci-devfn" - }, - { - "name": "surfaces", - "type": "int32" - }, - { - "name": "vgamem_mb", - "type": "uint32" - }, - { - "name": "vram64_size_mb", - "type": "uint32" - }, - { - "name": "vram_size_mb", - "type": "uint32" - }, - { - "name": "ram_size_mb", - "type": "uint32" - }, - { - "name": "cmdlog", - "type": "uint32" - }, - { - "name": "guestdebug", - "type": "uint32" - }, - { - "name": "debug", - "type": "uint32" - }, - { - "name": "revision", - "type": "uint32" - }, - { - "name": "vram_size", - "type": "uint32" - }, - { - "name": "ram_size", - "type": "uint32" - } - ], "id": "libvirt-30" } @@ -2322,54 +2234,154 @@ "type": "pci-devfn" }, { - "name": "surfaces", - "type": "int32" - }, - { "name": "vgamem_mb", "type": "uint32" - }, - { - "name": "vram64_size_mb", - "type": "uint32" - }, - { - "name": "vram_size_mb", - "type": "uint32" - }, - { - "name": "ram_size_mb", - "type": "uint32" - }, - { - "name": "cmdlog", - "type": "uint32" - }, - { - "name": "guestdebug", - "type": "uint32" - }, - { - "name": "debug", - "type": "uint32" - }, - { - "name": "revision", - "type": "uint32" - }, - { - "name": "vram_size", - "type": "uint32" - }, - { - "name": "ram_size", - "type": "uint32" } ], "id": "libvirt-31" } { + "return": [ + { + "name": "command_serr_enable", + "type": "on/off" + }, + { + "name": "multifunction", + "type": "on/off" + }, + { + "name": "rombar", + "type": "uint32" + }, + { + "name": "romfile", + "type": "string" + }, + { + "name": "addr", + "type": "pci-devfn" + }, + { + "name": "surfaces", + "type": "int32" + }, + { + "name": "vgamem_mb", + "type": "uint32" + }, + { + "name": "vram64_size_mb", + "type": "uint32" + }, + { + "name": "vram_size_mb", + "type": "uint32" + }, + { + "name": "ram_size_mb", + "type": "uint32" + }, + { + "name": "cmdlog", + "type": "uint32" + }, + { + "name": "guestdebug", + "type": "uint32" + }, + { + "name": "debug", + "type": "uint32" + }, + { + "name": "revision", + "type": "uint32" + }, + { + "name": "vram_size", + "type": "uint32" + }, + { + "name": "ram_size", + "type": "uint32" + } + ], + "id": "libvirt-32" +} + +{ + "return": [ + { + "name": "command_serr_enable", + "type": "on/off" + }, + { + "name": "multifunction", + "type": "on/off" + }, + { + "name": "rombar", + "type": "uint32" + }, + { + "name": "romfile", + "type": "string" + }, + { + "name": "addr", + "type": "pci-devfn" + }, + { + "name": "surfaces", + "type": "int32" + }, + { + "name": "vgamem_mb", + "type": "uint32" + }, + { + "name": "vram64_size_mb", + "type": "uint32" + }, + { + "name": "vram_size_mb", + "type": "uint32" + }, + { + "name": "ram_size_mb", + "type": "uint32" + }, + { + "name": "cmdlog", + "type": "uint32" + }, + { + "name": "guestdebug", + "type": "uint32" + }, + { + "name": "debug", + "type": "uint32" + }, + { + "name": "revision", + "type": "uint32" + }, + { + "name": "vram_size", + "type": "uint32" + }, + { + "name": "ram_size", + "type": "uint32" + } + ], + "id": "libvirt-33" +} + +{ "return": [ { "name": "pc-1.3", @@ -2479,7 +2491,7 @@ "cpu-max": 255 } ], - "id": "libvirt-32" + "id": "libvirt-34" } { @@ -2560,7 +2572,7 @@ "name": "qemu64" } ], - "id": "libvirt-33" + "id": "libvirt-35" } { @@ -2568,21 +2580,21 @@ "enabled": false, "present": true }, - "id": "libvirt-34" + "id": "libvirt-36" } { "return": [ "tpm-tis" ], - "id": "libvirt-35" + "id": "libvirt-37" } { "return": [ "passthrough" ], - "id": "libvirt-36" + "id": "libvirt-38" } { @@ -3442,7 +3454,7 @@ "option": "drive" } ], - "id": "libvirt-37" + "id": "libvirt-39" } { @@ -3464,5 +3476,5 @@ "capability": "zero-blocks" } ], - "id": "libvirt-38" + "id": "libvirt-40" } -- 2.4.3

Implement the QMP side of asking qemu what node name it assigned to an arbitrary top-level device node. Assumes that the caller will have already validated that the device is qcow2 backed by a block device, and that qemu auto-assigns node names. * src/qemu/qemu_monitor.h (qemuMonitorNodeNameLookup): New function. * src/qemu/qemu_monitor.c (qemuMonitorNodeNameLookup): Likewise. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONNodeNameLookup): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONNodeNameLookup): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_monitor.c | 15 ++++++++- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 77 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ 4 files changed, 97 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 2e9e2de..5612491 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3094,7 +3094,7 @@ qemuMonitorSupportsNodeNames(qemuMonitorPtr mon) } -/* Determine the name that qemu is using for tracking the backing +/* Determine the path name that qemu is using for tracking the backing * element TARGET within the chain starting at TOP. */ char * qemuMonitorDiskNameLookup(qemuMonitorPtr mon, @@ -3108,6 +3108,19 @@ qemuMonitorDiskNameLookup(qemuMonitorPtr mon, } +/* Determine the node name that qemu is using for tracking the raw + * file of the qcow2 protocol at DEVICE. */ +char * +qemuMonitorNodeNameLookup(qemuMonitorPtr mon, + const char *device) +{ + /* TODO - change signature to allow backing file lookups */ + QEMU_CHECK_MONITOR_JSON_NULL(mon); + + return qemuMonitorJSONNodeNameLookup(mon, device); +} + + /* Use the block-job-complete monitor command to pivot a block copy job. */ int qemuMonitorDrivePivot(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index e198c06..826835b 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -745,6 +745,9 @@ char *qemuMonitorDiskNameLookup(qemuMonitorPtr mon, virStorageSourcePtr top, virStorageSourcePtr target) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +char *qemuMonitorNodeNameLookup(qemuMonitorPtr mon, + const char *device) + ATTRIBUTE_NONNULL(2); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e4701aa..32b2719 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3835,6 +3835,8 @@ qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, } +/* Look up the image name that qemu is using for a member in a backing + * chain. */ char * qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, const char *device, @@ -3900,6 +3902,81 @@ qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, } +/* Look up the node name of the file underneath a qcow2 image. */ +char * +qemuMonitorJSONNodeNameLookup(qemuMonitorPtr mon, + const char *device) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-blockstats", NULL); + if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + if (!(devices = virJSONValueObjectGetArray(reply, "return"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr parent; + const char *thisdev; + const char *node; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats entry was not in expected format")); + goto cleanup; + } + + if (!(thisdev = virJSONValueObjectGetString(dev, "device"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats entry was not in expected format")); + goto cleanup; + } + + if (STRPREFIX(thisdev, QEMU_DRIVE_HOST_PREFIX)) + thisdev += strlen(QEMU_DRIVE_HOST_PREFIX); + if (STRNEQ(thisdev, device)) + continue; + + if (!(parent = virJSONValueObjectGetObject(dev, "parent"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("blockstats device %s missing parent node"), + device); + goto cleanup; + } + + if (!(node = virJSONValueObjectGetString(parent, "node-name"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("node name missing for device %s"), device); + goto cleanup; + } + ignore_value(VIR_STRDUP(ret, node)); + goto cleanup; + } + + /* If we get here, we didn't find the device. */ + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find node name for device %s"), + device); + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + + return ret; +} + + /* Used only for capability probing. Assumes that fd set 0 is already * connected to /dev/null and that we have no existing nodes; we then * try to add the fd as a block device, and see if the new device has diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a8ae411..f18295b 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -281,6 +281,9 @@ char *qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, virStorageSourcePtr target) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +char *qemuMonitorJSONNodeNameLookup(qemuMonitorPtr mon, + const char *device) + ATTRIBUTE_NONNULL(2); bool qemuMonitorJSONSupportsNodeNames(qemuMonitorPtr mon) ATTRIBUTE_NONNULL(1); -- 2.4.3

On Mon, Jun 22, 2015 at 17:06:42 -0600, Eric Blake wrote:
Implement the QMP side of asking qemu what node name it assigned to an arbitrary top-level device node. Assumes that the caller will have already validated that the device is qcow2 backed by a block device, and that qemu auto-assigns node names.
* src/qemu/qemu_monitor.h (qemuMonitorNodeNameLookup): New function. * src/qemu/qemu_monitor.c (qemuMonitorNodeNameLookup): Likewise. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONNodeNameLookup): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONNodeNameLookup): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_monitor.c | 15 ++++++++- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 77 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ 4 files changed, 97 insertions(+), 1 deletion(-)
...
@@ -3900,6 +3902,81 @@ qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, }
+/* Look up the node name of the file underneath a qcow2 image. */ +char * +qemuMonitorJSONNodeNameLookup(qemuMonitorPtr mon, + const char *device) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-blockstats", NULL);
Since this is yet another function that would call query-blockstats ...
+ if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + if (!(devices = virJSONValueObjectGetArray(reply, "return"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr parent; + const char *thisdev; + const char *node; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats entry was not in expected format")); + goto cleanup; + } + + if (!(thisdev = virJSONValueObjectGetString(dev, "device"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("blockstats entry was not in expected format")); + goto cleanup; + } + + if (STRPREFIX(thisdev, QEMU_DRIVE_HOST_PREFIX)) + thisdev += strlen(QEMU_DRIVE_HOST_PREFIX); + if (STRNEQ(thisdev, device)) + continue; + + if (!(parent = virJSONValueObjectGetObject(dev, "parent"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("blockstats device %s missing parent node"), + device); + goto cleanup; + } + + if (!(node = virJSONValueObjectGetString(parent, "node-name"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("node name missing for device %s"), device); + goto cleanup; + } + ignore_value(VIR_STRDUP(ret, node)); + goto cleanup; + } + + /* If we get here, we didn't find the device. */ + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find node name for device %s"), + device);
... and works only on one device and thus needs to be re-called for every single disk I'd rather see us fold this code into qemuMonitorJSONGetOneBlockStatsInfo so that all the info can be returned via qemuMonitorJSONGetAllBlockStatsInfo. I'll post a series that kills qemuMonitorJSONGetBlockExtent since it's basically the same instance. Peter

Set up functions to make it easy to map between libvirt disk names and qemu node names. Although we won't expose the information in XML, it is still nicer to cache the information than to have to grab a job lock, so that we are less likely to need to re-query the monitor when dealing with a qemu monitor event. We need the information in two places: one in the hash table used to collect QMP stats (as the QMP design requires using 'query-blockstats' to learn node names, followed by 'query-named-block-nodes' to learn statistics about those nodes), and the other in virStorageSourcePtr (the disk information tracked by libvirt). Making sure this doesn't leak memory was interesting; in particular, in qemu_driver.c:qemUDomainBlocksStatsGather(), I made sure that the returned result does not set an allocation name, so that the callers of that function can continue to use VIR_FREE. * src/util/virstoragefile.h (_virStorageSource): Add a field. * src/util/virstoragefile.c (virStorageSourceBackingStoreClear): Clear it. (virStorageSourceCopy): Document that it is not deep-copied. * src/qemu/qemu_domain.c (qemuDomainDiskGetAllocationNode) (qemuDomainDiskResolveAllocationNode): New functions, to use the field. (qemuDomainDiskSupportsAllocationNode): New helper. * src/qemu/qemu_domain.h: Declare new functions. * src/qemu/qemu_monitor.h (_qemuBlockStats): Add fields. (qemuMonitorCleanBlockStats): New declaration. * src/qemu/qemu_monitor.c (qemuMonitorCleanBlockStats): New function. (qemuMonitorGetAllBlockStatsInfo): Use it to avoid leak. * src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlocksStatsGather): Don't leak memory. * tests/qemumonitortest.c (testBlockInfoData): Update test. --- src/qemu/qemu_domain.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 10 ++++- src/qemu/qemu_driver.c | 1 + src/qemu/qemu_migration.c | 2 +- src/qemu/qemu_monitor.c | 12 +++++- src/qemu/qemu_monitor.h | 4 ++ src/util/virstoragefile.c | 4 +- src/util/virstoragefile.h | 3 +- tests/qemumonitortest.c | 13 +++--- 9 files changed, 143 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6213fd9..d3ce7db 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2906,6 +2906,111 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver, } +static bool +qemuDomainDiskSupportsAllocationNode(virDomainDiskDefPtr disk) +{ + /* For now, only qcow2 images backed by block devices are supported. */ + if (disk->src->type != VIR_STORAGE_TYPE_BLOCK) + return false; + if (disk->src->format != VIR_STORAGE_FILE_QCOW2) + return false; + return true; +} + + +/* Determine the node name that qemu associates with allocation + * tracking. Cache the value to minimize queries. Return NULL if the + * storage source is not a qcow2 formatted block device (as those are + * the only devices where live allocation tracking is useful), or if + * qemu cannot determine a node name. Requires a live domain, and + * assumes that the job lock is held, as this may require monitor + * interaction. */ +const char * +qemuDomainDiskGetAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + virStorageSourcePtr src = disk->src; + char *name = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!qemuDomainDiskSupportsAllocationNode(disk)) + return NULL; + + if (src->allocation_node) + goto cleanup; + + qemuDomainObjEnterMonitor(driver, vm); + /* TODO: allow lookup of node name of backing images */ + name = qemuMonitorNodeNameLookup(priv->mon, disk->info.alias); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + goto cleanup; + + src->allocation_node = name; + name = NULL; + + cleanup: + VIR_FREE(name); + return src->allocation_node; +} + + +/* Determine the disk name that matches a node name returned by qemu + * in a threshold event. In the common case, no job is needed: the + * node name will be cached from when the threshold was + * registered. But if the cache is lost (such as over a libvirtd + * restart), JOB_OKAY states whether it is safe to rebuild the cache + * (requires a live domain and monitor interaction), instead of + * failing. */ +virDomainDiskDefPtr +qemuDomainDiskResolveAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *node, + bool job_okay) +{ + size_t i; + virDomainDiskDefPtr disk; + virDomainDiskDefPtr ret = NULL; + char *name = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + /* First pass - see if the node name is cached. */ + for (i = 0; i < vm->def->ndisks; i++) { + disk = vm->def->disks[i]; + if (!qemuDomainDiskSupportsAllocationNode(disk)) + continue; + if (STREQ_NULLABLE(disk->src->allocation_node, node)) + return disk; + } + + /* Second pass - query the monitor. */ + if (!job_okay) + return NULL; + for (i = 0; i < vm->def->ndisks; i++) { + disk = vm->def->disks[i]; + if (!qemuDomainDiskSupportsAllocationNode(disk) || + disk->src->allocation_node) + continue; + + qemuDomainObjEnterMonitor(driver, vm); + name = qemuMonitorNodeNameLookup(priv->mon, disk->info.alias); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + goto cleanup; + + disk->src->allocation_node = name; + name = NULL; + if (STREQ_NULLABLE(disk->src->allocation_node, node)) { + ret = disk; + break; + } + } + + cleanup: + VIR_FREE(name); + return ret; +} + + bool qemuDomainDiskBlockJobIsActive(virDomainDiskDefPtr disk) { diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 54e1e7b..9550349 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1,7 +1,7 @@ /* * qemu_domain.h: QEMU domain private state * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -408,6 +408,14 @@ int qemuDomainDetermineDiskChain(virQEMUDriverPtr driver, bool force_probe, bool report_broken); +const char *qemuDomainDiskGetAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); +virDomainDiskDefPtr qemuDomainDiskResolveAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *node, + bool job_okay); + int qemuDomainStorageFileInit(virQEMUDriverPtr driver, virDomainObjPtr vm, virStorageSourcePtr src); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c1373de..25bc76d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10967,6 +10967,7 @@ qemuDomainBlocksStatsGather(virQEMUDriverPtr driver, _("cannot find statistics for device '%s'"), diskAlias); goto cleanup; } + VIR_FREE(stats->allocation_node); **retstats = *stats; } else { diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 47d49cd..1651edc 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -569,7 +569,7 @@ qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, qemuBlockStats *entry; if (!stats) { - if (!(stats = virHashCreate(10, virHashValueFree))) + if (!(stats = virHashCreate(10, qemuMonitorCleanBlockStats))) goto cleanup; if (qemuDomainObjEnterMonitorAsync(driver, vm, diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 5612491..86dc4be 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1763,6 +1763,16 @@ qemuMonitorBlockInfoLookup(virHashTablePtr blockInfo, } +/* Callback for use in cleaning up a hash used by + * qemuMonitorGetAllBlockStatsInfo. */ +void +qemuMonitorCleanBlockStats(void *opaque, const void *name ATTRIBUTE_UNUSED) +{ + qemuBlockStatsPtr stats = opaque; + VIR_FREE(stats->allocation_node); + VIR_FREE(stats); +} + /** * qemuMonitorGetAllBlockStatsInfo: * @mon: monitor object @@ -1785,7 +1795,7 @@ qemuMonitorGetAllBlockStatsInfo(qemuMonitorPtr mon, QEMU_CHECK_MONITOR(mon); - if (!(*ret_stats = virHashCreate(10, virHashValueFree))) + if (!(*ret_stats = virHashCreate(10, qemuMonitorCleanBlockStats))) goto error; if (mon->json) { diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 826835b..c0ea2ee 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -379,8 +379,12 @@ struct _qemuBlockStats { unsigned long long capacity; unsigned long long physical; unsigned long long wr_highest_offset; + char *allocation_node; + unsigned long long write_threshold; }; +void qemuMonitorCleanBlockStats(void *opaque, const void *name); + int qemuMonitorGetAllBlockStatsInfo(qemuMonitorPtr mon, virHashTablePtr *ret_stats, bool backingChain) diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 6c3017c..a175e1d 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1,7 +1,7 @@ /* * virstoragefile.c: file utility functions for FS storage backend * - * Copyright (C) 2007-2014 Red Hat, Inc. + * Copyright (C) 2007-2015 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -1833,6 +1833,7 @@ virStorageSourceCopy(const virStorageSource *src, ret->capacity = src->capacity; ret->allocation = src->allocation; ret->physical = src->physical; + /* allocation node name cache is not copied */ ret->readonly = src->readonly; ret->shared = src->shared; @@ -2048,6 +2049,7 @@ virStorageSourceClear(virStorageSourcePtr def) virStorageSourceSeclabelsClear(def); virStoragePermsFree(def->perms); VIR_FREE(def->timestamps); + VIR_FREE(def->allocation_node); virStorageNetHostDefFree(def->nhosts, def->hosts); virStorageAuthDefFree(def->auth); diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index aa17a00..8b2f491 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -1,7 +1,7 @@ /* * virstoragefile.h: file utility functions for FS storage backend * - * Copyright (C) 2007-2009, 2012-2014 Red Hat, Inc. + * Copyright (C) 2007-2009, 2012-2015 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -260,6 +260,7 @@ struct _virStorageSource { unsigned long long capacity; /* in bytes, 0 if unknown */ unsigned long long allocation; /* in bytes, 0 if unknown */ unsigned long long physical; /* in bytes, 0 if unknown */ + char *allocation_node; /* cache of node name for tracking allocation */ size_t nseclabels; virSecurityDeviceLabelDefPtr *seclabels; diff --git a/tests/qemumonitortest.c b/tests/qemumonitortest.c index 0125962..b580416 100644 --- a/tests/qemumonitortest.c +++ b/tests/qemumonitortest.c @@ -93,12 +93,13 @@ struct blockInfoData { static const struct blockInfoData testBlockInfoData[] = { -/* NAME, rd_req, rd_bytes, wr_req, wr_bytes, rd_total_time, wr_total_time, flush_req, flush_total_time */ - {"vda", {11, 12, 13, 14, 15, 16, 17, 18, 0, 0, 0}}, - {"vdb", {21, 22, 23, 24, 25, 26, 27, 28, 0, 0, 0}}, - {"vdc", {31, 32, 33, -1, 35, 36, 37, 38, 0, 0, 0}}, - {"vdd", {-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0}}, - {"vde", {41, 42, 43, 44, 45, 46, 47, 48, 0, 0, 0}} + /* NAME, rd_req, rd_bytes, wr_req, wr_bytes, rd_total_time, wr_total_time, + * flush_req, flush_total_time, allocation_node, write_threshold */ + {"vda", {11, 12, 13, 14, 15, 16, 17, 18, 0, 0, 0, NULL, 0}}, + {"vdb", {21, 22, 23, 24, 25, 26, 27, 28, 0, 0, 0, NULL, 0}}, + {"vdc", {31, 32, 33, -1, 35, 36, 37, 38, 0, 0, 0, NULL, 0}}, + {"vdd", {-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, NULL, 0}}, + {"vde", {41, 42, 43, 44, 45, 46, 47, 48, 0, 0, 0, NULL, 0}}, }; static const char testBlockInfoReply[] = -- 2.4.3

On Mon, Jun 22, 2015 at 17:06:43 -0600, Eric Blake wrote:
Set up functions to make it easy to map between libvirt disk names and qemu node names. Although we won't expose the information in XML, it is still nicer to cache the information than to have to grab a job lock, so that we are less likely to need to re-query the monitor when dealing with a qemu monitor event. We need the information in two places: one in the hash table used to collect QMP stats (as the QMP design requires using 'query-blockstats' to learn node names, followed by 'query-named-block-nodes' to learn statistics about those nodes), and the other in virStorageSourcePtr (the disk information tracked by libvirt). Making sure this doesn't leak memory was interesting; in particular, in qemu_driver.c:qemUDomainBlocksStatsGather(), I made sure that the returned result does not set an allocation name, so that the callers of that function can continue to use VIR_FREE.
* src/util/virstoragefile.h (_virStorageSource): Add a field. * src/util/virstoragefile.c (virStorageSourceBackingStoreClear): Clear it. (virStorageSourceCopy): Document that it is not deep-copied. * src/qemu/qemu_domain.c (qemuDomainDiskGetAllocationNode) (qemuDomainDiskResolveAllocationNode): New functions, to use the field. (qemuDomainDiskSupportsAllocationNode): New helper. * src/qemu/qemu_domain.h: Declare new functions. * src/qemu/qemu_monitor.h (_qemuBlockStats): Add fields. (qemuMonitorCleanBlockStats): New declaration. * src/qemu/qemu_monitor.c (qemuMonitorCleanBlockStats): New function. (qemuMonitorGetAllBlockStatsInfo): Use it to avoid leak. * src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlocksStatsGather): Don't leak memory. * tests/qemumonitortest.c (testBlockInfoData): Update test. --- src/qemu/qemu_domain.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 10 ++++- src/qemu/qemu_driver.c | 1 + src/qemu/qemu_migration.c | 2 +- src/qemu/qemu_monitor.c | 12 +++++- src/qemu/qemu_monitor.h | 4 ++ src/util/virstoragefile.c | 4 +- src/util/virstoragefile.h | 3 +- tests/qemumonitortest.c | 13 +++--- 9 files changed, 143 insertions(+), 11 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6213fd9..d3ce7db 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2906,6 +2906,111 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver, }
+static bool +qemuDomainDiskSupportsAllocationNode(virDomainDiskDefPtr disk) +{ + /* For now, only qcow2 images backed by block devices are supported. */ + if (disk->src->type != VIR_STORAGE_TYPE_BLOCK) + return false; + if (disk->src->format != VIR_STORAGE_FILE_QCOW2) + return false; + return true; +} + + +/* Determine the node name that qemu associates with allocation + * tracking. Cache the value to minimize queries. Return NULL if the + * storage source is not a qcow2 formatted block device (as those are + * the only devices where live allocation tracking is useful), or if + * qemu cannot determine a node name. Requires a live domain, and + * assumes that the job lock is held, as this may require monitor + * interaction. */ +const char * +qemuDomainDiskGetAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + virStorageSourcePtr src = disk->src; + char *name = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!qemuDomainDiskSupportsAllocationNode(disk)) + return NULL; + + if (src->allocation_node) + goto cleanup; + + qemuDomainObjEnterMonitor(driver, vm); + /* TODO: allow lookup of node name of backing images */ + name = qemuMonitorNodeNameLookup(priv->mon, disk->info.alias); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + goto cleanup; + + src->allocation_node = name; + name = NULL; + + cleanup: + VIR_FREE(name); + return src->allocation_node; +} + + +/* Determine the disk name that matches a node name returned by qemu + * in a threshold event. In the common case, no job is needed: the + * node name will be cached from when the threshold was + * registered. But if the cache is lost (such as over a libvirtd + * restart), JOB_OKAY states whether it is safe to rebuild the cache + * (requires a live domain and monitor interaction), instead of + * failing. */ +virDomainDiskDefPtr +qemuDomainDiskResolveAllocationNode(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *node, + bool job_okay) +{ + size_t i; + virDomainDiskDefPtr disk; + virDomainDiskDefPtr ret = NULL; + char *name = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + /* First pass - see if the node name is cached. */ + for (i = 0; i < vm->def->ndisks; i++) { + disk = vm->def->disks[i]; + if (!qemuDomainDiskSupportsAllocationNode(disk)) + continue; + if (STREQ_NULLABLE(disk->src->allocation_node, node)) + return disk; + } + + /* Second pass - query the monitor. */ + if (!job_okay) + return NULL; + for (i = 0; i < vm->def->ndisks; i++) { + disk = vm->def->disks[i]; + if (!qemuDomainDiskSupportsAllocationNode(disk) || + disk->src->allocation_node) + continue; + + qemuDomainObjEnterMonitor(driver, vm); + name = qemuMonitorNodeNameLookup(priv->mon, disk->info.alias); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + goto cleanup; + + disk->src->allocation_node = name; + name = NULL; + if (STREQ_NULLABLE(disk->src->allocation_node, node)) { + ret = disk; + break; + } + } + + cleanup: + VIR_FREE(name); + return ret; +}
Rather than doing the ad-hoc back and forth mapping I think we should cache all node names for all disks when reconnecting to the monitor and when we complete a snapshot operation. This will simplfiy the event code quite a bit and additionally it will be more future proof.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c1373de..25bc76d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10967,6 +10967,7 @@ qemuDomainBlocksStatsGather(virQEMUDriverPtr driver, _("cannot find statistics for device '%s'"), diskAlias); goto cleanup; } + VIR_FREE(stats->allocation_node);
The callers should use qemuMonitorCleanBlockStats rather than deleting it here.
**retstats = *stats; } else {
...
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 5612491..86dc4be 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1763,6 +1763,16 @@ qemuMonitorBlockInfoLookup(virHashTablePtr blockInfo, }
+/* Callback for use in cleaning up a hash used by + * qemuMonitorGetAllBlockStatsInfo. */ +void +qemuMonitorCleanBlockStats(void *opaque, const void *name ATTRIBUTE_UNUSED)
We use either Clear or Free. I'd go with qemuMonitorBlockStatsFree.
+{ + qemuBlockStatsPtr stats = opaque; + VIR_FREE(stats->allocation_node); + VIR_FREE(stats); +} + /** * qemuMonitorGetAllBlockStatsInfo: * @mon: monitor object
...
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 826835b..c0ea2ee 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -379,8 +379,12 @@ struct _qemuBlockStats { unsigned long long capacity; unsigned long long physical; unsigned long long wr_highest_offset; + char *allocation_node;
See below ...
diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index aa17a00..8b2f491 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -1,7 +1,7 @@ /* * virstoragefile.h: file utility functions for FS storage backend * - * Copyright (C) 2007-2009, 2012-2014 Red Hat, Inc. + * Copyright (C) 2007-2009, 2012-2015 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -260,6 +260,7 @@ struct _virStorageSource { unsigned long long capacity; /* in bytes, 0 if unknown */ unsigned long long allocation; /* in bytes, 0 if unknown */ unsigned long long physical; /* in bytes, 0 if unknown */ + char *allocation_node; /* cache of node name for tracking allocation */
The node name will be useful for other operations too so I'd choose a more universal name.
size_t nseclabels; virSecurityDeviceLabelDefPtr *seclabels;
Peter

Expose threshold information by collecting the information from QMP commands. qemu 2.3 has a way to get threshold information on all elements of a block chain, but only when node names are used - what's worse, the threshold information is only provided by 'query-named-block-nodes', but the mapping between devices and nodes is only provided by 'query-blockstats'. Rather than manage node names ourselves, we rely on qemu 2.4 doing auto naming; then when collecting the stats, we make a pass through both query functions. I chose to go with the naive O(n^2) mapping algorithm; if it turns out to be slow in practice on a guest with lots of nodes, we can enhance it via a sorted list or hash table lookup. * src/qemu/qemu_monitor.h (qemuMonitorBlockStatsUpdateThreshold): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorBlockStatsUpdateThreshold): New function. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockStatsUpdateThreshold): New prototype. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDevGetBlockExtent): Populate node name. (qemuMonitorJSONBlockStatsUpdateThreshold) (qemuMonitorJSONBlockStatsUpdateOneThreshold): Use it to populate threshold data. (qemuMonitorJSONGetOneBlockStatsInfo) (qemuMonitorJSONGetBlockExtent): Update callers. * src/qemu/qemu_driver.c (qemuDomainGetStatsOneBlock): Expose threshold data. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 12 ++++- src/qemu/qemu_monitor.c | 13 ++++++ src/qemu/qemu_monitor.h | 4 ++ src/qemu/qemu_monitor_json.c | 102 ++++++++++++++++++++++++++++++++++++++----- src/qemu/qemu_monitor_json.h | 2 + 5 files changed, 121 insertions(+), 12 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 25bc76d..72e256b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19282,6 +19282,12 @@ qemuDomainGetStatsOneBlock(virQEMUDriverPtr driver, QEMU_ADD_BLOCK_PARAM_LL(record, maxparams, block_idx, "fl.times", entry->flush_total_times); + /* TODO: Until we can set thresholds on backing elements, we only + * report the threshold on the active layer. */ + if (!backing_idx) + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, + "write-threshold", entry->write_threshold); + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, "allocation", entry->wr_highest_offset); @@ -19323,9 +19329,13 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, dom); rc = qemuMonitorGetAllBlockStatsInfo(priv->mon, &stats, visitBacking); - if (rc >= 0) + if (rc >= 0) { ignore_value(qemuMonitorBlockStatsUpdateCapacity(priv->mon, stats, visitBacking)); + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES)) + ignore_value(qemuMonitorBlockStatsUpdateThreshold(priv->mon, + stats)); + } if (qemuDomainObjExitMonitor(driver, dom) < 0) goto cleanup; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 86dc4be..a3ad740 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1838,6 +1838,19 @@ qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, } +/* Updates "stats" to fill write threshold of the image */ +int +qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) +{ + VIR_DEBUG("stats=%p", stats); + + QEMU_CHECK_MONITOR_JSON(mon); + + return qemuMonitorJSONBlockStatsUpdateThreshold(mon, stats); +} + + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index c0ea2ee..541f774 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -395,6 +395,10 @@ int qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, bool backingChain) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 32b2719..885b6f4 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1668,9 +1668,13 @@ typedef enum { } qemuMonitorBlockExtentError; +/* Get the highest written extent. Additionally, if NODE is not null, + * and a node name is associated with the extent, then return that + * name; but failure to get a node name is not fatal. */ static int qemuMonitorJSONDevGetBlockExtent(virJSONValuePtr dev, - unsigned long long *extent) + unsigned long long *extent, + char **node) { virJSONValuePtr stats; virJSONValuePtr parent; @@ -1678,6 +1682,11 @@ qemuMonitorJSONDevGetBlockExtent(virJSONValuePtr dev, if ((parent = virJSONValueObjectGetObject(dev, "parent")) == NULL) return QEMU_MONITOR_BLOCK_EXTENT_ERROR_NOPARENT; + if (node) { + const char *name = virJSONValueObjectGetString(parent, "node-name"); + ignore_value(VIR_STRDUP_QUIET(*node, name)); + } + if ((stats = virJSONValueObjectGetObject(parent, "stats")) == NULL) return QEMU_MONITOR_BLOCK_EXTENT_ERROR_NOSTATS; @@ -1725,18 +1734,23 @@ qemuMonitorJSONGetOneBlockStatsInfo(virJSONValuePtr dev, goto cleanup; \ } \ } - QEMU_MONITOR_BLOCK_STAT_GET("rd_bytes", bstats->rd_bytes, true); - QEMU_MONITOR_BLOCK_STAT_GET("wr_bytes", bstats->wr_bytes, true); - QEMU_MONITOR_BLOCK_STAT_GET("rd_operations", bstats->rd_req, true); - QEMU_MONITOR_BLOCK_STAT_GET("wr_operations", bstats->wr_req, true); - QEMU_MONITOR_BLOCK_STAT_GET("rd_total_time_ns", bstats->rd_total_times, false); - QEMU_MONITOR_BLOCK_STAT_GET("wr_total_time_ns", bstats->wr_total_times, false); - QEMU_MONITOR_BLOCK_STAT_GET("flush_operations", bstats->flush_req, false); - QEMU_MONITOR_BLOCK_STAT_GET("flush_total_time_ns", bstats->flush_total_times, false); + QEMU_MONITOR_BLOCK_STAT_GET("rd_bytes", bstats->rd_bytes, true); + QEMU_MONITOR_BLOCK_STAT_GET("wr_bytes", bstats->wr_bytes, true); + QEMU_MONITOR_BLOCK_STAT_GET("rd_operations", bstats->rd_req, true); + QEMU_MONITOR_BLOCK_STAT_GET("wr_operations", bstats->wr_req, true); + QEMU_MONITOR_BLOCK_STAT_GET("rd_total_time_ns", bstats->rd_total_times, + false); + QEMU_MONITOR_BLOCK_STAT_GET("wr_total_time_ns", bstats->wr_total_times, + false); + QEMU_MONITOR_BLOCK_STAT_GET("flush_operations", bstats->flush_req, + false); + QEMU_MONITOR_BLOCK_STAT_GET("flush_total_time_ns", + bstats->flush_total_times, false); #undef QEMU_MONITOR_BLOCK_STAT_GET /* it's ok to not have this information here. Just skip silently. */ - qemuMonitorJSONDevGetBlockExtent(dev, &bstats->wr_highest_offset); + qemuMonitorJSONDevGetBlockExtent(dev, &bstats->wr_highest_offset, + &bstats->allocation_node); if (virHashAddEntry(hash, entry_name, bstats) < 0) goto cleanup; @@ -1937,6 +1951,72 @@ qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, } +/* Hash table callback: Best effort routine to update write-threshold + * of a given qemuBlockStatsPtr (payload) using the QMP results for + * all nodes (opaque). */ +static void +qemuMonitorJSONBlockStatsUpdateOneThreshold(void *payload, + const void *entry ATTRIBUTE_UNUSED, + void *opaque) +{ + qemuBlockStatsPtr data = payload; + virJSONValuePtr nodes = opaque; + size_t i; + + if (!data->allocation_node) + return; + for (i = 0; i < virJSONValueArraySize(nodes); i++) { + virJSONValuePtr node = virJSONValueArrayGet(nodes, i); + const char *name = virJSONValueObjectGetString(node, "node-name"); + + if (STREQ_NULLABLE(data->allocation_node, name) && + virJSONValueObjectGetNumberUlong(node, "write_threshold", + &data->write_threshold) == 0) + break; + } +} + + +int +qemuMonitorJSONBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) +{ + int ret = -1; + int rc; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + + if (!(cmd = qemuMonitorJSONMakeCommand("query-named-block-nodes", NULL))) + return -1; + if ((rc = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + if (!(devices = virJSONValueObjectGetArray(reply, "return"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-named-block-nodes reply was missing list")); + goto cleanup; + } + /* O(n^2) collection pass because we visit the entire nodes list + * for each stats entry. Hopefully there aren't a boat-load of + * nodes to make this noticeably slower. If we need, we can do a + * pre-processing pass over devices to reach O(n log n) (via + * sorted list) or amortized O(n) (via a hash table) layout where + * node name lookup is more efficient. */ + if (virJSONValueArraySize(devices) > 0) + virHashForEach(stats, qemuMonitorJSONBlockStatsUpdateOneThreshold, + devices); + ret = 0; + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + + static int qemuMonitorJSONReportBlockExtentError(qemuMonitorBlockExtentError error) { @@ -2025,7 +2105,7 @@ int qemuMonitorJSONGetBlockExtent(qemuMonitorPtr mon, continue; found = true; - if ((err = qemuMonitorJSONDevGetBlockExtent(dev, extent)) != + if ((err = qemuMonitorJSONDevGetBlockExtent(dev, extent, NULL)) != QEMU_MONITOR_BLOCK_EXTENT_ERROR_OK) { qemuMonitorJSONReportBlockExtentError(err); goto cleanup; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index f18295b..2f90b5c 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -77,6 +77,8 @@ int qemuMonitorJSONGetAllBlockStatsInfo(qemuMonitorPtr mon, int qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, virHashTablePtr stats, bool backingChain); +int qemuMonitorJSONBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats); int qemuMonitorJSONGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); -- 2.4.3

On Mon, Jun 22, 2015 at 17:06:44 -0600, Eric Blake wrote:
Expose threshold information by collecting the information from QMP commands. qemu 2.3 has a way to get threshold information on all elements of a block chain, but only when node names are used - what's worse, the threshold information is only provided by 'query-named-block-nodes', but the mapping between devices and nodes is only provided by 'query-blockstats'. Rather than manage node names ourselves, we rely on qemu 2.4 doing auto naming; then when collecting the stats, we make a pass through both query functions.
I chose to go with the naive O(n^2) mapping algorithm; if it turns out to be slow in practice on a guest with lots of nodes, we can enhance it via a sorted list or hash table lookup.
* src/qemu/qemu_monitor.h (qemuMonitorBlockStatsUpdateThreshold): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorBlockStatsUpdateThreshold): New function. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockStatsUpdateThreshold): New prototype. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDevGetBlockExtent): Populate node name. (qemuMonitorJSONBlockStatsUpdateThreshold) (qemuMonitorJSONBlockStatsUpdateOneThreshold): Use it to populate threshold data. (qemuMonitorJSONGetOneBlockStatsInfo) (qemuMonitorJSONGetBlockExtent): Update callers. * src/qemu/qemu_driver.c (qemuDomainGetStatsOneBlock): Expose threshold data.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 12 ++++- src/qemu/qemu_monitor.c | 13 ++++++ src/qemu/qemu_monitor.h | 4 ++ src/qemu/qemu_monitor_json.c | 102 ++++++++++++++++++++++++++++++++++++++----- src/qemu/qemu_monitor_json.h | 2 + 5 files changed, 121 insertions(+), 12 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 25bc76d..72e256b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -19282,6 +19282,12 @@ qemuDomainGetStatsOneBlock(virQEMUDriverPtr driver, QEMU_ADD_BLOCK_PARAM_LL(record, maxparams, block_idx, "fl.times", entry->flush_total_times);
+ /* TODO: Until we can set thresholds on backing elements, we only + * report the threshold on the active layer. */
While this is okay in this intermediate part I am not okay with merging this series without the full support for non-top layers. If we add a partial impl we will force the users to check whether given libvirt supports it. Since I know of only one consumer of this feature which is oVirt and they are interrested in both data we would increase the complexity of usefullnes of this feature.
+ if (!backing_idx) + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, + "write-threshold", entry->write_threshold); + QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx, "allocation", entry->wr_highest_offset);
@@ -19323,9 +19329,13 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver, qemuDomainObjEnterMonitor(driver, dom); rc = qemuMonitorGetAllBlockStatsInfo(priv->mon, &stats, visitBacking); - if (rc >= 0) + if (rc >= 0) { ignore_value(qemuMonitorBlockStatsUpdateCapacity(priv->mon, stats, visitBacking)); + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES)) + ignore_value(qemuMonitorBlockStatsUpdateThreshold(priv->mon, + stats)); + } if (qemuDomainObjExitMonitor(driver, dom) < 0) goto cleanup;
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 86dc4be..a3ad740 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1838,6 +1838,19 @@ qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, }
+/* Updates "stats" to fill write threshold of the image */ +int +qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) +{ + VIR_DEBUG("stats=%p", stats); + + QEMU_CHECK_MONITOR_JSON(mon); + + return qemuMonitorJSONBlockStatsUpdateThreshold(mon, stats); +} + + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index c0ea2ee..541f774 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -395,6 +395,10 @@ int qemuMonitorBlockStatsUpdateCapacity(qemuMonitorPtr mon, bool backingChain) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+int qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 32b2719..885b6f4 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1668,9 +1668,13 @@ typedef enum { } qemuMonitorBlockExtentError;
+/* Get the highest written extent. Additionally, if NODE is not null, + * and a node name is associated with the extent, then return that + * name; but failure to get a node name is not fatal. */ static int qemuMonitorJSONDevGetBlockExtent(virJSONValuePtr dev, - unsigned long long *extent) + unsigned long long *extent, + char **node)
I'll send a series today that will remove this function. qemuMonitorJSONGetOneBlockStatsInfo will need to do this operation so that we don't duplicate calls to query-blockstats.
{ virJSONValuePtr stats; virJSONValuePtr parent; @@ -1678,6 +1682,11 @@ qemuMonitorJSONDevGetBlockExtent(virJSONValuePtr dev, if ((parent = virJSONValueObjectGetObject(dev, "parent")) == NULL) return QEMU_MONITOR_BLOCK_EXTENT_ERROR_NOPARENT;
+ if (node) { + const char *name = virJSONValueObjectGetString(parent, "node-name"); + ignore_value(VIR_STRDUP_QUIET(*node, name)); + } + if ((stats = virJSONValueObjectGetObject(parent, "stats")) == NULL) return QEMU_MONITOR_BLOCK_EXTENT_ERROR_NOSTATS;
@@ -1725,18 +1734,23 @@ qemuMonitorJSONGetOneBlockStatsInfo(virJSONValuePtr dev, goto cleanup; \ } \ } - QEMU_MONITOR_BLOCK_STAT_GET("rd_bytes", bstats->rd_bytes, true); - QEMU_MONITOR_BLOCK_STAT_GET("wr_bytes", bstats->wr_bytes, true); - QEMU_MONITOR_BLOCK_STAT_GET("rd_operations", bstats->rd_req, true); - QEMU_MONITOR_BLOCK_STAT_GET("wr_operations", bstats->wr_req, true); - QEMU_MONITOR_BLOCK_STAT_GET("rd_total_time_ns", bstats->rd_total_times, false); - QEMU_MONITOR_BLOCK_STAT_GET("wr_total_time_ns", bstats->wr_total_times, false); - QEMU_MONITOR_BLOCK_STAT_GET("flush_operations", bstats->flush_req, false); - QEMU_MONITOR_BLOCK_STAT_GET("flush_total_time_ns", bstats->flush_total_times, false); + QEMU_MONITOR_BLOCK_STAT_GET("rd_bytes", bstats->rd_bytes, true); + QEMU_MONITOR_BLOCK_STAT_GET("wr_bytes", bstats->wr_bytes, true); + QEMU_MONITOR_BLOCK_STAT_GET("rd_operations", bstats->rd_req, true); + QEMU_MONITOR_BLOCK_STAT_GET("wr_operations", bstats->wr_req, true); + QEMU_MONITOR_BLOCK_STAT_GET("rd_total_time_ns", bstats->rd_total_times, + false); + QEMU_MONITOR_BLOCK_STAT_GET("wr_total_time_ns", bstats->wr_total_times, + false); + QEMU_MONITOR_BLOCK_STAT_GET("flush_operations", bstats->flush_req, + false); + QEMU_MONITOR_BLOCK_STAT_GET("flush_total_time_ns", + bstats->flush_total_times, false);
Spurious whitespace change.
#undef QEMU_MONITOR_BLOCK_STAT_GET
/* it's ok to not have this information here. Just skip silently. */ - qemuMonitorJSONDevGetBlockExtent(dev, &bstats->wr_highest_offset); + qemuMonitorJSONDevGetBlockExtent(dev, &bstats->wr_highest_offset, + &bstats->allocation_node);
if (virHashAddEntry(hash, entry_name, bstats) < 0) goto cleanup; @@ -1937,6 +1951,72 @@ qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, }
+/* Hash table callback: Best effort routine to update write-threshold + * of a given qemuBlockStatsPtr (payload) using the QMP results for + * all nodes (opaque). */ +static void +qemuMonitorJSONBlockStatsUpdateOneThreshold(void *payload, + const void *entry ATTRIBUTE_UNUSED, + void *opaque) +{ + qemuBlockStatsPtr data = payload; + virJSONValuePtr nodes = opaque; + size_t i; + + if (!data->allocation_node) + return; + for (i = 0; i < virJSONValueArraySize(nodes); i++) { + virJSONValuePtr node = virJSONValueArrayGet(nodes, i); + const char *name = virJSONValueObjectGetString(node, "node-name"); + + if (STREQ_NULLABLE(data->allocation_node, name) && + virJSONValueObjectGetNumberUlong(node, "write_threshold", + &data->write_threshold) == 0) + break; + } +} + + +int +qemuMonitorJSONBlockStatsUpdateThreshold(qemuMonitorPtr mon, + virHashTablePtr stats) +{ + int ret = -1; + int rc; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + + if (!(cmd = qemuMonitorJSONMakeCommand("query-named-block-nodes", NULL))) + return -1; + if ((rc = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + if (!(devices = virJSONValueObjectGetArray(reply, "return"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-named-block-nodes reply was missing list")); + goto cleanup; + } + /* O(n^2) collection pass because we visit the entire nodes list + * for each stats entry. Hopefully there aren't a boat-load of + * nodes to make this noticeably slower. If we need, we can do a + * pre-processing pass over devices to reach O(n log n) (via + * sorted list) or amortized O(n) (via a hash table) layout where
I think in the future it will be necessary to have the data from query-named-blocks available for use so I think that this helper should similarly to qemuMonitorJSONGetAllBlockStatsInfo return a hash of the useful data, in this case keyed by the node name, and the caller would do the disk->node matching when it requires the data if it requires it.
+ * node name lookup is more efficient. */ + if (virJSONValueArraySize(devices) > 0) + virHashForEach(stats, qemuMonitorJSONBlockStatsUpdateOneThreshold, + devices); + ret = 0; + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + + static int
Peter

With this patch, block write threshold events delivered by qemu are converted into libvirt events. This patch takes the easy road, and only reports events if the node name is still cached by libvirtd (true in the common case when libvirtd is not restarted, since you can't get an event if you didn't register a threshold). A followup patch will be needed to properly handle grabbing the job lock and getting the node name when the cache lookup fails, using code similar to how we update domain XML after a block job completes. But for getting the initial API in place, I figured this was a good enough start. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_process.c (qemuProcessHandleBlockThreshold): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_monitor.c | 14 ++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 31 +++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index a3ad740..c4b6073 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1433,6 +1433,20 @@ qemuMonitorEmitBlockJob(qemuMonitorPtr mon, } +int qemuMonitorEmitBlockThreshold(qemuMonitorPtr mon, + const char *node, + unsigned long long threshold, + unsigned long long length) +{ + int ret = -1; + VIR_DEBUG("mon=%p", mon); + + QEMU_MONITOR_CALLBACK(mon, ret, domainBlockThreshold, mon->vm, + node, threshold, length); + return ret; +} + + int qemuMonitorEmitBalloonChange(qemuMonitorPtr mon, unsigned long long actual) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 541f774..8c84cdb 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -146,6 +146,12 @@ typedef int (*qemuMonitorDomainBlockJobCallback)(qemuMonitorPtr mon, int type, int status, void *opaque); +typedef int (*qemuMonitorDomainBlockThresholdCallback)(qemuMonitorPtr mon, + virDomainObjPtr vm, + const char *node, + unsigned long long threshold, + unsigned long long length, + void *opaque); typedef int (*qemuMonitorDomainTrayChangeCallback)(qemuMonitorPtr mon, virDomainObjPtr vm, const char *devAlias, @@ -204,6 +210,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainIOErrorCallback domainIOError; qemuMonitorDomainGraphicsCallback domainGraphics; qemuMonitorDomainBlockJobCallback domainBlockJob; + qemuMonitorDomainBlockThresholdCallback domainBlockThreshold; qemuMonitorDomainTrayChangeCallback domainTrayChange; qemuMonitorDomainPMWakeupCallback domainPMWakeup; qemuMonitorDomainPMSuspendCallback domainPMSuspend; @@ -301,6 +308,10 @@ int qemuMonitorEmitBlockJob(qemuMonitorPtr mon, const char *diskAlias, int type, int status); +int qemuMonitorEmitBlockThreshold(qemuMonitorPtr mon, + const char *node, + unsigned long long threshold, + unsigned long long length); int qemuMonitorEmitBalloonChange(qemuMonitorPtr mon, unsigned long long actual); int qemuMonitorEmitPMSuspendDisk(qemuMonitorPtr mon); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 885b6f4..46383d7 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -77,6 +77,8 @@ static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr d static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockWriteThreshold(qemuMonitorPtr mon, + virJSONValuePtr data); static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspendDisk(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleGuestPanic(qemuMonitorPtr mon, virJSONValuePtr data); @@ -96,6 +98,7 @@ static qemuEventHandler eventHandlers[] = { { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, { "BLOCK_JOB_READY", qemuMonitorJSONHandleBlockJobReady, }, + { "BLOCK_WRITE_THRESHOLD", qemuMonitorJSONHandleBlockWriteThreshold, }, { "DEVICE_DELETED", qemuMonitorJSONHandleDeviceDeleted, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, @@ -843,6 +846,34 @@ qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, VIR_DOMAIN_BLOCK_JOB_READY); } + +static void +qemuMonitorJSONHandleBlockWriteThreshold(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + const char *node; + unsigned long long offset = 0, len = 0; + + if ((node = virJSONValueObjectGetString(data, "node-name")) == NULL) { + VIR_WARN("missing node in block threshold event"); + goto out; + } + + if (virJSONValueObjectGetNumberUlong(data, "write-threshold", + &offset) < 0) { + VIR_WARN("missing threshold in block threshold event"); + goto out; + } + + if (virJSONValueObjectGetNumberUlong(data, "amount-exceeded", &len) < 0) { + VIR_WARN("missing len in block threshold event"); + goto out; + } + + out: + qemuMonitorEmitBlockThreshold(mon, node, offset, len); +} + static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ba84182..5f582e5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1037,6 +1037,49 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, static int +qemuProcessHandleBlockThreshold(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *node, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + virQEMUDriverPtr driver = opaque; + virDomainDiskDefPtr disk; + virObjectEventPtr event = NULL; + const char *path = NULL; + + virObjectLock(vm); + + VIR_DEBUG("Block threshold for node %s (domain: %p,%s) threshold %llu " + "length %llu", node, vm, vm->def->name, threshold, length); + + /* TODO: If the node name is not found in the current domain XML, + * we need to grab a job lock and query the monitor to repopulate + * the node name cache. For now, if that happens, we just discard + * the event. */ + if (!(disk = qemuDomainDiskResolveAllocationNode(driver, vm, node, + false))) { + VIR_WARN("failed to locate disk matching node '%s'", node); + goto cleanup; + } + if (virStorageSourceIsLocalStorage(disk->src)) + path = disk->src->path; + + /* TODO: Once we support node names for more than just the active + * layer, we will need to map this into 'vda[1]' notation. */ + event = virDomainEventWriteThresholdNewFromObj(vm, disk->dst, path, + threshold, length); + + cleanup: + virObjectUnlock(vm); + if (event) + qemuDomainEventQueue(driver, event); + return 0; +} + + +static int qemuProcessHandleGraphics(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm, int phase, @@ -1522,6 +1565,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainIOError = qemuProcessHandleIOError, .domainGraphics = qemuProcessHandleGraphics, .domainBlockJob = qemuProcessHandleBlockJob, + .domainBlockThreshold = qemuProcessHandleBlockThreshold, .domainTrayChange = qemuProcessHandleTrayChange, .domainPMWakeup = qemuProcessHandlePMWakeup, .domainPMSuspend = qemuProcessHandlePMSuspend, -- 2.4.3

On Mon, Jun 22, 2015 at 17:06:45 -0600, Eric Blake wrote:
With this patch, block write threshold events delivered by qemu are converted into libvirt events.
This patch takes the easy road, and only reports events if the node name is still cached by libvirtd (true in the common case when libvirtd is not restarted, since you can't get an event if you didn't register a threshold). A followup patch will be needed to properly handle grabbing the job lock and getting the node name when the cache lookup fails, using code similar to how we update domain XML after a block job completes. But for getting the initial API in place, I figured this was a good enough start.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorEmitBlockThreshold): Likewise. * src/qemu/qemu_process.c (qemuProcessHandleBlockThreshold): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_monitor.c | 14 ++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 31 +++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+)
...
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ba84182..5f582e5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1037,6 +1037,49 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
static int +qemuProcessHandleBlockThreshold(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *node, + unsigned long long threshold, + unsigned long long length, + void *opaque) +{ + virQEMUDriverPtr driver = opaque; + virDomainDiskDefPtr disk; + virObjectEventPtr event = NULL; + const char *path = NULL; + + virObjectLock(vm); + + VIR_DEBUG("Block threshold for node %s (domain: %p,%s) threshold %llu " + "length %llu", node, vm, vm->def->name, threshold, length); + + /* TODO: If the node name is not found in the current domain XML, + * we need to grab a job lock and query the monitor to repopulate + * the node name cache. For now, if that happens, we just discard + * the event. */
This duality of operations is why we should cache the node names upfront.
+ if (!(disk = qemuDomainDiskResolveAllocationNode(driver, vm, node, + false))) { + VIR_WARN("failed to locate disk matching node '%s'", node); + goto cleanup; + } + if (virStorageSourceIsLocalStorage(disk->src)) + path = disk->src->path; + + /* TODO: Once we support node names for more than just the active + * layer, we will need to map this into 'vda[1]' notation. */ + event = virDomainEventWriteThresholdNewFromObj(vm, disk->dst, path, + threshold, length); +
Peter

Time to wire up the new API to call into the QMP command for setting a write threshold. For now, the API only allows setting the threshold on the active layer. But I left TODOs in the series for places that need touching to find and support node names of backing files, so that we can use "vda[1]" notation to register for a threshold on the backing image during an active commit operation. * src/qemu/qemu_driver.c (qemuDomainBlockSetWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockSetWriteThreshold): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 95 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 13 ++++++ src/qemu/qemu_monitor.h | 5 +++ src/qemu/qemu_monitor_json.c | 30 ++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ 5 files changed, 146 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 72e256b..732f102 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -17906,6 +17906,100 @@ qemuDomainGetBlockIoTune(virDomainPtr dom, return ret; } + +static int +qemuDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *path, + unsigned long long threshold, + unsigned int flags) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + virDomainDiskDefPtr disk = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg = NULL; + const char *node = NULL; + int ret = -1; + + virCheckFlags(VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION, -1); + + if (!(vm = qemuDomObjFromDomain(dom))) + return -1; + priv = vm->privateData; + + if (virDomainBlockSetWriteThresholdEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCK_WRITE_THRESHOLD) || + !virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("block write threshold not supported by this " + "qemu binary")); + goto cleanup; + } + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + if (!(disk = virDomainDiskByName(vm->def, path, false))) { + virReportError(VIR_ERR_INVALID_ARG, + _("invalid path %s not assigned to domain"), path); + goto endjob; + } + + if (!(node = qemuDomainDiskGetAllocationNode(driver, vm, disk))) { + virReportError(VIR_ERR_INVALID_ARG, + _("cannot track threshold events on disk '%s'"), + path); + goto endjob; + } + + cfg = virQEMUDriverGetConfig(driver); + if (qemuStorageLimitsRefresh(driver, cfg, vm, disk->src) < 0) + goto endjob; + + if (flags & VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION) { + /* Caller already sanitized max value to 1000000. Use of + * floating point intermediary reduces (but does not + * eliminate) rounding error, but since we already document a + * granularity of parts per million, it shouldn't matter too + * much if the answer is a few bytes off. */ + threshold *= disk->src->physical / 1e6; + } else { + if (threshold > disk->src->physical) { + virReportError(VIR_ERR_INVALID_ARG, + _("threshold %llu exceeds disk size %llu"), + threshold, disk->src->physical); + goto endjob; + } + } + + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorBlockSetWriteThreshold(priv->mon, node, threshold); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + ret = -1; + + + endjob: + qemuDomainObjEndJob(driver, vm); + + cleanup: + virDomainObjEndAPI(&vm); + virObjectUnref(caps); + virObjectUnref(cfg); + return ret; +} + + static int qemuDomainGetDiskErrors(virDomainPtr dom, virDomainDiskErrorPtr errors, @@ -20037,6 +20131,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .nodeSuspendForDuration = qemuNodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ .domainGetBlockIoTune = qemuDomainGetBlockIoTune, /* 0.9.8 */ + .domainBlockSetWriteThreshold = qemuDomainBlockSetWriteThreshold, /* 1.3.0 */ .domainSetNumaParameters = qemuDomainSetNumaParameters, /* 0.9.9 */ .domainGetNumaParameters = qemuDomainGetNumaParameters, /* 0.9.9 */ .domainGetInterfaceParameters = qemuDomainGetInterfaceParameters, /* 0.9.9 */ diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c4b6073..2ff34e2 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1866,6 +1866,19 @@ qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, int +qemuMonitorBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) +{ + VIR_DEBUG("dev_name=%s, threshold=%llu", dev_name, threshold); + + QEMU_CHECK_MONITOR_JSON(mon); + + return qemuMonitorJSONBlockSetWriteThreshold(mon, dev_name, threshold); +} + + +int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 8c84cdb..2d1afc4 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -410,6 +410,11 @@ int qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon, virHashTablePtr stats) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 46383d7..6cfba99 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2048,6 +2048,36 @@ qemuMonitorJSONBlockStatsUpdateThreshold(qemuMonitorPtr mon, } +int +qemuMonitorJSONBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold) +{ + int ret = -1; + int rc; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("block-set-write-threshold", + "s:node-name", dev_name, + "U:write-threshold", threshold, + NULL))) + return -1; + + if ((rc = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + ret = 0; + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + + static int qemuMonitorJSONReportBlockExtentError(qemuMonitorBlockExtentError error) { diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 2f90b5c..fe00f20 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -79,6 +79,9 @@ int qemuMonitorJSONBlockStatsUpdateCapacity(qemuMonitorPtr mon, bool backingChain); int qemuMonitorJSONBlockStatsUpdateThreshold(qemuMonitorPtr mon, virHashTablePtr stats); +int qemuMonitorJSONBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name, + unsigned long long threshold); int qemuMonitorJSONGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent); -- 2.4.3

On Mon, Jun 22, 2015 at 17:06:46 -0600, Eric Blake wrote:
Time to wire up the new API to call into the QMP command for setting a write threshold.
For now, the API only allows setting the threshold on the active layer. But I left TODOs in the series for places that need touching to find and support node names of backing files, so that we can use "vda[1]" notation to register for a threshold on the backing image during an active commit operation.
* src/qemu/qemu_driver.c (qemuDomainBlockSetWriteThreshold): New function. * src/qemu/qemu_monitor.c (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockSetWriteThreshold): Likewise. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockSetWriteThreshold): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 95 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 13 ++++++ src/qemu/qemu_monitor.h | 5 +++ src/qemu/qemu_monitor_json.c | 30 ++++++++++++++ src/qemu/qemu_monitor_json.h | 3 ++ 5 files changed, 146 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 72e256b..732f102 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -17906,6 +17906,100 @@ qemuDomainGetBlockIoTune(virDomainPtr dom, return ret; }
+ +static int +qemuDomainBlockSetWriteThreshold(virDomainPtr dom, + const char *path, + unsigned long long threshold, + unsigned int flags) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + virDomainDiskDefPtr disk = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg = NULL; + const char *node = NULL; + int ret = -1; + + virCheckFlags(VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION, -1); + + if (!(vm = qemuDomObjFromDomain(dom))) + return -1; + priv = vm->privateData; + + if (virDomainBlockSetWriteThresholdEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCK_WRITE_THRESHOLD) || + !virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_AUTO_NODE_NAMES)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("block write threshold not supported by this " + "qemu binary")); + goto cleanup; + } + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + if (!(disk = virDomainDiskByName(vm->def, path, false))) { + virReportError(VIR_ERR_INVALID_ARG, + _("invalid path %s not assigned to domain"), path); + goto endjob; + } + + if (!(node = qemuDomainDiskGetAllocationNode(driver, vm, disk))) { + virReportError(VIR_ERR_INVALID_ARG, + _("cannot track threshold events on disk '%s'"), + path); + goto endjob; + } + + cfg = virQEMUDriverGetConfig(driver); + if (qemuStorageLimitsRefresh(driver, cfg, vm, disk->src) < 0)
This function begins with a rather long comment explaining that it's not entirely a good idea to use it on online VMs and especially on write-enabled disk sources. Shouldn't we query the monitor rather than use that?
+ goto endjob; + + if (flags & VIR_DOMAIN_BLOCK_SET_WRITE_THRESHOLD_PROPORTION) { + /* Caller already sanitized max value to 1000000. Use of + * floating point intermediary reduces (but does not + * eliminate) rounding error, but since we already document a + * granularity of parts per million, it shouldn't matter too + * much if the answer is a few bytes off. */ + threshold *= disk->src->physical / 1e6; + } else { + if (threshold > disk->src->physical) { + virReportError(VIR_ERR_INVALID_ARG, + _("threshold %llu exceeds disk size %llu"), + threshold, disk->src->physical); + goto endjob; + } + } + + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorBlockSetWriteThreshold(priv->mon, node, threshold); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + ret = -1; + + + endjob: + qemuDomainObjEndJob(driver, vm); + + cleanup: + virDomainObjEndAPI(&vm); + virObjectUnref(caps); + virObjectUnref(cfg); + return ret; +} + + static int qemuDomainGetDiskErrors(virDomainPtr dom, virDomainDiskErrorPtr errors, @@ -20037,6 +20131,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .nodeSuspendForDuration = qemuNodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ .domainGetBlockIoTune = qemuDomainGetBlockIoTune, /* 0.9.8 */ + .domainBlockSetWriteThreshold = qemuDomainBlockSetWriteThreshold, /* 1.3.0 */ .domainSetNumaParameters = qemuDomainSetNumaParameters, /* 0.9.9 */ .domainGetNumaParameters = qemuDomainGetNumaParameters, /* 0.9.9 */ .domainGetInterfaceParameters = qemuDomainGetInterfaceParameters, /* 0.9.9 */ diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c4b6073..2ff34e2 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1866,6 +1866,19 @@ qemuMonitorBlockStatsUpdateThreshold(qemuMonitorPtr mon,
int +qemuMonitorBlockSetWriteThreshold(qemuMonitorPtr mon, + const char *dev_name,
This function needs a node name passed so it should also be called that way to avoid confusion since the function lacks docs.
+ unsigned long long threshold) +{ + VIR_DEBUG("dev_name=%s, threshold=%llu", dev_name, threshold); + + QEMU_CHECK_MONITOR_JSON(mon); + + return qemuMonitorJSONBlockSetWriteThreshold(mon, dev_name, threshold); +} + + +int qemuMonitorGetBlockExtent(qemuMonitorPtr mon, const char *dev_name, unsigned long long *extent)
Peter

On 06/22/2015 05:06 PM, Eric Blake wrote:
v2 was here: https://www.redhat.com/archives/libvir-list/2015-June/msg00591.html
This series depends on my yajl/json cleanups: https://www.redhat.com/archives/libvir-list/2015-June/msg01098.html
and can also be found here: git clone git://repo.or.cz/libvirt/ericb.git threshold http://repo.or.cz/w/libvirt/ericb.git/shortlog/refs/heads/threshold
Since then, I've rebased to master, addressed Peter's findings on v2, and tested that things actually work when combined with a single pending patch on qemu (I'll reply to this mail with the list id of that patch, where that patch in turn points to this series as justification).
Depends on this qemu patch: https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05770.html -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Peter Krempa