[libvirt] [PATCH v2 0/3] rbd: improvements to actual disk-usage calculation

From: Jason Dillaman <dillaman@redhat.com> The RBD fast-diff feature can vastly reduce the amount of time needed to calculate actual disk usage of volumes, but it might still be a slow operation for large RBD pools or pools with large RBD images. Therefore, this feature should be able to be optionally disabled if needed. Additionally, the fast-diff feature can only be used if the fast-diff map isn't flagged as invalid. Otherwise, librbd will silently perform a costly block-by-block scan to calculate the disk usage. since v1: - Moved RBD-unique refresh volume allocation override to shared storage pool attribute Jason Dillaman (3): rbd: do not attempt to use fast-diff if it's marked invalid storage: optional 'refresh_volume_allocation' attribute on pool rbd: optionally compute volume allocation from capacity docs/formatstorage.html.in | 15 +++++-- docs/schemas/storagecommon.rng | 7 ++++ docs/schemas/storagepool.rng | 5 +++ src/conf/storage_conf.c | 23 ++++++++++ src/conf/storage_conf.h | 9 ++++ src/storage/storage_backend_rbd.c | 42 +++++++++++++++++-- .../pool-rbd-refresh-volume-allocation.xml | 12 ++++++ .../pool-rbd-refresh-volume-allocation.xml | 15 +++++++ tests/storagepoolxml2xmltest.c | 1 + 9 files changed, 122 insertions(+), 7 deletions(-) create mode 100644 tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml create mode 100644 tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml -- 2.20.1

From: Jason Dillaman <dillaman@redhat.com> The librbd API will transparently revert to a slow disk usage calculation method if the fast-diff map is marked as invalid. Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- src/storage/storage_backend_rbd.c | 41 ++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 2b7af1db23..e67911f928 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -437,10 +437,29 @@ volStorageBackendRBDGetFeatures(rbd_image_t image, } #if LIBRBD_VERSION_CODE > 265 +static int +volStorageBackendRBDGetFlags(rbd_image_t image, + const char *volname, + uint64_t *flags) +{ + int r, ret = -1; + + if ((r = rbd_get_flags(image, flags)) < 0) { + virReportSystemError(-r, _("failed to get the flags of RBD image " + "%s"), volname); + goto cleanup; + } + ret = 0; + + cleanup: + return ret; +} + static bool -volStorageBackendRBDUseFastDiff(uint64_t features) +volStorageBackendRBDUseFastDiff(uint64_t features, uint64_t flags) { - return features & RBD_FEATURE_FAST_DIFF; + return (((features & RBD_FEATURE_FAST_DIFF) != 0ULL) && + ((flags & RBD_FLAG_FAST_DIFF_INVALID) == 0ULL)); } static int @@ -484,7 +503,17 @@ virStorageBackendRBDSetAllocation(virStorageVolDefPtr vol, #else static int -volStorageBackendRBDUseFastDiff(uint64_t features ATTRIBUTE_UNUSED) +volStorageBackendRBDGetFlags(rbd_image_t image, + const char *volname, + uint64_t *flags) +{ + *flags = 0; + return 0; +} + +static int +volStorageBackendRBDUseFastDiff(uint64_t features ATTRIBUTE_UNUSED, + uint64_t feature_flags ATTRIBUTE_UNUSED) { return false; } @@ -509,6 +538,7 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, rbd_image_t image = NULL; rbd_image_info_t info; uint64_t features; + uint64_t flags; if ((r = rbd_open_read_only(ptr->ioctx, vol->name, &image, NULL)) < 0) { ret = -r; @@ -527,11 +557,14 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, if (volStorageBackendRBDGetFeatures(image, vol->name, &features) < 0) goto cleanup; + if (volStorageBackendRBDGetFlags(image, vol->name, &flags) < 0) + goto cleanup; + vol->target.capacity = info.size; vol->type = VIR_STORAGE_VOL_NETWORK; vol->target.format = VIR_STORAGE_FILE_RAW; - if (volStorageBackendRBDUseFastDiff(features)) { + if (volStorageBackendRBDUseFastDiff(features, flags)) { VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. " "Querying for actual allocation", def->source.name, vol->name); -- 2.20.1

On 3/15/19 4:17 PM, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
The librbd API will transparently revert to a slow disk usage calculation method if the fast-diff map is marked as invalid.
Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- src/storage/storage_backend_rbd.c | 41 ++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 4 deletions(-)
ACK, even though I don't have any ceph deployed where I could test this, so I just have to take your word that this still works. The code looks okay though. Michal

From: Jason Dillaman <dillaman@redhat.com> When this attribute is set to 'capacity', refreshing volume allocations within the pool will report the capacity for the allocation. This is useful for certain backends where computing the actual allocation of a volume might be an expensive operation. Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- docs/formatstorage.html.in | 15 +++++++++--- docs/schemas/storagecommon.rng | 7 ++++++ docs/schemas/storagepool.rng | 5 ++++ src/conf/storage_conf.c | 23 +++++++++++++++++++ src/conf/storage_conf.h | 9 ++++++++ .../pool-rbd-refresh-volume-allocation.xml | 12 ++++++++++ .../pool-rbd-refresh-volume-allocation.xml | 15 ++++++++++++ tests/storagepoolxml2xmltest.c | 1 + 8 files changed, 84 insertions(+), 3 deletions(-) create mode 100644 tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml create mode 100644 tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml diff --git a/docs/formatstorage.html.in b/docs/formatstorage.html.in index 968651330f..d40e463167 100644 --- a/docs/formatstorage.html.in +++ b/docs/formatstorage.html.in @@ -16,7 +16,7 @@ </p> <p> The top level tag for a storage pool document is 'pool'. It has - a single attribute <code>type</code>, which is one of <code>dir</code>, + an attribute <code>type</code>, which is one of <code>dir</code>, <code>fs</code>, <code>netfs</code>, <code>disk</code>, <code>iscsi</code>, <code>logical</code>, <code>scsi</code> (all <span class="since">since 0.4.1</span>), @@ -27,8 +27,17 @@ <code>zfs</code> (<span class="since">since 1.2.8</span>), <code>vstorage</code> (<span class="since">since 3.1.0</span>), or <code>iscsi-direct</code> (<span class="since">since 4.7.0</span>). - This corresponds to the - storage backend drivers listed further along in this document. + This corresponds to the storage backend drivers listed further along in + this document. + </p> + <p> + The 'pool' element may also have an optional + <code>refresh_volume_allocation</code> attribute to control how + volume allocation is computed during a refresh. Valid values + are <code>default</code> to compute the actual usage or + <code>capacity</code> to default the allocation to the logical + capacity for cases where computing the allocation is too expensive. + <span class="since">Since 5.1.1</span> </p> <h3><a id="StoragePoolFirst">General metadata</a></h3> diff --git a/docs/schemas/storagecommon.rng b/docs/schemas/storagecommon.rng index 63b51470a0..d837f92bc7 100644 --- a/docs/schemas/storagecommon.rng +++ b/docs/schemas/storagecommon.rng @@ -236,4 +236,11 @@ </optional> </define> + <define name='refreshVolumeAllocation'> + <choice> + <value>default</value> + <value>capacity</value> + </choice> + </define> + </grammar> diff --git a/docs/schemas/storagepool.rng b/docs/schemas/storagepool.rng index 3907f70afe..e44ecdb3f3 100644 --- a/docs/schemas/storagepool.rng +++ b/docs/schemas/storagepool.rng @@ -11,6 +11,11 @@ <define name='pool'> <element name='pool'> + <optional> + <attribute name='refresh_volume_allocation'> + <ref name="refreshVolumeAllocation"/> + </attribute> + </optional> <choice> <ref name='pooldir'/> <ref name='poolfs'/> diff --git a/src/conf/storage_conf.c b/src/conf/storage_conf.c index c7ab5b8802..dc4838058c 100644 --- a/src/conf/storage_conf.c +++ b/src/conf/storage_conf.c @@ -94,6 +94,11 @@ VIR_ENUM_IMPL(virStorageVolFormatDisk, "extended", ); +VIR_ENUM_IMPL(virStorageVolRefreshAllocation, + VIR_STORAGE_VOL_REFRESH_ALLOCATION_LAST, + "default", "capacity", +); + VIR_ENUM_IMPL(virStoragePartedFs, VIR_STORAGE_PARTED_FS_TYPE_LAST, "ext2", "ext2", "fat16", @@ -797,6 +802,7 @@ virStoragePoolDefParseXML(xmlXPathContextPtr ctxt) xmlNodePtr source_node; VIR_AUTOPTR(virStoragePoolDef) def = NULL; VIR_AUTOFREE(char *) type = NULL; + VIR_AUTOFREE(char *) refresh_volume_allocation = NULL; VIR_AUTOFREE(char *) uuid = NULL; VIR_AUTOFREE(char *) target_path = NULL; @@ -819,6 +825,18 @@ virStoragePoolDefParseXML(xmlXPathContextPtr ctxt) if ((options = virStoragePoolOptionsForPoolType(def->type)) == NULL) return NULL; + refresh_volume_allocation = virXPathString("string(./@refresh_volume_allocation)", + ctxt); + if (refresh_volume_allocation) { + if ((def->refresh_volume_allocation = + virStorageVolRefreshAllocationTypeFromString(refresh_volume_allocation)) < 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown storage pool volume refresh allocation type %s"), + refresh_volume_allocation); + return NULL; + } + } + source_node = virXPathNode("./source", ctxt); if (source_node) { if (virStoragePoolDefParseSource(ctxt, &def->source, def->type, @@ -1107,8 +1125,13 @@ virStoragePoolDefFormatBuf(virBufferPtr buf, return -1; } virBufferAsprintf(buf, "<pool type='%s'", type); + if (def->refresh_volume_allocation) + virBufferAsprintf(buf, " refresh_volume_allocation='%s'", + virStorageVolRefreshAllocationTypeToString(def->refresh_volume_allocation)); if (def->namespaceData && def->ns.href) virBufferAsprintf(buf, " %s", (def->ns.href)()); + + virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); virBufferEscapeString(buf, "<name>%s</name>\n", def->name); diff --git a/src/conf/storage_conf.h b/src/conf/storage_conf.h index bfbebd15bd..f61d8f5a18 100644 --- a/src/conf/storage_conf.h +++ b/src/conf/storage_conf.h @@ -83,6 +83,13 @@ struct _virStorageVolSource { * backend for partition type creation */ }; +typedef enum { + VIR_STORAGE_VOL_REFRESH_ALLOCATION_DEFAULT, /* compute actual allocation */ + VIR_STORAGE_VOL_REFRESH_ALLOCATION_CAPACITY, /* use logical capacity */ + VIR_STORAGE_VOL_REFRESH_ALLOCATION_LAST, +} virStorageVolRefreshAllocationType; + +VIR_ENUM_DECL(virStorageVolRefreshAllocation); typedef struct _virStorageVolDef virStorageVolDef; typedef virStorageVolDef *virStorageVolDefPtr; @@ -243,6 +250,8 @@ struct _virStoragePoolDef { unsigned char uuid[VIR_UUID_BUFLEN]; int type; /* virStoragePoolType */ + int refresh_volume_allocation; /* virStorageVolRefreshAllocationType */ + unsigned long long allocation; /* bytes */ unsigned long long capacity; /* bytes */ unsigned long long available; /* bytes */ diff --git a/tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml b/tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml new file mode 100644 index 0000000000..632d895310 --- /dev/null +++ b/tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml @@ -0,0 +1,12 @@ +<pool type='rbd' refresh_volume_allocation='capacity'> + <name>ceph</name> + <uuid>47c1faee-0207-e741-f5ae-d9b019b98fe2</uuid> + <source> + <name>rbd</name> + <host name='localhost' port='6789'/> + <host name='localhost' port='6790'/> + <auth username='admin' type='ceph'> + <secret uuid='2ec115d7-3a88-3ceb-bc12-0ac909a6fd87'/> + </auth> + </source> +</pool> diff --git a/tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml b/tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml new file mode 100644 index 0000000000..c40c0b5bd5 --- /dev/null +++ b/tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml @@ -0,0 +1,15 @@ +<pool type='rbd' refresh_volume_allocation='capacity'> + <name>ceph</name> + <uuid>47c1faee-0207-e741-f5ae-d9b019b98fe2</uuid> + <capacity unit='bytes'>0</capacity> + <allocation unit='bytes'>0</allocation> + <available unit='bytes'>0</available> + <source> + <host name='localhost' port='6789'/> + <host name='localhost' port='6790'/> + <name>rbd</name> + <auth type='ceph' username='admin'> + <secret uuid='2ec115d7-3a88-3ceb-bc12-0ac909a6fd87'/> + </auth> + </source> +</pool> diff --git a/tests/storagepoolxml2xmltest.c b/tests/storagepoolxml2xmltest.c index bd3408e8b8..2ae514f346 100644 --- a/tests/storagepoolxml2xmltest.c +++ b/tests/storagepoolxml2xmltest.c @@ -95,6 +95,7 @@ mymain(void) DO_TEST("pool-zfs-sourcedev"); DO_TEST("pool-rbd"); #ifdef WITH_STORAGE_RBD + DO_TEST("pool-rbd-refresh-volume-allocation"); DO_TEST("pool-rbd-ns-configopts"); #endif DO_TEST("pool-vstorage"); -- 2.20.1

On 3/15/19 4:17 PM, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
When this attribute is set to 'capacity', refreshing volume allocations within the pool will report the capacity for the allocation. This is useful for certain backends where computing the actual allocation of a volume might be an expensive operation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- docs/formatstorage.html.in | 15 +++++++++--- docs/schemas/storagecommon.rng | 7 ++++++ docs/schemas/storagepool.rng | 5 ++++ src/conf/storage_conf.c | 23 +++++++++++++++++++ src/conf/storage_conf.h | 9 ++++++++ .../pool-rbd-refresh-volume-allocation.xml | 12 ++++++++++ .../pool-rbd-refresh-volume-allocation.xml | 15 ++++++++++++ tests/storagepoolxml2xmltest.c | 1 + 8 files changed, 84 insertions(+), 3 deletions(-) create mode 100644 tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml create mode 100644 tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml
diff --git a/docs/formatstorage.html.in b/docs/formatstorage.html.in index 968651330f..d40e463167 100644 --- a/docs/formatstorage.html.in +++ b/docs/formatstorage.html.in @@ -16,7 +16,7 @@ </p> <p> The top level tag for a storage pool document is 'pool'. It has - a single attribute <code>type</code>, which is one of <code>dir</code>, + an attribute <code>type</code>, which is one of <code>dir</code>, <code>fs</code>, <code>netfs</code>, <code>disk</code>, <code>iscsi</code>, <code>logical</code>, <code>scsi</code> (all <span class="since">since 0.4.1</span>), @@ -27,8 +27,17 @@ <code>zfs</code> (<span class="since">since 1.2.8</span>), <code>vstorage</code> (<span class="since">since 3.1.0</span>), or <code>iscsi-direct</code> (<span class="since">since 4.7.0</span>). - This corresponds to the - storage backend drivers listed further along in this document. + This corresponds to the storage backend drivers listed further along in + this document. + </p> + <p> + The 'pool' element may also have an optional + <code>refresh_volume_allocation</code> attribute to control how + volume allocation is computed during a refresh. Valid values + are <code>default</code> to compute the actual usage or + <code>capacity</code> to default the allocation to the logical + capacity for cases where computing the allocation is too expensive. + <span class="since">Since 5.1.1</span>
Since 5.2.0 ;-) We change minor for every release, and micro is for asynchronous releases, where we backport some patches (not that we've ever done that, except maybe for v3.2.1). Anyway, I don't like the name of this attribute. In my book, if an attribute has to consist of two or more words that calls for having an element for each of the words but the last which then can be as an attribute. So how about: <pool> <refresh method='default|capacity'/> ... </pool> This is also more future proof IMO as it allows us to expand <refresh/> should we ever need that. The coode looks good otherwise. Michal

On Tue, Mar 19, 2019 at 5:52 AM Michal Privoznik <mprivozn@redhat.com> wrote:
On 3/15/19 4:17 PM, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
When this attribute is set to 'capacity', refreshing volume allocations within the pool will report the capacity for the allocation. This is useful for certain backends where computing the actual allocation of a volume might be an expensive operation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- docs/formatstorage.html.in | 15 +++++++++--- docs/schemas/storagecommon.rng | 7 ++++++ docs/schemas/storagepool.rng | 5 ++++ src/conf/storage_conf.c | 23 +++++++++++++++++++ src/conf/storage_conf.h | 9 ++++++++ .../pool-rbd-refresh-volume-allocation.xml | 12 ++++++++++ .../pool-rbd-refresh-volume-allocation.xml | 15 ++++++++++++ tests/storagepoolxml2xmltest.c | 1 + 8 files changed, 84 insertions(+), 3 deletions(-) create mode 100644 tests/storagepoolxml2xmlin/pool-rbd-refresh-volume-allocation.xml create mode 100644 tests/storagepoolxml2xmlout/pool-rbd-refresh-volume-allocation.xml
diff --git a/docs/formatstorage.html.in b/docs/formatstorage.html.in index 968651330f..d40e463167 100644 --- a/docs/formatstorage.html.in +++ b/docs/formatstorage.html.in @@ -16,7 +16,7 @@ </p> <p> The top level tag for a storage pool document is 'pool'. It has - a single attribute <code>type</code>, which is one of <code>dir</code>, + an attribute <code>type</code>, which is one of <code>dir</code>, <code>fs</code>, <code>netfs</code>, <code>disk</code>, <code>iscsi</code>, <code>logical</code>, <code>scsi</code> (all <span class="since">since 0.4.1</span>), @@ -27,8 +27,17 @@ <code>zfs</code> (<span class="since">since 1.2.8</span>), <code>vstorage</code> (<span class="since">since 3.1.0</span>), or <code>iscsi-direct</code> (<span class="since">since 4.7.0</span>). - This corresponds to the - storage backend drivers listed further along in this document. + This corresponds to the storage backend drivers listed further along in + this document. + </p> + <p> + The 'pool' element may also have an optional + <code>refresh_volume_allocation</code> attribute to control how + volume allocation is computed during a refresh. Valid values + are <code>default</code> to compute the actual usage or + <code>capacity</code> to default the allocation to the logical + capacity for cases where computing the allocation is too expensive. + <span class="since">Since 5.1.1</span>
Since 5.2.0 ;-) We change minor for every release, and micro is for asynchronous releases, where we backport some patches (not that we've ever done that, except maybe for v3.2.1).
Anyway, I don't like the name of this attribute. In my book, if an attribute has to consist of two or more words that calls for having an element for each of the words but the last which then can be as an attribute. So how about:
<pool> <refresh method='default|capacity'/> ... </pool>
Thanks for the suggestion ... I didn't love it either.
This is also more future proof IMO as it allows us to expand <refresh/> should we ever need that.
The coode looks good otherwise.
Michal
-- Jason

From: Jason Dillaman <dillaman@redhat.com> Use the new 'refresh_volume_allocation' pool override to skip computing the actual volume usage if disabled. Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- src/storage/storage_backend_rbd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index e67911f928..1b01fbb3a8 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -564,7 +564,8 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, vol->type = VIR_STORAGE_VOL_NETWORK; vol->target.format = VIR_STORAGE_FILE_RAW; - if (volStorageBackendRBDUseFastDiff(features, flags)) { + if (def->refresh_volume_allocation == VIR_STORAGE_VOL_REFRESH_ALLOCATION_DEFAULT && + volStorageBackendRBDUseFastDiff(features, flags)) { VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. " "Querying for actual allocation", def->source.name, vol->name); -- 2.20.1

On 3/15/19 4:17 PM, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
Use the new 'refresh_volume_allocation' pool override to skip computing the actual volume usage if disabled.
Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- src/storage/storage_backend_rbd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index e67911f928..1b01fbb3a8 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -564,7 +564,8 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, vol->type = VIR_STORAGE_VOL_NETWORK; vol->target.format = VIR_STORAGE_FILE_RAW;
- if (volStorageBackendRBDUseFastDiff(features, flags)) { + if (def->refresh_volume_allocation == VIR_STORAGE_VOL_REFRESH_ALLOCATION_DEFAULT && + volStorageBackendRBDUseFastDiff(features, flags)) { VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. " "Querying for actual allocation", def->source.name, vol->name);
This will need to adapt to changes I'm suggesting in 2/3 but the idea looks good. Michal
participants (3)
-
Jason Dillaman
-
jdillama@redhat.com
-
Michal Privoznik