[libvirt] [PATCH 0/2] rbd: improvements to actual disk-usage calculation

From: Jason Dillaman <dillaman@redhat.com> The RBD fast-diff feature can vastly reduce the amount of time needed to calculate actual disk usage of volumes, but it might still be a slow operation for large RBD pools or pools with large RBD images. Therefore, this feature should be able to be optionally disabled if needed. Additionally, the fast-diff feature can only be used if the fast-diff map isn't flagged as invalid. Otherwise, librbd will silently perform a costly block-by-block scan to calculate the disk usage. Jason Dillaman (2): rbd: do not attempt to use fast-diff if it's marked invalid rbd: optionally disable actual disk-usage during pool/volume refresh docs/formatstorage.html.in | 13 ++++- src/storage/storage_backend_rbd.c | 80 ++++++++++++++++++++++++++++--- 2 files changed, 84 insertions(+), 9 deletions(-) -- 2.20.1

From: Jason Dillaman <dillaman@redhat.com> The librbd API will transparently revert to a slow disk usage calculation method if the fast-diff map is marked as invalid. Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- src/storage/storage_backend_rbd.c | 41 ++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 2b7af1db23..e67911f928 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -437,10 +437,29 @@ volStorageBackendRBDGetFeatures(rbd_image_t image, } #if LIBRBD_VERSION_CODE > 265 +static int +volStorageBackendRBDGetFlags(rbd_image_t image, + const char *volname, + uint64_t *flags) +{ + int r, ret = -1; + + if ((r = rbd_get_flags(image, flags)) < 0) { + virReportSystemError(-r, _("failed to get the flags of RBD image " + "%s"), volname); + goto cleanup; + } + ret = 0; + + cleanup: + return ret; +} + static bool -volStorageBackendRBDUseFastDiff(uint64_t features) +volStorageBackendRBDUseFastDiff(uint64_t features, uint64_t flags) { - return features & RBD_FEATURE_FAST_DIFF; + return (((features & RBD_FEATURE_FAST_DIFF) != 0ULL) && + ((flags & RBD_FLAG_FAST_DIFF_INVALID) == 0ULL)); } static int @@ -484,7 +503,17 @@ virStorageBackendRBDSetAllocation(virStorageVolDefPtr vol, #else static int -volStorageBackendRBDUseFastDiff(uint64_t features ATTRIBUTE_UNUSED) +volStorageBackendRBDGetFlags(rbd_image_t image, + const char *volname, + uint64_t *flags) +{ + *flags = 0; + return 0; +} + +static int +volStorageBackendRBDUseFastDiff(uint64_t features ATTRIBUTE_UNUSED, + uint64_t feature_flags ATTRIBUTE_UNUSED) { return false; } @@ -509,6 +538,7 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, rbd_image_t image = NULL; rbd_image_info_t info; uint64_t features; + uint64_t flags; if ((r = rbd_open_read_only(ptr->ioctx, vol->name, &image, NULL)) < 0) { ret = -r; @@ -527,11 +557,14 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, if (volStorageBackendRBDGetFeatures(image, vol->name, &features) < 0) goto cleanup; + if (volStorageBackendRBDGetFlags(image, vol->name, &flags) < 0) + goto cleanup; + vol->target.capacity = info.size; vol->type = VIR_STORAGE_VOL_NETWORK; vol->target.format = VIR_STORAGE_FILE_RAW; - if (volStorageBackendRBDUseFastDiff(features)) { + if (volStorageBackendRBDUseFastDiff(features, flags)) { VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. " "Querying for actual allocation", def->source.name, vol->name); -- 2.20.1

From: Jason Dillaman <dillaman@redhat.com> For pools with numerous volumes or very large volumes, the disk-usage calculation can take a non-trivial amount of time even with the fast-diff feature enabled (especially since the the usage is calculated serially for each image in the pool). The "rbd:config_opts" node now supports a new libvirt-internal "libvirt_calculate_disk_usage" option which can be set to false to disable this calculation as needed. Signed-off-by: Jason Dillaman <dillaman@redhat.com> --- docs/formatstorage.html.in | 13 ++++++++-- src/storage/storage_backend_rbd.c | 41 ++++++++++++++++++++++++++++--- 2 files changed, 48 insertions(+), 6 deletions(-) diff --git a/docs/formatstorage.html.in b/docs/formatstorage.html.in index 968651330f..eb65edc4d4 100644 --- a/docs/formatstorage.html.in +++ b/docs/formatstorage.html.in @@ -613,12 +613,21 @@ <rbd:option name='client_mount_timeout' value='45'/> <rbd:option name='rados_mon_op_timeout' value='20'/> <rbd:option name='rados_osd_op_timeout' value='10'/> + <rbd:option name='libvirt_calculate_disk_usage' value='false'/> </rbd:config_opts> </pool> </pre> - <span class="since">Since 5.1.0.</span></dd> - + <span class="since">Since 5.1.0.</span> + <dl> + <dt><code>libvirt_calculate_disk_usage</code></dt> + <dd>Disable calculating the actual disk usage of RBD volumes when + set to <code>false</code>. This will speed up pool and volume + refresh operations for pools with many volumes or pools with + large images. + </dd> + </dl> + </dd> </dl> <h2><a id="StorageVol">Storage volume XML</a></h2> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index e67911f928..07ab72f97c 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -61,6 +61,9 @@ struct _virStoragePoolRBDConfigOptionsDef { #define STORAGE_POOL_RBD_NAMESPACE_HREF "http://libvirt.org/schemas/storagepool/rbd/1.0" +#define CONFIG_OPTION_INTERNAL_PREFIX "libvirt_" +#define CONFIG_OPTION_CALCULATE_DISK_USAGE CONFIG_OPTION_INTERNAL_PREFIX "calculate_disk_usage" + static void virStoragePoolDefRBDNamespaceFree(void *nsdata) { @@ -321,6 +324,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr, char uuidstr[VIR_UUID_STRING_BUFLEN]; for (i = 0; i < cmdopts->noptions; i++) { + if (STRPREFIX(cmdopts->names[i], CONFIG_OPTION_INTERNAL_PREFIX)) + continue; + if (virStorageBackendRBDRADOSConfSet(ptr->cluster, cmdopts->names[i], cmdopts->values[i]) < 0) @@ -527,10 +533,30 @@ virStorageBackendRBDSetAllocation(virStorageVolDefPtr vol ATTRIBUTE_UNUSED, } #endif +static bool +virStorageBackendRBDCalculateDiskUsage(virStoragePoolDefPtr def) +{ + size_t i; + bool calculate_disk_usage = true; + + if (def->namespaceData) { + virStoragePoolRBDConfigOptionsDefPtr cmdopts = def->namespaceData; + + for (i = 0; i < cmdopts->noptions; i++) { + if (STREQ(cmdopts->names[i], CONFIG_OPTION_CALCULATE_DISK_USAGE)) { + calculate_disk_usage = STRCASEEQ(cmdopts->values[i], "true"); + break; + } + } + } + return calculate_disk_usage; +} + static int volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, virStoragePoolObjPtr pool, - virStorageBackendRBDStatePtr ptr) + virStorageBackendRBDStatePtr ptr, + bool calculate_disk_usage) { int ret = -1; virStoragePoolDefPtr def = virStoragePoolObjGetDef(pool); @@ -564,7 +590,8 @@ volStorageBackendRBDRefreshVolInfo(virStorageVolDefPtr vol, vol->type = VIR_STORAGE_VOL_NETWORK; vol->target.format = VIR_STORAGE_FILE_RAW; - if (volStorageBackendRBDUseFastDiff(features, flags)) { + if (calculate_disk_usage && + volStorageBackendRBDUseFastDiff(features, flags)) { VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. " "Querying for actual allocation", def->source.name, vol->name); @@ -610,8 +637,10 @@ virStorageBackendRBDRefreshPool(virStoragePoolObjPtr pool) virStorageBackendRBDStatePtr ptr = NULL; struct rados_cluster_stat_t clusterstat; struct rados_pool_stat_t poolstat; + bool calculate_disk_usage = virStorageBackendRBDCalculateDiskUsage(def); VIR_AUTOFREE(char *) names = NULL; + if (!(ptr = virStorageBackendRBDNewState(pool))) goto cleanup; @@ -663,7 +692,8 @@ virStorageBackendRBDRefreshPool(virStoragePoolObjPtr pool) name += strlen(name) + 1; - r = volStorageBackendRBDRefreshVolInfo(vol, pool, ptr); + r = volStorageBackendRBDRefreshVolInfo(vol, pool, ptr, + calculate_disk_usage); /* It could be that a volume has been deleted through a different route * then libvirt and that will cause a -ENOENT to be returned. @@ -1238,12 +1268,15 @@ virStorageBackendRBDRefreshVol(virStoragePoolObjPtr pool, virStorageVolDefPtr vol) { virStorageBackendRBDStatePtr ptr = NULL; + virStoragePoolDefPtr def = virStoragePoolObjGetDef(pool); + bool calculate_disk_usage = virStorageBackendRBDCalculateDiskUsage(def); int ret = -1; if (!(ptr = virStorageBackendRBDNewState(pool))) goto cleanup; - if (volStorageBackendRBDRefreshVolInfo(vol, pool, ptr) < 0) + if (volStorageBackendRBDRefreshVolInfo(vol, pool, ptr, + calculate_disk_usage) < 0) goto cleanup; ret = 0; -- 2.20.1

On Mon, Mar 11, 2019 at 11:39:55AM -0400, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
For pools with numerous volumes or very large volumes, the disk-usage calculation can take a non-trivial amount of time even with the fast-diff feature enabled (especially since the the usage is calculated serially for each image in the pool).
The "rbd:config_opts" node now supports a new libvirt-internal "libvirt_calculate_disk_usage" option which can be set to false to disable this calculation as needed.
The RBD namespace is not something we want to encourage applications to use. It is supposed to be for hacks that are only used in testing or QA, never in production. Essentially any usage is considered unsupported for production. This scalability problem sounds like something most likely to hit in production envs, and as such we need a fully supportable solution. This probably suggests making it a core part of the storage pool schema, and extending its impl to other drivers too. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Mar 11, 2019 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Mon, Mar 11, 2019 at 11:39:55AM -0400, jdillama@redhat.com wrote:
From: Jason Dillaman <dillaman@redhat.com>
For pools with numerous volumes or very large volumes, the disk-usage calculation can take a non-trivial amount of time even with the fast-diff feature enabled (especially since the the usage is calculated serially for each image in the pool).
The "rbd:config_opts" node now supports a new libvirt-internal "libvirt_calculate_disk_usage" option which can be set to false to disable this calculation as needed.
The RBD namespace is not something we want to encourage applications to use. It is supposed to be for hacks that are only used in testing or QA, never in production. Essentially any usage is considered unsupported for production.
Gotcha. Honestly, in that case, since starting w/ the Mimic release of Ceph all configuration options can be stored in the Ceph monitors and starting with the Nautilus release of Ceph all RBD configuration options can be overridden in the pools and individual images, I actually wonder the value of this XML section even for non-production settings. Also, should the documentation be updated to indicate that this is only for non-production environments (assuming support for this isn't just yanked)?
This scalability problem sounds like something most likely to hit in production envs, and as such we need a fully supportable solution. This probably suggests making it a core part of the storage pool schema, and extending its impl to other drivers too.
From a quick scan, it only looked like RBD was using a costly "stat"-like method to calculate actual volume usage which is why I implemented it as an RBD-only feature. If a new "volume_usage=[physical|allocated]"-like option is acceptable in the root storage pool XML, I can add it and repost.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Thanks, -- Jason
participants (3)
-
Daniel P. Berrangé
-
Jason Dillaman
-
jdillama@redhat.com