
On 12/23/2015 10:06 AM, Wido den Hollander wrote:
This allows users to use the volume wiping functionality of the libvirt storage driver.
This patch also adds a new wiping algorithm, VIR_STORAGE_VOL_WIPE_ALG_DISCARD.
By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used, and with RBD this will call rbd_write() in chunks of the underlying object size to completely zero out the volume.
With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the same object size chunks which will trim/discard all underlying RADOS objects in the Ceph cluster.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
---
 include/libvirt/libvirt-storage.h |   4 +
 src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++-
 tools/virsh-volume.c              |   2 +-
 3 files changed, 159 insertions(+), 2 deletions(-)
Found these buried in my todo list of things to look at from during the holiday break. I figure bumping it will bring it back into focus...

"Semantically" speaking, this patch is a v2 of the original patch series...

I'm still a bit conflicted about whether to add a new option to Wipe or to develop a new API. I see value in both options, although perhaps thinking of this as "trim" and not "discard" could make it more palatable for wipe. As a new API, each backend driver could decide whether it supports the discard/trim option, but that's quite a bit more work (essentially mimic the Wipe functionality, but generate Trim).

I'll note off the top that if we go with adding a new wipe algorithm and we've updated virsh-volume.c to recognize that, then virsh.pod would also need an update to describe it.

Also, rather than one patch here, I suggest smaller individual patches to make it easier to debug issues down the line when using git bisect. I see perhaps 4 patches...

Patch 1: You probably want to start by adjusting virStorageBackendVolWipeLocal. In particular, the switch statement there needs some tweaking: first to use the "switch ((virStorageVolWipeAlgorithm) algorithm) {" construct, but also to fix a 'bug' I just noted in the current design. If the current 'default:' option is taken, the code reports an error, but still attempts the SCRUB command (which will return/cause a different error). BTW: Instead of default it would be the *_LAST case... If you're really ambitious, adding a check for the "expected" 'flags' bits would also be beneficial, especially since you'll be adding one.

Patch 2: Add wipe support for rbd and add the Zero algorithm. This gives a base. The switch in virStorageBackendRBDVolWipe could still remain, but the Flags would only be for _ZERO.

Patch 3: Then add the 'trim' option to libvirt-storage.h, virsh-volume.c, and virsh.pod...

Patch 4: This patch would add the 'trim' support to the backend.
Also grab virStorageBackendRBDImageInfo from patch 2. You're making the same 'stripe_count' call in this patch, but don't have the same checks. If you're concerned about the perhaps extra unnecessary calls you could allow the 3 return parameters to be NULL, then prior to fetching do "if (param)" type trick. The caller could then provide a NULL if they don't care about features and unit...
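The optional out-parameter idea can be sketched roughly as follows. This is only an illustration: struct demo_image and demo_image_info() are hypothetical stand-ins, not the actual virStorageBackendRBDImageInfo signature from patch 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the image properties mentioned above. */
struct demo_image {
    uint64_t features;
    uint64_t stripe_unit;
    uint64_t stripe_count;
};

/*
 * Fetch only the values the caller asks for. Any out parameter may be
 * NULL, in which case that value is skipped.
 * Returns 0 on success, -1 if img is NULL.
 */
static int
demo_image_info(const struct demo_image *img,
                uint64_t *features,
                uint64_t *stripe_unit,
                uint64_t *stripe_count)
{
    if (!img)
        return -1;

    if (features)
        *features = img->features;
    if (stripe_unit)
        *stripe_unit = img->stripe_unit;
    if (stripe_count)
        *stripe_count = img->stripe_count;

    return 0;
}
```

A caller that only cares about the stripe count would pass NULL for the other two, e.g. demo_image_info(&img, NULL, NULL, &count).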
diff --git a/include/libvirt/libvirt-storage.h b/include/libvirt/libvirt-storage.h index 2c55c93..139add3 100644 --- a/include/libvirt/libvirt-storage.h +++ b/include/libvirt/libvirt-storage.h @@ -153,6 +153,10 @@ typedef enum {
VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */
+ VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the + volume by using TRIM or + DISCARD */
Assuming we use wipe, I think "TRIM", with the description of the option being "trimming" the contents of the volume, whether that's sparse files, thin/sparse logical volumes, or rbd object discarding... Those aren't your problem to solve here, unless you have the desire to make those changes too. Also, the 2nd/3rd comment lines should line up under 1-pass...
+ # ifdef VIR_ENUM_SENTINELS VIR_STORAGE_VOL_WIPE_ALG_LAST /* diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index cdbfdee..d13658d 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -32,6 +32,7 @@ #include "base64.h" #include "viruuid.h" #include "virstring.h" +#include "virutil.h"
This isn't necessary, I believe. I was able to remove it without issue.
#include "rados/librados.h" #include "rbd/librbd.h"
@@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED, return ret; }
+static int virStorageBackendRBDVolWipeZero(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count)
Newer libvirt convention is: static int virStorage...
+{ + int r = -1;
Add: int ret = -1; Usually it's 'ret' instead of just 'r'... Keeping 'r' for rbd_*() call failures is fine though, since that will contain (and possibly message) rbd_* specific API call errors...
+ size_t offset = 0; + uint64_t length; + char *writebuf; + + if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0) + goto cleanup; + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_write(image, offset, length, writebuf); + if (r < 0) { + virReportSystemError(-r, _("writing %llu bytes failed on " + " RBD image %s at offset %llu"),
This will generate two spaces between "on" and "RBD" in "... failed on RBD image...", since both string fragments contribute one.
+ (unsigned long long)length, + imgname, + (unsigned long long)offset);
So is length a "uint64_t" or not? I do note that librbd.h deems it a "size_t"... The query is more why cast to (unsigned long long), other than for the %llu (of course). As for offset, IIRC the convention is "%zu", although for this one I note that librbd.h deems it a "uint64_t".
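For reference, the two ways of printing these types can be sketched as below. This is a standalone illustration using snprintf; the demo_* function names are made up, and which style the tree prefers is a project convention this sketch does not settle.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Style 1: cast everything to unsigned long long and print with %llu. */
static int
demo_format_cast(char *out, size_t outlen, size_t length, uint64_t offset)
{
    return snprintf(out, outlen, "Wrote %llu bytes at offset %llu",
                    (unsigned long long)length,
                    (unsigned long long)offset);
}

/* Style 2: no casts; %zu for size_t and PRIu64 for uint64_t. */
static int
demo_format_native(char *out, size_t outlen, size_t length, uint64_t offset)
{
    return snprintf(out, outlen, "Wrote %zu bytes at offset %" PRIu64,
                    length, offset);
}
```

Both produce the same text; the difference is only whether the type information lives in the cast or in the format specifier.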
+ goto cleanup; + } + + VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset);
Similar comments regarding the casts and the variable types.
+ + offset += length; + }
Here would be: ret = 0;
+ + cleanup:
writebuf is leaked. Need a VIR_FREE()
+ return r;
and this becomes return ret;
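Putting the 'ret'/'r' comments together, the shape being suggested looks roughly like this. It is a minimal standalone sketch: demo_write_fn is a hypothetical callback standing in for rbd_write(), and calloc()/free() stand in for VIR_ALLOC_N()/VIR_FREE().

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical write callback standing in for rbd_write(); < 0 means error. */
typedef int (*demo_write_fn)(uint64_t offset, size_t len,
                             const char *buf, void *opaque);

/* Example callback: adds up the number of bytes it is asked to write. */
static int
demo_count_bytes(uint64_t offset, size_t len, const char *buf, void *opaque)
{
    (void)offset;
    (void)buf;
    *(uint64_t *)opaque += len;
    return 0;
}

static int
demo_wipe_zero(uint64_t size, uint64_t chunk,
               demo_write_fn write_cb, void *opaque)
{
    int ret = -1;           /* overall result handed back to the caller */
    int r;                  /* result of the most recent backend call */
    uint64_t offset = 0;
    uint64_t length;
    char *writebuf = NULL;

    /* Zero-filled buffer, one chunk in size. */
    if (chunk == 0 || !(writebuf = calloc(1, chunk)))
        goto cleanup;

    while (offset < size) {
        length = DEMO_MIN(size - offset, chunk);

        if ((r = write_cb(offset, (size_t)length, writebuf, opaque)) < 0)
            goto cleanup;   /* a real caller would report -r here */

        offset += length;
    }

    ret = 0;

 cleanup:
    free(writebuf);         /* released on success and on every error path */
    return ret;
}
```

With this shape, 'ret' only becomes 0 once the loop has finished, and the buffer is freed on every path through the function.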
+} + +static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count)
static int virStorage...
+{ + int r = -1;
Need int ret = -1
+ size_t offset = 0; + uint64_t length; + + VIR_DEBUG("Wiping RBD %s volume using discard)", imgname); + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_discard(image, offset, length);
rbd_discard deems 'offset' to also be a uint64_t
+ if (r < 0) { + virReportSystemError(-r, _("discarding %llu bytes failed on " + " RBD image %s at offset %llu"),
Similar to *Zero: you'll have two spaces between "on" and "RBD" in "...failed on RBD image...".
+ (unsigned long long)length, + imgname, + (unsigned long long)offset);
Similar comments regarding the casts of length and offset.
+ goto cleanup; + } + + VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset);
Similar comments regarding the casts.
+ + offset += length; + }
Here would be ret = 0;
+ + cleanup: + return r;
And return ret;
+} + +static int virStorageBackendRBDVolWipe(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr vol, + unsigned int algorithm, + unsigned int flags)
static int virStorage...
+{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + rbd_image_t image = NULL; + rbd_image_info_t info; + uint64_t stripe_count; + int r = -1;
Add int ret = -1;
+ + virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO | + VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1); + + VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = rbd_open(ptr.ioctx, vol->name, &image, NULL); + if (r < 0) {
BTW: This can be : if ((r = rbd_open(ptr.ioctx, vol->name, &image, NULL)) < 0) { For this and all rbd_* calls...
+ virReportSystemError(-r, _("failed to open the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_get_stripe_count(image, &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get stripe count of RBD image %s"), + vol->name); + goto cleanup; + }
I see the subsequent patch has some extra checks before calling this. Why wouldn't those also need to be made here?
+ + VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s", + (unsigned long long)info.size, pool->def->source.name, vol->name); + + switch (algorithm) {
Follow the convention of "switch ((virStorageVolWipeAlgorithm) algorithm) {" Then each "case" lines up under "switch".
+ case VIR_STORAGE_VOL_WIPE_ALG_ZERO: + r = virStorageBackendRBDVolWipeZero(image, vol->name, + info, stripe_count);
I would change this (and the next one) to: if (virStorageBackendRBDVolWipeZero(image, vol->name, info, stripe_count) < 0) goto cleanup; Also, I ran these patches through Coverity - it complains that 'info' is passed by value of 160 bytes... Although neither API adjusts it, why not just pass "info.size" and "info.obj_size" or pass by reference the whole 'info' (just to be safe).
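A rough sketch of the pass-by-reference variant follows. struct demo_info is a hypothetical stand-in for rbd_image_info_t (the real structure has more fields), and demo_chunk_count() is invented purely to have something that reads the two members.

```c
#include <stdint.h>

/* Hypothetical stand-in for rbd_image_info_t, padded to be "large". */
struct demo_info {
    uint64_t size;
    uint64_t obj_size;
    char     padding[128];
};

/*
 * Takes a pointer to const: nothing is copied onto the stack and the
 * callee cannot modify the caller's structure.
 */
static uint64_t
demo_chunk_count(const struct demo_info *info, uint64_t stripe_count)
{
    uint64_t chunk = info->obj_size * stripe_count;

    if (chunk == 0)
        return 0;

    /* Round up so a final partial chunk is counted too. */
    return (info->size + chunk - 1) / chunk;
}
```

The caller simply passes &info instead of info; passing just info.size and info.obj_size as two scalar arguments would work equally well.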
+ break; + case VIR_STORAGE_VOL_WIPE_ALG_DISCARD: + r = virStorageBackendRBDVolWipeDiscard(image, vol->name, + info, stripe_count); + break; + default:
And list each allowed case explicitly, so it's clearer. That way if someone in the future comes along and adds ALG_ONE, the rbd code won't be forgotten... The compiler catches it.
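The convention can be sketched as below, using a shortened hypothetical enum (demoWipeAlgorithm) rather than the real virStorageVolWipeAlgorithm. The point is that with no 'default:' label, a compiler run with -Wswitch (part of -Wall in GCC and Clang) warns when a newly added enumerator is not handled.

```c
/* Hypothetical enum, shortened from the real wipe-algorithm list. */
typedef enum {
    DEMO_WIPE_ALG_ZERO = 0,
    DEMO_WIPE_ALG_RANDOM,
    DEMO_WIPE_ALG_TRIM,
    DEMO_WIPE_ALG_LAST
} demoWipeAlgorithm;

/* Returns 0 if this backend supports the algorithm, -1 otherwise. */
static int
demo_wipe_supported(unsigned int algorithm)
{
    /* Cast to the enum type so the compiler can check case coverage. */
    switch ((demoWipeAlgorithm) algorithm) {
    case DEMO_WIPE_ALG_ZERO:
    case DEMO_WIPE_ALG_TRIM:
        return 0;

    case DEMO_WIPE_ALG_RANDOM:
    case DEMO_WIPE_ALG_LAST:
        return -1;
    }

    /* Values outside the enum range fall through to here. */
    return -1;
}
```

In the real backend the unsupported cases would report an error before bailing out; here they only return -1 to keep the sketch short.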
+ virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"), + algorithm); + r = -VIR_ERR_INVALID_ARG;
This will be unnecessary...
+ goto cleanup; + } + + if (r < 0) { + virReportSystemError(-r, _("failed to wipe RBD image %s"), + vol->name);
This overwrites the errors found in the *WipeZero and *WipeDiscard API's
+ goto cleanup; + }
The assumption here being ret = 0;
+ + cleanup: + if (image) + rbd_close(image); + + virStorageBackendRBDCloseRADOSConn(&ptr); + return r;
return ret;
+} + virStorageBackend virStorageBackendRBD = { .type = VIR_STORAGE_POOL_RBD,
@@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = { .buildVol = virStorageBackendRBDBuildVol, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, - .resizeVol = virStorageBackendRBDResizeVol, + .wipeVol = virStorageBackendRBDVolWipe, + .resizeVol = virStorageBackendRBDResizeVol
No need to remove the "," - that way the only diff is the line.
}; diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c index 7932ef2..3e95aa5 100644 --- a/tools/virsh-volume.c +++ b/tools/virsh-volume.c @@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = { VIR_ENUM_DECL(virStorageVolWipeAlgorithm) VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST, "zero", "nnsa", "dod", "bsi", "gutmann", "schneier", - "pfitzner7", "pfitzner33", "random"); + "pfitzner7", "pfitzner33", "random", "discard");
I think "trim" will be better.

John
static bool cmdVolWipe(vshControl *ctl, const vshCmd *cmd)