On 12/23/2015 10:06 AM, Wido den Hollander wrote:
This allows users to use the volume wiping functionality of the libvirt
storage driver.
This patch also adds a new wiping algorithm, VIR_STORAGE_VOL_WIPE_ALG_DISCARD.
By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with
RBD this will call rbd_write() in chunks of the underlying object size
to completely zero out the volume.
With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the
same object size chunks which will trim/discard all underlying RADOS objects
in the Ceph cluster.
Signed-off-by: Wido den Hollander <wido(a)widodh.nl>
---
include/libvirt/libvirt-storage.h | 4 +
src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++-
tools/virsh-volume.c | 2 +-
3 files changed, 159 insertions(+), 2 deletions(-)
I found these buried in my todo list of things to look at from the
holiday break. I figure bumping the thread will bring it back into focus...
"Semantically" speaking - this patch is a v2 of the original patch series...
I'm still a bit conflicted whether to add a new option to Wipe or
whether a new API should be developed. I see value in both options.
Although perhaps thinking of this as "trim" and not "discard" could make
it more palatable for wipe. As a new API, each backend driver could
decide whether it supports the discard/trim option, but that's quite a
bit more work (essentially mimic the Wipe functionality, but generate Trim).
I'll note off the top that if we go with adding a new wipe algorithm and
we've updated virsh-volume.c to recognize that, then virsh.pod would
also need an update to describe it.
Also rather than one patch here - I suggest smaller individual patches
to make it easier to debug issues down the line when using git bisect. I
see perhaps 4 patches...
Patch 1:
You probably want to start by adjusting virStorageBackendVolWipeLocal.
In particular, the switch statement there needs some tweaking - first to
use the "switch ((virStorageVolWipeAlgorithm) algorithm) {" construct,
but also fixing a 'bug' I just noted in the current design. If the
current 'default:' option is taken, the code reports an error, but still
attempts the SCRUB command (which will return/cause a different error).
BTW: Instead of default it would be the *_LAST case... If you're really
ambitious, adding a check for the "expected" 'flags' bits would also be
beneficial especially since you'll be adding one.
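To illustrate the shape I have in mind for Patch 1, here is a minimal sketch. The enum and function names are made up for the example and are not the libvirt ones. Casting to the enum type and listing every case (no 'default:') lets the compiler warn about an unhandled value, at least with GCC/Clang and -Wswitch as far as I know, and the error path returns before any command would be run:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for virStorageVolWipeAlgorithm (illustrative values). */
typedef enum {
    WIPE_ALG_ZERO = 0,
    WIPE_ALG_RANDOM,
    WIPE_ALG_TRIM,
    WIPE_ALG_LAST
} wipeAlgorithm;

/* Maps an algorithm to a pattern name. Returns 0 on success, -1 if the
 * algorithm is not supported; in that case no pattern is handed back, so
 * the caller cannot go on to run a scrub command by accident. */
static int
wipeDispatch(unsigned int algorithm, const char **pattern)
{
    assert(pattern != NULL);
    *pattern = NULL;

    switch ((wipeAlgorithm) algorithm) {
    case WIPE_ALG_ZERO:
        *pattern = "zero";
        break;
    case WIPE_ALG_RANDOM:
        *pattern = "random";
        break;
    case WIPE_ALG_TRIM:
        *pattern = "trim";
        break;
    case WIPE_ALG_LAST:
        break;
    }

    /* Covers both *_LAST and values outside the enum range. */
    if (!*pattern) {
        fprintf(stderr, "unsupported algorithm %u\n", algorithm);
        return -1;
    }

    return 0;
}
```

The point is only the control flow: report the error and return, rather than reporting and then falling through to the command.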
Patch 2:
Add wipe support for rbd and add the Zero algorithm. This gives a base.
The switch in virStorageBackendRBDVolWipe could still remain, but the
Flags would only be for _ZERO
Patch 3:
Then add the 'trim' option to libvirt-storage.h, virsh-volume.c, and
virsh.pod...
Patch 4:
This patch would add the 'trim' support to the backend. Also grab
virStorageBackendRBDImageInfo from patch 2. You're making the same
'stripe_count' call in this patch, but don't have the same checks. If
you're concerned about the perhaps extra unnecessary calls you could
allow the 3 return parameters to be NULL, then prior to fetching do "if
(param)" type trick. The caller could then provide a NULL if they don't
care about features and unit...
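A rough sketch of the "if (param)" trick, in case it helps. The struct and helper here are hypothetical stand-ins, not virStorageBackendRBDImageInfo or the rbd_* calls themselves; the idea is just that each out-parameter is only filled in when the caller asked for it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image description standing in for what the rbd_* getters
 * would return. */
struct imageDesc {
    uint64_t features;
    uint64_t stripe_unit;
    uint64_t stripe_count;
};

/* Each out-parameter is optional: pass NULL for values you do not need.
 * In real code each "if" would guard the corresponding fetch call, so the
 * unneeded calls are skipped entirely. */
static int
imageInfo(const struct imageDesc *img,
          uint64_t *features,
          uint64_t *stripe_unit,
          uint64_t *stripe_count)
{
    assert(img != NULL);

    if (features)
        *features = img->features;

    if (stripe_unit)
        *stripe_unit = img->stripe_unit;

    if (stripe_count)
        *stripe_count = img->stripe_count;

    return 0;
}
```

A wipe caller that only needs the stripe count would then pass NULL for the other two.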
diff --git a/include/libvirt/libvirt-storage.h
b/include/libvirt/libvirt-storage.h
index 2c55c93..139add3 100644
--- a/include/libvirt/libvirt-storage.h
+++ b/include/libvirt/libvirt-storage.h
@@ -153,6 +153,10 @@ typedef enum {
VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */
+ VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the
+ volume by using TRIM or
+ DISCARD */
Assuming we use wipe, I think "TRIM" is the better name, with the option
described as "trimming" the contents of the volume, whether that's sparse
files, thin/sparse logical volumes, or rbd object discarding... The other
backends aren't your problem to solve here, unless you want to make
those changes too. Also, the 2nd/3rd comment lines should line up under "1-pass"...
+
# ifdef VIR_ENUM_SENTINELS
VIR_STORAGE_VOL_WIPE_ALG_LAST
/*
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
index cdbfdee..d13658d 100644
--- a/src/storage/storage_backend_rbd.c
+++ b/src/storage/storage_backend_rbd.c
@@ -32,6 +32,7 @@
#include "base64.h"
#include "viruuid.h"
#include "virstring.h"
+#include "virutil.h"
I don't believe this is necessary; I was able to remove it without issue.
#include "rados/librados.h"
#include "rbd/librbd.h"
@@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED,
return ret;
}
+static int virStorageBackendRBDVolWipeZero(rbd_image_t image,
+ char *imgname,
+ rbd_image_info_t info,
+ uint64_t stripe_count)
Newer libvirt convention is:
static int
virStorage...
+{
+ int r = -1;
Add:
int ret = -1;
Usually it's 'ret' instead of just 'r'... Keeping 'r' for rbd_*() call
failures is fine though, since that will contain (and possibly message)
rbd_* specific API call errors...
+ size_t offset = 0;
+ uint64_t length;
+ char *writebuf;
+
+ if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0)
+ goto cleanup;
+
+ while (offset < info.size) {
+ length = MIN((info.size - offset), (info.obj_size * stripe_count));
+
+ r = rbd_write(image, offset, length, writebuf);
+ if (r < 0) {
+ virReportSystemError(-r, _("writing %llu bytes failed on "
+ " RBD image %s at offset %llu"),
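This will generate two spaces in the message ("... failed on  RBD image..."):
the first string ends with a space and the second begins with one. Drop
one of them.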
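For reference, a tiny standalone check of what the compiler does with the adjacent literals. The variable and helper names are made up for the example; the two strings are the ones from the patch and the variant with the leading space dropped:

```c
#include <assert.h>
#include <string.h>

/* As in the patch: trailing space on the first literal and a leading
 * space on the second. */
static const char *wrongMsg = "writing %llu bytes failed on "
                              " RBD image %s at offset %llu";

/* With the leading space dropped from the second literal. */
static const char *fixedMsg = "writing %llu bytes failed on "
                              "RBD image %s at offset %llu";

/* Returns non-zero if the string contains two consecutive spaces. */
static int
hasDoubleSpace(const char *s)
{
    assert(s != NULL);
    return strstr(s, "  ") != NULL;
}
```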
+ (unsigned long long)length,
+ imgname,
+ (unsigned long long)offset);
So is length a "uint64_t" or not? I do note that librbd.h deems it a
"size_t"... The question is more why cast to (unsigned long long), other
than to satisfy the %llu (of course).
As for offset, IIRC the convention for a size_t is "%zu", although for this
one I note that librbd.h deems it a "uint64_t".
+ goto cleanup;
+ }
+
+ VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu",
+ (unsigned long long)length,
+ imgname, (unsigned long long)offset);
Similar comments regarding the casts and the variable types.
+
+ offset += length;
+ }
Here would be:
ret = 0;
+
+ cleanup:
writebuf is leaked here. It needs a VIR_FREE(writebuf).
+ return r;
and this becomes return ret;
+}
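Putting the comments on this function together, the overall shape would be something like the sketch below. It is not meant as the final code: the write callback is a hypothetical stand-in for rbd_write(), plain calloc()/free() stand in for VIR_ALLOC_N()/VIR_FREE(), and the two small writers exist only so the sketch can be exercised on its own. What matters is 'ret' vs 'r', setting ret = 0 only after the loop, and freeing the buffer on every path:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for rbd_write(): returns >= 0 on success or a
 * negative errno-style value on failure. */
typedef int (*writeFn)(void *opaque, uint64_t offset, size_t len,
                       const char *buf);

static int
wipeZero(void *opaque, writeFn doWrite, uint64_t size, uint64_t chunk)
{
    int ret = -1;           /* what this function returns */
    int r;                  /* result of the individual write calls */
    uint64_t offset = 0;
    uint64_t length;
    char *writebuf = NULL;

    assert(doWrite != NULL);

    /* Zero-filled buffer of one chunk; a zero chunk would never advance. */
    if (chunk == 0 || !(writebuf = calloc(1, chunk)))
        goto cleanup;

    while (offset < size) {
        length = size - offset < chunk ? size - offset : chunk;

        if ((r = doWrite(opaque, offset, length, writebuf)) < 0)
            goto cleanup;   /* real code would report -r here */

        offset += length;
    }

    ret = 0;

 cleanup:
    free(writebuf);
    return ret;
}

/* Illustration-only writer: adds each write's length to a counter. */
static int
countingWrite(void *opaque, uint64_t offset, size_t len, const char *buf)
{
    (void) offset;
    (void) buf;
    *(uint64_t *) opaque += len;
    return 0;
}

/* Illustration-only writer that always fails. */
static int
failingWrite(void *opaque, uint64_t offset, size_t len, const char *buf)
{
    (void) opaque;
    (void) offset;
    (void) len;
    (void) buf;
    return -5;
}
```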
+
+static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image,
+ char *imgname,
+ rbd_image_info_t info,
+ uint64_t stripe_count)
static int
virStorage...
+{
+ int r = -1;
Need int ret = -1
+ size_t offset = 0;
+ uint64_t length;
+
+ VIR_DEBUG("Wiping RBD %s volume using discard)", imgname);
+
+ while (offset < info.size) {
+ length = MIN((info.size - offset), (info.obj_size * stripe_count));
+
+ r = rbd_discard(image, offset, length);
rbd_discard deems 'offset' to also be a uint64_t
+ if (r < 0) {
+ virReportSystemError(-r, _("discarding %llu bytes failed on "
+ " RBD image %s at offset %llu"),
Similar to *Zero: you'll get two spaces in "...failed on  RBD image...".
+ (unsigned long long)length,
+ imgname,
+ (unsigned long long)offset);
Similar comments regarding the casts of length and offset.
+ goto cleanup;
+ }
+
+ VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu",
+ (unsigned long long)length,
+ imgname, (unsigned long long)offset);
Similar comments regarding the casts.
+
+ offset += length;
+ }
Here would be
ret = 0;
+
+ cleanup:
+ return r;
And return ret;
+}
+
+static int virStorageBackendRBDVolWipe(virConnectPtr conn,
+ virStoragePoolObjPtr pool,
+ virStorageVolDefPtr vol,
+ unsigned int algorithm,
+ unsigned int flags)
static int
virStorage...
+{
+ virStorageBackendRBDState ptr;
+ ptr.cluster = NULL;
+ ptr.ioctx = NULL;
+ rbd_image_t image = NULL;
+ rbd_image_info_t info;
+ uint64_t stripe_count;
+ int r = -1;
Add
int ret = -1;
+
+ virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO |
+ VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1);
+
+ VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name);
+
+ if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0)
+ goto cleanup;
+
+ if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0)
+ goto cleanup;
+
+ r = rbd_open(ptr.ioctx, vol->name, &image, NULL);
+ if (r < 0) {
BTW: This can be:
if ((r = rbd_open(ptr.ioctx, vol->name, &image, NULL)) < 0) {
For this and all rbd_* calls...
+ virReportSystemError(-r, _("failed to open the RBD image %s"),
+ vol->name);
+ goto cleanup;
+ }
+
+ r = rbd_stat(image, &info, sizeof(info));
+ if (r < 0) {
+ virReportSystemError(-r, _("failed to stat the RBD image %s"),
+ vol->name);
+ goto cleanup;
+ }
+
+ r = rbd_get_stripe_count(image, &stripe_count);
+ if (r < 0) {
+ virReportSystemError(-r, _("failed to get stripe count of RBD image %s"),
+ vol->name);
+ goto cleanup;
+ }
I see the subsequent patch has some extra checks before calling this.
Why wouldn't those also need to be made here?
+
+ VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s",
+ (unsigned long long)info.size, pool->def->source.name, vol->name);
+
+ switch (algorithm) {
Follow the convention of
"switch ((virStorageVolWipeAlgorithm) algorithm) {"
Then each "case" lines up under "switch".
+ case VIR_STORAGE_VOL_WIPE_ALG_ZERO:
+ r = virStorageBackendRBDVolWipeZero(image, vol->name,
+ info, stripe_count);
I would change this (and the next one) to:
if (virStorageBackendRBDVolWipeZero(image, vol->name,
info, stripe_count) < 0)
goto cleanup;
Also, I ran these patches through Coverity - it complains that 'info'
(160 bytes) is passed by value... Although neither API adjusts it, why not
just pass "info.size" and "info.obj_size", or pass the whole 'info' by
reference (just to be safe).
+ break;
+ case VIR_STORAGE_VOL_WIPE_ALG_DISCARD:
+ r = virStorageBackendRBDVolWipeDiscard(image, vol->name,
+ info, stripe_count);
+ break;
+ default:
And list each allowed case explicitly, so it's clearer. That way if someone
in the future comes along and adds ALG_ONE, the rbd code isn't forgotten;
the compiler catches it.
+ virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"),
+ algorithm);
+ r = -VIR_ERR_INVALID_ARG;
This will be unnecessary...
+ goto cleanup;
+ }
+
+ if (r < 0) {
+ virReportSystemError(-r, _("failed to wipe RBD image %s"),
+ vol->name);
This overwrites the errors already reported by the *WipeZero and *WipeDiscard APIs.
+ goto cleanup;
+ }
The assumption here being
ret = 0;
+
+ cleanup:
+ if (image)
+ rbd_close(image);
+
+ virStorageBackendRBDCloseRADOSConn(&ptr);
+ return r;
return ret;
+}
+
virStorageBackend virStorageBackendRBD = {
.type = VIR_STORAGE_POOL_RBD,
@@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = {
.buildVol = virStorageBackendRBDBuildVol,
.refreshVol = virStorageBackendRBDRefreshVol,
.deleteVol = virStorageBackendRBDDeleteVol,
- .resizeVol = virStorageBackendRBDResizeVol,
+ .wipeVol = virStorageBackendRBDVolWipe,
+ .resizeVol = virStorageBackendRBDResizeVol
No need to remove the "," - that way the only diff is the added line.
};
diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c
index 7932ef2..3e95aa5 100644
--- a/tools/virsh-volume.c
+++ b/tools/virsh-volume.c
@@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = {
VIR_ENUM_DECL(virStorageVolWipeAlgorithm)
VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST,
"zero", "nnsa", "dod", "bsi",
"gutmann", "schneier",
- "pfitzner7", "pfitzner33", "random");
+ "pfitzner7", "pfitzner33", "random", "discard");
I think "trim" will be better.
John
static bool
cmdVolWipe(vshControl *ctl, const vshCmd *cmd)