[libvirt] rbd: Add support for wiping and cloning images to storage driver

These two patches add support for buildVolFrom and wipeVol to the RBD storage driver. Wiping is done by discarding all blocks on the RBD image. Building a volume from a original is done leveraging the cloning mechanism RBD already supports.

This allows user to use the volume wiping functionality of the libvirt storage driver. All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph. This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again. Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 90 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 89 insertions(+), 1 deletion(-) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index cdbfdee..5e16e7a 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -32,6 +32,7 @@ #include "base64.h" #include "viruuid.h" #include "virstring.h" +#include "virutil.h" #include "rados/librados.h" #include "rbd/librbd.h" @@ -700,6 +701,92 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED, return ret; } +static int virStorageBackendRBDVolWipe(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr vol, + unsigned int algorithm, + unsigned int flags) +{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + rbd_image_t image = NULL; + rbd_image_info_t info; + int r = -1; + size_t offset = 0; + uint64_t stripe_count; + uint64_t length; + + virCheckFlags(0, -1); + + VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name); + + if (algorithm != VIR_STORAGE_VOL_WIPE_ALG_ZERO) { + virReportError(VIR_ERR_NO_SUPPORT, "%s", + _("RBD storage pool does not support other wiping" + " algorithms than zero")); + goto cleanup; + } + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = rbd_open(ptr.ioctx, vol->name, &image, NULL); + if (r < 0) { + virReportSystemError(-r, _("failed to open the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_get_stripe_count(image, &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get stripe count of RBD image %s"), + vol->name); + goto cleanup; + } + + VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s", + (unsigned long long)info.size, pool->def->source.name, vol->name); + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_discard(image, offset, length); + if (r < 0) { + virReportSystemError(-r, _("discarding %llu bytes failed on " + " RBD image %s at offset %llu"), + (unsigned long long)length, + vol->name, + (unsigned long long)offset); + goto cleanup; + } + + VIR_DEBUG("Discarded %llu bytes of RBD image %s/%s at offset %llu", + (unsigned long long)length, + pool->def->source.name, + vol->name, (unsigned long long)offset); + + offset += length; + } + + cleanup: + if (image) + rbd_close(image); + + virStorageBackendRBDCloseRADOSConn(&ptr); + return r; +} + virStorageBackend virStorageBackendRBD = { .type = VIR_STORAGE_POOL_RBD, @@ -708,5 +795,6 @@ virStorageBackend virStorageBackendRBD = { .buildVol = virStorageBackendRBDBuildVol, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, - .resizeVol = virStorageBackendRBDResizeVol, + .wipeVol = virStorageBackendRBDVolWipe, + .resizeVol = virStorageBackendRBDResizeVol }; -- 1.9.1

On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 23-12-15 10:45, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect
Afaik it zeroes/trims all the RBD objects on the Ceph cluster, but it doesn't TRIM the lower SSD on it's turn. So it will send these calls to Ceph/RBD and it will zero all the data of that specific volume. A rather simple way to get rid of the data in a volume. If makes sure that the volume no longer contains any data for the end-user. All user-data on the volume will be gone. Or am I misunderstanding you? What is it supposed to do? Wido
Regards, Daniel

On Wed, Dec 23, 2015 at 01:50:52PM +0100, Wido den Hollander wrote:
On 23-12-15 10:45, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect
Afaik it zeroes/trims all the RBD objects on the Ceph cluster, but it doesn't TRIM the lower SSD on it's turn.
So it will send these calls to Ceph/RBD and it will zero all the data of that specific volume. A rather simple way to get rid of the data in a volume.
That's not what I see that API impl of rbd_discard() doing. It looks very much like it is discarding extents from the RBD volume, at least if the discarded region is large enough. Only if the discarded region is small, does it merely zero the data. So I really don't think this is a suitable API for use with the virStorageVolWipe() API, whose *only* effect should be to overwrite the data, not have any side effect on volume extent allocation Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 23-12-15 14:05, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 01:50:52PM +0100, Wido den Hollander wrote:
On 23-12-15 10:45, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect
Afaik it zeroes/trims all the RBD objects on the Ceph cluster, but it doesn't TRIM the lower SSD on it's turn.
So it will send these calls to Ceph/RBD and it will zero all the data of that specific volume. A rather simple way to get rid of the data in a volume.
That's not what I see that API impl of rbd_discard() doing. It looks very much like it is discarding extents from the RBD volume, at least if the discarded region is large enough. Only if the discarded region is small, does it merely zero the data.
Let me verify this with the Ceph people.
So I really don't think this is a suitable API for use with the virStorageVolWipe() API, whose *only* effect should be to overwrite the data, not have any side effect on volume extent allocation
What do you suggest? use rbd_write() in a loop and overwrite the whole volume? The problem with this option is that the RBD volume will then grow to it's maximum size on the underlying storage. With rbd_discard() being called on the exact object size all those objects will be trimmed, effectively wiping the volume's data. So I figured it was OK. I just want to make the RBD storage driver as feature complete as possible :) Wido
Regards, Daniel

On Wed, Dec 23, 2015 at 02:10:28PM +0100, Wido den Hollander wrote:
On 23-12-15 14:05, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 01:50:52PM +0100, Wido den Hollander wrote:
On 23-12-15 10:45, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect
Afaik it zeroes/trims all the RBD objects on the Ceph cluster, but it doesn't TRIM the lower SSD on it's turn.
So it will send these calls to Ceph/RBD and it will zero all the data of that specific volume. A rather simple way to get rid of the data in a volume.
That's not what I see that API impl of rbd_discard() doing. It looks very much like it is discarding extents from the RBD volume, at least if the discarded region is large enough. Only if the discarded region is small, does it merely zero the data.
Let me verify this with the Ceph people.
So I really don't think this is a suitable API for use with the virStorageVolWipe() API, whose *only* effect should be to overwrite the data, not have any side effect on volume extent allocation
What do you suggest? use rbd_write() in a loop and overwrite the whole volume?
That would match intended semantics of WIPE_ALG_ZERO albeit rather slow
The problem with this option is that the RBD volume will then grow to it's maximum size on the underlying storage.
This is the same semantics as we have for the other impls of the Wipe API. The API is defined to fill the contents of the file with the requested byte pattern, which neccessarily means that the volume will grow to its maximum size.
With rbd_discard() being called on the exact object size all those objects will be trimmed, effectively wiping the volume's data. So I figured it was OK.
We could introduce a VIR_STORAGE_VOL_WIPE_ALG_DISCARD which is defined to be the same as VIR_STORAGE_VOL_WIPE_ALG_ZERO, but using discard semantics to release space. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 23-12-15 14:14, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 02:10:28PM +0100, Wido den Hollander wrote:
On 23-12-15 14:05, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 01:50:52PM +0100, Wido den Hollander wrote:
On 23-12-15 10:45, Daniel P. Berrange wrote:
On Wed, Dec 23, 2015 at 10:29:04AM +0100, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
All data from the volume will be wiped by calling rbd_discard() in chunks of the underlying object and stripe size inside Ceph.
This will ensure all data is zeroed and leaves the user with a completely empty volume ready for use again.
Based on the name 'rbd_discard' it sounds like this is going to call TRIM/DISCARD on the underlying storage too ? If so, then I don't think that this is an appropriate approach for this API. The virStorageVolWipe API should clear the data, *without* having any effect on the storage of the API - ie we don't want to discard underling storage blocks as a side effect
Afaik it zeroes/trims all the RBD objects on the Ceph cluster, but it doesn't TRIM the lower SSD on it's turn.
So it will send these calls to Ceph/RBD and it will zero all the data of that specific volume. A rather simple way to get rid of the data in a volume.
That's not what I see that API impl of rbd_discard() doing. It looks very much like it is discarding extents from the RBD volume, at least if the discarded region is large enough. Only if the discarded region is small, does it merely zero the data.
Let me verify this with the Ceph people.
So I really don't think this is a suitable API for use with the virStorageVolWipe() API, whose *only* effect should be to overwrite the data, not have any side effect on volume extent allocation
What do you suggest? use rbd_write() in a loop and overwrite the whole volume?
That would match intended semantics of WIPE_ALG_ZERO albeit rather slow
The problem with this option is that the RBD volume will then grow to it's maximum size on the underlying storage.
This is the same semantics as we have for the other impls of the Wipe API. The API is defined to fill the contents of the file with the requested byte pattern, which neccessarily means that the volume will grow to its maximum size.
Ah, ok. I misunderstood that.
With rbd_discard() being called on the exact object size all those objects will be trimmed, effectively wiping the volume's data. So I figured it was OK.
We could introduce a VIR_STORAGE_VOL_WIPE_ALG_DISCARD which is defined to be the same as VIR_STORAGE_VOL_WIPE_ALG_ZERO, but using discard semantics to release space.
That seems fine with me. What would the best way be to proceed? Simply add the flag and that's it? This part of libvirt is not really my knowledge. Thanks for the super fast responses btw! Wido
Regards, Daniel

This allows user to use the volume wiping functionality of the libvirt storage driver. This patch also adds a new wiping algorithm VIR_STORAGE_VOL_WIPE_ALG_DISCARD By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with RBD this will called rbd_write() in chunks of the underlying object size to completely zero out the volume. With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the same object size chunks which will trim/discard all underlying RADOS objects in the Ceph cluster. Signed-off-by: Wido den Hollander <wido@widodh.nl> --- include/libvirt/libvirt-storage.h | 4 + src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++- tools/virsh-volume.c | 2 +- 3 files changed, 159 insertions(+), 2 deletions(-) diff --git a/include/libvirt/libvirt-storage.h b/include/libvirt/libvirt-storage.h index 2c55c93..139add3 100644 --- a/include/libvirt/libvirt-storage.h +++ b/include/libvirt/libvirt-storage.h @@ -153,6 +153,10 @@ typedef enum { VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */ + VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the + volume by using TRIM or + DISCARD */ + # ifdef VIR_ENUM_SENTINELS VIR_STORAGE_VOL_WIPE_ALG_LAST /* diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index cdbfdee..d13658d 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -32,6 +32,7 @@ #include "base64.h" #include "viruuid.h" #include "virstring.h" +#include "virutil.h" #include "rados/librados.h" #include "rbd/librbd.h" @@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED, return ret; } +static int virStorageBackendRBDVolWipeZero(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count) +{ + int r = -1; + size_t offset = 0; + uint64_t length; + char *writebuf; + + if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0) + goto cleanup; + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_write(image, offset, length, writebuf); + if (r < 0) { + virReportSystemError(-r, _("writing %llu bytes failed on " + " RBD image %s at offset %llu"), + (unsigned long long)length, + imgname, + (unsigned long long)offset); + goto cleanup; + } + + VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset); + + offset += length; + } + + cleanup: + return r; +} + +static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count) +{ + int r = -1; + size_t offset = 0; + uint64_t length; + + VIR_DEBUG("Wiping RBD %s volume using discard)", imgname); + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_discard(image, offset, length); + if (r < 0) { + virReportSystemError(-r, _("discarding %llu bytes failed on " + " RBD image %s at offset %llu"), + (unsigned long long)length, + imgname, + (unsigned long long)offset); + goto cleanup; + } + + VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset); + + offset += length; + } + + cleanup: + return r; +} + +static int virStorageBackendRBDVolWipe(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr vol, + unsigned int algorithm, + unsigned int flags) +{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + rbd_image_t image = NULL; + rbd_image_info_t info; + uint64_t stripe_count; + int r = -1; + + virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO | + VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1); + + VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = rbd_open(ptr.ioctx, vol->name, &image, NULL); + if (r < 0) { + virReportSystemError(-r, _("failed to open the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_get_stripe_count(image, &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get stripe count of RBD image %s"), + vol->name); + goto cleanup; + } + + VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s", + (unsigned long long)info.size, pool->def->source.name, vol->name); + + switch (algorithm) { + case VIR_STORAGE_VOL_WIPE_ALG_ZERO: + r = virStorageBackendRBDVolWipeZero(image, vol->name, + info, stripe_count); + break; + case VIR_STORAGE_VOL_WIPE_ALG_DISCARD: + r = virStorageBackendRBDVolWipeDiscard(image, vol->name, + info, stripe_count); + break; + default: + virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"), + algorithm); + r = -VIR_ERR_INVALID_ARG; + goto cleanup; + } + + if (r < 0) { + virReportSystemError(-r, _("failed to wipe RBD image %s"), + vol->name); + goto cleanup; + } + + cleanup: + if (image) + rbd_close(image); + + virStorageBackendRBDCloseRADOSConn(&ptr); + return r; +} + virStorageBackend virStorageBackendRBD = { .type = VIR_STORAGE_POOL_RBD, @@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = { .buildVol = virStorageBackendRBDBuildVol, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, - .resizeVol = virStorageBackendRBDResizeVol, + .wipeVol = virStorageBackendRBDVolWipe, + .resizeVol = virStorageBackendRBDResizeVol }; diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c index 7932ef2..3e95aa5 100644 --- a/tools/virsh-volume.c +++ b/tools/virsh-volume.c @@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = { VIR_ENUM_DECL(virStorageVolWipeAlgorithm) VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST, "zero", "nnsa", "dod", "bsi", "gutmann", "schneier", - "pfitzner7", "pfitzner33", "random"); + "pfitzner7", "pfitzner33", "random", "discard"); static bool cmdVolWipe(vshControl *ctl, const vshCmd *cmd) -- 1.9.1

On 12/23/2015 10:06 AM, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
This patch also adds a new wiping algorithm VIR_STORAGE_VOL_WIPE_ALG_DISCARD
By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with RBD this will called rbd_write() in chunks of the underlying object size to completely zero out the volume.
With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the same object size chunks which will trim/discard all underlying RADOS objects in the Ceph cluster.
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- include/libvirt/libvirt-storage.h | 4 + src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++- tools/virsh-volume.c | 2 +- 3 files changed, 159 insertions(+), 2 deletions(-)
Found these buried in my todo list of things to look at from during the holiday break. I figure by bumping it'll bring it back into focus... "Semantically" speaking - this patch is a v2 of the original patch series... I'm still a bit conflicted whether to add a new option to Wipe or whether a new API should be developed. I see value in both options. Although perhaps thinking of this as "trim" and not "discard" could make it more palatable for wipe. As a new API, each backend driver could decide whether it supports the discard/trim option, but that's quite a bit more work (essentially mimic the Wipe functionality, but generate Trim). I'll note off the top that if we go with adding a new wipe algorithm and we've updated virsh-volume.c to recognize that, then virsh.pod would also need an update to describe it. Also rather than one patch here - I suggest smaller individual patches to make it easier to debug issues down the line when using git bisect. I see perhaps 4 patches... Patch 1: You probably want to start by adjusting virStorageBackendVolWipeLocal. In particular, the switch statement there needs some tweaking - first to use the "switch ((virStorageVolWipeAlgorithm) algorithm) {" construct, but also fixing a 'bug' I just noted in the current design. If the current 'default:' option is taken, the code reports an error, but still attempts the SCRUB command (which will return/cause a different error). BTW: Instead of default it would the *_LAST case... If you're really ambitious, adding a check for the "expected" 'flags' bits would also be beneficial especially since you'll be adding one. Patch 2: Add wipe support for rbd and add the Zero algorithm. This gives a base. The switch in virStorageBackendRBDVolWipe could still remain, but the Flags would only be for _ZERO Patch 3: Then add a the 'trim' option to libvirt-storage.h, virsh-volume.c, and virsh.pod... Patch 4: This patch would add the 'trim' support to the backend. Also grab virStorageBackendRBDImageInfo from patch 2. You're making the same 'stripe_count' call in this patch, but don't have the same checks. If you're concerned about the perhaps extra unnecessary calls you could allow the 3 return parameters to be NULL, then prior to fetching do "if (param)" type trick. The caller could then provide a NULL if they don't care about features and unit...
diff --git a/include/libvirt/libvirt-storage.h b/include/libvirt/libvirt-storage.h index 2c55c93..139add3 100644 --- a/include/libvirt/libvirt-storage.h +++ b/include/libvirt/libvirt-storage.h @@ -153,6 +153,10 @@ typedef enum {
VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */
+ VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the + volume by using TRIM or + DISCARD */
Assuming we use wipe, I think "TRIM" with the description of the option to be "trimming" the contents of the volume. Whether that's sparse files, thin/sparse logical volumes, or rbd object discarding... The aren't your problem to solve here, unless you have that desire to make those changes too. Also, the 2nd/3rd comments should line up under 1-pass...
+ # ifdef VIR_ENUM_SENTINELS VIR_STORAGE_VOL_WIPE_ALG_LAST /* diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index cdbfdee..d13658d 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -32,6 +32,7 @@ #include "base64.h" #include "viruuid.h" #include "virstring.h" +#include "virutil.h"
This isn't necessary I believe. I was able to remove without issue.
#include "rados/librados.h" #include "rbd/librbd.h"
@@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED, return ret; }
+static int virStorageBackendRBDVolWipeZero(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count)
Newer libvirt convention is: static int virStorage...
+{ + int r = -1;
Add: int ret = -1; Usually it's 'ret' instead of just 'r'... Keeping 'r' for rbd_*() call failures fine though since that will contain (and possibly message) rbd_* specific API call errors...
+ size_t offset = 0; + uint64_t length; + char *writebuf; + + if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0) + goto cleanup; + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_write(image, offset, length, writebuf); + if (r < 0) { + virReportSystemError(-r, _("writing %llu bytes failed on " + " RBD image %s at offset %llu"),
This will generate two spaces "... failed on RBD image..."
+ (unsigned long long)length, + imgname, + (unsigned long long)offset);
So is length a "uint64_t" or not? I do note that librdb.h deems it a "size_t"... The query is more why caste to (unsigned long long) other than the %llu (of course). As for offset, IIRC the convention is "%zu", although for this one I note that the librdb.h deems it a "uint64_t".
+ goto cleanup; + } + + VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset);
similar comments regarding the castes and the variable types.
+ + offset += length; + }
Here would be: ret = 0;
+ + cleanup:
writebuf is leaked. Need a VIR_FREE()
+ return r;
and this becomes return ret;
+} + +static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image, + char *imgname, + rbd_image_info_t info, + uint64_t stripe_count)
static int virStorage...
+{ + int r = -1;
Need int ret = -1
+ size_t offset = 0; + uint64_t length; + + VIR_DEBUG("Wiping RBD %s volume using discard)", imgname); + + while (offset < info.size) { + length = MIN((info.size - offset), (info.obj_size * stripe_count)); + + r = rbd_discard(image, offset, length);
rbd_discard deems 'offset' to also be a uint64_t
+ if (r < 0) { + virReportSystemError(-r, _("discarding %llu bytes failed on " + " RBD image %s at offset %llu"),
similar to *Zero - you'll have "...failed on RBD image..."
+ (unsigned long long)length, + imgname, + (unsigned long long)offset);
similar comments regarding caste's of length and offset
+ goto cleanup; + } + + VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu", + (unsigned long long)length, + imgname, (unsigned long long)offset);
similar comments regarding caste's
+ + offset += length; + }
Here would be ret = 0;
+ + cleanup: + return r;
And return ret;
+} + +static int virStorageBackendRBDVolWipe(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr vol, + unsigned int algorithm, + unsigned int flags)
static int virStorage...
+{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + rbd_image_t image = NULL; + rbd_image_info_t info; + uint64_t stripe_count; + int r = -1;
Add int ret = -1;
+ + virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO | + VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1); + + VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = rbd_open(ptr.ioctx, vol->name, &image, NULL); + if (r < 0) {
BTW: This can be : if ((r = rbd_open(ptr.ioctx, vol->name, &image, NULL)) < 0) { For this and all rbd_* calls...
+ virReportSystemError(-r, _("failed to open the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + vol->name); + goto cleanup; + } + + r = rbd_get_stripe_count(image, &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get stripe count of RBD image %s"), + vol->name); + goto cleanup; + }
I see the subsequent patch has some extra checks before calling this. Why wouldn't those also need to be made here?
+ + VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s", + (unsigned long long)info.size, pool->def->source.name, vol->name); + + switch (algorithm) {
Follow the convention of "switch ((virStorageVolWipeAlgorithm) algorithm) {" Then each "case" lines up under "switch".
+ case VIR_STORAGE_VOL_WIPE_ALG_ZERO: + r = virStorageBackendRBDVolWipeZero(image, vol->name, + info, stripe_count);
I would change this (and the next one) to: if (virStorageBackendRBDVolWipeZero(image, vol->name, info, stripe_count) < 0) goto cleanup; Also, I ran these patches through Coverity - it complains that 'info' is passed by value of 160 bytes... Although neither API adjusts it, why not just pass "info.size" and "info.obj_size" or pass by reference the whole 'info' (just to be safe).
+ break; + case VIR_STORAGE_VOL_WIPE_ALG_DISCARD: + r = virStorageBackendRBDVolWipeDiscard(image, vol->name, + info, stripe_count); + break; + default:
And listing each case allowed - so it's clearer. That way if someone in the future comes along and adds ALG_ONE, the rbd code isn't forgotten to be adjusted... The compiler catches it.
+ virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"), + algorithm); + r = -VIR_ERR_INVALID_ARG;
This will be unnecessary...
+ goto cleanup; + } + + if (r < 0) { + virReportSystemError(-r, _("failed to wipe RBD image %s"), + vol->name);
This overwrites the errors found in the *WipeZero and *WipeDiscard API's
+ goto cleanup; + }
The assumption here being ret = 0;
+ + cleanup: + if (image) + rbd_close(image); + + virStorageBackendRBDCloseRADOSConn(&ptr); + return r;
return ret;
+} + virStorageBackend virStorageBackendRBD = { .type = VIR_STORAGE_POOL_RBD,
@@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = { .buildVol = virStorageBackendRBDBuildVol, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, - .resizeVol = virStorageBackendRBDResizeVol, + .wipeVol = virStorageBackendRBDVolWipe, + .resizeVol = virStorageBackendRBDResizeVol
No need to remove the "," - that way the only diff is the line.
}; diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c index 7932ef2..3e95aa5 100644 --- a/tools/virsh-volume.c +++ b/tools/virsh-volume.c @@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = { VIR_ENUM_DECL(virStorageVolWipeAlgorithm) VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST, "zero", "nnsa", "dod", "bsi", "gutmann", "schneier", - "pfitzner7", "pfitzner33", "random"); + "pfitzner7", "pfitzner33", "random", "discard");
I think "trim" will be better. John
static bool cmdVolWipe(vshControl *ctl, const vshCmd *cmd)

On Thu, Jan 21, 2016 at 02:10:02PM -0500, John Ferlan wrote:
On 12/23/2015 10:06 AM, Wido den Hollander wrote:
This allows user to use the volume wiping functionality of the libvirt storage driver.
This patch also adds a new wiping algorithm VIR_STORAGE_VOL_WIPE_ALG_DISCARD
By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with RBD this will called rbd_write() in chunks of the underlying object size to completely zero out the volume.
With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the same object size chunks which will trim/discard all underlying RADOS objects in the Ceph cluster.
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- include/libvirt/libvirt-storage.h | 4 + src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++- tools/virsh-volume.c | 2 +- 3 files changed, 159 insertions(+), 2 deletions(-)
Found these buried in my todo list of things to look at from during the holiday break. I figure by bumping it'll bring it back into focus... "Semantically" speaking - this patch is a v2 of the original patch series...
I'm still a bit conflicted whether to add a new option to Wipe or whether a new API should be developed. I see value in both options. Although perhaps thinking of this as "trim" and not "discard" could make it more palatable for wipe. As a new API, each backend driver could decide whether it supports the discard/trim option, but that's quite a bit more work (essentially mimic the Wipe functionality, but generate Trim).
I think I can certainly see a dedicated API for trim in the future, as there's use cases for trim beyond full-disk wiping. eg to spars-ify a disk image. I think its still ok to add trim to the existing Wipe API though, as a specific targetted use case. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

RBD supports cloning by creating a snapshot, protecting it and create a child image based on that snapshot afterwards. The RBD storage driver will try to find a snapshot with zero deltas between the current state of the original volume and the snapshot. If such a snapshot is found a clone/child image will be created using the rbd_clone2() function from librbd. It will use the same features, strip size and stripe count as the parent image. This implementation will only create a single snapshot on the parent image if this never changes. That should improve performance when removing the parent image at some point. During build the decision will be made to user rbd_diff_iterate() or rbd_diff_iterate2(). The latter is faster, but only available on Ceph versions after 0.94 (Hammer). Cloning is only supported if RBD format 2 is used. All images created by libvirt are already format 2. If a RBD format 1 image is used as the original volume libvirt will return VIR_ERR_OPERATION_UNSUPPORTED Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 340 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 340 insertions(+) diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 5e16e7a..d22e5e0 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -33,6 +33,7 @@ #include "viruuid.h" #include "virstring.h" #include "virutil.h" +#include "time.h" #include "rados/librados.h" #include "rbd/librbd.h" @@ -632,6 +633,344 @@ virStorageBackendRBDBuildVol(virConnectPtr conn, return ret; } +static int virStorageBackendRBDImageInfo(rbd_image_t image, char *volname, + uint64_t *features, + uint64_t *stripe_unit, + uint64_t *stripe_count) +{ + int r = -1; + uint8_t oldformat; + + r = rbd_get_old_format(image, &oldformat); + if (r < 0) { + virReportSystemError(-r, _("failed to get the format of RBD image %s"), + volname); + goto cleanup; + } + + if (oldformat != 0) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("RBD image %s is old format. Does not support " + "extended features and striping"), + volname); + r = VIR_ERR_OPERATION_UNSUPPORTED; + goto cleanup; + } + + r = rbd_get_features(image, features); + if (r < 0) { + virReportSystemError(-r, _("failed to get the features of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_unit(image, stripe_unit); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe unit of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_count(image, stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe count of RBD image %s"), + volname); + goto cleanup; + } + + cleanup: + return r; +} + +/* Callback function for rbd_diff_iterate() */ +static int virStorageBackendRBDIterateCb(uint64_t offset ATTRIBUTE_UNUSED, + size_t length ATTRIBUTE_UNUSED, + int exists ATTRIBUTE_UNUSED, + void *arg) +{ + /* + * Just set that there is a diff for this snapshot, we do not care where + */ + *(int*) arg = 1; + return -1; +} + +static int virStorageBackendRBDSnapshotFindNoDiff(rbd_image_t image, char *imgname, + virBufferPtr snapname) +{ + int r = -1; + int snap_count; + int max_snaps = 128; + size_t i; + int diff; + rbd_snap_info_t *snaps = NULL; + rbd_image_info_t info; + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + imgname); + goto cleanup; + } + + do { + if (VIR_ALLOC_N(snaps, max_snaps)) + goto cleanup; + + snap_count = rbd_snap_list(image, snaps, &max_snaps); + if (snap_count <= 0) + VIR_FREE(snaps); + + } while (snap_count == -ERANGE); + + VIR_DEBUG("Found %d snapshots for RBD image %s", snap_count, imgname); + + if (snap_count == 0) { + r = -ENOENT; + goto cleanup; + } + + if (snap_count > 0) { + for (i = 0; i < snap_count; i++) { + VIR_DEBUG("Quering diff for RBD snapshot %s@%s", imgname, + snaps[i].name); + + /* The callback will set diff to non-zero if there is a diff */ + diff = 0; + +/* + * rbd_diff_iterate2() is available in versions above Ceph 0.94 (Hammer) + * It uses a object map inside Ceph which is faster than rbd_diff_iterate() + * which iterates all objects. + */ +#if LIBRBD_VERSION_CODE > 266 + r = rbd_diff_iterate2(image, snaps[i].name, 0, info.size, 0, 1, +#else + r = rbd_diff_iterate(image, snaps[i].name, 0, info.size, +#endif + virStorageBackendRBDIterateCb, (void *)&diff); + + if (r < 0) { + virReportSystemError(-r, _("failed to iterate RBD snapshot %s@%s"), + imgname, snaps[i].name); + goto cleanup; + } + + if (diff == 0) { + VIR_DEBUG("RBD snapshot %s@%s has no delta", imgname, + snaps[i].name); + virBufferAsprintf(snapname, "%s", snaps[i].name); + r = 0; + goto cleanup; + } + + VIR_DEBUG("RBD snapshot %s@%s has deltas", imgname, + snaps[i].name); + } + } + + r = -ENOENT; + + cleanup: + if (snaps) + rbd_snap_list_end(snaps); + + VIR_FREE(snaps); + + return r; +} + +static int virStorageBackendRBDSnapshotCreate(rbd_image_t image, char *imgname, + char *snapname) +{ + int r = -1; + + VIR_DEBUG("Creating RBD snapshot %s@%s", imgname, snapname); + + r = rbd_snap_create(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed to create RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + } + + cleanup: + return r; +} + +static int virStorageBackendRBDSnapshotProtect(rbd_image_t image, char *imgname, + char *snapname) +{ + int r = -1; + int protected; + + VIR_DEBUG("Quering if RBD snapshot %s@%s is protected", imgname, snapname); + + r = rbd_snap_is_protected(image, snapname, &protected); + if (r < 0) { + virReportSystemError(-r, _("failed verify if RBD snapshot %s@%s " + "is protected"), imgname, snapname); + goto cleanup; + } + + if (protected == 0) { + VIR_DEBUG("RBD Snapshot %s@%s is not protected, protecting", + imgname, snapname); + + r = rbd_snap_protect(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed protect RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + } + } else { + VIR_DEBUG("RBD Snapshot %s@%s is already protected", imgname, snapname); + } + + cleanup: + return r; +} + +static int virStorageBackendRBDCloneImage(rados_ioctx_t io, char *origvol, + char *newvol) +{ + int r = -1; + int order = 0; + uint64_t features; + uint64_t stripe_count; + uint64_t stripe_unit; + virBuffer snapname = VIR_BUFFER_INITIALIZER; + char *snapname_buff; + rbd_image_t image = NULL; + + r = rbd_open(io, origvol, &image, NULL); + if (r < 0) { + virReportSystemError(-r, _("failed to open the RBD image %s"), + origvol); + goto cleanup; + } + + r = virStorageBackendRBDImageInfo(image, origvol, &features, &stripe_unit, + &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get info from RBD image %s"), + origvol); + goto cleanup; + } + + /* + * First we attempt to find a snapshot which has no differences between + * the current state of the RBD image. + * + * This prevents us from creating a new snapshot for every clone operation + * while it could be that the original volume has not changed + */ + r = virStorageBackendRBDSnapshotFindNoDiff(image, origvol, &snapname); + if (r < 0 && r != -ENOENT) { + virReportSystemError(-r, _("failed to find snapshot for RBD image %s"), + origvol); + goto cleanup; + } + + /* + * No such snapshot could be found, so we will create a new snapshot + * and use that for cloning + */ + if (r == -ENOENT) { + VIR_DEBUG("No RBD snapshot with zero delta could be found for image %s", + origvol); + + virBufferAsprintf(&snapname, "libvirt-%d", (int)time(NULL)); + + if (virBufferCheckError(&snapname) < 0) + goto cleanup; + + snapname_buff = virBufferContentAndReset(&snapname); + + r = virStorageBackendRBDSnapshotCreate(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to snapshot RBD image %s"), + origvol); + goto cleanup; + } + } else { + snapname_buff = virBufferContentAndReset(&snapname); + + VIR_DEBUG("Found RBD snapshot %s with zero delta for image %s", + snapname_buff, origvol); + } + + VIR_DEBUG("Using snapshot name %s for cloning RBD image %s to %s", + snapname_buff, origvol, newvol); + + /* + * RBD snapshots have to be 'protected' before they can be used + * as a parent snapshot for a child image + */ + r = virStorageBackendRBDSnapshotProtect(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to protect RBD snapshot %s@%s"), + origvol, snapname_buff); + goto cleanup; + } + + VIR_DEBUG("Performing RBD clone from %s to %s", origvol, newvol); + + r = rbd_clone2(io, origvol, snapname_buff, io, newvol, features, &order, + stripe_unit, stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to clone RBD volume %s to %s"), + origvol, newvol); + goto cleanup; + } + + VIR_DEBUG("Cloned RBD image %s to %s", origvol, newvol); + + cleanup: + virBufferFreeAndReset(&snapname); + VIR_FREE(snapname_buff); + + if (image) + rbd_close(image); + + return r; +} + +static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr newvol, + virStorageVolDefPtr origvol, + unsigned int flags) +{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + int r = -1; + + VIR_DEBUG("Creating clone of RBD image %s/%s with name %s", + pool->def->source.name, origvol->name, newvol->name); + + virCheckFlags(0, -1); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = virStorageBackendRBDCloneImage(ptr.ioctx, origvol->name, newvol->name); + if (r < 0) { + virReportSystemError(-r, _("failed to clone volume '%s/%s' to %s"), + pool->def->source.name, origvol->name, + newvol->name); + goto cleanup; + } + + cleanup: + virStorageBackendRBDCloseRADOSConn(&ptr); + return r; +} + static int virStorageBackendRBDRefreshVol(virConnectPtr conn, virStoragePoolObjPtr pool ATTRIBUTE_UNUSED, virStorageVolDefPtr vol) @@ -793,6 +1132,7 @@ virStorageBackend virStorageBackendRBD = { .refreshPool = virStorageBackendRBDRefreshPool, .createVol = virStorageBackendRBDCreateVol, .buildVol = virStorageBackendRBDBuildVol, + .buildVolFrom = virStorageBackendRBDBuildVolFrom, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, .wipeVol = virStorageBackendRBDVolWipe, -- 1.9.1

On 12/23/2015 04:29 AM, Wido den Hollander wrote:
RBD supports cloning by creating a snapshot, protecting it and create a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using the rbd_clone2() function from librbd.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if this never changes. That should improve performance when removing the parent image at some point.
During build the decision will be made to user rbd_diff_iterate() or rbd_diff_iterate2(). The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt are already format 2.
If a RBD format 1 image is used as the original volume libvirt will return VIR_ERR_OPERATION_UNSUPPORTED
Long line... and the API should return 0 or -1 (based on other backends)... Caller doesn't expect that VIR_ERR* on return... I assume you realize that 'virsh vol-create-from' and 'vol-clone' end up using the same API... FWIW: I reviewed bottom up - I may also have been distracted more than once looking for specific code. Hopefully this is coherent ;-)
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 340 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 340 insertions(+)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 5e16e7a..d22e5e0 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -33,6 +33,7 @@ #include "viruuid.h" #include "virstring.h" #include "virutil.h" +#include "time.h"
Instead of time, perhaps virrandom... see note below
#include "rados/librados.h" #include "rbd/librbd.h"
@@ -632,6 +633,344 @@ virStorageBackendRBDBuildVol(virConnectPtr conn, return ret; }
+static int virStorageBackendRBDImageInfo(rbd_image_t image, char *volname, + uint64_t *features, + uint64_t *stripe_unit, + uint64_t *stripe_count)
static int virStorage... and one line per parameter Also, I think this moves to help with the previous patch...
+{ + int r = -1;
int ret = -1;
+ uint8_t oldformat; + + r = rbd_get_old_format(image, &oldformat); + if (r < 0) {
You could use (in more places than just here): "if ((r = rbd_*(...)) < 0) {" I'll just mention it once though...
+ virReportSystemError(-r, _("failed to get the format of RBD image %s"), + volname); + goto cleanup; + } + + if (oldformat != 0) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("RBD image %s is old format. Does not support " + "extended features and striping"), + volname); + r = VIR_ERR_OPERATION_UNSUPPORTED;
unnecessary
+ goto cleanup; + }
FWIW: This is the check I'm referring to in the review of the other patch.
+ + r = rbd_get_features(image, features); + if (r < 0) { + virReportSystemError(-r, _("failed to get the features of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_unit(image, stripe_unit); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe unit of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_count(image, stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe count of RBD image %s"), + volname); + goto cleanup; + }
ret = 0;
+ + cleanup: + return r;
return ret;
+} + +/* Callback function for rbd_diff_iterate() */ +static int virStorageBackendRBDIterateCb(uint64_t offset ATTRIBUTE_UNUSED, + size_t length ATTRIBUTE_UNUSED, + int exists ATTRIBUTE_UNUSED, + void *arg)
static int virStorage*
+{ + /* + * Just set that there is a diff for this snapshot, we do not care where + */ + *(int*) arg = 1; + return -1;
So I assume the only reason this gets called then is when there is a difference...
+} +
This one will require some comments w/r/t what it returns... I'm also curious how would the caller distinguish between EPERM and a -1 return call? That's the problem with returning negative errno values. I think you need to pick specific numbers to return and not mess w/ errno. Seems like you have 1 for success, 0 for nothing found, and -1 for error. Then your caller is adjusted thusly.
+static int virStorageBackendRBDSnapshotFindNoDiff(rbd_image_t image, char *imgname, + virBufferPtr snapname)
static int virStorage... and one line per parameter
+{ + int r = -1;
int ret = -1;
+ int snap_count; + int max_snaps = 128; + size_t i; + int diff; + rbd_snap_info_t *snaps = NULL; + rbd_image_info_t info; + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + imgname); + goto cleanup; + }
Perhaps interesting to note that virStorageBackendRBDVolWipe calls this too - why not call it once and save/pass the result (by reference)?
+ + do { + if (VIR_ALLOC_N(snaps, max_snaps)) + goto cleanup; + + snap_count = rbd_snap_list(image, snaps, &max_snaps);
There isn't an API to tell you how many are there? Seems like a much better idea to have that rather than iterative loop like this trying to guess... Especially one that creates 128 to start... Lots of wasted space if there's only 1. There are some API's where you pass NULL for 'snaps' and 'max_snaps = 0', then it returns the number found... That would be more useful here.
+ if (snap_count <= 0) + VIR_FREE(snaps); + + } while (snap_count == -ERANGE); + + VIR_DEBUG("Found %d snapshots for RBD image %s", snap_count, imgname);
Found -# snapshots will look very strange on error...
+ + if (snap_count == 0) { + r = -ENOENT; + goto cleanup; + }
This then becomes: if (snap_count <= 0) { if (snap_count == 0) ret = 0; goto cleanup; }
+ + if (snap_count > 0) {
That means this if goes away...
+ for (i = 0; i < snap_count; i++) { + VIR_DEBUG("Quering diff for RBD snapshot %s@%s", imgname, + snaps[i].name);
Querying or checking.
+ + /* The callback will set diff to non-zero if there is a diff */ + diff = 0; + +/* + * rbd_diff_iterate2() is available in versions above Ceph 0.94 (Hammer) + * It uses a object map inside Ceph which is faster than rbd_diff_iterate() + * which iterates all objects.
Rather than split function calls between #if #else #endif - I'd copy that duplicated line into both.
+ */ +#if LIBRBD_VERSION_CODE > 266 + r = rbd_diff_iterate2(image, snaps[i].name, 0, info.size, 0, 1, +#else + r = rbd_diff_iterate(image, snaps[i].name, 0, info.size, +#endif + virStorageBackendRBDIterateCb, (void *)&diff); + + if (r < 0) { + virReportSystemError(-r, _("failed to iterate RBD snapshot %s@%s"), + imgname, snaps[i].name); + goto cleanup; + } + + if (diff == 0) {
So this is the I found the matching image/snaps[i].name and it's exactly the same? I'm a bit unclear on what the iterate call does especially with that callback function added into the mix.
+ VIR_DEBUG("RBD snapshot %s@%s has no delta", imgname, + snaps[i].name); + virBufferAsprintf(snapname, "%s", snaps[i].name); + r = 0;
I assume though here we're saying match and the same, so the caller should take a specific path, thus: ret = 1;
+ goto cleanup; + } + + VIR_DEBUG("RBD snapshot %s@%s has deltas", imgname, + snaps[i].name);
Otherwise, we'll try the next snaps[i]?
+ } + } + + r = -ENOENT;
Thus if we get here we found nothing matching, so: ret = 0;
+ + cleanup: + if (snaps) + rbd_snap_list_end(snaps); + + VIR_FREE(snaps); + + return r;
return ret;
+} + +static int virStorageBackendRBDSnapshotCreate(rbd_image_t image, char *imgname, + char *snapname)
static int virStorage... one line per argument
+{ + int r = -1;
int ret = -1;
+ + VIR_DEBUG("Creating RBD snapshot %s@%s", imgname, snapname); + + r = rbd_snap_create(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed to create RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + }
ret = 0;
+ + cleanup: + return r;
return ret;
+} + +static int virStorageBackendRBDSnapshotProtect(rbd_image_t image, char *imgname, + char *snapname)
static int virStorage... And one line per parameter...
+{ + int r = -1;
int ret = -1;
+ int protected; + + VIR_DEBUG("Quering if RBD snapshot %s@%s is protected", imgname, snapname);
Querying or Checking
+ + r = rbd_snap_is_protected(image, snapname, &protected); + if (r < 0) { + virReportSystemError(-r, _("failed verify if RBD snapshot %s@%s " + "is protected"), imgname, snapname); + goto cleanup; + } + + if (protected == 0) { + VIR_DEBUG("RBD Snapshot %s@%s is not protected, protecting", + imgname, snapname); + + r = rbd_snap_protect(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed protect RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + } + } else { + VIR_DEBUG("RBD Snapshot %s@%s is already protected", imgname, snapname); + } + ret = 0;
+ cleanup: + return r;
return ret;
+} + +static int virStorageBackendRBDCloneImage(rados_ioctx_t io, char *origvol, + char *newvol)
static int virStorage... Also one line per parameter...
+{ + int r = -1;
int ret = -1;
+ int order = 0; + uint64_t features; + uint64_t stripe_count; + uint64_t stripe_unit; + virBuffer snapname = VIR_BUFFER_INITIALIZER; + char *snapname_buff;
= NULL; Since we can get to cleanup without ever initializing...
+ rbd_image_t image = NULL; + + r = rbd_open(io, origvol, &image, NULL); + if (r < 0) { + virReportSystemError(-r, _("failed to open the RBD image %s"), + origvol); + goto cleanup; + } + + r = virStorageBackendRBDImageInfo(image, origvol, &features, &stripe_unit, + &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get info from RBD image %s"), + origvol);
Don't overwrite error messages from virStorageBackendRBDImageInfo This then just becomes: if (virStorageBackendRBDImageInfo() < 0) Yes, I see this is where the commit message claim about returning VIR_ERR_OPERATION_UNSUPPORTED is sourced, but I don't think that's the right choice...
+ goto cleanup; + } + + /* + * First we attempt to find a snapshot which has no differences between + * the current state of the RBD image. + * + * This prevents us from creating a new snapshot for every clone operation + * while it could be that the original volume has not changed + */ + r = virStorageBackendRBDSnapshotFindNoDiff(image, origvol, &snapname); + if (r < 0 && r != -ENOENT) { + virReportSystemError(-r, _("failed to find snapshot for RBD image %s"), + origvol); + goto cleanup; + }
Same here regarding error message... Using my suggestions above "if (r < 0) goto cleanup;"
+ + /* + * No such snapshot could be found, so we will create a new snapshot + * and use that for cloning + */ + if (r == -ENOENT) {
again from my suggestion "if (r == 0)"
+ VIR_DEBUG("No RBD snapshot with zero delta could be found for image %s", + origvol); + + virBufferAsprintf(&snapname, "libvirt-%d", (int)time(NULL));
Ewww... I assume you're shooting for unique name, why not use 'virRandomInt'.
+ + if (virBufferCheckError(&snapname) < 0) + goto cleanup; + + snapname_buff = virBufferContentAndReset(&snapname); + + r = virStorageBackendRBDSnapshotCreate(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to snapshot RBD image %s"), + origvol); + goto cleanup; + }
Same here regarding error message
+ } else { + snapname_buff = virBufferContentAndReset(&snapname);
This path assumes snapname was generated via the virBufferAsprintf in the called function, but this path doesn't do the virBufferCheckError leaving the possibility that the virBufferAsprintf failed and then the snapname_buff could be "NULL" (if error was set), resulting in issues below...
+ + VIR_DEBUG("Found RBD snapshot %s with zero delta for image %s", + snapname_buff, origvol);
This is somewhat redundant with the VIR_DEBUG in the *NoDiff helper
+ } + + VIR_DEBUG("Using snapshot name %s for cloning RBD image %s to %s", + snapname_buff, origvol, newvol); + + /* + * RBD snapshots have to be 'protected' before they can be used + * as a parent snapshot for a child image + */ + r = virStorageBackendRBDSnapshotProtect(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to protect RBD snapshot %s@%s"), + origvol, snapname_buff); + goto cleanup; + }
Same here regarding error message.
+ + VIR_DEBUG("Performing RBD clone from %s to %s", origvol, newvol); + + r = rbd_clone2(io, origvol, snapname_buff, io, newvol, features, &order, + stripe_unit, stripe_count);
Seeing rbd_clone2 makes me wonder is there an rbd_clone? and if so, then what version does rbd_clone2 show up in? e.g. why no LIBRBD_VERSION_CODE syntax here? Why even get this far if rbd_clone2 isn't available?
+ if (r < 0) { + virReportSystemError(-r, _("failed to clone RBD volume %s to %s"), + origvol, newvol); + goto cleanup; + } + + VIR_DEBUG("Cloned RBD image %s to %s", origvol, newvol); + ret = 0;
+ cleanup: + virBufferFreeAndReset(&snapname); + VIR_FREE(snapname_buff); + + if (image) + rbd_close(image); + + return r;
return ret;
+} +
Since the code relies upon rbd_clone2, rather than go through all the effort in the code only to fail because the right function isn't available - check once, early on and be done. I used 266 here because it's the *iterate2 - if "rbd_clone2" is later, then that version should be used. Just be sure to document your choice. #if LIBRBD_VERSION_CODE > 266
+static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr newvol, + virStorageVolDefPtr origvol, + unsigned int flags)
static int virStorage...
+{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + int r = -1;
Add a 'int ret = -1;'
+ + VIR_DEBUG("Creating clone of RBD image %s/%s with name %s", + pool->def->source.name, origvol->name, newvol->name); + + virCheckFlags(0, -1); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = virStorageBackendRBDCloneImage(ptr.ioctx, origvol->name, newvol->name); + if (r < 0) { + virReportSystemError(-r, _("failed to clone volume '%s/%s' to %s"), + pool->def->source.name, origvol->name, + newvol->name);
This will replicate or overwrite an error from called function? Causing the user to look deeper in error log history... If you really need the pool name, then pass it since you're passing the other two. Thus this could just become: if (virStorage*() < 0)
+ goto cleanup; + }
ret = 0;
+ + cleanup: + virStorageBackendRBDCloseRADOSConn(&ptr); + return r;
return ret;
+} +
#else static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn ATTRIBUTE_UNUSED, virStoragePoolObjPtr pool ATTRIBUTE_UNUSED, virStorageVolDefPtr newvol ATTRIBUTE_UNUSED, virStorageVolDefPtr origvol ATTRIBUTE_UNUSED, unsigned int flags ATTRIBUTE_UNUSED) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("")) Devise an appropriate error message... return -1; } #endif John
static int virStorageBackendRBDRefreshVol(virConnectPtr conn, virStoragePoolObjPtr pool ATTRIBUTE_UNUSED, virStorageVolDefPtr vol) @@ -793,6 +1132,7 @@ virStorageBackend virStorageBackendRBD = { .refreshPool = virStorageBackendRBDRefreshPool, .createVol = virStorageBackendRBDCreateVol, .buildVol = virStorageBackendRBDBuildVol, + .buildVolFrom = virStorageBackendRBDBuildVolFrom, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, .wipeVol = virStorageBackendRBDVolWipe,

On 21-01-16 21:55, John Ferlan wrote:
On 12/23/2015 04:29 AM, Wido den Hollander wrote:
RBD supports cloning by creating a snapshot, protecting it and create a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using the rbd_clone2() function from librbd.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if this never changes. That should improve performance when removing the parent image at some point.
During build the decision will be made to user rbd_diff_iterate() or rbd_diff_iterate2(). The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt are already format 2.
If a RBD format 1 image is used as the original volume libvirt will return VIR_ERR_OPERATION_UNSUPPORTED
Long line... and the API should return 0 or -1 (based on other backends)... Caller doesn't expect that VIR_ERR* on return...
I assume you realize that 'virsh vol-create-from' and 'vol-clone' end up using the same API...
FWIW: I reviewed bottom up - I may also have been distracted more than once looking for specific code. Hopefully this is coherent ;-)
Thanks for all the feedback, really appreciate it! Working on a revised version of all the patches right now. Rebasing a lot, editing commits, etc, etc. The coding style changes are new to me though. I run make syntax-check and that works. Never knew that this was needed: static int virStorageRBDXXXXXXX All the exisiting code does: static int virStorageRBDXXXXXX Anyway, I'll get back with new patches. Ignore anything which is on the list right now. Btw, a Pull Request mechanism like Github would be so much easier than sending patches per e-mail ;)
Signed-off-by: Wido den Hollander <wido@widodh.nl> --- src/storage/storage_backend_rbd.c | 340 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 340 insertions(+)
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c index 5e16e7a..d22e5e0 100644 --- a/src/storage/storage_backend_rbd.c +++ b/src/storage/storage_backend_rbd.c @@ -33,6 +33,7 @@ #include "viruuid.h" #include "virstring.h" #include "virutil.h" +#include "time.h"
Instead of time, perhaps virrandom... see note below
#include "rados/librados.h" #include "rbd/librbd.h"
@@ -632,6 +633,344 @@ virStorageBackendRBDBuildVol(virConnectPtr conn, return ret; }
+static int virStorageBackendRBDImageInfo(rbd_image_t image, char *volname, + uint64_t *features, + uint64_t *stripe_unit, + uint64_t *stripe_count)
static int virStorage...
and one line per parameter
Also, I think this moves to help with the previous patch...
+{ + int r = -1;
int ret = -1;
+ uint8_t oldformat; + + r = rbd_get_old_format(image, &oldformat); + if (r < 0) {
You could use (in more places than just here):
"if ((r = rbd_*(...)) < 0) {"
I'll just mention it once though...
+ virReportSystemError(-r, _("failed to get the format of RBD image %s"), + volname); + goto cleanup; + } + + if (oldformat != 0) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("RBD image %s is old format. Does not support " + "extended features and striping"), + volname); + r = VIR_ERR_OPERATION_UNSUPPORTED;
unnecessary
+ goto cleanup; + }
FWIW: This is the check I'm referring to in the review of the other patch.
+ + r = rbd_get_features(image, features); + if (r < 0) { + virReportSystemError(-r, _("failed to get the features of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_unit(image, stripe_unit); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe unit of RBD image %s"), + volname); + goto cleanup; + } + + r = rbd_get_stripe_count(image, stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get the stripe count of RBD image %s"), + volname); + goto cleanup; + }
ret = 0;
+ + cleanup: + return r;
return ret;
+} + +/* Callback function for rbd_diff_iterate() */ +static int virStorageBackendRBDIterateCb(uint64_t offset ATTRIBUTE_UNUSED, + size_t length ATTRIBUTE_UNUSED, + int exists ATTRIBUTE_UNUSED, + void *arg)
static int virStorage*
+{ + /* + * Just set that there is a diff for this snapshot, we do not care where + */ + *(int*) arg = 1; + return -1;
So I assume the only reason this gets called then is when there is a difference...
+} +
This one will require some comments w/r/t what it returns... I'm also curious how would the caller distinguish between EPERM and a -1 return call? That's the problem with returning negative errno values. I think you need to pick specific numbers to return and not mess w/ errno.
Seems like you have 1 for success, 0 for nothing found, and -1 for error. Then your caller is adjusted thusly.
+static int virStorageBackendRBDSnapshotFindNoDiff(rbd_image_t image, char *imgname, + virBufferPtr snapname)
static int virStorage...
and one line per parameter
+{ + int r = -1;
int ret = -1;
+ int snap_count; + int max_snaps = 128; + size_t i; + int diff; + rbd_snap_info_t *snaps = NULL; + rbd_image_info_t info; + + r = rbd_stat(image, &info, sizeof(info)); + if (r < 0) { + virReportSystemError(-r, _("failed to stat the RBD image %s"), + imgname); + goto cleanup; + }
Perhaps interesting to note that virStorageBackendRBDVolWipe calls this too - why not call it once and save/pass the result (by reference)?
+ + do { + if (VIR_ALLOC_N(snaps, max_snaps)) + goto cleanup; + + snap_count = rbd_snap_list(image, snaps, &max_snaps);
There isn't an API to tell you how many are there? Seems like a much better idea to have that rather than iterative loop like this trying to guess... Especially one that creates 128 to start... Lots of wasted space if there's only 1. There are some API's where you pass NULL for 'snaps' and 'max_snaps = 0', then it returns the number found... That would be more useful here.
+ if (snap_count <= 0) + VIR_FREE(snaps); + + } while (snap_count == -ERANGE); + + VIR_DEBUG("Found %d snapshots for RBD image %s", snap_count, imgname);
Found -# snapshots will look very strange on error...
+ + if (snap_count == 0) { + r = -ENOENT; + goto cleanup; + }
This then becomes:
if (snap_count <= 0) { if (snap_count == 0) ret = 0; goto cleanup; }
+ + if (snap_count > 0) {
That means this if goes away...
+ for (i = 0; i < snap_count; i++) { + VIR_DEBUG("Quering diff for RBD snapshot %s@%s", imgname, + snaps[i].name);
Querying or checking.
+ + /* The callback will set diff to non-zero if there is a diff */ + diff = 0; + +/* + * rbd_diff_iterate2() is available in versions above Ceph 0.94 (Hammer) + * It uses a object map inside Ceph which is faster than rbd_diff_iterate() + * which iterates all objects.
Rather than split function calls between #if #else #endif - I'd copy that duplicated line into both.
+ */ +#if LIBRBD_VERSION_CODE > 266 + r = rbd_diff_iterate2(image, snaps[i].name, 0, info.size, 0, 1, +#else + r = rbd_diff_iterate(image, snaps[i].name, 0, info.size, +#endif + virStorageBackendRBDIterateCb, (void *)&diff); + + if (r < 0) { + virReportSystemError(-r, _("failed to iterate RBD snapshot %s@%s"), + imgname, snaps[i].name); + goto cleanup; + } + + if (diff == 0) {
So this is the I found the matching image/snaps[i].name and it's exactly the same? I'm a bit unclear on what the iterate call does especially with that callback function added into the mix.
+ VIR_DEBUG("RBD snapshot %s@%s has no delta", imgname, + snaps[i].name); + virBufferAsprintf(snapname, "%s", snaps[i].name); + r = 0;
I assume though here we're saying match and the same, so the caller should take a specific path, thus:
ret = 1;
+ goto cleanup; + } + + VIR_DEBUG("RBD snapshot %s@%s has deltas", imgname, + snaps[i].name);
Otherwise, we'll try the next snaps[i]?
+ } + } + + r = -ENOENT;
Thus if we get here we found nothing matching, so:
ret = 0;
+ + cleanup: + if (snaps) + rbd_snap_list_end(snaps); + + VIR_FREE(snaps); + + return r;
return ret;
+} + +static int virStorageBackendRBDSnapshotCreate(rbd_image_t image, char *imgname, + char *snapname)
static int virStorage...
one line per argument
+{ + int r = -1;
int ret = -1;
+ + VIR_DEBUG("Creating RBD snapshot %s@%s", imgname, snapname); + + r = rbd_snap_create(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed to create RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + }
ret = 0;
+ + cleanup: + return r;
return ret;
+} + +static int virStorageBackendRBDSnapshotProtect(rbd_image_t image, char *imgname, + char *snapname)
static int virStorage...
And one line per parameter...
+{ + int r = -1;
int ret = -1;
+ int protected; + + VIR_DEBUG("Quering if RBD snapshot %s@%s is protected", imgname, snapname);
Querying or Checking
+ + r = rbd_snap_is_protected(image, snapname, &protected); + if (r < 0) { + virReportSystemError(-r, _("failed verify if RBD snapshot %s@%s " + "is protected"), imgname, snapname); + goto cleanup; + } + + if (protected == 0) { + VIR_DEBUG("RBD Snapshot %s@%s is not protected, protecting", + imgname, snapname); + + r = rbd_snap_protect(image, snapname); + if (r < 0) { + virReportSystemError(-r, _("failed protect RBD snapshot %s@%s"), + imgname, snapname); + goto cleanup; + } + } else { + VIR_DEBUG("RBD Snapshot %s@%s is already protected", imgname, snapname); + } + ret = 0;
+ cleanup: + return r;
return ret;
+} + +static int virStorageBackendRBDCloneImage(rados_ioctx_t io, char *origvol, + char *newvol)
static int virStorage...
Also one line per parameter...
+{ + int r = -1;
int ret = -1;
+ int order = 0; + uint64_t features; + uint64_t stripe_count; + uint64_t stripe_unit; + virBuffer snapname = VIR_BUFFER_INITIALIZER; + char *snapname_buff;
= NULL;
Since we can get to cleanup without ever initializing...
+ rbd_image_t image = NULL; + + r = rbd_open(io, origvol, &image, NULL); + if (r < 0) { + virReportSystemError(-r, _("failed to open the RBD image %s"), + origvol); + goto cleanup; + } + + r = virStorageBackendRBDImageInfo(image, origvol, &features, &stripe_unit, + &stripe_count); + if (r < 0) { + virReportSystemError(-r, _("failed to get info from RBD image %s"), + origvol);
Don't overwrite error messages from virStorageBackendRBDImageInfo
This then just becomes:
if (virStorageBackendRBDImageInfo() < 0)
Yes, I see this is where the commit message claim about returning VIR_ERR_OPERATION_UNSUPPORTED is sourced, but I don't think that's the right choice...
+ goto cleanup; + } + + /* + * First we attempt to find a snapshot which has no differences between + * the current state of the RBD image. + * + * This prevents us from creating a new snapshot for every clone operation + * while it could be that the original volume has not changed + */ + r = virStorageBackendRBDSnapshotFindNoDiff(image, origvol, &snapname); + if (r < 0 && r != -ENOENT) { + virReportSystemError(-r, _("failed to find snapshot for RBD image %s"), + origvol); + goto cleanup; + }
Same here regarding error message...
Using my suggestions above "if (r < 0) goto cleanup;"
+ + /* + * No such snapshot could be found, so we will create a new snapshot + * and use that for cloning + */ + if (r == -ENOENT) {
again from my suggestion "if (r == 0)"
+ VIR_DEBUG("No RBD snapshot with zero delta could be found for image %s", + origvol); + + virBufferAsprintf(&snapname, "libvirt-%d", (int)time(NULL));
Ewww... I assume you're shooting for unique name, why not use 'virRandomInt'.
+ + if (virBufferCheckError(&snapname) < 0) + goto cleanup; + + snapname_buff = virBufferContentAndReset(&snapname); + + r = virStorageBackendRBDSnapshotCreate(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to snapshot RBD image %s"), + origvol); + goto cleanup; + }
Same here regarding error message
+ } else { + snapname_buff = virBufferContentAndReset(&snapname);
This path assumes snapname was generated via the virBufferAsprintf in the called function, but this path doesn't do the virBufferCheckError leaving the possibility that the virBufferAsprintf failed and then the snapname_buff could be "NULL" (if error was set), resulting in issues below...
+ + VIR_DEBUG("Found RBD snapshot %s with zero delta for image %s", + snapname_buff, origvol);
This is somewhat redundant with the VIR_DEBUG in the *NoDiff helper
+ } + + VIR_DEBUG("Using snapshot name %s for cloning RBD image %s to %s", + snapname_buff, origvol, newvol); + + /* + * RBD snapshots have to be 'protected' before they can be used + * as a parent snapshot for a child image + */ + r = virStorageBackendRBDSnapshotProtect(image, origvol, snapname_buff); + if (r < 0) { + virReportSystemError(-r, _("failed to protect RBD snapshot %s@%s"), + origvol, snapname_buff); + goto cleanup; + }
Same here regarding error message.
+ + VIR_DEBUG("Performing RBD clone from %s to %s", origvol, newvol); + + r = rbd_clone2(io, origvol, snapname_buff, io, newvol, features, &order, + stripe_unit, stripe_count);
Seeing rbd_clone2 makes me wonder is there an rbd_clone? and if so, then what version does rbd_clone2 show up in? e.g. why no LIBRBD_VERSION_CODE syntax here?
Why even get this far if rbd_clone2 isn't available?
+ if (r < 0) { + virReportSystemError(-r, _("failed to clone RBD volume %s to %s"), + origvol, newvol); + goto cleanup; + } + + VIR_DEBUG("Cloned RBD image %s to %s", origvol, newvol); + ret = 0;
+ cleanup: + virBufferFreeAndReset(&snapname); + VIR_FREE(snapname_buff); + + if (image) + rbd_close(image); + + return r;
return ret;
+} +
Since the code relies upon rbd_clone2, rather than go through all the effort in the code only to fail because the right function isn't available - check once, early on and be done. I used 266 here because it's the *iterate2 - if "rbd_clone2" is later, then that version should be used. Just be sure to document your choice.
#if LIBRBD_VERSION_CODE > 266
+static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn, + virStoragePoolObjPtr pool, + virStorageVolDefPtr newvol, + virStorageVolDefPtr origvol, + unsigned int flags)
static int virStorage...
+{ + virStorageBackendRBDState ptr; + ptr.cluster = NULL; + ptr.ioctx = NULL; + int r = -1;
Add a 'int ret = -1;'
+ + VIR_DEBUG("Creating clone of RBD image %s/%s with name %s", + pool->def->source.name, origvol->name, newvol->name); + + virCheckFlags(0, -1); + + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0) + goto cleanup; + + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0) + goto cleanup; + + r = virStorageBackendRBDCloneImage(ptr.ioctx, origvol->name, newvol->name); + if (r < 0) { + virReportSystemError(-r, _("failed to clone volume '%s/%s' to %s"), + pool->def->source.name, origvol->name, + newvol->name);
This will replicate or overwrite an error from called function? Causing the user to look deeper in error log history... If you really need the pool name, then pass it since you're passing the other two.
Thus this could just become:
if (virStorage*() < 0)
+ goto cleanup; + }
ret = 0;
+ + cleanup: + virStorageBackendRBDCloseRADOSConn(&ptr); + return r;
return ret;
+} +
#else
static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn ATTRIBUTE_UNUSED, virStoragePoolObjPtr pool ATTRIBUTE_UNUSED, virStorageVolDefPtr newvol ATTRIBUTE_UNUSED, virStorageVolDefPtr origvol ATTRIBUTE_UNUSED, unsigned int flags ATTRIBUTE_UNUSED) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _(""))
Devise an appropriate error message...
return -1;
}
#endif
John
static int virStorageBackendRBDRefreshVol(virConnectPtr conn, virStoragePoolObjPtr pool ATTRIBUTE_UNUSED, virStorageVolDefPtr vol) @@ -793,6 +1132,7 @@ virStorageBackend virStorageBackendRBD = { .refreshPool = virStorageBackendRBDRefreshPool, .createVol = virStorageBackendRBDCreateVol, .buildVol = virStorageBackendRBDBuildVol, + .buildVolFrom = virStorageBackendRBDBuildVolFrom, .refreshVol = virStorageBackendRBDRefreshVol, .deleteVol = virStorageBackendRBDDeleteVol, .wipeVol = virStorageBackendRBDVolWipe,

On 01/22/2016 08:14 AM, Wido den Hollander wrote:
On 21-01-16 21:55, John Ferlan wrote:
On 12/23/2015 04:29 AM, Wido den Hollander wrote:
RBD supports cloning by creating a snapshot, protecting it and create a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using the rbd_clone2() function from librbd.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if this never changes. That should improve performance when removing the parent image at some point.
During build the decision will be made to user rbd_diff_iterate() or rbd_diff_iterate2(). The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt are already format 2.
If a RBD format 1 image is used as the original volume libvirt will return VIR_ERR_OPERATION_UNSUPPORTED
Long line... and the API should return 0 or -1 (based on other backends)... Caller doesn't expect that VIR_ERR* on return...
I assume you realize that 'virsh vol-create-from' and 'vol-clone' end up using the same API...
FWIW: I reviewed bottom up - I may also have been distracted more than once looking for specific code. Hopefully this is coherent ;-)
Thanks for all the feedback, really appreciate it!
Working on a revised version of all the patches right now. Rebasing a lot, editing commits, etc, etc.
The coding style changes are new to me though. I run make syntax-check and that works. Never knew that this was needed:
static int virStorageRBDXXXXXXX
All the exisiting code does:
static int virStorageRBDXXXXXX
Tough to make a syntax-check rule since there'd be a lot of cleanup before it could be 'enforced'. This has been more of a code review type effort as new functions are created or function arguments are changed. Similarly two blank lines between functions. It's stylistic - the more code that can follow, the better/easier to read other code. BTW: Use of 'ret' instead of the 'r' from each rbd_* call (IOW: -errno values) is definitely something I've seen crop up in a few recent bz's so it's fresher on my mind.
Anyway, I'll get back with new patches. Ignore anything which is on the list right now.
Btw, a Pull Request mechanism like Github would be so much easier than sending patches per e-mail ;)
And there's a segment of folks that aren't fans of github... John
participants (3)
-
Daniel P. Berrange
-
John Ferlan
-
Wido den Hollander