[libvirt] [PATCH RFC] storage: perform btrfs clone if possible

We already had nocow flags in virStorageSource. But when creating RAW file, we don't take advantage of clone of btrfs. This file introduce btrfs_clone_file function, and try to use it when !nocow. Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> --- src/storage/storage_backend.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index 98720f6..f5ea34c 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -156,6 +156,27 @@ enum { #define READ_BLOCK_SIZE_DEFAULT (1024 * 1024) #define WRITE_BLOCK_SIZE_DEFAULT (4 * 1024) +/* + * Perform the O(1) btrfs clone operation, if possible. + * Upon success, return 0. Otherwise, return -1 and set errno. + */ +static inline int +btrfs_clone_file(int dest_fd, int src_fd) +{ +#ifdef __linux__ +# undef BTRFS_IOCTL_MAGICi +# define BTRFS_IOCTL_MAGIC 0x94 +# undef BTRFS_IOC_CLONE +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int) + return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd); +#else + (void) dest_fd; + (void) src_fd; + errno = ENOTSUP; + return -1; +#endif +} + static int ATTRIBUTE_NONNULL(2) virStorageBackendCopyToFD(virStorageVolDefPtr vol, virStorageVolDefPtr inputvol, @@ -200,6 +221,16 @@ virStorageBackendCopyToFD(virStorageVolDefPtr vol, goto cleanup; } + if (!vol->target.nocow) { + if (btrfs_clone_file(fd, inputfd) == -1) { + if (errno == ENOTSUP) + VIR_DEBUG("btrfs clone not supported, try another way."); + } else { + VIR_DEBUG("btrfs clone findished."); + goto cleanup; + } + } + while (amtread != 0) { int amtleft; -- 1.9.3

On Mon, Nov 24, 2014 at 02:11:47PM +0800, Chen Hanxiao wrote:
We already had nocow flags in virStorageSource. But when creating RAW file, we don't take advantage of clone of btrfs. This file introduce btrfs_clone_file function, and try to use it when !nocow.
I'm not sure we want to do this, but I have nothing against that either. So I'll just review the code without any other comments.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> --- src/storage/storage_backend.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)
diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index 98720f6..f5ea34c 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -156,6 +156,27 @@ enum { #define READ_BLOCK_SIZE_DEFAULT (1024 * 1024) #define WRITE_BLOCK_SIZE_DEFAULT (4 * 1024)
+/* + * Perform the O(1) btrfs clone operation, if possible. + * Upon success, return 0. Otherwise, return -1 and set errno. + */ +static inline int +btrfs_clone_file(int dest_fd, int src_fd)
All the functions in this file use camelCase, this one might too.
+{ +#ifdef __linux__ +# undef BTRFS_IOCTL_MAGICi
s/i$// ?
+# define BTRFS_IOCTL_MAGIC 0x94 +# undef BTRFS_IOC_CLONE +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
Are you redefining those just in case they are not defined, but the support exists? I'm always afraid of creating incompatibilities and would prefer #if for that and if anything is not defined, just don't use it.
+ return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd); +#else + (void) dest_fd; + (void) src_fd;
we use ignore_value(), but you don't need that if you do what's preferred ...
+ errno = ENOTSUP; + return -1; +#endif +} +
... we prefer to split the whole definitions for functions to a working variant and a stub, in that case you can mark unused parameters in the stub function. From a subjective point of view, it's more readable, also (you see right before the definition what you need for the function to work): #if defined(__linux__) && defined(BTRFS_IOC_CLONE) static inline int btrfs_clone_file(int dest_fd, int src_fd) { return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd); } #else static inline int btrfs_clone_file(int dest_fd, int src_fd) { errno = ENOTSUP; return -1; } #endif
@@ -200,6 +221,16 @@ virStorageBackendCopyToFD(virStorageVolDefPtr vol, goto cleanup; }
+ if (!vol->target.nocow) { + if (btrfs_clone_file(fd, inputfd) == -1) { + if (errno == ENOTSUP) + VIR_DEBUG("btrfs clone not supported, try another way."); + } else { + VIR_DEBUG("btrfs clone findished.");
s/findished/finished/ As I said, I'm not commenting on whether we want this in or not, so for that you should wait for someone's response. I bet there's a (good) reason behind libvirt not using some lvm/zfs/btrfs features, but I am too lazy to search for it since it'd be inaccurate anyway. Martin

-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Monday, November 24, 2014 3:09 PM To: Chen, Hanxiao/陈 晗霄 Cc: libvir-list@redhat.com Subject: Re: [libvirt] [PATCH RFC] storage: perform btrfs clone if possible
On Mon, Nov 24, 2014 at 02:11:47PM +0800, Chen Hanxiao wrote:
We already had nocow flags in virStorageSource. But when creating RAW file, we don't take advantage of clone of btrfs. This file introduce btrfs_clone_file function, and try to use it when !nocow.
I'm not sure we want to do this, but I have nothing against that either. So I'll just review the code without any other comments.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> --- src/storage/storage_backend.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)
diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index 98720f6..f5ea34c 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -156,6 +156,27 @@ enum { #define READ_BLOCK_SIZE_DEFAULT (1024 * 1024) #define WRITE_BLOCK_SIZE_DEFAULT (4 * 1024)
+/* + * Perform the O(1) btrfs clone operation, if possible. + * Upon success, return 0. Otherwise, return -1 and set errno. + */ +static inline int +btrfs_clone_file(int dest_fd, int src_fd)
All the functions in this file use camelCase, this one might too.
+{ +#ifdef __linux__ +# undef BTRFS_IOCTL_MAGICi
s/i$// ?
+# define BTRFS_IOCTL_MAGIC 0x94 +# undef BTRFS_IOC_CLONE +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
Are you redefining those just in case they are not defined, but the support exists? I'm always afraid of creating incompatibilities and would prefer #if for that and if anything is not defined, just don't use it.
+ return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd); +#else + (void) dest_fd; + (void) src_fd;
we use ignore_value(), but you don't need that if you do what's preferred ...
+ errno = ENOTSUP; + return -1; +#endif +} +
... we prefer to split the whole definitions for functions to a working variant and a stub, in that case you can mark unused parameters in the stub function. From a subjective point of view, it's more readable, also (you see right before the definition what you need for the function to work):
#if defined(__linux__) && defined(BTRFS_IOC_CLONE)
static inline int btrfs_clone_file(int dest_fd, int src_fd) { return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd); }
#else
static inline int btrfs_clone_file(int dest_fd, int src_fd) { errno = ENOTSUP; return -1; }
#endif
Thanks for your explanation in such a detail way. That's very helpful. :)
@@ -200,6 +221,16 @@ virStorageBackendCopyToFD(virStorageVolDefPtr vol, goto cleanup; }
+ if (!vol->target.nocow) { + if (btrfs_clone_file(fd, inputfd) == -1) { + if (errno == ENOTSUP) + VIR_DEBUG("btrfs clone not supported, try another way."); + } else { + VIR_DEBUG("btrfs clone findished.");
s/findished/finished/
As I said, I'm not commenting on whether we want this in or not, so for that you should wait for someone's response. I bet there's a (good) reason behind libvirt not using some lvm/zfs/btrfs features, but I am too lazy to search for it since it'd be inaccurate anyway.
Let's wait for other's comments on whether we need this. But features like btrfs clone really cool if we could get full use of it. It's slow to copy huge VM images by tools such as virt-clone. Thanks, - Chen

On 11/24/2014 12:09 AM, Martin Kletzander wrote:
On Mon, Nov 24, 2014 at 02:11:47PM +0800, Chen Hanxiao wrote:
We already had nocow flags in virStorageSource. But when creating RAW file, we don't take advantage of clone of btrfs. This file introduce btrfs_clone_file function, and try to use it when !nocow.
I'm not sure we want to do this, but I have nothing against that either. So I'll just review the code without any other comments.
As I said, I'm not commenting on whether we want this in or not, so for that you should wait for someone's response. I bet there's a (good) reason behind libvirt not using some lvm/zfs/btrfs features, but I am too lazy to search for it since it'd be inaccurate anyway.
I think it makes sense to expose this functionality; although I suspect it is better if we do so by having the user pass an explicit new flag value to existing API instead of doing it automatically. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

-----Original Message----- From: Eric Blake [mailto:eblake@redhat.com] Sent: Tuesday, November 25, 2014 12:25 AM To: Martin Kletzander; Chen, Hanxiao/陈 晗霄 Cc: libvir-list@redhat.com Subject: Re: [libvirt] [PATCH RFC] storage: perform btrfs clone if possible
On 11/24/2014 12:09 AM, Martin Kletzander wrote:
On Mon, Nov 24, 2014 at 02:11:47PM +0800, Chen Hanxiao wrote:
We already had nocow flags in virStorageSource. But when creating RAW file, we don't take advantage of clone of btrfs. This file introduce btrfs_clone_file function, and try to use it when !nocow.
I'm not sure we want to do this, but I have nothing against that either. So I'll just review the code without any other comments.
As I said, I'm not commenting on whether we want this in or not, so for that you should wait for someone's response. I bet there's a (good) reason behind libvirt not using some lvm/zfs/btrfs features, but I am too lazy to search for it since it'd be inaccurate anyway.
I think it makes sense to expose this functionality; although I suspect it is better if we do so by having the user pass an explicit new flag value to existing API instead of doing it automatically.
Thanks for your clarification. But we've already had nocow in virStorageSource and <nocow> tags. So I think if we do not specify <nocow> tags in XML, we should try it according to 'nocow' in codes. Or do we need a new flags such as --reflink for tools like virt-clone? Thanks, - Chen

On 11/25/2014 03:19 AM, Chen, Hanxiao wrote:
I think it makes sense to expose this functionality; although I suspect it is better if we do so by having the user pass an explicit new flag value to existing API instead of doing it automatically.
Thanks for your clarification.
But we've already had nocow in virStorageSource and <nocow> tags. So I think if we do not specify <nocow> tags in XML, we should try it according to 'nocow' in codes.
Or do we need a new flags such as --reflink for tools like virt-clone?
It's a design trade-off - do we make the default operation efficient when possible (but with possible surprises due to overcommit of storage), or safe (but slow)? Regardless of the default, is it possible to explicitly request the opposite action? To me, it seems backwards compatible to make an operation faster by using underlying clone hooks as long as the end result is the same, and as long as a user NOT specifying a specific use of clone or avoidance of clone does not get an error message when clone is attempted but not supported. So at this point, I could go either way for the default as long as I can still have full control over the behavior, and we don't break existing users. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

-----Original Message----- From: Eric Blake [mailto:eblake@redhat.com] Sent: Tuesday, November 25, 2014 11:59 PM To: Chen, Hanxiao/陈 晗霄; Martin Kletzander Cc: libvir-list@redhat.com Subject: Re: [libvirt] [PATCH RFC] storage: perform btrfs clone if possible
On 11/25/2014 03:19 AM, Chen, Hanxiao wrote:
I think it makes sense to expose this functionality; although I suspect it is better if we do so by having the user pass an explicit new flag value to existing API instead of doing it automatically.
Thanks for your clarification.
But we've already had nocow in virStorageSource and <nocow> tags. So I think if we do not specify <nocow> tags in XML, we should try it according to 'nocow' in codes.
Or do we need a new flags such as --reflink for tools like virt-clone?
It's a design trade-off - do we make the default operation efficient when possible (but with possible surprises due to overcommit of storage), or safe (but slow)? Regardless of the default, is it possible to explicitly request the opposite action? To me, it seems backwards compatible to make an operation faster by using underlying clone hooks as long as the end result is the same, and as long as a user NOT specifying a specific use of clone or avoidance of clone does not get an error message when clone is attempted but not supported. So at this point, I could go either way for the default as long as I can still have full control over the behavior, and we don't break existing users.
FYI, in my limited test, the suggestion of this patch did not break existing users. If not supported, it just print some debug logs and try the original way then: + if (!vol->target.nocow) { + if (btrfs_clone_file(fd, inputfd) == -1) { + if (errno == ENOTSUP) + VIR_DEBUG("btrfs clone not supported, try another way."); + } else { + VIR_DEBUG("btrfs clone findished."); + goto cleanup; + } + } + If I didn't miss something, I think this kind of operation is acceptable. Thanks, - Chen
participants (4)
-
Chen Hanxiao
-
Chen, Hanxiao
-
Eric Blake
-
Martin Kletzander