[libvirt] [PATCH] add --nocow option to vol-create and vol-clone

Btrfs has terrible performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files (since having copy on
write for this kind of data is not useful).

According to 'chattr' manpage, NOCOW could be set to new or empty file only on
btrfs, so this patch tries to add a --nocow option to vol-create functions and
vol-clone function, so that users could have a chance to set NOCOW to a new
volume if that happens to create on a btrfs like file system.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
---
 include/libvirt/libvirt.h.in     |    1 +
 src/storage/storage_backend.c    |    2 +-
 src/storage/storage_backend_fs.c |   49 ++++++++++++++++++++++++++++++++++++-
 src/storage/storage_driver.c     |    6 +++-
 tools/virsh-volume.c             |   26 ++++++++++++++++++-
 5 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 5aad75c..0761ba6 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -3151,6 +3151,7 @@ const char* virStorageVolGetKey (virStorageVolPtr vol);
 
 typedef enum {
     VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA = 1 << 0,
+    VIR_STORAGE_VOL_CREATE_NOCOW = 1 << 1,
 } virStorageVolCreateFlags;
 
 virStorageVolPtr virStorageVolCreateXML (virStoragePoolPtr pool,
diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index 57c1728..8ce7e99 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -423,7 +423,7 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED,
         operation_flags |= VIR_FILE_OPEN_FORK;
 
     if ((fd = virFileOpenAs(vol->target.path,
-                            O_RDWR | O_CREAT | O_EXCL,
+                            O_RDWR | O_CREAT,
                             vol->target.perms.mode,
                             vol->target.perms.uid,
                             vol->target.perms.gid,
diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index 11cf2df..d60b64c 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -51,6 +51,13 @@
 #include "virfile.h"
 #include "virlog.h"
 #include "virstring.h"
+#ifdef __linux__
+# include <sys/ioctl.h>
+# include <linux/fs.h>
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
+#endif
+#endif
 
 #define VIR_FROM_THIS VIR_FROM_STORAGE
 
@@ -1090,6 +1097,42 @@ _virStorageBackendFileSystemVolBuild(virConnectPtr conn,
         return -1;
     }
 
+    if (flags & VIR_STORAGE_VOL_CREATE_NOCOW) {
+#ifdef __linux__
+        /* create an empty file and set nocow flag.
+         * This could optimize performance on file system like btrfs.
+         */
+        int attr, fd;
+        int operation_flags = VIR_FILE_OPEN_FORCE_MODE | VIR_FILE_OPEN_FORCE_OWNER;
+        if (pool->def->type == VIR_STORAGE_POOL_NETFS)
+            operation_flags |= VIR_FILE_OPEN_FORK;
+
+        if ((fd = virFileOpenAs(vol->target.path,
+                                O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE,
+                                vol->target.perms.mode,
+                                vol->target.perms.uid,
+                                vol->target.perms.gid,
+                                operation_flags)) < 0) {
+            virReportSystemError(-fd,
+                                 _("Failed to create file '%s'"),
+                                 vol->target.path);
+            return -1;
+        }
+
+        /* This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
+         * be ignored since any failure of this operation should not block the
+         * left work.
+         */
+        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
+            attr |= FS_NOCOW_FL;
+            ioctl(fd, FS_IOC_SETFLAGS, &attr);
+        }
+
+        VIR_FORCE_CLOSE(fd);
+#endif
+        flags &= ~VIR_STORAGE_VOL_CREATE_NOCOW;
+    }
+
     if (create_func(conn, pool, vol, inputvol, flags) < 0)
         return -1;
     return 0;
@@ -1106,7 +1149,8 @@ virStorageBackendFileSystemVolBuild(virConnectPtr conn,
                                     virStorageVolDefPtr vol,
                                     unsigned int flags)
 {
-    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA, -1);
+    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA |
+                  VIR_STORAGE_VOL_CREATE_NOCOW, -1);
 
     return _virStorageBackendFileSystemVolBuild(conn, pool, vol, NULL, flags);
 }
@@ -1121,7 +1165,8 @@ virStorageBackendFileSystemVolBuildFrom(virConnectPtr conn,
                                         virStorageVolDefPtr inputvol,
                                         unsigned int flags)
 {
-    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA, -1);
+    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA |
+                  VIR_STORAGE_VOL_CREATE_NOCOW, -1);
 
     return _virStorageBackendFileSystemVolBuild(conn, pool, vol, inputvol, flags);
 }
diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
index 3b4715a..2aef1a7 100644
--- a/src/storage/storage_driver.c
+++ b/src/storage/storage_driver.c
@@ -1509,7 +1509,8 @@ storageVolCreateXML(virStoragePoolPtr obj,
     virStorageVolPtr ret = NULL, volobj = NULL;
     virStorageVolDefPtr buildvoldef = NULL;
 
-    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA, NULL);
+    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA |
+                  VIR_STORAGE_VOL_CREATE_NOCOW, NULL);
 
     storageDriverLock(driver);
     pool = virStoragePoolObjFindByUUID(&driver->pools, obj->uuid);
@@ -1641,7 +1642,8 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
     unsigned long long allocation;
     int buildret;
 
-    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA, NULL);
+    virCheckFlags(VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA |
+                  VIR_STORAGE_VOL_CREATE_NOCOW, NULL);
 
     storageDriverLock(driver);
     pool = virStoragePoolObjFindByUUID(&driver->pools, obj->uuid);
diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c
index 22b10d5..bfbcc1f 100644
--- a/tools/virsh-volume.c
+++ b/tools/virsh-volume.c
@@ -151,6 +151,10 @@ static const vshCmdOptDef opts_vol_create_as[] = {
      .type = VSH_OT_BOOL,
      .help = N_("preallocate metadata (for qcow2 instead of full allocation)")
     },
+    {.name = "nocow",
+     .type = VSH_OT_BOOL,
+     .help = N_("turn off copy-on-write (for file on fs like btrfs)")
+    },
     {.name = NULL}
 };
 
@@ -177,6 +181,8 @@ cmdVolCreateAs(vshControl *ctl, const vshCmd *cmd)
 
     if (vshCommandOptBool(cmd, "prealloc-metadata"))
         flags |= VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA;
+    if (vshCommandOptBool(cmd, "nocow"))
+        flags |= VIR_STORAGE_VOL_CREATE_NOCOW;
 
     if (!(pool = vshCommandOptPool(ctl, cmd, "pool", NULL)))
         return false;
@@ -330,6 +336,10 @@ static const vshCmdOptDef opts_vol_create[] = {
      .type = VSH_OT_BOOL,
      .help = N_("preallocate metadata (for qcow2 instead of full allocation)")
     },
+    {.name = "nocow",
+     .type = VSH_OT_BOOL,
+     .help = N_("turn off copy-on-write (for file on fs like btrfs)")
+    },
     {.name = NULL}
 };
 
@@ -345,6 +355,8 @@ cmdVolCreate(vshControl *ctl, const vshCmd *cmd)
 
     if (vshCommandOptBool(cmd, "prealloc-metadata"))
         flags |= VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA;
+    if (vshCommandOptBool(cmd, "nocow"))
+        flags |= VIR_STORAGE_VOL_CREATE_NOCOW;
 
     if (!(pool = vshCommandOptPool(ctl, cmd, "pool", NULL)))
         return false;
@@ -408,6 +420,10 @@ static const vshCmdOptDef opts_vol_create_from[] = {
      .type = VSH_OT_BOOL,
      .help = N_("preallocate metadata (for qcow2 instead of full allocation)")
     },
+    {.name = "nocow",
+     .type = VSH_OT_BOOL,
+     .help = N_("turn off copy-on-write (for file on fs like btrfs)")
+    },
     {.name = NULL}
 };
 
@@ -426,7 +442,8 @@ cmdVolCreateFrom(vshControl *ctl, const vshCmd *cmd)
 
     if (vshCommandOptBool(cmd, "prealloc-metadata"))
         flags |= VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA;
-
+    if (vshCommandOptBool(cmd, "nocow"))
+        flags |= VIR_STORAGE_VOL_CREATE_NOCOW;
     if (vshCommandOptStringReq(ctl, cmd, "file", &from) < 0)
         goto cleanup;
 
@@ -521,6 +538,10 @@ static const vshCmdOptDef opts_vol_clone[] = {
      .type = VSH_OT_BOOL,
      .help = N_("preallocate metadata (for qcow2 instead of full allocation)")
     },
+    {.name = "nocow",
+     .type = VSH_OT_BOOL,
+     .help = N_("turn off copy-on-write (for file on fs like btrfs)")
+    },
     {.name = NULL}
 };
 
@@ -540,7 +561,8 @@ cmdVolClone(vshControl *ctl, const vshCmd *cmd)
 
     if (vshCommandOptBool(cmd, "prealloc-metadata"))
         flags |= VIR_STORAGE_VOL_CREATE_PREALLOC_METADATA;
-
+    if (vshCommandOptBool(cmd, "nocow"))
+        flags |= VIR_STORAGE_VOL_CREATE_NOCOW;
     origpool = virStoragePoolLookupByVolume(origvol);
     if (!origpool) {
         vshError(ctl, "%s", _("failed to get parent pool"));
-- 
1.6.0.2
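
[Editorial sketch] The storage_backend_fs.c hunk above creates an empty file and then sets FS_NOCOW_FL on it with the FS_IOC_GETFLAGS / FS_IOC_SETFLAGS ioctls. For anyone who wants to try that sequence outside libvirt, the following is a minimal standalone sketch, assuming Linux. The name create_nocow_file is hypothetical and is not a libvirt function. Unlike the patch, it reads the flag back so the caller can see whether the filesystem actually accepted the request.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_NOCOW_FL
# define FS_NOCOW_FL 0x00800000 /* same fallback value the patch defines */
#endif

/* Hypothetical helper, not part of libvirt.
 * Creates `path` exclusively and, while the file is still empty, asks the
 * filesystem to mark it NOCOW. Returns the open fd, or -errno if the file
 * could not be created. *nocow_set becomes 1 only if the flag reads back
 * as set; a failure to set it is not treated as an error. */
int create_nocow_file(const char *path, mode_t mode, int *nocow_set)
{
    int attr = 0;
    int fd;

    assert(path != NULL && nocow_set != NULL);
    *nocow_set = 0;

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, mode);
    if (fd < 0)
        return -errno;

    /* Best effort, as in the patch: filesystems without NOCOW support
     * simply leave *nocow_set at 0. */
    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
        attr |= FS_NOCOW_FL;
        if (ioctl(fd, FS_IOC_SETFLAGS, &attr) == 0 &&
            ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0 &&
            (attr & FS_NOCOW_FL))
            *nocow_set = 1;
    }

    return fd;
}
```

A caller would write the image data through the returned fd afterwards. Since the attribute can only be changed on a new or empty file, the order (create, set flag, then write) is the point of the sketch.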

On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
> Btrfs has terrible performance when hosting VM images, even more when the guest
> in those VM are also using btrfs as file system. One way to mitigate this bad
> performance is to turn off COW attributes on VM files (since having copy on
> write for this kind of data is not useful).
>
> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
> btrfs, so this patch tries to add a --nocow option to vol-create functions and
> vol-clone function, so that users could have a chance to set NOCOW to a new
> volume if that happens to create on a btrfs like file system.

What effect / impact does setting this flag have from a functional POV ?

Why would we not just unconditionally enable it on btrfs so it was fast
"out of the box" ?

I'm loath to add a btrfs-specific flag to our public API.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

2013/12/5 Daniel P. Berrange <berrange@redhat.com>
> On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
>> Btrfs has terrible performance when hosting VM images, even more when the guest
>> in those VM are also using btrfs as file system. One way to mitigate this bad
>> performance is to turn off COW attributes on VM files (since having copy on
>> write for this kind of data is not useful).
>>
>> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
>> btrfs, so this patch tries to add a --nocow option to vol-create functions and
>> vol-clone function, so that users could have a chance to set NOCOW to a new
>> volume if that happens to create on a btrfs like file system.
>
> What effect / impact does setting this flag have from a functional POV ?

It implies nodatasum as well. But COW may still happen if a snapshot is taken.

The following is quoted from https://btrfs.wiki.kernel.org/index.php/FAQ:

  Can copy-on-write be turned off for data blocks?

  Yes, there are several ways how to do that.

  Disable it by mounting with nodatacow. This implies nodatasum as well.
  COW may still happen if a snapshot is taken. However COW will still be
  maintained for existing files, because the COW status can be modified
  only for empty or newly created files.

  For an empty file, add the NOCOW file attribute (use chattr utility
  with +C), or you create a new file in a directory with the NOCOW
  attribute set (then the new file will inherit this attribute). Now copy
  the original data into the pre-created file, delete original and rename
  back.

> Why would we not just unconditionally enable it on btrfs so
> it was fast "out of the box" ?

COW is a default feature of Btrfs, and the COW mechanism has many advantages.
Other users may want those advantages even while we set NOCOW on a VM image.

But in the pool-create and vol-create case, it seems the whole pool is used to
hold VM images, so maybe we could just disable COW on the pool side. Then every
volume created in it will be NOCOW. That means that in the pool-start phase, if
the fs format is found to be 'btrfs', we add the '-o nodatacow' option to the
'mount' command. That still needs some change in libvirt code. What do you
think of this approach?

Thanks,
Chunyan

> I'm loath to add a btrfs-specific flag to our public API.
>
> Daniel
> -- 
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

2013/12/6 Chunyan Liu <cyliu@suse.com>
> 2013/12/5 Daniel P. Berrange <berrange@redhat.com>
>> On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
>>> Btrfs has terrible performance when hosting VM images, even more when the guest
>>> in those VM are also using btrfs as file system. One way to mitigate this bad
>>> performance is to turn off COW attributes on VM files (since having copy on
>>> write for this kind of data is not useful).
>>>
>>> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
>>> btrfs, so this patch tries to add a --nocow option to vol-create functions and
>>> vol-clone function, so that users could have a chance to set NOCOW to a new
>>> volume if that happens to create on a btrfs like file system.
>>
>> What effect / impact does setting this flag have from a functional POV ?
>
> It implies nodatasum as well. But COW may still happen if a snapshot is taken.
>
> Following is quoted from: https://btrfs.wiki.kernel.org/index.php/FAQ
>
>   Can copy-on-write be turned off for data blocks?
>
>   Yes, there are several ways how to do that.
>
>   Disable it by mounting with nodatacow. This implies nodatasum as well.
>   COW may still happen if a snapshot is taken. However COW will still be
>   maintained for existing files, because the COW status can be modified
>   only for empty or newly created files.
>
>   For an empty file, add the NOCOW file attribute (use chattr utility
>   with +C), or you create a new file in a directory with the NOCOW
>   attribute set (then the new file will inherit this attribute). Now copy
>   the original data into the pre-created file, delete original and rename
>   back.
>
>> Why would we not just unconditionally enable it on btrfs so
>> it was fast "out of the box" ?
>
> COW is default feature of Btrfs. There are many advantages with COW
> mechanism. Other uses may want the COW advantages at the same time we set
> NOCOW to a VM image.
>
> But in pool-create and vol-create case, it seems the whole pool is used to
> hold VM images, so maybe we could just disable COW in pool side. Then all
> vol created in it will be NOCOW. That means, in pool-start phase, if
> checking fs format is 'btrfs', add '-o nodatacow' option to 'mount'
> command. That still need some change in libvirt code. How do you think
> about this way?

Daniel, about the nocow issue, could we do "mount -o nodatacow" in pool-start
if the fs format is found to be btrfs?

Chunyan

> Thanks,
> Chunyan
>
>> I'm loath to add a btrfs-specific flag to our public API.
>>
>> Daniel
>> -- 
>> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
>> |: http://libvirt.org -o- http://virt-manager.org :|
>> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
>> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Chunyan Liu wrote:
> 2013/12/5 Daniel P. Berrange <berrange@redhat.com>
>> On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
>>> Btrfs has terrible performance when hosting VM images, even more when the guest
>>> in those VM are also using btrfs as file system. One way to mitigate this bad
>>> performance is to turn off COW attributes on VM files (since having copy on
>>> write for this kind of data is not useful).
>>>
>>> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
>>> btrfs, so this patch tries to add a --nocow option to vol-create functions and
>>> vol-clone function, so that users could have a chance to set NOCOW to a new
>>> volume if that happens to create on a btrfs like file system.
>>
>> What effect / impact does setting this flag have from a functional POV ?
>
> It implies nodatasum as well. But COW may still happen if a snapshot is taken.
>
> Following is quoted from: https://btrfs.wiki.kernel.org/index.php/FAQ
>
>   Can copy-on-write be turned off for data blocks?
>
>   Yes, there are several ways how to do that.
>
>   Disable it by mounting with nodatacow. This implies nodatasum as well.
>   COW may still happen if a snapshot is taken. However COW will still be
>   maintained for existing files, because the COW status can be modified
>   only for empty or newly created files.
>
>   For an empty file, add the NOCOW file attribute (use chattr utility
>   with +C), or you create a new file in a directory with the NOCOW
>   attribute set (then the new file will inherit this attribute). Now copy
>   the original data into the pre-created file, delete original and rename
>   back.
>
>> Why would we not just unconditionally enable it on btrfs so it was fast
>> "out of the box" ?
>
> COW is default feature of Btrfs. There are many advantages with COW
> mechanism. Other uses may want the COW advantages at the same time we set
> NOCOW to a VM image.
>
> But in pool-create and vol-create case, it seems the whole pool is used to
> hold VM images, so maybe we could just disable COW in pool side. Then all
> vol created in it will be NOCOW. That means, in pool-start phase, if
> checking fs format is 'btrfs', add '-o nodatacow' option to 'mount'
> command. That still need some change in libvirt code. How do you think
> about this way?

Daniel,

Any thoughts on Chunyan's suggestion? It seems mounting a btrfs pool with
nodatacow provides out-of-the-box performance improvement, but doesn't later
preclude creating/using COW volumes from the pool.

Regards,
Jim

On Thu, Dec 05, 2013 at 11:16:42AM +0000, Daniel P. Berrange wrote:
> On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
>> Btrfs has terrible performance when hosting VM images, even more when the guest
>> in those VM are also using btrfs as file system. One way to mitigate this bad
>> performance is to turn off COW attributes on VM files (since having copy on
>> write for this kind of data is not useful).
>>
>> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
>> btrfs, so this patch tries to add a --nocow option to vol-create functions and
>> vol-clone function, so that users could have a chance to set NOCOW to a new
>> volume if that happens to create on a btrfs like file system.
>
> What effect / impact does setting this flag have from a functional POV ?
>
> Why would we not just unconditionally enable it on btrfs so it was fast
> "out of the box" ?
>
> I'm loath to add a btrfs-specific flag to our public API.

Quoting from the qemu-devel thread on the same subject:

  When the NOCOW attribute is set on a file, reflink copying (aka
  file-level snapshots) do not work:

    $ cp --reflink test.img test-snapshot.img

  This produces EINVAL.

  It is a regression if qemu-img create suddenly starts breaking this
  standard btrfs feature for existing users.

So as with QEMU, I don't think libvirt can do something which could break
existing users of btrfs in this way. So this would have to be an opt-in
of some kind.

We already have a way to express "features" for storage volumes in the
XML description. We could use this to express a 'nocow' feature. This
is preferable to an API flag, since this would let a user query XML for
an existing volume to discover if it had 'nocow' or not.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
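
[Editorial sketch] The regression described above can be probed programmatically. `cp --reflink` asks the kernel to clone the file with the FICLONE ioctl, so a small wrapper around that ioctl shows the error code directly. The sketch below assumes Linux; try_reflink is a hypothetical helper name. The thread reports EINVAL for a NOCOW source on btrfs, while filesystems with no reflink support at all normally fail with a different errno such as EOPNOTSUPP, so the code returns whatever the kernel reports instead of assuming one value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FICLONE
# define FICLONE _IOW(0x94, 9, int) /* fallback for older kernel headers */
#endif

/* Hypothetical helper, not part of libvirt or QEMU.
 * Tries to make `dst` a reflink copy of `src`. Returns 0 on success or
 * -errno on failure. If the clone itself fails, the empty `dst` file
 * created for the attempt is removed again. */
int try_reflink(const char *src, const char *dst)
{
    int src_fd, dst_fd;
    int ret = 0;

    assert(src != NULL && dst != NULL);

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -errno;

    dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (dst_fd < 0) {
        ret = -errno;
        close(src_fd);
        return ret;
    }

    if (ioctl(dst_fd, FICLONE, src_fd) < 0) {
        ret = -errno;
        unlink(dst); /* do not leave an empty file behind */
    }

    close(dst_fd);
    close(src_fd);
    return ret;
}
```

Running it against a volume created with and without NOCOW on the same btrfs pool would show whether a given kernel still refuses the clone, which is the user-visible cost that makes the feature opt-in.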

2013/12/18 Daniel P. Berrange <berrange@redhat.com>
> On Thu, Dec 05, 2013 at 11:16:42AM +0000, Daniel P. Berrange wrote:
>> On Thu, Dec 05, 2013 at 06:35:12PM +0800, Chunyan Liu wrote:
>>> Btrfs has terrible performance when hosting VM images, even more when the guest
>>> in those VM are also using btrfs as file system. One way to mitigate this bad
>>> performance is to turn off COW attributes on VM files (since having copy on
>>> write for this kind of data is not useful).
>>>
>>> According to 'chattr' manpage, NOCOW could be set to new or empty file only on
>>> btrfs, so this patch tries to add a --nocow option to vol-create functions and
>>> vol-clone function, so that users could have a chance to set NOCOW to a new
>>> volume if that happens to create on a btrfs like file system.
>>
>> What effect / impact does setting this flag have from a functional POV ?
>>
>> Why would we not just unconditionally enable it on btrfs so it was fast
>> "out of the box" ?
>>
>> I'm loath to add a btrfs-specific flag to our public API.
>
> Quoting from the qemu-devel thread on the same subject:
>
>   When the NOCOW attribute is set on a file, reflink copying (aka
>   file-level snapshots) do not work:
>
>     $ cp --reflink test.img test-snapshot.img
>
>   This produces EINVAL.
>
>   It is a regression if qemu-img create suddenly starts breaking this
>   standard btrfs feature for existing users.
>
> So as with QEMU, I don't think libvirt can do something which could break
> existing users of btrfs in this way. So this would have to be an opt-in
> of some kind.
>
> We already have a way to express "features" for storage volumes in the
> XML description. We could use this to express a 'nocow' feature. This
> is preferable to an API flag, since this would let a user query XML for
> an existing volume to discover if it had 'nocow' or not.

Thanks, Daniel! I'll rework the patch that way.

Chunyan