[libvirt] [PATCH] ZFS: Support sparse volumes

By default, `zfs create -V ...` reserves space for the entire volsize, plus some extra (which attempts to account for overhead). If `zfs create -s -V ...` is used instead, zvols are (fully) sparse. A middle ground (partial allocation) can be achieved with `zfs create -s -o refreservation=... -V ...`. Both libvirt and ZFS support this approach, so the ZFS storage backend should support it. Signed-off-by: Richard Laager <rlaager@wiktel.com> --- src/storage/storage_backend_zfs.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/src/storage/storage_backend_zfs.c b/src/storage/storage_backend_zfs.c index 2e6e407..6dc3cec 100644 --- a/src/storage/storage_backend_zfs.c +++ b/src/storage/storage_backend_zfs.c @@ -112,7 +112,7 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, if (!(tokens = virStringSplitCount(volume_string, "\t", 0, &count))) return -1; - if (count != 2) + if (count != 3) goto cleanup; if (!(name_tokens = virStringSplit(tokens[0], "/", 2))) @@ -151,6 +151,20 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, goto cleanup; } + if (virStrToLong_ull(tokens[2], NULL, 10, &volume->target.allocation) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("malformed refreservation reported")); + goto cleanup; + } + if (volume->target.allocation >= volume->target.capacity) { + /* A zvol created without -s will have a refreservation slightly larger + * than volblocksize. + */ + volume->target.allocation = volume->target.capacity; + } else { + volume->target.sparse = true; + } + if (is_new_vol && VIR_APPEND_ELEMENT(pool->volumes.objs, pool->volumes.count, @@ -190,7 +204,7 @@ virStorageBackendZFSFindVols(virStoragePoolObjPtr pool, cmd = virCommandNewArgList(ZFS, "list", "-Hp", "-t", "volume", "-r", - "-o", "name,volsize", + "-o", "name,volsize,refreservation", pool->def->source.name, NULL); virCommandSetOutputBuffer(cmd, &volumes_list); @@ -320,15 +334,28 @@ virStorageBackendZFSCreateVol(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; /** * $ zfs create -o volmode=dev -V 10240K test/volname + * $ zfs create -o volmode=dev -s -V 10240K test/volname + * $ zfs create -o volmode=dev -s -o refreservation=1024K -V 10240K test/volname * * -o volmode=dev -- we want to get volumes exposed as cdev * devices. If we don't specify that zfs * will lookup vfs.zfs.vol.mode sysctl value + * -s -- create a sparse volume + * -o refreservation -- reserve the specified amount of space * -V -- tells to create a volume with the specified size */ cmd = virCommandNewArgList(ZFS, "create", NULL); if (volmode_needed) virCommandAddArgList(cmd, "-o", "volmode=dev", NULL); + if (vol->target.capacity != vol->target.allocation) { + virCommandAddArg(cmd, "-s"); + if (vol->target.allocation > 0) { + virCommandAddArg(cmd, "-o"); + virCommandAddArgFormat(cmd, "refreservation=%lluK", + VIR_DIV_UP(vol->target.allocation, 1024)); + } + vol->target.sparse = true; + } virCommandAddArg(cmd, "-V"); virCommandAddArgFormat(cmd, "%lluK", VIR_DIV_UP(vol->target.capacity, 1024)); -- 2.1.4

Richard Laager wrote:
By default, `zfs create -V ...` reserves space for the entire volsize, plus some extra (which attempts to account for overhead).
If `zfs create -s -V ...` is used instead, zvols are (fully) sparse.
A middle ground (partial allocation) can be achieved with `zfs create -s -o refreservation=... -V ...`. Both libvirt and ZFS support this approach, so the ZFS storage backend should support it.
Hi! Thanks for the patch! Some questions inline.
Signed-off-by: Richard Laager <rlaager@wiktel.com> --- src/storage/storage_backend_zfs.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/src/storage/storage_backend_zfs.c b/src/storage/storage_backend_zfs.c index 2e6e407..6dc3cec 100644 --- a/src/storage/storage_backend_zfs.c +++ b/src/storage/storage_backend_zfs.c @@ -112,7 +112,7 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, if (!(tokens = virStringSplitCount(volume_string, "\t", 0, &count))) return -1;
- if (count != 2) + if (count != 3) goto cleanup;
if (!(name_tokens = virStringSplit(tokens[0], "/", 2))) @@ -151,6 +151,20 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, goto cleanup; }
+ if (virStrToLong_ull(tokens[2], NULL, 10, &volume->target.allocation) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("malformed refreservation reported")); + goto cleanup; + } + if (volume->target.allocation >= volume->target.capacity) { + /* A zvol created without -s will have a refreservation slightly larger + * than volblocksize. + */ + volume->target.allocation = volume->target.capacity;
What if allocation specified will be much larger than capacity? Neither zfs nor virStorageBackendZFSCreateVol() prevent from creating a volume with: --capacity 2G --allocation 4G Though it'll be displayed as capacity == allocation == 2G by libvirt. What's the reason to limit displayed allocation? PS I noticed that it could be an issue to set refreservation larger than volsize with ZFS on Linux: https://github.com/zfsonlinux/zfs/issues/2468. The issue is still open and at this moment I cannot check if I can reproduce that on Linux. Thanks,
+ } else { + volume->target.sparse = true; + } + if (is_new_vol && VIR_APPEND_ELEMENT(pool->volumes.objs, pool->volumes.count, @@ -190,7 +204,7 @@ virStorageBackendZFSFindVols(virStoragePoolObjPtr pool, cmd = virCommandNewArgList(ZFS, "list", "-Hp", "-t", "volume", "-r", - "-o", "name,volsize", + "-o", "name,volsize,refreservation", pool->def->source.name, NULL); virCommandSetOutputBuffer(cmd, &volumes_list); @@ -320,15 +334,28 @@ virStorageBackendZFSCreateVol(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; /** * $ zfs create -o volmode=dev -V 10240K test/volname + * $ zfs create -o volmode=dev -s -V 10240K test/volname + * $ zfs create -o volmode=dev -s -o refreservation=1024K -V 10240K test/volname * * -o volmode=dev -- we want to get volumes exposed as cdev * devices. If we don't specify that zfs * will lookup vfs.zfs.vol.mode sysctl value + * -s -- create a sparse volume + * -o refreservation -- reserve the specified amount of space * -V -- tells to create a volume with the specified size */ cmd = virCommandNewArgList(ZFS, "create", NULL); if (volmode_needed) virCommandAddArgList(cmd, "-o", "volmode=dev", NULL); + if (vol->target.capacity != vol->target.allocation) { + virCommandAddArg(cmd, "-s"); + if (vol->target.allocation > 0) { + virCommandAddArg(cmd, "-o"); + virCommandAddArgFormat(cmd, "refreservation=%lluK", + VIR_DIV_UP(vol->target.allocation, 1024)); + } + vol->target.sparse = true; + } virCommandAddArg(cmd, "-V"); virCommandAddArgFormat(cmd, "%lluK", VIR_DIV_UP(vol->target.capacity, 1024)); -- 2.1.4
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Roman Bogorodskiy

On 03/05/2016 08:35 AM, Roman Bogorodskiy wrote:
What if allocation specified will be much larger than capacity?
Neither zfs nor virStorageBackendZFSCreateVol() prevent from creating a volume with:
--capacity 2G --allocation 4G
Is this necessary to support? I don't see a use case where this is helpful. If not, the easy solution seems to be to prohibit it. If you really want to support this, then I suppose we can just remove the clamping, which leads to...
What's the reason to limit displayed allocation?
I didn't want --capacity 1G --allocation 1G to show as 1.03G allocation.
PS I noticed that it could be an issue to set refreservation larger than volsize with ZFS on Linux: https://github.com/zfsonlinux/zfs/issues/2468.
The issue is still open and at this moment I cannot check if I can reproduce that on Linux.
I can reproduce this. Setting the refreservation as part of the initial `zfs create` works, but it is not possible to do with `zfs set` later. -- Richard

Richard Laager wrote:
On 03/05/2016 08:35 AM, Roman Bogorodskiy wrote:
What if allocation specified will be much larger than capacity?
Neither zfs nor virStorageBackendZFSCreateVol() prevent from creating a volume with:
--capacity 2G --allocation 4G
Is this necessary to support? I don't see a use case where this is helpful. If not, the easy solution seems to be to prohibit it.
I'm not sure about use cases either. But considering that ZFS on FreeBSD allows to do that and ZoL considers not being able to do that as a bug (as per the link to the github issue I posted earlier), I guess it could be useful in some situations. Actually, the primary question is: do we want to add extra logic on top of what ZFS does (i.e. disallow refres > volize, 'round' displayed allocation) or just make things simple and do and show whatever ZFS allows us to do and to show respectively.
If you really want to support this, then I suppose we can just remove the clamping, which leads to...
What's the reason to limit displayed allocation?
I didn't want --capacity 1G --allocation 1G to show as 1.03G allocation.
PS I noticed that it could be an issue to set refreservation larger than volsize with ZFS on Linux: https://github.com/zfsonlinux/zfs/issues/2468.
The issue is still open and at this moment I cannot check if I can reproduce that on Linux.
I can reproduce this. Setting the refreservation as part of the initial `zfs create` works, but it is not possible to do with `zfs set` later.
Roman Bogorodskiy

By default, `zfs create -V ...` reserves space for the entire volsize, plus some extra (which attempts to account for overhead). If `zfs create -s -V ...` is used instead, zvols are (fully) sparse. A middle ground (partial allocation) can be achieved with `zfs create -s -o refreservation=... -V ...`. Both libvirt and ZFS support this approach, so the ZFS storage backend should support it. Signed-off-by: Richard Laager <rlaager@wiktel.com> --- Notes: This version does not clamp allocation to capacity. src/storage/storage_backend_zfs.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/src/storage/storage_backend_zfs.c b/src/storage/storage_backend_zfs.c index 5238ecc..4d04a80 100644 --- a/src/storage/storage_backend_zfs.c +++ b/src/storage/storage_backend_zfs.c @@ -112,7 +112,7 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, if (!(tokens = virStringSplitCount(volume_string, "\t", 0, &count))) return -1; - if (count != 2) + if (count != 3) goto cleanup; if (!(name_tokens = virStringSplit(tokens[0], "/", 2))) @@ -151,6 +151,15 @@ virStorageBackendZFSParseVol(virStoragePoolObjPtr pool, goto cleanup; } + if (virStrToLong_ull(tokens[2], NULL, 10, &volume->target.allocation) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("malformed refreservation reported")); + goto cleanup; + } + + if (volume->target.allocation < volume->target.capacity) + volume->target.sparse = true; + if (is_new_vol && VIR_APPEND_ELEMENT(pool->volumes.objs, pool->volumes.count, @@ -190,7 +199,7 @@ virStorageBackendZFSFindVols(virStoragePoolObjPtr pool, cmd = virCommandNewArgList(ZFS, "list", "-Hp", "-t", "volume", "-r", - "-o", "name,volsize", + "-o", "name,volsize,refreservation", pool->def->source.name, NULL); virCommandSetOutputBuffer(cmd, &volumes_list); @@ -329,15 +338,28 @@ virStorageBackendZFSCreateVol(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; /** * $ zfs create -o volmode=dev -V 10240K test/volname + * $ zfs create -o volmode=dev -s -V 10240K test/volname + * $ zfs create -o volmode=dev -s -o refreservation=1024K -V 10240K test/volname * * -o volmode=dev -- we want to get volumes exposed as cdev * devices. If we don't specify that zfs * will lookup vfs.zfs.vol.mode sysctl value + * -s -- create a sparse volume + * -o refreservation -- reserve the specified amount of space * -V -- tells to create a volume with the specified size */ cmd = virCommandNewArgList(ZFS, "create", NULL); if (volmode_needed) virCommandAddArgList(cmd, "-o", "volmode=dev", NULL); + if (vol->target.capacity != vol->target.allocation) { + virCommandAddArg(cmd, "-s"); + if (vol->target.allocation > 0) { + virCommandAddArg(cmd, "-o"); + virCommandAddArgFormat(cmd, "refreservation=%lluK", + VIR_DIV_UP(vol->target.allocation, 1024)); + } + vol->target.sparse = true; + } virCommandAddArg(cmd, "-V"); virCommandAddArgFormat(cmd, "%lluK", VIR_DIV_UP(vol->target.capacity, 1024)); -- 2.1.4

Richard Laager wrote:
By default, `zfs create -V ...` reserves space for the entire volsize, plus some extra (which attempts to account for overhead).
If `zfs create -s -V ...` is used instead, zvols are (fully) sparse.
A middle ground (partial allocation) can be achieved with `zfs create -s -o refreservation=... -V ...`. Both libvirt and ZFS support this approach, so the ZFS storage backend should support it.
Signed-off-by: Richard Laager <rlaager@wiktel.com> ---
Notes: This version does not clamp allocation to capacity.
src/storage/storage_backend_zfs.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-)
ACK and pushed, thanks! Roman Bogorodskiy

On 03/16/2016 03:26 AM, Roman Bogorodskiy wrote:
Actually, the primary question is: do we want to add extra logic on top of what ZFS does (i.e. disallow refres > volize, 'round' displayed allocation) or just make things simple and do and show whatever ZFS allows us to do and to show respectively.
I'm not necessarily sure I know the answer. So that probably argues for the simpler approach of libvirt just being a pass-through. Also, it's possible to add the extra code later, if that turns out to be better. I've already submitted patches taking each approach. What do you think is the next step? -- Richard

Richard Laager wrote:
On 03/16/2016 03:26 AM, Roman Bogorodskiy wrote:
Actually, the primary question is: do we want to add extra logic on top of what ZFS does (i.e. disallow refres > volize, 'round' displayed allocation) or just make things simple and do and show whatever ZFS allows us to do and to show respectively.
I'm not necessarily sure I know the answer. So that probably argues for the simpler approach of libvirt just being a pass-through. Also, it's possible to add the extra code later, if that turns out to be better.
I've already submitted patches taking each approach. What do you think is the next step?
Hi Richard, I'm really sorry, I saw the updated patch and overall it looks good, but I haven't had time to look closer. I hope to do it this weekend and give it some testing on FreeBSD and push it if everything is fine. Roman Bogorodskiy
participants (2)
-
Richard Laager
-
Roman Bogorodskiy