[libvirt] [PATCH v2 0/2] Storage fixes for libvirt.

From: Prerna Saxena <prerna@linux.vnet.ibm.com>
Date: Mon, 22 Jun 2015 06:12:24 -0500

This is the second version of storage fixes.
V1 : http://www.redhat.com/archives/libvir-list/2015-May/msg00050.html

Summary:
------------
Prerna Saxena (2):
  Storage: Introduce shadow vol for refresh while the main vol builds.
  Storage : Fix cloning of raw, sparse volumes

 src/storage/storage_backend.c | 10 +++++-----
 src/storage/storage_driver.c  | 20 +++++++++++++-------
 2 files changed, 18 insertions(+), 12 deletions(-)

Changelog:
--------------
1. Dropped patch 1/2 of v1 as per jtomko's suggestion; and introduced a
   new patch for shadow-volume based cloning.
2. Reworked patch 2/2 incorporating review comments.

--
Regards,
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

From: Prerna Saxena <prerna@linux.vnet.ibm.com>
Date: Thu, 18 Jun 2015 05:05:09 -0500

Libvirt periodically refreshes all volumes in a storage pool, including
the volumes being cloned.
While cloning a storage volume from parent, we drop pool locks. Subsequent
volume refresh sometimes changes allocation for an ongoing copy, and leads
to corrupt images.
Fix: Introduce a shadow volume that isolates the volume object under refresh
from the base which has a copy ongoing.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
---
 src/storage/storage_driver.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
index 57060ab..4980546 100644
--- a/src/storage/storage_driver.c
+++ b/src/storage/storage_driver.c
@@ -1898,7 +1898,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
 {
     virStoragePoolObjPtr pool, origpool = NULL;
     virStorageBackendPtr backend;
-    virStorageVolDefPtr origvol = NULL, newvol = NULL;
+    virStorageVolDefPtr origvol = NULL, newvol = NULL, shadowvol = NULL;
     virStorageVolPtr ret = NULL, volobj = NULL;
     unsigned long long allocation;
     int buildret;
@@ -2010,6 +2010,17 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
     if (backend->createVol(obj->conn, pool, newvol) < 0)
         goto cleanup;
 
+    /* Make a shallow copy of the 'defined' volume definition, since the
+     * original allocation value will change as the user polls 'info',
+     * but we only need the initial requested values
+     */
+    if (VIR_ALLOC(shadowvol) < 0) {
+        newvol = NULL;
+        goto cleanup;
+    }
+
+    memcpy(shadowvol, newvol, sizeof(*newvol));
+
     pool->volumes.objs[pool->volumes.count++] = newvol;
     volobj = virGetStorageVol(obj->conn, pool->def->name, newvol->name,
                               newvol->key, NULL, NULL);
@@ -2029,7 +2040,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
         virStoragePoolObjUnlock(origpool);
     }
 
-    buildret = backend->buildVolFrom(obj->conn, pool, newvol, origvol, flags);
+    buildret = backend->buildVolFrom(obj->conn, pool, shadowvol, origvol, flags);
 
     storageDriverLock();
     virStoragePoolObjLock(pool);
--
1.8.3.1

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
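The isolation idea in this patch can be illustrated outside libvirt. What follows is a minimal sketch under stated assumptions, not the libvirt code: `struct vol_def`, `vol_def_shadow()` and `vol_def_refresh()` are hypothetical names, and the pool refresh is simulated by a plain function call rather than a concurrent thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a volume definition (not virStorageVolDef). */
struct vol_def {
    char *name;                    /* owned by the published definition */
    unsigned long long capacity;   /* virtual size */
    unsigned long long allocation; /* space requested or currently used */
};

/* Shallow snapshot: scalar fields are copied by value, pointer fields
 * still alias the original. Returns NULL if allocation fails. */
struct vol_def *vol_def_shadow(const struct vol_def *def)
{
    struct vol_def *shadow;

    assert(def != NULL);
    shadow = malloc(sizeof(*shadow));
    if (!shadow)
        return NULL;
    memcpy(shadow, def, sizeof(*shadow));
    return shadow;
}

/* Simulates a refresh that overwrites the allocation of the published
 * definition with whatever is on disk at that moment. */
void vol_def_refresh(struct vol_def *def, unsigned long long on_disk)
{
    assert(def != NULL);
    def->allocation = on_disk;
}
```

If the builder reads only the snapshot, a refresh that rewrites the published definition part-way through the copy no longer changes the values the build was started with. Because the copy is shallow, the snapshot must be released with a plain free() and must not outlive the pointers it aliases.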

On Mon, Jun 22, 2015 at 05:07:26PM +0530, Prerna Saxena wrote:
> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
> Date: Thu, 18 Jun 2015 05:05:09 -0500
>
> Libvirt periodically refreshes all volumes in a storage pool, including
> the volumes being cloned.
> While cloning a storage volume from parent, we drop pool locks. Subsequent
> volume refresh sometimes changes allocation for an ongoing copy, and leads
> to corrupt images.
> Fix: Introduce a shadow volume that isolates the volume object under refresh
> from the base which has a copy ongoing.
>
> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> ---
>  src/storage/storage_driver.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
> index 57060ab..4980546 100644
> --- a/src/storage/storage_driver.c
> +++ b/src/storage/storage_driver.c
> @@ -1898,7 +1898,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>  {
>      virStoragePoolObjPtr pool, origpool = NULL;
>      virStorageBackendPtr backend;
> -    virStorageVolDefPtr origvol = NULL, newvol = NULL;
> +    virStorageVolDefPtr origvol = NULL, newvol = NULL, shadowvol = NULL;
>      virStorageVolPtr ret = NULL, volobj = NULL;
>      unsigned long long allocation;
>      int buildret;
> @@ -2010,6 +2010,17 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>      if (backend->createVol(obj->conn, pool, newvol) < 0)
>          goto cleanup;
>
> +    /* Make a shallow copy of the 'defined' volume definition, since the
> +     * original allocation value will change as the user polls 'info',
> +     * but we only need the initial requested values
> +     */
> +    if (VIR_ALLOC(shadowvol) < 0) {
> +        newvol = NULL;

newvol has not been added to pool->volumes.objs yet, so we should free
it on the cleanup path.

shadowvol should also be VIR_FREE'd.

> +        goto cleanup;
> +    }
> +
> +    memcpy(shadowvol, newvol, sizeof(*newvol));
> +
>      pool->volumes.objs[pool->volumes.count++] = newvol;
>      volobj = virGetStorageVol(obj->conn, pool->def->name, newvol->name,
>                                newvol->key, NULL, NULL);
> @@ -2029,7 +2040,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>          virStoragePoolObjUnlock(origpool);
>      }
>
> -    buildret = backend->buildVolFrom(obj->conn, pool, newvol, origvol, flags);
> +    buildret = backend->buildVolFrom(obj->conn, pool, shadowvol, origvol, flags);

A few lines below, there is one more usage of newvol after the pool has
been unlocked:
    newvol->target.allocation

If the parallel volume refresh happened when the volume was not fully
allocated, this might not contain the intended allocation.

Using shadowvol->target.allocation will behave the same regardless of a
parallel refresh (even though the buildVolFrom function might not honor
the allocation exactly).

Jan
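The two review points above (free newvol when it has not yet been handed to the pool, release the shadow with a shallow free, and read the requested allocation from the shadow) can be sketched as follows. This is an illustration under assumptions, not the follow-up patch: `struct pool`, `clone_into_pool()` and the `fail_shadow` switch are hypothetical, and the parallel refresh is simulated by a direct assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { POOL_MAX = 8 };

/* Hypothetical stand-ins for the volume definition and the pool. */
struct vol_def {
    char *name;
    unsigned long long allocation;
};

struct pool {
    struct vol_def *vols[POOL_MAX];
    size_t count;
};

/* Deep free: only for a definition that nobody else owns. */
static void vol_def_free(struct vol_def *def)
{
    if (!def)
        return;
    free(def->name);
    free(def);
}

/* Returns 0 on success, -1 on failure. fail_shadow simulates the shadow
 * allocation failing; *requested receives the allocation as requested. */
int clone_into_pool(struct pool *pool, const char *name,
                    unsigned long long allocation, bool fail_shadow,
                    unsigned long long *requested)
{
    struct vol_def *newvol = NULL, *shadow = NULL;
    size_t len;
    int ret = -1;

    assert(pool && name && requested);

    if (pool->count >= POOL_MAX)
        goto cleanup;
    if (!(newvol = calloc(1, sizeof(*newvol))))
        goto cleanup;
    len = strlen(name) + 1;
    if (!(newvol->name = malloc(len)))
        goto cleanup;
    memcpy(newvol->name, name, len);
    newvol->allocation = allocation;

    shadow = fail_shadow ? NULL : malloc(sizeof(*shadow));
    if (!shadow)
        goto cleanup;           /* newvol is still ours: cleanup frees it */
    memcpy(shadow, newvol, sizeof(*shadow));

    pool->vols[pool->count++] = newvol;
    newvol = NULL;              /* ownership has passed to the pool */

    /* Stand-in for a refresh that runs while the pool is unlocked. */
    pool->vols[pool->count - 1]->allocation = 0;

    *requested = shadow->allocation;   /* unaffected by the refresh */
    ret = 0;

 cleanup:
    vol_def_free(newvol);       /* no-op once the pool owns the volume */
    free(shadow);               /* shallow free: name is aliased */
    return ret;
}
```

Clearing the local pointer only after the definition is stored in the pool keeps a single cleanup label correct for every exit: before that point the definition is freed, after it the pool keeps it.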

Hi Jan,
Thanks for the review comments.

On Tuesday 23 June 2015 06:21 PM, Ján Tomko wrote:
> On Mon, Jun 22, 2015 at 05:07:26PM +0530, Prerna Saxena wrote:
>> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
>> Date: Thu, 18 Jun 2015 05:05:09 -0500
>>
>> Libvirt periodically refreshes all volumes in a storage pool, including
>> the volumes being cloned.
>> While cloning a storage volume from parent, we drop pool locks. Subsequent
>> volume refresh sometimes changes allocation for an ongoing copy, and leads
>> to corrupt images.
>> Fix: Introduce a shadow volume that isolates the volume object under refresh
>> from the base which has a copy ongoing.
>>
>> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
>> ---
>>  src/storage/storage_driver.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
>> index 57060ab..4980546 100644
>> --- a/src/storage/storage_driver.c
>> +++ b/src/storage/storage_driver.c
>> @@ -1898,7 +1898,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>>  {
>>      virStoragePoolObjPtr pool, origpool = NULL;
>>      virStorageBackendPtr backend;
>> -    virStorageVolDefPtr origvol = NULL, newvol = NULL;
>> +    virStorageVolDefPtr origvol = NULL, newvol = NULL, shadowvol = NULL;
>>      virStorageVolPtr ret = NULL, volobj = NULL;
>>      unsigned long long allocation;
>>      int buildret;
>> @@ -2010,6 +2010,17 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>>      if (backend->createVol(obj->conn, pool, newvol) < 0)
>>          goto cleanup;
>>
>> +    /* Make a shallow copy of the 'defined' volume definition, since the
>> +     * original allocation value will change as the user polls 'info',
>> +     * but we only need the initial requested values
>> +     */
>> +    if (VIR_ALLOC(shadowvol) < 0) {
>> +        newvol = NULL;
> newvol has not been added to pool->volumes.objs yet, so we should free
> it on the cleanup path.
>
> shadowvol should also be VIR_FREE'd.

Thanks, I'd missed this. Will be addressed in subsequent patch.

>> +        goto cleanup;
>> +    }
>> +
>> +    memcpy(shadowvol, newvol, sizeof(*newvol));
>> +
>>      pool->volumes.objs[pool->volumes.count++] = newvol;
>>      volobj = virGetStorageVol(obj->conn, pool->def->name, newvol->name,
>>                                newvol->key, NULL, NULL);
>> @@ -2029,7 +2040,7 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
>>          virStoragePoolObjUnlock(origpool);
>>      }
>>
>> -    buildret = backend->buildVolFrom(obj->conn, pool, newvol, origvol, flags);
>> +    buildret = backend->buildVolFrom(obj->conn, pool, shadowvol, origvol, flags);
> A few lines below, there is one more usage of newvol after the pool has
> been unlocked:
>     newvol->target.allocation
>
> If the parallel volume refresh happened when the volume was not fully
> allocated, this might not contain the intended allocation.
>
> Using shadowvol->target.allocation will behave the same regardless of a
> parallel refresh (even though the buildVolFrom function might not honor
> the allocation exactly).

Right. The buildVolFrom() call completes an fsync and then returns. In my
experimental runs, a parallel refresh would always happen by the time I got
to this point, so I had missed this. But of course we can never say that
for sure. Will fix in the next patch.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

From: Prerna Saxena <prerna@linux.vnet.ibm.com>
Date: Mon, 22 Jun 2015 02:54:32 -0500

When virsh vol-clone is attempted on a raw file where capacity > allocation,
the resulting cloned volume has a size that matches the virtual-size of
the parent; in place of matching its actual, disk size.
This patch fixes the cloned disk to have same _allocated_size_ as
the parent file from which it was cloned.

Ref: http://www.redhat.com/archives/libvir-list/2015-May/msg00050.html

Also fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1130739

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
---
 src/storage/storage_backend.c | 10 +++++-----
 src/storage/storage_driver.c  |  5 -----
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index ce59f63..c99b718 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -342,7 +342,7 @@ virStorageBackendCreateBlockFrom(virConnectPtr conn ATTRIBUTE_UNUSED,
         goto cleanup;
     }
 
-    remain = vol->target.allocation;
+    remain = vol->target.capacity;
 
     if (inputvol) {
         int res = virStorageBackendCopyToFD(vol, inputvol,
@@ -397,7 +397,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
               virStorageVolDefPtr inputvol,
               bool reflink_copy)
 {
-    bool need_alloc = true;
+    bool need_alloc = !(inputvol && (inputvol->target.capacity > inputvol->target.allocation));
     int ret = 0;
     unsigned long long remain;
 
@@ -420,7 +420,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
      * to writing zeroes block by block in case fallocate isn't
      * available, and since we're going to copy data from another
      * file it doesn't make sense to write the file twice. */
-    if (vol->target.allocation) {
+    if (vol->target.allocation && need_alloc) {
         if (fallocate(fd, 0, 0, vol->target.allocation) == 0) {
             need_alloc = false;
         } else if (errno != ENOSYS && errno != EOPNOTSUPP) {
@@ -433,14 +433,14 @@ createRawFile(int fd, virStorageVolDefPtr vol,
     }
 #endif
 
-    remain = vol->target.allocation;
+    remain = vol->target.capacity;
 
     if (inputvol) {
         /* allow zero blocks to be skipped if we've requested sparse
          * allocation (allocation < capacity) or we have already
          * been able to allocate the required space. */
         bool want_sparse = !need_alloc ||
-            (vol->target.allocation < inputvol->target.capacity);
+            (inputvol->target.allocation < inputvol->target.capacity);
 
         ret = virStorageBackendCopyToFD(vol, inputvol,
                                         fd, &remain, want_sparse, reflink_copy);
diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
index 4980546..c1dfe89 100644
--- a/src/storage/storage_driver.c
+++ b/src/storage/storage_driver.c
@@ -1976,11 +1976,6 @@ storageVolCreateXMLFrom(virStoragePoolPtr obj,
     if (newvol->target.capacity < origvol->target.capacity)
         newvol->target.capacity = origvol->target.capacity;
 
-    /* Make sure allocation is at least as large as the destination cap,
-     * to make absolutely sure we copy all possible contents */
-    if (newvol->target.allocation < origvol->target.capacity)
-        newvol->target.allocation = origvol->target.capacity;
-
     if (!backend->buildVolFrom) {
         virReportError(VIR_ERR_NO_SUPPORT,
                        "%s", _("storage pool does not support"
--
1.8.3.1

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
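For readers unfamiliar with the capacity/allocation distinction on raw files, the two numbers can be read with plain POSIX calls. This is a minimal sketch, not libvirt code: `struct raw_sizes`, `raw_file_sizes()` and `make_sparse_temp()` are hypothetical helpers, and the 512-byte unit for st_blocks is an assumption that holds on Linux but is implementation-defined in POSIX.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of sizes for a raw image file. */
struct raw_sizes {
    unsigned long long capacity;   /* apparent size, st_size */
    unsigned long long allocation; /* bytes actually backed by blocks */
};

/* Fills *out from fstat(). Returns 0 on success, -1 on error.
 * Assumes st_blocks counts 512-byte units (true on Linux). */
int raw_file_sizes(int fd, struct raw_sizes *out)
{
    struct stat st;

    assert(out != NULL);
    if (fstat(fd, &st) < 0)
        return -1;
    out->capacity = (unsigned long long)st.st_size;
    out->allocation = (unsigned long long)st.st_blocks * 512ULL;
    return 0;
}

/* Creates a temporary file whose apparent size is `capacity` without
 * writing any data, so most filesystems leave it sparse.
 * path_template must end in "XXXXXX". Returns an open fd or -1. */
int make_sparse_temp(char *path_template, off_t capacity)
{
    int fd = mkstemp(path_template);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, capacity) < 0) {
        close(fd);
        unlink(path_template);
        return -1;
    }
    return fd;
}
```

On a filesystem that supports holes, a file created this way typically reports an allocation far below its capacity, which is the situation (capacity > allocation) that the commit message describes for the parent volume.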

On Mon, Jun 22, 2015 at 05:09:18PM +0530, Prerna Saxena wrote:
> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
> Date: Mon, 22 Jun 2015 02:54:32 -0500
>
> When virsh vol-clone is attempted on a raw file where capacity > allocation,
> the resulting cloned volume has a size that matches the virtual-size of
> the parent; in place of matching its actual, disk size.
> This patch fixes the cloned disk to have same _allocated_size_ as
> the parent file from which it was cloned.
>
> Ref: http://www.redhat.com/archives/libvir-list/2015-May/msg00050.html
>
> Also fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1130739
>
> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> ---
>  src/storage/storage_backend.c | 10 +++++-----
>  src/storage/storage_driver.c  |  5 -----
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
> index ce59f63..c99b718 100644
> --- a/src/storage/storage_backend.c
> +++ b/src/storage/storage_backend.c
> @@ -342,7 +342,7 @@ virStorageBackendCreateBlockFrom(virConnectPtr conn ATTRIBUTE_UNUSED,
>          goto cleanup;
>      }
>
> -    remain = vol->target.allocation;
> +    remain = vol->target.capacity;
>
>      if (inputvol) {
>          int res = virStorageBackendCopyToFD(vol, inputvol,
> @@ -397,7 +397,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>                virStorageVolDefPtr inputvol,
>                bool reflink_copy)
>  {
> -    bool need_alloc = true;
> +    bool need_alloc = !(inputvol && (inputvol->target.capacity > inputvol->target.allocation));

Comparing 'inputvol->target.capacity > vol->target.allocation' would
allow creating a sparse volume from a non-sparse one and vice-versa by
specifying the correct allocation.

>      int ret = 0;
>      unsigned long long remain;
>
> @@ -420,7 +420,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>       * to writing zeroes block by block in case fallocate isn't
>       * available, and since we're going to copy data from another
>       * file it doesn't make sense to write the file twice. */
> -    if (vol->target.allocation) {
> +    if (vol->target.allocation && need_alloc) {
>          if (fallocate(fd, 0, 0, vol->target.allocation) == 0) {
>              need_alloc = false;
>          } else if (errno != ENOSYS && errno != EOPNOTSUPP) {
> @@ -433,14 +433,14 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>      }
>  #endif
>
> -    remain = vol->target.allocation;
> +    remain = vol->target.capacity;
>
>      if (inputvol) {
>          /* allow zero blocks to be skipped if we've requested sparse
>           * allocation (allocation < capacity) or we have already
>           * been able to allocate the required space. */
>          bool want_sparse = !need_alloc ||
> -            (vol->target.allocation < inputvol->target.capacity);
> +            (inputvol->target.allocation < inputvol->target.capacity);

If allocation < capacity, then need_alloc is already false.

Jan
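The redundancy pointed out in the last comment can be checked mechanically. The sketch below copies the two boolean expressions from the patch into standalone functions and compares them over a small range of sizes; the function names are hypothetical, and the fallocate() path is left out on purpose, since it can only flip need_alloc to false, in which case the left operand alone already makes want_sparse true.

```c
#include <assert.h>
#include <stdbool.h>

/* need_alloc as initialised in the patch. */
bool need_alloc_init(bool have_input, unsigned long long in_cap,
                     unsigned long long in_alloc)
{
    return !(have_input && in_cap > in_alloc);
}

/* want_sparse as written in the patch; only evaluated with an input
 * volume present. */
bool want_sparse_patch(bool need_alloc, unsigned long long in_cap,
                       unsigned long long in_alloc)
{
    return !need_alloc || in_alloc < in_cap;
}

/* Returns true if, for every capacity and allocation below `limit`,
 * the second operand of want_sparse never changes the result. */
bool second_operand_is_redundant(unsigned long long limit)
{
    unsigned long long cap, alloc;

    assert(limit > 0);
    for (cap = 0; cap < limit; cap++) {
        for (alloc = 0; alloc < limit; alloc++) {
            bool need = need_alloc_init(true, cap, alloc);

            if (want_sparse_patch(need, cap, alloc) != !need)
                return false;
        }
    }
    return true;
}
```

With an input volume present, need_alloc reduces to !(capacity > allocation), so want_sparse becomes (capacity > allocation) || (allocation < capacity), which is the same condition twice.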

On Tuesday 23 June 2015 06:29 PM, Ján Tomko wrote:
> On Mon, Jun 22, 2015 at 05:09:18PM +0530, Prerna Saxena wrote:
>> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
>> Date: Mon, 22 Jun 2015 02:54:32 -0500
>>
>> When virsh vol-clone is attempted on a raw file where capacity > allocation,
>> the resulting cloned volume has a size that matches the virtual-size of
>> the parent; in place of matching its actual, disk size.
>> This patch fixes the cloned disk to have same _allocated_size_ as
>> the parent file from which it was cloned.
>>
>> Ref: http://www.redhat.com/archives/libvir-list/2015-May/msg00050.html
>>
>> Also fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1130739
>>
>> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
>> ---
>>  src/storage/storage_backend.c | 10 +++++-----
>>  src/storage/storage_driver.c  |  5 -----
>>  2 files changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
>> index ce59f63..c99b718 100644
>> --- a/src/storage/storage_backend.c
>> +++ b/src/storage/storage_backend.c
>> @@ -342,7 +342,7 @@ virStorageBackendCreateBlockFrom(virConnectPtr conn ATTRIBUTE_UNUSED,
>>          goto cleanup;
>>      }
>>
>> -    remain = vol->target.allocation;
>> +    remain = vol->target.capacity;
>>
>>      if (inputvol) {
>>          int res = virStorageBackendCopyToFD(vol, inputvol,
>> @@ -397,7 +397,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>>                virStorageVolDefPtr inputvol,
>>                bool reflink_copy)
>>  {
>> -    bool need_alloc = true;
>> +    bool need_alloc = !(inputvol && (inputvol->target.capacity > inputvol->target.allocation));
> Comparing 'inputvol->target.capacity > vol->target.allocation' would
> allow creating a sparse volume from a non-sparse one and vice-versa by
> specifying the correct allocation.

Ok, I was not sure if libvirt wanted to do that.
If the parent volume is a sparse volume, I'd expect the cloned volume to be
sparse too. Likewise, for a non-sparse parent, the cloned volume should also
be non-sparse.
Should that not be something we honour in libvirt when we clone?

>>      int ret = 0;
>>      unsigned long long remain;
>>
>> @@ -420,7 +420,7 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>>       * to writing zeroes block by block in case fallocate isn't
>>       * available, and since we're going to copy data from another
>>       * file it doesn't make sense to write the file twice. */
>> -    if (vol->target.allocation) {
>> +    if (vol->target.allocation && need_alloc) {
>>          if (fallocate(fd, 0, 0, vol->target.allocation) == 0) {
>>              need_alloc = false;
>>          } else if (errno != ENOSYS && errno != EOPNOTSUPP) {
>> @@ -433,14 +433,14 @@ createRawFile(int fd, virStorageVolDefPtr vol,
>>      }
>>  #endif
>>
>> -    remain = vol->target.allocation;
>> +    remain = vol->target.capacity;
>>
>>      if (inputvol) {
>>          /* allow zero blocks to be skipped if we've requested sparse
>>           * allocation (allocation < capacity) or we have already
>>           * been able to allocate the required space. */
>>          bool want_sparse = !need_alloc ||
>> -            (vol->target.allocation < inputvol->target.capacity);
>> +            (inputvol->target.allocation < inputvol->target.capacity);
> If allocation < capacity, then need_alloc is already false.

I was trying to accommodate the original use case of need_alloc, but you
are right. This will go away in the next version of this series, which I
will post shortly.

--
Prerna Saxena
Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India