[libvirt] [PATCH] blockcopy: check dst = identical device

Check whether dst is the same device as source, if yes, report error and exit. Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless. b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed. Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, } /* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) { + virReportError(VIR_ERR_INVALID_ARG, + _("destination '%s' is the same as disk '%s' source"), + dest, path); + goto endjob; + } + if (stat(dest, &st) < 0) { if (errno != ENOENT) { virReportSystemError(errno, _("unable to stat for disk %s: %s"), -- 1.8.4.5

On Tue, Jul 29, 2014 at 05:59:32PM +0800, Chunyan Liu wrote:
Check whether dst is the same device as source, if yes, report error and exit.
Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless.
True, I can't think of any useful reason to have dst and src be same device, except for a negative test case.
b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed.
Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, }
/* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) { + virReportError(VIR_ERR_INVALID_ARG, + _("destination '%s' is the same as disk '%s' source"), + dest, path); + goto endjob; + } + if (stat(dest, &st) < 0) { if (errno != ENOENT) { virReportSystemError(errno, _("unable to stat for disk %s: %s"), -- 1.8.4.5
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- /kashyap

On 07/29/2014 03:59 AM, Chunyan Liu wrote:
Check whether dst is the same device as source, if yes, report error and exit.
Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless. b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed.
Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, }
/* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) {
STREQ is too weak (consider symlinks, or even "a/b" vs. "a//b"). It will catch some cases, but not all. I don't know that we can reliably detect all possible ways the user can shoot themselves in the foot, so I'm not sure this patch is a good idea. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/29/2014 07:37 AM, Eric Blake wrote:
On 07/29/2014 03:59 AM, Chunyan Liu wrote:
Check whether dst is the same device as source, if yes, report error and exit.
Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless. b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed.
Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, }
/* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) {
STREQ is too weak (consider symlinks, or even "a/b" vs. "a//b"). It will catch some cases, but not all.
I don't know that we can reliably detect all possible ways the user can shoot themselves in the foot, so I'm not sure this patch is a good idea.
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 7/29/2014 at 11:47 PM, in message <53D7C1F6.4030705@redhat.com>, Eric Blake <eblake@redhat.com> wrote: On 07/29/2014 07:37 AM, Eric Blake wrote: On 07/29/2014 03:59 AM, Chunyan Liu wrote: Check whether dst is the same device as source, if yes, report error and exit.
Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless. b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed.
Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, }
/* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) {
STREQ is too weak (consider symlinks, or even "a/b" vs. "a//b"). It will catch some cases, but not all.
I don't know that we can reliably detect all possible ways the user can shoot themselves in the foot, so I'm not sure this patch is a good idea.
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
According to code, it's still using 'nop' by default. Code: if (!(qemu_driver->lockManager = virLockManagerPluginNew(cfg->lockManagerName ? cfg->lockManagerName : "nop", "qemu", cfg->configBaseDir, 0))) goto error; Saw Daniel's virlockd patch series, the last patch: "Change the default QEMU lock manager to 'lockd'" is ACKed. But this patch is not committed. All other patches in that series have been committed. http://www.redhat.com/archives/libvir-list/2012-December/msg00734.html
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 7/29/2014 at 11:47 PM, in message <53D7C1F6.4030705@redhat.com>, Eric Blake <eblake@redhat.com> wrote: On 07/29/2014 07:37 AM, Eric Blake wrote: On 07/29/2014 03:59 AM, Chunyan Liu wrote: Check whether dst is the same device as source, if yes, report error and exit.
Currently if dst is the same device as source, blockcopy is still going and qemu 'drive-mirror' is executed. Considering that: a). blockcopy to the same device is meaningless. b.) result is unexpected. (tested with block device whose source path is /dev/sdaX, after blockcopy, shutdown VM and then create VM from xml again, the VM cannot be started.) This case should not be allowed.
Signed-off-by: Chunyan Liu <cyliu@suse.com> --- src/qemu/qemu_driver.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..87a3790 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15309,6 +15309,13 @@ qemuDomainBlockCopy(virDomainObjPtr vm, }
/* Prepare the destination file. */ + if (STREQ(disk->src->path, dest)) {
STREQ is too weak (consider symlinks, or even "a/b" vs. "a//b"). It will catch some cases, but not all.
I don't know that we can reliably detect all possible ways the user can shoot themselves in the foot, so I'm not sure this patch is a good idea.
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
Besides the default lock is 'nop', currently lock manager is only used in: VM start/stop and attach/detach disk, blockcopy not using it. To use it in blockcopy, maybe can refer to attach/detach disk: before doing blockcopy, try AcquireResource; after blockcopy finish, try releaseResource. Anyway, using lock manager is a better idea in such cases.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/30/2014 09:29 PM, Chun Yan Liu wrote:
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
Besides the default lock is 'nop', currently lock manager is only used in: VM start/stop and attach/detach disk, blockcopy not using it.
But that's not true - the code IS trying to use it. qemuDomainBlockCopy calls qemuDomainPrepareDiskChainElement(), which calls virDomainLockImageAttach(), and that should be the use of the lock manager. Can you debug why it is not working when using something other than the 'nop' manager?
To use it in blockcopy, maybe can refer to attach/detach disk: before doing blockcopy, try AcquireResource; after blockcopy finish, try releaseResource.
That should be what is already happening. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 7/31/2014 at 11:35 AM, in message <53D9B993.4040806@redhat.com>, Eric Blake <eblake@redhat.com> wrote: On 07/30/2014 09:29 PM, Chun Yan Liu wrote:
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
Besides the default lock is 'nop', currently lock manager is only used in: VM start/stop and attach/detach disk, blockcopy not using it.
But that's not true - the code IS trying to use it. qemuDomainBlockCopy calls qemuDomainPrepareDiskChainElement(), which calls virDomainLockImageAttach(), and that should be the use of the lock manager. Can you debug why it is not working when using something other than the 'nop' manager?
Yeah. I'll have a look.
To use it in blockcopy, maybe can refer to attach/detach disk: before doing blockcopy, try AcquireResource; after blockcopy finish, try releaseResource.
That should be what is already happening.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 7/31/2014 at 11:35 AM, in message <53D9B993.4040806@redhat.com>, Eric Blake <eblake@redhat.com> wrote: On 07/30/2014 09:29 PM, Chun Yan Liu wrote:
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
Besides the default lock is 'nop', currently lock manager is only used in: VM start/stop and attach/detach disk, blockcopy not using it.
But that's not true - the code IS trying to use it. qemuDomainBlockCopy calls qemuDomainPrepareDiskChainElement(), which calls virDomainLockImageAttach(), and that should be the use of the lock manager. Can you debug why it is not working when using something other than the 'nop' manager?
Update: 1. identical device with exactly the same spelled name. e.g. source is /dev/sda4, dest is /dev/sda4: * 'lockd' manager is working well with --reuse-external. * my previous testing result which makes the VM cannot be started successfully after doing blockcopy to same device and shutdown VM and start VM again, because I didn't add --reuse-external option. Then in code, it considers the dest as a new created file, when error happens, it unlinks the dest, which is actually also my source disk. 2. identical device with not exactly the same spelling. e.g. source is /dev/sda4, dest is /dev///sda4, or a softlink: 'lockd' manager is not working. Both hashtable check and fcntl can't give right result. * HashTable checks the resource in the locked list depending on the name. If name is not exactly the same, it will treat as 'not found in locked list'. * In fcntl stage, since fcntl is to one process, in the same process, do two times fcntl F_SETLK, both will succeed. Since everytime it's virtlockd tries fcntl, so in this case, 1st time fcntl /dev/sda4, it succeeds; 2nd time fcntl /dev///sda4, also succeeds. Not as we expected 'cannot get the lock'.
To use it in blockcopy, maybe can refer to attach/detach disk: before doing blockcopy, try AcquireResource; after blockcopy finish, try releaseResource.
That should be what is already happening.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 7/31/2014 at 05:52 PM, in message <53DA11DB.FE6 : 102 : 21807>, Chun Yan Liu wrote:
On 7/31/2014 at 11:35 AM, in message <53D9B993.4040806@redhat.com>, Eric Blake <eblake@redhat.com> wrote: On 07/30/2014 09:29 PM, Chun Yan Liu wrote:
A better idea would be to rely on the volume lease manager - obtaining a lease should be impossible for an image already in use (and should even cover the case of copying 'base <- active' onto 'base', which your equality test wouldn't catch). I'm not sure why the lease manager is not already flagging this issue - are we still using the nop lease manager by default, and would the fcntl or sanlock lease manager do a better job?
Besides the default lock is 'nop', currently lock manager is only used in: VM start/stop and attach/detach disk, blockcopy not using it.
But that's not true - the code IS trying to use it. qemuDomainBlockCopy calls qemuDomainPrepareDiskChainElement(), which calls virDomainLockImageAttach(), and that should be the use of the lock manager. Can you debug why it is not working when using something other than the 'nop' manager?
Update: 1. identical device with exactly the same spelled name. e.g. source is /dev/sda4, dest is /dev/sda4: * 'lockd' manager is working well with --reuse-external. * my previous testing result which makes the VM cannot be started successfully after doing blockcopy to same device and shutdown VM and start VM again, because I didn't add --reuse-external option. Then in code, it considers the dest as a new created file, when error happens, it unlinks the dest, which s actually also my source disk. 2. identical device with not exactly the same spelling. e.g. source is /dev/sda4, dest is /dev///sda4, or a softlink: 'lockd' manager is not working. Both hashtable check and fcntl can't give right result. * HashTable checks the resource in the locked list depending on the name. If name is not exactly the same, it will treat as 'not found in locked list'. * In fcntl stage, since fcntl is to one process, in the same process, do two times fcntl F_SETLK, both will succeed. Since everytime it's virtlockd tries fcntl, so in this case, 1st time fcntl /dev/sda4, it succeeds; 2nd time fcntl /dev///sda4, also succeeds. Not as we expected 'cannot get the lock'.
About the 2nd point, 'lockd' manager cannot handle identical device but different name (extra / in name or softlink), I didn't think of a quick fix. Maybe still better to check src and dst name in earlier stage. Could call realpath() to cover more cases. Any other ideas?
To use it in blockcopy, maybe can refer to attach/detach disk: before doing blockcopy, try AcquireResource; after blockcopy finish, try releaseResource.
That should be what is already happening.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (4)
-
Chun Yan Liu
-
Chunyan Liu
-
Eric Blake
-
Kashyap Chamarthy