[libvirt] [PATCH 0/2] qemuProcessStop: Simulate blockjob --abort on destroy

See 2/2 for explanation.

Michal Prívozník (2):
  qemu: Separate image metadata removal into a function
  qemuProcessStop: Simulate blockjob --abort on destroy

 src/qemu/qemu_block.c    | 25 ++++++++++++++++++++++
 src/qemu/qemu_block.h    |  6 ++++++
 src/qemu/qemu_blockjob.c | 45 ++++------------------------------------
 src/qemu/qemu_process.c  |  9 ++++++++
 4 files changed, 44 insertions(+), 41 deletions(-)

-- 
2.23.0

There are four places where we remove image XATTRs and in all of them
we have the same for() loop with the same body. Move it into a
separate function because I'm about to introduce a fifth place where
the same needs to be done.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_block.c    | 25 ++++++++++++++++++++++
 src/qemu/qemu_block.h    |  6 ++++++
 src/qemu/qemu_blockjob.c | 45 ++++------------------------------------
 3 files changed, 35 insertions(+), 41 deletions(-)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index b8f0208742..1140a33114 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -22,6 +22,7 @@
 #include "qemu_command.h"
 #include "qemu_domain.h"
 #include "qemu_alias.h"
+#include "qemu_security.h"
 
 #include "viralloc.h"
 #include "virstring.h"
@@ -2588,3 +2589,27 @@ qemuBlockStorageSourceCreateDetectSize(virHashTablePtr blockNamedNodeData,
 
     return 0;
 }
+
+
+int
+qemuBlockRemoveImageMetadata(virQEMUDriverPtr driver,
+                             virDomainObjPtr vm,
+                             const char *diskTarget,
+                             virStorageSourcePtr src)
+{
+    virStorageSourcePtr n;
+    int ret = 0;
+
+    for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) {
+        if (qemuSecurityMoveImageMetadata(driver, vm, n, NULL) < 0) {
+            VIR_WARN("Unable to remove disk metadata on "
+                     "vm %s from %s (disk target %s)",
+                     vm->def->name,
+                     NULLSTR(n->path),
+                     diskTarget);
+            ret = -1;
+        }
+    }
+
+    return ret;
+}
diff --git a/src/qemu/qemu_block.h b/src/qemu/qemu_block.h
index 5ddeeaf6e2..5854641027 100644
--- a/src/qemu/qemu_block.h
+++ b/src/qemu/qemu_block.h
@@ -197,3 +197,9 @@ int qemuBlockStorageSourceCreateDetectSize(virHashTablePtr blockNamedNodeData,
                                            virStorageSourcePtr src,
                                            virStorageSourcePtr templ);
+
+int
+qemuBlockRemoveImageMetadata(virQEMUDriverPtr driver,
+                             virDomainObjPtr vm,
+                             const char *diskTarget,
+                             virStorageSourcePtr src);
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c
index 5c294f8024..92e4d391c9 100644
--- a/src/qemu/qemu_blockjob.c
+++ b/src/qemu/qemu_blockjob.c
@@ -658,36 +658,18 @@ qemuBlockJobEventProcessLegacyCompleted(virQEMUDriverPtr driver,
         virObjectUnref(disk->src);
         disk->src = disk->mirror;
     } else {
-        virStorageSourcePtr n;
-
         if (disk->mirror) {
             virDomainLockImageDetach(driver->lockManager, vm, disk->mirror);
 
             /* Ideally, we would restore seclabels on the backing chain here
              * but we don't know if somebody else is not using parts of it.
              * Remove security driver metadata so that they are not leaked. */
-            for (n = disk->mirror; virStorageSourceIsBacking(n); n = n->backingStore) {
-                if (qemuSecurityMoveImageMetadata(driver, vm, n, NULL) < 0) {
-                    VIR_WARN("Unable to remove disk metadata on "
-                             "vm %s from %s (disk target %s)",
-                             vm->def->name,
-                             NULLSTR(disk->src->path),
-                             disk->dst);
-                }
-            }
+            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
 
             virObjectUnref(disk->mirror);
         }
 
-        for (n = disk->src; virStorageSourceIsBacking(n); n = n->backingStore) {
-            if (qemuSecurityMoveImageMetadata(driver, vm, n, NULL) < 0) {
-                VIR_WARN("Unable to remove disk metadata on "
-                         "vm %s from %s (disk target %s)",
-                         vm->def->name,
-                         NULLSTR(n->path),
-                         disk->dst);
-            }
-        }
+        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
     }
 
     /* Recompute the cached backing chain to match our
@@ -754,22 +736,12 @@ qemuBlockJobEventProcessLegacy(virQEMUDriverPtr driver,
     case VIR_DOMAIN_BLOCK_JOB_FAILED:
     case VIR_DOMAIN_BLOCK_JOB_CANCELED:
         if (disk->mirror) {
-            virStorageSourcePtr n;
-
             virDomainLockImageDetach(driver->lockManager, vm, disk->mirror);
 
             /* Ideally, we would restore seclabels on the backing chain here
              * but we don't know if somebody else is not using parts of it.
              * Remove security driver metadata so that they are not leaked. */
-            for (n = disk->mirror; virStorageSourceIsBacking(n); n = n->backingStore) {
-                if (qemuSecurityMoveImageMetadata(driver, vm, n, NULL) < 0) {
-                    VIR_WARN("Unable to remove disk metadata on "
-                             "vm %s from %s (disk target %s)",
-                             vm->def->name,
-                             NULLSTR(disk->src->path),
-                             disk->dst);
-                }
-            }
+            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
 
             virObjectUnref(disk->mirror);
             disk->mirror = NULL;
@@ -1177,7 +1149,6 @@ qemuBlockJobProcessEventFailedActiveCommit(virQEMUDriverPtr driver,
                                            qemuBlockJobDataPtr job)
 {
     virDomainDiskDefPtr disk = job->disk;
-    virStorageSourcePtr n;
 
     VIR_DEBUG("active commit job '%s' on VM '%s' failed",
               job->name, vm->def->name);
@@ -1187,15 +1158,7 @@ qemuBlockJobProcessEventFailedActiveCommit(virQEMUDriverPtr driver,
     /* Ideally, we would make the backing chain read only again (yes, SELinux
      * can do that using different labels). But that is not implemented yet and
      * not leaking security driver metadata is more important. */
-    for (n = disk->mirror; virStorageSourceIsBacking(n); n = n->backingStore) {
-        if (qemuSecurityMoveImageMetadata(driver, vm, n, NULL) < 0) {
-            VIR_WARN("Unable to remove disk metadata on "
-                     "vm %s from %s (disk target %s)",
-                     vm->def->name,
-                     NULLSTR(disk->src->path),
-                     disk->dst);
-        }
-    }
+    qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
 
     virObjectUnref(disk->mirror);
     disk->mirror = NULL;
-- 
2.23.0

On Tue, Nov 19, 2019 at 10:14:08 +0100, Michal Privoznik wrote:
There are four places where we remove image XATTRs, and in all of them we have the same for() loop with the same body. Move it into a separate function because I'm about to introduce a fifth place where the same needs to be done.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_block.c    | 25 ++++++++++++++++++++++
 src/qemu/qemu_block.h    |  6 ++++++
 src/qemu/qemu_blockjob.c | 45 ++++------------------------------------
 3 files changed, 35 insertions(+), 41 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

If user starts a blockcommit without --pivot then we modify access
for qemu on both images and leave it like that until pivot is
executed. So far so good. Problem is, if user instead of issuing
pivot calls destroy on the domain. In this case we don't ever clear
the access we granted at the beginning.

https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 8cb12ac9a6..fa9167441e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7617,6 +7617,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     for (i = 0; i < vm->def->niothreadids; i++)
         vm->def->iothreadids[i]->thread_id = 0;
 
+    for (i = 0; i < def->ndisks; i++) {
+        virDomainDiskDefPtr disk = def->disks[i];
+
+        if (!disk->mirror)
+            continue;
+
+        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
+    }
+
     /* clear all private data entries which are no longer needed */
     qemuDomainObjPrivateDataClear(priv);
-- 
2.23.0

On Tue, Nov 19, 2019 at 10:14:09 +0100, Michal Privoznik wrote:

In subject, please remove 'Simulate blockjob --abort' and replace it e.g. with: 'remove image metadata for running mirror jobs' or something similar, because this patch does not really do anything with the job itself.
If user starts a blockcommit without --pivot then we modify access for qemu on both images and leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain. In this case we don't ever clear the access we granted at the beginning.
This applies any time a job is still running. --pivot makes sure that it finishes after qemu reaches the correct point, but the intermediate state is still potentially long-running.

Also note that disk->mirror is only present for an active layer commit job but not for regular commit jobs. Does this bug apply to those as well? It's harder to simulate that job, but if qemu crashes during the job, exactly the same semantics apply as if --pivot was not used and the VM was destroyed instead. We still relabelled some images.

Also, disk->mirror is present for virDomainBlockCopy as well, which this commit message omits completely.

You'll have to reformulate the commit message for this patch, unless you persuade me that checking disk->mirror is enough given the existence of non-active jobs which don't populate disk->mirror.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 9 +++++++++
 1 file changed, 9 insertions(+)

On 11/19/19 10:31 AM, Peter Krempa wrote:
On Tue, Nov 19, 2019 at 10:14:09 +0100, Michal Privoznik wrote:
In subject, please remove 'Simulate blockjob --abort' and replace it e.g. with: 'remove image metadata for running mirror jobs' or something similar, because this patch does not really do anything with the job itself.
If user starts a blockcommit without --pivot then we modify access for qemu on both images and leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain. In this case we don't ever clear the access we granted at the beginning.
This applies any time a job is still running. --pivot makes sure that it finishes after qemu reaches the correct point, but the intermediate state is still potentially long-running.
Right, point is, at the time we're doing qemuProcessStop(), qemu is no longer running -> we are restoring seclabels. No XATTR should be left behind.
Also note that disk->mirror is only present for an active layer commit job but not for regular commit jobs. Does this bug apply to those as well?
Interesting question. Let me test. (testing, testing, testing). Oh yeah, it does. Looks like I'm going to need to mimic the else branch from qemuBlockJobEventProcessLegacyCompleted(). That is, I'll need qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src); too.
It's harder to simulate that job, but if qemu crashes during the job, exactly the same semantics apply as if --pivot was not used and the VM was destroyed instead. We still relabelled some images.
Oh right. But that's basically the same thing. The only difference is whether libvirt initiated qemu shutdown or not.
Also disk->mirror is present also for virDomainBlockCopy which this commit message omits completely.
Okay, will mention it in v2.
You'll have to reformulate the commit message for this patch, unless you persuade me that checking disk->mirror is enough given the existence of non-active jobs which don't populate disk->mirror.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 9 +++++++++
 1 file changed, 9 insertions(+)
Michal

If user starts a blockcommit without --pivot or a blockcopy also
without --pivot then we modify access for qemu on both images and
leave it like that until pivot is executed. So far so good. Problem
is, if user instead of issuing pivot calls destroy on the domain or
if qemu dies whilst executing the block job. In this case we don't
ever clear the access we granted at the beginning. To fix this, we
need to mimic what block job code does for aborting a block job ->
remove image metadata from disk->mirror and disk->src chains.

https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---

Diff to v2:
- updated commit message
- do more removal - for the disk->src chain too
- put a comment about the importance of code placement

 src/qemu/qemu_process.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 209d07cfe8..b9dd433f54 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7615,6 +7615,18 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     for (i = 0; i < vm->def->niothreadids; i++)
         vm->def->iothreadids[i]->thread_id = 0;
 
+    /* Do this explicitly after vm->pid is reset so that security drivers don't
+     * try to enter the domain's namespace which is non-existent by now as qemu
+     * is no longer running. */
+    for (i = 0; i < def->ndisks; i++) {
+        virDomainDiskDefPtr disk = def->disks[i];
+
+        if (disk->mirror)
+            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
+
+        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
+    }
+
     /* clear all private data entries which are no longer needed */
     qemuDomainObjPrivateDataClear(priv);
-- 
2.23.0

On Tue, Nov 19, 2019 at 14:15:12 +0100, Michal Privoznik wrote:
If user starts a blockcommit without --pivot or a blockcopy also without --pivot then we modify access for qemu on both images and leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain or if qemu dies whilst executing the block job. In this case we don't ever clear the access we granted at the beginning. To fix this, we need to mimic what block job code does for aborting a block job -> remove image metadata from disk->mirror and disk->src chains.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
Diff to v2:
- updated commit message
- do more remove - for disk->src chain too
- put a comment about the importance of code placement
 src/qemu/qemu_process.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 209d07cfe8..b9dd433f54 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7615,6 +7615,18 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     for (i = 0; i < vm->def->niothreadids; i++)
         vm->def->iothreadids[i]->thread_id = 0;
 
+    /* Do this explicitly after vm->pid is reset so that security drivers don't
+     * try to enter the domain's namespace which is non-existent by now as qemu
+     * is no longer running. */
+    for (i = 0; i < def->ndisks; i++) {
+        virDomainDiskDefPtr disk = def->disks[i];
+
+        if (disk->mirror)
+            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
+
+        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
+    }
So aren't we tracking the image metadata also for the shared subsets of backing chain which might be used by multiple domains? Because this undoes everything and we do have some code which does this in the security driver, don't we? So this either is duplicate or it's a superset of what is done in the security driver.

On 11/20/19 12:17 PM, Peter Krempa wrote:
On Tue, Nov 19, 2019 at 14:15:12 +0100, Michal Privoznik wrote:
If user starts a blockcommit without --pivot or a blockcopy also without --pivot then we modify access for qemu on both images and leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain or if qemu dies whilst executing the block job. In this case we don't ever clear the access we granted at the beginning. To fix this, we need to mimic what block job code does for aborting a block job -> remove image metadata from disk->mirror and disk->src chains.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
Diff to v2:
- updated commit message
- do more remove - for disk->src chain too
- put a comment about the importance of code placement
 src/qemu/qemu_process.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 209d07cfe8..b9dd433f54 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7615,6 +7615,18 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     for (i = 0; i < vm->def->niothreadids; i++)
         vm->def->iothreadids[i]->thread_id = 0;
 
+    /* Do this explicitly after vm->pid is reset so that security drivers don't
+     * try to enter the domain's namespace which is non-existent by now as qemu
+     * is no longer running. */
+    for (i = 0; i < def->ndisks; i++) {
+        virDomainDiskDefPtr disk = def->disks[i];
+
+        if (disk->mirror)
+            qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror);
+
+        qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
+    }
So aren't we tracking the image metadata also for the shared subsets of backing chain which might be used by multiple domains?
No. We are not doing that:

static int
virSecurityDACSetImageLabelInternal(virSecurityManagerPtr mgr,
                                    virDomainDefPtr def,
                                    virStorageSourcePtr src,
                                    virStorageSourcePtr parent)
{
    bool remember;
    bool is_toplevel = parent == src || parent->externalDataStore == src;

    ...

    /* We can't do restore on shared resources safely. Not even
     * with refcounting implemented in XATTRs because if there
     * was a domain running with the feature turned off the
     * refcounter in XATTRs would not reflect the actual number
     * of times the resource is in use and thus the last restore
     * on the resource (which actually restores the original
     * owner) might cut off access to the domain with the feature
     * disabled.
     * For disks, a shared resource is the whole backing chain
     * but the top layer, or read only image, or disk explicitly
     * marked as shared. */
    remember = is_toplevel && !src->readonly && !src->shared;

    return virSecurityDACSetOwnership(mgr, src, NULL, user, group, remember);
}

static int
virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr,
                                        virDomainDefPtr def,
                                        virStorageSourcePtr src,
                                        virStorageSourcePtr parent)
{
    bool remember;
    bool is_toplevel = parent == src || parent->externalDataStore == src;
    int ret;

    ...

    /* We can't do restore on shared resources safely. Not even
     * with refcounting implemented in XATTRs because if there
     * was a domain running with the feature turned off the
     * refcounter in XATTRs would not reflect the actual number
     * of times the resource is in use and thus the last restore
     * on the resource (which actually restores the original
     * owner) might cut off access to the domain with the feature
     * disabled.
     * For disks, a shared resource is the whole backing chain
     * but the top layer, or read only image, or disk explicitly
     * marked as shared. */
    remember = is_toplevel && !src->readonly && !src->shared;

    ..

    ret = virSecuritySELinuxSetFilecon(mgr, src->path, use_label, remember);

    ...
}
Because this undoes everything and we do have some code which does this in the security driver, don't we? So this either is duplicate or it's a superset of what is done in the security driver.
So the problem here is that we call qemuDomainStorageSourceAccessAllow() but never call the counterparts. Now, I could call qemuDomainStorageSourceAccessRevoke() here but:

1) we don't need to bother with CGroups or image locking (as in
   virtlockd/sanlock) because the qemu process is gone at this point and so
   are its CGroup and image locks (both virtlockd and sanlock automatically
   release image locks if the process holding them dies),

2) I'd need to access top_parent and baseSource - are they available here
   easily?

3) we are doing this code I'm suggesting already in
   qemuBlockJobEventProcessLegacyCompleted()

Also note that I'm calling qemuBlockRemoveImageMetadata() only after qemuSecurityRestoreAllLabel().

Michal

On Wed, Nov 20, 2019 at 15:10:17 +0100, Michal Privoznik wrote:
On 11/20/19 12:17 PM, Peter Krempa wrote:
On Tue, Nov 19, 2019 at 14:15:12 +0100, Michal Privoznik wrote:
If user starts a blockcommit without --pivot or a blockcopy also without --pivot then we modify access for qemu on both images and leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain or if qemu dies whilst executing the block job. In this case we don't ever clear the access we granted at the beginning. To fix this, we need to mimic what block job code does for aborting a block job -> remove image metadata from disk->mirror and disk->src chains.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Diff to v2: - updated commit message - do more remove - for disk->src chain too - put a comment about the importance of code placement
src/qemu/qemu_process.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 209d07cfe8..b9dd433f54 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7615,6 +7615,18 @@ void qemuProcessStop(virQEMUDriverPtr driver, for (i = 0; i < vm->def->niothreadids; i++) vm->def->iothreadids[i]->thread_id = 0; + /* Do this explicitly after vm->pid is reset so that security drivers don't + * try to enter the domain's namespace which is non-existent by now as qemu + * is no longer running. */ + for (i = 0; i < def->ndisks; i++) { + virDomainDiskDefPtr disk = def->disks[i]; + + if (disk->mirror) + qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->mirror); + + qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src); + }
So aren't we tracking the image metadata also for the shared subsets of backing chain which might be used by multiple domains?
No. We are not doing that:
static int
virSecurityDACSetImageLabelInternal(virSecurityManagerPtr mgr,
                                    virDomainDefPtr def,
                                    virStorageSourcePtr src,
                                    virStorageSourcePtr parent)
{
    bool remember;
    bool is_toplevel = parent == src || parent->externalDataStore == src;

    ...

    /* We can't do restore on shared resources safely. Not even
     * with refcounting implemented in XATTRs because if there
     * was a domain running with the feature turned off the
     * refcounter in XATTRs would not reflect the actual number
     * of times the resource is in use and thus the last restore
     * on the resource (which actually restores the original
     * owner) might cut off access to the domain with the feature
     * disabled.
     * For disks, a shared resource is the whole backing chain
     * but the top layer, or read only image, or disk explicitly
     * marked as shared. */
    remember = is_toplevel && !src->readonly && !src->shared;

    return virSecurityDACSetOwnership(mgr, src, NULL, user, group, remember);
}

static int
virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr,
                                        virDomainDefPtr def,
                                        virStorageSourcePtr src,
                                        virStorageSourcePtr parent)
{
    bool remember;
    bool is_toplevel = parent == src || parent->externalDataStore == src;
    int ret;

    ...

    /* We can't do restore on shared resources safely. Not even
     * with refcounting implemented in XATTRs because if there
     * was a domain running with the feature turned off the
     * refcounter in XATTRs would not reflect the actual number
     * of times the resource is in use and thus the last restore
     * on the resource (which actually restores the original
     * owner) might cut off access to the domain with the feature
     * disabled.
     * For disks, a shared resource is the whole backing chain
     * but the top layer, or read only image, or disk explicitly
     * marked as shared. */
    remember = is_toplevel && !src->readonly && !src->shared;

    ..

    ret = virSecuritySELinuxSetFilecon(mgr, src->path, use_label, remember);

    ...
}
Because this undoes everything and we do have some code which does this in the security driver, don't we? So this either is duplicate or it's a superset of what is done in the security driver.
So the problem here is that we call qemuDomainStorageSourceAccessAllow() but
Well, I was worried that we actually call the code even if qemuDomainStorageSourceAccessAllow was never called on the image. For example when no blockjob was ever used.
never call the counterparts. Now, I could call qemuDomainStorageSourceAccessRevoke() here but:
This would be wrong and could actually cut off access for other VMs when the image is shared. When we run a blockjob we relabel it for write access, but by starting the job the user declares that the chain is not shared in the first place, as executing any merge would corrupt other readers. In addition, qemu image locks and our metadata locks should now prevent us from getting there in the first place, but both of those have some limitations.
1) we don't need to bother with CGroups or image locking (as in virtlockd/sanlock) because qemu process is gone at this point and so is its CGroup and image locks (both virtlockd and sanlock automatically release image locks if the process holding locks dies),
2) I'd need to access top_parent and baseSource - are they available here easily?
3) we are doing this code I'm suggesting already in qemuBlockJobEventProcessLegacyCompleted()
As mentioned above, the user declared that the chain is not shared by calling the API in the first place, but calling it all the time is ... suspicious at best. It feels like the labelling code should handle this if it is to be executed every time during VM shutdown. Since the functions you've pasted above say that it's not the correct thing to do, this also must be wrong. If the only correct thing to do is to nuke the xattrs, it still feels like the labelling code which put the xattrs there in the first place should deal with it when un-labelling the files, rather than dealing with it explicitly in many places. I can understand that the old block job handlers require this special handling, but for the new blockjob handler we unlabel the files as they are being removed, so even that place would be fixed with a systemic solution.
Also note that I'm calling qemuBlockRemoveImageMetadata() only after qemuSecurityRestoreAllLabel().
Michal

On 11/20/19 3:22 PM, Peter Krempa wrote:
Okay, how about this commit message:

qemuProcessStop: Remove image metadata for running mirror jobs

If user starts a blockcommit without --pivot or a blockcopy also
without --pivot then we modify access for qemu on both images and
leave it like that until pivot is executed. So far so good. Problem
is, if user instead of issuing pivot calls destroy on the domain or
if qemu dies whilst executing the block job. In this case we don't
ever clear the access we granted at the beginning.

To fix this, maybe a bit harsh approach is used, but it works: after
all labels were restored (that is after qemuSecurityRestoreAllLabel()
was called), we iterate over each disk in the domain and remove XATTRs
from the whole backing chain and also from any file the disk is being
mirrored to.

This would have been done at the time of pivot, but it isn't because
user decided to kill the domain instead. If we don't do this and leave
some XATTRs behind the domain might be unable to start.

Also, secdriver can't do this because it doesn't know if there is any
job running. It's outside of its scope - the hypervisor driver is
responsible for calling secdriver's APIs.

https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Michal

On Thu, Nov 21, 2019 at 14:02:49 +0100, Michal Privoznik wrote:
On 11/20/19 3:22 PM, Peter Krempa wrote:
Okay, how about this commit message:
qemuProcessStop: Remove image metadata for running mirror jobs
If user starts a blockcommit without --pivot or a blockcopy also without --pivot then we modify access for qemu on both images and
We modify the access perms even if you use --pivot.
leave it like that until pivot is executed. So far so good. Problem is, if user instead of issuing pivot calls destroy on the domain or if qemu dies whilst executing the block job. In this case we don't ever clear the access we granted at the beginning. To fix this, maybe a bit harsh approach is used, but it works: after all labels were restored (that is after qemuSecurityRestoreAllLabel() was called), we iterate over each disk in the domain and remove XATTRs from the whole backing chain and also from any file the disk is being mirrored to.
This would have been done at the time of pivot, but it isn't because user decided to kill the domain instead. If we don't do this and leave some XATTRs behind the domain might be unable to start.
Also, secdriver can't do this because it doesn't know if there is any job running. It's outside of its scope - the hypervisor driver is responsible for calling secdriver's APIs.
So you correctly explain that this fixes everything up after a block job, and you also explain why the secdrivers can't do that. What's missing is an explanation of why it's okay to call this even if no blockjob was ever active on the VM, and that it has no implications e.g. for backing chains shared between two VMs (even if neither of them ever ran a blockjob). With the above case explained I'll be willing to ACK the patch.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal

On 11/21/19 3:31 PM, Peter Krempa wrote:
On Thu, Nov 21, 2019 at 14:02:49 +0100, Michal Privoznik wrote:
On 11/20/19 3:22 PM, Peter Krempa wrote:
New commit message:

qemuProcessStop: Remove image metadata for running mirror jobs

If user starts a blockcommit or a blockcopy then we modify access
for qemu on both images and leave it like that until pivot is
executed. So far so good. Problem is, if user instead of issuing
pivot (where we would modify the access again so that the state
before the job is restored) calls destroy on the domain or if qemu
dies whilst executing the block job. In this case we don't ever
clear the access we granted at the beginning.

To fix this, maybe a bit harsh approach is used, but it works: after
all labels were restored (that is after qemuSecurityRestoreAllLabel()
was called), we iterate over each disk in the domain and remove XATTRs
from the whole backing chain and also from any file the disk is being
mirrored to. This would have been done at the time of pivot, but it
isn't because user decided to kill the domain instead. If we don't do
this and leave some XATTRs behind the domain might be unable to start.

Also, secdriver can't do this because it doesn't know if there is any
job running. It's outside of its scope - the hypervisor driver is
responsible for calling secdriver's APIs.

Moreover, this is safe to call because we don't remember labels for
any member of a backing chain instead of the top layer. But that one
was restored in the qemuSecurityRestoreAllLabel() call done earlier.
Therefore, not only we don't remember labels (and thus this is
basically a NOP for other images in the backing chain), it is also
safe to call this when no blockjob was started in the first place, or
if some parts of the backing chain are shared with some other domains
- this is a NOP, unless a block job is active at the time of domain
destroy.

https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Michal

On Thu, Nov 21, 2019 at 17:00:37 +0100, Michal Privoznik wrote:
On 11/21/19 3:31 PM, Peter Krempa wrote:
On Thu, Nov 21, 2019 at 14:02:49 +0100, Michal Privoznik wrote:
On 11/20/19 3:22 PM, Peter Krempa wrote:
New commit message:
qemuProcessStop: Remove image metadata for running mirror jobs
If user starts a blockcommit or a blockcopy then we modify access for qemu on both images and leave it like that until pivot is
until the job terminates
executed. So far so good. Problem is, if user instead of issuing pivot (where we would modify the access again so that the state
instead of terminating the job (where ...
before the job is restored) calls destroy on the domain or if qemu dies whilst executing the block job. In this case we don't ever clear the access we granted at the beginning. To fix this, maybe a bit harsh approach is used, but it works: after all labels were restored (that is after qemuSecurityRestoreAllLabel() was called), we iterate over each disk in the domain and remove XATTRs from the whole backing chain and also from any file the disk is being mirrored to.
This would have been done at the time of pivot, but it isn't because user decided to kill the domain instead. If we don't do this and leave some XATTRs behind the domain might be unable to start.
Also, secdriver can't do this because it doesn't know if there is any job running. It's outside of its scope - the hypervisor driver is responsible for calling secdriver's APIs.
Moreover, this is safe to call because we don't remember labels for any member of a backing chain instead of the top layer. But
s/instead/except/
that one was restored in qemuSecurityRestoreAllLabel() call done earlier. Therefore, not only we don't remember labels (and thus this is basically a NOP for other images in the backing chain) it is also safe to call this when no blockjob was started in the first place, or if some parts of the backing chain are shared with some other domains - this is NOP, unless a block job is active at the time of domain destroy.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>