[libvirt] [PATCH v2 0/2] Fix possible crash when resctrl gets remounted

First one just fixes the crash, but the second one makes sure we get consistent info with at least higher probability. Martin Kletzander (2): util: Check if kernel-provided info is consistent with itself qemu: Refresh capabilities when creating resctrl allocation src/qemu/qemu_process.c | 8 +++++++- src/util/virresctrl.c | 11 +++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-) -- 2.16.1

Just in case someone re-mounted /sys/fs/resctrl with different mount options (cdp), add a check here. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virresctrl.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index ef388757a704..6860e86e649d 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -941,6 +941,17 @@ virResctrlAllocParseProcessCache(virResctrlInfoPtr resctrl, if (!mask) return -1; + if (!resctrl || + level >= resctrl->nlevels || + !resctrl->levels[level] || + !resctrl->levels[level]->types[type]) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing or inconsistent resctrl info for " + "level '%ud' type '%s'"), + level, virCacheTypeToString(type)); + goto cleanup; + } + if (virBitmapShrink(mask, resctrl->levels[level]->types[type]->bits) < 0) goto cleanup; -- 2.16.1

On Fri, Feb 02, 2018 at 15:27:21 +0100, Martin Kletzander wrote:
Just in case someone re-mounted /sys/fs/resctrl with different mount options (cdp), add a check here.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virresctrl.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index ef388757a704..6860e86e649d 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -941,6 +941,17 @@ virResctrlAllocParseProcessCache(virResctrlInfoPtr resctrl, if (!mask) return -1;
+ if (!resctrl || + level >= resctrl->nlevels || + !resctrl->levels[level] || + !resctrl->levels[level]->types[type]) {
The only caller of this function checks the range of type by converting it from string with 'virResctrlTypeFromString' but the type in this function is 'virCacheType' and you use 'virCacheTypeToString'. Given the inconsistency and the fact that C will not validate the passed type in this case you should also validate that 'type' has the correct range. ACK with that check added.

On Fri, Feb 02, 2018 at 06:02:22PM +0100, Peter Krempa wrote:
On Fri, Feb 02, 2018 at 15:27:21 +0100, Martin Kletzander wrote:
Just in case someone re-mounted /sys/fs/resctrl with different mount options (cdp), add a check here.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virresctrl.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index ef388757a704..6860e86e649d 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -941,6 +941,17 @@ virResctrlAllocParseProcessCache(virResctrlInfoPtr resctrl, if (!mask) return -1;
+ if (!resctrl || + level >= resctrl->nlevels || + !resctrl->levels[level] || + !resctrl->levels[level]->types[type]) {
The only caller of this function checks the range of type by converting it from string with 'virResctrlTypeFromString' but the type in this function is 'virCacheType' and you use 'virCacheTypeToString'.
Given the inconsistency and the fact that C will not validate the passed type in this case you should also validate that 'type' has the correct range.
ACK with that check added.
There are three "inconsistent" types, but they all use VIR_CACHE_TYPE_LAST in their VIR_ENUM_IMPL: - virResctrl - That's the naming we need to use when writing into resctrl schemata file - virCache - That's the naming that was decide upstream that we should use in our XML - virCacheKernel - That's what kernel uses when we read from cache info from sysfs I have no problem with adding one extra check here, but I think this is taken care of by the fact that they all use the same enum for the implementation. I can just extract the const char * fromt he error message before the check and just add `str != NULL` in the condition. I'll leave that choice up to you. I agree the code is not as nice as it could be and I'm already working on at least the minimal refactoring that needs tobe done, but in the meantime I'd like to have this working at least. The number of workarounds we need to make due to IMNSHO poor implementation of resctrl in kernel is sickening^Wtiring let's say. Just so we don't misunderstand each other, I'm not against any changes to that code. Feel free to suggest any other things that might be made better, I'm all for making this nicer. It's just that most of the time I couldn't find any nicer way to do some of these _suboptimal_ things. Have a nice day, Martin

On Mon, Feb 05, 2018 at 10:37:25 +0100, Martin Kletzander wrote:
On Fri, Feb 02, 2018 at 06:02:22PM +0100, Peter Krempa wrote:
On Fri, Feb 02, 2018 at 15:27:21 +0100, Martin Kletzander wrote:
Just in case someone re-mounted /sys/fs/resctrl with different mount options (cdp), add a check here.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virresctrl.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index ef388757a704..6860e86e649d 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -941,6 +941,17 @@ virResctrlAllocParseProcessCache(virResctrlInfoPtr resctrl, if (!mask) return -1;
+ if (!resctrl || + level >= resctrl->nlevels || + !resctrl->levels[level] || + !resctrl->levels[level]->types[type]) {
The only caller of this function checks the range of type by converting it from string with 'virResctrlTypeFromString' but the type in this function is 'virCacheType' and you use 'virCacheTypeToString'.
Given the inconsistency and the fact that C will not validate the passed type in this case you should also validate that 'type' has the correct range.
ACK with that check added.
There are three "inconsistent" types, but they all use VIR_CACHE_TYPE_LAST in their VIR_ENUM_IMPL:
- virResctrl - That's the naming we need to use when writing into resctrl schemata file
- virCache - That's the naming that was decide upstream that we should use in our XML
- virCacheKernel - That's what kernel uses when we read from cache info from sysfs
I have no problem with adding one extra check here, but I think this is taken care of by the fact that they all use the same enum for the implementation. I can just extract the const char * fromt he error message before the check and just add `str != NULL` in the condition. I'll leave that choice up to you. I agree the code is not as nice as it could be and I'm already working on at least the minimal refactoring that needs tobe done, but in the meantime I'd like to have this working at least. The number of workarounds we need to make due to IMNSHO poor implementation of resctrl in kernel is sickening^Wtiring let's say.
Just so we don't misunderstand each other, I'm not against any changes to that code. Feel free to suggest any other things that might be made better, I'm all for making this nicer. It's just that most of the time I couldn't find any nicer way to do some of these _suboptimal_ things.
Hmm, okay fair enough. Maybe just add a comment to the enum implementations that they should all be terminated via VIR_CACHE_TYPE_LAST since code depends on it. It can definitely be separate in this case. ACK to this patch as-is.

Since one of the things in capabilities (info from resctrl updated with data about caches) can be change on the system by remounting the /sys/fs/resctrl with different options, the capabilities need to be refreshed. There is a better fix in the works, but it's going to be way bigger than this (hence the XXX note there), so for the time being let's workaround this. And in order not to slow down the domain starting, only get the capabilities if there are any cachetunes. Relates-to: https://bugzilla.redhat.com/show_bug.cgi?id=1540780 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_process.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5a364730c8c1..2ab242b7634d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2514,9 +2514,15 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, { int ret = -1; size_t i = 0; - virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; + if (!vm->def->ncachetunes) + return 0; + + /* Force capability refresh since resctrl info can change + * XXX: move cache info into virresctrl so caps are not needed */ + caps = virQEMUDriverGetCapabilities(driver, true); if (!caps) return -1; -- 2.16.1

On Fri, Feb 02, 2018 at 15:27:22 +0100, Martin Kletzander wrote:
Since one of the things in capabilities (info from resctrl updated with data about caches) can be change on the system by remounting the /sys/fs/resctrl with different options, the capabilities need to be refreshed. There is a better fix in the works, but it's going to be way bigger than this (hence the XXX note there), so for the time being let's workaround this. And in order not to slow down the domain starting, only get the capabilities if there are any cachetunes.
Relates-to: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_process.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5a364730c8c1..2ab242b7634d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2514,9 +2514,15 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, { int ret = -1; size_t i = 0; - virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData;
+ if (!vm->def->ncachetunes) + return 0; + + /* Force capability refresh since resctrl info can change + * XXX: move cache info into virresctrl so caps are not needed */ + caps = virQEMUDriverGetCapabilities(driver, true);
I hope that your upcoming fix will also deal with the hugepage and NUMa data which is more often used than this. My feelings towards doing this are slightly mixed, but ACK

On Fri, Feb 02, 2018 at 06:05:56PM +0100, Peter Krempa wrote:
On Fri, Feb 02, 2018 at 15:27:22 +0100, Martin Kletzander wrote:
Since one of the things in capabilities (info from resctrl updated with data about caches) can be change on the system by remounting the /sys/fs/resctrl with different options, the capabilities need to be refreshed. There is a better fix in the works, but it's going to be way bigger than this (hence the XXX note there), so for the time being let's workaround this. And in order not to slow down the domain starting, only get the capabilities if there are any cachetunes.
Relates-to: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_process.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5a364730c8c1..2ab242b7634d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2514,9 +2514,15 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, { int ret = -1; size_t i = 0; - virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData;
+ if (!vm->def->ncachetunes) + return 0; + + /* Force capability refresh since resctrl info can change + * XXX: move cache info into virresctrl so caps are not needed */ + caps = virQEMUDriverGetCapabilities(driver, true);
I hope that your upcoming fix will also deal with the hugepage and NUMa data which is more often used than this.
Actually, it will just deal with resctrl info being as consistent as possible. I mean you can always remount the directory between libvirtd reading the two files, there will always be a window of opportunity. The fix I have in mind (and code) is just going to make rid of the capability usage from the driver in resctrl-related code. Do you have any pointers to the numa and hugepage faults that we ought to fix? Maybe it can be dealt with the same way.
My feelings towards doing this are slightly mixed, but ACK
Thanks, pinky swear that I'll remove it in near future ;)

On Mon, Feb 05, 2018 at 10:40:58 +0100, Martin Kletzander wrote:
On Fri, Feb 02, 2018 at 06:05:56PM +0100, Peter Krempa wrote:
On Fri, Feb 02, 2018 at 15:27:22 +0100, Martin Kletzander wrote:
Since one of the things in capabilities (info from resctrl updated with data about caches) can be change on the system by remounting the /sys/fs/resctrl with different options, the capabilities need to be refreshed. There is a better fix in the works, but it's going to be way bigger than this (hence the XXX note there), so for the time being let's workaround this. And in order not to slow down the domain starting, only get the capabilities if there are any cachetunes.
Relates-to: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_process.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5a364730c8c1..2ab242b7634d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2514,9 +2514,15 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, { int ret = -1; size_t i = 0; - virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData;
+ if (!vm->def->ncachetunes) + return 0; + + /* Force capability refresh since resctrl info can change + * XXX: move cache info into virresctrl so caps are not needed */ + caps = virQEMUDriverGetCapabilities(driver, true);
I hope that your upcoming fix will also deal with the hugepage and NUMa data which is more often used than this.
Actually, it will just deal with resctrl info being as consistent as possible. I mean you can always remount the directory between libvirtd reading the two files, there will always be a window of opportunity. The fix I have in mind (and code) is just going to make rid of the capability usage from the driver in resctrl-related code. Do you have any pointers to the numa and hugepage faults that we ought to fix?
For the hugepage code I don't really know the internals which are wrong, but the high-level symptom is similar/same to what you have with resctl. If you mount your hugepage mounts (or modify them) after starting libvirtd, your hugepage-backed guests will fail to start until libvirtd is restarted. In the NUMA/capability code the bug is that if you enable/disable any host cpus, we use stale data when translating a numa node map into a full cpumap in virCapabilitiesGetCpusForNode. Cgroups code is not happy when attempting to pin a thread to a non-existing cpu.]
Maybe it can be dealt with the same way.
Possibly. It probably isn't very expensive to iterate the numa node maps. I'm not sure about the hugepage mounts, but in that case the guest is probably going to be huge, so some overhead might be okay.
My feelings towards doing this are slightly mixed, but ACK
Thanks, pinky swear that I'll remove it in near future ;)
participants (2)
-
Martin Kletzander
-
Peter Krempa