[PATCH 0/2] remove kvm_pr PPC64 support for non-Power8 hosts

The reasoning is described in the commit message of patch 02. Gitlab tree: https://gitlab.com/danielhb/libvirt/-/tree/kvmpr_cap_v1 Daniel Henrique Barboza (2): virt-host-validade-common: move virHostKernelModuleIsLoaded to virkmod qemu_capabilities.c: drop 'kvm_pr' support for non-Power8 hosts src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 38 +++++++++++++++++++++++++++++-- src/util/virkmod.c | 34 +++++++++++++++++++++++++++ src/util/virkmod.h | 1 + tools/virt-host-validate-common.c | 27 ---------------------- tools/virt-host-validate-common.h | 2 -- tools/virt-host-validate-qemu.c | 3 ++- 7 files changed, 74 insertions(+), 32 deletions(-) -- 2.26.2

This function will be used in a later patch by QEMU code. Rename it to virKModIsLoaded to be compliant with the other helpers. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/libvirt_private.syms | 1 + src/util/virkmod.c | 34 +++++++++++++++++++++++++++++++ src/util/virkmod.h | 1 + tools/virt-host-validate-common.c | 27 ------------------------ tools/virt-host-validate-common.h | 2 -- tools/virt-host-validate-qemu.c | 3 ++- 6 files changed, 38 insertions(+), 30 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 42f8d7c222..b86d7e3cb0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2404,6 +2404,7 @@ virKeycodeValueTranslate; # util/virkmod.h virKModConfig; virKModIsBlacklisted; +virKModIsLoaded; virKModLoad; virKModUnload; diff --git a/src/util/virkmod.c b/src/util/virkmod.c index 12cb5920e8..3110f3bf8b 100644 --- a/src/util/virkmod.c +++ b/src/util/virkmod.c @@ -24,6 +24,7 @@ #include "virkmod.h" #include "vircommand.h" #include "virstring.h" +#include "virfile.h" static int doModprobe(const char *opts, const char *module, char **outbuf, char **errbuf) @@ -60,6 +61,39 @@ doRmmod(const char *module, char **errbuf) return 0; } + +/** + * virKModModuleIsLoaded: + * + * Returns true if the module is loaded, false otherwise. + */ +bool virKModIsLoaded(const char *module) +{ + FILE *fp; + bool ret = false; + + if (!(fp = fopen("/proc/modules", "r"))) + return false; + + do { + char line[1024]; + + if (!fgets(line, sizeof(line), fp)) + break; + + if (STRPREFIX(line, module)) { + ret = true; + break; + } + + } while (1); + + VIR_FORCE_FCLOSE(fp); + + return ret; +} + + /** * virKModConfig: * diff --git a/src/util/virkmod.h b/src/util/virkmod.h index f05cf83cf6..db0d4aabe3 100644 --- a/src/util/virkmod.h +++ b/src/util/virkmod.h @@ -24,6 +24,7 @@ #include "internal.h" char *virKModConfig(void); +bool virKModIsLoaded(const char *); char *virKModLoad(const char *, bool) ATTRIBUTE_NONNULL(1); char *virKModUnload(const char *) diff --git a/tools/virt-host-validate-common.c b/tools/virt-host-validate-common.c index f68c9c7c96..5986c26a0d 100644 --- a/tools/virt-host-validate-common.c +++ b/tools/virt-host-validate-common.c @@ -417,33 +417,6 @@ int virHostValidateIOMMU(const char *hvname, } -bool virHostKernelModuleIsLoaded(const char *module) -{ - FILE *fp; - bool ret = false; - - if (!(fp = fopen("/proc/modules", "r"))) - return false; - - do { - char line[1024]; - - if (!fgets(line, sizeof(line), fp)) - break; - - if (STRPREFIX(line, module)) { - ret = true; - break; - } - - } while (1); - - VIR_FORCE_FCLOSE(fp); - - return ret; -} - - int virHostValidateSecureGuests(const char *hvname, virHostValidateLevel level) { diff --git a/tools/virt-host-validate-common.h b/tools/virt-host-validate-common.h index 3df5ea0c7e..87aeefe2bb 100644 --- a/tools/virt-host-validate-common.h +++ b/tools/virt-host-validate-common.h @@ -87,5 +87,3 @@ int virHostValidateIOMMU(const char *hvname, int virHostValidateSecureGuests(const char *hvname, virHostValidateLevel level); - -bool virHostKernelModuleIsLoaded(const char *module); diff --git a/tools/virt-host-validate-qemu.c b/tools/virt-host-validate-qemu.c index ea7f172790..e590057c9c 100644 --- a/tools/virt-host-validate-qemu.c +++ b/tools/virt-host-validate-qemu.c @@ -27,6 +27,7 @@ #include "virarch.h" #include "virbitmap.h" #include "vircgroup.h" +#include "virkmod.h" int virHostValidateQEMU(void) { @@ -92,7 +93,7 @@ int virHostValidateQEMU(void) if (arch == VIR_ARCH_PPC64 || arch == VIR_ARCH_PPC64LE) { virHostMsgCheck("QEMU", "%s", _("for PowerPC KVM module loaded")); - if (!virHostKernelModuleIsLoaded("kvm_hv")) + if (!virKModIsLoaded("kvm_hv")) virHostMsgFail(VIR_HOST_VALIDATE_WARN, _("Load kvm_hv for better performance")); else -- 2.26.2

PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support. Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't. The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1843865 CC: Leonardo Augusto Guimarães Garcia <lagarcia@br.ibm.com> CC: Greg Kurz <groug@kaod.org> CC: David Gibson <david@gibson.dropbear.id.au> CC: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/qemu/qemu_capabilities.c | 38 ++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 484fff99e5..b1c1d4dd70 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -49,6 +49,7 @@ #include "qemu_process.h" #include "qemu_firmware.h" #include "virutil.h" +#include "virkmod.h" #include <fcntl.h> #include <sys/stat.h> @@ -3242,6 +3243,31 @@ virQEMUCapsProbeQMPTPM(virQEMUCapsPtr qemuCaps, } +static void +virQEMUCapsSetPPC64KVMState(virQEMUCapsPtr qemuCaps, virArch hostArch) +{ + g_autoptr(virCPUDef) hostCPU = virCPUProbeHost(hostArch); + + /* At this moment, PPC64 has two KVM modules: kvm_hv and kvm_pr. + * QEMU will return KVM present and enabled = true if any of these + * is loaded in the host. Thing is, kvm_pr was never officially + * supported by IBM or any other distro, but people still ended + * up using it in Power8 hosts regardless. It is also currently + * unmaintained and broken on Power9, and will be sunset in the + * future. kvm_hv is the only KVM module that is and will be + * supported. + * + * Until then, do not claim QEMU_CAPS_KVM if there is only kvm_pr + * loaded in the host. There is a good chance that there are + * unsupported kvm_pr Power8 guests running in the wild, so let's + * cut some slack for this case, for now. */ + if (STRNEQLEN(hostCPU->model, "POWER8", 6) && !virKModIsLoaded("kvm_hv")) + return; + + virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); +} + + static int virQEMUCapsProbeQMPKVMState(virQEMUCapsPtr qemuCaps, qemuMonitorPtr mon) @@ -3252,8 +3278,16 @@ virQEMUCapsProbeQMPKVMState(virQEMUCapsPtr qemuCaps, if (qemuMonitorGetKVMState(mon, &enabled, &present) < 0) return -1; - if (present && enabled) - virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); + if (present && enabled) { + virArch hostArch = virArchFromHost(); + + if (ARCH_IS_PPC64(hostArch)) { + virQEMUCapsSetPPC64KVMState(qemuCaps, hostArch); + return 0; + } + + virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); + } return 0; } -- 2.26.2

On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm sorry I don't have a way to test this patch. However it looks reasonable from here based on the description above and the changes to the code below, so ACK. Rich.
CC: Leonardo Augusto Guimarães Garcia <lagarcia@br.ibm.com> CC: Greg Kurz <groug@kaod.org> CC: David Gibson <david@gibson.dropbear.id.au> CC: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/qemu/qemu_capabilities.c | 38 ++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 484fff99e5..b1c1d4dd70 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -49,6 +49,7 @@ #include "qemu_process.h" #include "qemu_firmware.h" #include "virutil.h" +#include "virkmod.h"
#include <fcntl.h> #include <sys/stat.h> @@ -3242,6 +3243,31 @@ virQEMUCapsProbeQMPTPM(virQEMUCapsPtr qemuCaps, }
+static void +virQEMUCapsSetPPC64KVMState(virQEMUCapsPtr qemuCaps, virArch hostArch) +{ + g_autoptr(virCPUDef) hostCPU = virCPUProbeHost(hostArch); + + /* At this moment, PPC64 has two KVM modules: kvm_hv and kvm_pr. + * QEMU will return KVM present and enabled = true if any of these + * is loaded in the host. Thing is, kvm_pr was never officially + * supported by IBM or any other distro, but people still ended + * up using it in Power8 hosts regardless. It is also currently + * unmaintained and broken on Power9, and will be sunset in the + * future. kvm_hv is the only KVM module that is and will be + * supported. + * + * Until then, do not claim QEMU_CAPS_KVM if there is only kvm_pr + * loaded in the host. There is a good chance that there are + * unsupported kvm_pr Power8 guests running in the wild, so let's + * cut some slack for this case, for now. */ + if (STRNEQLEN(hostCPU->model, "POWER8", 6) && !virKModIsLoaded("kvm_hv")) + return; + + virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); +} + + static int virQEMUCapsProbeQMPKVMState(virQEMUCapsPtr qemuCaps, qemuMonitorPtr mon) @@ -3252,8 +3278,16 @@ virQEMUCapsProbeQMPKVMState(virQEMUCapsPtr qemuCaps, if (qemuMonitorGetKVMState(mon, &enabled, &present) < 0) return -1;
- if (present && enabled) - virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); + if (present && enabled) { + virArch hostArch = virArchFromHost(); + + if (ARCH_IS_PPC64(hostArch)) { + virQEMUCapsSetPPC64KVMState(qemuCaps, hostArch); + return 0; + } + + virQEMUCapsSet(qemuCaps, QEMU_CAPS_KVM); + }
return 0; } -- 2.26.2
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org

On Fri, 2020-06-19 at 18:04 -0300, Daniel Henrique Barboza wrote:
+static void +virQEMUCapsSetPPC64KVMState(virQEMUCapsPtr qemuCaps, virArch hostArch) +{ + g_autoptr(virCPUDef) hostCPU = virCPUProbeHost(hostArch); + + /* At this moment, PPC64 has two KVM modules: kvm_hv and kvm_pr. + * QEMU will return KVM present and enabled = true if any of these + * is loaded in the host. Thing is, kvm_pr was never officially + * supported by IBM or any other distro, but people still ended + * up using it in Power8 hosts regardless. It is also currently + * unmaintained and broken on Power9, and will be sunset in the + * future. kvm_hv is the only KVM module that is and will be + * supported. + * + * Until then, do not claim QEMU_CAPS_KVM if there is only kvm_pr + * loaded in the host. There is a good chance that there are + * unsupported kvm_pr Power8 guests running in the wild, so let's + * cut some slack for this case, for now. */ + if (STRNEQLEN(hostCPU->model, "POWER8", 6) && !virKModIsLoaded("kvm_hv"))
The macro you're looking for is STRPREFIX() :) Anyway, the patch overall makes sense to me, but I'm wondering: since generally we get KVM availability information from QEMU, is libvirt the right place for this kind of check, or should it rather happen one layer down the stack so that QEMU reports correct[*] KVM status information to clients other than libvirt as well? David, what do you think? [*] It's fairly questionable whether reporting KVM as not available when only kvm_pr is loaded is correct. It is certainly helpful, but correct? Not entirely sure. -- Andrea Bolognani / Red Hat / Virtualization

On 6/30/20 2:35 PM, Andrea Bolognani wrote:
On Fri, 2020-06-19 at 18:04 -0300, Daniel Henrique Barboza wrote:
+static void +virQEMUCapsSetPPC64KVMState(virQEMUCapsPtr qemuCaps, virArch hostArch) +{ + g_autoptr(virCPUDef) hostCPU = virCPUProbeHost(hostArch); + + /* At this moment, PPC64 has two KVM modules: kvm_hv and kvm_pr. + * QEMU will return KVM present and enabled = true if any of these + * is loaded in the host. Thing is, kvm_pr was never officially + * supported by IBM or any other distro, but people still ended + * up using it in Power8 hosts regardless. It is also currently + * unmaintained and broken on Power9, and will be sunset in the + * future. kvm_hv is the only KVM module that is and will be + * supported. + * + * Until then, do not claim QEMU_CAPS_KVM if there is only kvm_pr + * loaded in the host. There is a good chance that there are + * unsupported kvm_pr Power8 guests running in the wild, so let's + * cut some slack for this case, for now. */ + if (STRNEQLEN(hostCPU->model, "POWER8", 6) && !virKModIsLoaded("kvm_hv"))
The macro you're looking for is STRPREFIX() :)
Thanks!
Anyway, the patch overall makes sense to me, but I'm wondering: since generally we get KVM availability information from QEMU, is libvirt the right place for this kind of check, or should it rather happen one layer down the stack so that QEMU reports correct[*] KVM status information to clients other than libvirt as well? David, what do you think?
The reason I did it in Libvirt was to make the tooling and customers aware that kvm_pr is not officially supported, but at the same time not hinder the usage of kvm_pr completely for users using it via QEMU command line. People are still using kvm_pr eventually in niche cases, such as internal tests and development, specially because hvm_hv does not replace kvm_pr entirely yet (e.g. kvm_hv does not work in a LPAR).
[*] It's fairly questionable whether reporting KVM as not available when only kvm_pr is loaded is correct. It is certainly helpful, but correct? Not entirely sure.
Yeah, this is not ideal. Thing is that, in the Libvirt level, I can cater to customers/tooling and filter kvm_pr out of the game (i.e. Bugzillas). In QEMU what would happen is either (1) more people will be impacted and hating my guts because I blocked kvm_pr for unsupported but valid usage or (2) I would be NACKed in QEMU (I mean, what's the technical reason for me to claim "kvm_pr is not real KVM"?) and everything will stay as is, everyone thinking that kvm_pr is usable in both QEMU and Libvirt - and opening bugs - until kvm_pr is removed. TBH I don't see any pleasant solution for this kvm_pr situation. Leaving it as is will keep bugs flowing in until it gets removed, filtering it out under Libvirt/QEMU is kind of lame and you won't see me denying it. This patch is not my finest hour, that's for sure :) Thanks, DHB

On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr on non-Power8. IIUC, it is the only impl that can actually be used when in a Power9 LPARs. This patch is essentially saying that TCG is better for Power LPARs than kvm_pr which I think is not right. The problem is that modern QEMU ppc machine types don't work on kvm_pr without a bunch of extra CLI args to disable use of varoous missing features in kvm_pr. If you do pass though CLI args though, things can work to some extent. IIUC, TCG suffers from the same problem with these missing features though, though the BZ report below does not seem to confirm that belief. If someone is able to succesfully use kvm_pr with QEMU without using libvirt, then it makes libvirt look bad if we refuse to allow the same setup. So if we're going to declare that kvm_pr is not supported for QEMU on non-Power8, then I feel like QEMU should be the thing declaring that. We probe QEMU via QMP to see if it supports KVM. IOW, if QEMU doesn't want to support kvm_pr, it should report kvm disabled when asked. On the libvirt side I think we need to focus more on awareness of the problem. eg virt-host-validate should ERROR is kvm_pr is loaded on a host which is capable of kvm_hv. It might want to WARN if kvm_pr is used on any other host, if that's not too aggressive ? Possibly the capabilities XML should have some way to report which KVM impl is present. Not sure what this would look like though.
That BZ is reported against RHEL downstream, and from a RHEL POV I think the better answer is to simply not build the kvm_pr kernel module in the first place, since it was never considered a supported KVM impl. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 7/27/20 12:29 PM, Daniel P. Berrangé wrote:
On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr on non-Power8. IIUC, it is the only impl that can actually be used when in a Power9 LPARs. This patch is essentially saying that TCG is better for Power LPARs than kvm_pr which I think is not right.
We didn't work in kvm_pr support for Power 9. All the effort was put into implementing nested virtualization with the kvm_hv module. The fact that kvm_pr module loads in Power 9 is kind of an "accident". As you can see in the bug linked, it's non-functional. Things get messy because, well, kvm_hv does not work in P9 LPAR either. This use case is then left with TCG and the only (sort of) viable option for this scenario.
The problem is that modern QEMU ppc machine types don't work on kvm_pr without a bunch of extra CLI args to disable use of varoous missing features in kvm_pr. If you do pass though CLI args though, things can work to some extent.
IIUC, TCG suffers from the same problem with these missing features though, though the BZ report below does not seem to confirm that belief.
If someone is able to succesfully use kvm_pr with QEMU without using libvirt, then it makes libvirt look bad if we refuse to allow the same setup. So if we're going to declare that kvm_pr is not supported for QEMU on non-Power8, then I feel like QEMU should be the thing declaring that. We probe QEMU via QMP to see if it supports KVM. IOW, if QEMU doesn't want to support kvm_pr, it should report kvm disabled when asked.
The more I think about this the more I'm convinced that my approach here is not ideal and a QEMU side fix is better. At the very least we can center these decisions in the QEMU layer, while with these patches one would need to change both QEMU and Libvirt if something changes.
On the libvirt side I think we need to focus more on awareness of the problem. eg virt-host-validate should ERROR is kvm_pr is loaded on a host which is capable of kvm_hv. It might want to WARN if kvm_pr is used on any other host, if that's not too aggressive ?
Possibly the capabilities XML should have some way to report which KVM impl is present. Not sure what this would look like though.
That BZ is reported against RHEL downstream, and from a RHEL POV I think the better answer is to simply not build the kvm_pr kernel module in the first place, since it was never considered a supported KVM impl.
This would be ideal. Perhaps we should just rush into removing kvm_pr from the build and call it a day. People could keep using kvm_pr with QEMU/Libvirt by building the module themselves, at their own peril. Thanks, DHB
Regards, Daniel

On Mon, Jul 27, 2020 at 01:44:07PM -0300, Daniel Henrique Barboza wrote:
On 7/27/20 12:29 PM, Daniel P. Berrangé wrote:
On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr on non-Power8. IIUC, it is the only impl that can actually be used when in a Power9 LPARs. This patch is essentially saying that TCG is better for Power LPARs than kvm_pr which I think is not right.
We didn't work in kvm_pr support for Power 9. All the effort was put into implementing nested virtualization with the kvm_hv module. The fact that kvm_pr module loads in Power 9 is kind of an "accident". As you can see in the bug linked, it's non-functional.
If kvm_pr is known to be broken in Power 9 LPARs, then that is an upstream kernel bug for allowing it to be loaded under Power 9 at all. Can we fix the kernel to prevent it loading in cases where it is known to be bad. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, 27 Jul 2020 17:47:58 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Mon, Jul 27, 2020 at 01:44:07PM -0300, Daniel Henrique Barboza wrote:
On 7/27/20 12:29 PM, Daniel P. Berrangé wrote:
On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr on non-Power8. IIUC, it is the only impl that can actually be used when in a Power9 LPARs. This patch is essentially saying that TCG is better for Power LPARs than kvm_pr which I think is not right.
We didn't work in kvm_pr support for Power 9. All the effort was put into implementing nested virtualization with the kvm_hv module. The fact that kvm_pr module loads in Power 9 is kind of an "accident". As you can see in the bug linked, it's non-functional.
I haven't tried to use PR within an LPAR on POWER9 but I expect it should be usable by QEMU if you pass options to the machine, eg. cap-large-decr=off for a start (and maybe some more).
If kvm_pr is known to be broken in Power 9 LPARs, then that is an upstream kernel bug for allowing it to be loaded under Power 9 at all. Can we fix the kernel to prevent it loading in cases where it is known to be bad.
The only known case where PR cannot work is POWER9+radix, in which case the module cannot be loaded.
Regards, Daniel
Cheers, -- Greg

On Mon, 27 Jul 2020 16:29:20 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
PPC64 has two KVM modules: kvm_hv and kvm_pr. The official supported module was always kvm_hv, while kvm_pr was used for internal testing or for very niche cases in Power 8 hosts, always without official IBM or distro support.
Problem is, QMP will report KVM supportfor PPC64 if any of these modules is loaded in the host, and kvm_pr is broken in everything but Power8 (and will remain broken, since kvm_pr is unmaintained). This can lead to situations like [1], where the tooling is misled to believe that the host has KVM capabilities when in reality it doesn't.
The first reaction would be to simply forsake kvm_pr support entirely and move on, but there is no reason for now to be disruptive with any Power8 guests in the wild that are using kvm_pr (somehow). A more subtle approach is to not claim QEMU_CAPS_KVM support in all cases that we know it's completely broken, allowing Power8 users to take their shot using kvm_pr in their VMs. We can remove kvm_pr support completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr on non-Power8. IIUC, it is the only impl that can actually be used when in a Power9 LPARs. This patch is essentially saying that TCG is better for Power LPARs than kvm_pr which I think is not right.
On a POWER9 system, PR KVM can only be loaded if the kernel is using the legacy HPT MMU mode (which is the case in LPARs AFAIK), not the newer radix MMU mode that was introduced for POWER9. I agree that PR KVM is likely to provide a better experience to the user than TCG.
The problem is that modern QEMU ppc machine types don't work on kvm_pr without a bunch of extra CLI args to disable use of varoous missing features in kvm_pr. If you do pass though CLI args though, things can work to some extent.
IIUC, TCG suffers from the same problem with these missing features though, though the BZ report below does not seem to confirm that belief.
Yes, both TCG and PR KVM lack features, but we have relaxed our feature checks when using TCG so that we still can start a VM, for the sake of "make check" on non-POWER hardware. QEMU just prints some warnings in this case. We've always tried to avoid PR KVM specific paths in QEMU, even if we do have a few of them in the current code base. I don't think we want to add more.
If someone is able to succesfully use kvm_pr with QEMU without using libvirt, then it makes libvirt look bad if we refuse to allow the same setup. So if we're going to declare that kvm_pr is not supported for QEMU on non-Power8, then I feel like QEMU should be the thing declaring that. We probe QEMU via QMP to see if it supports KVM. IOW, if QEMU doesn't want to support kvm_pr, it should report kvm disabled when asked.
Well... using PR KVM requires extra options to be passed to the machine, so if the reference is to be able to run with default settings, libvirt would simply match what the user can do with QEMU, ie. nothing.
On the libvirt side I think we need to focus more on awareness of the problem. eg virt-host-validate should ERROR is kvm_pr is loaded on a host which is capable of kvm_hv. It might want to WARN if kvm_pr is used on any other host, if that's not too aggressive ?
To make things funnier, it is possible to have kvm_hv and kvm_pr loaded at the same time. The pseries machine implement the kvm_type() machine hook, which allows to select the KVM flavor to be used when calling the KVM_CREATE_VM ioctl: $ qemu-system-ppc64 -M pseries,help | grep kvm-type pseries-4.2.kvm-type=string (Specifies the KVM virtualization mode (HV, PR)) If kvm-type isn't provided, KVM internally decides which implementation to use (HV prevails). So I'm not sure that checking if kvm_pr is loaded brings much...
Possibly the capabilities XML should have some way to report which KVM impl is present. Not sure what this would look like though.
That BZ is reported against RHEL downstream, and from a RHEL POV I think the better answer is to simply not build the kvm_pr kernel module in the first place, since it was never considered a supported KVM impl.
This would certainly put an end to the BZ, and I guess a RHEL user is unlikely to use PR so I tend to agree that we should probably better address that downstream.
Regards, Daniel
Cheers, -- Greg
participants (5)
-
Andrea Bolognani
-
Daniel Henrique Barboza
-
Daniel P. Berrangé
-
Greg Kurz
-
Richard W.M. Jones