On Mon, 27 Jul 2020 16:29:20 +0100
Daniel P. Berrangé <berrange(a)redhat.com> wrote:
On Fri, Jun 19, 2020 at 06:04:33PM -0300, Daniel Henrique Barboza wrote:
> PPC64 has two KVM modules: kvm_hv and kvm_pr. The officially supported
> module was always kvm_hv, while kvm_pr was used for internal testing
> or for very niche cases in Power 8 hosts, always without official
> IBM or distro support.
>
> Problem is, QMP will report KVM support for PPC64 if any of these
> modules is loaded in the host, and kvm_pr is broken in everything
> but Power8 (and will remain broken, since kvm_pr is unmaintained).
> This can lead to situations like [1], where the tooling is misled to
> believe that the host has KVM capabilities when in reality it
> doesn't.
>
> The first reaction would be to simply forsake kvm_pr support entirely
> and move on, but there is no reason for now to be disruptive with any
> Power8 guests in the wild that are using kvm_pr (somehow). A more
> subtle approach is to not claim QEMU_CAPS_KVM support in all cases
> that we know it's completely broken, allowing Power8 users to take
> their shot using kvm_pr in their VMs. We can remove kvm_pr support
> completely when the module is removed from the kernel.
I'm not sure libvirt should be forbidding this use of kvm_pr
on non-Power8. IIUC, it is the only impl that can actually be
used when in a Power9 LPAR. This patch is essentially saying
that TCG is better for Power LPARs than kvm_pr which I think
is not right.
On a POWER9 system, PR KVM can only be loaded if the kernel is using
the legacy HPT MMU mode (which is the case in LPARs AFAIK), not the
newer radix MMU mode that was introduced for POWER9. I agree that
PR KVM is likely to provide a better experience to the user than
TCG.
The problem is that modern QEMU ppc machine types don't work
on kvm_pr without a bunch of extra CLI args to disable use
of various features that are missing in kvm_pr. If you do pass
those CLI args, though, things can work to some extent.
IIUC, TCG suffers from the same problem with these missing
features, although the BZ report below does not seem to
confirm that belief.
Yes, both TCG and PR KVM lack features, but we have relaxed our
feature checks when using TCG so that we still can start a VM,
for the sake of "make check" on non-POWER hardware. QEMU just
prints some warnings in this case.
We've always tried to avoid PR KVM specific paths in QEMU, even
if we do have a few of them in the current code base. I don't
think we want to add more.
If someone is able to successfully use kvm_pr with QEMU without
using libvirt, then it makes libvirt look bad if we refuse to
allow the same setup. So if we're going to declare that kvm_pr
is not supported for QEMU on non-Power8, then I feel like QEMU
should be the thing declaring that. We probe QEMU via QMP to
see if it supports KVM. IOW, if QEMU doesn't want to support
kvm_pr, it should report kvm disabled when asked.
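For readers unfamiliar with the probe being discussed, here is a minimal
sketch of what "asking QEMU via QMP" amounts to. It only builds the
query-kvm command and interprets a reply; the helper names are made up
for illustration, this is not libvirt's code, and the QMP capability
negotiation that must precede any command is left out.

```python
import json


def build_query_kvm() -> str:
    """Return the QMP command that asks QEMU about its KVM state."""
    return json.dumps({"execute": "query-kvm"})


def kvm_usable(reply: str) -> bool:
    """Interpret a query-kvm reply (hypothetical helper).

    QEMU answers with {"return": {"enabled": <bool>, "present": <bool>}}.
    Anything else, including an error reply, is treated as "no KVM".
    """
    ret = json.loads(reply).get("return", {})
    if not isinstance(ret, dict):
        return False
    return bool(ret.get("present")) and bool(ret.get("enabled"))


if __name__ == "__main__":
    print(build_query_kvm())
    print(kvm_usable('{"return": {"enabled": true, "present": true}}'))
```

The point made above is that the reply does not say which KVM
implementation is behind it, so a client cannot tell kvm_hv from kvm_pr
through this command alone.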
Well... using PR KVM requires extra options to be passed to
the machine, so if the reference is to be able to run with
default settings, libvirt would simply match what the user
can do with QEMU, ie. nothing.
On the libvirt side I think we need to focus more on awareness
of the problem, e.g. virt-host-validate should ERROR if kvm_pr
is loaded on a host which is capable of kvm_hv. It might want
to WARN if kvm_pr is used on any other host, if that's not
too aggressive?
To make things funnier, it is possible to have kvm_hv and kvm_pr
loaded at the same time. The pseries machine implements the
kvm_type() machine hook, which allows selecting the KVM flavor
to be used when calling the KVM_CREATE_VM ioctl:
$ qemu-system-ppc64 -M pseries,help | grep kvm-type
pseries-4.2.kvm-type=string (Specifies the KVM virtualization mode (HV, PR))
If kvm-type isn't provided, KVM internally decides which implementation
to use (HV prevails).
So I'm not sure that checking if kvm_pr is loaded brings much...
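To illustrate why a "which module is loaded" check is ambiguous, here is
a rough sketch that scans /proc/modules-style text for both flavors and
builds the corresponding machine option. The function names are
hypothetical, and it assumes the implementations are built as loadable
modules named kvm_hv and kvm_pr (built-in code would not show up in
/proc/modules).

```python
# Map kernel module names to the kvm-type values listed by
# "-M pseries,help" (HV, PR).
KVM_MODULES = {"kvm_hv": "HV", "kvm_pr": "PR"}


def loaded_kvm_flavors(proc_modules_text: str) -> set:
    """Return the set of KVM flavors found in /proc/modules-style text."""
    flavors = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each line is the module name.
        if fields and fields[0] in KVM_MODULES:
            flavors.add(KVM_MODULES[fields[0]])
    return flavors


def machine_option(flavor: str) -> str:
    """Build the -M value that pins the pseries machine to one flavor."""
    if flavor not in KVM_MODULES.values():
        raise ValueError(f"unknown KVM flavor: {flavor!r}")
    return f"pseries,kvm-type={flavor}"


if __name__ == "__main__":
    with open("/proc/modules") as f:
        found = loaded_kvm_flavors(f.read())
    print(sorted(found))
    for flavor in sorted(found):
        print("-M", machine_option(flavor))
```

When both flavors are reported, the module list alone does not tell you
which one a given VM will end up using; that depends on kvm-type or, if
it is not set, on the kernel's own choice.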
Possibly the capabilities XML should have some way to report
which KVM impl is present. Not sure what this would look
like though.
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1843865
That BZ is reported against RHEL downstream, and from a RHEL POV I
think the better answer is to simply not build the kvm_pr kernel
module in the first place, since it was never considered a supported
KVM impl.
This would certainly put an end to the BZ, and I guess a RHEL
user is unlikely to use PR, so I tend to agree that this is
probably better addressed downstream.
Regards,
Daniel
Cheers,
--
Greg