Sure, I do understand that Red Hat (or any other vendor) is taking no
support responsibility for this. At this point I'd just like to
contribute to a better understanding of what's expected to definitely
_not_ work, so that people don't bloody their noses on that. :)
Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
just doesn't work when trying to migrate a nested hypervisor (on x86).
That's what most people don't realize, because it works "just fine" for
99% of all use cases.
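Just to spell out what "nesting" means here: on an Intel host it is
controlled by the kvm_intel module parameter, roughly like this (the
file name under modprobe.d is only an example):

  # on the L0 host: check whether nested VMX is enabled (Y/N or 1/0)
  cat /sys/module/kvm_intel/parameters/nested
  # enable it persistently, then reload kvm_intel (or reboot)
  echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf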
[...]
>
> savevm/loadvm is not expected to work correctly on an L1 if it is
> running L2 guests. It should work on L2 however.
Again, I'm somewhat struggling to understand this vs. live migration —
but it's entirely possible that I'm sorely lacking in my knowledge of
kernel and CPU internals.
(savevm/loadvm is also called "migration to file")
When we migrate to a file, it really is the same migration stream. You
"dump" the VM state into a file, instead of sending it over to another
(running) target.
Once you load your VM state from that file, it is a completely fresh
VM/KVM environment. So you have to restore all the state. Now, as nVMX
state is not contained in the migration stream, you cannot restore that
state. The L1 state is therefore "damaged" or incomplete.
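To make "migration to file" concrete, these are the kinds of operations
we are talking about (guest name, file path and snapshot tag are just
examples):

  # libvirt's "migration to file": dump the L1 state, restore it later
  virsh save l1-guest /var/tmp/l1-guest.sav
  virsh restore /var/tmp/l1-guest.sav

  # or as an internal snapshot via the QEMU monitor
  (qemu) savevm snap1
  (qemu) loadvm snap1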
[...]
>> Kashyap, can you think of any other limitations that would benefit
>> from improved documentation?
>
> We should certainly document what I have summarized here properly at a
> central place!
I tried getting registered on the linux-kvm.org wiki to do exactly
that, and ran into an SMTP/DNS configuration issue with the
verification email. Kashyap said he was going to poke the site admin
about that.
Now, here's a bit more information on my continued testing. As I
mentioned on IRC, one of the things that struck me as odd was that if
I ran into the issue previously described, the L1 guest would enter a
reboot loop if configured with kernel.panic_on_oops=1. In other words,
I would savevm the L1 guest (with a running L2), then loadvm it, and
then the L1 would stack-trace, reboot, and then keep doing that
indefinitely. I found that weird because on the second reboot, I would
expect the system to come up cleanly.
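For the record, the reboot-on-oops behaviour in my L1 comes from
sysctls along these lines; an actual reboot loop also needs
kernel.panic set to a nonzero timeout (the value below is just an
example):

  # inside the L1 guest: turn any oops into a panic, then reboot
  # a few seconds after the panic
  sysctl -w kernel.panic_on_oops=1
  sysctl -w kernel.panic=5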
I guess the L1 state (in the kernel) is broken so badly that even a
reset cannot fix it.
I've now changed my L2 guest's CPU configuration so that libvirt (in
L1) starts the L2 guest with the following settings:
<cpu>
  <model fallback='forbid'>Haswell-noTSX</model>
  <vendor>Intel</vendor>
  <feature policy='disable' name='vme'/>
  <feature policy='disable' name='ss'/>
  <feature policy='disable' name='f16c'/>
  <feature policy='disable' name='rdrand'/>
  <feature policy='disable' name='hypervisor'/>
  <feature policy='disable' name='arat'/>
  <feature policy='disable' name='tsc_adjust'/>
  <feature policy='disable' name='xsaveopt'/>
  <feature policy='disable' name='abm'/>
  <feature policy='disable' name='aes'/>
  <feature policy='disable' name='invpcid'/>
</cpu>
Maybe one of these features is the root cause of the "messed up" state
in KVM, so disabling it also leaves the L1 state "less broken".
Basically, I am disabling every single feature that my L1's "virsh
capabilities" reports. Now this does not make my L1 come up happily
from loadvm. But it does seem to initiate a clean reboot after loadvm,
and after that clean reboot it lives happily.
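In case it helps anyone reproducing this, a crude but sufficient way to
see that feature list inside L1 is:

  # inside the L1 guest
  virsh capabilities | grep "<feature"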
If this is as good as it gets (for now), then I can totally live with
that. It certainly beats running the L2 guest with QEMU (without KVM
acceleration). But I would still love to understand the issue a little
bit better.
I mean, the real solution to the problem is of course restoring the L1
state correctly (migrating the nVMX state, which people are working on
right now). What you are seeing is a bad "side effect" of that state
not being migrated.
For now, nested=true should never be used along with savevm/loadvm/live
migration.
Cheers,
Florian
--
Thanks,
David / dhildenb