Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)

Thursday, 8 February 2018

On Thu, Feb 8, 2018 at 2:47 PM, David Hildenbrand <david(a)redhat.com> wrote:
...
> Again, I'm somewhat struggling to understand this vs. live migration,
> but it's entirely possible that I'm sorely lacking in my knowledge of
> kernel and CPU internals.

 (savevm/loadvm is also called "migration to file")

 When we migrate to a file, it really is the same migration stream. You
 "dump" the VM state into a file, instead of sending it over to another
 (running) target.

 Once you load your VM state from that file, it is a completely fresh
 VM/KVM environment. So you have to restore all the state. Now, as nVMX
 state is not contained in the migration stream, you cannot restore that
 state. The L1 state is therefore "damaged" or incomplete.

*lightbulb* Thanks a lot, that's a perfectly logical explanation. :)
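To make the "migration to file" point concrete, here is a toy Python model. It is purely illustrative and hypothetical (`ToyVMState`, `save_to_file` and `load_from_file` are not QEMU or KVM code). The toy stream carries only the fields it knows about, so a field standing in for nVMX state comes back empty when the file is loaded into a fresh object, just as a fresh VM/KVM environment cannot recover state the migration stream never contained.

```python
# Toy model only: hypothetical names, not QEMU/KVM internals.
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVMState:
    registers: dict = field(default_factory=dict)
    memory: dict = field(default_factory=dict)
    # Stand-in for the in-kernel nested VMX (nVMX) state of a running L2.
    nvmx: Optional[dict] = None


# The fields this toy "migration stream" knows how to carry.
# "nvmx" is deliberately missing, mirroring the situation described above.
STREAM_FIELDS = ("registers", "memory")


def save_to_file(state: ToyVMState, path: str) -> None:
    """Write the same stream a live migration would send, but to a file."""
    stream = {name: getattr(state, name) for name in STREAM_FIELDS}
    with open(path, "w") as f:
        json.dump(stream, f)


def load_from_file(path: str) -> ToyVMState:
    """Rebuild state into a completely fresh object, as on loadvm."""
    with open(path) as f:
        stream = json.load(f)
    # Anything not in the stream falls back to its default (nvmx=None).
    return ToyVMState(**stream)


if __name__ == "__main__":
    vm = ToyVMState({"rip": 4096}, {"0x1000": 144}, nvmx={"l2_running": True})
    path = os.path.join(tempfile.mkdtemp(), "vm.state")
    save_to_file(vm, path)
    restored = load_from_file(path)
    print(restored.registers == vm.registers)  # True
    print(restored.nvmx)  # None: the restored L1 state is incomplete
```

The model only shows why the state cannot be restored; it says nothing about how the guest kernel then reacts to the missing state.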

...
> Now, here's a bit more information on my continued testing. As I
> mentioned on IRC, one of the things that struck me as odd was that if
> I ran into the issue previously described, the L1 guest would enter a
> reboot loop if configured with kernel.panic_on_oops=1. In other words,
> I would savevm the L1 guest (with a running L2), then loadvm it, and
> then the L1 would stack-trace, reboot, and then keep doing that
> indefinitely. I found that weird because on the second reboot, I would
> expect the system to come up cleanly.

 I guess the L1 state (in the kernel) is broken so badly that even a
 reset cannot fix it.

... which would also explain why, by contrast, a virsh destroy/virsh
start cycle does fix things.

...
> I've now changed my L2 guest's CPU configuration so that libvirt (in
> L1) starts the L2 guest with the following settings:
>
> <cpu>
>     <model fallback='forbid'>Haswell-noTSX</model>
>     <vendor>Intel</vendor>
>     <feature policy='disable' name='vme'/>
>     <feature policy='disable' name='ss'/>
>     <feature policy='disable' name='f16c'/>
>     <feature policy='disable' name='rdrand'/>
>     <feature policy='disable' name='hypervisor'/>
>     <feature policy='disable' name='arat'/>
>     <feature policy='disable' name='tsc_adjust'/>
>     <feature policy='disable' name='xsaveopt'/>
>     <feature policy='disable' name='abm'/>
>     <feature policy='disable' name='aes'/>
>     <feature policy='disable' name='invpcid'/>
> </cpu>
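For anyone who wants to script this, a `<cpu>` block of the shape quoted above can be generated with Python's standard `xml.etree.ElementTree`. This is a sketch: `build_cpu_xml` is a hypothetical helper, not part of libvirt or its Python bindings, and it only reproduces the element's structure (a model with `fallback='forbid'`, a vendor, and features with `policy='disable'`). `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(model: str, vendor: str, disabled: list[str]) -> str:
    """Build a libvirt-style <cpu> element that disables the given features.

    Hypothetical helper for illustration; not a libvirt API.
    """
    cpu = ET.Element("cpu")
    # fallback='forbid': do not let libvirt substitute a different CPU model.
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    ET.SubElement(cpu, "vendor").text = vendor
    for name in disabled:
        ET.SubElement(cpu, "feature", policy="disable", name=name)
    ET.indent(cpu, space="    ")  # pretty-print; Python 3.9+
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    print(build_cpu_xml("Haswell-noTSX", "Intel", ["vme", "ss", "f16c"]))
```

The printed element could then be pasted into the L2 guest's domain definition, for example with `virsh edit`.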

 Maybe one of these features is the root cause of the "messed up" state
 in KVM, so disabling it also makes the L1 state "less broken".

Would you care to guess which of the above features is the likely culprit?

...
> Basically, I am disabling every single feature that my L1's "virsh
> capabilities" reports. Now this does not make my L1 come up happily
> from loadvm. But it does seem to initiate a clean reboot after loadvm,
> and after that clean reboot it lives happily.
>
> If this is as good as it gets (for now), then I can totally live with
> that. It certainly beats running the L2 guest with Qemu (without KVM
> acceleration). But I would still love to understand the issue a little
> bit better.
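The disable list came from the host CPU section of `virsh capabilities` run inside L1. A minimal sketch of that extraction step, assuming the command's XML output has already been captured as a string and that the relevant entries are the `<feature name='...'/>` children of `<host><cpu>`. `host_cpu_features` is a hypothetical helper, and the sample XML is an abbreviated, made-up example in that shape, not real output from the machine discussed here.

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up sample in the shape of `virsh capabilities` output.
SAMPLE_CAPABILITIES = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Haswell-noTSX</model>
      <vendor>Intel</vendor>
      <feature name='vme'/>
      <feature name='ss'/>
      <feature name='f16c'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_features(capabilities_xml: str) -> list[str]:
    """Return the feature names listed under <host><cpu>, in document order."""
    root = ET.fromstring(capabilities_xml.strip())
    cpu = root.find("host/cpu")
    if cpu is None:
        return []
    return [f.get("name") for f in cpu.findall("feature") if f.get("name")]


if __name__ == "__main__":
    print(host_cpu_features(SAMPLE_CAPABILITIES))  # ['vme', 'ss', 'f16c']
```

Each name in the resulting list would then become one `<feature policy='disable' name='...'/>` entry in the L2 guest's `<cpu>` element.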

 I mean, the real solution to the problem is of course restoring the L1
 state correctly (migrating nVMX state, which people are working on right
 now). What you are seeing is a bad "side effect" of that state missing.

 For now, nested=true should never be used along with savevm/loadvm/live
 migration.

Yes, I gathered as much. :) Thanks again!

Cheers,
Florian
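Following the advice that nested=true should not be combined with savevm/loadvm/live migration for now, a host-side pre-flight check could refuse to save when nested KVM is enabled on L0. This is a conservative sketch under stated assumptions: the helper names are hypothetical, it only reads the `nested` parameter of the `kvm_intel`/`kvm_amd` modules under `/sys/module/`, and it does not detect whether an L2 guest is actually running inside L1. Depending on the module and kernel, the parameter may read as `Y`/`N` or `1`/`0`, so both forms are accepted.

```python
from pathlib import Path
from typing import Iterable

# Module parameters exposing whether nested virtualization is enabled on L0.
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def parse_nested(value: str) -> bool:
    """Interpret a module parameter value such as 'Y', 'N', '1' or '0'."""
    return value.strip().upper() in ("Y", "1")


def nested_enabled(paths: Iterable[Path] = NESTED_PARAMS) -> bool:
    """True if any readable KVM module reports nested virtualization as on."""
    for path in paths:
        try:
            if parse_nested(path.read_text()):
                return True
        except OSError:
            # Module not loaded (file absent) or not readable: skip it.
            continue
    return False


def ensure_safe_to_save(paths: Iterable[Path] = NESTED_PARAMS) -> None:
    """Hypothetical guard to run before savevm, managed save or migration."""
    if nested_enabled(paths):
        raise RuntimeError(
            "nested KVM is enabled on this host; saving or migrating an L1 "
            "guest that runs nested VMs may leave it in a broken state"
        )


if __name__ == "__main__":
    try:
        ensure_safe_to_save()
        print("no KVM module reports nested virtualization as enabled")
    except RuntimeError as err:
        print(f"refusing to save: {err}")
```

Checking only the module parameter errs on the side of caution: it blocks saves even when no L2 guest is running, which is the situation the warning above is actually about.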
