[Cc: KVM upstream list.]
On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
> Hi everyone,
>
> I hope this is the correct list to discuss this issue; please feel
> free to redirect me otherwise.
>
> I have a nested virtualization setup that looks as follows:
>
> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
> - Nested guest: SLES 12, kernel 3.12.28-4-default
>
> The nested guest is configured with "<type arch='x86_64'
> machine='pc-i440fx-1.4'>hvm</type>".
>
> This is working just beautifully, except when the L0 guest wakes up
> from managed save (openstack server resume in OpenStack parlance).
> Then, in the L0 guest we immediately see this:
[...] # Snip the call trace from Florian. It is here:
https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html
> What does fix things, of course, is to switch the nested guest from
> KVM to QEMU, but that also makes things significantly slower.
>
> So I'm wondering: is there someone reading this who does run nested
> KVM and has managed to successfully live-migrate or managed-save? If
> so, would you be able to share a working host kernel / L0 guest kernel
> / nested guest kernel combination, or any other hints for tuning the
> L0 guest to support managed save and live migration?
Following up from our IRC discussion (on #kvm, Freenode). Re-posting my
comment here:
So I just did a test of 'managedsave' (which is simply "save the state
of the running VM to a file" in libvirt parlance) of L1, _while_ L2 was
running, and I seem to have reproduced your case (see the call trace
attached).
# Ensure L2 (the nested guest) is running on L1. Then, from L0, do
# the following:
[L0] $ virsh managedsave L1
[L0] $ virsh start L1 --console
Result: See the call trace attached to this bug. But L1 goes on to
start "fine", and L2 keeps running, too. However, things then start to
behave oddly. For example, I try to safely mount the L2 disk image
read-only via libguestfs (by setting 'export LIBGUESTFS_BACKEND=direct',
which uses QEMU directly rather than going through libvirt):
`guestfish --ro -a ./cirros.qcow2 -i`. That throws the call trace again
on the L1 serial console, and the `guestfish` command just sits there
forever.
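To spell out the exact sequence (a sketch of what I ran; in my test the
image lives on L1, and the path is just my local CirrOS copy):

# From L1, where the L2 disk image resides:
[L1] $ export LIBGUESTFS_BACKEND=direct      # use QEMU directly, bypass libvirtd
[L1] $ guestfish --ro -a ./cirros.qcow2 -i   # read-only, with automatic inspection/mount

The second command is what hangs and re-triggers the call trace on the
L1 serial console.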
- L0 (bare metal) Kernel: 4.13.13-300.fc27.x86_64+debug
- L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
- L2 is a CirrOS 3.5 image
I could reproduce this at least three times with the above versions.
I'm using libvirt 'host-passthrough' for CPU (meaning: '-cpu host' in
QEMU parlance) for both L1 and L2.
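(In domain XML terms that is, roughly, the following fragment in both
the L1 and the L2 guest definitions; a sketch, not a copy of my exact
configs:

  <cpu mode='host-passthrough'/>

libvirt then translates that into '-cpu host' on the QEMU command
line.)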
My L0 CPU is: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.
Thoughts?
In short: there is no (live) migration support for nested VMX yet. So as
soon as your guest is using VMX itself ("nVMX"), this is not expected to
work.
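A quick way to check whether that applies to a given setup (a sketch;
the paths below are for Intel, with kvm_amd/svm being the AMD
equivalents):

# On L0: is nested virtualization exposed to L1 at all?
[L0] $ cat /sys/module/kvm_intel/parameters/nested
# On L1: does the guest hypervisor see VMX, i.e. will L2 run on nVMX?
[L1] $ grep -c -w vmx /proc/cpuinfo

If the first reports 'Y' (or '1') and the second is non-zero, L2 is
running on nested VMX, and saving/restoring or live-migrating L1 while
L2 is active is expected to break until nVMX migration support exists.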
--
Thanks,
David / dhildenb