On Tue, Jul 11, 2023 at 03:48:25PM +0200, Claudio Imbrenda wrote:
> On Tue, 11 Jul 2023 09:17:00 +0100
> Daniel P. Berrangé <berrange(a)redhat.com> wrote:
>
> [...]
> > > We could add additional time depending on the guest memory size BUT with
> > > Secure Execution the timeout would need to be increased by a two-digit
> > > factor. Also, for libvirt it is not possible to detect whether the guest
> > > is in Secure Execution mode.
> >
> > What component is causing this 2 orders of magnitude delay in shutting
> > down a guest?
>
> Secure Execution (protected VMs).

So it's the hardware that imposes the penalty, rather than something
the kernel is doing?

Can anything else mitigate this? For example, does using huge pages make
it faster than normal pages?

> > If the host can't tell whether Secure Execution mode is enabled or
> > not, why would any code path be different & slower?
>
> The host kernel (and QEMU) know if a specific VM is running in
> secure mode, but there is no meaningful way for this information to be
> communicated outwards (e.g. to libvirt).

Can we expose this in one of the QMP commands, or a new one? It feels
like a mgmt app is going to want to know whether a guest is running in
secure mode or not, so it can know whether this shutdown penalty is going
to be present.

> During teardown, the host kernel will need to do some time-consuming
> extra cleanup for each page that belonged to a secure guest.
>
> > > I also assume that timeouts of 1h or more are not acceptable. Wouldn't
> > > a long timeout cause other trouble, like stalling a "virsh list" run
> > > in parallel?
> >
> > Well a 1 hour timeout is pretty insane,
>
> I think we all agree, and that's why asynchronous teardown was
> implemented.
>
> > even with the async teardown that's terrible as RAM is unable to be
> > used for any new guest for an incredibly long time.
>
> I'm not sure what you mean here. RAM is not kept aside until the
> teardown is complete; cleared pages are returned to the free pool
> immediately as they are cleared. So when the cleanup is halfway
> through, half of the memory will have been freed.

Yes, it is incrementally released, but in practice most hypervisor hosts
are memory constrained. So if you stop a 2 TB guest and then want to boot
it again, unless you have a couple of free TB of RAM hanging around, you're
going to need to wait for almost all of the original RAM to be reclaimed.
Async cleanup definitely helps, but there's only so much it can do.
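
A back-of-the-envelope sketch of the point above, not code from libvirt, QEMU or the kernel. It assumes teardown returns the old guest's RAM to the free pool at a constant rate (a simplification), and the function names, guest sizes and the 1 GiB/s reclaim rate are all hypothetical.

```python
import math


def free_memory_at(t_s: float, free_gib: float, old_guest_gib: float,
                   reclaim_gib_per_s: float) -> float:
    """Free host RAM (GiB) t_s seconds after teardown of the old guest starts.

    Hypothetical model: cleared pages go straight back to the free pool at a
    constant rate, until all of the old guest's RAM has been returned.
    """
    reclaimed = min(old_guest_gib, reclaim_gib_per_s * t_s)
    return free_gib + reclaimed


def wait_before_boot(new_guest_gib: float, free_gib: float,
                     old_guest_gib: float, reclaim_gib_per_s: float) -> float:
    """Seconds until new_guest_gib of RAM is free (math.inf if never)."""
    if reclaim_gib_per_s <= 0:
        raise ValueError("reclaim rate must be positive")
    # RAM still missing beyond what the host already has spare.
    shortfall = max(0.0, new_guest_gib - free_gib)
    if shortfall > old_guest_gib:
        # Even a complete teardown of the old guest would not free enough.
        return math.inf
    return shortfall / reclaim_gib_per_s


if __name__ == "__main__":
    # Hypothetical host: restarting a 2 TiB guest with 256 GiB spare,
    # assuming teardown reclaims 1 GiB/s.
    rate = 1.0
    half = free_memory_at(1024, free_gib=256, old_guest_gib=2048,
                          reclaim_gib_per_s=rate)
    wait = wait_before_boot(2048, free_gib=256, old_guest_gib=2048,
                            reclaim_gib_per_s=rate)
    print(f"free after 1024 s: {half:.0f} GiB")
    print(f"wait before reboot: {wait:.0f} s ({wait / 60:.1f} min)")
```

With these made-up numbers, half of the old guest's RAM is back after 1024 s, as incremental release implies, yet the reboot still has to wait 1792 s (about 30 minutes), because the host only has 256 GiB spare.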
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|