Re: Add option to skip cpu feature and model verification in libvirt and rely on qemu 'enforce' option

Wednesday, 23 November 2022

On 23/11/22 5:44 pm, Daniel P. Berrangé wrote:
...
 On Wed, Nov 23, 2022 at 12:51:39PM +0100, Jiri Denemark wrote:
> On Wed, Nov 23, 2022 at 11:41:03 +0000, Daniel P. Berrangé wrote:
>> On Wed, Nov 23, 2022 at 12:29:24PM +0100, Jiri Denemark wrote:
>>> On Wed, Nov 23, 2022 at 11:18:01 +0000, Daniel P. Berrangé wrote:
>>>> On Tue, Nov 22, 2022 at 01:48:39PM +0100, Jiri Denemark wrote:
>>>>> On Mon, Nov 21, 2022 at 12:47:58 +0530, manish.mishra wrote:
>>>>>> On 18/11/22 5:00 pm, Jiri Denemark wrote:
>>>>>>> On Fri, Nov 18, 2022 at 10:18:59 +0000, John Levon wrote:
>>>>>>>> On Fri, Nov 18, 2022 at 10:52:32AM +0100, Jiri Denemark
wrote:
>>>>>>>>
>>>>>>>>>>     * Qemu already provide an option
'enforce' to validate if features
>>>>>>>>>>     with which vm is started is exactly same as
one provided and nothing
>>>>>>>>>>     is silently dropped.
>>>>>>>>> Right, but it's not enough. In addition to
removed features libvirt also
>>>>>>>>> checks for unexpectedly added features. And you
really need to do both.
>>>>>>>>> Because if you ask for -cpu
Model,feat1=on,feat2=on,enforce and QEMU
>>>>>>>>> says everything is fine, the guest might see more
than what you asked.
>>>>>>>>> For example, if a feature is enabled only if a host
supports it you may
>>>>>>>>> or may not get it without any complains from QEMU.
But if you get it you
>>>>>>>>> really need to explicitly ask for it during
migration, otherwise the
>>>>>>>>> feature can just silently disappear. Of course, this
would be a really
>>>>>>>>> bad behavior from QEMU, but that does not mean it
can't happen (I think
>>>>>>>>> SVM is a bit problematic in this way) and the whole
point of libvirt's
>>>>>>>>> checks is to prevent this kind of issues.
>>>>>>>> Hi Jiri, I'm not following this very well. I think
you're saying that qemu has
>>>>>>>> had bugs previously where features get silently enabled,
>>>>>>> Personally, I haven't actually witnessed any bug in this
area (as far as
>>>>>>> I can remember, which is not that far :-)), but got some
reports of at
>>>>>>> least one, even though without any proof.
>>>>>>>
>>>>>>> Specifically, SVM is quite strange as it is included in all
AMD CPU
>>>>>>> models in QEMU and yet if you try to start it on a host
without nesting
>>>>>>> enabled "enforce" does not complain. I saw the
feature is enabled with
>>>>>>> older machine types, but I was told the magic behind this
feature looks
>>>>>>> like not only machine type but even host configuration itself
is
>>>>>>> involved. Anyway, although I saw reports of that the feature
could be
>>>>>>> enabled without an explicit request even with new machine
types I still
>>>>>>> haven't seen any proof of this happening. So I hope it
just does not
>>>>>>> happen and users are only afraid of this possibility.
>>>>>>>
>>>>>>>> and it's libvirt's job/role to paper over those
issues? Do you have
>>>>>>>> some specific cases of this?
>>>>>>> This is not about papering. It's actually the opposite,
that is about
>>>>>>> detecting if something like this happens and reporting it as
a failure
>>>>>>> rather than papering it and hoping everything goes well. So I
think
>>>>>>> doing this is a good idea even though I don't think we
actually saw any
>>>>>>> issue of this kind.
>>>>>>
>>>>>> Hi Jiri, I see now with libvirt master, with
check=='full' we verify
>>>>>> both silently dropped as well as added features. But as you
already
>>>>>> stated Qemu silently adding feature is a Qemu bug and libvirt
just
>>>>>> reports that bug, so it should be very unlikely, i agree that is
not a
>>>>>> good reasoning :). Our requirement is that we want to use CPU
Models
>>>>>> and features which are defined in Qemu but not in libvirt for e.g
if
>>>>>> we want to use Icelake-Server-V4 directly or newly added vmx
feature,
>>>>>> libvirt does not allow. Currently we take help of qemu to do
>>>>>> validations but for cpu feature verfication and model definations
we
>>>>>> still use libvirt defined definations which prevent us to use
anything
>>>>>> which is not defined in libvirt. I see there are already efforts
going
>>>>>> on to get all model and feature defination from qemu itself, but
not
>>>>>> sure how much time it will take. Till that happens we thought
safest
>>>>>> option is to have an option to remove all validations from
libvirt and
>>>>>> rely on qemu 'enforce' for migration safetly. I
understand
>>>>>> qemu-enforce does not check for silently added features, but that
case
>>>>>> we can assume is very unlikely and Qemu should fix otherwise VMs
will
>>>>>> not poweron anyway with check=='full'. Basically we want
it as an
>>>>>> modification of check='none' but also skipping things
like
>>>>>> virCPUValidateFeatures and passing option 'enforce' to
Qemu. Or if
>>>>>> silently adding features is that big concern we can have a check
in
>>>>>> Qemu itself? I understand current qemu-enforce is not as
migration
>>>>>> safe as check=='full' but probably suitable for our use
case for time
>>>>>> being?
>>>>> I understand why you want this, but on the other hand adding just a
>>>>> plain pass-through for CPU model and features is quite dangerous as
it
>>>>> can be used to set any -cpu option even if it's not a feature.
And
>>>>> especially the parts that are configured in other areas of domain
XML
>>>>> (such as hypervisor features). So I think instead of adding a new
mode
>>>>> for <cpu> we should go another way. For things that are not
yet
>>>>> supported by libvirt we support various elements in qemu namespace,
>>>>> e.g., setting QEMU command line options, environment variables, or
even
>>>>> overriding options libvirt would normally set on the command line. So
I
>>>>> guess we could have a special <qemu:cpu> element which would be
used
>>>>> when a user wants full control over the -cpu option without any
libvirt
>>>>> intervention.
>>>> I really don't think we should allow this, especially not for
something
>>>> which is clearly intended to be used for production deployment. Our
>>>> hypervisor CLI passthrough is there to allow for short term workarounds
>>>> for bugs, or to experiment with a feature before it is mapped into
>>>> libvirt in a supported manner.
>>>>
>>>> If there are aspects related to QEMU configuration thiat are not in
>>>> libvirt yet, we need to address those gaps, not provide yet another
>>>> way to bypass libvirt mapping.
>>> Indeed, this was definitely meant as a short term workaround for stuff
>>> we don't have support for yet, for testing or experimental purposes. The
>>> supported approach is for implement the missing parts in libvirt (and
>>> QEMU) as soon as possible. I don't see why would a properly documented
>>> support for experiments with -cpu would be an issue.
>> Why can't it just use the exsting QEMU passthrough syntax we have.
>> I don't think we should be adding specifial support just for CPUs
> That would be nice, but the QEMU passthrough syntax cannot be used for
> changing options that libvirt already passes to QEMU. So using it would
> likely result in two separate -cpu options on QEMU command line. And it
> would not rule out the CPU verification code in libvirt. Remember, we
> add a default -cpu model in case there's none configured in the XML.
 Two -cpu options isn't a problem, the latter will override the former,
 which is fine from the level of support intended for QEMU passthrough
 usage.

 For the libvirt checking, isn't the 'check=none' attr sufficient
 to skip checks libvirt does.

 With regards,
 Daniel 
Actually even within check='none' libvirt does few checks e.g.
virCPUValidateFeatures if feature string passed is proper and defined in libvirt. We could
extend check='none' to remove even these kind of things. So Jiri suggested that
making such kind of changes in general path is not good idea so discussion went to using
separate cpu class <qemu:cpu>.

Thanks

Manish Mishra

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: Add option to skip cpu feature and model verification in libvirt and rely on qemu 'enforce' option