On 23/11/22 5:11 pm, Daniel P. Berrangé wrote:
On Wed, Nov 23, 2022 at 12:29:24PM +0100, Jiri Denemark wrote:
> On Wed, Nov 23, 2022 at 11:18:01 +0000, Daniel P. Berrangé wrote:
>> On Tue, Nov 22, 2022 at 01:48:39PM +0100, Jiri Denemark wrote:
>>> On Mon, Nov 21, 2022 at 12:47:58 +0530, manish.mishra wrote:
>>>> On 18/11/22 5:00 pm, Jiri Denemark wrote:
>>>>> On Fri, Nov 18, 2022 at 10:18:59 +0000, John Levon wrote:
>>>>>> On Fri, Nov 18, 2022 at 10:52:32AM +0100, Jiri Denemark wrote:
>>>>>>
>>>>>>>> * Qemu already provide an option 'enforce' to
validate if features
>>>>>>>> with which vm is started is exactly same as one
provided and nothing
>>>>>>>> is silently dropped.
>>>>>>> Right, but it's not enough. In addition to removed
features libvirt also
>>>>>>> checks for unexpectedly added features. And you really need
to do both.
>>>>>>> Because if you ask for -cpu Model,feat1=on,feat2=on,enforce
and QEMU
>>>>>>> says everything is fine, the guest might see more than what
you asked.
>>>>>>> For example, if a feature is enabled only if a host supports
it you may
>>>>>>> or may not get it without any complains from QEMU. But if you
get it you
>>>>>>> really need to explicitly ask for it during migration,
otherwise the
>>>>>>> feature can just silently disappear. Of course, this would be
a really
>>>>>>> bad behavior from QEMU, but that does not mean it can't
happen (I think
>>>>>>> SVM is a bit problematic in this way) and the whole point of
libvirt's
>>>>>>> checks is to prevent this kind of issues.
>>>>>> Hi Jiri, I'm not following this very well. I think you're
saying that qemu has
>>>>>> had bugs previously where features get silently enabled,
>>>>> Personally, I haven't actually witnessed any bug in this area (as
far as
>>>>> I can remember, which is not that far :-)), but got some reports of
at
>>>>> least one, even though without any proof.
>>>>>
>>>>> Specifically, SVM is quite strange as it is included in all AMD CPU
>>>>> models in QEMU and yet if you try to start it on a host without
nesting
>>>>> enabled "enforce" does not complain. I saw the feature is
enabled with
>>>>> older machine types, but I was told the magic behind this feature
looks
>>>>> like not only machine type but even host configuration itself is
>>>>> involved. Anyway, although I saw reports of that the feature could
be
>>>>> enabled without an explicit request even with new machine types I
still
>>>>> haven't seen any proof of this happening. So I hope it just does
not
>>>>> happen and users are only afraid of this possibility.
>>>>>
>>>>>> and it's libvirt's job/role to paper over those issues?
Do you have
>>>>>> some specific cases of this?
>>>>> This is not about papering. It's actually the opposite, that is
about
>>>>> detecting if something like this happens and reporting it as a
failure
>>>>> rather than papering it and hoping everything goes well. So I think
>>>>> doing this is a good idea even though I don't think we actually
saw any
>>>>> issue of this kind.
>>>>
>>>> Hi Jiri, I see now with libvirt master, with check=='full' we
verify
>>>> both silently dropped as well as added features. But as you already
>>>> stated Qemu silently adding feature is a Qemu bug and libvirt just
>>>> reports that bug, so it should be very unlikely, i agree that is not a
>>>> good reasoning :). Our requirement is that we want to use CPU Models
>>>> and features which are defined in Qemu but not in libvirt for e.g if
>>>> we want to use Icelake-Server-V4 directly or newly added vmx feature,
>>>> libvirt does not allow. Currently we take help of qemu to do
>>>> validations but for cpu feature verfication and model definations we
>>>> still use libvirt defined definations which prevent us to use anything
>>>> which is not defined in libvirt. I see there are already efforts going
>>>> on to get all model and feature defination from qemu itself, but not
>>>> sure how much time it will take. Till that happens we thought safest
>>>> option is to have an option to remove all validations from libvirt and
>>>> rely on qemu 'enforce' for migration safetly. I understand
>>>> qemu-enforce does not check for silently added features, but that case
>>>> we can assume is very unlikely and Qemu should fix otherwise VMs will
>>>> not poweron anyway with check=='full'. Basically we want it as
an
>>>> modification of check='none' but also skipping things like
>>>> virCPUValidateFeatures and passing option 'enforce' to Qemu. Or
if
>>>> silently adding features is that big concern we can have a check in
>>>> Qemu itself? I understand current qemu-enforce is not as migration
>>>> safe as check=='full' but probably suitable for our use case for
time
>>>> being?
>>> I understand why you want this, but on the other hand adding just a
>>> plain pass-through for CPU model and features is quite dangerous as it
>>> can be used to set any -cpu option even if it's not a feature. And
>>> especially the parts that are configured in other areas of domain XML
>>> (such as hypervisor features). So I think instead of adding a new mode
>>> for <cpu> we should go another way. For things that are not yet
>>> supported by libvirt we support various elements in qemu namespace,
>>> e.g., setting QEMU command line options, environment variables, or even
>>> overriding options libvirt would normally set on the command line. So I
>>> guess we could have a special <qemu:cpu> element which would be used
>>> when a user wants full control over the -cpu option without any libvirt
>>> intervention.
>> I really don't think we should allow this, especially not for something
>> which is clearly intended to be used for production deployment. Our
>> hypervisor CLI passthrough is there to allow for short term workarounds
>> for bugs, or to experiment with a feature before it is mapped into
>> libvirt in a supported manner.
>>
>> If there are aspects related to QEMU configuration thiat are not in
>> libvirt yet, we need to address those gaps, not provide yet another
>> way to bypass libvirt mapping.
> Indeed, this was definitely meant as a short term workaround for stuff
> we don't have support for yet, for testing or experimental purposes. The
> supported approach is for implement the missing parts in libvirt (and
> QEMU) as soon as possible. I don't see why would a properly documented
> support for experiments with -cpu would be an issue.
Why can't it just use the exsting QEMU passthrough syntax we have.
I don't think we should be adding specifial support just for CPUs
With regards,
Daniel
Daniel, We actually want migratibility of VM too, i mean best set of migratable of
features supported by both source and destination Qemu. Sometimes important cpu
model/version or features take a lot of time to be updated in libvirt, also other class of
issues like what John mentioned multiple source of truths is very common.
Thanks
Manish Mishra