Caching qemu capabilities and KubeVirt

Hi,

I have a question regarding capability caching in the context of KubeVirt. Since KubeVirt starts one libvirt instance per VM, libvirt has to re-discover the qemu capabilities on every VM start, which leads to a 1-2s+ delay in startup.

We already discover the features in a dedicated KubeVirt pod on each node, so I tried to copy the capabilities over to see if that would work. In general it looks like it could work, but libvirt seems to detect a mismatch in the exposed KVM CPUID in every pod and therefore invalidates the cache. The recreated capability cache looks exactly like the original one, though ...

The check responsible for the invalidation is this:

```
Outdated capabilities for '%s': host cpuid changed
```

So the KVM_GET_SUPPORTED_CPUID call seems to return slightly different values in different containers. After trying out the attached golang scripts in different containers, I could indeed see differences. I cannot really judge what the differences in these KVM function registers mean, though, and I am curious if someone else knows. The files are attached too (as JSON for easy diffing).

Best regards,
Roman
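For reference, the probe performed by those Go scripts boils down to a single ioctl on the /dev/kvm system fd. A minimal C sketch of the same query (the entry count and output format are illustrative, not taken from the attachments):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MAX_ENTRIES 256  /* illustrative; the kernel returns E2BIG if too small */

int main(void)
{
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0) {
        perror("open /dev/kvm");
        return 1;
    }

    struct kvm_cpuid2 *cpuid = calloc(1, sizeof(*cpuid) +
                                      MAX_ENTRIES * sizeof(struct kvm_cpuid_entry2));
    if (!cpuid)
        return 1;
    cpuid->nent = MAX_ENTRIES;

    /* Ask KVM which CPUID leaves/flags it can expose to guests. */
    if (ioctl(fd, KVM_GET_SUPPORTED_CPUID, cpuid) < 0) {
        perror("KVM_GET_SUPPORTED_CPUID");
        return 1;
    }

    for (unsigned int i = 0; i < cpuid->nent; i++) {
        struct kvm_cpuid_entry2 *e = &cpuid->entries[i];
        printf("func=0x%08x idx=0x%08x eax=0x%08x ebx=0x%08x ecx=0x%08x edx=0x%08x\n",
               e->function, e->index, e->eax, e->ebx, e->ecx, e->edx);
    }
    return 0;
}
```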

On Thu, Sep 08, 2022 at 02:24:00PM +0200, Roman Mohr wrote:
> So the KVM_GET_SUPPORTED_CPUID call seems to return slightly different
> values in different containers. After trying out the attached golang
> scripts in different containers, I could indeed see differences.
Can you confirm whether the two attached data files were captured by containers running on the same physical host, or whether each container could have run on a different host?

My understanding is that KVM_GET_SUPPORTED_CPUID returns the intersection of the CPUID flags supported by the physical CPUs and the CPUID flags supported by the KVM kernel module. IOW, I believe the results should only differ if run across hosts with differing CPU models and/or kernel versions.

I've not tried to diagnose exactly which feature bits are different in your dumps yet.

With regards,
Daniel
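To inspect the hardware half of that intersection, one can also execute the CPUID instruction directly and diff it against the ioctl results above. A small sketch using GCC's <cpuid.h> helper, with leaf 0x01 chosen as an example (not taken from the thread):

```c
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Raw host CPUID leaf 0x01: what the hardware itself reports. */
    if (!__get_cpuid_count(0x01, 0x00, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 0x01 not supported\n");
        return 1;
    }

    /* EBX bits 31:24 carry the initial local APIC ID of the CPU this
     * task happened to run on, so this part can differ between runs. */
    printf("leaf 0x01: ebx=0x%08x (apic id=%u)\n", ebx, ebx >> 24);
    printf("leaf 0x01: ecx=0x%08x edx=0x%08x\n", ecx, edx);
    return 0;
}
```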

On Thu, Sep 8, 2022 at 2:56 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Can you confirm whether the two attached data files were captured by
> containers running on the same physical host, or could each container
> have run on a different host.
They are coming from the same host; that is the most surprising bit for me. I am also very sure that this is the case, because I only had one k8s node from where I took these. The containers however differ (obviously) on namespaces and on the privilege level (less obvious). The handler dump is from a fully privileged container.

Thanks for checking.

Best regards,
Roman

On Thu, Sep 08, 2022 at 03:10:09PM +0200, Roman Mohr wrote:
> They are coming from the same host, that is the most surprising bit
> for me. [...] The containers however differ (obviously) on namespaces
> and on the privilege level (less obvious). The handler dump is from a
> fully privileged container.
The privilege level sounds like something that might be impactful, so I'll investigate that. I'd be pretty surprised for namespaces to have any impact, though.

With regards,
Daniel

On Thu, Sep 08, 2022 at 02:23:31PM +0100, Daniel P. Berrangé wrote:
> The privilege level sounds like something that might be impactful, so
> I'll investigate that. I'd be pretty surprised for namespaces to have
> any impact though.
The privilege level is a red herring. Peter reminded me that we have to filter out some parts of CPUID, because the APIC IDs vary depending on which host CPU the task executes on:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virhostcpu.c#L1346

In the two JSON files you provided, the differences I see should already be matched by

```
/* filter out local apic id */
if (entry->function == 0x01 && entry->index == 0x00)
    entry->ebx &= 0x00ffffff;
if (entry->function == 0x0b)
    entry->edx &= 0xffffff00;
```

so those differences ought not to be causing the cache to be invalidated.

With regards,
Daniel
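To illustrate the effect of that filter with made-up numbers (the EBX values below are hypothetical, not from Roman's dumps): bits 31:24 of leaf 0x01 EBX carry the initial local APIC ID, so two probes on the same host can legitimately disagree there, and masking makes them compare equal.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Hypothetical leaf 0x01 EBX values from two containers on the same
     * host; only bits 31:24 (the local APIC ID) differ. */
    uint32_t ebx_handler  = 0x0a100800;  /* probed on a CPU with APIC ID 0x0a */
    uint32_t ebx_launcher = 0x20100800;  /* probed on a CPU with APIC ID 0x20 */

    /* Apply libvirt's mask before comparing. */
    printf("handler:  0x%08x -> 0x%08x\n", ebx_handler,  ebx_handler  & 0x00ffffff);
    printf("launcher: 0x%08x -> 0x%08x\n", ebx_launcher, ebx_launcher & 0x00ffffff);
    /* Both print 0x00100800, so the masked views compare equal. */
    return 0;
}
```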

On Thu, Sep 8, 2022 at 4:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> The privilege level is a red herring. [...] In the 2 JSON files you
> provide, the differences I see should already be matched by [the local
> APIC ID filter], so those differences ought not to be causing the
> cache to be invalidated.
Hm, maybe I misinterpreted the logs then. The snippet I looked at was this:

```
{"component":"virt-launcher","level":"info","msg":"/dev/kvm has changed (1661786802 vs 0)","pos":"virQEMUCapsKVMUsable:4850","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.936000Z"}
{"component":"virt-launcher","level":"info","msg":"a=0x7f8138153ba0, b=0x7f818001c480","pos":"virCPUDataIsIdentical:1178","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
{"component":"virt-launcher","level":"info","msg":"Outdated capabilities for '/usr/bin/qemu-system-x86_64': host cpuid changed","pos":"virQEMUCapsIsValid:4993","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
{"component":"virt-launcher","level":"info","msg":"Outdated cached capabilities '/var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml' for '/usr/bin/qemu-system-x86_64'","pos":"virFileCacheLoad:163","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
```

I had the impression from the code that the `/dev/kvm` change (because the containers are not created at the same time) does not invalidate it either. I attached the whole debug log; maybe I missed something obvious.

Does it make a difference if the cache is created via `virsh domcapabilities` and `virsh capabilities`, or via defining the first domain?

Best regards,
Roman
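The two numbers in the first log line look like a Unix timestamp comparison: 1661786802 decodes to 2022-08-29, matching the log's own timestamps, and plausibly corresponds to when the container's /dev/kvm node was created, so the check in virQEMUCapsKVMUsable appears to compare a timestamp of /dev/kvm against a cached value. A sketch, assuming the logged value is the device node's change time:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(void)
{
    struct stat st;

    if (stat("/dev/kvm", &st) < 0) {
        perror("stat /dev/kvm");
        return 1;
    }

    /* 1661786802 from the log decodes to 2022-08-29, i.e. roughly when
     * the virt-launcher container (and its /dev/kvm node) was created. */
    printf("/dev/kvm ctime: %lld (%s)",
           (long long)st.st_ctime, ctime(&st.st_ctime));
    return 0;
}
```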

On Thu, Sep 08, 2022 at 04:22:17PM +0200, Roman Mohr wrote:
> Hm, maybe I misinterpreted the logs then. The snippet I looked at was
> this: [...]
Can you capture the /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml from the virt-handler and virt-launcher pods? It contains a <cpuid> block that will show us the differences libvirt recorded, /after/ libvirt has done its filtering. This will show if there is more we need to filter.
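For example, assuming the default kubevirt namespace (pod names below are hypothetical placeholders), the file could be pulled from both pods like this:

```
kubectl exec -n kubevirt virt-handler-xxxxx -- \
  cat /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml \
  > handler-caps.xml
kubectl exec -n kubevirt virt-launcher-myvm-xxxxx -- \
  cat /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml \
  > launcher-caps.xml
```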
> Does it make a difference if the cache is created via `virsh
> domcapabilities` and `virsh capabilities` or via defining the first
> domain?
They'll all end up at the same caching code, so it should not make any difference.

With regards,
Daniel

On Thu, Sep 8, 2022 at 4:32 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Can you capture the
> /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml
> from the virt-handler and virt-launcher pods. It contains a <cpuid>
> block that will show us the differences libvirt recorded, /after/
> libvirt has done its filtering.
Done. I only attached the one from the handler, since it is 100% identical to the launcher's. Let me know if you need more information.

Thanks and best regards,
Roman

Hi Daniel,

Just wanted to ask if you had a chance to check the shared file. Let me know if you need more info.

Best regards,
Roman

On Tue, Sep 13, 2022 at 05:31:42PM +0200, Roman Mohr wrote:
> Done. Only attached the one from handler since it is 100% identical to
> the launcher.
The only differences between libvirt's XML and what's in your original testbin_*json files are the two functions that libvirt masks. The result of masking your JSON matches the libvirt XML record, so that's good. This still leaves me puzzled.
> Let me know if you need more information.
Can you run libvirt with a different log filter, 1:cpu 1:qemu, and attach the full log from when libvirtd starts up until it shows the capabilities invalidation? The cpu code might be telling us more about what the difference is.

With regards,
Daniel
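For reference, such a filter can be set in libvirtd.conf (or via the LIBVIRT_LOG_FILTERS and LIBVIRT_LOG_OUTPUTS environment variables); the output path below is just an example:

```
# /etc/libvirt/libvirtd.conf
log_filters="1:cpu 1:qemu"
log_outputs="1:file:/var/log/libvirtd.log"
```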
participants (2)

- Daniel P. Berrangé
- Roman Mohr