[libvirt] [RFC] docs: Discourage usage of cache mode=passthrough

Cache mode=passthrough can result in a broken cache topology if
the domain topology is not exactly the same as the host topology.
Warn about that in the documentation.

Bug report for reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1184125

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 docs/formatdomain.html.in | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 57ec2ff34..9c21892f3 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1478,7 +1478,9 @@
 
       <dt><code>passthrough</code></dt>
       <dd>The real CPU cache data reported by the host CPU will be
-        passed through to the virtual CPU.</dd>
+        passed through to the virtual CPU. Using this mode is not
+        recommended unless the domain CPU and NUMA topology is exactly
+        the same as the host CPU and NUMA topology.</dd>
 
       <dt><code>disable</code></dt>
       <dd>The virtual CPU will report no CPU cache of the specified
--
2.13.5
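For context, a minimal sketch of where the cache mode documented by this patch is set in a domain definition, assuming a QEMU/KVM guest. As far as I know the QEMU driver only accepts mode='passthrough' together with a host-passthrough CPU, and leaving out the <cache> element lets the hypervisor pick its default; both points are assumptions worth re-checking against formatdomain for the libvirt version in use.

```xml
<domain type='kvm'>
  <!-- name, memory, os, devices and other required elements omitted -->
  <cpu mode='host-passthrough'>
    <!-- Copy the host CPU's real cache data into the guest CPUID.
         Per this patch, only sensible when the guest CPU and NUMA
         topology mirror the host. Removing this element falls back
         to the hypervisor's default cache description. -->
    <cache mode='passthrough'/>
  </cpu>
</domain>
```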

On 09/19/2017 03:37 PM, Eduardo Habkost wrote:
>       <dt><code>passthrough</code></dt>
>       <dd>The real CPU cache data reported by the host CPU will be
> -        passed through to the virtual CPU.</dd>
> +        passed through to the virtual CPU. Using this mode is not
> +        recommended unless the domain CPU and NUMA topology is exactly
> +        the same as the host CPU and NUMA topology.</dd>
To me this sounds like it should be forbidden by libvirt, rather than just documented as "bad". (I haven't followed any previous discussion on the topic though, so maybe I'm over-reacting).

On Thu, Sep 21, 2017 at 01:14:04PM -0400, Laine Stump wrote:
> To me this sounds like it should be forbidden by libvirt, rather than just documented as "bad". (I haven't followed any previous discussion on the topic though, so maybe I'm over-reacting).
mode=passthrough is a bad idea most of the time, and most people don't really need it. But if libvirt already supports it, wouldn't removing it be a regression for people who already rely on it?

I will check later whether we can make host-cache-info safer in QEMU by fixing up the socket/core/thread counts in CPUID instead of copying them as-is from the host.

--
Eduardo

On Thu, Sep 21, 2017 at 01:14:04PM -0400, Laine Stump wrote:
> To me this sounds like it should be forbidden by libvirt, rather than just documented as "bad". (I haven't followed any previous discussion on the topic though, so maybe I'm over-reacting).
In high-performance setups, people pin guest vCPUs to host pCPUs and set the vCPU topology to match the host pCPU topology they've pinned to. So having a cache mode that matches this topology is just fine. It simply isn't something you want as a default for the more typical floating-vCPU scenarios.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
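A minimal sketch of the pinned setup described here, under loudly stated assumptions: a small hypothetical host with one socket, two cores and two threads per core, whose thread siblings are numbered (0,2) and (1,3), as the siblings values in `virsh capabilities` might show. It also assumes QEMU's default mapping with threads innermost, so vCPUs 0-1 form guest core 0 and vCPUs 2-3 form guest core 1. All CPU numbers are illustrative; read the real sibling lists from the host before reusing this.

```xml
<domain type='kvm'>
  <!-- name, memory, os, devices and other required elements omitted -->
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <!-- Hypothetical host: 1 socket x 2 cores x 2 threads, with thread
         siblings (0,2) on core 0 and (1,3) on core 1.
         Each guest core is pinned onto exactly one host core, so guest
         thread siblings are also host thread siblings. -->
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <cpu mode='host-passthrough'>
    <!-- Guest topology mirrors the pinned host CPUs exactly. -->
    <topology sockets='1' cores='2' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
</domain>
```

The whole host is mirrored only to keep the sketch unambiguous; in practice the pinned set is usually a subset of the host CPUs.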

On Thu, Sep 28, 2017 at 09:21:41AM +0100, Daniel P. Berrange wrote:
> In high-performance setups, people pin guest vCPUs to host pCPUs and set the vCPU topology to match the host pCPU topology they've pinned to. So having a cache mode that matches this topology is just fine. It simply isn't something you want as a default for the more typical floating-vCPU scenarios.
So, should this patch be applied?

--
Eduardo

On Mon, Nov 06, 2017 at 11:10:00AM -0200, Eduardo Habkost wrote:
> So, should this patch be applied?
We could take a patch that describes more clearly when it is reasonable to use the passthrough mode.

Regards,
Daniel

On Mon, Nov 06, 2017 at 01:17:02PM +0000, Daniel P. Berrange wrote:
> We could take a patch that describes more clearly when it is reasonable to use the passthrough mode.
Why isn't "unless the domain CPU and NUMA topology is exactly the same as the host CPU and NUMA topology" a clear description?

--
Eduardo

On Mon, Nov 06, 2017 at 11:43:49AM -0200, Eduardo Habkost wrote:
> Why isn't "unless the domain CPU and NUMA topology is exactly the same as the host CPU and NUMA topology" a clear description?
Just matching the topology is not useful unless you've also pinned the guest CPUs to host CPUs. So I think it'd be clearer to say something like:

"If using 'passthrough' mode, it is recommended to explicitly pin each virtual CPU to a dedicated host CPU, and to set up the guest CPU and NUMA topology to match that of the host. Mismatched topology or freely floating CPUs will result in unpredictable performance, so they should be avoided."

Regards,
Daniel
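The proposed wording also covers NUMA topology. As a sketch only, assume a hypothetical host with two NUMA nodes, each a socket with two cores and SMT disabled, where host CPUs 0-1 sit on node 0 and CPUs 2-3 on node 1. The guest below mirrors that host: each vCPU is pinned to the matching host CPU, each guest NUMA cell matches one guest socket, and <numatune>/<memnode> binds each cell's memory to the matching host node. The CPU numbers and memory sizes are placeholders, not recommendations.

```xml
<domain type='kvm'>
  <!-- name, os, devices and other required elements omitted -->
  <memory unit='GiB'>4</memory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <!-- Hypothetical host: CPUs 0-1 on NUMA node 0, CPUs 2-3 on node 1.
         One dedicated host CPU per vCPU. -->
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <numatune>
    <!-- Allocate each guest cell's memory only from the matching host node. -->
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
  <cpu mode='host-passthrough'>
    <!-- 2 sockets x 2 cores x 1 thread, like the assumed host. -->
    <topology sockets='2' cores='2' threads='1'/>
    <cache mode='passthrough'/>
    <numa>
      <!-- Guest socket N == guest NUMA cell N == host node N. -->
      <cell id='0' cpus='0-1' memory='2' unit='GiB'/>
      <cell id='1' cpus='2-3' memory='2' unit='GiB'/>
    </numa>
  </cpu>
</domain>
```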

On Mon, Nov 06, 2017 at 02:08:31PM +0000, Daniel P. Berrange wrote:
> Just matching the topology is not useful unless you've also pinned the guest CPUs to host CPUs. So I think it'd be clearer to say something like:
>
> "If using 'passthrough' mode, it is recommended to explicitly pin each virtual CPU to a dedicated host CPU, and to set up the guest CPU and NUMA topology to match that of the host. Mismatched topology or freely floating CPUs will result in unpredictable performance, so they should be avoided."
Performance of VMs with more complex topologies can be unpredictable even when cache passthrough mode is not used. I believe this explanation belongs in the documentation of the cpu/topology or cpu/numa elements.

--
Eduardo
participants (3)
- Daniel P. Berrange
- Eduardo Habkost
- Laine Stump