On Mon, May 09, 2022 at 05:02:07PM +0200, Michal Privoznik wrote:
The Linux kernel offers a way to mitigate side channel attacks on
Hyper Threads (e.g. MDS and L1TF). Long story short, userspace can define
groups of processes (aka trusted groups) and only processes within one
group can run on sibling Hyper Threads. The group membership is
automatically preserved on fork() and exec().
Now, there is one scenario which I don't cover in my series and I'd like
to hear proposals: if there are two guests with an odd number of vCPUs they
can no longer run on sibling Hyper Threads because my patches create
separate group for each QEMU. This is a performance penalty. Ideally, we
would have a knob inside domain XML that would place two or more domains
into the same trusted group. But since there's no pre-existing example
(of sharing a piece of information between two domains) I've failed to
come up with something usable.
Right now users have two choices:
- Run with SMT enabled. 100% of CPUs available. VMs are vulnerable
- Run with SMT disabled. 50% of CPUs available. VMs are safe
What core scheduling gives is somewhere in between, depending on
the vCPU count. If we assume all guests have an even vCPU count then
- Run with SMT enabled + core scheduling. 100% of CPUs available.
100% of CPUs are used, VMs are safe
This is the ideal scenario, and probably a fairly common one too,
as IMHO even vCPU counts are likely to be typical.
If we assume the worst case, of entirely 1 vCPU guests then we have
- Run with SMT enabled + core scheduling. 100% of CPUs available.
50% of CPUs are used, VMs are safe
This feels highly unlikely though, as all except tiny workloads
want > 1 vCPU.
With entirely 3 vCPU guests then we have
- Run with SMT enabled + core scheduling. 100% of CPUs available.
75% of CPUs are used, VMs are safe
With entirely 5 vCPU guests then we have
- Run with SMT enabled + core scheduling. 100% of CPUs available.
83% of CPUs are used, VMs are safe
If we have a mix of even- and odd-numbered vCPU guests, with mostly
even-numbered, then I think utilization will be high enough that
almost no one will care about the last few %.
While we could try to come up with a way to express sharing of
cores between VMs, I don't think it's worth it, in the absence of
someone presenting compelling data why it'll be needed in a
non-niche use case. Bear in mind that users can also resort to
pinning VMs explicitly to get sharing.
In terms of defaults I'd very much like us to default to enabling
core scheduling, so that we have a secure deployment out of the box.
The only caveat is that this does have the potential to be interpreted
as a regression for existing deployments in some cases. Perhaps we
should make it a meson option for distros to decide whether to ship
with it turned on out of the box or not?
I don't think we need core scheduling to be a VM XML config option,
because security is really a host level matter IMHO, such that it
doesn't make sense to have both secure & insecure VMs co-located.
With regards,
Daniel
--
|: https://berrange.com      -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|