On Tue, May 24, 2022 at 05:35:03PM +0200, Michal Prívozník wrote:
On 5/24/22 12:33, Daniel P. Berrangé wrote:
> On Tue, May 24, 2022 at 11:50:50AM +0200, Michal Prívozník wrote:
>> On 5/23/22 18:30, Daniel P. Berrangé wrote:
>>> On Mon, May 09, 2022 at 05:02:17PM +0200, Michal Privoznik wrote:
>>>> Since the level of trust that QEMU has is the same level of trust
>>>> that helper processes have, there's no harm in placing all of them
>>>> into the same group.
>>>
>>> This assumption feels like it might be a bit of a stretch. I
>>> recall discussing this with Paolo to some extent a long time
>>> back, but let me recap my understanding.
>>>
>>> IIUC, the attack scenario is that a guest vCPU thread is scheduled
>>> on an SMT sibling with another thread that is NOT running guest OS
>>> code. "another thread" in this context refers to many things:
>>>
>>> - Random host OS processes
>>> - QEMU vCPU threads from a different guest
>>> - QEMU emulator threads from any guest
>>> - QEMU helper process threads from any guest
>>>
>>> Consider, for example, if the QEMU emulator thread contains a password
>>> used for logging into a remote RBD/Ceph server. That is a secret
>>> credential that the guest OS should not have permission to access.
>>>
>>> Consider alternatively that the QEMU emulator is making a TLS connection
>>> to some service, and there are keys negotiated for the TLS session. While
>>> some of the data transmitted over the session is known to the guest OS,
>>> we shouldn't assume it all is.
>>>
>>> Now in the case of QEMU emulator threads I think you can make a somewhat
>>> decent case that we don't have to worry about it. Most of the keys/passwds
>>> are used once at cold boot, so there's no attack window for vCPUs at that
>>> point. There is a small window of risk when hotplugging. If someone is
>>> really concerned about this though, they shouldn't have let QEMU have
>>> these credentials in the first place, as it's already vulnerable to a
>>> guest escape. eg use kernel RBD instead of letting QEMU directly log in
>>> to RBD.
>>>
>>> IOW, on balance of probabilities it is reasonable to let QEMU emulator
>>> threads be in the same core scheduling domain as vCPU threads.
>>>
>>> In the case of external QEMU helper processes though, I think it is
>>> a far less clearcut decision. There are a number of reasons why helper
>>> processes are used, but at least one significant motivating factor is
>>> security isolation between QEMU & the helper - they can only communicate
>>> and share information through certain controlled mechanisms.
>>>
>>> With this in mind I think it is risky to assume that it is safe to
>>> run QEMU and helper processes in the same core scheduling group. At
>>> the same time there are likely cases where it is also just fine to
>>> do so.
>>>
>>> If we separate helper processes from QEMU vCPUs this is not as wasteful
>>> as it sounds. Since the helper processes are running trusted code, there
>>> is no need for helper processes from different guests to be isolated
>>> from each other. They can all just live in the default core scheduling
>>> domain.
>>>
>>> I feel like I'm talking myself into suggesting the core scheduling
>>> host knob in qemu.conf needs to be more than just a single boolean.
>>> Either have two knobs - one to turn it on/off and one to control
>>> whether helpers are split or combined - or have one knob and make
>>> it an enumeration.
>>
>> Seems reasonable. And the default should be QEMU's emulator + vCPU
>> threads in one sched group, and all helper processes in another, right?
>
> Not quite. I'm suggesting that helper processes can remain in the
> host's default core scheduling group, since the helpers are all
> executing trusted machine code.
>
>>> One possible complication comes if we consider a guest that is
>>> pinned, but not on a fine-grained per-vCPU basis.
>>>
>>> eg if the guest is set to allow floating over a subset of host CPUs
>>> we need to make sure that it is still possible to actually execute
>>> the guest. ie if the entire guest is pinned to 1 host CPU but our
>>> config implies use of 2 distinct core scheduling domains, we have
>>> an unsolvable constraint.
>>
>> Do we? Since we're placing emulator + vCPUs into one group and helper
>> processes into another, these would never run at the same time, but that
>> would be the case anyway - if the emulator write()-s into a helper's
>> socket it would be blocked because the helper isn't running. This
>> "bottleneck" is a result of pinning everything onto a single CPU and
>> exists regardless of scheduling groups.
>>
>> The only case where scheduling groups would make the bottleneck worse is
>> if emulator and vCPUs were in different groups, but we don't intend to
>> allow that.
>
> Do we actually pin the helper processes at all ?
Yes, we do. They are placed into the same cgroup as the emulator
thread; see qemuSetupCgroupForExtDevices().
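
A minimal sketch of what that placement boils down to on cgroup v2 -
the helper's PID is written into the emulator cgroup's cgroup.procs
file. The path and the function name below are illustrative only, not
libvirt's actual code:

  #include <stdio.h>
  #include <sys/types.h>

  static int
  move_pid_to_cgroup(const char *cgroup_path, pid_t pid)
  {
      char procs[256];
      FILE *fp;

      /* e.g. /sys/fs/cgroup/machine.slice/<machine scope>/emulator */
      snprintf(procs, sizeof(procs), "%s/cgroup.procs", cgroup_path);

      if (!(fp = fopen(procs, "w")))
          return -1;

      /* Writing a PID here migrates the whole process. */
      fprintf(fp, "%d\n", (int)pid);

      return fclose(fp) == 0 ? 0 : -1;
  }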
>
> I was thinking of a scenario where we implicitly pin helper processes to
> the same CPUs as the emulator threads and/or QEMU process-global pinning
> mask. eg
>
> If we only had
>
>   <vcpu placement='static' cpuset='2-3' current='1'>2</vcpu>
>
> Traditionally the emulator threads, i/o threads, vCPU threads will
> all float across host CPUs 2 & 3. I was assuming we also placed
> helper processes in these same 2 host CPUs. Not sure if that's right
> or not. Assuming we do, then...
>
> Let's say CPUs 2 & 3 are SMT siblings.
>
> We have helper processes in the default core scheduling
> domain and QEMU in a dedicated core scheduling domain. We
> lose 100% of concurrency between the vCPUs and helper
> processes.
So in this case users might want to have helpers and emulator in the
same group. Therefore, in qemu.conf we should allow something like:

  sched_core = "none"      // off, no SCHED_CORE
               "emulator"  // default, place only emulator & vCPU
                           // threads into the group
               "helpers"   // place emulator & vCPU & helpers into
                           // the group

I agree that "helpers" is a terrible name, maybe "emulator+helpers"?
Or something completely different? Maybe:

A scalar is nice, but we can just call it "full" or "all", as in
the opposite of "none".
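
To make the naming discussion concrete: on Linux >= 5.14 the grouping
itself is done with prctl(PR_SCHED_CORE, ...). A minimal sketch of what
the "all" setting would roughly do - create a cookie for QEMU and push
it onto a helper as it is spawned - with the constants spelled out
since libc headers may not provide them, and an illustrative function
name:

  #include <sys/prctl.h>
  #include <sys/types.h>

  /* Values from linux/prctl.h, spelled out in case the installed
   * headers are too old (the interface exists since Linux 5.14). */
  #ifndef PR_SCHED_CORE
  # define PR_SCHED_CORE          62
  # define PR_SCHED_CORE_CREATE   1  /* create a new cookie */
  # define PR_SCHED_CORE_SHARE_TO 2  /* push our cookie to <pid> */
  #endif

  #ifndef PIDTYPE_TGID
  # define PIDTYPE_TGID 1  /* apply to the whole thread group */
  #endif

  /* Illustrative only: create a core scheduling cookie for the
   * calling process (e.g. the QEMU emulator), then copy it onto a
   * helper, so the two may share SMT siblings but nothing else can. */
  static int
  share_sched_core_with(pid_t helper_pid)
  {
      if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
                0, PIDTYPE_TGID, 0) < 0)
          return -1;

      return prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO,
                   helper_pid, PIDTYPE_TGID, 0);
  }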
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org   -o-    https://www.instagram.com/dberrange :|