On Wed, Aug 31, 2022 at 13:40:45 -0500, Jonathon Jongsma wrote:
After a bit of a lengthy delay, this is the second version of this patch
series. See https://bugzilla.redhat.com/show_bug.cgi?id=2016527 for more
information about the goal, but the summary is that RHEL does not want to ship
the qemu storage plugins for curl and ssh. Handling them outside of the qemu
process provides several advantages such as reduced attack surface and
stability.
IMO it's also worth noting that it increases the complexity of the setup
and potentially also the resource usage.
A quick summary of the code:
[...]
Open questions
- selinux: I need some help from people more familiar with selinux to figure
out what is needed here. When selinux is enforcing, I get a failure to
launch nbdkit to serve the disks. I suspect we need a new context and policy
for /usr/sbin/nbdkit that allows it to transition to the appropriate selinux
context. The current context (on fedora) is "system_u:object_r:bin_t:s0".
When I (temporarily) change the context to something like qemu_exec_t,
I am able to start nbdkit and the domain launches.
Is the problem with starting the 'nbdkit' process itself or with the
socket?
At least in the case of the socket we must make sure that no other process
can access it, especially once you pass authentication credentials to
nbdkit, to avoid any kind of backdoor to authenticated storage.
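
To illustrate the socket concern, here is a minimal sketch (not what the
patch series does) of creating the Unix socket so that only the owning
user can reach it: the socket lives in a fresh directory with mode 0700 and
is bound under a restrictive umask. The function names, the "nbdkit-"
prefix and the "socket" file name are hypothetical. This covers only
ordinary file permissions; SELinux labelling of the socket would still be
needed on top.

```python
import os
import socket
import stat
import tempfile


def create_private_socket(prefix="nbdkit-"):
    """Bind a Unix socket inside a newly created owner-only directory.

    Returns (listening socket, directory path, socket path).
    """
    # mkdtemp() creates the directory with mode 0700, so other users
    # cannot even traverse into it to reach the socket.
    sock_dir = tempfile.mkdtemp(prefix=prefix)
    sock_path = os.path.join(sock_dir, "socket")  # hypothetical file name

    # On Linux the mode of the socket file follows the umask at bind()
    # time, so tighten it temporarily.
    old_umask = os.umask(0o077)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(sock_path)
    except OSError:
        server.close()
        raise
    finally:
        os.umask(old_umask)

    server.listen(1)
    return server, sock_dir, sock_path


def is_owner_only(path):
    """Return True if neither group nor others have any permission bit."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

The directory is the more important half: even if the socket file itself
ended up with looser permissions, nobody but the owner could look it up.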
A few more open questions:
- What if 'nbdkit' crashes
With an integrated block layer, a crash in the block layer takes down
the whole VM. Now that access to disks is separated out (this is also
an issue when using the qemu-storage-daemon), a crash of any of the
helper processes puts us into a new situation.
I think we'll need to think about:
- adding an event for any of the helper processes failing
- adding a lifecycle action for it (e.g. pause qemu if nbdkit
dies)
- thinking about possible recovery from the situation
- resource pinning
For now all the resources are integral to the qemu process, so
emulator and iothread pinning can be used to steer which CPUs the
disk should use. With us adding new, possibly CPU-intensive processes
we'll probably need to consider how to handle them more generally and
manage their resources.
- Integration with qemu storage daemon used for a VM
With the attempts to rewrite QSD in other languages it will possibly
make sense to run it instead of the native qemu block layer (e.g. to
take advantage of memory-safe languages). So we should also think
about how these two will be able to coexist.
The last point is more of a future-work thing to consider, but the
first two points should be considered for the final release of this
feature. Specifically because in your current design nbdkit replaces the
in-qemu driver even when that driver is compiled into qemu, so the change
also affects existing users. Alternatively it would have to be opt-in.
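
To make the first two points more concrete, here is a rough sketch of
what "event plus lifecycle action" and "pinning" could look like for a
helper process. None of this is existing libvirt API: HelperFailureAction,
pin_helper, supervise_helper and the pause_vm/emit_event callbacks are
hypothetical names, and the pinning part assumes Linux
(os.sched_setaffinity).

```python
import enum
import os
import subprocess
import sys


class HelperFailureAction(enum.Enum):
    # Hypothetical policy values, not existing libvirt settings.
    IGNORE = "ignore"
    PAUSE = "pause"


def pin_helper(pid, cpus):
    """Restrict process `pid` to `cpus` (Linux only); return the applied set."""
    # Only CPUs this process is itself allowed to use can be requested.
    target = set(cpus) & os.sched_getaffinity(0)
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


def supervise_helper(argv, action, pause_vm, emit_event, cpus=None):
    """Run a helper, optionally pin it, and react if it exits abnormally.

    `pause_vm` and `emit_event` stand in for whatever the management
    layer would do (pause the guest CPUs, raise a domain event).
    """
    proc = subprocess.Popen(argv)
    if cpus is not None:
        pin_helper(proc.pid, cpus)

    returncode = proc.wait()
    if returncode != 0:
        # Always report the failure; the lifecycle action is policy.
        emit_event({"helper": argv[0], "returncode": returncode})
        if action is HelperFailureAction.PAUSE:
            pause_vm()
    return returncode
```

A real implementation would watch the helper asynchronously rather than
block in wait(), and recovery (restarting the helper and reconnecting
qemu) is deliberately left out of the sketch.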
Known shortcomings
- creating disks (in ssh) still isn't supported. I wanted to send out the
patch series anyway since it's been delayed too long already.
That shouldn't be a problem; there are plenty of protocols where we don't
support creating the storage. Creating storage is needed only for
snapshots, so we can simply refuse to do it in the first place.