On 9/21/22 1:44 PM, Jonathon Jongsma wrote:
On 9/19/22 8:48 AM, Peter Krempa wrote:
> On Wed, Aug 31, 2022 at 13:40:45 -0500, Jonathon Jongsma wrote:
>> After a bit of a lengthy delay, this is the second version of this patch
>> series. See
https://bugzilla.redhat.com/show_bug.cgi?id=2016527 for more
>> information about the goal, but the summary is that RHEL does not
>> want to ship
>> the qemu storage plugins for curl and ssh. Handling them outside of
>> the qemu
>> process provides several advantages such as reduced attack surface and
>> stability.
>
> IMO it's also worthy noting that it increases complexity of the setup
> and potentially also resource usage.
>
>> A quick summary of the code:
>
> [...]
>
>> Open questions
>> - selinux: I need some help from people more familiar with selinux
>> to figure
>> out what is needed here. When selinux is enforcing, I get a
>> failure to
>> launch nbdkit to serve the disks. I suspect we need a new context
>> and policy
>> for /usr/sbin/nbdkit that allows it to transition to the
>> appropriate selinux
>> context. The current context (on fedora) is
>> "system_u:object_r:bin_t:s0".
>> When I (temporarily) change the context to something like
>> qemu_exec_t,
>> I am able to start nbdkit and the domain launches.
>
> Is the problem with starting the 'nbdkit' process itself or with the
> socket?
The problem seems to be with the nbdkit process itself. Because I use
qemuSecurityCommandRun() to launch nbdkit, it sets the security label
for the nbkit process to be the security label for the domain and then
attempts to launch it. Presumably there is no rule allowing the binaries
with the bin_t context to transition to the label for the domain. But as
I said, my selinux knowledge is quite shallow, so I may need some help
figuring out the proper way to do this.
Just for completeness, here's some additional information from
SETroubleshoot when trying to start a domain using nbdkit with my
patches. In this case I'm just executing the monolithic libvirtd daemon
from my working directory for testing. The domain fails to start :
SELinux is preventing rpc-libvirtd from entrypoint access on the file
/usr/sbin/nbdkit.
Additional Information:
Source Context unconfined_u:unconfined_r:svirt_t:s0:c129,c164
Target Context system_u:object_r:bin_t:s0
Target Objects /usr/sbin/nbdkit [ file ]
Source rpc-libvirtd
Source Path rpc-libvirtd
Port <Unknown>
Raw Audit Messages
type=AVC msg=audit(1663876079.221:7519): avc: denied { entrypoint }
for pid=1750906 comm="rpc-libvirtd" path="/usr/sbin/nbdkit"
dev="dm-1"
ino=3160232 scontext=unconfined_u:unconfined_r:svirt_t:s0:c129,c164
tcontext=system_u:object_r:bin_t:s0 tclass=file permissive=0
Then if I temporarily do something like:
# chcon -t qemu_exec_t /usr/sbin/nbdkit
nbdkit (and the domain) starts fine
> At least in case of the socket we must make sure that no other
process
> can acess it especially once you pass authentication to nbdkit to avoid
> any kind of backdoor to authenticated storage.
nbdkit does have a --selinux-label argument which will set the label of
the socket.
>
> Few more open questions:
> - What if 'nbdkit' crashes
> With an integrated block layer, all of the VM crashes. Now when we
> have a separated access to disks (this is also an issue for use of
> the qemu-storage-daemon) if any of the helper processes crash we get
> into a new situation.
>
> I think we'll need to think about:
> - adding an event for any of the helper processes failing
> - adding a lifecycle action for it (e.g. pause qemu if nbdkit
> dies)
> - think about possible recovery of the situation
Indeed. Seems like something re-usable that could also be used for
qemu-storage-daemon might be useful. I'll look into it.
>
> - resource pinning
> For now all the resources are integral to the qemu process so
> emulator and iothread pinning can be used to steer which cpus the
> disk should use. With us adding new possibly cpu intensive processes
> we'll probably need to consider how to handle them more generally and
> manage their resources.
Good point. I hadn't really thought about this.
>
> - Integration with qemu storage daemon used for a VM
>
> With the attempt to rewrite QSD in other languages it will possibly
> make sense to run it instead of the native qemu block layer (e.g.
> take advantage of memory safe languages). So we should also think
> about how these two will be able to coexist.
>
> The last point is more of a future-work thing to consider, but the
> first two points should be considered for the final release of this
> feature. Specifically because in your current design it replaces the
> in-qemu driver even in cases when it is compiled into qemu thus also for
> existing users. Alternatively it would have to be opt-in.
As far as I can tell, this is currently no good way to determine (via
capabilities or otherwise) whether e.g. the curl driver is compiled into
a particular qemu binary. If I've missed something, please let me know.
My recollection of previous discussions was that I should try to use
nbdkit if available and then fall back to trying the built-in qemu
driver approach. But we could change this strategy if there's a good way
to do it. When you talk about opt-in, do you mean per-domain? e.g. in
the xml? or some other mechanism?
>
>> Known shortcomings
>> - creating disks (in ssh) still isn't supported. I wanted to send
>> out the
>> patch series anyway since it's been delayed too long already.
>
> That shouldn't be a problem, there's plenty protocols where we don't
> support creating the storage. Creating storage is needed only for
> snapshots so we can simply refuse to do it in the first place.
>