On Mon, 2020-03-09 at 18:04 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2020 at 06:09:13PM +0100, Andrea Bolognani wrote:
> > On Fri, 2020-03-06 at 17:49 +0000, Daniel P. Berrangé wrote:
> > > On Fri, Mar 06, 2020 at 06:24:15PM +0100, Andrea Bolognani wrote:
[...]
> > Aside: instead of a per-VM setting, would it make sense for this to
> > be a connection-level setting? That way, even on traditional libvirt
> > deployments, the hypervisor admin could eg. opt out of machinectl
> > registration if they so desire; at the same time, you'd avoid the
> > situation where most VMs are confined using CGroups but a few are
> > not, which is probably not a desirable scenario.
>
> Yes, functionally it would work fine as a connection-level setting
> too, though this hides the behaviour from anyone looking at the
> guest config. We've previously punted quite a few things to the
> qemu.conf because we didn't want to go through the process of
> representing them in the domain XML. This was OK when the qemu.conf
> settings were something done once at host deployment time.
>
> With the embedded driver, I think this is not so desirable, as it
> means that, to get the configuration they want for a VM, applications
> need to deal with two distinct config files. The ideal would be that
> everything that is commonly needed can be achieved solely in the
> domain XML, and I think the resource control backend is one such
> common tunable.

I don't have a strong opinion either way, and as far as my current
use is concerned it doesn't bother me to have to deal with a second
configuration file. The reason why I thought a per-VM setting might
not be desirable is that applications would then be able to
override it, and so maybe VMs created with virt-manager would be
registered against machinectl but VMs created using GNOME Boxes
would not, and if the sysadmin likes to use machinectl to get a
comprehensive view of the system they'd no longer be guaranteed
that. But if that's not the kind of scenario we think we should
prevent, then a per-VM setting is fine by me :)
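
Just to make the per-VM variant concrete, I imagine it would boil
down to something along these lines in the domain XML. To be clear,
the attribute below is entirely made up for illustration and does
not exist in libvirt today; only the surrounding <resource> and
<partition> elements are real:

  /* Hypothetical sketch only: <resource> and <partition> are existing
   * domain XML elements, but the "register" attribute is invented here
   * to show what a per-VM knob might look like; it is NOT an existing
   * libvirt feature.  In a real application the fragment would be part
   * of the XML passed to virDomainDefineXML(). */
  #include <stdio.h>

  static const char *resource_fragment =
      "<resource register='none'>"            /* hypothetical attribute */
      "  <partition>/machine</partition>"
      "</resource>";

  int main(void)
  {
      printf("%s\n", resource_fragment);
      return 0;
  }
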
[...]
> > Right now we're already doing
> >
> >   qemu-$domid-$domname
> >
> > where I'm not entirely sure how much $domid actually buys us.
>
> IIRC $domid was a hack because at one time we had problems with
> systemd not cleaning up the transient scope when QEMU was killed.
> This would prevent libvirt starting the guest again thereafter.
> I can't remember now if this was a bug we fixed in systemd or
> libvirt or both or neither.

I see! It would be neat if we could get rid of it, assuming of course
it's no longer needed on the platforms we target.
[...]
> > > Of course you can turn off virtlogd via qemu.conf
> >
> > That's what I'm doing right now and it works well enough, but I'm
> > afraid that requiring a bunch of setup will discourage developers
> > from using the embedded driver. We should aim for a reasonable out
> > of the box experience instead.
>
> Why do you need to turn it off though? It should already
> "do the right thing", as the log files should appear under a
> different location and not have any clash. Turning it off
> immediately creates a denial of service CVE in your application.

I was getting the same SELinux denial that Michal reported a few
days back: virtlogd wants to verify that it's being connected to by a
process running as root, but the policy only allows it to look into
libvirtd's /proc/$pid for this information. So, for the same reason
virtqemud can't currently connect to virtlogd when SELinux is in
enforcing mode, neither can my qemu:///embed-using application.

Besides that, there is the fact that a lot of people, mainly those
coming from a containers background, are not happy with having extra
daemons running. I'm not saying they would rather be hit by a DoS
than have virtlogd running :) but they really, really don't like
daemons :)

> None the less, as above, I think we need common things to be
> controllable via the domain XML. So either we need to add a
> tunable there for whether to use logd or not, or we need to
> implement the FIFO idea to avoid the need for logd at all.

It seems like the FIFO idea (though I'll admit I don't fully
understand it) would be the best of both worlds.
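
For what it's worth, my rough understanding of it is something like
the sketch below (purely illustrative, not how libvirt would actually
implement it): QEMU gets pointed at a FIFO instead of a plain file,
and the process that spawned it drains the FIFO into the real log
file while enforcing the size cap itself, so there is no separate
daemon and QEMU still can't grow the log without bound.

  /* fifo-log.c: rough sketch of the FIFO idea; paths and the size cap
   * are made up, and spawning QEMU with its output pointed at the FIFO
   * is omitted. */
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <unistd.h>

  #define LOG_MAX_BYTES (4 * 1024 * 1024)   /* arbitrary cap */

  int main(void)
  {
      const char *fifo = "/tmp/guest1-log.fifo";
      const char *logfile = "/tmp/guest1.log";
      char buf[4096];
      off_t written = 0;
      ssize_t n;
      int in, out;

      if (mkfifo(fifo, 0600) < 0 && errno != EEXIST) {
          perror("mkfifo");
          return 1;
      }

      /* open() on the FIFO blocks until the writer (QEMU) shows up */
      in = open(fifo, O_RDONLY);
      out = open(logfile, O_WRONLY | O_CREAT | O_APPEND, 0600);
      if (in < 0 || out < 0) {
          perror("open");
          return 1;
      }

      while ((n = read(in, buf, sizeof(buf))) > 0) {
          if (written + n > LOG_MAX_BYTES) {
              /* a real implementation would roll the log over here;
               * the sketch just truncates and starts again */
              if (ftruncate(out, 0) < 0)
                  break;
              written = 0;
          }
          if (write(out, buf, n) != n)
              break;
          written += n;
      }

      close(in);
      close(out);
      return 0;
  }
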
[...]
> > > If you don't want to use virtnetworkd, then you won't be
> > > creating such an <interface> config in the first place.
> > > The app will have the option to open an embedded secondary
> > > driver if desired. Some of the drivers won't really make
> > > sense as embedded things though, at least not without
> > > extra work. i.e. an embedded network or nwfilter driver has
> > > no value unless your app has moved into a new network
> > > namespace, as otherwise it will just fight with the
> > > global network driver.
> >
> > Again, I think our defaults for qemu:///embed should be consistent
> > and conservative: instead of having developers opt out of using
> > network:///system, they should have to opt in before global
> > resources are involved.
>
> They already opt in to using the network driver by virtue of
> configuring their guest to request its use. We don't need to
> opt in twice.

For some applications that's definitely an option, but others, like
virt-qemu-run, accept a fairly arbitrary configuration, and having to
figure out whether that would result in eg. virtnetworkd being used,
and whether that's acceptable, is extra work as well as potential
duplication of logic that already exists in libvirt...
Then again, something like virt-qemu-run is probably expected to
expose basically the entire feature set of libvirt, as opposed to
more targeted applications which will use qemu:///embed internally
and only rely on a relatively small subset.
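
To illustrate what the opt-in looks like from the application's side,
here's a minimal sketch (the root path and the guest XML are made up):
whether virtnetworkd gets involved at all comes down to whether the
XML handed to the embedded driver contains an
<interface type='network'> element.

  /* embed-demo.c: minimal sketch of an application using the embedded
   * QEMU driver; the root path and the guest XML are illustrative. */
  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      /* everything this driver instance does is rooted under an
       * application-private directory */
      virConnectPtr conn = virConnectOpen("qemu:///embed?root=/var/lib/myapp");
      virDomainPtr dom;

      if (!conn) {
          fprintf(stderr, "failed to open the embedded QEMU driver\n");
          return 1;
      }

      /* only because this XML contains <interface type='network'> will
       * the driver end up talking to virtnetworkd; drop the element, or
       * use e.g. type='user', and no network driver is involved */
      const char *xml =
          "<domain type='kvm'>"
          "  <name>demo</name>"
          "  <memory unit='MiB'>512</memory>"
          "  <os><type arch='x86_64'>hvm</type></os>"
          "  <devices>"
          "    <interface type='network'>"
          "      <source network='default'/>"
          "    </interface>"
          "  </devices>"
          "</domain>";

      dom = virDomainCreateXML(conn, xml, 0);
      if (!dom)
          fprintf(stderr, "failed to start the guest\n");
      else
          virDomainFree(dom);

      virConnectClose(conn);
      return 0;
  }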

> > If we don't do that, I'm afraid developers will lose trust in the
> > whole qemu:///embed idea. Speaking from my own experience, I was
> > certainly surprised when I accidentally realized my qemu:///embed
> > VMs were showing up in the output of machinectl, and now I'm kinda
> > wondering how many other ways the application I'm working on, for
> > which the use of libvirt is just an implementation detail, is poking
> > at the system without my knowledge...
>
> First off, this misunderstanding highlights a need for better
> documentation to set out what the embedded driver is and is not
> expected to do.

We definitely need to document this very clearly if we want
qemu:///embed to gain traction.

> At a high level, the embedded QEMU driver:
>
>   - Isolated from any other instance of the QEMU driver

Yup.

>   - Process context of app is inherited by QEMU (namespaces,
>     cgroups, CPU pinning, security context, etc)

Optionally! The fact that libvirt can deal with these is a big
selling point in some scenarios.
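
For instance (just an illustrative fragment, though the elements
themselves are real domain XML), an application using qemu:///embed
can still ask libvirt to take care of CPU placement for the guest
instead of relying purely on the inherited process context:

  /* illustrative fragment of (real) domain XML that an embedded-driver
   * application could include so that libvirt handles vCPU placement
   * itself rather than leaving it to the inherited context */
  static const char *cputune_fragment =
      "<vcpu placement='static' cpuset='0-3'>2</vcpu>"
      "<cputune>"
      "  <vcpupin vcpu='0' cpuset='0-1'/>"
      "  <vcpupin vcpu='1' cpuset='2-3'/>"
      "</cputune>";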

>   - All other functionality is unchanged.
>
> Your comments above are equating two distinct scenarios: one which
> had a serious functional bug & violated the first two design goals,
> and one which does not cause any functional issues at all.

I'm not equating the two, just reporting a bunch of behaviors that
I ran into while trying to use qemu:///embed in my application and
that I found to be surprising, in an attempt to figure out which
ones are intentional and whether even those are necessarily
something that we want to keep around in their current form.

> There's no question that we must fix the machined problem.
>
> There is nothing that needs fixing in the virtual network case, as
> that behaviour is intentional and is not causing any ill effects.
> The embedded driver is NOT intended to be completely isolated from
> everything on the host, whether that is systemd, another host OS
> service, a libvirt secondary driver, or something else again.
> In fact, a libvirt secondary driver should really just be considered
> the same as any other host OS service. We just happen to use a
> familiar RPC API to talk to that secondary driver, whereas for a
> host OS service we'd use DBus or some 3rd party API/protocol.
>
> We don't require a special config in the QEMU driver to permit the
> use of other host OS services; we simply do the right thing based
> on whatever domain XML the application provides. The same is true
> of the use of libvirt secondary drivers. We must NEVER connect to
> a secondary driver unless something in the domain XML requests that
> behaviour. An <interface type='network'> is a valid reason to
> connect to the virtnetworkd daemon.

I mean, one could argue that namespaces and CGroups are OS services,
and whether or not those are used is decided not based on the per-VM
configuration but the per-driver configuration :) But I'm not here
to play devil's advocate, and I mostly get your point.
Anyway, as I mentioned earlier, part of the issue is having extra
daemons running, which neither of the features mentioned above
requires.

> Of course, when we do connect to virtnetworkd, we MUST ensure that
> anything we do preserves isolation from other QEMU driver instances.
>
> I would also note that the virtnetworkd daemon we connect to may
> or may not actually be the same as the host OS one. It is entirely
> possible that the application has opened the embedded QEMU driver
> from within a namespace, such that a connection to
> /var/run/libvirt/virtnetworkd ends up being serviced by a completely
> isolated virtnetworkd running inside a different network namespace
> from the host OS virtnetworkd. This is of no concern to the QEMU
> driver though - it just needs to honour what is in the domain XML.

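Just so I'm sure I understand the scenario you're describing, it
would be something roughly like this? (Very loose sketch: paths are
made up, privilege handling and actually spawning the private
virtnetworkd are glossed over.)

  /* sketch of an application running its own virtnetworkd inside
   * private namespaces, as described above */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <sys/mount.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      /* new mount and network namespaces for this process */
      if (unshare(CLONE_NEWNS | CLONE_NEWNET) < 0) {
          perror("unshare");
          return 1;
      }

      /* keep mount changes private to this namespace */
      if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0) {
          perror("mount private");
          return 1;
      }

      /* hide the host's /var/run/libvirt behind an app-private one, so
       * the virtnetworkd socket found there belongs to a daemon started
       * inside these namespaces rather than to the host one */
      if (mount("/var/lib/myapp/run", "/var/run/libvirt",
                NULL, MS_BIND, NULL) < 0) {
          perror("bind mount");
          return 1;
      }

      /* ... start a private virtnetworkd inside these namespaces ... */

      /* the embedded QEMU driver opened from here reaches that private
       * virtnetworkd through the usual socket path */
      virConnectPtr conn = virConnectOpen("qemu:///embed?root=/var/lib/myapp");
      if (conn)
          virConnectClose(conn);
      return 0;
  }
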
This kind of setup sounds extremely confusing.
Would you have multiple instances of virtnetworkd, one per network
namespace, all running at the same time? How would the application
pick the correct one to connect to?
--
Andrea Bolognani / Red Hat / Virtualization