Peter Krempa, Oct 07, 2024 at 15:42:
On Mon, Oct 07, 2024 at 12:03:17 +0200, Anthony Harivel wrote:
> Hi,
>
> Peter Krempa, Oct 03, 2024 at 17:33:
> >> Anthony Harivel, Sep 03, 2024 at 14:41:
> > [...]
> >
> >> If I may resume the conversation:
> >>
> >> 1) The helper daemon is primarily needed for security reasons to prevent
> >> potential leaks of confidential information through RAPL.
> >>
> >> 2) There are two main ways to handle the helper daemon:
> >> > Single instance for all QEMU processes:
> >> This requires adjusting socket permissions (chmod/chown) to control
> >> access securely.
> >> > One instance per QEMU process: This offers better isolation but
> >> might increase CPU overhead slightly if the number of instances
> >> running get high.
> >>
> >> 3) Libvirt can manage both approaches, but no existing SELinux policy
> >> covers this yet, so we’ll need to consider that.
> >
> > The question is also which of the approaches will actually be used.
> > Implementing both is possible, it's just whether the managed approach
> > will be used in the end.
> >
>
> In the end, the one that will be used is the one that makes things the
> more transparent/easier for the user.
>
> If libvirt does not manage the daemon, who is going to do it ?
> The vmsr daemon could also be managed by systemd. But again who is
> going to setup this ?
>
> My goal is to make this feature the easiest possible to deploy:
> - Install qemu (the daemon is installed on the system)
> - Install libvirt (the daemon is managed by it)
> - add the feature to the VM XML
>
> And it works.
>
> Users need this ideally as simple as this.
Okay that was really not obvious from the initial patch that you wanted
to achieve the above.
As noted it makes sense to have the managed version if it's going to be
used this way, but the initial implementation didn't make it seem as
such.
And you are absolutely right ! Sorry about the confusion.
My initial patch was only about enabling the feature in the XML file that
creates the right QEMU command line.
Daniel came into the discussion and gave me some thought about the
daemon. Now I've decided that it should be better to have it this way:
one commit to enable the feature in libvirt and another one to manage
the daemon, both coming from the same patch queue.
Bear with me while I'm doing this for the v3.
Thanks,
Anthony
> >> 4) qemu-pr-helper is very similar to the
qemu-vmsr-helper in terms of
> >> architecture and in terms of privileged needs. It can be single
> >> shared instance or per-vm instance.
> >>
> >>
> >> I cannot find the original patch that has added qemu-pr-helper into
> >> libvirt. The latest commit in src/qemu/qemu_process is 7eead248c65f
> >> and it doesn't look right IMHO.
> >>
> >> If you have a reference to the original patch, I can use it as a guide
> >> to implement this helper in the same way, if you believe that is the
> >> best solution.
> >
> > The qemu-pr-helper was introduced in libvirt in commits
> > b0cd8045f012af78e863cd19f74e9db6c1b5dfdd and few prior ones. Note that
> > it was a very long time ago so the code will likely no longer be state
> > of the art. Also note that for 'qemu-pr-helper' it's much more
> > complicated as it needs to be handled also for hotplug of disks when a
> > new instance might need to be started.
> >
> > Here you'll only ever have one instance, that shares the lifetime with
> > the VM, which makes few things much simpler (much less wiring up weird
> > corner cases).
> >
> > In case we do in fact want a libvirt-managed version of this the
> > question is also how to handle potential cases e.g. when the
> > libvirt-managed daemon gets killed/crashes.
> >
> > The PR manager daemon has some form of event handlign which will restart
> > and reconnect it. Is that needed here as well?
>
> It's a stateless daemon so it should be pretty simple.
>
> - The daemon crash: the MSR value will be 0 during the reboot of the
> daemon, the VM does not crash.
>
> - The VM crash: it doesn't matter for the daemon. The VM reboot, request
> the MSR, it works.
Fair enough. If the VM keeps running and it just breaks this feature
which looks mostly like statistics it should be fine if it simply stops
working in such cases.