Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

19 Jun 2019


On Wed, Jun 19, 2019 at 11:57 AM Cornelia Huck <cohuck@redhat.com> wrote:
On Wed, 19 Jun 2019 11:04:15 +0200
Sylvain Bauza <sbauza@redhat.com> wrote:
On Wed, Jun 19, 2019 at 12:27 AM Alex Williamson <alex.williamson@redhat.com> wrote:
On Tue, 18 Jun 2019 14:48:11 +0200
Sylvain Bauza <sbauza@redhat.com> wrote:
On Tue, Jun 18, 2019 at 1:01 PM Cornelia Huck <cohuck@redhat.com> wrote:
I think we need to reach consensus about the actual scope of the
mdevctl tool.
Thanks Cornelia, my thoughts:
- Is it supposed to be responsible for managing *all* mdev devices in
  the system, or is it more supposed to be a convenience helper for
  users/software wanting to manage mdevs?
The latter. If an operator (or some software) wants to create mdevs
without using mdevctl (by writing to sysfs directly instead), I think
that's OK. That said, mdevs created by mdevctl would be supported by
systemctl while the others would not, but I think that's okay.
I agree (sort of), and I'm hearing that we should drop any sort of
automatic persistence of mdevs created outside of mdevctl.  The problem
comes when we try to draw the line between unmanaged and managed
devices.  For instance, if we have a command to list mdevs it would
feel incomplete if it didn't list all mdevs both those managed by
mdevctl and those created elsewhere.  For managed devices, I expect
we'll also have commands that allow the mode of the device to be
switched between transient, saved, and persistent.  Should a user then
be allowed to promote an unmanaged device to one of these modes via the
same command?  Should they be allowed to stop an unmanaged device
through driverctl?  Through systemctl?  These all seem like reasonable
things to do, so what then is the difference between transient and
unmanaged mdev and is mdevctl therefore managing all mdevs, not just
those it has created?
Well, IMHO, mdevs created by mdevctl could all be persisted or transient
just by adding an option when calling mdevctl, like:
"mdevctl create-mdev [--transient] <uuid> <pci_id> <type>" where the
default would be to persist the mdev.
This sounds useful; the caller can avoid fiddling with sysfs entries
directly, while not committing to a permanent configuration.
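For reference, a minimal sketch of the sysfs write that a create command like the one proposed above would wrap for a transient device. It relies on the kernel's mdev sysfs layout (writing a UUID to the `create` attribute under `mdev_supported_types/<type>/`). The function names and the `sysfs_root` parameter are made up for illustration and are not mdevctl's actual implementation. Persisting the device is deliberately left out.

```python
import os
import tempfile


def mdev_create_path(parent, mdev_type, sysfs_root="/sys"):
    """Return the sysfs 'create' attribute for a parent device and mdev type."""
    return os.path.join(sysfs_root, "class", "mdev_bus", parent,
                        "mdev_supported_types", mdev_type, "create")


def create_mdev(mdev_uuid, parent, mdev_type, sysfs_root="/sys"):
    """Create a transient mdev by writing its UUID to the 'create' attribute.

    sysfs_root is a hypothetical knob so the sketch can be tried against a
    fake tree; on a real host it would stay at /sys and need root privileges.
    """
    path = mdev_create_path(parent, mdev_type, sysfs_root)
    with open(path, "w") as f:
        f.write(mdev_uuid)
    return path


if __name__ == "__main__":
    # Exercise the sketch against a fake sysfs tree, so no hardware is needed.
    with tempfile.TemporaryDirectory() as root:
        attr = mdev_create_path("0000:00:02.0", "example-type", root)
        os.makedirs(os.path.dirname(attr))
        open(attr, "w").close()
        create_mdev("c0ffee00-0000-4000-8000-000000000001",
                    "0000:00:02.0", "example-type", root)
        with open(attr) as f:
            print(f.read())
```

The parent address and type name in the demo are placeholders; real type names come from whatever the parent driver lists under `mdev_supported_types`.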
For mdevs *not* created by mdevctl, a use case could be "I'd like to
ask mdevctl to manage mdevs I already created", and if so, an mdevctl
command like:
"mdevctl manage-mdev [--transient] <mdev_uuid>"
What kind of 'managing' would this actually enable? If we rely on
mdevctl working with sysfs directly for transient devices, I can't
really think of anything...
Just for getting the list of mdevs and seeing whether they are persistent.
Of course, that would mean that when you list mdevs by "mdev list-all"
you wouldn't get mdevs managed by mdevctl.
Thoughts?
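To make the managed/unmanaged distinction in a listing concrete, here is a rough sketch that compares what sysfs reports with what has a saved config. It assumes active mdevs appear as UUID-named entries under `/sys/bus/mdev/devices` and, purely as an assumption for this example, that saved configs are one file per UUID in a directory such as `/etc/mdev.d`. The function name and both parameters are hypothetical.

```python
import os


def list_mdevs(sysfs_devices="/sys/bus/mdev/devices", config_dir="/etc/mdev.d"):
    """Map each known mdev UUID to flags saying where it was found.

    'active' means the device currently exists in sysfs; 'saved' means a
    config file with that UUID as its name exists (assumed layout).
    """
    def entries(path):
        # A missing directory simply means there is nothing to report.
        return set(os.listdir(path)) if os.path.isdir(path) else set()

    active = entries(sysfs_devices)
    saved = entries(config_dir)
    return {
        uuid: {"active": uuid in active, "saved": uuid in saved}
        for uuid in sorted(active | saved)
    }


if __name__ == "__main__":
    for uuid, flags in list_mdevs().items():
        kind = "managed" if flags["saved"] else "unmanaged"
        state = "active" if flags["active"] else "defined only"
        print(f"{uuid}  {kind}  {state}")
```

Under this scheme an "unmanaged" device is just one that is active without a saved config, which is also how a transient device created through mdevctl would look; that is the ambiguity raised earlier in the thread.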
- Do we want mdevctl to manage config files for individual mdevs, or
  are they supposed to be in a common format that can also be managed
  by e.g. libvirt?
Unless I misunderstand, I think mdevctl just helps to create mdevs to
be used by guests created either by libvirt or QEMU or even others.
How a guest would allocate an mdev (i.e. saying "I'll use this specific
mdev UUID") is IMHO not something for mdevctl.
Right, mdevctl isn't concerned with how a specific mdev is used, but I
think what Connie is after is more the proposal from Daniel where
libvirt can essentially manage mdevctl config files itself and then
only invoke mdevctl for the dirty work of creating and deleting
devices.  In fact, assuming systemd, libvirt could avoid direct
interaction with mdevctl entirely, instead using systemctl device units
to start and stop the mdevs.  Maybe where that proposal takes a turn is
when we again consider non-systemd hosts, where maybe mdevctl needs to
write out an init script per mdev and libvirt injecting itself into
manipulation of the config files would either need to perform the same
or fall back to mdevctl.  Unfortunately there seems to be an ultimatum
to either condone external config file manipulation or expand the scope
of the project into becoming a library.
Well, like I said, I think it's maybe another use case: just using
libvirt when you want a guest having vGPUs, and then libvirt would create
the mdevs (so users wouldn't need to know about that).
That said, for the moment, I think we don't really need it, so maybe a new
RFE once we at least have mdevctl packaged and supported by RHEL?
If we allow config file handling directly, libvirt could start using it
even without mdevctl present? (Not sure if that makes sense.)
Well, sure.
- Should mdevctl be a stand-alone tool, provide library functions, or
  both? Related: should it keep any internal state that is not written
  to disk? (I think that also plays into the transient vs. persistent
  question.)
I don't think we want an mdevctld, if that's what you mean by internal
state not written to disk.
Yeah, mdevctld--.
I think we ideally want all state in the
mdev config files or discerned through sysfs.  How we handle
non-systemd hosts may throw a wrench in that though since currently the
systemd integration relies on a template to support arbitrary mdevs and
I'm not sure how to replicate that in other init services.  If we need
to dynamically manage per mdev init files in addition to config files,
we're not so self contained.
FWIW, I'd love to use mdevctl for OpenStack (Nova), at least for
creating persisted mdevs (i.e. mdevs that would be recreated after a
reboot using systemctl). That's the real use case I need.
Having libvirt support mdevctl internally would be nice, but that's
not really something Nova needs, so I'll leave it to others to give
their own thoughts.
My personal opinion is that mdevctl should be able to tolerate mdevs
being configured by other means, but probably should not try to impose
its own configuration if it detects that (unless explicitly asked to do
so). Not sure how feasible that goal is.
That's what I don't understand: in order to have a guest using a vGPU,
you need to do two things:
1/ create the mdev
2/ allocate this created mdev to a specific guest config
Of course, we could imagine a way to have both steps done directly by
libvirt, but in my opinion, mdevctl is really helping with 1/ and not
2/.
Yep, we also don't want to presume libvirt is the only consumer here.
mdevctl should also support other VM management tools, users who write
their own management scripts, and even non-VM related use cases.
Oh yes, please don't presume mdevctl is only needed by libvirt.
FWIW, once mdevctl is supported by RHEL, I'd love to use it for OpenStack
Nova at least because I want to persist the mdevs.
At the moment, Nova just creates mdevs directly via sysfs and looks the
existing ones up via sysfs, but the upstream Nova community thinks the
mission statement is not about managing mdevs, so we don't really want to
add some kind of DB persistence to Nova just for mdevs.
So, Nova would basically poke mdevctl, but not interact with the config
files directly? Or am I misunderstanding?
Correct, instead of doing something like
https://github.com/openstack/nova/blob/master/nova/privsep/libvirt.py#L207-L...
That said, Nova could do like libvirt and create a config file, for sure.
A well-defined config file format is probably a win, even if it only
ends up being used by mdevctl itself.
Yes, regardless of whether others touch them, conversion scripts on
upgrade should be avoided.  Do we need something beyond a key=value
file?  So far we're only storing the mdev type and startup mode, but
vfio-ap clearly needs more, apparently key=value1,value2,... type
representation.  Still, I think I'd prefer simple over jumping to xml
or json or yaml.  Thanks,
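As an illustration of how far a plain key=value file stretches, here is a small sketch of a parser that also accepts the key=value1,value2,... form mentioned for vfio-ap. The key names in the sample (`mdev_type`, `start`, `assign_adapter`) are invented for this example and are not a proposed format; every value is returned as a list so single and multi-valued keys look the same to the caller.

```python
def parse_mdev_config(text):
    """Parse key=value lines into a dict of key -> list of values.

    Blank lines and lines starting with '#' are skipped. A value may be a
    comma-separated list; empty items are dropped.
    """
    config = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        config[key] = [v.strip() for v in value.split(",") if v.strip()]
    return config


if __name__ == "__main__":
    sample = """\
# hypothetical per-mdev config
mdev_type=example-type
start=auto
assign_adapter=5,6
"""
    print(parse_mdev_config(sample))
```

The limits show up quickly: values containing commas or nested settings need an escaping rule, which is where a structured format starts to look attractive.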
Heh, in OpenStack, discussing a file format is possibly one of the
longest arguments we've ever had, so I'll leave it to others to give their
opinions, but FWIW, as we use Python we tend to prefer YAML files. I don't
care about the format tho, just take the most convenient for libvirt I'd
say.
Aww, and here I was looking forward to a nice file format discussion...
More seriously, as I said in my other reply, .ini style would be good,
but using JSON probably gives us more flexibility in the long run.