On 10/2/18 10:14 AM, Andrea Bolognani wrote:
Background
==========
We have plenty of features in libvirt, some of which were designed at
a time when the virtualization story was much more straightforward
than the multi-architecture, multi-hypervisor, multi-machine world we
currently live in and, while we have found ways to keep the APIs
chugging along, the result is sometimes somewhat confusing for users
and application developers, as well as requiring libvirt developers
themselves to spend quite a bit of collective time working around
decisions that, in hindsight, turn out to have been less than
fortunate.
I've read the other responses and as difficult as it is to type - yeah,
it seems we're stuck with what we have. Hard to undo old decisions as
difficult as they make future life. Being stuck with those decisions
makes future adjustments that much harder unless we can find a better
way to fracture things between legacy and current (which yes has it's
own downsides).
Spending time coding and reviewing on "warnings" serves only those that
keep up with the latest and greatest serves what purpose? Until it stops
working, who notices?
Will someone who develops code that hasn't kept up with the advances in
virtualization see, know, or care? They have something that works and
won't change it. They're probably also holding onto their Commodore
VIC-20 hoping for a comeback, too.
I would think the "bulk" of the scope is less old API's and more the
default value problem. Is this a lazy or uninformed consumer problem?
The default value problem seems to be limited to QEMU machine types and
PCI model/type chosen. If QEMU changed to default to q35 and/or removed
i440fx, then IIUC there'd be lots of problems. No small wonder so many
vendors of physical systems go out of business. Bad defaults. That's not
to say those that are left are providing systems with great defaults,
just better than the others. The whole conversation seems to be a very
circular mess/problem. Libvirt's history may be to not make policy
decisions, but it does seem there was one core policy decision that is
haunting us when trying to move forward.
Two concrete examples are considered here: one is the
virConnectNumOfDomains() API which, while known to be racy and having
non-racy alternatives, can still be used by developers without
getting any kind of warning in the process; the other one is the
ability to define a domain without specifying the machine type, which
is becoming increasingly problematic now with some x86_64 features
being limited to q35 and downstreams looking to push for its
adoption, as well as being a manifestation of the more general
problem of libvirt's default being sometimes too conservative and at
odds with the existence of slimmed-down QEMU binaries being built
with reducing the total attack surface in mind.
I know a moot point now, but...
connectNumOf[Defined]Domains may be used without calling
connectList[Defined]Domains. The latter is actually the one I'd think is
more buggy as opposed to using connectListAllDomains. I think the same
argument can be made for other subsystems with both connectList and
connectListAll (e.g. most of them). Still, since connectListAllDomains
was introduced in 0.9.13 (01-Jul-2012) - I have to imagine anyone
developing code in 2018 would use that instead of the old API's.
Having a proper deprecation story in libvirt would allow us to point
users and developers towards the recommended solution in each case,
be it using a different API or querying libosinfo for information;
after a generous grace period, we could then remove the problematic
functionality altogether. This would be a more conservative version
of the process we already have in place for dropping support for
older QEMU releases, which recently has allowed us to ax massive
chunks of effectively dead code and simplify parts of libvirt quite
significantly.
We cannot deprecate old stuff, but can we declare new stuff comes with
limited lifetimes? IOW: learn from our previous mistakes and not allow
them to happen again. Maybe one of us will still be here in that
lifetime and get to remove something!
Even though we cannot deprecate, we could move stuff around such that
certain things end up on *_legacy.{c,h} files and if problems arise in
them, we could have a general "support" statement that would hopefully
prod the up the stack app to try and enter the modern ages. Perhaps a
time consuming short term with longer term benefits.
This series explores one possible approach to the problem and aims
to spark project-wide discussion around the topic.
Game on ;-)!
This series only 'solves' or 'advises of' the problem for anyone that
would install whatever libvirt version it got into for both client
(virsh specific too) and libvirtd/API. Doesn't help other combinations.
They wouldn't necessarily know the message occurred.
Some random thoughts without putting too much thought into each.
* Can we pull of the python-3 and python-2 magic trick with virsh?
Perhaps creating virsh-legacy that gets fork/exec'd if we cannot find
some new API we create circa 2018-2019 that knows how to only use newer
API's and perhaps (new) API to generate newer/better defaults. I know
that only solves the virsh problem, but one would think virsh-{python,
perl, etc} would be able to pull off similar tricks. Other libvirt API
consumers would be able to use the new API too. Some day soon they are
going to have to solve the same problem especially if they provide
defaults or have configurations using legacy or older options when
something more newer is available.
* Can we come up with a mechanism that allows domain admins to update
defaults automagically? Perhaps some new attribute that indicates such a
desire.
* How about a new domine define flag that says pick the latest and
greatest defaults. Of course that assumes someone is using the later and
greater *Flags API too!
* How about a new tools/virt* image that can be run providing
information about existing [defined only?] domains that would point out
which definitions may use old technology or at least informs a user what
may need to change based on current technology? Thus when a domain is
stopped, you run the tool, it changes (for example) the machine type and
then runs the post parse and verify code and either automatically cleans
up or allows the admin to keep updating until they fix things
themselves. Puts the onus on someone else to make the "policy decision".
* Can/should the Default Machine probing be split into a connect* type
driver function and then used by new apps that care?
* How much of the "default values" code for various machines or
interconnects be split into loadable modules based on machine type?
We're splitting cgroups into v1 and v2 right now. Could we do the same
to have machine specific code in specific modules? Does i440fx use
pci-e? IIRC q35 uses pci-e only.
John
Further work
============
* Fix the known issues listed below as well as all not-yet-known
issues that will undoubtably surface through discussion :)
* Introduce a mechanism to catch use of deprecated APIs at build
time, similar to GLib's G_DISABLE_DEPRECATED, to help application
developers proactively move off problematic APIs.
* Create a formal deprecation policy with well-defined rules and
time scales in the spirit of the existing one covering our
relationship with QEMU.
Know issues
===========
* For the more granular (and more interesting) type of deprecation
shown in patch 6/7, warnings are not being reported back to the
client as expected. I believe this is caused by the RPC code
looking for either a failure, in which case the virError is
transmitted, or a success, in which case the actual return value
is: we'll have to figure out a way for the error to travel across
the wire regardless of whether or not the API call was ultimately
successful if we want clients to actually receive warnings when
non-local drivers are involved.
Andrea Bolognani (7):
util: Add 'level' argument to virReportErrorHelper()
util: Introduce virReportWarning()
tools: Print warnings in virsh
util: Introduce VIR_ERR_DEPRECATED_FEATURE
Deprecate virConnectNumOfDomains()
Deprecate missing machine type in virDomainDefineXMLFlags()
tools: Force virsh to use deprecated features
include/libvirt/virterror.h | 1 +
src/access/viraccessdriverpolkit.c | 2 +-
src/access/viraccessmanager.c | 2 +-
src/conf/domain_conf.c | 11 ++++++++---
src/datatypes.h | 30 ++++++++++++++++++++++++++----
src/libvirt-domain.c | 6 ++++++
src/libvirt.c | 1 +
src/util/virbuffer.c | 4 ++--
src/util/virconf.c | 6 ++++--
src/util/virerror.c | 10 +++++++++-
src/util/virerror.h | 15 ++++++++++-----
src/util/virkeyfile.c | 6 ++++--
src/util/virxml.c | 2 +-
tools/virsh-domain-monitor.c | 2 ++
tools/vsh.c | 3 +++
15 files changed, 79 insertions(+), 22 deletions(-)