On Tue, Jul 20, 2021 at 02:19:25PM +0200, Erik Skultety wrote:
> On Thu, Jul 01, 2021 at 07:04:32PM +0100, Daniel P. Berrangé wrote:
> > Libvirt has consumers writing applications in a variety of
> > languages, and periodically reporting bugs. My general wish
> > for a test harness would be for something that can invoke
> > and consume arbitrary test cases. Essentially if someone
> > can provide us a piece of code that demonstrates a problem
> > in libvirt, I would like to be able to use that directly as
> > a functional test, regardless of language it was written
> > in.
> Well, in theory yes - it is all nice to let anyone contribute tests in their
> language of preference. Realistically, look at the nwfilter stuff in TCK, it's
> a pile of unintelligible Bash which cannot be debugged. So, I understand why
> you'd want to imagine or write tests in Go, but someone will have to maintain
> them, otherwise we'd end up with a 3rd framework nobody will want to use.
The nwfilter shell script was just badly written, and we should have
pushed back on accepting it in that state, but it was so broadly
functional that it was preferable to getting no testing. It could have
been just as horrible if written in Perl or Python too - the core
approach of the tests themselves was just wrong. The use of shell
was just icing on the cake.
The main point is that we have applications consuming libvirt in
multiple languages. Python and Go are the most important ones in
terms of usage today, and we get some quite detailed bug reports
with demo code. If we get a bug report with a Go demo program,
it can't be assumed to be easy to just convert that to Python,
and vice versa.
We do have to consider the maintainability of any tests being
contributed, but I don't want to be rejecting bug reproducers
that work, just because they're not in a specific language.
> Luckily, Python as a language doesn't suffer from this and we've already
> made that preference in the libvirt repo. So, I'd suggest we use our
> expertise in helping contributors who wish to contribute a test with porting
> their test case to a Python one - yes, it does consume human resources and
> prolongs the time for a contribution to be accepted, but that pays off later
> down the road when the test cases need to be adapted to new
> conditions/environments.
Certainly I'd say we can consider python as the default or preferred
language for writing tests, but I'd like us to be open to accepting
tests in other languages.
> > In theory the libvirt TCK allowed for that, and indeed the
> > few tests written in shell essentially came to exist because
> > someone at IBM had written some shell code to test firewalls
> > and contributed that as a functional test. They just hacked
> > their script to emit TAP data.
> Others have already responded that Avocado can do the same.
> What I'd like to say, again in the context of arbitrary languages used in
> tests, is that the problem with the whole bash machinery IBM contributed is
> not only that it cannot be debugged, but also that you cannot replay a single
> failed test, so if an error happened in e.g. test 1148, you have literally no
> idea how to work only with that test case. Knowing literally zero about
> Avocado's nrunner and external (non-native) test suite support I cannot tell
> whether Avocado would have done a better job than TCK (I doubt it), but what
> I know for sure is that if you write native tests, you get much better
> support and control over the test suite.
As above, the shell tests were just bad as an overall conceptual approach.
Running commands and then comparing the stdout/err against expected
output is way too fragile, and inflexible for debugging too. We see the
same problem in QEMU block layer tests too.
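To illustrate the distinction (a rough sketch, not code from either
framework - the 'demo' guest name is made up): rather than diffing the
verbatim output of a CLI tool against a stored expectation, a test can
parse the output into a structure and assert only on the properties it
actually cares about:

  import subprocess
  import xml.etree.ElementTree as ET

  def test_domain_has_a_disk():
      # Assumes a guest named 'demo' is already defined on the host
      out = subprocess.run(["virsh", "dumpxml", "demo"],
                           check=True, capture_output=True, text=True).stdout
      dom = ET.fromstring(out)
      disks = dom.findall("./devices/disk")
      # Assert on structure, not byte-for-byte output, so unrelated
      # formatting changes in the tool don't break the test
      assert any(d.get("device") == "disk" for d in disks)

That way a change in how the tool formats its output doesn't invalidate
the expected data.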
> IIRC it was still necessary in TCK to run some of the network-related
> tests with root permissions, but polkit may help (I haven't tried) - I sure
> don't want to run the test suite with root permissions to overcome some of the
> issues, because that doesn't simulate the real world usage. Therefore, I'd like
> libvirt Avocado as a framework to be set up by default in a way where polkit
> rules are in effect.
In theory you shouldn't need root permissions directly, because you just
have to grant access to libvirtd. In practice of course that is just a
facade, as a libvirt connection gives you indirect permissions equivalent to
a root shell. The tests as written were supposed to take care when interacting
with host resources - so only use NICs that are explicitly listed in the
config file, otherwise skip the test. This kind of approach is doable for
any test framework - it just requires someone to define the best practice
and reviewers to make sure people actually follow it.
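For illustration, a rough sketch of that practice (the config path, key
names and use of the avocado Test API here are assumptions for the
example, not an agreed format):

  import os
  import libvirt                # libvirt python binding
  import yaml                   # assumes PyYAML is available
  from avocado import Test

  # Hypothetical location for the shared host-resource config file
  CONFIG = os.path.expanduser("~/.config/libvirt-test/resources.yml")

  class HostNicTest(Test):
      def setUp(self):
          if not os.path.exists(CONFIG):
              self.cancel("No host resource config, refusing to touch host NICs")
          with open(CONFIG) as fh:
              conf = yaml.safe_load(fh) or {}
          # e.g.   nics: [eth1]    block_devs: [/dev/sdb]   (made-up keys)
          self.nics = conf.get("nics") or []
          if not self.nics:
              self.cancel("No NICs whitelisted for testing on this host")

      def test_nic_visible_to_libvirt(self):
          # Read-write connection without running as root - relies on the
          # user being granted org.libvirt.unix.manage, e.g. via a polkit rule
          conn = libvirt.open("qemu:///system")
          try:
              names = [iface.name() for iface in conn.listAllInterfaces()]
              self.assertIn(self.nics[0], names)
          finally:
              conn.close()

The important part is the pattern - cancel/skip when the host hasn't
explicitly opted in, rather than assuming it is safe to grab whatever
resources happen to exist.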
> > I think the test framework should not concern itself with this
> > kind of thing directly. We already have libvirt-ci, which has
> > tools for provisioning VMs with pre-defined package sets. The
> > test harness should just expect that this has been done, and
> > that it is already running in such an environment if the user
> > wanted it to be.
> I suggested Beraldo put this one in so as to make it very clear what our
> intentions are wrt/ coming up with a new framework and our expectations of it
> as an end result (like I said, we already have 2 frameworks, we don't want
> this one to end up the same).
> >
> > In the TCK config file we provided a setting for a URI to connect
> > to other hosts, to enable multi-host tests like live migration
> > to be done.
> We would not exclude live migration from the test suite, it's simply a beast on
> its own (mainly because of automation, more specifically preparing a known
> destination to migrate to). But again, ^this is not something the framework
> itself should be concerned about, just a background story :).
> >
> > > I'm adding more information with some details inside the README file.
> >
> > Overall, I'm more enthusiastic about writing tests in Python
> > than Perl, for the long term, but would also potentially like
> > to write tests in Go too.
> >
> > I'm wondering if we can't bridge the divide between what we
> > have already in libvirt-tck, and what you're bringing to the
> > table with avocado here. While we've not done much development
> > with the TCK recently, there are some very valuable tests
> > there, especially related to firewall support and I don't
> > fancy rewriting them.
> Nobody fancies rewriting those, but there's no way around that, we'll
> have to do it at some point because like I said, they're very error prone
> since they match against the verbatim output of firewall CLI tools which leads
> to frequent breakages, and figuring out what went wrong is pure torment.
Agreed, we'll have to bite the bullet eventually. The hard thing is
deciding just how we want to validate correctness. Comparing stdout
format is an "easy" way. If we don't do that we need to figure out a
way to declare what the firewall state is supposed to be, and how to
validate a match, as we want to validate that the correct set of
rules is actually generated.
Possibly we should have used a python binding for libxtables to
query the firewall state and thus been in control of the expected
data format.
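Something along these lines, say - just a sketch using the
python-iptables ("iptc") module rather than a real libxtables binding,
and the chain name prefix is only an assumption about what current
libvirt happens to create:

  import iptc

  def libvirt_filter_chains():
      # Query the live firewall state as objects instead of scraping
      # the stdout of iptables(8)
      table = iptc.Table(iptc.Table.FILTER)
      table.refresh()
      return [c for c in table.chains if c.name.startswith("LIBVIRT_")]

  def has_forward_accept(chain, bridge):
      # Look for an ACCEPT rule whose out-interface is the virtual bridge
      return any(rule.out_interface == bridge and rule.target.name == "ACCEPT"
                 for rule in chain.rules)

The test can then assert on the handful of rule properties that actually
matter, instead of carrying a verbatim dump of the expected iptables
output that breaks whenever the CLI tweaks its formatting.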
> > Thus my suggestion is that we:
> >
> >  - Put this avocado code into the libvirt-tck repository,
> >    with focus on the supporting infra for making it easy to
> >    write Python tests
> Clearly I missed the section where you mentioned that we don't need to host
> the framework in the main repo... Anyway, I disagree here; again, we have a
> precedent of 2 failed frameworks where nobody really knows where to contribute
> a test nor wants to deal with any additional repos. A few remarks:
> - it is much easier if the test suite lives alongside our unit test suite,
>   which is an "environment" every developer is familiar with
I don't buy that argument. We already have countless repos and people
who are motivated to contribute do so easily. The failures of the
existing frameworks are down to a number of reasons. The big one for
the TCK is that people really didn't want to write Perl code. For the
other framework, its complexity and the fact that it was not safe to
run on precious hosts put people off ever using it.
The biggest thing though is that we never even made an effort to
develop a culture of writing functional tests as a prerequisite
for merging features.
In QEMU there is an avocado integration in-repo, but very few
QEMU contributors ever do anything with it. The block iotests
integration is another functional test suite and that is very
successful, because the block maintainers have set a clear
expectation that you must contribute tests to it if you want
your features accepted.
We can see the same with other projects which have a strong
culture of testing. In OpenStack you'll not get your feature
accepted unless you've made a credible effort to provide
functional tests - which live in a completely separate git
repo.
> - integration into gitlab CI would be much easier with a single repo than
>   with a multi-repo setup
>   -> yes, we already do that with binding builds, but I'm talking about
>      setups where we'd have to configure gitlab to clone 3 repos and build
>      them in a chain, set up the framework config and execute the tests;
As you say, we already deal with that. Cloning extra git repos is
a complete non-issue in the grand scheme of everything needed to
create and maintain a CI pipeline. If we really wanted to we could
create a pseudo-monolithic repo and graft in the individual repos
using 'git subtree'.
> - keep in mind that we're trying to keep our efforts in sync with QEMU
>   (should apply vice-versa as well), so if avocado-qemu was accepted in QEMU
>   upstream as the current framework, which also happens to be hosted under
>   the main repo, I don't see why we could not do the same in libvirt
QEMU has a very different maintenance model, and its monolithic
repo approach is actually a significant burden to QEMU development.
QEMU would be better off in many ways by modularizing its repos.
As above, having it in the tree doesn't magically make people
contribute to it.
> - the main selling point IIRC of having TCK hosted in a standalone repo was
>   that you could test ANY version of libvirt
>   -> we didn't have CI back then, but we do now... so if we start running the
>      functional tests early, every upstream release would technically be
>      tested, so there would not be a need to test older versions of libvirt
>      with the same test suite, because it would have already been done before
>      releasing that specific version at some point in time
That's only true of unit tests because those run in isolation only
exercising the libvirt code. The functional / integration testing
is looking at the results of the entire system. You absolutely
need to run those tests in many different scenarios against both
pre-release and post-release code, because changes in the OS
will frequently break you. Changes in the OS will require fixing
the test suite and you then need to run the updated test suite
against the existing libvirt release. This is the key reason why
it is beneficial to have it cleanly separated from the repo.
We see this in Fedora/RHEL CI gating jobs, which run after the
RPM builds are done, using a fully composed OS environment. For
many packages we have to untangle the test suite from the main
app code, in order to be able to run it separately.
> >  - Switch the TCK configuration file to use JSON/YAML
> >    instead of its current custom format. This is to enable
> >    the Python tests to share a config file with the Perl
> >    and Shell tests. Thus they can all have the same way
> >    to figure out what block devs, NICs, PCI devs, etc
> >    they are permitted to use, and what dirs it is safe
> >    to create VM images in, etc.
> I'm worried that any more time we spend on trying to re-shape TCK into
> something modern is wasted compared to adopting Avocado. I've been fixing the
> tests in TCK for a while just to make sure TCK doesn't fail, because the
> original idea was for Avocado to utilize its nrunner to run TCK in its current
> shape and start testing early just to give us time to transition from TCK to
> Avocado by porting a good fistful of tests.
The TCK is frequently broken largely because we never run it, and so it
bit rots as the OS we're running it on changes beneath our feet. We see
the same problem with any of QEMU's functional tests that don't get run
on a frequent basis, including the avocado tests it has.
The only real solution there is to ensure whatever test suite we have
is actually run frequently and set an expectation that a maintainer's
job is more than just writing code - they should be expected to help
fix test failures that arise. If the CI is used as a gating check
this becomes a strong motivating factor, because contributors' code
will never get merged if the gating is broken.
With the very little experimentation I've done it looks like it should
be trivial to get avocado to run all the existing TCK tests by just
giving it a hint that they're TAP format. That looks like a small part
of the work involved in actually developing avocado into a solid test
framework for libvirt.
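Just to show how low the bar is for a test to participate - a made-up
example, not an existing TCK test - a TAP producer only needs to print a
plan and one ok/not ok line per test case on stdout:

  import subprocess

  def tap_check(num, description, argv):
      # Run a command and report the result as a TAP test line
      ok = subprocess.run(argv, capture_output=True).returncode == 0
      print(("ok" if ok else "not ok"), num, "-", description)
      return ok

  print("1..2")
  tap_check(1, "virsh binary is present", ["virsh", "--version"])
  tap_check(2, "test driver responds", ["virsh", "-c", "test:///default", "list"])

which emits "1..2", "ok 1 - virsh binary is present", and so on, in a
form any TAP-aware harness can consume regardless of the language the
test itself was written in.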
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|