On Thu, Apr 19, 2018 at 09:29:55 +0200, Andrea Bolognani wrote:
> On Wed, 2018-04-18 at 17:22 +0200, Peter Krempa wrote:
> > On Wed, Apr 18, 2018 at 14:43:55 +0200, Andrea Bolognani wrote:
> > > I think using real-life capabilities instead of the synthetic data
> > > we use now is a step in the right direction.
> > >
> > > I'm still a bit uneasy about the "moving target" factor, though
> > > I've spent some time trying to come up with possible failure
> > > scenarios without much success...
> >
> > There are a few churn scenarios, some avoidable, some not so much:
> >
> > - When adding something which applies to a lot of configurations, or
> >   all of them (e.g. the recent addition of the new seccomp code), the
> >   change will touch a lot of expected output files.
> >
> > - Machine type. We can't use any of the alias machine types in the XML
> >   files as they will change when new capabilities with new machine
> >   types are added. This can easily be avoided by using an explicit
> >   machine type (e.g. pc-i440fx-2.10 rather than the pc alias).
> >
> > While the above may induce some churn in some scenarios, I think that
> > the benefit of at least seeing that something changed in places where
> > you did not expect any change may outweigh it.
>
> I'm not really worried about churn, but about situations where
> changes to libvirt would make it behave wrongly with old versions
> of QEMU but correctly with newer ones, possibly with respect to
> features that are not directly touched by the changes, in ways
> unforeseen by the developer.

Well, that is actually exactly why we need to check against the latest
capability set. If we only have a locked-in set of feature bits for
testing, it will catch regressions which happen with that set of
capabilities, but any regression which shows up only with a newly added
feature will not be caught, and those are the very sneaky ones.
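
To illustrate the idea: each test could be registered both against a
pinned capability dump and against the newest one. The macro names and
the test name below are purely illustrative, just to show the shape:

  #include <stdio.h>

  /* Illustrative stand-ins: the real macros would drive the
   * qemuxml2argv machinery, these only print what they would test. */
  #define DO_TEST_CAPS_VER(name, ver) \
      printf("test '%s' against the QEMU %s capability dump\n", name, ver)
  #define DO_TEST_CAPS_LATEST(name) \
      DO_TEST_CAPS_VER(name, "latest")

  int main(void)
  {
      /* pinned run: catches regressions against a frozen capability set */
      DO_TEST_CAPS_VER("hugepages-numa", "2.9.0");

      /* moving-target run: catches regressions that only show up once a
       * newly added capability changes the generated command line */
      DO_TEST_CAPS_LATEST("hugepages-numa");
      return 0;
  }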
> Right now that's easy to spot because everything is locked to some
> well-defined set of capabilities, but if we started using moving

Well, only half of the scenarios are covered, as I said above.

> targets for most tests we'd decrease coverage of older QEMU in
> favor of newer QEMU over time.

I don't think this will happen if we fork the test for a certain version
in cases where the output would change. If the output does not change, we
will not have to do anything. This will prevent having a lot of test
files which don't add any coverage.
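
Reusing the illustrative macros from the sketch above, forking on an
output change could look like this (the test name and versions are
again just examples):

  /* before: a single expected output, replayed against the latest dump */
  DO_TEST_CAPS_LATEST("virtio-disk");

  /* after a change that alters the output only for newer QEMU: fork it,
   * pinning the old behaviour while the latest run picks up the new
   * expected output file */
  DO_TEST_CAPS_VER("virtio-disk", "2.11.0");
  DO_TEST_CAPS_LATEST("virtio-disk");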
> > > I think ideally with each new feature the author would introduce
> > > three tests: a negative one locked to a QEMU version that doesn't
> > > have the required bits, a positive one locked to a QEMU version
> > > that has them, and a positive one against the latest QEMU version.
> >
> > I'm considering whether it's necessary to have the one locked to the
> > older-but-supporting version of qemu.
> >
> > My idea was that if somebody were to add something which would change
> > the output, the test could be forked prior to that. But it seems that
> > it may be a better idea to lock in the changes right away.
>
> I believe so. This would also make it so our test coverage always
> increases over time.
>
> Now, a crazy idea: what if the test author would define a baseline
> QEMU version where the test passes, and the test suite would
> automatically test all versions between that one and the latest?
It certainly is technically possible, as we could e.g. accept a range of
versions for the macro rather than a single version.
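
As a rough sketch of how such a range could expand (the dump list and
the helper below are made up for illustration), the suite would simply
replay the same input against every capability dump from the baseline
up to the newest one:

  #include <stdio.h>
  #include <string.h>

  /* made-up list standing in for the shipped capability dumps */
  static const char *dumps[] = { "2.7.0", "2.8.0", "2.9.0",
                                 "2.10.0", "2.11.0", "2.12.0" };

  /* run one test case against every dump from 'baseline' onwards */
  static void
  testRange(const char *name, const char *baseline)
  {
      int started = 0;
      size_t i;

      for (i = 0; i < sizeof(dumps) / sizeof(dumps[0]); i++) {
          if (!started && strcmp(dumps[i], baseline) != 0)
              continue;            /* skip dumps older than the baseline */
          started = 1;
          /* the real code would load the dump and compare the generated
           * command line against the expected output file */
          printf("%s: testing with QEMU %s capabilities\n", name, dumps[i]);
      }
  }

  int main(void)
  {
      testRange("hugepages-numa", "2.9.0");
      return 0;
  }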
I'm only worried that it's slightly too expensive. The good thing is
that if we fork the tests rigorously on any change, we should get
coverage for most code paths.
The unfortunate part is that qemuxml2argvtest now has 800 XML files for
testing (some of them are invalid, though). If we introduce Cartesian
testing against all capability dumps, it will expand the test case count
12-fold, to roughly 9,600 invocations, and that number will tend to grow
over time. (Thankfully Jano deleted a lot of the old crap, though.)
> That would mean a lot of churn whenever we add new capabilities
> data, but it should be feasible to detect output files that do not
> change and turn them into symlinks to reduce diffs and save disk
> space.
Actually, if a test is stable (which requires it to be as minimal as
possible) across a range of versions, adding new capabilities should be
free: we can test all versions in the range against the same output
file, which should only change when a new feature is added.

This only requires forking the test file if a change modifies anything
retroactively.
> We would also need a way to run a test only in a range of versions,
> for negative testing.
Well, the same as for positive testing. If we have infrastructure to
test a version range, then this should be possible as well. I'm not sure
though whether it's worth it at all.
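
If we did want it, it would presumably just be the failing counterpart
of the range idea above, along the lines of (macro and test names are,
again, purely illustrative):

  /* expected to be rejected by every dump that predates the feature... */
  DO_TEST_CAPS_RANGE_FAILURE("some-new-device", "2.7.0", "2.11.2");

  /* ...and expected to succeed from the version that introduced it */
  DO_TEST_CAPS_RANGE("some-new-device", "2.12.0", "latest");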
> Too far-fetched? :)
Slightly. But it certainly has some value. Unfortunately I can't put in
much more time beyond this initial idea, since I'm busy with the
blockdev stuff.