On 05/04/2015 08:59 AM, Ján Tomko wrote:
On Thu, Apr 30, 2015 at 04:40:46PM -0400, Laine Stump wrote:
> Back in 2013, commit 877bc089 added in some tests that made sure no
> error was generated on a domain definition that had an automatically
> added usb controller if that domain didn't have a PCI bus to attach
> the usb controller to. In particular, two s390-specific tests were
> added, one with <controller type='usb' model='none'/> and
another
> (called "s390-piix-controllers") that had both usb and ide
> controllers, but nothing attached to them.
>
It was a synthetic test (I never ran the command line) meant to check
the fix for starting s390 guests:
https://www.redhat.com/archives/libvir-list/2013-April/msg01920.html
which I posted instead of this proposed patch
https://www.redhat.com/archives/libvir-list/2013-April/msg01925.html
(Thanks for the extra history. I had extracted part of it from the
commit logs, but didn't know about that original patch.)
Yes, that definitely would not be the right fix. We shouldn't pretend
that a domain has hardware that it doesn't actually have (it was fixing
up yet another occurrence of that problem that led me to this).
> Then in February of this year, commit 09ab9dcc eliminated the annoying
> auto-adding of a usb device for s390 and s390x machines, stating:
>
> "Since s390 does not support usb the default creation of a usb
> controller for a domain should not occur."
>
> Since s390 doesn't support usb and usb controllers aren't added to
> s390 domain definitions automatically, there is no reason to have the
> tests with a usb controller and expect them to succeed.
QEMU seems to ignore the -usb parameter,
Will the badness never end? :-/
(seriously, this entire collection of problems/patches is all due to one
bit of code or another silently implying that nonexistent things exist
and/or not logging an error and failing when they actually don't exist.
If everyone involved would just *tell the truth* we could stop the
insanity).
so the only reason would be:
'It worked before'. (Unless running the qemu-system-s390x binary without
an guest on x86_64 does not give me results matching the real-world
usage).
I'm okay with breaking the controllers with explicit PCI addresses,
since libvirt stopped adding those almost 2 years ago.
Erroring out on bare <controller type='usb'> would break domains that
worked just three releases ago (the tests and the fix was added
precisely to avoid that).
Ugh. This is very distasteful, but I do see your point. Note that this
patch doesn't add any new error conditions, it just removes a test that
verifies acceptance of something that shouldn't have happened in the
first place. However, it was libvirt itself that created these
technically invalid configs (by auto-adding the <usb>, so it would be a
bit mean-spirited to suddenly begin failing.
The "other patch" I refer to in the commit log is one that results in an
error log only for IDE controllers in a domain that doesn't support IDE,
though. Since libvirt itself never adds an IDE controller automatically
unless someone adds a disk that attaches to an IDE controller, and since
any disk attached to a nonexistent IDE controller would result in an
error from QEMU on startup (except in the case of a Q35 domain, which is
exactly the ignored case where I *want* it to cause an error), I
seriously doubt that there are any configs in the wild that match the
following:
1) target domain doesn't support IDE
2) config has IDE controller
3) config has no IDE disk (and thus currently doesn't cause an error)
Also, USB controller model='none' should be allowed.
Yeah, I guess I can live with that one.
> And since the
> only reference of an IDE controller wrt s390 that I've found is in the
> one test case mentioned above, and the commit log that added it
> specifically mentions the purpose to be quieting error messages on
> machines with no PCI bus, I'm assuming that the s390 also doesn't
> support IDE controllers. Based on that reasoning
The IDE controller seem to be added only if something uses the IDE bus
(the problem seems to be the disk target starting with 'hd')
Right. So up until now there are only two ways an IDE controller would
have been added:
1) user adds a disk and says that it attaches to the IDE bus. qemu
generates an error, user removes disk.
or
2) user manually adds an IDE controller for no good reason.
or the *real* problem, which occurs only on Q35 domains (since the Q35
domain has a SATA controller that qemu gives the id "ide" (which is
identical to the integrated IDE controller on a 440fx domain):
3) user adds a disk and says that it attaches to the IDE bus. libvirt
generates commandline saying "attach this to id=ide", qemu says "okay,
I'm attaching this to the SATA controller named 'ide'" and proceeds on
its happy way. (this later causes a problem if the user attaches a disk
to the SATA bus, or a 2nd disk to the "IDE" bus).
libvirt should have *always* flagged all of these as errors - not
flagging (1) and (2) as errors is what led to the bug in (3) creeping in
and being unidentified for so long (see
https://bugzilla.redhat.com/show_bug.cgi?id=1207834 ) and leaving it
that way would only lead to more bugs.
Both tests can be fixed to avoid that.
Some amount of lenience for bad config is okay, but since allowing a
non-supported IDE controller into the code has been demonstrated to have
allowed a bug into the code that lay dormant for over a year and has
caused such a headache now, I think the proper thing to do is fix it and
take the short-term pain rather than continuing to gloss it over.
> (and the fact that
> s390-piix-controllers causes a test error for an upcoming patch),
Honestly, that is not a great reason to remove a test :)
The upcoming patch correctly identifies an IDE controller being defined
when it shouldn't be; the current s390-piix-controllers test is arguably
invalid - the test is just testing that we allow a bad config.
My opinion of this is that:
1) we shouldn't test to verify that we allow an invalid config. This is
institutionalizing bad behavior (and misleading those who read the
tests). (For awhile I was wondering if the s390 virtual machine actually
did have a piix controller on it - "there's a test case for it, so it
must exist")
2) I have no problem with *allowing* <controller usb ...> in domains
that have no support for USB, although I think we should add a
VIR_WARNING() pointing out that the config has an unsupported (and
therefore ignored) USB controller defined, and that sometime in the
future this may turn into an error. Then after [some undetermined time]
this VIR_WARNING() can be changed to virReportError().
3) Since ide controllers should only have been added by user
interaction, I think we should immediately flag them all as errors.
(neither 2 nor 3 is done by this current patch, but (3) is done by the
next patch after this one on my dev branch - I just sent this patch
early because I wanted to make sure the authors of 09ab9dcc had ample
time to respond).
Perhaps a reasonable compromise would be this:
1) rename the s390-piix-controllers test to
"s390-allow-bogus-usb-controller" (or something like that) and remove
the ide controller from it.
2) likewise rename "s390-usb-none" to
"s390-allow-bogus-usb-controller-none"
(the idea of both of these is to make it apparent to the untrained eye
that this is testing for something that isn't really valid, but we're
allowing for now).
3) in a separate patch, add a VIR_WARN() any time a domain has a USB
controller but doesn't support USB.
4) (already done in an upcoming patch) log an error if a domain has an
ide controller defined when it isn't supported.
How much (if any :-) or that would you agree to?