On 05/10/2016 03:30 AM, Ján Tomko wrote:
On Mon, May 09, 2016 at 11:57:20AM -0400, Cole Robinson wrote:
> On 05/09/2016 04:59 AM, Peter Krempa wrote:
>> On Fri, May 06, 2016 at 11:19:05 -0400, John Ferlan wrote:
>>>
>>>
>>> On 05/02/2016 10:32 AM, Peter Krempa wrote:
>>>> This reverts commit c79ebf53b5fe0a33bf407b3bcb49e3a27ec97eb4.
>>>>
>>>> We can't just add checks to the XML parser once we've accepted
such
>>>> configuration in the past.
But we can just revert the check once we've accepted it? :)
>>>> ---
>>>> src/conf/domain_conf.c | 22 ----------------------
>>>> tests/qemuxml2argvtest.c | 3 +--
>>>> 2 files changed, 1 insertion(+), 24 deletions(-)
>>>>
>>>
>>> There was a bz associated with that commit - that'll need to be
>>> addressed in some manner...
>>
>> Well, the initial assesment of that BZ was wrong. This should have been
>> fixed in virt manager at that point.
>>
>
> That bz:
https://bugzilla.redhat.com/show_bug.cgi?id=1201143
>
> Is loaded with GSS and other 'priority' tags. Maybe the original assessment
of
> the bug is wrong. But just reverting the commit with little mention of that
> until John dug it up is not helpful IMO
>
Neither is discussing a private BZ on a public list:
"You are not authorized to access bug #1201143. To see this bug, you
must first log in to an account with the appropriate permissions."
Personally not sure why it's private, seeing as that "decision point"
doesn't seem to have any consistency. For some bugs, it's easy, but for
this one, it's not clear. In any case, there's an external bug id
attached to it as well - although again - no idea how to point someone
at that. There's also errata associated with it.. There's another
private bug that was used to generate the "original" adjustments that
were only "run-time" related that resulted in commit id '33188c9fc' -
the bz associated with that uses the same external customer id.
Personally what I find interesting about the bug report is that the
consumer was indicating that the configuration should have been rejected
at the libvirt level. I cannot recall the details or any discussion that
happened last June over this (hence why I tend to be more verbose either
in code commits/comments, list responses, or bzs).
My point in NACK'ing wasn't that this isn't the right thing to do, it
was more towards what is going to be done to address the bz that was
associated with this. I'm not sure it'd be "good form" to just revert
something causing the original problem to surface again months later
without also dealing with the cause. Although perhaps by this time that
type of configuration is avoided.
>>> While I understand your point here, the configuration
didn't work - that
>>> is it couldn't be started anyway so there could not be a domain running
>>> with that configuration and thus it wouldn't disappear on a subsequent
>>> reload, hence why checking the config and rejecting "earlier"
seemed
>>
>> It does not kill any running domain, that's right. Defined domains still
>> vanish after that commit if they were defined before. That is still
>> unwanted.
>>
>
> Yes, this is problematic, but then again has this issue bitten us in the year
> since this was committed? We should still fix it, but it's not time
> critical... Maybe we come up with a better solution.
>
Recalling and following the bz text/comments - this whole domain
disappearing was brought up prior to posting a patch upstream, but alas
the memory cannot recall the details of why I pushed forward anyway. I
probably leaned more heavily towards the side of what the 'consumer' was
requesting in my decision.
>>> proper even though we hadn't rejected such a config
when the
>>> "mode='host'" was first implemented.
>>
>> Only when introducing a feature you are allowed to do a check that
>> rejects parsing XML, afterwards, no such thins should be added.
>>
>
> Right, these rules make technical sense, but are extremely difficult to audit,
> and have proven hard to enforce. People wandering into the code may follow the
> conventions of the code around them, and have no idea that adding a new
> validation check is 'bad' when the pre-existing similar checks are
'good'
> strictly based one when they were added.
>
> I think we need a better framework for this. I'll probably send a larger mail
> at some point, but basically I think we should:
>
> - rename VIR_DOMAIN_DEF_PARSE_SKIP_OSTYPE_CHECKS to something generic like
> VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATION. The original flag is used at libvirtd
> startup time to avoid the 'disappearing domain' problem for when a qemu is
> uninstalled for example, but we still get that validation check for normal
> runtime XML define
>
The problem with skipping validation is that we will end up with invalid
domains being defined, breaking assumptions in other parts of libvirt.
That way we would have to duplicate the checks on domain startup too
(which would not be such a problem if they were all in the same
function).
For example:
commit 21b316f4d351859d9ccbf8a20199f7e8707fd51d
qemu: error out on missing machine type in configs
which I added after someone tried to put the machine XML directly in
/etc/libvirt.
There's no point in allowing the user to (try to) start such a domain,
but currently we treat such unvalidated XML the same as XML from a fresh
define.
Maybe some new API virIsDomainRunable (or some better name) that could
be generated by carefully extracting checks from the qemu_command line
build processing in order to make all those run/build time checks. Then
command line building just fails if there's some sort of problem with
the actual building rather than the config. Some of those checks are
very specific configuration checks for supportable cross device or
architecture validation. We don't provide an easy way for our consumers
to validate something is good or bad until run time when it's rejected.
For some consumers, that's a bad time to make that decision.
I think I recall an attempt to introudce an 'invalid' state, where you
could not start the domain, but it was editable by libvirt APIs.
Unfortunately I cannot find it in the archives. Does anyone remember
what happened to it?
This one I believe is what you're looking for:
http://www.redhat.com/archives/libvir-list/2015-December/msg00027.html
John