On 3/14/23 20:33, Andrea Bolognani wrote:
On Tue, Mar 14, 2023 at 11:20:49AM +0100, Michal Privoznik wrote:
> We have this crazy backwards compatibility when it comes to
> serial and console devices.
This truly is the gift that keeps on giving :/
> Basically, in same cases the very
> first <console/> is just an alias to the very first <serial/>
> device. This is to be seen at various places:
>
> 1) virDomainDefFormatInternalSetRootName() - when generating
> domain XML, the <console/> configuration is basically ignored
> and corresponding <serial/> config is formatted,
Wow, I wasn't aware of that part of the insanity. That got me
confused for a bit: dumpxml on a running domain was showing the
expected alias for the <console>. Now I understand why! It was
just displaying the information for the <serial> instead.
Indeed. Even though the internal struct holds different data. Hence the
mismatch.
> 2) virDomainDefAddConsoleCompat() - which adds a copy of
> <serial/> or <console/> into virDomainDef in post parse.
Since we add the missing elements in this function, shouldn't we be
able to remove the hacky code from the format part?
That's what I wanted to do. But A LOT of tests started failing. But I
blame my code for that. Let me see if I can do better today.
But at any rate, we would need to keep some parts of the code that
handle the aliasing - e.g. in qemuDomainChrRemove(), cmd line
generation, etc. In the end, the deciding factor for me was - in
majority of cases [1] we will just waste memory, as the def->console[0]
would be populated, but ignored. And given that my initial code didn't
work I decided to hack around the problem.
BTW: even then we still would need to keep this patch (or its variant),
because the problem would still be there. I mean, the backcompat console
handling is done in PostParse. Alias assignment is done after PostParse
(when starting the domain). So regardless of what we do in PostParse -
the aliased console would still get different alias then the serial
device. Simply because that's how qemuAssignDeviceChrAlias() is written.
But okay, we could work around that - we have qemuProcessPrepareDomain()
where we could fix the aliases. But what about other drivers? So far,
the CH driver is the only one where this crazy backcompat is not
happening (as of v9.1.0-rc1~63). But whatever we do, we would need to
just push the code from src/conf/ into the drivers.
1: majority, because by default we add serial console (as
virDomainDefAddConsoleCompat() demonstrates).
I've done a quick test and it didn't work, obviously :) but maybe
we're just failing to update some parts of the definition somewhere
else, and fixing those instances would allow to restore some sanity
to the formatting routine?
> And when talking to QEMU we need a special handling too, because
> while <serial/> is generated on the cmd line, the <console/> is
> not. And in a lot of place we get it right. Except for generating
> device aliases. On domain startup the 'expected' happens and
> devices get "serial0" and "console0" aliases, correspondingly.
> This ends up in the status XML too. But due to aforementioned
> trick when formatting domain XML, "serial0" ends up in both
> 'virsh dumpxml' and the status XML. But internally, both devices
> have different alias. Therefore, detaching the device using
> <console/> fails as qemuDomainDetachDeviceChr() tries to detach
> "console0".
>
> After the daemon is restarted and status XML is parsed, then
> everything works suddenly. This is because in the status XML both
> devices have the same alias.
So is the fact that two devices end up with the same alias something
that we consider okay? When it comes to user aliases, such a
configuration is (understandably) rejected.
Clearly nothing explodes too badly in this situation, as evidenced by
the fact that things actually start working more reasonably once the
two devices get matching aliases, but I wonder if it wouldn't be more
sensible to assign different aliases for the <console> and the
<serial>, then handle compatibility further down, closer to where we
generate command line arguments and monitor commands?
> Let's generate correct alias from the beginning.
>
> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=2156300
After your patches, the "unknown cause" error is no longer reported,
and instead we get
unable to execute QEMU command 'device_del': Bus 'isa.0' does not
support hotplugging
which is clearly a better error message.
Indeed. This still papers over the original problem (the "unknown
cause"). But that's because in qemuDomainDeleteDevice() the rc = -2 path
is used. Obviously. I mean, the comment in that path says it all:
The device does not exist in qemu, but it still exists in libvirt.
which is obviously true, because we've asked QEMU to detach console0
which does not exist in QEMU.
Doesn't really help with the fact that it's fundamentally impossible
to hot-unplug platform devices, unfortunately :)
> + /* Some crazy backcompat for consoles. Look into
> + * virDomainDefAddConsoleCompat() for more explanation. */
> + if (chr->deviceType == VIR_DOMAIN_CHR_DEVICE_TYPE_CONSOLE &&
> + chr->targetType == VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_SERIAL &&
> + def->os.type == VIR_DOMAIN_OSTYPE_HVM &&
> + def->consoles[0] == chr &&
> + def->nserials &&
> + def->serials[0]->info.alias) {
> + chr->info.alias = g_strdup(def->serials[0]->info.alias);
> + return 0;
> + }
This code is... Fine. It does the trick.
The only reason why I'm not ACKing it right away is that I'm
concerned we're piling hacks on top of hacks, and in particular that
allowing multiple devices to have the same alias will cause grief
further down the line.
Yeah, I was worried about non-unique alias too. Until I realized that
after the daemon is restarted, the status XML is parsed and the aliases
ARE the same. IOW, we're exactly at the same position as after this patch.
If you have already considered this and convinced yourself that we're
okay, then say so and I'll gladly ACK the patch.
Yeah, I'm not proud of this patch either. It's not something I'd be
showing to others at parties. But at the same time, I think this is the
only sensible solution. MAYBE, there's another one - writing similar
hack into qemuDomainDetachDeviceChr(). But that fixes just the
device-detach case. I don't know if there's another place where this
aliasing is causing problems, so maybe we could do just that and not use
this big hammer?
Michal