On Thu, May 18, 2017 at 10:22:59AM +0200, Jiri Denemark wrote:
> The big question is how to fix the regression in a backward compatible
> way and still keep the ability to properly check guest CPU ABI with new
> enough libvirt and QEMU. Clearly, we need to keep both the original and
> the updated CPU definition (or one of them and a diff against the
> other).
> I came up with the following options:
>
> 1. When migrating a domain, the domain XML would contain the original
>    CPU def while the updated one would be transferred in a migration
>    cookie.
>    - doesn't fix save/restore or snapshot revert
>    - could be fixed by not updating the CPU with QEMU < 2.9.0, but it
>      won't work when restore is done on a different host with older
>      QEMU (and yes, this happens in real life, e.g., with oVirt)
>    - doesn't even fix migration after save/restore or snapshot revert
Yep, not an option.
> 2. Use a migration cookie and change the save image format to contain
>    additional data (something like a migration cookie) which a new
>    libvirt could make use of while old libvirt would ignore any
>    additional data it doesn't know about
>    - snapshot XML would need to be updated too, but that's trivial
>    - cleanly changing the save image format requires its version to be
>      increased, and old libvirt will just refuse to read such an image
>    - this would fix save/restore on the same host or on a host with
>      older QEMU
>    - doesn't fix restore on a different host running older libvirt,
>      and even this is done by oVirt
The only way we can change the save image format is if we ensure we use
the old format by default, and require an explicit VIR_DOMAIN_SAVE_CPU_CHECK
flag to turn on the "new" format.
> 3. Use a migration cookie and change the save image format without
>    increasing its version (and update snapshot XML)
>    - this fixes all cases
>    - technically possible by using one of the unused 32-bit ints and
>      adding \0 at the end of the domain XML followed by another XML with
>      the additional data (pointed to by the formerly unused uint32_t)
>    - very ugly hack
While it seems ugly, this is kind of what the 'unused' ints are there
for in the first place. Basically, for required incompatible changes
we'd bump the version, but for safe opt-in changes we can just use an
unused field. So I think this is OK.
> 4. Format both CPUs in the domain XML by adding a new subelement in
>    <cpu> which would list all extra features disabled or enabled by
>    QEMU
>    - fixes all cases without the need to use a migration cookie, change
>      the save image format, or update the snapshot XML
>    - but it may be very confusing for users and it would need to be
>      documented as "output only, don't touch stuff"
Yeah, this just looks too ugly.
> So my preferred solution is 2. It breaks restore on a host with older
> libvirt, but it was never guaranteed to work, even though it usually
> just worked. And we could just tell oVirt to fix their usage :-) Also,
> additional data in the save image header may be very useful in the
> future; I think I remember someone sighing about the inability to
> store more than just the domain XML in a saved image when they were
> trying to fix some compatibility issues.
I think 2 is OK, if we use VIR_DOMAIN_SAVE_CPU_CHECK to opt in to
writing the new format with the expanded CPU.

It means OpenStack/oVirt would *not* use this flag by default - it would
need to be an opt-in once the admin knows they don't have any nodes with
an older version in use.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|