On Fri, 3 Aug 2018 12:07:58 +0000
"Wang, Zhi A" <zhi.a.wang(a)intel.com> wrote:
Hi:
Thanks for unfolding your idea. The picture is clearer to me now. I didn't realize
that you also want to support cross hardware migration. Well, I thought about it for a
while: cross hardware migration might not be popular in the vGPU case but could be quite
popular in other mdev cases.
Exactly, we need to think beyond the implementation for a specific
vendor or class of device.
Let me continue my summary:
The mdev type name already includes a parent driver name, a group name, a physical device
version and a configuration type, for example i915-GVTg_V5_4. The driver name and the group
name can already distinguish the vendor and the product between different mdevs, e.g.
between Intel and Nvidia, or between vGPU and vOther.
Note that there are only two identifiers here, a vendor driver and a
type. We included the vendor driver to avoid namespace collisions
between vendors. The type itself should be considered opaque regardless
of how a specific vendor makes use of it.
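
To make the two-identifier point concrete, here is a minimal sketch in Python. It assumes the type name has the form <driver>-<type> and that the driver name itself contains no hyphen; the helper names are made up for illustration. Everything after the first hyphen is compared only as an opaque string and is never parsed further.

```python
def split_mdev_type(name):
    """Split '<driver>-<type>' at the first hyphen.

    Assumption for this sketch: the vendor driver name contains no hyphen.
    The type part is returned untouched and treated as opaque.
    """
    driver, sep, opaque_type = name.partition("-")
    if not sep or not driver or not opaque_type:
        raise ValueError(f"not a '<driver>-<type>' name: {name!r}")
    return driver, opaque_type


def same_mdev_type(a, b):
    """Two names match only if both the driver and the opaque type are equal."""
    return split_mdev_type(a) == split_mdev_type(b)
```

For example, split_mdev_type("i915-GVTg_V5_4") yields the driver "i915" and the opaque type "GVTg_V5_4"; nothing here tries to interpret "V5_4".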
Each device provides, per mdev type, a collection of supported versions of the device state
data stream in preferred order, as a newer version of device state might
contain more information, which might help performance.
Let's say there is a new device N and an old device O, and both support mdev_type M.
For example:
Device N is newer and supports the versions of device state: [ 6.3 6.2 6.1 ] in mdev
type M
Device O is older and supports the versions of device state: [ 5.3 5.2 5.1 ] in mdev type
M
- Version scheme of device state in backwards compatibility case: Migrate from a VM
with device O to a VM with device N; the mdev type is M.
Device N: [ 6.3 6.2 6.1 5.3 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.3
The new device directly supports mdev_type M with the preferred version on Device O.
Good, best situation.
Device N: [ 6.3 6.2 6.1 5.2 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.2
The new device supports mdev_type M, but not the preferred version. After the migration,
the vendor driver might have to disable some features that are not covered by the 5.2 device
state. But this depends entirely on the vendor driver. If the user wishes to get the best
experience, they should update the vendor driver on device N to one that supports the
preferred version on device O.
Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: None
No version matches, so migration would fail. The user should update the vendor driver on
device N and device O.
- Version scheme of device state in forwards compatibility case: Migrate from a VM
with device N to a VM with device O; the mdev type is M.
Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor driver on device O. Now
device O could support [ 5.3 5.2 5.1 6.1 ] (as an old device, device O still prefers
version 5.3)
Version used in migration: 6.1
As the new device state is going to be migrated to an old device, the vendor driver on the
old device might have to handle the new version of device state specially. It depends on
the vendor driver.
- QEMU has to figure out and choose the version of the device state before reading the
device state from the region. (Perhaps we can put the selection option in the control part
of the region as well.)
- Libvirt will check before migration whether the collections of device O and device N
have any version in common.
- Each mdev_type has its own collection of versions. (Device can support different
versions in different types)
- It is better for the collection to be an explicit set of version strings rather than a
range. (The vendor driver might drop some versions during an upgrade since they are not
ideal.)
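
The matching rule implied by the examples in this summary can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the rule "walk the source device's list in its preferred order and take the first version the target also lists" is inferred from the four examples rather than stated anywhere as a specification. Versions are compared as opaque strings, not as numeric ranges.

```python
def pick_version(source_versions, target_versions):
    """Return the first version in the source's preferred order that the
    target also supports, or None if the two collections do not intersect.

    Versions are opaque strings; no ordering or range logic is applied.
    """
    target = set(target_versions)
    for version in source_versions:
        if version in target:
            return version
    return None


# Collections taken from the examples in the summary.
DEVICE_O = ["5.3", "5.2", "5.1"]
DEVICE_N = ["6.3", "6.2", "6.1"]
```

With these lists, migrating from O to an N that also lists 5.3 selects 5.3, an N that only adds 5.2 selects 5.2, and an unmodified N gives None, which is the case where migration fails. In the other direction, pick_version(DEVICE_N, DEVICE_O + ["6.1"]) selects 6.1.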
I believe that QEMU has always avoided trying to negotiate a migration
version. We can only negotiate if the target is online and since a
save/restore is essentially an offline migration, there's no
opportunity for negotiation. Therefore I think we need to assume the
source version is fixed. If we need to expose an older migration
interface, I think we'd need to consider instantiating the mdev with
that specification or configuring it via attributes before usage, just
like QEMU does with specifying a machine type version.
Providing an explicit list of compatible versions also seems like it
could quickly get out of hand. Imagine a driver with regular releases
that maintains compatibility for years; the list could become
unmanageable.
To be honest, I'm pretty dubious whether vendors will actually implement
cross version migration, or really consider migration compatibility at
all, which is why I think we need to impose migration compatibility with
this sort of interface. A vendor that doesn't want to support cross
version migration can simply increment the version and provide no
minimum version. Without at least that, I think we're gambling on
breaking devices and systems in interesting and unpredictable ways.
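
As a rough illustration of the version plus minimum version idea, here is a small Python sketch. The class, the field names and the use of plain integers are all assumptions made for the example; nothing here is an existing interface. The source version is treated as fixed, and only the target decides whether it can accept it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigrationInterface:
    """Hypothetical description of what a target device advertises."""
    version: int                       # current migration stream version
    min_version: Optional[int] = None  # None: no cross-version support


def can_accept(target, source_version):
    """Return True if the target can restore a stream of source_version."""
    if source_version == target.version:
        return True
    if target.min_version is None:
        # The vendor only bumps the version: exact match or nothing.
        return False
    return target.min_version <= source_version <= target.version
```

A vendor that never sets min_version gets exact-match-only behaviour, so an incompatible stream is rejected up front instead of being loaded with unpredictable results.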
Thanks,
Alex