On Mon, 6 Aug 2018 23:45:21 +0530
Kirti Wankhede <kwankhede(a)nvidia.com> wrote:
On 8/3/2018 11:26 PM, Alex Williamson wrote:
> On Fri, 3 Aug 2018 12:07:58 +0000
> "Wang, Zhi A" <zhi.a.wang(a)intel.com> wrote:
>
>> Hi:
>>
>> Thanks for unfolding your idea. The picture is clearer to me now. I didn't
>> realize that you also want to support cross-hardware migration. Having thought
>> about it for a while, cross-hardware migration might not be popular in the vGPU
>> case, but it could be quite popular in other mdev cases.
>
> Exactly, we need to think beyond the implementation for a specific
> vendor or class of device.
>
>> Let me continue my summary:
>>
>> The mdev type already includes a parent driver name, a group name, a physical
>> device version and a configuration type, for example i915-GVTg_V5_4. The driver
>> name and the group name can already distinguish the vendor and the product
>> between different mdevs, e.g. between Intel and Nvidia, or between vGPU and vOther.
>
> Note that there are only two identifiers here, a vendor driver and a
> type. We included the vendor driver to avoid namespace collisions
> between vendors. The type itself should be considered opaque regardless
> of how a specific vendor makes use of it.
>
>> Each device provides, per mdev type, a collection of the device state (data
>> stream) versions it supports, in preferred order, since a newer version of the
>> device state might contain more information, which might help performance.
>>
>> Let's say there is a new device N and an old device O, and both support mdev_type M.
>>
>> For example:
>> Device N is newer and supports the versions of device state: [ 6.3 6.2 6.1 ] in mdev type M
>> Device O is older and supports the versions of device state: [ 5.3 5.2 5.1 ] in mdev type M
>>
>> - Version scheme of device state in the backwards compatibility case: migrate
>> from a VM with device O to a VM with device N; the mdev type is M.
>>
>> Device N: [ 6.3 6.2 6.1 5.3 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.3
>> The new device directly supports mdev_type M with the preferred version of
>> Device O. Good, best situation.
>>
>> Device N: [ 6.3 6.2 6.1 5.2 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.2
>> The new device supports mdev_type M, but not the preferred version. After the
>> migration, the vendor driver might have to disable some features which are not
>> covered by the 5.2 device state, but this depends entirely on the vendor driver.
>> If the user wishes to achieve the best experience, they should update the vendor
>> driver on device N to one that supports the preferred version of device O.
>>
>> Device N: [ 6.3 6.2 6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: None
>> No version is matched. Migration would fail. The user should update the vendor
>> driver on device N and device O.
>>
>> - Version scheme of device state in the forwards compatibility case: migrate
>> from a VM with device N to a VM with device O; the mdev type is M.
>>
>> Device N: [ 6.3 6.2 6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor driver on device O.
>> Now device O could support [ 5.3 5.2 5.1 6.1 ] (as an old device, Device O still
>> prefers version 5.3)
>> Version used in migration: 6.1
>> As the new device state is going to be migrated to an old device, the vendor
>> driver on the old device might have to deal specially with the new version of
>> the device state. It depends on the vendor driver.
>>
>> - QEMU has to figure out and choose the version of the device state before reading
>> the device state from the region. (Perhaps we can put the selection option in the
>> control part of the region as well.)
>> - Libvirt will check whether any version in the collections of device O and
>> device N matches before migration.
>> - Each mdev_type has its own collection of versions. (A device can support
>> different versions in different types.)
>> - Preferably the collection is not a range but a set of version strings. (The
>> vendor driver might drop some versions during an upgrade since they are not ideal.)
>
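
For illustration only, the matching in the quoted examples can be sketched as below. This is a minimal sketch, not something QEMU, libvirt or any vendor driver implements; the name pick_version and the representation of a collection as a list of strings are assumptions made for the example. It walks the source device's list in preferred order and takes the first entry the target also lists.

```python
from typing import Optional, Sequence


def pick_version(source_prefs: Sequence[str],
                 target_supported: Sequence[str]) -> Optional[str]:
    """Return the first version, in the source's preferred order, that the
    target also lists, or None when the two collections do not intersect.

    Hypothetical helper: versions are treated as opaque strings and are
    compared for equality only, never ordered numerically.
    """
    target = set(target_supported)
    for version in source_prefs:
        if version in target:
            return version
    return None


# Collections taken from the quoted examples (source is listed first).
device_o = ["5.3", "5.2", "5.1"]
print(pick_version(device_o, ["6.3", "6.2", "6.1", "5.3"]))  # 5.3
print(pick_version(device_o, ["6.3", "6.2", "6.1", "5.2"]))  # 5.2
print(pick_version(device_o, ["6.3", "6.2", "6.1"]))         # None
```

Treating the versions as opaque strings matches the request that the collection be a set of strings rather than a range.
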
> I believe that QEMU has always avoided trying to negotiate a migration
> version. We can only negotiate if the target is online and since a
> save/restore is essentially an offline migration, there's no
> opportunity for negotiation. Therefore I think we need to assume the
> source version is fixed. If we need to expose an older migration
> interface, I think we'd need to consider instantiating the mdev with
> that specification or configuring it via attributes before usage, just
> like QEMU does with specifying a machine type version.
>
> Providing an explicit list of compatible versions also seems like it
> could quickly get out of hand, imagine a driver with regular releases
> that maintains compatibility for years. The list could get
> unmanageable.
>
> To be honest, I'm pretty dubious whether vendors will actually implement
> cross version migration, or really consider migration compatibility at
> all, which is why I think we need to impose migration compatibility with
> this sort of interface.
A vendor driver can implement cross-version migration support; perhaps not
across major versions, but cross-minor-version migration support can be
implemented.
Of course, but I think we need to consider this an opt-in for the
vendor, the default should be identical version only unless the vendor
driver states otherwise.
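
A minimal sketch of that opt-in default, assuming (purely for illustration) that a driver advertises an integer version plus an optional minimum version; DriverPolicy and its field names are hypothetical and not part of any existing mdev interface:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriverPolicy:
    """Hypothetical description of what a vendor driver advertises."""
    version: int                       # version of the state this driver produces
    min_version: Optional[int] = None  # None: vendor has not opted in

    def accepts(self, stream_version: int) -> bool:
        """Decide whether a saved stream of `stream_version` may be restored."""
        if self.min_version is None:
            # Default: identical version only.
            return stream_version == self.version
        # Opt-in: the vendor states the oldest version it still accepts.
        return self.min_version <= stream_version <= self.version


strict = DriverPolicy(version=7)
lenient = DriverPolicy(version=7, min_version=5)
print(strict.accepts(6), lenient.accepts(6))  # False True
```

A vendor that never sets the minimum gets identical-version-only behavior without doing anything.
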
In case of live migration, if the vendor driver returns failure at the
destination during its resume phase, then the VM at the source is resumed and
continues to run at the source, right? Please correct me if my understanding
is wrong. Then, in case of live migration, the vendor driver can add a binary
blob of compatibility details, which the vendor driver understands, as the
first binary blob, and at the destination the first step while resuming is to
check compatibility and return accordingly. If the vendor driver finds it's not
compatible, then it fails the resume at the destination with a proper error
message in syslog.
While this is true, the device state is the final component of
migration, so you're basically asking your users to try it to see if it
works, and if it doesn't work, apparently it's not supported, or maybe
something else is broken. Not only is that a poor user experience, but
it potentially consumes massive amounts of bandwidth and resources, incurs
downtime in the VM, and makes it difficult for management tools to
predict where a VM can be successfully migrated.
In case of save/restore the same logic can be applied, and resume can fail if
the vendor version is not compatible with the version when the VM was saved.
So again, the user and management tool experience is to hope for the
best and assume unsupported if it doesn't work? We can do better.
Rather than embedding version information into the binary blob part of
the migration stream, shouldn't it be exposed as a standard parsed
field such that it can be included in the migration stream and
introspected later for compatibility with the host driver?
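
As a rough sketch of what a standard parsed field could look like: the layout below (a 4-byte big-endian length followed by a JSON header, then the opaque vendor blob) and every name in it are assumptions made for illustration, not an existing QEMU or VFIO stream format. The point is only that the header can be read without understanding the blob.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceStateHeader:
    """Hypothetical, vendor-neutral fields placed ahead of the opaque blob."""
    vendor_driver: str
    mdev_type: str
    version: int


def encode_header(header: DeviceStateHeader) -> bytes:
    """Serialize the header as a 4-byte big-endian length plus JSON payload."""
    payload = json.dumps(asdict(header)).encode("utf-8")
    return len(payload).to_bytes(4, "big") + payload


def split_stream(stream: bytes) -> Tuple[DeviceStateHeader, bytes]:
    """Return (parsed header, untouched vendor blob); the blob is never parsed."""
    if len(stream) < 4:
        raise ValueError("stream too short for header length")
    size = int.from_bytes(stream[:4], "big")
    if len(stream) < 4 + size:
        raise ValueError("stream truncated inside header")
    fields = json.loads(stream[4:4 + size].decode("utf-8"))
    return DeviceStateHeader(**fields), stream[4 + size:]


saved = encode_header(DeviceStateHeader("i915", "GVTg_V5_4", 3)) + b"\x00vendor-blob"
header, blob = split_stream(saved)
print(header.vendor_driver, header.mdev_type, header.version)  # i915 GVTg_V5_4 3
```

A management tool could inspect such a header in a saved image and compare it with what the target host driver reports, before any restore is attempted.
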
> A vendor that doesn't want to support cross
> version migration can simply increment the version and provide no
> minimum version. Without at least that, I think we're gambling on
> breaking devices and systems in interesting and unpredictable ways.
If a vendor driver doesn't want to support cross-version migration, then
it can just put a version string in the first binary blob and check whether
it's equal or not.
Then libvirt doesn't have to worry about the vendor driver version. Libvirt
only needs to verify that the mdev type at the source is creatable at the
destination.
As outlined above, failing at device restore is a poor solution, it's a
last resort. We need to think about supportability. Assuming that a
vendor driver has taken migration compatibility into account is not
supportable. Embedding version information into the binary blob part
of the device migration stream is not supportable. I want to be able
to file bugs with vendors with meaningful information about the source
stream and target driver with clear expectations of what should and
should not work, not shrug my shoulders and randomly try another host.
When libvirt creates the mdev type at the destination, will the mdev's UUID at
the source and destination be the same?
There's no reason it needs to be from an mdev or QEMU perspective.
Thanks,
Alex