Re: [libvirt] [RFC PATCH] Add new migration flag VIR_MIGRATE_DRY_RUN

12 Nov 2018


On 11/12/18 4:26 AM, Daniel P. Berrangé wrote:
...
On Fri, Nov 02, 2018 at 04:34:02PM -0600, Jim Fehlig wrote:
...
A dry run can be used as a best-effort check that a migration command
will succeed. The destination host will be checked to see if it can
accommodate the resources required by the domain. DRY_RUN will fail if
the destination host is not capable of running the domain. Although a
subsequent migration will likely succeed, the success of DRY_RUN does not
ensure a future migration will succeed. Resources on the destination host
could become unavailable between a DRY_RUN and actual migration.

I'm not really convinced this is a particularly useful concept,
as it is only going to catch a very small number of the reasons
why migration can fail. So you still have to expect the real
migration invocation to have a strong chance of failing.

I agree it is difficult to reliably check that a migration will succeed. TBH, I
was expecting opposition because libvirt already provides info that lets
applications do the check themselves, e.g. as nova has done with its
check_can_live_migrate_{source,destination} APIs.

Do you think libvirt provides enough information for an app to determine if a VM 
can be migrated between two hosts? Or maybe better asked: What info is currently 
missing for an app to reliably check if a VM can be migrated between two hosts?
...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
If it is agreed this is useful, my thought was to use the begin and
prepare phases of migration to implement it. qemuMigrationDstPrepareAny()
already does a lot of the heavy lifting wrt checking the host can
accommodate the domain. Some of it, and the remaining migration phases,
can be short-circuited in the case of dry run.
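
A minimal sketch of the short-circuit idea, assuming the five phases of
libvirt's v3 migration protocol (begin, prepare, perform, finish, confirm).
The `run_migration` function and the handler callables are hypothetical
stand-ins for illustration, not libvirt code:

```python
# Hypothetical model: each phase is a callable that raises on failure.
# Phase names follow libvirt's v3 migration protocol; nothing here is
# a real libvirt function.
PHASES = ["begin", "prepare", "perform", "finish", "confirm"]


def run_migration(handlers, dry_run=False):
    """Run the phase handlers in order and return the phases executed.

    With dry_run=True, stop once 'prepare' has succeeded, so the
    destination checks run but no guest state is transferred.
    """
    executed = []
    for phase in PHASES:
        handlers[phase]()  # an exception here aborts the whole run
        executed.append(phase)
        if dry_run and phase == "prepare":
            break
    return executed
```

With no-op handlers, a dry run would execute only the begin and prepare
phases, while a normal run would go through all five.
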

One interesting wrinkle I've observed is the check for CPU compatibility.
AFAICT QEMU is actually invoked on the dst, "filtered-features" of the CPU
are requested via QMP, and results are checked against the CPU in the domain
config. If the CPU on dst is insufficient, migration fails in the prepare phase
with something like "guest CPU doesn't match specification: missing features: x y z".
I was hoping to avoid launching QEMU in the case of dry run, but that may
be unavoidable if we'd like a dependable dry run result.

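
The comparison step described above could be sketched roughly as follows.
This assumes the feature lists are already in hand as plain strings; in the
real flow they come from QEMU via QMP on the destination. The function names
and feature names are hypothetical:

```python
def missing_cpu_features(required, available):
    """Return the required features the destination CPU lacks, sorted."""
    return sorted(set(required) - set(available))


def check_cpu(required, available):
    """Raise if the destination cannot provide every required feature.

    The message mirrors the error quoted above; the check itself is only
    an illustration of the set comparison, not libvirt's implementation.
    """
    missing = missing_cpu_features(required, available)
    if missing:
        raise RuntimeError(
            "guest CPU doesn't match specification: missing features: "
            + " ".join(missing)
        )
```

Note that this only models the final comparison. Obtaining a trustworthy
`available` list is the part that seems to require launching QEMU.
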
Even launching QEMU isn't good enough - it has to actually process the
migration data stream for devices to get a good indication of success,
at which point you're basically doing a real migration.

Bummer. I guess that answers my question above: no. It also implies apps cannot
reliably check if a migration will succeed and should instead put effort into
handling errors from an actual migration :-).
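
As a rough illustration of the "handle errors from the real migration"
approach, an app could simply try candidate hosts in turn and record why each
attempt failed. Everything here is hypothetical: `MigrationError` stands in
for whatever exception the app's migration call raises, and `migrate` is a
callable the app supplies:

```python
class MigrationError(Exception):
    """Hypothetical stand-in for the error raised by a failed migration."""


def migrate_with_fallback(migrate, candidates):
    """Try each candidate host until one migration succeeds.

    Returns (host, errors): host is the destination that worked, or None
    if every attempt failed; errors maps each failed host to its message.
    """
    errors = {}
    for host in candidates:
        try:
            migrate(host)
            return host, errors
        except MigrationError as exc:
            errors[host] = str(exc)
    return None, errors
```
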

Regards,
Jim