On Fri, Aug 24, 2018 at 14:40:08 -0600, Jim Fehlig wrote:
> While investigating a bug [1] found by Xen's osstest I realized I don't quite
> understand how to handle modify jobs (e.g. BeginJob/EndJob) on virDomainObj
> across the various phases of V3 migration protocol. E.g. on the src host the
> Begin, Perform, and Confirm phases are performed. Should a modify job start
> (BeginJob) in the Begin phase and stop (EndJob) in the Confirm phase? Or should
> each phase, if necessary, do BeginJob/EndJob? Same question for dst host. IMO
> the job should be held across the phases on each host, preventing any
> modifications during the overall migration process.

Right, the first phase (Begin on the source and Prepare on the
destination) should acquire the job and it should be held until the end
of the migration (Confirm/Finish) to make sure nothing changes during
the migration. In the QEMU driver, we have several helpers around the
generic job APIs:

    qemuMigrationJobStart
        - used at the beginning of migration

    qemuMigrationJobSetPhase
        - called at the beginning of each migration phase except the
          first one (the first one calls qemuMigrationJobStart)

    qemuMigrationJobContinue
        - called at the end of each phase except for the last one (which
          calls qemuMigrationJobFinish)

    qemuMigrationJobFinish
        - called at the end of migration
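
To illustrate the calling pattern of the four helpers, here is a rough toy
model in Python. It is only a sketch, not the QEMU driver code: DomainObj,
JobBusy, begin_job, end_job and the *_phase functions are hypothetical
stand-ins, and the migration_job_* functions only mimic the role of the
helpers listed above. It shows the source host, where the job is taken in
Begin and released in Confirm:

```python
class JobBusy(Exception):
    """Raised when another job already owns the domain object."""


class DomainObj:
    """Toy stand-in for virDomainObj (hypothetical, not libvirt's struct)."""

    def __init__(self):
        self.job = None    # name of the job that owns the object, if any
        self.phase = None  # current migration phase, if any


def begin_job(dom, job):
    # Generic BeginJob: refuse if any job is already active.
    if dom.job is not None:
        raise JobBusy(f"{job} blocked by active {dom.job} job")
    dom.job = job


def end_job(dom):
    # Generic EndJob: release ownership of the object.
    dom.job = None
    dom.phase = None


def migration_job_start(dom, phase):
    # First phase only: acquire the job.
    begin_job(dom, "migration")
    dom.phase = phase


def migration_job_set_phase(dom, phase):
    # Later phases: the job must already be held from an earlier phase.
    if dom.job != "migration":
        raise RuntimeError("no migration job to continue")
    dom.phase = phase


def migration_job_continue(dom):
    # End of a non-final phase: the API call returns, the job stays active.
    assert dom.job == "migration"


def migration_job_finish(dom):
    # End of the last phase: release the job.
    end_job(dom)


# Source host: Begin -> Perform -> Confirm, each a separate API call.
def begin_phase(dom):
    migration_job_start(dom, "begin")
    migration_job_continue(dom)


def perform_phase(dom):
    migration_job_set_phase(dom, "perform")
    migration_job_continue(dom)


def confirm_phase(dom):
    migration_job_set_phase(dom, "confirm")
    migration_job_finish(dom)
```

In this model, a begin_job(dom, "modify") issued between begin_phase() and
confirm_phase() raises JobBusy, which is the "nothing changes during the
migration" guarantee described above.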

> Although I do worry about orphaned jobs, e.g. a missed EndJob caused
> by some obscure error in the migration machinery.

Well, this could happen even if the job was acquired by each step
separately. I don't think that spanning the job over several APIs makes
the situation significantly worse.
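
One way to picture the error-path concern is the toy Python sketch below. It
is only an illustration under assumptions: Domain and run_phase are
hypothetical names, not libvirt code. Each phase is wrapped so the job is
dropped either after the last phase or as soon as a phase fails; the same
cleanup would be needed if every phase took its own job:

```python
class Domain:
    """Toy stand-in for a domain object holding at most one job."""

    def __init__(self):
        self.job = None


def run_phase(dom, phase_fn, last=False):
    """Run one phase; release the job on failure or after the last phase."""
    ok = False
    try:
        phase_fn(dom)
        ok = True
    finally:
        # On success in a non-final phase the job is kept for the next call.
        if last or not ok:
            dom.job = None
```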
Jirka