This series of patches fixes problems discovered in libxl migration.

The first patch fixes an issue that went undetected while testing the
initial implementation of migration. Migration data is received in the
context of an event loop callback, which blocks the event loop for the
entire migration process. The patch moves the work of receiving
migration data to a separate thread.

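To illustrate the idea (this is only a minimal sketch, not the code in
patch 1), the callback can hand the blocking read loop to a worker
thread and return immediately. It assumes plain POSIX threads rather
than libvirt's own thread helpers, and the names recv_args, recv_thread
and on_incoming_migration are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-migration state; not the actual libxl driver struct. */
struct recv_args {
    int fd;            /* descriptor the migration stream arrives on */
    size_t received;   /* bytes consumed, filled in by the worker */
};

/* Worker thread: the blocking read loop lives here, off the event loop. */
static void *recv_thread(void *opaque)
{
    struct recv_args *args = opaque;
    char buf[4096];
    ssize_t n;

    assert(args != NULL);
    /* Read until EOF or error; a real receiver would restore the domain. */
    while ((n = read(args->fd, buf, sizeof(buf))) > 0)
        args->received += (size_t)n;
    close(args->fd);
    return NULL;
}

/* Event loop callback: only starts the worker, then returns at once so
 * the loop can keep servicing other events such as keepalives.
 * Returns 0 on success, -1 if the thread could not be created. */
static int on_incoming_migration(struct recv_args *args, pthread_t *tid)
{
    return pthread_create(tid, NULL, recv_thread, args) == 0 ? 0 : -1;
}
```

In this sketch the caller must keep the recv_args alive until the worker
finishes, and must eventually join (or detach) the thread.
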
Interestingly, this issue manifested as a failed migration caused by
failed keepalives, which would kill virsh's connection to the dst host.
The dst host could not respond to keepalives because its event loop was
blocked receiving migration data. Ultimately the perform phase of the
migration would succeed, leaving a running domain on dst. However, the
subsequent finish phase would fail because virsh's connection to dst had
already been killed by the keepalive failure. Since finish failed, the
confirm phase would resume the domain on src. Yikes! The same domain
running on two different hosts :(.

Patches 2 and 3 improve error handling when the perform or finish phase
of migration fails. See the individual patches for details.

Jim Fehlig (3):
  libxl: Receive migration data in a thread
  libxl: start domain paused on migration dst
  libxl: destroy domain in migration finish phase on failure

 src/libxl/libxl_migration.c | 75 ++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 24 deletions(-)

--
1.8.4.5