Hi all,
Currently when we start a non-tunneled migration, data go straight from
source qemu to destination qemu. This is nice in that there is no additional
overhead but it also has several disadvantages. If the communication between
source and destination qemu breaks, we only get an unexpected error message
from qemu with no clue about what happened. Another issue is that if qemu cannot
send migration data, we cannot cancel the migration because migrate_cancel
blocks until all buffers with migration data queued up for transmission are
written into the socket.
That said, I think we should act as a proxy between source and destination
qemu so that we can detect and report meaningful errors (such as connection reset
by peer) and cancel migration at any time. Since we have virNetSocket and we
already use that for connecting to destination qemu, we should use it for
proxying migration data as well. This approach also has some disadvantages,
e.g., a single libvirt thread instead of several qemu processes will now send
migration data from all domains that are being migrated. However, I feel like
the gain is bigger than the downside. And we already do the same for tunneled
migration anyway.
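
To make the idea more concrete, here is a rough sketch of the kind of relay
loop I have in mind. It is written in Python only for brevity and does not use
virNetSocket or any libvirt API; relay(), RelayError, and the poll_interval and
bufsize parameters are hypothetical names made up for this illustration. The
point is just that the proxy never blocks for long, so it can notice a cancel
request at any time and can say which side of the connection failed.

```python
import select
import socket
import threading


class RelayError(Exception):
    """Raised when one side of the proxied stream fails (hypothetical name)."""


def relay(src, dst, cancel, poll_interval=0.1, bufsize=65536):
    """Copy bytes from src to dst until EOF, an error, or cancellation.

    Returns "eof" when src is closed cleanly and "cancelled" when the
    cancel event is set. Raises RelayError naming the failing side.
    """
    dst.setblocking(False)
    pending = b""
    while True:
        # Checked on every iteration, so a stuck peer cannot delay cancel
        # by more than poll_interval seconds.
        if cancel.is_set():
            return "cancelled"
        if pending:
            # Wait (bounded) until the destination can accept more data.
            _, writable, _ = select.select([], [dst], [], poll_interval)
            if not writable:
                continue
            try:
                sent = dst.send(pending)
            except BlockingIOError:
                continue
            except OSError as exc:
                raise RelayError(f"destination: {exc.strerror or exc}") from exc
            pending = pending[sent:]
        else:
            # Wait (bounded) until the source has data or has closed.
            readable, _, _ = select.select([src], [], [], poll_interval)
            if not readable:
                continue
            try:
                chunk = src.recv(bufsize)
            except OSError as exc:
                raise RelayError(f"source: {exc.strerror or exc}") from exc
            if not chunk:
                return "eof"
            pending = chunk
```

A real implementation would hook into the existing event loop instead of
polling with a timeout, but the error reporting and cancellation behaviour
would be the same in spirit.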
Any objections?
Jirka