
On Mon, Mar 10, 2014 at 15:36:06 +0100, Olaf Hering wrote:
During live migration of VMs from one host to another the VM is suspended for an unpredictable amount of time. The actual downtime depends on how many pages are newly dirtied and on the bandwidth to the destination host. Since VM memory size grows faster than transfer rates, the currently available tunables will cause trouble for workloads within the VM which cannot handle large time jumps.
I have already written code to tweak the inner loop doing the actual migration work in libxc. But the patchset exposes the details of the loop to the cmdline; as such it is neither portable nor a friendly UI for the host admin.
Here is my proposal for a new option for virsh and 2 new options for xl:
[xl | virsh --live] --max-suspend-time N --timeout N VM host
--max-suspend-time N: as the name suggests, the VM downtime must not be longer than specified. The code doing the migration has to estimate the transfer speed. If the VM is about to be suspended, it has to check if the remaining dirty pages can be transferred within the required timeframe. If not, the migration is aborted, the VM continues to run on the src host, the new VM on the dst host is destroyed and an error is returned.
Libvirt already has the virDomainMigrateSetMaxDowntime API with these semantics. With virsh, one can set it using the virsh migrate-setmaxdowntime command while the migration is in progress. I am not sure that exposing it as yet another parameter of the already quite complicated migrate command would buy us much.
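For illustration only, the check described for --max-suspend-time could be sketched as below. This is not the libxc or libvirt code; the function names, the parameters and the assumption of a single measured transfer rate are all hypothetical.

```python
def estimated_downtime_seconds(dirty_pages, page_size_bytes, bytes_per_second):
    """Estimate how long the final stop-and-copy phase would take.

    Hypothetical helper: assumes the measured transfer rate stays
    constant while the remaining dirty pages are sent.
    """
    if bytes_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    return dirty_pages * page_size_bytes / bytes_per_second


def may_suspend(dirty_pages, page_size_bytes, bytes_per_second, max_suspend_seconds):
    """Return True if suspending now is expected to respect the downtime limit.

    If this returns False, the proposal says the migration is aborted
    and the VM keeps running on the source host.
    """
    estimate = estimated_downtime_seconds(dirty_pages, page_size_bytes, bytes_per_second)
    return estimate <= max_suspend_seconds


# 25600 dirty pages of 4 KiB are 100 MiB; at 100 MiB/s that is one second.
print(may_suspend(25600, 4096, 100 * 1024 * 1024, max_suspend_seconds=1.0))
print(may_suspend(25600, 4096, 100 * 1024 * 1024, max_suspend_seconds=0.5))
```

A real implementation would have to keep re-measuring the rate, since both the bandwidth and the dirtying rate change while the migration is running.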
--timeout N: if a VM is busy and its workload causes many new dirty pages, the migrate command would take forever. This option is supposed to stop the migration attempt if the number of new dirty pages is too high. It would change the semantics of "virsh migrate --timeout n", which currently forces a suspend (according to the help text).
This is not acceptable. If you want an option to automatically cancel migration after a given timeout, you would need to introduce a new option instead of changing the semantics of an existing option.

Jirka
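To make the difference between the two behaviours concrete, here is a minimal sketch. The policy names and the function are made up for illustration and do not correspond to any existing virsh or xl option; the only thing taken from the thread is that the current --timeout forces a suspend while the proposed behaviour would cancel the migration.

```python
from enum import Enum


class TimeoutPolicy(Enum):
    # Behaviour the thread attributes to the existing "virsh migrate --timeout".
    FORCE_SUSPEND = "force-suspend"
    # Behaviour that would need a separate, new option (hypothetical name).
    ABORT = "abort"


def action_on_tick(elapsed_seconds, timeout_seconds, policy):
    """Decide what a migration loop should do at a given point in time."""
    if elapsed_seconds < timeout_seconds:
        return "continue"
    if policy is TimeoutPolicy.FORCE_SUSPEND:
        # Stop the guest and finish the migration regardless of dirty pages.
        return "suspend"
    # Give up; the guest keeps running on the source host.
    return "abort"


print(action_on_tick(30, 60, TimeoutPolicy.ABORT))
print(action_on_tick(60, 60, TimeoutPolicy.FORCE_SUSPEND))
print(action_on_tick(60, 60, TimeoutPolicy.ABORT))
```

Keeping the two policies as separate choices means existing users of --timeout see no change in behaviour.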