On 07/13/2010 01:12 PM, Daniel P. Berrange wrote:
On Tue, Jul 13, 2010 at 06:56:53PM +0200, Thomas Treutner wrote:
> Hi,
>
> I'm facing some troubles with virDomainMigrate &
> virDomainMigrateSetMaxDowntime. The core problem is that KVM's default
> value for the maximum allowed downtime is 30ms (max_downtime in
> migration.c, it's nanoseconds there; 0.12.3) which is too low for my VMs
> when they're busy (~50% CPU util and above). Migrations then take
> literally forever, I had to abort them after 15 minutes or so. I'm using
> GBit Ethernet, so plenty bandwidth should be available. Increasing the
> allowed downtime to 50ms seems to help, but I have not tested situations
> where the VM is completely utilized. Anyways, the default value is too
> low for me, so I tried virDomainMigrateSetMaxDowntime resp. the Java
> wrapper function.
>
> Here I'm facing a problem I can overcome only with a quite crude hack:
> org.libvirt.Domain.migrate(..) blocks until the migration is done, which
> is of course reasonable. So I tried calling migrateSetMaxDowntime(..)
> before migrating, causing an error:
>
> "Requested operation is not valid: domain is not being migrated"
>
> This tells me that calling migrateSetMaxDowntime is only allowed during
> migrations. As I'm migrating VMs automatically and without any user
> intervention I'd need to create some glue code that runs in an extra
> thread, waiting "some time" hoping that the migration was kicked off in
> the main thread yet and then calling migrateSetMaxDowntime. I'd like to
> avoid such quirks in the long run, if possible.
Using multiple threads is our recommended approach to the problem, since
it is a general solution. E.g. you can call virDomainSuspend to pause the
guest during migration and thus let it complete non-live,
virDomainGetJobInfo to check progress, and virDomainAbortJob to cancel.
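
A rough sketch of that pattern in Java, in case it helps. MigratingDomain
is a hypothetical stand-in interface, not part of the libvirt Java
bindings; in real code its two methods would delegate to
org.libvirt.Domain.migrate(..) and migrateSetMaxDowntime(..). The retry
count and delay are arbitrary assumptions. Catching every exception as
"not migrating yet" is a simplification, and real code should check the
error it gets back.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the two calls discussed in this thread.
interface MigratingDomain {
    void migrate() throws Exception;                    // blocks until the migration ends
    void setMaxDowntime(long millis) throws Exception;  // fails if no migration is active
}

final class DowntimeSetter {
    // Runs the blocking migration on a worker thread and, from the calling
    // thread, retries setMaxDowntime until it is accepted, the migration
    // ends, or the attempts run out. Returns true if the value was applied.
    static boolean migrateWithDowntime(MigratingDomain dom, long downtimeMs,
                                       int maxAttempts, long retryDelayMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> migration = pool.submit(() -> { dom.migrate(); return null; });
            boolean applied = false;
            for (int i = 0; i < maxAttempts && !applied && !migration.isDone(); i++) {
                try {
                    dom.setMaxDowntime(downtimeMs);
                    applied = true;
                } catch (Exception notMigratingYet) {
                    // Assumed to be "domain is not being migrated": wait, then retry.
                    TimeUnit.MILLISECONDS.sleep(retryDelayMs);
                }
            }
            migration.get();  // a migration failure surfaces here as ExecutionException
            return applied;
        } finally {
            pool.shutdown();
        }
    }
}
```

Retrying instead of sleeping for a fixed "some time" means the caller does
not have to guess when the migration job has actually started.
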
> So my question: Would it be possible to extend the migrate() method
> resp. virDomainMigrate() function with an optional maxDowntime parameter
> that is passed down as QEMU_JOB_SIGNAL_MIGRATE_DOWNTIME so that
> qemuDomainWaitForMigrationComplete would set the value? Or are there
> easier ways?
That approach isn't really desirable IMHO, because it is already possible
to do this using threads, which is already necessary for the other
APIs you can invoke during migration. If you care about the
max downtime parameter, then you almost certainly need to care about
calling virDomainGetJobInfo() in order to determine whether the
guest is actually progressing during migration or not.
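
One way to sketch such a progress check in Java. The samples are assumed
to be successive "data remaining" values, such as the dataRemaining field
that virDomainGetJobInfo() reports; how they are collected is left out.
The class name and the window size are illustrative assumptions, not
libvirt API.

```java
final class MigrationProgress {
    // Returns true if the remaining-data figure has failed to drop below
    // its best (lowest) value so far for `window` consecutive samples.
    static boolean isStalled(long[] remainingSamples, int window) {
        long best = Long.MAX_VALUE;
        int sinceImprovement = 0;
        for (long remaining : remainingSamples) {
            if (remaining < best) {
                best = remaining;        // real progress: reset the counter
                sinceImprovement = 0;
            } else {
                sinceImprovement++;      // guest dirtied memory as fast as it was sent
                if (sinceImprovement >= window) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

When a check like this reports a stall, the caller can raise the maximum
downtime, pause the guest with virDomainSuspend, or give up with
virDomainAbortJob.
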
Also sounds like it would be handy to allow globally configuring the
default migration downtime in /etc/libvirt/qemu.conf
- Cole