[libvirt] [PATCH] qemu: fix domjobabort regression

This reverts commit ef1065cf5ac; see also this bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=751900

In qemu 0.15.1 and earlier, during migration to file, the
qemu_savevm_state_begin and qemu_savevm_state_iterate methods will both
process as much migration data as possible until either

  1. The file descriptor returns EAGAIN
  2. The bandwidth rate limit is reached

If we set the rate limit to ULONG_MAX, test 2 never becomes true. We're
passing a plain file descriptor to QEMU, and POSIX does not support
EAGAIN on regular files / block devices, so test 1 never becomes true
either.

In the 'virsh save --bypass-cache' case, we pass a pipe instead of a
regular fd, but using a pipe adds I/O overhead, so always passing a
pipe just so qemu can see EAGAIN doesn't seem nice.

The ultimate fix needs to come from qemu - background migration must
respect asynchronous abort requests, or else periodically return
control to the main handling loop without an EAGAIN and without waiting
to hit an insanely large amount of data. But until a version of qemu is
fixed to support "unlimited" data rates while still allowing
cancellation, the best we can do is avoid the automatic use of
unlimited rates from within libvirt (users can still explicitly change
the migration rates, if they are aware that they are giving up the
ability to cancel a job).

Reverting the lone use of QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX is the
simplest patch; this slows migration back down to a default 32M/sec
cap, but also ensures that the main qemu processing loop will still be
responsive to cancellation requests. Hopefully upstream qemu will
provide us a means of safely using unlimited speed, including a runtime
probe of that capability.

* src/qemu/qemu_migration.c (qemuMigrationToFile): Revert attempt to
  use unlimited migration bandwidth when migrating to file.

Signed-off-by: Daniel Veillard <veillard@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 src/qemu/qemu_migration.c |   17 -----------------
 1 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 838a31f..30f805d 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2825,16 +2825,6 @@ qemuMigrationToFile(struct qemud_driver *driver, virDomainObjPtr vm,
     bool restoreLabel = false;
     virCommandPtr cmd = NULL;
     int pipeFD[2] = { -1, -1 };
-    unsigned long saveMigBandwidth = priv->migMaxBandwidth;
-
-    /* Increase migration bandwidth to unlimited since target is a file.
-     * Failure to change migration speed is not fatal. */
-    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) {
-        qemuMonitorSetMigrationSpeed(priv->mon,
-                                     QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX);
-        priv->migMaxBandwidth = QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX;
-        qemuDomainObjExitMonitorWithDriver(driver, vm);
-    }

     if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) &&
         (!compressor || pipe(pipeFD) == 0)) {
@@ -2946,13 +2936,6 @@ qemuMigrationToFile(struct qemud_driver *driver, virDomainObjPtr vm,
     ret = 0;

 cleanup:
-    /* Restore max migration bandwidth */
-    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) {
-        qemuMonitorSetMigrationSpeed(priv->mon, saveMigBandwidth);
-        priv->migMaxBandwidth = saveMigBandwidth;
-        qemuDomainObjExitMonitorWithDriver(driver, vm);
-    }
-
     VIR_FORCE_CLOSE(pipeFD[0]);
     VIR_FORCE_CLOSE(pipeFD[1]);
     virCommandFree(cmd);
-- 
1.7.4.4
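For readers without the qemu source at hand, here is a minimal,
self-contained C sketch of the control flow described above. It is an
illustration under assumptions, not qemu's actual code: put_chunk(),
savevm_state_iterate(), and the cap passed in main() are simplified
stand-ins for qemu's internal machinery.

/*
 * Minimal, self-contained sketch (NOT qemu's real code) of the qemu
 * 0.15-era background-migration loop the commit message describes.
 * All names here are illustrative assumptions.
 */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

#define CHUNK 4096

/* Stand-in for qemu writing one chunk of guest state to the fd. */
static ssize_t put_chunk(int fd)
{
    static const char page[CHUNK]; /* stand-in for guest RAM */
    return write(fd, page, sizeof(page));
}

/*
 * Process migration data until either the fd reports EAGAIN or the
 * per-iteration bandwidth cap is reached.  Only when this function
 * returns does the main loop run again and notice a pending
 * cancellation request.  With limit == ULONG_MAX and a regular file
 * (which never yields EAGAIN), neither exit condition fires.
 */
static int savevm_state_iterate(int fd, unsigned long limit)
{
    unsigned long sent = 0;

    while (sent < limit) {
        ssize_t n = put_chunk(fd);
        if (n < 0)
            return errno == EAGAIN ? 0 : -1; /* EAGAIN: yield to main loop */
        sent += (unsigned long) n;
    }
    return 0; /* cap reached: yield to main loop */
}

int main(void)
{
    /* With a modest cap the loop yields promptly; with ULONG_MAX it
     * would keep writing essentially forever, which is the regression
     * described above. */
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return 1;
    int rc = savevm_state_iterate(fd, 32UL << 20); /* 32 MiB cap */
    close(fd);
    return rc ? 1 : 0;
}

The key point is the first exit path: POSIX gives O_NONBLOCK no effect
on regular files, so a write() to a plain file descriptor blocks until
it completes rather than failing with EAGAIN. That leaves the bandwidth
cap as the only way the loop ever returns control to the main loop, and
setting it to ULONG_MAX removes it.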

On Wed, Nov 09, 2011 at 09:29:23AM -0700, Eric Blake wrote:
> This reverts commit ef1065cf5ac; see also this bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=751900
>
> In qemu 0.15.1 and earlier, during migration to file, the
> qemu_savevm_state_begin and qemu_savevm_state_iterate methods will
> both process as much migration data as possible until either
>
>   1. The file descriptor returns EAGAIN
>   2. The bandwidth rate limit is reached
>
> If we set the rate limit to ULONG_MAX, test 2 never becomes true.
> We're passing a plain file descriptor to QEMU and POSIX does not
> support EAGAIN on regular files / block devices, so test 1 never
> becomes true either.
>
> In the 'virsh save --bypass-cache' case, we pass a pipe instead of a
> regular fd, but using a pipe adds I/O overhead, so always passing a
> pipe just so qemu can see EAGAIN doesn't seem nice.
>
> The ultimate fix needs to come from qemu - background migration must
ACK, it's a bit of a shame, but until we have a fix, it's more
important to be able to abort a save to disk than to be able to save at
a speed of more than 32MB/s (if the storage system supports this).

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine      http://rpmfind.net/
http://veillard.com/ | virtualization library         http://libvirt.org/
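For reference, the cancellation being preserved by this trade-off is
the one exposed through virDomainAbortJob() in the API (virsh
domjobabort at the shell). A minimal caller sketch follows; the
connection URI and the domain name "demo" are assumptions.

/*
 * Minimal sketch: abort an in-progress save job via the libvirt API.
 * The URI and the domain name "demo" are assumptions.
 * Build (roughly): gcc abortjob.c $(pkg-config --cflags --libs libvirt)
 */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn)
        return 1;

    virDomainPtr dom = virDomainLookupByName(conn, "demo");
    if (!dom) {
        virConnectClose(conn);
        return 1;
    }

    /* Request abort of the domain's current job, e.g. a save to file
     * started with virDomainSave().  This is the request that an
     * unlimited migration rate left qemu unable to honor. */
    if (virDomainAbortJob(dom) < 0)
        fprintf(stderr, "failed to abort the job\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}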

Daniel Veillard wrote:
> On Wed, Nov 09, 2011 at 09:29:23AM -0700, Eric Blake wrote:
>> This reverts commit ef1065cf5ac; see also this bug report:
>> https://bugzilla.redhat.com/show_bug.cgi?id=751900
>>
>> In qemu 0.15.1 and earlier, during migration to file, the
>> qemu_savevm_state_begin and qemu_savevm_state_iterate methods will
>> both process as much migration data as possible until either
>>
>>   1. The file descriptor returns EAGAIN
>>   2. The bandwidth rate limit is reached
>>
>> If we set the rate limit to ULONG_MAX, test 2 never becomes true.
>> We're passing a plain file descriptor to QEMU and POSIX does not
>> support EAGAIN on regular files / block devices, so test 1 never
>> becomes true either.
>>
>> In the 'virsh save --bypass-cache' case, we pass a pipe instead of a
>> regular fd, but using a pipe adds I/O overhead, so always passing a
>> pipe just so qemu can see EAGAIN doesn't seem nice.
>>
>> The ultimate fix needs to come from qemu - background migration must
>
> ACK, it's a bit of a shame, but until we have a fix, it's more
> important to be able to abort a save to disk than to be able to save
> at a speed of more than 32MB/s (if the storage system supports this).
Agreed.

BTW, I don't have access to the bug to see the details, but I think I
understand given Eric's description.

Jim

On 11/10/2011 04:43 PM, Jim Fehlig wrote:
>>> In the 'virsh save --bypass-cache' case, we pass a pipe instead of
>>> a regular fd, but using a pipe adds I/O overhead, so always passing
>>> a pipe just so qemu can see EAGAIN doesn't seem nice.
>>>
>>> The ultimate fix needs to come from qemu - background migration must
>>
>> ACK, it's a bit of a shame, but until we have a fix, it's more
>> important to be able to abort a save to disk than to be able to save
>> at a speed of more than 32MB/s (if the storage system supports this).
>
> Agreed.
>
> BTW, I don't have access to the bug to see the details, but I think I
> understand given Eric's description.
I've gone ahead and pushed my quick fix, but hopefully this isn't the
end of the story (see Dan's thread:
https://www.redhat.com/archives/libvir-list/2011-November/msg00572.html).

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org