[libvirt] [PATCH] qemu: Transition domain to PAUSED after 'stop' command

Currently, we mark domain PAUSED (but not emit an event) just before we issue 'stop' on monitor; This command can take ages to finish, esp. when domain's doing a lot of IO - users can enforce qemu to open files with O_DIRECT which doesn't return from write() until data reaches the block device. Having said that, we report PAUSED even if domain is not paused yet. --- The event is emitted correctly after all operations returns though. But if mgmt app would 'virsh domstate $dom' as we are issuing 'stop' monitor command it could get spurious results. src/qemu/qemu_process.c | 7 +------ 1 files changed, 1 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c28f5a5..b6eb342 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2708,13 +2708,9 @@ int qemuProcessStopCPUs(struct qemud_driver *driver, virDomainObjPtr vm, enum qemuDomainAsyncJob asyncJob) { int ret; - int oldState; - int oldReason; qemuDomainObjPrivatePtr priv = vm->privateData; VIR_FREE(priv->lockState); - oldState = virDomainObjGetState(vm, &oldReason); - virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, reason); ret = qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob); if (ret == 0) { @@ -2723,11 +2719,10 @@ int qemuProcessStopCPUs(struct qemud_driver *driver, virDomainObjPtr vm, } if (ret == 0) { + virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, reason); if (virDomainLockProcessPause(driver->lockManager, vm, &priv->lockState) < 0) VIR_WARN("Unable to release lease on %s", vm->def->name); VIR_DEBUG("Preserving lock state '%s'", NULLSTR(priv->lockState)); - } else { - virDomainObjSetState(vm, oldState, oldReason); } return ret; -- 1.7.8.6

On 19.09.2012 11:43, Michal Privoznik wrote:
Currently, we mark domain PAUSED (but not emit an event) just before we issue 'stop' on monitor; This command can take ages to finish, esp. when domain's doing a lot of IO - users can enforce qemu to open files with O_DIRECT which doesn't return from write() until data reaches the block device. Having said that, we report PAUSED even if domain is not paused yet. ---
The event is emitted correctly after all operations returns though. But if mgmt app would 'virsh domstate $dom' as we are issuing 'stop' monitor command it could get spurious results.
Just for the record, qemu 'stop' command is guaranteed to NOT return until after all disks are synced. It is not that kind of command which just requests an operation an returns immediately (like 'shutdown' or something). This bug really confuses the enemy - esp. when mgmt application asks from another thread, it may get PAUSED domain state even though qemu is still syncing the disks or monitor command hasn't been issued at all or is about to fail. Michal

On 09/19/2012 12:16 PM, Michal Privoznik wrote:
On 19.09.2012 11:43, Michal Privoznik wrote:
Currently, we mark domain PAUSED (but not emit an event) just before we issue 'stop' on monitor; This command can take ages to finish, esp. when domain's doing a lot of IO - users can enforce qemu to open files with O_DIRECT which doesn't return from write() until data reaches the block device. Having said that, we report PAUSED even if domain is not paused yet. ---
The event is emitted correctly after all operations returns though. But if mgmt app would 'virsh domstate $dom' as we are issuing 'stop' monitor command it could get spurious results.
Just for the record, qemu 'stop' command is guaranteed to NOT return until after all disks are synced. It is not that kind of command which just requests an operation an returns immediately (like 'shutdown' or something).
This bug really confuses the enemy - esp. when mgmt application asks from another thread, it may get PAUSED domain state even though qemu is still syncing the disks or monitor command hasn't been issued at all or is about to fail.
That sounds reasonable. Reporting PAUSED when the domain is _being_ stopped and even if the operation might still fail is less sensible to me as well. It would be nice to have a STOPPING state, but that's idea for a longer period of time and more places in the code consequently heading towards more problems etc. So ACK for this version. Martin

On 19.09.2012 14:59, Martin Kletzander wrote:
On 09/19/2012 12:16 PM, Michal Privoznik wrote:
On 19.09.2012 11:43, Michal Privoznik wrote:
Currently, we mark domain PAUSED (but not emit an event) just before we issue 'stop' on monitor; This command can take ages to finish, esp. when domain's doing a lot of IO - users can enforce qemu to open files with O_DIRECT which doesn't return from write() until data reaches the block device. Having said that, we report PAUSED even if domain is not paused yet. ---
The event is emitted correctly after all operations returns though. But if mgmt app would 'virsh domstate $dom' as we are issuing 'stop' monitor command it could get spurious results.
Just for the record, qemu 'stop' command is guaranteed to NOT return until after all disks are synced. It is not that kind of command which just requests an operation an returns immediately (like 'shutdown' or something).
This bug really confuses the enemy - esp. when mgmt application asks from another thread, it may get PAUSED domain state even though qemu is still syncing the disks or monitor command hasn't been issued at all or is about to fail.
That sounds reasonable. Reporting PAUSED when the domain is _being_ stopped and even if the operation might still fail is less sensible to me as well. It would be nice to have a STOPPING state, but that's idea for a longer period of time and more places in the code consequently heading towards more problems etc.
So ACK for this version.
Martin
Thanks, pushed now. Michal
participants (2)
-
Martin Kletzander
-
Michal Privoznik