On 19.12.2019 04:30, Cole Robinson wrote:
On 10/30/19 4:09 AM, Nikolay Shirokovskiy wrote:
> If we use fake reboot then domain goes thru running->shutdown->running
> state changes with shutdown state only for short period of time. At
> least this is implementation details leaking into API. And also there is
> one real case when this is not convinient. I'm doing a backup with the
> help of temporary block snapshot (with the help of qemu's API which is
> used in the newly created libvirt's backup API). If guest is shutdowned
> I want to continue to backup so I don't kill the process and domain is
> in shutdown state. Later when backup is finished I want to destroy qemu
> process. So I check if it is in shutdowned state and destroy it if it
> is. Now if instead of shutdown domain got fake reboot then I can destroy
> process in the middle of fake reboot process.
>
> After shutdown event we also get stop event and now as domain state is
> running it will be transitioned to paused state and back to running
> later. Though this is not critical for the described case I guess it is
> better not to leak these details to user too. So let's leave domain in
> running state on stop event if fake reboot is in process.
>
> As we don't know when stop event really arrive let's move resetting flag
> after starting CPUs - at this point stop event should be handled.
>
Sorry this didn't receive a timely response. Unsurprisingly it's
conflicting with master now.
Conceptually what you say makes sense, that the 'shutdown' is an
implementation detail of our 'reboot' implementation and details
shouldn't leak to apps.
Here's the events I see with: sudo virsh reboot --mode=acpi f30
$ sudo virsh event --domain f30 --all --loop
event 'agent-lifecycle' for domain f30: state: 'disconnected' reason:
'channel event'
event 'lifecycle' for domain f30: Shutdown Finished after guest request
event 'reboot' for domain f30
event 'lifecycle' for domain f30: Resumed Unpaused
event 'reboot' for domain f30
event 'agent-lifecycle' for domain f30: state: 'connected' reason:
'channel event'
Using --mode=agent gives:
event 'agent-lifecycle' for domain f30: state: 'disconnected' reason:
'channel event'
event 'reboot' for domain f30
event 'reboot' for domain f30
event 'agent-lifecycle' for domain f30: state: 'connected' reason:
'channel event'
So this change is moving the first more towards the latter. (not sure
what those two reboot events are about)
> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
> ---
> src/qemu/qemu_process.c | 57 ++++++++++++++++++++++-------------------
> 1 file changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index ed8666e9d1..2d37f92724 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -500,6 +500,8 @@ qemuProcessFakeReboot(void *opaque)
> goto endjob;
> }
>
> + qemuDomainSetFakeReboot(driver, vm, false);
> +
I think this should also be in the cleanup path of this function,
otherwise an earlier error looks like it would leave priv->fakeReboot
still set. But it makes sense it needs to be here too, so it's saved in
the domain status
Fake reboot is not reseted only in case of error but in this case domain
is killed forcefully and qemuProcessInit resets fake reboot flag so seems
good as it is to me.
By the way I reset fake reboot flag this late so that is seen as set by
qemuProcessHandleStop.
> if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm,
driver->caps) < 0) {
> VIR_WARN("Unable to save status on vm %s after state change",
> vm->def->name);
> @@ -525,7 +527,6 @@ qemuProcessShutdownOrReboot(virQEMUDriverPtr driver,
> qemuDomainObjPrivatePtr priv = vm->privateData;
>
> if (priv->fakeReboot) {
> - qemuDomainSetFakeReboot(driver, vm, false);
> virObjectRef(vm);
> virThread th;
> if (virThreadCreate(&th,
> @@ -534,6 +535,7 @@ qemuProcessShutdownOrReboot(virQEMUDriverPtr driver,
> vm) < 0) {
> VIR_ERROR(_("Failed to create reboot thread, killing
domain"));
> ignore_value(qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_NOWAIT));
> + qemuDomainSetFakeReboot(driver, vm, false);
> virObjectUnref(vm);
> }
> } else {
> @@ -595,35 +597,37 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon G_GNUC_UNUSED,
> goto unlock;
> }
>
> - VIR_DEBUG("Transitioned guest %s to shutdown state",
> - vm->def->name);
> - virDomainObjSetState(vm,
> - VIR_DOMAIN_SHUTDOWN,
> - VIR_DOMAIN_SHUTDOWN_UNKNOWN);
> + if (!priv->fakeReboot) {
Here it would be helpful to have a comment like
/* If the qemu process was stopped as part of fakeReboot reset, we skip
sending a shutdown event */
> + VIR_DEBUG("Transitioned guest %s to shutdown state",
> + vm->def->name);
> + virDomainObjSetState(vm,
> + VIR_DOMAIN_SHUTDOWN,
> + VIR_DOMAIN_SHUTDOWN_UNKNOWN);
>
> - switch (guest_initiated) {
> - case VIR_TRISTATE_BOOL_YES:
> - detail = VIR_DOMAIN_EVENT_SHUTDOWN_GUEST;
> - break;
> + switch (guest_initiated) {
> + case VIR_TRISTATE_BOOL_YES:
> + detail = VIR_DOMAIN_EVENT_SHUTDOWN_GUEST;
> + break;
>
> - case VIR_TRISTATE_BOOL_NO:
> - detail = VIR_DOMAIN_EVENT_SHUTDOWN_HOST;
> - break;
> + case VIR_TRISTATE_BOOL_NO:
> + detail = VIR_DOMAIN_EVENT_SHUTDOWN_HOST;
> + break;
>
> - case VIR_TRISTATE_BOOL_ABSENT:
> - case VIR_TRISTATE_BOOL_LAST:
> - default:
> - detail = VIR_DOMAIN_EVENT_SHUTDOWN_FINISHED;
> - break;
> - }
> + case VIR_TRISTATE_BOOL_ABSENT:
> + case VIR_TRISTATE_BOOL_LAST:
> + default:
> + detail = VIR_DOMAIN_EVENT_SHUTDOWN_FINISHED;
> + break;
> + }
>
> - event = virDomainEventLifecycleNewFromObj(vm,
> - VIR_DOMAIN_EVENT_SHUTDOWN,
> - detail);
> + event = virDomainEventLifecycleNewFromObj(vm,
> + VIR_DOMAIN_EVENT_SHUTDOWN,
> + detail);
>
> - if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm,
driver->caps) < 0) {
> - VIR_WARN("Unable to save status on vm %s after state change",
> - vm->def->name);
> + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm,
driver->caps) < 0) {
> + VIR_WARN("Unable to save status on vm %s after state change",
> + vm->def->name);
> + }
> }
>
> if (priv->agent)
> @@ -657,7 +661,8 @@ qemuProcessHandleStop(qemuMonitorPtr mon G_GNUC_UNUSED,
> reason = priv->pausedReason;
> priv->pausedReason = VIR_DOMAIN_PAUSED_UNKNOWN;
>
> - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
> + if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING &&
> + !priv->fakeReboot) {
What case does this trigger in? A comment will help too
Ok. I add comment to code too and leave some notes here. This additional
check helps to hide transitional paused state during fake reboot.
Couple things I'm curious about is if the VM is left in an inaccurate
state if the fakereboot fails. But qemuProcessFakeReboot will
qemuProcessKill in that case which I think covers it.
Yeah, this was the idea.
The other is, whether we should fully ignore the internal state change,
or just not send the event. By ignoring the state change we are
willfully claiming the VM is still running when we know it isn't. I
guess that should be a small window of time, reset + StartCPUs should be
quick I think, but it's not a pattern I've seen in the code before
I think of the case next way. What use if one peeks for domain state and sees
it is shutdown or paused during fake reboot. Probably one thinks domain
is "really" shutdown/paused and will act accordingly. The similar approach
uses openstack AFAII which has its own vm state and it is running throught
the reboot process even the libvirt domain goes thru shutdown and start.
I'll send next version.
Nikolay