
On 07.11.2018 17:29, John Ferlan wrote:
On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote:
Let's introduce shutdown reason "daemon" which means we have to kill running domain ourselves as the best action we can take at that moment. Failure to pick up domain on daemon restart is one example of such case. Using reason "crashed" is a bit misleading as it is used when qemu is actually crashed or qemu destroyed on guest panic as result of user choice that is the problem was in qemu/guest itself. So I propose to use "crashed" only for qemu side issues and introduce "daemon" when we have to kill the qemu for good.
There is another example where "daemon" will be useful. If we can not reboot domain we kill it and got "crashed" reason now. Looks like good candidate for "daemon" reason.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- include/libvirt/libvirt-domain.h | 1 + src/conf/domain_conf.c | 3 ++- src/qemu/qemu_process.c | 11 ++++------- tools/virsh-domain-monitor.c | 3 ++- 4 files changed, 9 insertions(+), 9 deletions(-)
[...]
Now that I've pushed commits 296e05b54 and 8f0f8425d, let's revisit this... Merging in that set of changes with this patch means that instead of:
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e9c7618..c4bc9ca 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque) /* We can't get the monitor back, so must kill the VM * to remove danger of it ending up running twice if * user tries to start it again later - * If we couldn't get the monitor since QEMU supports - * no-shutdown, we can safely say that the domain - * crashed ... */ - state = VIR_DOMAIN_SHUTOFF_CRASHED; - /* If BeginJob failed, we jumped here without a job, let's hope another + * If BeginJob failed, we jumped here without a job, let's hope another * thread didn't have a chance to start playing with the domain yet * (it's all we can do anyway). */ - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags); + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, + QEMU_ASYNC_JOB_NONE, stopFlags); } goto cleanup; } @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, * is no thread that could be doing anything else with the same domain * object. */ - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, QEMU_ASYNC_JOB_NONE, 0); qemuDomainRemoveInactiveJobLocked(src->driver, obj);
We'd have:
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3bf84834ec..023e187729 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7995,10 +7995,14 @@ qemuProcessReconnect(void *opaque) * * If we cannot get to the monitor when the QEMU command * line used -no-shutdown, then we can safely say that the - * domain crashed; otherwise, we don't really know. */ + * domain crashed; otherwise, if the monitor was started, + * then we can blame ourselves, else we failed before the + * monitor started so we don't really know. */ if (!priv->mon && tryMonReconn && qemuDomainIsUsingNoShutdown(priv)) state = VIR_DOMAIN_SHUTOFF_CRASHED; + else if (priv->mon) + state = VIR_DOMAIN_SHUTOFF_DAEMON; else state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
@@ -8045,7 +8049,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, * is no thread that could be doing anything else with the same domain * object. */ - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, QEMU_ASYNC_JOB_NONE, 0);
I think we need to report VIR_DOMAIN_SHUTOFF_UNKNOWN here. This is similar to qemuProcessReconnect before we try to reconnect. May be this hunk need to be placed in distinct patch therefore.
qemuDomainRemoveInactiveJobLocked(src->driver, obj);
[...]
So does this essentially meet/handle the condition you were trying to alter? That is - once we get beyond the monitor reconnection, then if we end up in error that means the daemon has decided to stop things. Leaving the UNKNOWN to be the period prior to attempting to reconnect and CRASHED to the period during which we try to connect to the monitor.
This also captures failures after connection is successful (when priv->mon in qemuConnectMonitor).
Getting more specific failures of why the connection failed is perhaps a future task if it really matters.
This looks good enough for me too.
If this looks good to you let me know, I can merge and push with the previously noted changes and a commit message of:
"This patch introduces a new shutdown reason "daemon" in order to indicate that the daemon needed to force shutdown the domain as the best course of action to take at the moment.
This action would occur during reconnection when processing encounters an error once the monitor reconnection is successful."
Besides the first comment I'm fine with this change. I'm going to send a patch to change reason to VIR_DOMAIN_SHUTOFF_DAEMON in appropriate reboot failure cases too. Also one day let's fix the code not to kill unrelated process. Nikolay