On 4/25/19 6:46 AM, Michal Privoznik wrote:
On 4/24/19 11:16 PM, Daniel Henrique Barboza wrote:
> If the current QEMU guest can't wake up from suspend properly,
> and we are able to determine that, avoid suspending the guest
> at all. To be able to determine this support, QEMU needs to
> implement the 'query-current-machine' QMP call. This is reflected
> by the QEMU_CAPS_QUERY_CURRENT_MACHINE cap.
>
> If the cap is enabled, a new function qemuDomainProbeQMPCurrentMachine
> is called. This is a wrapper for qemuMonitorGetCurrentMachineInfo,
> where the 'wakeup-suspend-support' flag is retrieved from
> 'query-current-machine'. If wakeupSuspendSupport is true,
> proceed with the regular flow of qemuDomainPMSuspendForDuration.
>
> The absence of QEMU_CAPS_QUERY_CURRENT_MACHINE indicates that
> we're dealing with a QEMU version older than 4.0, the version that
> introduced the required QMP command. In this case, proceed as usual
> with the suspend logic of qemuDomainPMSuspendForDuration, since we
> can't tell whether the guest supports wake-up or not.
>
> Fixes: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1759509
> Reported-by: Balamuruhan S <bala24(a)linux.vnet.ibm.com>
> Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
> ---
> src/qemu/qemu_driver.c | 54 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 31d8647eee..d584f6ae66 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -19145,6 +19145,35 @@ qemuDomainGetCPUStats(virDomainPtr domain,
>     return ret;
> }
> +static int
> +qemuDomainProbeQMPCurrentMachine(virQEMUDriverPtr driver,
> +                                 virDomainObjPtr vm,
> +                                 bool *enabled)
> +{
> +    qemuMonitorCurrentMachineInfo info = { 0 };
> +    qemuDomainObjPrivatePtr priv;
> +    int ret = -1;
> +
> +    *enabled = false;
> +    priv = vm->privateData;
> +
> +    qemuDomainObjEnterMonitor(driver, vm);
> +
> +    if (qemuMonitorGetCurrentMachineInfo(priv->mon, &info) < 0)
> +        goto cleanup;
> +
> +    ret = 0;
> +
> +    if (info.wakeupSuspendSupport)
> +        *enabled = true;
> +
> + cleanup:
> +    if (qemuDomainObjExitMonitor(driver, vm) < 0)
> +        ret = -1;
> +
> +    return ret;
This can be simplified.
> +}
> +
> static int
> qemuDomainPMSuspendForDuration(virDomainPtr dom,
>                                unsigned int target,
> @@ -19152,6 +19181,7 @@ qemuDomainPMSuspendForDuration(virDomainPtr dom,
>                                unsigned int flags)
> {
>     virQEMUDriverPtr driver = dom->conn->privateData;
> +    qemuDomainObjPrivatePtr priv;
>     virDomainObjPtr vm;
>     qemuAgentPtr agent;
>     int ret = -1;
> @@ -19185,6 +19215,30 @@ qemuDomainPMSuspendForDuration(virDomainPtr dom,
>     if (virDomainObjCheckActive(vm) < 0)
>         goto endjob;
> +    /*
> +     * The case we want to handle here is when QEMU has the API (i.e.
> +     * QEMU_CAPS_QUERY_CURRENT_MACHINE is set). Otherwise, do not interfere
> +     * with the suspend process. This means that existing running domains,
> +     * that don't know about this cap, will keep their old behavior of
> +     * suspending 'in the dark'.
> +     */
> +    priv = vm->privateData;
> +    if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_QUERY_CURRENT_MACHINE)) {
> +        bool enabled;
s/enabled/wakeupSupported/ to make it a bit more obvious what it is
we're querying for here.
> +
> +        if (qemuDomainProbeQMPCurrentMachine(driver, vm, &enabled) < 0) {
This is not sufficient. A few lines above (not visible in the patch
context) we grab an agent job, but this function then enters the
monitor. This is dangerous because it means we're talking on the
monitor without holding the monitor job, and thus might interfere with
some other thread. Either we grab the correct job depending on whether
we are going to talk on the monitor or not, or we can just grab both
jobs even if qemu doesn't have the capability.
Thanks for the explanation and for fixing before pushing. I was under
the impression that the now extinct line:
"if (qemuDomainObjBeginAgentJob(driver, vm, QEMU_AGENT_JOB_MODIFY) < 0)"
was enough to hold the lock context for all the VM resources, not only
the QEMU agent. Mostly because I saw the usage of:
if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
right below, in qemuDomainPMWakeup(), which accesses the monitor in
the same function.
But now that I'm putting it all together I noticed that the functions
are all different ... so qemuDomainObjBeginJob() locks the monitor,
qemuDomainObjBeginAgentJob() locks the agent, and
qemuDomainObjBeginJobWithAgent() can lock both, depending on whether
the job isn't QEMU_JOB_NONE. Is this somewhat close to the actual usage?
Thanks,
DHB
The cleaner solution would be the former.
> +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                           _("Error executing query-current-machine call"));
No need to overwrite the error here. I mean,
qemuDomainProbeQMPCurrentMachine() will report an error on failure
(not explicitly, but the functions it calls do report errors).
> +            goto endjob;
> +        }
> +
> +        if (!enabled) {
> +            virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
> +                           _("Domain does not have suspend support"));
> +            goto endjob;
> +        }
> +    }
> +
>     if (vm->def->pm.s3 || vm->def->pm.s4) {
>         if (vm->def->pm.s3 == VIR_TRISTATE_BOOL_NO &&
>             (target == VIR_NODE_SUSPEND_TARGET_MEM ||
>
I'll fix the issues before pushing.
Michal