[libvirt] [PATCH 0/2] qemu: Improve error/debug messages if unable to acquire a job

older
[libvirt] [PATCH] qemu: ensure FDs...

Michal Privoznik

20 Jun 2018 20 Jun '18

4:32 a.m.

*** BLURB HERE *** Michal Privoznik (2): qemuDomainObjBeginJobInternal: Log agent job too qemuDomainObjBeginJobInternal: Report agent job in error message src/qemu/qemu_domain.c | 42 +++++++++++++++++++++++++++--------------- 1 file changed, 27 insertions(+), 15 deletions(-) -- 2.16.4

Show replies by date

Michal Privoznik

20 Jun 20 Jun

4:32 a.m.

New subject: [libvirt] [PATCH 1/2] qemuDomainObjBeginJobInternal: Log agent job too

If a thread is unable to start a job (e.g. because of timeout) an warning is printed into the logs. So far, the message does not contain agent job info. Add it as it might help future debugging. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d7c0598cee..827597d5f3 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6422,6 +6422,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, const char *blocker = NULL; int ret = -1; unsigned long long duration = 0; + unsigned long long agentDuration = 0; unsigned long long asyncDuration = 0; VIR_DEBUG("Starting job: job=%s agentJob=%s asyncJob=%s " @@ -6526,22 +6527,27 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, ignore_value(virTimeMillisNow(&now)); if (priv->job.active && priv->job.started) duration = now - priv->job.started; + if (priv->job.agentActive && priv->job.agentStarted) + agentDuration = now - priv->job.agentStarted; if (priv->job.asyncJob && priv->job.asyncStarted) asyncDuration = now - priv->job.asyncStarted; - VIR_WARN("Cannot start job (%s, %s) for domain %s; " - "current job is (%s, %s) " - "owned by (%llu %s, %llu %s (flags=0x%lx)) " - "for (%llus, %llus)", + VIR_WARN("Cannot start job (%s, %s, %s) for domain %s; " + "current job is (%s, %s, %s) " + "owned by (%llu %s, %llu %s, %llu %s (flags=0x%lx)) " + "for (%llus, %llus, %llus)", qemuDomainJobTypeToString(job), + qemuDomainAgentJobTypeToString(agentJob), qemuDomainAsyncJobTypeToString(asyncJob), obj->def->name, qemuDomainJobTypeToString(priv->job.active), + qemuDomainAgentJobTypeToString(priv->job.agentActive), qemuDomainAsyncJobTypeToString(priv->job.asyncJob), priv->job.owner, NULLSTR(priv->job.ownerAPI), + priv->job.agentOwner, NULLSTR(priv->job.agentOwnerAPI), priv->job.asyncOwner, NULLSTR(priv->job.asyncOwnerAPI), priv->job.apiFlags, - duration / 1000, asyncDuration / 1000); + duration / 1000, agentDuration / 1000, asyncDuration / 1000); if (nested || qemuDomainNestedJobAllowed(priv, job)) blocker = priv->job.ownerAPI; -- 2.16.4

Jiri Denemark

21 Jun 21 Jun

4:20 a.m.

New subject: [libvirt] [PATCH 1/2] qemuDomainObjBeginJobInternal: Log agent job too

On Wed, Jun 20, 2018 at 14:32:03 +0200, Michal Privoznik wrote:

...

If a thread is unable to start a job (e.g. because of timeout) an warning is printed into the logs. So far, the message does not

s/an/a/

...

contain agent job info. Add it as it might help future debugging.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)

Reviewed-by: Jiri Denemark <jdenemar@redhat.com>

Michal Privoznik

20 Jun 20 Jun

4:32 a.m.

New subject: [libvirt] [PATCH 2/2] qemuDomainObjBeginJobInternal: Report agent job in error message

If a thread is unable to acquire a job (e.g. because of timeout) an error is reported and the error message contains reference to the other thread holding the job. Well, the error message should report agent job too as it is yet another source of possible failure. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 827597d5f3..4331b95917 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6420,6 +6420,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, bool async = job == QEMU_JOB_ASYNC; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); const char *blocker = NULL; + const char *agentBlocker = NULL; int ret = -1; unsigned long long duration = 0; unsigned long long agentDuration = 0; @@ -6549,16 +6550,21 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, priv->job.apiFlags, duration / 1000, agentDuration / 1000, asyncDuration / 1000); - if (nested || qemuDomainNestedJobAllowed(priv, job)) - blocker = priv->job.ownerAPI; - else - blocker = priv->job.asyncOwnerAPI; + if (job) { + if (nested || qemuDomainNestedJobAllowed(priv, job)) + blocker = priv->job.ownerAPI; + else + blocker = priv->job.asyncOwnerAPI; + } + + if (agentJob) + agentBlocker = priv->job.agentOwnerAPI; if (errno == ETIMEDOUT) { - if (blocker) { + if (blocker || agentBlocker) { virReportError(VIR_ERR_OPERATION_TIMEOUT, - _("cannot acquire state change lock (held by %s)"), - blocker); + _("cannot acquire state change lock (held by %s %s)"), + NULLSTR(blocker), NULLSTR(agentBlocker)); } else { virReportError(VIR_ERR_OPERATION_TIMEOUT, "%s", _("cannot acquire state change lock")); @@ -6566,11 +6572,11 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, ret = -2; } else if (cfg->maxQueuedJobs && priv->jobs_queued > cfg->maxQueuedJobs) { - if (blocker) { + if (blocker || agentBlocker) { virReportError(VIR_ERR_OPERATION_FAILED, - _("cannot acquire state change lock (held by %s) " + _("cannot acquire state change lock (held by %s %s) " "due to max_queued limit"), - blocker); + NULLSTR(blocker), NULLSTR(agentBlocker)); } else { virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("cannot acquire state change lock " -- 2.16.4

Jiri Denemark

22 Jun 22 Jun

12:07 a.m.

New subject: [libvirt] [PATCH 2/2] qemuDomainObjBeginJobInternal: Report agent job in error message

On Wed, Jun 20, 2018 at 14:32:04 +0200, Michal Privoznik wrote:

...

If a thread is unable to acquire a job (e.g. because of timeout) an error is reported and the error message contains reference to the other thread holding the job. Well, the error message should report agent job too as it is yet another source of possible failure.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 827597d5f3..4331b95917 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6420,6 +6420,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, bool async = job == QEMU_JOB_ASYNC; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); const char *blocker = NULL; + const char *agentBlocker = NULL; int ret = -1; unsigned long long duration = 0; unsigned long long agentDuration = 0; @@ -6549,16 +6550,21 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, priv->job.apiFlags, duration / 1000, agentDuration / 1000, asyncDuration / 1000);

- if (nested || qemuDomainNestedJobAllowed(priv, job)) - blocker = priv->job.ownerAPI; - else - blocker = priv->job.asyncOwnerAPI; + if (job) { + if (nested || qemuDomainNestedJobAllowed(priv, job)) + blocker = priv->job.ownerAPI; + else + blocker = priv->job.asyncOwnerAPI; + } + + if (agentJob) + agentBlocker = priv->job.agentOwnerAPI;

if (errno == ETIMEDOUT) { - if (blocker) { + if (blocker || agentBlocker) { virReportError(VIR_ERR_OPERATION_TIMEOUT, - _("cannot acquire state change lock (held by %s)"), - blocker); + _("cannot acquire state change lock (held by %s %s)"), + NULLSTR(blocker), NULLSTR(agentBlocker));

Since this is an error message reported to the user I think we should make it a little bit more user friendly. It would be nice to distinguish state change lock and agent lock and only print the relevant blocker (rather than both when one of them is NULL). And when both blockers are reported, we should separate them better, e.g., "%s and %s". Jirka

Michal Prívozník

24 Jun 24 Jun

9:37 p.m.

New subject: [libvirt] [PATCH 2/2] qemuDomainObjBeginJobInternal: Report agent job in error message

On 06/22/2018 10:07 AM, Jiri Denemark wrote:

...

On Wed, Jun 20, 2018 at 14:32:04 +0200, Michal Privoznik wrote:

...
If a thread is unable to acquire a job (e.g. because of timeout) an error is reported and the error message contains reference to the other thread holding the job. Well, the error message should report agent job too as it is yet another source of possible failure.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 827597d5f3..4331b95917 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6420,6 +6420,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, bool async = job == QEMU_JOB_ASYNC; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); const char *blocker = NULL; + const char *agentBlocker = NULL; int ret = -1; unsigned long long duration = 0; unsigned long long agentDuration = 0; @@ -6549,16 +6550,21 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, priv->job.apiFlags, duration / 1000, agentDuration / 1000, asyncDuration / 1000);

- if (nested || qemuDomainNestedJobAllowed(priv, job)) - blocker = priv->job.ownerAPI; - else - blocker = priv->job.asyncOwnerAPI; + if (job) { + if (nested || qemuDomainNestedJobAllowed(priv, job)) + blocker = priv->job.ownerAPI; + else + blocker = priv->job.asyncOwnerAPI; + } + + if (agentJob) + agentBlocker = priv->job.agentOwnerAPI;

if (errno == ETIMEDOUT) { - if (blocker) { + if (blocker || agentBlocker) { virReportError(VIR_ERR_OPERATION_TIMEOUT, - _("cannot acquire state change lock (held by %s)"), - blocker); + _("cannot acquire state change lock (held by %s %s)"), + NULLSTR(blocker), NULLSTR(agentBlocker));

Since this is an error message reported to the user I think we should make it a little bit more user friendly. It would be nice to distinguish state change lock and agent lock and only print the relevant blocker (rather than both when one of them is NULL). And when both blockers are reported, we should separate them better, e.g., "%s and %s".

I thought about it too, but then decided to take the easy way out because this would end up being a spaghetti code in my sight. But okay, I will rework this to: if (blocker && agentBlocker) { ... } else if (blocker) { ... } else if (agentBlocker) { ... } else { ... } where all four bodies are merely the same (only the error message would change - hence the spaghetti code). Michal

2806

Age (days ago)

2811

Last active (days ago)

List overview

Download

5 comments

3 participants

participants (3)

Jiri Denemark
Michal Privoznik
Michal Prívozník