On Thu, Oct 22, 2015 at 13:52:29 +0100, Daniel P. Berrange wrote:
...
On a related topic, we don't have great error reporting in the (usually
unlikely) scenario that we get a stuck job / timeout. I've long thought
it could be desirable to record some metadata when we start jobs, such
as the __FUNC__ of the method which started the job, so when we report
an error we can include that info as a diagnostic aid.
Do you mean something like this?
virsh # resume cd
error: Failed to resume domain cd
error: Timed out during operation: cannot acquire state change lock
(held by remoteDispatchDomainSuspend)
This was implemented by v1.2.13-295-gb79f25e
This would have to be against the qemuDomainObjPrivPtr struct. This
makes me think that using the separate bool inJob/inMonitor stack
variables is not required.
We could just add
int threadid;
bool inJob;
bool inMonitor;
const char *jobfunc;
to qemuDomainObjPrivPtr. That way you don't need to modify the
Enter/Exit functions to add extra arguments - we just track
everything internally. When exiting, we'd compare against the
threadid, to make sure we don't accidentally release a different
thread's job.
Yeah, as long as we can make sure threadid is unique and stable:
/* These next two functions are for debugging only, since they are not
* guaranteed to give unique values for distinct threads on all
* architectures, nor are the two functions guaranteed to give the same
* value for the same thread. */
unsigned long long virThreadSelfID(void);
unsigned long long virThreadID(virThreadPtr thread);
So far we have avoided using thread IDs for anything critical.
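
For illustration only, the tracking proposed in the quoted text could be sketched roughly as below. DemoJobPriv, demoJobBegin and demoJobEnd are hypothetical names, not existing libvirt code. The thread ID is passed in by the caller as an opaque value, and the sketch simply assumes it is unique and stable, which is exactly the open question with virThreadSelfID(). The sketch also fails immediately instead of waiting with a timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-domain private data; the four
 * fields mirror the ones proposed in the quoted text, but this is not
 * the real libvirt structure. */
typedef struct {
    unsigned long long threadid; /* thread that owns the current job */
    bool inJob;
    bool inMonitor;
    const char *jobfunc;         /* name of the function that began the job */
} DemoJobPriv;

/* Try to start a job on behalf of thread `self`. Real code would wait
 * with a timeout; this sketch fails at once if a job is active and
 * writes the holder's name into errbuf. Returns 0 on success, -1 on
 * failure. */
static int
demoJobBegin(DemoJobPriv *priv, unsigned long long self,
             const char *func, char *errbuf, size_t errlen)
{
    assert(priv != NULL && func != NULL);

    if (priv->inJob) {
        snprintf(errbuf, errlen,
                 "cannot acquire state change lock (held by %s)",
                 priv->jobfunc);
        return -1;
    }

    priv->threadid = self;
    priv->inJob = true;
    priv->jobfunc = func;
    return 0;
}

/* End the job, but only if `self` is the thread that started it.
 * Assumes `self` values are unique and stable per thread. */
static int
demoJobEnd(DemoJobPriv *priv, unsigned long long self)
{
    assert(priv != NULL);

    if (!priv->inJob || priv->threadid != self)
        return -1;

    priv->inJob = false;
    priv->inMonitor = false;
    priv->jobfunc = NULL;
    priv->threadid = 0;
    return 0;
}
```

A caller would pass __func__ as the func argument, so the Enter/Exit style functions need no extra bookkeeping on the caller's stack. If two threads can ever see the same ID, the ownership check in demoJobEnd() stops protecting anything.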
Jirka