Re: [libvirt] Determining domain job kind from job stats?

[Starting to move to the development list.]
Milan Zamazal <mzamazal@redhat.com> writes:
Jiri Denemark <jdenemar@redhat.com> writes:
On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
There are basically two problems:
- When the job completion callback is called, I need to distinguish what kind of job it was in order to perform the appropriate actions. It would be easier if I knew the job type directly in the callback (no need to coordinate anything), but "external" job tracking is also possible.
An immediate answer would be: "don't rely on the completion callback and just check the return value of the API which started the job", but I guess you want it because checking the return value is not possible when the process which started the job is not running anymore as described below.
Well, avoiding using the completion callback is probably OK for me.
Thinking about it more, it's not very nice: I have to use the callback to get the completed job stats (I'm not guaranteed the domain still exists on the source host when I ask it for the stats explicitly) *and* to track the jobs outside the callback to know whether the callback is related to the type of domain jobs I'm going to handle. Although not absolutely necessary, it would be much nicer if the job type was identified in the callback.
(In case of the process restart, I don't expect having everything perfectly working, just some basic sanity.)
- If I lost track of my jobs (e.g. because of a crash and restart), I'd like to find out whether a given VM is migrating. Examining the job looked like a good candidate to get the information, but apparently it's not. Again, I can probably arrange things to handle that, but to get the information directly from libvirt (not necessarily via job info) would be easier and more reliable.
Apparently you are talking about peer-to-peer migration,
Yes.
otherwise the migration would be automatically canceled when the process which started it disappears. I'm afraid this is not currently possible in general. You might be able to get something by checking the domain's status, but it won't work in all cases.
Too bad. Could some future libvirt version provide that information?
If libvirt provided information about the job type, it would help with several things, for instance with the callback problem above, with using libvirt as the ultimate single source of information about the VMs, or with handling VMs not running under complete control of a particular piece of software.
If some piece of information about a VM is missing from libvirt, I need to store it somewhere. Domain metadata is a natural place for that, since it's closely bound to the corresponding VM and keeps libvirt as the ultimate single source of information. But putting information about migration there is weird at best.
Based on those thoughts, I think libvirt should really provide a simple way to find out whether the VM is migrating to another host and to identify the domain job type in the job completed callback. Is there anything preventing that information from being added?
Thanks,
Milan
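
The "external" job tracking mentioned in this message can be sketched as follows. This is a minimal illustration in plain Python, not libvirt API: `JobTracker`, `start`, and `on_job_completed` are hypothetical names, and a domain is identified by a plain string. It also shows the weakness described above: once the tracking process restarts, its table is empty and a completion event can no longer be classified.

```python
from typing import Dict, Optional


class JobTracker:
    """Remembers which kind of job this process started for each domain."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def start(self, domain: str, kind: str) -> None:
        # Record the job kind before calling the API that starts the job.
        self._jobs[domain] = kind

    def on_job_completed(self, domain: str) -> Optional[str]:
        # Called from the completion callback; returns None for unknown jobs.
        return self._jobs.pop(domain, None)


tracker = JobTracker()
tracker.start("vm1", "migration-out")
print(tracker.on_job_completed("vm1"))  # job started by this process
print(tracker.on_job_completed("vm2"))  # job started elsewhere or before a restart
```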

On Tue, Feb 21, 2017 at 15:38:22 +0100, Milan Zamazal wrote:
[Starting to move to the development list.]
Milan Zamazal <mzamazal@redhat.com> writes:
Jiri Denemark <jdenemar@redhat.com> writes:
On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
There are basically two problems:
- When the job completion callback is called, I need to distinguish what kind of job was it to perform the appropriate actions. It would be easier if I knew the job type directly in the callback (no need to coordinate anything), but "external" job tracking is also possible.
An immediate answer would be: "don't rely on the completion callback and just check the return value of the API which started the job", but I guess you want it because checking the return value is not possible when the process which started the job is not running anymore as described below.
Well, avoiding using the completion callback is probably OK for me.
Thinking about it more, it's not very nice: I have to use the callback to get the completed job stats (I'm not guaranteed the domain still exists on the source host when I ask it for the stats explicitly) *and* to track the jobs outside the callback to know whether the callback is related to the type of domain jobs I'm going to handle.
Although not absolutely necessary, it would be much nicer if the job type was identified in the callback.
The job completed event uses typed parameters so adding a new parameter describing the just completed job should not be a problem.
Jirka
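
A rough sketch of why a parameter list is easy to extend: consumers look up the keys they know and ignore the rest, so a new key does not disturb existing code. The dictionaries below only imitate how such parameters might reach a Python callback; the key names `time_elapsed` and `operation`, the value `"migration-out"`, and both function names are assumptions made for illustration, not something this thread confirms.

```python
from typing import Any, Dict, Optional


def old_consumer(params: Dict[str, Any]) -> int:
    # Written before the new key existed; reads only what it knows.
    return params.get("time_elapsed", 0)


def new_consumer(params: Dict[str, Any]) -> Optional[str]:
    # Reads the hypothetical new key and tolerates its absence.
    return params.get("operation")


before = {"time_elapsed": 1200}
after = {"time_elapsed": 1200, "operation": "migration-out"}

print(old_consumer(before), old_consumer(after))
print(new_consumer(before), new_consumer(after))
```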

Jiri Denemark <jdenemar@redhat.com> writes:
On Tue, Feb 21, 2017 at 15:38:22 +0100, Milan Zamazal wrote:
[Starting to move to the development list.]
Milan Zamazal <mzamazal@redhat.com> writes:
Jiri Denemark <jdenemar@redhat.com> writes:
On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
There are basically two problems:
- When the job completion callback is called, I need to distinguish what kind of job was it to perform the appropriate actions. It would be easier if I knew the job type directly in the callback (no need to coordinate anything), but "external" job tracking is also possible.
An immediate answer would be: "don't rely on the completion callback and just check the return value of the API which started the job", but I guess you want it because checking the return value is not possible when the process which started the job is not running anymore as described below.
Well, avoiding using the completion callback is probably OK for me.
Thinking about it more, it's not very nice: I have to use the callback to get the completed job stats (I'm not guaranteed the domain still exists on the source host when I ask it for the stats explicitly) *and* to track the jobs outside the callback to know whether the callback is related to the type of domain jobs I'm going to handle.
Although not absolutely necessary, it would be much nicer if the job type was identified in the callback.
The job completed event uses typed parameters so adding a new parameter describing the just completed job should not be a problem.
Great, so all that remains to solve the problem is to add the parameter :-). (And I'd like to have the same info available when a job is still running for reasons already discussed.)
I looked at how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
- Are there any libvirt-python updates needed or will all the things propagate to it automatically?
- I think there are no documentation updates needed to perform manually for this change, right?
- Should I be aware of anything else?
Thanks,
Milan

On Mon, Apr 10, 2017 at 12:58:09PM +0200, Milan Zamazal wrote:
Jiri Denemark <jdenemar@redhat.com> writes:
On Tue, Feb 21, 2017 at 15:38:22 +0100, Milan Zamazal wrote:
[Starting to move to the development list.]
Milan Zamazal <mzamazal@redhat.com> writes:
Jiri Denemark <jdenemar@redhat.com> writes:
On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
There are basically two problems:
- When the job completion callback is called, I need to distinguish what kind of job was it to perform the appropriate actions. It would be easier if I knew the job type directly in the callback (no need to coordinate anything), but "external" job tracking is also possible.
An immediate answer would be: "don't rely on the completion callback and just check the return value of the API which started the job", but I guess you want it because checking the return value is not possible when the process which started the job is not running anymore as described below.
Well, avoiding using the completion callback is probably OK for me.
Thinking about it more, it's not very nice: I have to use the callback to get the completed job stats (I'm not guaranteed the domain still exists on the source host when I ask it for the stats explicitly) *and* to track the jobs outside the callback to know whether the callback is related to the type of domain jobs I'm going to handle.
Although not absolutely necessary, it would be much nicer if the job type was identified in the callback.
The job completed event uses typed parameters so adding a new parameter describing the just completed job should not be a problem.
Great, so all what remains to solve the problem is to add the parameter :-). (And I'd like to have the same info available when a job is still running for reasons already discussed.)
I looked how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
I don't much like the idea of exposing the QEMU job operation names in the public API.
Perhaps we instead need to have the method which starts the job, return an integer "job id" that is then reported against the job, so apps can match them up.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

"Daniel P. Berrange" <berrange@redhat.com> writes:
On Mon, Apr 10, 2017 at 12:58:09PM +0200, Milan Zamazal wrote:
Jiri Denemark <jdenemar@redhat.com> writes:
On Tue, Feb 21, 2017 at 15:38:22 +0100, Milan Zamazal wrote:
[Starting to move to the development list.]
Milan Zamazal <mzamazal@redhat.com> writes:
Jiri Denemark <jdenemar@redhat.com> writes:
On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
There are basically two problems:
- When the job completion callback is called, I need to distinguish what kind of job was it to perform the appropriate actions. It would be easier if I knew the job type directly in the callback (no need to coordinate anything), but "external" job tracking is also possible.
An immediate answer would be: "don't rely on the completion callback and just check the return value of the API which started the job", but I guess you want it because checking the return value is not possible when the process which started the job is not running anymore as described below.
Well, avoiding using the completion callback is probably OK for me.
Thinking about it more, it's not very nice: I have to use the callback to get the completed job stats (I'm not guaranteed the domain still exists on the source host when I ask it for the stats explicitly) *and* to track the jobs outside the callback to know whether the callback is related to the type of domain jobs I'm going to handle.
Although not absolutely necessary, it would be much nicer if the job type was identified in the callback.
The job completed event uses typed parameters so adding a new parameter describing the just completed job should not be a problem.
Great, so all what remains to solve the problem is to add the parameter :-). (And I'd like to have the same info available when a job is still running for reasons already discussed.)
I looked how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
I don't much like the idea of exposing the QEMU job operation names in the public API.
Perhaps we instead need to have the method which starts the job, return an integer "job id" that is then reported against the job, so apps can match them up.
That would make event-job matching unique, but it wouldn't solve the problem that the running job is a "mystery" for any entity that didn't start it. If the application that started the job gets lost or restarted or if I simply ssh to a machine and want to know what's happening to VMs there, I'm unable to tell if a VM is migrating to another host, creating a snapshot, etc.
We need the information and I assume we can't get it easily without some form of exposing the QEMU job operation names.
Regards,
Milan

On Mon, Apr 10, 2017 at 12:01:43 +0100, Daniel P. Berrange wrote:
On Mon, Apr 10, 2017 at 12:58:09PM +0200, Milan Zamazal wrote:
I looked how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
I don't much like the idea of exposing the QEMU job operation names in the public API.
Perhaps we instead need to have the method which starts the job, return an integer "job id" that is then reported against the job, so apps can match them up.
The problem with "job id" is that only the process which started the job would know what it means. Not to mention it would require a lot of API changes.
I think we should just introduce a new virDomainJobSomething enum as
VIR_DOMAIN_JOB_SOMETHING_INCOMING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_OUTGOING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_SAVE, VIR_DOMAIN_JOB_SOMETHING_RESTORE, ...
and report it in virDomainGetJobStats (definitely not in _virDomainJobInfo as it would break ABI).
I'm not sure what the best name for "Something" would be. "Operation", "Action", or something else?
Jirka
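
The ABI concern raised here can be illustrated with a small sketch. The two structures below are simplified, hypothetical stand-ins and do not reproduce the real _virDomainJobInfo layout; the point is only that appending a member changes the size of a structure that callers allocate themselves, so a binary built against the old header would hand the library a buffer that is too small.

```python
import ctypes


class JobInfoOld(ctypes.Structure):
    # Simplified stand-in for a public struct as shipped in an old header.
    _fields_ = [
        ("type", ctypes.c_int),
        ("dataTotal", ctypes.c_ulonglong),
    ]


class JobInfoNew(ctypes.Structure):
    # Same struct with one hypothetical member appended.
    _fields_ = [
        ("type", ctypes.c_int),
        ("dataTotal", ctypes.c_ulonglong),
        ("operation", ctypes.c_int),
    ]


old_size = ctypes.sizeof(JobInfoOld)
new_size = ctypes.sizeof(JobInfoNew)
print(old_size, new_size)  # the new layout is larger than what old callers allocate
```

A key/value interface avoids this, because the caller never depends on a fixed structure size.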

On Tue, Apr 11, 2017 at 09:49:42AM +0200, Jiri Denemark wrote:
On Mon, Apr 10, 2017 at 12:01:43 +0100, Daniel P. Berrange wrote:
On Mon, Apr 10, 2017 at 12:58:09PM +0200, Milan Zamazal wrote:
I looked how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
I don't much like the idea of exposing the QEMU job operation names in the public API.
Perhaps we instead need to have the method which starts the job, return an integer "job id" that is then reported against the job, so apps can match them up.
The problem with "job id" is that only the process which started the job would know what it means. Not to mention it would require a lot of API changes.
I think we should just introduce a new virDomainJobSomething enum as
VIR_DOMAIN_JOB_SOMETHING_INCOMING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_OUTGOING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_SAVE, VIR_DOMAIN_JOB_SOMETHING_RESTORE, ...
and report it in virDomainGetJobStats (definitely not in _virDomainJobInfo as it would break ABI).
I'm not sure what the best name for "Something" would be. "Operation", "Action", or something else?
IMHO it would be an "Operation", since a single logical operation can involve running multiple actions.
Regards,
Daniel
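
With "Operation" as the working name, the proposed enum might look roughly like the sketch below, written in Python for brevity. Only the four members named in the thread are included; the class name `JobOperation`, the numeric values, the `UNKNOWN` fallback, and the `operation` key in the stats dictionary are assumptions for illustration and should not be read as the final libvirt API.

```python
from enum import IntEnum
from typing import Any, Dict


class JobOperation(IntEnum):
    UNKNOWN = 0  # assumed fallback, not mentioned in the thread
    INCOMING_MIGRATION = 1
    OUTGOING_MIGRATION = 2
    SAVE = 3
    RESTORE = 4


def job_operation(stats: Dict[str, Any]) -> JobOperation:
    # Map the raw integer from a stats dictionary to the enum.
    raw = stats.get("operation", JobOperation.UNKNOWN)
    try:
        return JobOperation(raw)
    except ValueError:
        # A newer library may report a value this client does not know.
        return JobOperation.UNKNOWN


def is_migrating_out(stats: Dict[str, Any]) -> bool:
    return job_operation(stats) is JobOperation.OUTGOING_MIGRATION


print(is_migrating_out({"operation": 2}))
print(job_operation({"operation": 99}).name)
```

Unlike a per-call "job id", such a value would be meaningful to any client that queries the domain, including one that did not start the job.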

"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Apr 11, 2017 at 09:49:42AM +0200, Jiri Denemark wrote:
On Mon, Apr 10, 2017 at 12:01:43 +0100, Daniel P. Berrange wrote:
On Mon, Apr 10, 2017 at 12:58:09PM +0200, Milan Zamazal wrote:
I looked how the change could be implemented. Could you please help me clarify some things?
- I think a new member should be added to _virDomainJobInfo for the purpose. What would be a good name for it? Maybe "operation"?
- Do I need to care about backends other than QEMU?
- Jobs are classified by qemuDomainAsyncJob, which is a QEMU specific type. Is it OK to use such structures in virsh-domain.c or is there any additional abstraction needed?
I don't much like the idea of exposing the QEMU job operation names in the public API.
Perhaps we instead need to have the method which starts the job, return an integer "job id" that is then reported against the job, so apps can match them up.
The problem with "job id" is that only the process which started the job would know what it means. Not to mention it would require a lot of API changes.
I think we should just introduce a new virDomainJobSomething enum as
VIR_DOMAIN_JOB_SOMETHING_INCOMING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_OUTGOING_MIGRATION, VIR_DOMAIN_JOB_SOMETHING_SAVE, VIR_DOMAIN_JOB_SOMETHING_RESTORE, ...
and report it in virDomainGetJobStats (definitely not in _virDomainJobInfo as it would break ABI).
I'm not sure what the best name for "Something" would be. "Operation", "Action", or something else?
IMHO It would be an "Operation", since a single logical operation can involve running multiple actions.
Sounds good. So what's best to do now to move on with the feature? Regards, Milan
participants (3)
- Daniel P. Berrange
- Jiri Denemark
- Milan Zamazal