On Fri, Nov 22, 2019 at 09:06:31AM +0100, Peter Krempa wrote:
On Thu, Nov 21, 2019 at 18:54:12 +0000, Daniel Berrange wrote:
> On Fri, Oct 18, 2019 at 06:11:15PM +0200, Peter Krempa wrote:
> > From: Eric Blake <eblake(a)redhat.com>
> >
> > Introduce a few new public APIs related to incremental backups. This
> > builds on the previous notion of a checkpoint (without an existing
> > checkpoint, the new API is a full backup, differing from
> > virDomainBlockCopy in the point of time chosen and in operation on
> > multiple disks at once); and also allows creation of a new checkpoint
> > at the same time as starting the backup (after all, an incremental
> > backup is only useful if it covers the state since the previous
> > backup).
> >
> > A backup job also affects filtering a listing of domains, as well as
> > adding event reporting for signaling when a push model backup
> > completes (where the hypervisor creates the backup); note that the
> > pull model does not have an event (starting the backup lets a third
> > party access the data, and only the third party knows when it is
> > finished).
> >
> > Since multiple backup jobs can be run in parallel in the future (well,
> > qemu doesn't support it yet, but we don't want to preclude the idea),
> > virDomainBackupBegin() returns a positive job id, and the id is also
> > visible in the backup XML. But until a future libvirt release adds a
> > bunch of APIs related to parallel job management where job ids will
> > actually matter, the documentation is also clear that job id 0 means
> > the 'currently running backup job' (provided one exists), for use in
> > virDomainBackupGetXMLDesc() and virDomainBackupEnd().
> >
> > The full list of new APIs:
> > virDomainBackupBegin;
> > virDomainBackupEnd;
> > virDomainBackupGetXMLDesc;
> >
> > Signed-off-by: Eric Blake <eblake(a)redhat.com>
> > Reviewed-by: Daniel P. Berrangé <berrange(a)redhat.com>
> > ---
> > include/libvirt/libvirt-domain.h | 26 ++++-
> > src/driver-hypervisor.h | 20 ++++
> > src/libvirt-domain-checkpoint.c | 7 +-
> > src/libvirt-domain.c | 191 +++++++++++++++++++++++++++++++
> > src/libvirt_public.syms | 8 ++
> > tools/virsh-domain.c | 4 +-
> > 6 files changed, 252 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> > index 22277b0a84..2d9f69f7d4 100644
> > --- a/include/libvirt/libvirt-domain.h
> > +++ b/include/libvirt/libvirt-domain.h
> > @@ -3267,6 +3267,7 @@ typedef enum {
>
>
> >
> > +/**
> > + * VIR_DOMAIN_JOB_ID:
> > + *
> > + * virDomainGetJobStats field: the id of the job (so far, only for jobs
> > + * started by virDomainBackupBegin()), as VIR_TYPED_PARAM_INT.
> > + */
> > +# define VIR_DOMAIN_JOB_ID "id"
> > +
> > /**
> > * VIR_DOMAIN_JOB_TIME_ELAPSED:
> > *
> > @@ -4106,7 +4115,8 @@ typedef void (*virConnectDomainEventMigrationIterationCallback)(virConnectPtr co
> > * @nparams: size of the params array
> > * @opaque: application specific data
> > *
> > - * This callback occurs when a job (such as migration) running on the domain
> > + * This callback occurs when a job (such as migration or push-model
> > + * virDomainBackupBegin()) running on the domain
> > * is completed. The params array will contain statistics of the just completed
> > * job as virDomainGetJobStats would return. The callback must not free @params
> > * (the array will be freed once the callback finishes).
> > @@ -4916,4 +4926,18 @@ int virDomainGetGuestInfo(virDomainPtr domain,
> > int *nparams,
> > unsigned int flags);
> >
> > +
> > +int virDomainBackupBegin(virDomainPtr domain,
> > + const char *backupXML,
> > + const char *checkpointXML,
> > + unsigned int flags);
> > +
> > +char *virDomainBackupGetXMLDesc(virDomainPtr domain,
> > + int id,
> > + unsigned int flags);
> > +
> > +int virDomainBackupEnd(virDomainPtr domain,
> > + int id,
> > + unsigned int flags);
>
> So this is still using a plain integer job ID, which is a concern
> wrt future extensibility.
My current plan is to go ahead with API based on this old one but with
no support for any parallel jobs. Basically the same thing but 'id'
argument removed.
This actually fits in with the original documentation which was already
ACKed for the virDomainBackupBegin API which said that the backup job
uses the domain async job infrastructure. This means that the
virDomainAbortJob and virDomainGetJobStats can be used to monitor the
blockjob. It also gives us the possibility to query the state of a
finished job by passing VIR_DOMAIN_JOB_STATS_COMPLETED to
virDomainGetJobStats. We also have an async event for reporting the job
state. I'll also consider removing virDomainBackupEnd for this
implementation as virDomainAbortJob should be enough.
Ok, that makes sense. So we'll just have the Begin + GetXMLDesc APIs
for now.
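
To make the "one async job per domain, no id" model concrete, here is a minimal
self-contained sketch. It is not libvirt code: domain_jobs, backup_begin and
job_finish are hypothetical names used only for illustration, and the real
implementation may well look different. It only shows why the id argument
becomes unnecessary once at most one backup job can exist, and how the result
of a finished job can stay queryable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain's single async job slot. */
typedef enum { JOB_NONE, JOB_BACKUP } job_type;

typedef struct {
    job_type active;          /* job currently occupying the slot */
    job_type last_completed;  /* type of the most recently finished job */
    bool last_aborted;        /* whether that job ended via abort */
} domain_jobs;

/* Start a backup; fails with -1 if the slot is already taken,
 * which is what "no parallel jobs" means in this model. */
static int backup_begin(domain_jobs *d)
{
    if (d->active != JOB_NONE)
        return -1;
    d->active = JOB_BACKUP;
    return 0;
}

/* Finish the running job, by completion or by abort (the role
 * virDomainAbortJob plays in the plan above). No id is needed because
 * there is at most one job. The outcome is kept so it can still be
 * queried afterwards, similar in spirit to
 * VIR_DOMAIN_JOB_STATS_COMPLETED. */
static int job_finish(domain_jobs *d, bool aborted)
{
    if (d->active == JOB_NONE)
        return -1;
    d->last_completed = d->active;
    d->last_aborted = aborted;
    d->active = JOB_NONE;
    return 0;
}
```

In this model a second backup_begin() fails until the first job has finished
or been aborted, so a caller never has to persist a job id.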
I spoke with oVirt devs who are very keen on getting this API finished
and their requirements currently don't require any parallel jobs. In
fact they were abusing Eric's design which only ever returned the same
job ID so that they would not have to persist it.
Hah, that's gross :-)
Given that we'll need to deal with the domain job anyways for the better
job infra outlined below I don't think adding these APIs will be too
much of a burden in the interim so that we can appease oVirt's desire
for this feature and we'll have more time to design the new job
interface properly.
Ok, that's fine with me. As you say, adding a 2nd variant of the API
later if we need many jobs is not the end of the world, since we'll
need to do that anyway for many other APIs we already have.
My preference is indeed to get something finally merged so that oVirt
can use it, since we've been debating the design on list here for waaaay
too long now.
> Earlier in the year I queried whether we should turn the "job" into a
> fully fledged object, using either a string or a UUID to identify it
> uniquely.
>
> https://www.redhat.com/archives/libvir-list/2019-March/msg01695.html
>
> IOW having something like this:
Going forward I want this not only for the backup job but basically for
any long running operation. We needed this for a long time but of the
two long running job impls we have both are not flexible enough.
There is only one 'domain job' (migration/save/etc) and blockjobs are
bound to disks.
Yeah, we kinda messed up in these two designs.
> typedef struct _virDomainJob virDomainJob;
> typedef virDomainJob *virDomainJobPtr;
I actually started some work on this but didn't get far yet. I used the
same name in my case, but I'm partially afraid that virDomainAbortJob
which would not be related to these objects will be mistaken for
actually working in this case.
Unfortunately I don't have any better idea.
We could just pick a completely different term. eg "virDomainOperation"
> void virDomainJobFree(virDomainJobPtr job);
>
> virDomainJobPtr virDomainJobLookupByUUID(virDomainPtr domain,
> unsigned char *uuid);
>
> int virDomainJobGetType(virDomainJobPtr job);
> int virDomainJobGetUUID(virDomainJobPtr job,
> unsigned char *uuid);
> int virDomainJobGetUUIDString(virDomainJobPtr job,
> char *uuidstr);
>
>
> virDomainJobPtr virDomainBackupBegin(virDomainPtr domain,
> const char *backupXML,
> const char *checkpointXML,
> unsigned int flags);
I was thinking about a super-universal API so that we don't have to redo
all APIs for blockjobs. Something along
virDomainJobPtr virDomainJobBegin(virDomainPtr domain,
const char *jobxml,
unsigned int flags);
It would give us the flexibility to add new jobs and arguments for them
via XML which is more flexible (e.g. we could easily add a
virDomainBlockPull with the 'top' argument which is currently missing
but qemu started to support it some time ago). On the other hand that
seems too prone to abuse.
virTypedParameters is always an option if we wish to make the
specific job creation API more flexible without going all the
way to XML
Given the above I'm unsure about the tradeoffs between flexibility and
too much flexibility in this case. But that's probably for a future
discussion.
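
As a rough illustration of the typed-parameter middle ground mentioned above,
here is a self-contained sketch of a name/type/value list with a type-checked
lookup. This is not libvirt's virTypedParameter: job_param and job_param_find
are hypothetical, and the parameter names used with them are made up. The
point is only that a new optional argument (such as the 'top' of a block pull)
can be added without a new API signature, while the type check keeps callers
from passing arbitrary blobs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical typed parameter, modelled loosely on the idea of a
 * name + type + value triple. */
typedef enum { PARAM_INT, PARAM_STRING } param_type;

typedef struct {
    const char *name;
    param_type type;
    union {
        int i;
        const char *s;
    } value;
} job_param;

/* Look up a parameter by name. Returns NULL if the name is absent or if
 * it is present with a type other than the one the caller expects. */
static const job_param *
job_param_find(const job_param *params, size_t nparams,
               const char *name, param_type type)
{
    for (size_t i = 0; i < nparams; i++) {
        if (strcmp(params[i].name, name) == 0)
            return params[i].type == type ? &params[i] : NULL;
    }
    return NULL;
}
```

An absent optional parameter simply yields NULL, so older callers keep working
when a job type grows a new argument.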
Yeah, it's a tricky balance between making it super flexible, and
making it simpler from an API caller's POV.
Basically we have two choices
- consider the "job" to be just a way to monitor and manage
completion of jobs that something else spawned (what we
do now)
- consider the "job" to be the way to control the entire
lifecycle of the job, including spawning it.
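
The two choices can be sketched side by side in a small self-contained model.
Everything here is hypothetical (job, domain, backup_begin, job_begin,
job_lookup_by_uuid, job_abort are illustration-only names, not proposed
libvirt symbols). In the first shape an operation-specific call spawns the
work and the job object is only looked up afterwards to monitor or manage it;
in the second shape the job API itself spawns the work and hands back the
handle.

```c
#include <stddef.h>
#include <string.h>

#define JOB_UUID_LEN 16

typedef enum { JOB_RUNNING, JOB_ABORTED } job_state;

/* Hypothetical job object identified by a UUID. */
typedef struct {
    unsigned char uuid[JOB_UUID_LEN];
    job_state state;
} job;

/* Hypothetical domain holding a small fixed table of jobs. */
typedef struct {
    job jobs[4];
    size_t njobs;
} domain;

/* Choice 2: the job API controls the whole lifecycle, including
 * spawning; the caller gets the handle back directly. */
static job *job_begin(domain *d, const unsigned char *uuid)
{
    if (d->njobs == sizeof(d->jobs) / sizeof(d->jobs[0]))
        return NULL;
    job *j = &d->jobs[d->njobs++];
    memcpy(j->uuid, uuid, JOB_UUID_LEN);
    j->state = JOB_RUNNING;
    return j;
}

/* Choice 1: an operation-specific API spawns the work and returns only
 * success or failure; the job object is found later by lookup. */
static int backup_begin(domain *d, const unsigned char *uuid)
{
    return job_begin(d, uuid) ? 0 : -1;
}

/* Shared by both shapes: find a job by UUID to monitor or manage it. */
static job *job_lookup_by_uuid(domain *d, const unsigned char *uuid)
{
    for (size_t i = 0; i < d->njobs; i++) {
        if (memcmp(d->jobs[i].uuid, uuid, JOB_UUID_LEN) == 0)
            return &d->jobs[i];
    }
    return NULL;
}

/* Management acts on the handle, so parallel jobs are not ambiguous. */
static void job_abort(job *j)
{
    j->state = JOB_ABORTED;
}
```

Either way, aborting one job by handle leaves the others untouched, which is
the property the current single 'domain job' cannot offer.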
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|