[libvirt] Job Control API [RFC]

Wednesday, 21 May 2014

My name is Tucker DiNapoli and I am working on implementing job control
for the storage driver for the Google Summer of Code. The first step in
doing this is creating and implementing a unified API for job control.

Currently there are several places where various aspects of job control are
implemented. The qemu and libxl drivers both contain internal
implementations of job control for domain-level jobs, with the qemu driver
also containing support for asynchronous jobs. There is also code in the
libvirt.c file for running block jobs and for querying domain jobs for
information.

I would like the job control API to be as independent of the individual
drivers as possible, since it will need to be used with storage drivers as
well as the different virtualization drivers.

I imagine most of the API will revolve around a job object, and I think it's
important to decide what exactly should go in this job object.

The following is a response to my first post on the mailing list, and I
think it is a good idea.

...
>I'd _really_ like to see a common notion of a 'job id' that EVERY job
>(whether domain-level, like migration; or block-level, like
>commit/pull/rebase; or storage-level, like your new proposed storage
>jobs) shares a common job namespace.  The job id is a positive integer.
> Existing APIs will have to be retrofitted into the new job id notion;
>any action that starts a long-running job that currently returns 0 on
>success could be changed to return a positive job id; or we may need a
>new API that queries the notion of the 'current job' (the job most
>recently started) or even to set the 'current job' to a different job
>id.  We'll need new API for querying a job by id, and to be most
>portable, we should do job reporting via virTypedParameter
>(virDomainGetJobInfo and virDomainGetBlockJobInfo are hardcoded into
>returning a struct, so they are non-extensible; virDomainGetJobStats
>almost did it right, except that the user has to call it twice, once to
>learn how large to allocate, and again to pass in pre-allocated memory -
>the ideal API would allocate the memory on a single call).

Currently there are separate types for block job info and job info. If
possible, I would like to merge these into a common job info type, and
perhaps make it a part of the job object itself.
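
To make this a bit more concrete, here is a rough sketch of a shared job id
namespace together with a single info record that domain, block and storage
jobs could all use. Every name in it (jobClass, jobInfo, jobIdAllocate,
jobInfoInit) is hypothetical and not existing libvirt code, the fields are
just a guess at a useful common subset, and a real version would need a lock
around the counter.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical job classes that all draw ids from one namespace. */
typedef enum {
    JOB_CLASS_DOMAIN,
    JOB_CLASS_BLOCK,
    JOB_CLASS_STORAGE
} jobClass;

/* Hypothetical common info record replacing the separate
 * "job info" and "block job info" types. */
typedef struct {
    int id;                           /* positive job id */
    jobClass cls;                     /* which kind of job this is */
    unsigned long long dataTotal;     /* total work, in job-specific units */
    unsigned long long dataProcessed; /* work completed so far */
} jobInfo;

/* Next id to hand out. Not thread safe in this sketch. */
static int nextJobId = 1;

/* Return a fresh positive job id, or -1 if the namespace is exhausted. */
static int jobIdAllocate(void)
{
    if (nextJobId == INT_MAX)
        return -1;
    return nextJobId++;
}

/* Fill in a new info record and give it an id from the shared namespace.
 * Returns the id, or -1 on failure. */
static int jobInfoInit(jobInfo *info, jobClass cls, unsigned long long total)
{
    int id;

    assert(info != NULL);
    id = jobIdAllocate();
    if (id < 0)
        return -1;

    info->id = id;
    info->cls = cls;
    info->dataTotal = total;
    info->dataProcessed = 0;
    return id;
}
```

Because the id comes from one counter regardless of the job class, a block
job and a storage job can never end up with the same id.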

Currently (in libxl and qemu) jobs are a part of the domain struct. I think
that jobs should be moved out of the domain struct, with domains instead
using job ids to keep track of their currently running jobs. I'm still new
to libvirt, so if this doesn't make sense and keeping job objects attached
to domains is the better approach, that's fine.

I think at a minimum each job object should contain: the id of the thread
running the job, the type of job, the job id, a condition variable to
coordinate jobs, and information about the job, either as a separate job
info object or as part of the job object itself. The job should also contain
a reference to the domain or storage it is associated with.
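
As a minimal sketch of such a job object, assuming POSIX threads and with
the info embedded directly in the job: the type and function names (jobObj,
jobObjNew, jobObjFree and the enums) are made up for illustration, and the
untyped "target" pointer stands in for whatever domain or storage object
type would really be used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job types; a real list would come from the drivers. */
typedef enum { JOB_TYPE_QUERY, JOB_TYPE_MODIFY, JOB_TYPE_ABORT } jobType;

/* Hypothetical tag saying what the job is attached to. */
typedef enum { JOB_TARGET_DOMAIN, JOB_TARGET_STORAGE } jobTargetKind;

typedef struct {
    unsigned long long dataTotal;
    unsigned long long dataProcessed;
} jobInfo;

typedef struct {
    int id;                   /* job id, always positive */
    jobType type;             /* type of job */
    pthread_t thread;         /* thread running the job */
    pthread_cond_t cond;      /* used to coordinate with other jobs */
    jobInfo info;             /* progress information, embedded here */
    jobTargetKind targetKind; /* domain or storage */
    void *target;             /* borrowed reference to that object */
} jobObj;

/* Allocate and initialize a job owned by the calling thread.
 * Returns NULL on allocation or initialization failure. */
static jobObj *jobObjNew(int id, jobType type,
                         jobTargetKind targetKind, void *target)
{
    jobObj *job;

    assert(id > 0);
    job = calloc(1, sizeof(*job));
    if (!job)
        return NULL;

    if (pthread_cond_init(&job->cond, NULL) != 0) {
        free(job);
        return NULL;
    }

    job->id = id;
    job->type = type;
    job->thread = pthread_self();
    job->targetKind = targetKind;
    job->target = target;
    return job;
}

/* Release a job; the target object is not owned and is left alone. */
static void jobObjFree(jobObj *job)
{
    if (!job)
        return;
    pthread_cond_destroy(&job->cond);
    free(job);
}
```

Whether the info should be embedded like this or kept as a separate object
is exactly the open question above; embedding just keeps the sketch short.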

There are a few basic functions that should definitely be part of the API:
initialize a job, free a job, start a job, end a job, abort a job and get
info on a job. It would be nice to be able to suspend a job and to change
the currently running job as well. That's what I can come up with, but I
don't have much experience in libvirt, so if there are other features that
make sense they can be added as well.
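
One way to picture these functions is as transitions on a small state
machine. The sketch below is only that: the states, the function names and
the return convention (0 on success, -1 if the job is in the wrong state)
are all assumptions, jobResume is something I added to pair with suspend,
and nothing here does any real work or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifecycle states for a job. */
typedef enum {
    JOB_STATE_NEW,
    JOB_STATE_RUNNING,
    JOB_STATE_SUSPENDED,
    JOB_STATE_ENDED,
    JOB_STATE_ABORTED
} jobState;

typedef struct {
    int id;
    jobState state;
} jobObj;

/* Initialize a job in the NEW state. */
static void jobInit(jobObj *job, int id)
{
    assert(job != NULL && id > 0);
    job->id = id;
    job->state = JOB_STATE_NEW;
}

/* Move from one expected state to another; -1 if the job is elsewhere. */
static int jobTransition(jobObj *job, jobState from, jobState to)
{
    if (job->state != from)
        return -1;
    job->state = to;
    return 0;
}

static int jobStart(jobObj *job)
{
    return jobTransition(job, JOB_STATE_NEW, JOB_STATE_RUNNING);
}

static int jobSuspend(jobObj *job)
{
    return jobTransition(job, JOB_STATE_RUNNING, JOB_STATE_SUSPENDED);
}

static int jobResume(jobObj *job)
{
    return jobTransition(job, JOB_STATE_SUSPENDED, JOB_STATE_RUNNING);
}

static int jobEnd(jobObj *job)
{
    return jobTransition(job, JOB_STATE_RUNNING, JOB_STATE_ENDED);
}

/* Abort is allowed from either RUNNING or SUSPENDED. */
static int jobAbort(jobObj *job)
{
    if (job->state != JOB_STATE_RUNNING &&
        job->state != JOB_STATE_SUSPENDED)
        return -1;
    job->state = JOB_STATE_ABORTED;
    return 0;
}

/* Stand-in for "get info on a job": here just the current state. */
static jobState jobGetState(const jobObj *job)
{
    return job->state;
}
```

With this shape, a caller that tries to end a job which was never started,
or abort one that already finished, gets an error back instead of silently
corrupting the job.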

Finally (as far as I can think of right now) there is the idea of parallel
jobs. Currently the qemu driver allows some jobs to be run in parallel by
allowing a job to be run asynchronously; this async job has a mask of job
types associated with it that determines what types of regular jobs can be
run during it. However, I would like to allow an arbitrary number of jobs to
be run at once (I'm not sure how useful this would be, but it seems best not
to impose hard limits on things). The easiest way to deal with this is to
just ignore it and put the burden of synchronizing jobs on the drivers. This
is obviously a bad solution. Another way would be the way it is currently
done in the qemu driver: have a mask of job types associated with each
domain/storage, updated whenever a job is started or ended, which dictates
what types of jobs can be started. Regardless of how this is done, it will
require support from the driver/domain/storage that each job is associated
with.
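
To illustrate the mask approach, here is a small sketch in which each
domain/storage keeps a count of running jobs per type and an "allowed" mask
that is recomputed whenever a job begins or ends. This is not the qemu
driver's code: the job types, the compatibility table and all of the names
(jobOwner, jobOwnerBegin, jobOwnerEnd, jobCompat) are invented for the
example, and locking is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical job types. */
typedef enum { JOB_QUERY = 0, JOB_MODIFY, JOB_ABORT, JOB_TYPE_LAST } jobType;

#define JOB_MASK(type) (1u << (type))
#define JOB_MASK_ALL   ((1u << JOB_TYPE_LAST) - 1)

/* For each job type, the types that may START while it is running.
 * The particular choices here are made up for illustration. */
static const unsigned int jobCompat[JOB_TYPE_LAST] = {
    [JOB_QUERY]  = JOB_MASK(JOB_QUERY) | JOB_MASK(JOB_MODIFY) |
                   JOB_MASK(JOB_ABORT),
    [JOB_MODIFY] = JOB_MASK(JOB_QUERY) | JOB_MASK(JOB_ABORT),
    [JOB_ABORT]  = JOB_MASK(JOB_QUERY),
};

/* Per-domain or per-storage job bookkeeping. */
typedef struct {
    unsigned int running[JOB_TYPE_LAST]; /* running jobs of each type */
    unsigned int allowed;                /* types that may start now */
} jobOwner;

/* allowed = intersection of the masks of every running job type. */
static void jobOwnerRecompute(jobOwner *owner)
{
    unsigned int mask = JOB_MASK_ALL;

    for (int t = 0; t < JOB_TYPE_LAST; t++) {
        if (owner->running[t] > 0)
            mask &= jobCompat[t];
    }
    owner->allowed = mask;
}

static void jobOwnerInit(jobOwner *owner)
{
    memset(owner, 0, sizeof(*owner));
    jobOwnerRecompute(owner);
}

/* Try to start a job of the given type; -1 if the mask forbids it. */
static int jobOwnerBegin(jobOwner *owner, jobType type)
{
    if (!(owner->allowed & JOB_MASK(type)))
        return -1;
    owner->running[type]++;
    jobOwnerRecompute(owner);
    return 0;
}

/* Record that a job of the given type has finished. */
static void jobOwnerEnd(jobOwner *owner, jobType type)
{
    assert(owner->running[type] > 0);
    owner->running[type]--;
    jobOwnerRecompute(owner);
}
```

With this table any number of query jobs can run next to a single modify
job, but a second modify job is refused until the first one ends. A real
implementation would wait on the job's condition variable rather than fail
immediately.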

Tucker DiNapoli
