On 10/12/18 9:02 AM, Peter Krempa wrote:
On Fri, Oct 12, 2018 at 00:10:09 -0500, Eric Blake wrote:
> Introduce a few more new public APIs related to incremental backups.
> This builds on the previous notion of a checkpoint (without an
> existing checkpoint, the new API is a full backup, differing only
> from virDomainCopy in the point of time chosen); and also allows
> creation of a new checkpoint at the same time as starting the backup
> (after all, an incremental backup is only useful if it covers the
> state since the previous backup). It also enhances event reporting
> for signaling when a push model backup completes (where the
> hypervisor creates the backup); note that the pull model does not
> have an event (starting the backup lets a third party access the
> data, and only the third party knows when it is finished).
>
> The full list of new API:
> virDomainBackupBegin;
> virDomainBackupEnd;
> virDomainBackupGetXMLDesc;
>
Skipping wording improvements for now (which I will probably incorporate
without question), and focusing on:
> + * Start a point-in-time backup job for the specified disks of a
> + * running domain.
> + *
> + * A backup job is mutually exclusive with domain migration
> + * (particularly when the job sets up an NBD export, since it is not
> + * possible to tell any NBD clients about a server migrating between
> + * hosts). For now, backup jobs are also mutually exclusive with any
> + * other block job on the same device, although this restriction may
> + * be lifted in a future release. Progress of the backup job can be
> Hypervisors may not allow this job if other block devices are part of
> other block jobs.
> + * tracked via virDomainGetJobStats(). The job remains active until a
> How is this going to track multiple jobs?
Yeah, it's already become obvious to me that we really don't have any
GOOD design in place for multiple parallel jobs. At the moment, qemu
itself is hard-coded to at most one NBD server (and thus at most one
pull backup job), but I really don't want the backup APIs to be held
hostage to also fixing the fact that libvirt needs a generic API for
multiple parallel jobs.
At a high level, what I imagine is a parallel set of APIs that takes a
job id everywhere, as well as a way to set the "default" job id. All
existing APIs, such as virDomainGetJobStats(), would then have
documentation updated to state that they work on the "default" job, and
APIs that start a job but do not return a job id implicitly change the
"default" job.
> + * The @diskXml parameter is optional but usually provided, and
> + * contains details about the backup, including which backup mode to
> + * use, whether the backup is incremental from a previous checkpoint,
> + * which disks participate in the backup, the destination for a push
> + * model backup, and the temporary storage and NBD server details for
> + * a pull model backup. If omitted, the backup attempts to default to
> + * a push mode full backup of all disks, where libvirt generates a
> + * filename for each disk by appending a suffix of a timestamp in
> Do we really want to support this convenience option?
It's always possible to make things mandatory initially, then relax
things to optional later. I can make those sorts of tweaks once I have
the qemu code demonstration working (KVM Forum is next week - it will be
interesting to see how much I have in place).
> + * seconds since the Epoch. virDomainBackupGetXMLDesc() can be called
> + * to learn actual values selected. For more information, see
> + * formatcheckpoint.html#BackupAttributes.
> + *
> + * The @checkpointXml parameter is optional; if non-NULL, then libvirt
> + * behaves as if virDomainCheckpointCreateXML() were called with
> + * @checkpointXml and the flag VIR_DOMAIN_BACKUP_BEGIN_NO_METADATA
> + * forwarded appropriately, atomically covering the same guest state
> + * that will be part of the backup. The creation of a new checkpoint
> + * allows for future incremental backups. Note that some hypervisors
> + * may require a particular disk format, such as qcow2, in order to
> + * take advantage of checkpoints, while allowing arbitrary formats
> + * if checkpoints are not involved.
> + *
> + * Returns a non-negative job id on success, or negative on failure.
> + * This operation returns quickly, such that a user can choose to
> + * start a backup job between virDomainFSFreeze() and
> + * virDomainFSThaw() in order to create the backup while guest I/O is
> + * quiesced.
> + */
> +/* FIXME: Do we need a specific API for listing all current backup
> + * jobs (which, at the moment, is at most one job), or is it better to
> + * refactor other existing job APIs in libvirt-domain.c to have job-id
> + * counterparts along with a generic listing of all jobs (with flags
> + * for filtering to specific job types)?
> This concern itself should be a warning that we should not push this
> until a full implementation is provided, so that we know that the job
> handling APIs will be enough.
> You probably also need an API to list the job if a client
> loses the job id somehow, since all the other APIs below take it.
Yes, anything smarter than "the job id is always 1 because we don't
support parallel jobs yet" will need an API for getting the id again.
> Additionally, since I'm currently working on re-doing blockjobs to
> support -blockdev and friends, one of the steps involves changing from
> the 'block-job-***' QMP APIs to the newer 'job-**' APIs so that
> 'blockdev-create' can be supported. Is there any overlap with the backup
> APIs? Specifically, number-based names for 'job-**' are sub-optimal,
> but in this case we could map from them to a proper name.
Would it be better to return a char* job name, and/or let the user
request a job name of their choice in the XML? That's doable, but it
does have ripple effects on the entire design, so it's the sort of thing
that we DO want to figure out before promoting this into a release.
> At any rate, if the backup work requires using the 'job-***' APIs we'll
> need to find an intersection.
My initial implementation is using blockdev-backup, but yes, we want to
use the job-*** APIs where possible.
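For context, the qemu side of a pull-mode backup at this point roughly amounts to a blockdev-backup into a scratch node with sync 'none', plus an NBD export of that node (the "image fleecing" pattern). The QMP sequence below is a from-memory sketch, not a tested recipe; node names are invented, and exporting the dirty bitmap over NBD was still an experimental command in qemu 3.0:

```json
{"execute": "blockdev-backup",
 "arguments": {"device": "drive0", "target": "scratch0", "sync": "none"}}
{"execute": "nbd-server-start",
 "arguments": {"addr": {"type": "inet",
                        "data": {"host": "localhost", "port": "10809"}}}}
{"execute": "nbd-server-add",
 "arguments": {"device": "scratch0", "writable": false}}
```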
> +/**
> + * virDomainBackupGetXMLDesc:
> + * @domain: a domain object
> + * @id: the id of an active backup job previously started with
> + * virDomainBackupBegin()
> + * @flags: bitwise-OR of subset of virDomainXMLFlags
> + *
> + * In some cases, a user can start a backup job without supplying all
> + * details, and rely on libvirt to fill in the rest (for example,
> + * selecting the port used for an NBD export). This API can then be
> + * used to learn what default values were chosen.
> + *
> + * No security-sensitive data will be included unless @flags contains
> + * VIR_DOMAIN_XML_SECURE; this flag is rejected on read-only
> + * connections. For this API, @flags should not contain either
> + * VIR_DOMAIN_XML_INACTIVE or VIR_DOMAIN_XML_UPDATE_CPU.
> I fail to see why the full domain XML should be part of the backup job
> description.
It's not. It remains a useful part of the checkpoint XML, but I'm
dropping it from the backup XML - _if_ we keep <domaincheckpoint> and
<domainbackup> as separate XML arguments.
Then again, I still have the open question about how to extend
checkpoints onto external snapshots. There are two viable approaches to
consider:
1. expand existing <domainsnapshot> XML to take <domainbackup> as a
subelement, then we can atomically create checkpoints as part of the
existing virDomainSnapshotCreateXML - if we do this, then
virDomainBackupBegin() should take only one XML argument, where
<domainsnapshot> is a subelement of <domainbackup>
2. add a new API that parallels virDomainSnapshotCreateXML but takes an
additional XML argument for creating a checkpoint at the same time - if
we do this, then virDomainBackupBegin() remains in its current style of
two XML arguments.
Depending on which approach we like better, the flags here may make
sense after all.
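For concreteness, under the current two-argument style the pair of documents might look something like this. The element names follow the draft formatcheckpoint.html pages referenced earlier and could still change before anything is released:

```xml
<!-- @diskXml: pull-mode backup of one disk over NBD -->
<domainbackup mode="pull">
  <server transport="tcp" name="localhost" port="10809"/>
  <disks>
    <disk name="vda" type="file">
      <scratch file="/tmp/vda.scratch.qcow2"/>
    </disk>
  </disks>
</domainbackup>

<!-- @checkpointXml: checkpoint created atomically with the backup -->
<domaincheckpoint>
  <name>1539302409</name>
  <disks>
    <disk name="vda" checkpoint="bitmap"/>
  </disks>
</domaincheckpoint>
```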
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org