On 03/12/2014 02:42 PM, Peter Krempa wrote:
> A backing chain of 3 files (base <- mid <- top) in the
> local file system:
>
> <disk type='file' device='disk'>
>   <driver name='qemu' type='qcow2'/>
>   <source file='/var/lib/libvirt/images/top.qcow2'/>
>   <backingStore type='file'>
>     <driver name='qemu' type='qcow2'/>
> ... we should add an attribute with the index of the backing chain
> element in the backing chain.
Hmm. Another feature coming down the pipes in qemu 2.0 is the ability
to give an alias to any portion of the backing chain. Right now, we
have an <alias> element tied to the <disk> as a whole (in qemu parlance,
the device id), but some qemu operations will be easier if we also have
a name tied to each file in the chain (in qemu parlance, a node id for
the bd [block driver structure]). Maybe we can kill two birds with one
stone by having each <backingStore> track an <alias> sub-element with
the name of the node when communicating with qemu 2.0 and newer.
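For Peter's chain above, a rough sketch combining both additions (the
index attribute and a per-node <alias>; the exact attribute names are
just for discussion, and the node-name values are invented):

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/top.qcow2'/>
  <backingStore type='file' index='1'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/images/mid.qcow2'/>
    <alias name='ide0-0-0[1]'/>
    <backingStore type='file' index='2'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/base.qcow2'/>
      <alias name='ide0-0-0[2]'/>
      <backingStore/>
    </backingStore>
  </backingStore>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
</disk>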
For a specific instance, consider how a quorum interacts with a
snapshot-create action - there are two approaches: create a single
qcow2 whose backing file is the quorum (that is, request the snapshot
on the node tied to the quorum):
Q[a, b, c] <- snap
or create a new quorum of three qcow2 files, with each qcow2 file
wrapping a member of the old quorum (actually, a 'transaction' command
that creates three files in one go):
Q[a <- snapA, b <- snapB, c <- snapC]
or even anything in between (request a snapshot of the node tied to a,
while leaving b and c alone - say, because node a sits on the storage
most amenable to copying off the snapshot for backup purposes while
nodes b and c are remote). The way qemu exposes this is by letting you
specify, when creating the new node for the snapshot, whether its
backing file is the node id of the overall quorum or the node id of one
of the pieces of the quorum. So while the overall <disk> alias remains
constant, the quorum node is distinct from any of its three backing
files. This is further evidence that the quorum itself does not use any
file resources, but instead relies on multiple backingStores; and
taking the snapshot (or snapshots) needs control over every node that
could serve as the starting point gaining a new qcow2 node as part of
the snapshot creation.
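To make the per-member variant concrete, a rough QMP sketch (going from
my reading of the qemu 2.0 schema, where blockdev-snapshot-sync gained
'node-name' and 'snapshot-node-name' parameters; all node names here
are invented):

{ "execute": "transaction",
  "arguments": { "actions": [
    { "type": "blockdev-snapshot-sync", "data":
      { "node-name": "node-a", "snapshot-file": "/path/to/snapA",
        "snapshot-node-name": "node-snapA", "format": "qcow2" } },
    { "type": "blockdev-snapshot-sync", "data":
      { "node-name": "node-b", "snapshot-file": "/path/to/snapB",
        "snapshot-node-name": "node-snapB", "format": "qcow2" } },
    { "type": "blockdev-snapshot-sync", "data":
      { "node-name": "node-c", "snapshot-file": "/path/to/snapC",
        "snapshot-node-name": "node-snapC", "format": "qcow2" } } ] } }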
Right now, <alias> is a run-time, output-only parameter, but we someday
want to support offline block-pull and friends, where we'd need the
index to exist even when <alias> does not. Likewise, while each
<backingStore> corresponds to a qemu node, and can thus have one name,
the top-level <disk> has the chance for BOTH a device alias (which
moves around whenever the active image changes due to snapshot, block
copy, or block commit operations) and a node index (which is tied to
the file name, even if that file is no longer the active image in the
chain).
Thanks for making me think about that!
Code-wise, I'm looking at splitting 'struct _virDomainDiskDef' into two
parts. The outermost part is _virDomainDiskDef, which tracks anything
tied to the guest view, or to the device as a whole (<target>, <alias>,
<address>); the inner part is a new _virDomainDiskSrcDef, which tracks
anything related to a host view (node name, <source>, <driver>,
<backingStore>), where each backingStore is also a _virDomainDiskSrcDef,
as a recursive structure - we just special-case the output so that the
first _virDomainDiskSrcDef feeds the XML of the <disk> element, while
all other _virDomainDiskSrcDefs feed the XML of a <backingStore>. For
tracking node ids, I would then add to the outer structure a counter of
nodes created so far (more important for an online domain, as we want
to track node names that mesh with qemu node names, and must not reuse
names no matter how many snapshots or block-commits happen in between),
where each inner structure grabs the next increment of the counter.
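A minimal C sketch of that split (hypothetical member names; real code
would also need allocation and cleanup helpers that I'm omitting):

typedef struct _virDomainDiskSrcDef virDomainDiskSrcDef;
struct _virDomainDiskSrcDef {
    int type;                          /* file, block, network, ... */
    char *path;                        /* <source> */
    int format;                        /* <driver type=.../> */
    unsigned int nodeid;               /* handed out from the per-disk counter */
    virDomainDiskSrcDef *backingStore; /* recursion: rest of the chain
                                        * (a quorum needs a list of these) */
};

typedef struct _virDomainDiskDef virDomainDiskDef;
struct _virDomainDiskDef {
    char *dst;                         /* <target dev=.../>: guest view */
    char *alias;                       /* <alias>: the device as a whole */
    unsigned int nodecounter;          /* nodes created so far; never
                                        * decremented while qemu runs, so
                                        * ids aren't reused */
    virDomainDiskSrcDef *src;          /* host view: head of the chain,
                                        * feeds the <disk> XML */
};

So, revisiting various operations: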
On snapshot, we are going from:
domainDiskDef (counter 1, alias "ide0-0-0")
+ domainDiskSrcDef (node "ide0-0-0[0]", source "base")
to:
domainDiskDef (counter 2, alias "ide0-0-0")
+ domainDiskSrcDef (node "ide0-0-0[1]", source "snap")
+ domainDiskSrcDef (node "ide0-0-0[0]", source "base")
Note that the node names grow in order of creation, which is NOT the
same as a top-down breadth-first numbering. <alias> and nodeid would be
output only (ignored on input); as long as qemu is running we cannot
reuse old nodeids, but when qemu is offline, we could renumber things to
start back from 0 - maybe only when passed a specific flag (similar to
the update-cpu flag forcing us to update portions of the XML that we
otherwise leave unchanged).
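For example (a hypothetical continuation of the numbering above), a
block commit of "snap" back into "base", followed by a second snapshot,
would give:

domainDiskDef (counter 2, alias "ide0-0-0")
+ domainDiskSrcDef (node "ide0-0-0[0]", source "base")

and then:

domainDiskDef (counter 3, alias "ide0-0-0")
+ domainDiskSrcDef (node "ide0-0-0[2]", source "snap2")
+ domainDiskSrcDef (node "ide0-0-0[0]", source "base")

with id 1 never handed out again while this qemu process is running.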
Do we need both a node id and a <backingStore> index? We already allow
disk operations by <alias> name, so referring to the node id may be
sufficient. On the other hand, having index as an attribute might make
it easier to write XPath queries that resolve to a numbered node
regardless of depth (I'm a bit weak on XPath, but there's bound to be a
way to lookup a <disk> element whose target is named "vda" and that has
a "backingStore[index=4]" sub-element).
So, for a theoretical quorum with a 2/3 majority, where one of the
members is a backing chain, as in Q[a, b <- c, d], and where qemu is
running, it might look like:
<disk type='quorum' device='disk'>
  <driver name='qemu' type='quorum' threshold='2' node='[4]'/>
  <backingStore type='file' index='1'>
    <driver name='qemu' type='raw' node='[0]'/>
    <source file='/path/to/a'/>
    <backingStore/>
  </backingStore>
  <backingStore type='file' index='2'>
    <driver name='qemu' type='qcow2' node='[2]'/>
    <source file='/path/to/c'/>
    <backingStore type='file' index='3'>
      <driver name='qemu' type='raw' node='[1]'/>
      <source file='/path/to/b'/>
      <backingStore/>
    </backingStore>
  </backingStore>
  <backingStore type='file' index='4'>
    <driver name='qemu' type='raw' node='[3]'/>
    <source file='/path/to/d'/>
    <backingStore/>
  </backingStore>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
then the node names that qemu uses will be the concatenation of the
<disk> alias and each DiskSrcDef node ("ide0-0-0[4]" is the quorum,
"ide0-0-0[0]" is the node for file a, ...), and you can also refer to
backing stores by index ("hda" or "hda[0]" is the quorum, "hda[1]" is
file a from the quorum, "hda[2]" is the active part of the chain from
the second member of the quorum, ...).
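And if the index shorthand pans out, user-facing commands could
eventually accept it directly - hypothetical syntax:

virsh blockcommit $dom hda --top 'hda[2]' --base 'hda[3]'

to commit the active part of the second quorum member (file c) back
into its backing file (file b).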
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org