On 06/14/2016 06:50 AM, Kashyap Chamarthy wrote:
Summary
-------
It seems like libvirt is generating the same QEMU drive host
alias ('drive-virtio-disk%d') for different two disk labels (vdb, vdb1).
[Refer the contextual root cause analysis discussion at the end with
Laine & Peter.]
Let's take a quick example to demonstrate the issue.
On a guest that is shut down, attach a couplee of disks with labels
'vdb', and 'vdb1'
$ sudo virsh attach-disk cvm1 \
/export/vmimages/1.raw vdb --config
$ sudo virsh attach-disk cvm1 \
/export/vmimages/1.raw vdb1 --config
$ virsh domblklist cvm1
Target Source
------------------------------------------------
vda /var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img
vdb /export/vmimages/1.raw
vdb1 /export/vmimages/1.raw
Which results in guest XML:
[...]
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/export/vmimages/1.raw'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x07' function='0x0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/export/vmimages/1.raw'/>
<target dev='vdb1' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x08' function='0x0'/>
</disk>
[...]
Sarting the guest (error message manually wrapped) fails, as
$ virsh start cvm1
error: Failed to start domain cvm1
error: internal error: process exited while connecting to monitor:
qemu-system-x86_64: -drive
file=/export/vmimages/1.raw,format=raw,if=none,id=drive-virtio-disk1: Duplicate
ID 'drive-virtio-disk1' for drive
It results in QEMU CLI:
-drive file=/export/vmimages/1.raw,format=raw,if=none,id=drive-virtio-disk1 \
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
-drive file=/export/vmimages/1.raw,format=raw,if=none,id=drive-virtio-disk1 \
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 \
Root cause analysis discussion from IRC from Friday
---------------------------------------------------
[laine]
The problem is that the "alias" libvirt generates based on the target
device name should be unique, but it isn't.
I can see why it's doing that. When putting the index of the disk it's
processing within the entries into drive-virtio-disk%d, it learns the
index by calling virDiskNameToIndex().
It's going through a list of disks, 0, 1, 2, 3, etc creating the
alias. But at each index, it calls this other function to learn the
index, and that function just searches for the first entry that matches
the name.
If they *really* don't know the index (because they only have a
pointer to the entry) they should just search for the entry with the
matching *address in memory*
Haven't stepped through it carefully, but it looks like "vdb" and
"vdb1" lead to the same id, and also "vdb" and "sdb"
[pkrempa]
The alias generator calls virDiskNameToIndex() which calls
virDiskNameParse(). virDiskNameParse() parses the name including
the partition number and returns it. virDiskNameToIndex() discards
the partition number and returns the disk index.
If we accept both 'sdb' and 'sdb1' but generate the same alias then
that's a bug.
oh - this brings back some shorter term bad memories.
This all gets even more complicated with <hostdev>'s. Try to follow both
virDomainDiskDefAssignAddress and virDomainHostdevAssignAddress
especially when the <address> is not supplied.
I do agree with Dan - there should have been some failure on the second
command since the source file is duplicated. It seems the only check is
for duplicated target dev (see virDomainDiskDefCheckDuplicateInfo). A
quick scan finds no disk->src->path comparison for FILE type - not sure
a vanilla comparison would work for all "src->type"'s...
Beyond that - if one just "reads" the comments for virDiskNameToIndex
(and virDiskNameParse) they may assume that the code is attempting to
"serially assign an address" based on the target name. Thus "vdb"
would
equate to the "2nd" disk while "vdb1" would equate to the
"27th" disk.
However, that would only be "true" if the "partition" parameter was
supplied (e.g. a disk partition). If it's not (which is the more general
case), then the partition number is "dropped":
/* Count the trailing digits. */
size_t n_digits = strspn(ptr, "0123456789");
if (ptr[n_digits] != '\0')
return -1;
*disk = idx;
If the "digit" mattered, then the "n_digits" would have been used to
alter "idx" by "idx += *ptr - '0'; similar to how the 'a'
math was done.
The order of the code perhaps tricks the eyes into believing that the
idx was adjusted.
In any case, for a non "disk partition" case, the second target perhaps
should have been "sdbb" and not "sdb1" (look at tests/utiltest.c).
Furthermore, maybe the "solution" for this is the XML catching that the
supplied target has a numerically ending value, but does that work for
all cases? If one looks at :
tests/qemuxml2argvdata/qemuxml2argv-pci-many.xml
one will see usage of "vdaa", "vdab", etc. and not "vda1",
"vda2", etc.
I did not dig for other tests for many disks per controller.
Still even with all that "resolved", there is another issue. There's no
code currently that makes the "post parse" check if there was some disk
or hostdev somehow assigned the same address. I believe this is
something determined during review of the series:
http://www.redhat.com/archives/libvir-list/2015-July/msg00870.html
which went into August, but perhaps most relevant is patch 12:
http://www.redhat.com/archives/libvir-list/2015-August/msg00058.html
Based on that I started writing a patch that does a very nasty post
parse address duplication check for both "disks" and "hostdevs";
however, I believe I was "waiting for" something that allowed us to call
it before domain start. IOW: Before what now is Peter's domain
configuration validation series (essentially through commit id
'5972f185e').
The problem being that prior Peter's series, the only way we could stop
something with a duplicate address was at startup time. Now with Peter's
series, it would be possible to check for duplicates prior to start and
actually list them before actually attempting the start.
John
Hope this all makes sense - lots of distractions at home this morning...