[libvirt] libvirt RBD attach regression?

Hello,

I have been testing libvirt v1.0.0 for deployment within my organization, and in the process discovered what appears to be a bug that breaks virsh attach-device, when attaching an RBD volume to an instance. First, here is the error presented, with v1.0.0 (this worked in v0.10.2):

[root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
error: Failed to attach device from G84VV1.xml
error: cannot open file 'dc3-1-test/G84VV1': No such file or directory

[root@host ~]#

W5APQ8 is my running QEMU/KVM instance, and G84VV1.xml contains the following:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <auth username="removed">
    <secret type="ceph" uuid="removed"/>
  </auth>
  <source protocol='rbd' name='dc3-1-test/G84VV1'>
    <host name="X.X.X.X" port="6789"/>
    <host name="X.X.X.X" port="6789"/>
    <host name="X.X.X.X" port="6789"/>
  </source>
  <target dev='vdc' bus='virtio'/>
  <serial>hpbs-G84VV1</serial>
</disk>

Using git bisect, I narrowed the problem down to this as the first commit to break this setup:

4d34c92947e8cf9e9bedfa227ada1d2dba92d68a is the first bad commit
commit 4d34c92947e8cf9e9bedfa227ada1d2dba92d68a
Author: Eric Blake <eblake@redhat.com>
Date:   Tue Oct 9 16:08:14 2012 -0600

These all look closely related as well to the problem commit:

http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=67aea3fb780346b4aa5aea4...
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=38c4a9cc40476bd2e598a18...
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=4d34c92947e8cf9e9bedfa2...

If I build libvirt from sources before these commits and then run the exact virsh-attach command shown above, the attachment works and I do not get any errors.

I wanted to see if anyone had any insights on this, or perhaps a commit to correct this issue. If I can be of any further assistance with this, let me know.

Thanks,
Scott Sullivan

On 11/21/2012 09:45 AM, Scott Sullivan wrote:
> Hello,
> I have been testing libvirt v1.0.0 for deployment within my organization, and in the process discovered what appears to be a bug that breaks virsh attach-device, when attaching an RBD volume to an instance. First, here is the error presented, with v1.0.0 (this worked in v0.10.2):
> [root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
> error: Failed to attach device from G84VV1.xml
> error: cannot open file 'dc3-1-test/G84VV1': No such file or directory
> [root@host ~]#
Thanks for the report. Hmm, something in the new probing code is failing to recognize that rbd devices are not local files, and therefore should not be probed via stat() and open() calls.
> Using git bisect, I narrowed the problem down to this as the first commit to break this setup:
> 4d34c92947e8cf9e9bedfa227ada1d2dba92d68a is the first bad commit
> commit 4d34c92947e8cf9e9bedfa227ada1d2dba92d68a
> Author: Eric Blake <eblake@redhat.com>
> Date:   Tue Oct 9 16:08:14 2012 -0600
That may be the commit that exposed the problem, but I'm sure the actual regression was introduced as a latent bug in one of my earlier conversions in src/util/storage_file.c:
> These all look closely related as well to the problem commit:
> http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=67aea3fb780346b4aa5aea4...
> http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=38c4a9cc40476bd2e598a18...
> http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=4d34c92947e8cf9e9bedfa2...
which you have indeed identified as possible culprits.
> If I build libvirt from sources before these commits and then run the exact virsh-attach command shown above, the attachment works and I do not get any errors.
> I wanted to see if anyone had any insights on this, or perhaps a commit to correct this issue. If I can be of any further assistance with this, let me know.
If you feel comfortable writing up a C patch, great. If not, I probably won't be able to get to this until next week, or someone else may beat me to it. I also know that Peter is trying to plug another regression in the same code:
https://www.redhat.com/archives/libvir-list/2012-November/msg00894.html

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Hello,

We were able to come up with this patch, which does fix the problem in our use case:

http://pastebin.com/izx40mRd

Interested to hear any feedback on the patch.

On 11/21/2012 12:13 PM, Eric Blake wrote:
> [...]

On 11/21/2012 11:06 AM, Scott Sullivan wrote:
> Hello,
> We were able to come up with this patch, which does fix the problem in our use case:
Pastebins are transient, so I'm reposting it here:
index d01e366..3b37ece 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2013,6 +2013,10 @@ qemuDomainDetermineDiskChain(struct qemud_driver *driver,
 {
     bool probe = driver->allowDiskFormatProbing;
 
+    if (disk->type == VIR_DOMAIN_DISK_TYPE_NETWORK) {
+        return 0;
+    }
+
     if (!disk->src)
         return 0;
> Interested to hear any feedback on the patch.
That actually looks correct :) I'll go ahead and commit it in your name, and thanks again not only for the report, but also for the fix.
participants (2)
- Eric Blake
- Scott Sullivan