[libvirt] Stored secrets seem to get corrupted

Hi,

On one of my systems I'm having trouble with my RBD storage backend. At first I thought it was a problem with my code, but after trying the same code on a second machine I'm a bit confused.

The problem is that the storage backend tries to retrieve the value of a secret and base64-decode it, and that fails. My debug log shows:

debug : virStorageBackendRBDOpenRADOSConn:65 : Using cephx authorization
debug : virStorageBackendRBDOpenRADOSConn:75 : Looking up secret by UUID: 322bccea-f2ed-4eae-a7e5-d0793ffb162d
debug : virSecretLookupByUUIDString:14128 : conn=0x7fac9c0009c0, uuidstr=322bccea-f2ed-4eae-a7e5-d0793ffb162d
debug : virSecretLookupByUUID:14086 : conn=0x7fac9c0009c0, uuid=322bccea-f2ed-4eae-a7e5-d0793ffb162d
debug : virSecretGetValue:14486 : secret=0x7fac94000d30, value_size=0x7facad481918, flags=0
debug : virStorageBackendRBDOpenRADOSConn:103 : Found cephx key: `I^%
debug : virStorageBackendRBDOpenRADOSConn:135 : Found 1 RADOS cluster monitors in the pool configuration
debug : virStorageBackendRBDOpenRADOSConn:159 : RADOS mon_host has been set to: 31.25.100.131:6789,
error : virStorageBackendRBDOpenRADOSConn:171 : internal error failed to connect to the RADOS monitor on: 31.25.100.131:6789,

It goes wrong at "Found cephx key: <garbage>".

So I figured it had to be something in my code and went over the code again, but nothing seemed odd. I tried the same checkout (tag: 0.9.13-rc1) on a different host (also Ubuntu 12.04) and that worked.

The secret and pool XMLs are the same, but what I found is that the secret storage on disk seems to go wrong on one machine.
Notice this behavior:

root@stack01:~# virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d AQAE+uJPCFpELBAAkTniQvHabBGj0Quwnu2imA==
Secret value set
root@stack01:~# md5sum /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
b4b147bc522828731f1a016bfa72c073 /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
root@stack01:~# virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d AQAE+uJPCFpELBAAkTniQvHabBGj0Quwnu2imA==
Secret value set
root@stack01:~# md5sum /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
927e2458c32cc3f6754d91694e41333f /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
root@stack01:~#

As you can see, the md5sum of the file changes when I set the secret to the same value again. I tried the same on the other host:

root@stack02:~# virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d AQAE+uJPCFpELBAAkTniQvHabBGj0Quwnu2imA==
Secret value set
root@stack02:~# md5sum /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
c30db27f9ebfe3f7903470d4bd542d1d /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
root@stack02:~# virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d AQAE+uJPCFpELBAAkTniQvHabBGj0Quwnu2imA==
Secret value set
root@stack02:~# md5sum /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
c30db27f9ebfe3f7903470d4bd542d1d /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
root@stack02:~# virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d AQAE+uJPCFpELBAAkTniQvHabBGj0Quwnu2imA==
Secret value set
root@stack02:~# md5sum /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
c30db27f9ebfe3f7903470d4bd542d1d /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64
root@stack02:~#

The md5sum stays the same on stack02. I verified that stack01 isn't out of disk space or out of inodes; both are within acceptable ranges.

Any suggestions?
For the record, the hosts:

OS: Ubuntu 12.04 x86_64
Kernel: 3.2.0-25-generic
Libvirt commit: 0fce94fe1bd782ac4c33fdd59d13ee37b3437413

Thank you,

Wido

On Mon, Jun 25, 2012 at 04:37:48PM +0200, Wido den Hollander wrote:
> As you can see, the md5sum of the file changes when I set the value of the secret to the same.
That is really bizarre. Can you look at what is actually stored in the .base64 file each time? And what does 'secret-get-value' reply with?
> The md5sum stays the same on stack02.
This is the correct behaviour, and it's what I see myself too.
> I verified that stack01 isn't out of disk space or out of inodes, those are in the acceptable values range.
> Any suggestions?
I think you'll probably need to add some more VIR_DEBUG lines to secret_driver.c to see where in the process it is going wrong. Or perhaps strace libvirtd to see what it thinks it is writing out, and whether any errors appear.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2012 04:54 PM, Daniel P. Berrange wrote:
> That is really bizarre. Can you look at what is actually stored in the .base64 file each time ? And what 'secret-get-value' replies with ?
The content of the .base64 file is pure garbage; my terminal can't make anything of it. What I do notice is that the .base64 file is only 2 bytes big, while it should be 40 bytes.

"secret-get-value" returns the correct data, but I think that is because the value is still held in memory. That also tells me that writing to disk fails, while the in-memory copy is still fine. When I restart libvirt I see:

secretLoadValue:406 : internal error invalid base64 in '/etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64'

I checked my disk space and inodes again, but those are all fine. I can write other files on the same FS without any problem. I also made sure that AppArmor (Ubuntu) was turned off.
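For reference, the expected size follows from simple arithmetic: padded base64 turns every 3 input bytes into 4 output characters. The 40-character key set above ends in two '=' pad characters, so it decodes to 40 / 4 * 3 - 2 = 28 raw bytes, and re-encoding those 28 bytes should give 4 * ceil(28 / 3) = 40 characters again. A 2-byte file cannot be valid padded base64 at all. The sketch below is a hypothetical sanity check (the helper names are made up, this is not libvirt code) for the contents of a .base64 file; it only checks length, alphabet and padding, not that the data decodes to the intended secret.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of padded base64 text for rawlen input bytes: 4 * ceil(rawlen / 3). */
static size_t b64_encoded_len(size_t rawlen)
{
    return 4 * ((rawlen + 2) / 3);
}

/* Hypothetical sanity check for the contents of a .base64 file:
 * non-empty, length a multiple of 4, only characters from the standard
 * alphabet, and at most two '=' pad characters at the very end.
 * On success, *rawlen (if not NULL) receives the decoded size in bytes. */
static bool b64_looks_valid(const char *s, size_t len, size_t *rawlen)
{
    size_t pad = 0;

    assert(s != NULL);
    if (len == 0 || len % 4 != 0)
        return false;
    while (pad < 2 && s[len - 1 - pad] == '=')
        pad++;
    for (size_t i = 0; i < len - pad; i++) {
        unsigned char c = (unsigned char)s[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                  (c >= '0' && c <= '9') || c == '+' || c == '/';
        if (!ok)
            return false;   /* also rejects a stray '=' before the padding */
    }
    if (rawlen != NULL)
        *rawlen = len / 4 * 3 - pad;
    return true;
}
```

A stored file that fails this check, like the 2-byte one described above, would be consistent with the "invalid base64" error on restart.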
> I think you'll probably need to add some more VIR_DEBUG lines to secret_driver.c to see where in the process it is going wrong. Or perhaps strace libvirtd to see what it thinks it is writing out & whether any errors appear.
I'll try that.

Wido

On 25-06-12 16:54, Daniel P. Berrange wrote:
> That is really bizarre. Can you look at what is actually stored in the .base64 file each time ? And what 'secret-get-value' replies with ?
I haven't been able to look into this any further; however, I just downloaded 0.9.13 from the libvirt website and installed it on a totally different host, which is also running Ubuntu 12.04.

I wanted to start a virtual machine with RBD storage and that failed: the secret was corrupted. The symptoms on this machine are exactly the same; the secret file is just 2 bytes big.

root@amd:~# ls -al /etc/libvirt/secrets/*.base64
-rw------- 1 root root 2 Jul 3 15:02 /etc/libvirt/secrets/69f9540e-f0ce-4184-8254-9b22efade5f2.base64
root@amd:~#
> I think you'll probably need to add some more VIR_DEBUG lines to secret_driver.c to see where in the process it is going wrong. Or perhaps strace libvirtd to see what it thinks it is writing out & whether any errors appear.
I haven't added any VIR_DEBUG lines yet, but stracing the libvirtd process doesn't show any fopen() or fwrite() calls on any *.base64 files.

Wido

On Tue, Jul 03, 2012 at 03:11:59PM +0200, Wido den Hollander wrote:
> I haven't added any VIR_DEBUG lines yet, but stracing the libvirtd process doesn't show any fopen() nor fwrites() to any *.base64 files.
When strace'ing libvirtd, make sure you add the '-f' arg so that you trace all threads; the libvirtd thread leader will never do any interesting stuff except RPC I/O.

Daniel

On 03-07-12 15:13, Daniel P. Berrange wrote:
>>> I think you'll probably need to add some more VIR_DEBUG lines to secret_driver.c to see where in the process it is going wrong. Or perhaps strace libvirtd to see what it thinks it is writing out & whether any errors appear.
>> I haven't added any VIR_DEBUG lines yet, but stracing the libvirtd process doesn't show any fopen() nor fwrites() to any *.base64 files.
I just added a couple of VIR_DEBUG lines to secret_driver.c and found out that the base64 encoding is actually the problem. In secretSaveValue:

    VIR_DEBUG("WIDO Secret value: %s, size %lu", secret->value, secret->value_size);

    filename = secretBase64Path(driver, secret);
    if (filename == NULL)
        goto cleanup;
    base64_encode_alloc((char *)secret->value, secret->value_size, &base64);
    if (base64 == NULL) {
        virReportOOMError();
        goto cleanup;
    }

    VIR_DEBUG("WIDO Writing %s to %s with a length of %lu", base64, filename, strlen(base64));
    if (replaceFile(filename, base64, strlen(base64)) < 0)
        goto cleanup;

The results I get back:

$ virsh secret-set-value 322bccea-f2ed-4eae-a7e5-d0793ffb162d d2lkbw==

2012-07-03 14:02:57.065+0000: 4593: debug : secretSaveValue:297 : WIDO Secret value: wido, size 4
2012-07-03 14:02:57.065+0000: 4593: debug : secretSaveValue:309 : WIDO Writing ��� to /etc/libvirt/secrets/322bccea-f2ed-4eae-a7e5-d0793ffb162d.base64 with a length of 6

Here you can see that the secret value arrives at the secret driver intact, but base64_encode_alloc seems to scramble the data. It should display base64-encoded data to write to 'filename', but it's showing some binary stuff.

I then tried to shut down libvirt, manually write the base64 data to the file and start libvirt again:

loadSecrets:510 : Error reading secret: internal error invalid base64 in XXX

But there is valid base64 data in that file.

This seems to be a gnulib thing? The gnulib submodule is used for the base64 encoding, isn't it? Checking the gnulib source shows that there have been no changes to base64.h and base64.c; the last change is from over a year ago.
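For comparison, a correct encoder maps the 4-byte value "wido" to the 8-character string "d2lkbw==", which is exactly the argument passed to secret-set-value above, so base64_encode_alloc should have produced 8 printable characters rather than 6 bytes of binary. The following is a minimal reference encoder written from the standard base64 alphabet. The name ref_base64_encode is hypothetical; it follows the same caller-owns-the-buffer convention as gnulib's base64_encode(), but it is not gnulib's code, and it simply asserts that the buffer is large enough instead of truncating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical reference encoder: the caller supplies `out`, which must
 * hold 4 * ceil(inlen / 3) characters plus a terminating NUL. */
static void ref_base64_encode(const char *in, size_t inlen, char *out, size_t outlen)
{
    const unsigned char *u = (const unsigned char *)in;
    size_t i = 0, o = 0;

    assert(outlen >= 4 * ((inlen + 2) / 3) + 1);  /* room for text + NUL */
    for (; i + 3 <= inlen; i += 3) {              /* full 3-byte groups */
        unsigned long v = ((unsigned long)u[i] << 16) |
                          ((unsigned long)u[i + 1] << 8) | u[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = b64_alphabet[v & 63];
    }
    if (i < inlen) {                              /* 1 or 2 leftover bytes */
        unsigned long v = (unsigned long)u[i] << 16;
        if (i + 1 < inlen)
            v |= (unsigned long)u[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = (i + 1 < inlen) ? b64_alphabet[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

Calling ref_base64_encode("wido", 4, buf, 9) leaves "d2lkbw==" in buf, with a strlen() of 8.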
> When strac'ing libvirtd make sure you add the '-f' arg so that you trace all threads - the libvirtd thread leader will never do any interesting stuff except RPC i/o

Right... My bad :)

Wido

On Tue, Jul 03, 2012 at 04:42:54PM +0200, Wido den Hollander wrote:
> Here you can see the secret value arrives at the secret driver in tact, but the base64_encode_alloc seems to scramble the data.
> It should display base64 encoded data to write to 'filename', but it's showing some binary stuff.
Yeah, I'm damned if I can understand what's broken at that point. The logs show the input is sensible, and we're calling the APIs the right way.

Can you try to run libvirtd under valgrind, e.g. just run

  valgrind /usr/sbin/libvirtd

and then try to reproduce it. This would show if there is memory corruption happening somewhere.

Daniel

On 03-07-12 16:54, Daniel P. Berrange wrote:
> Can you try to run libvirtd under valgrind eg, just run
> valgrind /usr/sbin/libvirtd
> and then try to reproduce it. This would show if there is memory corruption happening somewhere
Yes, there is memory corruption somewhere. I had never used valgrind before, but the output seems to show it. I ran libvirtd inside a screen session and I've attached the screenlog with all the output. At the end you'll see there is an invalid read. I was storing the base64-encoded value of "wido".

Wido

On Tue, Jul 03, 2012 at 05:11:48PM +0200, Wido den Hollander wrote:
> Yes, there is memory corruption somewhere. I never used valgrind before, but the output seems to show.
> I ran libvirtd inside a screen, I've attached the screenlog with all the output.
> At the end you'll see there is a misread. I was storing the base64 encoded value of "wido".
Thanks, so there are two errors shown by valgrind:

==6825== Invalid read of size 1
==6825==    at 0x5769DBA: vfprintf (vfprintf.c:1624)
==6825==    by 0x58289D0: __vasprintf_chk (vasprintf_chk.c:68)
==6825==    by 0x509C727: virVasprintf (stdio2.h:199)
==6825==    by 0x508CFEA: virLogVMessage (logging.c:749)
==6825==    by 0x508D349: virLogMessage (logging.c:696)
==6825==    by 0x127AB0BC: secretSaveValue (secret_driver.c:297)
==6825==    by 0x127AB30A: secretSetValue (secret_driver.c:861)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
==6825==  Address 0x19cb3c64 is 0 bytes after a block of size 4 alloc'd
==6825==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6825==    by 0x508E32F: virAllocN (memory.c:128)
==6825==    by 0x127AB21B: secretSetValue (secret_driver.c:838)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
==6825==    by 0x5098755: virThreadHelper (threads-pthread.c:161)
==6825==    by 0x550AE99: start_thread (pthread_create.c:308)
==6825==    by 0x58124BC: clone (clone.S:112)

This one is harmless: it is because the VIR_DEBUG line you added uses '%s' to print secret->value, but this is not a NUL-terminated string. So we can ignore this.
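A sketch of how such a debug line could avoid the out-of-bounds read, assuming a printf-style formatter (plain snprintf() is used here): the "%.*s" conversion takes an explicit maximum length, so it never reads more than value_size bytes, and "%zu" is the standard conversion for size_t, where the debug line above used "%lu". format_value is a hypothetical helper. Since secrets can be arbitrary binary data, printing them as text (which also stops early at an embedded NUL byte) is only suitable for throwaway debugging like this.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Format a length-delimited value that is NOT NUL-terminated (like
 * secret->value) into dst. "%.*s" caps how many bytes are read from
 * value, and "%zu" is the standard conversion for a size_t.
 * Returns snprintf()'s result: the length the full message would have. */
static int format_value(char *dst, size_t dstlen,
                        const unsigned char *value, size_t value_size)
{
    assert(value_size <= (size_t)INT_MAX);  /* the precision is an int */
    return snprintf(dst, dstlen, "Secret value: %.*s, size %zu",
                    (int)value_size, (const char *)value, value_size);
}
```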
==6825==
==6825== Invalid read of size 4
==6825==    at 0xA57E4B9: base64_encode (in /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0)
==6825==    by 0x10DDBC98: base64_encode_alloc (base64.c:140)
==6825==    by 0x127AB0EF: secretSaveValue (secret_driver.c:302)
==6825==    by 0x127AB30A: secretSetValue (secret_driver.c:861)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
==6825==    by 0x5098755: virThreadHelper (threads-pthread.c:161)
==6825==    by 0x550AE99: start_thread (pthread_create.c:308)
==6825==    by 0x58124BC: clone (clone.S:112)
==6825==  Address 0xb7786c8 is 8 bytes inside a block of size 9 alloc'd
==6825==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6825==    by 0xA57E3D7: base64_encode (in /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0)
==6825==    by 0x10DDBC98: base64_encode_alloc (base64.c:140)
==6825==    by 0x127AB0EF: secretSaveValue (secret_driver.c:302)
==6825==    by 0x127AB30A: secretSetValue (secret_driver.c:861)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)

This one is very interesting. It shows that the 'base64_encode' function is doing an out-of-bounds read. More tellingly, though, it reports that 'base64_encode' lives in a weird library: /usr/lib/x86_64-linux-gnu/libroken.so.18.1. If things were normal, we would expect to see that function in 'base64.c', since its code is provided by gnulib itself.
So something else that libvirt links to, directly or indirectly, is using libroken.so, which also has a 'base64_encode' symbol defined. This is overriding gnulib's symbol of the same name. I'm willing to bet the API contract of this libroken.so base64_encode differs from gnulib's, with crashtastic results.

Can you try to find out how this libroken.so is getting linked to libvirt? Or indeed what this library does, or what package it is part of.

Regards,
Daniel
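To illustrate the kind of API mismatch suspected here, the sketch below implements two hypothetical functions, encode_into and encode_alloc, with the two calling conventions, assuming the prototypes are gnulib's base64_encode(in, inlen, out, outlen), which writes into a buffer supplied by the caller, and Heimdal roken's base64_encode(data, size, &str), which allocates the string itself, stores the pointer through its third argument and returns the length. If a call written for the first were bound at run time to the second, the caller's output buffer would be treated as a char ** and receive the bytes of a heap pointer instead of text. That would fit the short binary strings in the debug output above, but it is an inference; the sketch only shows the two contracts side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Contract A (modelled on gnulib): the CALLER owns `out` (outlen bytes).
 * Writes as much padded base64 as fits, NUL-terminating if there is room. */
static void encode_into(const char *in, size_t inlen, char *out, size_t outlen)
{
    const unsigned char *u = (const unsigned char *)in;
    size_t o = 0;

    assert(in != NULL || inlen == 0);
    for (size_t i = 0; i < inlen; i += 3) {
        unsigned long v = (unsigned long)u[i] << 16;
        if (i + 1 < inlen)
            v |= (unsigned long)u[i + 1] << 8;
        if (i + 2 < inlen)
            v |= u[i + 2];
        char quad[4] = {
            B64[(v >> 18) & 63],
            B64[(v >> 12) & 63],
            i + 1 < inlen ? B64[(v >> 6) & 63] : '=',
            i + 2 < inlen ? B64[v & 63] : '=',
        };
        for (int k = 0; k < 4 && o < outlen; k++)
            out[o++] = quad[k];
    }
    if (o < outlen)
        out[o] = '\0';
}

/* Contract B (modelled on Heimdal roken): the CALLEE allocates the string,
 * stores the pointer through `str` and returns its length (-1 on error).
 * The caller must free(*str). */
static int encode_alloc(const void *data, int size, char **str)
{
    size_t len;

    *str = NULL;
    if (size < 0)
        return -1;
    len = 4 * (((size_t)size + 2) / 3);
    *str = malloc(len + 1);
    if (*str == NULL)
        return -1;
    encode_into(data, (size_t)size, *str, len + 1);
    return (int)len;
}
```

Both produce "d2lkbw==" for "wido" when called as intended; the point is that a caller of one cannot safely be linked against the other.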

On 03-07-12 17:21, Daniel P. Berrange wrote:
On Tue, Jul 03, 2012 at 05:11:48PM +0200, Wido den Hollander wrote:
Yes, there is memory corruption somewhere. I never used valgrind before, but the output seems to show.
I ran libvirtd inside a screen, I've attached the screenlog with all the output.
At the end you'll see there is a misread. I was storing the base64 encoded value of "wido".
Thanks so there are two errors shown by valgrind
==6825== Invalid read of size 1 ==6825== at 0x5769DBA: vfprintf (vfprintf.c:1624) ==6825== by 0x58289D0: __vasprintf_chk (vasprintf_chk.c:68) ==6825== by 0x509C727: virVasprintf (stdio2.h:199) ==6825== by 0x508CFEA: virLogVMessage (logging.c:749) ==6825== by 0x508D349: virLogMessage (logging.c:696) ==6825== by 0x127AB0BC: secretSaveValue (secret_driver.c:297) ==6825== by 0x127AB30A: secretSetValue (secret_driver.c:861) ==6825== by 0x5130CF4: virSecretSetValue (libvirt.c:14457) ==6825== by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962) ==6825== by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416) ==6825== by 0x5171260: virNetServerHandleJob (virnetserver.c:161) ==6825== by 0x50990AD: virThreadPoolWorker (threadpool.c:143) ==6825== Address 0x19cb3c64 is 0 bytes after a block of size 4 alloc'd ==6825== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6825== by 0x508E32F: virAllocN (memory.c:128) ==6825== by 0x127AB21B: secretSetValue (secret_driver.c:838) ==6825== by 0x5130CF4: virSecretSetValue (libvirt.c:14457) ==6825== by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962) ==6825== by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416) ==6825== by 0x5171260: virNetServerHandleJob (virnetserver.c:161) ==6825== by 0x50990AD: virThreadPoolWorker (threadpool.c:143) ==6825== by 0x5098755: virThreadHelper (threads-pthread.c:161) ==6825== by 0x550AE99: start_thread (pthread_create.c:308) ==6825== by 0x58124BC: clone (clone.S:112)
This one is harmless - this is because the VIR_DEBUG line you added uses '%s' to print secret_value, but this is not a NULL terminated string. So we can ignore this.
==6825== ==6825== Invalid read of size 4 ==6825== at 0xA57E4B9: base64_encode (in /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0) ==6825== by 0x10DDBC98: base64_encode_alloc (base64.c:140) ==6825== by 0x127AB0EF: secretSaveValue (secret_driver.c:302) ==6825== by 0x127AB30A: secretSetValue (secret_driver.c:861) ==6825== by 0x5130CF4: virSecretSetValue (libvirt.c:14457) ==6825== by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962) ==6825== by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416) ==6825== by 0x5171260: virNetServerHandleJob (virnetserver.c:161) ==6825== by 0x50990AD: virThreadPoolWorker (threadpool.c:143) ==6825== by 0x5098755: virThreadHelper (threads-pthread.c:161) ==6825== by 0x550AE99: start_thread (pthread_create.c:308) ==6825== by 0x58124BC: clone (clone.S:112) ==6825== Address 0xb7786c8 is 8 bytes inside a block of size 9 alloc'd ==6825== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6825== by 0xA57E3D7: base64_encode (in /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0) ==6825== by 0x10DDBC98: base64_encode_alloc (base64.c:140) ==6825== by 0x127AB0EF: secretSaveValue (secret_driver.c:302) ==6825== by 0x127AB30A: secretSetValue (secret_driver.c:861) ==6825== by 0x5130CF4: virSecretSetValue (libvirt.c:14457) ==6825== by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962) ==6825== by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416) ==6825== by 0x5171260: virNetServerHandleJob (virnetserver.c:161) ==6825== by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
This one is very interesting. It shows that the 'base64_encode' function is doing an out-of-bounds read. More tellingly, though, it reports that the 'base64_encode' function lives in a weird library:
/usr/lib/x86_64-linux-gnu/libroken.so.18.1.0
If this were normal, we would expect to see that function in 'base64.c', since its code is provided by gnulib itself.
So something that libvirt links to, directly or indirectly, is using libroken.so, which also defines a 'base64_encode' symbol. This is overriding gnulib's symbol of the same name.
I'm willing to bet the API contract of this libroken.so base64_encode differs from gnulib's, with crashtastic results.
Can you try and find out how this libroken.so is getting linked to libvirt?
Or indeed what this library does, or what package it is part of.
The library is libroken18-heimdal under Ubuntu 12.04: http://packages.ubuntu.com/precise/libroken18-heimdal
When installing ubuntu-virt-server, libraries like gnutls depend on this library.
I'm not sure why libvirt gets linked to libroken, but on Ubuntu systems it's installed on most systems which use libvirt.
I haven't found a header file for libroken which has the symbol 'base64_encode' in it; only gnutls and glib seem to have something with 'base64_encode' in it, but those are prefixed with g_
For now I have no clue why it gets linked to libroken, but it seems the problem has been found.
Wido

On Wed, Jul 04, 2012 at 10:24:38AM +0200, Wido den Hollander wrote:
On 03-07-12 17:21, Daniel P. Berrange wrote:
On Tue, Jul 03, 2012 at 05:11:48PM +0200, Wido den Hollander wrote:
Yes, there is memory corruption somewhere. I had never used valgrind before, but the output seems to show it.
I ran libvirtd inside a screen, I've attached the screenlog with all the output.
At the end you'll see there is a misread. I was storing the base64 encoded value of "wido".
Thanks, so there are two errors shown by valgrind:
==6825== Invalid read of size 1
==6825==    at 0x5769DBA: vfprintf (vfprintf.c:1624)
==6825==    by 0x58289D0: __vasprintf_chk (vasprintf_chk.c:68)
==6825==    by 0x509C727: virVasprintf (stdio2.h:199)
==6825==    by 0x508CFEA: virLogVMessage (logging.c:749)
==6825==    by 0x508D349: virLogMessage (logging.c:696)
==6825==    by 0x127AB0BC: secretSaveValue (secret_driver.c:297)
==6825==    by 0x127AB30A: secretSetValue (secret_driver.c:861)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
==6825==  Address 0x19cb3c64 is 0 bytes after a block of size 4 alloc'd
==6825==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6825==    by 0x508E32F: virAllocN (memory.c:128)
==6825==    by 0x127AB21B: secretSetValue (secret_driver.c:838)
==6825==    by 0x5130CF4: virSecretSetValue (libvirt.c:14457)
==6825==    by 0x41CC8F: remoteDispatchSecretSetValueHelper (remote_dispatch.h:10962)
==6825==    by 0x5174F84: virNetServerProgramDispatch (virnetserverprogram.c:416)
==6825==    by 0x5171260: virNetServerHandleJob (virnetserver.c:161)
==6825==    by 0x50990AD: virThreadPoolWorker (threadpool.c:143)
==6825==    by 0x5098755: virThreadHelper (threads-pthread.c:161)
==6825==    by 0x550AE99: start_thread (pthread_create.c:308)
==6825==    by 0x58124BC: clone (clone.S:112)
The library is libroken18-heimdal under Ubuntu 12.04: http://packages.ubuntu.com/precise/libroken18-heimdal
When installing ubuntu-virt-server, libraries like gnutls depend on this library.
I'm not sure why libvirt gets linked to libroken, but on Ubuntu systems it's installed on most systems which use libvirt.
I haven't found a header file for libroken which has the symbol 'base64_encode' in it; only gnutls and glib seem to have something with 'base64_encode' in it, but those are prefixed with g_
I expect that this is an internal symbol from libroken.so which they leak into the public namespace.
For now I have no clue why it gets linked to libroken, but it seems the problem has been found.
If gnutls links to libroken.so, then libvirt gets linked to it indirectly.
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04-07-12 10:45, Daniel P. Berrange wrote:
On Wed, Jul 04, 2012 at 10:24:38AM +0200, Wido den Hollander wrote:
On 03-07-12 17:21, Daniel P. Berrange wrote:
On Tue, Jul 03, 2012 at 05:11:48PM +0200, Wido den Hollander wrote:
The library is libroken18-heimdal under Ubuntu 12.04: http://packages.ubuntu.com/precise/libroken18-heimdal
When installing ubuntu-virt-server, libraries like gnutls depend on this library.
I'm not sure why libvirt gets linked to libroken, but on Ubuntu systems it's installed on most systems which use libvirt.
I haven't found a header file for libroken which has the symbol 'base64_encode' in it; only gnutls and glib seem to have something with 'base64_encode' in it, but those are prefixed with g_
I expect that this is an internal symbol from libroken.so which they leak into the public namespace.
I just verified this by downloading the source from http://www.h5l.org: lib/roken/base64.h indeed declares base64_encode and base64_decode.
For now I have no clue why it gets linked to libroken, but it seems the problem has been found.
If gnutls links to libroken.so, then libvirt gets linked to it indirectly.
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Not my area of expertise, but it sounds reasonable. I think it will take some time, though? Until then, secrets on Ubuntu 12.04 will be broken.
Wido

[adding gnulib] On 07/04/2012 02:45 AM, Daniel P. Berrange wrote:
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Yuck. Gnulib can't really probe at configure time whether an application will link against a shared library that drags in namespace pollution, so I don't see how to automate any 'rpl_' renaming in gnulib directly. It would be possible to blindly rename the gnulib functions, but that's an interface change that would affect all clients of the gnulib base64 module.
I'm wondering if it is better for libvirt to just #define base64_encode to a different name in config.h. Meanwhile, we need to open a bug report against heimdal to fix their library namespace pollution through libroken.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 04-07-12 14:08, Eric Blake wrote:
[adding gnulib]
On 07/04/2012 02:45 AM, Daniel P. Berrange wrote:
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Yuck. Gnulib can't really probe at configure time whether an application will link against a shared library that drags in namespace pollution, so I don't see how to automate any 'rpl_' renaming in gnulib directly. It would be possible to blindly rename the gnulib functions, but that's an interface change that would affect all clients of the gnulib base64 module.
I'm wondering if it is better for libvirt to just #define base64_encode to a different name in config.h. Meanwhile, we need to open a bug report against heimdal to fix their library namespace pollution through libroken.
As this is getting a bit out of my league, is it safe for me to assume somebody else will pick this up? Not to take the easy way out, but I don't think I can provide much help here other than testing any patches.
Wido

On Wed, Jul 04, 2012 at 04:17:39PM +0200, Wido den Hollander wrote:
On 04-07-12 14:08, Eric Blake wrote:
[adding gnulib]
On 07/04/2012 02:45 AM, Daniel P. Berrange wrote:
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Yuck. Gnulib can't really probe at configure time whether an application will link against a shared library that drags in namespace pollution, so I don't see how to automate any 'rpl_' renaming in gnulib directly. It would be possible to blindly rename the gnulib functions, but that's an interface change that would affect all clients of the gnulib base64 module.
I'm wondering if it is better for libvirt to just #define base64_encode to a different name in config.h. Meanwhile, we need to open a bug report against heimdal to fix their library namespace pollution through libroken.
As this is getting a bit out of my league, is it safe for me to assume somebody else will pick this up?
Yes, we'll sort it out from here.
Not to take the easy way out, but I don't think I can provide much help here other than testing any patches.
I'll try to remember to let you know when we've got a possible fix.
Regards,
Daniel

On Wed, Jul 04, 2012 at 06:08:59AM -0600, Eric Blake wrote:
[adding gnulib]
On 07/04/2012 02:45 AM, Daniel P. Berrange wrote:
It sounds like we might need a workaround in gnulib to avoid this problem. In other cases where gnulib replaces existing symbols, it uses some magic so that the gnulib replacement gets prefixed with 'rpl_'.
Yuck. Gnulib can't really probe at configure time whether an application will link against a shared library that drags in namespace pollution, so I don't see how to automate any 'rpl_' renaming in gnulib directly. It would be possible to blindly rename the gnulib functions, but that's an interface change that would affect all clients of the gnulib base64 module.
I'm wondering if it is better for libvirt to just #define base64_encode to a different name in config.h.
Yeah, that's sort of what I was imagining we could do in base64.h, in fact. If it's better to just do it in libvirt's config.h, then we can do that too.
Daniel

Daniel P. Berrange wrote:
If it's better to just do it in libvirt config.h, then we can do that too
Yes, doing '#define foo libvirt_foo' in config.h is the preferred way of achieving a namespace-clean shared library.
There are two ways to generate these #defines:
1) You collect manually, on various systems, the set of symbols that you don't want to clash with symbols from other shared libraries. You need to do this on various systems, because gnulib may define functions 'rpl_fflush' or 'dprintf' on some systems and not on others.
2) You collect, from a set of header files, the set of symbols that you want to have exported, and process all other symbols with '#define foo libvirt_foo'.
This approach is more robust, but requires compiling all *.o files twice: once with the initial settings (no #define), and once for real.
This approach is implemented in libunistring. Look at the config.h rule in this Makefile.am [1]. There are two auxiliary scripts: 'declared.sh' [2] extracts the symbols from a .h file (assuming a particular coding style); 'exported.sh' [3] extracts the symbols of a .o file.
Bruno
[1] http://git.savannah.gnu.org/gitweb/?p=libunistring.git;a=blob;f=lib/Makefile...
[2] http://git.savannah.gnu.org/gitweb/?p=libunistring.git;a=blob;f=lib/declared...
[3] http://git.savannah.gnu.org/gitweb/?p=libunistring.git;a=blob;f=lib/exported...

On 07/08/2012 07:51 PM, Bruno Haible wrote:
Daniel P. Berrange wrote:
If it's better to just do it in libvirt config.h, then we can do that too
Yes, doing '#define foo libvirt_foo' in config.h is the preferred way of achieving a namespace clean shared library.
I don't want to rush anything, but I see that libvirt 0.10 will be coming out soon and I don't think this has been fixed yet?
Right now this means that libvirt is not usable on Ubuntu 12.04 systems when you want to use libvirt's secrets.
Is it feasible to have this fixed before 0.10 comes out?
Wido

On Thu, Aug 02, 2012 at 01:18:12PM +0200, Wido den Hollander wrote:
On 07/08/2012 07:51 PM, Bruno Haible wrote:
Daniel P. Berrange wrote:
If it's better to just do it in libvirt config.h, then we can do that too
Yes, doing '#define foo libvirt_foo' in config.h is the preferred way of achieving a namespace clean shared library.
I don't want to rush anything, but I see that libvirt 0.10 will be coming out soon and I don't think this has been corrected?
Right now this means that libvirt is not usable on Ubuntu 12.04 systems when you want to use the secrets of libvirt.
Is it feasible to have this fixed before 0.10 comes out?
Try applying this patch to your source tree
diff --git a/configure.ac b/configure.ac
index 6b189db..4f906bb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2876,6 +2876,10 @@ test "x$lv_cv_static_analysis" = xyes && t=1
 AC_DEFINE_UNQUOTED([STATIC_ANALYSIS], [$t], [Define to 1 when performing static analysis.])
+AC_DEFINE_UNQUOTED([isbase64],[gnulib_isbase64],[Hack to avoid symbol clash])
+AC_DEFINE_UNQUOTED([base64_encode],[gnulib_base64_encode],[Hack to avoid symbol clash])
+AC_DEFINE_UNQUOTED([base64_encode_alloc],[gnulib_base64_encode_alloc],[Hack to avoid symbol clash])
+
 AC_OUTPUT(Makefile src/Makefile include/Makefile docs/Makefile \
 docs/schemas/Makefile \
 gnulib/lib/Makefile \
Regards,
Daniel

Daniel P. Berrange wrote:
Try applying this patch to your source tree
diff --git a/configure.ac b/configure.ac
index 6b189db..4f906bb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2876,6 +2876,10 @@ test "x$lv_cv_static_analysis" = xyes && t=1
 AC_DEFINE_UNQUOTED([STATIC_ANALYSIS], [$t], [Define to 1 when performing static analysis.])
+AC_DEFINE_UNQUOTED([isbase64],[gnulib_isbase64],[Hack to avoid symbol clash])
+AC_DEFINE_UNQUOTED([base64_encode],[gnulib_base64_encode],[Hack to avoid symbol clash])
+AC_DEFINE_UNQUOTED([base64_encode_alloc],[gnulib_base64_encode_alloc],[Hack to avoid symbol clash])
+
Could you please use a prefix that is specific to libvirt, rather than 'gnulib_'? I mean, if two different libraries, libvirt and libfoo, use the same prefix 'gnulib_', the symbols will *still* clash.
Bruno

On 08/02/2012 01:37 PM, Daniel P. Berrange wrote:
On Thu, Aug 02, 2012 at 01:18:12PM +0200, Wido den Hollander wrote:
On 07/08/2012 07:51 PM, Bruno Haible wrote:
Daniel P. Berrange wrote:
If it's better to just do it in libvirt config.h, then we can do that too
Yes, doing '#define foo libvirt_foo' in config.h is the preferred way of achieving a namespace clean shared library.
I don't want to rush anything, but I see that libvirt 0.10 will be coming out soon and I don't think this has been corrected?
Right now this means that libvirt is not usable on Ubuntu 12.04 systems when you want to use the secrets of libvirt.
Is it feasible to have this fixed before 0.10 comes out?
Try applying this patch to your source tree
Yes, that works for me. The secrets are working on Ubuntu 12.04.
Thanks,
Wido

On Thu, Aug 02, 2012 at 03:14:27PM +0200, Wido den Hollander wrote:
On 08/02/2012 01:37 PM, Daniel P. Berrange wrote:
On Thu, Aug 02, 2012 at 01:18:12PM +0200, Wido den Hollander wrote:
On 07/08/2012 07:51 PM, Bruno Haible wrote:
Daniel P. Berrange wrote:
If it's better to just do it in libvirt config.h, then we can do that too
Yes, doing '#define foo libvirt_foo' in config.h is the preferred way of achieving a namespace clean shared library.
I don't want to rush anything, but I see that libvirt 0.10 will be coming out soon and I don't think this has been corrected?
Right now this means that libvirt is not usable on Ubuntu 12.04 systems when you want to use the secrets of libvirt.
Is it feasible to have this fixed before 0.10 comes out?
Try applying this patch to your source tree
Yes, that works for me. The secrets are working on Ubuntu 12.04.
OK, I'll apply this, but with a 'libvirt_' prefix as Bruno requested.
Daniel
participants (4)
- Bruno Haible
- Daniel P. Berrange
- Eric Blake
- Wido den Hollander