
On 19.11.2015 15:00, Piotr Rybicki wrote:
On 2015-11-19 at 14:36, Piotr Rybicki wrote:
On 2015-11-19 at 11:07, Michal Privoznik wrote:
On 18.11.2015 15:33, Piotr Rybicki wrote:
Hi.
There is a memory leak in libvirt when doing an external snapshot (for backup purposes). My KVM domain uses raw storage images via libgfapi. I'm using the latest libvirt, 1.2.21 (although previous versions behave the same).
My bash script for snapshot backup runs a series of shell commands (virsh connects to a remote libvirt host); a rough sketch of the whole flow follows the list:
* virsh domblklist KVM
* qemu-img create -f qcow2 -o backing_file=gluster(...) - precreate backing file
* virsh snapshot-create KVM SNAP.xml (...) - create snapshot from precreated XML snapshot file
* cp main img file
* virsh blockcommit KVM disk (...)
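In full, the flow is roughly this (a minimal sketch only: the domain name, gluster URLs, paths, disk target and virsh flags are placeholders, since the real values are elided with "(...)" above):

#!/bin/bash
# Minimal sketch of the backup flow above; all names, paths and flags are
# placeholders / assumptions, not the real values.
set -e

URI="qemu+ssh://kvmhost/system"      # remote libvirt host (assumed)
DOM="KVM"
DISK="vda"                           # disk target taken from 'virsh domblklist'

# 1. list the disks and their current images
virsh -c "$URI" domblklist "$DOM"

# 2. pre-create the qcow2 overlay with the current raw image as backing file
qemu-img create -f qcow2 \
    -o backing_file=gluster://gfs-host/volume/KVM.img,backing_fmt=raw \
    /mnt/gluster/volume/KVM.snap.qcow2

# 3. create the external disk-only snapshot from the pre-written XML (flags assumed)
virsh -c "$URI" snapshot-create "$DOM" SNAP.xml --disk-only --reuse-external --no-metadata

# 4. copy the now-stable main image away as the backup
cp /mnt/gluster/volume/KVM.img /backup/KVM.img

# 5. commit the overlay back into the base image and pivot the domain onto it
virsh -c "$URI" blockcommit "$DOM" "$DISK" --active --wait --pivot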
The backup script works fine; however, the libvirtd process gets bigger and bigger each time I run it.
Some proof of memleak:
32017 - libvirtd pid
When libvirtd started:
# ps p 32017 o vsz,rss
   VSZ   RSS
585736 15220

When I start KVM via 'virsh start KVM':
# ps p 32017 o vsz,rss
    VSZ    RSS
1327968 125956

When I start the backup script, after the snapshot is created (lots of memory allocated):
# ps p 32017 o vsz,rss
    VSZ    RSS
3264544 537632

After the backup script finished:
# ps p 32017 o vsz,rss
    VSZ    RSS
3715920 644940

When I start the backup script for a second time, after the snapshot is created:
# ps p 32017 o vsz,rss
    VSZ     RSS
5521424 1056352
And so on, until libvirt reports 'Out of memory' when connecting, by which point it is a really huge process.
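To quantify the growth per run, a loop like this can sample libvirtd's memory after each backup iteration (a minimal sketch; the backup-script path is a placeholder):

PID=$(pidof libvirtd)
for i in $(seq 1 5); do
    /root/backup-snapshot.sh         # the backup script described above (assumed path)
    echo "after iteration $i:"
    ps p "$PID" o vsz=,rss=          # '=' suppresses the header line
done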
Now, I would like to diagnose it further, to provide detailed information about the memleak. I tried to use valgrind, but unfortunately I'm on an Opteron 6380 platform, and valgrind doesn't support XOP instructions, quitting with SIGILL.
If someone could provide me with detailed information on how to get some useful debug info about this memleak, I'll be more than happy to do it and share the results here.
You can run libvirtd under valgrind (be aware that it will be as slow as a snail), then run the reproducer and then just terminate the daemon (CTRL+C). Valgrind will then report all the leaks. When doing this I usually use:
# valgrind --leak-check=full --show-reachable=yes \
    --child-silent-after-fork=yes libvirtd
Remember to terminate the system-wide daemon first, as the one started under valgrind will die early otherwise, since you can only have one daemon running at a time.
If you are unfamiliar with the output, share it somewhere and I will take a look.
Thank You Michal.
I finally managed to get valgrind running on the Opteron 6380. I recompiled glibc, libvirt and the other relevant libs with -mno-xop (noted just for others looking for a way to run valgrind on an Opteron).
Gluster is at 3.5.4
The procedure is (a rough shell sketch of the run follows the list):
* start libvirtd
* start kvm
* run backup script (with external snapshot)
* stop kvm
* stop libvirtd
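For reference, this is roughly how the run is driven from a shell, a minimal sketch only: the domain name, backup-script path and the way the system-wide daemon is stopped are placeholders for this setup.

# stop the system-wide daemon first, then start libvirtd under valgrind
/etc/init.d/libvirtd stop            # init-system command assumed

valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes \
    /usr/sbin/libvirtd --listen 2> valgrind.log &
VALGRIND_PID=$!
sleep 10                             # give the daemon time to come up

virsh start KVM                      # start the domain
/root/backup-snapshot.sh             # run the backup script (external snapshot + blockcommit)
virsh shutdown KVM                   # stop the domain

# terminate the daemon (equivalent of CTRL+C) so valgrind prints the leak report
kill -INT "$VALGRIND_PID"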
Valgrind output:
Sorry, here is better valgrind output, showing the problem:
valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes /usr/sbin/libvirtd --listen 2> valgrind.log
Interesting. I'm gonna post a couple of the errors here, so that they don't get lost meanwhile:

==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,444 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0x115AE41A: qemuDomainStorageFileInit (qemu_domain.c:2929)
==2650==    by 0x1163DE5A: qemuDomainSnapshotCreateSingleDiskActive (qemu_driver.c:14201)
==2650==    by 0x1163E604: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14371)
==2650==    by 0x1163ED27: qemuDomainSnapshotCreateActiveExternal (qemu_driver.c:14559)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,445 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)
==2650==    by 0x1163E843: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14421)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,446 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4DC5: virStorageFileGetMetadataRecurse (storage_driver.c:3054)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)

So, I think that we are missing a few virStorageFileDeinit() calls somewhere.
This is a very basic scratch:

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index f0ce78b..bdb511f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2970,9 +2970,10 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver,
         goto cleanup;
 
     if (disk->src->backingStore) {
-        if (force_probe)
+        if (force_probe) {
+            virStorageFileDeinit(disk->src);
             virStorageSourceBackingStoreClear(disk->src);
-        else
+        } else
             goto cleanup;
     }
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2192ad8..dd9a89a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5256,6 +5256,7 @@ void qemuProcessStop(virQEMUDriverPtr driver,
         dev.type = VIR_DOMAIN_DEVICE_DISK;
         dev.data.disk = disk;
         ignore_value(qemuRemoveSharedDevice(driver, &dev, vm->def->name));
+        virStorageFileDeinit(disk->src);
     }
 
     /* Clear out dynamically assigned labels */

Can you apply it, build libvirt and give it a try? valgrind should report far fewer leaks.

Michal
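To verify, one could rebuild with the scratch patch applied, repeat the same reproducer under valgrind, and check whether the glfs_new allocations disappear from the report. A minimal sketch; the source-tree path and patch file name are placeholders, not part of the original thread:

# apply the scratch patch and rebuild (paths/names assumed)
cd /usr/src/libvirt-1.2.21
patch -p1 < qemu-storagefile-deinit.patch
make -j"$(nproc)"

# repeat the valgrind run described above, then check whether the gluster
# context allocations (glfs_new frames) still appear among the reachable blocks
grep -c 'glfs_new' valgrind.log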