[libvirt] Found mem leak in libvirt, need help to debug

Hi.

There is a mem leak in libvirt when doing an external snapshot (for backup purposes). My KVM domain uses raw storage images via libgfapi. I'm using the latest libvirt, 1.2.21 (although previous versions act the same).

My bash backup script runs a series of shell commands (virsh connects to a remote libvirt host):

* virsh domblklist KVM
* qemu-img create -f qcow2 -o backing_file=gluster(...) - precreate the backing file
* virsh snapshot-create KVM SNAP.xml (...) - create a snapshot from the precreated XML snapshot file
* cp the main img file
* virsh blockcommit KVM disk (...)

The backup script works fine, but the libvirtd process gets bigger and bigger each time I run it.

Some proof of the memleak (32017 is the libvirtd pid):

When libvirtd started:
# ps p 32017 o vsz,rss
   VSZ   RSS
585736 15220

After I start the domain via 'virsh start KVM':
# ps p 32017 o vsz,rss
    VSZ    RSS
1327968 125956

When I run the backup script, after the snapshot is created (lots of mem allocated):
# ps p 32017 o vsz,rss
    VSZ    RSS
3264544 537632

After the backup script finished:
# ps p 32017 o vsz,rss
    VSZ    RSS
3715920 644940

When I run the backup script a second time, after the snapshot is created:
# ps p 32017 o vsz,rss
    VSZ    RSS
5521424 1056352

And so on, until libvirt reports 'Out of memory' when connecting, by then being a really huge process.

Now, I would like to diagnose this further, to provide detailed information about the memleak. I tried to use valgrind, but unfortunately I'm on an Opteron 6380 platform, and valgrind doesn't support XOP instructions, quitting with SIGILL.

If someone could tell me how to get some useful debug info about this memleak, I'll be more than happy to do it and share the results here.

Thanks in advance and best regards
Piotr Rybicki
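The backup flow described above can be sketched as a minimal dry-run script. Domain name, disk target, image paths, and any extra virsh flags are illustrative placeholders, not taken from the original script; the run() wrapper only prints each command, so the sequence can be inspected without touching libvirt.

```shell
#!/bin/sh
# Dry-run sketch of the external-snapshot backup flow. All names and
# paths are placeholders. run() prints instead of executing.
run() { echo "+ $*"; }

DOMAIN=KVM
DISK=vda
BASE=gluster://host/pool/disk.img          # assumed base image URI
OVERLAY=gluster://host/pool/disk.snap.qcow2  # assumed overlay URI

run virsh domblklist "$DOMAIN"                                   # 1. list the disk's current image
run qemu-img create -f qcow2 -o backing_file="$BASE" "$OVERLAY"  # 2. precreate the overlay
run virsh snapshot-create "$DOMAIN" SNAP.xml                     # 3. snapshot from precreated XML
run cp /mnt/gluster/pool/disk.img /backup/                       # 4. copy the quiesced base image
run virsh blockcommit "$DOMAIN" "$DISK" --active --pivot --wait  # 5. merge overlay back (flags assumed)
```

Dropping the run() wrapper turns the sketch into the real sequence.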

On 18.11.2015 15:33, Piotr Rybicki wrote:
If someone could provide me with detailed information on how to get some useful debug info about this memleak, I'll be more than happy to do it and share results here.
You can run libvirtd under valgrind (be aware that it will be slow as a snail), then run the reproducer and then just terminate the daemon (Ctrl+C). Valgrind will then report on all the leaks. When doing this I usually use:

# valgrind --leak-check=full --show-reachable=yes \
    --child-silent-after-fork=yes libvirtd

Remember to terminate the system-wide daemon first, as the one started under valgrind will die early otherwise - you can only have one daemon running at a time.

If you are unfamiliar with the output, share it somewhere and I will take a look.

Michal

On 2015-11-19 at 11:07, Michal Privoznik wrote:
Thank you, Michal.

I finally managed to get valgrind running on the Opteron 6380. I recompiled glibc, libvirt and other relevant libs with -mno-xop (noted here for others looking for a way to run valgrind on Opteron).

Gluster is at 3.5.4.

The procedure is:
- start libvirtd
- start the KVM domain
- run the backup script (with external snapshot)
- stop the domain
- stop libvirtd

Valgrind output:
http://wikisend.com/download/457046/valgrind.log

I'm happy to test patches against libvirt 1.2.21 and retest.

Best regards
Piotr Rybicki
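For others who hit SIGILL under valgrind on Opteron (Bulldozer/Piledriver) hosts: the rebuild amounts to appending -mno-xop to the compiler flags of every library valgrind will trace through. Since the poster appears to be on Gentoo, a per-package environment sketch might look like this (the file name and package atoms are my assumption, not taken from the post):

```shell
# Hypothetical contents of /etc/portage/env/no-xop.conf:
# append -mno-xop so gcc emits no XOP instructions, which valgrind's
# JIT cannot decode.
CFLAGS="${CFLAGS} -mno-xop"
CXXFLAGS="${CXXFLAGS} -mno-xop"

# Then map the relevant packages to it in /etc/portage/package.env:
#   sys-libs/glibc          no-xop.conf
#   app-emulation/libvirt   no-xop.conf
#   sys-cluster/glusterfs   no-xop.conf
```

On non-Gentoo systems the equivalent is rebuilding the same packages with CFLAGS containing -mno-xop.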

On 2015-11-19 at 14:36, Piotr Rybicki wrote:
Sorry, here is a better valgrind output - showing the problem:

valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes /usr/sbin/libvirtd --listen 2> valgrind.log

http://wikisend.com/download/314166/valgrind.log

Best regards
Piotr Rybicki

On 19.11.2015 15:00, Piotr Rybicki wrote:
Interesting. I'm going to post a couple of the errors here, so that they don't get lost meanwhile:

==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,444 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0x115AE41A: qemuDomainStorageFileInit (qemu_domain.c:2929)
==2650==    by 0x1163DE5A: qemuDomainSnapshotCreateSingleDiskActive (qemu_driver.c:14201)
==2650==    by 0x1163E604: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14371)
==2650==    by 0x1163ED27: qemuDomainSnapshotCreateActiveExternal (qemu_driver.c:14559)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,445 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)
==2650==    by 0x1163E843: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14421)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,446 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4DC5: virStorageFileGetMetadataRecurse (storage_driver.c:3054)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)

So I think we are missing a few virStorageFileDeinit() calls somewhere. This is a very basic scratch:

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index f0ce78b..bdb511f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2970,9 +2970,10 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver,
             goto cleanup;
 
         if (disk->src->backingStore) {
-            if (force_probe)
+            if (force_probe) {
+                virStorageFileDeinit(disk->src);
                 virStorageSourceBackingStoreClear(disk->src);
-            else
+            } else
                 goto cleanup;
         }
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2192ad8..dd9a89a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5256,6 +5256,7 @@ void qemuProcessStop(virQEMUDriverPtr driver,
         dev.type = VIR_DOMAIN_DEVICE_DISK;
         dev.data.disk = disk;
         ignore_value(qemuRemoveSharedDevice(driver, &dev, vm->def->name));
+        virStorageFileDeinit(disk->src);
     }
 
     /* Clear out dynamically assigned labels */

Can you apply it, build libvirt and give it a try? valgrind should report much fewer leaks.

Michal

On 2015-11-19 at 17:31, Michal Privoznik wrote:
Looks like it doesn't make much of a difference :(

http://wikisend.com/download/158168/valgrind2.log

Best regards
Piotr Rybicki

On Thu, Nov 19, 2015 at 21:13:56 +0100, Piotr Rybicki wrote:
I've seen some of these already. The bug is actually not in libvirt but in gluster's libgfapi library, so no change in libvirt will help. This was tracked in gluster as:

https://bugzilla.redhat.com/show_bug.cgi?id=1093594

I suggest you update the gluster library to resolve this issue.

Peter

Peter, Michal - thank you for your time. I'll try a newer gluster version.

Best regards
Piotr Rybicki

Hi guys.

On 2015-11-20 at 11:29, Piotr Rybicki wrote:
I've tested this issue further. I have to report that the mem leak still exists with the latest versions:

gluster 3.7.6
libvirt 1.3.1

The leak occurs even when just starting a domain (virsh start DOMAIN) which accesses its drive via libgfapi (although the leak is much smaller than with gluster 3.5.X). When the drive is accessed via file (gluster fuse mount), there is no leak on domain start.

My drive definition (libgfapi):

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writethrough' iothread='1'/>
  <source protocol='gluster' name='pool/disk-sys.img'>
    <host name='X.X.X.X' transport='rdma'/>
  </source>
  <blockio logical_block_size='512' physical_block_size='32768'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

valgrind details (libgfapi):

# valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes libvirtd --listen 2> libvirt-gfapi.log

==6532== Memcheck, a memory error detector
==6532== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6532== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6532== Command: libvirtd --listen
==6532==
==6532== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
==6532==    This could cause spurious value errors to appear.
==6532==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
2016-02-09 12:20:26.732+0000: 6535: info : libvirt version: 1.3.1
2016-02-09 12:20:26.732+0000: 6535: info : hostname: adm-office
2016-02-09 12:20:26.732+0000: 6535: warning : qemuDomainObjTaint:2223 : Domain id=1 name='gentoo-intel' uuid=f9fd934b-cbda-af4e-cc98-0dd2c8dd6c2c is tainted: host-cpu
2016-02-09 12:21:29.924+0000: 6532: error : qemuMonitorIO:689 : internal error: End of file from monitor
==6532==
==6532== HEAP SUMMARY:
==6532==     in use at exit: 3,726,573 bytes in 15,324 blocks
==6532==   total heap usage: 238,573 allocs, 223,249 frees, 1,020,776,752 bytes allocated
(...)
==6532== LEAK SUMMARY:
==6532==    definitely lost: 19,760 bytes in 97 blocks
==6532==    indirectly lost: 21,098 bytes in 122 blocks
==6532==      possibly lost: 2,698,764 bytes in 67 blocks
==6532==    still reachable: 986,951 bytes in 15,038 blocks
==6532==         suppressed: 0 bytes in 0 blocks
==6532==
==6532== For counts of detected and suppressed errors, rerun with: -v
==6532== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 0 from 0)

Full log:
http://195.191.233.1/libvirt-gfapi.log
http://195.191.233.1/libvirt-gfapi.log.bz2

Best regards
Piotr Rybicki

On 09.02.2016 13:36, Piotr Rybicki wrote:
I still think these are libgfapi leaks; all the definitely lost bytes come from the library:

==6532== 3,064 (96 direct, 2,968 indirect) bytes in 1 blocks are definitely lost in loss record 1,106 of 1,142
==6532==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==6532==    by 0x10701279: __gf_calloc (mem-pool.c:117)
==6532==    by 0x106CC541: xlator_dynload (xlator.c:259)
==6532==    by 0xFC4E947: create_master (glfs.c:202)
==6532==    by 0xFC4E947: glfs_init_common (glfs.c:863)
==6532==    by 0xFC4EB50: glfs_init@@GFAPI_3.4.0 (glfs.c:916)
==6532==    by 0xF7E4A33: virStorageFileBackendGlusterInit (storage_backend_gluster.c:625)
==6532==    by 0xF7D56DE: virStorageFileInitAs (storage_driver.c:2788)
==6532==    by 0xF7D5E39: virStorageFileGetMetadataRecurse (storage_driver.c:3048)
==6532==    by 0xF7D6295: virStorageFileGetMetadata (storage_driver.c:3171)
==6532==    by 0x1126A2B0: qemuDomainDetermineDiskChain (qemu_domain.c:3179)
==6532==    by 0x11269AE6: qemuDomainCheckDiskPresence (qemu_domain.c:2998)
==6532==    by 0x11292055: qemuProcessLaunch (qemu_process.c:4708)

Care to report it to them?

Michal
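When digging through a log this size, a small filter helps show which allocation sites head the "definitely lost" records. This is a generic sketch, not something used in the thread; it reads a valgrind log on stdin.

```shell
# leak_origins: for every "definitely lost" record in a valgrind log,
# print the first caller below the allocator, most frequent first.
# grep -A2 keeps the record header, the "at" (allocator) frame, and the
# first "by" frame; awk extracts that frame's function name.
leak_origins() {
    grep -A2 'definitely lost in loss record' \
        | grep ' by 0x' \
        | awk '{print $4}' \
        | sort | uniq -c | sort -rn
}
```

Usage: `leak_origins < valgrind.log`. Leaks clustering under one function (here, gluster's __gf_calloc) point at the owning library rather than libvirt itself.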

W dniu 2016-02-09 o 16:12, Michal Privoznik pisze:
On 09.02.2016 13:36, Piotr Rybicki wrote:
Hi guys.
W dniu 2015-11-20 o 11:29, Piotr Rybicki pisze:
I've seen some of theese already. The bug is actually not in libvirt but in gluster's libgfapi library, so any change in libvirt won't help.
This was tracked in gluster as:
https://bugzilla.redhat.com/show_bug.cgi?id=1093594
I suggest you update the gluster library to resolve this issue.
I've tested further this issue.
I have to report that the mem leak still exists in the latest versions: gluster 3.7.6, libvirt 1.3.1.
The mem leak occurs even when just starting a domain (virsh start DOMAIN) that accesses its drive via libgfapi (although the leak is much smaller than with gluster 3.5.X).
When the drive is used via a file (gluster fuse mount), there is no mem leak when starting the domain.
My drive definition (libgfapi):
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writethrough' iothread='1'/>
  <source protocol='gluster' name='pool/disk-sys.img'>
    <host name='X.X.X.X' transport='rdma'/>
  </source>
  <blockio logical_block_size='512' physical_block_size='32768'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
valgrind details (libgfapi):
# valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes libvirtd --listen 2> libvirt-gfapi.log
==6532== Memcheck, a memory error detector
==6532== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6532== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6532== Command: libvirtd --listen
==6532==
==6532== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
==6532==    This could cause spurious value errors to appear.
==6532==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
2016-02-09 12:20:26.732+0000: 6535: info : libvirt version: 1.3.1
2016-02-09 12:20:26.732+0000: 6535: info : hostname: adm-office
2016-02-09 12:20:26.732+0000: 6535: warning : qemuDomainObjTaint:2223 : Domain id=1 name='gentoo-intel' uuid=f9fd934b-cbda-af4e-cc98-0dd2c8dd6c2c is tainted: host-cpu
2016-02-09 12:21:29.924+0000: 6532: error : qemuMonitorIO:689 : internal error: End of file from monitor
==6532==
==6532== HEAP SUMMARY:
==6532==     in use at exit: 3,726,573 bytes in 15,324 blocks
==6532==   total heap usage: 238,573 allocs, 223,249 frees, 1,020,776,752 bytes allocated
(...)
==6532== LEAK SUMMARY:
==6532==    definitely lost: 19,760 bytes in 97 blocks
==6532==    indirectly lost: 21,098 bytes in 122 blocks
==6532==      possibly lost: 2,698,764 bytes in 67 blocks
==6532==    still reachable: 986,951 bytes in 15,038 blocks
==6532==         suppressed: 0 bytes in 0 blocks
==6532==
==6532== For counts of detected and suppressed errors, rerun with: -v
==6532== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 0 from 0)
full log: http://195.191.233.1/libvirt-gfapi.log http://195.191.233.1/libvirt-gfapi.log.bz2
I still think these are libgfapi leaks; all the definitely lost bytes come from the library.

==6532== 3,064 (96 direct, 2,968 indirect) bytes in 1 blocks are definitely lost in loss record 1,106 of 1,142
==6532==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==6532==    by 0x10701279: __gf_calloc (mem-pool.c:117)
==6532==    by 0x106CC541: xlator_dynload (xlator.c:259)
==6532==    by 0xFC4E947: create_master (glfs.c:202)
==6532==    by 0xFC4E947: glfs_init_common (glfs.c:863)
==6532==    by 0xFC4EB50: glfs_init@@GFAPI_3.4.0 (glfs.c:916)
==6532==    by 0xF7E4A33: virStorageFileBackendGlusterInit (storage_backend_gluster.c:625)
==6532==    by 0xF7D56DE: virStorageFileInitAs (storage_driver.c:2788)
==6532==    by 0xF7D5E39: virStorageFileGetMetadataRecurse (storage_driver.c:3048)
==6532==    by 0xF7D6295: virStorageFileGetMetadata (storage_driver.c:3171)
==6532==    by 0x1126A2B0: qemuDomainDetermineDiskChain (qemu_domain.c:3179)
==6532==    by 0x11269AE6: qemuDomainCheckDiskPresence (qemu_domain.c:2998)
==6532==    by 0x11292055: qemuProcessLaunch (qemu_process.c:4708)

Care to report it to them?
Of course - I will.

But are you sure there is no need to call glfs_fini() after the qemu process is launched? Are all of those resources still needed in libvirt? I understand that libvirt needs to check the presence and other properties of the storage, but after qemu is launched? Just trying to understand...

Best regards
Piotr Rybicki

On 09.02.2016 16:34, Piotr Rybicki wrote:
On 2016-02-09 at 16:12, Michal Privoznik wrote:
On 09.02.2016 13:36, Piotr Rybicki wrote:
Hi guys.
On 2015-11-20 at 11:29, Piotr Rybicki wrote:
I've seen some of these already. The bug is actually not in libvirt but in gluster's libgfapi library, so any change in libvirt won't help.
This was tracked in gluster as:
https://bugzilla.redhat.com/show_bug.cgi?id=1093594
I suggest you update the gluster library to resolve this issue.
I've tested further this issue.
I have to report that the mem leak still exists in the latest versions: gluster 3.7.6, libvirt 1.3.1.
The mem leak occurs even when just starting a domain (virsh start DOMAIN) that accesses its drive via libgfapi (although the leak is much smaller than with gluster 3.5.X).
When the drive is used via a file (gluster fuse mount), there is no mem leak when starting the domain.
My drive definition (libgfapi):
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writethrough' iothread='1'/>
  <source protocol='gluster' name='pool/disk-sys.img'>
    <host name='X.X.X.X' transport='rdma'/>
  </source>
  <blockio logical_block_size='512' physical_block_size='32768'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
valgrind details (libgfapi):
# valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes libvirtd --listen 2> libvirt-gfapi.log
==6532== Memcheck, a memory error detector
==6532== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6532== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6532== Command: libvirtd --listen
==6532==
==6532== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
==6532==    This could cause spurious value errors to appear.
==6532==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
2016-02-09 12:20:26.732+0000: 6535: info : libvirt version: 1.3.1
2016-02-09 12:20:26.732+0000: 6535: info : hostname: adm-office
2016-02-09 12:20:26.732+0000: 6535: warning : qemuDomainObjTaint:2223 : Domain id=1 name='gentoo-intel' uuid=f9fd934b-cbda-af4e-cc98-0dd2c8dd6c2c is tainted: host-cpu
2016-02-09 12:21:29.924+0000: 6532: error : qemuMonitorIO:689 : internal error: End of file from monitor
==6532==
==6532== HEAP SUMMARY:
==6532==     in use at exit: 3,726,573 bytes in 15,324 blocks
==6532==   total heap usage: 238,573 allocs, 223,249 frees, 1,020,776,752 bytes allocated
(...)
==6532== LEAK SUMMARY:
==6532==    definitely lost: 19,760 bytes in 97 blocks
==6532==    indirectly lost: 21,098 bytes in 122 blocks
==6532==      possibly lost: 2,698,764 bytes in 67 blocks
==6532==    still reachable: 986,951 bytes in 15,038 blocks
==6532==         suppressed: 0 bytes in 0 blocks
==6532==
==6532== For counts of detected and suppressed errors, rerun with: -v
==6532== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 0 from 0)
full log: http://195.191.233.1/libvirt-gfapi.log http://195.191.233.1/libvirt-gfapi.log.bz2
I still think these are libgfapi leaks; all the definitely lost bytes come from the library.

==6532== 3,064 (96 direct, 2,968 indirect) bytes in 1 blocks are definitely lost in loss record 1,106 of 1,142
==6532==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==6532==    by 0x10701279: __gf_calloc (mem-pool.c:117)
==6532==    by 0x106CC541: xlator_dynload (xlator.c:259)
==6532==    by 0xFC4E947: create_master (glfs.c:202)
==6532==    by 0xFC4E947: glfs_init_common (glfs.c:863)
==6532==    by 0xFC4EB50: glfs_init@@GFAPI_3.4.0 (glfs.c:916)
==6532==    by 0xF7E4A33: virStorageFileBackendGlusterInit (storage_backend_gluster.c:625)
==6532==    by 0xF7D56DE: virStorageFileInitAs (storage_driver.c:2788)
==6532==    by 0xF7D5E39: virStorageFileGetMetadataRecurse (storage_driver.c:3048)
==6532==    by 0xF7D6295: virStorageFileGetMetadata (storage_driver.c:3171)
==6532==    by 0x1126A2B0: qemuDomainDetermineDiskChain (qemu_domain.c:3179)
==6532==    by 0x11269AE6: qemuDomainCheckDiskPresence (qemu_domain.c:2998)
==6532==    by 0x11292055: qemuProcessLaunch (qemu_process.c:4708)

Care to report it to them?
Of course - I will.
But are you sure there is no need to call glfs_fini() after the qemu process is launched? Are all of those resources still needed in libvirt?
I understand that libvirt needs to check the presence and other properties of the storage, but after qemu is launched?
We call glfs_fini(). And that's the problem: it does not free everything that glfs_init() allocated. Hence the leaks.

Actually, every time we call glfs_init() we print a debug message from virStorageFileBackendGlusterInit(), which wraps it, and another debug message from virStorageFileBackendGlusterDeinit() when we call glfs_fini(). So if you set up debug logs, you can check whether our init and finish calls match.

Michal
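The cross-check Michal describes can be sketched as a pair of grep counts over the daemon log. This is only a sketch: the sample log lines below are illustrative placeholders (the real message text and log path depend on your log_filters/log_outputs configuration), but the two function names are the ones from the stack traces above.

```shell
# Illustrative sample of libvirtd debug log lines (placeholder text, not real output)
cat > /tmp/libvirtd-sample.log <<'EOF'
debug : virStorageFileBackendGlusterInit:625 : initializing gluster client
debug : virStorageFileBackendGlusterDeinit:604 : deinitializing gluster client
debug : virStorageFileBackendGlusterInit:625 : initializing gluster client
EOF

# Count init vs deinit calls. If the counts differ, libvirt is keeping a
# gluster connection open; if they match but memory still grows, the leak
# is inside glfs_fini() in libgfapi.
inits=$(grep -c 'virStorageFileBackendGlusterInit' /tmp/libvirtd-sample.log)
finis=$(grep -c 'virStorageFileBackendGlusterDeinit' /tmp/libvirtd-sample.log)
echo "init=$inits fini=$finis"
```

With the sample above this prints init=2 fini=1, i.e. a mismatch; against a real log, equal counts would support Michal's diagnosis that the leak is in libgfapi itself.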

I still think these are libgfapi leaks; all the definitely lost bytes come from the library.

==6532== 3,064 (96 direct, 2,968 indirect) bytes in 1 blocks are definitely lost in loss record 1,106 of 1,142
==6532==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==6532==    by 0x10701279: __gf_calloc (mem-pool.c:117)
==6532==    by 0x106CC541: xlator_dynload (xlator.c:259)
==6532==    by 0xFC4E947: create_master (glfs.c:202)
==6532==    by 0xFC4E947: glfs_init_common (glfs.c:863)
==6532==    by 0xFC4EB50: glfs_init@@GFAPI_3.4.0 (glfs.c:916)
==6532==    by 0xF7E4A33: virStorageFileBackendGlusterInit (storage_backend_gluster.c:625)
==6532==    by 0xF7D56DE: virStorageFileInitAs (storage_driver.c:2788)
==6532==    by 0xF7D5E39: virStorageFileGetMetadataRecurse (storage_driver.c:3048)
==6532==    by 0xF7D6295: virStorageFileGetMetadata (storage_driver.c:3171)
==6532==    by 0x1126A2B0: qemuDomainDetermineDiskChain (qemu_domain.c:3179)
==6532==    by 0x11269AE6: qemuDomainCheckDiskPresence (qemu_domain.c:2998)
==6532==    by 0x11292055: qemuProcessLaunch (qemu_process.c:4708)

Care to report it to them?
Of course - I will.
But are you sure there is no need to call glfs_fini() after the qemu process is launched? Are all of those resources still needed in libvirt?
I understand that libvirt needs to check the presence and other properties of the storage, but after qemu is launched?
We call glfs_fini(). And that's the problem: it does not free everything that glfs_init() allocated. Hence the leaks. Actually, every time we call glfs_init() we print a debug message from virStorageFileBackendGlusterInit(), which wraps it, and another debug message from virStorageFileBackendGlusterDeinit() when we call glfs_fini(). So if you set up debug logs, you can check whether our init and finish calls match.
Thanks Michal, you are right. The leak still exists in the newest gluster, 3.7.8.

There is an even simpler case that shows this memleak. valgrind on:

qemu-img info gluster://SERVER_IP:0/pool/FILE.img

==6100== LEAK SUMMARY:
==6100==    definitely lost: 19,846 bytes in 98 blocks
==6100==    indirectly lost: 2,479,205 bytes in 182 blocks
==6100==      possibly lost: 240,600 bytes in 7 blocks
==6100==    still reachable: 3,271,130 bytes in 2,931 blocks
==6100==         suppressed: 0 bytes in 0 blocks

So it's definitely gluster's fault. I've just reported it on gluster-devel@

Best regards
Piotr Rybicki

On 2015-11-20 at 09:13, Peter Krempa wrote:
On Thu, Nov 19, 2015 at 21:13:56 +0100, Piotr Rybicki wrote:
W dniu 2015-11-19 o 17:31, Michal Privoznik pisze:
...
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,444 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0x115AE41A: qemuDomainStorageFileInit (qemu_domain.c:2929)
==2650==    by 0x1163DE5A: qemuDomainSnapshotCreateSingleDiskActive (qemu_driver.c:14201)
==2650==    by 0x1163E604: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14371)
==2650==    by 0x1163ED27: qemuDomainSnapshotCreateActiveExternal (qemu_driver.c:14559)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,445 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)
==2650==    by 0x1163E843: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14421)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,446 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4DC5: virStorageFileGetMetadataRecurse (storage_driver.c:3054)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)
I've seen some of these already. The bug is actually not in libvirt but in gluster's libgfapi library, so any change in libvirt won't help.
This was tracked in gluster as:
https://bugzilla.redhat.com/show_bug.cgi?id=1093594
I suggest you update the gluster library to resolve this issue.
Hello.

Unfortunately, the memleak still exists (although it is smaller with glusterfs 3.7.6).

Latest versions:
libvirt 1.2.21
qemu 2.4.1
glusterfs 3.7.6

Steps to reproduce:
* virsh domblklist KVM
* qemu-img create -f qcow2 -o backing_file=gluster(...) - precreate backing file
* virsh snapshot-create KVM SNAP.xml (...) - create snapshot from precreated XML snapshot file
* cp main img file
* virsh blockcommit KVM disk (...)

valgrind --leak-check=full --show-reachable=yes --child-silent-after-fork=yes /usr/sbin/libvirtd --listen 2> valgrind.log

valgrind.log: http://www.filedropper.com/valgrind
(sorry for the crappy pastebin server, wikisend seems to have a problem).

Best regards
Piotr Rybicki
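The ps-based growth measurements from the original report can be automated around the reproduction steps. A minimal sketch, sampling this shell's own PID ($$) as a stand-in for libvirtd's, with the backup step left as a placeholder comment:

```shell
# Sample a process's VSZ/RSS around each backup iteration to quantify the leak.
# $$ (this shell) stands in for the real libvirtd PID; substitute it in practice.
pid=$$
for run in 1 2 3; do
    # ./backup-script.sh KVM   # placeholder: snapshot-create + cp + blockcommit
    sample=$(ps -p "$pid" -o vsz=,rss=)
    echo "run $run: vsz/rss (KiB) = $sample"
done
```

Steadily increasing RSS across runs, as in the numbers quoted at the top of the thread, is the symptom being reported here.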
participants (3)
- Michal Privoznik
- Peter Krempa
- Piotr Rybicki