virsh vol-download uses a lot of memory

Hi all:

I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.

I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:

http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...

The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following option to the systemd service file associated with the systemd timer I am using:

MemoryLimit=500M

However, the OOM killer is killing "virsh vol-download":

Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB

I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away; it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.

Note that the virtual machine is not running (shut off) while doing the backup.

Last time I tried with an increased memory limit of 5G, "virsh vol-download" was killed while using 7.4 GiB, and the partially-downloaded volume file weighed 60 GB. Therefore, it looks like "virsh vol-download" keeps a percentage of the downloaded size in RAM.

Is there a way to make "virsh vol-download" use less memory?

Thanks in advance,
rdiez
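For context, a minimal sketch of what such a systemd service unit could look like; only MemoryLimit=500M and the VmBackup.service name are taken from the post and its kernel log, while the description, script path and remaining directives are illustrative assumptions:

# /etc/systemd/system/VmBackup.service -- illustrative sketch, not the poster's actual unit
[Unit]
Description=Nightly VM backup

[Service]
Type=oneshot
# Every process this service spawns (including virsh) is charged to this cgroup.
MemoryLimit=500M
# Hypothetical path to the backup script; the real script name is unknown.
ExecStart=/usr/local/bin/BackupWindows10.sh

Note that MemoryLimit= is the legacy cgroup-v1 spelling; on cgroup-v2 systems the equivalent directive is MemoryMax=.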

On 1/22/20 10:03 AM, R. Diez wrote:
Hi all:
I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:
http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...
The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following options to the systemd service file associated to the systemd timer I am using:
MemoryLimit=500M
However, the OOM is killing "virsh vol-download":
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away, it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.
This is very likely a memory leak somewhere. Can you try to run virsh under valgrind and download a small disk? valgrind could help us identify the leak. For instance:

valgrind --leak-check=full virsh vol-download /path/to/small/volume /tmp/blah; rm /tmp/blah

However, I am unable to reproduce with the current git master, so it looks like the leak was fixed - the question is which commit fixed it, so that your distro maintainers can backport it.

Michal
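For reference, the same reproducer spelled out with an explicit storage pool; the pool name "default" and the volume name below are hypothetical placeholders, not values from this thread:

# Pool and volume names are hypothetical examples.
valgrind --leak-check=full \
    virsh vol-download --pool default small-test-volume /tmp/blah
rm /tmp/blah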

On 1/22/20 11:11 AM, Michal Privoznik wrote:
On 1/22/20 10:03 AM, R. Diez wrote:
Hi all:
I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
Nevermind, I've managed to reproduce with the latest libvirt anyway.
I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:
http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...
The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following options to the systemd service file associated to the systemd timer I am using:
MemoryLimit=500M
However, the OOM is killing "virsh vol-download":
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away, it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.
This is very likely a memory leak somewhere.
Actually, it is not. It's caused by the design of our client event loop: whenever there is incoming data, it reads as much as possible and places it at the end of a linked list of incoming stream packets (a stream is the mechanism libvirt uses to transfer binary data). The problem is that instead of returning NULL from our malloc() calls once the limit is reached, the kernel decides to kill us.

For anybody with libvirt insight, the call chain is: virNetClientIOHandleInput() -> virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> virNetClientStreamQueuePacket().

The obvious fix would be to stop processing incoming packets once a stream has "too much" data cached (for some definition of "too much"). But this may lead to an unresponsive client event loop - if the client doesn't pull data from the incoming stream fast enough, it won't be able to make any other RPC calls.

Anybody got any ideas?

Michal

On Wed, Jan 22, 2020 at 01:01:42PM +0100, Michal Privoznik wrote:
On 1/22/20 11:11 AM, Michal Privoznik wrote:
On 1/22/20 10:03 AM, R. Diez wrote:
Hi all:
I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
Nevermind, I've managed to reproduce with the latest libvirt anyway.
I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:
http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...
The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following options to the systemd service file associated to the systemd timer I am using:
MemoryLimit=500M
However, the OOM is killing "virsh vol-download":
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away, it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.
This is very likely a memory leak somewhere.
Actually, it is not. It's caused by our design of the client event loop. If there are any incoming data, read as much as possible placing them at the end of linked list of incoming stream data (stream is a way that libvirt uses to transfer binary data). Problem is that instead of returning NULL to our malloc()-s once the limit is reached, kernel decides to kill us.
For anybody with libvirt insight: virNetClientIOHandleInput() -> virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> virNetClientStreamQueuePacket().
The obvious fix would be to stop processing incoming packets if stream has "too much" data cached (define "too much"). But this may lead to unresponsive client event loop - if the client doesn't pull data from incoming stream fast enough they won't be able to make any other RPC.
IMHO if they're not pulling stream data and still expecting to make other RPC calls in a timely manner, then their code is broken.

Having said that, in retrospect I rather regret ever implementing our stream APIs as we did. We really should have just exposed an API which lets you spawn an NBD server associated with a storage volume, or tunnelled NBD over libvirtd. The former is probably our best strategy these days, now that NBD has native TLS support.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
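To illustrate the NBD-based approach described above, here is a rough sketch using qemu-nbd and qemu-img directly against a shut-off VM's disk, bypassing libvirt's stream API entirely; the disk path, socket path and output file are assumptions, and this is not an existing libvirt feature:

# Export the (shut-off) VM disk read-only over a local NBD Unix socket.
qemu-nbd --read-only --format qcow2 --socket /tmp/backup-nbd.sock \
    /var/lib/libvirt/images/win10.qcow2 &

# Pull the data over NBD into a backup image; no unbounded client-side
# packet queue is involved, so memory usage stays roughly constant.
qemu-img convert -O qcow2 \
    'nbd+unix:///?socket=/tmp/backup-nbd.sock' /backups/win10-backup.qcow2

# Stop the exporter afterwards.
kill %1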

Architecturally, separating the data and control channels feels like the right approach - whether NBD or something else. Would need signposting for those of us who routinely implement firewalling on hosts, but that's a detail.

I presume there's no flow control on streams at the moment?

Cheers,

Peter

On Wed, 22 Jan 2020, 12:18 Daniel P. Berrangé, <berrange@redhat.com> wrote:
On Wed, Jan 22, 2020 at 01:01:42PM +0100, Michal Privoznik wrote:
On 1/22/20 11:11 AM, Michal Privoznik wrote:
On 1/22/20 10:03 AM, R. Diez wrote:
Hi all:
I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
Nevermind, I've managed to reproduce with the latest libvirt anyway.
I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:
http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...
The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following options to the systemd service file associated to the systemd timer I am using:
MemoryLimit=500M
However, the OOM is killing "virsh vol-download":
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away, it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.
This is very likely a memory leak somewhere.
Actually, it is not. It's caused by our design of the client event loop. If there are any incoming data, read as much as possible placing them at the end of linked list of incoming stream data (stream is a way that libvirt uses to transfer binary data). Problem is that instead of returning NULL to our malloc()-s once the limit is reached, kernel decides to kill us.
For anybody with libvirt insight: virNetClientIOHandleInput() -> virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> virNetClientStreamQueuePacket().
The obvious fix would be to stop processing incoming packets if stream has "too much" data cached (define "too much"). But this may lead to unresponsive client event loop - if the client doesn't pull data from incoming stream fast enough they won't be able to make any other RPC.
IMHO if they're not pulling stream data and still expecting to make other RPC calls in a timely manner, then their code is broken.
Having said that, in retrospect I rather regret ever implementing our stream APIs as we did. We really should have just exposed an API which lets you spawn an NBD server associated with a storage volume, or tunnelled NBD over libvirtd. The former is probably our best strategy these days, now that NBD has native TLS support.
Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 1/22/20 1:18 PM, Daniel P. Berrangé wrote:
On Wed, Jan 22, 2020 at 01:01:42PM +0100, Michal Privoznik wrote:
On 1/22/20 11:11 AM, Michal Privoznik wrote:
On 1/22/20 10:03 AM, R. Diez wrote:
Hi all:
I am using the libvirt version that comes with Ubuntu 18.04.3 LTS.
I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
Nevermind, I've managed to reproduce with the latest libvirt anyway.
I have written a script that backs up my virtual machines every night. I want to limit the amount of memory that this backup operation consumes, mainly to prevent page cache thrashing. I have described the Linux page cache thrashing issue in detail here:
http://rdiez.shoutwiki.com/wiki/Today%27s_Operating_Systems_are_still_incred...
The VM virtual disk weighs 140 GB at the moment. I thought 500 MiB of RAM should be more than enough to back it up, so I added the following options to the systemd service file associated to the systemd timer I am using:
MemoryLimit=500M
However, the OOM is killing "virsh vol-download":
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913525] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913527] [ 13232] 1000 13232 5030 786 77824 103 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913528] [ 13267] 1000 13267 5063 567 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913529] [ 13421] 1000 13421 5063 458 73728 132 0 BackupWindows10
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913530] [ 13428] 1000 13428 712847 124686 5586944 523997 0 virsh
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913532] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/system.slice/VmBackup.service,task_memcg=/system.slice/VmBackup.service,task=virsh,pid=13428,uid=1000
Jan 21 23:40:00 GS-CEL-L kernel: [55535.913538] Memory cgroup out of memory: Killed process 13428 (virsh) total-vm:2851388kB, anon-rss:486180kB, file-rss:12564kB, shmem-rss:0kB
I wonder why "virsh vol-download" needs so much RAM. It does not get killed straight away, it takes a few minutes to get killed. It starts using a VMSIZE of around 295 MiB, which is not really frugal for a file download operation, but then it grows and grows.
This is very likely a memory leak somewhere.
Actually, it is not. It's caused by our design of the client event loop. If there are any incoming data, read as much as possible placing them at the end of linked list of incoming stream data (stream is a way that libvirt uses to transfer binary data). Problem is that instead of returning NULL to our malloc()-s once the limit is reached, kernel decides to kill us.
For anybody with libvirt insight: virNetClientIOHandleInput() -> virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> virNetClientStreamQueuePacket().
The obvious fix would be to stop processing incoming packets if stream has "too much" data cached (define "too much"). But this may lead to unresponsive client event loop - if the client doesn't pull data from incoming stream fast enough they won't be able to make any other RPC.
IMHO if they're not pulling stream data and still expecting to make other RPC calls in a timely manner, then their code is broken.
This is virsh that we are talking about. It's not some random application. And I am able to limit virsh memory usage to "just" 100 MiB with one well placed usleep() - to slow down putting incoming stream packets onto the queue:

diff --git i/src/rpc/virnetclientstream.c w/src/rpc/virnetclientstream.c
index f904eaba31..cfb3f225f2 100644
--- i/src/rpc/virnetclientstream.c
+++ w/src/rpc/virnetclientstream.c
@@ -358,6 +358,7 @@ int virNetClientStreamQueuePacket(virNetClientStreamPtr st,
     virNetClientStreamEventTimerUpdate(st);
 
     virObjectUnlock(st);
+    usleep(1000);
     return 0;
 }

But any attempt I've made to ignore POLLIN if the stream queue is longer than, say, 8 packets was unsuccessful (the code still read incoming packets and placed them onto the queue). I blame the "passing the buck" algorithm for that (rather than my poor skills :-P).
Having said that, in retrospect I rather regret ever implementing our stream APIs as we did. We really should have just exposed an API which lets you spawn an NBD server associated with a storage volume, or tunnelled NBD over libvirtd. The former is probably our best strategy these days, now that NBD has native TLS support.
Yeah, but IIRC NBD wasn't a thing back then, was it?

Michal

On Thu, Jan 23, 2020 at 10:22:33AM +0100, Michal Privoznik wrote:
On 1/22/20 1:18 PM, Daniel P. Berrangé wrote:
On Wed, Jan 22, 2020 at 01:01:42PM +0100, Michal Privoznik wrote:
For anybody with libvirt insight: virNetClientIOHandleInput() -> virNetClientCallDispatch() -> virNetClientCallDispatchStream() -> virNetClientStreamQueuePacket().
The obvious fix would be to stop processing incoming packets if stream has "too much" data cached (define "too much"). But this may lead to unresponsive client event loop - if the client doesn't pull data from incoming stream fast enough they won't be able to make any other RPC.
IMHO if they're not pulling stream data and still expecting to make other RPC calls in a timely manner, then their code is broken.
This is virsh that we are talking about. It's not some random application.
Right, so the problem doesn't exist in virsh, as it isn't trying to run other RPC calls while streaming data. So we ought to be able to just stop reading from the socket.
And I am able to limit virsh memory usage to "just" 100 MiB with one well placed usleep() - to slow down putting incoming stream packets onto the queue:

diff --git i/src/rpc/virnetclientstream.c w/src/rpc/virnetclientstream.c
index f904eaba31..cfb3f225f2 100644
--- i/src/rpc/virnetclientstream.c
+++ w/src/rpc/virnetclientstream.c
@@ -358,6 +358,7 @@ int virNetClientStreamQueuePacket(virNetClientStreamPtr st,
     virNetClientStreamEventTimerUpdate(st);
 
     virObjectUnlock(st);
+    usleep(1000);
     return 0;
 }

But any attempt I've made to ignore POLLIN if the stream queue is longer than, say, 8 packets was unsuccessful (the code still read incoming packets and placed them onto the queue). I blame the "passing the buck" algorithm for that (rather than my poor skills :-P).
I think we need some calls to virNetClientIOUpdateCallback() added around the places where we are dealing with the stream data queue. The method will want to be updated so that it avoids setting the READABLE bit if the queue is too large, I guess.
Having said that, in retrospect I rather regret ever implementing our stream APIs as we did. We really should have just exposed an API which lets you spawn an NBD server associated with a storage volume, or tunnelled NBD over libvirtd. The former is probably our best strategy these days, now that NBD has native TLS support.
Yeah, but IIRC NBD wasn't a thing back then, was it?
NBD predates libvirtd, in fact, though I can't remember when the qemu-nbd tool arrived vs libvirt's streams!

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

[...] Actually, it is not. It's caused by our design of the client event loop. If there are any incoming data, read as much as possible placing them at the end of linked list of incoming stream data (stream is a way that libvirt uses to transfer binary data). Problem is that instead of returning NULL to our malloc()-s once the limit is reached, kernel decides to kill us.
This is actually a serious issue. I cannot effectively limit the memory that the backup process is using with MemoryLimit=500M. Due to Linux's issues with the page cache (which I mentioned before), and the large amount of memory that "virsh vol-download" is using, my whole server becomes unresponsive for many minutes under the high I/O load.

If I have understood the issue correctly, attempting to limit the I/O bandwidth may increase the queue length, and therefore the memory usage, even further. In any case, my server has to have quite a lot of free RAM for "virsh vol-download" not to get randomly killed. How much free RAM is necessary probably depends on the current disk read performance.

Is there anything I can do with virsh to at least mitigate this problem?

I have written an alternative script to copy the .qcow2 files directly, bypassing virsh:

https://github.com/rdiez/Tools/blob/master/VirtualMachineManager/BackupVm.sh

But with that script I have hit file permission issues with the .qcow2 files. I do not know much about libvirt yet, but I have a feeling that such permission problems are long-standing and not actually properly addressed yet. I am no good script coder, so I fear that running the script as root may cause havoc or create security risks. I would welcome a configuration option to set the .qcow2 file permissions to an arbitrary user and group, at least when the VM is shut off.

Thanks for your help,
rdiez
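As one possible mitigation for the permission issue (not something this thread confirms), the backup user could be granted read access to the image with a POSIX ACL while the VM is shut off; the image path and user name below are assumptions for illustration:

# Hypothetical image path and backup user.
IMG=/var/lib/libvirt/images/win10.qcow2
BACKUP_USER=backupuser

# Grant the backup user read access without changing owner/group.
# Note: libvirt/QEMU may reset ownership and labels when the domain starts.
sudo setfacl -m u:"$BACKUP_USER":r "$IMG"

# Plain copy of the shut-off VM's disk, avoiding virsh streams entirely.
cp --sparse=always "$IMG" /backups/win10-$(date +%F).qcow2

Libvirt's dynamic ownership management is liable to undo such changes whenever the domain is started, which is presumably the long-standing permission behaviour referred to above.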

I'm sorry, I don't have Ubuntu installed anywhere to look the version up. Can you run 'virsh version' to find it out for me please?
$ virsh version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1
This is very likely a memory leak somewhere. Can you try to run virsh under valgrind and download a small disk? valgrind could help us identify the leak. For instance:
valgrind --leak-check=full virsh vol-download /path/to/small/volume /tmp/blah; rm /tmp/blah
However, I am unable to reproduce with the current git master so looks like the leak was fixed - question is, which commit fixed it so that your distro maintainers can backport it.
I have not got the time at the moment, but maybe later. What would you need to get good information from valgrind? For example, are debug symbols included in the binaries?

Best regards,
rdiez
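For what it's worth, a hedged sketch of how debug symbols could be obtained on Ubuntu 18.04 so that valgrind reports useful stack traces; the exact -dbgsym package names depend on the distro packaging and are assumptions here:

# The Ubuntu debug-symbol (ddebs) archive must be enabled first; then,
# assuming the usual -dbgsym naming, install symbols for virsh and the library:
sudo apt-get install libvirt-clients-dbgsym libvirt0-dbgsym

# Re-run the reproducer under valgrind with full leak checking.
valgrind --leak-check=full --num-callers=30 \
    virsh vol-download --pool default small-test-volume /tmp/blah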
participants (4): Daniel P. Berrangé, Michal Privoznik, Peter Crowther, R. Diez