[...]
Actually, it is not. It's caused by the design of our client event loop. If there is any
incoming data, we read as much of it as possible and place it at the end of a linked list
of incoming stream data (a stream is the mechanism libvirt uses to transfer binary data).
The problem is that instead of returning NULL to our malloc() calls once the limit is
reached, the kernel decides to kill us.
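
To make the pattern described above concrete, here is a minimal Python sketch. It is not libvirt's code: the names drain_unbounded, drain_bounded, pending and CHUNK are hypothetical, and "pending" merely stands in for data already waiting on the connection. The first function appends everything it finds to the queue. The second one is only an illustration of what a bounded alternative could look like, where data beyond a limit is left unread.

```python
from collections import deque

CHUNK = 64 * 1024  # hypothetical size of one piece of incoming stream data


def drain_unbounded(pending, queue):
    """Move every waiting chunk into the queue, however large it gets."""
    while pending:
        queue.append(pending.popleft())
    return sum(len(c) for c in queue)


def drain_bounded(pending, queue, limit_bytes):
    """Move chunks into the queue only while it stays within limit_bytes.

    Chunks that do not fit are left in 'pending', i.e. unread.
    """
    queued = sum(len(c) for c in queue)
    while pending and queued + len(pending[0]) <= limit_bytes:
        chunk = pending.popleft()
        queue.append(chunk)
        queued += len(chunk)
    return queued


if __name__ == "__main__":
    data = [b"x" * CHUNK] * 100
    print(drain_unbounded(deque(data), deque()))            # all 100 chunks queued
    print(drain_bounded(deque(data), deque(), 10 * CHUNK))  # stops at the limit
```

In the bounded variant the memory held by the queue never exceeds the limit, at the price of leaving data unread until the consumer catches up.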
This is actually a serious issue. I cannot effectively limit the memory that the backup
process is using with MemoryLimit=500M.
Due to Linux's issues with the page cache (which I mentioned before) and the large amount
of memory that "virsh vol-download" uses, my whole server becomes unresponsive for many
minutes under high I/O load.
If I have understood the issue correctly, attempting to limit the I/O bandwidth may
increase the queue length, and therefore the memory usage, even further.
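
A rough back-of-the-envelope model of that worry, as a Python sketch. It assumes constant rates and a queue with no upper bound, and it assumes that throttling slows down only the writing side. The function name and the figures are made up for illustration; they are not measurements.

```python
def peak_queue_bytes(total_bytes, in_rate, out_rate):
    """Peak size of an unbounded queue during a transfer of total_bytes.

    in_rate:  bytes per second arriving from the daemon (assumed constant)
    out_rate: bytes per second written to the destination (assumed constant)
    """
    if out_rate >= in_rate:
        return 0  # the writer keeps up, so nothing accumulates
    arrival_time = total_bytes / in_rate  # seconds until all data has arrived
    # Everything that arrived minus what was written in the meantime.
    return total_bytes - out_rate * arrival_time


if __name__ == "__main__":
    GIB = 1024 ** 3
    MB = 1000 ** 2
    # Hypothetical 20 GiB volume arriving at 200 MB/s.
    for out_rate in (100 * MB, 50 * MB):
        peak = peak_queue_bytes(20 * GIB, 200 * MB, out_rate)
        print(f"write rate {out_rate // MB} MB/s: peak queue {peak / GIB:.1f} GiB")
```

Under these assumptions, halving the write rate raises the peak queue size rather than lowering it. If the throttling slowed down the incoming side instead, the opposite would hold.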
In any case, my server has to have quite a lot of free RAM for "virsh vol-download" not
to get randomly killed. How much free RAM is necessary probably depends on the current
disk read performance.
Is there anything I can do with virsh to at least mitigate this problem?
I have written an alternative script to copy the .qcow2 files directly, bypassing virsh:
https://github.com/rdiez/Tools/blob/master/VirtualMachineManager/BackupVm.sh
But with that script I have hit file permission issues with the .qcow2 files. I do not
know much about libvirt yet, but I have a feeling that such permission problems are
long-standing and have not been properly addressed yet. I am not a good script coder, so
I fear that running the script as root may cause havoc or create security risks.
I would welcome a configuration option to set the .qcow2 file permissions to an arbitrary
user and group, at least when the VM is shut off.
Thanks for your help,
rdiez