Memory locking limit and zero-copy migrations

Hi,

do I read the libvirt sources right that, when <memtune> is not used in the libvirt domain, libvirt takes proper care of setting memory locking limits when zero-copy is requested for a migration?

I also wonder whether there are any other situations where memory limits could be set by libvirt or QEMU automatically, rather than there being no memory limits. We had oVirt bugs in the past where certain VMs with VFIO devices couldn't be started due to extra requirements on the amount of locked memory, and adding <hard_limit> to the domain apparently helped.

Thanks,
Milan

On Wed, Aug 17, 2022 at 10:56:54 +0200, Milan Zamazal wrote:
> Do I read the libvirt sources right that, when <memtune> is not used in the libvirt domain, libvirt takes proper care of setting memory locking limits when zero-copy is requested for a migration?
Well yes, for a definition of "proper". In this instance QEMU can lock up to the guest-visible memory size for the migration, so we set the lockable size to the guest memory size. This is a simple upper bound which is supposed to work in all scenarios. QEMU is also unlikely to ever use up all the allowed locking.
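As a rough way to see this in practice (the domain name and PID lookup below are only illustrative), you can compare the guest memory size with the memlock limit of the QEMU process while a zero-copy migration is running:

    # guest-visible memory size of a hypothetical domain "vm1"
    virsh dominfo vm1 | grep 'Max memory'

    # memlock limit of its QEMU process during the migration
    prlimit -p "$(pgrep -f 'guest=vm1')" --memlock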
> I also wonder whether there are any other situations where memory limits could be set by libvirt or QEMU automatically, rather than there being no memory limits. We had oVirt bugs in the past where certain VMs with VFIO devices couldn't be started due to extra requirements on the amount of locked memory, and adding <hard_limit> to the domain apparently helped.
<hard_limit> is not only the amount of memory QEMU can lock into RAM, but an upper bound on all memory the QEMU process can consume. This includes any QEMU overhead, e.g. memory used for the emulation layer. Guessing the correct size of that overhead still has the same problems it always had, and libvirt is not going to be in the business of doing that.
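For reference, the same hard_limit discussed here can also be set from the command line with virsh memtune; the domain name and size below are examples only (the value defaults to KiB), and this is equivalent to <memtune><hard_limit> in the domain XML:

    # set hard_limit to 18 GiB (expressed in KiB) in the persistent config
    virsh memtune vm1 --hard-limit 18874368 --config
    # show the current memtune values
    virsh memtune vm1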

Peter Krempa <pkrempa@redhat.com> writes:
> Well yes, for a definition of "proper". In this instance QEMU can lock up to the guest-visible memory size for the migration, so we set the lockable size to the guest memory size. [...]
Great, thank you for confirmation.
> <hard_limit> is not only the amount of memory QEMU can lock into RAM, but an upper bound on all memory the QEMU process can consume, including any QEMU overhead, e.g. for the emulation layer. [...] Guessing the correct size of that overhead still has the same problems it always had, and libvirt is not going to be in the business of doing that.
To clarify, my point was not whether libvirt should, but whether libvirt or any related component possibly does (or did in the past) impose memory limits. As far as I could see, there are no real memory limits by default, at least in libvirt, yet some limit had apparently been hit in the reported bugs.

I can share some test results with you:

1. If no memtune->hard_limit is set when starting a VM, the default memlock hard limit is 64 MB.
2. If memtune->hard_limit is set when starting a VM, the memlock hard limit will be set to the value of memtune->hard_limit.
3. If memtune->hard_limit is updated at run time, the memlock hard limit won't be changed accordingly.

And some additional knowledge:

1. The memlock hard limit can be shown by 'prlimit -p <pid-of-qemu> -l'.
2. The default value of the memlock hard limit can be changed by setting LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service.

BR,
Fangge Jin
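As a minimal sketch of both checks (the domain name and the limit value are examples, and the drop-in assumes systemd; the service may be libvirtd.service instead of virtqemud.service, as noted below):

    # show the memlock hard limit of the running QEMU process
    prlimit -p "$(pgrep -f 'guest=vm1')" --memlock

    # raise the default for newly started VMs via a systemd drop-in
    systemctl edit virtqemud.service     # add the two lines below and save
    #   [Service]
    #   LimitMEMLOCK=16G                 # example value
    systemctl restart virtqemud.service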

On Thu, Aug 18, 2022 at 10:46 AM Fangge Jin <fjin@redhat.com> wrote:
> 2. The default value of the memlock hard limit can be changed by setting LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service.
Or /usr/lib/systemd/system/libvirtd.service here.

Fangge Jin <fjin@redhat.com> writes:
> If no memtune->hard_limit is set when starting a VM, the default memlock hard limit is 64 MB. [...] The default value of the memlock hard limit can be changed by setting LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service.
Ah, that explains it to me, thank you. And since in the default case the systemd limit is not reported in <memtune> of a running VM, I assume libvirt takes it as "not set" and sets the higher limit when setting up a zero-copy migration. Good.

Regards,
Milan

On Thu, Aug 18, 2022 at 2:46 PM Milan Zamazal <mzamazal@redhat.com> wrote:
> Ah, that explains it to me, thank you. And since in the default case the systemd limit is not reported in <memtune> of a running VM, I assume libvirt takes it as "not set" and sets the higher limit when setting up a zero-copy migration. Good.
Not sure whether you already know this, but I had a hard time differentiating the two concepts:

1. memlock hard limit (shown by prlimit): the hard limit on locked host memory.
2. memtune hard limit (memtune->hard_limit): the hard limit on in-use host memory; this memory can be swapped out.
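As a concrete illustration of where each limit lives (the domain name is hypothetical):

    # 1. memlock hard limit: an rlimit on the QEMU process
    prlimit -p "$(pgrep -f 'guest=vm1')" --memlock

    # 2. memtune hard limit: a cgroup memory limit reported by libvirt
    virsh memtune vm1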

Fangge Jin <fjin@redhat.com> writes:
> I had a hard time differentiating the two concepts:
> 1. memlock hard limit (shown by prlimit): the hard limit on locked host memory.
> 2. memtune hard limit (memtune->hard_limit): the hard limit on in-use host memory; this memory can be swapped out.
No, I didn't know that, thank you for pointing it out. Indeed, 2. is what both the libvirt and kernel documentation seem to say, although not very clearly.

But when I add <memtune> with <hard_limit> to the domain XML and then start the VM, I can see that the limit shown by 'prlimit -l' is increased accordingly. This is good for my use case, but does it match what you say about the two concepts?

On Fri, Aug 19, 2022 at 4:08 AM Milan Zamazal <mzamazal@redhat.com> wrote:
> But when I add <memtune> with <hard_limit> to the domain XML and then start the VM, I can see that the limit shown by 'prlimit -l' is increased accordingly. This is good for my use case, but does it match what you say about the two concepts?
memtune->hard_limit (the hard limit on in-use memory) actually takes effect via cgroups; you can check the value like this:

    # virsh memtune uefi1
    hard_limit     : 134217728
    soft_limit     : unlimited
    swap_hard_limit: unlimited

    # cat /sys/fs/cgroup/memory/machine.slice/machine-qemu\\x2d6\\x2duefi1.scope/libvirt/memory.limit_in_bytes
    137438953472

When a VM starts with memtune->hard_limit set in the domain XML, the memlock hard limit (the hard limit on locked memory, shown by 'prlimit -l') will be set to the value of memtune->hard_limit. This is probably because the memlock hard limit must be less than memtune->hard_limit.

Fangge Jin <fjin@redhat.com> writes:
> When a VM starts with memtune->hard_limit set in the domain XML, the memlock hard limit (the hard limit on locked memory, shown by 'prlimit -l') will be set to the value of memtune->hard_limit. This is probably because the memlock hard limit must be less than memtune->hard_limit.
Well, increasing the memlock limit just to keep it within memtune->hard_limit wouldn't make much sense, but thank you for confirming that setting memtune->hard_limit adjusts both limits to the requested value.

On Fri, Aug 19, 2022 at 08:01:58 +0200, Milan Zamazal wrote:
> Well, increasing the memlock limit just to keep it within memtune->hard_limit wouldn't make much sense, but thank you for confirming that setting memtune->hard_limit adjusts both limits to the requested value.
Right, whenever libvirt needs to set a memory locking limit for zero-copy migration, it uses memtune->hard_limit if it is set (RDMA migration does the same); otherwise the configured memory amount of the domain is used.

Anyway, unless you have a specific reason for setting hard_limit, it's usually better to leave it unset, as the overall memory consumption of a domain, including the memory consumed by QEMU itself, is hard to predict, and you would risk the domain being killed unexpectedly by the kernel when the limit is reached.

Jirka
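For completeness, with a recent enough libvirt (8.5.0 or later) a zero-copy live migration is requested roughly like this; the domain name and destination URI are placeholders, and zero copy requires parallel (multifd) migration:

    virsh migrate --live --parallel --zerocopy vm1 qemu+ssh://dst-host/system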
participants (4): Fangge Jin, Jiri Denemark, Milan Zamazal, Peter Krempa