On Mon, Mar 13, 2017 at 11:58:24AM -0400, Luiz Capitulino wrote:
Libvirt commit c2e60ad0e51 added a new check to the XML validation
logic where XMLs containing <memoryBacking><locked/> must also
contain <memtune><hard_limit>. This causes two breakages where
working guests won't start anymore:
1. Systems where mlock limit was set in /etc/security/limits.conf
I'd be surprised if that had any effect, unless you were setting it
against the root user.
The limits.conf file is loaded by PAM, and when libvirtd spawns
QEMU, PAM is not involved, so limits.conf will never be activated.
This is why libvirt provides max_processes/max_files/max_core
settings in /etc/libvirt/qemu.conf - you can't set those in
limits.conf and have them work - unless you set them against
root, so libvirtd itself got the higher limits which are then
inherited by QEMU.
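To illustrate the inheritance point: a spawned child simply starts with whatever rlimits its parent has, with no PAM step in between. Below is a minimal Python sketch of that behaviour. The helper name child_memlock_limit is made up for this example, it is not libvirt code, and it assumes a Linux host where RLIMIT_MEMLOCK is defined.

```python
import resource
import subprocess
import sys


def child_memlock_limit():
    """Spawn a child process and return the (soft, hard) RLIMIT_MEMLOCK it sees.

    Hypothetical helper for illustration only; nothing here touches PAM
    or limits.conf, the child just inherits the parent's limits.
    """
    code = (
        "import resource;"
        "s, h = resource.getrlimit(resource.RLIMIT_MEMLOCK);"
        "print(s, h)"
    )
    out = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    soft, hard = (int(v) for v in out.split())
    return soft, hard


if __name__ == "__main__":
    # The two lines printed should show the same pair of values.
    print("parent:", resource.getrlimit(resource.RLIMIT_MEMLOCK))
    print("child: ", child_memlock_limit())
```

So a limit raised for the user that libvirtd runs as would carry over to QEMU, while a limits.conf entry for any other user is never consulted on this path.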
2. Guests using hugeTLB pages. In this case, guests were relying
on the fact that libvirt automagically increases mlock
limit to 1GB
Yep, that's bad - we mustn't break previously working scenarios
like this, even if they were not following documented practice.
While it's true that the <memoryBacking><locked/> documentation
says that <memtune><hard_limit> is required, this is actually
an extremely bad request because:
A. <memtune><hard_limit>'s own documentation strongly recommends
NOT using it
Yep, hard limit is impossible to calculate reliably since no one
has been able to provide an accurate way to predict QEMU's peak
memory usage. When libvirt previously set hard_limit by default,
we got many bug reports about guests being killed by the OOM killer,
no matter what algorithm we tried.
B. <memtune><hard_limit> does more than setting the memory
locking limit
C. <memtune><hard_limit> does not support infinity, so you have
to guess a limit
D. If <memtune><hard_limit> is less than 1GB, it will lower
VFIO's automatic limit of "guest memory + 1GB"
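Point D can be pictured with a toy model: an explicit hard_limit replaces the automatic VFIO value rather than adding to it. The Python sketch below only restates that claim. The function name and its parameters are invented for illustration and are not libvirt's actual implementation.

```python
GIB = 1024 ** 3  # bytes per GiB


def effective_memlock_limit(guest_mem, uses_vfio, hard_limit=None):
    """Toy model of point D (hypothetical, not libvirt's real logic).

    All sizes are in bytes. Returns the memory locking limit that would
    apply, or None when this model sets no automatic limit.
    """
    if hard_limit is not None:
        # An explicit <hard_limit> takes precedence over the automatic value.
        return hard_limit
    if uses_vfio:
        # Automatic VFIO limit described in the thread: guest memory + 1GB.
        return guest_mem + GIB
    return None


if __name__ == "__main__":
    guest = 4 * GIB
    print(effective_memlock_limit(guest, uses_vfio=True) // GIB)  # automatic case
    print(effective_memlock_limit(guest, uses_vfio=True, hard_limit=GIB // 2))
```

Under this model, a 4 GiB VFIO guest with no hard_limit may lock 5 GiB, while the same guest with a 512 MiB hard_limit is held to 512 MiB.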
Here are two possible solutions to fix all of this:
1. Drop change c2e60ad0e51 and drop automatic increases. Let
users configure limits in /etc/security/limits.conf
pros: this is the most correct way to do it, and how
it should have been done originally IMHO
cons: will break working VFIO setups, so probably undoable
limits.conf is useless - see above.
2. Drop change c2e60ad0e51 and automatically increase memory
locking limit to infinity when seeing <memoryBacking><locked/>
pros: make all cases work, no more <hard_limit> requirement
cons: allows guests with <locked/> to lock all memory
assigned to them plus QEMU allocations. While this seems
undesirable or even a security issue, using <hard_limit>
will have the same effect
I think this is the only viable approach, given that no one can
provide a way to reliably calculate QEMU peak memory usage.
Unless we want to take guest RAM + $LARGE_NUMBER - e.g. just blindly
assume that 2 GB is enough for QEMU's working set, so for an 8 GB
guest, just set 10 GB as the limit.
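The "guest RAM + fixed headroom" alternative is simple arithmetic. Here is a small Python sketch, assuming sizes are tracked in KiB. The function name is hypothetical, and the 2 GiB default is the blind guess mentioned above, not a measured figure.

```python
KIB_PER_GIB = 1024 * 1024


def memlock_limit_kib(guest_ram_kib, qemu_overhead_kib=2 * KIB_PER_GIB):
    """Return guest RAM plus a fixed allowance for QEMU itself, in KiB.

    Hypothetical helper: the default overhead is an assumption, since no
    reliable way to predict QEMU's peak memory usage is known.
    """
    if guest_ram_kib < 0 or qemu_overhead_kib < 0:
        raise ValueError("sizes must be non-negative")
    return guest_ram_kib + qemu_overhead_kib


if __name__ == "__main__":
    limit = memlock_limit_kib(8 * KIB_PER_GIB)
    print(limit // KIB_PER_GIB, "GiB")  # 8 GiB guest + 2 GiB headroom
```

Any fixed headroom has the same weakness as hard_limit: a QEMU process that needs more than the guess will fail to lock memory.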
Lastly, <locked/> doesn't belong to <memoryBacking>, it should
be in <memtune>. I recommend deprecating it from <memoryBacking>
and adding it where it belongs.
We never make this kind of change in libvirt XML. It is a sub-optimal
location, but it has no functional problem, so there's no functional
benefit to moving it and clear backcompat downsides.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|