Hi,
This is very rough and early, but I wanted to get some feedback,
possibly advice, and see if there's some interest in at least creating
infrastructure for user-contributed libvirt hooks, if not some default
ones that are no-ops unless configured.
The impetus for this is that I started trying to tune a dual-socket
system for performance with device assignment and quickly became very
frustrated that distros don't provide any built-in support for more than
the very basics of hugepages. Yes, there are kernel command-line
options, but those don't allow node-specific configuration. Yes,
there's 'virsh allocpages', but how does that get incorporated
automatically into initializing libvirtd or a domain? Creating any sort
of persistence for hugepages is an exercise for the user.
So I think the first step in this is that the hook scripts[1] should by
default support sub-scripts in the common way, with a ".d" sub-directory
holding those scripts, for example daemon.d and qemu.d. In the attached
file, I've simply commandeered the default script to call the
sub-scripts. For compatibility (i.e. not overwriting user scripts), that
should probably happen within libvirt. In any case, a single monolithic
hook file is impossible to maintain on a system, let alone multiple
systems, so this needs to be brought up to date.
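To make the ".d" idea concrete, here is a minimal sketch of a dispatcher,
not the attached script. The function name run_hook_dir is hypothetical.
It assumes the hook receives its usual arguments and that libvirt passes
the domain XML on stdin, so stdin is buffered once and replayed to every
sub-script. Non-executable files are skipped, and the last non-zero exit
status is reported back.

```shell
#!/bin/sh
# Hypothetical dispatcher sketch: run every executable file found in a
# ".d" directory, handing each one the hook arguments and its own copy
# of whatever arrived on stdin.
run_hook_dir() {
    dir=$1; shift
    # A missing directory is not an error: there is nothing to run.
    [ -d "$dir" ] || return 0

    # Buffer stdin so every sub-script can read it from the start.
    input=$(mktemp) || return 1
    cat > "$input"

    rc=0
    # The glob expands in sorted order, so 10-foo runs before 20-bar.
    for script in "$dir"/*; do
        [ -f "$script" ] && [ -x "$script" ] || continue
        # Remember the most recent failure, but keep running the rest.
        "$script" "$@" < "$input" || rc=$?
    done

    rm -f "$input"
    return "$rc"
}
```

A monolithic /etc/libvirt/hooks/qemu could then shrink to sourcing this
function and calling something like: run_hook_dir "$0.d" "$@"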
The second step is that even if we drop user-contributed hooks out
in /usr/share for admins to pull in as desired, perhaps we can provide
some consistency for how to configure those hooks. In the example below
I propose /etc/sysconfig/libvirt-hook-config.xml. You can see how
currently it supports static and dynamic hugepage hooks, static
occurring through the daemon hook and dynamic through the qemu hook.
Ideally the dynamic hook would simply list the domain names
participating in dynamic hugepages and figure out what needs to be
allocated where from the domain xml. I haven't gotten that far (and
frankly trying to satisfy cpu/numa vs numatune/memory|memnode vs
memoryBacking/hugepages and memory size still looks very confusing to
me). Do we want a common place to configure this sort of thing? Is XML
the right format?
On to the scripts themselves. I got some advice on #virt that I should
use 'virsh allocpages' to manage hugepages. Despite the warning not to
call into libvirt in the hook documentation[1], I was assured it'd be ok
here. However, somehow 'virsh freepages' did manage to hang, and my
RHEL 7.1 system doesn't support allocpages yet, so my prototype uses raw
sysfs. AFAICT, any sort of hugepage manipulation is inherently broken
because of the racy kernel interfaces. We really need a hugepage
broker, but that's well beyond the scope of libvirt.
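For anyone unfamiliar with the raw sysfs route, the following is a rough
sketch of a per-node adjustment, not the prototype itself. The function
names and the SYSFS_ROOT override are hypothetical (the override only
exists so the code can be exercised outside /sys). The per-node
nr_hugepages file is the standard kernel interface. Note that the
read, add, write sequence is exactly the kind of race mentioned above:
nothing stops another writer from changing the count in between.

```shell
#!/bin/sh
# Hypothetical override so the functions can be pointed at a fake tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the per-node nr_hugepages path for a node and page size in kB.
hugepage_file() {
    echo "$SYSFS_ROOT/devices/system/node/node${1}/hugepages/hugepages-${2}kB/nr_hugepages"
}

# Grow the pool on one node by a number of pages, then read the count
# back, because the kernel may allocate fewer pages than requested.
# Usage: add_node_hugepages <node> <size_kb> <pages>
add_node_hugepages() {
    file=$(hugepage_file "$1" "$2")
    [ -w "$file" ] || return 1

    cur=$(cat "$file") || return 1
    want=$((cur + $3))          # racy: cur may already be stale
    echo "$want" > "$file" || return 1

    got=$(cat "$file") || return 1
    # Succeed only if the pool actually reached the requested size.
    [ "$got" -ge "$want" ]
}
```

For example, add_node_hugepages 0 2048 512 would try to add 512 2M pages
on node 0 and fail if the pool did not reach the requested size.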
Functionally this seems to work well for me. I don't know how practical
it is to support dynamic 1G pages; I'd probably encourage static setup
for that as my system only survived a couple rounds before getting too
fragmented. 2M dynamic seems to work quite nicely though.
TL;DR, I thought I'd post this, even in a rough state, to see if there's
interest, get nitpicks at my terrible scripting, and make sure I'm not
just scratching my own itch. Thanks,
Alex
[1] https://www.libvirt.org/hooks.html