On Wed, Jul 22, 2009 at 09:25:02PM -0400, john cooper wrote:
> This patch allows passing of a "-mem-path <arg>"
> flag to qemu for support of huge page backed
> guests. A guest may request this option via
> specifying:
>
>     <hugepage>on</hugepage>
>
> in its domain definition xml file.

This really opens a can of worms. While this obviously maps
very simply onto KVM's -mem-path argument, I can't help
thinking things are going to get much more advanced later.
For example, I don't think a boolean on/off is sufficient for
this, since Xen already has a third option of 'best effort' that
it uses by default, where it'll try to allocate hugepages and
fall back to normal pages. In fact you can't tell Xen not
to use hugepages AFAIK. I'm also wondering whether we need
to be concerned about different hugepage sizes for guest
configs, e.g. 2 MB vs 1 GB vs a mix of both. Who decides?
KVM also seems to have the ability to request that huge pages
are pre-allocated upfront rather than on demand, though I'm not
sure what happens to a VM if it doesn't pre-allocate and
a later allocation can't be satisfied.
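
To make the concern concrete, here is a rough sketch (in Python for
brevity) of what a richer setting than on/off might look like. All of
the names below (HugepagePolicy, HugepageConfig, qemu_mem_args) are
hypothetical and not part of libvirt. The only real pieces are the
"-mem-path" flag from the patch and "-mem-prealloc", which I am assuming
is the pre-allocation switch referred to above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HugepagePolicy(Enum):
    """Hypothetical tri-state replacing a plain on/off boolean."""
    OFF = "off"          # never back the guest with huge pages
    PREFERRED = "prefer"  # best effort: fall back to normal pages
    REQUIRED = "require"  # refuse to start the guest if unavailable


@dataclass
class HugepageConfig:
    policy: HugepagePolicy = HugepagePolicy.OFF
    # Desired page size in KiB (e.g. 2048 or 1048576); None = host default.
    # Carried here only to show where it would live; this sketch does not
    # act on it, since that would mean choosing between hugetlbfs mounts.
    page_size_kib: Optional[int] = None
    preallocate: bool = False


def qemu_mem_args(cfg: HugepageConfig, mount_point: Optional[str]) -> List[str]:
    """Return extra qemu arguments for cfg.

    mount_point is the hugetlbfs mount point, or None if none was found.
    """
    if cfg.policy is HugepagePolicy.OFF:
        return []
    if mount_point is None:
        if cfg.policy is HugepagePolicy.REQUIRED:
            raise RuntimeError("huge pages required but hugetlbfs is not mounted")
        return []  # best effort: silently use normal pages
    args = ["-mem-path", mount_point]
    if cfg.preallocate:
        args.append("-mem-prealloc")
    return args
```

Even this small model leaves the page size question open, which is
rather the point: the XML schema needs more thought than a boolean.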

> The request
> for huge page backing will be attempted within
> libvirt if the host system has indicated a
> hugetlbfs mount point in qemu.conf, for example:
>
>     hugepage_mount = "/hugetlbfs"

Seems like it would be simpler to just open /proc/mounts
and scan it to find whether/where hugetlbfs is mounted,
so it would 'just work' if the user had mounted it.
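
The scan itself is only a few lines. A minimal sketch in Python, with
hypothetical function names (libvirt would do the equivalent in C),
assuming the usual whitespace-separated /proc/mounts layout of device,
mount point, filesystem type, options:

```python
from typing import List


def find_hugetlbfs_mounts(mounts_text: str) -> List[str]:
    """Return the mount point of every hugetlbfs entry in /proc/mounts text."""
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "hugetlbfs":
            # Note: spaces in paths appear as \040 escapes; not decoded here.
            mount_points.append(fields[1])
    return mount_points


def probe_hugetlbfs(path: str = "/proc/mounts") -> List[str]:
    """Read the mounts file and return hugetlbfs mount points (empty if none)."""
    try:
        with open(path) as f:
            return find_hugetlbfs_mounts(f.read())
    except OSError:
        # No readable mounts file: treat as "hugetlbfs not available".
        return []
```

An empty list means hugetlbfs is not mounted. More than one entry is
possible, so the caller still has to decide which mount to use.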

> _and_ the target qemu executable is aware of
> the "-mem-path" flag. Otherwise this request
> by a guest will result in an error.

It looks like this argument is not available in upstream QEMU, only
in the KVM fork? Any idea why it hasn't been sent upstream,
and/or whether it will be soon? I'm loath to add more KVM
specific options since we've been burnt every time we've done
this in the past, with the semantics changing when merged into
QEMU :-(
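
Whichever binary we end up with, the availability check would amount to
looking for the flag in the help output. A hedged sketch in Python; the
function names are hypothetical, and it assumes that "-help" lists each
option at the start of a line, which is how the help text is laid out
as far as I know:

```python
import re
import subprocess


def help_mentions_flag(help_text: str, flag: str = "-mem-path") -> bool:
    """True if some line of help_text starts with exactly this flag."""
    # Require whitespace or end of line after the flag so that, for
    # example, "-mem-prealloc" is never mistaken for "-mem-path".
    pattern = r"^" + re.escape(flag) + r"(\s|$)"
    return re.search(pattern, help_text, re.MULTILINE) is not None


def qemu_supports_flag(qemu_binary: str, flag: str = "-mem-path") -> bool:
    """Run '<qemu_binary> -help' and check whether the flag is listed."""
    try:
        result = subprocess.run(
            [qemu_binary, "-help"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Binary missing or not executable: report the flag as unsupported.
        return False
    return help_mentions_flag(result.stdout, flag)
```

This only tells us that the flag is advertised, not that its semantics
match what we expect, which is exactly the problem with fork-only options.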

> This patch does not address setup of the required
> host hugetlbfs mount point, verifying the mount
> point is correct/usable, nor assure sufficient
> free huge pages are available; which are assumed
> to be addressed by other means.

I agree that setting up hugetlbfs is out of scope for libvirt.
We should just probe to see whether it's available or not.
We ought to have some way of reporting available hugepages
though, both at a host level, and likely per NUMA node too.
Without this, a management app using libvirt has no clue whether
it will actually be able to use hugepages successfully.
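
For reference, the raw numbers are already exposed by the kernel. A
rough sketch in Python of gathering them, with hypothetical function
names: host totals come from the HugePages_* lines in /proc/meminfo, and
per-node counts from /sys/devices/system/node/nodeN/hugepages/ on
kernels that provide that directory (an assumption; older kernels may
not have it).

```python
import glob
import os
import re
from typing import Dict, Tuple


def parse_meminfo_hugepages(meminfo_text: str) -> Dict[str, int]:
    """Extract the HugePages_* counters and Hugepagesize (in kB)."""
    stats = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def per_node_free_hugepages(
    sysfs_root: str = "/sys/devices/system/node",
) -> Dict[Tuple[int, int], int]:
    """Map (NUMA node, page size in kB) to the number of free huge pages."""
    result = {}
    pattern = os.path.join(
        sysfs_root, "node*", "hugepages", "hugepages-*kB", "free_hugepages"
    )
    for path in glob.glob(pattern):
        parts = path.split(os.sep)
        # .../node<N>/hugepages/hugepages-<size>kB/free_hugepages
        node = int(parts[-4][len("node"):])
        size_kb = int(parts[-2][len("hugepages-"):-len("kB")])
        with open(path) as f:
            result[(node, size_kb)] = int(f.read())
    return result
```

A management app could compare these counts against the guest's memory
size before asking for hugepage backing, though the numbers can of
course change between the check and the guest actually starting.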
Regards,
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|