On Thu, Jul 24, 2014 at 02:20:22PM +0200, Peter Krempa wrote:
For the hotplug to work, the VM needs to be started with a certain number
of "dimm" slots for plugging virtual memory modules. The memory of the
VM at startup has to occupy at least one of the slots. Later on, the
management application can decide to plug more memory into the guest by
inserting a virtual memory module.
For representing this in libvirt I'm thinking of using the <devices>
section of our domain XML where we'd add a new device type:
<memory type="ram">
  <source .../> <!-- will be elaborated below -->
  <target .../> <!-- will be elaborated below -->
  <address type="acpi-dimm" slot="1"/>
</memory>
type="ram" denotes that we are adding RAM. This leaves room for possible
extensions, for example a generic pflash (type="flash") or ROM
(type="rom") device, or other memory types mapped into the address space.
To enable this infrastructure, qemu needs two extra command line options:
one setting the maximum amount of supportable memory, the other the
maximum number of memory modules (capped at 256 due to ACPI).
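For reference, qemu's side of this (around qemu 2.1) would look roughly
as follows; this is a sketch, with the ids and sizes invented purely for
illustration:

```shell
# Boot with 512 MiB present, 16 DIMM slots, and a 1 TiB ceiling
qemu-system-x86_64 \
    -m 512M,slots=16,maxmem=1T \
    ...
# Each plugged module is then a backend/device pair, e.g.:
#   -object memory-backend-ram,id=mem1,size=1G \
#   -device pc-dimm,id=dimm1,memdev=mem1
```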
The current XML format for specifying memory looks like this:
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
I'm thinking of adding the following attributes to the memory element:
<memory slots='16' max-memory='1' max-memory-unit='TiB'/>
This would then be updated to the actual size, after summing the sizes of
the memory modules, to:

<memory slots='16' max-memory='1' max-memory-unit='TiB' unit='MiB'>512</memory>
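The summing step could be sketched like this in Python (element and
attribute names follow the proposal above and are not final; the sizes
are invented):

```python
import xml.etree.ElementTree as ET

# Hypothetical domain fragment using the proposed schema: a <memory>
# total plus two plugged memory modules under <devices>.
DOMAIN_XML = """
<domain>
  <memory slots='16' max-memory='1' max-memory-unit='TiB' unit='MiB'>0</memory>
  <devices>
    <memory type='ram'><source type='ram' size='256' unit='MiB'/></memory>
    <memory type='ram'><source type='ram' size='256' unit='MiB'/></memory>
  </devices>
</domain>
"""

def update_memory_total(xml):
    """Rewrite <memory>'s text to the sum of the module sizes (all MiB here)."""
    dom = ET.fromstring(xml)
    total = sum(int(src.get('size'))
                for src in dom.findall('./devices/memory/source'))
    dom.find('memory').text = str(total)
    return dom.find('memory').text

# update_memory_total(DOMAIN_XML) -> '512'
```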
Given that we already have <memory> and <currentMemory> it
feels a little odd to be adding max-memory as an attribute
instead of doing <maxMemory slots='16' unit='TiB'>1</maxMemory>.
This would also allow the possibility of specifying just the line above;
libvirt would then add a memory module holding the whole guest memory.
If a guest has multiple NUMA nodes, this would imply that the single
default memory module spans multiple NUMA nodes. That does not make
much conceptual sense to me. At the very minimum you need one memory
slot per guest NUMA node.
Representing the memory module as a device will then allow us to use the
existing hot(un)plug APIs to do the operations on the actual VM.
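With the module modelled as a device, driving the hotplug through the
python bindings could look something like this (a sketch; the device XML
follows the proposal in this thread and the final schema may differ):

```python
# Proposed DIMM device XML (illustrative values, not a final schema)
DIMM_XML = """
<memory type='ram'>
  <source type='ram' size='500' unit='MiB'/>
  <target model='dimm' node='0'/>
  <address type='acpi-dimm' slot='2'/>
</memory>
"""

def hotplug_dimm(dom):
    """Plug a DIMM into a running domain via the generic attach API."""
    import libvirt  # python-libvirt bindings
    # Apply to both the live guest and its persistent definition
    flags = libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG
    dom.attachDeviceFlags(DIMM_XML, flags)
```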
Ok, that does sort of make sense as a goal.
For the ram memory type, the source and target elements allow specifying
the following options.
For backing the guest with normal memory:
<source type="ram" size="500" unit="MiB" host-node="2"/>
For a hugepage-backed guest:

<source type="hugepage" page-size="2048" count="1024" node="1"/>
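On the qemu side this would presumably map to the file-backed memory
backend, roughly as follows (a sketch; ids, sizes and paths are
invented):

```shell
# 2 MiB pages x 1024 = 2 GiB, bound to host NUMA node 1
#   -object memory-backend-file,id=mem2,size=2G,\
#           mem-path=/dev/hugepages,host-nodes=1,policy=bind \
#   -device pc-dimm,id=dimm2,memdev=mem2
```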
This design concerns me because it seems to add a lot of redundant
information versus the existing XML schema work we've done to represent
NUMA placement / huge page allocation for VMs.
Note: the node attribute targets a host NUMA node and is optional.
And possibly others, for example for the rom/flash types:
<source type="file" path="/asdf"/>
For targeting the RAM module, the target element could have the
following format:
<target model="dimm" node='2' address='0xdeadbeef'/>
"node" determines the guest NUMA node to connect the memory "module" to.
The attribute is optional; for non-NUMA guests, node 0 is assumed.
If I'm thinking about this from a physical hardware POV, it doesn't
make a whole lot of sense for the NUMA node to be configurable at
the time you plug in the DIMM. The NUMA affinity is a property of
how the slot is wired into the memory controller. Plugging the DIMM
cannot change that.
So from that POV, I'd say that when we initially configure the
NUMA / huge page information for a guest at boot time, we should
be doing that with respect to the 'maxMemory' size, instead of the
current 'memory' size. I.e. the actual NUMA topology is all set up
upfront even though the DIMMs are not present for some of this topology.
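In other words, something along these lines at boot time (a sketch using
the existing <numa> schema; the CPU ranges and sizes are invented): the
cells together cover the 1 TiB maxMemory even though only some of the
DIMM slots are populated initially:

```
<cpu>
  <numa>
    <cell cpus='0-3' memory='536870912'/> <!-- node 0: 512 GiB (KiB units) -->
    <cell cpus='4-7' memory='536870912'/> <!-- node 1: 512 GiB (KiB units) -->
  </numa>
</cpu>
```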
"address" determines the address in the guest's memory space where the
memory will be mapped. This is optional and not recommended to be set by
the user (except for special cases).
As a future expansion, a model="pflash" device may be added.
For migration, the target VM needs to be started with the hotplugged
modules already specified on the command line, which is in line with how
we treat devices currently.
My suggestion above contrasts with the approach Michal and Martin took
when adding the NUMA and hugepage backing capabilities, as theirs
describes a node while this describes the memory device beneath it. I
think those two approaches can co-exist whilst being mutually exclusive:
when using memory hotplug, the memory will need to be specified using
memory modules; non-hotplug guests could use the approach defined
originally.
I don't think it is viable to have two different approaches for configuring
NUMA / huge page information. Apps should not have to change the way they
configure NUMA/hugepages when they decide they want to take advantage of
DIMM hotplug.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|