On Mon, Feb 09, 2015 at 08:38:52 -0500, John Ferlan wrote:
On 01/30/2015 08:20 AM, Peter Krempa wrote:
> !! this series applies on top of the cleanup series posted earlier !!
>
> Hi,
> this is my try to implement memory hotplug in libvirt (for the qemu) driver.
> This series is almost code-complete but is lacking tests and docs as I wanted
> to agree on design first before attempting to write the documentation.
>
> Additionally this series is also lacking code that would fix memballoon
> handling but I'm waiting on a clousure how it's supposed to work from
qemu's
> side as it appears to be broken there too.
>
> The basic XML used to add a memory device:
> <memory model='acpi-dimm'>
> <target>
> <size unit='KiB'>524287</size>
> <node>0</node>
> </target>
> </memory>
>
> The <target> subelement is mandatory, whereas the <source> subelement
(that
> contains source numa nodes, source page size ) is optional and is inferred
> from the NUMA tuning for given target numa node.
>
> Please note that at least one guest numa node has to be configured for the
> guest for this to work (limitation of qemu).
>
Been kind of back-burner thinking about this series overall for the last
couple of days. After reading Daniel's review - I think he touched on a
few things I was pondering. Since I haven't followed the upstream QEMU
work on this - I assume it's following that work very closely. I also
assume other hypervisor's would/could be interested in providing similar
functionality (hot plug memory) - I'm left wondering - is the model
chosen flexible enough for that? You certainly will need a "healthy
dose" of documentation to describe how to configure values properly.
Perhaps even an extra utility to output what's available to be provided
for the XML.
The one area that seems to be very QEMU implementation specific is the
"<address type='acpi-dimm' slot='#'
base='0xHEX'/>" XML. Maybe that
becomes a "<driver name='qemu' slot='#'
base='0xHEX'/>" instead leaving
"<address type='acpi-dimm'/>" or even empty? Allowing users to
provide
"base" implies either extra validation to ensure it's 'correct'
(format
& usage/duplication) or documentation to describe what/how to provide
the data. Perhaps even an extra utility to provide "valid values". If
Firstly, the address is by default generated automatically by qemu. We
represent it only for migration purposes so that the address doesn't
change in that case. The user ideally shouldn't provide any address and
let the hypervisor assign the module itself. That is the main reason I
chose to do it via the <address> element. Users usually don't touch
those while having the slot and base in the device itself would likely
incline the users to provide some values.
We can also make the address type qemu specific, but I still think that
the right place for the slot and base data is the address element.
Secondly, for validating of the actual data, we use qemu. The data is
passed through to the emulator and if it is incorrect it will either be
tweaked to the actual value used or rejected. We could (and I probably
will add this check in the next version) check the slot ID as it has to
be less than the maximum count configured at startup.
For the address qemu is doing alignment and other checks that I don't
want to represent in libvirt as it would be mostly black magic and
require us to mirror qemu. Basically the user shouldn't touch the base
address ... ever.
QEMU is expecting specific values - maybe it'd be in our best
interests
not to expect the values. I suppose I'm having a difficult time
visualizing the values and what they express. Also what if someone
attempts to "poor man's clone" a live guest (dumpxml > file, edit file
to change name, remove/change uuid, change disk source data, etc., then
start). If they don't change those addresses - what will happen? I
That is actually exactly what they should do. The address represents the
start address in the physical address space in the guest where the
memory module is placed at thus it can be duplicated.
would think it would be "expected of" libvirt to ensure
what's passed to
QEMU is "uniquely expressed" anyway, right? What's still not clear to me
is the relationship between 'acpi-dimm' and NUMA slots/nodes.
FWIW: As an aside, after patch8, it would be beneficial to update
qemuxml2argv-memory-hotplug.args in order to include the expected output
of the "pc-dimm,node=%d,memdev=mem%s,id=%s" and "-object xxx -device
xxxx"...
Tests are still missing for this series. I'll be tweaking them before
the non-RFC posting.
A tangent... When I see 'acpi-' in the XML - I think firmware or some
external memory device which firmware needs to find/recognize... Long
Well it actually is this case. If a piece of the unused physicall
address space would become additional memory, then without any
interaction between the hardware (in this case qemu) and the guest
operating system the kernel wouldn't be able to guess that some memory
appeared and is free to use. ACPI in this case is used to notify the
guest kernel of the new memory that appeared.
ago I worked on a project that used a memory device as a cluster
interconnect (memory channel). A "practical use" in a virtual world
could be as an interconnect for migrations. This is what I thought of
when I saw a <memory> "device" with 'acpi-...' as opposed to a
memory
device that's providing host NUMA nodes/slots to the guest as extra
available hot-plug memory.
That sounds more like RDMA-migration to me.
John
Peter