Re: [libvirt] [RFC PATCH 00/12] Add support for memory hotplug

Monday, 9 February 2015

On Mon, Feb 09, 2015 at 08:38:52 -0500, John Ferlan wrote:
...
 On 01/30/2015 08:20 AM, Peter Krempa wrote:
 > !! this series applies on top of the cleanup series posted earlier !!
 > 
 > Hi,
 > this is my try to implement memory hotplug in libvirt (for the qemu) driver.
 > This series is almost code-complete but is lacking tests and docs as I wanted
 > to agree on design first before attempting to write the documentation.
 > 
 > Additionally this series is also lacking code that would fix memballoon
 > handling but I'm waiting on a clousure how it's supposed to work from
qemu's
 > side as it appears to be broken there too.
 > 
 > The basic XML used to add a memory device:
 >     <memory model='acpi-dimm'>
 >       <target>
 >         <size unit='KiB'>524287</size>
 >         <node>0</node>
 >       </target>
 >     </memory>
 > 
 > The <target> subelement is mandatory, whereas the <source> subelement
(that
 > contains source numa nodes, source page size ) is optional and is inferred
 > from the NUMA tuning for given target numa node.
 > 
 > Please note that at least one guest numa node has to be configured for the
 > guest for this to work (limitation of qemu).
 > 

 Been kind of back-burner thinking about this series overall for the last
 couple of days. After reading Daniel's review - I think he touched on a
 few things I was pondering. Since I haven't followed the upstream QEMU
 work on this - I assume it's following that work very closely. I also
 assume other hypervisor's would/could be interested in providing similar
 functionality (hot plug memory) - I'm left wondering - is the model
 chosen flexible enough for that? You certainly will need a "healthy
 dose" of documentation to describe how to configure values properly.
 Perhaps even an extra utility to output what's available to be provided
 for the XML.

 The one area that seems to be very QEMU implementation specific is the
 "<address type='acpi-dimm' slot='#'
base='0xHEX'/>" XML. Maybe that
 becomes a "<driver name='qemu' slot='#'
base='0xHEX'/>" instead leaving
 "<address type='acpi-dimm'/>" or even empty? Allowing users to
provide
 "base" implies either extra validation to ensure it's 'correct'
(format
 & usage/duplication) or documentation to describe what/how to provide
 the data. Perhaps even an extra utility to provide "valid values". If 
Firstly, the address is by default generated automatically by qemu. We
represent it only for migration purposes so that the address doesn't
change in that case. The user ideally shouldn't provide any address and
let the hypervisor assign the module itself. That is the main reason I
chose to do it via the <address> element. Users usually don't touch
those while having the slot and base in the device itself would likely
incline the users to provide some values.

We can also make the address type qemu specific, but I still think that
the right place for the slot and base data is the address element.

Secondly, for validating of the actual data, we use qemu. The data is
passed through to the emulator and if it is incorrect it will either be
tweaked to the actual value used or rejected. We could (and I probably
will add this check in the next version) check the slot ID as it has to
be less than the maximum count configured at startup.

For the address qemu is doing alignment and other checks that I don't
want to represent in libvirt as it would be mostly black magic and
require us to mirror qemu. Basically the user shouldn't touch the base
address ... ever.

...
 QEMU is expecting specific values - maybe it'd be in our best
interests
 not to expect the values. I suppose I'm having a difficult time
 visualizing the values and what they express. Also what if someone
 attempts to "poor man's clone" a live guest (dumpxml > file, edit file
 to change name, remove/change uuid, change disk source data, etc., then
 start). If they don't change those addresses - what will happen?  I 
That is actually exactly what they should do. The address represents the
start address in the physical address space in the guest where the
memory module is placed at thus it can be duplicated.

...
 would think it would be "expected of" libvirt to ensure
what's passed to
 QEMU is "uniquely expressed" anyway, right? What's still not clear to me
 is the relationship between 'acpi-dimm' and NUMA slots/nodes.

 FWIW:  As an aside, after patch8, it would be beneficial to update
 qemuxml2argv-memory-hotplug.args in order to include the expected output
 of the "pc-dimm,node=%d,memdev=mem%s,id=%s" and "-object xxx -device
 xxxx"... 
Tests are still missing for this series. I'll be tweaking them before
the non-RFC posting.

...

 A tangent... When I see 'acpi-' in the XML - I think firmware or some
 external memory device which firmware needs to find/recognize... Long 
Well it actually is this case. If a piece of the unused physicall
address space would become additional memory, then without any
interaction between the hardware (in this case qemu) and the guest
operating system the kernel wouldn't be able to guess that some memory
appeared and is free to use. ACPI in this case is used to notify the
guest kernel of the new memory that appeared.

...
 ago I worked on a project that used a memory device as a cluster
 interconnect (memory channel). A "practical use" in a virtual world
 could be as an interconnect for migrations. This is what I thought of
 when I saw a <memory> "device" with 'acpi-...' as opposed to a
memory
 device that's providing host NUMA nodes/slots to the guest as extra
 available hot-plug memory. 
That sounds more like RDMA-migration to me.

...

 John

Peter

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [RFC PATCH 00/12] Add support for memory hotplug