[libvirt] [PATCH v2 00/10] Introduce NVDIMM support

This is v2 of: https://www.redhat.com/archives/libvir-list/2016-August/msg00055.html diff to v1: - Rebase to the latest HEAD - Introduced relabeling for host side of NVDIMM Michal Privoznik (10): Introduce NVDIMM memory model qemu: Introduce QEMU_CAPS_DEVICE_NVDIMM qemu: Implement NVDIMM conf: Introduce memAccess to <memory/> qemu: Implement memAccess for <memory/> banks security_dac: Label host side of NVDIMM security_selinux: Label host side of NVDIMM security: Introduce internal APIs for memdev labelling secdrivers: Implement memdev relabel APIs qemu_hotplug: Relabel memdev docs/formatdomain.html.in | 41 ++++++-- docs/schemas/domaincommon.rng | 51 ++++++---- src/conf/domain_conf.c | 112 ++++++++++++++++----- src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 4 + src/qemu/qemu_alias.c | 12 ++- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 87 +++++++++++----- src/qemu/qemu_command.h | 2 + src/qemu/qemu_domain.c | 29 ++++-- src/qemu/qemu_hotplug.c | 17 +++- src/security/security_dac.c | 76 ++++++++++++++ src/security/security_driver.h | 9 ++ src/security/security_manager.c | 56 +++++++++++ src/security/security_manager.h | 7 ++ src/security/security_nop.c | 19 ++++ src/security/security_selinux.c | 69 +++++++++++++ src/security/security_stack.c | 38 +++++++ tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + .../qemuxml2argv-hugepages-numa.args | 5 +- .../qemuxml2argv-hugepages-pages.args | 24 ++--- .../qemuxml2argv-hugepages-pages2.args | 8 +- .../qemuxml2argv-hugepages-pages3.args | 4 +- .../qemuxml2argv-hugepages-shared.args | 22 ++-- .../qemuxml2argv-memory-hotplug-dimm-addr.args | 5 +- .../qemuxml2argv-memory-hotplug-dimm.args | 5 +- ...muxml2argv-memory-hotplug-nvdimm-memAccess.args | 26 +++++ ...emuxml2argv-memory-hotplug-nvdimm-memAccess.xml | 49 +++++++++ .../qemuxml2argv-memory-hotplug-nvdimm.args | 25 +++++ .../qemuxml2argv-memory-hotplug-nvdimm.xml | 49 +++++++++ tests/qemuxml2argvtest.c | 6 +- 33 files changed, 743 insertions(+), 123 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml -- 2.8.4

NVDIMM is new type of memory introduced in qemu. The idea is that we have a DIMM module that keeps the data persistent across domain reboots. At the domain XML level, we already have some representation of 'dimm' modules. Long story short, we have <memory/> element that lives under <devices/>. Now, the element even has @model attribute which we can use to introduce new memory type: <memory model='nvdimm'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory> So far, this is just a XML parser/formatter extension. QEMU driver implementation is in the next commit. For more info on NVDIMM visit the following web page: http://pmem.io/ Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 26 ++++-- docs/schemas/domaincommon.rng | 32 ++++--- src/conf/domain_conf.c | 97 ++++++++++++++++------ src/conf/domain_conf.h | 2 + src/qemu/qemu_command.c | 6 ++ src/qemu/qemu_domain.c | 1 + .../qemuxml2argv-memory-hotplug-nvdimm.xml | 49 +++++++++++ 7 files changed, 171 insertions(+), 42 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index bfbb0f2..657df8f 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -6740,6 +6740,15 @@ qemu-kvm -net nic,model=? /dev/null <node>1</node> </target> </memory> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>524287</size> + <node>1</node> + </target> + </memory> </devices> ... </pre> @@ -6747,17 +6756,19 @@ qemu-kvm -net nic,model=? /dev/null <dt><code>model</code></dt> <dd> <p> - Currently only the <code>dimm</code> model is supported in order to - add a virtual DIMM module to the guest. + Select <code>dimm</code> to add a virtual DIMM module to the guest. + Alternatively, <code>nvdimm</code> model adds a Non-Volatile DIMM + module. <span class="since">Since 2.2.0</span> </p> </dd> <dt><code>source</code></dt> <dd> <p> - The optional source element allows to fine tune the source of the - memory used for the given memory device. If the element is not - provided defaults configured via <code>numatune</code> are used. + For model <code>dimm</code> this element is optional and allows to + fine tune the source of the memory used for the given memory device. + If the element is not provided defaults configured via + <code>numatune</code> are used. </p> <p> <code>pagesize</code> can optionally be used to override the default @@ -6770,6 +6781,11 @@ qemu-kvm -net nic,model=? /dev/null <code>nodemask</code> can optionally be used to override the default set of NUMA nodes where the memory would be allocated. </p> + <p> + For model <code>nvdimm</code> this element is mandatory and has a + single child element <code>path</code> which value represents a path + in host that back the nvdimm module in the guest. + </p> </dd> <dt><code>target</code></dt> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 052f28c..e6ad452 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4722,6 +4722,7 @@ <attribute name="model"> <choice> <value>dimm</value> + <value>nvdimm</value> </choice> </attribute> <interleave> @@ -4741,18 +4742,27 @@ <define name="memorydev-source"> <element name="source"> - <interleave> - <optional> - <element name="pagesize"> - <ref name="scaledInteger"/> + <choice> + <group> + <interleave> + <optional> + <element name="pagesize"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="nodemask"> + <ref name="cpuset"/> + </element> + </optional> + </interleave> + </group> + <group> + <element name="path"> + <text/> </element> - </optional> - <optional> - <element name="nodemask"> - <ref name="cpuset"/> - </element> - </optional> - </interleave> + </group> + </choice> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 82876f3..cb5a053 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -840,8 +840,11 @@ VIR_ENUM_DECL(virDomainBlockJob) VIR_ENUM_IMPL(virDomainBlockJob, VIR_DOMAIN_BLOCK_JOB_TYPE_LAST, "", "", "copy", "", "active-commit") -VIR_ENUM_IMPL(virDomainMemoryModel, VIR_DOMAIN_MEMORY_MODEL_LAST, - "", "dimm") +VIR_ENUM_IMPL(virDomainMemoryModel, + VIR_DOMAIN_MEMORY_MODEL_LAST, + "", + "dimm", + "nvdimm") static virClassPtr virDomainObjClass; static virClassPtr virDomainXMLOptionClass; @@ -2345,6 +2348,7 @@ void virDomainMemoryDefFree(virDomainMemoryDefPtr def) if (!def) return; + VIR_FREE(def->path); virBitmapFree(def->sourceNodes); virDomainDeviceInfoClear(&def->info); VIR_FREE(def); @@ -13140,20 +13144,36 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node; - if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, - &def->pagesize, false, false) < 0) - goto cleanup; - - if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { - if (virBitmapParse(nodemask, &def->sourceNodes, - VIR_DOMAIN_CPUMASK_LEN) < 0) + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, + &def->pagesize, false, false) < 0) goto cleanup; - if (virBitmapIsAllClear(def->sourceNodes)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Invalid value of 'nodemask': %s"), nodemask); + if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { + if (virBitmapParse(nodemask, &def->sourceNodes, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(def->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'nodemask': %s"), nodemask); + goto cleanup; + } + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!(def->path = virXPathString("string(./path)", ctxt))) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("path is required for model nvdimm'")); goto cleanup; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } ret = 0; @@ -14541,12 +14561,25 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def, tmp->size != mem->size) continue; - /* source stuff -> match with device */ - if (tmp->pagesize != mem->pagesize) - continue; + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + /* source stuff -> match with device */ + if (tmp->pagesize != mem->pagesize) + continue; - if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) - continue; + if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (STRNEQ(tmp->path, mem->path)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } break; } @@ -21705,23 +21738,35 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, char *bitmap = NULL; int ret = -1; - if (!def->pagesize && !def->sourceNodes) + if (!def->pagesize && !def->sourceNodes && !def->path) return 0; virBufferAddLit(buf, "<source>\n"); virBufferAdjustIndent(buf, 2); - if (def->sourceNodes) { - if (!(bitmap = virBitmapFormat(def->sourceNodes))) - goto cleanup; + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (def->sourceNodes) { + if (!(bitmap = virBitmapFormat(def->sourceNodes))) + goto cleanup; - virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + } + + if (def->pagesize) + virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", + def->pagesize); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virBufferAsprintf(buf, "<path>%s</path>\n", def->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } - if (def->pagesize) - virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", - def->pagesize); - virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</source>\n"); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 8b26724..4e6a9bf 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1939,6 +1939,7 @@ struct _virDomainRNGDef { typedef enum { VIR_DOMAIN_MEMORY_MODEL_NONE, VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ + VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -1947,6 +1948,7 @@ struct _virDomainMemoryDef { /* source */ virBitmapPtr sourceNodes; unsigned long long pagesize; /* kibibytes */ + char *path; /* target */ int model; /* virDomainMemoryModel */ diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 55df23d..a4cc87f 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3393,6 +3393,12 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) break; + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virReportError(VIR_ERR_NO_SUPPORT, "%s", + _("nvdimm not supported yet")); + return NULL; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index efc46f9..28ee81d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5203,6 +5203,7 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, { switch ((virDomainMemoryModel) mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM && mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml new file mode 100644 index 0000000..e932241 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>1267710</memory> + <currentMemory unit='KiB'>1267710</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + <cpu> + <topology sockets='2' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='219136' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + </memory> + </devices> +</domain> -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
NVDIMM is new type of memory introduced in qemu. The idea is that
s/in qemu/into QEMU 2.6/
we have a DIMM module that keeps the data persistent across
s/DIMM/Non Volatile memory/
domain reboots.
Perhaps a word of two about what is the usefulness of such a beast. I think (from a former project) one usage is to store parameters for firmware such as OVMF Add extra line between paragraphs...
At the domain XML level, we already have some representation of 'dimm' modules. Long story short, we have <memory/> element that lives under <devices/>. Now, the element even has @model attribute which we can use to introduce new memory type:
<memory model='nvdimm'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory>
So far, this is just a XML parser/formatter extension. QEMU driver implementation is in the next commit.
For more info on NVDIMM visit the following web page:
One could also google it ;-)... In any case, the actual details are found the Documents link from that page, subject to move over time of course.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 26 ++++-- docs/schemas/domaincommon.rng | 32 ++++--- src/conf/domain_conf.c | 97 ++++++++++++++++------ src/conf/domain_conf.h | 2 + src/qemu/qemu_command.c | 6 ++ src/qemu/qemu_domain.c | 1 + .../qemuxml2argv-memory-hotplug-nvdimm.xml | 49 +++++++++++ 7 files changed, 171 insertions(+), 42 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index bfbb0f2..657df8f 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -6740,6 +6740,15 @@ qemu-kvm -net nic,model=? /dev/null <node>1</node> </target> </memory> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>524287</size> + <node>1</node> + </target> + </memory> </devices> ... </pre> @@ -6747,17 +6756,19 @@ qemu-kvm -net nic,model=? /dev/null <dt><code>model</code></dt> <dd> <p> - Currently only the <code>dimm</code> model is supported in order to - add a virtual DIMM module to the guest. + Select <code>dimm</code> to add a virtual DIMM module to the guest.
Ugh... The 'dimm' Since tag is in the "Memory devices" section... Which just make the following awkward. I think you should grab that 1.2.14 from above and place it here. Rather than Select go with "Use" or "Provide"
+ Alternatively, <code>nvdimm</code> model adds a Non-Volatile DIMM
Use the same format - e.g. "Use/Provide <code>nvdimm</code> to add a Non-Volatile DIMM module."
+ module. <span class="since">Since 2.2.0</span>
Well we won't hit 2.2.0....
</p> </dd>
<dt><code>source</code></dt> <dd> <p> - The optional source element allows to fine tune the source of the - memory used for the given memory device. If the element is not - provided defaults configured via <code>numatune</code> are used. + For model <code>dimm</code> this element is optional and allows to + fine tune the source of the memory used for the given memory device. + If the element is not provided defaults configured via + <code>numatune</code> are used. </p> <p> <code>pagesize</code> can optionally be used to override the default @@ -6770,6 +6781,11 @@ qemu-kvm -net nic,model=? /dev/null <code>nodemask</code> can optionally be used to override the default set of NUMA nodes where the memory would be allocated. </p>
Since pagesize and nodemask are for DIMM only, so they probably need to be converted to some sort of list or in some way indented to ensure the visual cue is there that they are meant for dimm. Perhaps the end of the dimm paragraph needs a "If <code>source</code> is provided, then at least one of the following values would be provided:". I think the only way to get the right formatting look is : <dd> <dt><code>pagesize</code></dt> <p> ... </p> <dt><code>nodemask</code></dt> <p> ... </p> </dd>
+ <p> + For model <code>nvdimm</code> this element is mandatory and has a + single child element <code>path</code> which value represents a path
s/which value/that/ ?
+ in host that back the nvdimm module in the guest.
s/in host that back/in the host that backs/ Should there be any mention that this also requires "<maxMemory slots='#'...>" to be set? So something that isn't quite clear... For 'dimm', these are 'extra' memory, while for 'nvdimm 'it seems to be one hunk that's really not extra memory - rather it's memory that can be used for a specific purpose. What's not clear to me - is the existing "physical" memory in the guest (e.g. the numa node page) is used to map to the file or does this appear as extra memory that is only accessible for this purpose? Also, does the numa node have to be properly sized? Your example shows a 214Mb numa cell and a 511MB nvdimm. Doesn't seem to me that fits quite right. BTW: That video you point to in your v1 indicates two types of NVDIMM - "persistent memory namespace" (accessed using loads/stores, mapped to physical memory) and "block mode namespace" (accessed via block operations, atomicity at block level, indirect mapping thru a block window, no mapping entier memory. So which is KVM? Since libvirt must be the master to more than one driver - should there be some sort of subtype="block|persistent" as well? Or we could just indicate KVM only for now...
+ </p> </dd>
<dt><code>target</code></dt> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 052f28c..e6ad452 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4722,6 +4722,7 @@ <attribute name="model"> <choice> <value>dimm</value> + <value>nvdimm</value> </choice> </attribute> <interleave> @@ -4741,18 +4742,27 @@
<define name="memorydev-source"> <element name="source"> - <interleave> - <optional> - <element name="pagesize"> - <ref name="scaledInteger"/> + <choice> + <group> + <interleave> + <optional> + <element name="pagesize"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="nodemask"> + <ref name="cpuset"/> + </element> + </optional> + </interleave> + </group> + <group> + <element name="path"> + <text/> </element> - </optional> - <optional> - <element name="nodemask"> - <ref name="cpuset"/> - </element> - </optional> - </interleave> + </group> + </choice> </element> </define>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 82876f3..cb5a053 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -840,8 +840,11 @@ VIR_ENUM_DECL(virDomainBlockJob) VIR_ENUM_IMPL(virDomainBlockJob, VIR_DOMAIN_BLOCK_JOB_TYPE_LAST, "", "", "copy", "", "active-commit")
-VIR_ENUM_IMPL(virDomainMemoryModel, VIR_DOMAIN_MEMORY_MODEL_LAST, - "", "dimm") +VIR_ENUM_IMPL(virDomainMemoryModel, + VIR_DOMAIN_MEMORY_MODEL_LAST, + "", + "dimm", + "nvdimm")
static virClassPtr virDomainObjClass; static virClassPtr virDomainXMLOptionClass; @@ -2345,6 +2348,7 @@ void virDomainMemoryDefFree(virDomainMemoryDefPtr def) if (!def) return;
+ VIR_FREE(def->path); virBitmapFree(def->sourceNodes); virDomainDeviceInfoClear(&def->info); VIR_FREE(def); @@ -13140,20 +13144,36 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node;
- if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, - &def->pagesize, false, false) < 0) - goto cleanup; - - if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { - if (virBitmapParse(nodemask, &def->sourceNodes, - VIR_DOMAIN_CPUMASK_LEN) < 0) + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, + &def->pagesize, false, false) < 0) goto cleanup;
- if (virBitmapIsAllClear(def->sourceNodes)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Invalid value of 'nodemask': %s"), nodemask); + if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { + if (virBitmapParse(nodemask, &def->sourceNodes, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(def->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'nodemask': %s"), nodemask); + goto cleanup; + } + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!(def->path = virXPathString("string(./path)", ctxt))) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("path is required for model nvdimm'")); goto cleanup; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; }
ret = 0; @@ -14541,12 +14561,25 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def, tmp->size != mem->size) continue;
- /* source stuff -> match with device */ - if (tmp->pagesize != mem->pagesize) - continue; + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + /* source stuff -> match with device */ + if (tmp->pagesize != mem->pagesize) + continue;
- if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) - continue; + if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (STRNEQ(tmp->path, mem->path)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + }
break; } @@ -21705,23 +21738,35 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, char *bitmap = NULL; int ret = -1;
- if (!def->pagesize && !def->sourceNodes) + if (!def->pagesize && !def->sourceNodes && !def->path) return 0;
virBufferAddLit(buf, "<source>\n"); virBufferAdjustIndent(buf, 2);
- if (def->sourceNodes) { - if (!(bitmap = virBitmapFormat(def->sourceNodes))) - goto cleanup; + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (def->sourceNodes) { + if (!(bitmap = virBitmapFormat(def->sourceNodes))) + goto cleanup;
- virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + } + + if (def->pagesize) + virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", + def->pagesize); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virBufferAsprintf(buf, "<path>%s</path>\n", def->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; }
- if (def->pagesize) - virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", - def->pagesize); - virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</source>\n");
So do you feel there is a need for a check for different path in virDomainMemoryDefCheckABIStability? I know it's not a target thing, but the target/guest uses it by mapping directly... Should there be checks for order? That is src has dimm, dimm, dimm, nvdim while dst has nvdimm, dimm, dimm, dimm? Would that matter? After going through patch 3, I'm beginning to think it wouldn't be such a good idea to keep nvdimm's and dimm's in the same nmems list. I think they might just be too different. I also recall some algorithm somewhere which made sure addresses and sizes used fit "properly"... Searching on DIMM in cscope brings up a few more references and questions. And well - what about migration?
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 8b26724..4e6a9bf 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1939,6 +1939,7 @@ struct _virDomainRNGDef { typedef enum { VIR_DOMAIN_MEMORY_MODEL_NONE, VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ + VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -1947,6 +1948,7 @@ struct _virDomainMemoryDef { /* source */ virBitmapPtr sourceNodes; unsigned long long pagesize; /* kibibytes */ + char *path;
/* target */ int model; /* virDomainMemoryModel */ diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 55df23d..a4cc87f 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3393,6 +3393,12 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem)
break;
+ case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virReportError(VIR_ERR_NO_SUPPORT, "%s", + _("nvdimm not supported yet")); + return NULL; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index efc46f9..28ee81d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5203,6 +5203,7 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, { switch ((virDomainMemoryModel) mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
So the assumption is that NVDIMM will fail the following check? I would think it would want it's own error message like above "not supported yet". Something fortified by the patch 3 code which alters the MemoryDeviceStr to be "pc-dimm" or "nvdimm"...
if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM && mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml new file mode 100644 index 0000000..e932241 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>1267710</memory> + <currentMemory unit='KiB'>1267710</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + <cpu> + <topology sockets='2' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='219136' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + </memory> + </devices> +</domain>
No xml2xml tests? I know there's some "newer" way to use file links from tests/qemuxml2xmloutdata to ../qemuxml2argvdata/. IIRC when I did something similar I had to create the link and git add the link... Ah yes, look at the '*luks-disks* - commit id '9bbf0d7e'. But now you've already done it for the rxqsz patch too, so you're the new expert! John

On 01.09.2016 00:51, John Ferlan wrote:
On 08/11/2016 09:26 AM, Michal Privoznik wrote:
NVDIMM is new type of memory introduced in qemu. The idea is that
s/in qemu/into QEMU 2.6/
we have a DIMM module that keeps the data persistent across
s/DIMM/Non Volatile memory/
domain reboots.
Perhaps a word of two about what is the usefulness of such a beast. I think (from a former project) one usage is to store parameters for firmware such as OVMF
Add extra line between paragraphs...
At the domain XML level, we already have some representation of 'dimm' modules. Long story short, we have <memory/> element that lives under <devices/>. Now, the element even has @model attribute which we can use to introduce new memory type:
<memory model='nvdimm'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory>
So far, this is just a XML parser/formatter extension. QEMU driver implementation is in the next commit.
For more info on NVDIMM visit the following web page:
One could also google it ;-)... In any case, the actual details are found the Documents link from that page, subject to move over time of course.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 26 ++++-- docs/schemas/domaincommon.rng | 32 ++++--- src/conf/domain_conf.c | 97 ++++++++++++++++------ src/conf/domain_conf.h | 2 + src/qemu/qemu_command.c | 6 ++ src/qemu/qemu_domain.c | 1 + .../qemuxml2argv-memory-hotplug-nvdimm.xml | 49 +++++++++++ 7 files changed, 171 insertions(+), 42 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index bfbb0f2..657df8f 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -6740,6 +6740,15 @@ qemu-kvm -net nic,model=? /dev/null <node>1</node> </target> </memory> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>524287</size> + <node>1</node> + </target> + </memory> </devices> ... </pre> @@ -6747,17 +6756,19 @@ qemu-kvm -net nic,model=? /dev/null <dt><code>model</code></dt> <dd> <p> - Currently only the <code>dimm</code> model is supported in order to - add a virtual DIMM module to the guest. + Select <code>dimm</code> to add a virtual DIMM module to the guest.
Ugh... The 'dimm' Since tag is in the "Memory devices" section... Which just make the following awkward. I think you should grab that 1.2.14 from above and place it here.
Rather than Select go with "Use" or "Provide"
+ Alternatively, <code>nvdimm</code> model adds a Non-Volatile DIMM
Use the same format - e.g.
"Use/Provide <code>nvdimm</code> to add a Non-Volatile DIMM module."
+ module. <span class="since">Since 2.2.0</span>
Well we won't hit 2.2.0....
</p> </dd>
<dt><code>source</code></dt> <dd> <p> - The optional source element allows to fine tune the source of the - memory used for the given memory device. If the element is not - provided defaults configured via <code>numatune</code> are used. + For model <code>dimm</code> this element is optional and allows to + fine tune the source of the memory used for the given memory device. + If the element is not provided defaults configured via + <code>numatune</code> are used. </p> <p> <code>pagesize</code> can optionally be used to override the default @@ -6770,6 +6781,11 @@ qemu-kvm -net nic,model=? /dev/null <code>nodemask</code> can optionally be used to override the default set of NUMA nodes where the memory would be allocated. </p>
Since pagesize and nodemask are for DIMM only, so they probably need to be converted to some sort of list or in some way indented to ensure the visual cue is there that they are meant for dimm. Perhaps the end of the dimm paragraph needs a "If <code>source</code> is provided, then at least one of the following values would be provided:".
I think the only way to get the right formatting look is :
<dd> <dt><code>pagesize</code></dt> <p> ... </p> <dt><code>nodemask</code></dt> <p> ... </p> </dd>
+ <p> + For model <code>nvdimm</code> this element is mandatory and has a + single child element <code>path</code> which value represents a path
s/which value/that/ ?
+ in host that back the nvdimm module in the guest.
s/in host that back/in the host that backs/
Should there be any mention that this also requires "<maxMemory slots='#'...>" to be set?
So something that isn't quite clear... For 'dimm', these are 'extra' memory, while for 'nvdimm 'it seems to be one hunk that's really not extra memory - rather it's memory that can be used for a specific purpose. What's not clear to me - is the existing "physical" memory in the guest (e.g. the numa node page) is used to map to the file or does this appear as extra memory that is only accessible for this purpose?
Kernel expose this new memory as a block device. As
Also, does the numa node have to be properly sized? Your example shows a 214Mb numa cell and a 511MB nvdimm. Doesn't seem to me that fits quite right.
This should be okay, It's a new memory that is not accounted as RAM (it's a block device that guest sees).
BTW: That video you point to in your v1 indicates two types of NVDIMM - "persistent memory namespace" (accessed using loads/stores, mapped to physical memory) and "block mode namespace" (accessed via block operations, atomicity at block level, indirect mapping thru a block window, no mapping entier memory.
In both cases the device is shown as block device, but the "block mode" is not supported even in qemu. And I doubt it will ever be.
So which is KVM? Since libvirt must be the master to more than one driver - should there be some sort of subtype="block|persistent" as well? Or we could just indicate KVM only for now...
If we are ever gonna implement the block mode, we can introduce an attribute for which "persistent memory mode" is the default.
+ </p> </dd>
<dt><code>target</code></dt>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 82876f3..cb5a053 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c
@@ -21705,23 +21738,35 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, char *bitmap = NULL; int ret = -1;
- if (!def->pagesize && !def->sourceNodes) + if (!def->pagesize && !def->sourceNodes && !def->path) return 0;
virBufferAddLit(buf, "<source>\n"); virBufferAdjustIndent(buf, 2);
- if (def->sourceNodes) { - if (!(bitmap = virBitmapFormat(def->sourceNodes))) - goto cleanup; + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (def->sourceNodes) { + if (!(bitmap = virBitmapFormat(def->sourceNodes))) + goto cleanup;
- virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + } + + if (def->pagesize) + virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", + def->pagesize); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virBufferAsprintf(buf, "<path>%s</path>\n", def->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; }
- if (def->pagesize) - virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", - def->pagesize); - virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</source>\n");
So do you feel there is a need for a check for different path in virDomainMemoryDefCheckABIStability? I know it's not a target thing, but the target/guest uses it by mapping directly...
From my understanding qemu will copy the content of NVDIMM over to the destination, therefore we are safe with backend changing its path.
Should there be checks for order? That is src has dimm, dimm, dimm, nvdim while dst has nvdimm, dimm, dimm, dimm? Would that matter?
No, the order shouldn't matter. See my next answer for explanation.
After going through patch 3, I'm beginning to think it wouldn't be such a good idea to keep nvdimm's and dimm's in the same nmems list. I think they might just be too different. I also recall some algorithm somewhere which made sure addresses and sizes used fit "properly"... Searching on DIMM in cscope brings up a few more references and questions.
So, I've just discussed this with Peter who implemented the DIMM (aka mem hotplug) feature. In this case, libvirt tries to not duplicate code that already lives in qemu. So how this works is: on domain startup libvirt just puts the dimms (and after my patches nvdimms as well) onto the qemu cmd line just in the order as in the XML. Then, when qemu is started its magic algorithm assigns addresses to all the modules (so that they don't overlap) and libvirt just queries this information on the monitor. And it is only in case of migration when we explicitly put the addresses onto qemu's cmd line (to preserve ABI stability). On the next run of domain (I mean hard stop & start => new qemu process), the process repeats itself again. So addresses can be possibly different, but I'd say this is okay. For two reasons: even in real hardware if you switch two devices (e.g. DIMM modules, disks, ...) you'll boot into different environment (e.g. disks gets renamed, what used to be sda is now sdb). Secondly, this NVDIMM is meant to be used as a sync mechanism between host & guest. Therefore I think DIMMs and NVDIMMs are very similar and can be kept in the same internal list. At least for now. If we ever discover great difference, we can put them into separate lists - after all it's an internal list and we can change it whenever we please.
And well - what about migration?
Michal

On Thu, Sep 08, 2016 at 03:50:40PM +0200, Michal Privoznik wrote:
After going through patch 3, I'm beginning to think it wouldn't be such a good idea to keep nvdimm's and dimm's in the same nmems list. I think they might just be too different. I also recall some algorithm somewhere which made sure addresses and sizes used fit "properly"... Searching on DIMM in cscope brings up a few more references and questions.
So, I've just discussed this with Peter who implemented the DIMM (aka mem hotplug) feature. In this case, libvirt tries to not duplicate code that already lives in qemu. So how this works is: on domain startup libvirt just puts the dimms (and after my patches nvdimms as well) onto the qemu cmd line just in the order as in the XML. Then, when qemu is started its magic algorithm assigns addresses to all the modules (so that they don't overlap) and libvirt just queries this information on the monitor. And it is only in case of migration when we explicitly put the addresses onto qemu's cmd line (to preserve ABI stability). On the next run of domain (I mean hard stop & start => new qemu process), the process repeats itself again. So addresses can be possibly different, but I'd say this is okay. For two reasons: even in real hardware if you switch two devices (e.g. DIMM modules, disks, ...) you'll boot into different environment (e.g. disks gets renamed, what used to be sda is now sdb). Secondly, this NVDIMM is meant to be used as a sync mechanism between host & guest.
This is very different to how we deal with addressing for all other types of device in libvirt, where we always assign addresses immediately at time the guest is defined. By only assigning addresses transiently when QEMU starts up, then you're liable to get different addressing even if the DIMM config isn't isn't changed. eg addition of other unrelated devices may alter the address QEMU ends up assigning to the DIMMS. Your arguments in favour of not having persistent addresses aren't really in any way specific to DIMMs, so if we wanted to take that approach, then it would apply to all types of device. So I think we need to fix DIMM address assignment so that it works exactly the same way as we do for all other device addressing needs and assign it up front. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Thu, Sep 08, 2016 at 14:59:07 +0100, Daniel Berrange wrote:
On Thu, Sep 08, 2016 at 03:50:40PM +0200, Michal Privoznik wrote:
[...]
This is very different to how we deal with addressing for all other types of device in libvirt, where we always assign addresses immediately at time the guest is defined. By only assigning addresses transiently when QEMU starts up, then you're liable to get different addressing even if the DIMM config isn't isn't changed. eg addition of other unrelated devices may alter the address QEMU ends up assigning to the DIMMS. Your arguments in favour of not having persistent addresses aren't really in any way specific to DIMMs, so if we wanted to take that approach, then it would apply to all types of device. So I think we need to fix DIMM address assignment so that it works exactly the same way as we do for all other device addressing needs and assign it up front.
Well, we certainly can assign the slot numbers (I'm going to fix this soon as it may create problems). It's nearly impossible for us to assign the memory module base address though. This would require to re-implement all the platform specific code that maps memory in qemu which would be ridiculous. I think that an attempt to do so would most certainly break if not kept in sync with qemu. I still strongly prefer not to attempt to do this and just discover the addresses so that we can keep them across migrations. Peter

On 08.09.2016 16:15, Peter Krempa wrote:
On Thu, Sep 08, 2016 at 14:59:07 +0100, Daniel Berrange wrote:
On Thu, Sep 08, 2016 at 03:50:40PM +0200, Michal Privoznik wrote:
[...]
This is very different to how we deal with addressing for all other types of device in libvirt, where we always assign addresses immediately at time the guest is defined. By only assigning addresses transiently when QEMU starts up, then you're liable to get different addressing even if the DIMM config isn't isn't changed. eg addition of other unrelated devices may alter the address QEMU ends up assigning to the DIMMS. Your arguments in favour of not having persistent addresses aren't really in any way specific to DIMMs, so if we wanted to take that approach, then it would apply to all types of device. So I think we need to fix DIMM address assignment so that it works exactly the same way as we do for all other device addressing needs and assign it up front.
Well, we certainly can assign the slot numbers (I'm going to fix this soon as it may create problems).
It's nearly impossible for us to assign the memory module base address though. This would require to re-implement all the platform specific code that maps memory in qemu which would be ridiculous.
I think that an attempt to do so would most certainly break if not kept in sync with qemu. I still strongly prefer not to attempt to do this and just discover the addresses so that we can keep them across migrations.
Agreed. Let's do the slots and leave the mapping address for hypervisor to assign. Moreover, other hypervisors may implement different algorithms, so if we were to copy what qemu does - would that work with other hypervisors? Michal

On Fri, Sep 09, 2016 at 11:31:07AM +0200, Michal Privoznik wrote:
On 08.09.2016 16:15, Peter Krempa wrote:
On Thu, Sep 08, 2016 at 14:59:07 +0100, Daniel Berrange wrote:
On Thu, Sep 08, 2016 at 03:50:40PM +0200, Michal Privoznik wrote:
[...]
This is very different to how we deal with addressing for all other types of device in libvirt, where we always assign addresses immediately at time the guest is defined. By only assigning addresses transiently when QEMU starts up, then you're liable to get different addressing even if the DIMM config isn't isn't changed. eg addition of other unrelated devices may alter the address QEMU ends up assigning to the DIMMS. Your arguments in favour of not having persistent addresses aren't really in any way specific to DIMMs, so if we wanted to take that approach, then it would apply to all types of device. So I think we need to fix DIMM address assignment so that it works exactly the same way as we do for all other device addressing needs and assign it up front.
Well, we certainly can assign the slot numbers (I'm going to fix this soon as it may create problems).
It's nearly impossible for us to assign the memory module base address though. This would require to re-implement all the platform specific code that maps memory in qemu which would be ridiculous.
I think that an attempt to do so would most certainly break if not kept in sync with qemu. I still strongly prefer not to attempt to do this and just discover the addresses so that we can keep them across migrations.
Agreed. Let's do the slots and leave the mapping address for hypervisor to assign. Moreover, other hypervisors may implement different algorithms, so if we were to copy what qemu does - would that work with other hypervisors?
Agreed, doing the slots is sufficient Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Introduce a qemu capability for -device nvdimm. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + 4 files changed, 5 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 43e3ea7..b1af4e3 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -340,6 +340,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "display", /* 230 */ "intel-iommu", "smm", + "nvdimm", ); @@ -1561,6 +1562,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "pxb-pcie", QEMU_CAPS_DEVICE_PXB_PCIE }, { "tls-creds-x509", QEMU_CAPS_OBJECT_TLS_CREDS_X509 }, { "intel-iommu", QEMU_CAPS_DEVICE_INTEL_IOMMU }, + { "nvdimm", QEMU_CAPS_DEVICE_NVDIMM }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBalloon[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index d249e2e..a88ab75 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -373,6 +373,7 @@ typedef enum { QEMU_CAPS_DISPLAY, /* -display */ QEMU_CAPS_DEVICE_INTEL_IOMMU, /* -device intel-iommu */ QEMU_CAPS_MACHINE_SMM_OPT, /* -machine xxx,smm=on/off/auto */ + QEMU_CAPS_DEVICE_NVDIMM, /* -device nvdimm */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml index 495c114..b2a82c1 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml @@ -195,6 +195,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='nvdimm'/> <version>2006000</version> <kvmVersion>0</kvmVersion> <package></package> diff --git a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml index fafffa6..1e4ca33 100644 --- a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml @@ -192,6 +192,7 @@ <flag name='display'/> <flag name='intel-iommu'/> <flag name='smm'/> + <flag name='nvdimm'/> <version>2006091</version> <kvmVersion>0</kvmVersion> <package> (v2.7.0-rc1-52-g42e0d60)</package> -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
Introduce a qemu capability for -device nvdimm.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + 4 files changed, 5 insertions(+)
ACK with the obvious merge to top of tree pending... John

So, majority of the code is just ready as-is. Well, with one slight change: differentiate between dimm and nvdimm in places like device alias generation, generating the command line and so on. Speaking of the command line, we also need to append 'nvdimm=on' to the '-machine' argument so that the nvdimm feature is advertised in the ACPI tables properly. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 12 ++- src/qemu/qemu_command.c | 90 ++++++++++++++-------- src/qemu/qemu_command.h | 1 + src/qemu/qemu_domain.c | 28 +++++-- src/qemu/qemu_hotplug.c | 2 +- .../qemuxml2argv-hugepages-numa.args | 5 +- .../qemuxml2argv-hugepages-pages.args | 24 +++--- .../qemuxml2argv-hugepages-pages2.args | 8 +- .../qemuxml2argv-hugepages-pages3.args | 4 +- .../qemuxml2argv-hugepages-shared.args | 22 +++--- .../qemuxml2argv-memory-hotplug-dimm-addr.args | 5 +- .../qemuxml2argv-memory-hotplug-dimm.args | 5 +- .../qemuxml2argv-memory-hotplug-nvdimm.args | 25 ++++++ tests/qemuxml2argvtest.c | 4 +- 14 files changed, 155 insertions(+), 80 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 0102c96..517dca1 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -339,13 +339,19 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, size_t i; int maxidx = 0; int idx; + const char *prefix; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM) + prefix = "dimm"; + else + prefix = "nvdimm"; for (i = 0; i < def->nmems; i++) { - if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, "dimm")) >= maxidx) + if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, prefix)) >= maxidx) maxidx = idx + 1; } - if (virAsprintf(&mem->info.alias, "dimm%d", maxidx) < 0) + if (virAsprintf(&mem->info.alias, "%s%d", prefix, maxidx) < 0) return -1; return 0; @@ -445,7 +451,7 @@ qemuAssignDeviceAliases(virDomainDefPtr def, virQEMUCapsPtr qemuCaps) return -1; } for (i = 0; i < def->nmems; i++) { - if (virAsprintf(&def->mems[i]->info.alias, "dimm%zu", i) < 0) + if (qemuAssignDeviceMemoryAlias(def, def->mems[i]) < 0) return -1; } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a4cc87f..6b83d1c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3061,6 +3061,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * to, or -1 if NUMA is not used in the guest * @hostNodes: map of host nodes to alloc the memory in, NULL for default * @autoNodeset: fallback nodeset in case of automatic numa placement + * @memPath: request memory-backend-file with specific mem-path * @def: domain definition object * @qemuCaps: qemu capabilities object * @cfg: qemu driver config object @@ -3082,6 +3083,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, @@ -3173,35 +3175,42 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1; - if (pagesize || hugepage) { - if (pagesize) { - /* Now lets see, if the huge page we want to use is even mounted - * and ready to use */ - for (i = 0; i < cfg->nhugetlbfs; i++) { - if (cfg->hugetlbfs[i].size == pagesize) - break; - } + if (memPath || pagesize || hugepage) { + if (pagesize || hugepage) { + if (pagesize) { + /* Now lets see, if the huge page we want to use is even mounted + * and ready to use */ + for (i = 0; i < cfg->nhugetlbfs; i++) { + if (cfg->hugetlbfs[i].size == pagesize) + break; + } - if (i == cfg->nhugetlbfs) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Unable to find any usable hugetlbfs mount for %llu KiB"), - pagesize); - goto cleanup; - } + if (i == cfg->nhugetlbfs) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find any usable hugetlbfs mount for %llu KiB"), + pagesize); + goto cleanup; + } - if (!(mem_path = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) - goto cleanup; - } else { - if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, - cfg->nhugetlbfs))) - goto cleanup; + if (!(mem_path = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) + goto cleanup; + } else { + if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, + cfg->nhugetlbfs))) + goto cleanup; + } } *backendType = "memory-backend-file"; if (virJSONValueObjectAdd(props, + "s:mem-path", memPath ? memPath : mem_path, + NULL) < 0) + goto cleanup; + + if (!memPath && (pagesize || hugepage) && + virJSONValueObjectAdd(props, "b:prealloc", true, - "s:mem-path", mem_path, NULL) < 0) goto cleanup; @@ -3253,7 +3262,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, } /* If none of the following is requested... */ - if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) { + if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force && !memPath) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; @@ -3309,7 +3318,7 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, goto cleanup; if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset, - def, qemuCaps, cfg, &backendType, + NULL, def, qemuCaps, cfg, &backendType, &props, false)) < 0) goto cleanup; @@ -3351,7 +3360,7 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, auto_nodeset, - def, qemuCaps, cfg, + mem->path, def, qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; @@ -3369,6 +3378,7 @@ char * qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) { virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *device; if (!mem->info.alias) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -3377,8 +3387,15 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) } switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: - virBufferAddLit(&buf, "pc-dimm,"); + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM) + device = "pc-dimm"; + else + device = "nvdimm"; + + virBufferAsprintf(&buf, "%s,", device); if (mem->targetNode >= 0) virBufferAsprintf(&buf, "node=%d,", mem->targetNode); @@ -3393,12 +3410,6 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) break; - case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: - virReportError(VIR_ERR_NO_SUPPORT, "%s", - _("nvdimm not supported yet")); - return NULL; - break; - case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; @@ -6938,6 +6949,7 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, { virBuffer buf = VIR_BUFFER_INITIALIZER; bool obsoleteAccel = false; + size_t i; int ret = -1; /* This should *never* be NULL, since we always provide @@ -6974,6 +6986,15 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, "with this QEMU binary")); return -1; } + + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm not is not available " + "with this QEMU binary")); + return -1; + } + } } else { virTristateSwitch vmport = def->features[VIR_DOMAIN_FEATURE_VMPORT]; virTristateSwitch smm = def->features[VIR_DOMAIN_FEATURE_SMM]; @@ -7070,6 +7091,13 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, } } + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virBufferAddLit(&buf, ",nvdimm=on"); + break; + } + } + virCommandAddArgBuffer(cmd, &buf); } diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index dcf9ba6..003a5d7 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -118,6 +118,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 28ee81d..6b049e7 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5289,12 +5289,6 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, return 0; } - if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("memory hotplug isn't supported by this QEMU binary")); - return -1; - } - if (!ARCH_IS_PPC64(def->os.arch)) { /* due to guest support, qemu would silently enable NUMA with one node * once the memory hotplug backend is enabled. To avoid possible @@ -5318,6 +5312,28 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, for (i = 0; i < def->nmems; i++) { hotplugMemory += def->mems[i]->size; + switch ((virDomainMemoryModel) def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("memory hotplug isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_NVDIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } + /* already existing devices don't need to be checked on hotplug */ if (!mem && qemuDomainDefValidateMemoryHotplugDevice(def->mems[i], def) < 0) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 00e4a75..bf22b0a 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1878,7 +1878,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, NULL, - vm->def, priv->qemuCaps, cfg, + mem->path, vm->def, priv->qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args index 2eb006e..dd12751 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args @@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=spice \ -mem-prealloc \ -mem-path /dev/hugepages2M/libvirt/qemu \ -numa node,nodeid=0,cpus=0-1,mem=1024 \ --object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages1G/libvirt/qemu,\ +prealloc=yes,size=1073741824,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \ -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \ -nodefaults \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args index 9f0e696..2a196ab 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args @@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ --object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node2,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ --object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ -policy=bind \ +-object memory-backend-file,id=ram-node3,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=3,policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index 447bb52..30f87a8 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -10,11 +10,11 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args index 57dd3fa..92045a0 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args @@ -12,8 +12,8 @@ QEMU_AUDIO_DRV=none \ -smp 2,sockets=2,cores=1,threads=1 \ -object memory-backend-ram,id=ram-node0,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=805306368 \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args index f9fc218..aaa9e99 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args @@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,share=yes,size=1073741824,\ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,share=yes,size=1073741824,\ host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ --object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,share=no,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node2,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,share=no,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ --object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ -policy=bind \ +-object memory-backend-file,id=ram-node3,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=3,policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args index 1c881c6..ea46c82 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args @@ -11,9 +11,8 @@ QEMU_AUDIO_DRV=none \ -m size=219136k,slots=16,maxmem=1099511627776k \ -smp 2,sockets=2,cores=1,threads=1 \ -numa node,nodeid=0,cpus=0-1,mem=214 \ --object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages2M/libvirt/qemu,\ +prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0,addr=4294967296 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args index fa64fcf..dc58614 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args @@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=none \ -numa node,nodeid=0,cpus=0-1,mem=214 \ -object memory-backend-ram,id=memdimm0,size=536870912 \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \ --object memory-backend-file,id=memdimm1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm1,mem-path=/dev/hugepages2M/libvirt/qemu,\ +prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm1,id=dimm1 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args new file mode 100644 index 0000000..8cda774 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args @@ -0,0 +1,25 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,nvdimm=on \ +-m size=219136k,slots=16,maxmem=1099511627776k \ +-smp 2,sockets=2,cores=1,threads=1 \ +-numa node,nodeid=0,cpus=0-1,mem=214 \ +-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,size=536870912 \ +-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index ad0693f..1995ccc 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1977,7 +1977,7 @@ mymain(void) DO_TEST_FAILURE("memory-align-fail", NONE); DO_TEST_FAILURE("memory-hotplug-nonuma", QEMU_CAPS_DEVICE_PC_DIMM); - DO_TEST_FAILURE("memory-hotplug", NONE); + DO_TEST("memory-hotplug", NONE); DO_TEST("memory-hotplug", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA); DO_TEST("memory-hotplug-dimm", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); @@ -1985,6 +1985,8 @@ mymain(void) QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("memory-hotplug-ppc64-nonuma", QEMU_CAPS_KVM, QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); + DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, + QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP, -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
So, majority of the code is just ready as-is. Well, with one slight change: differentiate between dimm and nvdimm in places like device alias generation, generating the command line and so on.
Speaking of the command line, we also need to append 'nvdimm=on' to the '-machine' argument so that the nvdimm feature is advertised in the ACPI tables properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 12 ++- src/qemu/qemu_command.c | 90 ++++++++++++++-------- src/qemu/qemu_command.h | 1 + src/qemu/qemu_domain.c | 28 +++++-- src/qemu/qemu_hotplug.c | 2 +- .../qemuxml2argv-hugepages-numa.args | 5 +- .../qemuxml2argv-hugepages-pages.args | 24 +++--- .../qemuxml2argv-hugepages-pages2.args | 8 +- .../qemuxml2argv-hugepages-pages3.args | 4 +- .../qemuxml2argv-hugepages-shared.args | 22 +++--- .../qemuxml2argv-memory-hotplug-dimm-addr.args | 5 +- .../qemuxml2argv-memory-hotplug-dimm.args | 5 +- .../qemuxml2argv-memory-hotplug-nvdimm.args | 25 ++++++ tests/qemuxml2argvtest.c | 4 +- 14 files changed, 155 insertions(+), 80 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args
diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 0102c96..517dca1 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -339,13 +339,19 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, size_t i; int maxidx = 0; int idx; + const char *prefix; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM) + prefix = "dimm"; + else + prefix = "nvdimm";
for (i = 0; i < def->nmems; i++) { - if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, "dimm")) >= maxidx) + if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, prefix)) >= maxidx) maxidx = idx + 1; }
- if (virAsprintf(&mem->info.alias, "dimm%d", maxidx) < 0) + if (virAsprintf(&mem->info.alias, "%s%d", prefix, maxidx) < 0) return -1;
return 0; @@ -445,7 +451,7 @@ qemuAssignDeviceAliases(virDomainDefPtr def, virQEMUCapsPtr qemuCaps) return -1; } for (i = 0; i < def->nmems; i++) { - if (virAsprintf(&def->mems[i]->info.alias, "dimm%zu", i) < 0) + if (qemuAssignDeviceMemoryAlias(def, def->mems[i]) < 0) return -1; }
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a4cc87f..6b83d1c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3061,6 +3061,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * to, or -1 if NUMA is not used in the guest * @hostNodes: map of host nodes to alloc the memory in, NULL for default * @autoNodeset: fallback nodeset in case of automatic numa placement + * @memPath: request memory-backend-file with specific mem-path * @def: domain definition object * @qemuCaps: qemu capabilities object * @cfg: qemu driver config object @@ -3082,6 +3083,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, @@ -3173,35 +3175,42 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1;
- if (pagesize || hugepage) { - if (pagesize) { - /* Now lets see, if the huge page we want to use is even mounted - * and ready to use */ - for (i = 0; i < cfg->nhugetlbfs; i++) { - if (cfg->hugetlbfs[i].size == pagesize) - break; - } + if (memPath || pagesize || hugepage) { + if (pagesize || hugepage) { + if (pagesize) { + /* Now lets see, if the huge page we want to use is even mounted + * and ready to use */ + for (i = 0; i < cfg->nhugetlbfs; i++) { + if (cfg->hugetlbfs[i].size == pagesize) + break; + }
- if (i == cfg->nhugetlbfs) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Unable to find any usable hugetlbfs mount for %llu KiB"), - pagesize); - goto cleanup; - } + if (i == cfg->nhugetlbfs) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find any usable hugetlbfs mount for %llu KiB"), + pagesize); + goto cleanup; + }
- if (!(mem_path = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) - goto cleanup; - } else { - if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, - cfg->nhugetlbfs))) - goto cleanup; + if (!(mem_path = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) + goto cleanup; + } else { + if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, + cfg->nhugetlbfs))) + goto cleanup; + } }
*backendType = "memory-backend-file";
if (virJSONValueObjectAdd(props, + "s:mem-path", memPath ? memPath : mem_path, + NULL) < 0) + goto cleanup; + + if (!memPath && (pagesize || hugepage) && + virJSONValueObjectAdd(props, "b:prealloc", true, - "s:mem-path", mem_path, NULL) < 0) goto cleanup;
Two trips through virJSONValueObjectAdd... IIRC a prior review of my code noted it's an expensive trip into that code... Besides you're switching the arguments resulting changes to all the dimm tests... Perhaps this should just be an if (memPath) "s:mem-path", memPath else "b:prealloc" "s:mem-path", mem_path In fact rather than the excess of changes with indents, why not keep: if (pagesize || hugepage) { } and replace the closing } with } else if (memPath) { } I think that would be cleaner and more obvious that they're different.
@@ -3253,7 +3262,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, }
/* If none of the following is requested... */ - if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) { + if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force && !memPath) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; @@ -3309,7 +3318,7 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, goto cleanup;
if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset, - def, qemuCaps, cfg, &backendType, + NULL, def, qemuCaps, cfg, &backendType, &props, false)) < 0) goto cleanup;
@@ -3351,7 +3360,7 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem,
if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, auto_nodeset, - def, qemuCaps, cfg, + mem->path, def, qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup;
@@ -3369,6 +3378,7 @@ char * qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) { virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *device;
if (!mem->info.alias) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -3377,8 +3387,15 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) }
switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: - virBufferAddLit(&buf, "pc-dimm,"); + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM) + device = "pc-dimm"; + else + device = "nvdimm"; + + virBufferAsprintf(&buf, "%s,", device);
if (mem->targetNode >= 0) virBufferAsprintf(&buf, "node=%d,", mem->targetNode); @@ -3393,12 +3410,6 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem)
break;
- case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: - virReportError(VIR_ERR_NO_SUPPORT, "%s", - _("nvdimm not supported yet")); - return NULL; - break; - case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; @@ -6938,6 +6949,7 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, { virBuffer buf = VIR_BUFFER_INITIALIZER; bool obsoleteAccel = false; + size_t i; int ret = -1;
/* This should *never* be NULL, since we always provide @@ -6974,6 +6986,15 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, "with this QEMU binary")); return -1; } + + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm not is not available " + "with this QEMU binary")); + return -1; + } + } } else { virTristateSwitch vmport = def->features[VIR_DOMAIN_FEATURE_VMPORT]; virTristateSwitch smm = def->features[VIR_DOMAIN_FEATURE_SMM]; @@ -7070,6 +7091,13 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, } }
+ for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virBufferAddLit(&buf, ",nvdimm=on"); + break; + } + } +
So I know we check later on when building the NVDIMM command line for the QEMU_CAPS_DEVICE_NVDIMM; however, should we also check here. Mostly for completeness
virCommandAddArgBuffer(cmd, &buf); }
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index dcf9ba6..003a5d7 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -118,6 +118,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 28ee81d..6b049e7 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5289,12 +5289,6 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, return 0; }
- if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("memory hotplug isn't supported by this QEMU binary")); - return -1; - } - if (!ARCH_IS_PPC64(def->os.arch)) { /* due to guest support, qemu would silently enable NUMA with one node * once the memory hotplug backend is enabled. To avoid possible
BTW: So the "if (nmems > def->mem.memory_slots) {" check is what got me initially thinking about the "how this is used" query I brought up in patch 1 for formatdomain.
@@ -5318,6 +5312,28 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, for (i = 0; i < def->nmems; i++) { hotplugMemory += def->mems[i]->size;
This to is where I started wondering "how" this was implemented and how it looks to the guest. Does the guest believe it has "extra" memory? Unlike a DIMM, I wouldn't expect the memory to be general purpose usage for the guest and thus wonder if the value for NVDIMM should be included. I seem to have this recollection of some other algorithm that makes sure address space(s) used by multiple dimm's don't have overlaps; however, that wouldn't be true for an nvdimm. Perhaps that answer will be in later patches.
+ switch ((virDomainMemoryModel) def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("memory hotplug isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_NVDIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } +
Good thing there aren't 1000's of these things... That's a check every time through the loop John
/* already existing devices don't need to be checked on hotplug */ if (!mem && qemuDomainDefValidateMemoryHotplugDevice(def->mems[i], def) < 0) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 00e4a75..bf22b0a 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1878,7 +1878,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver,
if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, NULL, - vm->def, priv->qemuCaps, cfg, + mem->path, vm->def, priv->qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup;
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args index 2eb006e..dd12751 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args @@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=spice \ -mem-prealloc \ -mem-path /dev/hugepages2M/libvirt/qemu \ -numa node,nodeid=0,cpus=0-1,mem=1024 \ --object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages1G/libvirt/qemu,\ +prealloc=yes,size=1073741824,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \ -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \ -nodefaults \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args index 9f0e696..2a196ab 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args @@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ --object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node2,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ --object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ -policy=bind \ +-object memory-backend-file,id=ram-node3,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=3,policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index 447bb52..30f87a8 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -10,11 +10,11 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args index 57dd3fa..92045a0 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args @@ -12,8 +12,8 @@ QEMU_AUDIO_DRV=none \ -smp 2,sockets=2,cores=1,threads=1 \ -object memory-backend-ram,id=ram-node0,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=805306368 \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args index f9fc218..aaa9e99 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args @@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,share=yes,size=1073741824,\ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,share=yes,size=1073741824,\ host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ --object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,share=no,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node2,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,share=no,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ --object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ -policy=bind \ +-object memory-backend-file,id=ram-node3,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=3,policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args index 1c881c6..ea46c82 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args @@ -11,9 +11,8 @@ QEMU_AUDIO_DRV=none \ -m size=219136k,slots=16,maxmem=1099511627776k \ -smp 2,sockets=2,cores=1,threads=1 \ -numa node,nodeid=0,cpus=0-1,mem=214 \ --object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages2M/libvirt/qemu,\ +prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0,addr=4294967296 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args index fa64fcf..dc58614 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args @@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=none \ -numa node,nodeid=0,cpus=0-1,mem=214 \ -object memory-backend-ram,id=memdimm0,size=536870912 \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \ --object memory-backend-file,id=memdimm1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm1,mem-path=/dev/hugepages2M/libvirt/qemu,\ +prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm1,id=dimm1 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args new file mode 100644 index 0000000..8cda774 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args @@ -0,0 +1,25 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,nvdimm=on \ +-m size=219136k,slots=16,maxmem=1099511627776k \ +-smp 2,sockets=2,cores=1,threads=1 \ +-numa node,nodeid=0,cpus=0-1,mem=214 \ +-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,size=536870912 \ +-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index ad0693f..1995ccc 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1977,7 +1977,7 @@ mymain(void)
DO_TEST_FAILURE("memory-align-fail", NONE); DO_TEST_FAILURE("memory-hotplug-nonuma", QEMU_CAPS_DEVICE_PC_DIMM); - DO_TEST_FAILURE("memory-hotplug", NONE); + DO_TEST("memory-hotplug", NONE); DO_TEST("memory-hotplug", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA); DO_TEST("memory-hotplug-dimm", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); @@ -1985,6 +1985,8 @@ mymain(void) QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("memory-hotplug-ppc64-nonuma", QEMU_CAPS_KVM, QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); + DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, + QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP,

Now that NVDIMM has found its way into libvirt, users might want to fine tune some settings for each module separately. One such setting is 'share=on|off' for the memory-backend-file object. This setting - just like its name suggest already - enables sharing the nvdimm module with other applications. Under the hood it controls whether qemu mmaps() the file as MAP_PRIVATE or MAP_SHARED. Yet again, we have such config knob in domain XML, but it's just an attribute to numa <cell/>. This does not give fine enough tuning on per-memdevice basis so we need to have the attribute for each device too. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 15 ++++++- docs/schemas/domaincommon.rng | 19 ++++++--- src/conf/domain_conf.c | 15 ++++++- src/conf/domain_conf.h | 2 + src/libvirt_private.syms | 2 + ...emuxml2argv-memory-hotplug-nvdimm-memAccess.xml | 49 ++++++++++++++++++++++ 6 files changed, 93 insertions(+), 9 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 657df8f..06fe50d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1356,7 +1356,7 @@ <span class='since'>Since 1.2.9</span> the optional attribute <code>memAccess</code> can control whether the memory is to be mapped as "shared" or "private". This is valid only for - hugepages-backed memory. + hugepages-backed memory and nvdimm modules. </p> <p> @@ -6724,7 +6724,7 @@ qemu-kvm -net nic,model=? /dev/null <pre> ... <devices> - <memory model='dimm'> + <memory model='dimm' memAccess='private'> <target> <size unit='KiB'>524287</size> <node>0</node> @@ -6762,6 +6762,17 @@ qemu-kvm -net nic,model=? /dev/null </p> </dd> + <dt><code>memAccess</code></dt> + <dd> + <p> + Then there is optional attribute <code>memAccess</code> + (<span class="since">Since 2.2.0</span>) that allows + uses to fine tune mapping of the memory on per module + basis. Values are the same as for numa <cell/>: + <code>shared</code> and <code>private</code>. + </p> + </dd> + <dt><code>source</code></dt> <dd> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index e6ad452..3e9d342 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4467,6 +4467,15 @@ </element> </define> + <define name="memAccess"> + <attribute name="memAccess"> + <choice> + <value>shared</value> + <value>private</value> + </choice> + </attribute> + </define> + <define name="numaCell"> <element name="cell"> <optional> @@ -4486,12 +4495,7 @@ </attribute> </optional> <optional> - <attribute name="memAccess"> - <choice> - <value>shared</value> - <value>private</value> - </choice> - </attribute> + <ref name="memAccess"/> </optional> </element> </define> @@ -4725,6 +4729,9 @@ <value>nvdimm</value> </choice> </attribute> + <optional> + <ref name="memAccess"/> + </optional> <interleave> <optional> <ref name="memorydev-source"/> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index cb5a053..84f76dd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13245,6 +13245,15 @@ virDomainMemoryDefParseXML(xmlNodePtr memdevNode, } VIR_FREE(tmp); + tmp = virXMLPropString(memdevNode, "memAccess"); + if (tmp && + (def->memAccess = virNumaMemAccessTypeFromString(tmp)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid memAccess mode '%s'"), tmp); + goto error; + } + VIR_FREE(tmp); + /* source */ if ((node = virXPathNode("./source", ctxt)) && virDomainMemorySourceDefParseXML(node, ctxt, def) < 0) @@ -21800,7 +21809,11 @@ virDomainMemoryDefFormat(virBufferPtr buf, { const char *model = virDomainMemoryModelTypeToString(def->model); - virBufferAsprintf(buf, "<memory model='%s'>\n", model); + virBufferAsprintf(buf, "<memory model='%s'", model); + if (def->memAccess) + virBufferAsprintf(buf, " memAccess='%s'", + virNumaMemAccessTypeToString(def->memAccess)); + virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); if (virDomainMemorySourceDefFormat(buf, def) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4e6a9bf..213768d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1945,6 +1945,8 @@ typedef enum { } virDomainMemoryModel; struct _virDomainMemoryDef { + virNumaMemAccess memAccess; + /* source */ virBitmapPtr sourceNodes; unsigned long long pagesize; /* kibibytes */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 419c33d..024c3e6 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2058,6 +2058,8 @@ virNumaGetNodeMemory; virNumaGetPageInfo; virNumaGetPages; virNumaIsAvailable; +virNumaMemAccessTypeFromString; +virNumaMemAccessTypeToString; virNumaNodeIsAvailable; virNumaNodesetIsAvailable; virNumaSetPagePoolSize; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml new file mode 100644 index 0000000..4ebea01 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>1267710</memory> + <currentMemory unit='KiB'>1267710</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + <cpu> + <topology sockets='2' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='219136' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <memory model='nvdimm' memAccess='private'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + </memory> + </devices> +</domain> -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
Now that NVDIMM has found its way into libvirt, users might want to fine tune some settings for each module separately. One such setting is 'share=on|off' for the memory-backend-file object. This setting - just like its name suggest already - enables sharing the nvdimm module with other applications. Under the hood it controls whether qemu mmaps() the file as MAP_PRIVATE or MAP_SHARED.
Yet again, we have such config knob in domain XML, but it's just an attribute to numa <cell/>. This does not give fine enough tuning on per-memdevice basis so we need to have the attribute for each device too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 15 ++++++- docs/schemas/domaincommon.rng | 19 ++++++--- src/conf/domain_conf.c | 15 ++++++- src/conf/domain_conf.h | 2 + src/libvirt_private.syms | 2 + ...emuxml2argv-memory-hotplug-nvdimm-memAccess.xml | 49 ++++++++++++++++++++++ 6 files changed, 93 insertions(+), 9 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml
After reading this patch and the next one, I'm not sure I understand the need. For a 'dimm' it already takes a value from whatever numa has for the node. Adding this would seemingly allow someone to overwrite that value. So what would happen if the numa node was private and the *dimm node shared? The setting and usage does not restrict to nvdimm only. The rest of this is my thoughts upon first read... I'm still hung up on whether dimm and nvdimm should share nmems, but still figured I'd look at more patches to provide more thoughts.
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 657df8f..06fe50d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1356,7 +1356,7 @@ <span class='since'>Since 1.2.9</span> the optional attribute <code>memAccess</code> can control whether the memory is to be mapped as "shared" or "private". This is valid only for - hugepages-backed memory. + hugepages-backed memory and nvdimm modules. </p>
<p> @@ -6724,7 +6724,7 @@ qemu-kvm -net nic,model=? /dev/null <pre> ... <devices> - <memory model='dimm'> + <memory model='dimm' memAccess='private'>
'dimm'?? Shouldn't this be on the 'nvdimm'? The way I read the commit msg it's for nvdimm only. It would seem this would allow the dimm I would think there'd be concerns over nefarious uses of this hole in the guest...
<target> <size unit='KiB'>524287</size> <node>0</node> @@ -6762,6 +6762,17 @@ qemu-kvm -net nic,model=? /dev/null </p> </dd>
+ <dt><code>memAccess</code></dt> + <dd> + <p> + Then there is optional attribute <code>memAccess</code> + (<span class="since">Since 2.2.0</span>) that allows
2.3.0 at the earliest.
+ uses to fine tune mapping of the memory on per module + basis. Values are the same as for numa <cell/>: + <code>shared</code> and <code>private</code>. + </p> + </dd> +
Placement. This is an attribute of <memory...> but currently "tied to" nvdimm. Making this a <dt> at the same level of <source> gives me the wrong impression. I think the text of the message belongs in the previous paragraph. E.g. "... a Non-Volatile DIMM module. For NVDIMM modules, an optional attribute <code>memAccess</code> can be provided. This allows users to fine tune mapping of the memory on a per module basis. Values are the same as..." BTW: It took me a few searches to find the shared/private NUMA discussion in the "#elementsCPU"... Makes me wonder if we should create another label "#numaTopology" that allows us to link directly to that discussion from right here.
<dt><code>source</code></dt> <dd> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index e6ad452..3e9d342 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4467,6 +4467,15 @@ </element> </define>
+ <define name="memAccess"> + <attribute name="memAccess"> + <choice> + <value>shared</value> + <value>private</value> + </choice> + </attribute> + </define> + <define name="numaCell"> <element name="cell"> <optional> @@ -4486,12 +4495,7 @@ </attribute> </optional> <optional> - <attribute name="memAccess"> - <choice> - <value>shared</value> - <value>private</value> - </choice> - </attribute> + <ref name="memAccess"/> </optional> </element> </define> @@ -4725,6 +4729,9 @@ <value>nvdimm</value> </choice> </attribute> + <optional> + <ref name="memAccess"/> + </optional> <interleave> <optional> <ref name="memorydev-source"/> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index cb5a053..84f76dd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13245,6 +13245,15 @@ virDomainMemoryDefParseXML(xmlNodePtr memdevNode, } VIR_FREE(tmp);
+ tmp = virXMLPropString(memdevNode, "memAccess"); + if (tmp && + (def->memAccess = virNumaMemAccessTypeFromString(tmp)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid memAccess mode '%s'"), tmp); + goto error; + } + VIR_FREE(tmp); +
An no check for NVDIMM usage only... What if someone does add this to their DIMM? In the next patch, we'll take whatever this is set to and change the memAccess based on that. I would also think that if we did support this that we'd *have* to have some ABI check to make sure src and dst used the same memAccess mode...
/* source */ if ((node = virXPathNode("./source", ctxt)) && virDomainMemorySourceDefParseXML(node, ctxt, def) < 0) @@ -21800,7 +21809,11 @@ virDomainMemoryDefFormat(virBufferPtr buf, { const char *model = virDomainMemoryModelTypeToString(def->model);
- virBufferAsprintf(buf, "<memory model='%s'>\n", model); + virBufferAsprintf(buf, "<memory model='%s'", model); + if (def->memAccess) + virBufferAsprintf(buf, " memAccess='%s'", + virNumaMemAccessTypeToString(def->memAccess)); + virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2);
if (virDomainMemorySourceDefFormat(buf, def) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4e6a9bf..213768d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1945,6 +1945,8 @@ typedef enum { } virDomainMemoryModel;
struct _virDomainMemoryDef { + virNumaMemAccess memAccess; + /* source */ virBitmapPtr sourceNodes; unsigned long long pagesize; /* kibibytes */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 419c33d..024c3e6 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2058,6 +2058,8 @@ virNumaGetNodeMemory; virNumaGetPageInfo; virNumaGetPages; virNumaIsAvailable; +virNumaMemAccessTypeFromString; +virNumaMemAccessTypeToString; virNumaNodeIsAvailable; virNumaNodesetIsAvailable; virNumaSetPagePoolSize; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml new file mode 100644 index 0000000..4ebea01 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>1267710</memory> + <currentMemory unit='KiB'>1267710</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + <cpu> + <topology sockets='2' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='219136' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <memory model='nvdimm' memAccess='private'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + </memory> + </devices> +</domain>
And again no xml2xml tests... So in this case, we overwrite whatever the numa default is. If the numa default were private, how would it be possible to have the nvdimm allow shared? John

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 11 +++++++-- src/qemu/qemu_command.h | 1 + src/qemu/qemu_hotplug.c | 3 ++- ...muxml2argv-memory-hotplug-nvdimm-memAccess.args | 26 ++++++++++++++++++++++ tests/qemuxml2argvtest.c | 2 ++ 5 files changed, 40 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 6b83d1c..f888de3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3062,6 +3062,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * @hostNodes: map of host nodes to alloc the memory in, NULL for default * @autoNodeset: fallback nodeset in case of automatic numa placement * @memPath: request memory-backend-file with specific mem-path + * @memAccessReq: specifically requested memAccess mode * @def: domain definition object * @qemuCaps: qemu capabilities object * @cfg: qemu driver config object @@ -3084,6 +3085,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, const char *memPath, + virNumaMemAccess memAccessReq, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, @@ -3119,6 +3121,9 @@ qemuBuildMemoryBackendStr(unsigned long long size, memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); } + if (memAccessReq) + memAccess = memAccessReq; + if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; @@ -3318,7 +3323,8 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, goto cleanup; if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset, - NULL, def, qemuCaps, cfg, &backendType, + NULL, VIR_NUMA_MEM_ACCESS_DEFAULT, + def, qemuCaps, cfg, &backendType, &props, false)) < 0) goto cleanup; @@ -3360,7 +3366,8 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, auto_nodeset, - mem->path, def, qemuCaps, cfg, + mem->path, mem->memAccess, + def, qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 003a5d7..29c0f58 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -119,6 +119,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, const char *memPath, + virNumaMemAccess memAccessReq, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index bf22b0a..6ba0b8e 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1878,7 +1878,8 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, NULL, - mem->path, vm->def, priv->qemuCaps, cfg, + mem->path, mem->memAccess, vm->def, + priv->qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args new file mode 100644 index 0000000..9446259 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args @@ -0,0 +1,26 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,nvdimm=on \ +-m size=219136k,slots=16,maxmem=1099511627776k \ +-smp 2,sockets=2,cores=1,threads=1 \ +-numa node,nodeid=0,cpus=0-1,mem=214 \ +-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,share=no,\ +size=536870912 \ +-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 1995ccc..985412c 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1987,6 +1987,8 @@ mymain(void) QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); + DO_TEST("memory-hotplug-nvdimm-memAccess", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, + QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP, -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 11 +++++++-- src/qemu/qemu_command.h | 1 + src/qemu/qemu_hotplug.c | 3 ++- ...muxml2argv-memory-hotplug-nvdimm-memAccess.args | 26 ++++++++++++++++++++++ tests/qemuxml2argvtest.c | 2 ++ 5 files changed, 40 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 6b83d1c..f888de3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3062,6 +3062,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * @hostNodes: map of host nodes to alloc the memory in, NULL for default * @autoNodeset: fallback nodeset in case of automatic numa placement * @memPath: request memory-backend-file with specific mem-path + * @memAccessReq: specifically requested memAccess mode * @def: domain definition object * @qemuCaps: qemu capabilities object * @cfg: qemu driver config object @@ -3084,6 +3085,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, const char *memPath, + virNumaMemAccess memAccessReq, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, @@ -3119,6 +3121,9 @@ qemuBuildMemoryBackendStr(unsigned long long size, memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); }
+ if (memAccessReq) + memAccess = memAccessReq; +
This would overwrite the value for dimm as well, which I'm still not sure is what you expected. John
if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; @@ -3318,7 +3323,8 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, goto cleanup;
if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset, - NULL, def, qemuCaps, cfg, &backendType, + NULL, VIR_NUMA_MEM_ACCESS_DEFAULT, + def, qemuCaps, cfg, &backendType, &props, false)) < 0) goto cleanup;
@@ -3360,7 +3366,8 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem,
if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, auto_nodeset, - mem->path, def, qemuCaps, cfg, + mem->path, mem->memAccess, + def, qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup;
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 003a5d7..29c0f58 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -119,6 +119,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, const char *memPath, + virNumaMemAccess memAccessReq, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index bf22b0a..6ba0b8e 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1878,7 +1878,8 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver,
if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, NULL, - mem->path, vm->def, priv->qemuCaps, cfg, + mem->path, mem->memAccess, vm->def, + priv->qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup;
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args new file mode 100644 index 0000000..9446259 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args @@ -0,0 +1,26 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,nvdimm=on \ +-m size=219136k,slots=16,maxmem=1099511627776k \ +-smp 2,sockets=2,cores=1,threads=1 \ +-numa node,nodeid=0,cpus=0-1,mem=214 \ +-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,share=no,\ +size=536870912 \ +-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 1995ccc..985412c 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1987,6 +1987,8 @@ mymain(void) QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE); + DO_TEST("memory-hotplug-nvdimm-memAccess", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM, + QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP,

When domain is being started up, we ought to relabel the host side of NVDIMM so qemu has access to it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_dac.c | 73 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 442ce70..253cbbf 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1073,6 +1073,30 @@ virSecurityDACRestoreInputLabel(virSecurityManagerPtr mgr, static int +virSecurityDACRestoreMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainMemoryDefPtr mem) +{ + virSecurityDACDataPtr priv = virSecurityManagerGetPrivateData(mgr); + int ret = -1; + + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + ret = virSecurityDACRestoreFileLabel(priv, mem->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + ret = 0; + break; + } + + return ret; +} + + +static int virSecurityDACRestoreAllLabel(virSecurityManagerPtr mgr, virDomainDefPtr def, bool migrated) @@ -1111,6 +1135,13 @@ virSecurityDACRestoreAllLabel(virSecurityManagerPtr mgr, rc = -1; } + for (i = 0; i < def->nmems; i++) { + if (virSecurityDACRestoreMemoryLabel(mgr, + def, + def->mems[i]) < 0) + rc = -1; + } + if (virDomainChrDefForeach(def, false, virSecurityDACRestoreChardevCallback, @@ -1144,6 +1175,41 @@ virSecurityDACSetChardevCallback(virDomainDefPtr def, static int +virSecurityDACSetMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virDomainMemoryDefPtr mem) + +{ + virSecurityDACDataPtr priv = virSecurityManagerGetPrivateData(mgr); + virSecurityLabelDefPtr seclabel; + int ret = -1; + uid_t user; + gid_t group; + + seclabel = virDomainDefGetSecurityLabelDef(def, SECURITY_DAC_NAME); + if (seclabel && !seclabel->relabel) + return 0; + + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (virSecurityDACGetIds(seclabel, priv, &user, &group, NULL, NULL) < 0) + return -1; + + ret = virSecurityDACSetOwnership(priv, NULL, mem->path, user, group); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + ret = 0; + break; + } + + return ret; +} + + +static int virSecurityDACSetAllLabel(virSecurityManagerPtr mgr, virDomainDefPtr def, const char *stdin_path ATTRIBUTE_UNUSED) @@ -1182,6 +1248,13 @@ virSecurityDACSetAllLabel(virSecurityManagerPtr mgr, return -1; } + for (i = 0; i < def->nmems; i++) { + if (virSecurityDACSetMemoryLabel(mgr, + def, + def->mems[i]) < 0) + return -1; + } + if (virDomainChrDefForeach(def, true, virSecurityDACSetChardevCallback, -- 2.8.4

When domain is being started up, we ought to relabel the host side of NVDIMM so qemu has access to it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 66 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 4be946d..c7c4921 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1119,6 +1119,62 @@ virSecuritySELinuxRestoreInputLabel(virSecurityManagerPtr mgr, static int +virSecuritySELinuxSetMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virDomainMemoryDefPtr mem) +{ + virSecurityLabelDefPtr seclabel; + + seclabel = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); + if (seclabel == NULL) + return 0; + + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (virSecuritySELinuxSetFilecon(mgr, mem->path, + seclabel->imagelabel) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } + + return 0; +} + + +static int +virSecuritySELinuxRestoreMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virDomainMemoryDefPtr mem) +{ + int ret = -1; + virSecurityLabelDefPtr seclabel; + + seclabel = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); + if (seclabel == NULL) + return 0; + + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + ret = virSecuritySELinuxRestoreFileLabel(mgr, mem->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + ret = 0; + break; + } + + return ret; +} + + +static int virSecuritySELinuxSetTPMFileLabel(virSecurityManagerPtr mgr, virDomainDefPtr def, virDomainTPMDefPtr tpm) @@ -2016,6 +2072,11 @@ virSecuritySELinuxRestoreAllLabel(virSecurityManagerPtr mgr, rc = -1; } + for (i = 0; i < def->nmems; i++) { + if (virSecuritySELinuxRestoreMemoryLabel(mgr, def, def->mems[i]) < 0) + return -1; + } + for (i = 0; i < def->ndisks; i++) { virDomainDiskDefPtr disk = def->disks[i]; @@ -2402,6 +2463,11 @@ virSecuritySELinuxSetAllLabel(virSecurityManagerPtr mgr, return -1; } + for (i = 0; i < def->nmems; i++) { + if (virSecuritySELinuxSetMemoryLabel(mgr, def, def->mems[i]) < 0) + return -1; + } + if (def->tpm) { if (virSecuritySELinuxSetTPMFileLabel(mgr, def, def->tpm) < 0) return -1; -- 2.8.4

These APIs will be used whenever we are hot (un-)plugging a memdev. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 2 ++ src/security/security_driver.h | 9 +++++++ src/security/security_manager.c | 56 +++++++++++++++++++++++++++++++++++++++++ src/security/security_manager.h | 7 ++++++ src/security/security_stack.c | 38 ++++++++++++++++++++++++++++ 5 files changed, 112 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 024c3e6..86e6afd 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1130,6 +1130,7 @@ virSecurityManagerRestoreAllLabel; virSecurityManagerRestoreDiskLabel; virSecurityManagerRestoreHostdevLabel; virSecurityManagerRestoreImageLabel; +virSecurityManagerRestoreMemoryLabel; virSecurityManagerRestoreSavedStateLabel; virSecurityManagerSetAllLabel; virSecurityManagerSetChildProcessLabel; @@ -1139,6 +1140,7 @@ virSecurityManagerSetHostdevLabel; virSecurityManagerSetHugepages; virSecurityManagerSetImageFDLabel; virSecurityManagerSetImageLabel; +virSecurityManagerSetMemoryLabel; virSecurityManagerSetProcessLabel; virSecurityManagerSetSavedStateLabel; virSecurityManagerSetSocketLabel; diff --git a/src/security/security_driver.h b/src/security/security_driver.h index 7cb62f0..90b4e2e 100644 --- a/src/security/security_driver.h +++ b/src/security/security_driver.h @@ -118,6 +118,12 @@ typedef int (*virSecurityDomainSetImageLabel) (virSecurityManagerPtr mgr, typedef int (*virSecurityDomainRestoreImageLabel) (virSecurityManagerPtr mgr, virDomainDefPtr def, virStorageSourcePtr src); +typedef int (*virSecurityDomainSetMemoryLabel) (virSecurityManagerPtr mgr, + virDomainDefPtr def, + virDomainMemoryDefPtr mem); +typedef int (*virSecurityDomainRestoreMemoryLabel) (virSecurityManagerPtr mgr, + virDomainDefPtr def, + virDomainMemoryDefPtr mem); typedef int (*virSecurityDomainSetPathLabel) (virSecurityManagerPtr mgr, virDomainDefPtr def, const char *path); @@ -143,6 +149,9 @@ struct _virSecurityDriver { virSecurityDomainSetImageLabel domainSetSecurityImageLabel; virSecurityDomainRestoreImageLabel domainRestoreSecurityImageLabel; + virSecurityDomainSetMemoryLabel domainSetSecurityMemoryLabel; + virSecurityDomainRestoreMemoryLabel domainRestoreSecurityMemoryLabel; + virSecurityDomainSetDaemonSocketLabel domainSetSecurityDaemonSocketLabel; virSecurityDomainSetSocketLabel domainSetSecuritySocketLabel; virSecurityDomainClearSocketLabel domainClearSecuritySocketLabel; diff --git a/src/security/security_manager.c b/src/security/security_manager.c index ecb4a40..92c09ba 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -1001,3 +1001,59 @@ virSecurityManagerDomainSetPathLabel(virSecurityManagerPtr mgr, return 0; } + + +/** + * virSecurityManagerSetMemoryLabel: + * @mgr: security manager object + * @vm: domain definition object + * @mem: memory module to operate on + * + * Labels the host part of a memory module. + * + * Returns: 0 on success, -1 on error. + */ +int +virSecurityManagerSetMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem) +{ + if (mgr->drv->domainSetSecurityMemoryLabel) { + int ret; + virObjectLock(mgr); + ret = mgr->drv->domainSetSecurityMemoryLabel(mgr, vm, mem); + virObjectUnlock(mgr); + return ret; + } + + virReportUnsupportedError(); + return -1; +} + + +/** + * virSecurityManagerRestoreMemoryLabel: + * @mgr: security manager object + * @vm: domain definition object + * @mem: memory module to operate on + * + * Removes security label from the host part of a memory module. + * + * Returns: 0 on success, -1 on error. + */ +int +virSecurityManagerRestoreMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem) +{ + if (mgr->drv->domainRestoreSecurityMemoryLabel) { + int ret; + virObjectLock(mgr); + ret = mgr->drv->domainRestoreSecurityMemoryLabel(mgr, vm, mem); + virObjectUnlock(mgr); + return ret; + } + + virReportUnsupportedError(); + return -1; +} diff --git a/src/security/security_manager.h b/src/security/security_manager.h index 4cbc2d8..97be3f6 100644 --- a/src/security/security_manager.h +++ b/src/security/security_manager.h @@ -160,6 +160,13 @@ int virSecurityManagerRestoreImageLabel(virSecurityManagerPtr mgr, virDomainDefPtr vm, virStorageSourcePtr src); +int virSecurityManagerSetMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem); +int virSecurityManagerRestoreMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem); + int virSecurityManagerDomainSetPathLabel(virSecurityManagerPtr mgr, virDomainDefPtr vm, const char *path); diff --git a/src/security/security_stack.c b/src/security/security_stack.c index 3ea2751..7727153 100644 --- a/src/security/security_stack.c +++ b/src/security/security_stack.c @@ -600,6 +600,41 @@ virSecurityStackRestoreImageLabel(virSecurityManagerPtr mgr, } static int +virSecurityStackSetMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem) +{ + virSecurityStackDataPtr priv = virSecurityManagerGetPrivateData(mgr); + virSecurityStackItemPtr item = priv->itemsHead; + int rc = 0; + + for (; item; item = item->next) { + if (virSecurityManagerSetMemoryLabel(item->securityManager, vm, mem) < 0) + rc = -1; + } + + return rc; +} + +static int +virSecurityStackRestoreMemoryLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virDomainMemoryDefPtr mem) +{ + virSecurityStackDataPtr priv = virSecurityManagerGetPrivateData(mgr); + virSecurityStackItemPtr item = priv->itemsHead; + int rc = 0; + + for (; item; item = item->next) { + if (virSecurityManagerRestoreMemoryLabel(item->securityManager, + vm, mem) < 0) + rc = -1; + } + + return rc; +} + +static int virSecurityStackDomainSetPathLabel(virSecurityManagerPtr mgr, virDomainDefPtr vm, const char *path) @@ -637,6 +672,9 @@ virSecurityDriver virSecurityDriverStack = { .domainSetSecurityImageLabel = virSecurityStackSetImageLabel, .domainRestoreSecurityImageLabel = virSecurityStackRestoreImageLabel, + .domainSetSecurityMemoryLabel = virSecurityStackSetMemoryLabel, + .domainRestoreSecurityMemoryLabel = virSecurityStackRestoreMemoryLabel, + .domainSetSecurityDaemonSocketLabel = virSecurityStackSetDaemonSocketLabel, .domainSetSecuritySocketLabel = virSecurityStackSetSocketLabel, .domainClearSecuritySocketLabel = virSecurityStackClearSocketLabel, -- 2.8.4

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_dac.c | 3 +++ src/security/security_nop.c | 19 +++++++++++++++++++ src/security/security_selinux.c | 3 +++ 3 files changed, 25 insertions(+) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 253cbbf..ed9e6cf 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1657,6 +1657,9 @@ virSecurityDriver virSecurityDriverDAC = { .domainSetSecurityImageLabel = virSecurityDACSetImageLabel, .domainRestoreSecurityImageLabel = virSecurityDACRestoreImageLabel, + .domainSetSecurityMemoryLabel = virSecurityDACSetMemoryLabel, + .domainRestoreSecurityMemoryLabel = virSecurityDACRestoreMemoryLabel, + .domainSetSecurityDaemonSocketLabel = virSecurityDACSetDaemonSocketLabel, .domainSetSecuritySocketLabel = virSecurityDACSetSocketLabel, .domainClearSecuritySocketLabel = virSecurityDACClearSocketLabel, diff --git a/src/security/security_nop.c b/src/security/security_nop.c index 951125d..0a9b515 100644 --- a/src/security/security_nop.c +++ b/src/security/security_nop.c @@ -236,6 +236,22 @@ virSecurityDomainSetImageLabelNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, return 0; } +static int +virSecurityDomainSetMemoryLabelNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainMemoryDefPtr mem ATTRIBUTE_UNUSED) +{ + return 0; +} + +static int +virSecurityDomainRestoreMemoryLabelNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainMemoryDefPtr mem ATTRIBUTE_UNUSED) +{ + return 0; +} + virSecurityDriver virSecurityDriverNop = { .privateDataLen = 0, @@ -255,6 +271,9 @@ virSecurityDriver virSecurityDriverNop = { .domainSetSecurityImageLabel = virSecurityDomainSetImageLabelNop, .domainRestoreSecurityImageLabel = virSecurityDomainRestoreImageLabelNop, + .domainSetSecurityMemoryLabel = virSecurityDomainSetMemoryLabelNop, + .domainRestoreSecurityMemoryLabel = virSecurityDomainRestoreMemoryLabelNop, + .domainSetSecurityDaemonSocketLabel = virSecurityDomainSetDaemonSocketLabelNop, .domainSetSecuritySocketLabel = virSecurityDomainSetSocketLabelNop, .domainClearSecuritySocketLabel = virSecurityDomainClearSocketLabelNop, diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index c7c4921..6e02a37 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -2694,6 +2694,9 @@ virSecurityDriver virSecurityDriverSELinux = { .domainSetSecurityImageLabel = virSecuritySELinuxSetImageLabel, .domainRestoreSecurityImageLabel = virSecuritySELinuxRestoreImageLabel, + .domainSetSecurityMemoryLabel = virSecuritySELinuxSetMemoryLabel, + .domainRestoreSecurityMemoryLabel = virSecuritySELinuxRestoreMemoryLabel, + .domainSetSecurityDaemonSocketLabel = virSecuritySELinuxSetDaemonSocketLabel, .domainSetSecuritySocketLabel = virSecuritySELinuxSetSocketLabel, .domainClearSecuritySocketLabel = virSecuritySELinuxClearSocketLabel, -- 2.8.4

Now that we have APIs for relabel memdevs on hotplug, fill in the missing implementation in qemu hotplug code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6ba0b8e..afabbda 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1861,6 +1861,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, int id; int ret = -1; int rv; + bool restoreLabel = false; qemuDomainMemoryDeviceAlignSize(vm->def, mem); @@ -1893,6 +1894,11 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, goto removedef; } + if (virSecurityManagerSetMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + goto cleanup; + restoreLabel = true; + qemuDomainObjEnterMonitor(driver, vm); rv = qemuMonitorAddObject(priv->mon, backendType, objalias, props); props = NULL; /* qemuMonitorAddObject consumes */ @@ -1945,6 +1951,10 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, mem = NULL; goto audit; } + if (mem && restoreLabel && + virSecurityManagerRestoreMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + VIR_WARN("Unable to restore security label on memdev"); removedef: if ((id = virDomainMemoryFindByDef(vm->def, mem)) >= 0) @@ -3141,6 +3151,10 @@ qemuDomainRemoveMemoryDevice(virQEMUDriverPtr driver, if ((idx = virDomainMemoryFindByDef(vm->def, mem)) >= 0) virDomainMemoryRemove(vm->def, idx); + if (virSecurityManagerRestoreMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + VIR_WARN("Unable to restore security label on memdev"); + virDomainMemoryDefFree(mem); /* fix the balloon size */ -- 2.8.4

On 08/11/2016 09:26 AM, Michal Privoznik wrote:
Now that we have APIs for relabel memdevs on hotplug, fill in the missing implementation in qemu hotplug code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Note: Patches 6-9 have an implicit ACK - they seem to be fairly standard. Although what about apparmour?
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6ba0b8e..afabbda 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1861,6 +1861,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, int id; int ret = -1; int rv; + bool restoreLabel = false;
qemuDomainMemoryDeviceAlignSize(vm->def, mem);
@@ -1893,6 +1894,11 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, goto removedef; }
+ if (virSecurityManagerSetMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + goto cleanup; + restoreLabel = true; + qemuDomainObjEnterMonitor(driver, vm); rv = qemuMonitorAddObject(priv->mon, backendType, objalias, props); props = NULL; /* qemuMonitorAddObject consumes */ @@ -1945,6 +1951,10 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, mem = NULL; goto audit; } + if (mem && restoreLabel &&
Coverity notes that checking for mem here is unnecessary. It dereffed at the top and there is no way to get to the exit_monitor label after the mem = NULL.
+ virSecurityManagerRestoreMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + VIR_WARN("Unable to restore security label on memdev");
In any case, if this does stay within this label, I think it should move to inside the 'orig_err' code... The question becomes, if the qemuDomainObjExitMonitor fails, should the Restore be called as well. Part of me says yes, but then it's noted in the failure to ExitMonitor that we cannot touch mem, so we're SOL. John
removedef: if ((id = virDomainMemoryFindByDef(vm->def, mem)) >= 0) @@ -3141,6 +3151,10 @@ qemuDomainRemoveMemoryDevice(virQEMUDriverPtr driver, if ((idx = virDomainMemoryFindByDef(vm->def, mem)) >= 0) virDomainMemoryRemove(vm->def, idx);
+ if (virSecurityManagerRestoreMemoryLabel(driver->securityManager, + vm->def, mem) < 0) + VIR_WARN("Unable to restore security label on memdev"); + virDomainMemoryDefFree(mem);
/* fix the balloon size */
participants (4)
-
Daniel P. Berrange
-
John Ferlan
-
Michal Privoznik
-
Peter Krempa