[libvirt] [PATCH 0/5] Introduce NVDIMM support

NVDIMM support was introduced to qemu in v2.6.0-rc0~248^2~25, so it has
been available for a while. It's not the next big thing, but it is a very
interesting feature: it enables higher performance, because reading from
and writing to the module (and subsequently to the file on the host) does
not require a VMEXIT. It can also be used to access host files directly,
bypassing the page cache whilst doing so.

How to test the feature?

1) You need a PMEM-enabled kernel:

     CONFIG_LIBNVDIMM=y
     CONFIG_BLK_DEV_PMEM=m
     CONFIG_ACPI_NFIT=m

2) Create a file in the host:

     truncate -s 512M /tmp/nvdimm

3) Add the following to the domain XML:

     <memory model='nvdimm' memAccess='shared'>
       <source>
         <path>/tmp/nvdimm</path>
       </source>
       <target>
         <size unit='KiB'>523264</size>
         <node>0</node>
       </target>
     </memory>

4) Start the domain and write something into the NVDIMM module:

     (guest) $ echo 'Hello world' > /dev/pmem0

5) From the host, check that the file has changed:

     (host) $ hexdump -C /tmp/nvdimm

Want to watch a very interesting video while reviewing?

https://youtu.be/Vit3-PjbN9M

Michal Privoznik (5):
  Introduce NVDIMM memory model
  qemu: Introduce QEMU_CAPS_DEVICE_NVDIMM
  qemu: Implement NVDIMM
  conf: Introduce memAccess to <memory/>
  qemu: Implement memAccess for <memory/> banks

 docs/formatdomain.html.in                          |  41 ++++++--
 docs/schemas/domaincommon.rng                      |  51 ++++++----
 src/conf/domain_conf.c                             | 112 ++++++++++++++++-----
 src/conf/domain_conf.h                             |   4 +
 src/libvirt_private.syms                           |   2 +
 src/qemu/qemu_alias.c                              |  12 ++-
 src/qemu/qemu_capabilities.c                       |   2 +
 src/qemu/qemu_capabilities.h                       |   1 +
 src/qemu/qemu_command.c                            |  87 +++++++++++-----
 src/qemu/qemu_command.h                            |   2 +
 src/qemu/qemu_domain.c                             |  29 ++++--
 src/qemu/qemu_hotplug.c                            |   3 +-
 tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml   |   1 +
 .../qemuxml2argv-hugepages-numa.args               |   5 +-
 .../qemuxml2argv-hugepages-pages.args              |  24 ++---
 .../qemuxml2argv-hugepages-pages2.args             |   8 +-
 .../qemuxml2argv-hugepages-pages3.args             |   4 +-
 .../qemuxml2argv-hugepages-shared.args             |  22 ++--
 .../qemuxml2argv-memory-hotplug-dimm-addr.args     |   5 +-
 .../qemuxml2argv-memory-hotplug-dimm.args          |   5 +-
 ...muxml2argv-memory-hotplug-nvdimm-memAccess.args |  26 +++++
 ...emuxml2argv-memory-hotplug-nvdimm-memAccess.xml |  49 +++++++++
 .../qemuxml2argv-memory-hotplug-nvdimm.args        |  25 +++++
 .../qemuxml2argv-memory-hotplug-nvdimm.xml         |  49 +++++++++
 tests/qemuxml2argvtest.c                           |   6 +-
 25 files changed, 452 insertions(+), 123 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml

-- 
2.8.4
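
Step 5 of the test procedure inspects the backing file by eye with
hexdump. For scripted testing, the same check could be sketched in C as
below. This is only an illustration: backing_file_has_data() is a
hypothetical helper, not part of libvirt or qemu, and it merely reports
whether a freshly truncated (all-zero) file has received any data.

```c
#include <stdio.h>

/* Hypothetical helper (not a libvirt or qemu API).
 * Returns 1 if the file at @path contains at least one non-zero byte,
 * 0 if every byte is zero (or the file is empty), and -1 if the file
 * cannot be opened or read. */
int
backing_file_has_data(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int found = 0;
    int c;

    if (!fp)
        return -1;

    /* Stop at the first non-zero byte; no need to scan the rest. */
    while ((c = fgetc(fp)) != EOF) {
        if (c != 0) {
            found = 1;
            break;
        }
    }

    if (ferror(fp))
        found = -1;

    fclose(fp);
    return found;
}
```

Called as backing_file_has_data("/tmp/nvdimm") on the host, it would be
expected to return 0 right after step 2 and 1 once the guest has written
to /dev/pmem0.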

NVDIMM is a new type of memory introduced in qemu. The idea is that we
have a DIMM module that keeps its data persistent across domain reboots.

At the domain XML level, we already have some representation of 'dimm'
modules. Long story short, we have the <memory/> element that lives under
<devices/>. The element even has a @model attribute, which we can use to
introduce the new memory type:

    <memory model='nvdimm'>
      <source>
        <path>/tmp/nvdimm</path>
      </source>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
    </memory>

So far, this is just an XML parser/formatter extension. The QEMU driver
implementation follows in a later commit of this series.

For more info on NVDIMM visit the following web page:

    http://pmem.io/

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 docs/formatdomain.html.in                          | 26 ++++--
 docs/schemas/domaincommon.rng                      | 32 ++++---
 src/conf/domain_conf.c                             | 97 ++++++++++++++++------
 src/conf/domain_conf.h                             |  2 +
 src/qemu/qemu_command.c                            |  6 ++
 src/qemu/qemu_domain.c                             |  1 +
 .../qemuxml2argv-memory-hotplug-nvdimm.xml         | 49 +++++++++++
 7 files changed, 171 insertions(+), 42 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8efd6af..981b820 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -6702,6 +6702,15 @@ qemu-kvm -net nic,model=? /dev/null
       <node>1</node>
     </target>
   </memory>
+  <memory model='nvdimm'>
+    <source>
+      <path>/tmp/nvdimm</path>
+    </source>
+    <target>
+      <size unit='KiB'>524287</size>
+      <node>1</node>
+    </target>
+  </memory>
 </devices>
 ...
 </pre>
@@ -6709,17 +6718,19 @@ qemu-kvm -net nic,model=? /dev/null
       <dt><code>model</code></dt>
       <dd>
         <p>
-          Currently only the <code>dimm</code> model is supported in order to
-          add a virtual DIMM module to the guest.
+          Select <code>dimm</code> to add a virtual DIMM module to the guest.
+          Alternatively, <code>nvdimm</code> model adds a Non-Volatile DIMM
+          module.
<span class="since">Since 2.2.0</span> </p> </dd> <dt><code>source</code></dt> <dd> <p> - The optional source element allows to fine tune the source of the - memory used for the given memory device. If the element is not - provided defaults configured via <code>numatune</code> are used. + For model <code>dimm</code> this element is optional and allows to + fine tune the source of the memory used for the given memory device. + If the element is not provided defaults configured via + <code>numatune</code> are used. </p> <p> <code>pagesize</code> can optionally be used to override the default @@ -6732,6 +6743,11 @@ qemu-kvm -net nic,model=? /dev/null <code>nodemask</code> can optionally be used to override the default set of NUMA nodes where the memory would be allocated. </p> + <p> + For model <code>nvdimm</code> this element is mandatory and has a + single child element <code>path</code> which value represents a path + in host that back the nvdimm module in the guest. + </p> </dd> <dt><code>target</code></dt> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index ac9fd21..3265c2b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4693,6 +4693,7 @@ <attribute name="model"> <choice> <value>dimm</value> + <value>nvdimm</value> </choice> </attribute> <interleave> @@ -4712,18 +4713,27 @@ <define name="memorydev-source"> <element name="source"> - <interleave> - <optional> - <element name="pagesize"> - <ref name="scaledInteger"/> + <choice> + <group> + <interleave> + <optional> + <element name="pagesize"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="nodemask"> + <ref name="cpuset"/> + </element> + </optional> + </interleave> + </group> + <group> + <element name="path"> + <text/> </element> - </optional> - <optional> - <element name="nodemask"> - <ref name="cpuset"/> - </element> - </optional> - </interleave> + </group> + </choice> </element> </define> diff --git 
a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a56e0f5..60f1f21 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -837,8 +837,11 @@ VIR_ENUM_DECL(virDomainBlockJob) VIR_ENUM_IMPL(virDomainBlockJob, VIR_DOMAIN_BLOCK_JOB_TYPE_LAST, "", "", "copy", "", "active-commit") -VIR_ENUM_IMPL(virDomainMemoryModel, VIR_DOMAIN_MEMORY_MODEL_LAST, - "", "dimm") +VIR_ENUM_IMPL(virDomainMemoryModel, + VIR_DOMAIN_MEMORY_MODEL_LAST, + "", + "dimm", + "nvdimm") static virClassPtr virDomainObjClass; static virClassPtr virDomainXMLOptionClass; @@ -2342,6 +2345,7 @@ void virDomainMemoryDefFree(virDomainMemoryDefPtr def) if (!def) return; + VIR_FREE(def->path); virBitmapFree(def->sourceNodes); virDomainDeviceInfoClear(&def->info); VIR_FREE(def); @@ -13137,20 +13141,36 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node; - if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, - &def->pagesize, false, false) < 0) - goto cleanup; - - if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { - if (virBitmapParse(nodemask, &def->sourceNodes, - VIR_DOMAIN_CPUMASK_LEN) < 0) + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, + &def->pagesize, false, false) < 0) goto cleanup; - if (virBitmapIsAllClear(def->sourceNodes)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Invalid value of 'nodemask': %s"), nodemask); + if ((nodemask = virXPathString("string(./nodemask)", ctxt))) { + if (virBitmapParse(nodemask, &def->sourceNodes, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(def->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'nodemask': %s"), nodemask); + goto cleanup; + } + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!(def->path = virXPathString("string(./path)", ctxt))) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("path is 
required for model nvdimm")); goto cleanup; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } ret = 0; @@ -14538,12 +14558,25 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def, tmp->size != mem->size) continue; - /* source stuff -> match with device */ - if (tmp->pagesize != mem->pagesize) - continue; + switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + /* source stuff -> match with device */ + if (tmp->pagesize != mem->pagesize) + continue; - if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) - continue; + if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (STRNEQ(tmp->path, mem->path)) + continue; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } break; } @@ -21659,23 +21692,35 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, char *bitmap = NULL; int ret = -1; - if (!def->pagesize && !def->sourceNodes) + if (!def->pagesize && !def->sourceNodes && !def->path) return 0; virBufferAddLit(buf, "<source>\n"); virBufferAdjustIndent(buf, 2); - if (def->sourceNodes) { - if (!(bitmap = virBitmapFormat(def->sourceNodes))) - goto cleanup; + switch ((virDomainMemoryModel) def->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (def->sourceNodes) { + if (!(bitmap = virBitmapFormat(def->sourceNodes))) + goto cleanup; - virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap); + } + + if (def->pagesize) + virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", + def->pagesize); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virBufferAsprintf(buf, "<path>%s</path>\n", def->path); + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } - if (def->pagesize) - virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n", - def->pagesize); - 
virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</source>\n"); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3c2f182..1201feb 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1935,6 +1935,7 @@ struct _virDomainRNGDef { typedef enum { VIR_DOMAIN_MEMORY_MODEL_NONE, VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ + VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -1943,6 +1944,7 @@ struct _virDomainMemoryDef { /* source */ virBitmapPtr sourceNodes; unsigned long long pagesize; /* kibibytes */ + char *path; /* target */ int model; /* virDomainMemoryModel */ diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 5325f48..f5a68cc 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3395,6 +3395,12 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) break; + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + virReportError(VIR_ERR_NO_SUPPORT, "%s", + _("nvdimm not supported yet")); + return NULL; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0a3cf0e..8948ac1 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5158,6 +5158,7 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, { switch ((virDomainMemoryModel) mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM && mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml new file mode 100644 index 0000000..e932241 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + 
<name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>1267710</memory> + <currentMemory unit='KiB'>1267710</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + <cpu> + <topology sockets='2' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='219136' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + </memory> + </devices> +</domain> -- 2.8.4
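
The parser change in this patch boils down to two things: the model
string table gains "nvdimm", and <source> handling switches on the model
so that nvdimm requires a <path>. A minimal standalone sketch of that
idea follows. All names here are hypothetical simplifications (the real
code uses VIR_ENUM_IMPL and virDomainMemorySourceDefParseXML), and
treating an empty path as missing is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for virDomainMemoryModel; hypothetical names. */
typedef enum {
    MEMORY_MODEL_NONE,
    MEMORY_MODEL_DIMM,
    MEMORY_MODEL_NVDIMM,
    MEMORY_MODEL_LAST
} memory_model;

/* Same ordering as the string list in the patch: "", "dimm", "nvdimm". */
static const char *const memory_model_names[MEMORY_MODEL_LAST] = {
    "", "dimm", "nvdimm",
};

/* Map the @model attribute to an enum value.
 * Returns -1 for NULL, empty, or unknown strings. */
int
memory_model_from_string(const char *str)
{
    int i;

    if (!str)
        return -1;

    for (i = MEMORY_MODEL_NONE + 1; i < MEMORY_MODEL_LAST; i++) {
        if (strcmp(memory_model_names[i], str) == 0)
            return i;
    }
    return -1;
}

/* Validate the <source> child for a given model: dimm accepts a missing
 * source, nvdimm needs a path. Unlike the patch, this sketch also
 * rejects the NONE/LAST sentinels. Returns 0 if valid, -1 otherwise. */
int
memory_source_validate(int model, const char *path)
{
    switch ((memory_model) model) {
    case MEMORY_MODEL_DIMM:
        return 0;
    case MEMORY_MODEL_NVDIMM:
        return (path && *path) ? 0 : -1;
    case MEMORY_MODEL_NONE:
    case MEMORY_MODEL_LAST:
        break;
    }
    return -1;
}
```

With the test XML above, memory_model_from_string("nvdimm") followed by
memory_source_validate(model, "/tmp/nvdimm") would accept the device,
while the same model without a path would be rejected.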

Introduce a qemu capability for -device nvdimm.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_capabilities.c                     | 2 ++
 src/qemu/qemu_capabilities.h                     | 1 +
 tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index d5b73e6..d32f9e5 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -339,6 +339,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
               "tls-creds-x509", /* 230 */
               "display",
               "intel-iommu",
+              "nvdimm",
     );
@@ -1566,6 +1567,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "pxb-pcie", QEMU_CAPS_DEVICE_PXB_PCIE },
     { "tls-creds-x509", QEMU_CAPS_OBJECT_TLS_CREDS_X509 },
     { "intel-iommu", QEMU_CAPS_DEVICE_INTEL_IOMMU },
+    { "nvdimm", QEMU_CAPS_DEVICE_NVDIMM },
 };
 static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBalloon[] = {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index bd5c6d9..eb4c993 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -372,6 +372,7 @@ typedef enum {
     QEMU_CAPS_OBJECT_TLS_CREDS_X509, /* -object tls-creds-x509 */
     QEMU_CAPS_DISPLAY, /* -display */
     QEMU_CAPS_DEVICE_INTEL_IOMMU, /* -device intel-iommu */
+    QEMU_CAPS_DEVICE_NVDIMM, /* -device nvdimm */
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml
index 653ec75..a19222b 100644
--- a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml
@@ -194,6 +194,7 @@
   <flag name='tls-creds-x509'/>
   <flag name='display'/>
   <flag name='intel-iommu'/>
+  <flag name='nvdimm'/>
   <version>2006000</version>
   <kvmVersion>0</kvmVersion>
   <package></package>
-- 
2.8.4
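
A capability flag like this one is only useful once something consults
it before building the command line. The sketch below shows one way a
per-model check could look, using a plain bitmask instead of libvirt's
virQEMUCaps object. Every name in it is hypothetical, and rejecting the
sentinel models is a choice made for this sketch only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU_CAPS_* flags and memory models. */
typedef enum { CAPS_DEVICE_PC_DIMM, CAPS_DEVICE_NVDIMM, CAPS_LAST } caps_flag;
typedef enum { MODEL_NONE, MODEL_DIMM, MODEL_NVDIMM, MODEL_LAST } mem_model;

/* One bit per capability; enough for this illustration. */
typedef struct {
    unsigned long bits;
} caps_set;

/* Mark @flag as supported by the probed binary. */
void
caps_set_flag(caps_set *caps, caps_flag flag)
{
    caps->bits |= 1UL << flag;
}

/* Query whether @flag was detected. */
bool
caps_get(const caps_set *caps, caps_flag flag)
{
    return (caps->bits >> flag) & 1UL;
}

/* Return true if the binary described by @caps offers the device that
 * backs @model: pc-dimm for dimm, nvdimm for nvdimm. */
bool
memory_model_supported(const caps_set *caps, mem_model model)
{
    switch (model) {
    case MODEL_DIMM:
        return caps_get(caps, CAPS_DEVICE_PC_DIMM);
    case MODEL_NVDIMM:
        return caps_get(caps, CAPS_DEVICE_NVDIMM);
    case MODEL_NONE:
    case MODEL_LAST:
        break;
    }
    return false;
}
```

A binary that only advertises pc-dimm would then pass the check for dimm
devices but fail it for nvdimm ones, which is the distinction the new
flag exists to make.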

So, the majority of the code is ready as-is, with one slight change:
differentiate between dimm and nvdimm in places like device alias
generation, command line generation and so on.

Speaking of the command line, we also need to append 'nvdimm=on' to the
'-machine' argument so that the nvdimm feature is properly advertised in
the ACPI tables.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_alias.c                              | 12 ++-
 src/qemu/qemu_command.c                            | 90 ++++++++++++++--------
 src/qemu/qemu_command.h                            |  1 +
 src/qemu/qemu_domain.c                             | 28 +++++--
 src/qemu/qemu_hotplug.c                            |  2 +-
 .../qemuxml2argv-hugepages-numa.args               |  5 +-
 .../qemuxml2argv-hugepages-pages.args              | 24 +++---
 .../qemuxml2argv-hugepages-pages2.args             |  8 +-
 .../qemuxml2argv-hugepages-pages3.args             |  4 +-
 .../qemuxml2argv-hugepages-shared.args             | 22 +++---
 .../qemuxml2argv-memory-hotplug-dimm-addr.args     |  5 +-
 .../qemuxml2argv-memory-hotplug-dimm.args          |  5 +-
 .../qemuxml2argv-memory-hotplug-nvdimm.args        | 25 ++++++
 tests/qemuxml2argvtest.c                           |  4 +-
 14 files changed, 155 insertions(+), 80 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args

diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c
index 51a654a..57e4cea 100644
--- a/src/qemu/qemu_alias.c
+++ b/src/qemu/qemu_alias.c
@@ -337,13 +337,19 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def,
     size_t i;
     int maxidx = 0;
     int idx;
+    const char *prefix;
+
+    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM)
+        prefix = "dimm";
+    else
+        prefix = "nvdimm";
     for (i = 0; i < def->nmems; i++) {
-        if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, "dimm")) >= maxidx)
+        if ((idx = qemuDomainDeviceAliasIndex(&def->mems[i]->info, prefix)) >= maxidx)
             maxidx = idx + 1;
     }
-    if (virAsprintf(&mem->info.alias, "dimm%d", maxidx) < 0)
+    if (virAsprintf(&mem->info.alias, "%s%d", prefix, maxidx) < 0)
         return -1;
     return 0;
@@ -443,7 +449,7 @@ qemuAssignDeviceAliases(virDomainDefPtr def, virQEMUCapsPtr qemuCaps)
             return -1;
     }
     for (i
= 0; i < def->nmems; i++) { - if (virAsprintf(&def->mems[i]->info.alias, "dimm%zu", i) < 0) + if (qemuAssignDeviceMemoryAlias(def, def->mems[i]) < 0) return -1; } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index f5a68cc..18ba7b6 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3063,6 +3063,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * to, or -1 if NUMA is not used in the guest * @hostNodes: map of host nodes to alloc the memory in, NULL for default * @autoNodeset: fallback nodeset in case of automatic numa placement + * @memPath: request memory-backend-file with specific mem-path * @def: domain definition object * @qemuCaps: qemu capabilities object * @cfg: qemu driver config object @@ -3084,6 +3085,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, @@ -3175,35 +3177,42 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1; - if (pagesize || hugepage) { - if (pagesize) { - /* Now lets see, if the huge page we want to use is even mounted - * and ready to use */ - for (i = 0; i < cfg->nhugetlbfs; i++) { - if (cfg->hugetlbfs[i].size == pagesize) - break; - } + if (memPath || pagesize || hugepage) { + if (pagesize || hugepage) { + if (pagesize) { + /* Now lets see, if the huge page we want to use is even mounted + * and ready to use */ + for (i = 0; i < cfg->nhugetlbfs; i++) { + if (cfg->hugetlbfs[i].size == pagesize) + break; + } - if (i == cfg->nhugetlbfs) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Unable to find any usable hugetlbfs mount for %llu KiB"), - pagesize); - goto cleanup; - } + if (i == cfg->nhugetlbfs) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find any usable hugetlbfs mount for %llu KiB"), + pagesize); + goto cleanup; + } - if (!(mem_path = 
qemuGetHugepagePath(&cfg->hugetlbfs[i]))) - goto cleanup; - } else { - if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, - cfg->nhugetlbfs))) - goto cleanup; + if (!(mem_path = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) + goto cleanup; + } else { + if (!(mem_path = qemuGetDefaultHugepath(cfg->hugetlbfs, + cfg->nhugetlbfs))) + goto cleanup; + } } *backendType = "memory-backend-file"; if (virJSONValueObjectAdd(props, + "s:mem-path", memPath ? memPath : mem_path, + NULL) < 0) + goto cleanup; + + if (!memPath && (pagesize || hugepage) && + virJSONValueObjectAdd(props, "b:prealloc", true, - "s:mem-path", mem_path, NULL) < 0) goto cleanup; @@ -3255,7 +3264,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, } /* If none of the following is requested... */ - if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) { + if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force && !memPath) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; @@ -3311,7 +3320,7 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, goto cleanup; if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset, - def, qemuCaps, cfg, &backendType, + NULL, def, qemuCaps, cfg, &backendType, &props, false)) < 0) goto cleanup; @@ -3353,7 +3362,7 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, auto_nodeset, - def, qemuCaps, cfg, + mem->path, def, qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; @@ -3371,6 +3380,7 @@ char * qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) { virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *device; if (!mem->info.alias) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -3379,8 +3389,15 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) } switch ((virDomainMemoryModel) mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: case 
VIR_DOMAIN_MEMORY_MODEL_DIMM: - virBufferAddLit(&buf, "pc-dimm,"); + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM) + device = "pc-dimm"; + else + device = "nvdimm"; + + virBufferAsprintf(&buf, "%s,", device); if (mem->targetNode >= 0) virBufferAsprintf(&buf, "node=%d,", mem->targetNode); @@ -3395,12 +3412,6 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem) break; - case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: - virReportError(VIR_ERR_NO_SUPPORT, "%s", - _("nvdimm not supported yet")); - return NULL; - break; - case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; @@ -6935,6 +6946,7 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, virQEMUCapsPtr qemuCaps) { bool obsoleteAccel = false; + size_t i; /* This should *never* be NULL, since we always provide * a machine in the capabilities data for QEMU. So this @@ -6970,6 +6982,15 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, "with this QEMU binary")); return -1; } + + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm is not available " + "with this QEMU binary")); + return -1; + } + } } else { virBuffer buf = VIR_BUFFER_INITIALIZER; virTristateSwitch vmport = def->features[VIR_DOMAIN_FEATURE_VMPORT]; @@ -7062,6 +7083,13 @@ qemuBuildMachineCommandLine(virCommandPtr cmd, } } + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + virBufferAddLit(&buf, ",nvdimm=on"); + break; + } + } + virCommandAddArgBuffer(cmd, &buf); } diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index c4d0567..040c0f5 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -119,6 +119,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size, int guestNode, virBitmapPtr userNodeset, virBitmapPtr autoNodeset, + const char *memPath, virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virQEMUDriverConfigPtr cfg, diff --git 
a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8948ac1..51f67df 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5244,12 +5244,6 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, return 0; } - if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("memory hotplug isn't supported by this QEMU binary")); - return -1; - } - if (!ARCH_IS_PPC64(def->os.arch)) { /* due to guest support, qemu would silently enable NUMA with one node * once the memory hotplug backend is enabled. To avoid possible @@ -5273,6 +5267,28 @@ qemuDomainDefValidateMemoryHotplug(const virDomainDef *def, for (i = 0; i < def->nmems; i++) { hotplugMemory += def->mems[i]->size; + switch ((virDomainMemoryModel) def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_PC_DIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("memory hotplug isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_NVDIMM)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("nvdimm isn't supported by this QEMU binary")); + return -1; + } + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } + /* already existing devices don't need to be checked on hotplug */ if (!mem && qemuDomainDefValidateMemoryHotplugDevice(def->mems[i], def) < 0) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index b970448..e4716c2 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1856,7 +1856,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize, mem->targetNode, mem->sourceNodes, NULL, - vm->def, priv->qemuCaps, cfg, + mem->path, vm->def, priv->qemuCaps, cfg, &backendType, &props, true) < 0) goto cleanup; diff --git 
a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args index 2eb006e..dd12751 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args @@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=spice \ -mem-prealloc \ -mem-path /dev/hugepages2M/libvirt/qemu \ -numa node,nodeid=0,cpus=0-1,mem=1024 \ --object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=1-3,\ -policy=bind \ +-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages1G/libvirt/qemu,\ +prealloc=yes,size=1073741824,host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \ -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \ -nodefaults \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args index 9f0e696..2a196ab 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args @@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ --object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind 
\ +-object memory-backend-file,id=ram-node2,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ --object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ -policy=bind \ +-object memory-backend-file,id=ram-node3,\ +mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\ +host-nodes=3,policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index 447bb52..30f87a8 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -10,11 +10,11 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \ +-object memory-backend-file,id=ram-node0,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \ +-object memory-backend-file,id=ram-node1,\ +mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args index 57dd3fa..92045a0 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args @@ -12,8 +12,8 @@ QEMU_AUDIO_DRV=none \ -smp 2,sockets=2,cores=1,threads=1 \ -object memory-backend-ram,id=ram-node0,size=268435456 \ 
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object memory-backend-file,id=ram-node1,prealloc=yes,\
-mem-path=/dev/hugepages1G/libvirt/qemu,size=805306368 \
+-object memory-backend-file,id=ram-node1,\
+mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=805306368 \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1 \
 -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \
 -nographic \
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args
index f9fc218..aaa9e99 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args
@@ -10,21 +10,21 @@ QEMU_AUDIO_DRV=none \
 -M pc \
 -m 4096 \
 -smp 4,sockets=4,cores=1,threads=1 \
--object memory-backend-file,id=ram-node0,prealloc=yes,\
-mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\
-policy=bind \
+-object memory-backend-file,id=ram-node0,\
+mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\
+host-nodes=0-3,policy=bind \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object memory-backend-file,id=ram-node1,prealloc=yes,\
-mem-path=/dev/hugepages2M/libvirt/qemu,share=yes,size=1073741824,\
+-object memory-backend-file,id=ram-node1,\
+mem-path=/dev/hugepages2M/libvirt/qemu,prealloc=yes,share=yes,size=1073741824,\
 host-nodes=0-3,policy=bind \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1 \
--object memory-backend-file,id=ram-node2,prealloc=yes,\
-mem-path=/dev/hugepages1G/libvirt/qemu,share=no,size=1073741824,host-nodes=0-3,\
-policy=bind \
+-object memory-backend-file,id=ram-node2,\
+mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,share=no,size=1073741824,\
+host-nodes=0-3,policy=bind \
 -numa node,nodeid=2,cpus=2,memdev=ram-node2 \
--object memory-backend-file,id=ram-node3,prealloc=yes,\
-mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\
-policy=bind \
+-object memory-backend-file,id=ram-node3,\
+mem-path=/dev/hugepages1G/libvirt/qemu,prealloc=yes,size=1073741824,\
+host-nodes=3,policy=bind \
 -numa node,nodeid=3,cpus=3,memdev=ram-node3 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -nographic \
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args
index 1c881c6..ea46c82 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args
@@ -11,9 +11,8 @@ QEMU_AUDIO_DRV=none \
 -m size=219136k,slots=16,maxmem=1099511627776k \
 -smp 2,sockets=2,cores=1,threads=1 \
 -numa node,nodeid=0,cpus=0-1,mem=214 \
--object memory-backend-file,id=memdimm0,prealloc=yes,\
-mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\
-policy=bind \
+-object memory-backend-file,id=memdimm0,mem-path=/dev/hugepages2M/libvirt/qemu,\
+prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \
 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0,addr=4294967296 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -nographic \
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args
index fa64fcf..dc58614 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args
@@ -13,9 +13,8 @@ QEMU_AUDIO_DRV=none \
 -numa node,nodeid=0,cpus=0-1,mem=214 \
 -object memory-backend-ram,id=memdimm0,size=536870912 \
 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0 \
--object memory-backend-file,id=memdimm1,prealloc=yes,\
-mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\
-policy=bind \
+-object memory-backend-file,id=memdimm1,mem-path=/dev/hugepages2M/libvirt/qemu,\
+prealloc=yes,size=536870912,host-nodes=1-3,policy=bind \
 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -nographic \
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args
new file mode 100644
index 0000000..8cda774
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.args
@@ -0,0 +1,25 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=tcg,nvdimm=on \
+-m size=219136k,slots=16,maxmem=1099511627776k \
+-smp 2,sockets=2,cores=1,threads=1 \
+-numa node,nodeid=0,cpus=0-1,mem=214 \
+-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,size=536870912 \
+-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-nographic \
+-nodefaults \
+-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \
+-no-acpi \
+-boot c \
+-usb \
+-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \
+-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index a5d51a8..a53e49f 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1946,7 +1946,7 @@ mymain(void)

     DO_TEST_FAILURE("memory-align-fail", NONE);
     DO_TEST_FAILURE("memory-hotplug-nonuma", QEMU_CAPS_DEVICE_PC_DIMM);
-    DO_TEST_FAILURE("memory-hotplug", NONE);
+    DO_TEST("memory-hotplug", NONE);
     DO_TEST("memory-hotplug", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA);
     DO_TEST("memory-hotplug-dimm", QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA,
             QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
@@ -1954,6 +1954,8 @@ mymain(void)
             QEMU_CAPS_OBJECT_MEMORY_FILE);
     DO_TEST("memory-hotplug-ppc64-nonuma", QEMU_CAPS_KVM, QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_NUMA,
             QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
+    DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM,
+            QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);

     DO_TEST("machine-aeskeywrap-on-caps",
             QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP,
--
2.8.4

Now that NVDIMM has found its way into libvirt, users might want to
fine-tune some settings for each module separately. One such setting
is 'share=on|off' for the memory-backend-file object. As its name
suggests, this setting enables sharing the nvdimm module with other
applications. Under the hood it controls whether qemu mmap()s the
file as MAP_PRIVATE or MAP_SHARED.

We already have such a config knob in the domain XML, but it is just
an attribute of the numa <cell/>. That does not allow tuning on a
per-memdevice basis, so we need the attribute for each device too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 docs/formatdomain.html.in                          | 15 ++++++-
 docs/schemas/domaincommon.rng                      | 19 ++++++---
 src/conf/domain_conf.c                             | 15 ++++++-
 src/conf/domain_conf.h                             |  2 +
 src/libvirt_private.syms                           |  2 +
 ...emuxml2argv-memory-hotplug-nvdimm-memAccess.xml | 49 ++++++++++++++++++++++
 6 files changed, 93 insertions(+), 9 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 981b820..5df8a00 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1328,7 +1328,7 @@
         <span class='since'>Since 1.2.9</span> the optional attribute
         <code>memAccess</code> can control whether the memory is to be
         mapped as "shared" or "private". This is valid only for
-        hugepages-backed memory.
+        hugepages-backed memory and nvdimm modules.
       </p>

       <p>
@@ -6686,7 +6686,7 @@ qemu-kvm -net nic,model=? /dev/null
 <pre>
   ...
   <devices>
-    <memory model='dimm'>
+    <memory model='dimm' memAccess='private'>
       <target>
         <size unit='KiB'>524287</size>
         <node>0</node>
@@ -6724,6 +6724,17 @@ qemu-kvm -net nic,model=? /dev/null
         </p>
       </dd>

+      <dt><code>memAccess</code></dt>
+      <dd>
+        <p>
+          Then there is optional attribute <code>memAccess</code>
+          (<span class="since">Since 2.2.0</span>) that allows
+          uses to fine tune mapping of the memory on per module
+          basis. Values are the same as for numa <cell/>:
+          <code>shared</code> and <code>private</code>.
+        </p>
+      </dd>
+
       <dt><code>source</code></dt>
       <dd>
         <p>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 3265c2b..472d05b 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -4438,6 +4438,15 @@
     </element>
   </define>

+  <define name="memAccess">
+    <attribute name="memAccess">
+      <choice>
+        <value>shared</value>
+        <value>private</value>
+      </choice>
+    </attribute>
+  </define>
+
   <define name="numaCell">
     <element name="cell">
       <optional>
@@ -4457,12 +4466,7 @@
         </attribute>
       </optional>
       <optional>
-        <attribute name="memAccess">
-          <choice>
-            <value>shared</value>
-            <value>private</value>
-          </choice>
-        </attribute>
+        <ref name="memAccess"/>
       </optional>
     </element>
   </define>
@@ -4696,6 +4700,9 @@
           <value>nvdimm</value>
         </choice>
       </attribute>
+      <optional>
+        <ref name="memAccess"/>
+      </optional>
       <interleave>
         <optional>
           <ref name="memorydev-source"/>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 60f1f21..9299b72 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -13242,6 +13242,15 @@ virDomainMemoryDefParseXML(xmlNodePtr memdevNode,
     }
     VIR_FREE(tmp);

+    tmp = virXMLPropString(memdevNode, "memAccess");
+    if (tmp &&
+        (def->memAccess = virNumaMemAccessTypeFromString(tmp)) <= 0) {
+        virReportError(VIR_ERR_XML_ERROR,
+                       _("invalid memAccess mode '%s'"), tmp);
+        goto error;
+    }
+    VIR_FREE(tmp);
+
     /* source */
     if ((node = virXPathNode("./source", ctxt)) &&
         virDomainMemorySourceDefParseXML(node, ctxt, def) < 0)
@@ -21754,7 +21763,11 @@ virDomainMemoryDefFormat(virBufferPtr buf,
 {
     const char *model = virDomainMemoryModelTypeToString(def->model);

-    virBufferAsprintf(buf, "<memory model='%s'>\n", model);
+    virBufferAsprintf(buf, "<memory model='%s'", model);
+    if (def->memAccess)
+        virBufferAsprintf(buf, " memAccess='%s'",
+                          virNumaMemAccessTypeToString(def->memAccess));
+    virBufferAddLit(buf, ">\n");
     virBufferAdjustIndent(buf, 2);

     if (virDomainMemorySourceDefFormat(buf, def) < 0)
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 1201feb..b0c83b2 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1941,6 +1941,8 @@ typedef enum {
 } virDomainMemoryModel;

 struct _virDomainMemoryDef {
+    virNumaMemAccess memAccess;
+
     /* source */
     virBitmapPtr sourceNodes;
     unsigned long long pagesize; /* kibibytes */
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 89c4511..64ba619 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2053,6 +2053,8 @@ virNumaGetNodeMemory;
 virNumaGetPageInfo;
 virNumaGetPages;
 virNumaIsAvailable;
+virNumaMemAccessTypeFromString;
+virNumaMemAccessTypeToString;
 virNumaNodeIsAvailable;
 virNumaNodesetIsAvailable;
 virNumaSetPagePoolSize;
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml
new file mode 100644
index 0000000..4ebea01
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.xml
@@ -0,0 +1,49 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory>
+  <memory unit='KiB'>1267710</memory>
+  <currentMemory unit='KiB'>1267710</currentMemory>
+  <vcpu placement='static' cpuset='0-1'>2</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <idmap>
+    <uid start='0' target='1000' count='10'/>
+    <gid start='0' target='1000' count='10'/>
+  </idmap>
+  <cpu>
+    <topology sockets='2' cores='1' threads='1'/>
+    <numa>
+      <cell id='0' cpus='0-1' memory='219136' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu</emulator>
+    <disk type='block' device='disk'>
+      <source dev='/dev/HostVG/QEMUGuest1'/>
+      <target dev='hda' bus='ide'/>
+      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+    </disk>
+    <controller type='ide' index='0'/>
+    <controller type='usb' index='0'/>
+    <controller type='pci' index='0' model='pci-root'/>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <memballoon model='virtio'/>
+    <memory model='nvdimm' memAccess='private'>
+      <source>
+        <path>/tmp/nvdimm</path>
+      </source>
+      <target>
+        <size unit='KiB'>523264</size>
+        <node>0</node>
+      </target>
+    </memory>
+  </devices>
+</domain>
--
2.8.4
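
For readers less familiar with the mmap() flags mentioned in the commit message above, the following is a minimal standalone sketch of the difference between MAP_SHARED and MAP_PRIVATE. It is not qemu or libvirt code. The helper name write_reaches_file() and any path passed to it are made up for illustration, and the behaviour described is what one would expect on Linux.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of qemu or libvirt.
 *
 * Creates (and truncates!) a scratch file at `path`, writes `msg` through
 * a mapping made with `map_flags` (MAP_SHARED or MAP_PRIVATE), then reads
 * the file back with pread(). The scratch file is removed afterwards.
 *
 * Returns 1 if the write is visible in the file, 0 if it is not,
 * and -1 on error.
 */
static int
write_reaches_file(const char *path, int map_flags, const char *msg)
{
    const size_t len = 4096;
    char readback[64] = { 0 };
    size_t msglen = strlen(msg);
    int ret = -1;
    int fd;
    char *map;

    if (msglen == 0 || msglen >= sizeof(readback))
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Give the backing file a size, much like "truncate -s" would. */
    if (ftruncate(fd, len) < 0)
        goto cleanup;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (map == MAP_FAILED)
        goto cleanup;

    memcpy(map, msg, msglen);

    /* Flush a shared mapping to the file; this changes nothing for a
     * private mapping, whose store went to a copy-on-write page. */
    msync(map, len, MS_SYNC);

    if (pread(fd, readback, msglen, 0) == (ssize_t)msglen)
        ret = memcmp(readback, msg, msglen) == 0;

    munmap(map, len);
 cleanup:
    close(fd);
    unlink(path);
    return ret;
}
```

Called with MAP_SHARED, the helper should report that the data reached the file, which corresponds to memAccess='shared' (share=yes). Called with MAP_PRIVATE, the file keeps its zeroes, which corresponds to memAccess='private' (share=no).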

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c                            | 11 +++++++--
 src/qemu/qemu_command.h                            |  1 +
 src/qemu/qemu_hotplug.c                            |  3 ++-
 ...muxml2argv-memory-hotplug-nvdimm-memAccess.args | 26 ++++++++++++++++++++++
 tests/qemuxml2argvtest.c                           |  2 ++
 5 files changed, 40 insertions(+), 3 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 18ba7b6..5e93824 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3064,6 +3064,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd,
  * @hostNodes: map of host nodes to alloc the memory in, NULL for default
  * @autoNodeset: fallback nodeset in case of automatic numa placement
  * @memPath: request memory-backend-file with specific mem-path
+ * @memAccessReq: specifically requested memAccess mode
  * @def: domain definition object
  * @qemuCaps: qemu capabilities object
  * @cfg: qemu driver config object
@@ -3086,6 +3087,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
                           virBitmapPtr userNodeset,
                           virBitmapPtr autoNodeset,
                           const char *memPath,
+                          virNumaMemAccess memAccessReq,
                           virDomainDefPtr def,
                           virQEMUCapsPtr qemuCaps,
                           virQEMUDriverConfigPtr cfg,
@@ -3121,6 +3123,9 @@ qemuBuildMemoryBackendStr(unsigned long long size,
         memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode);
     }

+    if (memAccessReq)
+        memAccess = memAccessReq;
+
     if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 &&
         virDomainNumatuneGetMode(def->numa, -1, &mode) < 0)
         mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT;
@@ -3320,7 +3325,8 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
         goto cleanup;

     if ((rc = qemuBuildMemoryBackendStr(memsize, 0, cell, NULL, auto_nodeset,
-                                        NULL, def, qemuCaps, cfg, &backendType,
+                                        NULL, VIR_NUMA_MEM_ACCESS_DEFAULT,
+                                        def, qemuCaps, cfg, &backendType,
                                         &props, false)) < 0)
         goto cleanup;

@@ -3362,7 +3368,8 @@ qemuBuildMemoryDimmBackendStr(virDomainMemoryDefPtr mem,

     if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize,
                                   mem->targetNode, mem->sourceNodes, auto_nodeset,
-                                  mem->path, def, qemuCaps, cfg,
+                                  mem->path, mem->memAccess,
+                                  def, qemuCaps, cfg,
                                   &backendType, &props, true) < 0)
         goto cleanup;

diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index 040c0f5..ddf30b0 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -120,6 +120,7 @@ int qemuBuildMemoryBackendStr(unsigned long long size,
                               virBitmapPtr userNodeset,
                               virBitmapPtr autoNodeset,
                               const char *memPath,
+                              virNumaMemAccess memAccessReq,
                               virDomainDefPtr def,
                               virQEMUCapsPtr qemuCaps,
                               virQEMUDriverConfigPtr cfg,
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index e4716c2..bba83c9 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1856,7 +1856,8 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver,

     if (qemuBuildMemoryBackendStr(mem->size, mem->pagesize,
                                   mem->targetNode, mem->sourceNodes, NULL,
-                                  mem->path, vm->def, priv->qemuCaps, cfg,
+                                  mem->path, mem->memAccess, vm->def,
+                                  priv->qemuCaps, cfg,
                                   &backendType, &props, true) < 0)
         goto cleanup;

diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args
new file mode 100644
index 0000000..9446259
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm-memAccess.args
@@ -0,0 +1,26 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=tcg,nvdimm=on \
+-m size=219136k,slots=16,maxmem=1099511627776k \
+-smp 2,sockets=2,cores=1,threads=1 \
+-numa node,nodeid=0,cpus=0-1,mem=214 \
+-object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,share=no,\
+size=536870912 \
+-device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-nographic \
+-nodefaults \
+-monitor unix:/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \
+-no-acpi \
+-boot c \
+-usb \
+-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \
+-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index a53e49f..1205a73 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1956,6 +1956,8 @@ mymain(void)
             QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
     DO_TEST("memory-hotplug-nvdimm", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM,
             QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);
+    DO_TEST("memory-hotplug-nvdimm-memAccess", QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_DEVICE_NVDIMM,
+            QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_OBJECT_MEMORY_FILE);

     DO_TEST("machine-aeskeywrap-on-caps",
             QEMU_CAPS_MACHINE_OPT, QEMU_CAPS_AES_KEY_WRAP,
--
2.8.4
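
The precedence this patch introduces (a per-device memAccess, when set, wins over the one inherited from the guest NUMA cell) can be sketched in isolation as below. This is only an illustration: the demo* names and the DEMO_* enum are invented stand-ins and are not libvirt's types. The share=yes / share=no spelling is taken from the .args files in this series.

```c
#include <stddef.h>

/* Invented stand-in for libvirt's virNumaMemAccess; 0 means "not set". */
typedef enum {
    DEMO_MEM_ACCESS_DEFAULT = 0,
    DEMO_MEM_ACCESS_SHARED,
    DEMO_MEM_ACCESS_PRIVATE,
} demoMemAccess;

/* A mode requested on the <memory/> device overrides the NUMA cell's. */
static demoMemAccess
demoEffectiveMemAccess(demoMemAccess cellAccess, demoMemAccess deviceAccess)
{
    if (deviceAccess != DEMO_MEM_ACCESS_DEFAULT)
        return deviceAccess;
    return cellAccess;
}

/* Map the effective mode to the backend property seen on the command
 * line; NULL means no share= property is emitted at all. */
static const char *
demoShareProperty(demoMemAccess access)
{
    switch (access) {
    case DEMO_MEM_ACCESS_SHARED:
        return "share=yes";
    case DEMO_MEM_ACCESS_PRIVATE:
        return "share=no";
    case DEMO_MEM_ACCESS_DEFAULT:
        break;
    }
    return NULL;
}
```

In the new test case the cell sets no memAccess and the nvdimm device requests 'private', which is why the expected command line carries share=no.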

On Mon, Aug 01, 2016 at 05:10:04PM +0200, Michal Privoznik wrote:
NVDIMM was introduced to qemu in v2.6.0-rc0~248^2~25. So it's been a while since then.
It's not the next big thing, but it is very interesting feature enabling higher performance as reading/writing to the module (and subsequently to the file on the host) does not require a VMEXIT. It can be used to access host files directly bypassing page cache whilst doing so.
Looks good. Main thing from the libguestfs point of view is it supports setting the shared access to 'private', so we can use it as a way to accelerate the appliance root disk, which might also be of interest to other libvirt users.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

On Mon, Aug 1, 2016 at 4:10 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
NVDIMM was introduced to qemu in v2.6.0-rc0~248^2~25. So it's been a while since then.
It's not the next big thing, but it is very interesting feature enabling higher performance as reading/writing to the module (and subsequently to the file on the host) does not require a VMEXIT. It can be used to access host files directly bypassing page cache whilst doing so.
How to test the feature? 1) you need PMEM enabled kernel: CONFIG_LIBNVDIMM=y CONFIG_BLK_DEV_PMEM=m CONFIG_ACPI_NFIT=m
2) Create a file in the host: truncate -s 512M /tmp/nvdimm
3) Add the following to the domain XML:
<memory model='nvdimm' memAccess='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory>
The "nvdimm" device also has a label-size property. This determines the size of the Namespace Label area described in: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

By default no Namespace Label area is reserved in the file. If the user specifies label-size then the memory at the end of the file is used as the Namespace Label area.

It is necessary to expose label-size so users can choose whether or not to have a Namespace Label area.

I have CCed Guangrong Xiao who authored the QEMU patches.

Stefan

On 04.08.2016 10:42, Stefan Hajnoczi wrote:
On Mon, Aug 1, 2016 at 4:10 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
NVDIMM was introduced to qemu in v2.6.0-rc0~248^2~25. So it's been a while since then.
It's not the next big thing, but it is very interesting feature enabling higher performance as reading/writing to the module (and subsequently to the file on the host) does not require a VMEXIT. It can be used to access host files directly bypassing page cache whilst doing so.
How to test the feature? 1) you need PMEM enabled kernel: CONFIG_LIBNVDIMM=y CONFIG_BLK_DEV_PMEM=m CONFIG_ACPI_NFIT=m
2) Create a file in the host: truncate -s 512M /tmp/nvdimm
3) Add the following to the domain XML:
<memory model='nvdimm' memAccess='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory>
The "nvdimm" device also has a label-size property. This determines the size of the Namespace Label area described in: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
By default no Namespace Label area is reserved in the file. If the user specifies label-size then the memory at the end of the file is used as the Namespace Label area.
It is necessary to expose label-size so users can choose whether or not to have a Namespace Label area.
I have CCed Guangrong Xiao who authored the QEMU patches.
Ah, thank you for that. From the code I understand that the minimum value for that is 128K, but what about other restrictions? Does the number need to be aligned? What is the minimal step between two different values?

Michal

On 08/11/2016 04:12 PM, Michal Privoznik wrote:
On 04.08.2016 10:42, Stefan Hajnoczi wrote:
On Mon, Aug 1, 2016 at 4:10 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
NVDIMM was introduced to qemu in v2.6.0-rc0~248^2~25. So it's been a while since then.
It's not the next big thing, but it is very interesting feature enabling higher performance as reading/writing to the module (and subsequently to the file on the host) does not require a VMEXIT. It can be used to access host files directly bypassing page cache whilst doing so.
How to test the feature? 1) you need PMEM enabled kernel: CONFIG_LIBNVDIMM=y CONFIG_BLK_DEV_PMEM=m CONFIG_ACPI_NFIT=m
2) Create a file in the host: truncate -s 512M /tmp/nvdimm
3) Add the following to the domain XML:
<memory model='nvdimm' memAccess='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>523264</size> <node>0</node> </target> </memory>
The "nvdimm" device also has a label-size property. This determines the size of the Namespace Label area described in: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
By default no Namespace Label area is reserved in the file. If the user specifies label-size then the memory at the end of the file is used as the Namespace Label area.
It is necessary to expose label-size so users can choose whether or not to have a Namespace Label area.
I have CCed Guangrong Xiao who authored the QEMU patches.
Ah, thank you for that. From the code I understand that the minimum value for that is 128K, but what about other restrictions? Does the number need to be aligned? What is the minimal step between two different values?
Michal,

There are only two restrictions:
1) The minimum label size is 128K, as you pointed out.
2) The remaining size (total-size - label-size, which is used for PMEM), aligned to 4k for a normal backend or to the hugepage size for hugetlbfs, cannot be 0.

Thanks!
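
The two restrictions above could be checked along the following lines. This is a hedged sketch, not QEMU or libvirt code: the function name and the DEMO_MIN_LABEL_SIZE macro are invented here, and reading "aligned" as "rounded down to a multiple of the alignment" is an assumption about the intended behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum Namespace Label area size mentioned in the thread: 128K. */
#define DEMO_MIN_LABEL_SIZE (128 * 1024ULL)

/*
 * Hypothetical validator, not taken from QEMU or libvirt.
 *
 * total: size of the backing file in bytes
 * label: requested label-size in bytes
 * align: 4096 for a normal backend, or the hugepage size for hugetlbfs
 */
static bool
demoLabelSizeIsValid(uint64_t total, uint64_t label, uint64_t align)
{
    uint64_t pmem;

    /* Restriction 1: the label area must be at least 128K. It also has
     * to leave something of the file over for PMEM. */
    if (align == 0 || label < DEMO_MIN_LABEL_SIZE || label >= total)
        return false;

    /* Restriction 2: what remains for PMEM, rounded down to the backend
     * alignment (assumed interpretation), must not be 0. */
    pmem = (total - label) / align * align;
    return pmem != 0;
}
```

For the 512M file from the cover letter, a 128K label with 4k alignment passes. A 2M hugepage-backed file with a 128K label fails, because the remaining space rounds down to zero hugepages.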
participants (4): Michal Privoznik, Richard W.M. Jones, Stefan Hajnoczi, Xiao Guangrong