[PATCH v1 0/4] revert latest PPC64 NVDIMM changes

Hi,

In [1], Daniel pointed out that the changes made by [2] are not compliant with the Libvirt design principle of never doing anything that can compromise guests in the wild. This series aims to solve that by reverting the changes made by [2]. It also enhances the auto-alignment of ppc64 NVDIMMs so that the size reported in the domain XML is consistent with what QEMU will see, while not hurting any running guests that are already using the auto-alignment feature.

For simplicity, the first patch contains the revert of all relevant patches from [2] in a single commit.

[1] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html
[2] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html

Daniel Henrique Barboza (4):
  qemu: revert latest pSeries NVDIMM design changes
  conf, qemu: move qemuDomainNVDimmAlignSizePseries to domain_conf.c
  domain_conf.c: auto-align pSeries NVDIMM in virDomainMemoryDefParseXML()
  NEWS.rst: document the pSeries NVDIMM auto-alignment revival

 NEWS.rst                                      | 12 ++++
 docs/formatdomain.rst                         |  6 +-
 src/conf/domain_conf.c                        | 71 +++++++++++++++----
 src/conf/domain_conf.h                        |  3 +
 src/libvirt_private.syms                      |  1 +
 src/qemu/qemu_domain.c                        | 19 +++--
 src/qemu/qemu_domain.h                        |  6 +-
 src/qemu/qemu_hotplug.c                       |  6 +-
 src/qemu/qemu_validate.c                      | 42 ++---------
 ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args |  2 +-
 .../memory-hotplug-nvdimm-ppc64.xml           |  2 +-
 .../memory-hotplug-nvdimm-ppc64.xml           |  2 +-
 12 files changed, 105 insertions(+), 67 deletions(-)

-- 
2.26.2

In [1], changes were made to remove the existing auto-alignment for pSeries
NVDIMM devices. That design promotes strange situations where the NVDIMM
size reported in the domain XML is different from what QEMU is actually
using. We removed the auto-alignment and relied on standard size validation.

However, this goes against Libvirt design philosophy of not tampering with
existing guest behavior, as pointed out by Daniel in [2]. Since we can't know
for sure whether there are guests that are relying on the auto-alignment
feature to work, the changes made in [1] are a direct violation of this rule.

This patch reverts [1] entirely, re-enabling auto-alignment for pSeries
NVDIMM as it was before. Changes will be made to ease the limitations of
this design without hurting existing guests.

This reverts the following commits:

- commit 0ee56369c8b4f2f898b6aa1ff1f51ab033be1c02, "qemu_domain.c: change
  qemuDomainMemoryDeviceAlignSize() return type".
- commit 07de813924caf37e535855541c0c1183d9d382e2, "qemu_domain.c: do not
  auto-align ppc64 NVDIMMs"
- commit 0ccceaa57c50e5ee528f7073fa8723afd62b88b7, "qemu_validate.c: add
  pSeries NVDIMM size alignment validation"
- commit 4fa2202d884414ad34d9952e72fb39b1d93c7e14, "qemu_domain.c: make
  qemuDomainGetMemorySizeAlignment() public"
- commit 2d93cbdea9d1b8dbf36bc0ffee6cb73d83d208c7, "Revert
  "formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic""

[1] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html
[2] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html
---
 docs/formatdomain.rst                         |  6 +-
 src/qemu/qemu_domain.c                        | 57 +++++++++++++++++--
 src/qemu/qemu_domain.h                        |  6 +-
 src/qemu/qemu_hotplug.c                       |  6 +-
 src/qemu/qemu_validate.c                      | 42 ++------------
 ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args |  2 +-
 .../memory-hotplug-nvdimm-ppc64.xml           |  2 +-
 .../memory-hotplug-nvdimm-ppc64.xml           |  2 +-
 8 files changed, 70 insertions(+), 53 deletions(-)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 18e237c157..d5930a4ac1 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -7216,8 +7216,10 @@ Example: usage of the memory devices
    following restrictions apply:
 
    #. the minimum label size is 128KiB,
-   #. the remaining size (total-size - label-size) will be aligned
-      to 4KiB as default.
+   #. the remaining size (total-size - label-size), also called guest area,
+      will be aligned to 4KiB as default. For pSeries guests, the guest area
+      will be aligned down to 256MiB, and the minimum size of the guest area
+      must be at least 256MiB.
 
 ``readonly``
    The ``readonly`` element is used to mark the vNVDIMM as read-only. Only
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 03917cf000..4f796bef4a 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8037,7 +8037,7 @@ qemuDomainAgentAvailable(virDomainObjPtr vm,
 }
 
 
-unsigned long long
+static unsigned long long
 qemuDomainGetMemorySizeAlignment(const virDomainDef *def)
 {
     /* PPC requires the memory sizes to be rounded to 256MiB increments, so
@@ -8067,6 +8067,44 @@ qemuDomainGetMemoryModuleSizeAlignment(const virDomainDef *def,
 }
 
 
+static int
+qemuDomainNVDimmAlignSizePseries(virDomainDefPtr def,
+                                 virDomainMemoryDefPtr mem)
+{
+    /* For NVDIMMs in ppc64 in we want to align down the guest
+     * visible space, instead of align up, to avoid writing
+     * beyond the end of file by adding a potential 256MiB
+     * to the user specified size.
+     *
+     * The label-size is mandatory for ppc64 as well, meaning that
+     * the guest visible space will be target_size-label_size.
+     *
+     * Finally, target_size must include label_size.
+     *
+     * The above can be summed up as follows:
+     *
+     * target_size = AlignDown(target_size - label_size) + label_size
+     */
+    unsigned long long ppc64AlignSize = qemuDomainGetMemorySizeAlignment(def);
+    unsigned long long guestArea = mem->size - mem->labelsize;
+
+    /* Align down guest_area. 256MiB is the minimum size. Error
+     * out if target_size is smaller than 256MiB + label_size,
+     * since aligning it up will cause QEMU errors. */
+    if (mem->size < (ppc64AlignSize + mem->labelsize)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("minimum target size for the NVDIMM "
+                         "must be 256MB plus the label size"));
+        return -1;
+    }
+
+    guestArea = (guestArea/ppc64AlignSize) * ppc64AlignSize;
+    mem->size = guestArea + mem->labelsize;
+
+    return 0;
+}
+
+
 int
 qemuDomainAlignMemorySizes(virDomainDefPtr def)
 {
@@ -8113,8 +8151,11 @@ qemuDomainAlignMemorySizes(virDomainDefPtr def)
 
     /* Align memory module sizes */
     for (i = 0; i < def->nmems; i++) {
-        if (!(def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
-              ARCH_IS_PPC64(def->os.arch))) {
+        if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
+            ARCH_IS_PPC64(def->os.arch)) {
+            if (qemuDomainNVDimmAlignSizePseries(def, def->mems[i]) < 0)
+                return -1;
+        } else {
             align = qemuDomainGetMemoryModuleSizeAlignment(def, def->mems[i]);
             def->mems[i]->size = VIR_ROUND_UP(def->mems[i]->size, align);
         }
@@ -8143,15 +8184,19 @@ qemuDomainAlignMemorySizes(virDomainDefPtr def)
  * inplace. Default rounding is now to 1 MiB (qemu requires rounding to page,
  * size so this should be safe). */
-void
+int
 qemuDomainMemoryDeviceAlignSize(virDomainDefPtr def,
                                 virDomainMemoryDefPtr mem)
 {
-    if (!(mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
-          ARCH_IS_PPC64(def->os.arch))) {
+    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
+        ARCH_IS_PPC64(def->os.arch)) {
+        return qemuDomainNVDimmAlignSizePseries(def, mem);
+    } else {
         mem->size = VIR_ROUND_UP(mem->size,
                                  qemuDomainGetMemorySizeAlignment(def));
     }
+
+    return 0;
 }
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index adba79aded..e6d4acb8d4 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -751,11 +751,9 @@ bool qemuDomainDiskBlockJobIsActive(virDomainDiskDefPtr disk);
 bool qemuDomainHasBlockjob(virDomainObjPtr vm, bool copy_only)
     ATTRIBUTE_NONNULL(1);
 
-unsigned long long qemuDomainGetMemorySizeAlignment(const virDomainDef *def);
-
 int qemuDomainAlignMemorySizes(virDomainDefPtr def);
-void qemuDomainMemoryDeviceAlignSize(virDomainDefPtr def,
-                                     virDomainMemoryDefPtr mem);
+int qemuDomainMemoryDeviceAlignSize(virDomainDefPtr def,
+                                    virDomainMemoryDefPtr mem);
 
 virDomainChrDefPtr qemuFindAgentConfig(virDomainDefPtr def);
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index e2c6e14c2e..3a780d00af 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -2368,7 +2368,8 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver,
     int id;
     int ret = -1;
 
-    qemuDomainMemoryDeviceAlignSize(vm->def, mem);
+    if (qemuDomainMemoryDeviceAlignSize(vm->def, mem) < 0)
+        goto cleanup;
 
     if (qemuDomainDefValidateMemoryHotplug(vm->def, priv->qemuCaps, mem) < 0)
         goto cleanup;
@@ -5612,7 +5613,8 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm,
     virDomainMemoryDefPtr mem;
     int idx;
 
-    qemuDomainMemoryDeviceAlignSize(vm->def, match);
+    if (qemuDomainMemoryDeviceAlignSize(vm->def, match) < 0)
+        return -1;
 
     if ((idx = virDomainMemoryFindByDef(vm->def, match)) < 0) {
         virReportError(VIR_ERR_DEVICE_MISSING,
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 070f1c962b..c56ad70623 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -4034,45 +4034,15 @@ qemuValidateDomainDeviceDefHub(virDomainHubDefPtr hub,
 }
 
 
-static unsigned long long
-qemuValidateGetNVDIMMAlignedSizePseries(virDomainMemoryDefPtr mem,
-                                        const virDomainDef *def)
-{
-    unsigned long long ppc64AlignSize = qemuDomainGetMemorySizeAlignment(def);
-    unsigned long long guestArea = mem->size - mem->labelsize;
-
-    /* NVDIMM is already aligned */
-    if (guestArea % ppc64AlignSize == 0)
-        return mem->size;
-
-    /* Suggested aligned size is rounded up */
-    guestArea = (guestArea/ppc64AlignSize + 1) * ppc64AlignSize;
-    return guestArea + mem->labelsize;
-}
-
-
 static int
 qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem,
-                                  const virDomainDef *def,
                                   virQEMUCapsPtr qemuCaps)
 {
-    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
-        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_NVDIMM)) {
-            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                           _("nvdimm isn't supported by this QEMU binary"));
-            return -1;
-        }
-
-        if (qemuDomainIsPSeries(def)) {
-            unsigned long long alignedNVDIMMSize =
-                qemuValidateGetNVDIMMAlignedSizePseries(mem, def);
-
-            if (mem->size != alignedNVDIMMSize) {
-                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-                               _("nvdimm size is not aligned. Suggested aligned "
-                                 "size: %llu KiB"), alignedNVDIMMSize);
-                return -1;
-            }
-        }
+    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
+        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_NVDIMM)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("nvdimm isn't supported by this QEMU binary"));
+        return -1;
     }
 
     return 0;
@@ -4190,7 +4160,7 @@ qemuValidateDomainDeviceDef(const virDomainDeviceDef *dev,
         break;
 
     case VIR_DOMAIN_DEVICE_MEMORY:
-        ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, def, qemuCaps);
+        ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, qemuCaps);
         break;
 
     case VIR_DOMAIN_DEVICE_LEASE:
diff --git a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.ppc64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.ppc64-latest.args
index 58e3f9e161..eff80dcf80 100644
--- a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.ppc64-latest.args
+++ b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.ppc64-latest.args
@@ -19,7 +19,7 @@ file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
 -smp 2,sockets=2,dies=1,cores=1,threads=1 \
 -numa node,nodeid=0,cpus=0-1,mem=1024 \
 -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm,\
-size=805437440 \
+size=537001984 \
 -device nvdimm,node=0,label-size=131072,\
 uuid=49545eb3-75e1-2d0a-acdd-f0294406c99e,memdev=memnvdimm0,id=nvdimm0,slot=0 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
diff --git a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.xml b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.xml
index 10c146e8cf..ae5a17d3c8 100644
--- a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.xml
+++ b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-ppc64.xml
@@ -38,7 +38,7 @@
       <path>/tmp/nvdimm</path>
     </source>
     <target>
-      <size unit='KiB'>786560</size>
+      <size unit='KiB'>550000</size>
       <node>0</node>
       <label>
         <size unit='KiB'>128</size>
diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml b/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
index 10c146e8cf..ae5a17d3c8 100644
--- a/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
+++ b/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
@@ -38,7 +38,7 @@
       <path>/tmp/nvdimm</path>
     </source>
     <target>
-      <size unit='KiB'>786560</size>
+      <size unit='KiB'>550000</size>
       <node>0</node>
       <label>
         <size unit='KiB'>128</size>
-- 
2.26.2
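
As an aside, the align-down rule described in the code comment above can be checked with a small standalone program (written here purely for illustration, not part of the patch). It reproduces the numbers used by the test data: a 550000 KiB NVDIMM with a 128 KiB label becomes the size=537001984 (bytes) that the .args file now expects.

  #include <stdio.h>

  /* Illustration of: target_size = AlignDown(target_size - label_size) + label_size.
   * All sizes are in KiB; the ppc64 alignment is 256 MiB = 262144 KiB. */
  int main(void)
  {
      unsigned long long alignKiB = 256 * 1024;       /* 256 MiB */
      unsigned long long sizeKiB = 550000;            /* <size> from the test XML */
      unsigned long long labelKiB = 128;              /* <label><size> */
      unsigned long long guestArea = sizeKiB - labelKiB;

      guestArea = (guestArea / alignKiB) * alignKiB;  /* align down */

      unsigned long long aligned = guestArea + labelKiB;
      printf("aligned: %llu KiB = %llu bytes\n", aligned, aligned * 1024);
      /* prints: aligned: 524416 KiB = 537001984 bytes */
      return 0;
  }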

On Mon, 2020-09-14 at 23:42 -0300, Daniel Henrique Barboza wrote:
In [1], changes were made to remove the existing auto-alignment for pSeries NVDIMM devices. That design promotes strange situations where the NVDIMM size reported in the domain XML is different from what QEMU is actually using. We removed the auto-alignment and relied on standard size validation.
However, this goes against Libvirt design philosophy of not tampering with existing guest behavior, as pointed out by Daniel in [2]. Since we can't know for sure whether there are guests that are relying on the auto-alignment feature to work, the changes made in [1] are a direct violation of this rule.
This patch reverts [1] entirely, re-enabling auto-alignment for pSeries NVDIMM as it was before. Changes will be made to ease the limitations of this design without hurting existing guests.
This reverts the following commits:
- commit 0ee56369c8b4f2f898b6aa1ff1f51ab033be1c02, "qemu_domain.c: change qemuDomainMemoryDeviceAlignSize() return type".
- commit 07de813924caf37e535855541c0c1183d9d382e2, "qemu_domain.c: do not auto-align ppc64 NVDIMMs"
- commit 0ccceaa57c50e5ee528f7073fa8723afd62b88b7, "qemu_validate.c: add pSeries NVDIMM size alignment validation"
- commit 4fa2202d884414ad34d9952e72fb39b1d93c7e14, "qemu_domain.c: make qemuDomainGetMemorySizeAlignment() public"
- commit 2d93cbdea9d1b8dbf36bc0ffee6cb73d83d208c7, "Revert "formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic""
[1] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html [2] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html
Reverting multiple commits with a single commit is sort of unusual, but in this case it seems to make sense and I don't really have a problem with it, so unless someone shouts against this approach I think we can merge it as-is.

A few tweaks to the commit message, though:

* keep the order of commits consistent to the one they were merged in, which more specifically means putting 2d93cbdea9d1 at the very top of the list;

* lose the commas after the various commit hashes - they break convenient double-click selection in most terminals;

* indent the commit subject by two spaces for better readability.

The commit is also missing your S-o-b, and as you know that's a blocker for merging.

With the above addressed,

Reviewed-by: Andrea Bolognani <abologna@redhat.com>

-- 
Andrea Bolognani / Red Hat / Virtualization

On 9/17/20 7:48 AM, Andrea Bolognani wrote:
On Mon, 2020-09-14 at 23:42 -0300, Daniel Henrique Barboza wrote:
In [1], changes were made to remove the existing auto-alignment for pSeries NVDIMM devices. That design promotes strange situations where the NVDIMM size reported in the domain XML is different from what QEMU is actually using. We removed the auto-alignment and relied on standard size validation.
However, this goes against Libvirt design philosophy of not tampering with existing guest behavior, as pointed out by Daniel in [2]. Since we can't know for sure whether there are guests that are relying on the auto-alignment feature to work, the changes made in [1] are a direct violation of this rule.
This patch reverts [1] entirely, re-enabling auto-alignment for pSeries NVDIMM as it was before. Changes will be made to ease the limitations of this design without hurting existing guests.
This reverts the following commits:
- commit 0ee56369c8b4f2f898b6aa1ff1f51ab033be1c02, "qemu_domain.c: change qemuDomainMemoryDeviceAlignSize() return type".
- commit 07de813924caf37e535855541c0c1183d9d382e2, "qemu_domain.c: do not auto-align ppc64 NVDIMMs"
- commit 0ccceaa57c50e5ee528f7073fa8723afd62b88b7, "qemu_validate.c: add pSeries NVDIMM size alignment validation"
- commit 4fa2202d884414ad34d9952e72fb39b1d93c7e14, "qemu_domain.c: make qemuDomainGetMemorySizeAlignment() public"
- commit 2d93cbdea9d1b8dbf36bc0ffee6cb73d83d208c7, "Revert "formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic""
[1] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html [2] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html
Reverting multiple commits with a single commit is sort of unusual, but in this case it seems to make sense and I don't really have a problem with it, so unless someone shouts against this approach I think we can merge it as-is.
A few tweaks to the commit message, though:
* keep the order of commits consistent to the one they were merged in, which more specifically means putting 2d93cbdea9d1 at the very top of the list;
* lose the commas after the various commit hashes - they break convenient double-click selection in most terminals;
* indent the commit subject by two spaces for better readability.
Ok!
The commit is also missing your S-o-b, and as you know that's a blocker for merging.
Ouch. reverting + squashing took its toll ....
With the above addressed,
Reviewed-by: Andrea Bolognani <abologna@redhat.com>

On Thu, 2020-09-17 at 17:34 -0300, Daniel Henrique Barboza wrote:
On 9/17/20 7:48 AM, Andrea Bolognani wrote:
A few tweaks to the commit message, though:
* keep the order of commits consistent to the one they were merged in, which more specifically means putting 2d93cbdea9d1 at the very top of the list;
* lose the commas after the various commit hashes - they break convenient double-click selection in most terminals;
* indent the commit subject by two spaces for better readability.
Ok!
The commit is also missing your S-o-b, and as you know that's a blocker for merging.
Ouch. reverting + squashing took its toll ....
Since we're approaching the freeze period and don't want the revert to miss the deadline, resulting in another libvirt release implementing the problematic behavior, I have gone ahead and pushed this single patch after applying all the tweaks I mentioned above and that you had confirmed you were okay with anyway.

I've also posted an update to the release notes covering the revert:

  https://www.redhat.com/archives/libvir-list/2020-September/msg01105.html

This is the exact same wording I had suggested in [1], which you were also okay with, minus the part about reflecting the updated value in the domain XML, since that's obviously not implemented yet.

[1] https://www.redhat.com/archives/libvir-list/2020-September/msg01020.html

-- 
Andrea Bolognani / Red Hat / Virtualization

On 9/22/20 7:31 AM, Andrea Bolognani wrote:
On Thu, 2020-09-17 at 17:34 -0300, Daniel Henrique Barboza wrote:
On 9/17/20 7:48 AM, Andrea Bolognani wrote:
A few tweaks to the commit message, though:
* keep the order of commits consistent to the one they were merged in, which more specifically means putting 2d93cbdea9d1 at the very top of the list;
* lose the commas after the various commit hashes - they break convenient double-click selection in most terminals;
* indent the commit subject by two spaces for better readability.
Ok!
The commit is also missing your S-o-b, and as you know that's a blocker for merging.
Ouch. reverting + squashing took its toll ....
Since we're approaching the freeze period and don't want the revert to miss the deadline, resulting in another libvirt release implementing the problematic behavior, I have gone ahead and pushed this single patch after applying all the tweaks I mentioned above and that you had confirmed you were okay with anyway.
Roger that. I'll resend the series without this revert patch then (hopefully soon).

Thanks,

DHB
I've also posted an update to the release notes covering the revert:
https://www.redhat.com/archives/libvir-list/2020-September/msg01105.html
This is the exact same wording I had suggested in [1], which you were also okay with, minus the part about reflecting the updated value in the domain XML, since that's obviously not implemented yet.
[1] https://www.redhat.com/archives/libvir-list/2020-September/msg01020.html

We'll use the auto-alignment function during parse time, in domain_conf.c.
Let's move the function to that file, renaming it to
virDomainNVDimmAlignSizePseries(). This will also make it clearer that,
although QEMU is the only driver that currently supports it, pSeries NVDIMM
restrictions aren't tied to QEMU.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/conf/domain_conf.c   | 36 ++++++++++++++++++++++++++++++++++
 src/conf/domain_conf.h   |  3 +++
 src/libvirt_private.syms |  1 +
 src/qemu/qemu_domain.c   | 42 ++--------------------------------------
 4 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 72ac4f4191..63434b9d3e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16876,6 +16876,42 @@ virDomainSEVDefParseXML(xmlNodePtr sevNode,
     return NULL;
 }
 
+int
+virDomainNVDimmAlignSizePseries(virDomainMemoryDefPtr mem)
+{
+    /* For NVDIMMs in ppc64 in we want to align down the guest
+     * visible space, instead of align up, to avoid writing
+     * beyond the end of file by adding a potential 256MiB
+     * to the user specified size.
+     *
+     * The label-size is mandatory for ppc64 as well, meaning that
+     * the guest visible space will be target_size-label_size.
+     *
+     * Finally, target_size must include label_size.
+     *
+     * The above can be summed up as follows:
+     *
+     * target_size = AlignDown(target_size - label_size) + label_size
+     */
+    unsigned long long ppc64AlignSize = 256 * 1024;
+    unsigned long long guestArea = mem->size - mem->labelsize;
+
+    /* Align down guest_area. 256MiB is the minimum size. Error
+     * out if target_size is smaller than 256MiB + label_size,
+     * since aligning it up will cause QEMU errors. */
+    if (mem->size < (ppc64AlignSize + mem->labelsize)) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("minimum target size for the NVDIMM "
+                         "must be 256MB plus the label size"));
+        return -1;
+    }
+
+    guestArea = (guestArea/ppc64AlignSize) * ppc64AlignSize;
+    mem->size = guestArea + mem->labelsize;
+
+    return 0;
+}
+
 static virDomainMemoryDefPtr
 virDomainMemoryDefParseXML(virDomainXMLOptionPtr xmlopt,
                            xmlNodePtr memdevNode,
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 14a376350c..19ee9d6832 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -3895,6 +3895,9 @@ bool
 virDomainBlockIoTuneInfoEqual(const virDomainBlockIoTuneInfo *a,
                               const virDomainBlockIoTuneInfo *b);
 
+int
+virDomainNVDimmAlignSizePseries(virDomainMemoryDefPtr mem);
+
 bool
 virHostdevIsSCSIDevice(const virDomainHostdevDef *hostdev)
     ATTRIBUTE_NONNULL(1);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 5f1aea3694..6bf2ef9744 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -542,6 +542,7 @@ virDomainNetTypeToString;
 virDomainNetUpdate;
 virDomainNostateReasonTypeFromString;
 virDomainNostateReasonTypeToString;
+virDomainNVDimmAlignSizePseries;
 virDomainObjAssignDef;
 virDomainObjBroadcast;
 virDomainObjCheckActive;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 4f796bef4a..f50a0a4710 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8067,44 +8067,6 @@ qemuDomainGetMemoryModuleSizeAlignment(const virDomainDef *def,
 }
 
 
-static int
-qemuDomainNVDimmAlignSizePseries(virDomainDefPtr def,
-                                 virDomainMemoryDefPtr mem)
-{
-    /* For NVDIMMs in ppc64 in we want to align down the guest
-     * visible space, instead of align up, to avoid writing
-     * beyond the end of file by adding a potential 256MiB
-     * to the user specified size.
-     *
-     * The label-size is mandatory for ppc64 as well, meaning that
-     * the guest visible space will be target_size-label_size.
-     *
-     * Finally, target_size must include label_size.
-     *
-     * The above can be summed up as follows:
-     *
-     * target_size = AlignDown(target_size - label_size) + label_size
-     */
-    unsigned long long ppc64AlignSize = qemuDomainGetMemorySizeAlignment(def);
-    unsigned long long guestArea = mem->size - mem->labelsize;
-
-    /* Align down guest_area. 256MiB is the minimum size. Error
-     * out if target_size is smaller than 256MiB + label_size,
-     * since aligning it up will cause QEMU errors. */
-    if (mem->size < (ppc64AlignSize + mem->labelsize)) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("minimum target size for the NVDIMM "
-                         "must be 256MB plus the label size"));
-        return -1;
-    }
-
-    guestArea = (guestArea/ppc64AlignSize) * ppc64AlignSize;
-    mem->size = guestArea + mem->labelsize;
-
-    return 0;
-}
-
-
 int
 qemuDomainAlignMemorySizes(virDomainDefPtr def)
 {
@@ -8153,7 +8115,7 @@ qemuDomainAlignMemorySizes(virDomainDefPtr def)
     for (i = 0; i < def->nmems; i++) {
         if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
             ARCH_IS_PPC64(def->os.arch)) {
-            if (qemuDomainNVDimmAlignSizePseries(def, def->mems[i]) < 0)
+            if (virDomainNVDimmAlignSizePseries(def->mems[i]) < 0)
                 return -1;
         } else {
             align = qemuDomainGetMemoryModuleSizeAlignment(def, def->mems[i]);
@@ -8190,7 +8152,7 @@ qemuDomainMemoryDeviceAlignSize(virDomainDefPtr def,
 {
     if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
         ARCH_IS_PPC64(def->os.arch)) {
-        return qemuDomainNVDimmAlignSizePseries(def, mem);
+        return virDomainNVDimmAlignSizePseries(mem);
     } else {
         mem->size = VIR_ROUND_UP(mem->size,
                                  qemuDomainGetMemorySizeAlignment(def));
-- 
2.26.2

The alignment for the pSeries NVDIMM does not depend on runtime constraints.
This means that it can be done at device parse time, instead of at runtime,
allowing the domain XML to reflect what the auto-alignment would do when the
domain starts.

This brings consistency between the NVDIMM size reported by the domain XML
and what the guest sees, without impacting existing guests that are using an
unaligned size - they'll work as usual, but the domain XML will be updated
with the actual size of the NVDIMM.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/conf/domain_conf.c              | 35 +++++++++++--------
 .../memory-hotplug-nvdimm-ppc64.xml |  2 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 63434b9d3e..0fd16964c1 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16965,6 +16965,25 @@ virDomainMemoryDefParseXML(virDomainXMLOptionPtr xmlopt,
     }
     VIR_FREE(tmp);
 
+    /* source */
+    if ((node = virXPathNode("./source", ctxt)) &&
+        virDomainMemorySourceDefParseXML(node, ctxt, def) < 0)
+        goto error;
+
+    /* target */
+    if (!(node = virXPathNode("./target", ctxt))) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("missing <target> element for <memory> device"));
+        goto error;
+    }
+
+    if (virDomainMemoryTargetDefParseXML(node, ctxt, def) < 0)
+        goto error;
+
+    /* The UUID calculation for pSeries NVDIMM can be done earlier,
+     * but we'll also need to handle the size alignment, which depends
+     * on virDomainMemoryTargetDefParseXML() already done. Let's do it
+     * all here. */
     if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
         ARCH_IS_PPC64(dom->os.arch)) {
         /* Extract nvdimm uuid or generate a new one */
@@ -16981,23 +17000,11 @@ virDomainMemoryDefParseXML(virDomainXMLOptionPtr xmlopt,
                            "%s", _("malformed uuid element"));
             goto error;
         }
-    }
-
-    /* source */
-    if ((node = virXPathNode("./source", ctxt)) &&
-        virDomainMemorySourceDefParseXML(node, ctxt, def) < 0)
-        goto error;
-
-    /* target */
-    if (!(node = virXPathNode("./target", ctxt))) {
-        virReportError(VIR_ERR_XML_ERROR, "%s",
-                       _("missing <target> element for <memory> device"));
-        goto error;
+
+        if (virDomainNVDimmAlignSizePseries(def) < 0)
+            goto error;
     }
 
-    if (virDomainMemoryTargetDefParseXML(node, ctxt, def) < 0)
-        goto error;
-
     if (virDomainDeviceInfoParseXML(xmlopt, memdevNode, &def->info, flags) < 0)
         goto error;
 
diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml b/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
index ae5a17d3c8..ecb1b83b4a 100644
--- a/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
+++ b/tests/qemuxml2xmloutdata/memory-hotplug-nvdimm-ppc64.xml
@@ -38,7 +38,7 @@
       <path>/tmp/nvdimm</path>
     </source>
    <target>
-      <size unit='KiB'>550000</size>
+      <size unit='KiB'>524416</size>
       <node>0</node>
       <label>
         <size unit='KiB'>128</size>
-- 
2.26.2
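
In practical terms, a guest defined with <size unit='KiB'>550000</size> and a 128 KiB label now gets the aligned value written back into its XML at parse time: 550000 - 128 = 549872 KiB of guest area, aligned down to the 256 MiB (262144 KiB) boundary, gives 524288 KiB, and adding the label back yields the 524416 KiB expected in the updated xmloutdata file above. Previously the XML kept showing 550000 KiB while QEMU was silently given the aligned size.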

On Mon, 2020-09-14 at 23:42 -0300, Daniel Henrique Barboza wrote:
@@ -16981,23 +17000,11 @@ virDomainMemoryDefParseXML(virDomainXMLOptionPtr xmlopt,
+        if (virDomainNVDimmAlignSizePseries(def) < 0)
+            goto error;
The outcome of this change is good, and overall it doesn't negatively impact existing guests; however, the ParseXML() function is not the correct place to perform it, and it should happen in the PostParse() callback instead.

If the restrictions on alignment are, as you claim in the commit message for the patch before this one, not QEMU-specific, then a reasonable home for the rounding logic could be virDomainDefPostParseMemory(), but you could also leave it in the QEMU driver by implementing a qemuDomainMemoryDefPostParse() function that's called by the qemuDomainDeviceDefPostParse() dispatcher.

-- 
Andrea Bolognani / Red Hat / Virtualization
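
As a rough illustration of the second option, a driver-level post-parse hook could look like the sketch below. The helper name is taken from the review above, but its exact signature and the way it would be wired into the qemuDomainDeviceDefPostParse() dispatcher are assumptions made for illustration only, not existing libvirt code.

  /* Hypothetical helper; name from the review above, signature assumed. */
  static int
  qemuDomainMemoryDefPostParse(virDomainMemoryDefPtr mem,
                               const virDomainDef *def)
  {
      /* Re-use the parse-independent alignment helper introduced in patch 2
       * so the aligned value lands in the XML right after parsing. */
      if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
          ARCH_IS_PPC64(def->os.arch) &&
          virDomainNVDimmAlignSizePseries(mem) < 0)
          return -1;

      return 0;
  }

  /* ...and, hypothetically, in the qemuDomainDeviceDefPostParse() switch: */
      case VIR_DOMAIN_DEVICE_MEMORY:
          ret = qemuDomainMemoryDefPostParse(dev->data.memory, def);
          break;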

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 NEWS.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index 14f098b1f6..9020696f86 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -28,6 +28,18 @@ v6.8.0 (unreleased)
     useful for containerised scenarios and can be used in both peer2peer and
     direct migrations.
 
+  * Re-introduce NVDIMM auto-alignment for pSeries Guests
+
+    This feature was removed in libvirt v6.7.0 due to its shortcomings, namely
+    the lack of consistency between domain XML and actual NVDIMM size the guest
+    is using, but it came at a too high of a cost - we broke existing guests
+    that were running in libvirt v.6.6.0 and now had to adjust the size by
+    hand. The auto-alignment was re-introduced back, allowing existing guests
+    to keep running as usual. But now, instead of auto-aligning in a transparent
+    manner, it is also changing the domain XML. This brings a good balance
+    between consistency of domain XML and guest information, and also relieves
+    the user of knowing the alignment internals of pSeries NVDIMMs.
+
 * **Bug fixes**
 
-- 
2.26.2

On Mon, 2020-09-14 at 23:42 -0300, Daniel Henrique Barboza wrote:
+  * Re-introduce NVDIMM auto-alignment for pSeries Guests
+
+    This feature was removed in libvirt v6.7.0 due to its shortcomings, namely
+    the lack of consistency between domain XML and actual NVDIMM size the guest
+    is using, but it came at a too high of a cost - we broke existing guests
+    that were running in libvirt v.6.6.0 and now had to adjust the size by
+    hand. The auto-alignment was re-introduced back, allowing existing guests
+    to keep running as usual. But now, instead of auto-aligning in a transparent
+    manner, it is also changing the domain XML. This brings a good balance
+    between consistency of domain XML and guest information, and also relieves
+    the user of knowing the alignment internals of pSeries NVDIMMs.
I feel like this is needlessly verbose. How about something like

  * Re-introduce NVDIMM auto-alignment for pSeries Guests

    The auto-alignment logic was removed in v6.7.0 in favor of requiring
    the size provided by the user to be already aligned; however, this had
    the unintended consequence of breaking some existing guests. v6.8.0
    restores the previous behavior, and as an improvement upon it also
    reflects the auto-aligned value in the domain XML.

instead?

-- 
Andrea Bolognani / Red Hat / Virtualization

On 9/17/20 1:02 PM, Andrea Bolognani wrote:
On Mon, 2020-09-14 at 23:42 -0300, Daniel Henrique Barboza wrote:
+  * Re-introduce NVDIMM auto-alignment for pSeries Guests
+
+    This feature was removed in libvirt v6.7.0 due to its shortcomings, namely
+    the lack of consistency between domain XML and actual NVDIMM size the guest
+    is using, but it came at a too high of a cost - we broke existing guests
+    that were running in libvirt v.6.6.0 and now had to adjust the size by
+    hand. The auto-alignment was re-introduced back, allowing existing guests
+    to keep running as usual. But now, instead of auto-aligning in a transparent
+    manner, it is also changing the domain XML. This brings a good balance
+    between consistency of domain XML and guest information, and also relieves
+    the user of knowing the alignment internals of pSeries NVDIMMs.
I feel like this is needlessly verbose. How about something like
* Re-introduce NVDIMM auto-alignment for pSeries Guests
The auto-alignment logic was removed in v6.7.0 in favor of requiring the size provided by the user to be already aligned; however, this had the unintended consequence of breaking some existing guests. v6.8.0 restores the previous behavior, and as an improvement upon it also reflects the auto-aligned value in the domain XML.
instead?
LGTM. I'll change it in v2.

Thanks,

DHB