[PATCH 1/1] NEWS.rst: update with pSeries initial memory fix

Commit v6.10.0-103-g198c1eb6b4 fixed this issue.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
I forgot to update NEWS.rst back then :/

 NEWS.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index da88b19d0a..2c5cee77db 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -23,6 +23,13 @@ v7.0.0 (unreleased)

 * **Bug fixes**

+  * Avoid taking extra host memory when launching pSeries guests
+
+    Under certain conditions, pSeries guests were being launched with more
+    RAM than specified in the domain XML by the user. New pSeries
+    domains created with libvirt 7.0.0 will always launch with the right
+    amount of initial memory.
+
 v6.10.0 (2020-12-01)
 ====================
-- 
2.26.2

On Wed, Jan 06, 2021 at 11:42:09AM -0300, Daniel Henrique Barboza wrote:
Commit v6.10.0-103-g198c1eb6b4 fixed this issue.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
I forgot to update NEWS.rst back then :/
NEWS.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst
index da88b19d0a..2c5cee77db 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -23,6 +23,13 @@ v7.0.0 (unreleased)
* **Bug fixes**
+  * Avoid taking extra host memory when launching pSeries guests
+
+    Under certain conditions, pSeries guests were being launched with more
+    RAM than specified in the domain XML by the user. New pSeries
+    domains created with libvirt 7.0.0 will always launch with the right
+    amount of initial memory.
Surely this is going to break live migration from old to new libvirt, as the QEMU started in the dest host will have a smaller -m arg.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 1/6/21 11:47 AM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 11:42:09AM -0300, Daniel Henrique Barboza wrote:
Commit v6.10.0-103-g198c1eb6b4 fixed this issue.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
I forgot to update NEWS.rst back then :/
NEWS.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst
index da88b19d0a..2c5cee77db 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -23,6 +23,13 @@ v7.0.0 (unreleased)
* **Bug fixes**
+  * Avoid taking extra host memory when launching pSeries guests
+
+    Under certain conditions, pSeries guests were being launched with more
+    RAM than specified in the domain XML by the user. New pSeries
+    domains created with libvirt 7.0.0 will always launch with the right
+    amount of initial memory.
Surely this is going to break live migration from old to new libvirt, as the QEMU started in the dest host will have a smaller -m arg.
This bug fix only takes effect when VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set. The migration code doesn't set this flag.

Thanks,

DHB
Regards, Daniel

On Wed, Jan 06, 2021 at 12:00:30PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 11:47 AM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 11:42:09AM -0300, Daniel Henrique Barboza wrote:
Commit v6.10.0-103-g198c1eb6b4 fixed this issue.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
I forgot to update NEWS.rst back then :/
NEWS.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst
index da88b19d0a..2c5cee77db 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -23,6 +23,13 @@ v7.0.0 (unreleased)

 * **Bug fixes**

+  * Avoid taking extra host memory when launching pSeries guests
+
+    Under certain conditions, pSeries guests were being launched with more
+    RAM than specified in the domain XML by the user. New pSeries
+    domains created with libvirt 7.0.0 will always launch with the right
+    amount of initial memory.
Surely this is going to break live migration from old to new libvirt, as the QEMU started in the dest host will have a smaller -m arg.
This bug fix only takes effect when VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set. The migration code doesn't set this flag.
That doesn't make it safe.

virDomainCreateXML with Libvirt 7.0.0 on the source will set PARSE_ABI_UPDATE and thus set the new smaller RAM size.

Now we live migrate to libvirt 6.9.0 on dest host, and that will not set PARSE_ABI_UPDATE and thus set the larger RAM size.

QEMU will fail to load the migration stream since the source and dest RAM sizes differ.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 1/6/21 12:03 PM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 12:00:30PM -0300, Daniel Henrique Barboza wrote:
On 1/6/21 11:47 AM, Daniel P. Berrangé wrote:
On Wed, Jan 06, 2021 at 11:42:09AM -0300, Daniel Henrique Barboza wrote:
Commit v6.10.0-103-g198c1eb6b4 fixed this issue.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
I forgot to update NEWS.rst back then :/
NEWS.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst
index da88b19d0a..2c5cee77db 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -23,6 +23,13 @@ v7.0.0 (unreleased)

 * **Bug fixes**

+  * Avoid taking extra host memory when launching pSeries guests
+
+    Under certain conditions, pSeries guests were being launched with more
+    RAM than specified in the domain XML by the user. New pSeries
+    domains created with libvirt 7.0.0 will always launch with the right
+    amount of initial memory.
Surely this is going to break live migration from old to new libvirt, as the QEMU started in the dest host will have a smaller -m arg.
This bug fix only takes effect when VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set. The migration code doesn't set this flag.
That doesn't make it safe.
Let me give some quick context, since the v6.10.0-103-g198c1eb6b4 commit message explains it in detail.

There is (still) an issue with the order in which we do things inside qemuDomainAlignMemorySizes(). We calculate initialmem before aligning the memory modules, but the memory module sizes are used in the initialmem calculation, i.e.:

  initialmem = total_mem - sum_of_mem_modules

ppc64 has a 256Mb alignment, and prior to this change the memory modules were left unaligned for the initialmem calculation in this function. So for a 2Gb totalmem guest with a 100Mb (unaligned) memory module:

  initialmem = 2Gb - 100Mb = 1.9Gb

Later in the same function we align the memory modules and then recalculate totalmem, ending up with:

  totalmem = initialmem + aligned_mem_modules = 1.9Gb + 256Mb = 2156Mb

i.e. we overshot the intended memory by 156Mb. Aligning the memory modules in PostParse prevents this from happening.

Now note that qemuDomainAlignMemorySizes() is called only when VIR_QEMU_PROCESS_START_NEW is set. This flag is not set during migration or snapshot reload (as seen in qemuProcessStart()). This logic was originally set by Peter in commit c7d7ba85a6242.

This means that the change made for pSeries guests will only occur when:

- VIR_DOMAIN_DEF_PARSE_ABI_UPDATE is set, so PostParse alignment happens;
- VIR_QEMU_PROCESS_START_NEW is set, otherwise qemuDomainAlignMemorySizes() will not execute.
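To make the ordering concrete, here is a minimal Python sketch of the two calculation orders, using the example's own round figures (1.9Gb taken as 1900Mb, as the numbers above imply). The helper names are illustrative, not libvirt's actual code:

  ALIGN_MB = 256  # ppc64 memory alignment

  def align_up(mb):
      # round mb up to the next 256Mb boundary
      return -(-mb // ALIGN_MB) * ALIGN_MB

  total_mem = 2000  # 2Gb guest
  dimm = 100        # unaligned memory module

  # old order: initialmem computed from the *unaligned* module,
  # module aligned afterwards -> totalmem overshoots
  initialmem = total_mem - dimm              # 1900
  buggy_total = initialmem + align_up(dimm)  # 1900 + 256 = 2156

  # fixed order: align the module first (in PostParse), then compute initialmem
  initialmem = total_mem - align_up(dimm)    # 2000 - 256 = 1744
  fixed_total = initialmem + align_up(dimm)  # back to the requested 2000

Running it reproduces the 156Mb overshoot from the example: 2156Mb with the old order versus the intended 2000Mb with the fixed one.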
virDomainCreateXML with Libvirt 7.0.0 on the source will set PARSE_ABI_UPDATE and thus set the new smaller RAM size.
Now we live migrate to libvirt 6.9.0 on dest host, and that will not set PARSE_ABI_UPDATE and thus set the larger RAM size.
In this scenario, yes, the memory modules on the destination will not be aligned in PostParse time, but they'll not be used to calculate initialmem/totalmem again because we don't align memory during live migration.

Thanks,

DHB
QEMU will fail to load the migration stream since the source and dest RAM sizes differ.

Regards,
Daniel

On Wed, 2021-01-06 at 12:50 -0300, Daniel Henrique Barboza wrote:
On 1/6/21 12:03 PM, Daniel P. Berrangé wrote:
virDomainCreateXML with Libvirt 7.0.0 on the source will set PARSE_ABI_UPDATE and thus set the new smaller RAM size.
Now we live migrate to libvirt 6.9.0 on dest host, and that will not set PARSE_ABI_UPDATE and thus set the larger RAM size.
In this scenario, yes, the memory modules on the destination will not be aligned in PostParse time, but they'll not be used to calculate initialmem/totalmem again because we don't align memory during live migration.
Since we now perform memory alignment earlier, we will reflect the aligned size back to the XML instead of re-aligning it at command line generation time every single time the VM is started.

So the XML that we're going to send to the migration destination will contain sizes that are already aligned, and thus further alignment should not be necessary.

At least, that's the theory :)

Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?

-- 
Andrea Bolognani / Red Hat / Virtualization

On 1/11/21 2:56 PM, Andrea Bolognani wrote:
On Wed, 2021-01-06 at 12:50 -0300, Daniel Henrique Barboza wrote:
On 1/6/21 12:03 PM, Daniel P. Berrangé wrote:
virDomainCreateXML with Libvirt 7.0.0 on the source will set PARSE_ABI_UPDATE and thus set the new smaller RAM size.
Now we live migrate to libvirt 6.9.0 on dest host, and that will not set PARSE_ABI_UPDATE and thus set the larger RAM size.
In this scenario, yes, the memory modules on the destination will not be aligned in PostParse time, but they'll not be used to calculate initialmem/totalmem again because we don't align memory during live migration.
Since we now perform memory alignment earlier, we will reflect the aligned size back to the XML instead of re-aligning it at command line generation time every single time the VM is started.
That's correct.
So the XML that we're going to send to the migration destination will contain sizes that are already aligned, and thus further alignment should not be necessary.
In fact, the destination will not realign the memory under any circumstances, period. If a miscalculation happens on the source and the memory ends up unaligned, the destination will have to deal with it.
At least, that's the theory :)
Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?
Just tested a migration scenario between 2 hosts (a Power 9 Boston running Libvirt 6.8.0 and a Power 9 Mihawk running Libvirt 7.0.0), with a domain that reproduces the bug.

TL;DR: migration works both ways without issues. Details below if interested:

1) created a common domain XML with the following format:

(...)
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
(...)
  <memory model='dimm'>
    <target>
      <size unit='KiB'>323264</size>
    </target>
    <address type='dimm' slot='0'/>
  </memory>
(...)

The dimm is left unaligned on purpose to demonstrate how an older Libvirt behaves vs 7.0.0.

2) In a Power 9 Boston machine running Libvirt 6.8.0, I created a 'vm1_6.8.0' domain using the XML above. Since there is no PostParse alignment in this version, the domain was defined with the unaligned dimm.

3) Started the VM and checked the QEMU command line:

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh start vm1_6.8.0
Domain vm1_6.8.0 started

[danielhb@ltc-boston118 build]$ sudo tail -n 30 ~/libvirt_build/var/log/libvirt/qemu/vm1_6.8.0.log
(...)
-cpu POWER9 \
-m size=1835008k,slots=16,maxmem=4194304k \
(...)
-object memory-backend-ram,id=memdimm0,size=536870912,host-nodes=8,policy=bind \
-device pc-dimm,memdev=memdimm0,id=dimm0,slot=0 \
-uuid f3545d9d-f8e6-4569-9e10-ade357b28163 \
(...)

Note that "-m" plus the dimm is more than 2Gb in size, which is the bug I fixed in Libvirt 7.0.0 for new domains.

4) Migrating the vm1_6.8.0 domain to a Power 9 Mihawk server, running Libvirt 7.0.0:

$ sudo ./run tools/virsh -c 'qemu:///system' migrate --live --domain vm1_6.8.0 \
  --desturi qemu+ssh://ltcmihawk39/system --timeout 60
$

5) In the Mihawk host:

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh list --all
[sudo] password for danielhb:
 Id   Name        State
----------------------------
 1    vm1_6.8.0   running

[danielhb@ltcmihawk39 build]$ sudo tail -n 30 ~/libvirt_build/var/log/libvirt/qemu/vm1_6.8.0.log
(...)
-cpu POWER9 \
-m size=1835008k,slots=16,maxmem=4194304k \
(...)
-object memory-backend-ram,id=memdimm0,size=536870912,host-nodes=8,policy=bind \
-device pc-dimm,memdev=memdimm0,id=dimm0,slot=0,addr=2147483648 \
-uuid f3545d9d-f8e6-4569-9e10-ade357b28163 \
(...)

As predicted, there is no recalculation of memory alignment on the destination. This is the same QEMU command line as in (3).

6) Going the other way around, I defined a vm1_7.0.0 domain using the domain XML described in (1). Since it's a new domain created on Libvirt 7.0.0, the dimm was aligned in the PostParse callback:

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh dumpxml vm1_7.0.0
(...)
<domain type='kvm'>
  <name>vm1_7.0.0</name>
  <uuid>6a45986d-68c4-4296-afa9-83df4e6ea6cd</uuid>
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
(...)
  <memory model='dimm'>
    <target>
      <size unit='KiB'>524288</size>
    </target>
    <address type='dimm' slot='0'/>
  </memory>
(...)

7) Started the domain and checked the QEMU arguments:

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh start vm1_7.0.0
Domain 'vm1_7.0.0' started

[danielhb@ltcmihawk39 build]$ sudo tail -n 30 ~/libvirt_build/var/log/libvirt/qemu/vm1_7.0.0.log
(...)
-cpu POWER9 \
-m size=1572864k,slots=16,maxmem=4194304k \
(...)
-object memory-backend-ram,id=memdimm0,size=536870912,host-nodes=8,policy=bind \
-device pc-dimm,memdev=memdimm0,id=dimm0,slot=0 \
-uuid 6a45986d-68c4-4296-afa9-83df4e6ea6cd \
(...)
Here we can see that this QEMU is indeed using the intended 2Gb of RAM instead of 1835008k + 536870912 bytes as in (3) with an older Libvirt.

8) Migrate it to the Boston server with Libvirt 6.8.0 and check if all went according to plan:

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh -c 'qemu:///system' migrate --live \
  --domain vm1_7.0.0 --desturi qemu+ssh://ltc-boston118/system --timeout 60

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh list --all
[sudo] password for danielhb:
 Id   Name        State
----------------------------
 4    vm1_7.0.0   running
 -    vm1_6.8.0   shut off

[danielhb@ltc-boston118 build]$ sudo tail -n 30 ~/libvirt_build/var/log/libvirt/qemu/vm1_7.0.0.log
(...)
-cpu POWER9 \
-m size=1572864k,slots=16,maxmem=4194304k \
(...)
-object memory-backend-ram,id=memdimm0,size=536870912,host-nodes=8,policy=bind \
-device pc-dimm,memdev=memdimm0,id=dimm0,slot=0,addr=2147483648 \
-uuid 6a45986d-68c4-4296-afa9-83df4e6ea6cd \
(...)

We can see that the domain was migrated using the memory sizes from the source host, as intended.

Thanks,

DHB
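For the record, the -m values in the logs above follow from the domain XML sizes. A quick Python sketch of the arithmetic, assuming the 256MiB ppc64 alignment discussed earlier with the initial memory also rounded up to that boundary (which is what the 6.8.0 log implies):

  ALIGN_KIB = 256 * 1024  # 256MiB in KiB

  def align_up(kib):
      # round kib up to the next 256MiB boundary
      return -(-kib // ALIGN_KIB) * ALIGN_KIB

  total = 2097152  # <memory unit='KiB'> from the domain XML
  dimm = 323264    # unaligned dimm size from the domain XML

  # 6.8.0: initialmem computed from the unaligned dimm, then rounded up;
  # the dimm is aligned separately at command line time
  old_m = align_up(total - dimm)    # 1835008k, matching step (3)
  old_dimm = align_up(dimm) * 1024  # 536870912 bytes, matching the -object line

  # 7.0.0: the dimm is aligned in PostParse before initialmem is computed
  new_m = total - align_up(dimm)    # 1572864k, matching step (7)

The two -m sizes differ by exactly one 256MiB alignment step, which is the overshoot the fix removes.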

On Mon, 2021-01-11 at 23:03 -0300, Daniel Henrique Barboza wrote:
On 1/11/21 2:56 PM, Andrea Bolognani wrote:
Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?
Just tested a migration scenario between 2 hosts (a Power 9 Boston running Libvirt 6.8.0 and a Power 9 Mihawk running Libvirt 7.0.0), with a domain that reproduces the bug.
TL;DR: migration works both ways without issues.
Thanks for checking! Did you also test the following, less straightforward scenarios:

* creating the domain on 6.8.0, migrating it to 7.0.0, then migrating it back to 6.8.0;

* creating the domain on 6.8.0, migrating it to 7.0.0, shutting it down, starting it back up, then migrating it back to 6.8.0.

I believe they should work correctly, because ABI_UPDATE should not be involved in either case, but it would be great to have direct confirmation.

-- 
Andrea Bolognani / Red Hat / Virtualization

On 1/12/21 10:24 AM, Andrea Bolognani wrote:
On Mon, 2021-01-11 at 23:03 -0300, Daniel Henrique Barboza wrote:
On 1/11/21 2:56 PM, Andrea Bolognani wrote:
Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?
Just tested a migration scenario between 2 hosts (a Power 9 Boston running Libvirt 6.8.0 and a Power 9 Mihawk running Libvirt 7.0.0), with a domain that reproduces the bug.
TL;DR: migration works both ways without issues.
Thanks for checking! Did you also test the following, less straightforward scenarios:
* creating the domain on 6.8.0, migrating it to 7.0.0, then migrating it back to 6.8.0;
Ping-ponging works with either scenario, just checked.
* creating the domain on 6.8.0, migrating it to 7.0.0, shutting it down, starting it back up, then migrating it back to 6.8.0.
It works. Details down below:

*** In the 6.8.0 server:

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh start vm1_6.8.0
Domain vm1_6.8.0 started

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh -c 'qemu:///system' migrate --live \
  --domain vm1_6.8.0 --desturi qemu+ssh://ltcmihawk39/system --timeout 60 \
  --persistent --undefinesource

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh list --all
 Id   Name   State
--------------------

<--- migration success --->

*** 7.0.0 server:

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh list --all
 Id   Name        State
----------------------------
 5    vm1_6.8.0   running
 -    vm1_7.0.0   shut off

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh destroy vm1_6.8.0
Domain 'vm1_6.8.0' destroyed

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh start vm1_6.8.0
Domain 'vm1_6.8.0' started

[danielhb@ltcmihawk39 build]$ sudo tail -n 30 ~/libvirt_build/var/log/libvirt/qemu/vm1_6.8.0.log
(...)
-cpu POWER9 \
-m size=1835008k,slots=16,maxmem=4194304k \
(...)
-object memory-backend-ram,id=memdimm0,size=536870912,host-nodes=8,policy=bind \
-device pc-dimm,memdev=memdimm0,id=dimm0,slot=0 \

<-- Note: same memory parameters defined in the 6.8.0 version. We can't distinguish whether this memory setting was user intention or a misalignment in the source Libvirt, since we do not re-align during migration -->

[danielhb@ltcmihawk39 build]$ sudo ./run tools/virsh -c 'qemu:///system' migrate --live \
  --domain vm1_6.8.0 --desturi qemu+ssh://ltc-boston118/system --timeout 60 \
  --persistent --undefinesource
[danielhb@ltcmihawk39 build]$

<--- migration success --->

*** Back to the 6.8.0 server:

[danielhb@ltc-boston118 build]$ sudo ./run tools/virsh list --all
[sudo] password for danielhb:
 Id   Name        State
---------------------------
 8    vm1_6.8.0   running

Thanks,

DHB
I believe they should work correctly, because in neither case ABI_UPDATE should be involved, but it would be great to have direct confirmation.

On Tue, 2021-01-12 at 12:18 -0300, Daniel Henrique Barboza wrote:
On 1/12/21 10:24 AM, Andrea Bolognani wrote:
On Mon, 2021-01-11 at 23:03 -0300, Daniel Henrique Barboza wrote:
On 1/11/21 2:56 PM, Andrea Bolognani wrote:
Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?
Just tested a migration scenario between 2 hosts (a Power 9 Boston running Libvirt 6.8.0 and a Power 9 Mihawk running Libvirt 7.0.0), with a domain that reproduces the bug.
TL;DR: migration works both ways without issues.
Thanks for checking! Did you also test the following, less straightforward scenarios:
* creating the domain on 6.8.0, migrating it to 7.0.0, then migrating it back to 6.8.0;
Ping-ponging works with either scenario, just checked.
* creating the domain on 6.8.0, migrating it to 7.0.0, shutting it down, starting it back up, then migrating it back to 6.8.0.
It works.
Thanks once again for checking! It looks like everything is fine.

Dan, are you satisfied with these tests, or do you still feel the changes might have introduced compatibility issues?

-- 
Andrea Bolognani / Red Hat / Virtualization

On Tue, Jan 12, 2021 at 04:34:18PM +0100, Andrea Bolognani wrote:
On Tue, 2021-01-12 at 12:18 -0300, Daniel Henrique Barboza wrote:
On 1/12/21 10:24 AM, Andrea Bolognani wrote:
On Mon, 2021-01-11 at 23:03 -0300, Daniel Henrique Barboza wrote:
On 1/11/21 2:56 PM, Andrea Bolognani wrote:
Daniel, did you test migration between libvirt 7.0.0 and earlier releases? Can you confirm it works?
Just tested a migration scenario between 2 hosts (a Power 9 Boston running Libvirt 6.8.0 and a Power 9 Mihawk running Libvirt 7.0.0), with a domain that reproduces the bug.
TL;DR: migration works both ways without issues.
Thanks for checking! Did you also test the following, less straightforward scenarios:
* creating the domain on 6.8.0, migrating it to 7.0.0, then migrating it back to 6.8.0;
Ping-ponging works with either scenario, just checked.
* creating the domain on 6.8.0, migrating it to 7.0.0, shutting it down, starting it back up, then migrating it back to 6.8.0.
It works.
Thanks once again for checking! It looks like everything is fine.
Dan, are you satisfied with these tests, or do you still feel the changes might have introduced compatibility issues?
It sounds fine to me.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|