
On 11/16/20 4:43 PM, Andrea Bolognani wrote:
On Fri, 2020-11-13 at 16:23 -0300, Daniel Henrique Barboza wrote:
On 11/13/20 10:51 AM, Andrea Bolognani wrote:
I only skimmed the remaining patches and didn't dig into the code as much, or as recently, as you have, but from a high-level perspective I don't see why you wouldn't be able to simply move the existing rounding logic from the command line generator to PostParse? It's not like the former has access to additional information that the latter can't get to, right?
I was looking into the code and I think that might have the wrong idea here. Apparently we're not aligning memory during migration or snapshot restore. This specific line in qemu_command.c got my attention:
-- qemuBuildCommandLine() --
if (!migrateURI && !snapshot && qemuDomainAlignMemorySizes(def) < 0) return NULL;
------
I investigated the history behind this logic and found the following commit:
commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5 Author: Peter Krempa <pkrempa@redhat.com> Date: Thu Sep 17 08:14:05 2015 +0200
qemu: command: Align memory sizes only on fresh starts
When we are starting a qemu process for an incomming migration or snapshot reloading we should not modify the memory sizes in the domain since we could potentially change the guest ABI that was tediously checked before. Additionally the function now updates the initial memory size according to the NUMA node size, which should not happen if we are restoring state.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685
---------
This means that the changes made in this series will not break migration, since the alignment of 'initialmem' is not being triggered in those cases. Which is good.
However, you also brought up in an earlier reply that these changes might break "the guest ABI across guest boots (if libvirt is upgraded in between them)". This can't be helped I think - an older ppc64 guest with 4GiB of 'currentMemory' in the XML, that is ending up having 4.256GiB (extra 256MiB) due to the way alignment was being done, will have exactly 4GiB of 'initialmem' after these changes. My point is that we're giving the exact memory the guest is demanding, as intended by the domain XML, in a fresh guest start. This might be considered an ABI break probably, but why would one complain that Libvirt is now giving precisely what was asked for? Avoiding granting extra 256MBs of mem for domains seems worth it, given that we're not impacting live domains or migration.
In general, changing guest ABI between boots is not something that we want to happen.
I have trouble keeping all the details of memory alignment inside my head and I can't quite spend the time necessary to swap them back in right now, so please apologies if I'm being vague and of course correct me if I'm wrong... Having Peter in the thread will also surely help with that :)
The aim of this series should *not* be to change the calculations that happen when aligning memory, but only to reflect them back to the domain XML where they can be queried: so for example if the input
<memory unit='GiB'>4</memory> <devices> <memory model='nvdimm'> <target> <size unit='MiB'>500</size>
results in the command line
-m 4352m -object memory-backend-file,size=512m
(the exact numbers are not relevant), then what we want is for the XML to be updated at define time so that it reads
I investigate it more and I think I got a few premises wrong here: - I pointed out that the alignment changes aren't being reflected. This is wrong. I figured it out when launching real guests and seeing that the live XML was being updated properly. This diminishes the urgency of this patch series considerably, and I apologize for the confusion. - The reason I claimed that is because I got misled by my own misunderstanding of what qemuxml2xmltest.c does: it checks the result XML only up to PostParse changes, and that's it. Any further change in the XML aren't being reflected by the tests. qemuxml2argvtest.c goes as far as building the command line, and I got the wrong idea that both tests reflected the same stages of a VM launch. For what I'm doing here, qemuxml2xml isn't helpful. All this out of the way:
<memory unit='MiB'>4864</memory> <devices> <memory model='nvdimm'> <target> <size unit='MiB'>512</size>
(again, the numbers are almost certainly wrong) and we want *that* XML to generate the same QEMU command line as before.
If this can't be achieved, or there are other side effects to it that I haven't considered, then we're better off leaving the current behavior alone (documenting the heck out of it if necessary) instead of changing it in ways that would alter guest ABI between boots.
This can be achieve, sort of. I can't just move the alignment to PostParse, even without changing how it is calculated, because I would break the premises it's based on today (not be executed on migration or snapshot). To do that I would need to gate the code into a VIR_DOMAIN_DEF_PARSE_ABI_UPDATE parse flag. But I'm not sure if this flag is activated on guest reboot without changing the domain (I'm looking into it ATM, but I don't think that's the case). So even if we do that, we would still need the alignment call after PostParse time, as is today, to accommodate existing guests. Regardless of whether this is worth doing or not (TBH, I'm not sure) I'll end up sending a couple of patches to reposition the alignment code out of qemu_command.c (but not up to ParseTime). What I still want to do is to use VIR_DOMAIN_DEF_PARSE_ABI_UPDATE to force the alignment of pseries DIMMs in PostParse time. This will not break ABI of existing guests in any capacity and will fix the alignment calculation for newer ones. We have precedent for such scenario in qemuDomainControllerDefPostParse(), when we use this same flag to define VIR_DOMAIN_CONTROLLER_MODEL_USB_QEMU_XHCI for new pSeries guests. I'll post patches today. Let's see how it goes. DHB