
On Thu, Feb 18, 2021 at 14:31:00 +0100, Michal Privoznik wrote:
The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes:
1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>.
Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with:
<memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address.
Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation).
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 51 +++++++++++++-- docs/schemas/domaincommon.rng | 11 ++++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 27 +++++++- src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 22 ++++++- src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 298 insertions(+), 21 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 2587106191..05d359a32d 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst
[...]
@@ -6935,6 +6937,16 @@ Example: automatically added device with KVM </devices> ...
+Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device: + +:: + + ... + <devices>
The example is missing the 'virtio-mem' device which causes the balloon to be disabled.
+ <memballoon model='none'/> + </devices> + ... + Example: manually added device with static PCI slot 2 requested
::
[...]
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 541d592bbe..26b9b5583e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def, }
if (addDefaultMemballoon && !def->memballoon) { - virDomainMemballoonDefPtr memballoon; - memballoon = g_new0(virDomainMemballoonDef, 1); + virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1); + size_t i;
- memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + /* To simplify virtio-mem implementation, memballoon has to be turned + * off if domain has a virtio-mem device. See + * qemuValidateDomainDeviceDefMemory() for more details. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + break; + } + + if (i == def->nmems) + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + else + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE; def->memballoon = memballoon; }
This might be a point of contention. I can see why we'd want to do this but people might have other ideas. Preferably I'd like danpb's input. I think the best approach will be to not hide it into the massive patch adding the boring XML bits but split it separately. [...]
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 2541ae856a..6150083251 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c
[...]
@@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, } break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + /* Accounting balloon and virtio-mem is hard. We have plenty of APIs + * which take balloon from QEMU and report it to users. We would have + * to change all that and account for virtio-mem actual size. Also, + * virtio-mem is supposed to be replacement for balloon. Disable + * coexistence of these two for now. */ + if (virDomainDefHasMemballoon(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem is not supported with memory balloon")); + return -1; + } + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break;
And this one too. I agree with this hunk as is, but not in this commit.