[libvirt] [RFC PATCH 00/17] Add vhost-user-gpu support

From: Marc-André Lureau <marcandre.lureau@redhat.com> Hi, This series of patches add support for running a virtio GPU in a seperate process, using vhost-user. The QEMU series "[PATCH v4 00/29] vhost-user for input & GPU" is still under review, and will hopefully land in 3.1. There are several benefits of running the GPU process in an external process, since Mesa is rather heavy on the qemu main loop, and may block for a while or crash. I observe x5 performance improvements with Unigine Heaven 4 benchmark. The external GPU process is started with one end of the vhost-user socket pair, the other end is given to a QEMU chardev. It is also added to the emulator cgroup to restrict its CPU usage. vhost-user requires shared VM memory. A few preliminary patches ease and improve shared memory setup, when no explicit domain NUMA configuration is given. Also, if there is no need for file-backed memory, teach libvirt to make use of memfd memory backend, which has better security guarantees and is easier to setup. Review welcome! Marc-André Lureau (17): qemu: setup shared memory without explicit numa configuration qemu: add memory-backend-memfd capability check qemu: use memory-backend-memfd if possible qemu: add vhost-user-gpu capabilities domain: add "vhost-user" video type qemu: fill the vhost-user video type capability qemu: check that qemu is vhost-user-vga capable qemu: vhost-user is valid as non-primary video device qemu: validate vhost-user video model qemu: add qemuSecurityStartVhostUserGPU helper qemu: add vhost-user-gpu helper unit qemu: restrict 'virgl=' option to 'virtio' video type qemu: set default address type on vhost-user video model qemu: start/stop the vhost-user-gpu external device qemu: build vhost-user-backend for vhost-user-gpu qemu: build vhost-user-gpu video device arguments tests: add vhost-user-gpu xml2argv tests docs/formatdomain.html.in | 5 +- docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.h | 1 + src/conf/domain_conf.c | 4 +- src/conf/domain_conf.h | 1 + src/qemu/Makefile.inc.am | 2 + src/qemu/qemu_capabilities.c | 10 + src/qemu/qemu_capabilities.h | 5 + src/qemu/qemu_command.c | 191 ++++++++--- src/qemu/qemu_domain.c | 8 +- src/qemu/qemu_domain_address.c | 4 +- src/qemu/qemu_extdevice.c | 47 ++- src/qemu/qemu_process.c | 6 +- src/qemu/qemu_security.c | 48 +++ src/qemu/qemu_security.h | 6 + src/qemu/qemu_vhost_user_gpu.c | 318 ++++++++++++++++++ src/qemu/qemu_vhost_user_gpu.h | 48 +++ tests/domaincapsschemadata/full.xml | 1 + .../caps_2.12.0.aarch64.xml | 1 + .../caps_2.12.0.ppc64.xml | 1 + .../caps_2.12.0.s390x.xml | 1 + .../caps_2.12.0.x86_64.xml | 1 + .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + .../caps_3.0.0.x86_64.xml | 1 + .../fd-memory-no-numa-topology.args | 4 + tests/qemuxml2argvdata/memfd.args | 28 ++ tests/qemuxml2argvdata/memfd.xml | 32 ++ .../vhost-user-gpu-secondary.args | 34 ++ .../vhost-user-gpu-secondary.xml | 38 +++ tests/qemuxml2argvdata/vhost-user-vga.args | 31 ++ tests/qemuxml2argvdata/vhost-user-vga.xml | 35 ++ tests/qemuxml2argvtest.c | 14 + 32 files changed, 877 insertions(+), 51 deletions(-) create mode 100644 src/qemu/qemu_vhost_user_gpu.c create mode 100644 src/qemu/qemu_vhost_user_gpu.h create mode 100644 tests/qemuxml2argvdata/memfd.args create mode 100644 tests/qemuxml2argvdata/memfd.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.args create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.args create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.xml -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> When a domain is configured with 'shared' memory backing: <memoryBacking> <access mode='shared'/> </memoryBacking> But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 100 ++++++++++++------ .../fd-memory-no-numa-topology.args | 4 + 2 files changed, 73 insertions(+), 31 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 44ae8dcef7..f1235099b2 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3254,26 +3254,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, static int -qemuBuildMemoryCellBackendStr(virDomainDefPtr def, - virQEMUDriverConfigPtr cfg, - size_t cell, - qemuDomainObjPrivatePtr priv, - virBufferPtr buf) +qemuBuildMemoryBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + const char *alias, + int targetNode, + unsigned long long memsize, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf) { virJSONValuePtr props = NULL; - char *alias = NULL; - int ret = -1; - int rc; virDomainMemoryDef mem = { 0 }; - unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, - cell); - - if (virAsprintf(&alias, "ram-node%zu", cell) < 0) - goto cleanup; + int rc, ret = -1; mem.size = memsize; - mem.targetNode = cell; - mem.info.alias = alias; + mem.targetNode = targetNode; + mem.info.alias = (char *)alias; if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, priv->qemuCaps, def, &mem, priv->autoNodeset, false)) < 0) @@ -3284,9 +3279,30 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, ret = rc; +cleanup: + virJSONValueFree(props); + return ret; +} + + +static int +qemuBuildMemoryCellBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + size_t cell, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf) +{ + char *alias = NULL; + int ret = -1; + unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, cell); + + if (virAsprintf(&alias, "ram-node%zu", cell) < 0) + goto cleanup; + + ret = qemuBuildMemoryBackendStr(def, cfg, alias, cell, memsize, priv, buf); + cleanup: VIR_FREE(alias); - virJSONValueFree(props); return ret; } @@ -7590,6 +7606,17 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, size_t ncells = virDomainNumaGetNodeCount(def->numa); const long system_page_size = virGetSystemPageSizeKB(); bool numa_distances = false; + bool implicit = false; + + if (ncells == 0) { + if (def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED) { + ncells = 1; + implicit = true; + } else { + ret = 0; + goto cleanup; + } + } if (virDomainNumatuneHasPerNodeBinding(def->numa) && !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || @@ -7645,14 +7672,22 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) { - if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, - &nodeBackends[i])) < 0) + + if (implicit) + rc = qemuBuildMemoryBackendStr(def, cfg, "ram-node", -1, + def->mem.total_memory, + priv, &nodeBackends[i]); + else + rc = qemuBuildMemoryCellBackendStr(def, cfg, i, + priv, &nodeBackends[i]); + if (rc < 0) goto cleanup; if (rc == 0) needBackend = true; } else { - if (virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { + if (implicit || + virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Shared memory mapping is not supported " "with this QEMU")); @@ -7667,15 +7702,18 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, for (i = 0; i < ncells; i++) { VIR_FREE(cpumask); - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) - goto cleanup; - if (strchr(cpumask, ',') && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("disjoint NUMA cpu ranges are not supported " - "with this QEMU")); - goto cleanup; + if (!implicit) { + if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) + goto cleanup; + + if (strchr(cpumask, ',') && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA cpu ranges are not supported " + "with this QEMU")); + goto cleanup; + } } if (needBackend) { @@ -7694,7 +7732,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, } if (needBackend) - virBufferAsprintf(&buf, ",memdev=ram-node%zu", i); + virBufferAsprintf(&buf, implicit ? + ",memdev=ram-node" : ",memdev=ram-node%zu", i); else virBufferAsprintf(&buf, ",mem=%llu", virDomainNumaGetNodeMemorySize(def->numa, i) / 1024); @@ -7717,7 +7756,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, break; } - if (numa_distances) { + if (!implicit && numa_distances) { if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_DIST)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("setting NUMA distances is not " @@ -10303,8 +10342,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver, if (qemuBuildIOThreadCommandLine(cmd, def) < 0) goto error; - if (virDomainNumaGetNodeCount(def->numa) && - qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) + if (qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) goto error; if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0) diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \ +-numa node,nodeid=0,memdev=ram-node \ -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ -display none \ -no-user-config \ -- 2.18.0.129.ge3331758f1

On 07/13/2018 09:28 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 100 ++++++++++++------ .../fd-memory-no-numa-topology.args | 4 + 2 files changed, 73 insertions(+), 31 deletions(-)
NUMA, memory backends, and hugepages - not in my wheelhouse of knowledge. Hopefully Michal and/or Pavel will take a look! Is it possible someone may not want this type of thing to happen? Is there an upside or downside to this? What happens "today" when not generated? And of course, what about migration concerns about unconditionally doing this for some target migration?
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 44ae8dcef7..f1235099b2 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3254,26 +3254,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
static int -qemuBuildMemoryCellBackendStr(virDomainDefPtr def, - virQEMUDriverConfigPtr cfg, - size_t cell, - qemuDomainObjPrivatePtr priv, - virBufferPtr buf) +qemuBuildMemoryBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + const char *alias,
So the one "concern" I'd have here is some time in the future the @mem gets allocated and handled like a real device eventually calling virDomainDeviceInfoClear and that'd be a problem for the passed const char * string. Some future person's problem I guess!
+ int targetNode, + unsigned long long memsize, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf)
As much as a long name is a pain, is this more of a : qemuBuildMemorySharedDefaultBackendStr
{ virJSONValuePtr props = NULL; - char *alias = NULL; - int ret = -1; - int rc; virDomainMemoryDef mem = { 0 }; - unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, - cell); - - if (virAsprintf(&alias, "ram-node%zu", cell) < 0) - goto cleanup; + int rc, ret = -1;
mem.size = memsize; - mem.targetNode = cell; - mem.info.alias = alias; + mem.targetNode = targetNode; + mem.info.alias = (char *)alias;
if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, priv->qemuCaps, def, &mem, priv->autoNodeset, false)) < 0)> @@ -3284,9 +3279,30 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
ret = rc;
+cleanup:
Fails 'make check syntax-check' : maint.mk: Top-level labels should be indented by one space make: *** [cfg.mk:898: sc_require_space_before_label] Error 1
+ virJSONValueFree(props); + return ret; +} + + +static int +qemuBuildMemoryCellBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + size_t cell, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf) +{ + char *alias = NULL; + int ret = -1; + unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, cell); + + if (virAsprintf(&alias, "ram-node%zu", cell) < 0) + goto cleanup; + + ret = qemuBuildMemoryBackendStr(def, cfg, alias, cell, memsize, priv, buf); + cleanup: VIR_FREE(alias); - virJSONValueFree(props);
return ret; } @@ -7590,6 +7606,17 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, size_t ncells = virDomainNumaGetNodeCount(def->numa); const long system_page_size = virGetSystemPageSizeKB(); bool numa_distances = false; + bool implicit = false; + + if (ncells == 0) { + if (def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED) { + ncells = 1;
Well, that's cheating for the subsequent for loop ;-)
+ implicit = true; + } else { + ret = 0; + goto cleanup; + } + }
So if ncells == 0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED, then we return 0 without doing the subsequent code? Is that expected? Is there something done later that may be necessary, needed, or assumed.
if (virDomainNumatuneHasPerNodeBinding(def->numa) && !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || @@ -7645,14 +7672,22 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
- if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, - &nodeBackends[i])) < 0) + + if (implicit) + rc = qemuBuildMemoryBackendStr(def, cfg, "ram-node", -1,
Using "ram-node" where other devices use "ram-node%zu" could easily be missed - maybe "default" or "implicit" as a prefix or postfix to make it stand out a bit more? It's not a big deal though.
+ def->mem.total_memory, + priv, &nodeBackends[i]); + else + rc = qemuBuildMemoryCellBackendStr(def, cfg, i, + priv, &nodeBackends[i]); + if (rc < 0) goto cleanup;
if (rc == 0) needBackend = true; } else { - if (virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { + if (implicit || + virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Shared memory mapping is not supported " "with this QEMU")); @@ -7667,15 +7702,18 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
for (i = 0; i < ncells; i++) { VIR_FREE(cpumask); - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) - goto cleanup;
- if (strchr(cpumask, ',') && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("disjoint NUMA cpu ranges are not supported " - "with this QEMU")); - goto cleanup; + if (!implicit) { + if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) + goto cleanup; + + if (strchr(cpumask, ',') && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA cpu ranges are not supported " + "with this QEMU")); + goto cleanup; + } }
if (needBackend) { @@ -7694,7 +7732,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, }
if (needBackend) - virBufferAsprintf(&buf, ",memdev=ram-node%zu", i); + virBufferAsprintf(&buf, implicit ? + ",memdev=ram-node" : ",memdev=ram-node%zu", i); else virBufferAsprintf(&buf, ",mem=%llu", virDomainNumaGetNodeMemorySize(def->numa, i) / 1024); @@ -7717,7 +7756,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, break; }
- if (numa_distances) { + if (!implicit && numa_distances) { if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_DIST)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("setting NUMA distances is not " @@ -10303,8 +10342,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver, if (qemuBuildIOThreadCommandLine(cmd, def) < 0) goto error;
- if (virDomainNumaGetNodeCount(def->numa) && - qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) + if (qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) goto error;
if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0) diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \
Curious does that perhaps rather large file for mem-path get created? If so, wow! Seems to me something like this would need to be documented or as noted earlier be configurable. John
+-numa node,nodeid=0,memdev=ram-node \ -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ -display none \ -no-user-config \

Hi On Thu, Aug 16, 2018 at 4:35 AM, John Ferlan <jferlan@redhat.com> wrote:
On 07/13/2018 09:28 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 100 ++++++++++++------ .../fd-memory-no-numa-topology.args | 4 + 2 files changed, 73 insertions(+), 31 deletions(-)
NUMA, memory backends, and hugepages - not in my wheelhouse of knowledge. Hopefully Michal and/or Pavel will take a look!
Is it possible someone may not want this type of thing to happen? Is
I assume someone that sets 'shared' memory mode may consider this as a bug fix.
there an upside or downside to this? What happens "today" when not
You get non-shared memory
generated? And of course, what about migration concerns about unconditionally doing this for some target migration?
True, this will break migration though if the target uses numa/memory-backend-file. What do you suggest?
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 44ae8dcef7..f1235099b2 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3254,26 +3254,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
static int -qemuBuildMemoryCellBackendStr(virDomainDefPtr def, - virQEMUDriverConfigPtr cfg, - size_t cell, - qemuDomainObjPrivatePtr priv, - virBufferPtr buf) +qemuBuildMemoryBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + const char *alias,
So the one "concern" I'd have here is some time in the future the @mem gets allocated and handled like a real device eventually calling virDomainDeviceInfoClear and that'd be a problem for the passed const char * string. Some future person's problem I guess!
+ int targetNode, + unsigned long long memsize, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf)
As much as a long name is a pain, is this more of a :
qemuBuildMemorySharedDefaultBackendStr
Why?
{ virJSONValuePtr props = NULL; - char *alias = NULL; - int ret = -1; - int rc; virDomainMemoryDef mem = { 0 }; - unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, - cell); - - if (virAsprintf(&alias, "ram-node%zu", cell) < 0) - goto cleanup; + int rc, ret = -1;
mem.size = memsize; - mem.targetNode = cell; - mem.info.alias = alias; + mem.targetNode = targetNode; + mem.info.alias = (char *)alias;
if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, priv->qemuCaps, def, &mem, priv->autoNodeset, false)) < 0)> @@ -3284,9 +3279,30 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def,
ret = rc;
+cleanup:
Fails 'make check syntax-check' :
maint.mk: Top-level labels should be indented by one space make: *** [cfg.mk:898: sc_require_space_before_label] Error 1
argh, fixed
+ virJSONValueFree(props); + return ret; +} + + +static int +qemuBuildMemoryCellBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + size_t cell, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf) +{ + char *alias = NULL; + int ret = -1; + unsigned long long memsize = virDomainNumaGetNodeMemorySize(def->numa, cell); + + if (virAsprintf(&alias, "ram-node%zu", cell) < 0) + goto cleanup; + + ret = qemuBuildMemoryBackendStr(def, cfg, alias, cell, memsize, priv, buf); + cleanup: VIR_FREE(alias); - virJSONValueFree(props);
return ret; } @@ -7590,6 +7606,17 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, size_t ncells = virDomainNumaGetNodeCount(def->numa); const long system_page_size = virGetSystemPageSizeKB(); bool numa_distances = false; + bool implicit = false; + + if (ncells == 0) { + if (def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED) { + ncells = 1;
Well, that's cheating for the subsequent for loop ;-)
+ implicit = true; + } else { + ret = 0; + goto cleanup; + } + }
So if ncells == 0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED, then we return 0 without doing the subsequent code? Is that expected? Is there something done later that may be necessary, needed, or assumed.
No, before the patch, virDomainNumaGetNodeCount() is checked before calling qemuBuildNumaArgStr(). Now it is handled inside in case ncells==0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED.
if (virDomainNumatuneHasPerNodeBinding(def->numa) && !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || @@ -7645,14 +7672,22 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
- if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, - &nodeBackends[i])) < 0) + + if (implicit) + rc = qemuBuildMemoryBackendStr(def, cfg, "ram-node", -1,
Using "ram-node" where other devices use "ram-node%zu" could easily be missed - maybe "default" or "implicit" as a prefix or postfix to make it stand out a bit more? It's not a big deal though.
+ def->mem.total_memory, + priv, &nodeBackends[i]); + else + rc = qemuBuildMemoryCellBackendStr(def, cfg, i, + priv, &nodeBackends[i]); + if (rc < 0) goto cleanup;
if (rc == 0) needBackend = true; } else { - if (virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { + if (implicit || + virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Shared memory mapping is not supported " "with this QEMU")); @@ -7667,15 +7702,18 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
for (i = 0; i < ncells; i++) { VIR_FREE(cpumask); - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) - goto cleanup;
- if (strchr(cpumask, ',') && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("disjoint NUMA cpu ranges are not supported " - "with this QEMU")); - goto cleanup; + if (!implicit) { + if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) + goto cleanup; + + if (strchr(cpumask, ',') && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA cpu ranges are not supported " + "with this QEMU")); + goto cleanup; + } }
if (needBackend) { @@ -7694,7 +7732,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, }
if (needBackend) - virBufferAsprintf(&buf, ",memdev=ram-node%zu", i); + virBufferAsprintf(&buf, implicit ? + ",memdev=ram-node" : ",memdev=ram-node%zu", i); else virBufferAsprintf(&buf, ",mem=%llu", virDomainNumaGetNodeMemorySize(def->numa, i) / 1024); @@ -7717,7 +7756,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, break; }
- if (numa_distances) { + if (!implicit && numa_distances) { if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_DIST)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("setting NUMA distances is not " @@ -10303,8 +10342,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver, if (qemuBuildIOThreadCommandLine(cmd, def) < 0) goto error;
- if (virDomainNumaGetNodeCount(def->numa) && - qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) + if (qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) goto error;
if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0) diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \
Curious does that perhaps rather large file for mem-path get created? If so, wow! Seems to me something like this would need to be documented or as noted earlier be configurable.
You mean during tests? no, it's created by qemu afaik. I am not a libvirt expert either, and I would rather have no file be created when you want shared memory. That's what the next patches propose with memfd support.
John
+-numa node,nodeid=0,memdev=ram-node \ -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ -display none \ -no-user-config \
thanks

On 08/16/2018 06:31 AM, Marc-André Lureau wrote:
Hi
On Thu, Aug 16, 2018 at 4:35 AM, John Ferlan <jferlan@redhat.com> wrote:
On 07/13/2018 09:28 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 100 ++++++++++++------ .../fd-memory-no-numa-topology.args | 4 + 2 files changed, 73 insertions(+), 31 deletions(-)
NUMA, memory backends, and hugepages - not in my wheelhouse of knowledge. Hopefully Michal and/or Pavel will take a look!
Is it possible someone may not want this type of thing to happen? Is
I assume someone that sets 'shared' memory mode may consider this as a bug fix.
there an upside or downside to this? What happens "today" when not
You get non-shared memory
So today someone asks for "shared" and then end up with "non-shared"? I don't think that's apparent from the "access" description in: https://libvirt.org/formatdomain.html#elementsMemoryBacking Then again, not in my wheelhouse of knowledge, so maybe that's just one of those givens. Of course that perhaps goes to your first answer of this being a "bug fix". Not something that's apparent from the existing documentation or commit description though. This probably should have been it's own separate patch and not included in this series.
generated? And of course, what about migration concerns about unconditionally doing this for some target migration?
True, this will break migration though if the target uses numa/memory-backend-file.
What do you suggest?
This needs to be configurable as we cannot break w/ migration. I'm not quite sure how we document this other than as Dan suggests that if someone wants NUMA, then they'll configure things properly. If this 'shared' ends up as 'non-shared' because NUMA isn't configured, then we should indicate that. If the adding some property to the element to indicate desire of 'shared' via usage of some (local) backing store which means the guest probably isn't very migrate-able, then so bit it.
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 44ae8dcef7..f1235099b2 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3254,26 +3254,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
static int -qemuBuildMemoryCellBackendStr(virDomainDefPtr def, - virQEMUDriverConfigPtr cfg, - size_t cell, - qemuDomainObjPrivatePtr priv, - virBufferPtr buf) +qemuBuildMemoryBackendStr(virDomainDefPtr def, + virQEMUDriverConfigPtr cfg, + const char *alias,
So the one "concern" I'd have here is some time in the future the @mem gets allocated and handled like a real device eventually calling virDomainDeviceInfoClear and that'd be a problem for the passed const char * string. Some future person's problem I guess!
+ int targetNode, + unsigned long long memsize, + qemuDomainObjPrivatePtr priv, + virBufferPtr buf)
As much as a long name is a pain, is this more of a :
qemuBuildMemorySharedDefaultBackendStr
Why?
nm, I think when first reading for some reason I had this "separate" from the CellBackend call. [...]
+ implicit = true; + } else { + ret = 0; + goto cleanup; + } + }
So if ncells == 0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED, then we return 0 without doing the subsequent code? Is that expected? Is there something done later that may be necessary, needed, or assumed.
No, before the patch, virDomainNumaGetNodeCount() is checked before calling qemuBuildNumaArgStr(). Now it is handled inside in case ncells==0 && def->mem.access != VIR_DOMAIN_MEMORY_ACCESS_SHARED.
Ah, yes - I missed the check when looking at the changes - if (virDomainNumaGetNodeCount(def->numa) && - qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0) [...]
diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \
Curious does that perhaps rather large file for mem-path get created? If so, wow! Seems to me something like this would need to be documented or as noted earlier be configurable.
You mean during tests? no, it's created by qemu afaik.
I am not a libvirt expert either, and I would rather have no file be created when you want shared memory. That's what the next patches propose with memfd support.
So my point/question was someone may not realize (or want) such a large file to be created by default for them in /var/lib/libvirt... As noted above maybe this becomes an "extra" attribute to request a non NUMA backed shared file with an additional element path that would provide the path to the file rather than the defualt /var/lib/libvirt... John (hopefully this all makes sense - I'm only one coffee in so far)
John
+-numa node,nodeid=0,memdev=ram-node \ -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ -display none \ -no-user-config \
thanks

On Thu, Aug 16, 2018 at 07:32:57AM -0400, John Ferlan wrote:
On 08/16/2018 06:31 AM, Marc-André Lureau wrote:
Hi
On Thu, Aug 16, 2018 at 4:35 AM, John Ferlan <jferlan@redhat.com> wrote:
On 07/13/2018 09:28 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 100 ++++++++++++------ .../fd-memory-no-numa-topology.args | 4 + 2 files changed, 73 insertions(+), 31 deletions(-)
NUMA, memory backends, and hugepages - not in my wheelhouse of knowledge. Hopefully Michal and/or Pavel will take a look!
Is it possible someone may not want this type of thing to happen? Is
I assume someone that sets 'shared' memory mode may consider this as a bug fix.
there an upside or downside to this? What happens "today" when not
You get non-shared memory
So today someone asks for "shared" and then end up with "non-shared"? I don't think that's apparent from the "access" description in:
https://libvirt.org/formatdomain.html#elementsMemoryBacking
Then again, not in my wheelhouse of knowledge, so maybe that's just one of those givens. Of course that perhaps goes to your first answer of this being a "bug fix". Not something that's apparent from the existing documentation or commit description though. This probably should have been it's own separate patch and not included in this series.
If we can't honour the "shared" request, we should make sure libvirt reports an error and aborts startup. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Jul 13, 2018 at 03:28:08PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \ +-numa node,nodeid=0,memdev=ram-node \
I'm not at all convinced it is safe todo this. We've been burnt in the past by adding use of memory-backend objects causing migration to break commit f309db1f4d51009bad0d32e12efc75530b66836b Author: Michal Privoznik <mprivozn@redhat.com> Date: Thu Dec 18 12:36:48 2014 +0100 qemu: Create memory-backend-{ram,file} iff needed Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397 QEMU BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1170093 This change doesn't really feel like it is required either. If the user wants NUMA, then the XML can just be written to request a NUMA topology with a single node. Better to be explicit in the XML rather than silently adding things as a side effect Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Hi On Thu, Aug 16, 2018 at 12:48 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Jul 13, 2018 at 03:28:08PM +0200, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
When a domain is configured with 'shared' memory backing:
<memoryBacking> <access mode='shared'/> </memoryBacking>
But no explicit NUMA configuration, let's configure a shared memory backend associated with default -numa.
diff --git a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args index bd88daaa3b..400fb39cc6 100644 --- a/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args +++ b/tests/qemuxml2argvdata/fd-memory-no-numa-topology.args @@ -11,6 +11,10 @@ QEMU_AUDIO_DRV=none \ -m 14336 \ -mem-prealloc \ -smp 8,sockets=8,cores=1,threads=1 \ +-object memory-backend-file,id=ram-node,\ +mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/-1-instance-00000092/ram-node,\ +share=yes,size=15032385536 \ +-numa node,nodeid=0,memdev=ram-node \
I'm not at all convinced it is safe todo this. We've been burnt in the past by adding use of memory-backend objects causing migration to break
commit f309db1f4d51009bad0d32e12efc75530b66836b Author: Michal Privoznik <mprivozn@redhat.com> Date: Thu Dec 18 12:36:48 2014 +0100
qemu: Create memory-backend-{ram,file} iff needed
Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397 QEMU BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1170093
This change doesn't really feel like it is required either. If the user wants NUMA, then the XML can just be written to request a NUMA topology with a single node. Better to be explicit in the XML rather than silently adding things as a side effect
Ok, let's drop this patch then. I'll modify "qemu: use memory-backend-memfd if possible" to make use of memfd differently, with explicit NUMA.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Marc-André Lureau

From: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml | 1 + 8 files changed, 9 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 23b483349f..4ffd2d6257 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -503,6 +503,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "machine.pseries.cap-hpt-max-page-size", "machine.pseries.cap-htm", "usb-storage.werror", + "memory-backend-memfd", ); @@ -1144,6 +1145,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "vhost-vsock-device", QEMU_CAPS_DEVICE_VHOST_VSOCK }, { "mch", QEMU_CAPS_DEVICE_MCH }, { "sev-guest", QEMU_CAPS_SEV_GUEST }, + { "memory-backend-memfd", QEMU_CAPS_OBJECT_MEMORY_MEMFD }, }; static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioBalloon[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 1fa0ebfea3..ed612595e0 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -487,6 +487,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_MACHINE_PSERIES_CAP_HPT_MAX_PAGE_SIZE, /* -machine pseries.cap-hpt-max-page-size */ QEMU_CAPS_MACHINE_PSERIES_CAP_HTM, /* -machine pseries.cap-htm */ QEMU_CAPS_USB_STORAGE_WERROR, /* -device usb-storage,werror=..,rerror=.. */ + QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml index ecc029f403..e66b8a22ff 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml @@ -169,6 +169,7 @@ <flag name='vhost-vsock'/> <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> + <flag name='memory-backend-memfd'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>347550</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml index 7139179304..b62cd41bc0 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml @@ -167,6 +167,7 @@ <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> <flag name='machine.pseries.cap-htm'/> + <flag name='memory-backend-memfd'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>428334</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 87d189e58d..1d2f802040 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -133,6 +133,7 @@ <flag name='vhost-vsock'/> <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> + <flag name='memory-backend-memfd'/> <version>2012000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>375999</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml index 9c1f6c327c..0fd2d812dd 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml @@ -211,6 +211,7 @@ <flag name='mch'/> <flag name='mch.extended-tseg-mbytes'/> <flag name='sev-guest'/> + <flag name='memory-backend-memfd'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>416196</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml index 33cd00e613..c6a783353e 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml @@ -167,6 +167,7 @@ <flag name='tpm-emulator'/> <flag name='machine.pseries.cap-hpt-max-page-size'/> <flag name='machine.pseries.cap-htm'/> + <flag name='memory-backend-memfd'/> <version>2012050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>446771</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml index d7c25c65dd..37e44499ef 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml @@ -210,6 +210,7 @@ <flag name='mch.extended-tseg-mbytes'/> <flag name='sev-guest'/> <flag name='usb-storage.werror'/> + <flag name='memory-backend-memfd'/> <version>2012050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>437827</microcodeVersion> -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> When the memory is shared and the source isn't specified to be a file, we can make use of memory-backend-memfd. Sealing is enabled by default in qemu, hugepage is easier to setup, which makes it a better choice than memory-backend-file. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 20 ++++++++++++++++--- tests/qemuxml2argvdata/memfd.args | 28 +++++++++++++++++++++++++++ tests/qemuxml2argvdata/memfd.xml | 32 +++++++++++++++++++++++++++++++ tests/qemuxml2argvtest.c | 2 ++ 4 files changed, 79 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memfd.args create mode 100644 tests/qemuxml2argvdata/memfd.xml diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index f1235099b2..45ab1f85ae 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3137,7 +3137,21 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, if (!(props = virJSONValueNewObject())) return -1; - if (useHugepage || mem->nvdimmPath || memAccess || + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD) && + memAccess == VIR_DOMAIN_MEMORY_ACCESS_SHARED && + !mem->nvdimmPath && + def->mem.source != VIR_DOMAIN_MEMORY_SOURCE_FILE) { + + backendType = "memory-backend-memfd"; + + if (useHugepage && + virJSONValueObjectAdd(props, + "b:hugetlb", true, + "U:hugetlbsize", pagesize * 1024, + NULL) < 0) + goto cleanup; + + } else if (useHugepage || mem->nvdimmPath || memAccess || def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) { if (useHugepage) { @@ -7670,8 +7684,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, * need to check which approach to use */ for (i = 0; i < ncells; i++) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || - virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) { - + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) { if (implicit) rc = qemuBuildMemoryBackendStr(def, cfg, "ram-node", -1, diff --git a/tests/qemuxml2argvdata/memfd.args b/tests/qemuxml2argvdata/memfd.args new file mode 100644 index 0000000000..dda8e272ea --- /dev/null +++ b/tests/qemuxml2argvdata/memfd.args @@ -0,0 +1,28 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-object memory-backend-memfd,id=ram-node,size=224395264 \ +-numa node,nodeid=0,memdev=ram-node \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/memfd.xml b/tests/qemuxml2argvdata/memfd.xml new file mode 100644 index 0000000000..c60b169169 --- /dev/null +++ b/tests/qemuxml2argvdata/memfd.xml @@ -0,0 +1,32 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <memoryBacking> + <access mode='shared'/> + </memoryBacking> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index a929e4314e..47bd2ab867 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -945,6 +945,8 @@ mymain(void) DO_TEST("pmu-feature", NONE); DO_TEST("pmu-feature-off", NONE); + DO_TEST("memfd", + QEMU_CAPS_OBJECT_MEMORY_MEMFD); DO_TEST("hugepages", NONE); DO_TEST("hugepages-numa", QEMU_CAPS_PIIX_DISABLE_S3, QEMU_CAPS_PIIX_DISABLE_S4, -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_capabilities.c | 6 ++++++ src/qemu/qemu_capabilities.h | 4 ++++ 2 files changed, 10 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 4ffd2d6257..a6c00308a2 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -504,6 +504,10 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "machine.pseries.cap-htm", "usb-storage.werror", "memory-backend-memfd", + + /* 315 */ + "vhost-user-gpu", + "vhost-user-vga", ); @@ -1146,6 +1150,8 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "mch", QEMU_CAPS_DEVICE_MCH }, { "sev-guest", QEMU_CAPS_SEV_GUEST }, { "memory-backend-memfd", QEMU_CAPS_OBJECT_MEMORY_MEMFD }, + { "vhost-user-gpu", QEMU_CAPS_DEVICE_VHOST_USER_GPU }, + { "vhost-user-vga", QEMU_CAPS_DEVICE_VHOST_USER_VGA }, }; static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioBalloon[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index ed612595e0..ee1a19282a 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -489,6 +489,10 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_USB_STORAGE_WERROR, /* -device usb-storage,werror=..,rerror=.. */ QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */ + /* 315 */ + QEMU_CAPS_DEVICE_VHOST_USER_GPU, /* -device vhost-user-gpu */ + QEMU_CAPS_DEVICE_VHOST_USER_VGA, /* -device vhost-user-vga */ + QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Learn to accept "vhost-user" model type: <video> <model type='vhost-user'/> </video> (fill the required enum and switches to compile successfully) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- docs/formatdomain.html.in | 5 +++-- docs/schemas/domaincommon.rng | 1 + src/conf/domain_conf.c | 4 +++- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 9 ++++++--- src/qemu/qemu_domain.c | 1 + src/qemu/qemu_domain_address.c | 1 + tests/domaincapsschemadata/full.xml | 1 + 8 files changed, 17 insertions(+), 6 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a3afe137bf..8bd0d81dbd 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -6620,8 +6620,9 @@ qemu-kvm -net nic,model=? /dev/null The <code>model</code> element has a mandatory <code>type</code> attribute which takes the value "vga", "cirrus", "vmvga", "xen", "vbox", "qxl" (<span class="since">since 0.8.6</span>), - "virtio" (<span class="since">since 1.3.0</span>) - or "gop" (<span class="since">since 3.2.0</span>) + "virtio" (<span class="since">since 1.3.0</span>), + "gop" (<span class="since">since 3.2.0</span>) or + "vhost-user" (<span class="since">since 4.6.0</span>) depending on the hypervisor features available. </p> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f24a56392a..88dcc5404d 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3451,6 +3451,7 @@ <value>vbox</value> <value>virtio</value> <value>gop</value> + <value>vhost-user</value> </choice> </attribute> <group> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 7396616eda..93c33263ad 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -590,7 +590,8 @@ VIR_ENUM_IMPL(virDomainVideo, VIR_DOMAIN_VIDEO_TYPE_LAST, "qxl", "parallels", "virtio", - "gop") + "gop", + "vhost-user") VIR_ENUM_IMPL(virDomainVideoVGAConf, VIR_DOMAIN_VIDEO_VGACONF_LAST, "io", @@ -15022,6 +15023,7 @@ virDomainVideoDefaultRAM(const virDomainDef *def, case VIR_DOMAIN_VIDEO_TYPE_VBOX: case VIR_DOMAIN_VIDEO_TYPE_PARALLELS: case VIR_DOMAIN_VIDEO_TYPE_VIRTIO: + case VIR_DOMAIN_VIDEO_TYPE_VHOST_USER: case VIR_DOMAIN_VIDEO_TYPE_GOP: case VIR_DOMAIN_VIDEO_TYPE_LAST: default: diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0f10e242fd..4e608b6ae2 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1423,6 +1423,7 @@ typedef enum { VIR_DOMAIN_VIDEO_TYPE_PARALLELS, /* pseudo device for VNC in containers */ VIR_DOMAIN_VIDEO_TYPE_VIRTIO, VIR_DOMAIN_VIDEO_TYPE_GOP, + VIR_DOMAIN_VIDEO_TYPE_VHOST_USER, VIR_DOMAIN_VIDEO_TYPE_LAST } virDomainVideoType; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 45ab1f85ae..e78a534628 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -105,7 +105,8 @@ VIR_ENUM_IMPL(qemuVideo, VIR_DOMAIN_VIDEO_TYPE_LAST, "qxl", "", /* don't support parallels */ "", /* no need for virtio */ - "" /* don't support gop */); + "", /* don't support gop */ + "", /* no need for virtio */); VIR_ENUM_DECL(qemuDeviceVideo) @@ -119,7 +120,8 @@ VIR_ENUM_IMPL(qemuDeviceVideo, VIR_DOMAIN_VIDEO_TYPE_LAST, "qxl-vga", "", /* don't support parallels */ "virtio-vga", - "" /* don't support gop */); + "", /* don't support gop */ + "vhost-user-vga"); VIR_ENUM_DECL(qemuDeviceVideoSecondary) @@ -133,7 +135,8 @@ VIR_ENUM_IMPL(qemuDeviceVideoSecondary, VIR_DOMAIN_VIDEO_TYPE_LAST, "qxl", "", /* don't support parallels */ "virtio-gpu", - "" /* don't support gop */); + "", /* don't support gop */ + "vhost-user-gpu"); VIR_ENUM_DECL(qemuSoundCodec) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ed76495309..b8b5919795 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -4485,6 +4485,7 @@ qemuDomainDeviceDefValidateVideo(const virDomainVideoDef *video) case VIR_DOMAIN_VIDEO_TYPE_VMVGA: case VIR_DOMAIN_VIDEO_TYPE_QXL: case VIR_DOMAIN_VIDEO_TYPE_VIRTIO: + case VIR_DOMAIN_VIDEO_TYPE_VHOST_USER: case VIR_DOMAIN_VIDEO_TYPE_LAST: break; } diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 6ea80616af..df98d7384f 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -786,6 +786,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, case VIR_DOMAIN_DEVICE_VIDEO: switch ((virDomainVideoType)dev->data.video->type) { case VIR_DOMAIN_VIDEO_TYPE_VIRTIO: + case VIR_DOMAIN_VIDEO_TYPE_VHOST_USER: return virtioFlags; case VIR_DOMAIN_VIDEO_TYPE_VGA: diff --git a/tests/domaincapsschemadata/full.xml b/tests/domaincapsschemadata/full.xml index d3faf38da0..2038fad272 100644 --- a/tests/domaincapsschemadata/full.xml +++ b/tests/domaincapsschemadata/full.xml @@ -73,6 +73,7 @@ <value>parallels</value> <value>virtio</value> <value>gop</value> + <value>vhost-user</value> </enum> </video> <hostdev supported='yes'> -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> If vhost-user-gpu is supported, vhost-user video type is. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index a6c00308a2..b517477709 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -5036,6 +5036,8 @@ virQEMUCapsFillDomainDeviceVideoCaps(virQEMUCapsPtr qemuCaps, VIR_DOMAIN_CAPS_ENUM_SET(dev->modelType, VIR_DOMAIN_VIDEO_TYPE_QXL); if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_GPU)) VIR_DOMAIN_CAPS_ENUM_SET(dev->modelType, VIR_DOMAIN_VIDEO_TYPE_VIRTIO); + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_GPU)) + VIR_DOMAIN_CAPS_ENUM_SET(dev->modelType, VIR_DOMAIN_VIDEO_TYPE_VHOST_USER); return 0; } -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> To support VGA, we need vhost-user-vga device, for "vhost-user" model. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_domain.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index b8b5919795..20da58d978 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -10488,6 +10488,10 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_VGA)) return false; + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_VGA)) + return false; + return true; } -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 20da58d978..59c63fa2a4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -4492,7 +4492,8 @@ qemuDomainDeviceDefValidateVideo(const virDomainVideoDef *video) if (!video->primary && video->type != VIR_DOMAIN_VIDEO_TYPE_QXL && - video->type != VIR_DOMAIN_VIDEO_TYPE_VIRTIO) { + video->type != VIR_DOMAIN_VIDEO_TYPE_VIRTIO && + video->type != VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("video type '%s' is only valid as primary " "video device"), -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Check qemu capability, and accept 3d acceleration. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_process.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c903a8e5c8..ee4c3445fb 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4973,6 +4973,8 @@ qemuProcessStartValidateVideo(virDomainObjPtr vm, !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_QXL)) || (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO && !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_GPU)) || + (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_GPU)) || (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO && video->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW && !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_GPU_CCW))) { @@ -4983,7 +4985,9 @@ qemuProcessStartValidateVideo(virDomainObjPtr vm, } if (video->accel) { - if (video->accel->accel3d == VIR_TRISTATE_SWITCH_ON && + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { + continue; + } else if (video->accel->accel3d == VIR_TRISTATE_SWITCH_ON && (video->type != VIR_DOMAIN_VIDEO_TYPE_VIRTIO || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_GPU_VIRGL))) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> See function documentation, used in following patch. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_security.c | 48 ++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_security.h | 6 +++++ 2 files changed, 54 insertions(+) diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index af3be42854..c722d19ca4 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -426,6 +426,54 @@ qemuSecurityRestoreChardevLabel(virQEMUDriverPtr driver, } +/* + * qemuSecurityStartVhostUserGPU: + * + * @driver: the QEMU driver + * @def: the domain definition + * @cmd: the command to run + * @existstatus: pointer to int returning exit status of process + * @cmdret: pointer to int returning result of virCommandRun + * + * Start the vhost-user-gpu process with approriate labels. + * This function returns -1 on security setup error, 0 if all the + * setup was done properly. In case the virCommand failed to run + * 0 is returned but cmdret is set appropriately with the process + * exitstatus also set. + */ +int +qemuSecurityStartVhostUserGPU(virQEMUDriverPtr driver, + virDomainDefPtr def, + virCommandPtr cmd, + int *exitstatus, + int *cmdret) +{ + int ret = -1; + + if (virSecurityManagerSetChildProcessLabel(driver->securityManager, + def, cmd) < 0) + goto cleanup; + + if (virSecurityManagerPreFork(driver->securityManager) < 0) + goto cleanup; + + ret = 0; + + *cmdret = virCommandRun(cmd, exitstatus); + + virSecurityManagerPostFork(driver->securityManager); + + if (*cmdret < 0) + goto cleanup; + + return 0; + +cleanup: + + return ret; +} + + /* * qemuSecurityStartTPMEmulator: * diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h index a189b63828..75131120b9 100644 --- a/src/qemu/qemu_security.h +++ b/src/qemu/qemu_security.h @@ -84,6 +84,12 @@ int qemuSecurityRestoreChardevLabel(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainChrDefPtr chr); +int qemuSecurityStartVhostUserGPU(virQEMUDriverPtr driver, + virDomainDefPtr def, + virCommandPtr cmd, + int *exitstatus, + int *cmdret); + int qemuSecurityStartTPMEmulator(virQEMUDriverPtr driver, virDomainDefPtr def, virCommandPtr cmd, -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Similar to the qemu_tpm.c, add a unit with a few functions to start/stop and setup the cgroup of the external vhost-user-gpu process. See function documentation. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/conf/device_conf.h | 1 + src/qemu/Makefile.inc.am | 2 + src/qemu/qemu_vhost_user_gpu.c | 318 +++++++++++++++++++++++++++++++++ src/qemu/qemu_vhost_user_gpu.h | 48 +++++ 4 files changed, 369 insertions(+) create mode 100644 src/qemu/qemu_vhost_user_gpu.c create mode 100644 src/qemu/qemu_vhost_user_gpu.h diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index a31ce9c376..544fd33a33 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -175,6 +175,7 @@ struct _virDomainDeviceInfo { * cases we might want to prevent that from happening by * locking the isolation group */ bool isolationGroupLocked; + int vhost_user_fd; }; int virDomainDeviceInfoCopy(virDomainDeviceInfoPtr dst, diff --git a/src/qemu/Makefile.inc.am b/src/qemu/Makefile.inc.am index 2afa67f195..28daf9d426 100644 --- a/src/qemu/Makefile.inc.am +++ b/src/qemu/Makefile.inc.am @@ -56,6 +56,8 @@ QEMU_DRIVER_SOURCES = \ qemu/qemu_qapi.h \ qemu/qemu_tpm.c \ qemu/qemu_tpm.h \ + qemu/qemu_vhost_user_gpu.c \ + qemu/qemu_vhost_user_gpu.h \ $(NULL) diff --git a/src/qemu/qemu_vhost_user_gpu.c b/src/qemu/qemu_vhost_user_gpu.c new file mode 100644 index 0000000000..9007614020 --- /dev/null +++ b/src/qemu/qemu_vhost_user_gpu.c @@ -0,0 +1,318 @@ +/* + * qemu_vhost_user_gpu.c: QEMU vhost-user GPU support + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Author: Marc-André Lureau <marcandre.lureau@redhat.com> + */ + +#include <config.h> + +#include "qemu_extdevice.h" +#include "qemu_domain.h" +#include "qemu_security.h" + +#include "conf/domain_conf.h" +#include "vircommand.h" +#include "viralloc.h" +#include "virlog.h" +#include "virutil.h" +#include "virfile.h" +#include "virstring.h" +#include "virtime.h" +#include "virpidfile.h" +#include "qemu_vhost_user_gpu.h" + +#define VIR_FROM_THIS VIR_FROM_NONE + +VIR_LOG_INIT("qemu.vhost-user-gpu") + +/* + * Look up the vhost-user-gpu executable; to be found on the host + */ +static char *vhost_user_gpu_path; + +static int +qemuVhostUserGPUInit(void) +{ + if (!vhost_user_gpu_path) { + vhost_user_gpu_path = virFindFileInPath("vhost-user-gpu"); + if (!vhost_user_gpu_path) { + virReportSystemError(ENOENT, "%s", + _("Unable to find 'vhost-user-gpu' binary in $PATH")); + return -1; + } + if (!virFileIsExecutable(vhost_user_gpu_path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("vhost-user-gpu %s is not an executable"), + vhost_user_gpu_path); + VIR_FREE(vhost_user_gpu_path); + return -1; + } + } + + return 0; +} + + +static char * +qemuVhostUserGPUCreatePidFilename(const char *stateDir, + const char *shortName, + const char *alias) +{ + char *pidfile = NULL; + char *devicename = NULL; + + if (virAsprintf(&devicename, "%s-%s-vhost-user-gpu", shortName, alias) < 0) + return NULL; + + pidfile = virPidFileBuildPath(stateDir, devicename); + + VIR_FREE(devicename); + + return pidfile; +} + + +/* + * qemuVhostUserGPUGetPid + * + * @stateDir: the directory where vhost-user-gpu writes the pidfile into + * @shortName: short name of the domain + * @alias: video device alias + * @pid: pointer to pid + * + * Return -errno upon error, or zero on successful reading of the pidfile. + * If the PID was not still alive, zero will be returned, and @pid will be + * set to -1; + */ +static int +qemuVhostUserGPUGetPid(const char *stateDir, + const char *shortName, + const char *alias, + pid_t *pid) +{ + int ret; + char *pidfile = qemuVhostUserGPUCreatePidFilename(stateDir, shortName, alias); + if (!pidfile) + return -ENOMEM; + + ret = virPidFileReadPathIfAlive(pidfile, pid, vhost_user_gpu_path); + + VIR_FREE(pidfile); + + return ret; +} + + +/* + * qemuExtVhostUserGPUStart: + * + * @driver: QEMU driver + * @def: domain definition + * @video: the video device + * @logCtxt: log context + * + * Start the external vhost-user-gpu process: + * - open a socketpair for vhost-user communication + * - have the command line built + * - start the external process and sync with it before QEMU start + */ +int qemuExtVhostUserGPUStart(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video, + qemuDomainLogContextPtr logCtxt) +{ + int ret = -1; + virCommandPtr cmd = NULL; + int exitstatus = 0; + char *errbuf = NULL; + virQEMUDriverConfigPtr cfg; + char *pidfile, *shortName = virDomainDefGetShortName(def); + virTimeBackOffVar timebackoff; + const unsigned long long timeout = 500000; /* ms */ + int cmdret = 0, rc; + int pair[2] = { -1, -1 }; + + pid_t pid; + + if (!shortName) + return -1; + + cfg = virQEMUDriverGetConfig(driver); + + /* stop any left-over for this VM */ + qemuExtVhostUserGPUStop(driver, def, video); + + if (!(pidfile = qemuVhostUserGPUCreatePidFilename( + cfg->stateDir, shortName, video->info.alias))) + goto error; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) { + virReportSystemError(errno, "%s", _("failed to create socket")); + goto error; + } + + cmd = virCommandNew(vhost_user_gpu_path); + if (!cmd) + goto error; + + virCommandClearCaps(cmd); + virCommandDaemonize(cmd); + + if (qemuExtDeviceLogCommand(logCtxt, cmd, "vhost-user-gpu") < 0) + goto error; + + virCommandAddArgList(cmd, "--pid", pidfile, NULL); + virCommandAddArgFormat(cmd, "--fd=%d", pair[0]); + virCommandPassFD(cmd, pair[0], VIR_COMMAND_PASS_FD_CLOSE_PARENT); + pair[0] = -1; + + if (video->accel && video->accel->accel3d) { + virCommandAddArg(cmd, "--virgl"); + } + if (qemuSecurityStartVhostUserGPU(driver, def, cmd, + &exitstatus, &cmdret) < 0) + goto error; + + if (cmdret < 0 || exitstatus != 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not start 'vhost-user-gpu'. exitstatus: %d, " + "error: %s"), exitstatus, errbuf); + goto cleanup; + } + + /* check that the helper has written its pid into the file */ + if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0) + goto error; + while (virTimeBackOffWait(&timebackoff)) { + rc = qemuVhostUserGPUGetPid(cfg->stateDir, shortName, video->info.alias, &pid); + if (rc < 0) + continue; + if (rc == 0 && pid == (pid_t)-1) + goto error; + break; + } + + ret = 0; + video->info.vhost_user_fd = pair[1]; + pair[1] = -1; + +cleanup: + VIR_FORCE_CLOSE(pair[0]); + VIR_FORCE_CLOSE(pair[1]); + virObjectUnref(cfg); + VIR_FREE(pidfile); + virCommandFree(cmd); + + return ret; + +error: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("vhost-user-gpu failed to start")); + goto cleanup; +} + + +/* + * qemuExtVhostUserGPUStop: + * + * @driver: QEMU driver + * @def: domain definition + * @video: the video device + * + * Check if vhost-user process pidfile is around, kill the process, + * and remove the pidfile. + */ +void qemuExtVhostUserGPUStop(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video) +{ + virErrorPtr orig_err; + char *pidfile, *shortName = virDomainDefGetShortName(def); + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + + if (qemuVhostUserGPUInit() < 0) + return; + + if (!(pidfile = qemuVhostUserGPUCreatePidFilename( + cfg->stateDir, shortName, video->info.alias))) { + VIR_WARN("Unable to construct vhost-user-gpu pidfile path"); + return; + } + + virErrorPreserveLast(&orig_err); + if (virPidFileForceCleanupPath(pidfile) < 0) { + VIR_WARN("Unable to kill vhost-user-gpu process"); + } else { + if (unlink(pidfile) < 0 && + errno != ENOENT) { + virReportSystemError(errno, + _("Unable to remove stale pidfile %s"), + pidfile); + } + } + virErrorRestore(&orig_err); + + VIR_FREE(pidfile); +} + + +/* + * qemuExtVhostUserGPUSetupCgroup: + * + * @driver: QEMU driver + * @def: domain definition + * @video: the video device + * @cgroupe: a cgroup + * + * Add the vhost-user-gpu PID to the given cgroup. + */ +int +qemuExtVhostUserGPUSetupCgroup(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video, + virCgroupPtr cgroup) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *pidfile = NULL; + char *shortName = NULL; + int ret = -1, rc; + pid_t pid; + + shortName = virDomainDefGetShortName(def); + if (!shortName) + goto cleanup; + + rc = qemuVhostUserGPUGetPid(cfg->stateDir, shortName, video->info.alias, &pid); + if (rc < 0 || (rc == 0 && pid == (pid_t)-1)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get process id of vhost-user-gpu")); + goto cleanup; + } + if (virCgroupAddTask(cgroup, pid) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(pidfile); + VIR_FREE(shortName); + virObjectUnref(cfg); + + return ret; +} diff --git a/src/qemu/qemu_vhost_user_gpu.h b/src/qemu/qemu_vhost_user_gpu.h new file mode 100644 index 0000000000..0f5331cce8 --- /dev/null +++ b/src/qemu/qemu_vhost_user_gpu.h @@ -0,0 +1,48 @@ +/* + * qemu_vhost_user_gpu.h: QEMU vhost-user GPU support + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Author: Marc-André Lureau <marcandre.lureau@redhat.com> + */ +#ifndef __QEMU_VHOST_USER_GPU_H__ +# define __QEMU_VHOST_USER_GPU_H__ + +# include "qemu_conf.h" +# include "vircommand.h" + +int qemuExtVhostUserGPUStart(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video, + qemuDomainLogContextPtr logCtxt) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_RETURN_CHECK; + +void qemuExtVhostUserGPUStop(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + +int +qemuExtVhostUserGPUSetupCgroup(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainVideoDefPtr video, + virCgroupPtr cgroup) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_RETURN_CHECK; + +#endif /* __QEMU_VHOST_USER_GPU_H__ */ -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> vhost-user doesn't have a virgl option, it's given to the vhost-user-gpu helper process instead. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index e78a534628..e85c5c3c1e 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4330,9 +4330,11 @@ qemuBuildDeviceVideoStr(const virDomainDef *def, virBufferAsprintf(&buf, ",id=%s", video->info.alias); - if (video->accel && video->accel->accel3d == VIR_TRISTATE_SWITCH_ON) { - virBufferAsprintf(&buf, ",virgl=%s", - virTristateSwitchTypeToString(video->accel->accel3d)); + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO) { + if (video->accel && video->accel->accel3d == VIR_TRISTATE_SWITCH_ON) { + virBufferAsprintf(&buf, ",virgl=%s", + virTristateSwitchTypeToString(video->accel->accel3d)); + } } if (video->type == VIR_DOMAIN_VIDEO_TYPE_QXL) { -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> It's a virtio device, like virtio-gpu. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_domain_address.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index df98d7384f..476ee2d4a5 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -329,7 +329,8 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def, virDomainVideoDefPtr video = def->videos[i]; if (video->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && - video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO) + (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO || + video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER)) video->info.type = type; } -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Each vhost-user-gpu needs its own helper gpu process. Start/stop them, and apply the emulator cgroup controller. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_extdevice.c | 47 ++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index d982922470..8ec703a9b0 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -25,6 +25,7 @@ #include "qemu_extdevice.h" #include "qemu_domain.h" #include "qemu_tpm.h" +#include "qemu_vhost_user_gpu.h" #include "viralloc.h" #include "virlog.h" @@ -132,11 +133,21 @@ qemuExtDevicesStart(virQEMUDriverPtr driver, virDomainDefPtr def, qemuDomainLogContextPtr logCtxt) { - int ret = 0; + int i, ret = 0; if (qemuExtDevicesInitPaths(driver, def) < 0) return -1; + for (i = 0; i < def->nvideos; i++) { + virDomainVideoDefPtr video = def->videos[i]; + + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { + ret = qemuExtVhostUserGPUStart(driver, def, video, logCtxt); + if (ret < 0) + return ret; + } + } + if (def->tpm) ret = qemuExtTPMStart(driver, def, logCtxt); @@ -148,9 +159,19 @@ void qemuExtDevicesStop(virQEMUDriverPtr driver, virDomainDefPtr def) { + int i; + if (qemuExtDevicesInitPaths(driver, def) < 0) return; + for (i = 0; i < def->nvideos; i++) { + virDomainVideoDefPtr video = def->videos[i]; + + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { + qemuExtVhostUserGPUStop(driver, def, video); + } + } + if (def->tpm) qemuExtTPMStop(driver, def); } @@ -159,6 +180,13 @@ qemuExtDevicesStop(virQEMUDriverPtr driver, bool qemuExtDevicesHasDevice(virDomainDefPtr def) { + int i; + + for (i = 0; i < def->nvideos; i++) { + if (def->videos[i]->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) + return true; + } + if (def->tpm && def->tpm->type == VIR_DOMAIN_TPM_TYPE_EMULATOR) return true; @@ -171,10 +199,19 @@ qemuExtDevicesSetupCgroup(virQEMUDriverPtr driver, virDomainDefPtr def, virCgroupPtr cgroup) { - int ret = 0; + int i; - if (def->tpm) - ret = qemuExtTPMSetupCgroup(driver, def, cgroup); + for (i = 0; i < def->nvideos; i++) { + virDomainVideoDefPtr video = def->videos[i]; - return ret; + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER && + qemuExtVhostUserGPUSetupCgroup(driver, def, video, cgroup) < 0) + return -1; + } + + if (def->tpm && + qemuExtTPMSetupCgroup(driver, def, cgroup) < 0) + return -1; + + return 0; } -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Pass the vhost-user socket to a chardev, and associate a vhost-user-backend with it for each vhost-user-gpu. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 50 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index e85c5c3c1e..b3a2bba28e 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4475,6 +4475,38 @@ qemuBuildVgaVideoCommand(virCommandPtr cmd, } +static char * +qemuBuildVhostUserBackendStr(virDomainVideoDefPtr video, + virCommandPtr cmd, + char **chardev) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (virAsprintf(chardev, "socket,id=chr-vu-%s,fd=%d", + video->info.alias, video->info.vhost_user_fd) < 0) + goto error; + + virCommandPassFD(cmd, video->info.vhost_user_fd, + VIR_COMMAND_PASS_FD_CLOSE_PARENT); + + video->info.vhost_user_fd = -1; + + virBufferAsprintf(&buf, "vhost-user-backend,id=vu-%s,chardev=chr-vu-%s", + video->info.alias, video->info.alias); + + if (virBufferCheckError(&buf) < 0) + goto error; + + return virBufferContentAndReset(&buf); + +error: + VIR_FREE(*chardev); + virBufferFreeAndReset(&buf); + return NULL; + +} + + static int qemuBuildVideoCommandLine(virCommandPtr cmd, const virDomainDef *def, @@ -4482,6 +4514,24 @@ qemuBuildVideoCommandLine(virCommandPtr cmd, { size_t i; + for (i = 0; i < def->nvideos; i++) { + char *optstr; + char *chardev = NULL; + virDomainVideoDefPtr video = def->videos[i]; + + if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { + + if (!(optstr = qemuBuildVhostUserBackendStr(video, cmd, &chardev))) + return -1; + + virCommandAddArgList(cmd, "-chardev", chardev, NULL); + virCommandAddArgList(cmd, "-object", optstr, NULL); + + VIR_FREE(optstr); + VIR_FREE(chardev); + } + } + for (i = 0; i < def->nvideos; i++) { char *str = NULL; virDomainVideoDefPtr video = def->videos[i]; -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Change the model name to "vhost-user-gpu-pci" if running on PCI. Set the "max_outputs" property. Associate the device with the "vhost-user" backend. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b3a2bba28e..b4afcd2e8c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4319,7 +4319,7 @@ qemuBuildDeviceVideoStr(const virDomainDef *def, goto error; } - if (STREQ(model, "virtio-gpu")) { + if (STREQ(model, "virtio-gpu") || STREQ(model, "vhost-user-gpu")) { if (video->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) virBufferAsprintf(&buf, "%s-ccw", model); else @@ -4367,6 +4367,10 @@ qemuBuildDeviceVideoStr(const virDomainDef *def, if (video->heads) virBufferAsprintf(&buf, ",max_outputs=%u", video->heads); } + } else if (video->type == VIR_DOMAIN_VIDEO_TYPE_VHOST_USER) { + if (video->heads) + virBufferAsprintf(&buf, ",max_outputs=%u", video->heads); + virBufferAsprintf(&buf, ",vhost-user=vu-%s", video->info.alias); } else if (video->vram && ((video->type == VIR_DOMAIN_VIDEO_TYPE_VGA && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VGA_VGAMEM)) || -- 2.18.0.129.ge3331758f1

From: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- .../vhost-user-gpu-secondary.args | 34 +++++++++++++++++ .../vhost-user-gpu-secondary.xml | 38 +++++++++++++++++++ tests/qemuxml2argvdata/vhost-user-vga.args | 31 +++++++++++++++ tests/qemuxml2argvdata/vhost-user-vga.xml | 35 +++++++++++++++++ tests/qemuxml2argvtest.c | 12 ++++++ 5 files changed, 150 insertions(+) create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.args create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.args create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.xml diff --git a/tests/qemuxml2argvdata/vhost-user-gpu-secondary.args b/tests/qemuxml2argvdata/vhost-user-gpu-secondary.args new file mode 100644 index 0000000000..0aee4f50ec --- /dev/null +++ b/tests/qemuxml2argvdata/vhost-user-gpu-secondary.args @@ -0,0 +1,34 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-object memory-backend-memfd,id=ram-node,size=224395264 \ +-numa node,nodeid=0,memdev=ram-node \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-chardev socket,id=chr-vu-video0,fd=0 \ +-object vhost-user-backend,id=vu-video0,chardev=chr-vu-video0 \ +-chardev socket,id=chr-vu-video1,fd=0 \ +-object vhost-user-backend,id=vu-video1,chardev=chr-vu-video1 \ +-device vhost-user-vga,id=video0,max_outputs=1,vhost-user=vu-video0,bus=pci.0,addr=0x2 \ +-device vhost-user-gpu-pci,id=video1,max_outputs=1,vhost-user=vu-video1,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml b/tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml new file mode 100644 index 0000000000..f91be819f6 --- /dev/null +++ b/tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <memoryBacking> + <access mode='shared'/> + </memoryBacking> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <video> + <model type='vhost-user' heads='1'/> + </video> + <video> + <model type='vhost-user' heads='1'/> + </video> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/vhost-user-vga.args b/tests/qemuxml2argvdata/vhost-user-vga.args new file mode 100644 index 0000000000..fbaea7e371 --- /dev/null +++ b/tests/qemuxml2argvdata/vhost-user-vga.args @@ -0,0 +1,31 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-object memory-backend-memfd,id=ram-node,size=224395264 \ +-numa node,nodeid=0,memdev=ram-node \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot c \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ +-chardev socket,id=chr-vu-video0,fd=0 \ +-object vhost-user-backend,id=vu-video0,chardev=chr-vu-video0 \ +-device vhost-user-vga,id=video0,max_outputs=1,vhost-user=vu-video0,bus=pci.0,addr=0x2 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/vhost-user-vga.xml b/tests/qemuxml2argvdata/vhost-user-vga.xml new file mode 100644 index 0000000000..db38b5a45e --- /dev/null +++ b/tests/qemuxml2argvdata/vhost-user-vga.xml @@ -0,0 +1,35 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <memoryBacking> + <access mode='shared'/> + </memoryBacking> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + <video> + <model type='vhost-user' heads='1'/> + </video> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 47bd2ab867..d78caa6a37 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -2921,6 +2921,18 @@ mymain(void) QEMU_CAPS_KVM, QEMU_CAPS_SEV_GUEST); + DO_TEST("vhost-user-vga", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_DEVICE_VHOST_USER_GPU, + QEMU_CAPS_DEVICE_VHOST_USER_VGA); + + DO_TEST("vhost-user-gpu-secondary", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_DEVICE_VIDEO_PRIMARY, + QEMU_CAPS_DEVICE_VHOST_USER_GPU, + QEMU_CAPS_DEVICE_VHOST_USER_VGA); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakerootdir); -- 2.18.0.129.ge3331758f1

ping On Fri, Jul 13, 2018 at 3:28 PM <marcandre.lureau@redhat.com> wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This series of patches add support for running a virtio GPU in a seperate process, using vhost-user.
The QEMU series "[PATCH v4 00/29] vhost-user for input & GPU" is still under review, and will hopefully land in 3.1. There are several benefits of running the GPU process in an external process, since Mesa is rather heavy on the qemu main loop, and may block for a while or crash. I observe x5 performance improvements with Unigine Heaven 4 benchmark.
The external GPU process is started with one end of the vhost-user socket pair, the other end is given to a QEMU chardev. It is also added to the emulator cgroup to restrict its CPU usage.
vhost-user requires shared VM memory. A few preliminary patches ease and improve shared memory setup, when no explicit domain NUMA configuration is given. Also, if there is no need for file-backed memory, teach libvirt to make use of memfd memory backend, which has better security guarantees and is easier to setup.
Review welcome!
Marc-André Lureau (17): qemu: setup shared memory without explicit numa configuration qemu: add memory-backend-memfd capability check qemu: use memory-backend-memfd if possible qemu: add vhost-user-gpu capabilities domain: add "vhost-user" video type qemu: fill the vhost-user video type capability qemu: check that qemu is vhost-user-vga capable qemu: vhost-user is valid as non-primary video device qemu: validate vhost-user video model qemu: add qemuSecurityStartVhostUserGPU helper qemu: add vhost-user-gpu helper unit qemu: restrict 'virgl=' option to 'virtio' video type qemu: set default address type on vhost-user video model qemu: start/stop the vhost-user-gpu external device qemu: build vhost-user-backend for vhost-user-gpu qemu: build vhost-user-gpu video device arguments tests: add vhost-user-gpu xml2argv tests
docs/formatdomain.html.in | 5 +- docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.h | 1 + src/conf/domain_conf.c | 4 +- src/conf/domain_conf.h | 1 + src/qemu/Makefile.inc.am | 2 + src/qemu/qemu_capabilities.c | 10 + src/qemu/qemu_capabilities.h | 5 + src/qemu/qemu_command.c | 191 ++++++++--- src/qemu/qemu_domain.c | 8 +- src/qemu/qemu_domain_address.c | 4 +- src/qemu/qemu_extdevice.c | 47 ++- src/qemu/qemu_process.c | 6 +- src/qemu/qemu_security.c | 48 +++ src/qemu/qemu_security.h | 6 + src/qemu/qemu_vhost_user_gpu.c | 318 ++++++++++++++++++ src/qemu/qemu_vhost_user_gpu.h | 48 +++ tests/domaincapsschemadata/full.xml | 1 + .../caps_2.12.0.aarch64.xml | 1 + .../caps_2.12.0.ppc64.xml | 1 + .../caps_2.12.0.s390x.xml | 1 + .../caps_2.12.0.x86_64.xml | 1 + .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + .../caps_3.0.0.x86_64.xml | 1 + .../fd-memory-no-numa-topology.args | 4 + tests/qemuxml2argvdata/memfd.args | 28 ++ tests/qemuxml2argvdata/memfd.xml | 32 ++ .../vhost-user-gpu-secondary.args | 34 ++ .../vhost-user-gpu-secondary.xml | 38 +++ tests/qemuxml2argvdata/vhost-user-vga.args | 31 ++ tests/qemuxml2argvdata/vhost-user-vga.xml | 35 ++ tests/qemuxml2argvtest.c | 14 + 32 files changed, 877 insertions(+), 51 deletions(-) create mode 100644 src/qemu/qemu_vhost_user_gpu.c create mode 100644 src/qemu/qemu_vhost_user_gpu.h create mode 100644 tests/qemuxml2argvdata/memfd.args create mode 100644 tests/qemuxml2argvdata/memfd.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.args create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.args create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.xml
-- 2.18.0.129.ge3331758f1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Marc-André Lureau

On 08/14/2018 07:25 PM, Marc-André Lureau wrote:
ping On Fri, Jul 13, 2018 at 3:28 PM <marcandre.lureau@redhat.com> wrote:
Quite a bit has changed w/r/t qemu_capabilities.{c,h}. Can you please resync with the top of the tree and repost. Hopefully Pavel or Michal will be able to take a look at the repost as well since they have more recent NUMA and Hugepage experiences and that's where this series starts... Also I understand why things are combined, but the first 3 patches don't seem to be related to vhost-user-gpu, so they should be a separate series. Although I do understand why they're included due to qemu_capabilities conflicts for one series or the other. Since, all I was able to "easily" git am was patch 1, I'll provide a couple of comments there... John
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi,
This series of patches add support for running a virtio GPU in a seperate process, using vhost-user.
The QEMU series "[PATCH v4 00/29] vhost-user for input & GPU" is still under review, and will hopefully land in 3.1. There are several benefits of running the GPU process in an external process, since Mesa is rather heavy on the qemu main loop, and may block for a while or crash. I observe x5 performance improvements with Unigine Heaven 4 benchmark.
The external GPU process is started with one end of the vhost-user socket pair, the other end is given to a QEMU chardev. It is also added to the emulator cgroup to restrict its CPU usage.
vhost-user requires shared VM memory. A few preliminary patches ease and improve shared memory setup, when no explicit domain NUMA configuration is given. Also, if there is no need for file-backed memory, teach libvirt to make use of memfd memory backend, which has better security guarantees and is easier to setup.
Review welcome!
Marc-André Lureau (17): qemu: setup shared memory without explicit numa configuration qemu: add memory-backend-memfd capability check qemu: use memory-backend-memfd if possible qemu: add vhost-user-gpu capabilities domain: add "vhost-user" video type qemu: fill the vhost-user video type capability qemu: check that qemu is vhost-user-vga capable qemu: vhost-user is valid as non-primary video device qemu: validate vhost-user video model qemu: add qemuSecurityStartVhostUserGPU helper qemu: add vhost-user-gpu helper unit qemu: restrict 'virgl=' option to 'virtio' video type qemu: set default address type on vhost-user video model qemu: start/stop the vhost-user-gpu external device qemu: build vhost-user-backend for vhost-user-gpu qemu: build vhost-user-gpu video device arguments tests: add vhost-user-gpu xml2argv tests
docs/formatdomain.html.in | 5 +- docs/schemas/domaincommon.rng | 1 + src/conf/device_conf.h | 1 + src/conf/domain_conf.c | 4 +- src/conf/domain_conf.h | 1 + src/qemu/Makefile.inc.am | 2 + src/qemu/qemu_capabilities.c | 10 + src/qemu/qemu_capabilities.h | 5 + src/qemu/qemu_command.c | 191 ++++++++--- src/qemu/qemu_domain.c | 8 +- src/qemu/qemu_domain_address.c | 4 +- src/qemu/qemu_extdevice.c | 47 ++- src/qemu/qemu_process.c | 6 +- src/qemu/qemu_security.c | 48 +++ src/qemu/qemu_security.h | 6 + src/qemu/qemu_vhost_user_gpu.c | 318 ++++++++++++++++++ src/qemu/qemu_vhost_user_gpu.h | 48 +++ tests/domaincapsschemadata/full.xml | 1 + .../caps_2.12.0.aarch64.xml | 1 + .../caps_2.12.0.ppc64.xml | 1 + .../caps_2.12.0.s390x.xml | 1 + .../caps_2.12.0.x86_64.xml | 1 + .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + .../caps_3.0.0.x86_64.xml | 1 + .../fd-memory-no-numa-topology.args | 4 + tests/qemuxml2argvdata/memfd.args | 28 ++ tests/qemuxml2argvdata/memfd.xml | 32 ++ .../vhost-user-gpu-secondary.args | 34 ++ .../vhost-user-gpu-secondary.xml | 38 +++ tests/qemuxml2argvdata/vhost-user-vga.args | 31 ++ tests/qemuxml2argvdata/vhost-user-vga.xml | 35 ++ tests/qemuxml2argvtest.c | 14 + 32 files changed, 877 insertions(+), 51 deletions(-) create mode 100644 src/qemu/qemu_vhost_user_gpu.c create mode 100644 src/qemu/qemu_vhost_user_gpu.h create mode 100644 tests/qemuxml2argvdata/memfd.args create mode 100644 tests/qemuxml2argvdata/memfd.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.args create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.args create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.xml
-- 2.18.0.129.ge3331758f1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (5)
-
Daniel P. Berrangé
-
John Ferlan
-
Marc-André Lureau
-
Marc-André Lureau
-
marcandre.lureau@redhat.com