[libvirt] [RFC PATCH auto partition NUMA guest domains v1 0/2] auto partition guests providing the host NUMA topology
by Wim Ten Have
From: Wim ten Have <wim.ten.have(a)oracle.com>
This patch extends guest domain administration, adding support to
automatically advertise the host NUMA architecture, obtained from the
host capabilities, under a guest by creating a vNUMA copy.
The mechanism is enabled by setting the check='numa' attribute under
the CPU 'host-passthrough' topology:
<cpu mode='host-passthrough' check='numa' .../>
When enabled, the mechanism automatically renders the NUMA architecture
provided by the host capabilities, evenly balances the guest's reserved
vCPUs and memory amongst the composed vNUMA cells, and pins each cell's
allocated vCPUs to the physical cpusets of the corresponding host NUMA
node. This way the host NUMA topology remains in effect under the
partitioned guest domain.
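In pseudo-C, the balancing rule amounts to a round-robin over the host
nodes (a rough sketch, not the patch code; pinVcpuToCpuset() and
hostNode[] are hypothetical), which matches the <vcpupin> lines in the
rewritten guest XML below:
    /* With nhostnodes host NUMA nodes, guest vCPU v lands in cell
     * v % nhostnodes and is pinned to that host node's physical cpuset. */
    size_t v;
    for (v = 0; v < nvcpus; v++) {
        size_t cell = v % nhostnodes;
        pinVcpuToCpuset(v, hostNode[cell].cpuset); /* e.g. vcpu 8 -> "0-14,120-134" */
    }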
The example below auto-partitions the physical NUMA detail listed by the
host's 'lscpu' into a guest domain vNUMA description.
[root@host ]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 240
On-line CPU(s) list: 0-239
Thread(s) per core: 2
Core(s) per socket: 15
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
Stepping: 7
CPU MHz: 3449.555
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5586.28
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 38400K
NUMA node0 CPU(s): 0-14,120-134
NUMA node1 CPU(s): 15-29,135-149
NUMA node2 CPU(s): 30-44,150-164
NUMA node3 CPU(s): 45-59,165-179
NUMA node4 CPU(s): 60-74,180-194
NUMA node5 CPU(s): 75-89,195-209
NUMA node6 CPU(s): 90-104,210-224
NUMA node7 CPU(s): 105-119,225-239
Flags: ...
Without the auto partition rendering enabled, the guest 'anuma'
reads: "<cpu mode='host-passthrough' check='none'/>"
<domain type='kvm'>
<name>anuma</name>
<uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
<memory unit='KiB'>67108864</memory>
<currentMemory unit='KiB'>67108864</currentMemory>
<vcpu placement='static'>16</vcpu>
<os>
<type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmport state='off'/>
</features>
<cpu mode='host-passthrough' check='none'/>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/anuma.qcow2'/>
With auto partitioning enabled, the guest 'anuma' XML is rewritten
as listed below: "<cpu mode='host-passthrough' check='numa'>"
<domain type='kvm'>
<name>anuma</name>
<uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
<memory unit='KiB'>67108864</memory>
<currentMemory unit='KiB'>67108864</currentMemory>
<vcpu placement='static'>16</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='0-14,120-134'/>
<vcpupin vcpu='1' cpuset='15-29,135-149'/>
<vcpupin vcpu='2' cpuset='30-44,150-164'/>
<vcpupin vcpu='3' cpuset='45-59,165-179'/>
<vcpupin vcpu='4' cpuset='60-74,180-194'/>
<vcpupin vcpu='5' cpuset='75-89,195-209'/>
<vcpupin vcpu='6' cpuset='90-104,210-224'/>
<vcpupin vcpu='7' cpuset='105-119,225-239'/>
<vcpupin vcpu='8' cpuset='0-14,120-134'/>
<vcpupin vcpu='9' cpuset='15-29,135-149'/>
<vcpupin vcpu='10' cpuset='30-44,150-164'/>
<vcpupin vcpu='11' cpuset='45-59,165-179'/>
<vcpupin vcpu='12' cpuset='60-74,180-194'/>
<vcpupin vcpu='13' cpuset='75-89,195-209'/>
<vcpupin vcpu='14' cpuset='90-104,210-224'/>
<vcpupin vcpu='15' cpuset='105-119,225-239'/>
</cputune>
<os>
<type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmport state='off'/>
</features>
<cpu mode='host-passthrough' check='numa'>
<topology sockets='8' cores='1' threads='2'/>
<numa>
<cell id='0' cpus='0,8' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='10'/>
<sibling id='1' value='21'/>
<sibling id='2' value='31'/>
<sibling id='3' value='21'/>
<sibling id='4' value='21'/>
<sibling id='5' value='31'/>
<sibling id='6' value='31'/>
<sibling id='7' value='31'/>
</distances>
</cell>
<cell id='1' cpus='1,9' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='21'/>
<sibling id='1' value='10'/>
<sibling id='2' value='21'/>
<sibling id='3' value='31'/>
<sibling id='4' value='31'/>
<sibling id='5' value='21'/>
<sibling id='6' value='31'/>
<sibling id='7' value='31'/>
</distances>
</cell>
<cell id='2' cpus='2,10' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='21'/>
<sibling id='2' value='10'/>
<sibling id='3' value='21'/>
<sibling id='4' value='31'/>
<sibling id='5' value='31'/>
<sibling id='6' value='21'/>
<sibling id='7' value='31'/>
</distances>
</cell>
<cell id='3' cpus='3,11' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='21'/>
<sibling id='1' value='31'/>
<sibling id='2' value='21'/>
<sibling id='3' value='10'/>
<sibling id='4' value='31'/>
<sibling id='5' value='31'/>
<sibling id='6' value='31'/>
<sibling id='7' value='21'/>
</distances>
</cell>
<cell id='4' cpus='4,12' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='21'/>
<sibling id='1' value='31'/>
<sibling id='2' value='31'/>
<sibling id='3' value='31'/>
<sibling id='4' value='10'/>
<sibling id='5' value='21'/>
<sibling id='6' value='21'/>
<sibling id='7' value='31'/>
</distances>
</cell>
<cell id='5' cpus='5,13' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='21'/>
<sibling id='2' value='31'/>
<sibling id='3' value='31'/>
<sibling id='4' value='21'/>
<sibling id='5' value='10'/>
<sibling id='6' value='31'/>
<sibling id='7' value='21'/>
</distances>
</cell>
<cell id='6' cpus='6,14' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='31'/>
<sibling id='2' value='21'/>
<sibling id='3' value='31'/>
<sibling id='4' value='21'/>
<sibling id='5' value='31'/>
<sibling id='6' value='10'/>
<sibling id='7' value='21'/>
</distances>
</cell>
<cell id='7' cpus='7,15' memory='8388608' unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='31'/>
<sibling id='2' value='31'/>
<sibling id='3' value='21'/>
<sibling id='4' value='31'/>
<sibling id='5' value='21'/>
<sibling id='6' value='21'/>
<sibling id='7' value='10'/>
</distances>
</cell>
</numa>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/anuma.qcow2'/>
Finally, the virtual vNUMA detail listed by 'lscpu' inside the auto-partitioned guest 'anuma':
[root@anuma ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
Stepping: 7
CPU MHz: 2793.268
BogoMIPS: 5586.53
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0,8
NUMA node1 CPU(s): 1,9
NUMA node2 CPU(s): 2,10
NUMA node3 CPU(s): 3,11
NUMA node4 CPU(s): 4,12
NUMA node5 CPU(s): 5,13
NUMA node6 CPU(s): 6,14
NUMA node7 CPU(s): 7,15
Flags: ...
Wim ten Have (2):
domain: auto partition guests providing the host NUMA topology
qemuxml2argv: add tests that exercise vNUMA auto partition topology
docs/formatdomain.html.in | 7 +
docs/schemas/cputypes.rng | 1 +
src/conf/cpu_conf.c | 3 +-
src/conf/cpu_conf.h | 1 +
src/conf/domain_conf.c | 166 ++++++++++++++++++
.../cpu-host-passthrough-nonuma.args | 25 +++
.../cpu-host-passthrough-nonuma.xml | 18 ++
.../cpu-host-passthrough-numa.args | 29 +++
.../cpu-host-passthrough-numa.xml | 18 ++
tests/qemuxml2argvtest.c | 2 +
10 files changed, 269 insertions(+), 1 deletion(-)
create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml
--
2.17.1
[libvirt] [RFC] Faster libvirtd restart with nwfilter rules
by Nikolay Shirokovskiy
Hi, all.
On fat hosts which are capable of running hundreds of VMs, restarting
libvirtd makes its services unavailable for a long time if VMs use
network filters. In my tests each of 100 VMs has no-promisc [1] and
no-mac-spoofing filters, and executing 'virsh list' right after a daemon
restart takes approximately 140s if no firewalld is running (that is,
ebtables/iptables/ip6tables commands are used to configure kernel
tables).
The problem is that the daemon does not even start to read from client
connections because state drivers are not initialized. Initialization is
blocked in the state drivers' autostart, which grabs VM locks. And the
VM locks are held by the VM reconnection code. Each VM reloads network
tables on reconnection, and this reloading is serialized on updateMutex
in the gentech nwfilter driver.
Working around autostart won't help much: even if state drivers
initialize, listing VMs won't be possible because listing takes each
VM's lock one by one too. However, managing a VM that has passed the
reconnection phase will be possible, which takes the same 140s in the
worst case.
Note that this issue only applies if we use filter configurations that
don't need IP learning. With IP learning the situation is different,
because the reconnection code spawns a new thread that applies network
rules only after the IP is learned from traffic, and this thread does
not grab the VM lock. As a result VMs are manageable, but reloading
filters in the background takes approximately those same 140s. I guess
managing network filters during this period can have issues too. Anyway
this situation does not look good, so fixing the described issue by
spawning threads even without IP learning does not look nice to me.
What speedup is possible with a conservative approach? First we can
remove, for test purposes, the firewall ruleLock, the gentech driver
updateMutex and the filter object mutex, which serve no function in the
restart scenario. This gives a 36s restart time. The speedup is achieved
because the heavy fork/preexec steps now run concurrently.
Next we can try to reduce the fork/preexec time. To estimate its
contribution alone, let's bring back the above locks. It turns out most
of the time is taken by fork itself and by closing 8k (on my system)
file descriptors in preexec. Using vfork gives a 2x boost and so does
dropping the mass close. (I checked the mass-close contribution because
I don't quite understand the purpose of this step - libvirt typically
sets the close-on-exec flag on its descriptors.) So these two
optimizations alone can result in a restart time of 30s.
Unfortunately combining the above two approaches does not give a boost
equal to their product. The reason is that, due to concurrency and the
high number of VMs (100), the preexec boost does not play a significant
role, and using vfork diminishes concurrency as it freezes all parent
threads before execve. So dropping locks and closes gives a 33s restart
time, and adding vfork to this gives a 25s restart time.
Another approach is to use the --atomic-file option of ebtables
(iptables/ip6tables unfortunately do not have one). The idea is to save
the table to a file, edit the file, and commit the table to the kernel.
I hoped this could give a performance boost because we don't need to
load/store the kernel network table for a single rule update. In order
to isolate the approaches I also dropped all ip/ip6 updates, which
cannot be done this way. In this approach we cannot drop the ruleLock in
the firewall because no other VM thread may change the tables between
save and commit. This approach gives a restart time of 25s. But it is
broken anyway, as we cannot be sure another application doesn't change
the network table between save and commit, in which case those changes
would be lost.
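For reference, the save/edit/commit cycle being described looks roughly
like this (a sketch using ebtables' atomic options; the file path and
chain name are illustrative):
    # ebtables --atomic-file /tmp/ebt.atomic --atomic-save
    # ebtables --atomic-file /tmp/ebt.atomic -A libvirt-I-vnet0 -j ACCEPT
    # ebtables --atomic-file /tmp/ebt.atomic --atomic-commit
Only the last command touches the kernel table; the edits in between
operate on the file.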
After all, I think we need to move in a different direction. We can add
an API to all the binaries and to firewalld to execute many commands in
one run. We could pass the commands as arguments, or write them into a
file which is then given to the binary. Then libvirt itself could
update, for example, a bridge's network table in a couple of commands.
The exact number depends on the new API: for example, if we add an
option to delete chains recursively and an option not to fail on a
NOENT error, we can change a table in one command (no listing of the
current rules is required).
[1] no-promisc filter
<filter name='no-promisc' chain='root' priority='-750'>
<uuid>6d055022-1192-4a3d-ae1f-576baa5564b6</uuid>
<rule action='return' direction='in' priority='500'>
<mac dstmacaddr='ff:ff:ff:ff:ff:ff'/>
</rule>
<rule action='return' direction='in' priority='500'>
<mac dstmacaddr='$MAC'/>
</rule>
<rule action='return' direction='in' priority='500'>
<mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00'/>
</rule>
<rule action='drop' direction='in' priority='500'>
<mac/>
</rule>
<rule action='return' direction='in' priority='500'>
<mac dstmacaddr='01:00:5e:00:00:00' dstmacmask='ff:ff:ff:80:00:00'/>
</rule>
</filter>
[libvirt] [PATCH v2 1/2] add nodeset='all' for interleave mode
by Peng Hao
Sometimes we want the memory of a VM to be evenly distributed across all
nodes in interleave mode. But different hosts have different node
counts, so we add nodeset='all' for interleave mode.
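With this patch the following numatune configuration becomes valid (a
sketch of the intended usage):
    <numatune>
      <memory mode='interleave' nodeset='all'/>
    </numatune>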
Signed-off-by: Peng Hao <peng.hao2(a)zte.com.cn>
---
src/conf/numa_conf.c | 77 ++++++++++++++++++++++++++++++++++++-------
1 files changed, 64 insertions(+), 13 deletions(-)
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 97a3ca4..a336a62 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -29,6 +29,9 @@
#include "virnuma.h"
#include "virstring.h"
+#if WITH_NUMACTL
+#include <numa.h>
+#endif
/*
* Distance definitions defined Conform ACPI 2.0 SLIT.
* See include/linux/topology.h
@@ -66,6 +69,7 @@ typedef virDomainNumaNode *virDomainNumaNodePtr;
struct _virDomainNuma {
struct {
bool specified;
+ bool allnode;
virBitmapPtr nodeset;
virDomainNumatuneMemMode mode;
virDomainNumatunePlacement placement;
@@ -259,13 +263,21 @@ virDomainNumatuneParseXML(virDomainNumaPtr numa,
tmp = virXMLPropString(node, "nodeset");
if (tmp) {
- if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0)
- goto cleanup;
-
- if (virBitmapIsAllClear(nodeset)) {
+ if (STREQ(tmp, "all") && !virNumaIsAvailable()) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
- _("Invalid value of 'nodeset': %s"), tmp);
+ _("Invalid nodeset=%s when numactl is not supported"), tmp);
goto cleanup;
+ } else if (STREQ(tmp, "all") && virNumaIsAvailable()) {
+ numa->memory.allnode = true;
+ } else {
+ if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0)
+ goto cleanup;
+
+ if (virBitmapIsAllClear(nodeset)) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("Invalid value of 'nodeset': %s"), tmp);
+ goto cleanup;
+ }
}
VIR_FREE(tmp);
@@ -319,10 +331,14 @@ virDomainNumatuneFormatXML(virBufferPtr buf,
virBufferAsprintf(buf, "<memory mode='%s' ", tmp);
if (numatune->memory.placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC) {
- if (!(nodeset = virBitmapFormat(numatune->memory.nodeset)))
- return -1;
- virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset);
- VIR_FREE(nodeset);
+ if (numatune->memory.allnode == true) {
+ virBufferAddLit(buf, "nodeset='all'/>\n");
+ } else {
+ if (!(nodeset = virBitmapFormat(numatune->memory.nodeset)))
+ return -1;
+ virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset);
+ VIR_FREE(nodeset);
+ }
} else if (numatune->memory.placement) {
tmp = virDomainNumatunePlacementTypeToString(numatune->memory.placement);
virBufferAsprintf(buf, "placement='%s'/>\n", tmp);
@@ -489,6 +505,37 @@ virDomainNumatuneMaybeFormatNodeset(virDomainNumaPtr numatune,
return 0;
}
+#if WITH_NUMACTL
+static int
+makeAllnodeBitmap(virDomainNumaPtr numa)
+{
+ size_t i = 0, maxnode = 0;
+ virBitmapPtr bitmap = NULL;
+
+ if ((bitmap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) == NULL)
+ return -1;
+ virBitmapClearAll(bitmap);
+ maxnode = numa_max_node();
+ for (i = 0; i <= maxnode; i++) {
+ if (virBitmapSetBit(bitmap, i) < 0) {
+ virBitmapFree(bitmap);
+ return -1;
+ }
+ }
+
+ virBitmapFree(numa->memory.nodeset);
+ numa->memory.nodeset = bitmap;
+
+ return 0;
+}
+#else
+static int
+makeAllnodeBitmap(virDomainNumaPtr numa)
+{
+ return -1;
+}
+#endif
+
int
virDomainNumatuneSet(virDomainNumaPtr numa,
bool placement_static,
@@ -538,20 +585,28 @@ virDomainNumatuneSet(virDomainNumaPtr numa,
}
if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_DEFAULT) {
- if (numa->memory.nodeset || placement_static)
+ if (numa->memory.nodeset || placement_static || numa->memory.allnode)
placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC;
else
placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO;
}
if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC &&
- !numa->memory.nodeset) {
+ !numa->memory.nodeset && !numa->memory.allnode) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
_("nodeset for NUMA memory tuning must be set "
"if 'placement' is 'static'"));
goto cleanup;
}
+ if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC &&
+ mode == VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE &&
+ numa->memory.allnode && virNumaIsAvailable()) {
+
+ if (makeAllnodeBitmap(numa) < 0)
+ goto cleanup;
+ }
+
/* setting nodeset when placement auto is invalid */
if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO &&
numa->memory.nodeset) {
--
1.8.3.1
[libvirt] [PATCH 00/30] syntax: Remove spaces after casts
by Martin Kletzander
According to previous discussions it looks like this is the preferred
way of casting. One difference from the previous attempt is that this
time I tuned the regexp a bit so that it doesn't match some macros and
assignments, and it also matches structs.
Feel free to require squashing of some small patches together.
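To illustrate, the change is purely mechanical (hypothetical example):
    -    mon = (qemuMonitorPtr) opaque;   /* old: space after the cast */
    +    mon = (qemuMonitorPtr)opaque;    /* new: no space */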
Martin Kletzander (30):
examples/: Remove spaces after casts
access/: Remove spaces after casts
admin/: Remove spaces after casts
conf/: Remove spaces after casts
cpu/: Remove spaces after casts
esx/: Remove spaces after casts
hyperv/: Remove spaces after casts
libxl/: Remove spaces after casts
locking/: Remove spaces after casts
lxc/: Remove spaces after casts
network/: Remove spaces after casts
nwfilter/: Remove spaces after casts
phyp/: Remove spaces after casts
qemu/: Remove spaces after casts
remote/: Remove spaces after casts
rpc/: Remove spaces after casts
security/: Remove spaces after casts
storage/: Remove spaces after casts
test/: Remove spaces after casts
uml/: Remove spaces after casts
util/: Remove spaces after casts
vbox/: Remove spaces after casts
vmx/: Remove spaces after casts
vz/: Remove spaces after casts
xenapi/: Remove spaces after casts
xenconfig/: Remove spaces after casts
tests/: Remove spaces after casts
tools/: Remove spaces after casts
Remove spaces after casts in rest of the files
Prohibit space after cast
cfg.mk | 6 +
docs/hacking.html.in | 9 +
examples/object-events/event-test.c | 34 +-
src/access/viraccessdriverpolkit.c | 2 +-
src/admin/admin_remote.c | 54 +--
src/admin/admin_server_dispatch.c | 8 +-
src/conf/cpu_conf.c | 6 +-
src/conf/device_conf.c | 2 +-
src/conf/domain_audit.c | 4 +-
src/conf/domain_conf.c | 88 ++--
src/conf/interface_conf.c | 4 +-
src/conf/network_conf.c | 4 +-
src/conf/nwfilter_params.c | 4 +-
src/conf/storage_conf.c | 48 +-
src/conf/virchrdev.c | 4 +-
src/conf/virnodedeviceobj.c | 4 +-
src/conf/virsecretobj.c | 2 +-
src/conf/virstorageobj.c | 4 +-
src/cpu/cpu_ppc64.c | 4 +-
src/esx/esx_driver.c | 2 +-
src/hyperv/hyperv_driver.c | 8 +-
src/hyperv/hyperv_wmi.c | 24 +-
src/internal.h | 4 +-
src/libvirt-domain.c | 4 +-
src/libvirt-host.c | 2 +-
src/libvirt-lxc.c | 4 +-
src/libvirt-stream.c | 4 +-
src/libxl/libxl_conf.c | 4 +-
src/locking/lock_driver_sanlock.c | 28 +-
src/lxc/lxc_cgroup.c | 2 +-
src/lxc/lxc_controller.c | 4 +-
src/lxc/lxc_domain.c | 4 +-
src/lxc/lxc_driver.c | 4 +-
src/lxc/lxc_monitor.c | 2 +-
src/lxc/lxc_native.c | 4 +-
src/lxc/lxc_process.c | 4 +-
src/network/bridge_driver.c | 4 +-
src/nwfilter/nwfilter_dhcpsnoop.c | 6 +-
src/phyp/phyp_driver.c | 2 +-
src/qemu/qemu_agent.c | 4 +-
src/qemu/qemu_alias.c | 2 +-
src/qemu/qemu_block.c | 2 +-
src/qemu/qemu_capabilities.c | 20 +-
src/qemu/qemu_command.c | 18 +-
src/qemu/qemu_domain.c | 32 +-
src/qemu/qemu_domain_address.c | 12 +-
src/qemu/qemu_driver.c | 68 +--
src/qemu/qemu_hostdev.c | 2 +-
src/qemu/qemu_hotplug.c | 8 +-
src/qemu/qemu_migration.c | 2 +-
src/qemu/qemu_monitor.c | 2 +-
src/qemu/qemu_monitor_json.c | 4 +-
src/qemu/qemu_parse_command.c | 6 +-
src/qemu/qemu_process.c | 10 +-
src/remote/remote_daemon_dispatch.c | 50 +-
src/remote/remote_daemon_stream.c | 2 +-
src/remote/remote_driver.c | 620 ++++++++++++-------------
src/remote/remote_protocol.x | 18 +-
src/rpc/virnetclientstream.c | 4 +-
src/rpc/virnetserverclient.c | 2 +-
src/rpc/virnetserverprogram.c | 2 +-
src/rpc/virnetsocket.c | 6 +-
src/security/security_apparmor.c | 6 +-
src/security/security_dac.c | 42 +-
src/security/security_selinux.c | 14 +-
src/security/virt-aa-helper.c | 2 +-
src/storage/storage_backend_fs.c | 2 +-
src/storage/storage_backend_gluster.c | 6 +-
src/storage/storage_backend_logical.c | 2 +-
src/storage/storage_backend_vstorage.c | 6 +-
src/storage/storage_driver.c | 6 +-
src/storage/storage_util.c | 44 +-
src/test/test_driver.c | 4 +-
src/uml/uml_driver.c | 6 +-
src/util/iohelper.c | 2 +-
src/util/viralloc.h | 12 +-
src/util/virarptable.c | 2 +-
src/util/viratomic.h | 14 +-
src/util/virbitmap.c | 4 +-
src/util/virbuffer.c | 2 +-
src/util/vircgroup.c | 10 +-
src/util/vircommand.c | 4 +-
src/util/virdnsmasq.c | 2 +-
src/util/virfdstream.c | 6 +-
src/util/virfile.c | 46 +-
src/util/virfirmware.c | 4 +-
src/util/virhostcpu.c | 2 +-
src/util/virhostmem.c | 8 +-
src/util/viridentity.c | 2 +-
src/util/virjson.c | 4 +-
src/util/virlog.c | 8 +-
src/util/virmacaddr.c | 2 +-
src/util/virmacmap.c | 2 +-
src/util/virnetdev.c | 4 +-
src/util/virnetdevtap.c | 2 +-
src/util/virobject.c | 2 +-
src/util/virpidfile.c | 6 +-
src/util/virpolkit.c | 4 +-
src/util/virprocess.c | 26 +-
src/util/virresctrl.c | 4 +-
src/util/virsexpr.c | 8 +-
src/util/virstoragefile.c | 8 +-
src/util/virstring.c | 14 +-
src/util/virsysinfo.c | 2 +-
src/util/virsystemd.c | 4 +-
src/util/virthreadjob.c | 4 +-
src/util/virthreadpool.c | 4 +-
src/util/virtime.c | 4 +-
src/util/virutil.c | 26 +-
src/util/virutil.h | 6 +-
src/util/virxml.c | 30 +-
src/vbox/vbox_common.c | 4 +-
src/vbox/vbox_tmpl.c | 2 +-
src/vmx/vmx.c | 4 +-
src/vz/vz_utils.h | 2 +-
src/xenapi/xenapi_driver.c | 4 +-
src/xenconfig/xen_common.c | 4 +-
src/xenconfig/xen_xl.c | 2 +-
tests/domaincapstest.c | 4 +-
tests/qemuhotplugtest.c | 2 +-
tests/qemumonitorjsontest.c | 42 +-
tests/qemuxml2argvtest.c | 4 +-
tests/testutils.c | 2 +-
tests/testutils.h | 2 +-
tests/testutilshostcpus.h | 88 ++--
tests/virbitmaptest.c | 4 +-
tests/vircaps2xmltest.c | 2 +-
tests/virfiletest.c | 18 +-
tests/virfilewrapper.c | 2 +-
tests/virhashtest.c | 8 +-
tests/virhostcputest.c | 2 +-
tests/virhostdevtest.c | 2 +-
tests/virpcimock.c | 2 +-
tests/virpcitest.c | 2 +-
tests/virresctrltest.c | 2 +-
tests/virschematest.c | 2 +-
tests/virstoragetest.c | 42 +-
tests/virstringtest.c | 12 +-
tests/virusbmock.c | 2 +-
tools/nss/libvirt_nss.c | 14 +-
tools/virsh-domain-monitor.c | 14 +-
tools/virsh-domain.c | 16 +-
tools/virsh-interface.c | 12 +-
tools/virsh-network.c | 10 +-
tools/virsh-nodedev.c | 14 +-
tools/virsh-nwfilter.c | 6 +-
tools/virsh-pool.c | 36 +-
tools/virsh-secret.c | 6 +-
tools/virsh-util.c | 4 +-
tools/virsh-volume.c | 16 +-
tools/virt-admin.c | 4 +-
tools/vsh.c | 18 +-
152 files changed, 1097 insertions(+), 1082 deletions(-)
--
2.17.0
[libvirt] [ocaml] reset and resync the libvirt-ocaml repository
by Pino Toscano
Hi,
for reasons mostly lost to history, after the libvirt-ocaml repository
was converted to git it was not used by its main author (Rich Jones);
development continued in Rich's own git repository, at
http://git.annexia.org/?p=ocaml-libvirt.git;a=summary
After a talk with Rich, we agreed that it was better to move the
development back to libvirt.org, just like all the other bindings.
There are two problems however:
1) the first 38 commits have a bad author/committer date, and this is
also the reason why the existing libvirt-ocaml is not mirrored on
github
2) the top 3 commits on libvirt-ocaml were not integrated back into
Rich's ocaml-libvirt, and their content might not be totally OK
(I will let Rich comment more on this)
While rewriting history is bad,
- most probably there are not many users of libvirt-ocaml around,
- the repository itself is very small (< 500k),
- in general it will be better to have a working repository
So what I'm proposing is to replace the libvirt-ocaml repository with a
fixed version of Rich's ocaml-libvirt, directly on the git hosting side
(i.e. not using a force-push on the current one). Rich already has
commit access for libvirt, so there is no problem keeping his maintainer
role on it. Once done, we can notify the users on this list about it.
What do you think? Is it an acceptable path forward?
--
Pino Toscano
[libvirt] [PATCHv2 0/4] Introduce x86 RDT (CMT&MBM) host capability
by Wang Huaqiang
This series of patches introduces the x86 Cache Monitoring Technology
(CMT) to libvirt by interacting with the kernel resource control
(resctrl) interface. CMT is one of the Intel(R) x86 CPU features
belonging to Resource Director Technology (RDT). CMT reports the
occupancy of the last level cache, which is shared by all CPU cores.
The v1 series introduced CMT for libvirt, covering both reporting the
host capability and creating CMT groups. Reporting the host capability
is a fairly self-contained step, and we only cover that step in this
series. As an extension of v1, the MBM capability is also introduced.
These patches do not cover the creation of CMT groups, which will come
in subsequent patches.
We have had several discussions about the enabling of CMT; please refer
to the following links for the RFCs.
RFCv3
https://www.redhat.com/archives/libvir-list/2018-August/msg01213.html
RFCv2
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html
1. Why is CMT necessary for libvirt?
The perf events 'cmt', 'mbml' and 'mbmt' have been phased out since
Linux kernel commit c39a0e2c8850f08249383f2425dbd8dbe4baad69, so the
perf-based cmt/mbm in libvirt will not work with the latest Linux
kernel. These patches add the CMT feature to libvirt through the kernel
resctrlfs interface.
2. Interfaces for CMT from a high level.
CMT, CAT, MBM and MBA are orthogonal features; each can work
independently.
If 'CMT' is enabled on the host, then a 'cache monitor' is introduced
for the cache, whose role is monitoring the last level cache utilization
of the target system process. Cache monitor capabilities are shown under
the <cache> element.
For 'MBM', a monitor named 'memory bandwidth monitor' is introduced,
whose role is monitoring memory bandwidth utilization. Its capability
information block is located under the <memory_bandwidth> element.
2.1 Query the host capability of CMT.
The <monitor> element represents the host capabilities of CMT.
The explanations of the involved attributes:
- 'maxMonitors': the maximum number of monitoring groups that can be
created, which is limited by the number of hardware 'RMID's.
- 'reuseThreshold': an adjustable value that affects the final reuse of
resources used by a monitor. After a monitor is removed, the kernel
may not release all hardware resources that the monitor used
immediately if the cache occupancy value associated with the removed
monitor is above this threshold. Once the cache occupancy drops
below this threshold, the underlying hardware resources will be
reclaimed and put back into the resource pool for reuse.
- 'llc_occupancy': a feature of CMT, reporting the last level cache
occupancy information.
- 'mbm_total_bytes': a feature of MBM, reporting total memory
bandwidth utilization, in bytes, including local memory and
remote memory for multi-node system.
- 'mbm_local_bytes': a feature of MBM, reporting only local memory
bandwidth utilization.
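The capability data is read from the kernel's resctrl filesystem;
roughly (paths as in the test data added below, values host-dependent):
    # cat /sys/fs/resctrl/info/L3_MON/num_rmids                 (-> maxMonitors)
    # cat /sys/fs/resctrl/info/L3_MON/max_threshold_occupancy   (-> reuseThreshold)
    # cat /sys/fs/resctrl/info/L3_MON/mon_features              (-> feature list)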
# virsh capabilities
...
<cache>
<bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
<bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
+ <monitor level='3' reuseThreshold='270336' maxMonitors='176'>
+ <feature name='llc_occupancy'/>
+ </monitor>
</cache>
<memory_bandwidth>
<node id='0' cpus='0-5'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
<node id='1' cpus='6-11'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
+ <monitor maxMonitors='176'>
+ <feature name='mbm_total_bytes'/>
+ <feature name='mbm_local_bytes'/>
+ </monitor>
</memory_bandwidth>
...
</host>
Changes since v1:
- Introduced MBM capability.
- Capability layout changed
* Moved <monitor> from cache <bank> to <cache>
* Renamed <Threshold> to <reuseThreshold>
- Document for 'reuseThreshold' changed.
- Introduced API virResctrlInfoGetMonitorPrefix
- Added more tests, covering standalone CMT, fake new
feature.
- Creating the CMT resource control group will be a subsequent job.
Wang Huaqiang (4):
util: Introduce monitor capability interface
conf: Refactor cache bank capability structure
conf: Refactor memory bandwidth capability structure
conf: Introduce RDT monitor host capability
docs/schemas/capability.rng | 37 +++-
src/conf/capabilities.c | 126 ++++++++---
src/conf/capabilities.h | 24 ++-
src/libvirt_private.syms | 2 +
src/util/virresctrl.c | 240 +++++++++++++++++++++
src/util/virresctrl.h | 62 ++++++
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 1 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../linux-resctrl-cmt/resctrl/manualres/cpus | 1 +
.../linux-resctrl-cmt/resctrl/manualres/schemata | 1 +
.../linux-resctrl-cmt/resctrl/manualres/tasks | 0
.../linux-resctrl-cmt/resctrl/schemata | 1 +
tests/vircaps2xmldata/linux-resctrl-cmt/system | 1 +
.../resctrl/info/L3/cbm_mask | 1 +
.../resctrl/info/L3/min_cbm_bits | 1 +
.../resctrl/info/L3/num_closids | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 10 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../resctrl/info/MB/bandwidth_gran | 1 +
.../resctrl/info/MB/min_bandwidth | 1 +
.../resctrl/info/MB/num_closids | 1 +
.../resctrl/manualres/cpus | 1 +
.../resctrl/manualres/schemata | 1 +
.../resctrl/manualres/tasks | 0
.../linux-resctrl-fake-feature/resctrl/schemata | 1 +
.../linux-resctrl-fake-feature/system | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +
.../linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 +
.../vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml | 53 +++++
.../vircaps-x86_64-resctrl-fake-feature.xml | 73 +++++++
tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 7 +
tests/vircaps2xmltest.c | 2 +
35 files changed, 624 insertions(+), 36 deletions(-)
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-cmt/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/cbm_mask
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/min_cbm_bits
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/bandwidth_gran
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/min_bandwidth
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-fake-feature/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-fake-feature.xml
--
2.7.4
[libvirt] [PATCH v2] qemu: agent: Avoid agentError when closing the QEMU agent
by Wang Yechao
After calling qemuAgentClose(), it is still possible for
the QEMU Agent I/O event callback to get invoked. This
will trigger an agent error because mon->fd has been set
to -1 at this point. Then vm->privateData->agentError stays 'true'
until libvirtd is restarted or the qemu-guest-agent process is
restarted in the guest.
Silently ignore the case where mon->fd is -1, likewise for
mon->watch being zero.
Signed-off-by: Wang Yechao <wang.yechao255(a)zte.com.cn>
---
v1 patch:
https://www.redhat.com/archives/libvir-list/2018-September/msg01382.html
Changes in v2:
- do not set agentError, let agent state as disconnected instead of error.
---
src/qemu/qemu_agent.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index 97ad0e7..d842b0e 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -530,6 +530,9 @@ static void qemuAgentUpdateWatch(qemuAgentPtr mon)
VIR_EVENT_HANDLE_HANGUP |
VIR_EVENT_HANDLE_ERROR;
+ if (!mon->watch)
+ return;
+
if (mon->lastError.code == VIR_ERR_OK) {
events |= VIR_EVENT_HANDLE_READABLE;
@@ -555,6 +558,12 @@ qemuAgentIO(int watch, int fd, int events, void *opaque)
VIR_DEBUG("Agent %p I/O on watch %d fd %d events %d", mon, watch, fd, events);
#endif
+ if (mon->fd == -1 || mon->watch == 0) {
+ virObjectUnlock(mon);
+ virObjectUnref(mon);
+ return;
+ }
+
if (mon->fd != fd || mon->watch != watch) {
if (events & (VIR_EVENT_HANDLE_HANGUP | VIR_EVENT_HANDLE_ERROR))
eof = true;
@@ -788,8 +797,10 @@ void qemuAgentClose(qemuAgentPtr mon)
virObjectLock(mon);
if (mon->fd >= 0) {
- if (mon->watch)
+ if (mon->watch) {
virEventRemoveHandle(mon->watch);
+ mon->watch = 0;
+ }
VIR_FORCE_CLOSE(mon->fd);
}
--
1.8.3.1
[libvirt] [PATCH 00/11] qemu: Improve / cleanup QEMU binary handling
by Andrea Bolognani
This is the output of 'virsh capabilities' on my laptop:
<guest>
<os_type>hvm</os_type>
<arch name='x86_64'>
<wordsize>64</wordsize>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<machine maxCpus='255'>pc-i440fx-3.0</machine>
<machine canonical='pc-i440fx-3.0' maxCpus='255'>pc</machine>
<machine maxCpus='288'>pc-q35-3.0</machine>
<machine canonical='pc-q35-3.0' maxCpus='288'>q35</machine>
<!-- Actually way more machine types listed here -->
<domain type='qemu'/>
<domain type='kvm'>
<emulator>/usr/bin/qemu-kvm</emulator>
<machine maxCpus='255'>pc-i440fx-3.0</machine>
<machine canonical='pc-i440fx-3.0' maxCpus='255'>pc</machine>
<machine maxCpus='288'>pc-q35-3.0</machine>
<machine canonical='pc-q35-3.0' maxCpus='288'>q35</machine>
<!-- Actually way more machine types listed here -->
</domain>
</arch>
<!-- Other stuff we don't care about -->
</guest>
Notice how all machine types are listed twice, and how we report that
qemu-system-x86_64 is for TCG guests while qemu-kvm must be used for
KVM guests - which is inaccurate, since the former can run KVM guests
just fine.
After this series, the output is much more reasonable:
<guest>
<os_type>hvm</os_type>
<arch name='x86_64'>
<wordsize>64</wordsize>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<machine maxCpus='255'>pc-i440fx-3.0</machine>
<machine canonical='pc-i440fx-3.0' maxCpus='255'>pc</machine>
<machine maxCpus='288'>pc-q35-3.0</machine>
<machine canonical='pc-q35-3.0' maxCpus='288'>q35</machine>
<!-- Actually way more machine types listed here -->
<domain type='qemu'/>
<domain type='kvm'/>
</arch>
<!-- Other stuff we don't care about -->
</guest>
As a bonus the code gets *simpler* in the process instead of more
complicated, and we even get to shave off ~100 lines! Yay!
Andrea Bolognani (11):
qemu: Move comments to virQEMUCapsGuestIsNative()
qemu: Don't duplicate binary name in capabilities
qemu: Move armv7l-on-aarch64 special case
qemu: Stop looking after finding the first binary
qemu: Expect a single binary in virQEMUCapsInitGuest()
qemu: Remove unnecessary variables
qemu: Don't look for "qemu-kvm" and "kvm" binaries
qemu: Simplify QEMU binary search
qemu: Rename qemubinCaps => qemuCaps
qemu: Refactor virQEMUCapsCacheLookupByArch()
qemu: Prefer qemu-system-* binaries
src/qemu/qemu_capabilities.c | 170 +++++++-----------
src/qemu/qemu_capabilities.h | 4 +-
.../qemucaps2xmloutdata/caps_1.5.3.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_1.6.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_1.7.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.1.1.x86_64.xml | 4 +-
.../caps_2.10.0.aarch64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.10.0.ppc64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.10.0.s390x.xml | 4 +-
.../caps_2.10.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.11.0.s390x.xml | 4 +-
.../caps_2.11.0.x86_64.xml | 4 +-
.../caps_2.12.0.aarch64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.12.0.ppc64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.12.0.s390x.xml | 4 +-
.../caps_2.12.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.4.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.5.0.x86_64.xml | 4 +-
.../caps_2.6.0.aarch64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.6.0.ppc64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.6.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.7.0.s390x.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.7.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.8.0.s390x.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.8.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.9.0.ppc64.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.9.0.s390x.xml | 4 +-
.../qemucaps2xmloutdata/caps_2.9.0.x86_64.xml | 4 +-
.../qemucaps2xmloutdata/caps_3.0.0.ppc64.xml | 4 +-
.../qemucaps2xmloutdata/caps_3.0.0.x86_64.xml | 4 +-
tests/qemucaps2xmltest.c | 2 -
31 files changed, 97 insertions(+), 191 deletions(-)
--
2.17.1
[libvirt] [RFC v2 00/16] Add vhost-user-gpu support
by marcandre.lureau@redhat.com
From: Marc-André Lureau <marcandre.lureau(a)redhat.com>
Hi,
This series of patches adds support for running a virtio GPU in a
separate process, using vhost-user.
The QEMU series "[PATCH v4 00/29] vhost-user for input & GPU" is still
under review, and will hopefully land in 3.1. There are several benefits
to running the GPU in an external process, since Mesa is rather heavy on
the qemu main loop, and may block for a while or crash. I observe a 5x
performance improvement with the Unigine Heaven 4 benchmark.
The external GPU process is started with one end of the vhost-user
socket pair, the other end is given to a QEMU chardev. It is also
added to the emulator cgroup to restrict its CPU usage.
vhost-user requires shared VM memory. The first patches ease and improve
the shared memory setup by using memfd. They could be considered
separately, but that's the setup I'd recommend with vhost-user-gpu.
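For example, the memfd-backed shared memory would look roughly like this
in the domain XML (a sketch based on the memfd-memory-numa test case
added below):
    <memoryBacking>
      <source type='memfd'/>
      <access mode='shared'/>
    </memoryBacking>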
Review welcome!
RFCv2:
- add new memfd memoryBacking source type
- drop the implicit shared memory NUMA setup approach; an explicit
setup is now required
- rebased
Marc-André Lureau (16):
qemu: add memory-backend-memfd capability check
qemu: add memfd memory backing
qemu: add vhost-user-gpu capabilities checks
domain: add "vhost-user" video type
qemu: fill the vhost-user video type capability
qemu: check that qemu is vhost-user-vga capable
qemu: vhost-user is valid as non-primary video device
qemu: validate vhost-user video model
qemu: add qemuSecurityStartVhostUserGPU helper
qemu: add vhost-user-gpu helper unit
qemu: restrict 'virgl=' option to 'virtio' video type
qemu: set default address type on vhost-user video model
qemu: start/stop the vhost-user-gpu external device
qemu: build vhost-user-backend for vhost-user-gpu
qemu: build vhost-user-gpu video device arguments
tests: add vhost-user-gpu xml2argv tests
docs/formatdomain.html.in | 11 +-
docs/schemas/domaincommon.rng | 2 +
src/conf/device_conf.h | 1 +
src/conf/domain_conf.c | 7 +-
src/conf/domain_conf.h | 2 +
src/qemu/Makefile.inc.am | 2 +
src/qemu/qemu_capabilities.c | 8 +
src/qemu/qemu_capabilities.h | 3 +
src/qemu/qemu_command.c | 135 ++++++--
src/qemu/qemu_domain.c | 8 +-
src/qemu/qemu_domain_address.c | 4 +-
src/qemu/qemu_extdevice.c | 47 ++-
src/qemu/qemu_process.c | 6 +-
src/qemu/qemu_security.c | 48 +++
src/qemu/qemu_security.h | 6 +
src/qemu/qemu_vhost_user_gpu.c | 318 ++++++++++++++++++
src/qemu/qemu_vhost_user_gpu.h | 48 +++
tests/domaincapsschemadata/full.xml | 1 +
.../caps_2.12.0.aarch64.xml | 1 +
.../caps_2.12.0.ppc64.xml | 1 +
.../caps_2.12.0.s390x.xml | 1 +
.../caps_2.12.0.x86_64.xml | 1 +
.../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 +
.../caps_3.0.0.riscv32.xml | 1 +
.../caps_3.0.0.riscv64.xml | 1 +
.../caps_3.0.0.x86_64.xml | 1 +
tests/qemuxml2argvdata/memfd-memory-numa.args | 27 ++
tests/qemuxml2argvdata/memfd-memory-numa.xml | 33 ++
.../vhost-user-gpu-secondary.args | 33 ++
.../vhost-user-gpu-secondary.xml | 44 +++
tests/qemuxml2argvdata/vhost-user-vga.args | 30 ++
tests/qemuxml2argvdata/vhost-user-vga.xml | 41 +++
tests/qemuxml2argvtest.c | 15 +
33 files changed, 849 insertions(+), 39 deletions(-)
create mode 100644 src/qemu/qemu_vhost_user_gpu.c
create mode 100644 src/qemu/qemu_vhost_user_gpu.h
create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.args
create mode 100644 tests/qemuxml2argvdata/vhost-user-gpu-secondary.xml
create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.args
create mode 100644 tests/qemuxml2argvdata/vhost-user-vga.xml
--
2.19.0.rc0.48.gb9dfa238d5
[libvirt] domain XML for tracking libosinfo ID
by Cole Robinson
Right now in virt-manager we only track a VM's OS name (win10, fedora28,
etc.) during the VM install phase. This piece of data is important
post-install though: if the user adds a new disk to the VM later, we
want to be able to ask libosinfo about what devices the installed OS
supports, so we can set optimal defaults, like enabling virtio.
There isn't any standard libvirt XML field to track this kind of info
though, so apps have to invent their own schema. nova and rhev do it
indirectly AFAICT. gnome-boxes does it directly with XML like this:
<metadata>
<boxes:gnome-boxes xmlns:boxes="https://wiki.gnome.org/Apps/Boxes">
<os-id>http://fedoraproject.org/fedora/28</os-id>
....
</boxes:gnome-boxes>
</metadata>
I want to add something similar to virt-manager, but it seems a shame
to invent our own private schema for something that most non-trivial
virt apps will want to know about. I was thinking of a schema we could
document with libosinfo, something like:
<metadata>
<libosinfo
xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<os-id>http://fedoraproject.org/fedora/28</os-id>
</libosinfo>
</metadata>
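Apps could then store and retrieve this through libvirt's metadata API;
roughly (a sketch, assuming the namespace above):
    /* Store the os-id under the proposed libosinfo namespace. */
    virDomainSetMetadata(dom, VIR_DOMAIN_METADATA_ELEMENT,
                         "<libosinfo>"
                         "<os-id>http://fedoraproject.org/fedora/28</os-id>"
                         "</libosinfo>",
                         "libosinfo",
                         "http://libosinfo.org/xmlns/libvirt/domain/1.0", 0);
    /* Read it back later, e.g. before adding a new disk. */
    char *md = virDomainGetMetadata(dom, VIR_DOMAIN_METADATA_ELEMENT,
                                    "http://libosinfo.org/xmlns/libvirt/domain/1.0", 0);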
FWIW there's an oooold bug about possibly tracking something like this
in the domain XML as a first-class citizen:
https://bugzilla.redhat.com/show_bug.cgi?id=509164
But I think nowadays that's a bad fit and is likely off the table.
Thoughts?
- Cole