[libvirt] [PATCH] virfile: Fix virFileExists commentary
by Erik Skultety
Arguably, a function that returns only on success is an interesting,
although quite impractical, concept. Also, errno isn't (and shouldn't
be) preserved in this case, since it can be fed directly to
virReportSystemError.
Signed-off-by: Erik Skultety <eskultet(a)redhat.com>
---
Pushed as trivial.
src/util/virfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 6ba67bf..41cdca9 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -1820,7 +1820,8 @@ virFileIsDir(const char *path)
* virFileExists: Check for presence of file
* @path: Path of file to check
*
- * Returns if the file exists. Preserves errno in case it does not exist.
+ * Returns true if the file exists, false if it doesn't, setting errno
+ * appropriately.
*/
bool
virFileExists(const char *path)
--
2.10.2
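For reference, the caller pattern the updated comment describes would look
something like this (a sketch; the message text is illustrative, but
virReportSystemError consuming errno is exactly the point of the change):

if (!virFileExists(path)) {
    /* errno was set by the underlying access(2) call */
    virReportSystemError(errno,
                         _("cannot access file '%s'"), path);
    return -1;
}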
[libvirt] [PATCH v2 0/3] introduce virBufferEscapeN and fix escape bug
by Pavel Hrdina
I've dropped the STRCAT patches in v2. The whole purpose of STRCAT was
to avoid calling *strcspn* several times, but it was probably a bad call.
Now we call that function once for each pair, but if no escaping is
required we at least don't allocate the structure for that pair, and later
in *virBufferEscapeN* we iterate only over the pairs that will escape at
least one character.
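A rough sketch of the logic described above - the type and function names
here are made up for illustration, they are not the actual patch:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *chars;   /* characters this pair needs to escape */
    bool needed;         /* true if the input contains any of them */
} EscapePair;

static void
markPairsNeedingEscape(const char *str, EscapePair *pairs, size_t npairs)
{
    size_t i;

    for (i = 0; i < npairs; i++) {
        /* strcspn() returns the length of the initial segment of 'str'
         * containing none of the pair's characters; if that segment
         * spans the whole string, nothing needs escaping and the pair
         * can be skipped later without any allocation */
        pairs[i].needed = str[strcspn(str, pairs[i].chars)] != '\0';
    }
}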
Pavel Hrdina (3):
util: virbuffer: introduce virBufferEscapeN
util: virqemu: introduce virQEMUBuildBufferEscape
qemu: properly escape socket path for graphics
src/libvirt_private.syms | 2 +
src/qemu/qemu_command.c | 6 +-
src/util/virbuffer.c | 101 +++++++++++++++++++++
src/util/virbuffer.h | 2 +
src/util/virqemu.c | 17 ++++
src/util/virqemu.h | 1 +
.../qemuxml2argvdata/qemuxml2argv-name-escape.args | 5 +-
.../qemuxml2argvdata/qemuxml2argv-name-escape.xml | 7 +-
tests/qemuxml2argvtest.c | 3 +-
tests/virbuftest.c | 41 +++++++++
10 files changed, 179 insertions(+), 6 deletions(-)
--
2.11.1
Re: [libvirt] A question for external snapshots with flag “VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE”
by Richard W.M. Jones
[Adding the correct mailing list, removing libvirt-list-owner]
On Fri, Feb 24, 2017 at 05:35:31PM +0800, WangJie (Captain) wrote:
> Hello, I have a question. When we create consistent active external
> snapshots with the flag “VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE”,
> qemuDomainSnapshotFSFreeze is called first to freeze all filesystems in
> the VM, and then the snapshots are created. For Windows VMs, the
> filesystems are frozen by the VSS service. If the I/O pressure in the VM
> is too high, or an internal error happens in VSS, freezing the
> filesystems will fail and qemuDomainSnapshotFSFreeze returns 0, meaning
> that no filesystems were frozen. In qemuDomainSnapshotCreateActiveExternal,
> libvirt creates the external snapshots anyway when
> qemuDomainSnapshotFSFreeze returns 0, but the snapshots created in that
> situation are not consistent. So shouldn't we abandon creating the
> snapshots and go to cleanup when qemuDomainSnapshotFSFreeze returns 0?
>
>
> The code below:
>
>
> static int
> qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn,
> virQEMUDriverPtr driver,
> virDomainObjPtr vm,
> virDomainSnapshotObjPtr snap,
> unsigned int flags)
> {
> virObjectEventPtr event;
> bool resume = false;
> int ret = -1;
> qemuDomainObjPrivatePtr priv = vm->privateData;
> char *xml = NULL;
> bool memory = snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL;
> bool memory_unlink = false;
> bool atomic = !!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC);
> bool transaction = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_TRANSACTION);
> int thaw = 0; /* 1 if freeze succeeded, -1 if freeze failed */
> bool pmsuspended = false;
> virQEMUDriverConfigPtr cfg = NULL;
> int compressed;
> char *compressedpath = NULL;
>
> /* If quiesce was requested, then issue a freeze command, and a
> * counterpart thaw command when it is actually sent to agent.
> * The command will fail if the guest is paused or the guest agent
> * is not running, or is already quiesced. */
> if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) {
> int freeze = qemuDomainSnapshotFSFreeze(driver, vm, NULL, 0);
> if (freeze < 0) {
> /* the helper reported the error */
> if (freeze == -2)
> thaw = -1; /* the command is sent but agent failed */
> goto cleanup;
> }
> thaw = 1;
> }
>
> ......
> ......
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
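One way to express the suggested behaviour is to also treat a zero return
(the agent froze nothing) as fatal when quiescing was explicitly requested.
A sketch against the quoted code, not a tested patch - an explicit error
would also need to be reported for the zero case, since the helper only
reports errors for negative returns:

if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) {
    int freeze = qemuDomainSnapshotFSFreeze(driver, vm, NULL, 0);
    if (freeze <= 0) {
        /* freeze < 0: the helper reported the error;
         * freeze == 0: no filesystems were frozen, so the snapshot
         * would not be consistent - report an error and bail out */
        if (freeze == -2)
            thaw = -1; /* the command was sent but the agent failed */
        goto cleanup;
    }
    thaw = 1;
}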
[libvirt] [PATCH] qemuProcessInit: Jump onto correct label in case of error
by Michal Privoznik
After eca76884ea, in case of an error in qemuDomainSetPrivatePaths()
during pretended start we jump to 'stop'. During review I changed this
from 'cleanup', which turned out to be the correct label after all.
Well, sort of. We can't call qemuProcessStop() as it decrements
driver->nactive and we did not increment it. However, it calls
virDomainObjRemoveTransientDef(), which is basically the only
function we need to call. So call that function directly and goto
cleanup.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_process.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index df1fa0371..9306e0e18 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4758,8 +4758,10 @@ qemuProcessInit(virQEMUDriverPtr driver,
goto cleanup;
if (flags & VIR_QEMU_PROCESS_START_PRETEND) {
- if (qemuDomainSetPrivatePaths(driver, vm) < 0)
- goto stop;
+ if (qemuDomainSetPrivatePaths(driver, vm) < 0) {
+ virDomainObjRemoveTransientDef(vm);
+ goto cleanup;
+ }
} else {
vm->def->id = qemuDriverAllocateID(driver);
qemuDomainSetFakeReboot(driver, vm, false);
--
2.11.0
[libvirt] [PATCH go v2] Add support for perf events
by Nitesh Konkar
Signed-off-by: Nitesh Konkar <nitkon12(a)linux.vnet.ibm.com>
---
connect.go | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
domain.go | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
domain_compat.h | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 144 insertions(+)
diff --git a/connect.go b/connect.go
index 3c53485..e24cf66 100644
--- a/connect.go
+++ b/connect.go
@@ -2264,6 +2264,24 @@ type DomainStatsPerf struct {
StalledCyclesBackend uint64
RefCpuCyclesSet bool
RefCpuCycles uint64
+ CpuClockSet bool
+ CpuClock uint64
+ TaskClockSet bool
+ TaskClock uint64
+ PageFaultsSet bool
+ PageFaults uint64
+ ContextSwitchesSet bool
+ ContextSwitches uint64
+ CpuMigrationsSet bool
+ CpuMigrations uint64
+ PageFaultsMinSet bool
+ PageFaultsMin uint64
+ PageFaultsMajSet bool
+ PageFaultsMaj uint64
+ AlignmentFaultsSet bool
+ AlignmentFaults uint64
+ EmulationFaultsSet bool
+ EmulationFaults uint64
}
func getDomainStatsPerfFieldInfo(params *DomainStatsPerf) map[string]typedParamsFieldInfo {
@@ -2320,6 +2338,42 @@ func getDomainStatsPerfFieldInfo(params *DomainStatsPerf) map[string]typedParams
set: &params.RefCpuCyclesSet,
ul: &params.RefCpuCycles,
},
+ "perf.cpu_clock": typedParamsFieldInfo{
+ set: &params.CpuClockSet,
+ ul: &params.CpuClock,
+ },
+ "perf.task_clock": typedParamsFieldInfo{
+ set: &params.TaskClockSet,
+ ul: &params.TaskClock,
+ },
+ "perf.page_faults": typedParamsFieldInfo{
+ set: &params.PageFaultsSet,
+ ul: &params.PageFaults,
+ },
+ "perf.context_switches": typedParamsFieldInfo{
+ set: &params.ContextSwitchesSet,
+ ul: &params.ContextSwitches,
+ },
+ "perf.cpu_migrations": typedParamsFieldInfo{
+ set: &params.CpuMigrationsSet,
+ ul: &params.CpuMigrations,
+ },
+ "perf.page_faults_min": typedParamsFieldInfo{
+ set: &params.PageFaultsMinSet,
+ ul: &params.PageFaultsMin,
+ },
+ "perf.page_faults_maj": typedParamsFieldInfo{
+ set: &params.PageFaultsMajSet,
+ ul: &params.PageFaultsMaj,
+ },
+ "perf.alignment_faults": typedParamsFieldInfo{
+ set: &params.AlignmentFaultsSet,
+ ul: &params.AlignmentFaults,
+ },
+ "perf.emulation_faults": typedParamsFieldInfo{
+ set: &params.EmulationFaultsSet,
+ ul: &params.EmulationFaults,
+ },
}
}
diff --git a/domain.go b/domain.go
index 2bd9852..1cb0b95 100644
--- a/domain.go
+++ b/domain.go
@@ -3195,6 +3195,24 @@ type DomainPerfEvents struct {
StalledCyclesBackend bool
RefCpuCyclesSet bool
RefCpuCycles bool
+ CpuClockSet bool
+ CpuClock bool
+ TaskClockSet bool
+ TaskClock bool
+ PageFaultsSet bool
+ PageFaults bool
+ ContextSwitchesSet bool
+ ContextSwitches bool
+ CpuMigrationsSet bool
+ CpuMigrations bool
+ PageFaultsMinSet bool
+ PageFaultsMin bool
+ PageFaultsMajSet bool
+ PageFaultsMaj bool
+ AlignmentFaultsSet bool
+ AlignmentFaults bool
+ EmulationFaultsSet bool
+ EmulationFaults bool
}
/* Remember to also update DomainStatsPerf in connect.go when adding to the struct above */
@@ -3253,6 +3271,42 @@ func getDomainPerfEventsFieldInfo(params *DomainPerfEvents) map[string]typedPara
set: &params.RefCpuCyclesSet,
b: &params.RefCpuCycles,
},
+ C.VIR_PERF_PARAM_CPU_CLOCK: typedParamsFieldInfo{
+ set: &params.CpuClockSet,
+ b: &params.CpuClock,
+ },
+ C.VIR_PERF_PARAM_TASK_CLOCK: typedParamsFieldInfo{
+ set: &params.TaskClockSet,
+ b: &params.TaskClock,
+ },
+ C.VIR_PERF_PARAM_PAGE_FAULTS: typedParamsFieldInfo{
+ set: &params.PageFaultsSet,
+ b: &params.PageFaults,
+ },
+ C.VIR_PERF_PARAM_CONTEXT_SWITCHES: typedParamsFieldInfo{
+ set: &params.ContextSwitchesSet,
+ b: &params.ContextSwitches,
+ },
+ C.VIR_PERF_PARAM_CPU_MIGRATIONS: typedParamsFieldInfo{
+ set: &params.CpuMigrationsSet,
+ b: &params.CpuMigrations,
+ },
+ C.VIR_PERF_PARAM_PAGE_FAULTS_MIN: typedParamsFieldInfo{
+ set: &params.PageFaultsMinSet,
+ b: &params.PageFaultsMin,
+ },
+ C.VIR_PERF_PARAM_PAGE_FAULTS_MAJ: typedParamsFieldInfo{
+ set: &params.PageFaultsMajSet,
+ b: &params.PageFaultsMaj,
+ },
+ C.VIR_PERF_PARAM_ALIGNMENT_FAULTS: typedParamsFieldInfo{
+ set: &params.AlignmentFaultsSet,
+ b: &params.AlignmentFaults,
+ },
+ C.VIR_PERF_PARAM_EMULATION_FAULTS: typedParamsFieldInfo{
+ set: &params.EmulationFaultsSet,
+ b: &params.EmulationFaults,
+ },
}
}
diff --git a/domain_compat.h b/domain_compat.h
index 9540435..a114173 100644
--- a/domain_compat.h
+++ b/domain_compat.h
@@ -45,6 +45,42 @@
#define VIR_PERF_PARAM_REF_CPU_CYCLES "ref_cpu_cycles"
#endif
+#ifndef VIR_PERF_PARAM_CPU_CLOCK
+#define VIR_PERF_PARAM_CPU_CLOCK "cpu_clock"
+#endif
+
+#ifndef VIR_PERF_PARAM_TASK_CLOCK
+#define VIR_PERF_PARAM_TASK_CLOCK "task_clock"
+#endif
+
+#ifndef VIR_PERF_PARAM_PAGE_FAULTS
+#define VIR_PERF_PARAM_PAGE_FAULTS "page_faults"
+#endif
+
+#ifndef VIR_PERF_PARAM_CONTEXT_SWITCHES
+#define VIR_PERF_PARAM_CONTEXT_SWITCHES "context_switches"
+#endif
+
+#ifndef VIR_PERF_PARAM_CPU_MIGRATIONS
+#define VIR_PERF_PARAM_CPU_MIGRATIONS "cpu_migrations"
+#endif
+
+#ifndef VIR_PERF_PARAM_PAGE_FAULTS_MIN
+#define VIR_PERF_PARAM_PAGE_FAULTS_MIN "page_faults_min"
+#endif
+
+#ifndef VIR_PERF_PARAM_PAGE_FAULTS_MAJ
+#define VIR_PERF_PARAM_PAGE_FAULTS_MAJ "page_faults_maj"
+#endif
+
+#ifndef VIR_PERF_PARAM_ALIGNMENT_FAULTS
+#define VIR_PERF_PARAM_ALIGNMENT_FAULTS "alignment_faults"
+#endif
+
+#ifndef VIR_PERF_PARAM_EMULATION_FAULTS
+#define VIR_PERF_PARAM_EMULATION_FAULTS "emulation_faults"
+#endif
+
#ifndef VIR_DOMAIN_EVENT_ID_METADATA_CHANGE
#define VIR_DOMAIN_EVENT_ID_METADATA_CHANGE 23
#endif
--
1.9.3
[libvirt] [PATCH 00/13] Add TLS support for migration
by John Ferlan
Patches 1 and 3-9 are primarily code motion in order to allow reuse of
the "core" of the chardev TLS code.
Theoretically speaking, of course, these patches should work - I don't
have a TLS and migration environment to test with, so I followed the
qemu command model on Daniel's blog and relied on prior experience with
the chardev TLS work.
I added the saving of a flag to the private qemu domain state, although
I'm not 100% sure it was necessary. At one time I created the source TLS
objects during the Begin phase, but later decided to wait until just
before the migration is run. I think the main reason to have the flag
is so that, after a restart of libvirtd, 'something' knows migration
using TLS was configured. It may only be "necessary" in order to
repopulate the migSecinfo after a libvirtd restart, but that's not
entirely clear. By the time I started thinking about it more while
writing this cover letter, it was too late to just remove it.
Also, rather than creating the destination host TLS objects on the fly,
I modified the command-line generation. That model could change to adding
the TLS objects once the destination is started and before the parameters
are set for the migration.
This 'model' is also going to be used for the NBD, but I figured I'd get
this posted now since it was already too long of a series.
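Assuming the spelling in patch 2's subject ends up as a qemu.conf knob, and
that the tools/virsh-domain.c change exposes VIR_MIGRATE_TLS as a --tls
switch, usage would be roughly as below - the directory path and the virsh
flag name are guesses, not tested:

# /etc/libvirt/qemu.conf on both hosts (example path)
migrate_tls_x509_cert_dir = "/etc/pki/libvirt-migrate"

# then, on the client, assuming a --tls switch:
virsh migrate --live --tls guest qemu+ssh://desthost/system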
John Ferlan (13):
qemu: Create #define for TLS configuration setup.
conf: Introduce migrate_tls_x509_cert_dir
qemu: Rename qemuAliasTLSObjFromChardevAlias
qemu: Introduce qemuDomainSecretMigrate{Prepare|Destroy}
qemu: Refactor hotplug to introduce qemuDomain{Add|Del}TLSObjects
qemu: Refactor qemuDomainGetChardevTLSObjects to converge code
qemu: Move qemuDomainSecretChardevPrepare call
qemu: Move qemuDomainPrepareChardevSourceTLS call
qemu: Introduce qemuDomainGetTLSObjects
qemu: Add TLS params to _qemuMonitorMigrationParams
Add new migration flag VIR_MIGRATE_TLS
qemu: Set up the migrate TLS objects for target
qemu: Set up the migration TLS objects for source
include/libvirt/libvirt-domain.h | 8 +
src/qemu/libvirtd_qemu.aug | 6 +
src/qemu/qemu.conf | 39 +++++
src/qemu/qemu_alias.c | 8 +-
src/qemu/qemu_alias.h | 2 +-
src/qemu/qemu_command.c | 33 +++-
src/qemu/qemu_command.h | 4 +-
src/qemu/qemu_conf.c | 43 +++--
src/qemu/qemu_conf.h | 5 +
src/qemu/qemu_domain.c | 78 ++++++++++
src/qemu/qemu_domain.h | 89 ++++++-----
src/qemu/qemu_hotplug.c | 312 ++++++++++++++++++++-----------------
src/qemu/qemu_hotplug.h | 24 +++
src/qemu/qemu_migration.c | 135 ++++++++++++++++
src/qemu/qemu_migration.h | 1 +
src/qemu/qemu_monitor.c | 12 +-
src/qemu/qemu_monitor.h | 7 +
src/qemu/qemu_monitor_json.c | 13 +-
src/qemu/qemu_process.c | 4 +
src/qemu/test_libvirtd_qemu.aug.in | 4 +
tools/virsh-domain.c | 7 +
21 files changed, 625 insertions(+), 209 deletions(-)
--
2.9.3
[libvirt] [RFC PATCH 00/12] Add block write threshold event
by Peter Krempa
Since I'm a procrastinator and wanted to do this after full node-name support
was added, I'm kind of late for this release. However, since oVirt has been
asking for this feature for a very long time, I would like to deliver it ASAP.
This series adds an event which is fired once a guest writes beyond a
configured offset in the backing file. The threshold offset is configurable
via a new API.
This series is in RFC state since I did not manage to polish a few details,
mostly connected to node names:
- currently only the top level image can be selected
- some blockjobs may clear the remembered node names and thus break this
  (re-detection of the backing chain needs to be connected with node name
  detection)
- the node name detection code is VERY crude
- I'm not sure how well it plays with qcow images, since I select only the
  top level node (qcow protocol) rather than the file backing it
- the documentation probably sucks since I did not proof-read it
Any feedback is welcome.
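For reference, client code using the proposed APIs (the event from patch 3,
the setter from patch 8) would look roughly like this - a sketch against the
series as posted, so names and signatures may still change, and error
handling is omitted for brevity:

#include <libvirt/libvirt.h>
#include <stdio.h>

static void
thresholdCb(virConnectPtr conn, virDomainPtr dom,
            const char *dev, const char *path,
            unsigned long long threshold, unsigned long long excess,
            void *opaque)
{
    printf("disk %s (%s) crossed %llu, exceeded by %llu bytes\n",
           dev, path, threshold, excess);
}

int main(void)
{
    virConnectPtr conn;
    virDomainPtr dom;

    virEventRegisterDefaultImpl();
    conn = virConnectOpen("qemu:///system");
    dom = virDomainLookupByName(conn, "guest");

    virConnectDomainEventRegisterAny(conn, dom,
                                     VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
                                     VIR_DOMAIN_EVENT_CALLBACK(thresholdCb),
                                     NULL, NULL);

    /* fire the event once the guest writes past 10 GiB into vda */
    virDomainSetBlockThreshold(dom, "vda", 10ULL << 30, 0);

    while (virEventRunDefaultImpl() == 0)
        ; /* dispatch events */

    return 0;
}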
Peter Krempa (12):
util: storage: Split out useful bits of virStorageFileParseChainIndex
util: storage: Add preliminary storage for node names into
virStorageSource
lib: Introduce event for tracking disk backing file write threshold
qemu: monitor: Add support for BLOCK_WRITE_THRESHOLD event
qemu: domain: Add helper to lookup disk by node name
qemu: domain: Add helper to generate indexed backing store names
qemu: process: Wire up firing of the
VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD event
lib: Add API for setting the threshold size for
VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD
virsh: Implement 'blockthreshold' command to call
virDomainSetBlockThreshold
qemu: domain: Add helper to look up disk source by the backing store
string
qemu: implement qemuDomainSetBlockThreshold
qemu: WIP: lookup nodenames
daemon/remote.c | 43 +++++++++
examples/object-events/event-test.c | 19 ++++
include/libvirt/libvirt-domain.h | 36 +++++++
src/conf/domain_event.c | 97 +++++++++++++++++++
src/conf/domain_event.h | 15 +++
src/driver-hypervisor.h | 8 ++
src/libvirt-domain.c | 51 ++++++++++
src/libvirt_private.syms | 4 +
src/libvirt_public.syms | 1 +
src/qemu/qemu_capabilities.c | 2 +
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_domain.c | 103 ++++++++++++++++++++
src/qemu/qemu_domain.h | 14 +++
src/qemu/qemu_driver.c | 64 +++++++++++++
src/qemu/qemu_monitor.c | 42 ++++++++-
src/qemu/qemu_monitor.h | 19 ++++
src/qemu/qemu_monitor_json.c | 65 ++++++++++++-
src/qemu/qemu_monitor_json.h | 6 ++
src/qemu/qemu_process.c | 41 ++++++++
src/remote/remote_driver.c | 34 +++++++
src/remote/remote_protocol.x | 33 ++++++-
src/remote_protocol-structs | 16 ++++
src/util/virstoragefile.c | 105 ++++++++++++++++++---
src/util/virstoragefile.h | 14 +++
tests/qemucapabilitiesdata/caps_2.4.0.x86_64.xml | 1 +
tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml | 1 +
.../caps_2.6.0-gicv2.aarch64.xml | 1 +
.../caps_2.6.0-gicv3.aarch64.xml | 1 +
tests/qemucapabilitiesdata/caps_2.6.0.ppc64le.xml | 1 +
tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 +
tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 +
tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 +
tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 +
tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml | 1 +
tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml | 1 +
tools/virsh-domain.c | 84 +++++++++++++++++
tools/virsh.pod | 8 ++
37 files changed, 922 insertions(+), 14 deletions(-)
--
2.11.1
[libvirt] [PATCH] qemu_process: spice: don't release used port
by Pavel Hrdina
The port is stored in the graphics configuration and will also get
released in qemuProcessStop in case of error.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1397440
Signed-off-by: Pavel Hrdina <phrdina(a)redhat.com>
---
src/qemu/qemu_process.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index df1fa0371d..760507d957 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3693,7 +3693,6 @@ qemuProcessSPICEAllocatePorts(virQEMUDriverPtr driver,
ret = 0;
cleanup:
- virPortAllocatorRelease(driver->remotePorts, port);
virObjectUnref(cfg);
return ret;
}
--
2.11.1
[libvirt] [PATCH] qemu: Don't update physical storage size of empty drives
by Peter Krempa
Previously the code called virStorageSourceUpdateBlockPhysicalSize, which
did not do anything on empty drives since it worked only on block devices.
After the refactor in c5f6151390 it's called for all devices and thus
attempts to dereference the NULL path of empty drives.
and thus attempts to deref the NULL path of empty drives.
Add a check that skips the update of the physical size if the storage
source is empty.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1420718
---
src/qemu/qemu_driver.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6e1e3d408..77d81755a 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -11336,6 +11336,9 @@ qemuDomainStorageUpdatePhysical(virQEMUDriverPtr driver,
int fd = -1;
struct stat sb;
+ if (virStorageSourceIsEmpty(src))
+ return 0;
+
if (qemuDomainStorageOpenStat(driver, cfg, vm, src, &fd, &sb) < 0)
return -1;
--
2.11.1
[libvirt] Proposal PCI/PCIe device placement on PAPR guests
by David Gibson
There was a discussion back in November on the qemu list which spilled
onto the libvirt list about how to add support for PCIe devices to
POWER VMs, specifically 'pseries' machine type PAPR guests.
Here's a more concrete proposal for how to handle part of this in
future from the libvirt side. Strictly speaking what I'm suggesting
here isn't intrinsically linked to PCIe: it will make adding PCIe
support sanely easier, as well as having a number of advantages for
both PCIe and plain-PCI devices on PAPR guests.
Background:
* Currently the pseries machine type only supports vanilla PCI
buses.
* This is a qemu limitation, not something inherent - PAPR guests
running under PowerVM (the IBM hypervisor) can use passthrough
PCIe devices (PowerVM doesn't emulate devices though).
* In fact the way PCI access is para-virtalized in PAPR makes the
usual distinctions between PCI and PCIe largely disappear
* Presentation of PCIe devices to PAPR guests is unusual
* Unlike x86 and other "bare metal" platforms, root ports are
not made visible to the guest, i.e. all devices (typically)
appear the way integrated devices do on x86
* In terms of topology all devices will appear in a way similar to
a vanilla PCI bus, even PCIe devices
* However PCIe extended config space is accessible
* This means libvirt's usual placement of PCIe devices is not
suitable for PAPR guests
* PAPR has its own hotplug mechanism
* This is used instead of standard PCIe hotplug
* This mechanism works for both PCIe and vanilla-PCI devices
* This can hotplug/unplug devices even without a root port P2P
bridge between it and the root bus
* Multiple independent host bridges are routine on PAPR
* Unlike PC (where all host bridges have multiplexed access to
configuration space) PCI host bridges (PHBs) are truly
independent for PAPR guests (disjoint MMIO regions in system
address space)
* PowerVM typically presents a separate PHB to the guest for each
host slot passed through
The Proposal:
I suggest that libvirt implement a new default algorithm for placing
(i.e. assigning addresses to) both PCI and PCIe devices for (only)
PAPR guests.
The short summary is that by default it should assign each device to a
separate vPHB, creating vPHBs as necessary.
* For passthrough sometimes a group of host devices can't be safely
isolated from each other - this is known as a (host) Partitionable
Endpoint (PE). In this case, if any device in the PE is passed
through to a guest, the whole PE must be passed through to the
same vPHB in the guest. From the guest POV, each vPHB has exactly
one (guest) PE.
* To allow for hotplugged devices, libvirt should also add a number
of additional, empty vPHBs (the PAPR spec allows for hotplug of
PHBs, but this is not yet implemented in qemu). When hotplugging
a new device (or PE) libvirt should locate a vPHB which doesn't
currently contain anything.
* libvirt should only (automatically) add PHBs - never root ports or
other PCI to PCI bridges
In order to handle migration, the vPHBs will need to be represented in
the domain XML, which will also allow the user to override this
topology if they want.
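Purely as an illustration of what that might look like - the exact schema
and controller model name would need to be agreed on - a guest with one
assigned device and a spare vPHB for hotplug could end up with something
along these lines, where each guest 'bus' number corresponds to one vPHB:

<controller type='pci' index='0' model='pci-root'/>
<controller type='pci' index='1' model='pci-root'/>
<controller type='pci' index='2' model='pci-root'/>  <!-- spare, for hotplug -->
<hostdev mode='subsystem' type='pci'>
  <source>
    <address domain='0x0001' bus='0x03' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</hostdev>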
Advantages:
There are still some details I need to figure out w.r.t. handling PCIe
devices (on both the qemu and libvirt sides). However the fact that
PAPR guests don't typically see PCIe root ports means that the normal
libvirt PCIe allocation scheme won't work. This scheme has several
advantages with or without support for PCIe devices:
* Better performance for 32-bit devices
With multiple devices on a single vPHB they all must share a (fairly
small) 32-bit DMA/IOMMU window. With separate PHBs they each have a
separate window. PAPR guests have an always-on, guest-visible IOMMU.
* Better EEH handling for passthrough devices
EEH is an IBM hardware-assisted mechanism for isolating and safely
resetting devices experiencing hardware faults so they don't bring
down other devices or the system at large. It's roughly similar to
PCIe AER in concept, but has a different IBM specific interface, and
works on both PCI and PCIe devices.
Currently the kernel interfaces for handling EEH events on passthrough
devices will only work if there is a single (host) iommu group in the
vfio container. While lifting that restriction would be nice, it's
quite difficult to do so (it requires keeping state synchronized
between multiple host groups). That also means that an EEH error on
one device could stop another device where that isn't required by the
actual hardware.
The unit of EEH isolation is a PE (Partitionable Endpoint) and
currently there is only one guest PE per vPHB. Changing this might
also be possible, but is again quite complex and may result in
confusing and/or broken distinctions between groups for EEH isolation
and IOMMU isolation purposes.
Placing separate host groups in separate vPHBs sidesteps these
problems.
* Guest NUMA node assignment of devices
PAPR does not (and can't reasonably) use the pxb device. Instead to
allocate devices to different guest NUMA nodes they should be placed
on different vPHBs. Placing them on different PHBs by default allows
NUMA node to be assigned to those PHBs in a straightforward manner.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson