[libvirt] Proposal PCI/PCIe device placement on PAPR guests
by David Gibson
There was a discussion back in November on the qemu list which spilled
onto the libvirt list about how to add support for PCIe devices to
POWER VMs, specifically 'pseries' machine type PAPR guests.
Here's a more concrete proposal for how to handle part of this in
future from the libvirt side. Strictly speaking, what I'm suggesting
here isn't intrinsically linked to PCIe, but it will make it easier
to add PCIe support sanely, and it has a number of advantages for
both PCIe and plain-PCI devices on PAPR guests.
Background:
* Currently the pseries machine type only supports vanilla PCI
buses.
* This is a qemu limitation, not something inherent - PAPR guests
running under PowerVM (the IBM hypervisor) can use passthrough
PCIe devices (PowerVM doesn't emulate devices though).
* In fact the way PCI access is para-virtualized in PAPR makes the
usual distinctions between PCI and PCIe largely disappear
* Presentation of PCIe devices to PAPR guests is unusual
* Unlike x86 and other "bare metal" platforms, root ports are
not made visible to the guest, i.e. all devices (typically)
appear as though they were integrated devices on x86
* In terms of topology all devices will appear in a way similar to
a vanilla PCI bus, even PCIe devices
* However PCIe extended config space is accessible
* This means libvirt's usual placement of PCIe devices is not
suitable for PAPR guests
* PAPR has its own hotplug mechanism
* This is used instead of standard PCIe hotplug
* This mechanism works for both PCIe and vanilla-PCI devices
* This can hotplug/unplug devices even without a root port P2P
bridge between it and the root "bus"
* Multiple independent host bridges are routine on PAPR
* Unlike PC (where all host bridges have multiplexed access to
configuration space) PCI host bridges (PHBs) are truly
independent for PAPR guests (disjoint MMIO regions in system
address space)
* PowerVM typically presents a separate PHB to the guest for each
host slot passed through
The Proposal:
I suggest that libvirt implement a new default algorithm for placing
(i.e. assigning addresses to) both PCI and PCIe devices for (only)
PAPR guests.
The short summary is that by default it should assign each device to a
separate vPHB, creating vPHBs as necessary.
* For passthrough sometimes a group of host devices can't be safely
isolated from each other - this is known as a (host) Partitionable
Endpoint (PE). In this case, if any device in the PE is passed
through to a guest, the whole PE must be passed through to the
same vPHB in the guest. From the guest POV, each vPHB has exactly
one (guest) PE.
* To allow for hotplugged devices, libvirt should also add a number
of additional, empty vPHBs (the PAPR spec allows for hotplug of
PHBs, but this is not yet implemented in qemu). When hotplugging
a new device (or PE) libvirt should locate a vPHB which doesn't
currently contain anything.
* libvirt should only (automatically) add PHBs - never root ports or
other PCI to PCI bridges
In order to handle migration, the vPHBs will need to be represented in
the domain XML, which will also allow the user to override this
topology if they want.
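To make the placement rule above concrete, here is a minimal sketch in
C. It is not libvirt code: struct placement, MAX_VPHBS, NO_PE and all
function names are hypothetical and exist only for this illustration.
It assumes every device carries a host PE id, and that an emulated
device is simply treated as a PE of its own with a unique id.

```c
#include <stddef.h>

#define MAX_VPHBS 32   /* hypothetical limit, chosen only for this sketch */
#define NO_PE     (-1) /* marks a vPHB that currently contains nothing */

/* Hypothetical model: each vPHB holds at most one (host) PE. */
struct placement {
    int pe_of_vphb[MAX_VPHBS]; /* PE id placed on each vPHB, or NO_PE */
    size_t nvphbs;             /* number of vPHBs created so far */
};

/* Return the index of the first vPHB holding `pe`, or -1 if none does.
 * Passing NO_PE finds the first empty vPHB. */
static int find_vphb(const struct placement *p, int pe)
{
    for (size_t i = 0; i < p->nvphbs; i++)
        if (p->pe_of_vphb[i] == pe)
            return (int)i;
    return -1;
}

/* Create a new vPHB holding `pe` (NO_PE creates an empty spare).
 * Returns its index, or -1 if the sketch's limit is reached. */
static int add_vphb(struct placement *p, int pe)
{
    if (p->nvphbs >= MAX_VPHBS)
        return -1;
    p->pe_of_vphb[p->nvphbs] = pe;
    return (int)p->nvphbs++;
}

/* Cold-plug: devices of the same PE share a vPHB, anything else
 * gets a freshly created vPHB. */
static int place_device(struct placement *p, int pe)
{
    int idx = find_vphb(p, pe);
    return idx >= 0 ? idx : add_vphb(p, pe);
}

/* Hotplug: vPHBs cannot be created at runtime in this model, so a new
 * PE must land on an existing empty vPHB; fails with -1 if none is left. */
static int hotplug_device(struct placement *p, int pe)
{
    int idx = find_vphb(p, pe);
    if (idx >= 0)
        return idx;            /* keep an already-plugged PE together */
    idx = find_vphb(p, NO_PE);
    if (idx >= 0)
        p->pe_of_vphb[idx] = pe;
    return idx;
}
```

In this sketch the spare vPHBs would be created with
add_vphb(&p, NO_PE) after all cold-plugged devices have been placed.
How many spares to create is a policy question the sketch leaves open.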
Advantages:
There are still some details I need to figure out w.r.t. handling PCIe
devices (on both the qemu and libvirt sides). However the fact that
PAPR guests don't typically see PCIe root ports means that the normal
libvirt PCIe allocation scheme won't work. This scheme has several
advantages with or without support for PCIe devices:
* Better performance for 32-bit devices
With multiple devices on a single vPHB they all must share a (fairly
small) 32-bit DMA/IOMMU window. With separate PHBs they each have a
separate window. PAPR guests have an always-on guest visible IOMMU.
* Better EEH handling for passthrough devices
EEH is an IBM hardware-assisted mechanism for isolating and safely
resetting devices experiencing hardware faults so they don't bring
down other devices or the system at large. It's roughly similar to
PCIe AER in concept, but has a different IBM specific interface, and
works on both PCI and PCIe devices.
Currently the kernel interfaces for handling EEH events on passthrough
devices will only work if there is a single (host) iommu group in the
vfio container. While lifting that restriction would be nice, it's
quite difficult to do so (it requires keeping state synchronized
between multiple host groups). That also means that an EEH error on
one device could stop another device where that isn't required by the
actual hardware.
The unit of EEH isolation is a PE (Partitionable Endpoint) and
currently there is only one guest PE per vPHB. Changing this might
also be possible, but is again quite complex and may result in
confusing and/or broken distinctions between groups for EEH isolation
and IOMMU isolation purposes.
Placing separate host groups in separate vPHBs sidesteps these
problems.
* Guest NUMA node assignment of devices
PAPR does not (and can't reasonably) use the pxb device. Instead, to
allocate devices to different guest NUMA nodes they should be placed
on different vPHBs. Placing them on different PHBs by default allows
NUMA nodes to be assigned to those PHBs in a straightforward manner.
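As a rough illustration of that last point, the sketch below (C, with
hypothetical struct and function names that do not correspond to any
existing libvirt or qemu interface) treats the NUMA node as a property
of the vPHB only. A device then inherits its node from the vPHB it
sits on, so no per-device NUMA setting is needed.

```c
#include <stddef.h>

/* Hypothetical model: the NUMA node is recorded per vPHB. */
struct vphb {
    int numa_node;   /* guest NUMA node assigned to this vPHB */
};

struct pci_device {
    const char *name;
    size_t vphb;     /* index of the vPHB the device is placed on */
};

/* A device's guest NUMA node is whatever its vPHB was assigned.
 * Returns -1 if the device refers to a vPHB that does not exist. */
static int device_numa_node(const struct vphb *phbs, size_t nphbs,
                            const struct pci_device *dev)
{
    if (dev->vphb >= nphbs)
        return -1;
    return phbs[dev->vphb].numa_node;
}
```

With one device per vPHB, moving a device to another guest NUMA node
only means changing the node of its vPHB.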
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
8 years, 1 month
Re: [libvirt] [Qemu-devel] [PATCH] qdev: Make "hotplugged" property read-only
by Eduardo Habkost
(CCing libvir-list and Laine)
On Tue, Jan 03, 2017 at 05:53:27PM +0100, Igor Mammedov wrote:
> On Tue, 3 Jan 2017 17:10:15 +0100
> Paolo Bonzini <pbonzini(a)redhat.com> wrote:
>
> >
> >
> > On 03/01/2017 15:22, Eduardo Habkost wrote:
> > >> I didn't know that. Is this documented somewhere?
> > >> Is it actually used by any existing software?
> > > not that I know of. But users should be fixed if they are not using it.
> > >
> > > I see. The problem is that the mechanism is undocumented,
> > > untested, and seems very likely to trigger bugs in device code.
> >
> > I agree. Why can't hotplugged be migrated?
> It's probably not migrated because it's not runtime/guest modified
> state so we don't have to migrate it as it's known in advance.
>
> For now it should be set manually on CLI (-device) with the rest of
> hotplugged device properties.
As this recommendation has the potential to trigger hidden bugs
(and is known to trigger a bug in QEMU <= 2.8), I would like it to
be properly documented, and the documentation/recommendations
reviewed following the usual patch review process.
Until that is done, setting "hotplugged=true" on the
command-line is an unused, undocumented, untested (and
unsupported?) feature.
--
Eduardo
8 years, 1 month
[libvirt] [PATCH v2 0/4] Virtio-crypto device support
by Longpeng(Mike)
virtio-crypto has been supported since QEMU 2.8 and the frontend
driver has been merged in Linux 4.10, so it's necessary to support
virtio-crypto in libvirt.
---
Changes since v1:
- split patch [Martin]
- rebase on master [Martin]
- add docs/tests/schema [Martin]
- fix typos [Gonglei]
---
Longpeng(Mike) (4):
docs: schema: Add basic documentation for the virtual crypto device
support
conf: Parse virtio-crypto in the domain XML
qemu: Implement support for 'builtin' backend for virtio-crypto
tests: Add testcase for virtio-crypto XML parsing
docs/formatdomain.html.in | 60 ++++++
docs/schemas/domaincommon.rng | 27 +++
src/conf/domain_conf.c | 213 ++++++++++++++++++++-
src/conf/domain_conf.h | 32 ++++
src/libvirt_private.syms | 2 +
src/qemu/qemu_alias.c | 20 ++
src/qemu/qemu_alias.h | 3 +
src/qemu/qemu_capabilities.c | 4 +
src/qemu/qemu_capabilities.h | 2 +
src/qemu/qemu_command.c | 132 +++++++++++++
src/qemu/qemu_command.h | 3 +
src/qemu/qemu_domain.c | 2 +
src/qemu/qemu_domain_address.c | 25 +++
src/qemu/qemu_driver.c | 6 +
src/qemu/qemu_hotplug.c | 1 +
tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 2 +
tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml | 2 +
.../qemuxml2argv-virtio-crypto-builtin.xml | 26 +++
.../qemuxml2argv-virtio-crypto.args | 22 +++
.../qemuxml2xmlout-virtio-crypto-builtin.xml | 31 +++
tests/qemuxml2xmltest.c | 2 +
21 files changed, 616 insertions(+), 1 deletion(-)
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-virtio-crypto-builtin.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-virtio-crypto.args
create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-virtio-crypto-builtin.xml
--
1.8.3.1
8 years, 1 month
[libvirt] [PATCH] node_device: Check return value for udev_new()
by Marc Hartmayer
The comment was actually wrong as
https://www.freedesktop.org/software/systemd/man/udev_new.html
mentions that on failure NULL is returned.
Signed-off-by: Marc Hartmayer <mhartmay(a)linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk(a)linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy(a)linux.vnet.ibm.com>
---
src/node_device/node_device_udev.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 4b81312..4b0a875 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1491,13 +1491,11 @@ static int nodeStateInitialize(bool privileged,
if (udevPCITranslateInit(privileged) < 0)
goto cleanup;
- /*
- * http://www.kernel.org/pub/linux/utils/kernel/hotplug/libudev/libudev-udev...
- *
- * indicates no return value other than success, so we don't check
- * its return value.
- */
udev = udev_new();
+ if (!udev) {
+ virReportOOMError();
+ goto cleanup;
+ }
#if HAVE_UDEV_LOGGING
/* cast to get rid of missing-format-attribute warning */
udev_set_log_fn(udev, (udevLogFunctionPtr) udevLogFunction);
--
2.5.5
8 years, 1 month
[libvirt] [PATCH 00/15] Add more vHBA related tests and module-arize the code
by John Ferlan
Don't be scared off by the quantity of patches...
There's quite a bit of code motion and function renaming going on before
being able to more easily add tests that will ensure that from a nodedev
perspective creation and deletion of the vHBA will work properly and it's
possible to test the various ways to create a vHBA (nothing provided, a
parent by name provided, a parent by wwnn/wwpn provided, and a parent
by fabric_wwn provided).
I did run this through the coverity checking code with no errors...
John Ferlan (15):
tests: Alter test_driver HBA name/data to be closer to reality
test: Add new NPIV capable HBA and a vHBA
test: Add helper to create vHBA for testNodeDeviceCreateXML
tests: Create a more realistic vHBA
test: Fix fchosttest resource leak
util: Create a new virvhba module and move/rename API's
util: Move/rename virStoragePoolGetVhbaSCSIHostParent to virvhba
util: Reduce complexity of virVHBAGetParent
util: Move scsi_host specific functions from virutil
tests: Add new fchosttest tests for management of a vHBA
nodedev: Keep the node device lock longer in nodeDeviceDestroy
nodedev: Rework virNodeDeviceGetParentHost
tests: Add createVHBAByNodeDevice-no-parent to fchosttest
tests: Add createVHBAByNodeDevice-parent-wwn to fchosttest
tests: Add createVHBAByNodeDevice-parent-fabric-wwn to fchosttest
po/POTFILES.in | 2 +
src/Makefile.am | 2 +
src/conf/node_device_conf.c | 76 +++-
src/conf/node_device_conf.h | 19 +-
src/conf/storage_conf.c | 100 ++---
src/conf/storage_conf.h | 5 -
src/libvirt_private.syms | 32 +-
src/node_device/node_device_driver.c | 92 ++--
src/node_device/node_device_linux_sysfs.c | 24 +-
src/storage/storage_backend_scsi.c | 68 +--
src/test/test_driver.c | 178 ++++++--
src/util/virscsi.c | 4 +
src/util/virscsihost.c | 297 ++++++++++++
src/util/virscsihost.h | 40 ++
src/util/virutil.c | 724 ------------------------------
src/util/virutil.h | 47 --
src/util/virvhba.c | 591 ++++++++++++++++++++++++
src/util/virvhba.h | 59 +++
tests/fchosttest.c | 183 ++++++--
tests/objecteventtest.c | 6 +-
tests/scsihosttest.c | 16 +-
tests/virrandommock.c | 9 +
22 files changed, 1463 insertions(+), 1111 deletions(-)
create mode 100644 src/util/virscsihost.c
create mode 100644 src/util/virscsihost.h
create mode 100644 src/util/virvhba.c
create mode 100644 src/util/virvhba.h
--
2.7.4
8 years, 1 month
[libvirt] [PATCH v2 00/12] qemu: migration: show disks stats for nbd migration
by Nikolay Shirokovskiy
diff from v1:
============
1. patch "qemu: clean out unused migrate to unix" is dropped
as it is already pushed.
2. a lot of refactoring patches added, namely all except
the last patch.
3. fetching mirroring stats is done separately from getting
migration status. Generally speaking, the refactoring patches
remove the function to fetch migration status altogether.
Currently, migration stats show something like [1] while
non-shared disks are being mirrored. This gives very little
info on the migration progress. Likewise, the completed stats
lack disk mirroring info.
This series provides disk stats in that phase, as in [2], so the
user can now understand what's going on. However, the data stats do
not include memory stats in that phase, so data total and remaining
will change when memory migration starts.
AFAIU disk stats were available before NBD-based migration
became the default. So this series brings disk stats back to
some extent.
[1]
Job type: Unbounded
Time elapsed: 4964 ms
[2]
Job type: Unbounded
Time elapsed: 4964 ms
Data processed: 146.000 MiB
Data remaining: 854.000 MiB
Data total: 1000.000 MiB
File processed: 146.000 MiB
File remaining: 854.000 MiB
File total: 1000.000 MiB
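The relation between the three "File" values in [2] can be sketched
as below. This is only an illustration in C with hypothetical names
(disk_mirror_stats, job_totals, sum_disk_stats), and it assumes the
job-level numbers are plain sums over the disks being mirrored, which
the cover letter does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical per-disk mirroring progress, in bytes. */
struct disk_mirror_stats {
    uint64_t processed; /* bytes already copied for this disk */
    uint64_t total;     /* bytes to copy for this disk */
};

/* Hypothetical job-level totals, in bytes. */
struct job_totals {
    uint64_t processed;
    uint64_t remaining;
    uint64_t total;
};

/* Sum the per-disk numbers; remaining is derived as total - processed. */
static struct job_totals sum_disk_stats(const struct disk_mirror_stats *disks,
                                        size_t ndisks)
{
    struct job_totals t = { 0, 0, 0 };

    for (size_t i = 0; i < ndisks; i++) {
        t.processed += disks[i].processed;
        t.total += disks[i].total;
    }
    t.remaining = t.total - t.processed;
    return t;
}
```

For example, two disks at 100 of 600 MiB and 46 of 400 MiB would give
the 146 / 854 / 1000 MiB split shown in [2].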
Nikolay Shirokovskiy (12):
qemu: qemuDomainJobInfoToParams drop unused code
qemu: introduce qemu domain job status
qemu: introduce QEMU_DOMAIN_JOB_STATUS_POSTCOPY
qemu: drop QEMU_MIGRATION_COMPLETED_UPDATE_STATS
qemu: drop excessive zero-out in qemuMigrationFetchJobStatus
qemu: drop fetch and update status functions
qemu: simplify getting completed job stats
qemu: drop unused code in qemuDomainGetJobStatsInternal
qemu: drop fetch flag in qemuDomainGetJobStatsInternal
qemu: split getting stats for migration and others
qemu: introduce QEMU_DOMAIN_JOB_STATUS_PREPARE
qemu: show disks stats for nbd migration
docs/news.html.in | 4 +
src/qemu/qemu_domain.c | 32 ++++++--
src/qemu/qemu_domain.h | 13 +++-
src/qemu/qemu_driver.c | 100 ++++++++++++++----------
src/qemu/qemu_migration.c | 195 ++++++++++++++++++++++++++--------------------
src/qemu/qemu_migration.h | 8 +-
src/qemu/qemu_process.c | 6 +-
7 files changed, 214 insertions(+), 144 deletions(-)
--
1.8.3.1
8 years, 1 month
[libvirt] [RFC PATCH 00/10] introduce push backups
by Nikolay Shirokovskiy
A push backup is a backup in which the hypervisor itself copies the
backup data to the destination, in contrast to a pull backup, in which
the hypervisor exports the backup data through some interface and the
management application makes the copy itself.
This patch series basically adds the API and the remote/qemu
implementation of backup creation, plus the corresponding backup XML
description definition.
Just like other block jobs, backup creation is asynchronous. That
is, creation merely starts the backup, and the client should track
backup error/completion through blockjob events. Another option
is to make backup a synchronous operation. AFAIU, going that way we
would have to make backup an asynchronous job and thus make all
modifying commands unavailable during the backup. This would make
backup a rather obtrusive operation, which is not convenient.
The backup XML description closely follows the snapshot one and
is described in more detail in the definition patch [1].
Nikolay Shirokovskiy (10):
api: add API to create backup
add driver based implementation of backup API
remote: add backup API
qemu: monitor: add backup command
misc: add backup block job type
conf: add backup definition [1]
qemu: add qemuDomainBackupCreateXML implementation
qemu: check backup destination before start
qemu: prepare backup destination
virsh: add create backup command
daemon/remote.c | 8 +
examples/object-events/event-test.c | 3 +
include/libvirt/libvirt-domain-backup.h | 59 +++++++
include/libvirt/libvirt-domain.h | 3 +
include/libvirt/libvirt.h | 1 +
include/libvirt/virterror.h | 2 +
po/POTFILES.in | 2 +
src/Makefile.am | 3 +
src/access/viraccessperm.c | 3 +-
src/access/viraccessperm.h | 6 +
src/conf/backup_conf.c | 294 ++++++++++++++++++++++++++++++++
src/conf/backup_conf.h | 69 ++++++++
src/conf/domain_conf.c | 2 +-
src/datatypes.c | 60 +++++++
src/datatypes.h | 29 ++++
src/driver-hypervisor.h | 6 +
src/libvirt-domain-backup.c | 203 ++++++++++++++++++++++
src/libvirt_private.syms | 9 +
src/libvirt_public.syms | 10 ++
src/qemu/qemu_conf.h | 1 +
src/qemu/qemu_domain.c | 14 ++
src/qemu/qemu_domain.h | 2 +
src/qemu/qemu_driver.c | 249 +++++++++++++++++++++++++++
src/qemu/qemu_monitor.c | 13 ++
src/qemu/qemu_monitor.h | 5 +
src/qemu/qemu_monitor_json.c | 36 ++++
src/qemu/qemu_monitor_json.h | 6 +
src/remote/remote_driver.c | 7 +
src/remote/remote_protocol.x | 24 ++-
src/rpc/gendispatch.pl | 29 +++-
src/util/virerror.c | 6 +
tools/Makefile.am | 1 +
tools/virsh-backup.c | 101 +++++++++++
tools/virsh-backup.h | 29 ++++
tools/virsh-domain.c | 3 +-
tools/virsh.c | 2 +
tools/virsh.h | 1 +
37 files changed, 1290 insertions(+), 11 deletions(-)
create mode 100644 include/libvirt/libvirt-domain-backup.h
create mode 100644 src/conf/backup_conf.c
create mode 100644 src/conf/backup_conf.h
create mode 100644 src/libvirt-domain-backup.c
create mode 100644 tools/virsh-backup.c
create mode 100644 tools/virsh-backup.h
--
1.8.3.1
8 years, 1 month
[libvirt] [PATCH 0/9] perf: Add software perf events
by Nitesh Konkar
This patch series adds software perf events.
Nitesh Konkar (9):
perf: add cpu_clock software perf event support
perf: add task_clock software perf event support
perf: add page_faults software perf event support
perf: add context_switches software perf event support
perf: add cpu_migrations software perf event support
perf: add page_faults_min software perf event support
perf: add page_faults_maj software perf event support
perf: add alignment_faults software perf event support
perf: add emulation_faults software perf event support
docs/formatdomain.html.in | 58 +++++++++++++++++++
docs/news.xml | 7 ++-
docs/schemas/domaincommon.rng | 9 +++
include/libvirt/libvirt-domain.h | 90 +++++++++++++++++++++++++++++
src/libvirt-domain.c | 24 ++++++++
src/qemu/qemu_driver.c | 9 +++
src/util/virperf.c | 33 ++++++++++-
src/util/virperf.h | 9 +++
tests/genericxml2xmlindata/generic-perf.xml | 9 +++
tools/virsh.pod | 35 +++++++++++
10 files changed, 280 insertions(+), 3 deletions(-)
--
1.9.3
8 years, 1 month
[libvirt] [PATCH] Validate required CPU features even for host-passthrough
by Ján Tomko
Commit adff345 allowed enabling features with -cpu host
without adjusting the validity checks on domain startup
and migration.
---
src/qemu/qemu_migration.c | 2 +-
src/qemu/qemu_process.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 0f4a6cf..0db1616 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2322,7 +2322,7 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver,
if (!qemuMigrationIsAllowedHostdev(vm->def))
return false;
- if (vm->def->cpu && vm->def->cpu->mode != VIR_CPU_MODE_HOST_PASSTHROUGH) {
+ if (vm->def->cpu) {
for (i = 0; i < vm->def->cpu->nfeatures; i++) {
virCPUFeatureDefPtr feature = &vm->def->cpu->features[i];
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 184440d..2ad6451 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3819,7 +3819,7 @@ qemuProcessVerifyGuestCPU(virQEMUDriverPtr driver,
}
}
- if (def->cpu && def->cpu->mode != VIR_CPU_MODE_HOST_PASSTHROUGH) {
+ if (def->cpu) {
for (i = 0; i < def->cpu->nfeatures; i++) {
virCPUFeatureDefPtr feature = &def->cpu->features[i];
--
2.10.2
8 years, 1 month
[libvirt] [PATCH] qemu: Filter ARAT CPU feature from host-model
by Jiri Denemark
ARAT feature was first introduced in QEMU 2.4.0, which means host-model
CPU mode is unusable with QEMU < 2.4.0 on any host CPU which supports
ARAT. Let's not include this feature in host-model CPUs unless a user
explicitly asks for it.
Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
---
Notes:
We will do this properly in the future since we will ask QEMU what CPU
features it understands and what it thinks about host CPU. However, the
required QMP interface is not upstream yet.
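For readers without the tree at hand, the filter touched by this patch
can be approximated by the standalone sketch below. It is a simplified
stand-in and not the libvirt function itself: the name
cpu_filter_features is made up for the example, and strcmp() replaces
libvirt's STREQ macro. The feature names are the ones visible in the
hunk, with "arat" added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns false for features that should be left out of host-model
 * CPUs and true for everything else. */
static bool cpu_filter_features(const char *name)
{
    static const char *const filtered[] = {
        "cmt", "mbm_total", "mbm_local", "arat",
    };

    for (size_t i = 0; i < sizeof(filtered) / sizeof(filtered[0]); i++) {
        if (strcmp(name, filtered[i]) == 0)
            return false;
    }
    return true;
}
```

A table-driven list is used here only to keep the example short. The
patch itself extends the existing chain of STREQ checks.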
src/qemu/qemu_capabilities.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index cfd090c3f..2da4c76ab 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2959,7 +2959,8 @@ virQEMUCapsCPUFilterFeatures(const char *name,
{
if (STREQ(name, "cmt") ||
STREQ(name, "mbm_total") ||
- STREQ(name, "mbm_local"))
+ STREQ(name, "mbm_local") ||
+ STREQ(name, "arat"))
return false;
return true;
--
2.11.0.rc2
8 years, 1 month