[libvirt] [RFC PATCH 0/9] Introduce mediate ops in vfio-pci
by Yan Zhao
For SR-IOV devices, VFs are passed through to the guest directly, without host
driver mediation. However, when a VM is migrated with passed-through VFs,
dynamic host mediation is required to (1) get device states and (2) get
dirty pages. Since device states, as well as other critical information
required for dirty page tracking for VFs, are usually retrieved from PFs,
it is handy to provide an extension in the PF driver to centrally control
VFs' migration.
Therefore, in order to (1) pass VFs through in normal operation, (2)
dynamically trap VFs' bars for dirty page tracking, and (3) centralize
retrieval of critical VF state and VF control in one driver, we propose
to introduce mediate ops on top of the current vfio-pci device driver.
                                   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 __________   register mediate ops|  ___________     ___________    |
|          |<-----------------------|     VF    |   |           |
| vfio-pci |                      | |  mediate  |   | PF driver |   |
|__________|----------------------->|   driver  |   |___________|
     |            open(pdev)      |  -----------          |         |
     |                                                    |
     |                            |_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _|
    \|/                                                  \|/
-----------                                         ------------
|    VF   |                                         |    PF    |
-----------                                         ------------
The VF mediate driver could be a standalone driver that does not bind to
any devices (as in the demo code in patches 5-6), or it could be a built-in
extension of the PF driver (as in patches 7-9).
Rather than binding directly to the VF, the VF mediate driver registers its
mediate ops with vfio-pci at driver init. vfio-pci maintains a list of such
mediate ops.
(Note: the VF mediate driver can register mediate ops with vfio-pci
before vfio-pci binds to any device, and one VF mediate driver can
support mediating multiple devices.)
When opening a device (e.g. a VF), vfio-pci goes through the mediate ops
list and calls each vfio_pci_mediate_ops->open() with pdev of the opening
device as a parameter.
The VF mediate driver should return success or failure depending on whether
it supports the pdev.
E.g. the VF mediate driver could compare its supported VF devfn with the
devfn of the passed-in pdev.
Once vfio-pci finds a successful vfio_pci_mediate_ops->open(), it stops
querying other mediate ops and binds the opening device to this
mediate ops using the returned mediate handle.
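The open-time lookup described above can be modeled in plain userspace C. This is only a sketch under stated assumptions: `pdev_model`, `mediate_ops_model`, `register_mediate_ops()` and `bind_mediate_ops()` are hypothetical stand-ins, not the structures or signatures from the patches, and the devfn values are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pci_dev. */
struct pdev_model {
    unsigned int devfn;
};

/* Hypothetical, reduced form of a mediate ops table. */
struct mediate_ops_model {
    const char *name;
    /* Returns 0 and fills *handle if this driver supports pdev. */
    int (*open)(const struct pdev_model *pdev, int *handle);
};

#define MAX_OPS 8
static const struct mediate_ops_model *ops_list[MAX_OPS];
static size_t ops_count;

/* Append one ops table to the list kept by the (modeled) vfio-pci. */
static int register_mediate_ops(const struct mediate_ops_model *ops)
{
    if (ops_count == MAX_OPS)
        return -1;
    ops_list[ops_count++] = ops;
    return 0;
}

/* Walk the list and bind to the first ops whose open() succeeds. */
static const struct mediate_ops_model *
bind_mediate_ops(const struct pdev_model *pdev, int *handle)
{
    for (size_t i = 0; i < ops_count; i++) {
        if (ops_list[i]->open(pdev, handle) == 0)
            return ops_list[i];
    }
    return NULL; /* nobody claimed the device: plain passthrough */
}

/* Example mediate driver that only supports devfn 0x10 and 0x11. */
static int demo_open(const struct pdev_model *pdev, int *handle)
{
    if (pdev->devfn != 0x10 && pdev->devfn != 0x11)
        return -1;
    *handle = (int)pdev->devfn; /* the handle tells the VFs apart */
    return 0;
}

static const struct mediate_ops_model demo_ops = { "demo", demo_open };

/* Small usage example for the lookup. */
static void open_lookup_demo(void)
{
    struct pdev_model vf = { 0x11 }, other = { 0x20 };
    int handle = -1;
    int rc = register_mediate_ops(&demo_ops);

    assert(rc == 0);
    assert(bind_mediate_ops(&vf, &handle) == &demo_ops);
    assert(handle == 0x11);
    assert(bind_mediate_ops(&other, &handle) == NULL);
}
```

In this model a device that no mediate driver claims simply gets no ops bound, which corresponds to the unmediated passthrough case.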
Further vfio-pci ops (the VFIO_DEVICE_GET_REGION_INFO ioctl, rw, mmap) on the
VF are then intercepted by the VF mediate driver as
vfio_pci_mediate_ops->get_region_info(),
vfio_pci_mediate_ops->rw,
vfio_pci_mediate_ops->mmap, where they can be customized.
vfio_pci_mediate_ops->rw and vfio_pci_mediate_ops->mmap additionally
return 'pt' to indicate whether vfio-pci should pass the access
through to hw.
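The role of 'pt' can be illustrated with a small userspace model. This is a sketch, not the patch code: the function names, the trapped window (`TRAP_START`..`TRAP_END`) and the arrays standing in for hardware are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trapped window inside a bar, in bytes. */
#define TRAP_START 0x1000u
#define TRAP_END   0x2000u

static uint8_t shadow[TRAP_END - TRAP_START]; /* software-emulated range */
static uint8_t hw_bar[0x4000];                /* stand-in for real hardware */

/*
 * Hypothetical mediate rw hook. Accesses fully inside the trapped window
 * are served from the shadow copy; for everything else *pt is set so the
 * caller passes the access through. (Accesses straddling the window edge
 * are also passed through here, which is a simplification.)
 */
static long mediate_rw(uint8_t *buf, size_t count, uint64_t pos,
                       bool iswrite, bool *pt)
{
    if (pos < TRAP_START || pos + count > TRAP_END) {
        *pt = true;
        return 0;
    }
    *pt = false;
    if (iswrite)
        memcpy(&shadow[pos - TRAP_START], buf, count);
    else
        memcpy(buf, &shadow[pos - TRAP_START], count);
    return (long)count;
}

/* Model of the caller: ask the mediate hook first, then fall back to hw. */
static long device_rw(uint8_t *buf, size_t count, uint64_t pos, bool iswrite)
{
    bool pt = true;
    long ret = mediate_rw(buf, count, pos, iswrite, &pt);

    if (!pt)
        return ret;
    if (pos + count > sizeof(hw_bar))
        return -1;
    if (iswrite)
        memcpy(&hw_bar[pos], buf, count);
    else
        memcpy(buf, &hw_bar[pos], count);
    return (long)count;
}

/* Small usage example: one trapped write, one passed-through write. */
static void rw_demo(void)
{
    uint8_t v = 0xab;
    long n;

    n = device_rw(&v, 1, 0x1004, true);  /* inside the window: trapped */
    assert(n == 1 && shadow[4] == 0xab && hw_bar[0x1004] == 0);
    n = device_rw(&v, 1, 0x10, true);    /* outside: reaches "hardware" */
    assert(n == 1 && hw_bar[0x10] == 0xab);
}
```

Keeping the fallback in the caller means the mediate hook only needs to implement the ranges it actually wants to emulate.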
When vfio-pci closes the VF, it calls vfio_pci_mediate_ops->release()
with the mediate handle as parameter.
The mediate handle returned from vfio_pci_mediate_ops->open() lets the VF
mediate driver differentiate two open VFs with the same device
id and vendor id.
When VF mediate driver exits, it unregisters its mediate ops from
vfio-pci.
In this patchset, we enable vfio-pci to provide 3 things:
(1) call mediate ops to allow the vendor driver to customize the default
region info/rw/mmap of a region.
(2) provide a migration region to support migration.
(3) provide a dynamic trap bar info region to allow the vendor driver
to control trap/untrap of device pci bars.
This vfio-pci + mediate ops way differs from the mdev way in that
(1) the mdev way needs to create a 1:1 mdev device on top of one VF, and the
device-specific mdev parent driver is bound to the VF directly.
(2) the vfio-pci + mediate ops way does not create mdev devices and the VF
mediate driver does not bind to VFs. Instead, vfio-pci binds to VFs.
The reasons why we do not choose to write an mdev parent driver are
that
(1) VFs are directly passed through almost all the time. Binding directly
to vfio-pci lets most of the code be shared/reused. If we write a
vendor-specific mdev parent driver, most of the code (like the passthrough
style of rw/mmap) still needs to be copied from the vfio-pci driver, which is
duplicated and tedious work.
(2) For features like dynamically trapping/untrapping pci bars, if they are in
vfio-pci, they are available to most people without repeated code
copying and re-testing.
(3) With a 1:1 mdev driver which passes VFs through most of the time, people
have to decide whether to bind VFs to vfio-pci or to the mdev parent driver
before they run into a real migration need. If vfio-pci is bound
initially, they have no chance to do live migration when the need arises
later.
In this patchset,
- patches 1-4 enable vfio-pci to call mediate ops registered by vendor
driver to mediate/customize region info/rw/mmap.
- patches 5-6 provide a standalone sample driver that registers mediate ops
for Intel Graphics Devices. It does not bind to IGDs directly but decides
which devices it supports via its pciidlist. It also demonstrates how to
dynamically trap a device's PCI bars. (By adding more pciids to its
pciidlist, this sample driver is not necessarily limited to
supporting IGDs.)
- patches 7-9 provide a sample in the i40e driver, which supports the Intel(R)
Ethernet Controller XL710 Family of devices. It supports VF precopy live
migration on Intel's 710 SRIOV. (We commented out the real
implementation of the dirty page tracking and device state retrieval parts
to focus on demonstrating the framework part. We will send them out in future
versions.)
patch 7 registers/unregisters VF mediate ops when the PF driver
probes/removes. It specifies the VFs it supports via
vfio_pci_mediate_ops->open(pdev).
patch 8 reports the device cap VFIO_PCI_DEVICE_CAP_MIGRATION and
provides a sample implementation of the migration region.
The QEMU part of vfio migration is based on v8
https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05542.html.
We do not base it on the recent v9 because we think there are still open
issues in the dirty page tracking part of that series.
patch 9 reports the device cap VFIO_PCI_DEVICE_CAP_DYNAMIC_TRAP_BAR and
provides an example of how to trap part of bar0 when migration starts
and pass this part of bar0 through again when migration fails.
Yan Zhao (9):
vfio/pci: introduce mediate ops to intercept vfio-pci ops
vfio/pci: test existence before calling region->ops
vfio/pci: register a default migration region
vfio-pci: register default dynamic-trap-bar-info region
samples/vfio-pci/igd_dt: sample driver to mediate a passthrough IGD
sample/vfio-pci/igd_dt: dynamically trap/untrap subregion of IGD bar0
i40e/vf_migration: register mediate_ops to vfio-pci
i40e/vf_migration: mediate migration region
i40e/vf_migration: support dynamic trap of bar0
drivers/net/ethernet/intel/Kconfig | 2 +-
drivers/net/ethernet/intel/i40e/Makefile | 3 +-
drivers/net/ethernet/intel/i40e/i40e.h | 2 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +
.../ethernet/intel/i40e/i40e_vf_migration.c | 626 ++++++++++++++++++
.../ethernet/intel/i40e/i40e_vf_migration.h | 78 +++
drivers/vfio/pci/vfio_pci.c | 189 +++++-
drivers/vfio/pci/vfio_pci_private.h | 2 +
include/linux/vfio.h | 18 +
include/uapi/linux/vfio.h | 160 +++++
samples/Kconfig | 6 +
samples/Makefile | 1 +
samples/vfio-pci/Makefile | 2 +
samples/vfio-pci/igd_dt.c | 367 ++++++++++
14 files changed, 1455 insertions(+), 4 deletions(-)
create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
create mode 100644 samples/vfio-pci/Makefile
create mode 100644 samples/vfio-pci/igd_dt.c
--
2.17.1
[libvirt] [PATCH v5 0/4] PCI hostdev partial assignment support
by Daniel Henrique Barboza
changes from v4 [1]:
- previous patch 3 was removed. The validation it was
implementing proved to be too restrictive, while
providing no tangible benefits for the trouble of
having existing domains fail to launch. This
makes this series all about the new address type
implementation and its benefits. More info about the
rationale can be found at [1].
- documentation was changed to reflect this new
tone
[1] https://www.redhat.com/archives/libvir-list/2019-December/msg01016.html
Daniel Henrique Barboza (4):
Introducing new address type='unassigned' for PCI hostdevs
qemu: handle unassigned PCI hostdevs in command line
formatdomain.html.in: document <address type='unassigned'/>
news.xml: add address type='unassigned' entry
docs/formatdomain.html.in | 13 +++++
docs/news.xml | 14 +++++
docs/schemas/domaincommon.rng | 5 ++
src/conf/device_conf.c | 2 +
src/conf/device_conf.h | 1 +
src/conf/domain_conf.c | 7 ++-
src/qemu/qemu_command.c | 5 ++
src/qemu/qemu_domain.c | 1 +
src/qemu/qemu_domain_address.c | 5 ++
.../hostdev-pci-address-unassigned.args | 31 ++++++++++
.../hostdev-pci-address-unassigned.xml | 42 ++++++++++++++
tests/qemuxml2argvtest.c | 4 ++
.../hostdev-pci-address-unassigned.xml | 58 +++++++++++++++++++
tests/qemuxml2xmltest.c | 1 +
14 files changed, 188 insertions(+), 1 deletion(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-unassigned.xml
--
2.23.0
[libvirt] [PATCH] qemu: homogenize MAC address in live & config when hotplugging a netdev
by Laine Stump
Prior to commit 55ce6564634 (first in libvirt 4.6.0), the XML sent to
virDomainAttachDeviceFlags() was parsed only once, and the results of
that parse were inserted into both the live object of the running
domain and into the persistent config. Thus, if the MAC address was
omitted from the XML for a network device (<interface>), both the live
and config objects would have the same MAC address.
Commit 55ce6564634 changed the code to parse the incoming XML twice -
once for live and once for config. This does eliminate the problem of
PCI (/scsi/sata) address conflicts caused by allocating an address
based on existing devices in live object, but then inserting the
result into the config (which may already have a device using that
address), BUT it also means that when the MAC address of a network
device hasn't been specified in the XML, each copy will get a
different auto-generated MAC address.
This results in the MAC address of the device changing the next time
the domain is shut down and restarted, which creates havoc with the
guest OS's network config.
There have been several discussions about this over the last year or more,
attempting to find the ideal solution to this problem that makes MAC
addresses consistent and accounts for all sorts of corner cases with
PCI/scsi/sata addresses. All of these discussions fizzled out because
every proposal was either too difficult to implement or failed to fix
some esoteric case someone thought up.
So, in the interest of solving the MAC address problem while not
making the "other address" situation any worse than before, this patch
simply adds a qemuDomainAttachDeviceLiveAndConfigHomogenize() function
that (for now) copies the MAC address from the config object to the
live object. (If the original XML had <mac address='blah'/>, this
is effectively a NOP, as the MACs already match.)
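The problem and the fix can be reproduced in a small standalone model. This is a sketch only: `net_def_model`, `parse_net()` and `homogenize()` are hypothetical names, and a counter stands in for the random MAC generator so that every "parse" without an explicit MAC yields a distinct address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for a parsed <interface> definition. */
struct net_def_model {
    unsigned char mac[MAC_LEN];
};

static unsigned int mac_counter; /* replaces the random generator */

/*
 * Models one parser run: take the MAC from the XML if one was given,
 * otherwise auto-generate a new one.
 */
static void parse_net(struct net_def_model *def, const unsigned char *xml_mac)
{
    if (xml_mac) {
        memcpy(def->mac, xml_mac, MAC_LEN);
        return;
    }
    mac_counter++;
    def->mac[0] = 0x52;
    def->mac[1] = 0x54;
    def->mac[2] = 0x00;
    def->mac[3] = (unsigned char)((mac_counter >> 16) & 0xff);
    def->mac[4] = (unsigned char)((mac_counter >> 8) & 0xff);
    def->mac[5] = (unsigned char)(mac_counter & 0xff);
}

/* Models the fixup: the config copy wins, the live copy is overwritten. */
static void homogenize(const struct net_def_model *conf,
                       struct net_def_model *live)
{
    memcpy(live->mac, conf->mac, MAC_LEN);
}

/* Small usage example: parse twice without a MAC, then homogenize. */
static void mac_demo(void)
{
    struct net_def_model conf, live;

    parse_net(&conf, NULL);
    parse_net(&live, NULL);
    assert(memcmp(conf.mac, live.mac, MAC_LEN) != 0); /* the bug */

    homogenize(&conf, &live);
    assert(memcmp(conf.mac, live.mac, MAC_LEN) == 0); /* after the fix */
}
```

When the XML does carry a MAC address, both parses already agree and the copy changes nothing.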
Any downstream libvirt containing upstream commit
55ce6564634 should have this patch as well.
https://bugzilla.redhat.com/1783411
Signed-off-by: Laine Stump <laine(a)redhat.com>
---
src/qemu/qemu_driver.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 924f01d3eb..19ddff80b5 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -8623,6 +8623,30 @@ qemuDomainUpdateDeviceConfig(virDomainDefPtr vmdef,
     return 0;
 }
+
+static void
+qemuDomainAttachDeviceLiveAndConfigHomogenize(const virDomainDeviceDef *devConf,
+                                              virDomainDeviceDefPtr devLive)
+{
+    /*
+     * fixup anything that needs to be identical in the live and
+     * config versions of DeviceDef, but might not be. Do this by
+     * changing the contents of devLive.
+     */
+
+    /* MAC address should be identical in both DeviceDefs, but if it
+     * wasn't specified in the XML, and was instead autogenerated, it
+     * will be different for the two since they are each the result of
+     * a separate parser call. If it *was* specified, it will already
+     * be the same, so copying does no harm.
+     */
+
+    if (devConf->type == VIR_DOMAIN_DEVICE_NET)
+        virMacAddrSet(&devLive->data.net->mac, &devConf->data.net->mac);
+
+}
+
+
 static int
 qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm,
                                     virQEMUDriverPtr driver,
@@ -8633,6 +8657,7 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm,
     virDomainDefPtr vmdef = NULL;
     g_autoptr(virQEMUDriverConfig) cfg = NULL;
     virDomainDeviceDefPtr devConf = NULL;
+    virDomainDeviceDef devConfSave = { 0 };
     virDomainDeviceDefPtr devLive = NULL;
     int ret = -1;
     unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE |
@@ -8657,6 +8682,13 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm,
                                                 parse_flags)))
             goto cleanup;
+        /*
+         * devConf will be NULLed out by
+         * qemuDomainAttachDeviceConfig(), so save it for later use by
+         * qemuDomainAttachDeviceLiveAndConfigHomogenize()
+         */
+        devConfSave = *devConf;
+
         if (virDomainDeviceValidateAliasForHotplug(vm, devConf,
                                                    VIR_DOMAIN_AFFECT_CONFIG) < 0)
             goto cleanup;
@@ -8678,6 +8710,9 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm,
                                                 parse_flags)))
             goto cleanup;
+        if (flags & VIR_DOMAIN_AFFECT_CONFIG)
+            qemuDomainAttachDeviceLiveAndConfigHomogenize(&devConfSave, devLive);
+
         if (virDomainDeviceValidateAliasForHotplug(vm, devLive,
                                                    VIR_DOMAIN_AFFECT_LIVE) < 0)
             goto cleanup;
--
2.23.0
[libvirt] [PATCH 0/4] qemu: Error out if backing image format is not recorded in image metadata
by Peter Krempa
See patch 3/4 for explanation.
Peter Krempa (4):
tests: storage: Use strict version of virStorageFileGetMetadata
tests: storage: Remove unused test modes
util: storage: Don't treat files with missing backing store format as
'raw'
kbase: Add document outlining backing chain XML config and
troubleshooting
docs/kbase.html.in | 4 +
docs/kbase/backing_chains.rst | 185 ++++++++++++++++++++++++++++++++++
src/util/virstoragefile.c | 21 +++-
tests/virstoragetest.c | 42 +++-----
4 files changed, 221 insertions(+), 31 deletions(-)
create mode 100644 docs/kbase/backing_chains.rst
--
2.23.0
[libvirt] [PATCH v2 00/12] esx: various improvements
by Pino Toscano
- implement connectListAllStoragePools, so
virConnectListAllStoragePools() works
- implement connectListAllNetworks, so virConnectListAllNetworks()
works
- implement storagePoolListAllVolumes, so virStoragePoolListAllVolumes()
works
- set the proper filesystem type for vmfs-based datastores
Pino Toscano (12):
esx: split datastoreToStoragePoolPtr helper
esx: split datastorePoolType helper
esx: split targetToStoragePool helper
esx: implement connectListAllStoragePools
esx: split virtualswitchToNetwork helper
esx: implement connectListAllNetworks
storage: add vmfs filesystem type
esx: set vmfs fs type for vmfs-based datastores
esx: split datastorePathToStorageVol helper
esx: split scsilunToStorageVol helper
esx: implement storagePoolListAllVolumes
docs: document implemented APIs in esx
docs/news.xml | 11 +
docs/schemas/storagepool.rng | 1 +
docs/schemas/storagevol.rng | 1 +
docs/storage.html.in | 3 +
src/conf/storage_conf.c | 1 +
src/conf/storage_conf.h | 1 +
src/esx/esx_network_driver.c | 100 +++++-
src/esx/esx_storage_backend_iscsi.c | 234 +++++++++++---
src/esx/esx_storage_backend_vmfs.c | 306 +++++++++++++++---
src/esx/esx_storage_driver.c | 87 +++++
.../storagepoolcapsschemadata/poolcaps-fs.xml | 1 +
.../poolcaps-full.xml | 1 +
12 files changed, 655 insertions(+), 92 deletions(-)
--
2.21.0
[libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
by Daniel Henrique Barboza
changes from version 3 [1]:
- removed last 2 patches that made function 0 of
PCI multifunction devices mandatory
- new patch: news.xml update
- changed 'since' version to 6.0.0 in patch 4
- unassigned hostdevs are now getting qemu aliases
[1] https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
Daniel Henrique Barboza (5):
Introducing new address type='unassigned' for PCI hostdevs
qemu: handle unassigned PCI hostdevs in command line
virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
formatdomain.html.in: document <address type='unassigned'/>
news.xml: add address type='unassigned' entry
docs/formatdomain.html.in | 14 ++++
docs/news.xml | 19 ++++++
docs/schemas/domaincommon.rng | 5 ++
src/conf/device_conf.c | 2 +
src/conf/device_conf.h | 1 +
src/conf/domain_conf.c | 7 +-
src/qemu/qemu_command.c | 5 ++
src/qemu/qemu_domain.c | 1 +
src/qemu/qemu_domain_address.c | 5 ++
src/util/virhostdev.c | 64 +++++++++++++++++--
.../hostdev-pci-address-unassigned.args | 31 +++++++++
.../hostdev-pci-address-unassigned.xml | 42 ++++++++++++
tests/qemuxml2argvtest.c | 4 ++
.../hostdev-pci-address-unassigned.xml | 58 +++++++++++++++++
tests/qemuxml2xmltest.c | 1 +
15 files changed, 251 insertions(+), 8 deletions(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-unassigned.xml
--
2.23.0
[libvirt] [PATCH 0/4] Rewrite virGetUser*Directory() functions using g_get_*_dir()
by Fabiano Fidêncio
Rewriting the virGetUser*Directory() functions using the g_get_*_dir()
functions allows us to drop all the different implementations we
keep, as GLib already takes care of those for us.
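To give an idea of the kind of lookup that gets delegated to GLib, here is a plain C sketch of the XDG-style rule for the config directory on Unix-like systems: use $XDG_CONFIG_HOME if it is set and non-empty, otherwise fall back to $HOME/.config. The helper names `build_config_dir()` and `user_config_dir()` are hypothetical, the "libvirt" suffix is an assumption about what the caller appends, and the sketch leaves out the other fallbacks GLib has (such as consulting the passwd database when HOME is unset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<config base>/libvirt" from the two
 * environment values. Returns 0 on success, -1 if no base directory is
 * known or the result does not fit into buf.
 */
static int build_config_dir(const char *xdg, const char *home,
                            char *buf, size_t len)
{
    int n;

    if (xdg && xdg[0] != '\0')
        n = snprintf(buf, len, "%s/libvirt", xdg);
    else if (home && home[0] != '\0')
        n = snprintf(buf, len, "%s/.config/libvirt", home);
    else
        return -1;

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Convenience wrapper that reads the real environment. */
static int user_config_dir(char *buf, size_t len)
{
    return build_config_dir(getenv("XDG_CONFIG_HOME"), getenv("HOME"),
                            buf, len);
}

/* Small usage example with explicit values instead of the environment. */
static void config_dir_demo(void)
{
    char buf[64];
    int rc = build_config_dir(NULL, "/home/user", buf, sizeof(buf));

    assert(rc == 0);
    assert(strcmp(buf, "/home/user/.config/libvirt") == 0);
}
```

Splitting the pure path-building step from the environment lookup keeps the rule easy to test without touching the process environment.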
Fabiano Fidêncio (4):
util: Rewrite virGetUserDirectory() using g_get_home_dir()
util: Rewrite virGetUserConfigDirectory() using
g_get_user_config_dir()
util: Rewrite virGetUserCacheDirectory() using g_get_user_cache_dir()
util: Rewrite virGetUserRuntimeDirectory() using
g_get_user_runtime_dir()
src/util/virutil.c | 146 +++++++++++++--------------------------------
1 file changed, 40 insertions(+), 106 deletions(-)
--
2.23.0
[libvirt] [PATCH] tests: securityselinuxlabel: Add QEMU_CAPS_VNC to fake qemuCaps
by Peter Krempa
In commit 45270337f057f26ce484f6e I forgot to make sure that the tests pass.
Add the missing capability to fix the test.
Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
---
Pushed.
tests/securityselinuxlabeltest.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/securityselinuxlabeltest.c b/tests/securityselinuxlabeltest.c
index 192f2dc84f..3040a36693 100644
--- a/tests/securityselinuxlabeltest.c
+++ b/tests/securityselinuxlabeltest.c
@@ -351,6 +351,7 @@ mymain(void)
         return EXIT_FAILURE;
     virQEMUCapsSet(qemuCaps, QEMU_CAPS_DEVICE_CIRRUS_VGA);
+    virQEMUCapsSet(qemuCaps, QEMU_CAPS_VNC);
     if (qemuTestCapsCacheInsert(driver.qemuCapsCache, qemuCaps) < 0)
         return EXIT_FAILURE;
--
2.23.0
[libvirt] [PATCH 0/9] Cleanup virConnectPtr usage
by Michal Privoznik
I've noticed this problem when reviewing a patch on the list [1].
Long story short, dom->conn is not guaranteed to have all driver
pointers set (consider split daemons). I haven't identified any
other violation than what I'm fixing here. But something might have
slipped through my git grep.
1: https://www.redhat.com/archives/libvir-list/2019-December/msg00120.html
Michal Prívozník (9):
qemu_driver: Push qemuDomainInterfaceAddresses() a few lines down
qemu: Don't use dom->conn to lookup virNetwork
qemuGetDHCPInterfaces: Move some variables inside the loop
qemuGetDHCPInterfaces: Switch to GLib
libxl: Don't use dom->conn to lookup virNetwork
libxlGetDHCPInterfaces: Move some variables inside the loop
libxlGetDHCPInterfaces: Switch to GLib
lxc: Cleanup virConnectPtr usage
get_nonnull_domain: Drop useless comment
src/libxl/libxl_driver.c | 71 ++++------
src/lxc/lxc_driver.c | 12 +-
src/lxc/lxc_process.c | 40 +++---
src/lxc/lxc_process.h | 2 +-
src/qemu/qemu_driver.c | 193 ++++++++++++----------------
src/remote/remote_daemon_dispatch.c | 3 -
6 files changed, 139 insertions(+), 182 deletions(-)
--
2.23.0