[libvirt PATCH v5 0/7] Add a PCI/PCIe device VPD Capability
by Dmitrii Shcherbakov
Add support for deserializing the binary PCI/PCIe VPD format and
exposing VPD resources as XML elements in a new nested capability
of PCI/PCIe devices called 'vpd'.
The series contains the following incremental changes:
* The PCI VPD parser module, in-memory resource representation
and tests;
* VPD-related helpers added to the virpci module;
* VPD capability support: XML serialization/deserialization from/into
VPD resource data structures;
* Documentation.
The VPD format is specified in "I.3. VPD Definitions" in PCI specs
(2.2+) and "6.28.1 VPD Format" PCIe 4.0. As section 6.28 in PCIe 4.0
notes, the PCI Local Bus and PCIe VPD formats are binary compatible
and PCIe 4.0 merely started incorporating what was already present in
PCI specs.
Linux kernel exposes a binary blob in the VPD format via sysfs since
v2.6.26 (commit 94e6108803469a37ee1e3c92dafdd1d59298602f) which requires
a parser to interpret.
There are usage scenarios where information such as the board serial
number needs to be retrieved from PCI(e) VPD. Projects like Nova can
utilize this information for cases which involve virtual interface
plugging on SmartNIC DPUs but there may be other scenarios and types of
information useful to retrieve from VPD. The fact that the format is
binary requires proper parsing instead of substring searching hence the
full parser is proposed. Likewise, checksum validation requires proper
parsing as well.
A usage example is present here:
https://review.opendev.org/c/openstack/nova/+/808199
The patch follows a prior discussion on the mailing list which has
additional context about the use-case but a narrower proposal:
https://listman.redhat.com/archives/libvir-list/2021-May/msg00873.html
https://www.mail-archive.com/libvir-list@redhat.com/msg218165.html
The new functionality is mostly contained in virpcivpd with a
couple of new functions added to virpci. Additionally, the necessary XML
serialization/deserialization and glue code is added to expose the VPD
capability to external clients as XML.
A new capability flag is added along with a new capability in order to
allow for filtering of PCI devices with the VPD capability using virsh:
virsh nodedev-list --cap vpd
sudo virsh nodedev-dumpxml --device pci_dddd_bb_ss_f
In this example having the root uid is required in order to access the
vpd sysfs entry, therefore, the nodedev XML output will only contain
the VPD capability if virsh is run as root.
The capability is treated as dynamic due to the presence of read-write
sections in the VPD format per PCI/PCIe specs (the idea being that
read-write resource fields may potentially be altered by the DPU OS
over time independently from the host OS).
Unit tests cover the parser functionality (including many possible
invalid cases), in-memory representation as well as XML serialization
and deserialization.
Manual functional testing was performed with 2 DPUs and several other
NIC models which expose PCI(e) VPD. Testing have also been performed
for devices that do not have VPD or those that expose a VPD capability
but exhibit invalid behavior (I/O errors while reading a sysfs entry).
Per the existing guidelines, the implementation relies heavily on glib
for various purposes.
https://libvirt.org/glib-adoption.html
* v5 fixes a couple of minor typos and adds NEWS entries.
* The v4 of the patch includes a number of fixes compared to v3:
* Fixed the patch to correctly build against older glib (2.56.0);
* Notably, glib commit 86c073dba9d82ef3f1bc3d3116b058b9b5c3b1eb (in
2.59.0) fixes g_autolist support for derivable Glib types. To make
things work in 2.56.0 a workaround is conditionally applied;
* virCreateAnonymousFile now uses a temporary file which is
unlinked after creation instead of memfd because OpenSUSE 15.2 does
not have support memfd;
* Keyword resources now use GTree instead of GHashTable as the
underlying data structure:
* This allows for stable ordering which is important for XML2XML tests
as they were failing with when GLib versions were different,
resulting in a different ordering of elements;
* The keyword resource iteration function was complex and got replaced
by a simpler g_tree_foreach-based approach;
* Added more testing: functions added to virpci are now assessed by
creating a mocked vpd file under a mocked sysfs structure while the
parser is still tested in virpcivpdtest file;
* Refactoring:
* Applied changes based on the indent tool operation with some
post-processing;
* Renamed functions which had the Glib naming style to use camel case
where possible. Auto-generated declarations are an exception:
gobject/gtype.h defines type_name##_init, type_name##_class_init,
module_obj_name##_get_type functions which were left unchanged;
* camelCase is now used for local variables and function parameters;
* Replaced //-style comments with multi-line ones;
* Split out one patch into 4 based on distinct features:
* PCI VPD parser functionality and the respective in-memory types;
* VPD helpers in virpci;
* XML serialization/deserialization and VPD capability support;
* Documentation.
Build & test results for targets in ci/manifest.yaml:
ci/helper test --meson-args='-Dexpensive_tests=enabled' <target>
https://gist.github.com/dshcherb/225b5da9478275f08c220487814ffd1c
Dmitrii Shcherbakov (7):
Add a PCI/PCIe device VPD Parser
News: Add a PCI VPD parser
Add PCI VPD-related helper functions to virpci
News: Add PCI VPD-related helper functions to virpci
Add PCI VPD Capability Support
News: Add PCI VPD capability support
Add PCI VPD Capability Documentation
NEWS.rst | 22 +
build-aux/syntax-check.mk | 4 +-
docs/drvnodedev.html.in | 46 ++
docs/formatnode.html.in | 24 +-
docs/schemas/nodedev.rng | 40 +
include/libvirt/libvirt-nodedev.h | 1 +
po/POTFILES.in | 1 +
src/conf/node_device_conf.c | 271 +++++++
src/conf/node_device_conf.h | 6 +-
src/conf/virnodedeviceobj.c | 7 +-
src/libvirt_private.syms | 17 +
src/node_device/node_device_driver.c | 2 +
src/node_device/node_device_udev.c | 2 +
src/util/meson.build | 1 +
src/util/virpci.c | 62 ++
src/util/virpci.h | 3 +
src/util/virpcivpd.c | 755 ++++++++++++++++++
src/util/virpcivpd.h | 117 +++
src/util/virpcivpdpriv.h | 45 ++
tests/meson.build | 1 +
.../pci_0000_42_00_0_vpd.xml | 33 +
.../pci_0000_42_00_0_vpd.xml | 1 +
tests/nodedevxml2xmltest.c | 1 +
tests/testutils.c | 40 +
tests/testutils.h | 4 +
tests/virpcimock.c | 30 +
tests/virpcitest.c | 64 ++
tests/virpcivpdtest.c | 705 ++++++++++++++++
tools/virsh-nodedev.c | 3 +
29 files changed, 2303 insertions(+), 5 deletions(-)
create mode 100644 src/util/virpcivpd.c
create mode 100644 src/util/virpcivpd.h
create mode 100644 src/util/virpcivpdpriv.h
create mode 100644 tests/nodedevschemadata/pci_0000_42_00_0_vpd.xml
create mode 120000 tests/nodedevxml2xmlout/pci_0000_42_00_0_vpd.xml
create mode 100644 tests/virpcivpdtest.c
--
2.30.2
3 years, 6 months
Release of libvirt-7.8.0
by Jiri Denemark
The 7.8.0 release of both libvirt and libvirt-python is tagged and
signed tarballs and source RPMs are available at
https://libvirt.org/sources/
https://libvirt.org/sources/python/
Thanks everybody who helped with this release by sending patches,
reviewing, testing, or providing feedback. Your work is greatly
appreciated.
* New features
* nodedev: Add ability to automatically start mediated devices
The autostart status of a persistent mediated devices can be managed with
the new APIs ``virNodeDeviceSetAutostart()`` and
``virNodeDeviceGetAutostart()``. The corresponding virsh command is
``nodedev-autostart``. In addition, two new APIs were added to get
additional information about node devices: ``virNodeDeviceIsPersistent()``
checks whether the device is persistently defined, and
``virNodeDeviceIsActive()`` checks whether the node device is currently
active. This information can also be retrieved with the new virsh command
``nodedev-info``.
Enjoy.
Jirka
3 years, 6 months
[PATCH] conf: fix block type CDROM cannot support startupPolicy
by Jie Wang
block type CDROM also support startupPolicy in the past, so
let us fix it.
Signed-off-by: Jie Wang <wangjie88(a)huawei.com>
---
src/conf/domain_conf.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index f7025bffe4..dd374e8ab3 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -30853,9 +30853,10 @@ virDomainDiskTranslateSourcePool(virDomainDiskDef *def)
}
if (def->startupPolicy != 0 &&
- virStorageSourceGetActualType(def->src) != VIR_STORAGE_TYPE_FILE) {
+ (virStorageSourceGetActualType(def->src) != VIR_STORAGE_TYPE_FILE &&
+ virStorageSourceGetActualType(def->src) != VIR_STORAGE_TYPE_BLOCK)) {
virReportError(VIR_ERR_XML_ERROR, "%s",
- _("'startupPolicy' is only valid for 'file' type volume"));
+ _("'startupPolicy' is only valid for 'file' or 'block' type volume"));
return -1;
}
--
2.24.0.windows.2
3 years, 6 months
Re: [PATCH] failover: allow to pause the VM during the migration
by Laine Stump
On 9/30/21 1:09 PM, Laurent Vivier wrote:
> If we want to save a snapshot of a VM to a file, we used to follow the
> following steps:
>
> 1- stop the VM:
> (qemu) stop
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> 3- resume the VM:
> (qemu) cont
>
> After that we can restore the snapshot with:
> qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> (qemu) cont
This is the basics of what libvirt does for a snapshot, and steps 1+2
are what it does for a "managedsave" (where it saves the snapshot to
disk and then terminates the qemu process, for later re-animation).
In those cases, it seems like this new parameter could work for us -
instead of explicitly pausing the guest prior to migrating it to disk,
we would set this new parameter to on, then directly migrate-to-disk
(relying on qemu to do the pause). Care will need to be taken to assure
that error recovery behaves the same though.
There are a couple of cases when libvirt apparently *doesn't* pause the
guest during the migrate-to-disk, both having to do with saving a
coredump of the guest. Since I really have no idea of how
common/important that is (or even if my assessment of the code is
correct), I'm Cc'ing this patch to libvir-list to make sure it catches
the attention of someone who knows the answers and implications.
> But when failover is configured, it doesn't work anymore.
>
> As the failover needs to ask the guest OS to unplug the card
> the machine cannot be paused.
>
> This patch introduces a new migration parameter, "pause-vm", that
> asks the migration to pause the VM during the migration startup
> phase after the the card is unplugged.
>
> Once the migration is done, we only need to resume the VM with
> "cont" and the card is plugged back:
>
> 1- set the parameter:
> (qemu) migrate_set_parameter pause-vm on
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> The primary failover card (VFIO) is unplugged and the VM is paused.
>
> 3- resume the VM:
> (qemu) cont
>
> The VM restarts and the primary failover card is plugged back
>
> The VM state sent in the migration stream is "paused", it means
> when the snapshot is loaded or if the stream is sent to a destination
> QEMU, the VM needs to be resumed manually.
>
> Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>
> ---
> qapi/migration.json | 20 +++++++++++++++---
> include/hw/virtio/virtio-net.h | 1 +
> hw/net/virtio-net.c | 33 ++++++++++++++++++++++++++++++
> migration/migration.c | 37 +++++++++++++++++++++++++++++++++-
> monitor/hmp-cmds.c | 8 ++++++++
> 5 files changed, 95 insertions(+), 4 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88f07baedd06..86284d96ad2a 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -743,6 +743,10 @@
> # block device name if there is one, and to their node name
> # otherwise. (Since 5.2)
> #
> +# @pause-vm: Pause the virtual machine before doing the migration.
> +# This allows QEMU to unplug a card before doing the
> +# migration as it depends on the guest kernel. (since 6.2)
> +#
> # Since: 2.4
> ##
> { 'enum': 'MigrationParameter',
> @@ -758,7 +762,7 @@
> 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> 'max-cpu-throttle', 'multifd-compression',
> 'multifd-zlib-level' ,'multifd-zstd-level',
> - 'block-bitmap-mapping' ] }
> + 'block-bitmap-mapping', 'pause-vm' ] }
>
> ##
> # @MigrateSetParameters:
> @@ -903,6 +907,10 @@
> # block device name if there is one, and to their node name
> # otherwise. (Since 5.2)
> #
> +# @pause-vm: Pause the virtual machine before doing the migration.
> +# This allows QEMU to unplug a card before doing the
> +# migration as it depends on the guest kernel. (since 6.2)
> +#
> # Since: 2.4
> ##
> # TODO either fuse back into MigrationParameters, or make
> @@ -934,7 +942,8 @@
> '*multifd-compression': 'MultiFDCompression',
> '*multifd-zlib-level': 'uint8',
> '*multifd-zstd-level': 'uint8',
> - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> + '*pause-vm': 'bool' } }
>
> ##
> # @migrate-set-parameters:
> @@ -1099,6 +1108,10 @@
> # block device name if there is one, and to their node name
> # otherwise. (Since 5.2)
> #
> +# @pause-vm: Pause the virtual machine before doing the migration.
> +# This allows QEMU to unplug a card before doing the
> +# migration as it depends on the guest kernel. (since 6.2)
> +#
> # Since: 2.4
> ##
> { 'struct': 'MigrationParameters',
> @@ -1128,7 +1141,8 @@
> '*multifd-compression': 'MultiFDCompression',
> '*multifd-zlib-level': 'uint8',
> '*multifd-zstd-level': 'uint8',
> - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> + '*pause-vm': 'bool' } }
>
> ##
> # @query-migrate-parameters:
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 824a69c23f06..a6c186e28b45 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -210,6 +210,7 @@ struct VirtIONet {
> bool failover;
> DeviceListener primary_listener;
> Notifier migration_state;
> + VMChangeStateEntry *vm_state;
> VirtioNetRssData rss_data;
> struct NetRxPkt *rx_pkt;
> struct EBPFRSSContext ebpf_rss;
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f205331dcf8c..c83364eac47b 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -39,6 +39,7 @@
> #include "migration/misc.h"
> #include "standard-headers/linux/ethtool.h"
> #include "sysemu/sysemu.h"
> +#include "sysemu/runstate.h"
> #include "trace.h"
> #include "monitor/qdev.h"
> #include "hw/pci/pci.h"
> @@ -3303,6 +3304,35 @@ static void virtio_net_migration_state_notifier(Notifier *notifier, void *data)
> virtio_net_handle_migration_primary(n, s);
> }
>
> +static void virtio_net_failover_restart_cb(void *opaque, bool running,
> + RunState state)
> +{
> + DeviceState *dev;
> + VirtIONet *n = opaque;
> + Error *err = NULL;
> + PCIDevice *pdev;
> +
> + if (!running) {
> + return;
> + }
> +
> + dev = failover_find_primary_device(n);
> + if (!dev) {
> + return;
> + }
> +
> + pdev = PCI_DEVICE(dev);
> + if (!pdev->partially_hotplugged) {
> + return;
> + }
> +
> + if (!failover_replug_primary(n, dev, &err)) {
> + if (err) {
> + error_report_err(err);
> + }
> + }
> +}
> +
> static bool failover_hide_primary_device(DeviceListener *listener,
> QemuOpts *device_opts)
> {
> @@ -3360,6 +3390,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> device_listener_register(&n->primary_listener);
> n->migration_state.notify = virtio_net_migration_state_notifier;
> add_migration_state_change_notifier(&n->migration_state);
> + n->vm_state = qemu_add_vm_change_state_handler(
> + virtio_net_failover_restart_cb, n);
> n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> }
>
> @@ -3508,6 +3540,7 @@ static void virtio_net_device_unrealize(DeviceState *dev)
> if (n->failover) {
> device_listener_unregister(&n->primary_listener);
> remove_migration_state_change_notifier(&n->migration_state);
> + qemu_del_vm_change_state_handler(n->vm_state);
> }
>
> max_queues = n->multiqueue ? n->max_queues : 1;
> diff --git a/migration/migration.c b/migration/migration.c
> index bb909781b7f5..9c96986d4abf 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -901,6 +901,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> s->parameters.block_bitmap_mapping);
> }
>
> + params->has_pause_vm = true;
> + params->pause_vm = s->parameters.pause_vm;
> +
> return params;
> }
>
> @@ -1549,6 +1552,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> dest->has_block_bitmap_mapping = true;
> dest->block_bitmap_mapping = params->block_bitmap_mapping;
> }
> +
> + if (params->has_pause_vm) {
> + dest->has_pause_vm = true;
> + dest->pause_vm = params->pause_vm;
> + }
> }
>
> static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1671,6 +1679,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> QAPI_CLONE(BitmapMigrationNodeAliasList,
> params->block_bitmap_mapping);
> }
> +
> + if (params->has_pause_vm) {
> + s->parameters.pause_vm = params->pause_vm;
> + }
> }
>
> void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> @@ -1718,6 +1730,12 @@ void qmp_migrate_start_postcopy(Error **errp)
> " started");
> return;
> }
> +
> + if (s->parameters.pause_vm) {
> + error_setg(errp, "Postcopy cannot be started if pause-vm is on");
> + return;
> + }
> +
> /*
> * we don't error if migration has finished since that would be racy
> * with issuing this command.
> @@ -3734,13 +3752,27 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
> "failure");
> }
> }
> -
> migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
> } else {
> migrate_set_state(&s->state, old_state, new_state);
> }
> }
>
> +/* stop the VM before starting the migration but after device unplug */
> +static void pause_vm_after_unplug(MigrationState *s)
> +{
> + if (s->parameters.pause_vm) {
> + qemu_mutex_lock_iothread();
> + qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> + s->vm_was_running = runstate_is_running();
> + if (vm_stop_force_state(RUN_STATE_PAUSED)) {
> + migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> + MIGRATION_STATUS_FAILED);
> + }
> + qemu_mutex_unlock_iothread();
> + }
> +}
> +
> /*
> * Master migration thread on the source VM.
> * It drives the migration and pumps the data down the outgoing channel.
> @@ -3790,6 +3822,8 @@ static void *migration_thread(void *opaque)
> qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_ACTIVE);
>
> + pause_vm_after_unplug(s);
> +
> s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>
> trace_migration_thread_setup_complete();
> @@ -4265,6 +4299,7 @@ static void migration_instance_init(Object *obj)
> params->has_announce_max = true;
> params->has_announce_rounds = true;
> params->has_announce_step = true;
> + params->has_pause_vm = true;
>
> qemu_sem_init(&ms->postcopy_pause_sem, 0);
> qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index b5e71d9e6f52..71bc8c15335b 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -513,6 +513,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
> }
> }
> }
> +
> + monitor_printf(mon, "%s: %s\n",
> + MigrationParameter_str(MIGRATION_PARAMETER_PAUSE_VM),
> + params->pause_vm ? "on" : "off");
> }
>
> qapi_free_MigrationParameters(params);
> @@ -1399,6 +1403,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
> error_setg(&err, "The block-bitmap-mapping parameter can only be set "
> "through QMP");
> break;
> + case MIGRATION_PARAMETER_PAUSE_VM:
> + p->has_pause_vm = true;
> + visit_type_bool(v, param, &p->pause_vm, &err);
> + break;
> default:
> assert(0);
> }
>
3 years, 6 months