[PATCH rfcv4 00/13] LIBVIRT: X86: TDX support
by Zhenzhong Duan
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX?
TDX stands for Trust Domain Extensions which isolates VMs from
the virtual-machine manager (VMM)/hypervisor and any other software on
the platform.
To support TDX, multiple software components, not only KVM but also QEMU,
guest Linux and virtual bios, need to be updated. For more details, please
check link[1].
This patchset is another software component to extend libvirt to support TDX,
with which one can start a TDX guest from high level rather than running qemu
directly.
* Misc
As QEMU use a software emulated way to reset guest which isn't supported by TDX
guest for security reason. We simulate reboot for TDX guest by kill and create a
new one in FakeReboot framework.
Complete code can be found at [2], matching qemu code can be found at [3].
There is a 'debug' property for tdx-guest object which isn't in matching qemu[3]
yet. I keep them intentionally as they will be implemented in qemu as extention
series of [3].
* Test
start/stop/reboot with virsh
stop/reboot trigger in guest
stop with on_poweroff=destroy/restart
reboot with on_reboot=destroy/restart
* Patch organization
- patch 1-4: Support query of TDX capabilities.
- patch 5-8: Add TDX type to launchsecurity framework.
- patch 9-11: Add reboot support to TDX guest
- patch 12-13: Add test and docs
TODO:
- update QEMU capabilities data in tests, depending on qemu TDX merged beforehand
- add reconnect logic in virsh command
[1] https://lore.kernel.org/kvm/cover.1708933498.git.isaku.yamahata@intel.com
[2] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv4
[3] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v5
Thanks
Zhenzhong
Changelog:
rfcv4:
- add a check to tools/virt-host-validate-qemu.c (Daniel)
- remove check of q35 (Daniel)
- model 'SocktetAddress' QAPI in xml schema (Daniel)
- s/Quote-Generation-Service/quoteGenerationService/ (Daniel)
- define bits in tdx->policy and add validating logic (Daniel)
- presume QEMU choose split kernel irqchip for TDX guest by default (Daniel)
- utilize existing FakeReboot framework to do reboot for TDX guest (Daniel)
- drop patch11 'conf: Add support to keep same domid for hard reboot' (Daniel)
- add test in tests/ to validate parsing and formatting logic (Daniel)
- add doc in docs/formatdomain.rst (Daniel)
- add R-B
rfcv3:
- Change to generate qemu cmdline with -bios
- drop firmware auto match as -bios is used
- add a hard reboot method to reboot TDX guest
rfcv3: https://www.mail-archive.com/devel@lists.libvirt.org/msg00385.html
rfcv2:
- give up using qmp cmd and check TDX directly on host for TDX capabilities.
- use launchsecurity framework to support TDX
- use <os>.<loader> for general loader
- add auto firmware match feature for TDX
A example TDVF fimware description file 70-edk2-x86_64-tdx.json:
{
"description": "UEFI firmware for x86_64, supporting Intel TDX",
"interface-types": [
"uefi"
],
"mapping": {
"device": "generic",
"filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd"
},
"targets": [
{
"architecture": "x86_64",
"machines": [
"pc-q35-*"
]
}
],
"features": [
"intel-tdx",
"verbose-dynamic"
],
"tags": [
]
}
rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Zhenzhong Duan (13):
tools: Secure guest check for Intel in virt-host-validate
qemu: Check if INTEL Trust Domain Extention support is enabled
qemu: Add TDX capability
conf: expose TDX feature in domain capabilities
conf: add tdx as launch security type
qemu: Add command line and validation for TDX type
qemu: force special parameters enabled for TDX guest
Add Intel TDX Quote Generation Service(QGS) support
qemu: add FakeReboot support for TDX guest
qemu: Support reboot command in guest
qemu: Avoid duplicate FakeReboot for secure guest
Add test cases for Intel TDX
docs: domain: Add documentation for Intel TDX guest
docs/formatdomain.rst | 68 ++++
docs/formatdomaincaps.rst | 1 +
src/conf/domain_capabilities.c | 1 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 312 ++++++++++++++++++
src/conf/domain_conf.h | 75 +++++
src/conf/schemas/domaincaps.rng | 9 +
src/conf/schemas/domaincommon.rng | 135 ++++++++
src/conf/virconftypes.h | 2 +
src/qemu/qemu_capabilities.c | 36 +-
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 139 ++++++++
src/qemu/qemu_firmware.c | 1 +
src/qemu/qemu_monitor.c | 28 +-
src/qemu/qemu_monitor.h | 2 +-
src/qemu/qemu_monitor_json.c | 6 +-
src/qemu/qemu_namespace.c | 1 +
src/qemu/qemu_process.c | 75 +++++
src/qemu/qemu_validate.c | 44 +++
...unch-security-tdx-qgs-fd.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-fd.xml | 30 ++
...ch-security-tdx-qgs-inet.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-inet.xml | 30 ++
...ch-security-tdx-qgs-unix.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-unix.xml | 30 ++
...h-security-tdx-qgs-vsock.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-vsock.xml | 30 ++
tests/qemuxmlconftest.c | 24 ++
tools/virt-host-validate-common.c | 22 +-
tools/virt-host-validate-common.h | 1 +
30 files changed, 1407 insertions(+), 5 deletions(-)
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-fd.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-fd.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-inet.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-inet.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-unix.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-unix.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-vsock.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-vsock.xml
--
2.34.1
1 month, 2 weeks
Re: [PATCH] Сheck snapshot disk is not NULL when searching it in the VM config
by Peter Krempa
On Mon, May 20, 2024 at 14:48:47 +0000, Efim Shevrin via Devel wrote:
> Hello,
>
> > If vmdisk is NULL, shouldn't this function (qemuSnapshotDeleteValidate()) return an error?
>
> I think this qemuSnapshotDeleteValidate should not return an error.
>
> It seems to me that when vmdisk is NULL, this does not invalidate
> the snapshot itself, but indicates that the config has changed since
> the snapshot was done. And if the VM config has changed, this adds evidence that the snapshot should be deleted,
> because the snapshot does not reflect the real vm config.
>
> Since we do not have an analogue of the --force option for deleting a snapshot, in the case when qemuSnapshotDeleteValidate returns
> an error when vmdisk is NULL, we will never delete a snapshot which has invalid disk.
Snapshot deletion does have something that can be considered force and
that is the '--metadata' option that removes just the snapshot
definition (metadata) and doesn't touch the disk images.
> > Similarly, disk can be NULL too
> Thank you for the comment regarding the disk variable. I`ve reworked patch.
>
> When creating a snapshot of a VM with multiple hard disks,
> the snapshot takes into account the presence of all disks
> in the system. If, over time, one of the disks is deleted,
> the snapshot will continue to store knowledge of the deleted disk.
> This results in the fact that at the moment of deleting the snapshot,
> at the validation stage, a disk from the snapshot will be searched which
> is not in the VM configuration. As a result, vmdisk variable will
> be equal to NULL. Dereferencing a null pointer at the time of calling
> virStorageSourceIsSameLocation(vmdisk->src, disk->src)
> will result in SIGSEGV.
Crashing is obviously not okay ...
> Also, the disk variable can also be equal to NULL and this
> requires to check that disk != NULL before calling the
> virStorageSourceIsSameLocation function to avoid SIGSEGV.
.. but going ahead with the snapshot deletion isn't always okay either.
The disk isn't referenced by the VM so the disk state can't be merged,
while the state would be merged for any other disk.
When reverting back to a previous snapshot, which is still referencing
the older state of the disk which was removed from the VM, the VM would
see that the image state of disks that were present at deletion would
contain the merged state, but only a partial state for the disk which
was later removed.
2 months, 2 weeks
[PATCH v2] virsh: Provide completer for some pool-X-as commands
by Abhiram Tilak
Provides completers for auth-type and source-format commands for
virsh pool-create-as and pool-define-as commands. Use Empty completers
for options where completions are not required.
Related Issue: https://gitlab.com/libvirt/libvirt/-/issues/9
Signed-off-by: Abhiram Tilak <atp.exp(a)gmail.com>
---
Changes in v2:
- Fix all formatting errors
- Change some options using Empty completers, to use
LocalPath completers.
- Add completers for AdapterName and AdapterParent using information
from node devices.
src/libvirt_private.syms | 2 +
tools/virsh-completer-pool.c | 128 +++++++++++++++++++++++++++++++++++
tools/virsh-completer-pool.h | 20 ++++++
tools/virsh-pool.c | 9 +++
4 files changed, 159 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f0f7aa8654..fcb0ef7afe 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1117,6 +1117,8 @@ virStorageAuthDefCopy;
virStorageAuthDefFormat;
virStorageAuthDefFree;
virStorageAuthDefParse;
+virStorageAuthTypeFromString;
+virStorageAuthTypeToString;
virStorageFileFeatureTypeFromString;
virStorageFileFeatureTypeToString;
virStorageFileFormatTypeFromString;
diff --git a/tools/virsh-completer-pool.c b/tools/virsh-completer-pool.c
index 3568bb985b..7db2a20347 100644
--- a/tools/virsh-completer-pool.c
+++ b/tools/virsh-completer-pool.c
@@ -23,6 +23,7 @@
#include "virsh-completer-pool.h"
#include "virsh-util.h"
#include "conf/storage_conf.h"
+#include "conf/node_device_conf.h"
#include "virsh-pool.h"
#include "virsh.h"
@@ -106,3 +107,130 @@ virshPoolTypeCompleter(vshControl *ctl,
return virshCommaStringListComplete(type_str, (const char **)tmp);
}
+
+
+char **
+virshPoolFormatCompleter(vshControl *ctl G_GNUC_UNUSED,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ size_t i = 0;
+ size_t j = 0;
+ g_auto(GStrv) tmp = NULL;
+ size_t nformats = VIR_STORAGE_POOL_FS_LAST + VIR_STORAGE_POOL_NETFS_LAST +
+ VIR_STORAGE_POOL_DISK_LAST + VIR_STORAGE_POOL_LOGICAL_LAST;
+
+ virCheckFlags(0, NULL);
+
+ tmp = g_new0(char *, nformats + 1);
+
+ /* Club all PoolFormats for completion */
+ for (i = 0; i < VIR_STORAGE_POOL_FS_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatFileSystemTypeToString(i));
+
+ for (i = 0; i < VIR_STORAGE_POOL_NETFS_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatFileSystemNetTypeToString(i));
+
+ for (i = 1; i < VIR_STORAGE_POOL_DISK_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatDiskTypeToString(i));
+
+ for (i = 1; i < VIR_STORAGE_POOL_LOGICAL_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatLogicalTypeToString(i));
+
+ return g_steal_pointer(&tmp);
+}
+
+
+char **
+virshPoolAuthTypeCompleter(vshControl *ctl G_GNUC_UNUSED,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ size_t i = 0;
+ g_auto(GStrv) tmp = NULL;
+
+ virCheckFlags(0, NULL);
+
+ tmp = g_new0(char *, VIR_STORAGE_AUTH_TYPE_LAST + 1);
+
+ for (i = 0; i < VIR_STORAGE_AUTH_TYPE_LAST; i++)
+ tmp[i] = g_strdup(virStorageAuthTypeToString(i));
+
+ return g_steal_pointer(&tmp);
+}
+
+
+char **
+virshAdapterNameCompleter(vshControl *ctl,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ virshControl *priv = ctl->privData;
+ virNodeDevicePtr *devs = NULL;
+ int ndevs = 0;
+ size_t i = 0;
+ char **ret = NULL;
+ g_auto(GStrv) tmp = NULL;
+
+ virCheckFlags(0, NULL);
+
+ if (!priv->conn || virConnectIsAlive(priv->conn) <= 0)
+ return NULL;
+
+ flags |= VIR_CONNECT_LIST_NODE_DEVICES_CAP_SCSI_HOST;
+ if ((ndevs = virConnectListAllNodeDevices(priv->conn, &devs, flags)) < 0)
+ return NULL;
+
+ tmp = g_new0(char *, ndevs + 1);
+
+ for (i = 0; i < ndevs; i++) {
+ const char *name = virNodeDeviceGetName(devs[i]);
+
+ tmp[i] = g_strdup(name);
+ }
+
+ ret = g_steal_pointer(&tmp);
+
+ for (i = 0; i < ndevs; i++)
+ virshNodeDeviceFree(devs[i]);
+ g_free(devs);
+ return ret;
+}
+
+
+char **
+virshAdapterParentCompleter(vshControl *ctl,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ virshControl *priv = ctl->privData;
+ virNodeDevicePtr *devs = NULL;
+ int ndevs = 0;
+ size_t i = 0;
+ char **ret = NULL;
+ g_auto(GStrv) tmp = NULL;
+
+ virCheckFlags(0, NULL);
+
+ if (!priv->conn || virConnectIsAlive(priv->conn) <= 0)
+ return NULL;
+
+ flags |= VIR_CONNECT_LIST_NODE_DEVICES_CAP_VPORTS;
+ if ((ndevs = virConnectListAllNodeDevices(priv->conn, &devs, flags)) < 0)
+ return NULL;
+
+ tmp = g_new0(char *, ndevs + 1);
+
+ for (i = 0; i < ndevs; i++) {
+ const char *name = virNodeDeviceGetName(devs[i]);
+
+ tmp[i] = g_strdup(name);
+ }
+
+ ret = g_steal_pointer(&tmp);
+
+ for (i = 0; i < ndevs; i++)
+ virshNodeDeviceFree(devs[i]);
+ g_free(devs);
+ return ret;
+}
diff --git a/tools/virsh-completer-pool.h b/tools/virsh-completer-pool.h
index bff3e5742b..eccc08a73f 100644
--- a/tools/virsh-completer-pool.h
+++ b/tools/virsh-completer-pool.h
@@ -40,3 +40,23 @@ char **
virshPoolTypeCompleter(vshControl *ctl,
const vshCmd *cmd,
unsigned int flags);
+
+char **
+virshPoolFormatCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshPoolAuthTypeCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshAdapterNameCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshAdapterParentCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
diff --git a/tools/virsh-pool.c b/tools/virsh-pool.c
index f9aad8ded0..0cbd1417e6 100644
--- a/tools/virsh-pool.c
+++ b/tools/virsh-pool.c
@@ -80,31 +80,37 @@
{.name = "source-path", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("source path for underlying storage") \
}, \
{.name = "source-dev", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("source device for underlying storage") \
}, \
{.name = "source-name", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompleteEmpty, \
.help = N_("source name for underlying storage") \
}, \
{.name = "target", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("target for underlying storage") \
}, \
{.name = "source-format", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshPoolFormatCompleter, \
.help = N_("format for underlying storage") \
}, \
{.name = "auth-type", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshPoolAuthTypeCompleter, \
.help = N_("auth type to be used for underlying storage") \
}, \
{.name = "auth-username", \
@@ -126,6 +132,7 @@
{.name = "adapter-name", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshAdapterNameCompleter, \
.help = N_("adapter name to be used for underlying storage") \
}, \
{.name = "adapter-wwnn", \
@@ -141,6 +148,7 @@
{.name = "adapter-parent", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshAdapterParentCompleter, \
.help = N_("adapter parent scsi_hostN to be used for underlying vHBA storage") \
}, \
{.name = "adapter-parent-wwnn", \
@@ -161,6 +169,7 @@
{.name = "source-protocol-ver", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompleteEmpty, \
.help = N_("nfsvers value for NFS pool mount option") \
}, \
{.name = "source-initiator", \
--
2.39.2
4 months, 2 weeks
[PATCH v2 0/1] Expose availability of SEV-ES
by Takashi Kajinami
This introduces the new "model" field in sev elements so that clients can
check whether SEV-ES, the 2nd generation of AMD SEV, is available in
the taget hyprvisor. There is the maxESGuests field (along with the maxGuests
field) but this field does not explain whether SEV-ES is actually
enabled in KVM.
Takashi Kajinami (1):
Expose available AMD SEV models in domain capabilities
Changes since v1:
* Fixed one code path where available models are not added
* Fixed missing update of "report" flag
* Updated the documentation to explain the new model field in addition
to the existing but undocumanted cpu0Id field
Takashi Kajinami (1):
Expose available AMD SEV models in domain capabilities
docs/formatdomaincaps.rst | 5 ++
src/conf/domain_capabilities.c | 2 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 7 +++
src/conf/domain_conf.h | 8 ++++
src/qemu/qemu_capabilities.c | 84 +++++++++++++++++++++++++---------
6 files changed, 85 insertions(+), 22 deletions(-)
--
2.43.0
4 months, 2 weeks
[PATCH] formatstorage: Document qcow2 default version change
by Peter Krempa
Based on discussion after commit f432114d9c was pushed it was pointed
out that the documentation still mentions the older version.
Fix the documentation to state the new version and introduce ambiguity
for future updates.
Fixes: f432114d9cf507a4047aa9dc1344b1c13356db08
Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
---
Posting this documentation update to document what happened rather than
introduce (almost pointless) complication in adding a config file which
is unlikely to be ever used.
docs/formatstorage.rst | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/docs/formatstorage.rst b/docs/formatstorage.rst
index 86e167d9cb..9d9a4143eb 100644
--- a/docs/formatstorage.rst
+++ b/docs/formatstorage.rst
@@ -700,10 +700,15 @@ host filesystem. It can contain the following child elements:
Encryption <formatstorageencryption.html>`__ page for more information.
``compat``
Specify compatibility level. So far, this is only used for ``type='qcow2'``
- volumes. Valid values are ``0.10`` and ``1.1`` so far, specifying QEMU
- version the images should be compatible with. If the ``feature`` element is
- present, 1.1 is used. :since:`Since 1.1.0` If omitted, 0.10 is used.
- :since:`Since 1.1.2`
+ volumes. Valid values are ``0.10`` (QCOW2 v2) and ``1.1`` (QCOW2 v3) so far.
+ The values were meant to specify QEMU version the images should be compatible
+ with.
+
+ The default, if the ``feature`` element is present is ``1.1``. :since:`Since 1.1.0`
+ If ``feature`` is not present, ``0.10`` was used :since:`Since 1.1.2` and
+ :since:`Since 10.2.0` ``1.1`` is used as it's the default of ``qemu-img``.
+
+ Any tool depending on a specific version should specify this field explicitly.
``nocow``
Turn off COW of the newly created volume. So far, this is only valid for a
file image in btrfs file system. It will improve performance when the file
--
2.44.0
4 months, 2 weeks
[PATCH 0/4] hw/s390x: Alias @dump-skeys -> @dump-s390-skey and deprecate
by Philippe Mathieu-Daudé
We are trying to unify all qemu-system-FOO to a single binary.
In order to do that we need to remove QAPI target specific code.
@dump-skeys is only available on qemu-system-s390x. This series
rename it as @dump-s390-skey, making it available on other
binaries. We take care of backward compatibility via deprecation.
Philippe Mathieu-Daudé (4):
hw/s390x: Introduce the @dump-s390-skeys QMP command
hw/s390x: Introduce the 'dump_s390_skeys' HMP command
hw/s390x: Deprecate the HMP 'dump_skeys' command
hw/s390x: Deprecate the QMP @dump-skeys command
docs/about/deprecated.rst | 5 +++++
qapi/misc-target.json | 5 +++++
qapi/misc.json | 18 ++++++++++++++++++
include/monitor/hmp.h | 1 +
hw/s390x/s390-skeys-stub.c | 24 ++++++++++++++++++++++++
hw/s390x/s390-skeys.c | 19 +++++++++++++++++--
hmp-commands.hx | 17 +++++++++++++++--
hw/s390x/meson.build | 5 +++++
8 files changed, 90 insertions(+), 4 deletions(-)
create mode 100644 hw/s390x/s390-skeys-stub.c
--
2.41.0
5 months, 1 week
[RFC PATCH v3 0/6] Added virtio-net RSS with eBPF support.
by Andrew Melnychenko
This series of rfc patches adds support for loading the RSS eBPF
program and passing it to the QEMU.
Comments and suggestions would be useful.
QEMU with vhost may work with RSS through eBPF. To load eBPF,
the capabilities required that Libvirt may provide.
eBPF program and maps may be unique for particular QEMU and
Libvirt retrieves eBPF through qapi.
For now, there is only "RSS" eBPF object in QEMU, in the future,
there may be another one(g.e. network filters).
That's why in Libvirt added logic to load and store any
eBPF object that QEMU provides using qapi schema.
One of the reasons why this series of patches is in RFC are tests.
To this series of patches, the tests were added.
For now, the tests are synthetic, the proper "reply" file should
be generated with a new "caps" file. Currently, there are changes
in caps-9.0.0* files. There was added support for ebpf_rss_fds feature,
and request-ebpf command.
Also, there was added new config for qemuConfig - allowEBPF.
This config allows to enable/disable eBPF blob loading explicitly.
This is required for qemuxmlconf tests - where some test expects that
RSS would not support eBPF.
So, overall, the tests are required for review, comment, and discussion
how we want them to be implemented in the future.
For virtio-net RSS, the document has not changed.
```
<interface type="network">
<model type="virtio"/>
<driver queues="4" rss="on" rss_hash_report="off"/>
<interface type="network">
```
Simplified routine for RSS:
* Libvirt retrieves eBPF "RSS" and load it.
* Libvirt passes file descriptors to virtio-net with property "ebpf_rss_fds" ("rss" property should be "on" too).
* if fds was provided - QEMU using eBPF RSS implementation.
* if fds was not provided - QEMU tries to load eBPF RSS in own context and use it.
* if eBPF RSS was not loaded - QEMU uses "in-qemu" RSS(vhost not supported).
Changes since RFC v2:
* refactored and rebased.
* applied changes according to the Qemu.
* added basic test.
Changes since RFC v1:
* changed eBPF format saved in the XML cache.
* refactored and checked with syntax test.
* refactored patch hunks.
Andrew Melnychenko (6):
qemu_monitor: Added QEMU's "request-ebpf" support.
qemu_capabilities: Added logic for retrieving eBPF objects from QEMU.
qemu_interface: Added routine for loading the eBPF objects.
qemu_command: Added "ebpf_rss_fds" support for virtio-net.
qemu_conf: Added configuration to optionally disable eBPF loading.
tests: Added tests for eBPF blob loading.
meson.build | 7 +
meson_options.txt | 1 +
src/qemu/meson.build | 1 +
src/qemu/qemu_capabilities.c | 126 +++++++++++
src/qemu/qemu_capabilities.h | 6 +
src/qemu/qemu_command.c | 63 ++++++
src/qemu/qemu_conf.c | 2 +
src/qemu/qemu_conf.h | 2 +
src/qemu/qemu_domain.c | 4 +
src/qemu/qemu_domain.h | 3 +
src/qemu/qemu_interface.c | 83 ++++++++
src/qemu/qemu_interface.h | 4 +
src/qemu/qemu_monitor.c | 13 ++
src/qemu/qemu_monitor.h | 3 +
src/qemu/qemu_monitor_json.c | 26 +++
src/qemu/qemu_monitor_json.h | 3 +
.../caps_9.0.0_x86_64.replies | 199 ++++++++++--------
.../caps_9.0.0_x86_64.xml | 4 +
tests/qemuxml2argvmock.c | 21 ++
.../net-virtio-rss-bpf.x86_64-latest.args | 37 ++++
.../net-virtio-rss-bpf.x86_64-latest.xml | 46 ++++
tests/qemuxmlconfdata/net-virtio-rss-bpf.xml | 46 ++++
tests/qemuxmlconftest.c | 4 +
23 files changed, 612 insertions(+), 92 deletions(-)
create mode 100644 tests/qemuxmlconfdata/net-virtio-rss-bpf.x86_64-latest.args
create mode 100644 tests/qemuxmlconfdata/net-virtio-rss-bpf.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/net-virtio-rss-bpf.xml
--
2.44.0
5 months, 1 week
[RFC PATCH 0/1] support deprecated-props from query-cpu-model-expansion
by Collin Walling
Overview
========
QEMU will soon support reporting an optional array of deprecated features for an expanded CPU model via the query-cpu-model-expansion command. The intended use of this data is to make it easier for a user to define a CPU model with features flagged as deprecated set to disabled, thus rendering the guest migratable to future hardware that will out-right drop support for said features.
Attached to this cover letter is only half of the bigger picture. I've updated the CPU model expansion ABI to parse the new array (if it's available) and store the result in a string list within the qemuMonitorCPUModelInfo struct. I also propose an approach on how to store/retrieve the list of deprecated features in the qemuCaps cache file. All feedback on this patch is certainly welcome. Please note: I do not provide any updates to the respective qemuCaps tests right now.
The main goal of this post is to discuss the other half of the design: reporting and enabling a CPU model with deprecated features disabled. I believe the ideal solution involves a way for the user to easily configure their guest with this new data. Two ideas I currently have are outlined below. Other approaches are encouraged.
Notes
=====
- In my example below, I am running on a z14.2 machine.
- The features that are flagged as deprecated for this model are: bpb, csske, cte, te.
- The discussion regarding the QEMU changes can be found here: https://mail.gnu.org/archive/html/qemu-devel/2024-04/msg04605.html
Example of Desired Changes
==========================
Here is what I'd like the resulting guest's transient XML to look like for the <cpu> section (I have trimmed the features list for brevity):
...
<cpu mode='custom' match='exact' check='partial'>
<model fallback='forbid'>z14.2-base</model>
<feature policy='require' name='aen'/>
<feature policy='require' name='cmmnt'/>
<feature policy='require' name='aefsi'/>
...
<feature policy='disable' name='cte'/> ***
<feature policy='require' name='ais'/>
<feature policy='disable' name='bpb'/> ***
<feature policy='require' name='ctop'/>
<feature policy='require' name='gs'/>
<feature policy='require' name='ppa15'/>
<feature policy='require' name='zpci'/>
<feature policy='require' name='sea_esop2'/>
<feature policy='disable' name='te'/> ***
<feature policy='require' name='cmm'/>
<feature policy='disable' name='csske'/> ***
</cpu>
...
Ideas
=====
New Host CPU Model
------------------
Create a new CPU model that is a mirror of the host CPU model with deprecated features turned off. Let's call this model "host-recommended". A user may define this model in the guest XML as they would any other CPU model:
<cpu mode='host-recommended' check='partial'/>
Just as how host-model works, anything defined nested in the <cpu> tag will be ignored.
This model could potentially be listed in the domcapabilities output after the host-model:
<cpu>
<mode name='host-passthrough' supported='yes'>
...
</mode>
...
<mode name='host-model' supported='yes'>
...
</mode>
<mode name='host-recommended' supported='yes'>
...
<feature policy='disable' name='cte'/>
<feature policy='require' name='ais'/>
<feature policy='disable' name='bpb'/>
<feature policy='require' name='ctop'/>
<feature policy='require' name='gs'/>
<feature policy='require' name='ppa15'/>
<feature policy='require' name='zpci'/>
<feature policy='require' name='sea_esop2'/>
<feature policy='disable' name='te'/>
<feature policy='require' name='cmm'/>
<feature policy='disable' name='csske'/>
</cpu>
New Nested Element Under <cpu>
------------------------------
Create a new optional XML element nested under the <cpu> tag that may be used to disable deprecated features. This approach is more explicit compared to creating a new CPU model, and allows the user to disable these features when defining a specific model other than host-model. Here is an example of what the guest's defined XML for the CPU could look like:
<cpu mode='host-model' check='partial'>
<deprecated_features>off</deprecated_features>
</cpu>
However, a conflict arises with this approach: parameter priority. It would need to be discussed what the expected behavior should be if a user defines a guest with both a mode to disable deprecated features and any deprecated features listed with the 'require' policy, e.g.:
<cpu mode='custom' match='exact' check='partial'>
<model fallback='allow'>z13.2-base</model>
<!-- which one takes priority? -->
<deprecated_features>off</deprecated_features>
<feature policy='require' name='csske'/>
</cpu>
Another conflict is setting this option to "on" would have no effect on the CPU model (I can't think of a reason why someone would want to explicitly enable these features). This may not communicate well to the user.
To report these features, a <deprecatedProperties> tag could be added to the domcapabilities output using the same format I use in my proposed patch for the qemu capabilities file:
<cpu>
<mode name='host-passthrough' supported='yes'>
...
</mode>
...
<mode name='host-model' supported='yes'>
...
</mode>
<deprecatedProperties>
<property name='bpb'/>
<property name='te'/>
<property name='cte'/>
<property name='csske'/>
</deprecatedProperties>
</cpu>
Please let me know your thoughts. Once an approach is agreed upon, I will begin development.
Collin Walling (1):
qemu: monitor: parse deprecated-props from query-cpu-model-expansion
response
src/qemu/qemu_capabilities.c | 30 ++++++++++++++++++++++++++++++
src/qemu/qemu_monitor.h | 2 ++
src/qemu/qemu_monitor_json.c | 29 ++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 5 deletions(-)
--
2.43.0
5 months, 2 weeks
[PATCH v2 00/20] node_dev_udev: use workerpool and improve nodedev events
by Marc Hartmayer
When an udev event occurs for a mediated device (mdev) the mdev config data
requires an update via mdevctl as the udev event does not contain all config
data. This update needs to occur immediately and to be finished before the
libvirt nodedev event is issued to keep the API usage reliable.
Changelog:
v1->v2:
+ squashed old patches 3 and 17 (comments from Jonathon and Boris)
+ added r-b's from Jonathon and Boris
+ worked in comments from Jonathon to old patch 15
+ added comment why only one worker can currently be used
+ added patch `node_device_udev: remove incorrect G_GNUC_UNUSED`
RFCv1->v1:
+ removed some of my own s-o-b's that were accidentally inserted in the RFC
+ added r-b's from Boris and Jonathon
+ worked in comments from Boris and Jonathon, but I did not inline
"nodeDeviceDefResetMdevActiveConfig" as I'm not sure whether this improves
the readability
+ reworked patch "[RFC PATCH v1 11/15] node_device_udev: Use
`stateShutdownPrepare` and `stateShutdownWait`"
+ reworked patch "node_device_udev: Use a worker pool for processing events and
emitting nodedev event"
+ added patches:
- node_device_udev: Move responsibility to release `(init|udev)Thread` to `udevEventDataDispose`
- node_device_udev: Fix leak of mdevctlLock, udevThreadCond, and mdevCtlMonitor
- node_device_udev: nodeStateShutdownPrepare: Disconnect the signals explicitly
- node_device_udev: Pass the driver state as parameter in prepartion for the next commit
- node_device_udev: Add support for `g_autoptr` to `udevEventData
- node_device_udev: Pass the `udevEventData` via parameter and use refcounting
Boris Fiuczynski (3):
nodedev: fix mdev add udev event data handling
nodedev: immediate update of active config on udev events
nodedev: reset active config data on udev remove event
Marc Hartmayer (17):
node_device_udev: Set @def to NULL
node_device_udev: Remove the timeout if the data is disposed
node_device_udev: Test for mdevctlTimeout != -1
node_device_udev: Don't take `mdevctlLock` for `mdevctl list` and add
comments about locking
node_device_udev: Take lock if `driver->privateData` is modified
node_device_udev: Add prefix `udev` for udev related data
node_device_udev: Inline `udevRemoveOneDevice`
node_device_udev: Move responsibility to release `(init|udev)Thread`
to `udevEventDataDispose`
node_device_udev: Fix leak of mdevctlLock, udevThreadCond, and
mdevCtlMonitors
node_device_udev: Introduce and use `stateShutdownPrepare` and
`stateShutdownWait`
node_device_udev: nodeStateShutdownPrepare: Disconnect the signals
explicitly
node_device_udev: Pass the driver state as parameter in preparation
for the next commit
node_device_udev: Use a worker pool for processing events and emitting
nodedev event
node_device_udev: Make the code easier to read
node_device_udev: Add support for `g_autoptr` to `udevEventData`
node_device_udev: Pass the `udevEventData` via parameter and use
refcounting
node_device_udev: remove incorrect G_GNUC_UNUSED
src/node_device/node_device_driver.h | 5 +-
src/util/virmdev.h | 4 +
src/conf/node_device_conf.c | 10 +-
src/node_device/node_device_driver.c | 28 +-
src/node_device/node_device_udev.c | 516 ++++++++++++++++++---------
src/test/test_driver.c | 11 +-
src/util/virmdev.c | 20 ++
src/libvirt_private.syms | 2 +
8 files changed, 398 insertions(+), 198 deletions(-)
base-commit: c38720b337f74337ec94c0fe2e97a7c2c57188ae
--
2.34.1
5 months, 2 weeks
[PATCH 0/2] Make affinity setting a bit more debug friendly
by Michal Privoznik
*** BLURB HERE ***
Michal Prívozník (2):
qemu_process: Issue an info message when subtracting isolcpus
virprocess: Debug affinity map in virProcessSetAffinity()
src/qemu/qemu_process.c | 6 ++++++
src/util/virprocess.c | 6 ++++--
2 files changed, 10 insertions(+), 2 deletions(-)
--
2.43.2
5 months, 3 weeks