[PATCH rfcv3 00/11] LIBVIRT: X86: TDX support

Hi, This series brings libvirt the x86 TDX support. * What's TDX? TDX stands for Trust Domain Extensions which isolates VMs from the virtual-machine manager (VMM)/hypervisor and any other software on the platform. To support TDX, multiple software components, not only KVM but also QEMU, guest Linux and virtual bios, need to be updated. For more details, please check link[1], there are TDX spec links and public repository link at github for each software component. This patchset is another software component to extend libvirt to support TDX, with which one can start a VM from high level rather than running qemu directly. * Misc As QEMU use a software emulated way to reset guest which isn't supported by TDX guest for security reason. We add a new way to emulate the reset for TDX guest, called "hard reboot". We achieve this by killing old qemu and start a new one. Complete code can be found at [1], matching qemu code can be found at [2]. There are some new properties for tdx-guest object, i.e. `mrconfigid`, `mrowner`, `mrownerconfig` and `debug` which aren't in matching qemu[2] yet. I keep them intentionally as they will be implemented in qemu as extention series of [2]. * Test start/stop/reboot with virsh stop/reboot trigger in guest stop with on_poweroff=destroy/restart reboot with on_reboot=destroy/restart * Patch organization - patch 1-3: Support query of TDX capabilities. - patch 4-6: Add TDX type to launchsecurity framework. - patch 7-11: Add hard reboot support to TDX guest [1] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv3 [2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v3 Thanks Zhenzhong Changelog: rfcv3: - Change to generate qemu cmdline with -bios - drop firmware auto match as -bios is used - add a hard reboot method to reboot TDX guest rfcv2: - give up using qmp cmd and check TDX directly on host for TDX capabilities. - use launchsecurity framework to support TDX - use <os>.<loader> for general loader - add auto firmware match feature for TDX A example TDVF fimware description file 70-edk2-x86_64-tdx.json: { "description": "UEFI firmware for x86_64, supporting Intel TDX", "interface-types": [ "uefi" ], "mapping": { "device": "generic", "filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd" }, "targets": [ { "architecture": "x86_64", "machines": [ "pc-q35-*" ] } ], "features": [ "intel-tdx", "verbose-dynamic" ], "tags": [ ] } rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html Chenyi Qiang (3): qemu: add hard reboot in QEMU driver qemu: make hard reboot as the TDX default reboot mode virsh: add new option "timekeep" to keep virsh console alive Zhenzhong Duan (8): qemu: Check if INTEL Trust Domain Extention support is enabled qemu: Add TDX capability conf: expose TDX feature in domain capabilities conf: add tdx as launch security type qemu: Add command line and validation for TDX type qemu: force special parameters enabled for TDX guest qemu: Extend hard reboot in Qemu driver conf: Add support to keep same domid for hard reboot docs/formatdomaincaps.rst | 1 + include/libvirt/libvirt-domain.h | 2 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/domain_conf.c | 50 ++++++++++++++++ src/conf/domain_conf.h | 11 ++++ src/conf/schemas/domaincaps.rng | 9 +++ src/conf/schemas/domaincommon.rng | 34 +++++++++++ src/conf/virconftypes.h | 2 + src/qemu/qemu_capabilities.c | 38 +++++++++++- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 29 +++++++++ src/qemu/qemu_domain.c | 18 ++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 85 ++++++++++++++++++++------ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_monitor.c | 19 +++++- src/qemu/qemu_monitor.h | 2 +- src/qemu/qemu_monitor_json.c | 6 +- src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 99 ++++++++++++++++++++++++++++++- src/qemu/qemu_validate.c | 18 ++++++ tools/virsh-console.c | 3 + tools/virsh-domain.c | 64 +++++++++++++++----- tools/virsh.h | 1 + 25 files changed, 463 insertions(+), 37 deletions(-) -- 2.34.1

Implement TDX check in order to generate domain feature capability correctly in case the availability of the feature changed. For INTEL TDX the verification is: - checking if "/sys/module/kvm_intel/parameters/tdx" contains the value 'Y': meaning TDX is enabled in the host kernel. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 83119e871a..5f806c68fb 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -5098,6 +5098,24 @@ virQEMUCapsKVMSupportsSecureGuestAMD(void) } +/* + * Check whether INTEL Trust Domain Extention (x86) is enabled + */ +static bool +virQEMUCapsKVMSupportsSecureGuestINTEL(void) +{ + g_autofree char *modValue = NULL; + + if (virFileReadValueString(&modValue, "/sys/module/kvm_intel/parameters/tdx") < 0) + return false; + + if (modValue[0] != 'Y') + return false; + + return true; +} + + /* * Check whether the secure guest functionality is enabled. * See the specific architecture function for details on the verifications made. @@ -5111,7 +5129,8 @@ virQEMUCapsKVMSupportsSecureGuest(void) return virQEMUCapsKVMSupportsSecureGuestS390(); if (ARCH_IS_X86(arch)) - return virQEMUCapsKVMSupportsSecureGuestAMD(); + return virQEMUCapsKVMSupportsSecureGuestAMD() || + virQEMUCapsKVMSupportsSecureGuestINTEL(); return false; } -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:11PM +0800, Zhenzhong Duan wrote:
Implement TDX check in order to generate domain feature capability correctly in case the availability of the feature changed.
For INTEL TDX the verification is: - checking if "/sys/module/kvm_intel/parameters/tdx" contains the value 'Y': meaning TDX is enabled in the host kernel.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 83119e871a..5f806c68fb 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -5098,6 +5098,24 @@ virQEMUCapsKVMSupportsSecureGuestAMD(void) }
+/* + * Check whether INTEL Trust Domain Extention (x86) is enabled + */ +static bool +virQEMUCapsKVMSupportsSecureGuestINTEL(void) +{ + g_autofree char *modValue = NULL; + + if (virFileReadValueString(&modValue, "/sys/module/kvm_intel/parameters/tdx") < 0) + return false; + + if (modValue[0] != 'Y') + return false; + + return true; +}
It is worth adding this as a check to tools/virt-host-validate-qemu.c too, but not a requirement for this patch.
+ + /* * Check whether the secure guest functionality is enabled. * See the specific architecture function for details on the verifications made. @@ -5111,7 +5129,8 @@ virQEMUCapsKVMSupportsSecureGuest(void) return virQEMUCapsKVMSupportsSecureGuestS390();
if (ARCH_IS_X86(arch)) - return virQEMUCapsKVMSupportsSecureGuestAMD(); + return virQEMUCapsKVMSupportsSecureGuestAMD() || + virQEMUCapsKVMSupportsSecureGuestINTEL();
return false; }
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 01/11] qemu: Check if INTEL Trust Domain Extention support is enabled
On Mon, Nov 27, 2023 at 04:55:11PM +0800, Zhenzhong Duan wrote:
Implement TDX check in order to generate domain feature capability correctly in case the availability of the feature changed.
For INTEL TDX the verification is: - checking if "/sys/module/kvm_intel/parameters/tdx" contains the value 'Y': meaning TDX is enabled in the host kernel.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 83119e871a..5f806c68fb 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -5098,6 +5098,24 @@ virQEMUCapsKVMSupportsSecureGuestAMD(void) }
+/* + * Check whether INTEL Trust Domain Extention (x86) is enabled + */ +static bool +virQEMUCapsKVMSupportsSecureGuestINTEL(void) +{ + g_autofree char *modValue = NULL; + + if (virFileReadValueString(&modValue, "/sys/module/kvm_intel/parameters/tdx") < 0) + return false; + + if (modValue[0] != 'Y') + return false; + + return true; +}
It is worth adding this as a check to tools/virt-host-validate-qemu.c too, but not a requirement for this patch.
Got it, will do. Thanks Zhenzhong
+ + /* * Check whether the secure guest functionality is enabled. * See the specific architecture function for details on the verifications made. @@ -5111,7 +5129,8 @@ virQEMUCapsKVMSupportsSecureGuest(void) return virQEMUCapsKVMSupportsSecureGuestS390();
if (ARCH_IS_X86(arch)) - return virQEMUCapsKVMSupportsSecureGuestAMD(); + return virQEMUCapsKVMSupportsSecureGuestAMD() || + virQEMUCapsKVMSupportsSecureGuestINTEL();
return false; }
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

QEMU_CAPS_TDX_GUEST set means TDX supported with this QEMU. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 5f806c68fb..8764df5e9d 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -698,6 +698,7 @@ VIR_ENUM_IMPL(virQEMUCaps, /* 450 */ "run-with.async-teardown", /* QEMU_CAPS_RUN_WITH_ASYNC_TEARDOWN */ "virtio-blk-vhost-vdpa", /* QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA */ + "tdx-guest", /* QEMU_CAPS_TDX_GUEST */ ); @@ -1385,6 +1386,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "virtio-crypto-device", QEMU_CAPS_DEVICE_VIRTIO_CRYPTO }, { "cryptodev-backend-lkcf", QEMU_CAPS_OBJECT_CRYPTO_LKCF }, { "pvpanic-pci", QEMU_CAPS_DEVICE_PANIC_PCI }, + { "tdx-guest", QEMU_CAPS_TDX_GUEST}, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 3c4f7f625b..b88058f6fb 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -677,6 +677,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 450 */ QEMU_CAPS_RUN_WITH_ASYNC_TEARDOWN, /* asynchronous teardown -run-with async-teardown=on|off */ QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA, /* virtio-blk-vhost-vdpa block driver */ + QEMU_CAPS_TDX_GUEST, /* -object tdx-guest,... */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:12PM +0800, Zhenzhong Duan wrote:
QEMU_CAPS_TDX_GUEST set means TDX supported with this QEMU.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> NB, when the time comes to merge this, it'll probably require an update to some of the QEMU capabilities files in tests/, but that'll require the QEMU pieces to be merged first of course. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 02/11] qemu: Add TDX capability
On Mon, Nov 27, 2023 at 04:55:12PM +0800, Zhenzhong Duan wrote:
QEMU_CAPS_TDX_GUEST set means TDX supported with this QEMU.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
NB, when the time comes to merge this, it'll probably require an update to some of the QEMU capabilities files in tests/, but that'll require the QEMU pieces to be merged first of course.
Got it. Thanks Zhenzhong

Extend qemu TDX capability to domain capabilities. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- docs/formatdomaincaps.rst | 1 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/schemas/domaincaps.rng | 9 +++++++++ src/qemu/qemu_capabilities.c | 15 +++++++++++++++ 5 files changed, 27 insertions(+) diff --git a/docs/formatdomaincaps.rst b/docs/formatdomaincaps.rst index ef752a0f3a..3acc9a12b4 100644 --- a/docs/formatdomaincaps.rst +++ b/docs/formatdomaincaps.rst @@ -669,6 +669,7 @@ capabilities. All features occur as children of the main ``features`` element. <value>vapic</value> </enum> </hyperv> + <tdx supported='yes'/> </features> </domainCapabilities> diff --git a/src/conf/domain_capabilities.c b/src/conf/domain_capabilities.c index f6e09dc584..0f9ddb1e48 100644 --- a/src/conf/domain_capabilities.c +++ b/src/conf/domain_capabilities.c @@ -42,6 +42,7 @@ VIR_ENUM_IMPL(virDomainCapsFeature, "backup", "async-teardown", "s390-pv", + "tdx", ); static virClass *virDomainCapsClass; diff --git a/src/conf/domain_capabilities.h b/src/conf/domain_capabilities.h index 01bcfa2e39..cc44cf2363 100644 --- a/src/conf/domain_capabilities.h +++ b/src/conf/domain_capabilities.h @@ -250,6 +250,7 @@ typedef enum { VIR_DOMAIN_CAPS_FEATURE_BACKUP, VIR_DOMAIN_CAPS_FEATURE_ASYNC_TEARDOWN, VIR_DOMAIN_CAPS_FEATURE_S390_PV, + VIR_DOMAIN_CAPS_FEATURE_TDX, VIR_DOMAIN_CAPS_FEATURE_LAST } virDomainCapsFeature; diff --git a/src/conf/schemas/domaincaps.rng b/src/conf/schemas/domaincaps.rng index e7aa4a1066..a5522b1e67 100644 --- a/src/conf/schemas/domaincaps.rng +++ b/src/conf/schemas/domaincaps.rng @@ -308,6 +308,9 @@ <optional> <ref name="s390-pv"/> </optional> + <optional> + <ref name="tdx"/> + </optional> <optional> <ref name="sev"/> </optional> @@ -363,6 +366,12 @@ </element> </define> + <define name="tdx"> + <element name="tdx"> + <ref name="supported"/> + </element> + </define> + <define name="sev"> <element name="sev"> <ref name="supported"/> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 8764df5e9d..0b4988256f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -6657,6 +6657,20 @@ virQEMUCapsFillDomainFeatureHypervCaps(virQEMUCaps *qemuCaps, } +static void +virQEMUCapsFillDomainFeatureTDXCaps(virQEMUCaps *qemuCaps, + virDomainCaps *domCaps) +{ + if (domCaps->arch == VIR_ARCH_X86_64 && + domCaps->virttype == VIR_DOMAIN_VIRT_KVM && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST) && + virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps) && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) && + (STREQ(domCaps->machine, "q35") || STRPREFIX(domCaps->machine, "pc-q35-"))) + domCaps->features[VIR_DOMAIN_CAPS_FEATURE_TDX] = VIR_TRISTATE_BOOL_YES; +} + + int virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virArch hostarch, @@ -6716,6 +6730,7 @@ virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virQEMUCapsFillDomainFeatureSGXCaps(qemuCaps, domCaps); virQEMUCapsFillDomainFeatureHypervCaps(qemuCaps, domCaps); virQEMUCapsFillDomainDeviceCryptoCaps(qemuCaps, crypto); + virQEMUCapsFillDomainFeatureTDXCaps(qemuCaps, domCaps); return 0; } -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:13PM +0800, Zhenzhong Duan wrote:
Extend qemu TDX capability to domain capabilities.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- docs/formatdomaincaps.rst | 1 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/schemas/domaincaps.rng | 9 +++++++++ src/qemu/qemu_capabilities.c | 15 +++++++++++++++ 5 files changed, 27 insertions(+)
diff --git a/docs/formatdomaincaps.rst b/docs/formatdomaincaps.rst index ef752a0f3a..3acc9a12b4 100644 --- a/docs/formatdomaincaps.rst +++ b/docs/formatdomaincaps.rst @@ -669,6 +669,7 @@ capabilities. All features occur as children of the main ``features`` element. <value>vapic</value> </enum> </hyperv> + <tdx supported='yes'/> </features> </domainCapabilities>
diff --git a/src/conf/domain_capabilities.c b/src/conf/domain_capabilities.c index f6e09dc584..0f9ddb1e48 100644 --- a/src/conf/domain_capabilities.c +++ b/src/conf/domain_capabilities.c @@ -42,6 +42,7 @@ VIR_ENUM_IMPL(virDomainCapsFeature, "backup", "async-teardown", "s390-pv", + "tdx", );
static virClass *virDomainCapsClass; diff --git a/src/conf/domain_capabilities.h b/src/conf/domain_capabilities.h index 01bcfa2e39..cc44cf2363 100644 --- a/src/conf/domain_capabilities.h +++ b/src/conf/domain_capabilities.h @@ -250,6 +250,7 @@ typedef enum { VIR_DOMAIN_CAPS_FEATURE_BACKUP, VIR_DOMAIN_CAPS_FEATURE_ASYNC_TEARDOWN, VIR_DOMAIN_CAPS_FEATURE_S390_PV, + VIR_DOMAIN_CAPS_FEATURE_TDX,
VIR_DOMAIN_CAPS_FEATURE_LAST } virDomainCapsFeature; diff --git a/src/conf/schemas/domaincaps.rng b/src/conf/schemas/domaincaps.rng index e7aa4a1066..a5522b1e67 100644 --- a/src/conf/schemas/domaincaps.rng +++ b/src/conf/schemas/domaincaps.rng @@ -308,6 +308,9 @@ <optional> <ref name="s390-pv"/> </optional> + <optional> + <ref name="tdx"/> + </optional> <optional> <ref name="sev"/> </optional> @@ -363,6 +366,12 @@ </element> </define>
+ <define name="tdx"> + <element name="tdx"> + <ref name="supported"/> + </element> + </define> + <define name="sev"> <element name="sev"> <ref name="supported"/> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 8764df5e9d..0b4988256f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -6657,6 +6657,20 @@ virQEMUCapsFillDomainFeatureHypervCaps(virQEMUCaps *qemuCaps, }
+static void +virQEMUCapsFillDomainFeatureTDXCaps(virQEMUCaps *qemuCaps, + virDomainCaps *domCaps) +{ + if (domCaps->arch == VIR_ARCH_X86_64 && + domCaps->virttype == VIR_DOMAIN_VIRT_KVM && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST) && + virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps) && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) &&
Checking QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT is overkill, as we can assume that is implied to exist by virtue of QEMU_CAPS_TDX_GUEST existing.
+ (STREQ(domCaps->machine, "q35") || STRPREFIX(domCaps->machine, "pc-q35-")))
If QEMU has limited its support for TDX to just q35, then it would be much better if QEMU patches for TDX provided a way to detect this via QMP, so we don't need to do these string comparisons.
+ domCaps->features[VIR_DOMAIN_CAPS_FEATURE_TDX] = VIR_TRISTATE_BOOL_YES; +} + + int virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virArch hostarch, @@ -6716,6 +6730,7 @@ virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virQEMUCapsFillDomainFeatureSGXCaps(qemuCaps, domCaps); virQEMUCapsFillDomainFeatureHypervCaps(qemuCaps, domCaps); virQEMUCapsFillDomainDeviceCryptoCaps(qemuCaps, crypto); + virQEMUCapsFillDomainFeatureTDXCaps(qemuCaps, domCaps);
return 0; } -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 03/11] conf: expose TDX feature in domain capabilities
Extend qemu TDX capability to domain capabilities.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- docs/formatdomaincaps.rst | 1 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/schemas/domaincaps.rng | 9 +++++++++ src/qemu/qemu_capabilities.c | 15 +++++++++++++++ 5 files changed, 27 insertions(+)
diff --git a/docs/formatdomaincaps.rst b/docs/formatdomaincaps.rst index ef752a0f3a..3acc9a12b4 100644 --- a/docs/formatdomaincaps.rst +++ b/docs/formatdomaincaps.rst @@ -669,6 +669,7 @@ capabilities. All features occur as children of the
On Mon, Nov 27, 2023 at 04:55:13PM +0800, Zhenzhong Duan wrote: main ``features`` element.
<value>vapic</value> </enum> </hyperv> + <tdx supported='yes'/> </features> </domainCapabilities>
diff --git a/src/conf/domain_capabilities.c
b/src/conf/domain_capabilities.c
index f6e09dc584..0f9ddb1e48 100644 --- a/src/conf/domain_capabilities.c +++ b/src/conf/domain_capabilities.c @@ -42,6 +42,7 @@ VIR_ENUM_IMPL(virDomainCapsFeature, "backup", "async-teardown", "s390-pv", + "tdx", );
static virClass *virDomainCapsClass; diff --git a/src/conf/domain_capabilities.h b/src/conf/domain_capabilities.h index 01bcfa2e39..cc44cf2363 100644 --- a/src/conf/domain_capabilities.h +++ b/src/conf/domain_capabilities.h @@ -250,6 +250,7 @@ typedef enum { VIR_DOMAIN_CAPS_FEATURE_BACKUP, VIR_DOMAIN_CAPS_FEATURE_ASYNC_TEARDOWN, VIR_DOMAIN_CAPS_FEATURE_S390_PV, + VIR_DOMAIN_CAPS_FEATURE_TDX,
VIR_DOMAIN_CAPS_FEATURE_LAST } virDomainCapsFeature; diff --git a/src/conf/schemas/domaincaps.rng b/src/conf/schemas/domaincaps.rng index e7aa4a1066..a5522b1e67 100644 --- a/src/conf/schemas/domaincaps.rng +++ b/src/conf/schemas/domaincaps.rng @@ -308,6 +308,9 @@ <optional> <ref name="s390-pv"/> </optional> + <optional> + <ref name="tdx"/> + </optional> <optional> <ref name="sev"/> </optional> @@ -363,6 +366,12 @@ </element> </define>
+ <define name="tdx"> + <element name="tdx"> + <ref name="supported"/> + </element> + </define> + <define name="sev"> <element name="sev"> <ref name="supported"/> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 8764df5e9d..0b4988256f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -6657,6 +6657,20 @@ virQEMUCapsFillDomainFeatureHypervCaps(virQEMUCaps *qemuCaps, }
+static void +virQEMUCapsFillDomainFeatureTDXCaps(virQEMUCaps *qemuCaps, + virDomainCaps *domCaps) +{ + if (domCaps->arch == VIR_ARCH_X86_64 && + domCaps->virttype == VIR_DOMAIN_VIRT_KVM && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST) && + virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps) && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) &&
Checking QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT is overkill, as we can assume that is implied to exist by virtue of QEMU_CAPS_TDX_GUEST existing.
Good catch, will remove it.
+ (STREQ(domCaps->machine, "q35") || STRPREFIX(domCaps- machine, "pc-q35-")))
If QEMU has limited its support for TDX to just q35, then it would be much better if QEMU patches for TDX provided a way to detect this via QMP, so we don't need to do these string comparisons.
IIUC, QEMU has no limitation on using i440FX with TDX. We had this check in libvirt as we thought i440FX is deprecated. @Yamahata, Isaku, @Li, Xiaoyao, so now we have two choices: 1. remove this limitation in libvirt 2. add QMP on QEMU side to report this limitation to libvirt Which choice do you like? Thanks Zhenzhong
+ domCaps->features[VIR_DOMAIN_CAPS_FEATURE_TDX] = VIR_TRISTATE_BOOL_YES; +} + + int virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virArch hostarch, @@ -6716,6 +6730,7 @@ virQEMUCapsFillDomainCaps(virQEMUCaps *qemuCaps, virQEMUCapsFillDomainFeatureSGXCaps(qemuCaps, domCaps); virQEMUCapsFillDomainFeatureHypervCaps(qemuCaps, domCaps); virQEMUCapsFillDomainDeviceCryptoCaps(qemuCaps, crypto); + virQEMUCapsFillDomainFeatureTDXCaps(qemuCaps, domCaps);
return 0; } -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Jan 11, 2024 at 03:43:34AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 03/11] conf: expose TDX feature in domain capabilities
Extend qemu TDX capability to domain capabilities.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- docs/formatdomaincaps.rst | 1 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/schemas/domaincaps.rng | 9 +++++++++ src/qemu/qemu_capabilities.c | 15 +++++++++++++++ 5 files changed, 27 insertions(+)
diff --git a/docs/formatdomaincaps.rst b/docs/formatdomaincaps.rst index ef752a0f3a..3acc9a12b4 100644 --- a/docs/formatdomaincaps.rst +++ b/docs/formatdomaincaps.rst @@ -669,6 +669,7 @@ capabilities. All features occur as children of the
On Mon, Nov 27, 2023 at 04:55:13PM +0800, Zhenzhong Duan wrote: main ``features`` element.
<value>vapic</value> </enum> </hyperv> + <tdx supported='yes'/> </features> </domainCapabilities>
diff --git a/src/conf/domain_capabilities.c
b/src/conf/domain_capabilities.c
index f6e09dc584..0f9ddb1e48 100644 --- a/src/conf/domain_capabilities.c +++ b/src/conf/domain_capabilities.c @@ -42,6 +42,7 @@ VIR_ENUM_IMPL(virDomainCapsFeature, "backup", "async-teardown", "s390-pv", + "tdx", );
static virClass *virDomainCapsClass; diff --git a/src/conf/domain_capabilities.h b/src/conf/domain_capabilities.h index 01bcfa2e39..cc44cf2363 100644 --- a/src/conf/domain_capabilities.h +++ b/src/conf/domain_capabilities.h @@ -250,6 +250,7 @@ typedef enum { VIR_DOMAIN_CAPS_FEATURE_BACKUP, VIR_DOMAIN_CAPS_FEATURE_ASYNC_TEARDOWN, VIR_DOMAIN_CAPS_FEATURE_S390_PV, + VIR_DOMAIN_CAPS_FEATURE_TDX,
VIR_DOMAIN_CAPS_FEATURE_LAST } virDomainCapsFeature; diff --git a/src/conf/schemas/domaincaps.rng b/src/conf/schemas/domaincaps.rng index e7aa4a1066..a5522b1e67 100644 --- a/src/conf/schemas/domaincaps.rng +++ b/src/conf/schemas/domaincaps.rng @@ -308,6 +308,9 @@ <optional> <ref name="s390-pv"/> </optional> + <optional> + <ref name="tdx"/> + </optional> <optional> <ref name="sev"/> </optional> @@ -363,6 +366,12 @@ </element> </define>
+ <define name="tdx"> + <element name="tdx"> + <ref name="supported"/> + </element> + </define> + <define name="sev"> <element name="sev"> <ref name="supported"/> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 8764df5e9d..0b4988256f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -6657,6 +6657,20 @@ virQEMUCapsFillDomainFeatureHypervCaps(virQEMUCaps *qemuCaps, }
+static void +virQEMUCapsFillDomainFeatureTDXCaps(virQEMUCaps *qemuCaps, + virDomainCaps *domCaps) +{ + if (domCaps->arch == VIR_ARCH_X86_64 && + domCaps->virttype == VIR_DOMAIN_VIRT_KVM && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST) && + virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps) && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) &&
Checking QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT is overkill, as we can assume that is implied to exist by virtue of QEMU_CAPS_TDX_GUEST existing.
Good catch, will remove it.
+ (STREQ(domCaps->machine, "q35") || STRPREFIX(domCaps- machine, "pc-q35-")))
If QEMU has limited its support for TDX to just q35, then it would be much better if QEMU patches for TDX provided a way to detect this via QMP, so we don't need to do these string comparisons.
IIUC, QEMU has no limitation on using i440FX with TDX. We had this check in libvirt as we thought i440FX is deprecated.
Libvirt has not deprecated i440FX, and neither has QEMU upstream. RHEL, as a downstream, though has deprecated it.
@Yamahata, Isaku, @Li, Xiaoyao, so now we have two choices: 1. remove this limitation in libvirt
Just remove this limitation in libvirt, since RHEL support limitations are not relevant to the code.
2. add QMP on QEMU side to report this limitation to libvirt
Which choice do you like?
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 03/11] conf: expose TDX feature in domain capabilities
On Thu, Jan 11, 2024 at 03:43:34AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 03/11] conf: expose TDX feature in domain capabilities
Extend qemu TDX capability to domain capabilities.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- docs/formatdomaincaps.rst | 1 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/schemas/domaincaps.rng | 9 +++++++++ src/qemu/qemu_capabilities.c | 15 +++++++++++++++ 5 files changed, 27 insertions(+)
diff --git a/docs/formatdomaincaps.rst b/docs/formatdomaincaps.rst index ef752a0f3a..3acc9a12b4 100644 --- a/docs/formatdomaincaps.rst +++ b/docs/formatdomaincaps.rst @@ -669,6 +669,7 @@ capabilities. All features occur as children of the
On Mon, Nov 27, 2023 at 04:55:13PM +0800, Zhenzhong Duan wrote: main ``features`` element.
<value>vapic</value> </enum> </hyperv> + <tdx supported='yes'/> </features> </domainCapabilities>
diff --git a/src/conf/domain_capabilities.c
b/src/conf/domain_capabilities.c
index f6e09dc584..0f9ddb1e48 100644 --- a/src/conf/domain_capabilities.c +++ b/src/conf/domain_capabilities.c @@ -42,6 +42,7 @@ VIR_ENUM_IMPL(virDomainCapsFeature, "backup", "async-teardown", "s390-pv", + "tdx", );
static virClass *virDomainCapsClass; diff --git a/src/conf/domain_capabilities.h b/src/conf/domain_capabilities.h index 01bcfa2e39..cc44cf2363 100644 --- a/src/conf/domain_capabilities.h +++ b/src/conf/domain_capabilities.h @@ -250,6 +250,7 @@ typedef enum { VIR_DOMAIN_CAPS_FEATURE_BACKUP, VIR_DOMAIN_CAPS_FEATURE_ASYNC_TEARDOWN, VIR_DOMAIN_CAPS_FEATURE_S390_PV, + VIR_DOMAIN_CAPS_FEATURE_TDX,
VIR_DOMAIN_CAPS_FEATURE_LAST } virDomainCapsFeature; diff --git a/src/conf/schemas/domaincaps.rng b/src/conf/schemas/domaincaps.rng index e7aa4a1066..a5522b1e67 100644 --- a/src/conf/schemas/domaincaps.rng +++ b/src/conf/schemas/domaincaps.rng @@ -308,6 +308,9 @@ <optional> <ref name="s390-pv"/> </optional> + <optional> + <ref name="tdx"/> + </optional> <optional> <ref name="sev"/> </optional> @@ -363,6 +366,12 @@ </element> </define>
+ <define name="tdx"> + <element name="tdx"> + <ref name="supported"/> + </element> + </define> + <define name="sev"> <element name="sev"> <ref name="supported"/> diff --git a/src/qemu/qemu_capabilities.c
b/src/qemu/qemu_capabilities.c
index 8764df5e9d..0b4988256f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -6657,6 +6657,20 @@ virQEMUCapsFillDomainFeatureHypervCaps(virQEMUCaps *qemuCaps, }
+static void +virQEMUCapsFillDomainFeatureTDXCaps(virQEMUCaps *qemuCaps, + virDomainCaps *domCaps) +{ + if (domCaps->arch == VIR_ARCH_X86_64 && + domCaps->virttype == VIR_DOMAIN_VIRT_KVM && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST) && + virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps) && + virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) &&
Checking QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT is overkill, as we can assume that is implied to exist by virtue of QEMU_CAPS_TDX_GUEST existing.
Good catch, will remove it.
+ (STREQ(domCaps->machine, "q35") || STRPREFIX(domCaps- machine, "pc-q35-")))
If QEMU has limited its support for TDX to just q35, then it would be much better if QEMU patches for TDX provided a way to detect this via QMP, so we don't need to do these string comparisons.
IIUC, QEMU has no limitation on using i440FX with TDX. We had this check in libvirt as we thought i440FX is deprecated.
Libvirt has not deprecated i440FX, and neither has QEMU upstream.
RHEL, as a downstream, though has deprecated it.
@Yamahata, Isaku, @Li, Xiaoyao, so now we have two choices: 1. remove this limitation in libvirt
Just remove this limitation in libvirt, since RHEL support limitations are not relevant to the code.
Got it, will do. Thanks Zhenzhong

When 'tdx' is used, the VM will launched with Intel TDX feature enabled. TDX feature supports running encrypted VM (Trust Domain, TD) under the control of KVM. A TD runs in a CPU model which protects the confidentiality of its memory and its CPU state from other software There is a child element 'policy' and four optional element for tdx type. In 'policy', bit 0 is set to enable TDX debug, bit 28 set to enable sept-ve-disable, other bits are reserved currently. mrConfigId, mrOwner and mrOwnerConfig are hex string of 48 * 2 length each. Quote-Generation-Service is string to specify Quote Generation Service(QGS) in qemu socket address format. The examples of the supported format are "vsock:2:1234", "unix:/run/qgs", "localhost:1234". For example: <launchSecurity type='tdx'> <policy>0x1</policy> <mrConfigId>xxx...xxx</mrConfigId> <mrOwner>xxx...xxx</mrOwner> <mrOwnerConfig>xxx...xxx</mrOwnerConfig> <Quote-Generation-Service>xxx</Quote-Generation-Service> </launchSecurity> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/conf/domain_conf.c | 46 +++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 10 +++++++ src/conf/schemas/domaincommon.rng | 34 +++++++++++++++++++++++ src/conf/virconftypes.h | 2 ++ src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 1 + src/qemu/qemu_validate.c | 1 + 9 files changed, 98 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 80f467ae7a..08e82c5380 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1513,6 +1513,7 @@ VIR_ENUM_IMPL(virDomainLaunchSecurity, "", "sev", "s390-pv", + "tdx", ); typedef enum { @@ -3808,6 +3809,11 @@ virDomainSecDefFree(virDomainSecDef *def) g_free(def->data.sev.dh_cert); g_free(def->data.sev.session); break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + g_free(def->data.tdx.mrconfigid); + g_free(def->data.tdx.mrowner); + g_free(def->data.tdx.mrownerconfig); + g_free(def->data.tdx.QGS); case VIR_DOMAIN_LAUNCH_SECURITY_PV: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: @@ -13452,6 +13458,25 @@ virDomainSEVDefParseXML(virDomainSEVDef *def, } +static int +virDomainTDXDefParseXML(virDomainTDXDef *def, + xmlXPathContextPtr ctxt) +{ + if (virXPathUIntBase("string(./policy)", ctxt, 16, &def->policy) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("failed to get launch security policy for launch security type TDX")); + return -1; + } + + def->mrconfigid = virXPathString("string(./mrConfigId)", ctxt); + def->mrowner = virXPathString("string(./mrOwner)", ctxt); + def->mrownerconfig = virXPathString("string(./mrOwnerConfig)", ctxt); + def->QGS = virXPathString("string(./Quote-Generation-Service)", ctxt); + + return 0; +} + + static virDomainSecDef * virDomainSecDefParseXML(xmlNodePtr lsecNode, xmlXPathContextPtr ctxt) @@ -13471,6 +13496,10 @@ virDomainSecDefParseXML(xmlNodePtr lsecNode, if (virDomainSEVDefParseXML(&sec->data.sev, ctxt) < 0) return NULL; break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + if (virDomainTDXDefParseXML(&sec->data.tdx, ctxt) < 0) + return NULL; + break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: @@ -26468,6 +26497,23 @@ virDomainSecDefFormat(virBuffer *buf, virDomainSecDef *sec) break; } + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: { + virDomainTDXDef *tdx = &sec->data.tdx; + + virBufferAsprintf(&childBuf, "<policy>0x%04x</policy>\n", tdx->policy); + + if (tdx->mrconfigid) + virBufferEscapeString(&childBuf, "<mrConfigId>%s</mrConfigId>\n", tdx->mrconfigid); + if (tdx->mrowner) + virBufferEscapeString(&childBuf, "<mrOwner>%s</mrOwner>\n", tdx->mrowner); + if (tdx->mrownerconfig) + virBufferEscapeString(&childBuf, "<mrOwnerConfig>%s</mrOwnerConfig>\n", tdx->mrownerconfig); + if (tdx->QGS) + virBufferEscapeString(&childBuf, "<Quote-Generation-Service>%s</Quote-Generation-Service>\n", tdx->QGS); + + break; + } + case VIR_DOMAIN_LAUNCH_SECURITY_PV: break; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 98f99721f0..3b01850eb4 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2833,6 +2833,7 @@ typedef enum { VIR_DOMAIN_LAUNCH_SECURITY_NONE, VIR_DOMAIN_LAUNCH_SECURITY_SEV, VIR_DOMAIN_LAUNCH_SECURITY_PV, + VIR_DOMAIN_LAUNCH_SECURITY_TDX, VIR_DOMAIN_LAUNCH_SECURITY_LAST, } virDomainLaunchSecurity; @@ -2849,10 +2850,19 @@ struct _virDomainSEVDef { virTristateBool kernel_hashes; }; +struct _virDomainTDXDef { + unsigned int policy; + char *mrconfigid; + char *mrowner; + char *mrownerconfig; + char *QGS; +}; + struct _virDomainSecDef { virDomainLaunchSecurity sectype; union { virDomainSEVDef sev; + virDomainTDXDef tdx; } data; }; diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index a26986b5ce..bf3667d727 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -520,6 +520,9 @@ <value>s390-pv</value> </attribute> </group> + <group> + <ref name="launchSecurityTDX"/> + </group> </choice> </element> </define> @@ -565,6 +568,37 @@ </interleave> </define> + <define name="launchSecurityTDX"> + <attribute name="type"> + <value>tdx</value> + </attribute> + <interleave> + <element name="policy"> + <ref name="hexuint"/> + </element> + <optional> + <element name="mrConfigId"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="mrOwner"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="mrOwnerConfig"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="Quote-Generation-Service"> + <data type="string"/> + </element> + </optional> + </interleave> + </define> + <!-- Enable or disable perf events for the domain. For each of the events the following rules apply: diff --git a/src/conf/virconftypes.h b/src/conf/virconftypes.h index 26cb966194..5516165bcc 100644 --- a/src/conf/virconftypes.h +++ b/src/conf/virconftypes.h @@ -210,6 +210,8 @@ typedef struct _virDomainResourceDef virDomainResourceDef; typedef struct _virDomainSEVDef virDomainSEVDef; +typedef struct _virDomainTDXDef virDomainTDXDef; + typedef struct _virDomainSecDef virDomainSecDef; typedef struct _virDomainShmemDef virDomainShmemDef; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fd0f12f304..89905378e4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -7001,6 +7001,7 @@ qemuBuildMachineCommandLine(virCommand *cmd, } break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: virBufferAddLit(&buf, ",confidential-guest-support=lsec0"); break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: @@ -9658,6 +9659,7 @@ qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, case VIR_DOMAIN_LAUNCH_SECURITY_PV: return qemuBuildPVCommandLine(vm, cmd); break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, sec->sectype); diff --git a/src/qemu/qemu_firmware.c b/src/qemu/qemu_firmware.c index d39e61d071..b073a38bfc 100644 --- a/src/qemu/qemu_firmware.c +++ b/src/qemu/qemu_firmware.c @@ -1374,6 +1374,7 @@ qemuFirmwareMatchDomain(const virDomainDef *def, } break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c index 915d44310f..ac8a4b5c07 100644 --- a/src/qemu/qemu_namespace.c +++ b/src/qemu/qemu_namespace.c @@ -660,6 +660,7 @@ qemuDomainSetupLaunchSecurity(virDomainObj *vm, VIR_DEBUG("Set up launch security for SEV"); break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1ef032dbd2..f27f6653f5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -6744,6 +6744,7 @@ qemuProcessPrepareLaunchSecurityGuestInput(virDomainObj *vm) case VIR_DOMAIN_LAUNCH_SECURITY_SEV: return qemuProcessPrepareSEVGuestInput(vm); case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: return 0; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 93df9e4c8e..af630796cd 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1322,6 +1322,7 @@ qemuValidateDomainDef(const virDomainDef *def, return -1; } break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, def->sec->sectype); -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:14PM +0800, Zhenzhong Duan wrote:
When 'tdx' is used, the VM will launched with Intel TDX feature enabled. TDX feature supports running encrypted VM (Trust Domain, TD) under the control of KVM. A TD runs in a CPU model which protects the confidentiality of its memory and its CPU state from other software
There is a child element 'policy' and four optional element for tdx type. In 'policy', bit 0 is set to enable TDX debug, bit 28 set to enable sept-ve-disable, other bits are reserved currently. mrConfigId, mrOwner and mrOwnerConfig are hex string of 48 * 2 length each. Quote-Generation-Service is string to specify Quote Generation Service(QGS) in qemu socket address format. The examples of the supported format are "vsock:2:1234", "unix:/run/qgs", "localhost:1234".
For example:
<launchSecurity type='tdx'> <policy>0x1</policy> <mrConfigId>xxx...xxx</mrConfigId> <mrOwner>xxx...xxx</mrOwner> <mrOwnerConfig>xxx...xxx</mrOwnerConfig> <Quote-Generation-Service>xxx</Quote-Generation-Service> </launchSecurity>
On the QEMU side, the quote generateo sevice is defined as '*quote-generation-socket': 'SocketAddress' we need to model 'SocktetAddress' in the XML properly, not just as an opaque string. Also given the naming for the rest of the elements, this should also use caps, eg <quoteGenerationService>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/conf/domain_conf.c | 46 +++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 10 +++++++ src/conf/schemas/domaincommon.rng | 34 +++++++++++++++++++++++ src/conf/virconftypes.h | 2 ++ src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 1 + src/qemu/qemu_validate.c | 1 +
Schema additions need something added to docs/formatdomain.rst to document them, as well as example XML added under tests/ to validate the parsing and formatting logic, and the QEMU command line args generation.
9 files changed, 98 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 80f467ae7a..08e82c5380 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1513,6 +1513,7 @@ VIR_ENUM_IMPL(virDomainLaunchSecurity, "", "sev", "s390-pv", + "tdx", );
typedef enum { @@ -3808,6 +3809,11 @@ virDomainSecDefFree(virDomainSecDef *def) g_free(def->data.sev.dh_cert); g_free(def->data.sev.session); break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + g_free(def->data.tdx.mrconfigid); + g_free(def->data.tdx.mrowner); + g_free(def->data.tdx.mrownerconfig); + g_free(def->data.tdx.QGS); case VIR_DOMAIN_LAUNCH_SECURITY_PV: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: @@ -13452,6 +13458,25 @@ virDomainSEVDefParseXML(virDomainSEVDef *def, }
+static int +virDomainTDXDefParseXML(virDomainTDXDef *def, + xmlXPathContextPtr ctxt) +{ + if (virXPathUIntBase("string(./policy)", ctxt, 16, &def->policy) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("failed to get launch security policy for launch security type TDX")); + return -1; + } + + def->mrconfigid = virXPathString("string(./mrConfigId)", ctxt); + def->mrowner = virXPathString("string(./mrOwner)", ctxt); + def->mrownerconfig = virXPathString("string(./mrOwnerConfig)", ctxt); + def->QGS = virXPathString("string(./Quote-Generation-Service)", ctxt); + + return 0; +} + + static virDomainSecDef * virDomainSecDefParseXML(xmlNodePtr lsecNode, xmlXPathContextPtr ctxt) @@ -13471,6 +13496,10 @@ virDomainSecDefParseXML(xmlNodePtr lsecNode, if (virDomainSEVDefParseXML(&sec->data.sev, ctxt) < 0) return NULL; break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + if (virDomainTDXDefParseXML(&sec->data.tdx, ctxt) < 0) + return NULL; + break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: @@ -26468,6 +26497,23 @@ virDomainSecDefFormat(virBuffer *buf, virDomainSecDef *sec) break; }
+ case VIR_DOMAIN_LAUNCH_SECURITY_TDX: { + virDomainTDXDef *tdx = &sec->data.tdx; + + virBufferAsprintf(&childBuf, "<policy>0x%04x</policy>\n", tdx->policy); + + if (tdx->mrconfigid) + virBufferEscapeString(&childBuf, "<mrConfigId>%s</mrConfigId>\n", tdx->mrconfigid); + if (tdx->mrowner) + virBufferEscapeString(&childBuf, "<mrOwner>%s</mrOwner>\n", tdx->mrowner); + if (tdx->mrownerconfig) + virBufferEscapeString(&childBuf, "<mrOwnerConfig>%s</mrOwnerConfig>\n", tdx->mrownerconfig); + if (tdx->QGS) + virBufferEscapeString(&childBuf, "<Quote-Generation-Service>%s</Quote-Generation-Service>\n", tdx->QGS); + + break; + } + case VIR_DOMAIN_LAUNCH_SECURITY_PV: break;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 98f99721f0..3b01850eb4 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2833,6 +2833,7 @@ typedef enum { VIR_DOMAIN_LAUNCH_SECURITY_NONE, VIR_DOMAIN_LAUNCH_SECURITY_SEV, VIR_DOMAIN_LAUNCH_SECURITY_PV, + VIR_DOMAIN_LAUNCH_SECURITY_TDX,
VIR_DOMAIN_LAUNCH_SECURITY_LAST, } virDomainLaunchSecurity; @@ -2849,10 +2850,19 @@ struct _virDomainSEVDef { virTristateBool kernel_hashes; };
+struct _virDomainTDXDef { + unsigned int policy; + char *mrconfigid; + char *mrowner; + char *mrownerconfig; + char *QGS; +}; + struct _virDomainSecDef { virDomainLaunchSecurity sectype; union { virDomainSEVDef sev; + virDomainTDXDef tdx; } data; };
diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index a26986b5ce..bf3667d727 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -520,6 +520,9 @@ <value>s390-pv</value> </attribute> </group> + <group> + <ref name="launchSecurityTDX"/> + </group> </choice> </element> </define> @@ -565,6 +568,37 @@ </interleave> </define>
+ <define name="launchSecurityTDX"> + <attribute name="type"> + <value>tdx</value> + </attribute> + <interleave> + <element name="policy"> + <ref name="hexuint"/> + </element> + <optional> + <element name="mrConfigId"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="mrOwner"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="mrOwnerConfig"> + <data type="string"/> + </element> + </optional> + <optional> + <element name="Quote-Generation-Service"> + <data type="string"/> + </element> + </optional> + </interleave> + </define> + <!-- Enable or disable perf events for the domain. For each of the events the following rules apply: diff --git a/src/conf/virconftypes.h b/src/conf/virconftypes.h index 26cb966194..5516165bcc 100644 --- a/src/conf/virconftypes.h +++ b/src/conf/virconftypes.h @@ -210,6 +210,8 @@ typedef struct _virDomainResourceDef virDomainResourceDef;
typedef struct _virDomainSEVDef virDomainSEVDef;
+typedef struct _virDomainTDXDef virDomainTDXDef; + typedef struct _virDomainSecDef virDomainSecDef;
typedef struct _virDomainShmemDef virDomainShmemDef; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fd0f12f304..89905378e4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -7001,6 +7001,7 @@ qemuBuildMachineCommandLine(virCommand *cmd, } break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: virBufferAddLit(&buf, ",confidential-guest-support=lsec0"); break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: @@ -9658,6 +9659,7 @@ qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, case VIR_DOMAIN_LAUNCH_SECURITY_PV: return qemuBuildPVCommandLine(vm, cmd); break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, sec->sectype); diff --git a/src/qemu/qemu_firmware.c b/src/qemu/qemu_firmware.c index d39e61d071..b073a38bfc 100644 --- a/src/qemu/qemu_firmware.c +++ b/src/qemu/qemu_firmware.c @@ -1374,6 +1374,7 @@ qemuFirmwareMatchDomain(const virDomainDef *def, } break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c index 915d44310f..ac8a4b5c07 100644 --- a/src/qemu/qemu_namespace.c +++ b/src/qemu/qemu_namespace.c @@ -660,6 +660,7 @@ qemuDomainSetupLaunchSecurity(virDomainObj *vm, VIR_DEBUG("Set up launch security for SEV"); break; case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1ef032dbd2..f27f6653f5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -6744,6 +6744,7 @@ qemuProcessPrepareLaunchSecurityGuestInput(virDomainObj *vm) case VIR_DOMAIN_LAUNCH_SECURITY_SEV: return qemuProcessPrepareSEVGuestInput(vm); case VIR_DOMAIN_LAUNCH_SECURITY_PV: + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: return 0; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 93df9e4c8e..af630796cd 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1322,6 +1322,7 @@ qemuValidateDomainDef(const virDomainDef *def, return -1; } break; + case VIR_DOMAIN_LAUNCH_SECURITY_TDX: case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, def->sec->sectype); -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 04/11] conf: add tdx as launch security type
On Mon, Nov 27, 2023 at 04:55:14PM +0800, Zhenzhong Duan wrote:
When 'tdx' is used, the VM will launched with Intel TDX feature enabled. TDX feature supports running encrypted VM (Trust Domain, TD) under the control of KVM. A TD runs in a CPU model which protects the confidentiality of its memory and its CPU state from other software
There is a child element 'policy' and four optional element for tdx type. In 'policy', bit 0 is set to enable TDX debug, bit 28 set to enable sept-ve-disable, other bits are reserved currently. mrConfigId, mrOwner and mrOwnerConfig are hex string of 48 * 2 length each. Quote-Generation-Service is string to specify Quote Generation Service(QGS) in qemu socket address format. The examples of the supported format are "vsock:2:1234", "unix:/run/qgs", "localhost:1234".
For example:
<launchSecurity type='tdx'> <policy>0x1</policy> <mrConfigId>xxx...xxx</mrConfigId> <mrOwner>xxx...xxx</mrOwner> <mrOwnerConfig>xxx...xxx</mrOwnerConfig> <Quote-Generation-Service>xxx</Quote-Generation-Service> </launchSecurity>
On the QEMU side, the quote generateo sevice is defined as
'*quote-generation-socket': 'SocketAddress'
we need to model 'SocktetAddress' in the XML properly, not just as an opaque string.
Good suggestion.
Also given the naming for the rest of the elements, this should also use caps, eg <quoteGenerationService>
Will do.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/conf/domain_conf.c | 46
+++++++++++++++++++++++++++++++
src/conf/domain_conf.h | 10 +++++++ src/conf/schemas/domaincommon.rng | 34 +++++++++++++++++++++++ src/conf/virconftypes.h | 2 ++ src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 1 + src/qemu/qemu_validate.c | 1 +
Schema additions need something added to docs/formatdomain.rst to document them, as well as example XML added under tests/ to validate the parsing and formatting logic, and the QEMU command line args generation.
Will do. Thanks Zhenzhong

QEMU will provides 'tdx-guest' object which is used to launch encrypted VMs on Intel platform using TDX feature. Command line looks like: $QEMU ... \ -object tdx-guest,id=lsec0,debug=on,sept-ve-disable=on,mrconfigid=xxx...xxx,mrowner=xxx...xxx,mrownerconfig=xxx...xxx,quote-generation-service=localhost:1234 \ -machine q35,confidential-guest-support=lsec0 Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_command.c | 27 +++++++++++++++++++++++++++ src/qemu/qemu_validate.c | 7 +++++++ 2 files changed, 34 insertions(+) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 89905378e4..45223746f5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -9645,6 +9645,32 @@ qemuBuildPVCommandLine(virDomainObj *vm, virCommand *cmd) } +static int +qemuBuildTDXCommandLine(virDomainObj *vm, virCommand *cmd, + virDomainTDXDef *tdx) +{ + g_autoptr(virJSONValue) props = NULL; + qemuDomainObjPrivate *priv = vm->privateData; + + VIR_DEBUG("policy=0x%x", tdx->policy); + + if (qemuMonitorCreateObjectProps(&props, "tdx-guest", "lsec0", + "B:debug", !!(tdx->policy & 0x1), + "b:sept-ve-disable", !!(tdx->policy & 0x10000000), + "S:mrconfigid", tdx->mrconfigid, + "S:mrowner", tdx->mrowner, + "S:mrownerconfig", tdx->mrownerconfig, + "S:quote-generation-service", tdx->QGS, + NULL) < 0) + return -1; + + if (qemuBuildObjectCommandlineFromJSON(cmd, props, priv->qemuCaps) < 0) + return -1; + + return 0; +} + + static int qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, virDomainSecDef *sec) @@ -9660,6 +9686,7 @@ qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, return qemuBuildPVCommandLine(vm, cmd); break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + return qemuBuildTDXCommandLine(vm, cmd, &sec->data.tdx); case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, sec->sectype); diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index af630796cd..5a9173e8ff 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1323,6 +1323,13 @@ qemuValidateDomainDef(const virDomainDef *def, } break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) || + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security is not supported with this QEMU binary")); + return -1; + } + break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, def->sec->sectype); -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:15PM +0800, Zhenzhong Duan wrote:
QEMU will provides 'tdx-guest' object which is used to launch encrypted VMs on Intel platform using TDX feature.
Command line looks like: $QEMU ... \ -object tdx-guest,id=lsec0,debug=on,sept-ve-disable=on,mrconfigid=xxx...xxx,mrowner=xxx...xxx,mrownerconfig=xxx...xxx,quote-generation-service=localhost:1234 \ -machine q35,confidential-guest-support=lsec0
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_command.c | 27 +++++++++++++++++++++++++++ src/qemu/qemu_validate.c | 7 +++++++ 2 files changed, 34 insertions(+)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 89905378e4..45223746f5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -9645,6 +9645,32 @@ qemuBuildPVCommandLine(virDomainObj *vm, virCommand *cmd) }
+static int +qemuBuildTDXCommandLine(virDomainObj *vm, virCommand *cmd, + virDomainTDXDef *tdx) +{ + g_autoptr(virJSONValue) props = NULL; + qemuDomainObjPrivate *priv = vm->privateData; + + VIR_DEBUG("policy=0x%x", tdx->policy); + + if (qemuMonitorCreateObjectProps(&props, "tdx-guest", "lsec0", + "B:debug", !!(tdx->policy & 0x1), + "b:sept-ve-disable", !!(tdx->policy & 0x10000000),
Here you're expanding the policy integer to named cli args. What is defining the semantics for bits in the policy integer ? What happens if other bits are set besides 0x1 and 0x10000000 - if we are not honouring those bits, we need to report an error when validating the xml
+ "S:mrconfigid", tdx->mrconfigid, + "S:mrowner", tdx->mrowner, + "S:mrownerconfig", tdx->mrownerconfig, + "S:quote-generation-service", tdx->QGS,
This is passing through QEMU JSON directly from the XML which is certainly not allowed. As mentioned in previous patch, we need to model 'SocketAddress' QAPI explicitly in the XML schema.
+ NULL) < 0) + return -1; + + if (qemuBuildObjectCommandlineFromJSON(cmd, props, priv->qemuCaps) < 0) + return -1; + + return 0; x> +} + + static int qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, virDomainSecDef *sec) @@ -9660,6 +9686,7 @@ qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, return qemuBuildPVCommandLine(vm, cmd); break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + return qemuBuildTDXCommandLine(vm, cmd, &sec->data.tdx); case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, sec->sectype); diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index af630796cd..5a9173e8ff 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1323,6 +1323,13 @@ qemuValidateDomainDef(const virDomainDef *def, } break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) ||
This isn't required as it is implied by CAPS_TDX_GUEST
+ !virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security is not supported with this QEMU binary"));
s/INTEL/Intel/
+ return -1; + }
This is where we need to valid 'policy' bits are supported.
+ break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, def->sec->sectype); -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 05/11] qemu: Add command line and validation for TDX type
On Mon, Nov 27, 2023 at 04:55:15PM +0800, Zhenzhong Duan wrote:
QEMU will provides 'tdx-guest' object which is used to launch encrypted VMs on Intel platform using TDX feature.
Command line looks like: $QEMU ... \ -object tdx-guest,id=lsec0,debug=on,sept-ve- disable=on,mrconfigid=xxx...xxx,mrowner=xxx...xxx,mrownerconfig=xxx...xxx,q uote-generation-service=localhost:1234 \ -machine q35,confidential-guest-support=lsec0
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_command.c | 27 +++++++++++++++++++++++++++ src/qemu/qemu_validate.c | 7 +++++++ 2 files changed, 34 insertions(+)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 89905378e4..45223746f5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -9645,6 +9645,32 @@ qemuBuildPVCommandLine(virDomainObj *vm, virCommand *cmd) }
+static int +qemuBuildTDXCommandLine(virDomainObj *vm, virCommand *cmd, + virDomainTDXDef *tdx) +{ + g_autoptr(virJSONValue) props = NULL; + qemuDomainObjPrivate *priv = vm->privateData; + + VIR_DEBUG("policy=0x%x", tdx->policy); + + if (qemuMonitorCreateObjectProps(&props, "tdx-guest", "lsec0", + "B:debug", !!(tdx->policy & 0x1), + "b:sept-ve-disable", !!(tdx->policy & 0x10000000),
Here you're expanding the policy integer to named cli args.
What is defining the semantics for bits in the policy integer ?
They are defined at https://github.com/intel/qemu-tdx/blob/d20f93da3197fff3a30c3fb9ebdc4303a06a3...
What happens if other bits are set besides 0x1 and 0x10000000 - if we are not honouring those bits, we need to report an error when validating the xml
Currently other bits are ignored, will fix it.
+ "S:mrconfigid", tdx->mrconfigid, + "S:mrowner", tdx->mrowner, + "S:mrownerconfig", tdx->mrownerconfig, + "S:quote-generation-service", tdx->QGS,
This is passing through QEMU JSON directly from the XML which is certainly not allowed. As mentioned in previous patch, we need to model 'SocketAddress' QAPI explicitly in the XML schema.
Will do.
+ NULL) < 0) + return -1; + + if (qemuBuildObjectCommandlineFromJSON(cmd, props, priv- qemuCaps) < 0) + return -1; + + return 0; x> +} + + static int qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, virDomainSecDef *sec) @@ -9660,6 +9686,7 @@ qemuBuildSecCommandLine(virDomainObj *vm, virCommand *cmd, return qemuBuildPVCommandLine(vm, cmd); break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + return qemuBuildTDXCommandLine(vm, cmd, &sec->data.tdx); case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: virReportEnumRangeError(virDomainLaunchSecurity, sec->sectype); diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index af630796cd..5a9173e8ff 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1323,6 +1323,13 @@ qemuValidateDomainDef(const virDomainDef *def, } break; case VIR_DOMAIN_LAUNCH_SECURITY_TDX: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT) ||
This isn't required as it is implied by CAPS_TDX_GUEST
Will remove.
+ !virQEMUCapsGet(qemuCaps, QEMU_CAPS_TDX_GUEST)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security is not supported with this QEMU binary"));
s/INTEL/Intel/
Will fix.
+ return -1; + }
This is where we need to valid 'policy' bits are supported.
Got it, will do. Thanks Zhenzhong

TDX guest requires some special parameters to boot, They are: "-machine pc-q35-*" "kernel_irqchip=split" Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_validate.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 5a9173e8ff..c4f386fe99 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1329,6 +1329,16 @@ qemuValidateDomainDef(const virDomainDef *def, _("INTEL TDX launch security is not supported with this QEMU binary")); return -1; } + if (!qemuDomainIsQ35(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Intel TDX is supported with q35 machine types only")); + return -1; + } + if (def->features[VIR_DOMAIN_FEATURE_IOAPIC] != VIR_DOMAIN_IOAPIC_QEMU) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security needs split kernel irqchip")); + return -1; + } break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:16PM +0800, Zhenzhong Duan wrote:
TDX guest requires some special parameters to boot, They are:
"-machine pc-q35-*" "kernel_irqchip=split"
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_validate.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 5a9173e8ff..c4f386fe99 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1329,6 +1329,16 @@ qemuValidateDomainDef(const virDomainDef *def, _("INTEL TDX launch security is not supported with this QEMU binary")); return -1; } + if (!qemuDomainIsQ35(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Intel TDX is supported with q35 machine types only")); + return -1; + }
Ideally QMP 'MachineInfo' struct would report whether TDX is supported so we don't need to hardcode that.
+ if (def->features[VIR_DOMAIN_FEATURE_IOAPIC] != VIR_DOMAIN_IOAPIC_QEMU) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security needs split kernel irqchip"));
s/INTEL/Intel/ Ideally QEMU would automatically use the correct ioapic impl when no args are given to QEMU. That would let us do if (def->features[VIR_DOMAIN_FEATURE_IOAPIC] == VIR_DOMAIN_IOAPIC_KVM) { thus allowing IOAPIC_NONE (ie QEMU's default) or IOAPIC_QEMU (explicitly requested config). This will make TDX guest "just work" in more scenarios.
+ return -1; + } break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 06/11] qemu: force special parameters enabled for TDX guest
On Mon, Nov 27, 2023 at 04:55:16PM +0800, Zhenzhong Duan wrote:
TDX guest requires some special parameters to boot, They are:
"-machine pc-q35-*" "kernel_irqchip=split"
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_validate.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 5a9173e8ff..c4f386fe99 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1329,6 +1329,16 @@ qemuValidateDomainDef(const virDomainDef *def, _("INTEL TDX launch security is not supported with this QEMU binary")); return -1; } + if (!qemuDomainIsQ35(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Intel TDX is supported with q35 machine types only")); + return -1; + }
Ideally QMP 'MachineInfo' struct would report whether TDX is supported so we don't need to hardcode that.
As you suggested in previous mails, I'll remove Q35 check.
+ if (def->features[VIR_DOMAIN_FEATURE_IOAPIC] != VIR_DOMAIN_IOAPIC_QEMU) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("INTEL TDX launch security needs split kernel irqchip"));
s/INTEL/Intel/
Ideally QEMU would automatically use the correct ioapic impl when no args are given to QEMU. That would let us do
if (def->features[VIR_DOMAIN_FEATURE_IOAPIC] == VIR_DOMAIN_IOAPIC_KVM) {
thus allowing IOAPIC_NONE (ie QEMU's default) or IOAPIC_QEMU (explicitly requested config). This will make TDX guest "just work" in more scenarios.
It looks the matching QEMU doesn't do this automation for kernel-irqchip yet. @Li, Xiaoyao could you add this automation on QEMU side? Meanwhile I'll apply Daniel's suggested change on libvirt side. Thanks Zhenzhong
+ return -1; + } break; case VIR_DOMAIN_LAUNCH_SECURITY_NONE: case VIR_DOMAIN_LAUNCH_SECURITY_LAST: -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

From: Chenyi Qiang <chenyi.qiang@intel.com> Add the new flag VIR_DOMAIN_REBOOT_HARD/VIR_DOMAIN_SHUTDOWN_HARD to carry out a hard reboot, which kills the QEMU process and creates a new one with the same definition. Hard reboot will be the highest priority to check. If succeed, other reboot policy (i.e. agent and acpi) would be skipped. Otherwise, use the traditional way. To make the shutdown graceful, the workflow of hard reboot is similar to the existing fakeReboot procedure. - In qemuDomainObjPrivatePtr we have a bool hardReboot flag. - The virDomainReboot method sets this flag and then triggers a normal "system_powerdown" if we specify the reboot mode "--mode hard" - The QEMU process is started with '-no-shutdown' so that the guest CPUs pause when it powers off the guest. - When we receive the "SHUTDOWN" event from QEMU monitor, if hardReboot is not set, then check fakeReboot or the last choice to invoke qemuProcessKill command to shutdown normally. - If hardReboot was set, we spawn a background thread, which issues qemuProcessStop to kill the QEMU process and cleanup the working environment. Then issue qemuProcessStart to create a new QEMU process with the same domain definition. At last, issues 'cont' to start the CPUs again. The state transfer during the hardReboot is RUNNING->SHUTDOWN->SHUTOFF->PAUSED->RUNNING, This patch doesn't hide it as the states is not transient. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> --- include/libvirt/libvirt-domain.h | 2 + src/qemu/qemu_domain.c | 18 +++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 76 +++++++++++++++++++++------ src/qemu/qemu_process.c | 88 +++++++++++++++++++++++++++++++- tools/virsh-domain.c | 8 ++- 6 files changed, 177 insertions(+), 19 deletions(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index a1902546bb..5cedbbd1fc 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1520,6 +1520,7 @@ typedef enum { VIR_DOMAIN_SHUTDOWN_INITCTL = (1 << 2), /* Use initctl (Since: 1.0.1) */ VIR_DOMAIN_SHUTDOWN_SIGNAL = (1 << 3), /* Send a signal (Since: 1.0.1) */ VIR_DOMAIN_SHUTDOWN_PARAVIRT = (1 << 4), /* Use paravirt guest control (Since: 1.2.5) */ + VIR_DOMAIN_SHUTDOWN_HARD = (1 << 5), /* Powercycle control (Since: 8.6.0) */ } virDomainShutdownFlagValues; int virDomainShutdown (virDomainPtr domain); @@ -1538,6 +1539,7 @@ typedef enum { VIR_DOMAIN_REBOOT_INITCTL = (1 << 2), /* Use initctl (Since: 1.0.1) */ VIR_DOMAIN_REBOOT_SIGNAL = (1 << 3), /* Send a signal (Since: 1.0.1) */ VIR_DOMAIN_REBOOT_PARAVIRT = (1 << 4), /* Use paravirt guest control (Since: 1.2.5) */ + VIR_DOMAIN_REBOOT_HARD = (1 << 5), /* Powercycle control (Since: 8.6.0) */ } virDomainRebootFlagValues; int virDomainReboot (virDomainPtr domain, diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ae19ce884b..b65d85f91b 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2614,6 +2614,9 @@ qemuDomainObjPrivateXMLFormat(virBuffer *buf, if (priv->fakeReboot) virBufferAddLit(buf, "<fakereboot/>\n"); + if (priv->hardReboot) + virBufferAddLit(buf, "<hardreboot/>\n"); + if (priv->qemuDevices && *priv->qemuDevices) { char **tmp = priv->qemuDevices; virBufferAddLit(buf, "<devices>\n"); @@ -3305,6 +3308,8 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1; + priv->hardReboot = virXPathBoolean("boolean(./hardreboot)", ctxt) == 1; + if ((n = virXPathNodeSet("./devices/device", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to parse qemu device list")); @@ -7445,6 +7450,19 @@ qemuDomainSetFakeReboot(virDomainObj *vm, qemuDomainSaveStatus(vm); } +void +qemuDomainSetHardReboot(virDomainObj *vm, + bool value) +{ + qemuDomainObjPrivate *priv = vm->privateData; + + if (priv->hardReboot == value) + return; + + priv->hardReboot = value; + qemuDomainSaveStatus(vm); +} + static void qemuDomainCheckRemoveOptionalDisk(virQEMUDriver *driver, virDomainObj *vm, diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 1e56e50672..fd6058c5bc 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -137,6 +137,7 @@ struct _qemuDomainObjPrivate { * -no-reboot instead. */ virTristateBool allowReboot; + bool hardReboot; unsigned long migMaxBandwidth; char *origname; @@ -700,6 +701,9 @@ qemuDomainRemoveInactiveLocked(virQEMUDriver *driver, void qemuDomainSetFakeReboot(virDomainObj *vm, bool value); +void qemuDomainSetHardReboot(virDomainObj *vm, + bool value); + int qemuDomainCheckDiskStartupPolicy(virQEMUDriver *driver, virDomainObj *vm, size_t diskIndex, diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d00d2a27c6..86e8efbfcb 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1788,6 +1788,7 @@ qemuDomainShutdownFlagsAgent(virDomainObj *vm, goto endjob; qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false); agent = qemuDomainObjEnterAgent(vm); ret = qemuAgentShutdown(agent, agentFlag); qemuDomainObjExitAgent(vm, agent); @@ -1800,7 +1801,8 @@ qemuDomainShutdownFlagsAgent(virDomainObj *vm, static int qemuDomainShutdownFlagsMonitor(virDomainObj *vm, - bool isReboot) + bool isReboot, + bool hard) { int ret = -1; qemuDomainObjPrivate *priv; @@ -1816,7 +1818,17 @@ qemuDomainShutdownFlagsMonitor(virDomainObj *vm, goto endjob; } - qemuDomainSetFakeReboot(vm, isReboot); + if (hard) { + /* hard reboot control the reboot */ + VIR_DEBUG("Set hard reboot %d in shutdown monitor", isReboot); + qemuDomainSetHardReboot(vm, isReboot); + qemuDomainSetFakeReboot(vm, false); + } else { + /* fake reboot control the reboot */ + VIR_DEBUG("Set fake reboot %d in shutdown monitor", isReboot); + qemuDomainSetFakeReboot(vm, isReboot); + qemuDomainSetHardReboot(vm, false); + } qemuDomainObjEnterMonitor(vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(vm); @@ -1832,12 +1844,13 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) virDomainObj *vm; int ret = -1; qemuDomainObjPrivate *priv; - bool useAgent = false, agentRequested, acpiRequested; + bool useAgent = false, agentRequested, acpiRequested, hardRequested; bool isReboot = false; bool agentForced; virCheckFlags(VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN | - VIR_DOMAIN_SHUTDOWN_GUEST_AGENT, -1); + VIR_DOMAIN_SHUTDOWN_GUEST_AGENT | + VIR_DOMAIN_SHUTDOWN_HARD, -1); if (!(vm = qemuDomainObjFromDomain(dom))) goto cleanup; @@ -1851,14 +1864,23 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) priv = vm->privateData; agentRequested = flags & VIR_DOMAIN_SHUTDOWN_GUEST_AGENT; acpiRequested = flags & VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN; + hardRequested = flags & VIR_DOMAIN_SHUTDOWN_HARD; + + if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0) + goto cleanup; + + /* Take hard reboot as the highest priority. + * if failed, consider the agent and acpi */ + if (hardRequested) + ret = qemuDomainShutdownFlagsMonitor(vm, isReboot, true); + + if (!ret) + goto cleanup; /* Prefer agent unless we were requested to not to. */ if (agentRequested || (!flags && priv->agent)) useAgent = true; - if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0) - goto cleanup; - agentForced = agentRequested && !acpiRequested; if (useAgent) { ret = qemuDomainShutdownFlagsAgent(vm, isReboot, agentForced); @@ -1877,7 +1899,7 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) goto cleanup; } - ret = qemuDomainShutdownFlagsMonitor(vm, isReboot); + ret = qemuDomainShutdownFlagsMonitor(vm, isReboot, false); } cleanup: @@ -1913,6 +1935,7 @@ qemuDomainRebootAgent(virDomainObj *vm, goto endjob; qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false); agent = qemuDomainObjEnterAgent(vm); ret = qemuAgentShutdown(agent, agentFlag); qemuDomainObjExitAgent(vm, agent); @@ -1925,7 +1948,8 @@ qemuDomainRebootAgent(virDomainObj *vm, static int qemuDomainRebootMonitor(virDomainObj *vm, - bool isReboot) + bool isReboot, + bool hard) { qemuDomainObjPrivate *priv = vm->privateData; int ret = -1; @@ -1936,7 +1960,15 @@ qemuDomainRebootMonitor(virDomainObj *vm, if (virDomainObjCheckActive(vm) < 0) goto endjob; - qemuDomainSetFakeReboot(vm, isReboot); + if (hard) { + VIR_DEBUG("Set hard reboot %d in reboot monitor", isReboot); + qemuDomainSetHardReboot(vm, isReboot); + qemuDomainSetFakeReboot(vm, false); + } else { + VIR_DEBUG("Set fake reboot %d in reboot monitor", isReboot); + qemuDomainSetFakeReboot(vm, isReboot); + qemuDomainSetHardReboot(vm, false); + } qemuDomainObjEnterMonitor(vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(vm); @@ -1953,12 +1985,13 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) virDomainObj *vm; int ret = -1; qemuDomainObjPrivate *priv; - bool useAgent = false, agentRequested, acpiRequested; + bool useAgent = false, agentRequested, acpiRequested, hardRequested; bool isReboot = true; bool agentForced; virCheckFlags(VIR_DOMAIN_REBOOT_ACPI_POWER_BTN | - VIR_DOMAIN_REBOOT_GUEST_AGENT, -1); + VIR_DOMAIN_REBOOT_GUEST_AGENT | + VIR_DOMAIN_REBOOT_HARD, -1); if (!(vm = qemuDomainObjFromDomain(dom))) goto cleanup; @@ -1972,14 +2005,24 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) priv = vm->privateData; agentRequested = flags & VIR_DOMAIN_REBOOT_GUEST_AGENT; acpiRequested = flags & VIR_DOMAIN_REBOOT_ACPI_POWER_BTN; + hardRequested = flags & VIR_DOMAIN_REBOOT_HARD; + + if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0) + goto cleanup; + + /* Take hard reboot as the highest priority. + * This is for TDX which is not allowed to warm reboot. + */ + if (hardRequested) + ret = qemuDomainRebootMonitor(vm, isReboot, true); + + if (!ret) + goto cleanup; /* Prefer agent unless we were requested to not to. */ if (agentRequested || (!flags && priv->agent)) useAgent = true; - if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0) - goto cleanup; - agentForced = agentRequested && !acpiRequested; if (useAgent) ret = qemuDomainRebootAgent(vm, isReboot, agentForced); @@ -1992,7 +2035,7 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) */ if ((!useAgent) || (ret < 0 && (acpiRequested || !flags))) { - ret = qemuDomainRebootMonitor(vm, isReboot); + ret = qemuDomainRebootMonitor(vm, isReboot, false); } cleanup: @@ -2095,6 +2138,7 @@ qemuDomainDestroyFlags(virDomainPtr dom, } qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false); if (vm->job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f27f6653f5..9fa679e408 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -502,13 +502,98 @@ qemuProcessFakeReboot(void *opaque) virDomainObjEndAPI(&vm); } +static void +qemuProcessHardReboot(void *opaque) +{ + virDomainObj *vm = opaque; + qemuDomainObjPrivate *priv = vm->privateData; + virQEMUDriver *driver = priv->driver; + unsigned int stopFlags = 0; + virObjectEvent *event = NULL; + virObjectEvent * event2 = NULL; + + VIR_DEBUG("Handle hard reboot: destroy phase"); + + virObjectLock(vm); + if (virDomainObjCheckActive(vm) < 0) + goto cleanup; + + if (qemuProcessBeginStopJob(vm, VIR_JOB_DESTROY, 0) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + virDomainObjEndJob(vm); + goto cleanup; + } + + if (vm->job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) + stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED; + + qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED, + VIR_ASYNC_JOB_NONE, stopFlags); + virDomainAuditStop(vm, "destroyed"); + + event = virDomainEventLifecycleNewFromObj(vm, + VIR_DOMAIN_EVENT_STOPPED, + VIR_DOMAIN_EVENT_STOPPED_DESTROYED); + + virObjectEventStateQueue(driver->domainEventState, event); + /* skip remove inactive domain from active list */ + virDomainObjEndJob(vm); + + VIR_DEBUG("Handle hard reboot: boot phase"); + + if (qemuProcessBeginJob(vm, VIR_DOMAIN_JOB_OPERATION_START, + 0) < 0) { + qemuDomainRemoveInactive(driver, vm, 0, false); + goto cleanup; + } + + if (qemuProcessStart(NULL, driver, vm, NULL, VIR_ASYNC_JOB_START, + NULL, -1, NULL, NULL, + VIR_NETDEV_VPORT_PROFILE_OP_CREATE, + 0) < 0) { + virDomainAuditStart(vm, "booted", false); + qemuDomainRemoveInactive(driver, vm, 0, false); + qemuProcessEndJob(vm); + goto cleanup; + } + + virDomainAuditStart(vm, "booted", true); + event2 = virDomainEventLifecycleNewFromObj(vm, + VIR_DOMAIN_EVENT_STARTED, + VIR_DOMAIN_EVENT_STARTED_BOOTED); + virObjectEventStateQueue(driver->domainEventState, event2); + + qemuProcessEndJob(vm); + + cleanup: + virDomainObjEndAPI(&vm); +} void qemuProcessShutdownOrReboot(virDomainObj *vm) { qemuDomainObjPrivate *priv = vm->privateData; - if (priv->fakeReboot || + if (priv->hardReboot) { + g_autofree char *name = g_strdup_printf("hard reboot-%s", vm->def->name); + virThread th; + + virObjectRef(vm); + if (virThreadCreateFull(&th, + false, + qemuProcessHardReboot, + name, + false, + vm) < 0) { + VIR_ERROR(_("Failed to create hard reboot thread, killing domain")); + ignore_value(qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_NOWAIT)); + virObjectUnref(vm); + } + } else if (priv->fakeReboot || vm->def->onPoweroff == VIR_DOMAIN_LIFECYCLE_ACTION_RESTART) { g_autofree char *name = g_strdup_printf("reboot-%s", vm->def->name); virThread th; @@ -5673,6 +5758,7 @@ qemuProcessInit(virQEMUDriver *driver, } else { vm->def->id = qemuDriverAllocateID(driver); qemuDomainSetFakeReboot(vm, false); + qemuDomainSetFakeReboot(vm, false); virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_STARTING_UP); if (g_atomic_int_add(&driver->nactive, 1) == 0 && driver->inhibitCallback) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 66f933dead..21864c92cb 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -5915,7 +5915,7 @@ static const vshCmdOptDef opts_shutdown[] = { {.name = "mode", .type = VSH_OT_STRING, .completer = virshDomainShutdownModeCompleter, - .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt") + .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt|hard") }, {.name = NULL} }; @@ -5952,6 +5952,8 @@ cmdShutdown(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SHUTDOWN_SIGNAL; } else if (STREQ(mode, "paravirt")) { flags |= VIR_DOMAIN_SHUTDOWN_PARAVIRT; + } else if (STREQ(mode, "hard")) { + flags |= VIR_DOMAIN_SHUTDOWN_HARD; } else { vshError(ctl, _("Unknown mode %1$s value, expecting 'acpi', 'agent', 'initctl', 'signal', or 'paravirt'"), mode); @@ -5995,7 +5997,7 @@ static const vshCmdOptDef opts_reboot[] = { {.name = "mode", .type = VSH_OT_STRING, .completer = virshDomainShutdownModeCompleter, - .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt") + .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt|hard") }, {.name = NULL} }; @@ -6031,6 +6033,8 @@ cmdReboot(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_REBOOT_SIGNAL; } else if (STREQ(mode, "paravirt")) { flags |= VIR_DOMAIN_REBOOT_PARAVIRT; + } else if (STREQ(mode, "hard")) { + flags |= VIR_DOMAIN_REBOOT_HARD; } else { vshError(ctl, _("Unknown mode %1$s value, expecting 'acpi', 'agent', 'initctl', 'signal' or 'paravirt'"), mode); -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:17PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Add the new flag VIR_DOMAIN_REBOOT_HARD/VIR_DOMAIN_SHUTDOWN_HARD to carry out a hard reboot, which kills the QEMU process and creates a new one with the same definition.
AFAICT, the _HARD flags cause it to issue a QEMU monitor command todo an ACPI shutdown and then re-create QEMU. IOW, it seems like this is just the same as the existing _ACPI flags, and so I'm not sure why we need any new functionality at all. What is the existing ACPI shutdown & fake reboot not able to achieve ?
Hard reboot will be the highest priority to check. If succeed, other reboot policy (i.e. agent and acpi) would be skipped. Otherwise, use the traditional way.
To make the shutdown graceful, the workflow of hard reboot is similar to the existing fakeReboot procedure.
- In qemuDomainObjPrivatePtr we have a bool hardReboot flag. - The virDomainReboot method sets this flag and then triggers a normal "system_powerdown" if we specify the reboot mode "--mode hard" - The QEMU process is started with '-no-shutdown' so that the guest CPUs pause when it powers off the guest. - When we receive the "SHUTDOWN" event from QEMU monitor, if hardReboot is not set, then check fakeReboot or the last choice to invoke qemuProcessKill command to shutdown normally. - If hardReboot was set, we spawn a background thread, which issues qemuProcessStop to kill the QEMU process and cleanup the working environment. Then issue qemuProcessStart to create a new QEMU process with the same domain definition. At last, issues 'cont' to start the CPUs again.
The state transfer during the hardReboot is RUNNING->SHUTDOWN->SHUTOFF->PAUSED->RUNNING, This patch doesn't hide it as the states is not transient.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> --- include/libvirt/libvirt-domain.h | 2 + src/qemu/qemu_domain.c | 18 +++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 76 +++++++++++++++++++++------ src/qemu/qemu_process.c | 88 +++++++++++++++++++++++++++++++- tools/virsh-domain.c | 8 ++- 6 files changed, 177 insertions(+), 19 deletions(-)
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index a1902546bb..5cedbbd1fc 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1520,6 +1520,7 @@ typedef enum { VIR_DOMAIN_SHUTDOWN_INITCTL = (1 << 2), /* Use initctl (Since: 1.0.1) */ VIR_DOMAIN_SHUTDOWN_SIGNAL = (1 << 3), /* Send a signal (Since: 1.0.1) */ VIR_DOMAIN_SHUTDOWN_PARAVIRT = (1 << 4), /* Use paravirt guest control (Since: 1.2.5) */ + VIR_DOMAIN_SHUTDOWN_HARD = (1 << 5), /* Powercycle control (Since: 8.6.0) */ } virDomainShutdownFlagValues;
int virDomainShutdown (virDomainPtr domain); @@ -1538,6 +1539,7 @@ typedef enum { VIR_DOMAIN_REBOOT_INITCTL = (1 << 2), /* Use initctl (Since: 1.0.1) */ VIR_DOMAIN_REBOOT_SIGNAL = (1 << 3), /* Send a signal (Since: 1.0.1) */ VIR_DOMAIN_REBOOT_PARAVIRT = (1 << 4), /* Use paravirt guest control (Since: 1.2.5) */ + VIR_DOMAIN_REBOOT_HARD = (1 << 5), /* Powercycle control (Since: 8.6.0) */ } virDomainRebootFlagValues;
int virDomainReboot (virDomainPtr domain, diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ae19ce884b..b65d85f91b 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2614,6 +2614,9 @@ qemuDomainObjPrivateXMLFormat(virBuffer *buf, if (priv->fakeReboot) virBufferAddLit(buf, "<fakereboot/>\n");
+ if (priv->hardReboot) + virBufferAddLit(buf, "<hardreboot/>\n"); + if (priv->qemuDevices && *priv->qemuDevices) { char **tmp = priv->qemuDevices; virBufferAddLit(buf, "<devices>\n"); @@ -3305,6 +3308,8 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt,
priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1;
+ priv->hardReboot = virXPathBoolean("boolean(./hardreboot)", ctxt) == 1; + if ((n = virXPathNodeSet("./devices/device", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to parse qemu device list")); @@ -7445,6 +7450,19 @@ qemuDomainSetFakeReboot(virDomainObj *vm, qemuDomainSaveStatus(vm); }
+void +qemuDomainSetHardReboot(virDomainObj *vm, + bool value) +{ + qemuDomainObjPrivate *priv = vm->privateData; + + if (priv->hardReboot == value) + return; + + priv->hardReboot = value; + qemuDomainSaveStatus(vm); +} + static void qemuDomainCheckRemoveOptionalDisk(virQEMUDriver *driver, virDomainObj *vm, diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 1e56e50672..fd6058c5bc 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -137,6 +137,7 @@ struct _qemuDomainObjPrivate { * -no-reboot instead. */ virTristateBool allowReboot; + bool hardReboot;
unsigned long migMaxBandwidth; char *origname; @@ -700,6 +701,9 @@ qemuDomainRemoveInactiveLocked(virQEMUDriver *driver, void qemuDomainSetFakeReboot(virDomainObj *vm, bool value);
+void qemuDomainSetHardReboot(virDomainObj *vm, + bool value); + int qemuDomainCheckDiskStartupPolicy(virQEMUDriver *driver, virDomainObj *vm, size_t diskIndex, diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d00d2a27c6..86e8efbfcb 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1788,6 +1788,7 @@ qemuDomainShutdownFlagsAgent(virDomainObj *vm, goto endjob;
qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false); agent = qemuDomainObjEnterAgent(vm); ret = qemuAgentShutdown(agent, agentFlag); qemuDomainObjExitAgent(vm, agent); @@ -1800,7 +1801,8 @@ qemuDomainShutdownFlagsAgent(virDomainObj *vm,
static int qemuDomainShutdownFlagsMonitor(virDomainObj *vm, - bool isReboot) + bool isReboot, + bool hard) { int ret = -1; qemuDomainObjPrivate *priv; @@ -1816,7 +1818,17 @@ qemuDomainShutdownFlagsMonitor(virDomainObj *vm, goto endjob; }
- qemuDomainSetFakeReboot(vm, isReboot); + if (hard) { + /* hard reboot control the reboot */ + VIR_DEBUG("Set hard reboot %d in shutdown monitor", isReboot); + qemuDomainSetHardReboot(vm, isReboot); + qemuDomainSetFakeReboot(vm, false); + } else { + /* fake reboot control the reboot */ + VIR_DEBUG("Set fake reboot %d in shutdown monitor", isReboot); + qemuDomainSetFakeReboot(vm, isReboot); + qemuDomainSetHardReboot(vm, false); + } qemuDomainObjEnterMonitor(vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(vm); @@ -1832,12 +1844,13 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) virDomainObj *vm; int ret = -1; qemuDomainObjPrivate *priv; - bool useAgent = false, agentRequested, acpiRequested; + bool useAgent = false, agentRequested, acpiRequested, hardRequested; bool isReboot = false; bool agentForced;
virCheckFlags(VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN | - VIR_DOMAIN_SHUTDOWN_GUEST_AGENT, -1); + VIR_DOMAIN_SHUTDOWN_GUEST_AGENT | + VIR_DOMAIN_SHUTDOWN_HARD, -1);
if (!(vm = qemuDomainObjFromDomain(dom))) goto cleanup; @@ -1851,14 +1864,23 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) priv = vm->privateData; agentRequested = flags & VIR_DOMAIN_SHUTDOWN_GUEST_AGENT; acpiRequested = flags & VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN; + hardRequested = flags & VIR_DOMAIN_SHUTDOWN_HARD; + + if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0) + goto cleanup; + + /* Take hard reboot as the highest priority. + * if failed, consider the agent and acpi */ + if (hardRequested) + ret = qemuDomainShutdownFlagsMonitor(vm, isReboot, true); + + if (!ret) + goto cleanup;
/* Prefer agent unless we were requested to not to. */ if (agentRequested || (!flags && priv->agent)) useAgent = true;
- if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0) - goto cleanup; - agentForced = agentRequested && !acpiRequested; if (useAgent) { ret = qemuDomainShutdownFlagsAgent(vm, isReboot, agentForced); @@ -1877,7 +1899,7 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags) goto cleanup; }
- ret = qemuDomainShutdownFlagsMonitor(vm, isReboot); + ret = qemuDomainShutdownFlagsMonitor(vm, isReboot, false); }
cleanup: @@ -1913,6 +1935,7 @@ qemuDomainRebootAgent(virDomainObj *vm, goto endjob;
qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false); agent = qemuDomainObjEnterAgent(vm); ret = qemuAgentShutdown(agent, agentFlag); qemuDomainObjExitAgent(vm, agent); @@ -1925,7 +1948,8 @@ qemuDomainRebootAgent(virDomainObj *vm,
static int qemuDomainRebootMonitor(virDomainObj *vm, - bool isReboot) + bool isReboot, + bool hard) { qemuDomainObjPrivate *priv = vm->privateData; int ret = -1; @@ -1936,7 +1960,15 @@ qemuDomainRebootMonitor(virDomainObj *vm, if (virDomainObjCheckActive(vm) < 0) goto endjob;
- qemuDomainSetFakeReboot(vm, isReboot); + if (hard) { + VIR_DEBUG("Set hard reboot %d in reboot monitor", isReboot); + qemuDomainSetHardReboot(vm, isReboot); + qemuDomainSetFakeReboot(vm, false); + } else { + VIR_DEBUG("Set fake reboot %d in reboot monitor", isReboot); + qemuDomainSetFakeReboot(vm, isReboot); + qemuDomainSetHardReboot(vm, false); + } qemuDomainObjEnterMonitor(vm); ret = qemuMonitorSystemPowerdown(priv->mon); qemuDomainObjExitMonitor(vm); @@ -1953,12 +1985,13 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) virDomainObj *vm; int ret = -1; qemuDomainObjPrivate *priv; - bool useAgent = false, agentRequested, acpiRequested; + bool useAgent = false, agentRequested, acpiRequested, hardRequested; bool isReboot = true; bool agentForced;
virCheckFlags(VIR_DOMAIN_REBOOT_ACPI_POWER_BTN | - VIR_DOMAIN_REBOOT_GUEST_AGENT, -1); + VIR_DOMAIN_REBOOT_GUEST_AGENT | + VIR_DOMAIN_REBOOT_HARD, -1);
if (!(vm = qemuDomainObjFromDomain(dom))) goto cleanup; @@ -1972,14 +2005,24 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) priv = vm->privateData; agentRequested = flags & VIR_DOMAIN_REBOOT_GUEST_AGENT; acpiRequested = flags & VIR_DOMAIN_REBOOT_ACPI_POWER_BTN; + hardRequested = flags & VIR_DOMAIN_REBOOT_HARD; + + if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0) + goto cleanup; + + /* Take hard reboot as the highest priority. + * This is for TDX which is not allowed to warm reboot. + */ + if (hardRequested) + ret = qemuDomainRebootMonitor(vm, isReboot, true); + + if (!ret) + goto cleanup;
/* Prefer agent unless we were requested to not to. */ if (agentRequested || (!flags && priv->agent)) useAgent = true;
- if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0) - goto cleanup; - agentForced = agentRequested && !acpiRequested; if (useAgent) ret = qemuDomainRebootAgent(vm, isReboot, agentForced); @@ -1992,7 +2035,7 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) */ if ((!useAgent) || (ret < 0 && (acpiRequested || !flags))) { - ret = qemuDomainRebootMonitor(vm, isReboot); + ret = qemuDomainRebootMonitor(vm, isReboot, false); }
cleanup: @@ -2095,6 +2138,7 @@ qemuDomainDestroyFlags(virDomainPtr dom, }
qemuDomainSetFakeReboot(vm, false); + qemuDomainSetHardReboot(vm, false);
if (vm->job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f27f6653f5..9fa679e408 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -502,13 +502,98 @@ qemuProcessFakeReboot(void *opaque) virDomainObjEndAPI(&vm); }
+static void +qemuProcessHardReboot(void *opaque) +{ + virDomainObj *vm = opaque; + qemuDomainObjPrivate *priv = vm->privateData; + virQEMUDriver *driver = priv->driver; + unsigned int stopFlags = 0; + virObjectEvent *event = NULL; + virObjectEvent * event2 = NULL; + + VIR_DEBUG("Handle hard reboot: destroy phase"); + + virObjectLock(vm); + if (virDomainObjCheckActive(vm) < 0) + goto cleanup; + + if (qemuProcessBeginStopJob(vm, VIR_JOB_DESTROY, 0) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + virDomainObjEndJob(vm); + goto cleanup; + } + + if (vm->job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) + stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED; + + qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED, + VIR_ASYNC_JOB_NONE, stopFlags); + virDomainAuditStop(vm, "destroyed"); + + event = virDomainEventLifecycleNewFromObj(vm, + VIR_DOMAIN_EVENT_STOPPED, + VIR_DOMAIN_EVENT_STOPPED_DESTROYED); + + virObjectEventStateQueue(driver->domainEventState, event); + /* skip remove inactive domain from active list */ + virDomainObjEndJob(vm); + + VIR_DEBUG("Handle hard reboot: boot phase"); + + if (qemuProcessBeginJob(vm, VIR_DOMAIN_JOB_OPERATION_START, + 0) < 0) { + qemuDomainRemoveInactive(driver, vm, 0, false); + goto cleanup; + } + + if (qemuProcessStart(NULL, driver, vm, NULL, VIR_ASYNC_JOB_START, + NULL, -1, NULL, NULL, + VIR_NETDEV_VPORT_PROFILE_OP_CREATE, + 0) < 0) { + virDomainAuditStart(vm, "booted", false); + qemuDomainRemoveInactive(driver, vm, 0, false); + qemuProcessEndJob(vm); + goto cleanup; + } + + virDomainAuditStart(vm, "booted", true); + event2 = virDomainEventLifecycleNewFromObj(vm, + VIR_DOMAIN_EVENT_STARTED, + VIR_DOMAIN_EVENT_STARTED_BOOTED); + virObjectEventStateQueue(driver->domainEventState, event2); + + qemuProcessEndJob(vm); + + cleanup: + virDomainObjEndAPI(&vm); +}
void qemuProcessShutdownOrReboot(virDomainObj *vm) { qemuDomainObjPrivate *priv = vm->privateData;
- if (priv->fakeReboot || + if (priv->hardReboot) { + g_autofree char *name = g_strdup_printf("hard reboot-%s", vm->def->name); + virThread th; + + virObjectRef(vm); + if (virThreadCreateFull(&th, + false, + qemuProcessHardReboot, + name, + false, + vm) < 0) { + VIR_ERROR(_("Failed to create hard reboot thread, killing domain")); + ignore_value(qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_NOWAIT)); + virObjectUnref(vm); + } + } else if (priv->fakeReboot || vm->def->onPoweroff == VIR_DOMAIN_LIFECYCLE_ACTION_RESTART) { g_autofree char *name = g_strdup_printf("reboot-%s", vm->def->name); virThread th; @@ -5673,6 +5758,7 @@ qemuProcessInit(virQEMUDriver *driver, } else { vm->def->id = qemuDriverAllocateID(driver); qemuDomainSetFakeReboot(vm, false); + qemuDomainSetFakeReboot(vm, false); virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_STARTING_UP);
if (g_atomic_int_add(&driver->nactive, 1) == 0 && driver->inhibitCallback) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 66f933dead..21864c92cb 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -5915,7 +5915,7 @@ static const vshCmdOptDef opts_shutdown[] = { {.name = "mode", .type = VSH_OT_STRING, .completer = virshDomainShutdownModeCompleter, - .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt") + .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt|hard") }, {.name = NULL} }; @@ -5952,6 +5952,8 @@ cmdShutdown(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SHUTDOWN_SIGNAL; } else if (STREQ(mode, "paravirt")) { flags |= VIR_DOMAIN_SHUTDOWN_PARAVIRT; + } else if (STREQ(mode, "hard")) { + flags |= VIR_DOMAIN_SHUTDOWN_HARD; } else { vshError(ctl, _("Unknown mode %1$s value, expecting 'acpi', 'agent', 'initctl', 'signal', or 'paravirt'"), mode); @@ -5995,7 +5997,7 @@ static const vshCmdOptDef opts_reboot[] = { {.name = "mode", .type = VSH_OT_STRING, .completer = virshDomainShutdownModeCompleter, - .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt") + .help = N_("shutdown mode: acpi|agent|initctl|signal|paravirt|hard") }, {.name = NULL} }; @@ -6031,6 +6033,8 @@ cmdReboot(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_REBOOT_SIGNAL; } else if (STREQ(mode, "paravirt")) { flags |= VIR_DOMAIN_REBOOT_PARAVIRT; + } else if (STREQ(mode, "hard")) { + flags |= VIR_DOMAIN_REBOOT_HARD; } else { vshError(ctl, _("Unknown mode %1$s value, expecting 'acpi', 'agent', 'initctl', 'signal' or 'paravirt'"), mode); -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 07/11] qemu: add hard reboot in QEMU driver
On Mon, Nov 27, 2023 at 04:55:17PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Add the new flag VIR_DOMAIN_REBOOT_HARD/VIR_DOMAIN_SHUTDOWN_HARD to carry out a hard reboot, which kills the QEMU process and creates a new one with the same definition.
AFAICT, the _HARD flags cause it to issue a QEMU monitor command todo an ACPI shutdown and then re-create QEMU.
IOW, it seems like this is just the same as the existing _ACPI flags, and so I'm not sure why we need any new functionality at all.
What is the existing ACPI shutdown & fake reboot not able to achieve ?
ACPI shutdown & fake reboot send below QMP command in sequence: "system_powerdown", "system_reset", "cont". "system_reset" QMP command isn't supported by TDX for security reasons. So we have to kill old QEMU and re-create new one. Thanks Zhenzhong

On Fri, Jan 12, 2024 at 09:20:19AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 07/11] qemu: add hard reboot in QEMU driver
On Mon, Nov 27, 2023 at 04:55:17PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Add the new flag VIR_DOMAIN_REBOOT_HARD/VIR_DOMAIN_SHUTDOWN_HARD to carry out a hard reboot, which kills the QEMU process and creates a new one with the same definition.
AFAICT, the _HARD flags cause it to issue a QEMU monitor command todo an ACPI shutdown and then re-create QEMU.
IOW, it seems like this is just the same as the existing _ACPI flags, and so I'm not sure why we need any new functionality at all.
What is the existing ACPI shutdown & fake reboot not able to achieve ?
ACPI shutdown & fake reboot send below QMP command in sequence:
"system_powerdown", "system_reset", "cont".
"system_reset" QMP command isn't supported by TDX for security reasons. So we have to kill old QEMU and re-create new one.
We are still doing the 'system_powerdown' step first though, to do a graceful shutdown, before re-creating QEMU, and the powerdown is what we refer to as "ACPI". So I think we don't need new public API constants. We should still use the _ACPI flag, and just "do the right thing" with choosing between system_reset vs re-creating QEMU depending on whether TDX/SEV are active. That way the public API user experience is unchanged, only the internal impl varies. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 07/11] qemu: add hard reboot in QEMU driver
On Fri, Jan 12, 2024 at 09:20:19AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 07/11] qemu: add hard reboot in QEMU driver
On Mon, Nov 27, 2023 at 04:55:17PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Add the new flag VIR_DOMAIN_REBOOT_HARD/VIR_DOMAIN_SHUTDOWN_HARD to carry out a hard reboot, which kills the QEMU process and creates a
new
one with the same definition.
AFAICT, the _HARD flags cause it to issue a QEMU monitor command todo an ACPI shutdown and then re-create QEMU.
IOW, it seems like this is just the same as the existing _ACPI flags, and so I'm not sure why we need any new functionality at all.
What is the existing ACPI shutdown & fake reboot not able to achieve ?
ACPI shutdown & fake reboot send below QMP command in sequence:
"system_powerdown", "system_reset", "cont".
"system_reset" QMP command isn't supported by TDX for security reasons. So we have to kill old QEMU and re-create new one.
We are still doing the 'system_powerdown' step first though, to do a graceful shutdown, before re-creating QEMU, and the powerdown is what we refer to as "ACPI".
So I think we don't need new public API constants. We should still use the _ACPI flag, and just "do the right thing" with choosing between system_reset vs re-creating QEMU depending on whether TDX/SEV are active.
Get your point.
That way the public API user experience is unchanged, only the internal impl varies.
Yes, really good suggestion, I'll follow this way in next version. Thanks Zhenzhong

From: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> --- src/qemu/qemu_driver.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 86e8efbfcb..ba1bb4ecb1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2013,7 +2013,9 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) /* Take hard reboot as the highest priority. * This is for TDX which is not allowed to warm reboot. */ - if (hardRequested) + if ((vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) || + hardRequested) ret = qemuDomainRebootMonitor(vm, isReboot, true); if (!ret) @@ -3617,7 +3619,12 @@ processGuestPanicEvent(virQEMUDriver *driver, G_GNUC_FALLTHROUGH; case VIR_DOMAIN_LIFECYCLE_ACTION_RESTART: - qemuDomainSetFakeReboot(vm, true); + if (vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) { + qemuDomainSetHardReboot(vm, true); + } else { + qemuDomainSetFakeReboot(vm, true); + } qemuProcessShutdownOrReboot(vm); break; -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:18PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> --- src/qemu/qemu_driver.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 86e8efbfcb..ba1bb4ecb1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2013,7 +2013,9 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) /* Take hard reboot as the highest priority. * This is for TDX which is not allowed to warm reboot. */ - if (hardRequested) + if ((vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) || + hardRequested) ret = qemuDomainRebootMonitor(vm, isReboot, true);
if (!ret) @@ -3617,7 +3619,12 @@ processGuestPanicEvent(virQEMUDriver *driver, G_GNUC_FALLTHROUGH;
case VIR_DOMAIN_LIFECYCLE_ACTION_RESTART: - qemuDomainSetFakeReboot(vm, true); + if (vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) { + qemuDomainSetHardReboot(vm, true); + } else { + qemuDomainSetFakeReboot(vm, true); + } qemuProcessShutdownOrReboot(vm);
I suspect we'll need todo this with SEV too, since IIUC it does not permit soft reboot either.
break;
-- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 08/11] qemu: make hard reboot as the TDX default reboot mode
On Mon, Nov 27, 2023 at 04:55:18PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> --- src/qemu/qemu_driver.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 86e8efbfcb..ba1bb4ecb1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2013,7 +2013,9 @@ qemuDomainReboot(virDomainPtr dom, unsigned int flags) /* Take hard reboot as the highest priority. * This is for TDX which is not allowed to warm reboot. */ - if (hardRequested) + if ((vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) || + hardRequested) ret = qemuDomainRebootMonitor(vm, isReboot, true);
if (!ret) @@ -3617,7 +3619,12 @@ processGuestPanicEvent(virQEMUDriver *driver, G_GNUC_FALLTHROUGH;
case VIR_DOMAIN_LIFECYCLE_ACTION_RESTART: - qemuDomainSetFakeReboot(vm, true); + if (vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX) { + qemuDomainSetHardReboot(vm, true); + } else { + qemuDomainSetFakeReboot(vm, true); + } qemuProcessShutdownOrReboot(vm);
I suspect we'll need todo this with SEV too, since IIUC it does not permit soft reboot either.
Yes, unluckily I have no SEV env to see if hard reboot also works for SEV. Thanks Zhenzhong

With hard reboot, we can reboot a TDX guest with 'virsh reboot' or 'virsh shutdown' if action for onPoweroff is 'restart'. But running reboot cmd in guest shell will always lead to shutdown. This behavior is not consistent with legacy guest, this patch extend hard reboot support and make TDX guest behavior same as legacy guest. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/qemu/qemu_monitor.c | 19 ++++++++++++++++++- src/qemu/qemu_monitor.h | 2 +- src/qemu/qemu_monitor_json.c | 6 +++++- src/qemu/qemu_process.c | 1 + 4 files changed, 25 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 320729f067..c250edf8be 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1084,10 +1084,27 @@ qemuMonitorEmitEvent(qemuMonitor *mon, const char *event, void -qemuMonitorEmitShutdown(qemuMonitor *mon, virTristateBool guest) +qemuMonitorEmitShutdown(qemuMonitor *mon, virTristateBool guest, + const char *reason) { + virDomainObj *vm = mon->vm; + qemuDomainObjPrivate *priv = vm->privateData; + VIR_DEBUG("mon=%p guest=%u", mon, guest); + /* This isn't a proper place to set hardReboot but we need to access + * mon->vm which is defined in this file. Reboot in guest kernel will + * trigger SHUTDOWN event for td-guest, so we has to deal with it + * here. */ + if (vm->def->sec && + vm->def->sec->sectype == VIR_DOMAIN_LAUNCH_SECURITY_TDX && + ((STREQ_NULLABLE(reason, "guest-shutdown") && + vm->def->onPoweroff == VIR_DOMAIN_LIFECYCLE_ACTION_RESTART) || + (STREQ_NULLABLE(reason, "guest-reset") && + vm->def->onReboot == VIR_DOMAIN_LIFECYCLE_ACTION_RESTART))) { + priv->hardReboot = true; + } + QEMU_MONITOR_CALLBACK(mon, domainShutdown, mon->vm, guest); } diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 6c590933aa..6aae635411 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -443,7 +443,7 @@ int qemuMonitorUpdateVideoVram64Size(qemuMonitor *mon, void qemuMonitorEmitEvent(qemuMonitor *mon, const char *event, long long seconds, unsigned int micros, const char *details); -void qemuMonitorEmitShutdown(qemuMonitor *mon, virTristateBool guest); +void qemuMonitorEmitShutdown(qemuMonitor *mon, virTristateBool guest, const char *reason); void qemuMonitorEmitReset(qemuMonitor *mon); void qemuMonitorEmitStop(qemuMonitor *mon); void qemuMonitorEmitResume(qemuMonitor *mon); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 105d729d7c..794c22eb9c 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -546,12 +546,16 @@ qemuMonitorJSONMakeCommand(const char *cmdname, static void qemuMonitorJSONHandleShutdown(qemuMonitor *mon, virJSONValue *data) { bool guest = false; + const char *reason = NULL; virTristateBool guest_initiated = VIR_TRISTATE_BOOL_ABSENT; if (data && virJSONValueObjectGetBoolean(data, "guest", &guest) == 0) guest_initiated = virTristateBoolFromBool(guest); - qemuMonitorEmitShutdown(mon, guest_initiated); + if (data) + reason = virJSONValueObjectGetString(data, "reason"); + + qemuMonitorEmitShutdown(mon, guest_initiated, reason); } static void qemuMonitorJSONHandleReset(qemuMonitor *mon, virJSONValue *data G_GNUC_UNUSED) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 9fa679e408..a9bba19852 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -591,6 +591,7 @@ qemuProcessShutdownOrReboot(virDomainObj *vm) vm) < 0) { VIR_ERROR(_("Failed to create hard reboot thread, killing domain")); ignore_value(qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_NOWAIT)); + qemuDomainSetHardReboot(vm, false); virObjectUnref(vm); } } else if (priv->fakeReboot || -- 2.34.1

From: Chenyi Qiang <chenyi.qiang@intel.com> User can add a new option --timekeep to keep the virsh console alive for several seconds. Then it would try to reconnenct the same domain. This option is mainly aimed to support hard reboot in Libvirt, which would kill the QEMU process and create a new one. The console would be lost due the destroy of QEMU. To make it user-friendly (avoid creating a new virsh console), a certain time waiting to reconnnect can be a solution. However, the procedure to destroy and create QEMU may take a while and the speciifc time can't be determined due to different situations. So, users can specify the waiting time (e.g. "virsh console domain --timekeep 2" will stay alive for 2 seconds), if timeout or fail to open the console, adjusting the waiting time can help. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- tools/virsh-console.c | 3 +++ tools/virsh-domain.c | 56 +++++++++++++++++++++++++++++++++---------- tools/virsh.h | 1 + 3 files changed, 48 insertions(+), 12 deletions(-) diff --git a/tools/virsh-console.c b/tools/virsh-console.c index 7c561a11f3..09116ebca5 100644 --- a/tools/virsh-console.c +++ b/tools/virsh-console.c @@ -72,6 +72,7 @@ struct virConsole { struct virConsoleBuffer terminalToStream; char escapeChar; + bool isEscapeExit; virError error; }; @@ -269,6 +270,7 @@ virConsoleEventOnStdin(int watch G_GNUC_UNUSED, goto cleanup; } if (con->terminalToStream.data[con->terminalToStream.offset] == con->escapeChar) { + con->isEscapeExit = true; virConsoleShutdown(con, true); goto cleanup; } @@ -498,6 +500,7 @@ virshRunConsole(vshControl *ctl, cleanup: virConsoleShutdown(con, ret == 0); + priv->escapeExit = con->isEscapeExit; if (ret < 0) { vshResetLibvirtError(); diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 21864c92cb..5d43785d56 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -3020,6 +3020,10 @@ static const vshCmdOptDef opts_console[] = { .type = VSH_OT_BOOL, .help = N_("only connect if safe console handling is supported") }, + {.name = "timekeep", + .type = VSH_OT_INT, + .help = N_("keep the console alive and try to reconnect in seconds") + }, {.name = NULL} }; @@ -3027,31 +3031,41 @@ static bool cmdRunConsole(vshControl *ctl, virDomainPtr dom, const char *name, const bool resume_domain, - unsigned int flags) + unsigned int flags, + bool first) { int state; virshControl *priv = ctl->privData; if ((state = virshDomainState(ctl, dom, NULL)) < 0) { - vshError(ctl, "%s", _("Unable to get domain status")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("Unable to get domain status")); return false; } if (state == VIR_DOMAIN_SHUTOFF) { - vshError(ctl, "%s", _("The domain is not running")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("The domain is not running")); return false; } if (!isatty(STDIN_FILENO)) { - vshError(ctl, "%s", _("Cannot run interactive console without a controlling TTY")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("Cannot run interactive console without a controlling TTY")); return false; } - vshPrintExtra(ctl, _("Connected to domain '%1$s'\n"), virDomainGetName(dom)); - vshPrintExtra(ctl, _("Escape character is %1$s"), priv->escapeChar); - if (priv->escapeChar[0] == '^') - vshPrintExtra(ctl, " (Ctrl + %c)", priv->escapeChar[1]); - vshPrintExtra(ctl, "\n"); + if (first) { + vshPrintExtra(ctl, _("Connected to domain '%1$s'\n"), virDomainGetName(dom)); + vshPrintExtra(ctl, _("Escape character is %1$s"), priv->escapeChar); + if (priv->escapeChar[0] == '^') + vshPrintExtra(ctl, " (Ctrl + %c)", priv->escapeChar[1]); + vshPrintExtra(ctl, "\n"); + } + fflush(stdout); if (virshRunConsole(ctl, dom, name, resume_domain, flags) == 0) return true; @@ -3066,8 +3080,12 @@ cmdConsole(vshControl *ctl, const vshCmd *cmd) bool force = vshCommandOptBool(cmd, "force"); bool resume = vshCommandOptBool(cmd, "resume"); bool safe = vshCommandOptBool(cmd, "safe"); + unsigned long long timekeep = 0; unsigned int flags = 0; const char *name = NULL; + virshControl *priv = ctl->privData; + bool toggle = true; + bool ret; if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) return false; @@ -3075,12 +3093,26 @@ cmdConsole(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptStringReq(ctl, cmd, "devname", &name) < 0) /* sc_prohibit_devname */ return false; + if (vshCommandOptULongLong(ctl, cmd, "timekeep", &timekeep) < 0) + return false; + if (force) flags |= VIR_DOMAIN_CONSOLE_FORCE; if (safe) flags |= VIR_DOMAIN_CONSOLE_SAFE; - return cmdRunConsole(ctl, dom, name, resume, flags); + ret = cmdRunConsole(ctl, dom, name, resume, flags, true); + if (timekeep > 0) { + /* retry to connect after sleeping for "timekeep" seconds. + * escape exit can leave the console immediately. */ + while (!priv->escapeExit && toggle) { + sleep(timekeep); + if (!cmdRunConsole(ctl, dom, name, resume, flags, false)) + toggle = false; + } + } + + return ret; } #endif /* WIN32 */ @@ -4166,7 +4198,7 @@ cmdStart(vshControl *ctl, const vshCmd *cmd) vshPrintExtra(ctl, _("Domain '%1$s' started\n"), virDomainGetName(dom)); #ifndef WIN32 - if (console && !cmdRunConsole(ctl, dom, NULL, resume_domain, 0)) + if (console && !cmdRunConsole(ctl, dom, NULL, resume_domain, 0, true)) return false; #endif @@ -8295,7 +8327,7 @@ cmdCreate(vshControl *ctl, const vshCmd *cmd) virDomainGetName(dom), from); #ifndef WIN32 if (console) - cmdRunConsole(ctl, dom, NULL, resume_domain, 0); + cmdRunConsole(ctl, dom, NULL, resume_domain, 0, true); #endif return true; } diff --git a/tools/virsh.h b/tools/virsh.h index 6acefa7f9d..4f07f651a5 100644 --- a/tools/virsh.h +++ b/tools/virsh.h @@ -151,6 +151,7 @@ struct _virshControl { are missing */ const char *escapeChar; /* String representation of console escape character */ + bool escapeExit; /* true if use escape to exit */ }; /* Typedefs, function prototypes for job progress reporting. -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:20PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
User can add a new option --timekeep to keep the virsh console alive for several seconds. Then it would try to reconnenct the same domain.
This option is mainly aimed to support hard reboot in Libvirt, which would kill the QEMU process and create a new one. The console would be lost due the destroy of QEMU. To make it user-friendly (avoid creating a new virsh console), a certain time waiting to reconnnect can be a solution. However, the procedure to destroy and create QEMU may take a while and the speciifc time can't be determined due to different situations. So, users can specify the waiting time (e.g. "virsh console domain --timekeep 2" will stay alive for 2 seconds), if timeout or fail to open the console, adjusting the waiting time can help.
I don't think we should be doing this with manually requested timeouts. IIUC when we do the fake reboot, we should be emitting one or more domain events to show what's happening. virsh should listen for the events and just "do the right thing" with automatically reconnecting when it sees the reboot taking place.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- tools/virsh-console.c | 3 +++ tools/virsh-domain.c | 56 +++++++++++++++++++++++++++++++++---------- tools/virsh.h | 1 + 3 files changed, 48 insertions(+), 12 deletions(-)
diff --git a/tools/virsh-console.c b/tools/virsh-console.c index 7c561a11f3..09116ebca5 100644 --- a/tools/virsh-console.c +++ b/tools/virsh-console.c @@ -72,6 +72,7 @@ struct virConsole { struct virConsoleBuffer terminalToStream;
char escapeChar; + bool isEscapeExit; virError error; };
@@ -269,6 +270,7 @@ virConsoleEventOnStdin(int watch G_GNUC_UNUSED, goto cleanup; } if (con->terminalToStream.data[con->terminalToStream.offset] == con->escapeChar) { + con->isEscapeExit = true; virConsoleShutdown(con, true); goto cleanup; } @@ -498,6 +500,7 @@ virshRunConsole(vshControl *ctl,
cleanup: virConsoleShutdown(con, ret == 0); + priv->escapeExit = con->isEscapeExit;
if (ret < 0) { vshResetLibvirtError(); diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 21864c92cb..5d43785d56 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -3020,6 +3020,10 @@ static const vshCmdOptDef opts_console[] = { .type = VSH_OT_BOOL, .help = N_("only connect if safe console handling is supported") }, + {.name = "timekeep", + .type = VSH_OT_INT, + .help = N_("keep the console alive and try to reconnect in seconds") + }, {.name = NULL} };
@@ -3027,31 +3031,41 @@ static bool cmdRunConsole(vshControl *ctl, virDomainPtr dom, const char *name, const bool resume_domain, - unsigned int flags) + unsigned int flags, + bool first) { int state; virshControl *priv = ctl->privData;
if ((state = virshDomainState(ctl, dom, NULL)) < 0) { - vshError(ctl, "%s", _("Unable to get domain status")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("Unable to get domain status")); return false; }
if (state == VIR_DOMAIN_SHUTOFF) { - vshError(ctl, "%s", _("The domain is not running")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("The domain is not running")); return false; }
if (!isatty(STDIN_FILENO)) { - vshError(ctl, "%s", _("Cannot run interactive console without a controlling TTY")); + /* don't report error during the retry */ + if (first) + vshError(ctl, "%s", _("Cannot run interactive console without a controlling TTY")); return false; }
- vshPrintExtra(ctl, _("Connected to domain '%1$s'\n"), virDomainGetName(dom)); - vshPrintExtra(ctl, _("Escape character is %1$s"), priv->escapeChar); - if (priv->escapeChar[0] == '^') - vshPrintExtra(ctl, " (Ctrl + %c)", priv->escapeChar[1]); - vshPrintExtra(ctl, "\n"); + if (first) { + vshPrintExtra(ctl, _("Connected to domain '%1$s'\n"), virDomainGetName(dom)); + vshPrintExtra(ctl, _("Escape character is %1$s"), priv->escapeChar); + if (priv->escapeChar[0] == '^') + vshPrintExtra(ctl, " (Ctrl + %c)", priv->escapeChar[1]); + vshPrintExtra(ctl, "\n"); + } + fflush(stdout); if (virshRunConsole(ctl, dom, name, resume_domain, flags) == 0) return true; @@ -3066,8 +3080,12 @@ cmdConsole(vshControl *ctl, const vshCmd *cmd) bool force = vshCommandOptBool(cmd, "force"); bool resume = vshCommandOptBool(cmd, "resume"); bool safe = vshCommandOptBool(cmd, "safe"); + unsigned long long timekeep = 0; unsigned int flags = 0; const char *name = NULL; + virshControl *priv = ctl->privData; + bool toggle = true; + bool ret;
if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) return false; @@ -3075,12 +3093,26 @@ cmdConsole(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptStringReq(ctl, cmd, "devname", &name) < 0) /* sc_prohibit_devname */ return false;
+ if (vshCommandOptULongLong(ctl, cmd, "timekeep", &timekeep) < 0) + return false; + if (force) flags |= VIR_DOMAIN_CONSOLE_FORCE; if (safe) flags |= VIR_DOMAIN_CONSOLE_SAFE;
- return cmdRunConsole(ctl, dom, name, resume, flags); + ret = cmdRunConsole(ctl, dom, name, resume, flags, true); + if (timekeep > 0) { + /* retry to connect after sleeping for "timekeep" seconds. + * escape exit can leave the console immediately. */ + while (!priv->escapeExit && toggle) { + sleep(timekeep); + if (!cmdRunConsole(ctl, dom, name, resume, flags, false)) + toggle = false; + } + } + + return ret; } #endif /* WIN32 */
@@ -4166,7 +4198,7 @@ cmdStart(vshControl *ctl, const vshCmd *cmd) vshPrintExtra(ctl, _("Domain '%1$s' started\n"), virDomainGetName(dom)); #ifndef WIN32 - if (console && !cmdRunConsole(ctl, dom, NULL, resume_domain, 0)) + if (console && !cmdRunConsole(ctl, dom, NULL, resume_domain, 0, true)) return false; #endif
@@ -8295,7 +8327,7 @@ cmdCreate(vshControl *ctl, const vshCmd *cmd) virDomainGetName(dom), from); #ifndef WIN32 if (console) - cmdRunConsole(ctl, dom, NULL, resume_domain, 0); + cmdRunConsole(ctl, dom, NULL, resume_domain, 0, true); #endif return true; } diff --git a/tools/virsh.h b/tools/virsh.h index 6acefa7f9d..4f07f651a5 100644 --- a/tools/virsh.h +++ b/tools/virsh.h @@ -151,6 +151,7 @@ struct _virshControl { are missing */ const char *escapeChar; /* String representation of console escape character */ + bool escapeExit; /* true if use escape to exit */ };
/* Typedefs, function prototypes for job progress reporting. -- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 10/11] virsh: add new option "timekeep" to keep virsh console alive
On Mon, Nov 27, 2023 at 04:55:20PM +0800, Zhenzhong Duan wrote:
From: Chenyi Qiang <chenyi.qiang@intel.com>
User can add a new option --timekeep to keep the virsh console alive for several seconds. Then it would try to reconnenct the same domain.
This option is mainly aimed to support hard reboot in Libvirt, which would kill the QEMU process and create a new one. The console would be lost due the destroy of QEMU. To make it user-friendly (avoid creating a new virsh console), a certain time waiting to reconnnect can be a solution. However, the procedure to destroy and create QEMU may take a while and the speciifc time can't be determined due to different situations. So, users can specify the waiting time (e.g. "virsh console domain --timekeep 2" will stay alive for 2 seconds), if timeout or fail to open the console, adjusting the waiting time can help.
I don't think we should be doing this with manually requested timeouts.
Agree it's tricky.
IIUC when we do the fake reboot, we should be emitting one or more domain events to show what's happening. virsh should listen for the events and just "do the right thing" with automatically reconnecting when it sees the reboot taking place.
Thanks for your suggestion, I'll dig into it. BRs. Zhenzhong

After hard reboot, domid is increased by 1 as a new domain. Hard reboot simulate TD-guest reboot by calling qemuProcessStop and qemuProcessStart which will release and recreate domain resource including domid. Define origin_id to save domid and restore it when recreate domain. For persistent domain also need to save domid in newDef. Also add logic to keep same domid when libvirt restart. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/conf/domain_conf.c | 4 ++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_process.c | 11 +++++++++-- 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 08e82c5380..8c33b335bd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4159,6 +4159,7 @@ void virDomainObjAssignDef(virDomainObj *domain, else virDomainDefFree(domain->newDef); domain->newDef = g_steal_pointer(def); + domain->newDef->origin_id = domain->def->id; return; } @@ -17874,6 +17875,9 @@ virDomainDefParseIDs(virDomainDef *def, return -1; } + /* Restore origin_id to support hard reboot */ + def->origin_id = def->id; + /* Extract domain name */ if (!(def->name = virXPathString("string(./name[1])", ctxt))) { virReportError(VIR_ERR_NO_NAME, NULL); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3b01850eb4..afd54a07eb 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2957,6 +2957,7 @@ struct _virDomainVirtioOptions { struct _virDomainDef { int virtType; /* enum virDomainVirtType */ int id; + int origin_id; unsigned char uuid[VIR_UUID_BUFLEN]; unsigned char genid[VIR_UUID_BUFLEN]; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a9bba19852..6ec649a261 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5757,8 +5757,15 @@ qemuProcessInit(virQEMUDriver *driver, goto cleanup; } } else { - vm->def->id = qemuDriverAllocateID(driver); - qemuDomainSetFakeReboot(vm, false); + if (vm->def->origin_id > 0 && priv->hardReboot) + vm->def->id = vm->def->origin_id; + else + vm->def->id = qemuDriverAllocateID(driver); + vm->def->origin_id = vm->def->id; + if (vm->newDef) + vm->newDef->origin_id = vm->def->id; + + qemuDomainSetHardReboot(vm, false); qemuDomainSetFakeReboot(vm, false); virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_STARTING_UP); -- 2.34.1

On Mon, Nov 27, 2023 at 04:55:21PM +0800, Zhenzhong Duan wrote:
After hard reboot, domid is increased by 1 as a new domain. Hard reboot simulate TD-guest reboot by calling qemuProcessStop and qemuProcessStart which will release and recreate domain resource including domid.
This is risky. We use 'domid' as a mechanism to *guarantee* certain paths are unique when re-spawning QEMU, so re-using domid for a new QEMU is liable to cause bugs. The domain ID is not guest visible, so we don't need to preserve it from that POV. I think we should just document domid is going to change for confidential guests when the reboot is faked. Most mgmt apps focus on UUID for most of their work anyway, with domain ID largely irrelevant beyond it being presented in users in a few places.
Define origin_id to save domid and restore it when recreate domain. For persistent domain also need to save domid in newDef.
Also add logic to keep same domid when libvirt restart.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- src/conf/domain_conf.c | 4 ++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_process.c | 11 +++++++++-- 3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 08e82c5380..8c33b335bd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4159,6 +4159,7 @@ void virDomainObjAssignDef(virDomainObj *domain, else virDomainDefFree(domain->newDef); domain->newDef = g_steal_pointer(def); + domain->newDef->origin_id = domain->def->id; return; }
@@ -17874,6 +17875,9 @@ virDomainDefParseIDs(virDomainDef *def, return -1; }
+ /* Restore origin_id to support hard reboot */ + def->origin_id = def->id; + /* Extract domain name */ if (!(def->name = virXPathString("string(./name[1])", ctxt))) { virReportError(VIR_ERR_NO_NAME, NULL); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3b01850eb4..afd54a07eb 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2957,6 +2957,7 @@ struct _virDomainVirtioOptions { struct _virDomainDef { int virtType; /* enum virDomainVirtType */ int id; + int origin_id; unsigned char uuid[VIR_UUID_BUFLEN];
unsigned char genid[VIR_UUID_BUFLEN]; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a9bba19852..6ec649a261 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5757,8 +5757,15 @@ qemuProcessInit(virQEMUDriver *driver, goto cleanup; } } else { - vm->def->id = qemuDriverAllocateID(driver); - qemuDomainSetFakeReboot(vm, false); + if (vm->def->origin_id > 0 && priv->hardReboot) + vm->def->id = vm->def->origin_id; + else + vm->def->id = qemuDriverAllocateID(driver); + vm->def->origin_id = vm->def->id; + if (vm->newDef) + vm->newDef->origin_id = vm->def->id; + + qemuDomainSetHardReboot(vm, false); qemuDomainSetFakeReboot(vm, false); virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_STARTING_UP);
-- 2.34.1
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

-----Original Message----- From: Daniel P. Berrangé <berrange@redhat.com> Subject: Re: [PATCH rfcv3 11/11] conf: Add support to keep same domid for hard reboot
On Mon, Nov 27, 2023 at 04:55:21PM +0800, Zhenzhong Duan wrote:
After hard reboot, domid is increased by 1 as a new domain. Hard reboot simulate TD-guest reboot by calling qemuProcessStop and qemuProcessStart which will release and recreate domain resource including domid.
This is risky.
We use 'domid' as a mechanism to *guarantee* certain paths are unique when re-spawning QEMU, so re-using domid for a new QEMU is liable to cause bugs.
The domain ID is not guest visible, so we don't need to preserve it from that POV.
I think we should just document domid is going to change for confidential guests when the reboot is faked.
Most mgmt apps focus on UUID for most of their work anyway, with domain ID largely irrelevant beyond it being presented in users in a few places.
OK, will drop this patch. Thanks Zhenzhong

Hello, Thanks for the submission. A few initial general comments: On 11/27/23 2:55 AM, Zhenzhong Duan wrote:
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX? TDX stands for Trust Domain Extensions which isolates VMs from the virtual-machine manager (VMM)/hypervisor and any other software on the platform.
To support TDX, multiple software components, not only KVM but also QEMU, guest Linux and virtual bios, need to be updated. For more details, please check link[1], there are TDX spec links and public repository link at github for each software component.
This patchset is another software component to extend libvirt to support TDX, with which one can start a VM from high level rather than running qemu directly.
* Misc As QEMU use a software emulated way to reset guest which isn't supported by TDX guest for security reason. We add a new way to emulate the reset for TDX guest, called "hard reboot". We achieve this by killing old qemu and start a new one.
Can you expand on this a little bit more? What problems do you encounter when you reboot the normal way? I did not notice any patches related to a hard reboot in the v2 patchset that was submitted a while ago. What other approaches did you consider to solve this issue? The changes to virsh adding a reconnect timeout option for the console command in particular feel hacky to me.
Complete code can be found at [1], matching qemu code can be found at [2].
There are some new properties for tdx-guest object, i.e. `mrconfigid`, `mrowner`, `mrownerconfig` and `debug` which aren't in matching qemu[2] yet. I keep them intentionally as they will be implemented in qemu as extention series of [2].
* Test start/stop/reboot with virsh stop/reboot trigger in guest stop with on_poweroff=destroy/restart reboot with on_reboot=destroy/restart
* Patch organization - patch 1-3: Support query of TDX capabilities. - patch 4-6: Add TDX type to launchsecurity framework. - patch 7-11: Add hard reboot support to TDX guest
I would expect to see some test cases for tdx launch security as well (at least in qemuxml2argv). I suppose it is difficult to incorporate these tests until the qemu changes are merged upstream and we can regenerate capabilities that include tdx-guest support, etc, though...
[1] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv3 [2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v3
Thanks Zhenzhong
Changelog: rfcv3: - Change to generate qemu cmdline with -bios - drop firmware auto match as -bios is used - add a hard reboot method to reboot TDX guest
rfcv2: - give up using qmp cmd and check TDX directly on host for TDX capabilities. - use launchsecurity framework to support TDX - use <os>.<loader> for general loader - add auto firmware match feature for TDX
A example TDVF fimware description file 70-edk2-x86_64-tdx.json: { "description": "UEFI firmware for x86_64, supporting Intel TDX", "interface-types": [ "uefi" ], "mapping": { "device": "generic", "filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd" }, "targets": [ { "architecture": "x86_64", "machines": [ "pc-q35-*" ] } ], "features": [ "intel-tdx", "verbose-dynamic" ], "tags": [
] }
rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Chenyi Qiang (3): qemu: add hard reboot in QEMU driver qemu: make hard reboot as the TDX default reboot mode virsh: add new option "timekeep" to keep virsh console alive
Zhenzhong Duan (8): qemu: Check if INTEL Trust Domain Extention support is enabled qemu: Add TDX capability conf: expose TDX feature in domain capabilities conf: add tdx as launch security type qemu: Add command line and validation for TDX type qemu: force special parameters enabled for TDX guest qemu: Extend hard reboot in Qemu driver conf: Add support to keep same domid for hard reboot
docs/formatdomaincaps.rst | 1 + include/libvirt/libvirt-domain.h | 2 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/domain_conf.c | 50 ++++++++++++++++ src/conf/domain_conf.h | 11 ++++ src/conf/schemas/domaincaps.rng | 9 +++ src/conf/schemas/domaincommon.rng | 34 +++++++++++ src/conf/virconftypes.h | 2 + src/qemu/qemu_capabilities.c | 38 +++++++++++- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 29 +++++++++ src/qemu/qemu_domain.c | 18 ++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 85 ++++++++++++++++++++------ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_monitor.c | 19 +++++- src/qemu/qemu_monitor.h | 2 +- src/qemu/qemu_monitor_json.c | 6 +- src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 99 ++++++++++++++++++++++++++++++- src/qemu/qemu_validate.c | 18 ++++++ tools/virsh-console.c | 3 + tools/virsh-domain.c | 64 +++++++++++++++----- tools/virsh.h | 1 + 25 files changed, 463 insertions(+), 37 deletions(-)

-----Original Message----- From: Jonathon Jongsma <jjongsma@redhat.com> Sent: Saturday, December 2, 2023 6:30 AM Subject: Re: [PATCH rfcv3 00/11] LIBVIRT: X86: TDX support
Hello,
Thanks for the submission. A few initial general comments:
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX? TDX stands for Trust Domain Extensions which isolates VMs from the virtual-machine manager (VMM)/hypervisor and any other software on the platform.
To support TDX, multiple software components, not only KVM but also QEMU, guest Linux and virtual bios, need to be updated. For more details, please check link[1], there are TDX spec links and public repository link at github for each software component.
This patchset is another software component to extend libvirt to support TDX, with which one can start a VM from high level rather than running qemu
On 11/27/23 2:55 AM, Zhenzhong Duan wrote: directly.
* Misc As QEMU use a software emulated way to reset guest which isn't
supported by TDX
guest for security reason. We add a new way to emulate the reset for TDX guest, called "hard reboot". We achieve this by killing old qemu and start a new one.
Can you expand on this a little bit more? What problems do you encounter when you reboot the normal way? I did not notice any patches related to a hard reboot in the v2 patchset that was submitted a while ago.
If we use existing "fake reboot" in libvirt, qmp command "system_reset" isn't supported for TDX guest, because TDX doesn't support resetting each register of vcpu for security except in TDX debug mode. TDX guest will be shutdown instead. In v2, reboot isn't supported yet, only support shutdown for TDX guest.
What other approaches did you consider to solve this issue? The changes to virsh adding a reconnect timeout option for the console command in particular feel hacky to me.
One possible way I can think of is to let qemu do the kill/create job, i.e. destroy TDX vcpus, create new one. This way we can utilize existing "fake reboot" interface between qemu and libvirt. Yes, that patch looks hacky, we can drop it and not to support virsh reconnect.
Complete code can be found at [1], matching qemu code can be found at [2].
There are some new properties for tdx-guest object, i.e. `mrconfigid`, `mrowner`, `mrownerconfig` and `debug` which aren't in matching qemu[2] yet. I keep them intentionally as they will be implemented in qemu as extention series of [2].
* Test start/stop/reboot with virsh stop/reboot trigger in guest stop with on_poweroff=destroy/restart reboot with on_reboot=destroy/restart
* Patch organization - patch 1-3: Support query of TDX capabilities. - patch 4-6: Add TDX type to launchsecurity framework. - patch 7-11: Add hard reboot support to TDX guest
I would expect to see some test cases for tdx launch security as well (at least in qemuxml2argv). I suppose it is difficult to incorporate these tests until the qemu changes are merged upstream and we can regenerate capabilities that include tdx-guest support, etc, though...
Will add test cases in v3. Thanks Zhenzhong
[1] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv3 [2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v3
Thanks Zhenzhong
Changelog: rfcv3: - Change to generate qemu cmdline with -bios - drop firmware auto match as -bios is used - add a hard reboot method to reboot TDX guest
rfcv2: - give up using qmp cmd and check TDX directly on host for TDX capabilities. - use launchsecurity framework to support TDX - use <os>.<loader> for general loader - add auto firmware match feature for TDX
A example TDVF fimware description file 70-edk2-x86_64-tdx.json: { "description": "UEFI firmware for x86_64, supporting Intel TDX", "interface-types": [ "uefi" ], "mapping": { "device": "generic", "filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd" }, "targets": [ { "architecture": "x86_64", "machines": [ "pc-q35-*" ] } ], "features": [ "intel-tdx", "verbose-dynamic" ], "tags": [
] }
rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Chenyi Qiang (3): qemu: add hard reboot in QEMU driver qemu: make hard reboot as the TDX default reboot mode virsh: add new option "timekeep" to keep virsh console alive
Zhenzhong Duan (8): qemu: Check if INTEL Trust Domain Extention support is enabled qemu: Add TDX capability conf: expose TDX feature in domain capabilities conf: add tdx as launch security type qemu: Add command line and validation for TDX type qemu: force special parameters enabled for TDX guest qemu: Extend hard reboot in Qemu driver conf: Add support to keep same domid for hard reboot
docs/formatdomaincaps.rst | 1 + include/libvirt/libvirt-domain.h | 2 + src/conf/domain_capabilities.c | 1 + src/conf/domain_capabilities.h | 1 + src/conf/domain_conf.c | 50 ++++++++++++++++ src/conf/domain_conf.h | 11 ++++ src/conf/schemas/domaincaps.rng | 9 +++ src/conf/schemas/domaincommon.rng | 34 +++++++++++ src/conf/virconftypes.h | 2 + src/qemu/qemu_capabilities.c | 38 +++++++++++- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 29 +++++++++ src/qemu/qemu_domain.c | 18 ++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 85 ++++++++++++++++++++------ src/qemu/qemu_firmware.c | 1 + src/qemu/qemu_monitor.c | 19 +++++- src/qemu/qemu_monitor.h | 2 +- src/qemu/qemu_monitor_json.c | 6 +- src/qemu/qemu_namespace.c | 1 + src/qemu/qemu_process.c | 99 ++++++++++++++++++++++++++++++- src/qemu/qemu_validate.c | 18 ++++++ tools/virsh-console.c | 3 + tools/virsh-domain.c | 64 +++++++++++++++----- tools/virsh.h | 1 + 25 files changed, 463 insertions(+), 37 deletions(-)

On Mon, Dec 04, 2023 at 03:38:30AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Jonathon Jongsma <jjongsma@redhat.com> Sent: Saturday, December 2, 2023 6:30 AM Subject: Re: [PATCH rfcv3 00/11] LIBVIRT: X86: TDX support
Hello,
Thanks for the submission. A few initial general comments:
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX? TDX stands for Trust Domain Extensions which isolates VMs from the virtual-machine manager (VMM)/hypervisor and any other software on the platform.
To support TDX, multiple software components, not only KVM but also QEMU, guest Linux and virtual bios, need to be updated. For more details, please check link[1], there are TDX spec links and public repository link at github for each software component.
This patchset is another software component to extend libvirt to support TDX, with which one can start a VM from high level rather than running qemu
On 11/27/23 2:55 AM, Zhenzhong Duan wrote: directly.
* Misc As QEMU use a software emulated way to reset guest which isn't
supported by TDX
guest for security reason. We add a new way to emulate the reset for TDX guest, called "hard reboot". We achieve this by killing old qemu and start a new one.
Can you expand on this a little bit more? What problems do you encounter when you reboot the normal way? I did not notice any patches related to a hard reboot in the v2 patchset that was submitted a while ago.
If we use existing "fake reboot" in libvirt, qmp command "system_reset" isn't supported for TDX guest, because TDX doesn't support resetting each register of vcpu for security except in TDX debug mode. TDX guest will be shutdown instead.
I suspect the same probably applies to SEV.
What other approaches did you consider to solve this issue? The changes to virsh adding a reconnect timeout option for the console command in particular feel hacky to me.
One possible way I can think of is to let qemu do the kill/create job, i.e. destroy TDX vcpus, create new one. This way we can utilize existing "fake reboot" interface between qemu and libvirt.
Yes, I do wonder if there's some reasonable way for QEMU to re-create the KVM VM from scratch, to make this hardware limitation more transparent for all mgmt apps. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Dec 04, 2023 at 10:28:17 +0000, Daniel P. Berrangé wrote:
On Mon, Dec 04, 2023 at 03:38:30AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Jonathon Jongsma <jjongsma@redhat.com> Sent: Saturday, December 2, 2023 6:30 AM Subject: Re: [PATCH rfcv3 00/11] LIBVIRT: X86: TDX support
[...]
If we use existing "fake reboot" in libvirt, qmp command "system_reset" isn't supported for TDX guest, because TDX doesn't support resetting each register of vcpu for security except in TDX debug mode. TDX guest will be shutdown instead.
I suspect the same probably applies to SEV.
What other approaches did you consider to solve this issue? The changes to virsh adding a reconnect timeout option for the console command in particular feel hacky to me.
One possible way I can think of is to let qemu do the kill/create job, i.e. destroy TDX vcpus, create new one. This way we can utilize existing "fake reboot" interface between qemu and libvirt.
Yes, I do wonder if there's some reasonable way for QEMU to re-create the KVM VM from scratch, to make this hardware limitation more transparent for all mgmt apps.
Wouldn't that imply the need to do the attestation once over?

On Mon, Dec 04, 2023 at 11:30:54AM +0100, Peter Krempa wrote:
On Mon, Dec 04, 2023 at 10:28:17 +0000, Daniel P. Berrangé wrote:
On Mon, Dec 04, 2023 at 03:38:30AM +0000, Duan, Zhenzhong wrote:
-----Original Message----- From: Jonathon Jongsma <jjongsma@redhat.com> Sent: Saturday, December 2, 2023 6:30 AM Subject: Re: [PATCH rfcv3 00/11] LIBVIRT: X86: TDX support
[...]
If we use existing "fake reboot" in libvirt, qmp command "system_reset" isn't supported for TDX guest, because TDX doesn't support resetting each register of vcpu for security except in TDX debug mode. TDX guest will be shutdown instead.
I suspect the same probably applies to SEV.
What other approaches did you consider to solve this issue? The changes to virsh adding a reconnect timeout option for the console command in particular feel hacky to me.
One possible way I can think of is to let qemu do the kill/create job, i.e. destroy TDX vcpus, create new one. This way we can utilize existing "fake reboot" interface between qemu and libvirt.
Yes, I do wonder if there's some reasonable way for QEMU to re-create the KVM VM from scratch, to make this hardware limitation more transparent for all mgmt apps.
Wouldn't that imply the need to do the attestation once over?
Yes. No matter what is done - fake reboot in QEMU, fake reboot in libvirt, or a fake reboot in the mgmt app - all will require attestation again. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (5)
-
Daniel P. Berrangé
-
Duan, Zhenzhong
-
Jonathon Jongsma
-
Peter Krempa
-
Zhenzhong Duan