Despite efforts to make the virt-qemu-sev-validate tool friendly, it is
a certainty that almost everyone who tries it will hit false negative
results, getting a failure despite the VM being trustworthy.
Diagnosing these problems is no easy matter, especially for those not
familiar with SEV/SEV-ES in general. This extra docs text attempts to
set out a checklist of items to look at to identify what went wrong.
Reviewed-by: Ján Tomko <jtomko(a)redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
---
docs/manpages/virt-qemu-sev-validate.rst | 116 +++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/docs/manpages/virt-qemu-sev-validate.rst
b/docs/manpages/virt-qemu-sev-validate.rst
index 24cdbb6d92..f5f928603a 100644
--- a/docs/manpages/virt-qemu-sev-validate.rst
+++ b/docs/manpages/virt-qemu-sev-validate.rst
@@ -456,6 +456,122 @@ inject a disk password on success:
--domain fedora34x86_64 \
--disk-password passwd.txt
+COMMON MISTAKES CHECKLIST
+=========================
+
+The complexity of configuring a guest and validating its boot measurement
+means it is very likely to see the failure::
+
+ ERROR: Measurement does not match, VM is not trustworthy
+
+This error message assumes the worst, but in most cases will failure will be
+a result of either mis-configuring the guest, or passing the wrong information
+when trying to validate it. The following information is a guide for what
+items to check in order to stand the best chance of diagnosing the problem
+
+* Check the VM configuration for the DH certificate and session
+ blob in the libvirt guest XML.
+
+ The content for these fields should be in base64 format, which is
+ what ``sevctl session`` generates. Other tools may generate the files
+ in binary format, so ensure it has been correctly converted to base64.
+
+* Check the VM configuration policy value matches the session blob
+
+ The ``<policy>`` value in libvirt guest XML has to match the value
+ passed to the ``sevctl session`` command. If this is mismatched
+ then the guest will not even start, and QEMU will show an error
+ such as::
+
+ sev_launch_start: LAUNCH_START ret=1 fw_error=11 'Bad measurement'
+
+* Check the correct TIK/TEK keypair are passed
+
+ The TIK/TEK keypair are uniquely tied to each DH cert and session
+ blob. Make sure that the TIK/TEK keypair passed to this program
+ the ones matched to the DH cert and session blob configured for
+ the libvirt guest XML. This is one of the most common mistakes.
+ Further ensure that the TIK and TEK files are not swapped.
+
+* Check the firmware binary matches the one used to boot
+
+ The firmware binary content is part of the data covered by the
+ launch measurement. Ensure that the firmware binary passed to
+ this program matches the one used to launch the guest. The
+ hypervisor host will periodically get software updates which
+ introduce a new firmware binary version.
+
+* Check the kernel, initrd and cmdline match the one used to boot
+
+ If the guest is configured to use direct kernel boot, check that
+ the kernel, initrd and cmdline passed to this program match the
+ ones used to boot the guest. In the kernel cmdline whitespace
+ must be preserved exactly, including any leading or trailing
+ spaces.
+
+* Check whether the kernel hash measurement is enabled
+
+ The ``kernelHashes`` property in the libvirt guest XML controls
+ whether hashes of the kernel, initrd and cmdline content are
+ covered by the boot measurement. If enabled, then the matching
+ content must be passed to this program. UIf disabled, then
+ the content must **NOT** be passed.
+
+* Check that the correct measurement hash is passed
+
+ The measurement hash includes a nonce, so it will be different
+ on every boot attempt. Thus when validating the measuremnt it
+ is important ensure the most recent measurement is used.
+
+* Check the correct VMSA blobs / CPU SKU values for the host are used
+
+ The VMSA blobs provide the initial register state for the
+ boot CPU and any additional CPUs. One of the registers
+ encodes the CPU SKU (family, model, stepping) of the physical
+ host CPU. Make sure that the VMSA blob used for validation
+ is one that matches the SKU of the host the guest is booted
+ on. Passing the CPU SKU values directly to the tool can
+ reduce the likelihood of using the wrong ones.
+
+* Check the CPU count is correct
+
+ When passing VMSA blobs for SEV-ES guests, the number of CPUs
+ present will influence the measurement result. Ensure that the
+ correct vCPU count is used corresponding to the guest boot
+ attempt.
+
+
+Best practice is to run this tool in completely offline mode and pass
+all information as explicit command line parameters. When debugging
+failures, however, it can be useful to tell it to connect to libvirt
+and fetch information. If connecting to a remote libvirt instance,
+it will fetch any information that can be trusted, which is the basic
+VM launch state data. It will also sanity check the XML configuration
+to identify some common mistakes. If the ``--insecure`` flag is passed
+it can extract some configuration information and use that for the
+attestation process.
+
+If the mistake still can't be identified, then this tool can be run
+on the virtualization host. In that scenario the only three command
+line parameters required are for the TIK, TEK and libvirt domain
+name. It should be able to automatically determine all the other
+information required. If it still reports a failure, this points
+very strongly to the TIK/TEK pair not maching the configured
+DH certificate and session blob.
+
+The ``--debug`` flag will display hashes and/or hex dumps for various
+pieces of information used in the attestation process. Comparing the
+``--debug`` output from running on the hypervisor host, against that
+obtained when running in offline mode can give further guidance to
+which parameter is inconsistent.
+
+As mentioned earlier in this document, bear in mind that in general
+any attestation answers obtained from running on the hypervisor host
+should not be trusted. So if a configuration mistake is identified
+it is strongly recommended to re-run the attestation in offline mode
+on a trusted machine.
+
+
EXIT STATUS
===========
--
2.37.3