On 8/23/22 08:52, Daniel P. Berrangé wrote:
On Mon, Aug 22, 2022 at 03:17:34PM -0400, Stefan Berger wrote:
>
>
> On 8/22/22 12:46, Daniel P. Berrangé wrote:
>> On Mon, Aug 22, 2022 at 08:05:53AM -0400, Stefan Berger wrote:
>>> When shared storage for the TPM state files has been set up between hosts,
>>> remove the TPM state files and directory only when undefining a VM and only
>>> if the attribute persistent_state is not set. Avoid removing the TPM state
>>> files and directory structure when a VM is migrated and shared storage is
>>> used since this would also remove those files and directory structure on
>>> the destination side.
>>
>> I think our current undefine behaviour is probably flawed. We go to the
>> trouble of refusing to remove the firmware NVRAM when undefining because
>> it contains important VM state, but then happily blow away the TPM state.
>> Totally inconsistent behaviour :-( It's too late to change the default
>> behaviour, but we likely ought to add a flag
>>
>> VIR_DOMAIN_UNDEFINE_KEEP_TPM
>>
>> and plumb that through the various code paths, which would remove the
>> need for this specific 'qemuDomainUndefineReason' enum.
>
> I think the granularity encoded in the reason is necessary for the following
> patch I was going to post later on:
>
> Subject: [PATCH] qemu: tpm: Remove TPM state with non-shared storage
>
> This patch 'fixes' the behavior of the persistent_state TPM domain XML
> attribute that intends to preserve the state of the TPM but should not
> keep the state around on all the hosts a VM has been migrated to. It
> removes the TPM state directory structure from the source host upon
> successful migration when non-shared storage is used. Similarly, it
> removes it from the destination host upon migration failure when
> non-shared storage is used.
'persistent_state' is something that applies to transient VMs.

It's an attribute set in the TPM's domain XML and affects the TPM state
permanently:

    persistent_state
        The persistent_state attribute indicates whether 'swtpm' TPM state
        is kept or not when a transient domain is powered off or undefined.
        This option can be used for preserving TPM state. By default the
        value is no. This attribute only works with the emulator backend.
        The accepted values are yes and no. Since 7.0.0

The documentation may say transient, but the code seems to just not
remove the state at all once a user has set this in the domain XML:

    static void
    qemuTPMEmulatorCleanupHost(virDomainTPMDef *tpm)
    {
        if (!tpm->data.emulator.persistent_state)
            qemuTPMEmulatorDeleteStorage(tpm);
    }

https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_tpm.c#L707

We cannot get rid of persistent_state in the domain XML, but with
--keep-tpm we would have a second method to keep the state. My guess is
that the author of the persistent_state patch did not want to instrument
callers but to achieve the effect just by producing a different XML.

What I'm suggesting is that we should have some control over
this for persistent VMs, when calling 'undefine',

i.e.  virsh undefine --keep-nvram --keep-tpm $VMNAME

If we have this feature in the public API, then its implementation should
support both this feature and your future patch for
'persistent_state'.
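
As a minimal sketch only, the interaction between such an undefine flag and
'persistent_state' could look roughly like the following. The flag names,
their bit values, and the helper function are hypothetical and exist only for
this illustration; they are not taken from libvirt. The sketch also assumes
that the flag simply overrides the default and that 'persistent_state'
otherwise keeps its current meaning.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only. The names and values are
 * assumptions of this sketch and are not libvirt's public API. */
enum sketchUndefineFlags {
    SKETCH_UNDEFINE_KEEP_NVRAM = 1 << 0,
    SKETCH_UNDEFINE_KEEP_TPM   = 1 << 1,
};

/* Decide whether 'undefine' should delete the TPM state.
 *
 * Assumed policy: an explicit KEEP_TPM flag always preserves the state;
 * without it, the state is deleted unless persistent_state is set in the
 * domain XML. */
static bool
sketchUndefineRemovesTPM(unsigned int flags, bool persistent_state)
{
    if (flags & SKETCH_UNDEFINE_KEEP_TPM)
        return false;

    return !persistent_state;
}
```

With this policy, a plain undefine keeps today's default (state removed
unless persistent_state is set), and the flag gives persistent VMs a
per-call way to keep the state.
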
And how do we handle removal decisions with shared storage, so that the
directory structure is not removed when a VM is undefined on the source
machine, while still having it removed on the source machine upon
successful migration when no shared storage is used?

This TPM-specific removal decision seems better made at the qemu_tpm.c
level. The alternative is to decide whether to keep the TPM state at each
of the call sites that patch 1/7 instrumented with the reason parameter
(albeit unspecific in most cases) and to pass down a KEEP_TPM parameter
instead. Getting the reason for the undefine at the qemu_tpm.c level lets
us make this decision there rather than at every site further up the call
stack.
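
A rough sketch of what such a reason-based decision could look like in one
place is shown below. The enum, its values, and the function name are
hypothetical and are not the ones from the posted patches. The policy is
only what this thread describes: on undefine, remove the state unless
persistent_state is set; on migration, never remove it with shared storage
and always remove it with non-shared storage. Keeping the state for any
other reason is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical reasons for illustration; not the enum from the patches. */
typedef enum {
    SKETCH_UNDEFINE_REASON_DOMAIN,           /* user undefined the VM */
    SKETCH_UNDEFINE_REASON_MIGRATED_OUT,     /* source host, migration succeeded */
    SKETCH_UNDEFINE_REASON_MIGRATION_FAILED, /* destination host, migration failed */
} sketchUndefineReason;

/* Decide in a single place whether the TPM state directory is removed. */
static bool
sketchTPMShouldRemoveState(sketchUndefineReason reason,
                           bool sharedStorage,
                           bool persistentState)
{
    switch (reason) {
    case SKETCH_UNDEFINE_REASON_DOMAIN:
        /* Undefine: only persistent_state keeps the files around. */
        return !persistentState;

    case SKETCH_UNDEFINE_REASON_MIGRATED_OUT:
    case SKETCH_UNDEFINE_REASON_MIGRATION_FAILED:
        /* Shared storage: the other host uses the same files, keep them.
         * Non-shared storage: this host's copy is stale, remove it. */
        return !sharedStorage;
    }

    /* Assumption of this sketch: keep the state for unknown reasons. */
    return false;
}
```

Callers would only pass the reason down; none of them would need to know
about shared storage or persistent_state.
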
Regards,
Stefan