On 7/29/19 5:17 AM, Michal Privoznik wrote:
On 7/26/19 8:38 PM, Stefan Berger wrote:
> I noticed that if a domain fails to restore, the ref count in the xattr
> 'trusted.libvirt.security.ref_selinux' keeps on increasing indefinitely
> and the VM will never restore even if the root cause for the restore
> failure has been removed. The reason seems to be that the code to
> decrease
> the ref count never gets called because the block above it fails due
> to virSecuritySELinuxTransactionAppend() failing. The simple solution
> seems to be to revert the order in which things are done.
>
> Signed-off-by: Stefan Berger <stefanb(a)linux.ibm.com>
> ---
> src/security/security_selinux.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
So what was the root cause, did you ever find out?
The root cause of the failure was basically me (intentionally) changing
the password for the encrypted vTPM to a wrong password and then QEMU
cannot *resume* 'swtpm' since it cannot decrypt the state. The operation
on virt-manager level was a restore of a saved VM.
>
> diff --git a/src/security/security_selinux.c
> b/src/security/security_selinux.c
> index ea20373a90..9fd29e9bca 100644
> --- a/src/security/security_selinux.c
> +++ b/src/security/security_selinux.c
> @@ -1499,14 +1499,9 @@
> virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr,
> goto cleanup;
> }
> - if ((rc = virSecuritySELinuxTransactionAppend(path, NULL,
> - false, recall,
> true)) < 0) {
> - goto cleanup;
> - } else if (rc > 0) {
> - ret = 0;
> - goto cleanup;
> - }
> -
> + /* Recall the label so the ref count label decreases its counter
> + * even if transaction append below fails.
> + */
> if (recall) {
> rc = virSecuritySELinuxRecallLabel(newpath, &fcon);
> if (rc == -2) {
> @@ -1519,6 +1514,14 @@
> virSecuritySELinuxRestoreFileLabel(virSecurityManagerPtr mgr,
> }
> }
> + if ((rc = virSecuritySELinuxTransactionAppend(path, NULL,
> + false, recall,
> true)) < 0) {
> + goto cleanup;
> + } else if (rc > 0) {
> + ret = 0;
> + goto cleanup;
> + }
> +
> if (!recall || rc == -2) {
> if (stat(newpath, &buf) != 0) {
> VIR_WARN("cannot stat %s: %s", newpath,
>
IIUC, this gets then called twice, doesn't it? I mean, transactions
are hackish a bit. We wanted chown() and setfilecon_raw() to run
inside the mount namespace of a domain. So instead of rewriting our
secdrivers and making them enter the namespace on every
chown()/setfilecon_raw() we put small code snippets just before these
calls that would record arguments for these functions in a thread
local variable and then transactionCommit() would enter the namespace
and replay all chown()/setfilecon_raw().
That's why there can't be any code before transactionAppend(), because
it would be called twice - once from regular call and from
transactionCommit(). For instance:
virSecuritySELinuxDomainSetPathLabel()
-> virSecuritySELinuxSetFilecon()
-> virSecuritySELinuxSetFileconHelper()
-> virSecuritySELinuxTransactionAppend()
virSecuritySELinuxTransactionCommit()
-> virSecuritySELinuxTransactionRun()
-> virSecuritySELinuxSetFileconHelper()
-> virSecuritySELinuxSetFileconImpl()
-> setfilecon_raw()
We need to find the root cause. Maybe our set is not on par with
restore, i.e. we are calling more set than restore and thus the
refcounter just keeps growing?
The root cause was as described above and I wanted to see how QEMU+SWTPM
react if I do this.
What I noticed was that the ...TransactionAppend() failed and didn't get
to the code to decrease the ref count, which then ended up increasing
forever. So I created this patch, which did indeed solve this particular
problem but then caused Signal 11's once I provoked an error with
starting (rather than resuming) a VM where again I had changed the
password (iirc).
Stefan
Michal