On 09/25/2018 10:52 PM, John Ferlan wrote:
On 9/21/18 5:29 AM, Michal Privoznik wrote:
> There is this latent bug which can result in virtlockd killing
> libvirtd. The problem is when the following steps occur:
>
> Parent | Child
> ------------------------------------------------------+----------------
> 1) virSecurityManagerMetadataLock(path); |
> This results in connection FD to virtlockd to be |
> dup()-ed and saved for later use. |
> |
> 2) Another thread calling fork(); |
> This results in the FD from step 1) to be cloned |
> |
> 3) virSecurityManagerMetadataUnlock(path); |
> Here, the FD is closed, but the connection is |
> still kept open because child process has |
> duplicated FD |
> |
> 4) virSecurityManagerMetadataLock(differentPath); |
> Again, new connection is opened, FD is dup()-ed |
> |
> 5) | exec() or exit()
>
> Step 5) results in closing the connection from 1) to be closed,
> finally. However, at this point virtlockd looks at its internal
> records if PID from 1) does not still own any resources. Well it
> does - in step 4) it locked differentPath. This results in
> virtlockd killing libvirtd.
>
> Note that this happens because we do some relabelling even before
> fork() at which point vm->pid is still -1. When this value is
> passed down to transaction code it simply runs the transaction
> in parent process (=libvirtd), which means the owner of metadata
> locks is the parent process.
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
Always good to double up on the S-o-b's when there's locks and forks
involved. That makes sure the fork gets one too ;-)
> ---
> src/security/security_dac.c | 12 ++++++------
> src/security/security_selinux.c | 12 ++++++------
> 2 files changed, 12 insertions(+), 12 deletions(-)
>
I read the cover, it's not simpler than the above.
Yeah, this is not that simple bug. I'll try to explain it differently:
On fork(), all FDs of the parent are duplicated to child. So even if
parent closes one of them, the connection is not closed until child
closes the same FD. And since metadata locking is very fragile when it
comes to connection to virtlockd, this fork() behaviour actually plays
against us because it violates our assumption that connection is closed
at the end of metadataUnlock(). In fact it is not - child has it still open.
If there was a flag like DONT_DUP_ON_FORK I could just set it on
virtlockd connection in metadataLock() and be done.
Perhaps I could use pthread_atfork() but that might be very tricky - I
am not sure if some library that we are linking with does not use it and
it pthreads are capable of dealing with us and the library registering
their own callbacks.
It also indicates
that patches 1-3 don't have any bearing. So to me that says - 2 series,
one the artsy/fartsy cleanup variety and the second this edge/corner
condition chase based on timing and/or other interactions.
My first question is - have you bisected your changes? Since this only
seems to matter for metadata locking, you don't necessarily have to
start too early, but I would assume that prior to commit 8b8aefb3 there
is no issue.
I haven't. I'm not quite sure why would I do that. I already know the
problem and yes, it's in metadata locking. From quick skim through git
log: 7a9ca0fa and 6d855abc14338 look like the result of bisect.
My second question is - given my analysis in patch2, was testing of this
patch done with or without patch 2 and 3? Hard to review this without
having my patch2 analysis rattling around in short term memory that
works. At this point, I'm thinking they were in place and may have had
an impact especially since that dup() call was removed, but it's
mentioned during this commit message, which is really confusing.
I've tested the whole series not individual patches. So this patch was
tested with 2 and 3 applied.
However, it's true that since we will spawn a separate process for every
transaction (and dup() is called only from there) it is not possible to
actually have FD leak (because the transaction code does not fork()
further). Having said that, there are other callers that duplicate
socket FD and the problem is real (even though after this patch metadata
locking is not affected).
Prior to patch2/3, virNetSocketDupFD has 4 callers of which only one
calls with 'false' and that's the path having problems.
libxlMigrateDstReceive (using true)
libxlDomainMigrationSrcPerform (using true)
qemuMigrationSrcConnect (using true)
Since these were calling virNetSocketDupFD(cloexec=true) anyway, nothing
changed for them.
virNetClientDupFD (using false)
Since your testing certainly doesn't include the first 3, I'll focus on
the last one which has but one caller:
virLockManagerLockDaemonAcquire
meaning without patches 2-3, we got a "dup()" socket back instead of
changing the existing socket to F_DUPFD_CLOEXEC. I think I just talked
myself in a circle, but perhaps gives you an understanding of the
struggle with all the patches in one series.
> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
> index 5aea386e7c..876cca0f2f 100644
> --- a/src/security/security_dac.c
> +++ b/src/security/security_dac.c
> @@ -561,12 +561,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr
ATTRIBUTE_UNUSED,
> goto cleanup;
> }
>
> - if ((pid == -1 &&
> - virSecurityDACTransactionRun(pid, list) < 0) ||
> - (pid != -1 &&
> - virProcessRunInMountNamespace(pid,
> - virSecurityDACTransactionRun,
> - list) < 0))
> + if (pid == -1)
> + pid = getpid();
> +
> + if (virProcessRunInMountNamespace(pid,
> + virSecurityDACTransactionRun,
> + list) < 0)
Going back to commit d41c1621, it essentially states:
"To differentiate whether transaction code should fork() or not the @pid
argument now accepts -1 (which means do not fork)."
This commit then backtracks or undoes that by using a getpid() causing
every transaction to fork.
The "logic" for the decision to pass pid == -1 or not is introduced in
the subsequent commit 37131ada based on qemuDomainNamespaceEnabled.
Thus with this patch every domain w/ or w/o namespace and daemon commit
path will fork. So if this logic is to stay, then there's no reason to
have the namespace check either, is there? Is the vm->pid different than
getpid()? IOW: I think this patch is too big of a leap.
Yes, vm->pid is different to getpid(). If there is a domain running with
namespaces enabled then we definitely wants to use vm->pid to enter the
qemu's namespace and relabel /dev/blah there instead of the namespace
that libvirtd (=getpid()) is running. However, some labelling is done
prior to running qemu (and setting up its namespace). For instance:
qemuDomainWriteMasterKeyFile, qemuProcessStartManagedPRDaemon,
qemuProcessMakeDir - they all call qemuSecurityDomainSetPathLabel() at
the time when no namespace was built yet.
But since we initialize vm->pid to -1 then the namspace check feels
redundant. I agree with that. In virSecurityDACTransactionCommit() it
does not matter if pid = -1 because vm->pid is -1 or because no
namespaces are used and therefore -1 comes from variable initialization
in qemuSecurityDomainSetPathLabel().
However, the contention from the cover is that this issue only happens
when metadata locking is enabled.
Thus, rather than getpid() for everyone, maybe it's a getpid() for
metadata lock consumers. Seems for pid == -1 we could query @mgr for
what it knows, like virSecurityManagerGetPrivateData and then if
priv->type == VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON, use getpid(). Of
course then @mgr loses its ATTRIBUTE_UNUSED.
The whole point of always fork()-ing is to avoid leaking FD even into
child process. Patches 2 and 3 are trying to avoid leaking FD into
exec(), this one is trying to avoid leaking it into fork(). I mean, if
the connection FD is cloned into child, we can't be sure when child
decides to close it. It may close it at the worst possible moment - when
libvirtd holds some locks (in virtlockd). And because virtlockd locks
are not associated with connection (only with PID) from virtlockd
perspective it looks like libvirtd has closed connection and yet is
still holding some locks. BANG! that's where virtlockd kills libvirtd.
I don't think there is any other way to prevent fork() cloning an FD.
My solution is to fork() in virSecurityDACTransactionCommit() (or
SELinux counterpart) which duplicates only the thread that is executing
the transaction and runs only virSecurityDACTransactionRun() which is
fork() free.
Furthermore, if virLockManagerLockDaemonAcquire knows that we have a
daemon type lock, would that virNetClientDupFD need to be altered to use
@true instead of @false? Maybe I'm over thinking this part - but with
patch2 and patch3 adjustments, it's just not clear in my head anymore...
Patches 2 and 3 are not related to metadata locking anymore. However,
they started as such (when I was writing them). Anyway, that fact does
not reduce their importance.
Michal