On 9/27/18 4:51 AM, Michal Privoznik wrote:
On 09/26/2018 11:14 PM, John Ferlan wrote:
>
[...]
>>> I read the cover, it's not simpler than the above.
>>
>> Yeah, this is not that simple bug. I'll try to explain it differently:
>>
>> On fork(), all FDs of the parent are duplicated to child. So even if
>> parent closes one of them, the connection is not closed until child
>> closes the same FD. And since metadata locking is very fragile when it
>> comes to connection to virtlockd, this fork() behaviour actually plays
>> against us because it violates our assumption that connection is closed
>> at the end of metadataUnlock(). In fact it is not - child has it still open.
>>
>> If there was a flag like DONT_DUP_ON_FORK I could just set it on
>> virtlockd connection in metadataLock() and be done.
>
> Avoid the forking exec's works too ;-}
>
> OK - instead of talking in generalities - which forking dup'd fd is in
> play here? I don't want to assume or guess any more.
The connection FD.
Maybe I'm misreading your comments or the other way around. I'll try to
explain what is the problem one last time, in the simplest way I can.
1) Thread A wants to start a domain so it calls
virSecurityManagerMetadataLock().
2) The function opens a connection to virtlockd (represented by FD) and
duplicates the FD so that connection is kept open until unlock.
Yep, virLockManagerLockDaemonAcquire calls virNetClientDupFD(false)
meaning I don't want one with the "FD_CLOEXEC" flag set.
While normally that would be OK - is it OK for this path? Does anything
change if the argument was true here (e.g. cloexec = priv->type ==
VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON)?
It seems to me that the 3 patches together dance around two things -
that the CLOEXEC is set and that TransactionCommit does the fork to
avoid the parent/child duplication problem.
3) Now that locks are acquired, the thread does chown() and setfcon().
4) the virSecurityManagerMetadataUnlock() function is called, which
causes virtlockd to release all the locks. Subsequently to that, the
function closes the duplicated FD obtained in step 2).
The duplication is done because both virLockManagerAcquire() and
virLockManagerRelease() open new connection, do the RPC talk and close
the connection. That's just the way the functions are written. Now,
because of FD duplication happening in step 2) the connection from
virLockManagerAcquire() is not really closed until step 4) where the
duplicated FD is closed.
This is important to realize: the connection is closed only when the
last FD representing it is closed. If I were to duplicate the connection
FD a hundred times and then close 99 of those duplicates the connection
would still be kept open.
I've understood all this.
Now, lets put some threads into the picture. Imagine thread B which
called fork() just in between steps 2) and 3). On fork, ALL partent FDs
are duplicated. So the connection FD is duplicated into the child
process too. Therefore, in step 4) where we think the connection is
closed - well it is not really because there is still one FD around - in
the child.
This is exactly where the story gets cloudy for me. Is ThreadB related
to ThreadA? It must be, because how would it get the duplicated fd, right?
But ironically we save priv->clientfd, so if we know we've forked a
child can we immediately close it? Knowing that our parent still has it
open? Off in left field again?
And this is huge problem, because we do not know when child is going to
close it. It may be (unintentionally) an evil child and close the FD at
the worst time possible - when libvirtd holds some locks.
Important thing to realize #2: whenever a connection closes, virtlockd
looks into its internal records and if PID on the other side held some
locks then: a) the locks are releasd, b) the PID is killed.
I'm sorry, I can't explain this any simpler. I'll try to find somebody
who understands this and have them review the patch.
I think sometimes it's better to see actual calls and traces rather than
using "imagine" if type scenarios. I'm sure this is *a lot* easier when
you have a couple of debug sessions running and see things happening in
real time.
If someone else reviews this it doesn't matter to me, but I think this
patch is wrong for a couple of reasons. That reasoning doesn't seem to
be well received. If this patch was accepted as is, then you'd be
calling virProcessRunInMountNamespace even when namespaces weren't being
used, even for domain locks. Something I saw in the traces that Marc
posted this morning and starting thinking, hey wait this code is only
supposed to be run when namespaces are enabled. So I have to assume
that's being used in his environment. Would the same problem exist if
namespaces weren't being used?
As an alternative approach, if we did not close the connection in 2) and
Did you mean clone or close?
4) we don't have to care when the child closes the FD because it
will
not cause the connection to actually close. I wrote a patch like that
for previous versions but it was disliked. Maybe I can write it from
scratch again and have that one ACKed instead.
>
[...]
>>
>> I don't think there is any other way to prevent fork() cloning an FD.
>>
>
> OpenVMS (hahaha)
I don't think we want to revert all metadata locking patches only to
implement support for different beast. Also, we don't need distributed
locking.
Michal
I wasn't serious hence the (hahaha), but quite frankly what I know about
dlm I believe it would certainly help, but that's not the point here. I
do recall though always having some issue with fd locking in various
projects I've been involved with.
John