On 09/27/2018 03:33 PM, John Ferlan wrote:
> On 9/27/18 4:51 AM, Michal Privoznik wrote:
>> On 09/26/2018 11:14 PM, John Ferlan wrote:
>>>
[...]
>>>>> I read the cover, it's not simpler than the above.
>>>>
>>>> Yeah, this is not that simple a bug. I'll try to explain it differently:
>>>>
>>>> On fork(), all FDs of the parent are duplicated into the child. So even
>>>> if the parent closes one of them, the connection is not closed until
>>>> the child closes the same FD. And since metadata locking is very
>>>> fragile when it comes to the connection to virtlockd, this fork()
>>>> behaviour actually plays against us, because it violates our assumption
>>>> that the connection is closed at the end of metadataUnlock(). In fact
>>>> it is not - the child still has it open.
>>>>
>>>> If there was a flag like DONT_DUP_ON_FORK I could just set it on the
>>>> virtlockd connection in metadataLock() and be done.
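To make that point concrete, here is a minimal standalone demo - plain
POSIX, nothing libvirt-specific - of a connection surviving the parent's
close() because a fork()ed child still holds a duplicate:

    /* fork_fd_demo.c: a connection is only closed once the LAST FD
     * referring to it is closed, including duplicates made by fork(). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        char buf[16];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            abort();

        pid_t child = fork();
        if (child == 0) {       /* the child inherits both sv[0] and sv[1] */
            close(sv[0]);
            sleep(2);           /* keep the duplicate of sv[1] open */
            _exit(0);           /* only now does the connection really close */
        }

        close(sv[1]);           /* the parent "closes the connection" ... */

        /* ... yet this read() does not see EOF immediately; it blocks for
         * ~2 seconds until the child's duplicate goes away. */
        printf("read() returned %zd\n", read(sv[0], buf, sizeof(buf)));

        waitpid(child, NULL, 0);
        return 0;
    }

Replace sv[1] with the virtlockd connection FD and the child's sleep()
with whatever the forked process happens to be doing, and that is the bug.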
>>>
>>> Avoiding the forking execs works too ;-}
>>>
>>> OK - instead of talking in generalities - which forking dup'd FD is in
>>> play here? I don't want to assume or guess any more.
>>
>> The connection FD.
>>
>> Maybe I'm misreading your comments or the other way around. I'll try to
>> explain what the problem is one last time, in the simplest way I can.
>>
>> 1) Thread A wants to start a domain so it calls
>> virSecurityManagerMetadataLock().
>>
>> 2) The function opens a connection to virtlockd (represented by an FD)
>> and duplicates the FD so that the connection is kept open until unlock.
> Yep, virLockManagerLockDaemonAcquire calls virNetClientDupFD(false),
> meaning I don't want one with the "FD_CLOEXEC" flag set.
>
> While normally that would be OK - is it OK for this path? Does anything
> change if the argument was true here (e.g. cloexec = priv->type ==
> VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON)?
>
> It seems to me that the 3 patches together dance around two things -
> that the CLOEXEC is set and that TransactionCommit does the fork to
> avoid the parent/child duplication problem.
Let's forget about CLOEXEC for a moment. I'll wait for Dan to return (or
find somebody else) for those patches. This is the only patch left for
consideration.
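And to be explicit about what the flag buys us: FD_CLOEXEC only matters
across exec(), not across fork(). In generic POSIX terms (this is just the
underlying mechanism, not the virNetClientDupFD implementation itself):

    #include <fcntl.h>

    /* Duplicate fd; the copy survives both fork() and exec(). */
    int dup_plain(int fd)
    {
        return fcntl(fd, F_DUPFD, 0);
    }

    /* Duplicate fd with FD_CLOEXEC set: the copy still leaks into every
     * fork()ed child, but is closed automatically once that child
     * exec()s. A child that forks and never execs keeps it open either
     * way - which is why CLOEXEC alone cannot fix this bug. */
    int dup_cloexec(int fd)
    {
        return fcntl(fd, F_DUPFD_CLOEXEC, 0);
    }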
>>
>> 3) Now that the locks are acquired, the thread does chown() and
>> setfilecon().
>>
>> 4) The virSecurityManagerMetadataUnlock() function is called, which
>> causes virtlockd to release all the locks. Subsequently to that, the
>> function closes the duplicated FD obtained in step 2).
>>
>> The duplication is done because both virLockManagerAcquire() and
>> virLockManagerRelease() open a new connection, do the RPC talk and close
>> the connection. That's just the way the functions are written. Now,
>> because of the FD duplication happening in step 2) the connection from
>> virLockManagerAcquire() is not really closed until step 4) where the
>> duplicated FD is closed.
>>
>> This is important to realize: the connection is closed only when the
>> last FD representing it is closed. If I were to duplicate the connection
>> FD a hundred times and then close 99 of those duplicates, the connection
>> would still be kept open.
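That claim is easy to demonstrate (a local socketpair stands in for the
virtlockd connection here):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2], dups[100];
        char c;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;

        dups[0] = sv[1];
        for (int i = 1; i < 100; i++)
            dups[i] = dup(sv[1]);   /* 100 FDs, one connection */

        for (int i = 0; i < 99; i++)
            close(dups[i]);         /* 99 closes: the peer notices nothing */

        close(dups[99]);            /* the 100th close finally hangs up */
        printf("read() returned %zd (EOF)\n", read(sv[0], &c, 1));
        return 0;
    }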
> I've understood all this.
>>
>> Now, let's put some threads into the picture. Imagine thread B which
>> called fork() just in between steps 2) and 3). On fork, ALL parent FDs
>> are duplicated. So the connection FD is duplicated into the child
>> process too. Therefore, in step 4) where we think the connection is
>> closed - well, it is not really, because there is still one FD around -
>> in the child.
> This is exactly where the story gets cloudy for me. Is ThreadB related
> to ThreadA? It must be, because how would it get the duplicated fd, right?
They are related in the sense that they run in the same process. But they
don't have to be related at all for fork() to duplicate all the FDs.
fork() does not ask whether two threads are related; it just duplicates
ALL open FDs into the child.
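A self-contained demo, if it helps (compile with -pthread): the open()
happens on the main thread, the fork() on another, and the child ends up
with the descriptor anyway:

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared_fd = -1;  /* opened by "thread A" (main) */

    static void *thread_b(void *opaque)
    {
        pid_t pid = fork();     /* thread B never touched shared_fd ... */
        if (pid == 0) {
            /* ... yet the child has it: prints 0 (valid, no FD_CLOEXEC) */
            printf("child: fcntl(shared_fd, F_GETFD) = %d\n",
                   fcntl(shared_fd, F_GETFD));
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return NULL;
    }

    int main(void)
    {
        pthread_t tb;

        shared_fd = open("/dev/null", O_RDONLY);

        pthread_create(&tb, NULL, thread_b, NULL);
        pthread_join(tb, NULL);

        close(shared_fd);
        return 0;
    }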
> But ironically we save priv->clientfd, so if we know we've forked a
> child can we immediately close it? Knowing that our parent still has it
> open? Off in left field again?
I'm failing to see how to achieve that. fork() is occurring more often
than you might think - every time virCommandRun() is called, every time we
try to open a file on NFS, and in a million other places. For every such
place we would have to make close() the first thing the child does
(and even then it would be unsafe, because you are not guaranteed when
the child is scheduled to run), and there might be places where we are
unable to put the close(), e.g. if some library we link with fork()-s too.
Also, requiring the close() would mean making the FD globally available,
which goes against all our code separation efforts.
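Even in the fork() sites we do control, the hygiene would have to look
roughly like this (a sketch only - the /proc walk and the helper name are
illustrative, not actual libvirt code), and the comment marks exactly
where the timing guarantee is missing:

    #include <dirent.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Close every inherited FD above stderr by walking our FD table. */
    static void close_inherited_fds(void)
    {
        DIR *dir = opendir("/proc/self/fd");
        struct dirent *ent;

        if (!dir)
            return;
        while ((ent = readdir(dir))) {
            int fd = atoi(ent->d_name);   /* "." and ".." yield 0: skipped */
            if (fd > STDERR_FILENO && fd != dirfd(dir))
                close(fd);
        }
        closedir(dir);
    }

    /* ... and this would be needed at every single fork() site: */
    pid_t run_child(const char *binary)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* The kernel duplicated the virtlockd FD the moment fork()
             * returned; until the scheduler actually runs this child and
             * it reaches the next line, the duplicate stays alive and the
             * parent's "connection is closed" assumption is already
             * violated. */
            close_inherited_fds();
            execlp(binary, binary, (char *)NULL);
            _exit(127);
        }
        return pid;
    }

And none of this is possible for fork()s that happen inside libraries we
merely link against.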
>>
>> And this is a huge problem, because we do not know when the child is
>> going to close it. It may be (unintentionally) an evil child and close
>> the FD at the worst time possible - when libvirtd holds some locks.
>>
>> Important thing to realize #2: whenever a connection closes, virtlockd
>> looks into its internal records and if the PID on the other side held
>> some locks then: a) the locks are released, b) the PID is killed.
>>
>> I'm sorry, I can't explain this any simpler. I'll try to find somebody
>> who understands this and have them review the patch.
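If it helps, this is how I think of the daemon side. A conceptual sketch
only - NOT virtlockd's actual code; the struct, the helper and the signal
choice are made up for illustration:

    #include <signal.h>
    #include <stddef.h>
    #include <sys/types.h>

    struct lock_owner {
        pid_t pid;          /* the process that registered the locks */
        size_t nlocks;      /* how many locks it still holds */
    };

    static void release_all_locks(struct lock_owner *owner)
    {
        /* ... drop every lease recorded for owner->pid ... */
        owner->nlocks = 0;
    }

    /* Conceptually, what happens when a client connection drops: */
    static void on_connection_closed(struct lock_owner *owner)
    {
        if (owner->nlocks > 0) {
            release_all_locks(owner);   /* a) the locks are released */
            kill(owner->pid, SIGKILL);  /* b) the PID is killed: it can no
                                         *    longer be trusted to hold the
                                         *    resources the locks guarded */
        }
    }

And recall that with metadata locking the PID holding the locks is
libvirtd itself - that is why a stray duplicate FD closing at the wrong
moment is so dangerous.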
> I think sometimes it's better to see actual calls and traces rather than
> using "imagine if" type scenarios. I'm sure this is *a lot* easier when
> you have a couple of debug sessions running and see things happening in
> real time.
>
> If someone else reviews this it doesn't matter to me, but I think this
> patch is wrong for a couple of reasons. That reasoning doesn't seem to
> be well received. If this patch were accepted as is, then you'd be
> calling virProcessRunInMountNamespace even when namespaces weren't being
> used, even for domain locks. Something I saw in the traces that Marc
> posted this morning got me thinking: hey, wait, this code is only
> supposed to run when namespaces are enabled. So I have to assume
> that's being used in his environment. Would the same problem exist if
> namespaces weren't being used?
Yes. Namespaces have nothing to do with this.
>>
>> As an alternative approach, if we did not close the connection in 2) and
> Did you mean clone or close?
I meant close. However, it turns out that's easier said than done.
virtlockd is not quite happy about some remote procedures being called
multiple times over one connection. And yet I have to call them. I'll look
deeper, but so far this is the only fix we have.
Michal