On 10/12/2018 04:20 PM, John Ferlan wrote:
On 10/12/18 4:45 AM, Michal Privoznik wrote:
> On 10/11/2018 11:06 PM, John Ferlan wrote:
>>
>>
>> On 10/11/18 8:03 AM, Michal Privoznik wrote:
>>> In the next commit the virSecurityManagerMetadataLock() is going
>>> to be turned thread unsafe. Therefore, we have to spawn a
>>> separate process for it. Always.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>>> ---
>>> src/security/security_dac.c | 12 ++++++------
>>> src/security/security_selinux.c | 12 ++++++------
>>> 2 files changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
>>> index da4a6c72fe..21db3b9684 100644
>>> --- a/src/security/security_dac.c
>>> +++ b/src/security/security_dac.c
>>> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr
mgr ATTRIBUTE_UNUSED,
>> ^^^^
>>
>> The virSecurityDACTransactionCommit description would need some
>> adjustment.... Well a lot of adjustment... and the caller
>> virSecurityManagerTransactionCommit would also require some adjustment.
>>
>>> goto cleanup;
>>> }
>>>
>>> - if ((pid == -1 &&
>>> - virSecurityDACTransactionRun(pid, list) < 0) ||
>>> - (pid != -1 &&
>>> - virProcessRunInMountNamespace(pid,
>>> - virSecurityDACTransactionRun,
>>> - list) < 0))
>>> + if (pid == -1)
>>> + pid = getpid();
>>> +
>>> + if (virProcessRunInMountNamespace(pid,
>>> + virSecurityDACTransactionRun,
>>> + list) < 0)
>>
>> As described in the v1 series cover letter, namespaces would need to be
>> enabled for this to work.
>
> Not really. I mention namespaces just to show that on majority of setups
> we do fork() when relabeling paths anyway. Some might be worried that
> fork()-ing is very expensive and therefore might be against this
> approach. I am trying to prove them otherwise. In fact, this patch is
> just about that - fork() every time, even when namespaces are disabled.
Considering the number of fork's out there anyway, that expense wasn't
part of my consideration. It's a means to an end.
If someone disabled Namespace's for whatever reason, then how would this
path work?
virProcessRunInMountNamespace => virProcessNamespaceHelper (child)
I get that it's "just" a fork, but my point is more that the namespace
is Namespace and if that's not true, then isn't the premise of the patch
not true?
Every Linux process runs in a namespace (unless kernel was compiled
without NS support). That is, even libvirtd runs in a namespace:
# ls -l /proc/$(pgrep libvirtd)/ns
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 mnt -> 'mnt:[4026531840]'
...
And no IDC about non Linux (where Namespace wouldn't be available) since
non Linux wouldn't have DAC and/or SELinux either.
Actually, we have to care about non Linux, e.g. BSD does have DAC. Or
even Linux where namespaces were disabled (unlikely but not impossible).
Perhaps the light dawning over marblehead moment for me is that the
security driver will use Namespaces similar to how the QEMU driver uses
them, but instead in the context of whatever process/thread that isn't
necessarily the domain process (in this case the libvirtd process).
Since it's been proven that the code works for the QEMU domain
processing, ergo it should "just work" for the libvirtd security driver
processing. That's very different from the commit message above and
perhaps the "assumption" I've been thinking under. There isn't a
"relationship" between the qemu.conf usage of namespaces and what's
being done here except for the libvirtd process as opposed to the qemu-*
process will use fork's and Namespaces.
Something perhaps for both the commit message *and* at least the two
modules that change here to more clearly describe the @pid usage. The
virSecurityManagerTransactionCommit will need adjustment too in order to
indicate that the underlying code will run in the Namespace specified by
the caller rather than the Namespace of the caller (a really vague, but
important difference).
>
> Again, I have no numbers to support that it's the majority, but since
> namespaces are turned on by default (and no new bugs regarding
> namespaces were to be seen recently) I am assuming that is the case.
>
>> Whether they are enabled true everywhere,
>> who's to say. I always assumed namespaces were used for a somewhat
>> narrow band of things and didn't pay that close attention to them.
>
> No. Unless you take the extra step and disable them in qemu.conf you're
> using them.
Well disabling isn't really well described in the qemu.conf description,
but I'm fairly certain it's by uncommenting the entry and removing the
"mount" from the [ ], leaving just "namespaces = [ ]"
For some more data points using the previously failing guest that now
works with the alteration below.
Running with the "default":
# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
16
# ls -l /dev | wc -l
269
#
By changing to "namespaces = [ ]", I get:
# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
269
# ls -l /dev | wc -l
269
#
Which I'm not sure I expected, but maybe that's right.
It is, because this means that for namespaces=[] case there is exactly
the same number of items in your host /dev and what qemu sees. Compared
to previous case when namespaces=["mnt"] (the default) host /dev
contained more entries and than what qemu saw.
Next, if I run without the changes in this series and adding a couple of
debug messages to show when virProcessNamespaceHelper is called, when
namespaces = [ ] there are no calls to to the helper; whereas, there are
two using the default (I assume one each for DAC and SELinux).
With this series applied, when namespaces = [ ] there are 8 calls to to
the helper (all from the "parent" gdb pid); whereas, for the default the
last 2 come from the domain.
I assume that's all expected (at this point).
> And the fact, that you haven't noticed feels like pat on a
> shoulder. Good job :-) You can verify this by running the following:
>
> # nsenter -m -t $(pgrep qemu) ls -l /dev
>
> (works if there's just one qemu process, or just replace pgrep with qemu
> pid)
>
> You will see very few items in /dev compared to how 'real' /dev looks:
>
> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
> 15
> # ls -l /dev | wc -l
> 188
>
>>
>> I will note that will a couple of well placed VIR_WARN()'s in each of
>> these functions, I can see DAC and SELinux transactionCommit's getting
>> called. Before the guest actually starts they're called with -1, then
>> once it starts there's a single call with the domain PID. So I guess I'm
>> using NS and didn't know.
>
> Yes. However, during the cmd line construction process some files are
> created and qemuSecurityDomainSetPathLabel() is called even if there is
> no qemu running yet. Therefore, security driver sees pid == -1. For
> instance, qemuDomainWriteMasterKeyFile() is such function. But this is
> not a problem (from metadata locking POV).
>
>>
>> However, looking at the qemu.conf namespace setting I note things like
>> "privileged" and "!defined(HAVE_SYS_ACL_H) ||
!defined(WITH_SELINUX)"
>> So this all then assumes MountNamespace is being used. And what if it
>> isn't? Does everything that's security related start failing? and the
>> only way to back that out is revert this?
>
> No. Namespaces have nothing to do with this feature. Namespaces are
> Linux feature and therefore when qemu driver initializes itself it has
> to make decision on the defaults. And currently the default is to enable
> namespaces if supported by host (= Linux && (at least one of {ACL,
> SELinux} was enabled in the build) ). I'm not going to explain that
> process now as it has nothing to do with this feature.
>
>>
>> I guess, what I don't understand is why this usage pattern cannot be
>> limited to meta locking. That is, if we're metalocking, then add the
>> pid; otherwise, business as usual. At least that limits the scope.
>
> Namespaces and metadata locking are orthogonal features.
>
As I note above, I think I've had the light dawning moment now. I think
prior to this point I've been under the impression that the QEMU domain
setting for namespaces was somehow "affecting" or "related to"
what's
being done/changed here. In reality it's making use or reusing the same
code that QEMU uses.
Still the one reservation is - what happens if setns isn't available.
For that I think you need to somehow decide that it isn't available and
just pass -1, unless you have some other grand idea.
Yeah, I need to have a patch that introduces "virProcessRunInFork" which
will just fork() and run the callback in the child. Something like
virProcessRunInMountNamespace but this will be called when namespaces
are not available.
Will send v3 in a while.
Michal