On 07/01/2013 07:05 PM, Richard Weinberger wrote:
> Am 01.07.2013 12:33, schrieb Daniel P. Berrange:
>> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>>> Am 01.07.2013 04:26, schrieb Gao feng:
>>>>> Well, given that we're at rc2 now & I'm still unclear about how some
>>>>> aspects of the userns setup are working, I'm afraid we'll have to wait
>>>>> until 1.1.1 for the userns LXC code to merge. I'll aim to do it next
>>>>> week, so that we have plenty of time for further testing before the
>>>>> 1.1.1 release.
>>>>>
>>>>
>>>> Ok, I think Richard had tested the userns support.
>>>> Hi Richard, can you give me your ack or tested-by?
>>>
>>> I'm still facing one userns related issue.
>>
>> [snip]
>>
>>> After creating it, attach to its console; you'll find bash as pid 1.
>>> And you'll find that /proc/1/ is not fully uid/gid-mapped:
>>> ---cut---
>>> # ls -la /proc/1/
>>> total 0
>>> dr-xr-xr-x 8 root root 0 Jul 1 06:06 .
>>> dr-xr-xr-x 74 nobody nogroup 0 Jul 1 06:06 ..
>>> dr-xr-xr-x 2 root root 0 Jul 1 06:06 attr
>>
>> [snip]
>>
>>> Any ideas what's going on here?
>>
>> No, it is very odd. It smells like a kernel issue to me. What
>> version are you running ?
>
> I see this issue on all kernels.
> Currently I'm using vanilla v3.9.x and v3.10.
>
>> I've also tried running the demo programs shown on the LWN.net
>> article
>>
>>   https://lwn.net/Articles/532593/
>>
>> and they don't operate in the way described by the article - the demo
>> programs continue to run as 'nfsnobody' even after the mappings are
>> set up.
>>
>> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
>> in KConfig. I'm wondering if there is still stuff missing in 3.9.x
>> that prevents this from working properly, or if the kernel behaviour
>> changed after those LWN articles were written.
>
> To me it looks like the capability system behaves oddly.
> The mappings in /proc are fine as long as I do not call capng_updatev().
> Calling capng_updatev() with parameters that do not change the current cap set
> triggers the odd behavior too.
>
This issue occurs after we call setuid: the init task of the container is marked
un-dumpable after setuid. I don't know why, but the kernel sets the owner of
/proc/<pid>/* to the host's root user when the task is un-dumpable.
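
One way to check this from inside a task is to print its dumpable flag next to
the owner of its /proc entry. This is only a rough sketch, assuming Linux with
libc reachable through ctypes; get_dumpable and proc_owner are names made up
for illustration and are not part of libvirt or of the attached program:

```python
import ctypes
import os

# Value taken from <linux/prctl.h>.
PR_GET_DUMPABLE = 3

# Assumption: the C library of the running process exposes prctl() (Linux).
_libc = ctypes.CDLL(None, use_errno=True)


def get_dumpable():
    """Return the dumpable flag of the calling task (1 means dumpable)."""
    zero = ctypes.c_ulong(0)
    ret = _libc.prctl(PR_GET_DUMPABLE, zero, zero, zero, zero)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def proc_owner(pid="self"):
    """Return (uid, gid) owning /proc/<pid>, as seen from this namespace."""
    st = os.stat(f"/proc/{pid}")
    return st.st_uid, st.st_gid


if __name__ == "__main__":
    print("dumpable flag:", get_dumpable())
    print("owner of /proc/self:", proc_owner())
```

Running it before and after the setuid call should show whether the flag and
the owner change together.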
> So we see two (related?) issues:
> 1. If we try updating the capabilities of pid1, /proc/1/ has unmapped files till we
>    exec().
> 2. Dropping capabilities does not work; we always gain a fresh and full capability
>    set.
>
This problem disappeared after:
1. removing the capability dropping
2. calling prctl(PR_SET_DUMPABLE, 1) after setuid/setgid
> BTW: I'm sure the issues are not caused by Gao feng's userns patches.
I think this is more likely a kernel bug: we should set the owner of /proc/<pid>/* to
the root user of the container, not of the host.
You can try the attached program; the owner of /proc/<pid of this program>/* is
incorrect there too.
Hmm, it would be better to fix this problem in the kernel. It is most likely a userns bug.
Thanks