On 07/01/2013 07:05 PM, Richard Weinberger wrote:
On 01.07.2013 12:33, Daniel P. Berrange wrote:
> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>> On 01.07.2013 04:26, Gao feng wrote:
>>>> Well, given that we're at rc2 now & I'm still unclear about how some
>>>> aspects of the userns setup are working, I'm afraid we'll have to wait
>>>> until 1.1.1 for the userns LXC code to merge. I'll aim to do it next
>>>> week, so that we have plenty of time for further testing before the
>>>> 1.1.1 release.
>>>>
>>>
>>> Ok, I think Richard had tested the userns support.
>>> Hi Richard, can you give me your ack or tested-by?
>>
>> I'm still facing one userns related issue.
>
> [snip]
>
>> After creating it, attach to its console; you'll find bash as pid 1.
>> And you'll find that /proc/1/ is not fully uid/gid-mapped:
>> ---cut---
>> # ls -la /proc/1/
>> total 0
>> dr-xr-xr-x 8 root root 0 Jul 1 06:06 .
>> dr-xr-xr-x 74 nobody nogroup 0 Jul 1 06:06 ..
>> dr-xr-xr-x 2 root root 0 Jul 1 06:06 attr
>
> [snip]
>
>> Any ideas what's going on here?
>
> No, it is very odd. It smells like a kernel issue to me. What
> version are you running ?
I see this issue on all kernels.
Currently I'm using vanilla v3.9.x and v3.10.
> I've also tried running the demo programs shown on the LWN.net
> article
>
>   https://lwn.net/Articles/532593/
>
> and they don't operate in the way described by the article - the demo
> programs continue to run as 'nfsnobody' even after the mappings are
> set up.
>
> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
> in KConfig. I'm wondering if there is still stuff missing in 3.9.x
> that prevents this from working properly, or if the kernel behaviour
> changed after those LWN articles were written.
To me it looks like the capability system behaves oddly.
The mappings in /proc are fine as long as I do not call capng_updatev().
Also, calling capng_updatev() with parameters that do not change the current cap set
triggers the odd behavior too.
This issue occurs after we call setuid: the init task of the container is set to be
un-dumpable after setuid. I don't know why, but the kernel sets the owner of
/proc/<pid>/* to the root user of the host when the task is un-dumpable.
So we see two (related?) issues:
1. If we try updating the capabilities of pid 1, /proc/1/ has unmapped files until we
   exec().
2. Dropping capabilities does not work; we always gain a fresh and full capability set.
This problem disappeared after:
1. removing the capability dropping
2. calling prctl(PR_SET_DUMPABLE, 1) after setuid/gid.
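
A minimal sketch of the second workaround, assuming a Linux system with /proc mounted. The helper names restore_dumpable() and proc_self_owner() are hypothetical and are not taken from libvirt or from the patches under discussion; the sketch only shows the prctl() call and a way to look at the owner of the task's /proc directory.

```c
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the uid that owns /proc/self
 * (i.e. /proc/<pid> of the caller), or (uid_t)-1 on error.
 */
static uid_t proc_self_owner(void)
{
    struct stat st;

    if (stat("/proc/self", &st) != 0)
        return (uid_t)-1;
    return st.st_uid;
}

/*
 * Hypothetical helper: mark the calling process dumpable again.
 * Meant to be called right after setgid()/setuid().
 * Returns the resulting dumpable flag (1 on success), or -1 on error.
 */
static int restore_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

In the container init path this would be called directly after the setgid()/setuid() calls. Comparing proc_self_owner() before and after the call is one way to check whether the /proc ownership follows the dumpable flag on a given kernel.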
BTW: I'm sure the issues are not caused by Gao feng's userns patches.
I think this is more like a kernel bug. We should set the owner of /proc/<pid>/* to the
root user of the container, not the host.
Feel free to add:
Acked-by: Richard Weinberger <richard(a)nod.at>
Tested-by: Richard Weinberger <richard(a)nod.at>
Thanks for your help!
Gao