On 01.07.2013 13:35, Daniel P. Berrange wrote:
On Mon, Jul 01, 2013 at 01:25:28PM +0200, Richard Weinberger wrote:
> On 01.07.2013 13:22, Daniel P. Berrange wrote:
>> On Mon, Jul 01, 2013 at 01:05:23PM +0200, Richard Weinberger wrote:
>>> On 01.07.2013 12:33, Daniel P. Berrange wrote:
>>>> On Mon, Jul 01, 2013 at 08:29:14AM +0200, Richard Weinberger wrote:
>>>>> Any ideas what's going on here?
>>>>
>>>> No, it is very odd. It smells like a kernel issue to me. What
>>>> version are you running ?
>>>
>>> I see this issue on all kernels.
>>> Currently I'm using vanilla v3.9.x and v3.10.
>>>
>>>> I've also tried running the demo programs shown on the LWN.net
>>>> article
>>>>
>>>>   https://lwn.net/Articles/532593/
>>>>
>>>> and they don't operate in the way described by the article - the demo
>>>> programs continue to run as 'nfsnobody' even after the mappings are
>>>> set up.
>>>>
>>>> I'm just using the Fedora 3.9.4-303 kernel, rebuilt with userns enabled
>>>> in KConfig. I'm wondering if there is still stuff missing in 3.9.x
>>>> that prevents this from working properly, or if the kernel behaviour
>>>> changed after those LWN articles were written.
>>>
>>> To me it looks like the capability system behaves oddly.
>>> The mappings in /proc are fine as long as I do not call capng_updatev().
>>> Also, calling capng_updatev() with parameters that do not change the
>>> current cap set triggers the odd behavior too.
>>>
>>> So we see two (related?) issues:
>>> 1. If we try updating the capabilities of pid1, /proc/1/ has unmapped
>>>    files till we exec().
>>> 2. Dropping capabilities does not work; we always gain a fresh and full
>>>    capability set.
>>>
>>> BTW: I'm sure the issues are not caused by Gao Feng's userns patches.
>>
>> Yeah, I've reproduced this problem with standalone code outside of
>> libvirt.
>>
>> Take the attached code and run
>
> -ENOATTACHMENT :-(
Now really attached.
I think I might know what is happening now though. When you start a new
namespace, you must mount a new instance of 'proc' filesystem. We are
not synchronizing this wrt setup of the uid/gid mappings though, so we
are racy. So I have a feeling we're creating the proc filesystem before
the mappings are set up. I'm going to add some synchronization in to see
if it makes a difference in this respect.
So you mount /proc and write the uid/gid mappings in parallel?
Both have to be done on the host side. Why is this parallel?
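
For reference, here is a minimal sketch of one way the two steps could be
ordered. It is not libvirt's actual code. It assumes the proc mount happens
in the cloned child while the parent writes the mappings. fork() stands in
for clone() with the namespace flags, run_ordered_setup() is a hypothetical
name, and the namespace-specific calls appear only as comments. The child
blocks on a pipe until the parent reports that the mappings are written.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the child only continued after the
 * parent signalled that the uid/gid mappings were written, -1 otherwise.
 */
int run_ordered_setup(void)
{
    int go[2];     /* parent -> child: "mappings are written" */
    int report[2]; /* child -> parent: what the child observed */

    if (pipe(go) < 0 || pipe(report) < 0)
        return -1;

    /* Stand-in for clone(CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS). */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char token = 0;
        char seen;

        close(go[1]);
        close(report[0]);

        /* Block here until the parent has written one byte. */
        ssize_t n = read(go[0], &token, 1);

        /* Real code would only now mount the new proc instance, e.g.
         * mount("proc", "/proc", "proc", 0, NULL). */
        seen = (n == 1 && token == 'M') ? 'Y' : 'N';
        if (write(report[1], &seen, 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(go[0]);
    close(report[1]);

    /* Real code would write /proc/<pid>/uid_map and gid_map here,
     * before releasing the child. */
    char token = 'M';
    if (write(go[1], &token, 1) != 1)
        return -1;
    close(go[1]);

    char seen = 'N';
    ssize_t n = read(report[0], &seen, 1);
    close(report[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return (n == 1 && seen == 'Y' &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_ordered_setup() from a test program should return 0. The point
is only that the child cannot reach the mount step before the parent has
finished with the mappings.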
Thanks,
//richard