On 06.06.2013 10:13, Daniel P. Berrange wrote:
On Thu, Jun 06, 2013 at 10:07:26AM +0200, Richard Weinberger wrote:
> Am 06.06.2013 09:56, schrieb Daniel P. Berrange:
>> On Thu, Jun 06, 2013 at 08:57:21AM +0200, Richard Weinberger wrote:
>>> Hi!
>>>
>>> I'm facing the issue that "virsh lxc-enter-namespace ..." does not work for me.
>>> setns() always fails with EINVAL.
>>>
>>> Reading the code confused me a bit, maybe you can help me. :D
>>>
>>> virsh itself calls:
>>> cmdLxcEnterNamespace()
>>> virDomainLxcOpenNamespace()
>>> conn->driver->domainLxcOpenNamespace()
>>>
>>> Here comes the first thing that is not clear to me.
>>> conn->driver seems to be the remote driver and therefore
>>> ->domainLxcOpenNamespace is remoteDomainLxcOpenNamespace()
>>> Why is lxc:/// a remote connection?
>>>
>>> remoteDomainLxcOpenNamespace() does a rpc call to libvirtd.
>>>
>>> On the remote side libvirtd does:
>>>
>>> lxcDispatchDomainOpenNamespace(), which opens the namespace fds,
>>> and sends them back as result.
>>> How can this work? Does it do some magic file descriptor passing
>>> over AF_UNIX somewhere?
>>
>> Yes, we use SCM_RIGHTS to pass FDs.
>>
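For readers unfamiliar with SCM_RIGHTS, here is a minimal sketch of passing a descriptor over an AF_UNIX socket. It is not libvirt's RPC code; the helper names (send_fd, recv_fd, demo) are made up for illustration, and a pipe stands in for a /proc/$PID/ns/ file. The point is that the receiver does not get a "pure number": the kernel installs a new descriptor that refers to the same open file.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary payload must accompany the ancillary data.
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd() (hypothetical helper)."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    """Pass the read end of a pipe across a socketpair and read through the copy."""
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"hello")
        os.close(write_end)
        send_fd(server, read_end)      # "libvirtd" side
        received = recv_fd(client)     # "virsh" side
        try:
            # The received number differs from read_end, yet it reads the same pipe.
            return received != read_end, os.read(received, 5)
        finally:
            os.close(received)
    finally:
        os.close(read_end)
        server.close()
        client.close()
```

Calling demo() in a single process is enough to see the effect; between two real processes the same sendmsg()/recvmsg() calls apply.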
>>> virsh then receives the fds (pure numbers) and setns() fails badly.
>>>
>>> Wouldn't it make much more sense to do the open(/proc/XXX/ns/{mnt, user, ...}) and setns()
>>> calls directly on the local side? IOW directly in virsh?
>>> driver->domainLxcOpenNamespace() should only report the process id of the container's
>>> init process.
>>
>> The reason for doing it server side is to get privilege separation.
>> e.g. libvirtd runs privileged to open the fds, and virsh can run
>> unprivileged with setns(). Unfortunately it seems the kernel
>> doesn't allow for the thing calling setns() to be unprivileged
>> at this time, but the design allows for this enhancement in the
>> future.
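The open-then-setns() sequence under discussion can be sketched as follows. This is an illustration only, not what virsh or libvirtd actually do: the function names are invented, setns() is reached through ctypes on the assumption of a Linux libc, and the default namespace list is an arbitrary choice. As noted in the thread, the setns() call is expected to need CAP_SYS_ADMIN, so an unprivileged caller should see an OSError.

```python
import ctypes
import os


def ns_path(pid, name):
    """Path of a namespace file for a process, e.g. /proc/1234/ns/mnt."""
    return f"/proc/{pid}/ns/{name}"


def setns(fd, nstype=0):
    """Thin wrapper around setns(2); nstype 0 accepts any namespace type."""
    libc = ctypes.CDLL(None, use_errno=True)  # assumption: Linux libc exports setns
    if libc.setns(fd, nstype) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def enter_namespaces(pid, names=("mnt",)):
    """Open every requested namespace file first, then join them one by one."""
    # Opening everything up front avoids resolving /proc paths after the
    # mount namespace has already changed.
    fds = [os.open(ns_path(pid, name), os.O_RDONLY) for name in names]
    try:
        for fd in fds:
            setns(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

In the privilege-separated design described above, the os.open() step would happen in the privileged daemon and only the setns() step in the client, with the descriptors handed over in between.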
>
> setns() needs CAP_SYS_ADMIN and the manpage also says:

The hope is that this can be relaxed - it ought to be sufficient to
just restrict access to the /proc/$PID/ns/ files to enforce permissions,
or require CAP_SYS_ADMIN when opening the files only. I can't see any
compelling reason why you should require CAP_SYS_ADMIN on setns() itself
once you have the FDs open.

> ERRORS:
> ...
> EINVAL fd refers to a namespace whose type does not match that specified in nstype,
> or there is a problem with reassociating the thread with the specified namespace.
>
>
> I'm sure in my case setns() fails because the calling thread did not open() the ns files itself.

Do you have user namespaces enabled by chance?

> What is the plan to make lxc-enter-namespace work?
> Privilege separation is nice but as of now the kernel interface (setns()) seems not to allow this.
> Are you forcing the kernel guys to change the interface?

It has long worked fine on Fedora, though we do not have user namespaces
enabled since parts of the kernel are yet to be ported to that (XFS in
particular). My best guess is that user namespaces may have caused a
regression in this ability to call setns() from a separate process.

I can confirm that lxc-enter-namespace works fine when I disable CONFIG_USER_NS in my kernel.
Currently I'm moving my old LXC setup over to libvirt and later I'll enable user namespaces too.
Let's see what else breaks. ;-)
Stay tuned!
Thanks,
//richard