On Wed, Nov 13, 2013 at 02:53:05PM +0000, Daniel P. Berrange wrote:
> On Fri, Nov 08, 2013 at 02:42:26PM -0500, Rich Felker wrote:
> > On Fri, Nov 08, 2013 at 01:30:09PM +0800, Daniel P. Berrange wrote:
> > > On Thu, Nov 07, 2013 at 09:15:43PM +0800, Gao feng wrote:
> > > > I hit a problem where the container blocks in seteuid/setegid,
> > > > which are called in lxcContainerSetID, on a UP system with
> > > > libvirt compiled with --with-fuse=yes.
> > > >
> > > > I looked into the glibc code and found that setxid in glibc
> > > > calls futex() to wait for the other threads to change their
> > > > setxid_futex to 0 (see setxid_mark_thread in glibc).
> > > >
> > > > The process created by the clone system call does not share
> > > > memory with the other threads: its memory is a copy-on-write
> > > > (COW) copy of the parent's, and that remains the case until we
> > > > call execl.
> > > >
> > > > So if clone is called before the fuse thread has started running,
> > > > the update of the fuse thread's setxid_futex will never be seen
> > > > in this process, and it will block forever.
> > > >
> > > > Maybe this problem should be fixed in glibc, but I am sending
> > > > this patch as a quick fix.
> > >
> > > Can you show a stack trace of the threads/processes deadlocking?
> >
> > I think this is a symptom of setxid not being async-signal-safe as
> > it's required to be. I'm not sure if we have a bug tracker entry for
> > that; if not, one should be added. But if clone() is being used in
> > anything other than a fork-like manner, this is probably invalid
> > application usage too.
> We are not using clone() in a manner that is strictly equivalent
> to fork(). Libvirt is using clone() to create Linux containers
> with new namespaces, e.g. we do
> clone(CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWNET|SIGCHLD)
Understood. I'd still call this a fork-like manner, since it doesn't
use CLONE_VM or CLONE_THREAD, and it uses the default exit signal,
SIGCHLD. BTW, is there a reason to prefer this usage over a regular
fork() followed by unshare()?
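A minimal sketch of the fork()-then-unshare() alternative, assuming Linux with glibc. `spawn_in_new_namespaces` is a hypothetical helper, and CLONE_NEWUTS stands in for libvirt's full flag set. Most namespace flags require CAP_SYS_ADMIN, so an unprivileged run is expected to fail with EPERM (mapped to exit status 2 here).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork() an ordinary child, then have the child move
 * itself into new namespaces with unshare(). Returns the child's exit
 * status (0 = success, 2 = EPERM, 1 = other unshare() error), or -1. */
static int spawn_in_new_namespaces(int ns_flags)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child of a possibly multi-threaded parent: keep to raw syscalls
         * until exec or _exit. */
        if (unshare(ns_flags) < 0)
            _exit(errno == EPERM ? 2 : 1);
        /* Container setup and execl() of its init would go here. */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One design difference worth noting: unshare(CLONE_NEWPID) does not move the calling process itself into the new PID namespace; only the caller's subsequently created children land there, whereas clone(CLONE_NEWPID) makes the new child PID 1 directly.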
> IIUC, if a process is multi-threaded, you should restrict yourself to
> async-signal-safe functions between fork() and exec(). I assume this
> restriction applies to clone() and exec() pairings too.
> Libvirt is in fact violating the rule about only using async-signal-safe
> functions between clone() and exec() in many places. So I think what
> we need to do is avoid starting any threads in the parent until after
> we've clone()'d to create the new child namespace.
Per the specification, setuid is AS-safe. However, glibc fails to meet
this requirement (it's actually very hard to meet, because the Linux
kernel tracks uids/gids per thread rather than per process, so glibc
has to make every other thread change its ids too). So for now,
avoiding starting threads until after performing clone() is probably a
better solution than trying to eliminate calls to non-AS-safe functions.
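A sketch of that ordering (clone() first, threads afterwards), assuming Linux and glibc. `child_main`, `helper_main` and `start_container_then_threads` are hypothetical names: the helper thread stands in for the fuse thread, and the child's seteuid()/setegid() to its current ids stands in for lxcContainerSetID.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child entry point. Because no other threads existed when
 * clone() ran, glibc's setxid code has no stale thread entries to wait on.
 * Setting the ids to their current values is permitted unprivileged. */
static int child_main(void *arg)
{
    (void)arg;
    if (setegid(getegid()) < 0 || seteuid(geteuid()) < 0)
        _exit(1);
    _exit(0);
}

/* Hypothetical helper thread, standing in for the fuse thread. */
static void *helper_main(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create the child first and only then start threads in the parent.
 * Returns the child's exit status, or -1 on error. */
static int start_container_then_threads(int clone_flags)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;

    /* The stack grows downward on common architectures such as x86 and
     * ARM, so pass the top of the allocation. */
    pid_t pid = clone(child_main, stack + stack_size,
                      clone_flags | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    /* The child already exists, so starting threads now cannot leave
     * half-created thread state in its copy of our memory. */
    pthread_t helper;
    int thread_ok = pthread_create(&helper, NULL, helper_main, NULL) == 0;
    if (thread_ok)
        pthread_join(helper, NULL);

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w < 0 || !thread_ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling start_container_then_threads(0) exercises the ordering without privileges; libvirt would pass its CLONE_NEW* flags instead, which generally requires root.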
Rich