
On 07/09/2013 11:41 PM, Laine Stump wrote:
On 07/09/2013 09:17 PM, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=964358
POSIX states that multi-threaded apps should not use functions that are not async-signal-safe between fork and exec, yet we were using getpwuid_r and initgroups. Although rare, it is possible to hit deadlock in the child, when it tries to grab a mutex that was already held by another thread in the parent. I actually hit this deadlock when testing multiple domains being started in parallel with a command hook, with the following backtrace in the child:
+++ b/src/lxc/lxc_container.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2008-2012 Red Hat, Inc. + * Copyright (C) 2008-2013 Red Hat, Inc. * Copyright (C) 2008 IBM Corp. * * lxc_container.c: file description @@ -347,12 +347,17 @@ int lxcContainerWaitForContinue(int control) */ static int lxcContainerSetID(virDomainDefPtr def) { + gid_t *groups; + int ngroups; + /* Only call virSetUIDGID when user namespace is enabled * for this container. And user namespace is only enabled * when nuidmap&ngidmap is not zero */
VIR_DEBUG("Set UID/GID to 0/0"); - if (def->idmap.nuidmap && virSetUIDGID(0, 0) < 0) { + if (def->idmap.nuidmap && + ((ngroups = virGetGroupList(0, 0, &groups) < 0) || + virSetUIDGID(0, 0, groups, ngroups) < 0)) { virReportSystemError(errno, "%s", _("setuid or setgid failed")); return -1;
[1]
@@ -1904,7 +1928,7 @@ parenterror:
/* set desired uid/gid, then attempt to create the directory */
- if (virSetUIDGID(uid, gid) < 0) { + if (virSetUIDGID(uid, gid, groups, ngroups) < 0) { ret = -errno; goto childerror; }
It's too late for me to be reviewing code - I can tell because I spent several minutes being concerned that you didn't free groups in this code path. Duh. That's really all I can say to myself :-)
Only _after_ posting, did I reread my patch and your comments, and while _this_ instance doesn't need a free, I noticed that the lxc_container.c instance [1] leaks groups (at least, until exec() makes leaks irrelevant). But _that_ in turn implies that I called virGetGroupList in between clone() and exec(), which is precisely what I documented should not be done. I guess I'd better work on a followup patch that hoists the virGetGroupList to occur before the lxc clone(). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org