[libvirt] [PATCH v4 00/10] Add user namespace support for libvirt lxc

This patchset try to add userns support for libvirt lxc. Since userns is nearly completed in linux-3.9, the old kernel doesn't support userns, I add some New XML elements to let people decide if enable userns.The userns is enabled only when user configure the XML. The format of user namespace related XML file like below: <idmap> <uid start='0' target='1000' count='10'> <gid start='0' target='1000' count='10'> </idmap> it means the user in container (which uid:gid is 0:0) will be mapped to the user in host (uid:gid is 1000:1000), count is used to form an u/gid range: The users in container which uid in [start, start + count -1] will be mapped. You can have multiple lines to map differnet id ranges, caution, you must make sure the root user of container has been mapped. This patchset also does the below jobs. 1, Because the uninit userns has no right to create devices, we should create devices for container on host. 2, Changes the owner of fuse and tty device. Change from v3: 1, fix some bugs that Daniel pointed out 2, reorder the patchset,introduce virLXCControllerChown first. 3, rebase Change from v2: 1, Mount tmpfs on /stateDir/domain.dev 2, Create devices under /stateDir/doamin.dev/ 3, Mount Move the /.oldroot/stateDir/doamin.dev/ on the /dev/ of container 4, Enhance the configuration, disallow the semi configuration Gao feng (10): LXC: Introduce New XML element for user namespace LXC: enable user namespace only when user set the uidmap LXC: sort the uidmap/gidmap of domain LXC: introduce virLXCControllerSetupUserns and lxcContainerSetID LXC: Creating devices for container on host side LXC: controller: change the owner of tty devices to the root user of container LXC: controller: change the owner of /dev to the root user of container LXC: controller: change the owner of devices created on host LXC: controller: change the owner of /dev/pts and ptmx to the root of container LXC: fuse: Change files owner to the root user of container docs/formatdomain.html.in | 23 +++++ docs/schemas/domaincommon.rng | 31 ++++++ src/conf/domain_conf.c | 115 +++++++++++++++++++++ src/conf/domain_conf.h | 22 ++++ src/lxc/lxc_container.c | 173 +++++++++++++++++-------------- src/lxc/lxc_controller.c | 235 ++++++++++++++++++++++++++++++++++++++++-- src/lxc/lxc_fuse.c | 4 + 7 files changed, 516 insertions(+), 87 deletions(-) -- 1.8.1.4

This patch introduces new element <idmap> for user namespace. for example <idmap> <uid start='0' target='1000' count='10'/> <gid start='0' target='1000' count='10'/> </idmap> this new element is used for setting proc files /proc/<pid>/{uid_map,gid_map}. This patch also supports multiple uid/gid elements setting in XML configuration. We don't support the semi configuation, user has to configure uid and gid both. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- docs/formatdomain.html.in | 23 +++++++++++ docs/schemas/domaincommon.rng | 31 +++++++++++++++ src/conf/domain_conf.c | 90 +++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 22 +++++++++++ 4 files changed, 166 insertions(+) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 755d084..140e098 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -285,6 +285,29 @@ </pre> + <p> + If you want to enable user namespace,set the <code>idmap</code> element. + the <code>uid</code> and <code>gid</code> elements have three attributes: + </p> + + <dl> + <dt><code>start</code></dt> + <dd>First user id in container.</dd> + <dt><code>target</code></dt> + <dd>The first user id in container will be mapped to this target user + id in host.</dd> + <dt><code>count</code></dt> + <dd>How many users in container being allowed to map to host's user.</dd> + </dl> + + <pre> + <idmap> + <uid start='0' target='1000' count='10'/> + <gid start='0' target='1000' count='10'/> + </idmap> + </pre> + + <h3><a name="elementsSysinfo">SMBIOS System Information</a></h3> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 3cace35..ac07407 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -56,6 +56,9 @@ <ref name="pm"/> </optional> <optional> + <ref name="idmap"/> + </optional> + <optional> <ref name="devices"/> </optional> <zeroOrMore> @@ -465,6 +468,34 @@ </optional> </interleave> </define> + <define name="idmap"> + <zeroOrMore> + <element name="uid"> + <attribute name="start"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="target"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="count"> + <ref name="unsignedInt"/> + </attribute> + </element> + </zeroOrMore> + <zeroOrMore> + <element name="gid"> + <attribute name="start"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="target"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="count"> + <ref name="unsignedInt"/> + </attribute> + </element> + </zeroOrMore> + </define> <!-- Resources usage defines the amount of memory (maximum and possibly current usage) and number of virtual CPUs used by that domain. diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a16ebd1..b001938 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1939,6 +1939,9 @@ void virDomainDefFree(virDomainDefPtr def) virDomainTPMDefFree(def->tpm); + VIR_FREE(def->idmap.uidmap); + VIR_FREE(def->idmap.gidmap); + VIR_FREE(def->os.type); VIR_FREE(def->os.machine); VIR_FREE(def->os.init); @@ -10196,6 +10199,40 @@ cleanup: return ret; } + +/* Parse the XML definition for user namespace id map. + * + * idmap has the form of + * + * <uid start='0' target='1000' count='10'/> + * <gid start='0' target='1000' count='10'/> + */ +static virDomainIdMapEntryPtr +virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, + const xmlNodePtr *node, + size_t num) +{ + size_t i; + virDomainIdMapEntryPtr idmap = NULL; + xmlNodePtr save_ctxt = ctxt->node; + + if (VIR_ALLOC_N(idmap, num) < 0) { + virReportOOMError(); + goto error; + } + + for (i = 0; i < num; i++) { + ctxt->node = node[i]; + virXPathUInt("string(./@start)", ctxt, &idmap[i].start); + virXPathUInt("string(./@target)", ctxt, &idmap[i].target); + virXPathUInt("string(./@count)", ctxt, &idmap[i].count); + } + error: + ctxt->node = save_ctxt; + return idmap; +} + + /* Parse the XML definition for a vcpupin or emulatorpin. * * vcpupin has the form of @@ -11971,6 +12008,38 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + /* analysis of the user namespace mapping */ + if ((n = virXPathNodeSet("./idmap/uid", ctxt, &nodes)) < 0) + goto error; + + if (n) { + def->idmap.uidmap = virDomainIdmapDefParseXML(ctxt, nodes, n); + if (!def->idmap.uidmap) + goto error; + + def->idmap.nuidmap = n; + } + VIR_FREE(nodes); + + if ((n = virXPathNodeSet("./idmap/gid", ctxt, &nodes)) < 0) + goto error; + + if (n) { + def->idmap.gidmap = virDomainIdmapDefParseXML(ctxt, nodes, n); + if (!def->idmap.gidmap) + goto error; + + def->idmap.ngidmap = n; + } + VIR_FREE(nodes); + + if ((def->idmap.uidmap && !def->idmap.gidmap) || + (!def->idmap.uidmap && def->idmap.gidmap)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("uid and gid should be mapped both")); + goto error; + } + /* analysis of cpu handling */ if ((node = virXPathNode("./cpu[1]", ctxt)) != NULL) { xmlNodePtr oldnode = ctxt->node; @@ -16150,6 +16219,27 @@ virDomainDefFormatInternal(virDomainDefPtr def, virBufferAddLit(buf, " </os>\n"); + + if (def->idmap.uidmap) { + virBufferAddLit(buf, " <idmap>\n"); + for (i = 0; i < def->idmap.nuidmap; i++) { + virBufferAsprintf(buf, + " <uid start='%u' target='%u' count='%u'/>\n", + def->idmap.uidmap[i].start, + def->idmap.uidmap[i].target, + def->idmap.uidmap[i].count); + } + for (i = 0; i < def->idmap.ngidmap; i++) { + virBufferAsprintf(buf, + " <gid start='%u' target='%u' count='%u'/>\n", + def->idmap.gidmap[i].start, + def->idmap.gidmap[i].target, + def->idmap.gidmap[i].count); + } + virBufferAddLit(buf, " </idmap>\n"); + } + + if (def->features) { virBufferAddLit(buf, " <features>\n"); for (i = 0; i < VIR_DOMAIN_FEATURE_LAST; i++) { diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3a71d6c..29a6d8a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -120,6 +120,12 @@ typedef virDomainSnapshotObjList *virDomainSnapshotObjListPtr; typedef struct _virDomainRNGDef virDomainRNGDef; typedef virDomainRNGDef *virDomainRNGDefPtr; +typedef struct _virDomainIdMapEntry virDomainIdMapEntry; +typedef virDomainIdMapEntry *virDomainIdMapEntryPtr; + +typedef struct _virDomainIdMapDef virDomainIdMapDef; +typedef virDomainIdMapDef *virDomainIdMapDefPtr; + /* Flags for the 'type' field in virDomainDeviceDef */ typedef enum { VIR_DOMAIN_DEVICE_NONE = 0, @@ -1837,6 +1843,21 @@ struct _virDomainRNGDef { virDomainDeviceInfo info; }; +struct _virDomainIdMapEntry { + unsigned int start; + unsigned int target; + unsigned int count; +}; + +struct _virDomainIdMapDef { + size_t nuidmap; + virDomainIdMapEntryPtr uidmap; + + size_t ngidmap; + virDomainIdMapEntryPtr gidmap; +}; + + void virBlkioDeviceWeightArrayClear(virBlkioDeviceWeightPtr deviceWeights, int ndevices); @@ -1899,6 +1920,7 @@ struct _virDomainDef { virNumaTuneDef numatune; virDomainResourceDefPtr resource; + virDomainIdMapDef idmap; /* These 3 are based on virDomainLifeCycleAction enum flags */ int onReboot; -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:18PM +0800, Gao feng wrote:
This patch introduces new element <idmap> for user namespace. for example <idmap> <uid start='0' target='1000' count='10'/> <gid start='0' target='1000' count='10'/> </idmap>
this new element is used for setting proc files /proc/<pid>/{uid_map,gid_map}.
This patch also supports multiple uid/gid elements setting in XML configuration.
We don't support the semi configuation, user has to configure uid and gid both.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- docs/formatdomain.html.in | 23 +++++++++++ docs/schemas/domaincommon.rng | 31 +++++++++++++++ src/conf/domain_conf.c | 90 +++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 22 +++++++++++ 4 files changed, 166 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a16ebd1..b001938 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10196,6 +10199,40 @@ cleanup: return ret; }
+ +/* Parse the XML definition for user namespace id map. + * + * idmap has the form of + * + * <uid start='0' target='1000' count='10'/> + * <gid start='0' target='1000' count='10'/> + */ +static virDomainIdMapEntryPtr +virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, + const xmlNodePtr *node, + size_t num) +{ + size_t i; + virDomainIdMapEntryPtr idmap = NULL; + xmlNodePtr save_ctxt = ctxt->node; + + if (VIR_ALLOC_N(idmap, num) < 0) { + virReportOOMError(); + goto error; + } + + for (i = 0; i < num; i++) { + ctxt->node = node[i]; + virXPathUInt("string(./@start)", ctxt, &idmap[i].start); + virXPathUInt("string(./@target)", ctxt, &idmap[i].target); + virXPathUInt("string(./@count)", ctxt, &idmap[i].count);
You need to check the return status of each virXPathUInt call to make sure it succeed & goto the error scenario on failure.
+ } + error:
s/error/cleanup/ since this code path is for both errors and success
+ ctxt->node = save_ctxt; + return idmap; +} + +
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Jun 07, 2013 at 12:38:53PM +0100, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 03:12:18PM +0800, Gao feng wrote:
This patch introduces new element <idmap> for user namespace. for example <idmap> <uid start='0' target='1000' count='10'/> <gid start='0' target='1000' count='10'/> </idmap>
this new element is used for setting proc files /proc/<pid>/{uid_map,gid_map}.
This patch also supports multiple uid/gid elements setting in XML configuration.
We don't support the semi configuation, user has to configure uid and gid both.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- docs/formatdomain.html.in | 23 +++++++++++ docs/schemas/domaincommon.rng | 31 +++++++++++++++ src/conf/domain_conf.c | 90 +++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 22 +++++++++++ 4 files changed, 166 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a16ebd1..b001938 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10196,6 +10199,40 @@ cleanup: return ret; }
+ +/* Parse the XML definition for user namespace id map. + * + * idmap has the form of + * + * <uid start='0' target='1000' count='10'/> + * <gid start='0' target='1000' count='10'/> + */ +static virDomainIdMapEntryPtr +virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, + const xmlNodePtr *node, + size_t num) +{ + size_t i; + virDomainIdMapEntryPtr idmap = NULL; + xmlNodePtr save_ctxt = ctxt->node; + + if (VIR_ALLOC_N(idmap, num) < 0) { + virReportOOMError(); + goto error; + } + + for (i = 0; i < num; i++) { + ctxt->node = node[i]; + virXPathUInt("string(./@start)", ctxt, &idmap[i].start); + virXPathUInt("string(./@target)", ctxt, &idmap[i].target); + virXPathUInt("string(./@count)", ctxt, &idmap[i].count);
You need to check the return status of each virXPathUInt call to make sure it succeed & goto the error scenario on failure.
+ } + error:
s/error/cleanup/ since this code path is for both errors and success
Since the rest of your patches look ok, don't bother re-submitting the series just for this fix. I'll test your series, and if it passes, I'll make the error reporting fix here myself before applying. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/07/2013 07:47 PM, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 12:38:53PM +0100, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 03:12:18PM +0800, Gao feng wrote:
This patch introduces new element <idmap> for user namespace. for example <idmap> <uid start='0' target='1000' count='10'/> <gid start='0' target='1000' count='10'/> </idmap>
this new element is used for setting proc files /proc/<pid>/{uid_map,gid_map}.
This patch also supports multiple uid/gid elements setting in XML configuration.
We don't support the semi configuation, user has to configure uid and gid both.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- docs/formatdomain.html.in | 23 +++++++++++ docs/schemas/domaincommon.rng | 31 +++++++++++++++ src/conf/domain_conf.c | 90 +++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 22 +++++++++++ 4 files changed, 166 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a16ebd1..b001938 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10196,6 +10199,40 @@ cleanup: return ret; }
+ +/* Parse the XML definition for user namespace id map. + * + * idmap has the form of + * + * <uid start='0' target='1000' count='10'/> + * <gid start='0' target='1000' count='10'/> + */ +static virDomainIdMapEntryPtr +virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, + const xmlNodePtr *node, + size_t num) +{ + size_t i; + virDomainIdMapEntryPtr idmap = NULL; + xmlNodePtr save_ctxt = ctxt->node; + + if (VIR_ALLOC_N(idmap, num) < 0) { + virReportOOMError(); + goto error; + } + + for (i = 0; i < num; i++) { + ctxt->node = node[i]; + virXPathUInt("string(./@start)", ctxt, &idmap[i].start); + virXPathUInt("string(./@target)", ctxt, &idmap[i].target); + virXPathUInt("string(./@count)", ctxt, &idmap[i].count);
You need to check the return status of each virXPathUInt call to make sure it succeed & goto the error scenario on failure.
+ } + error:
s/error/cleanup/ since this code path is for both errors and success
Since the rest of your patches look ok, don't bother re-submitting the series just for this fix. I'll test your series, and if it passes, I'll make the error reporting fix here myself before applying.
Thanks for your help :)

On 06/07/2013 07:47 PM, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 12:38:53PM +0100, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 03:12:18PM +0800, Gao feng wrote:
This patch introduces new element <idmap> for user namespace. for example <idmap> <uid start='0' target='1000' count='10'/> <gid start='0' target='1000' count='10'/> </idmap>
this new element is used for setting proc files /proc/<pid>/{uid_map,gid_map}.
This patch also supports multiple uid/gid elements setting in XML configuration.
We don't support the semi configuation, user has to configure uid and gid both.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- docs/formatdomain.html.in | 23 +++++++++++ docs/schemas/domaincommon.rng | 31 +++++++++++++++ src/conf/domain_conf.c | 90 +++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 22 +++++++++++ 4 files changed, 166 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a16ebd1..b001938 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10196,6 +10199,40 @@ cleanup: return ret; }
+ +/* Parse the XML definition for user namespace id map. + * + * idmap has the form of + * + * <uid start='0' target='1000' count='10'/> + * <gid start='0' target='1000' count='10'/> + */ +static virDomainIdMapEntryPtr +virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, + const xmlNodePtr *node, + size_t num) +{ + size_t i; + virDomainIdMapEntryPtr idmap = NULL; + xmlNodePtr save_ctxt = ctxt->node; + + if (VIR_ALLOC_N(idmap, num) < 0) { + virReportOOMError(); + goto error; + } + + for (i = 0; i < num; i++) { + ctxt->node = node[i]; + virXPathUInt("string(./@start)", ctxt, &idmap[i].start); + virXPathUInt("string(./@target)", ctxt, &idmap[i].target); + virXPathUInt("string(./@count)", ctxt, &idmap[i].count);
You need to check the return status of each virXPathUInt call to make sure it succeed & goto the error scenario on failure.
+ } + error:
s/error/cleanup/ since this code path is for both errors and success
Since the rest of your patches look ok, don't bother re-submitting the series just for this fix. I'll test your series, and if it passes, I'll make the error reporting fix here myself before applying.
Ping...
Daniel

User namespace will be enabled only when the idmap exist in configuration. If you want disable user namespace,just remove these elements from XML. If kernel doesn't support user namespace and idmap exist in configuration file, libvirt lxc will start failed and return "Kernel doesn't support user namespace" message. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 181f6c8..5d4da73 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -2018,14 +2018,12 @@ cleanup: static int userns_supported(void) { -#if 1 - /* - * put off using userns until uid mapping is implemented - */ - return 0; -#else return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0; -#endif +} + +static int userns_required(virDomainDefPtr def) +{ + return def->idmap.uidmap && def->idmap.gidmap; } virArch lxcContainerGetAlt32bitArch(virArch arch) @@ -2105,9 +2103,15 @@ int lxcContainerStart(virDomainDefPtr def, cflags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|SIGCHLD; - if (userns_supported()) { - VIR_DEBUG("Enable user namespaces"); - cflags |= CLONE_NEWUSER; + if (userns_required(def)) { + if (userns_supported()) { + VIR_DEBUG("Enable user namespace"); + cflags |= CLONE_NEWUSER; + } else { + virReportSystemError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Kernel doesn't support user namespace")); + return -1; + } } if (lxcNeedNetworkNamespace(def)) { -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:19PM +0800, Gao feng wrote:
User namespace will be enabled only when the idmap exist in configuration.
If you want disable user namespace,just remove these elements from XML.
If kernel doesn't support user namespace and idmap exist in configuration file, libvirt lxc will start failed and return "Kernel doesn't support user namespace" message.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 181f6c8..5d4da73 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -2018,14 +2018,12 @@ cleanup:
static int userns_supported(void) { -#if 1 - /* - * put off using userns until uid mapping is implemented - */ - return 0; -#else return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0; -#endif +} + +static int userns_required(virDomainDefPtr def) +{ + return def->idmap.uidmap && def->idmap.gidmap; }
virArch lxcContainerGetAlt32bitArch(virArch arch) @@ -2105,9 +2103,15 @@ int lxcContainerStart(virDomainDefPtr def,
cflags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|SIGCHLD;
- if (userns_supported()) { - VIR_DEBUG("Enable user namespaces"); - cflags |= CLONE_NEWUSER; + if (userns_required(def)) { + if (userns_supported()) { + VIR_DEBUG("Enable user namespace"); + cflags |= CLONE_NEWUSER; + } else { + virReportSystemError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Kernel doesn't support user namespace")); + return -1; + } }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Make sure the mapping line contains the root user of container is the first element of idmap array. So we can get the real user id on host for the container easily. This patch also check the map information, User must map the root user of container to any user of host. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/conf/domain_conf.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b001938..c4bb05e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10200,6 +10200,19 @@ cleanup: } +static int virDomainIdMapEntrySort(const void *a, const void *b) +{ + const virDomainIdMapEntryPtr entrya = (const virDomainIdMapEntryPtr) a; + const virDomainIdMapEntryPtr entryb = (const virDomainIdMapEntryPtr) b; + + if (entrya->start > entryb->start) + return 1; + else if (entrya->start < entryb->start) + return -1; + else + return 0; +} + /* Parse the XML definition for user namespace id map. * * idmap has the form of @@ -10227,6 +10240,18 @@ virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, virXPathUInt("string(./@target)", ctxt, &idmap[i].target); virXPathUInt("string(./@count)", ctxt, &idmap[i].count); } + + qsort(idmap, num, sizeof(idmap[0]), virDomainIdMapEntrySort); + + if (idmap[0].start != 0) { + /* Root user of container hasn't been mapped to any user of host, + * return error. */ + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("You must map the root user of container")); + VIR_FREE(idmap); + idmap = NULL; + } + error: ctxt->node = save_ctxt; return idmap; -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:20PM +0800, Gao feng wrote:
Make sure the mapping line contains the root user of container is the first element of idmap array. So we can get the real user id on host for the container easily.
This patch also check the map information, User must map the root user of container to any user of host.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/conf/domain_conf.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b001938..c4bb05e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10200,6 +10200,19 @@ cleanup: }
+static int virDomainIdMapEntrySort(const void *a, const void *b) +{ + const virDomainIdMapEntryPtr entrya = (const virDomainIdMapEntryPtr) a; + const virDomainIdMapEntryPtr entryb = (const virDomainIdMapEntryPtr) b; + + if (entrya->start > entryb->start) + return 1; + else if (entrya->start < entryb->start) + return -1; + else + return 0; +} + /* Parse the XML definition for user namespace id map. * * idmap has the form of @@ -10227,6 +10240,18 @@ virDomainIdmapDefParseXML(xmlXPathContextPtr ctxt, virXPathUInt("string(./@target)", ctxt, &idmap[i].target); virXPathUInt("string(./@count)", ctxt, &idmap[i].count); } + + qsort(idmap, num, sizeof(idmap[0]), virDomainIdMapEntrySort); + + if (idmap[0].start != 0) { + /* Root user of container hasn't been mapped to any user of host, + * return error. */ + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("You must map the root user of container")); + VIR_FREE(idmap); + idmap = NULL; + } + error: ctxt->node = save_ctxt; return idmap;
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch introduces new helper function virLXCControllerSetupUserns, in this function, we set the files uid_map and gid_map of the init task of container. lxcContainerSetID is used for creating cred for tasks running in container. Since after setuid/setgid, we may be a new user. This patch calls lxcContainerSetUserns at first to make sure the new created files belong to right user. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 63 +++++++++++++++++++++++++++++------------ src/lxc/lxc_controller.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 119 insertions(+), 18 deletions(-) diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 5d4da73..4b782bb 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -335,6 +335,30 @@ int lxcContainerWaitForContinue(int control) /** + * lxcContainerSetID: + * + * This function calls setuid and setgid to create proper + * cred for tasks running in container. + * + * Returns 0 on success or -1 in case of error + */ +static int lxcContainerSetID(virDomainDefPtr def) +{ + /* Only call virSetUIDGID when user namespace is enabled + * for this container. And user namespace is only enabled + * when nuidmap&ngidmap is not zero */ + + if (def->idmap.nuidmap && virSetUIDGID(0, 0) < 0) { + virReportSystemError(errno, "%s", + _("setuid or setgid failed")); + return -1; + } + + return 0; +} + + +/** * lxcContainerRenameAndEnableInterfaces: * @nveths: number of interfaces * @veths: interface names @@ -1926,6 +1950,27 @@ static int lxcContainerChild(void *data) cmd = lxcContainerBuildInitCmd(vmDef); virCommandWriteArgLog(cmd, 1); + if (lxcContainerResolveSymlinks(vmDef) < 0) + goto cleanup; + + if (!virFileExists(vmDef->os.init)) { + virReportSystemError(errno, + _("cannot find init path '%s' relative to container root"), + vmDef->os.init); + goto cleanup; + } + + /* Wait for interface devices to show up */ + if (lxcContainerWaitForContinue(argv->monitor) < 0) { + virReportSystemError(errno, "%s", + _("Failed to read the container continue message")); + goto cleanup; + } + VIR_DEBUG("Received container continue message"); + + if (lxcContainerSetID(vmDef) < 0) + goto cleanup; + root = virDomainGetRootFilesystem(vmDef); if (argv->nttyPaths) { @@ -1951,29 +1996,11 @@ static int lxcContainerChild(void *data) goto cleanup; } - if (lxcContainerResolveSymlinks(vmDef) < 0) - goto cleanup; - if (lxcContainerSetupPivotRoot(vmDef, root, argv->ttyPaths, argv->nttyPaths, argv->securityDriver) < 0) goto cleanup; - if (!virFileExists(vmDef->os.init)) { - virReportSystemError(errno, - _("cannot find init path '%s' relative to container root"), - vmDef->os.init); - goto cleanup; - } - - /* Wait for interface devices to show up */ - if (lxcContainerWaitForContinue(argv->monitor) < 0) { - virReportSystemError(errno, "%s", - _("Failed to read the container continue message")); - goto cleanup; - } - VIR_DEBUG("Received container continue message"); - /* rename and enable interfaces */ if (lxcContainerRenameAndEnableInterfaces(!!(vmDef->features & (1 << VIR_DOMAIN_FEATURE_PRIVNET)), diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index dfe686b..ef41efb 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1122,6 +1122,77 @@ cleanup2: } +static int +virLXCControllerSetupUsernsMap(virDomainIdMapEntryPtr map, + int num, + char *path) +{ + virBuffer map_value = VIR_BUFFER_INITIALIZER; + int i, ret = -1; + + for (i = 0; i < num; i++) + virBufferAsprintf(&map_value, "%u %u %u\n", + map[i].start, map[i].target, map[i].count); + + if (virBufferError(&map_value)) + goto no_memory; + + if (virFileWriteStr(path, virBufferCurrentContent(&map_value), 0) < 0) { + virReportSystemError(errno, _("unable write to %s"), path); + goto cleanup; + } + + ret = 0; +cleanup: + virBufferFreeAndReset(&map_value); + return ret; + +no_memory: + virReportOOMError(); + goto cleanup; +} + +/** + * virLXCControllerSetupUserns + * + * Set proc files for user namespace + * + * Returns 0 on success or -1 in case of error + */ +static int virLXCControllerSetupUserns(virLXCControllerPtr ctrl) +{ + char *uid_map = NULL; + char *gid_map = NULL; + int ret = -1; + + /* User namespace is disabled for container */ + if (ctrl->def->idmap.nuidmap == 0) + return 0; + + if (virAsprintf(&uid_map, "/proc/%d/uid_map", ctrl->initpid) < 0) + goto cleanup; + + if (virLXCControllerSetupUsernsMap(ctrl->def->idmap.uidmap, + ctrl->def->idmap.nuidmap, + uid_map) < 0) + goto cleanup; + + if (virAsprintf(&gid_map, "/proc/%d/gid_map", ctrl->initpid) < 0) + goto cleanup; + + if (virLXCControllerSetupUsernsMap(ctrl->def->idmap.gidmap, + ctrl->def->idmap.ngidmap, + gid_map) < 0) + goto cleanup; + + ret = 0; +cleanup: + VIR_FREE(uid_map); + VIR_FREE(gid_map); + return ret; +} + + /** * virLXCControllerMoveInterfaces @@ -1544,6 +1615,9 @@ virLXCControllerRun(virLXCControllerPtr ctrl) VIR_FORCE_CLOSE(control[1]); VIR_FORCE_CLOSE(containerhandshake[1]); + if (virLXCControllerSetupUserns(ctrl) < 0) + goto cleanup; + if (virLXCControllerMoveInterfaces(ctrl) < 0) goto cleanup; -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:21PM +0800, Gao feng wrote:
This patch introduces new helper function virLXCControllerSetupUserns, in this function, we set the files uid_map and gid_map of the init task of container.
lxcContainerSetID is used for creating cred for tasks running in container. Since after setuid/setgid, we may be a new user. This patch calls lxcContainerSetUserns at first to make sure the new created files belong to right user.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 63 +++++++++++++++++++++++++++++------------ src/lxc/lxc_controller.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 119 insertions(+), 18 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

user namespace doesn't allow to create devices in uninit userns. We should create devices on host side. We first mount tmpfs on dev directroy under state dir of container. then create devices under this dev dir. Finally in container, mount the dev directroy created on host to the /dev/ directroy of container. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 86 ++++++++++++++++------------------------ src/lxc/lxc_controller.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 52 deletions(-) diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 4b782bb..958e20d 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -683,7 +683,7 @@ err: } -static int lxcContainerMountBasicFS(char *sec_mount_options) +static int lxcContainerMountBasicFS(void) { const struct { const char *src; @@ -709,9 +709,8 @@ static int lxcContainerMountBasicFS(char *sec_mount_options) #endif }; int i, rc = -1; - char *opts = NULL; - VIR_DEBUG("Mounting basic filesystems sec_mount_options=%s", sec_mount_options); + VIR_DEBUG("Mounting basic filesystems"); for (i = 0; i < ARRAY_CARDINALITY(mnts); i++) { const char *srcpath = NULL; @@ -750,31 +749,10 @@ static int lxcContainerMountBasicFS(char *sec_mount_options) } } - /* - * tmpfs is limited to 64kb, since we only have device nodes in there - * and don't want to DOS the entire OS RAM usage - */ - - if (virAsprintf(&opts, - "mode=755,size=65536%s", sec_mount_options) < 0) { - virReportOOMError(); - goto cleanup; - } - - VIR_DEBUG("Mount devfs on /dev type=tmpfs flags=%x, opts=%s", - MS_NOSUID, opts); - if (mount("devfs", "/dev", "tmpfs", MS_NOSUID, opts) < 0) { - virReportSystemError(errno, - _("Failed to mount %s on %s type %s (%s)"), - "devfs", "/dev", "tmpfs", opts); - goto cleanup; - } - rc = 0; cleanup: VIR_DEBUG("rc=%d", rc); - VIR_FREE(opts); return rc; } @@ -811,6 +789,30 @@ static int lxcContainerMountProcFuse(virDomainDefPtr def ATTRIBUTE_UNUSED, } #endif +static int lxcContainerMountFSDev(virDomainDefPtr def, + const char *stateDir) +{ + int ret; + char *path = NULL; + + VIR_DEBUG("Mount /dev/ stateDir=%s", stateDir); + + if ((ret = virAsprintf(&path, "/.oldroot/%s/%s.dev", + stateDir, def->name)) < 0) + return ret; + + VIR_DEBUG("Tring to move %s to /dev", path); + + if ((ret = mount(path, "/dev", NULL, MS_MOVE, NULL)) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + path); + } + + VIR_FREE(path); + return ret; +} + static int lxcContainerMountFSDevPTS(virDomainDefPtr def, const char *stateDir) { @@ -847,22 +849,10 @@ cleanup: return ret; } -static int lxcContainerPopulateDevices(char **ttyPaths, size_t nttyPaths) +static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) { size_t i; const struct { - int maj; - int min; - mode_t mode; - const char *path; - } devs[] = { - { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_NULL, 0666, "/dev/null" }, - { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_ZERO, 0666, "/dev/zero" }, - { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_FULL, 0666, "/dev/full" }, - { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_RANDOM, 0666, "/dev/random" }, - { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_URANDOM, 0666, "/dev/urandom" }, - }; - const struct { const char *src; const char *dst; } links[] = { @@ -872,18 +862,6 @@ static int lxcContainerPopulateDevices(char **ttyPaths, size_t nttyPaths) { "/proc/self/fd", "/dev/fd" }, }; - /* Populate /dev/ with a few important bits */ - for (i = 0; i < ARRAY_CARDINALITY(devs); i++) { - dev_t dev = makedev(devs[i].maj, devs[i].min); - if (mknod(devs[i].path, S_IFCHR, dev) < 0 || - chmod(devs[i].path, devs[i].mode)) { - virReportSystemError(errno, - _("Failed to make device %s"), - devs[i].path); - return -1; - } - } - for (i = 0; i < ARRAY_CARDINALITY(links); i++) { if (symlink(links[i].src, links[i].dst) < 0) { virReportSystemError(errno, @@ -1802,7 +1780,7 @@ static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef, goto cleanup; /* Mounts the core /proc, /sys, etc filesystems */ - if (lxcContainerMountBasicFS(sec_mount_options) < 0) + if (lxcContainerMountBasicFS() < 0) goto cleanup; /* Mounts /proc/meminfo etc sysinfo */ @@ -1814,12 +1792,16 @@ static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef, if (virCgroupIsolateMount(cgroup, "/.oldroot/", sec_mount_options) < 0) goto cleanup; + /* Mounts /dev */ + if (lxcContainerMountFSDev(vmDef, stateDir) < 0) + goto cleanup; + /* Mounts /dev/pts */ if (lxcContainerMountFSDevPTS(vmDef, stateDir) < 0) goto cleanup; - /* Populates device nodes in /dev/ */ - if (lxcContainerPopulateDevices(ttyPaths, nttyPaths) < 0) + /* Setup device nodes in /dev/ */ + if (lxcContainerSetupDevices(ttyPaths, nttyPaths) < 0) goto cleanup; /* Sets up any non-root mounts from guest config */ diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index ef41efb..75c7a85 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1192,6 +1192,103 @@ cleanup: return ret; } +static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) +{ + char *mount_options = NULL; + char *opts = NULL; + char *dev = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for container"); + + mount_options = virSecurityManagerGetMountOptions(ctrl->securityManager, + ctrl->def); + + if (virAsprintf(&dev, "/%s/%s.dev", + LXC_STATE_DIR, ctrl->def->name) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (virFileMakePath(dev) < 0) { + virReportSystemError(errno, + _("Failed to make path %s"), dev); + goto cleanup; + } + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) { + virReportOOMError(); + goto cleanup; + } + + VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%x, opts=%s", + dev, MS_NOSUID, opts); + if (mount("devfs", dev, "tmpfs", MS_NOSUID, opts) < 0) { + virReportSystemError(errno, + _("Failed to mount devfs on %s type %s (%s)"), + dev, "tmpfs", opts); + goto cleanup; + } + + ret = 0; +cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + VIR_FREE(dev); + return ret; +} + +static int virLXCControllerPopulateDevices(virLXCControllerPtr ctrl) +{ + size_t i; + int ret = -1; + char *path = NULL; + const struct { + int maj; + int min; + mode_t mode; + const char *path; + } devs[] = { + { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_NULL, 0666, "/null" }, + { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_ZERO, 0666, "/zero" }, + { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_FULL, 0666, "/full" }, + { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_RANDOM, 0666, "/random" }, + { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_URANDOM, 0666, "/urandom" }, + }; + + if (virLXCControllerSetupDev(ctrl) < 0) + goto cleanup; + + /* Populate /dev/ with a few important bits */ + for (i = 0; i < ARRAY_CARDINALITY(devs); i++) { + if (virAsprintf(&path, "/%s/%s.dev/%s", + LXC_STATE_DIR, ctrl->def->name, devs[i].path) < 0) { + virReportOOMError(); + goto cleanup; + } + + dev_t dev = makedev(devs[i].maj, devs[i].min); + if (mknod(path, S_IFCHR, dev) < 0 || + chmod(path, devs[i].mode)) { + virReportSystemError(errno, + _("Failed to make device %s"), + path); + goto cleanup; + } + VIR_FREE(path); + } + + ret = 0; +cleanup: + VIR_FREE(path); + return ret; +} /** @@ -1594,6 +1691,9 @@ virLXCControllerRun(virLXCControllerPtr ctrl) if (virLXCControllerSetupDevPTS(ctrl) < 0) goto cleanup; + if (virLXCControllerPopulateDevices(ctrl) < 0) + goto cleanup; + if (virLXCControllerSetupFuse(ctrl) < 0) goto cleanup; -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:22PM +0800, Gao feng wrote:
user namespace doesn't allow to create devices in uninit userns. We should create devices on host side.
We first mount tmpfs on dev directroy under state dir of container. then create devices under this dev dir.
Finally in container, mount the dev directroy created on host to the /dev/ directroy of container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 86 ++++++++++++++++------------------------ src/lxc/lxc_controller.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 52 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Since these tty devices will be used by container, the owner of them should be the root user of container. This patch also adds a new function virLXCControllerChown, we can use this general function to change the owner of files. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 50 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 43 insertions(+), 7 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 75c7a85..66eae16 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1121,6 +1121,27 @@ cleanup2: return rc; } +static int +virLXCControllerChown(virLXCControllerPtr ctrl, char *path) +{ + uid_t uid; + gid_t gid; + + if (!ctrl->def->idmap.uidmap) + return 0; + + uid = ctrl->def->idmap.uidmap[0].target; + gid = ctrl->def->idmap.gidmap[0].target; + + if (chown(path, uid, gid) < 0) { + virReportSystemError(errno, + _("Failed to change owner of %s to %u:%u"), + path, uid, gid); + return -1; + } + + return 0; +} static int virLXCControllerSetupUsernsMap(virDomainIdMapEntryPtr map, @@ -1383,13 +1404,14 @@ static int lxcSetPersonality(virDomainDefPtr def) * *TTYNAME. Heavily borrowed from glibc, but doesn't require that * devpts == "/dev/pts" */ static int -lxcCreateTty(char *ptmx, int *ttymaster, char **ttyName) +lxcCreateTty(virLXCControllerPtr ctrl, int *ttymaster, + char **ttyName, char **ttyHostPath) { int ret = -1; int ptyno; int unlock = 0; - if ((*ttymaster = open(ptmx, O_RDWR|O_NOCTTY|O_NONBLOCK)) < 0) + if ((*ttymaster = open(ctrl->devptmx, O_RDWR|O_NOCTTY|O_NONBLOCK)) < 0) goto cleanup; if (ioctl(*ttymaster, TIOCSPTLCK, &unlock) < 0) @@ -1404,7 +1426,9 @@ lxcCreateTty(char *ptmx, int *ttymaster, char **ttyName) * while glibc has to fstat(), fchmod(), and fchown() for older * kernels, we can skip those steps. ptyno shouldn't currently be * anything other than 0, but let's play it safe. */ - if (virAsprintf(ttyName, "/dev/pts/%d", ptyno) < 0) { + if ((virAsprintf(ttyName, "/dev/pts/%d", ptyno) < 0) || + (virAsprintf(ttyHostPath, "/%s/%s.devpts/%d", LXC_STATE_DIR, + ctrl->def->name, ptyno) < 0)) { virReportOOMError(); errno = ENOMEM; goto cleanup; @@ -1538,18 +1562,30 @@ virLXCControllerSetupConsoles(virLXCControllerPtr ctrl, char **containerTTYPaths) { size_t i; + int ret = -1; + char *ttyHostPath = NULL; for (i = 0; i < ctrl->nconsoles; i++) { VIR_DEBUG("Opening tty on private %s", ctrl->devptmx); - if (lxcCreateTty(ctrl->devptmx, + if (lxcCreateTty(ctrl, &ctrl->consoles[i].contFd, - &containerTTYPaths[i]) < 0) { + &containerTTYPaths[i], &ttyHostPath) < 0) { virReportSystemError(errno, "%s", _("Failed to allocate tty")); - return -1; + goto cleanup; } + + /* Change the owner of tty device to the root user of container */ + if (virLXCControllerChown(ctrl, ttyHostPath) < 0) + goto cleanup; + + VIR_FREE(ttyHostPath); } - return 0; + + ret = 0; +cleanup: + VIR_FREE(ttyHostPath); + return ret; } -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:23PM +0800, Gao feng wrote:
Since these tty devices will be used by container, the owner of them should be the root user of container.
This patch also adds a new function virLXCControllerChown, we can use this general function to change the owner of files.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 50 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 43 insertions(+), 7 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

container will create /dev/pts directory in /dev. the owner of /dev should be the root user of container. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 66eae16..f6622bf 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1257,6 +1257,9 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) goto cleanup; } + if (virLXCControllerChown(ctrl, dev) < 0) + goto cleanup; + ret = 0; cleanup: VIR_FREE(opts); -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:24PM +0800, Gao feng wrote:
container will create /dev/pts directory in /dev. the owner of /dev should be the root user of container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 66eae16..f6622bf 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1257,6 +1257,9 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) goto cleanup; }
+ if (virLXCControllerChown(ctrl, dev) < 0) + goto cleanup; + ret = 0; cleanup: VIR_FREE(opts);
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Since these devices are created for the container. the owner should be the root user of the container. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index f6622bf..dd85235 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1305,6 +1305,10 @@ static int virLXCControllerPopulateDevices(virLXCControllerPtr ctrl) path); goto cleanup; } + + if (virLXCControllerChown(ctrl, path) < 0) + goto cleanup; + VIR_FREE(path); } -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:25PM +0800, Gao feng wrote:
Since these devices are created for the container. the owner should be the root user of the container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index f6622bf..dd85235 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1305,6 +1305,10 @@ static int virLXCControllerPopulateDevices(virLXCControllerPtr ctrl) path); goto cleanup; } + + if (virLXCControllerChown(ctrl, path) < 0) + goto cleanup; + VIR_FREE(path); }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

These files are created for container, the owner should be the root user of container. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index dd85235..d12c4c2 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1548,6 +1548,10 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl) goto cleanup; } + if ((virLXCControllerChown(ctrl, ctrl->devptmx) < 0) || + (virLXCControllerChown(ctrl, devpts) < 0)) + goto cleanup; + ret = 0; cleanup: -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:26PM +0800, Gao feng wrote:
These files are created for container, the owner should be the root user of container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_controller.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index dd85235..d12c4c2 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1548,6 +1548,10 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl) goto cleanup; }
+ if ((virLXCControllerChown(ctrl, ctrl->devptmx) < 0) || + (virLXCControllerChown(ctrl, devpts) < 0)) + goto cleanup; + ret = 0;
cleanup:
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The owner of the /proc/meminfo in container should be the root user of container. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_fuse.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/lxc/lxc_fuse.c b/src/lxc/lxc_fuse.c index 32886cd..b98a0d9 100644 --- a/src/lxc/lxc_fuse.c +++ b/src/lxc/lxc_fuse.c @@ -48,6 +48,8 @@ static int lxcProcGetattr(const char *path, struct stat *stbuf) int res; char *mempath = NULL; struct stat sb; + struct fuse_context *context = fuse_get_context(); + virDomainDefPtr def = (virDomainDefPtr)context->private_data; memset(stbuf, 0, sizeof(struct stat)); if (virAsprintf(&mempath, "/proc/%s", path) < 0) { @@ -66,6 +68,8 @@ static int lxcProcGetattr(const char *path, struct stat *stbuf) goto cleanup; } + stbuf->st_uid = def->idmap.uidmap ? def->idmap.uidmap[0].target : 0; + stbuf->st_gid = def->idmap.gidmap ? def->idmap.gidmap[0].target : 0; stbuf->st_mode = sb.st_mode; stbuf->st_nlink = 1; stbuf->st_blksize = sb.st_blksize; -- 1.8.1.4

On Fri, Jun 07, 2013 at 03:12:27PM +0800, Gao feng wrote:
The owner of the /proc/meminfo in container should be the root user of container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_fuse.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/lxc/lxc_fuse.c b/src/lxc/lxc_fuse.c index 32886cd..b98a0d9 100644 --- a/src/lxc/lxc_fuse.c +++ b/src/lxc/lxc_fuse.c @@ -48,6 +48,8 @@ static int lxcProcGetattr(const char *path, struct stat *stbuf) int res; char *mempath = NULL; struct stat sb; + struct fuse_context *context = fuse_get_context(); + virDomainDefPtr def = (virDomainDefPtr)context->private_data;
memset(stbuf, 0, sizeof(struct stat)); if (virAsprintf(&mempath, "/proc/%s", path) < 0) { @@ -66,6 +68,8 @@ static int lxcProcGetattr(const char *path, struct stat *stbuf) goto cleanup; }
+ stbuf->st_uid = def->idmap.uidmap ? def->idmap.uidmap[0].target : 0; + stbuf->st_gid = def->idmap.gidmap ? def->idmap.gidmap[0].target : 0; stbuf->st_mode = sb.st_mode; stbuf->st_nlink = 1; stbuf->st_blksize = sb.st_blksize;
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Jun 07, 2013 at 03:12:17PM +0800, Gao feng wrote:
This patchset try to add userns support for libvirt lxc. Since userns is nearly completed in linux-3.9, the old kernel doesn't support userns, I add some New XML elements to let people decide if enable userns.The userns is enabled only when user configure the XML.
The format of user namespace related XML file like below: <idmap> <uid start='0' target='1000' count='10'> <gid start='0' target='1000' count='10'> </idmap> it means the user in container (which uid:gid is 0:0) will be mapped to the user in host (uid:gid is 1000:1000), count is used to form an u/gid range: The users in container which uid in [start, start + count -1] will be mapped.
You can have multiple lines to map differnet id ranges, caution, you must make sure the root user of container has been mapped.
This patchset also does the below jobs.
1, Because the uninit userns has no right to create devices, we should create devices for container on host. 2, Changes the owner of fuse and tty device.
Change from v3: 1, fix some bugs that Daniel pointed out 2, reorder the patchset,introduce virLXCControllerChown first. 3, rebase
Change from v2: 1, Mount tmpfs on /stateDir/domain.dev 2, Create devices under /stateDir/doamin.dev/ 3, Mount Move the /.oldroot/stateDir/doamin.dev/ on the /dev/ of container 4, Enhance the configuration, disallow the semi configuration
Gao feng (10): LXC: Introduce New XML element for user namespace LXC: enable user namespace only when user set the uidmap LXC: sort the uidmap/gidmap of domain LXC: introduce virLXCControllerSetupUserns and lxcContainerSetID LXC: Creating devices for container on host side LXC: controller: change the owner of tty devices to the root user of container LXC: controller: change the owner of /dev to the root user of container LXC: controller: change the owner of devices created on host LXC: controller: change the owner of /dev/pts and ptmx to the root of container LXC: fuse: Change files owner to the root user of container
docs/formatdomain.html.in | 23 +++++ docs/schemas/domaincommon.rng | 31 ++++++ src/conf/domain_conf.c | 115 +++++++++++++++++++++ src/conf/domain_conf.h | 22 ++++ src/lxc/lxc_container.c | 173 +++++++++++++++++-------------- src/lxc/lxc_controller.c | 235 ++++++++++++++++++++++++++++++++++++++++-- src/lxc/lxc_fuse.c | 4 + 7 files changed, 516 insertions(+), 87 deletions(-)
FYI, this patchset is now merged. I fixed up error reporting in patch 1, and I made a slight change to patch #4, to avoid needlessly moving some functions. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/02/2013 06:24 PM, Daniel P. Berrange wrote:
On Fri, Jun 07, 2013 at 03:12:17PM +0800, Gao feng wrote:
This patchset try to add userns support for libvirt lxc. Since userns is nearly completed in linux-3.9, the old kernel doesn't support userns, I add some New XML elements to let people decide if enable userns.The userns is enabled only when user configure the XML.
The format of user namespace related XML file like below: <idmap> <uid start='0' target='1000' count='10'> <gid start='0' target='1000' count='10'> </idmap> it means the user in container (which uid:gid is 0:0) will be mapped to the user in host (uid:gid is 1000:1000), count is used to form an u/gid range: The users in container which uid in [start, start + count -1] will be mapped.
You can have multiple lines to map differnet id ranges, caution, you must make sure the root user of container has been mapped.
This patchset also does the below jobs.
1, Because the uninit userns has no right to create devices, we should create devices for container on host. 2, Changes the owner of fuse and tty device.
Change from v3: 1, fix some bugs that Daniel pointed out 2, reorder the patchset,introduce virLXCControllerChown first. 3, rebase
Change from v2: 1, Mount tmpfs on /stateDir/domain.dev 2, Create devices under /stateDir/doamin.dev/ 3, Mount Move the /.oldroot/stateDir/doamin.dev/ on the /dev/ of container 4, Enhance the configuration, disallow the semi configuration
Gao feng (10): LXC: Introduce New XML element for user namespace LXC: enable user namespace only when user set the uidmap LXC: sort the uidmap/gidmap of domain LXC: introduce virLXCControllerSetupUserns and lxcContainerSetID LXC: Creating devices for container on host side LXC: controller: change the owner of tty devices to the root user of container LXC: controller: change the owner of /dev to the root user of container LXC: controller: change the owner of devices created on host LXC: controller: change the owner of /dev/pts and ptmx to the root of container LXC: fuse: Change files owner to the root user of container
docs/formatdomain.html.in | 23 +++++ docs/schemas/domaincommon.rng | 31 ++++++ src/conf/domain_conf.c | 115 +++++++++++++++++++++ src/conf/domain_conf.h | 22 ++++ src/lxc/lxc_container.c | 173 +++++++++++++++++-------------- src/lxc/lxc_controller.c | 235 ++++++++++++++++++++++++++++++++++++++++-- src/lxc/lxc_fuse.c | 4 + 7 files changed, 516 insertions(+), 87 deletions(-)
FYI, this patchset is now merged.
I fixed up error reporting in patch 1, and I made a slight change to patch #4, to avoid needlessly moving some functions.
cool! thanks you guys!
participants (2)
-
Daniel P. Berrange
-
Gao feng