On Wed, Jul 22, 2020 at 02:14:38PM +0200, Michal Privoznik wrote:
On 7/22/20 12:46 PM, Daniel P. Berrangé wrote:
> On Wed, Jul 22, 2020 at 11:39:55AM +0200, Michal Privoznik wrote:
> > When building domain's private /dev in a namespace, libdevmapper
> > is consulted for getting full dependency tree of domain's disks.
> > The reason is that for a multipath devices all dependent devices
> > must be created in the namespace and allowed in CGroups.
> >
> > However, this approach is very fragile as building of namespace
> > happens in the forked off child process, after mass close of FDs
> > and just before dropping privileges and execing QEMU. And it so
> > happens that when calling libdevmapper APIs, one of them opens
> > /dev/mapper/control and saves the FD into a global variable. The
> > FD is kept open until the lib is unlinked or dm_lib_release() is
> > called explicitly. We are doing neither.
> >
> > This is not a problem when calling the function from libvirtd
> > (when setting up CGroups), but it is a problem when called from
> > the pre-exec hook because we leak the FD into QEMU.
> >
> > Fixes: a30078cb832646177defd256e77c632905f1e6d0
> > Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1858260
> >
> > Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> > ---
> > src/util/virdevmapper.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
> > index 40a82285f9..1c216fb6c1 100644
> > --- a/src/util/virdevmapper.c
> > +++ b/src/util/virdevmapper.c
> > @@ -156,6 +156,7 @@ virDevMapperGetTargetsImpl(const char *path,
> > virStringListFree(recursiveDevPaths);
> > virStringListFree(devPaths);
> > dm_task_destroy(dmt);
> > + dm_lib_release();
> > return ret;
> > }
>
> Hmm, this function isn't threadsafe though, so I'm kind of worried about
> us breaking parallel callers.
>
> For that matter dm_task_run isn't thread safe when it opens the
> FD in the first place either AFAICT.
>
> This libdevice-mapper.so looks like a bit of a disaster in general, as
> I can't see mutexes acquired anywhere in the public APIs and it has a
> load of static global variables including this FD :-(
>
> We could put mutex locking around our own virDevMapperGetTargetsImpl
> if we blindly assume nothing else we call may secretly be using
> libdevice-mapper too.
>
> Makes me wonder though if there's any way we can obtain the info we need
> without touching libdevice-mapper.so at all.
This is sort of resolved in the rest of the series. If we drop this patch
then after patch 17 which moves populating /dev with domain disks into
libvirtd rather than forked child, we don't need to call dm_lib_release().
But opening will still be unsafe, yes.
Yeah, I'm a little concerned that the full patch series is too big
for downstreams to be able to backport, so most likely they'll do a
targetted fix of just this patch.
Well, looks like devmapper is doing a single ioctl (if I don't
count
VERSION):
ioctl(3</dev/mapper/control>, DM_VERSION, {version=4.0.0, data_size=16384,
flags=DM_EXISTS_FLAG} => {version=4.42.0, data_size=16384,
flags=DM_EXISTS_FLAG}) = 0
ioctl(3</dev/mapper/control>, DM_TABLE_DEPS, {version=4.0.0,
data_size=16384, data_start=312, name="myusb", flags=DM_EXISTS_FLAG} =>
{version=4.42.0, data_size=328, data_start=312, dev=makedev(0xfd, 0x1),
name="myusb",
uuid="CRYPT-LUKS1-12fa3b2b646d43eb82ffd8f671515f93-myusb",
target_count=1, open_count=0, event_nr=0,
flags=DM_EXISTS_FLAG|DM_ACTIVE_PRESENT_FLAG, ...}) = 0
so maybe we can use ioctl() ourselves and drop libdevmapper completely?
Given that the kernel promises stable ABI, this should be OK, since we're
only after one very small piece of information.
The dm_task_run method seems quite simple if we exclude the bits that
are not relevant for DM_TABLE_DEPS.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|