Re: [libvirt] [PATCH v4 07/14] qemu_cgroup: Allow /dev/mapper/control for PR

17 Apr 2018

On 04/16/2018 10:56 AM, Michal Privoznik wrote:
...
On 04/14/2018 02:55 PM, John Ferlan wrote:
...
On 04/10/2018 10:58 AM, Michal Privoznik wrote:
...
Just like in previous commit, qemu-pr-helper might want to open
/dev/mapper/control under certain circumstances. Therefore we
have to allow it in cgroups.
Perhaps the two patches should be combined then...  yes, I know cgroups
is different than namespace, so I understand the separation.
Yeah, I'd like to keep them separated. Even though they allow access to
the same path they do that in different areas of the code.
Separate is fine...
...
...
...
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_cgroup.c  | 33 ++++++++++++++++++++++++++++++---
 src/util/virdevmapper.c |  8 +++++++-
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index d88eb7881f..546a4c8e63 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -114,6 +114,8 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm,
 }
+#define DEVICE_MAPPER_CONTROL_PATH "/dev/mapper/control"
+
Almost too bad we didn't have a common place for this in some existing
included .h file.
Yeah :(
...
...
static int
 qemuSetupImageCgroupInternal(virDomainObjPtr vm,
                              virStorageSourcePtr src,
@@ -125,6 +127,10 @@ qemuSetupImageCgroupInternal(virDomainObjPtr vm,
         return 0;
     }
+    if (virStoragePRDefIsManaged(src->pr) &&
+        qemuSetupImagePathCgroup(vm, DEVICE_MAPPER_CONTROL_PATH, false) < 0)
+        return -1;
+
Could the above be done potentially many times?  Could be expensive, no?
 Considering what virDevMapperGetTargets[Impl] does...
It shouldn't be expensive. At the very first call of
virDevMapperGetTargetsImpl() we get ENXIO (this the change below) and
thus there only very small time penalty added.
Suffice to say it wasn't obvious to me WTF virDevMapperGetTargets[Impl]
causes the ENXIO.  That logic is not entirely self documenting.
...
...
...
return qemuSetupImagePathCgroup(vm, src->path, src->readonly || forceReadonly);
 }
@@ -142,9 +148,8 @@ qemuTeardownImageCgroup(virDomainObjPtr vm,
                         virStorageSourcePtr src)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
-    int perms = VIR_CGROUP_DEVICE_READ |
-                VIR_CGROUP_DEVICE_WRITE |
-                VIR_CGROUP_DEVICE_MKNOD;
+    int perms = VIR_CGROUP_DEVICE_RWM;
+    size_t i;
     int ret;
if (!virCgroupHasController(priv->cgroup,
@@ -157,6 +162,28 @@ qemuTeardownImageCgroup(virDomainObjPtr vm,
         return 0;
     }
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virStorageSourcePtr diskSrc = vm->def->disks[i]->src;
+
+        if (src == diskSrc)
+            continue;
+
+        if (virStoragePRDefIsManaged(diskSrc->pr))
+            break;
+    }
+
+    if (i == vm->def->ndisks) {
If there was only one that's managed and it's @src, then we don't get
here...
How come? Say there are 3 disks for a domain and the first one is
managed: disks = {A, B, C} (where A.managed = yes}.
So it made sense when I wrote the comment... let's see.
...
So the loop goes like this:
qemuTeardownImageCgroup(A):
    i = 0;
    if (A.src == disks[0].src) //true
      continue;
i = 1;
    if (A.src == disks[1].src) //false
      continue;
if (isManaged(disks[1]) //false
      break;
i = 2;
    if (A.src == disks[2].src) //false
      continue;
if (isManaged(disks[2]) //false
      break;
// Here, i == ndisks == 3;
Or am I missing something?
OK - with the example I see...  /me wonders at this point whether it'd
be clearer to keep a list of pr-helper managed LUN's and "Add" and
"Remove" where the 1st add and the last remove trigger the PR start/stop
rather than looping through ndisks.
...
...
...
+        VIR_DEBUG("Disabling device mapper control");
+        ret = virCgroupDenyDevicePath(priv->cgroup,
+                                      DEVICE_MAPPER_CONTROL_PATH, perms, true);
You go through great lengths to ensure this is only done once converse
to the qemuSetupImageCgroupInternal change...
Yes, because we can't deny helper the access if there's still another
disk that might be using it (note that only managed disks use helper
which runs in qemu's namespace/cgroup context). But we can allow it
multiple times.
...
I guess I would think that if we can determine at Prepare time that NS
and cgroups would need to add /dev/mapper/control and we only want to do
that config once, then we should be able
Yes, of course we then we potentially miss adding for hotplug. Honestly
it seems more of a "global" to qemu_domain rather than "local" to a
particular disk source even though it's needed by the local disk source,
is it really something the disk source itself (or it's private data
structure) should be managing?
Well, I view qemuSetupImageCgroup() and qemuTeardownImageCgroup() as
entry points to CGroup mgmt. Therefore it's less lines of code to do
decision there. Doing it anywhere else would be impractical IMO.
...
...
+        virDomainAuditCgroupPath(vm, priv->cgroup, "deny",
+                                 DEVICE_MAPPER_CONTROL_PATH,
+                                 virCgroupGetDevicePermsString(perms), ret);
+        if (ret < 0)
+            return ret;
+    }
+
+
     VIR_DEBUG("Deny path %s", src->path);
ret = virCgroupDenyDevicePath(priv->cgroup, src->path, perms, true);
diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
index d2c25af003..ef4b1e480a 100644
--- a/src/util/virdevmapper.c
+++ b/src/util/virdevmapper.c
@@ -101,8 +101,14 @@ virDevMapperGetTargetsImpl(const char *path,
dm_task_no_open_count(dmt);
-    if (!dm_task_run(dmt))
+    if (!dm_task_run(dmt)) {
+        if (errno == ENXIO) {
+            /* In some cases devmapper realizes this late device
+             * is not managed by it. */
+            ret = 0;
+        }
Should this be separated? Or is it related to /dev/mapper/control?
No, this is exactly at the right spot in the correct patch. If
virDevMapperGetTargetsImpl() sees /dev/mapper/control the dm_task_run()
returns ENXIO. Speaking of which, this is not documented anywhere. I had
to dig through devmapper sources to find out this errno is in fact
returned by kernel.
hmm... my self documenting code comment ;-)... The relationship between
ENXIO and having /dev/mapper/control be "managed" in/by the name space
code was not that obvious to me. This is stuff that'll drive someone
nuts in the future when they wonder WTF is going on and why ENXIO was
magically chosen and added to a cgroup related patch.

John