On 7/19/20 11:09 PM, Bihong Yu wrote:
On 2020/7/18 5:14, Daniel Henrique Barboza wrote:
>
>
> On 7/17/20 8:10 AM, Bihong Yu wrote:
>>> From c328ff62b11d58553fd2032a85fd3295e009b3d3 Mon Sep 17 00:00:00 2001
>> From: Bihong Yu <yubihong(a)huawei.com>
>> Date: Fri, 17 Jul 2020 16:55:12 +0800
>> Subject: [PATCH] qemu: clear residual QMP caps processes during QEMU driver
>> initialization
>>
>> In some cases, the QMP capabilities processes possible residue. So we need to
>> clear the residual QMP caps processes during starting libvirt.
>
> Which cases are you referring to? Do you have a reproducer for this behavior? I
> tried it up starting and stopping libvirtd, fetching capabilities, starting vms
> and so on, and wasn't able to see any 'qmp.pid' file hanging around.
Yes, I have a reproducer for this behavior. I refer case is that send kill signal
to libvirtd when libvirtd is fetching capabilities before libvirtd calling
qemuProcessQMPFree() and qemuProcessQMPStop().
When libvirtd restart, it can not find the temporary qmp directory and has no way
to clear the residual QMP caps processes.
I see. The reproducer is an abnormal termination of the Libvirt daemon (I'm assuming
SIGKILL - SIGTERM would have been handled).
The problem is that processes in general can't handle SIGKILL with cleanup functions
(there are gimmicks, but one process can't avoid to not get killed instantly with
a SIGKILL by itself). I don't believe Libvirt can contemplate each cleanup scenario
where a program is terminated with SIGKILL.
You found one case where a kill -9 left stuff behind, but I guarantee you that there
are many places and situations where this will occur. Basically each temporary file
Libvirt creates are susceptible to this.
Your proposed fix doesn't help in the case you're trying to cover either. What if
I land a SIGKILL while the code is trying to cleanup the leftovers of the previous
SIGKILL? Now imagine that each place where Libvirt creates a temporary file would
need a cleanup code like this, and every one of them is susceptible to this same
behavior.
I am not sure why you needed to kill the process with SIGKILL instead of SIGTERM,
but I'd rather try to understand and fix what lead you to send a -9 to libvirtd
instead.
Thanks,
DHB
Thanks,
Bihong Yu
>
> About the code, I looked it up and all calls to qemuProcessQMPNew() are being
> cleaned up accordingly with a qemuProcessQMPFree() function, which calls
> qemuProcessQMPStop(), and the Stop function cleans up both the pid file and
> the uniqDir:
>
> ----- qemuProcessQMPStop(qemuProcessQMPPtr proc) -----
>
> if (proc->pidfile)
> unlink(proc->pidfile);
>
> if (proc->uniqDir)
> rmdir(proc->uniqDir);
>
> ------
>
> The proper fix here would be to understand in which conditions the 'qmp.pid'
file
> is being left behind and make the code go through the existing cleanup path,
> fixing qemuProcessQMPFree and/or qemuProcessQMPStop if needed. What you're doing
> here works, but it's fixing the symptom of a bug instead of the bug itself.
>
>
> Thanks,
>
>
> DHB
>
>
>>
>> Signed-off-by:Bihong Yu <yubihong(a)huawei.com>
>> ---
>> src/qemu/qemu_driver.c | 2 ++
>> src/qemu/qemu_process.c | 40 ++++++++++++++++++++++++++++++++++++++++
>> src/qemu/qemu_process.h | 2 ++
>> 3 files changed, 44 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index d185666..d627fd7 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -889,6 +889,8 @@ qemuStateInitialize(bool privileged,
>> run_gid = cfg->group;
>> }
>>
>> + qemuProcessQMPClear(cfg->libDir);
>> +
>> qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir,
>> cfg->cacheDir,
>> run_uid,
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 70fc24b..d545e3e 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -8312,6 +8312,46 @@ static qemuMonitorCallbacks callbacks = {
>> };
>>
>>
>> +/**
>> + * qemuProcessQMPClear
>> + *
>> + * Try to kill residual QMP caps processes
>> + */
>> +void
>> +qemuProcessQMPClear(const char *libDir)
>> +{
>> + virErrorPtr orig_err;
>> + DIR *dirp = NULL;
>> + struct dirent *dp;
>> +
>> + if (!virFileExists(libDir))
>> + return;
>> +
>> + if (virDirOpen(&dirp, libDir) < 0)
>> + return;
>> +
>> + virErrorPreserveLast(&orig_err);
>> + while (virDirRead(dirp, &dp, libDir) > 0) {
>> + g_autofree char *qmp_uniqDir = NULL;
>> + g_autofree char *qmp_pidfile = NULL;
>> +
>> + if (!STRPREFIX(dp->d_name, "qmp-"))
>> + continue;
>> +
>> + qmp_uniqDir = g_strdup_printf("%s/%s", libDir,
dp->d_name);
>> + qmp_pidfile = g_strdup_printf("%s/%s", qmp_uniqDir,
"qmp.pid");
>> +
>> + ignore_value(virPidFileForceCleanupPath(qmp_pidfile));
>> +
>> + if (qmp_uniqDir)
>> + rmdir(qmp_uniqDir);
>> + }
>> + virErrorRestore(&orig_err);
>> +
>> + VIR_DIR_CLOSE(dirp);
>> +}
>> +
>> +
>> static void
>> qemuProcessQMPStop(qemuProcessQMPPtr proc)
>> {
>> diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
>> index 15e67b9..b039e6c 100644
>> --- a/src/qemu/qemu_process.h
>> +++ b/src/qemu/qemu_process.h
>> @@ -233,6 +233,8 @@ qemuProcessQMPPtr qemuProcessQMPNew(const char *binary,
>> gid_t runGid,
>> bool forceTCG);
>>
>> +void qemuProcessQMPClear(const char *libDir);
>> +
>> void qemuProcessQMPFree(qemuProcessQMPPtr proc);
>>
>> int qemuProcessQMPStart(qemuProcessQMPPtr proc);
>>
> .