On 7/25/24 14:18, Daniel P. Berrangé wrote:
> On Thu, Jul 25, 2024 at 12:57:59PM +0200, Michal Privoznik wrote:
>> Currently, qemuProcessStop() unlocks given domain object right in
>> the middle of cleanup process. This is dangerous because there
>> might be another thread which is executing virDomainObjListAdd().
>> And since the domain object is on the list of domain objects AND
>> by the time qemuProcessStop() unlocks it the object is also
>> marked as inactive, the other thread acquires the lock and
>> switches vm->def pointer.
>>
>> The unlocking of domain object is needed though, to allow even
>> processing thread finish its queue. Well, the processing can be
>> done before any cleanup is attempted.
>>
>> Therefore, use freshly introduced virEventThreadStop() to join
>> the event thread and drop lock/unlock from the middle of
>> qemuProcessStop().
>>
>> Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
>> Resolves:
https://issues.redhat.com/browse/RHEL-49607
>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>> ---
>> src/qemu/qemu_process.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 6255ba92e7..0869307a90 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -8615,6 +8615,14 @@ void qemuProcessStop(virQEMUDriver *driver,
>> * reporting so we don't squash a legit error. */
>> virErrorPreserveLast(&orig_err);
>>
>> + /* By unlocking the domain object the events processing thread is
>> + * allowed to finish its job. */
>> + if (priv->eventThread) {
>> + virObjectUnlock(vm);
>> + virEventThreadStop(priv->eventThread);
>> + virObjectLock(vm);
>> + }
>> +
>
> I wonder if this is a little too early.
>
> We don't call qemuMOnitorClose until a short while later,
> just after we have done qemuProcessKill.
>
> If we stop the event loop here, then I worry that we're
> at risk of missing some final monitor events that might
> be desirable ? eg the final lifecycle events indicating
> that we're stopping/stopped ?
But can we close the monitor without a job? And IIUC, we shouldn't try
to stop the event loop thread with a job held (as those callbacks try to
acquire _MODIFY job on their own).
The monitor will close itself if the QEMU pid dies.
IOW, once we've sent qemuProcessKill(), we need to wait for EOF from
the monitor, at which point we can close it & tear down the event
loop (once idle).
A complication is that the QEMU PID might have already gone away
and seen EOF on monitor, before qemuProcessStop even starts.
With regards,
Daniel
--
|: