On 12/15/23 16:47, Daniel P. Berrangé wrote:
On Fri, Dec 15, 2023 at 03:51:19PM +0100, Denis V. Lunev wrote:
> On 12/15/23 14:48, Efim Shevrin wrote:
>> *From:* Daniel P. Berrangé <berrange(a)redhat.com>
>> *Sent:* Friday, December 15, 2023 19:09
>> *To:* Efim Shevrin <efim.shevrin(a)virtuozzo.com>
>> *Cc:* devel(a)lists.libvirt.org <devel(a)lists.libvirt.org>; den(a)openvz.org
>> <den(a)openvz.org>
>> *Subject:* Re: [PATCH 3/3] rpc: Rework rpc notifications in main and
>> side thread
>>
>> On Fri, Dec 15, 2023 at 02:32:19AM +0800, Fima Shevrin wrote:
>>> RPC client implementation uses the following paradigm. The critical
>>> section is organized via virObjectLock(client)/virObjectUnlock(client)
>>> braces. Though this is potentially problematic as
>>>     main thread:                            side thread:
>>>     virObjectUnlock(client);
>>>                                             virObjectLock(client);
>>>                                             g_main_loop_quit(client->eventLoop);
>>>                                             virObjectUnlock(client);
>>>     g_main_loop_run(client->eventLoop);
>>>
>>> This means in particular that if the main thread is executing a very long
>>> request like VM migration, the wakeup from the side thread could be
>>> stuck until the main request is fully completed.
>> Can you explain this in more detail, with illustrative call traces
>> for the two threads? You're not saying where the main thread is
>> doing work with the 'client' lock held for a long time. Generally
>> the goal should be for the main thread to only hold the lock for
>> a short time. Also if the side thread is already holding a reference
>> on 'client', then potentially we should consider if it is possible
>> to terminate the event loop without acquiring the mutex, as GMainLoop
>> protects itself wrt concurrent usage already, provided all threads
>> hold a reference directly or indirectly.
> In our opinion the problem here is a missed wakeup from
> the side thread to the main thread.
Hmmm, what platform are you seeing problems on? Are you still targeting
a very old RHEL-7 platform? I vaguely recall there are/were some old
glib bugs in the main loop with threads that could be applicable.
With regards,
Daniel
Nope. The original problem is observed on RHEL 9.1, i.e. libvirt 8.5.
But the concern here comes from the comparison with the very old
libvirt 5.6, which behaves MUCH better here.
Den
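
For readers following the thread, the interleaving from the commit message can be replayed in a small single-threaded toy model. This is only a sketch and not libvirt or GLib code: ToyLoop, toy_loop_quit(), toy_loop_run() and the quit_requested flag are all hypothetical names. It assumes that a loop's run call sets its "running" flag on entry, as GLib's g_main_loop_run() appears to do, so a quit issued before run is overwritten. The second scenario shows one hypothetical mitigation (a sticky flag checked before entering the loop), which is not necessarily what the patch under review does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an event loop; NOT the GLib GMainLoop type. */
typedef struct {
    bool is_running;      /* "keep iterating" flag, assumed to be set by run */
    bool quit_requested;  /* hypothetical sticky flag; real code would guard it with the client lock */
} ToyLoop;

/* Models a quit call: it only clears the running flag. */
void toy_loop_quit(ToyLoop *loop)
{
    assert(loop != NULL);
    loop->is_running = false;
}

/*
 * Models a run call: the flag is set on entry, then the loop iterates.
 * Returns true if a quit was observed, false if the loop would still be
 * waiting after max_iterations (i.e. the caller is stuck).
 */
bool toy_loop_run(ToyLoop *loop, int max_iterations)
{
    assert(loop != NULL);
    loop->is_running = true;           /* overwrites any earlier quit */
    for (int i = 0; i < max_iterations; i++) {
        if (!loop->is_running)
            return true;               /* woken up */
    }
    return false;                      /* nobody woke us up */
}

/* Interleaving from the commit message: quit lands before run. */
bool scenario_lost_wakeup(void)
{
    ToyLoop loop = { false, false };
    toy_loop_quit(&loop);              /* side thread, between unlock and run */
    return toy_loop_run(&loop, 3);     /* main thread */
}

/* Hypothetical mitigation: remember the request and check it before running. */
bool scenario_sticky_flag(void)
{
    ToyLoop loop = { false, false };
    loop.quit_requested = true;        /* side thread records the wakeup */
    toy_loop_quit(&loop);
    if (loop.quit_requested) {         /* main thread re-checks before blocking */
        loop.quit_requested = false;
        return true;                   /* wakeup is not lost */
    }
    return toy_loop_run(&loop, 3);
}
```

In this model scenario_lost_wakeup() reports that the main thread never observes the quit, while scenario_sticky_flag() reports that it does. Real traces from the two threads, as requested earlier in the thread, are still needed to confirm that this is what happens in libvirt 8.5.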