On 02.12.2013 18:40, Daniel P. Berrange wrote:
On Mon, Dec 02, 2013 at 06:38:37PM +0100, Michal Privoznik wrote:
> On 02.12.2013 18:24, Daniel P. Berrange wrote:
>> On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote:
>>> On 17.10.2013 14:40, Daniel P. Berrange wrote:
>>>> On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote:
>>>>> Hey,
>>>>>
>>>>> Due to a configuration issue on my system, libvirtd is not starting
on my
>>>>> system (not complaining about this!):
>>>>> 2013-10-15 15:40:51.024+0000: 10222: info : libvirt version: 1.1.3
>>>>> 2013-10-15 15:40:51.024+0000: 10222: error :
virNetTLSContextCheckCertFile:117 : Cannot read CA certificate
>>>>> '/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or
directory
>>>>>
>>>>> However, before libvirtd exits, I get another error message from the
netcf
>>>>> code, which is unexpected this time:
>>>>> 2013-10-15 15:49:18.361+0000: 10222: error : netcfStateCleanup:105 :
>>>>> internal error: Attempt to close netcf state driver already closed
>>>>>
>>>>> This message comes from the call of virStateCleanup() at the end of
main()
>>>>> in libvirtd.c. virStateCleanup() should not be called before
>>>>> daemonStateInit() has been called in main.
>>>>> After this call, things get more ugly as daemonStateInit() calls
>>>>> virStateInitialize() from a thread, so there's probably a small
window for
>>>>> virStateInitialize() and virStateCleanup() running concurrently if an
error
>>>>> occurs between the call to daemonStateInit() and the call to
>>>>> virNetServerRun().
>>>>>
>>>>> I'm sending this email rather than a patch as I'm not sure
what is the best
>>>>> way to fix it. The easy way would be for virStateCleanup() to be a
noop
>>>>> when virStateInitialize() hasn't been called (iow remove the
error message
>>>>> from netcfStateCleanup). However, this would leave this small race
>>>>> condition around (which is not that bad as it would only occurs in
>>>>> situations when the daemon fails to start). So another approach would
be to
>>>>> set a vir_state_initialized boolean once the thread has called
>>>>> ivrStateInitialize, and only call virStateCleanup() when it's
set.
>>>>>
>>>>> Or maybe there's a 3rd way to fix this?
>>>>>
>>>>> Let me know if you have any guidance into the best way to fix this,
>>>>
>>>> Yeah, I've known about this issue for a while and its a bit horrible
>>>> to solve. I think we probably need to have a global mutex to serialize
>>>> the virStateInitialize, virStateReload and virStateCleanup calls so
>>>> that none of them can run in parallel.
>>>>
>>>> I'd have virGlobalInit create the mutex instance, since we know that
>>>> is run from thread safe context.
>>>>
>>>> Then make the virState* calls hold that mutex while executing.
>>>>
>>>> We don't want virStateCleanup to skip execution if
virStateInitialize
>>>> has failed though - every callback in virStateCleanup should be written
>>>> to be safe if its corresponding init function hasn't run. Just the
>>>> mutex serialization ought to be sufficient I think.
>>>>
>>>>
>>>> Daniel
>>>>
>>>
>>> Sorry for bringing up a dead thread, but as you both may have noticed I
>>> was trying to solve this too:
>>>
>>>
https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html
>>>
>>> And I don't think Dan, your approach is gonna work. I mean, the reason
>>> for running virStateInitialize in thread is - it may take ages to
>>
>> Actually no, the reason for using a thread is that the initialization
>> code needs to have a working event loop - so you cannot run the
>> virStateInitialize code from the same thread that is running the
>> event loop.
>
> Okay, this might be the reason too.
>
>>
>> You're probably thinking of the QEMU VM autostart code, whicih spawns a
>> separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU
>> driver while (slow) autostart takes place.
>>
>
> No. I am thinking of the stateAutostart member which is called from
> within virStateInitialize(). In case of qemu driver it's
> qemuStateAutoStart() what is called, which boils down to calling
> qemuProcessStart over each autostart domain without spawning a separate
> thread for it. The thread you have in mind is used for
> qemuProcessReconnect() not qemuProcessStart().
>
>>> complete. Hence, serializing virState* will lead to main thread waiting
>>> for the virStateInitialize to complete (which may not happen, i.e. in
>>> case of deadlock somewhere in a driver).
>>>
>>> But I was thinking of rather refined locking. Turning each of
>>> vir*DriverTab into virObjectLockable(). Then, each access would require
>>> object to lock itself. Moreover, the virStateCleanup would unregister
>>> the driver. Unfortunately, it could only unregister stateful drivers.
>>
>> This race condition scenario only hits us in two cases
>>
>> - libvirtd startup is being aborted due to some failure
>> while StateInit is still running
>> - libvirtd reload is triggered by SIGHUP causing StateReload
>> to run
>
> I can see the third scenario: the daemon exiting the event loop, i.e. as
> a result of timeout or SIGINT. In this case you don't want to wait for
> virStateInitialize() to finish, do you?
If the event loop exits then in fact the virStateInitialize may never
complete, since it requires a functioning event loop. So in this case
then you cannot safely call virStateCleanup at all and we should just
skip it entirely and exit with trying to cleanup. The cleanup code is
pretty much worthless from a functional POV anyway - it is just there
to make valgrind happy.
Yep. I vote for removing virStateCleanup(). Although, I'm not sure if
NWFilter is not doing something actually useful there. Other drivers
seems to just free() allocated memory (=making valgrind happy).
Michal