Re: [libvirt] Internal error message from netcf when trying to start libvirtd

Monday, 2 December 2013

On Mon, Dec 02, 2013 at 06:53:37PM +0100, Michal Privoznik wrote:
...
 On 02.12.2013 18:40, Daniel P. Berrange wrote:
 > On Mon, Dec 02, 2013 at 06:38:37PM +0100, Michal Privoznik wrote:
 >> On 02.12.2013 18:24, Daniel P. Berrange wrote:
 >>> On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote:
 >>>> On 17.10.2013 14:40, Daniel P. Berrange wrote:
 >>>>> On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote:
 >>>>>> Hey,
 >>>>>>
 >>>>>> Due to a configuration issue on my system, libvirtd is not
starting on my
 >>>>>> system (not complaining about this!):
 >>>>>> 2013-10-15 15:40:51.024+0000: 10222: info : libvirt version:
1.1.3
 >>>>>> 2013-10-15 15:40:51.024+0000: 10222: error :
virNetTLSContextCheckCertFile:117 : Cannot read CA certificate
 >>>>>> '/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or
directory
 >>>>>>
 >>>>>> However, before libvirtd exits, I get another error message from
the netcf
 >>>>>> code, which is unexpected this time:
 >>>>>> 2013-10-15 15:49:18.361+0000: 10222: error :
netcfStateCleanup:105 :
 >>>>>> internal error: Attempt to close netcf state driver already
closed
 >>>>>>
 >>>>>> This message comes from the call of virStateCleanup() at the end
of main()
 >>>>>> in libvirtd.c. virStateCleanup() should not be called before
 >>>>>> daemonStateInit() has been called in main.
 >>>>>> After this call, things get more ugly as daemonStateInit()
calls
 >>>>>> virStateInitialize() from a thread, so there's probably a
small window for
 >>>>>> virStateInitialize() and virStateCleanup() running concurrently
if an error
 >>>>>> occurs between the call to daemonStateInit() and the call to
 >>>>>> virNetServerRun().
 >>>>>>
 >>>>>> I'm sending this email rather than a patch as I'm not
sure what is the best
 >>>>>> way to fix it. The easy way would be for virStateCleanup() to be
a noop
 >>>>>> when virStateInitialize() hasn't been called (iow remove the
error message
 >>>>>> from netcfStateCleanup). However, this would leave this small
race
 >>>>>> condition around (which is not that bad as it would only occurs
in
 >>>>>> situations when the daemon fails to start). So another approach
would be to
 >>>>>> set a vir_state_initialized boolean once the thread has called
 >>>>>> ivrStateInitialize, and only call virStateCleanup() when
it's set.
 >>>>>>
 >>>>>> Or maybe there's a 3rd way to fix this?
 >>>>>>
 >>>>>> Let me know if you have any guidance into the best way to fix
this,
 >>>>>
 >>>>> Yeah, I've known about this issue for a while and its a bit
horrible
 >>>>> to solve. I think we probably need to have a global mutex to
serialize
 >>>>> the virStateInitialize, virStateReload and virStateCleanup calls so
 >>>>> that none of them can run in parallel.
 >>>>>
 >>>>> I'd have virGlobalInit create the mutex instance, since we know
that
 >>>>> is run from thread safe context.
 >>>>>
 >>>>> Then make the virState* calls hold that mutex while executing.
 >>>>>
 >>>>> We don't want virStateCleanup to skip execution if
virStateInitialize
 >>>>> has failed though - every callback in virStateCleanup should be
written
 >>>>> to be safe if its corresponding init function hasn't run. Just
the
 >>>>> mutex serialization ought to be sufficient I think.
 >>>>>
 >>>>>
 >>>>> Daniel
 >>>>>
 >>>>
 >>>> Sorry for bringing up a dead thread, but as you both may have noticed I
 >>>> was trying to solve this too:
 >>>>
 >>>> https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html
 >>>>
 >>>> And I don't think Dan, your approach is gonna work. I mean, the
reason
 >>>> for running virStateInitialize in thread is - it may take ages to
 >>>
 >>> Actually no, the reason for using a thread is that the initialization
 >>> code needs to have a working event loop - so you cannot run the
 >>> virStateInitialize code from the same thread that is running the
 >>> event loop.
 >>
 >> Okay, this might be the reason too.
 >>
 >>>
 >>> You're probably thinking of the QEMU VM autostart code, whicih spawns a
 >>> separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU
 >>> driver while (slow) autostart takes place.
 >>>
 >>
 >> No. I am thinking of the stateAutostart member which is called from
 >> within virStateInitialize(). In case of qemu driver it's
 >> qemuStateAutoStart() what is called, which boils down to calling
 >> qemuProcessStart over each autostart domain without spawning a separate
 >> thread for it. The thread you have in mind is used for
 >> qemuProcessReconnect() not qemuProcessStart().
 >>
 >>>> complete. Hence, serializing virState* will lead to main thread waiting
 >>>> for the virStateInitialize to complete (which may not happen, i.e. in
 >>>> case of deadlock somewhere in a driver).
 >>>>
 >>>> But I was thinking of rather refined locking. Turning each of
 >>>> vir*DriverTab into virObjectLockable(). Then, each access would require
 >>>> object to lock itself. Moreover, the virStateCleanup would unregister
 >>>> the driver. Unfortunately, it could only unregister stateful drivers.
 >>>
 >>> This race condition scenario only hits us in two cases
 >>>
 >>>  - libvirtd startup is being aborted due to some failure
 >>>    while StateInit is still running
 >>>  - libvirtd reload is triggered by SIGHUP causing StateReload
 >>>    to run
 >>
 >> I can see the third scenario: the daemon exiting the event loop, i.e. as
 >> a result of timeout or SIGINT. In this case you don't want to wait for
 >> virStateInitialize() to finish, do you?
 > 
 > If the event loop exits then in fact the virStateInitialize may never
 > complete, since it requires a functioning event loop. So in this case
 > then you cannot safely call virStateCleanup at all and we should just
 > skip it entirely and exit with trying to cleanup. The cleanup code is
 > pretty much worthless from a functional POV anyway - it is just there
 > to make valgrind happy.

 Yep. I vote for removing virStateCleanup(). Although, I'm not sure if
 NWFilter is not doing something actually useful there. Other drivers
 seems to just free() allocated memory (=making valgrind happy). 
NB, I didn't say remove virStateCleanup - just skip it if we know that
virStateInitialize hasn't finished yet.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] Internal error message from netcf when trying to start libvirtd