Re: [libvirt] Internal error message from netcf when trying to start libvirtd

2 Dec 2013


      On 02.12.2013 18:40, Daniel P. Berrange wrote:
...
On Mon, Dec 02, 2013 at 06:38:37PM +0100, Michal Privoznik wrote:
...
On 02.12.2013 18:24, Daniel P. Berrange wrote:
...
On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote:
...
On 17.10.2013 14:40, Daniel P. Berrange wrote:
...
On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote:
...
Hey,
Due to a configuration issue on my system, libvirtd is not starting on my
system (not complaining about this!):
2013-10-15 15:40:51.024+0000: 10222: info : libvirt version: 1.1.3
2013-10-15 15:40:51.024+0000: 10222: error : virNetTLSContextCheckCertFile:117 : Cannot read CA certificate
'/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or directory
However, before libvirtd exits, I get another error message from the netcf
code, which is unexpected this time:
2013-10-15 15:49:18.361+0000: 10222: error : netcfStateCleanup:105 :
internal error: Attempt to close netcf state driver already closed
This message comes from the call of virStateCleanup() at the end of main()
in libvirtd.c. virStateCleanup() should not be called before
daemonStateInit() has been called in main.
After this call, things get more ugly as daemonStateInit() calls
virStateInitialize() from a thread, so there's probably a small window for
virStateInitialize() and virStateCleanup() running concurrently if an error
occurs between the call to daemonStateInit() and the call to
virNetServerRun().
I'm sending this email rather than a patch as I'm not sure what is the best
way to fix it. The easy way would be for virStateCleanup() to be a noop
when virStateInitialize() hasn't been called (iow remove the error message
from netcfStateCleanup). However, this would leave this small race
condition around (which is not that bad as it would only occurs in
situations when the daemon fails to start). So another approach would be to
set a vir_state_initialized boolean once the thread has called
ivrStateInitialize, and only call virStateCleanup() when it's set.
Or maybe there's a 3rd way to fix this?
Let me know if you have any guidance into the best way to fix this,
Yeah, I've known about this issue for a while and its a bit horrible
to solve. I think we probably need to have a global mutex to serialize
the virStateInitialize, virStateReload and virStateCleanup calls so
that none of them can run in parallel.
I'd have virGlobalInit create the mutex instance, since we know that
is run from thread safe context.
Then make the virState* calls hold that mutex while executing.
We don't want virStateCleanup to skip execution if virStateInitialize
has failed though - every callback in virStateCleanup should be written
to be safe if its corresponding init function hasn't run. Just the
mutex serialization ought to be sufficient I think.
Daniel
Sorry for bringing up a dead thread, but as you both may have noticed I
was trying to solve this too:
https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html
And I don't think Dan, your approach is gonna work. I mean, the reason
for running virStateInitialize in thread is - it may take ages to
Actually no, the reason for using a thread is that the initialization
code needs to have a working event loop - so you cannot run the
virStateInitialize code from the same thread that is running the
event loop.
Okay, this might be the reason too.
...
You're probably thinking of the QEMU VM autostart code, whicih spawns a
separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU
driver while (slow) autostart takes place.
No. I am thinking of the stateAutostart member which is called from
within virStateInitialize(). In case of qemu driver it's
qemuStateAutoStart() what is called, which boils down to calling
qemuProcessStart over each autostart domain without spawning a separate
thread for it. The thread you have in mind is used for
qemuProcessReconnect() not qemuProcessStart().
...
...
complete. Hence, serializing virState* will lead to main thread waiting
for the virStateInitialize to complete (which may not happen, i.e. in
case of deadlock somewhere in a driver).
But I was thinking of rather refined locking. Turning each of
vir*DriverTab into virObjectLockable(). Then, each access would require
object to lock itself. Moreover, the virStateCleanup would unregister
the driver. Unfortunately, it could only unregister stateful drivers.
This race condition scenario only hits us in two cases
- libvirtd startup is being aborted due to some failure
   while StateInit is still running
 - libvirtd reload is triggered by SIGHUP causing StateReload
   to run
I can see the third scenario: the daemon exiting the event loop, i.e. as
a result of timeout or SIGINT. In this case you don't want to wait for
virStateInitialize() to finish, do you?
If the event loop exits then in fact the virStateInitialize may never
complete, since it requires a functioning event loop. So in this case
then you cannot safely call virStateCleanup at all and we should just
skip it entirely and exit with trying to cleanup. The cleanup code is
pretty much worthless from a functional POV anyway - it is just there
to make valgrind happy.
Yep. I vote for removing virStateCleanup(). Although, I'm not sure if
NWFilter is not doing something actually useful there. Other drivers
seems to just free() allocated memory (=making valgrind happy).

Michal