[libvirt-users] Thread-safety issues with vbox driver ?

Hi, I'm experiencing weird things with the vbox driver when using multiple threads. Following is the snippet of code I experience problems with /*****************************************************/ #include <stdlib.h> #include <stdio.h> #include <pthread.h> #include <libvirt/libvirt.h> void *create_and_destroy(void *arg) { virDomainPtr dom = (virDomainPtr)arg; char buf[VIR_UUID_STRING_BUFLEN]; virDomainGetUUIDString(dom, buf); if (virDomainCreate(dom) != 0) { printf("failed to start %s\n", buf); goto end; } printf("%s started\n", buf); if (virDomainDestroy(dom) != 0) { printf("failed to destroy %s\n", buf); } printf("%s destroyed\n", buf); end: virDomainFree(dom); pthread_exit(NULL); } int main(int argc, char **argv) { virConnectPtr conn = virConnectOpen("vbox:///session"); int i; int count = virConnectNumOfDefinedDomains(conn); char **names = malloc(count * sizeof(char *)); virConnectListDefinedDomains(conn, names, count); virDomainPtr *doms = malloc(count * sizeof(virDomainPtr)); for (i = 0 ; i < count ; ++i) { doms[i] = virDomainLookupByName(conn, names[i]); } pthread_t *threads = malloc(count * sizeof(pthread_t)); pthread_attr_t attr; pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); for (i = 0 ; i < count ; ++i) { pthread_create(&threads[i], &attr, create_and_destroy, (void *)doms[i]); } pthread_attr_destroy(&attr); for (i = 0 ; i < count ; ++i) { pthread_join(threads[i], NULL); } virConnectClose(conn); pthread_exit(NULL); } /************************************************/ Here is the output of the program with 2 defined domains:
libvir: VBOX error : operation failed: OpenRemoteSession/LaunchVMProcess failed, domain can't be started failed to start c538c89a-70da-42ab-a88a-5aeb15698c12 034cf837-abe7-4c48-8373-0ddcf480d416 started 034cf837-abe7-4c48-8373-0ddcf480d416 destroyed
Sometimes the first domain really fails to start, but sometimes it starts correctly but libvirt reports an error. Sometimes domains aren't destroyed but libvirt reports no error at all. If there is only one domain, no problem occurs at all. I also tried the same code (ie with multiple domains) but with only one thread and it works well. I managed to reproduce these issues with libvirt 0.9.4, 0.9.7, using VirtualBox 4.0 and 4.1. -- Jean-Baptiste ROUAULT Ingénieur R&D - diateam : Architectes de l'information Phone : +33 (0)9 53 16 02 70 Fax : +33 (0)2 98 050 051

2011/12/13 Jean-Baptiste Rouault <jean-baptiste.rouault@diateam.net>:
Hi,
I'm experiencing weird things with the vbox driver when using multiple threads. Following is the snippet of code I experience problems with
/*****************************************************/ #include <stdlib.h> #include <stdio.h> #include <pthread.h> #include <libvirt/libvirt.h>
void *create_and_destroy(void *arg) { virDomainPtr dom = (virDomainPtr)arg;
char buf[VIR_UUID_STRING_BUFLEN]; virDomainGetUUIDString(dom, buf);
if (virDomainCreate(dom) != 0) { printf("failed to start %s\n", buf); goto end; } printf("%s started\n", buf);
if (virDomainDestroy(dom) != 0) { printf("failed to destroy %s\n", buf); } printf("%s destroyed\n", buf);
end: virDomainFree(dom); pthread_exit(NULL); }
int main(int argc, char **argv) { virConnectPtr conn = virConnectOpen("vbox:///session");
int i; int count = virConnectNumOfDefinedDomains(conn); char **names = malloc(count * sizeof(char *)); virConnectListDefinedDomains(conn, names, count); virDomainPtr *doms = malloc(count * sizeof(virDomainPtr)); for (i = 0 ; i < count ; ++i) { doms[i] = virDomainLookupByName(conn, names[i]); }
pthread_t *threads = malloc(count * sizeof(pthread_t)); pthread_attr_t attr; pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i = 0 ; i < count ; ++i) { pthread_create(&threads[i], &attr, create_and_destroy, (void *)doms[i]); }
pthread_attr_destroy(&attr); for (i = 0 ; i < count ; ++i) { pthread_join(threads[i], NULL); }
virConnectClose(conn);
pthread_exit(NULL); } /************************************************/
Here is the output of the program with 2 defined domains:
libvir: VBOX error : operation failed: OpenRemoteSession/LaunchVMProcess failed, domain can't be started failed to start c538c89a-70da-42ab-a88a-5aeb15698c12 034cf837-abe7-4c48-8373-0ddcf480d416 started 034cf837-abe7-4c48-8373-0ddcf480d416 destroyed
Sometimes the first domain really fails to start, but sometimes it starts correctly but libvirt reports an error. Sometimes domains aren't destroyed but libvirt reports no error at all. If there is only one domain, no problem occurs at all. I also tried the same code (ie with multiple domains) but with only one thread and it works well.
I managed to reproduce these issues with libvirt 0.9.4, 0.9.7, using VirtualBox 4.0 and 4.1.
Okay, without looking deeper into this here are some ideas: The XPCOM API libvirt uses might not be threadsafe, or needs to be initialized for every thread that wants to use it. Currently its only initialized for the thread that opens the driver. I know that this is the case on Windows were VirtualBox uses MSCOM for its API and you need to call CoInitialize on every thread. This is currently not done for the MSCOM glue in libvirt, so I know that on Windows the VirtualBox driver is not threadsafe currently. Also I didn't look into a solution for this yet. Maybe we need a thread local variable that holds whether (MS/XP)COM was already initialized for this thread and add a check to every driver function to initialize it when needed. Did you try to open a connection for each thread instead of trying to share one? If that works reliable it might indicate that there is an VirtualBox API initialization problem. Did you try to add the actual VirtualBox error code for the OpenRemoteSession/LaunchVMProcess error in hex to the error message and look it up in the src/vbox/vbox_CAPI_v4_1.h? -- Matthias Bolte http://photron.blogspot.com

On Monday 16 January 2012 11:34:53 Matthias Bolte wrote:
Okay, without looking deeper into this here are some ideas:
The XPCOM API libvirt uses might not be threadsafe, or needs to be initialized for every thread that wants to use it. Currently its only initialized for the thread that opens the driver. I know that this is the case on Windows were VirtualBox uses MSCOM for its API and you need to call CoInitialize on every thread. This is currently not done for the MSCOM glue in libvirt, so I know that on Windows the VirtualBox driver is not threadsafe currently. Also I didn't look into a solution for this yet. Maybe we need a thread local variable that holds whether (MS/XP)COM was already initialized for this thread and add a check to every driver function to initialize it when needed.
Did you try to open a connection for each thread instead of trying to share one? If that works reliable it might indicate that there is an VirtualBox API initialization problem.
I tried today with one connection for each thread and it works. I changed the vbox driver so that the pfnComInitialize function is called only when the first connection is opened : it breaks the test, even with one connection per thread. I guess we'll have to use a thread local variable as you suggested, unless someone has a better idea to handle this problem. -- Jean-Baptiste ROUAULT Ingénieur R&D - diateam : Architectes de l'information Phone : +33 (0)2 98 050 050 Fax : +33 (0)2 98 050 051

On Wednesday 04 April 2012 17:56:12 Jean-Baptiste Rouault wrote:
On Monday 16 January 2012 11:34:53 Matthias Bolte wrote:
Okay, without looking deeper into this here are some ideas:
The XPCOM API libvirt uses might not be threadsafe, or needs to be initialized for every thread that wants to use it. Currently its only initialized for the thread that opens the driver. I know that this is the case on Windows were VirtualBox uses MSCOM for its API and you need to call CoInitialize on every thread. This is currently not done for the MSCOM glue in libvirt, so I know that on Windows the VirtualBox driver is not threadsafe currently. Also I didn't look into a solution for this yet. Maybe we need a thread local variable that holds whether (MS/XP)COM was already initialized for this thread and add a check to every driver function to initialize it when needed.
Did you try to open a connection for each thread instead of trying to share one? If that works reliable it might indicate that there is an VirtualBox API initialization problem.
I tried today with one connection for each thread and it works.
I changed the vbox driver so that the pfnComInitialize function is called only when the first connection is opened : it breaks the test, even with one connection per thread.
I guess we'll have to use a thread local variable as you suggested, unless someone has a better idea to handle this problem.
Hi, I looked deeper into these thread-safety issues, once a new connection is opened for each thread, everything works well. However, opening and closing connections isn't thread-safe at all for two reasons : - VirtualBox C bindings initialization and uninitialization functions aren't thread-safe. I talked about it with upstream on IRC and they are probably not going to fix it, but would accept a patch fixing the issue. I'm going to contact upstream again to get some advices so I can write a patch. - In the libvirt vbox driver, for each new connection, modification of the global variable g_pVBoxGlobalData isn't protected (see line 1040 of vbox_tmpl.c). First of all, is it really necessary to replace it on each new connection, or would it be ok to initialize it only when the first connection is opened ? A global mutex is needed in the vbox driver to protect access to g_pVBoxGlobalData, the vboxRegister() function seems to be the best place to initialize such a mutex unless there is another entry point to do this ? -- Jean-Baptiste ROUAULT Ingénieur R&D - diateam : Architectes de l'information Phone : +33 (0)2 98 050 050 Fax : +33 (0)2 98 050 051

On Thu, Apr 19, 2012 at 12:51:10PM +0200, Jean-Baptiste Rouault wrote:
On Wednesday 04 April 2012 17:56:12 Jean-Baptiste Rouault wrote:
On Monday 16 January 2012 11:34:53 Matthias Bolte wrote:
Okay, without looking deeper into this here are some ideas:
The XPCOM API libvirt uses might not be threadsafe, or needs to be initialized for every thread that wants to use it. Currently its only initialized for the thread that opens the driver. I know that this is the case on Windows were VirtualBox uses MSCOM for its API and you need to call CoInitialize on every thread. This is currently not done for the MSCOM glue in libvirt, so I know that on Windows the VirtualBox driver is not threadsafe currently. Also I didn't look into a solution for this yet. Maybe we need a thread local variable that holds whether (MS/XP)COM was already initialized for this thread and add a check to every driver function to initialize it when needed.
Did you try to open a connection for each thread instead of trying to share one? If that works reliable it might indicate that there is an VirtualBox API initialization problem.
I tried today with one connection for each thread and it works.
I changed the vbox driver so that the pfnComInitialize function is called only when the first connection is opened : it breaks the test, even with one connection per thread.
I guess we'll have to use a thread local variable as you suggested, unless someone has a better idea to handle this problem.
Hi,
I looked deeper into these thread-safety issues, once a new connection is opened for each thread, everything works well. However, opening and closing connections isn't thread-safe at all for two reasons :
- VirtualBox C bindings initialization and uninitialization functions aren't thread-safe. I talked about it with upstream on IRC and they are probably not going to fix it, but would accept a patch fixing the issue. I'm going to contact upstream again to get some advices so I can write a patch.
- In the libvirt vbox driver, for each new connection, modification of the global variable g_pVBoxGlobalData isn't protected (see line 1040 of vbox_tmpl.c).
First of all, is it really necessary to replace it on each new connection, or would it be ok to initialize it only when the first connection is opened ?
A global mutex is needed in the vbox driver to protect access to g_pVBoxGlobalData, the vboxRegister() function seems to be the best place to initialize such a mutex unless there is another entry point to do this ?
I can't comment on where the best place to put hte lock is, but I agree that we should make sure we do all the locking in libvirt. Not only because upstream isnt likely to make it threadsafe soon, but also because it will ensure we have compatibilty & safety with all existing/previous releases or VirtualBox Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 19. April 2012 12:51 schrieb Jean-Baptiste Rouault <jean-baptiste.rouault@diateam.net>:
On Wednesday 04 April 2012 17:56:12 Jean-Baptiste Rouault wrote:
On Monday 16 January 2012 11:34:53 Matthias Bolte wrote:
Okay, without looking deeper into this here are some ideas:
The XPCOM API libvirt uses might not be threadsafe, or needs to be initialized for every thread that wants to use it. Currently its only initialized for the thread that opens the driver. I know that this is the case on Windows were VirtualBox uses MSCOM for its API and you need to call CoInitialize on every thread. This is currently not done for the MSCOM glue in libvirt, so I know that on Windows the VirtualBox driver is not threadsafe currently. Also I didn't look into a solution for this yet. Maybe we need a thread local variable that holds whether (MS/XP)COM was already initialized for this thread and add a check to every driver function to initialize it when needed.
Did you try to open a connection for each thread instead of trying to share one? If that works reliable it might indicate that there is an VirtualBox API initialization problem.
I tried today with one connection for each thread and it works.
I changed the vbox driver so that the pfnComInitialize function is called only when the first connection is opened : it breaks the test, even with one connection per thread.
I guess we'll have to use a thread local variable as you suggested, unless someone has a better idea to handle this problem.
Hi,
I looked deeper into these thread-safety issues, once a new connection is opened for each thread, everything works well. However, opening and closing connections isn't thread-safe at all for two reasons :
- VirtualBox C bindings initialization and uninitialization functions aren't thread-safe. I talked about it with upstream on IRC and they are probably not going to fix it, but would accept a patch fixing the issue. I'm going to contact upstream again to get some advices so I can write a patch.
- In the libvirt vbox driver, for each new connection, modification of the global variable g_pVBoxGlobalData isn't protected (see line 1040 of vbox_tmpl.c).
First of all, is it really necessary to replace it on each new connection, or would it be ok to initialize it only when the first connection is opened ?
A global mutex is needed in the vbox driver to protect access to g_pVBoxGlobalData, the vboxRegister() function seems to be the best place to initialize such a mutex unless there is another entry point to do this ?
Such a mutex doesn't help. Looking at g_pVBoxGlobalData tells me that we need to get rid of it in its current form for different reasons. The major reason is that it contains connection specific data such as the conn and domainEvents members. This means when you open a new connection you break all other open connections because connection specific data is overwritten. We need to redesign this. A major reason for the existence of g_pVBoxGlobalData is given in line 209 of vbox_tmpl.c: g_pVBoxGlobalData needs to be global because event callbacks won't work otherwise. Actually that's not true. Who ever did the original implementation of this was not aware of how COM (MS and XP variants) works. There is a vboxAllocCallbackObj function that creates an IVirtualBoxCallback COM object. The first member in a COM objects is a pointer to its vtbl that contains pointers to all the methods that can be called on it. After this vtbl the COM object contains its private data members, this aspect is not used in the current implementation. So we could just allocate a bigger object and put the data that belongs to the IVirtualBoxCallback implementation into this additional memory. This includes at least vboxCallBackRefCount and domainEvents that are currently located in g_pVBoxGlobalData. The only two members of g_pVBoxGlobalData that can stay global are caps and pFuncs, all other members are specific to a connection and cannot be global. Are you referring to data->pFuncs->pfnComInitialize and data->pFuncs->pfnComUninitialize when you say "VirtualBox C bindings initialization and uninitialization functions"? As those are used to create connection specific objects we cannot move them out of vboxOpen/vboxClose. If they are not thread-safe than we need to put a global mutex around these calls. This is just what I noticed from a quick look at the code in the context of the threading problem. This probably needs a more detailed analysis. -- Matthias Bolte http://photron.blogspot.com

On Thursday 19 April 2012 16:23:19 Matthias Bolte wrote:
Am 19. April 2012 12:51 schrieb Jean-Baptiste Rouault
A global mutex is needed in the vbox driver to protect access to g_pVBoxGlobalData, the vboxRegister() function seems to be the best place to initialize such a mutex unless there is another entry point to do this ?
Such a mutex doesn't help.
Looking at g_pVBoxGlobalData tells me that we need to get rid of it in its current form for different reasons. The major reason is that it contains connection specific data such as the conn and domainEvents members. This means when you open a new connection you break all other open connections because connection specific data is overwritten. We need to redesign this.
A major reason for the existence of g_pVBoxGlobalData is given in line 209 of vbox_tmpl.c: g_pVBoxGlobalData needs to be global because event callbacks won't work otherwise. Actually that's not true. Who ever did the original implementation of this was not aware of how COM (MS and XP variants) works.
There is a vboxAllocCallbackObj function that creates an IVirtualBoxCallback COM object. The first member in a COM objects is a pointer to its vtbl that contains pointers to all the methods that can be called on it. After this vtbl the COM object contains its private data members, this aspect is not used in the current implementation.
So we could just allocate a bigger object and put the data that belongs to the IVirtualBoxCallback implementation into this additional memory. This includes at least vboxCallBackRefCount and domainEvents that are currently located in g_pVBoxGlobalData.
The only two members of g_pVBoxGlobalData that can stay global are caps and pFuncs, all other members are specific to a connection and cannot be global.
I'm not familiar with COM, but if all this connection-specific code can be moved out it's nice.
Are you referring to data->pFuncs->pfnComInitialize and data->pFuncs->pfnComUninitialize when you say "VirtualBox C bindings initialization and uninitialization functions"? As those are used to create connection specific objects we cannot move them out of vboxOpen/vboxClose. If they are not thread-safe than we need to put a global mutex around these calls.
Yes these two functions aren't thread-safe, it is even worse than that. They are located in src/VBox/Main/cbinding/VBoxXPCOMC.cpp in the VirtualBox source code. There are 4 static pointers which are replaced at each pfnComInitialize call, and "released" (via the NS_RELEASE macro) in pfnComUninitialize. These accesses aren't protected at all, but what is worse is that when you call pfnComUninitialize, the pointers to the IVirtualBox, ISession and nsIEventQueue are released. So if you open multiple connections, then close one of them, these objects are deleted and your last opened connection is broken. As you said, a global mutex is needed around these calls, but I think we should also use the VBOX_ADDREF() macro on the IVirtualBox, ISession and nsIEventQueue objects after pfnComInitialize, and VBOX_RELEASE() after pfnComUninitialize. -- Jean-Baptiste ROUAULT Ingénieur R&D - diateam : Architectes de l'information Phone : +33 (0)2 98 050 050 Fax : +33 (0)2 98 050 051
participants (3)
-
Daniel P. Berrange
-
Jean-Baptiste Rouault
-
Matthias Bolte