[libvirt] [PATCH] [RFC] nwfilter: resolve deadlock between VM operations and filter update

V2: remove the locks from qemudVMFilterRebuild & umlVMFilterRebuild

This is from a bug report and a conversation on IRC, where Soren reported that while a filter update is occurring on one or more VMs (because a rule was edited, for example), a deadlock can occur when a VM referencing a filter is started.

The problem is caused by the two locking sequences

qemu_driver, qemu_domain, filter      # for the VM start operation
filter, qemu_driver, qemu_domain      # for the filter update operation

that obviously don't lock in the same order. The problem is the second sequence, where the qemu_driver lock is grabbed in qemu_driver:qemudVMFilterRebuild().

The following solution is based on the idea of rearranging the second sequence of locks as follows:

qemu_driver, filter, qemu_driver, qemu_domain

and making the qemu driver recursively lockable so that the second qemu_driver lock can be taken. This leads to the net-locking sequence

qemu_driver, filter, qemu_domain

where the second qemu_driver lock has been (logically) eliminated.

The second part of the idea is that the lock sequences (filter, qemu_domain) and (qemu_domain, filter) become interchangeable if all code paths where filter AND qemu_domain are locked have a preceding qemu_driver lock that blocks their concurrent execution.

So, the following code paths exist towards qemu_driver:qemudVMFilterRebuild(), where we now want to put a qemu_driver lock in front of the filter lock.
-> nwfilterUndefine() [ locks the filter ]
   -> virNWFilterTestUnassignDef()
      -> virNWFilterTriggerVMFilterRebuild()
         -> qemudVMFilterRebuild()

-> nwfilterDefine()
   -> virNWFilterPoolAssignDef() [ locks the filter ]
      -> virNWFilterTriggerVMFilterRebuild()
         -> qemudVMFilterRebuild()

-> nwfilterDriverReload()
   -> virNWFilterPoolLoadAllConfigs()
      -> virNWFilterPoolObjLoad()
         -> virNWFilterPoolAssignDef() [ locks the filter ]
            -> virNWFilterTriggerVMFilterRebuild()
               -> qemudVMFilterRebuild()

-> nwfilterDriverStartup()
   -> virNWFilterPoolLoadAllConfigs()
      -> virNWFilterPoolObjLoad()
         -> virNWFilterPoolAssignDef() [ locks the filter ]
            -> virNWFilterTriggerVMFilterRebuild()
               -> qemudVMFilterRebuild()

Qemu is not the only driver using the nwfilter driver; the UML driver calls into it as well. Therefore qemudVMFilterRebuild() can be exchanged with umlVMFilterRebuild(), and the qemu_driver lock can correspondingly be a uml_driver lock. Further, since UML and Qemu domains can be running on the same machine, triggering a rebuild of a filter can touch both types of drivers and their domains.

In the patch below I now extend each nwfilter callback driver with functions for locking and unlocking the (VM) driver (UML, QEMU) and introduce new functions for locking all registered callback drivers and unlocking them. Then I distribute the lock-all-cbdrivers/unlock-all-cbdrivers calls into the above call paths. The last call path, starting with nwfilterDriverStartup(), is problematic since the nwfilter driver is initialized before the Qemu and UML drivers are, so a lock in that path would attempt to lock a NULL pointer. However, in that path virNWFilterTriggerVMFilterRebuild() is never called, so we never lock either the qemu_driver or the uml_driver there. Therefore, only the first three paths now receive calls to lock and unlock all callback drivers.
Now that the locks are distributed where it matters, I can remove the qemu_driver and uml_driver locks from qemudVMFilterRebuild() and umlVMFilterRebuild(), no longer requiring recursive locks.

For now I want to put this out as an RFC patch. I have tested it by 'stretching' the critical section after the define/undefine functions each lock the filter, so I can (easily) execute another VM operation (suspend, start) concurrently. That code is in this patch, and if you want you can deactivate it. It seems to work ok, and operations are blocked while the update is being done. I still also want to verify the other assumption above, that locking the filter and qemu_domain always has a preceding qemu_driver lock.

Signed-off-by: Stefan Berger <stefanb@us.ibm.com>

---
 src/conf/nwfilter_conf.c       |   18 ++++++++++++++++++
 src/conf/nwfilter_conf.h       |    6 ++++++
 src/libvirt_private.syms       |    2 ++
 src/nwfilter/nwfilter_driver.c |   26 ++++++++++++++++++++++++++
 src/qemu/qemu_driver.c         |   19 +++++++++++++++----
 src/uml/uml_driver.c           |   18 ++++++++++++++----
 6 files changed, 81 insertions(+), 8 deletions(-)

Index: libvirt-acl/src/conf/nwfilter_conf.h
===================================================================
--- libvirt-acl.orig/src/conf/nwfilter_conf.h
+++ libvirt-acl/src/conf/nwfilter_conf.h
@@ -639,6 +639,8 @@ void virNWFilterConfLayerShutdown(void);
 typedef int (*virNWFilterRebuild)(virConnectPtr conn,
                                   virHashIterator, void *data);

+typedef void (*virNWFilterVoidCall)(void);
+
 typedef struct _virNWFilterCallbackDriver virNWFilterCallbackDriver;
 typedef virNWFilterCallbackDriver *virNWFilterCallbackDriverPtr;
@@ -646,9 +648,13 @@ struct _virNWFilterCallbackDriver {
     const char *name;

     virNWFilterRebuild vmFilterRebuild;
+    virNWFilterVoidCall vmDriverLock;
+    virNWFilterVoidCall vmDriverUnlock;
 };

 void virNWFilterRegisterCallbackDriver(virNWFilterCallbackDriverPtr);
+void virNWFilterCallbackDriversLock(void);
+void virNWFilterCallbackDriversUnlock(void);

 VIR_ENUM_DECL(virNWFilterRuleAction);
Index: libvirt-acl/src/qemu/qemu_driver.c
===================================================================
--- libvirt-acl.orig/src/qemu/qemu_driver.c
+++ libvirt-acl/src/qemu/qemu_driver.c
@@ -12725,11 +12725,7 @@
 static int qemudVMFilterRebuild(virConnectPtr conn ATTRIBUTE_UNUSED,
                                 virHashIterator iter, void *data)
 {
-    struct qemud_driver *driver = qemu_driver;
-
-    qemuDriverLock(driver);
     virHashForEach(qemu_driver->domains.objs, iter, data);
-    qemuDriverUnlock(driver);
     return 0;
 }
@@ -12757,9 +12753,24 @@ qemudVMFiltersInstantiate(virConnectPtr
     return err;
 }

+
+static void
+qemudVMDriverLock(void) {
+    qemuDriverLock(qemu_driver);
+};
+
+
+static void
+qemudVMDriverUnlock(void) {
+    qemuDriverUnlock(qemu_driver);
+};
+
+
 static virNWFilterCallbackDriver qemuCallbackDriver = {
     .name = "QEMU",
     .vmFilterRebuild = qemudVMFilterRebuild,
+    .vmDriverLock = qemudVMDriverLock,
+    .vmDriverUnlock = qemudVMDriverUnlock,
 };

 int qemuRegister(void) {

Index: libvirt-acl/src/uml/uml_driver.c
===================================================================
--- libvirt-acl.orig/src/uml/uml_driver.c
+++ libvirt-acl/src/uml/uml_driver.c
@@ -2202,11 +2202,7 @@
 static int umlVMFilterRebuild(virConnectPtr conn ATTRIBUTE_UNUSED,
                               virHashIterator iter, void *data)
 {
-    struct uml_driver *driver = uml_driver;
-
-    umlDriverLock(driver);
     virHashForEach(uml_driver->domains.objs, iter, data);
-    umlDriverUnlock(driver);
     return 0;
 }
@@ -2219,9 +2215,23 @@ static virStateDriver umlStateDriver = {
     .active = umlActive,
 };

+static void
+umlVMDriverLock(void)
+{
+    umlDriverLock(uml_driver);
+}
+
+static void
+umlVMDriverUnlock(void)
+{
+    umlDriverUnlock(uml_driver);
+}
+
 static virNWFilterCallbackDriver umlCallbackDriver = {
     .name = "UML",
     .vmFilterRebuild = umlVMFilterRebuild,
+    .vmDriverLock = umlVMDriverLock,
+    .vmDriverUnlock = umlVMDriverUnlock,
 };

 int umlRegister(void) {

Index: libvirt-acl/src/conf/nwfilter_conf.c
===================================================================
--- libvirt-acl.orig/src/conf/nwfilter_conf.c
+++ libvirt-acl/src/conf/nwfilter_conf.c
@@ -2161,6 +2161,24 @@ virNWFilterRegisterCallbackDriver(virNWF
     }
 }

+void
+virNWFilterCallbackDriversLock(void)
+{
+    int i;
+
+    for (i = 0; i < nCallbackDriver; i++)
+        callbackDrvArray[i]->vmDriverLock();
+}
+
+void
+virNWFilterCallbackDriversUnlock(void)
+{
+    int i;
+
+    for (i = 0; i < nCallbackDriver; i++)
+        callbackDrvArray[i]->vmDriverUnlock();
+}
+
 static virHashIterator virNWFilterDomainFWUpdateCB;

Index: libvirt-acl/src/libvirt_private.syms
===================================================================
--- libvirt-acl.orig/src/libvirt_private.syms
+++ libvirt-acl/src/libvirt_private.syms
@@ -534,6 +534,8 @@ virNWFilterConfLayerInit;
 virNWFilterConfLayerShutdown;
 virNWFilterLockFilterUpdates;
 virNWFilterUnlockFilterUpdates;
+virNWFilterCallbackDriversLock;
+virNWFilterCallbackDriversUnlock;

 # nwfilter_params.h

Index: libvirt-acl/src/nwfilter/nwfilter_driver.c
===================================================================
--- libvirt-acl.orig/src/nwfilter/nwfilter_driver.c
+++ libvirt-acl/src/nwfilter/nwfilter_driver.c
@@ -34,6 +34,7 @@
 #include "memory.h"
 #include "domain_conf.h"
 #include "domain_nwfilter.h"
+#include "nwfilter_conf.h"
 #include "nwfilter_driver.h"
 #include "nwfilter_gentech_driver.h"
@@ -152,9 +153,13 @@ nwfilterDriverReload(void) {
     virNWFilterLearnThreadsTerminate(true);

     nwfilterDriverLock(driverState);
+    virNWFilterCallbackDriversLock();
+
     virNWFilterPoolLoadAllConfigs(conn,
                                   &driverState->pools,
                                   driverState->configDir);
+
+    virNWFilterCallbackDriversUnlock();
     nwfilterDriverUnlock(driverState);

     virConnectClose(conn);
@@ -328,12 +333,21 @@ nwfilterDefine(virConnectPtr conn,
     virNWFilterPtr ret = NULL;

     nwfilterDriverLock(driver);
+    virNWFilterCallbackDriversLock();
+
     if (!(def = virNWFilterDefParseString(conn, xml)))
         goto cleanup;

     if (!(pool = virNWFilterPoolObjAssignDef(conn, &driver->pools, def)))
         goto cleanup;

+#define CRITICAL_SECTION_STRETCH 0
+if (true) {
+    fprintf(stderr,"sleep in %s\n", __FUNCTION__);
+    sleep(CRITICAL_SECTION_STRETCH);
+    fprintf(stderr,"continue in %s\n", __FUNCTION__);
+}
+
     if (virNWFilterPoolObjSaveDef(driver, pool, def) < 0) {
         virNWFilterPoolObjRemove(&driver->pools, pool);
         def = NULL;
@@ -347,6 +361,8 @@ cleanup:
     virNWFilterDefFree(def);
     if (pool)
         virNWFilterPoolObjUnlock(pool);
+
+    virNWFilterCallbackDriversUnlock();
     nwfilterDriverUnlock(driver);

     return ret;
 }
@@ -359,6 +375,8 @@ nwfilterUndefine(virNWFilterPtr obj) {
     int ret = -1;

     nwfilterDriverLock(driver);
+    virNWFilterCallbackDriversLock();
+
     pool = virNWFilterPoolObjFindByUUID(&driver->pools, obj->uuid);
     if (!pool) {
         virNWFilterReportError(VIR_ERR_INVALID_NWFILTER,
@@ -366,6 +384,12 @@ nwfilterUndefine(virNWFilterPtr obj) {
         goto cleanup;
     }

+if (true) {
+    fprintf(stderr,"sleep in %s\n", __FUNCTION__);
+    sleep(CRITICAL_SECTION_STRETCH);
+    fprintf(stderr,"continue in %s\n", __FUNCTION__);
+}
+
     if (virNWFilterTestUnassignDef(obj->conn, pool)) {
         virNWFilterReportError(VIR_ERR_INVALID_NWFILTER,
                                "%s",
@@ -385,6 +409,8 @@ nwfilterUndefine(virNWFilterPtr obj) {
 cleanup:
     if (pool)
         virNWFilterPoolObjUnlock(pool);
+
+    virNWFilterCallbackDriversUnlock();
     nwfilterDriverUnlock(driver);

     return ret;
 }

I had trouble applying the patch (I think Thunderbird may have fiddled with the formatting :( ), but after applying it manually, it works excellently. Thanks!

--
Soren Hansen
Ubuntu Developer    http://www.ubuntu.com/
OpenStack Developer http://www.openstack.org/

On 10/07/2010 09:06 AM, Soren Hansen wrote:
I had trouble applying the patch (I think maybe Thunderbird may have fiddled with the formatting :( ), but after doing it manually, it works excellently. Thanks!
Great. I will prepare a V3.

I am also shooting a kill -SIGHUP at libvirt once in a while to see what happens (while creating/destroying 2 VMs and modifying their filters). Most of the time all goes well, but occasionally things do get stuck. I get the following debugging output from libvirt, and attaching gdb to libvirt I see the following stack traces. Maybe Daniel can interpret this... To me it looks like some of the conditions need to be 'tickled'...

09:47:25.000: error : qemuAutostartDomain:822 : Failed to start job on VM 'dummy-vm1': Timed out during operation: cannot acquire state change lock

(gdb) thr ap all bt

Thread 9 (Thread 0x7f49bf592710 (LWP 17464)):
#0  0x000000327680b729 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435312 in virCondWaitUntil (c=<value optimized out>, m=<value optimized out>, whenms=<value optimized out>) at util/threads-pthread.c:115
#2  0x000000000043d0ab in qemuDomainObjBeginJobWithDriver (driver=0x1f9c010, obj=0x7f49a00011b0) at qemu/qemu_driver.c:409
#3  0x0000000000458abf in qemuAutostartDomain (payload=<value optimized out>, name=<value optimized out>, opaque=0x7f49bf591320) at qemu/qemu_driver.c:818
#4  0x00007f49c040ab6a in virHashForEach (table=0x1f9be20, iter=0x458a90 <qemuAutostartDomain>, data=0x7f49bf591320) at util/hash.c:495
#5  0x000000000043cdac in qemudAutostartConfigs (driver=0x1f9c010) at qemu/qemu_driver.c:855
#6  0x000000000043ce2a in qemudReload () at qemu/qemu_driver.c:2003
#7  0x00007f49c0450a3e in virStateReload () at libvirt.c:1017
#8  0x00000000004189e1 in qemudDispatchSignalEvent (watch=<value optimized out>, fd=<value optimized out>, events=<value optimized out>, opaque=0x1f6f830) at libvirtd.c:388
#9  0x00000000004186a9 in virEventDispatchHandles () at event.c:479
#10 virEventRunOnce () at event.c:608
#11 0x000000000041a346 in qemudOneLoop () at libvirtd.c:2217
#12 0x000000000041a613 in qemudRunLoop (opaque=0x1f6f830) at libvirtd.c:2326
#13 0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f49beb91710 (LWP 17465)):
#0  0x000000327680b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435336 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:100
#2  0x000000000041b2e5 in qemudWorker (data=0x7f49b80008c0) at libvirtd.c:1549
#3  0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#4  0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49be190710 (LWP 17466)):
#0  0x000000327680b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435336 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:100
#2  0x00000000004716f6 in qemuMonitorSend (mon=0x7f49a00040b0, msg=<value optimized out>) at qemu/qemu_monitor.c:726
#3  0x0000000000473673 in qemuMonitorCommandWithHandler (mon=0x7f49a00040b0, cmd=0x4c316f "info cpus", passwordHandler=0, passwordOpaque=0x0, scm_fd=-1, reply=0x7f49be18e400) at qemu/qemu_monitor_text.c:233
#4  0x000000000047845e in qemuMonitorTextGetCPUInfo (mon=<value optimized out>, pids=0x7f49be18fa18) at qemu/qemu_monitor_text.c:401
#5  0x0000000000455db2 in qemuDetectVcpuPIDs (conn=0x7f49ac036100, driver=0x1f9c010, vm=<value optimized out>, migrateFrom=0x0, start_paused=false, stdin_fd=-1, stdin_path=0x0) at qemu/qemu_driver.c:2413
#6  qemudStartVMDaemon (conn=0x7f49ac036100, driver=0x1f9c010, vm=<value optimized out>, migrateFrom=0x0, start_paused=false, stdin_fd=-1, stdin_path=0x0) at qemu/qemu_driver.c:3925
#7  0x0000000000457f6b in qemudDomainCreate (conn=0x7f49ac036100, xml=<value optimized out>, flags=0) at qemu/qemu_driver.c:4559
#8  0x00007f49c04564ab in virDomainCreateXML (conn=0x7f49ac036100, xmlDesc=0x7f49a02b8bf0 " <domain type='kvm'>\n <name>dummy-vm2</name>\n <memory>32768</memory>\n <currentMemory>32768</currentMemory>\n <vcpu>1</vcpu>\n <os>\n <type>hvm</type>\n <boot dev='hd'/>\n </os>"..., flags=0) at libvirt.c:1984
#9  0x0000000000427da8 in remoteDispatchDomainCreateXml (server=<value optimized out>, client=<value optimized out>, conn=0x7f49ac036100, hdr=<value optimized out>, rerr=0x7f49be18fc70, args=<value optimized out>, ret=0x7f49be18fbc0) at remote.c:1271
#10 0x000000000042a657 in remoteDispatchClientCall (server=0x1f6f830, client=0x7f49b83f03c0, msg=0x7f49b8453800) at dispatch.c:529
#11 remoteDispatchClientRequest (server=0x1f6f830, client=0x7f49b83f03c0, msg=0x7f49b8453800) at dispatch.c:407
#12 0x000000000041b378 in qemudWorker (data=0x7f49b80008d8) at libvirtd.c:1570
#13 0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f49bd78f710 (LWP 17467)):
#0  0x000000327680b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435336 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:100
#2  0x000000000041b2e5 in qemudWorker (data=0x7f49b80008f0) at libvirtd.c:1549
#3  0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#4  0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f49bcd8e710 (LWP 17468)):
#0  0x000000327680b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435336 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:100
#2  0x00000000004716f6 in qemuMonitorSend (mon=0x7f49ac099740, msg=<value optimized out>) at qemu/qemu_monitor.c:726
#3  0x0000000000473673 in qemuMonitorCommandWithHandler (mon=0x7f49ac099740, cmd=0x7f49ac381620 "balloon 32", passwordHandler=0, passwordOpaque=0x0, scm_fd=-1, reply=0x7f49bcd8c420) at qemu/qemu_monitor_text.c:233
#4  0x0000000000475f45 in qemuMonitorTextSetBalloon (mon=0x7f49ac099740, newmem=<value optimized out>) at qemu/qemu_monitor_text.c:791
#5  0x0000000000455743 in qemudStartVMDaemon (conn=0x7f49a811f9c0, driver=0x1f9c010, vm=<value optimized out>, migrateFrom=0x0, start_paused=false, stdin_fd=-1, stdin_path=0x0) at qemu/qemu_driver.c:3942
#6  0x0000000000457f6b in qemudDomainCreate (conn=0x7f49a811f9c0, xml=<value optimized out>, flags=0) at qemu/qemu_driver.c:4559
#7  0x00007f49c04564ab in virDomainCreateXML (conn=0x7f49a811f9c0, xmlDesc=0x7f49ac2f9f20 " <domain type='kvm'>\n <name>dummy-vm1</name>\n <memory>32768</memory>\n <currentMemory>32768</currentMemory>\n <vcpu>1</vcpu>\n <os>\n <type>hvm</type>\n <boot dev='hd'/>\n </os>"..., flags=0) at libvirt.c:1984
#8  0x0000000000427da8 in remoteDispatchDomainCreateXml (server=<value optimized out>, client=<value optimized out>, conn=0x7f49a811f9c0, hdr=<value optimized out>, rerr=0x7f49bcd8dc70, args=<value optimized out>, ret=0x7f49bcd8dbc0) at remote.c:1271
#9  0x000000000042a657 in remoteDispatchClientCall (server=0x1f6f830, client=0x7f49b89b41d0, msg=0x7f49b8306460) at dispatch.c:529
#10 remoteDispatchClientRequest (server=0x1f6f830, client=0x7f49b89b41d0, msg=0x7f49b8306460) at dispatch.c:407
#11 0x000000000041b378 in qemudWorker (data=0x7f49b8000908) at libvirtd.c:1570
#12 0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#13 0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f49b7fff710 (LWP 17469)):
#0  0x000000327680b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435336 in virCondWait (c=<value optimized out>, m=<value optimized out>) at util/threads-pthread.c:100
#2  0x000000000041b2e5 in qemudWorker (data=0x7f49b8000920) at libvirtd.c:1549
#3  0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#4  0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f486e22b710 (LWP 4599)):
#0  0x00000032760d7de3 in poll () from /lib64/libc.so.6
#1  0x0000003279c07a02 in ?? () from /usr/lib64/libpcap.so.1
#2  0x0000003279c0c65b in pcap_next () from /usr/lib64/libpcap.so.1
#3  0x00000000004afce4 in learnIPAddressThread (arg=0x7f49b82de290) at nwfilter/nwfilter_learnipaddr.c:496
#4  0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f486d82a710 (LWP 4897)):
#0  0x00000032760d7de3 in poll () from /lib64/libc.so.6
#1  0x0000003279c07a02 in ?? () from /usr/lib64/libpcap.so.1
#2  0x0000003279c0c65b in pcap_next () from /usr/lib64/libpcap.so.1
#3  0x00000000004afce4 in learnIPAddressThread (arg=0x7f49b8802310) at nwfilter/nwfilter_learnipaddr.c:496
#4  0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032760e14ed in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f49c03c27e0 (LWP 17460)):
#0  0x0000003276807fbd in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000041e434 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:3272
(gdb)

On Thu, Oct 07, 2010 at 09:58:28AM -0400, Stefan Berger wrote:
On 10/07/2010 09:06 AM, Soren Hansen wrote:
I had trouble applying the patch (I think maybe Thunderbird may have fiddled with the formatting :( ), but after doing it manually, it works excellently. Thanks!
Great. I will prepare a V3.
I am also shooting a kill -SIGHUP at libvirt once in a while to see what happens (while creating / destroying 2 VMs and modifying their filters). Most of the time all goes well, but occasionally things do get stuck. I get the following debugging output from libvirt and attaching gdb to libvirt I see the following stack traces. Maybe Daniel can interpret this... To me it looks like some of the conditions need to be 'tickled'...
09:47:25.000: error : qemuAutostartDomain:822 : Failed to start job on VM 'dummy-vm1': Timed out during operation: cannot acquire state change lock
(gdb) thr ap all bt
Thread 9 (Thread 0x7f49bf592710 (LWP 17464)):
#0  0x000000327680b729 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435312 in virCondWaitUntil (c=<value optimized out>, m=<value optimized out>, whenms=<value optimized out>) at util/threads-pthread.c:115
#2  0x000000000043d0ab in qemuDomainObjBeginJobWithDriver (driver=0x1f9c010, obj=0x7f49a00011b0) at qemu/qemu_driver.c:409
#3  0x0000000000458abf in qemuAutostartDomain (payload=<value optimized out>, name=<value optimized out>, opaque=0x7f49bf591320) at qemu/qemu_driver.c:818
#4  0x00007f49c040ab6a in virHashForEach (table=0x1f9be20, iter=0x458a90 <qemuAutostartDomain>, data=0x7f49bf591320) at util/hash.c:495
#5  0x000000000043cdac in qemudAutostartConfigs (driver=0x1f9c010) at qemu/qemu_driver.c:855
#6  0x000000000043ce2a in qemudReload () at qemu/qemu_driver.c:2003
#7  0x00007f49c0450a3e in virStateReload () at libvirt.c:1017
#8  0x00000000004189e1 in qemudDispatchSignalEvent (watch=<value optimized out>, fd=<value optimized out>, events=<value optimized out>, opaque=0x1f6f830) at libvirtd.c:388
#9  0x00000000004186a9 in virEventDispatchHandles () at event.c:479
#10 virEventRunOnce () at event.c:608
#11 0x000000000041a346 in qemudOneLoop () at libvirtd.c:2217
#12 0x000000000041a613 in qemudRunLoop (opaque=0x1f6f830) at libvirtd.c:2326
#13 0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032760e14ed in clone () from /lib64/libc.so.6
This thread shows the problem. Guests must not be run directly from the event loop thread, because startup requires waiting for I/O events. So this thread is sitting on the condition variable waiting for an I/O event to complete, but because it's doing this from the event loop thread, the event loop isn't running, so the condition will never be signalled. This is completely unrelated to the other problems discussed in this thread, and I'm surprised we've not seen it before now!

When you send SIGHUP to libvirt, this triggers a reload of the guest domain configs. For some reason we also have this SIGHUP re-triggering autostart. IMHO this is a very big mistake. If a guest is marked as autostart, I don't think an admin would expect it to be started when just sending SIGHUP. I think we should fix it so that autostart is only ever done at daemon startup, not on SIGHUP. This would avoid the entire problem code path here.

Regards,
Daniel
--
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On 10/13/2010 09:11 AM, Daniel P. Berrange wrote:
On Thu, Oct 07, 2010 at 09:58:28AM -0400, Stefan Berger wrote:
On 10/07/2010 09:06 AM, Soren Hansen wrote:
I had trouble applying the patch (I think maybe Thunderbird may have fiddled with the formatting :( ), but after doing it manually, it works excellently. Thanks!
Great. I will prepare a V3.
I am also shooting a kill -SIGHUP at libvirt once in a while to see what happens (while creating / destroying 2 VMs and modifying their filters). Most of the time all goes well, but occasionally things do get stuck. I get the following debugging output from libvirt and attaching gdb to libvirt I see the following stack traces. Maybe Daniel can interpret this... To me it looks like some of the conditions need to be 'tickled'...
(gdb) thr ap all bt
Thread 9 (Thread 0x7f49bf592710 (LWP 17464)):
#0  0x000000327680b729 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000435312 in virCondWaitUntil (c=<value optimized out>, m=<value optimized out>, whenms=<value optimized out>) at util/threads-pthread.c:115
#2  0x000000000043d0ab in qemuDomainObjBeginJobWithDriver (driver=0x1f9c010, obj=0x7f49a00011b0) at qemu/qemu_driver.c:409
#3  0x0000000000458abf in qemuAutostartDomain (payload=<value optimized out>, name=<value optimized out>, opaque=0x7f49bf591320) at qemu/qemu_driver.c:818
#4  0x00007f49c040ab6a in virHashForEach (table=0x1f9be20, iter=0x458a90 <qemuAutostartDomain>, data=0x7f49bf591320) at util/hash.c:495
#5  0x000000000043cdac in qemudAutostartConfigs (driver=0x1f9c010) at qemu/qemu_driver.c:855
#6  0x000000000043ce2a in qemudReload () at qemu/qemu_driver.c:2003
#7  0x00007f49c0450a3e in virStateReload () at libvirt.c:1017
#8  0x00000000004189e1 in qemudDispatchSignalEvent (watch=<value optimized out>, fd=<value optimized out>, events=<value optimized out>, opaque=0x1f6f830) at libvirtd.c:388
#9  0x00000000004186a9 in virEventDispatchHandles () at event.c:479
#10 virEventRunOnce () at event.c:608
#11 0x000000000041a346 in qemudOneLoop () at libvirtd.c:2217
#12 0x000000000041a613 in qemudRunLoop (opaque=0x1f6f830) at libvirtd.c:2326
#13 0x0000003276807761 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032760e14ed in clone () from /lib64/libc.so.6
This thread shows the problem. Guests must not be run directly from the event loop thread, because startup requires waiting for I/O events. So this thread is sitting on the condition variable waiting for an I/O event to complete, but because its doing this from the event loop thread the event loop isn't running. So the condition will never be signalled. This is completely unrelated to the other problems discussed in this thread& I'm surprised we've not seen it before now!
Yes, it's unrelated and came up through my testing of the code paths I touched with the deadlock-prevention patch...
When you send SIGHUP to libvirt this triggers a reload of the guest domain configs. For some reason we also have this SIGHUP re-triggering autostart. IMHO this is a very big mistake. If a guest is marked as autostart, I don't think an admin would expect it to be started when just sending SIGHUP. I think we should fix it so that autostart is only ever done at daemon startup, not SIGHUP. This would avoid the entire problem code path here
FWIW, I don't have any VM marked as 'autostart', but the code seems to be doing something 'for' VMs no matter whether they are marked as autostart or not, i.e., it runs qemuDomainObjBeginJobWithDriver() for them.

Stefan

On Wed, Oct 06, 2010 at 05:56:36PM -0400, Stefan Berger wrote:
V2: remove the locks from qemudVMFilterRebuild & umlVMFilterRebuild [...] So, the following code paths exist towards qemu_driver:qemudVMFilterRebuild where we now want to put a qemu_driver lock in front of the filter lock.
-> nwfilterUndefine() [ locks the filter ] -> virNWFilterTestUnassignDef() -> virNWFilterTriggerVMFilterRebuild() -> qemudVMFilterRebuild()
-> nwfilterDefine() -> virNWFilterPoolAssignDef() [ locks the filter ] -> virNWFilterTriggerVMFilterRebuild() -> qemudVMFilterRebuild()
-> nwfilterDriverReload() -> virNWFilterPoolLoadAllConfigs() ->virNWFilterPoolObjLoad() -> virNWFilterPoolAssignDef() [ locks the filter ] -> virNWFilterTriggerVMFilterRebuild() -> qemudVMFilterRebuild()
-> nwfilterDriverStartup() -> virNWFilterPoolLoadAllConfigs() ->virNWFilterPoolObjLoad() -> virNWFilterPoolAssignDef() [ locks the filter ] -> virNWFilterTriggerVMFilterRebuild() -> qemudVMFilterRebuild()
Qemu is not the only driver using the nwfilter driver, but also the UML driver calls into it. Therefore qemuVMFilterRebuild() can be exchanged with umlVMFilterRebuild() along with the driver lock of qemu_driver that can now be a uml_driver. Further, since UML and Qemu domains can be running on the same machine, the triggering of a rebuild of the filter can touch both types of drivers and their domains.
okay
In the patch below I am now extending each nwfilter callback driver with functions for locking and unlocking the (VM) driver (UML, QEMU) and introduce new functions for locking all registered callback drivers and unlocking them. Then I am distributing the lock-all-cbdrivers/unlock-all-cbdrivers call into the above call paths. The last shown callpath starting with nwfilterDriverStart() is problematic since it is initialize before the Qemu and UML drives are and thus a lock in the path would result in a NULL pointer attempted to be locked -- the call to virNWFilterTriggerVMFilterRebuild() is never called, so we never lock either the qemu_driver or the uml_driver in that path. Therefore, only the first 3 paths now receive calls to lock and unlock all callback drivers. Now that the locks are distributed where it matters I can remove the qemu_driver and uml_driver lock from qemudVMFilterRebuild() and umlVMFilterRebuild() and not requiring the recursive locks.
okay,
For now I want to put this out as an RFC patch. I have tested it by 'stretching' the critical section after the define/undefine functions each lock the filter so I can (easily) concurrently execute another VM operation (suspend,start). That code is in this patch and if you want you can de-activate it. It seems to work ok and operations are being blocked while the update is being done. I still also want to verify the other assumption above that locking filter and qemu_domain always has a preceding qemu_driver lock.
I looked at the patch and the only thing that needs to be cleaned up is the stretching block below; otherwise it looks fine to me.
+if (true) {
+    fprintf(stderr,"sleep in %s\n", __FUNCTION__);
+    sleep(CRITICAL_SECTION_STRETCH);
+    fprintf(stderr,"continue in %s\n", __FUNCTION__);
+}
+
It would be good to have some way to exercise all the code paths in the locking (okay, error-handling-specific paths aside, because that's really hard) before applying.

Daniel
--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On 10/07/2010 09:52 AM, Daniel Veillard wrote:
On Wed, Oct 06, 2010 at 05:56:36PM -0400, Stefan Berger wrote:
V2: remove the locks from qemudVMFilterRebuild& umlVMFilterRebuild
For now I want to put this out as an RFC patch. I have tested it by 'stretching' the critical section after the define/undefine functions each lock the filter, so I can (easily) execute another VM operation (suspend, start) concurrently. That code is in this patch, and if you want you can deactivate it. It seems to work ok, and operations are blocked while the update is being done. I still also want to verify the other assumption above, that locking the filter and qemu_domain always has a preceding qemu_driver lock.

I looked at the patch and the only thing that needs to be cleaned up is the stretching block below; otherwise it looks fine to me.
Yes, I'll remove that.
+if (true) {
+    fprintf(stderr,"sleep in %s\n", __FUNCTION__);
+    sleep(CRITICAL_SECTION_STRETCH);
+    fprintf(stderr,"continue in %s\n", __FUNCTION__);
+}
+

It would be good to have some way to exercise all the code paths in the locking (okay, error-handling-specific paths aside, because that's really hard) before applying.
It will take some time, but I'll try to write a TCK test that does something like Soren was testing:
- concurrently create/destroy 2 VMs
- concurrently modify their filters
- plus occasionally shoot a SIGHUP at libvirt (if possible)
That would test some of the code paths.

Stefan
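The concurrent loops above might be sketched roughly like this. This is a dry-run sketch, not the actual TCK test: the VM names vm1/vm2, the filter file /tmp/testfilter.xml, and the iteration count are made-up placeholders, and VIRSH defaults to "echo virsh" so nothing is touched until you point it at a scratch libvirtd with VIRSH=virsh.

```shell
#!/bin/sh
# Stress sketch for the nwfilter locking paths. VIRSH defaults to a
# dry run ("echo virsh"); set VIRSH=virsh to run against a real host.
VIRSH="${VIRSH:-echo virsh}"
ITERATIONS="${ITERATIONS:-5}"

start_stop_vms() {      # exercises the VM-start lock sequence
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        $VIRSH start vm1
        $VIRSH start vm2
        $VIRSH destroy vm1
        $VIRSH destroy vm2
        i=$((i + 1))
    done
}

update_filters() {      # exercises the filter-update lock sequence
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        # Re-defining a filter triggers the VM filter rebuild path
        $VIRSH nwfilter-define /tmp/testfilter.xml
        i=$((i + 1))
    done
}

# Run both lock sequences concurrently; a deadlock shows up as a hang.
start_stop_vms &
update_filters &
# Optionally poke the reload path too (nwfilterDriverReload):
# pkill -HUP libvirtd
wait
echo "stress run finished"
```

With a real VIRSH this only becomes a meaningful test on a host where vm1/vm2 exist and reference the filter being redefined, so that the rebuild path actually walks the running domains.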

On Thu, Oct 07, 2010 at 10:06:55AM -0400, Stefan Berger wrote:
Yes, I'll remove that.
sure, it was an RFC :-)
It will take some time, but I'll try to write a TCK test that does something like Soren was testing:
- concurrently create/destroy 2 VMs
- concurrently modify their filters
- plus occasionally shoot a SIGHUP at libvirt (if possible)
That would test some of the code paths.
That would be nice, yes

Daniel

On Wed, Oct 06, 2010 at 05:56:36PM -0400, Stefan Berger wrote:
V2: remove the locks from qemudVMFilterRebuild & umlVMFilterRebuild
The 2nd part of the idea is that the sequence of locks (filter, qemu_domain) and (qemu_domain, filter) becomes interchangeable if all code paths where filter AND qemu_domain are locked have a preceding qemu_domain lock that basically blocks their concurrent execution
I'm not entirely happy with this locking scheme, because the broken lock ordering problem is still present. We're just relying on the fact that we retain the global lock to protect us against the ongoing lock ordering issue.

That said, I don't see any immediately obvious alternative to solve this problem. So I guess I'll ack this approach for now. One day this will likely need re-visiting...

Daniel

--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On 10/13/2010 09:07 AM, Daniel P. Berrange wrote:
I'm not entirely happy with this locking scheme, because the broken lock ordering problem is still present. We're just relying on the fact that we retain the global lock to protect us against the ongoing lock ordering issue.
That said, I don't see any immediately obvious alternative to solve this problem. So I guess I'll ack this approach for now. One day this will likely need re-visiting...
Ok, I'll push it.

Stefan
participants (4)
- Daniel P. Berrange
- Daniel Veillard
- Soren Hansen
- Stefan Berger