[libvirt] PATCH: Fix DBus thread safety

We are using DBus from multiple threads in libvirtd. We are not calling dbus_threads_init_default(). Very bad things result. Happy-crashy-daemon This patch fixes that, and also stops DBus messing around with SIGPIPE and also stops it calling exit() when the bus disconnects, so we can actually see the error & continue with life. Daniel diff -rup libvirt-0.6.0.orig/qemud/qemud.c libvirt-0.6.0.new/qemud/qemud.c --- libvirt-0.6.0.orig/qemud/qemud.c 2009-02-18 10:56:34.000000000 +0000 +++ libvirt-0.6.0.new/qemud/qemud.c 2009-02-18 12:52:18.000000000 +0000 @@ -860,6 +860,10 @@ static struct qemud_server *qemudNetwork if (auth_unix_rw == REMOTE_AUTH_POLKIT || auth_unix_ro == REMOTE_AUTH_POLKIT) { DBusError derr; + + dbus_connection_set_change_sigpipe(FALSE); + dbus_threads_init_default(); + dbus_error_init(&derr); server->sysbus = dbus_bus_get(DBUS_BUS_SYSTEM, &derr); if (!(server->sysbus)) { @@ -868,6 +872,7 @@ static struct qemud_server *qemudNetwork dbus_error_free(&derr); goto cleanup; } + dbus_connection_set_exit_on_disconnect(server->sysbus, FALSE); } #endif diff -rup libvirt-0.6.0.orig/src/node_device_hal.c libvirt-0.6.0.new/src/node_device_hal.c --- libvirt-0.6.0.orig/src/node_device_hal.c 2009-01-16 12:44:22.000000000 +0000 +++ libvirt-0.6.0.new/src/node_device_hal.c 2009-02-18 12:52:48.000000000 +0000 @@ -685,6 +685,9 @@ static int halDeviceMonitorStartup(void) nodeDeviceLock(driverState); /* Allocate and initialize a new HAL context */ + dbus_connection_set_change_sigpipe(FALSE); + dbus_threads_init_default(); + dbus_error_init(&err); hal_ctx = libhal_ctx_new(); if (hal_ctx == NULL) { @@ -696,6 +699,8 @@ static int halDeviceMonitorStartup(void) fprintf(stderr, "%s: dbus_bus_get failed\n", __FUNCTION__); goto failure; } + dbus_connection_set_exit_on_disconnect(dbus_conn, FALSE); + if (!libhal_ctx_set_dbus_connection(hal_ctx, dbus_conn)) { fprintf(stderr, "%s: libhal_ctx_set_dbus_connection failed\n", __FUNCTION__); -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

"Daniel P. Berrange" <berrange@redhat.com> wrote:
We are using DBus from multiple threads in libvirtd. We are not calling dbus_threads_init_default(). Very bad things result. Happy-crashy-daemon
This patch fixes that, and also stops DBus messing around with SIGPIPE and also stops it calling exit() when the bus disconnects, so we can actually see the error & continue with life.
These all look fine. I can attest to having stared hard at dbus' SIGPIPE-handling code (and asking why, oh why, a *library* handles a signal, by default) so am glad to see it turned off here. However, now there are 3 dbus-init-related lines in each of those files, not counting the slightly-separated *_set_exit_on_disconnect call. It'd be nice to encapsulate those, or at least to add a comment before each saying to keep them in sync. Hmm. actually there are four calls in each place: dbus_connection_set_change_sigpipe(FALSE); dbus_threads_init_default(); dbus_error_init ... = dbus_bus_get(... Only the last one can fail. For the record, are these crashes easy to reproduce?
diff -rup libvirt-0.6.0.orig/qemud/qemud.c libvirt-0.6.0.new/qemud/qemud.c --- libvirt-0.6.0.orig/qemud/qemud.c 2009-02-18 10:56:34.000000000 +0000 +++ libvirt-0.6.0.new/qemud/qemud.c 2009-02-18 12:52:18.000000000 +0000 @@ -860,6 +860,10 @@ static struct qemud_server *qemudNetwork if (auth_unix_rw == REMOTE_AUTH_POLKIT || auth_unix_ro == REMOTE_AUTH_POLKIT) { DBusError derr; + + dbus_connection_set_change_sigpipe(FALSE); + dbus_threads_init_default(); + dbus_error_init(&derr); server->sysbus = dbus_bus_get(DBUS_BUS_SYSTEM, &derr); if (!(server->sysbus)) { @@ -868,6 +872,7 @@ static struct qemud_server *qemudNetwork dbus_error_free(&derr); goto cleanup; } + dbus_connection_set_exit_on_disconnect(server->sysbus, FALSE); } #endif
diff -rup libvirt-0.6.0.orig/src/node_device_hal.c libvirt-0.6.0.new/src/node_device_hal.c --- libvirt-0.6.0.orig/src/node_device_hal.c 2009-01-16 12:44:22.000000000 +0000 +++ libvirt-0.6.0.new/src/node_device_hal.c 2009-02-18 12:52:48.000000000 +0000 @@ -685,6 +685,9 @@ static int halDeviceMonitorStartup(void) nodeDeviceLock(driverState);
/* Allocate and initialize a new HAL context */ + dbus_connection_set_change_sigpipe(FALSE); + dbus_threads_init_default(); + dbus_error_init(&err); hal_ctx = libhal_ctx_new(); if (hal_ctx == NULL) { @@ -696,6 +699,8 @@ static int halDeviceMonitorStartup(void) fprintf(stderr, "%s: dbus_bus_get failed\n", __FUNCTION__); goto failure; } + dbus_connection_set_exit_on_disconnect(dbus_conn, FALSE); + if (!libhal_ctx_set_dbus_connection(hal_ctx, dbus_conn)) { fprintf(stderr, "%s: libhal_ctx_set_dbus_connection failed\n", __FUNCTION__);

On Wed, 2009-02-18 at 18:56 +0100, Jim Meyering wrote:
For the record, are these crashes easy to reproduce?
See: https://bugzilla.redhat.com/484553 this is reproducible by connecting to a vm with "virt-viewer -c qemu:///system foo" as an unprivileged user. Happens maybe 1 in 10 times. Cheers, Mark.

On Wed, 2009-02-18 at 18:56 +0100, Jim Meyering wrote:
For the record, are these crashes easy to reproduce? See:
https://bugzilla.redhat.com/484553
this is reproducible by connecting to a vm with "virt-viewer -c qemu:///system foo" as an unprivileged user. Happens maybe 1 in 10 times.
Thanks, Mark. I've taken the unusual step of adding this line to each of the affected cvs commit logs: Addresses http://bugzilla.redhat.com/484553 When a bug fixes a BZ like that, it really helps to mention the BZ number in the commit log, even if it's only a partial explanation, or one of several bugs fixed. Also, I see that there's more info in the ChangeLog than in the commit log, so perhaps we should be treating the ChangeLog as the authoritative source. I find it most useful to make commit log and ChangeLog entries identical. Then, it doesn't matter which I search; I get the same info. Ideally, we'll maintain only one in the long run.

On Wed, Feb 18, 2009 at 06:56:08PM +0100, Jim Meyering wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote:
We are using DBus from multiple threads in libvirtd. We are not calling dbus_threads_init_default(). Very bad things result. Happy-crashy-daemon
This patch fixes that, and also stops DBus messing around with SIGPIPE and also stops it calling exit() when the bus disconnects, so we can actually see the error & continue with life.
These all look fine. I can attest to having stared hard at dbus' SIGPIPE-handling code (and asking why, oh why, a *library* handles a signal, by default) so am glad to see it turned off here.
However, now there are 3 dbus-init-related lines in each of those files, not counting the slightly-separated *_set_exit_on_disconnect call. It'd be nice to encapsulate those, or at least to add a comment before each saying to keep them in sync.
Hmm. actually there are four calls in each place:
dbus_connection_set_change_sigpipe(FALSE); dbus_threads_init_default(); dbus_error_init ... = dbus_bus_get(...
Only the last one can fail.
For the record, are these crashes easy to reproduce?
Sometimes. When running on an SMP system, if you have the libvirtd running as root, and then as non-root while logged into GNOME, run 'for i in `seq 1 100` ; do virsh -c qemu:///system list ; done' it will probably fail somewhere between the 10 and 30 mark. If you run under GDB it'll often not fail till the 70 mark. And it never fails under valgrind, because valgrind doesn't allow true concurrency of the threads. Sometimes it would SEGV, other times it would just exit due to RPC message corruption. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Wed, Feb 18, 2009 at 01:59:24PM +0000, Daniel P. Berrange wrote:
We are using DBus from multiple threads in libvirtd. We are not calling dbus_threads_init_default(). Very bad things result. Happy-crashy-daemon
This patch fixes that, and also stops DBus messing around with SIGPIPE and also stops it calling exit() when the bus disconnects, so we can actually see the error & continue with life.
Okay, that patch was never commited though an important fix, so I pushed it to CVS, thanks ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
participants (4)
-
Daniel P. Berrange
-
Daniel Veillard
-
Jim Meyering
-
Mark McLoughlin