[libvirt] PATCH: Ensure errors are guarenteed reported in virConnectOpen

The virConnectOpen method is unfortuantely rather special. While there is a virConnect object available, the current rule is that drivers must report errors against the global error object, because upon failure no virConnect object will be returned to the caller. Unforatunately, despite frequent cleanups of code getting this wrong, I've just audited the remote driver and found another 20 or so places where its wrong. This is clearly not a sustainable approach to error reporting. The guarenteed correct solution is actually rather simple - Always report errors against the virConnect object, even in the driver open method - In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError. With this in place we can change all the drivers back to always report against the error object, and thus cleanup some disgusting code like __virRaiseError (in_open ? NULL : conn, ... To just __virRaiseError (conn, ... Daniel Index: src/libvirt.c =================================================================== RCS file: /data/cvs/libvirt/src/libvirt.c,v retrieving revision 1.151 diff -u -r1.151 libvirt.c --- src/libvirt.c 18 Aug 2008 09:24:46 -0000 1.151 +++ src/libvirt.c 19 Aug 2008 10:29:51 -0000 @@ -735,10 +735,8 @@ name = "xen:///"; ret = virGetConnect(); - if (ret == NULL) { - virLibConnError(NULL, VIR_ERR_NO_MEMORY, _("allocating connection")); + if (ret == NULL) return NULL; - } uri = xmlParseURI (name); if (!uri) { @@ -839,7 +837,21 @@ failed: if (ret->driver) ret->driver->close (ret); if (uri) xmlFreeURI(uri); - virUnrefConnect(ret); + + /* If not global error was set, copy any error set + in the connection object we're about to dispose of */ + if (__lastErr.code == VIR_ERR_OK) { + memcpy(&__lastErr, &ret->err, sizeof(ret->err)); + memset(&ret->err, 0, sizeof(ret->err)); + } + + /* Still no error set, then raise a generic error */ + if (__lastErr.code == VIR_ERR_OK) + virLibConnError (NULL, VIR_ERR_INTERNAL_ERROR, + _("unable to open connection")); + + virUnrefConnect(ret); + return NULL; } -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Tue, Aug 19, 2008 at 11:35:18AM +0100, Daniel P. Berrange wrote:
The virConnectOpen method is unfortuantely rather special. While there is a virConnect object available, the current rule is that drivers must report errors against the global error object, because upon failure no virConnect object will be returned to the caller. Unforatunately, despite frequent cleanups of code getting this wrong, I've just audited the remote driver and found another 20 or so places where its wrong. This is clearly not a sustainable approach to error reporting.
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
With this in place we can change all the drivers back to always report against the error object, and thus cleanup some disgusting code like
__virRaiseError (in_open ? NULL : conn, ...
To just
__virRaiseError (conn, ...
Okay, make sense, it's better to be more flexible in one location and simplify code everywhere else. +1 Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Tue, Aug 19, 2008 at 11:35:18AM +0100, Daniel P. Berrange wrote:
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
+1 , although unfortunately this isn't thread-safe. Nothing can be thread-safe unless we change the API. Rich. -- Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top

On Wed, Aug 20, 2008 at 02:22:58PM +0100, Richard W.M. Jones wrote:
On Tue, Aug 19, 2008 at 11:35:18AM +0100, Daniel P. Berrange wrote:
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
+1 , although unfortunately this isn't thread-safe. Nothing can be thread-safe unless we change the API.
Well we're not making it any less thread-safe. You alrady have to use the virGetLastError() global func - we're simply making sure it actually records all errors. To make it thread-safe we'll need to add a real virGetThreadLastError() API, which is something on my todo list - with that new apps can just call thevirGetThreadLastError() exclusively and never need to know the distinction between global/connection errors which causes so much trouble. I'm fairly sure I can preseve existing semantics at same time with some suitable internal cleverness. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Wed, Aug 20, 2008 at 07:24:59PM +0100, Daniel P. Berrange wrote:
On Wed, Aug 20, 2008 at 02:22:58PM +0100, Richard W.M. Jones wrote:
On Tue, Aug 19, 2008 at 11:35:18AM +0100, Daniel P. Berrange wrote:
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
+1 , although unfortunately this isn't thread-safe. Nothing can be thread-safe unless we change the API.
Well we're not making it any less thread-safe. You alrady have to use the virGetLastError() global func - we're simply making sure it actually records all errors.
To make it thread-safe we'll need to add a real virGetThreadLastError() API, which is something on my todo list - with that new apps can just call thevirGetThreadLastError() exclusively and never need to know the distinction between global/connection errors which causes so much trouble. I'm fairly sure I can preseve existing semantics at same time with some suitable internal cleverness.
I really wished we could avoid thread local storage mess, and in general anything having to do with API exported global variables. In general (I mean for the vast majority of the userland code dealing with libvirt) there is always a domain or connection object where we can plug the error, and provide it in-context. The only exception is the connection creation, maybe this means we need to provide a better entry point for this, but I would really prefer to not expose the notion of thread at the API level. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Thu, Aug 21, 2008 at 08:35:25AM +0200, Daniel Veillard wrote:
On Wed, Aug 20, 2008 at 07:24:59PM +0100, Daniel P. Berrange wrote:
On Wed, Aug 20, 2008 at 02:22:58PM +0100, Richard W.M. Jones wrote:
On Tue, Aug 19, 2008 at 11:35:18AM +0100, Daniel P. Berrange wrote:
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
+1 , although unfortunately this isn't thread-safe. Nothing can be thread-safe unless we change the API.
Well we're not making it any less thread-safe. You alrady have to use the virGetLastError() global func - we're simply making sure it actually records all errors.
To make it thread-safe we'll need to add a real virGetThreadLastError() API, which is something on my todo list - with that new apps can just call thevirGetThreadLastError() exclusively and never need to know the distinction between global/connection errors which causes so much trouble. I'm fairly sure I can preseve existing semantics at same time with some suitable internal cleverness.
I really wished we could avoid thread local storage mess, and in general anything having to do with API exported global variables. In general (I mean for the vast majority of the userland code dealing with libvirt) there is always a domain or connection object where we can plug the error, and provide it in-context. The only exception is the connection creation, maybe this means we need to provide a better entry point for this, but I would really prefer to not expose the notion of thread at the API level.
Well you see I want to be able to use the same virConnectPtr object from multiple threads. I've looked at how todo this and it actually won't be very hard once we have a way to report thread-local errors. Re-using a single virConnectPtr object is important because when talking to a remote libvirtd instance, you don't want to have to open multiple connections for each thread in your app, because that will require the user to re-authenticate multiple times (or for the app to store your credentials - not cool) Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Aug 21, 2008 at 09:41:12AM +0100, Daniel P. Berrange wrote:
On Thu, Aug 21, 2008 at 08:35:25AM +0200, Daniel Veillard wrote:
To make it thread-safe we'll need to add a real virGetThreadLastError() API, which is something on my todo list - with that new apps can just call thevirGetThreadLastError() exclusively and never need to know the distinction between global/connection errors which causes so much trouble. I'm fairly sure I can preseve existing semantics at same time with some suitable internal cleverness.
I really wished we could avoid thread local storage mess, and in general anything having to do with API exported global variables. In general (I mean for the vast majority of the userland code dealing with libvirt) there is always a domain or connection object where we can plug the error, and provide it in-context. The only exception is the connection creation, maybe this means we need to provide a better entry point for this, but I would really prefer to not expose the notion of thread at the API level.
Well you see I want to be able to use the same virConnectPtr object from multiple threads. I've looked at how todo this and it actually won't be very hard once we have a way to report thread-local errors.
not very hard ??? You expect to put a global lock in the connection The threading problem is a pandora box, once you provide the functionality people will use it in ways you didn't expect. You will see things like people closing the connection in one thread while another thread is still using it. I don't think you can catch all scenarios so i'm quite worried about suggesting it will be usable and then facing undebuggable scenarios coming from lost users.
Re-using a single virConnectPtr object is important because when talking to a remote libvirtd instance, you don't want to have to open multiple connections for each thread in your app, because that will require the user to re-authenticate multiple times (or for the app to store your credentials - not cool)
you keep one thread per connection. You will have to lock anyway, it's not like you will gain anything by trying to parallelize on a single connection. I really don't see what you're trying to do. Even entry points without side effect will need some kind of locking. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Thu, Aug 21, 2008 at 11:30:11AM +0200, Daniel Veillard wrote:
On Thu, Aug 21, 2008 at 09:41:12AM +0100, Daniel P. Berrange wrote:
On Thu, Aug 21, 2008 at 08:35:25AM +0200, Daniel Veillard wrote:
To make it thread-safe we'll need to add a real virGetThreadLastError() API, which is something on my todo list - with that new apps can just call thevirGetThreadLastError() exclusively and never need to know the distinction between global/connection errors which causes so much trouble. I'm fairly sure I can preseve existing semantics at same time with some suitable internal cleverness.
I really wished we could avoid thread local storage mess, and in general anything having to do with API exported global variables. In general (I mean for the vast majority of the userland code dealing with libvirt) there is always a domain or connection object where we can plug the error, and provide it in-context. The only exception is the connection creation, maybe this means we need to provide a better entry point for this, but I would really prefer to not expose the notion of thread at the API level.
Well you see I want to be able to use the same virConnectPtr object from multiple threads. I've looked at how todo this and it actually won't be very hard once we have a way to report thread-local errors.
not very hard ??? You expect to put a global lock in the connection The threading problem is a pandora box, once you provide the functionality people will use it in ways you didn't expect. You will see things like people closing the connection in one thread while another thread is still using it. I don't think you can catch all scenarios so i'm quite worried about suggesting it will be usable and then facing undebuggable scenarios coming from lost users.
Well, ok so let me step back a bit because there's actually several layers to this issue... - The libvirtd is single threaded. This was reasonable at first, but some of our APIs take a long time to complete, so we need to have libvirtd parallelized. - The QEMU/LXC drivers have state stored in the daemon, so even if you merely allow libvirtd to parallelize by putting long running connections in the 'background' you ned to have locking in the QEMU/LXC drivers. - Given that we need to have locking anyway, might as well just make the libvirtd daemon properly multi-threaded, rather than trying to have a multi-request queuing system. - Once you've done all that, then solving the virError thread-local issue is all that remains to allow virConnect to be accessed by multiple threads We need to do 1->3 in order to make libvirtd more scalable. At which point 4 becomes 'not very hard' :-) That said, there is one other option I've thought about instead of doing step 4, we could add a 'virConnectDup' method. This would take an existing connection object, and create an independant copy of it that another thread could use. The key would be that the internal driver would automagically pass authentication credentials somehow, so you wouldn't have to re-authenticate the 2nd connection. For the remote driver, this could be as simple as invoking an RPC call to fetch a one-time key, and passing that to the 2nd connection to use as their authentication credential.
Re-using a single virConnectPtr object is important because when talking to a remote libvirtd instance, you don't want to have to open multiple connections for each thread in your app, because that will require the user to re-authenticate multiple times (or for the app to store your credentials - not cool)
you keep one thread per connection. You will have to lock anyway, it's not like you will gain anything by trying to parallelize on a single connection. I really don't see what you're trying to do. Even entry points without side effect will need some kind of locking.
The libvirt APIs are in fact very parallelizable. There are very few times you need to access state on the virConnectPtr object - most of the time we're dealing with virDomainPtr objects. So you merely need to lock the connection, while you obtain a lock on the virDomainPtr object, then you can release the connection lock. So the connection lock is held for virtually no time at all. I reckon it'll parallelize very well. But its probably not worth getting into this debate unless I've actually got some proof of concept code to show, which is probably quite a long way out timewise... Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Aug 21, 2008 at 10:42:51AM +0100, Daniel P. Berrange wrote:
Well, ok so let me step back a bit because there's actually several layers to this issue...
- The libvirtd is single threaded. This was reasonable at first, but some of our APIs take a long time to complete, so we need to have libvirtd parallelized.
- The QEMU/LXC drivers have state stored in the daemon, so even if you merely allow libvirtd to parallelize by putting long running connections in the 'background' you ned to have locking in the QEMU/LXC drivers.
- Given that we need to have locking anyway, might as well just make the libvirtd daemon properly multi-threaded, rather than trying to have a multi-request queuing system.
- Once you've done all that, then solving the virError thread-local issue is all that remains to allow virConnect to be accessed by multiple threads
We need to do 1->3 in order to make libvirtd more scalable. At which point 4 becomes 'not very hard' :-)
That said, there is one other option I've thought about instead of doing step 4, we could add a 'virConnectDup' method. This would take an existing connection object, and create an independant copy of it that another thread could use. The key would be that the internal driver would automagically pass authentication credentials somehow, so you wouldn't have to re-authenticate the 2nd connection. For the remote driver, this could be as simple as invoking an RPC call to fetch a one-time key, and passing that to the 2nd connection to use as their authentication credential.
Actually the one-time key idea wouldn't work because that wouldn't let us negotiate the session encryption, which is done by SASL at the time we authenticate the client. Rich suggested perhaps sharing a single TCP connection and multiplexing messages on it - we have a serial number with every message/reply so this is doable in theory. It'd require some fun code to hand-off replies between the virConnect objects sharing the connection, but might be more viable in general. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

"Daniel P. Berrange" <berrange@redhat.com> wrote:
The virConnectOpen method is unfortuantely rather special. While there is a virConnect object available, the current rule is that drivers must report errors against the global error object, because upon failure no virConnect object will be returned to the caller. Unforatunately, despite frequent cleanups of code getting this wrong, I've just audited the remote driver and found another 20 or so places where its wrong. This is clearly not a sustainable approach to error reporting.
The guarenteed correct solution is actually rather simple
- Always report errors against the virConnect object, even in the driver open method
- In the cleanup patch of do_open() in libvirt.c, if no global error is set, copy the error from the virConnect object's virError, into the global virError.
With this in place we can change all the drivers back to always report against the error object, and thus cleanup some disgusting code like
__virRaiseError (in_open ? NULL : conn, ...
To just
__virRaiseError (conn, ...
Looks good, indeed. ACK
Index: src/libvirt.c =================================================================== ... + /* If not global error was set, copy any error set
s/not/no/
participants (4)
-
Daniel P. Berrange
-
Daniel Veillard
-
Jim Meyering
-
Richard W.M. Jones