Public virErrorPreserveLast()/virErrorRestore()

These two functions are currently private to libvirt, hence not available to consumers. Would it be possible to make them public? Without them, there's no way to do any libvirt call without stomping on an existing error that you may want to preserve. I have multiple threads sharing a remote connection (to local libvirtd). I need to be able to handle connection drops (e.g. libvirtd restart). Long story short, the only approach I've found that actually works properly is that there's one main conn object, each thread has a virConnectRef() to it, and whenever a thread gets an error, we check in the error callback if !virConnectIsAlive(). If so, we close the thread's conn, and potentially also clean up the main conn. Then when the error gets returned from the relevant library call, we try to re-connect. However, virConnectIsAlive() stomps on the thread's error, meaning we lose the context on a real, non-connection-related error. So I need to be able to save and restore in the error handler. Given how much these are used inside libvirt code already, it feels likely it would be useful in other public cases too. Thoughts? thnaks john

On 12/13/23 12:04, John Levon wrote:
These two functions are currently private to libvirt, hence not available to consumers. Would it be possible to make them public? Without them, there's no way to do any libvirt call without stomping on an existing error that you may want to preserve.
I have multiple threads sharing a remote connection (to local libvirtd). I need to be able to handle connection drops (e.g. libvirtd restart). Long story short, the only approach I've found that actually works properly is that there's one main conn object, each thread has a virConnectRef() to it, and whenever a thread gets an error, we check in the error callback if !virConnectIsAlive(). If so, we close the thread's conn, and potentially also clean up the main conn.
Then when the error gets returned from the relevant library call, we try to re-connect.
However, virConnectIsAlive() stomps on the thread's error, meaning we lose the context on a real, non-connection-related error.
So I need to be able to save and restore in the error handler.
Given how much these are used inside libvirt code already, it feels likely it would be useful in other public cases too.
Thoughts?
Yeah, it makes sense. I can see it used also in cleanup code (just like we do in our code). For instance: void func() { ... if (virDomainCreate(dom) < 0) goto cleanup; ... cleanup: /* save error here */ if (dom) virDomainDestroy(dom); if (conn) virConnectClose(conn); ... /* restore error here */ } Trouble is that every public API resets the error (by design). And while the error struct is public (which means users can write functions to create its copy), we already do have functions for that. So let's just expose them. Michal

On Thu, Dec 14, 2023 at 10:17:01 +0100, Michal Prívozník wrote:
On 12/13/23 12:04, John Levon wrote:
These two functions are currently private to libvirt, hence not available to consumers. Would it be possible to make them public? Without them, there's no way to do any libvirt call without stomping on an existing error that you may want to preserve.
I have multiple threads sharing a remote connection (to local libvirtd). I need to be able to handle connection drops (e.g. libvirtd restart). Long story short, the only approach I've found that actually works properly is that there's one main conn object, each thread has a virConnectRef() to it, and whenever a thread gets an error, we check in the error callback if !virConnectIsAlive(). If so, we close the thread's conn, and potentially also clean up the main conn.
Then when the error gets returned from the relevant library call, we try to re-connect.
However, virConnectIsAlive() stomps on the thread's error, meaning we lose the context on a real, non-connection-related error.
So I need to be able to save and restore in the error handler.
Given how much these are used inside libvirt code already, it feels likely it would be useful in other public cases too.
Thoughts?
Yeah, it makes sense. I can see it used also in cleanup code (just like we do in our code). For instance:
void func() {
... if (virDomainCreate(dom) < 0) goto cleanup; ... cleanup: /* save error here */ if (dom) virDomainDestroy(dom); if (conn) virConnectClose(conn); ... /* restore error here */ }
Trouble is that every public API resets the error (by design). And while the error struct is public (which means users can write functions to create its copy), we already do have functions for that. So let's just expose them.
While I have no problems with virErrorPreserveLast() as that's just syntax sugar on top of existing public functions, I don't think that it's a good idea to expose virErrorRestore() as that gives the access to previously unexposed virSetError(). This gives the users possibility to set arbitrary errors due to the error struct being public as noted above. Saving an error to a custom error object should be enough and I don't really see a point in making libvirt carry it around with the possibility of it being overwritten at any point by another library call. Apps can then pass the object instance around as needed with more explicit logic behind it.

On Thu, Dec 14, 2023 at 12:18:27PM +0100, Peter Krempa wrote:
While I have no problems with virErrorPreserveLast() as that's just syntax sugar on top of existing public functions, I don't think that it's a good idea to expose virErrorRestore() as that gives the access to previously unexposed virSetError(). This gives the users possibility to set arbitrary errors due to the error struct being public as noted above.
This sounds very much like a "don't do that then" scenario.
Saving an error to a custom error object should be enough and I don't really see a point in making libvirt carry it around with the possibility of it being overwritten at any point by another library call.
I'm not very clear on your suggestion. If I can't restore the libvirt error in the error callback, how would the eventual API call return have the right error? Are you suggesting I have to have custom logic to save the error to some thread-local location, and change every single API call handling to look there, ignoring the API return value and error object? Or do you have a suggestion I'm missing that is more elegant than this? I think it's pretty illustrative that the library itself uses these calls everywhere. regards john

On Thu, Dec 14, 2023 at 12:03:55 +0000, John Levon wrote:
On Thu, Dec 14, 2023 at 12:18:27PM +0100, Peter Krempa wrote:
While I have no problems with virErrorPreserveLast() as that's just syntax sugar on top of existing public functions, I don't think that it's a good idea to expose virErrorRestore() as that gives the access to previously unexposed virSetError(). This gives the users possibility to set arbitrary errors due to the error struct being public as noted above.
This sounds very much like a "don't do that then" scenario.
Saving an error to a custom error object should be enough and I don't really see a point in making libvirt carry it around with the possibility of it being overwritten at any point by another library call.
Please note that virErrorRestore() is used (almost [1]) exclusively inside libvirt's library code, thus *before* the error itself gets raised to the caller of a public libvirt API. Also TBH I'm not a fan of the passing of errors via the thread local variable because, while it's convenient in the simple cases, for any more complicated cases it just creates headaches (even internally). As in your original problem statement it also makes problems when you want to associate the error itself with the function call that caused it.
I'm not very clear on your suggestion. If I can't restore the libvirt error in the error callback, how would the eventual API call return have the right error?
Are you suggesting I have to have custom logic to save the error to some thread-local location, and change every single API call handling to look there, ignoring the API return value and error object?
If you need any form of more complex logic in your application for error reporting (ab)using libvirt's thread local variable to store the error for other functions in your application will not bring too much simplification of the logic itself. Additionally care would still have to be taken to use e.g. libvirt's cleanup functions which would still reset the thread local error. Applications which wish to do delayed error reporting have the means to do so by having their own instance of virError, either per process (as virsh does), or any other granularity, including a thread local one if they wish to do so. To use such error reporting you can use virSaveLastError(), which retrieves the proper error object and then pass it via your own structures with known lifetimes and without the possibly surprising resetting of the error. Using the global error handler function is not a very good idea in multithreaded applications, so I'd really suggest to have a generic handler function that you'll call after receiving a failure code from any of the libvirt APIs you care about and that will preserve/raise the error in your application. Additionally for handling disconnects libvirt provides a connection close callback registrable via virConnectRegisterCloseCallback(), which can be registered multiple times with private data for each instance. That might help you getting rid of the generic error handler's duty to catch disconnects. I understand that the error reporting situation is not ideal but I'm still very sceptical that having the ability to reset the error would help much.
Or do you have a suggestion I'm missing that is more elegant than this?
I think it's pretty illustrative that the library itself uses these calls everywhere.
[1] There is one use in virsh and one use in virt-admin code, which I'll be addressing.

On Thu, Dec 14, 2023 at 05:55:41PM +0100, Peter Krempa wrote:
If you need any form of more complex logic in your application for error reporting (ab)using libvirt's thread local variable to store the error
I'm a bit lost on the "abuse" claim here. Is this abuse in your eyes: ``` int saved_errno = errno; close(fd); errno = saved_errno; ``` since "errno" is libc's thread local variable that stores the error?
for other functions in your application will not bring too much simplification of the logic itself.
Excuse the pseudocode: ``` int func() { ... cleanup: err = virErrorSaveLast(error); virDomainFree(domain); virErrorRestore(error); return ret; } func2() { ret = func(); fprintf(stderr, "%s\n", virGetLastErrorMessage()); ``` versus: ``` int func() { ... cleanup: err = pthread_getspecific(global->pthread_key_libvirt_error); *err = virSaveLastError(); virDomainFree(domain); pthread_setspecific(global->pthread_key_libvirt_error, err); return ret; } func2() { ret = func(); err = pthread_getspecific(global->pthread_key_libvirt_error); fprintf(stderr, "%s\n", err->message); ``` If your argument is that the second version is somehow better, I don't think we're going to agree, and we'll just keep another local patch. Thanks for the discussion!
Using the global error handler function is not a very good idea in multithreaded applications
Side note: there's little point in me going into it, but been there, earned the scars. Due to, for example, various connection failures showing up as VIR_ERR_INTERNAL_ERROR, there's no reliable way to use the close callback with multiple threads sharing a connection and correctly differentiating between a need to reconnect and an actual API failure. Although, it sounds like you know of a counter-example piece of code: is it something you could share? regards john

On Thu, Dec 14, 2023 at 19:02:56 +0000, John Levon wrote:
On Thu, Dec 14, 2023 at 05:55:41PM +0100, Peter Krempa wrote:
If you need any form of more complex logic in your application for error reporting (ab)using libvirt's thread local variable to store the error
I'm a bit lost on the "abuse" claim here. Is this abuse in your eyes:
``` int saved_errno = errno; close(fd); errno = saved_errno; ```
since "errno" is libc's thread local variable that stores the error?
for other functions in your application will not bring too much simplification of the logic itself.
Excuse the pseudocode:
``` int func() { ... cleanup: err = virErrorSaveLast(error); virDomainFree(domain); virErrorRestore(error); return ret; }
func2() { ret = func(); fprintf(stderr, "%s\n", virGetLastErrorMessage()); ```
The suggested approach for applications using libvirt is to have their own virError instance, which is: 1) global (not suited for multithreaded apps, virsh uses this) 2) thread local 3) in a context object This error instance is then updated only when an error is reported: A) via the virSetErrorFunc callback (applicable to 1, 2) B) via a local helper function to store the error after failed libvirt API call This way, the resetting of the 'last error' object internal to libvirt will not influence/stomp over the error saved inside the separate error variable, thus removing the need to actually reset the error. Option A) is a bit more automatic, as it doesn't require you to be storing the error after each libvirt call. In case your application needs to report original error and then invoke some APIs which may fail to clean up this isn't ideal as you still have the same problem. Graceful recovery using different code path can be done by simply clearing the error. Option B) requires a function call on each cleanup path from a failed libvirt API but gives you the option to pick specific APIs to report errors from. In virsh we use 1) a global error object with A), errors being stored inside the error callback: You can have a look at: vshErrorHandler, vshResetLibvirtError, and vshReportError (You'll also find vshSaveLibvirtError, but that is for virsh's usage of utility code from libvirt which doesn't raise the error via the error handler. Those helpers are not available from the library)
versus:
``` int func() { ... cleanup: err = pthread_getspecific(global->pthread_key_libvirt_error); *err = virSaveLastError(); virDomainFree(domain); pthread_setspecific(global->pthread_key_libvirt_error, err); return ret; }
func2() { ret = func(); err = pthread_getspecific(global->pthread_key_libvirt_error); fprintf(stderr, "%s\n", err->message);
```
If your argument is that the second version is somehow better,
Putting the helpers into a function: void yourAppLibvirtErrorReset(void) { err = pthread_getspecific(global->pthread_key_libvirt_error); if (*err) virERrorFree(*err); pthread_setspecific(global->pthread_key_libvirt_error, NULL); } void yourAppLibvirtErrorStore(void) { if (virGetLastErrorCode() == VIR_ERR_OK) return; err = virSaveLastError(); yourAppLibvirtErrorReset(); pthread_setspecific(global->pthread_key_libvirt_error, err); } void yourAppLibvirtErrorReport(void) { err = pthread_getspecific(global->pthread_key_libvirt_error); fprintf(stderr, "%s\n", err->message); yourAppLibvirtErrorReset(); } your example then becomes: ``` int func() { ... cleanup: yourAppLibvirtErrorStore(); virDomainFree(domain); return ret; } func2() { ret = func(); yourAppLibvirtErrorReport(); ``` Or, if you chose to call yourAppLibvirtErrorStore() from the error callback: ``` int func() { ... cleanup: virDomainFree(domain); return ret; } func2() { ret = func(); yourAppLibvirtErrorReport(); ```
I don't think we're going to agree, and we'll just keep another local patch. Thanks for the discussion!
Using the global error handler function is not a very good idea in multithreaded applications
Side note: there's little point in me going into it, but been there, earned the scars. Due to, for example, various connection failures showing up as VIR_ERR_INTERNAL_ERROR, there's no reliable way to use the close callback with multiple threads sharing a connection and correctly differentiating between a need to reconnect and an actual API failure.
Could you please elaborate? This sounds like an actual bug we'd want to fix. The connection close callback is supposed to be reliable because e.g. migration recovery depends on it.
Although, it sounds like you know of a counter-example piece of code: is it something you could share?
See above for the pointers to virsh.

On Fri, Dec 15, 2023 at 10:28:18 +0000, John Levon wrote:
On Fri, Dec 15, 2023 at 10:32:36AM +0100, Peter Krempa wrote:
Although, it sounds like you know of a counter-example piece of code: is it something you could share?
See above for the pointers to virsh.
virsh is single threaded.
Yup, I wrote that in the trimmed part of the reply which also shows example which should work for multi-threaded application.

On Thu, Dec 14, 2023 at 05:55:41PM +0100, Peter Krempa wrote:
On Thu, Dec 14, 2023 at 12:03:55 +0000, John Levon wrote: Using the global error handler function is not a very good idea in multithreaded applications, so I'd really suggest to have a generic handler function that you'll call after receiving a failure code from any of the libvirt APIs you care about and that will preserve/raise the error in your application.
Right, both virConnSetErrorFunc & virSetErrorFunc are functions from day 1, that we effectively considered to be redundant once we retrofitted thread local error reporting into libvirt in the distant past. Pretty much every all just calls virSetErrorFunc to disable the silly built-in default print-to-stderr behaviour we unfortunately have.
Additionally for handling disconnects libvirt provides a connection close callback registrable via virConnectRegisterCloseCallback(), which can be registered multiple times with private data for each instance. That might help you getting rid of the generic error handler's duty to catch disconnects.
OpenStack uses this callback to trigger reconnect when a connection dies, and I'd consider that good practice. It also executes in clean stack context direct from the event loop, as opposed to the error callbacks which can be triggered from completely arbitrary application code stack context. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Dec 13, 2023 at 11:04:54AM +0000, John Levon wrote:
These two functions are currently private to libvirt, hence not available to consumers. Would it be possible to make them public? Without them, there's no way to do any libvirt call without stomping on an existing error that you may want to preserve.
I have multiple threads sharing a remote connection (to local libvirtd). I need to be able to handle connection drops (e.g. libvirtd restart). Long story short, the only approach I've found that actually works properly is that there's one main conn object, each thread has a virConnectRef() to it, and whenever a thread gets an error, we check in the error callback if !virConnectIsAlive(). If so, we close the thread's conn, and potentially also clean up the main conn.
Is there a real need to call virConnectIsAlive synchronously after every API error scenario ? That is going to be checking for libvirtd restart both too frequently, and at the same time, not frequently enough - ie you don't detect the libvirtd restart until after the next time you make an API call If you use the virConnectRegisterCloseCallback() method, you should get notified of the connection drop at precisely the right time, every time, regardless of whether you happen to be making an API call at the time. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Dec 15, 2023 at 10:58:29AM +0000, Daniel P. Berrangé wrote:
If you use the virConnectRegisterCloseCallback() method, you should get notified of the connection drop at precisely the right time, every time, regardless of whether you happen to be making an API call at the time.
That only gets called once, not for every thread sharing the connection, so there is no way to clean up all the other threads. regards john

On Wed, Dec 13, 2023 at 11:04:54AM +0000, John Levon wrote:
These two functions are currently private to libvirt, hence not available to consumers. Would it be possible to make them public? Without them, there's no way to do any libvirt call without stomping on an existing error that you may want to preserve.
I have multiple threads sharing a remote connection (to local libvirtd). I need to be able to handle connection drops (e.g. libvirtd restart). Long story short, the only approach I've found that actually works properly is that there's one main conn object, each thread has a virConnectRef() to it, and whenever a thread gets an error, we check in the error callback if !virConnectIsAlive(). If so, we close the thread's conn, and potentially also clean up the main conn.
I wonder if we should take the position that this behaviour is a libvirt bug. We declare that on return from an API call, the last error object reflects any error that occurred in this API call. Libvirt is executing the error callback and work done in the error callback is breaking our API guarantee. This broken API guarantee is not the app's fault, it is libvirt's fault for not protecting itself against things the callback might do. IOW, should we fix the virDispatchError function, to do the save/restore dance either side of invoking the callback. This would aviod exposing any new APIs, and make the app should "just work". With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Dec 15, 2023 at 11:20:10AM +0000, Daniel P. Berrangé wrote:
Libvirt is executing the error callback and work done in the error callback is breaking our API guarantee.
This broken API guarantee is not the app's fault, it is libvirt's fault for not protecting itself against things the callback might do.
IOW, should we fix the virDispatchError function, to do the save/restore dance either side of invoking the callback. This would aviod exposing any new APIs, and make the app should "just work".
This would certainly fix my scenario, yes, thanks. We still have to deal with the fact that e.g. virDomainFree() cannot be used in cleanup paths, but the workarounds there are generally not so bad. regards john
participants (4)
-
Daniel P. Berrangé
-
John Levon
-
Michal Prívozník
-
Peter Krempa