[libvirt] (Dropping) OOM Handling in libvirt

This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:

- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off

The long answer follows...

The background
==============

Since the first commit libvirt has attempted to handle out of memory (OOM) errors in the same way it does for any other problem. Namely, a virError will be raised, and the method will jump to its cleanup/error label. The error bubbles up the stack and in theory someone or something will catch this and do something sensible/useful. This has long been considered "best practice" for most C libraries, especially those used for system services. This mail makes the argument that it is in fact /not/ "best practice" to try to handle OOM.

OOM handling is very difficult to get correct because it is something that developers almost never encounter and thus the code paths are rarely run. We designed our memory allocation APIs such that we get compile time errors if code forgets to check the return value for failure. This is good as it eliminates an entire class of bugs. Our error handling goto label pattern tries to align OOM handling with other general error handling, which is more commonly tested.

We have code in the allocators which lets us run unit tests simulating the failure of any allocation that is made during the test. Executing this is extraordinarily time consuming as some of our unit tests perform many thousands or even tens of thousands of allocations. The running time to simulate OOM for each allocation is O(N^2), which does not scale well.
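The compile-time-checked allocation design described above can be sketched roughly like this. The names are illustrative, not libvirt's exact implementation; the real macro is VIR_ALLOC, backed by virAlloc:

```c
#include <stdlib.h>

/* The allocator returns an int status rather than the pointer, and is
 * annotated so that ignoring the return value triggers a compiler
 * diagnostic (a hard error under -Werror). This is a sketch of the
 * idea behind libvirt's VIR_ALLOC, not its exact code. */
__attribute__((warn_unused_result))
static int vir_alloc_sketch(void *ptrptr, size_t size)
{
    void *mem = calloc(1, size);
    if (!mem && size)
        return -1;                /* caller must handle ENOMEM */
    *(void **)ptrptr = mem;
    return 0;
}

/* Usage mirrors VIR_ALLOC(ptr): allocate sizeof(*ptr) zeroed bytes. */
#define VIR_ALLOC_SKETCH(ptr) vir_alloc_sketch(&(ptr), sizeof(*(ptr)))
```

A caller that writes `VIR_ALLOC_SKETCH(dom);` without checking the result fails the build, which is what forces every allocation site to carry an `if (... < 0) goto error;` path.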
As a result we've probably only run these OOM tests a handful of times over the years. The tests show we generally do remarkably well at OOM handling, but none the less we have *always* found bugs in our handling where we fail to report the OOM (as in, continue to run but with missing data), or we crash when cleaning up from OOM. Our unit tests don't cover anywhere near all of our codebase though. We have great coverage of XML parsing/formatting, and sporadic coverage of everything else.

IOW, despite our clever API designs & coding patterns designed to improve our OOM handling, we can not have confidence that we will actually operate correctly in an OOM scenario.

Of course this applies to all error conditions that may arise, but OOM should be considered special. With other error conditions from syscalls or API calls, the effects are largely isolated to the site of the call. With OOM, the OOM condition may well persist, and so during cleanup we will hit further OOM problems. We may well fail to even allocate the memory needed to raise a virErrorPtr. All threads can see the OOM concurrently, so the effect spreads across the whole process very quickly.

Who benefits from OOM handling
==============================

Lets pretend for a minute though that our OOM handling is perfect, and instead ask who/what is benefitting from it?

Libvirt's own processes. aka libvirtd, virtlockd, virtlogd
----------------------------------------------------------

For libvirtd we have essentially zero confidence that it will handle OOM at all. It is running all virt driver code, over which we have massively inadequate unit testing coverage to have any confidence that OOM will be well handled.
Even if OOM is handled in a worker thread, every iteration of the event loop does an allocation to hold the poll FD array, so we can easily see the event loop hitting OOM. That will make the entire process shut down, or worse, hang during cleanup as worker threads block waiting for the event loop to do something despite it no longer running.

We already expect that bugs will cause libvirtd to crash, and so have designed the drivers to be robust such that it can restart and attempt to carry on as normal afterwards. So arguably it would be fine to handle OOM by simply doing an abort and letting systemd restart the daemon.

For virtlockd and virtlogd we again have little confidence that they will handle OOM correctly. They are, however, more critical processes that we badly need to stay running at all times. We go to great effort to make it possible to re-exec on upgrades, keeping state open.

For virtlogd we could potentially change the way we deal with stdout/err for QEMU. Instead of using an anonymous pipe, we could create a named fifo on disk for each QEMU process. stdout/err would be connected to one end, and virtlogd to the other end. This would enable us to have virtlogd restarted and re-open the stdout/err for QEMU. This would mean we no longer need the re-exec logic either, which is good as that's another thing that's rarely tested. This would make abort on OOM acceptable.

For virtlockd we have the hardest problem. It fundamentally has to keep open the FDs in order to maintain active locks. If it does crash / abort, then all open locks are lost. It would have to manually re-acquire locks when it starts up again, provided it had a record of which locks it was supposed to have. In theory nothing bad should happen in the window where virtlockd is dead: if libvirtd tries to start another VM it will trigger auto-start of virtlockd via its systemd socket unit, which could then cause it to reacquire the previous locks it had.
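Returning to the virtlogd idea for a moment, the named-fifo scheme described above could look roughly like this. The function name and path convention are hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (if needed) and open the read end of the on-disk fifo that
 * carries a guest's stdout/stderr. Because the fifo persists on disk,
 * a restarted log daemon can simply call this again to re-attach,
 * while QEMU keeps its write end open across the daemon restart. */
static int log_fifo_open_reader(const char *path)
{
    if (mkfifo(path, 0600) < 0 && errno != EEXIST)
        return -1;
    /* O_NONBLOCK lets the log daemon open before any writer exists */
    return open(path, O_RDONLY | O_NONBLOCK);
}
```

The QEMU side would open the same path O_WRONLY and dup2() it onto stdout/stderr; no re-exec state passing is needed because the fifo itself is the durable handle.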
IOW, it should be possible to make virtlockd robust enough that doing abort on OOM is tolerable. We really need this robustness regardless, because virtlockd can of course already crash due to code bugs. If we make it robust to all crashes, then by implication it will be robust enough for abort on OOM.

Clients of libvirt. aka oVirt, OpenStack, KubeVirt, virt-manager, cockpit
-------------------------------------------------------------------------

Clients of libvirt can consume it via a number of different APIs, all eventually calling into libvirt.so:

- Core C API aka libvirt.so

  Catches and propagates OOM to callers. virsh just reports it as it would any other error. In single shot mode it will be exiting regardless with an error code. In interactive mode, in theory it can carry on with processing the next command. In practice its CLI parsing will probably then hit OOM anyway, causing it to exit.

  virt-viewer links to GTK and this already has abort-on-OOM behaviour via GLib. So in all likelihood, even if it catches the OOM from libvirt, it will none the less abort on OOM in a GTK/GLib call.

  libvirt-dbus is written using GLib, so again even if it catches the OOM from libvirt it will none the less abort on OOM in a GLib call.

- Python binding

  Python bindings will raise a MemoryError exception which propagates back up to the Python code. In theory the app can catch this and take some action. This mostly only works though if the cause was a single huge allocation, such that most other "normal" sized allocations continue working. In a true OOM scenario where arbitrary allocs start failing, the Python interpreter will fail too.

- Perl binding

  It is hard to find clear info about Perl behaviour during ENOMEM. The libvirt bindings assume the Perl allocation functions won't ever fail, so if they do we're going to dereference a NULL pointer. If normal Perl code gets OOM, the interpreter will raise an error.
  In theory this can be caught, but in practice few, if any, apps will try, so the process will likely quit.

- Go binding

  Errors from libvirt are all turned into Golang errors and propagated to the caller. If the Go runtime gets an OOM from an allocation it will panic the process, which can't be caught, so it will exit with a stack trace.

- Java binding

  The JVM tends to allocate a fixed size heap for its own use. So Java code can see OOM exceptions even if the OS has plenty of memory. If the OS does run out of memory, assuming the JVM heap was already fully allocated, it shouldn't immediately be affected, but might suffer collateral damage. But in theory libvirt OOM could get turned into a Java exception that an app can catch & handle nicely.

There are other bindings, but the above captures the most important usage of libvirt.

The complication of Linux
=========================

Note that all of the above discussion had the implicit assumption that malloc will actually return ENOMEM when memory is exhausted. On Linux at least this is not the case in a default install. Linux tends to enable memory overcommit, causing the kernel to satisfy malloc requests even if they exceed available RAM + swap. The allocated memory won't even be paged in to physical RAM until the page is written to. If there is insufficient RAM to make a page resident, then the OOM killer is set free. Some poor victim will get killed off.

It is possible to disable RAM overcommit and also disable or deprioritize the OOM killer. This might make it possible to actually see ENOMEM from an allocation, but few if any developers or applications run in this scenario, so it should be considered untested in reality.

With cgroups it is possible to set RAM and swap usage limits. In theory you can disable the OOM killer per-cgroup and this will cause ENOMEM to be reported to the app. In my testing with a trivial C program that simply mallocs a massive array and then memsets it, it is hard to get ENOMEM to happen reliably.
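The kind of trivial test program referred to is roughly the following; the size passed in is whatever exceeds the limit under test:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate and then touch `size` bytes, reporting which failure mode we
 * hit. With default Linux overcommit the malloc() succeeds regardless of
 * available RAM; it is the memset() that faults pages in, at which point
 * the OOM killer fires rather than malloc() ever returning NULL. */
static int alloc_and_touch(size_t size)
{
    char *buf = malloc(size);

    if (!buf) {
        /* Only reachable with overcommit disabled, or inside a cgroup
         * memory limit with the per-cgroup OOM killer turned off. */
        fprintf(stderr, "malloc %zu: %s\n", size, strerror(errno));
        return -1;
    }

    memset(buf, 0x55, size);   /* pages only become resident here */
    free(buf);
    return 0;
}
```

Running this under a cgroup memory limit (e.g. via systemd-run with MemoryMax set) with a size above the limit is one way to compare the ENOMEM vs OOM-killer outcomes described above.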
Sometimes the app ends up just hanging when testing this.

Other operating systems have different behaviour and so are more likely to really report ENOMEM to application code, but even if you can get ENOMEM on other OS, the earlier points illustrate that it's not worth worrying about.

The conclusion
==============

To repeat the top of this document, attempts to handle OOM are not worth the maint cost they incur:

- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off.

The proposal is thus to change all libvirt code (ie both client and server side) to abort on OOM. This will allow us to simplify the control flow in many methods, eliminating ~1500 checks for memory failure, their associated goto jumps, and many of the cleanup/error labels. This makes the code easier to follow & maintain, and opens up new avenues for improving libvirt's future development.

The main blocking prerequisite to making the change is to address the need for reliable restart of the virtlockd and virtlogd daemons.

As a point of reference, libguestfs has had abort-on-OOM behaviour forever and no one has complained about this behaviour. Apps using libvirt often also use libguestfs in the same process.

The implementation
==================

Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library. The initial impl would thus be to just link to GLib and switch the VIR_ALLOC related APIs to use g_new/g_malloc & friends internally. Over time code can be changed to use g_new/g_malloc directly, thus removing the error handling incrementally.
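The shape of that transformation can be sketched as follows; `xalloc0` here is a plain-C stand-in for GLib's real g_malloc0/g_new0, which abort the process on allocation failure, and the struct/function names are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for g_malloc0(): never returns NULL. On failure it
 * prints a diagnostic and aborts, so callers carry no error paths. */
static void *xalloc0(size_t size)
{
    void *mem = calloc(1, size);
    if (!mem && size) {
        fprintf(stderr, "failed to allocate %zu bytes\n", size);
        abort();
    }
    return mem;
}

typedef struct {
    int id;
} virDomainSketch;   /* hypothetical example struct */

/* Before: if (VIR_ALLOC(dom) < 0) goto error; ... error: cleanup.
 * After: allocation cannot fail from the caller's point of view,
 * so the goto jumps and the error label disappear entirely. */
static virDomainSketch *virDomainSketchNew(int id)
{
    virDomainSketch *dom = xalloc0(sizeof(*dom));
    dom->id = id;
    return dom;
}
```

This is what makes it possible to delete the ~1500 memory-failure checks mentioned above: once the allocator aborts, every `if (VIR_ALLOC(...) < 0) goto error;` collapses to a bare allocation.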
Use of GLib brings a number of other significant opportunities / advantages to ongoing development:

- Access to many standard data structures (hash tables, linked lists, queues, growable arrays, growable strings).
- Access to platform portability APIs for threads, event loops, file I/O, filename parsing and more.
- Access to GObject to replace our own virObject clone with something that is far more feature rich. This is especially true of its callback facility via signal handlers.
- Access to GIO to replace much of our socket I/O portability wrappers, and DBus client APIs.
- Opens the possibility of calling to code in a non-C language via the GObject introspection bindings for GObject & other helper APIs.
- Ability to drop use of gnulib once a critical set of the functionality we currently rely on it for is ported to the GLib APIs. Most important is the sockets / event loop stuff, due to Windows portability.
- Ability to drop use of autoconf/automake in favour of a more maintainable option like meson/ninja, once we eliminate use of gnulib.

Ultimately the big picture benefit is that we can spend less time working on low level generic code related to platform portability, data structures, or system services. Note this is not claiming that all GLib's APIs are better than stuff we have implemented in libvirt already. The belief is that even if some of the GLib APIs are worse, it is still a benefit to the project to use them, as it will reduce the code we maintain ourselves. This in turn lets us spend more time working on high level code directly related to virtualization features, and be more productive when doing so.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, May 13, 2019 at 11:17:45AM +0100, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:
- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off
The long answer follows...
[...]

+1 as I was planning to slowly convince others as well.

Pavel

On Mon, 2019-05-13 at 11:17 +0100, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:
- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off
[...]
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
+1 to the idea of moving to GLib, which I guess is not at all surprising given that I had already suggested doing that a couple of years ago[1] ;)

One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.

I also can't help but wonder what going in this direction would mean for libvirt-glib and the projects built upon it...

[1] https://www.redhat.com/archives/libvir-list/2017-March/msg00008.html

-- 
Andrea Bolognani / Red Hat / Virtualization

On Mon, May 13, 2019 at 02:00:28PM +0200, Andrea Bolognani wrote:
On Mon, 2019-05-13 at 11:17 +0100, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:
- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off
[...]
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
+1 to the idea of moving to GLib, which I guess is not at all surprising given that I had already suggested doing that a couple of years ago[1] ;)
One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.
I'm not sure why you say we can't use GLib types in our public API? I think we could use them, but I'd probably suggest we none the less choose not to use them in the public API, only internally :-)

But I'm anticipating we could replace virObject with GObject, and as such all the virXXXXXPtr types in our public API would become GObjects. I think we'd likely keep them as opaque types though, despite the fact that they'd be GObjects, to retain our freedom to change impl again later if we wish.

I don't think we need to change use of 'long long' to 'gint64', etc. Not least because GLib maintainers themselves are questioning whether to just mandate stdint.h types. This is fairly minor though.
I also can't help but wonder what going in this direction would mean for libvirt-glib and the projects built upon it...
I don't think it has a significant impact. libvirt-glib.so is just glue to the GLib event loop. The libvirt built-in event loop might become the same thing.

Most of the code is in libvirt-gconfig though, which is a mapping of XML docs into the GObject model, which is all still relevant. Likewise libvirt-gobject is a remapping of the libvirt public API into GObject. If we don't expose the fact that our public API secretly uses GObject internally, then I think libvirt-gobject is also still valid.

Potentially we could merge libvirt-gobject into our public API, officially exposing that it's GObject based, but I don't think that's an important thing in the near future, possibly not even in the long term. Basically I'd be cautious with our public API to avoid tying the public API to the internal impl choice.

Regards,
Daniel
[1] https://www.redhat.com/archives/libvir-list/2017-March/msg00008.html

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, 2019-05-13 at 13:19 +0100, Daniel P. Berrangé wrote:
On Mon, May 13, 2019 at 02:00:28PM +0200, Andrea Bolognani wrote:
One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.
I'm not sure why you say we can't use GLib types in our public API ?
I think we could use them, but I'd probably suggest we none the less choose not to use them in public API, only internally :-)
But I'm anticipating we could replace virObject, with GObject, and as such all the virXXXXXPtr types in our public API would become GObjects. I think we'd likely keep them as opaque types though, despite the fact that they'd be GObjects, to retain our freedom to change impl again later if we wish.
I don't think we need to change use of 'long long' to 'gint64', etc. Not least because GLib maintainers themselves are questioning whether to just mandate stdint.h types. This is fairly minor though.
I was mostly thinking about this latter example and other situations along those lines. For example, we'll definitely need to start using gchar* internally, and since we don't want that implementation detail exposed in our plain C bindings, then we'll have to do at least some very lightweight conversion (casting) between that and char*. This is one of the examples where considering the existing API as a language binding would IMHO result in a maintainable structure.

Another situation where the above model would help is error reporting: if we start using GLib heavily, then it might make sense to adopt GError as well, but doing so means we'd have to convert to our existing error reporting facilities somewhere. If we consider the plain C API to be a binding, then that's not different from what we already do for Python and friends.

As for GObject, yeah, we want all public structures to be opaque anyway; at the same time, we won't be able to turn existing non-opaque public structures into GObjects. I'm not sure how big of a deal that would be in practice, but I just thought I'd bring it up.
I also can't help but wonder what going in this direction would mean for libvirt-glib and the projects built upon it...
I don't think it has a significant impact. libvirt-glib.so is just a glue to the GLib event loop. The libvirt built-in event loop might become the same thing.
Most of the code is in libvirt-gconfig though which is a mapping of XML docs into the GObject model which is all still relevant.
Likewise libvirt-gobject is a remapping of libvirt public API into GObject. If we don't expose the fact that our public API secretly uses GObject internally, then I think libvirt-gobject is also still valid.
Potentially we could merge libvirt-gobject into our public API officially exposing that its GObject based, but I don't think that's an important thing in the near future, possibly not even in the long term. Basically I'd be cautious with our public API to avoid tieing the public API to the internal impl choice.
Yeah, I guess we can figure that out at a later date.

-- 
Andrea Bolognani / Red Hat / Virtualization

On Mon, May 13, 2019 at 04:33:17PM +0200, Andrea Bolognani wrote:
On Mon, 2019-05-13 at 13:19 +0100, Daniel P. Berrangé wrote:
On Mon, May 13, 2019 at 02:00:28PM +0200, Andrea Bolognani wrote:
One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.
I'm not sure why you say we can't use GLib types in our public API ?
I think we could use them, but I'd probably suggest we none the less choose not to use them in public API, only internally :-)
But I'm anticipating we could replace virObject, with GObject, and as such all the virXXXXXPtr types in our public API would become GObjects. I think we'd likely keep them as opaque types though, despite the fact that they'd be GObjects, to retain our freedom to change impl again later if we wish.
I don't think we need to change use of 'long long' to 'gint64', etc. Not least because GLib maintainers themselves are questioning whether to just mandate stdint.h types. This is fairly minor though.
I was mostly thinking about this latter example and other situations along those lines. For example, we'll definitely need to start using gchar* internally, and since we don't want that implementation detail exposed in our plain C bindings, then we'll have to do at least some very lightweight conversion (casting) between that and char*. This is one of the examples where considering the existing API as a language binding would IMHO result in a maintainable structure.
Another situation where the above model would help is error reporting: if we start using GLib heavily, then it might make sense to adopt GError as well, but doing so means we'd have to convert to our existing error reporting facilities somewhere. If we consider the plain C API to be a binding, then that's not different from what we already do for Python and friends.
Yep, I see what you mean.
As for GObject, yeah, we want all public structures to be opaque anyway; at the same time, we won't be able to turn existing non-opaque public structures into GObjects. I'm not sure how big of a deal that would be in practice, but I just thought I'd bring it up.
There is a concept called "Boxed types"[1] which lets you deal with plain C structs using the GObject type system. It is good when you have a lightweight struct where you want easy deep-copying but don't want the overhead of a full GObject, or where you're consuming structs defined by a 3rd party which you can't control.

But as we seem to agree, I don't think we need to touch anything in the public headers in the foreseeable future, so we don't have to worry about that.

Regards,
Daniel

[1] https://developer.gnome.org/gobject/stable/gobject-Boxed-Types.html

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Andrea Bolognani <abologna@redhat.com> writes:
On Mon, 2019-05-13 at 13:19 +0100, Daniel P. Berrangé wrote:
On Mon, May 13, 2019 at 02:00:28PM +0200, Andrea Bolognani wrote:
One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.
I'm not sure why you say we can't use GLib types in our public API ?
I think we could use them, but I'd probably suggest we none the less choose not to use them in public API, only internally :-)
But I'm anticipating we could replace virObject, with GObject, and as such all the virXXXXXPtr types in our public API would become GObjects. I think we'd likely keep them as opaque types though, despite the fact that they'd be GObjects, to retain our freedom to change impl again later if we wish.
I don't think we need to change use of 'long long' to 'gint64', etc. Not least because GLib maintainers themselves are questioning whether to just mandate stdint.h types.
Interesting. Got a link?
This is fairly minor though.
I was mostly thinking about this latter example and other situations along those lines. For example, we'll definitely need to start using gchar* internally,
Are you sure about "definitely"? gchar is merely a typedef name for char...
and since we don't want that implementation detail exposed in our plain C bindings,
Yup, letting GLib's typedef names for ordinary C types leak into your public headers would be a mistake.
then we'll have to do at least some very lightweight conversion (casting) between that and char*. This is one of the examples where considering the existing API as a language binding would IMHO result in a maintainable structure.
[...]

On Wed, May 22, 2019 at 10:08:44AM +0200, Markus Armbruster wrote:
Andrea Bolognani <abologna@redhat.com> writes:
On Mon, 2019-05-13 at 13:19 +0100, Daniel P. Berrangé wrote:
On Mon, May 13, 2019 at 02:00:28PM +0200, Andrea Bolognani wrote:
One possible complication is that we would not be able to use any of the GLib types in our public API... I think the way we should approach this is to consider the current public API as if it were yet another language binding, the language being plain C in this case, and make sure we have a very well defined boundary between them and everything else, basically treating them as a separate project that just so happens to live in the same repository and be developed in tandem. This should also make it easier for us to switch to a different programming language in the future, should we decide to.
I'm not sure why you say we can't use GLib types in our public API ?
I think we could use them, but I'd probably suggest we none the less choose not to use them in public API, only internally :-)
But I'm anticipating we could replace virObject, with GObject, and as such all the virXXXXXPtr types in our public API would become GObjects. I think we'd likely keep them as opaque types though, despite the fact that they'd be GObjects, to retain our freedom to change impl again later if we wish.
I don't think we need to change use of 'long long' to 'gint64', etc. Not least because GLib maintainers themselves are questioning whether to just mandate stdint.h types.
Interesting. Got a link?
https://gitlab.gnome.org/GNOME/glib/issues/1484

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Daniel P. Berrangé <berrange@redhat.com> [2019-05-13, 11:17AM +0100]:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:
- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off
The long answer follows...
Yes, please, thank you. All in favour!

On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maint cost because:
- this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping data or crashing the process
- apps using libvirt often do so via a non-C language that aborts/exits the app on OOM regardless, or use other C libraries that abort
- we can build a system that is more reliable when OOM happens by not catching OOM, instead simply letting apps exit, restart and carry on where they left off
The long answer follows...
I'm up for dropping OOM handling. Since we're linking with a lot of libraries I don't think we can be confident that we won't abort() even now anyway.
The background ==============
Since the first commit libvirt has attempted to handle out of memory (OOM) errors in the same way it does for any other problem. Namely, a virError will be raised, and the method will jump to its cleanup/error label. The error bubbles up the stack and in theory someone or something will catch this and do something sensible/useful. This has long been considered "best practice" for most C libraries, especially those used for system services. This mail makes the argument that it is in fact /not/ "best practice" to try to handle OOM.
OOM handling is very difficult to get correct because it is something that developers almost never encounter and thus the code paths are rarely run. We designed our memory allocation APIs such that we get compile time errors if code forgets to check the return value for failure. This is good as it eliminates an entire class of bugs. Our error handling goto label pattern tries to align OOM handling with other general error handling which is more commonly tested.
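The pattern described above — compile-time-checked allocations feeding a shared cleanup label — can be sketched in plain C (simplified names, not libvirt's actual VIR_ALLOC API):

```c
#include <stdlib.h>
#include <string.h>

/* Simplified sketch of the libvirt error-label pattern: every
 * allocation's return value must be checked, and any failure jumps
 * to a cleanup label shared with the success path. */
static char *
build_copy(const char *src)
{
    char *buf = NULL;
    char *ret = NULL;

    if (!(buf = malloc(strlen(src) + 1)))
        goto cleanup;           /* real code would raise a virError */

    strcpy(buf, src);
    ret = buf;
    buf = NULL;                 /* ownership transferred to ret */

 cleanup:
    free(buf);                  /* no-op on success, frees on error */
    return ret;                 /* NULL signals OOM to the caller */
}
```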
We have code in the allocators which lets us run unit tests simulating the failure of any allocation that is made during the test. Executing this is extraordinarily time consuming as some of our unit tests have many 1000's or even 10's of 1000's of allocations. The running time to simulate OOM for each allocation is O(N^2) which does not scale well. As a result we've probably only run these OOM tests a handful of times over the years.
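A minimal sketch of how such an allocation-failure harness works (hypothetical names, not libvirt's actual mock). A first pass counts the N allocations a test makes; the test is then re-run N times, failing allocation 1, 2, ... N in turn, which is where the O(N^2) running time comes from:

```c
#include <stdlib.h>

static long alloc_count;  /* allocations seen so far in this run      */
static long fail_at;      /* which allocation should fail (0 = never) */

/* Stand-in for an interposed malloc: returns NULL (simulated
 * ENOMEM) on the fail_at-th allocation, real memory otherwise. */
void *
test_malloc(size_t size)
{
    if (++alloc_count == fail_at)
        return NULL;
    return malloc(size);
}
```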
Well, I've tried this and so far the only problems I've found were in tests where we're ignoring the retvals of VIR_ALLOC() or its friends (e.g. vir*New()), or where we check the retval only after the result has already been used. BTW: to make it work I had to disable failing from mocks (obviously).
The tests show we generally do remarkably well at OOM handling, but nonetheless we have *always* found bugs in our handling where we fail to report the OOM (as in, continue to run but with missing data), or we crash when cleaning up from OOM. Our unit tests don't cover anywhere near all of our codebase though. We have great coverage of XML parsing/formatting, and sporadic coverage of everything else.
IOW, despite our clever API designs & coding patterns designed to improve our OOM handling, we cannot have confidence that we will actually operate correctly in an OOM scenario.
I beg to disagree. The fact that our cleanup paths are in 99% the same as error paths means that if there are no mem-leaks (and if transformation to VIR_AUTOFREE is done the chances are low), then we probably do well on OOM too. <snip/>
The implementation ==================
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak. Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow. Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).

Michal

On Mon, May 13, 2019 at 03:28:24PM +0200, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
The tests show we generally do remarkably well at OOM handling, but nonetheless we have *always* found bugs in our handling where we fail to report the OOM (as in, continue to run but with missing data), or we crash when cleaning up from OOM. Our unit tests don't cover anywhere near all of our codebase though. We have great coverage of XML parsing/formatting, and sporadic coverage of everything else.
IOW, despite our clever API designs & coding patterns designed to improve our OOM handling, we cannot have confidence that we will actually operate correctly in an OOM scenario.
I beg to disagree. The fact that our cleanup paths are in 99% the same as error paths means that if there are no mem-leaks (and if transformation to VIR_AUTOFREE is done the chances are low), then we probably do well on OOM too.
There possibly might be cases where we dereference the pointer unconditionally in the cleanup section, regardless of whether the allocation succeeded.
<snip/>
The implementation ==================
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak.
Well, libvirt C is also a dialect that is even less widely spoken. Moreover, it will make it easier for QEMU developers to contribute to libvirt and vice versa. Also, learning it is much easier than learning a different language (Rust? Haskell?)
Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow.
We also do a lot of unnecessary (re)allocations to simplify the code, the logic being that avoiding a negligible memory increase is not worth the extra developer time spent debugging. But this seems like something that's hard to measure until we rewrite all of libvirt in glib.
Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).
Well, if you already have to have glib because of QEMU, why does the library count matter?

Jano

On Mon, 2019-05-13 at 15:28 +0200, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak.
We all got used to libvirt's own interpretation of C, with all its specific APIs and quirks, so I don't think doing the same for GLib is at all beyond us.
Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow.
Unless I'm mistaken, this is only done for public structures that might need to be extended in the future, which are an exception rather than the norm.
Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).
QEMU already links against GLib, so on basically all machines you'll end up with the very same number of libraries loaded into memory.

--
Andrea Bolognani / Red Hat / Virtualization

On Mon, May 13, 2019 at 03:28:24PM +0200, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
The background ==============
Since the first commit libvirt has attempted to handle out of memory (OOM) errors in the same way it does for any other problem. Namely, a virError will be raised, and the method will jump to its cleanup/error label. The error bubbles up the stack and in theory someone or something will catch this and do something sensible/useful. This has long been considered "best practice" for most C libraries, especially those used for system services. This mail makes the argument that it is in fact /not/ "best practice" to try to handle OOM.
OOM handling is very difficult to get correct because it is something that developers almost never encounter and thus the code paths are rarely run. We designed our memory allocation APIs such that we get compile time errors if code forgets to check the return value for failure. This is good as it eliminates an entire class of bugs. Our error handling goto label pattern tries to align OOM handling with other general error handling which is more commonly tested.
We have code in the allocators which lets us run unit tests simulating the failure of any allocation that is made during the test. Executing this is extraordinarily time consuming as some of our unit tests have many 1000's or even 10's of 1000's of allocations. The running time to simulate OOM for each allocation is O(N^2) which does not scale well. As a result we've probably only run these OOM tests a handful of times over the years.
Well, I've tried this and so far the only problems I've found were in tests where we're ignoring the retvals of VIR_ALLOC() or its friends (e.g. vir*New()), or where we check the retval only after the result has already been used.
Hmm, I guess you didn't check the same as me, as I've found crashes:

$ VIR_TEST_OOM=1 ./qemuxml2argvtest
...snip lots of test output...
631) QEMU XML-2-ARGV tpm-passthrough ... OK
Test OOM for nalloc=162 ...................................................................................................................................Segmentation fault (core dumped)

There are a bunch of earlier messages that aren't causing crashes, but are showing that we're silently generating unexpected output for cli args, or are raising bizarre error messages instead of "out of memory".
The tests show we generally do remarkably well at OOM handling, but nonetheless we have *always* found bugs in our handling where we fail to report the OOM (as in, continue to run but with missing data), or we crash when cleaning up from OOM. Our unit tests don't cover anywhere near all of our codebase though. We have great coverage of XML parsing/formatting, and sporadic coverage of everything else.
IOW, despite our clever API designs & coding patterns designed to improve our OOM handling, we cannot have confidence that we will actually operate correctly in an OOM scenario.
I beg to disagree. The fact that our cleanup paths are in 99% the same as error paths means that if there are no mem-leaks (and if transformation to VIR_AUTOFREE is done the chances are low), then we probably do well on OOM too.
I don't think that holds true, as we can see from the weird error codes reported in OOM scenarios, or the crashes, or the incorrect ARGV being generated silently. We do remarkably well, but we still fail despite our best efforts. I cannot have confidence in this at runtime in a production world if we can't even report correct error messages and we silently create malformed data.
The implementation ==================
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak.
Of course not everyone will know the APIs right now, but I'm confident any of us are easily capable of learning what it provides as we need to. We're already constantly learning new APIs as we introduce more frameworks in libvirt. QEMU's adoption of GLib shows it can be done without any real difficulty & by using it in libvirt too, we'll make it easier for QEMU contributors to understand libvirt as we won't have reinvented the wheel so much.
Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow. Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).
I don't think the padding on objects is significant enough to be a concern when placed in context of what libvirt already does. Most of the per-VM memory usage is around the data we store for that VM, especially the huge parsed XML configs spanning 100's of structs - that won't change and will dominate any extra padding on objects. Libvirt memory usage as a whole is tiny compared to the memory allocated to the actual QEMU VM RAM areas. QEMU is already linking to it so it will be resident both in the installed footprint and in memory. The only overhead is thus from runtime allocations. So I don't think there's any reason to avoid it in terms of memory overhead.

Regards, Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, May 13, 2019 at 03:25:43PM +0100, Daniel P. Berrangé wrote:
On Mon, May 13, 2019 at 03:28:24PM +0200, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
The background ==============
Since the first commit libvirt has attempted to handle out of memory (OOM) errors in the same way it does for any other problem. Namely, a virError will be raised, and the method will jump to its cleanup/error label. The error bubbles up the stack and in theory someone or something will catch this and do something sensible/useful. This has long been considered "best practice" for most C libraries, especially those used for system services. This mail makes the argument that it is in fact /not/ "best practice" to try to handle OOM.
OOM handling is very difficult to get correct because it is something that developers almost never encounter and thus the code paths are rarely run. We designed our memory allocation APIs such that we get compile time errors if code forgets to check the return value for failure. This is good as it eliminates an entire class of bugs. Our error handling goto label pattern tries to align OOM handling with other general error handling which is more commonly tested.
We have code in the allocators which lets us run unit tests simulating the failure of any allocation that is made during the test. Executing this is extraordinarily time consuming as some of our unit tests have many 1000's or even 10's of 1000's of allocations. The running time to simulate OOM for each allocation is O(N^2) which does not scale well. As a result we've probably only run these OOM tests a handful of times over the years.
Well, I've tried this and so far, the only problems I've found were in tests where we're ignoring revals of VIR_ALLOC() or its friends (e.g. vir*New()) or we're checking it after it's used already.
Hmm, I guess you didn't check the same as me, as I've found crashes
$ VIR_TEST_OOM=1 ./qemuxml2argvtest ....snip lots of test output... 631) QEMU XML-2-ARGV tpm-passthrough ... OK Test OOM for nalloc=162 ...................................................................................................................................Segmentation fault (core dumped)
There are a bunch of earlier messages that aren't causing crashes, but are showing that we're silently generating unexpected output for cli args, or are raising bizarre error messages instead of "out of memory".
The tests show we generally do remarkably well at OOM handling, but nonetheless we have *always* found bugs in our handling where we fail to report the OOM (as in, continue to run but with missing data), or we crash when cleaning up from OOM. Our unit tests don't cover anywhere near all of our codebase though. We have great coverage of XML parsing/formatting, and sporadic coverage of everything else.
IOW, despite our clever API designs & coding patterns designed to improve our OOM handling, we cannot have confidence that we will actually operate correctly in an OOM scenario.
I beg to disagree. The fact that our cleanup paths are in 99% the same as error paths means that if there are no mem-leaks (and if transformation to VIR_AUTOFREE is done the chances are low), then we probably do well on OOM too.
I don't think that holds true, as we can see from the weird error codes reported in OOM scenarios, or the crashes, or the incorrect ARGV being generated silently.
We do remarkably well, but we still fail despite our best efforts. I cannot have confidence in this at runtime in a production world if we can't even report correct error messages and we silently create malformed data.
The implementation ==================
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak.
Of course not everyone will know the APIs right now, but I'm confident any of us are easily capable of learning what it provides as we need to. We're already constantly learning new APIs as we introduce more frameworks in libvirt. QEMU's adoption of GLib shows it can be done without any real difficulty & by using it in libvirt too, we'll make it easier for QEMU contributors to understand libvirt as we won't have reinvented the wheel so much.
Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow. Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).
I don't think the padding on objects is significant enough to be a concern when placed in context of what libvirt already does. Most of the per-VM memory usage is around the data we store for that VM, especially the huge parsed XML configs spanning 100's of structs - that won't change and will dominate any extra padding on objects. Libvirt memory usage as a whole is tiny compared to the memory allocated to the actual QEMU VM RAM areas.
QEMU is already linking to it so it will be resident both in the installed footprint and in memory. The only overhead is thus from runtime allocations. So I don't think there's any reason to avoid it in terms of memory overhead.
Actually let me correct that a little, as glib is actually split into multiple ELF libraries: libglib, libgobject, libgio, libgmodule, libgthread. QEMU uses libglib only, not libgobject / libgio. QEMU re-invented GObject itself, and libgio didn't exist when QEMU first used glib, so it has reinvented large parts of libgio too. QEMU's min glib version is now new enough that it should really start using libgio. So currently the disk install has all the libraries, but only libglib is resident in RAM for QEMU.

Regards, Daniel

On 5/13/19 8:28 AM, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maintenance cost because:
The long answer follows...
I'm up for dropping OOM handling. Since we're linking with a lot of libraries I don't think we can be confident that we won't abort() even now anyway.
I can live with the decision to drop OOM handling, as long as we still try to gracefully handle any user-requested large allocation (such as g_try_malloc) and only default to abort-on-failure for the more common case of small allocations (such as g_malloc).
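In plain C, the two policies look roughly like this; these are stand-in implementations illustrating the semantics, not GLib's actual code (GLib's g_malloc() aborts on failure, while g_try_malloc() returns NULL):

```c
#include <stdio.h>
#include <stdlib.h>

/* g_malloc-like: small, routine allocations just abort on failure. */
void *
xmalloc(size_t size)
{
    void *p = malloc(size);
    if (!p && size) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}

/* g_try_malloc-like: large, user-driven allocations return NULL so
 * the caller can fail the one request instead of the whole process. */
void *
try_malloc(size_t size)
{
    return malloc(size);
}
```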
The implementation ==================
Assuming a decision to abort on OOM, libvirt can now follow QEMU's lead and make use of the GLib library.
No, please no. Firstly, glib is a new C dialect that only a few of us speak. Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow. Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).
I'm not sold on the fact that glib will ease things, but I do agree that one reason we have avoided it so far is because of its abort-on-ENOMEM behavior, and we are now in agreement that this is no longer a blocker for using glib.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

Eric Blake <eblake@redhat.com> writes:
On 5/13/19 8:28 AM, Michal Privoznik wrote:
On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive summary is that it is not worth the maintenance cost because:
The long answer follows...
I'm up for dropping OOM handling. Since we're linking with a lot of libraries I don't think we can be confident that we won't abort() even now anyway.
I can live with the decision to drop OOM handling, as long as we still try to gracefully handle any user-requested large allocation (such as g_try_malloc) and only default to abort-on-failure for the more common case of small allocations (such as g_malloc).
This argument relies on the "implicit assumption that malloc will actually return ENOMEM when memory is exhausted" (Daniel's wording). His whole section is worth (re-)reading; I append it for your convenience. The way Linux behaves reduces the benefit of attempting to "gracefully handle any user-requested large allocation". It also makes it harder to test, increasing its risks. Is it worthwhile? We had this discussion in QEMU (which has the policy you propose, followed except when it isn't, error recovery largely untested). I think it's not worthwhile for QEMU, but others have different opinions.

[...]

The complication of Linux =========================
Note that all of the above discussion had the implicit assumption that malloc will actually return ENOMEM when memory is exhausted. On Linux at least this is not the case in a default install. Linux tends to enable memory overcommit, causing the kernel to satisfy malloc requests even if they exceed available RAM + swap. The allocated memory won't even be paged in to physical RAM until the page is written to. If there is insufficient memory to make a page resident in RAM, then the OOM killer is set free. Some poor victim will get killed off.
It is possible to disable RAM overcommit and also disable or deprioritize the OOM killer. This might make it possible to actually see ENOMEM from an allocation, but few if any developers or applications run in this scenario, so it should be considered untested in reality.
With cgroups it is possible to set RAM and swap usage limits. In theory you can disable the OOM killer per-cgroup and this will cause ENOMEM in the app. In my testing with a trivial C program that simply mallocs a massive array and then memsets it, it is hard to get ENOMEM to happen reliably. Sometimes the app ends up just hanging when testing this.
Other operating systems have different behaviour and so are more likely to really report ENOMEM to application code, but even if you can get ENOMEM on another OS, the earlier points illustrate that it's not worth worrying about. End quote.
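The cgroup setup described in the quoted section can be sketched with the cgroups v1 memory controller (hypothetical paths and program name; requires root, and as the quoted text notes, results are unreliable):

```shell
# Create a memory-limited cgroup (cgroups v1 hierarchy assumed).
mkdir /sys/fs/cgroup/memory/oomtest
echo $((64 * 1024 * 1024)) > /sys/fs/cgroup/memory/oomtest/memory.limit_in_bytes
# Disable the per-cgroup OOM killer so the app sees ENOMEM instead.
echo 1 > /sys/fs/cgroup/memory/oomtest/memory.oom_control
# Join the cgroup, then run the victim program in it.
echo $$ > /sys/fs/cgroup/memory/oomtest/tasks
./malloc-and-memset-test   # may see ENOMEM -- or may simply hang
```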

<snip/>
One thing that I've realized was that we have the NSS plugin which currently links statically with a small library that we build aside (src/libvirt-nss.la). The plugin uses virHash, virObject, virString and some other submodules of ours. If we replace our implementation with glib then effectively every process that ever calls gethostbyname() will be dynamically linking with glib/gobject/.... Don't know if it's a bad thing, just putting it out here to consider.

Michal

On Mon, May 20, 2019 at 02:16:44PM +0200, Michal Privoznik wrote:
<snip/>
One thing that I've realized was that we have the NSS plugin which currently links statically with a small library that we build aside (src/libvirt-nss.la). The plugin uses virHash, virObject, virString and some other submodules of ours. If we replace our implementation with glib then effectively every process that ever calls gethostbyname() will be dynamically linking with glib/gobject/....
Don't know if it's a bad thing, just putting it out here to consider.
I'm not so worried about the fact that we're linking with more stuff since glib etc will already be memory resident. The fact that the stuff we link to will abort on OOM though will be undesirable for an NSS module that can be loaded into every single app on the host. Some of those apps may really want/need to be safe wrt OOM and you can't easily opt-out of NSS modules per app. The libvirt_nss.c file is only 700 lines long. Despite our cut-down static linking it still pulls in quite a bit of libvirt code. This is probably a case where we'd need to make the NSS module use only plain C library code plus a JSON library, with no abstraction from libvirt or any other library. We don't really need the portability benefits from either gnulib or glib for NSS anyway, since it is a Linux specific framework.

Regards, Daniel
participants (8)
- Andrea Bolognani
- Bjoern Walk
- Daniel P. Berrangé
- Eric Blake
- Ján Tomko
- Markus Armbruster
- Michal Privoznik
- Pavel Hrdina