[libvirt] Redesigning Libvirt: Adapting for the next 12 years

Hold tight, this is a long one...

It is hard for me to believe it, but the libvirt project is now 12 years old (born on Nov 2, 2005), and I've been working on it since March 2006, making it easily the most significant project I've worked on. It started off life as an attempt to provide a stable application development API for the Xen hypervisor, interfacing across XenD, XenStore and Xen hypercalls. It was initially just a plain C library and Python binding, but when we added QEMU support in Feb 2007 the libvirtd daemon was born. That cemented a split of hypervisor drivers into two distinct architectures: stateless drivers where all logic was in the library (VMware ESX, VirtualBox, original Xen) and stateful drivers where all logic was in the daemon (QEMU, LXC, UML, modern Xen). The project has been wildly successful beyond our expectations; in particular the hypervisor abstraction layer made it possible for RHEL to switch from using Xen to KVM while keeping the userspace tooling the same for users. Libvirt is now used, to some degree, by likely 100's of applications, with KVM being the dominant hypervisor choice by a long way.

There is an old adage in the computer industry though: "Adapt or die". This is usually applied to companies who see their primary product suddenly become a commodity, or disappear into irrelevance as new technology disrupts the market, killing their revenue stream. It is, however, just as reasonable to apply this to open source projects, which can see their core usage scenarios disrupted by new startup projects & technologies. While the open source code will never go away, the companies who pay for the project's developers can quickly reassign them elsewhere, seriously harming the viability of the community thereafter. IOW, while libvirt has seen 12 years of great success, we must not be so naive as to assume we are going to see another 12 years without being disrupted.

Over time we've done a lot of work refactoring libvirt code to introduce new concepts and support new hypervisor targets, but I think it's fair to say that at a high level the architecture is unchanged since we first introduced libvirtd, and then its multithreaded internals, in the 2006-2008 timeframe. We've taken a fairly conservative, evolutionary approach to our changes. This is good, because providing stability to our users is a critically important reason for libvirt to exist. This is bad, because we've not been willing to take risks in the short term that could potentially be very beneficial in the long term (a 5-10 year horizon). I think that now is the time to consider some major architectural changes in the approach we take. There's no single reason, rather a combination of factors all coming together to form a compelling case for ambitious change.

Before going further though, I want to highlight one important point: I am NOT suggesting changing the public API or the XML format in a backwards incompatible manner. API & XML stability is the single most important part of libvirt and MUST be maintained on a par with what's available today. IOW we can add new features, but can't remove what's there already. This even leaves the door open for providing a libvirt2.so, provided we're willing to still maintain libvirt.so indefinitely alongside, though that's not something I'd encourage. The majority of the hard problems we face are not in the API design, or in the XML format, so that's not a significant limiting factor IMHO.
There are three core areas of libvirt I see that are problematic, and where the fixes have major implications. At a very high level what I'm going to suggest is:

 - Expose key hypervisor specific concepts as fully supported features to applications. In particular provide a way for an application to launch QEMU processes directly in their process execution environment, rather than as a child of libvirtd.

 - Explode the libvirtd daemon into a swarm of independent daemons. This would provide a more reliable system where a single bug doesn't take out the entire libvirt management daemon. It would allow for better security isolation of components. It would let session libvirtd use system daemons for networking & hostdev setup.

 - Adopt use of Go and gradually convert (all|most) our C code into Go. This would improve the reliability of libvirt, by giving us a memory safe language with garbage collection. It would improve productivity by letting us spend more time writing interesting code, rather than wasting time on platform portability or building basic abstractions for things like OO programming, hash tables, etc (much of the stuff we have in src/util); no more XML parsers needed (just annotated struct fields). It would increase the talent pool of potential contributors to libvirt by lowering the bar to getting work done.

To avoid this mail getting too long, I'll cover each area in a separate mail.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

The problem(s)
==============

While a hypervisor agnostic API is useful for some users, it is completely irrelevant, and potentially even painful, for other users. We made some concessions to this when we introduced hypervisor specific XML namespaces and the option for hypervisor specific add-on APIs. We tell apps these are all unsupported for production usage though. IOW, have this pony, but you can never play with it.

The hypervisor agnostic API approach inevitably took us in a direction where libvirt (or something below it) is in charge of managing the QEMU process lifecycle. We can't expose the concept of process management up to the client application because many hypervisors don't present virtual machines as UNIX processes, or merely have processes as a secondary concept, eg with Xen a QEMU process is just subservient to the main Xen guest domain concept. Essentially libvirt expects the application to treat the hypervisor / compute host as a black box and just rely on libvirt APIs for managing the virtual machines, because that is the only way to provide a hypervisor agnostic view of a compute host. This approach also gives parity of functionality regardless of whether the management app is on a remote machine, vs colocated locally with libvirtd.

Most of the large scale management applications have ended up with a model where they have a component on each compute host talking to libvirt locally over a UNIX socket, with TCP based access only really used for live migration. Thus the management apps have rarely considered the Linux OS to truly be a black box when dealing with KVM. To some degree they all peer inside the box, and wish to take advantage of some of the concepts Linux exposes to integrate with the hypervisor.

The inability to directly associate a client with the lifecycle of a single QEMU process has long been a source of frustration to libguestfs. The level of indirection forced by use of libvirtd does not map well to how libguestfs wants to use QEMU. Essentially libguestfs isn't trying to use QEMU in a system management scenario, but rather utilize it as an embedded technology component. As a result, libguestfs still has its own non-libvirt based way of spawning QEMU, which is often used in preference to its libvirt based impl. Other apps like libvirt-sandbox have faced the same.

When systemd came into existence, finally providing good mechanisms for process management on Linux machines, we found a tension between what libvirt wants to do and what systemd wants to do. The best we've managed is a compromise where libvirt spawns the guest, but then registers it with systemd. Users can't directly spawn QEMU guests with systemd and then manage them with libvirt. We've not seen people seriously try to manage QEMU guests directly with systemd, but it is fair to say that the combination of systemd and docker have taken away most potential users of libvirt's LXC driver, as apps managing containers don't want to treat the host as a black box; they want to have more direct control. The failure to get adoption of the LXC driver serves as a cautionary tale for what could happen to use of the libvirt QEMU driver in future.

More recently the increasing interest in use of containers is raising new interesting architectures for the management of processes. In particular the Kubernetes project can be considered to provide cluster-wide management of processes, aka k8s is systemd for data centers. Again there is interest in using Kubernetes to manage QEMU guests across the data center.
The KubeVirt project is attempting to bridge the conflicting world views of libvirt and Kubernetes to build a KVM management system to eventually replace both oVirt and OpenStack. The need to have libvirtd spawn the QEMU processes is causing severe complications for the KubeVirt architecture, causing them to seriously consider not using libvirt for managing KVM. This issue is a major blocking item for KubeVirt, to the extent that they may well have to abandon use of libvirt to get the process startup & resource integration model they need.

On Linux, as far as hypervisor technology is concerned, KVM has won the battles and the war. OpenStack user surveys have consistently put KVM/QEMU on top with at least one order of magnitude higher usage than any other technology. Amazon was always the major reference for usage of Xen in public cloud and even they appear to be about to pivot to KVM. IOW, while providing a hypervisor agnostic management API is still a core competency of libvirt, we need to embrace the reality that KVM is the de facto standard on Linux and better enable people to take advantage of its unique features, because that is where most of our userbase is.

A second example of limitations of the purely hypervisor agnostic approach to libvirt is the way our API design is fully synchronous. An application calling a libvirt API blocks until its execution is complete. This approach was originally driven by the need to integrate directly with various Xen backend APIs which were also mostly synchronous in design. Later we added other hypervisor targets which also exposed synchronous APIs. In parallel though, we added the libvirtd daemon for running stateful hypervisor drivers like QEMU, LXC, UML, and now Xen. We speak to this over an RPC system that can handle arbitrarily overlapping asynchronous requests, but then force it into our synchronous public API. For applications which only care about using KVM, the ability to use an asynchronous API could be very interesting as it would no longer force them to spawn large numbers of threads to get parallel API execution.

The Solution(s)
===============

Currently our long term public stability promise just covers the XML format and library API. To enable more interesting usage of hypervisor specific concepts it is important to consider how to provide other options beyond just the current API and XML formats. IOW, I'm not talking about making QMP command passthrough or CLI arg passthrough fully supported features, as libvirt's API & XML abstraction has clear value there. Rather I'm thinking about more architectural level changes.

In particular I want to try to break down the black box model of the host, to make it possible to exploit KVM's key distinguishing feature, which is that the guest is just a normal process. An application that knows how to spawn & reap processes should be able to launch KVM as if it was just another normal process. This implies that the application needs the option to handle the fork+exec of KVM, instead of libvirt, if it so wishes.

I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existence and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU. In this manner, the QEMU process automatically inherits all the characteristics of the application that invoked the 'libvirt-qemu' binary.
This means it shares the user / group ID, the security context, the cgroup placement, the set of kernel namespaces, etc. Libvirt would honour these characteristics by default, but also have the ability to further refine them. For example, it would honour the initial process CPU pinning, but could still further pin individual QEMU threads.

In the initial implementation I would anticipate that libvirtd still retains control over pretty much every other aspect of ongoing QEMU management, ie libvirtd still owns the monitor connection. This means there would be some assumptions / limitations in functionality in the short term, eg it might be assumed that while libvirtd & libvirt-qemu can be in different mount namespaces, they must nonetheless be able to see the same underlying storage in their respective namespaces. The next mail in this series, however, takes things further to move actual driver functionality into libvirt-qemu, at which point limitations around namespaces would be largely eliminated.

This design would solve the single biggest problem with managing QEMU from apps like libguestfs, systemd and KubeVirt. To avoid having 2 divergent launch processes, when libvirtd itself launches a QEMU process, it would have to use the same 'libvirt-qemu' shim to do so. This would ensure functional equivalence regardless of whether the management app used the hypervisor agnostic API, or instead used the QEMU specific approach of running "libvirt-qemu".

We made a crude attempt previously to allow apps to run their own QEMU and have it managed by libvirt, via the virDomainQemuAttach API. That API design was impossible to ever consider fully supported, because the mgmt application was still in charge of designing QEMU command line arguments, and it is impractical for libvirt to cope with an arbitrary set of args. With the new proposal, we're still using the current libvirt code for converting XML into QEMU args, so we have a predictable configuration for QEMU. Thus the new approach can provide a fully supported way for applications to spawn QEMU.

This concept of a "libvirt-qemu" shim is not all that far away from the current "libvirt-lxc" shim we have. With this in mind, it would also be desirable to make that a fully supported way to spawn LXC processes, which can then be managed by libvirt. This would make the libvirt LXC driver more interesting for people who wish to run containers (though it is admittedly too late to really recapture any significant usage from other container technologies).

As mentioned earlier, if an application is only concerned with managing KVM (or other stateful drivers running inside libvirtd), we have scope to be able to expose a fully asynchronous management API to applications. Such an undertaking would effectively mean creating an entirely new libvirt client library, to expose the asynchronous design, and we obviously have to keep the current library around long term regardless. Creating a new library would also involve creating new language bindings, which just adds to the work. Rather than undertake this massive amount of extra work, I think it is worth considering declaring the RPC protocol to be a fully supported interface for applications to consume. There are already projects which have re-implemented the libvirt client API directly on top of the RPC protocol, bypassing libvirt.so. We have always strongly discouraged this, but nonetheless it has happened. As we have to maintain strong protocol compatibility on the RPC layer, it is effectively a stable API already.
We cannot ever change it in an incompatible manner without breaking our own client library implementation. So declaring it a formally supported interface for libvirt would not really involve any significant extra work on our part, just acknowledgement of the existing reality. It would perhaps involve some documentation work to assist developers wishing to consume it though. We would also have to outline the caveats of taking such an approach, which principally involve losing the ability to use the stateless hypervisor drivers which all live in the libvirt library. This is not a real issue though, because the people building on top of the RPC protocol only care about KVM.

Another example where exposing a KVM specific model might help is wrt live migration, specifically the initial launch of QEMU on the target host. Our libvirt migration API doesn't give the application direct control over this part, which has caused apps like OpenStack to jump through considerable hoops when doing live migration. So just as an application should be able to launch the initial QEMU process, it should be able to directly launch it ready for incoming migration, and then trigger live migration to use this pre-launched virtual machine.

In general the concept is that although the primary libvirt.so API will still consider the virt host to be a black box, below this libvirt should not be afraid to open up the black box to applications to expose hypervisor specific details as fully supported concepts. Applications can opt-in to using this, or continue to solely use the hypervisor agnostic API, as best fits their needs.

Regards,
Daniel

On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:

  libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid);
  libvirt_xml_file = /* write libvirt_xml to a temporary file */;
  if (fork () == 0) {
    execlp ("libvirt-qemu", "libvirt-qemu",
            "--config", libvirt_xml_file, NULL);
  }
  dom = virDomainLookupByUUID (conn, uuid);

libvirt-qemu would exec(2) qemu?

Above I've assumed that we need to get a libvirt handle for ongoing interactions with the new qemu process. Would we get that via the name or UUID from the XML, ie. calling virDomainLookupByUUID? I guess there's some raciness here. The libvirt domain wouldn't exist immediately in the application process.

In general it does sound like a good plan, and solves a problem for libguestfs too.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org

On Wed, Nov 15, 2017 at 05:57:45PM +0000, Richard W.M. Jones wrote:
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:
libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid); libvirt_xml_file = /* write libvirt_xml to a temporary file */;
if (fork () == 0) { execlp ("libvirt-qemu", "libvirt-qemu", "--config", libvirt_xml_file, NULL); }
dom = virDomainLookupByUUID (conn, uuid);
libvirt-qemu would exec(2) qemu?
Yes, that is pretty much exactly what I am suggesting.
Above I've assumed that we need to get a libvirt handle for ongoing interactions with the new qemu process. Would we get that via the name or UUID from the XML, ie. calling virDomainLookupByUUID? I guess there's some raciness here. The libvirt domain wouldn't exist immediately in the application process.
The libvirt-qemu shim would register itself with libvirtd, and then libguestfs would have to speak to libvirtd for ongoing management. Though for the purposes of shutdown, it would be valid to just kill() the children directly if desired. In the second mail in this series, I describe a way to decompose libvirtd, whereupon ongoing management could be handled inside libvirt-qemu itself. That would potentially avoid the need for libguestfs to talk to libvirtd at all. Though this would be a secondary piece of work.

Initially we could avoid the raciness if libvirt-qemu implemented the systemd startup notification protocol. That would let libvirt-qemu notify libguestfs (or whoever spawns it) /after/ it has successfully registered itself with libvirtd. So you can then virDomainLookupByUUID without any race (ie you can then assume that if virDomainLookupByUUID fails, it means the new QEMU has already quit).

Regards,
Daniel
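A minimal sketch of what the shim side of that notification could look like, assuming libvirt-qemu links against libsystemd for sd_notify(); the register_with_libvirtd() and spawn_qemu() helpers are hypothetical stand-ins for the registration and exec steps described above:

  /* Sketch of libvirt-qemu startup: register with libvirtd first, signal
   * readiness via the systemd notification protocol, then spawn QEMU.
   * Build: gcc -o libvirt-qemu-sketch sketch.c $(pkg-config --cflags --libs libsystemd) */
  #include <stdio.h>
  #include <stdlib.h>
  #include <systemd/sd-daemon.h>

  /* Hypothetical stand-ins for the real registration / exec logic */
  static int register_with_libvirtd(const char *xmlfile)
  {
      /* ...connect to libvirtd, claim the guest name + UUID... */
      fprintf(stderr, "registering %s with libvirtd\n", xmlfile);
      return 0;
  }

  static int spawn_qemu(const char *xmlfile)
  {
      /* ...convert the XML to args and exec QEMU... */
      fprintf(stderr, "spawning QEMU for %s\n", xmlfile);
      return 0;
  }

  int main(int argc, char **argv)
  {
      const char *xmlfile = argc > 1 ? argv[1] : "guest.xml";

      if (register_with_libvirtd(xmlfile) < 0) {
          /* Exit before the parent ever sees READY=1, so a failed
           * virDomainLookupByUUID means the guest really is gone */
          sd_notify(0, "STATUS=registration with libvirtd failed");
          return EXIT_FAILURE;
      }

      /* Only now tell whoever spawned us (libguestfs, systemd, ...) that
       * the domain is visible to virDomainLookupByUUID */
      sd_notify(0, "READY=1");

      return spawn_qemu(xmlfile) < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
  }

On the other side, the spawning application would point NOTIFY_SOCKET at a datagram socket it listens on and wait for the "READY=1" message before calling virDomainLookupByUUID, which is how systemd itself consumes this protocol.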

On Wed, Nov 15, 2017 at 06:19:38PM +0000, Daniel P. Berrange wrote:
On Wed, Nov 15, 2017 at 05:57:45PM +0000, Richard W.M. Jones wrote:
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:
libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid); libvirt_xml_file = /* write libvirt_xml to a temporary file */;
if (fork () == 0) { execlp ("libvirt-qemu", "libvirt-qemu", "--config", libvirt_xml_file, NULL); }
dom = virDomainLookupByUUID (conn, uuid);
libvirt-qemu would exec(2) qemu?
Yes, that is pretty much exactly what I am suggesting.
Above I've assumed that we need to get a libvirt handle for ongoing interactions with the new qemu process. Would we get that via the name or UUID from the XML, ie. calling virDomainLookupByUUID? I guess there's some raciness here. The libvirt domain wouldn't exist immediately in the application process.
The libvirt-qemu would register itself with libvirtd, and then libguestfs would have to speak to libvirtd for ongoing management. Though for the purposes of shutdown, it would be valid to just kill() the children directly if desired. In the second mail in this series, I describe a way to decompose libvirtd, whereupon ongoing management could be handled inside libvirt-qemu itself. That would potentially avoid the need for libguestfs to talk to libvirtd at all. Though this would be a secondary piece of work
Initially we could avoid the raciness if libvirt-qemu implemented the systemd startup notification protocol. That would let libvirt-qemu notify libguestfs (or whoever spawns it), /after/ it has successfully registered itself with libvirtd. So you can then virDomainLookupByUUID without any race (ie you can then assume that if virDomainLookupByUUID fails, it means the new QEMU has already quit)
I like the whole idea. I'm replying here because this is the most relevant part of this particular sub-thread. Would it be too much for us to go beyond this and offer more functionality without actually talking to the daemon? Let's say we:

- return the UUID instead of requiring it
- allow having more signal handlers than for just SIGTERM
- maybe add some simple protocol that the libvirt-qemu shim would implement on stdin/out

These three things would make the shim usable for things not compiled with libvirt at all, maybe even users. I'm not saying this must be something we strive for from day one, just something we could consider not forbidding.

On Sun, Nov 19, 2017 at 10:21:32PM +0100, Martin Kletzander wrote:
On Wed, Nov 15, 2017 at 06:19:38PM +0000, Daniel P. Berrange wrote:
On Wed, Nov 15, 2017 at 05:57:45PM +0000, Richard W.M. Jones wrote:
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:
libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid); libvirt_xml_file = /* write libvirt_xml to a temporary file */;
if (fork () == 0) { execlp ("libvirt-qemu", "libvirt-qemu", "--config", libvirt_xml_file, NULL); }
dom = virDomainLookupByUUID (conn, uuid);
libvirt-qemu would exec(2) qemu?
Yes, that is pretty much exactly what I am suggesting.
Above I've assumed that we need to get a libvirt handle for ongoing interactions with the new qemu process. Would we get that via the name or UUID from the XML, ie. calling virDomainLookupByUUID? I guess there's some raciness here. The libvirt domain wouldn't exist immediately in the application process.
The libvirt-qemu would register itself with libvirtd, and then libguestfs would have to speak to libvirtd for ongoing management. Though for the purposes of shutdown, it would be valid to just kill() the children directly if desired. In the second mail in this series, I describe a way to decompose libvirtd, whereupon ongoing management could be handled inside libvirt-qemu itself. That would potentially avoid the need for libguestfs to talk to libvirtd at all. Though this would be a secondary piece of work
Initially we could avoid the raciness if libvirt-qemu implemented the systemd startup notification protocol. That would let libvirt-qemu notify libguestfs (or whoever spawns it), /after/ it has successfully registered itself with libvirtd. So you can then virDomainLookupByUUID without any race (ie you can then assume that if virDomainLookupByUUID fails, it means the new QEMU has already quit)
I like the whole idea. I'm replying here because this is the most relevant part of this particular sub-thread. Would it be too much for us to go beyond this and offer more functionality without actually talking to the daemon? Let's say we:
- return the UUID instead of requiring it
- allow having more signal handlers than for just SIGTERM
- maybe add some simple protocol that libvirt-qemu shim would implement on stdin/out
these three things would provide the shim usable for things not compiled with libvirt at all, maybe even users. I'm not saying this must be something we strive for from day one, just something we could consider not forbidding.
I guess there's a distinction between what the app has to do vs what the shim has to do. At least in the short-medium term, the shim itself would still need to communicate with libvirtd (or other related daemons) if it needs to resolve other objects, eg resolve the "default" virtual network.

The application does not necessarily need to talk to libvirtd though. It would be enough to spawn the libvirt-qemu shim, query stats out of the cgroup directly, and then kill() the shim when done. If you want to dynamically make changes to the QEMU config on the fly though, then we need some kind of API, and I'm not seeing a compelling reason to change the API we currently provide apps. It could, however, be possible for the libvirt.so remote driver to connect directly to the shim to do its work.

Regards,
Daniel
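A rough sketch of that daemon-free usage pattern, assuming the spawning application already knows the shim's PID from fork() and its cgroup placement; the /sys/fs/cgroup path shown is a hypothetical cgroup v2 style layout, not anything libvirt defines:

  /* Sketch: manage a libvirt-qemu shim with no daemon involved at all -
   * read usage stats straight from its cgroup, then kill() it when done. */
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  /* Hypothetical cgroup v2 path for the shim; in reality it would simply be
   * whatever cgroup the invoking application happens to be running in. */
  #define SHIM_CGROUP "/sys/fs/cgroup/myapp/guest1.scope"

  static void print_cpu_stats(void)
  {
      char line[256];
      FILE *fp = fopen(SHIM_CGROUP "/cpu.stat", "r");
      if (!fp)
          return;
      while (fgets(line, sizeof(line), fp))
          printf("%s", line);          /* usage_usec, user_usec, system_usec */
      fclose(fp);
  }

  int main(int argc, char **argv)
  {
      if (argc < 2)
          return EXIT_FAILURE;
      pid_t shim_pid = (pid_t)atoi(argv[1]);   /* PID obtained from fork() */

      print_cpu_stats();

      /* Done with the guest - ask the shim to shut down, then reap it */
      kill(shim_pid, SIGTERM);
      waitpid(shim_pid, NULL, 0);
      return EXIT_SUCCESS;
  }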

On 11/15/2017 06:57 PM, Richard W.M. Jones wrote:
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:
libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid); libvirt_xml_file = /* write libvirt_xml to a temporary file */;
if (fork () == 0) { execlp ("libvirt-qemu", "libvirt-qemu", "--config", libvirt_xml_file, NULL);
The problem with this is that the libvirt-qemu binary would need a connection object so that ...
}
dom = virDomainLookupByUUID (conn, uuid);
... it registers the domain under the @conn connection (= it needs to register the domain at the right libvirtd [or whoever is going to keep the list of running domains]).

Michal

On Thu, Nov 23, 2017 at 11:32:13AM +0100, Michal Privoznik wrote:
On 11/15/2017 06:57 PM, Richard W.M. Jones wrote:
On Tue, Nov 14, 2017 at 05:25:03PM +0000, Daniel P. Berrange wrote:
I would anticipate a standalone process "libvirt-qemu" that an application can spawn, providing a normal domain XML file via the command line or stdin. It would then connect to libvirtd to register its existance and claim its ownership of the guest name + UUID. Assuming that succeeds, 'libvirt-qemu' would directly spawn QEMU.
To be really clear about this, the application would run something like:
libvirt_xml = sprintf ("<domain><uuid>%s</uuid> etc etc", uuid); libvirt_xml_file = /* write libvirt_xml to a temporary file */;
if (fork () == 0) { execlp ("libvirt-qemu", "libvirt-qemu", "--config", libvirt_xml_file, NULL);
Problem with this is libvirt-qemu binary would need a connection object so that ...
That's not really a problem - I certainly expected that to be the case. There would be a '--connect URI' arg, but since this is a QEMU specific process, it could sensibly default to qemu:///session or qemu:///system as appropriate.
}
dom = virDomainLookupByUUID (conn, uuid);
... it registers the domain under @conn connection (= it needs to register the domain at the right libvirtd [or whomever is going to keep list of running domains]).
Yes.

Regards,
Daniel
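A tiny sketch of that "as appropriate" defaulting, mirroring how existing libvirt tooling tends to pick a URI based on whether it runs as root; the helper name and flag handling are hypothetical:

  /* Sketch: choose a default connection URI for the libvirt-qemu shim
   * when no explicit --connect arg was given. */
  #include <stdio.h>
  #include <unistd.h>

  static const char *default_uri(const char *connect_arg)
  {
      if (connect_arg)                 /* explicit --connect URI wins */
          return connect_arg;
      return geteuid() == 0 ? "qemu:///system" : "qemu:///session";
  }

  int main(void)
  {
      printf("connecting to %s\n", default_uri(NULL));
      return 0;
  }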

On 11/14/2017 06:25 PM, Daniel P. Berrange wrote:
The problem(s) ==============
As mentioned earlier, if an application is only concerned with managing of KVM (or other stateful drivers running inside libvirtd), we have scope to be able to expose a fully asynchronous management API to applications. Such an undertaking would effectively mean creating an entirely new libvirt client library, to expose the asynchronous design, and we obvious have to keep the current library around long term regardless. Creating a new library would also involve creating new language bindings, which just adds to the work. Rather than undertake this massive amount of extra work, I think it is worth considering declaring the RPC protocol to be a fully supported interface for applications to consume.
I don't think this is a good idea.
There are already projects which are re-implemented the libvirt client API directly ontop of the RPC protocol, bypassing libvirt.so. We have always strongly discouraged this, but none the less it has happened. As we have to maintain strong protocol compatibility on the RPC layer, it is effectively a stable API already.
And we discourage that for a reason. For instance, our client library does some checks before anything hits the RPC layer, or sets up state (remoteAuthSASL()). Also, the client library encapsulates two or more calls into a single function: remoteDomainCreate() for instance.

Michal

On Thu, Nov 23, 2017 at 11:32:19AM +0100, Michal Privoznik wrote:
On 11/14/2017 06:25 PM, Daniel P. Berrange wrote:
The problem(s) ==============
As mentioned earlier, if an application is only concerned with managing of KVM (or other stateful drivers running inside libvirtd), we have scope to be able to expose a fully asynchronous management API to applications. Such an undertaking would effectively mean creating an entirely new libvirt client library, to expose the asynchronous design, and we obvious have to keep the current library around long term regardless. Creating a new library would also involve creating new language bindings, which just adds to the work. Rather than undertake this massive amount of extra work, I think it is worth considering declaring the RPC protocol to be a fully supported interface for applications to consume.
I don't think this a good idea.
NB, docs declaring this supported were already acked & merged: https://libvirt.org/support.html#rpcproto
There are already projects which are re-implemented the libvirt client API directly ontop of the RPC protocol, bypassing libvirt.so. We have always strongly discouraged this, but none the less it has happened. As we have to maintain strong protocol compatibility on the RPC layer, it is effectively a stable API already.
And we discourage that for a reason. For instance, our client library does some checks before anything hits the RPC layer, or sets up a state (remoteAuthSASL()). Also, the client library encapsulates two or more calls into a single function: remoteDomainCreate() for instance.
Nothing in the server can rely on the client library performing checks before the RPC is sent, so those client side checks are at best an optimization to get quicker / clearer error messages. An alternative impl would of course need to implement SASL / TLS in a compatible manner.

The remoteDomainCreate() impl was a hack, because we forgot that we needed to update the virDomainPtr with the ID value. An app implementing the client directly will not have any virDomainPtr object to update. They'll have their own concept there, which may not even care about having an ID value cached locally - I certainly wouldn't bother if we wrote libvirt again - using the UUID exclusively is a much better choice. So they need not have multiple RPC calls in the same place.

On the other hand, since the app is not constrained by the need to follow the libvirt public client API, they may well write their client impl in a manner that doesn't have a 1-1 mapping to RPC messages - the 1-1 mapping is merely how we chose to do it for libvirt.so. They could write something higher level that triggers many RPC calls for one application call if they so desire.

Regards,
Daniel

On 11/23/2017 11:42 AM, Daniel P. Berrange wrote:
On Thu, Nov 23, 2017 at 11:32:19AM +0100, Michal Privoznik wrote:
On 11/14/2017 06:25 PM, Daniel P. Berrange wrote:
The problem(s) ==============
As mentioned earlier, if an application is only concerned with managing of KVM (or other stateful drivers running inside libvirtd), we have scope to be able to expose a fully asynchronous management API to applications. Such an undertaking would effectively mean creating an entirely new libvirt client library, to expose the asynchronous design, and we obvious have to keep the current library around long term regardless. Creating a new library would also involve creating new language bindings, which just adds to the work. Rather than undertake this massive amount of extra work, I think it is worth considering declaring the RPC protocol to be a fully supported interface for applications to consume.
I don't think this a good idea.
NB, docs declaring this supported were already acked & merged:
Oh, how did that one slip in? :-)
There are already projects which are re-implemented the libvirt client API directly ontop of the RPC protocol, bypassing libvirt.so. We have always strongly discouraged this, but none the less it has happened. As we have to maintain strong protocol compatibility on the RPC layer, it is effectively a stable API already.
And we discourage that for a reason. For instance, our client library does some checks before anything hits the RPC layer, or sets up a state (remoteAuthSASL()). Also, the client library encapsulates two or more calls into a single function: remoteDomainCreate() for instance.
Nothing in the server can rely on the client library performing checkins before the RPC is sent, so those client side checks are at best an optimization to get quicker / clearer error message. An alternative impl would of course need to implement SASL / TLS in the compatible manner.
Sure. Receiver (=server) has to do the checks itself.
The remoteDomainCreate() impl was a hack, because we forgot that we needed to update the virDomainPtr with the ID value. An implementing the client directly will not have any virDomainPtr object to update. They'll have their own concept there, which may not even care about having an ID value cached locally - I certainly wouldn't bother if we wrote libvirt again - using the UUID exclusively is a much better choice. So they need not have multiple RPC calls in the same place. On the other hand, since the app is not constrained by the need to follow the libvirt public client API, then may well write their client impl in a manner that doesn't have a 1-1 mapping to RPC messages - the 1-1 mapping is merely how we chose todo it for libvirt.so. They could write something higher level that triggers many RPC calls for one application call if they so desire.
Okay. Makes sense. But just to be clear - we will expose our *public* RPC (which can be mapped onto our public APIs), but NOT expose our *private* RPC, which is the one used to communicate between two daemons (say virthypervisord and virnetworkd), right?

Moreover, if we require all the daemons to be the same version we can consider the private RPC unstable and we can change it as we please. E.g. virthypervisord calls a function from virstoraged to determine a backing chain. If we need to add a new argument, we can just change the RPC instead of introducing a new v2 of the call.

Michal

On Thu, Nov 23, 2017 at 04:58:55PM +0100, Michal Privoznik wrote:
On 11/23/2017 11:42 AM, Daniel P. Berrange wrote:
The remoteDomainCreate() impl was a hack, because we forgot that we needed to update the virDomainPtr with the ID value. An implementing the client directly will not have any virDomainPtr object to update. They'll have their own concept there, which may not even care about having an ID value cached locally - I certainly wouldn't bother if we wrote libvirt again - using the UUID exclusively is a much better choice. So they need not have multiple RPC calls in the same place. On the other hand, since the app is not constrained by the need to follow the libvirt public client API, then may well write their client impl in a manner that doesn't have a 1-1 mapping to RPC messages - the 1-1 mapping is merely how we chose todo it for libvirt.so. They could write something higher level that triggers many RPC calls for one application call if they so desire.
Okay. Makes sense. But just to be clear - we will expose our *public* RPC (which can be mapped onto our public APIs), but NOT expose our *private* RPC which is the one used to communicate between two daemons (say virthypervisord and virnetworkd), right? Moreover, if we require all the daemons to be the same version we can consider the private RPC unstable and we can change it as we please. E.g. virthypervisord calls a function from virstoraged to determine blocking chain. If we need to add a new argument, we can just change the RPC instead of introducing new v2 of the call.
Agreed, I would only want our primary RPC protocol considered supported, never any of the inter-daemon communication protocols, which should remain subject to change - this is another good reason to only ever support UNIX domain sockets in these modular daemons, as it explicitly means we don't have to care about compat across libvirt versions.

Regards,
Daniel

The problem(s)
==============

The libvirtd architecture has evolved over time, initially as an expedient solution to the problem of managing virtual networks and QEMU processes, and over time it came to control all the other resources too. It is only avoided in the case of the stateless hypervisor drivers which talk to remote RPC systems (VMware ESX, HyperV, etc). We later introduced the concepts of loadable modules, and separate daemons for locking and logging, because of the key requirement that the latter services be re-exec()able while VMs are running.

Despite the existence of virtlogd & virtlockd, the libvirtd daemon is clearly using the monolithic service model. This has a direct impact on both the reliability and security of libvirtd. QEMU has the nice characteristic that since it is just a regular process, if one QEMU goes bad, the other QEMUs continue to operate normally. Libvirtd then throws away this advantage, by introducing an architecture where if one QEMU goes bad, it can easily impact all other QEMU processes. This can either be due to libvirtd crashing, preventing mgmt of all resources, or due to a rogue QEMU giving libvirtd so much work to do that other jobs get starved.

When we first hit this we introduced multithreading inside libvirtd, which did help, but made life more complicated. We then saw bottlenecks on the QEMU driver level locks and had to switch to a lockless driver, with just the VM locks. We then also had to introduce the job concept, and then the async job concept, to allow APIs to complete while the monitor is being used. There are still concurrency problems in this area, for example QMP event processing in the main thread can block other API calls and keepalives for arbitrary amounts of time. It is worse though, because a problem in other areas of libvirtd related to storage, networking, node devices, and so on can also impact the ability to manage QEMU and vice-versa. This is inherent in the monolithic design of libvirtd where a single daemon does everything. There are 100's of 1000's of lines of complex code, and a single bug can impact everything inside libvirtd.

The monolithic model is bad for security too. Given the broad set of features supported by libvirtd it is impossible to write any meaningful SELinux policy to lock down its capabilities, unless you're willing to simply block large feature sets. What is worse is that many of these features require root privileges, so libvirtd as a whole needs to run as root, and has no security confinement. Libvirtd meanwhile has to directly interact with non-trusted components such as the QEMU monitor console, so its security is paramount to preventing a malicious QEMU from escaping its confinement. To the best of my knowledge no one has tried to break out of QEMU by attacking libvirtd via QMP, but that's probably just because they've not told us.

The final problem with libvirtd is the split between system and session mode. We've long told people that session mode is for desktop virt and system mode is for server virt, but this simple explanation of roles fails in the real world. It has been a source of pain for libguestfs for example, which wants to be able to simply run QEMU with the same rights as the application which invokes libguestfs. The system vs session distinction means it often hits problems where the app using libguestfs can read the disk file, but QEMU launched by libvirtd on libguestfs' behalf cannot read it. Then there is the fact that with session mode, network connectivity is a disaster.
We hacked around this by using a setuid helper, which lets the admin grant a user the ability to access a specific bridge device. The mgmt app though is locked out of all the virtual network management APIs with the session instance. The conceptual model here is really wrong. Just because you want to have the QEMU processes running under the unprivileged user doesn't imply that you want the network management APIs under the same user account. In retrospect, simply duplicating the privileged libvirtd functionality in a non-privileged libvirtd was a clear mistake. Some areas of functionality inherently require a privileged environment and should only ever have run inside the root libvirtd.

The solution(s)
===============

As noted above, we made some baby-steps towards a modular daemon architecture when we introduced virtlockd and virtlogd. It is now time to fully commit to a modular design and explode libvirtd into a swarm of daemons each responsible for a clearly demarcated task. Such a decomposition would naturally fall across the internal driver boundaries, giving a virtnwfilterd, virtnetworkd, virtstoraged, virtnodedevd, etc.

We have to maintain compatibility with our existing client API implementation though. The libvirtd daemon would still have to accept connections from the client and route the RPC requests onto the relevant modular daemon. We could also enhance the client API to directly know how to connect to the modular daemons, bypassing libvirtd. If we restricted the modular daemons to only concern themselves with local UNIX domain socket usage, we could then provide libvirtd as the bridge to remote TCP access, and for backcompat with legacy client library impls.

  [app] -> [libvirt.so] -> [libvirtd]

becomes

  [app] -> [libvirt.so] -> [virthypervisord]
                        +> [virtnetworkd]
                        +> [virtstoraged]
                        ...etc

With this more modular design, we now have the flexibility to make the non-root libvirt usage more usable in the real world. For example, desktop virt can now use a non-root virthypervisord to manage QEMU processes under the local user, but connect to the privileged virtnetworkd to see the network connectivity. The non-root virthypervisord would also talk to virtnetworkd to acquire a TAP device for the guest during startup, with the FD being passed back across the UNIX socket. This gives us finer grained access control options, where we can selectively require the root password depending on the featureset the guest is requesting. For example, non-root libvirt could require the root password in order to acquire access to a vGPU device from the privileged virtnodedevd.

The modular design also potentially unlocks the functionality of libvirt so that it can be used in isolation. For example, there are scenarios where a management application may wish to use the storage pools API to manage a pool of disk images but doesn't need anything related to the hypervisor. Currently you're forced to have a hypervisor driver present in libvirtd to get a connection, even if you'll never use it.

Even with a virthypervisord separated out from libvirtd, it is still effectively a monolithic design from the POV of the hypervisor components. So a problem in interacting with any single QEMU process still has the potential to negatively impact our ability to manage other QEMU processes. And of course a code bug that causes a crash takes out the ability to manage everything. The previous mail describes a change to introduce a 'libvirt-qemu' shim to manage startup for an individual QEMU process.
Once this shim process exists, the obvious question to ask is whether it can take responsibility for ongoing management of the QEMU process, essentially owning the monitor connection. A very large portion of the virDomain related APIs are naturally scoped to only operate on a single QEMU process. Essentially they invoke monitor APIs and get responses, acting as a transformation layer between the libvirt API/XML format and the QMP format. Their implementation does, however, often touch global state when dealing with acquisition of shared resources such as PCI devices, network devices, etc. The allocation of such shared state should be the responsibility of the individual daemons though (virtnodedevd, virtnetworkd, etc).

With all this in mind, it would be possible to move the bulk of individual QEMU management into the 'libvirt-qemu' shim. The virthypervisord would essentially act as an aggregation service and registry. It would handle the APIs that deal with bulk querying of resources, and ensuring uniqueness of domain UUIDs and names, etc. Any functional operations on individual guests would simply be passed on to the respective 'libvirt-qemu' shim.

  [app] -> [libvirt.so] -> [virthypervisord] -> [libvirt-shim] -> [qemu]
                                             +> [libvirt-shim] -> [qemu]
                                             +> [libvirt-shim] -> [qemu]
                                             +> [libvirt-shim] -> [qemu]

One might suggest that this would just inherit all the same problems the current libvirtd has, just with the QMP monitor interaction replaced by RPC calls. The key difference here though is that when libvirtd deals with QEMU it is forced to call into the synchronous libvirt.so public API to execute individual API calls. This forced libvirtd to take the approach of creating many worker threads to execute blocking APIs. By contrast, when the virthypervisord daemon calls into the 'libvirt-shim' to perform a command, it would directly use the low level RPC APIs we have. This would enable it to implement a fully asynchronous approach and not require a big pool of worker threads that block. While it would not magically solve all scalability problems, it would be a less complex internal code flow with less juggling of threads. More importantly, a bug in any of the QEMU driver logic relating to QMP would only affect that single 'libvirt-qemu' process, which improves the overall system reliability and potentially offers a more secure system.

Regards,
Daniel
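The TAP device hand-off mentioned above relies on passing an open file descriptor between two processes over a UNIX domain socket. A minimal sketch of the underlying mechanism (SCM_RIGHTS ancillary data), not of libvirt's actual inter-daemon protocol; the demo passes stdout over a socketpair as a stand-in for a real TAP fd:

  /* Sketch: hand an already-open fd to a peer process over a connected
   * UNIX domain socket using SCM_RIGHTS ancillary data. */
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>
  #include <unistd.h>

  /* Send one file descriptor over a connected UNIX socket. */
  static int send_fd(int sock, int fd_to_send)
  {
      char byte = 'F';                       /* at least 1 byte of real data */
      struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
      char cbuf[CMSG_SPACE(sizeof(int))];
      struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
      };

      struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;          /* fd passing */
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

      return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
  }

  /* Receive one file descriptor from the socket, or -1 on error. */
  static int recv_fd(int sock)
  {
      char byte;
      struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
      char cbuf[CMSG_SPACE(sizeof(int))];
      struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
      };
      int fd = -1;

      if (recvmsg(sock, &msg, 0) < 0)
          return -1;

      struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
      if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
          memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
      return fd;
  }

  int main(void)
  {
      int sv[2];
      if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
          return 1;

      /* Stand-in for a TAP fd: pass our own stdout across the pair */
      if (send_fd(sv[0], STDOUT_FILENO) < 0)
          return 1;

      int tapfd = recv_fd(sv[1]);
      if (tapfd >= 0) {
          write(tapfd, "got the fd\n", 11);
          close(tapfd);
      }
      close(sv[0]);
      close(sv[1]);
      return 0;
  }

In the decomposed design, virtnetworkd would be the sender (having created and configured the TAP device with its privileges) and the non-root virthypervisord or shim would be the receiver, which then hands the fd on to QEMU.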

On 11/14/2017 06:26 PM, Daniel P. Berrange wrote:
The solution(s) ===============
As noted above, we made some baby-steps towards a modular daemon architecture when we introduced virtlockd and virlogd. It is now time to fully commit to a modular design and explode libvirtd into a swarm of daemons each responsible for a clearly demarked task. Such a decomposition would naturally fall across the internal driver boundaries, giving a virtnwfilterd, virtnetworkd, virtstoraged, virtnodedevd, etc. We have to maintain compatibility with our existing client API implementation though. This libvirtd would have to still accept connections from the client and route the RPC request directly onto the modular daemon. We could also enhance the client API to directly know how to connect to the modular daemons, bypassing libvirtd. If we restricted the modular daemons to only concern themselves with local UNIX domain socket usage, we could then provide libvirtd as the bridge to remote TCP access, and for backcompat with legacy client library impls.
[app] -> [libvirt.so] -> [libvirtd]
becomes
[app] -> [libvirt.so] -> [virthypervisord] +> [virtnetworkd] +> [virtstoraged] ...etc
So what about remote connections? Say hostA is running my KVMs and hostB is where the mgmt app lives. If hostB is connecting to hostA's libvirt I guess it's still going to be libvirtd which will then multiplex RPC calls and redirect them to the correct daemon? IOW, if hostB calls virStoragePoolGetXMLDesc() how is this request going to end up at hostA's virtstoraged?

Michal

On Thu, Nov 23, 2017 at 11:32:14AM +0100, Michal Privoznik wrote:
On 11/14/2017 06:26 PM, Daniel P. Berrange wrote:
The solution(s) ===============
As noted above, we made some baby-steps towards a modular daemon architecture when we introduced virtlockd and virlogd. It is now time to fully commit to a modular design and explode libvirtd into a swarm of daemons each responsible for a clearly demarked task. Such a decomposition would naturally fall across the internal driver boundaries, giving a virtnwfilterd, virtnetworkd, virtstoraged, virtnodedevd, etc. We have to maintain compatibility with our existing client API implementation though. This libvirtd would have to still accept connections from the client and route the RPC request directly onto the modular daemon. We could also enhance the client API to directly know how to connect to the modular daemons, bypassing libvirtd. If we restricted the modular daemons to only concern themselves with local UNIX domain socket usage, we could then provide libvirtd as the bridge to remote TCP access, and for backcompat with legacy client library impls.
[app] -> [libvirt.so] -> [libvirtd]
becomes
[app] -> [libvirt.so] -> [virthypervisord] +> [virtnetworkd] +> [virtstoraged] ...etc
So what about remote connections? Say hostA is running my KVMs and hostB is where mgmt app lives. If hostB is connecting to hostA's libvirt I guess it's still going to be libvirtd which will then multiplex RPC calls and redirect them to correct daemon? IOW, if hostB calls virStoragePoolGetXMLDesc() how is this request going to end up at hostA's virstoraged?
Yes, for back-compat, we must always have the single libvirtd around and capable of acting as a pass-through proxy, even in the local-only case. For the remote case, I think it is compelling to have libvirtd there forever, such that libvirtd is the only daemon that uses TCP, TLS, etc. The modular daemons would be UNIX domain sockets only. This gives a strong barrier between the functional areas and the remote RPC access.

Regards,
Daniel

The Problem(s)
==============

When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time, 2005, aside from C the common choices were Java, Python and Perl. Java was way too heavy for a low level system component, Python was becoming popular but not widely used for low level system services, and Perl was on a downward trend. None of them produce libraries that are accessible to arbitrary other languages, without providing an RPC based API service. As it turns out libvirt did end up having an RPC based approach for many virt drivers, but the original approach was to be a pure library component. IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly.

It has long been accepted that C is a very challenging language to write "safe" applications in. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular the lack of a safe memory management framework leads to memory leaks, double frees, stack or heap corruption and more. The lack of strict type safety just compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from the fact that even the best programmers will continually screw up memory management, leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals. It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherently unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse, with countless high profile security bugs a direct result of the choice to use C. Given the threats faced today, one has to seriously consider the wisdom of writing any new system software in C. In another 10 years time, it would not surprise me if any system software still using C is considered an obsolete relic, and ripe for a rewrite in a memory safe language.

There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able to do a good job in C, compared to those who know higher level languages like Python/Java. A programmer can write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done ok despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by use of C. Move forward another 10 years, and while C will certainly exist, I struggle to imagine the talent pool being larger. On the contrary, I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.

We push up against the boundaries of what's sane to do in C in other ways too. For portability across operating systems, we have to rely on GNULIB to try to sanitize the platform inconsistencies where we use POSIX, and assume that any 3rd party libraries we use have done likewise.
Even then, we've tried to avoid using the platform APIs because their designs are often too unsafe to risk using directly (strcat, malloc, free), or are not thread safe (APIs lacking _r variants). So we build our own custom C platform library on top of the base POSIX system, re-inventing the same wheel that every other project written in C invents. Every time we have to do work at the core C platform level, it is diverting time away from doing work managing higher level concepts. Our code is following an object oriented design in many areas, but such a notion is foreign to C, so we have to bolt a poor man's OO framework on the side. This feeds back into the memory safety problem, because our OO invention cannot be type checked reliably at compile time, making it easy to do unsafe things with objects. It relies on reference counting because there's no automatic memory management. The other big trend of the past 10 years has been the increase in CPU core counts. My first libvirt dev machine had 1 physical CPU with no cores or threads or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12 logical CPUs. Common server machines have 32/64 logical CPUs, and high end has 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and entry level with mere 100's. IOW good concurrency is going to be key for any scalable application. Libvirt is actually doing reasonably well in this respect via our heavily threaded libvirtd daemon. It is not without cost though, with ever more complex threading & locking models, which still have scalability problems. Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between jobs we need to process and threads we have available.
The Solution(s) ===============
Two fairly recent languages, Go & Rust, have introduced new credible options for writing systems applications without sacrificing the performance of C, while achieving the kind of ease of use / speed of development seen with languages like Python. It goes without saying that both of them are memory safe languages, immediately solving the biggest risk of using C / C++. The particularly interesting & relevant innovation of Go is the concept of Goroutines for concurrent programming, which provide a hybrid kernel/userspace threading model. This lowers the overhead of concurrency to the point where you can consider spawning a new goroutine for each logical job. For example, instead of having a single thread or limited pool of threads servicing all QEMU monitor sockets & API clients, you can afford to have a new goroutine dedicated to each monitor socket and API client. That has the potential to dramatically simplify use of concurrency while at the same time allowing the code to make even better use of CPUs with massive core counts. It of course provides a cross platform portable core library of features, and has a massive ecosystem of developers providing further 3rd party libraries for a wide variety of features. This means developers can focus more time on solving the interesting problems in their application space. The Go code is still low level enough that it can interface with C code easily. FFI calls to C APIs can be made inline in the Go code, with no need to switch out to write a low level binding in C itself.
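To make the goroutine-per-job point concrete, here is a minimal, hypothetical sketch (not libvirt code; the socket path and trivial request/response protocol are invented purely for illustration) of a server that dedicates one goroutine to every client connection instead of tuning an M:N thread pool:

    package main

    import (
        "bufio"
        "log"
        "net"
    )

    // handleClient services exactly one client; we simply spawn one
    // goroutine per connection rather than sizing a shared thread pool.
    func handleClient(conn net.Conn) {
        defer conn.Close()
        scanner := bufio.NewScanner(conn)
        for scanner.Scan() {
            // Placeholder protocol: acknowledge each request line.
            conn.Write([]byte("OK\n"))
        }
    }

    func main() {
        // Hypothetical UNIX socket path, for illustration only.
        ln, err := net.Listen("unix", "/run/example-virtd.sock")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go handleClient(conn) // one goroutine per client
        }
    }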
In many ways, Go can be said to have the ease of use, fast learning & safety of Python, combined with the expressiveness of C. IOW it is a better C than C. I don't have direct experience in Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features to Rust that can be important to some apps. In particular it does not use garbage collection, instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite a requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which will introduce performance overhead is a theme of Rust. The cost of such an approach is that development has a higher learning curve and ongoing cost in Rust, as compared to Go. I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt. The obvious question / difficulty is deciding how to adopt usage of a new language, without throwing everything away and starting from scratch. It needs to be possible for contributors to continue working on every other aspect of the project while adoption takes place over the long term. Blocking ongoing feature work for prolonged periods of time is not acceptable. There is also a question of scope of the work. A possible target would be to aim for 100% elimination of C in N years time (for a value of N that is certainly greater than 5, possibly as much as 10). There is a question of whether that is a good use of resources, and even whether it is practical. In terms of management of KVM guests the bulk of ongoing development work, and complexity, is in the libvirtd daemon. The libvirt.so library merely provides the remote driver client which is largely stable & unchanging. So with this in mind the biggest benefits would be in tackling the daemon part of the code where all the complexity lives. As mentioned earlier, Go has a very effective FFI mechanism for calling C code from Go, and also allows Go code to be called from C. There are some caveats to be aware of with passing data between the languages; however, generally it is necessary to copy data structures as C code is not permitted to dereference pointers that are owned by the Go GC system. There are two possible approaches to take, which can be crudely described as top down, or bottom up. In the top down approach, the C file providing the main() method gets replaced by a Go file providing an equivalent main() method, which then simply does an FFI call to the existing libvirt C APIs to run the code. For example it would just call virNetServer APIs to setup the RPC layer. Effectively have a Go program where 90% of the code is an FFI call to existing libvirt C code. Then we would gradually iterate downwards converting increasing areas of C code to Go code.
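As a rough illustration of how such inline cgo FFI calls look (this sketch uses the public libvirt API rather than the internal virNetServer APIs, and is only meant to show the mechanism of a Go main() driving existing C code), a top-down conversion might start out like this:

    package main

    // #cgo pkg-config: libvirt
    // #include <stdlib.h>
    // #include <libvirt/libvirt.h>
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        uri := C.CString("qemu:///system")
        defer C.free(unsafe.Pointer(uri))

        // Inline FFI calls into the existing C library; no hand-written
        // binding layer is needed.
        conn := C.virConnectOpenReadOnly(uri)
        if conn == nil {
            panic("unable to connect")
        }
        defer C.virConnectClose(conn)

        var ver C.ulong
        if C.virConnectGetLibVersion(conn, &ver) == 0 {
            fmt.Println("libvirt library version:", ver)
        }
    }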
In the bottom up approach, the program remains a C program, but we build .a files containing Go code for core pieces of functionality. The C code can thus call into this archive and end up executing Go code for certain pieces. Then we would gradually iterate upwards converting increasing areas of C code to Go code, until eventually reaching the top main() method. Or a hybrid of both approaches can be taken. Whichever way is chosen, it is going to be a long process with many bumps in the road. The best way to start, however, is probably to focus on a simple self-contained area of libvirt code. Specifically attack the virtlockd, and/or virtlogd daemons, converting them to use Go. This still need not be done in a "big bang". A first phase would be to develop the server side framework for handling our RPC protocol deserialization. This could then just dispatch RPC calls to the existing C impls. As a second phase, the RPC method impls would be converted to Go. Both of these daemons are small enough that the conversion would be possible across the time of a couple of releases. The hardest bit is likely ensuring compatibility for the re-exec() upgrade model they support, but this is nonetheless doable. The lessons learned in this would go a long way towards informing the best way to tackle the bigger task of the monolithic libvirtd (or equivalently the swarm of daemons the previous proposal suggests). Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
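For the bottom up direction mentioned above, the mechanism would look roughly like the following hypothetical sketch: a small piece of functionality is implemented in Go, built with 'go build -buildmode=c-archive -o libexample.a example.go', and the resulting .a file plus generated header is linked into the existing C program, which then calls the exported symbol like any other C function (the function name and check are invented for illustration, not real libvirt code):

    package main

    import "C"

    import "strings"

    //export exampleValidateDomainName
    func exampleValidateDomainName(name *C.char) C.int {
        // Hypothetical helper implemented in Go but callable from C code
        // that links against the generated c-archive.
        n := C.GoString(name)
        if n == "" || strings.ContainsAny(n, "/\n") {
            return -1
        }
        return 0
    }

    // A main() is required for -buildmode=c-archive even though it is unused.
    func main() {}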

Hello Daniel, thank you for this interesting insight. The future-proof choice of tools, especially programming languages, is certainly a problem that a lot of projects have to solve sooner rather than later. For projects that are currently written in any non-memory-managed language, this issue is even more pressing I guess, looking back at the last decade of more or less devastating security vulnerabilities. However, since your solution part already reads like a call to arms, I have to express my concerns about your problem description and proposal. Daniel P. Berrange <berrange@redhat.com> [2017-11-14, 05:27PM +0000]:
The Problem(s) ==============
When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time 2005, aside from C there were common choices of Java, Python, Perl. Java was way too heavy for a low level system component, Python was becoming popular but not widely used for low level system services and Perl was on a downward trend. None of them are accessible to arbitrary languages as libraries, without providing a RPC based API service. As it turns out libvirt did end up having RPC based approach for many virt drivers, but the original approach was to be a pure library component.
IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly. It has long been accepted that C is a very challenging language to write "safe" applications. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular the lack of a safe memory management framework leads to memory leaks, double free's, stack or heap corruption and more. The lack of strict type safety just compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from fact that even the best programmers will continually screw up memory management leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals.
It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherantly unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse with countless high profile security bugs a direct result of the choice to use C. Given the threat's faced today, one has to seriously consider the wisdom of writing any new system software in C.
I agree for newly written software. There is almost no reason to use C when starting another project, especially given the number of problem-specific language options out there nowadays. But I don't think the argument holds for existing projects. I would suggest that the amount of time that has already been spent in finding and mitigating critical security bugs outweighs the possible inherent safety of any new language.
In another 10 years time, it would not surprise me if any system software still using C is considered an obsolete relic, and ripe for a rewrite in a memory safe language.
I guess this has been said about the C language a lot of times. Of course I don't have any better crystal balls than you do, but at least at the current time I wouldn't think so.
There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able todo a good job in C, compared to those who know higher level languages like Python/Java. A programmer write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done ok despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by use of C. Move forward another 10 years, and while C will certainly exist, I struggle to imagine the talent pool being larger. On the contrary I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.
Out of interest, I took a look at the CVE history of both libvirt and docker: https://www.cvedetails.com/product/20594/Redhat-Libvirt.html?vendor_id=25 https://www.cvedetails.com/product/28125/Docker-Docker.html?vendor_id=13534 Not sure how up to date and complete this list is, but for the sake of argument, let's take it. Docker since its creation in 2014 had 15 CVEs, 2 of them code execution and 3 of them privilege escalation. On the other hand, libvirt had, in the same time frame since 2014, a total of 20 CVEs, 1 of them code execution and 2 privilege escalations. The year 2014 was even an outlier with 13 CVEs that year. So honestly, in terms of security, I don't see a prevailing argument for Go as the better language compared to C. Note as well that the size of the codebase of libvirt is somewhere between 3-6 times larger than that of docker, depending on how you count it. One could argue that at a more mature state of a project one would expect to have fewer and fewer CVEs, but even if we were to compare docker to libvirt's initial years of CVE history, I don't see a clearer argument.
We push up against the boundaries of what's sane todo in C in other ways too. For portability across operating systems, we have to rely on GNULIB to try to sanitize the platform inconsistencies where we use POSIX, and assume that any 3rd party libraries we use have done likewise.
Even then, we've tried to avoid using the platform APIs because their designs are often too unsafe to risk using directly (strcat, malloc, free), or are not thread safe (APIs lacking _r variants). So we build our own custom C platform library on top of the base POSIX system, re-inventing the same wheel that every other project written in C invents.
Why has there never been a truly satisfying standard library for C for this kind of stuff? If such a project existed, this wheel re-inventing would be prevented while providing higher-quality platform library code.
Every time we have to do work at the core C platform level, it is diverting time away from doing working managing higher level concepts.
How often is this the case? I assume that platform code does not change that often and will converge to a stable fixed point.
Our code is following an object oriented design in many areas, but such a notion is foreign to C, so we have to bolt a poor-mans OO framework on the side. This feeds back into the memory safety problem, because our OO invention cannot be type checked reliably at compile time, making it easy to do unsafe things with objects. It relies on reference counting because there's no automatic memory management.
The other big trend of the past 10 years has been the increase in CPU core counts. My first libvirt dev machine had 1 physical CPU with no cores or threads or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12 logical CPUs. Common server machines have 32/64 logical CPUs, and high end has 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and entry level with mere 100's. IOW good concurrency is going to be key for any scalable application. Libvirt is actually doing reasonably well in this respect via our heavily threaded libvirtd daemon. It is not without cost though with ever more complex threading & locking models, which still have scalability problems. Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between jobs we need to process and threads we have available.
This is the one argument against C that I fully support. Support for parallelism is missing, and with the current development of multi- and many-core platforms this is really a show-stopper.
The Solution(s) ===============
[...]
The obvious question / difficulty is deciding how to adopt usage of a new language, without throwing everything away and starting from scratch. It needs to be possible for contributors to continue working on every other aspect of the project while adoption takes place over the long term. Blocking ongoing feature work for prolonged periods of time is not acceptable.
Yes, I fully concur. But still, I have seen many projects that underestimated the amount of work even a partial rewrite in another language takes. And in the end, feature development and even bug fixing WILL suffer from this transition. Maybe it is a good idea to look at the GCC project and their transition from C to C++ and learn from their experience beforehand.
There is also a question of scope of the work. A possible target would be to aim for 100% elimination of C in N years time (for a value of N that is certainly greater than 5, possibly as much as 10). There is a question of just whether that is a good use of resources, and even practical. In terms of management of KVM guests the bulk of ongoing development work, and complexity is in the libvirtd daemon. The libvirt.so library merely provides the remote driver client which is largely stable & unchanging. So with this in the mind the biggest benefits would be in tackling the daemon part of the code where all the complexity lives.
As mentioned earlier, Go has a very effective FFI mechanism for calling C code from Go, and also allows Go code to be called from C. There are some caveats to be aware of with passing data between the languages, however, generally it is neccessary to copy data structures as C code is not permitted to derefence pointers that are owned by the Go GC system. There are two possible approaches to take, which can be crudely described as top down, or bottom up.
Earlier you talked about the contributor pool. But wouldn't your proposal limit this pool even further by actually requiring the intersection of the pool of C developers AND Go developers?
[...]
What I would like to see before any rewrite is taken into consideration is an effort to reduce complexity, even at the architectural level. Your proposal to split libvirt into a set of daemons with specific tasks can help here tremendously. In my opinion, a rewrite in another language should be a last resort, taken only if every other option has been exhausted, because, from experience, it WILL set a project back. Best, Bjoern -- IBM Systems Linux on z Systems & Virtualization Development ------------------------------------------------------------------------ IBM Deutschland Schönaicher Str. 220 71032 Böblingen Phone: +49 7031 16 1819 E-Mail: bwalk@de.ibm.com ------------------------------------------------------------------------ IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On Wed, Nov 15, 2017 at 12:28:30PM +0100, Bjoern Walk wrote:
Daniel P. Berrange <berrange@redhat.com> [2017-11-14, 05:27PM +0000]:
The Problem(s) ==============
When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time 2005, aside from C there were common choices of Java, Python, Perl. Java was way too heavy for a low level system component, Python was becoming popular but not widely used for low level system services and Perl was on a downward trend. None of them are accessible to arbitrary languages as libraries, without providing a RPC based API service. As it turns out libvirt did end up having RPC based approach for many virt drivers, but the original approach was to be a pure library component.
IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly. It has long been accepted that C is a very challenging language to write "safe" applications. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular the lack of a safe memory management framework leads to memory leaks, double free's, stack or heap corruption and more. The lack of strict type safety just compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from fact that even the best programmers will continually screw up memory management leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals.
It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherantly unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse with countless high profile security bugs a direct result of the choice to use C. Given the threat's faced today, one has to seriously consider the wisdom of writing any new system software in C.
I agree for newly written software. There is almost no reasoning to use C for starting another project. Especially given the amount of different options of problem-specific languages out there nowadays. But I don't think argument holds for existing projects. I would suggest that the amount of time that has already been spend in finding and mitigating critical security bugs outweighs the possible inherent safety of any new language.
I don't think that is the case unless the code is essentially feature complete and no significant new code is being written. That is certainly not the case with libvirt, which shows no sign of slowing down in terms of features we must develop. As long as you are continuing to write non-trivial C code, you are continuing to introduce security bugs. Further, things that in the past were not considered security flaws are increasingly being classed as security flaws. What has improved is that we use more tools to detect the security flaws & crashing bugs after we've introduced them, e.g. Coverity tells us about many screw ups. It is nonetheless an ongoing issue.
There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able todo a good job in C, compared to those who know higher level languages like Python/Java. A programmer write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done ok despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by use of C. Move forward another 10 years, and while C will certainly exist, I struggle to imagine the talent pool being larger. On the contrary I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.
Out of interest, I took a look at the CVE history of both libvirt and docker:
https://www.cvedetails.com/product/20594/Redhat-Libvirt.html?vendor_id=25 https://www.cvedetails.com/product/28125/Docker-Docker.html?vendor_id=13534
Not sure, how up to date and complete this list is, but for the sake of arguments, let's take it. Docker since its creation in 2014 had 15 CVEs, 2 of them code execution and 3 of them privilege escalation. On the other hand, libvirt had, in the same time frame since 2014, a total of 20 CVEs, 1 of them code execution and 2 privilege escalations. The year 2014 was even an outlier with 13 CVEs that year. So honestly, in terms of security, I don't see a prevailing argument for Go as the better language compared to C. Mind as well that the size of the codebase of libvirt is somewhat 3-6 times larger then that of docker, depending on how you count it.
I don't think that this is a sensible comparison. For libvirt we have effectively given up and said that a large portion of our APIs have semantic privileges equivalent to a root shell. This in turn means that when libvirtd crashes, we claim that it is not a security flaw. We have many 100's of crashes we've solved, and their frequency of occurrence is not slowing down. IOW, if we actually classified all our crashes as CVEs we would have orders of magnitude more CVEs. Essentially we are ignoring the scale of the problem.
We push up against the boundaries of what's sane todo in C in other ways too. For portability across operating systems, we have to rely on GNULIB to try to sanitize the platform inconsistencies where we use POSIX, and assume that any 3rd party libraries we use have done likewise.
Even then, we've tried to avoid using the platform APIs because their designs are often too unsafe to risk using directly (strcat, malloc, free), or are not thread safe (APIs lacking _r variants). So we build our own custom C platform library on top of the base POSIX system, re-inventing the same wheel that every other project written in C invents.
Why has there never been a truly satisfying standard library for C for this kind of stuff? If such a project would exist, this wheel re-inventing would be prevented while providing a higher-quality code for platform library code.
There are countless "standard" libraries, all with their own quirks. Many of them don't even really address the core safety problems, e.g. we could consider glib2 for libvirt, but it just replicates the same awful unsafe malloc API style that the stdc lib has, with marginal improvement. The same is true of most other standard libraries. There's essentially no chance of C ever getting a widely adopted "standard" library at this point. The only place there's any standardization work is at the POSIX level and that's waaaaay too low level.
Every time we have to do work at the core C platform level, it is diverting time away from doing working managing higher level concepts.
How often is this the case? I assume that platform code does not change that often and will converge into a stable fix-point.
Only if we stop delivering new features. In pretty much every monthly release cycle we break platform portability at least once. We finally have CI coverage to help us detect when this happens, but you still have to do the work to make our code portable. We've got 6000 lines of m4 code for configure.ac in libvirt to cope with this stuff. If you add in gnulib's m4 code that's 45,000 lines of m4. If there's one thing C programmers all agree on it is that autoconf is horrific to write and maintain. This in turn feeds into 1000's of #ifdef statements littered throughout the code, which leads to ongoing maintenance work. Pretty much all this work simply doesn't exist in any modern language which provides an inherently portable core runtime.
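As a small illustration of the difference (a hypothetical example, not libvirt code): the kind of host information that typically needs configure probes and #ifdef blocks in C is just portable standard library calls in Go, with genuinely platform-specific files handled by build tags rather than m4:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "runtime"
    )

    func main() {
        // All of this works unchanged on Linux, the BSDs, macOS and Windows,
        // with no autoconf probes or #ifdef blocks.
        host, err := os.Hostname()
        if err != nil {
            host = "(unknown)"
        }
        fmt.Println("os/arch: ", runtime.GOOS, runtime.GOARCH)
        fmt.Println("cpus:    ", runtime.NumCPU())
        fmt.Println("hostname:", host)
        fmt.Println("log file:", filepath.Join(os.TempDir(), "example.log"))
    }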
The Solution(s) ===============
[...]
The obvious question / difficulty is deciding how to adopt usage of a new language, without throwing everything away and starting from scratch. It needs to be possible for contributors to continue working on every other aspect of the project while adoption takes place over the long term. Blocking ongoing feature work for prolonged periods of time is not acceptable.
Yes, I fully concur. But still, I have seen many projects that underestimated the amount of work even a partial rewrite in another language takes. And in the end, feature development and even bug fixing WILL suffer from this transition.
Maybe it is a good idea to look at the GCC project and their transition from C to C++ and learn from their experience beforehand.
Going from C to C++ is a waste of time IMHO. It just gives you different and often more complicated ways to shoot yourself in the foot, solving very few of the core problems with C. You're still using a memory unsafe language, and you still have tonnes of work to do to write portable code.
Earlier you talked about the contributor pool. But wouldn't your proposal limit this pool even further by actually requiring the intersection of the pool of C developers AND Go developers?
Depending on the choice of language this is certainly a valid concern. It is the primary reason why I would not suggest use of Rust, because I think that really does limit the pool. With Go though, any C developer who has written non-trivial C programs easily has the skills to learn Go code in a matter of days. It would not be idiomatic Go code, but it doesn't take much more experience to learn the Go mindset and start writing high quality code. Go has been tailored specifically to make the on-ramp as easy as possible. Mostly you just spend time googling the API docs to learn about the useful standard library APIs you can leverage. So I don't think you limit the contributor pool to the intersection of developers of the two languages. You really do broaden the pool to more like the union of the two sets.
What I would like to see before any rewrite is taken into consideration, is an effort to reduce complexity, even on the architectural level. Your proposal to split libvirt into set of daemons with specific tasks can help here tremendously. In my opinion, a rewrite in another language should be a last resort thing if every other options have been exhausted, because, from experience, it WILL set a project back.
As mentioned, it would be absolutely critical to ensure that ongoing feature work can still take place during any transition. That is why I am not suggesting it as a "big bang" moment. It would be an incremental job over as long as 5 years or more. There will still be some periods of short term pain, but over the long term the improved productivity would easily make up for that. The recent major rewrite of Firefox is a great example of this - they continued releasing on a regular schedule with significant new features, while in parallel rewriting a large part of their code in Rust. The end result is a tremendous step forward for the project. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Nov 15, 2017 at 02:59:20PM +0000, Daniel P. Berrange wrote:
On Wed, Nov 15, 2017 at 12:28:30PM +0100, Bjoern Walk wrote:
Why has there never been a truly satisfying standard library for C for this kind of stuff? If such a project would exist, this wheel re-inventing would be prevented while providing a higher-quality code for platform library code.
There are countless "standard" libraries all with their own quirks. Many of them don't even really address the core safety problems. eg we could consider glib2 for libvirt, but it just replicates the same awful unsafe malloc API style that the stdc lib has, with marginal improvement.
I think you should look at pool allocators. I really wish I'd used one in libguestfs ... Pool allocators give you a nice middle ground between using C and having something which looks and feels a lot like garbage collection. Pool allocators also fit servers nicely because it's natural to create a new pool for each request and then free everything in the pool at the end of the request. You can also attach pools to other concepts (in libvirtd that would include per-VM pools). Luckily there is now a widely available pool allocator (widely available because it's used/required by SAMBA): https://talloc.samba.org/talloc/doc/html/index.html Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top

As a point of reference, libguestfs started to allow OCaml in the daemon this summer, so effectively you can now use OCaml to implement libguestfs APIs. (I'm not suggesting libvirt should use OCaml). Some points: - We have a mixed C / OCaml daemon. The vast majority of the code is still C and will most likely always be in C. - This is an incremental approach. We can add new APIs in OCaml, or convert existing APIs from C to OCaml, but mostly existing APIs are left alone. - Our use of a generator makes this easier. The complexity is embedded in the generator. Developers just need to select whether an API is written in C or OCaml, and of course implement/reimplement the API in the chosen language, and the generator handles the heavy lifting. - OCaml-implemented APIs are less code & safer, while being just as fast. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Tue, Nov 14, 2017 at 17:27:01 +0000, Daniel Berrange wrote:
The Problem(s) ==============
Note at first: This is a personal opinion. I'm not discrediting any advantages a different language might have in a technical sense. I'm against this. Go is an utterly ugly language. It looks as if the authors wanted to be so hip that they had to do a lot of stuff backwards. Literally. Just look at variable declaration. While some concepts might be cool, the rest of the stuff makes my eyes bleed. I hate it. Seriously, who on earth thought that the network socket connection API should be called 'dial'? Any language allows for people to make mistakes. You can eliminate the very common mistakes, but they are comparatively easy to spot and fix. You can't eliminate the very subtle logic problems, which usually linger a long time in libvirt. While the language itself may get rid of one layer of compatibility functions/libraries like gnulib, most of our uses are hidden in utility functions. Getting rid of it will remove some stuff but we will end up with a different pile of utility code in a different language. It may be simpler in the end but it will be there and will be used similarly to how we use the current utility code. There's no clear win in my opinion here. Then there's the rewrite. While the language may allow calling of C code and being called from C code we will need to draw a line somewhere. Adding new Go code into existing C code will be ugly. Rejecting someone's contribution because it's in C is wrong. Also just spending time writing the same stuff in a supposedly cooler language feels wrong to me. I can see the value in adding new stuff, but it's a waste to rewrite existing stuff. Then there's debugging. The official docs hint at using GDB but in the same paragraph note that GDB does not work entirely well with Go programs. [1] There might be other tools like Delve [2] which are still under heavy development. I'm not entirely persuaded in this area. I don't like this. Don't expect me to write any Go code. Peter [1] https://golang.org/doc/gdb [2] https://github.com/derekparker/delve

On Thu, Nov 16, 2017 at 02:58:33PM +0100, Peter Krempa wrote:
On Tue, Nov 14, 2017 at 17:27:01 +0000, Daniel Berrange wrote:
The Problem(s) ==============
Note at first: This is a personal opinion. I'm not discrediting any advantages a different language might have in technical sense.
I'm against this. Go is a utterly ugly language. It looks as the authors wanted to be so hip so that they had to do a lot of stuff backwards. Literally. Just look at variable declaration. While some concepts might be cool, the rest of the stuff makes my eyes bleed. I hate it. Seriously who on earth thought that the network socket connection API should be called 'dial'.
I have a hard time calling Go ugly when the comparison is C. The things we do with C and macros are horrific by comparison. Go has very strong readability. Variable declaration syntax has little to no impact on the ability to use the language in a productive manner. Likewise API naming is just a matter of taste. Sure 'dial' is odd, but there's 1000's of weirdly named methods across APIs used in any language.
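For readers who have not seen it, the syntax being debated looks roughly like this (a trivial, hypothetical snippet; the socket path is invented purely for illustration):

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // The "backwards" declarations: the type follows the name,
        // or is inferred entirely with :=
        var uri string = "qemu:///system"
        retries := 3

        // net.Dial is Go's generic "open a connection" call.
        conn, err := net.Dial("unix", "/run/example/example.sock")
        if err != nil {
            fmt.Println(uri, "connect failed after", retries, "retries:", err)
            return
        }
        defer conn.Close()
        fmt.Println("connected to", conn.RemoteAddr())
    }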
Any language allows for people to do mistakes. You can eliminate the very common mistakes, but they are comparatively easy to spot and fix. You can't eliminate the very subtle logic problems, which usually linger a long time in libvirt.
I think the evidence of libvirt and every other non-trivial C program shows that when it comes to memory management mistakes it is essentially impossible to eliminate the common mistakes. People make the same mistake over & over again, even experts in the language. We have tools that help us spot some of them, but nowhere near all of them, as evidenced by the constant stream of crashes we have to fix in libvirt which Coverity doesn't tell us about. I agree that subtle logic problems can exist in any language and libvirt has its fair share of those and they won't be magically solved by a different language. We waste a hell of a lot of resources on dealing with problems inherent in the usage of C (whether platform portability, or memory management, or reinventing an object system). By not wasting that time, we will ultimately be able to put more time into identifying and fixing the subtle logic problems.
While the language itself may get rid of one layer of compatibility functions/libraries like gnulib, most of our uses are hidden in utility functions. Getting rid of it will remove some stuff but we will end up with a different pile of utility code in a different language. It may be simpler in the end but it will be there and will be used similarly as we use the current utility code. There's no clear win in my opinion here.
I believe it will be a major net win. We have many 10's of thousands of lines of autoconf and automake code, and code littered with #ifdefs, many combinations of which are never tested properly. We have 65,000 lines of code just for parsing & formatting XML. The vast majority of that can be automatically handled in Go with mere annotations on struct fields, and the result is far safer because it would get XML entity escaping right without any effort. On networking stuff we've written a lot of code to help us integrate with sockets, TLS, SSH and so on, because while there are libraries for this none of them provides the same API. Again the need to write that kind of boilerplating all goes away because there's a proper extensible I/O framework there. The amount of time we spend writing infrastructure in C that already exists in other languages' standard libraries is very significant.
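To illustrate the struct annotation point (a deliberately tiny, hypothetical schema, nothing like the real domain XML): with Go's encoding/xml the field tags drive both parsing and formatting, and entity escaping is handled automatically:

    package main

    import (
        "encoding/xml"
        "fmt"
    )

    // Hypothetical, heavily trimmed guest config; the tags replace
    // hand-written XML parsing and formatting code.
    type Domain struct {
        XMLName xml.Name `xml:"domain"`
        Type    string   `xml:"type,attr"`
        Name    string   `xml:"name"`
        Memory  uint     `xml:"memory"`
        VCPU    uint     `xml:"vcpu"`
    }

    func main() {
        doc := []byte(`<domain type="kvm"><name>demo</name><memory>1048576</memory><vcpu>2</vcpu></domain>`)

        var d Domain
        if err := xml.Unmarshal(doc, &d); err != nil {
            panic(err)
        }
        fmt.Printf("%+v\n", d)

        // Formatting back out; escaping of awkward values is automatic.
        out, _ := xml.MarshalIndent(d, "", "  ")
        fmt.Println(string(out))
    }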
Then there's the rewrite. While the language may allow calling of C code and being called from C code we will need to draw a line somewhere. Adding new Go code into existing C code will be ugly. Rejecting someones contribution because it's in C is wrong. Also just spending time writing the same stuff in a supposedly cooler language feels wrong to me. I can see the value in adding new stuff, but it's a waste to rewrite existing stuff.
I'm not suggesting to unconditionally reject people's contributions in C. If the area of code they want to contribute to is still using C, then they should certainly submit patches in the same way. This is a key reason why I say we shouldn't attempt any "stop the world & rewrite" big bang - it is far too disruptive and prevents people doing other useful work in parallel. A conversion would have to be incrementally evolved over a prolonged period of time, so we can still satisfy ongoing feature requests & bug fixes in parallel. This is not about using a cooler language, it is about using a language which will let us provide a much more reliable libvirt which doesn't randomly crash & leak memory, and where we can focus our dev resources on solving the interesting problems that libvirt has, instead of wasting time dealing with the problems of C.
Then there's debugging. Official docs hint to use GDB but in the same paragraph note that GDB does not work entirely well with Go programs. [1] There might be other tools like Delve [2] which are still under heavy development. I'm not entirely persuaded in this area.
That is certainly a fair point. The flipside is that a great many of the reasons for needing GDB / valgrind / Coverity / etc in C are removed by virtue of having a safe memory management design. In the intermediate period where we have a mix of C and Go in the same process though, I accept it would make some debugging tasks harder. I don't think that's a show stopping problem, because over the long term it'd be a net win.
I don't like this. Don't expect me to write any Go code.
I don't know if I'll change your mind, but I really encourage people to actually look beyond the obvious surface differences. As languages go it is explicitly designed to be easy to understand & familiar for C developers, much more so than any other competing languages out there. By having a truly compiled language, as opposed to a VM like Java, or an interpreter like python/perl/ruby/etc, you're still as close to the machine layer as you are in C. You can still easily call into C APIs where you need to with ease as the basic scalar data types are all very similar to C, while giving you much easier to use compound data types (lists, strings, maps, etc). Being free from the madness of manual memory allocation/deallocation/overruns greatly improves productivity and reliability of the code, and you no longer waste so much time reinventing the wheel whether for cross-platform porting or basic library APIs. I struggle to think of any compelling reason to continue to use C, aside from the fact that it is what I've always worked with for ~20 years. Many languages have come up in that time, but none has been able to credibly replace C for low overhead systems programming work until Go and Rust came along. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
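As a footnote to the compound data types point above, the structures that need hand-rolled hash tables and manually grown arrays in C are built into the language (hypothetical data, just to show the shape of the code):

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // Built-in map and slice types; in C this would be a custom
        // hash table plus a manually resized array.
        vcpusByDomain := map[string]int{"web01": 2, "db01": 4}
        vcpusByDomain["build01"] = 8

        var names []string
        for name := range vcpusByDomain {
            names = append(names, name)
        }
        sort.Strings(names)
        fmt.Println(names, "total domains:", len(names))
    }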

On 11/14/2017 12:27 PM, Daniel P. Berrange wrote:
The Problem(s) ==============
First off I'll state I'm not opposed to considering adopting or integrating a newer language. Still, I have my concerns, fears, uncertainty, and doubts. In a world where one must "adapt or die", I'm not opposed to being more accepting of GO type contributions, but I also know there's a learning curve involved with adapting and forcing myself to learn a new language, especially one that's touted as being more C-like but really isn't necessarily what I've been used to for a long time (at least at first glance).
When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time 2005, aside from C there were common choices of Java, Python, Perl. Java was way too heavy for a low level system component, Python was becoming popular but not widely used for low level system services and Perl was on a downward trend. None of them are accessible to arbitrary languages as libraries, without providing a RPC based API service. As it turns out libvirt did end up having RPC based approach for many virt drivers, but the original approach was to be a pure library component.
IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly. It has long been accepted that C is a very challenging language to write "safe" applications. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular the lack of a safe memory management framework leads to memory leaks, double free's, stack or heap corruption and more. The lack of strict type safety just compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from fact that even the best programmers will continually screw up memory management leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals.
It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherantly unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse with countless high profile security bugs a direct result of the choice to use C. Given the threat's faced today, one has to seriously consider the wisdom of writing any new system software in C. In another 10 years time, it would not surprise me if any system software still using C is considered an obsolete relic, and ripe for a rewrite in a memory safe language.
Programming languages come and (well) go - it just so happens that C has been a survivor. There have always been challengers promising something, but eventually falling by the wayside when either they fail to deliver their promise or the next sexy language comes along. In my 30 years (eeks), even with all its warts, C has been there. It was certainly better than writing in assembly/macro or BLISS. I recall converting a lot of BLISS to C when I first started. There is something to be said about the "devil you know" vs. the one you don't! Just as much as there is a need to keep yourself "current" with technology trends. The latter becomes harder to do the longer I do this.
There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able todo a good job in C, compared to those who know higher level languages like Python/Java. A programmer write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done ok despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by use of C. Move forward another 10 years, and while C will certainly exist, I struggle to imagine the talent pool being larger. On the contrary I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.
I'm not convinced that "drive by" patch submissions are those we seek. As stated, libvirt is a fairly complex project. I would think drive by submissions lead to more problems regardless of the language chosen because a reviewer spends so much of his/her valuable time trying to assist the new contributor only to eventually learn that it is a drive by. Then those that are committed to the project are left to decide to drop the drive by submission or support it for years to come. Invariably there's some integration interaction that was missed. I would hope our long term goal would be to build up not only contributors, but more importantly reviewers. Again, it doesn't matter what language is chosen; since libvirt has review requirements, it needs reviewers. If GO is a language from which to draw new contributors and more importantly reviewers, then great. With respect to the limited pool of C developers able to do a good job in C - by flipping a switch to GO what kind of confidence level do you have that a new wealth of talent will have the necessary skills/experience and/or desire to understand the nuances that do exist for a project like libvirt and in particular the complicated libvirtd problem to be solved? Maybe it's a bit of 'bias' and terminology, but I've always thought there is a difference between programmer and software engineer. My FUD is that we attract too many of the former and not enough of the latter that are necessary to solve that complex issue. There are certain "things" you learn through years of trial and error that perhaps are "less important" at the application level. It seems today the theory is if an App crashes - so what, restart it. That's not something for library, daemon, or OS development. If a Daemon crashes, oh crap... host crashes, oh double crap. Once you do this long enough you get involved in many aspects of OS, daemon, and library code such as timing, threads, inter-process communication, locking, fd/socket mgmt, backdoor hooks, etc. Does GO make those less relevant or just shift the onus to learn the language and its limitations and quirks? Yes, I understand C is callable from it, but if the long term goal is C independence, then we ought to weigh and understand the risks before jumping into the ocean.
We push up against the boundaries of what's sane todo in C in other ways too. For portability across operating systems, we have to rely on GNULIB to try to sanitize the platform inconsistencies where we use POSIX, and assume that any 3rd party libraries we use have done likewise.
Even then, we've tried to avoid using the platform APIs because their designs are often too unsafe to risk using directly (strcat, malloc, free), or are not thread safe (APIs lacking _r variants). So we build our own custom C platform library on top of the base POSIX system, re-inventing the same wheel that every other project written in C invents. Every time we have to do work at the core C platform level, it is diverting time away from doing working managing higher level concepts.
Our code is following an object oriented design in many areas, but such a notion is foreign to C, so we have to bolt a poor-mans OO framework on the side. This feeds back into the memory safety problem, because our OO invention cannot be type checked reliably at compile time, making it easy to do unsafe things with objects. It relies on reference counting because there's no automatic memory management.
The other big trend of the past 10 years has been the increase in CPU core counts. My first libvirt dev machine had 1 physical CPU with no cores or threads or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12 logical CPUs. Common server machines have 32/64 logical CPUs, and high end has 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and entry level with mere 100's. IOW good concurrency is going to be key for any scalable application. Libvirt is actually doing reasonably well in this respect via our heavily threaded libvirtd daemon. It is not without cost though with ever more complex threading & locking models, which still have scalability problems. Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between jobs we need to process and threads we have available.
So GO's process/thread model is then lightweight? What did they learn that the rest of us ought to know! Or is this just a continuation of the libvirtd discussion? Still it seems the pendulum has swung back to hardware and software needs to catch up. It used to be quantum leaps in processor speed as it related to chip size/density - now it's just leaps in the ability to partition/thread at the chip level. I'd hate to tell you about the boat anchor I had on my desktop when I first started!
The Solution(s) ===============
Two fairly recent languages, Go & Rust, have introduced new credible options for writing systems applications without sacrificing the performance of C, while achieving the kind of ease of use / speed of development seen with languages like Python. It goes without saying that both of them are memory safe languages, immediately solving the biggest risk of using C / C++.
If memory mgmt and security flaws are the driving force to convert to GO, then can it be claimed unequivocally that GO will be the panacea to solve all those problems? Even the best intentions don't always work out the best. If, as pointed out in someone else's response, there have been CVEs from/for GO centric apps, how many of those are GO related and how many are App related? Not that it matters, but the point is we're shifting some amount of risk for timely fixes elsewhere and shifting the backwards compatibility story elsewhere, which could be the most problematic. Not everyone has the same end goal for ABI/API compatibility. Add to that the complexity of tracking a specific version of some package you've based your product/reputation on. Curious, is the performance rated vs. libc memory alloc/free or something else? I don't recall ever being on a project that didn't have some sort of way to "rewrite" the memory mgmt code, whether it was shims to handle project specific needs or usage of caches to avoid the awful *alloc/free performance. Doing the GC is great, but what is the cost? Perhaps something we don't know until we get further down that path.
The particularly interesting & relevant innovation of Go is the concept of Goroutines for concurrent programming, which provide a hybrid kernel/userspace threading model. This lowers the overhead of concurrency to the point where you can consider spawning a new goroutine for each logical job. For example, instead of having a single thread or limited pool of threads servicing all QEMU monitor sockets & API clients, can you afford to have a new goroutine dedicated to each monitor socket and API client. That has the potential to dramatically simplify use of concurrency while at the same time allowing the code to make even better use of CPUs with massive core counts.
Sounds promising and complicated, but is the risk of libvirt discovering some flaw or limitation in goroutines worth it? IOW: would libvirt be blazing a new trail, or are there other consumers that have "helped" work through the initial issues?
It of course provides a cross platform portable core library of features, and has a massive ecosystem of developers providing further 3rd party libraries for a wide variety of features. This means developers can focus more time on solving the interesting problems in their application space. The Go code is still low level enough that it can interface with C code easily. FFI calls to C APIs can be made inline in the Go code, with no need to switch out to write a low level binding in C itself. In many ways, Go can be said to have the ease of use, fast learning & safety of Python, combined with the expressiveness of C. IOW it is a better C than C.
But still requiring a learning curve to get through the nuances. I think you may be underestimating the learning curve, but I could be wrong. It would seem to be far more than a google search (as pointed out in a different response). It would probably also include gaining an understanding of how whatever 3rd party library was chosen works (but maybe that's just the trust factor). If there are so many Go developers out there - one would hope there would be a "swarm" willing to help convert existing projects from C to Go. ;-) Oh, and license wise it would seem we'd have to be careful, true? At least w/r/t attempting to utilize packages written or listed on the wiki page link. From just a quick scan there, there seem to be numerous "packages" available and some list different licenses. Also, once chosen what happens if/when issues or incompatibilities are discovered in some package? Do we follow the same principle of GNULIB and try to fix it ourselves or somehow work around it? As I've learned through time - "how" someone else fixes a problem may not work out best and the degree of importance of the problem can result in delays in getting a resolution. Having some amount of control is nice and we just have to weigh the risk(s) of giving some of that away.
I don't have direct experiance in Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features to Rust that can be important to some apps. In particular it does not use garbage collection, instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite a requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which will introduce performance overhead is a theme of Rust. The cost of such an approach is that development has a higher learning curve and ongoing cost in Rust, as compared to Go.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt.
Depends on the GC, right? Is the GC context/scope based, or overall app based? There are certainly some particularly hairy uses of memory and arguments in libvirt code.
The obvious question / difficulty is deciding how to adopt usage of a new language, without throwing everything away and starting from scratch. It needs to be possible for contributors to continue working on every other aspect of the project while adoption takes place over the long term. Blocking ongoing feature work for prolonged periods of time is not acceptable.
Not an easy task because one way or another you're taking resources from one pile to put on another pile. Throwing new resources at the problem isn't necessarily the solution either because they need to "learn the environment".
There is also a question of scope of the work. A possible target would be to aim for 100% elimination of C in N years time (for a value of N that is certainly greater than 5, possibly as much as 10). There is a question of whether that is a good use of resources, and even whether it is practical. In terms of management of KVM guests the bulk of ongoing development work, and complexity, is in the libvirtd daemon. The libvirt.so library merely provides the remote driver client which is largely stable & unchanging. So with this in mind the biggest benefits would be in tackling the daemon part of the code where all the complexity lives.
N = ∞ (infinity ;-))
As mentioned earlier, Go has a very effective FFI mechanism for calling C code from Go, and also allows Go code to be called from C. There are some caveats to be aware of with passing data between the languages, however; generally it is necessary to copy data structures as C code is not permitted to dereference pointers that are owned by the Go GC system. There are two possible approaches to take, which can be crudely described as top down, or bottom up.
In the top down approach, the C file providing the main() method gets replaced by a Go file providing an equivalent main() method, which then simply does an FFI call to the existing libvirt C APIs to run the code. For example it would just call virNetServer APIs to setup the RPC layer. Effectively have a Go program where 90% of the code is an FFI call to existing libvirt C code. Then we would gradually iterate downwards converting increasing areas of C code to Go code.
In the bottom up approach, the program remains a C program, but we build .a files containing Go code for core pieces of functionality. The C code can thus call into this archive and end up executing Go code for certain pieces. Then we would gradually iterate upwards converting increasing areas of C code to Go code, until eventually reaching the top main() method.
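As a rough sketch of the bottom up direction (the function name is hypothetical, not an existing libvirt symbol), Go code can be exported to C with a //export directive and built into an archive:

    package main

    import "C"

    // demoLogMessage is a hypothetical Go implementation that C code could
    // call after linking against the generated .a archive and header.
    //
    //export demoLogMessage
    func demoLogMessage(msg *C.char) C.int {
        s := C.GoString(msg) // copy the C string into Go-managed memory
        _ = s                // ... handle the message in Go ...
        return 0
    }

    // Required for -buildmode=c-archive, but never invoked by the C program.
    func main() {}

Building this with 'go build -buildmode=c-archive' produces a .a file plus a matching C header that the existing C program can link and call into.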
Or a hybrid of both approaches can be taken. Whichever way is chosen, it is going to be a long process with many bumps in the road.
The best way to start, however, is probably to focus on a simple self-contained area of libvirt code. Specifically attack the virtlockd, and/or virtlogd daemons, converting them to use Go. This still need not be done in a "big bang". A first phase would be to develop the server side framework for handling our RPC protocol deserialization. This could then just dispatch RPC calls to the existing C impls. As a second phase, the RPC method impls would be converted to Go. Both of these daemons are small enough that the conversion would be possible across the time of a couple of releases. The hardest bit is likely ensuring compatibility for the re-exec() upgrade model they support, but this is none the less doable. The lessons learned in this would go a long way towards informing the best way to tackle the bigger task of the monolithic libvirtd (or equivalently the swarm of daemons the previous proposal suggests)
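A hedged sketch of what that first phase could look like - a Go dispatcher decoding a procedure number and forwarding to the existing C impl via cgo (the procedure numbers and C function names here are invented for illustration, not the real virtlogd RPC symbols):

    package main

    /*
    // Stand-ins for the existing C implementations that phase one would
    // keep calling; not the real libvirt functions.
    static int demo_log_open(int fd)  { return 0; }
    static int demo_log_write(int fd) { return 0; }
    */
    import "C"

    import "fmt"

    // dispatch decodes an already-deserialized RPC call and forwards it
    // to the existing C implementation.
    func dispatch(proc int, fd int) error {
        var rc C.int
        switch proc {
        case 1:
            rc = C.demo_log_open(C.int(fd))
        case 2:
            rc = C.demo_log_write(C.int(fd))
        default:
            return fmt.Errorf("unknown procedure %d", proc)
        }
        if rc < 0 {
            return fmt.Errorf("procedure %d failed", proc)
        }
        return nil
    }

    func main() {
        if err := dispatch(1, 3); err != nil {
            fmt.Println(err)
        }
    }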
It will take though "someone" who knows GO and libvirt well enough to start. At this time, I submit that pool of talent is quite limited. Not necessarily GO contributors, but those that understand the libvirt build system, how to mash things together, how to write good GO code, and what types of considerations one has to make when developing at the OS, daemon, and library level. In the end I'm not sure I see a 'requirement' to switch to GO. It seems more a 'strong desire' based primarily on the factors of GC, availability of language packages (whether inherent or provided) and some possibility that libvirt would attract more developers. It doesn't seem like GO will "fix" something that cannot be resolved in C. Thanks for the thought provoking topic and the new diversion! John

On 11/16/2017 03:55 PM, John Ferlan wrote:
On 11/14/2017 12:27 PM, Daniel P. Berrange wrote:
Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between jobs we need to process and threads we have available.
So GO's process/thread model is then lightweight? What did they learn that the rest of us ought to know! Or is this just a continuation of the libvirtd discussion?
Goroutines are not strictly 1:1 mapped to an OS thread...it's an N:M mapping where a blocking call in a goroutine will not block any other goroutines. Modern Go defaults to a number of OS threads equal to the number of cores. Chris

On Thu, Nov 16, 2017 at 04:55:55PM -0500, John Ferlan wrote:
On 11/14/2017 12:27 PM, Daniel P. Berrange wrote:
When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time, 2005, aside from C the common choices were Java, Python and Perl. Java was way too heavy for a low level system component, Python was becoming popular but not widely used for low level system services, and Perl was on a downward trend. None of them are accessible to arbitrary languages as libraries without providing an RPC based API service. As it turns out libvirt did end up having an RPC based approach for many virt drivers, but the original approach was to be a pure library component.
IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly. It has long been accepted that C is a very challenging language to write "safe" applications in. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular the lack of a safe memory management framework leads to memory leaks, double frees, stack or heap corruption and more. The lack of strict type safety just compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from the fact that even the best programmers will continually screw up memory management leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals.
It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherently unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse with countless high profile security bugs a direct result of the choice to use C. Given the threats faced today, one has to seriously consider the wisdom of writing any new system software in C. In another 10 years time, it would not surprise me if any system software still using C is considered an obsolete relic, and ripe for a rewrite in a memory safe language.
Programming languages come and (well) go - it just so happens that C has been a survivor. There have always been challengers promising something, but eventually falling by the wayside when either they fail to deliver on their promise or the next sexy language comes along. In my 30 years (eeks), even with all its warts, C has been there. It was certainly better than writing in assembly/macro or BLISS. I recall converting a lot of BLISS to C when I first started.
I had to go and look up what BLISS was - that's a new (well old) one for me :-) Just thinking about the languages that have risen up while I've been using C, I'm not surprised that C has been a survivor in the area where libvirt lives. There's been a plethora of dynamic and/or scripting languages that have become popular (perl, python, ruby, javascript, to name but 4). None of these have really been a credible choice for use in libvirt. Use of interpreters by default has limited their perf, and forces a multi-process model to get any true parallelism. The latter has been a huge bottleneck for OpenStack Nova's concurrency. There's then the elephant in the room, Java, and with its JVM model the memory footprint of running it is just crazy. In all of them, interfacing to C is possible, but horribly unpleasant work to a large degree. And of course there's always C++, but that takes C's already complex world and makes it even more complex & leaves all the dangerous aspects of C still there, so when you shoot yourself in the foot, it blows away your entire leg. There's quite a few other interesting languages around, but none of them have received such mainstream usage, so they are hard to justify if you want a broad contributor set. Rust & Go, by comparison, have offered something pretty unique: languages that are still compiled to native code, fairly low level, and easy to interface with C. Between them they have strong potential to eliminate the need to use C for the majority of remaining usage scenarios, which can't be said of the other languages I've described above.
There is something to be said about the "devil you know" vs. the one you don't! Just as much as there is a need to keep yourself "current" with technology trends. The latter becomes harder to do the longer I do this.
I don't disagree, but I think you would be pleasantly surprised at how easy it is to learn Go if coming from a C background (or indeed a Python or Java background too).
There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able to do a good job in C, compared to those who know higher level languages like Python/Java. A programmer can write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done ok despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by use of C. Move forward another 10 years, and while C will certainly exist, I struggle to imagine the talent pool being larger. On the contrary I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.
I'm not convinced that "drive by" patch submissions are those we seek. As stated, libvirt is a fairly complex project. I would think drive by submissions lead to more problems regardless of the language chosen because a reviewer spends so much of his/her valuable time trying to assist the new contributor only to eventually learn that it is a drive by. Then those that are committed to the project are left to decide to drop the drive by submission or support it for years to come. Invariably there's some integration interaction that was missed.
I would hope our long term goal would be to build up not only contributors, but more importantly reviewers. Again, it doesn't matter what language is chosen; since libvirt has review requirements it needs reviewers. If GO is a language from which to draw new contributors and more importantly reviewers, then great.
Yep, I don't disagree about our need for reviewers, as much as we need contributors, if not more. I think pretty much every non-trivial project I have seen suffers from a need for more reviewers. Every time you make reviewers more efficient, you enable a greater flow of patches, and so you enable more contributions. Just like building roads to solve traffic jams never actually solves traffic jams. So I won't claim that choice of language is the driving factor in availability of reviewers. What I will say is that a more productive language would let us focus attention on the more interesting problems in libvirt, instead of working on general infrastructure, portability and all the things the language doesn't provide us. IOW reviewers would still be horribly overloaded, but the stuff they would be reviewing would be more useful.
Maybe it's a bit of 'bias' and terminology, but I've always thought there is a difference between programmer and software engineer. My FUD is that we attract too many of the former and not enough of the latter that are necessary to solve that complex issue.
I think that is basically true of every single open source software project that's out there.
The other big trend of the past 10 years has been the increase in CPU core counts. My first libvirt dev machine had 1 physical CPU with no cores or threads or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12 logical CPUs. Common server machines have 32/64 logical CPUs, and high end has 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and entry level with mere 100's. IOW good concurrency is going to be key for any scalable application. Libvirt is actually doing reasonably well in this respect via our heavily threaded libvirtd daemon. It is not without cost though with ever more complex threading & locking models, which still have scalability problems. Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between jobs we need to process and threads we have available.
So GO's process/thread model is then lightweight? What did they learn that the rest of us ought to know! Or is this just a continuation of the libvirtd discussion?
So in C you officially have the option of pthreads, which we use heavily. As you know a pthread maps 1:1 to an operating system thread (at least in the Linux impl, IIUC, that's not technically required by POSIX specs). Each thread has a fixed stack size associated with it (defaults to 8MB in size on Fedora at least). Of course a thread only uses 8MB of physical memory if it actually touches all those pages. If you exceed the stack bad things happen, so it is hard to pick a safe smaller size for pthread stacks. The OS schedules the threads and deals with context switching as for any normal process. Linux pthreads is nicely efficient compared to the original LinuxThreads impl and other OS like Solaris, but it is still a fairly heavy context switch.

Alternatively you have the option of inventing a Coroutine concept like QEMU has used in its block layer. That lets a single OS thread run multiple userspace threads; typically the application will switch between coroutines manually at key points (like I/O operations). Coroutine context switching is lighter than thread switching so it can be beneficial. You would typically use a smaller stack size, but it is still a fixed stack, so bad stuff happens if you pick too small a size. You have to decide which OS level thread runs which coroutines manually.

Goroutines are basically a union of the thread + coroutine concepts. The Go runtime will create N OS level threads, where the default N currently matches the number of logical CPU cores your host has (but is tunable to other values). The application code just always creates Goroutines which are userspace threads just like coroutines. The Go runtime will dynamically switch goroutines at key points, and automatically pick suitable OS level threads to run them on to maximize concurrency. Most cleverly goroutines have a 2 KB default stack size, and the runtime will dynamically grow the stack if that limit is reached. So this avoids the problem of picking a suitable stack size, and avoids any danger of overruns. As a result it is possible for a process to create *1000's* of goroutines and have less overhead than if you tried to do the same thing in C with threads+coroutines manually.

The fact that goroutines are so cheap means you can use simpler threading designs for applications. eg instead of the approach where libvirt tries to multiplex all I/O into a single event thread, and then have a pool of threads for RPC calls, but then add another pool of threads for RPC calls that must always run quickly, we could dramatically simplify things. Normal practice in Go would be to just have a single Goroutine for each client socket connection, and this would spawn a single Goroutine for each RPC call that needs to be run. This essentially eliminates all the throttling & queuing of calls that we do, which removes the bottlenecks it inherently creates. cf the problems that Prerna is trying to solve where our main loop is getting blocked by QEMU event handling and the increasingly complex solutions we're trying to invent to deal with it.
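To make the shape of that concrete, a minimal sketch (not real libvirt code; the socket path and handler names are invented) of the goroutine-per-connection, goroutine-per-call pattern described above:

    package main

    import (
        "bufio"
        "log"
        "net"
    )

    // handleClient owns one client connection; each request it reads is
    // dispatched in its own goroutine, so a slow call never blocks others.
    // net.Conn is documented as safe for concurrent use by goroutines.
    func handleClient(conn net.Conn) {
        defer conn.Close()
        scanner := bufio.NewScanner(conn)
        for scanner.Scan() {
            req := scanner.Text()
            go func(req string) {
                // ... dispatch the RPC call here ...
                conn.Write([]byte("done: " + req + "\n"))
            }(req)
        }
    }

    func main() {
        ln, err := net.Listen("unix", "/tmp/demo.sock") // illustrative path
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go handleClient(conn) // one goroutine per client connection
        }
    }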
Still it seems the pendulum has swung back to hardware and software needs to catch up. It used to be quantum leaps in processor speed as it related to chip size/density - now it's just leaps in the ability to partition/thread at the chip level. I'd hate to tell you about the boat anchor I had on my desktop when I first started!
IBM solved everything in the 60's & 70's on the mainframe, and we're still trying to reinvent all the solutions they had :-P
Two fairly recent languages, Go & Rust, have introduced new credible options for writing systems applications without sacrificing the performance of C, while achieving the kind of ease of use / speed of development seen with languages like Python. It goes without saying that both of them are memory safe languages, immediately solving the biggest risk of using C / C++.
If memory mgmt and security flaws are the driving force to convert to GO, then can it be claimed unequivocally that GO will be the panacea to solve all those problems? Even the best intentions don't always work out the best. If as pointed out in someone else's response there have been CVE's from/for GO centric apps - how many of those are GO related and how many are App related? Not that it matters, but the point is we're shifting some amount of risk for timely fixes elsewhere and shifting the backwards compatible story elsewhere, which could be the most problematic. Not everyone has the same end goal for ABI/API compatibility. Add to that the complexity of ensuring the availability of a specific version of some package you've based your product/reputation on.
NB, I'm certainly not claiming it will solve all security flaws. Far from it, there's been plenty of screwups we've done that are not at all related to choice of language. I am claiming that we'll eliminate all those flaws related to use of unsafe memory management. ie buffer overflows, double frees, use of freed memory, and all the other fun ways in which we screw up and crash libvirtd, often enabling security attacks (even if we don't file CVEs for most of them).
Curious, is the performance rated vs. libc memory alloc/free or something else? I don't recall ever being on a project that didn't have some sort of way to "rewrite" the memory mgmt code. Whether it was shims to handle project specific needs or usage of caches to avoid the awful *alloc/free performance. Doing the GC is great, but what is the cost? Perhaps something we don't know until we get further down that path.
I don't see a compelling case where libvirt has performance critical memory management requirements. Indeed use of manual malloc/free is far from offering the best performance from a memory mgmt POV in general. Our biggest performance problems come when we inherently self-limit our performance by using fewer threads than are really needed to deal with the number of VMs we're managing. Assuming the GC is not totally useless, I don't see a reason why it is an issue for libvirt. In terms of scope Docker is very similar to what libvirt tries to do, and probably has greater performance requirements than libvirt because container density on a machine is usually much higher than VM density. Beyond that you have apps like Etcd and Kubernetes which have an order of magnitude greater performance needs than libvirt, as they're managing across entire clusters of 100's of machines or more. IOW, use of GC is not a concern for me, and its benefits clearly outweigh the downsides of our current manual approach both in terms of code simplicity and reliability.
The particularly interesting & relevant innovation of Go is the concept of Goroutines for concurrent programming, which provide a hybrid kernel/userspace threading model. This lowers the overhead of concurrency to the point where you can consider spawning a new goroutine for each logical job. For example, instead of having a single thread or limited pool of threads servicing all QEMU monitor sockets & API clients, you can afford to have a new goroutine dedicated to each monitor socket and API client. That has the potential to dramatically simplify use of concurrency while at the same time allowing the code to make even better use of CPUs with massive core counts.
Sounds promising and complicated, but is the risk of libvirt discovering some flaw or limitation in goroutines worth it? IOW: would libvirt be blazing a new trail, or are there other consumers that have "helped" work through the initial issues?
We would not be anywhere near pushing the boundaries of Go. Apps like Etcd / Kubernetes stretch it far more than we would.
Oh, and license-wise it would seem we'd have to be careful, true? At least w/r/t attempting to utilize packages written or listed on the wiki page link. From just a quick scan there, it seems there are numerous "packages" available and some list different licenses.
Yes, licensing is always a concern. Most commonly I see Go code under more permissive licenses (BSD, Apache) than libvirt has traditionally used. L(GPL)v2+ is compatible with both BSD & Apache, though for Apache it relies on v2+ becoming v3+. L(GPL)v2-only code is a problem with Apache compatibility. We do unfortunately suffer from GPLv2-only in the VirtualBox driver, inherited from the VirtualBox XPCOM API which is GPLv2-only and hence why we had to move it out of libvirt.so into libvirtd despite it being a stateless driver. The only real good option there, aside from deleting it, is to isolate the VirtualBox driver still further in a standalone process such that its problems are confined. The refactoring of libvirtd I suggested would probably help with the latter.
Also, once chosen what happens if/when issues or incompatibilities are discovered in some package? Do we follow the same principle of GNULIB and try to fix it ourselves or somehow work around it? As I've learned through time - "how" someone else fixes a problem may not work out best and the degree of importance of the problem can result in delays in getting a resolution. Having some amount of control is nice and we just have to weigh the risk(s) of giving some of that away.
I don't have a clear answer for this, since it would probably depend on the kind of problems we hit. This kind of unknown is why a cautious approach would be best, starting at the edge of libvirtd where scope for interactions and/or impact on other work is limited. eg virtlogd and virtlockd are both very self-contained services, so could be good guinea pigs for initial proving work without disrupting libvirt in general.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt.
Depends on the GC, right? Is the GC context/scope based, or overall app based? There are certainly some particularly hairy uses of memory and arguments in libvirt code.
The GC approach in Go is mark + sweep, but as mentioned above, when compared to the scale of other apps using Go, I'm not concerned from a libvirt POV.
The best way to start, however, is probably to focus on a simple self-contained area of libvirt code. Specifically attack the virtlockd, and/or virtlogd daemons, converting them to use Go. This still need not be done in a "big bang". A first phase would be to develop the server side framework for handling our RPC protocol deserialization. This could then just dispatch RPC calls to the existing C impls. As a second phase, the RPC method impls would be converted to Go. Both of these daemons are small enough that the conversion would be possible across the time of a couple of releases. The hardest bit is likely ensuring compatibility for the re-exec() upgrade model they support, but this is none the less doable. The lessons learned in this would go a long way towards informing the best way to tackle the bigger task of the monolithic libvirtd (or equivalently the swarm of daemons the previous proposal suggests)
It will take though "someone" who knows GO and libvirt well enough to start. At this time, I submit that pool of talent is quite limited. Not necessarily GO contributors, but those that understand the libvirt build system, how to mash things together, how to write good GO code, and what types of considerations one has to make when developing at the OS, daemon, and library level.
You can probably guess that the "someone" would be me ;-) You certainly have a valid point here, which is again why a cautious approach is needed rather than attempting too much of a "big bang". I certainly would not start anywhere near the QEMU driver, since the chance of disrupting other devs' productivity is far too high. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

"Daniel P. Berrange" <berrange@redhat.com> writes: [...]
Goroutines are basically a union of the thread + coroutine concepts. The Go runtime will create N OS level threads, where the default N currently matches the number of logical CPU cores your host has (but is tunable to other values). The application code just always creates Goroutines which are userspace threads just like coroutines. The Go runtime will dynamically switch goroutines at key points, and automatically pick suitable OS level threads to run them on to maximize concurrency. Most cleverly goroutines have a 2 KB default stack size, and the runtime will dynamically grow the stack if that limit is reached.
Does this work even when the stack limit is exceeded in a C function?
So this avoids the problem of picking a suitable stack size, and avoids any danger of overruns. As a result it is possible for a process to create *1000's* of goroutines and have less overhead than if you tried to do the same thing in C with threads+coroutines manually.
[...]

On Fri, Nov 17, 2017 at 01:34:54PM +0100, Markus Armbruster wrote:
"Daniel P. Berrange" <berrange@redhat.com> writes:
[...]
Goroutines are basically a union of the thread + coroutine concepts. The Go runtime will create N OS level threads, where the default N currently matches the number of logical CPU cores your host has (but is tunable to other values). The application code just always creates Goroutines which are userspace threads just like coroutines. The Go runtime will dynamically switch goroutines at key points, and automatically pick suitable OS level threads to run them on to maximize concurrency. Most cleverly goroutines have a 2 KB default stack size, and the runtime will dynamically grow the stack if that limit is reached.
Does this work even when the stack limit is exceeded in a C function?
When you make a C call in Go, it runs in a separate stack. The goroutine's own stack is managed by the garbage collector, so can't be exposed to C code. I'm unclear exactly what size the C stack would be, but it'll be the traditional fixed size, not the grow-on-demand behaviour of the Go stack. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 11/17/2017 06:37 AM, Daniel P. Berrange wrote:
On Fri, Nov 17, 2017 at 01:34:54PM +0100, Markus Armbruster wrote:
"Daniel P. Berrange" <berrange@redhat.com> writes:
[...]
Goroutines are basically a union of the thread + coroutine concepts. The Go runtime will create N OS level threads, where the default N currently matches the number of logical CPU cores your host has (but is tunable to other values). The application code just always creates Goroutines which are userspace threads just like coroutines. The Go runtime will dynamically switch goroutines at key points, and automatically pick suitable OS level threads to run them on to maximize concurrency. Most cleverly goroutines have a 2 KB default stack size, and the runtime will dynamically grow the stack if that limit is reached.
Does this work even when the stack limit is exceeded in a C function?
When you make a C call in Go, it runs in a separate stack. The goroutine's own stack is managed by the garbage collector, so can't be exposed to C code. I'm unclear exactly what size the C stack would be, but it'll be the traditional fixed size, not the grow-on-demand behaviour of the Go stack.
Based on https://github.com/golang/go/blob/master/src/runtime/cgo/gcc_linux_amd64.c it looks like they don't explicitly specify a stack size, at least on linux. Are there limits as to what you're allowed to do in C code called from Go? Can you fork processes, spawn threads, call setjmp/longjmp, handle signals, sleep, etc.? Chris

On Fri, Nov 17, 2017 at 10:04:35AM -0600, Chris Friesen wrote:
On 11/17/2017 06:37 AM, Daniel P. Berrange wrote:
On Fri, Nov 17, 2017 at 01:34:54PM +0100, Markus Armbruster wrote:
"Daniel P. Berrange" <berrange@redhat.com> writes:
[...]
Goroutines are basically a union of the thread + coroutine concepts. The Go runtime will create N OS level threads, where the default N currently matches the number of logical CPU cores your host has (but is tunable to other values). The application code just always creates Goroutines which are userspace threads just like coroutines. The Go runtime will dynamically switch goroutines at key points, and automatically pick suitable OS level threads to run them on to maximize concurrency. Most cleverly goroutines have a 2 KB default stack size, and the runtime will dynamically grow the stack if that limit is reached.
Does this work even when the stack limit is exceeded in a C function?
When you make a C call in Go, it runs in a separate stack. The goroutine's own stack is managed by the garbage collector, so can't be exposed to C code. I'm unclear exactly what size the C stack would be, but it'll be the traditional fixed size, not the grow-on-demand behaviour of the Go stack.
Based on https://github.com/golang/go/blob/master/src/runtime/cgo/gcc_linux_amd64.c it looks like they don't explicitly specify a stack size, at least on linux.
That'll inherit the default from ulimit then, which will be 8MB
Are there limits as to what you're allowed to do in C code called from Go? Can you fork processes, spawn threads, call setjmp/longjmp, handle signals, sleep, etc.?
The big caveats are around sharing memory between the Go & C layers. You generally need to copy data between the layers to avoid C code interacting with Go memory that can be garbage collected. In terms of what C can do, I think pretty much anything is possible, within the constraints of safe C programming - eg careful wrt async signal safety as normal. You also have to be aware that C code called from goroutines makes the goroutine much heavier than normal, because of the extra C stack space. ie just because you can have 1000's of goroutines running Go code, doesn't mean you can have 1000's of goroutines running C code, as the stack size explodes from 2 KB to 8 MB. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Nov 14, 2017 at 05:27:01PM +0000, Daniel P. Berrange wrote: [...]
I don't have direct experience in Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features to Rust that can be important to some apps. In particular it does not use garbage collection, instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite a requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which will introduce performance overhead is a theme of Rust. The cost of such an approach is that development has a higher learning curve and ongoing cost in Rust, as compared to Go.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt.
Even though I agree with you on most of the opinions of this thread, I must engage in a discussion on this part. I started working on libvirt because I liked the fact that it is in C and needs to work with the low-level concepts and interfaces. I like python for rapid prototyping, but I used to be against new, "hype" languages that were popping up here and there. I must say, on the other hand, that I was surprised how choice of language can influence the contributor flow (mostly incoming). However I heard about this mostly happening in the ruby and node.js communities. Along the way, I quickly stopped thinking about Rust and Go as hype languages.

Well, so much for my background... I'm going to approach the elephant in the room, but I'm not the one to defend the opinion as my understanding, as well as experience, is limited. On the other hand, that could show you that it is not that hard for a newbie to get the hang of it. I'm saying this because I might get some things wrong, but I expect the wrong information to be related mainly to Go.

So the first thing I disagreed on is that in Rust you do manual allocations. In fact, you don't. Or, depending on the point of view, you do less or the same amount of manual allocation as in Go. What is the clear win for Rust is the concept of ownership and it's related to the allocation mentioned before. I am standing strongly behind the opinion that the learning curve of Rust is definitely worth it. And coming from the C world, it is easy to understand. To me, it is very easy to explain that concept to great detail to someone who has background in libvirt. And the big benefit (and still a huge opportunity for improvement WRT optimizations) is that the compiler must know about it and so it is resolved compile-time. Dereferencing or destructors are run at the end of their scope, automatically. You can nicely see that when realizing that Rust doesn't need any `defer` as Go has.

Sure, Rust doesn't have a green threads implementation, however there should be support for that in some library. I used vague wording due to the fact that I haven't yet found it, however it is mentioned in official docs. Apparently it was removed from the standard library due to requirements on runtime size. Rust's slogan (or one of them?) is "fearless concurrency". It builds upon the ideas from Go, however, having more information it can work with it better. It could help us considering how many problems we have with reference counting and similar. So much for defending Rust. Long story short, I think we could benefit a tiny (well, IMHO not really tiny) bit more if we go that route instead.

Now for some more opinions I have. Stay with me. Not considering the Linux kernel, which has their own address sanitizer, randomizer and god-knows-whatifier, there is still a plethora of projects that are successful in C and I don't see how they would need to adapt or die. I know we have a bunch of stuff that needs to be fixed and the current approach doesn't make it easy. I, myself, thought about proposing a new version of libvirt, which would be a rewrite of the current one. Thanks to that I have some reasons why I didn't do that. One of the things is that the kinks we have can be ironed out in C as well. It might be easier in other languages, but it is harder when you have to switch to one. We have bunch of code dealing with backwards compatibility. And I argue that this is something that causes issues on its own.
What's even worse, IMHO, is that we are so much feature-driven that there is no time for any ironing. I see too much potential for refactoring in various parts of libvirt that will never see the lights of day because we need X to be implemented. And contributors sending feature requests that they fail to maintain later don't help much with that. Maybe we could fix this by saying the next Y releases will just be bugfix releases. Maybe we could help bringing new contributors by devoting some of our time to do an actual change that will make them want to help us more. I know some of you will be sick and tired hearing about Rust once more, but have you heard about how much their community is inclusion-oriented? I guess what I'm trying to say is that there are other (and maybe less disruptive) ways to handle the current problems we are facing.

And then there are the "issues" with Go (and unfortunately some with Rust as well :'( ). Lot of the code for libraries is written with permissive licences, but if there is some that is LGPL-incompatible we can't use them. And in ecosystems such as Rust and Go there are fewer alternatives, so we might not find one that we'll be able to use. If that happens, there goes bunch of our time. Like nothing. How do we deal with problems/bugs in dependency libraries? I know, all the projects are pretty new, so they might be nicer to contributors. If they are not, we might need to fork or rewrite the code. Bam, another chance of losing workforce.

If I may, I will ask some things about Go that I'm not that familiar with and I think they are pretty important. How does Go handle updates in dependency libs? Does it automatically pull newest version from public repositories where some unknown person can push whatever they want? Or can they be hash- or version-bound? The build process is that all the binaries are static, right? Or all the go code is static and it only has dynamic dependencies on C libraries? In such a project as libvirt, wouldn't that mean that the processes we run will be pretty heavy-weight? How is it with rebuilding after a small change? I know Go is good when it comes to compilation times. That might be something that people might like a lot. Especially those who are trying to shave off every second of compilation. However if you cannot use ccache and you always need to rebuild everything, it might increase the build-time quite a lot, even though indirectly.

Since there were some hateful opinions about Go, let me add my share as well. Just so we are on the same page and I don't miss saying anything. Not that these opinions would be as important as the ones above. You can't have /tmp mounted with the noexec option unless you have TMPDIR set to some other directory. And sometimes not even in that case. I guess non-issue for some distros, but bunch of people deal with that and it seems like something that could be taken care of in Go itself and it is just not. You can't just clone a repo, cd into it and build it. You have to get the dependencies in a special manner, for that you have to have GOPATH set and based on that you have to have your directories setup similarly to what Go expects, then you need GOBIN if you want to build something and other stuff that's just not nice and doesn't make much sense. At least for newcomers. Simply the fact that it seems to me like Go is trying to go against the philosophy of "Do one thing and do it well".
I know everyone is about having the build described in the same language as the project, but what comes out of it is not something I prefer. I'm not against using another language to make some stuff better. I guess it is kind of visible from the mail that I like Rust, but I'm not against other languages either. I just want this to be a full-on discussion and I want my opinion to be expressed. Thanks for _listening_ ;) Have a nice day, Martin

On Mon, Nov 20, 2017 at 12:24:22AM +0100, Martin Kletzander wrote:
On Tue, Nov 14, 2017 at 05:27:01PM +0000, Daniel P. Berrange wrote:
[...]
I don't have direct experience in Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features to Rust that can be important to some apps. In particular it does not use garbage collection, instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite a requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which will introduce performance overhead is a theme of Rust. The cost of such an approach is that development has a higher learning curve and ongoing cost in Rust, as compared to Go.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt.
So the first thing I disagreed on is that in Rust you do manual allocations. In fact, you don't. Or, depending on the point of view, you do less or the same amount of manual allocation as in Go. What is the clear win for Rust is the concept of ownership and it's related to the allocation mentioned before.
I shouldn't have used the word "allocation" in my paragraph above. As you say, both languages have similar needs around allocation. The difference I meant is around deallocation policy - in Rust, object lifetime is a more explicit decision under the control of the programmer, as opposed to Go's garbage collection. From what I've read, Rust's approach to deallocation is much closer to the C++ concept of "smart pointers", eg this http://pcwalton.github.io/blog/2013/03/18/an-overview-of-memory-management-i...
I am standing strongly behind the opinion that the learning curve of Rust is definitely worth it. And coming from the C world, it is easy to understand. To me, it is very easy to explain that concept to great detail to someone who has background in libvirt. And the big benefit (and still a huge opportunity for improvement WRT optimizations) is that the compiler must know about it and so it is resolved compile-time. Dereferencing or destructors are run at the end of their scope, automatically. You can nicely see that when realizing that Rust doesn't need any `defer` as Go has.
NB, the 'defer' concept isn't really about memory management per-se, rather it is focused on cleanup of related resources - eg deciding when to close an open file handle, or when to release another resource. Everything that's just Go object memory is handled by the GC.
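A small sketch of defer used this way (the path is purely illustrative): the file handle is released on every return path, while the byte slice itself is simply left to the GC:

    package main

    import (
        "fmt"
        "log"
        "os"
    )

    // readSnippet opens a file and returns its first few bytes; defer
    // guarantees the handle is closed however the function returns.
    func readSnippet(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()

        buf := make([]byte, 256)
        n, err := f.Read(buf)
        if err != nil {
            return "", err
        }
        return string(buf[:n]), nil // buf is reclaimed later by the GC
    }

    func main() {
        s, err := readSnippet("/etc/hostname") // illustrative path
        if err != nil {
            log.Fatal(err)
        }
        fmt.Print(s)
    }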
One of the things is that the kinks we have can be ironed out in C as well. It might be easier in other languages, but it is harder when you have to switch to one. We have bunch of code dealing with backwards compatibility. And I argue that this is something that causes issues on its own. What's even worse, IMHO, is that we are so much feature-driven that there is no time for any ironing. I see too much potential for refactoring in various parts of libvirt that will never see the lights of day because we need X to be implemented. And contributors sending feature requests that they fail to maintain later don't help much with that. Maybe we could fix this by saying the next Y releases will just be bugfix releases. Maybe we could help bringing new contributors by devoting some of our time to do an actual change that will make them want to help us more. I know some of you will be sick and tired hearing about Rust once more, but have you heard about how much their community is inclusion-oriented? I guess what I'm trying to say is that there are other (and maybe less disruptive) ways to handle the current problems we are facing.
I'm not going to debate that there's plenty of problems we could be tackling, and changing language is not a magic bullet for all of them. My primary motivation is to start to get out of the world where we have random crashes & security bugs due to overflowing buffers, double frees, use after free, and so on. Problems that have been solved by every language invented since C in the last 30-40 years.

On the choice of language, I will say C is a turn off to many people as it is (not unreasonably) viewed as archaic, hard to learn and difficult to write good code in. When I worked in OpenStack it was a constant battle to get people to consider enhancements to libvirt instead of reinventing it in Python. It was a hard sell because most Python devs just didn't want to use C at all because it has a high learning curve for contributors, even if libvirt as a community is welcoming. As a result OpenStack pretty much reinvented its own hypervisor agnostic API for esx, hyperv, xenapi and KVM instead of enhancing libvirt's support for esx, hyperv or xenapi. They only end up using libvirt for KVM and libxl really. I hear similar comments from people working in virt related projects in Go. So use of C really does have an impact on our pool of potential contributors. This is disappointing as there are huge numbers of people working on virt related projects, they just don't ever go near C - virt-viewer & virsh are probably the only "apps" using libvirt from C - all the others use a higher level language via one of our bindings.
And then there are the "issues" with Go (and unfortunately some with Rust as well :'( ).
Yep, no choice is ever perfect.
Lot of the code for libraries is written with permissive licences, but if there is some that is LGPL-incompatible we can't use them. And in ecosystems such as Rust and Go there are fewer alternatives, so we might not find one that we'll be able to use. If that happens, there goes bunch of our time. Like nothing.
I've not looked at the rust ecosystem, but from what I've seen in Go most devs tend to go for even more permissive licensing, ie BSD/Apache/MIT. Disappointingly few people pick GPL variants :-( So aside from the complication with our virtualbox code being GPLv2-only, I don't think there's a license problem to worry about with Go, any more than we have to worry about with C today.
How do we deal with problems/bugs in dependency libraries? I know, all the projects are pretty new, so they might be nicer to contributors. If they are not, we might need to fork or rewrite the code. Bam, another chance of losing workforce.
Many C libs we depend on have been around a long time so are more mature, but are also more conservative in accepting changes, especially if they touch API. On balance I don't think there would be a big difference either way in this area.
How does Go handle updates in dependency libs? Does it automatically pull newest version from public repositories where some unknown person can push whatever they want? Or can they be hash- or version-bound?
Originally it was quite informal (and awful) - your 3rd party deps just had to be present in $GOPATH, and there was no tracking of versions. No significant sized project works this way anymore because it is insane. Instead Go introduced the concept they call "vendoring", where you have a top level dir called vendor/ where all your deps live. Your app provides a metadata file (in JSON typically) where you list the deps and the preferred versions you need. The tool then populates vendor/ with the right code to build against. Think of vendor/ as being sort of like git submodules, but not using git submodules, and you'll be thinking along the right lines. Go libraries are being strongly encouraged to adopt semver for their versioning.
The build process is that all the binaries are static, right? Or all the go code is static and it only has dynamic dependencies on C libraries? In such a project as libvirt, wouldn't that mean that the processes we run will be pretty heavy-weight?
Yes, all the Go code is statically linked - only C libs are dynamically loaded. This doesn't have any impact on runtime, because even if the binary was 100's of MB in size, the kernel is only ever going to page in sections of that file which are actually executed. The main impact is that if a dependency gets an update (eg for a security fix) all downstream apps need rebuilding.
How is it with rebuilding after a small change? I know Go is good when it comes to compilation times. That might be something that people might like a lot. Especially those who are trying to shave off every second of compilation. However if you cannot use ccache and you always need to rebuild everything, it might increase the build-time quite a lot, even though indirectly.
Compile times are great - the compiler is very fast, and it does caching on a per-package basis. NB in Go a "package" is an individual directory in your source tree, so a typical app would have 10's or 100's of packages, each corresponding to a separate subdir. Source dirs are probably more fine grained than what we use in libvirt. eg what's in src/qemu in libvirt currently would likely end up being spread across as many as 5 packages if it were idiomatic Go. The biggest win though comes from not needing autoconf, automake. Of course libvirt wouldn't see that benefit as long as any of our code were still C, so I won't claim that's a win in this particular case.
You can't have /tmp mounted with the noexec option unless you have TMPDIR set to some other directory. And sometimes not even in that case. I guess non-issue for some distros, but bunch of people deal with that and it seems like something that could be taken care of in Go itself and it is just not.
I've not heard of that one before, and so obviously not hit it.
From Google it seems this applies if you use the 'go run' or 'go test' commands. The former is not something you typically use, but the latter is. It can be avoided by having the make rule or shell script that invokes 'go test' set a local TMPDIR - no need to set it globally in your bash profile. I'm not sure what distros have /tmp with noexec - Fedora doesn't at least.
You can't just clone a repo, cd into it and build it. You have to get the dependencies in a special manner, for that you have to have GOPATH set and based on that you have to have your directories setup similarly to what Go expects, then you need GOBIN if you want to build something and other stuff that's just not nice and doesn't make much sense. At least for newcomers. Simply the fact that it seems to me like Go is trying to go against the philosophy of "Do one thing and do it well". I know everyone is about having the build described in the same language as the project, but what comes out of it is not something I prefer.
This is related to the thing I mention above where historically everything was just splattered into $GOPATH. Most apps have gone towards the vendoring concept where every dep is self-contained in your local checkout.
I'm not against using another language to make some stuff better. I guess it is kind of visible from the mail that I like Rust, but I'm not against other languages either. I just want this to be a full-on discussion and I want my opinion to be expressed. Thanks for _listening_ ;)
Either Rust or Go would be a step forward over staying exclusively with C IMHO, so at least we agree that there are potential benefits to either :-) Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 12:24:22AM +0100, Martin Kletzander wrote:
On Tue, Nov 14, 2017 at 05:27:01PM +0000, Daniel P. Berrange wrote:
[...]
I don't have direct experience in Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features to Rust that can be important to some apps. In particular it does not use garbage collection, instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite a requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which will introduce performance overhead is a theme of Rust. The cost of such an approach is that development has a higher learning curve and ongoing cost in Rust, as compared to Go.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical to not have a GC doing asynchronous memory deallocation, this is not at all important to libvirt. In fact precisely the opposite, libvirt would benefit much more from having GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for usage in libvirt.
So the first thing I disagreed on is that in Rust you do manual allocations. In fact, you don't. Or, depending on the point of view, you do less or the same amount of manual allocation as in Go. What is the clear win for Rust is the concept of ownership and it's related to the allocation mentioned before.
I shouldn't have used the word "allocation" in my paragraph above. As you say, both languages have similar needs around allocation. The difference I meant is around deallocation policy - in Rust, object lifetime is a more explicit decision under the control of the programmer, as opposed to Go's garbage collection. From what I've read, Rust's approach to deallocation is much closer to the C++ concept of "smart pointers", eg this
http://pcwalton.github.io/blog/2013/03/18/an-overview-of-memory-management-i...
This is kind of old; that code wouldn't run with newer Rust. I guess that is from long ago when it was not stabilized at all. It is a bit smarter now. The fact that you have control over when the value is getting freed is true, however you rarely have to think about that. What's more important is that the compiler prevents you from accessing a value from multiple places or not knowing who "owns" (think of it as "who should take care of freeing it") the variable. If you give the ownership to someone you can't access it. The difference I see is that if you access it after some other part of the code has become responsible for that variable, in Rust the compiler will cut you off unless you clearly specify how the memory space related to the variable is supposed to be handled. In Go it will just work (with a potential bug), but it will not crash because the GC will not clean it up while someone can still access it. Granted this is usually a problem with concurrent threads/coroutines (and I don't know how concurrent access is handled in Go). Also, for example Rust doesn't allow you to have a value accessible from multiple threads unless it is guarded by a thread-safe reference counter, and does not allow you to modify it unless it is guarded by a Mutex, RWLock or one of the Atomic types. Again, I don't know Go that much, I'm still yet to delve into the deep unknowns of it, but I haven't heard about it providing such safety.
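For comparison, a minimal Go sketch of the convention-based approach being discussed here: the mutex is intended to guard the counter, but nothing at compile time forces callers to take it - forgetting the Lock() still compiles and is only caught at runtime (eg by the race detector):

    package main

    import (
        "fmt"
        "sync"
    )

    // counter relies on convention: mu protects count, but the compiler
    // does not enforce that callers hold the lock before touching count.
    type counter struct {
        mu    sync.Mutex
        count int
    }

    func (c *counter) incr() {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.count++
    }

    func main() {
        var c counter
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                c.incr()
            }()
        }
        wg.Wait()
        fmt.Println(c.count) // always 100, because incr takes the lock
    }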
I am standing strongly behind the opinion that the learning curve of Rust is definitely worth it. And coming from the C world, it is easy to understand. To me, it is very easy to explain that concept in great detail to someone who has a background in libvirt. And the big benefit (and still a huge opportunity for improvement WRT optimizations) is that the compiler must know about it, so it is resolved at compile time. Deallocation and destructors run at the end of their scope, automatically. You can nicely see that when realizing that Rust doesn't need any `defer` as Go has.
NB, the 'defer' concept isn't really about memory management per se; rather it is focused on cleanup of related resources - eg deciding when to close an open file handle, or when to release another resource. Everything that's just Go object memory is handled by the GC.
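A tiny illustration of that (a hypothetical function, nothing libvirt-specific): defer ties the file-handle cleanup to function exit on every return path, while the memory behind the variables is left to the GC.

  package main

  import (
          "fmt"
          "io"
          "os"
  )

  func readConfig(path string) error {
          f, err := os.Open(path)
          if err != nil {
                  return err
          }
          // The file handle is closed whenever this function returns,
          // on any path; the memory for f itself is reclaimed by the GC later.
          defer f.Close()

          buf := make([]byte, 512)
          n, err := f.Read(buf)
          if err != nil && err != io.EOF {
                  return err
          }
          fmt.Printf("read %d bytes from %s\n", n, path)
          return nil
  }

  func main() {
          if err := readConfig("/etc/hostname"); err != nil {
                  fmt.Println("error:", err)
          }
  }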
One of the things is that the kinks we have can be ironed out in C as well. It might be easier in other languages, but it is harder when you have to switch to one. We have a bunch of code dealing with backwards compatibility, and I argue that this is something that causes issues on its own. What's even worse, IMHO, is that we are so much feature-driven that there is no time for any ironing. I see too much potential for refactoring in various parts of libvirt that will never see the light of day because we need X to be implemented. And contributors sending feature requests that they fail to maintain later don't help much with that. Maybe we could fix this by saying the next Y releases will just be bugfix releases. Maybe we could help bring in new contributors by devoting some of our time to an actual change that will make them want to help us more. I know some of you will be sick and tired of hearing about Rust once more, but have you heard about how much their community is inclusion-oriented? I guess what I'm trying to say is that there are other (and maybe less disruptive) ways to handle the current problems we are facing.
I'm not going to debate that there are plenty of problems we could be tackling, and changing language is not a magic bullet for all of them. My primary motivation is to start to get out of the world where we have random crashes & security bugs due to overflowing buffers, double frees, use after free, and so on - problems that have been solved by every language invented since C in the last 30-40 years.
On the choice of language, I will say C is a turn-off to many people as it is (not unreasonably) viewed as archaic, hard to learn and difficult to write good code in.
I would say this highly depends on what area you are coming from. In the cloud world it would be viewed very differently than on the dark^Wlow-level side of things. I don't want to compare libvirt to the kernel, for example, but I haven't heard about C being a turn-off there.
When I worked in OpenStack it was a constant battle to get people to consider enhancements to libvirt instead of reinventing it in Python. It was a hard sell because most Python devs just didn't want to use C at all, since it has a high learning curve for contributors, even if libvirt as a community is welcoming. As a result OpenStack pretty much reinvented
I'm sorry, but I think only a handful of us (yeah, I think and hope I could count myself amongst that group) are welcoming. But that's actually where I see one of the big turn-offs.
its own hypervisor-agnostic API for esx, hyperv, xenapi and KVM instead of enhancing libvirt's support for esx, hyperv or xenapi. They only end up using libvirt for KVM and libxl really. I hear similar comments from people working on virt-related projects in Go. So the use of C really does have an impact on our pool of potential contributors. This is disappointing,
It's hard to guess how that would have turned out if C wasn't a turn-off for them. Maybe we would get a bunch of patchsets submitted that would be in better shape language-wise, but would that help with understanding the various internal structures and behaviours of libvirt? Not counting the fact that some language might make it more readable. Or would we just get a bunch more drive-by patches that we would have to fix/maintain? I don't think we can answer that.
as there are huge numbers of people working on virt-related projects, they just don't ever go near C - virt-viewer & virsh are probably the only "apps" using libvirt from C - all the others use a higher-level language via one of our bindings.
And then there are the "issues" with Go (and unfortunately some with Rust as well :'( ).
Yep, no choice is ever perfect.
A lot of the code for libraries is written with permissive licences, but if there is some that is LGPL-incompatible we can't use it. And in ecosystems such as Rust and Go there are fewer alternatives, so we might not find one that we'll be able to use. If that happens, there goes a bunch of our time, just like that.
I've not looked at the Rust ecosystem, but from what I've seen in Go, most devs tend to go for even more permissive licensing, ie BSD/Apache/MIT. Disappointingly few people pick GPL variants :-( So aside from the complication with our virtualbox code being GPLv2-only, I don't think there's a license problem to worry about with Go, any more than we have to worry about with C today.
OK, and ...
How do we deal with problems/bugs in dependency libraries? I know, all the projects are pretty new, so they might be nicer to contributors. If they are not, we might need to fork or rewrite the code. Bam, another chance of losing workforce.
Many C libs we depend on have been around a long time, so they are more mature, but are also more conservative in accepting changes, especially if they touch API. On balance I don't think there would be a big difference either way in this area.
... OK, I hope so and if you say so I will blindly believe that ;)
How does Go handle updates in dependency libs? Does it automatically pull the newest version from public repositories where some unknown person can push whatever they want? Or can they be hash- or version-bound?
Originally it was quite informal (and awful) - your 3rd party deps just had to be present in $GOPATH, and there was no tracking of versions. No project of significant size works this way anymore because it is insane.
Instead Go introduced the concept they call "vendoring", where you have a top level dir called vendor/ in which all your deps live. Your app provides a metadata file (in JSON typically) where you list the deps and the preferred versions you need. The tool then populates vendor/ with the right code to build against. Think of vendor/ as being sort of like Git submodules, but without using Git submodules, and you'll be thinking along the right lines. Go libraries are being strongly encouraged to adopt semver for their versioning.
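The exact file name and schema depend on which tool you pick (godep, glide and dep all differ), but as a rough illustration, a godep-style Godeps.json looks something like the following - the project path and the dependency shown are just made-up examples:

  {
      "ImportPath": "github.com/example/virtmgr",
      "GoVersion": "go1.9",
      "Deps": [
          {
              "ImportPath": "github.com/libvirt/libvirt-go",
              "Rev": "<commit hash of the version you want>"
          }
      ]
  }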
Oh, good, I was hoping for this.
The build process is that all the binaries are static, right? Or is all the Go code static and it only has dynamic dependencies on C libraries? In a project such as libvirt, wouldn't that mean that the processes we run will be pretty heavy-weight?
Yes, all the Go code is statically linked - only C libs are dynamically loaded. This doesn't have any impact on runtime, because even if the binary was 100's of MB in size, the kernel is only ever going to page in the sections of that file which are actually executed.
But the code that is used by each binary will be present in memory as many times as that binary is running, since it is not dynamically loaded, right?
The main impact is that if a dependency gets an update (eg for a security fix) all downstream apps need rebuilding.
Well, that sucks, but it's not a deal-breaker.
How is it with rebuilding after a small change? I know Go is good when it comes to compilation times. That might be something that people will like a lot, especially those who are trying to shave off every second of compilation. However, if you cannot use ccache and you always need to rebuild everything, it might increase the build time quite a lot, even though indirectly.
Compile times are great - the compiler is very fast, and it does caching on a per-package basis. NB in Go a "package" is an individual directory in your source tree, so a typical app would have 10's or 100's of packages, each corresponding to a separate subdir. Source dirs are probably more fine grained than what we use in libvirt; eg what's in src/qemu in libvirt currently would likely end up being spread across as many as 5 packages if it were idiomatic Go.
The biggest win though comes from not needing autoconf, automake. Of course libvirt wouldn't see that benefit as long as any of our code were still C, so I won't claim that's a win in this particular case.
You can't have /tmp mounted with the noexec option unless you have TMPDIR set to some other directory. And sometimes not even in that case. I guess this is a non-issue for some distros, but a bunch of people deal with it, and it seems like something that could be taken care of in Go itself but just is not.
I've not heard of that one before, and so obviously not hit it. From Google it seems this applies if you use the 'go run' or 'go test' commands. The former is not something you typically use, but the latter is. It can be avoided by having the make/shell rule that invokes 'go test' set a local TMPDIR - no need to set it globally in your bash profile. I'm not sure which distros have /tmp with noexec - Fedora doesn't at least.
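For example, assuming GNU make, the test rule could be something as simple as this, keeping the temp dir inside the source tree instead of /tmp:

  test:
          mkdir -p $(CURDIR)/.tmp
          TMPDIR=$(CURDIR)/.tmp go test ./...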
Usually it's when you don't have SELinux and want to be guarded a tiny bit more. Like me. Maybe I'll change that someday. I managed to fix that by having entry and exit scripts that mangle the TMPDIR env var for me. And it works in some cases.
You can't just clone a repo, cd into it and build it. You have to get the dependencies in a special manner; for that you have to have GOPATH set, and based on that you have to have your directories set up similarly to what Go expects; then you need GOBIN if you want to build something, and other stuff that's just not nice and doesn't make much sense, at least for newcomers. It simply seems to me like Go is trying to go against the philosophy of "do one thing and do it well". I know everyone is keen on having the build described in the same language as the project, but what comes out of it is not something I prefer.
This is related to the thing I mentioned above where historically everything was just splattered into $GOPATH. Most apps have moved towards the vendoring concept where every dep is self-contained in your local checkout.
I have to read up on the basics of how to do a proper first-time setup for Go. And maybe everything will be sunshine and rainbows from that point forward. It's just that I don't like the fact that it's non-intuitive. Is it possible somehow to just build some code without actually dealing with a bunch of dependencies and all the setup? To give an example, is there a way to separate the compiler from the dependency handling and build processes? Like gcc and autoconf/automake/make, or rustc and cargo?
I'm not against using another language to make some stuff better. I guess it is kind of visible from this mail that I like Rust, but I'm not against other languages either. I just want this to be a full-on discussion and I want my opinion to be expressed. Thanks for _listening_ ;)
Either Rust or Go would be a step forward over staying exclusively with C IMHO, so at least we agree that there are potential benefits to either :-)
Yeah. Good luck with evaluating the responses. I hope we won't need to change our Code of Conduct or even resort to voting...
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Nov 20, 2017 at 05:36:24PM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
I shouldn't have used the word "allocation" in my paragraph above. As you say, both languages have similar needs around allocation. The difference I meant is around deallocation policy - in Rust, object lifetime is a more explicit decision under the control of the programmer, as opposed to Go's garbage collection. From what I've read, Rust's approach to deallocation is much closer to the C++ concept of "smart pointers", eg this
http://pcwalton.github.io/blog/2013/03/18/an-overview-of-memory-management-i...
This is kind of old; that code wouldn't run with newer Rust. I guess that is from far back when it was not stabilized at all. It is a bit smarter now. The fact that you have control over when the value gets freed is true, however you rarely have to think about that. What's more important is that the compiler prevents you from accessing a value from multiple places, or from not knowing who "owns" it (think of it as "who should take care of freeing it"). If you give the ownership to someone else you can't access the value any more. The difference I see is that if you access it after some other part of the code has become responsible for that variable, in Rust the compiler will cut you off unless you clearly specify how the memory related to the variable is supposed to be handled. In Go it will just work (with a potential bug), but it will not crash, because the GC will not clean it up while someone can still access it. Granted, this is usually a problem with concurrent threads/coroutines (and I don't know how Go handles concurrent access there). Also, for example, Rust doesn't allow you to have a value accessible from multiple threads unless it is guarded by a thread-safe reference counter, and does not allow you to modify it unless it is guarded by a Mutex, RWLock or one of the Atomic types. Again, I don't know Go that much, I'm still yet to delve into the deep unknowns of it, but I haven't heard about it providing such safety.
The example you give about Rust not permitting concurrent usage is one of the unique features I alluded to that Go doesn't have.
On the choice of language, I will say C is a turn-off to many people as it is (not unreasonably) viewed as archaic, hard to learn and difficult to write good code in.
I would say this highly depends on what area you are coming from. In the cloud world it would be viewed very differently than on the dark^Wlow-level side of things. I don't want to compare libvirt to the kernel, for example, but I haven't heard about C being a turn-off there.
I'm wary of using the kernel as an analogy for anything, because it is special in oh so many ways compared to "regular" projects... The closest comparison is K8s -> Docker, and there I do see more engagement from Kubernetes devs down into Docker code than I ever saw from OpenStack devs down into libvirt code. There's only so far you can usefully take these comparisons though.
When I worked in OpenStack it was a constant battle to get people to consider enhancements to libvirt instead of reinventing it in Python. It was a hard sell because most Python devs just didn't want to use C at all, since it has a high learning curve for contributors, even if libvirt as a community is welcoming. As a result OpenStack pretty much reinvented
I'm sorry, but I think only a handful of us (yeah, I think and hope I could count myself amongst that group) are welcoming. But that's actually where I see one of the big turn-offs.
Perhaps "welcoming" was overstating things - perhaps more "not aggressively hostile" would be a better way of saying it.
How do we deal with problems/bugs in dependency libraries? I know, all the projects are pretty new, so they might be nicer to contributors. If they are not, we might need to fork or rewrite the code. Bam, another chance of losing workforce.
Many C libs we depend on have been around a long time, so they are more mature, but are also more conservative in accepting changes, especially if they touch API. On balance I don't think there would be a big difference either way in this area.
... OK, I hope so and if you say so I will blindly believe that ;)
Mostly I'm basing this off what I see with Docker and Kubernetes. They both rely on a huge number of 3rd party Go modules, and their needs & scale are at least on a par with libvirt's. IOW, if it is good enough for them, it's a sign it would be good enough for us.
The build process is that all the binaries are static, right? Or is all the Go code static and it only has dynamic dependencies on C libraries? In a project such as libvirt, wouldn't that mean that the processes we run will be pretty heavy-weight?
Yes, all the Go code is statically linked - only C libs are dynamically loaded. This doesn't have any impact on runtime, because even if the binary was 100's of MB in size, the kernel is only ever going to page in the sections of that file which are actually executed.
But the code that is used by each binary will be present in memory as many times as that binary is running, since it is not dynamically loaded, right?
The memory mapping associated with /usr/bin/foo is still shared across all instances of 'foo' that are executing. If you have /usr/bin/foo and /usr/bin/bar though, which are both statically linked to 'wizz', then you would get two copies of 'wizz' in memory.
The main impact is that if a dependency gets an update (eg for a security fix) all downstream apps need rebuilding.
Well, that sucks, but it's not a deal-breaker.
NB, same as ocaml and a few other languages in Fedora too.
You can't just clone a repo, cd into it and build it. You have to get the dependencies in a special manner; for that you have to have GOPATH set, and based on that you have to have your directories set up similarly to what Go expects; then you need GOBIN if you want to build something, and other stuff that's just not nice and doesn't make much sense, at least for newcomers. It simply seems to me like Go is trying to go against the philosophy of "do one thing and do it well". I know everyone is keen on having the build described in the same language as the project, but what comes out of it is not something I prefer.
This is related to the thing I mentioned above where historically everything was just splattered into $GOPATH. Most apps have moved towards the vendoring concept where every dep is self-contained in your local checkout.
I have to read up on the basics of how to do a proper first-time setup for Go. And maybe everything will be sunshine and rainbows from that point forward. It's just that I don't like the fact that it's non-intuitive. Is it possible somehow to just build some code without actually dealing with a bunch of dependencies and all the setup? To give an example, is there a way to separate the compiler from the dependency handling and build processes? Like gcc and autoconf/automake/make, or rustc and cargo?
The compiler binary 'go' expects all the deps to have been pulled down locally. It will search for them in either GOPATH (the traditional location) or the local vendor/ dir (the modern location), searching vendor/ first. The process of actually getting those deps installed is handled by a separate tool. On my most recent code I used a tool called 'glide', but Go is in the process of unifying on a tool called 'dep'. You still normally need to check out in $GOPATH/src/$REPOURL, eg when checking out https://github.com/dicot-project/dicot-api it would go into $GOPATH/src/github.com/dicot-project/dicot-api. It would be possible to avoid this, if I made the makefile auto-populate a $GOPATH with symlinks, but I've not bothered to try that, as the use of the $GOPATH layout doesn't really bother me now that the 3rd party deps are isolated into the vendor/ dir.
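So a first-time setup for a project laid out this way looks roughly like the following, using the repo above as the example ('dep ensure' or 'glide install' depending on which tool the project in question uses):

  export GOPATH=$HOME/go
  mkdir -p $GOPATH/src/github.com/dicot-project
  cd $GOPATH/src/github.com/dicot-project
  git clone https://github.com/dicot-project/dicot-api
  cd dicot-api
  dep ensure        # or: glide install - populates vendor/ from the metadata file
  go build ./...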
I'm not against using another language to make some stuff better. I guess it is kind of visible from the mail that I like Rust, but I'm not against other languages as well. I just want this to be full on discussion and I want my opinion to be expressed. Thanks for _listening_ ;)
Either Rust or Go would be a step forward over staying exclusively with C IMHO, so at least we agree that there are potential benefits to either :-)
Yeah. Good luck with evaluating the responses. I hope we won't need to change our Code of Conduct or even resort to voting...
I think we're a way off actually getting to the point of decision making. I would view refactoring of libvirtd into modular daemons as the more pressing problem to tackle. Language choice is a more long term thing to consider. Realistically it would need a real world proof of concept illustrating the usage of $LANG in combination with C, with our codebase, to be able to properly understand what it would look like from libvirt's POV. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Nov 20, 2017 at 04:57:56PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 05:36:24PM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
Just a quick note on what I've found out after I dedicated half a day to going through the Tour of Go and some other tutorials. The learning curve of Go is even less steep than I thought (for some unknown reason) it would be. So that's in favor of Go. However, I haven't found out how it is possible to avoid some SIGSEGVs or aborts, since Go doesn't have many recoverable errors. And in some cases they are not easy to spot immediately. Or making sure struct fields are initialized. Since libvirt strives to go for recoverable errors, I see this as a downside. Anyway, that's just to update others on what I've learnt. Have a nice day, Martin

On Tue, Nov 28, 2017 at 08:43:54AM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 04:57:56PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 05:36:24PM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
Just a quick note on what I've found out after I dedicated half a day to going through the Tour of Go and some other tutorials. The learning curve of Go is even less steep than I thought (for some unknown reason) it would be. So that's in favor of Go. However, I haven't found out how it is possible to avoid some SIGSEGVs or aborts, since Go doesn't have many recoverable errors. And in some cases they are not easy to spot immediately. Or making sure struct fields are initialized. Since libvirt strives to go for recoverable errors, I see this as a downside.
Either I'm misunderstanding what you mean, or you missed the 'recover' function. In normal operation, error reporting is dealt with by having functions return a value that implements the 'error' interface. Functions can have multiple return values, so typically you would return a pair of values, the first being the data, the second being the error indicator. You check & deal with those errors with normal control flow statements.

For cases where the code triggered a runtime panic() (eg dereferencing a nil pointer), ordinarily that will terminate the program. At any point up the callstack, however, you can catch that panic using the recover() function, which avoids termination and resumes normal execution. Typically in an RPC server, the RPC dispatch method would use recover() so that if any RPC method execution panic()s the server carries on running normally; only that one method is terminated.

The only thing that you can't catch is when you call into C code and that crashes. The C code can obviously arbitrarily corrupt memory, so there's no safe way to recover from that. Only the Go code can be recover()ed from. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
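A minimal sketch of both patterns described above (the function names are invented for illustration, not real libvirt code):

  package main

  import "fmt"

  // Normal error handling: the caller checks the second return value
  // with ordinary control flow, no exceptions involved.
  func lookupDomain(name string) (string, error) {
          if name == "" {
                  return "", fmt.Errorf("no domain name given")
          }
          return "uuid-for-" + name, nil
  }

  // What an RPC dispatcher would do: convert a panic in the handler
  // back into an error, so one broken method doesn't kill the daemon.
  func dispatch(handler func() error) (err error) {
          defer func() {
                  if r := recover(); r != nil {
                          err = fmt.Errorf("handler panicked: %v", r)
                  }
          }()
          return handler()
  }

  func main() {
          if uuid, err := lookupDomain("demo"); err != nil {
                  fmt.Println("error:", err)
          } else {
                  fmt.Println("found", uuid)
          }

          err := dispatch(func() error {
                  var p *int
                  _ = *p // nil dereference triggers a runtime panic
                  return nil
          })
          fmt.Println("dispatch result:", err)
  }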

On Tue, Nov 28, 2017 at 10:22:21AM +0000, Daniel P. Berrange wrote:
Oops, meant to include this link https://blog.golang.org/defer-panic-and-recover Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Nov 28, 2017 at 10:25:37AM +0000, Daniel P. Berrange wrote:
On Tue, Nov 28, 2017 at 10:22:21AM +0000, Daniel P. Berrange wrote:
On Tue, Nov 28, 2017 at 08:43:54AM +0100, Martin Kletzander wrote:
Just a quick note on what I've found out after I dedicated half day to go through the tour of go and some other tutorials. The learning curve of Go is even less steep than I though (for some unknown reason) it is. So that's in favor of Go. However I haven't found out how is it possible to avoid some SIGSEGVs or aborts since Go doesn't have many recoverable errors. And in some cases they are not easy to spot immediately. Or making sure struct fields are initialized. Since libvirt strives to go for recoverable errors, I see this as a downside.
Either I'm mis-understanding what you mean, or you missed the 'recover' function. In normal operation, error reporting is dealt with by having functions return a value that implements the 'error' interface. Functions can have multiple return values, so typically you would return a pair of values, the first being the data, the second being the error indicator. You check & deal with those errors with normal control flow statements.
For cases where the code triggered a runtime panic() (eg dereference a Nil pointer), ordinarily that will terminate the program. At point in the callstack, however, can catch that panic using the recover() method which avoids termination, and resumes normal execution. Typically in an RPC server, the RPC dispatch method would use recover() so that if any RPC method execution panic()s the server carries on running normally, only that one method is terminated.
The only thing that you can't catch is when you call into C code and that crashes. The C code can obviously arbitrarily corrupt memory, so there's no safe way to recover that. Only the Go can be recover()d from.
Oops, meant to include this link
Oh, I didn't know about this, that's cool. I totally missed the recover() function. Thanks for the info and the link! I'm starting to feel like I know Go now :D

On Tue, Nov 28, 2017 at 9:44 AM Martin Kletzander <mkletzan@redhat.com> wrote:
On Mon, Nov 20, 2017 at 04:57:56PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 05:36:24PM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
Just a quick note on what I've found out after I dedicated half a day to going through the Tour of Go and some other tutorials. The learning curve of Go is even less steep than I thought (for some unknown reason) it would be. So that's in favor of Go. However, I haven't found out how it is possible to avoid some SIGSEGVs or aborts, since Go doesn't have many recoverable errors. And in some cases they are not easy to spot immediately. Or making sure struct fields are initialized.
Struct fields are always initialized to the zero value of the type; there is no such thing as uninitialized memory in Go. https://golang.org/ref/spec#The_zero_value

On Tue, Nov 28, 2017 at 10:47:33AM +0000, Nir Soffer wrote:
On Tue, Nov 28, 2017 at 9:44 AM Martin Kletzander <mkletzan@redhat.com> wrote:
On Mon, Nov 20, 2017 at 04:57:56PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 05:36:24PM +0100, Martin Kletzander wrote:
On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
Just a quick note on what I've found out after I dedicated half a day to going through the Tour of Go and some other tutorials. The learning curve of Go is even less steep than I thought (for some unknown reason) it would be. So that's in favor of Go. However, I haven't found out how it is possible to avoid some SIGSEGVs or aborts, since Go doesn't have many recoverable errors. And in some cases they are not easy to spot immediately. Or making sure struct fields are initialized.
Struct fields are always initialized to the zero value of the type; there is no such thing as uninitialized memory in Go. https://golang.org/ref/spec#The_zero_value
Sorry, I meant s/initialized/non-nil/
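Putting the two points together, a tiny example (the types are invented for illustration): every field gets a well-defined zero value rather than garbage, but a pointer field's zero value is nil, which is the "non-nil" caveat.

  package main

  import "fmt"

  type Config struct{ Path string }

  type Domain struct {
          Name string  // zero value: ""
          VCPU int     // zero value: 0
          Cfg  *Config // zero value: nil - the "non-nil" caveat
  }

  func main() {
          var d Domain // every field starts at its zero value, never at garbage
          fmt.Printf("%q %d %v\n", d.Name, d.VCPU, d.Cfg)

          if d.Cfg != nil { // without this check, d.Cfg.Path would panic
                  fmt.Println(d.Cfg.Path)
          }
  }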

On 11/20/2017 09:25 AM, Daniel P. Berrange wrote:
When I worked in OpenStack it was a constant battle to get people to consider enhancements to libvirt instead of reinventing it in Python. It was a hard sell because most Python devs just didn't want to use C at all, since it has a high learning curve for contributors, even if libvirt as a community is welcoming. As a result OpenStack pretty much reinvented its own hypervisor-agnostic API for esx, hyperv, xenapi and KVM instead of enhancing libvirt's support for esx, hyperv or xenapi.
To be fair, there's also the issue that getting a change into any external project and packaged into all the distros is more unpredictable (and may take longer) than implementing the same thing in their own project. RHEL (just as an example) has been updating libvirt roughly once a year for the past couple of years. Chris

On 11/14/2017 06:27 PM, Daniel P. Berrange wrote:
This might be more tricky than one would initially think. What about tests, for instance? In our test suite we rely heavily on mocking. For instance, virpcitest uses virpcimock, which reimplements kernel behaviour for detach/attach of PCI devices. Would it be possible to have that with a Go program? Okay, virpcimock might not be the best example since it mocks plain syscalls (open, close, ...). But then there are some tests which mock higher-level functions - e.g. qemuxml2argvmock. On the other hand, some tests would not be needed at all - e.g. virhashtest - assuming Go has its own implementation of hash tables that we'd use. Michal

On Thu, Nov 23, 2017 at 11:32:26AM +0100, Michal Privoznik wrote:
On 11/14/2017 06:27 PM, Daniel P. Berrange wrote:
This might be more tricky than one would initially think. What about tests, for instance? In our test suite we rely heavily on mocking. For instance, virpcitest uses virpcimock, which reimplements kernel behaviour for detach/attach of PCI devices. Would it be possible to have that with a Go program? Okay, virpcimock might not be the best example since it mocks plain syscalls (open, close, ...). But then there are some tests which mock higher-level functions - e.g. qemuxml2argvmock.
It depends on which areas we would want to mock/test. A regular Go method is not directly callable from C, and thus also not directly mockable with the ELF preload / override tricks we use now. A Go method which is explicitly exported with the C calling convention, however, can be mocked in the same manner we do now. So if we had replaced some low level infrastructure with a Go module, we could mock at the public entry points to that Go module which were exported to C code. We could not mock the internals of that Go module.

For pure Go code being tested in isolation from any C code, you would use a different approach. The general concept in Go is to define an 'interface' covering the set of APIs which you might want alternative impls for. So if there was a bit of Go code you wanted to easily replace in unit tests with a fake impl, you would define an interface and a fake impl that satisfies that interface. So you need to put a little bit more thought into the code upfront to make it fakeable in tests, but if you do that the end result is comparable. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
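A rough sketch of that interface-plus-fake pattern, with invented names (not real libvirt code): the code under test only ever sees the interface, and the unit test passes in a fake implementation instead of the real sysfs-backed one.

  package pci

  type Device struct{ Address string }

  // Host is the seam a test can replace: it covers the operations the
  // real code would perform against sysfs.
  type Host interface {
          Detach(dev Device) error
          Reattach(dev Device) error
  }

  // PrepareForHostdev is the code under test; it only sees the interface.
  func PrepareForHostdev(h Host, devs []Device) error {
          for _, d := range devs {
                  if err := h.Detach(d); err != nil {
                          return err
                  }
          }
          return nil
  }

  // fakeHost is what a unit test passes in instead of the real
  // implementation; it just records what was asked of it.
  type fakeHost struct {
          detached map[string]bool
  }

  func (f *fakeHost) Detach(dev Device) error {
          if f.detached == nil {
                  f.detached = map[string]bool{}
          }
          f.detached[dev.Address] = true
          return nil
  }

  func (f *fakeHost) Reattach(dev Device) error {
          delete(f.detached, dev.Address)
          return nil
  }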
participants (10)
- Bjoern Walk
- Chris Friesen
- Daniel P. Berrange
- John Ferlan
- Markus Armbruster
- Martin Kletzander
- Michal Privoznik
- Nir Soffer
- Peter Krempa
- Richard W.M. Jones