[libvirt] [discuss] The new cgroup patches for libvirt

Hi, Everyone,
I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself.
I hope that libvirt will switch over (maybe after your concerns are addressed, and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides.
libcgroups (libcg.sf.net) provides
1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file
2. An API to control and read cgroups information
3. Thread safety around API calls
4. Daemons to automatically classify a task based on a certain set of rules
5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy)
While re-implementing might sound like a cool thing to do, here are the drawbacks
1. It leads to code duplication and reduces code reuse
2. It leads to confused users
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us
2. Add new API or request for new API that can help us integrate better with libvirt
-- Balbir

On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us
I did point out the general problem of ABI in libcgroup:
http://www.mail-archive.com/libvir-list@redhat.com/msg08388.html
I didn't see any reply to the points I raised specifically. In the meantime we got a relatively simple, sufficient for now, usable right now, patch fulfilling our needs. A working patch is better in my eyes than something which may work well in the future if we take the time to integrate it, stabilize it and propagate it to the systems we use.
The package available in Fedora 9 has not improved as far as I can tell. So I'm still keeping the same point of view as posted on that same thread a month ago:
http://www.mail-archive.com/libvir-list@redhat.com/msg08472.html
"Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup,"
As maintainer I will also note that "nicer" also implies the ability to work well and smoothly with the other maintainers. I hate guerrilla tactics; I would have preferred that you read and replied to what I wrote.
So Dan Smith's patch should IMHO go in now. If later your APIs are widely distributed and cleaner than what I have now (0.1c may be old, but it is what is available to us on Fedora; no idea what is available on other distros), and there is a clean patch to switch, then we will look at it. Right now we can't use libcgroup, in my opinion.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

On Fri, Oct 03, 2008 at 06:22:27PM +0200, Daniel Veillard wrote:
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us
I did point the general problem of ABI in libcgroup http://www.mail-archive.com/libvir-list@redhat.com/msg08388.html
I didn't see any reply to the points I raised specifically.
I did respond to that email at http://www.mail-archive.com/libvir-list@redhat.com/msg08541.html and since then patches have been merged which clean up that part of the code.
In the meantime we got a relatively simple, sufficient for now, usable right now, patch fulfilling our needs. A working patch is better in my eyes than something which may work well in the future if we take the time to integrate it, stabilize it and propagate it to the systems we use.
The package available in Fedora 9 has not improved as far as I can tell.
Rawhide has a newer package, and I am working on packaging up v0.32 for rawhide now. Should be pushed out sometime soon.
So I'm still keeping the same point of view as posted on that same thread a month ago:
http://www.mail-archive.com/libvir-list@redhat.com/msg08472.html
"Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup,"
As maintainer I will also note that "nicer" also implies the ability to work well and smoothly with the other maintainers. I hate guerrilla tactics; I would have preferred that you read and replied to what I wrote.
So Dan Smith's patch should IMHO go in now. If later your APIs are widely distributed and cleaner than what I have now (0.1c may be old, but it is what is available to us on Fedora; no idea what is available on other distros), and there is a clean patch to switch, then we will look at it. Right now we can't use libcgroup, in my opinion.
If you would not mind, could you take a look at the latest snapshot available at http://sourceforge.net/projects/libcg , and let us know what is missing, we can implement it so that libvirt's needs are met. Thanks, -- regards, Dhaval

On Fri, Oct 3, 2008 at 9:52 PM, Daniel Veillard <veillard@redhat.com> wrote:
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us
I did point the general problem of ABI in libcgroup http://www.mail-archive.com/libvir-list@redhat.com/msg08388.html
I thought I responded to them at http://www.mail-archive.com/libvir-list@redhat.com/msg08512.html
I didn't see any reply to the points I raised specifically. In the meantime we got a relatively simple, sufficient for now, usable right now, patch fulfilling our needs. A working patch is better in my eyes than something which may work well in the future if we take the time to integrate it, stabilize it and propagate it to the systems we use.
The package available in Fedora 9 has not improved as far as I can tell. So I'm still keeping the same point of view as posted on that same thread a month ago:
http://www.mail-archive.com/libvir-list@redhat.com/msg08472.html
If I remember correctly, Dhaval has pushed version 0.31 into Fedora and we will soon push in version 0.32
"Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup,"
As maintainer I will also note that "nicer" also implies the ability to work well and smoothly with the other maintainers. I hate guerrilla tactics; I would have preferred that you read and replied to what I wrote.
So Dan Smith's patch should IMHO go in now. If later your APIs are widely distributed and cleaner than what I have now (0.1c may be old, but it is what is available to us on Fedora; no idea what is available on other distros), and there is a clean patch to switch, then we will look at it. Right now we can't use libcgroup, in my opinion.
Your approach is fine, but it is a very hands-off approach. I was hoping that you would be more proactive and fix things or help us fix them (Daniel P Berrange has been very helpful). I don't blame you, since everyone has limited bandwidth.
Balbir

On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
Hi, Everyone,
I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself.
I hope that libvirt will switch over (may be after your concerns are addressed and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides
libcgroups (libcg.sf.net) provides
1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file 2. An API to control and read cgroups information 3. Thread safety around API calls 4. Daemons to automatically classify a task based on a certain set of rules 5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy)
So from a functional point of view you are addressing essentially three use cases
1. System configuration for controllers
2. Automatic task classification
3. Application development API for creating groups
If each piece is correctly designed, the choice of implementation for each of these can be, and in some cases must be, totally independent.
Since the kernel restricts a single controller to being attached to only one cgroupsfs mount point, and once attached this cannot be changed, the choice of how / where to mount controllers must remain outside the scope of applications. If any application using cgroups were to specify mount points, it would be inflicting its own requirements on every user of cgroups. This implies that applications must be designed to work with whatever controller mount configuration the admin has configured, and not configure stuff themselves. So impl for point 1 (configuration) must, by necessity, be completely independent of impl for point 3 (application API).
Consider automatic task classification. The task classification engine must be able to cope with the fact that applications have some functional requirements on cgroups setup. Taking libvirt as an example, we have a specific need to apply some controllers over a group of processes forming a container. A task classification engine must not re-classify individual tasks within a container because that would conflict with the semantics required by libvirt. It is, however, free to re-classify the libvirtd daemon itself - whatever cgroup libvirtd is placed in, it will create the LXC cgroups below this point.
So if libvirt is designed correctly, it will work with whatever cgroup task classification engine might be running. Similarly if the task classification engine has been designed to co-operate with applications there is no problem running it alongside libvirt.
Thus the implementations of point 2 (task classification) and point 3 (application API) have no need to be formally tied together. Furthermore tying them together does not magically solve the problem that both applications & the cgroups task classification engine need to be intelligently designed to co-operate.
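As a rough illustration of the co-operation described above, a classification engine could read a task's current placement from /proc/<pid>/cgroup and leave alone anything that already sits below an application's own group. The sketch below assumes the three-field "hierarchy-id:controller-list:path" line layout of the cgroup v1 interface; the struct and function names are hypothetical and come from neither libvirt nor libcgroup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry from /proc/<pid>/cgroup: "hierarchy-id:controller-list:path". */
struct cgroup_entry {
    int hierarchy_id;
    char controllers[128];
    char path[256];
};

/* Parse a single line; returns 0 on success, -1 on malformed input. */
int parse_cgroup_line(const char *line, struct cgroup_entry *out)
{
    const char *first = strchr(line, ':');
    if (first == NULL)
        return -1;
    const char *second = strchr(first + 1, ':');
    if (second == NULL)
        return -1;

    size_t ctrl_len = (size_t)(second - first - 1);
    size_t path_len = strcspn(second + 1, "\n");   /* drop trailing newline */
    if (ctrl_len >= sizeof(out->controllers) || path_len >= sizeof(out->path))
        return -1;

    out->hierarchy_id = atoi(line);
    memcpy(out->controllers, first + 1, ctrl_len);
    out->controllers[ctrl_len] = '\0';
    memcpy(out->path, second + 1, path_len);
    out->path[path_len] = '\0';
    return 0;
}

/* Returns 1 if `path` is `prefix` itself or lies below it, else 0. */
int cgroup_path_is_under(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    if (strncmp(path, prefix, n) != 0)
        return 0;
    if (n > 0 && prefix[n - 1] == '/')
        return 1;
    return path[n] == '\0' || path[n] == '/';
}
```

An engine built this way would skip any task for which cgroup_path_is_under(entry.path, "/libvirt/lxc") returns 1, while remaining free to move libvirtd itself.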
While re-implementing might sound like a cool thing to do, here are the drawbacks
1. It leads to code duplication and reduces code reuse
This is important if the library code is providing significant value add to the application using it. As it stands, libcgroup is merely a direct interface to the cgroups filesystem providing weakly typed setters & getters - with the exception of looking at the mount table to find where a controller lives, this is not hard / complex code, so the benefits of re-use are not particularly high.
In such a scenario reducing code duplication is not in itself a benefit, since there are costs associated with using external libraries. It is more complicated to integrate 2 independent styles of API, particularly with different views on error reporting, memory management and varying expectations for the semantic models exposed.
There are a number of 'hard' questions wrt cgroups usage by applications, two of which are outlined above. Simply having all applications use a single API cannot magically solve any of these problems - no matter what API is used, application developers need to take care to design their usage of cgroups such that it 'plays nicely' with other applications.
2. It leads to confused users
The use of cgroups is an internal implementation detail for libvirt's LXC driver. In common with all libvirt drivers, the user has no need to know about the underlying impl details and these can & will change at will as we discover better ways to achieve things. As such it's irrelevant to a user how we configure the cgroups filesystem for libvirt - whether we do it directly, or via libcg, the end result is identical - a set of directories in the cgroups filesystem.
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt
Out of the 3 functional areas I outlined earlier on, the ones that can provide the most value to users of libvirt are a standardized means to configure cgroups mounts and have them active at boot time, and the task classification engine. Both of these are interesting & hard problems for which there is clear value in only having 1 global implementation. Neither of these has a strong dependency on the API that 3rd party applications use to talk to the cgroups filesystem - from a functional point of view they ought to 'just work' no matter what API the app uses, provided the application is designed to assume the presence of other applications using cgroups.
So I do see value in the work the cgroups project is producing, but the application development API is the least critical part of this - the task engine and configuration policy are the key value add that is useful to everyone. Since cgroups is an internal implementation detail for the libvirt LXC driver, we can change our impl at any time we like, should circumstances change & use of libcgroup.so be critically necessary.
Regards, Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
Hi, Everyone,
I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself.
I hope that libvirt will switch over (may be after your concerns are addressed and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides
libcgroups (libcg.sf.net) provides
1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file 2. An API to control and read cgroups information 3. Thread safety around API calls 4. Daemons to automatically classify a task based on a certain set of rules 5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy)
So from a functional point of view you are addressing essentially three use cases
1. System configuration for controllers 2. Automatic task classification 3. Application development API for creating groups
If each piece is correctly designed, the choice of implementation for each of these can be, and in some cases must be, totally independent.
Since the kernel restricts a single controller to being attached to only one cgroupsfs mount point, and once attached this cannot be changed, the choice of how / where to mount controllers must remain outside the scope of applications. If any application using cgroups were to specify mount points, it would be inflicting its own requirements on every user of cgroups. This implies that applications must be designed to work with whatever controller mount configuration the admin has configured, and not configure stuff themselves. So impl for point 1 (configuration) must, by necessity, be completely independent of impl for point 3 (application API).
Consider automatic task classification. The task classification engine must be able to cope with the fact that applications have some functional requirements on cgroups setup. Taking libvirt as an example, we have a specific need to apply some controllers over a group of processes forming a container. A task classification engine must not re-classify individual tasks within a container because that would conflict with the semantics required by libvirt. It is, however, free to re-classify the libvirtd daemon itself - whatever cgroup libvirtd is placed in, it will create the LXC cgroups below this point.
So if libvirt is designed correctly, it will work with whatever cgroup task classification engine might be running. Similarly if the task classification engine has been designed to co-operate with applications there is no problem running it alongside libvirt. Thus the implementations of point 2 (task classification) and point 3 (application API) have no need to be formally tied together. Furthermore tying them together does not magically solve the problem that both applications & the cgroups task classification engine need to be intelligently designed to co-operate.
Agreed!
While re-implementing might sound like a cool thing to do, here are the drawbacks
1. It leads to code duplication and reduces code reuse
This is important if the library code is providing significant value add to the application using it. As it stands, libcgroup is merely a direct interface to the cgroups filesystem providing weakly typed setters & getters - with the exception of looking at the mount table to find where a controller lives, this is not hard / complex code, so the benefits of re-use are not particularly high.
Please see my earlier email on layering of API.
In such a scenario reducing code duplication is not in itself a benefit, since there are costs associated with using external libraries. It is more complicated to integrate 2 independent styles of API, particularly with different views on error reporting, memory management and varying expectations for the semantic models exposed.
I disagree. I see a lot of code that does the same thing: look through /proc/mounts, then read and parse values to write and read. I see two APIs you've built on top of what libcgroup has (one for setting the memory limit and the other for devices). Please compare the patch sizes as well and you'll see what I mean.
There are a number of 'hard' questions wrt to cgroups usage by applications, two of which are outlined above. Simply having all applications use a single API cannot magically solve any of these problems - no matter what API is used application developers need to take care to design their usage of cgroups such that it 'plays nicely' with other applications.
Playing "nicely" is a definite requirement, but not using existing code or contributing to it if something is broken and re-implementing it, sounds a little extreme.
2. It leads to confused users
The use of cgroups is an internal implementation detail for libvirt's LXC driver. In common with all libvirt drivers, the user has no need to know about the underlying impl details and these can & will change at will as we discover better ways to achieve things. As such it's irrelevant to a user how we configure the cgroups filesystem for libvirt - whether we do it directly, or via libcg, the end result is identical - a set of directories in the cgroups filesystem.
Why do you want to invest in maintaining cgroups code? You'll see as time passes by that the effort spent in maintaining, enhancing the code will increase. For example, CPU shares are currently unsupported by containers and the cost of maintenance will continue to increase as newer controllers are developed.
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt
Out of the 3 functional areas I outlined earlier on, the ones that can provide the most value to users of libvirt are a standardized means to configure cgroups mounts and have them active at boot time, and the task classification engine. Both of these are interesting & hard problems for which there is clear value in only having 1 global implementation. Neither of these has a strong dependency on the API that 3rd party applications use to talk to the cgroups filesystem - from a functional point of view they ought to 'just work' no matter what API the app uses, provided the application is designed to assume the presence of other applications using cgroups.
So I do see value in the work the cgroups project is producing, but the application development API is the least critical part of this - the task engine and configuration policy are the key value add that is useful to everyone. Since cgroups is an internal implementation detail for the libvirt LXC driver, we can change our impl at any time we like, should circumstances change & use of libcgroup.so be critically necessary.
Sounds reasonable to me, I hope we do converge at some point sooner rather than later (well, time will tell :) )
-- Balbir

On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt
To expand on what I said in my other mail about providing value-add over the representation exposed by the kernel, here's some thoughts on the API exposed.
Consider the following high level use case of libvirt
- A set of groups, in a 3 level hierarchy <APPNAME>/<DRIVER>/<DOMAIN>
- Control the ACL for block/char devices
- Control memory limits
This translates into an underlying implementation in which I need to create 3 levels of cgroups in the filesystem, use the memory and device controllers, attach my PIDs at the 3rd level, and set values for attributes exposed by the controllers. Notice I'm not actually setting any config parms at the 1st & 2nd levels, but they do need to still exist to ensure namespace uniqueness amongst different applications using cgroups.
The current cgroups API provides APIs that directly map to individual actions wrt the kernel filesystem exposed. So as an application developer I have to explicitly create the 3 levels of hierarchy, tell it I want to use memory & device controllers, format config values into the syntax required for each attribute, and remember the attribute names.
  // Create the hierarchy <APPNAME>/<DRIVER>/<DOMAIN>
  c1 = cgroup_new_cgroup("libvirt");
  c2 = cgroup_new_cgroup_parent(c1, "lxc");
  c3 = cgroup_new_cgroup_parent(c2, domain.name);
  // Setup the controllers I want to use
  cgroup_add_controller(c3, "devices");
  cgroup_add_controller(c3, "memory");
  // Add my domain's PID to the cgroup
  cgroup_attach_task(c3, domain.pid);
  // Set the device ACL limits
  cgroup_set_value_string(c2, "devices.deny", "a");
  char buf[1024];
  sprintf(buf, "%c %d:%d", 'c', 1, 3);
  cgroup_set_value_string(c2, "devices.allow", buf);
  // Set memory limit
  cgroup_set_value_uint64(c2, "memory.limit_in_bytes", domain.memory * 1024);
This really isn't providing any semantically useful abstraction over the direct filesystem manipulation. Just a bunch of wrappers for mkdir(), mount() and read()/write() calls.
My application still has to know far too much information about the details of cgroups as exposed by the kernel.
I do not care that there is a concept of 'controllers' at all, I just want to set device ACLs and memory limits. I do not care what the attributes in the filesystem are called, again I just want to set device ACLs and memory limits. I do not care what the data format for them must be for device/memory settings. Memory settings could be stored in base-2, base-10 or base-16; I should not have to know this information.
With this style of API, the library provides no real value-add or compelling reason to use it.
What might a more useful API look like? At least from my point of view, I'd like to be able to say:
  // Tell it I want $PID placed in <APPNAME>/<DRIVER>/<DOMAIN>
  char *path[] = { "libvirt", "lxc", domain.name };
  cg = cgroup_new_path(path, domain.pid);
  // I want to deny all devices
  cgroup_deny_all_devices(cg);
  // Allow /dev/null - either by node/major/minor
  cgroup_allow_device_node(cg, 'c', 1, 3);
  // Or more conveniently just give it a node to copy info from
  cgroup_allow_device_node(cg, "/dev/null");
  // Set memory in KB
  cgroup_set_memory_limit_kb(cg, domain.memory);
Notice how with such a style of API, I don't need to know anything about the low level implementation details - I'm working entirely in terms of semantically meaningful concepts.
Now comes the hard bit - you have to figure out what semantic concepts you want to expose to applications. The example here would be suitable for libvirt, but not necessarily for other applications. Picking the right APIs is very much harder than just exposing the kernel capabilities directly as libcgroup.h does now, but the trade-off is that the resulting API would be much more useful and interesting to app developers.
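As one possible reading of the "give it a node to copy info from" idea, the helper below derives a device rule from an existing device node with stat(), so the caller never has to know major/minor numbers or the rule syntax. This is only a sketch: both function names are hypothetical, and the "<type> <major>:<minor> <access>" layout is the format the kernel's device controller accepts in devices.allow and devices.deny, with the access string (for example "rwm") supplied by the caller.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format one device rule as "<type> <major>:<minor> <access>". */
int format_device_rule(char *buf, size_t len, char type,
                       unsigned int dev_major, unsigned int dev_minor,
                       const char *access)
{
    int n = snprintf(buf, len, "%c %u:%u %s",
                     type, dev_major, dev_minor, access);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the rule by copying type/major/minor from an existing node. */
int device_rule_from_node(char *buf, size_t len,
                          const char *node, const char *access)
{
    struct stat st;
    char type;

    if (stat(node, &st) != 0)
        return -1;
    if (S_ISCHR(st.st_mode))
        type = 'c';
    else if (S_ISBLK(st.st_mode))
        type = 'b';
    else
        return -1;                    /* not a device node */

    return format_device_rule(buf, len, type,
                              major(st.st_rdev), minor(st.st_rdev), access);
}
```

Calling device_rule_from_node(buf, sizeof(buf), "/dev/null", "rwm") yields a string that can be written to devices.allow of the domain's group; how that write is performed is left to the surrounding code.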
Regards, Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Fri, Oct 03, 2008 at 07:13:58PM +0100, Daniel P. Berrange wrote:
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt
To expand on what I said in my other mail about providing value-add over the representation exposed by the kernel, here's some thoughts on the API exposed.
Consider the following high level use case of libvirt
- A set of groups, in a 3 level hierarchy <APPNAME>/<DRIVER>/<DOMAIN> - Control the ACL for block/char devices - Control memory limits
This translates into an underlying implementation in which I need to create 3 levels of cgroups in the filesystem, use the memory and device controllers, attach my PIDs at the 3rd level, and set values for attributes exposed by the controllers. Notice I'm not actually setting any config parms at the 1st & 2nd levels, but they do need to still exist to ensure namespace uniqueness amongst different applications using cgroups.
The current cgroups API provides APIs that directly map to individual actions wrt the kernel filesystem exposed. So as an application developer I have to explicitly create the 3 levels of hierarchy, tell it I want to use memory & device controllers, format config values into the syntax required for each attribute, and remember the attribute names.
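For reference, creating the three levels directly on the filesystem amounts to a short loop of mkdir() calls below an already-mounted controller directory. The sketch below is illustrative only: make_group_path is a hypothetical helper, and the root directory is assumed to be a mount point the admin has already set up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create <root>/<levels[0]>/<levels[1]>/... one directory at a time,
 * tolerating levels that already exist. On success the full path of
 * the deepest group is left in `path`.
 */
int make_group_path(const char *root, const char *const levels[],
                    size_t nlevels, char *path, size_t pathlen)
{
    int n = snprintf(path, pathlen, "%s", root);
    if (n < 0 || (size_t)n >= pathlen)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < nlevels; i++) {
        n = snprintf(path + used, pathlen - used, "/%s", levels[i]);
        if (n < 0 || (size_t)n >= pathlen - used)
            return -1;                /* path buffer too small */
        used += (size_t)n;
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}
```

With levels set to { "libvirt", "lxc", <domain name> }, the first two directories exist purely for namespacing, and only the last one needs tasks attached or attributes set.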
  // Create the hierarchy <APPNAME>/<DRIVER>/<DOMAIN>
  c1 = cgroup_new_cgroup("libvirt");
  c2 = cgroup_new_cgroup_parent(c1, "lxc");
  c3 = cgroup_new_cgroup_parent(c2, domain.name);
  // Setup the controllers I want to use
  cgroup_add_controller(c3, "devices");
  cgroup_add_controller(c3, "memory");
  // Add my domain's PID to the cgroup
  cgroup_attach_task(c3, domain.pid);
  // Set the device ACL limits
  cgroup_set_value_string(c2, "devices.deny", "a");
  char buf[1024];
  sprintf(buf, "%c %d:%d", 'c', 1, 3);
  cgroup_set_value_string(c2, "devices.allow", buf);
  // Set memory limit
  cgroup_set_value_uint64(c2, "memory.limit_in_bytes", domain.memory * 1024);
This really isn't providing any semantically useful abstraction over the direct filesystem manipulation. Just a bunch of wrappers for mkdir(), mount() and read()/write() calls. My application still has to know far too much information about the details of cgroups as exposed by the kernel.
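To show how thin that layer is, and what a semantic wrapper on top of it might look like, here is a small sketch. Both function names are hypothetical; the only detail taken from the thread is the memory.limit_in_bytes attribute and the KB-to-bytes conversion.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write an unsigned value, in decimal, to <group_dir>/<attr>. */
int write_group_u64(const char *group_dir, const char *attr, uint64_t value)
{
    char file[4096];
    int n = snprintf(file, sizeof(file), "%s/%s", group_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(file))
        return -1;

    FILE *fp = fopen(file, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%" PRIu64, value) > 0;
    if (fclose(fp) != 0)              /* write errors may surface on close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Semantic wrapper: the caller thinks in KB, not attribute names. */
int set_memory_limit_kb(const char *group_dir, uint64_t kb)
{
    return write_group_u64(group_dir, "memory.limit_in_bytes", kb * 1024);
}
```

The first function is the kind of direct wrapper being criticised; the second hides the attribute name and unit, which is the direction the rest of this mail argues for.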
Good point! Let's see how we can improve upon this issue faced by applications.
I do not care that there is a concept of 'controllers' at all, I just want to set device ACLs and memory limits. I do not care what the attributes in the filesystem are called, again I just want to set device ACLs and memory limits. I do not care what the data format for them must be for device/memory settings. Memory settings could be stored in base-2, base-10 or base-16; I should not have to know this information.
With this style of API, the library provides no real value-add or compelling reason to use it.
What might a more useful API look like? At least from my point of view, I'd like to be able to say:
  // Tell it I want $PID placed in <APPNAME>/<DRIVER>/<DOMAIN>
  char *path[] = { "libvirt", "lxc", domain.name };
  cg = cgroup_new_path(path, domain.pid);
  // I want to deny all devices
  cgroup_deny_all_devices(cg);
  // Allow /dev/null - either by node/major/minor
  cgroup_allow_device_node(cg, 'c', 1, 3);
  // Or more conveniently just give it a node to copy info from
  cgroup_allow_device_node(cg, "/dev/null");
  // Set memory in KB
  cgroup_set_memory_limit_kb(cg, domain.memory);
Notice how with such a style of API, I don't need to know anything about the low level implementation details - I'm working entirely in terms of semantically meaningful concepts.
OK. This is something Balbir and I have been discussing: how to push libcgroup forward. I do have a patch which started looking at controller-specific stuff, but now that we are quite clear on what would be good, it's much clearer in what direction it should proceed (and that I should throw away what I wrote and design it in this fashion). I am on vacation for the next two weeks, but I shall look at pushing this forward very soon.
Now comes the hard bit - you have to figure out what semantic concepts you want to expose to applications. The example here would be suitable for libvirt, but not necessarily for other applications. Picking the right APIs is very much harder than just exposing the kernel capabilities directly as libcgroup.h does now, but the trade-off is that the resulting API would be much more useful and interesting to app developers.
I hope we can utilize your experience here to help us with libcgroup as well. thanks, -- regards, Dhaval

On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us
1. Fix any issues you see or point them to us
2. Add new API or request for new API that can help us integrate better with libvirt
To expand on what I said in my other mail about providing value-add over the representation exposed by the kernel, here's some thoughts on the API exposed.
Consider the following high level use case of libvirt
- A set of groups, in a 3 level hierarchy <APPNAME>/<DRIVER>/<DOMAIN>
- Control the ACL for block/char devices
- Control memory limits
This translates into an underlying implementation in which I need to create 3 levels of cgroups in the filesystem, use the memory and device controllers, attach my PIDs at the 3rd level, and set values for the attributes exposed by the controllers. Notice I'm not actually setting any config params at the 1st & 2nd levels, but they do still need to exist to ensure namespace uniqueness amongst different applications using cgroups.
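As a rough illustration of the "create 3 levels in the filesystem" step, here is a minimal sketch in plain C. The helper name `make_group_path` is hypothetical (it is not part of libcgroup or libvirt), and the root directory is passed in by the caller rather than assumed, since where the cgroup filesystem is mounted varies between systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only: create each level of
 * <root>/<levels[0]>/<levels[1]>/... in turn and leave the full path
 * in `out`. On a mounted cgroup filesystem each mkdir() creates a group.
 * Returns 0 on success, -1 on failure.
 */
static int make_group_path(const char *root, const char *const levels[],
                           size_t nlevels, char *out, size_t outlen)
{
    assert(root != NULL && out != NULL);

    int n = snprintf(out, outlen, "%s", root);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                      /* root does not fit in the buffer */
    size_t used = (size_t)n;

    for (size_t i = 0; i < nlevels; i++) {
        n = snprintf(out + used, outlen - used, "/%s", levels[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;                  /* path would be truncated */
        used += (size_t)n;

        /* An existing directory is fine: the upper levels are shared. */
        if (mkdir(out, 0755) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}
```

Calling it with the levels { "libvirt", "lxc", domain name } would give the <APPNAME>/<DRIVER>/<DOMAIN> layout; nothing is configured at the first two levels, they only provide the namespace.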
The current cgroups API provides APIs that map directly to individual actions on the filesystem exposed by the kernel. So as an application developer I have to explicitly create the 3 levels of hierarchy, tell it I want to use memory & device controllers, format config values into the syntax required for each attribute, and remember the attribute names.
// Create the hierarchy <APPNAME>/<DRIVER>/<DOMAIN>
c1 = cgroup_new_cgroup("libvirt");
c2 = cgroup_new_cgroup_parent(c1, "lxc");
c3 = cgroup_new_cgroup_parent(c2, domain.name);
// Setup the controllers I want to use
cgroup_add_controller(c3, "devices");
cgroup_add_controller(c3, "memory");
// Add my domain's PID to the cgroup
cgroup_attach_task(c3, domain.pid);
// Set the device ACL limits (on the 3rd level, where the PID lives)
cgroup_set_value_string(c3, "devices.deny", "a");
char buf[1024];
snprintf(buf, sizeof(buf), "%c %d:%d", 'c', 1, 3);
cgroup_set_value_string(c3, "devices.allow", buf);
// Set memory limit (domain.memory is in KB, the attribute is in bytes)
cgroup_set_value_uint64(c3, "memory.limit_in_bytes", domain.memory * 1024);
This really isn't providing any semantically useful abstraction over the direct filesystem manipulation. Just a bunch of wrappers for mkdir(), mount() and read()/write() calls. My application still has to know far too much information about the details of cgroups as exposed by the kernel.
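To make the "direct filesystem manipulation" concrete, the following sketch shows roughly what the caller ends up doing by hand. Both function names are hypothetical and the group directory is supplied by the caller; only the attribute name memory.limit_in_bytes comes from the example above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `value` to the file <group_dir>/<attr>. */
static int write_group_value(const char *group_dir, const char *attr,
                             const char *value)
{
    assert(group_dir != NULL && attr != NULL && value != NULL);

    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", group_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would be truncated */

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fputs(value, fp) >= 0;
    if (fclose(fp) != 0)                /* write errors can surface on close */
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * The caller has to know the attribute name, the unit (bytes) and the
 * text format (decimal), which is exactly the knowledge a higher-level
 * API would hide.
 */
static int write_memory_limit_kb(const char *group_dir, uint64_t kb)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%" PRIu64, kb * 1024);
    return write_group_value(group_dir, "memory.limit_in_bytes", buf);
}
```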
True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet? Let me try and answer that question
1. We've been trying to build configuration, classification and the low level plumbing
2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controllers plug in their logic and provide the abstractions you need, but we have not gotten there yet.
When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question to ask, then, is whether it is cheaper for you to build these abstractions into libvirt; had you either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ. What you've asked for, I consider as a layer on top of the API we have now and should be easy to build.
I do not care that there is a concept of 'controllers' at all, I just want to set device ACLs and memory limits. I do not care what the attributes in the filesystem are called, again I just want to set device ACLs and memory limits. I do not care what the data format for them must be for device/memory settings. Memory settings could be stored in base-2, base-10 or base-16; I should not have to know this information.
With this style of API, the library provides no real value-add or compelling reason to use it.
What might a more useful API look like? At least from my point of view, I'd like to be able to say:
// Tell it I want $PID placed in <APPNAME>/<DRIVER>/<DOMAIN>
char *path[] = { "libvirt", "lxc", domain.name };
cg = cgroup_new_path(path, domain.pid);
// I want to deny all devices
cgroup_deny_all_devices(cg);
// Allow /dev/null - either by node/major/minor
cgroup_allow_device_node(cg, 'c', 1, 3);
// Or more conveniently just give it a node to copy info from
cgroup_allow_device_node(cg, "/dev/null");
// Set memory in KB
cgroup_set_memory_limit_kb(cg, domain.memory);
Notice how with such a style of API, I don't need to know anything about the low level implementation details - I'm working entirely in terms of semantically meaningful concepts.
Yes, I agree this is definitely more usable and friendlier. These are not hard to implement today; in fact, implementing them would require only a few calls to the existing API, and they can be built as controller-specific layers (I think of them as plugins for each controller).
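One piece of such a controller-specific layer could look like the sketch below: given only a device node path, it derives the "<type> <major>:<minor>" string that the earlier example built by hand for devices.allow. The function name `format_device_rule` is hypothetical; it relies only on stat() and the glibc major()/minor() macros, and it does not call libcgroup at all.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up a device node and format it as
 * "<type> <major>:<minor>", e.g. for use as a devices.allow entry.
 * Returns 0 on success, -1 if the path is not a device node or the
 * buffer is too small.
 */
static int format_device_rule(const char *node, char *buf, size_t buflen)
{
    assert(node != NULL && buf != NULL);

    struct stat st;
    if (stat(node, &st) != 0)
        return -1;

    char type;
    if (S_ISCHR(st.st_mode))
        type = 'c';                     /* character device */
    else if (S_ISBLK(st.st_mode))
        type = 'b';                     /* block device */
    else
        return -1;                      /* not a device node */

    int n = snprintf(buf, buflen, "%c %u:%u", type,
                     major(st.st_rdev), minor(st.st_rdev));
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With a helper like this, the application passes "/dev/null" and never has to know the device numbers or the attribute syntax; the resulting string would then be handed to whatever low-level call sets the attribute.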
Now comes the hard bit - you have to figure out what semantic concepts you want to expose to applications. The example here would be suitable for libvirt, but not necessarily for other applications. Picking the right APIs is very much harder than just exposing the kernel capabilities directly as libcgroup.h does now, but the trade-off is that the resulting API would be much more useful and interesting to app developers.
I like what you've proposed very much and I am going to start building these abstractions and make them available in libcgroup. At some point, I hope you will find them useful enough so as to drop your abstractions (which I would hope you had directly built into libcgroup and used, so that more people would have benefited from it) and use them. Balbir

On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote:
On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet?
Let me try and answer that question
1. We've been trying to build configuration, classification and the low level plumbing
2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controllers plug in their logic and provide the abstractions you need, but we have not gotten there yet.
When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question to ask, then, is whether it is cheaper for you to build these abstractions into libvirt; had you either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ.
The thing I didn't mention is that until Dan posted his current patches actually implementing the cgroups stuff in the LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs without having an app actually using them, it's all too easy to design something that turns out to not be useful.
So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lock down a formally supported API immediately. Once things settle down, it'll be easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Sat, Oct 4, 2008 at 1:17 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote:
On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet?
Let me try and answer that question
1. We've been trying to build configuration, classification and the low level plumbing
2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controllers plug in their logic and provide the abstractions you need, but we have not gotten there yet.
When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question to ask, then, is whether it is cheaper for you to build these abstractions into libvirt; had you either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ.
The thing I didn't mention is that until Dan posted his current patches actually implementing the cgroups stuff in the LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs without having an app actually using them, it's all too easy to design something that turns out to not be useful.
So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lock down a formally supported API immediately. Once things settle down, it'll be easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term.
Please remember my words: "if you ever find that you have a code base that looks like what we have in libcgroups, please remember to switch over to libcgroup". I fear that you will reach that stage. The code that is going in right now has too many things hard-coded and will need a lot of changes going forward: adding support for new controllers is not going to be straightforward, your assumption that only root can create a container might be broken, we'll build support for hierarchies, which will require further changes, etc. I am not trying to scare you, just trying to make sure we don't solve the same problems twice. Balbir

The thing I didn't mention is that until Dan posted his current patches actually implementing the cgroups stuff in the LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs without having an app actually using them, it's all too easy to design something that turns out to not be useful.
So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lock down a formally supported API immediately. Once things settle down, it'll be easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term.
Agreed, the libvirt changes for cgroups have shown us a useful layer to build. We'll keep on top of it and try and build something that everyone can use. Balbir
participants (4): Balbir Singh, Daniel P. Berrange, Daniel Veillard, Dhaval Giani