[libvirt] [PATCH] util: Remove empty resource partition created by libvirt

The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists. The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed. Moreover, the code does not remove the user provided resource partitions. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- src/util/vircgroup.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 0599ba5..4dc0702 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -42,6 +42,7 @@ #define __VIR_CGROUP_ALLOW_INCLUDE_PRIV_H__ #include "vircgrouppriv.h" +#include "dirname.h" #include "virutil.h" #include "viralloc.h" #include "virerror.h" @@ -3311,6 +3312,59 @@ virCgroupRemoveRecursively(char *grppath) /** + * virCgroupRemoveEmptyParent: + * + * @group-path: The group path + * + * Cleanup the libvirt created partition directory if there are no + * child group existing. For the given @group-path, find the parent + * directory. For resource partition created by libvirt check + * existence of child cgroup. Remove the resource partition if no + * child cgroup exist. + * + * Returns: void + */ +static void +virCgroupRemoveEmptyParent(char *grppath) +{ + char *parent = NULL, *partition = NULL; + struct dirent *ent; + DIR *grpdir; + int direrr; + int is_empty = 1; + + if (!(parent = mdir_name(grppath))) + goto cleanup; + + partition = strrchr(parent, '/'); + if (STRNEQ(partition, "/machine") && STRNEQ(partition, "/machine.slice")) + goto cleanup; + + grpdir = opendir(parent); + if (grpdir == NULL) + goto cleanup; + + while ((direrr = virDirRead(grpdir, &ent, NULL)) > 0) { + if (ent->d_name[0] == '.') continue; + if (ent->d_type == DT_DIR) { + is_empty = 0; + break; + } + } + closedir(grpdir); + + if (is_empty) { + VIR_DEBUG("Removing empty parent cgroup %s", parent); + if (rmdir(parent) != 0) + VIR_ERROR(_("Unable to remove %s (%d)"), parent, errno); + } + + cleanup: + VIR_FREE(parent); + return; +} + +/** * virCgroupRemove: * * @group: The group to be removed @@ -3352,6 +3406,7 @@ virCgroupRemove(virCgroupPtr group) VIR_DEBUG("Removing cgroup %s and all child cgroups", grppath); rc = virCgroupRemoveRecursively(grppath); + virCgroupRemoveEmptyParent(grppath); VIR_FREE(grppath); } VIR_DEBUG("Done removing cgroup %s", group->path); -- 2.4.3

On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Hi Daniel, "Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here: 1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them 2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ? Currently, my patch does remove this when its found empty 3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon Regards, Nikunj

On 12.08.2015 07:39, Nikunj A Dadhania wrote:
Hi Daniel,
"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here:
1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them
2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ?
No, machined should clean that up based on signalization sent by libvirt when it is started up again. I guess the scenario is as follows: 1) libvirt is starting a container 2) as part of the process, a remote procedure is called (via dbus) on machined to precreate the machine.slice 3) the container is started 4) libvirtd.service is stopped 5) container is stopped Now you have dangling machine.slice. But this in fact is correct, because libvirt needs to clean up its runtime metadata too. Therefore the process should go on like this: 6) libvirtd.service is started again 7) libvirt notices that the container has stopped 8) as part of cleanup process it instructs machined to remove the machine.slice
Currently, my patch does remove this when its found empty
3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon
This one can make sense. Michal

Michal Privoznik <mprivozn@redhat.com> writes:
On 12.08.2015 07:39, Nikunj A Dadhania wrote:
Hi Daniel,
"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here:
1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them
2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ?
No, machined should clean that up based on signalization sent by libvirt when it is started up again. I guess the scenario is as follows: 1) libvirt is starting a container 2) as part of the process, a remote procedure is called (via dbus) on machined to precreate the machine.slice 3) the container is started 4) libvirtd.service is stopped 5) container is stopped
In my case I was looking for: 3) the container is started 4) the container is stopped 5) libvirtd.service is stopped Who would remove machine.slice in this case? Basically, there are no containers running currently.
Now you have dangling machine.slice. But this in fact is correct, because libvirt needs to clean up its runtime metadata too. Therefore the process should go on like this:
6) libvirtd.service is started again 7) libvirt notices that the container has stopped 8) as part of cleanup process it instructs machined to remove the machine.slice
Are you suggesting that current libvirtd along with machined is doing the above 3 steps? Or we would need to enable that?
Currently, my patch does remove this when its found empty
3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon
This one can make sense.
Regards, Nikunj

On Wed, Aug 12, 2015 at 11:09:12AM +0530, Nikunj A Dadhania wrote:
Hi Daniel,
"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here:
1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them
2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ?
Currently, my patch does remove this when its found empty
It isn't libvirtd's job to delete /machine.slice - systemd will periodically prune empty slices itself.
3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon
No, you can't assume /machine is created by libvirtd - it could have been created by the user, just like case 3. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

"Daniel P. Berrange" <berrange@redhat.com> writes:
On Wed, Aug 12, 2015 at 11:09:12AM +0530, Nikunj A Dadhania wrote:
Hi Daniel,
"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here:
1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them
2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ?
Currently, my patch does remove this when its found empty
It isn't libvirtd's job to delete /machine.slice - systemd will periodically prune empty slices itself.
Let me check that, did not see this happening.
3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon
No, you can't assume /machine is created by libvirtd - it could have been created by the user, just like case 3.
Did you mean case 1 here? Regards, Nikunj

"Daniel P. Berrange" <berrange@redhat.com> writes:
On Wed, Aug 12, 2015 at 11:09:12AM +0530, Nikunj A Dadhania wrote:
Hi Daniel,
"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Aug 11, 2015 at 04:57:15PM +0530, Nikunj A Dadhania wrote:
The default resource partition is created in the domain start path if it is not existing. Even when libvirtd is stopped after shutting down all domains, the resource partition still exists.
The patch adds code to removes the default resource partition in the cgroup removal path of the domain. If the default resource partition is found to have no child cgroup, the default resource partition will be removed.
Moreover, the code does not remove the user provided resource partitions.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
I don't think we want to be doing this. In non-systemd hosts this will be deleting the heirarchy that the sysadmin manually pre-created for their VMs. In a systemd host it will also end up deleting slices that were created by systemd.
AFAIU, there are three cases here:
1) User created resource partition, for example /production/foo As this is created by user, we should not touch them. And my patch does not remove them
2) systemd created /machine.slice If not libvirt, should systemd clean this up when the libvirtd service is stopped ?
Currently, my patch does remove this when its found empty
It isn't libvirtd's job to delete /machine.slice - systemd will periodically prune empty slices itself.
Before runnning machine.slice . ├── blkio │ ├── system.slice │ └── user.slice ├── cpu -> cpu,cpuacct ├── cpuacct -> cpu,cpuacct ├── cpu,cpuacct │ ├── system.slice │ └── user.slice ├── cpuset ├── devices │ ├── system.slice │ └── user.slice ├── freezer ├── hugetlb ├── memory │ ├── system.slice │ └── user.slice ├── net_cls -> net_cls,net_prio ├── net_cls,net_prio ├── net_prio -> net_cls,net_prio ├── perf_event └── systemd ├── system.slice └── user.slice After starting systemd-machined: . ├── blkio │ ├── machine.slice │ ├── system.slice │ └── user.slice ├── cpu -> cpu,cpuacct ├── cpuacct -> cpu,cpuacct ├── cpu,cpuacct │ ├── machine.slice │ ├── system.slice │ └── user.slice ├── cpuset ├── devices │ ├── machine.slice │ ├── system.slice │ └── user.slice ├── freezer ├── hugetlb ├── memory │ ├── machine.slice │ ├── system.slice │ └── user.slice ├── net_cls -> net_cls,net_prio ├── net_cls,net_prio ├── net_prio -> net_cls,net_prio ├── perf_event └── systemd ├── machine.slice ├── system.slice └── user.slice systemd leaves out cpuset/freezer. So these are basically created by libvirt and not by systemd. When I went through systemd code, it seems cpuset/freezer is disabled on purpose. systemd/src/core/main.c static int initialize_join_controllers(void) { /* By default, mount "cpu" + "cpuacct" together, and "net_cls" * + "net_prio". We'd like to add "cpuset" to the mix, but * "cpuset" doesn't really work for groups with no initialized * attributes. */ Some more details here: http://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/ Note that the number of cgroup attributes currently exposed as unit properties is limited. This will be extended later on, as their kernel interfaces are cleaned up. For example cpuset or freezer are currently not exposed at all due to the broken inheritance semantics of the kernel logic.
3) libvirt created /machine As this was created manually by libvirt, should we delete it here in libvirt daemon
No, you can't assume /machine is created by libvirtd - it could have been created by the user, just like case 3.
I was thinking of an idea to create a stub directory "libvirt-owned" in the machine/machine.slice if libvirt created this directory. So while on the exit path if we find the directory empty and is owned by libvirt, we can clean that up. Regards, Nikunj
participants (3)
-
Daniel P. Berrange
-
Michal Privoznik
-
Nikunj A Dadhania