[libvirt] [PATCH 0/2] Resolve issues seen with schedinfo

These patches resolve an issue seen using 'virsh schedinfo <domain>' on a non-running domain that has been present since 1.0.4 as a result of the cgroup infrastructure changes:

https://www.redhat.com/archives/libvir-list/2013-April/msg00783.html

The exact commit id that caused the issue is listed in each of the commit messages. I used git bisect to determine it, although that was tricky because the TPM changes were made around the same time and commit '8b934a5c' had to be applied in order to actually see domains on my host.

Prior to the changes, the "CFS Bandwidth" data provided by qemuGetCpuBWStatus() and lxcGetCpuBWStatus() would be obtained as a result of the driver cgroup being mounted. Now it relies on the domain cgroup being mounted, which only occurs once the domain is active.

This issue also affects the libvirt-cim code in how it defines QEMU domains. Fortunately it only looks for the "cpu_shares" value.

A "downside" to the end result of these changes is that for a non-running domain it becomes impossible to obtain the vcpu_period, vcpu_quota, emulator_period, and emulator_quota values. All that can be obtained is the "cpu_shares" value. As an option, I did consider adding the following to either the [qemu|lxc]DomainGetSchedulerType() and DomainGetSchedulerParameters[Flags]() APIs, or just the [qemu|lxc]GetCpuBWStatus() APIs, in order to get the "host" values if they existed (since they are eventually copied into the domain cgroup):

    if (cgroup == NULL) {
        rc = virCgroupNewSelf(&hostgrp);
        if (rc < 0) {
            virReportSystemError(-rc, "%s",
                                 _("Unable to get schedinfo host cgroup"));
            goto cleanup;
        }
        cgroup = hostgrp;
    }

However, I wasn't sure going that route was desired and figured I'd use the code review opportunity to get the answer. Furthermore, it wasn't clear that the vcpu_* and emulator_* values made sense for a non-running domain. Also, the virReportSystemError may need to change to a VIR_INFO, since I believe it would be triggered if cgroups weren't mounted on the system.
Another option would be to just add the above code to the GetType() APIs and then ignore the 'cpu_bw_status' for the VIR_DOMAIN_AFFECT_CONFIG path in the ParametersFlags() APIs, thus returning all 5 datapoints set to whatever the persistentDef had defined.

Since this has been present since 1.0.4 and no one has complained so far, I don't think it's critical for 1.0.6. I suspect the change would need to go into the maint trees though. It might be nice to get DanB's opinion on this too.

John Ferlan (2):
  qemu: Resolve issue with virsh schedinfo for non running domain
  lxc: Resolve issue with virsh schedinfo for non running domain

 src/lxc/lxc_driver.c   |  9 ++++++++-
 src/qemu/qemu_driver.c | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

-- 
1.8.1.4

Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:

  Scheduler      : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

  Scheduler      : posix
  cpu_shares     : 0
  vcpu_period    : 0
  vcpu_quota     : 0
  emulator_period: 0
  emulator_quota : 0

This change will result in the following:

  Scheduler      : posix
  cpu_shares     : 0

The code sequence is a virDomainGetSchedulerType() call, which returns the '*nparams' needed for a subsequent virDomainGetSchedulerParameters[Flags]() call. The latter requires at least 1 'nparams' to be provided/returned.
---
 src/qemu/qemu_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 4a76f14..7292149 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6899,12 +6899,20 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;
 
+    /* Domain not running or no cgroup */
+    if (!priv->cgroup) {
+        if (nparams)
+            *nparams = 1;
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
         goto cleanup;
     }
+
     if (nparams) {
         rc = qemuGetCpuBWStatus(priv->cgroup);
         if (rc < 0)
@@ -6915,11 +6923,12 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
         *nparams = 5;
     }
 
+cleanup:
     ignore_value(VIR_STRDUP(ret, "posix"));
 
-cleanup:
     if (vm)
         virObjectUnlock(vm);
+
     return ret;
 }
-- 
1.8.1.4

On 05/30/2013 02:24 PM, John Ferlan wrote:
Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:
Scheduler : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
This change will result in the following:
Scheduler      : posix
cpu_shares     : 0
The code sequence is a virDomainGetSchedulerType() call, which returns the '*nparams' needed for a subsequent virDomainGetSchedulerParameters[Flags]() call. The latter requires at least 1 'nparams' to be provided/returned.
I can't find where nparams is required to be >= 1, but that might have changed since you posted this patch. I changed the '*nparams = 1' to '0' and it works perfectly (returns only the scheduler type, so it's visible that nothing is set).
---
 src/qemu/qemu_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 4a76f14..7292149 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6899,12 +6899,20 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;
 
+    /* Domain not running or no cgroup */
+    if (!priv->cgroup) {
+        if (nparams)
+            *nparams = 1;
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
         goto cleanup;
     }
+
     if (nparams) {
         rc = qemuGetCpuBWStatus(priv->cgroup);
         if (rc < 0)
@@ -6915,11 +6923,12 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
         *nparams = 5;
     }
 
+cleanup:
     ignore_value(VIR_STRDUP(ret, "posix"));
You can get here even when the domain doesn't exist, and in that case NULL should be returned to indicate an error. It would also skip error reporting on a problem in qemuGetCpuBWStatus(), but since none is printed there anyway, at least we would return the scheduler type.
-cleanup:
     if (vm)
         virObjectUnlock(vm);
+
     return ret;
 }
This patch fixes the problem, but may create a new one. Also 'virsh schedinfo --config <domain>' is different for running and shutoff domains, but I'm willing to overlook that since it is less problematic than what happened before.

Martin

On 06/07/2013 06:14 AM, Martin Kletzander wrote:
On 05/30/2013 02:24 PM, John Ferlan wrote:
Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:
Scheduler : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
This change will result in the following:
Scheduler      : posix
cpu_shares     : 0
The code sequence is a virDomainGetSchedulerType() call, which returns the '*nparams' needed for a subsequent virDomainGetSchedulerParameters[Flags]() call. The latter requires at least 1 'nparams' to be provided/returned.
I can't find where nparams is required to be >= 1, but that might have changed since you posted this patch. I changed the '*nparams = 1' to '0' and it works perfectly (returns only the scheduler type, so it's visible that nothing is set).
The documentation for virDomainGetSchedulerParameters[Flags] has:

 * Get all scheduler parameters. On input, @nparams gives the size of the
 * @params array; on output, @nparams gives how many slots were filled
 * with parameter information, which might be less but will not exceed
 * the input value. @nparams cannot be 0.

Also, virDomainGetSchedulerParameters checks nparams > 0:

    virCheckPositiveArgGoto(*nparams, error);

The reason why virsh schedinfo printed just the scheduler type is because virsh-domain.c checks 'nparams' before calling the virDomainGetSchedulerParametersFlags() API.
---
 src/qemu/qemu_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 4a76f14..7292149 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6899,12 +6899,20 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;
 
+    /* Domain not running or no cgroup */
+    if (!priv->cgroup) {
+        if (nparams)
+            *nparams = 1;
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
         goto cleanup;
     }
+
     if (nparams) {
         rc = qemuGetCpuBWStatus(priv->cgroup);
         if (rc < 0)
@@ -6915,11 +6923,12 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
         *nparams = 5;
     }
 
+cleanup:
     ignore_value(VIR_STRDUP(ret, "posix"));
You can get here even when the domain doesn't exist, and in that case NULL should be returned to indicate an error. It would also skip error reporting on a problem in qemuGetCpuBWStatus(), but since none is printed there anyway, at least we would return the scheduler type.
Oh yeah, right, duh. I was trying to cover the case where !priv->cgroup without copying that same line. Setting '*nparams = 1' was taken from the viewpoint of the return 0 case from the [lxc|qemu]GetCpuBWStatus() calls (since cgroups aren't started).

For a non-running guest, I assumed that none of the 5 values really meant anything, but I can see the logic that would get/set that count once prior to vm startup and then repeatedly use that value to call virDomainGetSchedulerParametersFlags() during the lifetime of whatever is monitoring the vm, so I understand why setting it to 5 is valid/desired. Once the guest is running, if there was an issue with the [lxc|qemu]GetCpuBWStatus() call, then only 'cpu_shares' would get returned due to checks in the GetSchedulerParametersFlags() APIs.

Thus, if I'm reading the responses thus far correctly, then when cgroups aren't enabled for the domain (e.g., not active):

  qemuDomainGetSchedulerType() would return "posix" and 5 for *nparams
  lxcDomainGetSchedulerType() would return "posix" and 3 for *nparams

e.g., for qemu:

    /* Domain not running or no cgroup */
    if (!priv->cgroup) {
        if (nparams)
            *nparams = 5;
        ignore_value(VIR_STRDUP(ret, "posix"));
        goto cleanup;
    }

(and moving the ignore_value(VIR_STRDUP(ret, "posix")); before cleanup:)

For the [lxc|qemu]DomainGetSchedulerType() APIs, the only time [lxc|qemu]GetCpuBWStatus() would be called is if the domain is active, e.g., we get past the virCgroupPathOfController() call.

The [qemu|lxc]DomainGetSchedulerParametersFlags() APIs need adjustments as well, because currently they make the same call to [lxc|qemu]GetCpuBWStatus() regardless of whether the domain is active and regardless of what 'flags' are set (current, live, config). For this code I was less sure how to handle what gets returned.
Each has a call such as (from qemu):

    if (*nparams > 1) {
        rc = qemuGetCpuBWStatus(priv->cgroup);
        if (rc < 0)
            goto cleanup;
        cpu_bw_status = !!rc;
    }

which sets 'cpu_bw_status' based on the rc, which is set as follows:

 * Return 1 when CFS bandwidth is supported, 0 when CFS bandwidth is not
 * supported, -1 on error.

For the non-active case, this returns 0 and thus clears the cpu_bw_status flag, which affects the code deciding how many configuration parameters to return:

    if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
        shares = persistentDef->cputune.shares;
        if (*nparams > 1 && cpu_bw_status) {
            ...

Again, given how I'm reading the responses, the check here for cpu_bw_status could then be removed, since if the vm is not active or the "--config" flag is used, we return all 5 parameters regardless of whether the vm is running and regardless of whether the cgroup is found. Another option would be to add a check just prior that would set cpu_bw_status = true when we don't have a cgroup and we're just getting config data, such as:

    if (!priv->cgroup && (flags & VIR_DOMAIN_AFFECT_CONFIG))
        cpu_bw_status = true;

One way or another, 'cpu_bw_status' needs to be set to true before the "goto out" so that all the data points are copied into the output "params[]" array. For "current" or "live" data of a running vm with cgroups mounted, subsequent checks will return an error, 1, 3, or 5 elements depending on success or failure of live checks.

Obviously, I'll post a v2, but I would prefer to get it "more right" without too much more "churn".

Tks,
John
-cleanup:
     if (vm)
         virObjectUnlock(vm);
+
     return ret;
 }
This patch fixes the problem, but may create a new one. Also 'virsh schedinfo --config <domain>' is different for running and shutoff domain, but I'm willing to overlook that since it is less problematic than what happened before.
Martin

On Thu, May 30, 2013 at 08:24:59AM -0400, John Ferlan wrote:
Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:
Scheduler : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
This change will result in the following:
Scheduler      : posix
cpu_shares     : 0
Hmm, no, this isn't right. The entire approach of this method to changing the set of tunables reported, according to the state of cgroups, is broken by design.

We should always unconditionally report all 5 tunables, whether running or shutoff. Then when starting a guest, we should report an error if the admin has set non-default emulator_quota/emulator_period values and the corresponding cgroups aren't available. Likewise, when changing tunables on a running guest, we should report an error if the admin attempts to change the values of tunables that are not currently supported.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/07/2013 12:19 PM, Daniel P. Berrange wrote:
On Thu, May 30, 2013 at 08:24:59AM -0400, John Ferlan wrote:
Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:
Scheduler : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
This change will result in the following:
Scheduler      : posix
cpu_shares     : 0
Hmm, no this isn't right. The entire approach of this method to changing the set of tunables reported, according to the state of cgroups is broken by design.
We should always unconditionally report all 5 tunables whether running or shutoff.
As I mentioned in my reply, this is right. However, the only difference is that qemuDomainGetSchedulerParametersFlags() in QEMU driver doesn't honor VIR_DOMAIN_AFFECT_CURRENT.
Then when starting a guest, we should report an error if the admin has set non-default emulator_quota/emulator_period values and the corresponding cgroups aren't available.
This should be handled automatically already, if I'm not mistaken. Worth checking, though.
Likewise when changing tunables on a running guest, we should report an error if the admin attempts to change the values of tunables that are not currently supported.
And this is already fixed in commit v1.0.4-21-ge7cd284.

Martin

On Fri, Jun 07, 2013 at 01:55:36PM +0200, Martin Kletzander wrote:
On 06/07/2013 12:19 PM, Daniel P. Berrange wrote:
On Thu, May 30, 2013 at 08:24:59AM -0400, John Ferlan wrote:
Since commit '632f78ca' the 'virsh schedinfo <domain>' command returns:
Scheduler : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0
This change will result in the following:
Scheduler      : posix
cpu_shares     : 0
Hmm, no this isn't right. The entire approach of this method to changing the set of tunables reported, according to the state of cgroups is broken by design.
We should always unconditionally report all 5 tunables whether running or shutoff.
As I mentioned in my reply, this is right. However, the only difference is that qemuDomainGetSchedulerParametersFlags() in QEMU driver doesn't honor VIR_DOMAIN_AFFECT_CURRENT.
No, it isn't right. We must report all 5 tunables regardless of the run state of the VM.

Daniel

Since commit 'cfed9ad4' the 'virsh schedinfo <domain>' command returns:

  Scheduler      : Unknown error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

  Scheduler      : posix
  cpu_shares     : 0
  vcpu_period    : 0
  vcpu_quota     : 0

This change will result in the following:

  Scheduler      : posix
  cpu_shares     : 0

The code sequence is a virDomainGetSchedulerType() call, which returns the '*nparams' needed for a subsequent virDomainGetSchedulerParameters[Flags]() call. The latter requires at least 1 'nparams' to be provided/returned.
---
 src/lxc/lxc_driver.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index 3d6baf5..1991a31 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -1617,6 +1617,13 @@ static char *lxcDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;
 
+    /* Domain not running or no cgroup */
+    if (!priv->cgroup) {
+        if (nparams)
+            *nparams = 1;
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
@@ -1633,9 +1640,9 @@ static char *lxcDomainGetSchedulerType(virDomainPtr dom,
         *nparams = 3;
     }
 
+cleanup:
     ignore_value(VIR_STRDUP(ret, "posix"));
 
-cleanup:
     if (vm)
         virObjectUnlock(vm);
     lxcDriverUnlock(driver);
-- 
1.8.1.4

ping?

Although perhaps I should have noted in the title that it seems there's a regression, as a result of the cgroups changes, in the virDomainGetSchedulerType() and virDomainGetSchedulerParameters[Flags]() APIs.

John

On 05/30/2013 08:24 AM, John Ferlan wrote:
These patches resolve an issue seen using 'virsh schedinfo <domain>' on a non-running domain that has been present since 1.0.4 as a result of the cgroup infrastructure changes:
https://www.redhat.com/archives/libvir-list/2013-April/msg00783.html
The exact commit id that caused the issue is listed in each of the commit messages. I used git bisect to determine it, although that was tricky because the TPM changes were made around the same time and commit '8b934a5c' had to be applied in order to actually see domains on my host.
Prior to the changes the "CFS Bandwidth" data provided by qemuGetCpuBWStatus() and lxcGetCpuBWStatus() would be obtained as a result of the driver cgroup being mounted. Now it relies on the domain cgroup being mounted which only occurs once the domain is active.
This issue also affects the libvirt-cim code in how it defines QEMU domains. Fortunately it only looks for the "cpu_shares" value.
A "downside" to the end result of these changes is that for a non-running domain it becomes impossible to obtain the vcpu_period, vcpu_quota, emulator_period, and emulator_quota values. All that can be obtained is the "cpu_shares" value. As an option, I did consider adding the following to either the [qemu|lxc]DomainGetSchedulerType() and DomainGetSchedulerParameters[Flags]() APIs, or just the [qemu|lxc]GetCpuBWStatus() APIs, in order to get the "host" values if they existed (since they are eventually copied into the domain cgroup):
    if (cgroup == NULL) {
        rc = virCgroupNewSelf(&hostgrp);
        if (rc < 0) {
            virReportSystemError(-rc, "%s",
                                 _("Unable to get schedinfo host cgroup"));
            goto cleanup;
        }
        cgroup = hostgrp;
    }
However, I wasn't sure going that route was desired and figured that I'd use the code review opportunity to get the answer. Furthermore, it wasn't clear that the vcpu_* and emulator_* values made sense for a non running domain. Also, the virReportSystemError may need to change to a VIR_INFO since I believe it would be triggered if cgroups weren't mounted on the system.
Another option would be to just add the above code to the GetType() APIs and then ignore the 'cpu_bw_status' for the VIR_DOMAIN_AFFECT_CONFIG path in the ParametersFlags() APIs thus returning all 5 datapoints set to whatever the persistentDef had defined.
Since this has been present since 1.0.4 and no one has complained so far, I don't think it's critical for 1.0.6. I suspect the change would need to go into the maint trees though. It might be nice to get DanB's opinion on this too.
John Ferlan (2):
  qemu: Resolve issue with virsh schedinfo for non running domain
  lxc: Resolve issue with virsh schedinfo for non running domain
 src/lxc/lxc_driver.c   |  9 ++++++++-
 src/qemu/qemu_driver.c | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)
participants (3):
- Daniel P. Berrange
- John Ferlan
- Martin Kletzander