[libvirt] [PATCH v2 0/2] Resolve issues seen with schedinfo

These patches resolve an issue seen when using 'virsh schedinfo <domain>' on a non-running domain. The issue has been present since 1.0.4 as a result of the cgroup infrastructure changes:

https://www.redhat.com/archives/libvir-list/2013-April/msg00783.html

The exact commit id that caused the issue is listed in each of the commit messages. I used git bisect to find it, although that was tricky because the TPM changes were made around the same time and required commit '8b934a5c' to be applied in order to actually see domains on my host.

Prior to those changes the "CFS Bandwidth" data could be obtained because the driver cgroup was mounted, whereas the above set mounts cgroups only when the domain is running.

With these patches, 'virsh schedinfo <domain>' for a non-running guest returns the configuration data for the default, --config, and --current options. The --live option reports a failure. For a running guest, default, --live, and --current report values from cgroup, while --config reports only the configuration values.

This issue also affects the libvirt-cim code in how it defines QEMU domains. Fortunately it only looks for the "cpu_shares" value.

Difference to v1:
 - In the [qemu|lxc]DomainGetSchedulerType() APIs, rather than check for priv->cgroup, check if the domain is running and return defaults if not.
 - In the [qemu|lxc]DomainGetSchedulerParametersFlags() APIs, if we're only returning configuration data, then don't gate the result returned on the availability of the CFS bandwidth cgroup data.

  qemu: Resolve issue with GetScheduler APIs for non running domain
  lxc: Resolve issue with GetScheduler APIs for non running domain

 src/lxc/lxc_driver.c   | 11 ++++++++++-
 src/qemu/qemu_driver.c | 11 ++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

--
1.8.1.4
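The option matrix described in the cover letter can be sketched as a small lookup. This is only an illustrative model of the described behaviour, not libvirt code: the enum and function names below (sched_flag, sched_source, schedinfo_source) are hypothetical and do not exist in libvirt or virsh.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; these are not libvirt identifiers. */
enum sched_flag { FLAG_DEFAULT, FLAG_CONFIG, FLAG_LIVE, FLAG_CURRENT };
enum sched_source { SRC_CONFIG, SRC_CGROUP, SRC_ERROR };

/*
 * Where the values shown by 'virsh schedinfo' come from, following the
 * behaviour described in the cover letter.
 */
enum sched_source
schedinfo_source(bool running, enum sched_flag flag)
{
    if (flag == FLAG_CONFIG)
        return SRC_CONFIG;   /* --config: always the configuration values */
    if (running)
        return SRC_CGROUP;   /* default, --live, --current on a running guest */
    if (flag == FLAG_LIVE)
        return SRC_ERROR;    /* --live on a non-running guest reports a failure */
    return SRC_CONFIG;       /* default and --current fall back to configuration */
}
```

For example, schedinfo_source(false, FLAG_CURRENT) yields SRC_CONFIG, while schedinfo_source(false, FLAG_LIVE) yields SRC_ERROR.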

As a consequence of the cgroup layout changes from commit '632f78ca', the qemuDomainGetSchedulerParameters[Flags]() and qemuGetSchedulerType() APIs failed to return data for a non running domain. This can be seen through a 'virsh schedinfo <domain>' command which returns:

Scheduler      : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0

This patch will restore the capability to return configuration only data for a non running domain regardless of whether cgroups are available.
---
 src/qemu/qemu_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 51952c9..d093b0f 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7058,6 +7058,14 @@ static char *qemuDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;

+    /* Domain not running, thus no cgroups - return defaults */
+    if (!virDomainObjIsActive(vm)) {
+        if (nparams)
+            *nparams = 5;
+        ignore_value(VIR_STRDUP(ret, "posix"));
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
@@ -8470,11 +8478,12 @@ qemuDomainGetSchedulerParametersFlags(virDomainPtr dom,

     if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
         shares = persistentDef->cputune.shares;
-        if (*nparams > 1 && cpu_bw_status) {
+        if (*nparams > 1) {
             period = persistentDef->cputune.period;
             quota = persistentDef->cputune.quota;
             emulator_period = persistentDef->cputune.emulator_period;
             emulator_quota = persistentDef->cputune.emulator_quota;
+            cpu_bw_status = true; /* Allow copy of data to params[] */
         }
         goto out;
     }
--
1.8.1.4
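The control flow that the first hunk adds to qemuDomainGetSchedulerType() can be modelled in isolation roughly as follows. This is a simplified sketch, not the driver code: struct domain_model, its fields, and get_scheduler_type_model() are hypothetical stand-ins, and the live_nparams field is an assumption used to represent whatever the unchanged live path would report.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QEMU_DEFAULT_NPARAMS 5  /* value set by the hunk for an inactive domain */

/* Hypothetical stand-in for the driver's domain object. */
struct domain_model {
    bool active;          /* what virDomainObjIsActive() would report */
    bool cpu_controller;  /* result of the CPU controller check */
    int live_nparams;     /* assumption: count reported by the live path */
};

/* Returns a heap-allocated scheduler name (caller frees), or NULL on error. */
char *
get_scheduler_type_model(const struct domain_model *dom, int *nparams)
{
    static const char name[] = "posix";
    char *ret;

    if (!dom->active) {
        /* Domain not running, thus no cgroups: report defaults. */
        if (nparams)
            *nparams = QEMU_DEFAULT_NPARAMS;
    } else if (!dom->cpu_controller) {
        fprintf(stderr, "cgroup CPU controller is not mounted\n");
        return NULL;
    } else if (nparams) {
        *nparams = dom->live_nparams;
    }

    ret = malloc(sizeof(name));
    if (ret)
        memcpy(ret, name, sizeof(name));
    return ret;
}
```

The point of the ordering is that the activity check runs before the controller check, so an inactive domain never reaches the "not mounted" error.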

On Mon, Jun 10, 2013 at 12:06:45PM -0400, John Ferlan wrote:
As a consequence of the cgroup layout changes from commit '632f78ca', the qemuDomainGetSchedulerParameters[Flags]() and qemuGetSchedulerType() APIs failed to return data for a non running domain. This can be seen through a 'virsh schedinfo <domain>' command which returns:

Scheduler      : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0

This patch will restore the capability to return configuration only data for a non running domain regardless of whether cgroups are available.
---
 src/qemu/qemu_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
ACK

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

As a consequence of the cgroup layout changes from commit 'cfed9ad4', the lxcDomainGetSchedulerParameters[Flags]() and lxcGetSchedulerType() APIs failed to return data for a non running domain. This can be seen through a 'virsh schedinfo <domain>' command which returns:

Scheduler      : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0

This patch will restore the capability to return configuration only data for a non running domain regardless of whether cgroups are available.
---
 src/lxc/lxc_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index 3d6baf5..4ab1736 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -1617,6 +1617,14 @@ static char *lxcDomainGetSchedulerType(virDomainPtr dom,
     }
     priv = vm->privateData;

+    /* Domain not running, thus no cgroups - return defaults */
+    if (!virDomainObjIsActive(vm)) {
+        if (nparams)
+            *nparams = 3;
+        ignore_value(VIR_STRDUP(ret, "posix"));
+        goto cleanup;
+    }
+
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("cgroup CPU controller is not mounted"));
@@ -1895,9 +1903,10 @@ lxcDomainGetSchedulerParametersFlags(virDomainPtr dom,

     if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
         shares = persistentDef->cputune.shares;
-        if (*nparams > 1 && cpu_bw_status) {
+        if (*nparams > 1) {
             period = persistentDef->cputune.period;
             quota = persistentDef->cputune.quota;
+            cpu_bw_status = true; /* Allow copy of data to params[] */
         }
         goto out;
     }
--
1.8.1.4
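The effect of the second hunk, dropping cpu_bw_status from the condition on the configuration path, can be illustrated with a small standalone model. This is a sketch under assumptions: the copy-out step after the 'out' label is not shown in the diff, so the version here is inferred from the "Allow copy of data to params[]" comment, and struct cputune_model and get_config_params_model() are hypothetical names. The gate_on_bw argument exists only to compare the pre-patch and post-patch conditions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the persistent cputune configuration. */
struct cputune_model {
    unsigned long long shares;
    unsigned long long period;
    long long quota;
};

/*
 * Copies configuration values into out[] (capacity *nparams, at least 1)
 * and stores the number of values written back into *nparams.
 * gate_on_bw == true models the pre-patch test (*nparams > 1 && cpu_bw_status);
 * gate_on_bw == false models the patched test (*nparams > 1).
 */
void
get_config_params_model(const struct cputune_model *def, bool cpu_bw_status,
                        bool gate_on_bw, long long out[], int *nparams)
{
    int saved = 0;
    unsigned long long period = 0;
    long long quota = 0;

    if (*nparams > 1 && (!gate_on_bw || cpu_bw_status)) {
        period = def->period;
        quota = def->quota;
        cpu_bw_status = true;   /* allow copy of data to out[] */
    }

    /* Assumed copy-out step: shares always, bandwidth values only if allowed. */
    out[saved++] = (long long)def->shares;
    if (cpu_bw_status) {
        if (*nparams > saved)
            out[saved++] = (long long)period;
        if (*nparams > saved)
            out[saved++] = quota;
    }
    *nparams = saved;
}
```

With cpu_bw_status false, as it would be for a domain with no cgroup, the pre-patch condition yields only the shares value, while the patched condition yields all three configured values when the caller provides room for them.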

On Mon, Jun 10, 2013 at 12:06:46PM -0400, John Ferlan wrote:
As a consequence of the cgroup layout changes from commit 'cfed9ad4', the lxcDomainGetSchedulerParameters[Flags]() and lxcGetSchedulerType() APIs failed to return data for a non running domain. This can be seen through a 'virsh schedinfo <domain>' command which returns:

Scheduler      : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0

This patch will restore the capability to return configuration only data for a non running domain regardless of whether cgroups are available.
---
 src/lxc/lxc_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
ACK

Daniel

On 10.06.2013 18:06, John Ferlan wrote:
These patches resolve an issue seen when using 'virsh schedinfo <domain>' on a non-running domain. The issue has been present since 1.0.4 as a result of the cgroup infrastructure changes:

https://www.redhat.com/archives/libvir-list/2013-April/msg00783.html

The exact commit id that caused the issue is listed in each of the commit messages. I used git bisect to find it, although that was tricky because the TPM changes were made around the same time and required commit '8b934a5c' to be applied in order to actually see domains on my host.

Prior to those changes the "CFS Bandwidth" data could be obtained because the driver cgroup was mounted, whereas the above set mounts cgroups only when the domain is running.

With these patches, 'virsh schedinfo <domain>' for a non-running guest returns the configuration data for the default, --config, and --current options. The --live option reports a failure. For a running guest, default, --live, and --current report values from cgroup, while --config reports only the configuration values.

This issue also affects the libvirt-cim code in how it defines QEMU domains. Fortunately it only looks for the "cpu_shares" value.

Difference to v1:
 - In the [qemu|lxc]DomainGetSchedulerType() APIs, rather than check for priv->cgroup, check if the domain is running and return defaults if not.
 - In the [qemu|lxc]DomainGetSchedulerParametersFlags() APIs, if we're only returning configuration data, then don't gate the result returned on the availability of the CFS bandwidth cgroup data.

  qemu: Resolve issue with GetScheduler APIs for non running domain
  lxc: Resolve issue with GetScheduler APIs for non running domain

 src/lxc/lxc_driver.c   | 11 ++++++++++-
 src/qemu/qemu_driver.c | 11 ++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)
ACK series.

Michal

On 06/10/2013 12:06 PM, John Ferlan wrote:
 src/lxc/lxc_driver.c   | 11 ++++++++++-
 src/qemu/qemu_driver.c | 11 ++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

These patches are now pushed. They also need to go into the 1.0.4-maint, 1.0.5-maint, and 1.0.6-maint branches.

John
participants (3)
- Daniel P. Berrange
- John Ferlan
- Michal Privoznik