[libvirt] [PATCH v2] Get the maxvcpus from both the qemuCaps and /dev/kvm

For some archs and machines, the maxvcpus defined in the kernel can differ from the QEMU capabilities maxvcpus. Just reporting the kernel-defined maxvcpus is not sufficient: virsh domcapabilities and virsh maxvcpus --type kvm return different maxvcpus values, making it unclear which one actually works. The minimum of the two values is what actually works.

For example, on PPC64 KVM_MAX_VCPUS is defined as 1024 in the kernel, whereas QEMU has MAX_CPUMASK_BITS defined as 255 in include/sysemu/sysemu.h. The guest can go up to 256 vcpus here.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
---
 src/qemu/qemu_driver.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 3d0c7c8..2716af8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1255,10 +1255,31 @@ static int qemuConnectIsAlive(virConnectPtr conn ATTRIBUTE_UNUSED)
 
 static int
-kvmGetMaxVCPUs(void)
+kvmGetMaxVCPUs(virConnectPtr conn)
 {
     int fd;
     int ret;
+    int qemuCapsMaxVcpus = 0;
+    virArch arch = virArchFromHost();
+    virQEMUCapsPtr qemuCaps = NULL;
+    virQEMUDriverPtr driver = conn->privateData;
+    const char *machine;
+
+    if (!(qemuCaps = virQEMUCapsCacheLookupByArch(driver->qemuCapsCache,
+                                                  arch))) {
+        virReportError(VIR_ERR_INVALID_ARG,
+                       _("unable to find any emulator to serve '%s' "
+                         "architecture"), virArchToString(arch));
+        return -1;
+    }
+
+    if (!(machine = virQEMUCapsGetDefaultMachine(qemuCaps))) {
+        virObjectUnref(qemuCaps);
+        return -1;
+    }
+
+    qemuCapsMaxVcpus = virQEMUCapsGetMachineMaxCpus(qemuCaps, machine);
+    virObjectUnref(qemuCaps);
 
     if ((fd = open(KVM_DEVICE, O_RDONLY)) < 0) {
         virReportSystemError(errno, _("Unable to open %s"), KVM_DEVICE);
@@ -1282,7 +1303,7 @@ kvmGetMaxVCPUs(void)
 
  cleanup:
     VIR_FORCE_CLOSE(fd);
-    return ret;
+    return ret > qemuCapsMaxVcpus ? qemuCapsMaxVcpus : ret;
 }
@@ -1323,7 +1344,7 @@ qemuConnectGetMaxVcpus(virConnectPtr conn ATTRIBUTE_UNUSED, const char *type)
         return 16;
 
     if (STRCASEEQ(type, "kvm"))
-        return kvmGetMaxVCPUs();
+        return kvmGetMaxVCPUs(conn);
 
     if (STRCASEEQ(type, "kqemu"))
         return 1;

On 05/04/2016 08:34 AM, Shivaprasad G Bhat wrote:
For some archs and machines, the maxvcpus defined in the kernel can differ from the QEMU capabilities maxvcpus. Just reporting the kernel-defined maxvcpus is not sufficient: virsh domcapabilities and virsh maxvcpus --type kvm return different maxvcpus values, making it unclear which one actually works. The minimum of the two values is what actually works.
For example, on PPC64 KVM_MAX_VCPUS is defined as 1024 in the kernel, whereas QEMU has MAX_CPUMASK_BITS defined as 255 in include/sysemu/sysemu.h. The guest can go up to 256 vcpus here.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
---
 src/qemu/qemu_driver.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
You'll also need to extend the virConnectGetMaxVcpus doc string in src/libvirt-host.c to indicate it may limit the reported CPUs based on host arch and default machine type. There's also virDomainGetMaxVcpus, which is just virConnectGetMaxVcpus(def->type); we should extend the way that is handled internally so it also abides by the domain's arch and machine type in this case. One comment below:
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 3d0c7c8..2716af8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1255,10 +1255,31 @@ static int qemuConnectIsAlive(virConnectPtr conn ATTRIBUTE_UNUSED)
 
 static int
-kvmGetMaxVCPUs(void)
+kvmGetMaxVCPUs(virConnectPtr conn)
 {
     int fd;
     int ret;
+    int qemuCapsMaxVcpus = 0;
+    virArch arch = virArchFromHost();
+    virQEMUCapsPtr qemuCaps = NULL;
+    virQEMUDriverPtr driver = conn->privateData;
+    const char *machine;
+
+    if (!(qemuCaps = virQEMUCapsCacheLookupByArch(driver->qemuCapsCache,
+                                                  arch))) {
+        virReportError(VIR_ERR_INVALID_ARG,
+                       _("unable to find any emulator to serve '%s' "
+                         "architecture"), virArchToString(arch));
+        return -1;
+    }
+
+    if (!(machine = virQEMUCapsGetDefaultMachine(qemuCaps))) {
+        virObjectUnref(qemuCaps);
+        return -1;
+    }
+
+    qemuCapsMaxVcpus = virQEMUCapsGetMachineMaxCpus(qemuCaps, machine);
+    virObjectUnref(qemuCaps);
 
     if ((fd = open(KVM_DEVICE, O_RDONLY)) < 0) {
         virReportSystemError(errno, _("Unable to open %s"), KVM_DEVICE);
@@ -1282,7 +1303,7 @@ kvmGetMaxVCPUs(void)
 
  cleanup:
     VIR_FORCE_CLOSE(fd);
-    return ret;
+    return ret > qemuCapsMaxVcpus ? qemuCapsMaxVcpus : ret;
 }
You can use MIN() here.

Thanks,
Cole

On Wed, May 04, 2016 at 10:12:24 -0400, Cole Robinson wrote:
On 05/04/2016 08:34 AM, Shivaprasad G Bhat wrote:
For some archs and machines, the maxvcpus defined in the kernel can differ from the QEMU capabilities maxvcpus. Just reporting the kernel-defined maxvcpus is not sufficient: virsh domcapabilities and virsh maxvcpus --type kvm return different maxvcpus values, making it unclear which one actually works. The minimum of the two values is what actually works.
For example, on PPC64 KVM_MAX_VCPUS is defined as 1024 in the kernel, whereas QEMU has MAX_CPUMASK_BITS defined as 255 in include/sysemu/sysemu.h. The guest can go up to 256 vcpus here.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
---
 src/qemu/qemu_driver.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
You'll also need to extend the virConnectGetMaxVcpus doc string in src/libvirt-host.c to indicate it may limit the reported CPUs based on host arch and default machine type.
Hmm, I think it is a very bad idea to limit the CPUs according to something which is hidden and cannot be influenced. In other words, the API itself is not able to report the maximum depending on emulator binary, machine type, and so on, so it shouldn't do that at all. Using the default emulator and machine type is even more confusing than reporting just the kernel limits.

The virConnectGetDomainCapabilities API seems like a much better place for reporting the maximum number of usable CPUs, as it can take kvm/qemu, arch, emulator binary, and machine type as optional parameters, so that the user gets the answer they want rather than a (in various ways) random number reported by virConnectGetMaxVcpus.

Jirka

On 05/04/2016 10:24 AM, Jiri Denemark wrote:
On Wed, May 04, 2016 at 10:12:24 -0400, Cole Robinson wrote:
On 05/04/2016 08:34 AM, Shivaprasad G Bhat wrote:
For some archs and machines, the maxvcpus defined in the kernel can differ from the QEMU capabilities maxvcpus. Just reporting the kernel-defined maxvcpus is not sufficient: virsh domcapabilities and virsh maxvcpus --type kvm return different maxvcpus values, making it unclear which one actually works. The minimum of the two values is what actually works.
For example, on PPC64 KVM_MAX_VCPUS is defined as 1024 in the kernel, whereas QEMU has MAX_CPUMASK_BITS defined as 255 in include/sysemu/sysemu.h. The guest can go up to 256 vcpus here.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
---
 src/qemu/qemu_driver.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
You'll also need to extend the virConnectGetMaxVcpus doc string in src/libvirt-host.c to indicate it may limit the reported CPUs based on host arch and default machine type.
Hmm, I think it is a very bad idea to limit the CPUs according to something which is hidden and cannot be influenced. In other words, the API itself is not able to report the maximum depending on emulator binary, machine type, and so on, so it shouldn't do that at all. Using the default emulator and machine type is even more confusing than reporting just the kernel limits.
Is it not confusing that MaxVCPUs will return a value that no VM can actually support, because it hits QEMU limits? That was the original complaint.

Maybe instead we do MIN(kvm_vcpus, max_vcpus_supported_by_any_qemu_machine).

Even if we don't change the logic, the API documentation could use some work to clarify the existing behavior.
The virConnectGetDomainCapabilities API seems like a much better place for reporting maximum number of usable CPUs as it can get kvm/qemu, arch, emulator binary, and machine type as optional parameters so that the user gets the answer they want rather than just (in various ways) random number reported by virConnectGetMaxVcpus.
I agree the proper place for reporting fully accurate info is in domcapabilities.

- Cole

On Wed, 2016-05-04 at 10:44 -0400, Cole Robinson wrote:
Hmm, I think it is a very bad idea to limit the CPUs according to something which is hidden and you cannot influence. In other words, the API itself is not able to report the maximum depending on emulator binary, machine type, whatever so it shouldn't do that at all. Using default emulator and machine type is even more confusing than reporting just the kernel limits.

Is it not confusing that MaxVCPUs will return a value that no VM can actually support, because it hits qemu limits? That was the original complaint. Maybe instead we do MIN(kvm_vcpus, max_vcpus_supported_by_any_qemu_machine). Even if we don't change the logic, the API documentation could use some work to clarify the existing behavior.

The virConnectGetDomainCapabilities API seems like a much better place for reporting maximum number of usable CPUs as it can get kvm/qemu, arch, emulator binary, and machine type as optional parameters so that the user gets the answer they want rather than just (in various ways) random number reported by virConnectGetMaxVcpus.

I agree the proper place for reporting fully accurate info is in domcapabilities.
How about something like

<vcpus max='255'>
  <kvm>
    <soft_limit>96</soft_limit>
    <hard_limit>2048</hard_limit>
  </kvm>
  <qemu>
    <hard_limit>255</hard_limit>
  </qemu>
</vcpus>

Hopefully someone can come up with better XML than that, but you get the idea :)

This would include one bit of information that AFAIK we're currently missing, which is the recommended number of vCPUs; the maximum value would of course be the smallest among all hard limits.

And of course we can't change virConnectGetMaxVcpus() because it's public API, but we can extend 'virsh maxvcpus' to accept the same options as 'virsh domcapabilities' and actually use the domcapabilities XML behind the scenes. That, and update the API documentation for virConnectGetMaxVcpus() to notify application developers that they should really use the domcapabilities XML instead.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On Wed, May 04, 2016 at 19:27:41 +0200, Andrea Bolognani wrote:
How about something like
<vcpus max='255'>
  <kvm>
    <soft_limit>96</soft_limit>
    <hard_limit>2048</hard_limit>
  </kvm>
  <qemu>
    <hard_limit>255</hard_limit>
  </qemu>
</vcpus>
No, domcapabilities are bound to an arch/binary/type/machine-type combination so providing both kvm and qemu stuff there is wrong. If it's a result of asking for kvm type, //vcpus@max should contain the kvm limit. If the type of the domain is qemu, it should contain qemu limit. And in both cases, the limit is constrained to just the machine type reported in the domcapabilities XML.
This would include one bit of information that AFAIK we're currently missing, which is the recommended number of vCPUs; the maximum value would of course be the smallest among all hard limits.
We could perhaps somehow incorporate the "recommended number of CPUs" in there, but that's a separate thing really.
And of course we can't change virConnectGetMaxVcpus() because it's public API, but we can extend 'virsh maxvcpus' to accept the same options as 'virsh domcapabilities' and actually use the domcapabilities XML behind the scenes.
I don't think this is a good idea.
That, and update the API documentation for virConnectGetMaxVcpus() to notify application developers that they should really use the domcapabilities XML instead.
But yeah, we should update the docs this way.

Jirka

On Thu, 2016-05-05 at 08:52 +0200, Jiri Denemark wrote:
On Wed, May 04, 2016 at 19:27:41 +0200, Andrea Bolognani wrote:

How about something like

<vcpus max='255'>
  <kvm>
    <soft_limit>96</soft_limit>
    <hard_limit>2048</hard_limit>
  </kvm>
  <qemu>
    <hard_limit>255</hard_limit>
  </qemu>
</vcpus>

No, domcapabilities are bound to an arch/binary/type/machine-type combination, so providing both kvm and qemu stuff there is wrong. If it's a result of asking for kvm type, //vcpus@max should contain the kvm limit. If the type of the domain is qemu, it should contain the qemu limit. And in both cases, the limit is constrained to just the machine type reported in the domcapabilities XML.
The fact is that the QEMU limit *does* influence the actual limit for KVM guests: see the example above, taken from a POWER host, where KVM would happily create 2048 vCPUs but QEMU is limited to 255.

I didn't say it explicitly, but it was implied that QEMU limits would only be present for QEMU and KVM guests, and KVM limits only for KVM guests.

I guess we could skip all the details and just give the user

<vcpus max='255' suggested='96'/>

so they don't have to implement the logic themselves. But reporting the KVM limits without taking QEMU limits into consideration is not the way to go, I think.
This would include one bit of information that AFAIK we're currently missing, which is the recommended number of vCPUs; the maximum value would of course be the smallest among all hard limits.

We could perhaps somehow incorporate the "recommended number of CPUs" in there, but that's a separate thing really.
It seems like valuable information and a good fit for the element, but if you have a better placement in mind please do share :)
And of course we can't change virConnectGetMaxVcpus() because it's public API, but we can extend 'virsh maxvcpus' to accept the same options as 'virsh domcapabilities' and actually use the domcapabilities XML behind the scenes.

I don't think this is a good idea.
Why not? virsh is supposed to be both a useful management tool and a showcase of our API... If we document that a function does not provide optimal data, and document the preferred approach instead, why shouldn't virsh be updated to reflect the changes we're recommending to external users, thus making it better both as a management tool and as an API showcase?

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On Thu, May 05, 2016 at 09:34:21 +0200, Andrea Bolognani wrote:
On Thu, 2016-05-05 at 08:52 +0200, Jiri Denemark wrote:
On Wed, May 04, 2016 at 19:27:41 +0200, Andrea Bolognani wrote:

How about something like

<vcpus max='255'>
  <kvm>
    <soft_limit>96</soft_limit>
    <hard_limit>2048</hard_limit>
  </kvm>
  <qemu>
    <hard_limit>255</hard_limit>
  </qemu>
</vcpus>

No, domcapabilities are bound to an arch/binary/type/machine-type combination, so providing both kvm and qemu stuff there is wrong. If it's a result of asking for kvm type, //vcpus@max should contain the kvm limit. If the type of the domain is qemu, it should contain the qemu limit. And in both cases, the limit is constrained to just the machine type reported in the domcapabilities XML.
The fact is that the QEMU limit *does* influence the actual limit for KVM guests: see the example above, taken from a POWER host, where KVM would happily create 2048 vCPUs but QEMU is limited to 255.
I didn't say it explicitly, but it was implied that QEMU limits would only be present for QEMU and KVM guests, and KVM limits only for KVM guests.
I guess we could skip all the details and just give the user
<vcpus max='255' suggested='96'/>
so they don't have to implement the logic themselves. But reporting the KVM limits without taking QEMU limits into consideration is not the way to go, I think.
Yes. I didn't want to say we should only report the number from KVM; I meant we should only show the number that is applicable to a KVM domain of the particular machine type. That is, both KVM and QEMU limits combined into a single number which gives the real limit.

Jirka
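Combined into a single per-machine-type number, a domcapabilities result along these lines would carry the effective limit. This is a sketch only: apart from the //vcpus@max path discussed in the thread, the surrounding element names and values are illustrative assumptions about the domcapabilities format, not confirmed output.

```
<domainCapabilities>
  <path>/usr/bin/qemu-system-ppc64</path>
  <domain>kvm</domain>
  <machine>pseries</machine>
  <arch>ppc64</arch>
  <!-- MIN(kernel KVM limit 1024, QEMU machine-type limit 255) -->
  <vcpus max='255'/>
</domainCapabilities>
```

Here a caller asking for type kvm would read a single number already constrained by both the kernel and the reported machine type, as Jiri describes.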

On Thu, 2016-05-05 at 10:00 +0200, Jiri Denemark wrote:
I guess we could skip all the details and just give the user

<vcpus max='255' suggested='96'/>

so they don't have to implement the logic themselves. But reporting the KVM limits without taking QEMU limits into consideration is not the way to go, I think.

Yes. I didn't want to say we should only report the number from KVM, I meant we should only show the number that is applicable to a KVM domain of the particular machine type. That is, both KVM and QEMU limits combined into a single number which gives the real limit.
Okay :)

But if we don't change 'virsh maxvcpus --type kvm' to report the computed limit, instead of just the KVM limit as it does now, the problem reported by Shivaprasad will still exist...

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On Thu, May 05, 2016 at 10:39:56 +0200, Andrea Bolognani wrote:
On Thu, 2016-05-05 at 10:00 +0200, Jiri Denemark wrote:
I guess we could skip all the details and just give the user

<vcpus max='255' suggested='96'/>

so they don't have to implement the logic themselves. But reporting the KVM limits without taking QEMU limits into consideration is not the way to go, I think.

Yes. I didn't want to say we should only report the number from KVM, I meant we should only show the number that is applicable to a KVM domain of the particular machine type. That is, both KVM and QEMU limits combined into a single number which gives the real limit.
Okay :)
But if we don't change 'virsh maxvcpus --type kvm' to report the computed limit, instead of just the KVM limit as it does now, the problem reported by Shivaprasad will still exist...
I think that should be fixed by documenting that the old API and virsh command report bad results and should not be used rather than trying to add syntax sugar that will pull the data from a different place.

On Thu, 2016-05-05 at 11:00 +0200, Peter Krempa wrote:
But if we don't change 'virsh maxvcpus --type kvm' to report the computed limit, instead of just the KVM limit as it does now, the problem reported by Shivaprasad will still exist...

I think that should be fixed by documenting that the old API and virsh command report bad results and should not be used, rather than trying to add syntax sugar that will pull the data from a different place.
I disagree. We can't extend virConnectGetMaxVcpus() in any sensible way because that would break the ABI, but we can do so for the maxvcpus command: --type is already the same as the domcapabilities command, we just have to add --emulatorbin, --arch and --machine and trivially change the implementation.

People using 'virsh maxvcpus' without passing any of the new options will get better data automatically, in a completely backward-compatible fashion, and we will have one less place where different APIs return slightly different results for no apparent reason (I'm looking at you, topology information).

While virsh commands generally map directly to the libvirt function of the same name, that's not necessarily a strict requirement IMHO - especially when we're asking application developers *not* to use that API anymore, and pointing them to a different solution. We should lead by example here.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

On Wed, May 04, 2016 at 10:44:37 -0400, Cole Robinson wrote:
On 05/04/2016 10:24 AM, Jiri Denemark wrote:
Hmm, I think it is a very bad idea to limit the CPUs according to something which is hidden and cannot be influenced. In other words, the API itself is not able to report the maximum depending on emulator binary, machine type, and so on, so it shouldn't do that at all. Using the default emulator and machine type is even more confusing than reporting just the kernel limits.
Is it not confusing that MaxVCPUs will return a value that no VM can actually support, because it hits qemu limits? That was the original complaint.
Maybe instead we do MIN(kvm_vcpus, max_vcpus_supported_by_any_qemu_machine)
Even if we don't change the logic, the API documentation could use some work to clarify the existing behavior
I think this is the only part that really needs changing. There's no way we can fix the API to report anything useful. The result will always be confusing in some way. I'd rather keep the confusion consistent and let the API report what it always reported, and explain that in the documentation with a pointer to the domain capabilities API.

Jirka
participants (5)
- Andrea Bolognani
- Cole Robinson
- Jiri Denemark
- Peter Krempa
- Shivaprasad G Bhat