Re: [libvirt] [PATCH] vl.c: make sure maxcpus matches topology to prevent migration failure

On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
On Fri, 24 Aug 2018 08:11:48 -0300 Eduardo Habkost <ehabkost@redhat.com> wrote:
On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
On Thu, 23 Aug 2018 18:32:41 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
On 23/08/2018 16:51, Igor Mammedov wrote:
Topology (threads*cores*sockets) must match maxcpus to be valid, otherwise we could start QEMU with invalid topology that throws a error on migration destination side, that should not be reachable: Source: -smp 8,maxcpus=64,cores=1,threads=8,sockets=1 // hotplug cpus upto maxcpus Destination: -smp 64,maxcpus=64,cores=1,threads=8,sockets=1 qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64) This destination CLI aren't exactly correct as well since it should've been exactly the same -smp as on source + a bunch of -device cpufoo... so we can always say go fix your CLI so it won't trigger error.
The destination should have sockets=8, shouldn't it? either that or cores=8 or cores=4,sockets=2 ...
It seems to me that, at startup, you should have cpus = s*t*c and cpus <= maxcpus. Currently we check cpus <= s*t*c <= maxcpus, which doesn't make much sense. I think that s*t*c should describe topology of whole machine including not yet plugged vcpus. "cpus = s*t*c" probably won't work for partially filled package case: -smp 1,cores=1,threads=8,sockets=1 cores/threads should reflect full package configuration for guest to see an expected topology.
Oh, now I remember: that's the reason we don't enforce s*t*c == smp_cpus nor s*t*c == max_cpus.
Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2" worked since maxcpus was introduced, making the semantics of "sockets" unclear and hard to change without breaking existing configs. Should we go with deprication thingy then, so we could make it clear in the future?
Yes, but I'm not sure which option we should adopt (s*t*c == smp_cpus or s*t*c == max_cpus). Does anybody know what's the semantics expected by libvirt today? -- Eduardo

On Fri, Aug 24, 2018 at 10:53:32AM -0300, Eduardo Habkost wrote:
On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
On Fri, 24 Aug 2018 08:11:48 -0300 Eduardo Habkost <ehabkost@redhat.com> wrote:
On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
On Thu, 23 Aug 2018 18:32:41 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
On 23/08/2018 16:51, Igor Mammedov wrote:
Topology (threads*cores*sockets) must match maxcpus to be valid, otherwise we could start QEMU with invalid topology that throws a error on migration destination side, that should not be reachable: Source: -smp 8,maxcpus=64,cores=1,threads=8,sockets=1 // hotplug cpus upto maxcpus Destination: -smp 64,maxcpus=64,cores=1,threads=8,sockets=1 qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64) This destination CLI aren't exactly correct as well since it should've been exactly the same -smp as on source + a bunch of -device cpufoo... so we can always say go fix your CLI so it won't trigger error.
The destination should have sockets=8, shouldn't it? either that or cores=8 or cores=4,sockets=2 ...
It seems to me that, at startup, you should have cpus = s*t*c and cpus <= maxcpus. Currently we check cpus <= s*t*c <= maxcpus, which doesn't make much sense. I think that s*t*c should describe topology of whole machine including not yet plugged vcpus. "cpus = s*t*c" probably won't work for partially filled package case: -smp 1,cores=1,threads=8,sockets=1 cores/threads should reflect full package configuration for guest to see an expected topology.
Oh, now I remember: that's the reason we don't enforce s*t*c == smp_cpus nor s*t*c == max_cpus.
Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2" worked since maxcpus was introduced, making the semantics of "sockets" unclear and hard to change without breaking existing configs. Should we go with deprication thingy then, so we could make it clear in the future?
Yes, but I'm not sure which option we should adopt (s*t*c == smp_cpus or s*t*c == max_cpus).
Does anybody know what's the semantics expected by libvirt today?
Libvirt requires s*c*t to equal the total number of possible CPUs, *not* the currently plugged number. ie Valid: <vcpu placement='static' current='16'>32</vcpu> <cpu> <topology sockets='4' cores='4' threads='2'/> </cpu> Invalid: <vcpu placement='static' current='32'>64</vcpu> <cpu> <topology sockets='4' cores='4' threads='2'/> </cpu> Test with: $ virsh edit QEMUGuest1 error: unsupported configuration: CPU topology doesn't match maximum vcpu count Failed. Try again? [y,n,i,f,?]: Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, 24 Aug 2018 10:53:32 -0300 Eduardo Habkost <ehabkost@redhat.com> wrote:
On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
On Fri, 24 Aug 2018 08:11:48 -0300 Eduardo Habkost <ehabkost@redhat.com> wrote:
On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
On Thu, 23 Aug 2018 18:32:41 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
On 23/08/2018 16:51, Igor Mammedov wrote:
Topology (threads*cores*sockets) must match maxcpus to be valid, otherwise we could start QEMU with invalid topology that throws a error on migration destination side, that should not be reachable: Source: -smp 8,maxcpus=64,cores=1,threads=8,sockets=1 // hotplug cpus upto maxcpus Destination: -smp 64,maxcpus=64,cores=1,threads=8,sockets=1 qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64) This destination CLI aren't exactly correct as well since it should've been exactly the same -smp as on source + a bunch of -device cpufoo... so we can always say go fix your CLI so it won't trigger error.
The destination should have sockets=8, shouldn't it? either that or cores=8 or cores=4,sockets=2 ...
It seems to me that, at startup, you should have cpus = s*t*c and cpus <= maxcpus. Currently we check cpus <= s*t*c <= maxcpus, which doesn't make much sense. I think that s*t*c should describe topology of whole machine including not yet plugged vcpus. "cpus = s*t*c" probably won't work for partially filled package case: -smp 1,cores=1,threads=8,sockets=1 cores/threads should reflect full package configuration for guest to see an expected topology.
Oh, now I remember: that's the reason we don't enforce s*t*c == smp_cpus nor s*t*c == max_cpus.
Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2" worked since maxcpus was introduced, making the semantics of "sockets" unclear and hard to change without breaking existing configs. Should we go with deprication thingy then, so we could make it clear in the future?
Yes, but I'm not sure which option we should adopt (s*t*c == smp_cpus or s*t*c == max_cpus). s*t*c == smp_cpus is wrong as one won't be able to start QEMU with partial package and then hotplug the rest when needed. [...]
participants (3)
-
Daniel P. Berrangé
-
Eduardo Habkost
-
Igor Mammedov