Re: [libvirt] [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size

CCing libvir-list.

On Wed, Dec 04, 2013 at 03:58:50PM +0800, Wanlong Gao wrote:
> If the total memory assigned to the NUMA nodes is not equal to the assigned RAM size, QEMU will write wrong data to the ACPI table, and the guest will then ignore that table and put all memory in a single node. This is a bug; we should check the sizes to ensure that we write the right data to the ACPI table.
>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

This will make configurations that could have been running for years (except that the guest OS was ignoring the NUMA data) suddenly stop running. I just want to confirm: we really want that, right? Does libvirt allow this kind of broken configuration to be generated, or does it already ensure that the total NUMA node sizes match the RAM size?

> ---
>  numa.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/numa.c b/numa.c
> index ce7736a..beda80e 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -150,6 +150,16 @@ void set_numa_nodes(void)
>              node_mem[i] = ram_size - usedmem;
>          }
>
> +        uint64_t numa_total = 0;
> +        for (i = 0; i < nb_numa_nodes; i++) {
> +            numa_total += node_mem[i];
> +        }
> +        if (numa_total != ram_size) {
> +            fprintf(stderr, "qemu: numa nodes total memory size "
> +                            "should equal to ram_size\n");
> +            exit(1);
> +        }
> +
>          for (i = 0; i < nb_numa_nodes; i++) {
>              if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
>                  break;
> --
> 1.8.5

-- Eduardo
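
For experimenting with the check outside QEMU, here is a minimal, self-contained sketch of the same logic. The function name `numa_total_matches_ram` and its signature are hypothetical, not QEMU API. Unlike the patch, it returns a result instead of calling exit(1), and its message (our own wording) prints both totals, which speaks to the concern about cryptic failures raised later in this thread.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring the check added by the patch: sum the
 * per-node sizes (bytes) and compare the total with the guest RAM size.
 * On mismatch it prints both values to stderr and returns false; the
 * decision to abort (the patch calls exit(1)) is left to the caller. */
bool numa_total_matches_ram(const uint64_t *node_mem, int nb_nodes,
                            uint64_t ram_size)
{
    uint64_t numa_total = 0;

    for (int i = 0; i < nb_nodes; i++) {
        numa_total += node_mem[i];
    }
    if (numa_total != ram_size) {
        fprintf(stderr,
                "qemu: total NUMA node memory (%" PRIu64 " bytes) "
                "does not equal ram_size (%" PRIu64 " bytes)\n",
                numa_total, ram_size);
        return false;
    }
    return true;
}
```

A caller wanting the patch's behaviour would write `if (!numa_total_matches_ram(node_mem, nb_numa_nodes, ram_size)) exit(1);`.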

On 10/12/2013 14:15, Eduardo Habkost wrote:
> Does libvirt allow this kind of broken configuration to be generated, or does it already ensure that the total NUMA node sizes match the RAM size?

It allows this. It just converts the <numa> XML to "-numa node".

Paolo
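
Paolo's point is that libvirt passes the cell sizes straight through without cross-checking them. The sketch below models that pass-through under stated assumptions: the `struct numa_cell` type and `build_numa_args` function are hypothetical, cell sizes are taken to be in KiB (as in libvirt's `<cell memory=...>`), and the `mem=` value emitted for QEMU is assumed to be in MiB. It illustrates the idea and is not libvirt's actual code.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one libvirt <numa><cell .../> entry. */
struct numa_cell {
    unsigned int id;   /* cell id, emitted as nodeid= */
    const char *cpus;  /* e.g. "0-1", emitted as cpus= */
    uint64_t mem_kib;  /* cell memory, assumed to be in KiB */
};

/* Write one "-numa node,..." argument per cell into buf, space-separated.
 * Note that nothing here compares the cell total with the guest's memory
 * size, which is the gap discussed in this thread.
 * Returns 0 on success, -1 if buf is too small. */
int build_numa_args(const struct numa_cell *cells, size_t ncells,
                    char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < ncells; i++) {
        int n = snprintf(buf + used, buflen - used,
                         "%s-numa node,nodeid=%u,cpus=%s,mem=%" PRIu64,
                         i ? " " : "", cells[i].id, cells[i].cpus,
                         cells[i].mem_kib / 1024); /* KiB -> MiB (assumed) */
        if (n < 0 || (size_t)n >= buflen - used)
            return -1; /* output was truncated */
        used += (size_t)n;
    }
    return 0;
}
```

With two 524288 KiB cells this yields `-numa node,nodeid=0,cpus=0-1,mem=512 -numa node,nodeid=1,cpus=2-3,mem=512`, regardless of what the domain's total memory is.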

On Tue, Dec 10, 2013 at 07:03:50PM +0100, Paolo Bonzini wrote:
> It allows this. It just converts the <numa> XML to "-numa node".

In that case, if we apply this patch we may want to make libvirt validate the NUMA configuration, instead of leaving users with a cryptic "QEMU aborted" error message and the actual problem buried in a log file.

(Well, even if we do not apply this patch, I believe it is a good idea to make libvirt validate the NUMA configuration.)

--
Eduardo
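
A minimal sketch of the kind of up-front validation suggested above, assuming every size is in KiB. `validate_numa_total` is a hypothetical name, not an existing libvirt function. Rejecting the domain before QEMU is started lets the error message name both totals instead of surfacing only as "QEMU aborted".

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pre-launch check: reject a domain whose NUMA cell sizes
 * (assumed to be in KiB) do not add up to the domain's memory size
 * (also assumed KiB). Returns 0 if consistent; otherwise fills errbuf
 * with a message naming both totals and returns -1. */
int validate_numa_total(const uint64_t *cell_kib, size_t ncells,
                        uint64_t domain_kib, char *errbuf, size_t errlen)
{
    uint64_t total = 0;

    for (size_t i = 0; i < ncells; i++) {
        total += cell_kib[i];
    }
    if (total != domain_kib) {
        snprintf(errbuf, errlen,
                 "NUMA cells total %" PRIu64 " KiB but domain memory is %"
                 PRIu64 " KiB", total, domain_kib);
        return -1;
    }
    return 0;
}
```

Whether a mismatch should be a hard error or only a warning is the open question taken up in the reply that follows.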

On Tue, Dec 10, 2013 at 05:01:02PM -0200, Eduardo Habkost wrote:
> In that case, if we apply this patch we may want to make libvirt validate the NUMA configuration instead of getting a cryptic "QEMU aborted" error message with the actual problem buried in a log file.
>
> (Well, even if we do not apply this patch, I believe it is a good idea to make libvirt validate the NUMA configuration.)

Yes, libvirt really ought to validate this, since such an inconsistency is a bogus configuration. It would be desirable for libvirt to reject it outright as an error, but we should first check whether any common apps are (accidentally) relying on such broken configs already.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

participants (3)
- Daniel P. Berrange
- Eduardo Habkost
- Paolo Bonzini