On Wed, Jul 11, 2018 at 06:03:08PM +0200, Pavel Hrdina wrote:
To make it clear I'll summarize all the possible combinations and
how it
should work so we are on the same page.
originally: before commit [1]
now: after commit [1] (current master)
expect: what this patch series should fix
======= Non-NUMA guests =======
* Only one hugepage specified without any nodeset
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB'/>
</hugepages>
</memoryBacking>
This is correct, was always working and we should not change it.
originally: works
now: works
expected: works
* Only one hugapage specified with nodeset
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
This is questionable since there is no guest NUMA node specified,
but on the other hand we can consider non-NUMA guest to have exactly
one NUMA node.
This was working but has been broken by commit [1] which tried to
fix a case where you are trying to specify non-existing NUMA node.
Because of that assumption we can consider this as valid XML even
though there is no NUMA node specified [2]. There are two possible
solutions:
- we can leave the XML intact
- we can silently remove the 'nodeset' attribute to 'fix' the
XML definition
originally: works
now: fails
expect: works
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='1'/>
</hugepages>
</memoryBacking>
If the nodeset is != 0 it should newer work becuase there is no
guest NUMA topology and even if we take into account the assumption
that there is always one NUMA node it is still invalid.
originally: works
now: fails
expect: fails
* One hugepage with specific nodeset and second default hugepage
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0'/>
<page size='2048' unit='KiB'/>
</hugepages>
</memoryBacking>
This was working but was 'fixed' by commit [1] because the code
checks only the first hugepage and uses only the first hugepage.
It should never worked because it doesn't make any sense, there
is no guest NUMA node configured and even if we take into account
the assumption that non-NUMA guest has one NUMA node there is need
for the default page size.
originally: works
now: fails
expect: fails
There is yet another issue with the current state in libvirt, if
you swap the order of pages:
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB'/>
<page size='1048576' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
it will work even with current libvirt with the fix [1]. The reason
is that code in qemuBuildMemPathStr() function takes into account
only the first page size so it depends on the order of elements
which is wrong.
We should not allow any of these two configurations. Setting
nodeset to != 0 will should not make any difference.
originally: works
now: works
expect: fails
====== NUMA guest ======
* Only one hugepage specified without any nodeset
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB'/>
</hugepages>
</memoryBacking>
...
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='4' threads='2'/>
<numa>
<cell id='0' cpus='0-3' memory='4194304'
unit='KiB'/>
<cell id='1' cpus='4-7' memory='4194304'
unit='KiB'/>
</numa>
</cpu>
originally: works
now: works
expect: works
* Only one hugapage specified with nodeset
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
...
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='4' threads='2'/>
<numa>
<cell id='0' cpus='0-3' memory='4194304'
unit='KiB'/>
<cell id='1' cpus='4-7' memory='4194304'
unit='KiB'/>
</numa>
</cpu>
All possible combinations for the nodeset attribute are allowed
if it always corresponds to existing guest NUMA node:
<page size='2048' unit='KiB' nodeset='1'/>
or
<page size='2048' unit='KiB' nodeset='0,1'/>
originally: works
now: works
expect: works
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='2'/>
</hugepages>
</memoryBacking>
...
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='4' threads='2'/>
<numa>
<cell id='0' cpus='0-3' memory='4194304'
unit='KiB'/>
<cell id='1' cpus='4-7' memory='4194304'
unit='KiB'/>
</numa>
</cpu>
There is invalid guest NUMA node specified for the hugepage.
originally: fails
now: fails
expect: fails
* One hugepage with specific nodeset and second default hugepage
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0'/>
<page size='2048' unit='KiB'/>
</hugepages>
</memoryBacking>
...
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='4' threads='2'/>
<numa>
<cell id='0' cpus='0-3' memory='4194304'
unit='KiB'/>
<cell id='1' cpus='4-7' memory='4194304'
unit='KiB'/>
</numa>
</cpu>
There are two guest NUMA nodes where we have default hugepage size
configured and for NUMA node '0' we have a different size.
originally: works
now: works
expect: works
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' nodeset='0,1'/>
<page size='2048' unit='KiB'/>
</hugepages>
</memoryBacking>
...
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='4' threads='2'/>
<numa>
<cell id='0' cpus='0-3' memory='4194304'
unit='KiB'/>
<cell id='1' cpus='4-7' memory='4194304'
unit='KiB'/>
</numa>
</cpu>
originally: works
now: works
expect: fails ???
In this situation all the guest NUMA nodes are covered by the
hugepage size with specified nodeset attribute. The default one
is not used at all so is pointless here.
The difference between this and non-NUMA guest is that if we change
the order it will still work as expected, it doesn't depend on the
order of elements. However, we might consider is as invalid
configuration because there is no need to have the default page size
configured at all.
* Multiple combination of default and specific hugepage sizes
There are some restriction if we use multiple page sizes:
- There cannot be two different default hugepage sizes
- Two different page elements cannot have the same guest NUMA
node specified in nodeset attribute
- hugepages are not allowed if memory backing allocation is set
to 'ondemand' (not documented)
- hugepages are not allowed if memory backing source is set to
'anonymous' (not documented)
I hope that there is no other configuration that we need to care about.
Pavel
[1] <fa6bdf6afa878b8d7c5ed71664ee72be8967cdc5>
[2] <
https://bugzilla.redhat.com/show_bug.cgi?id=1591235>