On Wed, Jul 04, 2018 at 02:35:23PM +0200, Michal Prívozník wrote:
> On 07/04/2018 02:05 PM, Pavel Hrdina wrote:
>> On Wed, Jul 04, 2018 at 01:51:57PM +0200, Michal Privoznik wrote:
>>>
https://bugzilla.redhat.com/show_bug.cgi?id=1448149
>>
>> This BZ is not related to the patch, it describes different issue.
>>
>> There is different BZ that tracks the issue:
>>
>>
https://bugzilla.redhat.com/show_bug.cgi?id=1591235
>>
>>> In fa6bdf6afa878 and ec982f6d929f3c23 I've tried to forbid
>>> configuring <hugepages/> for non-existent guest NUMA nodes.
>>> However, the check was not fine tuned enough as it failed when no
>>> guest NUMA was configured but <hugepages/> were configured for
>>> guest NUMA node #0.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>>> ---
>>> src/qemu/qemu_command.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>>> index 04c5c28438..15caf392aa 100644
>>> --- a/src/qemu/qemu_command.c
>>> +++ b/src/qemu/qemu_command.c
>>> @@ -7421,7 +7421,7 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
>>>
>>> if (def->mem.hugepages[0].nodemask) {
>>> ssize_t next_bit =
virBitmapNextSetBit(def->mem.hugepages[0].nodemask, -1);
>>> - if (next_bit >= 0) {
>>> + if (next_bit > 0) {
>>> virReportError(VIR_ERR_XML_DETAIL,
>>> _("hugepages: node %zd not found"),
>>> next_bit);
>>
>> This code is weird, it checks only the first hugepage element but
>> the XML in this case can look like this:
>>
>> <memoryBacking>
>> <hugepages>
>> <page size='2048' unit='KiB' nodeset='0'/>
>> <page size='1048576' unit='KiB'/>
>> </hugepages>
>> <locked/>
>> </memoryBacking>
>>
>> which is obviously wrong and would be accepted and 2048KiB pages would
>> be used.
>
> Can you elaborate more on why do you consider this erroneous behvaiour?
> The whole idea behind @nodeset was to have one default (1GiB in this
> case) and then allow users fine tune hugepages used for NUMA nodes. So
> the config you pasted says: 1GiB is the default and for NUMA node #0 use
> 2MiB.
Well, this patch fixes an issue where you don't have NUMA nodes for the
guest, but the XML example would be accepted as valid for guest without
NUMA nodes and that is wrong.
The 2MiB would be the default and the 1GiB would be ignored.
>>
>>> @@ -7577,7 +7577,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>>> }
>>>
>>> next_bit = virBitmapNextSetBit(def->mem.hugepages[i].nodemask,
pos);
>>> - if (next_bit >= 0) {
>>> + if (next_bit > 0) {
>>> virReportError(VIR_ERR_XML_DETAIL,
>>> _("hugepages: node %zd not found"),
>>> next_bit);
>>
>> The fact that we have this check in two places makes me thing that it's
>> wrong.
>
> The reason is that we have two options for generating command line to
> enable hugepages (which are not compatible so we have to have both to
> maintain backward compatibility upon migration):
>
> 1) -mem-path /path/to/hugetlbfs
> 2) -object memory-backend-file
The check is executed for the same data with the same result in two
different paths, so we can move it up before the path is split.
>> Actually this should be moved from qemu build command into qemu
>> validate code or event into generic validate code.
>
> I fear we can't do that. Historically we've accepted a wide variety of
> misconfigured domains from this respect and putting the check into
> validation code would mean losing them on daemon restart. That's the
> reason we have those checks at domain startup time.
The validation code is not executed on daemon restart exactly for that
reason so there is no issue with that.
Well, if you think you can do this then I am all for it and I'm holding
my fingers crossed for you. I've spent non-trivial amount of time trying
to fix all the corner cases (roughly:
libvirt.git $ git log --author=mprivozn --oneline | grep -i huge | wc -l
) and it would be pity if we would break it now.
Michal