On 11/2/18 8:35 AM, Daniel Henrique Barboza wrote:
Hi. Sorry for the late reply.
On 10/11/18 9:03 PM, John Ferlan wrote:
> On 10/4/18 9:14 AM, Daniel Henrique Barboza wrote:
>> This patch makes two quality of life changes for non-x86 guests. The
>> first one is a maxCpus validation at qemuDomainDefValidate. The check
>> is made by the same function used to do that at qemuProcessStartValidateXML,
>> qemuValidateCpuCount. This ensures that the user doesn't goes over the
>> board with the maxCpus value when editing the XML, only to be faced with a
>> runtime error when starting it.
>>
>> To do that, the following changes were made:
>>
>> - qemuValidateCpuCount was made public. It was renamed to
>> qemuProcessValidateCpuCount to be compliant with the other public methods
>> at qemu_process.h;
>>
>> - the method signature was slightly adapted to fit the const 'def'
>> variable used in qemuDomainDefValidate. This change has no downside in
>> in its original usage at qemuProcessStartValidateXML.
>>
>> The second QoL change is adding the maxCpus value in the error message
>> of the now qemuProcessValidateCpuCount. This simple change allows the
>> user to quickly edit the XML to comply with the acceptable limit without
>> having to know QEMU internals. x86 guests, that might have been created
>> prior to the x86 qemuDomainDefValidate maxCpus check code, will also
>> benefit from this change.
>>
>> After this patch, this is the expect result running a Power 9 guest with
>> a maxCpus of 40000:
>>
>> $ ./virsh start dhb
>> error: Failed to start domain dhb
>> error: unsupported configuration: Maximum CPUs greater than specified machine
type limit 240
>>
>> And this is the result when trying to edit maxCpus to an invalid value:
>>
>> $ ./virsh edit dhb
>> error: unsupported configuration: Maximum CPUs greater than specified machine
type limit 240
>> Failed. Try again? [y,n,i,f,?]:
>> error: unsupported configuration: Maximum CPUs greater than specified machine
type limit 240
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
>> ---
>> src/qemu/qemu_domain.c | 5 +++++
>> src/qemu/qemu_process.c | 13 +++++++------
>> src/qemu/qemu_process.h | 3 +++
>> 3 files changed, 15 insertions(+), 6 deletions(-)
>>
> When you write a commit message that says this much, start thinking - I
> might need to split this up into multiple patches to reach my final
> goal. I think there are 3 patches here:
>
> 1. Changing the name from qemuValidateCpuCount to
> qemuProcessValidateCpuCount, alter the @def variable name, and add to
> qemu_process.h.
>
> 2. Alter the error message.
>
> 3. The qemu_domain change.
I can split it in 3 patches as you suggested, no problem.
>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>> index 939b2a3da2..5a39b11da7 100644
>> --- a/src/qemu/qemu_domain.c
>> +++ b/src/qemu/qemu_domain.c
>> @@ -4100,6 +4100,11 @@ qemuDomainDefValidate(const virDomainDef *def,
>> }
>> }
>>
>> + if (!ARCH_IS_X86(def->os.arch) &&
>> + qemuProcessValidateCpuCount(def, qemuCaps) < 0) {
>> + goto cleanup;
>> + }
>> +
> So this really is the crux of the fix... To perform this validate during
> the define processing, which seems to be a good idea; however, I'm
> wondering why you're filtering "!ARCH_IS_X86". That same filter
isn't
> done at start time. Yes, there is an X86 specific message just before:
>
> unsupported configuration: more than 255 vCPUs are only supported on
> q35-based machine type
My idea here was to keep the x86 verification unchanged, with its own
error/warning message, while adding a vCPU verification for all other
archs. This is why I've used !ARCH_IS_X86.
It seems strange to have cpu count checking be arch specific especially
in a validate routine. Other arch specific checks deal with certain
truths for specific configurations.
Let's see if I can context switch my brain back here ;-) Ah yes, the
question rephrased is since qemuProcessStartValidateXML doesn't filter
on ARCH_IS_X86 when calling qemuProcessValidateCpuCount, then why are we
doing so here?
Additionally, if you follow the qemuProcessStartValidateXML call to
virDomainDefValidate you will see it calls config.domainValidateCallback
which is essentially calling qemuDomainDefValidate, so you'd just be
duplicating what was already done (but only at startup time).
So, the tl;dr is I think you'd be safe in having the call in
qemuDomainDefValidate be added without any guard. Due to the
VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE guard at the top of
virDomainDefValidate it won't be called in those "previous" cases which
caused domains to disappear. As part of that change, remove the call and
comments above it from qemuProcessStartValidateXML.
It may even be possible to then remove qemuProcessStartValidateXML and
move the remaining single check back into qemuProcessStartValidate as a
patch after the move.
When revisiting this, I initially "feared" moving what was a start only
time check into a more generic after define time when
VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE isn't set.
However, I see qemuDomainDefValidate was introduced in commit
5972f185e1c and virDomainDefValidate in commit b394af162a. Both of these
came after Cole's commit 5938f2d0b creating qemuProcessStartValidateXML.
Since the stated reason in Cole's commit is related to the "problem" of
validating previously accepted XML configs in order to ensure they
weren't dropped from the VM list when libvirt is updated and the purpose
of virDomainDefValidate processing is to validate XML when defining or
starting guests for the first time avoiding the chance that we "lose" a
guest with XML that shouldn't be validated, then I think we're safe in
just moving the call
> So, for q35-based machines, they'd have to wait for start to
get the
> message? Considering there's lots of roadblocks to define a q35 with >
> 255 vCPUs, I think we'd be safe.
> Still part of me wonders why we don't just move the
> qemuProcessStartValidateXML call to qemuProcessValidateCpuCount rather
> than having it both places.
>
> I think when it was first added where it was it was done because the
> DefValidate processing didn't exist (see [1] for some history). Back
> then all we had was PostParse processing and if XML range validation was
> done there guests may "disappear" when libvirtd restarted. So adding to
> start processing was the only answer.
Should I move qemuProcessStartValidadeXML to qemuProcessValidadeCpuCount
in the next spin then?
Not quite sure what you were asking here! I think you meant what I've
stated above though.
John
>> if (def->nresctrls &&
>> def->virtType != VIR_DOMAIN_VIRT_KVM) {
>> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 29b0ba1590..0de5b503d6 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -3901,9 +3901,9 @@ qemuProcessSPICEAllocatePorts(virQEMUDriverPtr driver,
>> }
>>
>>
>> -static int
>> -qemuValidateCpuCount(virDomainDefPtr def,
>> - virQEMUCapsPtr qemuCaps)
>> +int
>> +qemuProcessValidateCpuCount(const virDomainDef *def,
>> + virQEMUCapsPtr qemuCaps)
> Going back through history commit ff968889 added this logic
>
>> {
>> unsigned int maxCpus = virQEMUCapsGetMachineMaxCpus(qemuCaps,
def->os.machine);
>>
>> @@ -3914,8 +3914,9 @@ qemuValidateCpuCount(virDomainDefPtr def,
>> }
>>
>> if (maxCpus > 0 && virDomainDefGetVcpusMax(def) > maxCpus) {
>> - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> - _("Maximum CPUs greater than specified machine type
limit"));
>> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> + _("Maximum CPUs greater than specified machine type
limit %u"),
>> + maxCpus);
> Change this to
>
> virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> _("Maximum CPUs greater than specified machine "
> "type limit: %u"), maxCpus);
>
> It's a long line thing...
Ok!
>> return -1;
>> }
>>
>> @@ -5165,7 +5166,7 @@ qemuProcessStartValidateXML(virQEMUDriverPtr driver,
>> * If back compat isn't a concern, XML validation should probably
>> * be done at parse time.
>> */
> [1]
> Looking at the history of the above comment finds commit 5938f2d0b which
> was added before commit b394af162 to introduce virDomainDefValidate.
> Then eventually commit d071d292c added a call to virDomainDefValidate
> for VIR_QEMU_PROCESS_START_NEW processing. Commit 05eab1bf9a further
> altered that code to just call virDomainDefValidate
>
>> - if (qemuValidateCpuCount(vm->def, qemuCaps) < 0)
>> + if (qemuProcessValidateCpuCount(vm->def, qemuCaps) < 0)
>> return -1;
>>
>> /* checks below should not be executed when starting a qemu process for a
>> diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
>> index c2f7c2b5d2..1716230475 100644
>> --- a/src/qemu/qemu_process.h
>> +++ b/src/qemu/qemu_process.h
>> @@ -215,4 +215,7 @@ int qemuProcessStartManagedPRDaemon(virDomainObjPtr vm);
>>
>> void qemuProcessKillManagedPRDaemon(virDomainObjPtr vm);
>>
>> +int qemuProcessValidateCpuCount(const virDomainDef *def,
>> + virQEMUCapsPtr qemuCaps);
>> +
> Not that it matters, but I'd like to see this in the same logical order
> after qemuProcessDestroyMemoryBackingPath and before
> qemuProcessAutostartAll. If you look at the source code that's about
> it's location. Of course qemuProcessAutostartAll doesn't exist, but I
> will fix that after I send this. It seems commit aad362f93 moved
> qemuProcessReconnectAll, but didn't change the .h file <sigh>...
Ok!
> John
>
>> #endif /* __QEMU_PROCESS_H__ */
>>