On Fri, Aug 31, 2018 at 12:19:12 +0200, Laszlo Ersek wrote:
On 08/31/18 10:00, Igor Mammedov wrote:
> On Thu, 30 Aug 2018 17:51:13 +0200
> Laszlo Ersek <lersek(a)redhat.com> wrote:
>
>> +Drew
>>
>> On 08/30/18 14:08, Igor Mammedov wrote:
>>> If VM has VCPUs plugged sparselly (for example a VM started with
>>> 3 VCPUs (cpu0, cpu1 and cpu2) and then cpu1 was hotunplugged so
>>> only cpu0 and cpu2 are present), QGA will rise a error
>>> error: internal error: unable to execute QEMU agent command
'guest-get-vcpus':
>>> open("/sys/devices/system/cpu/cpu1/"): No such file or
directory
>>> when
>>> virsh vcpucount FOO --guest
>>> is executed.
>>> Fix it by ignoring non present CPUs when fetching CPUs status from sysfs.
>>>
>>> Signed-off-by: Igor Mammedov <imammedo(a)redhat.com>
>>> ---
>>> qga/commands-posix.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
>>> index 37e8a2d..2929872 100644
>>> --- a/qga/commands-posix.c
>>> +++ b/qga/commands-posix.c
>>> @@ -2044,7 +2044,9 @@ static void transfer_vcpu(GuestLogicalProcessor *vcpu,
bool sys2vcpu,
>>> vcpu->logical_id);
>>> dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
>>> if (dirfd == -1) {
>>> - error_setg_errno(errp, errno, "open(\"%s\")",
dirpath);
>>> + if (!(sys2vcpu && errno == ENOENT)) {
>>> + error_setg_errno(errp, errno,
"open(\"%s\")", dirpath);
>>> + }
>>> } else {
>>> static const char fn[] = "online";
>>> int fd;
>>>
>>
>> Originally these guest agent commands (both getting and setting) were
>> meant to be used in the absence of real VCPU hot[un]plug, as a fallback
>> / place-holder.
>>
>> If the latter (= real VCPU hot(un)plug) works, then these guest agent
>> commands shouldn't be used at all.
> Technically there isn't reasons for "get" not to work in sparse
usecase
> hence the patch.
>
>> Drew, do I remember correctly? ... The related RHBZ is
>> <
https://bugzilla.redhat.com/show_bug.cgi?id=924684>. (It's a private
>> one, and I'm not at liberty to open it up, so my apologies to non-RH
folks.)
>>
>> Anyway, given that "set" should be a subset of the "get"
return value
>> (as documented in the command schema), if we fix up "get" to work
with
>> sparse topologies, then "set" should work at once.
>>
>> However... as far as I understand, this change will allow
>> qmp_guest_get_vcpus() to produce a GuestLogicalProcessor object for the
>> missing (hot-unplugged) VCPU, with the following contents:
>> - @logical-id: populated by the loop,
>> - @online: false (from g_malloc0()),
>> - @can-offline: present (from the loop), and false (from g_malloc0()).
>>
>> The smaller problem with this might be that "online==false &&
>> can-offline==false" is nonsensical and has never been returned before. I
>> don't know how higher level apps will react.
>>
>> The larger problem might be that a higher level app could simply copy
>> this output structure into the next "set" call unchanged, and then
that
>> "set" call will fail.
We do that. Not the unchanged part, but we request the structure, modify
some bits and pass it to the 'set' API.
> Libvirt it seems that survives such outrageous output
Depending on the API. The old API which only allows to set a target vCPU
count has a check for it:
1630 /* This shouldn't happen, but we can't trust the guest agent */
1631 if (!cpuinfo[i].online && !cpuinfo[i].offlinable) {
1632 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
1633 _("Invalid data provided by guest agent"));
1634 return -1;
1635 }
The new API does not need such a check for all cases, but only for those
where the user requests a change. We don't check that the vCPU cannot be
onlined though in that case.
>> I wonder if, instead of this patch, we should rework
>> qmp_guest_get_vcpus(), to silently skip processors for which this
>> dirpath ENOENT condition arises (i.e., return a shorter list of
>> GuestLogicalProcessor objects).
I think that is the correct idea.
>
>> But, again, I wouldn't mix this guest agent command with real VCPU
>> hot(un)plug in the first place. The latter is much-much better, so if
>> it's available, use that exclusively?
> Agreed,
>
> Maybe we can block invalid usecase on libvirt side with a more clear
> error message as libvirt sort of knows that sparse cpus are supported.
OK -- I see libvir-list is CC'd, so let's hear what they prefer :)
I don't see a reason to disallow it since it is technically possible.
While it makes slightly less sense to use rather than real unplug since
the changes can be undone by the system administrator, but nevertheless
there are few advantages for trusted VMs e.g. boot cpu(s) are not
unpluggable by the new API and this approach allows to control them.
Additionally we don't know which approach the user wishes to use in most
cases so there's no magic point where we can disable the guest-agent
based API.
Note that the new API also reports the presence of the structure for the
given vCPU so there should not be any usability problem.