Bump
Does anyone else want to weight in on this? Jano suggested that
exposing the timeout configuration is not the best plan, and I don't
have a strong opinion about it. I even mentioned below that I'm ok
with the idea of simply increasing the timeout to 15 seconds and
changing the error message a bit to let users know that this is
not an error per-se and the user should see the guest to see
the result.
If no one else have feelings for this matter, I'll simply re-sent the
series with the change I just mentioned. I might also keep the
unplug settings in the config object to get rid of qemu_hotplugpriv.h
(patch 3/3).
Thanks,
DHB
On 8/19/19 7:07 AM, Daniel Henrique Barboza wrote:
Hi,
On 8/19/19 6:16 AM, Ján Tomko wrote:
> On Sun, Aug 18, 2019 at 06:45:29PM -0300, Daniel Henrique Barboza wrote:
>> For some architectures and setups, device removal can take
>> longer than the default 5 seconds. This results in commands
>> such as 'virsh setvcpus' to fire timeout messages even if
>> the actual operation happened in the guest, causing confusion
>> for the user.
>>
>
> The commit that introduced this error message:
> commit e3229f6e4461cd1721dc68a32e16ab1718ae716e
> qemu: hotplug: Add support for VCPU unplug
>
> specifically says that we treat this differently than regular device
> detach:
>
> As the new code is using device_del all the implications of using it
> are present. Contrary to the device deletion code, the vcpu deletion
> code fails if the unplug request is not executed in time.
>
> Technically, we already did execute the unplug request so we lie to the
> user saying "operation failed".
>
> Maybe we can revisit the decision? [cc-ing pkrempa who added this]
>
I have thought about making setvcpus asynchronous when it is a
device_del operation. This would be more code (perhaps a new command
to do that? Or a --asynchronous option?), and we can set user
expectations
properly by making it clear that this is a unplug request command, and
the
user will need to check the result personally in the guest.
But then, if we are prepared to tell the user "go check yourself" we
can simply
change the current timeout message to say "vcpu unplug [...] timeout,
check unplug status in the guest". This would be clearer, and the user
wouldn't automatically assume that timeout == operation failed.
Another thing we can do, instead of exposing the option to the user
(which
has a good potential for disaster - hence why I added a minimal
value), is
to simply set the timeout to a greater value (10? 15?) and be done
with it.
If we set the timeout to 15 seconds and change the timeout message to
let the user know "we don't know, you'll need to check", like I said
above,
we have more resilience and will not alarm the user if a timeout still
occurs.
We'll also avoid exposing the timeout configuration like I'm doing here.
>> This patch adds a new qemu.conf parameter called 'unplug_timeout'
>> to handle these cases. If left unset, the current default
>> timeout is used. To avoid user 'experimentation' with small
>> timeouts, the current timeout is also the minimal value
>> allowed.
>>
>
> The reason for this timeout is that we originally promised something
> that we cannot deliver - a synchronous device detach API, while the
> operation itself is asynchronous. I'm not a fan of exposing it and
> making it configurable.
>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
>> ---
>> src/qemu/libvirtd_qemu.aug | 3 +++
>> src/qemu/qemu.conf | 4 ++++
>> src/qemu/qemu_conf.c | 26 ++++++++++++++++++++++++++
>> src/qemu/qemu_conf.h | 5 +++++
>> src/qemu/qemu_driver.c | 2 ++
>> src/qemu/test_libvirtd_qemu.aug.in | 1 +
>> 6 files changed, 41 insertions(+)
>>
>
> [...]
>
>> diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
>> index 0cbddd7a9c..29824e4e35 100644
>> --- a/src/qemu/qemu_conf.h
>> +++ b/src/qemu/qemu_conf.h
>> @@ -214,6 +214,8 @@ struct _virQEMUDriverConfig {
>> gid_t swtpm_group;
>>
>> char **capabilityfilters;
>> +
>> + unsigned int unplugTimeout;
>> };
>>
>> /* Main driver state */
>> @@ -294,6 +296,9 @@ struct _virQEMUDriver {
>>
>> /* Immutable pointer, self-locking APIs */
>> virHashAtomicPtr migrationErrors;
>> +
>> + /* Immutable value */
>> + unsigned int unplugTimeout;
>> };
>>
>
> Why store this value twice?
I wanted the value to be available at the driver object, but saw that the
parsing of the reading file put stuff in config.
However, just realized that we get to the cfg via qemu_driver->config
(as long as the lock is being held, which I think it's the case for
all unplug
operations).
In case we still want this configurable timeout solution, I'll fix
this in the
next spin.
Thanks,
DHB
>
> Jano