On 8/30/19 8:15 AM, Christophe de Dinechin wrote:
Daniel P. Berrangé writes:
>> The reason for this timeout is that we originally promised something
>> that we cannot deliver - a synchronous device detach API, while the
>> operation itself is asynchronous. I'm not a fan of exposing it and
>> making it configurable.
> I'm especially *not* a fan because the commit messages says this is
> a problem on certain architectures. Since we know what those arches
> are, we should use a larger timeout for those arches out of the box.
> Requiring admin to set a config param to fix the architectures is
> super unpleasant out of the box experiance.
True, but also notice that 5 seconds is also already close to the
attention span time limit for users [1]. So increasing it to 10s might
bring people to believe things are stuck, unless you provide some sort
of feedback that this is normal.
https://www.nngroup.com/articles/response-times-3-important-limits/
Interesting link, thanks.
About the user feedback due to long response delay: we're already
breaking this with the setvcpus command, at least with PowerPC
guests and a lot of vcpus being unplugged. Here's an example in
which I am able to complete the command without kicking the
timeout error (guest is idle, vcpu unplug is fast in this case):
--- guest booted with 1 vcpu - added 39 extra vcpus. Operation takes
a second, it's fast ---
[danielhb@kop5 libvirt]$ sudo ./run tools/virsh setvcpus vcpus-test 40
--live
[danielhb@kop5 libvirt]$
[danielhb@kop5 libvirt]$
--- removing them back ----
[danielhb@kop5 libvirt]$ time sudo ./run tools/virsh setvcpus vcpus-test
1 --live
real 0m21.695s
user 0m0.119s
sys 0m0.000s
[danielhb@kop5 libvirt]$
This happens in PowerPC because the timeout is being considered not for the
whole operation, but per device. Since I'm unplugging 39 devices and the
5 seconds timeout is refreshed for every operation, in theory the user
can wait close to 39*5 seconds with the terminal frozen.
Now, if we are to adhere to such UX standards (IMO, we should), I propose
the following:
- short term: increase PowerPC timeout to 10 seconds per device. Following
the UX guideline above, this is the limit we can go without warning the
user about the delay;
- short term: for PowerPC guests, tune 'setvcpus' message to warn the user
that the operation can take some time to complete;
These 2 are simples changes and I can get it done for the next release
without too much trouble.
- mid/long term: I can look into the PowerPC guest implementation, see
if there
are device_del events being fired up in QMP and implement a better UX with
more information about how the process is going. Something like "vcpu 1 out
of 30 unplugged", "vcpu 2 out of 30 unplugged", or a progress bar, or
whatever makes more sense to give the user a feeling of operation ongoing.
Note that I'm suggesting PowerPC only changes due to what Daniel said
earlier - we can't impact other users due to something that, at first
glance,
only PowerPC does different. I have a hunch that we should do for all
archs, but I can't defend this claim without testing this in x86 at least.
These short-term changes are easy to make it across the board though,so it's
just a matter of removing "if PowerPC" in these changes.
What do you think?
Thanks,
DHB
> Regards,
> Daniel
> --
> |:https://berrange.com -o-https://www.flickr.com/photos/dberrange :|
> |:https://libvirt.org -o-https://fstop138.berrange.com :|
> |:https://entangle-photo.org -o-https://www.instagram.com/dberrange :|
--
Cheers,
Christophe de Dinechin (IRC c3d)