On 8/29/24 2:56 PM, Markus Armbruster wrote:
Daniil Tatianin <d-tatianin(a)yandex-team.ru> writes:
> The "reconnect" option only allows to specify the time in seconds,
> which is way too long for certain workflows.
>
> We have a lightweight disk backend server, which takes about 20ms to
> live update, but due to this limitation in QEMU, previously the guest
> disk controller would hang for one second because it would take this
> long for QEMU to reinitialize the socket connection.
>
> Make it possible to specify a smaller timeout by treating the value in
> "reconnect" as milliseconds via the new "reconnect-is-ms"
option.
>
> Signed-off-by: Daniil Tatianin <d-tatianin(a)yandex-team.ru>
Your use case demonstrates that a granularity of seconds was the wrong
choice for the reconnection delay.
[...]
> diff --git a/qapi/char.json b/qapi/char.json
> index ef58445cee..61aeccf09d 100644
> --- a/qapi/char.json
> +++ b/qapi/char.json
> @@ -272,8 +272,13 @@
> # (default: false) (Since: 3.1)
> #
> # @reconnect: For a client socket, if a socket is disconnected, then
> -# attempt a reconnect after the given number of seconds. Setting
> -# this to zero disables this function. (default: 0) (Since: 2.2)
> +# attempt a reconnect after the given number of seconds (unless
> +# @reconnect-is-ms is set to true, in that case the number is
> +# treated as milliseconds). Setting this to zero disables
> +# this function. (default: 0) (Since: 2.2)
> +#
> +# @reconnect-is-ms: The value specified in @reconnect should be treated
> +# as milliseconds. (default: false) (Since: 9.2)
> #
> # Since: 1.4
> ##
> @@ -287,7 +292,8 @@
> '*telnet': 'bool',
> '*tn3270': 'bool',
> '*websocket': 'bool',
> - '*reconnect': 'int' },
> + '*reconnect': 'int',
> + '*reconnect-is-ms': 'bool' },
> 'base': 'ChardevCommon' }
> ##
I don't like this interface.
Indeed. After discussing this more internally we have actually settled
on one of the alternatives you have mentioned below.
PRO: compatible extension; no management application updates needed
unless they want to support sub-second delays.
CON: specifying a sub-second delay takes two parameters, which is
awkward.
CON: trap in combination with -set. Before the patch, something like
-set chardev.ID.reconnect=N means N seconds no matter what.
Afterwards, it depends on the value of reconnect-is-ms, which may be
set far away. Mitigating factor: -set is obscure.
Alternatives:
1. Change @reconnect to 'number', specify sub-second delays as
fractions.
PRO: compatible extension; no management application updates needed
unless they want to support sub-second delays.
CON: first use of floating-point for time in seconds in QAPI, as far
as I can see.
CON: QemuOpts can't do floating-point.
2. Deprecate @reconnect in favour of a new member with a more suitable
unit. Error if both are present.
PRO: after @reconnect is gone, the interface is what it arguably
should've been from the start.
CON: incompatible change. Management application updates needed
within the deprecation grace period.
This is the one we decided to go with. We simply added a new
"reconnect-ms" option, which doesn't replace the old one for backwards
compatibility, but also disallows both to be specified at the same time,
making them mutually exclusive. I think deprecation would be the way to
go here.
Let's get additional input from management application developers. I
cc'ed some.
Sure, sounds great. Thanks!
>
> Related: NetdevSocketOptions member @reconnect.
>