Thank you both for your patience and discussion, John and Michal.
I will send a new version to fix the issues referred in the discussion.
On 09/10/2018 10:22 PM, John Ferlan wrote:
>
>
> On 09/05/2018 11:09 PM, Yi Wang wrote:
>> When doing some job holding state lock for a long time,
>> we may come across error:
>
> blank line
>
>> "Timed out during operation: cannot acquire state change lock"
>
> blank line
>
>> Well, sometimes it's not a problem and users want to continue
>> to wait, and this patch allow users decide how long time they
>> can wait the state lock.
>
> As I noted in a previous review... With this patch it's not the
> individual user or command that can decide how long to wait, rather it's
> set by qemu.conf policy. The duration of the wait just becomes
> customize-able. Not that it's any different the current implementation,
> since the value is set in qemuDomainObjBeginJobInternal that means
> there's no distinguishing between job, asyncjob, and agentjob. There's
> also no attempt to allow changing the value via virt-admin.
I'm not sure how we would implement per API timeout. We could perhaps do
per connection but that's also not desired.
And regarding "how does one know how long to wait" - well, they don't.
That's the thing with timeouts. It's not an argument against them
though. Even primitive functions (e.g. those from pthread) have
XXX_timed() variants.
>
> A "strong recommendation" of longer than 30 seconds doesn't provide
much
> guidance. The setting of value generally depends on the configuration
> and what causes an API to hold the lock for longer than 30 seconds.
> That's probably limited to primarily virDomainListGetStats and/or
> virConnectGetAllDomainStats. There's other factors in play including CPU
> speed, CPU quota allowed, network throughput/latency, disk speed, etc.
> Tough to create a value the works for every command/purpose.
Agreed, "strong recommendation" is use case dependant (although it's
perhaps what initiated this patch). If anything, we should discourage
people from touching this knob.
>
> Ironically (perhaps) allowing a value of 0 would essentially make it so
> no one waited - such as how qemuDomainObjBeginJobNowait operates
> essentially. OTOH, in order to have everything wait "as long as
> possible" passing -1 would be a lot easier than providing 2147483647 (or
> if properly typed to uint, then 4294967295U).
Passing 0 is explicitly forbidden (although in a bit clumsy way - we
need to check the value right after it's parsed because multiplying it
by 1000 can lead to overflow and a negative value becomes positive or
vice versa).
[......]
> If this were an unsigned int, then the < 0 check wouldn't
necessary.
>
> So then the question becomes, should 0 (zero) be special cased as wait
> forever.
This check needs to be done before multiplication otherwise an overflow
might occur. Also, the multiplication can be done in BeginJobInternal so
that cfg->stateLockTimeout holds value entered by user.
>
>> + virReportError(VIR_ERR_CONF_SYNTAX, "%s",
>> + _("state_lock_timeout should larger than
zero"));
>
> If 0 were left as invalid, then s/should/must be/
>
>> + return -1;
>> + }
>> +
>> return 0;
>> }
>>
Michal
---
Best wishes
Yi Wang