
Thank you both for your patience and discussion, John and Michal. I will send a new version to fix the issues referred in the discussion.
On 09/10/2018 10:22 PM, John Ferlan wrote:
On 09/05/2018 11:09 PM, Yi Wang wrote:
When doing some job holding state lock for a long time, we may come across error:
blank line
"Timed out during operation: cannot acquire state change lock"
blank line
Well, sometimes it's not a problem and users want to continue to wait, and this patch allow users decide how long time they can wait the state lock.
As I noted in a previous review... With this patch it's not the individual user or command that can decide how long to wait, rather it's set by qemu.conf policy. The duration of the wait just becomes customize-able. Not that it's any different the current implementation, since the value is set in qemuDomainObjBeginJobInternal that means there's no distinguishing between job, asyncjob, and agentjob. There's also no attempt to allow changing the value via virt-admin.
I'm not sure how we would implement per API timeout. We could perhaps do per connection but that's also not desired.
And regarding "how does one know how long to wait" - well, they don't. That's the thing with timeouts. It's not an argument against them though. Even primitive functions (e.g. those from pthread) have XXX_timed() variants.
A "strong recommendation" of longer than 30 seconds doesn't provide much guidance. The setting of value generally depends on the configuration and what causes an API to hold the lock for longer than 30 seconds. That's probably limited to primarily virDomainListGetStats and/or virConnectGetAllDomainStats. There's other factors in play including CPU speed, CPU quota allowed, network throughput/latency, disk speed, etc. Tough to create a value the works for every command/purpose.
Agreed, "strong recommendation" is use case dependant (although it's perhaps what initiated this patch). If anything, we should discourage people from touching this knob.
Ironically (perhaps) allowing a value of 0 would essentially make it so no one waited - such as how qemuDomainObjBeginJobNowait operates essentially. OTOH, in order to have everything wait "as long as possible" passing -1 would be a lot easier than providing 2147483647 (or if properly typed to uint, then 4294967295U).
Passing 0 is explicitly forbidden (although in a bit clumsy way - we need to check the value right after it's parsed because multiplying it by 1000 can lead to overflow and a negative value becomes positive or vice versa).
[......]
If this were an unsigned int, then the < 0 check wouldn't necessary.
So then the question becomes, should 0 (zero) be special cased as wait forever.
This check needs to be done before multiplication otherwise an overflow might occur. Also, the multiplication can be done in BeginJobInternal so that cfg->stateLockTimeout holds value entered by user.
+ virReportError(VIR_ERR_CONF_SYNTAX, "%s", + _("state_lock_timeout should larger than zero"));
If 0 were left as invalid, then s/should/must be/
+ return -1; + } + return 0; }
Michal
--- Best wishes Yi Wang