Re: [libvirt] [PATCH 2/2] qemu: agent: fix unsafe agent access

23 Nov 2016


      On 22.11.2016 17:49, Michal Privoznik wrote:
...
On 14.11.2016 15:24, Nikolay Shirokovskiy wrote:
...
qemuDomainObjExitAgent is unsafe.
First it accesses domain object without domain lock.
Second it uses outdated logic that goes back to commit 79533da1 of
year 2009 when code was quite different. (unref function
instead of unreferencing only unlocked and disposed object
in case of last reference and leaved unlocking to the caller otherwise).
Nowadays this logic may lead to disposing locked object
i guess.
Well, I agree that the order of those two steps should be reversed, so
ACK to that.
...
Another problem is that the callers of qemuDomainObjEnterAgent
use domain object again (namely priv->agent) without domain lock.
But why is this a problem?
qemuProcessHandleAgentEOF for example can zero out priv->agent after 
we call qemuDomainObjEnterAgent and before we call qemuAgentGetTime(priv->agent, seconds, nseconds).
Because we drop domain lock in qemuDomainObjEnterAgent and 
qemuProcessHandleAgentEOF does not acquire job condition, only domain lock. 
At the same time qemuAgentGetTime is not ready for NULL agent and will crash.
It could get even worse as priv->agent is not atomic and instead of
NULL we can get garbage there. Long story short we just should
not access domain state without lock. Thats why I change all the places
so we get copy of priv->agent before we drop domain lock.
...
...
This patch address these two problems.
qemuDomainGetAgent is dropped as unused.
---
I wonder should we make qemuDomainObj(Enter|Exit)Agent be macros. So 
we don't need to declare a variable to hold agent reference and zero
it out at the end for safety.
Also the qemu monitor code has probably same problems as the agent monitor code
seems to be copied from there.
Well, qemu monitor can't come and go. I mean, if it does, the domain is
killed. That's not the case with agent monitor.
Nevertheless in the same situation as described above we can get
undefined behaviour on domain destroy.

Nikolay