Re: [libvirt] [PATCH 01/11] Introduce qemuDomainObjExitMonitorAlive

17 Dec 2014

On Wed, Dec 17, 2014 at 01:58:28PM +0100, Ján Tomko wrote:
...
On 12/17/2014 01:21 PM, Daniel P. Berrange wrote:
...
On Wed, Dec 17, 2014 at 01:18:41PM +0100, Michal Privoznik wrote:
...
On 16.12.2014 17:41, Ján Tomko wrote:
...
The domain might disappear during the time in monitor when
the virDomainObjPtr is unlocked, so the caller needs to check
if it's still alive.
Introduce a helper function for the callers that want to report
an error in this case.
---
 src/qemu/qemu_domain.c | 17 +++++++++++++++++
 src/qemu/qemu_domain.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 8dd427a..d074429 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1606,6 +1606,23 @@ void qemuDomainObjExitMonitor(virQEMUDriverPtr driver,
     qemuDomainObjExitMonitorInternal(driver, obj);
 }
+/* obj must NOT be locked before calling
+ *
+ * Same as qemuDomainObjExitMonitor, but reports an error and
+ * returns -1 if the domain is no longer alive after exiting the monitor
Can you state explicitly that this function needs to be called whenever the
domain object or vm->def, or privdata (any runtime info held within vm in
general) is used after ExitMonitor()?
Right, I can add this to qemu/THREADS.txt as well.
...
Are there any cases where that is not true ?  eg could we just avoid
introducing this new function, by making the existing funtion check
liveliness every time. That would avoid the class of bugs you outline
above entirely.
I was worried about backports, but new code checking the return value would
fail to compile with the void function in older libvirt.
We should add  ATTRIBUTE_RETURNCHECK to force all callers to check
the return status too
...
There are functions that only call EndJob and return after exiting the
monitor. But if the domain crashed before the API even returned, can we really
claim success?
There are some APIs where we do

  if (running)
     do x
  else
     do y

even with those though, I think we're justified in reporting an error
if the domain stopped at any point in the 'do x' code path - after all
we're already doing to be doing that if it happens while we're issuing
a mointor API call. So this endjob check is only extending that window
of opportunity for errors by a couple more instructions.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|