[Libvir] Proposal: New virDomainLookup function to indicate Not found vs Error

virt-install has some code which waits for a domain to appear just after it has been created. It looks like the loop attached to the end of this email, and is functional but has two problems.

Problem (1) is that self.conn.lookupByName doesn't distinguish between a "Not found" domain and an actual error. For example there is no way to tell the difference between being unable to contact xend (an actual error), and being able to contact xend, but xend not being able to find the domain (not found). As shown here:

  import libvirt
  conn = libvirt.open ("xen+tls:///")
  d = conn.lookupByName ("Domain-0")
  d = conn.lookupByName ("doesnotexist")
  [...]
  libvirt.libvirtError: virDomainLookupByName() failed

then I deliberately kill the remote daemon:

  d = conn.lookupByName ("doesnotexist")
  libvir: Remote error : Error in the push function.
  [...]

The first exception is a Not found condition (not an error) whereas the second is an error.

Problem (2) is that virterror is over anxious to print error messages to stderr, even if the caller can handle them and even if (as in the Not found case) they don't indicate errors. In practical terms this means that the virt-install loop attached below may print out 1 or 2 error messages even when it is functioning normally. You'll see an error like this appearing [sic]:

  libvir: Xen Daemon error : GET operation failed:

Since it's difficult to change the LookupBy* functions without changing the ABI, I suspect that the best thing to do is going to be to add a new call with better semantics. Therefore I suggest:

  virDomainPtr *
  virDomainLookup (virConnectPtr conn, int flags,
                   int id, char *str, int *error);

where flags is one of:

  VIR_LOOKUP_BY_ID, VIR_LOOKUP_BY_NAME, VIR_LOOKUP_BY_UUID
  or VIR_LOOKUP_BY_UUID_STRING

The return values are:

  ret = domain, *error = 0  => found it
  ret = NULL,   *error = 0  => not found
  ret = NULL,   *error = 1  => error (check virterror)

Addition 1: There would be a similar function virNetworkLookup, but without needing the 'id' parameter because networks don't have IDs.

Addition 2: Change the driver internals so that they don't call virterror in the not found case. (This requires quite a bit of rejigging in xend_internal, but is not too hard).

Addition 3: Language bindings could be modified to detect this function and if present change their existing LookupBy* functions to use the new interface.

Thoughts?

Rich.

----------------------------------------------------
This is the troublesome loop:

  logging.debug("Created guest, looking to see if it is running")
  # sleep in .25 second increments until either a) we find
  # our domain or b) it's been 5 seconds.  this is so that
  # we can try to gracefully handle domain creation failures
  num = 0
  d = None
  while num < (5 / .25): # 5 seconds, .25 second sleeps
      try:
          d = self.conn.lookupByName(self.name)
          break
      except libvirt.libvirtError, e:
          logging.debug("No guest running yet " + str(e))
          pass
      num += 1
      time.sleep(0.25)

-- 
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903

On Wed, Jun 27, 2007 at 02:43:18PM +0100, Richard W.M. Jones wrote:
> virt-install has some code which waits for a domain to appear just after it has been created. It looks like the loop attached to the end of this email, and is functional but has two problems.
>
> Problem (1) is that self.conn.lookupByName doesn't distinguish between a "Not found" domain and an actual error. For example there is no way to tell the difference between being unable to contact xend (an actual error), and being able to contact xend, but xend not being able to find the domain (not found).

It is possible to tell the difference, we just don't report it well.

> As shown here:
>
>   import libvirt
>   conn = libvirt.open ("xen+tls:///")
>   d = conn.lookupByName ("Domain-0")
>   d = conn.lookupByName ("doesnotexist")
>   [...]
>   libvirt.libvirtError: virDomainLookupByName() failed
>
> then I deliberately kill the remote daemon:
>
>   d = conn.lookupByName ("doesnotexist")
>   libvir: Remote error : Error in the push function.
>   [...]
>
> The first exception is a Not found condition (not an error) whereas the second is an error.

There is no explicit libvirt error code for 'no such domain', so it is basically impossible to catch this scenario in an app currently. This problem is actually not just an issue with the virLookup* group of functions. Basically any function which takes a virDomainPtr arg can hit a 'no such domain' error, since a domain could have gone away in the time since the app got hold of a virDomainPtr object. The QEMU driver just throws a generic 'internal error', but I think it's worth introducing explicit 'no such domain' and 'no such network' error codes and fixing all functions to report these correctly.
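
As a rough illustration of what an explicit 'no such domain' code would let a caller do, here is a minimal sketch of the virt-install wait loop that retries only on "not found" and lets every other failure propagate. It deliberately does not import libvirt: LookupFailed, ERR_NO_DOMAIN, ERR_CONNECTION and wait_for_domain are hypothetical names made up for this sketch, and the lookup callable stands in for conn.lookupByName.

```python
import time


class LookupFailed(Exception):
    """Stand-in for libvirt.libvirtError; 'code' is a hypothetical error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


# Hypothetical values chosen for this sketch, not taken from libvirt.
ERR_NO_DOMAIN = 1
ERR_CONNECTION = 2


def wait_for_domain(lookup, name, timeout=5.0, interval=0.25, sleep=time.sleep):
    """Poll lookup(name) until it succeeds or the timeout passes.

    Only 'no such domain' is retried; any other failure is re-raised at once.
    Returns the domain, or None if it never appeared.
    """
    attempts = int(timeout / interval)
    for _ in range(attempts):
        try:
            return lookup(name)
        except LookupFailed as e:
            if e.code != ERR_NO_DOMAIN:
                raise  # a real error, e.g. the daemon is unreachable
        sleep(interval)
    return None
```

With a distinct code, a dead daemon fails on the first attempt instead of being polled for the full five seconds and logged as "No guest running yet".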

> Problem (2) is that virterror is over anxious to print error messages to stderr, even if the caller can handle them and even if (as in the Not found case) they don't indicate errors. In practical terms this means that the virt-install loop attached below may print out 1 or 2 error messages even when it is functioning normally. You'll see an error like this appearing [sic]:
>
>   libvir: Xen Daemon error : GET operation failed:

In the python bindings all errors are converted into Exceptions, so the python binding really shouldn't be printing out anything to the console at all by default. It'll all be reported as exceptions. The default error reporting func in libvirt is doing this so perhaps we should register a no-op func.
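
A minimal sketch of the no-op handler idea, from the application side. It assumes the Python bindings expose registerErrorHandler(f, ctx) with a callback of the form f(ctx, err), as current libvirt-python releases do; the handler name is made up for this sketch, and the import is guarded only so the snippet loads on a machine without the bindings.

```python
def ignore_libvirt_error(ctx, err):
    """Error callback that deliberately does nothing."""
    return None


try:
    import libvirt  # optional here so the sketch still loads without the bindings
except ImportError:
    libvirt = None

if libvirt is not None:
    # Replace the default handler, which prints to stderr. Failures should
    # still reach the caller as libvirt.libvirtError exceptions.
    libvirt.registerErrorHandler(ignore_libvirt_error, None)
```

This only silences the console output; it does nothing to distinguish "not found" from a real error.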

> Since it's difficult to change the LookupBy* functions without changing the ABI, I suspect that the best thing to do is going to be to add a new call with better semantics. Therefore I suggest:
>
>   virDomainPtr *
>   virDomainLookup (virConnectPtr conn, int flags,
>                    int id, char *str, int *error);
>
> where flags is one of:
>
>   VIR_LOOKUP_BY_ID, VIR_LOOKUP_BY_NAME, VIR_LOOKUP_BY_UUID
>   or VIR_LOOKUP_BY_UUID_STRING
>
> The return values are:
>
>   ret = domain, *error = 0  => found it
>   ret = NULL,   *error = 0  => not found
>   ret = NULL,   *error = 1  => error (check virterror)

I'd do it the other way around, returning the error code, and putting the domain object into a parameter

  int
  virDomainLookup (virConnectPtr conn, int flags,
                   int id, char *str, virDomainPtr *dom);

That said, I'm not convinced we need this if we fix the error reporting of the original functions to allow the 'no such domain' error to be reliably caught & handled.

Dan,
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
> That said, I'm not convinced we need this if we fix the error reporting of the original functions to allow the 'no such domain' error to be reliably caught & handled.

OK, I tend to agree. I'll come up with a patch which adds the no-such-domain/network errors and makes libvirt return them when appropriate.

Rich.