[Libvir] virDomainDestroy not synchronous?

i686
libvirt-0.1.5-3 xen-3.0.2-33

Code:

    domain_desc = virDomainGetXMLDesc(vdp, 0);
    response = virDomainDestroy(vdp);
    if (response == 0 && domain_desc) {
        /* sleep(3); */
        virDomainCreateLinux(vp, domain_desc, 0);
        free(domain_desc);
    }

If I add the sleep(), the domain dies and virDomainCreateLinux() fails with the following error:

    libvir: Xen Daemon error : POST operation failed: No such domain futon1
    Failed to get devices for domain futon1

That really doesn't bother me, and it's likely my fault anyway (it's not a fully configured domain; it's a dom-U sitting in the anaconda install screen). Of course, it's moot if we get a guaranteed synchronous reboot flag for virDomainReboot()... hint hint ;)

What bothers me is that if I *don't* put the sleep in, the domain never actually gets destroyed. In this particular case, the domain is still sitting at the anaconda install screen; all I have to do is reconnect using xm console.

If I only do a virDomainDestroy(), the domain gets destroyed. So it looks like the virDomainCreateLinux() cancels the destroy request before it can be completed.

This makes it look like the destroy request is not synchronous, and/or the result of the operation is not guaranteed (only that the request was made). Is it supposed to work this way, or is this a bug/problem?

-- Lon

On Tue, 2006-09-26 at 18:07 -0400, Lon Hohberger wrote: xend.log -- Lon

On Tue, 2006-09-26 at 18:07 -0400, Lon Hohberger wrote:
This makes it look like the destroy request is not synchronous, and/or the result of the operation is not guaranteed (only that the request was made). Is it supposed to work this way, or is this a bug/problem?

Nope, looks like it just boots faster than I get to the console. Disregard this thread.

-- Lon

On Tue, Sep 26, 2006 at 06:40:58PM -0400, Lon Hohberger wrote:
On Tue, 2006-09-26 at 18:07 -0400, Lon Hohberger wrote:
This makes it look like the destroy request is not synchronous, and/or the result of the operation is not guaranteed (only that the request was made). Is it supposed to work this way, or is this a bug/problem?
Nope, looks like it just boots faster than I get to the console. Disregard this thread.
Pfff ... I feel better :-)

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

Destroy is neither synchronous nor guaranteed. It's a request to the hypervisor that isn't completed until all of the memory is completely unmapped by any other domain that may be mapping it.

If you want to be really robust, you shouldn't assume that the domain is actually destroyed after doing a destroy. The race conditions, in practice, are usually very small but they are still there.

Regards,
Anthony Liguori

On Tue, 26 Sep 2006 18:07:06 -0400, Lon Hohberger wrote:
i686
libvirt-0.1.5-3 xen-3.0.2-33
Code:
    domain_desc = virDomainGetXMLDesc(vdp, 0);
    response = virDomainDestroy(vdp);
    if (response == 0 && domain_desc) {
        /* sleep(3); */
        virDomainCreateLinux(vp, domain_desc, 0);
        free(domain_desc);
    }

If I add the sleep(), the domain dies and virDomainCreateLinux() fails with the following error:

    libvir: Xen Daemon error : POST operation failed: No such domain futon1
    Failed to get devices for domain futon1
That really doesn't bother me, and it's likely my fault anyway (it's not a fully configured domain; it's a dom-U sitting in the anaconda install screen). Of course, it's moot if we get a guaranteed synchronous reboot flag for virDomainReboot()... hint hint ;)
What bothers me is that if I *don't* put the sleep in, the domain never actually gets destroyed. In this particular case, the domain is still sitting at the anaconda install screen; all I have to do is reconnect using xm console.
If I only do a virDomainDestroy(), the domain gets destroyed. So it looks like the virDomainCreateLinux() cancels the destroy request before it can be completed.
This makes it look like the destroy request is not synchronous, and/or the result of the operation is not guaranteed (only that the request was made). Is it supposed to work this way, or is this a bug/problem?
-- Lon

On Wed, Sep 27, 2006 at 01:54:53PM -0500, Anthony Liguori wrote:
Destroy is neither synchronous nor guaranteed. It's a request to the hypervisor that isn't completed until all of the memory is completely unmapped by any other domain that may be mapping it.
If you want to be really robust, you shouldn't assume that the domain is actually destroyed after doing a destroy. The race conditions, in practice, are usually very small but they are still there.
So is there any better way to block on destroy here? In the clustering scenario it's necessary to 'fence' a misbehaving domain on a host before bringing it back online. From what you're saying, it would appear to be necessary to poll for completion of the destroy op before trying to restart the domain.

Regards,
Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
On Wed, Sep 27, 2006 at 01:54:53PM -0500, Anthony Liguori wrote:
Destroy is neither synchronous nor guaranteed. It's a request to the hypervisor that isn't completed until all of the memory is completely unmapped by any other domain that may be mapping it.
If you want to be really robust, you shouldn't assume that the domain is actually destroyed after doing a destroy. The race conditions, in practice, are usually very small but they are still there.
So is there any better way to block on destroy here? In the clustering scenario it's necessary to 'fence' a misbehaving domain on a host before bringing it back online. From what you're saying, it would appear to be necessary to poll for completion of the destroy op before trying to restart the domain.
The 3.0.4 API ought to have proper async/sync semantics. Polling is an option.

Keep in mind, this problem isn't limited to destroy. It's true for reboot, shutdown, etc. There are very few ops that are actually synchronous in Xen today.

Regards,
Anthony Liguori
Regards, Dan.

On Wed, 2006-09-27 at 14:26 -0500, Anthony Liguori wrote:
So is there any better way to block on destroy here? In the clustering scenario it's necessary to 'fence' a misbehaving domain on a host before bringing it back online. From what you're saying, it would appear to be necessary to poll for completion of the destroy op before trying to restart the domain.
The 3.0.4 API ought to have proper async/sync semantics. Polling is an option.
Ok, that's fine -- I'll poll until the domain's gone before returning; on the newer API, it won't break anything.
Keep in mind, this problem isn't limited to destroy. It's true for reboot, shutdown, etc. There are very few ops that are actually synchronous in Xen today.
/me wants sync reboot... :) *wishes upon a star*

-- Lon
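
The "poll until the domain's gone" approach agreed on above might look roughly like the following. This is only a minimal sketch, not code from anyone in the thread: wait_until_gone(), domain_probe_fn, and countdown_probe() are hypothetical names, and countdown_probe() is a stand-in for the hypervisor so the example compiles without libvirt. The timeout and interval values are likewise assumptions to be tuned.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical callback type (not a libvirt type): returns 1 while the
 * domain still exists, 0 once it is gone, -1 if the check itself failed. */
typedef int (*domain_probe_fn)(void *ctx);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Poll probe() every interval_ms until the domain is gone.
 * Returns 0 when gone, -1 on probe error, -2 if the domain is still
 * present after roughly timeout_ms of waiting. */
int wait_until_gone(domain_probe_fn probe, void *ctx,
                    long timeout_ms, long interval_ms)
{
    long waited = 0;

    assert(probe != NULL);
    assert(interval_ms > 0);

    for (;;) {
        int present = probe(ctx);
        if (present <= 0)
            return present;   /* 0: gone, -1: probe failed */
        if (waited >= timeout_ms)
            return -2;        /* destroy has not completed in time */
        sleep_ms(interval_ms);
        waited += interval_ms;
    }
}

/* Stand-in probe for illustration only: reports the domain as present
 * for a fixed number of calls, then as gone. */
struct countdown {
    int remaining;
};

int countdown_probe(void *ctx)
{
    struct countdown *c = ctx;
    if (c->remaining > 0) {
        c->remaining--;
        return 1;
    }
    return 0;
}
```

With libvirt, a real probe could plausibly call virDomainLookupByName() on the connection, release any non-NULL result with virDomainFree(), and report the domain as present. Note that a NULL result can mean a lookup failure as well as "no such domain", so a careful caller would tell the two apart before calling virDomainCreateLinux(). Returning a distinct code on timeout lets a fencing agent refuse to restart the domain rather than guess.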
participants (4)
- Anthony Liguori
- Daniel P. Berrange
- Daniel Veillard
- Lon Hohberger