[PATCH] Add state transition poll to DestroySystem() call

# HG changeset patch # User Kaitlin Rupert <karupert@us.ibm.com> # Date 1214850941 25200 # Node ID 98e6e7b6160be701c4124cb04eef9c893fdfaa1d # Parent b123b6b1fb08c3ab956f1c33801e743e082192b9 Add state transition poll to DestroySystem() call. Signed-off-by: Kaitlin Rupert <karupert@us.ibm.com> diff -r b123b6b1fb08 -r 98e6e7b6160b src/Virt_VirtualSystemManagementService.c --- a/src/Virt_VirtualSystemManagementService.c Fri Jun 27 08:59:09 2008 -0700 +++ b/src/Virt_VirtualSystemManagementService.c Mon Jun 30 11:35:41 2008 -0700 @@ -56,6 +56,7 @@ #define DEFAULT_MAC_PREFIX "00:16:3e" #define DEFAULT_XEN_WEIGHT 1024 +#define STATE_TRANSITION_TIMEOUT 120 const static CMPIBroker *_BROKER; @@ -832,6 +833,8 @@ CMPIObjectPath *sys; virConnectPtr conn = NULL; virDomainPtr dom = NULL; + virDomainInfo info; + int i; conn = connect_by_classname(_BROKER, CLASSNAME(reference), @@ -858,6 +861,34 @@ infostore_delete(virConnectGetType(conn), dom_name); virDomainDestroy(dom); /* Okay for this to fail */ + + for (i = 0; i < STATE_TRANSITION_TIMEOUT; i++) { + if ((i % 30) == 0) { + CU_DEBUG("Polling for destroy completion..."); + } + + dom = virDomainLookupByName(conn, dom_name); + if (dom == NULL) { + CU_DEBUG("Domain successfully destroyed"); + rc = IM_RC_OK; + trigger_indication(context, + "ComputerSystemDeletedIndication", + NAMESPACE(reference)); + goto error; + } + + if (virDomainGetInfo(dom, &info) != 0) { + CU_DEBUG("Unable to get domain current state"); + rc = IM_RC_SYS_NOT_FOUND; + goto error; + } + + if (info.state == VIR_DOMAIN_SHUTOFF) + break; + + sleep(1); + } + if (virDomainUndefine(dom) == 0) { rc = IM_RC_OK; trigger_indication(context,

KR> # HG changeset patch KR> # User Kaitlin Rupert <karupert@us.ibm.com> KR> # Date 1214850941 25200 KR> # Node ID 98e6e7b6160be701c4124cb04eef9c893fdfaa1d KR> # Parent b123b6b1fb08c3ab956f1c33801e743e082192b9 KR> Add state transition poll to DestroySystem() call. Sorry for the delay here... KR> +#define STATE_TRANSITION_TIMEOUT 120 First off, 120 seconds is sure to be longer than the CIM timeout, so if the call really lasts this long, the client will get a timeout. I think it's probably reasonable to expect that after a destroy(), the undefine() would happen before the domain could really be pulled offline. However, 120 seconds should be considered "something is wrong and the domain isn't going away" IMHO. If you think that 120 seconds is necessary (and reasonable), then we need to check for online-ness, and start a job to do this in that case. KR> + for (i = 0; i < STATE_TRANSITION_TIMEOUT; i++) { KR> + if ((i % 30) == 0) { KR> + CU_DEBUG("Polling for destroy completion..."); KR> + } KR> + KR> + dom = virDomainLookupByName(conn, dom_name); KR> + if (dom == NULL) { KR> + CU_DEBUG("Domain successfully destroyed"); KR> + rc = IM_RC_OK; KR> + trigger_indication(context, KR> + "ComputerSystemDeletedIndication", KR> + NAMESPACE(reference)); KR> + goto error; KR> + } KR> + KR> + if (virDomainGetInfo(dom, &info) != 0) { KR> + CU_DEBUG("Unable to get domain current state"); KR> + rc = IM_RC_SYS_NOT_FOUND; KR> + goto error; KR> + } KR> + KR> + if (info.state == VIR_DOMAIN_SHUTOFF) KR> + break; KR> + We have domain_online() which does (effectively) the same thing (but more concisely) and is considered to be the test for online-ness of a domain elsewhere. Any reason not to use that here? -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

Dan Smith wrote:
KR> # HG changeset patch KR> # User Kaitlin Rupert <karupert@us.ibm.com> KR> # Date 1214850941 25200 KR> # Node ID 98e6e7b6160be701c4124cb04eef9c893fdfaa1d KR> # Parent b123b6b1fb08c3ab956f1c33801e743e082192b9 KR> Add state transition poll to DestroySystem() call.
Sorry for the delay here...
KR> +#define STATE_TRANSITION_TIMEOUT 120
First off, 120 seconds is sure to be longer than the CIM timeout, so if the call really lasts this long, the client will get a timeout. I think it's probably reasonable to expect that after a destroy(), the undefine() would happen before the domain could really be pulled offline. However, 120 seconds should be considered "something is wrong and the domain isn't going away" IMHO.
If you think that 120 seconds is necessary (and reasonable), then we need to check for online-ness, and start a job to do this in that case.
I did some more testing, and adding the additional virDomainLookupByName() (to handle the case where the domain was created using virsh/virt-manager without being defined first) slows things down enough to prevent the issue I was seeing. I'm going to resubmit with the loop removed. And then I'll do some further testing with a system under load to see what happens. -- Kaitlin Rupert IBM Linux Technology Center kaitlin@linux.vnet.ibm.com
participants (2)
-
Dan Smith
-
Kaitlin Rupert