[PATCH 0 of 2] Implement restart-style migration

Adds restart migration, which shuts down a guest, moves the config to a remote host, and then restarts the domain. Also includes a fix to domain_online(). This seems like a pretty effective and safe way to move a guest around if you don't need it to stay up in the process.

# HG changeset patch
# User Dan Smith <danms@us.ibm.com>
# Date 1206112986 25200
# Node ID c3dca3932e0b9f80778cbd6cbdfa09de4a7632a9
# Parent  594c9195e59c8025ea18b3d6b5ca43b322db55e1
Make domain_online() re-lookup domain to make sure we're getting fresh info

This is necessary if you're using domain_online() in a polling loop.

Signed-off-by: Dan Smith <danms@us.ibm.com>

diff -r 594c9195e59c -r c3dca3932e0b libxkutil/misc_util.c
--- a/libxkutil/misc_util.c	Thu Mar 20 13:13:57 2008 -0700
+++ b/libxkutil/misc_util.c	Fri Mar 21 08:23:06 2008 -0700
@@ -378,12 +378,25 @@ bool domain_online(virDomainPtr dom)
 bool domain_online(virDomainPtr dom)
 {
         virDomainInfo info;
-
-        if (virDomainGetInfo(dom, &info) != 0)
+        virDomainPtr _dom;
+        bool rc;
+
+        _dom = virDomainLookupByName(virDomainGetConnect(dom),
+                                     virDomainGetName(dom));
+        if (_dom == NULL) {
+                CU_DEBUG("Unable to re-lookup domain");
                 return false;
-
-        return (info.state == VIR_DOMAIN_BLOCKED) ||
-                (info.state == VIR_DOMAIN_RUNNING);
+        }
+
+        if (virDomainGetInfo(_dom, &info) != 0)
+                rc = false;
+        else
+                rc = (info.state == VIR_DOMAIN_BLOCKED) ||
+                        (info.state == VIR_DOMAIN_RUNNING) ||
+                        (info.state == VIR_DOMAIN_NOSTATE);
+        virDomainFree(_dom);
+
+        return rc;
 }
 
 int parse_id(const char *id,

Dan Smith wrote:
> # HG changeset patch
> # User Dan Smith <danms@us.ibm.com>
> # Date 1206112986 25200
> # Node ID c3dca3932e0b9f80778cbd6cbdfa09de4a7632a9
> # Parent  594c9195e59c8025ea18b3d6b5ca43b322db55e1
> Make domain_online() re-lookup domain to make sure we're getting fresh info
>
> This is necessary if you're using domain_online() in a polling loop.
>
> Signed-off-by: Dan Smith <danms@us.ibm.com>
>
> diff -r 594c9195e59c -r c3dca3932e0b libxkutil/misc_util.c
> --- a/libxkutil/misc_util.c	Thu Mar 20 13:13:57 2008 -0700
> +++ b/libxkutil/misc_util.c	Fri Mar 21 08:23:06 2008 -0700
> @@ -378,12 +378,25 @@ bool domain_online(virDomainPtr dom)
>  bool domain_online(virDomainPtr dom)
>  {
>          virDomainInfo info;
> -
> -        if (virDomainGetInfo(dom, &info) != 0)
> +        virDomainPtr _dom;
> +        bool rc;
> +
> +        _dom = virDomainLookupByName(virDomainGetConnect(dom),
> +                                     virDomainGetName(dom));
> +        if (_dom == NULL) {
> +                CU_DEBUG("Unable to re-lookup domain");
>                  return false;
I definitely could be wrong here, but I thought that virDomainGetInfo got live info regardless of the age of the virDomainPtr, as long as it still referenced the right domain. I seem to remember running into that when I was writing the ComputerSystemModifiedIndication code, and that was the reason I had to get XML description and store it, instead of just pulling from two different virDomainPtrs and comparing.

That said, this can't hurt, so if we don't have a definite answer I'm totally cool with it.

-- 
-Jay

JG> I definitely could be wrong here, but I thought that
JG> virDomainGetInfo got live info regardless of the age of the
JG> virDomainPtr, as long as it still referenced the right domain. I
JG> seem to remember running into that when I was writing the
JG> ComputerSystemModifiedIndication code, and that was the reason I
JG> had to get XML description and store it, instead of just pulling
JG> from two different virDomainPtrs and comparing.

Yeah, I thought so too, and it does, to some extent. However, I think the problem I was running into was when the domain goes from online to offline state, the DomInfo returned from the DomPtr isn't correct anymore. So what would happen was I would never see the SHUTDOWN state, it would just start returning NO_STATE after the domain went away. This solves that problem.

Given that people expect the behavior from this function that you describe, I think making sure we do what we need to do for that is good :)

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
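To make the stale-handle concern above concrete, here is a minimal, self-contained sketch. It does not use libvirt at all: `fake_host`, `fake_handle`, and `fake_lookup()` are hypothetical stand-ins invented for illustration, and the assumption that a handle remembers the state it saw at lookup time is only a model of the symptom described in this thread, not a statement about how libvirt actually behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch only; these are not libvirt types. */
enum dom_state { DOM_NOSTATE, DOM_RUNNING, DOM_BLOCKED, DOM_SHUTOFF };

struct fake_host {
        const char *name;       /* name of the single domain on this host */
        enum dom_state state;   /* current, "live" state */
};

struct fake_handle {
        struct fake_host *host;
        const char *name;
        enum dom_state cached;  /* state captured when the handle was made */
};

/* Look the domain up by name and snapshot its state into a new handle. */
static bool fake_lookup(struct fake_host *host, const char *name,
                        struct fake_handle *out)
{
        assert(host != NULL && name != NULL && out != NULL);

        if (strcmp(host->name, name) != 0)
                return false;

        out->host = host;
        out->name = host->name;
        out->cached = host->state;
        return true;
}

static bool state_is_online(enum dom_state st)
{
        /* Same three states the patch above treats as "online". */
        return (st == DOM_BLOCKED) || (st == DOM_RUNNING) ||
                (st == DOM_NOSTATE);
}

/* Trusts whatever the caller's (possibly old) handle remembers. */
static bool online_from_old_handle(const struct fake_handle *h)
{
        return state_is_online(h->cached);
}

/* Re-looks the domain up first, in the spirit of the patched function. */
static bool online_with_relookup(const struct fake_handle *h)
{
        struct fake_handle fresh;

        if (!fake_lookup(h->host, h->name, &fresh))
                return false;

        return state_is_online(fresh.cached);
}
```

In this model, a handle taken while the guest is running keeps answering "online" after the host's state changes, while the re-lookup variant picks up the new state on every call. That is the property a polling loop depends on.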

# HG changeset patch
# User Dan Smith <danms@us.ibm.com>
# Date 1206113389 25200
# Node ID 6c2d68b9722a23a94075172270acffad1bbbad6f
# Parent  c3dca3932e0b9f80778cbd6cbdfa09de4a7632a9
Add restart migration

Based on a patch Kaitlin started on before she left and sent to me.

Signed-off-by: Dan Smith <danms@us.ibm.com>
Signed-off-by: Kaitlin Rupert <karupert@us.ibm.com>

diff -r c3dca3932e0b -r 6c2d68b9722a src/Virt_VSMigrationService.c
--- a/src/Virt_VSMigrationService.c	Fri Mar 21 08:23:06 2008 -0700
+++ b/src/Virt_VSMigrationService.c	Fri Mar 21 08:29:49 2008 -0700
@@ -55,6 +55,8 @@
 #define CIM_JOBSTATE_RUNNING 4
 #define CIM_JOBSTATE_COMPLETE 7
 
+#define MIGRATE_SHUTDOWN_TIMEOUT 120
+
 #define METHOD_RETURN(r, v) do {                                        \
         uint32_t rc = v;                                                \
         CMReturnData(r, (CMPIValue *)&rc, CMPI_uint32);                 \
@@ -905,6 +907,46 @@ static CMPIStatus handle_migrate(virConn
         return s;
 }
 
+static CMPIStatus handle_restart_migrate(virConnectPtr dconn,
+                                         virDomainPtr dom,
+                                         struct migration_job *job)
+{
+        CMPIStatus s = {CMPI_RC_OK, NULL};
+        int ret;
+        int i;
+
+        CU_DEBUG("Shutting down domain for migration");
+        ret = virDomainShutdown(dom);
+        if (ret != 0) {
+                cu_statusf(_BROKER, &s,
+                           CMPI_RC_ERR_FAILED,
+                           "Unable to shutdown guest");
+                goto out;
+        }
+
+        for (i = 0; i < MIGRATE_SHUTDOWN_TIMEOUT; i++) {
+                if ((i % 30) == 0) {
+                        CU_DEBUG("Polling for shutdown completion...");
+                }
+
+                if (!domain_online(dom))
+                        goto out;
+
+                sleep(1);
+        }
+
+        cu_statusf(_BROKER, &s,
+                   CMPI_RC_ERR_FAILED,
+                   "Domain failed to shutdown in %i seconds",
+                   MIGRATE_SHUTDOWN_TIMEOUT);
+ out:
+        CU_DEBUG("Domain %s shutdown",
+                 s.rc == CMPI_RC_OK ? "did" : "did NOT");
+
+        return s;
+}
+
+
 static CMPIStatus prepare_migrate(virDomainPtr dom,
                                   char **xml)
 {
@@ -924,7 +966,8 @@ static CMPIStatus prepare_migrate(virDom
 
 static CMPIStatus complete_migrate(virDomainPtr ldom,
                                    virConnectPtr rconn,
-                                   const char *xml)
+                                   const char *xml,
+                                   bool restart)
 {
         CMPIStatus s = {CMPI_RC_OK, NULL};
         virDomainPtr newdom = NULL;
@@ -942,6 +985,16 @@ static CMPIStatus complete_migrate(virDo
         }
 
         CU_DEBUG("Defined domain on destination host");
+
+        if (restart) {
+                CU_DEBUG("Restarting domain on remote host");
+                if (virDomainCreate(newdom) != 0) {
+                        CU_DEBUG("Failed to start domain on remote host");
+                        cu_statusf(_BROKER, &s,
+                                   CMPI_RC_ERR_FAILED,
+                                   "Failed to start domain on remote host");
+                }
+        }
  out:
         virDomainFree(newdom);
 
@@ -1015,10 +1068,13 @@ static CMPIStatus migrate_vs(struct migr
                 s = handle_migrate(job->conn, dom, VIR_MIGRATE_LIVE, job);
                 break;
         case CIM_MIGRATE_RESUME:
-        case CIM_MIGRATE_RESTART:
                 CU_DEBUG("Static migration");
                 s = handle_migrate(job->conn, dom, 0, job);
                 break;
+        case CIM_MIGRATE_RESTART:
+                CU_DEBUG("Restart migration");
+                s = handle_restart_migrate(job->conn, dom, job);
+                break;
         default:
                 CU_DEBUG("Unsupported migration type (%d)", job->type);
                 cu_statusf(_BROKER, &s,
@@ -1030,7 +1086,10 @@ static CMPIStatus migrate_vs(struct migr
         if (s.rc != CMPI_RC_OK)
                 goto out;
 
-        s = complete_migrate(dom, job->conn, xml);
+        s = complete_migrate(dom,
+                             job->conn,
+                             xml,
+                             job->type == CIM_MIGRATE_RESTART);
         if (s.rc == CMPI_RC_OK) {
                 CU_DEBUG("Migration succeeded");
         } else {

Dan Smith wrote:
> # HG changeset patch
> # User Dan Smith <danms@us.ibm.com>
> # Date 1206113389 25200
> # Node ID 6c2d68b9722a23a94075172270acffad1bbbad6f
> # Parent  c3dca3932e0b9f80778cbd6cbdfa09de4a7632a9
> Add restart migration
>
> Based on a patch Kaitlin started on before she left and sent to me.
>
> Signed-off-by: Dan Smith <danms@us.ibm.com>
> Signed-off-by: Kaitlin Rupert <karupert@us.ibm.com>
>
> +static CMPIStatus handle_restart_migrate(virConnectPtr dconn,
> +                                         virDomainPtr dom,
> +                                         struct migration_job *job)
> +{
> +        CMPIStatus s = {CMPI_RC_OK, NULL};
> +        int ret;
> +        int i;
> +
> +        CU_DEBUG("Shutting down domain for migration");
> +        ret = virDomainShutdown(dom);
> +        if (ret != 0) {
> +                cu_statusf(_BROKER, &s,
> +                           CMPI_RC_ERR_FAILED,
> +                           "Unable to shutdown guest");
> +                goto out;
> +        }
> +
> +        for (i = 0; i < MIGRATE_SHUTDOWN_TIMEOUT; i++) {
> +                if ((i % 30) == 0) {
> +                        CU_DEBUG("Polling for shutdown completion...");
> +                }
Everything looks good to me, but whenever I see the modulo operator I wanna make sure I know what's going on. This is essentially "print a 'polling' message every thirty seconds so the user can see we haven't died", right?

And not to sign myself up for more work, but would this be the kind of place for an indication?

-- 
-Jay

JG> looks good to me, but whenever I see the modulo operator I
JG> wanna make sure I know what's going on. This is essentially "print a
JG> 'polling' message every thirty seconds so the user can see we haven't
JG> died", right?

Not the user so much as the person trying to debug what's going on. I did that to help track the shutdown, while I was trying to figure out why the states weren't transitioning properly. In reality, only a few of those will be printed, so it shouldn't be a big deal, but it's certainly easily removed.

JG> And not to sign myself up for more work, but would this be the
JG> kind of place for an indication?

Well, the regular migration indications will still fire, as will a CSModified indication, so I think we're covered.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
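As a rough illustration of the loop being discussed, the sketch below polls up to a timeout and prints a progress message on every 30th iteration. It is not the patch itself: `wait_for_shutdown()`, the `still_online_fn` callback, and `online_for_n_polls()` are hypothetical names made up for this example, and the one-second `sleep(1)` from the patch is left out so the sketch runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_INTERVAL 30   /* mirrors the (i % 30) check in the patch */

/* Hypothetical callback: returns true while the guest is still up. */
typedef bool (*still_online_fn)(void *ctx);

/*
 * Poll once per iteration for up to `timeout` iterations.
 * Returns true if the guest went offline in time. *logged is set to
 * the number of progress messages that were printed.
 */
static bool wait_for_shutdown(still_online_fn online, void *ctx,
                              int timeout, int *logged)
{
        int i;

        assert(online != NULL && logged != NULL);
        *logged = 0;

        for (i = 0; i < timeout; i++) {
                if ((i % LOG_INTERVAL) == 0) {
                        printf("Polling for shutdown completion...\n");
                        (*logged)++;
                }

                if (!online(ctx))
                        return true;

                /* The patch sleeps one second here; omitted in this sketch. */
        }

        return false;
}

/* Example callback: reports "online" until the counter in ctx hits zero. */
static bool online_for_n_polls(void *ctx)
{
        int *remaining = ctx;

        if (*remaining == 0)
                return false;

        (*remaining)--;
        return true;
}
```

With a timeout of 120 iterations, the message can appear at most four times (at i = 0, 30, 60 and 90), which fits the remark that only a few of them will ever be printed.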