Re: [libvirt] [PATCH v13] support offline migration

On Fri, 2012-11-09 at 08:44 +0800, li guang wrote:
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 978af57..6c2bf98 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -9796,7 +9796,7 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
         asyncJob = QEMU_ASYNC_JOB_NONE;
     }

-    if (!virDomainObjIsActive(vm)) {
+    if (!virDomainObjIsActive(vm) && !(flags & VIR_MIGRATE_OFFLINE)) {
         virReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("domain is not running"));
         goto endjob;
@@ -9805,9 +9805,9 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
     /* Check if there is any ejected media.
      * We don't want to require them on the destination.
      */
-
-    if (qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
-        goto endjob;
+    if (virDomainObjIsActive(vm))
+        if (qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
+            goto endjob;

     if (!(xml = qemuMigrationBegin(driver, vm, xmlin, dname,
                                    cookieout, cookieoutlen,
What if you want to offline migrate a domain that is currently running? I think it's better to first move qemuDomainCheckEjectableMedia down to qemuMigrationBegin in a separate patch and then just use the code from v12 (without using the offline label).
Sorry, I don't want to do an offline migration for a domain that's running.
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 5f8a9c5..66fbc02 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -72,6 +72,7 @@ enum qemuMigrationCookieFlags {
     QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE,
     QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT,
     QEMU_MIGRATION_COOKIE_FLAG_NETWORK,
+    QEMU_MIGRATION_COOKIE_FLAG_OFFLINE,

     QEMU_MIGRATION_COOKIE_FLAG_LAST
 };
@@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
 VIR_ENUM_DECL(qemuMigrationCookieFlag);
 VIR_ENUM_IMPL(qemuMigrationCookieFlag,
               QEMU_MIGRATION_COOKIE_FLAG_LAST,
-              "graphics", "lockstate", "persistent", "network");
+              "graphics", "lockstate", "persistent", "network", "offline");

 enum qemuMigrationCookieFeatures {
     QEMU_MIGRATION_COOKIE_GRAPHICS   = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS),
     QEMU_MIGRATION_COOKIE_LOCKSTATE  = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE),
     QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT),
     QEMU_MIGRATION_COOKIE_NETWORK    = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK),
+    QEMU_MIGRATION_COOKIE_OFFLINE    = (1 << QEMU_MIGRATION_COOKIE_FLAG_OFFLINE),
 };

 typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics;
@@ -594,6 +596,9 @@ qemuMigrationCookieXMLFormat(struct qemud_driver *driver,
     if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network)
         qemuMigrationCookieNetworkXMLFormat(buf, mig->network);

+    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE)
+        virBufferAsprintf(buf, " <offline/>\n");
+
     virBufferAddLit(buf, "</qemu-migration>\n");
     return 0;
 }
@@ -874,6 +879,11 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
         (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt))))
         goto error;

+    if ((flags & QEMU_MIGRATION_COOKIE_OFFLINE)) {
+        if (virXPathBoolean("count(./offline) > 0", ctxt))
+            mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
+    }
+
     return 0;

 error:
@@ -938,6 +948,10 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
         return -1;
     }

+    if (flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
+        mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
+    }
+
     if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig)))
         return -1;
Oh, I'm sorry for not noticing this earlier, but why exactly do we need this <offline/> element in the migration cookie? It doesn't look like we need to store any additional data required for offline migration. I think just passing the VIR_MIGRATE_OFFLINE flag will be better. Not to mention that if an older libvirt gets a cookie with an <offline/> element, it will just ignore it, while passing a flag (which the code should already be doing anyway) will make it fail for an unsupported flag.
Without this, how do you know it's an offline migration on the target side?
@@ -1443,6 +1457,24 @@ char *qemuMigrationBegin(struct qemud_driver *driver,
                                 QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0)
         goto cleanup;

+    if (flags & VIR_MIGRATE_OFFLINE) {
+        if (flags & (VIR_MIGRATE_NON_SHARED_DISK|
+                     VIR_MIGRATE_NON_SHARED_INC)) {
+            virReportError(VIR_ERR_OPERATION_INVALID,
+                           "%s", _("offline migration cannot handle non-shared storage"));
+            goto cleanup;
+        }
+        if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
+            virReportError(VIR_ERR_OPERATION_INVALID,
+                           "%s", _("offline migration must be specified with the persistent flag set"));
+            goto cleanup;
+        }
+        if (qemuMigrationBakeCookie(mig, driver, vm,
+                                    cookieout, cookieoutlen,
+                                    QEMU_MIGRATION_COOKIE_OFFLINE) < 0)
+            goto cleanup;
+    }
+
     if (xmlin) {
         if (!(def = virDomainDefParseString(driver->caps, xmlin,
                                             QEMU_EXPECTED_VIRT_TYPES,
Good, just wrap the two long lines and create a separate patch (which this one will depend on) to move qemuDomainCheckEjectableMedia here as well.
@@ -1607,6 +1639,15 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
             goto endjob;
         }

+    if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
+                                       QEMU_MIGRATION_COOKIE_OFFLINE)))
+        return ret;
As I wrote in my last review, this needs to call goto endjob rather than returning directly.
right, will fix.
+
+    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
+        ret = 0;
+        goto done;
+    }
+
     /* Start the QEMU daemon, with the same command-line arguments plus
      * -incoming $migrateFrom
      */
...
@@ -2150,6 +2192,9 @@ qemuMigrationRun(struct qemud_driver *driver,
         return -1;
     }

+    if (flags & VIR_MIGRATE_OFFLINE)
+        return 0;
+
     if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
                                        QEMU_MIGRATION_COOKIE_GRAPHICS)))
         goto cleanup;
I still think we should not even get into qemuMigrationRun when doing offline migration.
No, it will get there because I did not touch the migrationPerform path. Initially I didn't want to change any code in libvirt.c, so I had to keep offline migration walking through the whole path like a normal migration; otherwise there would be lots of fixes, any of which might break the migration path.
@@ -2665,7 +2710,12 @@ static int doPeer2PeerMigrate3(struct qemud_driver *driver,
                          uri, &uri_out, flags, dname, resource, dom_xml);
         qemuDomainObjExitRemoteWithDriver(driver, vm);
     }
+    VIR_FREE(dom_xml);
+
+    if (flags & VIR_MIGRATE_OFFLINE)
+        goto cleanup;
+
     if (ret == -1)
         goto cleanup;
Quoting from my previous review:
This will skip not only the Perform but also the Finish phase in peer-to-peer migration. You want to jump to the finish label, and do that *after* the check for ret == 1.
OK, I will jump to finish label.
@@ -2771,7 +2821,7 @@ finish:
                  vm->def->name);

 cleanup:
-    if (ddomain) {
+    if (ddomain || (flags & VIR_MIGRATE_OFFLINE)) {
         virObjectUnref(ddomain);
         ret = 0;
     } else {
I think this should not be changed at all.
Otherwise you'll get an error, because ret stays -1.
...
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 393b67b..54ba63a 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -6644,6 +6644,7 @@ static const vshCmdInfo info_migrate[] = {

 static const vshCmdOptDef opts_migrate[] = {
     {"live", VSH_OT_BOOL, 0, N_("live migration")},
+    {"offline", VSH_OT_BOOL, 0, N_("offline (domain's inactive) migration")},
     {"p2p", VSH_OT_BOOL, 0, N_("peer-2-peer migration")},
     {"direct", VSH_OT_BOOL, 0, N_("direct migration")},
     {"tunneled", VSH_OT_ALIAS, 0, "tunnelled"},
@@ -6729,6 +6730,15 @@ doMigrate(void *opaque)
     if (vshCommandOptBool(cmd, "unsafe"))
         flags |= VIR_MIGRATE_UNSAFE;

+    if (vshCommandOptBool(cmd, "offline")) {
+        flags |= VIR_MIGRATE_OFFLINE;
+    }
+
+    if (virDomainIsActive(dom) && (flags & VIR_MIGRATE_OFFLINE)) {
+        vshError(ctl, "%s", _("domain is active, offline migration for inactive domain only"));
+        goto out;
+    }
+
Another thing I didn't notice last time :-( Is there any reason why offline migrating a running domain should be forbidden? But even if there were a reason, this check doesn't belong in virsh.
As I said before, a running domain is active, so it's not offline. If you want the offline path to handle an active domain, I think an explicit flag is required.
And the --offline flag for virsh migrate should also be documented in the virsh man page.
Got it, I'll add this to the virsh man page.
Jirka
--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china

On 11/08/2012 07:57 PM, li guang wrote:
What if you want to offline migrate a domain that is currently running? I think it's better to first move qemuDomainCheckEjectableMedia down to qemuMigrationBegin in a separate patch and then just use the code from v12 (without using the offline label).
Sorry, I don't want to do an offline migration for a domain that's running.
Why not, when we've already argued it can be useful? You don't have to support it in the same patch, but you should at least take the advice on how to refactor things so that someone else that does like the idea can extend things to provide it. There's nothing wrong with having persistent definitions of the same domain on more than one machine, as long as only one machine at a time is running it; and after all, with live migration, if you _don't_ pass the --undefinesource flag, you will be left in the same situation where a persistent domain on the source is left behind even when the running domain has migrated.
Oh, I'm sorry for not noticing this earlier, but why exactly do we need this <offline/> element in the migration cookie? It doesn't look like we need to store any additional data required for offline migration. I think just passing the VIR_MIGRATE_OFFLINE flag will be better. Not to mention that if an older libvirt gets a cookie with an <offline/> element, it will just ignore it, while passing a flag (which the code should already be doing anyway) will make it fail for an unsupported flag.
Without this, how do you know it's an offline migration on the target side?
By the presence or absence of the flag. If the flag is present, you are doing an offline migration. I agree that we don't need to add anything to the cookie, since the only way to get offline migration is via the new flag, and both source and destination will see the same set of flags.

--
Eric Blake  eblake@redhat.com  +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Thu, 2012-11-08 at 20:28 -0700, Eric Blake wrote:
On 11/08/2012 07:57 PM, li guang wrote:
What if you want to offline migrate a domain that is currently running? I think it's better to first move qemuDomainCheckEjectableMedia down to qemuMigrationBegin in a separate patch and then just use the code from v12 (without using the offline label).
Sorry, I don't want to do an offline migration for a domain that's running.
Why not, when we've already argued it can be useful? You don't have to support it in the same patch, but you should at least take the advice on how to refactor things so that someone else that does like the idea can extend things to provide it. There's nothing wrong with having persistent definitions of the same domain on more than one machine, as long as only one machine at a time is running it; and after all, with live migration, if you _don't_ pass the --undefinesource flag, you will be left in the same situation where a persistent domain on the source is left behind even when the running domain has migrated.
I mean we should do this explicitly, e.g. with a flag that tells us we are handling an active domain via the offline path. I said this before, in the comment on the virsh change.
Oh, I'm sorry for not noticing this earlier, but why exactly do we need this <offline/> element in the migration cookie? It doesn't look like we need to store any additional data required for offline migration. I think just passing the VIR_MIGRATE_OFFLINE flag will be better. Not to mention that if an older libvirt gets a cookie with an <offline/> element, it will just ignore it, while passing a flag (which the code should already be doing anyway) will make it fail for an unsupported flag.
Without this, how do you know it's an offline migration on the target side?
By the presence or absence of the flag. If the flag is present, you are
It seems flags was forgotten in qemuDomainMigratePrepare3, so any preparation for offline migration will be impossible, won't it? Do you mean I should change the MigratePrepare functions to add flags as a parameter?
doing an offline migration. I agree that we don't need to add anything to the cookie, since the only way to get offline migration is via the new flag, and both source and destination will see the same set of flags.
--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china

On 11/08/2012 08:54 PM, li guang wrote:
On Thu, 2012-11-08 at 20:28 -0700, Eric Blake wrote:
On 11/08/2012 07:57 PM, li guang wrote:
What if you want to offline migrate a domain that is currently running? I think it's better to first move qemuDomainCheckEjectableMedia down to qemuMigrationBegin in a separate patch and then just use the code from v12 (without using the offline label).
Sorry, I don't want to do an offline migration for a domain that's running.
Why not, when we've already argued it can be useful? You don't have to support it in the same patch, but you should at least take the advice on how to refactor things so that someone else that does like the idea can extend things to provide it. There's nothing wrong with having persistent definitions of the same domain on more than one machine, as long as only one machine at a time is running it; and after all, with live migration, if you _don't_ pass the --undefinesource flag, you will be left in the same situation where a persistent domain on the source is left behind even when the running domain has migrated.
I mean we should do this explicitly, e.g. with a flag that tells us we are handling an active domain via the offline path. I said this before, in the comment on the virsh change.
You ARE being explicit when you say 'virsh migrate --offline'. If the domain is transient, it must fail, but if the domain is persistent, then whether or not it is running, you are requesting that ONLY the offline state be migrated.
Without this, how do you know it's an offline migration on the target side?
By the presence or absence of the flag. If the flag is present, you are
It seems flags was forgotten in qemuDomainMigratePrepare3, so any preparation for offline migration will be impossible, won't it? Do you mean I should change the MigratePrepare functions to add flags as a parameter?
Huh? qemuDomainMigratePrepare3 already has a flags argument (arg 8), and it is directly related to the flags argument that was originally passed to virDomainMigrate() in libvirt.c.

--
Eric Blake  eblake@redhat.com  +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Thu, 2012-11-08 at 21:06 -0700, Eric Blake wrote:
On 11/08/2012 08:54 PM, li guang wrote:
On Thu, 2012-11-08 at 20:28 -0700, Eric Blake wrote:
On 11/08/2012 07:57 PM, li guang wrote:
What if you want to offline migrate a domain that is currently running? I think it's better to first move qemuDomainCheckEjectableMedia down to qemuMigrationBegin in a separate patch and then just use the code from v12 (without using the offline label).
Sorry, I don't want to do an offline migration for a domain that's running.
Why not, when we've already argued it can be useful? You don't have to support it in the same patch, but you should at least take the advice on how to refactor things so that someone else that does like the idea can extend things to provide it. There's nothing wrong with having persistent definitions of the same domain on more than one machine, as long as only one machine at a time is running it; and after all, with live migration, if you _don't_ pass the --undefinesource flag, you will be left in the same situation where a persistent domain on the source is left behind even when the running domain has migrated.
I mean we should do this explicitly, e.g. with a flag that tells us we are handling an active domain via the offline path. I said this before, in the comment on the virsh change.
You ARE being explicit when you say 'virsh migrate --offline'. If the domain is transient, it must fail, but if the domain is persistent, then whether or not it is running, you are requesting that ONLY the offline state be migrated.
Reasonable? Maybe.
Without this, how do you know it's an offline migration on the target side?
By the presence or absence of the flag. If the flag is present, you are
It seems flags was forgotten in qemuDomainMigratePrepare3, so any preparation for offline migration will be impossible, won't it? Do you mean I should change the MigratePrepare functions to add flags as a parameter?
Huh? qemuDomainMigratePrepare3 already has a flags argument (arg 8), and it is directly related to the flags argument that was originally passed to virDomainMigrate() in libvirt.c.
Please check qemuMigrationPrepareDirect, called from qemuDomainMigratePrepare3.

--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china

On 11/08/2012 09:31 PM, li guang wrote:
It seems flags was forgotten in qemuDomainMigratePrepare3, so any preparation for offline migration will be impossible, won't it? Do you mean I should change the MigratePrepare functions to add flags as a parameter?
Huh? qemuDomainMigratePrepare3 already has a flags argument (arg 8), and it is directly related to the flags argument that was originally passed to virDomainMigrate() in libvirt.c.
Please check qemuMigrationPrepareDirect, called from qemuDomainMigratePrepare3.
qemuMigrationPrepareDirect is not a callback, and not a public API, so you are free to change its signature and add a flags parameter if that is necessary to get the information to the right places. The only signatures you can't change are in src/libvirt.c and in functions serving as callbacks registered through signatures in src/driver.h.

--
Eric Blake  eblake@redhat.com  +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Thu, 2012-11-08 at 21:50 -0700, Eric Blake wrote:
On 11/08/2012 09:31 PM, li guang wrote:
It seems flags was forgotten in qemuDomainMigratePrepare3, so any preparation for offline migration will be impossible, won't it? Do you mean I should change the MigratePrepare functions to add flags as a parameter?
Huh? qemuDomainMigratePrepare3 already has a flags argument (arg 8), and it is directly related to the flags argument that was originally passed to virDomainMigrate() in libvirt.c.
Please check qemuMigrationPrepareDirect, called from qemuDomainMigratePrepare3.
qemuMigrationPrepareDirect is not a callback, and not a public API, so you are free to change its signature and add a flags parameter if that is necessary to get the information to the right places. The only signatures you can't change are in src/libvirt.c and in functions serving as callbacks registered through signatures in src/driver.h.
OK, will change.

--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china

On Fri, Nov 09, 2012 at 10:57:23 +0800, li guang wrote: ...
@@ -2150,6 +2192,9 @@ qemuMigrationRun(struct qemud_driver *driver,
         return -1;
     }

+    if (flags & VIR_MIGRATE_OFFLINE)
+        return 0;
+
     if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
                                        QEMU_MIGRATION_COOKIE_GRAPHICS)))
         goto cleanup;
I still think we should not even get into qemuMigrationRun when doing offline migration.
No, it will get there because I did not touch the migrationPerform path. Initially I didn't want to change any code in libvirt.c, so I had to keep offline migration walking through the whole path like a normal migration; otherwise there would be lots of fixes, any of which might break the migration path.
Well, and that's the thing that actually needs fixing. The migration code in libvirt.c and the code doing peer-to-peer migration in qemu_migration.c are very similar because they are doing almost the same job (only at different levels). It makes sense to change them both at the same time. Why do you want to avoid changing code in libvirt.c?

Jirka

On Fri, 2012-11-09 at 13:00 +0100, Jiri Denemark wrote:
On Fri, Nov 09, 2012 at 10:57:23 +0800, li guang wrote: ...
@@ -2150,6 +2192,9 @@ qemuMigrationRun(struct qemud_driver *driver,
         return -1;
     }

+    if (flags & VIR_MIGRATE_OFFLINE)
+        return 0;
+
     if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
                                        QEMU_MIGRATION_COOKIE_GRAPHICS)))
         goto cleanup;
I still think we should not even get into qemuMigrationRun when doing offline migration.
No, it will get there because I did not touch the migrationPerform path. Initially I didn't want to change any code in libvirt.c, so I had to keep offline migration walking through the whole path like a normal migration; otherwise there would be lots of fixes, any of which might break the migration path.
Well, and that's the thing that actually needs fixing. The migration code in libvirt.c and the code doing peer-to-peer migration in qemu_migration.c are very similar because they are doing almost the same job (only at different levels). It makes sense to change them both at the same time. Why do you want to avoid changing code in libvirt.c?
First, I think my patch is clean enough to do offline migration. Second, someone told me before to keep libvirt's public API stable (it's not clear who or when), so I was afraid to change it. Do you mean I should take care of fixing or refactoring the redundant code this time? Maybe I can try to do something after this patch.
Jirka
--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china

On Mon, Nov 12, 2012 at 09:26:30 +0800, li guang wrote:
On Fri, 2012-11-09 at 13:00 +0100, Jiri Denemark wrote:
On Fri, Nov 09, 2012 at 10:57:23 +0800, li guang wrote:
Well, and that's the thing that actually needs fixing. The migration code in libvirt.c and the code doing peer-to-peer migration in qemu_migration.c are very similar because they are doing almost the same job (only at different levels). It makes sense to change them both at the same time. Why do you want to avoid changing code in libvirt.c?
First, I think my patch is clean enough to do offline migration. Second, someone told me before to keep libvirt's public API stable (it's not clear who or when), so I was afraid to change it.
Well, don't worry, I'm aware of the API/ABI stability we need to maintain in libvirt, and I wouldn't suggest you do something that would break anything in that area. Of course you can't add new parameters to existing public functions, for example. But changing the code inside to support new flags is allowed.
you mean I should take care of fixing or re-factoring redundant code this time? maybe I can try to do something after this patch.
No. It's just that the migration code in libvirt.c and the p2p code in qemu_migration.c are very similar since they are doing the same thing at different levels. You can't really refactor that and I'm not asking you to do so. However, since the code logic is similar in both places, it's wise to keep it that way and just skip the Perform phase for offline migrations regardless of the actual type of migration. I'd much rather see that than hacking somewhere inside the code doing the Perform phase to exit if it notices it should not have been run at all.

Jirka

On Fri, 2012-11-16 at 11:21 +0100, Jiri Denemark wrote:
On Mon, Nov 12, 2012 at 09:26:30 +0800, li guang wrote:
On Fri, 2012-11-09 at 13:00 +0100, Jiri Denemark wrote:
On Fri, Nov 09, 2012 at 10:57:23 +0800, li guang wrote:
Well, and that's the thing that actually needs fixing. The migration code in libvirt.c and the code doing peer-to-peer migration in qemu_migration.c are very similar because they are doing almost the same job (only at different levels). It makes sense to change them both at the same time. Why do you want to avoid changing code in libvirt.c?
First, I think my patch is clean enough to do offline migration. Second, someone told me before to keep libvirt's public API stable (it's not clear who or when), so I was afraid to change it.
Well, don't worry, I'm aware of the API/ABI stability we need to maintain in libvirt, and I wouldn't suggest you do something that would break anything in that area. Of course you can't add new parameters to existing public functions, for example. But changing the code inside to support new flags is allowed.
Okay.
Do you mean I should take care of fixing or refactoring the redundant code this time? Maybe I can try to do something after this patch.
No. It's just that the migration code in libvirt.c and the p2p code in qemu_migration.c are very similar since they are doing the same thing at different levels. You can't really refactor that and I'm not asking you to do so. However, since the code logic is similar in both places, it's wise to keep it that way and just skip the Perform phase for offline migrations regardless of the actual type of migration. I'd much rather see that than hacking somewhere inside the code doing the Perform phase to exit if it notices it should not have been run at all.
Jirka
--
li guang  lig.fnst@cn.fujitsu.com
linux kernel team at FNST, china
participants (3): Eric Blake, Jiri Denemark, li guang