[libvirt PATCH 0/6] Introduce Local Migration Support in Libvirt

I'm (re-)sending this patch series on behalf of Shaju Abraham
<shaju.abraham@nutanix.com> who has tried to send this several times
already. Red Hat's email infrastructure is broken, accepting the mails and
then failing to deliver them to mailman, or any other Red Hat address.
Unfortunately it means that while we can send comments back to Shaju on
this thread, subscribers will then probably fail to see any responses Shaju
tries to give :-( To say this is bad is an understatement. I have yet
another ticket open tracking & escalating this awful problem but can't give
any ETA on a fix :-(

Anyway, with that out of the way, here's Shaju's original cover letter
below....

1) What is this patch series about?

Local live migration of a VM means live migrating a VM instance within the
same node. Traditional libvirt live migration involves migrating the VM
from a source node to a remote node. Local migration is forbidden in
libvirt for a myriad of reasons. This patch series enables local migration
in libvirt.

2) Why is local migration important?

The ability to live migrate a VM locally paves the way for hypervisor
upgrades without shutting down the VM. For example, to upgrade qemu after a
security update, we can locally migrate the VM to the new qemu instance. By
utilising capabilities like "bypass-shared-memory" in qemu, the hypervisor
upgrades are faster.

3) Why is local migration difficult in libvirt?

Libvirt always assumes that the name/UUID pair is unique within a node.
During local migration there will be two different VMs with the same
UUID/name pair, which will confuse the management stack. There are other
path variables, like the monitor path, config paths etc., which assume that
the name/UUID pair is unique. So during migration the same monitor would be
used by both the source and the target. We cannot assign a temporary UUID
to the target VM, since the UUID is part of the machine ABI, which is
immutable.
To decouple the dependency on UUID/name, a new field (the domain id) is
included in all the paths that libvirt uses. This ensures that every
instance of the VM gets a unique path.

4) How is local migration designed?

Libvirt manages all the VM domain objects using two hash tables which are
indexed using either the UUID or the name. During live migration the domain
entry in the source node gets deleted and a new entry gets populated in the
target node, indexed using the same name/UUID. But for local migration
there is no remote node: the source and the target node are the same. So in
order to model the remote node, two more hash tables are introduced which
represent the hash tables of the remote node during migration.

Libvirt migration involves 5 stages:

1) Begin
2) Prepare
3) Perform
4) Finish
5) Confirm

Begin, Perform and Confirm are executed on the source node, whereas Prepare
and Finish are executed on the target node. In the case of local migration,
the Perform and Finish stages use the newly introduced 'remote hash tables'
and the rest of the stages use the 'source hash tables'. Once the migration
is completed, that is after the Confirm phase, the VM domain object is
moved from the 'remote hash tables' to the 'source hash tables'. This is
required so that other libvirt commands like 'virsh list' can display all
the VMs running in the node.

5) How to test local migration?

A new flag 'local' is added to the 'virsh migrate' command to enable local
migration. The syntax is:

  virsh migrate --live --local 'domain-id' qemu+ssh://ip-address/system

6) What are the known issues?

SELinux policy is known to have issues with creating the /dev/hugepages
entries during VM launch. In order to test local migration, disable SELinux
using 'setenforce 0'.
Shaju Abraham (6):
  Add VIR_MIGRATE_LOCAL flag to virsh migrate command
  Introduce remote hash tables and helper routines
  Add local migration support in QEMU Migration framework
  Modify close callback routines to handle local migration
  Make PATHs unique for a VM object instance
  Move the domain object from remote to source hash table

 include/libvirt/libvirt-domain.h |   6 +
 src/conf/virdomainobjlist.c      | 232 +++++++++++++++++++++++++++++--
 src/conf/virdomainobjlist.h      |  10 ++
 src/libvirt_private.syms         |   4 +
 src/qemu/qemu_conf.c             |   4 +-
 src/qemu/qemu_domain.c           |  28 +++-
 src/qemu/qemu_domain.h           |   2 +
 src/qemu/qemu_driver.c           |  46 +++++-
 src/qemu/qemu_migration.c        |  59 +++++---
 src/qemu/qemu_migration.h        |   5 +
 src/qemu/qemu_migration_cookie.c | 121 ++++++++--------
 src/qemu/qemu_migration_cookie.h |   2 +
 src/qemu/qemu_process.c          |   3 +-
 src/qemu/qemu_process.h          |   2 +
 src/util/virclosecallbacks.c     |  48 +++++--
 src/util/virclosecallbacks.h     |   3 +
 tools/virsh-domain.c             |   7 +
 17 files changed, 471 insertions(+), 111 deletions(-)

--
2.24.1

From: Shaju Abraham <shaju.abraham@nutanix.com>

In order to support local migration, the migration framework should
distinguish between local and non-local migrations. Add a flag to the
'virsh migrate' command so that the framework can easily identify a
local migration attempt.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 include/libvirt/libvirt-domain.h | 6 ++++++
 tools/virsh-domain.c             | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 5846e93d98..018635b4f1 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -849,6 +849,12 @@ typedef enum {
      */
     VIR_MIGRATE_PARALLEL = (1 << 17),

+    /* Setting the VIR_MIGRATE_LOCAL flag tells libvirt that the migration
+     * is within the host and bypasses the checks which refuse localhost
+     * migration. It indicates that the same UUID and name will be used for
+     * different domains.
+     */
+    VIR_MIGRATE_LOCAL = (1 << 18),
+
 } virDomainMigrateFlags;

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index e64e08e5da..b91117a6a2 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -10421,6 +10421,10 @@ static const vshCmdOptDef opts_migrate[] = {
      .type = VSH_OT_BOOL,
      .help = N_("peer-2-peer migration")
     },
+    {.name = "local",
+     .type = VSH_OT_BOOL,
+     .help = N_("local migration")
+    },
     {.name = "direct",
      .type = VSH_OT_BOOL,
      .help = N_("direct migration")
@@ -10820,6 +10824,9 @@ doMigrate(void *opaque)
         flags |= VIR_MIGRATE_LIVE;
     if (vshCommandOptBool(cmd, "p2p"))
         flags |= VIR_MIGRATE_PEER2PEER;
+
+    if (vshCommandOptBool(cmd, "local"))
+        flags |= VIR_MIGRATE_LOCAL;

     if (vshCommandOptBool(cmd, "tunnelled"))
         flags |= VIR_MIGRATE_TUNNELLED;
--
2.24.1

From: Shaju Abraham <shaju.abraham@nutanix.com>

Libvirt assumes that there will only ever be one VM in a host with a
specific UUID and name. The VM object pointers are stored in hash tables
and are indexed using the UUID or name. For local migrations both the
source and destination VMs reside on the same host and share the same
name and UUID, which introduces hash collisions. Introduce two more hash
tables to manage the destination VM object. The remote VM objects are
always stored in the newly introduced remote hash tables. The hash table
helper routines are modified to handle the newly introduced remote hash
tables. The existing functions are duplicated to operate on the remote
hash tables. This is done intentionally so that the changes to the
existing migration framework are minimal.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 src/conf/virdomainobjlist.c | 232 ++++++++++++++++++++++++++++++++++--
 src/conf/virdomainobjlist.h |  10 ++
 src/libvirt_private.syms    |   4 +
 3 files changed, 238 insertions(+), 8 deletions(-)

diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c
index 417025ae9f..37fc0c4d5e 100644
--- a/src/conf/virdomainobjlist.c
+++ b/src/conf/virdomainobjlist.c
@@ -48,10 +48,14 @@ struct _virDomainObjList {
     /* uuid string -> virDomainObj mapping
      * for O(1), lockless lookup-by-uuid */
     virHashTable *objs;
+    /* Hash table to support local migration */
+    virHashTable *remote_objs;

     /* name -> virDomainObj mapping for O(1),
      * lockless lookup-by-name */
     virHashTable *objsName;
+    /* Hash table to support local migration */
+    virHashTable *remote_objsName;
 };

@@ -80,6 +84,13 @@ virDomainObjListPtr virDomainObjListNew(void)
         virObjectUnref(doms);
         return NULL;
     }
+    if (!(doms->remote_objs = virHashCreate(50, virObjectFreeHashData)) ||
+        !(doms->remote_objsName = virHashCreate(50, virObjectFreeHashData))) {
+        virHashFree(doms->objs);
+        virHashFree(doms->objsName);
+        virObjectUnref(doms);
+        return NULL;
+    }

     return doms;
 }

@@ -92,6 +103,193 @@ static void virDomainObjListDispose(void *obj)
     virHashFree(doms->objs);
     virHashFree(doms->objsName);
 }

+/* Remote hash table manipulation routines */
+static virDomainObjPtr
+virDomainObjListFindByUUIDRemoteLocked(virDomainObjListPtr doms,
+                                       const unsigned char *uuid)
+{
+    char uuidstr[VIR_UUID_STRING_BUFLEN];
+    virDomainObjPtr obj;
+
+    virUUIDFormat(uuid, uuidstr);
+    obj = virHashLookup(doms->remote_objs, uuidstr);
+    virObjectRef(obj);
+    if (obj) {
+        virObjectLock(obj);
+        if (obj->removing) {
+            virObjectUnlock(obj);
+            virObjectUnref(obj);
+            obj = NULL;
+        }
+    }
+    return obj;
+}
+
+static virDomainObjPtr
+virDomainObjListFindByUUIDFromHashLocked(virHashTable *table,
+                                         const unsigned char *uuid)
+{
+    char uuidstr[VIR_UUID_STRING_BUFLEN];
+    virDomainObjPtr obj;
+
+    virUUIDFormat(uuid, uuidstr);
+    obj = virHashLookup(table, uuidstr);
+    virObjectRef(obj);
+    if (obj) {
+        virObjectLock(obj);
+        if (obj->removing) {
+            virObjectUnlock(obj);
+            virObjectUnref(obj);
+            obj = NULL;
+        }
+    }
+    return obj;
+}
+
+/**
+ * @doms: Domain object list
+ * @uuid: UUID to search the doms->remote_objs table for
+ *
+ * Lookup the @uuid in the doms->remote_objs hash table and return a
+ * locked and ref counted domain object if found. Caller is
+ * expected to use virDomainObjEndAPI when done with the object.
+ */
+virDomainObjPtr
+virDomainObjListFindByUUIDRemote(virDomainObjListPtr doms,
+                                 const unsigned char *uuid)
+{
+    virDomainObjPtr obj;
+
+    virObjectRWLockRead(doms);
+    obj = virDomainObjListFindByUUIDRemoteLocked(doms, uuid);
+    virObjectRWUnlock(doms);
+
+    return obj;
+}
+
+static virDomainObjPtr
+virDomainObjListFindByNameFromHashLocked(virHashTable *table,
+                                         const char *name)
+{
+    virDomainObjPtr obj;
+
+    obj = virHashLookup(table, name);
+    virObjectRef(obj);
+    if (obj) {
+        virObjectLock(obj);
+        if (obj->removing) {
+            virObjectUnlock(obj);
+            virObjectUnref(obj);
+            obj = NULL;
+        }
+    }
+    return obj;
+}
+
+/**
+ * @doms: Domain object list
+ * @name: Name to search the doms->remote_objsName table for
+ *
+ * Lookup the @name in the doms->remote_objsName hash table and return a
+ * locked and ref counted domain object if found. Caller is expected
+ * to use virDomainObjEndAPI when done with the object.
+ */
+virDomainObjPtr
+virDomainObjListFindByNameRemote(virDomainObjListPtr doms,
+                                 const char *name)
+{
+    virDomainObjPtr obj;
+
+    virObjectRWLockRead(doms);
+    obj = virDomainObjListFindByNameFromHashLocked(doms->remote_objsName, name);
+    virObjectRWUnlock(doms);
+
+    return obj;
+}
+
+/* The caller must hold lock on 'doms' in addition to 'virDomainObjListRemove'
+ * requirements
+ *
+ * Can be used to remove current element while iterating with
+ * virDomainObjListForEach
+ */
+int
+virDomainObjListRemoveRemoteLocked(virDomainObjListPtr doms,
+                                   virDomainObjPtr dom)
+{
+    char uuidstr[VIR_UUID_STRING_BUFLEN];
+    virDomainObjPtr remoteDom;
+    int ret = -1;
+
+    virUUIDFormat(dom->def->uuid, uuidstr);
+    remoteDom = virHashSteal(doms->remote_objs, uuidstr);
+    if (remoteDom != dom) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Can't get matching VM from uuid"));
+        return ret;
+    }
+    remoteDom = virHashSteal(doms->remote_objsName, dom->def->name);
+    if (remoteDom != dom) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Can't get matching VM from name"));
+        return ret;
+    }
+
+    return 0;
+}
+
+/**
+ * @doms: Pointer to the domain object list
+ * @dom: Domain pointer from either after Add or FindBy* API where the
+ *       @dom was successfully added to both the doms->remote_objs and
+ *       ->remote_objsName hash tables that now would need to be removed.
+ *
+ * The caller must hold a lock on the driver owning 'doms',
+ * and must also have locked and ref counted 'dom', to ensure
+ * no one else is either waiting for 'dom' or still using it.
+ *
+ * When this function returns, @dom will be removed from the hash
+ * tables and returned with the lock and refcnt that was present upon
+ * entry.
+ */
+int
+virDomainObjListRemoveRemote(virDomainObjListPtr doms,
+                             virDomainObjPtr dom)
+{
+    int ret = -1;
+
+    virObjectRef(dom);
+    virObjectUnlock(dom);
+    virObjectRWLockWrite(doms);
+    virObjectLock(dom);
+    ret = virDomainObjListRemoveRemoteLocked(doms, dom);
+    virObjectUnref(dom);
+    virObjectRWUnlock(doms);
+    return ret;
+}
+
+/*
+ * @doms: Pointer to the domain object list
+ * @dom: Domain pointer
+ *
+ * This helper routine inserts a domain object into the source hash tables.
+ * @dom should be successfully added to both the doms->objs and ->objsName
+ * hash tables. Required by the local migration use case.
+ */
+int virDomainObjListAddEntry(virDomainObjListPtr doms, virDomainObjPtr dom)
+{
+    int ret = -1;
+    char uuidstr[VIR_UUID_STRING_BUFLEN];
+
+    virUUIDFormat(dom->def->uuid, uuidstr);
+    virObjectRef(dom);
+    virObjectUnlock(dom);
+    virObjectRWLockWrite(doms);
+    virObjectLock(dom);
+    if (virHashAddEntry(doms->objs, uuidstr, dom) < 0) {
+        virObjectUnref(dom);
+        goto cleanup;
+    }
+    if (virHashAddEntry(doms->objsName, dom->def->name, dom) < 0) {
+        virHashRemoveEntry(doms->objs, uuidstr);
+        virObjectUnref(dom);
+        goto cleanup;
+    }
+    ret = 0;
+ cleanup:
+    virObjectUnlock(dom);
+    virObjectRWUnlock(doms);
+    return ret;
+}

 static int
 virDomainObjListSearchID(const void *payload,
@@ -225,6 +423,7 @@ virDomainObjListFindByName(virDomainObjListPtr doms,
 /**
  * @doms: Domain object list pointer
  * @vm: Domain object to be added
+ * @localMigration: flag to indicate local migration
  *
  * Upon entry @vm should have at least 1 ref and be locked.
  *
@@ -240,16 +439,22 @@ virDomainObjListFindByName(virDomainObjListPtr doms,
  */
 static int
 virDomainObjListAddObjLocked(virDomainObjListPtr doms,
-                             virDomainObjPtr vm)
+                             virDomainObjPtr vm,
+                             bool localMigration)
 {
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    virHashTable *table = doms->objs;
+    virHashTable *table_name = doms->objsName;
+
+    if (localMigration) {
+        table = doms->remote_objs;
+        table_name = doms->remote_objsName;
+    }

     virUUIDFormat(vm->def->uuid, uuidstr);
-    if (virHashAddEntry(doms->objs, uuidstr, vm) < 0)
+    if (virHashAddEntry(table, uuidstr, vm) < 0)
         return -1;
     virObjectRef(vm);

-    if (virHashAddEntry(doms->objsName, vm->def->name, vm) < 0) {
-        virHashRemoveEntry(doms->objs, uuidstr);
+    if (virHashAddEntry(table_name, vm->def->name, vm) < 0) {
+        virHashRemoveEntry(table, uuidstr);
         return -1;
     }
@@ -281,14 +486,25 @@ virDomainObjListAddLocked(virDomainObjListPtr doms,
                           unsigned int flags,
                           virDomainDefPtr *oldDef)
 {
-    virDomainObjPtr vm;
+    virDomainObjPtr vm = NULL;
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    bool localMigration = false;
+    virHashTable *table = doms->objs;
+    virHashTable *table_name = doms->objsName;

     if (oldDef)
         *oldDef = NULL;

+    /* Check whether this is a local migration. In that case the source VM
+     * already exists with the same UUID and name, so the new domain has to
+     * go into the remote hash tables instead.
+     */
+    if (flags & VIR_DOMAIN_OBJ_LIST_ADD_LIVE_LOCAL) {
+        table = doms->remote_objs;
+        table_name = doms->remote_objsName;
+        localMigration = true;
+    }
+
     /* See if a VM with matching UUID already exists */
-    if ((vm = virDomainObjListFindByUUIDLocked(doms, def->uuid))) {
+    if ((vm = virDomainObjListFindByUUIDFromHashLocked(table, def->uuid))) {
         if (vm->removing) {
             virReportError(VIR_ERR_OPERATION_FAILED,
                            _("domain '%s' is already being removed"),
@@ -325,7 +541,7 @@ virDomainObjListAddLocked(virDomainObjListPtr doms,
                                             oldDef);
     } else {
         /* UUID does not match, but if a name matches, refuse it */
-        if ((vm = virDomainObjListFindByNameLocked(doms, def->name))) {
+        if ((vm = virDomainObjListFindByNameFromHashLocked(table_name, def->name))) {
             virUUIDFormat(vm->def->uuid, uuidstr);
             virReportError(VIR_ERR_OPERATION_FAILED,
                            _("domain '%s' already exists with uuid %s"),
@@ -337,7 +553,7 @@ virDomainObjListAddLocked(virDomainObjListPtr doms,
             goto error;
         vm->def = def;

-        if (virDomainObjListAddObjLocked(doms, vm) < 0) {
+        if (virDomainObjListAddObjLocked(doms, vm, localMigration) < 0) {
             vm->def = NULL;
             goto error;
         }
@@ -563,7 +779,7 @@ virDomainObjListLoadStatus(virDomainObjListPtr doms,
         goto error;
     }

-    if (virDomainObjListAddObjLocked(doms, obj) < 0)
+    if (virDomainObjListAddObjLocked(doms, obj, false) < 0)
         goto error;

     if (notify)
diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h
index 6150e13aa4..a3253787ff 100644
--- a/src/conf/virdomainobjlist.h
+++ b/src/conf/virdomainobjlist.h
@@ -33,12 +33,18 @@
 virDomainObjPtr virDomainObjListFindByID(virDomainObjListPtr doms,
                                          int id);
 virDomainObjPtr virDomainObjListFindByUUID(virDomainObjListPtr doms,
                                            const unsigned char *uuid);
+virDomainObjPtr virDomainObjListFindByUUIDRemote(virDomainObjListPtr doms,
+                                                 const unsigned char *uuid);
 virDomainObjPtr virDomainObjListFindByName(virDomainObjListPtr doms,
                                            const char *name);
+virDomainObjPtr virDomainObjListFindByNameRemote(virDomainObjListPtr doms,
+                                                 const char *name);
+int virDomainObjListAddEntry(virDomainObjListPtr doms, virDomainObjPtr dom);

 enum {
     VIR_DOMAIN_OBJ_LIST_ADD_LIVE = (1 << 0),
     VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE = (1 << 1),
+    VIR_DOMAIN_OBJ_LIST_ADD_LIVE_LOCAL = (1 << 2),
 };

 virDomainObjPtr virDomainObjListAdd(virDomainObjListPtr doms,
                                     virDomainDefPtr def,
@@ -61,6 +67,10 @@ void virDomainObjListRemove(virDomainObjListPtr doms,
                             virDomainObjPtr dom);
 void virDomainObjListRemoveLocked(virDomainObjListPtr doms,
                                   virDomainObjPtr dom);
+int virDomainObjListRemoveRemoteLocked(virDomainObjListPtr doms,
+                                       virDomainObjPtr dom);
+int virDomainObjListRemoveRemote(virDomainObjListPtr doms,
+                                 virDomainObjPtr dom);

 int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms,
                                    const char *configDir,
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 970f412527..c104313319 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1045,7 +1045,9 @@
 virDomainObjListConvert;
 virDomainObjListExport;
 virDomainObjListFindByID;
 virDomainObjListFindByName;
+virDomainObjListFindByNameRemote;
 virDomainObjListFindByUUID;
+virDomainObjListFindByUUIDRemote;
 virDomainObjListForEach;
 virDomainObjListGetActiveIDs;
 virDomainObjListGetInactiveNames;
@@ -1054,7 +1056,9 @@
 virDomainObjListNew;
 virDomainObjListNumOfDomains;
 virDomainObjListRemove;
 virDomainObjListRemoveLocked;
+virDomainObjListRemoveRemote;
 virDomainObjListRename;
+virDomainObjListAddEntry;

 # conf/virdomainsnapshotobjlist.h
--
2.24.1

From: Shaju Abraham <shaju.abraham@nutanix.com>

Local migration is selected via the 'local' option of the virsh migrate
command. Once 'local' is specified on the command line, the migration
framework needs to follow the local migration path. Use the passed flag
to selectively bypass the checks that otherwise prevent local migration
in the QEMU migration framework.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 src/qemu/qemu_migration.c        |  46 +++++-----
 src/qemu/qemu_migration.h        |   1 +
 src/qemu/qemu_migration_cookie.c | 121 ++++++++++++++-------------
 src/qemu/qemu_migration_cookie.h |   2 +
 src/qemu/qemu_process.c          |   3 +-
 src/qemu/qemu_process.h          |   2 +
 6 files changed, 98 insertions(+), 77 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index d7814208a2..d22daeac9c 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2334,15 +2334,30 @@ qemuMigrationDstPrepareAny(virQEMUDriverPtr driver,
     bool tunnel = !!st;
     char *xmlout = NULL;
     unsigned int cookieFlags;
-    unsigned int startFlags;
+    unsigned int startFlags = 0;
+    unsigned int eatCookieFlags;
     qemuProcessIncomingDefPtr incoming = NULL;
     bool taint_hook = false;
     bool stopProcess = false;
     bool relabel = false;
     int rv;
     char *tlsAlias = NULL;
+    unsigned long list_flags = VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
+                               VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE;
+
+    eatCookieFlags = QEMU_MIGRATION_COOKIE_LOCKSTATE |
+                     QEMU_MIGRATION_COOKIE_NBD |
+                     QEMU_MIGRATION_COOKIE_MEMORY_HOTPLUG |
+                     QEMU_MIGRATION_COOKIE_CPU_HOTPLUG |
+                     QEMU_MIGRATION_COOKIE_CPU |
+                     QEMU_MIGRATION_COOKIE_ALLOW_REBOOT |
+                     QEMU_MIGRATION_COOKIE_CAPS;

     virNWFilterReadLockFilterUpdates();

+    if (flags & VIR_MIGRATE_LOCAL) {
+        list_flags |= VIR_DOMAIN_OBJ_LIST_ADD_LIVE_LOCAL;
+        eatCookieFlags |= QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION;
+        startFlags |= VIR_QEMU_PROCESS_START_LOCAL_MIG;
+    }
+
     if (flags & VIR_MIGRATE_OFFLINE) {
         if (flags & (VIR_MIGRATE_NON_SHARED_DISK |
@@ -2439,19 +2454,12 @@ qemuMigrationDstPrepareAny(virQEMUDriverPtr driver,
      * point in having the domain in the list at that point. */
     if (!(mig = qemuMigrationEatCookie(driver, *def, origname, NULL,
                                        cookiein, cookieinlen,
-                                       QEMU_MIGRATION_COOKIE_LOCKSTATE |
-                                       QEMU_MIGRATION_COOKIE_NBD |
-                                       QEMU_MIGRATION_COOKIE_MEMORY_HOTPLUG |
-                                       QEMU_MIGRATION_COOKIE_CPU_HOTPLUG |
-                                       QEMU_MIGRATION_COOKIE_CPU |
-                                       QEMU_MIGRATION_COOKIE_ALLOW_REBOOT |
-                                       QEMU_MIGRATION_COOKIE_CAPS)))
+                                       eatCookieFlags)))
         goto cleanup;

     if (!(vm = virDomainObjListAdd(driver->domains, *def,
                                    driver->xmlopt,
-                                   VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
-                                   VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
+                                   list_flags,
                                    NULL)))
         goto cleanup;

     *def = NULL;
@@ -2494,9 +2502,7 @@ qemuMigrationDstPrepareAny(virQEMUDriverPtr driver,
                        _("cannot create pipe for tunnelled migration"));
         goto stopjob;
     }
-
-    startFlags = VIR_QEMU_PROCESS_START_AUTODESTROY;
-
+    startFlags |= VIR_QEMU_PROCESS_START_AUTODESTROY;
     if (qemuProcessInit(driver, vm, mig->cpu, QEMU_ASYNC_JOB_MIGRATION_IN,
                         true, startFlags) < 0)
         goto stopjob;
@@ -2938,6 +2944,7 @@ qemuMigrationSrcConfirmPhase(virQEMUDriverPtr driver,
     qemuMigrationCookiePtr mig;
     virObjectEventPtr event;
     int rv = -1;
+    unsigned int cookieFlags = 0;
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     qemuDomainObjPrivatePtr priv = vm->privateData;
     qemuDomainJobInfoPtr jobInfo = NULL;
@@ -2948,15 +2955,16 @@ qemuMigrationSrcConfirmPhase(virQEMUDriverPtr driver,
               flags, retcode);

     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
-
+    cookieFlags = QEMU_MIGRATION_COOKIE_STATS;
+    if (flags & VIR_MIGRATE_LOCAL)
+        cookieFlags |= QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION;
     qemuMigrationJobSetPhase(driver, vm,
                              retcode == 0
                              ? QEMU_MIGRATION_PHASE_CONFIRM3
                              : QEMU_MIGRATION_PHASE_CONFIRM3_CANCELLED);

     if (!(mig = qemuMigrationEatCookie(driver, vm->def, priv->origname, priv,
-                                       cookiein, cookieinlen,
-                                       QEMU_MIGRATION_COOKIE_STATS)))
+                                       cookiein, cookieinlen, cookieFlags)))
         goto cleanup;

     if (retcode == 0)
@@ -3420,7 +3428,8 @@ qemuMigrationSrcRun(virQEMUDriverPtr driver,
               cookieout, cookieoutlen, flags, resource,
               spec, spec->destType, spec->fwdType, dconn,
               NULLSTR(graphicsuri), nmigrate_disks, migrate_disks);
-
+    if (flags & VIR_MIGRATE_LOCAL)
+        cookieFlags |= QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION;
     if (flags & VIR_MIGRATE_NON_SHARED_DISK) {
         migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK;
         cookieFlags |= QEMU_MIGRATION_COOKIE_NBD;
@@ -4955,6 +4964,9 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver,
      * even though VIR_MIGRATE_PERSIST_DEST was not used.
      */
     cookie_flags |= QEMU_MIGRATION_COOKIE_PERSISTENT;

+    if (flags & VIR_MIGRATE_LOCAL)
+        cookie_flags |= QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION;
+
     if (!(mig = qemuMigrationEatCookie(driver, vm->def, priv->origname, priv,
                                        cookiein, cookieinlen, cookie_flags)))
         goto endjob;
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index b6f88d3fd9..2cb4bb9023 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -59,6 +59,7 @@
      VIR_MIGRATE_POSTCOPY | \
      VIR_MIGRATE_TLS | \
      VIR_MIGRATE_PARALLEL | \
+     VIR_MIGRATE_LOCAL | \
      0)

 /* All supported migration parameters and their types. */
diff --git a/src/qemu/qemu_migration_cookie.c b/src/qemu/qemu_migration_cookie.c
index 299bf17c9e..18544dbaac 100644
--- a/src/qemu/qemu_migration_cookie.c
+++ b/src/qemu/qemu_migration_cookie.c
@@ -51,6 +51,7 @@ VIR_ENUM_IMPL(qemuMigrationCookieFlag,
               "cpu",
               "allowReboot",
               "capabilities",
+              "local-migration",
 );

@@ -1199,69 +1200,71 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
     /* We don't store the uuid, name, hostname, or hostuuid
      * values. We just compare them to local data to do some
-     * sanity checking on migration operation
+     * sanity checking on migration operation. In case of local
+     * migration bypass the checks. We know that we are migrating
+     * to the same host and that the name and UUID already exist
+     * on the host.
      */

     /* Extract domain name */
-    if (!(tmp = virXPathString("string(./name[1])", ctxt))) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       "%s", _("missing name element in migration data"));
-        goto error;
-    }
-    if (STRNEQ(tmp, mig->name)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Incoming cookie data had unexpected name %s vs %s"),
-                       tmp, mig->name);
-        goto error;
-    }
-    VIR_FREE(tmp);
-
-    /* Extract domain uuid */
-    tmp = virXPathString("string(./uuid[1])", ctxt);
-    if (!tmp) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       "%s", _("missing uuid element in migration data"));
-        goto error;
-    }
-    virUUIDFormat(mig->uuid, uuidstr);
-    if (STRNEQ(tmp, uuidstr)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Incoming cookie data had unexpected UUID %s vs %s"),
-                       tmp, uuidstr);
-        goto error;
-    }
-    VIR_FREE(tmp);
-
-    /* Check & forbid "localhost" migration */
-    if (!(mig->remoteHostname = virXPathString("string(./hostname[1])", ctxt))) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       "%s", _("missing hostname element in migration data"));
-        goto error;
-    }
-    if (STREQ(mig->remoteHostname, mig->localHostname)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Attempt to migrate guest to the same host %s"),
-                       mig->remoteHostname);
-        goto error;
-    }
-
-    if (!(tmp = virXPathString("string(./hostuuid[1])", ctxt))) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       "%s", _("missing hostuuid element in migration data"));
-        goto error;
-    }
-    if (virUUIDParse(tmp, mig->remoteHostuuid) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       "%s", _("malformed hostuuid element in migration data"));
-        goto error;
-    }
-    if (memcmp(mig->remoteHostuuid, mig->localHostuuid, VIR_UUID_BUFLEN) == 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Attempt to migrate guest to the same host %s"),
-                       tmp);
-        goto error;
-    }
-    VIR_FREE(tmp);
+    if (!(flags & QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION)) {
+        if (!(tmp = virXPathString("string(./name[1])", ctxt))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("missing name element in migration data"));
+            goto error;
+        }
+        if (STRNEQ(tmp, mig->name)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Incoming cookie data had unexpected name %s vs %s"),
+                           tmp, mig->name);
+            goto error;
+        }
+        VIR_FREE(tmp);
+        /* Extract domain uuid */
+        tmp = virXPathString("string(./uuid[1])", ctxt);
+        if (!tmp) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("missing uuid element in migration data"));
+            goto error;
+        }
+        virUUIDFormat(mig->uuid, uuidstr);
+        if (STRNEQ(tmp, uuidstr)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Incoming cookie data had unexpected UUID %s vs %s"),
+                           tmp, uuidstr);
+            goto error;
+        }
+        VIR_FREE(tmp);
+        /* Check & forbid "localhost" migration */
+        if (!(mig->remoteHostname = virXPathString("string(./hostname[1])", ctxt))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("missing hostname element in migration data"));
+            goto error;
+        }
+        if (STREQ(mig->remoteHostname, mig->localHostname)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Attempt to migrate guest to the same host %s"),
+                           mig->remoteHostname);
+            goto error;
+        }
+        if (!(tmp = virXPathString("string(./hostuuid[1])", ctxt))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("missing hostuuid element in migration data"));
+            goto error;
+        }
+        if (virUUIDParse(tmp, mig->remoteHostuuid) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("malformed hostuuid element in migration data"));
+            goto error;
+        }
+        if (memcmp(mig->remoteHostuuid, mig->localHostuuid, VIR_UUID_BUFLEN) == 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Attempt to migrate guest to the same host %s"),
+                           tmp);
+            goto error;
+        }
+        VIR_FREE(tmp);
+    }

     /* Check to ensure all mandatory features from XML are also
      * present in 'flags' */
diff --git a/src/qemu/qemu_migration_cookie.h b/src/qemu/qemu_migration_cookie.h
index 20e1ed60ca..7574672fd6 100644
--- a/src/qemu/qemu_migration_cookie.h
+++ b/src/qemu/qemu_migration_cookie.h
@@ -33,6 +33,7 @@ typedef enum {
     QEMU_MIGRATION_COOKIE_FLAG_CPU,
     QEMU_MIGRATION_COOKIE_FLAG_ALLOW_REBOOT,
     QEMU_MIGRATION_COOKIE_FLAG_CAPS,
+    QEMU_MIGRATION_COOKIE_FLAG_LOCAL_MIGRATION,

     QEMU_MIGRATION_COOKIE_FLAG_LAST
 } qemuMigrationCookieFlags;
@@ -51,6 +52,7 @@ typedef enum {
     QEMU_MIGRATION_COOKIE_CPU = (1 << QEMU_MIGRATION_COOKIE_FLAG_CPU),
     QEMU_MIGRATION_COOKIE_ALLOW_REBOOT = (1 << QEMU_MIGRATION_COOKIE_FLAG_ALLOW_REBOOT),
     QEMU_MIGRATION_COOKIE_CAPS = (1 << QEMU_MIGRATION_COOKIE_FLAG_CAPS),
+    QEMU_MIGRATION_COOKIE_LOCAL_MIGRATION = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCAL_MIGRATION),
 } qemuMigrationCookieFeatures;

 typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 57a60c568a..df0884a233 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -6679,7 +6679,8 @@ qemuProcessLaunch(virConnectPtr conn,
                   VIR_QEMU_PROCESS_START_PAUSED |
                   VIR_QEMU_PROCESS_START_AUTODESTROY |
                   VIR_QEMU_PROCESS_START_NEW |
-                  VIR_QEMU_PROCESS_START_GEN_VMID, -1);
+                  VIR_QEMU_PROCESS_START_GEN_VMID |
+                  VIR_QEMU_PROCESS_START_LOCAL_MIG, -1);

     cfg = virQEMUDriverGetConfig(driver);
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index 9af9f967fd..7aa67158b0 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -81,6 +81,8 @@ typedef enum {
     VIR_QEMU_PROCESS_START_GEN_VMID = 1 << 5, /* Generate a new VMID */
     VIR_QEMU_PROCESS_START_STANDALONE = 1 << 6, /* Require CLI args to be
                                                    usable standalone, ie no
                                                    FD passing and the like */
+    VIR_QEMU_PROCESS_START_LOCAL_MIG = 1 << 7, /* internal, new VM is starting
+                                                  for local migration */
 } qemuProcessStartFlags;

 int qemuProcessStart(virConnectPtr conn,
--
2.24.1

From: Shaju Abraham <shaju.abraham@nutanix.com>

At the time of registration, the close callback routines are inserted
into a hash table using the UUID as the index. Since the UUID is unique
within a host, a single hash entry suffices to represent a domain
object. But in the case of local migration, the UUID is shared between
the source and destination VMs. This creates a hash collision since both
the source and destination VMs are on the same host. In order to
uniquely identify the source and destination VMs, the domain id is
appended to the UUID to generate a unique hash key. The routines are
modified to handle the above change.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 src/util/virclosecallbacks.c | 48 +++++++++++++++++++++++++++---------
 src/util/virclosecallbacks.h |  3 +++
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/src/util/virclosecallbacks.c b/src/util/virclosecallbacks.c
index 200577e18e..a8ab46d4d7 100644
--- a/src/util/virclosecallbacks.c
+++ b/src/util/virclosecallbacks.c
@@ -24,6 +24,7 @@
 #include "virclosecallbacks.h"
 #include "virlog.h"
 #include "virobject.h"
+#include "virstring.h"

 #define VIR_FROM_THIS VIR_FROM_NONE

@@ -34,6 +35,7 @@ typedef virDriverCloseDef *virDriverCloseDefPtr;
 struct _virDriverCloseDef {
     virConnectPtr conn;
     virCloseCallback cb;
+    virDomainObjPtr vm;
 };

 struct _virCloseCallbacks {
@@ -85,7 +87,6 @@ virCloseCallbacksDispose(void *obj)
     virHashFree(closeCallbacks->list);
 }

-
 int
 virCloseCallbacksSet(virCloseCallbacksPtr closeCallbacks,
                      virDomainObjPtr vm,
@@ -93,6 +94,7 @@ virCloseCallbacksSet(virCloseCallbacksPtr closeCallbacks,
                      virCloseCallback cb)
 {
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    g_autofree char *uniqstr = NULL;
     virDriverCloseDefPtr closeDef;
     int ret = -1;

@@ -100,9 +102,10 @@ virCloseCallbacksSet(virCloseCallbacksPtr closeCallbacks,
     VIR_DEBUG("vm=%s, uuid=%s, conn=%p, cb=%p",
               vm->def->name, uuidstr, conn, cb);

+    uniqstr = g_strdup_printf("%s%d", uuidstr, vm->def->id);
     virObjectLock(closeCallbacks);

-    closeDef = virHashLookup(closeCallbacks->list, uuidstr);
+    closeDef = virHashLookup(closeCallbacks->list, uniqstr);
     if (closeDef) {
         if (closeDef->conn != conn) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -125,7 +128,8 @@ virCloseCallbacksSet(virCloseCallbacksPtr closeCallbacks,
         closeDef->conn = conn;
         closeDef->cb = cb;
-        if (virHashAddEntry(closeCallbacks->list, uuidstr, closeDef) < 0) {
+        closeDef->vm = vm;
+        if (virHashAddEntry(closeCallbacks->list, uniqstr, closeDef) < 0) {
             VIR_FREE(closeDef);
             goto cleanup;
         }
@@ -145,6 +149,7 @@ virCloseCallbacksUnset(virCloseCallbacksPtr closeCallbacks,
                        virCloseCallback cb)
 {
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    g_autofree char *uniqstr = NULL;
     virDriverCloseDefPtr closeDef;
     int ret = -1;

@@ -152,9 +157,10 @@ virCloseCallbacksUnset(virCloseCallbacksPtr closeCallbacks,
     VIR_DEBUG("vm=%s, uuid=%s, cb=%p",
               vm->def->name, uuidstr, cb);

+    uniqstr = g_strdup_printf("%s%d", uuidstr, vm->def->id);
     virObjectLock(closeCallbacks);

-    closeDef = virHashLookup(closeCallbacks->list, uuidstr);
+    closeDef = virHashLookup(closeCallbacks->list, uniqstr);
     if (!closeDef)
         goto cleanup;

@@ -165,7 +171,7 @@ virCloseCallbacksUnset(virCloseCallbacksPtr closeCallbacks,
         goto cleanup;
     }

-    if (virHashRemoveEntry(closeCallbacks->list, uuidstr) < 0)
+    if (virHashRemoveEntry(closeCallbacks->list, uniqstr) < 0)
         goto cleanup;

     virObjectUnref(vm);
@@ -183,16 +189,18 @@ virCloseCallbacksGet(virCloseCallbacksPtr closeCallbacks,
                      virConnectPtr conn)
 {
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    g_autofree char *uniqstr = NULL;
     virDriverCloseDefPtr closeDef;
     virCloseCallback cb = NULL;

     virUUIDFormat(vm->def->uuid, uuidstr);
     VIR_DEBUG("vm=%s, uuid=%s, conn=%p",
               vm->def->name, uuidstr, conn);

+    uniqstr = g_strdup_printf("%s%d", uuidstr, vm->def->id);
     virObjectLock(closeCallbacks);

-    closeDef = virHashLookup(closeCallbacks->list, uuidstr);
+    closeDef = virHashLookup(closeCallbacks->list, uniqstr);
     if (closeDef &&
         (!conn || closeDef->conn == conn))
         cb = closeDef->cb;

@@ -207,15 +215,17 @@ virCloseCallbacksGetConn(virCloseCallbacksPtr closeCallbacks,
                          virDomainObjPtr vm)
 {
     char uuidstr[VIR_UUID_STRING_BUFLEN];
+    g_autofree char *uniqstr = NULL;
     virDriverCloseDefPtr closeDef;
     virConnectPtr conn = NULL;

     virUUIDFormat(vm->def->uuid, uuidstr);
     VIR_DEBUG("vm=%s, uuid=%s",
               vm->def->name, uuidstr);

+    uniqstr = g_strdup_printf("%s%d", uuidstr, vm->def->id);
     virObjectLock(closeCallbacks);

-    closeDef = virHashLookup(closeCallbacks->list, uuidstr);
+    closeDef = virHashLookup(closeCallbacks->list, uniqstr);
     if (closeDef)
         conn = closeDef->conn;

@@ -231,6 +241,7 @@ typedef virCloseCallbacksListEntry *virCloseCallbacksListEntryPtr;
 struct _virCloseCallbacksListEntry {
     unsigned char uuid[VIR_UUID_BUFLEN];
     virCloseCallback callback;
+    virDomainObjPtr vm;
 };

 typedef struct _virCloseCallbacksList virCloseCallbacksList;
@@ -274,6 +285,7 @@ virCloseCallbacksGetOne(void *payload,
     memcpy(data->list->entries[data->list->nentries - 1].uuid,
            uuid, VIR_UUID_BUFLEN);
     data->list->entries[data->list->nentries - 1].callback = closeDef->cb;
+    data->list->entries[data->list->nentries - 1].vm = closeDef->vm;

     return 0;
 }
@@ -302,8 +314,18 @@ virCloseCallbacksGetForConn(virCloseCallbacksPtr closeCallbacks,
     return list;
 }

-
-
+virDomainObjPtr virGetMatchingDomain(virDomainObjListPtr domains,
+                                     unsigned char *uuid,
+                                     virDomainObjPtr dom)
+{
+    virDomainObjPtr vm = NULL;
+
+    vm = virDomainObjListFindByUUID(domains, uuid);
+    if (!vm || vm != dom) {
+        vm = virDomainObjListFindByUUIDRemote(domains, uuid);
+        if (!vm || vm != dom)
+            return NULL;
+    }
+    return vm;
+}

 void
 virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks,
                      virConnectPtr conn,
@@ -329,8 +351,10 @@ virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks,
     for (i = 0; i < list->nentries; i++) {
         char uuidstr[VIR_UUID_STRING_BUFLEN];
+        g_autofree char *uniqstr = NULL;
+
         virUUIDFormat(list->entries[i].uuid, uuidstr);
-        virHashRemoveEntry(closeCallbacks->list, uuidstr);
+        uniqstr = g_strdup_printf("%s%d", uuidstr, list->entries[i].vm->def->id);
+
virHashRemoveEntry(closeCallbacks->list, uniqstr); } virObjectUnlock(closeCallbacks); @@ -338,8 +362,8 @@ virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks, virDomainObjPtr vm; /* Grab a ref and lock to the vm */ - if (!(vm = virDomainObjListFindByUUID(domains, - list->entries[i].uuid))) { + if (!(vm = virGetMatchingDomain(domains, + list->entries[i].uuid, list->entries[i].vm))) { char uuidstr[VIR_UUID_STRING_BUFLEN]; virUUIDFormat(list->entries[i].uuid, uuidstr); VIR_DEBUG("No domain object with UUID %s", uuidstr); diff --git a/src/util/virclosecallbacks.h b/src/util/virclosecallbacks.h index 98fc2c4a94..253ecab6bc 100644 --- a/src/util/virclosecallbacks.h +++ b/src/util/virclosecallbacks.h @@ -48,3 +48,6 @@ virCloseCallbacksRun(virCloseCallbacksPtr closeCallbacks, virConnectPtr conn, virDomainObjListPtr domains, void *opaque); + +virDomainObjPtr virGetMatchingDomain(virDomainObjListPtr domains, + unsigned char *uuidstr, virDomainObjPtr dom); -- 2.24.1
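The keying change in this patch can be sketched outside libvirt. `make_uniq_key()` below is a hypothetical stand-in for the patch's `g_strdup_printf("%s%d", uuidstr, vm->def->id)` construction, showing how the source and destination VMs of a local migration — same UUID, different domain IDs — end up with distinct hash keys:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's key construction: the
 * close-callback hash is indexed by "<uuid-string><domain-id>"
 * instead of the UUID alone, so two co-located VMs sharing one
 * UUID occupy distinct hash entries. */
int make_uniq_key(char *buf, size_t buflen, const char *uuidstr, int domid)
{
    return snprintf(buf, buflen, "%s%d", uuidstr, domid);
}
```

With the UUID alone, both keys below would be identical; appending the ID is what breaks the tie.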

From: Shaju Abraham <shaju.abraham@nutanix.com>

There are various config paths that a VM uses. The monitor paths and other lib paths are examples. These paths are tied to the VM name or UUID. Local migration breaks the assumption that there will be only one VM with a unique UUID and name: during local migrations there can be multiple VMs with the same name and UUID on the same host. Append the domain-id field to the path so that there is no duplication of path names.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com> --- src/qemu/qemu_conf.c | 4 ++-- src/qemu/qemu_domain.c | 16 ++++++++++------ 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index 00801ef01b..6769736d58 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -1894,7 +1894,7 @@ qemuGetDomainHugepagePath(const virDomainDef *def, char *ret = NULL; if (base && domPath) - ret = g_strdup_printf("%s/%s", base, domPath); + ret = g_strdup_printf("%s/%s-%d", base, domPath, def->id); return ret; } @@ -1962,7 +1962,7 @@ qemuGetMemoryBackingDomainPath(const virDomainDef *def, return -1; qemuGetMemoryBackingBasePath(cfg, &base); - *path = g_strdup_printf("%s/%s", base, shortName); + *path = g_strdup_printf("%s/%s-%d", base, shortName, def->id); return 0; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index b0c0e1a19b..002c092cf8 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2127,11 +2127,13 @@ qemuDomainSetPrivatePathsOld(virQEMUDriverPtr driver, virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); if (!priv->libDir) - priv->libDir = g_strdup_printf("%s/domain-%s", cfg->libDir, vm->def->name); + priv->libDir = g_strdup_printf("%s/domain-%s-%d", cfg->libDir, + vm->def->name, vm->def->id); if (!priv->channelTargetDir) - priv->channelTargetDir = g_strdup_printf("%s/domain-%s", - cfg->channelTargetDir, vm->def->name); + priv->channelTargetDir = g_strdup_printf("%s/domain-%s-%d", +
cfg->channelTargetDir, + vm->def->name, vm->def->id); virObjectUnref(cfg); } @@ -2150,11 +2152,13 @@ qemuDomainSetPrivatePaths(virQEMUDriverPtr driver, goto cleanup; if (!priv->libDir) - priv->libDir = g_strdup_printf("%s/domain-%s", cfg->libDir, domname); + priv->libDir = g_strdup_printf("%s/domain-%s-%d", cfg->libDir, domname, + vm->def->id); if (!priv->channelTargetDir) - priv->channelTargetDir = g_strdup_printf("%s/domain-%s", - cfg->channelTargetDir, domname); + priv->channelTargetDir = g_strdup_printf("%s/domain-%s-%d", + cfg->channelTargetDir, + domname, vm->def->id); ret = 0; cleanup: -- 2.24.1
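The effect of the path change can be demonstrated outside libvirt. `claim_domain_dir()` is a hypothetical helper (not libvirt code) mirroring the patch's `"%s/domain-%s-%d"` naming: with the ID appended, two domains sharing a name claim separate directories, while reusing the same (name, id) pair still collides:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the patch's per-domain paths:
 * claim a directory for (name, domid), returning 0 on success
 * or errno (e.g. EEXIST) if the path is already taken. */
int claim_domain_dir(const char *base, const char *name, int domid)
{
    char path[512];
    snprintf(path, sizeof(path), "%s/domain-%s-%d", base, name, domid);
    if (mkdir(path, 0700) < 0)
        return errno;
    return 0;
}
```

Under the old `domain-%s` scheme, the source and destination of a local migration would compute the same path and the second claim would always fail.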

On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote:
From: Shaju Abraham <shaju.abraham@nutanix.com>
There are various config paths that a VM uses. The monitor paths and other lib paths are examples. These paths are tied to the VM name or UUID. Local migration breaks the assumption that there will be only one VM with a unique UUID and name: during local migrations there can be multiple VMs with the same name and UUID on the same host. Append the domain-id field to the path so that there is no duplication of path names.
This is the really critical problem with localhost migration. Appending the domain-id looks "simple" but this is a significant behavioural / functional change for applications, and I don't think it can fully solve the problem either.

This is changing the paths used in various places where libvirt internally generates unique paths (eg QMP socket, huge page or file based memory paths, and defaults used for auto-filling device paths (<channel> if not specified)).

Some of these paths are functionally important to external applications and cannot be changed in this way. eg stuff integrating with QEMU can be expecting certain memory backing file paths, or certain <channel> paths & is liable to break if we change the naming convention.

For sake of argument, let's assume we can change the naming convention without breaking anything... This only applies to paths libvirt generates at VM startup. There are plenty of configuration elements in the guest XML that are end user / mgmt app defined, and reference paths in the host OS.

For example <graphics>, <serial>, <console>, <channel> all support UNIX domain sockets and TCP sockets. A UNIX domain socket cannot be listened on by multiple VMs at once. If the UNIX socket is in client mode, we cannot assume the thing QEMU is connecting to allows multiple concurrent connections. eg two QEMUs could have their <serial> connected together over a UNIX socket pair. Similarly if automatic TCP port assignment is not used we cannot have multiple QEMUs listening on the same host.

One answer is to say that localhost migration is just not supported for such VMs, but I don't find that very convincing because the UNIX domain socket configs affected are in common use.

If localhost migration is only usable in a small subset of scenarios, I'm not convinced it is worth the support burden. Rarely used & tested features are liable to bitrot, and migration is already a serious point of instability in QEMU in general.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
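The UNIX-socket objection above is easy to reproduce with a few lines of standalone C (illustrative paths and helper names only, not libvirt code). A second process cannot listen on a socket path that is already bound, which is why two co-located QEMU instances with identical user-specified socket configs cannot coexist:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a listening-ready UNIX socket at `path`; return the fd,
 * or -1 with errno set on failure. */
int bind_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Returns 1 if a second bind on the same path fails with
 * EADDRINUSE while the first still holds it -- the situation two
 * co-located QEMUs with the same <serial>/<channel> socket path
 * would hit. */
int socket_path_collides(const char *path)
{
    unlink(path);                 /* start from a clean slate */
    int a = bind_unix(path);      /* first "QEMU" listens */
    int b = bind_unix(path);      /* second "QEMU", same path */
    int collided = (a >= 0 && b < 0 && errno == EADDRINUSE);
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    unlink(path);
    return collided;
}
```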

> This is the really critical problem with localhost migration.
>
> Appending the domain-id looks "simple" but this is a significant
> behavioural / functional change for applications, and I don't think
> it can fully solve the problem either.
>
> This is changing the paths used in various places where libvirt
> internally generates unique paths (eg QMP socket, huge page or
> file based memory paths, and defaults used for auto-filling
> device paths (<channel> if not specified)).
>
> Some of these paths are functionally important to external
> applications and cannot be changed in this way. eg stuff
> integrating with QEMU can be expecting certain memory backing
> file paths, or certain <channel> paths & is liable to break
> if we change the naming convention.
>
> For sake of argument, let's assume we can change the naming
> convention without breaking anything...

Agree on the need to keep the path names unmodified. I will try to design a new approach using symlinks.

> If localhost migration is only usable in a small subset of
> scenarios, I'm not convinced it is worth the support burden.
> Rarely used & tested features are liable to bitrot, and
> migration is already a serious point of instability in QEMU
> in general.

Local migration is a much-needed feature in data centers. At present, an upgrade of the hypervisor in a data center requires a large maintenance window, due to the migration of the VMs. QEMU already has features like "x-ignore-shared", which is almost zero-copy migration. Since libvirt lacks support for local migration, we are not able to take advantage of these features.

Regards
Shaju

On 2/5/20, 11:03 PM, "Daniel P. Berrangé" <berrange@redhat.com> wrote:

> This is the really critical problem with localhost migration.
> Appending the domain-id looks "simple" but this is a significant
> behavioural / functional change for applications, and I don't think
> it can fully solve the problem either.
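For reference, the QEMU capability Shaju mentions is toggled over QMP before starting the migration, along the following lines. Note that "x-ignore-shared" is an experimental ("x-") capability, so its name and semantics may differ across QEMU versions:

```json
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "x-ignore-shared", "state": true } ] } }
```

With it enabled, RAM blocks backed by shared memory are not copied over the migration stream, which is what makes same-host hypervisor upgrades fast.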

On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote:
On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote:
From: Shaju Abraham <shaju.abraham@nutanix.com>
There are various config paths that a VM uses. The monitor paths and other lib paths are examples. These paths are tied to the VM name or UUID. Local migration breaks the assumption that there will be only one VM with a unique UUID and name: during local migrations there can be multiple VMs with the same name and UUID on the same host. Append the domain-id field to the path so that there is no duplication of path names.
This is the really critical problem with localhost migration.
Appending the domain-id looks "simple" but this is a significant behavioural / functional change for applications, and I don't think it can fully solve the problem either.
This is changing the paths used in various places where libvirt internally generates unique paths (eg QMP socket, huge page or file based memory paths, and defaults used for auto-filling device paths (<channel> if not specified)).
Some of these paths are functionally important to external applications and cannot be changed in this way. eg stuff integrating with QEMU can be expecting certain memory backing file paths, or certain <channel> paths & is liable to break if we change the naming convention.
For sake of argument, let's assume we can change the naming convention without breaking anything...
This was already done in (I would say) most places as they use virDomainDefGetShortName() to get a short, unique name of a directory -- it uses the domain ID as a prefix.
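A simplified sketch of that naming idea — the ID-prefix part only. The real virDomainDefGetShortName() additionally escapes special characters and truncates differently; `short_name()` here is a hypothetical illustration, not the libvirt implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified sketch of an ID-prefixed short name: prepend the domain
 * ID and cap the name's contribution, so re-used names still yield
 * unique, bounded-length identifiers. */
int short_name(char *buf, size_t buflen, int domid, const char *name)
{
    return snprintf(buf, buflen, "%d-%.20s", domid, name);
}
```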
This only applies to paths libvirt generates at VM startup.
There are plenty of configuration elements in the guest XML that are end user / mgmt app defined, and reference paths in the host OS.
For example <graphics>, <serial>, <console>, <channel> all support UNIX domain sockets and TCP sockets. A UNIX domain socket cannot be listened on by multiple VMs at once. If the UNIX socket is in client mode, we cannot assume the thing QEMU is connecting to allows multiple concurrent connections. eg two QEMUs could have their <serial> connected together over a UNIX socket pair. Similarly if automatic TCP port assignment is not used we cannot have multiple QEMUs listening on the same host.
One answer is to say that localhost migration is just not supported for such VMs, but I don't find that very convincing because the UNIX domain socket configs affected are in common use.
I would be okay with saying that these either need to be changed in a provided destination XML or the migration will probably break. I do not think it is unreasonable to say that if users are trying to shoot themselves in the foot, we should not spend a ridiculous amount of time just so we can prevent that. Otherwise we will get to the same point as we are now -- there might be a case where a local migration would work, but users cannot execute it: even if they were very cautious and went through all the things that could prevent it from a technical point of view, libvirt will still disallow it. Adding at least partial support, where we could say we rely on QEMU failing reasonably, would allow a couple of mgmt apps to do more than they can do now. And they might have taken care of the path collisions (e.g. when running libvirtd in containers or so).
If localhost migration is only usable in a small subset of scenarios, I'm not convinced it is worth the support burden. Rarely used & tested features are liable to bitrot, and migration is already a serious point of instability in QEMU in general.
Regards, Daniel

On Tue, Feb 11, 2020 at 10:05:53AM +0100, Martin Kletzander wrote:
On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote:
On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote:
From: Shaju Abraham <shaju.abraham@nutanix.com>
There are various config paths that a VM uses. The monitor paths and other lib paths are examples. These paths are tied to the VM name or UUID. Local migration breaks the assumption that there will be only one VM with a unique UUID and name: during local migrations there can be multiple VMs with the same name and UUID on the same host. Append the domain-id field to the path so that there is no duplication of path names.
This is the really critical problem with localhost migration.
Appending the domain-id looks "simple" but this is a significant behavioural / functional change for applications, and I don't think it can fully solve the problem either.
This is changing the paths used in various places where libvirt internally generates unique paths (eg QMP socket, huge page or file based memory paths, and defaults used for auto-filling device paths (<channel> if not specified)).
Some of these paths are functionally important to external applications and cannot be changed in this way. eg stuff integrating with QEMU can be expecting certain memory backing file paths, or certain <channel> paths & is liable to break if we change the naming convention.
For sake of argument, let's assume we can change the naming convention without breaking anything...
This was already done in (I would say) most places as they use virDomainDefGetShortName() to get a short, unique name of a directory -- it uses the domain ID as a prefix.
This only applies to paths libvirt generates at VM startup.
There are plenty of configuration elements in the guest XML that are end user / mgmt app defined, and reference paths in the host OS.
For example <graphics>, <serial>, <console>, <channel> all support UNIX domain sockets and TCP sockets. A UNIX domain socket cannot be listened on by multiple VMs at once. If the UNIX socket is in client mode, we cannot assume the thing QEMU is connecting to allows multiple concurrent connections. eg two QEMUs could have their <serial> connected together over a UNIX socket pair. Similarly if automatic TCP port assignment is not used we cannot have multiple QEMUs listening on the same host.
One answer is to say that localhost migration is just not supported for such VMs, but I don't find that very convincing because the UNIX domain socket configs affected are in common use.
I would be okay with saying that these either need to be changed in a provided destination XML or the migration will probably break. I do not think it is unreasonable to say that if users are trying to shoot themselves in the foot, we should not spend a ridiculous amount of time just so we can prevent that. Otherwise we will get to the same point as we are now -- there might be a case where a local migration would work, but users cannot execute it: even if they were very cautious and went through all the things that could prevent it from a technical point of view, libvirt will still disallow it.
If there are clashing resources, we can't rely on QEMU reporting an error. For example with a UNIX domain socket, the first thing QEMU does is unlink(/socket/path), which will blow away the UNIX domain socket belonging to the original QEMU. As a result, if migration fails and we roll back, the original QEMU will be broken.

Preventing users from shooting themselves in the foot is a core part of the value that libvirt is adding for QEMU migration. We do this with stable device ABI, and controlled locking / disk labelling, and CPU compatibility checking, and so on. We're not perfect by any means, but the one thing we've tried very hard to ensure is that if the destination QEMU fails for any reason during migration, the user should always be able to roll back to use the original source QEMU without problems. Localhost migration support makes it harder to guarantee that the source QEMU is not broken, so I do think we need to make an extra effort to protect users, if we're going to try to allow this.

This series has taken the approach of trying to make localhost migration work as if it were just a normal remote migration, with just the minimum change to alter some auto-generated paths on disk, and keeping a second list of domains. So we still have the same begin/prepare/perform/finish/confirm phases fully separated. IOW, essentially the migration code has very little, almost no, knowledge of the fact that this is a localhost migration. This is understandable as a way to minimize the invasiveness of any changes, but I think it misses the point that localhost migration needs more than just changing a few paths on disk.
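The unlink() hazard described above can be demonstrated in isolation: once the second process unlinks and rebinds the path, the original listener keeps its file descriptor but the path now leads to the new process, so rolling back cannot restore the original. A standalone sketch (illustrative paths and helper names, not libvirt or QEMU code):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a UNIX socket at `path`; return the fd, or -1 on failure. */
int bind_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Returns 1 if, after the "destination" unlinks and rebinds the
 * path, both sockets coexist: the source's fd stays open, but the
 * filesystem path now belongs to the destination, so the source
 * can no longer be reached through it -- the rollback hazard. */
int unlink_steals_path(const char *path)
{
    unlink(path);
    int src = bind_unix(path);        /* source QEMU's socket */
    unlink(path);                     /* destination unlinks first... */
    int dst = bind_unix(path);        /* ...then binds; no EADDRINUSE */
    int stolen = (src >= 0 && dst >= 0);
    if (src >= 0) close(src);
    if (dst >= 0) close(dst);
    unlink(path);
    return stolen;
}
```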
Adding at least partial support, where we could say we rely on QEMU failing reasonably, would allow a couple of mgmt apps to do more than they can do now. And they might have taken care of the path collisions (e.g. when running libvirtd in containers or so).
If libvirtd is running inside a container, then from libvirt's POV this is no longer localhost migration - it is regular cross-host migration. The only caveat is that the container must be reporting a unique hostname + UUID. We can at least override UUID in libvirtd.conf if that isn't the case.
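The override referred to here is the host_uuid setting in libvirtd.conf; a fragment along these lines (the UUID value is only an example) gives each containerized libvirtd a distinct identity, so migration between two instances on one physical host looks like a normal cross-host migration:

```
# /etc/libvirt/libvirtd.conf (fragment)
# Report a per-container host UUID instead of the one from
# the host's SMBIOS / machine-id. Example value only.
host_uuid = "8a7c8f9a-5e4d-4f2b-9c1d-3e6a2b1c0d9e"
```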
If localhost migration is only usable in a small subset of scenarios, I'm not convinced it is worth the support burden. Rarely used & tested features are liable to bitrot, and migration is already a serious point of instability in QEMU in general.
Regards, Daniel

On 2/11/20, 7:06 PM, "Daniel P. Berrangé" <berrange@redhat.com> wrote: On Tue, Feb 11, 2020 at 10:05:53AM +0100, Martin Kletzander wrote:
On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote: > > > On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote: > > > > From: Shaju Abraham <shaju.abraham@nutanix.com> > > > > > > > There are various config paths that a VM uses. The monitor paths and > > > > other lib paths are examples. These paths are tied to the VM name or > > > > UUID. The local migration breaks the assumption that there will be only > > > > one VM with a unique UUID and name. During local migrations there can be > > > > multiple VMs with same name and UUID in the same host. Append the > > > > domain-id field to the path so that there is no duplication of path > > > names. > > > > > >This is the really critical problem with localhost migration. > > > > > >Appending the domain-id looks "simple" but this is a significant > > >behavioural / functional change for applications, and I don't think > > >it can fully solve the problem either. > > > > > >This is changing thue paths used in various places where libvirt > > >internally generates unique paths (eg QMP socket, huge page or > > >file based memory paths, and defaults used for auto-filling > > >device paths (<channel> if not specified). > > > > > >Some of these paths are functionally important to external > > >applications and cannot be changed in this way. eg stuff > > >integrating with QEMU can be expecting certain memory backing > > >file paths, or certain <channel> paths & is liable to break > > >if we change the naming convention. > > > > > >For sake of argument, lets assume we can changing the naming > > >convention without breaking anything... > > > > > > >This was already done in (I would say) most places as they use > >virDomainDefGetShortName() to get a short, unique name of a directory -- it uses > >the domain ID as a prefix. > > > > > This only applies to paths libvirt generates at VM startup. 
> > > > > >There are plenty of configuration elements in the guest XML > > >that are end user / mgmt app defined, and reference paths in > > >the host OS. > > > > > >For example <graphics>, <serial>, <console>, <channel>, > > >all support UNIX domain sockets and TCP sockets. A UNIX > > >domain socket cannot be listened on by multiple VMs > > >at once. If the UNIX socket is in client mode, we cannot > > >assume the thing QEMU is connecting to allows multiple > > >concurrent connections. eg 2 QEMU's could have their > > ><serial> connected together over a UNIX socket pair. > > >Similarly if automatic TCP port assignment is not used > > >we cannot have multiple QEMU's listening on the same > > >host. > > > > > >One answer is to say that localhost migration is just > > >not supported for such VMs, but I don't find that very > > >convincing because the UNIX domain socket configs > > >affected are in common use. > > > > > > >I would be okay with saying that these either need to be changed in a provided > >destination XML or the migration will probably break. I do not think it is > >unreasonable to say that if users are trying to shoot themselves, we should not > >spend a ridiculous time just so we can prevent that. Otherwise we will get to > >the same point as we are now -- there might be a case where a local migration > >would work, but users cannot execute it even if they were very cautious and went > >through all the things that can prevent it from the technical point of view, > >libvirt will still disallow that.
>If there are clashing resources, we can't rely on QEMU reporting an
>error. For example with a UNIX domain socket, the first thing QEMU
>does is unlink(/socket/path), which will blow away the UNIX domain
>socket belonging to the original QEMU. As a result if migration
>fails, and we rollback, the original QEMU will be broken.

By appending the id field to the path, we are effectively nullifying this particular concern. Each qemu instance will get its own unique path and monitor. If a migration fails, we can roll back. I understand the need to keep the paths unchanged for externally generated configurations. How about the approach below? Symlinks are created so that, until the migration is complete, the path remains unchanged for the source. I am still having issues with the emails, so I am pasting the code inline.
From 37ead09491f1e4ae09f8bba1204ae4626390fb2d Mon Sep 17 00:00:00 2001
From: Shaju Abraham <shaju.abraham@nutanix.com>
Date: Wed, 22 Jan 2020 02:06:03 -0800
Subject: [PATCH 5/6] Make PATHs unique for a VM object instance

There are various config paths that a VM uses. The monitor paths and
other lib paths are examples. These paths are tied to the VM name or
UUID. The local migration breaks the assumption that there will be only
one VM with a unique UUID and name. During local migrations there can be
multiple VMs with same name and UUID in the same host. Append the
domain-id field to the path so that there is no duplication of path
names.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 src/qemu/qemu_conf.c    |  4 ++--
 src/qemu/qemu_domain.c  | 40 +++++++++++++++++++++++++++++++++-------
 src/qemu/qemu_domain.h  |  4 ++++
 src/qemu/qemu_process.c |  6 ++++++
 4 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index b62dd1d..a2d55ce 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1896,7 +1896,7 @@ qemuGetDomainHugepagePath(const virDomainDef *def,
     char *ret = NULL;

     if (base && domPath)
-        ret = g_strdup_printf("%s/%s", base, domPath);
+        ret = g_strdup_printf("%s/%s-%d", base, domPath, def->id);

     return ret;
 }
@@ -1964,7 +1964,7 @@ qemuGetMemoryBackingDomainPath(const virDomainDef *def,
         return -1;

     qemuGetMemoryBackingBasePath(cfg, &base);
-    *path = g_strdup_printf("%s/%s", base, shortName);
+    *path = g_strdup_printf("%s/%s-%d", base, shortName, def->id);

     return 0;
 }
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index ce0c5b7..9478501 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2127,11 +2127,13 @@ qemuDomainSetPrivatePathsOld(virQEMUDriverPtr driver,
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);

     if (!priv->libDir)
-        priv->libDir = g_strdup_printf("%s/domain-%s", cfg->libDir, vm->def->name);
+        priv->libDir = g_strdup_printf("%s/domain-%s-%d", cfg->libDir,
+                                       vm->def->name, vm->def->id);

     if (!priv->channelTargetDir)
-        priv->channelTargetDir = g_strdup_printf("%s/domain-%s",
-                                                 cfg->channelTargetDir, vm->def->name);
+        priv->channelTargetDir = g_strdup_printf("%s/domain-%s-%d",
+                                                 cfg->channelTargetDir,
+                                                 vm->def->name, vm->def->id);

     virObjectUnref(cfg);
 }
@@ -2150,12 +2152,19 @@ qemuDomainSetPrivatePaths(virQEMUDriverPtr driver,
         goto cleanup;

     if (!priv->libDir)
-        priv->libDir = g_strdup_printf("%s/domain-%s", cfg->libDir, domname);
+        priv->libDir = g_strdup_printf("%s/domain-%s-%d", cfg->libDir, domname,
+                                       vm->def->id);
+    if (!priv->libDirsymLink)
+        priv->libDirsymLink = g_strdup_printf("%s/domain-%s", cfg->libDir, domname);

     if (!priv->channelTargetDir)
-        priv->channelTargetDir = g_strdup_printf("%s/domain-%s",
-                                                 cfg->channelTargetDir, domname);
-
+        priv->channelTargetDir = g_strdup_printf("%s/domain-%s-%d",
+                                                 cfg->channelTargetDir,
+                                                 domname, vm->def->id);
+    if (!priv->channelTargetDirsymLink)
+        priv->channelTargetDirsymLink = g_strdup_printf("%s/domain-%s",
+                                                        cfg->channelTargetDir,
+                                                        domname);
     ret = 0;

  cleanup:
     virObjectUnref(cfg);
@@ -2163,6 +2172,23 @@ qemuDomainSetPrivatePaths(virQEMUDriverPtr driver,
     return ret;
 }

+int
+qemuDomainSymLinkPrivatePaths(virQEMUDriverPtr driver,
+                              virDomainObjPtr vm)
+{
+    virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    int ret = -1;
+
+    if (symlink(priv->libDir, priv->libDirsymLink) != 0)
+        goto cleanup;
+    if (symlink(priv->channelTargetDir, priv->channelTargetDirsymLink) != 0)
+        goto cleanup;
+    ret = 0;
+ cleanup:
+    virObjectUnref(cfg);
+    return ret;
+}

 static void
 dbusVMStateHashFree(void *opaque)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 21ece23..7cdb16b 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -358,6 +358,8 @@ struct _qemuDomainObjPrivate {
     char *libDir;            /* base path for per-domain files */
     char *channelTargetDir;  /* base path for per-domain channel targets */
+    char *libDirsymLink;            /* base path for per-domain files for backward compatibility */
+    char *channelTargetDirsymLink;  /* base path for per-domain channel targets for backward compatibility */

     /* random masterKey and length for encryption (not to be saved in our */
     /* private XML) - need to restore at process reconnect */
     uint8_t *masterKey;
@@ -1014,6 +1016,8 @@ bool qemuDomainNetSupportsMTU(virDomainNetType type);

 int qemuDomainSetPrivatePaths(virQEMUDriverPtr driver,
                               virDomainObjPtr vm);
+int qemuDomainSymLinkPrivatePaths(virQEMUDriverPtr driver,
+                                  virDomainObjPtr vm);

 virDomainDiskDefPtr qemuDomainDiskByName(virDomainDefPtr def,
                                          const char *name);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9029fd1..1187275 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -6527,6 +6527,10 @@ qemuProcessPrepareHost(virQEMUDriverPtr driver,
     if (qemuProcessMakeDir(driver, vm, priv->libDir) < 0 ||
         qemuProcessMakeDir(driver, vm, priv->channelTargetDir) < 0)
         return -1;
+    if (!(flags & VIR_QEMU_PROCESS_START_LOCAL_MIG)) {
+        if (qemuDomainSymLinkPrivatePaths(driver, vm) < 0)
+            goto cleanup;
+    }

     VIR_DEBUG("Write domain masterKey");
     if (qemuDomainWriteMasterKeyFile(driver, vm) < 0)
@@ -7365,6 +7369,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,

     virFileDeleteTree(priv->libDir);
     virFileDeleteTree(priv->channelTargetDir);
+    unlink(priv->libDirsymLink);
+    unlink(priv->channelTargetDirsymLink);

     ignore_value(virDomainChrDefForeach(vm->def, false,
                                         qemuProcessCleanupChardevDevice,
--

>Preventing users from shooting themselves in the foot is a core
>part of the value that libvirt is adding for QEMU migration. We
>do this with stable device ABI, and controlled locking / disk
>labelling, and CPU compatibility checking, and so on.
>We're not perfect by any means, but the one thing we've tried
>very hard to ensure is that if the destination QEMU fails for
>any reason during migration, the user should always be able to
>rollback to use the original source QEMU without problems.
>The localhost migration support makes it harder to guarantee
>that the source QEMU is not broken, so I do think we need to
>make an extra effort to protect users, if we're going to try to
>allow this.
>This series has taken the approach of trying to make the localhost
>migration work as if it were just a normal remote migration, with
>just the minimum change to alter some auto-generated paths on disk,
>and keeping the second list of domains.
>So we still have the same begin/prepare/perform/finish/confirm
>phases, fully separated. IOW, essentially the migration code has
>very little, almost no, knowledge of the fact that this is a
>localhost migration.
>This is understandable as a way to minimize the invasiveness
>of any changes, but I think it misses the point that localhost
>migration needs more than just changing a few paths on disk.
> > Adding at least partial support where we could say we rely on QEMU failing
> > reasonably would allow a couple of mgmt apps to do more than they can do now. And
> > they might have taken care of the path collisions (e.g. when running libvirtd in
> > containers or so).
>If libvirtd is running inside a container, then from libvirt's
>POV this is no longer localhost migration - it is regular
>cross-host migration. The only caveat is that the container
>must be reporting a unique hostname + UUID. We can at least
>override UUID in libvirtd.conf if that isn't the case.
> > If localhost migration is only usable in a small subset
> > of scenarios, I'm not convinced it is worth the support
> > burden. Rarely used & tested features are liable to
> > bitrot, and migration is already a serious point of
> > instability in QEMU in general.

Regards Shaju

On Wed, Feb 19, 2020 at 02:39:22AM +0000, Shaju Abraham wrote:
On 2/11/20, 7:06 PM, "Daniel P. Berrangé" <berrange@redhat.com> wrote:
On Tue, Feb 11, 2020 at 10:05:53AM +0100, Martin Kletzander wrote:
On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote: > > > On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote: > > > > From: Shaju Abraham <shaju.abraham@nutanix.com> > > > > > > > There are various config paths that a VM uses. The monitor paths and > > > > other lib paths are examples. These paths are tied to the VM name or > > > > UUID. The local migration breaks the assumption that there will be only > > > > one VM with a unique UUID and name. During local migrations there can be > > > > multiple VMs with same name and UUID in the same host. Append the > > > > domain-id field to the path so that there is no duplication of path > > > names. > > > > > >This is the really critical problem with localhost migration. > > > > > >Appending the domain-id looks "simple" but this is a significant > > >behavioural / functional change for applications, and I don't think > > >it can fully solve the problem either. > > > > > >This is changing thue paths used in various places where libvirt > > >internally generates unique paths (eg QMP socket, huge page or > > >file based memory paths, and defaults used for auto-filling > > >device paths (<channel> if not specified). > > > > > >Some of these paths are functionally important to external > > >applications and cannot be changed in this way. eg stuff > > >integrating with QEMU can be expecting certain memory backing > > >file paths, or certain <channel> paths & is liable to break > > >if we change the naming convention. > > > > > >For sake of argument, lets assume we can changing the naming > > >convention without breaking anything... > > > > > > >This was already done in (I would say) most places as they use > >virDomainDefGetShortName() to get a short, unique name of a directory -- it uses > >the domain ID as a prefix. > > > > > This only applies to paths libvirt generates at VM startup. 
> > > > > >There are plenty of configuration elements in the guest XML > > >that are end user / mgmt app defined, and reference paths in > > >the host OS. > > > > > >For example <graphics>, <serial>, <console>, <channel>, > > >all support UNIX domain sockets and TCP sockets. A UNIX > > >domain socket cannot be listened on by multiple VMs > > >at once. If the UNIX socket is in client mode, we cannot > > >assume the thing QEMU is connecting to allows multiple > > >concurrent connections. eg 2 QEMU's could have their > > ><serial> connected together over a UNIX socket pair. > > >Similarly if automatic TCP port assignment is not used > > >we cannot have multiple QEMU's listening on the same > > >host. > > > > > >One answer is to say that localhost migration is just > > >not supported for such VMs, but I don't find that very > > >convincing because the UNIX domain socket configs > > >affected are in common use. > > > > > > >I would be okay with saying that these either need to be changed in a provided > >destination XML or the migration will probably break. I do not think it is > >unreasonable to say that if users are trying to shoot themselves, we should not > >spend a ridiculous time just so we can prevent that. Otherwise we will get to > >the same point as we are now -- there might be a case where a local migration > >would work, but users cannot execute it even if they were very cautious and went > >through all the things that can prevent it from the technical point of view, > >libvirt will still disallow that.
>If there are clashing resources, we can't rely on QEMU reporting an >error. For example with a UNIX domain socket, the first thing QEMU >does is unlink(/socket/path), which will blow away the UNIX domain >socket belonging to the original QEMU. As a result if migration >fails, and we rollback, the original QEMU will be broken.
By appending the id field to the path, we are effectively nullifying this particular concern. Each qemu instance will get its own unique path and monitor. If a migration fails, we can roll back.
No, you've not nullified the problem. This only helps the cases where libvirt generates the path. This is only a subset of affected cases. Just one example: <graphics type='vnc' socket='/some/path/QEMUGuest1-vnc.sock'> there are many other parts of the domain XML that accept UNIX socket paths where the mgmt app picks the socket path. Nothing is validating this to prevent clashes between the src+dst QEMU on the same host, meaning on migration rollback, src QEMU is broken. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Feb 19, 2020 at 02:39:22AM +0000, Shaju Abraham wrote:
On 2/11/20, 7:06 PM, "Daniel P. Berrangé" <berrange@redhat.com> wrote:
On Tue, Feb 11, 2020 at 10:05:53AM +0100, Martin Kletzander wrote:
On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote: > > > On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote: > > > > From: Shaju Abraham <shaju.abraham@nutanix.com> > > > > > > > There are various config paths that a VM uses. The monitor paths and > > > > other lib paths are examples. These paths are tied to the VM name or > > > > UUID. The local migration breaks the assumption that there will be only > > > > one VM with a unique UUID and name. During local migrations there can be > > > > multiple VMs with same name and UUID in the same host. Append the > > > > domain-id field to the path so that there is no duplication of path > > > names. > > > > > >This is the really critical problem with localhost migration. > > > > > >Appending the domain-id looks "simple" but this is a significant > > >behavioural / functional change for applications, and I don't think > > >it can fully solve the problem either. > > > > > >This is changing thue paths used in various places where libvirt > > >internally generates unique paths (eg QMP socket, huge page or > > >file based memory paths, and defaults used for auto-filling > > >device paths (<channel> if not specified). > > > > > >Some of these paths are functionally important to external > > >applications and cannot be changed in this way. eg stuff > > >integrating with QEMU can be expecting certain memory backing > > >file paths, or certain <channel> paths & is liable to break > > >if we change the naming convention. > > > > > >For sake of argument, lets assume we can changing the naming > > >convention without breaking anything... > > > > > > >This was already done in (I would say) most places as they use > >virDomainDefGetShortName() to get a short, unique name of a directory -- it uses > >the domain ID as a prefix. > > > > > This only applies to paths libvirt generates at VM startup.
> > > > > >There are plenty of configuration elements in the guest XML > > >that are end user / mgmt app defined, and reference paths in > > >the host OS. > > > > > >For example <graphics>, <serial>, <console>, <channel>, > > >all support UNIX domain sockets and TCP sockets. A UNIX > > >domain socket cannot be listened on by multiple VMs > > >at once. If the UNIX socket is in client mode, we cannot > > >assume the thing QEMU is connecting to allows multiple > > >concurrent connections. eg 2 QEMU's could have their > > ><serial> connected together over a UNIX socket pair. > > >Similarly if automatic TCP port assignment is not used > > >we cannot have multiple QEMU's listening on the same > > >host. > > > > > >One answer is to say that localhost migration is just > > >not supported for such VMs, but I don't find that very > > >convincing because the UNIX domain socket configs > > >affected are in common use. > > > > > > >I would be okay with saying that these either need to be changed in a provided > >destination XML or the migration will probably break. I do not think it is > >unreasonable to say that if users are trying to shoot themselves, we should not > >spend a ridiculous time just so we can prevent that. Otherwise we will get to > >the same point as we are now -- there might be a case where a local migration > >would work, but users cannot execute it even if they were very cautious and went > >through all the things that can prevent it from the technical point of view, > >libvirt will still disallow that.
>If there are clashing resources, we can't rely on QEMU reporting an >error. For example with a UNIX domain socket, the first thing QEMU >does is unlink(/socket/path), which will blow away the UNIX domain >socket belonging to the original QEMU. As a result if migration >fails, and we rollback, the original QEMU will be broken.
By appending the id field to the path, we are effectively nullifying this particular concern. Each qemu instance will get its own unique path and monitor. If a migration fails, we can roll back.
No, you've not nullified the problem. This only helps the cases where libvirt generates the path. This is only a subset of affected cases. Just one example:
<graphics type='vnc' socket='/some/path/QEMUGuest1-vnc.sock'>
there are many other parts of the domain XML that accept UNIX socket paths where the mgmt app picks the socket path. Nothing is validating this to prevent clashes between the src+dst QEMU on the same host, meaning on migration rollback, src QEMU is broken.
It is true that I have not covered all the use cases. I would like to know if the approach using the symlink is acceptable. In that case we can have the same design for externally generated paths as well. Citing your example, if the management application provides a path like <graphics type='vnc' socket='/some/path/QEMUGuest1-vnc.sock'> we can always consider the path '/some/path/QEMUGuest1-vnc.sock' as a symlink to the internally generated path created by appending the ID field: /some/path/QEMUGuest1-vnc.sock--->/some/path-ID/QEMUGuest1-vnc.sock. The management application will always refer to the path it has provided, and that path will remain valid even during migration. At the end of migration the symlink will be changed to point to the destination QEMU. Regards Shaju

On Wed, Feb 26, 2020 at 10:50:20AM +0530, Shaju Abraham wrote:
On Tue, Feb 25, 2020 at 6:29 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Wed, Feb 19, 2020 at 02:39:22AM +0000, Shaju Abraham wrote:
On 2/11/20, 7:06 PM, "Daniel P. Berrangé" <berrange@redhat.com> wrote:
On Tue, Feb 11, 2020 at 10:05:53AM +0100, Martin Kletzander wrote:
On Wed, Feb 05, 2020 at 05:32:50PM +0000, Daniel P. Berrangé wrote: > > > On Mon, Feb 03, 2020 at 12:43:32PM +0000, Daniel P. Berrangé wrote: > > > > From: Shaju Abraham <shaju.abraham@nutanix.com> > > > > > > > There are various config paths that a VM uses. The monitor paths and > > > > other lib paths are examples. These paths are tied to the VM name or > > > > UUID. The local migration breaks the assumption that there will be only > > > > one VM with a unique UUID and name. During local migrations there can be > > > > multiple VMs with same name and UUID in the same host. Append the > > > > domain-id field to the path so that there is no duplication of path > > > names. > > > > > >This is the really critical problem with localhost migration. > > > > > >Appending the domain-id looks "simple" but this is a significant > > >behavioural / functional change for applications, and I don't think > > >it can fully solve the problem either. > > > > > >This is changing thue paths used in various places where libvirt > > >internally generates unique paths (eg QMP socket, huge page or > > >file based memory paths, and defaults used for auto-filling > > >device paths (<channel> if not specified). > > > > > >Some of these paths are functionally important to external > > >applications and cannot be changed in this way. eg stuff > > >integrating with QEMU can be expecting certain memory backing > > >file paths, or certain <channel> paths & is liable to break > > >if we change the naming convention. > > > > > >For sake of argument, lets assume we can changing the naming > > >convention without breaking anything... > > > > > > >This was already done in (I would say) most places as they use > >virDomainDefGetShortName() to get a short, unique name of a directory -- it uses > >the domain ID as a prefix. > > > > > This only applies to paths libvirt generates at VM startup.
> > > > > >There are plenty of configuration elements in the guest XML > > >that are end user / mgmt app defined, and reference paths in > > >the host OS. > > > > > >For example <graphics>, <serial>, <console>, <channel>, > > >all support UNIX domain sockets and TCP sockets. A UNIX > > >domain socket cannot be listened on by multiple VMs > > >at once. If the UNIX socket is in client mode, we cannot > > >assume the thing QEMU is connecting to allows multiple > > >concurrent connections. eg 2 QEMU's could have their > > ><serial> connected together over a UNIX socket pair. > > >Similarly if automatic TCP port assignment is not used > > >we cannot have multiple QEMU's listening on the same > > >host. > > > > > >One answer is to say that localhost migration is just > > >not supported for such VMs, but I don't find that very > > >convincing because the UNIX domain socket configs > > >affected are in common use. > > > > > > >I would be okay with saying that these either need to be changed in a provided > >destination XML or the migration will probably break. I do not think it is > >unreasonable to say that if users are trying to shoot themselves, we should not > >spend a ridiculous time just so we can prevent that. Otherwise we will get to > >the same point as we are now -- there might be a case where a local migration > >would work, but users cannot execute it even if they were very cautious and went > >through all the things that can prevent it from the technical point of view, > >libvirt will still disallow that.
>If there are clashing resources, we can't rely on QEMU reporting an >error. For example with a UNIX domain socket, the first thing QEMU >does is unlink(/socket/path), which will blow away the UNIX domain >socket belonging to the original QEMU. As a result if migration >fails, and we rollback, the original QEMU will be broken.
By appending the id field to the path, we are effectively nullifying this particular concern. Each qemu instance will get its own unique path and monitor. If a migration fails, we can roll back.
No, you've not nullified the problem. This only helps the cases where libvirt generates the path. This is only a subset of affected cases. Just one example:
<graphics type='vnc' socket='/some/path/QEMUGuest1-vnc.sock'>
there are many other parts of the domain XML that accept UNIX socket paths where the mgmt app picks the socket path. Nothing is validating this to prevent clashes between the src+dst QEMU on the same host, meaning on migration rollback, src QEMU is broken.
It is true that I have not covered all the use cases. I would like to know if the approach using the symlink is acceptable. In that case we can have the same design for externally generated paths as well. Citing your example, if the management application provides a path like
<graphics type='vnc' socket='/some/path/QEMUGuest1-vnc.sock'>
We can always consider the path '/some/path/QEMUGuest1-vnc.sock' as a symlink to the internally generated path created by appending the ID field:
/some/path/QEMUGuest1-vnc.sock--->/some/path-ID/QEMUGuest1-vnc.sock.
The management application will always refer to the path it has provided, and that path will remain valid even during migration. At the end of migration the symlink will be changed to point to the destination QEMU.
This sounds like it could be the least-worst solution to the problem of clashing UNIX sockets.

We've still got a log file problem. This could potentially be done the same way with symlinks, but that will cause significant complications with cumulative logfile size limiting and rollover processing. We already pass in pre-opened FDs to QEMU from virtlogd, so potentially we can ask virtlogd for the pre-existing FDs it gave to the src QEMU and also give them to the dest QEMU for reuse.

Clashing IP addresses are also a problem. For VNC/SPICE at least we have auto-port allocation, which is a good solution. We might need to consider expanding that to other scenarios.

The only other gross hack I thought of would be to live migrate twice. Once to a new QEMU that uses arbitrary paths, and then again to a new QEMU that uses the user-specified paths. That has bad recovery scenarios if the 2nd migration fails, though, which makes it impractical I think.

A further option would be to have a way for QEMU to pass its existing open FDs for chardevs & sockets, etc to the new QEMU using SCM_RIGHTS passing. This is conceptually my favourite, but I think it is not practical in reality given the way that QEMU backends are implemented. All these FDs are needed during early QEMU initialization, before the migration data stream is established.

Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

From: Shaju Abraham <shaju.abraham@nutanix.com>

After the confirm phase Libvirt domain objects should only be on the
source hash table. Move the remote hash table entry corresponding to
the migrated VM to the source hash table. Delete the moved entries
from the remote hash table.

Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
---
 src/qemu/qemu_domain.c    | 12 +++++++++-
 src/qemu/qemu_domain.h    |  2 ++
 src/qemu/qemu_driver.c    | 46 ++++++++++++++++++++++++++++++++++-----
 src/qemu/qemu_migration.c | 13 +++++++++++
 src/qemu/qemu_migration.h |  4 ++++
 5 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 002c092cf8..cbf779e9eb 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -16570,7 +16570,17 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
     return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
 }

-
+int
+qemuDomainInsertMigrateVMRef(virQEMUDriverPtr driver,
+                             virDomainObjPtr remote_vm)
+{
+    if (virDomainObjListRemoveRemote(driver->domains, remote_vm) < 0)
+        return -1;
+    if (virDomainObjListAddEntry(driver->domains, remote_vm) < 0)
+        return -1;
+    else
+        return 0;
+}

 /* qemuDomainIsUsingNoShutdown:
  * @priv: Domain private data
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index c581b3a162..caf8b2a8a5 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -828,6 +828,8 @@ void qemuDomainRemoveInactiveJob(virQEMUDriverPtr driver,
 void qemuDomainRemoveInactiveJobLocked(virQEMUDriverPtr driver,
                                        virDomainObjPtr vm);
+int qemuDomainInsertMigrateVMRef(virQEMUDriverPtr driver,
+                                 virDomainObjPtr vm);

 void qemuDomainSetFakeReboot(virQEMUDriverPtr driver,
                              virDomainObjPtr vm,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 3e6d14ead8..df639b5299 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -152,6 +152,33 @@ static int qemuOpenFileAs(uid_t fallback_uid, gid_t fallback_gid,
                           bool *needUnlink);

 static virQEMUDriverPtr qemu_driver;

+/**
+ * qemuDomObjFromDomainRemote:
+ * @domain: Domain pointer that has to be looked up
+ *
+ * This function looks up @domain and returns the appropriate virDomainObjPtr
+ * that has to be released by calling virDomainObjEndAPI().
+ *
+ * Returns the domain object with incremented reference counter which is locked
+ * on success, NULL otherwise.
+ */
+static virDomainObjPtr
+qemuDomObjFromDomainRemote(virDomainPtr domain)
+{
+    virDomainObjPtr vm;
+    virQEMUDriverPtr driver = domain->conn->privateData;
+    char uuidstr[VIR_UUID_STRING_BUFLEN];
+
+    vm = virDomainObjListFindByUUIDRemote(driver->domains, domain->uuid);
+    if (!vm) {
+        virUUIDFormat(domain->uuid, uuidstr);
+        virReportError(VIR_ERR_NO_DOMAIN,
+                       _("no domain with matching uuid '%s' (%s)"),
+                       uuidstr, domain->name);
+        return NULL;
+    }
+    return vm;
+}

 /* Looks up the domain object from snapshot and unlocks the
  * driver. The returned domain object is locked and ref'd and the
@@ -13188,8 +13215,10 @@ qemuDomainMigrateFinish3Params(virConnectPtr dconn,
         virReportError(VIR_ERR_NO_DOMAIN, "%s", _("missing domain name"));
         return NULL;
     }
-
-    vm = virDomainObjListFindByName(driver->domains, dname);
+    if (flags & VIR_MIGRATE_LOCAL)
+        vm = virDomainObjListFindByNameRemote(driver->domains, dname);
+    else
+        vm = virDomainObjListFindByName(driver->domains, dname);
     if (!vm) {
         virReportError(VIR_ERR_NO_DOMAIN,
                        _("no domain with matching name '%s'"), dname);
@@ -13241,7 +13270,8 @@ qemuDomainMigrateConfirm3Params(virDomainPtr domain,
                                 unsigned int flags,
                                 int cancelled)
 {
-    virDomainObjPtr vm;
+    virDomainObjPtr vm, remote_vm;
+    int ret = 0;

     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);

@@ -13250,14 +13280,20 @@ qemuDomainMigrateConfirm3Params(virDomainPtr domain,
     if (!(vm = qemuDomainObjFromDomain(domain)))
         return -1;
+    if (!(remote_vm = qemuDomObjFromDomainRemote(domain)))
+        return -1;

     if (virDomainMigrateConfirm3ParamsEnsureACL(domain->conn, vm->def) < 0) {
         virDomainObjEndAPI(&vm);
         return -1;
     }
-
-    return qemuMigrationSrcConfirm(domain->conn->privateData, vm, cookiein, cookieinlen,
+    ret = qemuMigrationSrcConfirm(domain->conn->privateData, vm, cookiein, cookieinlen,
                                    flags, cancelled);
+    if (ret < 0)
+        return ret;
+    ret = qemuMigrationSrcMergeHashTables(domain->conn->privateData,
+                                          remote_vm, flags);
+    return ret;
 }
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index d22daeac9c..45403d361e 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3092,6 +3092,19 @@ qemuMigrationSrcConfirm(virQEMUDriverPtr driver,
     virObjectUnref(cfg);
     return ret;
 }
+
+int qemuMigrationSrcMergeHashTables(virQEMUDriverPtr driver,
+                                    virDomainObjPtr vm,
+                                    unsigned int flags)
+{
+    int ret = 0;
+    if (!(flags & VIR_MIGRATE_LOCAL))
+        return ret;
+    ret = qemuDomainInsertMigrateVMRef(driver, vm);
+    if (ret < 0)
+        return ret;
+    qemuDomainSaveStatus(vm);
+    return ret;
+}

 enum qemuMigrationDestinationType {
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index 2cb4bb9023..f6d67e3997 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -256,3 +256,7 @@ qemuMigrationSrcFetchMirrorStats(virQEMUDriverPtr driver, virDomainObjPtr vm,
                                  qemuDomainAsyncJob asyncJob,
                                  qemuDomainJobInfoPtr jobInfo);
+
+int qemuMigrationSrcMergeHashTables(virQEMUDriverPtr driver,
+                                    virDomainObjPtr vm,
+                                    unsigned int flags);
--
2.24.1

Hi Daniel,

I am happy that Libvirt is pushing local migration/live patching support, but at the same time I am wondering what changed from what you said here:

https://www.redhat.com/archives/libvir-list/2017-September/msg00489.html

To give you some background, we have had live patching enhancements in the IBM backlog for a few years, and one of the reasons they were postponed time and time again was the lack of Libvirt support and this direction of "Libvirt is not interested in supporting it". The message above was being used internally as the rationale for that.

Thanks,

DHB

On 2/3/20 9:43 AM, Daniel P. Berrangé wrote:
I'm (re-)sending this patch series on behalf of Shaju Abraham <shaju.abraham@nutanix.com> who has tried to send this several times already.
Red Hat's email infrastructure is broken, accepting the mails and then failing to deliver them to mailman, or any other Red Hat address. Unfortunately it means that while we can send comments back to Shaju on this thread, subscribers will then probably fail to see any responses Shaju tries to give :-( To say this is bad is an understatement. I have yet another ticket open tracking & escalating this awful problem but can't give any ETA on a fix :-(
Anyway, with that out of the way, here's Shaju's original cover letter below....
1) What is this patch series about?
Local live migration of a VM means live migrating a VM instance within the same node. Traditional libvirt live migration involves migrating the VM from a source node to a remote node; local migration is forbidden in Libvirt for a myriad of reasons. This patch series enables local migration in Libvirt.
2) Why Local Migration is important?
The ability to live migrate a VM locally paves the way for hypervisor upgrades without shutting down the VM. For example, to upgrade qemu after a security fix, we can locally migrate the VM to the new qemu instance. By utilising capabilities like "bypass-shared-memory" in qemu, hypervisor upgrades become faster.
3) Why is local migration difficult in Libvirt?
Libvirt always assumes that the name/UUID pair is unique within a node. During a local migration there will be two different VMs with the same UUID/name pair, which will confuse the management stack. There are other path variables, like the monitor path and config paths, which also assume that the name/UUID pair is unique, so during migration the same monitor would be used by both the source and the target. We cannot assign a temporary UUID to the target VM, since the UUID is part of the machine ABI, which is immutable. To decouple the dependency on UUID/name, a new field (the domain id) is included in all the paths that Libvirt uses. This ensures that every instance of the VM gets a unique path.
4) How is the Local Migration Designed ?
Libvirt manages all the VM domain objects using two hash tables, indexed by UUID and by name. During a live migration the domain entry on the source node gets deleted and a new entry gets populated on the target node, indexed using the same name/UUID. But for a local migration there is no remote node: the source and the target nodes are the same. So in order to model the remote node, two more hash tables are introduced, which represent the hash tables of the remote node during migration. A Libvirt migration involves 5 stages: 1) Begin 2) Prepare 3) Perform 4) Finish 5) Confirm
Begin, Perform and Confirm get executed on the source node, whereas Prepare and Finish get executed on the target node. In the case of local migration, the Perform and Finish stages use the newly introduced 'remote hash tables' and the rest of the stages use the 'source hash tables'. Once the migration is completed, that is after the Confirm phase, the VM domain object is moved from the 'remote hash table' to the 'source hash table'. This is required so that other Libvirt commands like 'virsh list' can display all the VMs running on the node.
5) How to test Local Migration?
A new flag '--local' is added to the 'virsh migrate' command to enable local migration. The syntax is: virsh migrate --live --local 'domain-id' qemu+ssh://ip-address/system
6) What are the known issues?
SELinux policy is known to have issues with creating /dev/hugepages entries during VM launch. In order to test local migration, disable SELinux using 'setenforce 0'.
Shaju Abraham (6):
  Add VIR_MIGRATE_LOCAL flag to virsh migrate command
  Introduce remote hash tables and helper routines
  Add local migration support in QEMU Migration framework
  Modify close callback routines to handle local migration
  Make PATHs unique for a VM object instance
  Move the domain object from remote to source hash table
 include/libvirt/libvirt-domain.h |   6 +
 src/conf/virdomainobjlist.c      | 232 +++++++++++++++++++++++++++++--
 src/conf/virdomainobjlist.h      |  10 ++
 src/libvirt_private.syms         |   4 +
 src/qemu/qemu_conf.c             |   4 +-
 src/qemu/qemu_domain.c           |  28 +++-
 src/qemu/qemu_domain.h           |   2 +
 src/qemu/qemu_driver.c           |  46 +++++-
 src/qemu/qemu_migration.c        |  59 +++++---
 src/qemu/qemu_migration.h        |   5 +
 src/qemu/qemu_migration_cookie.c | 121 ++++++++--------
 src/qemu/qemu_migration_cookie.h |   2 +
 src/qemu/qemu_process.c          |   3 +-
 src/qemu/qemu_process.h          |   2 +
 src/util/virclosecallbacks.c     |  48 +++++--
 src/util/virclosecallbacks.h     |   3 +
 tools/virsh-domain.c             |   7 +
 17 files changed, 471 insertions(+), 111 deletions(-)

On Mon, Feb 03, 2020 at 10:42:48AM -0300, Daniel Henrique Barboza wrote:
> Hi Daniel,
>
> I am happy that Libvirt is pushing local migration/live patching support, but at the same time I am wondering what changed from what you said here:
Err, this isn't libvirt pushing local migration. I'm simply re-posting these patches on behalf of Shaju who is unable to post the patches due to our broken mail server. Don't take this as meaning that I approve of the patches. They're simply here for discussion as any other patch proposal is.
> https://www.redhat.com/archives/libvir-list/2017-September/msg00489.html
That is largely still my view.
> To give you some background, we have had live patching enhancements in the IBM backlog for a few years, and one of the reasons they were postponed time and time again was the lack of Libvirt support and this direction of "Libvirt is not interested in supporting it". The message above was being used internally as the rationale for that.
Regards, Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org          -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

On 2/3/20 5:43 AM, Daniel P. Berrangé wrote:
[...]
Since localhost migration is difficult, and there will likely be some growing pains until the feature is fully baked, perhaps it is best to have a knob for enabling/disabling it. The namespace feature suffered similar growing pains and having the ability to disable it in qemu.conf proved quite handy at times. Regards, Jim

On Mon, Feb 10, 2020 at 08:45:28AM -0700, Jim Fehlig wrote:
[...]
> Since localhost migration is difficult, and there will likely be some growing pains until the feature is fully baked, perhaps it is best to have a knob for enabling/disabling it. The namespace feature suffered similar growing pains and having the ability to disable it in qemu.conf proved quite handy at times.
Probably an API flag VIR_MIGRATE_SAME_HOST is sufficient, as that shows an opt-in on the part of the person/thing that initiates it. I think we'd want this flag forever, regardless of whether it's experimental or production quality, because there are special concerns about clashing host files/resources.

Regards, Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org          -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
participants (6)
- Daniel Henrique Barboza
- Daniel P. Berrangé
- Jim Fehlig
- Martin Kletzander
- Shaju Abraham
- Shaju Abraham