Jim Fehlig wrote:
> +static int
> +libxlDomainMigratePerform3(virDomainPtr dom,
> + const char *xmlin ATTRIBUTE_UNUSED,
> + const char *cookiein ATTRIBUTE_UNUSED,
> + int cookieinlen ATTRIBUTE_UNUSED,
> + char **cookieout ATTRIBUTE_UNUSED,
> + int *cookieoutlen ATTRIBUTE_UNUSED,
> + const char *dconnuri ATTRIBUTE_UNUSED,
> + const char *uri,
> + unsigned long flags,
> + const char *dname ATTRIBUTE_UNUSED,
> + unsigned long resource ATTRIBUTE_UNUSED)
> +{
> + libxlDriverPrivatePtr driver = dom->conn->privateData;
> + char *hostname = NULL;
> + int port;
> + char *servname = NULL;
> + struct addrinfo hints, *res;
> + int sockfd = -1;
> + int ret = -1;
> +
> + virCheckFlags(LIBXL_MIGRATION_FLAGS, -1);
> +
> + libxlDriverLock(driver);
> +
> + if (doParseURI(uri, &hostname, &port))
> + goto cleanup;
> +
> + VIR_DEBUG("hostname = %s, port = %d", hostname, port);
> +
> + if (virAsprintf(&servname, "%d", port ? port :
DEFAULT_MIGRATION_PORT) < 0) {
> + virReportOOMError();
> + goto cleanup;
> + }
> +
> + if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ){
> + libxlError(VIR_ERR_OPERATION_FAILED,
> + _("Failed to create socket"));
> + goto cleanup;
> + }
> +
> + memset(&hints, 0, sizeof(hints));
> + hints.ai_family = AF_INET;
> + hints.ai_socktype = SOCK_STREAM;
> + if (getaddrinfo(hostname, servname, &hints, &res) || res == NULL) {
> + libxlError(VIR_ERR_OPERATION_FAILED,
> + _("IP address lookup failed"));
> + goto cleanup;
> + }
> +
> + if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0) {
> + libxlError(VIR_ERR_OPERATION_FAILED,
> + _("Connect error"));
> + goto cleanup;
> + }
> +
> + ret = doMigrateSend(dom, flags, sockfd);
>
>
Hmm, the entire driver is locked during the migration. I just noticed
libxlDomainSave{Flags} also holds the driver lock during save :-(.
libxlDomainCoreDump drops the lock during the dump and IMO migration
should follow the same pattern.
After some more testing, following the pattern in libxlDomainCoreDump is
not a good idea as it leads to a deadlock.
# virsh dump dom /var/lib/libvirt/libxl/save/test.dump
d1. lock driver
d2. get virDomainObjPtr for domain, acquiring dom obj lock
d3. unlock driver
d4. core dump domain
d5. lock driver
d6. do post dump stuff
d7. unlock driver
d8. unlock dom obj lock
While d4 is in progress, list domains
# virsh list
l1. get num domains
l2. lock driver
l3. call virDomainObjListNumOfDomains, which iterates through domains,
obtaining dom obj lock in process
l3 will block when trying to obtain dom obj lock for the domain being
dumped, and note that it is holding the driver lock. When d4 is
finished, it will attempt to acquire the driver lock - deadlock. A
quick fix would be to lock the driver for duration of the dump, similar
to save. But we really need to prevent locking the driver during these
long running operations.
Question for other libvirt devs:
Many of the libxl driver functions use this pattern
- lock driver
- vm = virDomainFindByUUID // acquires dom obj lock
- unlock driver
- do stuff
- virDomainObjUnlock
In some cases, "do stuff" requires obtaining the driver lock (e.g.
removing a domain from driver's domain list), in which case there is
potential for deadlock. Any suggestions for preventing the deadlock
*and* avoiding locking the driver during long running operations?
Thanks,
Jim