[libvirt] [PATCH] Fix failing virGetHostname.

We've been running into a lot of situations where virGetHostname() is returning "localhost", where a plain gethostname() would have returned the correct thing. This is because virGetHostname() is *always* trying to canonicalize the name returned from gethostname(), even when it doesn't have to. This patch changes virGetHostname so that if the value returned from gethostname() is already FQDN or localhost, it returns that string directly. If the value returned from gethostname() is a shortened hostname, then we try to canonicalize it. If that succeeds, we returned the canonicalized hostname. If that fails, and/or returns "localhost", then we just return the original string we got from gethostname() and hope for the best. Note that after this patch it is up to clients to check whether "localhost" is an allowed return value. The only place where it's currently not is in qemu migration. Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/libvirt_private.syms | 1 - src/qemu/qemu_driver.c | 15 +++++-- src/util/util.c | 94 ++++++++++++++++++++++++--------------------- src/util/util.h | 1 - 4 files changed, 60 insertions(+), 51 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 104278f..130a2a1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -653,7 +653,6 @@ virExecDaemonize; virSetCloseExec; virSetNonBlock; virFormatMacAddr; -virGetHostnameLocalhost; virGetHostname; virParseMacAddr; virFileDeletePid; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0af64c7..cfa5964 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10017,7 +10017,7 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; int this_port; - char *hostname; + char *hostname = NULL; char migrateFrom [64]; const char *p; virDomainEventPtr event = NULL; @@ -10057,9 +10057,15 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, if (port == QEMUD_MIGRATION_NUM_PORTS) port = 0; /* Get hostname */ - if ((hostname = virGetHostnameLocalhost(0)) == NULL) + if ((hostname = virGetHostname(NULL)) == NULL) goto cleanup; + if (STRPREFIX(hostname, "localhost")) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("hostname on destination resolved to localhost, but migration requires an FQDN")); + goto cleanup; + } + /* XXX this really should have been a properly well-formed * URI, but we can't add in tcp:// now without breaking * compatability with old targets. We at least make the @@ -10067,7 +10073,6 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, */ /* Caller frees */ internalret = virAsprintf(uri_out, "tcp:%s:%d", hostname, this_port); - VIR_FREE(hostname); if (internalret < 0) { virReportOOMError(); goto cleanup; @@ -10171,10 +10176,10 @@ endjob: vm = NULL; cleanup: + VIR_FREE(hostname); virDomainDefFree(def); - if (ret != 0) { + if (ret != 0) VIR_FREE(*uri_out); - } if (vm) virDomainObjUnlock(vm); if (event) diff --git a/src/util/util.c b/src/util/util.c index e937d39..930bfac 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -2367,11 +2367,31 @@ char *virIndexToDiskName(int idx, const char *prefix) # define AI_CANONIDN 0 #endif -char *virGetHostnameLocalhost(int allow_localhost) +/* Who knew getting a hostname could be so delicate. In Linux (and Unices + * in general), many things depend on "hostname" returning a value that will + * resolve one way or another. In the modern world where networks frequently + * come and go this is often being hard-coded to resolve to "localhost". If + * it *doesn't* resolve to localhost, then we would prefer to have the FQDN. + * That leads us to 3 possibilities: + * + * 1) gethostname() returns an FQDN (not localhost) - we return the string + * as-is, it's all of the information we want + * 2) gethostname() returns "localhost" - we return localhost; doing further + * work to try to resolve it is pointless + * 3) gethostname() returns a shortened hostname - in this case, we want to + * try to resolve this to a fully-qualified name. Therefore we pass it + * to getaddrinfo(). There are two possible responses: + * a) getaddrinfo() resolves to a FQDN - return the FQDN + * b) getaddrinfo() resolves to localhost - in this case, the data we got + * from gethostname() is actually more useful than what we got from + * getaddrinfo(). Return the value from gethostname() and hope for + * the best. + */ +char *virGetHostname(virConnectPtr conn ATTRIBUTE_UNUSED) { int r; char hostname[HOST_NAME_MAX+1], *result; - struct addrinfo hints, *info, *res; + struct addrinfo hints, *info; r = gethostname (hostname, sizeof(hostname)); if (r == -1) { @@ -2381,6 +2401,21 @@ char *virGetHostnameLocalhost(int allow_localhost) } NUL_TERMINATE(hostname); + if (STRPREFIX(hostname, "localhost") || strchr(hostname, '.')) { + /* in this case, gethostname returned localhost (meaning we can't + * do any further canonicalization), or it returned an FQDN (and + * we don't need to do any further canonicalization). Return the + * string as-is; it's up to callers to check whether "localhost" + * is allowed. + */ + result = strdup(hostname); + goto check_and_return; + } + + /* otherwise, it's a shortened, non-localhost, hostname. Attempt to + * canonicalize the hostname by running it through getaddrinfo + */ + memset(&hints, 0, sizeof(hints)); hints.ai_flags = AI_CANONNAME|AI_CANONIDN; hints.ai_family = AF_UNSPEC; @@ -2395,54 +2430,25 @@ char *virGetHostnameLocalhost(int allow_localhost) /* Tell static analyzers about getaddrinfo semantics. */ sa_assert (info); - /* if we aren't allowing localhost, then we iterate through the - * list and make sure none of the IPv4 addresses are 127.0.0.1 and - * that none of the IPv6 addresses are ::1 - */ - if (!allow_localhost) { - res = info; - while (res) { - if (res->ai_family == AF_INET) { - if (htonl(((struct sockaddr_in *)res->ai_addr)->sin_addr.s_addr) == INADDR_LOOPBACK) { - virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", - _("canonical hostname pointed to localhost, but this is not allowed")); - freeaddrinfo(info); - return NULL; - } - } - else if (res->ai_family == AF_INET6) { - if (IN6_IS_ADDR_LOOPBACK(&((struct sockaddr_in6 *)res->ai_addr)->sin6_addr)) { - virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", - _("canonical hostname pointed to localhost, but this is not allowed")); - freeaddrinfo(info); - return NULL; - } - } - res = res->ai_next; - } - } + if (info->ai_canonname == NULL || + STRPREFIX(info->ai_canonname, "localhost")) + /* in this case, we tried to canonicalize and we ended up back with + * localhost. Ignore the canonicalized name and just return the + * original hostname + */ + result = strdup(hostname); + else + /* Caller frees this string. */ + result = strdup (info->ai_canonname); - if (info->ai_canonname == NULL) { - virUtilError(VIR_ERR_INTERNAL_ERROR, - "%s", _("could not determine canonical host name")); - freeaddrinfo(info); - return NULL; - } + freeaddrinfo(info); - /* Caller frees this string. */ - result = strdup (info->ai_canonname); - if (!result) +check_and_return: + if (result == NULL) virReportOOMError(); - - freeaddrinfo(info); return result; } -char *virGetHostname(virConnectPtr conn ATTRIBUTE_UNUSED) -{ - return virGetHostnameLocalhost(1); -} - /* send signal to a single process */ int virKillProcess(pid_t pid, int sig) { diff --git a/src/util/util.h b/src/util/util.h index 6bf6bcc..f8b64c2 100644 --- a/src/util/util.h +++ b/src/util/util.h @@ -252,7 +252,6 @@ static inline int getuid (void) { return 0; } static inline int getgid (void) { return 0; } # endif -char *virGetHostnameLocalhost(int allow_localhost); char *virGetHostname(virConnectPtr conn); int virKillProcess(pid_t pid, int sig); -- 1.6.6.1

We shouldn't be checking validity in domain_conf, since it can be used by multiple different hosts and hypervisors. Remove the check completely. Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/conf/domain_conf.c | 12 ++---------- 1 files changed, 2 insertions(+), 10 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 21c3b07..2658b43 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1801,12 +1801,6 @@ cleanup: } -static bool -isValidIfname(const char *ifname) { - return ifname[strspn(ifname, VALID_IFNAME_CHARS)] == 0; -} - - /* Parse the XML definition for a network interface * @param node XML nodeset to parse for net definition * @return 0 on success, -1 on failure @@ -1889,11 +1883,9 @@ virDomainNetDefParseXML(virCapsPtr caps, xmlStrEqual(cur->name, BAD_CAST "target")) { ifname = virXMLPropString(cur, "dev"); if ((ifname != NULL) && - (((flags & VIR_DOMAIN_XML_INACTIVE) && - (STRPREFIX((const char*)ifname, "vnet"))) || - (!isValidIfname(ifname)))) { + ((flags & VIR_DOMAIN_XML_INACTIVE) && + (STRPREFIX((const char*)ifname, "vnet")))) { /* An auto-generated target name, blank it out */ - /* blank out invalid interface names */ VIR_FREE(ifname); } } else if ((script == NULL) && -- 1.6.6.1

On 05/20/2010 01:57 PM, Chris Lalancette wrote:
We shouldn't be checking validity in domain_conf, since it can be used by multiple different hosts and hypervisors. Remove the check completely.
-static bool -isValidIfname(const char *ifname) { - return ifname[strspn(ifname, VALID_IFNAME_CHARS)] == 0; -}
Conditional ACK - you also need to nuke the (now unused) definition of VALID_IFNAME_CHARS in domain_conf.h before pushing. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 05/20/2010 05:30 PM, Eric Blake wrote:
On 05/20/2010 01:57 PM, Chris Lalancette wrote:
We shouldn't be checking validity in domain_conf, since it can be used by multiple different hosts and hypervisors. Remove the check completely.
-static bool -isValidIfname(const char *ifname) { - return ifname[strspn(ifname, VALID_IFNAME_CHARS)] == 0; -}
Conditional ACK - you also need to nuke the (now unused) definition of VALID_IFNAME_CHARS in domain_conf.h before pushing.
Yeah, true. I've pushed a combination of my and Charles Duffy's patch. Thanks for the review. Charles, could you give the head of the libvirt tree another test just to make sure this works for your use-case? Thanks. -- Chris Lalancette

Basic live migration was broken by the commit that added non-shared block support. It also added a couple of built-in memory leaks. This patch fixes these problems and allows normal live migration to work again. Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/qemu/qemu_driver.c | 3 --- src/qemu/qemu_monitor_text.c | 10 +++++++--- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index cfa5964..bbab893 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10205,9 +10205,6 @@ static int doNativeMigrate(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; unsigned int background_flags = 0; - virCheckFlags(VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC, - -1); - /* Issue the migrate command. */ if (STRPREFIX(uri, "tcp:") && !STRPREFIX(uri, "tcp://")) { /* HACK: source host generates bogus URIs, so fix them up */ diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 3df078a..41a1fab 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -1134,6 +1134,7 @@ static int qemuMonitorTextMigrate(qemuMonitorPtr mon, int ret = -1; char *safedest = qemuMonitorEscapeArg(dest); virBuffer extra = VIR_BUFFER_INITIALIZER; + char *extrastr = NULL; if (!safedest) { virReportOOMError(); @@ -1149,10 +1150,12 @@ static int qemuMonitorTextMigrate(qemuMonitorPtr mon, if (virBufferError(&extra)) { virBufferFreeAndReset(&extra); virReportOOMError(); - free(safedest); - return -1; + goto cleanup; } - if (virAsprintf(&cmd, "migrate %s\"%s\"", virBufferContentAndReset(&extra), safedest) < 0) { + + extrastr = virBufferContentAndReset(&extra); + if (virAsprintf(&cmd, "migrate %s\"%s\"", extra ? extrastr : "", + safedest) < 0) { virReportOOMError(); goto cleanup; } @@ -1181,6 +1184,7 @@ static int qemuMonitorTextMigrate(qemuMonitorPtr mon, ret = 0; cleanup: + VIR_FREE(extrastr); VIR_FREE(safedest); VIR_FREE(info); VIR_FREE(cmd); -- 1.6.6.1

On 05/20/2010 01:57 PM, Chris Lalancette wrote:
Basic live migration was broken by the commit that added non-shared block support. It also added a couple of built-in memory leaks. This patch fixes these problems and allows normal live migration to work again.
Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/qemu/qemu_driver.c | 3 --- src/qemu/qemu_monitor_text.c | 10 +++++++--- 2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index cfa5964..bbab893 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10205,9 +10205,6 @@ static int doNativeMigrate(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; unsigned int background_flags = 0;
- virCheckFlags(VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC, - -1); -
Why are you removing the valid flag check altogether? Is it a matter of adding more valid flags, or is this just a forwarding routine to other methods that also do a valid flag check?
/* Issue the migrate command. */ if (STRPREFIX(uri, "tcp:") && !STRPREFIX(uri, "tcp://")) { /* HACK: source host generates bogus URIs, so fix them up */ diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 3df078a..41a1fab 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c
The changes to qemu_monitor_text.c make sense, but I'd like to understand the qemu_driver.c change before giving an ack. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 05/20/2010 05:35 PM, Eric Blake wrote:
On 05/20/2010 01:57 PM, Chris Lalancette wrote:
Basic live migration was broken by the commit that added non-shared block support. It also added a couple of built-in memory leaks. This patch fixes these problems and allows normal live migration to work again.
Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/qemu/qemu_driver.c | 3 --- src/qemu/qemu_monitor_text.c | 10 +++++++--- 2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index cfa5964..bbab893 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10205,9 +10205,6 @@ static int doNativeMigrate(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; unsigned int background_flags = 0;
- virCheckFlags(VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC, - -1); -
Why are you removing the valid flag check altogether? Is it a matter of adding more valid flags, or is this just a forwarding routine to other methods that also do a valid flag check?
Yeah, I guess I should have been a bit more explicit here. There's no hard-and-fast rules about virCheckFlags(), but the general usage of it seems to be directly at the entry points to the driver (i.e. in qemuDomainSuspend(), etc). doNativeMigrate() is not an entry point, it is a helper function. In addition, there is a long list of flags that migration supports, so the above list is far from complete. Would it be better if I added the check to qemuDomainMigratePerform() (with the complete list of flags), which is actually the entry point for this function? -- Chris Lalancette

On 05/21/2010 07:22 AM, Chris Lalancette wrote:
- virCheckFlags(VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC, - -1); -
Why are you removing the valid flag check altogether? Is it a matter of adding more valid flags, or is this just a forwarding routine to other methods that also do a valid flag check?
Yeah, I guess I should have been a bit more explicit here. There's no hard-and-fast rules about virCheckFlags(), but the general usage of it seems to be directly at the entry points to the driver (i.e. in qemuDomainSuspend(), etc). doNativeMigrate() is not an entry point, it is a helper function. In addition, there is a long list of flags that migration supports, so the above list is far from complete. Would it be better if I added the check to qemuDomainMigratePerform() (with the complete list of flags), which is actually the entry point for this function?
Sounds good to me - if all entry points filter on all accepted flags, then helper functions can assume that flags are already valid. As long as the filtering gets done somewhere, we've left the door open for adding future flags while still correctly identifying situations where we are talking to older implementations that can't honor new flags. It's only when there is no flag filtering at all that we've locked ourselves out of easy-to-validate future extensions. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 05/21/2010 12:48 PM, Eric Blake wrote:
On 05/21/2010 07:22 AM, Chris Lalancette wrote:
- virCheckFlags(VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC, - -1); -
Why are you removing the valid flag check altogether? Is it a matter of adding more valid flags, or is this just a forwarding routine to other methods that also do a valid flag check?
Yeah, I guess I should have been a bit more explicit here. There's no hard-and-fast rules about virCheckFlags(), but the general usage of it seems to be directly at the entry points to the driver (i.e. in qemuDomainSuspend(), etc). doNativeMigrate() is not an entry point, it is a helper function. In addition, there is a long list of flags that migration supports, so the above list is far from complete. Would it be better if I added the check to qemuDomainMigratePerform() (with the complete list of flags), which is actually the entry point for this function?
Sounds good to me - if all entry points filter on all accepted flags, then helper functions can assume that flags are already valid. As long as the filtering gets done somewhere, we've left the door open for adding future flags while still correctly identifying situations where we are talking to older implementations that can't honor new flags. It's only when there is no flag filtering at all that we've locked ourselves out of easy-to-validate future extensions.
Unfortunately doing this caused a bit of churn in the qemu driver. qemudDomainMigrate takes an unsigned long as flags instead of unsigned int, which required me to create two new macros: virCheckFlagsUI and virCheckFlagsUL. The good news is that with this patch in place we are doing more checking of the flags during migration, which is probably a good thing. A new patch is coming up. -- Chris Lalancette

On 05/24/2010 02:20 PM, Chris Lalancette wrote:
Sounds good to me - if all entry points filter on all accepted flags, then helper functions can assume that flags are already valid. As long as the filtering gets done somewhere, we've left the door open for adding future flags while still correctly identifying situations where we are talking to older implementations that can't honor new flags. It's only when there is no flag filtering at all that we've locked ourselves out of easy-to-validate future extensions.
Unfortunately doing this caused a bit of churn in the qemu driver. qemudDomainMigrate takes an unsigned long as flags instead of unsigned int, which required me to create two new macros: virCheckFlagsUI and virCheckFlagsUL. The good news is that with this patch in place we are doing more checking of the flags during migration, which is probably a good thing. A new patch is coming up.
Hmm, since virCheckFlags() is already a macro in the first place, can we use some sizeof(flags) magic to get it to polymorphically do the right thing rather than having to invent an alternate spelling? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 05/24/2010 04:34 PM, Eric Blake wrote:
On 05/24/2010 02:20 PM, Chris Lalancette wrote:
Sounds good to me - if all entry points filter on all accepted flags, then helper functions can assume that flags are already valid. As long as the filtering gets done somewhere, we've left the door open for adding future flags while still correctly identifying situations where we are talking to older implementations that can't honor new flags. It's only when there is no flag filtering at all that we've locked ourselves out of easy-to-validate future extensions.
Unfortunately doing this caused a bit of churn in the qemu driver. qemudDomainMigrate takes an unsigned long as flags instead of unsigned int, which required me to create two new macros: virCheckFlagsUI and virCheckFlagsUL. The good news is that with this patch in place we are doing more checking of the flags during migration, which is probably a good thing. A new patch is coming up.
Hmm, since virCheckFlags() is already a macro in the first place, can we use some sizeof(flags) magic to get it to polymorphically do the right thing rather than having to invent an alternate spelling?
We discussed this on IRC. Unfortunately I don't think you can use sizeof() directly, since on 32-bit sizeof(unsigned int) == sizeof(unsigned long). However, what I've done instead is to unconditionally promote flags to an unsigned long and then use %lx. That seems to work. I'll post v3 of the patch in a moment. -- Chris Lalancette

2010/5/25 Chris Lalancette <clalance@redhat.com>:
On 05/24/2010 04:34 PM, Eric Blake wrote:
On 05/24/2010 02:20 PM, Chris Lalancette wrote:
Sounds good to me - if all entry points filter on all accepted flags, then helper functions can assume that flags are already valid. As long as the filtering gets done somewhere, we've left the door open for adding future flags while still correctly identifying situations where we are talking to older implementations that can't honor new flags. It's only when there is no flag filtering at all that we've locked ourselves out of easy-to-validate future extensions.
Unfortunately doing this caused a bit of churn in the qemu driver. qemudDomainMigrate takes an unsigned long as flags instead of unsigned int, which required me to create two new macros: virCheckFlagsUI and virCheckFlagsUL. The good news is that with this patch in place we are doing more checking of the flags during migration, which is probably a good thing. A new patch is coming up.
Hmm, since virCheckFlags() is already a macro in the first place, can we use some sizeof(flags) magic to get it to polymorphically do the right thing rather than having to invent an alternate spelling?
We discussed this on IRC. Unfortunately I don't think you can use sizeof() directly, since on 32-bit sizeof(unsigned int) == sizeof(unsigned long). However, what I've done instead is to unconditionally promote flags to an unsigned long and then use %lx. That seems to work. I'll post v3 of the patch in a moment.
Well, the actual question is why does qemudDomainMigrate take an unsigned long? If it's because it should be able to take more then 32 flags then unsigned long is the wrong type obviously, because unsigned long is just 32 bit width on 32 bit platforms. For that case qemudDomainMigrate has to have an unsigned long long flags parameter. If qemudDomainMigrate doesn't need to support more than 32 flags then lets just change unsigned long to unsigned int there. If we want to extend virCheckFlags to cover up to 64 flags properly we have to use unsigned long long (64 bit width on 32 and 64 bit platforms) instead of unsigned long. Unfortunately the public API member virDomainMigrate takes a unsigned long flags parameter, but effectively limited to a 32 flags maximum, and we are stuck with that. Matthias

On 05/25/2010 10:52 AM, Matthias Bolte wrote:
Well, the actual question is why does qemudDomainMigrate take an unsigned long? If it's because it should be able to take more then 32 flags then unsigned long is the wrong type obviously, because unsigned long is just 32 bit width on 32 bit platforms. For that case qemudDomainMigrate has to have an unsigned long long flags parameter. If qemudDomainMigrate doesn't need to support more than 32 flags then lets just change unsigned long to unsigned int there.
I don't really think it needs to support more than 32 flags. However, I don't think we can change it; on 64-bit, sizeof(unsigned long) == 8 while sizeof(unsigned int) == 4. -- Chris Lalancette

On 05/20/2010 03:57 PM, Chris Lalancette wrote:
We've been running into a lot of situations where virGetHostname() is returning "localhost", where a plain gethostname() would have returned the correct thing. This is because virGetHostname() is *always* trying to canonicalize the name returned from gethostname(), even when it doesn't have to.
This patch changes virGetHostname so that if the value returned from gethostname() is already FQDN or localhost, it returns that string directly. If the value returned from gethostname() is a shortened hostname, then we try to canonicalize it. If that succeeds, we returned the canonicalized hostname. If that fails, and/or returns "localhost", then we just return the original string we got from gethostname() and hope for the best.
Note that after this patch it is up to clients to check whether "localhost" is an allowed return value. The only place where it's currently not is in qemu migration.
Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/libvirt_private.syms | 1 - src/qemu/qemu_driver.c | 15 +++++-- src/util/util.c | 94 ++++++++++++++++++++++++--------------------- src/util/util.h | 1 - 4 files changed, 60 insertions(+), 51 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 104278f..130a2a1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -653,7 +653,6 @@ virExecDaemonize; virSetCloseExec; virSetNonBlock; virFormatMacAddr; -virGetHostnameLocalhost; virGetHostname; virParseMacAddr; virFileDeletePid; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0af64c7..cfa5964 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10017,7 +10017,7 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; int this_port; - char *hostname; + char *hostname = NULL; char migrateFrom [64]; const char *p; virDomainEventPtr event = NULL; @@ -10057,9 +10057,15 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, if (port == QEMUD_MIGRATION_NUM_PORTS) port = 0;
/* Get hostname */ - if ((hostname = virGetHostnameLocalhost(0)) == NULL) + if ((hostname = virGetHostname(NULL)) == NULL) goto cleanup;
+ if (STRPREFIX(hostname, "localhost")) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("hostname on destination resolved to localhost, but migration requires an FQDN")); + goto cleanup; + } + /* XXX this really should have been a properly well-formed * URI, but we can't add in tcp:// now without breaking * compatability with old targets. We at least make the @@ -10067,7 +10073,6 @@ qemudDomainMigratePrepare2 (virConnectPtr dconn, */ /* Caller frees */ internalret = virAsprintf(uri_out, "tcp:%s:%d", hostname, this_port); - VIR_FREE(hostname); if (internalret < 0) { virReportOOMError(); goto cleanup; @@ -10171,10 +10176,10 @@ endjob: vm = NULL;
cleanup: + VIR_FREE(hostname); virDomainDefFree(def); - if (ret != 0) { + if (ret != 0) VIR_FREE(*uri_out); - } if (vm) virDomainObjUnlock(vm); if (event) diff --git a/src/util/util.c b/src/util/util.c index e937d39..930bfac 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -2367,11 +2367,31 @@ char *virIndexToDiskName(int idx, const char *prefix) # define AI_CANONIDN 0 #endif
-char *virGetHostnameLocalhost(int allow_localhost) +/* Who knew getting a hostname could be so delicate. In Linux (and Unices + * in general), many things depend on "hostname" returning a value that will + * resolve one way or another. In the modern world where networks frequently + * come and go this is often being hard-coded to resolve to "localhost". If + * it *doesn't* resolve to localhost, then we would prefer to have the FQDN. + * That leads us to 3 possibilities: + * + * 1) gethostname() returns an FQDN (not localhost) - we return the string + * as-is, it's all of the information we want + * 2) gethostname() returns "localhost" - we return localhost; doing further + * work to try to resolve it is pointless + * 3) gethostname() returns a shortened hostname - in this case, we want to + * try to resolve this to a fully-qualified name. Therefore we pass it + * to getaddrinfo(). There are two possible responses: + * a) getaddrinfo() resolves to a FQDN - return the FQDN + * b) getaddrinfo() resolves to localhost - in this case, the data we got + * from gethostname() is actually more useful than what we got from + * getaddrinfo(). Return the value from gethostname() and hope for + * the best. + */ +char *virGetHostname(virConnectPtr conn ATTRIBUTE_UNUSED) {
Hmm, why isn't one of the fallback options: if (conn) hostname = parse_uri(conn.get_uri()) if hostname != "localhost": return hostname Seems like if MigratePrepare2 dconn is the remote connection, we are guaranteed to have a resolvable hostname in the URI (well, resolvable to the source connection at least). - Cole

On 05/21/2010 12:27 PM, Cole Robinson wrote:
Hmm, why isn't one of the fallback options:
if (conn) hostname = parse_uri(conn.get_uri()) if hostname != "localhost": return hostname
Seems like if MigratePrepare2 dconn is the remote connection, we are guaranteed to have a resolvable hostname in the URI (well, resolvable to the source connection at least).
(sorry I was late in replying to this) The thing is, I don't think that is generally going to be useful. At the moment, all of the callers pass NULL as conn. Even if we were to make some of the callers pass a real conn, because of the way the remote driver works it doesn't really map back to the IP address you care about. That is, at the place where you are running virsh (for instance), conn.uri() is actually something useful. But by the time you've remoted through the remote_driver and made it to the remote libvirtd on the destination of the migration, you are really doing a "local" connection from libvirtd to the qemu driver, so conn.uri() basically contains "qemu:///system", which is not particularly helpful. -- Chris Lalancette

On 05/24/2010 04:17 PM, Chris Lalancette wrote:
On 05/21/2010 12:27 PM, Cole Robinson wrote:
Hmm, why isn't one of the fallback options:
if (conn) hostname = parse_uri(conn.get_uri()) if hostname != "localhost": return hostname
Seems like if MigratePrepare2 dconn is the remote connection, we are guaranteed to have a resolvable hostname in the URI (well, resolvable to the source connection at least).
(sorry I was late in replying to this)
The thing is, I don't think that is generally going to be useful. At the moment, all of the callers pass NULL as conn. Even if we were to make some of the callers pass a real conn, because of the way the remote driver works it doesn't really map back to the IP address you care about. That is, at the place where you are running virsh (for instance), conn.uri() is actually something useful. But by the time you've remoted through the remote_driver and made it to the remote libvirtd on the destination of the migration, you are really doing a "local" connection from libvirtd to the qemu driver, so conn.uri() basically contains "qemu:///system", which is not particularly helpful.
My bad, I figured dconn had the same URI on the dest host as it did on the source host. ACK to this patch then. Maybe for the future we should consider always building a manual migrate URI in virsh.c (or maybe libvirt.c), using the destination connection URI hostname. This is what we do in virt-manager, since it's really it gives the closest guarantee for success. - Cole

On 05/24/2010 04:50 PM, Cole Robinson wrote:
My bad, I figured dconn had the same URI on the dest host as it did on the source host. ACK to this patch then.
Maybe for the future we should consider always building a manual migrate URI in virsh.c (or maybe libvirt.c), using the destination connection URI hostname. This is what we do in virt-manager, since it's really it gives the closest guarantee for success.
Yeah, to be perfectly frank, the fact that the destination of the migration tells us the hostname to migrate to is actually a design bug in the live migration protocol. I tried to do something like you suggest, but I couldn't come up with a way that would support both old and new clients and servers and not be qemu-specific. For now, I think this is the best we can do. I think the correct answer in the medium to long term is to make tunnelled migration the default, since we are already certain that the libvirt control channel works. -- Chris Lalancette

On 05/24/2010 04:50 PM, Cole Robinson wrote:
On 05/24/2010 04:17 PM, Chris Lalancette wrote:
On 05/21/2010 12:27 PM, Cole Robinson wrote:
Hmm, why isn't one of the fallback options:
if (conn) hostname = parse_uri(conn.get_uri()) if hostname != "localhost": return hostname
Seems like if MigratePrepare2 dconn is the remote connection, we are guaranteed to have a resolvable hostname in the URI (well, resolvable to the source connection at least).
(sorry I was late in replying to this)
The thing is, I don't think that is generally going to be useful. At the moment, all of the callers pass NULL as conn. Even if we were to make some of the callers pass a real conn, because of the way the remote driver works it doesn't really map back to the IP address you care about. That is, at the place where you are running virsh (for instance), conn.uri() is actually something useful. But by the time you've remoted through the remote_driver and made it to the remote libvirtd on the destination of the migration, you are really doing a "local" connection from libvirtd to the qemu driver, so conn.uri() basically contains "qemu:///system", which is not particularly helpful.
My bad, I figured dconn had the same URI on the dest host as it did on the source host. ACK to this patch then.
Maybe for the future we should consider always building a manual migrate URI in virsh.c (or maybe libvirt.c), using the destination connection URI hostname. This is what we do in virt-manager, since it's really it gives the closest guarantee for success.
Thanks for the review. I've now pushed this patch, as it seems to resolve most of the problems we are having with resolving localhost. -- Chris Lalancette

On Fri, May 21, 2010 at 12:27:27PM -0400, Cole Robinson wrote:
On 05/20/2010 03:57 PM, Chris Lalancette wrote:
We've been running into a lot of situations where virGetHostname() is returning "localhost", where a plain gethostname() would have returned the correct thing. This is because virGetHostname() is *always* trying to canonicalize the name returned from gethostname(), even when it doesn't have to.
This patch changes virGetHostname so that if the value returned from gethostname() is already FQDN or localhost, it returns that string directly. If the value returned from gethostname() is a shortened hostname, then we try to canonicalize it. If that succeeds, we returned the canonicalized hostname. If that fails, and/or returns "localhost", then we just return the original string we got from gethostname() and hope for the best.
Note that after this patch it is up to clients to check whether "localhost" is an allowed return value. The only place where it's currently not is in qemu migration.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Hmm, why isn't one of the fallback options:
if (conn) hostname = parse_uri(conn.get_uri()) if hostname != "localhost": return hostname
Seems like if MigratePrepare2 dconn is the remote connection, we are guaranteed to have a resolvable hostname in the URI (well, resolvable to the source connection at least).
In the context of migration, the 'dconn' hostname is potentially troublesome. For normal migration, the hostname is what the libvirt client app needs, which may not be relevant for the source host. eg I use SSH based connections for all my machines, and have aliases in $HOME/.ssh/config. So I just use qemu+ssh://machinealias/system where 'machinealias' is only meaningful wrt to my $HOME/.ssh/config. Similar problems occur if you use a SSH tunnel. If doing PEER2PEER migration, then we probably could more reasonably rely on the dconn hostname, since that's got to be suitable for the source host to use. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
participants (5)
-
Chris Lalancette
-
Cole Robinson
-
Daniel P. Berrange
-
Eric Blake
-
Matthias Bolte