[libvirt] qemu_driver migrateuri handling broken?

Hi all.

Short summary on DV's request ;)

I ran into a problem migrating kvm machines with libvirt-0.6.5:

1) At first, using the same syntax for the migrateuri as with xen (just the IP) did not work... looking into the source code (! ;) ), I found a different syntax for qemu.

2) using tcp://<ip>:<port> just produced an 'unknown failure' on the receiving side:

root@loadgen137:~> virsh -c qemu:///system migrate --live kvm-testnode-vnode3 qemu+tcp://10.192.11.136/system?no_verify=1 tcp://10.192.11.136:12345
error: Unknown failure

(Note: it was working like a charm when I eliminated the migrateuri altogether, but this was not an option in all my work environments, since one of them doesn't feature DNS, breaking migration on the hostname alone (and keeping /etc/hosts in sync on n machines is something I'd like to avoid, if possible ;) ))

3) removing the case distinction and the handling of the migrateuri in the qemudDomainMigratePrepare2 function in qemu_driver.c entirely (if-statement, and full else-part) solved both my issues.

---

The issue seems to be current with 0.7.1 as well (at least I did receive the same error message - I did not investigate further)

---

My 2ct: ;)

I think that a wrapper like libvirt should consistently abstract lower level details (such as 'qemu', or 'xen') as much as possible (that's what libvirt is for, I guess :) ). Since the removal of the URI handling solves both the issue _and_ eliminates a case distinction between xen and qemu (painlessly, as far as I can see), I'd suggest perhaps considering the removal. :)

As to the sending side specification of transfer ports: I think there should always be a possibility to leave port selection up to libvirt, as the sending side does not necessarily have an overview of the free ports on the receiving side. (I feel that the different syntax, including the mandatory :port part, was motivated by the wish to allow users to specify ports on the sending side)

Cheers,
Gregor Schaffrath.

On Wed, Sep 16, 2009 at 05:42:50PM +0200, Gregor Schaffrath wrote:
> Hi all.
> Short summary on DV's request ;)
> I ran into a problem migrating kvm machines with libvirt-0.6.5:
> 1) At first, using the same syntax for the migrateuri as with xen (just the IP) did not work... looking into the source code (! ;) ), I found a different syntax for qemu.

The URI schemes should be listed in the driver capabilities XML. The reason they are different is that they are two different ways of doing migration. We are working on a new tunnelled migration scheme that will be uniform across drivers.

> 2) using tcp://<ip>:<port> just produced an 'unknown failure' on the receiving side:
> root@loadgen137:~> virsh -c qemu:///system migrate --live kvm-testnode-vnode3 qemu+tcp://10.192.11.136/system?no_verify=1 tcp://10.192.11.136:12345
> error: Unknown failure
> (Note: it was working like a charm when I eliminated the migrateuri altogether,

Hmm, try tcp:10.192.11.136:12345 instead - for some unknown reason it is not using correct URI formats.

> 3) removing the case distinction and the handling of the migrateuri in the qemudDomainMigratePrepare2 function in qemu_driver.c entirely (if-statement, and full else-part) solved both my issues.

I don't know what exactly you removed, but you'll almost certainly have broken something else.

daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Mon, Sep 21, 2009 at 12:46:46PM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 16, 2009 at 05:42:50PM +0200, Gregor Schaffrath wrote:
>> Hi all.
>> Short summary on DV's request ;)
>> I ran into a problem migrating kvm machines with libvirt-0.6.5:
>> 1) At first, using the same syntax for the migrateuri as with xen (just the IP) did not work... looking into the source code (! ;) ), I found a different syntax for qemu.
>
> The URI schemes should be listed in the driver capabilities XML. The reason they are different is that they are two different ways of doing migration.
> We are working on a new tunnelled migration scheme that will be uniform across drivers.

ic - To be honest, I was confused by the migrateuri anyhow, since I considered the situation where libvirt traffic is tunneled via SSH but Xen-/KVM-/foo communication may be direct to be a rather rare exception (or am I mistaken?) (I understood that this was the rationale behind the hostname-Query in the first place)

>> 2) using tcp://<ip>:<port> just produced an 'unknown failure' on the receiving side:
>> root@loadgen137:~> virsh -c qemu:///system migrate --live kvm-testnode-vnode3 qemu+tcp://10.192.11.136/system?no_verify=1 tcp://10.192.11.136:12345
>> error: Unknown failure
>> (Note: it was working like a charm when I eliminated the migrateuri altogether,
>
> Hmm, try tcp:10.192.11.136:12345 instead - for some unknown reason it is not using correct URI formats.

works indeed :)

>> 3) removing the case distinction and the handling of the migrateuri in the qemudDomainMigratePrepare2 function in qemu_driver.c entirely (if-statement, and full else-part) solved both my issues.
>
> I don't know what exactly you removed, but you'll almost certainly have broken something else.
the part doesn't do much besides setting the port - the hostname is even ignored ;) ... and the error comes from the receiving side - not the sending one. Therefore as far as I see, the only thing broken is that now the sending side can't choose the listening port number on the receiving side (is this a debugging feature?)

Cheers & thx for the response,
Gregor.

---snip---
root@loadgen137:/usr/src# diff libvirt-0.6.5/src/qemu_driver.c libvirt-0.6.5.modified/src/qemu_driver.c
4862c4862
< if (uri_in == NULL) {
---
> //if (uri_in == NULL) {
4878c4878
< } else {
---
> //} else {
4883c4883
< if (!STRPREFIX (uri_in, "tcp:")) {
---
> /* if (!STRPREFIX (uri_in, "tcp:")) {
4887c4887
< }
---
> }*/
4890,4892c4890,4892
< p = strrchr (uri_in, ':');
< p++; /* definitely has a ':' in it, see above */
< this_port = virParseNumber (&p);
---
> // p = strrchr (uri_in, ':');
> // p++; /* definitely has a ':' in it, see above */
> /* this_port = virParseNumber (&p);
4898c4898
< }
---
> }*/
---snip---
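For readers without the source at hand, the check that the diff comments out can be approximated as follows. This is only a sketch modelled on the three lines visible in the diff (a "tcp:" prefix test, a search for the last ':', and a number parse). The function name migrate_uri_port is hypothetical, strncmp stands in for STRPREFIX, and strtol stands in for virParseNumber; the range check on the port is an addition of this sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libvirt code: extract the port from a
 * migrate URI using the same steps that are visible in the diff.
 * Returns the port number, or -1 if the URI is rejected. */
int migrate_uri_port(const char *uri_in)
{
    const char *p;
    char *end;
    long port;

    /* Only URIs starting with "tcp:" are accepted. */
    if (uri_in == NULL || strncmp(uri_in, "tcp:", 4) != 0)
        return -1;

    /* Cannot be NULL here: the prefix itself contains a ':'. */
    p = strrchr(uri_in, ':');
    p++;

    /* strtol() stands in for virParseNumber() (assumption). */
    errno = 0;
    port = strtol(p, &end, 10);
    if (errno != 0 || end == p || *end != '\0' || port < 1 || port > 65535)
        return -1;

    return (int)port;
}
```

Under this sketch both "tcp:10.192.11.136:12345" and "tcp://10.192.11.136:12345" yield port 12345, while a bare IP or a URI without a port is rejected. The prefix check alone therefore does not explain why the "tcp://" spelling failed.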

Gregor Schaffrath wrote:
> On Mon, Sep 21, 2009 at 12:46:46PM +0100, Daniel P. Berrange wrote:
>> On Wed, Sep 16, 2009 at 05:42:50PM +0200, Gregor Schaffrath wrote:
>>> Hi all.
>>> Short summary on DV's request ;)
>>> I ran into a problem migrating kvm machines with libvirt-0.6.5:
>>> 1) At first, using the same syntax for the migrateuri as with xen (just the IP) did not work... looking into the source code (! ;) ), I found a different syntax for qemu.
>>
>> The URI schemes should be listed in the driver capabilities XML. The reason they are different is that they are two different ways of doing migration.
>> We are working on a new tunnelled migration scheme that will be uniform across drivers.
>
> ic - To be honest, I was confused by the migrateuri anyhow, since I considered the situation where libvirt traffic is tunneled via SSH but Xen-/KVM-/foo communication may be direct to be a rather rare exception (or am I mistaken?) (I understood that this was the rationale behind the hostname-Query in the first place)

Well, the rationale is that you may have two paths to get to a machine, and you may only want to allow migration traffic on one of them (say, a direct cross-over) since the data goes across unencrypted. So you might have a machine with eth0 and eth1, where eth0 is exposed to the world, and you have libvirtd listening on eth0. But then when you actually do the migration, you want it to send the data across on eth1. Note also that libvirt traffic tunnelled via ssh isn't the only method; you can also attach to libvirtd via TLS and TCP (with SASL encryption).

>>> 2) using tcp://<ip>:<port> just produced an 'unknown failure' on the receiving side:
>>> root@loadgen137:~> virsh -c qemu:///system migrate --live kvm-testnode-vnode3 qemu+tcp://10.192.11.136/system?no_verify=1 tcp://10.192.11.136:12345
>>> error: Unknown failure
>>> (Note: it was working like a charm when I eliminated the migrateuri altogether,
>>
>> Hmm, try tcp:10.192.11.136:12345 instead - for some unknown reason it is not using correct URI formats.
>
> works indeed :)

>>> 3) removing the case distinction and the handling of the migrateuri in the qemudDomainMigratePrepare2 function in qemu_driver.c entirely (if-statement, and full else-part) solved both my issues.
>>
>> I don't know what exactly you removed, but you'll almost certainly have broken something else.
>
> the part doesn't do much besides setting the port - the hostname is even ignored ;) ... and the error comes from the receiving side - not the sending one.
> Therefore as far as I see, the only thing broken is that now the sending side can't choose the listening port number on the receiving side (is this a debugging feature?)

Not exactly a debugging feature, more a "give more control to the admin". If you do not supply a migrateuri, then libvirtd will choose a port between 49152 and 49216. However, if you don't want to open up all of those ports on your firewall, you can specify a migrateuri to say "use *this* port", and then you only have to open up one port in the firewall. So we do need to allow the migrateuri, and removing it isn't really feasible. The tunnelled migration stuff should make this a bit easier, although we'll still have to allow the migrateuri type stuff for the dual-homed situation.

--
Chris Lalancette
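The port choice described above can be sketched roughly as follows. This is an illustration, not the driver's code: the function name choose_listen_port, the round-robin strategy, and the pool size of 64 are assumptions, chosen only to approximate the quoted range starting at 49152.

```c
/* Assumed values: a pool of 64 ports starting at 49152, which
 * roughly matches the range quoted in the message above. */
#define MIGRATION_FIRST_PORT 49152
#define MIGRATION_NUM_PORTS  64

/* Offset of the next pool port to hand out (round-robin). */
static int next_offset = 0;

/* Hypothetical helper: decide which port the destination listens on.
 * requested_port > 0 : the admin named a port in the migrateuri.
 * requested_port <= 0: no migrateuri, so pick the next pool port. */
int choose_listen_port(int requested_port)
{
    int port;

    if (requested_port > 0)
        return requested_port;   /* only this port must be open */

    port = MIGRATION_FIRST_PORT + next_offset;
    next_offset = (next_offset + 1) % MIGRATION_NUM_PORTS;
    return port;
}
```

With a requested port the firewall needs a single opening; without one, every port in the pool has to be reachable, which is the trade-off the message describes.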

On Wed, Sep 23, 2009 at 10:05:36AM +0200, Chris Lalancette wrote:
> Gregor Schaffrath wrote:
>> On Mon, Sep 21, 2009 at 12:46:46PM +0100, Daniel P. Berrange wrote:
>>> On Wed, Sep 16, 2009 at 05:42:50PM +0200, Gregor Schaffrath wrote:
>>>> Hi all.
>>>> Short summary on DV's request ;)
>>>> I ran into a problem migrating kvm machines with libvirt-0.6.5:
>>>> 1) At first, using the same syntax for the migrateuri as with xen (just the IP) did not work... looking into the source code (! ;) ), I found a different syntax for qemu.
>>>
>>> The URI schemes should be listed in the driver capabilities XML. The reason they are different is that they are two different ways of doing migration.
>>> We are working on a new tunnelled migration scheme that will be uniform across drivers.
>>
>> ic - To be honest, I was confused by the migrateuri anyhow, since I considered the situation where libvirt traffic is tunneled via SSH but Xen-/KVM-/foo communication may be direct to be a rather rare exception (or am I mistaken?) (I understood that this was the rationale behind the hostname-Query in the first place)
>
> Well, the rationale is that you may have two paths to get to a machine, and you may only want to allow migration traffic on one of them (say, a direct cross-over) since the data goes across unencrypted. So you might have a machine with eth0 and eth1, where eth0 is exposed to the world, and you have libvirtd listening on eth0. But then when you actually do the migration, you want it to send the data across on eth1.

More critically than that, you cannot assume that the libvirt URI used to connect to the machine actually has a hostname that refers to the machine in question, eg

  ssh -L 9001:src.virt.machine:22 -L 9002:dst.virt.machine:22 some.gateway.machine

Now do migration using....

  virsh -c xen+ssh://localhost:9001/ migrate --live guest xen+ssh://localhost:9002/

...what hostname exactly are you going to expect the migration data to go over ? It certainly won't be 'localhost'. We have no choice but to do the hostname query on the target machine to discover the real primary hostname - the libvirt connection URIs are useless for determining the hostname. If the user then wants a different NIC that is not associated with the primary hostname we have to use the extra migrateuri field to supply that data as there's no other way to get it.

Daniel
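The hostname query described above might look roughly like the sketch below when run on the target machine. It is an assumption-laden illustration: the function name default_migrate_uri is hypothetical, POSIX gethostname() stands in for whatever libvirt uses internally, and the "tcp:host:port" layout follows the form that worked earlier in the thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: build the URI a target would advertise when
 * the caller supplied no migrateuri. Returns 0 on success, -1 if the
 * hostname lookup fails or the buffer is too small. */
int default_migrate_uri(char *buf, size_t buflen, int port)
{
    char hostname[256];
    int n;

    assert(buf != NULL && buflen > 0);

    /* Ask the target itself for its primary hostname. */
    if (gethostname(hostname, sizeof(hostname)) != 0)
        return -1;
    hostname[sizeof(hostname) - 1] = '\0'; /* may be unterminated if truncated */

    n = snprintf(buf, buflen, "tcp:%s:%d", hostname, port);
    if (n < 0 || (size_t)n >= buflen)
        return -1;

    return 0;
}
```

The result is only useful to the sending side if it can resolve that name, which is exactly the case that breaks in an environment without DNS.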

On Wed, Sep 23, 2009 at 10:05:36AM +0200, Chris Lalancette wrote:
>>> We are working on a new tunnelled migration scheme that will be uniform across drivers.
>>
>> ic - To be honest, I was confused by the migrateuri anyhow, since I considered the situation where libvirt traffic is tunneled via SSH but Xen-/KVM-/foo communication may be direct to be a rather rare exception (or am I mistaken?) (I understood that this was the rationale behind the hostname-Query in the first place)
>
> Well, the rationale is that you may have two paths to get to a machine, and you may only want to allow migration traffic on one of them (say, a direct cross-over) since the data goes across unencrypted. So you might have a machine with eth0 and eth1, where eth0 is exposed to the world, and you have libvirtd listening on eth0. But then when you actually do the migration, you want it to send the data across on eth1.
> Note also that libvirt traffic tunnelled via ssh isn't the only method, you can also attach to libvirtd via TLS and TCP (with SASL encryption).

Hm - I acknowledge that there might be such a situation, so you want to have this feature. And as long as there's a way around the assumption that the remote hostname - especially without a domain part - is resolvable at the sending side, my only concern would be a unified migrateuri syntax, which seems to be on the way :) .

I guess my actual confusion is rather about the choice of the default behaviour than about the feature's existence (since I never had the split-path situation in this context, but definitely had an environment where `hostname` output would not be resolvable, and I deemed the latter more probable than the former) - but that's certainly debatable.

>> the part doesn't do much besides setting the port - the hostname is even ignored ;) ... and the error comes from the receiving side - not the sending one.
>> Therefore as far as I see, the only thing broken is that now the sending side can't choose the listening port number on the receiving side (is this a debugging feature?)
>
> Not exactly a debugging feature, more a "give more control to the admin". If you do not supply a migrateuri, then libvirtd will choose a port between 49152 and 49216. However, if you don't want to open up all of those ports on your firewall, you can specify a migrateuri to say "use *this* port", and then you only have to open up one port in the firewall. So we do need to allow the migrateuri, and removing it isn't really feasible.

Again acknowledged, but then I would request a possibility to specify the IP while leaving the port choice up to the receiving side (basically making the port specification optional rather than mandatory).

My rationale for the request would be that when you consider scripted (i.e. automated) management of virtual nodes along with the possibility of several concurrent migrations, the port choice on the sending side is likely to turn out awkward, even though you may have an environment without DNS, or with dual-homing (and therefore need to specify the migrateuri).
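The request above, an address with an optional port, could be sketched as below. This illustrates the proposal only and does not describe anything libvirt implements: the function name split_migrate_uri and the convention that port 0 means "let the receiving side choose" are assumptions, and IPv6 literals are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "tcp:host[:port]".
 * On success returns 0, copies the host into `host`, and sets *port
 * to the given port, or to 0 when the port was omitted (meaning the
 * receiving side should pick one). Returns -1 on malformed input. */
int split_migrate_uri(const char *uri, char *host, size_t hostlen, int *port)
{
    const char *rest, *colon;
    char *end;
    size_t len;
    long value;

    if (uri == NULL || strncmp(uri, "tcp:", 4) != 0)
        return -1;

    rest = uri + 4;                      /* text after "tcp:" */
    colon = strchr(rest, ':');           /* optional port separator */
    len = colon ? (size_t)(colon - rest) : strlen(rest);
    if (len == 0 || len >= hostlen)
        return -1;

    memcpy(host, rest, len);
    host[len] = '\0';

    *port = 0;                           /* 0: receiver chooses */
    if (colon != NULL) {
        value = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || value < 1 || value > 65535)
            return -1;
        *port = (int)value;
    }
    return 0;
}
```

A script driving several concurrent migrations could then pass only the destination address and never have to track which ports are free on the receiving host.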
> The tunnelled migration stuff should make this a bit easier, although we'll still have to allow the migrateuri type stuff for the dual-homed situation.

Hm - I don't know what exactly you have in mind (not familiar with the plans). But I'd like to bring forward the point that as a user I was quite confused, because I intuitively expected a different default behaviour than the one libvirt currently exhibits (i.e., I was not prepared to get a 'hostname could not be resolved' type of error, when I specified an IP as migration destination ;) ).

Cheers,
Gregor.

> --
> Chris Lalancette
participants (3): Chris Lalancette, Daniel P. Berrange, Gregor Schaffrath