Re: [libvirt] [Users] Migration issues with ovirt 3.3

On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
See all:

- In libvirtd.log of source host:

2013-10-07 23:20:54.471+0000: 1209: debug : qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW: mon=0x7fc66412e820 refs=2 fd=30
2013-10-07 23:20:54.472+0000: 1209: warning : qemuDomainObjEnterMonitorInternal:1136 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-10-07 23:20:54.472+0000: 1209: debug : qemuMonitorSetCapabilities:1145 : mon=0x7fc66412e820
2013-10-07 23:20:54.472+0000: 1209: debug : qemuMonitorSend:887 : QEMU_MONITOR_SEND_MSG: mon=0x7fc66412e820 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
2013-10-07 23:20:54.769+0000: 1199: error : qemuMonitorIORead:505 : Unable to read from monitor: Connection reset by peer
2013-10-07 23:20:54.769+0000: 1199: debug : qemuMonitorIO:638 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-10-07 23:20:54.769+0000: 1199: debug : qemuMonitorIO:672 : Triggering error callback
2013-10-07 23:20:54.769+0000: 1199: debug : qemuProcessHandleMonitorError:351 : Received error on 0x7fc664124fb0 'c8again32'
2013-10-07 23:20:54.769+0000: 1209: debug : qemuMonitorSend:899 : Send command resulted in error Unable to read from monitor: Connection reset by peer
2013-10-07 23:20:54.770+0000: 1199: debug : qemuMonitorIO:638 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-10-07 23:20:54.770+0000: 1209: debug : virFileMakePathHelper:1283 : path=/var/run/libvirt/qemu mode=0777
2013-10-07 23:20:54.770+0000: 1199: debug : qemuMonitorIO:661 : Triggering EOF callback
2013-10-07 23:20:54.770+0000: 1199: debug : qemuProcessHandleMonitorEOF:294 : Received EOF on 0x7fc664124fb0 'c8again32'
2013-10-07 23:20:54.770+0000: 1209: debug : qemuProcessStop:3992 : Shutting down VM 'c8again32' pid=18053 flags=0
2013-10-07 23:20:54.771+0000: 1209: error : virNWFilterDHCPSnoopEnd:2135 : internal error ifname "vnet0" not in key map
2013-10-07 23:20:54.782+0000: 1209: debug : virCommandRunAsync:2251 : About to run /bin/sh -c '
IPT="/usr/sbin/iptables"
$IPT -D libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet0 -g FO-vnet0
$IPT -D libvirt-out -m physdev --physdev-out vnet0 -g FO-vnet0
$IPT -D libvirt-in -m physdev --physdev-in vnet0 -g FI-vnet0
$IPT -D libvirt-host-in -m physdev --physdev-in vnet0 -g HI-vnet0
$IPT -D libvirt-in-post -m physdev --physdev-in vnet0 -j ACCEPT
$IPT -F FO-vnet0
$IPT -X FO-vnet0
$IPT -F FI-vnet0
$IPT -X FI-vnet0
$IPT -F HI-vnet0
$IPT -X HI-vnet0
IPT="/usr/sbin/ip6tables"
$IPT -D libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet0 -g FO-vnet0
$IPT -D libvirt-out -m physdev --physdev-out vnet0 -g FO-vnet0
$IPT -D libvirt-in -m physdev --physdev-in vnet0 -g FI-vnet0
$IPT -D libvirt-host-in -m physdev --physdev-in vnet0 -g HI-vnet0
$IPT -D libvirt-in-post -m physdev --physdev-in vnet0 -j ACCEPT
$IPT -F FO-vnet0
$IPT -X FO-vnet0
$IPT -F FI-vnet0
$IPT -X FI-vnet0
$IPT -F HI-vnet0
$IPT -X HI-vnet0
EBT="/usr/sbin/ebtables"
$EBT -t nat -D PREROUTING -i vnet0 -j libvirt-I-vnet0
$EBT -t nat -D POSTROUTING -o vnet0 -j libvirt-O-vnet0
EBT="/usr/sbin/ebtables"
collect_chains() {
  for tmp2 in $*; do
    for tmp in $($EBT -t nat -L $tmp2 | \
        sed -n "/Bridge chain/,\$ s/.*-j \\([IO]-.*\\)/\\1/p"); do
      echo $tmp
      collect_chains $tmp
    done
  done
}
rm_chains() {
  for tmp in $*; do $EBT -t nat -F $tmp; done
  for tmp in $*; do $EBT -t nat -X $tmp; done
}
tmp='\'' '\'' IFS='\'' '\'''\'' '\''$tmp
chains="$(collect_chains libvirt-I-vnet0 libvirt-O-vnet0)"
$EBT -t nat -F libvirt-I-vnet0
$EBT -t nat -F libvirt-O-vnet0
rm_chains $chains
$EBT -t nat -F libvirt-I-vnet0
$EBT -t nat -X libvirt-I-vnet0
$EBT -t nat -F libvirt-O-vnet0
$EBT -t nat -X libvirt-O-vnet0
'
2013-10-07 23:20:54.784+0000: 1209: debug : virCommandRunAsync:2256 : Command result 0, with PID 18076
2013-10-07 23:20:54.863+0000: 1209: debug : virCommandRun:2125 : Result exit status 255, stdout: '' stderr: '
iptables v1.4.18: goto 'FO-vnet0' is not a chain
Try `iptables -h' or 'iptables --help' for more information.
iptables v1.4.18: goto 'FO-vnet0' is not a chain
Try `iptables -h' or 'iptables --help' for more information.
iptables v1.4.18: goto 'FI-vnet0' is not a chain
Try `iptables -h' or 'iptables --help' for more information.
iptables v1.4.18: goto 'HI-vnet0' is not a chain
Try `iptables -h' or 'iptables --help' for more information.
iptables: Bad rule (does a matching rule exist in that chain?).
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
ip6tables v1.4.18: goto 'FO-vnet0' is not a chain
Try `ip6tables -h' or 'ip6tables --help' for more information.
ip6tables v1.4.18: goto 'FO-vnet0' is not a chain
Try `ip6tables -h' or 'ip6tables --help' for more information.
ip6tables v1.4.18: goto 'FI-vnet0' is not a chain
Try `ip6tables -h' or 'ip6tables --help' for more information.
ip6tables v1.4.18: goto 'HI-vnet0' is not a chain
Try `ip6tables -h' or 'ip6tables --help' for more information.
ip6tables: Bad rule (does a matching rule exist in that chain?).
ip6tables: No chain/target/match by that name.
ip6tables: No chain/target/match by that name.
ip6tables: No chain/target/match by that name.
ip6tables: No chain/target/match by that name.
ip6tables: No chain/target/match by that name.
ip6tables: No chain/target/match by that name.
Illegal target name 'libvirt-O-vnet0'.
Chain 'libvirt-O-vnet0' doesn't exist.
Chain 'libvirt-O-vnet0' doesn't exist.
Chain 'libvirt-O-vnet0' doesn't exist.
Chain 'libvirt-O-vnet0' doesn't exist.
'
2013-10-07 23:20:54.863+0000: 1209: debug : qemuMonitorClose:821 : QEMU_MONITOR_CLOSE: mon=0x7fc66412e820 refs=2
2013-10-07 23:20:54.863+0000: 1209: debug : qemuProcessKill:3951 : vm=c8again32 pid=18053 flags=5
2013-10-07 23:20:54.863+0000: 1209: debug : virProcessKillPainfully:269 : vpid=18053 force=1
2013-10-07 23:20:54.863+0000: 1209: debug : qemuDomainCleanupRun:2132 : driver=0x7fc664024cd0, vm=c8again32
2013-10-07 23:20:54.863+0000: 1209: debug : qemuProcessAutoDestroyRemove:4504 : vm=c8again32
2013-10-07 23:20:54.863+0000: 1209: debug : virQEMUCloseCallbacksUnset:744 : vm=c8again32, uuid=d54660a2-45ed-41ae-ab99-a6f93ebbdbb1, cb=0x7fc66b6fe570
2013-10-07 23:20:54.864+0000: 1209: error : virPortAllocatorRelease:174 : port 0 must be in range (5900, 65535)
2013-10-07 23:20:54.865+0000: 1209: debug : qemuDomainObjEndJob:1070 : Stopping job: none (async=migration in)
2013-10-07 23:20:54.865+0000: 1209: debug : qemuDomainObjEndAsyncJob:1088 : Stopping async job: migration in
2013-10-07 23:20:54.865+0000: 1199: debug : qemuProcessHandleMonitorEOF:306 : Domain 0x7fc664124fb0 is not active, ignoring EOF
Caught Segmentation violation dumping internal log buffer:
This last line seems ominous. Can libvir-list help with it?

On Wed, Oct 9, 2013 at 3:43 PM, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
yeah! It seems gluster uses it on both sides. On destination:

[root@f18ovn01 qemu]# ss -ntp | egrep "State|49153"
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.3.1:975 192.168.3.1:49153 users:(("glusterfs",31166,7))
ESTAB 0 0 192.168.3.1:49153 192.168.3.3:972 users:(("glusterfsd",18615,14))
ESTAB 0 0 192.168.3.1:49153 192.168.3.1:965 users:(("glusterfsd",18615,13))
ESTAB 0 0 192.168.3.1:963 192.168.3.3:49153 users:(("glusterfs",31152,17))
ESTAB 0 0 192.168.3.1:49153 192.168.3.1:975 users:(("glusterfsd",18615,9))
ESTAB 0 0 192.168.3.1:49153 192.168.3.3:966 users:(("glusterfsd",18615,15))
ESTAB 0 0 192.168.3.1:965 192.168.3.1:49153 users:(("glusterfs",31152,7))
ESTAB 0 0 192.168.3.1:960 192.168.3.3:49153 users:(("glusterfs",31166,11))
...

[root@f18ovn01 qemu]# ps -ef | grep 31166
root 14950 10958 0 16:50 pts/0 00:00:00 grep --color=auto 31166
root 31166 1 0 Oct07 ? 00:00:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/626066f6d74e376808c27ad679a1e85c.socket --xlator-option *replicate*.node-uuid=ebaf2f1a-65a8-409a-b911-6e631a5f182f

[root@f18ovn01 qemu]# lsof -Pp 31166 | grep 49153
glusterfs 31166 root 7u IPv4 4703891 0t0 TCP f18ovn01.mydomain:975->f18ovn01.mydomain:49153 (ESTABLISHED)
glusterfs 31166 root 11u IPv4 4780563 0t0 TCP f18ovn01.mydomain:960->f18ovn03.mydomain:49153 (ESTABLISHED)

not so good indeed ;-)
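[Since gluster bricks and qemu's migration listener contend for the same span, it can help to survey the whole range at once rather than one port at a time. A minimal sketch, assuming libvirt's default migration range of 49152-49215 (verify against your build) and the iproute2 ss filter syntax:

# show every TCP socket in libvirt's migration port range, with owning process
ss -ntap '( sport >= :49152 and sport <= :49215 )'
# cross-check which of those belong to gluster processes
lsof -nP -iTCP:49152-49215 | grep -i gluster

Run on both source and destination; the migration fails on whichever side qemu cannot bind.]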

On Wed, Oct 09, 2013 at 04:52:20PM +0200, Gianluca Cecchi wrote:
On Wed, Oct 9, 2013 at 3:43 PM, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
yeah! It seems gluster uses it on both sides.
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)

On Wednesday, October 9, 2013 at 16:18 +0100, Dan Kenigsberg wrote:
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Already done: https://bugzilla.redhat.com/show_bug.cgi?id=987555
(not using ovirt, but the problem is easy to trigger as soon as you use GlusterFS and libvirt on the same boxes)
Regards, Daniel
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
--
Daniel Berteaud
FIREWALL-SERVICES SARL.
Société de Services en Logiciels Libres
Technopôle Montesquieu
33650 MARTILLAC
Tel : 05 56 64 15 32
Fax : 05 56 64 15 32
Web : http://www.firewall-services.com

On Wed, Oct 9, 2013 at 5:52 PM, Daniel Berteaud wrote:
On Wednesday, October 9, 2013 at 16:18 +0100, Dan Kenigsberg wrote:
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Already done: https://bugzilla.redhat.com/show_bug.cgi?id=987555
(not using ovirt, but the problem is easy to trigger as soon as you use GlusterFS and libvirt on the same boxes)
Regards, Daniel
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
I added my point to the bugzilla, and everyone on oVirt should do so in my opinion... very frustrating ;-(
And it seems not so easy to change the gluster range. I read in past threads that for previous versions the range 24009+ was written inside the code. In 3.4 there should be an option such as --volfile-server-port=PORT from the client point of view... but I tried some things and was not able to get anywhere significant... always 49152 and 49153 are used for my two bricks.
Furthermore, as a side note, I wanted to test migration by reattempting it several times, since each attempt chooses the next port. I have two bricks, so the first two attempts fail; in VM.log I get:

... -incoming tcp:[::]:49152: Failed to bind socket: Address already in use
... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use

Unfortunately (tested three times with the same result), after the second attempt the dest node goes into a loop of failing, recovering from crash, non responsive. I can easily correct the situation with systemctl restart vdsmd on it, but possibly I have hit some sort of bug. And if I then try to migrate again, it restarts from 49152... so I cannot test further. See the linked file below with what was written in vdsmd.log: the two attempts, the failure, and the loop of entries like

Thread-61::DEBUG::2013-10-10 01:15:32,161::BindingXMLRPC::986::vds::(wrapper) return getCapabilities with {'status': {'message': 'Recovering from crash or Initializing', 'code': 99}}

https://docs.google.com/file/d/0BwoPbcrMv8mvamR3RmpzLU11OFE/edit?usp=sharing
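[For what it's worth, later glusterfs releases expose the brick port base as a glusterd option; whether the 3.4 build discussed here honors it is exactly what the bugzilla should settle. A hypothetical sketch, assuming base-port support in glusterd.vol:

# /etc/glusterfs/glusterd.vol (snippet; base-port support is an assumption, not verified on 3.4)
volume management
    type mgmt/glusterd
    option base-port 50152    # move bricks above libvirt's 49152+ migration range
end-volume

# bricks only pick up new ports after glusterd and the volume are restarted
# ('myvol' is a placeholder; --mode=script skips the confirmation prompt):
systemctl restart glusterd
gluster --mode=script volume stop myvol && gluster volume start myvol]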

On Wed, Oct 09, 2013 at 16:18:25 +0100, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 04:52:20PM +0200, Gianluca Cecchi wrote:
On Wed, Oct 9, 2013 at 3:43 PM, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
yeah! It seems gluster uses it on both sides.
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
If it makes you feel better, you can't even do that :-) Unfortunately, the port range used for migration is hard-coded in libvirt. And since we don't use a port allocator yet (patches are in progress), we don't automatically avoid ports that are already taken. All this is going to change, though.
Jirka
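[To make the allocator idea concrete, here is a rough shell equivalent of what such a port allocator does, probing the migration range for the first port with no listener. A sketch only: the 49152-49215 bounds are libvirt's defaults, and the probe has the obvious check-then-use race discussed further down the thread.

# find the first port in the 49152-49215 migration range with no listener
for p in $(seq 49152 49215); do
    if ! ss -tln "sport = :$p" | grep -q LISTEN; then
        echo "first free migration port: $p"
        break
    fi
done]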

On Thu, Oct 10, 2013 at 08:40:29AM +0200, Jiri Denemark wrote:
On Wed, Oct 09, 2013 at 16:18:25 +0100, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 04:52:20PM +0200, Gianluca Cecchi wrote:
On Wed, Oct 9, 2013 at 3:43 PM, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
yeah! It seems gluster uses it on both sides.
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
If it makes you feel better, you can't even do that :-) Unfortunately, the port range used for migration is hard-coded in libvirt. And since we don't use a port allocator yet (patches are in progress), we don't automatically avoid ports that are already taken. All this is going to change, though.
Are these patches posted? Is there a libvirt BZ that tracks this issue?
Dan.

On Fri, Oct 11, 2013 at 11:41 PM, Dan Kenigsberg wrote:
yeah! It seems gluster uses it on both sides.
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
If it makes you feel better, you can't even do that :-) Unfortunately, the port range used for migration is hard-coded in libvirt. And since we don't use a port allocator yet (patches are in progress), we don't automatically avoid ports that are already taken. All this is going to change, though.
Are these patches posted? Is there a libvirt BZ that tracks this issue?
Dan.
In the meantime I got the 3.4.1 source rpm for Fedora 19 and rebuilt it with the patches under review, as in comment#2 of https://bugzilla.redhat.com/show_bug.cgi?id=1018178. Now the range is 50152+ and I'm able to successfully migrate my CentOS 6.4 VM in oVirt.
Gianluca
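[For anyone reproducing that rebuild (presumably of the glusterfs package, given that the resulting 50152+ range applies to the bricks), the usual Fedora flow is roughly the following sketch. Take the actual patches from comment#2 of the bug; package and patch names below are illustrative:

yumdownloader --source glusterfs        # from yum-utils; fetches the src.rpm
rpm -ivh glusterfs-3.4.1-*.src.rpm      # unpacks into ~/rpmbuild/{SPECS,SOURCES}
cp port-range-*.patch ~/rpmbuild/SOURCES/
# add matching PatchN:/%patch lines to ~/rpmbuild/SPECS/glusterfs.spec, then:
rpmbuild -ba ~/rpmbuild/SPECS/glusterfs.spec
# install the rebuilt RPMs from ~/rpmbuild/RPMS/ and restart glusterd]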

On Fri, Oct 11, 2013 at 22:41:32 +0100, Dan Kenigsberg wrote:
On Thu, Oct 10, 2013 at 08:40:29AM +0200, Jiri Denemark wrote:
On Wed, Oct 09, 2013 at 16:18:25 +0100, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 04:52:20PM +0200, Gianluca Cecchi wrote:
On Wed, Oct 9, 2013 at 3:43 PM, Dan Kenigsberg wrote:
On Wed, Oct 09, 2013 at 02:42:22PM +0200, Gianluca Cecchi wrote:
On Tue, Oct 8, 2013 at 10:40 AM, Dan Kenigsberg wrote:
But migration still fails
It seems like an unrelated failure. I do not know what's blocking migration traffic. Could you see if libvirtd.log and qemu logs at source and destination have clues?
It seems that in VM.log under qemu of the dest host I have: ... -incoming tcp:[::]:49153: Failed to bind socket: Address already in use
Is that port really taken (`ss -ntp` should tell by whom)?
yeah! It seems gluster uses it on both sides.
Since libvirt has been using this port range first, would you open a bug on gluster to avoid it?
Dan. (prays that Vdsm is not expected to edit libvirt's migration port range on installation)
If it makes you feel better, you can't even do that :-) Unfortunately, the port range used for migration is hard-coded in libvirt. And since we don't use a port allocator yet (patches are in progress), we don't automatically avoid ports that are already taken. All this is going to change, though.
Are these patches posted? Is there a libvirt BZ that tracks this issue?
https://www.redhat.com/archives/libvir-list/2013-October/msg00513.html is v3 of a patch that makes libvirt skip ports that are already in use. Of course, if something binds to that port just before QEMU is started, it will still fail. But that's a separate issue that may only be solved by giving QEMU a socket which is already bound rather than telling QEMU to bind to it. A patch for making the port range configurable has not been posted yet.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1018530

Jirka
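[To spell out the race Jirka mentions, a toy reproduction; the port number is hypothetical and some netcat builds want "nc -l -p":

ss -tln 'sport = :49152'      # allocator-style probe reports the port free
nc -l 49152 &                 # another process grabs it in the gap before qemu starts
# qemu launched with "-incoming tcp:[::]:49152" now fails with
# "Failed to bind socket: Address already in use", exactly as in the logs above

Handing qemu an already-bound file descriptor instead would close that window, since the bind then happens before the gap rather than after it.]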
participants (4):
- Dan Kenigsberg
- Daniel Berteaud
- Gianluca Cecchi
- Jiri Denemark