[libvirt-users] Libvirt migration issues (0.9.4 and 0.9.9)

Dear all,

we're having two different problems with migrations in libvirt, running as the root user on host machines with CentOS release 5.5 (Final), kernel:

Linux 2.6.32.24 #3 SMP Fri Oct 29 16:22:02 BST 2010 x86_64 x86_64 x86_64 GNU/Linux

First case:

virsh version
Compiled against library: libvir 0.9.4
Using library: libvir 0.9.4
Using API: QEMU 0.9.4
Running hypervisor: QEMU 1.0.50

Migrations work well for a basic VM, but if we attach a disk to the USB bus, migration is no longer possible and fails with the error message: "error: operation failed: migration job: is not active". This happens regardless of whether the device is mounted inside the (Debian) VM. Please find more information attached (libvirt_0.9.4.txt).

If we attach the same (.iso based) disk to the SCSI bus instead, migrations work as normal.

----

To mitigate this problem, we tried upgrading to a more recent libvirt version:

Compiled against library: libvir 0.9.9
Using library: libvir 0.9.9
Using API: QEMU 0.9.9
Running hypervisor: QEMU 1.0.50

When trying to migrate a normal (Debian) instance from one host to another, using the same domain as in the previous successful case and without any devices attached, migration fails with the error message "error: Unable to copy socket file handle: Invalid argument". The libvirt.log only has a similar single line of information:

2012-01-30 15:44:46.772+0000: 7546: error : virNetSocketDupFD:787 : Unable to copy socket file handle: Invalid argument

The network configuration used here is the same one we used successfully in the 0.9.4 test case, with static IPs.

Thankful for assistance, not really sure what to try next. :)

Regards,
Daniel Espling

Hi again,

I spent some time trying to debug this: I added some printouts and noticed that the virNetSocketDupFD() function is called with cloexec = true, hence triggering the call:

fd = fcntl(sock->fd, F_DUPFD_CLOEXEC);

However, running on CentOS 5.5, our glibc version is glibc-2.5-49.el5_5.5, and it seems the F_DUPFD_CLOEXEC flag was not added until glibc 2.7 [1]. I then tried to replace that code with:

if (cloexec) {
    fd = fcntl(sock->fd, F_DUPFD);
    if (fd >= 0)
        fcntl(fd, F_SETFD, FD_CLOEXEC);
}

but the result is the same: Unable to copy socket file handle: Invalid argument

I also added some more printouts to find out more about the fd:

2012-01-31 11:20:24.093+0000: 10445: error : virNetSocketDupFD:790 : sock->fd: 15, cloexec: 1: Invalid argument

and can at least see that fd 15 is the one complaining. Looking at lsof for the KVM process:

kvm 7456 root 13u 0000 0,8 0 852 anon_inode
kvm 7456 root 14u IPv4 167375 TCP localhost.localdomain:5900 (LISTEN)
kvm 7456 root 15r FIFO 0,7 167376 pipe
kvm 7456 root 16w FIFO 0,7 167376 pipe
kvm 7456 root 17u CHR 10,200 2986 /dev/net/tun

Seems like it fails duplicating a pipe leading back to the same process?

Regards,
Daniel

[1] http://stackoverflow.com/questions/1643304/how-to-set-close-on-exec-by-defau...

_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users
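A small check can help separate "pipes cannot be duplicated" from "the fcntl() call itself is malformed". The sketch below is a hypothetical diagnostic, not libvirt code; the name check_dup is made up for illustration. It duplicates a descriptor with both dup() and fcntl(F_DUPFD). Note that POSIX defines F_DUPFD (and F_DUPFD_CLOEXEC) as taking a third int argument, the lowest acceptable number for the new descriptor, and allows EINVAL when that argument is negative or too large. Whether the missing third argument explains the failure reported above is an assumption; the thread does not confirm it.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic (not part of libvirt): try to duplicate fd
 * with dup() and with fcntl(F_DUPFD), then close the copies again.
 * Returns 0 if both duplications succeed, -1 otherwise.
 */
int check_dup(int fd)
{
    assert(fd >= 0); /* precondition: caller passes a plausible descriptor */

    int a = dup(fd);

    /* The third argument is the lowest acceptable number for the new
     * descriptor; 0 means "any free descriptor", like dup(). */
    int b = fcntl(fd, F_DUPFD, 0);

    int ok = (a >= 0 && b >= 0);

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);

    return ok ? 0 : -1;
}
```

Called on either end of a freshly created pipe(), this would be expected to return 0 on Linux, which would suggest that a FIFO descriptor like fd 15 is not a problem in itself.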

Sorry for the spamming, but changing the code below:

if (cloexec)
    fd = fcntl(sock->fd, F_DUPFD_CLOEXEC);
else
    fd = dup(sock->fd);

to:

fd = dup(sock->fd);
if (cloexec && fd >= 0)
    fcntl(fd, F_SETFD, FD_CLOEXEC);

made it work for me.

//Daniel

[please don't top-post on technical lists]

On 01/31/2012 02:47 AM, Daniel Espling wrote:
> Sorry for the spamming, but changing the below code:
>
> if (cloexec)
>     fd = fcntl(sock->fd, F_DUPFD_CLOEXEC);
> else
>     fd = dup(sock->fd);
>
> to:
>
> fd = dup(sock->fd);
> if (cloexec && fd >= 0)
>     fcntl(fd, F_SETFD, FD_CLOEXEC);
>
> made it work for me.

Thanks for the report. However, you shouldn't need to make this change. Gnulib should be replacing fcntl() on kernels that are too old to support F_DUPFD_CLOEXEC, and doing that work on your behalf so that the rest of the code can be written as though it were targeting newer kernels (with the only drawback being that it is not atomic like it is with newer kernels). I'll need to see if I can reproduce this situation, and figure out why gnulib isn't doing the right thing.

--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
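To illustrate the kind of fallback described here, the following is a minimal sketch, assuming a plain POSIX/Linux environment. It is not gnulib's actual replacement code, and the function name dup_cloexec_fallback is hypothetical. It tries F_DUPFD_CLOEXEC when the headers define it, and otherwise (or if the kernel rejects the command with EINVAL) falls back to F_DUPFD followed by F_SETFD. The fallback path is not atomic: another thread could fork() and exec() between the two calls and leak the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gnulib or libvirt code): duplicate fd and
 * mark the copy close-on-exec. Returns the new descriptor, or -1 with
 * errno set.
 */
int dup_cloexec_fallback(int fd)
{
    int newfd;

    assert(fd >= 0); /* precondition: caller passes a plausible descriptor */

#ifdef F_DUPFD_CLOEXEC
    /* Atomic path. The third argument (lowest acceptable new fd) is
     * required; 0 means "any free descriptor". */
    newfd = fcntl(fd, F_DUPFD_CLOEXEC, 0);
    if (newfd >= 0 || errno != EINVAL)
        return newfd;
    /* EINVAL may mean the running kernel does not know the command,
     * so fall through to the two-step emulation. */
#endif

    /* Non-atomic emulation: duplicate first, then set the flag. */
    newfd = fcntl(fd, F_DUPFD, 0);
    if (newfd < 0)
        return -1;

    if (fcntl(newfd, F_SETFD, FD_CLOEXEC) < 0) {
        int saved = errno; /* keep the F_SETFD error across close() */
        close(newfd);
        errno = saved;
        return -1;
    }

    return newfd;
}
```

With headers that lack F_DUPFD_CLOEXEC (as with the glibc 2.5 mentioned earlier in the thread), only the emulation path is compiled in.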
Participants (2): Daniel Espling, Eric Blake