Quoting Alex Bligh (alex(a)alex.org.uk):
Serge,
On 7 Aug 2014, at 03:50, Serge Hallyn <serge.hallyn(a)ubuntu.com> wrote:
> This worked for me when migrating by hand. I'm trying to make it work
> through libvirt, using the following patch. (So whether to have
> pc-1.0 be treated as qemu's or qemu-kvm's pc-1.0 is specified using a
> boolean in /etc/libvirt/qemu.conf) Qemu starts with decent
> looking args, but for some reason the migration is failing -
> still looking through the logfile to figure out why.
Are you using exactly the same arguments by hand and with libvirt?
Also, on reflection, given one of the changes between 1.0 and 2.0
is ACPI, I should probably have done some testing with an ACPI
enabled image, rather than just cirros (which is not ACPI enabled);
any chance this is ACPI related?
Turning off acpi (well, commenting it out in the xml, which I'm assuming
dtrt) doesn't help:
===============================================
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
QEMU_AUDIO_DRV=none /usr/bin/kvm -name cirros -S -global
virtio-net-pci.romfile=pxe-virtio.rom.12.04 -machine pc-1.0-qemu-kvm,accel=kvm,usb=off -m
512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
2542c328-6842-33ef-d30e-866c3f3189a8 -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/cirros.monitor,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -boot
strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/var/lib/libvirt/images/cirros.img,if=none,id=drive-ide0-0-0,format=raw -device
ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev
tap,fd=26,id=hostnet0 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:be:d8:99,bus=pci.0,addr=0x3 -chardev
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x4
-incoming fd:23 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg
timestamp=on
2014-08-07 12:51:02.400+0000: 1539: debug : virFileClose:99 : Closed fd 25
2014-08-07 12:51:02.401+0000: 1539: debug : virFileClose:99 : Closed fd 31
2014-08-07 12:51:02.401+0000: 1539: debug : virFileClose:99 : Closed fd 3
2014-08-07 12:51:02.401+0000: 1540: debug : virExec:616 : Run hook 0x7f25cb17bca0
0x7f25d3aedf20
2014-08-07 12:51:02.401+0000: 1540: debug : qemuProcessHook:2719 : Obtaining domain lock
2014-08-07 12:51:02.401+0000: 1540: debug : virDomainLockProcessStart:175 :
plugin=0x7f25c4170290 dom=0x7f25c4186510 paused=1 fd=0x7f25d3aedb44
2014-08-07 12:51:02.401+0000: 1540: debug : virDomainLockManagerNew:133 :
plugin=0x7f25c4170290 dom=0x7f25c4186510 withResources=1
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerPluginGetDriver:281 :
plugin=0x7f25c4170290
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerNew:305 : driver=0x7f25da723580
type=0 nparams=5 params=0x7f25d3aeda30 flags=0
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerLogParams:98 : key=uuid
type=uuid value=2542c328-6842-33ef-d30e-866c3f3189a8
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerLogParams:91 : key=name
type=string value=cirros
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerLogParams:79 : key=id
type=uint value=2
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerLogParams:79 : key=pid
type=uint value=1540
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerLogParams:94 : key=uri
type=cstring value=qemu:///system
2014-08-07 12:51:02.401+0000: 1540: debug : virDomainLockManagerNew:145 : Adding leases
2014-08-07 12:51:02.401+0000: 1540: debug : virDomainLockManagerNew:150 : Adding disks
2014-08-07 12:51:02.401+0000: 1540: debug : virDomainLockManagerAddDisk:91 : Add disk
/var/lib/libvirt/images/cirros.img
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerAddResource:332 :
lock=0x7f25c417b080 type=0 name=/var/lib/libvirt/images/cirros.img nparams=0 params=(nil)
flags=0
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerAcquire:350 :
lock=0x7f25c417b080 state='<null>' flags=3 action=0 fd=0x7f25d3aedb44
2014-08-07 12:51:02.401+0000: 1540: debug : virLockManagerFree:387 : lock=0x7f25c417b080
2014-08-07 12:51:02.401+0000: 1540: debug : virObjectUnref:259 : OBJECT_UNREF:
obj=0x7f25c415e620
2014-08-07 12:51:02.401+0000: 1540: debug : qemuProcessHook:2746 : Hook complete ret=0
2014-08-07 12:51:02.401+0000: 1540: debug : virExec:618 : Done hook 0
2014-08-07 12:51:02.401+0000: 1540: debug : virExec:638 : Setting child AppArmor profile
to libvirt-2542c328-6842-33ef-d30e-866c3f3189a8
2014-08-07 12:51:02.402+0000: 1540: debug : virExec:655 : Setting child uid:gid to 107:113
with caps 0
2014-08-07 12:51:02.402+0000: 1540: debug : virCommandHandshakeChild:358 : Notifying
parent for handshake start on 28
2014-08-07 12:51:02.402+0000: 1540: debug : virCommandHandshakeChild:366 : Waiting on
parent for handshake complete on 29
libvirt: error : libvirtd quit during handshake: Input/output error
2014-08-07 12:51:02.424+0000: shutting down
===============================================
Perhaps the key is here:
===============================================
2014-08-07 12:51:02.416+0000: 1119: debug : virEventPollDispatchHandles:508 :
EVENT_POLL_DISPATCH_HANDLE: watch=7 events=1
2014-08-07 12:51:02.416+0000: 1119: debug : udevEventHandleCallback:1585 : udev action:
'add'
2014-08-07 12:51:02.416+0000: 1119: debug : udevGetDeviceProperty:125 : udev reports
device 'rx-0' does not have property 'DRIVER'
2014-08-07 12:51:02.416+0000: 1119: debug : udevGetDeviceProperty:143 : Found property key
'SUBSYSTEM' value 'queues' for device with sysname 'rx-0'
2014-08-07 12:51:02.416+0000: 1119: debug : udevGetDeviceType:1279 : Could not determine
device type for device with sysfs name 'rx-0'
2014-08-07 12:51:02.416+0000: 1119: debug : udevAddOneDevice:1454 : Discarding device -1
0x7f25dcb40e80 /sys/devices/virtual/net/vnet0/queues/rx-0
===============================================
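In case it helps with lining up the two logs: a rough Python sketch that parses lines in the format shown in the excerpts above and keeps only the entries within a few milliseconds of a given time. The regex is inferred from these excerpts alone, not from any libvirt documentation, and the names parse() and around() are made up for illustration. Wrapped continuation lines and lines without a thread id (such as the "shutting down" line) do not match and are skipped.

```python
import re
from datetime import datetime, timedelta

# Format assumed from the excerpts in this mail:
# "2014-08-07 12:51:02.416+0000: 1119: debug : func:line : message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000: "
    r"(?P<tid>\d+): (?P<level>\w+) : (?P<msg>.*)$"
)

def parse(lines):
    """Yield (timestamp, thread id, level, message) for each matching line."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, int(m.group("tid")), m.group("level"), m.group("msg")

def around(lines, when, window_ms=50):
    """Return the parsed entries logged within window_ms of `when`."""
    delta = timedelta(milliseconds=window_ms)
    return [entry for entry in parse(lines) if abs(entry[0] - when) <= delta]
```

Feeding the concatenated libvirtd and domain log lines to around() with the time of the "shutting down" line (12:51:02.424) would show what each thread was doing just before the handshake failed.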
> Now sadly my
> tests are being further slowed down by qcow corruption on my host,
> but I don't think that was the cause of my failure.
Whilst getting the patch right in the first place I tend to
cp from a known good image. Obviously once it works, qcow2
corruption should not happen. But failed migrations (with
or without my patch) do appear to cause this relatively
frequently.
No no, the qcow2 corruption is nothing to do with the migration itself.
I'm testing migration between two vms on my laptop which are using
qcow2 snapshots for rootfs, and those are the ones getting the corruption.
This is
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1319578
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1315162 and
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1292234
which I could never reproduce before.
-serge