
On Fri, Nov 11, 2011 at 01:03:20PM +0000, Daniel P. Berrange wrote:
Libvirt recently introduced a change to the way it does 'save to file' with QEMU. Historically QEMU has had a 32MB/s I/O limit on migration by default. When saving to file, we didn't want any artificial limit, but rather to max out the underlying storage. So when doing save to file, we set a very large bandwidth limit (INT64_MAX / (1024 * 1024)), so it is effectively unlimited.
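For scale, that expression evaluates to 8796093022207, so even read as MB/s it is around nine billion GB/s, i.e. no practical limit at all. A trivial sketch of the arithmetic (only the expression itself comes from the text above):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      /* The "unlimited" bandwidth value described above:
       * INT64_MAX scaled down by 1 MiB. */
      int64_t bw = INT64_MAX / (1024 * 1024);

      /* Prints: bandwidth cap: 8796093022207 MB/s */
      printf("bandwidth cap: %" PRId64 " MB/s\n", bw);
      return 0;
  }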
After doing this, we discovered that the QEMU monitor was becoming entirely blocked. It did not even return from the 'migrate' command until migration was complete, despite the 'detach' flag being set. This was a bug in libvirt: we passed QEMU a plain file descriptor, for which O_NONBLOCK has no effect, so writes never fail with EAGAIN. Thank you POSIX.
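The underlying trap: POSIX only gives O_NONBLOCK meaning for pipes, FIFOs, sockets and certain devices; on a regular file the flag is silently accepted but write() still blocks and never fails with EAGAIN. A minimal sketch of the behaviour (the filename is made up for illustration):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      /* Hypothetical save target, for illustration only. */
      int fd = open("savefile.img", O_WRONLY | O_CREAT | O_NONBLOCK, 0600);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      /* The flag "sticks"... */
      int flags = fcntl(fd, F_GETFL);
      printf("O_NONBLOCK set: %s\n", (flags & O_NONBLOCK) ? "yes" : "no");

      /* ...but changes nothing: write() to a regular file blocks on
       * slow storage and never returns EAGAIN, which is why handing
       * QEMU a plain file descriptor stalled the monitor. */
      char buf[4096] = {0};
      ssize_t n = write(fd, buf, sizeof(buf));
      printf("write returned %zd (never -1/EAGAIN for a regular file)\n", n);

      close(fd);
      return 0;
  }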
Libvirt has another mode where it uses an I/O helper command to get O_DIRECT, and in this mode we pass a pipe() FD to QEMU. After ensuring that this pipe FD really does have O_NONBLOCK set, we still saw some odd behaviour.
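For concreteness, the pipe setup described above looks roughly like this; a sketch only, not libvirt's actual code:

  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fds[2];

      /* fds[0] is the read end (destined for the I/O helper),
       * fds[1] the write end (destined for QEMU). */
      if (pipe(fds) < 0) {
          perror("pipe");
          return 1;
      }

      /* Make the write end non-blocking, so the writer gets EAGAIN
       * when the pipe buffer fills rather than blocking. */
      int flags = fcntl(fds[1], F_GETFL);
      if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
          perror("fcntl");
          return 1;
      }

      /* With no reader draining the pipe, writes eventually fail
       * with EAGAIN instead of blocking. */
      char buf[65536] = {0};
      while (write(fds[1], buf, sizeof(buf)) > 0)
          ; /* fill the pipe buffer */
      if (errno == EAGAIN)
          printf("pipe full: write returned EAGAIN as expected\n");

      close(fds[0]);
      close(fds[1]);
      return 0;
  }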
I'm not sure whether what I describe can necessarily be called a QEMU bug, but I wanted to raise it for discussion anyway...
The sequence of steps is:

 - libvirt sets the qemu migration bandwidth to "unlimited"
 - libvirt opens a pipe() and sets O_NONBLOCK on the write end
 - libvirt spawns libvirt-iohelper, giving it the target file on disk and the read end of the pipe
 - libvirt does the 'getfd migfile' monitor command to give QEMU the write end of the pipe (the fd-passing mechanism is sketched just after this list)
 - libvirt does 'migrate fd:migfile -d' to run the migration
 - In parallel:
    - QEMU writes to the pipe (which is non-blocking)
    - libvirt-iohelper reads from the pipe & writes to disk with O_DIRECT
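One detail worth calling out for the 'getfd' step: the descriptor travels to QEMU as SCM_RIGHTS ancillary data on the monitor's Unix socket. A minimal sketch of the sending side, assuming an already-connected AF_UNIX socket (function name and arguments are illustrative, not libvirt's actual code):

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* Send one file descriptor over a connected Unix socket: the
   * mechanism underneath the 'getfd' monitor command.  'sock' is the
   * monitor socket, 'fd' the pipe write end. */
  static int send_fd(int sock, int fd)
  {
      struct msghdr msg = {0};
      struct iovec iov;
      char byte = '\0';                 /* must send at least one byte */
      char control[CMSG_SPACE(sizeof(int))];

      iov.iov_base = &byte;
      iov.iov_len = 1;
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = control;
      msg.msg_controllen = sizeof(control);

      struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;     /* pass a file descriptor */
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

      return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
  }

QEMU stashes the received descriptor under the name given to 'getfd' ('migfile' here), and 'migrate fd:migfile' then directs the outgoing migration stream at it.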
I should have mentioned that the way I'm testing this is with libvirt 0.9.7, with both QEMU 0.14 and QEMU GIT master, using a guest with 2 GB of RAM:

  $ virsh start l3
  Domain l3 started

  $ virsh dominfo l3
  Id:             17
  Name:           l3
  UUID:           c7a3edbd-edaf-9455-926a-d65c16db1803
  OS Type:        hvm
  State:          running
  CPU(s):         1
  CPU time:       1.1s
  Max memory:     2292000 kB
  Used memory:    2292736 kB
  Persistent:     yes
  Autostart:      disable
  Managed save:   no
  Security model: selinux
  Security DOI:   0
  Security label: system_u:system_r:unconfined_t:s0:c94,c700 (permissive)

To actually perform the save-to-file, I use the '--bypass-cache' flag for libvirt, which ensures we pass a pipe to QEMU and run our I/O helper for O_DIRECT, instead of directly giving QEMU a plain file:

  $ virsh save --bypass-cache l3 l3.image
  Domain l3 saved to l3.image
While the save to file is running, I observe three distinct patterns:

- Most of the qemu_savevm_state_iterate() calls complete in 10-20 ms
- Reasonably often a qemu_savevm_state_iterate() call takes 300-400 ms
- Fairly rarely a qemu_savevm_state_iterate() call takes 10-20 *seconds*
I use the attached systemtap script for measuring these timings, e.g. run this before starting the migration to disk:

  # stap qemu-mig.stp
  Begin
      0.000 Start
      5.198 > Begin
      5.220 < Begin 0.022
      5.220 > Iterate
      5.224 < Iterate 0.004
  ...snip...
      6.299 > Iterate
      6.314 < Iterate 0.015
      6.314 > Iterate
      6.319 < Iterate 0.005
      6.409 > Iterate
      8.139 < Iterate 1.730    <<< very slow iteration
      8.152 > Iterate
     13.078 < Iterate 4.926    <<< very slow iteration
     13.963 > Iterate
     14.248 < Iterate 0.285
     14.441 > Iterate
     14.448 < Iterate 0.007
  ...snip...
     24.171 > Iterate
     24.178 < Iterate 0.007
     24.178 > Complete
     24.588 < Complete 0.410
  <Ctrl-C>
  avg 79 = sum 8033 / count 101; min 3 max 4926
  value |-------------------------------------------------- count
      0 |                                                       0
      1 |                                                       0
      2 |                                                       1
      4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                74
      8 |@@@@@@@@@                                             19
     16 |@                                                      3
     32 |                                                       0
     64 |                                                       0
    128 |                                                       0
    256 |@                                                      2
    512 |                                                       0
   1024 |                                                       1
   2048 |                                                       0
   4096 |                                                       1
   8192 |                                                       0
  16384 |                                                       0

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|