On Fri, Nov 11, 2011 at 01:03:20PM +0000, Daniel P. Berrange wrote:
Libvirt recently introduced a change to the way it does 'save to file'
with QEMU. Historically QEMU has had a 32MB/s I/O limit on migration
by default. When saving to file we didn't want any artificial limit,
but rather to max out the underlying storage. So when doing save to
file, we set a very large bandwidth limit (INT64_MAX / (1024 * 1024))
so that it is effectively unlimited.
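For illustration, here is a minimal sketch of how that value works out
(the constant name is mine, and the monitor plumbing in real libvirt
differs; QEMU's 'migrate_set_speed' takes a value in bytes/sec):

  #include <inttypes.h>
  #include <stdio.h>

  /* The bandwidth cap used for save-to-file, in MiB/s (constant name
   * is mine). Dividing INT64_MAX by 1 MiB guarantees the conversion
   * back to bytes/sec below cannot overflow int64_t. */
  #define MIG_BANDWIDTH_UNLIMITED (INT64_MAX / (1024 * 1024))

  int main(void)
  {
      /* Scale MiB/s back to bytes/sec, as sent to the monitor. */
      int64_t bytes_per_sec = MIG_BANDWIDTH_UNLIMITED * (1024 * 1024);
      printf("migrate_set_speed %" PRId64 "\n", bytes_per_sec);
      return 0;
  }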
After doing this, we discovered that the QEMU monitor was becoming
entirely blocked: it did not even return from the 'migrate' command
until migration was complete, despite the 'detach' flag being set.
This was a bug in libvirt, because we passed QEMU a file descriptor
for a plain file, which does not support EAGAIN. Thank you POSIX.
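To illustrate the trap (my own example, not libvirt code; the
'scratch.img' path is made up): O_NONBLOCK can be set on a regular
file descriptor without error, but POSIX only gives EAGAIN semantics
to pipes, FIFOs, sockets and the like, so writes to the file still
block for as long as the I/O takes:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      /* Setting O_NONBLOCK on a regular file is silently accepted... */
      int fd = open("scratch.img", O_WRONLY | O_CREAT | O_NONBLOCK, 0644);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      /* ...but this write can never fail with EAGAIN; the caller just
       * blocks until the kernel has accepted all the data. */
      static char buf[1 << 20];
      ssize_t done = write(fd, buf, sizeof(buf));
      printf("write returned %zd (EAGAIN is impossible here)\n", done);

      close(fd);
      return 0;
  }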
Libvirt has another mode where it uses an I/O helper command to get
O_DIRECT, and in this mode we pass a pipe() FD to QEMU. After ensuring
that this pipe FD really does have O_NONBLOCK set, we still saw some
odd behaviour.
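For context, the helper's core loop amounts to something like this (a
sketch assuming a fixed 1 MiB buffer and 512-byte alignment; the real
libvirt-iohelper probes the block size and copes with the unaligned
final chunk):

  #define _GNU_SOURCE /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>

  static int run_helper(int pipe_fd, const char *path)
  {
      int out = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
      if (out < 0)
          return -1;

      /* O_DIRECT requires the buffer (and normally the transfer size)
       * to be aligned to the storage block size. */
      void *buf;
      if (posix_memalign(&buf, 512, 1 << 20) != 0) {
          close(out);
          return -1;
      }

      ssize_t got;
      while ((got = read(pipe_fd, buf, 1 << 20)) > 0) {
          /* A short or unaligned trailing chunk needs extra care with
           * O_DIRECT; glossed over here. */
          if (write(out, buf, got) != got) {
              got = -1;
              break;
          }
      }

      free(buf);
      close(out);
      return got == 0 ? 0 : -1;
  }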
I'm not sure whether what I describe can necessarily be called a QEMU
bug, but I wanted to raise it for discussion anyway....
The sequence of steps (sketched in code below) is:

 - libvirt sets the qemu migration bandwidth to "unlimited"
 - libvirt opens a pipe() and sets O_NONBLOCK on the write end
 - libvirt spawns libvirt-iohelper, giving it the target file on
   disk and the read end of the pipe
 - libvirt runs the 'getfd migfile' monitor command to give QEMU
   the write end of the pipe
 - libvirt runs 'migrate fd:migfile -d' to start the migration
 - in parallel:
     - QEMU writes to the pipe (which is non-blocking)
     - libvirt-iohelper reads from the pipe & writes to disk with O_DIRECT
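Putting those steps together, the plumbing looks roughly like this (a
sketch; the helper's command line arguments are illustrative, and real
libvirt does all of this through its internal APIs):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>

  int main(void)
  {
      int fds[2];

      /* fds[0]: read end for the I/O helper; fds[1]: write end for QEMU. */
      if (pipe(fds) < 0) {
          perror("pipe");
          return 1;
      }

      /* Make QEMU's end non-blocking so its monitor stays responsive
       * while migration data is flowing. */
      int flags = fcntl(fds[1], F_GETFL);
      if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
          perror("fcntl");
          return 1;
      }

      /* Spawn the helper on the read end; it copies the stream to the
       * target file with O_DIRECT. (Arguments are illustrative.) */
      pid_t child = fork();
      if (child == 0) {
          dup2(fds[0], STDIN_FILENO);
          close(fds[0]);
          close(fds[1]);
          execlp("libvirt-iohelper", "libvirt-iohelper", "l3.image",
                 (char *)NULL);
          _exit(127);
      }
      close(fds[0]);

      /* fds[1] is then handed to QEMU with 'getfd migfile', followed
       * by 'migrate fd:migfile -d'. */
      return 0;
  }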
I should have mentioned that the way I'm testing this is with
libvirt 0.9.7, with both QEMU 0.14 and QEMU GIT master, using
a guest with 2 GB of RAM:
$ virsh start l3
Domain l3 started
$ virsh dominfo l3
Id:             17
Name:           l3
UUID:           c7a3edbd-edaf-9455-926a-d65c16db1803
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       1.1s
Max memory:     2292000 kB
Used memory:    2292736 kB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:unconfined_t:s0:c94,c700 (permissive)
To actually perform the save-to-file, I use the '--bypass-cache' flag
for libvirt, which ensures we pass a pipe to QEMU and run our I/O
helper for O_DIRECT, instead of directly giving QEMU a plain file:
$ virsh save --bypass-cache l3 l3.image
Domain l3 saved to l3.image
While the save runs, I see three classes of iteration times:

 - Most qemu_savevm_state_iterate() calls complete in 10-20 ms
 - Reasonably often a qemu_savevm_state_iterate() call takes 300-400 ms
 - Fairly rarely a qemu_savevm_state_iterate() call takes 10-20 *seconds*
I use the attached systemtap script to measure these timings (the
per-event times it prints are in seconds; the final histogram is in
milliseconds). E.g. run this before starting the migration to disk:

# stap qemu-mig.stp
Begin
0.000 Start
5.198 > Begin
5.220 < Begin 0.022
5.220 > Iterate
5.224 < Iterate 0.004
...snip..
6.299 > Iterate
6.314 < Iterate 0.015
6.314 > Iterate
6.319 < Iterate 0.005
6.409 > Iterate
8.139 < Iterate 1.730 <<< very slow iteration
8.152 > Iterate
13.078 < Iterate 4.926 <<< very slow iteration
13.963 > Iterate
14.248 < Iterate 0.285
14.441 > Iterate
14.448 < Iterate 0.007
...snip...
24.171 > Iterate
24.178 < Iterate 0.007
24.178 > Complete
24.588 < Complete 0.410
<Ctrl-C>
avg 79 = sum 8033 / count 101; min 3 max 4926
value |-------------------------------------------------- count
0 | 0
1 | 0
2 | 1
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 74
8 |@@@@@@@@@ 19
16 |@ 3
32 | 0
64 | 0
128 | 0
256 |@ 2
512 | 0
1024 | 1
2048 | 0
4096 | 1
8192 | 0
16384 | 0
Regards,
Daniel
--
|: http://berrange.com      -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org       -o- http://virt-manager.org :|
|: http://autobuild.org     -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|