[libvirt-users] Quadratic function-like call delay

Hi,

There is a quite annoying thing; I'm not sure whether I can call it a bug.

How to reproduce:
- start a bunch of VMs,
- stop libvirt,
- start libvirt and immediately issue any call, say, 'virsh version',
- the delay until the call completes can be fitted nearly as a quadratic function of the number of running VMs, qemu-kvm in my case (see link below).

After this delay has passed, all calls execute without slowdown, so the problem is only a 'dead gap' after restarting the service.

http://xdel.ru/downloads/libvirt.png
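A minimal sketch of how the "nearly quadratic" observation could be checked against one's own measurements, assuming a list of VM counts and a matching list of delays in seconds have been collected by hand. The function names `fit_polynomial` and `residual_sum_of_squares` are hypothetical helpers, not part of libvirt or virsh, and no measurements from this thread are reproduced here.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python).

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def residual_sum_of_squares(xs, ys, coeffs):
    """Sum of squared differences between the data and the fitted polynomial."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = sum(c * x ** i for i, c in enumerate(coeffs))
        total += (y - predicted) ** 2
    return total
```

Fitting the same data with `degree=1` and `degree=2` and comparing the two residuals gives a rough indication of whether the delay grows linearly or quadratically with the number of running VMs.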

On Mon, Sep 24, 2012 at 11:48:44AM +0400, Andrey Korolyov wrote:
> [...]
> - the delay until the call completes can be fitted nearly as a quadratic function of the number of running VMs, qemu-kvm in my case (see link below).

When the daemon is restarted it needs to reconnect to all the guests, and that operation takes time. I'm not sure why it's not linear, but I think I experienced that a couple of years ago, so it doesn't sound like a recent regression. If you can look at what is going on, that would be useful.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

On Mon, Sep 24, 2012 at 11:48:44AM +0400, Andrey Korolyov wrote:
> [...]
> - start libvirt and immediately issue any call, say, 'virsh version',
> - the delay until the call completes can be fitted nearly as a quadratic function of the number of running VMs, qemu-kvm in my case (see link below).

When you restart the libvirtd service while KVM guests are running, the first thing libvirtd needs to do is connect to the QEMU monitor for each guest and figure out its status. This can take some time, depending on how loaded the host is in general. It will definitely slow libvirt API calls which need to query individual VMs, but I'm rather surprised that you saw any slowdown of the 'virsh version' command, since that should not touch VMs at all.

So I think I *would* class this as a bug. It could be something silly like the main I/O event loop having some badly written bit of non-scalable code. Please do file a bug about this, providing as much raw data as you have managed to collect.

Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
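As a purely illustrative toy, and not a description of libvirt's actual event loop, the sketch below shows how a "non-scalable" registration step can turn a per-guest operation into quadratic total work. It assumes a hypothetical registry that scans every existing handle each time a new one is added; both function names are made up for this example.

```python
def register_handles_linear_scan(n_guests):
    """Toy registry: each new handle is compared against all existing ones.

    Returns the total number of comparisons, which grows as n*(n-1)/2.
    """
    handles = []
    comparisons = 0
    for fd in range(n_guests):
        for existing in handles:  # scan cost grows with the registry size
            comparisons += 1
            if existing == fd:
                break
        else:
            handles.append(fd)
    return comparisons


def register_handles_hashed(n_guests):
    """Same toy registry backed by a set: one average O(1) lookup per handle.

    Returns the total number of lookups, which grows linearly.
    """
    handles = set()
    lookups = 0
    for fd in range(n_guests):
        lookups += 1
        if fd not in handles:
            handles.add(fd)
    return lookups
```

Doubling the number of guests roughly quadruples the work in the first variant and only doubles it in the second, which is the kind of difference a plot of delay against VM count would expose.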

On Mon, Sep 24, 2012 at 01:35:21PM +0100, Daniel P. Berrange wrote:
> [...]
> It will definitely slow libvirt API calls which need to query individual VMs, but I'm rather surprised that you saw any slowdown of the 'virsh version' command, since that should not touch VMs at all.

Hum, IIRC there was a time when libvirtd would not process any request until those reconnections were done. Did that change?

Andrey, what version are you using?

Daniel

-- 
Daniel Veillard

On Mon, Sep 24, 2012 at 09:46:08PM +0800, Daniel Veillard wrote:
> Hum, IIRC there was a time when libvirtd would not process any request until those reconnections were done. Did that change?

Yes, a little while back we introduced code which spawns a background thread at startup to handle the reconnection of VMs, so that libvirtd is not blocked for a long time.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
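The general pattern described here, doing slow reconnection work on a background thread so that unrelated requests are answered immediately, can be sketched as follows. This is a toy model under stated assumptions, not libvirtd's implementation: `ToyDaemon`, its methods, the version string, and the `time.sleep` call standing in for a monitor connection are all hypothetical.

```python
import threading
import time


class ToyDaemon:
    """Toy daemon that reconnects to guests in the background."""

    def __init__(self, guest_names, reconnect_delay=0.01):
        self._pending = list(guest_names)
        self._delay = reconnect_delay
        self._lock = threading.Lock()
        self.connected = []
        self._worker = threading.Thread(target=self._reconnect_all, daemon=True)

    def start(self):
        # Kick off reconnection without blocking the caller.
        self._worker.start()

    def _reconnect_all(self):
        for name in self._pending:
            time.sleep(self._delay)  # stands in for talking to one guest's monitor
            with self._lock:
                self.connected.append(name)

    def version(self):
        # A request that touches no guest state, so it never waits on the worker.
        return "toy-daemon 0.0"

    def connected_count(self):
        with self._lock:
            return len(self.connected)

    def wait_until_ready(self, timeout=None):
        # Block until reconnection has finished; True if it completed in time.
        self._worker.join(timeout)
        return not self._worker.is_alive()
```

In this model `version()` returns at once after `start()`, while calls that depend on guest state would need to wait for (or check) the background worker.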

On Mon, Sep 24, 2012 at 4:35 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> [...]
> Please do file a bug about this, providing as much raw data as you have managed to collect.

Here is a report, please take a look.

https://bugzilla.redhat.com/show_bug.cgi?id=860053

On Mon, Sep 24, 2012 at 2:12 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> Here is a report, please take a look.

I've commented on the bug. For what it's worth, I see this as well, but in my case the delay comes from the fact that I have a number of storage pools marked as autostart. The solution^H^H^H^Hhack was to unmark them from autostart and write an init script that starts all of my storage pools. Now I can restart libvirt as often as I like, with a delay that is fairly linear in the number of VMs.

-- 
Doug Goldstein
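A rough sketch of the workaround described above, assuming the pool names are known in advance. The names passed in are placeholders, and `pool_commands` and `run_all` are hypothetical helpers; the only virsh subcommands relied on are `pool-autostart <pool> --disable` and `pool-start <pool>`.

```python
import subprocess


def pool_commands(pool_names):
    """Build virsh command lines for the workaround.

    Returns (disable, start): commands that clear the autostart flag on each
    pool (a one-time step), and commands that start each pool by hand (what
    an init script would run).
    """
    disable = [["virsh", "pool-autostart", name, "--disable"] for name in pool_names]
    start = [["virsh", "pool-start", name] for name in pool_names]
    return disable, start


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running `run_all` with the default `dry_run=True` only prints the commands, which makes it easy to review them before touching a live host.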
participants (4)
- Andrey Korolyov
- Daniel P. Berrange
- Daniel Veillard
- Doug Goldstein