[libvirt-users] Quadratic function-like call delay


Andrey Korolyov

24 Sep 2012

4:48 p.m.

Hi,

There is a rather annoying issue; I'm not sure whether I can call it a bug.

How to reproduce:

- start a bunch of VMs,
- stop libvirt,
- start libvirt and immediately issue any call, say, 'virsh version'.

The delay until the call completes fits nearly a quadratic function of the number of running VMs (qemu-kvm in my case; see link below). Once this delay has passed, all calls execute without slowdown, so the problem is only a 'dead gap' after restarting the service.

http://xdel.ru/downloads/libvirt.png
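A minimal sketch for checking the "nearly quadratic" observation against one's own measurements: time a trivial call right after each restart, then fit t(n) = a·n² + b·n + c by least squares over the VM counts tried. This assumes Python 3.7+ and the `virsh` binary on PATH; `time_command` and `fit_quadratic` are hypothetical helpers, not part of libvirt, and no measured data from the thread is reproduced here.

```python
import subprocess
import time


def time_command(argv):
    """Return wall-clock seconds taken by one invocation of argv, e.g. ["virsh", "version"]."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, capture_output=True)
    return time.perf_counter() - start


def fit_quadratic(ns, ts):
    """Least-squares fit of t = a*n**2 + b*n + c; returns (a, b, c). Needs >= 3 distinct n."""
    # Normal equations (X^T X) beta = X^T t, with design rows X_i = [n_i^2, n_i, 1].
    rows = [[n * n, n, 1.0] for n in ns]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, ts)) for i in range(3)]
    # Augmented 3x4 matrix, solved by Gaussian elimination with partial pivoting.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution from the last row upwards.
    beta = [0.0] * 3
    for i in range(2, -1, -1):
        beta[i] = (m[i][3] - sum(m[i][k] * beta[k] for k in range(i + 1, 3))) / m[i][i]
    return tuple(beta)
```

For each VM count n, restart libvirtd, call `time_command(["virsh", "version"])` immediately, and collect the (n, seconds) pairs. A coefficient `a` that stays clearly positive as more points are added supports the quadratic reading, while `a` near zero points to linear behaviour.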


Daniel Veillard

24 Sep 2012

5:50 p.m.


When the daemon is restarted, it needs to reconnect to all the guests, and that operation takes time. I'm not sure why it's not linear, but I think I experienced that a couple of years ago, so it doesn't sound like a recent regression. If you can look at what is going on, that would be useful.

Daniel

-- 
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

Daniel P. Berrange

9:35 p.m.


When you restart the libvirtd service while KVM guests are running, the first thing libvirtd needs to do is connect to the QEMU monitor for each guest and figure out its status. This can take some time, depending on how loaded the host is in general. It will definitely slow libvirt API calls which need to query individual VMs, but I'm rather surprised that you saw any slowdown of the 'virsh version' command, since that should not touch VMs at all. So I think I *would* class this as a bug. It could be something silly, like the main I/O event loop having some badly written bit of non-scalable code. Please do file a bug about this, providing as much raw data as you have managed to collect.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
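The "badly written bit of non-scalable code" hypothesis is easy to picture with a toy model. It is purely hypothetical and is not libvirt's event loop: if each of the n monitor reconnections triggers a walk over every handle registered so far, total work is 1 + 2 + ... + n = n(n+1)/2, which is quadratic in n, whereas an indexed lookup keeps the same sequence of reconnections linear.

```python
def reconnect_all_linear_scan(n):
    """Toy model: each reconnect walks every registered handle; returns total handle visits."""
    handles = []
    visits = 0
    for vm in range(n):
        handles.append(vm)       # register this VM's (hypothetical) monitor handle
        for _h in handles:       # e.g. rebuilding a poll set by walking the whole list
            visits += 1
    return visits


def reconnect_all_indexed(n):
    """Toy model: each reconnect touches only its own handle via a dict; returns total visits."""
    handles = {}
    visits = 0
    for vm in range(n):
        handles[vm] = object()   # O(1) registration
        _ = handles[vm]          # O(1) lookup instead of a full walk
        visits += 1
    return visits
```

Doubling n roughly quadruples the visits in the first function but only doubles them in the second, which is the shape of curve the original report describes.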

Daniel Veillard

10:46 p.m.


Hum, IIRC there was a time when libvirtd would not process any request until those reconnections were done; did that change? Andrey, what version are you using?

Daniel

Daniel P. Berrange

10:49 p.m.

On Mon, Sep 24, 2012 at 09:46:08PM +0800, Daniel Veillard wrote:


Hum, IIRC there was a time when libvirtd would not process any request until those reconnections were done; did that change?

Yes, a little while back we introduced code which spawns a background thread at startup to handle reconnecting VMs, so that libvirtd is not blocked for a long time.

Regards,
Daniel
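The startup change described above can be sketched with Python's `threading` module. Every name here (`ToyDaemon`, `vm_status`, and the `gate` event, which exists only to make the demo deterministic) is hypothetical; the sketch shows why a call like 'virsh version' need not wait for reconnection, while a call that queries one VM still waits until that VM has been reconnected.

```python
import threading


class ToyDaemon:
    """Hypothetical daemon that reconnects to running VMs in a background thread."""

    def __init__(self, vm_names, gate=None):
        self.vm_names = list(vm_names)
        # One event per VM, set once that VM has been "reconnected".
        self.reconnected = {name: threading.Event() for name in self.vm_names}
        self._gate = gate  # optional Event that holds the worker back (demo only)
        self._thread = threading.Thread(target=self._reconnect_all, daemon=True)

    def start(self):
        # Spawn the reconnect worker and return at once, so requests can be served.
        self._thread.start()

    def _reconnect_all(self):
        if self._gate is not None:
            self._gate.wait()
        for name in self.vm_names:
            # A real daemon would open the QEMU monitor and query status here.
            self.reconnected[name].set()

    def version(self):
        # Touches no VM state, so it never waits for reconnection.
        return "toy-1.0"

    def vm_status(self, name, timeout=None):
        # Queries one VM, so it blocks until that VM has been reconnected.
        if not self.reconnected[name].wait(timeout):
            raise TimeoutError(f"{name} not reconnected yet")
        return "running"

    def join(self):
        self._thread.join()
```

Under this design a VM-independent call returns immediately even while reconnection is still in progress, which is why a slow 'virsh version' right after a restart looked surprising in the thread.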

Andrey Korolyov

25 Sep 2012

4:12 a.m.


Here is a report; please take a look:

https://bugzilla.redhat.com/show_bug.cgi?id=860053


Doug Goldstein

7:05 a.m.


I've commented on the bug. But for what it's worth, I see this as well, but in my case the delay comes from the fact that I have a number of storage pools marked as autostart. The solution^H^H^H^Hhack was to unmark them from autostart and write an init script which starts up all of my storage pools. Now I can restart libvirt as often as I like, with a delay that is fairly linear in the number of VMs.

-- 
Doug Goldstein
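The workaround above (turn off autostart for the storage pools, then start them from a separate init script) might look roughly like the following. This is a hypothetical sketch: the script itself was not posted, the pool names are placeholders, and the code only builds and optionally runs `virsh pool-autostart <pool> --disable` and `virsh pool-start <pool>` commands.

```python
import subprocess


def disable_autostart_cmds(pools):
    """One-time step: stop libvirtd from autostarting these pools itself."""
    return [["virsh", "pool-autostart", pool, "--disable"] for pool in pools]


def start_pool_cmds(pools):
    """Init-script step: start each pool explicitly once libvirtd is up."""
    return [["virsh", "pool-start", pool] for pool in pools]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(" ".join(argv))
        if not dry_run:
            # check=False: virsh reports an error for a pool that is already
            # active; keep going so the remaining pools still get started.
            subprocess.run(argv, check=False)
```

Run `run(disable_autostart_cmds(pools), dry_run=False)` once, then call `run(start_pool_cmds(pools), dry_run=False)` from an init script ordered after libvirtd. The ordering and error handling here are assumptions, not details from the thread.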


participants (4)

Andrey Korolyov
Daniel P. Berrange
Daniel Veillard
Doug Goldstein