On Thu, May 16, 2013 at 12:09:39PM -0400, Peter Feiner wrote:
> Hello Daniel,
>
> I've been working on improving scalability in OpenStack on libvirt+kvm
> for the last couple of months. I'm particularly interested in reducing
> the time it takes to create VMs when many VMs are requested in
> parallel.
>
> One apparent bottleneck during virtual machine creation is libvirt. As
> more VMs are created in parallel, some libvirt calls (i.e.,
> virConnectGetLibVersion and virDomainCreateWithFlags) take longer
> without a commensurate increase in hardware utilization.
>
> Thanks to your patches in libvirt-1.0.3, the situation has improved.
> Some libvirt calls OpenStack makes during VM creation (i.e.,
> virConnectDefineXML) have no measurable slowdown when many VMs are
> created in parallel. In turn, parallel VM creation in OpenStack is
> significantly faster with libvirt-1.0.3. On my standard benchmark
> (create 20 VMs in parallel, wait until the VM is ACTIVE, which is
> essentially after virDomainCreateWithFlags returns), libvirt-1.0.3
> reduces the median creation time from 90s to 60s when compared to
> libvirt-0.9.8.
How many CPU cores are you testing on? That's a good improvement,
but I'd expect the improvement to be greater as the number of cores
grows.

Also, did you tune /etc/libvirt/libvirtd.conf at all? By default we
limit a single connection to only 5 concurrent RPC calls. Beyond that,
calls queue up even if libvirtd is otherwise idle. OpenStack uses a
single connection for everything, so it will hit this limit. I suspect
this is why virConnectGetLibVersion appears to be slow. That API does
absolutely nothing of any consequence, so the only reason I'd expect it
to be slow is if you're hitting the libvirtd RPC limit, causing the
call to be queued up.
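
For reference, the knobs involved live in /etc/libvirt/libvirtd.conf and
look roughly like this (the values shown are only illustrative of the
shipped defaults; check the comments in your installed file):

  # /etc/libvirt/libvirtd.conf -- illustrative values
  max_workers = 20            # worker threads available to service RPC calls
  max_client_requests = 5     # concurrent RPC calls allowed per connection;
                              # further calls on that connection are queued

Raising max_client_requests (and max_workers to match) would be the
obvious experiment if a single shared connection is the suspect.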
> I'd like to know if your concurrency work in the qemu driver is
> ongoing. If it isn't, I'd like to pick the work up myself and work on
> further improvements. Any advice or insight would be appreciated.
I'm not actively doing anything in this area, mostly because I've got no
clear data on where any remaining bottlenecks are.

One theory I had was that the virDomainObjListSearchName method could
be a bottleneck, because it acquires a lock on every single VM. It is
invoked when starting a VM, when we call virDomainObjListAddLocked.
I tried removing this locking though & didn't see any performance
benefit, so I never pursued it further. Before trying things like
this again, I think we'd need to find a way to actually identify where
the true bottlenecks are, rather than guesswork.
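
One way to get such data without touching libvirtd would be a small
client-side harness that times a trivial API call under concurrency,
once over a single shared connection and once with a connection per
thread: if only the shared case degrades, the per-connection RPC limit
is queueing requests; if both degrade, the contention is inside libvirtd
itself. A rough, untested sketch (the URI and thread count are just
placeholders):

  import threading
  import time
  import libvirt

  URI = "qemu:///system"     # placeholder; use whatever URI OpenStack uses
  THREADS = 20               # placeholder concurrency level
  CALLS_PER_THREAD = 50

  def timed_calls(conn, samples):
      # getLibVersion is a trivial RPC, so its latency is almost entirely
      # queueing/dispatch overhead rather than real work.
      for _ in range(CALLS_PER_THREAD):
          start = time.time()
          conn.getLibVersion()
          samples.append(time.time() - start)

  def run(shared):
      conns = [libvirt.open(URI)] if shared else \
              [libvirt.open(URI) for _ in range(THREADS)]
      samples = []
      threads = [threading.Thread(target=timed_calls,
                                  args=(conns[0] if shared else conns[i],
                                        samples))
                 for i in range(THREADS)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      for c in conns:
          c.close()
      samples.sort()
      print("shared=%s calls=%d median=%.4fs worst=%.4fs"
            % (shared, len(samples),
               samples[len(samples) // 2], samples[-1]))

  if __name__ == "__main__":
      run(shared=True)    # one connection shared by all threads
      run(shared=False)   # one connection per thread

The same timing wrapper could then be moved onto the calls Peter
measured (virConnectDefineXML, virDomainCreateWithFlags) to see which
ones actually stretch out under load.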
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|