On Thu, May 16, 2013 at 12:09:39PM -0400, Peter Feiner wrote:
> Hello Daniel,
>
> I've been working on improving scalability in OpenStack on libvirt+kvm
> for the last couple of months. I'm particularly interested in reducing
> the time it takes to create VMs when many VMs are requested in
> parallel.
>
> One apparent bottleneck during virtual machine creation is libvirt. As
> more VMs are created in parallel, some libvirt calls (i.e.,
> virConnectGetLibVersion and virDomainCreateWithFlags) take longer
> without a commensurate increase in hardware utilization.
>
> Thanks to your patches in libvirt-1.0.3, the situation has improved.
> Some libvirt calls OpenStack makes during VM creation (i.e.,
> virConnectDefineXML) have no measurable slowdown when many VMs are
> created in parallel. In turn, parallel VM creation in OpenStack is
> significantly faster with libvirt-1.0.3. On my standard benchmark
> (create 20 VMs in parallel, wait until the VM is ACTIVE, which is
> essentially after virDomainCreateWithFlags returns), libvirt-1.0.3
> reduces the median creation time from 90s to 60s when compared to
> libvirt-0.9.8.
How many CPU cores are you testing on? That's a good improvement,
but I'd expect the improvement to be greater as the number of cores
grows.

Also, did you tune /etc/libvirt/libvirtd.conf at all? By default we
limit a single connection to only 5 concurrent RPC calls. Beyond that,
calls queue up even if libvirtd is otherwise idle. OpenStack uses a
single connection for everything, so it will hit this limit. I suspect
this is why virConnectGetLibVersion appears to be slow. That API does
absolutely nothing of any consequence, so the only reason I'd expect it
to be slow is if you're hitting the libvirtd RPC limit, causing the
call to be queued up.
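
For reference, the knobs involved live in /etc/libvirt/libvirtd.conf and
look roughly like this (the values shown are only illustrative of the
shipped defaults; check the comments in your installed file):

  # /etc/libvirt/libvirtd.conf -- illustrative values
  max_workers = 20            # worker threads available to service RPC calls
  max_client_requests = 5     # concurrent RPC calls allowed per connection;
                              # further calls on that connection are queued

Raising max_client_requests (and max_workers to match) would be the
obvious experiment if a single shared connection is the suspect.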
> I'd like to know if your concurrency work in the qemu driver is
> ongoing. If it isn't, I'd like to pick the work up myself and work on
> further improvements. Any advice or insight would be appreciated.
I'm not actively doing anything in this area, mostly because I've got no
clear data on where any remaining bottlenecks are.

One theory I had was that the virDomainObjListSearchName method could
be a bottleneck, because it acquires a lock on every single VM. It is
invoked when starting a VM, when we call virDomainObjListAddLocked.
I tried removing this locking though & didn't see any performance
benefit, so I never pursued it further. Before trying things like
this again, I think we'd need to find a way to actually identify where
the true bottlenecks are, rather than guesswork.
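
One way to get such data without touching libvirtd would be a small
client-side harness that times a trivial API call under concurrency,
once over a single shared connection and once with a connection per
thread: if only the shared case degrades, the per-connection RPC limit
is queueing requests; if both degrade, the contention is inside libvirtd
itself. A rough, untested sketch (the URI and thread count are just
placeholders):

  import threading
  import time
  import libvirt

  URI = "qemu:///system"     # placeholder; use whatever URI OpenStack uses
  THREADS = 20               # placeholder concurrency level
  CALLS_PER_THREAD = 50

  def timed_calls(conn, samples):
      # getLibVersion is a trivial RPC, so its latency is almost entirely
      # queueing/dispatch overhead rather than real work.
      for _ in range(CALLS_PER_THREAD):
          start = time.time()
          conn.getLibVersion()
          samples.append(time.time() - start)

  def run(shared):
      conns = [libvirt.open(URI)] if shared else \
              [libvirt.open(URI) for _ in range(THREADS)]
      samples = []
      threads = [threading.Thread(target=timed_calls,
                                  args=(conns[0] if shared else conns[i],
                                        samples))
                 for i in range(THREADS)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      for c in conns:
          c.close()
      samples.sort()
      print("shared=%s calls=%d median=%.4fs worst=%.4fs"
            % (shared, len(samples),
               samples[len(samples) // 2], samples[-1]))

  if __name__ == "__main__":
      run(shared=True)    # one connection shared by all threads
      run(shared=False)   # one connection per thread

The same timing wrapper could then be moved onto the calls Peter
measured (virConnectDefineXML, virDomainCreateWithFlags) to see which
ones actually stretch out under load.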
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|