Re: [libvirt] [RFC][scale] new API for querying domains stats

Tuesday, 1 July 2014

On Tue, Jul 01, 2014 at 03:09:13AM -0400, Francesco Romani wrote:
...
 Hi everyone,

 I'd like to discuss possible APIs and plans for new query APIs in libvirt.

 I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
 VDSM is the node management daemon, which is in charge of, among many other things,
 gathering host statistics and per-Domain/VM statistics.

 Right now we aim for a number of VMs per node in the (few) hundreds, but we have big
 plans to scale much more, and to possibly reach thousands in a not so distant future.
 At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk),
 and this obviously scales poorly.

 This is made only worse by the fact that VDSM is a Python 2.7 application, and
 Python 2.x notoriously behaves very badly with threads. We are already working to
 improve our code, but I'd like to bring the discussion here and see if and when the
 querying API can be improved.

 We currently use these APIs for our sampling:
   virDomainBlockInfo
   virDomainGetInfo
   virDomainGetCPUStats
   virDomainBlockStats
   virDomainBlockStatsFlags
   virDomainInterfaceStats
   virDomainGetVcpusFlags
   virDomainGetMetadata 

Why do you need to call virDomainGetMetadata so often? That merely contains an
opaque data blob that can only have come from VDSM itself, so I'm surprised you
need to call that at all frequently.

...
 What we'd like to have is

 * asynchronous APIs for querying domain stats
   (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
   This would be just awesome. Either a single callback or a different one per call
   is fine (let's discuss this!).
   Please note that we are much more concerned about thread reduction than about
   performance numbers. We have had reports of the thread count causing real harm,
   while performance so far is not yet a concern
   (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)

 * bulk APIs for querying domain stats
   (https://bugzilla.redhat.com/show_bug.cgi?id=1113116)
   would be really welcome as well. It is quite independent from the previous bullet
   point and would help us greatly with scale.

If we did the first bullet point, we'd be adding another ~10 APIs for
async variants. If we then did the second bullet point we'd be adding
another ~10 APIs for bulk querying. So while you're right that they
are independent, it would be desirable to address them both at the
same time, so we only need to add 10 new APIs in total, not 20.

For the async API design, I could see two potential designs:

1. A custom callback to run per API

    typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom,
                                               bool isError,
                                               virDomainBlockInfoPtr info,
                                               void *opaque);

    int virDomainGetBlockInfoAsync(virDomainPtr dom,
                                   const char *disk,
                                   virDomainBlockInfoCallback cb,
                                   void *opaque,
                                   unsigned int flags);

2. A standard callback and a pair of APIs

    typedef void *virDomainAsyncResult;
    typedef void (*virDomainAsyncCallback)(virDomainPtr dom,
                                           virDomainAsyncResult res);

    void virDomainGetBlockInfoAsync(virDomainPtr dom,
                                    const char *disk,
                                    virDomainAsyncCallback cb,
                                    void *opaque,
                                    unsigned int flags);
    int virDomainGetBlockInfoFinish(virDomainPtr dom,
                                    virDomainAsyncResult res,
                                    virDomainBlockInfoPtr info);

This second approach is the way GIO works (see the example at
https://developer.gnome.org/gio/stable/GAsyncResult.html). The main
difference between the two is the way you get error reporting from
the APIs. In the first design, libvirt would raise the error before
invoking the callback with isError set to true. In the second design,
the Finish() func would raise the error and return -1.
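
To make the difference in error reporting concrete, here is a rough,
self-contained sketch of both designs. Everything in it is a hypothetical
stand-in: the mock* names are not libvirt symbols, the calls complete
immediately instead of going through an event loop, and passing the
opaque pointer to the callback in the second design is an assumption
borrowed from how GIO hands over user data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for virDomainBlockInfo; not a libvirt type. */
typedef struct { unsigned long long capacity; } mockBlockInfo;

/* Design 1: the error flag and the data go straight to the callback. */
typedef void (*mockBlockInfoCallback)(bool isError,
                                      const mockBlockInfo *info,
                                      void *opaque);

/* Completes immediately for illustration; a real implementation would
 * return at once and invoke cb later. A NULL disk simulates a failure. */
int mockGetBlockInfoAsync1(const char *disk,
                           mockBlockInfoCallback cb, void *opaque)
{
    assert(cb != NULL);
    mockBlockInfo info = { 1024 };
    bool failed = (disk == NULL);
    cb(failed, failed ? NULL : &info, opaque);
    return 0;
}

/* Design 2: the callback gets an opaque result, Finish() unpacks it. */
typedef struct { bool failed; mockBlockInfo info; } mockAsyncResult;
typedef void (*mockAsyncCallback)(mockAsyncResult *res, void *opaque);

void mockGetBlockInfoAsync2(const char *disk,
                            mockAsyncCallback cb, void *opaque)
{
    assert(cb != NULL);
    mockAsyncResult res = { disk == NULL, { 1024 } };
    cb(&res, opaque);
}

/* Returns -1 on failure, 0 on success with *info filled in. */
int mockGetBlockInfoFinish(mockAsyncResult *res, mockBlockInfo *info)
{
    if (res->failed)
        return -1;
    *info = res->info;
    return 0;
}

/* Example callbacks: both store 0 or -1 in the int behind opaque. */
void onInfo1(bool isError, const mockBlockInfo *info, void *opaque)
{
    (void)info;
    *(int *)opaque = isError ? -1 : 0;
}

void onInfo2(mockAsyncResult *res, void *opaque)
{
    mockBlockInfo info;
    *(int *)opaque = mockGetBlockInfoFinish(res, &info);
}
```

In the first variant the callback has to check isError itself, while in
the second the error surfaces through the usual "return -1" convention
when the callback calls Finish(), which is closer to how the existing
synchronous APIs report failures.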

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
