On Tue, May 23, 2017 at 10:04:44AM +0100, Daniel P. Berrange wrote:
On Tue, May 23, 2017 at 10:51:47AM +0200, Martin Kletzander wrote:
> On Tue, May 23, 2017 at 09:32:03AM +0100, Daniel P. Berrange wrote:
> > On Mon, May 22, 2017 at 06:00:13PM +0200, Peter Krempa wrote:
> > > Hint that the users should limit the number of VMs queried in the bulk
> > > stats API.
> > > ---
> > > src/libvirt-domain.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> > > index 310b91b37..b01f2705c 100644
> > > --- a/src/libvirt-domain.c
> > > +++ b/src/libvirt-domain.c
> > > @@ -11315,6 +11315,10 @@ virConnectGetDomainCapabilities(virConnectPtr
conn,
> > > * VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF and/or
> > > * VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER for all other states.
> > > *
> > > + * Note that this API is prone to exceeding maximum RPC message size on
hosts
> > > + * with lots of VMs so it's suggested to use virDomainListGetStats
with a
> > > + * reasonable list of VMs as the argument.
> > > + *
> > > * Returns the count of returned statistics structures on success, -1 on
error.
> > > * The requested data are returned in the @retStats parameter. The
returned
> > > * array should be freed by the caller. See virDomainStatsRecordListFree.
> > > @@ -11386,6 +11390,9 @@ virConnectGetAllDomainStats(virConnectPtr conn,
> > > * Note that any of the domain list filtering flags in @flags may be
rejected
> > > * by this function.
> > > *
> > > + * Note that this API is prone to exceeding maximum RPC message size on
hosts
> > > + * with lots of VMs so it's suggested to limit the number of VMs
queried.
> >
> > With the way this is worded, applications have no guidance on what is a
> > sensible max number of VMs they can safely use, so I don't think it is
> > particularly useful, except as a way to say "we told you so" when
apps
> > report a bug.
> >
> > I'd be inclined to explicitly put a limit of 100 VMs[1] in the API, and
> > document that hard limit, so apps immediately know to write their code
> > to chunk in 100 VM blocks.
> >
>
> Or we can give a hint on what the limit is and let users figure out the
> sensible number. It is not only based on the number of VMs. I believe
> you can hit the limit with one or two VMs if you have lot of devices
> that we report statistics for. Limiting the number of VMs to a
> particular number would not help as much in this case. But we can
> combine both approaches.
Urgh, that's even worse. Apps can't simply split queries into blocks of
N vms, because any single VM in that list might have huge number of disks.
So to use this at all reliably you have to query the XML config of every
guest to see what devices are present and then split up the queries into
variable number of VMs :-(
I meant something along the lines of:
This API might transfer lot of data over the connection which needs to
be limited. The limit is currently 32MB with all the headers and the
output format is described here, so one should be able to make a
pretty good assumptions based on the average deployment they have. In
case you get error message saying "RPC: message to big", you need to
limit the number of VMs you are requesting statistics for.
And then put a limit in the API that returns additional error if more
than 100 VMs are in the list.
This just adds to my feeling that we should consider this API a
failed
experiment
I'm not saying no... But we should be able to do such things, just in a
different way, let's say. The idea with shared memory is nice. Even
remotely the libvirt API could expose it as such and underneath we could
be changing how that update routine works.