Re: [libvirt] [Qemu-devel] IO accounting overhaul

2 Sep 2014

Benoît Canet <benoit.canet@irqsave.net> writes:
...
On Monday 01 Sep 2014 at 13:41:01 (+0200), Markus Armbruster wrote:
...
Benoît Canet <benoit.canet@irqsave.net> writes:
...
On Monday 01 Sep 2014 at 11:52:00 (+0200), Markus Armbruster wrote:
[...]
...
A quick stab at tasks:
* QMP interface, either a compatible extension of query-blockstats or a
  new one.
I would like to extend query-blockstats at first, but I would also
like to postpone the QMP interface changes and just write the
shared infrastructure and deploy it in the device models.
Implementing improved data collection need not wait for the QMP design.
...
...
* Rough idea on how to do the shared infrastructure.
API-wise, I am thinking about adding bdrv_acct_invalid() and
bdrv_acct_failed(), and systematically issuing a bdrv_acct_start()
as soon as possible.
Complication: partial success.  Example:
1. Guest requests a read of N sectors.
2. Device model calls
   bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
3. Device model examines the request, and deems it valid.
4. Device model passes it to the block layer.
5. Block layer does its thing, but for some reason only M < N sectors
   can be read.  Block layer returns M.
6. What's the device model to do now?  Both bdrv_acct_failed() and
   bdrv_acct_done() would be wrong.

   Should the device model account for a read of size M?  This ignores
   the partial failure.

   Should it split the read into a successful and a failed part for
   accounting purposes?  This miscounts the number of reads.
Maybe bdrv_acct_partial(), accounting the size of the data actually read
in the bandwidth counter and taking care of counting this event.
Two sub-questions:

a. What's a convenient interface for device models to report the
   operation announced with bdrv_acct_start() has succeeded partially?

   bdrv_acct_partial() sounds okay.

   Or maybe pass the #bytes actually done to bdrv_acct_done().  Equal to
   #bytes passed to bdrv_acct_start() means complete success, less means
   partial success.  You could even have negative mean complete
   failure.  Could perhaps be more concise.

   I trust you'll develop a preference while making the device models
   use your new interface.
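
To make the second option concrete, here is a minimal sketch of a
"done" call that takes the #bytes actually transferred.  Everything in
it is hypothetical: the sketch_acct_* names, the cookie layout and the
outcome enum are illustrations only, not QEMU's existing bdrv_acct_*
API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not QEMU's definitions. */
enum sketch_acct_type { SKETCH_ACCT_READ, SKETCH_ACCT_WRITE };

enum sketch_acct_outcome {
    SKETCH_ACCT_COMPLETE,   /* all announced bytes were transferred */
    SKETCH_ACCT_PARTIAL,    /* fewer bytes than announced */
    SKETCH_ACCT_FAILED      /* negative value: nothing usable */
};

struct sketch_acct_cookie {
    int64_t bytes;                  /* size announced at start */
    enum sketch_acct_type type;
};

/* Announce an operation of the given size, as early as possible. */
static void sketch_acct_start(struct sketch_acct_cookie *cookie,
                              int64_t bytes, enum sketch_acct_type type)
{
    assert(cookie != NULL && bytes >= 0);
    cookie->bytes = bytes;
    cookie->type = type;
}

/*
 * Finish the operation.  bytes_done equal to the announced size means
 * complete success, less means partial success, negative means
 * complete failure.
 */
static enum sketch_acct_outcome
sketch_acct_done(const struct sketch_acct_cookie *cookie, int64_t bytes_done)
{
    assert(cookie != NULL);
    if (bytes_done < 0) {
        return SKETCH_ACCT_FAILED;
    }
    if (bytes_done < cookie->bytes) {
        return SKETCH_ACCT_PARTIAL;
    }
    return SKETCH_ACCT_COMPLETE;
}
```

With this shape a device model needs only one completion call per
request; a separate bdrv_acct_partial() would instead make the partial
case explicit at the call site.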

b. How to count a partially successful operation?  In other words, what
   should your answer to a. do?

   I guess I'd be fine with simply counting short I/O as if the request
   had the short size.  But if we decide differently, changing the code
   accordingly should be trivial, so just start with whatever you think
   is right, and leave the debate (if any) to patch review.
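
As a rough sketch of the "count short I/O as if the request had the
short size" policy: one operation is counted, and only the bytes
actually transferred go into the byte counter.  The sketch_blk_stats
structure and all names below are assumptions made up for this
example, not QEMU's real accounting state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512   /* assumed sector size for the example */

/* Hypothetical per-device counters; not QEMU's actual structure. */
enum sketch_io_type { SKETCH_IO_READ, SKETCH_IO_WRITE, SKETCH_IO_MAX };

struct sketch_blk_stats {
    uint64_t nr_bytes[SKETCH_IO_MAX];   /* bytes actually transferred */
    uint64_t nr_ops[SKETCH_IO_MAX];     /* number of operations */
};

/*
 * Account a (possibly short) operation as if the guest had asked for
 * exactly bytes_done bytes: one operation, bytes_done bytes.
 */
static void sketch_count_short_io(struct sketch_blk_stats *stats,
                                  enum sketch_io_type type,
                                  uint64_t bytes_done)
{
    assert(stats != NULL && type < SKETCH_IO_MAX);
    stats->nr_bytes[type] += bytes_done;
    stats->nr_ops[type]++;
}
```

For a read of N = 8 sectors where only M = 3 come back, calling
sketch_count_short_io(&stats, SKETCH_IO_READ, 3 * SKETCH_SECTOR_SIZE)
records one read of 1536 bytes.  The partial failure itself is not
recorded under this policy, which is the trade-off noted in step 6.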
...
Maybe we will discover some other rare event to account.
Yes, but we can worry about it when we run into it.

[...]