On 16.03.2017 at 16:52, Daniel P. Berrange wrote:
> On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
> > On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
> > > On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> > > > On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> > > > > On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> > > > >> Hello, All!
> > > > >>
> > > > >> There is a problem in the current libvirt implementation.
> > > > >> domain.xml allows specifying only a basic set of options. This
> > > > >> is especially limiting in the case of QEMU, where the format
> > > > >> drivers have a lot of tweaks. Most likely these options will
> > > > >> never be supported in a good way in libvirt as recognizable
> > > > >> entities.
> > > > >>
> > > > >> Right now, in order to debug a libvirt QEMU VM in production, I
> > > > >> am using a very strange approach:
> > > > >> - the disk section of the domain XML is removed
> > > > >> - the exact command line options to start the disk are specified
> > > > >>   at the end of domain.xml within <qemu:commandline>, as
> > > > >>   described by Stefan:
> > > > >>
> > > > >> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> > > > >>
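> > > > >> A minimal sketch of that workaround (the image path and the
> > > > >> option values here are just placeholders from my setup; note
> > > > >> the qemu XML namespace on the <domain> element, as in Stefan's
> > > > >> post):
> > > > >>
> > > > >> <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> > > > >>   ...
> > > > >>   <qemu:commandline>
> > > > >>     <qemu:arg value='-drive'/>
> > > > >>     <qemu:arg value='file=/var/lib/libvirt/images/rhel7.qcow2,format=qcow2,if=virtio,l2-cache-size=64M,cache-clean-interval=32'/>
> > > > >>   </qemu:commandline>
> > > > >> </domain>
> > > > >>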
> > > > >> The problem is that when debugging is finished and a viable
> > > > >> combination of options has been found, I cannot leave the VM in
> > > > >> such a state in production. This is the pain and the problem.
> > > > >> For example, I have spent 3 days with the VM of one customer
> > > > >> who blames us for slow I/O in the guest. I have found a very
> > > > >> good combination of non-standard options which increases disk
> > > > >> performance 5 times (not 5%). Currently I cannot put this
> > > > >> combination into production, as libvirt does not see the disk.
> > > > >>
> > > > >> I propose to do a very simple thing (maybe I am not the first
> > > > >> one here): it would be nice to allow passing arbitrary options
> > > > >> to the QEMU command line. This could be done in a very generic
> > > > >> way if we allowed specifying additional options inside the
> > > > >> <driver> section like this:
> > > > >>
> > > > >> <disk type='file' device='disk'>
> > > > >>   <driver name='qemu' type='qcow2' cache='none' io='native'
> > > > >>           iothread='1'>
> > > > >>     <option name='l2-cache-size' value='64M'/>
> > > > >>     <option name='cache-clean-interval' value='32'/>
> > > > >>   </driver>
> > > > >>   <source file='/var/lib/libvirt/images/rhel7.qcow2'/>
> > > > >>   <target dev='sda' bus='scsi'/>
> > > > >>   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> > > > >> </disk>
> > > > >>
> > > > >> and so on. The meaning (at least for QEMU) is quite simple:
> > > > >> these options would just be appended to the end of the -drive
> > > > >> command line. The meaning for other drivers should be the same,
> > > > >> and I think that there are ways to pass generic options in them.
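> > > > >>
> > > > >> In other words, the <driver> example above would map to
> > > > >> something like this on the QEMU command line (a sketch; the
> > > > >> exact option ordering is of course up to libvirt):
> > > > >>
> > > > >> -drive file=/var/lib/libvirt/images/rhel7.qcow2,format=qcow2,cache=none,aio=native,l2-cache-size=64M,cache-clean-interval=32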
> > > > > It is a general policy that we do *not* do generic option
> > > > > passthrough in this kind of manner. We always want to represent
> > > > > concepts explicitly with named attributes, so that if two
> > > > > hypervisors support the same concept, we can map it the same way
> > > > > in the XML.
> > > >
> > > > OK. How could I change the L2 cache size for a QCOW2 image?
> > > >
> > > > For a 1 TB disk, fragmented in the guest, the performance loss is
> > > > around 10 times. 10 TIMES. 1000%. The customer cannot wait for a
> > > > proper fix in the next QEMU release, especially when we are able
> > > > to provide the kludge specifically for him.
> > >
> > > We can explicitly allow the L2 cache size to be set in the XML, but
> > > that is a pretty poor solution to the problem IMHO, as the mgmt
> > > application has no a priori knowledge of whether a particular cache
> > > size is going to be right for a particular QCow2 image.
> > >
> > > For a sustainable solution, IMHO this really needs to be fixed in
> > > QEMU, so that it either has a more appropriate default or, if a
> > > single default is not possible, auto-tunes its cache size
> > > dynamically to suit the characteristics of the qcow2 image.
> >
> > A tradeoff between memory usage and performance is policy, and
> > setting policy is the management layer's job, not qemu's. We can try
> > to provide good defaults, but they are meant for manual users of
> > qemu. libvirt is expected to configure everything exactly as it wants
> > it instead of relying on defaults.
> The question though is: how is an app supposed to figure out what the
> optimal setting for the cache size is? It seems to require knowledge
> of the level of disk fragmentation and of guest I/O patterns, neither
> of which are things we can know upfront. Which means any attempt to
> set the cache size is little more than ill-informed guesswork.
No, it requires knowledge of whether the user prefers to spend more
memory for improved disk performance of this VM, at the cost of other
VMs or applications running on the same machine. And this is something
that qemu can't really figure out any better than libvirt.
If you don't care about that at all, the optimal configuration in terms
of performance is to give qemu a cache large enough that the metadata of
the whole image fits in it. When setting cache-clean-interval, this
could actually be reasonable even for large images because the memory
wouldn't be used forever but just as long as the guest is submitting the
problematic I/O patterns - but it still means that temporarily qemu
could really use all of this memory.
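
For reference, the arithmetic behind "large enough" (assuming the
default 64k cluster size; the numbers are just Denis' example): each L2
table entry is 8 bytes and covers one cluster, so full coverage needs
l2-cache-size = disk_size * 8 / cluster_size. For a fully allocated
1 TB image that is 1 TB / 64 kB * 8 B = 128 MB of L2 cache, whereas the
current default of 1 MB only covers the first 8 GB of the virtual disk.
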
Kevin