[libvirt] [RFC] finegrained disk driver options control

Hello, All!

There is a problem in the current libvirt implementation: domain.xml allows only a basic set of options to be specified. This is especially true in the case of QEMU, where the format drivers have a great many tweaks. Most of these options will most likely never be supported by libvirt in a good way as properly recognized entities.

Right now, in order to debug a libvirt QEMU VM in production, I am using a very strange approach:
- the disk section of the domain XML is removed
- the exact command line options to start the disk are specified at the end of domain.xml within <qemu:commandline>, as described by Stefan:
http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html

The problem is that once debugging is finished and a viable combination of options has been found, I cannot leave the VM in such a state in production. This is the real pain and problem. For example, I have spent 3 days with the VM of one customer who blamed us for slow IO in the guest. I found a very good combination of non-standard options which increases disk performance 5 times (not 5%). Currently I cannot put this combination into production, as libvirt does not see the disk.

I propose to do a very simple thing (maybe I am not the first one here): it would be nice to allow arbitrary options to be passed through to the QEMU command line. This could be done in a very generic way if we allow additional options to be specified inside the <driver> section, like this:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native' iothread='1'>
    <option name='l2-cache-size' value='64M'/>
    <option name='cache-clean-interval' value='32'/>
  </driver>
  <source file='/var/lib/libvirt/images/rhel7.qcow2'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

and so on. The meaning (at least for QEMU) is quite simple: these options are just appended to the end of the -drive command line. The meaning for other drivers should be the same, and I think there are ways to pass generic options in them as well.

Any opinion?

Den
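For reference, the <qemu:commandline> debugging workaround described above looks roughly like the following sketch. The xmlns:qemu namespace declaration on the <domain> element is required for the passthrough elements to be accepted; the drive id and -device pairing here are illustrative assumptions, not taken from a real production config:

```xml
<!-- Sketch of the <qemu:commandline> workaround: the <disk> element is
     removed from the domain XML and the drive is passed raw to QEMU.
     The id 'debug-disk' and the tuning values are illustrative only. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... the usual domain config, with the <disk> element removed ... -->
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/var/lib/libvirt/images/rhel7.qcow2,format=qcow2,if=none,id=debug-disk,cache=none,aio=native,l2-cache-size=64M,cache-clean-interval=32'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='scsi-hd,drive=debug-disk'/>
  </qemu:commandline>
</domain>
```

The downside, as described below, is that libvirt no longer knows the disk exists, which breaks management operations on it.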

On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
I propose to do a very simple thing (maybe I am not the first one here): it would be nice to allow arbitrary options to be passed through to the QEMU command line. This could be done in a very generic way if we allow additional options to be specified inside the <driver> section, like this:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native' iothread='1'>
    <option name='l2-cache-size' value='64M'/>
    <option name='cache-clean-interval' value='32'/>
  </driver>
  <source file='/var/lib/libvirt/images/rhel7.qcow2'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

and so on. The meaning (at least for QEMU) is quite simple: these options are just appended to the end of the -drive command line. The meaning for other drivers should be the same, and I think there are ways to pass generic options in them as well.
It is a general policy that we do *not* do generic option passthrough in this kind of manner. We always want to represent concepts explicitly with named attributes, so that if two hypervisors support the same concept we can map it the same way in the XML.

Regards,
Daniel

-- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
It is a general policy that we do *not* do generic option passthrough in this kind of manner. We always want to represent concepts explicitly with named attributes, so that if two hypervisors support the same concept we can map it the same way in the XML.
OK. How could I change the L2 cache size for a QCOW2 image?
For a 1 TB disk, fragmented in the guest, the performance loss is around 10 times. 10 TIMES. 1000%. The customer cannot wait for a proper fix in the next QEMU release, especially if we are able to provide the kludge specifically for him.

There is an option, <qemu:commandline>, which works specifically like this. It is enabled specifically with a changed schema. OK, we can have this option enabled only under the same condition. But we have to have a way to solve the problem right now, not in 3 months of painful dances within the driver. Maybe with limitations like an increased memory footprint, but still. Will it work for you?

Den
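To make the numbers concrete: each qcow2 L2 table entry is 8 bytes and maps one cluster, so fully covering an image needs l2-cache-size = disk_size * 8 / cluster_size bytes. A small sketch of that standard qcow2 arithmetic (the helper name is my own):

```python
def l2_cache_bytes(disk_size, cluster_size=64 * 1024):
    """Bytes of qcow2 L2 cache needed to cover the whole image.

    Each 8-byte L2 entry maps one cluster, so full coverage needs
    disk_size / cluster_size entries, i.e. disk_size * 8 / cluster_size
    bytes of cache.
    """
    return disk_size * 8 // cluster_size

TiB = 1024 ** 4
MiB = 1024 ** 2

# A 1 TiB image with the default 64 KiB clusters needs a 128 MiB L2
# cache for full coverage -- far above QEMU's then-default 1 MiB, which
# covers only 8 GiB and explains the cache-miss penalty on large,
# fragmented disks.
print(l2_cache_bytes(1 * TiB) // MiB)  # -> 128
```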

On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
OK. How could I change the L2 cache size for a QCOW2 image?

For a 1 TB disk, fragmented in the guest, the performance loss is around 10 times. 10 TIMES. 1000%. The customer cannot wait for a proper fix in the next QEMU release, especially if we are able to provide the kludge specifically for him.
We can explicitly allow the L2 cache size to be set in the XML, but that is a pretty poor solution to the problem IMHO, as the mgmt application has no a priori knowledge of whether a particular cache size is going to be right for a particular qcow2 image. For a sustainable solution, IMHO this really needs to be fixed in QEMU so that it either has a more appropriate default or, if a single default is not possible, auto-tunes its cache size dynamically to suit the characteristics of the qcow2 image.
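As an illustration of the auto-tune idea (purely a sketch of the suggestion, not QEMU's actual behavior, and the function name and budget value are my own assumptions), such a heuristic could size the cache to cover the whole image but clamp it to a memory budget:

```python
def pick_l2_cache(disk_size, cluster_size=64 * 1024, budget=32 * 1024 ** 2):
    """Sketch of the auto-tuning suggestion: cover the whole image if
    that fits within a memory budget, otherwise clamp to the budget.
    The 32 MiB budget is an arbitrary illustrative cap."""
    full_coverage = disk_size * 8 // cluster_size  # bytes to map every cluster
    return min(full_coverage, budget)

GiB = 1024 ** 3

# An 8 GiB image needs only 1 MiB of cache; a 1 TiB image would need
# 128 MiB, so it gets clamped to the 32 MiB budget instead.
print(pick_l2_cache(8 * GiB), pick_l2_cache(1024 * GiB))
```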
There is an option, <qemu:commandline>, which works specifically like this. It is enabled specifically with a changed schema. OK, we can have this option enabled only under the same condition. But we have to have a way to solve the problem right now, not in 3 months of painful dances within the driver. Maybe with limitations like an increased memory footprint, but still.
Sure, you can use <qemu:commandline> passthrough - that is the explicit temporary workaround. We don't provide any guarantee that your guest won't break when upgrading either libvirt or QEMU, though, hence we mark it as tainted.

Regards,
Daniel

On 03/16/2017 06:08 PM, Daniel P. Berrange wrote:
We can explicitly allow the L2 cache size to be set in the XML, but that is a pretty poor solution to the problem IMHO, as the mgmt application has no a priori knowledge of whether a particular cache size is going to be right for a particular qcow2 image.
For a sustainable solution, IMHO this really needs to be fixed in QEMU so that it either has a more appropriate default or, if a single default is not possible, auto-tunes its cache size dynamically to suit the characteristics of the qcow2 image.

Yes, I agree. That is why I was speaking about a kludge.
Sure, you can use <qemu:commandline> passthrough - that is the explicit temporary workaround. We don't provide any guarantee that your guest won't break when upgrading either libvirt or QEMU, though, hence we mark it as tainted.

No and yes. Yes, <qemu:commandline> partially solves the situation. No, this solution has too strong drawbacks IMHO: the configuration of this VM cannot be changed in any viable way anymore, and there are a lot of problems because one disk is absent at the libvirt level.
Can we add the option, enabled specifically at the disk level when the VM config is tainted and the debug schema is enabled? This would be the best partial solution, one which will not ruin other management tasks like backup, disk add, etc.

Den

On Thu, Mar 16, 2017 at 06:15:27PM +0300, Denis V. Lunev wrote:
Can we add the option, enabled specifically at the disk level when the VM config is tainted and the debug schema is enabled? This would be the best partial solution, one which will not ruin other management tasks like backup, disk add, etc.
We really don't want to propagate the custom passthrough into further areas of the XML. It is intentionally limited, because it is not something we want people/apps to use for anything other than a short-term hack. You should be able to use QEMU's '-set' argument to set fields against existing QEMU args, without having to throw away the entire <disk> config built by libvirt.

Regards,
Daniel
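A sketch of what the '-set' suggestion could look like in practice. The drive id 'drive-scsi0-0-0-0' is an assumption following libvirt's usual naming for the SCSI address in the example above; the actual id must be checked against the generated QEMU command line, and the values are illustrative:

```xml
<!-- Sketch: tweak a libvirt-managed drive with -set instead of
     replacing the whole <disk>. The drive id must match the one
     libvirt generates (check the QEMU process command line). -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... normal domain config, <disk> element kept intact ... -->
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='drive.drive-scsi0-0-0-0.l2-cache-size=64M'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='drive.drive-scsi0-0-0-0.cache-clean-interval=32'/>
  </qemu:commandline>
</domain>
```

Because the <disk> element stays in place, libvirt still sees the disk, so backup, disk hotplug, and similar management tasks keep working while the guest is tainted.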

On 03/16/2017 06:20 PM, Daniel P. Berrange wrote:
We really don't want to propagate the custom passthrough into further areas of the XML. It is intentionally limited, because it is not something we want people/apps to use for anything other than a short-term hack. You should be able to use QEMU's '-set' argument to set fields against existing QEMU args, without having to throw away the entire <disk> config built by libvirt.
Technically this solves the problem, but we still need a way to specify such options without hacks.
Den

On Thu 16 Mar 2017 04:08:50 PM CET, Daniel P. Berrange wrote:
For a sustainable solution, IMHO this really needs to be fixed in QEMU so it has either a more appropriate default, or if a single default is not possible, have QEMU auto-tune its cache size dynamically to suit the characteristics of the qcow2 image.
Related bug report (and discussion): https://bugzilla.redhat.com/show_bug.cgi?id=1377735

Berto

On 03/16/2017 06:29 PM, Alberto Garcia wrote:
Related bug report (and discussion):
https://bugzilla.redhat.com/show_bug.cgi?id=1377735
Very useful! Thank you.
I have a different approach for this case. Though I do not have numbers yet, with a big data block the price of a cache miss is too high. I'll try to come back with this later on.

Den

On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
For a sustainable solution, IMHO this really needs to be fixed in QEMU so it has either a more appropriate default, or if a single default is not possible, have QEMU auto-tune its cache size dynamically to suit the characteristics of the qcow2 image.
A tradeoff between memory usage and performance is policy, and setting policy is the management layer's job, not QEMU's. We can try to provide good defaults, but they are meant for manual users of qemu. libvirt is expected to configure everything exactly as it wants instead of relying on defaults.

Kevin

On 03/16/2017 06:35 PM, Kevin Wolf wrote:
Exactly. We can have VMs with cold data and a reduced memory footprint, and VMs with hot data needing maximum IO capacity. The requirements from management are completely different.
Den

On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
The question, though, is how an app is supposed to figure out what the optimal setting for the cache size is. It seems to require knowledge of the level of disk fragmentation and the guest I/O patterns, neither of which are things we can know upfront. Which means any attempt to set the cache size is little more than ill-informed guesswork.

Regards, Daniel
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 03/16/2017 06:52 PM, Daniel P. Berrange wrote:
The funny thing is that this information could come from the outside world, e.g. from the SLA, which depends on the amount of money the end user is paying to the hosting provider.
Den

Am 16.03.2017 um 16:52 hat Daniel P. Berrange geschrieben:
No, it requires knowledge of whether the user prefers to spend more memory for improved disk performance of this VM, at the cost of other VMs or applications running on the machine. And this is something that qemu can't really figure out any better than libvirt.

If you don't care about that at all, the optimal configuration in terms of performance is to give qemu a cache large enough that the metadata of the whole image fits in it. When setting cache-clean-interval, this could actually be reasonable even for large images, because the memory wouldn't be used forever but only as long as the guest is submitting the problematic I/O patterns; it still means that temporarily qemu could really use all of this memory.

Kevin

On Thu, Mar 16, 2017 at 05:26:26PM +0100, Kevin Wolf wrote:
Is there some easy algorithm that libvirt can use to determine the size of the L2 tables, and thus report what the maximum useful cache size would be?

Regards, Daniel

On Sat 18 Mar 2017 10:54:16 AM CET, Daniel P. Berrange wrote:
Yes, the disk size that can be covered with a certain L2 cache is:

    disk_size = l2_cache_size * cluster_size / 8

(all sizes in bytes). Note that increasing the L2 cache size also increases the refcount cache, which you might want to keep small. It's all explained in detail here:

https://github.com/qemu/qemu/blob/master/docs/qcow2-cache.txt
https://blogs.igalia.com/berto/2015/12/17/improving-disk-io-performance-in-q...

Berto
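Rearranging that formula gives the L2 cache needed to cover a whole image; a minimal sketch (the function name is mine, not QEMU's):

```python
def full_l2_cache_bytes(disk_size, cluster_size=64 * 1024):
    # Each 8-byte L2 entry maps one cluster, so covering the whole
    # image needs (disk_size / cluster_size) * 8 bytes of L2 cache.
    return disk_size * 8 // cluster_size

# A 1 TiB image with the default 64 KiB clusters needs a 128 MiB
# L2 cache; QEMU's traditional 1 MiB default covers only 8 GiB.
print(full_l2_cache_bytes(1 << 40) // (1024 * 1024))  # 128
```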

On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
In general this policy means that management software which wants to implement some differentiation between VMs, e.g. in disk tuning, is forced to use the qemu:commandline backdoor. That is a pity. Exactly like the case with additional logs.
OK. Thank you for the discussion. At least I have found a new way to perform some fine tuning. Den

On Thu, Mar 16, 2017 at 08:31:08PM +0300, Denis V. Lunev wrote:
Ignoring the question of generic option passthrough, I think we can model the cache settings in the libvirt XML explicitly. Other types of disk besides qcow2 can have a cache concept, so I think we could create something like this:

<driver name='qemu' type='qcow2' ....>
  <cache>
    <clean interval="2" unit="seconds"/>
    <bank name="l2" size="1024" unit="KiB"/>
    <bank name="refcount" size="1024" unit="KiB"/>
  </cache>
</driver>

The "bank" element would be permitted to be repeated multiple times if a particular disk driver had multiple caches it needed.

In the storage vol XML, we would want a way to report what the sizes of the L2 and refcount tables are when reporting qcow2 volumes, so apps know the maximum sensible size to use for the cache.

Regards, Daniel

For cache and anything which could be bound as a cache, this is not that difficult. But are you going to limit the possible bank names? Without a limit this would work exactly the same as what I have proposed. With a limit, i.e. an understanding of the allowed banks on a per-format basis, we would get stuck in a really LOT of details.

Den

On Mon, Mar 20, 2017 at 11:11:42AM +0300, Denis V. Lunev wrote:
It would certainly validate that the cache names match those supported by the image format, as well as validating that the values are in integer format. As mentioned, the storage volume XML would also report the caches supported by each storage volume in a storage pool, providing applications a way to learn what caches are available for the format. This is very different from just providing blind passthrough of any qcow2 option.

Regards, Daniel
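The per-format validation described above could look roughly like this; a hypothetical sketch, where the format-to-banks table and the function are illustrative, not actual libvirt code:

```python
# Hypothetical: which named cache banks each image format supports.
SUPPORTED_BANKS = {"qcow2": {"l2", "refcount"}}

def validate_cache_banks(image_format, banks):
    """banks: list of (name, size_in_bytes) tuples from a <cache> element."""
    allowed = SUPPORTED_BANKS.get(image_format, set())
    for name, size in banks:
        if name not in allowed:
            raise ValueError(
                f"cache bank '{name}' is not supported by format '{image_format}'")
        if not isinstance(size, int) or size <= 0:
            raise ValueError(
                f"cache bank '{name}' size must be a positive integer")

# Accepted: both banks are known qcow2 caches with integer sizes.
validate_cache_banks("qcow2", [("l2", 1024 * 1024), ("refcount", 256 * 1024)])
```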
participants (4):
- Alberto Garcia
- Daniel P. Berrange
- Denis V. Lunev
- Kevin Wolf