On 12/12/18 3:39 AM, Nikolay Shirokovskiy wrote:
On 11.12.2018 19:34, John Ferlan wrote:
>
>
> On 12/11/18 5:22 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 10.12.2018 19:56, John Ferlan wrote:
>>>
>>> On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
>>>> ---
>>>> docs/formatdomain.html.in | 8 ++++
>>>> docs/schemas/domaincommon.rng | 11 +++++
>>>> src/conf/domain_conf.c | 17 ++++++++
>>>> src/conf/domain_conf.h | 9 ++++
>>>> .../qemuxml2argvdata/disk-metadata_cache_size.xml | 42 +++++++++++++++++++
>>>> .../disk-metadata_cache_size.xml | 48 ++++++++++++++++++++++
>>>> tests/qemuxml2xmltest.c | 2 +
>>>> 7 files changed, 137 insertions(+)
>>>> create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>>>> create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>>>>
>>>
>>> <sigh> looks like a forgotten thread... It seems reviewer bandwidth is
>>> limited and won't get much better during the last couple of weeks of the
>>> month, when many if not most Red Hat employees are out due to the company
>>> shutdown period.
>>>
>>> You need to add a few words to the commit message to describe what's
>>> being changed. Oh and you'll also need a "last" patch to docs/news.xml
>>> to describe the new feature.
>>>
>>>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>>>> index 269741a..1d186ab 100644
>>>> --- a/docs/formatdomain.html.in
>>>> +++ b/docs/formatdomain.html.in
>>>> @@ -3556,6 +3556,14 @@
>>>> virt queues for virtio-blk. (<span class="since">Since 3.9.0</span>)
>>>> </li>
>>>> <li>
>>>> + The optional <code>metadata_cache_size</code> attribute specifies
>>>> + metadata cache size policy, possible values are "default" and "maximum".
>>>
>>> s/policy, possible/policy. Possible/
>>>
>>>> + "default" leaves setting cache size to hypervisor,
"maximum" makes
>>>
>>> s/"default"/Using "default"/
>>>
>>> s/to hypervisor,/to the hypervisor./
>>>
>>> s/"maximum"/Using "maximum"
>>>
>>>> + cache size large enough to keep all metadata, this will help if workload
>>>
>>>
>>> "Using maximum assigns the largest value possible for the cache size.
>>> This ensures the entire disk cache remains in memory for faster I/O at
>>> the expense of utilizing more memory."
>>>
>>> [Editorial comment: it's really not 100% clear what tradeoffs someone is
>>> making here. The phrase "large enough" just sticks out, but since you use
>>> INT64_MAX in patch 3, I suppose that *is* the maximum. Still, it's worth
>>> indicating in some way that this allows QEMU to grow the cache and keep
>>> everything in memory, but has the side effect that disks configured this
>>> way will cause guest memory requirements to grow, albeit limited by the
>>> QEMU algorithms. It seems from my read the max is 32MB, so perhaps not a
>>> huge deal, but could be useful to note. Whether QEMU shrinks the cache
>>> when not in use wasn't 100% clear to me.]
>>>
>>> It's too bad it's not possible to utilize some dynamic value via
>>> qcow2_update_options_prepare. If dynamic adjustment were possible, then
>>> saving the value in the XML wouldn't be necessary - we could just allow
>>> dynamic adjustment similar to what I did for a couple of IOThread
>>> params (see commit 11ceedcda and the patches before it).
>>>
>>>> + needs access to whole disk all the time. (<span class="since">Since
>>>> + 4.10.0, QEMU 3.1</span>)
>>>
>>> This will be at least 5.0.0 now.
>>>
>>>> + </li>
>>>> + <li>
>>>> For virtio disks,
>>>> <a href="#elementsVirtio">Virtio-specific options</a> can also be
>>>> set. (<span class="since">Since 3.5.0</span>)
>>>> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
>>>> index b9ac5df..3e406fc 100644
>>>> --- a/docs/schemas/domaincommon.rng
>>>> +++ b/docs/schemas/domaincommon.rng
>>>> @@ -1990,6 +1990,9 @@
>>>> <ref name="detect_zeroes"/>
>>>> </optional>
>>>> <optional>
>>>> + <ref name="metadata_cache_size"/>
>>>> + </optional>
>>>> + <optional>
>>>> <attribute name='queues'>
>>>> <ref name="positiveInteger"/>
>>>> </attribute>
>>>> @@ -2090,6 +2093,14 @@
>>>> </choice>
>>>> </attribute>
>>>> </define>
>>>> + <define name="metadata_cache_size">
>>>> + <attribute name='metadata_cache_size'>
>>>> + <choice>
>>>> + <value>default</value>
>>>> + <value>maximum</value>
>>>
>>> I didn't go back and read the previous reviews nor was I present for the
>>> KVM Forum discussion on this, but default really means "minimum" I think.
>>> Even that just doesn't feel "right". There are 3 QEMU knobs, but this
>>> changes 1 knob, allowing one direction. And yes, changing 3 knobs is also
>>> confusing. I "assume" that it's been determined that this one knob has the
>>> greatest effect on I/O performance...
>>>
>>> Reading https://github.com/qemu/qemu/blob/master/docs/qcow2-cache.txt
>>> some phrases stick out:
>>>
>>> 1. "setting the right cache sizes is not a straightforward
operation".
>>> 2. "In order to choose the cache sizes we need to know how they
relate
>>> to the amount of allocated space."
>>> 3. The limit of the l2 cache size is "32 MB by default on Linux
>>> platforms (enough for full coverage of 256 GB images, with the default
>>> cluster size)".
>>> 4. "QEMU will not use more memory than needed to hold all of the
>>> image's L2 tables, regardless of this max. value."
>>>
>>> So at least providing INT_MAX won't hurt, although given the math in
>>> qcow2_update_options_prepare:
>>>
>>> l2_cache_size /= l2_cache_entry_size;
>>> if (l2_cache_size < MIN_L2_CACHE_SIZE) {
>>> l2_cache_size = MIN_L2_CACHE_SIZE;
>>> }
>>> if (l2_cache_size > INT_MAX) {
>>>
>>> wouldn't that mean we could pass up to "l2_cache_size *
>>> l2_cache_entry_size"?
>>
>> You mean up to INT_MAX / l2_cache_entry_size? No, because there
>> is also a limitation that comes from read_cache_sizes, so the cache
>> will be limited to the maximum the disk size requires.
>>
>
> So many dependencies... So much math to look at.
>
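To make the math concrete: qemu's docs/qcow2-cache.txt gives the coverage
rule - each L2 table entry is 8 bytes and maps one cluster - so full
coverage needs disk_size * 8 / cluster_size bytes of L2 cache. A minimal
sketch of that arithmetic, assuming the default 64 KiB cluster size:

    /* Coverage rule from qemu's docs/qcow2-cache.txt: each L2 entry is
     * 8 bytes and maps one cluster, so covering the whole image needs
     * disk_size * 8 / cluster_size bytes of L2 cache. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t disk_size = 256ULL << 30;  /* 256 GiB image */
        uint64_t cluster_size = 64 << 10;   /* default 64 KiB clusters */
        uint64_t l2_cache = disk_size * 8 / cluster_size;

        /* Prints 32 MiB, matching point 3 quoted above. */
        printf("full-coverage L2 cache: %llu MiB\n",
               (unsigned long long)(l2_cache >> 20));
        return 0;
    }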
>>>
>>> Still, using minimum/maximum for what amounts to a boolean operation is,
>>> I think, unnecessary. Rather than a list of values, wouldn't having
>>> something like "max_metadata_cache='yes'" be just as effective?
>>
>> Well, introducing the metadata_cache_size enum is what was agreed on at KVM Forum...
>>
>
> Yet providing maximum runs into immediate issues because of bugs
> that are fixed in 3.1.
>
> Devil's advocate says - why not patch QEMU to add something that causes
> the allocation algorithm to use the maximum values for great I/O
> performance - that way mgmt apps don't have to figure out the math, and
> if the QEMU algorithm for this ever changes, guest XML settings won't
> have a negative impact. The mgmt app just has to signify its desire to
> use algorithms that "can" improve I/O performance significantly.
But this is exactly the case with the approach this patch presents. Only
the 'maximum' cache setting is possible, which corresponds to maximum IO.
None of the complexity of possible cache size values is revealed.
Understood. My QEMU point still stands... Since someone else is the
expert here, providing a couple of flags to alter the "default"
algorithm to better values based on knowledge and experience is, I
think, better than the mgmt tool having to provide values for any of
these knobs. I also understand the QEMU viewpoint - hey, here's a bunch
of knobs for you to play with - we don't want to make the decision about
which is the best option for your environment. Reminds me of old stereo
equipment with bass, treble, and balance knobs eventually being replaced
by electronics that let you say "I want those knobs automatically
adjusted for news, music, or movies."

Since we're not going to get that any time soon, then OK, have the mgmt
app be forced to take the giant hammer approach.
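For the archive: the giant hammer presumably lands on qcow2's run-time
cache options - something like the following hypothetical invocation,
using the l2-cache-size option documented in qcow2-cache.txt and the
INT64_MAX value patch 3 mentions (the exact option this series builds may
differ), with QEMU 3.1+ capping the cache at what the image actually
needs, per point 4 above:

    qemu-system-x86_64 ... \
      -drive file=/var/lib/libvirt/images/f14.img,format=qcow2,l2-cache-size=9223372036854775807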
John
Nikolay
>
>>>
>>> Alternatively, reading through the above document and thinking about how
>>> "useful" it could be to use a more specific value, why not go with
>>> "metadata_cache_size=N", where N is in MB and describes the size of the
>>> cache to be used. Being in MB fits the need to be a power of 2 and quite
>>> frankly keeps a reasonable range assigned to the value. I'd find it hard
>>> to fathom that a "large enough" disk would improve I/O performance that
>>> much with KB adjustments, but I haven't done any analysis of that theory
>>> either.
>>
>> There is a lot of discussion of this cache parameter on this list and in
>> bugzilla (sorry for not providing links) that boils down to: one does not
>> know how to set it properly, so having an option that is in MB is hard to
>> use. On
>
> I know there's lots of discussion on this - you did list a few series in
> your first cover. Using MB was merely a reaction to reading the document
> and seeing it says "The maximum L2 cache size is 32 MB by default on
> Linux platforms". So a value between 1 to 32 could be used if someone
> wants to go through the pain of figuring out the nuances of the
> algorithm and difference in performance in their environment for each
> step. Not providing a "set" number though satisfies the desire to avoid
> some number being baked into the guest XML permanently.
>
>> the other hand there are cases (synthetic tests, large databases) when the
>> default cache size is clearly not big enough, so in the course of discussion
>> it was decided to have, besides the default cache size, also a cache size
>> that covers the whole disk so IO performance will not degrade whatever the
>> guest does.
>
> I understand the need... Many previous series have shown that!
>
>> Thus this 'maximum' setting. 'default' is just an explicit default (the
>> implicit default is an omitted metadata_cache_size attribute, of course).
>
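To keep the thread self-contained, the usage under discussion is a single
attribute on the disk <driver> element, taken from this patch's test data
below:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
      <source file='/var/lib/libvirt/images/f14.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>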
> The chosen implementation is boolean - all or default. Was there
> discussion that also noted that perhaps we should add a 3rd or 4th enum
> or string to describe some condition between default and maximum
> ("quarter", "half", "threefourths")?
>
> In the long run it doesn't matter, but as the implementation is posted
> it is a boolean operation. I can live with "minimum" and "maximum", but
> feel it's important to note the obvious, especially since trying to
> describe anything in between could take the PHD (piled higher and deeper)
> of doc writing abilities.
>
>>
>> Also, having an enum instead of a bool is a bit more future proof )
>
> In my mind only if the choice is at some time in the future to add
> something between default and maximum. I honestly don't see that
> happening, but if the group decision at KVM Forum was to go with some
> sort of enum - I can live with that. Perhaps something was discussed
> there that just isn't obvious with my short term tunnel vision.
>
> John
>
>>
>> Nikolay
>>
>>>
>>> I think the above document also gives a "fair" example that could be
>>> "reworded" into the libvirt docs in order to help "size" things based on
>>> the default QEMU values we're not allowing to be changed, and then
>>> showing the math of what it means to have a 1MB cache, a 2MB cache, an
>>> 8MB, etc. vis-a-vis the size of the disk for which the cache is being
>>> created. For example, for a disk of up to 8G, the "default" cache would
>>> be XXX and setting a value of YYY would help I/O. Similarly for a 16GB,
>>> 64GB, etc. Essentially a bit of a guide - although that's starting to
>>> border on what should go into the wiki instead.
>>>
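As a starting point for such a guide, the full-coverage numbers fall
straight out of the qcow2-cache.txt formula (disk_size * 8 / cluster_size,
assuming the default 64 KiB cluster size):

    disk size    L2 cache for full coverage
    8 GiB        1 MiB
    64 GiB       8 MiB
    256 GiB      32 MiB
    1 TiB        128 MiB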
>>>> + </choice>
>>>> + </attribute>
>>>> + </define>
>>>> <define name="controller">
>>>> <element name="controller">
>>>> <optional>
>>>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>>>> index 237540b..b2185f8 100644
>>>> --- a/src/conf/domain_conf.c
>>>> +++ b/src/conf/domain_conf.c
>>>> @@ -885,6 +885,11 @@ VIR_ENUM_IMPL(virDomainDiskDetectZeroes, VIR_DOMAIN_DISK_DETECT_ZEROES_LAST,
>>>> "on",
>>>> "unmap")
>>>>
>>>> +VIR_ENUM_IMPL(virDomainDiskMetadataCacheSize,
>>>> + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST,
>>>> + "default",
>>>> + "maximum")
>>>> +
>>>> VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
>>>> "none",
>>>> "yes",
>>>> @@ -9375,6 +9380,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDefPtr def,
>>>> }
>>>> VIR_FREE(tmp);
>>>>
>>>> + if ((tmp = virXMLPropString(cur, "metadata_cache_size")) &&
>>>> + (def->metadata_cache_size = virDomainDiskMetadataCacheSizeTypeFromString(tmp)) < 0) {
>>>> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>>>> + _("unknown driver metadata_cache_size value '%s'"), tmp);
>>>> + goto cleanup;
>>>> + }
>>>> + VIR_FREE(tmp);
>>>> +
>>>
>>> This would just use the YesNo type logic instead or read in a numeric
>>> value. BTW: Not sure I'd bother with worrying about checking the QEMU
>>> maximum as that could change and then we're stuck with a lower maximum
>>> check. Just pass along the value to QEMU and let it fail.
>>>
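A minimal sketch of what the boolean alternative could look like in
virDomainDiskDefDriverParseXML, reusing libvirt's existing virTristateBool
helpers - the max_metadata_cache attribute name and def field are
hypothetical, not part of this patch:

    /* Hypothetical boolean variant of the parse hunk above; rejects
     * anything that is not a recognized yes/no value. */
    if ((tmp = virXMLPropString(cur, "max_metadata_cache"))) {
        int value;

        if ((value = virTristateBoolTypeFromString(tmp)) <= 0) {
            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                           _("unknown driver max_metadata_cache value '%s'"), tmp);
            goto cleanup;
        }
        def->max_metadata_cache = value;  /* int, enum virTristateBool */
    }
    VIR_FREE(tmp);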
>>>> ret = 0;
>>>>
>>>> cleanup:
>>>> @@ -23902,6 +23915,10 @@ virDomainDiskDefFormatDriver(virBufferPtr buf,
>>>> if (disk->queues)
>>>> virBufferAsprintf(&driverBuf, " queues='%u'", disk->queues);
>>>>
>>>> + if (disk->metadata_cache_size)
>>>> + virBufferAsprintf(&driverBuf, " metadata_cache_size='%s'",
>>>> + virDomainDiskMetadataCacheSizeTypeToString(disk->metadata_cache_size));
>>>> +
>>>
>>> My personal crusade... metadata_cache_size >
>>> VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT (yes, > 0).
>>>
>>> Of course this all depends on your final solution - a boolean would only
>>> be written if set and an int value only written if > 0 (since 0 would be
>>> the "optional" viewpoint).
>>>
>>>> virDomainVirtioOptionsFormat(&driverBuf, disk->virtio);
>>>>
>>>> ret = virXMLFormatElement(buf, "driver", &driverBuf, NULL);
>>>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>>>> index e30a4b2..b155058 100644
>>>> --- a/src/conf/domain_conf.h
>>>> +++ b/src/conf/domain_conf.h
>>>> @@ -568,6 +568,13 @@ typedef enum {
>>>> VIR_DOMAIN_DISK_DETECT_ZEROES_LAST
>>>> } virDomainDiskDetectZeroes;
>>>>
>>>> +typedef enum {
>>>> + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT = 0,
>>>> + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM,
>>>> +
>>>> + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST
>>>> +} virDomainDiskMetadataCacheSize;
>>>> +
>>>> typedef struct _virDomainBlockIoTuneInfo virDomainBlockIoTuneInfo;
>>>> struct _virDomainBlockIoTuneInfo {
>>>> unsigned long long total_bytes_sec;
>>>> @@ -672,6 +679,7 @@ struct _virDomainDiskDef {
>>>> int discard; /* enum virDomainDiskDiscard */
>>>> unsigned int iothread; /* unused = 0, > 0 specific thread # */
>>>> int detect_zeroes; /* enum virDomainDiskDetectZeroes */
>>>> + int metadata_cache_size; /* enum virDomainDiskMetadataCacheSize */
>>>> char *domain_name; /* backend domain name */
>>>> unsigned int queues;
>>>> virDomainVirtioOptionsPtr virtio;
>>>> @@ -3388,6 +3396,7 @@ VIR_ENUM_DECL(virDomainDeviceSGIO)
>>>> VIR_ENUM_DECL(virDomainDiskTray)
>>>> VIR_ENUM_DECL(virDomainDiskDiscard)
>>>> VIR_ENUM_DECL(virDomainDiskDetectZeroes)
>>>> +VIR_ENUM_DECL(virDomainDiskMetadataCacheSize)
>>>> VIR_ENUM_DECL(virDomainDiskMirrorState)
>>>> VIR_ENUM_DECL(virDomainController)
>>>> VIR_ENUM_DECL(virDomainControllerModelPCI)
>>>> diff --git a/tests/qemuxml2argvdata/disk-metadata_cache_size.xml b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>>>> new file mode 100644
>>>> index 0000000..8ac2599
>>>> --- /dev/null
>>>> +++ b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>>>> @@ -0,0 +1,42 @@
>>>> +<domain type='qemu'>
>>>> + <name>test</name>
>>>> + <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
>>>> + <memory unit='KiB'>1048576</memory>
>>>> + <currentMemory unit='KiB'>1048576</currentMemory>
>>>> + <vcpu placement='static'>1</vcpu>
>>>> + <os>
>>>> + <type arch='x86_64' machine='pc-0.13'>hvm</type>
>>>
>>> Nothing more recent than pc-0.13 ;-) for this copy-pasta.
>>>
>>>> + <boot dev='cdrom'/>
>>>> + <boot dev='hd'/>
>>>> + <bootmenu enable='yes'/>
>>>> + </os>
>>>> + <clock offset='utc'/>
>>>> + <on_poweroff>destroy</on_poweroff>
>>>> + <on_reboot>restart</on_reboot>
>>>> + <on_crash>restart</on_crash>
>>>> + <devices>
>>>> + <emulator>/usr/bin/qemu-system-x86_64</emulator>
>>>> + <disk type='file' device='disk'>
>>>> + <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
>>>> + <source file='/var/lib/libvirt/images/f14.img'/>
>>>> + <target dev='vda' bus='virtio'/>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>>> + </disk>
>>>> + <disk type='file' device='cdrom'>
>>>> + <driver name='qemu' type='raw'/>
>>>> + <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
>>>> + <target dev='hdc' bus='ide'/>
>>>> + <readonly/>
>>>> + <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>>>> + </disk>
>>>
>>> This second disk isn't necessary
>>>
>>> John
>>>
>>>> + <controller type='usb' index='0'/>
>>>> + <controller type='virtio-serial' index='0'>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>>>> + </controller>
>>>> + <controller type='ide' index='0'/>
>>>> + <controller type='pci' index='0' model='pci-root'/>
>>>> + <input type='mouse' bus='ps2'/>
>>>> + <input type='keyboard' bus='ps2'/>
>>>> + <memballoon model='virtio'/>
>>>> + </devices>
>>>> +</domain>
>>>> diff --git a/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>>>> new file mode 100644
>>>> index 0000000..5fed22b
>>>> --- /dev/null
>>>> +++ b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
>>>> @@ -0,0 +1,48 @@
>>>> +<domain type='qemu'>
>>>> + <name>test</name>
>>>> + <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
>>>> + <memory unit='KiB'>1048576</memory>
>>>> + <currentMemory unit='KiB'>1048576</currentMemory>
>>>> + <vcpu placement='static'>1</vcpu>
>>>> + <os>
>>>> + <type arch='x86_64' machine='pc-0.13'>hvm</type>
>>>> + <boot dev='cdrom'/>
>>>> + <boot dev='hd'/>
>>>> + <bootmenu enable='yes'/>
>>>> + </os>
>>>> + <clock offset='utc'/>
>>>> + <on_poweroff>destroy</on_poweroff>
>>>> + <on_reboot>restart</on_reboot>
>>>> + <on_crash>restart</on_crash>
>>>> + <devices>
>>>> + <emulator>/usr/bin/qemu-system-x86_64</emulator>
>>>> + <disk type='file' device='disk'>
>>>> + <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
>>>> + <source file='/var/lib/libvirt/images/f14.img'/>
>>>> + <target dev='vda' bus='virtio'/>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>>> + </disk>
>>>> + <disk type='file' device='cdrom'>
>>>> + <driver name='qemu' type='raw'/>
>>>> + <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
>>>> + <target dev='hdc' bus='ide'/>
>>>> + <readonly/>
>>>> + <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>>>> + </disk>
>>>> + <controller type='usb' index='0'>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
>>>> + </controller>
>>>> + <controller type='virtio-serial' index='0'>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>>>> + </controller>
>>>> + <controller type='ide' index='0'>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
>>>> + </controller>
>>>> + <controller type='pci' index='0' model='pci-root'/>
>>>> + <input type='mouse' bus='ps2'/>
>>>> + <input type='keyboard' bus='ps2'/>
>>>> + <memballoon model='virtio'>
>>>> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>>> + </memballoon>
>>>> + </devices>
>>>> +</domain>
>>>> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
>>>> index 89640f6..c44e0fe 100644
>>>> --- a/tests/qemuxml2xmltest.c
>>>> +++ b/tests/qemuxml2xmltest.c
>>>> @@ -1235,6 +1235,8 @@ mymain(void)
>>>> DO_TEST("riscv64-virt",
>>>> QEMU_CAPS_DEVICE_VIRTIO_MMIO);
>>>>
>>>> + DO_TEST("disk-metadata_cache_size", NONE);
>>>> +
>>>> if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL)
>>>> virFileDeleteTree(fakerootdir);
>>>>
>>>>