On 07/18/2017 07:12 AM, Michal Privoznik wrote:
[Adding MST who wrote qemu patches]
On 07/18/2017 12:21 PM, Peter Krempa wrote:
> On Tue, Jul 18, 2017 at 09:01:31 +0200, Michal Privoznik wrote:
>> On 07/18/2017 08:23 AM, Peter Krempa wrote:
>>> On Mon, Jul 17, 2017 at 15:39:56 +0200, Michal Privoznik wrote:
>>>>
https://bugzilla.redhat.com/show_bug.cgi?id=1462653
>>> This bugzilla is not public.
>> Okay, I'll drop it from the commit message.
> Also add proper explanation what the benefits are, since upstream can't
> read the motivation from the bugzilla.
>
>>>> Just like I've added support for setting rx_queue_size (in
>>>> c56cdf259 and friends), qemu just gained support for setting tx
>>>> ring size.
> [...]
>
>>>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>>>> index c12efcf78..58662cf48 100644
>>>> --- a/docs/formatdomain.html.in
>>>> +++ b/docs/formatdomain.html.in
>>> [...]
>>>
>>>> @@ -5201,6 +5201,20 @@ qemu-kvm -net nic,model=? /dev/null
>>>> <b>In general you should leave this option alone, unless
you
>>>> are very certain you know what you are doing.</b>
>>>> </dd>
>>>> + <dt><code>tx_queue_size</code></dt>
>>>> + <dd>
>>>> + The optional <code>tx_queue_size</code> attribute
controls
>>>> + the size of virtio ring for each queue as described above.
>>>> + The default value is hypervisor dependent and may change
>>>> + across its releases. Moreover, some hypervisors may pose
>>>> + some restrictions on actual value. For instance, latest
>>>> + QEMU (as of 2017-07-13) requires value to be a power of two
> Refer to a proper version of qemu when this is supported, not a date.
>
>>>> + from [256, 1024] range.
>>>> + <span class="since">Since 3.6.0 (QEMU and KVM
only)</span><br/><br/>
>>> This is ridiculous. Since we can't figure out how to set this, how are
>>> users supposed to figure this out?
>> Well, you've cut off the line that reads;
>>
>> <b>In general you should leave this option alone, unless you
>> are very certain you know what you are doing.</b>
>>
>> So only users that know how virtio works under the hood are expected to
>> also know what rx/tx queue size is and how to set it. But frankly, I
> This statement is ridiculous by itself.
Why? There are more experienced users (for whom libvirt's just a stable
API) and less experienced ones (for whom libvirt's and tools on the top
of it are great for their automatic chose of parameters, e.g. gnome boxes).
Beyond that, that same statement (or something nearly identical) is
repeated in multiple places throughout the XML documentation. There are
at least two classes of these attributes that I can think of:
1) attributes that are automatically set to a sane value by libvirt when
not specified (and they usually *aren't* specified), and saved in the
config XML in order to assure they are set the same every time the
domain is started (to preserve guest ABI). These are intended to record
whatever was the default setting for the attribute at the time the
domain was created. For example, a pcie-root-port controller might have
<model name='ioh3420'/> set, if that was the only type of pcie-root-port
available at the time a domain was created; this comes in handy now that
there is a generic pcie-root-port (which also has <model
name='pcie-root-port'/>) - existing domains don't get their ABI screwed
up when they're migrated to a newer host.
2) knobs that have been added in over the years at the request/demand
from below (qemu) and above (ovirt / openstack), many of them obscure,
difficult to explain with any amount of useful detail, and almost never
worthy of being manually set (and if you "don't know what you're
doing",
you're just as likely to make things worse as to make them better).
tx_queue_size is one of the latter.
For either of these types of attributes, they need to be documented so
that someone encountering them (or actively searching them out) will at
least have a basic idea what the attribute is named / used for, but we
also need to warn the casual user to not mess with them. I don't see
anything ridiculous about that.
>> think users setting this are always gonna go with the highest value
>> avaliable (1024). Such detailed description is a copy of rx_virtio_queue
>> size description which is result of review.
>>
>>> Is it really needed? How should it be configured? Can't we or qemu pick
>>> a sane value?
>>>
>> No. Some users need bigger virtio rings otherwise they see a packet
>> drop. So this is a fine tuning that heavily depends on the use case.
>> Thus libvirt should not try to come up with some value.
> That's why I think it's wrong. What's the drawback of setting it to
> maximum? Which workloads will hit it? Why is the default not sufficient?
>
> And most notably, how do the users figure out that they need it?
I think it's a bit too much burden on libvirt to expect that much detail
to be included in formatdomain.html. Heck, I don't know if anyone even
*knows* that much detail right now.
I'll leave this for MST to answer.
> In this case, there are no anchor points that users can use to figure
> out if they need a setting like this. In addition putting in a warning
> that a setting should not be touched makes it rather useless.
I disagree. The setting is documented so that people can know that it
exists. Anyone *really* interested in performance can look up
information about it directly from the source and/or run their own
performance tests. My understanding is that's what's done with settings
like this, not just in libvirt and qemu, but in the Linux kernel too -
people like Red Hat's performance testing groups will run a bunch of
benchmark tests with different loads, comparing the test results with
different settings of the tuning attributes, and publish white papers
with recommendations for settings based on what loads a customer will be
running on their systems (over time, these performance tests may
discover that one particular setting is the best in *all* conditions,
and in that case either qemu or libvirt can begin setting that as the
default). But it's impractical to expect the person adding the knob to
perform such performance testing and document the proper setting for the
knob before it's even been pushed upstream. Instead, we add the knobs
and let the perf testing teams know about them, then they run their
barrage of tests and tell everyone what (if anything) to do with those
knobs.
Well, it can be viewed as counter part to rx_queue_size which we
already
have.
> Is there any writeup that we could point users to which would explain
> this feature?
>
I'm afraid there's nothing else than BZ I've linked and qemu patches.
I think the documentation Peter is looking for will come later, from
someone else.