Re: [libvirt] [PATCH v2] qemu: Add option to enable/disable IOEventFD feature

20 May 2011

On Thu, May 19, 2011 at 11:20 AM, Daniel Veillard <veillard@redhat.com> wrote:
...
On Thu, May 19, 2011 at 09:44:35AM +0100, Stefan Hajnoczi wrote:
...
On Thu, May 19, 2011 at 8:26 AM, Daniel Veillard <veillard@redhat.com> wrote:
...
On Wed, May 18, 2011 at 04:07:30PM +0800, Daniel Veillard wrote:
...
On Tue, May 17, 2011 at 03:56:11PM +0100, Daniel P. Berrange wrote:
...
On Tue, May 17, 2011 at 04:49:17PM +0200, Michal Privoznik wrote:
...
This feature allows QEMU to achieve higher throughput, but is available
only in recent versions. It is accessible via the ioeventfd attribute,
which accepts the values 'on' and 'off'. Only experienced users need to set
this, because QEMU defaults to 'on', meaning higher performance.
It translates into the virtio-{blk|net}-pci.ioeventfd option.
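The attribute-to-option translation described above can be sketched in C. This is a minimal sketch under stated assumptions, not the code in the patch: the names `ioeventfd_mode`, `ioeventfd_parse()` and `ioeventfd_append()` are hypothetical, and it assumes the value is appended to a `virtio-blk-pci` or `virtio-net-pci` device string as `,ioeventfd=on|off`. When the attribute is absent nothing is appended, so QEMU keeps its own default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tri-state mirroring the XML attribute: absent, 'on' or 'off'. */
typedef enum {
    IOEVENTFD_DEFAULT = 0, /* attribute absent: leave the choice to QEMU */
    IOEVENTFD_ON,
    IOEVENTFD_OFF
} ioeventfd_mode;

/* Hypothetical parser: returns 0 on success, -1 for anything but "on"/"off". */
static int ioeventfd_parse(const char *value, ioeventfd_mode *out)
{
    if (value == NULL) {
        *out = IOEVENTFD_DEFAULT;
        return 0;
    }
    if (strcmp(value, "on") == 0) {
        *out = IOEVENTFD_ON;
        return 0;
    }
    if (strcmp(value, "off") == 0) {
        *out = IOEVENTFD_OFF;
        return 0;
    }
    return -1;
}

/* Hypothetical formatter: writes "<device>[,ioeventfd=on|off]" into buf.
 * Returns 0 on success, -1 if the result would not fit. */
static int ioeventfd_append(char *buf, size_t len, const char *device,
                            ioeventfd_mode mode)
{
    int n;

    assert(buf != NULL && device != NULL);
    if (mode == IOEVENTFD_DEFAULT)
        n = snprintf(buf, len, "%s", device);
    else
        n = snprintf(buf, len, "%s,ioeventfd=%s", device,
                     mode == IOEVENTFD_ON ? "on" : "off");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, parsing "off" and formatting it for "virtio-blk-pci" yields "virtio-blk-pci,ioeventfd=off", while an absent attribute leaves the device string unchanged.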
[...]
+          <li>
+           The optional <code>ioeventfd</code> attribute enables or disables
+           the IOEventFD feature for virtqueue notify. The value can be either
+           'on' or 'off'.
+            <span class="since">Since 0.9.2 (QEMU and KVM only)</span>
This is a QEMU-specific attribute name & description. IMHO we shouldn't
be exposing that directly. Who even knows what effect it actually has
on the guests...
Agreed. What are the semantics of this flag, besides allowing something
in qemu to be switched?
Just to clarify my answer a bit: the problem here is that the patch
does not explain what the ioeventfd qemu flag does in practice and how
it influences virtualization. To be able to provide a good API and
maintain it long term, we need to be able to explain the semantics of
the API (be it a function of the library or part of the XML being used);
only then can we guarantee that there is no misunderstanding about what
it does, and also reuse it in case the same functionality
is provided by another hypervisor.
So instead of explaining the option using QEMU terms, let's
explain what it does in general terms and use those general terms to
model the API,
I don't think there is a general API here, ioeventfd is specific to
QEMU's architecture.  It allows you to switch between two internal
threading models for handling I/O emulation.  It could change in the
future if QEMU's architecture changes.  This is not an end-user
feature, it's more an internal performance tunable.
Actually, reading about it at
  https://patchwork.kernel.org/patch/43390/
it seems it can be described as
  "domain I/O asynchronous handling".
That is a simplification, because it covers only part of the I/O
rather than all of it, but that in itself is sufficiently generic to be
potentially useful for something else.
I would just suggest renaming the attribute to "asyncio" with values
on or off, documenting the fact that it allows forcing some of the
asynchronous I/O handling for the device on or off, and that the default
is left to the discretion of the hypervisor.
In case we need to refine later, we can still provide a larger set of
accepted values for that attribute, assuming people really want
finer-grained tuning,
Inventing a different name makes life harder for everyone.  There is a
need for a generic API/notation that covers all virtualization
software but this is a hypervisor-specific performance tunable that
does not benefit from abstraction.

When I ask a user to try disabling ioeventfd I need to first search
through libvirt documentation and/or source code to reverse-engineer
this artificial mapping.  This creates an extra source of errors for
people who are trying to configure or troubleshoot their systems.  The
"I know what the hypervisor-specific setting is but have no idea how
to express it in libvirt domain XML" problem is really common and
creates a gap between the hypervisor and libvirt communities.

The next time an optimization is added to QEMU you'll have to pick a
new name; "asyncio" (already overloaded terminology today) won't be
available anymore.  We're going to end up with increasingly contrived
or off-base naming.

Regarding semantics:

Ioeventfd decouples vcpu execution from I/O emulation, allowing the VM
to execute guest code while a separate thread handles I/O.  This
results in reduced steal time and lowers spinlock contention inside
the guest.  Typically, guests that experience high system CPU
utilization during I/O will benefit from ioeventfd.  On an
overcommitted host, however, it could increase guest I/O latency.  The
ioeventfd option is currently only supported on virtio-blk (default:
on) and virtio-net (default: off) devices.
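The decoupling described above can be illustrated with Linux eventfd(2). In the real setup KVM signals the eventfd from the kernel (it is registered with the KVM_IOEVENTFD ioctl) when the guest writes to the virtqueue notify address, and a separate QEMU thread handles it. In this hypothetical sketch a userspace "vcpu" loop writes to the eventfd instead, and `run_ioeventfd_demo()` and `io_thread()` are illustrative names only. What it shows: the notifier just bumps a counter and keeps running, while the I/O side wakes up later and may see several kicks coalesced into one read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state shared with the I/O thread. */
struct io_thread_args {
    int kick_fd;        /* eventfd standing in for a virtqueue notifier */
    uint64_t expected;  /* stop after this many kicks have been handled */
    uint64_t handled;   /* kicks processed so far */
};

static void *io_thread(void *opaque)
{
    struct io_thread_args *a = opaque;

    while (a->handled < a->expected) {
        uint64_t n;
        /* read() blocks until the counter is non-zero, then returns its
         * value and resets it to 0, so several kicks can arrive as one. */
        if (read(a->kick_fd, &n, sizeof n) != (ssize_t)sizeof n)
            break;
        a->handled += n; /* a real device model would process requests here */
    }
    return NULL;
}

/* Returns the number of kicks the I/O thread saw, or -1 on setup failure. */
static long run_ioeventfd_demo(unsigned kicks)
{
    struct io_thread_args a = { .kick_fd = eventfd(0, 0),
                                .expected = kicks, .handled = 0 };
    pthread_t tid;

    if (a.kick_fd < 0)
        return -1;
    if (pthread_create(&tid, NULL, io_thread, &a) != 0) {
        close(a.kick_fd);
        return -1;
    }
    for (unsigned i = 0; i < kicks; i++) {
        uint64_t one = 1;
        /* The "vcpu" only adds 1 to the counter and goes back to running
         * guest code; it does not wait for the request to be emulated. */
        ssize_t w = write(a.kick_fd, &one, sizeof one);
        assert(w == (ssize_t)sizeof one);
        (void)w;
    }
    pthread_join(tid, NULL); /* handled is only read after the join */
    close(a.kick_fd);
    return (long)a.handled;
}
```

Because each read() returns and resets the whole counter, run_ioeventfd_demo(5) returns 5 no matter how many wakeups the I/O thread needed. On older glibc versions this needs to be linked with -pthread.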

Please call it ioeventfd.  Also, it can always be toggled using the
<qemu:commandline> tag if you don't want to expose it natively in
domain XML.

Stefan