On Thu, May 19, 2011 at 11:20 AM, Daniel Veillard <veillard(a)redhat.com> wrote:
> On Thu, May 19, 2011 at 09:44:35AM +0100, Stefan Hajnoczi wrote:
> > On Thu, May 19, 2011 at 8:26 AM, Daniel Veillard <veillard(a)redhat.com> wrote:
> > > On Wed, May 18, 2011 at 04:07:30PM +0800, Daniel Veillard wrote:
> > >> On Tue, May 17, 2011 at 03:56:11PM +0100, Daniel P. Berrange wrote:
> > >> > On Tue, May 17, 2011 at 04:49:17PM +0200, Michal Privoznik wrote:
> > >> > > This feature allows QEMU to achieve higher throughput, but is available
> > >> > > only in recent versions. It is accessible via the ioeventfd attribute,
> > >> > > which accepts the values 'on' and 'off'. Only experienced users need to
> > >> > > set this, because QEMU defaults to 'on', meaning higher performance.
> > >> > > It translates into the virtio-{blk|net}-pci.ioeventfd option.
> > >> [...]
> > >> > > + <li>
> > >> > > + The optional <code>ioeventfd</code> attribute enables or disables
> > >> > > + the IOEventFD feature for virtqueue notify. The value can be either
> > >> > > + 'on' or 'off'.
> > >> > > + <span class="since">Since 0.9.2 (QEMU and KVM only)</span>
> > >> >
> > >> > This is a qemu-specific attribute name & description. IMHO we shouldn't
> > >> > be exposing that directly. Who even knows what effect it actually has
> > >> > on the guests...
> > >>
> > >> Agreed. What are the semantics of this flag, besides switching
> > >> something in qemu?
> > >
> > > Just to clarify my answer a bit: the problem here is that the patch
> > > does not explain what the ioeventfd qemu flag does in practice and how
> > > it influences the virtualization. To be able to provide a good API and
> > > maintain it long term, we need to be able to explain the semantics of
> > > the API (be it a function of the library or part of the XML being used);
> > > only then can we guarantee that there is no misunderstanding about what
> > > it does, and also reuse it in case the same functionality is provided
> > > by another hypervisor.
> > > So instead of explaining the option using terms from QEMU, let's
> > > explain what it does in general terms and use those general terms to
> > > model the API.
> >
> > I don't think there is a general API here; ioeventfd is specific to
> > QEMU's architecture. It allows you to switch between two internal
> > threading models for handling I/O emulation. It could change in the
> > future if QEMU's architecture changes. This is not an end-user
> > feature; it's more of an internal performance tunable.
>
> Actually, reading about it at
> https://patchwork.kernel.org/patch/43390/
> it seems that it can be described as
> "domain I/O asynchronous handling".
> That's a shortcut, because it covers only part of the I/O rather than
> all of it, but that in itself is sufficiently generic to be potentially
> useful for something else.
>
> I would just suggest renaming the attribute "asyncio", with value
> on or off, documenting that it allows forcing some of the device's
> asynchronous I/O handling on or off, and that the default is
> left to the discretion of the hypervisor.
>
> In case we need to refine it later, we can still accept a larger
> set of values for that attribute, if people really want to make
> finer-grained tuning distinctions.
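Whatever the attribute ends up being called, the semantics discussed in this thread (force the behaviour on, force it off, or leave the attribute out and let the hypervisor pick its default) amount to a tri-state mapping onto the QEMU device property. A minimal Python sketch, assuming the attribute sits on a <driver> element (the quoted patch excerpt does not say which element carries it); virtio_device_arg is a hypothetical helper, not libvirt code, and the XML attribute name is a parameter so it covers either proposed spelling:

```python
import xml.etree.ElementTree as ET


def virtio_device_arg(model, driver_xml, attr="ioeventfd"):
    """Build a QEMU -device value such as 'virtio-blk-pci,ioeventfd=off'.

    Hypothetical helper: 'attr' is the domain-XML attribute name under
    discussion ('ioeventfd' in the patch, 'asyncio' in the rename
    proposal). The QEMU property it maps to is always 'ioeventfd'.
    """
    props = [model]
    value = ET.fromstring(driver_xml).get(attr)
    if value is not None:
        # Only an explicit on/off overrides QEMU; anything else is rejected.
        if value not in ("on", "off"):
            raise ValueError(f"{attr} must be 'on' or 'off', got {value!r}")
        props.append(f"ioeventfd={value}")
    # Attribute absent: emit nothing, so QEMU's per-device default applies
    # (on for virtio-blk, off for virtio-net, per the figures in this thread).
    return ",".join(props)


print(virtio_device_arg("virtio-blk-pci", "<driver name='qemu' ioeventfd='off'/>"))
# virtio-blk-pci,ioeventfd=off
print(virtio_device_arg("virtio-net-pci", "<driver name='qemu'/>"))
# virtio-net-pci
print(virtio_device_arg("virtio-blk-pci", "<driver asyncio='on'/>", attr="asyncio"))
# virtio-blk-pci,ioeventfd=on
```

Omitting the property, rather than writing the default out explicitly, is what keeps the choice with the hypervisor. A real -device argument would also carry drive/netdev and bus properties, which are left out here.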

Inventing a different name makes life harder for everyone. There is a
need for a generic API/notation that covers all virtualization
software, but this is a hypervisor-specific performance tunable that
does not benefit from abstraction.

When I ask a user to try disabling ioeventfd, I first need to search
through libvirt documentation and/or source code to reverse-engineer
this artificial mapping. This creates an extra source of errors for
people who are trying to configure or troubleshoot their systems. The
"I know what the hypervisor-specific setting is but have no idea how
to express it in libvirt domain XML" problem is really common and
creates a gap between the hypervisor and libvirt communities.

The next time an optimization is added to QEMU, you'll have to pick a
new name, since "asyncio" (already overloaded terminology today) will
no longer be available. We're going to end up with increasingly
contrived or off-base naming.

Regarding semantics:
Ioeventfd decouples vcpu execution from I/O emulation, allowing the VM
to execute guest code while a separate thread handles I/O. This
reduces steal time and lowers spinlock contention inside the guest.
Guests that experience high system CPU utilization during I/O
typically benefit from ioeventfd. On an overcommitted host, however,
it could increase guest I/O latency. The ioeventfd option is currently
supported only on virtio-blk (default: on) and virtio-net (default:
off) devices.

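To make the threading difference concrete, here is a toy Python model, not QEMU code: in the synchronous case the vcpu thread performs each emulated request itself before it can resume guest code, while in the ioeventfd case the notify only signals a separate I/O thread. queue.Queue stands in for the eventfd, time.sleep stands in for emulation work, and EMULATION_COST is an arbitrary made-up figure.

```python
import queue
import threading
import time

# Toy model, not QEMU code. queue.Queue stands in for the eventfd that
# KVM signals on a virtqueue notify; time.sleep stands in for emulation.
EMULATION_COST = 0.01  # made-up per-request emulation time, in seconds


def emulate_request(req_id, done):
    time.sleep(EMULATION_COST)  # pretend to perform the device I/O
    done.append(req_id)


def run_guest(n_requests, use_ioeventfd):
    """Return (seconds the vcpu thread spent in notify, completed ids)."""
    done = []
    blocked = 0.0
    if not use_ioeventfd:
        # Synchronous model: the vcpu thread emulates each request itself
        # and cannot return to guest code until the emulation finishes.
        for i in range(n_requests):
            start = time.perf_counter()
            emulate_request(i, done)
            blocked += time.perf_counter() - start
        return blocked, done

    kicks = queue.Queue()

    def iothread():
        # Separate I/O thread: drains notifications and does the work.
        while True:
            req = kicks.get()
            if req is None:  # shutdown sentinel
                return
            emulate_request(req, done)

    worker = threading.Thread(target=iothread)
    worker.start()
    for i in range(n_requests):
        start = time.perf_counter()
        kicks.put(i)  # notify only signals; the vcpu resumes at once
        blocked += time.perf_counter() - start
    kicks.put(None)
    worker.join()  # wait so every request is complete before returning
    return blocked, done


if __name__ == "__main__":
    for mode in (False, True):
        blocked, done = run_guest(20, use_ioeventfd=mode)
        print(f"ioeventfd={mode}: vcpu blocked {blocked:.4f}s, "
              f"{len(done)} requests completed")
```

The model deliberately ignores the overcommit caveat: when the host has no spare CPU for the I/O thread, each request also waits for that thread to be scheduled, which is presumably the latency cost mentioned above.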
Please call it ioeventfd. Also, it can always be toggled using the
<qemu:commandline> tag if you don't want to expose it natively in
domain XML.

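For the <qemu:commandline> route, a minimal sketch that adds QEMU's -global option to a domain document using Python's ElementTree. The namespace URI is libvirt's QEMU command-line namespace; the one-element domain skeleton and the add_qemu_global helper are placeholders for illustration only. Note that -global sets the default for every virtio-blk-pci device in the guest, not for a single disk.

```python
import xml.etree.ElementTree as ET

# libvirt's namespace for raw QEMU command-line passthrough.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
ET.register_namespace("qemu", QEMU_NS)


def add_qemu_global(domain, driver, prop, value):
    """Append '-global driver.prop=value' under <qemu:commandline>.

    Hypothetical helper: reuses an existing <qemu:commandline> element
    if the domain already has one, otherwise creates it.
    """
    cmdline = domain.find(f"{{{QEMU_NS}}}commandline")
    if cmdline is None:
        cmdline = ET.SubElement(domain, f"{{{QEMU_NS}}}commandline")
    ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", value="-global")
    ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", value=f"{driver}.{prop}={value}")
    return domain


# Placeholder skeleton; a real domain also needs memory, disks, etc.
domain = ET.fromstring("<domain type='kvm'><name>demo</name></domain>")
add_qemu_global(domain, "virtio-blk-pci", "ioeventfd", "off")
print(ET.tostring(domain, encoding="unicode"))
```

ElementTree declares xmlns:qemu on the root <domain> element when serializing, which is where libvirt expects the namespace to be bound.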
Stefan