On Tue, Feb 21, 2017 at 01:26:25PM +0000, Daniel P. Berrange wrote:
On Tue, Feb 21, 2017 at 02:14:44PM +0100, Pavel Hrdina wrote:
> On Tue, Feb 21, 2017 at 12:55:51PM +0000, Daniel P. Berrange wrote:
> > On Tue, Feb 21, 2017 at 01:48:15PM +0100, Pavel Hrdina wrote:
> > > On Tue, Feb 21, 2017 at 12:26:02PM +0000, Daniel P. Berrange wrote:
> > > > On Tue, Feb 21, 2017 at 01:14:57PM +0100, Pavel Hrdina wrote:
> > > > > QEMU 2.9.0 will introduce a polling feature for AioContext that,
> > > > > instead of making blocking syscalls, polls for events without
> > > > > blocking. This means that polling is in most cases faster, but it
> > > > > also increases CPU utilization.
> > > > >
> > > > > To address this issue QEMU implements a self-tuning algorithm
> > > > > that modifies the current polling time to adapt to different
> > > > > workloads, and it can also fall back to blocking syscalls.
> > > > >
> > > > > For each IOThread this is all controlled by three parameters:
> > > > > poll-max-ns, poll-grow and poll-shrink. Setting poll-max-ns to 0
> > > > > disables polling, omitting it selects the default behavior, and
> > > > > any value greater than 0 enables polling. The poll-grow and
> > > > > poll-shrink parameters configure how the self-tuning algorithm
> > > > > adapts the current polling time. If they are omitted or set to 0,
> > > > > default values are used.
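For reference, these map to properties of QEMU's iothread object; a
minimal sketch of setting them on the command line (property names as in
QEMU 2.9; the values and the iothread0/drive0 names are only
illustrative):

    qemu-system-x86_64 ... \
        -object iothread,id=iothread0,poll-max-ns=32768,poll-grow=2,poll-shrink=0 \
        -device virtio-blk-pci,drive=drive0,iothread=iothread0

poll-max-ns is in nanoseconds, so 32768 here is roughly 32 microseconds.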
> > > >
> > > > With my app developer hat on I have to wonder how an app is supposed
> > > > to figure out what to set these parameters to? It has been difficult
> > > > enough figuring out the existing QEMU block tunables, but at least
> > > > most of those can be set dependent on the type of storage used on
> > > > the host side. Tunables whose use depends on the guest workload are
> > > > harder to use, since that largely involves predicting the unknown.
> > > > IOW, is there a compelling reason to add these low-level parameters
> > > > that are tightly coupled to the specific algorithm that QEMU happens
> > > > to use today?
> > >
> > > I agree that it's probably way too complicated for management
> > > applications, but there is a small issue with QEMU. Currently, if you
> > > don't specify anything, polling is enabled with some reasonable
> > > default value, and based on my experience with QEMU I'm not counting
> > > on them never changing the default behavior in the future. To
> > > explicitly enable polling you need to set poll-max-ns to a value
> > > greater than 0. We would have to check the QEMU source code and
> > > define the default value in our own code in order to let users
> > > explicitly enable polling.
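Rather than hardcoding a value copied from the QEMU sources, the
effective default could also be read back from a running domain over
QMP; a sketch, assuming an IOThread with id 'iothread0' ('mydomain' is a
placeholder):

    virsh qemu-monitor-command mydomain \
        '{"execute": "qom-get",
          "arguments": {"path": "/objects/iothread0",
                        "property": "poll-max-ns"}}'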
> >
> > The QEMU commit says polling is now enabled by default without needing
> > to set poll-max-ns AFAICT
> >
> > commit cdd7abfdba9287a289c404dfdcb02316f9ffee7d
> > Author: Stefan Hajnoczi <stefanha@redhat.com>
> > Date: Thu Jan 26 17:01:19 2017 +0000
> >
> > iothread: enable AioContext polling by default
> >
> > IOThread AioContexts are likely to consist only of event sources like
> > virtqueue ioeventfds and LinuxAIO completion eventfds that are pollable
> > from userspace (without system calls).
> >
> > We recently merged the AioContext polling feature but didn't enable it
> > by default yet. I have gone back over the performance data on the
> > mailing list and picked a default polling value that gave good results.
> >
> > Let's enable AioContext polling by default so users don't have another
> > switch they need to set manually. If performance regressions are found
> > we can still disable this for the QEMU 2.9 release.
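Given that default, disabling polling comes down to setting poll-max-ns
to 0, either up front on the command line or, assuming the property
remains writable at runtime, via qom-set; a sketch with illustrative
names:

    -object iothread,id=iothread0,poll-max-ns=0

    virsh qemu-monitor-command mydomain \
        '{"execute": "qom-set",
          "arguments": {"path": "/objects/iothread0",
                        "property": "poll-max-ns", "value": 0}}'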
> >
> > > > The QEMU commits say the tunables all default to sane parameters, so
> > > > I'm inclined to say we ignore them at the libvirt level entirely.
> > >
> > > Yes, it would be far better to have only <polling enable='yes|no'/> and
> > > let QEMU deal with the sane values for all parameters, but that would
> > > mean coming up with the sane values ourselves, or modifying QEMU to
> > > add another property that would simply control whether polling is
> > > enabled or not.
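A hypothetical sketch of what such an element could look like in the
domain XML, purely to illustrate the idea (neither the element name nor
its placement under <iothreadids> was ever settled):

    <iothreadids>
      <iothread id='1'>
        <polling enable='yes'/>
      </iothread>
    </iothreadids>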
> >
> > I'm saying don't even add that.
> >
> > Do exactly nothing and just rely on the QEMU defaults here. This is not
> > affecting guest ABI at all so it doesn't matter if QEMU changes its
> > defaults later. In fact if QEMU changes defaults based on newer performance
> > measurements, it is a good thing if libvirt hasn't hardcoded all its VM
> > configs to the old default.
>
> What if someone would like to disable it even though QEMU thinks that
> the performance is good? This patch series doesn't hardcode anything
> into the VM config. If you don't set the polling element at all, libvirt
> will follow the QEMU defaults, and only the live XML will contain the
> current state of polling, with the default values loaded from QEMU.
>
> This patch series adds the possibility to explicitly configure polling
> if someone wants to do that for some reason, but it also preserves the
> benefit when you just don't care about it and want to use the defaults.
>
> If you still think that we should not export this feature at all, well,
> we don't have to. The use case that you've described is still possible
> with this series; it only adds extra functionality on top of that.
I'm very wary of adding config parameters in libvirt just because they
exist in QEMU, particularly when the parameters are totally specific to
an algorithm that just happens to be the one implemented right now. We've
no idea if QEMU will stick with this algorithm or change it entirely, and
if the latter, then any config parameters will likely be meaningless for
any other algorithm. I can understand why QEMU would expose them on its
CLI, but I don't feel they are a good fit for exposing up the stack,
particularly given a lack of any guidance as to how people might consider
changing the values, other than random uninformed guesswork.
Yes, that's true, and I had the same uneasy feeling about the parameters
being specifically tied to the QEMU algorithm, but I thought it would be
nice to export them anyway. Let's ignore this series for now, and if
someone asks for a feature to disable polling we can revive it.
Pavel