On Wed, Sep 6, 2023 at 7:24 PM Daniel P. Berrangé <berrange(a)redhat.com>
wrote:
On Wed, Sep 06, 2023 at 06:54:29PM +0800, Yong Huang wrote:
> Thanks Daniel for the comments !
>
> On Wed, Sep 6, 2023 at 4:48 PM Daniel P. Berrangé <berrange(a)redhat.com>
> wrote:
>
> > On Tue, Aug 01, 2023 at 05:31:12PM +0800, ~hyman wrote:
> > > From: Hyman Huang(黄勇) <yong.huang(a)smartx.com>
> > >
> > > The upper limit (megabyte/s) of the dirty page rate configured
> > > by the user can be tracked by the XML. To allow this, add the
> > > following XML:
> > >
> > > <domain>
> > > ...
> > > <vcpu current='2'>3</vcpu>
> > > <vcpus>
> > > <vcpu id='0' hotpluggable='no' dirty_limit='10' order='1'.../>
> > > <vcpu id='1' hotpluggable='yes' dirty_limit='10' order='2'.../>
> > > </vcpus>
> > > ...
> > >
> > > The "dirty_limit" attribute in "vcpu" sub-element
within "vcpus"
> > > element allows to set an upper limit for the individual vCPU. The
> > > value can be set dynamically by limit-dirty-page-rate API.
> > >
> > > Note that the dirty limit feature is based on the dirty-ring
> > > feature, so it requires dirty-ring size configuration in XML.
> > >
> > > Signed-off-by: Hyman Huang(黄勇) <yong.huang(a)smartx.com>
> > > ---
> > > docs/formatdomain.rst | 7 ++++++-
> > > src/conf/domain_conf.c | 26 ++++++++++++++++++++++++
> > > src/conf/domain_conf.h | 8 ++++++++
> > > src/conf/domain_validate.c | 33 +++++++++++++++++++++++++++++++
> > > src/conf/schemas/domaincommon.rng | 5 +++++
> > > 5 files changed, 78 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > > index bc469e5f9f..337b7ec9cc 100644
> > > --- a/docs/formatdomain.rst
> > > +++ b/docs/formatdomain.rst
> >
> > > @@ -715,6 +715,11 @@ CPU Allocation
> > > be enabled and non-hotpluggable. On PPC64 along with it vCPUs that are in the
> > > same core need to be enabled as well. All non-hotpluggable CPUs present at
> > > boot need to be grouped after vCPU 0. :since:`Since 2.2.0 (QEMU only)`
> > > + ``dirty_limit`` :since:`Since 9.7.0 (QEMU and KVM only)`
> > > + The optional attribute ``dirty_limit`` allows to set an upper limit (MB/s)
> > > + of the dirty page rate for the vCPU. User can change the upper limit value
> > > + dynamically by using ``limit-dirty-page-rate`` API. Require ``dirty-ring``
> > > + size configured.
> >
> > What scenarios would you want to apply such a limit ?
> >
> To be completely honest, I haven't given the scenarios any thought. This
> method has been utilized up till now by migration to throttle the guest
> within QEMU.
Let's say the guest has 2 vCPUs and the current dirty rates are

  vcpu 0: 3 MB/s
  vcpu 1: 150 MB/s

If we put a 50 MB/s limit on vcpu 1, supposedly that would
reduce the rate sufficiently to let us migrate. But at any point
in time the guest OS scheduler could move the workload to
vcpu 0 instead. So we would actually want the dirty limit to
apply to both vCPUs, even though at this current snapshot
in time, vcpu 1 has the bigger dirty rate.
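
For reference, at the QEMU level this throttle is exposed via the
"set-vcpu-dirty-limit" QMP command, and if I read the QMP schema correctly,
omitting "cpu-index" applies the limit to every vCPU, which is the behaviour
wanted here (the numbers below are purely illustrative):

  # limit every vCPU to 50 MB/s of dirtied memory
  { "execute": "set-vcpu-dirty-limit",
    "arguments": { "dirty-rate": 50 } }

  # limit only vcpu 1 (the per-vCPU variant under discussion)
  { "execute": "set-vcpu-dirty-limit",
    "arguments": { "cpu-index": 1, "dirty-rate": 50 } }
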
> The most likely situation, in my opinion, is that some virtual machines on a
> host are using up all of the memory bandwidth, either intentionally or
> accidentally, while other virtual machines perform worse when writing to
> memory.
If we're protecting against either an accidental or malicious guest
workload, then again we need to apply the same limit across all vCPUs,
otherwise the guest can just shift its workload to the least limited
vCPU.

IOW, in both these scenarios I don't see a reason to allow the rate
limit to be set differently on each vCPU. A single value set across
all vCPUs is needed to offer the protection desired, otherwise the
guest trivially escapes the limit by moving workloads across vCPUs.

Ok, that works for me.
> > Is there guidance to admins for sensible values to use when
> > setting this limit ?
> >
>
> Currently, the "calc-dirty-rate" API can be used by upper-level apps to peek
> at the dirty page rate of the vCPU. If more details regarding setting values
> were required, we would add them to the comments.
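
(As a rough sketch of that workflow, assuming dirty-ring mode is enabled, an
upper-level app could sample the per-vCPU rates before picking a limit:

  { "execute": "calc-dirty-rate",
    "arguments": { "calc-time": 1, "mode": "dirty-ring" } }

  # once the measurement period has finished
  { "execute": "query-dirty-rate" }

With the dirty-ring mode the reply to query-dirty-rate should include a
per-vCPU "vcpu-dirty-rate" list, which gives a baseline for choosing the
limit value.)
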
>
> >
> > What is the impact on the guest if it hits the limit ?
> >
>
> Guest memory writes would take longer than usual to complete.
>
> > Is the single vCPU blocked for the remainder of some
> > timeslice, or are all vCPUs blocked ?
> >
> Single vCPU.
>
> >
> > Does it even make sense to control this with different
> > values per-VCPU as opposed to a single value for the VM
> > as a whole ?
> >
>
> Since this issue needs to be discussed, I'm inclined to offer a per-vCPU API
> and let the top APP make the final decision.
On the XML representation more generally, all of our tuning
related to CPUs is under the <cputune> element somewhere. We
also usually allow units to be set explicitly, e.g. we might
want:

  <cputune>
    <vcpudirtyrate limit='23' units='mb'/>
  </cputune>

If we follow our normal practice that would also imply allowing
a 'vcpus' bitmask, e.g.:

  <cputune>
    <vcpudirtyrate vcpus='0-3,^2' limit='23' units='mb'/>
  </cputune>

which would also imply allowing multiple

  <cputune>
    <vcpudirtyrate vcpus='0-3,^2' limit='23' units='mb'/>
    <vcpudirtyrate vcpus='2' limit='10' units='mb'/>
  </cputune>

Thanks for the comment, this API is more graceful and I'll try it
in the next version.
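
For completeness, whichever XML shape we settle on, the limit would still
depend on the dirty-ring feature being enabled; that prerequisite is already
expressible today along these lines (the size value below is only an
example):

  <features>
    <kvm>
      <dirty-ring state='on' size='4096'/>
    </kvm>
  </features>
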
Thanks,
Yong
--
Best regards