* Daniel P. Berrange <berrange(a)redhat.com> [2010-08-24 11:02:44]:
On Tue, Aug 24, 2010 at 01:05:26PM +0530, Balbir Singh wrote:
> * Nikunj A. Dadhania <nikunj(a)linux.vnet.ibm.com> [2010-08-24 11:53:27]:
>
> >
> > Subject: [RFC] Memory controller exploitation in libvirt
> >
> > Memory CGroup is a kernel feature that can be exploited effectively in the
> > current libvirt/qemu driver. Here is a shot at that.
> >
> > At present, QEMU uses the memory ballooning feature, where the memory can be
> > inflated/deflated as and when needed, cooperatively between the host and
> > the guest. There should be some mechanism where the host can have more
> > control over the guest's memory usage. Memory CGroup provides features such
> > as a hard limit and a soft limit for memory, and a hard limit on memory plus swap.
> >
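Because memory.memsw.limit_in_bytes caps memory plus swap, the cgroup v1 memory controller rejects (with EINVAL) any write that would leave memory.limit_in_bytes above it. A driver that changes both hard limits therefore has to choose the order of the two writes. A minimal sketch, assuming the driver already knows the current limits; the function and enum names are hypothetical, not libvirt code:

```c
#include <assert.h>

/* Hypothetical: which cgroup v1 limit file to write first. */
typedef enum {
    WRITE_MEM_FIRST,    /* memory.limit_in_bytes, then memory.memsw.limit_in_bytes */
    WRITE_MEMSW_FIRST,  /* memory.memsw.limit_in_bytes, then memory.limit_in_bytes */
    LIMITS_INVALID      /* requested memory+swap limit is below the memory limit */
} limit_write_order;

static limit_write_order
choose_write_order(unsigned long long cur_mem, unsigned long long cur_memsw,
                   unsigned long long new_mem, unsigned long long new_memsw)
{
    /* The current state already satisfies the kernel invariant. */
    assert(cur_mem <= cur_memsw);

    /* The final state must satisfy it too. */
    if (new_memsw < new_mem)
        return LIMITS_INVALID;

    /* If the new memory limit fits under the current memsw limit, write it
     * first; the memsw write that follows is valid since new_memsw >= new_mem. */
    if (new_mem <= cur_memsw)
        return WRITE_MEM_FIRST;

    /* Otherwise new_memsw >= new_mem > cur_memsw >= cur_mem, so raising
     * memsw first is valid, and the memory write that follows is too. */
    return WRITE_MEMSW_FIRST;
}
```

For example, raising the limits from 1024/2048 to 4096/8192 needs the memsw write first, while lowering them to 512/1024 can start with the memory limit.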
> > Design 1: Provide new API and XML changes for resource management
> > =================================================================
> >
> > Not all of the memory controller tunables are supported by the current
> > abstractions provided by the libvirt API. libvirt works on various OSes. This
> > new API will support GNU/Linux initially, and as and when other platforms
> > start supporting memory tunables, the interface could be enabled for
> > them. Add the following two function pointers to the virDriver interface:
> >
> > 1) domainSetMemoryParameters: takes one or more name-value
> > pairs. This makes the API extensible, and agnostic to the kind of
> > parameters supported by various hypervisors.
> > 2) domainGetMemoryParameters: gets the current memory parameters.
> >
> > Corresponding libvirt public API:
> > int virDomainSetMemoryParameters (virDomainPtr domain,
> >                                   virMemoryParameterPtr params,
> >                                   unsigned int nparams);
> > int virDomainGetMemoryParameters (virDomainPtr domain,
> >                                   virMemoryParameterPtr params,
> >                                   unsigned int nparams);
> >
> >
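One way to picture the name-value design, and the question below about setting several parameters in one call, is a driver that validates the whole array before changing anything. A minimal sketch; the struct layout, the type enum, the field length and `apply_memory_parameters` are all hypothetical, and only the parameter names come from this proposal:

```c
#include <assert.h>
#include <string.h>

#define MEMORY_FIELD_LENGTH 80  /* hypothetical fixed name length */

typedef enum {
    MEMORY_PARAM_ULLONG = 1  /* byte counts; other types could be added */
} memory_param_type;

/* Hypothetical layout for the virMemoryParameter type named in the RFC. */
typedef struct {
    char field[MEMORY_FIELD_LENGTH];  /* e.g. "MemoryHardLimits" */
    int type;                         /* a memory_param_type value */
    union {
        unsigned long long ul;
    } value;
} virMemoryParameter;

/* Driver-side state the parameters end up in (0 = unset). */
typedef struct {
    unsigned long long hard_limit;
    unsigned long long soft_limit;
    unsigned long long swap_hard_limit;
} memory_tunables;

/* Applies nparams name-value pairs all-or-nothing: any unknown name or
 * wrong type rejects the whole call and leaves *t unchanged. */
static int apply_memory_parameters(memory_tunables *t,
                                   const virMemoryParameter *params,
                                   unsigned int nparams)
{
    memory_tunables next;

    assert(t != NULL && (params != NULL || nparams == 0));
    next = *t;
    for (unsigned int i = 0; i < nparams; i++) {
        const virMemoryParameter *p = &params[i];

        if (p->type != MEMORY_PARAM_ULLONG)
            return -1;
        if (strcmp(p->field, "MemoryHardLimits") == 0)
            next.hard_limit = p->value.ul;
        else if (strcmp(p->field, "MemorySoftLimits") == 0)
            next.soft_limit = p->value.ul;
        else if (strcmp(p->field, "SwapHardLimits") == 0)
            next.swap_hard_limit = p->value.ul;
        else
            return -1;  /* not a tunable this driver supports */
    }
    *t = next;  /* commit only after every pair was accepted */
    return 0;
}
```

Applying the array all-or-nothing means a caller never ends up with, say, the hard limit changed but the swap limit rejected.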
>
> Does nparams imply setting several parameters together? Does bulk
> loading help? I would prefer splitting out the API if possible into:
>
> virCgroupSetMemory() - already present in src/util/cgroup.c
> virCgroupGetMemory() - already present in src/util/cgroup.c
> virCgroupSetMemorySoftLimit()
> virCgroupSetMemoryHardLimit()
> virCgroupSetMemorySwapHardLimit()
> virCgroupGetStats()
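Keeping the cgroup knowledge inside src/util/cgroup.c, as suggested here, mostly comes down to reading and writing decimal byte counts in the controller files of a guest's group. A minimal sketch, assuming a cgroup v1 memory hierarchy; `cgroup_set_memory_value` and `cgroup_get_memory_value` are hypothetical names, not the existing virCgroup* functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Builds "<cgroup_dir>/<file>"; fails if the path does not fit. */
static int cgroup_file_path(char *buf, size_t len,
                            const char *cgroup_dir, const char *file)
{
    int n = snprintf(buf, len, "%s/%s", cgroup_dir, file);
    return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Hypothetical setter: writes a decimal byte count to a memory controller
 * file such as "memory.soft_limit_in_bytes". Returns 0 or -errno. */
static int cgroup_set_memory_value(const char *cgroup_dir, const char *file,
                                   unsigned long long bytes)
{
    char path[4096];
    FILE *fp;
    int rc;

    assert(cgroup_dir != NULL && file != NULL);
    if ((rc = cgroup_file_path(path, sizeof(path), cgroup_dir, file)) < 0)
        return rc;
    if ((fp = fopen(path, "w")) == NULL)
        return -errno;
    if (fprintf(fp, "%llu\n", bytes) < 0) {
        fclose(fp);
        return -EIO;
    }
    /* stdio buffers the write, so a value the kernel rejects (e.g. EINVAL)
     * shows up as an fclose() failure. */
    if (fclose(fp) != 0)
        return -errno;
    return 0;
}

/* Hypothetical getter: reads the decimal value back from the same file. */
static int cgroup_get_memory_value(const char *cgroup_dir, const char *file,
                                   unsigned long long *bytes)
{
    char path[4096];
    FILE *fp;
    int rc;

    assert(cgroup_dir != NULL && file != NULL && bytes != NULL);
    if ((rc = cgroup_file_path(path, sizeof(path), cgroup_dir, file)) < 0)
        return rc;
    if ((fp = fopen(path, "r")) == NULL)
        return -errno;
    rc = (fscanf(fp, "%llu", bytes) == 1) ? 0 : -EINVAL;
    fclose(fp);
    return rc;
}
```

A soft-limit setter for the qemu driver would then be a thin wrapper that passes "memory.soft_limit_in_bytes" as the file name.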
Nope, we don't want cgroups exposed in the public API, since this
has to be applicable to the VMware and OpenVZ drivers too.

I am not talking about exposing these as public API; rather, they would
be part of src/util/cgroup.c and used by the qemu driver.
It is good to abstract out the OS-independent parts, but my concern
was double exposure: through APIs like driver->setMemory() that are
currently used, and through the newer API.
> > Parameter list supported:
> >
> > MemoryHardLimits (memory.limit_in_bytes) - Maximum memory
> > MemorySoftLimits (memory.soft_limit_in_bytes) - Desired memory
>
> Soft limits let you set a memory limit that takes effect only under
> memory contention.
>
> > MemoryMinimumGuarantee - Minimum memory required (without this amount of
> > memory, the VM should not be started)
> >
> > SwapHardLimits (memory.memsw.limit_in_bytes) - Maximum memory + swap
> > SwapSoftLimits (currently not supported by the kernel) - Desired swap space
> >
>
> We *don't* support SwapSoftLimits in the memory cgroup controller, and
> have no plans to support it in the future either at this point. The
> semantics are just too hard to get right at the moment.
That's not a huge problem. Since we have many hypervisors to support
in libvirt, I expect the set of tunables will expand over time, and
not every hypervisor driver in libvirt will support every tunable.
They'll just pick the tunables that apply to them. We can leave
SwapSoftLimits out of the public API until we find a hypervisor that
needs it.
>
> > The tunables memory.limit_in_bytes, memory.soft_limit_in_bytes and
> > memory.memsw.limit_in_bytes are provided by the memory controller in the
> > Linux kernel.
> >
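Only three of the proposed names correspond to kernel files, so a Linux cgroup-backed driver could keep the name-to-file translation private and reject everything else, including the minimum guarantee, which is a start-up check rather than a cgroup tunable. A sketch with a hypothetical table and function name; the file names follow the cgroup v1 memory controller:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table private to a Linux cgroup-backed driver. */
static const struct {
    const char *param;        /* name from the proposed libvirt API */
    const char *cgroup_file;  /* cgroup v1 memory controller file, or NULL */
} memory_param_map[] = {
    { "MemoryHardLimits", "memory.limit_in_bytes" },
    { "MemorySoftLimits", "memory.soft_limit_in_bytes" },
    { "SwapHardLimits",   "memory.memsw.limit_in_bytes" }, /* limits memory + swap */
    { "SwapSoftLimits",   NULL },  /* no kernel tunable exists */
};

/* Returns the cgroup file for a parameter, or NULL if the driver cannot
 * implement it. */
static const char *memory_param_to_cgroup_file(const char *param)
{
    assert(param != NULL);
    for (size_t i = 0; i < sizeof(memory_param_map) / sizeof(memory_param_map[0]); i++) {
        if (strcmp(memory_param_map[i].param, param) == 0)
            return memory_param_map[i].cgroup_file;
    }
    return NULL;  /* unknown name, e.g. MemoryMinimumGuarantee */
}
```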
> > I am not an expert here, so just listing what new elements need to be added
> > to the XML schema:
> >
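> > <define name="resource">
> >   <element name="memory">
> >     <element name="memoryHardLimit"><data type="unsignedLong"/></element>
> >     <element name="memorySoftLimit"><data type="unsignedLong"/></element>
> >     <element name="memoryMinGuarantee"><data type="unsignedLong"/></element>
> >     <element name="swapHardLimit"><data type="unsignedLong"/></element>
> >     <element name="swapSoftLimit"><data type="unsignedLong"/></element>
> >   </element>
> > </define>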
> >
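To see what a domain definition using these elements might contain, here is a sketch that formats the proposed `<memory>` block from a C struct. The struct, the function names, treating 0 as "unset" and the KiB unit are all assumptions; swapSoftLimit is left out because the kernel does not support it:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical in-memory form of the proposed <memory> element.
 * 0 means "not set"; units assumed to be KiB. */
typedef struct {
    unsigned long long hard_limit;
    unsigned long long soft_limit;
    unsigned long long min_guarantee;
    unsigned long long swap_hard_limit;
} resource_memory;

/* printf-style append at *off (requires *off < len); -1 if buf is too small. */
static int xml_append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *off, len - *off, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - *off)
        return -1;
    *off += (size_t)n;
    return 0;
}

/* Returns the length of the XML written to buf, or -1 on truncation. */
static int format_resource_memory(const resource_memory *r, char *buf, size_t len)
{
    size_t off = 0;

    assert(r != NULL && buf != NULL && len > 0);
    const struct {
        const char *name;
        unsigned long long value;
    } elems[] = {
        { "memoryHardLimit",    r->hard_limit },
        { "memorySoftLimit",    r->soft_limit },
        { "memoryMinGuarantee", r->min_guarantee },
        { "swapHardLimit",      r->swap_hard_limit },
    };

    if (xml_append(buf, len, &off, "<memory>\n") < 0)
        return -1;
    for (size_t i = 0; i < sizeof(elems) / sizeof(elems[0]); i++) {
        if (elems[i].value == 0)
            continue;  /* unset tunables are omitted */
        if (xml_append(buf, len, &off, "  <%s>%llu</%s>\n",
                       elems[i].name, elems[i].value, elems[i].name) < 0)
            return -1;
    }
    if (xml_append(buf, len, &off, "</memory>\n") < 0)
        return -1;
    return (int)off;
}
```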
>
> I'd prefer a syntax that integrates well with what we currently have:
>
> <cgroup>
>   <path>...</path>
>   <controller>
>     <name>...</name>
>     <soft_limit>...</soft_limit>
>     <hard_limit>...</hard_limit>
>   </controller>
>   ...
> </cgroup>
That is exposing far too much info about the cgroups implementation
details. The XML representation needs to be decoupled from the
implementation.

Don't we already expose a lot of information about QEMU in a guest's
configuration, for example vhost-net, command-line arguments or virtio?
I am not opposed to having a higher-level abstraction, but I am
concerned that some of the nitty-gritty details, like swappiness (yes,
that is a tunable) or the interpretation of stats, might vary widely
across operating systems. Hence, I felt it is better to expose them as
part of the qemu-cgroup-linux driver combo.
--
Three Cheers,
Balbir