
* Daniel P. Berrange <berrange@redhat.com> [2010-08-24 11:02:44]:
On Tue, Aug 24, 2010 at 01:05:26PM +0530, Balbir Singh wrote:
* Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com> [2010-08-24 11:53:27]:
Subject: [RFC] Memory controller exploitation in libvirt
Memory CGroup is a kernel feature that can be exploited effectively in the current libvirt/qemu driver. Here is a shot at that.
At present, QEMU uses the memory ballooning feature, where the memory can be inflated/deflated as and when needed, co-operatively between the host and the guest. There should be some mechanism where the host can have more control over the guest's memory usage. Memory CGroup provides features such as hard-limit and soft-limit for memory, and hard-limit for swap area.
Design 1: Provide new API and XML changes for resource management
=================================================================
The current abstractions provided by the libvirt API do not support all of the memory controller tunables. libvirt runs on various operating systems. This new API will support GNU/Linux initially; as and when other platforms start supporting memory tunables, the interface could be enabled for them. This adds the following two function pointers to the virDriver interface:
1) domainSetMemoryParameters: which would take one or more name-value pairs. This makes the API extensible, and agnostic to the kind of parameters supported by various Hypervisors.
2) domainGetMemoryParameters: For getting current memory parameters
Corresponding libvirt public API:

int virDomainSetMemoryParameters (virDomainPtr domain,
                                  virMemoryParameterPtr params,
                                  unsigned int nparams);
int virDomainGetMemoryParameters (virDomainPtr domain,
                                  virMemoryParameterPtr params,
                                  unsigned int nparams);
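To make the name-value idea concrete, here is a minimal, self-contained sketch of how a list of pairs might be applied in a single call. The types `mem_param` and `mem_settings` and the function `set_memory_parameters` are hypothetical stand-ins invented for this illustration; they are not the libvirt definitions, and the assumption that values are in bytes is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name-value pair; NOT the libvirt type. */
typedef struct {
    const char *name;          /* e.g. "MemoryHardLimits" */
    unsigned long long value;  /* assumed to be bytes for this sketch */
} mem_param;

/* Hypothetical per-domain settings a driver might keep. */
typedef struct {
    unsigned long long hard_limit;
    unsigned long long soft_limit;
    unsigned long long swap_hard_limit;
} mem_settings;

/* Apply nparams pairs in one call.
 * Returns 0 on success, or -1 if any name is unknown, in which
 * case *out is left unchanged (all-or-nothing). */
int set_memory_parameters(mem_settings *out,
                          const mem_param *params,
                          size_t nparams)
{
    assert(out != NULL);
    assert(params != NULL || nparams == 0);

    mem_settings tmp = *out;  /* work on a copy until all names are checked */

    for (size_t i = 0; i < nparams; i++) {
        if (strcmp(params[i].name, "MemoryHardLimits") == 0)
            tmp.hard_limit = params[i].value;
        else if (strcmp(params[i].name, "MemorySoftLimits") == 0)
            tmp.soft_limit = params[i].value;
        else if (strcmp(params[i].name, "SwapHardLimits") == 0)
            tmp.swap_hard_limit = params[i].value;
        else
            return -1;        /* tunable not supported by this sketch */
    }

    *out = tmp;
    return 0;
}
```

One possible benefit of passing several pairs together, as sketched here, is that a driver can validate the whole set before changing anything.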
Does nparams imply setting several parameters together? Does bulk loading help? I would prefer splitting out the API if possible into
virCgroupSetMemory() - already present in src/util/cgroup.c
virCgroupGetMemory() - already present in src/util/cgroup.c
virCgroupSetMemorySoftLimit()
virCgroupSetMemoryHardLimit()
virCgroupSetMemorySwapHardLimit()
virCgroupGetStats()
Nope, we don't want cgroups exposed in the public API, since this has to be applicable to the VMware and OpenVZ drivers too.
I am not talking about exposing these as public API, but about making them a part of src/util/cgroup.c, to be used by the qemu driver. It is good to abstract out the OS-independent parts, but my concern was double exposure through an API like driver->setMemory(), which is currently used, and the newer API.
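As an illustration of what such an internal helper could look like, here is a rough sketch that writes a value into a control file under a given cgroup directory. The names `cgroup_write_u64` and `set_memory_soft_limit` are hypothetical and this is not the code in src/util/cgroup.c; the only thing taken from the kernel is the cgroup v1 file name memory.soft_limit_in_bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write an unsigned value to <dir>/<file>.
 * Returns 0 on success, -1 on any error. */
int cgroup_write_u64(const char *dir, const char *file,
                     unsigned long long value)
{
    char path[4096];

    assert(dir != NULL && file != NULL);

    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                       /* path would be truncated */

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;                       /* no such group, or no permission */

    int ok = fprintf(fp, "%llu", value) > 0;
    if (fclose(fp) != 0)
        ok = 0;                          /* write errors can surface at close */

    return ok ? 0 : -1;
}

/* Hypothetical counterpart of a soft-limit setter; group_dir is the
 * directory of the guest's memory cgroup (an assumption of this sketch). */
int set_memory_soft_limit(const char *group_dir, unsigned long long bytes)
{
    return cgroup_write_u64(group_dir, "memory.soft_limit_in_bytes", bytes);
}
```

The hard-limit and swap-limit variants would differ only in the file name passed to the shared writer, which is why keeping them inside one utility file seems natural.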
Parameter list supported:
MemoryHardLimits (memory.limit_in_bytes) - Maximum memory
MemorySoftLimits (memory.soft_limit_in_bytes) - Desired memory
Soft limits allow you to set a memory limit that applies on contention.
MemoryMinimumGuarantee - Minimum memory required (without this amount of memory, VM should not be started)
SwapHardLimits (memory.memsw.limit_in_bytes) - Maximum swap
SwapSoftLimits (Currently not supported by kernel) - Desired swap space
We *don't* support SwapSoftLimits in the memory cgroup controller, and there are no plans to support it in the future either at this point. The semantics are just too hard to get right at the moment.
That's not a huge problem. Since we have many hypervisors to support in libvirt, I expect the set of tunables will expand over time, and not every hypervisor driver in libvirt will support every tunable. They'll just pick the tunables that apply to them. We can leave SwapSoftLimits out of the public API until we find a hypervisor that needs it.
The tunables memory.limit_in_bytes, memory.soft_limit_in_bytes and memory.memsw.limit_in_bytes are provided by the memory controller in the Linux kernel.
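For reference, the mapping between the proposed parameter names and the kernel files could be written down as a small lookup table. This is only a sketch: the table and the function `tunable_file` are hypothetical, the parameter names are the ones proposed in this thread, and the file names use the cgroup v1 memory controller spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from proposed parameter names to cgroup v1
 * memory controller files. Note that the memsw limit in the kernel
 * covers memory plus swap together. */
static const struct {
    const char *param;
    const char *file;
} tunable_map[] = {
    { "MemoryHardLimits", "memory.limit_in_bytes" },
    { "MemorySoftLimits", "memory.soft_limit_in_bytes" },
    { "SwapHardLimits",   "memory.memsw.limit_in_bytes" },
};

/* Return the control file backing a parameter, or NULL when the
 * parameter has no backing file in this table (e.g. SwapSoftLimits). */
const char *tunable_file(const char *param)
{
    assert(param != NULL);

    for (size_t i = 0; i < sizeof(tunable_map) / sizeof(tunable_map[0]); i++) {
        if (strcmp(tunable_map[i].param, param) == 0)
            return tunable_map[i].file;
    }
    return NULL;
}
```

A driver for another hypervisor would presumably carry its own table, or none at all, which keeps the parameter names independent of cgroups.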
I am not an expert here, so just listing what new elements need to be added to the XML schema:
<define name="resource">
  <element memory>
    <element memoryHardLimit/>
    <element memorySoftLimit/>
    <element memoryMinGuarantee/>
    <element swapHardLimit/>
    <element swapSoftLimit/>
  </element>
</define>
I'd prefer a syntax that integrates well with what we currently have
<cgroup>
  <path>...</path>
  <controller>
    <name>..</name>
    <soft limit>...</>
    <hard limit>...</>
  </controller>
  ...
</cgroup>
That is exposing far too much info about the cgroups implementation details. The XML representation needs to be decoupled from the implementation.
Don't we already expose a lot of information about qemu, for example about vhost-net, cmdline, or virtio, in the qemu configuration of a guest? I am not opposed to having a higher-level abstraction, but I am concerned that some of the nitty-gritty details, like swappiness (yes, that is a tunable) or the interpretation of stats, might vary widely across operating systems. Hence, I felt it is better to expose it as a part of the qemu-cgroup-linux driver combo.

--
Three Cheers,
Balbir