On Fri, May 30, 2014 at 03:42:01PM +0200, Michal Privoznik wrote:
On 30.05.2014 14:46, Daniel P. Berrange wrote:
>On Fri, May 30, 2014 at 01:41:06PM +0200, Michal Privoznik wrote:
>>On 30.05.2014 10:52, Daniel P. Berrange wrote:
>>>On Thu, May 29, 2014 at 10:32:35AM +0200, Michal Privoznik wrote:
>>>> /**
>>>>+ * virNodeHugeTLB:
>>>>+ * @conn: pointer to the hypervisor connection
>>>>+ * @type: type
>>>>+ * @params: pointer to memory parameter object
>>>>+ * (return value, allocated by the caller)
>>>>+ * @nparams: pointer to number of memory parameters; input and output
>>>>+ * @flags: extra flags; not used yet, so callers should always pass 0
>>>>+ *
>>>>+ * Get information about host's huge pages. On input, @nparams
>>>>+ * gives the size of the @params array; on output, @nparams gives
>>>>+ * how many slots were filled with parameter information, which
>>>>+ * might be less but will not exceed the input value.
>>>>+ *
>>>>+ * As a special case, calling with @params as NULL and @nparams
>>>>+ * as 0 on input will cause @nparams on output to contain the
>>>>+ * number of parameters supported by the hypervisor. The caller
>>>>+ * should then allocate @params array, i.e.
>>>>+ * (sizeof(@virTypedParameter) * @nparams) bytes and call the API
>>>>+ * again. See virDomainGetMemoryParameters() for an equivalent
>>>>+ * usage example.
>>>>+ *
>>>>+ * Returns 0 in case of success, and -1 in case of failure.
>>>>+ */
>>>>+int
>>>>+virNodeHugeTLB(virConnectPtr conn,
>>>>+ int type,
>>>>+ virTypedParameterPtr params,
>>>>+ int *nparams,
>>>>+ unsigned int flags)
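
For illustration, the two-call convention described in the doc comment above (query the count with NULL/0, allocate, call again) can be sketched as follows. This is only a sketch under stated assumptions: virNodeHugeTLB() is merely proposed in this patch, so the code uses hypothetical stand-ins (demo_param, demo_get_hugepage_info(), demo_fetch_all()) rather than any real libvirt call, and the parameter names and values are made-up sample data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a typed parameter; NOT libvirt's virTypedParameter. */
typedef struct {
    char field[32];
    unsigned long long value;
} demo_param;

/* Hypothetical stand-in for the proposed call.
 * params == NULL and *nparams == 0: only report how many slots are supported.
 * Otherwise: fill at most *nparams slots and store the number filled.
 * Returns 0 on success, -1 on failure. */
int demo_get_hugepage_info(demo_param *params, int *nparams)
{
    /* Made-up sample data, for illustration only. */
    static const demo_param supported[] = {
        { "pages_available", 8 },
        { "pages_free", 6 },
    };
    const int nsupported = (int)(sizeof(supported) / sizeof(supported[0]));

    if (!nparams || *nparams < 0)
        return -1;
    if (!params) {
        if (*nparams != 0)
            return -1;
        *nparams = nsupported;
        return 0;
    }
    if (*nparams > nsupported)
        *nparams = nsupported;
    memcpy(params, supported, (size_t)*nparams * sizeof(*params));
    return 0;
}

/* Caller side: first call sizes the array, second call fills it.
 * Returns a malloc'ed array (caller frees) or NULL on failure. */
demo_param *demo_fetch_all(int *count)
{
    int nparams = 0;
    demo_param *params;

    if (demo_get_hugepage_info(NULL, &nparams) < 0 || nparams == 0)
        return NULL;
    params = calloc((size_t)nparams, sizeof(*params));
    if (!params)
        return NULL;
    if (demo_get_hugepage_info(params, &nparams) < 0) {
        free(params);
        return NULL;
    }
    *count = nparams; /* may be smaller than the allocated size */
    return params;
}
```

A caller would invoke demo_fetch_all(&n), iterate over the n returned entries, and free() the array afterwards.
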
>>>
>>>What is the 'type' parameter doing ?
>>
>>Ah, it should be named numa_node rather than type. If type==-1, then overall
>>statistics are returned (number of {available,free} pages accumulated across
>>all NUMA nodes), if type >= 0, info on the specific NUMA node is returned.
>>
>>>
>>>I think in general this API needs a different design. I'd like to have
>>>an API that can request info for all page sizes on all NUMA nodes in a
>>>single call. I also think the static unchanging data should be part of
>>>the cpu + NUMA info in the capabilities XML. So the API only reports
>>>info which is changing - i.e. the available pages.
>>
>>The only problem is, the size of the huge page pool is not immutable. Even
>>now it's possible for 2M huge pages to be allocated dynamically:
>>
>># echo 8 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>
>>and it may be possible for 1GB too in future (what if the kernel learns how
>>to do it?). In general, the only thing that we can take as unalterable for
>>now is the default size of huge pages. And I wouldn't bet on that either.
>
>Yes, you can in theory change the number of huge pages at an arbitrary
>time, but realistically people mostly only do it immediately at boot.
>With 1 GB pages it will be impossible to do it at any time except immediately
>at boot. If you wait a little while, then memory will be too fragmented
>for you to be able to dynamically allocate more 1 GB pages. The same
>is true of 2MB pages to a lesser degree.

IMO no. Processes never ever see physical addresses (PAs). All they see are
virtual addresses (VAs). So there is a possibility for the kernel to rearrange
physical memory, without any effect on the processes, in order to gain bigger
segments of free memory.

Applications aren't typically the problem - it is the kernel's own data
structures that often cannot be moved at all, so over time they will
cause free physical RAM regions to be very fragmented. It is a real
problem with huge page usage which will be an order of magnitude worse
for 1GB size pages.
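
For illustration, the current state of each pool can be read back from the same sysfs tree as the nr_hugepages file mentioned earlier in the thread. The following is a minimal sketch, not part of the patch: the helper names (hugepage_dir_size_kb(), read_counter(), print_hugepage_pools()) are hypothetical, and it assumes the usual Linux layout of hugepages-<size>kB directories containing nr_hugepages and free_hugepages.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a directory name of the form "hugepages-<N>kB".
 * Returns N (page size in kB), or 0 if the name does not match. */
unsigned long hugepage_dir_size_kb(const char *name)
{
    static const char prefix[] = "hugepages-";
    const size_t plen = sizeof(prefix) - 1;
    char *end = NULL;
    unsigned long kb;

    if (strncmp(name, prefix, plen) != 0)
        return 0;
    if (name[plen] < '0' || name[plen] > '9')
        return 0;
    kb = strtoul(name + plen, &end, 10);
    if (strcmp(end, "kB") != 0)
        return 0;
    return kb;
}

/* Read a single unsigned counter from a file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_counter(const char *path, unsigned long long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (!fp)
        return -1;
    ok = (fscanf(fp, "%llu", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Print total and free page counts for every page size found under base.
 * Returns 0 on success, -1 if base cannot be opened. */
int print_hugepage_pools(const char *base)
{
    DIR *dir = opendir(base);
    struct dirent *ent;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        unsigned long kb = hugepage_dir_size_kb(ent->d_name);
        unsigned long long total, free_pages;
        char path[4096];

        if (kb == 0)
            continue; /* not a hugepages-<N>kB directory */
        snprintf(path, sizeof(path), "%s/%s/nr_hugepages", base, ent->d_name);
        if (read_counter(path, &total) < 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s/free_hugepages", base, ent->d_name);
        if (read_counter(path, &free_pages) < 0)
            continue;
        printf("%lu kB: total=%llu free=%llu\n", kb, total, free_pages);
    }
    closedir(dir);
    return 0;
}
```

Calling print_hugepage_pools("/sys/kernel/mm/hugepages") reports the host-wide pools; the per-node directories /sys/devices/system/node/node<N>/hugepages use the same layout. Reading nr_hugepages back after writing to it shows how many pages the kernel actually managed to allocate, which can be fewer than requested once memory is fragmented.
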
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|