Re: [libvirt] [PATCH 01/12] Introduce virNodeHugeTLB

30 May 2014

      On Thu, May 29, 2014 at 10:32:35AM +0200, Michal Privoznik wrote:
...
+/**
+ * virNodeHugeTLB:
+ * @conn: pointer to the hypervisor connection
+ * @type: type
+ * @params: pointer to memory parameter object
+ *          (return value, allocated by the caller)
+ * @nparams: pointer to number of memory parameters; input and output
+ * @flags: extra flags; not used yet, so callers should always pass 0
+ *
+ * Get information about host's huge pages. On input, @nparams
+ * gives the size of the @params array; on output, @nparams gives
+ * how many slots were filled with parameter information, which
+ * might be less but will not exceed the input value.
+ *
+ * As a special case, calling with @params as NULL and @nparams
+ * as 0 on input will cause @nparams on output to contain the
+ * number of parameters supported by the hypervisor. The caller
+ * should then allocate @params array, i.e.
+ * (sizeof(@virTypedParameter) * @nparams) bytes and call the API
+ * again.  See virDomainGetMemoryParameters() for an equivalent
+ * usage example.
+ *
+ * Returns 0 in case of success, and -1 in case of failure.
+ */
+int
+virNodeHugeTLB(virConnectPtr conn,
+               int type,
+               virTypedParameterPtr params,
+               int *nparams,
+               unsigned int flags)

What is the 'type' parameter doing?

I think in general this API needs a different design. I'd like to have
an API that can request info for all page sizes on all NUMA nodes in a
single call. I also think the static, unchanging data should be part of
the CPU + NUMA info in the capabilities XML, so that the API only reports
info which changes, i.e. the available pages.

In the <cpu> element, we should report which page sizes are available
for the CPU model.

In the <topology> element we should report, for each NUMA cell, the
number of pages of each size that are present in that cell. We shouldn't
treat huge pages separately from small pages in this respect.

So as an example:

 - CPU supports 3 page sizes: 4k, 2MB, 1GB
 - 2 NUMA nodes
 - 3 GB of memory per NUMA node
 - First node has
    - 262144  * 4k pages
    - 2 * 1 GB pages
 - Second node has
    - 1536 * 2 MB pages

This would look like this

  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='6' threads='2'/>
      <feature name='rdtscp'/>
      <feature name='pdpe1gb'/>
      <feature name='dca'/>
      <feature name='pdcm'/>
      <feature name='xtpr'/>
      <pages unit="KiB" size="4"/>
      <pages unit="MiB" size="2"/>
      <pages unit="GiB" size="1"/>
    </cpu>

    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>3145728</memory>
          <pages unit="KiB"  size="4">262144</pages>
          <pages unit="MiB"  size="2">0</pages>
          <pages unit="GiB"  size="1">2</pages>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='2'/>
            <cpu id='4'/>
            <cpu id='6'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>3145728</memory>
          <pages unit="KiB"  size="4">0</pages>
          <pages unit="MiB"  size="2">1536</pages>
          <pages unit="GiB"  size="1">0</pages>
          <cpus num='4'>
            <cpu id='1'/>
            <cpu id='3'/>
            <cpu id='5'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

So then an API call to request the available pages on all nodes
would look something like

   virNodeGetFreePages(virConnectPtr conn,
                       unsigned int *pages,
                       unsigned int npages,
                       unsigned int startcell,
                       unsigned int cellcount,
                       unsigned long long *counts);

In this API

 @pages - array whose elements are the page sizes to request info for
 @npages - number of elements in @pages
 @startcell - ID of first NUMA cell to request data for
 @cellcount - number of cells to request data for
 @counts - output array, @npages * @cellcount in length, grouped by cell

So if you want the free count for all page sizes on all NUMA nodes
you might use this as

   unsigned int pages[] = { 4096, 2097152, 1073741824 };
   unsigned int npages = ARRAY_CARDINALITY(pages);
   unsigned int startcell = 0;
   unsigned int cellcount = 2;
   unsigned int i, j;

   /* One slot per (cell, page size) pair */
   unsigned long long *counts = malloc(sizeof(*counts) * npages * cellcount);

   virNodeGetFreePages(conn, pages, npages,
                       startcell, cellcount, counts);

   for (i = 0 ; i < cellcount ; i++) {
       fprintf(stdout, "Cell %u\n", startcell + i);
       for (j = 0 ; j < npages ; j++) {
          fprintf(stdout, "  Page size=%u count=%llu bytes=%llu\n",
                  pages[j], counts[(i * npages) +  j],
                  pages[j] * counts[(i * npages) +  j]);
       }
       fprintf(stdout, "\n");
   }
   free(counts);

 Cell 0
    Page size=4096 count=300 bytes=1228800
    Page size=2097152 count=0 bytes=0
    Page size=1073741824 count=1 bytes=1073741824
 Cell 1
    Page size=4096 count=0 bytes=0
    Page size=2097152 count=20 bytes=41943040
    Page size=1073741824 count=0 bytes=0

Or you could request the free count for one specific node, or for one
specific page size.

This new API would basically obsolete the existing virNodeGetCellsFreeMemory
by providing something that gives you data on all page sizes at once, instead
of only data on the smallest page size.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

Daniel P. Berrange