[libvirt] [PATCH 0/4] Couple of test driver fixes and improvements

Michal Privoznik (4):
  testOpenDefault: Rename loop variable
  testNodeGetCellsFreeMemory: Fix off by one error
  test driver: Store memory per NUMA node
  test_driver: Implement virNodeAllocPages

 src/test/test_driver.c | 198 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 164 insertions(+), 34 deletions(-)

-- 
2.8.4

[libvirt] [PATCH 1/4] testOpenDefault: Rename loop variable

We have an inclination to call our loop variables i, j, k, not u.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index a382d89..758327c 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -1262,10 +1262,10 @@ testOpenFromFile(virConnectPtr conn, const char *file)
 static int
 testOpenDefault(virConnectPtr conn)
 {
-    int u;
     testDriverPtr privconn = NULL;
     xmlDocPtr doc = NULL;
     xmlXPathContextPtr ctxt = NULL;
+    size_t i;
 
     virMutexLock(&defaultLock);
     if (defaultConnections++) {
@@ -1283,20 +1283,20 @@ testOpenDefault(virConnectPtr conn)
 
     /* Numa setup */
     privconn->numCells = 2;
-    for (u = 0; u < privconn->numCells; ++u) {
-        privconn->cells[u].numCpus = 8;
-        privconn->cells[u].mem = (u + 1) * 2048 * 1024;
-        privconn->cells[u].freeMem = (u + 1) * 1024 * 1024;
+    for (i = 0; i < privconn->numCells; i++) {
+        privconn->cells[i].numCpus = 8;
+        privconn->cells[i].mem = (i + 1) * 2048 * 1024;
+        privconn->cells[i].freeMem = (i + 1) * 1024 * 1024;
     }
-    for (u = 0; u < 16; u++) {
+    for (i = 0; i < 16; i++) {
         virBitmapPtr siblings = virBitmapNew(16);
         if (!siblings)
             goto error;
-        ignore_value(virBitmapSetBit(siblings, u));
-        privconn->cells[u / 8].cpus[(u % 8)].id = u;
-        privconn->cells[u / 8].cpus[(u % 8)].socket_id = u / 8;
-        privconn->cells[u / 8].cpus[(u % 8)].core_id = u % 8;
-        privconn->cells[u / 8].cpus[(u % 8)].siblings = siblings;
+        ignore_value(virBitmapSetBit(siblings, i));
+        privconn->cells[i / 8].cpus[(i % 8)].id = i;
+        privconn->cells[i / 8].cpus[(i % 8)].socket_id = i / 8;
+        privconn->cells[i / 8].cpus[(i % 8)].core_id = i % 8;
+        privconn->cells[i / 8].cpus[(i % 8)].siblings = siblings;
     }
 
     if (!(privconn->caps = testBuildCapabilities(conn)))
-- 
2.8.4

On 10/05/2016 03:30 AM, Michal Privoznik wrote:
We have an inclination to call our loop variables i, j, k, not u.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)
ACK

John

[libvirt] [PATCH 2/4] testNodeGetCellsFreeMemory: Fix off by one error

Consider the following scenario:

virsh # freecell --all
    0:     2048 KiB
    1:     4096 KiB
--------------------
Total:     6144 KiB

virsh # freecell 0
    0:     2048 KiB

virsh # freecell 1
    1:     4096 KiB

And now before this change:

virsh # freecell 2

After this change:

virsh # freecell 2
error: invalid argument: Range exceeds available cells

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index 758327c..b760b4f 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -2710,7 +2710,7 @@ static int testNodeGetCellsFreeMemory(virConnectPtr conn,
     int ret = -1;
 
     testDriverLock(privconn);
-    if (startCell > privconn->numCells) {
+    if (startCell >= privconn->numCells) {
         virReportError(VIR_ERR_INVALID_ARG, "%s",
                        _("Range exceeds available cells"));
         goto cleanup;
-- 
2.8.4

On 10/05/2016 03:30 AM, Michal Privoznik wrote:
Consider the following scenario:
virsh # freecell --all
    0:     2048 KiB
    1:     4096 KiB
--------------------
Total:     6144 KiB

virsh # freecell 0
    0:     2048 KiB

virsh # freecell 1
    1:     4096 KiB
And now before this change:
virsh # freecell 2
After this change:
virsh # freecell 2
error: invalid argument: Range exceeds available cells

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
ACK

John

[libvirt] [PATCH 3/4] test driver: Store memory per NUMA node

In d18c7d7124 we have tried to implement virNodeGetFreePages API to
test driver. And for a very limited definition of success we have
succeeded. But, we can do better. Firstly, we can teach our internal
representation of a NUMA cell that there are different page sizes and
that they create a pool (from which apps alloc some). For the
demonstration purposes a half of the pool is taken by default,
therefore only the other half is free.

This representation described above also makes it perfect for
implementing virNodeAllocPages API (yet to come in a subsequent
commit).

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 100 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 76 insertions(+), 24 deletions(-)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index b760b4f..a3f74f8 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -72,9 +72,16 @@ VIR_LOG_INIT("test.test_driver");
 
 #define MAX_CPUS 128
 
+struct _testPage {
+    unsigned long pagesize;         /* in KiB */
+    unsigned long pages;            /* in total, pagesFree should never
+                                       be greater than this. */
+    unsigned long long pagesFree;   /* free pages */
+};
+
 struct _testCell {
-    unsigned long mem;
-    unsigned long freeMem;
+    size_t npages;
+    struct _testPage *pages;
     int numCpus;
     virCapsHostNUMACellCPU cpus[MAX_CPUS];
 };
@@ -319,16 +326,17 @@ testBuildCapabilities(virConnectPtr conn)
     if (virCapabilitiesAddHostFeature(caps, "nonpae") < 0)
         goto error;
 
-    if (VIR_ALLOC_N(caps->host.pagesSize, 2) < 0)
+    if (VIR_ALLOC_N(caps->host.pagesSize, privconn->cells[0].npages) < 0)
         goto error;
 
-    caps->host.pagesSize[caps->host.nPagesSize++] = 4;
-    caps->host.pagesSize[caps->host.nPagesSize++] = 2048;
+    for (i = 0; i < privconn->cells[i].npages; i++)
+        caps->host.pagesSize[caps->host.nPagesSize++] = privconn->cells[0].pages[i].pagesize;
 
     for (i = 0; i < privconn->numCells; i++) {
         virCapsHostNUMACellCPUPtr cpu_cells;
         virCapsHostNUMACellPageInfoPtr pages;
         size_t nPages;
+        unsigned long mem = 0;
 
         if (VIR_ALLOC_N(cpu_cells, privconn->cells[i].numCpus) < 0 ||
             VIR_ALLOC_N(pages, caps->host.nPagesSize) < 0) {
@@ -341,12 +349,14 @@ testBuildCapabilities(virConnectPtr conn)
         memcpy(cpu_cells, privconn->cells[i].cpus,
                sizeof(*cpu_cells) * privconn->cells[i].numCpus);
 
-        for (j = 0; j < nPages; j++)
-            pages[j].size = caps->host.pagesSize[j];
+        for (j = 0; j < nPages; j++) {
+            pages[j].size = privconn->cells[i].pages[j].pagesize;
+            pages[j].avail = privconn->cells[i].pages[j].pages;
 
-        pages[0].avail = privconn->cells[i].mem / pages[0].size;
+            mem += pages[j].size * pages[j].avail;
+        }
 
-        if (virCapabilitiesAddHostNUMACell(caps, i, privconn->cells[i].mem,
+        if (virCapabilitiesAddHostNUMACell(caps, i, mem,
                                            privconn->cells[i].numCpus, cpu_cells,
                                            0, NULL,
                                            nPages, pages) < 0)
             goto error;
@@ -1285,8 +1295,20 @@ testOpenDefault(virConnectPtr conn)
     privconn->numCells = 2;
     for (i = 0; i < privconn->numCells; i++) {
         privconn->cells[i].numCpus = 8;
-        privconn->cells[i].mem = (i + 1) * 2048 * 1024;
-        privconn->cells[i].freeMem = (i + 1) * 1024 * 1024;
+
+        if (VIR_ALLOC_N(privconn->cells[i].pages, 2) < 0)
+            goto error;
+        privconn->cells[i].npages = 2;
+
+        /* Let the node have some 4k pages */
+        privconn->cells[i].pages[0].pagesize = 4;
+        privconn->cells[i].pages[0].pages = (i + 1) * 2048 * 1024;
+        privconn->cells[i].pages[0].pagesFree = (i + 1) * 1024 * 1024;
+
+        /* And also some 2M pages */
+        privconn->cells[i].pages[1].pagesize = 2048;
+        privconn->cells[i].pages[1].pages = (i + 1) * 2048;
+        privconn->cells[i].pages[1].pagesFree = (i + 1) * 128;
     }
     for (i = 0; i < 16; i++) {
         virBitmapPtr siblings = virBitmapNew(16);
@@ -2719,7 +2741,9 @@ static int testNodeGetCellsFreeMemory(virConnectPtr conn,
     for (cell = startCell, i = 0;
          (cell < privconn->numCells && i < maxCells);
          ++cell, ++i) {
-        freemems[i] = privconn->cells[cell].mem;
+        struct _testCell *tmp = &privconn->cells[cell];
+
+        freemems[i] = tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
     }
     ret = i;
@@ -2784,35 +2808,63 @@ testNodeGetFreeMemory(virConnectPtr conn)
 
     testDriverLock(privconn);
 
-    for (i = 0; i < privconn->numCells; i++)
-        freeMem += privconn->cells[i].freeMem;
+    for (i = 0; i < privconn->numCells; i++) {
+        struct _testCell *tmp = &privconn->cells[i];
+
+        freeMem += tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
+    }
 
     testDriverUnlock(privconn);
     return freeMem;
 }
 
 static int
-testNodeGetFreePages(virConnectPtr conn ATTRIBUTE_UNUSED,
+testNodeGetFreePages(virConnectPtr conn,
                      unsigned int npages,
-                     unsigned int *pages ATTRIBUTE_UNUSED,
-                     int startCell ATTRIBUTE_UNUSED,
+                     unsigned int *pages,
+                     int startCell,
                      unsigned int cellCount,
                      unsigned long long *counts,
                      unsigned int flags)
 {
-    size_t i = 0, j = 0;
-    int x = 6;
+    testDriverPtr privconn = conn->privateData;
+    size_t i, ncounts = 0;
+    int lastCell;
+    int ret = -1;
 
     virCheckFlags(0, -1);
 
-    for (i = 0; i < cellCount; i++) {
-        for (j = 0; j < npages; j++) {
-            x = x * 2 + 7;
-            counts[(i * npages) + j] = x;
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
+
+    if (startCell < 0 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
+
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t j, k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                if (pages[k] != cell->pages[j].pagesize)
+                    continue;
+
+                counts[ncounts++] = cell->pages[j].pagesFree;
+            }
         }
     }
 
-    return 0;
+    ret = ncounts;
+ cleanup:
+    testDriverUnlock(privconn);
+    return ret;
}
 
 static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)
-- 
2.8.4

On 10/05/2016 03:30 AM, Michal Privoznik wrote:
In d18c7d7124 we have tried to implement virNodeGetFreePages API to test driver. And for a very limited definition of success we have succeeded. But, we can do better. Firstly, we can teach our internal representation of a NUMA cell that there are different page sizes and that they create a pool (from which apps alloc some). For the demonstration purposes a half of the pool is taken by default, therefore only the other half is free.
This representation described above also makes it perfect for implementing virNodeAllocPages API (yet to come in a subsequent commit).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 100 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 76 insertions(+), 24 deletions(-)
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index b760b4f..a3f74f8 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -72,9 +72,16 @@ VIR_LOG_INIT("test.test_driver");
#define MAX_CPUS 128
+struct _testPage {
+    unsigned long pagesize;         /* in KiB */
+    unsigned long pages;            /* in total, pagesFree should never
+                                       be greater than this. */
+    unsigned long long pagesFree;   /* free pages */
+};
+
 struct _testCell {
-    unsigned long mem;
-    unsigned long freeMem;
+    size_t npages;
+    struct _testPage *pages;
     int numCpus;
     virCapsHostNUMACellCPU cpus[MAX_CPUS];
 };
@@ -319,16 +326,17 @@ testBuildCapabilities(virConnectPtr conn)
     if (virCapabilitiesAddHostFeature(caps, "nonpae") < 0)
         goto error;
 
-    if (VIR_ALLOC_N(caps->host.pagesSize, 2) < 0)
+    if (VIR_ALLOC_N(caps->host.pagesSize, privconn->cells[0].npages) < 0)
         goto error;
I have very limited knowledge of NUMA/Cell, but I guess I don't understand why the host.pagesSize is only referencing the cells[0] values here and in the subsequent loop. Shouldn't this be checking the various cells[n] for the page sizes supported and then allocating pagesSize based upon the different sizes? Then when filling things in the nPagesSize[n] would be based on the cells page sizes found? It just doesn't look right with the usage of [0]. The config has 2 cells that each have 2 pages. The host.pagesSize would then be a list of the page sizes found, right? Future math could then find the number of pages and pagesFree for each specific size.
-    caps->host.pagesSize[caps->host.nPagesSize++] = 4;
-    caps->host.pagesSize[caps->host.nPagesSize++] = 2048;
+    for (i = 0; i < privconn->cells[i].npages; i++)
+        caps->host.pagesSize[caps->host.nPagesSize++] = privconn->cells[0].pages[i].pagesize;
 
     for (i = 0; i < privconn->numCells; i++) {
         virCapsHostNUMACellCPUPtr cpu_cells;
         virCapsHostNUMACellPageInfoPtr pages;
         size_t nPages;
+        unsigned long mem = 0;
 
         if (VIR_ALLOC_N(cpu_cells, privconn->cells[i].numCpus) < 0 ||
             VIR_ALLOC_N(pages, caps->host.nPagesSize) < 0) {
@@ -341,12 +349,14 @@ testBuildCapabilities(virConnectPtr conn)
         memcpy(cpu_cells, privconn->cells[i].cpus,
                sizeof(*cpu_cells) * privconn->cells[i].numCpus);
I would remove the local nPages; it's confusing... Whether the rest of the following hunk is right would seem to rely on whether my theory above is true. But essentially, filling in the pages[N] would rely on finding the cells with the matching page size from host.pagesSize[N]. IOW: I would think there needs to be an if statement ensuring that cells[i].pagesize == host.pagesSize[N]. Reading these infernal dual loops is always painful.
-        for (j = 0; j < nPages; j++)
-            pages[j].size = caps->host.pagesSize[j];
+        for (j = 0; j < nPages; j++) {
+            pages[j].size = privconn->cells[i].pages[j].pagesize;
+            pages[j].avail = privconn->cells[i].pages[j].pages;
 
-        pages[0].avail = privconn->cells[i].mem / pages[0].size;
+            mem += pages[j].size * pages[j].avail;
+        }
 
-        if (virCapabilitiesAddHostNUMACell(caps, i, privconn->cells[i].mem,
+        if (virCapabilitiesAddHostNUMACell(caps, i, mem,
                                            privconn->cells[i].numCpus, cpu_cells,
                                            0, NULL,
                                            nPages, pages) < 0)
             goto error;
@@ -1285,8 +1295,20 @@ testOpenDefault(virConnectPtr conn)
     privconn->numCells = 2;
     for (i = 0; i < privconn->numCells; i++) {
         privconn->cells[i].numCpus = 8;
-        privconn->cells[i].mem = (i + 1) * 2048 * 1024;
-        privconn->cells[i].freeMem = (i + 1) * 1024 * 1024;
+
+        if (VIR_ALLOC_N(privconn->cells[i].pages, 2) < 0)
+            goto error;
+        privconn->cells[i].npages = 2;
+
+        /* Let the node have some 4k pages */
+        privconn->cells[i].pages[0].pagesize = 4;
+        privconn->cells[i].pages[0].pages = (i + 1) * 2048 * 1024;
+        privconn->cells[i].pages[0].pagesFree = (i + 1) * 1024 * 1024;
Not that it probably matters since we're not doing allocations, but I assume the " * 1024" was cut-n-paste from the old .mem and .freeMem which were I believe byte based while your new structure is kb based...
+
+        /* And also some 2M pages */
+        privconn->cells[i].pages[1].pagesize = 2048;
+        privconn->cells[i].pages[1].pages = (i + 1) * 2048;
+        privconn->cells[i].pages[1].pagesFree = (i + 1) * 128;
     }
     for (i = 0; i < 16; i++) {
         virBitmapPtr siblings = virBitmapNew(16);
@@ -2719,7 +2741,9 @@ static int testNodeGetCellsFreeMemory(virConnectPtr conn,
     for (cell = startCell, i = 0;
          (cell < privconn->numCells && i < maxCells);
          ++cell, ++i) {
-        freemems[i] = privconn->cells[cell].mem;
+        struct _testCell *tmp = &privconn->cells[cell];
+
+        freemems[i] = tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
Why aren't the 2M pages[1] included?
     }
     ret = i;
 
@@ -2784,35 +2808,63 @@ testNodeGetFreeMemory(virConnectPtr conn)
 
     testDriverLock(privconn);
 
-    for (i = 0; i < privconn->numCells; i++)
-        freeMem += privconn->cells[i].freeMem;
+    for (i = 0; i < privconn->numCells; i++) {
+        struct _testCell *tmp = &privconn->cells[i];
+
+        freeMem += tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
Why aren't the 2M pages[1] included?
+    }
 
     testDriverUnlock(privconn);
     return freeMem;
 }
 
 static int
-testNodeGetFreePages(virConnectPtr conn ATTRIBUTE_UNUSED,
+testNodeGetFreePages(virConnectPtr conn,
                      unsigned int npages,
-                     unsigned int *pages ATTRIBUTE_UNUSED,
-                     int startCell ATTRIBUTE_UNUSED,
+                     unsigned int *pages,
+                     int startCell,
                      unsigned int cellCount,
                      unsigned long long *counts,
                      unsigned int flags)
 {
-    size_t i = 0, j = 0;
-    int x = 6;
+    testDriverPtr privconn = conn->privateData;
+    size_t i, ncounts = 0;
+    int lastCell;
+    int ret = -1;
 
     virCheckFlags(0, -1);
 
-    for (i = 0; i < cellCount; i++) {
-        for (j = 0; j < npages; j++) {
-            x = x * 2 + 7;
-            counts[(i * npages) + j] = x;
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
+
+    if (startCell < 0 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
+
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t j, k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                if (pages[k] != cell->pages[j].pagesize)
+                    continue;
This to a degree I think confirms what I wrote above - this code would seemingly be getting the data for a specific page size (assuming that pages is an array of page sizes).

John
+
+                counts[ncounts++] = cell->pages[j].pagesFree;
+            }
         }
     }
 
-    return 0;
+    ret = ncounts;
+ cleanup:
+    testDriverUnlock(privconn);
+    return ret;
 }
 
 static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)

On 13.10.2016 05:12, John Ferlan wrote:
On 10/05/2016 03:30 AM, Michal Privoznik wrote:
In d18c7d7124 we have tried to implement virNodeGetFreePages API to test driver. And for a very limited definition of success we have succeeded. But, we can do better. Firstly, we can teach our internal representation of a NUMA cell that there are different page sizes and that they create a pool (from which apps alloc some). For the demonstration purposes a half of the pool is taken by default, therefore only the other half is free.
This representation described above also makes it perfect for implementing virNodeAllocPages API (yet to come in a subsequent commit).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/test/test_driver.c | 100 +++++++++++++++++++++++++++++++++++++------------ 1 file changed, 76 insertions(+), 24 deletions(-)
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index b760b4f..a3f74f8 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -72,9 +72,16 @@ VIR_LOG_INIT("test.test_driver");
#define MAX_CPUS 128
+struct _testPage {
+    unsigned long pagesize;         /* in KiB */
+    unsigned long pages;            /* in total, pagesFree should never
+                                       be greater than this. */
+    unsigned long long pagesFree;   /* free pages */
+};
+
 struct _testCell {
-    unsigned long mem;
-    unsigned long freeMem;
+    size_t npages;
+    struct _testPage *pages;
     int numCpus;
     virCapsHostNUMACellCPU cpus[MAX_CPUS];
 };
@@ -319,16 +326,17 @@ testBuildCapabilities(virConnectPtr conn)
     if (virCapabilitiesAddHostFeature(caps, "nonpae") < 0)
         goto error;
 
-    if (VIR_ALLOC_N(caps->host.pagesSize, 2) < 0)
+    if (VIR_ALLOC_N(caps->host.pagesSize, privconn->cells[0].npages) < 0)
         goto error;
I have very limited knowledge of NUMA/Cell, but I guess I don't understand why the host.pagesSize is only referencing the cells[0] values here and in the subsequent loop.
Well, so far I'm assuming that all the NUMA nodes support all the sizes of huge pages. IOW all the NUMA nodes are the same from this specific POV. So if NUMA node #0 supports say 4K and 2M pages, the rest of the nodes do as well. Therefore, I don't have to make this more complex than it already is.
Shouldn't this be checking the various cells[n] for the page sizes supported and then allocating pagesSize based upon the different sizes?
Yes, if we were to have different sizes for different NUMA nodes. But we don't. And frankly, I've never ever seen a machine out there in the wild which does have separate page sizes on NUMA nodes. Maybe there is one. But even if there is one, question is whether we want the test driver to reflect that. And the next question is whether we want to do it in a separate patch (and merging this one meanwhile) or in this one.
Then when filling things in the nPagesSize[n] would be based on the cells page sizes found?
You mean pageSize[n]? Yes.
It just doesn't look right with the usage of [0]. The config has 2 cells that each have 2 pages. The host.pagesSize would then be a list of the page sizes found, right?
Yes. But then again - since all our virtual NUMA nodes for the test driver have the same page sizes, we can do this kind of shortcut.
Future math could then find the number of pages and pagesFree for each specific size.
-    caps->host.pagesSize[caps->host.nPagesSize++] = 4;
-    caps->host.pagesSize[caps->host.nPagesSize++] = 2048;
+    for (i = 0; i < privconn->cells[i].npages; i++)
+        caps->host.pagesSize[caps->host.nPagesSize++] = privconn->cells[0].pages[i].pagesize;
 
     for (i = 0; i < privconn->numCells; i++) {
         virCapsHostNUMACellCPUPtr cpu_cells;
         virCapsHostNUMACellPageInfoPtr pages;
         size_t nPages;
+        unsigned long mem = 0;
 
         if (VIR_ALLOC_N(cpu_cells, privconn->cells[i].numCpus) < 0 ||
             VIR_ALLOC_N(pages, caps->host.nPagesSize) < 0) {
@@ -341,12 +349,14 @@ testBuildCapabilities(virConnectPtr conn)
         memcpy(cpu_cells, privconn->cells[i].cpus,
                sizeof(*cpu_cells) * privconn->cells[i].numCpus);
I would remove the local nPages it's confusing...
Ah, I felt the opposite. But okay. Consider it gone.
Whether the rest of the following hunk would seem to rely on whether my theory above is true.
But essentially filling in the pages[N] would rely on finding the cells with the matching page size from host.pagesSize[N]. IOW: I would think there needs to be an if statement ensuring that cells[i].pagesize == host.pagesSize[N]. Reading this infernal dual loops is always painful.
Right, it is. This is where we have to do the extra step, because usually - in other drivers - we just ask the kernel (via sysfs), and the double loop is implemented in it then.
-        for (j = 0; j < nPages; j++)
-            pages[j].size = caps->host.pagesSize[j];
+        for (j = 0; j < nPages; j++) {
+            pages[j].size = privconn->cells[i].pages[j].pagesize;
+            pages[j].avail = privconn->cells[i].pages[j].pages;
 
-        pages[0].avail = privconn->cells[i].mem / pages[0].size;
+            mem += pages[j].size * pages[j].avail;
+        }
 
-        if (virCapabilitiesAddHostNUMACell(caps, i, privconn->cells[i].mem,
+        if (virCapabilitiesAddHostNUMACell(caps, i, mem,
                                            privconn->cells[i].numCpus, cpu_cells,
                                            0, NULL,
                                            nPages, pages) < 0)
             goto error;
@@ -1285,8 +1295,20 @@ testOpenDefault(virConnectPtr conn)
     privconn->numCells = 2;
     for (i = 0; i < privconn->numCells; i++) {
         privconn->cells[i].numCpus = 8;
-        privconn->cells[i].mem = (i + 1) * 2048 * 1024;
-        privconn->cells[i].freeMem = (i + 1) * 1024 * 1024;
+
+        if (VIR_ALLOC_N(privconn->cells[i].pages, 2) < 0)
+            goto error;
+        privconn->cells[i].npages = 2;
+
+        /* Let the node have some 4k pages */
+        privconn->cells[i].pages[0].pagesize = 4;
+        privconn->cells[i].pages[0].pages = (i + 1) * 2048 * 1024;
+        privconn->cells[i].pages[0].pagesFree = (i + 1) * 1024 * 1024;
Not that it probably matters since we're not doing allocations, but I assume the " * 1024" was cut-n-paste from the old .mem and .freeMem which were I believe byte based while your new structure is kb based...
+
+        /* And also some 2M pages */
+        privconn->cells[i].pages[1].pagesize = 2048;
+        privconn->cells[i].pages[1].pages = (i + 1) * 2048;
+        privconn->cells[i].pages[1].pagesFree = (i + 1) * 128;
     }
     for (i = 0; i < 16; i++) {
         virBitmapPtr siblings = virBitmapNew(16);
@@ -2719,7 +2741,9 @@ static int testNodeGetCellsFreeMemory(virConnectPtr conn,
     for (cell = startCell, i = 0;
          (cell < privconn->numCells && i < maxCells);
          ++cell, ++i) {
-        freemems[i] = privconn->cells[cell].mem;
+        struct _testCell *tmp = &privconn->cells[cell];
+
+        freemems[i] = tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
Why aren't the 2m pages[1] included?
Because freecell only contains 'free memory per each NUMA node'. If you run 'virsh -c qemu:///system freecell --all' you will not see 2M (or any other huge pages) either. Just the regular 4k pages.

This is basically a stupidity of the Linux kernel, in that it treats pages of different sizes differently. There are regular system pages (4k) and these show up in 'free -m', 'virsh freecell', 'top', ... You don't have to reserve a pool of them if you want to use them, or allocate them via a special syscall. Then there are other sizes (usually 2M or 1G) and oh boy, everything's different. They don't show up in any of the previous tools, and basically they are treated quite the opposite to what I described. This is a very stupid approach. That's why we have some additional code in libvirt that treats all of the pages equally. For instance, you can configure your domain to use "hugepages" of size 4K, or 'virsh freepages --all' shows ALL the page sizes.

Michal

On 10/12/2016 10:30 PM, Michal Privoznik wrote:
On 13.10.2016 05:12, John Ferlan wrote:
On 10/05/2016 03:30 AM, Michal Privoznik wrote:
In d18c7d7124 we have tried to implement virNodeGetFreePages API to test driver. And for a very limited definition of success we have succeeded. But, we can do better. Firstly, we can teach our internal representation of a NUMA cell that there are different page sizes and that they create a pool (from which apps alloc some). For the demonstration purposes a half of the pool is taken by default, therefore only the other half is free.
This representation described above also makes it perfect for implementing virNodeAllocPages API (yet to come in a subsequent commit).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/test/test_driver.c | 100 +++++++++++++++++++++++++++++++++++++------------ 1 file changed, 76 insertions(+), 24 deletions(-)
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index b760b4f..a3f74f8 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -72,9 +72,16 @@ VIR_LOG_INIT("test.test_driver");
#define MAX_CPUS 128
+struct _testPage {
+    unsigned long pagesize;         /* in KiB */
+    unsigned long pages;            /* in total, pagesFree should never
+                                       be greater than this. */
+    unsigned long long pagesFree;   /* free pages */
+};
+
 struct _testCell {
-    unsigned long mem;
-    unsigned long freeMem;
+    size_t npages;
+    struct _testPage *pages;
     int numCpus;
     virCapsHostNUMACellCPU cpus[MAX_CPUS];
 };
@@ -319,16 +326,17 @@ testBuildCapabilities(virConnectPtr conn)
     if (virCapabilitiesAddHostFeature(caps, "nonpae") < 0)
         goto error;
 
-    if (VIR_ALLOC_N(caps->host.pagesSize, 2) < 0)
+    if (VIR_ALLOC_N(caps->host.pagesSize, privconn->cells[0].npages) < 0)
         goto error;
I have very limited knowledge of NUMA/Cell, but I guess I don't understand why the host.pagesSize is only referencing the cells[0] values here and in the subsequent loop.
Well, so far I'm assuming that all the NUMA nodes support all the sizes of huge pages. IOW all the NUMA nodes are the same from this specific POV. So if NUMA node #0 supports say 4K and 2M pages, the rest of the nodes do as well. Therefore, I don't have to make this more complex than it already is.
Shouldn't this be checking the various cells[n] for the page sizes supported and then allocating pagesSize based upon the different sizes?
Yes, if we were to have different sizes for different NUMA nodes. But we don't. And frankly, I've never ever seen a machine out there in the wild which does have separate page sizes on NUMA nodes. Maybe there is one. But even if there is one, question is whether we want the test driver to reflect that. And the next question is whether we want to do it in a separate patch (and merging this one meanwhile) or in this one.
Guess I was thinking too hard ;-) in trying to understand the algorithm, but you're right - this is a test driver for a simple case and we're taking a shortcut... It's something you could just document - what you were thinking - although perhaps someone else with a deeper understanding of NUMA would say, well duh, why document that...
Then when filling things in the nPagesSize[n] would be based on the cells page sizes found?
You mean pageSize[n]? Yes.
It just doesn't look right with the usage of [0]. The config has 2 cells that each have 2 pages. The host.pagesSize would then be a list of the page sizes found, right?
Yes. But then again - since all our virtual NUMA nodes for the test driver have the same page sizes, we can do this kind of shortcut.
Future math could then find the number of pages and pagesFree for each specific size.
-    caps->host.pagesSize[caps->host.nPagesSize++] = 4;
-    caps->host.pagesSize[caps->host.nPagesSize++] = 2048;
+    for (i = 0; i < privconn->cells[i].npages; i++)
+        caps->host.pagesSize[caps->host.nPagesSize++] = privconn->cells[0].pages[i].pagesize;
 
     for (i = 0; i < privconn->numCells; i++) {
         virCapsHostNUMACellCPUPtr cpu_cells;
         virCapsHostNUMACellPageInfoPtr pages;
         size_t nPages;
+        unsigned long mem = 0;
 
         if (VIR_ALLOC_N(cpu_cells, privconn->cells[i].numCpus) < 0 ||
             VIR_ALLOC_N(pages, caps->host.nPagesSize) < 0) {
@@ -341,12 +349,14 @@ testBuildCapabilities(virConnectPtr conn)
         memcpy(cpu_cells, privconn->cells[i].cpus,
                sizeof(*cpu_cells) * privconn->cells[i].numCpus);
I would remove the local nPages it's confusing...
Ah, I felt the opposite. But okay. Consider it gone.
Whether the rest of the following hunk would seem to rely on whether my theory above is true.
But essentially filling in the pages[N] would rely on finding the cells with the matching page size from host.pagesSize[N]. IOW: I would think there needs to be an if statement ensuring that cells[i].pagesize == host.pagesSize[N]. Reading this infernal dual loops is always painful.
Right it is. This is where we have to do the extra step, because usually - in other drivers - we just ask kernel (via sysfs). And this double loop is implemented in it then.
-        for (j = 0; j < nPages; j++)
-            pages[j].size = caps->host.pagesSize[j];
+        for (j = 0; j < nPages; j++) {
+            pages[j].size = privconn->cells[i].pages[j].pagesize;
+            pages[j].avail = privconn->cells[i].pages[j].pages;
 
-        pages[0].avail = privconn->cells[i].mem / pages[0].size;
+            mem += pages[j].size * pages[j].avail;
+        }
 
-        if (virCapabilitiesAddHostNUMACell(caps, i, privconn->cells[i].mem,
+        if (virCapabilitiesAddHostNUMACell(caps, i, mem,
                                            privconn->cells[i].numCpus, cpu_cells,
                                            0, NULL,
                                            nPages, pages) < 0)
             goto error;
@@ -1285,8 +1295,20 @@ testOpenDefault(virConnectPtr conn)
     privconn->numCells = 2;
     for (i = 0; i < privconn->numCells; i++) {
         privconn->cells[i].numCpus = 8;
-        privconn->cells[i].mem = (i + 1) * 2048 * 1024;
-        privconn->cells[i].freeMem = (i + 1) * 1024 * 1024;
+
+        if (VIR_ALLOC_N(privconn->cells[i].pages, 2) < 0)
+            goto error;
+        privconn->cells[i].npages = 2;
+
+        /* Let the node have some 4k pages */
+        privconn->cells[i].pages[0].pagesize = 4;
+        privconn->cells[i].pages[0].pages = (i + 1) * 2048 * 1024;
+        privconn->cells[i].pages[0].pagesFree = (i + 1) * 1024 * 1024;
Not that it probably matters since we're not doing allocations, but I assume the " * 1024" was cut-n-paste from the old .mem and .freeMem which were I believe byte based while your new structure is kb based...
+
+        /* And also some 2M pages */
+        privconn->cells[i].pages[1].pagesize = 2048;
+        privconn->cells[i].pages[1].pages = (i + 1) * 2048;
+        privconn->cells[i].pages[1].pagesFree = (i + 1) * 128;
     }
     for (i = 0; i < 16; i++) {
         virBitmapPtr siblings = virBitmapNew(16);
@@ -2719,7 +2741,9 @@ static int testNodeGetCellsFreeMemory(virConnectPtr conn,
     for (cell = startCell, i = 0;
          (cell < privconn->numCells && i < maxCells);
          ++cell, ++i) {
-        freemems[i] = privconn->cells[cell].mem;
+        struct _testCell *tmp = &privconn->cells[cell];
+
+        freemems[i] = tmp->pages[0].pagesize * tmp->pages[0].pagesFree * 1024;
Why aren't the 2m pages[1] included?
Because freecell only contains 'free memory per each NUMA node'. If you run 'virsh -c qemu:///system freecell --all' you will not see 2M (or any other huge pages) either. Just the regular 4k pages. This is basically a stupidity of the Linux kernel, in that it treats pages of different sizes differently. There are regular system pages (4k) and these show up in 'free -m', 'virsh freecell', 'top', ... You don't have to reserve a pool of them if you want to use them, or allocate them via a special syscall. Then there are other sizes (usually 2M or 1G) and oh boy, everything's different. They don't show up in any of the previous tools, and basically they are treated quite the opposite to what I described. This is a very stupid approach. That's why we have some additional code in libvirt that treats all of the pages equally. For instance, you can configure your domain to use "hugepages" of size 4K, or 'virsh freepages --all' shows ALL the page sizes.
Ahhh... I see... It's all very confusing and unless you know the rules and history, it's easy to get lost. John

Now that our cells in test driver are huge pages aware, we can implement virNodeAllocPages. Basically there's just one catch. If users specify startCell == -1, they want the huge pages change to be stretched over all the nodes. Therefore we just recalculate the change they want to make to each node and continue.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index a3f74f8..d9e408e 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -2867,6 +2867,83 @@ testNodeGetFreePages(virConnectPtr conn,
     return ret;
 }

+static int
+testNodeAllocPages(virConnectPtr conn,
+                   unsigned int npages,
+                   unsigned int *pageSizes,
+                   unsigned long long *pageCounts,
+                   int startCell,
+                   unsigned int cellCount,
+                   unsigned int flags)
+{
+    testDriverPtr privconn = conn->privateData;
+    bool add = !(flags & VIR_NODE_ALLOC_PAGES_SET);
+    ssize_t i, ncounts = 0;
+    size_t j;
+    int lastCell;
+    int ret = -1;
+
+    virCheckFlags(VIR_NODE_ALLOC_PAGES_SET, -1);
+
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
+
+    if (startCell < -1 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
+
+    if (startCell == -1) {
+        /* Okay, hold on to your hats because this is gonna get wild.
+         * startCell == -1 means that user wants us to allocate
+         * pages over all NUMA nodes proportionally. Just
+         * recalculate the pageCounts and we should be good. */
+        for (j = 0; j < npages; j++)
+            pageCounts[j] /= privconn->numCells;
+        startCell = 0;
+        lastCell = privconn->numCells - 1;
+    }
+
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                unsigned long long pagesFree = cell->pages[j].pagesFree;
+
+                if (pageSizes[k] != cell->pages[j].pagesize)
+                    continue;
+
+                if (add)
+                    pagesFree += pageCounts[k];
+                else
+                    pagesFree = pageCounts[k];
+
+                if (pagesFree > cell->pages[j].pages) {
+                    virReportError(VIR_ERR_OPERATION_FAILED,
+                                   _("Unable to allocate %llu pages"),
+                                   pagesFree);
+                    goto cleanup;
+                }
+
+                cell->pages[j].pagesFree = pagesFree;
+                ncounts++;
+            }
+        }
+    }
+
+    ret = ncounts;
+ cleanup:
+    testDriverUnlock(privconn);
+    return ret;
+}
+
 static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)
 {
     testDriverPtr privconn = domain->conn->privateData;
@@ -6872,6 +6949,7 @@ static virHypervisorDriver testHypervisorDriver = {
     .nodeGetCPUStats = testNodeGetCPUStats, /* 2.3.0 */
     .nodeGetFreeMemory = testNodeGetFreeMemory, /* 2.3.0 */
     .nodeGetFreePages = testNodeGetFreePages, /* 2.3.0 */
+    .nodeAllocPages = testNodeAllocPages, /* 2.4.0 */
     .connectGetCapabilities = testConnectGetCapabilities, /* 0.2.1 */
     .connectGetSysinfo = testConnectGetSysinfo, /* 2.3.0 */
     .connectGetType = testConnectGetType, /* 2.3.0 */
--
2.8.4

On 10/05/2016 03:30 AM, Michal Privoznik wrote:
Now that our cells in test driver are huge pages aware, we can implement virNodeAllocPages. Basically there's just one catch. If users specify startCell == -1, they want the huge pages change to be stretched over all the nodes. Therefore we just recalculate the change they want to make to each node and continue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index a3f74f8..d9e408e 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -2867,6 +2867,83 @@ testNodeGetFreePages(virConnectPtr conn,
     return ret;
 }
+static int
+testNodeAllocPages(virConnectPtr conn,
+                   unsigned int npages,
+                   unsigned int *pageSizes,
+                   unsigned long long *pageCounts,
+                   int startCell,
+                   unsigned int cellCount,
+                   unsigned int flags)
+{
+    testDriverPtr privconn = conn->privateData;
+    bool add = !(flags & VIR_NODE_ALLOC_PAGES_SET);
+    ssize_t i, ncounts = 0;
+    size_t j;
+    int lastCell;
+    int ret = -1;
+
+    virCheckFlags(VIR_NODE_ALLOC_PAGES_SET, -1);
+
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
lastCell would be 1 then.
+
+    if (startCell < -1 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
if cellCount is 2, then lastCell = MIN(1, -1 + 2 - 1) ? Does the following hunk belong before the startCell < -1 if?
+
+    if (startCell == -1) {
+        /* Okay, hold on to your hats because this is gonna get wild.
+         * startCell == -1 means that user wants us to allocate
+         * pages over all NUMA nodes proportionally. Just
+         * recalculate the pageCounts and we should be good. */
Color me confounded.
+        for (j = 0; j < npages; j++)
+            pageCounts[j] /= privconn->numCells;
Why does this work? pageCounts[j] would essentially be divided by 2, but isn't that an input value? How does that ensure we're pulling "equally" from across all cells? That is, what's the difference between supplying 0 or -1 for startCell? I'm missing something...
+        startCell = 0;
+        lastCell = privconn->numCells - 1;
So this doesn't change from our default?
+    }
+
You could probably explain the following to me 5 different ways and each time I'd probably respond, huh?
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                unsigned long long pagesFree = cell->pages[j].pagesFree;
+
+                if (pageSizes[k] != cell->pages[j].pagesize)
+                    continue;
Going back to patch 3, this seems to match what I was thinking... The rest of this I guess I'm not sure what to say other than I'm lost. John
+
+                if (add)
+                    pagesFree += pageCounts[k];
+                else
+                    pagesFree = pageCounts[k];
+
+                if (pagesFree > cell->pages[j].pages) {
+                    virReportError(VIR_ERR_OPERATION_FAILED,
+                                   _("Unable to allocate %llu pages"),
+                                   pagesFree);
+                    goto cleanup;
+                }
+
+                cell->pages[j].pagesFree = pagesFree;
+                ncounts++;
+            }
+        }
+    }
+
+    ret = ncounts;
+ cleanup:
+    testDriverUnlock(privconn);
+    return ret;
+}
+
 static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)
 {
     testDriverPtr privconn = domain->conn->privateData;
@@ -6872,6 +6949,7 @@ static virHypervisorDriver testHypervisorDriver = {
     .nodeGetCPUStats = testNodeGetCPUStats, /* 2.3.0 */
     .nodeGetFreeMemory = testNodeGetFreeMemory, /* 2.3.0 */
     .nodeGetFreePages = testNodeGetFreePages, /* 2.3.0 */
+    .nodeAllocPages = testNodeAllocPages, /* 2.4.0 */
     .connectGetCapabilities = testConnectGetCapabilities, /* 0.2.1 */
     .connectGetSysinfo = testConnectGetSysinfo, /* 2.3.0 */
     .connectGetType = testConnectGetType, /* 2.3.0 */

On 13.10.2016 05:38, John Ferlan wrote:
On 10/05/2016 03:30 AM, Michal Privoznik wrote:
Now that our cells in test driver are huge pages aware, we can implement virNodeAllocPages. Basically there's just one catch. If users specify startCell == -1, they want the huge pages change to be stretched over all the nodes. Therefore we just recalculate the change they want to make to each node and continue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index a3f74f8..d9e408e 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -2867,6 +2867,83 @@ testNodeGetFreePages(virConnectPtr conn,
     return ret;
 }
+static int
+testNodeAllocPages(virConnectPtr conn,
+                   unsigned int npages,
+                   unsigned int *pageSizes,
+                   unsigned long long *pageCounts,
+                   int startCell,
+                   unsigned int cellCount,
+                   unsigned int flags)
+{
+    testDriverPtr privconn = conn->privateData;
+    bool add = !(flags & VIR_NODE_ALLOC_PAGES_SET);
+    ssize_t i, ncounts = 0;
+    size_t j;
+    int lastCell;
+    int ret = -1;
+
+    virCheckFlags(VIR_NODE_ALLOC_PAGES_SET, -1);
+
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
lastCell would be 1 then.
+
+    if (startCell < -1 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
if cellCount is 2, then lastCell = MIN(1, -1 + 2 - 1) ?
Ah. So I hadn't thought of the case where the user wants both behaviours: startCell = -1, cellCount = 2. He/she is basically telling us: in the first step I want you to allocate the pages through all NUMA nodes, and in the second step, I want you to allocate some additional pages on node #0. Damn, this is gonna be twice as complicated. Okay, let me rework this and post v2.
Does the following hunk belong before the startCell < -1 if?
+
+    if (startCell == -1) {
+        /* Okay, hold on to your hats because this is gonna get wild.
+         * startCell == -1 means that user wants us to allocate
+         * pages over all NUMA nodes proportionally. Just
+         * recalculate the pageCounts and we should be good. */
Color me confounded.
+        for (j = 0; j < npages; j++)
+            pageCounts[j] /= privconn->numCells;
Why does this work? pageCounts[j] would essentially be divided by 2, but isn't that an input value? How does that ensure we're pulling "equally" from across all cells? That is, what's the difference between supplying 0 or -1 for startCell? I'm missing something...
Exactly. I mean, if the user tells us the following: npages = 1, pageSizes = {2048}, pageCounts = {678}, startCell = -1, cellCount = 1, then they want to allocate 678 2M pages across the whole host. They don't care whether the pages are distributed equally throughout all NUMA nodes. For all they know, all 678 pages could be allocated on node #0. Or, you know, any other combination that adds up to 678. If, however, they care where the pages are allocated, they have to provide startCell != -1. So for instance, if the host has 2 NUMA nodes and they want to allocate 678 pages in total, distributed equally between the two nodes: npages = 1, pageSizes = {2048}, pageCounts = {339}, startCell = 0, cellCount = 2. Yup, it is a very complicated API, because it allows you to allocate multiple page sizes across multiple NUMA nodes. Anyway, because this is all emulated in our test driver (and we specifically document that we guarantee no equal distribution in the startCell = -1 case), we can distribute the allocation as we please. So I went with equal distribution.
+        startCell = 0;
+        lastCell = privconn->numCells - 1;
So this doesn't change from our default?
+    }
+
You could probably explain the following to me 5 different ways and each time I'd probably respond, huh?
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                unsigned long long pagesFree = cell->pages[j].pagesFree;
+
+                if (pageSizes[k] != cell->pages[j].pagesize)
+                    continue;
Going back to patch 3, this seems to match what I was thinking...
The rest of this I guess I'm not sure what to say other than I'm lost.
John
+
+                if (add)
+                    pagesFree += pageCounts[k];
+                else
+                    pagesFree = pageCounts[k];
+
+                if (pagesFree > cell->pages[j].pages) {
+                    virReportError(VIR_ERR_OPERATION_FAILED,
+                                   _("Unable to allocate %llu pages"),
+                                   pagesFree);
+                    goto cleanup;
+                }
+
+                cell->pages[j].pagesFree = pagesFree;
+                ncounts++;
Oh, I just figured it out. I need to modify the pool allocation size, not how many free pages there are. That is, I need to modify cell->pages[j].pages. And also, I should probably recalculate pagesFree then... Sigh. It is complicated and easy to get lost in. Will send v2. Michal

On 10/12/2016 10:57 PM, Michal Privoznik wrote:
On 13.10.2016 05:38, John Ferlan wrote:
On 10/05/2016 03:30 AM, Michal Privoznik wrote:
Now that our cells in test driver are huge pages aware, we can implement virNodeAllocPages. Basically there's just one catch. If users specify startCell == -1, they want the huge pages change to be stretched over all the nodes. Therefore we just recalculate the change they want to make to each node and continue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/test/test_driver.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index a3f74f8..d9e408e 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -2867,6 +2867,83 @@ testNodeGetFreePages(virConnectPtr conn,
     return ret;
 }
+static int
+testNodeAllocPages(virConnectPtr conn,
+                   unsigned int npages,
+                   unsigned int *pageSizes,
+                   unsigned long long *pageCounts,
+                   int startCell,
+                   unsigned int cellCount,
+                   unsigned int flags)
+{
+    testDriverPtr privconn = conn->privateData;
+    bool add = !(flags & VIR_NODE_ALLOC_PAGES_SET);
+    ssize_t i, ncounts = 0;
+    size_t j;
+    int lastCell;
+    int ret = -1;
+
+    virCheckFlags(VIR_NODE_ALLOC_PAGES_SET, -1);
+
+    testDriverLock(privconn);
+
+    lastCell = privconn->numCells - 1;
lastCell would be 1 then.
+
+    if (startCell < -1 || startCell > lastCell) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, lastCell);
+        goto cleanup;
+    }
+
+    lastCell = MIN(lastCell, startCell + (int) cellCount - 1);
if cellCount is 2, then lastCell = MIN(1, -1 + 2 - 1) ?
Ah. So I hadn't thought of the case where the user wants both behaviours: startCell = -1, cellCount = 2. He/she is basically telling us: in the first step I want you to allocate the pages through all NUMA nodes, and in the second step, I want you to allocate some additional pages on node #0. Damn, this is gonna be twice as complicated. Okay, let me rework this and post v2.
Well, it is a test of some basic world - I'm unclear on the rules of the game for the parameters; I'm reacting to what I see. It is complicated, as you continue to describe below... Glad I'm not the expert! John
Does the following hunk belong before the startCell < -1 if?
+
+    if (startCell == -1) {
+        /* Okay, hold on to your hats because this is gonna get wild.
+         * startCell == -1 means that user wants us to allocate
+         * pages over all NUMA nodes proportionally. Just
+         * recalculate the pageCounts and we should be good. */
Color me confounded.
+        for (j = 0; j < npages; j++)
+            pageCounts[j] /= privconn->numCells;
Why does this work? pageCounts[j] would essentially be divided by 2, but isn't that an input value? How does that ensure we're pulling "equally" from across all cells? That is, what's the difference between supplying 0 or -1 for startCell? I'm missing something...
Exactly. I mean, if the user tells us the following:
npages = 1, pageSizes = {2048}, pageCounts = {678}, startCell = -1, cellCount = 1
then they want to allocate 678 2M pages across the whole host. They don't care whether the pages are distributed equally throughout all NUMA nodes. For all they know, all 678 pages could be allocated on node #0. Or, you know, any other combination that adds up to 678. If, however, they care where the pages are allocated, they have to provide startCell != -1. So for instance, if the host has 2 NUMA nodes and they want to allocate 678 pages in total, distributed equally between the two nodes:
npages =1, pageSizes = {2048}, pageCounts = {339}, startCell = 0, cellCount = 2
Yup, it is a very complicated API, because it allows you to allocate multiple page sizes across multiple NUMA nodes.
Anyway, because this is all emulated in our test driver (and we specifically document that we guarantee no equal distribution for startCell = -1 case), we can distribute the allocation as we please. So I went with equal distribution.
+        startCell = 0;
+        lastCell = privconn->numCells - 1;
So this doesn't change from our default?
+    }
+
You could probably explain the following to me 5 different ways and each time I'd probably respond, huh?
+    for (i = startCell; i <= lastCell; i++) {
+        testCellPtr cell = &privconn->cells[i];
+        size_t k;
+
+        for (j = 0; j < cell->npages; j++) {
+            for (k = 0; k < npages; k++) {
+                unsigned long long pagesFree = cell->pages[j].pagesFree;
+
+                if (pageSizes[k] != cell->pages[j].pagesize)
+                    continue;
Going back to patch 3, this seems to match what I was thinking...
The rest of this I guess I'm not sure what to say other than I'm lost.
John
+
+                if (add)
+                    pagesFree += pageCounts[k];
+                else
+                    pagesFree = pageCounts[k];
+
+                if (pagesFree > cell->pages[j].pages) {
+                    virReportError(VIR_ERR_OPERATION_FAILED,
+                                   _("Unable to allocate %llu pages"),
+                                   pagesFree);
+                    goto cleanup;
+                }
+
+                cell->pages[j].pagesFree = pagesFree;
+                ncounts++;
Oh, I just figured it out. I need to modify the pool allocation size, not how many free pages there are. That is, I need to modify cell->pages[j].pages. And also, I should probably recalculate pagesFree then... Sigh. It is complicated and easy to get lost in.
Will send v2.
Michal