[libvirt] [PATCH RFC 0/2] Implement l3 CAT

This is a RFC patch to implement l3 CAT. There's old RFC V3 [1] patch, but don't get any attentions, reason maybe that it's is not a workable one. I address some of the comments for RFC V2 version from Martin. 1. Add file lock while access /sys/fs/resctrl base on kernel documents [3]. 2. Variable renaming. 3. Tested locally to create vm with cachetune defined and works well. Issues want to get some comments: I can not pass syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h). Since I need a complex struct to pass cachetune and host's cachetune information to resctrl. I can split patch if it's hard to review. Sorry for inconvient. Patch was pushed to github [4] for easing read. [1] https://www.redhat.com/archives/libvir-list/2017-June/msg00069.html [2] https://www.redhat.com/archives/libvir-list/2017-April/msg01466.html [3] https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt [4] https://github.com/taget/libvirt/commits/cache Eli Qiao (2): Resctrl: Add new xml element to support cache tune Resctrl: Add uitls functions to operate sysfs resctrl docs/schemas/domaincommon.rng | 54 ++ include/libvirt/virterror.h | 1 + src/Makefile.am | 1 + src/conf/domain_conf.c | 131 +++++ src/conf/domain_conf.h | 21 + src/libvirt_private.syms | 9 + src/qemu/qemu_process.c | 54 ++ src/util/virerror.c | 1 + src/util/virresctrl.c | 851 ++++++++++++++++++++++++++++++ src/util/virresctrl.h | 79 +++ tests/Makefile.am | 8 +- tests/virresctrldata/L3-free.schemata | 1 + tests/virresctrldata/L3CODE-free.schemata | 1 + tests/virresctrldata/L3DATA-free.schemata | 1 + tests/virresctrldata/linux-resctrl | 1 + tests/virresctrldata/linux-resctrl-cdp | 1 + tests/virresctrltest.c | 119 +++++ 17 files changed, 1333 insertions(+), 1 deletion(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h create mode 100644 tests/virresctrldata/L3-free.schemata create mode 100644 tests/virresctrldata/L3CODE-free.schemata create mode 100644 tests/virresctrldata/L3DATA-free.schemata create mode 120000 tests/virresctrldata/linux-resctrl create mode 120000 tests/virresctrldata/linux-resctrl-cdp create mode 100644 tests/virresctrltest.c -- 1.9.1

This patch adds new xml element to support cache tune as: <cputune> ... <cachetune id='0' cache_id='0' level='3' type='both' size='2816' unit='KiB' vcpus='1,2'/> ... </cputune> id: any non-minus number cache_id: reference of the host's cache banks id, it's from capabilities level: cache level type: cache bank type size: should be multiples of the min_size of the bank on host. vcpus: cache allocation on vcpu set, if empty, will apply the allocation on all vcpus Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- docs/schemas/domaincommon.rng | 54 +++++++++++++++++ src/conf/domain_conf.c | 131 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 21 +++++++ 3 files changed, 206 insertions(+) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 4950ddc..11dbb55 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -834,6 +834,35 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name="cachetune"> + <attribute name="id"> + <ref name="cachetuneid"/> + </attribute> + <attribute name="cache_id"> + <ref name="cacheid"/> + </attribute> + <attribute name="level"> + <ref name="cachelevel"/> + </attribute> + <attribute name="type"> + <ref name="cachetype"/> + </attribute> + <attribute name="size"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="unit"> + <ref name="cacheunit"/> + </attribute> + </optional> + <optional> + <attribute name="vcpus"> + <ref name="cpuset"/> + </attribute> + </optional> + </element> + </zeroOrMore> <optional> <element name="emulatorpin"> <attribute name="cpuset"> @@ -5686,6 +5715,31 @@ <param name="minInclusive">-1</param> </data> </define> + <define name="cachetuneid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cacheid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cachelevel"> + <data type="unsignedShort"> + <param name="pattern">(3)</param> + </data> + </define> + <define name="cachetype"> + <data type="string"> + <param name="pattern">(both|code|data)</param> + </data> + </define> + <define name="cacheunit"> + <data type="string"> + <param name="pattern">KiB</param> + </data> + </define> <!-- weight currently is in range [100, 1000] --> <define name="weight"> <data type="unsignedInt"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 5c32f93..93984bc 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16250,6 +16250,104 @@ virDomainVcpuPinDefParseXML(virDomainDefPtr def, return ret; } +/* Parse the XML definition for cachetune + * and a cachetune has the form + * <cachetune id='0' cache_id='0' level='3' type='both' size='1024' unit='KiB'/> + */ +static int +virDomainCacheTuneDefParseXML(virDomainDefPtr def, + int n, + xmlNodePtr* nodes) +{ + char* tmp = NULL; + size_t i; + int type = -1; + virDomainCacheBankPtr bank = NULL; + + if (VIR_ALLOC_N(bank, n) < 0) + goto cleanup; + + for (i = 0; i < n; i++) { + if (!(tmp = virXMLPropString(nodes[i], "id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "cache_id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing cache id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].cache_id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache host id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "level"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing cache level in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].level)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache level '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + + if (!(tmp = virXMLPropString(nodes[i], "size"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing size in cache tune")); + goto cleanup; + } + if (virStrToLong_ull(tmp, NULL, 10, &(bank[i].size)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache size '%s'"), tmp); + goto cleanup; + } + /* convert to B */ + bank[i].size *= 1024; + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("missing cache type")); + goto cleanup; + } + if ((type = virCacheTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'unsupported cache type '%s'"), tmp); + goto cleanup; + } + + if ((tmp = virXMLPropString(nodes[i], "vcpus"))) { + if (virBitmapParse(tmp, &bank[i].vcpus, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(bank[i].vcpus)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'vcpus': %s"), tmp); + goto cleanup; + } + } + } + + def->cachetune.cache_banks = bank; + def->cachetune.n_banks = n; + return 0; + + cleanup: + VIR_FREE(bank); + VIR_FREE(tmp); + return -1; +} /* Parse the XML definition for a iothreadpin * and an iothreadspin has the form @@ -17538,6 +17636,14 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/cachetune", ctxt, &nodes)) < 0) + goto error; + + if (virDomainCacheTuneDefParseXML(def, n, nodes) < 0) + goto error; + + VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/emulatorpin", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot extract emulatorpin nodes")); @@ -24328,6 +24434,29 @@ virDomainSchedulerFormat(virBufferPtr buf, } +static void +virDomainCacheTuneDefFormat(virBufferPtr buf, + virDomainCachetunePtr cache) +{ + size_t i; + for (i = 0; i < cache->n_banks; i ++) { + virBufferAsprintf(buf, "<cachetune id='%u' cache_id='%u' " + "level='%u' type='%s' size='%llu' unit='KiB'", + cache->cache_banks[i].id, + cache->cache_banks[i].cache_id, + cache->cache_banks[i].level, + virCacheTypeToString(cache->cache_banks[i].type), + cache->cache_banks[i].size / 1024); + + if (cache->cache_banks[i].vcpus) + virBufferAsprintf(buf, " vcpus='%s'/>\n", + virBitmapFormat(cache->cache_banks[i].vcpus)); + else + virBufferAddLit(buf, "/>\n"); + } +} + + static int virDomainCputuneDefFormat(virBufferPtr buf, virDomainDefPtr def) @@ -24374,6 +24503,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, "</iothread_quota>\n", def->cputune.iothread_quota); + virDomainCacheTuneDefFormat(&childrenBuf, &def->cachetune); + for (i = 0; i < def->maxvcpus; i++) { char *cpumask; virDomainVcpuDefPtr vcpu = def->vcpus[i]; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e67b6fd..0a3566a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2190,6 +2190,25 @@ struct _virDomainMemtune { int allocation; /* enum virDomainMemoryAllocation */ }; +typedef struct _virDomainCacheBank virDomainCacheBank; +typedef virDomainCacheBank *virDomainCacheBankPtr; + +struct _virDomainCacheBank { + unsigned int id; + unsigned int cache_id; /* cache id to be allocated on */ + unsigned int level; /* cache level */ + int type; /* enum virCacheType */ + unsigned long long size; /* in kibibytes */ + virBitmapPtr vcpus; +}; + +typedef struct _virDomainCachetune virDomainCachetune; +typedef virDomainCachetune *virDomainCachetunePtr; +struct _virDomainCachetune { + size_t n_banks; + virDomainCacheBankPtr cache_banks; +}; + typedef struct _virDomainPowerManagement virDomainPowerManagement; typedef virDomainPowerManagement *virDomainPowerManagementPtr; @@ -2263,6 +2282,8 @@ struct _virDomainDef { virDomainCputune cputune; + virDomainCachetune cachetune; + virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; -- 1.9.1

On Mon, Jun 12, 2017 at 05:48:40PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='0' cache_id='0' level='3' type='both' size='2816' unit='KiB' vcpus='1,2'/>
The cache_id automatically implies level and type. Either have one or the other. I know we talked about this already (maybe multiple times), but without any clear outcome. For me the sensible thing is to have level and type as that doesn't need to be changed when moving between hosts, and if it cannot be migrated, then it's properly checked.
... </cputune>
id: any non-minus number cache_id: reference of the host's cache banks id, it's from capabilities level: cache level type: cache bank type size: should be multiples of the min_size of the bank on host.
must be multiple of granularity, must be greater than or equal to minimum.
vcpus: cache allocation on vcpu set, if empty, will apply the allocation on all vcpus
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- docs/schemas/domaincommon.rng | 54 +++++++++++++++++ src/conf/domain_conf.c | 131 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 21 +++++++ 3 files changed, 206 insertions(+)
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 4950ddc..11dbb55 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -834,6 +834,35 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name="cachetune"> + <attribute name="id"> + <ref name="cachetuneid"/> + </attribute> + <attribute name="cache_id"> + <ref name="cacheid"/> + </attribute> + <attribute name="level"> + <ref name="cachelevel"/> + </attribute> + <attribute name="type"> + <ref name="cachetype"/> + </attribute> + <attribute name="size"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="unit"> + <ref name="cacheunit"/> + </attribute> + </optional> + <optional> + <attribute name="vcpus"> + <ref name="cpuset"/> + </attribute> + </optional> + </element> + </zeroOrMore> <optional> <element name="emulatorpin"> <attribute name="cpuset"> @@ -5686,6 +5715,31 @@ <param name="minInclusive">-1</param> </data> </define> + <define name="cachetuneid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cacheid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define>
These are useless ^^
+ <define name="cachelevel"> + <data type="unsignedShort"> + <param name="pattern">(3)</param>
Either don't restrict it at all or do the following: <choice> <!-- For now we only support allocation of L3 caches --> <value>3</value> <choice>
+ </data> + </define> + <define name="cachetype"> + <data type="string"> + <param name="pattern">(both|code|data)</param> + </data> + </define>
<choice> <value>both</value> <value>code</value> <value>data</value> </choice>
+ <define name="cacheunit"> + <data type="string"> + <param name="pattern">KiB</param> + </data> + </define>
This doesn't have to be in KiB, otherwise the unit doesn't make sense. For all the units I'm sure there already is a define in the sources somewhere.
<!-- weight currently is in range [100, 1000] --> <define name="weight"> <data type="unsignedInt"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 5c32f93..93984bc 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16250,6 +16250,104 @@ virDomainVcpuPinDefParseXML(virDomainDefPtr def, return ret; }
+/* Parse the XML definition for cachetune + * and a cachetune has the form + * <cachetune id='0' cache_id='0' level='3' type='both' size='1024' unit='KiB'/> + */ +static int +virDomainCacheTuneDefParseXML(virDomainDefPtr def, + int n, + xmlNodePtr* nodes) +{ + char* tmp = NULL; + size_t i; + int type = -1; + virDomainCacheBankPtr bank = NULL; + + if (VIR_ALLOC_N(bank, n) < 0) + goto cleanup; + + for (i = 0; i < n; i++) { + if (!(tmp = virXMLPropString(nodes[i], "id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "cache_id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing cache id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].cache_id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache host id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "level"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing cache level in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].level)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache level '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + + if (!(tmp = virXMLPropString(nodes[i], "size"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing size in cache tune")); + goto cleanup; + } + if (virStrToLong_ull(tmp, NULL, 10, &(bank[i].size)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache size '%s'"), tmp);
Your error messages until now were almost consistent
+ goto cleanup; + } + /* convert to B */ + bank[i].size *= 1024; + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("missing cache type")); + goto cleanup; + } + if ((type = virCacheTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'unsupported cache type '%s'"), tmp);
But now it's different, plus s/'unsupp/unsupp/
+ goto cleanup; + } + + if ((tmp = virXMLPropString(nodes[i], "vcpus"))) { + if (virBitmapParse(tmp, &bank[i].vcpus, + VIR_DOMAIN_CPUMASK_LEN) < 0)
This can be one line
+ goto cleanup; + + if (virBitmapIsAllClear(bank[i].vcpus)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'vcpus': %s"), tmp);
And now there's a colon, apostrophes around 'vcpus', but tmp can be empty string and that will not be very visible...
+ goto cleanup; + } + } + } + + def->cachetune.cache_banks = bank; + def->cachetune.n_banks = n; + return 0; + + cleanup: + VIR_FREE(bank); + VIR_FREE(tmp); + return -1; +}
/* Parse the XML definition for a iothreadpin * and an iothreadspin has the form @@ -17538,6 +17636,14 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes);
+ if ((n = virXPathNodeSet("./cputune/cachetune", ctxt, &nodes)) < 0) + goto error; + + if (virDomainCacheTuneDefParseXML(def, n, nodes) < 0)
You are calling it even when n == 0
+ goto error; + + VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/emulatorpin", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot extract emulatorpin nodes")); @@ -24328,6 +24434,29 @@ virDomainSchedulerFormat(virBufferPtr buf, }
+static void +virDomainCacheTuneDefFormat(virBufferPtr buf, + virDomainCachetunePtr cache) +{ + size_t i;
Missing empty line
+ for (i = 0; i < cache->n_banks; i ++) { + virBufferAsprintf(buf, "<cachetune id='%u' cache_id='%u' " + "level='%u' type='%s' size='%llu' unit='KiB'", + cache->cache_banks[i].id, + cache->cache_banks[i].cache_id, + cache->cache_banks[i].level, + virCacheTypeToString(cache->cache_banks[i].type), + cache->cache_banks[i].size / 1024); + + if (cache->cache_banks[i].vcpus) + virBufferAsprintf(buf, " vcpus='%s'/>\n", + virBitmapFormat(cache->cache_banks[i].vcpus)); + else + virBufferAddLit(buf, "/>\n"); + } +} + + static int virDomainCputuneDefFormat(virBufferPtr buf, virDomainDefPtr def) @@ -24374,6 +24503,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, "</iothread_quota>\n", def->cputune.iothread_quota);
+ virDomainCacheTuneDefFormat(&childrenBuf, &def->cachetune); + for (i = 0; i < def->maxvcpus; i++) { char *cpumask; virDomainVcpuDefPtr vcpu = def->vcpus[i]; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e67b6fd..0a3566a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2190,6 +2190,25 @@ struct _virDomainMemtune { int allocation; /* enum virDomainMemoryAllocation */ };
+typedef struct _virDomainCacheBank virDomainCacheBank; +typedef virDomainCacheBank *virDomainCacheBankPtr; + +struct _virDomainCacheBank { + unsigned int id; + unsigned int cache_id; /* cache id to be allocated on */ + unsigned int level; /* cache level */ + int type; /* enum virCacheType */ + unsigned long long size; /* in kibibytes */ + virBitmapPtr vcpus; +}; + +typedef struct _virDomainCachetune virDomainCachetune; +typedef virDomainCachetune *virDomainCachetunePtr; +struct _virDomainCachetune { + size_t n_banks; + virDomainCacheBankPtr cache_banks; +}; +
I don't think there's a real need for this intermediate structure, but since you want to be passing it around everywhere...
typedef struct _virDomainPowerManagement virDomainPowerManagement; typedef virDomainPowerManagement *virDomainPowerManagementPtr;
@@ -2263,6 +2282,8 @@ struct _virDomainDef {
virDomainCputune cputune;
+ virDomainCachetune cachetune; +
It is part of cputune in the XML, why not here?
virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

hi Martin It’s really nice of you to help reviewing the mass code. Thanks. I don’t find a better way to split patch. On Wednesday, 21 June 2017 at 9:53 PM, Martin Kletzander wrote:
On Mon, Jun 12, 2017 at 05:48:40PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='0' cache_id='0' level='3' type='both' size='2816' unit='KiB' vcpus='1,2'/>
The cache_id automatically implies level and type. Either have one or the other. I know we talked about this already (maybe multiple times), but without any clear outcome. For me the sensible thing is to have level and type as that doesn't need to be changed when moving between hosts, and if it cannot be migrated, then it's properly checked.
Think about this case, if the VM has numa setting, the VM has multiple vcpu running across sockets, if we don’t specify cache_id (cache id stand for on which Socket/Cache), how can we know on which Socket we allocation for the VM? I can image there’s 2 cases: 1. if we don’t specify vcpus, and our host have 2 or more Socket, we have this xml define <cachetune id='0' level='3' type='both' size='2816' unit=‘KiB’> We allocate 2816 KiB cache on all of the Socket/Cache. 2. if we specify vcpus <cachetune id='0' level='3' type='both' size='2816' unit=‘KiB’, vcpus=‘1,2'> <cachetune id=‘1' level='3' type='both' size=‘5632' unit=‘KiB’, vcpus=‘3,4’> We need to make sure we vcpu 1, 2 are mapped to Socket/Cache 0 and 3,4 on Socket/Cache 1. So that vcpus running on Socket/Cache 0 has 2816 KiB cache allocated and vcpus running on Socket/Cache 1 has 5632 KiB cache allocated. Does it make sense? …
virDomainCputune cputune;
+ virDomainCachetune cachetune; +
It is part of cputune in the XML, why not here?
Oh yes, I will rethink how to simple the domain cache tune.
virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

On Thu, Jun 22, 2017 at 01:24:29PM +0800, Eli Qiao wrote:
hi Martin
It’s really nice of you to help reviewing the mass code. Thanks.
I don’t find a better way to split patch.
On Wednesday, 21 June 2017 at 9:53 PM, Martin Kletzander wrote:
On Mon, Jun 12, 2017 at 05:48:40PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='0' cache_id='0' level='3' type='both' size='2816' unit='KiB' vcpus='1,2'/>
The cache_id automatically implies level and type. Either have one or the other. I know we talked about this already (maybe multiple times), but without any clear outcome. For me the sensible thing is to have level and type as that doesn't need to be changed when moving between hosts, and if it cannot be migrated, then it's properly checked.
Think about this case, if the VM has numa setting, the VM has multiple vcpu running across sockets, if we don’t specify cache_id (cache id stand for on which Socket/Cache), how can we know on which Socket we allocation for the VM?
I missed this. Ye, you're right, thanks for showing that with a use case. We could then require only the cache_id and automatically fill in level and type. Migration (or rather any start) should then fail if that cache_id doesn't correspond to the level and type requested. I still can't find a reason for 'id', though.
I can image there’s 2 cases:
1. if we don’t specify vcpus, and our host have 2 or more Socket, we have this xml define
<cachetune id='0' level='3' type='both' size='2816' unit=‘KiB’>
We allocate 2816 KiB cache on all of the Socket/Cache.
2. if we specify vcpus <cachetune id='0' level='3' type='both' size='2816' unit=‘KiB’, vcpus=‘1,2'>
<cachetune id=‘1' level='3' type='both' size=‘5632' unit=‘KiB’, vcpus=‘3,4’>
We need to make sure we vcpu 1, 2 are mapped to Socket/Cache 0 and 3,4 on Socket/Cache 1. So that vcpus running on Socket/Cache 0 has 2816 KiB cache allocated and vcpus running on Socket/Cache 1 has 5632 KiB cache allocated.
Does it make sense?
…
virDomainCputune cputune;
+ virDomainCachetune cachetune; +
It is part of cputune in the XML, why not here?
Oh yes, I will rethink how to simple the domain cache tune.
virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

On Thursday, 22 June 2017 at 8:41 PM, Martin Kletzander wrote:
I missed this. Ye, you're right, thanks for showing that with a use case. We could then require only the cache_id and automatically fill in level and type. Migration (or rather any start) should then fail if that cache_id doesn't correspond to the level and type requested. I still can't find a reason for 'id', though.
I agree that id, level is useless, but also type is required, the case is when host’s enabled CDP like this: <bank id='0' level='3' type='both' size='15360' unit='KiB' cpus='0-5'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> </bank> the cachetune need to be defined as: <cachetune cache_id='0' type=‘l3data' size='2816' unit='KiB' vcpus=‘0,1’/> <cachetune cache_id='0' type=‘l3code' size='2816' unit='KiB' vcpus=‘0,1’/> <cachetune cache_id=‘1' type=‘l3data' size='2816' unit='KiB' vcpus=‘2,3’/> <cachetune cache_id=‘1' type=‘l3code' size='2816' unit='KiB' vcpus=’2,3’/> Thought?

This patch adds 3 major private interface. virResctrlGetFreeCache: return free cache, default cache substract cache allocated. virResctrlSetCachetunes: set cache banks which defined in a domain. virResctrlRemoveCachetunes: remove cache allocation group from the host. There's some existed issue when do syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h) in util/virresctrl.h. --- include/libvirt/virterror.h | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 9 + src/qemu/qemu_process.c | 54 ++ src/util/virerror.c | 1 + src/util/virresctrl.c | 851 ++++++++++++++++++++++++++++++ src/util/virresctrl.h | 79 +++ tests/Makefile.am | 8 +- tests/virresctrldata/L3-free.schemata | 1 + tests/virresctrldata/L3CODE-free.schemata | 1 + tests/virresctrldata/L3DATA-free.schemata | 1 + tests/virresctrldata/linux-resctrl | 1 + tests/virresctrldata/linux-resctrl-cdp | 1 + tests/virresctrltest.c | 119 +++++ 14 files changed, 1127 insertions(+), 1 deletion(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h create mode 100644 tests/virresctrldata/L3-free.schemata create mode 100644 tests/virresctrldata/L3CODE-free.schemata create mode 100644 tests/virresctrldata/L3DATA-free.schemata create mode 120000 tests/virresctrldata/linux-resctrl create mode 120000 tests/virresctrldata/linux-resctrl-cdp create mode 100644 tests/virresctrltest.c diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..4bc0c74 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum { VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resctrl */ # ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/src/Makefile.am b/src/Makefile.am index eae32dc..8dbb778 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -167,6 +167,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b6c828f..7392cfa 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2440,6 +2440,15 @@ virRandomGenerateWWN; virRandomInt; +# util/virresctrl.h +virResctrlBitmap2String; +virResctrlFreeSchemata; +virResctrlGetFreeCache; +virResctrlRemoveCachetunes; +virResctrlSetCachetunes; +virResctrlTypeToString; + + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4a66f0d..8efeb19 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -70,6 +70,7 @@ #include "virbitmap.h" #include "viratomic.h" #include "virnuma.h" +#include "virresctrl.h" #include "virstring.h" #include "virhostdev.h" #include "secret_util.h" @@ -5088,6 +5089,51 @@ qemuProcessSetupVcpus(virDomainObjPtr vm) return 0; } +static int +qemuProcessSetCacheBanks(virCapsHostPtr caps, virDomainObjPtr vm) +{ + size_t i, j; + virDomainCachetunePtr cachetune; + unsigned int max_vcpus = virDomainDefGetVcpusMax(vm->def); + pid_t *pids = NULL; + virDomainVcpuDefPtr vcpu; + size_t npid = 0; + size_t count = 0; + int ret = -1; + + cachetune = &(vm->def->cachetune); + + for (i = 0; i < cachetune->n_banks; i++) { + if (cachetune->cache_banks[i].vcpus) { + for (j = 0; j < max_vcpus; j++) { + if (virBitmapIsBitSet(cachetune->cache_banks[i].vcpus, j)) { + + vcpu = virDomainDefGetVcpu(vm->def, j); + if (!vcpu->online) + continue; + + if (VIR_RESIZE_N(pids, npid, count, 1) < 0) + goto cleanup; + pids[count ++] = qemuDomainGetVcpuPid(vm, j); + } + } + } + } + + /* If not specify vcpus in cachetune, add vm->pid */ + if (pids == NULL) { + if (VIR_ALLOC_N(pids, 1) < 0) + goto cleanup; + pids[0] = vm->pid; + count = 1; + } + ret = virResctrlSetCachetunes(caps, cachetune, vm->def->uuid, pids, count); + + cleanup: + VIR_FREE(pids); + return ret; +} + int qemuProcessSetupIOThread(virDomainObjPtr vm, @@ -5914,6 +5960,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuProcessAutoDestroyAdd(driver, vm, conn) < 0) goto cleanup; + VIR_WARN("Cache allocation"); + if (qemuProcessSetCacheBanks(&(driver->caps->host), + vm) < 0) + goto cleanup; + ret = 0; cleanup: @@ -6419,6 +6470,9 @@ void qemuProcessStop(virQEMUDriverPtr driver, virPerfFree(priv->perf); priv->perf = NULL; + if (&(vm->def->cachetune) != NULL) + virResctrlRemoveCachetunes(vm->def->uuid); + qemuProcessRemoveDomainStatus(driver, vm); /* Remove VNC and Spice ports from port reservation bitmap, but only if diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..02fabcc 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST, "Perf", /* 65 */ "Libssh transport layer", + "Resource Control", ) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..89bc43e --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,851 @@ +/* + * virresctrl.c: methods for managing resource control + * + * Copyright (C) 2017 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#include <config.h> +#include <fcntl.h> +#include <sys/file.h> +#include <sys/stat.h> +#include <sys/types.h> + +#include "virresctrl.h" +#include "virerror.h" +#include "virlog.h" +#include "viralloc.h" +#include "virstring.h" +#include "virfile.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL +#define SYSFS_RESCTRL_PATH "/sys/fs/resctrl" +#define MAX_CBM_LEN 20 +#define VIR_RESCTRL_LOCK(fd, op) flock(fd, op) +#define VIR_RESCTRL_UNLOCK(fd) flock(fd, LOCK_UN) +#define CONSTRUCT_RESCTRL_PATH(domain_name, item_name) \ +do { \ + if (NULL == domain_name) { \ + if (virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, item_name) < 0) \ + return -1; \ + } else { \ + if (virAsprintf(&path, "%s/%s/%s", SYSFS_RESCTRL_PATH, domain_name, \ + item_name) < 0) \ + return -1; \ + } \ +} while (0) + +VIR_ENUM_IMPL(virResctrl, VIR_RESCTRL_TYPE_LAST, + "L3", + "L3CODE", + "L3DATA") + +/** + * a virResctrlGroup represents a resource control group, it's a directory + * under /sys/fs/resctrl. + * e.g. /sys/fs/resctrl/CG1 + * |-- cpus + * |-- schemata + * `-- tasks + * # cat schemata + * L3DATA:0=fffff;1=fffff + * L3CODE:0=fffff;1=fffff + * + * Besides, it can also represent the default resource control group of the + * host. + */ + +typedef struct _virResctrlGroup virResctrlGroup; +typedef virResctrlGroup *virResctrlGroupPtr; +struct _virResctrlGroup { + char *name; /* resource group name, NULL for default host group */ + size_t n_tasks; /* number of tasks assigned to the resource group */ + char **tasks; /* task id list */ + virResctrlSchemataPtr schemata[VIR_RESCTRL_TYPE_LAST]; /* Array for schemata */ +}; + +/* All resource control groups on this host, including default resource group */ +typedef struct _virResctrlHost virResctrlHost; +typedef virResctrlHost *virResctrlHostPtr; +struct _virResctrlHost { + size_t n_groups; /* number of resource control group */ + virResctrlGroupPtr *groups; /* list of resource control group */ +}; + +void +virResctrlFreeSchemata(virResctrlSchemataPtr ptr) +{ + size_t i; + + if (!ptr) + return; + + for (i = 0; i < ptr->n_masks; i++) { + virBitmapFree(ptr->masks[i]->mask); + VIR_FREE(ptr->masks[i]); + } + + VIR_FREE(ptr); + ptr = NULL; +} + +static void +virResctrlFreeGroup(virResctrlGroupPtr ptr) +{ + size_t i; + + if (!ptr) + return; + + for (i = 0; i < ptr->n_tasks; i++) + VIR_FREE(ptr->tasks[i]); + VIR_FREE(ptr->name); + + for (i = 0; i < VIR_RESCTRL_TYPE_LAST; i++) + virResctrlFreeSchemata(ptr->schemata[i]); + + VIR_FREE(ptr); + ptr = NULL; +} + +/* Return specify type of schemata string from schematalval. + e.g., 0=f;1=f */ +static int +virResctrlGetSchemataString(virResctrlType type, + const char *schemataval, + char **schematastr) +{ + int rc = -1; + char *prefix = NULL; + char **lines = NULL; + + if (virAsprintf(&prefix, + "%s:", + virResctrlTypeToString(type)) < 0) + return -1; + + lines = virStringSplit(schemataval, "\n", 0); + + if (VIR_STRDUP(*schematastr, + virStringListGetFirstWithPrefix(lines, prefix)) < 0) + goto cleanup; + + if (*schematastr == NULL) + rc = -1; + else + rc = 0; + + cleanup: + VIR_FREE(prefix); + virStringListFree(lines); + return rc; +} + +static int +virResctrlRemoveSysGroup(const char* name) +{ + char *path = NULL; + int ret = -1; + + if ((ret = virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, name)) < 0) + return ret; + + ret = rmdir(path); + + VIR_FREE(path); + return ret; +} + +static int +virResctrlNewSysGroup(const char *name) +{ + char *path = NULL; + int ret = -1; + mode_t mode = 0755; + + if (virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, name) < 0) + return -1; + + if (virDirCreate(path, mode, 0, 0, 0) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int +virResctrlWrite(const char *name, const char *item, const char *content) +{ + char *path; + int writefd; + int rc = -1; + + CONSTRUCT_RESCTRL_PATH(name, item); + + if (!virFileExists(path)) + goto cleanup; + + if ((writefd = open(path, O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR)) < 0) + goto cleanup; + + if (safewrite(writefd, content, strlen(content)) < 0) + goto cleanup; + rc = 0; + + cleanup: + VIR_FREE(path); + VIR_FORCE_CLOSE(writefd); + return rc; +} + +static +virBitmapPtr virResctrlMask2Bitmap(const char *mask) +{ + virBitmapPtr bitmap; + unsigned int tmp; + size_t i; + + if (virStrToLong_ui(mask, NULL, 16, &tmp) < 0) + return NULL; + + bitmap = virBitmapNewEmpty(); + + for (i = 0; i < MAX_CBM_LEN; i++) { + if (((tmp & 0x1) == 0x1) && + (virBitmapSetBitExpand(bitmap, i) < 0)) + goto error; + tmp = tmp >> 1; + } + return bitmap; + + error: + virBitmapFree(bitmap); + return NULL; +} + +char *virResctrlBitmap2String(virBitmapPtr bitmap) +{ + char *tmp; + char *ret = NULL; + char *p; + tmp = virBitmapString(bitmap); + /* skip "0x" */ + p = tmp + 2; + + /* first non-0 position */ + while (*++p == '0'); + + if (VIR_STRDUP(ret, p) < 0) + ret = NULL; + + VIR_FREE(tmp); + return ret; +} + +static int +virResctrlParseSchemata(const char* schemata_str, + virResctrlSchemataPtr schemata) +{ + VIR_DEBUG("schemata_str=%s, schemata=%p", schemata_str, schemata); + + int ret = -1; + size_t i; + virResctrlMaskPtr mask; + char **schemata_list; + char *mask_str; + + /* parse 0=fffff;1=f */ + schemata_list = virStringSplit(schemata_str, ";", 0); + + if (!schemata_list) + goto cleanup; + + for (i = 0; schemata_list[i] != NULL; i++) { + /* parse 0=fffff */ + mask_str = strchr(schemata_list[i], '='); + + if (!mask_str) + goto cleanup; + + if (VIR_ALLOC(mask) < 0) + goto cleanup; + + mask->cache_id = i; + mask->mask = virResctrlMask2Bitmap(mask_str + 1); + schemata->n_masks += 1; + schemata->masks[i] = mask; + + } + ret = 0; + + cleanup: + virStringListFree(schemata_list); + return ret; +} + +static int +virResctrlLoadGroup(const char *name, + virResctrlHostPtr host) +{ + VIR_DEBUG("name=%s, host=%p\n", name, host); + + int ret = -1; + char *schemataval = NULL; + char *schemata_str = NULL; + virResctrlType i; + int rv; + virResctrlGroupPtr grp; + virResctrlSchemataPtr schemata; + + rv = virFileReadValueString(&schemataval, + SYSFS_RESCTRL_PATH "/%s/schemata", + name ? name : ""); + + if (rv < 0) + return -1; + + if (VIR_ALLOC(grp) < 0) + goto cleanup; + + if (VIR_STRDUP(grp->name, name) < 0) + goto cleanup; + + for (i = 0; i < VIR_RESCTRL_TYPE_LAST; i++) { + rv = virResctrlGetSchemataString(i, schemataval, &schemata_str); + + if (rv < 0) + continue; + + if (VIR_ALLOC(schemata) < 0) + goto cleanup; + + schemata->type = i; + + if (virResctrlParseSchemata(schemata_str, schemata) < 0) { + VIR_FREE(schemata); + VIR_FREE(schemata_str); + goto cleanup; + } + + grp->schemata[i] = schemata; + VIR_FREE(schemata_str); + } + + if (VIR_APPEND_ELEMENT(host->groups, + host->n_groups, + grp) < 0) { + virResctrlFreeGroup(grp); + goto cleanup; + } + + ret = 0; + + cleanup: + VIR_FREE(schemataval); + return ret; +} + +static int +virResctrlLoadHost(virResctrlHostPtr host) +{ + int rv = -1; + DIR *dirp = NULL; + char *path = NULL; + struct dirent *ent; + + rv = virDirOpenIfExists(&dirp, SYSFS_RESCTRL_PATH); + if (rv < 0) + return -1; + + /* load default group first */ + if (virResctrlLoadGroup(NULL, host) < 0) + return -1; + + while ((rv = virDirRead(dirp, &ent, path)) > 0) { + /* resctrl is not hierarchical, only read directory under + /sys/fs/resctrl */ + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + + if (virResctrlLoadGroup(ent->d_name, host) < 0) + return -1; + } + return 0; +} + +static void +virResctrlRefreshHost(virResctrlHostPtr host) +{ + virResctrlGroupPtr default_grp = NULL; + virResctrlSchemataPtr schemata = NULL; + size_t i, j; + virResctrlType t; + + default_grp = host->groups[0]; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (default_grp->schemata[t] != NULL) { + for (i = 0; i < default_grp->schemata[t]->n_masks; i++) { + /* Reset default group's mask */ + virBitmapSetAll(default_grp->schemata[t]->masks[i]->mask); + /* Loop each other resource group except default group */ + for (j = 1; j < host->n_groups; j++) { + schemata = host->groups[j]->schemata[t]; + virBitmapSubtract(default_grp->schemata[t]->masks[i]->mask, + schemata->masks[i]->mask); + } + } + } + } +} + +static virResctrlGroupPtr +virResctrlGetFreeGroup(void) +{ + size_t i; + virResctrlHostPtr host = NULL; + virResctrlGroupPtr grp = NULL; + + if (VIR_ALLOC(host) < 0) + return NULL; + + if (virResctrlLoadHost(host) < 0) + goto error; + + virResctrlRefreshHost(host); + + for (i = 1; i < host->n_groups; i++) + virResctrlFreeGroup(host->groups[i]); + + grp = host->groups[0]; + VIR_FREE(host); + + return grp; + + error: + virResctrlFreeGroup(grp); + return NULL; +} + +virResctrlSchemataPtr +virResctrlGetFreeCache(virResctrlType type) +{ + VIR_DEBUG("type=%d", type); + + virResctrlType t; + virResctrlGroupPtr grp = NULL; + virResctrlSchemataPtr schemata = NULL; + int lockfd = -1; + + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + return NULL; + + VIR_RESCTRL_LOCK(lockfd, LOCK_SH); + + if ((grp = virResctrlGetFreeGroup()) == NULL) + goto cleanup; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (t == type) + schemata = grp->schemata[t]; + else + virResctrlFreeSchemata(grp->schemata[t]); + + } + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + return schemata; +} + +static int +virResctrlCalculateCbm(int cbm_len, + virBitmapPtr defaultcbm, + virBitmapPtr newcbm) +{ + VIR_DEBUG("cbm_len=%d, defaultcbm=%p, newcbm=%p", + cbm_len, defaultcbm, newcbm); + + ssize_t pos = -1; + size_t i; + + /* not enough cache way to be allocated */ + if (virBitmapCountBits(defaultcbm) < cbm_len + 1) + return -1; + + while ((pos = virBitmapNextSetBit(defaultcbm, pos)) >= 0) { + for (i = 0; i < cbm_len; i++) + ignore_value(virBitmapSetBitExpand(newcbm, i + pos)); + /* Test if newcbm is sub set of defaultcbm */ + if (virBitmapNextClearBit(defaultcbm, pos) > i + pos) { + break; + } else { + pos = pos + i - 1; + virBitmapClearAll(newcbm); + } + } + + if (virBitmapCountBits(newcbm) != cbm_len) + return -1; + + /* consume default cbm after allocation */ + virBitmapSubtract(defaultcbm, newcbm); + + return 0; +} + +/* Fill mask value for newly created resource group base on hostcachebank + * and domcachebank */ +static int +virResctrlFillMask(virResctrlGroupPtr grp, + virResctrlGroupPtr free_grp, + virCapsHostCacheBankPtr hostcachebank, + virDomainCacheBankPtr domcachebank) +{ + VIR_DEBUG("grp=%p, free_grp=%p, hostcachebank=%p, domcachebank=%p", + grp, free_grp, hostcachebank, domcachebank); + + size_t i; + int cbm_candidate_len; + unsigned int cache_id; + unsigned int cache_type; + virCapsHostCacheControlPtr control = NULL; + virResctrlMaskPtr mask; + virResctrlSchemataPtr schemata = NULL; + + /* Find control information for that kind type of cache */ + for (i = 0; i < hostcachebank->ncontrols; i++) { + if (hostcachebank->controls[i]->scope == domcachebank->type) { + control = hostcachebank->controls[i]; + break; + } + } + + if (control == NULL) + return -1; + + cache_type = domcachebank->type; + cache_id = domcachebank->cache_id; + schemata = grp->schemata[cache_type]; + + if ((schemata == NULL) && (VIR_ALLOC(schemata) < 0)) + return -1; + + if (VIR_ALLOC(mask) < 0) + return -1; + + mask->cache_id = cache_id; + mask->mask = virBitmapNewEmpty(); + + /* here should be control->granularity and control->min + also domcachebank size should be checked while define domain xml */ + cbm_candidate_len = domcachebank->size / control->min; + VIR_DEBUG("cbm_len = %d", cbm_candidate_len); + if (virResctrlCalculateCbm(cbm_candidate_len, + free_grp->schemata[cache_type]->masks[cache_id]->mask, + mask->mask) < 0) + goto error; + + schemata->type = cache_type; + schemata->n_masks += 1; + schemata->masks[cache_id] = mask; + grp->schemata[cache_type] = schemata; + + return 0; + + error: + VIR_FREE(schemata); + return -1; +} + +/* only keep the highest consecutive bits */ +static void +virResctrlTrimMask(virResctrlMaskPtr mask) +{ + size_t i; + ssize_t setbit = -1; + ssize_t clearbit = -1; + + clearbit = virBitmapNextClearBit(mask->mask, -1); + setbit = virBitmapNextSetBit(mask->mask, -1); + for (i = setbit; i < clearbit; i++) + ignore_value(virBitmapClearBit(mask->mask, i)); +} + +static int +virResctrlCompleteMask(virResctrlSchemataPtr schemata, + virResctrlSchemataPtr defaultschemata) +{ + size_t i; + virResctrlMaskPtr mask; + + if (schemata == NULL && VIR_ALLOC(schemata) < 0) + return -1; + + if (schemata->n_masks == defaultschemata->n_masks) + return 0; + + for (i = 0; i < defaultschemata->n_masks; i++) { + if (schemata->masks[i] == NULL) { + if (VIR_ALLOC(mask) < 0) + goto error; + + mask->cache_id = i; + mask->mask = virBitmapNewEmpty(); + schemata->n_masks += 1; + schemata->masks[i] = mask; + /* resctrl doesn't allow mask to be zero + use higher bits to fill up the cbm which + domaincache bank doens't provide */ + ignore_value(virBitmapSetBitExpand(mask->mask, + virBitmapLastSetBit(defaultschemata->masks[i]->mask))); + } + /* only keep the highest consecutive bits for default group */ + virResctrlTrimMask(defaultschemata->masks[i]); + } + + return 0; + + error: + VIR_FREE(schemata); + return -1; +} + +/* complete the schemata in the resrouce group before it can be write back + to resctrl */ +static int +virResctrlCompleteGroup(virResctrlGroupPtr grp, + virResctrlGroupPtr default_grp) +{ + virResctrlType t; + virResctrlSchemataPtr schemata; + virResctrlSchemataPtr defaultschemata; + + + /* NOTES: resctrl system require we need provide all cache's cbm mask */ + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + defaultschemata = default_grp->schemata[t]; + if (defaultschemata != NULL) { + schemata = grp->schemata[t]; + if (virResctrlCompleteMask(schemata, defaultschemata) < 0) + return -1; + /* only keep the highest consecutive bits for default group */ + } + } + return 0; +} + +static +char *virResctrlGetSchemataStr(virResctrlSchemataPtr schemata) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + size_t i; + + virBufferAsprintf(&buf, "%s:%u=%s", + virResctrlTypeToString(schemata->type), + schemata->masks[0]->cache_id, + virResctrlBitmap2String(schemata->masks[0]->mask)); + + for (i = 1; i < schemata->n_masks; i ++) + virBufferAsprintf(&buf, ";%u=%s", + schemata->masks[i]->cache_id, + virResctrlBitmap2String(schemata->masks[i]->mask)); + + return virBufferContentAndReset(&buf); +} + +static int +virResctrlFlushGroup(virResctrlGroupPtr grp) +{ + int ret = -1; + size_t i; + char *schemata_str = NULL; + virResctrlType t; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + VIR_DEBUG("grp=%p", grp); + + if (grp->name != NULL && virResctrlNewSysGroup(grp->name) < 0) + return -1; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (grp->schemata[t] != NULL) { + schemata_str = virResctrlGetSchemataStr(grp->schemata[t]); + virBufferAsprintf(&buf, "%s\n", schemata_str); + VIR_FREE(schemata_str); + } + } + + schemata_str = virBufferContentAndReset(&buf); + + if (virResctrlWrite(grp->name, "schemata", schemata_str) < 0) + goto cleanup; + + for (i = 0; i < grp->n_tasks; i++) { + if (virResctrlWrite(grp->name, "tasks", grp->tasks[i]) < 0) + goto cleanup; + } + + ret = 0; + + cleanup: + VIR_FREE(schemata_str); + return ret; +} + +int virResctrlSetCachetunes(virCapsHostPtr caps, + virDomainCachetunePtr cachetune, + unsigned char* uuid, pid_t *pids, int npid) +{ + size_t i; + size_t j; + int ret = -1; + char name[VIR_UUID_STRING_BUFLEN]; + char *tmp; + int lockfd = -1; + virResctrlGroupPtr grp = NULL; + virResctrlGroupPtr default_grp = NULL; + virCapsHostCacheBankPtr hostcachebank; + virDomainCacheBankPtr domcachebank; + + virUUIDFormat(uuid, name); + + if (cachetune->n_banks < 1) + return 0; + + /* create new resource group */ + if (VIR_ALLOC(grp) < 0) + goto error; + + if (VIR_STRDUP(grp->name, name) < 0) + goto error; + + /* allocate file lock */ + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + goto error; + + VIR_RESCTRL_LOCK(lockfd, LOCK_EX); + + if ((default_grp = virResctrlGetFreeGroup()) == NULL) + goto error; + + /* Allocate cache for each cache bank defined in cache tune */ + for (i = 0; i < cachetune->n_banks; i++) { + domcachebank = &cachetune->cache_banks[i]; + hostcachebank = NULL; + /* find the host cache bank to be allocated on */ + for (j = 0; j < caps->ncaches; j++) { + if (caps->caches[j]->id == domcachebank->cache_id) { + hostcachebank = caps->caches[j]; + break; + } + } + /* fill up newly crated grp and consume from default_grp */ + if (virResctrlFillMask(grp, default_grp, hostcachebank, domcachebank) < 0) + goto error; + } + + /* Add tasks to grp */ + for (i = 0; i < npid; i++) { + if (virAsprintf(&tmp, "%llu", (long long)pids[i]) < 0) + goto error; + + if (VIR_APPEND_ELEMENT(grp->tasks, + grp->n_tasks, + tmp) < 0) { + VIR_FREE(tmp); + goto error; + } + } + + if (virResctrlCompleteGroup(grp, default_grp) < 0) { + VIR_WARN("Failed to complete group"); + goto error; + } + + if (virResctrlFlushGroup(grp) < 0) + goto error; + + if (virResctrlFlushGroup(default_grp) < 0) { + virResctrlRemoveSysGroup(grp->name); + goto error; + } + + ret = 0; + + error: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + virResctrlFreeGroup(grp); + virResctrlFreeGroup(default_grp); + return ret; +} + +int virResctrlRemoveCachetunes(unsigned char* uuid) +{ + int ret = -1; + int lockfd = -1; + size_t i; + virResctrlType t; + virResctrlSchemataPtr schemata; + char name[VIR_UUID_STRING_BUFLEN]; + virResctrlGroupPtr default_grp = NULL; + + virUUIDFormat(uuid, name); + + VIR_DEBUG("name=%s", name); + + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + return -1; + + VIR_RESCTRL_LOCK(lockfd, LOCK_SH); + + if (virResctrlRemoveSysGroup(name) < 0) + goto cleanup; + + if ((default_grp = virResctrlGetFreeGroup()) == NULL) + goto cleanup; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + schemata = default_grp->schemata[t]; + if (schemata != NULL) { + for (i = 0; i < schemata->n_masks; i++) + virResctrlTrimMask(schemata->masks[i]); + } + } + + if (virResctrlFlushGroup(default_grp) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + return ret; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..7373c72 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,79 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2017 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +#include "virutil.h" +#include "virbitmap.h" +#include "conf/domain_conf.h" + +#define MAX_CACHE_ID 16 + +typedef enum { + VIR_RESCTRL_TYPE_L3, + VIR_RESCTRL_TYPE_L3_CODE, + VIR_RESCTRL_TYPE_L3_DATA, + + VIR_RESCTRL_TYPE_LAST +} virResctrlType; + +VIR_ENUM_DECL(virResctrl); + +/* + * a virResctrlMask represents one of mask object in a + * resource control group. + * e.g., 0=f + */ +typedef struct _virResctrlMask virResctrlMask; +typedef virResctrlMask *virResctrlMaskPtr; +struct _virResctrlMask { + unsigned int cache_id; /* cache resource id */ + virBitmapPtr mask; /* the cbm mask */ +}; + +/* + * a virResctrlSchemata represents schemata objects of specific type of + * resource in a resource control group. + * eg: L3:0=f,1=ff + */ +typedef struct _virResctrlSchemata virResctrlSchemata; +typedef virResctrlSchemata *virResctrlSchemataPtr; +struct _virResctrlSchemata { + virResctrlType type; /* resource control type, e.g., L3 */ + size_t n_masks; /* number of masks */ + virResctrlMaskPtr masks[MAX_CACHE_ID]; /* array of mask, use array for easy index */ +}; + +/* Get free cache of the host, result saved in schemata */ +virResctrlSchemataPtr virResctrlGetFreeCache(virResctrlType type); + +/* Get mask string from Bitmap */ +char *virResctrlBitmap2String(virBitmapPtr bitmap); + +void virResctrlFreeSchemata(virResctrlSchemataPtr ptr); +int virResctrlSetCachetunes(virCapsHostPtr caps, + virDomainCachetunePtr cachetune, + unsigned char* uuid, pid_t *pids, int npid); +int virResctrlRemoveCachetunes(unsigned char* uuid); +#endif diff --git a/tests/Makefile.am b/tests/Makefile.am index 19986dc..e0b9923 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -229,6 +229,7 @@ if WITH_LINUX test_programs += fchosttest test_programs += scsihosttest test_programs += vircaps2xmltest +test_programs += virresctrltest test_libraries += virusbmock.la \ virnetdevbandwidthmock.la \ virnumamock.la \ @@ -1149,8 +1150,13 @@ virnumamock_la_CFLAGS = $(AM_CFLAGS) virnumamock_la_LDFLAGS = $(MOCKLIBS_LDFLAGS) virnumamock_la_LIBADD = $(MOCKLIBS_LIBS) +virresctrltest_SOURCES = \ + virresctrltest.c testutils.h testutils.c virfilewrapper.h virfilewrapper.c +virresctrltest_LDADD = $(LDADDS) + else ! WITH_LINUX -EXTRA_DIST += vircaps2xmltest.c virnumamock.c virfilewrapper.c virfilewrapper.h +EXTRA_DIST += vircaps2xmltest.c virnumamock.c virfilewrapper.c \ + virfilewrapper.h virresctrltest.c endif ! WITH_LINUX if WITH_NSS diff --git a/tests/virresctrldata/L3-free.schemata b/tests/virresctrldata/L3-free.schemata new file mode 100644 index 0000000..9b47d25 --- /dev/null +++ b/tests/virresctrldata/L3-free.schemata @@ -0,0 +1 @@ +L3:0=1ffff;1=1ffff diff --git a/tests/virresctrldata/L3CODE-free.schemata b/tests/virresctrldata/L3CODE-free.schemata new file mode 100644 index 0000000..7039c45 --- /dev/null +++ b/tests/virresctrldata/L3CODE-free.schemata @@ -0,0 +1 @@ +L3CODE:0=cffff;1=cffff diff --git a/tests/virresctrldata/L3DATA-free.schemata b/tests/virresctrldata/L3DATA-free.schemata new file mode 100644 index 0000000..30f1cbd --- /dev/null +++ b/tests/virresctrldata/L3DATA-free.schemata @@ -0,0 +1 @@ +L3DATA:0=3ffff;1=3ffff diff --git a/tests/virresctrldata/linux-resctrl b/tests/virresctrldata/linux-resctrl new file mode 120000 index 0000000..069dfb2 --- /dev/null +++ b/tests/virresctrldata/linux-resctrl @@ -0,0 +1 @@ +../vircaps2xmldata/linux-resctrl \ No newline at end of file diff --git a/tests/virresctrldata/linux-resctrl-cdp b/tests/virresctrldata/linux-resctrl-cdp new file mode 120000 index 0000000..c5a973f --- /dev/null +++ b/tests/virresctrldata/linux-resctrl-cdp @@ -0,0 +1 @@ +../vircaps2xmldata/linux-resctrl-cdp \ No newline at end of file diff --git a/tests/virresctrltest.c b/tests/virresctrltest.c new file mode 100644 index 0000000..a9d75f1 --- /dev/null +++ b/tests/virresctrltest.c @@ -0,0 +1,119 @@ +/* + * Copyright (C) Intel, Inc. 2017 + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#include <config.h> +#include <stdlib.h> + +#include "testutils.h" +#include "virbitmap.h" +#include "virfilewrapper.h" +#include "virresctrl.h" + + +#define VIR_FROM_THIS VIR_FROM_NONE + +struct virResctrlData { + const char *filename; + virResctrlType type; +}; + +static void +GetSchemataStr(virResctrlSchemataPtr schemata, char **str) +{ + size_t i; + + virBuffer buf = VIR_BUFFER_INITIALIZER; + virBufferAsprintf(&buf, "%s:%u=%s", + virResctrlTypeToString(schemata->type), + schemata->masks[0]->cache_id, + virResctrlBitmap2String(schemata->masks[0]->mask)); + + for (i = 1; i < schemata->n_masks; i ++) + virBufferAsprintf(&buf, ";%u=%s", + schemata->masks[i]->cache_id, + virResctrlBitmap2String(schemata->masks[i]->mask)); + + *str = virBufferContentAndReset(&buf); +} + +static int +test_virResctrl(const void *opaque) +{ + struct virResctrlData *data = (struct virResctrlData *) opaque; + char *dir = NULL; + char *resctrl = NULL; + int ret = -1; + virResctrlSchemataPtr schemata = NULL; + char *schemata_str = NULL; + char *schemata_file; + + if (virAsprintf(&resctrl, "%s/virresctrldata/linux-%s/resctrl", + abs_srcdir, data->filename) < 0) + goto cleanup; + + if (virAsprintf(&schemata_file, "%s/virresctrldata/%s-free.schemata", + abs_srcdir, virResctrlTypeToString(data->type)) < 0) + goto cleanup; + + if (virFileWrapperAddPrefix("/sys/fs/resctrl", resctrl) < 0) + goto cleanup; + + if ((schemata = virResctrlGetFreeCache(data->type)) == NULL) + goto cleanup; + + virFileWrapperClearPrefixes(); + + GetSchemataStr(schemata, &schemata_str); + + if (virTestCompareToFile(schemata_str, schemata_file) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(dir); + VIR_FREE(resctrl); + VIR_FREE(schemata_str); + virResctrlFreeSchemata(schemata); + return ret; +} + +static int +mymain(void) +{ + int ret = 0; + +#define DO_TEST_FULL(filename, type) \ + do { \ + struct virResctrlData data = {filename, \ + type}; \ + if (virTestRun(filename, test_virResctrl, &data) < 0) \ + ret = -1; \ + } while (0) + + DO_TEST_FULL("resctrl", VIR_RESCTRL_TYPE_L3); + DO_TEST_FULL("resctrl-cdp", VIR_RESCTRL_TYPE_L3_CODE); + DO_TEST_FULL("resctrl-cdp", VIR_RESCTRL_TYPE_L3_DATA); + + return ret; +} + +VIR_TEST_MAIN(mymain) -- 1.9.1

hello, ping On Monday, 12 June 2017 at 5:48 PM, Eli Qiao wrote:
This patch adds 3 major private interface.
virResctrlGetFreeCache: return free cache, default cache substract cache allocated. virResctrlSetCachetunes: set cache banks which defined in a domain. virResctrlRemoveCachetunes: remove cache allocation group from the host.
There's some existed issue when do syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h) in util/virresctrl.h. --- include/libvirt/virterror.h | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 9 + src/qemu/qemu_process.c | 54 ++ src/util/virerror.c | 1 + src/util/virresctrl.c | 851 ++++++++++++++++++++++++++++++ src/util/virresctrl.h | 79 +++ tests/Makefile.am (http://Makefile.am) | 8 +- tests/virresctrldata/L3-free.schemata | 1 + tests/virresctrldata/L3CODE-free.schemata | 1 + tests/virresctrldata/L3DATA-free.schemata | 1 + tests/virresctrldata/linux-resctrl | 1 + tests/virresctrldata/linux-resctrl-cdp | 1 + tests/virresctrltest.c | 119 +++++ 14 files changed, 1127 insertions(+), 1 deletion(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h create mode 100644 tests/virresctrldata/L3-free.schemata create mode 100644 tests/virresctrldata/L3CODE-free.schemata create mode 100644 tests/virresctrldata/L3DATA-free.schemata create mode 120000 tests/virresctrldata/linux-resctrl create mode 120000 tests/virresctrldata/linux-resctrl-cdp create mode 100644 tests/virresctrltest.c
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..4bc0c74 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum {
VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resctrl */
# ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/src/Makefile.am (http://Makefile.am) b/src/Makefile.am (http://Makefile.am) index eae32dc..8dbb778 100644 --- a/src/Makefile.am (http://Makefile.am) +++ b/src/Makefile.am (http://Makefile.am) @@ -167,6 +167,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b6c828f..7392cfa 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2440,6 +2440,15 @@ virRandomGenerateWWN; virRandomInt;
+# util/virresctrl.h +virResctrlBitmap2String; +virResctrlFreeSchemata; +virResctrlGetFreeCache; +virResctrlRemoveCachetunes; +virResctrlSetCachetunes; +virResctrlTypeToString; + + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4a66f0d..8efeb19 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -70,6 +70,7 @@ #include "virbitmap.h" #include "viratomic.h" #include "virnuma.h" +#include "virresctrl.h" #include "virstring.h" #include "virhostdev.h" #include "secret_util.h" @@ -5088,6 +5089,51 @@ qemuProcessSetupVcpus(virDomainObjPtr vm) return 0; }
+static int +qemuProcessSetCacheBanks(virCapsHostPtr caps, virDomainObjPtr vm) +{ + size_t i, j; + virDomainCachetunePtr cachetune; + unsigned int max_vcpus = virDomainDefGetVcpusMax(vm->def); + pid_t *pids = NULL; + virDomainVcpuDefPtr vcpu; + size_t npid = 0; + size_t count = 0; + int ret = -1; + + cachetune = &(vm->def->cachetune); + + for (i = 0; i < cachetune->n_banks; i++) { + if (cachetune->cache_banks[i].vcpus) { + for (j = 0; j < max_vcpus; j++) { + if (virBitmapIsBitSet(cachetune->cache_banks[i].vcpus, j)) { + + vcpu = virDomainDefGetVcpu(vm->def, j); + if (!vcpu->online) + continue; + + if (VIR_RESIZE_N(pids, npid, count, 1) < 0) + goto cleanup; + pids[count ++] = qemuDomainGetVcpuPid(vm, j); + } + } + } + } + + /* If not specify vcpus in cachetune, add vm->pid */ + if (pids == NULL) { + if (VIR_ALLOC_N(pids, 1) < 0) + goto cleanup; + pids[0] = vm->pid; + count = 1; + } + ret = virResctrlSetCachetunes(caps, cachetune, vm->def->uuid, pids, count); + + cleanup: + VIR_FREE(pids); + return ret; +} +
int qemuProcessSetupIOThread(virDomainObjPtr vm, @@ -5914,6 +5960,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuProcessAutoDestroyAdd(driver, vm, conn) < 0) goto cleanup;
+ VIR_WARN("Cache allocation"); + if (qemuProcessSetCacheBanks(&(driver->caps->host), + vm) < 0) + goto cleanup; + ret = 0;
cleanup: @@ -6419,6 +6470,9 @@ void qemuProcessStop(virQEMUDriverPtr driver, virPerfFree(priv->perf); priv->perf = NULL;
+ if (&(vm->def->cachetune) != NULL) + virResctrlRemoveCachetunes(vm->def->uuid); + qemuProcessRemoveDomainStatus(driver, vm);
/* Remove VNC and Spice ports from port reservation bitmap, but only if diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..02fabcc 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resource Control", )
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..89bc43e --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,851 @@ +/* + * virresctrl.c: methods for managing resource control + * + * Copyright (C) 2017 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#include <config.h> +#include <fcntl.h> +#include <sys/file.h> +#include <sys/stat.h> +#include <sys/types.h> + +#include "virresctrl.h" +#include "virerror.h" +#include "virlog.h" +#include "viralloc.h" +#include "virstring.h" +#include "virfile.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL +#define SYSFS_RESCTRL_PATH "/sys/fs/resctrl" +#define MAX_CBM_LEN 20 +#define VIR_RESCTRL_LOCK(fd, op) flock(fd, op) +#define VIR_RESCTRL_UNLOCK(fd) flock(fd, LOCK_UN) +#define CONSTRUCT_RESCTRL_PATH(domain_name, item_name) \ +do { \ + if (NULL == domain_name) { \ + if (virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, item_name) < 0) \ + return -1; \ + } else { \ + if (virAsprintf(&path, "%s/%s/%s", SYSFS_RESCTRL_PATH, domain_name, \ + item_name) < 0) \ + return -1; \ + } \ +} while (0) + +VIR_ENUM_IMPL(virResctrl, VIR_RESCTRL_TYPE_LAST, + "L3", + "L3CODE", + "L3DATA") + +/** + * a virResctrlGroup represents a resource control group, it's a directory + * under /sys/fs/resctrl. + * e.g. /sys/fs/resctrl/CG1 + * |-- cpus + * |-- schemata + * `-- tasks + * # cat schemata + * L3DATA:0=fffff;1=fffff + * L3CODE:0=fffff;1=fffff + * + * Besides, it can also represent the default resource control group of the + * host. + */ + +typedef struct _virResctrlGroup virResctrlGroup; +typedef virResctrlGroup *virResctrlGroupPtr; +struct _virResctrlGroup { + char *name; /* resource group name, NULL for default host group */ + size_t n_tasks; /* number of tasks assigned to the resource group */ + char **tasks; /* task id list */ + virResctrlSchemataPtr schemata[VIR_RESCTRL_TYPE_LAST]; /* Array for schemata */ +}; + +/* All resource control groups on this host, including default resource group */ +typedef struct _virResctrlHost virResctrlHost; +typedef virResctrlHost *virResctrlHostPtr; +struct _virResctrlHost { + size_t n_groups; /* number of resource control group */ + virResctrlGroupPtr *groups; /* list of resource control group */ +}; + +void +virResctrlFreeSchemata(virResctrlSchemataPtr ptr) +{ + size_t i; + + if (!ptr) + return; + + for (i = 0; i < ptr->n_masks; i++) { + virBitmapFree(ptr->masks[i]->mask); + VIR_FREE(ptr->masks[i]); + } + + VIR_FREE(ptr); + ptr = NULL; +} + +static void +virResctrlFreeGroup(virResctrlGroupPtr ptr) +{ + size_t i; + + if (!ptr) + return; + + for (i = 0; i < ptr->n_tasks; i++) + VIR_FREE(ptr->tasks[i]); + VIR_FREE(ptr->name); + + for (i = 0; i < VIR_RESCTRL_TYPE_LAST; i++) + virResctrlFreeSchemata(ptr->schemata[i]); + + VIR_FREE(ptr); + ptr = NULL; +} + +/* Return specify type of schemata string from schematalval. + e.g., 0=f;1=f */ +static int +virResctrlGetSchemataString(virResctrlType type, + const char *schemataval, + char **schematastr) +{ + int rc = -1; + char *prefix = NULL; + char **lines = NULL; + + if (virAsprintf(&prefix, + "%s:", + virResctrlTypeToString(type)) < 0) + return -1; + + lines = virStringSplit(schemataval, "\n", 0); + + if (VIR_STRDUP(*schematastr, + virStringListGetFirstWithPrefix(lines, prefix)) < 0) + goto cleanup; + + if (*schematastr == NULL) + rc = -1; + else + rc = 0; + + cleanup: + VIR_FREE(prefix); + virStringListFree(lines); + return rc; +} + +static int +virResctrlRemoveSysGroup(const char* name) +{ + char *path = NULL; + int ret = -1; + + if ((ret = virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, name)) < 0) + return ret; + + ret = rmdir(path); + + VIR_FREE(path); + return ret; +} + +static int +virResctrlNewSysGroup(const char *name) +{ + char *path = NULL; + int ret = -1; + mode_t mode = 0755; + + if (virAsprintf(&path, "%s/%s", SYSFS_RESCTRL_PATH, name) < 0) + return -1; + + if (virDirCreate(path, mode, 0, 0, 0) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int +virResctrlWrite(const char *name, const char *item, const char *content) +{ + char *path; + int writefd; + int rc = -1; + + CONSTRUCT_RESCTRL_PATH(name, item); + + if (!virFileExists(path)) + goto cleanup; + + if ((writefd = open(path, O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR)) < 0) + goto cleanup; + + if (safewrite(writefd, content, strlen(content)) < 0) + goto cleanup; + rc = 0; + + cleanup: + VIR_FREE(path); + VIR_FORCE_CLOSE(writefd); + return rc; +} + +static +virBitmapPtr virResctrlMask2Bitmap(const char *mask) +{ + virBitmapPtr bitmap; + unsigned int tmp; + size_t i; + + if (virStrToLong_ui(mask, NULL, 16, &tmp) < 0) + return NULL; + + bitmap = virBitmapNewEmpty(); + + for (i = 0; i < MAX_CBM_LEN; i++) { + if (((tmp & 0x1) == 0x1) && + (virBitmapSetBitExpand(bitmap, i) < 0)) + goto error; + tmp = tmp >> 1; + } + return bitmap; + + error: + virBitmapFree(bitmap); + return NULL; +} + +char *virResctrlBitmap2String(virBitmapPtr bitmap) +{ + char *tmp; + char *ret = NULL; + char *p; + tmp = virBitmapString(bitmap); + /* skip "0x" */ + p = tmp + 2; + + /* first non-0 position */ + while (*++p == '0'); + + if (VIR_STRDUP(ret, p) < 0) + ret = NULL; + + VIR_FREE(tmp); + return ret; +} + +static int +virResctrlParseSchemata(const char* schemata_str, + virResctrlSchemataPtr schemata) +{ + VIR_DEBUG("schemata_str=%s, schemata=%p", schemata_str, schemata); + + int ret = -1; + size_t i; + virResctrlMaskPtr mask; + char **schemata_list; + char *mask_str; + + /* parse 0=fffff;1=f */ + schemata_list = virStringSplit(schemata_str, ";", 0); + + if (!schemata_list) + goto cleanup; + + for (i = 0; schemata_list[i] != NULL; i++) { + /* parse 0=fffff */ + mask_str = strchr(schemata_list[i], '='); + + if (!mask_str) + goto cleanup; + + if (VIR_ALLOC(mask) < 0) + goto cleanup; + + mask->cache_id = i; + mask->mask = virResctrlMask2Bitmap(mask_str + 1); + schemata->n_masks += 1; + schemata->masks[i] = mask; + + } + ret = 0; + + cleanup: + virStringListFree(schemata_list); + return ret; +} + +static int +virResctrlLoadGroup(const char *name, + virResctrlHostPtr host) +{ + VIR_DEBUG("name=%s, host=%p\n", name, host); + + int ret = -1; + char *schemataval = NULL; + char *schemata_str = NULL; + virResctrlType i; + int rv; + virResctrlGroupPtr grp; + virResctrlSchemataPtr schemata; + + rv = virFileReadValueString(&schemataval, + SYSFS_RESCTRL_PATH "/%s/schemata", + name ? name : ""); + + if (rv < 0) + return -1; + + if (VIR_ALLOC(grp) < 0) + goto cleanup; + + if (VIR_STRDUP(grp->name, name) < 0) + goto cleanup; + + for (i = 0; i < VIR_RESCTRL_TYPE_LAST; i++) { + rv = virResctrlGetSchemataString(i, schemataval, &schemata_str); + + if (rv < 0) + continue; + + if (VIR_ALLOC(schemata) < 0) + goto cleanup; + + schemata->type = i; + + if (virResctrlParseSchemata(schemata_str, schemata) < 0) { + VIR_FREE(schemata); + VIR_FREE(schemata_str); + goto cleanup; + } + + grp->schemata[i] = schemata; + VIR_FREE(schemata_str); + } + + if (VIR_APPEND_ELEMENT(host->groups, + host->n_groups, + grp) < 0) { + virResctrlFreeGroup(grp); + goto cleanup; + } + + ret = 0; + + cleanup: + VIR_FREE(schemataval); + return ret; +} + +static int +virResctrlLoadHost(virResctrlHostPtr host) +{ + int rv = -1; + DIR *dirp = NULL; + char *path = NULL; + struct dirent *ent; + + rv = virDirOpenIfExists(&dirp, SYSFS_RESCTRL_PATH); + if (rv < 0) + return -1; + + /* load default group first */ + if (virResctrlLoadGroup(NULL, host) < 0) + return -1; + + while ((rv = virDirRead(dirp, &ent, path)) > 0) { + /* resctrl is not hierarchical, only read directory under + /sys/fs/resctrl */ + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + + if (virResctrlLoadGroup(ent->d_name, host) < 0) + return -1; + } + return 0; +} + +static void +virResctrlRefreshHost(virResctrlHostPtr host) +{ + virResctrlGroupPtr default_grp = NULL; + virResctrlSchemataPtr schemata = NULL; + size_t i, j; + virResctrlType t; + + default_grp = host->groups[0]; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (default_grp->schemata[t] != NULL) { + for (i = 0; i < default_grp->schemata[t]->n_masks; i++) { + /* Reset default group's mask */ + virBitmapSetAll(default_grp->schemata[t]->masks[i]->mask); + /* Loop each other resource group except default group */ + for (j = 1; j < host->n_groups; j++) { + schemata = host->groups[j]->schemata[t]; + virBitmapSubtract(default_grp->schemata[t]->masks[i]->mask, + schemata->masks[i]->mask); + } + } + } + } +} + +static virResctrlGroupPtr +virResctrlGetFreeGroup(void) +{ + size_t i; + virResctrlHostPtr host = NULL; + virResctrlGroupPtr grp = NULL; + + if (VIR_ALLOC(host) < 0) + return NULL; + + if (virResctrlLoadHost(host) < 0) + goto error; + + virResctrlRefreshHost(host); + + for (i = 1; i < host->n_groups; i++) + virResctrlFreeGroup(host->groups[i]); + + grp = host->groups[0]; + VIR_FREE(host); + + return grp; + + error: + virResctrlFreeGroup(grp); + return NULL; +} + +virResctrlSchemataPtr +virResctrlGetFreeCache(virResctrlType type) +{ + VIR_DEBUG("type=%d", type); + + virResctrlType t; + virResctrlGroupPtr grp = NULL; + virResctrlSchemataPtr schemata = NULL; + int lockfd = -1; + + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + return NULL; + + VIR_RESCTRL_LOCK(lockfd, LOCK_SH); + + if ((grp = virResctrlGetFreeGroup()) == NULL) + goto cleanup; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (t == type) + schemata = grp->schemata[t]; + else + virResctrlFreeSchemata(grp->schemata[t]); + + } + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + return schemata; +} + +static int +virResctrlCalculateCbm(int cbm_len, + virBitmapPtr defaultcbm, + virBitmapPtr newcbm) +{ + VIR_DEBUG("cbm_len=%d, defaultcbm=%p, newcbm=%p", + cbm_len, defaultcbm, newcbm); + + ssize_t pos = -1; + size_t i; + + /* not enough cache way to be allocated */ + if (virBitmapCountBits(defaultcbm) < cbm_len + 1) + return -1; + + while ((pos = virBitmapNextSetBit(defaultcbm, pos)) >= 0) { + for (i = 0; i < cbm_len; i++) + ignore_value(virBitmapSetBitExpand(newcbm, i + pos)); + /* Test if newcbm is sub set of defaultcbm */ + if (virBitmapNextClearBit(defaultcbm, pos) > i + pos) { + break; + } else { + pos = pos + i - 1; + virBitmapClearAll(newcbm); + } + } + + if (virBitmapCountBits(newcbm) != cbm_len) + return -1; + + /* consume default cbm after allocation */ + virBitmapSubtract(defaultcbm, newcbm); + + return 0; +} + +/* Fill mask value for newly created resource group base on hostcachebank + * and domcachebank */ +static int +virResctrlFillMask(virResctrlGroupPtr grp, + virResctrlGroupPtr free_grp, + virCapsHostCacheBankPtr hostcachebank, + virDomainCacheBankPtr domcachebank) +{ + VIR_DEBUG("grp=%p, free_grp=%p, hostcachebank=%p, domcachebank=%p", + grp, free_grp, hostcachebank, domcachebank); + + size_t i; + int cbm_candidate_len; + unsigned int cache_id; + unsigned int cache_type; + virCapsHostCacheControlPtr control = NULL; + virResctrlMaskPtr mask; + virResctrlSchemataPtr schemata = NULL; + + /* Find control information for that kind type of cache */ + for (i = 0; i < hostcachebank->ncontrols; i++) { + if (hostcachebank->controls[i]->scope == domcachebank->type) { + control = hostcachebank->controls[i]; + break; + } + } + + if (control == NULL) + return -1; + + cache_type = domcachebank->type; + cache_id = domcachebank->cache_id; + schemata = grp->schemata[cache_type]; + + if ((schemata == NULL) && (VIR_ALLOC(schemata) < 0)) + return -1; + + if (VIR_ALLOC(mask) < 0) + return -1; + + mask->cache_id = cache_id; + mask->mask = virBitmapNewEmpty(); + + /* here should be control->granularity and control->min + also domcachebank size should be checked while define domain xml */ + cbm_candidate_len = domcachebank->size / control->min; + VIR_DEBUG("cbm_len = %d", cbm_candidate_len); + if (virResctrlCalculateCbm(cbm_candidate_len, + free_grp->schemata[cache_type]->masks[cache_id]->mask, + mask->mask) < 0) + goto error; + + schemata->type = cache_type; + schemata->n_masks += 1; + schemata->masks[cache_id] = mask; + grp->schemata[cache_type] = schemata; + + return 0; + + error: + VIR_FREE(schemata); + return -1; +} + +/* only keep the highest consecutive bits */ +static void +virResctrlTrimMask(virResctrlMaskPtr mask) +{ + size_t i; + ssize_t setbit = -1; + ssize_t clearbit = -1; + + clearbit = virBitmapNextClearBit(mask->mask, -1); + setbit = virBitmapNextSetBit(mask->mask, -1); + for (i = setbit; i < clearbit; i++) + ignore_value(virBitmapClearBit(mask->mask, i)); +} + +static int +virResctrlCompleteMask(virResctrlSchemataPtr schemata, + virResctrlSchemataPtr defaultschemata) +{ + size_t i; + virResctrlMaskPtr mask; + + if (schemata == NULL && VIR_ALLOC(schemata) < 0) + return -1; + + if (schemata->n_masks == defaultschemata->n_masks) + return 0; + + for (i = 0; i < defaultschemata->n_masks; i++) { + if (schemata->masks[i] == NULL) { + if (VIR_ALLOC(mask) < 0) + goto error; + + mask->cache_id = i; + mask->mask = virBitmapNewEmpty(); + schemata->n_masks += 1; + schemata->masks[i] = mask; + /* resctrl doesn't allow mask to be zero + use higher bits to fill up the cbm which + domaincache bank doens't provide */ + ignore_value(virBitmapSetBitExpand(mask->mask, + virBitmapLastSetBit(defaultschemata->masks[i]->mask))); + } + /* only keep the highest consecutive bits for default group */ + virResctrlTrimMask(defaultschemata->masks[i]); + } + + return 0; + + error: + VIR_FREE(schemata); + return -1; +} + +/* complete the schemata in the resrouce group before it can be write back + to resctrl */ +static int +virResctrlCompleteGroup(virResctrlGroupPtr grp, + virResctrlGroupPtr default_grp) +{ + virResctrlType t; + virResctrlSchemataPtr schemata; + virResctrlSchemataPtr defaultschemata; + + + /* NOTES: resctrl system require we need provide all cache's cbm mask */ + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + defaultschemata = default_grp->schemata[t]; + if (defaultschemata != NULL) { + schemata = grp->schemata[t]; + if (virResctrlCompleteMask(schemata, defaultschemata) < 0) + return -1; + /* only keep the highest consecutive bits for default group */ + } + } + return 0; +} + +static +char *virResctrlGetSchemataStr(virResctrlSchemataPtr schemata) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + size_t i; + + virBufferAsprintf(&buf, "%s:%u=%s", + virResctrlTypeToString(schemata->type), + schemata->masks[0]->cache_id, + virResctrlBitmap2String(schemata->masks[0]->mask)); + + for (i = 1; i < schemata->n_masks; i ++) + virBufferAsprintf(&buf, ";%u=%s", + schemata->masks[i]->cache_id, + virResctrlBitmap2String(schemata->masks[i]->mask)); + + return virBufferContentAndReset(&buf); +} + +static int +virResctrlFlushGroup(virResctrlGroupPtr grp) +{ + int ret = -1; + size_t i; + char *schemata_str = NULL; + virResctrlType t; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + VIR_DEBUG("grp=%p", grp); + + if (grp->name != NULL && virResctrlNewSysGroup(grp->name) < 0) + return -1; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + if (grp->schemata[t] != NULL) { + schemata_str = virResctrlGetSchemataStr(grp->schemata[t]); + virBufferAsprintf(&buf, "%s\n", schemata_str); + VIR_FREE(schemata_str); + } + } + + schemata_str = virBufferContentAndReset(&buf); + + if (virResctrlWrite(grp->name, "schemata", schemata_str) < 0) + goto cleanup; + + for (i = 0; i < grp->n_tasks; i++) { + if (virResctrlWrite(grp->name, "tasks", grp->tasks[i]) < 0) + goto cleanup; + } + + ret = 0; + + cleanup: + VIR_FREE(schemata_str); + return ret; +} + +int virResctrlSetCachetunes(virCapsHostPtr caps, + virDomainCachetunePtr cachetune, + unsigned char* uuid, pid_t *pids, int npid) +{ + size_t i; + size_t j; + int ret = -1; + char name[VIR_UUID_STRING_BUFLEN]; + char *tmp; + int lockfd = -1; + virResctrlGroupPtr grp = NULL; + virResctrlGroupPtr default_grp = NULL; + virCapsHostCacheBankPtr hostcachebank; + virDomainCacheBankPtr domcachebank; + + virUUIDFormat(uuid, name); + + if (cachetune->n_banks < 1) + return 0; + + /* create new resource group */ + if (VIR_ALLOC(grp) < 0) + goto error; + + if (VIR_STRDUP(grp->name, name) < 0) + goto error; + + /* allocate file lock */ + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + goto error; + + VIR_RESCTRL_LOCK(lockfd, LOCK_EX); + + if ((default_grp = virResctrlGetFreeGroup()) == NULL) + goto error; + + /* Allocate cache for each cache bank defined in cache tune */ + for (i = 0; i < cachetune->n_banks; i++) { + domcachebank = &cachetune->cache_banks[i]; + hostcachebank = NULL; + /* find the host cache bank to be allocated on */ + for (j = 0; j < caps->ncaches; j++) { + if (caps->caches[j]->id == domcachebank->cache_id) { + hostcachebank = caps->caches[j]; + break; + } + } + /* fill up newly crated grp and consume from default_grp */ + if (virResctrlFillMask(grp, default_grp, hostcachebank, domcachebank) < 0) + goto error; + } + + /* Add tasks to grp */ + for (i = 0; i < npid; i++) { + if (virAsprintf(&tmp, "%llu", (long long)pids[i]) < 0) + goto error; + + if (VIR_APPEND_ELEMENT(grp->tasks, + grp->n_tasks, + tmp) < 0) { + VIR_FREE(tmp); + goto error; + } + } + + if (virResctrlCompleteGroup(grp, default_grp) < 0) { + VIR_WARN("Failed to complete group"); + goto error; + } + + if (virResctrlFlushGroup(grp) < 0) + goto error; + + if (virResctrlFlushGroup(default_grp) < 0) { + virResctrlRemoveSysGroup(grp->name); + goto error; + } + + ret = 0; + + error: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + virResctrlFreeGroup(grp); + virResctrlFreeGroup(default_grp); + return ret; +} + +int virResctrlRemoveCachetunes(unsigned char* uuid) +{ + int ret = -1; + int lockfd = -1; + size_t i; + virResctrlType t; + virResctrlSchemataPtr schemata; + char name[VIR_UUID_STRING_BUFLEN]; + virResctrlGroupPtr default_grp = NULL; + + virUUIDFormat(uuid, name); + + VIR_DEBUG("name=%s", name); + + lockfd = open(SYSFS_RESCTRL_PATH, O_DIRECTORY); + if (lockfd < 0) + return -1; + + VIR_RESCTRL_LOCK(lockfd, LOCK_SH); + + if (virResctrlRemoveSysGroup(name) < 0) + goto cleanup; + + if ((default_grp = virResctrlGetFreeGroup()) == NULL) + goto cleanup; + + for (t = 0; t < VIR_RESCTRL_TYPE_LAST; t++) { + schemata = default_grp->schemata[t]; + if (schemata != NULL) { + for (i = 0; i < schemata->n_masks; i++) + virResctrlTrimMask(schemata->masks[i]); + } + } + + if (virResctrlFlushGroup(default_grp) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + VIR_FORCE_CLOSE(lockfd); + return ret; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..7373c72 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,79 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2017 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +#include "virutil.h" +#include "virbitmap.h" +#include "conf/domain_conf.h" + +#define MAX_CACHE_ID 16 + +typedef enum { + VIR_RESCTRL_TYPE_L3, + VIR_RESCTRL_TYPE_L3_CODE, + VIR_RESCTRL_TYPE_L3_DATA, + + VIR_RESCTRL_TYPE_LAST +} virResctrlType; + +VIR_ENUM_DECL(virResctrl); + +/* + * a virResctrlMask represents one of mask object in a + * resource control group. + * e.g., 0=f + */ +typedef struct _virResctrlMask virResctrlMask; +typedef virResctrlMask *virResctrlMaskPtr; +struct _virResctrlMask { + unsigned int cache_id; /* cache resource id */ + virBitmapPtr mask; /* the cbm mask */ +}; + +/* + * a virResctrlSchemata represents schemata objects of specific type of + * resource in a resource control group. + * eg: L3:0=f,1=ff + */ +typedef struct _virResctrlSchemata virResctrlSchemata; +typedef virResctrlSchemata *virResctrlSchemataPtr; +struct _virResctrlSchemata { + virResctrlType type; /* resource control type, e.g., L3 */ + size_t n_masks; /* number of masks */ + virResctrlMaskPtr masks[MAX_CACHE_ID]; /* array of mask, use array for easy index */ +}; + +/* Get free cache of the host, result saved in schemata */ +virResctrlSchemataPtr virResctrlGetFreeCache(virResctrlType type); + +/* Get mask string from Bitmap */ +char *virResctrlBitmap2String(virBitmapPtr bitmap); + +void virResctrlFreeSchemata(virResctrlSchemataPtr ptr); +int virResctrlSetCachetunes(virCapsHostPtr caps, + virDomainCachetunePtr cachetune, + unsigned char* uuid, pid_t *pids, int npid); +int virResctrlRemoveCachetunes(unsigned char* uuid); +#endif diff --git a/tests/Makefile.am (http://Makefile.am) b/tests/Makefile.am (http://Makefile.am) index 19986dc..e0b9923 100644 --- a/tests/Makefile.am (http://Makefile.am) +++ b/tests/Makefile.am (http://Makefile.am) @@ -229,6 +229,7 @@ if WITH_LINUX test_programs += fchosttest test_programs += scsihosttest test_programs += vircaps2xmltest +test_programs += virresctrltest test_libraries += virusbmock.la (http://virusbmock.la) \ virnetdevbandwidthmock.la (http://virnetdevbandwidthmock.la) \ virnumamock.la (http://virnumamock.la) \ @@ -1149,8 +1150,13 @@ virnumamock_la_CFLAGS = $(AM_CFLAGS) virnumamock_la_LDFLAGS = $(MOCKLIBS_LDFLAGS) virnumamock_la_LIBADD = $(MOCKLIBS_LIBS)
+virresctrltest_SOURCES = \ + virresctrltest.c testutils.h testutils.c virfilewrapper.h virfilewrapper.c +virresctrltest_LDADD = $(LDADDS) + else ! WITH_LINUX -EXTRA_DIST += vircaps2xmltest.c virnumamock.c virfilewrapper.c virfilewrapper.h +EXTRA_DIST += vircaps2xmltest.c virnumamock.c virfilewrapper.c \ + virfilewrapper.h virresctrltest.c endif ! WITH_LINUX
if WITH_NSS diff --git a/tests/virresctrldata/L3-free.schemata b/tests/virresctrldata/L3-free.schemata new file mode 100644 index 0000000..9b47d25 --- /dev/null +++ b/tests/virresctrldata/L3-free.schemata @@ -0,0 +1 @@ +L3:0=1ffff;1=1ffff diff --git a/tests/virresctrldata/L3CODE-free.schemata b/tests/virresctrldata/L3CODE-free.schemata new file mode 100644 index 0000000..7039c45 --- /dev/null +++ b/tests/virresctrldata/L3CODE-free.schemata @@ -0,0 +1 @@ +L3CODE:0=cffff;1=cffff diff --git a/tests/virresctrldata/L3DATA-free.schemata b/tests/virresctrldata/L3DATA-free.schemata new file mode 100644 index 0000000..30f1cbd --- /dev/null +++ b/tests/virresctrldata/L3DATA-free.schemata @@ -0,0 +1 @@ +L3DATA:0=3ffff;1=3ffff diff --git a/tests/virresctrldata/linux-resctrl b/tests/virresctrldata/linux-resctrl new file mode 120000 index 0000000..069dfb2 --- /dev/null +++ b/tests/virresctrldata/linux-resctrl @@ -0,0 +1 @@ +../vircaps2xmldata/linux-resctrl \ No newline at end of file diff --git a/tests/virresctrldata/linux-resctrl-cdp b/tests/virresctrldata/linux-resctrl-cdp new file mode 120000 index 0000000..c5a973f --- /dev/null +++ b/tests/virresctrldata/linux-resctrl-cdp @@ -0,0 +1 @@ +../vircaps2xmldata/linux-resctrl-cdp \ No newline at end of file diff --git a/tests/virresctrltest.c b/tests/virresctrltest.c new file mode 100644 index 0000000..a9d75f1 --- /dev/null +++ b/tests/virresctrltest.c @@ -0,0 +1,119 @@ +/* + * Copyright (C) Intel, Inc. 2017 + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#include <config.h> +#include <stdlib.h> + +#include "testutils.h" +#include "virbitmap.h" +#include "virfilewrapper.h" +#include "virresctrl.h" + + +#define VIR_FROM_THIS VIR_FROM_NONE + +struct virResctrlData { + const char *filename; + virResctrlType type; +}; + +static void +GetSchemataStr(virResctrlSchemataPtr schemata, char **str) +{ + size_t i; + + virBuffer buf = VIR_BUFFER_INITIALIZER; + virBufferAsprintf(&buf, "%s:%u=%s", + virResctrlTypeToString(schemata->type), + schemata->masks[0]->cache_id, + virResctrlBitmap2String(schemata->masks[0]->mask)); + + for (i = 1; i < schemata->n_masks; i ++) + virBufferAsprintf(&buf, ";%u=%s", + schemata->masks[i]->cache_id, + virResctrlBitmap2String(schemata->masks[i]->mask)); + + *str = virBufferContentAndReset(&buf); +} + +static int +test_virResctrl(const void *opaque) +{ + struct virResctrlData *data = (struct virResctrlData *) opaque; + char *dir = NULL; + char *resctrl = NULL; + int ret = -1; + virResctrlSchemataPtr schemata = NULL; + char *schemata_str = NULL; + char *schemata_file; + + if (virAsprintf(&resctrl, "%s/virresctrldata/linux-%s/resctrl", + abs_srcdir, data->filename) < 0) + goto cleanup; + + if (virAsprintf(&schemata_file, "%s/virresctrldata/%s-free.schemata", + abs_srcdir, virResctrlTypeToString(data->type)) < 0) + goto cleanup; + + if (virFileWrapperAddPrefix("/sys/fs/resctrl", resctrl) < 0) + goto cleanup; + + if ((schemata = virResctrlGetFreeCache(data->type)) == NULL) + goto cleanup; + + virFileWrapperClearPrefixes(); + + GetSchemataStr(schemata, &schemata_str); + + if (virTestCompareToFile(schemata_str, schemata_file) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(dir); + VIR_FREE(resctrl); + VIR_FREE(schemata_str); + virResctrlFreeSchemata(schemata); + return ret; +} + +static int +mymain(void) +{ + int ret = 0; + +#define DO_TEST_FULL(filename, type) \ + do { \ + struct virResctrlData data = {filename, \ + type}; \ + if (virTestRun(filename, test_virResctrl, &data) < 0) \ + ret = -1; \ + } while (0) + + DO_TEST_FULL("resctrl", VIR_RESCTRL_TYPE_L3); + DO_TEST_FULL("resctrl-cdp", VIR_RESCTRL_TYPE_L3_CODE); + DO_TEST_FULL("resctrl-cdp", VIR_RESCTRL_TYPE_L3_DATA); + + return ret; +} + +VIR_TEST_MAIN(mymain) -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

On Tue, Jun 20, 2017 at 03:38:15PM +0800, Eli Qiao wrote:
hello, ping
BTW this ping sent from gmail is both plaintext and html at the same time. Not only is it treating Makefile.am as domain name, that's not that big of a deal, but because you didn't trim it down, you made 36K patch into 91K message (quite a bit more than the five bytes needed for just "ping\n".

On Mon, Jun 12, 2017 at 05:48:41PM +0800, Eli Qiao wrote:
This patch adds 3 major private interface.
virResctrlGetFreeCache: return free cache, default cache substract cache allocated. virResctrlSetCachetunes: set cache banks which defined in a domain. virResctrlRemoveCachetunes: remove cache allocation group from the host.
There's some existed issue when do syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h) in util/virresctrl.h.
Yes, util/ cannot depend on conf/ in libvirt due to various reasons. All the data you want to use in util/ need to be defined there. If that corresponds to some XML, the parsers and formatters must be in conf/. In rare cases, there might be need for two data structures, one in util/ and one in conf/. I don't think that's needed in this case.

On Tuesday, 20 June 2017 at 8:39 PM, Martin Kletzander wrote:
On Mon, Jun 12, 2017 at 05:48:41PM +0800, Eli Qiao wrote:
This patch adds 3 major private interface.
virResctrlGetFreeCache: return free cache, default cache substract cache allocated. virResctrlSetCachetunes: set cache banks which defined in a domain. virResctrlRemoveCachetunes: remove cache allocation group from the host.
There's some existed issue when do syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h) in util/virresctrl.h.
Yes, util/ cannot depend on conf/ in libvirt due to various reasons. All the data you want to use in util/ need to be defined there. If that corresponds to some XML, the parsers and formatters must be in conf/. In rare cases, there might be need for two data structures, one in util/ and one in conf/. I don't think that's needed in this case.
I can move the virDomainCacheBank definition to util/virresctrl.h but what about the virCapsHostPtr, we need the host cache information and resctrl information it’s defined in src/conf/capabilities.h” Do we need to define another copy in virresctrl.h ?
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Jun 21, 2017 at 02:07:15PM +0800, Eli Qiao wrote:
On Tuesday, 20 June 2017 at 8:39 PM, Martin Kletzander wrote:
On Mon, Jun 12, 2017 at 05:48:41PM +0800, Eli Qiao wrote:
This patch adds 3 major private interface.
virResctrlGetFreeCache: return free cache, default cache substract cache allocated. virResctrlSetCachetunes: set cache banks which defined in a domain. virResctrlRemoveCachetunes: remove cache allocation group from the host.
There's some existed issue when do syntax-check as I reference the cache tune and cachebank definition (from conf/domain_conf.h) in util/virresctrl.h.
Yes, util/ cannot depend on conf/ in libvirt due to various reasons. All the data you want to use in util/ need to be defined there. If that corresponds to some XML, the parsers and formatters must be in conf/. In rare cases, there might be need for two data structures, one in util/ and one in conf/. I don't think that's needed in this case.
I can move the virDomainCacheBank definition to util/virresctrl.h
but what about the virCapsHostPtr, we need the host cache information and resctrl information it’s defined in src/conf/capabilities.h”
Do we need to define another copy in virresctrl.h ?
You don't need to pass the whole structure of all the data. Can't the qemu function just do something like: virResctrlLock() foreach cachetune: region = virResctrlGetFreeRegion(size, type) foreach cachetune.vcpu: virResctrlSetRegion(vcpu.pid, region) Or like with some other tuning, you can have a function that determines the region when given vcpu and just call it for all vcpus. You can save the regions in the status XML, so that not only users can see it, but you can also reference them from that aforementioned function. Or you could have saved pairs of id: region, but I think that's not needed. That reminds me, unless referred to from somewhere, the cachetune doesn't even need the id. But basically, you don't need to pass the whole cachetune or any other structure. The code is very messy, check your pointers and don't compare references to NULLs. Read the diffs after yourself. I know it works for you, but the code needs to be readable as well.

On Wednesday, 21 June 2017 at 10:44 PM, Martin Kletzander wrote:
You don't need to pass the whole structure of all the data. Can't the qemu function just do something like:
virResctrlLock() foreach cachetune: region = virResctrlGetFreeRegion(size, type) foreach cachetune.vcpu: virResctrlSetRegion(vcpu.pid, region)
Sure, good suggestion, I will try to read what other tune does.
Or like with some other tuning, you can have a function that determines the region when given vcpu and just call it for all vcpus. You can save the regions in the status XML, so that not only users can see it, but you can also reference them from that aforementioned function. Or you could have saved pairs of id: region, but I think that's not needed.
That reminds me, unless referred to from somewhere, the cachetune doesn't even need the id.
But basically, you don't need to pass the whole cachetune or any other structure. The code is very messy, check your pointers and don't compare references to NULLs. Read the diffs after yourself. I know it works for you, but the code needs to be readable as well.
Forgive me I am not a qualified c programmer for years :(, I will try to refine them.

On Wednesday, 21 June 2017 at 10:44 PM, Martin Kletzander wrote:
n 21, 2017 at 02:07:15PM +0800, Eli Qiao wrote:
You don't need to pass the whole structure of all the data. Can't the qemu function just do something like:
virResctrlLock() foreach cachetune: region = virResctrlGetFreeRegion(size, type) foreach cachetune.vcpu: virResctrlSetRegion(vcpu.pid, region)
Or like with some other tuning, you can have a function that determines the region when given vcpu and just call it for all vcpus. You can save the regions in the status XML, so that not only users can see it, but you can also reference them from that aforementioned function. Or you could have saved pairs of id: region, but I think that's not needed.
That reminds me, unless referred to from somewhere, the cachetune doesn't even need the id.
But basically, you don't need to pass the whole cachetune or any other structure. The code is very messy, check your pointers and don't compare references to NULLs. Read the diffs after yourself. I know it works for you, but the code needs to be readable as well.
Forgive me to spam it again. I agree that not to pass whole cachetune , but it seems I need to define new struct or passing more paramter while setting a cachetune. 1. It’s not just so simple we only get the Region, and setRegion. We need to consume the llc cache from the “default” group on the host and then write back it to /sys/fs/resctrl/ and it’s better that we need to compute all the mask at same time because for multiple socket host, if user don’t specify cache allocation on all sockets, we need to complete the mask by the min bits. So I don’t think its a good solution to setcachetune in a look. Better to pass all domain cache tune to virrresctrl and let it calculate it at a time then write it back to /sys/fs/resctrl 2. for region = virResctrlGetFreeRegion(size, type) if host has multiple sockets, we can not decide which region only be size, type. Which is to say, we need the cache_id. 3. while doing cache allocation, we need to pass cache control information: min, granularity, maxAlloc. beyond that , we need cache_id (we can find by vcups list) type, size..
participants (3)
-
Eli Qiao
-
Eli Qiao
-
Martin Kletzander