[RFC PATCH 0/4] Added virtio-net RSS with eBPF support.

This series of RFC patches adds support for loading the RSS eBPF program and
passing it to QEMU. Comments and suggestions would be useful.

QEMU with vhost may work with RSS through eBPF. Loading eBPF requires
capabilities that libvirt can provide. The eBPF program and maps may be unique
to a particular QEMU build, so libvirt retrieves the eBPF object through QAPI.
For now, there is only the "RSS" eBPF object in QEMU; in the future there may
be others (e.g. network filters). That is why logic was added to libvirt to
load and store any eBPF object that QEMU provides via the QAPI schema.

For virtio-net RSS, the documented XML has not changed:

```
<interface type="network">
  <model type="virtio"/>
  <driver queues="4" rss="on" rss_hash_report="off"/>
</interface>
```

Simplified routine for RSS:
 * Libvirt retrieves the eBPF "RSS" object and loads it.
 * Libvirt passes the file descriptors to virtio-net with the property
   "ebpf_rss_fds" (the "rss" property should be "on" too).
 * If fds were provided, QEMU uses the eBPF RSS implementation.
 * If fds were not provided, QEMU tries to load eBPF RSS in its own context
   and use it.
 * If eBPF RSS was not loaded, QEMU uses "in-qemu" RSS (vhost not supported).

 meson.build                  |   6 ++
 meson_options.txt            |   1 +
 src/qemu/meson.build         |   1 +
 src/qemu/qemu_capabilities.c | 181 +++++++++++++++++++++++++++++++++++
 src/qemu/qemu_capabilities.h |   4 +
 src/qemu/qemu_command.c      |  53 ++++++++++
 src/qemu/qemu_domain.c       |   4 +
 src/qemu/qemu_domain.h       |   3 +
 src/qemu/qemu_interface.c    |  42 ++++++++
 src/qemu/qemu_interface.h    |   4 +
 src/qemu/qemu_monitor.c      |  23 +++++
 src/qemu/qemu_monitor.h      |   3 +
 src/qemu/qemu_monitor_json.c |  21 ++++
 src/qemu/qemu_monitor_json.h |   3 +
 14 files changed, 349 insertions(+)

-- 
2.42.0

Added code for monitor and monitor_json. The "request-ebpf" command returns
the eBPF binary object encoded in base64. The function qemuMonitorGetEbpf()
returns a decoded blob.

QEMU provides an eBPF object that can be loaded and passed back to it from
libvirt. QEMU requires the exact eBPF program/maps, so they can be retrieved
using QAPI. Loading an eBPF program requires administrative capabilities, so
libvirt may load it and pass it to the QEMU instance. For now, there is only
the "RSS" (Receive Side Scaling) eBPF program and maps for virtio-net.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 src/qemu/qemu_monitor.c      | 23 +++++++++++++++++++++++
 src/qemu/qemu_monitor.h      |  3 +++
 src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |  3 +++
 4 files changed, 50 insertions(+)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 320729f067..07596c78ee 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -4512,3 +4512,26 @@ qemuMonitorGetStatsByQOMPath(virJSONValue *arr,
 
     return NULL;
 }
+
+void *
+qemuMonitorGetEbpf(qemuMonitor *mon, const char *ebpfName, size_t *size)
+{
+    QEMU_CHECK_MONITOR_NULL(mon);
+    g_autoptr(virJSONValue) reply = NULL;
+    const char *ebpfBase64 = NULL;
+    void *ebpfObject = NULL;
+
+    if (ebpfName == NULL || size == NULL)
+        return NULL;
+
+    reply = qemuMonitorJSONGetEbpf(mon, ebpfName);
+
+    if (reply == NULL)
+        return NULL;
+
+    ebpfBase64 = virJSONValueObjectGetString(reply, "object");
+
+    ebpfObject = g_base64_decode(ebpfBase64, size);
+
+    return ebpfObject;
+}
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 6c590933aa..15f32f105c 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1579,3 +1579,6 @@ qemuMonitorExtractQueryStats(virJSONValue *info);
 virJSONValue *
 qemuMonitorGetStatsByQOMPath(virJSONValue *arr,
                              char *qom_path);
+
+void *
+qemuMonitorGetEbpf(qemuMonitor *mon, const char *ebpfName, size_t *size);
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 8152eea9a0..a7d0865ddc 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -8851,3 +8851,24 @@ qemuMonitorJSONQueryStats(qemuMonitor *mon,
 
     return virJSONValueObjectStealArray(reply, "return");
 }
+
+
+virJSONValue *
+qemuMonitorJSONGetEbpf(qemuMonitor *mon, const char *ebpfName)
+{
+    g_autoptr(virJSONValue) cmd = NULL;
+    g_autoptr(virJSONValue) reply = NULL;
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("request-ebpf",
+                                           "s:id", ebpfName, NULL)))
+        return NULL;
+
+    if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0)
+        return NULL;
+
+    /* return empty hash */
+    if (qemuMonitorJSONHasError(reply, "CommandNotFound"))
+        return NULL;
+
+    return virJSONValueObjectStealObject(reply, "return");
+}
diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h
index 06023b98ea..6a2fc963ba 100644
--- a/src/qemu/qemu_monitor_json.h
+++ b/src/qemu/qemu_monitor_json.h
@@ -825,3 +825,6 @@ qemuMonitorJSONQueryStats(qemuMonitor *mon,
                           qemuMonitorQueryStatsTargetType target,
                           char **vcpus,
                           GPtrArray *providers);
+
+virJSONValue *
+qemuMonitorJSONGetEbpf(qemuMonitor *mon, const char *ebpfName);
-- 
2.42.0

On Mon, Oct 09, 2023 at 09:16:11 +0300, Andrew Melnychenko wrote:
Added code for monitor and monitor_json. The "request-ebpf" return's eBPF binary object encoded in base64. The function qemuMonitorGetEbpf() returns a decoded blob.
QEMU provides eBPF that can be loaded and passed to it from Libvirt. QEMU requires exact eBPF program/maps, so it can be retrieved using QAPI. To load eBPF program - administrative capabilities are required, so Libvirt may load it and pass it to the QEMU instance. For now, there is only "RSS"(Receive Side Scaling) for virtio-net eBPF program and maps.
Please wrap the text in the commit message to the usual line length. Since it's just explanation, the sentences can be neatly wrapped. Long lines in commit message are acceptable only if it's impossible to split them or splitting them would destroy the content (e.g. do not split pasted output of commands).
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 src/qemu/qemu_monitor.c      | 23 +++++++++++++++++++++++
 src/qemu/qemu_monitor.h      |  3 +++
 src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |  3 +++
 4 files changed, 50 insertions(+)
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 320729f067..07596c78ee 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -4512,3 +4512,26 @@ qemuMonitorGetStatsByQOMPath(virJSONValue *arr,
     return NULL;
 }
See coding style problems mentioned in reply to cover letter.
+
+void *
+qemuMonitorGetEbpf(qemuMonitor *mon, const char *ebpfName, size_t *size)
+{
+    QEMU_CHECK_MONITOR_NULL(mon);
Fails to compile:

In file included from /usr/lib64/glib-2.0/include/glibconfig.h:9,
                 from /usr/include/glib-2.0/glib/gtypes.h:34,
                 from /usr/include/glib-2.0/glib/galloca.h:34,
                 from /usr/include/glib-2.0/glib.h:32,
                 from /usr/include/glib-2.0/gobject/gbinding.h:30,
                 from /usr/include/glib-2.0/glib-object.h:24,
                 from /usr/include/glib-2.0/gio/gioenums.h:30,
                 from /usr/include/glib-2.0/gio/giotypes.h:30,
                 from /usr/include/glib-2.0/gio/gio.h:28,
                 from ../../../libvirt/src/qemu/qemu_monitor.c:27:
../../../libvirt/src/qemu/qemu_monitor.c: In function ‘qemuMonitorGetEbpf’:
/usr/include/glib-2.0/glib/gmacros.h:1347:43: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 1347 | #define _GLIB_CLEANUP(func) __attribute__((cleanup(func)))
      |                                           ^~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gmacros.h:1380:29: note: in expansion of macro ‘_GLIB_CLEANUP’
 1380 | #define g_autoptr(TypeName) _GLIB_CLEANUP(_GLIB_AUTOPTR_FUNC_NAME(TypeName)) _GLIB_AUTOPTR_TYPENAME(TypeName)
      |                             ^~~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_monitor.c:4520:5: note: in expansion of macro ‘g_autoptr’
 4520 |     g_autoptr(virJSONValue) reply = NULL;
      |     ^~~~~~~~~
cc1: all warnings being treated as errors

Make sure your code compiles and passes test suite after every single patch per our coding guidelines.
+    g_autoptr(virJSONValue) reply = NULL;
+    const char *ebpfBase64 = NULL;
+    void *ebpfObject = NULL;
+
+    if (ebpfName == NULL || size == NULL)
You are mixing returns which do not set a libvirt error ...
+        return NULL;
+
+    reply = qemuMonitorJSONGetEbpf(mon, ebpfName);
+
+    if (reply == NULL)
... with those that do. This is not a good idea as the caller can't distinguish easily whether it's an error or desired NULL value. We don't allow that.
+        return NULL;
+
+    ebpfBase64 = virJSONValueObjectGetString(reply, "object");
+
+    ebpfObject = g_base64_decode(ebpfBase64, size);
I'd very strongly prefer that this is stored in base64 form inside libvirt
and decoded just before use. Storing strings is simpler, doesn't require you
to pass around the 'size' needlessly, and simplifies the XML formatter/parser
in capabilities.

Also please move all extraction of data from the monitor reply into the JSON
command. The qemu_monitor.c wrapper at this point should simply pass through.
It might eventually be removed completely, as it serves no purpose now that
HMP monitor support has been deleted. (And before that it would be
semantically wrong, because you pass the JSON out of the QMP function.)
+
+    return ebpfObject;
+}
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 6c590933aa..15f32f105c 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1579,3 +1579,6 @@ qemuMonitorExtractQueryStats(virJSONValue *info);
 virJSONValue *
 qemuMonitorGetStatsByQOMPath(virJSONValue *arr,
                              char *qom_path);
+
+void *
+qemuMonitorGetEbpf(qemuMonitor *mon, const char *ebpfName, size_t *size);
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 8152eea9a0..a7d0865ddc 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -8851,3 +8851,24 @@ qemuMonitorJSONQueryStats(qemuMonitor *mon,
     return virJSONValueObjectStealArray(reply, "return");
 }
+
+
+virJSONValue *
+qemuMonitorJSONGetEbpf(qemuMonitor *mon, const char *ebpfName)
+{
+    g_autoptr(virJSONValue) cmd = NULL;
+    g_autoptr(virJSONValue) reply = NULL;
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("request-ebpf",
+                                           "s:id", ebpfName, NULL)))
+        return NULL;
+
+    if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0)
Once again, both of the above functions set a libvirt error on failure ...
+        return NULL;
+
+    /* return empty hash */
+    if (qemuMonitorJSONHasError(reply, "CommandNotFound"))
+        return NULL;
But this does not. Caller can't distinguish between a monitor failure and command not found. Also the comment makes no sense.
+
+    return virJSONValueObjectStealObject(reply, "return");
+}
diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h
index 06023b98ea..6a2fc963ba 100644
--- a/src/qemu/qemu_monitor_json.h
+++ b/src/qemu/qemu_monitor_json.h
@@ -825,3 +825,6 @@ qemuMonitorJSONQueryStats(qemuMonitor *mon,
                           qemuMonitorQueryStatsTargetType target,
                           char **vcpus,
                           GPtrArray *providers);
+
+virJSONValue *
+qemuMonitorJSONGetEbpf(qemuMonitor *mon, const char *ebpfName);
-- 
2.42.0

Also, added logic for retrieving eBPF objects from QEMU. eBPF objects are
stored in a hash table during runtime. eBPF objects are cached encoded in
base64 in the .xml cache file.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 src/qemu/qemu_capabilities.c | 181 +++++++++++++++++++++++++++++++++++
 src/qemu/qemu_capabilities.h |   4 +
 2 files changed, 185 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 3a1bfbf74d..d9b5d6fb22 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -698,6 +698,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
     /* 450 */
     "run-with.async-teardown", /* QEMU_CAPS_RUN_WITH_ASYNC_TEARDOWN */
     "virtio-blk-vhost-vdpa", /* QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA */
+    "virtio-net.ebpf_rss_fds", /* QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS */
 );
 
 
@@ -789,6 +790,9 @@ struct _virQEMUCaps {
     virQEMUCapsAccel kvm;
     virQEMUCapsAccel hvf;
     virQEMUCapsAccel tcg;
+
+    /* Hash of ebpf objects virQEMUEbpfOBjectData */
+    GHashTable *ebpfObjects;
 };
 
 struct virQEMUCapsSearchData {
@@ -796,6 +800,18 @@ struct virQEMUCapsSearchData {
     const char *binaryFilter;
 };
 
+struct virQEMUEbpfOBjectData {
+    void *ebpfData;
+    size_t ebpfSize;
+};
+
+void virQEMUEbpfOBjectDataDestroy(gpointer data) {
+    if (!data)
+        return;
+    struct virQEMUEbpfOBjectData *object = data;
+    g_free(object->ebpfData);
+    g_free(data);
+}
 
 static virClass *virQEMUCapsClass;
 static void virQEMUCapsDispose(void *obj);
@@ -836,6 +852,19 @@ const char *virQEMUCapsArchToString(virArch arch)
 }
 
 
+const void *
+virQEMUCapsGetEbpf(virQEMUCaps *qemuCaps, const char *id, size_t *size)
+{
+    struct virQEMUEbpfOBjectData *object = virHashLookup(qemuCaps->ebpfObjects, id);
+
+    if (!object)
+        return NULL;
+
+    *size = object->ebpfSize;
+    return object->ebpfData;
+}
+
+
 /* Checks whether a domain with @guest arch can run natively on @host.
 */
 bool
@@ -1426,6 +1455,7 @@ static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioBlk[] = {
 static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioNet[] = {
     { "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL },
     { "rss", QEMU_CAPS_VIRTIO_NET_RSS, NULL },
+    { "ebpf_rss_fds", QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS, NULL },
 };
 
 static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsPCIeRootPort[] = {
@@ -1804,6 +1834,8 @@ virQEMUCapsNew(void)
     qemuCaps->invalidation = true;
     qemuCaps->flags = virBitmapNew(QEMU_CAPS_LAST);
 
+    qemuCaps->ebpfObjects = virHashNew(virQEMUEbpfOBjectDataDestroy);
+
     return qemuCaps;
 }
 
@@ -1984,6 +2016,8 @@ virQEMUCaps *virQEMUCapsNewCopy(virQEMUCaps *qemuCaps)
     ret->hypervCapabilities = g_memdup(qemuCaps->hypervCapabilities,
                                        sizeof(virDomainCapsFeatureHyperv));
 
+    ret->ebpfObjects = g_hash_table_ref(qemuCaps->ebpfObjects);
+
     return g_steal_pointer(&ret);
 }
 
@@ -2026,6 +2060,8 @@ void virQEMUCapsDispose(void *obj)
 
     g_free(qemuCaps->hypervCapabilities);
 
+    g_hash_table_unref(qemuCaps->ebpfObjects);
+
     virQEMUCapsAccelClear(&qemuCaps->kvm);
     virQEMUCapsAccelClear(&qemuCaps->hvf);
     virQEMUCapsAccelClear(&qemuCaps->tcg);
@@ -4541,6 +4577,46 @@ virQEMUCapsValidateArch(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt)
 }
 
 
+static int
+virQEMUCapsParseEbpfObjects(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt)
+{
+    g_autofree xmlNodePtr *nodes = NULL;
+    size_t i;
+    int n;
+
+    if ((n = virXPathNodeSet("./ebpf", ctxt, &nodes)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("failed to parse qemu cached eBPF object"));
+        return -1;
+    }
+
+    for (i = 0; i < n; i++) {
+        g_autofree char *id = NULL;
+        g_autofree char *ebpf = NULL;
+        struct virQEMUEbpfOBjectData *ebpfObject = NULL;
+
+        if (!(id = virXMLPropString(nodes[i], "id"))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("missing eBPF object ID in QEMU capabilities cache"));
+            return -1;
+        }
+
+        if (!(ebpf = virXMLNodeContentString(nodes[i]))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s %s",
+                           _("can't get eBPF base64 data from cache for ID: "), id);
+            return -1;
+        }
+
+        ebpfObject = g_malloc(sizeof(*ebpfObject));
+        ebpfObject->ebpfData = g_base64_decode(ebpf, &ebpfObject->ebpfSize);
+
+        virHashAddEntry(qemuCaps->ebpfObjects, id, ebpfObject);
+    }
+
+    return 0;
+}
+
+
 /*
  * Parsing a doc that looks like
  *
@@ -4688,6 +4764,8 @@ virQEMUCapsLoadCache(virArch hostArch,
     if (skipInvalidation)
         qemuCaps->invalidation = false;
 
+    virQEMUCapsParseEbpfObjects(qemuCaps, ctxt);
+
     return 0;
 }
 
@@ -4925,6 +5003,32 @@ virQEMUCapsFormatHypervCapabilities(virQEMUCaps *qemuCaps,
 }
 
 
+static int
+virQEMUCapsFormatEbpfObjectsIterator(void *payload, const char *name, void *opaque)
+{
+    virBuffer *buf = opaque;
+    struct virQEMUEbpfOBjectData *object = payload;
+    g_autofree char *objectBase64 = NULL;
+
+    if (!buf || !object)
+        return 0;
+
+    objectBase64 = g_base64_encode(object->ebpfData, object->ebpfSize);
+    if (!objectBase64)
+        return 0;
+
+    virBufferAsprintf(buf, "<ebpf id='%s'>\n", name);
+    virBufferAdjustIndent(buf, 2);
+
+    virBufferAddStr(buf, objectBase64);
+    virBufferAddLit(buf, "\n");
+
+    virBufferAdjustIndent(buf, -2);
+    virBufferAddLit(buf, "</ebpf>\n");
+
+    return 0;
+}
+
 char *
 virQEMUCapsFormatCache(virQEMUCaps *qemuCaps)
 {
@@ -5015,6 +5119,8 @@ virQEMUCapsFormatCache(virQEMUCaps *qemuCaps)
     if (qemuCaps->kvmSupportsSecureGuest)
         virBufferAddLit(&buf, "<kvmSupportsSecureGuest/>\n");
 
+    virHashForEach(qemuCaps->ebpfObjects, virQEMUCapsFormatEbpfObjectsIterator, &buf);
+
     virBufferAdjustIndent(&buf, -2);
     virBufferAddLit(&buf, "</qemuCaps>\n");
 
@@ -5437,6 +5543,79 @@ virQEMUCapsInitProcessCaps(virQEMUCaps *qemuCaps)
 }
 
 
+static int
+virQEMUCapsProbeQMPEbpfObject(virQEMUCaps *qemuCaps, const char *id,
+                              qemuMonitor *mon)
+{
+    void *ebpfObject = NULL;
+    size_t size = 0;
+
+    if (!id)
+        return -1;
+
+    ebpfObject = qemuMonitorGetEbpf(mon, id, &size);
+    if(ebpfObject == NULL)
+        return -1;
+
+    struct virQEMUEbpfOBjectData *object = g_malloc(sizeof(*object));
+    object->ebpfData = ebpfObject;
+    object->ebpfSize = size;
+
+    virHashAddEntry(qemuCaps->ebpfObjects, id, object);
+
+    return 0;
+}
+
+static void virQEMUCapsProbeQMPSchemaEbpf(virQEMUCaps *qemuCaps, GHashTable* schema, qemuMonitor *mon) {
+    if (schema == NULL)
+        return;
+
+    virJSONValue *requestSchema = virHashLookup(schema, "request-ebpf");
+    if (!requestSchema)
+        return;
+
+    const char *argType = virJSONValueObjectGetString(requestSchema, "arg-type");
+    if (!argType)
+        return;
+
+    virJSONValue *argSchema = virHashLookup(schema, argType);
+    if (!argSchema)
+        return;
+
+    /* Get member id type*/
+    virJSONValue *members = virJSONValueObjectGetArray(argSchema, "members");
+    if (!members)
+        return;
+
+    /* Find "id" type */
+    virJSONValue *idSchema = NULL;
+    for (size_t i = 0; (idSchema = virJSONValueArrayGet(members, i)) != NULL; ++i) {
+        const char *name = virJSONValueObjectGetString(idSchema, "name");
+        if (strncmp(name, "id", 3) == 0)
+            break;
+    }
+
+    if (!idSchema)
+        return;
+
+    const char *ebpfIds = virJSONValueObjectGetString(idSchema, "type");
+    virJSONValue *ebpfIdsSchema = virHashLookup(schema, ebpfIds);
+    if (!ebpfIdsSchema)
+        return;
+
+    virJSONValue *ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values");
+    if (!ebpfIdsArray)
+        return;
+
+    /* Try to request every eBPF */
+    virJSONValue *id = NULL;
+    for (size_t i = 0; (id = virJSONValueArrayGet(ebpfIdsArray, i)) != NULL; ++i) {
+        const char *name = virJSONValueGetString(id);
+        virQEMUCapsProbeQMPEbpfObject(qemuCaps, name, mon);
+    }
+}
+
+
 static int
 virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps,
                                       qemuMonitor *mon)
@@ -5466,6 +5645,8 @@ virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps,
         virQEMUCapsSet(qemuCaps, cmd->flag);
     }
 
+    virQEMUCapsProbeQMPSchemaEbpf(qemuCaps, schema, mon);
+
     return 0;
 }
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 3c4f7f625b..164dc003d0 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -677,6 +677,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     /* 450 */
     QEMU_CAPS_RUN_WITH_ASYNC_TEARDOWN, /* asynchronous teardown -run-with async-teardown=on|off */
     QEMU_CAPS_DEVICE_VIRTIO_BLK_VHOST_VDPA, /* virtio-blk-vhost-vdpa block driver */
+    QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS, /* virtio-net ebpf_rss_fds feature */
 
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
 
@@ -895,3 +896,6 @@ int virQEMUCapsProbeQMPMachineTypes(virQEMUCaps *qemuCaps,
                                     virDomainVirtType virtType,
                                     qemuMonitor *mon);
+
+const void *
+virQEMUCapsGetEbpf(virQEMUCaps *qemuCaps, const char *id, size_t *size);
-- 
2.42.0

On Mon, Oct 09, 2023 at 09:16:12 +0300, Andrew Melnychenko wrote:
Also, added logic for retrieving eBPF objects from QEMU. eBPF objects stored in the hash table during runtime. eBPF objects cached encoded in base64 in the .xml cache file.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 src/qemu/qemu_capabilities.c | 181 +++++++++++++++++++++++++++++++++++
 src/qemu/qemu_capabilities.h |   4 +
 2 files changed, 185 insertions(+)
Once again, make sure to check that each patch builds and passes tests. This one fails to compile:

u/libvirt_driver_qemu_impl.a.p/qemu_capabilities.c.o -c ../../../libvirt/src/qemu/qemu_capabilities.c
../../../libvirt/src/qemu/qemu_capabilities.c:808:6: error: no previous prototype for ‘virQEMUEbpfOBjectDataDestroy’ [-Werror=missing-prototypes]
  808 | void virQEMUEbpfOBjectDataDestroy(gpointer data) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUEbpfOBjectDataDestroy’:
../../../libvirt/src/qemu/qemu_capabilities.c:811:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  811 |     struct virQEMUEbpfOBjectData *object = data;
      |     ^~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUCapsProbeQMPEbpfObject’:
../../../libvirt/src/qemu/qemu_capabilities.c:5560:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5560 |     struct virQEMUEbpfOBjectData *object = g_malloc(sizeof(*object));
      |     ^~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUCapsProbeQMPSchemaEbpf’:
../../../libvirt/src/qemu/qemu_capabilities.c:5573:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5573 |     virJSONValue *requestSchema = virHashLookup(schema, "request-ebpf");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5577:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5577 |     const char *argType = virJSONValueObjectGetString(requestSchema, "arg-type");
      |     ^~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5581:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5581 |     virJSONValue *argSchema = virHashLookup(schema, argType);
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5586:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5586 |     virJSONValue *members = virJSONValueObjectGetArray(argSchema, "members");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5591:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5591 |     virJSONValue *idSchema = NULL;
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5601:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5601 |     const char *ebpfIds = virJSONValueObjectGetString(idSchema, "type");
      |     ^~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5606:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5606 |     virJSONValue *ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5611:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5611 |     virJSONValue *id = NULL;
      |     ^~~~~~~~~~~~
cc1: all warnings being treated as errors

After the compile error is fixed there are further errors caught by syntax-check.
@@ -789,6 +790,9 @@ struct _virQEMUCaps {
     virQEMUCapsAccel kvm;
     virQEMUCapsAccel hvf;
     virQEMUCapsAccel tcg;
+
+    /* Hash of ebpf objects virQEMUEbpfOBjectData */
+    GHashTable *ebpfObjects;
 };
 struct virQEMUCapsSearchData {
@@ -796,6 +800,18 @@ struct virQEMUCapsSearchData {
     const char *binaryFilter;
 };
+struct virQEMUEbpfOBjectData {
+    void *ebpfData;
+    size_t ebpfSize;
+};
+
+void virQEMUEbpfOBjectDataDestroy(gpointer data) {
+    if (!data)
+        return;
+    struct virQEMUEbpfOBjectData *object = data;
+    g_free(object->ebpfData);
+    g_free(data);
+}
As noted, store the program as the base64 string throughout libvirt, none of the above will be needed.
 static virClass *virQEMUCapsClass;
 static void virQEMUCapsDispose(void *obj);
@@ -836,6 +852,19 @@ const char *virQEMUCapsArchToString(virArch arch)
 }
+const void *
+virQEMUCapsGetEbpf(virQEMUCaps *qemuCaps, const char *id, size_t *size)
+{
+    struct virQEMUEbpfOBjectData *object = virHashLookup(qemuCaps->ebpfObjects, id);
+
+    if (!object)
+        return NULL;
+
+    *size = object->ebpfSize;
+    return object->ebpfData;
+}
+
+
 /* Checks whether a domain with @guest arch can run natively on @host.
  */
 bool
@@ -1426,6 +1455,7 @@ static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioBlk[] = {
 static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioNet[] = {
     { "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL },
     { "rss", QEMU_CAPS_VIRTIO_NET_RSS, NULL },
+    { "ebpf_rss_fds", QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS, NULL },
The QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS capability is not actually connected to the 'request-ebpf' output caching. The ebpf programs are cached if the 'request-ebpf' command is present and in such case all programs are cached. Please split the patch such that one adds the 'request-ebpf' related changes and the second one adds the virtio-net capability.
};
 static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsPCIeRootPort[] = {
@@ -1804,6 +1834,8 @@ virQEMUCapsNew(void)
     qemuCaps->invalidation = true;
     qemuCaps->flags = virBitmapNew(QEMU_CAPS_LAST);
+    qemuCaps->ebpfObjects = virHashNew(virQEMUEbpfOBjectDataDestroy);
+
     return qemuCaps;
 }
@@ -1984,6 +2016,8 @@ virQEMUCaps *virQEMUCapsNewCopy(virQEMUCaps *qemuCaps)
     ret->hypervCapabilities = g_memdup(qemuCaps->hypervCapabilities,
                                        sizeof(virDomainCapsFeatureHyperv));
+ ret->ebpfObjects = g_hash_table_ref(qemuCaps->ebpfObjects);
virQEMUCapsNewCopy is a deep copy function, thus you must deep copy the objects.
+ return g_steal_pointer(&ret);
[...]
@@ -4541,6 +4577,46 @@ virQEMUCapsValidateArch(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt)
 }
+static int
+virQEMUCapsParseEbpfObjects(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt)
+{
+    g_autofree xmlNodePtr *nodes = NULL;
+    size_t i;
+    int n;
+
+    if ((n = virXPathNodeSet("./ebpf", ctxt, &nodes)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("failed to parse qemu cached eBPF object"));
+        return -1;
+    }
+
+    for (i = 0; i < n; i++) {
+        g_autofree char *id = NULL;
+        g_autofree char *ebpf = NULL;
+        struct virQEMUEbpfOBjectData *ebpfObject = NULL;
+
+        if (!(id = virXMLPropString(nodes[i], "id"))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("missing eBPF object ID in QEMU capabilities cache"));
Use virXMLPropStringRequired instead of reporting a custom error.
+            return -1;
+        }
+
+        if (!(ebpf = virXMLNodeContentString(nodes[i]))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s %s",
broken format string in the error
+                           _("can't get eBPF base64 data from cache for ID: "), id);
Store the program as an attribute
+            return -1;
+        }
+
+        ebpfObject = g_malloc(sizeof(*ebpfObject));
we use exclusively g_new0 for allocation of structs.
+        ebpfObject->ebpfData = g_base64_decode(ebpf, &ebpfObject->ebpfSize);
As noted store this as a string you'll save this step here.
+
+        virHashAddEntry(qemuCaps->ebpfObjects, id, ebpfObject);
+    }
+
+    return 0;
+}
+
+
 /*
  * Parsing a doc that looks like
  *
@@ -4688,6 +4764,8 @@ virQEMUCapsLoadCache(virArch hostArch,
     if (skipInvalidation)
         qemuCaps->invalidation = false;
+    virQEMUCapsParseEbpfObjects(qemuCaps, ctxt);
So the above function reports many errors, here you don't bother looking for them.
+
     return 0;
 }
@@ -4925,6 +5003,32 @@ virQEMUCapsFormatHypervCapabilities(virQEMUCaps *qemuCaps,
 }
+static int
+virQEMUCapsFormatEbpfObjectsIterator(void *payload, const char *name, void *opaque)
+{
+    virBuffer *buf = opaque;
+    struct virQEMUEbpfOBjectData *object = payload;
+    g_autofree char *objectBase64 = NULL;
+
+    if (!buf || !object)
'buf' can't be null here, and neither 'object'.
+        return 0;
+
+    objectBase64 = g_base64_encode(object->ebpfData, object->ebpfSize);
+    if (!objectBase64)
g_base64_encode can't return NULL thus this check is pointless.
+        return 0;
+
+    virBufferAsprintf(buf, "<ebpf id='%s'>\n", name);
+    virBufferAdjustIndent(buf, 2);
+
+    virBufferAddStr(buf, objectBase64);
+    virBufferAddLit(buf, "\n");
+
+    virBufferAdjustIndent(buf, -2);
+    virBufferAddLit(buf, "</ebpf>\n");
This format is not acceptable. As noted, store it in an attribute instead of
the element content, so that we can reasonably add subelements should they
become required.
+
+    return 0;
+}
+
 char *
 virQEMUCapsFormatCache(virQEMUCaps *qemuCaps)
 {
@@ -5015,6 +5119,8 @@ virQEMUCapsFormatCache(virQEMUCaps *qemuCaps)
     if (qemuCaps->kvmSupportsSecureGuest)
         virBufferAddLit(&buf, "<kvmSupportsSecureGuest/>\n");
+ virHashForEach(qemuCaps->ebpfObjects, virQEMUCapsFormatEbpfObjectsIterator, &buf);
Missing container around the '<ebpf>' elements. I know that XML allows multiple, but we prefer stashing them in a container. Also use virHashForEachSorted here. It's obviously less efficient, but the capability cache is also handled in tests thus we must use a variant which has stable output.
+
     virBufferAdjustIndent(&buf, -2);
     virBufferAddLit(&buf, "</qemuCaps>\n");
@@ -5437,6 +5543,79 @@ virQEMUCapsInitProcessCaps(virQEMUCaps *qemuCaps)
 }
+static int
+virQEMUCapsProbeQMPEbpfObject(virQEMUCaps *qemuCaps, const char *id,
+                              qemuMonitor *mon)
+{
+    void *ebpfObject = NULL;
+    size_t size = 0;
+
+    if (!id)
+        return -1;
This can't happen. If it did you'd not report an error.
+
+    ebpfObject = qemuMonitorGetEbpf(mon, id, &size);
+    if(ebpfObject == NULL)
As noted in previous patch, you can't determine here that NULL is failure. And you'd propagate the non-failure (without error set) to upper layers.
+        return -1;
+
+    struct virQEMUEbpfOBjectData *object = g_malloc(sizeof(*object));
+    object->ebpfData = ebpfObject;
+    object->ebpfSize = size;
+
+    virHashAddEntry(qemuCaps->ebpfObjects, id, object);
This can return error and is not checked.
+
+    return 0;
+}
+
+static void virQEMUCapsProbeQMPSchemaEbpf(virQEMUCaps *qemuCaps, GHashTable* schema, qemuMonitor *mon) {
+    if (schema == NULL)
+        return;
Can't happen.
+
+    virJSONValue *requestSchema = virHashLookup(schema, "request-ebpf");
+    if (!requestSchema)
+        return;
+
+    const char *argType = virJSONValueObjectGetString(requestSchema, "arg-type");
+    if (!argType)
+        return;
+
+    virJSONValue *argSchema = virHashLookup(schema, argType);
+    if (!argSchema)
+        return;
+
+    /* Get member id type*/
+    virJSONValue *members = virJSONValueObjectGetArray(argSchema, "members");
+    if (!members)
+        return;
+
+    /* Find "id" type */
+    virJSONValue *idSchema = NULL;
+    for (size_t i = 0; (idSchema = virJSONValueArrayGet(members, i)) != NULL; ++i) {
+        const char *name = virJSONValueObjectGetString(idSchema, "name");
+        if (strncmp(name, "id", 3) == 0)
+            break;
+    }
+
+    if (!idSchema)
+        return;
+
+    const char *ebpfIds = virJSONValueObjectGetString(idSchema, "type");
+    virJSONValue *ebpfIdsSchema = virHashLookup(schema, ebpfIds);
+    if (!ebpfIdsSchema)
+        return;
+
+    virJSONValue *ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values");
+    if (!ebpfIdsArray)
+        return;
Instead of all of the above please use our existing schema query
infrastructure:

    virQEMUQAPISchemaPathGet("request-ebpf/arg-type/id", schema, &ebpfIdsArray)

This should fill ebpfIdsArray with what the whole code above does. Since the
above code fails to compile I'll replace it by my variant:

static int
virQEMUCapsProbeQMPSchemaEbpf(virQEMUCaps *qemuCaps,
                              GHashTable* schema,
                              qemuMonitor *mon)
{
    virJSONValue *ebpfIdsArray;
    virJSONValue *ebpfIdsSchema;
    size_t i;

    if (virQEMUQAPISchemaPathGet("request-ebpf/arg-type/id", schema, &ebpfIdsSchema) != 1)
        return 0;

    if (!(ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values"))) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("malformed QMP schema of 'request-ebpf'"));
        return -1;
    }

    /* Try to request every eBPF */
    for (i = 0; i < virJSONValueArraySize(ebpfIdsArray); i++) {
        virJSONValue *id = virJSONValueArrayGet(ebpfIdsArray, i);

        if (virQEMUCapsProbeQMPEbpfObject(qemuCaps, virJSONValueGetString(id), mon) < 0)
            return -1;
    }

    return 0;
}
+
+    /* Try to request every eBPF */
+    virJSONValue *id = NULL;
+    for (size_t i = 0; (id = virJSONValueArrayGet(ebpfIdsArray, i)) != NULL; ++i) {
+        const char *name = virJSONValueGetString(id);
+        virQEMUCapsProbeQMPEbpfObject(qemuCaps, name, mon);
Ah, so you in the end ignore all the errors regardless.
+    }
+}
+
+
 static int
 virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps,
                                       qemuMonitor *mon)
@@ -5466,6 +5645,8 @@ virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps,
         virQEMUCapsSet(qemuCaps, cmd->flag);
     }
+ virQEMUCapsProbeQMPSchemaEbpf(qemuCaps, schema, mon); + return 0; }

Hi all, thank you for your comments. On Mon, Oct 9, 2023 at 12:48 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Oct 09, 2023 at 09:16:12 +0300, Andrew Melnychenko wrote:
Also added logic for retrieving eBPF objects from QEMU. eBPF objects are stored in a hash table at runtime and cached base64-encoded in the .xml cache file.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- src/qemu/qemu_capabilities.c | 181 +++++++++++++++++++++++++++++++++++ src/qemu/qemu_capabilities.h | 4 + 2 files changed, 185 insertions(+)
Once again, make sure to check that each patch builds and passes tests. This one fails to compile:
u/libvirt_driver_qemu_impl.a.p/qemu_capabilities.c.o -c ../../../libvirt/src/qemu/qemu_capabilities.c
../../../libvirt/src/qemu/qemu_capabilities.c:808:6: error: no previous prototype for ‘virQEMUEbpfOBjectDataDestroy’ [-Werror=missing-prototypes]
  808 | void virQEMUEbpfOBjectDataDestroy(gpointer data) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUEbpfOBjectDataDestroy’:
../../../libvirt/src/qemu/qemu_capabilities.c:811:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  811 |     struct virQEMUEbpfOBjectData *object = data;
      |     ^~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUCapsProbeQMPEbpfObject’:
../../../libvirt/src/qemu/qemu_capabilities.c:5560:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5560 |     struct virQEMUEbpfOBjectData *object = g_malloc(sizeof(*object));
      |     ^~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c: In function ‘virQEMUCapsProbeQMPSchemaEbpf’:
../../../libvirt/src/qemu/qemu_capabilities.c:5573:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5573 |     virJSONValue *requestSchema = virHashLookup(schema, "request-ebpf");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5577:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5577 |     const char *argType = virJSONValueObjectGetString(requestSchema, "arg-type");
      |     ^~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5581:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5581 |     virJSONValue *argSchema = virHashLookup(schema, argType);
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5586:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5586 |     virJSONValue *members = virJSONValueObjectGetArray(argSchema, "members");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5591:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5591 |     virJSONValue *idSchema = NULL;
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5601:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5601 |     const char *ebpfIds = virJSONValueObjectGetString(idSchema, "type");
      |     ^~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5606:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5606 |     virJSONValue *ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values");
      |     ^~~~~~~~~~~~
../../../libvirt/src/qemu/qemu_capabilities.c:5611:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 5611 |     virJSONValue *id = NULL;
      |     ^~~~~~~~~~~~
cc1: all warnings being treated as errors
After the compile error is fixed there are further errors caught by syntax-check.
@@ -789,6 +790,9 @@ struct _virQEMUCaps { virQEMUCapsAccel kvm; virQEMUCapsAccel hvf; virQEMUCapsAccel tcg; + + /* Hash of ebpf objects virQEMUEbpfOBjectData */ + GHashTable *ebpfObjects; };
struct virQEMUCapsSearchData { @@ -796,6 +800,18 @@ struct virQEMUCapsSearchData { const char *binaryFilter; };
+struct virQEMUEbpfOBjectData {
+    void *ebpfData;
+    size_t ebpfSize;
+};
+
+void virQEMUEbpfOBjectDataDestroy(gpointer data) {
+    if (!data)
+        return;
+    struct virQEMUEbpfOBjectData *object = data;
+    g_free(object->ebpfData);
+    g_free(data);
+}
As noted, store the program as the base64 string throughout libvirt; none of the above will be needed.
Ok.
static virClass *virQEMUCapsClass; static void virQEMUCapsDispose(void *obj); @@ -836,6 +852,19 @@ const char *virQEMUCapsArchToString(virArch arch) }
+const void * +virQEMUCapsGetEbpf(virQEMUCaps *qemuCaps, const char *id, size_t *size) +{ + struct virQEMUEbpfOBjectData *object = virHashLookup(qemuCaps->ebpfObjects, id); + + if (!object) + return NULL; + + *size = object->ebpfSize; + return object->ebpfData; +} + + /* Checks whether a domain with @guest arch can run natively on @host. */ bool @@ -1426,6 +1455,7 @@ static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioBlk[] = { static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioNet[] = { { "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, { "rss", QEMU_CAPS_VIRTIO_NET_RSS, NULL }, + { "ebpf_rss_fds", QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS, NULL },
The QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS capability is not actually connected to the 'request-ebpf' output caching.
The eBPF programs are cached only if the 'request-ebpf' command is present, in which case all of them are cached.
Please split the patch such that one adds the 'request-ebpf' related changes and the second one adds the virtio-net capability.
I'll prepare new hunks in the next version.
};
static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsPCIeRootPort[] = { @@ -1804,6 +1834,8 @@ virQEMUCapsNew(void) qemuCaps->invalidation = true; qemuCaps->flags = virBitmapNew(QEMU_CAPS_LAST);
+ qemuCaps->ebpfObjects = virHashNew(virQEMUEbpfOBjectDataDestroy); + return qemuCaps; }
@@ -1984,6 +2016,8 @@ virQEMUCaps *virQEMUCapsNewCopy(virQEMUCaps *qemuCaps) ret->hypervCapabilities = g_memdup(qemuCaps->hypervCapabilities, sizeof(virDomainCapsFeatureHyperv));
+ ret->ebpfObjects = g_hash_table_ref(qemuCaps->ebpfObjects);
virQEMUCapsNewCopy is a deep copy function, thus you must deep copy the objects.
+ return g_steal_pointer(&ret);
[...]
@@ -4541,6 +4577,46 @@ virQEMUCapsValidateArch(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt) }
+static int
+virQEMUCapsParseEbpfObjects(virQEMUCaps *qemuCaps, xmlXPathContextPtr ctxt)
+{
+    g_autofree xmlNodePtr *nodes = NULL;
+    size_t i;
+    int n;
+
+    if ((n = virXPathNodeSet("./ebpf", ctxt, &nodes)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("failed to parse qemu cached eBPF object"));
+        return -1;
+    }
+
+    for (i = 0; i < n; i++) {
+        g_autofree char *id = NULL;
+        g_autofree char *ebpf = NULL;
+        struct virQEMUEbpfOBjectData *ebpfObject = NULL;
+
+        if (!(id = virXMLPropString(nodes[i], "id"))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("missing eBPF object ID in QEMU capabilities cache"));
Use virXMLPropStringRequired instead of reporting a custom error.
+ return -1; + } + + if (!(ebpf = virXMLNodeContentString(nodes[i]))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s %s",
broken format string in the error
+ _("can't get eBPF base64 data from cache for ID: "), id);
Store the program as an attribute
+ return -1; + } + + ebpfObject = g_malloc(sizeof(*ebpfObject));
we use exclusively g_new0 for allocation of structs.
+ ebpfObject->ebpfData = g_base64_decode(ebpf, &ebpfObject->ebpfSize);
As noted, store this as a string and you'll save this step here.
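For reference, this is roughly what the decode step does — the step that storing the base64 string directly would avoid at parse time (the patch uses GLib's g_base64_decode; the sketch below is a stdlib-only illustration, not GLib's implementation, and the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a base64 string into buf; returns the number of bytes written,
 * or -1 on invalid input or overflow. Stops at '=' padding. */
static int demo_base64_decode(const char *in, unsigned char *buf, size_t buflen)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned int acc = 0;   /* bit accumulator */
    int nbits = 0;          /* bits currently in the accumulator */
    size_t out = 0;

    for (; *in && *in != '='; in++) {
        const char *p = strchr(tbl, *in);
        if (!p)
            return -1;      /* character outside the base64 alphabet */
        acc = (acc << 6) | (unsigned int)(p - tbl);
        nbits += 6;
        if (nbits >= 8) {
            nbits -= 8;
            if (out >= buflen)
                return -1;  /* output buffer too small */
            buf[out++] = (acc >> nbits) & 0xff;
        }
    }
    return (int)out;
}
```

The real code additionally has to handle NULL input and can rely on GLib to size the output buffer for it.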
+ + virHashAddEntry(qemuCaps->ebpfObjects, id, ebpfObject); + } + + return 0; +} + + /* * Parsing a doc that looks like * @@ -4688,6 +4764,8 @@ virQEMUCapsLoadCache(virArch hostArch, if (skipInvalidation) qemuCaps->invalidation = false;
+ virQEMUCapsParseEbpfObjects(qemuCaps, ctxt);
So the above function reports many errors, here you don't bother looking for them.
+ return 0; }
@@ -4925,6 +5003,32 @@ virQEMUCapsFormatHypervCapabilities(virQEMUCaps *qemuCaps, }
+static int
+virQEMUCapsFormatEbpfObjectsIterator(void *payload, const char *name, void *opaque)
+{
+    virBuffer *buf = opaque;
+    struct virQEMUEbpfOBjectData *object = payload;
+    g_autofree char *objectBase64 = NULL;
+
+    if (!buf || !object)
'buf' can't be null here, and neither 'object'.
+        return 0;
+
+    objectBase64 = g_base64_encode(object->ebpfData, object->ebpfSize);
+    if (!objectBase64)
g_base64_encode can't return NULL thus this check is pointless.
+        return 0;
+
+    virBufferAsprintf(buf, "<ebpf id='%s'>\n", name);
+    virBufferAdjustIndent(buf, 2);
+
+    virBufferAddStr(buf, objectBase64);
+    virBufferAddLit(buf, "\n");
+
+    virBufferAdjustIndent(buf, -2);
+    virBufferAddLit(buf, "</ebpf>\n");
This format is not acceptable. As noted, store it into an attribute instead of the element contents so that we can reasonably add subelements should it become necessary.
+ + return 0; +} + char * virQEMUCapsFormatCache(virQEMUCaps *qemuCaps) { @@ -5015,6 +5119,8 @@ virQEMUCapsFormatCache(virQEMUCaps *qemuCaps) if (qemuCaps->kvmSupportsSecureGuest) virBufferAddLit(&buf, "<kvmSupportsSecureGuest/>\n");
+ virHashForEach(qemuCaps->ebpfObjects, virQEMUCapsFormatEbpfObjectsIterator, &buf);
Missing container around the '<ebpf>' elements. I know that XML allows multiple, but we prefer stashing them in a container.
Also use virHashForEachSorted here. It's obviously less efficient, but the capability cache is also handled in tests thus we must use a variant which has stable output.
Ok. Should it be something like this:
```
<ebpfs>
  <ebpf id="rss" data="..."/>
  <ebpf id="filter" data="..."/>
  ...
</ebpfs>
```
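The point of virHashForEachSorted is deterministic output for the test suite: visit the keys in sorted order, then format. A stdlib-only sketch of that idea (the function and output shape here are illustrative, not libvirt API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Format one element per key, in sorted-key order, so the output is
 * byte-for-byte stable regardless of hash-table iteration order. */
static void format_sorted(const char **keys, size_t n, char *out, size_t outlen)
{
    size_t i;

    qsort(keys, n, sizeof(*keys), cmp_str);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        strncat(out, "<ebpf id='", outlen - strlen(out) - 1);
        strncat(out, keys[i], outlen - strlen(out) - 1);
        strncat(out, "'/>", outlen - strlen(out) - 1);
    }
}
```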
+ virBufferAdjustIndent(&buf, -2); virBufferAddLit(&buf, "</qemuCaps>\n");
@@ -5437,6 +5543,79 @@ virQEMUCapsInitProcessCaps(virQEMUCaps *qemuCaps) }
+static int
+virQEMUCapsProbeQMPEbpfObject(virQEMUCaps *qemuCaps, const char *id,
+                              qemuMonitor *mon)
+{
+    void *ebpfObject = NULL;
+    size_t size = 0;
+
+    if (!id)
+        return -1;
This can't happen. If it did you'd not report an error.
+
+    ebpfObject = qemuMonitorGetEbpf(mon, id, &size);
+    if(ebpfObject == NULL)
As noted in the previous patch, you can't determine here that NULL is a failure. And you'd propagate the non-failure (without an error set) to upper layers.
+        return -1;
+
+    struct virQEMUEbpfOBjectData *object = g_malloc(sizeof(*object));
+    object->ebpfData = ebpfObject;
+    object->ebpfSize = size;
+
+    virHashAddEntry(qemuCaps->ebpfObjects, id, object);
This can return error and is not checked.
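The review's point across these comments is that "QEMU doesn't provide this program" and "something actually failed" need distinguishable return values. A stdlib-only sketch of the tri-state convention used in libvirt-style code (1 = found, 0 = absent and not an error, -1 = hard error; names and data are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Tri-state lookup: 1 = found, 0 = absent (caller continues without it),
 * -1 = hard error (caller propagates the failure). */
static int demo_get_ebpf(const char *id, const char **blob)
{
    if (!id)
        return -1;               /* programming error: report and fail */

    if (strcmp(id, "rss") == 0) {
        *blob = "BASE64DATA";    /* found: hand back the cached base64 */
        return 1;
    }

    return 0;                    /* QEMU has no such program: not a failure */
}
```

With this shape the caller can fall back to in-qemu RSS on 0, and only abort on -1.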
+
+    return 0;
+}
+
+static void virQEMUCapsProbeQMPSchemaEbpf(virQEMUCaps *qemuCaps, GHashTable* schema, qemuMonitor *mon) {
+    if (schema == NULL)
+        return;
Can't happen.
+
+    virJSONValue *requestSchema = virHashLookup(schema, "request-ebpf");
+    if (!requestSchema)
+        return;
+
+    const char *argType = virJSONValueObjectGetString(requestSchema, "arg-type");
+    if (!argType)
+        return;
+
+    virJSONValue *argSchema = virHashLookup(schema, argType);
+    if (!argSchema)
+        return;
+
+    /* Get member id type */
+    virJSONValue *members = virJSONValueObjectGetArray(argSchema, "members");
+    if (!members)
+        return;
+
+    /* Find "id" type */
+    virJSONValue *idSchema = NULL;
+    for (size_t i = 0; (idSchema = virJSONValueArrayGet(members, i)) != NULL; ++i) {
+        const char *name = virJSONValueObjectGetString(idSchema, "name");
+        if (strncmp(name, "id", 3) == 0)
+            break;
+    }
+
+    if (!idSchema)
+        return;
+
+    const char *ebpfIds = virJSONValueObjectGetString(idSchema, "type");
+    virJSONValue *ebpfIdsSchema = virHashLookup(schema, ebpfIds);
+    if (!ebpfIdsSchema)
+        return;
+
+    virJSONValue *ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values");
+    if (!ebpfIdsArray)
+        return;
Instead of all of the above please use our existing schema query infrastructure:
virQEMUQAPISchemaPathGet("request-ebpf/arg-type/id", schema, &ebpfIdsArray)
This should fill ebpfIdsArray with what the whole code above does.
Since the above code fails to compile I'll replace it with my variant:
static int
virQEMUCapsProbeQMPSchemaEbpf(virQEMUCaps *qemuCaps,
                              GHashTable *schema,
                              qemuMonitor *mon)
{
    virJSONValue *ebpfIdsArray;
    virJSONValue *ebpfIdsSchema;
    size_t i;

    if (virQEMUQAPISchemaPathGet("request-ebpf/arg-type/id", schema, &ebpfIdsSchema) != 1)
        return 0;

    if (!(ebpfIdsArray = virJSONValueObjectGetArray(ebpfIdsSchema, "values"))) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("malformed QMP schema of 'request-ebpf'"));
        return -1;
    }

    /* Try to request every eBPF */
    for (i = 0; i < virJSONValueArraySize(ebpfIdsArray); i++) {
        virJSONValue *id = virJSONValueArrayGet(ebpfIdsArray, i);

        if (virQEMUCapsProbeQMPEbpfObject(qemuCaps, virJSONValueGetString(id), mon) < 0)
            return -1;
    }

    return 0;
}
Thank you - I missed the existing schema query infrastructure and did it the hard way. Your code will be used in the next version of the patches.
+
+    /* Try to request every eBPF */
+    virJSONValue *id = NULL;
+    for (size_t i = 0; (id = virJSONValueArrayGet(ebpfIdsArray, i)) != NULL; ++i) {
+        const char *name = virJSONValueGetString(id);
+        virQEMUCapsProbeQMPEbpfObject(qemuCaps, name, mon);
Ah, so you in the end ignore all the errors regardless.
Well, the idea was that if we can't retrieve an eBPF object for some reason, it shouldn't stop the workflow - work continues without eBPF.
+ } +} + + static int virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps, qemuMonitor *mon) @@ -5466,6 +5645,8 @@ virQEMUCapsProbeQMPSchemaCapabilities(virQEMUCaps *qemuCaps, virQEMUCapsSet(qemuCaps, cmd->flag); }
+ virQEMUCapsProbeQMPSchemaEbpf(qemuCaps, schema, mon); + return 0; }

Also, added dependencies for libbpf with bpf option. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- meson.build | 6 ++++++ meson_options.txt | 1 + src/qemu/meson.build | 1 + src/qemu/qemu_interface.c | 42 +++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_interface.h | 4 ++++ 5 files changed, 54 insertions(+) diff --git a/meson.build b/meson.build index de23fbda1e..b68e916246 100644 --- a/meson.build +++ b/meson.build @@ -1381,6 +1381,11 @@ if yajl_dep.found() conf.set('WITH_YAJL', 1) endif +bpf_version = '1.1.0' +bpf_dep = dependency('libbpf', version: '>=' + bpf_version, required: get_option('bpf')) +if bpf_dep.found() + conf.set('WITH_BPF', 1) +endif # generic build dependencies checks @@ -2269,6 +2274,7 @@ libs_summary = { 'udev': udev_dep.found(), 'xdr': xdr_dep.found(), 'yajl': yajl_dep.found(), + 'bpf': bpf_dep.found(), } summary(libs_summary, section: 'Libraries', bool_yn: true) diff --git a/meson_options.txt b/meson_options.txt index 7c428a9eb0..092b2efe06 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -44,6 +44,7 @@ option('udev', type: 'feature', value: 'auto', description: 'udev support') option('wireshark_dissector', type: 'feature', value: 'auto', description: 'wireshark support') option('wireshark_plugindir', type: 'string', value: '', description: 'wireshark plugins directory for use when installing wireshark plugin') option('yajl', type: 'feature', value: 'auto', description: 'yajl support') +option('bpf', type: 'feature', value: 'auto', description: 'qemu libbpf support') # build driver options diff --git a/src/qemu/meson.build b/src/qemu/meson.build index 64c62e584f..afd9133ae0 100644 --- a/src/qemu/meson.build +++ b/src/qemu/meson.build @@ -105,6 +105,7 @@ if conf.has('WITH_QEMU') selinux_dep, src_dep, xdr_dep, + bpf_dep, ], include_directories: [ conf_inc_dir, diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 8856bb95a8..a3a43a43c5 100644 --- a/src/qemu/qemu_interface.c +++ 
b/src/qemu/qemu_interface.c @@ -36,6 +36,7 @@ #include <sys/stat.h> #include <fcntl.h> +#include <bpf/libbpf.h> #define VIR_FROM_THIS VIR_FROM_QEMU @@ -763,3 +764,44 @@ qemuInterfaceOpenVhostNet(virDomainObj *vm, virDomainAuditNetDevice(vm->def, net, vhostnet_path, vhostfdSize); return 0; } + + +int qemuInterfaceLoadEbpf(const void *ebpfObject, size_t ebpfSize, void **retLibbpfObj, int *fds, int nfds) { + int err = 0; + int i = 0; + + struct bpf_object *obj = bpf_object__open_mem(ebpfObject, ebpfSize, NULL); + err = libbpf_get_error(obj); + if(err) { + return -1; + } + + struct bpf_program *prog; + struct bpf_map *map; + + err = bpf_object__load(obj); + if (err) { + return -1; + } + + bpf_object__for_each_program(prog, obj) { + fds[i] = bpf_program__fd(prog); + ++i; + } + + bpf_object__for_each_map(map, obj) { + fds[i] = bpf_map__fd(map); + ++i; + } + + *retLibbpfObj = obj; + + return i; +} + + +void qemuInterfaceCloseEbpf(void *libbpfObj) +{ + if(libbpfObj) + bpf_object__close(libbpfObj); +} diff --git a/src/qemu/qemu_interface.h b/src/qemu/qemu_interface.h index 6eed3e6bd7..dbc1fd625c 100644 --- a/src/qemu/qemu_interface.h +++ b/src/qemu/qemu_interface.h @@ -55,3 +55,7 @@ int qemuInterfaceOpenVhostNet(virDomainObj *def, int qemuInterfacePrepareSlirp(virQEMUDriver *driver, virDomainNetDef *net); + +int qemuInterfaceLoadEbpf(const void *ebpfObject, size_t ebpfSize, void **retLibbpfObj, int *fds, int nfds); + +void qemuInterfaceCloseEbpf(void *libbpfObj); -- 2.42.0

On Mon, Oct 09, 2023 at 09:16:13 +0300, Andrew Melnychenko wrote:
Also, added dependencies for libbpf with bpf option.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- meson.build | 6 ++++++ meson_options.txt | 1 + src/qemu/meson.build | 1 + src/qemu/qemu_interface.c | 42 +++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_interface.h | 4 ++++ 5 files changed, 54 insertions(+)
Yet again fails to compile:

a.p/qemu_interface.c.o -c ../../../libvirt/src/qemu/qemu_interface.c
../../../libvirt/src/qemu/qemu_interface.c: In function ‘qemuInterfaceLoadEbpf’:
../../../libvirt/src/qemu/qemu_interface.c:779:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  779 |     struct bpf_program *prog;
      |     ^~~~~~
../../../libvirt/src/qemu/qemu_interface.c:769:103: error: unused parameter ‘nfds’ [-Werror=unused-parameter]
  769 | int qemuInterfaceLoadEbpf(const void *ebpfObject, size_t ebpfSize, void **retLibbpfObj, int *fds, int nfds) {
      |                                                                                                   ~~~~^~~~

Fixing that also shows that the syntax-check rules were also broken.
diff --git a/meson.build b/meson.build index de23fbda1e..b68e916246 100644 --- a/meson.build +++ b/meson.build @@ -1381,6 +1381,11 @@ if yajl_dep.found() conf.set('WITH_YAJL', 1) endif
+bpf_version = '1.1.0' +bpf_dep = dependency('libbpf', version: '>=' + bpf_version, required: get_option('bpf')) +if bpf_dep.found() + conf.set('WITH_BPF', 1) +endif
# generic build dependencies checks
@@ -2269,6 +2274,7 @@ libs_summary = { 'udev': udev_dep.found(), 'xdr': xdr_dep.found(), 'yajl': yajl_dep.found(), + 'bpf': bpf_dep.found(), } summary(libs_summary, section: 'Libraries', bool_yn: true)
diff --git a/meson_options.txt b/meson_options.txt index 7c428a9eb0..092b2efe06 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -44,6 +44,7 @@ option('udev', type: 'feature', value: 'auto', description: 'udev support') option('wireshark_dissector', type: 'feature', value: 'auto', description: 'wireshark support') option('wireshark_plugindir', type: 'string', value: '', description: 'wireshark plugins directory for use when installing wireshark plugin') option('yajl', type: 'feature', value: 'auto', description: 'yajl support') +option('bpf', type: 'feature', value: 'auto', description: 'qemu libbpf support')
# build driver options diff --git a/src/qemu/meson.build b/src/qemu/meson.build index 64c62e584f..afd9133ae0 100644 --- a/src/qemu/meson.build +++ b/src/qemu/meson.build @@ -105,6 +105,7 @@ if conf.has('WITH_QEMU') selinux_dep, src_dep, xdr_dep, + bpf_dep, ], include_directories: [ conf_inc_dir, diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 8856bb95a8..a3a43a43c5 100644 --- a/src/qemu/qemu_interface.c +++ b/src/qemu/qemu_interface.c @@ -36,6 +36,7 @@
#include <sys/stat.h> #include <fcntl.h> +#include <bpf/libbpf.h>
#define VIR_FROM_THIS VIR_FROM_QEMU
@@ -763,3 +764,44 @@ qemuInterfaceOpenVhostNet(virDomainObj *vm, virDomainAuditNetDevice(vm->def, net, vhostnet_path, vhostfdSize); return 0; } + +
Please document this function. It's not very obvious what it does on the first glance.
+int qemuInterfaceLoadEbpf(const void *ebpfObject, size_t ebpfSize, void **retLibbpfObj, int *fds, int nfds) {
+    int err = 0;
+    int i = 0;
+
+    struct bpf_object *obj = bpf_object__open_mem(ebpfObject, ebpfSize, NULL);
This is compiled regardless of 'WITH_BPF' flag ...
+    err = libbpf_get_error(obj);
+    if(err) {
+        return -1;
+    }
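The WITH_BPF remark above boils down to guarding the libbpf calls and giving callers a graceful "not built with libbpf" result. A minimal stdlib-only sketch of that guard pattern (the function name and shape are hypothetical; the real code would call bpf_object__open_mem()/bpf_object__load() inside the guarded branch):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: compile the libbpf path only when WITH_BPF is defined,
 * otherwise fall back to a stub that reports "not supported". */
static int demo_load_ebpf(const void *obj, size_t size, int *fds, size_t nfds)
{
#if defined(WITH_BPF)
    /* real libbpf open/load/fd-collection would go here */
    (void)obj; (void)size; (void)fds; (void)nfds;
    return 0;
#else
    (void)obj; (void)size; (void)fds; (void)nfds;
    return -1;   /* built without libbpf: caller falls back to in-qemu RSS */
#endif
}
```

With this shape a build without libbpf still links, and the caller treats -1 as "continue without eBPF RSS" rather than a fatal error.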

Added logic for loading the "RSS" eBPF program. eBPF file descriptors passed to the QEMU. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- src/qemu/qemu_command.c | 53 +++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 4 ++++ src/qemu/qemu_domain.h | 3 +++ 3 files changed, 60 insertions(+) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 8a7b80719f..ca6cb1bc7a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3786,6 +3786,23 @@ qemuBuildLegacyNicStr(virDomainNetDef *net) NULLSTR_EMPTY(net->info.alias)); } +static char *qemuBuildEbpfRssArg(virDomainNetDef *net) { + qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net); + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + size_t nfds; + GSList *n; + + if (netpriv->ebpfrssfds) { + nfds = 0; + for (n = netpriv->ebpfrssfds; n; n = n->next) { + virBufferAsprintf(&buf, "%s:", qemuFDPassDirectGetPath(n->data)); + nfds++; + } + } + virBufferTrim(&buf, ":"); + + return virBufferContentAndReset(&buf); +} virJSONValue * qemuBuildNicDevProps(virDomainDef *def, @@ -3871,6 +3888,11 @@ qemuBuildNicDevProps(virDomainDef *def, "T:failover", failover, NULL) < 0) return NULL; + if (net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)){ + g_autofree char *ebpf = qemuBuildEbpfRssArg(net); + if (virJSONValueObjectAdd(&props, "S:ebpf_rss_fds", ebpf, NULL) < 0) + return NULL; + } } else { if (virJSONValueObjectAdd(&props, "s:driver", virDomainNetGetModelString(net), @@ -4152,6 +4174,33 @@ qemuBuildWatchdogDevProps(const virDomainDef *def, } +static void qemuOpenEbpfRssFds(virDomainNetDef *net, + virQEMUCaps *qemuCaps) { + const void *ebpfRSSObject = NULL; + size_t ebpfRSSObjectSize = 0; + int fds[16]; + int nfds = 0; + qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net); + + netpriv->libbpfRSSObject = NULL; + netpriv->ebpfrssfds = NULL; + + /* Add ebpf values */ + if 
(net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)) { + ebpfRSSObject = virQEMUCapsGetEbpf(qemuCaps, "rss", &ebpfRSSObjectSize); + nfds = qemuInterfaceLoadEbpf(ebpfRSSObject, ebpfRSSObjectSize, &netpriv->libbpfRSSObject, fds, 16); + + if (nfds > 0) { + for (int i = 0; i < nfds; ++i) { + g_autofree char *name = g_strdup_printf("ebpfrssfd-%s-%u", net->info.alias, i); + netpriv->ebpfrssfds = g_slist_prepend(netpriv->ebpfrssfds, qemuFDPassDirectNew(name, fds + i)); + } + netpriv->ebpfrssfds = g_slist_reverse(netpriv->ebpfrssfds); + } + } +} + + static int qemuBuildWatchdogCommandLine(virCommand *cmd, const virDomainDef *def, @@ -8735,6 +8784,10 @@ qemuBuildInterfaceCommandLine(virQEMUDriver *driver, qemuFDPassDirectTransferCommand(netpriv->slirpfd, cmd); qemuFDPassTransferCommand(netpriv->vdpafd, cmd); + qemuOpenEbpfRssFds(net, qemuCaps); + for (n = netpriv->ebpfrssfds; n; n = n->next) + qemuFDPassDirectTransferCommand(n->data, cmd); + if (!(hostnetprops = qemuBuildHostNetProps(vm, net))) goto cleanup; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index eec7bed28b..c696482f29 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -38,6 +38,7 @@ #include "qemu_checkpoint.h" #include "qemu_validate.h" #include "qemu_namespace.h" +#include "qemu_interface.h" #include "viralloc.h" #include "virlog.h" #include "virerror.h" @@ -1078,6 +1079,7 @@ qemuDomainNetworkPrivateClearFDs(qemuDomainNetworkPrivate *priv) g_clear_pointer(&priv->vdpafd, qemuFDPassFree); g_slist_free_full(g_steal_pointer(&priv->vhostfds), (GDestroyNotify) qemuFDPassDirectFree); g_slist_free_full(g_steal_pointer(&priv->tapfds), (GDestroyNotify) qemuFDPassDirectFree); + g_slist_free_full(g_steal_pointer(&priv->ebpfrssfds), (GDestroyNotify) qemuFDPassDirectFree); } @@ -1089,6 +1091,8 @@ qemuDomainNetworkPrivateDispose(void *obj G_GNUC_UNUSED) qemuSlirpFree(priv->slirp); qemuDomainNetworkPrivateClearFDs(priv); + + 
qemuInterfaceCloseEbpf(priv->libbpfRSSObject); } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index a3fc6acaaa..5e155b77cc 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -431,6 +431,9 @@ struct _qemuDomainNetworkPrivate { GSList *tapfds; /* qemuFDPassDirect */ GSList *vhostfds; /* qemuFDPassDirect */ qemuFDPass *vdpafd; + + void *libbpfRSSObject; + GSList *ebpfrssfds; /* qemuFDPassDirect eBPF RSS fds from helper */ }; -- 2.42.0

On Mon, Oct 09, 2023 at 09:16:14 +0300, Andrew Melnychenko wrote:
Added logic for loading the "RSS" eBPF program. eBPF file descriptors passed to the QEMU.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- src/qemu/qemu_command.c | 53 +++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 4 ++++ src/qemu/qemu_domain.h | 3 +++ 3 files changed, 60 insertions(+)
This will require addition to qemuxml2argvtest once the qemu bit is ready.
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 8a7b80719f..ca6cb1bc7a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3786,6 +3786,23 @@ qemuBuildLegacyNicStr(virDomainNetDef *net) NULLSTR_EMPTY(net->info.alias)); }
+static char *qemuBuildEbpfRssArg(virDomainNetDef *net) {
+    qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net);
+    g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER;
+    size_t nfds;
+    GSList *n;
+
+    if (netpriv->ebpfrssfds) {
This check is not needed.
+        nfds = 0;
+        for (n = netpriv->ebpfrssfds; n; n = n->next) {
+            virBufferAsprintf(&buf, "%s:", qemuFDPassDirectGetPath(n->data));
+            nfds++;
'nfds' is not used.
+        }
+    }
+    virBufferTrim(&buf, ":");
+
+    return virBufferContentAndReset(&buf);
+}
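The value being built here is just the fd-pass names joined with ':' for the ebpf_rss_fds property. A stdlib-only sketch of that joining step (names and helper are illustrative; the patch uses virBuffer for this):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join names with ':' into out, e.g. "a:b:c".
 * Returns out on success, NULL if the buffer would overflow. */
static char *demo_join_fds(const char **names, size_t n, char *out, size_t outlen)
{
    size_t used = 0;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = snprintf(out + used, outlen - used, "%s%s",
                         i ? ":" : "", names[i]);
        if (w < 0 || (size_t)w >= outlen - used)
            return NULL;  /* truncated */
        used += (size_t)w;
    }
    return out;
}
```

Joining with a separator between elements (rather than appending one after every element and trimming) avoids the virBufferTrim step.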
virJSONValue * qemuBuildNicDevProps(virDomainDef *def, @@ -3871,6 +3888,11 @@ qemuBuildNicDevProps(virDomainDef *def, "T:failover", failover, NULL) < 0) return NULL; + if (net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)){
This check should not be needed here ...
+ g_autofree char *ebpf = qemuBuildEbpfRssArg(net);
Because this will return a NULL string if it was not set up.
+ if (virJSONValueObjectAdd(&props, "S:ebpf_rss_fds", ebpf, NULL) < 0)
This is definitely not needed in a separate step and can be done in the block above, as "S:" skips the attribute if the argument is NULL.
+ return NULL; + } } else { if (virJSONValueObjectAdd(&props, "s:driver", virDomainNetGetModelString(net), @@ -4152,6 +4174,33 @@ qemuBuildWatchdogDevProps(const virDomainDef *def, }
+static void qemuOpenEbpfRssFds(virDomainNetDef *net,
+                               virQEMUCaps *qemuCaps) {
+    const void *ebpfRSSObject = NULL;
+    size_t ebpfRSSObjectSize = 0;
+    int fds[16];
16 feels arbitrary
+ int nfds = 0; + qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net); + + netpriv->libbpfRSSObject = NULL; + netpriv->ebpfrssfds = NULL; + + /* Add ebpf values */ + if (net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)) {
This line is too long and there's a good spot to break it up.
+ ebpfRSSObject = virQEMUCapsGetEbpf(qemuCaps, "rss", &ebpfRSSObjectSize); + nfds = qemuInterfaceLoadEbpf(ebpfRSSObject, ebpfRSSObjectSize, &netpriv->libbpfRSSObject, fds, 16);
Note that the '16' argument is unused by the callee. Also note that when running an unprivileged instance of libvirt, this function might not have permissions to do anything. The rest seems okay, but if running unprivileged really is acceptable and fallback paths exist, we need to make sure no spurious errors are reported in that case.
+ + if (nfds > 0) { + for (int i = 0; i < nfds; ++i) { + g_autofree char *name = g_strdup_printf("ebpfrssfd-%s-%u", net->info.alias, i); + netpriv->ebpfrssfds = g_slist_prepend(netpriv->ebpfrssfds, qemuFDPassDirectNew(name, fds + i)); + } + netpriv->ebpfrssfds = g_slist_reverse(netpriv->ebpfrssfds); + } + } +} + + static int qemuBuildWatchdogCommandLine(virCommand *cmd, const virDomainDef *def, @@ -8735,6 +8784,10 @@ qemuBuildInterfaceCommandLine(virQEMUDriver *driver, qemuFDPassDirectTransferCommand(netpriv->slirpfd, cmd); qemuFDPassTransferCommand(netpriv->vdpafd, cmd);
+ qemuOpenEbpfRssFds(net, qemuCaps);
Note that this will cause problems when you'll be about to test this functionality in qemuxml2argvtest. The test code must not modify the host state, so this will either need to be mocked in the testsuite or you'll need to move it to the host-prepare function 'qemuProcessPrepareHost' which is skipped from the test suite. The test suite will then need to fake the data.
+ for (n = netpriv->ebpfrssfds; n; n = n->next) + qemuFDPassDirectTransferCommand(n->data, cmd); + if (!(hostnetprops = qemuBuildHostNetProps(vm, net))) goto cleanup;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index eec7bed28b..c696482f29 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -38,6 +38,7 @@ #include "qemu_checkpoint.h" #include "qemu_validate.h" #include "qemu_namespace.h" +#include "qemu_interface.h" #include "viralloc.h" #include "virlog.h" #include "virerror.h" @@ -1078,6 +1079,7 @@ qemuDomainNetworkPrivateClearFDs(qemuDomainNetworkPrivate *priv) g_clear_pointer(&priv->vdpafd, qemuFDPassFree); g_slist_free_full(g_steal_pointer(&priv->vhostfds), (GDestroyNotify) qemuFDPassDirectFree); g_slist_free_full(g_steal_pointer(&priv->tapfds), (GDestroyNotify) qemuFDPassDirectFree); + g_slist_free_full(g_steal_pointer(&priv->ebpfrssfds), (GDestroyNotify) qemuFDPassDirectFree); }
@@ -1089,6 +1091,8 @@ qemuDomainNetworkPrivateDispose(void *obj G_GNUC_UNUSED) qemuSlirpFree(priv->slirp);
qemuDomainNetworkPrivateClearFDs(priv); + + qemuInterfaceCloseEbpf(priv->libbpfRSSObject);
Does libvirt need to keep this open during the whole lifetime of the VM? Also note that this code will be called when restarting the libvirtd/virtqemud daemon, thus it can be called while the VM is alive.

Hi all, On Mon, Oct 9, 2023 at 1:54 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Oct 09, 2023 at 09:16:14 +0300, Andrew Melnychenko wrote:
Added logic for loading the "RSS" eBPF program. eBPF file descriptors passed to the QEMU.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com> --- src/qemu/qemu_command.c | 53 +++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 4 ++++ src/qemu/qemu_domain.h | 3 +++ 3 files changed, 60 insertions(+)
This will require addition to qemuxml2argvtest once the qemu bit is ready.
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 8a7b80719f..ca6cb1bc7a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3786,6 +3786,23 @@ qemuBuildLegacyNicStr(virDomainNetDef *net) NULLSTR_EMPTY(net->info.alias)); }
+static char *qemuBuildEbpfRssArg(virDomainNetDef *net) { + qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net); + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + size_t nfds; + GSList *n; + + if (netpriv->ebpfrssfds) {
This check is not needed.
+ nfds = 0; + for (n = netpriv->ebpfrssfds; n; n = n->next) { + virBufferAsprintf(&buf, "%s:", qemuFDPassDirectGetPath(n->data)); + nfds++;
'nfds' is not used.
+ } + } + virBufferTrim(&buf, ":"); + + return virBufferContentAndReset(&buf); +}
virJSONValue * qemuBuildNicDevProps(virDomainDef *def, @@ -3871,6 +3888,11 @@ qemuBuildNicDevProps(virDomainDef *def, "T:failover", failover, NULL) < 0) return NULL; + if (net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)){
This check should not be needed here ...
+ g_autofree char *ebpf = qemuBuildEbpfRssArg(net);
Because this will return a NULL string if it was not set up.
+ if (virJSONValueObjectAdd(&props, "S:ebpf_rss_fds", ebpf, NULL) < 0)
This is definitely not needed in a separate step and can be done in the block above as "S:" skipps the attribute if the argument is NULL.
+            return NULL;
+    }
     } else {
         if (virJSONValueObjectAdd(&props,
                                   "s:driver", virDomainNetGetModelString(net),
@@ -4152,6 +4174,33 @@ qemuBuildWatchdogDevProps(const virDomainDef *def,
 }
+static void qemuOpenEbpfRssFds(virDomainNetDef *net,
+                               virQEMUCaps *qemuCaps) {
+    const void *ebpfRSSObject = NULL;
+    size_t ebpfRSSObjectSize = 0;
+    int fds[16];
16 feels arbitrary
Yes, it should be big enough. In the future QEMU may have fewer or more program and map file descriptors. I'm thinking about a routine that could allocate an fd array of the exact size during the eBPF load.
+    int nfds = 0;
+    qemuDomainNetworkPrivate *netpriv = QEMU_DOMAIN_NETWORK_PRIVATE(net);
+
+    netpriv->libbpfRSSObject = NULL;
+    netpriv->ebpfrssfds = NULL;
+
+    /* Add ebpf values */
+    if (net->driver.virtio.rss == VIR_TRISTATE_SWITCH_ON && virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_EBPF_RSS_FDS)) {
This line is too long and there's a good spot to break it up;
+        ebpfRSSObject = virQEMUCapsGetEbpf(qemuCaps, "rss", &ebpfRSSObjectSize);
+        nfds = qemuInterfaceLoadEbpf(ebpfRSSObject, ebpfRSSObjectSize, &netpriv->libbpfRSSObject, fds, 16);
Note that the '16' argument is unused by the callee.
Also note that when running an unprivileged instance of libvirt, this function might not have permissions to do anything. It seems that the rest is okay, but we need to avoid any form of spurious errors and such if that is really okay and there are fallback paths.
+
+        if (nfds > 0) {
+            for (int i = 0; i < nfds; ++i) {
+                g_autofree char *name = g_strdup_printf("ebpfrssfd-%s-%u", net->info.alias, i);
+                netpriv->ebpfrssfds = g_slist_prepend(netpriv->ebpfrssfds, qemuFDPassDirectNew(name, fds + i));
+            }
+            netpriv->ebpfrssfds = g_slist_reverse(netpriv->ebpfrssfds);
+        }
+    }
+}
+
+
 static int
 qemuBuildWatchdogCommandLine(virCommand *cmd,
                              const virDomainDef *def,
@@ -8735,6 +8784,10 @@ qemuBuildInterfaceCommandLine(virQEMUDriver *driver,
     qemuFDPassDirectTransferCommand(netpriv->slirpfd, cmd);
     qemuFDPassTransferCommand(netpriv->vdpafd, cmd);
+ qemuOpenEbpfRssFds(net, qemuCaps);
Note that this will cause problems when you'll be about to test this functionality in qemuxml2argvtest. The test code must not modify the host state, so this will either need to be mocked in the testsuite or you'll need to move it to the host-prepare function 'qemuProcessPrepareHost' which is skipped from the test suite. The test suite will then need to fake the data.
I think it will be mocked.
+    for (n = netpriv->ebpfrssfds; n; n = n->next)
+        qemuFDPassDirectTransferCommand(n->data, cmd);
+
     if (!(hostnetprops = qemuBuildHostNetProps(vm, net)))
         goto cleanup;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index eec7bed28b..c696482f29 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -38,6 +38,7 @@
 #include "qemu_checkpoint.h"
 #include "qemu_validate.h"
 #include "qemu_namespace.h"
+#include "qemu_interface.h"
 #include "viralloc.h"
 #include "virlog.h"
 #include "virerror.h"
@@ -1078,6 +1079,7 @@ qemuDomainNetworkPrivateClearFDs(qemuDomainNetworkPrivate *priv)
     g_clear_pointer(&priv->vdpafd, qemuFDPassFree);
     g_slist_free_full(g_steal_pointer(&priv->vhostfds), (GDestroyNotify) qemuFDPassDirectFree);
     g_slist_free_full(g_steal_pointer(&priv->tapfds), (GDestroyNotify) qemuFDPassDirectFree);
+    g_slist_free_full(g_steal_pointer(&priv->ebpfrssfds), (GDestroyNotify) qemuFDPassDirectFree);
 }
@@ -1089,6 +1091,8 @@ qemuDomainNetworkPrivateDispose(void *obj G_GNUC_UNUSED)
     qemuSlirpFree(priv->slirp);
     qemuDomainNetworkPrivateClearFDs(priv);
+
+    qemuInterfaceCloseEbpf(priv->libbpfRSSObject);
Does libvirt need to keep this open during the whole lifetime of the VM? Also note that this code will be called when restarting the libvirtd/virtqemud daemon, thus it can be called while the VM is alive.
I believe we can close the libbpf context right after QEMU launch (after the fds are passed to the QEMU process). The idea was that if the private network context holds the libbpf resources, it will destroy them along with itself.

On Mon, Oct 09, 2023 at 09:16:10 +0300, Andrew Melnychenko wrote:
This series of RFC patches adds support for loading the RSS eBPF program and passing it to QEMU. Comments and suggestions would be useful.
QEMU with vhost may work with RSS through eBPF. Loading eBPF requires capabilities that Libvirt can provide. The eBPF program and maps may be unique to a particular QEMU build, and Libvirt retrieves the eBPF through QAPI. For now, there is only the "RSS" eBPF object in QEMU; in the future, there may be others (e.g. network filters). That's why logic was added to Libvirt to load and store any eBPF object that QEMU provides through the QAPI schema.
For virtio-net RSS, the documentation has not changed.
```
<interface type="network">
  <model type="virtio"/>
  <driver queues="4" rss="on" rss_hash_report="off"/>
</interface>
```
Simplified routine for RSS:
* Libvirt retrieves the eBPF "RSS" object and loads it.
* Libvirt passes the file descriptors to virtio-net via the "ebpf_rss_fds" property (the "rss" property should be "on" too).
* If fds were provided, QEMU uses the eBPF RSS implementation.
* If fds were not provided, QEMU tries to load eBPF RSS in its own context and use it.
* If eBPF RSS was not loaded, QEMU falls back to "in-qemu" RSS (vhost is not supported).
Hi, please note that in my review following this mail I'll be mostly commenting on general problems, coding style problems, and the capabilities probing and caching. I'm not familiar with what this feature is supposed to do, as I'm involved more with storage and I maintain the qemu capability code. Thus input from others may be needed in terms of the actual feature.
 meson.build                  |   6 ++
 meson_options.txt            |   1 +
 src/qemu/meson.build         |   1 +
 src/qemu/qemu_capabilities.c | 181 +++++++++++++++++++++++++++++++++++
 src/qemu/qemu_capabilities.h |   4 +
 src/qemu/qemu_command.c      |  53 ++++++++++
 src/qemu/qemu_domain.c       |   4 +
 src/qemu/qemu_domain.h       |   3 +
 src/qemu/qemu_interface.c    |  42 ++++++++
 src/qemu/qemu_interface.h    |   4 +
 src/qemu/qemu_monitor.c      |  23 +++++
 src/qemu/qemu_monitor.h      |   3 +
 src/qemu/qemu_monitor_json.c |  21 ++++
 src/qemu/qemu_monitor_json.h |   3 +
During a quick look at the patches I've noticed a few repeated coding style problems. Note that our preferred function declaration/definition header is:

virReturnType *
virFunctionName(virFirstArg argname,
                virSecondArg *argval,
                virEtc etc)
{
}

We also keep two blank lines between functions. Additionally, we require C90-style variable declarations (variable declarations must not mix with code inside a block). Other comments inline.

On Mon, Oct 09, 2023 at 10:15:34 +0200, Peter Krempa wrote:
On Mon, Oct 09, 2023 at 09:16:10 +0300, Andrew Melnychenko wrote:
This series of RFC patches adds support for loading the RSS eBPF program and passing it to QEMU. Comments and suggestions would be useful.
QEMU with vhost may work with RSS through eBPF. Loading eBPF requires capabilities that Libvirt can provide. The eBPF program and maps may be unique to a particular QEMU build, and Libvirt retrieves the eBPF through QAPI. For now, there is only the "RSS" eBPF object in QEMU; in the future, there may be others (e.g. network filters). That's why logic was added to Libvirt to load and store any eBPF object that QEMU provides through the QAPI schema.
For virtio-net RSS, the documentation has not changed.
```
<interface type="network">
  <model type="virtio"/>
  <driver queues="4" rss="on" rss_hash_report="off"/>
</interface>
```
Simplified routine for RSS:
* Libvirt retrieves the eBPF "RSS" object and loads it.
* Libvirt passes the file descriptors to virtio-net via the "ebpf_rss_fds" property (the "rss" property should be "on" too).
* If fds were provided, QEMU uses the eBPF RSS implementation.
* If fds were not provided, QEMU tries to load eBPF RSS in its own context and use it.
* If eBPF RSS was not loaded, QEMU falls back to "in-qemu" RSS (vhost is not supported).
Hi,
Also, what is the status of the qemu patches, specifically the QMP interface? I see that the patch was in v1 of Jason's 'Net patches' pull request in September, but was dropped in v2 for some reason. Specifically, our policy is that a feature depending on new qemu interfaces can be merged into libvirt only after the code has been committed to qemu. We do not require that it is released; in fact we follow the development cycle with the "qemu capabilities" updates done throughout the dev cycle. Once the qemu code is merged I can do the required update of the capability test output (tests/qemucapabilitiesdata/...replies) file based on your patch so that churn and your manual input are minimized.

Hi all, thank you for your comments.

On Mon, Oct 9, 2023 at 11:22 AM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Oct 09, 2023 at 10:15:34 +0200, Peter Krempa wrote:
On Mon, Oct 09, 2023 at 09:16:10 +0300, Andrew Melnychenko wrote:
This series of RFC patches adds support for loading the RSS eBPF program and passing it to QEMU. Comments and suggestions would be useful.
QEMU with vhost may work with RSS through eBPF. Loading eBPF requires capabilities that Libvirt can provide. The eBPF program and maps may be unique to a particular QEMU build, and Libvirt retrieves the eBPF through QAPI. For now, there is only the "RSS" eBPF object in QEMU; in the future, there may be others (e.g. network filters). That's why logic was added to Libvirt to load and store any eBPF object that QEMU provides through the QAPI schema.
For virtio-net RSS, the documentation has not changed.
```
<interface type="network">
  <model type="virtio"/>
  <driver queues="4" rss="on" rss_hash_report="off"/>
</interface>
```
Simplified routine for RSS:
* Libvirt retrieves the eBPF "RSS" object and loads it.
* Libvirt passes the file descriptors to virtio-net via the "ebpf_rss_fds" property (the "rss" property should be "on" too).
* If fds were provided, QEMU uses the eBPF RSS implementation.
* If fds were not provided, QEMU tries to load eBPF RSS in its own context and use it.
* If eBPF RSS was not loaded, QEMU falls back to "in-qemu" RSS (vhost is not supported).
Hi,
Also, what is the status of the qemu patches, specifically the QMP interface? I see that the patch was in v1 of Jason's 'Net patches' pull request in September, but was dropped in v2 for some reason.
As far as I know, the eBPF QMP patches were in the queue (https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg05859.html). I'll recheck it later.
Specifically, our policy is that a feature depending on new qemu interfaces can be merged into libvirt only after the code has been committed to qemu.
We do not require that it is released; in fact we follow the development cycle with the "qemu capabilities" updates done throughout the dev cycle.
Once the qemu code is merged I can do the required update of the capability test output (tests/qemucapabilitiesdata/...replies) file based on your patch so that churn and your manual input are minimized.
Yes, these are RFC patches, so once QEMU merges the QMP eBPF support, the Libvirt solution should be properly ready.