
On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
vDPA network devices allow high-performance networking in a virtual machine by providing a wire-speed data path. These devices require a vendor-specific host driver but the data path follows the virtio specification.
The support for vDPA devices was recently added to qemu. This allows libvirt to support these devices. It requires that the device is configured on the host with the appropriate vendor-specific driver. This will create a chardev on the host at e.g. /dev/vhost-vdpa-0. That chardev path can then be used to define a new interface with type='vdpa'. ---
[snip]
+ +:: + + ... + <devices> + <interface type='vdpa'> + <source dev='/dev/vhost-vdpa-0'/>
(The above device is created (I just learned this from you in IRC!) by unbinding a VF from its NIC driver on the host, and re-binding it to a special VDPA-VF driver.)
As we were just discussing online, on one hand it could be nice if libvirt could automatically handle rebinding the VF to the vdpa host driver (given the PCI address of the VF), to make it easier to use (because no advance setup would be needed), similar to what's already done with hostdev devices (and <interface type='hostdev'>) when managed='yes' (which is the default setting).
On the other hand, it is exactly that managed='yes' functionality that has created more "libvirt-but-not-really-libvirt" bug reports than any other aspect of vfio device assignment, because the process of unbinding and rebinding drivers is timing-sensitive and causes code that's usually run only once at host boot-time to be run hundreds of times thus making it more likely to expose infrequently-hit bugs.
I just bring this up in advance of someone suggesting the addition of managed='yes' here to put in my vote for *not* doing it, and instead using that same effort to provide some sort of API in the node- device driver for easily creating one or more VDPA devices from VFs, which could be done once at host boot time, and thus avoid the level of "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe even an API to allocate a device from that pool to be used by a guest). But that's for later.
I'd really love to hear some additional opinions on this topic from some more of the libvirt "old-timers". I intentionally kept the initial vdpa support simple by requiring that the device is already setup and bound to the appropriate driver. But I want to make sure that we can add additional capabilities later (as Laine suggested) with this same API/schema. Anybody else have thoughts on this topic?
+ </interface> + </devices> + ... + :anchor:`<a id="elementsTeaming"/>`
Teaming a virtio/hostdev NIC pair diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0d0dcbc5ce..17f74490f4 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3108,6 +3108,21 @@ <ref name="interface-options"/> </interleave> </group> + + <group> + <attribute name="type"> + <value>vdpa</value> + </attribute> + <interleave> + <element name="source"> + <attribute name="dev"> + <ref name="deviceName"/> + </attribute> + </element> + <ref name="interface-options"/> + </interleave> + </group> + </choice> <optional> <attribute name="trustGuestRxFilters">
[snip]
diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args new file mode 100644 index 0000000000..8e76ac7794 --- /dev/null +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args @@ -0,0 +1,37 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object secret,id=masterKey0,format=raw,\ +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m 214 \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-add-fd set=0,fd=1732 \ +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
Okay, I'm feeling too lazy to parse through the code above an see how you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look right. Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."? The document I have shows that syntax is supported, so there shouldn't be any need to do the add-fd stuff in this case.
The initial proposal for vdpa in qemu supported both vhostdev= and fd= parameters, but the final implementation does not actually support an fd= parameter here. The recommended way to pass an pre-opened fd to qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method as shown above. As far as I understand from reading qemu code, /dev/fdset/N is not actually a file that exists in the filesystem. Instead, when you call 'add-fd', qemu adds that fd to an internal mapping that maps from the fdset specified by set=N to the passed fd. Whenever qemu tries to open a file, qemu_open() has special handling for filenames that start with "/dev/fdset/": it looks up the fd associated with that fdset id and returns that instead of attempting to open the file path. So I think the above should be correct.
I think the next step should be to find some hardware and give this a smoke test! :-)
Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa support) to try to get this tested properly. Unfortunately I've been unable to get vdpa_sim working and I don't have access to hardware at the moment. Thanks for the review! Jonathon