Re: [libvirt PATCH] RFC: Add support for vDPA network devices

27 Aug 2020

I got some feedback from John Ferlan on a different forum about missing
handling of migration and qemu capabilities. I'm adding this to my
patch, but I'd appreciate any additional feedback, particularly on the
xml format and the managed=yes/node-device questions below.


On Mon, 2020-08-24 at 16:33 -0500, Jonathon Jongsma wrote:
...
On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
...
On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
...
vDPA network devices allow high-performance networking in a
virtual
machine by providing a wire-speed data path. These devices
require
a
vendor-specific host driver but the data path follows the virtio
specification.
The support for vDPA devices was recently added to qemu. This
allows
libvirt to support these devices. It requires that the device is
configured on the host with the appropriate vendor-specific
driver.
This will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
That
chardev path can then be used to define a new interface with
type='vdpa'.
---
[snip]
...
...
+
+::
+
+   ...
+   <devices>
+     <interface type='vdpa'>
+       <source dev='/dev/vhost-vdpa-0'/>
(The above device is created (I just learned this from you in IRC!)
by 
unbinding a VF from its NIC driver on the host, and re-binding it
to
a 
special VDPA-VF driver.)
As we were just discussing online, on one hand it could be nice if 
libvirt could automatically handle rebinding the VF to the vdpa
host 
driver (given the PCI address of the VF), to make it easier to use 
(because no advance setup would be needed), similar to what's
already 
done with hostdev devices (and <interface type='hostdev'>) when 
managed='yes' (which is the default setting).
On the other hand, it is exactly that managed='yes' functionality
that 
has created more "libvirt-but-not-really-libvirt" bug reports than
any 
other aspect of vfio device assignment, because the process of
unbinding 
and rebinding drivers is timing-sensitive and causes code that's
usually 
run only once at host boot-time to be run hundreds of times thus
making 
it more likely to expose infrequently-hit bugs.
I just bring this up in advance of someone suggesting the addition
of 
managed='yes' here to put in my vote for *not* doing it, and
instead 
using that same effort to provide some sort of API in the node-
device 
driver for easily creating one or more VDPA devices from VFs,
which 
could be done once at host boot time, and thus avoid the level of 
"libvirt-not-libvirt" bug reports for VDPA. (and after that maybe
even 
an API to allocate a device from that pool to be used by a guest).
But 
that's for later.
I'd really love to hear some additional opinions on this topic from
some more of the libvirt "old-timers". I intentionally kept the
initial
vdpa support simple by requiring that the device is already setup and
bound to the appropriate driver. But I want to make sure that we can
add additional capabilities later (as Laine suggested) with this same
API/schema.
Anybody else have thoughts on this topic?
...
...
+     </interface>
+   </devices>
+   ...
+
  :anchor:`<a id="elementsTeaming"/>`
Teaming a virtio/hostdev NIC pair

diff --git a/docs/schemas/domaincommon.rng
b/docs/schemas/domaincommon.rng
index 0d0dcbc5ce..17f74490f4 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -3108,6 +3108,21 @@
              <ref name="interface-options"/>
            </interleave>
          </group>
+
+        <group>
+          <attribute name="type">
+            <value>vdpa</value>
+          </attribute>
+          <interleave>
+            <element name="source">
+              <attribute name="dev">
+                <ref name="deviceName"/>
+              </attribute>
+            </element>
+            <ref name="interface-options"/>
+          </interleave>
+        </group>
+
        </choice>
        <optional>
          <attribute name="trustGuestRxFilters">
[snip]
...
...
diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
new file mode 100644
index 0000000000..8e76ac7794
--- /dev/null
+++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
@@ -0,0 +1,37 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-i386 \
+-name guest=QEMUGuest1,debug-threads=on \
+-S \
+-object secret,id=masterKey0,format=raw,\
+file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
+-machine pc,accel=tcg,usb=off,dump-guest-core=off \
+-cpu qemu64 \
+-m 214 \
+-overcommit mem-lock=off \
+-smp 1,sockets=1,cores=1,threads=1 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,fd=1729,server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-boot strict=on \
+-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
+-add-fd set=0,fd=1732 \
+-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
Okay, I'm feeling too lazy to parse through the code above an see
how 
you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look
right. 
Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."?
The 
document I have shows that syntax is supported, so there shouldn't
be 
any need to do the add-fd stuff in this case.
The initial proposal for vdpa in qemu supported both vhostdev= and
fd=
parameters, but the final implementation does not actually support an
fd= parameter here. The recommended way to pass an pre-opened fd to
qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method
as
shown above.
As far as I understand from reading qemu code, /dev/fdset/N is not
actually a file that exists in the filesystem. Instead, when you call
'add-fd', qemu adds that fd to an internal mapping that maps from the
fdset specified by set=N to the passed fd. Whenever qemu tries to
open
a file, qemu_open() has special handling for filenames that start
with
"/dev/fdset/": it looks up the fd associated with that fdset id and
returns that instead of attempting to open the file path.
So I think the above should be correct.
...
I think the next step should be to find some hardware and give this
a 
smoke test! :-)
Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa
support) to try to get this tested properly. Unfortunately I've been
unable to get vdpa_sim working and I don't have access to hardware at
the moment.
Thanks for the review!
Jonathon