I got some feedback from John Ferlan on a different forum about missing
handling of migration and qemu capabilities. I'm adding this to my
patch, but I'd appreciate any additional feedback, particularly on the
xml format and the managed=yes/node-device questions below.
On Mon, 2020-08-24 at 16:33 -0500, Jonathon Jongsma wrote:
On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
> On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
> > vDPA network devices allow high-performance networking in a
> > virtual
> > machine by providing a wire-speed data path. These devices
> > require
> > a
> > vendor-specific host driver but the data path follows the virtio
> > specification.
> >
> > The support for vDPA devices was recently added to qemu. This
> > allows
> > libvirt to support these devices. It requires that the device is
> > configured on the host with the appropriate vendor-specific
> > driver.
> > This will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
> > That
> > chardev path can then be used to define a new interface with
> > type='vdpa'.
> > ---
[snip]
>
> > +
> > +::
> > +
> > + ...
> > + <devices>
> > + <interface type='vdpa'>
> > + <source dev='/dev/vhost-vdpa-0'/>
>
> (The above device is created (I just learned this from you in IRC!)
> by
> unbinding a VF from its NIC driver on the host, and re-binding it
> to
> a
> special VDPA-VF driver.)
>
>
> As we were just discussing online, on one hand it could be nice if
> libvirt could automatically handle rebinding the VF to the vdpa
> host
> driver (given the PCI address of the VF), to make it easier to use
> (because no advance setup would be needed), similar to what's
> already
> done with hostdev devices (and <interface type='hostdev'>) when
> managed='yes' (which is the default setting).
>
>
> On the other hand, it is exactly that managed='yes' functionality
> that
> has created more "libvirt-but-not-really-libvirt" bug reports than
> any
> other aspect of vfio device assignment, because the process of
> unbinding
> and rebinding drivers is timing-sensitive and causes code that's
> usually
> run only once at host boot-time to be run hundreds of times thus
> making
> it more likely to expose infrequently-hit bugs.
>
>
> I just bring this up in advance of someone suggesting the addition
> of
> managed='yes' here to put in my vote for *not* doing it, and
> instead
> using that same effort to provide some sort of API in the node-
> device
> driver for easily creating one or more VDPA devices from VFs,
> which
> could be done once at host boot time, and thus avoid the level of
> "libvirt-not-libvirt" bug reports for VDPA. (and after that maybe
> even
> an API to allocate a device from that pool to be used by a guest).
> But
> that's for later.
I'd really love to hear some additional opinions on this topic from
some more of the libvirt "old-timers". I intentionally kept the
initial
vdpa support simple by requiring that the device is already setup and
bound to the appropriate driver. But I want to make sure that we can
add additional capabilities later (as Laine suggested) with this same
API/schema.
Anybody else have thoughts on this topic?
>
> > + </interface>
> > + </devices>
> > + ...
> > +
> > :anchor:`<a id="elementsTeaming"/>`
> >
> > Teaming a virtio/hostdev NIC pair
> > diff --git a/docs/schemas/domaincommon.rng
> > b/docs/schemas/domaincommon.rng
> > index 0d0dcbc5ce..17f74490f4 100644
> > --- a/docs/schemas/domaincommon.rng
> > +++ b/docs/schemas/domaincommon.rng
> > @@ -3108,6 +3108,21 @@
> > <ref name="interface-options"/>
> > </interleave>
> > </group>
> > +
> > + <group>
> > + <attribute name="type">
> > + <value>vdpa</value>
> > + </attribute>
> > + <interleave>
> > + <element name="source">
> > + <attribute name="dev">
> > + <ref name="deviceName"/>
> > + </attribute>
> > + </element>
> > + <ref name="interface-options"/>
> > + </interleave>
> > + </group>
> > +
> > </choice>
> > <optional>
> > <attribute name="trustGuestRxFilters">
> >
[snip]
> > diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > new file mode 100644
> > index 0000000000..8e76ac7794
> > --- /dev/null
> > +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> > @@ -0,0 +1,37 @@
> > +LC_ALL=C \
> > +PATH=/bin \
> > +HOME=/tmp/lib/domain--1-QEMUGuest1 \
> > +USER=test \
> > +LOGNAME=test \
> > +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
> > +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
> > +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
> > +QEMU_AUDIO_DRV=none \
> > +/usr/bin/qemu-system-i386 \
> > +-name guest=QEMUGuest1,debug-threads=on \
> > +-S \
> > +-object secret,id=masterKey0,format=raw,\
> > +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
> > +-machine pc,accel=tcg,usb=off,dump-guest-core=off \
> > +-cpu qemu64 \
> > +-m 214 \
> > +-overcommit mem-lock=off \
> > +-smp 1,sockets=1,cores=1,threads=1 \
> > +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
> > +-display none \
> > +-no-user-config \
> > +-nodefaults \
> > +-chardev socket,id=charmonitor,fd=1729,server,nowait \
> > +-mon chardev=charmonitor,id=monitor,mode=control \
> > +-rtc base=utc \
> > +-no-shutdown \
> > +-no-acpi \
> > +-boot strict=on \
> > +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> > +-add-fd set=0,fd=1732 \
> > +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
>
> Okay, I'm feeling too lazy to parse through the code above an see
> how
> you arrived at "vhostdev='/dev/fdset/0'", but that doesn't
look
> right.
> Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."?
> The
> document I have shows that syntax is supported, so there shouldn't
> be
> any need to do the add-fd stuff in this case.
The initial proposal for vdpa in qemu supported both vhostdev= and
fd=
parameters, but the final implementation does not actually support an
fd= parameter here. The recommended way to pass an pre-opened fd to
qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method
as
shown above.
As far as I understand from reading qemu code, /dev/fdset/N is not
actually a file that exists in the filesystem. Instead, when you call
'add-fd', qemu adds that fd to an internal mapping that maps from the
fdset specified by set=N to the passed fd. Whenever qemu tries to
open
a file, qemu_open() has special handling for filenames that start
with
"/dev/fdset/": it looks up the fd associated with that fdset id and
returns that instead of attempting to open the file path.
So I think the above should be correct.
>
> I think the next step should be to find some hardware and give this
> a
> smoke test! :-)
Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa
support) to try to get this tested properly. Unfortunately I've been
unable to get vdpa_sim working and I don't have access to hardware at
the moment.
Thanks for the review!
Jonathon