On Thu, 2020-08-20 at 18:56 -0400, Laine Stump wrote:
> On 8/18/20 2:37 PM, Jonathon Jongsma wrote:
> vDPA network devices allow high-performance networking in a virtual
> machine by providing a wire-speed data path. These devices require a
> vendor-specific host driver but the data path follows the virtio
> specification.
>
> The support for vDPA devices was recently added to qemu. This allows
> libvirt to support these devices. It requires that the device is
> configured on the host with the appropriate vendor-specific driver.
> This will create a chardev on the host at e.g. /dev/vhost-vdpa-0. That
> chardev path can then be used to define a new interface with
> type='vdpa'.
> ---
[snip]
> +
> +::
> +
> +   ...
> +   <devices>
> +     <interface type='vdpa'>
> +       <source dev='/dev/vhost-vdpa-0'/>
> (The above device is created (I just learned this from you in IRC!) by
> unbinding a VF from its NIC driver on the host, and re-binding it to a
> special VDPA-VF driver.)
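(Right - for the archives, what I sketched on IRC is roughly the usual
sysfs rebind dance. The PCI address and driver name below are only
placeholders for whatever vdpa-capable VF driver the vendor provides:

  # detach the VF from its regular netdev driver
  echo 0000:3b:00.2 > /sys/bus/pci/devices/0000:3b:00.2/driver/unbind
  # steer the device to the vendor's vdpa VF driver and reprobe it
  echo vendor_vdpa_vf > /sys/bus/pci/devices/0000:3b:00.2/driver_override
  echo 0000:3b:00.2 > /sys/bus/pci/drivers_probe

Once that driver binds and the vhost_vdpa module is loaded, the
/dev/vhost-vdpa-N chardev should appear.)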
> As we were just discussing online, on one hand it could be nice if
> libvirt could automatically handle rebinding the VF to the vdpa host
> driver (given the PCI address of the VF), to make it easier to use
> (because no advance setup would be needed), similar to what's already
> done with hostdev devices (and <interface type='hostdev'>) when
> managed='yes' (which is the default setting).
>
> On the other hand, it is exactly that managed='yes' functionality that
> has created more "libvirt-but-not-really-libvirt" bug reports than any
> other aspect of vfio device assignment, because the process of
> unbinding and rebinding drivers is timing-sensitive and causes code
> that's usually run only once at host boot-time to be run hundreds of
> times, thus making it more likely to expose infrequently-hit bugs.
> I just bring this up in advance of someone suggesting the addition of
> managed='yes' here, to put in my vote for *not* doing it, and instead
> using that same effort to provide some sort of API in the node-device
> driver for easily creating one or more VDPA devices from VFs, which
> could be done once at host boot time, and thus avoid the level of
> "libvirt-not-libvirt" bug reports for VDPA. (And after that, maybe even
> an API to allocate a device from that pool to be used by a guest.) But
> that's for later.
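(For anyone who hasn't used it, the existing managed='yes' hostdev
configuration that Laine is comparing against looks something like the
following - the PCI address is purely illustrative:

  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0x0000' bus='0x3b' slot='0x00' function='0x2'/>
    </source>
  </interface>

With managed='yes', libvirt detaches the VF from its host driver and
binds it to vfio-pci when the guest starts - exactly the rebinding churn
Laine is warning about.)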
I'd really love to hear some additional opinions on this topic from
some more of the libvirt "old-timers". I intentionally kept the initial
vdpa support simple by requiring that the device is already set up and
bound to the appropriate driver. But I want to make sure that we can
add additional capabilities later (as Laine suggested) with this same
API/schema.
Anybody else have thoughts on this topic?
> +     </interface>
> +   </devices>
> +   ...
> +
> :anchor:`<a id="elementsTeaming"/>`
>
> Teaming a virtio/hostdev NIC pair
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 0d0dcbc5ce..17f74490f4 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -3108,6 +3108,21 @@
>            <ref name="interface-options"/>
>          </interleave>
>        </group>
> +
> +      <group>
> +        <attribute name="type">
> +          <value>vdpa</value>
> +        </attribute>
> +        <interleave>
> +          <element name="source">
> +            <attribute name="dev">
> +              <ref name="deviceName"/>
> +            </attribute>
> +          </element>
> +          <ref name="interface-options"/>
> +        </interleave>
> +      </group>
> +
>      </choice>
>      <optional>
>        <attribute name="trustGuestRxFilters">
>
[snip]
> diff --git a/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> new file mode 100644
> index 0000000000..8e76ac7794
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
> @@ -0,0 +1,37 @@
> +LC_ALL=C \
> +PATH=/bin \
> +HOME=/tmp/lib/domain--1-QEMUGuest1 \
> +USER=test \
> +LOGNAME=test \
> +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
> +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
> +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
> +QEMU_AUDIO_DRV=none \
> +/usr/bin/qemu-system-i386 \
> +-name guest=QEMUGuest1,debug-threads=on \
> +-S \
> +-object secret,id=masterKey0,format=raw,\
> +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
> +-machine pc,accel=tcg,usb=off,dump-guest-core=off \
> +-cpu qemu64 \
> +-m 214 \
> +-overcommit mem-lock=off \
> +-smp 1,sockets=1,cores=1,threads=1 \
> +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
> +-display none \
> +-no-user-config \
> +-nodefaults \
> +-chardev socket,id=charmonitor,fd=1729,server,nowait \
> +-mon chardev=charmonitor,id=monitor,mode=control \
> +-rtc base=utc \
> +-no-shutdown \
> +-no-acpi \
> +-boot strict=on \
> +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> +-add-fd set=0,fd=1732 \
> +-netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
> Okay, I'm feeling too lazy to parse through the code above and see how
> you arrived at "vhostdev='/dev/fdset/0'", but that doesn't look right.
> Shouldn't you be ending up with "-netdev vhost-vdpa,fd=NN,..."? The
> document I have shows that syntax is supported, so there shouldn't be
> any need to do the add-fd stuff in this case.
The initial proposal for vdpa in qemu supported both vhostdev= and fd=
parameters, but the final implementation does not actually support an
fd= parameter here. The recommended way to pass a pre-opened fd to
qemu in this scenario is to use the 'add-fd' and /dev/fdset/ method as
shown above.
As far as I understand from reading qemu code, /dev/fdset/N is not
actually a file that exists in the filesystem. Instead, when you call
'add-fd', qemu adds that fd to an internal mapping that maps from the
fdset specified by set=N to the passed fd. Whenever qemu tries to open
a file, qemu_open() has special handling for filenames that start with
"/dev/fdset/": it looks up the fd associated with that fdset id and
returns that instead of attempting to open the file path.
So I think the above should be correct.
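For illustration, the by-hand equivalent of what libvirt generates above
would be something along these lines (the fd number is arbitrary and
most other options are omitted):

  # open the vdpa chardev so qemu inherits it as fd 3, register that fd
  # in fdset 0, then refer to it via the magic /dev/fdset/0 path
  exec 3<>/dev/vhost-vdpa-0
  qemu-system-x86_64 ... \
      -add-fd set=0,fd=3 \
      -netdev vhost-vdpa,vhostdev=/dev/fdset/0,id=hostnet0 \
      -device virtio-net-pci,netdev=hostnet0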
> I think the next step should be to find some hardware and give this a
> smoke test! :-)
Indeed, I'm working with Cindy Lu (who implemented the qemu vdpa
support) to try to get this tested properly. Unfortunately I've been
unable to get vdpa_sim working and I don't have access to hardware at
the moment.
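For what it's worth, my (so far unconfirmed) understanding is that the
simulator path should just be a matter of loading the right modules,
something like:

  # load the vhost-vdpa bus driver plus the vdpa net simulator
  modprobe vhost_vdpa
  modprobe vdpa_sim
  # a /dev/vhost-vdpa-N chardev should show up once vhost_vdpa binds
  ls -l /dev/vhost-vdpa-*

but module names may differ between kernel versions, and as I said I
haven't actually gotten it working on my machine yet.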
Thanks for the review!
Jonathon