On Tue, 2 Apr 2019 10:34:27 +0200
Erik Skultety <eskultet(a)redhat.com> wrote:
On Tue, Mar 12, 2019 at 06:55:49PM -0300, Daniel Henrique Barboza wrote:
> The NVLink2 support in QEMU implements the detection of NVLink2
> capable devices by verifying the attributes of the VFIO mem region
> QEMU allocates for the NVIDIA GPUs. To properly allocate an
> adequate amount of memLock, Libvirt needs this information before
> a QEMU instance is even created, thus querying QEMU is not
> possible and opening a VFIO window is too much.
>
> An alternative is presented in this patch. Making the following
> assumptions:
>
> - if we want GPU RAM to be available in the guest, an NVLink2 bridge
> must be passed through;
>
> - an unknown PCI device can be classified as a NVLink2 bridge
> if its device tree node has 'ibm,gpu', 'ibm,nvlink',
> 'ibm,nvlink-speed' and 'memory-region'.
Alexey mentioned that it should be enough to check for the properties ^above.
I'm just wondering, knowing this is IBM's PPC8/9, if the assumptions we have
made are going to stay with further revisions of PPC, NVLink and GPUs. IOW,
we need to be sure "ibm,nvlink" won't be renamed with further revisions, e.g.
for cards other than the V100 in the future, because then compatibility and
revision selection come into the picture.
>
> This patch introduces a helper called @ppc64VFIODeviceIsNV2Bridge
> that inspects the device tree node of a given PCI device and
> checks if it meets the criteria to be a NVLink2 bridge. This
Just out of curiosity, what about NVLink 1.0? Apart from performance, I wasn't
able to find anything useful in terms of compatibility. Is there something to
consider, since we're only relying on NVLink 2.0?

Probably not. NVLink 1 is entirely incompatible, and AFAIK support for
it has been postponed until probably never.
> new function will be used in a follow-up patch that, using the
> first assumption, will set up the rlimits of the guest
> accordingly.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
> ---
> src/qemu/qemu_domain.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 1659e88478..dcc92d253c 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -10398,6 +10398,35 @@ qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm)
> }
>
>
> +/**
> + * ppc64VFIODeviceIsNV2Bridge:
> + * @device: string with the PCI device address
> + *
> + * This function receives a string that represents a PCI device,
> + * such as '0004:04:00.0', and tells if the device is a NVLink2
> + * bridge.
> + */
> +static bool
> +ppc64VFIODeviceIsNV2Bridge(const char *device)
> +{
> + const char *nvlink2Files[] = {"ibm,gpu", "ibm,nvlink",
> + "ibm,nvlink-speed", "memory-region"};
Like I said above, if ^this is not to change, then I'm okay in principle. Still,
I feel like this needs David's ACK (putting him on CC).
Codewise, Peter already provided you with comments.

Hm, I don't feel I can answer this definitively. My guess would be to
ask Alexey, and AFAIK you already have information from him.
--
David Gibson <dgibson(a)redhat.com>
Principal Software Engineer, Virtualization, Red Hat
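
The quoted hunk is cut off before the body of the helper, so the check
described in the commit message can only be sketched here. The following is
a minimal standalone illustration, not the code from the patch: it assumes
that the device tree node of a PCI device is reachable through the `of_node`
entry under the device's sysfs directory, and it treats a device as an
NVLink2 bridge only if all four properties named in the thread exist there.
The function name `is_nvlink2_bridge` and the `sysfs_root` parameter are
hypothetical and were added so the lookup root can be changed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the check described above.
 *
 * sysfs_root: directory holding one entry per PCI device; on Linux this
 *             would normally be "/sys/bus/pci/devices" (assumption).
 * device:     PCI address string such as "0004:04:00.0".
 *
 * Returns true only if every property file listed in the thread exists
 * under <sysfs_root>/<device>/of_node/.
 */
static bool
is_nvlink2_bridge(const char *sysfs_root, const char *device)
{
    static const char *const props[] = {
        "ibm,gpu", "ibm,nvlink", "ibm,nvlink-speed", "memory-region",
    };
    char path[4096];
    size_t i;

    for (i = 0; i < sizeof(props) / sizeof(props[0]); i++) {
        int n = snprintf(path, sizeof(path), "%s/%s/of_node/%s",
                         sysfs_root, device, props[i]);

        /* Treat a formatting error or a truncated path as "not a bridge". */
        if (n < 0 || (size_t)n >= sizeof(path))
            return false;

        /* A single missing property disqualifies the device. */
        if (access(path, F_OK) != 0)
            return false;
    }

    return true;
}
```

A caller would use it as
`is_nvlink2_bridge("/sys/bus/pci/devices", "0004:04:00.0")`. Because the
test is purely "do these four names exist", it shares the concern raised in
the review: if a later platform revision renames any of the properties, the
check silently returns false.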