libvirt
/
libvirt
|
master
|
22 mins and 19 secs
|
Daniel Henrique Barboza
|
PPC64 support for NVIDIA V100 GPU with NVLink2 passthrough
The NVIDIA V100 GPU has an onboard RAM that is mapped into the host memory and accessible as normal RAM via an NVLink2 bridge. When passed through in a guest, QEMU puts the NVIDIA RAM window in a non-contiguous area, above the PCI MMIO area that starts at 32TiB. This means that the NVIDIA RAM window starts at 64TiB and go all the way to 128TiB.
This means that the guest might request a 64-bit window, for each PCI Host Bridge, that goes all the way to 128TiB. However, the NVIDIA RAM window isn't counted as regular RAM, thus this window is considered only for the allocation of the Translation and Control Entry (TCE). For more information about how NVLink2 support works in QEMU, refer to the accepted implementation [1].
This memory layout differs from the existing VFIO case, requiring its own formula. This patch changes the PPC64 code of @qemuDomainGetMemLockLimitBytes to:
- detect if we have a NVLink2 bridge being passed through to the guest. This is done by using the @ppc64VFIODeviceIsNV2Bridge function added in the previous patch. The existence of the NVLink2 bridge in the guest means that we are dealing with the NVLink2 memory layout;
- if an IBM NVLink2 bridge exists, passthroughLimit is calculated in a different way to account for the extra memory the TCE table can alloc. The 64TiB..128TiB window is more than enough to fit all possible GPUs, thus the memLimit is the same regardless of passing through 1 or multiple V100 GPUs.
Further reading explaining the background [1] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03700.html [2] https://www.redhat.com/archives/libvir-list/2019-March/msg00660.html [3] https://www.redhat.com/archives/libvir-list/2019-April/msg00527.html
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
|
|