On 10/07/13 16:10, Laine Stump wrote:
On 10/04/2013 08:55 AM, Peter Krempa wrote:
> Add code to check availability of PCI passhthrough using VFIO and the
> legacy KVM passthrough and use it when starting VMs and hotplugging
> devices to live machine.
> ---
> src/qemu/qemu_hostdev.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++
> src/qemu/qemu_hostdev.h | 4 ++
> src/qemu/qemu_hotplug.c | 4 ++
> src/qemu/qemu_process.c | 5 ++
> 4 files changed, 139 insertions(+)
>
> diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c
> index 21fe47f..dbbc2b4 100644
> --- a/src/qemu/qemu_hostdev.c
> +++ b/src/qemu/qemu_hostdev.c
...
> @@ -1287,3 +1293,123 @@ void
qemuDomainReAttachHostDevices(virQEMUDriverPtr driver,
> qemuDomainReAttachHostScsiDevices(driver, def->name, def->hostdevs,
> def->nhostdevs);
> }
> +
> +
> +static bool
> +qemuHostdevHostSupportsPassthroughVFIO(void)
> +{
> + DIR *iommuDir = NULL;
> + struct dirent *iommuGroup = NULL;
> + bool ret = false;
> +
> + /* condition 1 - /sys/kernel/iommu_groups/ contains entries */
> + if (!(iommuDir = opendir("/sys/kernel/iommu_groups/")))
> + goto cleanup;
> +
> + while ((iommuGroup = readdir(iommuDir))) {
> + /* skip ./ ../ */
> + if (STRPREFIX(iommuGroup->d_name, "."))
> + continue;
> +
> + /* assume we found a group */
> + break;
> + }
> +
> + if (!iommuGroup)
> + goto cleanup;
> + /* okay, iommu is on and recognizes groups */
> +
> + /* condition 2 - /dev/vfio/vfio exists */
> + if (!virFileExists("/dev/vfio/vfio"))
> + goto cleanup;
Was this all that was suggested? At any rate it's a good start even if
there are other checks that can be added later.
Yes that was all. While I was testing this I found out, that this
doesn't necesarily mean everything will go well. For example on crappy
hardware you might need to enable unsafe interrupt routing (which may
kill your host in most cases). These are then caught by the improved
error reporting code in the qemu driver so you get a more useful error
message. (saying that operation was not permitted and few entries in the
kernel log).
> +
...
> +bool
> +qemuHostdevHostVerifySupport(virDomainHostdevDefPtr *hostdevs,
> + size_t nhostdevs)
> +{
> + int supportsPassthroughKVM = -1;
> + int supportsPassthroughVFIO = -1;
> + size_t i;
> +
> + /* assign defaults for hostdev passthrough */
> + for (i = 0; i < nhostdevs; i++) {
> + virDomainHostdevDefPtr hostdev = hostdevs[i];
> +
> + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
> + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
> + int *backend = &hostdev->source.subsys.u.pci.backend;
> +
> + /* cache host state of passthrough support */
> + if (supportsPassthroughKVM == -1 || supportsPassthroughVFIO == -1) {
> + supportsPassthroughKVM =
qemuHostdevHostSupportsPassthroughLegacy();
> + supportsPassthroughVFIO = qemuHostdevHostSupportsPassthroughVFIO();
These should be checked once at the top of the function and their return
values cached.
Hmm; I originally didn't want to call the code in case no PCI
passthrough device was present in the guest. The main reason was that
the first version was erroring out in the functions directly. I think we
can safely call those every time we see non-zero count of hostdevs.
I will adapt the code in the next version.
> + }
> +
> + switch ((virDomainHostdevSubsysPciBackendType) *backend) {
> + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO:
> + if (!supportsPassthroughVFIO) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> + _("host doesn't support VFIO PCI
passthrough"));
> + return false;
> + }
> + break;
> +
...
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 7a30a5e..4aa9582 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -3567,6 +3567,11 @@ int qemuProcessStart(virConnectPtr conn,
> goto cleanup;
> }
>
> + /* check and assign device assignment settings */
> + if (!qemuHostdevHostVerifySupport(vm->def->hostdevs,
> + vm->def->nhostdevs))
> + goto cleanup;
> +
In one case you're calling this very near to a call to
qemuPrepareHostdevPCIDevices(), in the other very near to a call to
qemuPrepareHostDevices, which itself calls
qemuPrepareHostdevPCIDevices(); I think this check should be made inside
qemuPrepareHostdevPCIDevices() so that the higher level code isn't
polluted with the detail.
Seems like that will be an okay place too. At first I was looking for a
place that won't be called when running the tests, but
qemuPrepareHostdevPCIDevices() seems like the right place to do it.
I will adapt the code and re-send.
Peter