Eric Blake wrote:
On 10/23/2014 03:21 PM, Jim Fehlig wrote:
> This patch fixes a few issues noted while enabling the TCK PCI device
> hotplug tests.
>
> First, the call to node device dettach function misses parameters, resulting
> in the following failure
>
> Failed test 'detached device from host OS'
> at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
> died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0)
> at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
>
> Trivally fixed by adding the missing parameters when calling node device
> dettach function.
>
That part is fine to fix (note that 'dettach' is a typo but part of the
API, so we are stuck with it; it's nice to use 'detach' where possible).
> Second, it appears qemu needs to reach some state of initialization before
> host device attach/detach works properly. Currently, the test fails when
> detaching the device from the guest, resetting it, and reattaching it to
> the host
>
> Failed test 'reset the host PCI device'
> at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110.
> died: Sys::Virt::Error (libvirt error code: 1, message: internal error:
> Not resetting active device 0000:03:00.1)
>
> Failed test 'reattached device to host OS'
> at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111.
> died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation
> is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain
> tck)
>
> I've found that sleeping for a bit after creating the guest allows the
> tests to pass.
>
But that papers over the bug. Ideally, the libvirt API should be
waiting longer for the guest to relinquish control of the device before
returning success on the first call, so that the second call cannot hit
a collision.
Even with this delay, the guest has not completely booted when the test
successfully completes. So it is not clear to me how much guest
cooperation is needed.
I'm not so sure that this is the right patch to make to
TCK, and wonder if we should instead be patching libvirt proper. Or is
it a case that because detach requires guest cooperation, we need to
wait long enough for the guest to boot to the point that it can
cooperate?
Perhaps. Maybe acpiphp needs to be loaded in the guest kernel before
this will work.
If so, then maybe we should expose a proper libvirt event
for the underlying 'guest agent available' event (aka VSERPORT_CHANGE)
that recent qemu added, and wait for that event to occur within a
reasonable time rather than blindly hard-coding a fixed sleep time.
I don't see this issue with an operational guest. It only manifests
when attempting to attach and detach the device immediately after
creating the domain. AFAICT, most tests that require some form of guest
cooperation sleep after create_domain.
Regards,
Jim
> While here, also fix a small typo from copy and paste of disk tests.
>
It would be better to split off the obvious trivial fixes and push them
now, while waiting on discussion for the rest of the patch.
> Signed-off-by: Jim Fehlig <jfehlig(a)suse.com>
> ---
> scripts/domain/250-pci-host-hotplug.t | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/domain/250-pci-host-hotplug.t
b/scripts/domain/250-pci-host-hotplug.t
> index 4579477..e128de7 100644
> --- a/scripts/domain/250-pci-host-hotplug.t
> +++ b/scripts/domain/250-pci-host-hotplug.t
> @@ -52,6 +52,8 @@ my $xml = $tck->generic_domain(name =>
"tck")->as_xml;
> diag "Creating a new transient domain";
> my $dom;
> ok_domain(sub { $dom = $conn->create_domain($xml) }, "created transient
domain object");
> +diag "Waiting 10 seconds for guest to initialize";
> +sleep(10);
>
>
So this is the controversial hunk...
>
> my ($domain, $bus, $slot, $function) = $tck->get_host_pci_device();
> @@ -86,7 +88,7 @@ SKIP: {
>
> ok(defined $nodedev, "found PCI device $domain:$bus:$slot.$function on
host");
>
> - lives_ok(sub { $nodedev->dettach() }, "detached device from host
OS");
> + lives_ok(sub { $nodedev->dettach(undef, 0) }, "detached device from host
OS");
> lives_ok(sub { $nodedev->reset() }, "reset the host PCI device");
>
> my $devxml =
> @@ -112,6 +114,6 @@ SKIP: {
>
> my $finalxml = $dom->get_xml_description;
>
> - is($initialxml, $finalxml, "final XML has removed the disk")
> + is($initialxml, $finalxml, "final XML has removed the PCI device")
>
...but ACK to these two.
> }
>
>
>