[libvirt] [libvirt-tck PATCH] Fix PCI device hotplug tests

This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests. First, the call to node device dettach function misses parameters, resulting in the following failure Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. Trivally fixed by adding the missing parameters when calling node device dettach function. Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1) Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck) I've found that sleeping for a bit after creating the guest allows the tests to pass. While here, also fix a small typo from copy and paste of disk tests. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- scripts/domain/250-pci-host-hotplug.t | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/scripts/domain/250-pci-host-hotplug.t b/scripts/domain/250-pci-host-hotplug.t index 4579477..e128de7 100644 --- a/scripts/domain/250-pci-host-hotplug.t +++ b/scripts/domain/250-pci-host-hotplug.t @@ -52,6 +52,8 @@ my $xml = $tck->generic_domain(name => "tck")->as_xml; diag "Creating a new transient domain"; my $dom; ok_domain(sub { $dom = $conn->create_domain($xml) }, "created transient domain object"); +diag "Waiting 10 seconds for guest to initialize"; +sleep(10); my ($domain, $bus, $slot, $function) = $tck->get_host_pci_device(); @@ -86,7 +88,7 @@ SKIP: { ok(defined $nodedev, "found PCI device $domain:$bus:$slot.$function on host"); - lives_ok(sub { $nodedev->dettach() }, "detached device from host OS"); + lives_ok(sub { $nodedev->dettach(undef, 0) }, "detached device from host OS"); lives_ok(sub { $nodedev->reset() }, "reset the host PCI device"); my $devxml = @@ -112,6 +114,6 @@ SKIP: { my $finalxml = $dom->get_xml_description; - is($initialxml, $finalxml, "final XML has removed the disk") + is($initialxml, $finalxml, "final XML has removed the PCI device") } -- 1.8.4.5

On 10/23/2014 03:21 PM, Jim Fehlig wrote:
This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests.
First, the call to node device dettach function misses parameters, resulting in the following failure
Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
Trivally fixed by adding the missing parameters when calling node device dettach function.
That part is fine to fix (note that 'dettach' is a typo but part of the API, so we are stuck with it; it's nice to use 'detach' where possible).
Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host
Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1)
Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck)
I've found that sleeping for a bit after creating the guest allows the tests to pass.
But that papers over the bug. Ideally, the libvirt API should be waiting longer for the guest to relinquish control of the device before returning success on the first call, so that the second call cannot hit a collision. I'm not so sure that this is the right patch to make to TCK, and wonder if we should instead be patching libvirt proper. Or is it a case that because detach requires guest cooperation, we need to wait long enough for the guest to boot to the point that it can cooperate? If so, then maybe we should expose a proper libvirt event for the underlying 'guest agent available' event (aka VSERPORT_CHANGE) that recent qemu added, and wait for that event to occur within a reasonable time rather than blindly hard-coding a fixed sleep time.
While here, also fix a small typo from copy and paste of disk tests.
It would be better to split off the obvious trivial fixes and push them now, while waiting on discussion for the rest of the patch.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- scripts/domain/250-pci-host-hotplug.t | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/scripts/domain/250-pci-host-hotplug.t b/scripts/domain/250-pci-host-hotplug.t index 4579477..e128de7 100644 --- a/scripts/domain/250-pci-host-hotplug.t +++ b/scripts/domain/250-pci-host-hotplug.t @@ -52,6 +52,8 @@ my $xml = $tck->generic_domain(name => "tck")->as_xml; diag "Creating a new transient domain"; my $dom; ok_domain(sub { $dom = $conn->create_domain($xml) }, "created transient domain object"); +diag "Waiting 10 seconds for guest to initialize"; +sleep(10);
So this is the controversial hunk...
my ($domain, $bus, $slot, $function) = $tck->get_host_pci_device(); @@ -86,7 +88,7 @@ SKIP: {
ok(defined $nodedev, "found PCI device $domain:$bus:$slot.$function on host");
- lives_ok(sub { $nodedev->dettach() }, "detached device from host OS"); + lives_ok(sub { $nodedev->dettach(undef, 0) }, "detached device from host OS"); lives_ok(sub { $nodedev->reset() }, "reset the host PCI device");
my $devxml = @@ -112,6 +114,6 @@ SKIP: {
my $finalxml = $dom->get_xml_description;
- is($initialxml, $finalxml, "final XML has removed the disk") + is($initialxml, $finalxml, "final XML has removed the PCI device")
...but ACK to these two.
}
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Eric Blake wrote:
On 10/23/2014 03:21 PM, Jim Fehlig wrote:
This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests.
First, the call to node device dettach function misses parameters, resulting in the following failure
Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
Trivally fixed by adding the missing parameters when calling node device dettach function.
That part is fine to fix (note that 'dettach' is a typo but part of the API, so we are stuck with it; it's nice to use 'detach' where possible).
Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host
Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1)
Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck)
I've found that sleeping for a bit after creating the guest allows the tests to pass.
But that papers over the bug. Ideally, the libvirt API should be waiting longer for the guest to relinquish control of the device before returning success on the first call, so that the second call cannot hit a collision.
Even with this delay, the guest has not completely booted when the test successfully completes. So it is not clear to me how much guest cooperation is needed.
I'm not so sure that this is the right patch to make to TCK, and wonder if we should instead be patching libvirt proper. Or is it a case that because detach requires guest cooperation, we need to wait long enough for the guest to boot to the point that it can cooperate?
Perhaps. Maybe acpiphp needs to be loaded in the guest kernel before this will work.
If so, then maybe we should expose a proper libvirt event for the underlying 'guest agent available' event (aka VSERPORT_CHANGE) that recent qemu added, and wait for that event to occur within a reasonable time rather than blindly hard-coding a fixed sleep time.
I don't see this issue with an operational guest. It only manifests when attempting to attach and detach the device immediately after creating the domain. AFAICT, most tests that require some form of guest cooperation sleep after create_domain. Regards, Jim
While here, also fix a small typo from copy and paste of disk tests.
It would be better to split off the obvious trivial fixes and push them now, while waiting on discussion for the rest of the patch.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- scripts/domain/250-pci-host-hotplug.t | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/scripts/domain/250-pci-host-hotplug.t b/scripts/domain/250-pci-host-hotplug.t index 4579477..e128de7 100644 --- a/scripts/domain/250-pci-host-hotplug.t +++ b/scripts/domain/250-pci-host-hotplug.t @@ -52,6 +52,8 @@ my $xml = $tck->generic_domain(name => "tck")->as_xml; diag "Creating a new transient domain"; my $dom; ok_domain(sub { $dom = $conn->create_domain($xml) }, "created transient domain object"); +diag "Waiting 10 seconds for guest to initialize"; +sleep(10);
So this is the controversial hunk...
my ($domain, $bus, $slot, $function) = $tck->get_host_pci_device(); @@ -86,7 +88,7 @@ SKIP: {
ok(defined $nodedev, "found PCI device $domain:$bus:$slot.$function on host");
- lives_ok(sub { $nodedev->dettach() }, "detached device from host OS"); + lives_ok(sub { $nodedev->dettach(undef, 0) }, "detached device from host OS"); lives_ok(sub { $nodedev->reset() }, "reset the host PCI device");
my $devxml = @@ -112,6 +114,6 @@ SKIP: {
my $finalxml = $dom->get_xml_description;
- is($initialxml, $finalxml, "final XML has removed the disk") + is($initialxml, $finalxml, "final XML has removed the PCI device")
...but ACK to these two.
}

On Fri, Oct 24, 2014 at 09:15:42AM -0600, Jim Fehlig wrote:
Eric Blake wrote:
On 10/23/2014 03:21 PM, Jim Fehlig wrote:
This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests.
First, the call to node device dettach function misses parameters, resulting in the following failure
Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
Trivally fixed by adding the missing parameters when calling node device dettach function.
That part is fine to fix (note that 'dettach' is a typo but part of the API, so we are stuck with it; it's nice to use 'detach' where possible).
Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host
Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1)
Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck)
I've found that sleeping for a bit after creating the guest allows the tests to pass.
But that papers over the bug. Ideally, the libvirt API should be waiting longer for the guest to relinquish control of the device before returning success on the first call, so that the second call cannot hit a collision.
Even with this delay, the guest has not completely booted when the test successfully completes. So it is not clear to me how much guest cooperation is needed.
I'm not so sure that this is the right patch to make to TCK, and wonder if we should instead be patching libvirt proper. Or is it a case that because detach requires guest cooperation, we need to wait long enough for the guest to boot to the point that it can cooperate?
Perhaps. Maybe acpiphp needs to be loaded in the guest kernel before this will work.
Yeah, I'm a little fuzzy on just how functiuonal the guest OS has to be. I suspect the kernel mod is sufficient, so whether it finishes booting is probably not important as long as the mod is loaded. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Daniel P. Berrange wrote:
On Fri, Oct 24, 2014 at 09:15:42AM -0600, Jim Fehlig wrote:
Eric Blake wrote:
On 10/23/2014 03:21 PM, Jim Fehlig wrote:
This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests.
First, the call to node device dettach function misses parameters, resulting in the following failure
Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
Trivally fixed by adding the missing parameters when calling node device dettach function.
That part is fine to fix (note that 'dettach' is a typo but part of the API, so we are stuck with it; it's nice to use 'detach' where possible).
Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host
Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1)
Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck)
I've found that sleeping for a bit after creating the guest allows the tests to pass.
But that papers over the bug. Ideally, the libvirt API should be waiting longer for the guest to relinquish control of the device before returning success on the first call, so that the second call cannot hit a collision.
Even with this delay, the guest has not completely booted when the test successfully completes. So it is not clear to me how much guest cooperation is needed.
I'm not so sure that this is the right patch to make to TCK, and wonder if we should instead be patching libvirt proper. Or is it a case that because detach requires guest cooperation, we need to wait long enough for the guest to boot to the point that it can cooperate?
Perhaps. Maybe acpiphp needs to be loaded in the guest kernel before this will work.
Yeah, I'm a little fuzzy on just how functiuonal the guest OS has to be. I suspect the kernel mod is sufficient, so whether it finishes booting is probably not important as long as the mod is loaded.
Right. IMO, sleeping in this test is not papering over a bug. The bug is attempting to hotplug/hotunplug devices before the guest has an opportunity to boot. As currently written, the attempt to attach/detach occurs before seabios finishes loading :-). Regards, Jim

On Fri, Oct 24, 2014 at 09:29:29AM -0600, Jim Fehlig wrote:
Daniel P. Berrange wrote:
On Fri, Oct 24, 2014 at 09:15:42AM -0600, Jim Fehlig wrote:
Eric Blake wrote:
On 10/23/2014 03:21 PM, Jim Fehlig wrote:
This patch fixes a few issues noted while enabling the TCK PCI device hotplug tests.
First, the call to node device dettach function misses parameters, resulting in the following failure
Failed test 'detached device from host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90. died: Usage: Sys::Virt::NodeDevice::dettach(dev, driversv, flags=0) at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 90.
Trivally fixed by adding the missing parameters when calling node device dettach function.
That part is fine to fix (note that 'dettach' is a typo but part of the API, so we are stuck with it; it's nice to use 'detach' where possible).
Second, it appears qemu needs to reach some state of initialization before host device attach/detach works properly. Currently, the test fails when detaching the device from the guest, resetting it, and reattaching it to the host
Failed test 'reset the host PCI device' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 110. died: Sys::Virt::Error (libvirt error code: 1, message: internal error: Not resetting active device 0000:03:00.1)
Failed test 'reattached device to host OS' at /usr/share/libvirt-tck/tests/domain/250-pci-host-hotplug.t line 111. died: Sys::Virt::Error (libvirt error code: 55, message: Requested operation is not valid: PCI device 0000:03:00.1 is still in use by driver QEMU, domain tck)
I've found that sleeping for a bit after creating the guest allows the tests to pass.
But that papers over the bug. Ideally, the libvirt API should be waiting longer for the guest to relinquish control of the device before returning success on the first call, so that the second call cannot hit a collision.
Even with this delay, the guest has not completely booted when the test successfully completes. So it is not clear to me how much guest cooperation is needed.
I'm not so sure that this is the right patch to make to TCK, and wonder if we should instead be patching libvirt proper. Or is it a case that because detach requires guest cooperation, we need to wait long enough for the guest to boot to the point that it can cooperate?
Perhaps. Maybe acpiphp needs to be loaded in the guest kernel before this will work.
Yeah, I'm a little fuzzy on just how functiuonal the guest OS has to be. I suspect the kernel mod is sufficient, so whether it finishes booting is probably not important as long as the mod is loaded.
Right. IMO, sleeping in this test is not papering over a bug. The bug is attempting to hotplug/hotunplug devices before the guest has an opportunity to boot. As currently written, the attempt to attach/detach occurs before seabios finishes loading :-).
The "fix" would be for us to wait for the guest to boot before running the test, but that's not something you should have to solve right now. So I'm fine with a sleep. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Daniel P. Berrange wrote:
On Fri, Oct 24, 2014 at 09:29:29AM -0600, Jim Fehlig wrote:
Right. IMO, sleeping in this test is not papering over a bug. The bug is attempting to hotplug/hotunplug devices before the guest has an opportunity to boot. As currently written, the attempt to attach/detach occurs before seabios finishes loading :-).
The "fix" would be for us to wait for the guest to boot before running the test, but that's not something you should have to solve right now. So I'm fine with a sleep.
Thanks. Finally got around to pushing this today. Regards, Jim
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Jim Fehlig