Hi Daniel,
On Thursday, October 08, 2015 10:20:48 AM Daniel P. Berrange wrote:
On Thu, Oct 08, 2015 at 10:17:23AM +0100, Daniel P. Berrange wrote:
> On Mon, Oct 05, 2015 at 05:18:42PM -0600, Mike Latimer wrote:
> > diag "Creating a new transient domain";
> > my $dom;
> > ok_domain(sub { $dom = $conn->create_domain($xml) }, "created transient domain object");
> >
> > +sleep(20);
>
> I'm really not a fan to adding arbitrary 20 second delays to so
> many of our tests, as it dramatically increases the running time
> of the test suite.
>
> Can you explain more about the actual failure / problem scenario
> you are seeing ?
Sure.
There are actually two different types of tests that are more reliable (in my
environment) with the sleeps added - domain/21[05]-nic-hotplug* and
nwfilter/[12]*. (I'm not actually seeing a problem with
domain/200-disk-hotplug, but I wanted to be consistent across the tests, and
the additional 20 seconds didn't impact me.)
The nic-hotplug tests don't spin up an entire VM (contrary to my poor
description in patch 5/6). Instead, they just use the kernel and initrd found
in /var/cache/libvirt-tck/os-x86_64-hvm, and an empty 100MB disk. Regardless
of what kernel I have in place (Fedora 19/22, openSUSE 13.2/Leap), these tests
attach the NIC properly, and appear to detach the NIC properly. However, the
final test finds the NIC still defined in the XML. Adding an 11-13 second delay
immediately after creating the domain resolves the problem (across all
kernel/initrds).
In the nwfilter tests, there are two problems I sometimes encounter. The first
is that the domain (which is an actual working domain) does not obtain an IP
address before the test tries to find and use that address. Sleeping for an
additional 5-10 seconds before trying to get the address is enough to avoid
the problem.
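
One alternative to a fixed sleep here is to poll for the address with a deadline, so the test only waits as long as it needs to. The following is a minimal sketch, not code from libvirt-tck: wait_until is a hypothetical helper, and the closure passed to it is a stand-in for whatever lookup the test really uses to find the guest's address.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: call $check repeatedly until it returns a true
# value or $timeout seconds have passed. Returns that value, or undef
# if the deadline expires first.
sub wait_until {
    my ($check, $timeout, $interval) = @_;
    $interval //= 1;
    my $deadline = time() + $timeout;
    while (1) {
        my $result = $check->();
        return $result if $result;
        return undef if time() >= $deadline;
        sleep($interval);
    }
}

# Stand-in for the real lookup: pretends the guest obtains its address
# on the third attempt. A real test would query the guest here instead.
my $attempts = 0;
my $ip = wait_until(sub { ++$attempts >= 3 ? "192.0.2.10" : undef }, 10, 0.1);
print defined $ip ? "guest address: $ip\n" : "timed out\n";
```

With this shape the timeout is only an upper bound, so a guest that gets its address quickly does not pay for the full wait.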
The second nwfilter issue is that I have seen instances where the domain will
not shut down in 30 seconds. Increasing the graceful shutdown window to 60
seconds resolves this issue. (This change does not increase test time when
using a domain that does shut down quicker.)
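
The reason a longer window costs nothing for fast domains can be shown with a small sketch. This is an illustration under assumptions, not the code in the test suite: shutdown_within is a hypothetical name, and FakeDomain is a made-up stand-in whose shutdown, is_active and destroy methods mirror the method names on Sys::Virt::Domain.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Ask the domain to shut down, then poll until it is inactive or the
# window expires. Returns 1 on a clean shutdown, 0 if the domain had
# to be destroyed. $dom must provide shutdown(), is_active(), destroy().
sub shutdown_within {
    my ($dom, $window, $interval) = @_;
    $interval //= 1;
    $dom->shutdown();
    my $deadline = time() + $window;
    while ($dom->is_active()) {
        if (time() >= $deadline) {
            $dom->destroy();
            return 0;
        }
        sleep($interval);
    }
    return 1;
}

# Fake domain for illustration only: reports active for N polls.
package FakeDomain;
sub new       { my ($class, $polls) = @_; return bless { polls => $polls, destroyed => 0 }, $class }
sub shutdown  { return }
sub is_active { my ($self) = @_; return $self->{polls}-- > 0 }
sub destroy   { my ($self) = @_; $self->{destroyed} = 1; return }

package main;

# A domain that stops after two polls returns long before the 60s window.
my $quick = FakeDomain->new(2);
print shutdown_within($quick, 60, 0.05) ? "clean shutdown\n" : "forced off\n";
```

The loop exits as soon as the domain reports inactive, so the 60 second figure only matters for a domain that is slow to stop.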
> Although we don't use ssh in these tests, perhaps it is reasonable
> to also poll on connect() on the ssh port to identify when the SSH
> server has started, and use that as an indication that the guest
> has booted.
In the nic hotplug tests, there is no SSH server. Sorry for misleading you
based on my incorrect "testing larger domains" comment. In the nwfilter tests,
this approach will not work as the necessary delays are all before the domain
is up and running with an IP address.
Any ideas (other than sleeping)? (I'm happy to rework the patch to drop the
delay in the block hotplug script as that is not necessary. I can also reduce
the other sleeps by 5 seconds or so, if it matters.)
Thanks,
Mike