On Thu, May 10, 2018 at 2:07 PM, Laine Stump <laine(a)redhat.com> wrote:
> On 05/10/2018 02:53 PM, Ihar Hrachyshka wrote:
>> Hi,
>>
>> In kubevirt, we discovered [1] that whenever e1000 is used for a vNIC,
>> the link on the interface becomes ready several seconds after 'ifup' is
>> executed
> What is your definition of "becomes ready"? Are you looking at the
> output of "ip link show" in the guest? Or are you watching "brctl
> showstp" for the bridge device on the host? Or something else?
I was watching the guest dmesg for the following messages:
[ 4.773275] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 6.769235] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 6.771408] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
For e1000, there are 2 seconds between those messages; for virtio,
it's near instant. Interestingly, the delay only shows up on the very
first ifup; when I bring the interface up a second time after the
guest has booted, it's instant.
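For context, the only difference between the two runs is the <model>
element of the libvirt interface definition. A minimal sketch, with
the interface type and bridge name as placeholders:

  <interface type='bridge'>
    <source bridge='br0'/>
    <model type='virtio'/>    <!-- 'e1000' in the slow case -->
  </interface>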
>> which for some buggy images like cirros may slow down the boot
>> process by up to 1 minute [2]. If we switch from e1000 to virtio, the
>> link is brought up and ready almost immediately.
>>
>> For the record, I am using the following versions:
>> - L0 kernel: 4.16.5-200.fc27.x86_64 #1 SMP
>> - libvirt: 3.7.0-4.fc27
>> - guest kernel: 4.4.0-28-generic #47-Ubuntu
>>
>> Is there something specific about e1000 that makes it initialize the
>> link too slowly on the libvirt or guest side?
> There isn't anything libvirt could do that would cause the link to
> come up (IFF_UP) any faster or slower, so if there is an issue it's
> elsewhere. Since switching to the virtio device eliminates the
> problem, my guess would be that it's something about the
> implementation of the emulated device in qemu that is causing a delay
> in the e1000 driver in the guest. That's just a guess though.
>>
>> [1] https://github.com/kubevirt/kubevirt/issues/936
>> [2] https://bugs.launchpad.net/cirros/+bug/1768955
> (I discount the idea of the STP delay timer having an effect, as
> suggested in one of the github comments that points to my explanation
> of STP in a libvirt bugzilla record, because that would cause the same
> problem for both e1000 and virtio.)
Yes, it's not STP; I also tried explicitly setting all bridge timers
to 0, with no result. I also ran "tcpdump -i any" inside the container
that hosts the VM VIF, and there was no relevant traffic on the tap
device.
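For reference, the bridge knobs I touched were roughly along these
lines, with br0 standing in for the actual bridge device:

  brctl stp br0 off     # disable STP on the bridge
  brctl setfd br0 0     # zero the forward delay timer
  brctl showstp br0     # confirm the timers are now zero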
> I hesitate to suggest this, because the rtl8139 code in qemu is
> considered less well maintained and lower-performing than e1000, but
> have you tried setting that model to see how it behaves? You may be
> forced to make that the default when virtio isn't available.
Indeed, rtl8139 is near instant too:
[ 4.156872] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[ 4.177520] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
Thanks for the tip, we will consider it too (and thanks also for the
background info about the driver support state).
> Another thought - I guess the virtio driver in Cirros is always
> available? Perhaps kubevirt could use libosinfo to auto-decide what
> device to use for networking based on OS.
This, or we can introduce explicit tags for the NIC model / guest type to use.
Thanks a lot for the reply,
Ihar