
On Thu, May 10, 2018 at 2:07 PM, Laine Stump <laine@redhat.com> wrote:
> On 05/10/2018 02:53 PM, Ihar Hrachyshka wrote:
>> Hi,
>> In kubevirt, we discovered [1] that whenever e1000 is used for a vNIC, the link on the interface becomes ready several seconds after 'ifup' is executed,
> What is your definition of "becomes ready"? Are you looking at the output of "ip link show" in the guest? Or are you watching "brctl showstp" for the bridge device on the host? Or something else?
I was watching the guest dmesg for the following messages:

[ 4.773275] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 6.769235] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 6.771408] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

For e1000, there are 2 seconds between those messages; for virtio, it's near instant. Interestingly, it happens only on the very first ifup; when I do it a second time after the guest has booted, it's instant.
>> which for some buggy images like cirros may slow down the boot process by up to 1 minute [2]. If we switch from e1000 to virtio, the link is brought up and ready almost immediately.
>> For the record, I am using the following versions:
>> - L0 kernel: 4.16.5-200.fc27.x86_64 #1 SMP
>> - libvirt: 3.7.0-4.fc27
>> - guest kernel: 4.4.0-28-generic #47-Ubuntu
>> Is there something specific about e1000 that makes it initialize the link so slowly, on the libvirt or guest side?
> There isn't anything libvirt could do that would cause the link to come up (IFF_UP) any faster or slower, so if there is an issue it's elsewhere. Since switching to the virtio device eliminates the problem, my guess would be that it's something about the implementation of the emulated device in qemu that is causing a delay in the e1000 driver in the guest. That's just a guess, though.
>> [1] https://github.com/kubevirt/kubevirt/issues/936
>> [2] https://bugs.launchpad.net/cirros/+bug/1768955
> (I discount the idea of the stp delay timer having an effect, as suggested in one of the comments on github that points to my explanation of STP in a libvirt bugzilla record, because that would cause the same problem for both e1000 and virtio.)
Yes, it's not STP, and I also tried explicitly setting all the bridge timers to 0, with no result. I also ran "tcpdump -i any" inside the container that hosts the VM VIF, and there was no relevant traffic on the tap device.
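To be concrete, what I mean by zeroing the timers is roughly the following (the network and bridge names here are just placeholders for what kubevirt actually sets up) -- in the libvirt network definition:

  <network>
    <name>default</name>
    <bridge name='virbr0' stp='off' delay='0'/>
  </network>

and directly on the host bridge:

  ip link set dev virbr0 type bridge forward_delay 0

Neither made any difference for e1000.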
> I hesitate to suggest this, because the rtl8139 code in qemu is considered less well maintained and lower performance than e1000, but have you tried setting that model to see how it behaves? You may be forced to make that the default when virtio isn't available.
Indeed, rtl8139 is near instant too:

[ 4.156872] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[ 4.177520] 8139cp 0000:07:01.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1

Thanks for the tip, we will consider it too (and thanks for the background info on the driver support state).
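In case it's useful for anyone following the thread: switching the model is just the <model> element on the interface in the libvirt domain XML (the bridge and model here are only examples):

  <interface type='bridge'>
    <source bridge='br0'/>
    <model type='rtl8139'/>  <!-- or 'e1000' / 'virtio' -->
  </interface>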
> Another thought - I guess the virtio driver in Cirros is always available? Perhaps kubevirt could use libosinfo to auto-decide what device to use for networking based on the OS.
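Something along those lines, I guess. A rough, untested sketch using the libosinfo GObject bindings from Python -- the hard-coded short-id and the 'virtio' name match are assumptions about what's in the osinfo database, not something we've verified:

import gi
gi.require_version('Libosinfo', '1.0')
from gi.repository import Libosinfo


def nic_model_for(short_id):
    # Load the system osinfo database.
    loader = Libosinfo.Loader()
    loader.process_default_path()
    db = loader.get_db()

    oses = db.get_os_list()
    for i in range(oses.get_length()):
        guest_os = oses.get_nth(i)
        if guest_os.get_param_value('short-id') != short_id:
            continue
        # Check the devices libosinfo says this OS supports for a virtio NIC.
        devices = guest_os.get_all_devices(None)
        for j in range(devices.get_length()):
            dev = devices.get_nth(j)
            if dev.get_param_value('class') == 'net' and \
               'virtio' in (dev.get_param_value('name') or ''):
                return 'virtio'
        return 'e1000'  # OS is known but lists no virtio NIC
    return 'e1000'      # OS not in the database: fall back to the safe default


print(nic_model_for('ubuntu16.04'))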
This, or we can introduce explicit tags for NICs / guest type to use.

Thanks a lot for the reply,
Ihar