Hi!
I hit an issue with PCI passthrough: after I reboot the VM, the IOMMU
mappings are incorrect and devices access invalid memory.
The issue is quite easy to reproduce. I'm using NICs (Intel 40 Gigabit
Ethernet NICs hit the issue reliably; I wasn't able to get Mellanox
NICs to work reliably due to other bugs), but you could probably just
pass through any PCI device which does a bit of DMA.
With NICs I do the following to reproduce:
- configure the NIC for SR-IOV passthrough [1];
- create two standard VMs;
- configure VMs with 4GB current allocation and 15GB maximum
allocation of memory (my machines have 32 or 64GB total);
- pass a VF to each machine;
Note1: the current/maximum allocation of memory seems to play a role
here. I'm not 100% sure, however, whether it causes the bug or just
makes it more likely to be triggered.
Note2: I leave <on_reboot>restart</on_reboot> so that VMs can reboot.
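For reference, the memory and reboot settings above map onto a few
libvirt domain XML elements. Below is a minimal Python sketch that
builds such a fragment. The PCI address of the VF, the "kvm" domain
type, the use of <interface type='hostdev'> for the VF and the reading
of GB as GiB are all assumptions made for illustration, not values
taken from my setup.

```python
import xml.etree.ElementTree as ET


def domain_fragment(current_gib=4, maximum_gib=15,
                    vf_addr=(0x0000, 0x03, 0x10, 0x0)):
    """Build the memory, reboot and VF parts of a libvirt domain.

    vf_addr is a hypothetical (domain, bus, slot, function) PCI address;
    substitute the address of a real VF on the host.
    """
    domain = ET.Element("domain", type="kvm")  # domain type is assumed
    # Maximum allocation, then the current (ballooned) allocation.
    ET.SubElement(domain, "memory", unit="GiB").text = str(maximum_gib)
    ET.SubElement(domain, "currentMemory", unit="GiB").text = str(current_gib)
    # Let the guest reboot instead of being destroyed.
    ET.SubElement(domain, "on_reboot").text = "restart"

    devices = ET.SubElement(domain, "devices")
    iface = ET.SubElement(devices, "interface", type="hostdev", managed="yes")
    source = ET.SubElement(iface, "source")
    dom, bus, slot, func = vf_addr
    ET.SubElement(source, "address", type="pci",
                  domain=f"0x{dom:04x}", bus=f"0x{bus:02x}",
                  slot=f"0x{slot:02x}", function=f"0x{func:x}")
    return domain


if __name__ == "__main__":
    print(ET.tostring(domain_fragment(), encoding="unicode"))
```

This is only a fragment: a bootable domain also needs a name, an <os>
section, disks and so on.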
I was able to reproduce easily on 3 distinct machines: dual CPU
Haswell E, single CPU Haswell E, single Sandy Bridge EP.
With the VMs created above do the following:
(1) boot;
(2) configure VF interfaces;
(3) run ping -c30 to confirm they can communicate;
(4) run iperf -P4 -t30 between the machines;
(5) reboot;
(6) goto 2;
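Steps (2) to (4) can be scripted inside each guest. The sketch below is
one possible way to do it in Python, not the exact script I run. The
interface name and the addresses are placeholders, and it assumes the
peer VM is already running "iperf -s".

```python
import subprocess


def cycle_commands(iface, local_cidr, peer_ip):
    """Return the commands for steps (2)-(4) as argument lists."""
    return [
        # (2) configure the VF interface
        ["ip", "addr", "add", local_cidr, "dev", iface],
        ["ip", "link", "set", iface, "up"],
        # (3) confirm basic connectivity
        ["ping", "-c30", peer_ip],
        # (4) push some traffic; peer must be running "iperf -s"
        ["iperf", "-c", peer_ip, "-P4", "-t30"],
    ]


def run_cycle(iface, local_cidr, peer_ip):
    """Run one cycle; return the first failing command, or None."""
    for cmd in cycle_commands(iface, local_cidr, peer_ip):
        if subprocess.run(cmd).returncode != 0:
            return cmd
    return None


if __name__ == "__main__":
    # Placeholder values; adjust to the guest's VF name and addressing.
    failed = run_cycle("eth1", "192.168.100.1/24", "192.168.100.2")
    print("failed at:", failed) if failed else print("cycle OK")
```

Run it after every reboot of the guest. Note that a passing cycle does
not prove the mappings are fine, since only some packets are affected.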
The first time (fresh after boot), ping and iperf should work fine.
After the first reboot, there should already be communication problems.
From traffic inspection with tcpdump it appears that VFs receive zeroed
packets. Only some of the packets are zeroed, so depending on your luck
the communication may work for a while or just have limited throughput.
Usually it breaks down when an ARP packet or an important TCP segment
gets placed in an area that the device reads as zero. A reboot will not
fix this condition; shutdown/start will.
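If you want to spot the zeroed packets without eyeballing tcpdump
output, a check along these lines may help. It is a small sketch that
works on raw frame bytes obtained by whatever capture method you
prefer. The helper names and the threshold parameter are made up for
illustration.

```python
def zero_fraction(payload: bytes) -> float:
    """Return the fraction of bytes in payload that are 0x00."""
    if not payload:
        return 0.0
    return payload.count(0) / len(payload)


def looks_zeroed(payload: bytes, threshold: float = 1.0) -> bool:
    """Flag a non-empty frame whose share of zero bytes reaches threshold.

    The default of 1.0 only flags frames that are entirely zero; lower
    it to catch frames that are only partly zeroed.
    """
    return len(payload) > 0 and zero_fraction(payload) >= threshold


if __name__ == "__main__":
    frames = [bytes(64), b"\xff" * 6 + bytes(58)]
    for i, frame in enumerate(frames):
        print(i, looks_zeroed(frame), round(zero_fraction(frame), 2))
```

Legitimate frames can contain long runs of zeros (padding, for
example), so treat a hit as a hint rather than proof.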
I reproduced this with fully up-to-date Ubuntu 14.04 (both host and VM).
I also tried a kernel from linux-next.git (4ef7675344d68) and qemu from
the tip of its git tree (38a762f), and the bug is still there. libvirt
in Ubuntu 14.04 is at 1.2.2.
As already said, I reproduce this easily within a minute on a range of
machines. If someone looks into this and has problems hitting the bug,
please let me know.
Kuba
[1] For Intel NICs supported by i40e:
http://www.intel.es/content/www/es/es/embedded/products/networking/xl710-...