Hi,
A new security fix ([1], [2] and [3]) was recently merged into QEMU.
After updating the packages we started to get "qemu-system-x86_64: Virtqueue size
exceeded" when resuming a guest.
Our environment is OpenStack master, and we have a Mellanox CI that tests SR-IOV
functionality.
The hosts run Ubuntu 14.04 with QEMU 2.0.0+dfsg-2ubuntu1.26, which contains the fix (see [2]):
ii  qemu-kvm         2.0.0+dfsg-2ubuntu1.26  amd64  QEMU Full virtualization
ii  qemu-system-x86  2.0.0+dfsg-2ubuntu1.26  amd64  QEMU full system emulation binaries (x86)
ii  qemu-utils       2.0.0+dfsg-2ubuntu1.26  amd64  QEMU utilities
Our CI started to fail last week, when these security packages were released.
The scenario is as follows (sorry for the OpenStack commands :)):
1. nova boot guest
2. nova suspend guest
3. nova resume guest
The result is that the guest ends up in the powered-off state; when I power it on again,
everything works fine.
I tested with both a direct port (SR-IOV) and a normal port (virtual port), and it happens
in both cases.
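For anyone who wants to reproduce this outside our CI, the three steps above can be scripted roughly as below. This is only a sketch: the `--image` and `--flavor` values and the server name are placeholders I made up, and a real run has to wait for the guest to reach the expected state between steps, which this sketch does not do.

```python
import subprocess


def repro_commands(server, image, flavor):
    """Return the nova CLI invocations for boot -> suspend -> resume.

    server, image and flavor are placeholders; substitute whatever
    exists in your own deployment.
    """
    return [
        ["nova", "boot", "--image", image, "--flavor", flavor, server],
        ["nova", "suspend", server],
        ["nova", "resume", server],
        # Inspect the final state; in our case the guest shows as powered off.
        ["nova", "show", server],
    ]


def run_repro(server, image, flavor):
    """Run the commands in order, stopping at the first failure.

    Note: no waiting between steps is done here. In practice the guest
    must be ACTIVE before suspend and SUSPENDED before resume.
    """
    for cmd in repro_commands(server, image, flavor):
        subprocess.check_call(cmd)
```

For example, `run_repro("guest", "my-image", "m1.small")` would issue the four commands in sequence against whatever cloud the nova client is configured for.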
According to [3], the fix prevents a malicious guest from submitting more requests than the
virtqueue size permits.
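As I understand [3], the idea is that the device tracks how many requests are currently in flight on a queue and bails out once that count reaches the queue size. Below is a toy model of that idea, not the actual QEMU code: the class and method names are mine, and the real implementation makes QEMU report the error and exit rather than raise an exception.

```python
class VirtqueueSizeExceeded(Exception):
    """Stands in for QEMU's fatal "Virtqueue size exceeded" error."""


class ToyVirtqueue:
    """Toy model of a virtqueue with an in-flight request counter."""

    def __init__(self, size):
        self.size = size   # number of descriptors in the ring
        self.inuse = 0     # requests popped by the device, not yet completed

    def pop(self):
        """Device takes one request from the guest."""
        if self.inuse >= self.size:
            # More requests in flight than the ring can hold: a well-behaved
            # guest should never be able to cause this.
            raise VirtqueueSizeExceeded("Virtqueue size exceeded")
        self.inuse += 1

    def push(self):
        """Device completes one request and hands it back to the guest."""
        if self.inuse == 0:
            raise ValueError("no request in flight")
        self.inuse -= 1


vq = ToyVirtqueue(size=2)
vq.pop()
vq.pop()
try:
    vq.pop()           # third in-flight request on a 2-entry queue
    exceeded = False
except VirtqueueSizeExceeded:
    exceeded = True
```

In this model the check only fires when the in-flight count is at or above the queue size, so a guest that sees the error is either really over-submitting or the device's counter no longer matches reality.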
Our CI uses a proprietary Cirros image with the mlx4_en driver
(http://13.69.151.247/images/mellanox_eth.img).
I started testing with other images to see whether the problem is with our image.
I also tested with an Ubuntu image:
https://cloud-images.ubuntu.com/wily/current/wily-server-cloudimg-amd64-d...
and with the OpenStack Cirros image:
http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
The Ubuntu image had the same failure, but the Cirros image worked.
Is there a problem with the patch, or with the images?
What in these images could make them look like a malicious guest?
[1] - https://access.redhat.com/security/cve/cve-2016-5403
[2] - http://www.ubuntu.com/usn/usn-3047-1/
[3] - https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06257.html