
On Thu, Apr 04, 2019 at 02:08:41PM +0000, Frank Schreuder wrote:
To fix this issue I have to restart libvirt. Some iptables chains are missing, which is probably caused by a nwfilter-define operation. I'm able to reproduce this bug within 2 hours by running 2 loops: one loop defines nwfilters, and the second loop destroys and starts multiple VMs.
The fact that we recreate it every time we try to start a guest also means any problem should be self-correcting, which makes it even more strange that you need to have a restart.
It seems that the problem is a race condition between libvirt and our reload-iptables script. Libvirt inserts and removes rules one by one, while reload-iptables uses iptables-save and iptables-restore. The reload-iptables script saves libvirt's firewall rules to a temp file, appends Puppet's rules, and then imports that temp file. When libvirt inserts firewall rules between the save and the import done by reload-iptables, we get unexpected behaviour.
Ah right, I should have thought of something like that. Protecting against a concurrent app that dumps & recreates iptables rules in parallel with libvirt doing its work with iptables is not really practical, I'm afraid :-( It is one of the big pain points of dealing with iptables.
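To make the race described above concrete, here is a minimal in-memory sketch. It makes no real iptables calls: the rule strings, the `reload_iptables` function, and the `between_save_and_restore` hook are all hypothetical stand-ins, assumed only for illustration, and are not taken from the actual reload-iptables script or from libvirt.

```python
# Toy model of the save / append / restore cycle; no real iptables involved.

def reload_iptables(live_rules, puppet_rules, between_save_and_restore=None):
    # Stand-in for iptables-save: snapshot the libvirt rules present right now.
    snapshot = [r for r in live_rules if r.startswith("libvirt:")]
    # Window in which another writer (here: libvirt) may touch the live rules.
    if between_save_and_restore is not None:
        between_save_and_restore()
    # Stand-in for iptables-restore: replace the live rules wholesale.
    live_rules[:] = snapshot + list(puppet_rules)


live = ["libvirt: rule A"]      # hypothetical rule already installed by libvirt
puppet = ["puppet: rule P"]     # hypothetical rule managed by Puppet


def libvirt_inserts_rule():
    # libvirt adds a rule after the snapshot was taken, before the restore.
    live.append("libvirt: rule B")


reload_iptables(live, puppet, between_save_and_restore=libvirt_inserts_rule)
print(live)  # "libvirt: rule B" is missing: the restore overwrote it
```

The rule inserted inside the window is silently dropped because the restore step replaces the whole rule set with a stale snapshot, which would be consistent with chains going missing until libvirt is restarted.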
So far I am not able to reproduce this bug on libvirt 5.0.0.
This is interesting, because AFAICT we had no changes to the nwfilter driver between 5.0.0 and 5.1.0 that would affect this behaviour.
We did have changes to the virtual network driver, but those should not interfere with the nwfilter driver.
The hypervisor running libvirt 5.0.0 was not using this reload-iptables script.
Ok, that explains it!

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|