[libvirt] [RFC] Faster libvirtd restart with nwfilter rules

Hi, all.

On fat hosts which are capable of running hundreds of VMs, restarting libvirtd makes its services unavailable for a long time if VMs use network filters. In my tests each of 100 VMs has the no-promisc [1] and no-mac-spoofing filters, and executing virsh list right after a daemon restart takes approximately 140s if no firewalld is running (that is, ebtables/iptables/ip6tables commands are used to configure the kernel tables).

The problem is that the daemon does not even start to read from client connections because the state drivers are not initialized. Initialization is blocked in state driver autostart, which grabs the VM locks, and the VM locks are held by the VM reconnection code. Each VM reloads its network tables on reconnection, and this reloading is serialized on updateMutex in the gentech nwfilter driver. Working around autostart won't help much, because even if the state drivers initialize, listing VMs won't be possible since listing takes each VM lock one by one too. However, managing a VM that has passed the reconnection phase will be possible, which takes the same 140s in the worst case.

Note that this issue only applies to filter configurations that don't need IP learning. With IP learning the situation is different, because the reconnection code spawns a new thread that applies the network rules only after the IP is learned from traffic, and this thread does not grab the VM lock. As a result VMs are manageable, but reloading the filters in the background takes approximately those same 140s. I guess managing network filters during this period can have issues too. Anyway, this situation does not look good, so fixing the described issue by spawning threads even without IP learning does not seem nice to me.

What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.

Next we can try to reduce the fork/preexec time. To estimate its contribution alone, let's bring back the above locks. It turns out that most of the time is taken by fork itself and by closing 8k (on my system) file descriptors in preexec. Using vfork gives a 2x boost, and so does dropping the mass close. (I checked the mass close contribution because I don't quite understand the purpose of this step - libvirt typically sets the close-on-exec flag on its descriptors.) So these two optimizations alone can result in a restart time of 30s.

Unfortunately, combining the above two approaches does not give a boost equal to the product of the two alone. The reason is that, due to concurrency and the high number of VMs (100), the preexec boost does not play a significant role, and using vfork diminishes concurrency as it freezes all parent threads before execve. So dropping locks and closes gives a 33s restart time, and adding vfork on top gives a 25s restart time.

Another approach is to use the --atomic-file option of ebtables (iptables/ip6tables unfortunately don't have one). The idea is to save the table to a file, edit the file, then commit the table to the kernel. I hoped this could give a performance boost because we don't need to load/store the kernel network table for every single rule update. In order to isolate the approaches I also dropped all ip/ip6 updates, which cannot be done this way. With this approach we cannot drop the ruleLock in the firewall, because no other VM thread should change the tables between save and commit. This approach gives a 25s restart time. But this approach is broken anyway, as we cannot be sure that another application doesn't change the network table between save and commit, in which case those changes would be lost.

After all, I think we need to move in a different direction. We could add an API to all the binaries and to firewalld to execute many commands in one run. We could pass the commands as arguments, or write them into a file which is then given to the binary. Then libvirt itself could update, for example, the bridge network table in a couple of commands. The exact number depends on the new API. For example, if we add an option to delete chains recursively and an option not to fail on a NOENT error, we can change the table in one command (no listing of the current rules is required).

[1] no-promisc filter

<filter name='no-promisc' chain='root' priority='-750'>
  <uuid>6d055022-1192-4a3d-ae1f-576baa5564b6</uuid>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='ff:ff:ff:ff:ff:ff'/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='$MAC'/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00'/>
  </rule>
  <rule action='drop' direction='in' priority='500'>
    <mac/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='01:00:5e:00:00:00' dstmacmask='ff:ff:ff:80:00:00'/>
  </rule>
</filter>

ping

On 24.09.2018 10:41, Nikolay Shirokovskiy wrote:
Hi, all.
On fat hosts which are capable of running hundreds of VMs, restarting libvirtd makes its services unavailable for a long time if VMs use network filters. In my tests each of 100 VMs has the no-promisc [1] and no-mac-spoofing filters, and executing virsh list right after a daemon restart takes approximately 140s if no firewalld is running (that is, ebtables/iptables/ip6tables commands are used to configure the kernel tables).
The problem is that the daemon does not even start to read from client connections because the state drivers are not initialized. Initialization is blocked in state driver autostart, which grabs the VM locks, and the VM locks are held by the VM reconnection code. Each VM reloads its network tables on reconnection, and this reloading is serialized on updateMutex in the gentech nwfilter driver. Working around autostart won't help much, because even if the state drivers initialize, listing VMs won't be possible since listing takes each VM lock one by one too. However, managing a VM that has passed the reconnection phase will be possible, which takes the same 140s in the worst case.
Note that this issue only applies to filter configurations that don't need IP learning. With IP learning the situation is different, because the reconnection code spawns a new thread that applies the network rules only after the IP is learned from traffic, and this thread does not grab the VM lock. As a result VMs are manageable, but reloading the filters in the background takes approximately those same 140s. I guess managing network filters during this period can have issues too. Anyway, this situation does not look good, so fixing the described issue by spawning threads even without IP learning does not seem nice to me.
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.
Next we can try to reduce the fork/preexec time. To estimate its contribution alone, let's bring back the above locks. It turns out that most of the time is taken by fork itself and by closing 8k (on my system) file descriptors in preexec. Using vfork gives a 2x boost, and so does dropping the mass close. (I checked the mass close contribution because I don't quite understand the purpose of this step - libvirt typically sets the close-on-exec flag on its descriptors.) So these two optimizations alone can result in a restart time of 30s.
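(For a sense of where that preexec time goes: the mass close is essentially a loop like the sketch below - a simplified illustration, not libvirt's actual code - so with an 8k descriptor limit it is thousands of close() calls for every spawned ebtables/iptables process.)

    #include <sys/resource.h>
    #include <unistd.h>

    /* Close everything above stdio before execve().  With RLIMIT_NOFILE
     * at 8k this is ~8000 close() syscalls per spawned helper binary. */
    static void mass_close_fds(void)
    {
        struct rlimit rl;
        int fd;

        if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
            rl.rlim_cur = 1024;            /* conservative fallback */

        for (fd = STDERR_FILENO + 1; fd < (int)rl.rlim_cur; fd++)
            close(fd);                     /* EBADF for unopened fds is harmless */
    }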
Unfortunately, combining the above two approaches does not give a boost equal to the product of the two alone. The reason is that, due to concurrency and the high number of VMs (100), the preexec boost does not play a significant role, and using vfork diminishes concurrency as it freezes all parent threads before execve. So dropping locks and closes gives a 33s restart time, and adding vfork on top gives a 25s restart time.
Another approach is to use the --atomic-file option of ebtables (iptables/ip6tables unfortunately don't have one). The idea is to save the table to a file, edit the file, then commit the table to the kernel. I hoped this could give a performance boost because we don't need to load/store the kernel network table for every single rule update. In order to isolate the approaches I also dropped all ip/ip6 updates, which cannot be done this way. With this approach we cannot drop the ruleLock in the firewall, because no other VM thread should change the tables between save and commit. This approach gives a 25s restart time. But this approach is broken anyway, as we cannot be sure that another application doesn't change the network table between save and commit, in which case those changes would be lost.
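For reference, the save/edit/commit flow looks roughly like the sketch below (the table, chain and rule are just placeholders, and real code would fork/exec rather than call system()):

    #include <stdlib.h>

    int main(void)
    {
        /* dump the current kernel table into a file */
        system("ebtables -t nat --atomic-file /tmp/ebt.bin --atomic-save");
        /* edit the file-backed copy instead of the kernel table */
        system("ebtables -t nat --atomic-file /tmp/ebt.bin -A PREROUTING -i vnet0 -j ACCEPT");
        /* load the edited table back into the kernel in one shot */
        system("ebtables -t nat --atomic-file /tmp/ebt.bin --atomic-commit");
        return 0;
    }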
After all, I think we need to move in a different direction. We could add an API to all the binaries and to firewalld to execute many commands in one run. We could pass the commands as arguments, or write them into a file which is then given to the binary. Then libvirt itself could update, for example, the bridge network table in a couple of commands. The exact number depends on the new API. For example, if we add an option to delete chains recursively and an option not to fail on a NOENT error, we can change the table in one command (no listing of the current rules is required).
[1] no-promisc filter
<filter name='no-promisc' chain='root' priority='-750'>
  <uuid>6d055022-1192-4a3d-ae1f-576baa5564b6</uuid>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='ff:ff:ff:ff:ff:ff'/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='$MAC'/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00'/>
  </rule>
  <rule action='drop' direction='in' priority='500'>
    <mac/>
  </rule>
  <rule action='return' direction='in' priority='500'>
    <mac dstmacaddr='01:00:5e:00:00:00' dstmacmask='ff:ff:ff:80:00:00'/>
  </rule>
</filter>

On Mon, Sep 24, 2018 at 10:41:37AM +0300, Nikolay Shirokovskiy wrote:
Hi, all.
On fat hosts which are capable of running hundreds of VMs, restarting libvirtd makes its services unavailable for a long time if VMs use network filters. In my tests each of 100 VMs has the no-promisc [1] and no-mac-spoofing filters, and executing virsh list right after a daemon restart takes approximately 140s if no firewalld is running (that is, ebtables/iptables/ip6tables commands are used to configure the kernel tables).
Yep, this is not entirely surprising given the huge number of rules we try to create.
The problem is that the daemon does not even start to read from client connections because the state drivers are not initialized. Initialization is blocked in state driver autostart, which grabs the VM locks, and the VM locks are held by the VM reconnection code. Each VM reloads its network tables on reconnection, and this reloading is serialized on updateMutex in the gentech nwfilter driver. Working around autostart won't help much, because even if the state drivers initialize, listing VMs won't be possible since listing takes each VM lock one by one too. However, managing a VM that has passed the reconnection phase will be possible, which takes the same 140s in the worst case.
In the QEMU driver we call qemuProcessReconnectAll(). This spawns one thread per VM that needs connecting to. So AFAICT, state driver initialization should not be blocked. Libvirtd should accept commands, but any APIs that touch a virDomainObj will of course still be blocked until reconnect is completed.
Note that this issue only applies to filter configurations that don't need IP learning. With IP learning the situation is different, because the reconnection code spawns a new thread that applies the network rules only after the IP is learned from traffic, and this thread does not grab the VM lock. As a result VMs are manageable, but reloading the filters in the background takes approximately those same 140s. I guess managing network filters during this period can have issues too.
I believe it should be possible to still change network filters while this reconnect is taking place - it would mean that any VMs affected by the filter change would have their iptables rules re-built a second time. However...
Anyway, this situation does not look good, so fixing the described issue by spawning threads even without IP learning does not seem nice to me.
this is tricky - we want to know filters are built before we allow the VM to start running, so we don't want to spawn a background thread in general. IP learning has special code that sets up a minimal safe ruleset before spawning the thread and waiting to do the real ruleset creation based on the learned IP address. So I don't think it is desirable to change all rule creation to run in a background thread.
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.
To me the most obvious speedup is not to run any commands at all.

99% of the time when we restart libvirtd and re-build network filters we are achieving nothing useful. We tear down the rules and replace them by exactly the same rules.

The idea behind rebuilding at startup is that someone might have changed the config on disk while libvirtd was not running, and we want to ensure that the VMs have the correct live config after this.

Alternatively libvirtd has been upgraded to a newer version, and we want to rebuild with the new code in case the old code had a bug causing it to create incorrect rules.

I think we should look at how to optimize this to be more intelligent.

We could somehow record a hash of the rule contents that were used originally. Only rebuild the rules if we see the hash of the nwfilter rules has changed, or if the libvirtd binary has had code changes.
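A rough sketch of what that startup check could look like (the file location and helper name are made up for illustration; the hash itself would be computed over the nwfilter XML plus something identifying the libvirtd build):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical: rebuild only if the recorded hash differs from the
     * hash of the current filter definitions + libvirtd build. */
    static bool nwfilterNeedsRebuild(const char *current_hash)
    {
        char recorded[128] = "";
        FILE *fp = fopen("/var/run/libvirt/nwfilter.hash", "r");  /* made-up path */

        if (!fp)
            return true;                /* nothing recorded yet: rebuild */
        if (!fgets(recorded, sizeof(recorded), fp))
            recorded[0] = '\0';
        fclose(fp);
        recorded[strcspn(recorded, "\n")] = '\0';

        return strcmp(recorded, current_hash) != 0;
    }

    /* After a successful rebuild the caller writes current_hash back to
     * the same file so the next restart can skip the work. */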
Next we can try to reduce the fork/preexec time. To estimate its contribution alone, let's bring back the above locks. It turns out that most of the time is taken by fork itself and by closing 8k (on my system) file descriptors in preexec. Using vfork gives a 2x boost, and so does dropping the mass close. (I checked the mass close contribution because I don't quite understand the purpose of this step - libvirt typically sets the close-on-exec flag on its descriptors.) So these two optimizations alone can result in a restart time of 30s.
I don't like the idea of using vfork(). It is already hard to do stuff between fork() and execve() safely as you're restricted to async signal safe functions. vfork() places an even greater number of restrictions on code that can be run. It is especially dangerous in threaded programs as only the calling thread is suspended.

Unfortunately we cannot rely on close-on-exec. Depending on OS and/or version, the close-on-exec setting may or may not be atomic, e.g.

  open(...., O_CLOEXEC)

vs

  open(...) followed by fcntl(F_SETFD, FD_CLOEXEC)

the latter has a race we'd hit.

In addition libvirt links to a huge number of 3rd party libraries and I have essentially zero confidence in them setting O_CLOEXEC correctly all the time.

IOW, the mass close is inescapable IMHO.

The only thing I could see us doing is to spawn some minimalist helper process which then spawns iptables as needed. We'd only need the mass-close once for the helper process, which can then just spawn iptables without mass-close, or simply set a tiny max files ulimit for this helper process so that it only 'mass close' 20 FDs.

We'd have to write the set of desired iptables rules into some data format, send it to the helper to execute and wait for results. Probably not too difficult to achieve.
Unfortunately, combining the above two approaches does not give a boost equal to the product of the two alone. The reason is that, due to concurrency and the high number of VMs (100), the preexec boost does not play a significant role, and using vfork diminishes concurrency as it freezes all parent threads before execve. So dropping locks and closes gives a 33s restart time, and adding vfork on top gives a 25s restart time.
Another approach is to use the --atomic-file option of ebtables (iptables/ip6tables unfortunately don't have one). The idea is to save the table to a file, edit the file, then commit the table to the kernel. I hoped this could give a performance boost because we don't need to load/store the kernel network table for every single rule update. In order to isolate the approaches I also dropped all ip/ip6 updates, which cannot be done this way. With this approach we cannot drop the ruleLock in the firewall, because no other VM thread should change the tables between save and commit. This approach gives a 25s restart time. But this approach is broken anyway, as we cannot be sure that another application doesn't change the network table between save and commit, in which case those changes would be lost.
After all, I think we need to move in a different direction. We could add an API to all the binaries and to firewalld to execute many commands in one run. We could pass the commands as arguments, or write them into a file which is then given to the binary.
This sounds like the 'iptables-restore' command, which accepts a batch of commands in a file. There's also ip6tables-restore and ebtables-restore.

There's no way to execute these via firewalld though.

It is also tricky when we need to check the output of certain commands before making a decision on which other commands to run. It would certainly speed things up though to load batches of rules at once.

IMHO the most immediately useful thing would be to work on an optimization to skip re-creating nwfilter rules on startup, unless we detect there are likely changes.

Regards, Daniel

On 11.10.2018 13:20, Daniel P. Berrangé wrote:
On Mon, Sep 24, 2018 at 10:41:37AM +0300, Nikolay Shirokovskiy wrote:
Hi, all.
On fat hosts which are capable of running hundreds of VMs, restarting libvirtd makes its services unavailable for a long time if VMs use network filters. In my tests each of 100 VMs has the no-promisc [1] and no-mac-spoofing filters, and executing virsh list right after a daemon restart takes approximately 140s if no firewalld is running (that is, ebtables/iptables/ip6tables commands are used to configure the kernel tables).
Yep, this is not entirely surprising given the huge number of rules we try to create.
The problem is that the daemon does not even start to read from client connections because the state drivers are not initialized. Initialization is blocked in state driver autostart, which grabs the VM locks, and the VM locks are held by the VM reconnection code. Each VM reloads its network tables on reconnection, and this reloading is serialized on updateMutex in the gentech nwfilter driver. Working around autostart won't help much, because even if the state drivers initialize, listing VMs won't be possible since listing takes each VM lock one by one too. However, managing a VM that has passed the reconnection phase will be possible, which takes the same 140s in the worst case.
In the QEMU driver we call qemuProcessReconnectAll(). This spawns one thread per VM that needs connecting to.
So AFAICT, state driver initialization should not be blocked. Libvirtd should accept commands, but any APIs that touch a virDomainObj will of course still be blocked until reconnect is completed.
Note that this issue only applies to filter configurations that don't need IP learning. With IP learning the situation is different, because the reconnection code spawns a new thread that applies the network rules only after the IP is learned from traffic, and this thread does not grab the VM lock. As a result VMs are manageable, but reloading the filters in the background takes approximately those same 140s. I guess managing network filters during this period can have issues too.
I believe it should be possible to still change network filters while this reconnect is taking place - it would mean that any VMs affected by the filter change would have their iptables rules re-built a second time. However...
Anyway, this situation does not look good, so fixing the described issue by spawning threads even without IP learning does not seem nice to me.
this is tricky - we want to know filters are built before we allow the VM to start running, so we don't want to spawn a background thread in general. IP learning has special code that sets up a minimal safe ruleset before spawning the thread and waiting to do the real ruleset creation based on the learned IP address. So I don't think it is desirable to change all rule creation to run in a background thread.
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.
To me the most obvious speedup is not to run any commands at all.
99% of the time when we restart libvirtd and re-build network filters we are achieving nothing useful. We tear down the rules and replace them by exactly the same rules.
The idea behind rebuilding at startup is that someone might have changed the config on disk while libvirtd was not running, and we want to ensure that the VMs have the correct live config after this.
Alternatively libvirtd has been upgraded to a newer version, and we want to rebuild with the new code in case old code had a bug causing it to create incorrect rules.
I think we should look at how to optimize this to be more intelligent.
We could somehow record a hash of the rule contents that were used originally. Only rebuild the rules if we see the hash of the nwfilter rules has changed, or if the libvirtd binary has had code changes.
Next we can try to reduce the fork/preexec time. To estimate its contribution alone, let's bring back the above locks. It turns out that most of the time is taken by fork itself and by closing 8k (on my system) file descriptors in preexec. Using vfork gives a 2x boost, and so does dropping the mass close. (I checked the mass close contribution because I don't quite understand the purpose of this step - libvirt typically sets the close-on-exec flag on its descriptors.) So these two optimizations alone can result in a restart time of 30s.
I don't like the idea of using vfork(). It is already hard to do stuff between fork() and execve() safely as you're restricted to async signal safe functions. vfork() places an even greater number of restrictions on code that can be run. It is especially dangerous in threaded programs as only the calling thread is suspended.
Unfortunately we cannot rely on close-on-exec. Depending on OS and/or version, the close-on-exec setting may or may not be atomic, e.g.
open(...., O_CLOEXEC)
vs
open(...) followed by fcntl(F_SETFD, FD_CLOEXEC)
the latter has a race we'd hit.
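Spelled out in code, the racy variant is the second one below (just a sketch, not libvirt code):

    #include <fcntl.h>
    #include <unistd.h>

    void open_examples(const char *path)
    {
        /* atomic: the fd is never visible without close-on-exec set */
        int fd1 = open(path, O_RDONLY | O_CLOEXEC);

        /* racy: another thread fork()+execve()ing between these two calls
         * leaks fd2 into the child without close-on-exec */
        int fd2 = open(path, O_RDONLY);
        fcntl(fd2, F_SETFD, FD_CLOEXEC);

        (void)fd1; (void)fd2;
    }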
In addition libvirt links to a huge number of 3rd party libraries and I have essentially zero confidence in them setting O_CLOEXEC correctly all the time.
IOW, the mass close is inescapable IMHO.
The only thing I could see us doing is to spawn some minimalist helper process which then spawns iptables as needed. We'd only need the mass-close once for the helper process, which can then just spawn iptables without mass-close, or simply set a tiny max files ulimit for this helper process so that it only 'mass close' 20 FDs.
We'd have to write the set of desired iptables rules into some data format, send it to the helper to execute and wait for results. Probably not too difficult to achieve.
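A very rough sketch of such a helper's main loop, assuming a trivial one-command-per-line protocol on stdin/stdout (the protocol and limits here are made up for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        char line[4096];

        /* The expensive fd hygiene was paid once when this helper was
         * spawned; each iptables/ebtables command is now a cheap fork. */
        while (fgets(line, sizeof(line), stdin)) {
            line[strcspn(line, "\n")] = '\0';

            pid_t pid = fork();
            if (pid == 0) {
                execl("/bin/sh", "sh", "-c", line, (char *)NULL);
                _exit(127);
            }

            int status = 0;
            if (pid > 0)
                waitpid(pid, &status, 0);

            /* report the exit status back to libvirtd */
            printf("%d\n", WIFEXITED(status) ? WEXITSTATUS(status) : -1);
            fflush(stdout);
        }
        return 0;
    }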
Unfortunately, combining the above two approaches does not give a boost equal to the product of the two alone. The reason is that, due to concurrency and the high number of VMs (100), the preexec boost does not play a significant role, and using vfork diminishes concurrency as it freezes all parent threads before execve. So dropping locks and closes gives a 33s restart time, and adding vfork on top gives a 25s restart time.
Another approach is to use the --atomic-file option of ebtables (iptables/ip6tables unfortunately don't have one). The idea is to save the table to a file, edit the file, then commit the table to the kernel. I hoped this could give a performance boost because we don't need to load/store the kernel network table for every single rule update. In order to isolate the approaches I also dropped all ip/ip6 updates, which cannot be done this way. With this approach we cannot drop the ruleLock in the firewall, because no other VM thread should change the tables between save and commit. This approach gives a 25s restart time. But this approach is broken anyway, as we cannot be sure that another application doesn't change the network table between save and commit, in which case those changes would be lost.
After all, I think we need to move in a different direction. We could add an API to all the binaries and to firewalld to execute many commands in one run. We could pass the commands as arguments, or write them into a file which is then given to the binary.
This sounds like 'iptables-restore' command which accepts a batch of commands in a file. There's also ip6tables-restore and ebtables-restore.
There's no way to execute these via firewalld though.
It is also tricky when we need to check the output of certain commands before making a decision on which other commands to run. It would certainly speed things up though to load batches of rules at once.
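For illustration, feeding a whole batch to iptables-restore from C could look something like this (the rules are placeholders, '-n' keeps tables that the batch doesn't mention untouched, and real code would use libvirt's command helpers rather than popen):

    #include <stdio.h>

    int main(void)
    {
        /* one fork/exec for the whole batch instead of one per rule */
        FILE *p = popen("iptables-restore -n", "w");
        if (!p)
            return 1;

        fputs("*filter\n", p);
        /* placeholder rules, same syntax as the iptables command line */
        fputs("-A FORWARD -m physdev --physdev-is-bridged --physdev-out vnet0 -j ACCEPT\n", p);
        fputs("-A FORWARD -m physdev --physdev-in vnet0 -j ACCEPT\n", p);
        fputs("COMMIT\n", p);

        return pclose(p) == 0 ? 0 : 1;
    }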
IMHO the most immediately useful thing would be to work on an optimization to skip re-creating nwfilter rules on startup, unless we detect there are likely changes.
Thanx! Quite simple and effective indeed.

Nikolay

On 10/11/2018 06:20 AM, Daniel P. Berrangé wrote:
On Mon, Sep 24, 2018 at 10:41:37AM +0300, Nikolay Shirokovskiy wrote:
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.

To me the most obvious speedup is not to run any commands at all.
99% of the time when we restart libvirtd and re-build network filters we are achieving nothing useful. We tear down the rules and replace them by exactly the same rules.
The idea behind rebuilding at startup is that someone might have changed the config on disk while libvirtd was not running, and we want to ensure that the VMs have the correct live config after this.
Alternatively libvirtd has been upgraded to a newer version, and we want to rebuild with the new code in case old code had a bug causing it to create incorrect rules.
I think we should look at how to optimize this to be more intelligent.
We could somehow record a hash of the rule contents that were used originally. Only rebuild the rules if we see the hash of the nwfilter rules has changed, or if the libvirtd binary has had code changes.
Since we pre-emptively delete any existing copy of a rule just prior to adding that rule, another way we could reduce startup time would be to spend less time deleting. One way of doing that would (I think) be if we could add all of our rules in a set of chains that we create ourselves. Then when libvirtd restarted, we could just issue a few commands to flush those chains. (Come to think of it, nwfilter *already* puts most/all of its rules in self-created chains. Hmm....)
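As a sketch of that idea - the chain names below are placeholders standing in for whatever per-interface chains nwfilter actually creates, and real code would fork/exec instead of calling system():

    #include <stdio.h>
    #include <stdlib.h>

    /* Flush the chains we created for one interface with a couple of
     * commands instead of deleting and re-adding individual rules. */
    static void flush_iface_chains(const char *ifname)
    {
        char cmd[256];

        snprintf(cmd, sizeof(cmd), "ebtables -t nat -F libvirt-J-%s", ifname);
        system(cmd);
        snprintf(cmd, sizeof(cmd), "ebtables -t nat -F libvirt-P-%s", ifname);
        system(cmd);
    }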
Next we can try to reduce the fork/preexec time. To estimate its contribution alone, let's bring back the above locks. It turns out that most of the time is taken by fork itself and by closing 8k (on my system) file descriptors in preexec. Using vfork gives a 2x boost, and so does dropping the mass close. (I checked the mass close contribution because I don't quite understand the purpose of this step - libvirt typically sets the close-on-exec flag on its descriptors.) So these two optimizations alone can result in a restart time of 30s.

I don't like the idea of using vfork(). It is already hard to do stuff between fork() and execve() safely as you're restricted to async signal safe functions. vfork() places an even greater number of restrictions on code that can be run. It is especially dangerous in threaded programs as only the calling thread is suspended.
Unfortunately we cannot rely on close-on-exec. Depending on OS and/or version, the close-on-exec setting may or may not be atomic, e.g.
open(...., O_CLOEXEC)
vs
open(...) followed by fcntl(F_SETFD, FD_CLOEXEC)
the latter has a race we'd hit.
In addition libvirt links to a huge number of 3rd party libraries and I have essentially zero confidence in them setting O_CLOEXEC correctly all the time.
IOW, the mass close is inescapable IMHO.
The only thing I could see us doing is to spawn some minimalist helper process which then spawns iptables as needed. We'd only need the mass-close once for the helper process, which can then just spawn iptables without mass-close, or simply set a tiny max files ulimit for this helper process so that it only 'mass close' 20 FDs.
We'd have to write the set of desired iptables rules into some data format, send it to the helper to execute and wait for results. Probably not too difficult to achieve.
The best way to reduce fork/exec time is of course to never fork/exec at all :-). So, a chain of current/future events that could lead to 0 fork/execs:

1) upstream firewalld now has an nftables backend available (hang with me, this really is related...)

2) some distros have already switched to it (someone running debian on #virt a couple weeks ago had a problem that turned out to be due to using the nftables backend, and Fedora 29 *tried* to switch to using the nftables backend, but had to revert/postpone the change due to the networking problems it caused with libvirt virtual networks.)

3) due to (1) and (2) we're looking into making libvirt work properly with a firewalld using nftables (it's going to take new code in both libvirt *and* firewalld)

4) an upcoming release of firewalld will switch to adding/deleting nftables rules using calls to a library rather than a fork/exec of the nft command.

Once libvirt is working with an nftables backend to firewalld, and firewalld is using an API to access nftables, there will be 0 fork/execs! Yay!

(The bad news is that libvirt's nwfilter rules require more new functionality that doesn't yet exist in firewalld (e.g. outbound rules), so while the firewall rules that we add for libvirt networks could be switched over fairly soon, the nwfilter part is going to take a while :-/)
Unfortunately, combining the above two approaches does not give a boost equal to the product of the two alone. The reason is that, due to concurrency and the high number of VMs (100), the preexec boost does not play a significant role, and using vfork diminishes concurrency as it freezes all parent threads before execve. So dropping locks and closes gives a 33s restart time, and adding vfork on top gives a 25s restart time.
Another approach is to use the --atomic-file option of ebtables (iptables/ip6tables unfortunately don't have one). The idea is to save the table to a file, edit the file, then commit the table to the kernel. I hoped this could give a performance boost because we don't need to load/store the kernel network table for every single rule update. In order to isolate the approaches I also dropped all ip/ip6 updates, which cannot be done this way. With this approach we cannot drop the ruleLock in the firewall, because no other VM thread should change the tables between save and commit. This approach gives a 25s restart time. But this approach is broken anyway, as we cannot be sure that another application doesn't change the network table between save and commit, in which case those changes would be lost.
After all, I think we need to move in a different direction. We could add an API to all the binaries and to firewalld to execute many commands in one run. We could pass the commands as arguments, or write them into a file which is then given to the binary.

This sounds like the 'iptables-restore' command, which accepts a batch of commands in a file. There's also ip6tables-restore and ebtables-restore.
There's no way to execute these via firewalld though.
...even though firewalld itself internally uses iptables-restore in certain cases (according to what I was told by one of the firewalld developers). (Also, for those naughty people who have firewalld disabled, libvirt could use iptables-restore.)
It is also tricky when we need to check the output of certain commands before making a decision on which other commands to run. It would certainly speed things up though to load batches of rules at once.
IMHO the most immediately useful thing would be to work on an optimization to skip re-creating nwfilter rules on startup, unless we detect there are likely changes.
Agreed. That is something that could be done now, with no new features required from any other package.

On Thu, Oct 11, 2018 at 03:57:10PM -0400, Laine Stump wrote:
On 10/11/2018 06:20 AM, Daniel P. Berrangé wrote:
On Mon, Sep 24, 2018 at 10:41:37AM +0300, Nikolay Shirokovskiy wrote:
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.

To me the most obvious speedup is not to run any commands at all.
99% of the time when we restart libvirtd and re-build network filters we are achieving nothing useful. We tear down the rules and replace them by exactly the same rules.
The idea behind rebuilding at startup is that someone might have changed the config on disk while libvirtd was not running, and we want to ensure that the VMs have the correct live config after this.
Alternatively libvirtd has been upgraded to a newer version, and we want to rebuild with the new code in case old code had a bug causing it to create incorrect rules.
I think we should look at how to optimize this to be more intelligent.
We could somehow record a hash of the rule contents that were used originally. Only rebuild the rules if we see the hash of the nwfilter rules has changed, or if the libvirtd binary has had code changes.
Since we pre-emptively delete any existing copy of a rule just prior to adding that rule, another way we could reduce startup time would be to spend less time deleting. One way of doing that would (I think) be if we could add all of our rules in a set of chains that we create ourselves. Then when libvirtd restarted, we could just issue a few commands to flush those chains. (Come to think of it, nwfilter *already* puts most/all of its rules in self-created chains. Hmm....)
I don't think deletion is a big problem - we only delete the custom chains with nwfilter, and that's a small part of the command count.
The best way to reduce fork/exec time is of course to never fork/exec at all :-). So, a chain of current/future events that could lead to 0 fork/execs:
1) upstream firewalld now has an nftables backend available (hang with me, this really is related...)
2) some distros have already switched to it (someone running debian on #virt a couple weeks ago had a problem that turned out to be due to using the nftables backend, and Fedora 29 *tried* to switch to using the nftables backend, but had to revert/postpone the change due to the networking problems it caused with libvirt virtual networks.)
3) due to (1) and (2) we're looking into making libvirt work properly with a firewalld using nftables (it's going to take new code in both libvirt *and* firewalld)
4) an upcoming release of firewalld will switch to adding/deleting nftables rules using calls to a library rather than a fork/exec of the nft command.
Once libvirt is working with an nftables backend to firewalld, and firewalld is using an API to access nftables, there will be 0 fork/execs! Yay!
Sadly not.

We don't fork/exec but we do call dbus *synchronously* to get firewalld to add the rules, and firewalld does a fork/exec.

IOW, we still have the fork/exec for every command, but we also now have a dbus roundtrip on top.

Regards, Daniel

On Fri, Oct 12, 2018 at 08:47:55AM +0100, Daniel P. Berrangé wrote:
On Thu, Oct 11, 2018 at 03:57:10PM -0400, Laine Stump wrote:
On 10/11/2018 06:20 AM, Daniel P. Berrangé wrote:
On Mon, Sep 24, 2018 at 10:41:37AM +0300, Nikolay Shirokovskiy wrote:
What speed-up is possible with a conservative approach? First, for test purposes, we can remove the firewall ruleLock, the gentech driver updateMutex and the filter object mutex, which serve no function in the restart scenario. This gives a 36s restart time. The speed-up is achieved because the heavy fork/preexec steps now run concurrently.

To me the most obvious speedup is not to run any commands at all.
99% of the time when we restart libvirtd and re-build network filters we are achieving nothing useful. We tear down the rules and replace them by exactly the same rules.
The idea behind rebuilding at startup is that someone might have changed the config on disk while libvirtd was not running, and we want to ensure that the VMs have the correct live config after this.
Alternatively libvirtd has been upgraded to a newer version, and we want to rebuild with the new code in case old code had a bug causing it to create incorrect rules.
I think we should look at how to optimize this to be more intelligent.
We could somehow record a hash of the rule contents that were used originally. Only rebuild the rules if we see the hash of the nwfilter rules has changed, or if the libvirtd binary has had code changes.
Since we pre-emptively delete any existing copy of a rule just prior to adding that rule, another way we could reduce startup time would be to spend less time deleting. One way of doing that would (I think) be if we could add all of our rules in a set of chains that we create ourselves. Then when libvirtd restarted, we could just issue a few commands to flush those chains. (Come to think of it, nwfilter *already* puts most/all of its rules in self-created chains. Hmm....)
I don't think deletion is a big problem - we only delete the custom chains with nwfilter, and that's a small part of the command count.
The best way to reduce fork/exec time is of course to never fork/exec at all :-). So, a chain of current/future events that could lead to 0 fork/execs:
1) upstream firewalld now has an nftables backend available (hang with me, this really is related...)
2) some distros have already switched to it (someone running debian on #virt a couple weeks ago had a problem that turned out to be due to using the nftables backend, and Fedora 29 *tried* to switch to using the nftables backend, but had to revert/postpone the change due to the networking problems it caused with libvirt virtual networks.)
3) due to (1) and (2) we're looking into making libvirt work properly with a firewalld using nftables (it's going to take new code in both libvirt *and* firewalld)
4) an upcoming release of firewalld will switch to adding/deleting nftables rules using calls to a library rather than a fork/exec of the nft command.
Once libvirt is working with an nftables backend to firewalld, and firewalld is using an API to access nftables, there will be 0 fork/execs! Yay!
Sadly not.
We don't fork/exec but we do call dbus *synchronously* to get firewalld to add the rules, and firewalld does a fork/exec.
IOW, we still have the fork/exec for every command, but we also now have a dbus roundtrip on top.
Sigh, of course I missed the key paragraph in your point (4) about using the API for nftables :-(

Regards, Daniel