failed to load seccomp syscall filter in kernel: Operation canceled

Hello community,
I am operating an OpenStack cluster in which the services (libvirt, nova, etc.) run in containers. The compute node's arch is aarch64 (Phytium 2500). When there are around 60 or 70 virtual machines, booting a new virtual machine fails with the following error message:
error: internal error: qemu unexpectedly closed the monitor: 2022-10-24T06:23:54.545685Z qemu-system-aarch64: -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny: failed to load seccomp syscall filter in kernel: Operation canceled
Interestingly, if I stop one virtual machine with virsh, I am able to boot another. Besides, I managed to boot a virtual machine manually without any issue.
So my question is: what could be the potential cause of this behavior, and how can I deal with it?
Thank you very much in advance for the help.
--
Best Regards,
Jiatong Shen

On 10/25/22 02:38, Jiatong Shen wrote:
Hello community,
I am operating an openstack cluster where applications (libvirt/nova etc) are running using containers. The compute node's arch is aarch64 (phytium 2500), when there are virtual machines around 60 or 70, I failed to boot new virtual machines and faced with following error message,
error: internal error: qemu unexpectedly closed the monitor: 2022-10-24T06:23:54.545685Z qemu-system-aarch64: -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny: failed to load seccomp syscall filter in kernel: Operation canceled
This error message comes from qemu, which sees seccomp_load() fail. Looking further [1], seccomp_load() returns -ECANCELED. This is because qemu does not set SCMP_FLTATR_API_SYSRAWRC attribute and thus public APIs fail with this meaningless error. I've posted patch here [1].
But the symptoms suggest that you are hitting a limit (for eBPF perhaps?).
1: https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg04509.html
Interestingly, if I virsh stop one virtual machine, I am able to boot another. Besides, I managed to manually boot a virtual machine without any issue. So my question is what could be the potential cause of this behavior and how can I deal with it? Thank you very much in advance for the help.
I wonder whether the fact that openstack runs VMs in container has something to do with this. Perhaps it touches the limit/sets different accounting for it?
Michal

Thank you very much for the feedback. I wrote a script that calls seccomp_load 200 times as a non-root user, and it fails even outside a container, so right now I think Kylin OS might have a bug that causes problems for non-root users.
Best,
Norman
On Tue, Oct 25, 2022 at 8:11 PM Michal Prívozník <mprivozn@redhat.com> wrote:
On 10/25/22 02:38, Jiatong Shen wrote:
Hello community,
I am operating an openstack cluster where applications (libvirt/nova etc) are running using containers. The compute node's arch is aarch64 (phytium 2500), when there are virtual machines around 60 or 70, I failed to boot new virtual machines and faced with following error message,
error: internal error: qemu unexpectedly closed the monitor: 2022-10-24T06:23:54.545685Z qemu-system-aarch64: -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny: failed to load seccomp syscall filter in kernel: Operation canceled
This error message comes from qemu, which sees seccomp_load() fail. Looking further [1], seccomp_load() returns -ECANCELED. This is because qemu does not set SCMP_FLTATR_API_SYSRAWRC attribute and thus public APIs fail with this meaningless error. I've posted patch here [1].
But the symptoms suggest that you are hitting a limit (for eBPF perhaps?).
1: https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg04509.html
Interestingly, if I virsh stop one virtual machine, I am able to boot another. Besides, I managed to manually boot a virtual machine without any issue. So my question is what could be the potential cause of this behavior and how can I deal with it? Thank you very much in advance for the help.
I wonder whether the fact that openstack runs VMs in container has something to do with this. Perhaps it touches the limit/sets different accounting for it?
Michal
-- Best Regards, Jiatong Shen
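For readers who want to repeat the kind of test described in the last reply, here is a minimal Python sketch. It is not the script from the thread, which was not posted. Assumptions: it bypasses libseccomp and calls prctl(PR_SET_SECCOMP) directly through ctypes, so the raw kernel errno is reported instead of ECANCELED; it stacks a one-instruction allow-everything filter repeatedly inside a single forked child, which may differ from how the original test loaded its filters; and the names count_filter_loads and describe are made up for illustration. Linux only.

```python
import ctypes
import errno
import os
import struct

# Constants copied from the Linux UAPI headers.
PR_SET_NO_NEW_PRIVS = 38
PR_SET_SECCOMP = 22
SECCOMP_MODE_FILTER = 2
BPF_RET_K = 0x06                # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000


class SockFilter(ctypes.Structure):      # struct sock_filter
    _fields_ = [("code", ctypes.c_uint16), ("jt", ctypes.c_uint8),
                ("jf", ctypes.c_uint8), ("k", ctypes.c_uint32)]


class SockFprog(ctypes.Structure):       # struct sock_fprog
    _fields_ = [("len", ctypes.c_uint16),
                ("filter", ctypes.POINTER(SockFilter))]


def _load_in_child(attempts):
    """Stack an allow-all filter `attempts` times; return (loaded, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ul = ctypes.c_ulong
    insn = SockFilter(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW)
    prog = SockFprog(1, ctypes.pointer(insn))
    # Unprivileged processes must set no_new_privs before loading a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ul(1), ul(0), ul(0), ul(0)) != 0:
        return 0, ctypes.get_errno()
    for done in range(attempts):
        rc = libc.prctl(PR_SET_SECCOMP, ul(SECCOMP_MODE_FILTER),
                        ctypes.byref(prog))
        if rc != 0:
            return done, ctypes.get_errno()
    return attempts, 0


def count_filter_loads(attempts=200):
    """Run the loads in a forked child so the caller stays unfiltered."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(read_fd)
            loaded, err = _load_in_child(attempts)
            os.write(write_fd, struct.pack("ii", loaded, err))
            status = 0
        finally:
            os._exit(status)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    if len(data) != struct.calcsize("ii"):
        raise RuntimeError("child exited without reporting a result")
    return struct.unpack("ii", data)


def describe(err):
    """Map an errno value to its symbolic name, e.g. for log output."""
    return errno.errorcode.get(err, str(err)) if err else "OK"


if __name__ == "__main__":
    loaded, err = count_filter_loads(200)
    print(f"loaded {loaded} filters, errno: {describe(err)}")
```

Running this as root and as a non-root user, inside and outside the container, and comparing the reported errno may help narrow down which limit, if any, is being hit.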