
On 04/26/2016 07:44 AM, mxs kolo wrote:
Now reproduced 100% of the time.

1) Create a container with a 1 GB memory limit.

2) Run a simple memory-allocation test inside it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MB (1024 * 1024)

int main(void)
{
    int total = 0;
    /* Allocate and touch 100 MB per second until the limit stops us. */
    while (1) {
        void *p = malloc(100 * MB);
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        memset(p, 0, 100 * MB);  /* touch the pages so they are really used */
        total = total + 100;
        printf("Alloc %d Mb\n", total);
        sleep(1);
    }
}

[root@tst-mxs2 ~]# free
              total        used        free      shared  buff/cache   available
Mem:        1048576        7412     1028644       11112       12520     1028644
Swap:       1048576           0     1048576
[root@tst-mxs2 ~]# ./a.out
Alloc 100 Mb
Alloc 200 Mb
Alloc 300 Mb
Alloc 400 Mb
Alloc 500 Mb
Alloc 600 Mb
Alloc 700 Mb
Alloc 800 Mb
Alloc 900 Mb
Alloc 1000 Mb
Killed
As you can see, the limit works and "free" inside the container shows correct values.
3) Check the situation outside the container, from the hardware node:

[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/memory.limit_in_bytes
1073741824

4) Check the list of PIDs in the cgroup (this is the IMPORTANT point):

[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
7445
7446
7480
7506
7510
7511
7512
7529
7532
7533
7723
7724
8251
8253
10455
The first PID, 7445, is the PID of the libvirt process for the container:

# ps ax | grep 7445
 7445 ?        Sl     0:00 /usr/libexec/libvirt_lxc --name tst-mxs2.test --console 21 --security=none --handshake 24 --veth macvlan5
[root@node01]# virsh list
 Id    Name                           State
----------------------------------------------------
 7445  tst-mxs2.test                  running
5) Now break /proc/meminfo inside the container. Prepare a simple systemd service:

# cat /usr/lib/systemd/system/true.service
[Unit]
Description=simple test

[Service]
Type=simple
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target
Enable the service once, then disable it and start it:
[root@node01]# systemctl enable /usr/lib/systemd/system/true.service
Created symlink from /etc/systemd/system/multi-user.target.wants/true.service to /usr/lib/systemd/system/true.service.
[root@node01]# systemctl disable true.service
Removed symlink /etc/systemd/system/multi-user.target.wants/true.service.
[root@node01]# systemctl start true.service
Now check memory inside the container:

[root@tst-mxs2 ~]# free
                     total        used                free      shared  buff/cache           available
Mem:      9007199254740991      190824    9007199254236179       11112      313988    9007199254236179
Swap:                    0
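The bogus total is not an arbitrary number. A quick arithmetic check, as a sketch: it equals the largest signed 64-bit integer divided by 1024, which looks like an "unlimited" placeholder rather than a real measurement. That interpretation is an assumption, not something confirmed in this thread. The variable and helper names below are made up for illustration.

```shell
#!/bin/sh
# "total" as printed by free(1) inside the container, in KiB (copied from the output above).
reported_kib=9007199254740991

# Largest signed 64-bit integer, divided by 1024 with integer division.
int64_max=9223372036854775807
sentinel_kib=$(( $int64_max / 1024 ))

# Convert KiB to whole GiB (integer division, rounds down).
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

echo "sentinel_kib=$sentinel_kib"                       # prints sentinel_kib=9007199254740991
echo "reported GiB: $(kib_to_gib "$reported_kib")"      # prints reported GiB: 8589934591
```

8589934591 GiB is just under 2^33 GiB, i.e. roughly 8 EiB when the free output is read in KiB.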
6) Check the tasks list in the cgroup:

[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
7446
7480
7506
7510
7511
7512
7529
7532
7533
7723
7724
8251
8253
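To make the difference between the listings in steps 4 and 6 explicit, the two snapshots can be diffed. This is only a sketch: it assumes each listing was saved to a file beforehand, and the helper name removed_pids and the file names are made up for illustration.

```shell
#!/bin/sh
# removed_pids BEFORE AFTER
# Print the PIDs listed in BEFORE but missing from AFTER, one per line.
# Both files are expected to hold one PID per line, like the cgroup "tasks" file.
removed_pids() (
    before=$1
    after=$2
    tmp_before=$(mktemp) || exit 1
    tmp_after=$(mktemp) || exit 1
    # comm(1) needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$before" > "$tmp_before"
    LC_ALL=C sort "$after" > "$tmp_after"
    # -23 keeps only the lines unique to the first file; sort -n restores numeric order.
    LC_ALL=C comm -23 "$tmp_before" "$tmp_after" | sort -n
    rm -f "$tmp_before" "$tmp_after"
)
```

With the two listings from this mail saved as tasks.before and tasks.after (hypothetical names), "removed_pids tasks.before tasks.after" would print 7445 and 10455.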
After starting the disabled systemd service, the libvirt PID 7445 has been removed from the task list. This means the limit really still works inside the LXC container: 7446, the PID of /sbin/init inside the container, is still in the list. Check that the limit works:
[root@tst-mxs2 ~]# free
                     total        used                free      shared  buff/cache           available
Mem:      9007199254740991      190824    9007199254236179       11112      313988    9007199254236179
Swap:                    0           0                   0
[root@tst-mxs2 ~]# ./a.out
Alloc 100 Mb
Alloc 200 Mb
Alloc 300 Mb
Alloc 400 Mb
Alloc 500 Mb
Alloc 600 Mb
Alloc 700 Mb
Alloc 800 Mb
Alloc 900 Mb
Alloc 1000 Mb
Killed
Only the FUSE mount is broken. The good news is that a process inside the container, even when free reports about 8 EiB (9007199254740991 KiB), cannot allocate more memory than the limit set in the cgroup. The bad news is that some Java-based software (puppetdb in our case) plans its memory strategy around that 8 EiB figure and collapses when it reaches the real limit.
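One way software could protect itself against a bogus MemTotal is to take the smaller of the meminfo value and the cgroup limit. The following is a minimal sketch under assumptions: it presumes a cgroup v1 memory.limit_in_bytes file is readable from wherever the script runs, and the helper name effective_mem_kib is hypothetical. It is not what puppetdb or the JVM actually does.

```shell
#!/bin/sh
# effective_mem_kib MEMINFO_FILE LIMIT_FILE
# Print min(MemTotal from MEMINFO_FILE, limit from LIMIT_FILE) in KiB.
# MEMINFO_FILE has /proc/meminfo format; LIMIT_FILE holds a byte count,
# like the cgroup v1 memory.limit_in_bytes file.
effective_mem_kib() (
    meminfo=$1
    limit_file=$2
    total_kib=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    limit_bytes=$(cat "$limit_file")
    limit_kib=$(( $limit_bytes / 1024 ))
    if [ "$limit_kib" -lt "$total_kib" ]; then
        echo "$limit_kib"
    else
        echo "$total_kib"
    fi
)
```

With the values from this mail (MemTotal of 9007199254740991 kB and a limit of 1073741824 bytes) the result is 1048576 KiB, the same total that free showed before the breakage.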
Summary:
1) Don't start a disabled service with systemd.
2) Work around it with cgclassify or its simple equivalent:

[root@node01]# echo 7445 > /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
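The echo workaround can be wrapped so that it only writes when the PID is actually missing. This is a sketch, not a tested fix: the helper name ensure_pid_in_tasks is made up, and it does not verify that the PID still belongs to libvirt_lxc.

```shell
#!/bin/sh
# ensure_pid_in_tasks TASKS_FILE PID
# Print "present" if PID is already listed in TASKS_FILE; otherwise write it
# there and print "added".
# On cgroupfs, writing a PID to "tasks" moves that process into the cgroup.
# On a regular file (e.g. when trying this out), ">" simply overwrites the file.
ensure_pid_in_tasks() (
    tasks_file=$1
    pid=$2
    # -x: match the whole line, -F: fixed string, -q: no output.
    if grep -qxF "$pid" "$tasks_file"; then
        echo "present"
    else
        echo "$pid" > "$tasks_file" && echo "added"
    fi
)
```

On the node it would be called with the tasks path from the command above and the PID of the libvirt_lxc process (7445 in this case).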
P.S. I am not sure whose bug this is: libvirtd's or systemd's.
b.r. Maxim Kozin
Cool, thanks for the info! Does this still affect libvirt 1.3.2 as well? You mentioned elsewhere that you weren't hitting this issue with that version.

- Cole