On 04/26/2016 07:44 AM, mxs kolo wrote:
Now reproduced 100% of the time.
1) Create a container with a 1 GiB memory limit.
2) Run a simple memory allocation test inside it:
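#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MB (1024 * 1024)

int main(void) {
    int total = 0;
    while (1) {
        /* Allocate 100 MB and touch every page so it is really backed by memory.
           The block is never freed on purpose: usage must keep growing. */
        void *p = malloc(100 * MB);
        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        memset(p, 0, 100 * MB);
        total = total + 100;
        printf("Alloc %d Mb\n", total);
        sleep(1);
    }
}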
[root@tst-mxs2 ~]# free
total used free shared buff/cache available
Mem: 1048576 7412 1028644 11112 12520 1028644
Swap: 1048576 0 1048576
[root@tst-mxs2 ~]# ./a.out
Alloc 100 Mb
Alloc 200 Mb
Alloc 300 Mb
Alloc 400 Mb
Alloc 500 Mb
Alloc 600 Mb
Alloc 700 Mb
Alloc 800 Mb
Alloc 900 Mb
Alloc 1000 Mb
Killed
As you can see, the limit works and "free" inside the container shows correct values.
3) Check the situation outside the container, from the hardware node:
[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/memory.limit_in_bytes
1073741824
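The scope directory name above follows a visible pattern: "machine-", then "lxc-<pid>-<name>" with every "-" escaped as \x2d (typed as \\x2d on the command line), then ".scope". A minimal sketch that rebuilds the tasks path from the PID and the container name, assuming that pattern holds and that "-" is the only character needing escaping (real systemd escaping covers more characters). scope_tasks_path is a hypothetical helper name:

```shell
# Build the cgroup "tasks" path for a libvirt LXC scope.
# $1 = PID of libvirt_lxc, $2 = container name.
# Simplification: only "-" is escaped, as seen in the path above.
scope_tasks_path() {
    stp_escaped=$(printf '%s' "lxc-$1-$2" | sed 's/-/\\x2d/g')
    printf '%s\n' "/sys/fs/cgroup/memory/machine.slice/machine-$stp_escaped.scope/tasks"
}
```

For the container in this report it would be used as: cat "$(scope_tasks_path 7445 tst-mxs2.test)"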
4) Check the list of PIDs in the cgroup (this is the IMPORTANT part):
[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
7445
7446
7480
7506
7510
7511
7512
7529
7532
7533
7723
7724
8251
8253
10455
The first PID, 7445, is the PID of the libvirt process for the container:
# ps ax | grep 7445
7445 ? Sl 0:00 /usr/libexec/libvirt_lxc --name tst-mxs2.test --console 21 --security=none --handshake 24 --veth macvlan5
[root@node01]# virsh list
Id Name State
----------------------------------------------------
7445 tst-mxs2.test running
5) Now break /proc/meminfo inside the container.
Prepare a simple systemd service:
# cat /usr/lib/systemd/system/true.service
[Unit]
Description=simple test
[Service]
Type=simple
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
Enable the service once, disable it, then start it:
[root@node01]# systemctl enable /usr/lib/systemd/system/true.service
Created symlink from /etc/systemd/system/multi-user.target.wants/true.service to /usr/lib/systemd/system/true.service.
[root@node01]# systemctl disable true.service
Removed symlink /etc/systemd/system/multi-user.target.wants/true.service.
[root@node01]# systemctl start true.service
Now check memory inside the container:
[root@tst-mxs2 ~]# free
total used free shared buff/cache available
Mem: 9007199254740991 190824 9007199254236179 11112 313988 9007199254236179
Swap: 0
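A quick sanity check on that number. "free" prints KiB by default (the earlier total of 1048576 is the 1 GiB limit), and 9007199254740991 is exactly 2^63-1 bytes divided by 1024, which is about 8 EiB. That the value comes from an "unlimited" limit of 2^63-1 bytes is an assumption; the output alone does not prove it. bytes_to_kib is a hypothetical helper:

```shell
# Convert a byte count to KiB with integer division
# (needs 64-bit shell arithmetic, as in bash).
bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

bytes_to_kib 1073741824            # configured limit: prints 1048576
bytes_to_kib 9223372036854775807   # 2^63-1 bytes: prints 9007199254740991
```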
6) Check the task list in the cgroup:
[root@node01]# cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
7446
7480
7506
7510
7511
7512
7529
7532
7533
7723
7724
8251
8253
After starting the disabled systemd service, the libvirt PID 7445 is gone from the task list.
The limit itself should still apply inside the LXC container, because 7446, the PID of
/sbin/init inside the container, is still in the cgroup.
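To check membership without reading the list by eye, here is a small sketch that tests whether a PID appears as a whole line in a tasks file. pid_in_tasks is a hypothetical helper, and the file path is supplied by the caller:

```shell
# Return 0 if PID ($2) is listed on its own line in the tasks file ($1).
# -x matches whole lines only, so 7445 does not match 17445.
pid_in_tasks() {
    grep -qxF -- "$2" "$1"
}

# Example (path to the tasks file is up to the caller):
#   pid_in_tasks "$TASKS" 7445 && echo "in cgroup" || echo "missing"
```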
Check that the limit works:
[root@tst-mxs2 ~]# free
total used free shared buff/cache available
Mem: 9007199254740991 190824 9007199254236179 11112 313988 9007199254236179
Swap: 0 0 0
[root@tst-mxs2 ~]# ./a.out
Alloc 100 Mb
Alloc 200 Mb
Alloc 300 Mb
Alloc 400 Mb
Alloc 500 Mb
Alloc 600 Mb
Alloc 700 Mb
Alloc 800 Mb
Alloc 900 Mb
Alloc 1000 Mb
Killed
Only the FUSE mount is broken. The good news is that a process inside the
container still can't allocate more memory than the cgroup limit, even when
"free" reports about 8 EiB (9007199254740991 KiB). The bad news is that some
Java-based software (puppetdb in our case) sizes itself for the reported memory
and collapses when it reaches the real limit.
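One way a launcher script could protect such software is to clamp the reported total against a known limit before sizing anything from it. A rough sketch, assuming the limit in KiB is supplied by the operator. memtotal_kib and min_kib are hypothetical helpers, and nothing here is specific to puppetdb:

```shell
# Print MemTotal in KiB from a meminfo-style file (default: /proc/meminfo).
memtotal_kib() {
    awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Print the smaller of two non-negative integers.
min_kib() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Example: trust MemTotal only up to the 1 GiB (1048576 KiB) limit.
#   usable_kib=$(min_kib "$(memtotal_kib)" 1048576)
```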
Summary:
1) Don't start a disabled service with systemd.
2) Work around it with cgclassify or its simple equivalent:
[root@node01]# echo 7445 > /sys/fs/cgroup/memory/machine.slice/machine-lxc\\x2d7445\\x2dtst\\x2dmxs2.test.scope/tasks
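The same workaround wrapped with a basic check, so that a typo cannot write garbage into the cgroup file. This is only a sketch: add_pid_to_tasks is a hypothetical helper, and it has to run as root on the node:

```shell
# Write PID ($2) into the cgroup tasks file ($1), as "echo PID > tasks" does.
# Rejects anything that is not a plain decimal number.
add_pid_to_tasks() {
    case "$2" in
        ''|*[!0-9]*) echo "not a PID: $2" >&2; return 1 ;;
    esac
    echo "$2" > "$1"
}
```

Called with the tasks path from the command above and 7445 as the PID, it does the same thing as the echo.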
P.S.
I am not sure whose bug this is, libvirtd's or systemd's.
b.r.
Maxim Kozin
Cool, thanks for the info! Does this still affect libvirt 1.3.2 as well? You
mentioned elsewhere that you weren't hitting this issue with that version
- Cole