Hi,
After an instance was OOM-killed (output attached), we implemented a
Nagios check to monitor instances that are close to the limit. However,
we are now getting false alarms, because instances can get close to
the cgroup limit for valid reasons.
However, caches should not change the OOM behavior. The kernel knows
that cached pages are data it can discard, so before killing a process
it drops unneeded caches first. Only when there is nothing left to drop
does it fall back to killing the process.
I guess that in the check we will have to subtract the cache size from
'memory.usage_in_bytes'.
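
A rough sketch of what the adjusted check could compute, assuming the
cgroup v1 memory controller files 'memory.usage_in_bytes',
'memory.limit_in_bytes' and 'memory.stat'. The function names, the
80%/90% thresholds and the idea of preferring 'total_cache' over 'cache'
are my own assumptions for illustration, not our actual check:

```python
import os

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def non_cache_usage(usage_in_bytes, stat_text):
    """Return usage minus page cache, never below zero."""
    stats = parse_memory_stat(stat_text)
    # Assumption: use the hierarchical counter when present,
    # otherwise fall back to the cgroup's own 'cache' counter.
    cache = stats.get("total_cache", stats.get("cache", 0))
    return max(usage_in_bytes - cache, 0)


def check_status(usage_in_bytes, limit_in_bytes, stat_text,
                 warn=0.80, crit=0.90):
    """Map the cache-adjusted usage ratio to a Nagios status code.

    The warn/crit thresholds are hypothetical defaults.
    """
    ratio = non_cache_usage(usage_in_bytes, stat_text) / limit_in_bytes
    if ratio >= crit:
        return CRITICAL
    if ratio >= warn:
        return WARNING
    return OK


def read_cgroup(cgroup_dir):
    """Read (usage, limit, stat_text) from a cgroup v1 memory directory.

    cgroup_dir is whatever path the instance's memory cgroup lives
    under on the host; it is not known here and must be supplied.
    """
    def read(name):
        with open(os.path.join(cgroup_dir, name)) as f:
            return f.read()

    usage = int(read("memory.usage_in_bytes"))
    limit = int(read("memory.limit_in_bytes"))
    return usage, limit, read("memory.stat")
```

The check itself would then call check_status(*read_cgroup(path)) for
each instance's cgroup directory and exit with the returned code.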
It still puzzles me how instances with caching disabled for all
their block devices can accumulate such large caches on the host.
Thanks all for your time,
Regards,
Brano Zarnovican