
Hi, after the instance was OOM-killed (output attached), we implemented a Nagios check to monitor instances that are close to their limit. However, we are now getting false alarms, because instances can get close to the cgroup limit for valid reasons.
However, being close to the limit is not a problem when the usage consists of caches. The kernel knows that cached data can be discarded, so before killing the process it drops the unneeded caches; only when there is nothing left to drop does it fall back to killing the process.
I guess that in the check we will have to subtract the cache size from 'memory.usage_in_bytes'. It still puzzles me how instances with caching disabled for all their block devices can accumulate such large caches on the host.

Thanks all for your time,

Regards,
Brano Zarnovican
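A minimal sketch of such a check, assuming the cgroup v1 memory controller (the one that exposes 'memory.usage_in_bytes', 'memory.limit_in_bytes' and 'memory.stat'). The script name, the cgroup directory passed on the command line and the 90%/95% thresholds are hypothetical; adjust them to wherever your instances' cgroups actually live. It subtracts 'total_cache' (falling back to 'cache') from the usage and alerts only on the remaining, non-reclaimable part.

```python
import sys
from pathlib import Path

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_memory_stat(text):
    """Parse cgroup v1 memory.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def usage_without_cache(usage_bytes, stats):
    """Subtract page cache from memory.usage_in_bytes.

    Prefers total_cache (includes child cgroups), falls back to cache.
    """
    cache = stats.get("total_cache", stats.get("cache", 0))
    return max(usage_bytes - cache, 0)


def evaluate(usage_bytes, limit_bytes, stats, warn=0.90, crit=0.95):
    """Return (exit_code, message); warn/crit are hypothetical thresholds."""
    used = usage_without_cache(usage_bytes, stats)
    ratio = used / limit_bytes
    msg = f"non-cache usage {used} of {limit_bytes} bytes ({ratio:.1%})"
    if ratio >= crit:
        return CRITICAL, "CRITICAL - " + msg
    if ratio >= warn:
        return WARNING, "WARNING - " + msg
    return OK, "OK - " + msg


def main(argv):
    # argv[1]: the instance's memory cgroup directory (hypothetical, setup-specific)
    if len(argv) != 2:
        print("UNKNOWN - usage: check_cgroup_mem.py <memory cgroup dir>")
        return UNKNOWN
    cg = Path(argv[1])
    try:
        usage = int((cg / "memory.usage_in_bytes").read_text())
        limit = int((cg / "memory.limit_in_bytes").read_text())
        stats = parse_memory_stat((cg / "memory.stat").read_text())
    except (OSError, ValueError) as exc:
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(usage, limit, stats)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The parsing and the threshold logic are kept separate from the file reading, so they can be tested without a real cgroup.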