On 09.08.2013 13:39, Martin Kletzander wrote:
On 08/08/2013 05:03 PM, Brano Zarnovican wrote:
> On Thu, Aug 8, 2013 at 9:39 AM, Martin Kletzander <mkletzan(a)redhat.com> wrote:
>> First, let me explain that libvirt is not ignoring the cache=none.
>> This is propagated to qemu as a parameter for its disk. From qemu's
>> POV (anyone feel free to correct me if I'm mistaken) this means the file
>> is opened with O_DIRECT flag; and from the open(2) manual, the O_DIRECT
>> means "Try to minimize cache effects of the I/O to and from this
>> file...", that doesn't necessarily mean there is no cache at all.
>
> Thanks for the explanation.
>
>> But even if it does, this applies to files used as disks, but those
>> disks are not the only files the process is using. You can check what
>> other files the process has mapped, opened etc. from the '/proc'
>> filesystem or using the 'lsof' utility. All the other files can (and
>> probably will) take some cache and there is nothing wrong with that.
>
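For reference, a minimal sketch of the '/proc' approach mentioned above, in Python. It assumes a Linux host; the helper name open_files and its proc_root parameter are made up for illustration, and reading another user's /proc/<pid>/fd typically requires root. It only covers open file descriptors, not mapped files (those would be in /proc/<pid>/maps).

```python
import os


def open_files(pid, proc_root="/proc"):
    """Return the targets of all open file descriptors of a process.

    Hypothetical helper: it resolves the symlinks under <proc_root>/<pid>/fd,
    which is roughly the information 'lsof -p <pid>' reports for open files.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    # Entries in the fd directory are named after the descriptor number.
    for name in sorted(os.listdir(fd_dir), key=int):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling open_files() with the pid of the qemu process should list the disk images next to everything else the process holds open (sockets, pipes, logs and so on).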
> In my case there was 4GB of cache.
>
> Just now, I have thrashed one instance with many read/writes on
> various devices. In total, tens of GB of data. But the cache (on host)
> did not grow beyond 3MB. I'm not yet able to reproduce the problem.
>
>> Are you trying to resolve an issue or asking just out of curiosity?
>> Because this is wanted behavior and there should be no need for anyone
>> to minimize this.
>
> Once or twice, one of our VMs was OOM killed because it reached 1.5 *
> memory limit for its cgroup.
>
Oh, please report this to us. This is one of the problems we'll be,
unfortunately, dealing with forever, I guess. This limit is just a
"guess" at how much memory qemu might take, and we're setting it to make
sure the host is not overwhelmed in case qemu is faulty/hacked. Since it
is never possible to set this exactly, it has already happened that qemu
was killed thanks to cgroups, so we had to increase the limit.
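To see how much of the memory charged to a guest's cgroup is page cache and how much is RSS, one could parse the cgroup's memory.stat file. A rough sketch in Python, assuming the cgroup v1 memory controller (which reports "cache" and "rss" fields in bytes); the function names are made up for illustration, the location of the file depends on how the host mounts cgroups, and this is not what libvirt itself does:

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a simple "<name> <integer>" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stat_text, limit_bytes):
    """Split charged memory into cache and rss and relate it to the limit.

    limit_bytes is assumed to come from the cgroup's memory.limit_in_bytes.
    """
    stats = parse_memory_stat(stat_text)
    cache = stats.get("cache", 0)
    rss = stats.get("rss", 0)
    charged = cache + rss
    return {
        "cache": cache,
        "rss": rss,
        "charged": charged,
        "fraction_of_limit": charged / limit_bytes,
    }
```

A large "cache" share is usually harmless because the kernel can reclaim it under pressure; a large and growing "rss" share is what would be worth reporting.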
I Cc'd Michal who might be the right person to know about any further
increase.
Sometimes I feel like I should not have added the functionality.
Guessing the correct limit for a process is like solving the halting
problem. It cannot be calculated by any algorithm, and the best we can do
is increase the limit once somebody is already in trouble. D'oh!
Moreover, if somebody comes by and tells us about it, we blindly raise
the limit without knowing for sure that qemu is not leaking memory (in
which case the limit is right and the OOM killer did the right thing).
The more problems are reported, the closer I am to writing a patch which
removes this heuristic.
Michal