Hi all,
This is not really a libvirt issue but I'm hoping some of the smart folks
here will know more about this problem...
We have noticed when running some HPC applications on our OpenStack
(libvirt+KVM) cloud that the same application occasionally performs much
worse (4-5x slowdown) than normal. We can reproduce this quite easily by
filling the pagecache (e.g. dd-ing a single large file to /dev/null) before
running the application. The problem seems to be that the kernel is not
freeing (or has some trouble freeing) the non-dirty and (presumably
immediately) reclaimable pagecache in order to allocate THPs for the
application.
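For anyone wanting to watch this while reproducing, here is a rough sketch (not something we run in production) that pulls a few relevant counters out of /proc/meminfo. The helper names parse_meminfo and thp_snapshot are made up for illustration. It assumes a kernel built with THP support so that the AnonHugePages field is present; a missing field is reported as None.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Values are in kB where the kernel prints a unit; fields without a
    unit (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def thp_snapshot(text):
    """Return free memory, pagecache size and THP-backed anonymous memory."""
    info = parse_meminfo(text)
    return {key: info.get(key) for key in ("MemFree", "Cached", "AnonHugePages")}


# Typical use on a Linux box:
#   with open("/proc/meminfo") as f:
#       print(thp_snapshot(f.read()))
```

Sampling this before and during the application run, with and without the pagecache pre-filled, should show whether AnonHugePages grows while Cached stays large. Note that AnonHugePages is system-wide, so it is only a coarse indicator on a busy node.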
This behaviour is also observable on regular bare-metal, but the slowdown
is only 10-20% there - nested paging makes THP allocation much more
important in the guest (1). Both current CentOS and Ubuntu guests have
the issue. Some more environmental context: the VMs are tuned (CPU pinning,
NUMA topology, hugepage backing), so we normally see no difference between
host and guest performance.
After talking to many other colleagues in the HPC space it seems common
for people to set up their batch schedulers to run drop_caches between jobs,
which can help to work around this and other similar issues, but no-one has
been able to explain why this is happening. Hopefully someone who knows
about the guts of the Linux MM can shed some light...?
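For reference, the drop_caches workaround amounts to roughly the following. This is only a sketch of what a scheduler epilog might do, not any particular site's script; the function name drop_caches and its path parameter are made up here (the parameter exists only so the function can be exercised against an ordinary file). Writing to the real file requires root.

```python
import os

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES_PATH):
    """Ask the kernel to drop clean caches.

    mode 1: pagecache only
    mode 2: reclaimable slab objects (dentries and inodes)
    mode 3: both
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # drop_caches only discards clean pages, so flush dirty data first.
    if hasattr(os, "sync"):
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling drop_caches() as root in the job epilog (or prolog) empties the pagecache before the next job starts, which is the behaviour we would like to understand rather than just paper over.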
(1) a related and possibly dumb question: in the case of a high-performance
KVM where the guest is hugepage backed and pinned in host memory anyway,
why do we still have a table-based resolution for guest physical to host
virtual address translation - couldn't this just be done by offset?
--
Cheers,
~Blairo