On Mon, May 09, 2022 at 06:52:32PM +0000, Petr Beneš wrote:
> Hi,
>
> my problem can be described simply: libvirt can't handle starting dozens of VMs at
> the same time.
> (Technically it can, but it's really slow.)
>
> We have an AMD machine with 256 logical cores and 1.5 TB of RAM.
> There are roughly 200 VMs on that machine.
> Each VM is the same: 8 GB of RAM, 4 vCPUs. Half of them are Win7 x86, the other half are
> Win7 x64.
> The VMs use qcow2 disk images, which reside on a ramdisk (tmpfs).
>
> We use these machines for automatic malware analysis, so our scenario consists of this
> cycle:
> - revert the VM to a running state
> - execute a sample inside the VM for ~1-2 minutes
> - shut down the VM
>
> Of course, this results in multiple VMs trying to start at the same time.
> At first, reverts/starts are really fast: a second or two.
> After about a minute, "revertToSnapshot" suddenly takes 10-15 seconds,
> which is really unacceptable.
> For comparison, we're running the same scenario on Proxmox, where
> revertToSnapshot usually takes 2 seconds.

Can you share the XML configuration of one of your guests, assuming
they all have the same basic configuration?
As a gut feeling it sounds to me like it could be initially fast due to
utilization of host I/O cache, but then slows down due to having to
flush data to disk / read fresh from disk. This could be the case if
the disk configuration cache mode is set to certain values, so the XML
config will show us this info.
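
To illustrate which part of the XML matters here, below is a minimal
Python sketch that pulls the cache mode out of each <disk> element. The
sample XML, guest name, and file path are hypothetical placeholders
shaped like "virsh dumpxml" output, not the reporter's actual config,
and disk_cache_modes() is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical guest XML fragment, shaped like `virsh dumpxml` output.
# The name and paths are placeholders, not taken from the real setup.
SAMPLE_XML = """
<domain type='kvm'>
  <name>win7-x64-example</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/path/to/ramdisk/win7-x64.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Return a list of (target device, cache mode or None) per disk."""
    root = ET.fromstring(domain_xml)
    modes = []
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        # A missing cache attribute means the hypervisor default applies.
        cache = driver.get("cache") if driver is not None else None
        modes.append((dev, cache))
    return modes


if __name__ == "__main__":
    for dev, cache in disk_cache_modes(SAMPLE_XML):
        print(dev, cache or "(not set, hypervisor default)")
```

The cache attribute on the <driver> element is the setting in question;
if it is absent, the hypervisor's default cache mode is used.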
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|