On Tue, Dec 10, 2019 at 02:54:22PM +0100, Andrea Bolognani wrote:
> This patch is intended to start a slightly larger discussion about
> our plans for the CentOS CI environment going forward.
>
> At the moment, we have active builders for
>
>   CentOS 7
>   Debian 9
>   Debian 10
>   Fedora 30
>   Fedora 31
>   Fedora Rawhide
>   FreeBSD 11
>   FreeBSD 12
>
> but we don't have builders for
>
>   Debian sid
>   FreeBSD -CURRENT
>   Ubuntu 16.04
>   Ubuntu 18.04
>
> despite them being fully supported in the libvirt-jenkins-ci
> repository.
>
> This makes sense for sid and -CURRENT, since the former covers the
> same "freshest Linux packages" angle that Rawhide already takes
> care of and the latter is often broken and not trivial to keep
> updated; both Ubuntu targets, however, should IMHO be part of the
> CentOS CI environment. Hence this series :)
>
> Moreover, we're in the process of adding
>
>   CentOS 8
>   openSUSE Leap 15.1
>   openSUSE Tumbleweed
>
> as targets; the first two of these should IMHO also be added to
> the CentOS CI environment, as they would provide useful additional
> coverage.
>
> The only reason why I'm even questioning whether this should be
> done is capacity on the hypervisor host: the machine we're running
> all builders on has
>
>   CPUs:    8
>   Memory:  32 GiB
>   Storage: 450 GiB
>
> and each of the guests is configured to use
>
>   CPUs:    2
>   Memory:  2 GiB
>   Storage: 20 GiB
>
> So while we're good, and actually have plenty of room to grow, on
> the memory and storage front, we're already overcommitting our
> CPUs pretty significantly (8 guests x 2 vCPUs = 16 vCPUs on 8 host
> CPUs), which I guess is at least part of the reason why builds
> take so long.

NB the memory that's free is not really free - it is being used as
I/O cache for the VM disks, so adding more VMs will shrink that
cache. Whether that will actually impact us, I don't know.

More importantly though, AFAICT those are not 8 real CPUs: virsh
nodeinfo reports 8 cores, but virsh capabilities reports a 1
socket, 4 core, 2 thread CPU. IOW we haven't really got 8 CPUs,
more like the equivalent of 5, since HT only really gives about a
1.3x boost in the best case, and I suspect builds are not likely
to be hitting the best case.
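
For reference, the topology line in the virsh capabilities output
looks something like this (the standard libvirt XML shape, not
verbatim output from our host):

  $ virsh capabilities | grep topology
      <topology sockets='1' cores='4' threads='2'/>

4 cores x ~1.3 for HT is where the "equivalent of 5 CPUs" figure
comes from.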

> Can we afford to add 50% more load on the machine without making
> it unusable? I don't know. But I think it would be worthwhile to
> at least try and see how it handles an additional 25% (two more
> guests on top of the current eight), which is exactly what this
> series does.

Giving it a try is OK, I guess.

I expect there's probably more we can do to optimize the setup
too.

For example, what actual features of qcow2 are we using? We're not
snapshotting VMs, and we don't need grow-on-demand allocation.
AFAICT we're paying the performance cost of qcow2 (L1/L2 table
lookups & metadata caching) for no reason. Switching the VMs to
fully pre-allocated raw files may improve I/O performance. Raw LVM
logical volumes would be even better, but that would be painful to
set up given how the host was installed.
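
As an untested sketch (paths illustrative, not our actual layout),
the per-guest conversion could be:

  # with the guest shut down, convert its disk to a fully
  # pre-allocated raw file
  $ qemu-img convert -p -f qcow2 -O raw -o preallocation=full \
      /var/lib/libvirt/images/guest.qcow2 \
      /var/lib/libvirt/images/guest.raw

followed by a virsh edit to switch the <driver> type from qcow2 to
raw and point the <source> at the new file.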
I also wonder if we have the optimal aio setting for the disks, as
there's nothing about it in the XML.
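
For raw files on a local filesystem the classic pairing would be
something like (a suggestion, not what's currently configured):

  <driver name='qemu' type='raw' cache='none' io='native'/>

since io='native' needs O_DIRECT, i.e. cache='none', to be
effective.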

We could consider using cache=unsafe for the VMs, though for that
I think we'd want to split off a separate disk for /home/jenkins,
so that if there was a host OS crash we wouldn't have to rebuild
the entire VMs - just throw away the data disk and recreate it.
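
A sketch of that two-disk layout (file names and target devices
are illustrative):

  <!-- OS disk: keep a safe cache mode so it survives host crashes -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source file='/var/lib/libvirt/images/guest-os.raw'/>
    <target dev='vda' bus='virtio'/>
  </disk>

  <!-- throwaway /home/jenkins disk: cache=unsafe is acceptable -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='unsafe'/>
    <source file='/var/lib/libvirt/images/guest-data.raw'/>
    <target dev='vdb' bus='virtio'/>
  </disk>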

Since we've got plenty of RAM, another obvious thing would be to
turn on huge pages and use them for all guest RAM. This may well
give a very significant performance boost by reducing CPU
overhead, which is our biggest bottleneck.
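
Again as an untested sketch: reserve the pages on the host, e.g.
24 GiB worth of 2 MiB pages if we end up with twelve 2 GiB guests,

  # as root on the host: 24 GiB / 2 MiB = 12288 pages
  echo 12288 > /proc/sys/vm/nr_hugepages

and then add to each guest's XML:

  <memoryBacking>
    <hugepages/>
  </memoryBacking>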

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|