[libvirt] [jenkins-ci PATCH 0/1] jenkins: Start building on Ubuntu

This patch is intended to start a slightly larger discussion about our plans for the CentOS CI environment going forward.

At the moment, we have active builders for

  CentOS 7
  Debian 9
  Debian 10
  Fedora 30
  Fedora 31
  Fedora Rawhide
  FreeBSD 11
  FreeBSD 12

but we don't have builders for

  Debian sid
  FreeBSD -CURRENT
  Ubuntu 16.04
  Ubuntu 18.04

despite them being fully supported in the libvirt-jenkins-ci repository.

This makes sense for sid and -CURRENT, since the former covers the same "freshest Linux packages" angle that Rawhide already takes care of and the latter is often broken and not trivial to keep updated; both Ubuntu targets, however, should IMHO be part of the CentOS CI environment. Hence this series :)

Moreover, we're in the process of adding

  CentOS 8
  openSUSE Leap 15.1
  openSUSE Tumbleweed

as targets, of which the first two should also IMHO be added as they would provide useful additional coverage.

The only reason why I'm even questioning whether this should be done is capacity on the hypervisor host: the machine we're running all builders on has

  CPUs:    8
  Memory:  32 GiB
  Storage: 450 GiB

and each of the guests is configured to use

  CPUs:    2
  Memory:  2 GiB
  Storage: 20 GiB

So while we're good, and actually have plenty of room to grow, on the memory and storage front, we're already overcommitting our CPUs pretty significantly, which I guess is at least part of the reason why builds take so long.

Can we afford to add 50% more load on the machine without making it unusable? I don't know. But I think it would be worthwhile to at least try and see how it handles an additional 25%, which is exactly what this series does.

In my opinion, as long as the machine can keep up with demand and doesn't end up in a situation where it starts accumulating a backlog of jobs, it's fine if builds take longer: developers who want to run a relatively quick smoke test before posting patches can use Travis CI or 'make ci-check@...' locally for the purpose, whereas the role of CentOS CI in my eyes is to try and catch as many issues as possible after merge so that they don't end up in a release.

But I realize others might see it differently, hence this lengthy cover letter :)

Andrea Bolognani (1):
  jenkins: Start building on Ubuntu

 jenkins/jobs/defaults.yaml            | 2 ++
 jenkins/projects/libvirt-dbus.yaml    | 1 +
 jenkins/projects/libvirt-sandbox.yaml | 2 ++
 jenkins/projects/libvirt-tck.yaml     | 2 ++
 jenkins/projects/libvirt.yaml         | 2 ++
 jenkins/projects/virt-manager.yaml    | 2 ++
 6 files changed, 11 insertions(+)

-- 
2.23.0
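A back-of-the-envelope view of the resulting vCPU overcommit, assuming one 2-vCPU guest per builder, for anyone wondering where the 25% and 50% figures come from:

    8 existing builders x 2 vCPUs  = 16 vCPUs on 8 host CPUs  (2x)
  + 2 Ubuntu builders              = 20 vCPUs                 (2.5x)
  + CentOS 8 + openSUSE Leap 15.1  = 24 vCPUs                 (3x)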

We've supported both Ubuntu 16.04 and Ubuntu 18.04 as build targets for a long time now, and we are even building on the latter on Travis CI, but we've never actually deployed the corresponding VMs in the CentOS CI environment. Start doing that now.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 jenkins/jobs/defaults.yaml            | 2 ++
 jenkins/projects/libvirt-dbus.yaml    | 1 +
 jenkins/projects/libvirt-sandbox.yaml | 2 ++
 jenkins/projects/libvirt-tck.yaml     | 2 ++
 jenkins/projects/libvirt.yaml         | 2 ++
 jenkins/projects/virt-manager.yaml    | 2 ++
 6 files changed, 11 insertions(+)

diff --git a/jenkins/jobs/defaults.yaml b/jenkins/jobs/defaults.yaml
index 676ecbf..18bbade 100644
--- a/jenkins/jobs/defaults.yaml
+++ b/jenkins/jobs/defaults.yaml
@@ -11,6 +11,8 @@
       - libvirt-fedora-rawhide
       - libvirt-freebsd-11
       - libvirt-freebsd-12
+      - libvirt-ubuntu-1604
+      - libvirt-ubuntu-1804
     rpm_machines:
       - libvirt-centos-7
       - libvirt-fedora-30
diff --git a/jenkins/projects/libvirt-dbus.yaml b/jenkins/projects/libvirt-dbus.yaml
index dfc159c..db9d64b 100644
--- a/jenkins/projects/libvirt-dbus.yaml
+++ b/jenkins/projects/libvirt-dbus.yaml
@@ -20,6 +20,7 @@
           - libvirt-fedora-30
           - libvirt-fedora-31
           - libvirt-fedora-rawhide
+          - libvirt-ubuntu-1804
     - meson-rpm-job:
         parent_jobs: 'libvirt-dbus-check'
         # RPM build is still not possible on CentOS7 as it does not
diff --git a/jenkins/projects/libvirt-sandbox.yaml b/jenkins/projects/libvirt-sandbox.yaml
index 6a8acec..00ecb98 100644
--- a/jenkins/projects/libvirt-sandbox.yaml
+++ b/jenkins/projects/libvirt-sandbox.yaml
@@ -10,6 +10,8 @@
       - libvirt-fedora-30
       - libvirt-fedora-31
       - libvirt-fedora-rawhide
+      - libvirt-ubuntu-1604
+      - libvirt-ubuntu-1804
     title: Libvirt Sandbox
     archive_format: gz
     git_url: '{git_urls[libvirt-sandbox][default]}'
diff --git a/jenkins/projects/libvirt-tck.yaml b/jenkins/projects/libvirt-tck.yaml
index fcdea98..4e129ee 100644
--- a/jenkins/projects/libvirt-tck.yaml
+++ b/jenkins/projects/libvirt-tck.yaml
@@ -11,6 +11,8 @@
       - libvirt-fedora-rawhide
       - libvirt-freebsd-11
       - libvirt-freebsd-12
+      - libvirt-ubuntu-1604
+      - libvirt-ubuntu-1804
     title: Libvirt TCK
     archive_format: gz
     git_url: '{git_urls[libvirt-tck][default]}'
diff --git a/jenkins/projects/libvirt.yaml b/jenkins/projects/libvirt.yaml
index fdc24bc..9c5922f 100644
--- a/jenkins/projects/libvirt.yaml
+++ b/jenkins/projects/libvirt.yaml
@@ -19,6 +19,8 @@
           - libvirt-fedora-30
           - libvirt-fedora-31
           - libvirt-fedora-rawhide
+          - libvirt-ubuntu-1604
+          - libvirt-ubuntu-1804
     - autotools-check-job:
         parent_jobs: 'libvirt-syntax-check'
         local_env: |
diff --git a/jenkins/projects/virt-manager.yaml b/jenkins/projects/virt-manager.yaml
index 3dc8e2e..81bee13 100644
--- a/jenkins/projects/virt-manager.yaml
+++ b/jenkins/projects/virt-manager.yaml
@@ -12,6 +12,7 @@
       - libvirt-fedora-rawhide
       - libvirt-freebsd-11
       - libvirt-freebsd-12
+      - libvirt-ubuntu-1804
     title: Virtual Machine Manager
     archive_format: gz
     git_url: '{git_urls[virt-manager][default]}'
@@ -33,6 +34,7 @@
           - libvirt-fedora-30
           - libvirt-fedora-31
           - libvirt-fedora-rawhide
+          - libvirt-ubuntu-1804
     - python-distutils-rpm-job:
         parent_jobs: 'virt-manager-check'
         machines:
-- 
2.23.0
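As a side note, a change like this can be sanity-checked locally before pushing, assuming the YAML is consumed through a standard jenkins-job-builder workflow (an assumption - the repository may invoke JJB differently); 'jenkins-jobs test' renders the job XML without contacting any Jenkins server:

  $ jenkins-jobs test jenkins/jobs:jenkins/projects -o /tmp/jjb-output
  $ ls /tmp/jjb-output    # one XML file per generated job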

On Tue, Dec 10, 2019 at 02:54:23PM +0100, Andrea Bolognani wrote:
We've supported both Ubuntu 16.04 and Ubuntu 18.04 as build targets for a long time now, and we are even building on the latter on Travis CI, but we've never actually deployed the corresponding VMs in the CentOS CI environment. Start doing that now.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 jenkins/jobs/defaults.yaml            | 2 ++
 jenkins/projects/libvirt-dbus.yaml    | 1 +
 jenkins/projects/libvirt-sandbox.yaml | 2 ++
 jenkins/projects/libvirt-tck.yaml     | 2 ++
 jenkins/projects/libvirt.yaml         | 2 ++
 jenkins/projects/virt-manager.yaml    | 2 ++
 6 files changed, 11 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Dec 10, 2019 at 02:54:22PM +0100, Andrea Bolognani wrote:
This patch is intended to start a slightly larger discussion about our plans for the CentOS CI environment going forward.
At the moment, we have active builders for
  CentOS 7
  Debian 9
  Debian 10
  Fedora 30
  Fedora 31
  Fedora Rawhide
  FreeBSD 11
  FreeBSD 12
but we don't have builders for
  Debian sid
  FreeBSD -CURRENT
  Ubuntu 16.04
  Ubuntu 18.04
despite them being fully supported in the libvirt-jenkins-ci repository.
This makes sense for sid and -CURRENT, since the former covers the same "freshest Linux packages" angle that Rawhide already takes care of and the latter is often broken and not trivial to keep updated; both Ubuntu targets, however, should IMHO be part of the CentOS CI environment. Hence this series :)
Moreover, we're in the process of adding
  CentOS 8
  openSUSE Leap 15.1
  openSUSE Tumbleweed
as targets, of which the first two should also IMHO be added as they would provide useful additional coverage.
The only reason why I'm even questioning whether this should be done is capacity on the hypervisor host: the machine we're running all builders on has
  CPUs:    8
  Memory:  32 GiB
  Storage: 450 GiB
and each of the guests is configured to use
  CPUs:    2
  Memory:  2 GiB
  Storage: 20 GiB
So while we're good, and actually have plenty of room to grow, on the memory and storage front, we're already overcommitting our CPUs pretty significantly, which I guess is at least part of the reason why builds take so long.
NB the memory that's free is not really free - it is being used as I/O cache for the VM disks. So more VMs will reduce the I/O cache. Whether that will actually impact us I don't know though.

More importantly though, AFAICT, those are not 8 real CPUs. virsh nodeinfo reports 8 cores, but virsh capabilities reports it as a 1 socket, 4 core, 2 thread CPU. IOW we haven't really got 8 CPUs, more like the equivalent of 5, as HT only really gives a ~1.3x boost in the best case, and I suspect builds are not likely to be hitting the best case.
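For reference, this is the kind of check being described; the output shown is what a 4-core/2-thread host would report, and the topology element is standard in the capabilities XML:

  $ virsh nodeinfo
  $ virsh capabilities | xmllint --xpath '//host/cpu/topology' -
  <topology sockets='1' cores='4' threads='2'/>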
Can we afford to add 50% more load on the machine without making it unusable? I don't know. But I think it would be worthwhile to at least try and see how it handles an additional 25%, which is exactly what this series does.
Giving it a try is ok I guess.

I expect there's probably more we can do to optimize the setup too.

For example, what actual features of qcow2 are we using? We're not snapshotting VMs, and we don't need grow-on-demand allocation. AFAICT we're paying the performance cost of qcow2 (l1/l2 table lookups & metadata caching) for no reason. Switching the VMs to fully pre-allocated raw files may improve I/O performance. Raw LVM volumes would be even better, but that will be painful to set up given the host install setup.

I also wonder if we have the optimal aio setting for disks, as there's nothing in the XML.

We could consider using cache=unsafe for the VMs, though for that I think we'd want to split off a separate disk for /home/jenkins, so that if there was a host OS crash we wouldn't have to rebuild the entire VMs - just throw away the data disk & recreate it.

Since we've got plenty of RAM, another obvious thing would be to turn on huge pages and use them for all guest RAM. This may well have a very significant performance boost from reducing CPU overhead, which is our biggest bottleneck.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
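To sketch what the disk-related suggestions above might look like in practice (image paths and names are illustrative, not the actual deployment's):

  # fully pre-allocate a raw copy of an existing qcow2 image
  $ qemu-img convert -f qcow2 -O raw -S 0 \
      /var/lib/libvirt/images/guest.qcow2 \
      /var/lib/libvirt/images/guest.raw

  # then adjust the domain via 'virsh edit'; note that io='native'
  # requires direct I/O (cache='none'), whereas cache='unsafe'
  # would pair with io='threads' instead
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source file='/var/lib/libvirt/images/guest.raw'/>
    <target dev='vda' bus='virtio'/>
  </disk>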

On Tue, 2019-12-10 at 14:54 +0000, Daniel P. Berrangé wrote:
On Tue, Dec 10, 2019 at 02:54:22PM +0100, Andrea Bolognani wrote:
The only reason why I'm even questioning whether this should be done is capacity on the hypervisor host: the machine we're running all builders on has
  CPUs:    8
  Memory:  32 GiB
  Storage: 450 GiB
and each of the guests is configured to use
  CPUs:    2
  Memory:  2 GiB
  Storage: 20 GiB
So while we're good, and actually have plenty of room to grow, on the memory and storage front, we're already overcommitting our CPUs pretty significantly, which I guess is at least part of the reason why builds take so long.
NB the memory that's free is not really free - it is being used as I/O cache for the VM disks. So more VMs will reduce the I/O cache. Whether that will actually impact us I don't know though.
Oh yeah, I'm aware of that. But like you, I'm unsure about the exact impact it makes. We could arguably reduce the amount of memory assigned to guests to 1 GiB: building libvirt and friends is not exactly memory-intensive, and we could potentially benefit from the additional I/O cache or lessen the blow caused by adding more VMs.
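Shrinking an existing guest would be something along these lines (guest name illustrative; with --config the change only takes effect on the next boot):

  $ virsh setmem libvirt-fedora-31 1G --config
  $ virsh setmaxmem libvirt-fedora-31 1G --config
  $ virsh shutdown libvirt-fedora-31
  $ virsh start libvirt-fedora-31    # once it has powered off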
More importantly though, AFAICT, those are not 8 real CPUs.
virsh nodeinfo reports 8 cores, but virsh capabilities reports it as a 1 socket, 4 core, 2 thread CPU.
IOW we haven't really got 8 CPUs, more like the equivalent of 5, as HT only really gives a ~1.3x boost in the best case, and I suspect builds are not likely to be hitting the best case.
That's also true.
Can we afford to add 50% more load on the machine without making it unusable? I don't know. But I think it would be worthwhile to at least try and see how it handles an additional 25%, which is exactly what this series does.
Giving it a try is ok I guess.
Can I have an R-b then? O:-)
I expect there's probably more we can do to optimize the setup too.
For example, what actual features of qcow2 are we using? We're not snapshotting VMs, and we don't need grow-on-demand allocation. AFAICT we're paying the performance cost of qcow2 (l1/l2 table lookups & metadata caching) for no reason. Switching the VMs to fully pre-allocated raw files may improve I/O performance. Raw LVM volumes would be even better, but that will be painful to set up given the host install setup.
I also wonder if we have the optimal aio setting for disks, as there's nothing in the XML.
We could consider using cache=unsafe for the VMs, though for that I think we'd want to split off a separate disk for /home/jenkins, so that if there was a host OS crash we wouldn't have to rebuild the entire VMs - just throw away the data disk & recreate it.
The home directory for the jenkins user contains some configuration as well, so you'd have to run 'lcitool update' anyway after attaching the new disk... At that point, it might be less work overall to just rebuild the entire VM from scratch. We have only seen a couple of unexpected host shutdowns so far, and all were caused by hardware issues that resulted in multi-day downtime anyway, so the overhead of reinstalling the various guest OSes would not be a dealbreaker.
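For completeness, attaching such a data disk would look roughly like this (path, size and guest name are made up):

  $ qemu-img create -f raw /var/lib/libvirt/images/guest-home.raw 10G
  $ virsh attach-disk guest /var/lib/libvirt/images/guest-home.raw vdb \
      --targetbus virtio --persistent
  # then, inside the guest: create a filesystem and mount it on /home/jenkins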
Since we've got plenty of RAM, another obvious thing would be to turn on huge pages and use them for all guest RAM. This may well have a very significant performance boost from reducing CPU overhead, which is our biggest bottleneck.
Yeah, all of the above sounds like it could help, but I'm not really well versed on the performance tuning front, so I wouldn't know where to start or even how to properly measure the impact of each change.

We also have to keep in mind that, while the CentOS CI environment is the main consumer of the repository, we want it to be possible for any developer to deploy builders locally relatively painlessly, so things like hugepages and LVM usage would definitely have to be optional if we introduced them.

-- 
Andrea Bolognani / Red Hat / Virtualization
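As a pointer for the hugepages experiment mentioned above, the moving parts would be roughly the following (the page count is illustrative: 12288 x 2 MiB pages = 24 GiB of guest RAM):

  # on the host, reserve 2 MiB hugepages
  $ echo 12288 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  # in each guest's XML, back all of its RAM with hugepages
  <memoryBacking>
    <hugepages/>
  </memoryBacking>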