[libvirt PATCH] ci: Reduce number of stages

Right now we're dividing the jobs into three stages: prebuild, which includes DCO checking as well as building artifacts such as the website, and native_build/cross_build, which do exactly what you'd expect based on their names.

This organization is nice from the logical point of view, but results in poor utilization of the available CI resources: in particular, the fact that cross_build jobs can only start after all native_build jobs have finished means that if even a single one of the latter takes a bit longer the pipeline will stall, and with native builds taking anywhere from less than 10 minutes to more than 20, this happens all the time.

Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes.

The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.

Reducing the number of stages results in significant speedups: specifically, going from three stages to two reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 .gitlab-ci.yml | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 7a8142b506..8d9313e415 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -2,9 +2,8 @@ variables:
   GIT_DEPTH: 100
 
 stages:
-  - prebuild
-  - native_build
-  - cross_build
+  - sanity_checks
+  - builds
 
 .script_variables: &script_variables |
   export MAKEFLAGS="-j$(getconf _NPROCESSORS_ONLN)"
@@ -17,7 +16,7 @@ stages:
 
 # Default native build jobs that are always run
 .native_build_default_job_template: &native_build_default_job_definition
-  stage: native_build
+  stage: builds
   cache:
     paths:
       - ccache/
@@ -42,7 +41,7 @@ stages:
 # system other than Linux. These jobs will only run if the required
 # setup has been performed on the GitLab account (see ci/README.rst).
 .cirrus_build_default_job_template: &cirrus_build_default_job_definition
-  stage: native_build
+  stage: builds
   image: registry.gitlab.com/libvirt/libvirt-ci/cirrus-run:master
   script:
     - cirrus-run ci/cirrus/$NAME.yml.j2
@@ -64,7 +63,7 @@ stages:
 
 # Default cross build jobs that are always run
 .cross_build_default_job_template: &cross_build_default_job_definition
-  stage: cross_build
+  stage: builds
   cache:
     paths:
       - ccache/
@@ -194,7 +193,7 @@ mingw64-fedora-rawhide:
 # be deployed to the web root:
 # https://gitlab.com/libvirt/libvirt/-/jobs/artifacts/master/download?job=webs...
 website:
-  stage: prebuild
+  stage: builds
   before_script:
     - *script_variables
   script:
@@ -216,7 +215,7 @@ website:
 
 codestyle:
-  stage: prebuild
+  stage: builds
   before_script:
     - *script_variables
   script:
@@ -231,7 +230,7 @@ codestyle:
 # for translation usage:
 # https://gitlab.com/libvirt/libvirt/-/jobs/artifacts/master/download?job=potf...
 potfile:
-  stage: prebuild
+  stage: builds
   only:
     - master
   before_script:
@@ -259,7 +258,7 @@ potfile:
 # this test on developer's personal forks from which
 # merge requests are submitted
 check-dco:
-  stage: prebuild
+  stage: sanity_checks
   image: registry.gitlab.com/libvirt/libvirt-ci/check-dco:master
   script:
     - /check-dco
-- 
2.25.4

On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:
[...]
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
On the contrary, I think the DCO check should run after the builds, as running it first usually forces users to add a sign-off just to bypass that check if they want to sanity-check their series. The lack of a sign-off can effectively be used to mark a patch that is not ready to be pushed but still needs a build check. This adds a pointless hurdle to using the CI and also removes one of the meaningful uses of having a sign-off checker.

On Wed, Jun 10, 2020 at 01:51:29PM +0200, Peter Krempa wrote:
On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:
[...]
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
On the contrary, I think the DCO check should run after the builds, as running it first usually forces users to add a sign-off just to bypass that check if they want to sanity-check their series.
Missing signoff is quite common for new contributors, so it was put as the first check so that they get quick notification of this mistake.
The lack of a sign-off can effectively be used to mark a patch that is not ready to be pushed but still needs a build check. This adds a pointless hurdle to using the CI and also removes one of the meaningful uses of having a sign-off checker.
That kind of usage of signoff is not really required in a merge request workflow. You won't typically open the merge request in the first place if the code isn't ready, but if you did, then there's an explicit "WIP" flag for merge requests to achieve this. Once libvirt.git uses merge requests, we will fully block all ability to push directly to git.

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On a Wednesday in 2020, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:51:29PM +0200, Peter Krempa wrote:
On Wed, Jun 10, 2020 at 13:33:01 +0200, Andrea Bolognani wrote:
[...]
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
On the contrary, I think the DCO check should run after the builds, as running it first usually forces users to add a sign-off just to bypass that check if they want to sanity-check their series.
Missing signoff is quite common for new contributors, so it was put as the first check so that they get quick notification of this mistake.
The lack of a sign-off can effectively be used to mark a patch that is not ready to be pushed but still needs a build check. This adds a pointless hurdle to using the CI and also removes one of the meaningful uses of having a sign-off checker.
That kind of usage of signoff is not really required in a merge request workflow. You won't typically open the merge request in the first place if the code isn't ready, but if you did, then there's an explicit "WIP" flag for merge requests to achieve this. Once libvirt.git uses merge requests, we will fully block all ability to push directly to git.
I think we have a long way to go until merge requests are usable without pushing directly to git.

Jano

On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
Right now we're dividing the jobs into three stages: prebuild, which includes DCO checking as well as building artifacts such as the website, and native_build/cross_build, which do exactly what you'd expect based on their names.
This organization is nice from the logical point of view, but results in poor utilization of the available CI resources: in particular, the fact that cross_build jobs can only start after all native_build jobs have finished means that if even a single one of the latter takes a bit longer the pipeline will stall, and with native builds taking anywhere from less than 10 minutes to more than 20, this happens all the time.
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
The advantage of using stages is that it makes it easy to see at a glance where the pipeline was failing.
Reducing the number of stages results in significant speedups: specifically, going from three stages to two stages reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
I don't think this time comparison is showing a genuine difference.

If we look at the original staged pipeline, every single individual job took much longer than every individual job in the simplified pipeline. I think the difference in job times accounts for most (possibly all) of the difference in the pipeline times.

If we look at the history of libvirt pipelines:

  https://gitlab.com/libvirt/libvirt/pipelines

the vast majority of the time we're completing in 30 minutes or less already.

If you want to demonstrate a time improvement from these merged stages, then run 20 pipelines over a couple of days and show that they're consistently better than what we see already, and not just a reflection of the CI infra load at a point in time.

Regards,
Daniel

On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
Right now we're dividing the jobs into three stages: prebuild, which includes DCO checking as well as building artifacts such as the website, and native_build/cross_build, which do exactly what you'd expect based on their names.
This organization is nice from the logical point of view, but results in poor utilization of the available CI resources: in particular, the fact that cross_build jobs can only start after all native_build jobs have finished means that if even a single one of the latter takes a bit longer the pipeline will stall, and with native builds taking anywhere from less than 10 minutes to more than 20, this happens all the time.
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
The advantage of using stages is that it makes it easy to see at a glance where the pipeline was failing.
Reducing the number of stages results in significant speedups: specifically, going from three stages to two stages reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
I don't think this time comparison is showing a genuine difference.
If we look at the original staged pipeline, every single individual job took much longer than every individual job in the simplified pipeline. I think the difference in job times accounts for most (possibly all) of the difference in the pipeline times.
If we look at the history of libvirt pipelines:
https://gitlab.com/libvirt/libvirt/pipelines
the vast majority of the time we're completing in 30 minutes or less already.
If you want to demonstrate a time improvement from these merged stages, then run 20 pipelines over a couple of days and show that they're consistently better than what we see already, and not just a reflection of the CI infra load at a point in time.
Also remember that we're using ccache, so slower builds may just be a reflection of the ccache having a low hit rate - a sequence of repeated builds of the same branch should identify if that's the case.

Regards,
Daniel

On Wed, 2020-06-10 at 13:31 +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
The advantage of using stages is that it makes it easy to see at a glance where the pipeline was failing.
Ultimately you'll need to drill down to the actual failure, though, so the only situation in which it would really provide value is if for some reason *all* cross builds failed at once, which is not something that happens frequently enough to optimize for.
Reducing the number of stages results in significant speedups: specifically, going from three stages to two stages reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
I don't think this time comparison is showing a genuine difference.
If we look at the original staged pipeline, every single individual job took much longer than every individual jobs in the simplified pipeline. I think the difference in job times accounts for most (possibly all) of the difference in the pipelines time.
If we look at the history of libvirt pipelines:
https://gitlab.com/libvirt/libvirt/pipelines
the vast majority of the time we're completing in 30 minutes or less already.
That was before introducing FreeBSD builds, which for whatever reason take a significantly longer time: the last couple of jobs both took 50+ minutes. Installing packages is very inefficient, it would seem.

Either way, even looking at earlier jobs, it seems clear that we leave compute time on the table: for the last 10 pipelines before adding FreeBSD, we have

  Longest job | Shortest job
  ------------|-------------
        21:20 | 12:12
        16:11 | 09:04
        21:31 | 13:40
        16:32 | 08:28
        14:53 | 08:16
        16:01 | 07:59
        16:17 | 08:40
        15:30 | 08:49
        15:12 | 09:11
        16:20 | 08:34

which means the pipeline is stalled for at least 5-8 minutes each time. That's time that we could use to run builds, but we just sit idly and wait instead. The difference becomes even bigger with FreeBSD in the mix.

Even from a more semantic point of view, pipeline stages exist to implement dependencies between jobs: a good example is our container build jobs, which of course need to happen *before* the build job that uses that container can start. There are no dependencies whatsoever between native builds and cross builds.
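As a sanity check on the stall estimate, here is a small sketch (not part of the patch; the durations are simply copied from the table above) that computes the gap between the longest and shortest job in each pipeline, i.e. roughly how long the cross_build stage sits waiting on the slowest native_build job:

```python
# Per-pipeline (longest job, shortest job) durations in MM:SS,
# taken from the table of the last 10 pipelines before FreeBSD.
pipelines = [
    ("21:20", "12:12"), ("16:11", "09:04"), ("21:31", "13:40"),
    ("16:32", "08:28"), ("14:53", "08:16"), ("16:01", "07:59"),
    ("16:17", "08:40"), ("15:30", "08:49"), ("15:12", "09:11"),
    ("16:20", "08:34"),
]

def seconds(mmss):
    """Convert a MM:SS duration string to seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

# Gap between slowest and fastest job in each pipeline.
gaps = [seconds(longest) - seconds(shortest) for longest, shortest in pipelines]

# Smallest and largest gap, in whole minutes.
print(min(gaps) // 60, max(gaps) // 60)  # -> 6 9
```

Every pipeline shows a gap of at least six minutes, consistent with the "at least 5-8 minutes" of idle time claimed above.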
If you want to demonstrate an time improvement from these merged stages, then run 20 pipelines over a cople of days and show that they're consistently better than what we see already, and not just a reflection of the CI infra load at a point in time.
I could do that, sure, it just seems like a waste of shared runner CPU time...
Also remember that we're using ccache, so slower builds may just be a reflection of the ccache having low hit rate - a sequence of repeated builds of the same branch should identify if that's the case.
I've been running builds pretty much non-stop over the past few days, and since the cache is keyed off the job's name there should be no significant skew caused by this.

-- 
Andrea Bolognani / Red Hat / Virtualization

On Wed, Jun 10, 2020 at 05:15:55PM +0200, Andrea Bolognani wrote:
On Wed, 2020-06-10 at 13:31 +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
The advantage of using stages is that it makes it easy to see at a glance where the pipeline was failing.
Ultimately you'll need to drill down to the actual failure, though, so the only situation in which it would really provide value is if for some reason *all* cross builds failed at once, which is not something that happens frequently enough to optimize for.
Reducing the number of stages results in significant speedups: specifically, going from three stages to two stages reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
[2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
I don't think this time comparison is showing a genuine difference.
If we look at the original staged pipeline, every single individual job took much longer than every individual jobs in the simplified pipeline. I think the difference in job times accounts for most (possibly all) of the difference in the pipelines time.
If we look at the history of libvirt pipelines:
https://gitlab.com/libvirt/libvirt/pipelines
the vast majority of the time we're completing in 30 minutes or less already.
That was before introducing FreeBSD builds, which for whatever reason take a significantly longer time: the last couple of jobs both took 50+ minutes. Installing packages is very inefficient, it would seem.
Oh dear, yeah, I missed that it introduced FreeBSD.

Regards,
Daniel

On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
Right now we're dividing the jobs into three stages: prebuild, which includes DCO checking as well as building artifacts such as the website, and native_build/cross_build, which do exactly what you'd expect based on their names.
This organization is nice from the logical point of view, but results in poor utilization of the available CI resources: in particular, the fact that cross_build jobs can only start after all native_build jobs have finished means that if even a single one of the latter takes a bit longer the pipeline will stall, and with native builds taking anywhere from less than 10 minutes to more than 20, this happens all the time.
Building artifacts in a separate pipeline stage also doesn't have any advantages, and only delays further stages by a couple of minutes. The only job that really makes sense in its own stage is the DCO check, because it's extremely fast (less than 1 minute) and, if that fails, we can avoid kicking off all other jobs.
Reducing the number of stages results in significant speedups: specifically, going from three stages to two stages reduces the overall completion time for a full CI pipeline from ~45 minutes[1] to ~30 minutes[2].
[1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893 [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 .gitlab-ci.yml | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
participants (4)
-
Andrea Bolognani
-
Daniel P. Berrangé
-
Ján Tomko
-
Peter Krempa