On Wed, Jun 10, 2020 at 01:14:51PM +0100, Daniel P. Berrangé wrote:
On Wed, Jun 10, 2020 at 01:33:01PM +0200, Andrea Bolognani wrote:
> Right now we're dividing the jobs into three stages: prebuild, which
> includes DCO checking as well as building artifacts such as the
> website, and native_build/cross_build, which do exactly what you'd
> expect based on their names.
>
> This organization is nice from the logical point of view, but results
> in poor utilization of the available CI resources: in particular, the
> fact that cross_build jobs can only start after all native_build jobs
> have finished means that if even a single one of the latter takes a
> bit longer the pipeline will stall, and with native builds taking
> anywhere from less than 10 minutes to more than 20, this happens all
> the time.
>
> Building artifacts in a separate pipeline stage also doesn't have any
> advantages, and only delays further stages by a couple of minutes.
> The only job that really makes sense in its own stage is the DCO
> check, because it's extremely fast (less than 1 minute) and, if that
> fails, we can avoid kicking off all other jobs.
The advantage of using stages is that it makes it easy to see at a
glance where the pipeline was failing.
>
> Reducing the number of stages results in significant speedups:
> specifically, going from three stages to two stages reduces the
> overall completion time for a full CI pipeline from ~45 minutes[1]
> to ~30 minutes[2].
>
> [1] https://gitlab.com/abologna/libvirt/-/pipelines/154751893
> [2] https://gitlab.com/abologna/libvirt/-/pipelines/154771173
I don't think this time comparison is showing a genuine difference.
If we look at the original staged pipeline, every single individual
job took much longer than the individual jobs in the simplified
pipeline. I think the difference in job times accounts for most
(possibly all) of the difference in the pipeline times.
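
To illustrate why per-job times confound the comparison, here is a rough
sketch of a pipeline model. It assumes every job in a stage runs in
parallel with no runner limit and no queueing delay, which is not how
the shared CI runners actually behave, and all the durations below are
made-up numbers, not measurements from either pipeline.

```python
def pipeline_minutes(stages):
    """Estimate wall-clock time for a pipeline.

    `stages` is a list of stages, each a list of job durations in
    minutes. Assumption: a stage ends when its slowest job ends, and
    the next stage only starts after that.
    """
    return sum(max(jobs) for jobs in stages)


# Hypothetical durations, for illustration only.
# Three stages: prebuild, native_build, cross_build.
staged_slow_jobs = [[1, 3], [15, 30], [18, 22]]
staged_fast_jobs = [[1, 3], [10, 20], [12, 15]]
# Two stages: DCO check, then everything else.
merged_fast_jobs = [[1], [3, 10, 20, 12, 15]]

print(pipeline_minutes(staged_slow_jobs))  # 55
print(pipeline_minutes(staged_fast_jobs))  # 38
print(pipeline_minutes(merged_fast_jobs))  # 21
```

In this toy model the same staged layout differs by 17 minutes purely
because the jobs ran slower, so a single staged-vs-merged comparison
cannot separate the two effects unless job times are comparable.
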
If we look at the history of libvirt pipelines:
https://gitlab.com/libvirt/libvirt/pipelines
the vast majority of the time we're completing in 30 minutes or
less already.
If you want to demonstrate a time improvement from these merged
stages, then run 20 pipelines over a couple of days and show
that they're consistently better than what we see already, and
not just a reflection of the CI infra load at a point in time.
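
One possible way to summarize such a comparison is sketched below. The
function names and the sample durations are hypothetical; the real
numbers would have to be collected from the pipeline history, and
"consistently better" is interpreted here simply as "most runs beat
the baseline median", which is only one of several reasonable
definitions.

```python
import statistics


def summarize(durations_min):
    """Return (median, mean) of a list of pipeline durations in minutes."""
    return statistics.median(durations_min), statistics.mean(durations_min)


def fraction_faster(baseline, candidate):
    """Fraction of candidate runs that finished faster than the
    median baseline run."""
    base_median = statistics.median(baseline)
    return sum(d < base_median for d in candidate) / len(candidate)


# Placeholder durations (minutes), NOT real measurements.
baseline_runs = [30, 28, 32, 45, 29]    # existing three-stage layout
candidate_runs = [27, 26, 31, 25, 28]   # merged two-stage layout

print(summarize(baseline_runs))
print(summarize(candidate_runs))
print(fraction_faster(baseline_runs, candidate_runs))
```

With around 20 runs per layout spread over a couple of days, a result
where nearly all merged-stage runs beat the baseline median would be
much harder to attribute to transient CI load.
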
Also remember that we're using ccache, so slower builds may just be a
reflection of ccache having a low hit rate - a sequence of repeated
builds of the same branch should identify whether that's the case.
Regards,
Daniel