Re: [libvirt PATCH 00/20] ci: Move GitLab build recipes to a standalone script

7 Feb 2023

      On Mon, Feb 06, 2023 at 05:19:52PM +0100, Erik Skultety wrote:
...
On Mon, Feb 06, 2023 at 03:45:12PM +0000, Daniel P. Berrangé wrote:
...
On Mon, Feb 06, 2023 at 02:52:57PM +0100, Erik Skultety wrote:
...
This is a follow up to:
https://listman.redhat.com/archives/libvir-list/2023-January/237201.html
The effort here is to unify the way builds/tests are executed in GitLab CI vs
local container executions and make another step forward in terms of
reproducibility of (specifically) GitLab environments.
Even though code to run all but one (coverity) jobs from GitLab via the
build.sh script is added with this series, local behavior remains the same as
before this series. The reason for that is that that will require more patches
ridding of the Makefile which is currently used and instead integrate usage of
lcitool with the ci/helper Python script which is currently the entry point for
local container executions.
snip
...
.gitlab-ci.yml |  56 ++++++++++-------------
 ci/Makefile    |  16 ++++---
 ci/build.sh    | 121 +++++++++++++++++++++++++++++++++++++++++++------
 ci/helper      |  21 ++++++---
 4 files changed, 155 insertions(+), 59 deletions(-)
 mode change 100644 => 100755 ci/build.sh
I'm really super unenthusiastic about this change, and also the similar
change to add an ci/integration.sh. Shell scripting is something we
worked hard to eliminate from libvirt. It is an awful language
to do anything non-trivial with, error handling, lack of data
structures, variable handling, portability are all bug generators.
I know the .gitlab-ci.yml  'script' commands are technically shell
script, but they are pretty trivial bits and don't have to worry
about portability for dash vs bash etc, or complex control logic.
The majority of it is just a simple list of commands to invoke,
with the occasional conditional.
The build.sh script is by contrast significantly more complex. By
the very nature of taking "N" separate gitlab jobs and multiplexing
them onto the one shell script, we're now having todo command line
arg parsing in shell and de-multiplexing back to commands. The CI
logic is now split up across even more sources - the gitlab config,
the build.sh script and the meson.build files, which I think is
worse for maintaining this too.
True, except that even despite that, and I know what I'm talking about since I
wrote the integration jobs templates, GitLab is super picky; you can't debug
the syntax problems properly; certain common syntactic sugars don't work and
have to be replaced by less common counterparts; using YAML for bash formatting
leads to many table-flipping moments; you have to wait for Nx10 minutes before
it fails with some ridiculous error you would have caught early locally but you
couldn't because of the stage and artifact dependencies. So, once you have it
finally in place it's unarguably nice, but the road towards that isn't really
rainbow and unicorns either, it's still shell after all. Since I'm mainly
working on this part of libvirt, I myself would appreciate if I could easily
just run a single script instead of copy pasting commands one-by-one to a local
VM to catch stupid errors quickly rather than wait for GitLab to fail.
...
I appreciate the goal of being able to run CI commands locally, but
I'm not feeling this approach is a net win. I'd much rather just
having developers invoke the meson command directly, which is not
that hard.
Well, the big picture here is about making running integration tests locally
less painful than it is now...plus increase the amount of automation involved.
That's the major driver here - libvirt needs to be built locally in a safe
environment before integration tests could be run against it without touching
the host. So, with this script, while I agree with everything you said about
Bash, was just one step towards getting the automation I mentioned in place. I
also considered Python (yeah..why), but that would be super akward.
TL;DR: if we don't do anything about how we currently maintain the build
recipes (note we're maintaining 2 already), everybody will continue ignoring the
integration test suite, unless we enable merge requests where the status quo
would be somewhat (but not completely) eliminated. With integration tests you
can't ignore the aspect of getting feedback early compared to waiting for
GitLab CI.
...
If we do really want to provide something that 100% mirrors the
gitlab CI job commands though, I'm even more convinced now that
we should be using the .gitlab-ci.yml as the canonical source.
Since I last mentioned that idea, I found out that this should
be something we can achieve via the gitlab runner 'exec' command.
Haven't heard of ^this one, but I wonder how we can get something meaningful
out of it for the purposes I mentioned above.
Erik
So, I had a brief look at gihttps://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797tlab-runner exec this morning. Long story short,
it's deprecated [1] and though it was scheduled for complete removal, that plan
was put on hold for now. Nevertheless, it's not getting any features that are
introduced to .gitlab-ci.yml. The gist is that it doesn't understand 'include'
nor 'extends' which we're using heavily across our gitlab configuration
hierarchy, so it's a no-go. It also doesn't support artifacts in any form, so
while it technically should be possible to save RPM builds in a volume (not
sure if the bind mount is cleared on job completion or not) we could not pass
them into a VM running the integration tests the same convenient way as we do
in the CI.

Another thing to consider is that this solution also requires to have the
gitlab-runner binary installed locally, which many in the community may not be
comfortable with, but that's up for a debate, but hey, you can always build it
yourself.

There's a plan to rework GitLab's local CI execution functionality heavily [2]
before the 'exec' command is sunset, but ultimately their plan is have
something working by Q3 FY24, which can be anywhere from March 2024 until
September 2024, that's more than a year ahead.

In your reply you mentioned the unappealing complexity of the script
potentially leading to bugs. At the same time though one currently can't consume
.gitlab-ci.yml recipes to run a local integration test in a VM.
So, how about I get rid of the multiplexing and CLI parsing by placing each job
recipe in a standalone script or even going one step further and extract the
commonalities to a separate file that would be source from within each job
script? Would you consider that meeting each other halfway?
As for POSIX compliance, I guess this would be a soft-requirement based on
whether shellcheck was run during review, does gitlab do something better? IIUC
there's no pre-check and you'll only know after a job was executed that there
was an incompatible statement.

[1] https://docs.gitlab.com/runner/commands/#gitlab-runner-exec-deprecated
[2] https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797

Regards,
Erik