On Wed, Oct 16, 2019 at 02:22:56PM +0200, Andrea Bolognani wrote:
> As we look to make the libvirt project easier to contribute to, one
> fact that certainly comes to mind is that we only accept patches via
> the mailing list. While the core developers are comfortable with the
> email-based workflow and swear by it, many prospective contributors
> are used to the PR/MR workflow common to many Open Source projects
> these days, and similarly swear by it.
I still think email is a more productive workflow in many respects,
but there's no denying the fact that web based workflows have come
to dominate the open source development world. Projects that are as
large as, or larger than, libvirt are successfully using web based
workflows, so it is no longer credible to claim they won't work for
libvirt.
The challenges are all around the human factors. We have 15 years
of using an email workflow for libvirt, so that's what our current
regular contributors are all used to and have optimized their daily
routines around. Changing people's daily routines is always
disruptive and met with resistance.
If we're to reduce the on-ramp / learning curve and make libvirt
more accessible to new contributors, I think switching libvirt to a
web based tool is both inevitable & desirable.
Beyond that, there are a number of ways it would benefit existing
contributors too. Over the past 5 years there have been many times
when mailman has just given up and stopped sending mails to the
libvirt list. Some of these outages have lasted as long as an
entire week & been quite disruptive to us. Getting anyone to care
enough to fix the outages is challenging, as few other projects are
impacted by the redhat.com mailman to the same degree as libvirt is.
All too often patches from contributors get lost in the torrent.
This can happen in web based tools too; the difference is that
the web tools have much better facilities for identifying these
missed patches & reporting on them, or for helping organize
submissions so that fewer of them slip through.
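As a rough illustration, with the python-gitlab library (the project
path and token below are placeholders, not a statement of where we
would actually host things), spotting submissions that have gone
quiet becomes a trivial query rather than an exercise in inbox
archaeology:

  # Sketch: list open merge requests with no recent activity, so
  # neglected submissions are easy to spot. The project path, token
  # and cutoff are placeholder assumptions.
  import datetime
  import gitlab

  gl = gitlab.Gitlab("https://gitlab.com", private_token="XXXXXXXX")
  project = gl.projects.get("libvirt/libvirt")

  cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=14)
  for mr in project.mergerequests.list(state="opened",
                                       order_by="updated_at",
                                       sort="asc", all=True):
      updated = datetime.datetime.strptime(mr.updated_at,
                                           "%Y-%m-%dT%H:%M:%S.%fZ")
      if updated < cutoff:
          print(f"!{mr.iid}  {mr.title}  (last activity {mr.updated_at})")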
> If we look at the PRs submitted on GitHub against the libvirt mirror
> repository[1], there's just three of them per year on average, which
> is arguably not a lot; however, it should be noted that each
> repository also carries a fairly loud "PULL REQUESTS ARE IGNORED"
> message right at the top, and it's not unlikely that a number of
> prospective contributors simply lost interest after seeing it.
Yep, that's certainly not an encouraging first impression.
> As a way to make contributions easier without forcing core developers
> to give up their current workflow or making the project dependent on
> a third-party provider, I suggest we adopt a hybrid approach.
> First of all, we'd remove the ominous message from GitHub mirror
> repositories (interestingly, the same is not present on GitLab).
That's an accident in GitLab config.
> Then, we'd start accepting PRs/MRs. The way we'd handle them is
> that, when one comes in, one among the handful of core developers who
> volunteered to do so would review the patches on the respective
> platform, working with the submitter to get it into shape just like
> they would do on the mailing list; once the series is finally ready
> to be merged, the core developer would update the PR/MR as necessary,
> for example picking up R-bs or fixing the kind of trivial issues that
> usually don't warrant a repost, and then push to master as usual.
I really don't like this hybrid approach for several reasons.
Splitting attention between the web based tool and the email list,
with only a subset of people volunteering to use the web tool, is
really not attractive. Anyone who only pays attention to one of the
two places is going to miss out on what's being submitted there and
won't see it until it has already been merged.
I think it is a fundamental principle that whatever workflow we use
for patch submission & review *must* be one that is universally
used by everyone.
There are two tools being discussed here - GitHub and GitLab.
Splitting attention between email and a web based tool is bad,
but splitting attention between email and two web based tools
is even worse.
Finally, I have a general desire to *NOT* designate GitHub as an
official part of the libvirt workflow, on the basis that it is
a closed source tool. Yes, we are already using it, but it is
largely an ancillary service acting as a passive backup for our
git repos, not a core part of our workflow. I don't want to
elevate it to be a core part.
> One last important bit. In the spirit of not requiring core
> developers to alter their workflow, the developer who reviewed the
> patches on GitHub/GitLab will also be responsible for forwarding
> them to the mailing list after merging them: this way, even those
> who don't have a GitHub/GitLab account will get a chance to take
> notice of the code changes and weigh in. Unlike what we're used to,
> this feedback will come after the fact, but even if issues are only
> spotted at that point we can either push the relevant fixes as
> follow-up commits or revert the series outright, so I don't feel
> like it would be a massive problem overall.
If the only patches sent to the web tools are trivial patches, that
is a minor concern. For any non-trivial patches though, I think only
seeing them after merge is a disaster. If we're going to officially
use a web based tool we should expect that non-trivial patches
will be sent that way.
I think we have to take a different approach to this problem. As
mentioned at the start, I think for the long term sustainability
of the project, switching to a *single* web based tool for our
patch submissions & review process is inevitable.
Such a switch, however, *must* involve discontinuing use of email
workflow so that we always have all contributors using the same
tools & seeing the same patch submissions.
To me, picking GitLab, either hosted on gitlab.com or self-hosted,
is the only viable option, given that GitHub is closed source.
I'm not seriously suggesting we self-host, as I don't think we
need quite that level of control, nor is it productive for us to
play at being sysadmins. Having the option available though is
good.
In adopting a web based tool we need to think about what this
means for our development practice more generally though, as it
has broader implications.
For example, we currently have a practice of adding Reviewed-by tags
on patches. This is possible, but inconvenient, when using the
web tools. It is arguably redundant, since the tools themselves
record who commented, who approved the patch and who requested
the merge. Dropping use of R-b assumes that we're 100% using the
web tools and not email.
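If we did want to keep R-b tags in the commit history anyway, the
information is at least recoverable from the tool. A minimal sketch
with python-gitlab, assuming the approvals API is available on the
instance we pick, and with a placeholder project path and MR number:

  # Sketch: derive Reviewed-by trailers from a merge request's
  # approvals, so they could be appended to commit messages before
  # pushing. Project path, token and MR number are placeholders.
  import gitlab

  gl = gitlab.Gitlab("https://gitlab.com", private_token="XXXXXXXX")
  project = gl.projects.get("libvirt/libvirt")
  mr = project.mergerequests.get(1234)

  approvals = mr.approvals.get()
  for entry in approvals.approved_by:
      user = entry["user"]
      # The API only exposes the public profile; mapping usernames to
      # the email addresses we'd put in the trailer would need to be
      # maintained by the project separately.
      print(f"Reviewed-by: {user['name']} (@{user['username']})")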
One of the things the web tools do well is fully integrating
CI testing into the merge process. New submissions would typically
get CI jobs run against them and only be approved for merge once
CI has succeeded. This largely eliminates the problem of developers
accidentally pushing changes to master that break the build. Again
though, this benefit is only seen if we discontinue use of the email
workflow. Much of our existing CI setup should be easy to integrate
into GitLab. Our current VMs that are used with Jenkins just need
to have the Jenkins agent replaced with the GitLab runner agent.
We can then drop Jenkins entirely and do everything through GitLab
for CI[1].
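As a sketch of what that gating looks like from the tool's side
(python-gitlab again; project path, token and MR number are
placeholders), GitLab can queue the merge server-side until the
pipeline is green:

  # Sketch: only merge once the MR's pipeline has passed.
  # Project path, token and MR number are placeholder assumptions.
  import gitlab

  gl = gitlab.Gitlab("https://gitlab.com", private_token="XXXXXXXX")
  project = gl.projects.get("libvirt/libvirt")
  mr = project.mergerequests.get(1234)

  pipeline = mr.head_pipeline              # latest pipeline for the MR
  if pipeline and pipeline["status"] == "success":
      mr.merge()                           # CI already green, merge now
  else:
      # Queue the merge; GitLab completes it when the pipeline passes.
      mr.merge(merge_when_pipeline_succeeds=True)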
Although I'm in favour of using the web based tools, I personally
don't want to use a web UI for code review on a daily basis. My
personal goal is to create a console based UI for code review with
GitLab, so that my workflow would be largely unchanged from how
it works with email in terms of user interface.
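Everything needed for that is exposed through the API, so a console
review client is feasible. For instance, a rough sketch (python-gitlab,
with a placeholder project path and MR number) that pulls an MR's
diffs and review discussion into the terminal:

  # Sketch: dump a merge request's per-file diffs and its review
  # discussion to the terminal, as a starting point for a console
  # review tool. Project path, token and MR number are placeholders.
  import gitlab

  gl = gitlab.Gitlab("https://gitlab.com", private_token="XXXXXXXX")
  project = gl.projects.get("libvirt/libvirt")
  mr = project.mergerequests.get(1234)

  print(f"!{mr.iid}: {mr.title} by {mr.author['name']}\n")

  for change in mr.changes()["changes"]:   # per-file unified diffs
      print(f"--- {change['old_path']}\n+++ {change['new_path']}")
      print(change["diff"])

  for discussion in mr.discussions.list(all=True):
      for note in discussion.attributes["notes"]:
          print(f"{note['author']['name']}: {note['body']}\n")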
There may be other things that are important to our frequent
contributors that we must consider before making a switch to web
based tools, in order to reduce the disruption.
Ultimately though I think it is a mistake to try to find a way
to 100% mirror everyone's existing email workflow with the web
based tools. Setting such a goal is a way to ensure we would
never switch away from email.
With such a different paradigm between email & web based tools
there is a period of guaranteed disruption. We have to accept that
some things may be worse, but this will be counterbalanced by
other things being better. Some of the things which are
worse (web based pointy clicky UI) can be mitigated. People
are adaptable & will find ways to get what they need done.
Picking an open source solution gives us the option to make
contributions to improve it where it is lacking too.
To deal with inevitable disruption, I think we should switch
our repositories in two stages.
First switch all the add-on repos over to GitLab. This means
the language bindings, the wiki/web repos, the CI repos,
etc. These are all reasonably low traffic in terms of patch
submission. Switching them gives us time to learn to deal
with the web based tools, define our process, figure out the
CI integration, identify any pain points we might want to
mitigate, etc.
Then, perhaps 6 months later, switch the main libvirt.git
high traffic repo.
To reduce confusion, I might go as far as saying we should
delete all our GitHub repositories. Keep the libvirt project
namespace, but have it carry a message referring people to GitLab.
There is a slight complication here wrt the Go language binding,
as the hosting service is part of the code namespace for imports.
What we should have done is declare the namespace for the Go
imports to be under libvirt.org. We can still in fact do that,
and submit MRs to projects to get them to switch their codebases,
so that we don't end up with gitlab.com in the namespace instead,
repeating the same mistake.
As for the git hosting on libvirt.org, we should reverse the
mirroring process, e.g. GitLab becomes the master and libvirt.org
becomes the backup mirror.
Regards,
Daniel
[1] We still have the macOS problem which we need Travis for. Sooner
or later we need to either get a supportable macOS box to act as a
runner, or drop macOS. I'm loath to do the latter though, since we
do have people who appear to be using macOS.
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|