[PATCH 0/3] Fix use-after-free in block job reconnection code
by Peter Krempa
Peter Krempa (3):
qemu: migration: Don't use return value of qemuBlockJobUpdate
qemuBlockJobUpdate: Remove return value
qemuBlockJobRefreshJobs: Warn readers that 'job' may be invalid after
update
src/qemu/qemu_blockjob.c | 9 +++------
src/qemu/qemu_blockjob.h | 7 ++++---
src/qemu/qemu_migration.c | 17 +++++++----------
3 files changed, 14 insertions(+), 19 deletions(-)
--
2.24.1
4 years, 8 months
Re: [PATCH 5/5] cpu: Introduce getHost supoort for ARM
by Daniel P. Berrangé
Re-adding libvir-list to the CC line.
On Mon, Mar 30, 2020 at 08:20:44PM +0800, Zhenyu Zheng wrote:
> Hi, yes, I think we can do that using inline assembly, I can check it out
> if you think it is a better solution,
> do you have any suggestions for the features(cpu flags) part? It seems that
> ARM does not have a location/register
> that holds all the flags, seems that we have to query alot of different
> registers to check for features, which could
> be quite messy.
Perhaps there is a way to record the location/register info in the XML
against each feature name, so that the code itself can stay simple and
just be driven from the metadata ?
I'm not familiar enough with Arm to know how feasiable this is though,
so will have to leave that to others to give an opinion.
>
> On Mon, Mar 30, 2020 at 8:01 PM Daniel P. Berrangé <berrange(a)redhat.com>
> wrote:
>
> > On Mon, Mar 30, 2020 at 07:32:36PM +0800, Zhenyu Zheng wrote:
> > > Hi Daniel,
> > >
> > > Thanks for thre review and reply, my first implementation was going to
> > > gather data from /proc/cpuinfo, but unlike X86, we can only get this kind
> > > of info:
> > >
> > > processor : 0
> > > BogoMIPS : 200.00
> > > Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> > > CPU implementer : 0x43
> > > CPU architecture: 8
> > > CPU variant : 0x1
> > > CPU part : 0x0a1
> > > CPU revision : 1
> > >
> > > so we have to perform some translation to perform human readable
> > > information, and I mentioned that 'lscpu' has done that too. So Andrea
> > > Bolognani
> > > suggested that maybe we can use it directly, to avoid re-implement the
> > > translation. Here is the discussion:
> > > https://www.redhat.com/archives/libvir-list/2020-March/msg00812.html
> >
> > On x86 we get majority of info straight from calling the CPUID instruction,
> > not /proc/cpuinfo, and use our XML data files in src/cpu_map to translate
> > things into human readable names. I see you're adding XML data files
> > with names in the earlier patches. Is it possible to add the hex values
> > for the CPU implementer/architecture/variant/part to these XML files so
> > we can directly map them in libvirt, in the same way we do for x86
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
4 years, 8 months
[GSoC][RPC] Project Proposal: "Introducing Job control to the storage driver"
by Prathamesh Chavan
GSoC Proposal - 2020
====================
Project Name: Introducing Job control to the storage driver
===========================================================
About Me:
=========
Name: Prathamesh Chavan
University: Indian Institute of Technology Kharagpur
Major: Computer Science and Engineering(Dual Degree)
Email: pc44800(a)gmail.com
Blog: pratham-pc.github.io
Contact: +91-993-235-8333
Time Zone: IST (UTC +5:30)
Background:
===========
I am a final year dual degree (B.Tech & M.Tech) student from the Department
of Computer Science and Engineering at IIT Kharagpur. During my first year
of college, I got introduced to open source through Kharagpur Open Source
Society and later I became part of it.
As my master's thesis project, I'm working on a tiered file system with
Software Wear Management for NVM Technologies. I always wanted to get
involved in storage technologies and the development process and Google
Summer of Code is a great way to achieve it.
I've been part of Google Summer of code in my second year of college under
the Git Organization. It was my first experience with a large codebase.
Information related to it is available in GSoC - 2017's archive[1].
Last year summers, I also interned at Nutanix. I worked on Logbay, which is
a configuration-based data collection, archiving and streaming tool used by
all services on Nutanix Controller-VM (CVM). I added multi-platform support
to Logbay by identifying all the dependencies of Logbay on the other CVM
based services and introduced an interface between these dependencies and
Logbay-Core for allowing it to be used on different platforms where they
aren’t available. I also implemented the interface on a Dev-VM and added
multi-port support in it for allowing multiple instances of Logbay to run
on a single Dev-VM to simulate a multi-node cluster which allowed the
developers to test their changes on their Dev-VM itself.
The Project:
============
Introducing job control to the storage driver
Summary: Implement abstract job control and use it to improve storage driver.
Mentor: Pavel Hrdina
Abstract:
=========
Currently, libvirt support job cancellation and progress reporting on domains.
That is, if there's a long-running job on a domain, e.g. migration, libvirt
reports how much data has already been transferred to the destination and how
much still needs to be transferred. However, libvirt lacks such information
reporting in storage area, to which libvirt developers refer to as the storage
driver. The aim is to report progress on several storage tasks, like volume
wiping, file allocation an others.
Job Control in Domain:
======================
In src/qemu/qemu_domain.h, we can find the struct qemuDomainJobObj, which is
a job object in domains, and is used for: coordinating between jobs, help
identify which API call owns the job object, and contain rest additional info
regarding the normal job/agent job/async job. This qemuDomainJobObj is part
of another struct qemuDomainObjPrivate, which majorly is the object, the
driver's API majorly interacts with which calling jobs on.
Whenever an API call is made, depending upon the type of the job, specific
locks are acquired and then the job is carried out. Exact design details
regarding the implementation of such APIs are present in
`src/qemu/THREADS.txt`.
Job Control in Storage:
=======================
Whenever an API call is made to a storageVol, the member `in_use` of the
struct `virStorageVolDef` is used as a reference counter. This allows us to
check whether the storage volume is already in use or not, and whether the
current API call can be carried out. Once the API call exits, the reference
counter is decremented.
Reporting of job-progress as done in case of Domains:
=====================================================
Additionally, when an async job is running: it also contains
qemuDomainJobInfo: which stores the progress data of the async job,
and an another qemuDomainJobInfo stores the statistics data of a recently
completed job.
Functions virDomainGetJobInfo() and virDomainGetJobStats() present in
libvirt-domain.c help extract information about progress of a background
job on a domain.
Plan to implement something similar in Storage Jobs:
====================================================
Firstly, it's important to bring in the notion of jobs to the storage driver.
Right now the API calls get directly executed if the required mutex locks are
acquired. But this gives the rest of the API calls less information about what
is running currently or has the locks acquired. Further, the domain jobs
additionally contain a lot of information which can even be useful in case of
the storage API calls.
Firstly, identification of what all API calls are occurring on Storage Volumes
in storage driver, and classifying them into something similar to normal jobs
and async jobs (the long-running ones). Also, some of the API calls will not
be acquiring a job (ones which didn't change the reference counter).
After this, a document similar to src/qemu/THREADS.txt needs to be created for
storage job handling and should mention the new design of the existing storage
APIs, acquiring jobs and appropriate locks as required.
Additional new APIs need to be implemented for the creation, deletion of
storage jobs. These would be similar to the domain job API present in
qemu/qemu_domain.h such as qemuDomainObjBeginJob(), etc. This specifically
also included storage equivalent functions of virDomainGetJobInfo() and
virDomainGetJobStats(). These would be used by the long-running storage jobs
to report completion progress.
Existing storage API needs to be implemented with this new job notion and the
reference counter member 'in_use' be removed.
Other desired changes:
======================
1. Unification of the notion of jobs throughout: one of the steps taking
keeping this in mind could be to have the storage job API implementation as
close to the domain job API, so that later on unification would be easier.
Unification, IMO, is left in the future scope of the project.
2. Prediction of time-remaining for job completion as the progress for
various jobs is reported.
BiteSizedTask attempted:
========================
Clean up variables in tools/libvirt-guests.sh.in
Mentor: Martin Kletzander
A patch was floated on the mailing list and its latest version can be found
here[2].
Rest of the plans for the summers:
==================================
Due to the on-going COVID-19 Pandemic, my internship was canceled. Hence,
I'll be available full-time for the project. In August, I'll be joining
Nutanix as a Full-time Employee.
PS:
===
1. It was already very late by the I decided to take part in GSoC'20. Hence, I
wasn't able to give the required amount of time for preparing this project's
proposal. If it would be okay, I'll still like to keep updating this proposal
after the deadline and add a few important things, such as solid project
deliverables along with a timeline.
2. As Pavel Hrdina is the mentor of this project, and as this project was
suggested by Michal Privoznik, I've cc'd both of them to this email.
3. Since I still haven't spent enough time understanding the details of the
existing APIs, I might have gone wrong at a few places, and I would be glad
to have them pointed out.
4. One of the mentioned requirements according to libvirt's GSoC FAQ page[3]
is passing an interview. When does this interview typically take place, in the
GSoC timeline?
5. A google-doc of this proposal can be found here[4]. Comments on the
doc are welcomed.
[1]: https://summerofcode.withgoogle.com/archive/2017/projects/5434523185577984/
[2]: https://www.redhat.com/archives/libvir-list/2020-March/msg01303.html
[3]: https://wiki.libvirt.org/page/Google_Summer_of_Code_FAQ
[4]: https://docs.google.com/document/d/1Js-yi1oNrl9fRhzvMJwBHyygYCAtr_q7Bby8k...
4 years, 8 months
On the need to move to a merge request workflow
by Daniel P. Berrangé
We've discussed the idea of replacing our mailing list review workflow with
a merge request workflow in various places, over the last 6 months or so,
but never made a concrete decision.
I understand that some people will not like the idea of using merge requests.
It is ok to feel that way, however, we need to consider the bigger picture and
be open to learning to make the most of the opportunities, even if we don't
like certain things about the tools. We need to recognise that there are
millions of open source developers who are successfully delivering code using
the merge request workflow, and we are just as capable as them.
To claim that merge requests won't/can't work for libvirt, means there is
something unique about the libvirt project's needs, and/or all those other
developers are doing a worse job than libvirt developers, and/or libvirt
developers are incapable of using tools that everyone else copes with.
I don't believe any of those points are true.
While it is certainly possible to come up with a list of points why we might
like mailing lists, on aggregate merge requests will be the better solution.
There are examples of high profile projects moving to a git forge based
workflow and new projects use them as a first choice. If mailing lists were
truly the better approach, then there would be a notable flow of projects
moving away from merge requests, back to mailing lists. There is no evidence
of a trend towards mailing list workflows.
Short summary
=============
This has quickly become another very long email to read though, so I'll
attempt a summary
* The need to attract new contributors is critical to the long term
health of the project. Since 2016 the number of new contributors has
dropped 50% from 80-90 per year, down to just 40-45 in 2019.
* The current libvirt mailing lists are suffering from unreliable delivery.
Some contributors are unable to send. Other contributors are unable
to receive.
* The contributors affected by the problems have very limited ability to
resolve the problem they have, aside from opening tickets or changing
email providers entirely.
* A mail workflow relies on all contributors agreeing on the conventions
to use when sending & replying to messages. Many of these conventions are
unwritten and impose a burden for new contributors, discouraging their
participation.
* A mail workflow has many problems even for long term contributors, most
notably with keeping track of what mails are outstanding. Each person
has invented their own solution to deal with the lack of central tracking
and they are all flawed & duplicating time on triage.
* A merge request workflow has a lower barrier to participation for new
contributors to the project, by making libvirt use commonly found workflow
for OSS projects.
* A merge request workflow is technically capable of supporting projects of
equal size, or greater, than libvirt which deliver features at an
equivalent speed. There is nothing special or unique about libvirts needs.
See KDE, GNOME, SystemD, Kubernetes, OpenShift, to name just a few
* Trying to directly match each aspect of a developer's current mail setup
to a merge request workflow is not practical. They are inherantly different
tools with differing capabilities which will never map 1:1 in all cases.
* The largest barrier to a change is not technical, but the relative lack of
experience & familiarity with the tools. This can only be solved by using
them in a real world environment. A period of productivity disruption
is inevitable while learning to optimize use of the new tools.
* A merge request workflow is better for new or infrequent contributors,
and in the long term is certainly usable for frequent contributors.
Switching to merge requests is thus better on aggregate, and with
experience/hindsight may well turn out better for everyone.
With the last few points in mind in particular, the idea is to not do a big
bang switchover of everything at once, but instead split it into phases.
Initially the focus is to be on all the non-core projects, which means the
language bindings, application integrations, and supporting tools. After those
are all switched successfully, highlighting any areas of difficulty and
allowing for some learning/adaptation, the main libvirt.git would be switched
over. The non-core projects can be switched more or less immediately as their
review demands are much lower. It is anticipated that the timeframe for
everything would be on the order of 3-6 months, at the max.
The long story
==============
The measurement question
------------------------
The key motivation behind this proposal is that it will make libvirt a more
attractive project for new contributors. The quotes later illustrate real
world cases where our use of email workflow has hurt contributors. These are
just the cases we have become aware of from people who were motivated enough
to continue their effort despite the barriers faced. It is an unknown how many
more people faced similar barriers and just walked away without us knowing.
It has been suggested in the past that we run both workflows in parallel as a
way to test which is better. I don't believe such an approach will provide
valid data. Splitting attention between two different tools will be detrimental
to the effectiveness of both tools. It would be an unbalanced comparison
because there is no practical way to compare both tools with the same data set
unless we wish to review all patch series twice. In addition the timeframes
needed for a meaningful comparison would be too long, spanning 1+ years to
gather statistically significant data on contribution. If a merge request
workflow was a new, unproven concept, then it would none the less be worth
trying to do a comparison, however, difficult. This is not the case though,
merge request workflows have been around many years & proven viable. There is
nothing special about the libvirt project or developers skills that would
invalidate this track record.
Thus this mail will look at various aspects of the tools and consider how they
impact our work, with a particular focus on issues which negatively impact new
contributors, or which have a discriminatory effect on certain subsets of new
or existing contributors.
Contributors decline
--------------------
Since 2016 libvirt has seen a significant decline in the number of new authors
contributing patches to the project. This is illustrated in the 3rd slide from
this KVM Forum presentation:
http://people.redhat.com/berrange/kvm-forum-2019/kvm-forum-2019-libvirt.pdf
For the data behind this graph, the commits to all GIT repos on libvirt.git
were analysed, taking the year recorded for the git changeset author date.
The email addresses were recorded each year, and a "new contributor" was
considered to be someone who had never made a commit in any previous calendar
year.
We can see the project grew quite fast until about 2011 and then levelled out
with a reasonably consistent number of long term vs new contributors. Since
2016, however, there has been a clear declining trend. The raw data shows for
the last two years, the non-core git repos in particular saw a big decline,
however, the same is also seen for libvirt.git itself.
There is likely no single factor responsible for this decline. A combination
of interest in containers instead of VMs, and maturity of the codebase likely
play a fairly large part. None the less, the decline of new contributors is
highly undesirable, whatever the reason, as there is still plenty of work that
needs doing in libvirt. Thus steps need to be taken to make libvirt more
attractive to new contributors. Lowering the barrier to entry for contributors
is one key aspect to this.
New contributor mistakes
------------------------
Some of the mistakes new users are liable to make when dealing with patch
submissions
* HTML mail instead of plain text
* Mangled patches due to mail client breaking long lines
* Incorrectly threaded patch series
* Not labelling series with version numbers
* Sending plain ‘diff’ output instead of a git patch
* Not basing the patch on recent git master
* Not subscribing to the list before posting
* Corporate legal privacy / copyright signatures
* Unintelligible mail quoting in replies (Outlook)
We've tried to steer people towards git-publish, as that has encoded many of
the conventions to prevent mistakes, but that is by no means a foolproof
solution. It still requires setup and a working mail server, as well as
following git best practice wrt branch usage.
New contributor real experiences
--------------------------------
Some quotes on this theme that I've collected from just the last three months.
I'm removing the names as this isn't about the individuals involved, rather
their first / early impressions & experience of libvirt:
"I have sent the patches to the list. I had seen a while back
that eventually merge requests might be fielded, so I figured
it was worth a shot."
"just not familiar with manual patch work, normally been used
to gitlab, github. I installed the git-publish as per document,
but not sure about the next step from this tool. And, I am not
sure about how to connect my email with this tool either."
"I'm happy to hear you are considering a switch to gitlab.
I've never contributed to projects which are using email based
submission"
"Have you guys received an email from .... with the subject
..... I can't find it in the archives"
"I don't know what has happened, but we seem to be missing
this patch from the list.
...
I have all the patches in my inbox from the cc, so have no idea
what happened either. It was my first time using git-publish and
it took a few tries to get anything done."
There are a variety of reasons behind these problems, but the effects are
clear. Contributing to libvirt via email is complex and error prone.
Mailman lost delivery
---------------------
For Mailman, we are entirely dependent on Red Hat IT support via a private
ticketing system. The reliability of Mailman leaves alot to be desired. For
the past 5 years or more there have been multiple periods when mail delivery
either stopped entirely, or randomly lost a subset of mails. When this happens
if often isn't obvious at first that we're missing mail, and once identified,
it has usually taken 2-3 days between opening a ticket and having someone
actively investigate and resolve it. In the worst case it was over 7 days with
broken mail delivery. This is not a good situation to be in, but we've
tolerated it as nothing was irretrievably lost, just delayed by days, and
every subscriber was equally affected.
No service is perfectly reliable and we have seen both GitLab and GitHub have
downtime periodically. A key difference though is how the problems are handled.
For the git forges, service downtime is a business critical problem, and so
gets the prioritized attention that it demands. Mailing list delivery for
libvirt is NOT considered a business critical problem for Red Hat, and so the
low priorization of fixing mailman problems reflects this.
MX server content mangling
--------------------------
More recently we've faced worse problems with the mailing list. The new
service acting as redhat.com border MX gateway is mangling mails in a way that
breaks DKIM signatures. Thus any time someone sends a message with a DKIM
signature, there are a large number of subscribers to the list whose mail
server will bounce the delivery from mailan due to DKIM validation failures.
We tried to fix this by removing the subject prefix modification and body text
footer in mailman, but that merely reduced the frequency of the problem by
a small amount. In theory it can be mitigated by having mailman replace the
"From" field with the list server's address. qemu-devel did this originally
before they switched to removing the subject prefix/body footer. The reason it
worked for QEMU was because only mailman was mangling their messages, whereas
for libvirt, both mailman and the redhat.com MX gateway are mangling. The old
version of mailman on redhat.com doesn't appear to support From header
rewriting.
Similar to DKIM problems, there are the same problems with ARC headers, though
we lack definitive proof on the cause. The effect is that when any SUSE
contributor sends a mail, deliveries to all other SUSE subscribers will be
bounced. We believe it affects some other domains, but SUSE is the most obvious
one.
Both these problems are pretty serious and discriminate against certain
subscribers depending on how their mail service configures their incoming
filters. Even redhat.com was rejecting messages with broken DKIM signatures
for a time, until enough people complained about missing valid emails. Many
other contributors are not lucky enough to have an IT staff who would respond
to such complaints.
MX server anti-spam blackholing
-------------------------------
A further problem has impacted mails sent by one contributor, which have been
silently dropped by redhat.com MX servers, with no bounce at all. After two
months we tracked this down to a bogus hostname in the From field of the
messages which caused redhat.com to consider them a spammer spoofing the
sender address & so blackhole it. This is actually a case where I probably
agree with the redhat.com MX server policy, but it is none the less incredibly
unhelpful for the unlucky contributor(s) affected by it, as it is hard for
them to discover what they did wrong. We were lucky this contributor noticed
they had a problem and was very persistent over months in talking to us about
it to rather than walking away, as I expect most people would.
Loosing members through attrition
---------------------------------
Even without any of the above problems, the mailing list workflow still has
the problem of being yet another account for people to keep track of. Every
week I have to purge multiple subscribers from libvirt lists due to repeated
bounced delivery attempts. Most commonly this is when their MX server starts
rejecting messages claiming the address no longer exists (due to leaving a
job).
IOW, when someone has changed their job, and thus changed their email address,
they have to re-register with any projects they are using. In some cases this
is ok, as they may no longer care about the project after switching jobs, but
it means we lack any channel to attempt to keep their interest. If another
contributor tries to contact them via their old email address, they'll just
get bounces too. With Git Forges, users have a login identity that is distinct
from their email address, and thus they one need update one place when
changing jobs, and interactions with other users are via this persistent
identifier.
Mail diagnosis / resolution
---------------------------
The difficulty in all of these cases is that the project contributors who are
affected have very limited means to either diagnose or resolve the problem.
They are dependent on raising tickets with their own IT admin staff, who might
then have to raise tickets with support from a company they outsourced to, and
or talk to libvirt mailman admins. The libvirt mailman admins are similarly
dependent on raising tickets with Red Hat IT and an external company. Depending
on the level of outsourcing, the problem can span across IT support staff from
4/5 different companies. This has a predictably bad effect on the chances of
resolving problems.
Mail lack of recovery
---------------------
Many of the problems result in permanent loss of messages to the subscriber
affected. Even if the problem was eventually solved, there is often no
straightforward way to get back the messages that were lost in the mean time.
One the mail server has a hard bounce for that person, it will stop retrying
delivery. The only option is to visit the mailman archive page and download
the mbox and re-import messages.
This is a result of email being a push based communication mechanism, with
only a weak delivery assurance. If the push fails and isn't retried, the
information is gone for good.
Contrast with a Git Forge, where the primary communication mechanism is a pull
based model. If there is a transient problem with the contributors network
connectivity, or the hosted service has an outage, the user will be unable to
pull information for a period of time. Once the problem is resolved, the next
attempt to pull information will include everything, with no loss from the
end user's POV.
Mail list providers
-------------------
Moving the libvirt mailing lists to another provider is not a magic bullet
solution. qemu-devel is hosted on nongnu.org and has suffered bounces due to
DKIM breakage, though at least their more modern mailman let them workaround
it better. nongnu.org though has had similar problems with mail delivery
getting stopped for periods of time, not as bad as libvirt suffered, but still
dependant on waiting until the US wakes up to get anyone to investigate. I
know other projects hosting lists on groups.io, which have also had serious
problems of late with mails getting silently discarded, or mangled such that
'git am' fails. There are other providers
The impact on contributors
--------------------------
All this conspires to negatively impact the ability of people to participate
in the libvirt project. If you can't reliably send & receive patches to
libvir-list you are effectively excluded from the project, given that email is
the core of our workflow. It hasn't had a huge visible impact on libvirt,
primarily because the Red Hat contributions have escaped the worst effects.
This "luck" doesn't make me at all happy, because it is important that anyone
involved the project has an equal ability to contribute, regardless of whether
they are Red Hat people or not. This is not the case
There are many conceptual reasons to like a distributed messaging system based
on open standards. The reality of the implementation of email though is quite
different from the theory. The distributed nature of email, combined with the
plethora of implementations which don't agree on what the minimum acceptable
standards are, is the direct cause of the problems we have with reliable
operation of the libvirt list, and mail in general. The problems with email
don't seem to be getting better, if anything they are worse, as the broader
world increasingly shifts away from email, sadly often to closed protocols,
or closed applications.
Overall though, I struggle the with idea of continuing to recommend email as
the best choice for collaboration since it is clearly failing & excluding or
harming some of our contributors.
Mail usage for new contributors
-------------------------------
For those of us who have been involved in libvirt for any length of time, the
usage of email feels quite straightforward and natural. Our familiarity and
experience can be quite different from that of new contribtors to the project.
The usage of email as a development workflow relies on a large number of,
often undocumented, conventions. If a contributor gets the conventions wrong
they'll have poor first experience with the project.
In the early days of libvirt those new contributors with existing open source
experiance would already be familiar with email conventions, but we had plenty
of contributors from the corporate world for which open source was a completely
new experiance. Over the years we've sent nasty-gram responses to plenty of
contributors who missed the conventions, though thankfully we're getting more
polite and friendly these days in how we point out these kind of mistakes. It
would be better to use a system that doesn't need this to the same degree.
These days open source is a well known development model even among corporate
developers from the closed source world, but their OSS knowledge is almost
always based on a GitForge workflow, not an email workflow.
Maintainer patch submission overload
------------------------------------
Any long term contributors to libvirt will have become familiar with the mail
based development workflow, however, long term usage and familiarity does not
imply that the solution used is the optional one, rather that the chosen
solution has been optimized.
With the volume of patches submitted to libvirt there is always pressure on
reviewer bandwidth. The imbalance between incoming patches and reviewer time
is common across all major open source projects, regardless of development
workflow and tools. What matters is thus how well the project deals with the
overload, to priortize the right aspects.
The efficiency with which developers can perform reviews is one factor and is
typically the area where email is said to be most beneficial. I personally find
email is efficient for writing comments inline to patches, but that is largely
due to the use of a text based user interface (mutt), rather than the inherent
use of email as a messaging platform. There are other aspects to the use of
email which harm our productivity as reviewers.
Personal patch tracking
-----------------------
To review a patch you first have to realize and/or remember that it exists,
and this is an area where email is quite poor. It is easy for patches to go
unattended for a prolonged period of time, drowned out in the volume of
patches on the list. We encourage users to send a "ping" if they think their
patch has been ignored for too long. This is a poor workaround for our
inability to keep track of what patches are pending. New contributors won't
even know they're expected to send a "ping", or may wonder how long they
should wait.
Essentially the mail folder needs to be thought of as a mixture of signal and
noise, where 'signal' refers to patches which need reviewer attention, and
'noise' refers to patches which are already processed in some way. The most
recent mails in an folder have a high signal component, turning increasingly
to noise as they age. The key task for email processing is thus to distinguish
signal from noise, such that we never miss any valid messages.
The supposed positive of using email is that contributors have the flexibility
to customize their way of working to enable them to keep track of pending work
in whatever way suits them best. People come up with clever mail filtering
practices / rules / scripts to organize their incoming mail stream. Instead of
considering this a benefit, we should be asking why our tools are so ill-suited
to the task that every contributor has to spend (waste) time reinventing the
wheel to distinguish signal from noise when processing email, duplicating
that same work across all contributors ?
This has a particularly high burden on contributors who don't engage with the
project on a daily basis, as they get a firehose of emails, many of which are
already obsolete by the time they look at their inbox. IOW, in-frequent
contributors experiance high noise, low signal, when interacting with the
libvirt mailing list.
At first glance merge requests may feel like they are no better, since there
would be the same number of patches arriving each day. The key difference is
that there is good structure and metadata to the merge requests. Multiple
versions of the same series, appear as a single entity. Merge requests which
have been merged or abandoned no longer appear. Active merge requests can be
tagged with labels & users to organize them. Thus looking at the list of
merge requests gives you information with a high signal content, and little
noise content. This is good for both regular contributors and infrequent or
new contributors.
Patch tracking tools
--------------------
With every contributor organizing their own personal incoming stream, there is
no visibility / interaction between contributors on pending work. There is no
way of knowing whether any given patch series is on the to do list for another
person to deal with. There have been tools created which attempt to fill the
gap by providing a centralized view of incoming patches. To name just a few,
there are patchwork, patches, patchew, and patchecker. We have tried many of
the tools to some degree or another with libvirt, but none have had success.
Some are also in use in QEMU but see periodic breakage in consuming the mail
stream for a variety of reasons.
First and foremost, they suffer from the difficulty of turning free format
mail threads into well structured data. This can only be done reliably if all
contributors follow a set of well defined conventions for sending patches,
which immediately puts a burden on contributors, and is a barrier to new
contributors. These tools typically assume that all patches can apply against
git master and if this isn't possible drop them with an error. Contributors
rarely intentionally send patches against old versions of the code, but with
the rate of commits, git master can still conflict with a newly posted patch
series, requiring human intervention. Some support special magic comments in
the mail to indicate what git version the patch is based on, but this is more
secret sauce for contributors to learn.
The problem of patch conflict resolution also creates challenges for the tools
to determine when a series has been merged & can thus be considered complete.
Then there is the matter of correlating different versions of the same
patch series. This again largely relies on contributors following a convention
of using the same subject line for the leading patch / cover letter each time
it is posted. Some people post URLs pointing to previous versions of the same
series, but this is yet more manual work to work around the limits of email
when used for patch submission. Keeping a fixed subject does not always make
sense. This again makes it hard for the tools to determine whether a patch
series has been merged or obsoleted.
So overall, while it is possible to write tools to turn incoming patch series
emails into structured data, their reliability is poor, which in turn makes
it difficult to consider relying on them. These same problems are why it is
difficult to have a bi-directional email gateway between a mailing list and a
merge request system. Sending mails from a merge request is easy. Parsing
contextual replies and getting them back into the merge request is incredibly
hard, often impossible.
Attaching CI to submissions
---------------------------
Without an ability to reliably process incoming patches in an automated manner,
it is impractical to automate downstream tasks. It is generally considered
best practice to run tests on all code before it is accepted for merge. Libvirt
CI testing is currently all done post-merge. Contributors are generally
expected to make sure code compiles and passes tests before posting it. The
testing done this way rarely has the same breadth of coverage as that done by
post-merge CI and may be done against an outdated git master. Reviewers may
or may not take the time to actually apply the patches and run builds and
tests. Where this is done, the coverage is usually just one or two platforms,
a small subset of post-merge CI.
The result of this is the frequent build & test breakage of git master, which
has to be addressed through "brown paper bag" commits, mostly pushed without
review under a "build breaker" rule. This is a prudent way to fix the breakage,
however, it would be better if the breakage was prevented in the first place.
If automation can catch the build problems before merge, there ceases to be
any need for having special exceptions to allow pushing code without review.
The result will be a higher quality output overall, with less time wasted on
manually pointing out silly mistakes, or fixing them up after the fact.
We know there is insufficient reviewer time to handle all the incoming patches
submitted, and thus we should not expect them to spend time on tasks that can
be automated. Reviewers should never be building code or running tests in
general, or spending time pointing out problems in the code that can be caught
by the tests. The tools should do all testing the moment the code is submitted,
with reviewers only spending time once CI has passed & the submitted had a
chance to re-upload any obvious fixes. The patchew tool attempts to do this
kind of automation, but as noted above it suffers from the hard problem of
reliably parsing email, determining what git master patches apply to, and needs
secret comment syntax to be used by contributors for cases where it can't do
the right thing.
Barriers to new contributors
----------------------------
The only real solution to contributor overload is to spread the work across
more people. This in turn implies a need to attract new contributors to the
project. There need to be interesting problems to attract their attention.
The task is then need to turn them into regular contributors, and this is
where first impressions of a project are important. Simply put, the first
interaction needs to be a positive experience. This might be via a bug tracker
or via a code submission, or something else. Anything which may hurt first
impressions needs to be eliminate or mitigated as much as practical.
A need to learn unusual workflows, or configure new tools, or learn about
project specific (unwritteN) conventions, are all things which stand in the
way of succesful participation.
The key value of the merge request workflow is the standardization across
projects, such that it has immediate familiarity enabling people to
contribution without having to read docs. This is a bit of an over
simplication, since there is always some project specific knowledge required.
Ideally this knowledge will be primarily around the code and not on the
development process.
I've increasingly recognised this in my own interactions with projects. Since
the widespread adoption of Git, I find it a significant turn off to find a
project is still using CVS, Subversion, or any other non-Git SCM. Increasingly
I feel the same way about development workflow and the git forge. If I can
fork a project on GitLab/GitHub, and open a pull request, I'm more likely to
send a patch upstream, than if I have to register for a new account on a
custom bug tracker or mailing list. On the flip side, while one of my own
personal projects still has a mailing list, it is more enjoyable dealing with
contributors if they open a pull request, as I see the CI results and know the
patch can be applied with 1 click if desired. The more I use the tools, the
more effective they become for me.
Keeping new contributors interested
-----------------------------------
The response to a new contributor is also important. The earlier someone
engages with them, the better. With this in mind it is useful to be able to
quickly identify new contributor and prioritize a response to them over other
work. Automation (CI) can help in this respect by picking up & reporting
problems without needing human attention.
None the less, triage of incoming requests will be needed, primarily though
tagging to classify incoming requests, and nominating certain people as
reviewers. The power of the merge request model is that such triage is
centralized & thus visible to participates, as opposed to being repeated by
every individual when using email. People can see this triage and more
effectively direct their attention where it is needed.
The Web UI problem
------------------
The use of a web UI is often seen as the biggest downside to using a Git Forge,
especially amongst people who are used to an email based code review workflow.
This is a position I have great sympathy with, as I do largely prefer text
based tools with strong keyboard navigation, over graphical tools with strong
mouse navigation. The downsides of the web UI for merge requests becomes more
apparent the larger the change is, especially if split across multiple
patches.
While common, this viewpoint on the merits of text UI over graphical UI is not
universally held. The popularity of Git Forges amongst the open souce world
demonstrates that many developers have no trouble with the web UI, or at least
are willing to accept it in order to benefit from the other features offered.
My preference for a text based UI has lead me to spend time investigating the
API support for GitLab and from there developing a custom application for
interacting with GitLab merge requests.
https://gitlab.com/bichon-project/bichon
The goal of the tool is to provide a user interface for merge requests that
is familiar to users of the mutt email client. It is not especially advanced
in its impl yet, and certainly has rough edges, but for me it has proved the
general point that we can build useful tools to integrate with GitLab, which
can mitigate the worst aspects of a Web UI.
The need to build such a tool, however, has a cost as it diverts time away
from working on libvirt itself, which negates some of the benefits of adopting
GitLab in the first place. It is hoped, however, that this is a relatively
short term cost that could ultimately be amoritized across multiple developers
who use GitLab for their projects, while the benefits are repaid continuously
over the long term.
The review comment model
------------------------
Those familiar with a email based workflow using clients like mutt will often
cherish the deeply threaded discussion model. It is worth bearing in mind
that not all email clients will follow this, with many turning the threaded
discussion into a flat discussion.
The GitLab review comment system certainly doesn't offer the kind of threading
that mutt does, but it is a step above what many email clients do. Against any
single patch, it is possible to one or more standalone comments, and one or
more discussion threads, attached to specific lines of code. The distinction
between the two is whether replies are permitted. The threads are, however,
flat, not deeply nested.
There are two interesting and perhaps useful things to be aware of. First each
discussion thread on a patch has a notion of whether it has been "resolved"
or not. It is possible to (optionally) have threads auto-resolve when a new
version of the patch series is posted to the merge request, or they can be
resolved manually by a contributor. This can be valuable in keeping track of
whether feedback on v1 was addressed in changes for v2, which is a challenge
with email review workflows.
The second interesting thing is that threads can be "rebased". This is both
useful and annoying at the same time. What appears to happen behind the scenes
is that the (commit hash, file, line num) that is associated with a comment
thread can be auto-updated to point to the latest commit that contains the
matching (file, line num). This is useful when a v2 is posted, as the comments
from v1 are moved to v2, helping identify if the problem was addressed. It is
annoying when dealing with patch series though, as it appears that a comment
created on patch 3, might be reported against patch 15 when later queried, if
patch 15 has context covering this same source file location. This makes some
sense when considering the web UI, because by default it will show the combined
diff of all patches in a series. It is less helpful for my bichon tool which
is trying to present a per-patch view. In theory it should be possible to
follow the diff context backwards to display the comment against every single
patch which has context for the same line of code. This would in fact make it
quite useful, but until that is done it is quite annoying.
More generally, comments have nice integration with the rest of the toolset
making it easy to refer to issue trackers, merge requests via magic markdown
extensions, and to tag individual contributors to get their opinion on a
question.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
4 years, 8 months
Re: GSoC'20 Interested Student: Adding support to Jailhouse Hypervisor
by Jan Kiszka
Hi Prakhar,
On 23.03.20 07:53, PRAKHAR BANSAL wrote:
> Hello All,
>
> My name is Prakhar Bansal and I am a graduate student in Computer
> Engineering at Iowa State University, US.
> I have experience with Analysing Performance of Applications running
> inside multiple virtual machines hosted by the libvirt QEMU-KVM through
> virt-manager.
>
> I am interested in working on the project to develop a Libvirt driver
> for the Jailhouse hypervisor. I looked into the initial attempt on the
> Jailhouse driver which seems to be based on the Jailhouse command-line
> interface. I am currently looking into learning and understanding the
> kernel APIs for jailhouse hypervisor.
Thanks for your interest!
> I followed the below articles mentioned by Valentine Sinitsyn to begin
> learning about the Jailhouse hypervisor.
>
> https://lwn.net/Articles/578295/
> https://lwn.net/Articles/578852/
>
> I have a few questions regarding this project, please let me know if
> someone can help me out.
Sure, go ahead. Depending on the scope of the question, libvirt might be
the better community to ask. Therefore, I'm adding its list to this thread.
Jan
>
> Thanks & Regards,
> Prakhar Bansal
>
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
4 years, 8 months
[libvirt PATCH] travis: delete all Linux jobs
by Daniel P. Berrangé
The Fedora rawhide job started failing with the latest container build.
Since we now have broad CI coverage on GitLab CI, there's not a strong
reason to continue using Travis for Linux jobs. Deleting the redundant
jobs is a better use of time than trying to debug the failure.
Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
---
.travis.yml | 47 -----------------------------------------------
1 file changed, 47 deletions(-)
diff --git a/.travis.yml b/.travis.yml
index f400f76118..0548b28b01 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -18,53 +18,6 @@ addons:
matrix:
include:
- - services:
- - docker
- env:
- - IMAGE="ubuntu-1804"
- - MAKE_ARGS="syntax-check distcheck"
- script:
- - make -C ci/ ci-build@$IMAGE CI_MAKE_ARGS="$MAKE_ARGS"
- - services:
- - docker
- env:
- - IMAGE="centos-7"
- - MAKE_ARGS="syntax-check distcheck"
- script:
- - make -C ci/ ci-build@$IMAGE CI_MAKE_ARGS="$MAKE_ARGS"
- - services:
- - docker
- env:
- - IMAGE="debian-9"
- - MAKE_ARGS="syntax-check distcheck"
- script:
- - make -C ci/ ci-build@$IMAGE CI_MAKE_ARGS="$MAKE_ARGS"
- - services:
- - docker
- env:
- - IMAGE="fedora-31"
- - MAKE_ARGS="syntax-check distcheck"
- script:
- - make -C ci/ ci-build@$IMAGE CI_MAKE_ARGS="$MAKE_ARGS"
- - services:
- - docker
- env:
- - IMAGE="fedora-rawhide"
- - MAKE_ARGS="syntax-check distcheck"
- script:
- - make -C ci/ ci-build@$IMAGE CI_MAKE_ARGS="$MAKE_ARGS"
- - services:
- - docker
- env:
- - IMAGE="fedora-30-cross-mingw32"
- script:
- - make -C ci/ ci-build@$IMAGE
- - services:
- - docker
- env:
- - IMAGE="fedora-30-cross-mingw64"
- script:
- - make -C ci/ ci-build@$IMAGE
- compiler: clang
language: c
os: osx
--
2.24.1
4 years, 8 months
[PATCH] Add blog for ARM Datacenter Project.
by Robert Foley
Our blog covers QEMU topics.
Signed-off-by: Robert Foley <robert.foley(a)linaro.org>
---
updater/virt-tools/config.ini | 3 +++
1 file changed, 3 insertions(+)
diff --git a/updater/virt-tools/config.ini b/updater/virt-tools/config.ini
index 9375c82..4e53029 100644
--- a/updater/virt-tools/config.ini
+++ b/updater/virt-tools/config.ini
@@ -197,3 +197,6 @@ faceheight = 96
[https://fidencio.org/tags/virt/index.xml]
name = Fabiano Fidêncio
+
+[https://futurewei-cloud.github.io/ARM-Datacenter/feed.xml]
+name = ARM Datacenter Project
--
2.17.1
4 years, 8 months
[PATCH v2 00/11] Another round of qemu:///embed fixes
by Michal Privoznik
v2 of:
https://www.redhat.com/archives/libvir-list/2020-March/msg00992.html
Basically, a lot of patches from v1 was ACKed, but there were some
suggestions which I put into separate commits and now am resending as
v2.
diff to v1:
- Renamed virDomainDriverHashRoot() to virDomainDriverGenerateRootHash()
- Generate hashed path only if the base (prefix) would point outside of
the root dir
- Use "qemu" instead of "QEMU" when generating hashed path
- New patches 2/11, 7/11
Michal Prívozník (11):
tests: Fix virQEMUDriverConfigNew() calling with respect to @root
qemu: Drop two layers of nesting of memoryBackingDir
conf: Move virDomainGenerateMachineName to hypervisor/
virDomainDriverGenerateMachineName: Factor out embed path hashing
qemu: Introduce virQEMUDriverGetEmbedRoot
qemuDomainGetMachineName: Access embeddedRoot from driver rather than
cfg
qemu: use virQEMUDriverGetEmbedRoot() to access embeddedRoot
Revert "qemu_conf: Track embed root dir"
qemu: Make hugepages path generation embed driver aware
qemu: Make memory path generation embed driver aware
qemu: Make auto dump path generation embed driver aware
src/conf/domain_conf.c | 72 ---------------
src/conf/domain_conf.h | 7 --
src/hypervisor/domain_driver.c | 88 +++++++++++++++++++
src/hypervisor/domain_driver.h | 11 +++
src/libvirt_private.syms | 3 +-
src/lxc/lxc_domain.c | 3 +-
src/qemu/qemu_command.c | 19 ++--
src/qemu/qemu_conf.c | 69 +++++++++------
src/qemu/qemu_conf.h | 23 +++--
src/qemu/qemu_domain.c | 9 +-
src/qemu/qemu_driver.c | 26 +++---
src/qemu/qemu_process.c | 9 +-
tests/domaincapstest.c | 2 +-
.../qemuxml2argvdata/cpu-numa-memshared.args | 8 +-
.../fd-memory-no-numa-topology.args | 2 +-
.../fd-memory-numa-topology.args | 4 +-
.../fd-memory-numa-topology2.args | 8 +-
.../fd-memory-numa-topology3.args | 12 +--
.../hugepages-memaccess2.args | 12 +--
.../qemuxml2argvdata/pages-dimm-discard.args | 4 +-
...vhost-user-fs-fd-memory.x86_64-latest.args | 2 +-
tests/testutilsqemu.c | 2 +-
tests/virsystemdtest.c | 5 +-
23 files changed, 217 insertions(+), 183 deletions(-)
--
2.24.1
4 years, 8 months
[PATCH v2 0/4] add support for QEMU 9pfs 'multidevs' option
by Christian Schoenebeck
QEMU 4.2 added a new option 'multidevs' for 9pfs. The following patch adds
support for this new option to libvirt.
In short, what is this about: to distinguish files uniquely from each other
in general, numeric file IDs are typically used for comparison, which in
practice is the combination of a file's device ID and the file's inode
number. Unfortunately 9p protocol's QID field used for this purpose,
currently is too small to fit both the device ID and inode number in, which
hence is a problem if one 9pfs export contains multiple devices and may
thus lead to misbheaviours on guest (e.g. with SAMBA file servers) in that
case due to potential file ID collisions.
To mitigate this problem with 9pfs a 'multidevs' option was introduced in
QEMU 4.2 for defining how to deal with this, e.g. multidevs=remap will cause
QEMU's 9pfs implementation to remap all inodes from host side to different
inode numbers on guest side in a way that prevents file ID collisions.
NOTE: In the libvirt docs changes of this libvirt patch I simply assumed
"since 6.2.0". So the final libvirt version number would need to be adjusted
in that text if necessary.
See QEMU discussion with following Message-ID for details:
8a2ffe17fda3a86b9a5a437e1245276881f1e235.1567680121.git.qemu_oss(a)crudebyte.com
v1->v2:
* Unrelated docs/formatdomain.html.in changes to separate patch.
[patch 1]
* Added new capability QEMU_CAPS_VIRTFS_MULTIDEVS.
[patch 2]
* XML changes as isolated patch.
[patch 3]
* Code style fix.
[patch 3]
* QEMU 'multidevs' command handling as isolated patch.
[patch 4]
* Error out if not QEMU_CAPS_VIRTFS_MULTIDEVS capability.
[patch 4]
* Error out on virtiofs (since it does not have the 'multidevs' option).
[patch 4]
TODO:
* Capabilities test cases would fail if <flag name='virtfs-multidevs'/>
was added to the other architectures' test case xml files, why?
[patch 2]
* The requested test cases to add: Sorry, the libvirt test case
environment is yet a mystery to me, I would not even know where to
start here.
Message-ID of v1: E1jEFpL-00028n-Qj(a)lizzy.crudebyte.com
Christian Schoenebeck (4):
docs: virtfs: add section separators
qemu: capabilities: add QEMU_CAPS_VIRTFS_MULTIDEVS
conf: add 'multidevs' option
qemu: add support for 'multidevs' option
docs/formatdomain.html.in | 47 ++++++++++++++++++-
docs/schemas/domaincommon.rng | 10 ++++
src/conf/domain_conf.c | 29 ++++++++++++
src/conf/domain_conf.h | 13 +++++
src/qemu/qemu_capabilities.c | 5 ++
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 7 +++
src/qemu/qemu_domain.c | 12 +++++
.../caps_4.2.0.x86_64.xml | 1 +
.../caps_5.0.0.aarch64.xml | 1 +
.../caps_5.0.0.x86_64.xml | 1 +
11 files changed, 126 insertions(+), 1 deletion(-)
--
2.20.1
4 years, 8 months
RFC: qemu: use uuid instead of name for misc filenames
by Nikolay Shirokovskiy
Hi, everyone.
I'm working on supporting domain renaming when it has snapshots which is not
supported now. And it strikes me things will be much simplier to manage on
renaming if we use uuid in filenames instead of domain names.
1. Renaming will only involve saving updated config.
The saving is atomic thanx to tmp file and rename(2) approach. In constast
current renaming on error paths can leave config with old or new name. Thus
on libvirt restart extra VM will appear.
And we don't need to rename autostart links, snapshot directories etc.
2. Renaming will be possible for running domains with no efforts.
We only need to pass uuid instead of name in '-name guest=...' command line.
3. Mgmt can stop using autogenerated names for domains.
I guess openstack for example uses names like instance-000002ff because we
have many limitations on domain renaming. And if these limitations are removed
then openstack can just use user supplied names for domains.
4. No issues with long domain names and filename length limit
If the above conversion makes sense I guess the good time to apply it is
on domain start (and rename to support renaming with snapshots).
I guess we can also have tool (some virsh command) for developers to generate
symlinks so one can access logs, configs etc by name instead of uuid.
Nikolay
4 years, 8 months