On Thu, Dec 03, 2015 at 15:35:10 +0100, Matthias Gatto wrote:
> The purpose of these patches is to introduce quorum for libvirt
> I've try to follow this proposal:
> http://www.redhat.com/archives/libvir-list/2014-May/msg00533.html
TL;DR: I'm concerned that the quorum implementation is not really useful
and will introduce a lot of code with little benefit.
---
So I have a few comments/observations regarding the quorum block driver
in qemu and its usability.
At first I'd like to ask you to describe your use case a bit more. I'm
currently lacking the motivation to do anything about this, as the
series is just partial and I don't really see any advantage of using the
quorum driver at all and can't come up with any useful use case.
Also a good use case is usually a good reason to drive development of a
feature and I'm afraid that this could become abandoned without any real
use.
My problems with supporting the quorum backend are:
1) No tracking of integrity
As the quorum members don't have headers, failed quorum members are
not recorded and remembered. The user or management app then has to
do this externally for given storage devices.
2) No internal tracking of quorum members
Members of the quorum don't have any header marking them as such and
thus any images may be mixed together with unforeseen/catastrophic
results. Higher level management then needs to take the role of
remembering which images belong together. Reimplementing this looks
like reimplementing a distributed storage system to me.
3) Lack of auto-resync:
Once the quorum gets a few inconsistencies it does not automatically
resync like the Linux MD driver. With the current implementation the
only way to resync this would be to issue a block-mirror (blockCopy)
to /dev/null so that all blocks are read and rewritten to the
identical copy. This also requires a user action.
Additionally, a quorum member that went out of sync at some earlier
point and was never resynced is not ignored, which allows for
split-brain/corruption scenarios.
4) Necessity for at least 3 copies
Since a majority needs to win in a vote, you need at least 3 member
disks for this to be fault-tolerant.
5) Lack of speedup
Since all blocks are always read from all members and verified, the
quorum backend doesn't really add any speed to the reads. This can
be mostly attributed to the fact that fault tracking is not present.
In other cases, due to internal error correcting codes it's very
unlikely that a storage medium would return a corrupted sector
without producing an error.
6) Almost every remote storage technology does quorums internally
Any distributed storage (ceph/rbd, gluster, sheepdog, etc.) provides
the quorum functionality internally, with the added benefit that its
internal workings fix the problems that arise when a network split occurs.
7) Tools are restricted to qemu and qemu-img
It's a "proprietary" implementation so for a rebuild you have to use
one of the two tools. AFAIK qemu-img is not really user friendly for
the less common disk backends and we don't really provide any
abstraction on top of that. This means that there really aren't any
reasonable tools to do an offline resync. (Okay, if you know which
instance is okay, you can just copy it ...)
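To illustrate point 4, here is a minimal sketch of strict-majority voting
in Python. It is not qemu's code; the function name quorum_vote and the
byte strings standing in for block contents are made up for illustration.
It only shows why two copies cannot settle a disagreement while three can.

```python
from collections import Counter


def quorum_vote(copies, threshold=None):
    """Return the content that wins the vote, or None if nothing wins.

    copies:    one bytes object per quorum member (hypothetical stand-in
               for the same block read from each member).
    threshold: votes required to win; defaults to a strict majority.
    """
    if not copies:
        raise ValueError("need at least one copy to vote on")
    if threshold is None:
        threshold = len(copies) // 2 + 1  # strict majority
    # most_common(1) yields the most frequent content and its vote count.
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= threshold else None


# Three members, one corrupted: the two matching copies outvote it.
print(quorum_vote([b"good", b"good", b"bad"]))
# Two members that disagree: nobody reaches a majority.
print(quorum_vote([b"good", b"bad"]))
```

With two members a single bad copy leaves the vote tied, so there is no
way to tell which side is correct without external knowledge.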
This series also lacks any mechanism to warn the user or management app
that a block operation didn't get 100% of the votes in the quorum, so
it's not really possible for users to do a rebuild or diagnostics if
something fails.
Peter