Re: [libvirt] [PATCH v7 00/13] qemu: Add quorum support to libvirt

Wednesday, 20 January 2016

Hi Peter,

I'm the current maintainer of Quorum in QEMU and I'd like to try to
answer some of your comments.

On Fri, Jan 08, 2016 at 06:20:04PM +0100, Peter Krempa wrote:

...
 So I have a few comments/observations regarding the quorum block
 driver in qemu and its usability.

 At first I'd like to ask you to describe your use case a bit
 more. I'm currently lacking the motivation to do anything about
 this, as the series is just partial and I don't really see any
 advantage of using the quorum driver at all and can't come up with
 any useful use case.

 Also a good use case is usually a good reason to drive development
 of a feature and I'm afraid that this could become abandoned without
 any real use.

The original use case for which Quorum was designed was a data center
doing redundancy with storage in multiple separate rooms shared using
NFS.

One of the issues that the customer was facing was not only problems
in the file servers themselves but, mainly, data corruption across
the network. Quorum can correct this on the fly and is able to
identify which one of the file servers is causing the problem without
having to rebuild a whole array (as would be the case with RAID).

Quorum is also used for the COLO block replication functionality
currently being discussed in QEMU:

   http://wiki.qemu.org/Features/BlockReplication

...
 1) No tracking of integrity
     As the quorum members don't have headers, failed quorum members
     are not recorded and remembered. The user or management app then
     has to do this externally for given storage devices.

 2) No internal tracking of quorum members
     Members of the quorum don't have any header marking them
     as such and thus any images may be mixed together with
     unforeseen/catastrophic results. Higher level management then
     needs to take the role of remembering which images belong
     together. Reimplementing this looks like reimplementing a
     distributed storage system to me.

That's right, Quorum does not have its own file format and was
designed to work with any driver or protocol that QEMU supports, so
I'm not sure if there's much that can be done about this.

...
 3) Lack of auto-resync:
     Once the quorum gets a few inconsistencies it does not
     automatically resync like the Linux MD driver. With the current
     implementation the only way to resync this would be to issue a
     block-mirror (blockCopy) to /dev/null so that all blocks are
     read and rewritten to the identical copy. This also requires a
     user action.

     Additionally the member of the quorum is not ignored if it was
     out of sync in any previous time without being resynced allowing
     for split-brain/corruption scenarios.

Quorum can fix errors on the fly (there's the 'rewrite-corrupted' flag
for that), so in those cases no manual intervention is required.
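
As a rough illustration of that on-the-fly repair, here is a minimal
Python sketch. It is not QEMU code: the function quorum_read, its
parameters and the list-based "children" are hypothetical stand-ins,
and the only thing taken from the discussion above is the idea that
the majority value wins and, with a rewrite-corrupted style option
enabled, the dissenting members get rewritten.

```python
from collections import Counter


def quorum_read(children, sector, threshold, rewrite_corrupted=False):
    """Read one sector from every child and return the winning value.

    children  -- list of lists, each inner list models one member image
    sector    -- index of the sector to read
    threshold -- minimum number of identical answers required
    Returns (winning value, indexes of the members that disagreed).
    """
    values = [child[sector] for child in children]
    winner, votes = Counter(values).most_common(1)[0]
    if votes < threshold:
        raise IOError("quorum not reached for sector %d" % sector)

    # Members whose answer differs from the majority are the suspects.
    dissenters = [i for i, value in enumerate(values) if value != winner]

    if rewrite_corrupted:
        # Repair the minority copies with the value that won the vote.
        for i in dissenters:
            children[i][sector] = winner

    return winner, dissenters


# Three members, the second one returns a corrupted sector 1.
members = [[b"a", b"b"], [b"a", b"X"], [b"a", b"b"]]
value, bad = quorum_read(members, 1, threshold=2, rewrite_corrupted=True)
```

In this toy model the list of dissenters is also what tells you which
file server misbehaved, without touching the other sectors.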

If we want a way to auto-resync a complete image that should be
doable, I believe it's relatively simple to implement in QEMU
(depending on the semantics).

For the manual resync I also agree that it would be good to have a
simple API to do that in case the user wants to do it manually. That
can be done.

...
 4) Necessity for at least 3 copies
     Since a majority needs to win in a vote, you need at least 3
     member disks for this to be fault-tolerant.

 5) Lack of speedup
     Since always all blocks are read from all members and verified
     the quorum backend doesn't really add any speed to the
     reads. This can be mostly attributed to the fact that fault
     tracking is not present.

     In other cases, due to internal error correcting codes it's very
     unlikely that a storage medium would return a corrupted sector
     without producing an error.

4) and 5) are part of the design of Quorum. As I said, one of the goals
is to detect (and correct) silent data corruption on the fly, not to
speed up disk access or to be space efficient.
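
The arithmetic behind point 4) can be sketched in a few lines of
Python. This assumes a plain majority vote, and the helper names
below are made up for the illustration; they do not correspond to
anything in QEMU or libvirt.

```python
def majority_threshold(members):
    """Smallest number of identical answers that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members may return bad data while a majority still agrees."""
    return members - majority_threshold(members)


# With 2 members both must agree, so no faulty member is tolerated;
# 3 members is the smallest setup that survives one faulty member.
tolerance_by_size = {n: tolerated_failures(n) for n in (2, 3, 5)}
```

Under that assumption two copies can only detect a mismatch, while
three copies are needed to decide which one is wrong.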

...
 6) Almost every remote storage technology does quorums internally
     Any distributed storage (ceph/rbd, gluster, sheepdog, etc.)
     provides the quorum functionality internally with the added
     benefit that its internal workings fix problems when a network
     split occurs.

 7) Tools are restricted to qemu and qemu-img
     It's a "proprietary" implementation so for a rebuild you have
     to use one of the two tools. AFAIK qemu-img is not really
     user friendly for the less common disk backends and we don't
     really provide any abstraction on top of that. This means
     that there really aren't any reasonable tools to do an offline
     resync. (Okay, if you know which instance is okay, you can just
     copy it ...)

Right. If this is important I can propose to write a tool for QEMU to
deal with this. It's probably a good idea anyway.

...
 This series also lacks implementation of any user/management
 warning method that a block operation didn't have 100% votes in the
 quorum voting thus it's not really possible for the users to do a
 rebuild/diagnostic if something fails.

I can't say much about this series because I haven't looked into the
code in detail yet, but I'm willing to help fix the existing problems,
add the missing features and improve the code (both in libvirt and
QEMU) if there are no other major blockers.

Thanks,

Berto
