On Wed, Feb 10, 2016 at 17:01:22 +0200, Alberto Garcia wrote:
On Tue, Jan 26, 2016 at 05:36:36PM +0100, Peter Krempa wrote:
[...]
> Whoa. Data corruption across the network? I'm not quite sure whether
> I'd use this to cover up a problem with the storage technology or
> network rather than just fix the root cause. If you have 3 copies,
> and manage to have a sector where all 3 differ then the quorum
> driver won't help. And it will make it even harder to find any
> possible problems.
But in that case you detect that it went wrong and you get an I/O
error. The problem with silent data corruption is that it can be hard
to detect.
Yes, and that's why it should be fixed at the network storage technology
layer rather than anywhere else.
If there's a bit-flip across the network Quorum can detect it,
report it and correct the faulty version without needing to rebuild
everything.
I still think that you do want to rebuild the whole volume in such a
case if you care about your data in the slightest. Otherwise you don't
have to do stuff like this.
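
For reference, the voting behaviour being argued about above (a single
differing copy is outvoted and can be rewritten; three differing copies
yield an I/O error) can be modelled roughly as below. This is only an
illustrative sketch in Python, not the QEMU quorum driver; the names
quorum_read and QuorumError and the threshold parameter are hypothetical.

```python
from collections import Counter


class QuorumError(IOError):
    """Raised when no version of the sector reaches the vote threshold."""


def quorum_read(copies, threshold=2):
    """Majority vote over replicas of one sector (hypothetical helper).

    copies: list of bytes objects, one per replica.
    Returns (winning_data, indexes_of_copies_that_differ_from_it).
    """
    counts = Counter(copies)
    winner, votes = counts.most_common(1)[0]
    if votes < threshold:
        # e.g. three copies that all differ: detected, but not correctable
        raise QuorumError("quorum failure: no version reached the threshold")
    # Copies that lost the vote are candidates for being rewritten.
    faulty = [i for i, data in enumerate(copies) if data != winner]
    return winner, faulty
```

With three replicas and a threshold of 2, a bit-flip in one copy is
reported as a single faulty index, while three mutually different copies
raise the error instead of returning data.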
[...]
> > Quorum is also used for the COLO block replication functionality
> > currently being discussed in QEMU:
> >
> > http://wiki.qemu.org/Features/BlockReplication
[...]
Yes, I just wanted to point out one other example of how Quorum is
being used. This current series of Quorum for libvirt is not taking
COLO into account at all; in fact COLO is still under review in QEMU.
Yes, and there are apparently some design issues/major problems. If they
are going to use this, we should probably wait for the result of that
discussion.
[...]
> 1) Apart from abusing quorums in fifo mode for COLO I still don't
> think that they are hugely useful. (no, data corruption on NFS
> didn't persuade me)
It is one of the main reasons why Quorum was written. Here's one more
example of silent data corruption over the network:
https://cds.cern.ch/record/2026187/files/Adler32_Data_Corruption.pdf
That underlines the fact that the network storage protocol does a
terrible job in this case. Additionally, in the described case the
highlighted advantage of adler32 is speed. Patching that by triplicating
the data, and the bandwidth necessary to transfer it, does not fit with
that very well. Additionally, the regular use case still remains
broken.
> 2) The implementation in this series in its current state adds a lot
> of code to maintain that wouldn't be used much and is incomplete in
> many aspects:
[...]
> * no support for the quorum failure events and reporting
> * no way to control 'rewrite-corrupted'
I can look into these.
> * since we don't use node-names yet, it's not really possible to do
> block jobs on quorum disks, thus they are forbidden
I'm not sure what the status of node names in libvirt is; I could also
try to help make it happen.
They are basically non-existent. To be honest, I think that node name
support, a better approach to constructing block devices and their
backing chains, and better handling of block jobs should be done
prior to quorum.
This series partially implements some of what is planned for how disks
will be handled. One such plan is that the backing chain of a disk is
persisted in the XML and then fully constructed.
Adding this code would make the refactor even more painful than it is
already going to be.
I'm actually planning to do this in the near future, but unfortunately
this is not a weekend project.
> * since block jobs are forbidden and rewrite-corrupted can't be
> enabled, no way to do the rebuild
'rewrite-corrupted' can be easily added to the series so I don't
think that's a problem. The block jobs thing I would need to see
first. Would you really need to have node names in order to rebuild a
Quorum?
Most probably yes. Without them, it will be just an ugly hack.
Peter