On 11/14/2011 04:16 AM, Daniel P. Berrange wrote:
> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
>>>> Live migration with qcow2 or any other image format is just not going
>>>> to work right now even with proper clustered storage. I think doing a
>>>> block level flush cache interface and letting block devices decide how
>>>> to do it is the best approach.
>>>
>>> I would really prefer reusing the existing open/close code. It means
>>> less (duplicated) code, is existing code that is well tested and doesn't
>>> make migration much of a special case.
>>>
>>> If you want to avoid reopening the file on the OS level, we can reopen
>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>> and in 1.1 we can use bdrv_reopen().
>>>
>>
>> Intuitively I dislike _reopen style interfaces. If the second open
>> yields different results from the first, does it invalidate any
>> computations in between?
>>
>> What's wrong with just delaying the open?
>
> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> the ability to roll back to the source host upon open failure for most
> deployed versions of libvirt. We only fairly recently switched to a
> five-stage migration handshake to cope with rollback when 'cont' fails.

Delayed open isn't a panacea. With the series I sent, we should be able to
migrate with a qcow2 file on coherent shared storage.
There are two other cases that we care about: migration with NFS cache!=none,
and direct-attached storage with cache!=none.
With NFS, what matters is less whether the open is deferred than whether the
open happens after the close on the source. To fix NFS cache!=none, we would
have to do a bdrv_close() before sending the last byte of migration data and
make sure that we bdrv_open() after receiving the last byte of migration data.
The problem with this, IMHO, is that it creates a large window where no one has
the file open, and you're critically vulnerable to losing your VM.
I'm much more in favor of a smarter caching policy. If we can fcntl() our way
to O_DIRECT on NFS, that would be fairly interesting. I'm not sure whether this
is supported today, but it's something we could look into adding in the kernel.
That way we could force NFS to O_DIRECT during migration, which would solve this
problem robustly.
Deferred open doesn't help with direct-attached storage. There simply is no
guarantee that there isn't data in the page cache.
Again, I think defaulting DAS to cache=none|directsync is what makes the most
sense here.
We can even add a migration blocker for DAS with cache=on. If we can do dynamic
toggling of the cache setting, then that's pretty friendly at the end of the day.
Regards,
Anthony Liguori