On 12.04.2011 10:14, Daniel P. Berrange wrote:
On Mon, Apr 11, 2011 at 05:06:54PM -0500, Anthony Liguori wrote:
> On 04/11/2011 04:45 PM, Daniel P. Berrange wrote:
>> On Fri, Apr 08, 2011 at 02:26:48PM -0500, Anthony Liguori wrote:
>>> On 04/08/2011 11:02 AM, Stefan Hajnoczi wrote:
>>>> On Fri, Apr 8, 2011 at 2:31 PM, Daniel P. Berrange <berrange(a)redhat.com> wrote:
>>>>
>>>> I have CCed Anthony and Kevin. Anthony drove the QED image streaming
>>>> and Kevin will probably be interested in the idea of allocating raw
>>>> images as a background activity while QEMU runs.
>>>>
>>>>> /*
>>>>> * @path: fully qualified filename of the virtual disk
>>>>> * @nregions: filled in the number of @region structs
>>>>> * @regions: filled with a list of allocated regions
>>>>> *
>>>>> * Query the extents of allocated regions within the
>>>>> * virtual disk file. The offsets in the list of regions
>>>>> * are not guaranteed to be sorted in any explicit order.
>>>>> */
>>>>> int virDomainBlockGetAllocationMap(virDomainPtr dom,
>>>>>                                    const char *path,
>>>>>                                    unsigned int *nregions,
>>>>>                                    virDomainBlockRegionPtr *regions);
>>>> QEMU can provide this with its existing .bdrv_is_allocated() function.
>>>> Kevin, do you have any thoughts on whether this API will work well?
>>> I think the trouble with this API proposal is that it's overloading
>>> concepts.
>>>
>>> Sparse is not the same thing as CoW to a backing file.
>> I don't like to use the term "sparse", since that implies a specific disk
>> format (raw file with holes). Rather I use the term 'thin provisioned'
>> to refer to any disk format where not all physical sectors have
>> yet been allocated. A thin-provisioned disk can trivially be thought
>> of as a disk with a backing file whose sectors are all filled with
>> zeros.
>
> It's not so black and white today.
>
> Imagine that you had a qcow2 file, and you "streamed" it such that
> it was no longer "thin provisioned". As soon as the guest starts
> issuing trim/discards, QEMU could conceivably start defragmenting
> the image and truncating it, resulting in a sparse file.
>
> The only time the concept of "fully allocated" really makes sense is
> for a raw image on a simple file system. Once you start dealing
> with things like btrfs and deduplication, any of those useful
> guarantees are thrown out the window.
I would expect any behaviour where QEMU would defrag/truncate the
file to release host storage blocks to be configurable.

While I agree that we should have this option, it isn't true today. So
I'm afraid qemu doesn't meet your expectations.

It must be
possible for a mgmt app to ensure that a guest runs with fully
allocated storage at all times, to provide protection against
allocation failure due to overcommit.
> I think the real question is, why do you care about what physical
> sectors reside where? What problem are you trying to solve?
Err, I don't care where the physical sectors reside. The API
is providing info about the logical allocation information.
The primary motivation is the image streaming use case, in
the sector-at-a-time mode, rather than single-shot entire
image. The other example is making an image fully allocated.
There may be other use cases, hence the proposal to provide
a general purpose API instead of something that only considers
the narrow use case of image streaming, which we then have to
later replace with something more general.

But Anthony is right that the allocation status is more than just a boolean:
1. Allocated
2. Not allocated, but known to be zero (without backing file access)
3. Not allocated in the overlay, but allocated in the backing file
4. Not allocated in both the overlay and the backing file
For our problem, cases 3 and 4 are almost the same, and they are the
interesting ones: You can turn them into case 2 by setting a flag in the
overlay image, or you can fully allocate them and turn them into case 1.
Streaming is mostly about the former, while preallocation is about the
latter.
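
To make the four cases and the two transitions concrete, here is a minimal
sketch in C. All names in it (alloc_state, mark_known_zero, fully_allocate,
depends_on_backing) are hypothetical and chosen only for illustration; none of
them exist in libvirt or QEMU, and the sketch says nothing about how either
project would actually represent this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-region allocation status, numbered like the list above. */
enum alloc_state {
    ALLOC_ALLOCATED = 1, /* case 1: allocated */
    ALLOC_ZERO      = 2, /* case 2: not allocated, known zero without backing file access */
    ALLOC_BACKING   = 3, /* case 3: not allocated in the overlay, allocated in the backing file */
    ALLOC_NONE      = 4  /* case 4: not allocated in the overlay or the backing file */
};

/* Cases 3 and 4 are the ones that still need the backing file to be resolved. */
static bool depends_on_backing(enum alloc_state s)
{
    assert(s >= ALLOC_ALLOCATED && s <= ALLOC_NONE);
    return s == ALLOC_BACKING || s == ALLOC_NONE;
}

/* Setting a flag in the overlay turns cases 3 and 4 into case 2.
 * Cases 1 and 2 are left unchanged (an assumption of this sketch). */
static enum alloc_state mark_known_zero(enum alloc_state s)
{
    return depends_on_backing(s) ? ALLOC_ZERO : s;
}

/* Fully allocating turns cases 3 and 4 into case 1.
 * Cases 1 and 2 are left unchanged (an assumption of this sketch). */
static enum alloc_state fully_allocate(enum alloc_state s)
{
    return depends_on_backing(s) ? ALLOC_ALLOCATED : s;
}
```

With this model, a plain "allocated or not" boolean cannot tell case 2 apart
from cases 3 and 4, which is the information a caller would need in order to
pick between the two transitions.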
So I think what Anthony wants to know is how this maps to your API.
Kevin