"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
Since the previous discussions didn't really end up anywhere conclusive
I decided it would be better to have a crack at getting some working code
to illustrate my ideas. Thus, the following series of 7 patches provide
Hi Dan,
Impressive work!
It's taken me a while just to digest all of this.
...
Some open questions
- Is it worth bothering with a UUID for individual storage volumes? It
is possible to construct a globally unique identifier for most volumes,
by combining various bits of metadata we have such as device unique ID,
inode, iSCSI target & LUN, etc. There isn't really any such identifier
that fits into the classic libvirt 16 byte UUID. I've implemented (randomly
generated) UUIDs for the virStorageVolPtr object, but I'm inclined
to remove them, since it's not much use if they change each time the
libvirtd daemon is restarted.
The 'name' field provides a unique identifier scoped to the storage
pool. I think we could add a 'char *key' field, as an arbitrary opaque
string, forming a globally unique identifier for the volume. This
would serve the same purpose as a UUID, but without the 16 byte constraint,
which we can't usefully provide.
That sounds good. And with it being opaque, no one will
be tempted (or able) to rely on it.
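Just to make the idea concrete: for a file-based volume such a key could be
derived from stat() metadata, which stays stable across libvirtd restarts.
A rough sketch, purely illustrative -- the helper name and key layout below
are invented here, not something from your patches:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/stat.h>

    /* Illustrative only: derive an opaque, globally unique key for a
     * file-based volume from device number + inode, which survive a
     * libvirtd restart.  Other backends would combine whatever stable
     * metadata they have (iSCSI target & LUN, LVM UUIDs, ...). */
    static char *
    example_build_volume_key(const char *path)
    {
        struct stat sb;
        char *key = NULL;

        if (stat(path, &sb) < 0)
            return NULL;

        if (asprintf(&key, "file:%llx:%llx",
                     (unsigned long long)sb.st_dev,
                     (unsigned long long)sb.st_ino) < 0)
            return NULL;

        return key;  /* caller frees */
    }

The exact layout wouldn't matter, precisely because clients treat it as opaque.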
- For the local directory backend, I've got the ability to choose
between file formats on a per-volume basis. eg, /var/lib/xen/images can
contain a mix of raw, qcow, vmdk, etc. files. This is nice & flexible
for the user, but a more complex thing to implement, since it means
we have to probe each volume and try & figure out its format each
time we list volumes. If we constrained the choice between formats
to be at the pool level instead of the volume level we could avoid
probing & thus simplify the code. This is what XenAPI does.
Have there been requests for this feature?
The probe-and-recognize part doesn't sound too hard, but if
a large majority of use cases have homogeneous volumes-per-pool,
then for starters at least, maybe we can avoid the complexity.
A possible compromise (albeit ugly), _if_ we can dictate naming policy:
let part of a volume name (suffix, substring, component, whatever)
tell libvirtd its type. As I said, ugly, and hence probably not
worth considering, but I had to say it :-)
- If creating non-sparse files, it can take a very long time to do the
I/O to fill in the image. In virt-install/virt-manager we have nice
incremental progress display. The API I've got currently, though, is
blocking. This blocks the calling application. It also blocks the
entire libvirtd daemon since we are single process. There are a couple
of ways we can address this:
1. Allow the libvirtd daemon to serve each client connection in
a separate thread. We'd need to add some mutex locking to the
QEMU driver and the storage driver to handle this. It would have
been nice to avoid threads, but I don't think we can avoid them much longer.
2. For long running operations, spawn off a worker thread (or
process) to perform the operation. Don't send a reply to the RPC
message, instead just put the client on a 'wait queue', and get
on with serving other clients. When the worker thread completes,
send the RPC reply to the original client.
3. Have the virStorageVolCreate() method return immediately,
giving back the client app some kind of 'job id'. The client app
can poll on another API, virStorageVolJob(), to determine
how far the task has got. The implementation in the
driver would have to spawn a worker thread to do the actual
long operation.
I like the idea of spawning off a thread for a very precise
and limited-scope task.
On first reading, I preferred your #2 worker-thread-based solution.
Then, client apps simply wait -- i.e., don't have to poll.
But we'd still need another interface for progress feedback, so #3
starts to look better: client progress feedback might come almost
for free, while polling to check for completion.
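To picture #3 from the client side, the loop might look roughly like this.
Everything about the signatures here is guessed for illustration -- the int
job handle, the 0-100 progress convention, the update_progress_bar() callback
and the virStoragePoolPtr argument -- since the patches don't define them yet:

    #include <unistd.h>

    /* Hypothetical client-side use of option 3; all signatures and
     * conventions below are assumptions made for this sketch only. */
    static int
    example_create_with_progress(virStoragePoolPtr pool, const char *xml)
    {
        int job = virStorageVolCreate(pool, xml, 0);
        int progress;

        if (job < 0)
            return -1;

        while ((progress = virStorageVolJob(pool, job)) >= 0 &&
               progress < 100) {
            update_progress_bar(progress);  /* e.g. a virt-manager UI hook */
            sleep(1);
        }

        return progress < 0 ? -1 : 0;
    }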
Possibly we can allow creation to be async or blocking by
making use of the 'flags' field to the virStorageVolCreate() method,
e.g. VIR_STORAGE_ASYNC. If we make async behaviour optional, we
still need some work in the daemon to avoid blocking all clients.
This problem will also impact us when we add cloning of existing
volumes. It already sort of hits us when saving & restoring VMs
if they have large amounts of memory. So perhaps we need to go
for the general solution of making the daemon threaded per client
connection. The ASYNC flag may still be useful anyway to get the
incremental progress feedback in the UI.
Could we just treat that as another type of task to hand out to
a worker thread?
Otherwise, this (#1) sounds a lot more invasive, but that's just my
relatively uninformed impression.
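In case it helps, the worker-thread hand-off I have in mind is nothing
fancier than a queue plus a condition variable. Every name in this sketch
is invented; it is only meant to show the shape of it:

    #include <pthread.h>
    #include <stddef.h>

    /* Rough sketch of handing long-running work to a worker thread:
     * the RPC dispatcher queues the slow operation together with
     * whatever is needed to send the deferred reply, then goes back
     * to its poll loop.  All names are invented for illustration. */
    struct example_task {
        void (*run)(void *opaque);     /* the slow work, e.g. filling   */
        void (*reply)(void *opaque);   /* a non-sparse image, then the  */
        void *opaque;                  /* deferred RPC reply            */
        struct example_task *next;
    };

    static struct example_task *task_queue;
    static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t task_cond = PTHREAD_COND_INITIALIZER;

    /* Called from the dispatcher instead of doing the work inline. */
    static void
    example_queue_task(struct example_task *task)
    {
        pthread_mutex_lock(&task_lock);
        task->next = task_queue;
        task_queue = task;
        pthread_cond_signal(&task_cond);
        pthread_mutex_unlock(&task_lock);
    }

    static void *
    example_worker(void *unused)
    {
        (void)unused;
        for (;;) {
            struct example_task *task;

            pthread_mutex_lock(&task_lock);
            while (!task_queue)
                pthread_cond_wait(&task_cond, &task_lock);
            task = task_queue;
            task_queue = task->next;
            pthread_mutex_unlock(&task_lock);

            task->run(task->opaque);
            task->reply(task->opaque);
        }
        return NULL;
    }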