[libvirt] [RFC]: Volume allocation progress reporting

The attached patch implements storage volume allocation progress reporting for file volumes. Currently, the volume creation process looks like: - Grab the pool lock - Fully allocated the volume - 'define' the volume (so it shows up in 'virsh vol-list', etc) - Lookup the volume object to return - Drop the pool lock The new sequence is: - Grab the pool lock - 'define' the volume (even though nothing is on disk yet) - Drop the pool lock - Allocate the volume as needed - Regrab the pool lock - Lookup the volume object to return - Drop the pool lock (again) Since the volume is 'defined', the user can fetch an object reference in a separate thread, and continually poll the 'info' command for up to date numbers. This also has the benefit of dropping the pool lock during the potentially lengthy allocation, as currently 'virsh pool-list' etc. will block while any volume is being allocated. Non file volumes maintain existing behavior. I tried to make the implementation resistant to user error: say if the pool is deactivated or deleted while the volume is being allocated. The creation process may bail out, but I couldn't produce any bad errors (crashing). There are a few other small fixes in this patch: - Refresh volume info when doing volume dumpxml - Update volume capacity when doing a refresh I've also attached an ugly python script that can test this. Presuming you have a pool named 'default', running python sparse.py --vol-create-info Will launch an allocation, print vol.info in a loop. Feedback appreciated. Thanks, Cole import libvirt import threading import time import sys import optparse poolname = "default" volname = "nonsparsetest" testvol = "testvol" uri = "qemu:///system" testvolxml = """ <volume> <name>%s</name> <capacity>1048576000</capacity> <allocation>0</allocation> <target> <format type='raw'/> </target> </volume> """ % testvol volxml = """ <volume> <name>%s</name> <capacity>1048576000</capacity> <allocation>800000000</allocation> <target> <format type='raw'/> </target> </volume> """ % volname failvol = """ <volume> <name>%s</name> <capacity>1048576000</capacity> <allocation>1048576000</allocation> <target> <format type='bochs'/> </target> </volume> """ % volname poolxml = """ <pool type='dir'> <name>default</name> <target> <path>/var/lib/libvirt/images</path> </target> </pool> """ # Helper functions def exception_wrapper(cmd, args): try: cmd(*args) except Exception, e: print str(e) def make_test_vol(): pool = get_pool() pool.createXML(testvolxml, 0) def del_vol(name=volname): pool = get_pool() vol = pool.storageVolLookupByName(name) vol.delete(0) def del_pool(): pool = get_pool() pool.destroy() pool.undefine() def define_pool(): conn = libvirt.open(uri) pool = conn.storagePoolDefineXML(poolxml, 0) pool.create(0) pool.setAutostart(True) def allocate_thread(xml=volxml): try: del_vol() except: pass pool = get_pool() print "creating vol in thread" vol = pool.createXML(xml, 0) print "creating vol complete." def info_thread(vol): for i in range(0, 40): time.sleep(.5) print vol.info() def get_pool(name=poolname): conn = libvirt.open(uri) return conn.storagePoolLookupByName(name) def get_vol(name=volname): pool = get_pool() return pool.storageVolLookupByName(name) def cmd_vol_create(xml=volxml): exception_wrapper(define_pool, ()) pool = get_pool() print pool.listVolumes() t = threading.Thread(target=allocate_thread, name="Allocating", args=(xml,)) t.start() time.sleep(5) print "\nRefreshing pool and dumping list" pool.refresh(0) print pool.listVolumes() def cmd_vol_fail(): cmd_vol_create(failvol) def cmd_vol_poll(): cmd_vol_create() vol = get_vol() t = threading.Thread(target=info_thread, name="Getting info", args=(vol,)) t.start() def main(): import optparse parser = optparse.OptionParser() parser.add_option("", "--vol-create", action="store_true", dest="vol_create", help="Create a volume nonsparse volume that should " "succeed") parser.add_option("", "--vol-create-info", action="store_true", dest="vol_create_info", help="Create a volume nonsparse volume that should " "succeed, and list info in a loop") parser.add_option("", "--vol-create-fail", action="store_true", dest="vol_fail", help="Create a volume that will fail at the allocate " "stage") options, ignore = parser.parse_args() if options.vol_create: cmd_vol_create() elif options.vol_create_info: cmd_vol_poll() elif options.vol_fail: cmd_vol_fail() else: parser.print_help() sys.exit(1) if __name__ == "__main__": main()

Cole Robinson wrote:
The attached patch implements storage volume allocation progress reporting for file volumes.
Currently, the volume creation process looks like:
- Grab the pool lock - Fully allocated the volume - 'define' the volume (so it shows up in 'virsh vol-list', etc) - Lookup the volume object to return - Drop the pool lock
The new sequence is:
- Grab the pool lock - 'define' the volume (even though nothing is on disk yet) - Drop the pool lock - Allocate the volume as needed - Regrab the pool lock - Lookup the volume object to return - Drop the pool lock (again)
Since the volume is 'defined', the user can fetch an object reference in a separate thread, and continually poll the 'info' command for up to date numbers. This also has the benefit of dropping the pool lock during the potentially lengthy allocation, as currently 'virsh pool-list' etc. will block while any volume is being allocated.
Non file volumes maintain existing behavior.
I tried to make the implementation resistant to user error: say if the pool is deactivated or deleted while the volume is being allocated. The creation process may bail out, but I couldn't produce any bad errors (crashing).
There are a few other small fixes in this patch:
- Refresh volume info when doing volume dumpxml - Update volume capacity when doing a refresh
I'm realizing now that this piece isn't correct, since the capacity updating for volumes in this case is only relevant for raw files. There seem to be some other problems in this area independent of my patch, which I'll be posting fixes for soon. Thanks, Cole

On Tue, Mar 24, 2009 at 05:20:29PM -0400, Cole Robinson wrote:
Currently, the volume creation process looks like:
- Grab the pool lock - Fully allocated the volume - 'define' the volume (so it shows up in 'virsh vol-list', etc) - Lookup the volume object to return - Drop the pool lock
The new sequence is:
- Grab the pool lock - 'define' the volume (even though nothing is on disk yet) - Drop the pool lock - Allocate the volume as needed - Regrab the pool lock - Lookup the volume object to return - Drop the pool lock (again)
Since the volume is 'defined', the user can fetch an object reference in a separate thread, and continually poll the 'info' command for up to date numbers. This also has the benefit of dropping the pool lock during the potentially lengthy allocation, as currently 'virsh pool-list' etc. will block while any volume is being allocated.
Non file volumes maintain existing behavior.
I tried to make the implementation resistant to user error: say if the pool is deactivated or deleted while the volume is being allocated. The creation process may bail out, but I couldn't produce any bad errors (crashing).
This really looks like a pretty good approach / implementation. It is an effective incremental step which doesn't require new public APIs, though I still think it'd be worth us adding an async job API to let you make the 'create' operation itself non-blocking so you could query status & also more importantly cancel jobs which take too long. The one think I think it is worth doing though, would be to get rid of the 'pool' parameter from the createVol method - I don't like that we're passing in an unlocked object there, because it leaves open possiblity of someone using it by accident in a future patch. Of course you still need to support the other pool impls which don't have this two step process which I obviously why you left 'pool' in as a parameter. I think we can avoid that if we switch the 2 ops around. Instead of define + create. Have a method used for all pools: - create(conn, pool, vol) The 'pool' is locked for entire duration of this call, adds the vol to the list for this pool. Optionally allocates the vol in underlying storage. This basically unchanged from current impl And a second optional method - build (conn, vol) This is always called with 'pool' unlocked, and thus no pool is passed in. So, for all the non-FS based pools no change is required - they just carry on implementing 'create' as per current code. For the FS based pool, you have the same split of responsbilities that your patch has. 'create' merely adds the vol to the list. 'build' does the actal allocation work. As for the question of stoping/destroying a pool while a create operation is still in progress. I think I'd be more comfortable if we explicitly blocked this ability. eg, in virStoragePoolObj struct, we could have a field 'unsigned int asyncJobs' and before calling 'build', increment this. after 'build' finishes, decrement it again. Then if a 'destroy' or 'stop' operation is done on a pool and the asyncJobs field is non-zero, then refuse to allow it, returning a message saying that it cannot be performed until all outstanding jobs have completed. You may also want to have a boolean flag on the virStorageVolDefPtr itself indicating whether it is still being created, and thus disallow the 'delete' operation.
+ if (backend->defineVol) { + virStorageVolDefPtr tmpvoldef = NULL; + if (VIR_ALLOC(tmpvoldef) < 0) { + virReportOOMError(obj->conn); + goto cleanup; + } + + // FIXME: free on error + /* Make a copy of the 'defined' volume definition, since the + * original's allocation value will change as the user polls 'info', + * but we only care about the initial requested values + */ + memcpy(tmpvoldef, voldef, sizeof(*voldef));
Unless you have the boolean flag on the vol too, preventing the delete operation, I think you'd need to do a proper deep copy here. The delete operation would free any of the 'char *' fields which you've got a reference to here. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
participants (2)
-
Cole Robinson
-
Daniel P. Berrange