[libvirt] [PATCH 2/2] Ignore backing file errors in FS storage pool (v3)

Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it are not. Since the storage pool then can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is impossible.

To "fix" this case, all errors detected during storage pool activation are now (silently) ignored. Errors are still logged by the called functions, which have more knowledge on the detailed error condition.

To reproduce:
  dir=$(mktemp -d)
  virsh pool-create-as tmp dir '' '' '' '' "$dir"
  virsh vol-create-as --format qcow2 tmp back 1G
  virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
  virsh vol-delete --pool tmp back
  virsh pool-refresh tmp

After the last step, the pool will be gone (because it was not persistent). As long as the now broken image stays in the directory, you will not be able to re-create or re-start the pool.

Signed-off-by: Philipp Hahn <hahn@univention.de>
---
 src/storage/storage_backend_fs.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

On 02/24/2011 12:26 AM, Philipp Hahn wrote: If you use '[PATCHv3] subject' instead of '[PATCH] subject (v3)', then 'git am' won't include the (v3) as part of the commit message (a good thing, since once the patch is upstream, no one cares any longer if it took three revisions to be ready for upstream).
Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it are not. Since the storage pool then can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is impossible.
To "fix" this case, all errors detected during storage pool activation are now (silently) ignored. Errors are still logged by the called functions, which have more knowledge on the detailed error condition.
diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index ff39d48..0cf4e54 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -647,11 +647,11 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
                                                   &vol->allocation,
                                                   &vol->capacity,
                                                   &vol->target.encryption)) < 0) {
-            if (ret == -1)
-                goto cleanup;
-            else {
-                /* Silently ignore non-regular files,
-                 * eg '.' '..', 'lost+found', dangling symbolic link */
+            if (ret < 0) {
+                /* Silently ignore all errors,
+                 * eg '.' '..', 'lost+found', dangling symbolic link,
+                 * backend file unavailable, file unreadable.
+                 * A detailed error report has already been logged. */
                 virStorageVolDefFree(vol);
                 vol = NULL;
                 continue;
Hmm, is this fix really right? Or should we be fixing virStorageBackendProbeTarget to handle missing backing files in the same way that it handles dangling symlinks (by returning -2), while still warning the user about real errors (when returning -1)?

I think that it makes sense to create a directory pool no matter how botched the contents of the directory are, but would like a second opinion from someone more familiar with the storage pool code before committing this (danpb is probably most qualified).

--
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

On Thu, Feb 24, 2011 at 02:35:58PM -0700, Eric Blake wrote:
On 02/24/2011 12:26 AM, Philipp Hahn wrote:
If you use '[PATCHv3] subject' instead of '[PATCH] subject (v3)', then 'git am' won't include the (v3) as part of the commit message (a good thing, since once the patch is upstream, no one cares any longer if it took three revisions to be ready for upstream).
Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it are not. Since the storage pool then can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is impossible.
To "fix" this case, all errors detected during storage pool activation are now (silently) ignored. Errors are still logged by the called functions, which have more knowledge on the detailed error condition.
diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index ff39d48..0cf4e54 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -647,11 +647,11 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
                                                   &vol->allocation,
                                                   &vol->capacity,
                                                   &vol->target.encryption)) < 0) {
-            if (ret == -1)
-                goto cleanup;
-            else {
-                /* Silently ignore non-regular files,
-                 * eg '.' '..', 'lost+found', dangling symbolic link */
+            if (ret < 0) {
+                /* Silently ignore all errors,
+                 * eg '.' '..', 'lost+found', dangling symbolic link,
+                 * backend file unavailable, file unreadable.
+                 * A detailed error report has already been logged. */
                 virStorageVolDefFree(vol);
                 vol = NULL;
                 continue;
Hmm, is this fix really right? Or should we be fixing virStorageBackendProbeTarget to handle missing backing files in the same way that it handles dangling symlinks (by returning -2), while still warning the user about real errors (when returning -1)?
By putting the check here, if a backing store can't be probed, we skip the entire volume. I think what we really want is to keep the volume itself, but skip just the backing store. This would mean we need to put the check in the virStorageBackendProbeTarget() method itself, probably in the bit of code which does

    if (meta.backingStore) {
        *backingStore = meta.backingStore;
        meta.backingStore = NULL;
        if (meta.backingStoreFormat == VIR_STORAGE_FILE_AUTO) {
            if ((*backingStoreFormat
                 = virStorageFileProbeFormat(*backingStore)) < 0) {
                VIR_FORCE_CLOSE(fd);
                goto cleanup;
            }
        } else {
            ...

if that virStorageFileProbeFormat() call returns -1, we can just log a warning, and set *backingStoreFormat = VIR_STORAGE_FILE_RAW
I think that it makes sense to create a directory pool no matter how botched the contents of the directory are, but would like a second opinion from someone more familiar with the storage pool code before committing this (danpb is probably most qualified).
Yep, totally agreed - this is why Philipp proposed this patch after we previously discussed the issue.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it remain. Since the storage pool can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is not possible.

Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file, if the backing file format is not explicitly specified. If the backing file is created using
     kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
   no error is detected at this stage. The new return code -3 signals that the backing file could not be opened.
2. The backingStore.format must be >= 0, since values < 0 would break virStorageVolTargetDefFormat() when dumping the XML data such as
     <format type='...'/>
   Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file and thus always detects a missing backing file. Since it "only" updates the capacity, allocation, owner, group, mode and SELinux label, just ignore errors at this stage, print an error message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at least vol-destroy and pool-refresh do work now.

To reproduce:
  dir=$(mktemp -d)
  virsh pool-create-as tmp dir '' '' '' '' "$dir"
  virsh vol-create-as --format qcow2 tmp back 1G
  virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
  virsh vol-delete --pool tmp back
  virsh pool-refresh tmp

After the last step, the pool will be gone (because it was not persistent). As long as the now broken image stays in the directory, you will not be able to re-create or re-start the pool.
Signed-off-by: Philipp Hahn <hahn@univention.de>
---
 src/storage/storage_backend_fs.c |   44 ++++++++++++++++++++++++++++---------
 1 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index ff39d48..a77c927 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -97,16 +97,25 @@ virStorageBackendProbeTarget(virStorageVolTargetPtr target,
         *backingStore = meta.backingStore;
         meta.backingStore = NULL;
         if (meta.backingStoreFormat == VIR_STORAGE_FILE_AUTO) {
-            if ((*backingStoreFormat
-                 = virStorageFileProbeFormat(*backingStore)) < 0) {
-                VIR_FORCE_CLOSE(fd);
-                goto cleanup;
+            if ((ret = virStorageFileProbeFormat(*backingStore)) < 0) {
+                /* If the backing file is currently unavailable, only log an error,
+                 * but continue. Returning -1 here would disable the whole storage
+                 * pool, making it unavailable for even maintenance. */
+                virStorageReportError(VIR_ERR_INTERNAL_ERROR,
+                                      _("cannot probe backing volume format: %s"),
+                                      *backingStore);
+                ret = -3;
+            } else {
+                *backingStoreFormat = ret;
+                ret = 0;
             }
         } else {
             *backingStoreFormat = meta.backingStoreFormat;
+            ret = 0;
         }
     } else {
         VIR_FREE(meta.backingStore);
+        ret = 0;
     }

     if (capacity && meta.capacity)
@@ -134,7 +143,7 @@ virStorageBackendProbeTarget(virStorageVolTargetPtr target,
          */
     }

-    return 0;
+    return ret;

 cleanup:
     VIR_FREE(*backingStore);
@@ -647,15 +656,21 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
                                                   &vol->allocation,
                                                   &vol->capacity,
                                                   &vol->target.encryption)) < 0) {
-            if (ret == -1)
-                goto cleanup;
-            else {
+            if (ret == -2) {
                 /* Silently ignore non-regular files,
                  * eg '.' '..', 'lost+found', dangling symbolic link */
                 virStorageVolDefFree(vol);
                 vol = NULL;
                 continue;
-            }
+            } else if (ret == -3) {
+                /* The backing file is currently unavailable, its format is not
+                 * explicitly specified, the probe to auto detect the format
+                 * failed: continue with faked RAW format, since AUTO will
+                 * break virStorageVolTargetDefFormat() generating the line
+                 * <format type='...'/>. */
+                backingStoreFormat = VIR_STORAGE_FILE_RAW;
+            } else
+                goto cleanup;
         }

         if (backingStore != NULL) {
@@ -665,8 +680,15 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
             if (virStorageBackendUpdateVolTargetInfo(&vol->backingStore,
                                                      NULL,
                                                      NULL) < 0) {
-                VIR_FREE(vol->backingStore.path);
-                goto cleanup;
+                /* The backing file is currently unavailable, the capacity,
+                 * allocation, owner, group and mode are unknown. Just log the
+                 * error and continue.
+                 * Unfortunately virStorageBackendProbeTarget() might already
+                 * have logged a similar message for the same problem, but only
+                 * if AUTO format detection was used. */
+                virStorageReportError(VIR_ERR_INTERNAL_ERROR,
+                                      _("cannot probe backing volume info: %s"),
+                                      vol->backingStore);
             }
         }
--
1.7.1

Hello Daniel, hello Eric,

Ping?

On Tuesday 01 March 2011 16:48:20, Philipp Hahn wrote:
Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it remain. Since the storage pool can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is not possible.
Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file, if the backing file format is not explicitly specified. If the backing file is created using
     kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
   no error is detected at this stage. The new return code -3 signals that the backing file could not be opened.
2. The backingStore.format must be >= 0, since values < 0 would break virStorageVolTargetDefFormat() when dumping the XML data such as
     <format type='...'/>
   Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file and thus always detects a missing backing file. Since it "only" updates the capacity, allocation, owner, group, mode and SELinux label, just ignore errors at this stage, print an error message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at least vol-destroy and pool-refresh do work now.
Any news on the review front for this issue? I know the patch might not be the perfect solution, but without any comment I don't know how to proceed or which part I should improve.

Sincerely
Philipp Hahn
--
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                   fax: +49 421 22 232-99
                       http://www.univention.de/

On 03/01/2011 08:48 AM, Philipp Hahn wrote:
Currently a single storage volume with a broken backing file will disable the whole storage pool. This can happen when the backing file is on some unavailable network storage or if the backing volume is deleted, while the storage volumes using it remain. Since the storage pool can not be re-activated, re-creating the missing or deleting the now useless volumes using libvirt only is not possible.
Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file, if the backing file format is not explicitly specified. If the backing file is created using
     kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
   no error is detected at this stage. The new return code -3 signals that the backing file could not be opened.
2. The backingStore.format must be >= 0, since values < 0 would break virStorageVolTargetDefFormat() when dumping the XML data such as
     <format type='...'/>
   Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file and thus always detects a missing backing file. Since it "only" updates the capacity, allocation, owner, group, mode and SELinux label, just ignore errors at this stage, print an error message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at leas
s/leas/least/
vol-destroy and pool-refresh do work now.
Thanks for the ping, and yes, this version looks like it fixes the feedback from earlier review. Sorry for the delayed review.
To reproduce:
  dir=$(mktemp -d)
  virsh pool-create-as tmp dir '' '' '' '' "$dir"
  virsh vol-create-as --format qcow2 tmp back 1G
  virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
  virsh vol-delete --pool tmp back
  virsh pool-refresh tmp

After the last step, the pool will be gone (because it was not persistent). As long as the now broken image stays in the directory, you will not be able to re-create or re-start the pool.
Even more awesome when you do this :)
@@ -665,8 +680,15 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
             if (virStorageBackendUpdateVolTargetInfo(&vol->backingStore,
                                                      NULL,
                                                      NULL) < 0) {
-                VIR_FREE(vol->backingStore.path);
-                goto cleanup;
+                /* The backing file is currently unavailable, the capacity,
+                 * allocation, owner, group and mode are unknown. Just log the
+                 * error and continue.
+                 * Unfortunately virStorageBackendProbeTarget() might already
+                 * have logged a similar message for the same problem, but only
+                 * if AUTO format detection was used. */
+                virStorageReportError(VIR_ERR_INTERNAL_ERROR,
+                                      _("cannot probe backing volume info: %s"),
+                                      vol->backingStore);
Oops:

  CC     libvirt_driver_storage_la-storage_backend_fs.lo
cc1: warnings being treated as errors
storage/storage_backend_fs.c: In function 'virStorageBackendFileSystemRefresh':
storage/storage_backend_fs.c:688:17: error: format '%s' expects type 'char *', but argument 8 has type 'virStorageVolTarget' [-Wformat]

ACK with that fixed, so I squashed this and pushed:

diff --git i/src/storage/storage_backend_fs.c w/src/storage/storage_backend_fs.c
index e629219..0a6b074 100644
--- i/src/storage/storage_backend_fs.c
+++ w/src/storage/storage_backend_fs.c
@@ -1,7 +1,7 @@
 /*
  * storage_backend_fs.c: storage backend for FS and directory handling
  *
- * Copyright (C) 2007-2010 Red Hat, Inc.
+ * Copyright (C) 2007-2011 Red Hat, Inc.
  * Copyright (C) 2007-2008 Daniel P. Berrange
  *
  * This library is free software; you can redistribute it and/or
@@ -687,7 +687,7 @@ virStorageBackendFileSystemRefresh(virConnectPtr conn ATTRIBUTE_UNUSED,
                  * if AUTO format detection was used. */
                 virStorageReportError(VIR_ERR_INTERNAL_ERROR,
                                       _("cannot probe backing volume info: %s"),
-                                      vol->backingStore);
+                                      vol->backingStore.path);
             }
         }

--
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
participants (3)
- Daniel P. Berrange
- Eric Blake
- Philipp Hahn