[PATCH 0/3] Implement support for QCOW2 data files

There are use cases where existing disks (e.g. LVM volumes) need to be usable together with advanced features. For this purpose QEMU provides the data-file feature for qcow2 files: metadata is kept in the qcow2 file as usual, but guest data is written to an external file. These patches enable support for this feature in libvirt.

Nikolai Barybin (3):
  conf: add data-file feature and related fields to virStorageSource
  storage: add qcow2 filename parsing from header
  qemu: enable qcow2 data-file attach to VM on start

 src/conf/storage_source_conf.c        |  8 ++++
 src/conf/storage_source_conf.h        |  5 ++
 src/qemu/qemu_block.c                 | 45 ++++++++++++++++++
 src/qemu/qemu_cgroup.c                |  3 ++
 src/qemu/qemu_namespace.c             |  6 +++
 src/storage_file/storage_file_probe.c | 66 +++++++++++++++++++++++----
 src/storage_file/storage_source.c     | 38 ++++++++++++++-
 7 files changed, 160 insertions(+), 11 deletions(-)

-- 
2.43.5

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
---
 src/conf/storage_source_conf.c | 8 ++++++++
 src/conf/storage_source_conf.h | 5 +++++
 2 files changed, 13 insertions(+)

diff --git a/src/conf/storage_source_conf.c b/src/conf/storage_source_conf.c
index 908bc5fab2..d76de058b9 100644
--- a/src/conf/storage_source_conf.c
+++ b/src/conf/storage_source_conf.c
@@ -69,6 +69,7 @@ VIR_ENUM_IMPL(virStorageFileFeature,
               VIR_STORAGE_FILE_FEATURE_LAST,
               "lazy_refcounts",
               "extended_l2",
+              "data_file",
 );
 
 
@@ -826,6 +827,7 @@ virStorageSourceCopy(const virStorageSource *src,
     def->relPath = g_strdup(src->relPath);
     def->backingStoreRaw = g_strdup(src->backingStoreRaw);
     def->backingStoreRawFormat = src->backingStoreRawFormat;
+    def->dataFileRaw = g_strdup(src->dataFileRaw);
     def->snapshot = g_strdup(src->snapshot);
     def->configFile = g_strdup(src->configFile);
     def->nodenameformat = g_strdup(src->nodenameformat);
@@ -891,6 +893,12 @@ virStorageSourceCopy(const virStorageSource *src,
             return NULL;
     }
 
+    if (src->dataFileStore) {
+        if (!(def->dataFileStore = virStorageSourceCopy(src->dataFileStore,
+                                                        false)))
+            return NULL;
+    }
+
     if (src->fdtuple)
         def->fdtuple = g_object_ref(src->fdtuple);
 
diff --git a/src/conf/storage_source_conf.h b/src/conf/storage_source_conf.h
index 05b4bda16c..fa27e61204 100644
--- a/src/conf/storage_source_conf.h
+++ b/src/conf/storage_source_conf.h
@@ -88,6 +88,7 @@ VIR_ENUM_DECL(virStorageFileFormat);
 typedef enum {
     VIR_STORAGE_FILE_FEATURE_LAZY_REFCOUNTS = 0,
     VIR_STORAGE_FILE_FEATURE_EXTENDED_L2,
+    VIR_STORAGE_FILE_FEATURE_DATA_FILE,
 
     VIR_STORAGE_FILE_FEATURE_LAST
 } virStorageFileFeature;
@@ -359,6 +360,9 @@ struct _virStorageSource {
     /* backing chain of the storage source */
     virStorageSource *backingStore;
 
+    /* qcow2 data file source */
+    virStorageSource *dataFileStore;
+
     /* metadata for storage driver access to remote and local volumes */
     void *drv;
 
@@ -369,6 +373,7 @@ struct _virStorageSource {
     /* Name of the child backing store recorded in metadata of the
      * current file. */
     char *backingStoreRaw;
+    char *dataFileRaw;
     virStorageFileFormat backingStoreRawFormat;
 
     /* metadata that allows identifying given storage source */
-- 
2.43.5

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
---
 src/storage_file/storage_file_probe.c | 66 +++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/src/storage_file/storage_file_probe.c b/src/storage_file/storage_file_probe.c
index 243927d50a..c7ff743c87 100644
--- a/src/storage_file/storage_file_probe.c
+++ b/src/storage_file/storage_file_probe.c
@@ -93,8 +93,9 @@ struct FileTypeInfo {
                     size_t buf_size);
     int (*getBackingStore)(char **res, int *format,
                            const char *buf, size_t buf_size);
+    int (*getDataFile)(char **res, const char *buf, int format, size_t buf_size);
     int (*getFeatures)(virBitmap **features, int format,
-                       char *buf, ssize_t len);
+                       const char *buf, ssize_t len);
 };
 
 
@@ -105,8 +106,9 @@
 qcow2GetClusterSize(const char *buf, size_t buf_size);
 static int
 qcowXGetBackingStore(char **, int *, const char *, size_t);
+static int
+qcowXGetDataFile(char **, const char *, int, size_t);
 static int
 qcow2GetFeatures(virBitmap **features, int format,
-                 char *buf, ssize_t len);
+                 const char *buf, ssize_t len);
 static int
 vmdk4GetBackingStore(char **, int *, const char *, size_t);
 static int
@@ -126,6 +128,7 @@ qedGetBackingStore(char **, int *, const char *, size_t);
 
 #define QCOW2_HDR_EXTENSION_END 0
 #define QCOW2_HDR_EXTENSION_BACKING_FORMAT 0xE2792ACA
+#define QCOW2_HDR_EXTENSION_DATA_FILE_NAME 0x44415441
 
 #define QCOW2v3_HDR_FEATURES_INCOMPATIBLE (QCOW2_HDR_TOTAL_SIZE)
 #define QCOW2v3_HDR_FEATURES_COMPATIBLE (QCOW2v3_HDR_FEATURES_INCOMPATIBLE+8)
@@ -313,6 +316,7 @@ static struct FileTypeInfo const fileTypeInfo[] = {
         qcow2EncryptionInfo,
         qcow2GetClusterSize,
         qcowXGetBackingStore,
+        qcowXGetDataFile,
         qcow2GetFeatures
     },
     [VIR_STORAGE_FILE_QED] = {
@@ -359,7 +363,7 @@ enum qcow2IncompatibleFeature {
 static const virStorageFileFeature qcow2IncompatibleFeatureArray[] = {
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_DIRTY */
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_CORRUPT */
-    VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_DATA_FILE */
+    VIR_STORAGE_FILE_FEATURE_DATA_FILE, /* QCOW2_INCOMPATIBLE_FEATURE_DATA_FILE */
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_COMPRESSION */
     VIR_STORAGE_FILE_FEATURE_EXTENDED_L2, /* QCOW2_INCOMPATIBLE_FEATURE_EXTL2 */
 };
@@ -391,7 +395,7 @@ cowGetBackingStore(char **res,
 static int
 qcow2GetExtensions(const char *buf,
                    size_t buf_size,
-                   int *backingFormat)
+                   void *data)
 {
     size_t offset;
     size_t extension_start;
@@ -467,7 +471,7 @@ qcow2GetExtensions(const char *buf,
         switch (magic) {
         case QCOW2_HDR_EXTENSION_BACKING_FORMAT: {
             g_autofree char *tmp = NULL;
-            if (!backingFormat)
+            if (!data)
                 break;
 
             tmp = g_new0(char, len + 1);
@@ -480,9 +484,20 @@ qcow2GetExtensions(const char *buf,
              * doesn't look like a format driver name to be a protocol driver
              * directly and thus the image is in fact still considered raw
              */
-            *backingFormat = virStorageFileFormatTypeFromString(tmp);
-            if (*backingFormat <= VIR_STORAGE_FILE_NONE)
-                *backingFormat = VIR_STORAGE_FILE_RAW;
+            *(int *)data = virStorageFileFormatTypeFromString(tmp);
+            if (*(int *)data <= VIR_STORAGE_FILE_NONE)
+                *(int *)data = VIR_STORAGE_FILE_RAW;
+            break;
+        }
+
+        case QCOW2_HDR_EXTENSION_DATA_FILE_NAME: {
+            g_autofree char *tmp = NULL;
+            if (!data)
+                break;
+
+            *(char **)data = g_new0(char, len + 1);
+            memcpy(*(char **)data, buf + offset, len);
+            (*((char **)data))[len] = '\0';
             break;
         }
 
@@ -559,6 +574,33 @@ qcowXGetBackingStore(char **res,
 }
 
 
+static int
+qcowXGetDataFile(char **res,
+                 const char *buf,
+                 int format,
+                 size_t buf_size)
+{
+    virBitmap *features = NULL;
+    char *filename = NULL;
+    size_t namelen = 0;
+    *res = NULL;
+
+    if (buf_size < QCOW2v3_HDR_FEATURES_INCOMPATIBLE + 8)
+        return 0;
+
+    ignore_value(qcow2GetFeatures(&features, format, buf, buf_size));
+    if (features && virBitmapIsBitSet(features, VIR_STORAGE_FILE_FEATURE_DATA_FILE)) {
+        if (qcow2GetExtensions(buf, buf_size, &filename) < 0)
+            return 0;
+
+        namelen = strlen(filename) + 1;
+        *res = g_new0(char, namelen);
+        memcpy(*res, filename, namelen);
+    }
+
+    return 0;
+}
+
 static int
 vmdk4GetBackingStore(char **res,
                      int *format,
@@ -789,7 +831,7 @@ qcow2GetFeaturesProcessGroup(uint64_t bits,
 static int
 qcow2GetFeatures(virBitmap **features,
                  int format,
-                 char *buf,
+                 const char *buf,
                  ssize_t len)
 {
     int version = -1;
@@ -961,6 +1003,12 @@ virStorageFileProbeGetMetadata(virStorageSource *meta,
         meta->backingStoreRawFormat = format;
     }
 
+    VIR_FREE(meta->dataFileRaw);
+    if (fileTypeInfo[meta->format].getDataFile != NULL) {
+        fileTypeInfo[meta->format].getDataFile(&meta->dataFileRaw,
+                                               buf, meta->format, len);
+    }
+
     g_clear_pointer(&meta->features, virBitmapFree);
     if (fileTypeInfo[meta->format].getFeatures != NULL &&
         fileTypeInfo[meta->format].getFeatures(&meta->features, meta->format, buf, len) < 0)
-- 
2.43.5

On Sun, Aug 11, 2024 at 17:34:47 +0300, Nikolai Barybin via Devel wrote:
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
---
 src/storage_file/storage_file_probe.c | 66 +++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 9 deletions(-)
diff --git a/src/storage_file/storage_file_probe.c b/src/storage_file/storage_file_probe.c
index 243927d50a..c7ff743c87 100644
--- a/src/storage_file/storage_file_probe.c
+++ b/src/storage_file/storage_file_probe.c
@@ -93,8 +93,9 @@ struct FileTypeInfo {
                     size_t buf_size);
     int (*getBackingStore)(char **res, int *format,
                            const char *buf, size_t buf_size);
+    int (*getDataFile)(char **res, const char *buf, int format, size_t buf_size);
     int (*getFeatures)(virBitmap **features, int format,
-                       char *buf, ssize_t len);
+                       const char *buf, ssize_t len);
This looks like a separate refactor. Please don't mix those, especially if you don't document what's happening.
};
@@ -105,8 +106,9 @@
 qcow2GetClusterSize(const char *buf, size_t buf_size);
 static int
 qcowXGetBackingStore(char **, int *, const char *, size_t);
+static int
+qcowXGetDataFile(char **, const char *, int, size_t);
 static int
 qcow2GetFeatures(virBitmap **features, int format,
-                 char *buf, ssize_t len);
+                 const char *buf, ssize_t len);
 static int
 vmdk4GetBackingStore(char **, int *, const char *, size_t);
 static int
@@ -126,6 +128,7 @@ qedGetBackingStore(char **, int *, const char *, size_t);
 #define QCOW2_HDR_EXTENSION_END 0
 #define QCOW2_HDR_EXTENSION_BACKING_FORMAT 0xE2792ACA
+#define QCOW2_HDR_EXTENSION_DATA_FILE_NAME 0x44415441
 #define QCOW2v3_HDR_FEATURES_INCOMPATIBLE (QCOW2_HDR_TOTAL_SIZE)
 #define QCOW2v3_HDR_FEATURES_COMPATIBLE (QCOW2v3_HDR_FEATURES_INCOMPATIBLE+8)
@@ -313,6 +316,7 @@ static struct FileTypeInfo const fileTypeInfo[] = {
         qcow2EncryptionInfo,
         qcow2GetClusterSize,
         qcowXGetBackingStore,
+        qcowXGetDataFile,
         qcow2GetFeatures
     },
     [VIR_STORAGE_FILE_QED] = {
@@ -359,7 +363,7 @@ enum qcow2IncompatibleFeature {
 static const virStorageFileFeature qcow2IncompatibleFeatureArray[] = {
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_DIRTY */
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_CORRUPT */
-    VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_DATA_FILE */
+    VIR_STORAGE_FILE_FEATURE_DATA_FILE, /* QCOW2_INCOMPATIBLE_FEATURE_DATA_FILE */
     VIR_STORAGE_FILE_FEATURE_LAST, /* QCOW2_INCOMPATIBLE_FEATURE_COMPRESSION */
     VIR_STORAGE_FILE_FEATURE_EXTENDED_L2, /* QCOW2_INCOMPATIBLE_FEATURE_EXTL2 */
 };
@@ -391,7 +395,7 @@ cowGetBackingStore(char **res,
 static int
 qcow2GetExtensions(const char *buf,
                    size_t buf_size,
-                   int *backingFormat)
+                   void *data)
 {
     size_t offset;
     size_t extension_start;
@@ -467,7 +471,7 @@ qcow2GetExtensions(const char *buf,
         switch (magic) {
         case QCOW2_HDR_EXTENSION_BACKING_FORMAT: {
             g_autofree char *tmp = NULL;
-            if (!backingFormat)
+            if (!data)
                 break;
             tmp = g_new0(char, len + 1);
@@ -480,9 +484,20 @@ qcow2GetExtensions(const char *buf,
              * doesn't look like a format driver name to be a protocol driver
              * directly and thus the image is in fact still considered raw
              */
-            *backingFormat = virStorageFileFormatTypeFromString(tmp);
-            if (*backingFormat <= VIR_STORAGE_FILE_NONE)
-                *backingFormat = VIR_STORAGE_FILE_RAW;
+            *(int *)data = virStorageFileFormatTypeFromString(tmp);
+            if (*(int *)data <= VIR_STORAGE_FILE_NONE)
+                *(int *)data = VIR_STORAGE_FILE_RAW;
Please add new fields rather than doing this overwrite.
+            break;
+        }
+
+        case QCOW2_HDR_EXTENSION_DATA_FILE_NAME: {
+            g_autofree char *tmp = NULL;
+            if (!data)
+                break;
+
+            *(char **)data = g_new0(char, len + 1);
+            memcpy(*(char **)data, buf + offset, len);
+            (*((char **)data))[len] = '\0';
How is this even supposed to work? The function loops over all extensions in the header so it would overwrite the previous value.
             break;
         }
@@ -559,6 +574,33 @@ qcowXGetBackingStore(char **res, }
+static int
+qcowXGetDataFile(char **res,
+                 const char *buf,
+                 int format,
+                 size_t buf_size)
+{
+    virBitmap *features = NULL;
+    char *filename = NULL;
+    size_t namelen = 0;
+    *res = NULL;
+
+    if (buf_size < QCOW2v3_HDR_FEATURES_INCOMPATIBLE + 8)
+        return 0;
+
+    ignore_value(qcow2GetFeatures(&features, format, buf, buf_size));
+    if (features && virBitmapIsBitSet(features, VIR_STORAGE_FILE_FEATURE_DATA_FILE)) {
+        if (qcow2GetExtensions(buf, buf_size, &filename) < 0)
+            return 0;
+
+        namelen = strlen(filename) + 1;
+        *res = g_new0(char, namelen);
+        memcpy(*res, filename, namelen);
Ummm, so you 'strlen()' the buffer, but then use memcpy? You can use 'g_strdup' directly.
+    }
+
+    return 0;
+}
+
 static int
 vmdk4GetBackingStore(char **res,
                      int *format,
@@ -789,7 +831,7 @@ qcow2GetFeaturesProcessGroup(uint64_t bits,
 static int
 qcow2GetFeatures(virBitmap **features,
                  int format,
-                 char *buf,
+                 const char *buf,
                  ssize_t len)
 {
     int version = -1;
@@ -961,6 +1003,12 @@ virStorageFileProbeGetMetadata(virStorageSource *meta,
         meta->backingStoreRawFormat = format;
     }
+    VIR_FREE(meta->dataFileRaw);
+    if (fileTypeInfo[meta->format].getDataFile != NULL) {
+        fileTypeInfo[meta->format].getDataFile(&meta->dataFileRaw,
+                                               buf, meta->format, len);
+    }
+
     g_clear_pointer(&meta->features, virBitmapFree);
     if (fileTypeInfo[meta->format].getFeatures != NULL &&
         fileTypeInfo[meta->format].getFeatures(&meta->features, meta->format, buf, len) < 0)
-- 
2.43.5

- parse data-file metadata and init src->dataFileStore struct
- chown data-file to allow qemu to open it
- add data-file path to VM's cgroup and namespace
- add data-file option to QEMU process command line

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
---
 src/qemu/qemu_block.c             | 45 +++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.c            |  3 +++
 src/qemu/qemu_namespace.c         |  6 +++++
 src/storage_file/storage_source.c | 38 ++++++++++++++++++++++++--
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index d6cdf521c4..38f4717d56 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -1271,6 +1271,48 @@ qemuBlockStorageSourceGetFormatQcowGenericProps(virStorageSource *src,
 }
 
 
+static int
+qemuBlockStorageGetQcow2DataFileProps(virStorageSource *src,
+                                      virJSONValue *props)
+{
+    g_autoptr(virJSONValue) dataFileDef = NULL;
+    virStorageType actualType;
+    g_autofree char *driverName = NULL;
+
+    if (!src->dataFileRaw)
+        return 0;
+
+    actualType = virStorageSourceGetActualType(src->dataFileStore);
+    if (actualType != VIR_STORAGE_TYPE_BLOCK &&
+        actualType != VIR_STORAGE_TYPE_FILE) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("%1$s storage type is not supported for qcow2 data-file"),
+                       NULLSTR(virStorageTypeToString(actualType)));
+        return -1;
+    }
+
+    if (virStorageSourceIsBlockLocal(src->dataFileStore))
+        driverName = g_strdup("host_device");
+    else
+        driverName = g_strdup("file");
+
+    if (virJSONValueObjectAdd(&dataFileDef,
+                              "s:driver", driverName,
+                              "b:read-only", src->readonly,
+                              "s:filename", src->dataFileRaw,
+                              NULL) < 0)
+        return -1;
+
+    if (virJSONValueObjectAdd(&props,
+                              "A:data-file", &dataFileDef,
+                              NULL) < 0)
+        return -1;
+
+    return 0;
+}
+
+
 static int
 qemuBlockStorageSourceGetFormatQcow2Props(virStorageSource *src,
                                           virJSONValue *props)
@@ -1301,6 +1343,9 @@ qemuBlockStorageSourceGetFormatQcow2Props(virStorageSource *src,
                               NULL) < 0)
         return -1;
 
+    if (qemuBlockStorageGetQcow2DataFileProps(src, props) < 0)
+        return -1;
+
     return 0;
 }
 
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 23b7e6b4e8..84c749b567 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -223,6 +223,9 @@ qemuSetupImageCgroupInternal(virDomainObj *vm,
         qemuSetupImagePathCgroup(vm, QEMU_DEVICE_MAPPER_CONTROL_PATH, false) < 0)
         return -1;
 
+    if (src->dataFileRaw && qemuSetupImagePathCgroup(vm, src->dataFileRaw, readonly) < 0)
+        return -1;
+
     return qemuSetupImagePathCgroup(vm, path, readonly);
 }
 
diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c
index bbe3d5a1f7..32653443ec 100644
--- a/src/qemu/qemu_namespace.c
+++ b/src/qemu/qemu_namespace.c
@@ -290,6 +290,12 @@ qemuDomainSetupDisk(virStorageSource *src,
 
         if (targetPaths)
             *paths = g_slist_concat(g_slist_reverse(targetPaths), *paths);
+
+        if (next->dataFileRaw) {
+            g_autofree char *datafile = NULL;
+            datafile = g_strdup(next->dataFileRaw);
+            *paths = g_slist_prepend(*paths, g_steal_pointer(&datafile));
+        }
     }
 
     *paths = g_slist_prepend(*paths, g_steal_pointer(&tmpPath));
 
diff --git a/src/storage_file/storage_source.c b/src/storage_file/storage_source.c
index 2cfe3bc325..0ea11228d4 100644
--- a/src/storage_file/storage_source.c
+++ b/src/storage_file/storage_source.c
@@ -1378,8 +1378,16 @@ virStorageSourceGetMetadataRecurse(virStorageSource *src,
                                       &buf, &headerLen) < 0)
         return -1;
 
-    if (virStorageFileProbeGetMetadata(src, buf, headerLen) < 0)
-        return -1;
+    if (!parent->dataFileRaw) {
+        if (virStorageFileProbeGetMetadata(src, buf, headerLen) < 0)
+            return -1;
+    } else {
+        if (virStorageSourceInitAs(src, uid, gid) < 0)
+            return -1;
+        if (virStorageSourceChown(src, uid, gid) < 0)
+            return -1;
+        virStorageSourceDeinit(src);
+    }
 
     /* If we probed the format we MUST ensure that nothing else than the current
      * image is considered for security labelling and/or recursion. */
@@ -1417,6 +1425,32 @@ virStorageSourceGetMetadataRecurse(virStorageSource *src,
 
         backingStore->id = depth;
         src->backingStore = g_steal_pointer(&backingStore);
+
+    } else if (src->dataFileRaw) {
+        if ((rv = virStorageSourceNewFromChild(src, src->dataFileRaw, &backingStore)) < 0)
+            return -1;
+
+        if (rv == 1)
+            return 0;
+
+        if ((rv = virStorageSourceGetMetadataRecurse(backingStore, parent,
+                                                     uid, gid,
+                                                     report_broken,
+                                                     max_depth, depth + 1)) < 0) {
+            if (!report_broken)
+                return 0;
+
+            if (rv == -2) {
+                virReportError(VIR_ERR_OPERATION_INVALID,
+                               _("format of data file image '%1$s' of image '%2$s' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)"),
+                               src->dataFileRaw, NULLSTR(src->path));
+            }
+
+            return -1;
+        }
+
+        backingStore->id = depth;
+        src->dataFileStore = g_steal_pointer(&backingStore);
     } else {
         /* add terminator */
         src->backingStore = virStorageSourceNew();
-- 
2.43.5

On Sun, Aug 11, 2024 at 17:34:48 +0300, Nikolai Barybin via Devel wrote:
- parse data-file metadata and init src->dataFileStore struct
- chown data-file to allow qemu to open it
- add data-file path to VM's cgroup and namespace
- add data-file option to QEMU process command line
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
---
 src/qemu/qemu_block.c             | 45 +++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.c            |  3 +++
 src/qemu/qemu_namespace.c         |  6 +++++
 src/storage_file/storage_source.c | 38 ++++++++++++++++++++++++--
 4 files changed, 90 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index d6cdf521c4..38f4717d56 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -1271,6 +1271,48 @@ qemuBlockStorageSourceGetFormatQcowGenericProps(virStorageSource *src,
 }
+static int
+qemuBlockStorageGetQcow2DataFileProps(virStorageSource *src,
+                                      virJSONValue *props)
+{
+    g_autoptr(virJSONValue) dataFileDef = NULL;
+    virStorageType actualType;
+    g_autofree char *driverName = NULL;
+
+    if (!src->dataFileRaw)
+        return 0;
+
+    actualType = virStorageSourceGetActualType(src->dataFileStore);
+    if (actualType != VIR_STORAGE_TYPE_BLOCK &&
+        actualType != VIR_STORAGE_TYPE_FILE) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("%1$s storage type is not supported for qcow2 data-file"),
+                       NULLSTR(virStorageTypeToString(actualType)));
+        return -1;
This form of validation doesn't really belong to the code that actually formats the JSON, and should be done earlier. But based on my comment below it won't even be needed.
+    }
+
+    if (virStorageSourceIsBlockLocal(src->dataFileStore))
+        driverName = g_strdup("host_device");
+    else
+        driverName = g_strdup("file");
+
+    if (virJSONValueObjectAdd(&dataFileDef,
+                              "s:driver", driverName,
+                              "b:read-only", src->readonly,
+                              "s:filename", src->dataFileRaw,
+                              NULL) < 0)
+        return -1;
+
+    if (virJSONValueObjectAdd(&props,
+                              "A:data-file", &dataFileDef,
+                              NULL) < 0)
+        return -1;
So you're using the inline definition rather than an explicit new '-blockdev'. Since libvirt switched to the full blockdev topology this is not acceptable. With a proper blockdev you also won't need to re-invent this code and can simply reuse the existing one; the 'data-file' field will then be filled with a reference to the node name.
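A sketch of the shape being asked for (node names are hypothetical, following libvirt's usual naming pattern): the data file gets its own protocol node via a separate blockdev definition, and the qcow2 format node's 'data-file' merely references it by node name instead of carrying an inline definition:

```json
{
  "driver": "host_device",
  "node-name": "libvirt-2-storage",
  "filename": "/dev/vg0/guest-data",
  "read-only": false
}
```

```json
{
  "driver": "qcow2",
  "node-name": "libvirt-1-format",
  "file": "libvirt-1-storage",
  "data-file": "libvirt-2-storage"
}
```

This way the data-file node goes through all the existing per-node machinery (labelling, cgroups, stats, node-name tracking) instead of a parallel code path.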
+
+    return 0;
+}
[...]
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 23b7e6b4e8..84c749b567 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -223,6 +223,9 @@ qemuSetupImageCgroupInternal(virDomainObj *vm,
         qemuSetupImagePathCgroup(vm, QEMU_DEVICE_MAPPER_CONTROL_PATH, false) < 0)
         return -1;
+    if (src->dataFileRaw && qemuSetupImagePathCgroup(vm, src->dataFileRaw, readonly) < 0)
+        return -1;
The 'dataFileRaw' field _must not_ be accessed by anything besides the code which then re-probes the metadata of the backing image. Anything else must use the 'src' bits.
+
     return qemuSetupImagePathCgroup(vm, path, readonly);
 }
diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c
index bbe3d5a1f7..32653443ec 100644
--- a/src/qemu/qemu_namespace.c
+++ b/src/qemu/qemu_namespace.c
@@ -290,6 +290,12 @@ qemuDomainSetupDisk(virStorageSource *src,
         if (targetPaths)
             *paths = g_slist_concat(g_slist_reverse(targetPaths), *paths);
+
+        if (next->dataFileRaw) {
+            g_autofree char *datafile = NULL;
+            datafile = g_strdup(next->dataFileRaw);
+            *paths = g_slist_prepend(*paths, g_steal_pointer(&datafile));
Same here.
+        }
     }
     *paths = g_slist_prepend(*paths, g_steal_pointer(&tmpPath));
 
diff --git a/src/storage_file/storage_source.c b/src/storage_file/storage_source.c
index 2cfe3bc325..0ea11228d4 100644
--- a/src/storage_file/storage_source.c
+++ b/src/storage_file/storage_source.c
@@ -1378,8 +1378,16 @@ virStorageSourceGetMetadataRecurse(virStorageSource *src,
                                       &buf, &headerLen) < 0)
         return -1;
-    if (virStorageFileProbeGetMetadata(src, buf, headerLen) < 0)
-        return -1;
+    if (!parent->dataFileRaw) {
+        if (virStorageFileProbeGetMetadata(src, buf, headerLen) < 0)
+            return -1;
+    } else {
+        if (virStorageSourceInitAs(src, uid, gid) < 0)
+            return -1;
+        if (virStorageSourceChown(src, uid, gid) < 0)
+            return -1;
+        virStorageSourceDeinit(src);
The probing of images must not chown anything. NACK to this hunk. Per commit message you seem to suggest that this is to allow opening the image, but as stated this is wrong. You need to implement this properly in the security drivers.
+ }
     /* If we probed the format we MUST ensure that nothing else than the current
      * image is considered for security labelling and/or recursion. */
@@ -1417,6 +1425,32 @@ virStorageSourceGetMetadataRecurse(virStorageSource *src,
         backingStore->id = depth;
         src->backingStore = g_steal_pointer(&backingStore);
+
+    } else if (src->dataFileRaw) {
You can't do this as an else-if branch here. A QCOW2 image can have all 4 combinations of the 'backing file' and 'data file' properties. By doing an else-if you prevent adding a terminator if the image has a data file. But what's worse, with a qcow2 overlay image that has a data file you'd not even parse the data file.
+        if ((rv = virStorageSourceNewFromChild(src, src->dataFileRaw, &backingStore)) < 0)
+            return -1;
Do not reuse 'backingStore' it makes the code confusing.
+
+        if (rv == 1)
+            return 0;
+
+        if ((rv = virStorageSourceGetMetadataRecurse(backingStore, parent,
+                                                     uid, gid,
+                                                     report_broken,
+                                                     max_depth, depth + 1)) < 0) {
IIRC the 'data file' doesn't have a format and thus can't even have a backing image. Recursing here is pointless and also wrong (although IIUC it doesn't really break security). Either way, since the top image is a qcow2 image with a proper format, qemu will allow writing arbitrary data, including a qcow2 header, into the image, and that data would be written verbatim into the data file. Attempting to probe the contents of a data file could then make us parse a malicious header. Basically the data file must be treated as 'raw', at least when probed from the image.
+            if (!report_broken)
+                return 0;
+
+            if (rv == -2) {
+                virReportError(VIR_ERR_OPERATION_INVALID,
+                               _("format of data file image '%1$s' of image '%2$s' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)"),
+                               src->dataFileRaw, NULLSTR(src->path));
+            }
The linked document definitely doesn't say anything about the data image, and as per the above this makes no sense anyway, since the data file can't have a format that would allow backing chains.
+
+            return -1;
+        }
+
+        backingStore->id = depth;
+        src->dataFileStore = g_steal_pointer(&backingStore);
     } else {
         /* add terminator */
         src->backingStore = virStorageSourceNew();
-- 
2.43.5

On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt.
So this feature was once attempted to be added but was never finished, and the committed bits were eventually reverted. (I've purged my local archive so I don't have the link handy, but I can look if you want the links to the old posting.)

It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter, and thus can provide all features that are necessary. The data file feature also makes the management of the metadata and data much more complex, for a very bad trade-off. At this point, with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.

Based on the history of this I'd like to ask you to summarize the justifications and reasons for adding this feature before continuing.
Nikolai Barybin (3): conf: add data-file feature and related fields to virStorageSource storage: add qcow2 filename parsing from header qemu: enable qcow2 data-file attach to VM on start
src/conf/storage_source_conf.c | 8 ++++ src/conf/storage_source_conf.h | 5 ++ src/qemu/qemu_block.c | 45 ++++++++++++++++++ src/qemu/qemu_cgroup.c | 3 ++ src/qemu/qemu_namespace.c | 6 +++ src/storage_file/storage_file_probe.c | 66 +++++++++++++++++++++++---- src/storage_file/storage_source.c | 38 ++++++++++++++-
A quick skim through the patches shows the following bits missing:

- any form of tests
- XML formatting and parsing of the path (needed as you need the path of the data file to be user-configurable for cases when the user declares the backing chain in the XML, and also to be overridable. Additionally we need to record the path for cases when libvirtd/virtqemud restarts, as we must not try to re-probe the image afterwards)
- security driver integration (any of the above will not work if the data file is not accessible, as it won't be chown'd/relabelled)

Few more comments inline.

On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt.
So this feature was once attempted to be added but was never finished and the comitted bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter and thus can provide all features that are necessary. The data file feature makes also the management of the metadata and data much more complex, for a very bad trade-off. At this point with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing.
Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.

On 8/12/24 10:36, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt.
So this feature was once attempted to be added but was never finished and the committed bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter and thus can provide all features that are necessary. The data file feature makes also the management of the metadata and data much more complex, for a very bad trade-off. At this point with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing.
Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.
Let me clarify a bit.

QCOW2 with a data file uses qcow2 as a metadata storage for an ordinary block device. This is a feature of QEMU, and it would be quite natural to have its representation in libvirt, as without libvirt's help QEMU could not even start with such a configuration due to namespaces. LVM or not LVM.

How is it better in comparison with plain QCOW2? First of all, there are historical setups where the customer uses LVM for virtual machines and does not want to reformat his hard disks to a file system. This makes sense as we avoid two levels of metadata, i.e. QCOW2 metadata over local FS metadata, which makes the setup a little more reliable. It should also be noted that QCOW2 in very specific setups is at least 2 times slower (when the L2 cache does not fit into the dedicated amount of RAM while the disk itself is quite big; I would admit that this problem can be seen even for not-so-big 1T disks).

On top of that, LVM is quite useful once we start talking about clustered LVM setups. Clustered LVM is a good alternative to CEPH, at least for some users. Thus this data file setup is a way to provide backups, VM state snapshots and other features.

Den
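The L2-cache point can be made concrete. Per QEMU's qcow2 cache documentation, each 8-byte L2 table entry maps one cluster, so a cache covering the whole disk needs disk_size / (cluster_size / 8) bytes; a back-of-the-envelope sketch:

```c
#include <stdint.h>

/* Full-coverage L2 cache size for a qcow2 image (per QEMU's qcow2
 * cache docs): one 8-byte L2 entry per cluster, i.e.
 *   cache_bytes = disk_size / (cluster_size / 8)
 * With the default 64 KiB clusters this is disk_size / 8192. */
static uint64_t
qcow2FullL2CacheBytes(uint64_t diskSizeBytes, uint64_t clusterSizeBytes)
{
    return diskSizeBytes / (clusterSizeBytes / 8);
}
```

For a 1 TiB disk with 64 KiB clusters this works out to 128 MiB of L2 cache, far above QEMU's historical default cache sizes, which is the regime where the slowdown Den describes kicks in.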

On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:
On 8/12/24 10:36, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt.
So this feature was once attempted to be added but was never finished and the committed bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or any other block device for that matter, and thus can provide all the features that are necessary. The data file feature also makes the management of the metadata and data much more complex, for a very bad trade-off. At this point, with 'qemu-img measure', it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
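The 'qemu-img measure' workflow Peter refers to can be sketched roughly as follows (an illustration only; the volume group and LV names are made up):

```shell
# Ask qemu-img how much space a fully-allocated 100G qcow2 needs
# (it prints "required size" and "fully allocated size" in bytes):
qemu-img measure -O qcow2 --size 100G

# Then create a block device at least that big and format it as qcow2,
# e.g. for an LVM volume (names illustrative):
#   lvcreate -L <fully-allocated-size>B -n vm-disk vg0
#   qemu-img create -f qcow2 -o preallocation=full /dev/vg0/vm-disk 100G
```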
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing. Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.
Let me clarify a bit.
The data-file feature uses QCOW2 as metadata storage for ordinary block devices. This is a feature of QEMU, and it would be quite natural to have a representation for it in libvirt, as without libvirt's help QEMU could not even start with such a configuration due to namespaces.
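The configuration Den describes — a qcow2 file holding only metadata, with guest data on an external block device — is created in QEMU with the data_file creation option. A sketch (the paths are illustrative):

```shell
# Create a qcow2 whose guest data lives on an LV; the .qcow2 file itself
# holds only metadata (refcount tables, L1/L2 tables, bitmaps).
qemu-img create -f qcow2 \
    -o data_file=/dev/vg0/vm-disk,data_file_raw=on \
    /var/lib/libvirt/images/vm-meta.qcow2 100G

# 'qemu-img info' on the metadata file reports the attached data file
# under "Format specific information":
qemu-img info /var/lib/libvirt/images/vm-meta.qcow2
```

With data_file_raw=on the external file is kept as a flat raw image, which is what makes reusing a pre-existing LV possible.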
LVM or not LVM — how is it better in comparison with plain QCOW2?
Historically, when this was considered to be used for the incremental backup feature in oVirt, a similar set of advantages was picked as the justification. Later on, after discussing this for a bit, it became obvious that the advantages are not great enough to justify the extra effort:
- extra management complexity (need to carry over the qcow2 as well as the data file)
- possibility to desynchronize the state (a write into the 'data_file' invalidates the qcow2 "overlay") without being able to see that it was desync'd (writing into the data file doesn't touch the overlay, so the qcow2 driver doesn't see that)
- basically no performance benefits on top of qcow2
- (perhaps others on the oVirt side ... it was a long time ago so I don't remember any more)
First of all, there are historical setups when the customer uses LVM for virtual machines and does not want to reformat his hard disks to the file system. This makes sense, as we
Yes, such setups would not be possible. Users would need to convert to qcow2 but that can be done transparently to the VM. (but briefly requires twice the storage).
are avoiding two levels of metadata, i.e. QCOW2 metadata over local FS metadata. This makes the setup a little bit more
As stated above, you don't really need to use a filesystem. You can make a block device (whether real/partition/lv) into a qcow2 image by simply formatting it with qemu-img.
reliable. It should also be noted that QCOW2 in very specific setups is at least 2 times slower (when the L2 cache does not fit into the dedicated amount of RAM while the disk itself is quite big; I would admit that this problem would be seen even for not-so-big 1T disks).
Doesn't a QCOW2 with a 'data file' behave the same as a fully-allocated qcow2? I don't really see how this is more reliable/performant than plain fully-allocated qcow2.
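For context on the L2-cache point: a qcow2 L2 table entry is 8 bytes and maps one cluster (64 KiB by default), so the cache needed to cover a whole disk grows linearly with disk size. A back-of-envelope check (standard qcow2 layout; the numbers are illustrative):

```shell
# Covering the whole disk needs disk_size * 8 / cluster_size bytes of L2 cache.
disk_bytes=$((1024 ** 4))      # 1 TiB disk
cluster_bytes=$((64 * 1024))   # default 64 KiB clusters
l2_cache=$(( disk_bytes * 8 / cluster_bytes ))
echo $(( l2_cache / 1024 / 1024 ))   # -> 128 (MiB)
# QEMU's default l2-cache-size (32 MiB since QEMU 3.1) covers only 256 GiB
# at this cluster size, so larger disks start missing the cache.
```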
On top of that LVM is quite useful once we start talking about clustered LVM setups. Clustered LVM is a good alternative for CEPH at least for some users.
Thus this data file setup is a way to provide backups, VM state snapshots and other features.
Backups (changed block tracking) is the only thing you'd gain with this. Snapshots and other features are already possible with external snapshots. While based on the above I don't really see the need/use for 'data_file' qcow2 feature, especially given the complexity of adding it properly, I'm not opposed as long as it will be implemented properly. I suggest having a look at Cole's patches from the last attempt as they were far more complete than this posting.

On 8/12/24 16:46, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:
On 8/12/24 10:36, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt. So this feature was once attempted to be added but was never finished and the comitted bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote: links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter and thus can provide all features that are necessary. The data file feature makes also the management of the metadata and data much more complex, for a very bad trade-off. At this point with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing. Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.
Let me clarify a bit.
QCOW2 as a data file uses QCOW2 as a metadata storage for ordinary block devices. This is a feature of QEMU and it would be quite natural to have its representation in the libvirt as without libvirt help QEMU could not even start with such a configuration due to namespaces.
LVM or not LVM. How it is better in comparison with the QCOW2. Historically when this was considered to be used for the incremental backup feature in oVirt, similar set of the advantages was picked as the justification. Later on after discussing this for a bit it became obvious that the advantages are not as great to justify the extra effort:
- extra management complexity (need to carry over the qcow2 as well as the data file) - possibility to desynchronize the state (write into the 'data_file' invalidates the qcow2 "overlay"), without being able to see that it was desync'd (writing into the data file doesn't touch the overlay so the qcow2 driver doesn't see that). - basically no performance benefits on top of qcow2 - (perhaps other's on oVirt side ... it was long time ago so I don't remember any more)
Yes. And definitely we will have this extra complexity over extra functionality. Right now we have to support, for our product, backups of VM data residing on LVM volumes. This is shipped into production, and I have the option to have this in downstream only or to submit this upstream. The problem is that if we were to say that libvirt is not going this way, we should clearly indicate in:
- the QCOW2 documentation
- the qemu-img man page
that the option of using a data file for VM metadata is deprecated and will not get further development. This would be at least fair. We have taken the decision that this scenario should be supported on the basis of the availability of this option and its presence in the public docs.
First of all, there are historical setups when the customer uses LVM for virtual machines and does not want to reformat his hard disks to the file system. This makes a sense as we Yes, such setups would not be possible. Users would need to convert to qcow2 but that can be done transparently to the VM. (but briefly requires twice the storage).
That is a BIG problem. Customers do not want to change their disk layout. Each time we try to force them, we get problems. Real big problems.
are avoiding two level of metadata, i.e. QCOW2 metadata over
local FS metadata. This makes the setup a little bit more As stated above you don't really need to use a filesystem. You can make a block device (whether real/partition/lv) into a qcow2 image by simply using qemu-img format. We use LVM for big data (TBs in size) and QCOW2 for metadata, namely CBT. This is how libvirt now distinguish EFI QCOW2 and disk QCOW2.
local FS metadata. This makes the setup a little bit more As stated above you don't really need to use a filesystem. You can make a block device (whether real/partition/lv) into a qcow2 image by simply using qemu-img format. We use LVM for big data (TBs in size) and QCOW2 for metadata, namely CBT. This is how libvirt now distinguishes EFI QCOW2 and disk QCOW2.
setups is at least 2 times slow (when L2 cache does not fit into the dedicated amount of RAM while the disk itself is quite big. I would admit that this problem would be seen even for not so big 1T disks). Doesn't a QCOW2 with a 'data file' behave the same as a fully-allocated qcow2? I don't really see how this is more reliable/performant than
reliable. It should also be noted that QCOW2 in very specific plain fully-allocated qcow2.
No real disk data resides in QCOW2. This is metadata storage, CBT and memory state only in the case of snapshots.
On top of that LVM is quite useful once we start talking about clustered LVM setups. Clustered LVM is a good alternative for CEPH at least for some users.
Thus this data file setup is a way to provide backups, VM state snapshots and other features. Backups (changed block tracking) is the only thing you'd gain with this. Snapshots and other features are already possible with external snapshots.
While based on the above I don't really see the need/use for 'data_file' qcow2 feature, especially given the complexity of adding it properly, I'm not opposed as long as it will be implemented properly.
I suggest having a look at Cole's patches from the last attempt as they were far more complete than this posting.
External snapshots are good, but if the customer wants internal ones with a memory state to be kept, data files would be useful. We will take a look at Cole's patches and maybe this would help. Anyway, if the denial is strict, we should clearly indicate in QEMU that this option is deprecated. Den

On Wed, Aug 14, 2024 at 15:28:57 +0200, Denis V. Lunev wrote:
On 8/12/24 16:46, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:
On 8/12/24 10:36, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
There are use cases when the existing disks (i.e. LVM) are wanted to be used with advanced features. For this purpose QEMU allows data-file feature for qcow2 files: metadata is kept in the qcow2 file like usual, but guest data is written to an external file. These patches enable support for this feature in libvirt. So this feature was once attempted to be added but was never finished and the comitted bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote: links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter and thus can provide all features that are necessary. The data file feature makes also the management of the metadata and data much more complex, for a very bad trade-off. At this point with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing. Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.
Let me clarify a bit.
QCOW2 as a data file uses QCOW2 as a metadata storage for ordinary block devices. This is a feature of QEMU and it would be quite natural to have its representation in the libvirt as without libvirt help QEMU could not even start with such a configuration due to namespaces.
LVM or not LVM. How it is better in comparison with the QCOW2. Historically when this was considered to be used for the incremental backup feature in oVirt, similar set of the advantages was picked as the justification. Later on after discussing this for a bit it became obvious that the advantages are not as great to justify the extra effort:
- extra management complexity (need to carry over the qcow2 as well as the data file) - possibility to desynchronize the state (write into the 'data_file' invalidates the qcow2 "overlay"), without being able to see that it was desync'd (writing into the data file doesn't touch the overlay so the qcow2 driver doesn't see that). - basically no performance benefits on top of qcow2 - (perhaps other's on oVirt side ... it was long time ago so I don't remember any more)
Yes. And definitely we will have this extra complexity over extra functionality. Right now we have to support for our product backups of VM data residing on LVM volumes. This is shipped into the production and I have option to have this in downstream only or submit this upstream.
Fair enough. As long as you are willing to properly implement data file support, I'm fine with that. I just wanted to note that one prior attempt was deemed not worth it and abandoned, so that you can understand what you are taking on.
The problem is that if we would say that libvirt is not going this way, we should clearly indicate in * QCOW2 documentation * qemu-img man page that the option of using datafile for VM metadata is deprecated and will not get further development. This would be at least fair.
I have no idea how the qemu project approaches this and what they think about the status of data files. For libvirt, if qemu supports it, it's fair game to add the feature if somebody is willing to spend the effort.
We have taken the decision that this scenario should be supported on the base of availability of this option and presence it in the public docs.
First of all, there are historical setups when the customer uses LVM for virtual machines and does not want to reformat his hard disks to the file system. This makes a sense as we Yes, such setups would not be possible. Users would need to convert to qcow2 but that can be done transparently to the VM. (but briefly requires twice the storage).
That is a BIG problem. Customers do not want to change disks layout. Each time we try to force them, we get problems. Real big problems.
are avoiding two level of metadata, i.e. QCOW2 metadata over
local FS metadata. This makes the setup a little bit more As stated above you don't really need to use a filesystem. You can make a block device (whether real/partition/lv) into a qcow2 image by simply using qemu-img format. We use LVM for big data (TBs in size) and QCOW2 for metadata, namely CBT. This is how libvirt now distinguish EFI QCOW2 and disk QCOW2.
I don't quite follow what you've meant here.
setups is at least 2 times slow (when L2 cache does not fit into the dedicated amount of RAM while the disk itself is quite big. I would admit that this problem would be seen even for not so big 1T disks). Doesn't a QCOW2 with a 'data file' behave the same as a fully-allocated qcow2? I don't really see how this is more reliable/performant than
reliable. It should also be noted that QCOW2 in very specific plain fully-allocated qcow2.
No real disk data resides in QCOW2. This is metadata storage, CBT and memory state only in the case of snapshots.
On top of that LVM is quite useful once we start talking about clustered LVM setups. Clustered LVM is a good alternative for CEPH at least for some users.
Thus this data file setup is a way to provide backups, VM state snapshots and other features. Backups (changed block tracking) is the only thing you'd gain with this. Snapshots and other features are already possible with external snapshots.
While based on the above I don't really see the need/use for 'data_file' qcow2 feature, especially given the complexity of adding it properly, I'm not opposed as long as it will be implemented properly.
I suggest having a look at Cole's patches from the last attempt as they were far more complete than this posting.
Externals snapshots are good, but if the customer wants internal ones with a memory state to be kept, data files would be useful.
Well, that is fair enough I'd say, but ... that throws your argument about data files being a magical fix for the qcow2 L2 table performance issues completely out of the window, because if you do an internal snapshot you will definitely store data, not only metadata, in the qcow2 file, as the blocks which were actually changed need to be overlaid somewhere to allow going back.
We will take a look at Cole patches and may be this would help. Anyway, if the denyal is strict - we should clearly indicate that in QEMU that this option is deprecated.
Once again, I'm not saying I'm against this feature. In terms of whether the qcow2 data file feature is considered supported by qemu and whether it properly works with internal snapshots, I'll have to defer you to the qemu mailing list, as I have no idea and honestly I don't care about it. I care just that the implementation is done properly and doesn't break in the future, when it's very likely that I'll be the one fixing it.

On 8/16/24 17:00, Peter Krempa wrote:
On Wed, Aug 14, 2024 at 15:28:57 +0200, Denis V. Lunev wrote:
On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:
On 8/12/24 10:36, Peter Krempa wrote:
On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:
On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote: > There are use cases when the existing disks (i.e. LVM) are wanted > to be used with advanced features. For this purpose QEMU allows > data-file feature for qcow2 files: metadata is kept in the qcow2 > file like usual, but guest data is written to an external file. > These patches enable support for this feature in libvirt. So this feature was once attempted to be added but was never finished and the comitted bits were reverted eventually. (I've purged my local archive so I don't have the link handy but I can look if you want the links to the old posting)
It was deemed that this doesn't really add any performance benefit over storing the actual qcow2 with data inside. The qcow2 with data can be stored inside the LV or other block device for that matter and thus can provide all features that are necessary. The data file feature makes also the management of the metadata and data much more complex, for a very bad trade-off. At this point with 'qemu-img measure' it's easy to query the necessary size to have a fully allocated qcow2 inside the block device.
Based on the history of this I'd like to ask you to summarize justifications and reasons for adding this feature before continuing. Based on the current state of the series and what would be required to make it viable to be accepted I very strongly suggest re-thinking if you really need this feature, especially based on the caveats above.
Let me clarify a bit.
QCOW2 as a data file uses QCOW2 as a metadata storage for ordinary block devices. This is a feature of QEMU and it would be quite natural to have its representation in the libvirt as without libvirt help QEMU could not even start with such a configuration due to namespaces.
LVM or not LVM. How it is better in comparison with the QCOW2. Historically when this was considered to be used for the incremental backup feature in oVirt, similar set of the advantages was picked as the justification. Later on after discussing this for a bit it became obvious that the advantages are not as great to justify the extra effort:
- extra management complexity (need to carry over the qcow2 as well as the data file) - possibility to desynchronize the state (write into the 'data_file' invalidates the qcow2 "overlay"), without being able to see that it was desync'd (writing into the data file doesn't touch the overlay so the qcow2 driver doesn't see that). - basically no performance benefits on top of qcow2 - (perhaps other's on oVirt side ... it was long time ago so I don't remember any more) Yes. And definitely we will have this extra complexity over extra functionality. Right now we have to support for our product backups of VM data residing on LVM volumes. This is shipped into
On 8/12/24 16:46, Peter Krempa wrote: the production and I have option to have this in downstream only or submit this upstream. Fair enough. As long as you are willing to properly implement backing file support I'm fine with that. I just wanted to note that one prior attempt was deemed not worth it and abandoned, so that you can understand what you are taking on.
The problem is that if we would say that libvirt is not going this way, we should clearly indicate in * QCOW2 documentation * qemu-img man page that the option of using datafile for VM metadata is deprecated and will not get further development. This would be at least fair. I have no idea how the qemu project approaches this and what they think about the status of backing files.
For libvirt if qemu supports it it's a fair game to add the feature if somebody is willing to spend the effort.
We have taken the decision that this scenario should be supported on the base of availability of this option and presence it in the public docs.
First of all, there are historical setups when the customer uses LVM for virtual machines and does not want to reformat his hard disks to the file system. This makes a sense as we Yes, such setups would not be possible. Users would need to convert to qcow2 but that can be done transparently to the VM. (but briefly requires twice the storage). That is a BIG problem. Customers do not want to change disks layout. Each time we try to force them, we get problems. Real big problems.
are avoiding two level of metadata, i.e. QCOW2 metadata over
local FS metadata. This makes the setup a little bit more As stated above you don't really need to use a filesystem. You can make a block device (whether real/partition/lv) into a qcow2 image by simply using qemu-img format. We use LVM for big data (TBs in size) and QCOW2 for metadata, namely CBT. This is how libvirt now distinguish EFI QCOW2 and disk QCOW2. I don't quite follow what you've meant here.
setups is at least 2 times slow (when L2 cache does not fit into the dedicated amount of RAM while the disk itself is quite big. I would admit that this problem would be seen even for not so big 1T disks). Doesn't a QCOW2 with a 'data file' behave the same as a fully-allocated qcow2? I don't really see how this is more reliable/performant than
reliable. It should also be noted that QCOW2 in very specific plain fully-allocated qcow2. No real disk data resides in QCOW2. This is metadata storage, CBT and memory state only in the case of snapshots.
On top of that LVM is quite useful once we start talking about clustered LVM setups. Clustered LVM is a good alternative for CEPH at least for some users.
Thus this data file setup is a way to provide backups, VM state snapshots and other features. Backups (changed block tracking) is the only thing you'd gain with this. Snapshots and other features are already possible with external snapshots.
While based on the above I don't really see the need/use for 'data_file' qcow2 feature, especially given the complexity of adding it properly, I'm not opposed as long as it will be implemented properly.
I suggest having a look at Cole's patches from the last attempt as they were far more complete than this posting.
Externals snapshots are good, but if the customer wants internal ones with a memory state to be kept, data files would be useful. Well that is fair enough I'd say, but ... that throws your argument about data files being a magical fix for the qcow2 l2 table performance issues completely out of the window, because if you do an internal snapshot you definitely will store also data, not only metadata in the qcow2 file as the blocks which were actually changed need to be overlaid somewhere to allow going back.
We will take a look at Cole's patches and maybe this would help. Anyway, if the denial is strict, we should clearly indicate in QEMU that this option is deprecated. Once again, I'm not saying I'm against this feature. In terms of whether the qcow2 data file feature is considered supported by qemu and whether it properly works with internal snapshots, I'll have to defer you to the qemu mailing list, as I have no idea and honestly I don't care about it.
I care just that the implementation is done properly and doesn't break in the future when it's very likely that I'll be the one fixing it.
OK. In general, we will work on this with more attention, looking into the previous work, and will come back with something better. Thank you for pointing this out, Den
participants (3):
- Denis V. Lunev
- Nikolai Barybin
- Peter Krempa