[libvirt] [PATCH 0/3] add disk driver metadata_cache_size option
Hi, all. This patch series follows an offlist agreement on introducing a metadata-cache-size option for disks. The option itself is described in the 2nd patch of the series. There have been plenty of past attempts to add an option for setting qcow2 metadata cache sizes; see [1] and [2] for recent ones whose review comments I tried to address or whose work I reused to some extent.

I would like to address Peter's comment in [1] that this option belongs to an image rather than to a disk. While that makes sense if we set a specific cache value that covers the disk only partially, with the 'maximum' policy, which covers the whole disk, it makes sense to set the same value for all images. The configured value is only an upper limit on the cache size, so the cache for every image will eventually grow in proportion to the amount of image data "visible from the top". These sizes change during the guest's lifetime, so we need to set the whole-disk limit on every image for the 'maximum' policy to take effect. On the other hand, if we later add policies other than 'maximum', a per-image option may become necessary. That said, I cannot think of a better name than 'driver' for a backing-chain element carrying a cache-size attribute, and 'driver' is already used for the top image, so I leave it this way.

KNOWN ISSUES

1. When using -drive to configure disks on the command line, the cache size does not get propagated through the backing chain.

2. When taking an external disk snapshot, the cache size sneaks into the backing file filename, so the cache size for the backing image gets remembered there.

3. blockcommit after such a snapshot does not work, because libvirt is not prepared for the backing file name sometimes being reported as JSON.

Issue 1 can probably be addressed in qemu. I honestly do not understand the reason for the behaviour in 2. Issue 3 should be naturally addressed once blockjobs start working with the -blockdev configuration.
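For illustration, the new attribute ends up on the disk's <driver> element as in the qemuxml2argv test data added in the 2nd patch:

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
  <source file='/var/lib/libvirt/images/f14.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
```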
LINKS [1] [PATCH] qemu: Added support L2 table cache for qcow2 disk https://www.redhat.com/archives/libvir-list/2018-July/msg00166.html [2] [PATCH v6 0/2] Add support for qcow2 cache https://www.redhat.com/archives/libvir-list/2017-September/msg00553.html Nikolay Shirokovskiy (3): qemu: caps: add drive.qcow2.l2-cache-size xml: add disk driver metadata_cache_size option qemu: support metadata-cache-size for blockdev docs/formatdomain.html.in | 7 ++++ docs/schemas/domaincommon.rng | 10 +++++ src/conf/domain_conf.c | 17 ++++++++ src/conf/domain_conf.h | 9 ++++ src/qemu/qemu_block.c | 5 ++- src/qemu/qemu_capabilities.c | 3 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 26 ++++++++++++ src/qemu/qemu_domain.c | 2 + src/util/virstoragefile.c | 1 + src/util/virstoragefile.h | 1 + tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml | 1 + 
tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml | 1 + .../qemuxml2argvdata/disk-metadata_cache_size.args | 34 +++++++++++++++ .../qemuxml2argvdata/disk-metadata_cache_size.xml | 42 +++++++++++++++++++ tests/qemuxml2argvtest.c | 2 + .../disk-metadata_cache_size.xml | 48 ++++++++++++++++++++++ tests/qemuxml2xmltest.c | 2 + 42 files changed, 235 insertions(+), 1 deletion(-) create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.args create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml -- 1.8.3.1
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml | 1 + 28 files changed, 30 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index e228f52..7d42254 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -508,6 +508,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, /* 315 */ "vfio-pci.display", "blockdev", + "drive.qcow2.l2-cache-size", ); @@ -1234,6 +1235,8 @@ static struct virQEMUCapsStringFlags 
virQEMUCapsQMPSchemaQueries[] = { { "blockdev-add/arg-type/+vxhs", QEMU_CAPS_VXHS}, { "blockdev-add/arg-type/+iscsi/password-secret", QEMU_CAPS_ISCSI_PASSWORD_SECRET }, { "blockdev-add/arg-type/+qcow2/encrypt/+luks/key-secret", QEMU_CAPS_QCOW2_LUKS }, + { "blockdev-add/arg-type/options/+qcow2/l2-cache-size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE}, + { "blockdev-add/arg-type/+qcow2/l2-cache-size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE}, { "nbd-server-start/arg-type/tls-creds", QEMU_CAPS_NBD_TLS }, { "screendump/arg-type/device", QEMU_CAPS_SCREENDUMP_DEVICE }, { "block-commit/arg-type/*top", QEMU_CAPS_ACTIVE_COMMIT }, diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 934620e..eb9b98b 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -492,6 +492,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 315 */ QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */ QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */ + QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE, /* -drive l2-cache-size */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml index b9c4182..1214a48 100644 --- a/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml @@ -151,6 +151,7 @@ <flag name='blockdev-del'/> <flag name='vhost-vsock'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2010000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>305067</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml index 66b2560..defbd6d 100644 --- a/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml @@ -150,6 +150,7 @@ <flag name='blockdev-del'/> <flag name='vhost-vsock'/> <flag 
name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2010000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>384412</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml index e000aac..3dfb5b3 100644 --- a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml @@ -113,6 +113,7 @@ <flag name='blockdev-del'/> <flag name='vhost-vsock'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2010000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>306247</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml index ebc5e77..adf9cd0 100644 --- a/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml @@ -192,6 +192,7 @@ <flag name='vhost-vsock'/> <flag name='mch'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2010000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>364386</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml index 4eb8a39..d115424 100644 --- a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml @@ -120,6 +120,7 @@ <flag name='vhost-vsock'/> <flag name='tpm-emulator'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2011000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>345099</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml index 857a9a9..5ecf9ea 100644 --- a/tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml @@ -198,6 +198,7 @@ <flag name='mch'/> <flag name='mch.extended-tseg-mbytes'/> <flag 
name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2011000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>368875</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml index 7bf1fab..d335391 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml @@ -162,6 +162,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>344910</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml index 8b8d885..02ea0b9 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml @@ -160,6 +160,7 @@ <flag name='machine.pseries.cap-htm'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>425694</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 79320d5..1c52088 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -128,6 +128,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2012000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>374287</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml index fcf94ab..4654280 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml @@ -203,6 +203,7 @@ <flag name='sev-guest'/> <flag 
name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>413556</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml index 9be7d89..7d70039 100644 --- a/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml @@ -151,6 +151,7 @@ <flag name='sdl-gl'/> <flag name='hda-output'/> <flag name='mch'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2005000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>218187</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml index 381d050..105a46f 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml @@ -135,6 +135,7 @@ <flag name='nbd-tls'/> <flag name='sdl-gl'/> <flag name='hda-output'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2006000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>229858</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml index f81c73d..2a68158 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml @@ -130,6 +130,7 @@ <flag name='nbd-tls'/> <flag name='sdl-gl'/> <flag name='hda-output'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2006000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>264684</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml index 30a8e01..5ef1f20 100644 --- a/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml @@ -163,6 +163,7 @@ <flag name='sdl-gl'/> <flag name='hda-output'/> <flag name='mch'/> + <flag 
name='drive.qcow2.l2-cache-size'/> <version>2006000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>228991</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml index b30c31c..ac8852a 100644 --- a/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml @@ -100,6 +100,7 @@ <flag name='nbd-tls'/> <flag name='virtual-css-bridge'/> <flag name='sdl-gl'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2007000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>219140</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml index eda68e5..1b053ce 100644 --- a/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml @@ -168,6 +168,7 @@ <flag name='sdl-gl'/> <flag name='hda-output'/> <flag name='mch'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2007000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>240497</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml index b010f73..40e5642 100644 --- a/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml @@ -103,6 +103,7 @@ <flag name='virtual-css-bridge'/> <flag name='sdl-gl'/> <flag name='vhost-vsock'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2007093</version> <kvmVersion>0</kvmVersion> <microcodeVersion>244554</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml index 627eb44..a521b78 100644 --- a/tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml @@ -171,6 +171,7 @@ <flag name='hda-output'/> <flag name='vhost-vsock'/> <flag name='mch'/> + <flag name='drive.qcow2.l2-cache-size'/> 
<version>2008000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>257152</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml index f97ebdb..97fce05 100644 --- a/tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml @@ -142,6 +142,7 @@ <flag name='hda-output'/> <flag name='blockdev-del'/> <flag name='vhost-vsock'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2009000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>349056</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml index 5a4371a..17fe2b9 100644 --- a/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml @@ -107,6 +107,7 @@ <flag name='sdl-gl'/> <flag name='blockdev-del'/> <flag name='vhost-vsock'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2009000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>267973</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml index 7bf31d9..cc72d86 100644 --- a/tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml @@ -186,6 +186,7 @@ <flag name='vmgenid'/> <flag name='vhost-vsock'/> <flag name='mch'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2009000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>340375</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml index a1e2ae6..021160e 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml @@ -160,6 +160,7 @@ <flag name='machine.pseries.cap-htm'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>2012050</version> 
<kvmVersion>0</kvmVersion> <microcodeVersion>444131</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml index 254a4cf..b8c2b67 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml @@ -100,6 +100,7 @@ <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml index e7ab79e..d072bbd 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml @@ -100,6 +100,7 @@ <flag name='chardev-fd-pass'/> <flag name='tpm-emulator'/> <flag name='egl-headless'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml index 3b5f981..f697c46 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml @@ -130,6 +130,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>387601</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml index 7ceea6b..e9bdaaa 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml @@ -205,6 +205,7 @@ <flag name='usb-storage.werror'/> <flag name='egl-headless'/> <flag name='vfio-pci.display'/> + <flag name='drive.qcow2.l2-cache-size'/> 
<version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>425157</microcodeVersion> -- 1.8.3.1
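The capability bit added above is probed by checking whether a path resolves in QEMU's query-qmp-schema output (a '+' component selects a union variant, matching the virQEMUCapsQMPSchemaQueries entries). A minimal sketch of that idea, using a hypothetical nested-dict view of the schema rather than libvirt's real flattened-schema walker:

```python
# Hypothetical, simplified view of a fragment of query-qmp-schema output.
# Real libvirt resolves these paths against the flattened QMP schema; the
# nested dict here only illustrates the path-matching concept.
SCHEMA = {
    "blockdev-add": {
        "arg-type": {
            "+qcow2": {  # '+' prefix selects a discriminated-union variant
                "l2-cache-size": {},
                "encrypt": {"+luks": {"key-secret": {}}},
            },
        },
    },
}

def schema_path_exists(schema, path):
    """Return True if every component of 'path' resolves in the schema."""
    node = schema
    for comp in path.split("/"):
        if comp not in node:
            return False
        node = node[comp]
    return True

print(schema_path_exists(SCHEMA, "blockdev-add/arg-type/+qcow2/l2-cache-size"))
```

If the path resolves, the corresponding QEMU_CAPS_* flag (here QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE) is set for that binary.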
On Thu, Nov 01, 2018 at 14:32:22 +0300, Nikolay Shirokovskiy wrote: Missing description of what the capability bit detects in the commit message.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 +
[...]
28 files changed, 30 insertions(+)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index e228f52..7d42254 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -508,6 +508,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, /* 315 */ "vfio-pci.display", "blockdev", + "drive.qcow2.l2-cache-size", );
@@ -1234,6 +1235,8 @@ static struct virQEMUCapsStringFlags virQEMUCapsQMPSchemaQueries[] = { { "blockdev-add/arg-type/+vxhs", QEMU_CAPS_VXHS}, { "blockdev-add/arg-type/+iscsi/password-secret", QEMU_CAPS_ISCSI_PASSWORD_SECRET }, { "blockdev-add/arg-type/+qcow2/encrypt/+luks/key-secret", QEMU_CAPS_QCOW2_LUKS }, + { "blockdev-add/arg-type/options/+qcow2/l2-cache-size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE},
This can be dropped ...
+ { "blockdev-add/arg-type/+qcow2/l2-cache-size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE},
... and this needs to be modified since the cache behaves differently as pointed out in 3/3.
{ "nbd-server-start/arg-type/tls-creds", QEMU_CAPS_NBD_TLS }, { "screendump/arg-type/device", QEMU_CAPS_SCREENDUMP_DEVICE }, { "block-commit/arg-type/*top", QEMU_CAPS_ACTIVE_COMMIT },
The only possible value is 'maximum', which makes l2_cache_size large enough to keep all metadata in memory. This unleashes disk performance for some database workloads and I/O benchmarks with random access across the whole disk. Note that the implementation sets l2-cache-size and not cache-size. Both *cache-size options are upper limits on the cache size, so instead of computing a precise limit for the disk, which would require knowing the disk size and the disk's cluster size, we can just set INT64_MAX. Unfortunately, both old and new versions of qemu fail when cache-size is set to INT64_MAX; fortunately, both old and new versions work well with that setting for l2-cache-size. As guest performance depends only on the l2 cache size and not on the refcount cache size (which is documented in recent qemu), we can set l2 directly.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- docs/formatdomain.html.in | 7 ++++ docs/schemas/domaincommon.rng | 10 +++++ src/conf/domain_conf.c | 17 ++++++++ src/conf/domain_conf.h | 9 ++++ src/qemu/qemu_command.c | 26 ++++++++++++ src/qemu/qemu_domain.c | 1 + .../qemuxml2argvdata/disk-metadata_cache_size.args | 34 +++++++++++++++ .../qemuxml2argvdata/disk-metadata_cache_size.xml | 42 +++++++++++++++++++ tests/qemuxml2argvtest.c | 2 + .../disk-metadata_cache_size.xml | 48 ++++++++++++++++++++++ tests/qemuxml2xmltest.c | 2 + 11 files changed, 198 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.args create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 8189959..93e0009 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3556,6 +3556,13 @@ virt queues for virtio-blk. (<span class="since">Since 3.9.0</span>) </li> <li> + The optional <code>metadata_cache_size</code> attribute specifies + metadata cache size policy. 
The only possible value is "maximum" to + keep all metadata in cache, this will help if workload needs access + to whole disk all the time. (<span class="since">Since + 4.9.0</span>) + </li> + <li> For virtio disks, <a href="#elementsVirtio">Virtio-specific options</a> can also be set. (<span class="since">Since 3.5.0</span>) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 099a949..18efa3a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1990,6 +1990,9 @@ <ref name="detect_zeroes"/> </optional> <optional> + <ref name="metadata_cache_size"/> + </optional> + <optional> <attribute name='queues'> <ref name="positiveInteger"/> </attribute> @@ -2090,6 +2093,13 @@ </choice> </attribute> </define> + <define name="metadata_cache_size"> + <attribute name='metadata_cache_size'> + <choice> + <value>maximum</value> + </choice> + </attribute> + </define> <define name="controller"> <element name="controller"> <optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index e8e0adc..04383f0 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -885,6 +885,11 @@ VIR_ENUM_IMPL(virDomainDiskDetectZeroes, VIR_DOMAIN_DISK_DETECT_ZEROES_LAST, "on", "unmap") +VIR_ENUM_IMPL(virDomainDiskMetadataCacheSize, + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST, + "default", + "maximum") + VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST, "none", "yes", @@ -9347,6 +9352,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDefPtr def, } VIR_FREE(tmp); + if ((tmp = virXMLPropString(cur, "metadata_cache_size")) && + (def->metadata_cache_size = virDomainDiskMetadataCacheSizeTypeFromString(tmp)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown driver metadata_cache_size value '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + ret = 0; cleanup: @@ -23874,6 +23887,10 @@ virDomainDiskDefFormatDriver(virBufferPtr buf, if (disk->queues) virBufferAsprintf(&driverBuf, " 
queues='%u'", disk->queues); + if (disk->metadata_cache_size) + virBufferAsprintf(&driverBuf, " metadata_cache_size='%s'", + virDomainDiskMetadataCacheSizeTypeToString(disk->metadata_cache_size)); + virDomainVirtioOptionsFormat(&driverBuf, disk->virtio); ret = virXMLFormatElement(buf, "driver", &driverBuf, NULL); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e30a4b2..b155058 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -568,6 +568,13 @@ typedef enum { VIR_DOMAIN_DISK_DETECT_ZEROES_LAST } virDomainDiskDetectZeroes; +typedef enum { + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT = 0, + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM, + + VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST +} virDomainDiskMetadataCacheSize; + typedef struct _virDomainBlockIoTuneInfo virDomainBlockIoTuneInfo; struct _virDomainBlockIoTuneInfo { unsigned long long total_bytes_sec; @@ -672,6 +679,7 @@ struct _virDomainDiskDef { int discard; /* enum virDomainDiskDiscard */ unsigned int iothread; /* unused = 0, > 0 specific thread # */ int detect_zeroes; /* enum virDomainDiskDetectZeroes */ + int metadata_cache_size; /* enum virDomainDiskMetadataCacheSize */ char *domain_name; /* backend domain name */ unsigned int queues; virDomainVirtioOptionsPtr virtio; @@ -3388,6 +3396,7 @@ VIR_ENUM_DECL(virDomainDeviceSGIO) VIR_ENUM_DECL(virDomainDiskTray) VIR_ENUM_DECL(virDomainDiskDiscard) VIR_ENUM_DECL(virDomainDiskDetectZeroes) +VIR_ENUM_DECL(virDomainDiskMetadataCacheSize) VIR_ENUM_DECL(virDomainDiskMirrorState) VIR_ENUM_DECL(virDomainController) VIR_ENUM_DECL(virDomainControllerModelPCI) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1ff593c..b33e6a5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1330,6 +1330,21 @@ qemuCheckDiskConfig(virDomainDiskDefPtr disk, return -1; } + if (disk->metadata_cache_size && + (disk->src->type != VIR_STORAGE_TYPE_FILE || + disk->src->format != VIR_STORAGE_FILE_QCOW2)) { + 
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("metadata_cache_size can only be set for qcow2 disks")); + return -1; + } + + if (disk->metadata_cache_size && + disk->metadata_cache_size != VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("metadata_cache_size can only be set to 'maximum'")); + return -1; + } + if (qemuCaps) { if (disk->serial && disk->bus == VIR_DOMAIN_DISK_BUS_SCSI && @@ -1353,6 +1368,14 @@ qemuCheckDiskConfig(virDomainDiskDefPtr disk, _("detect_zeroes is not supported by this QEMU binary")); return -1; } + + if (disk->metadata_cache_size && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("setting metadata_cache_size is not supported by " + "this QEMU binary")); + return -1; + } } if (disk->serial && @@ -1776,6 +1799,9 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virDomainDiskIoTypeToString(disk->iomode)); } + if (disk->metadata_cache_size == VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM) + virBufferAsprintf(&opt, ",l2-cache-size=%ld", INT64_MAX); + qemuBuildDiskThrottling(disk, &opt); if (virBufferCheckError(&opt) < 0) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ba3fff6..896adf3 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -9074,6 +9074,7 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, /* "snapshot" is a libvirt internal field and thus can be changed */ /* startupPolicy is allowed to be updated. Therefore not checked here. */ CHECK_EQ(transient, "transient", true); + CHECK_EQ(metadata_cache_size, "metadata_cache_size", true); /* Note: For some address types the address auto generation for * @disk has still not happened at this point (e.g. 
driver diff --git a/tests/qemuxml2argvdata/disk-metadata_cache_size.args b/tests/qemuxml2argvdata/disk-metadata_cache_size.args new file mode 100644 index 0000000..5e67519 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-metadata_cache_size.args @@ -0,0 +1,34 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name test \ +-S \ +-machine pc-0.13,accel=tcg,usb=off,dump-guest-core=off \ +-m 1024 \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid 468404ad-d49c-40f2-9e14-02294f9c1be3 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-test/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot menu=on \ +-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \ +-usb \ +-drive file=/var/lib/libvirt/images/f14.img,format=qcow2,if=none,\ +id=drive-virtio-disk0,l2-cache-size=9223372036854775807 \ +-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,\ +id=virtio-disk0,bootindex=2 \ +-drive file=/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso,format=raw,if=none,\ +id=drive-ide0-1-0,media=cdrom,readonly=on \ +-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,\ +bootindex=1 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/disk-metadata_cache_size.xml b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml new file mode 100644 index 0000000..8ac2599 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml @@ -0,0 +1,42 @@ +<domain type='qemu'> + <name>test</name> + <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc-0.13'>hvm</type> + <boot dev='cdrom'/> + <boot dev='hd'/> + <bootmenu enable='yes'/> + </os> 
+ <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/> + <source file='/var/lib/libvirt/images/f14.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='file' device='cdrom'> + <driver name='qemu' type='raw'/> + <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </controller> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 39a7f1f..a0a2ff3 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3044,6 +3044,8 @@ mymain(void) DO_TEST_CAPS_ARCH_LATEST("x86_64-pc-headless", "x86_64"); DO_TEST_CAPS_ARCH_LATEST("x86_64-q35-headless", "x86_64"); + DO_TEST("disk-metadata_cache_size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakerootdir); diff --git a/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml new file mode 100644 index 0000000..5fed22b --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml @@ -0,0 +1,48 @@ +<domain type='qemu'> + <name>test</name> + <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory 
unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc-0.13'>hvm</type> + <boot dev='cdrom'/> + <boot dev='hd'/> + <bootmenu enable='yes'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/> + <source file='/var/lib/libvirt/images/f14.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='file' device='cdrom'> + <driver name='qemu' type='raw'/> + <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </controller> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 89640f6..c44e0fe 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1235,6 +1235,8 @@ mymain(void) DO_TEST("riscv64-virt", QEMU_CAPS_DEVICE_VIRTIO_MMIO); + DO_TEST("disk-metadata_cache_size", NONE); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakerootdir); -- 1.8.3.1
On Thu, Nov 01, 2018 at 14:32:23 +0300, Nikolay Shirokovskiy wrote:
The only possible value is 'maximum', which makes l2_cache_size large enough to keep all metadata in memory. This unlocks disk performance for some database workloads and I/O benchmarks with random access across the whole disk.
Note that the implementation sets l2-cache-size and not cache-size. Both *cache-size options are upper limits on the cache size, so instead of computing a precise limit for the disk, which would involve knowing the disk size and the disk's cluster size, we can just set INT64_MAX. Unfortunately both old and new versions of qemu fail when cache-size is set to INT64_MAX. Fortunately both old and new versions work well with that setting for l2-cache-size. As guest performance depends only on the L2 cache size and not on the refcount cache size (which is documented in recent qemu), we can set l2-cache-size directly.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- docs/formatdomain.html.in | 7 ++++ docs/schemas/domaincommon.rng | 10 +++++ src/conf/domain_conf.c | 17 ++++++++ src/conf/domain_conf.h | 9 ++++ src/qemu/qemu_command.c | 26 ++++++++++++ src/qemu/qemu_domain.c | 1 + .../qemuxml2argvdata/disk-metadata_cache_size.args | 34 +++++++++++++++ .../qemuxml2argvdata/disk-metadata_cache_size.xml | 42 +++++++++++++++++++ tests/qemuxml2argvtest.c | 2 + .../disk-metadata_cache_size.xml | 48 ++++++++++++++++++++++ tests/qemuxml2xmltest.c | 2 + 11 files changed, 198 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.args create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 8189959..93e0009 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3556,6 +3556,13 @@ virt queues for virtio-blk. (<span class="since">Since 3.9.0</span>) </li> <li> + The optional <code>metadata_cache_size</code> attribute specifies + the metadata cache size policy. The only possible value is "maximum", + which keeps all metadata in cache; this helps if the workload needs + access to the whole disk all the time. (<span class="since">Since + 4.9.0</span>)
I wanted to complain that we prefer camelCase to underscores generally, but given that the <driver> element already has at least 4 attributes using underscores, that point is moot. What's missing, though, is a description of the default behaviour when the attribute is not present. Also, I think we should allow passing "default" as a valid value.
+ </li> + <li> For virtio disks, <a href="#elementsVirtio">Virtio-specific options</a> can also be set. (<span class="since">Since 3.5.0</span>) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 099a949..18efa3a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1990,6 +1990,9 @@ <ref name="detect_zeroes"/> </optional> <optional> + <ref name="metadata_cache_size"/> + </optional> + <optional> <attribute name='queues'> <ref name="positiveInteger"/> </attribute> @@ -2090,6 +2093,13 @@ </choice> </attribute> </define> + <define name="metadata_cache_size"> + <attribute name='metadata_cache_size'> + <choice> + <value>maximum</value>
Here default should be allowed as well.
+ </choice> + </attribute> + </define> <define name="controller"> <element name="controller"> <optional>
[...]
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1ff593c..b33e6a5 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1330,6 +1330,21 @@ qemuCheckDiskConfig(virDomainDiskDefPtr disk, return -1; }
+ if (disk->metadata_cache_size && + (disk->src->type != VIR_STORAGE_TYPE_FILE ||
Why not do this for network-based qcow2s as well?
+ disk->src->format != VIR_STORAGE_FILE_QCOW2)) {
Note that a QCOW2 image can also be part of the backing chain when the top image format is not qcow2. In such a case this would not work. Without -blockdev support I don't see a possibility to expose that to qemu, though, since I expect the -drive command line would be rejected if the top image is not qcow2.
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("metadata_cache_size can only be set for qcow2 disks")); + return -1; + } + + if (disk->metadata_cache_size &&
This part is common to both of the above conditions.
+ disk->metadata_cache_size != VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("metadata_cache_size can only be set to 'maximum'")); + return -1; + } + if (qemuCaps) { if (disk->serial && disk->bus == VIR_DOMAIN_DISK_BUS_SCSI && @@ -1353,6 +1368,14 @@ qemuCheckDiskConfig(virDomainDiskDefPtr disk, _("detect_zeroes is not supported by this QEMU binary")); return -1; } + + if (disk->metadata_cache_size && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("setting metadata_cache_size is not supported by " + "this QEMU binary")); + return -1; + } }
if (disk->serial && @@ -1776,6 +1799,9 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virDomainDiskIoTypeToString(disk->iomode));
Additions to XML should be in a separate patch from actual implementation.
}
+ if (disk->metadata_cache_size == VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM) + virBufferAsprintf(&opt, ",l2-cache-size=%lld", (long long) INT64_MAX); + qemuBuildDiskThrottling(disk, &opt);
if (virBufferCheckError(&opt) < 0) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ba3fff6..896adf3 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -9074,6 +9074,7 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, /* "snapshot" is a libvirt internal field and thus can be changed */ /* startupPolicy is allowed to be updated. Therefore not checked here. */ CHECK_EQ(transient, "transient", true); + CHECK_EQ(metadata_cache_size, "metadata_cache_size", true);
/* Note: For some address types the address auto generation for * @disk has still not happened at this point (e.g. driver
[...]
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 39a7f1f..a0a2ff3 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3044,6 +3044,8 @@ mymain(void) DO_TEST_CAPS_ARCH_LATEST("x86_64-pc-headless", "x86_64"); DO_TEST_CAPS_ARCH_LATEST("x86_64-q35-headless", "x86_64");
+ DO_TEST("disk-metadata_cache_size", QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE);
Please use DO_TEST_LATEST for this rather than hardcoding caps.
+ if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakerootdir);
Just set l2-cache-size to INT64_MAX for all format nodes of qcow2 type in the block node graph. AFAIK this is sane because the *actual* cache size depends on the size of the data referenced in the image, and thus the total of all cache sizes across all images in a disk's backing chain will not exceed the cache size that covers just one full image, as in the case of no backing chain. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_block.c | 5 ++++- src/qemu/qemu_domain.c | 1 + src/util/virstoragefile.c | 1 + src/util/virstoragefile.h | 1 + 4 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c index 5321dda..8771cc1 100644 --- a/src/qemu/qemu_block.c +++ b/src/qemu/qemu_block.c @@ -1322,7 +1322,6 @@ qemuBlockStorageSourceGetFormatQcow2Props(virStorageSourcePtr src, * 'pass-discard-snapshot' * 'pass-discard-other' * 'overlap-check' - * 'l2-cache-size' * 'l2-cache-entry-size' * 'refcount-cache-size' * 'cache-clean-interval' @@ -1331,6 +1330,10 @@ qemuBlockStorageSourceGetFormatQcow2Props(virStorageSourcePtr src, if (qemuBlockStorageSourceGetFormatQcowGenericProps(src, "qcow2", props) < 0) return -1; + if (src->metadata_cache_size == VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM && + virJSONValueObjectAdd(props, "I:l2-cache-size", INT64_MAX, NULL) < 0) + return -1; + return 0; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 896adf3..f87cfd2 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -13245,6 +13245,7 @@ qemuDomainPrepareDiskSourceData(virDomainDiskDefPtr disk, src->iomode = disk->iomode; src->cachemode = disk->cachemode; src->discard = disk->discard; + src->metadata_cache_size = disk->metadata_cache_size; if (disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY) src->floppyimg = true; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 94c32d8..9089e2f 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -2210,6 +2210,7 @@
virStorageSourceCopy(const virStorageSource *src, ret->cachemode = src->cachemode; ret->discard = src->discard; ret->detect_zeroes = src->detect_zeroes; + ret->metadata_cache_size = src->metadata_cache_size; /* storage driver metadata are not copied */ ret->drv = NULL; diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index 3ff6c4f..8b57399 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -331,6 +331,7 @@ struct _virStorageSource { int cachemode; /* enum virDomainDiskCache */ int discard; /* enum virDomainDiskDiscard */ int detect_zeroes; /* enum virDomainDiskDetectZeroes */ + int metadata_cache_size; /* enum virDomainDiskImageMetadataCacheSize */ bool floppyimg; /* set to true if the storage source is going to be used as a source for floppy drive */ -- 1.8.3.1
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
Just set l2-cache-size to INT64_MAX for all format nodes of qcow2 type in the block node graph.
AFAIK this is sane because the *actual* cache size depends on the size of the data referenced in the image, and thus the total of all cache sizes across all images in a disk's backing chain will not exceed the cache size that covers just one full image, as in the case of no backing chain.
This is not quite correct. Starting from qemu 3.1, INT64_MAX will add a cache that covers the whole image. Memory is only used if a cache entry is actually used, so if you never access the backing file, it doesn't really use any memory. However, the granularity isn't a single cluster here, but L2 tables. So if some L2 table contains one cluster in the overlay and another cluster in the backing file, and both are accessed, the L2 table will be cached in both the overlay and the backing file. More importantly, before 3.1, I think QEMU will actually try to allocate 2^63-1 bytes for the cache and fail to open the image. So we can't do this unconditionally. (Is this patch tested with a real QEMU?) Kevin
On 02.11.2018 13:23, Kevin Wolf wrote:
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
[...]
This is not quite correct.
Starting from qemu 3.1, INT64_MAX will add a cache that covers the whole image. Memory is only used if a cache entry is actually used, so if you never access the backing file, it doesn't really use any memory. However, the granularity isn't a single cluster here, but L2 tables. So if some L2 table contains one cluster in the overlay and another cluster in the backing file, and both are accessed, the L2 table will be cached in both the overlay and the backing file.
So we can end up with N times the memory usage for the L2 cache in the case of a backing chain, compared to a single image?
More importantly, before 3.1, I think QEMU will actually try to allocate 2^63-1 bytes for the cache and fail to open the image. So we can't do this unconditionally.
(Is this patch tested with a real QEMU?)
Testing was quite minimal: upstream, and 2.10 but with recent patches that make setting INT64_MAX possible, I guess. OK, then we have to use a version check in the capabilities code instead of a feature test. Nikolay
Am 02.11.2018 um 12:37 hat Nikolay Shirokovskiy geschrieben:
On 02.11.2018 13:23, Kevin Wolf wrote:
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
[...]
This is not quite correct.
Starting from qemu 3.1, INT64_MAX will add a cache that covers the whole image. Memory is only used if a cache entry is actually used, so if you never access the backing file, it doesn't really use any memory. However, the granularity isn't a single cluster here, but L2 tables. So if some L2 table contains one cluster in the overlay and another cluster in the backing file, and both are accessed, the L2 table will be cached in both the overlay and the backing file.
So we can end up N times bigger memory usage for L2 cache in case of backing chain comparing to single image?
In the worst case, yes. Kevin
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
Hi, all.
This is a patch series after offlist agreement on introducing a metadata-cache-size option for disks. The option itself is described in the 2nd patch of the series.
There have been plenty of attempts to add an option to set qcow2 metadata cache sizes in the past; see [1] [2] to name recent ones that have comments I tried to address or work I reused to some extent.
I would like to address Peter's comment in [1] that this option is an image's option rather than a disk's. While this makes sense if we set a specific cache value that covers the disk only partially, here, where we have a 'maximum' policy covering the whole disk, it makes sense to set the same value for all images. The value set is only an upper limit on the cache size, and thus the cache for every image will eventually grow proportionally to the image data size "visible from the top", I guess. And these sizes change during the guest's lifetime - thus we need to set the whole-disk limit for every image for the 'maximum' effect.
On the other hand, if we add policies different from 'maximum', that may require a per-image option. Well, I can't imagine a better name than 'driver' for a backing-chain element that would carry a cache size attribute. And driver is already used for the top image, so I leave it this way.
KNOWN ISSUES
1. when using -drive to configure disks on the command line, the cache size does not get propagated through the backing chain
2. when making an external disk snapshot, the cache size sneaks into the backing file filename, and thus the cache size for the backing image becomes remembered there
3. blockcommit after such a snapshot will not work, because libvirt is not ready for the backing file name being reported as json sometimes
Looks like 1 can be addressed in qemu.
I don't think we want to add inheritance of driver-specific options. Option inheritance is already a bit messy with options that every driver understands. And -drive is the obsolete interface anyway, when you use -blockdev you specify all nodes explicitly.
Honestly, I do not understand the reason for the behaviour in 2.
QEMU doesn't recognise that the cache size option doesn't affect which data you're accessing. Max had a series that should probably fix it. Maybe we should finally get it merged. Anyway, I suppose if you use libvirt in -blockdev mode, it doesn't make a difference anyway, because libvirt should then manually call blockdev-create for the overlay and therefore specify the backing file string explicitly. Can you confirm this in practice?
And 3 will naturally be addressed after blockjobs start working with the blockdev configuration, I guess.
Hopefully (Peter?), but depending on 2. it might not even be necessary if libvirt doesn't explicitly store json: pseudo-filenames. Kevin
On 02.11.2018 13:11, Kevin Wolf wrote:
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
[...]
I don't think we want to add inheritance of driver-specific options. Option inheritance is already a bit messy with options that every driver understands.
And -drive is the obsolete interface anyway, when you use -blockdev you specify all nodes explicitly.
Honestly, I do not understand the reason for the behaviour in 2.
QEMU doesn't recognise that the cache size option doesn't affect which data you're accessing. Max had a series that should probably fix it. Maybe we should finally get it merged.
Anyway, I suppose if you use libvirt in -blockdev mode, it doesn't make a difference anyway, because libvirt should then manually call blockdev-create for the overlay and therefore specify the backing file string explicitly.
Can you confirm this in practice?
Yes, there is no problem when using -blockdev. The cache size will be set explicitly for every qcow2 format node. The issue is only with -drive, when you, for example, start a VM with the 'maximum' policy, make a snapshot, destroy the VM, change the policy to default and then start the VM again. It is supposed that, as the cache size is not specified, it is set to qemu's default value, but it is not for the backing file, because the cache size is recorded in the filename. So if, due to the inheritance issue, we won't use this policy until libvirt switches to blockdev, then this point does not matter. Nikolay
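For illustration, with -blockdev every qcow2 format node carries the option explicitly, so nothing has to be inherited or encoded into the backing filename. A hypothetical QMP equivalent of what libvirt would emit for one format node in the chain (node names and the wiring are made up for this sketch):

```json
{
  "execute": "blockdev-add",
  "arguments": {
    "driver": "qcow2",
    "node-name": "libvirt-1-format",
    "file": "libvirt-1-storage",
    "backing": "libvirt-2-format",
    "l2-cache-size": 9223372036854775807
  }
}
```

The same l2-cache-size would be repeated on every qcow2 node of the backing chain, which is exactly the 'maximum' semantics described in the cover letter.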
And 3 will naturally be addressed after blockjobs start working with the blockdev configuration, I guess.
Hopefully (Peter?), but depending on 2. it might not even be necessary if libvirt doesn't explicitly store json: pseudo-filenames.
Kevin
On Fri, Nov 02, 2018 at 11:11:50 +0100, Kevin Wolf wrote:
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
[...]
I don't think we want to add inheritance of driver-specific options. Option inheritance is already a bit messy with options that every driver understands.
And -drive is the obsolete interface anyway, when you use -blockdev you specify all nodes explicitly.
So this would mean we get different behaviour depending on whether -blockdev is supported by libvirt and qemu or not. This means that there are the following possibilities:

1) allow this feature only with -blockdev
2) limit this only to the top layer image
3) make it configurable per backing chain entry

Given our discussion on the KVM forum, I don't think it makes sense to do 3, and 2 would make it incomplete, so 1 is the only reasonable option if qemu will not do the inheritance.
Honestly, I do not understand the reason for the behaviour in 2.
QEMU doesn't recognise that the cache size option doesn't affect which data you're accessing. Max had a series that should probably fix it. Maybe we should finally get it merged.
Anyway, I suppose if you use libvirt in -blockdev mode, it doesn't make a difference anyway, because libvirt should then manually call blockdev-create for the overlay and therefore specify the backing file string explicitly.
Yes, in that case we'll create it manually with the correct backing store path. Currently we'd ignore such an entry in the backing store path when detecting the chain from the disk so this should not affect anything.
Can you confirm this in practice?
And 3 will naturally be addressed after blockjobs start working with the blockdev configuration, I guess.
Did you test this? We do support backing files with 'json:' pseudo protocol.
Hopefully (Peter?), but depending on 2. it might not even be necessary if libvirt doesn't explicitly store json: pseudo-filenames.
Well, we will store them eventually in json pseudo-protocol format since in some cases (e.g. multi-host gluster) we don't have any other option.
On 07.11.2018 17:42, Peter Krempa wrote:
On Fri, Nov 02, 2018 at 11:11:50 +0100, Kevin Wolf wrote:
Am 01.11.2018 um 12:32 hat Nikolay Shirokovskiy geschrieben:
[...]
I don't think we want to add inheritance of driver-specific options. Option inheritance is already a bit messy with options that every driver understands.
And -drive is the obsolete interface anyway, when you use -blockdev you specify all nodes explicitly.
So this would mean we get different behaviour depending on whether -blockdev is supported by libvirt and qemu or not.
This means that there are the following possibilities:
1) allow this feature only with -blockdev
2) limit this only to the top layer image
3) make it configurable per backing chain entry.
Given our discussion on the KVM forum, I don't think it makes sense to do 3, 2 would make it incomplete, so 1 is the only reasonable option if qemu will not do the inheritance.
Honestly, I do not understand the reason for the behaviour in 2.
QEMU doesn't recognise that the cache size option doesn't affect which data you're accessing. Max had a series that should probably fix it. Maybe we should finally get it merged.
Anyway, I suppose if you use libvirt in -blockdev mode, it doesn't make a difference anyway, because libvirt should then manually call blockdev-create for the overlay and therefore specify the backing file string explicitly.
Yes, in that case we'll create it manually with the correct backing store path. Currently we'd ignore such an entry in the backing store path when detecting the chain from the disk so this should not affect anything.
Can you confirm this in practice?
And 3 will naturally be addressed after blockjobs start working with the blockdev configuration, I guess.
Did you test this? We do support backing files with 'json:' pseudo protocol.
Kind of :) I cannot even create a snapshot when using -blockdev (setting/unsetting metadata_cache_size makes no difference):

# cat snap.xml
<domainsnapshot>
  <disks>
    <disk name="sda" snapshot="external"/>
  </disks>
</domainsnapshot>

# virsh snapshot-create VM snap.xml --disk-only --no-metadata
error: internal error: unable to execute QEMU command 'transaction': Cannot find device=drive-scsi0-0-0-0 nor node_name=

But if I create a snapshot with qemu using -drive, then destroy the domain, change the qemu binary to one that supports -blockdev, start the domain and blockcommit, I get:

# virsh blockcommit VM sda --pivot
error: internal error: unable to find backing name for device drive-scsi0-0-0-0

Yeah, it looks like this does not relate to the json pseudo-protocol in the backing file, sorry. It is just that blockjobs are not yet supported for -blockdev, as mentioned in [1].

Ahhh, and blockcommit failed with the message

error: internal error: qemu block name 'json:{"driver": "qcow2", "l2-cache-size": "9223372036854775807", "file": {"driver": "file", "filename": "/somepath/harddisk.hdd"}}' doesn't match expected '/somepath/harddisk.hdd'

for versions of libvirt before the blockdev patches (libvirt-3.9.0). Sorry again. So it looks like we can merge the series after addressing your comments and leave only support for -blockdev configurations.

[1] https://www.redhat.com/archives/libvir-list/2018-August/msg00755.html

Nikolay
Hopefully (Peter?), but depending on 2. it might not even be necessary if libvirt doesn't explicitly store json: pseudo-filenames.
Well, we will store them eventually in json pseudo-protocol format since in some cases (e.g. multi-host gluster) we don't have any other option.