[PATCH Libvirt v2 0/3] support discard and write-zeroes options for virtio-blk device

v2: This version made some modifications. (include suggestion of Jonathon) - rebase on master - assume QEMU support discard and write-zeroes and drop the capability introduction and probe - add validation for options, only allowing attributes be configured for VIRTIO bus - enrich the test case - update NEWS Thanks Jonathon for pointing out the redundant commit in time. Please review! Yong DISCARD and WRITE_ZEROES commands has been implemented in virtio-blk protocol since qemu >= 4.2.0, may be it's time to introduce discard and write-zeroes options for virtio-blk device in libvirt so that the upper layer can enable this feature at disk granularity. To distinguish the discard option in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk disk. To try this out, three things has done in this patchset: 1. introduce capabilities of discard and write-zeroes for virtio-blk 2. add virtio_discard and virtio_write_zeroes attributes of driver in disk xml element 3. generate cmd line when launching vm Hyman Huang(黄勇) (3): conf: Add 'virtio_discard' and 'virtio_write_zeroes' attributes qemu_command: Generate cmd line for discard and write-zeroes properties NEWS: Mention support for discard and write_zeroes of virtio-blk device NEWS.rst | 8 +++ docs/formatdomain.rst | 8 +++ src/conf/domain_conf.c | 16 +++++ src/conf/domain_conf.h | 2 + src/conf/schemas/domaincommon.rng | 10 ++++ src/conf/storage_source_conf.c | 2 + src/conf/storage_source_conf.h | 2 + src/qemu/qemu_command.c | 2 + src/qemu/qemu_domain.c | 2 + src/qemu/qemu_driver.c | 4 +- src/qemu/qemu_validate.c | 9 +++ src/vz/vz_utils.c | 12 ++++ .../disk-virtio-discard.x86_64-latest.args | 45 ++++++++++++++ .../qemuxml2argvdata/disk-virtio-discard.xml | 45 ++++++++++++++ tests/qemuxml2argvtest.c | 1 + .../disk-virtio-discard.x86_64-latest.xml | 58 +++++++++++++++++++ tests/qemuxml2xmltest.c | 1 + 17 files changed, 226 insertions(+), 1 deletion(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-discard.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-discard.xml create mode 100644 tests/qemuxml2xmloutdata/disk-virtio-discard.x86_64-latest.xml -- 2.38.5

From: Hyman Huang(黄勇) <yong.huang@smartx.com> Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control whether discard and write-zeroes requests are handled by the virtio-blk device. To distinguish the attributes in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/domain_conf.c | 16 ++++++++++++++++ src/conf/domain_conf.h | 2 ++ src/conf/schemas/domaincommon.rng | 10 ++++++++++ src/conf/storage_source_conf.c | 2 ++ src/conf/storage_source_conf.h | 2 ++ src/qemu/qemu_domain.c | 2 ++ src/qemu/qemu_driver.c | 4 +++- src/vz/vz_utils.c | 12 ++++++++++++ 9 files changed, 57 insertions(+), 1 deletion(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 4af0b82569..7be12ff08e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element. value can be either "unmap" (allow the discard request to be passed) or "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU and KVM only)` + - The optional ``virtio_discard`` and ``virtio_write_zeroes`` are attributes + that control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES + commands introduced in virtio-blk protocol to improve performance when + using SSD backend. The value can be either 'on' or 'off'. Note that + ``discard`` and ``write_zeroes`` implementations in the block drive layer + are parts of the feature logically and should be turned on when enabling + the feature. :since:`Since 9.6.0 (QEMU and KVM only)` - The optional ``detect_zeroes`` attribute controls whether to detect zero write requests. The value can be "off", "on" or "unmap". First two values turn the detection off and on, respectively. The third value ("unmap") diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 5ac5c0b771..0f82b489f4 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7813,6 +7813,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDef *def, VIR_XML_PROP_NONZERO, &def->discard) < 0) return -1; + if (virXMLPropTristateSwitch(cur, "virtio_discard", VIR_XML_PROP_NONE, + &def->virtio_discard) < 0) + return -1; + + if (virXMLPropTristateSwitch(cur, "virtio_write_zeroes", VIR_XML_PROP_NONE, + &def->virtio_write_zeroes) < 0) + return -1; + if (virXMLPropUInt(cur, "iothread", 10, VIR_XML_PROP_NONZERO, &def->iothread) < 0) return -1; @@ -22514,6 +22522,14 @@ virDomainDiskDefFormatDriver(virBuffer *buf, virBufferAsprintf(&attrBuf, " discard='%s'", virDomainDiskDiscardTypeToString(disk->discard)); + if (disk->virtio_discard) + virBufferAsprintf(&attrBuf, " virtio_discard='%s'", + virTristateSwitchTypeToString(disk->virtio_discard)); + + if (disk->virtio_write_zeroes) + virBufferAsprintf(&attrBuf, " virtio_write_zeroes='%s'", + virTristateSwitchTypeToString(disk->virtio_write_zeroes)); + if (disk->iothread) virBufferAsprintf(&attrBuf, " iothread='%u'", disk->iothread); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c857ba556f..86e955333e 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -607,6 +607,8 @@ struct _virDomainDiskDef { unsigned int iothread; /* unused = 0, > 0 specific thread # */ virDomainDiskDetectZeroes detect_zeroes; virTristateSwitch discard_no_unref; + virTristateSwitch virtio_discard; + virTristateSwitch virtio_write_zeroes; char *domain_name; /* backend domain name */ unsigned int queues; unsigned int queue_size; diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index c2f56b0490..c3a59aeb15 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -2510,6 +2510,16 @@ <optional> <ref name="discard"/> </optional> + <optional> + <attribute name="virtio_discard"> + <ref name="virOnOff"/> + </attribute> + </optional> + <optional> + <attribute name="virtio_write_zeroes"> + <ref name="virOnOff"/> + </attribute> + </optional> <optional> <ref name="driverIOThread"/> </optional> diff --git a/src/conf/storage_source_conf.c b/src/conf/storage_source_conf.c index dcac3a8ff6..c2fee652f3 100644 --- a/src/conf/storage_source_conf.c +++ b/src/conf/storage_source_conf.c @@ -810,6 +810,8 @@ virStorageSourceCopy(const virStorageSource *src, def->discard = src->discard; def->detect_zeroes = src->detect_zeroes; def->discard_no_unref = src->discard_no_unref; + def->virtio_discard = src->virtio_discard; + def->virtio_write_zeroes = src->virtio_write_zeroes; def->sslverify = src->sslverify; def->readahead = src->readahead; def->timeout = src->timeout; diff --git a/src/conf/storage_source_conf.h b/src/conf/storage_source_conf.h index f13e7c756a..5e7f093fc0 100644 --- a/src/conf/storage_source_conf.h +++ b/src/conf/storage_source_conf.h @@ -400,6 +400,8 @@ struct _virStorageSource { int discard; /* enum virDomainDiskDiscard */ int detect_zeroes; /* enum virDomainDiskDetectZeroes */ virTristateSwitch discard_no_unref; + virTristateSwitch virtio_discard; + virTristateSwitch virtio_write_zeroes; bool floppyimg; /* set to true if the storage source is going to be used as a source for floppy drive */ diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 3700b3e711..634d8699d0 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11031,6 +11031,8 @@ qemuDomainPrepareDiskSourceData(virDomainDiskDef *disk, src->cachemode = disk->cachemode; src->discard = disk->discard; src->discard_no_unref = disk->discard_no_unref; + src->virtio_discard = disk->virtio_discard; + src->virtio_write_zeroes = disk->virtio_write_zeroes; if (disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY) src->floppyimg = true; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f20544590d..43e157da68 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14243,9 +14243,11 @@ qemuDomainBlockCopyCommon(virDomainObj *vm, * Since 'mirror' has the ambition to replace it we need to propagate * it into the mirror too. We do it directly as otherwise we'd need * to modify all callers of 'qemuDomainPrepareStorageSourceBlockdev' - * Same for discard_no_unref */ + * Same for discard_no_unref,virtio_discard and virtio_write_zeroes */ mirror->detect_zeroes = disk->detect_zeroes; mirror->discard_no_unref = disk->discard_no_unref; + mirror->virtio_discard = disk->virtio_discard; + mirror->virtio_write_zeroes = disk->virtio_write_zeroes; /* If reusing an external image that includes a backing file but the user * did not enumerate the chain in the XML we need to detect the chain */ diff --git a/src/vz/vz_utils.c b/src/vz/vz_utils.c index 7db7dbd419..1d824a9d2b 100644 --- a/src/vz/vz_utils.c +++ b/src/vz/vz_utils.c @@ -353,6 +353,18 @@ vzCheckDiskUnsupportedParams(virDomainDiskDef *disk) return -1; } + if (disk->virtio_discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Disk virtio_discard is not supported by vz driver.")); + return -1; + } + + if (disk->virtio_write_zeroes) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Disk virtio_write_zeroes is not supported by vz driver.")); + return -1; + } + if (disk->startupPolicy != VIR_DOMAIN_STARTUP_POLICY_DEFAULT) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Setting up disk startup policy is not " -- 2.38.5

On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
From: Hyman Huang(黄勇) <yong.huang@smartx.com>
Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control whether discard and write-zeroes requests are handled by the virtio-blk device.
To distinguish the attributes in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/domain_conf.c | 16 ++++++++++++++++ src/conf/domain_conf.h | 2 ++ src/conf/schemas/domaincommon.rng | 10 ++++++++++ src/conf/storage_source_conf.c | 2 ++ src/conf/storage_source_conf.h | 2 ++ src/qemu/qemu_domain.c | 2 ++ src/qemu/qemu_driver.c | 4 +++- src/vz/vz_utils.c | 12 ++++++++++++ 9 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 4af0b82569..7be12ff08e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element. value can be either "unmap" (allow the discard request to be passed) or "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU and KVM only)` + - The optional ``virtio_discard`` and ``virtio_write_zeroes`` are attributes + that control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES + commands introduced in virtio-blk protocol to improve performance when + using SSD backend. The value can be either 'on' or 'off'. Note that + ``discard`` and ``write_zeroes`` implementations in the block drive layer + are parts of the feature logically and should be turned on when enabling + the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
Based on current released qemu both 'discard' and 'write-zeroes' feature of the 'virtio-blk' device is enabled by default: $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard discard=<bool> - on/off (default: true) discard_granularity=<size> - (default: 4294967295) max-discard-sectors=<uint32> - (default: 4194303) report-discard-granularity=<bool> - (default: true) $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes max-write-zeroes-sectors=<uint32> - (default: 4194303) write-zeroes=<bool> - on/off (default: true) Do you need a way to disable this feature? Based on the description the default seems to be sane and actually what you'd want users to set. The feature was introduced to qemu by: commit 5c81161f804144b146607f890e84613a4cbad95c Author: Stefano Garzarella <sgarzare@redhat.com> Date: Thu Feb 21 11:33:07 2019 +0100 virtio-blk: add "discard" and "write-zeroes" properties In order to avoid migration issues, we enable DISCARD and WRITE_ZEROES features only for machine type >= 4.0 As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the list [1], DISCARD operation should not have security implications (eg. page cache attacks), so we can enable it by default. The default was always enabled for all new machine types, so you should only ever need to update your machine type.

On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
From: Hyman Huang(黄勇) <yong.huang@smartx.com>
Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control whether discard and write-zeroes requests are handled by the virtio-blk device.
To distinguish the attributes in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/domain_conf.c | 16 ++++++++++++++++ src/conf/domain_conf.h | 2 ++ src/conf/schemas/domaincommon.rng | 10 ++++++++++ src/conf/storage_source_conf.c | 2 ++ src/conf/storage_source_conf.h | 2 ++ src/qemu/qemu_domain.c | 2 ++ src/qemu/qemu_driver.c | 4 +++- src/vz/vz_utils.c | 12 ++++++++++++ 9 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 4af0b82569..7be12ff08e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element. value can be either "unmap" (allow the discard request to be
On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote: passed) or
"ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU
and KVM
only)` + - The optional ``virtio_discard`` and ``virtio_write_zeroes`` are
+ that control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES + commands introduced in virtio-blk protocol to improve performance when + using SSD backend. The value can be either 'on' or 'off'. Note
attributes that
+ ``discard`` and ``write_zeroes`` implementations in the block drive layer + are parts of the feature logically and should be turned on when enabling + the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
Based on current released qemu both 'discard' and 'write-zeroes' feature of the 'virtio-blk' device is enabled by default:
$ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard discard=<bool> - on/off (default: true) discard_granularity=<size> - (default: 4294967295) max-discard-sectors=<uint32> - (default: 4194303) report-discard-granularity=<bool> - (default: true) $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes max-write-zeroes-sectors=<uint32> - (default: 4194303) write-zeroes=<bool> - on/off (default: true)
Do you need a way to disable this feature? Based on the description the default seems to be sane and actually what you'd want users to set.
The default is reasonable indeed. But there are still some scenarios in the production environment that discard may need to be disabled. For example: - The virtio-blk discard/write-zeroes commands was introduced in 2017 in virtio-blk spec, so the OS distribution before 2017 can not use this feature. In that case, the cloud management platform(CMP) could recognize this issue and disable the feature in advance. - Discard/write-zeroes may need to be configured at disk granularity in some scenarios. Assume that CMP support 2 distributed storage clusters, one cluster supports discard and another does not. If the VM is configured with 2 disks - vda, vdb. Which are located in 2 clusters respectively. Or, the VM with disks located in the discard-supportive cluster and want to migrate disks to another cluster for some reason(eg. storage capability is exhausted) CMP may want to turn discard off explicitly. Though the above scenarios are quite rare and the virtio spec can ensure the feature can be negotiated and used correctly. CMP still wants to control the features it supports carefully and precisely. To summarise, IMHO, libvirt may not forbid the upper layer to control the 2 features of the virtio-blk disk. Leaving the option configurable for CMP or even customers. There's one more background may need to be stated: Current released QEMU do not provide hmp/qmp to query the final features that are negotiated between front-end and back-end from the hypervisor side. So if CMP wants to query what features a guest VM uses, it has to query it inside the guest VM or hack the qemu process. This way is inconvenient for control-planes, so the CMP needs to control the feature as aggressively as it can. Thank Peter for the attention to this patchset. Yong
The feature was introduced to qemu by:
commit 5c81161f804144b146607f890e84613a4cbad95c Author: Stefano Garzarella <sgarzare@redhat.com> Date: Thu Feb 21 11:33:07 2019 +0100
virtio-blk: add "discard" and "write-zeroes" properties
In order to avoid migration issues, we enable DISCARD and WRITE_ZEROES features only for machine type >= 4.0
As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the list [1], DISCARD operation should not have security implications (eg. page cache attacks), so we can enable it by default.
The default was always enabled for all new machine types, so you should only ever need to update your machine type.
-- Best regards

On Wed, Jul 19, 2023 at 11:52:45 +0800, Yong Huang wrote:
On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
From: Hyman Huang(黄勇) <yong.huang@smartx.com>
Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control whether discard and write-zeroes requests are handled by the virtio-blk device.
To distinguish the attributes in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/domain_conf.c | 16 ++++++++++++++++ src/conf/domain_conf.h | 2 ++ src/conf/schemas/domaincommon.rng | 10 ++++++++++ src/conf/storage_source_conf.c | 2 ++ src/conf/storage_source_conf.h | 2 ++ src/qemu/qemu_domain.c | 2 ++ src/qemu/qemu_driver.c | 4 +++- src/vz/vz_utils.c | 12 ++++++++++++ 9 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 4af0b82569..7be12ff08e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element. value can be either "unmap" (allow the discard request to be
On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote: passed) or
"ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU
and KVM
only)` + - The optional ``virtio_discard`` and ``virtio_write_zeroes`` are
+ that control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES + commands introduced in virtio-blk protocol to improve performance when + using SSD backend. The value can be either 'on' or 'off'. Note
attributes that
+ ``discard`` and ``write_zeroes`` implementations in the block drive layer + are parts of the feature logically and should be turned on when enabling + the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
Based on current released qemu both 'discard' and 'write-zeroes' feature of the 'virtio-blk' device is enabled by default:
$ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard discard=<bool> - on/off (default: true) discard_granularity=<size> - (default: 4294967295) max-discard-sectors=<uint32> - (default: 4194303) report-discard-granularity=<bool> - (default: true) $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes max-write-zeroes-sectors=<uint32> - (default: 4194303) write-zeroes=<bool> - on/off (default: true)
Do you need a way to disable this feature? Based on the description the default seems to be sane and actually what you'd want users to set.
The default is reasonable indeed. But there are still some scenarios in the production environment that discard may need to be disabled. For example: - The virtio-blk discard/write-zeroes commands was introduced in 2017 in virtio-blk spec, so the OS distribution before 2017 can not use this feature. In that case, the cloud management platform(CMP) could recognize this issue and disable the feature in advance.
Older drivers which do not support the feature are supposed to ignore any new feature bits: https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#... 2.2 Feature Bits Each virtio device offers all the features it understands. During device initialization, the driver reads this and tells the device the subset that it accepts. The only way to renegotiate is to reset the device. This allows for forwards and backwards compatibility: if the device is enhanced with a new feature bit, older drivers will not write that feature bit back to the device. Similarly, if a driver is enhanced with a feature that the device doesn’t support, it see the new feature is not offered. Based on the above it's just fine to use a older OS with the new device definition and management doesn't need to do anything to ensure compatibility.
- Discard/write-zeroes may need to be configured at disk granularity in some scenarios. Assume that CMP support 2 distributed storage clusters, one cluster supports discard and another does not. If the VM is configured with 2 disks - vda, vdb. Which are located in 2 clusters respectively. Or, the VM with disks located in the discard-supportive cluster and want to migrate disks to another cluster for some reason(eg. storage capability is exhausted) CMP may want to turn discard off explicitly.
There are two distinct operations which have different requirements and limitations based on the specification: - discard: The virtio spec says: "For discard commands, the device MAY deallocate the specified range of sectors in the device backend storage." Thus the device implementation is free to ignore any unsupported discard requests. In fact the implementation in qemu does ignore the request if e.g. the device's backend has discard disabled. The backend is in fact configured by the <driver discard="unmap"> attribute, thus if you need to disable discard you can do that even now on the backend level. - write zeroes: Write zeroes becomes mandatory once negotiated, but qemu has it's own implementation which has integrated fallback of simply writing zeroes to the backing image if there isn't any faster method available.
Though the above scenarios are quite rare and the virtio spec can ensure the feature can be negotiated and used correctly. CMP still wants to control the features it supports carefully and precisely.
As we both note there's not an actual technical problem here, right? As in, everything will work as configured. I'd prefer if you could justify/elaborate a bit more why you still need to control those features.
To summarise, IMHO, libvirt may not forbid the upper layer to control the 2 features of the virtio-blk disk. Leaving the option configurable for CMP or even customers.
As noted above, disard requests reaching the block device below the image can be already configured via <driver discard="unmap/ignore">. With the 'write-zeroes' feature there isn't really any semantic difference regardless of what feature is supported by the underlying storage. The difference is only in who actually does the actual writing of zeroes.
There's one more background may need to be stated: Current released QEMU do not provide hmp/qmp to query the final features that are negotiated between front-end and back-end from the hypervisor side. So if CMP wants to query what features a guest VM uses, it has to query it inside the guest VM or hack the qemu process.
I don't quite understand why that would be something any management application needs to know. The hypervisor provides the features and if the guest OS decides not to use them it' doesn't really matter, or does it? There are multiple other layers apart from the hypervisor support for passing discards and the software that runs inside the VM (kernel, volume manager, filesystem driver, filesystem configuration) which can decide to use/don't use discards so the knowledge at the hypervisor level doesn't tell you much.
This way is inconvenient for control-planes, so the CMP needs to control the feature as aggressively as it can.
Based on the above, the only thing you can achieve is to forbid discards for a VM on another level than those already supported. You will not be able to force the OS to use discards and neither will be able to tell whether the OS does use them in any other way you do now can (you still can query number of discard requests send to the image backend).
Thank Peter for the attention to this patchset.
My objections to this patchset are on the fact that the code originally didn't provide much justification for this patch. With your explanations I can't say I'm persuaded more of the contrary. If you decide to still pursue this patchset: - Add justification to the commit message why that is needed. This is needed because the code itself can't justify the need for itself. Additionally do not use the first justification from this message, it simply doesn't have technical grounds. - rename the 'virtio_discard' and 'virtio_write_zeroes' features to 'frontend_discard' and 'frontend_write_zeroes', so that if any other device frontend will expose such knobs they can be used without the need for yet another knob - modify the documentation where it states that in the vast majority of cases the user doesn't need to configure it because it simply does the right thing (this is actually the main objection, if the user doesn't need to set a feature it doesn't make sense to expose a knob) - Since this is a device frontend feature you'll need to add a ABI stability check for it (see. virDomainDiskDefCheckABIStability). Note that the existing discard/write_zeroes knobs don't need a ABI stability check because they are backend features thus can be changed e.g. on migration.

I'm sorry for the late reply, we have discussed your point of view and agree with the point, so we stop pursuing this patchset unless we find a persuasive scene that needs to disable these features. Thank Peter for the explanation of the main objection. :) Yong On Wed, Jul 19, 2023 at 6:17 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
From: Hyman Huang(黄勇) <yong.huang@smartx.com>
Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control whether discard and write-zeroes requests are handled by the virtio-blk device.
To distinguish the attributes in block drive layer, use the 'virtio' prefix to indicate that these attributes are specific for virtio-blk.
Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/domain_conf.c | 16 ++++++++++++++++ src/conf/domain_conf.h | 2 ++ src/conf/schemas/domaincommon.rng | 10 ++++++++++ src/conf/storage_source_conf.c | 2 ++ src/conf/storage_source_conf.h | 2 ++ src/qemu/qemu_domain.c | 2 ++ src/qemu/qemu_driver.c | 4 +++- src/vz/vz_utils.c | 12 ++++++++++++ 9 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 4af0b82569..7be12ff08e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the ``disk`` element. value can be either "unmap" (allow the discard request to be
On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote: passed) or
"ignore" (ignore the discard request). :since:`Since 1.0.6
(QEMU and KVM
only)` + - The optional ``virtio_discard`` and ``virtio_write_zeroes``
are attributes
+ that control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES + commands introduced in virtio-blk protocol to improve
On Wed, Jul 19, 2023 at 11:52:45 +0800, Yong Huang wrote: performance
+ using SSD backend. The value can be either 'on' or 'off'. Note
when that
+ ``discard`` and ``write_zeroes`` implementations in the block drive layer + are parts of the feature logically and should be turned on when enabling + the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
Based on current released qemu both 'discard' and 'write-zeroes' feature of the 'virtio-blk' device is enabled by default:
$ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard discard=<bool> - on/off (default: true) discard_granularity=<size> - (default: 4294967295) max-discard-sectors=<uint32> - (default: 4194303) report-discard-granularity=<bool> - (default: true) $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes max-write-zeroes-sectors=<uint32> - (default: 4194303) write-zeroes=<bool> - on/off (default: true)
Do you need a way to disable this feature? Based on the description the default seems to be sane and actually what you'd want users to set.
The default is reasonable indeed. But there are still some scenarios in the production environment that discard may need to be disabled. For example: - The virtio-blk discard/write-zeroes commands was introduced in 2017 in virtio-blk spec, so the OS distribution before 2017 can not use this feature. In that case, the cloud management platform(CMP) could recognize this issue and disable the feature in advance.
Older drivers which do not support the feature are supposed to ignore any new feature bits:
https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#...
2.2 Feature Bits
Each virtio device offers all the features it understands. During device initialization, the driver reads this and tells the device the subset that it accepts. The only way to renegotiate is to reset the device.
This allows for forwards and backwards compatibility: if the device is enhanced with a new feature bit, older drivers will not write that feature bit back to the device. Similarly, if a driver is enhanced with a feature that the device doesn’t support, it see the new feature is not offered.
Based on the above it's just fine to use a older OS with the new device definition and management doesn't need to do anything to ensure compatibility.
- Discard/write-zeroes may need to be configured at disk granularity in some scenarios. Assume that CMP support 2 distributed storage clusters, one cluster supports discard and another does not. If the VM is configured with 2 disks - vda, vdb. Which are located in 2 clusters respectively. Or, the VM with disks located in the discard-supportive cluster and want to migrate disks to another cluster for some reason(eg. storage capability is exhausted) CMP may want to turn discard off explicitly.
There are two distinct operations which have different requirements and limitations based on the specification:
- discard: The virtio spec says: "For discard commands, the device MAY deallocate the specified range of sectors in the device backend storage."
Thus the device implementation is free to ignore any unsupported discard requests. In fact the implementation in qemu does ignore the request if e.g. the device's backend has discard disabled.
The backend is in fact configured by the <driver discard="unmap"> attribute, thus if you need to disable discard you can do that even now on the backend level.
- write zeroes:
Write zeroes becomes mandatory once negotiated, but qemu has it's own implementation which has integrated fallback of simply writing zeroes to the backing image if there isn't any faster method available.
Though the above scenarios are quite rare and the virtio spec can ensure the feature can be negotiated and used correctly. CMP still wants to control the features it supports carefully and precisely.
As we both note there's not an actual technical problem here, right? As in, everything will work as configured.
I'd prefer if you could justify/elaborate a bit more why you still need to control those features.
To summarise, IMHO, libvirt may not forbid the upper layer to control the 2 features of the virtio-blk disk. Leaving the option configurable for CMP or even customers.
As noted above, disard requests reaching the block device below the image can be already configured via <driver discard="unmap/ignore">.
With the 'write-zeroes' feature there isn't really any semantic difference regardless of what feature is supported by the underlying storage. The difference is only in who actually does the actual writing of zeroes.
There's one more background may need to be stated: Current released QEMU do not provide hmp/qmp to query the final features that are negotiated between front-end and back-end from the hypervisor side. So if CMP wants to query what features a guest VM uses, it has to query it inside the guest VM or hack the qemu process.
I don't quite understand why that would be something any management application needs to know. The hypervisor provides the features and if the guest OS decides not to use them it' doesn't really matter, or does it?
There are multiple other layers apart from the hypervisor support for passing discards and the software that runs inside the VM (kernel, volume manager, filesystem driver, filesystem configuration) which can decide to use/don't use discards so the knowledge at the hypervisor level doesn't tell you much.
This way is inconvenient for control-planes, so the CMP needs to control the feature as aggressively as it can.
Based on the above, the only thing you can achieve is to forbid discards for a VM on another level than those already supported. You will not be able to force the OS to use discards and neither will be able to tell whether the OS does use them in any other way you do now can (you still can query number of discard requests send to the image backend).
Thank Peter for the attention to this patchset.
My objections to this patchset are on the fact that the code originally didn't provide much justification for this patch. With your explanations I can't say I'm persuaded more of the contrary.
If you decide to still pursue this patchset:
- Add justification to the commit message why that is needed. This is needed because the code itself can't justify the need for itself. Additionally do not use the first justification from this message, it simply doesn't have technical grounds.
- rename the 'virtio_discard' and 'virtio_write_zeroes' features to 'frontend_discard' and 'frontend_write_zeroes', so that if any other device frontend will expose such knobs they can be used without the need for yet another knob
- modify the documentation where it states that in the vast majority of cases the user doesn't need to configure it because it simply does the right thing (this is actually the main objection, if the user doesn't need to set a feature it doesn't make sense to expose a knob)
- Since this is a device frontend feature you'll need to add a ABI stability check for it (see. virDomainDiskDefCheckABIStability). Note that the existing discard/write_zeroes knobs don't need a ABI stability check because they are backend features thus can be changed e.g. on migration.
-- Best regards

From: Hyman Huang(黄勇) <yong.huang@smartx.com> Generate cmd line for virtio-blk.discard and virtio-blk.write-zeroes properties. Also, validate that the requested feature is supported by QEMU. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- src/qemu/qemu_command.c | 2 + src/qemu/qemu_validate.c | 9 +++ .../disk-virtio-discard.x86_64-latest.args | 45 ++++++++++++++ .../qemuxml2argvdata/disk-virtio-discard.xml | 45 ++++++++++++++ tests/qemuxml2argvtest.c | 1 + .../disk-virtio-discard.x86_64-latest.xml | 58 +++++++++++++++++++ tests/qemuxml2xmltest.c | 1 + 7 files changed, 161 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-virtio-discard.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-discard.xml create mode 100644 tests/qemuxml2xmloutdata/disk-virtio-discard.x86_64-latest.xml diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ad224571f3..6954fc5d75 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1834,6 +1834,8 @@ qemuBuildDiskDeviceProps(const virDomainDef *def, "T:scsi", scsi, "p:num-queues", disk->queues, "p:queue-size", disk->queue_size, + "T:discard", disk->virtio_discard, + "T:write-zeroes", disk->virtio_write_zeroes, NULL) < 0) return NULL; } diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 7e09e2c52f..499b49ce74 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -2949,6 +2949,15 @@ qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk, } } + if (disk->virtio_discard || disk->virtio_write_zeroes) { + if (disk->bus != VIR_DOMAIN_DISK_BUS_VIRTIO) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("virtio_discard and virtio_write_zeroes options are " + "only valid for VIRTIO bus")); + return -1; + } + } + switch (disk->bus) { case VIR_DOMAIN_DISK_BUS_SCSI: if (disk->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DRIVE) { diff --git a/tests/qemuxml2argvdata/disk-virtio-discard.x86_64-latest.args b/tests/qemuxml2argvdata/disk-virtio-discard.x86_64-latest.args new file mode 100644 index 0000000000..ee7a7edc9c --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-discard.x86_64-latest.args @@ -0,0 +1,45 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=219136k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \ +-blockdev '{"driver":"file","filename":"/tmp/data.img","node-name":"libvirt-4-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-4-format","read-only":false,"driver":"raw","file":"libvirt-4-storage"}' \ +-device '{"driver":"virtio-blk-pci","discard":true,"write-zeroes":true,"bus":"pci.0","addr":"0x2","drive":"libvirt-4-format","id":"virtio-disk0","bootindex":1}' \ +-blockdev '{"driver":"file","filename":"/tmp/data1.img","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"}' \ +-device '{"driver":"virtio-blk-pci","discard":true,"write-zeroes":false,"bus":"pci.0","addr":"0x3","drive":"libvirt-3-format","id":"virtio-disk1"}' \ +-blockdev '{"driver":"file","filename":"/tmp/data2.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \ +-device '{"driver":"virtio-blk-pci","discard":false,"write-zeroes":true,"bus":"pci.0","addr":"0x4","drive":"libvirt-2-format","id":"virtio-disk2"}' \ +-blockdev '{"driver":"file","filename":"/tmp/data3.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device '{"driver":"virtio-blk-pci","discard":false,"write-zeroes":false,"bus":"pci.0","addr":"0x5","drive":"libvirt-1-format","id":"virtio-disk3"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x6"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvdata/disk-virtio-discard.xml b/tests/qemuxml2argvdata/disk-virtio-discard.xml new file mode 100644 index 0000000000..cd6ec6a51d --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-discard.xml @@ -0,0 +1,45 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='on' virtio_write_zeroes='on'/> + <source file='/tmp/data.img'/> + <target dev='vda' bus='virtio'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='on' virtio_write_zeroes='off'/> + <source file='/tmp/data1.img'/> + <target dev='vdb' bus='virtio'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='off' virtio_write_zeroes='on'/> + <source file='/tmp/data2.img'/> + <target dev='vdc' bus='virtio'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='off' virtio_write_zeroes='off'/> + <source file='/tmp/data3.img'/> + <target dev='vdd' bus='virtio'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 534eb9e699..9c7ff63ff1 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1420,6 +1420,7 @@ mymain(void) DO_TEST_CAPS_LATEST("net-vdpa"); DO_TEST_CAPS_LATEST("net-vdpa-multiqueue"); DO_TEST_CAPS_LATEST("net-virtio-rss"); + DO_TEST_CAPS_LATEST("disk-virtio-discard"); DO_TEST("hostdev-pci-multifunction", QEMU_CAPS_KVM, diff --git a/tests/qemuxml2xmloutdata/disk-virtio-discard.x86_64-latest.xml b/tests/qemuxml2xmloutdata/disk-virtio-discard.x86_64-latest.xml new file mode 100644 index 0000000000..9895fe13fb --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-discard.x86_64-latest.xml @@ -0,0 +1,58 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='on' virtio_write_zeroes='on'/> + <source file='/tmp/data.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='on' virtio_write_zeroes='off'/> + <source file='/tmp/data1.img'/> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='off' virtio_write_zeroes='on'/> + <source file='/tmp/data2.img'/> + <target dev='vdc' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='raw' virtio_discard='off' virtio_write_zeroes='off'/> + <source file='/tmp/data3.img'/> + <target dev='vdd' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </disk> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index b66274beb8..aa29b89466 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -600,6 +600,7 @@ mymain(void) DO_TEST_CAPS_LATEST("disk-discard"); DO_TEST_CAPS_LATEST("disk-detect-zeroes"); DO_TEST_CAPS_LATEST("disk-discard_no_unref"); + DO_TEST_CAPS_LATEST("disk-virtio-discard"); DO_TEST_NOCAPS("disk-serial"); -- 2.38.5

From: Hyman Huang(黄勇) <yong.huang@smartx.com> Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> --- NEWS.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 42c2c53091..b6fde7c353 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -17,6 +17,14 @@ v9.6.0 (unreleased) * **New features** + * qemu: Introducing ``discard`` and ``write_zeroes`` features of virtio-blk device + + The optional ``virtio_discard`` and ``virtio_write_zeroes`` attributes + control whether discard and write-zeroes requests are handled by the + virtio-blk device. The feature is based on DISCARD and WRITE_ZEROES commands + introduced in virtio-blk protocol to improve performance when using SSD + backend. + * **Improvements** * apparmor: All profiles and abstractions now support local overrides -- 2.38.5
participants (3)
-
Peter Krempa
-
Yong Huang
-
~hyman