On Wed, Jul 19, 2023 at 11:52:45 +0800, Yong Huang wrote:
> On Tue, Jul 18, 2023 at 9:31 PM Peter Krempa <pkrempa@redhat.com> wrote:
>
> > On Sun, Jul 16, 2023 at 22:36:21 +0800, ~hyman wrote:
> > > From: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > >
> > > Add 'virtio_discard' and 'virtio_write_zeroes' attribute to control
> > > whether discard and write-zeroes requests are handled by the
> > > virtio-blk device.
> > >
> > > To distinguish the attributes in block drive layer, use the 'virtio'
> > > prefix to indicate that these attributes are specific for virtio-blk.
> > >
> > > Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> > > ---
> > > docs/formatdomain.rst | 8 ++++++++
> > > src/conf/domain_conf.c | 16 ++++++++++++++++
> > > src/conf/domain_conf.h | 2 ++
> > > src/conf/schemas/domaincommon.rng | 10 ++++++++++
> > > src/conf/storage_source_conf.c | 2 ++
> > > src/conf/storage_source_conf.h | 2 ++
> > > src/qemu/qemu_domain.c | 2 ++
> > > src/qemu/qemu_driver.c | 4 +++-
> > > src/vz/vz_utils.c | 12 ++++++++++++
> > > 9 files changed, 57 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> > > index 4af0b82569..7be12ff08e 100644
> > > --- a/docs/formatdomain.rst
> > > +++ b/docs/formatdomain.rst
> > > @@ -3259,6 +3259,14 @@ paravirtualized driver is specified via the
> > ``disk`` element.
> > > value can be either "unmap" (allow the discard request to be
> > passed) or
> > > "ignore" (ignore the discard request). :since:`Since 1.0.6 (QEMU
> > and KVM
> > > only)`
> > > + - The optional ``virtio_discard`` and ``virtio_write_zeroes`` are
> > attributes
> > > + that control whether discard and write-zeroes requests are
> > handled by the
> > > + virtio-blk device. The feature is based on DISCARD and
> > WRITE_ZEROES
> > > + commands introduced in virtio-blk protocol to improve performance
> > when
> > > + using SSD backend. The value can be either 'on' or 'off'. Note
> > that
> > > + ``discard`` and ``write_zeroes`` implementations in the block
> > drive layer
> > > + are parts of the feature logically and should be turned on when
> > enabling
> > > + the feature. :since:`Since 9.6.0 (QEMU and KVM only)`
> >
> > Based on current released qemu both 'discard' and 'write-zeroes' feature
> > of the 'virtio-blk' device is enabled by default:
> >
> > $ qemu-system-x86_64 -device virtio-blk-pci,? | grep discard
> > discard=<bool> - on/off (default: true)
> > discard_granularity=<size> - (default: 4294967295)
> > max-discard-sectors=<uint32> - (default: 4194303)
> > report-discard-granularity=<bool> - (default: true)
> > $ qemu-system-x86_64 -device virtio-blk-pci,? | grep write-zeroes
> > max-write-zeroes-sectors=<uint32> - (default: 4194303)
> > write-zeroes=<bool> - on/off (default: true)
> >
> > Do you need a way to disable this feature? Based on the description the
> > default seems to be sane and actually what you'd want users to set.
> >
> The default is reasonable indeed. But there are still some scenarios
> in the production environment that discard may need to be disabled.
> For example:
> - The virtio-blk discard/write-zeroes commands was introduced in
> 2017 in virtio-blk spec, so the OS distribution before 2017 can not
> use this feature. In that case, the cloud management platform(CMP)
> could recognize this issue and disable the feature in advance.
Older drivers which do not support the feature are supposed to ignore
any new feature bits:
https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-140002
2.2 Feature Bits
Each virtio device offers all the features it understands. During device
initialization, the driver reads this and tells the device the subset
that it accepts. The only way to renegotiate is to reset the device.
This allows for forwards and backwards compatibility: if the device is
enhanced with a new feature bit, older drivers will not write that
feature bit back to the device. Similarly, if a driver is enhanced with
a feature that the device doesn’t support, it see the new feature is not
offered.
Based on the above it's just fine to use a older OS with the new device
definition and management doesn't need to do anything to ensure
compatibility.
> - Discard/write-zeroes may need to be configured at disk granularity
> in some scenarios. Assume that CMP support 2 distributed storage
> clusters, one cluster supports discard and another does not.
> If the VM is configured with 2 disks - vda, vdb. Which are located in
> 2 clusters respectively. Or, the VM with disks located in the
> discard-supportive cluster and want to migrate disks to another
> cluster for some reason(eg. storage capability is exhausted)
> CMP may want to turn discard off explicitly.
There are two distinct operations which have different requirements and
limitations based on the specification:
- discard:
The virtio spec says:
"For discard commands, the device MAY deallocate the specified
range of sectors in the device backend storage."
Thus the device implementation is free to ignore any unsupported
discard requests. In fact the implementation in qemu does ignore
the request if e.g. the device's backend has discard disabled.
The backend is in fact configured by the <driver discard="unmap">
attribute, thus if you need to disable discard you can do that even
now on the backend level.
- write zeroes:
Write zeroes becomes mandatory once negotiated, but qemu has it's own
implementation which has integrated fallback of simply writing zeroes
to the backing image if there isn't any faster method available.
> Though the above scenarios are quite rare and the virtio spec can
> ensure the feature can be negotiated and used correctly.
> CMP still wants to control the features it supports carefully and
> precisely.
As we both note there's not an actual technical problem here, right? As
in, everything will work as configured.
I'd prefer if you could justify/elaborate a bit more why you still need
to control those features.
> To summarise, IMHO, libvirt may not forbid the upper layer to
> control the 2 features of the virtio-blk disk. Leaving the option
> configurable for CMP or even customers.
As noted above, disard requests reaching the block device below the
image can be already configured via <driver discard="unmap/ignore">.
With the 'write-zeroes' feature there isn't really any semantic
difference regardless of what feature is supported by the underlying
storage. The difference is only in who actually does the actual writing
of zeroes.
> There's one more background may need to be stated:
> Current released QEMU do not provide hmp/qmp to query the
> final features that are negotiated between front-end and back-end
> from the hypervisor side. So if CMP wants to query what features a
> guest VM uses, it has to query it inside the guest VM or hack the
> qemu process.
I don't quite understand why that would be something any management
application needs to know. The hypervisor provides the features and if
the guest OS decides not to use them it' doesn't really matter, or does
it?
There are multiple other layers apart from the hypervisor support for
passing discards and the software that runs inside the VM (kernel,
volume manager, filesystem driver, filesystem configuration) which can
decide to use/don't use discards so the knowledge at the hypervisor
level doesn't tell you much.
> This way is inconvenient for control-planes, so the CMP
> needs to control the feature as aggressively as it can.
Based on the above, the only thing you can achieve is to forbid discards
for a VM on another level than those already supported. You will not be
able to force the OS to use discards and neither will be able to tell
whether the OS does use them in any other way you do now can (you still
can query number of discard requests send to the image backend).
> Thank Peter for the attention to this patchset.
My objections to this patchset are on the fact that the code originally
didn't provide much justification for this patch. With your explanations
I can't say I'm persuaded more of the contrary.
If you decide to still pursue this patchset:
- Add justification to the commit message why that is needed. This is
needed because the code itself can't justify the need for itself.
Additionally do not use the first justification from this message, it
simply doesn't have technical grounds.
- rename the 'virtio_discard' and 'virtio_write_zeroes' features to
'frontend_discard' and 'frontend_write_zeroes', so that if any other
device frontend will expose such knobs they can be used without the
need for yet another knob
- modify the documentation where it states that in the vast majority of
cases the user doesn't need to configure it because it simply does
the right thing (this is actually the main objection, if the user
doesn't need to set a feature it doesn't make sense to expose a
knob)
- Since this is a device frontend feature you'll need to add a ABI
stability check for it (see. virDomainDiskDefCheckABIStability). Note
that the existing discard/write_zeroes knobs don't need a ABI
stability check because they are backend features thus can be changed
e.g. on migration.