On Wed, Jan 11, 2023 at 10:24:30AM -0500, Stefan Hajnoczi wrote:
On Tue, Jan 10, 2023 at 03:29:47PM +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 10, 2023 at 10:19:51AM -0500, Stefan Hajnoczi wrote:
> > Hi Peter,
> >
> > Zoned storage support
> > (https://zonedstorage.io/docs/introduction/zoned-storage) is being added
> > to QEMU. Given a zoned host block device, the QEMU syntax will look like
> > this:
> >
> > --blockdev zoned_host_device,node-name=drive0,filename=/dev/$BDEV,...
> > --device virtio-blk-pci,drive=drive0
> >
> > Note that regular --blockdev host_device will not work.
> >
> > For now the virtio-blk device is the only one that supports zoned
> > blockdevs.
>
> Does the guest ABI exposed by the virtio-blk device differ at all
> when connected to zoned_host_device instead of host_device?
Yes. There is a VIRTIO feature bit, some configuration space fields,
etc. virtio-blk-pci detects when the blockdev is zoned and enables the
feature bit.

I get a general sense of unease when frontend device ABI sensitive
features get secretly toggled based on features exposed by the
backend.

When trying to validate ABI compatibility of guest configs, libvirt
would generally compare frontend properties to look for differences.
There is a small set of cases where backends affect frontend
features, but it is not that common to see.

Consider what happens if we have a guest running on zoned storage,
and we need to evacuate the host to a machine without zoned
storage available. Could we replace the storage backend on the
target host with a raw/qcow2 backend but keep pretending it is
zoned storage to the guest? The guest would continue batching its
I/O ops for the zoned storage, which would be redundant
for raw/qcow2, but presumably should still work. If this is possible
it would suggest the need to have explicit settings for zoned storage
on the virtio-blk frontend. QEMU would "merely" validate that these
settings are turned on, if the host storage is zoned too.
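
To make the validation idea concrete, here is a minimal Python sketch (not QEMU code). It assumes a hypothetical boolean "zoned" property on the virtio-blk frontend, and it assumes that presenting a zoned device on top of a non-zoned backend turns out to be feasible, which is still an open question above. The function names are made up for illustration.

```python
def validate_zoned(frontend_zoned: bool, backend_zoned: bool) -> None:
    """Reject the one combination the guest cannot cope with.

    Hypothetical rule: a zoned backend needs the frontend setting on,
    because the guest must know to issue sequential writes. A non-zoned
    backend is accepted with either setting (assumption, see lead-in).
    """
    if backend_zoned and not frontend_zoned:
        raise ValueError("zoned backend requires zoned=on on the frontend")


def frontend_abi_matches(src_props: dict, dst_props: dict) -> bool:
    """Compare frontend properties only, ignoring the backend."""
    return src_props == dst_props


# Source host: zoned backend. Target host: raw/qcow2 backend.
src = {"driver": "virtio-blk-pci", "zoned": True}
dst = {"driver": "virtio-blk-pci", "zoned": True}
validate_zoned(src["zoned"], backend_zoned=True)
validate_zoned(dst["zoned"], backend_zoned=False)
print(frontend_abi_matches(src, dst))
```

With the setting carried on the frontend, the ABI comparison no longer has to look at the backend at all.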
> > This brings to mind a few questions:
> >
> > 1. Does libvirt need domain XML syntax for zoned storage? Alternatively,
> > it could probe /sys/block/$BDEV/queue/zoned and generate the correct
> > QEMU command-line arguments for zoned devices when the contents of
> > the file are not "none".
> >
> > 2. Should QEMU --blockdev host_device detect zoned devices so that
> > --blockdev zoned_host_device is not necessary? That way libvirt would
> > automatically support zoned storage without any domain XML syntax or
> > libvirt code changes.
> >
> > The drawbacks I see when QEMU detects zoned storage automatically:
> > - You can't easily tell if a blockdev is zoned from the command-line.
> > - It's possible to mismatch zoned and non-zoned devices across live
> > migration.
>
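
For reference, the sysfs probe from question 1 could look roughly like the Python sketch below. It is not libvirt code: the function names and the sysfs_root parameter (there only so the logic can be exercised against a fake tree) are made up for illustration. It follows the rule quoted above, where anything other than "none" selects zoned_host_device, and it emits only the options shown in this thread, not whatever else a real command line would carry.

```python
import os


def read_zoned_model(bdev, sysfs_root="/sys/block"):
    """Return the contents of <sysfs_root>/<bdev>/queue/zoned.

    On Linux this is "none" for regular devices and "host-aware" or
    "host-managed" for zoned ones.
    """
    path = os.path.join(sysfs_root, bdev, "queue", "zoned")
    with open(path) as f:
        return f.read().strip()


def blockdev_args(bdev, node_name, sysfs_root="/sys/block"):
    """Pick the --blockdev driver from the probed zoned model."""
    model = read_zoned_model(bdev, sysfs_root)
    driver = "host_device" if model == "none" else "zoned_host_device"
    # Only the options mentioned in the thread are generated here.
    return ["--blockdev",
            f"{driver},node-name={node_name},filename=/dev/{bdev}"]
```

A caller would append the --device virtio-blk-pci,drive=<node_name> argument separately.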
> What happens with existing QEMU impls if you use --blockdev host_device
> pointing to a /dev/$BDEV that is a zoned device ? If it succeeds and
> works correctly, then we likely need to continue to support that. This
> would push towards needing a new XML element.
Pointing host_device at a zoned device doesn't result in useful behavior
because the guest is unaware that this is a zoned device. The guest
won't be able to access the device correctly (i.e. sequential writes
only). Write requests will fail eventually.

I would consider zoned devices totally unsupported in QEMU today and we
don't need to worry about preserving any kind of backwards compatibility
with --blockdev host_device,filename=/dev/my_zoned_device.

So I guess I'm not so worried about host_device vs zoned_host_device,
if we have explicit settings for controlling zoned behaviour on the
virtio-blk frontend.

I feel like we should have something explicit somewhere though, as this
is a pretty significant difference in the storage stack that I think
mgmt apps should be aware of, as it has implications for how you manage
the VMs on an ongoing basis.

We could still have it "do what I mean" by default though, e.g. the
virtio-blk setting defaults could imply "match the host", so we get
effectively a tri-state (zoned=on/off/auto).
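
A rough Python sketch of how such a tri-state might resolve, purely to illustrate the semantics. The function name is hypothetical, and rejecting zoned=off on a zoned backend is an assumption drawn from the point above that the guest cannot use a zoned device it does not know about.

```python
def resolve_zoned(setting: str, backend_zoned: bool) -> bool:
    """Resolve a hypothetical zoned=on/off/auto frontend setting.

    Returns whether the guest is shown a zoned device.
    """
    if setting == "auto":
        # "Match the host": follow whatever the backend is.
        return backend_zoned
    if setting == "on":
        # Explicitly zoned, whatever the backend is (assumes a
        # non-zoned backend can be presented as zoned).
        return True
    if setting == "off":
        if backend_zoned:
            # Assumption: a zoned backend cannot be hidden from the guest.
            raise ValueError("zoned=off is not usable with a zoned backend")
        return False
    raise ValueError(f"unknown zoned setting: {setting!r}")
```

A mgmt app that cares about migration would pin on or off explicitly and leave auto for ad hoc use.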
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|