On Thu, Jun 03, 2021 at 04:37:49PM +0200, Peter Krempa wrote:
Hi,
Recently I got a report that an upgrade of libvirt (and qemu) caused
a guest-visible change in the SCSI disk identification when a very long
serial number is used.
I've traced it back to the point where libvirt started to use the
'device_id=' property of the SCSI disk to pass in the alias of the disk
when no serial is configured, and the serial when one is.
https://gitlab.com/libvirt/libvirt/-/commit/a1dce96236f6d35167924fa7e6a70...
The change is caused by the fact that when serial is configured via the
'serial=' property it's being silently truncated.
Now there are two distinct VPD pages which report the serial number:
0x83 - device identification
Originally this one reported only the device alias, but
starting from qemu commit:
commit fd9307912d0a2ffa0310f9e20935d96d5af0a1ca
Author: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri Mar 16 19:12:43 2012 +0100
scsi: copy serial number into VPD page 0x83
Currently QEMU passes the qdev device id to the guest in an ASCII-string
designator in page 0x83. While this is fine, it does not match what
real hardware does; usually the ASCII-string designator there hosts
another copy of the serial number (there can be other designators,
for example with a world-wide name). Do the same for QEMU SCSI
disks.
ATAPI does not support VPD pages, so it does not matter there.
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
it reports the serial number instead of the device alias when a
serial is configured. This copy historically inherited the IDE(?) limit
of 20 characters.
Now, with the change to use 'device_id', which overrides this behavior,
the length of the reported value is limited only by the technical limit
of 255-8 characters, and that is what causes the guest-visible change.
Libvirt uses 'device_id' because when -blockdev is used, the disk alias
that used to be configured via -drive is no longer set and thus would
be missing.
Libvirt also (unfortunately in this case I'd say) started to pass the
serial number via this property.
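To make the guest-visible difference concrete, here is a minimal Python model of how the page 0x83 ASCII designator gets chosen, assuming the three behaviours described above: an explicit 'device_id' clamped to 255-8 characters, otherwise the serial copied and truncated to 20 characters, otherwise the device alias. The function and constant names are hypothetical; this is not QEMU's actual code.

```python
from typing import Optional

# Limits as described in this thread (illustrative names, not QEMU's).
SERIAL_COPY_LIMIT = 20      # serial copied into page 0x83 without 'device_id'
DEVICE_ID_LIMIT = 255 - 8   # clamp applied to an explicit 'device_id'


def page83_designator(alias: str, serial: Optional[str] = None,
                      device_id: Optional[str] = None) -> str:
    """Return the ASCII designator a guest would see in VPD page 0x83."""
    if device_id is not None:
        # An explicit 'device_id' overrides both the serial and the alias.
        return device_id[:DEVICE_ID_LIMIT]
    if serial is not None:
        # Without 'device_id', the serial is silently truncated to 20 chars.
        return serial[:SERIAL_COPY_LIMIT]
    # No serial configured: the device alias is reported.
    return alias


uuid = "123e4567-e89b-12d3-a456-426614174000"  # a 36-character serial
old = page83_designator("scsi0-0-0-0", serial=uuid)                  # 'serial=' only
new = page83_designator("scsi0-0-0-0", serial=uuid, device_id=uuid)  # serial passed as 'device_id'
print(old)  # 123e4567-e89b-12d3-a
print(new)  # 123e4567-e89b-12d3-a456-426614174000
```

The same configured serial yields two different designators depending only on whether it travels through 'serial=' or 'device_id=', which is exactly the change the guest observes after the upgrade.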
0x80 - device serial (optional, only when serial is configured)
Like page 0x83, this one originally reported the serial truncated to
20 characters, but later, in commit:
commit 48b6206305b8d56524ac2ee347b68e6e0a528559
Author: Rony Weng <ronyweng(a)synology.com>
Date: Mon Aug 29 15:52:18 2016 +0800
scsi-disk: change disk serial length from 20 to 36
Openstack Cinder assigns volume a 36 characters uuid as serial.
QEMU will shrinks the uuid to 20 characters, which does not match
the original uuid.
Note that there is no limit to the length of the serial number in
the SCSI spec. 20 was copy-pasted from virtio-blk which in turn was
copy-pasted from ATA; 36 is even more arbitrary. However, bumping it
up too much might cause issues (e.g. 252 seems to make sense because
then the maximum amount of returned data is 256; but who knows there's
no off-by-one somewhere for such a nicely rounded number).
Signed-off-by: Rony Weng <ronyweng(a)synology.com>
Message-Id: <1472457138-23386-1-git-send-email-ronyweng(a)synology.com>
Cc: qemu-stable(a)nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
qemu actually changed the silent truncation to another arbitrary value
of 36, which is the length of a UUID.
Thus qemu isn't innocent in this regard either.
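For page 0x80 the truncation happens while building the response. A minimal sketch, assuming the SPC Unit Serial Number page layout (byte 0 peripheral device type, byte 1 page code 0x80, bytes 2-3 big-endian page length, then the serial bytes) and an ASCII-only serial; `build_vpd80` is a hypothetical helper, not QEMU's function. The 4-byte header is also consistent with the commit's remark about 252: a 252-byte serial plus the header gives 256 bytes of returned data.

```python
import struct

VPD80_SERIAL_LIMIT = 36  # limit from the 2016 scsi-disk change quoted above


def build_vpd80(serial: str, device_type: int = 0x00) -> bytes:
    """Build a Unit Serial Number VPD page (0x80), truncating the serial."""
    data = serial.encode("ascii")[:VPD80_SERIAL_LIMIT]  # silent truncation
    # Header: peripheral device type, page code, 2-byte big-endian length.
    header = struct.pack(">BBH", device_type, 0x80, len(data))
    return header + data


page = build_vpd80("x" * 40)
print(len(page))                          # 40: 4-byte header + 36 serial bytes
print(struct.unpack(">H", page[2:4])[0])  # 36
```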
Now, given that the above-mentioned libvirt commit is contained in
libvirt-5.1 (and qemu-4.0 adds support for 'device_id'), reverting to
truncation to 20 characters would IMO also be considered a regression,
since there are users who changed qemu to lessen the truncation.
As such, I don't think libvirt should revert to using the truncated
serial, despite the ABI change.
On the other hand, QEMU should IMO:
1) unify the truncation to a single length; preferably the technical
limit
2) add possibility to report error when the serial is too long (libvirt
can accept a new property for example)
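Points 1) and 2) could look roughly like the following minimal sketch: one limit shared by both VPD pages, plus a hypothetical strict mode standing in for the new property from 2), which rejects an over-long serial instead of truncating it. The limit value and every name here are assumptions for illustration.

```python
SERIAL_MAX_LEN = 255 - 8  # hypothetical single limit for pages 0x80 and 0x83


class SerialTooLongError(ValueError):
    """Raised in strict mode when the serial exceeds SERIAL_MAX_LEN."""


def effective_serial(serial: str, strict: bool = False) -> str:
    """Return the serial reported in both VPD pages, or raise in strict mode."""
    if len(serial) <= SERIAL_MAX_LEN:
        return serial
    if strict:
        # Surface the problem to the management application.
        raise SerialTooLongError(
            f"serial is {len(serial)} characters, maximum is {SERIAL_MAX_LEN}")
    # Legacy behaviour: silent truncation, but to one length for both pages.
    return serial[:SERIAL_MAX_LEN]


try:
    effective_serial("x" * 300, strict=True)
except SerialTooLongError as err:
    print(err)  # serial is 300 characters, maximum is 247
```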
I'm open to other suggestions though.
Feels like we're essentially doomed in every scenario from an ABI compat
POV. The best we can do I think is to document in libvirt what the
various limits are. Then say that if you provide a value within the limit,
ABI stability is ensured, but if you go above the documented limit,
behaviour is undefined.
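One way to derive such a documented limit from the history in this thread: a serial is reported identically by every configuration mentioned only if none of the limits truncates it, so the ABI-stable limit is the minimum of them. A minimal sketch with hypothetical names; the table lists only the limits mentioned in this thread, not every QEMU version.

```python
# Truncation limits mentioned in this thread (keys are illustrative labels).
LIMITS = {
    "page 0x83, serial copied (no device_id)": 20,
    "page 0x83, explicit device_id": 255 - 8,
    "page 0x80, before the 2016 change": 20,
    "page 0x80, after the 2016 change": 36,
}


def abi_stable_limit() -> int:
    """Longest serial that none of the listed limits truncates."""
    return min(LIMITS.values())


def is_abi_stable(serial: str) -> bool:
    """True if no limit listed above would truncate the serial."""
    return len(serial) <= abi_stable_limit()


print(abi_stable_limit())  # 20
```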
I agree that reporting an error in QEMU is more desirable than silently
truncating. This would have shown the mgmt app that it was supplying a
value that was too large, and it could then have truncated the value
itself. Then when QEMU later raised the limit, it would not have been an
ABI regression.
QEMU could start off with a deprecation warning for over-long serials,
and turn it into a hard error after 2 releases.
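The staged approach could look roughly like this: during a grace period an over-long serial still gets the legacy truncation plus a deprecation warning, and afterwards it is rejected. A minimal sketch in which the limit, the two-release grace period and the `releases_since_deprecation` parameter are assumptions standing in for however QEMU actually tracks its deprecation schedule.

```python
import warnings

SERIAL_MAX_LEN = 255 - 8  # hypothetical documented limit
GRACE_RELEASES = 2        # warn for two releases, then fail


def check_serial(serial: str, releases_since_deprecation: int) -> str:
    """Apply the hypothetical deprecation policy for over-long serials."""
    if len(serial) <= SERIAL_MAX_LEN:
        return serial
    if releases_since_deprecation < GRACE_RELEASES:
        # Grace period: keep the old truncation but warn about it.
        warnings.warn(
            f"serial longer than {SERIAL_MAX_LEN} characters is deprecated",
            DeprecationWarning, stacklevel=2)
        return serial[:SERIAL_MAX_LEN]
    # After the grace period, an over-long serial is a hard error.
    raise ValueError(f"serial longer than {SERIAL_MAX_LEN} characters")
```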
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|