On 3/15/19 10:06 PM, Daniel Henrique Barboza wrote:
In a case where we want to hotplug the following disk:
<disk type='file' device='disk'>
(...)
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</disk>
In a QEMU guest that has a single OS disk, as follows:
<disk type='file' device='disk'>
(...)
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</disk>
What happens is that the existing guest disk will receive the ID
'scsi0-0-0-0' due to how Libvirt calculate the alias based on
the address in qemu_alias.c, qemuAssignDeviceDiskAlias. When hotplugging
a disk that happens to have the same address, Libvirt will calculate
the same ID to it and attempt to device_add. QEMU will refuse it:
$ virsh attach-device ub1810 hp-disk-dup.xml
error: Failed to attach device from hp-disk-dup.xml
error: internal error: unable to execute QEMU command 'device_add': Duplicate ID
'scsi0-0-0-0' for device
And Libvirt follows it up with a cleanup code in qemuDomainAttachDiskGeneric
that ends up removing what supposedly is a faulty hotplugged disk but, in
this case, ends up being the original guest disk.
This patch adds an address verification for all attached devices, avoid
calling the driver attach() function using a device with duplicated address.
The change is done in virDomainDefCompatibleDevice when @action is equal
to VIR_DOMAIN_DEVICE_ACTION_ATTACH. The affected callers are:
- qemuDomainAttachDeviceLiveAndConfig, both LIVE and CONFIG cases;
- lxcDomainAttachDeviceFlags, both LIVE and CONFIG.
The check is done using the virDomainDefHasDeviceAddress, a generic
function that can check address duplicates for all supported device
types, not limiting just to DeviceDisk type.
After this patch, this is the result of the previous attach-device call:
$ ./run tools/virsh attach-device ub1810 hp-disk-dup.xml
error: Failed to attach device from hp-disk-dup.xml
error: Requested operation is not valid: Domain already contains a device with the same
address
Reported-by: Srikanth Aithal <bssrikanth(a)in.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
Huh, how can this slip through for such long time? I mean, this is very
nasty bug because not only it can corrupt a running domain it may lead
to leaving a domain unable to start: virsh attach-device --config will
attach the device happily, change the XML stored on disk and libvirt
will refuse to start the domain then.
---
As mentioned above, this code impacts LXC too. Although I believe
this change isn't harmful, tt would be good to have an ACK of someone
familiar with the LXC driver to be safe.
I've looked in to LXC code too and it does look the same as in QEMU.
Therefore, it fixes the problem for LXC domains too.
src/conf/domain_conf.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 504c24b545..d3d33ef78d 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -28628,6 +28628,14 @@ virDomainDefCompatibleDevice(virDomainDefPtr def,
if (oldDev)
data.oldInfo = virDomainDeviceGetInfo(oldDev);
+ if (action == VIR_DOMAIN_DEVICE_ACTION_ATTACH &&
+ virDomainDefHasDeviceAddress(def, data.newInfo)) {
This is potentially expensive operation. I'm adding the following check:
data.newInfo &&
data.newInfo->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE &&
+ virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+ _("Domain already contains a device with the same "
+ "address"));
+ return -1;
+ }
+
if (action == VIR_DOMAIN_DEVICE_ACTION_UPDATE &&
live &&
(data.newInfo && data.oldInfo &&
ACKed and pushed.
Michal