Hi

Thanks for following up, and sorry for the delay in getting back to you.

You're right to suspect the issue might be related to device changes. Here’s how the crash can be triggered:

  1. The VM initially uses a SATA controller, with a disk defined as:

    xml
    <controller type="scsi" index="0" model="lsilogic"></controller> <disk type='file' device='disk'> <driver name='qemu' type='qcow2'/> <source file='/var/lib/libvirt/images/Testguest.qcow2'/> <target dev='sda' bus='sata'/> </disk>
  2. A snapshot is created at this point — which records the disk as sda.

  3. Later, the VM is reconfigured to use a virtio controller, and the disk is now assigned as vda.

  4. When the VM is running and the snapshot is deleted, the snapshot code still expects to find a disk named sda in the current VM definition.

Because of this mismatch, qemuDomainDiskByName() returns NULL, and the crash occurs when the result is used without a null check.

This can easily happen during controller or disk bus reconfiguration between snapshot and deletion. The patch adds sanity checks to ensure we don’t dereference a null pointer in this situation.

Let me know if you’d like me to adjust the wording in the error messages or add a reproducer for automated testing.

Best regards,
kaihuan


Martin Kletzander <mkletzan@redhat.com> 于2025年1月22日周三 21:05写道:
On Fri, Nov 29, 2024 at 10:56:45PM +0800, kaihuan wrote:
>qemuDomainDiskByName() can return a NULL pointer on failure.
>But this returned value in qemuSnapshotDeleteValidate is not checked.It will make libvirtd crash.
>

Hi, looking through old unreviewed patches I found this one.  Sorry for
the wait.  Can you explain whether you found out how to trigger that?
This looks like there is some other issue that causes a snapshot having
a disk that is not in the domain.  Does that happen when you want to
delete a snapshot after unplugging a disk?

>Signed-off-by: kaihuan <jungleman759@gmail.com>
>---
> src/qemu/qemu_snapshot.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
>index 18b2e478f6..bcbd913073 100644
>--- a/src/qemu/qemu_snapshot.c
>+++ b/src/qemu/qemu_snapshot.c
>@@ -4242,8 +4242,19 @@ qemuSnapshotDeleteValidate(virDomainObj *vm,
>             virDomainDiskDef *vmdisk = NULL;
>             virDomainDiskDef *disk = NULL;
>
>-            vmdisk = qemuDomainDiskByName(vm->def, snapDisk->name);
>-            disk = qemuDomainDiskByName(snapdef->parent.dom, snapDisk->name);

The function calls already report an error when the disk is not found,
so there is not need to rewrite that error in my opinion.

>+            if (!(vmdisk = qemuDomainDiskByName(vm->def, snapDisk->name))) {
>+                virReportError(VIR_ERR_OPERATION_FAILED,
>+                            _("disk '%1$s' referenced by snapshot '%2$s' not found in the current definition"),
>+                            snapDisk->name, snap->def->name);
>+                return -1;
>+            }
>+
>+            if (!(disk = qemuDomainDiskByName(snapdef->parent.dom, snapDisk->name))) {
>+                virReportError(VIR_ERR_OPERATION_FAILED,
>+                            _("disk '%1$s' referenced by snapshot '%2$s' not found in the VM definition of the deleted snapshot"),
>+                            snapDisk->name, snap->def->name);
>+                return -1;
>+            }
>
>             if (!virStorageSourceIsSameLocation(vmdisk->src, disk->src)) {
>                 virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
>--
>2.33.1.windows.1
>