Hello,
I found the following problem:
A VM's status file may be left behind in /var/run/libvirt/qemu in some
situations, such as a host reboot. When a VM's status file is left behind, some
persistent but inactive VMs are lost by libvirtd after it is restarted. The
problem can be reproduced as follows:
1. Create a VM and start it with the commands: virsh define vm-xml and virsh
start vm-name.
2. Stop libvirtd with the command: service libvirtd stop.
3. Kill the qemu process that belongs to the VM, so that the VM's status file
is left behind.
4. Start libvirtd.
After starting the libvirtd service, the VM no longer appears in the output of
"virsh list --all".
We would expect the VM to be shown in the shut off state. Is that the intended
behaviour?
The reason for the problem is as follows:
During startup, libvirtd first loads the status files of VMs under
/var/run/libvirt/qemu, creates a virDomainObj for each VM and adds it to the
driver->domains list.
It then creates a thread for each virDomainObj in the domains list to connect
to the related qemu process. Because the qemu process has been killed,
connecting to qemu fails. When the connection fails, the connection thread does
the following:
1. It checks whether vm->persistent is 1.
2. If vm->persistent is not 1, it calls qemuDomainRemoveInactive() to remove
the virDomainObj.
3. The following call sequence then occurs: qemuDomainRemoveInactive()
--> virDomainObjListRemove() --> virHashRemoveEntry(). Around
virHashRemoveEntry(), the domain list and the domain object are locked and
unlocked in sequence.
The problem with the above steps is that vm->persistent may already have been
set to 1 by the libvirtd main thread by the time the connection thread calls
virHashRemoveEntry() to remove the domain. In other words, the check in step 1
is stale when the removal happens, and a persistent virDomainObj is removed
during libvirtd startup.
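To make the interleaving concrete, here is a small self-contained C sketch that
replays it in a fixed order. It is only a toy model, not libvirt code: toy_dom
and all toy_* functions are hypothetical names that I made up for illustration,
and the two threads are simulated by plain sequential calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for virDomainObj plus its list membership. */
struct toy_dom {
    bool persistent;   /* models vm->persistent */
    bool in_list;      /* true while the object is still in the domain list */
};

/* Connection thread, step 1: sample the flag; nothing protects it afterwards. */
static bool toy_check_transient(const struct toy_dom *dom)
{
    assert(dom != NULL);
    return !dom->persistent;
}

/* Main thread: marks the domain persistent (assumed to happen concurrently). */
static void toy_mark_persistent(struct toy_dom *dom)
{
    assert(dom != NULL);
    dom->persistent = true;
}

/* Connection thread, steps 2-3: remove based on the earlier, stale check. */
static void toy_remove_if(struct toy_dom *dom, bool was_transient)
{
    assert(dom != NULL);
    if (was_transient)
        dom->in_list = false;
}

/* Replays the problematic order; returns true if a persistent domain
 * ended up removed from the list. */
static bool toy_replay_race(void)
{
    struct toy_dom dom = { .persistent = false, .in_list = true };
    bool was_transient = toy_check_transient(&dom);  /* connection thread */

    toy_mark_persistent(&dom);                       /* main thread */
    toy_remove_if(&dom, was_transient);              /* connection thread */
    return dom.persistent && !dom.in_list;
}
```

In this model toy_replay_race() returns true, which corresponds to the lost
persistent VM described above.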
I can see two ways to resolve the problem:
1. Extend the locked region: lock the virDomainObj and the virDomainObjList in
the connection thread before checking vm->persistent.
2. Check vm->persistent again before calling virHashRemoveEntry().
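A rough sketch of the second option, with the re-check done while the lock is
held, might look like the following. Again this is a toy model under my own
assumptions and not the libvirt implementation: toy_dom and the toy_* functions
are hypothetical, and a single pthread mutex stands in for the real
virDomainObj and virDomainObjList locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for virDomainObj; one mutex models its lock. */
struct toy_dom {
    pthread_mutex_t lock;
    bool persistent;   /* models vm->persistent */
    bool in_list;      /* true while the object is still in the domain list */
};

/* Main thread: sets the flag under the lock. */
static void toy_mark_persistent(struct toy_dom *dom)
{
    assert(dom != NULL);
    pthread_mutex_lock(&dom->lock);
    dom->persistent = true;
    pthread_mutex_unlock(&dom->lock);
}

/* Connection thread: re-check the flag under the same lock that guards
 * the removal, so that check and removal form one atomic step.
 * Returns true if the domain was removed. */
static bool toy_remove_inactive(struct toy_dom *dom)
{
    bool removed = false;

    assert(dom != NULL);
    pthread_mutex_lock(&dom->lock);
    if (!dom->persistent) {
        dom->in_list = false;
        removed = true;
    }
    pthread_mutex_unlock(&dom->lock);
    return removed;
}

/* Replays the earlier interleaving; returns true if the persistent
 * domain is still in the list afterwards. */
static bool toy_replay_fixed(void)
{
    struct toy_dom dom = { .persistent = false, .in_list = true };
    bool kept;

    pthread_mutex_init(&dom.lock, NULL);
    toy_mark_persistent(&dom);        /* main thread gets in first */
    (void)toy_remove_inactive(&dom);  /* connection thread re-checks */
    kept = dom.persistent && dom.in_list;
    pthread_mutex_destroy(&dom.lock);
    return kept;
}
```

Because the check and the removal happen under the same lock, the main thread
can only set the flag before the check or after the removal, never in between.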
Do you agree that this is a problem? If so, which of the two approaches above
is more suitable, or is there a better idea? Any suggestions are welcome.
Best Regards,
-WangYufei