I've been investigating what turned out to be some long-standing issues in
the libxl driver. One of them causes libvirtd to deadlock; the other can
lead to a segmentation fault. Both can be triggered by repeatedly rebooting
a collection of VMs. My reproducer continually reboots 8 VMs on a host
where libvirtd runs in a VM (dom0) confined to 4 vcpus.

Patches 1-4 contain improvements and preparation for the fixes in patches
5 and 6. Patch 5 fixes the potential deadlock, and patch 6 fixes the
potential crash. The commit messages of both patches give more detail on
their respective issues. My reproducer has now run for 5 days without
issue. Before these patches, it triggered one of the issues within 2 days.

Jim Fehlig (6):
  libxl: Disable death events after receiving a shutdown event
  libxl: Rename libxlShutdownThreadInfo struct
  libxl: Modify name of shutdown thread
  libxl: Handle domain death events in a thread
  libxl: Search for virDomainObj in event handler threads
  libxl: Protect access to libxlLogger files hash table

 src/libxl/libxl_domain.c | 115 ++++++++++++++++++++++-----------------
 src/libxl/libxl_domain.h |   3 -
 src/libxl/libxl_logger.c |  12 ++++
 3 files changed, 77 insertions(+), 53 deletions(-)

--
2.33.0